PHP Regular Expressions

Posted on Thursday September 15, 2011 by Eric Potvin

Regular expressions are a very powerful tool for validating user input. They can find almost any word or character pattern and detect invalid input very accurately. Because they are so powerful, however, they can also be quite hard to use.

Here are some quick basics about regular expressions:

Operators

  • ^ The circumflex anchors the pattern to the beginning of the string (or of each line in multi-line mode); omit it when the match may start anywhere. Inside square brackets, as in [^abc], it negates the character class instead;
  • $ Like the circumflex, the dollar sign is an anchor: it marks the end of the string (or of each line in multi-line mode);
  • . The period matches any single character except a newline;
  • ? Matches the preceding element zero or one time;
  • + Matches the preceding element one or more times;
  • * Matches the preceding element zero or more times;
  • | Alternation (logical OR): matches either the expression on its left or the one on its right;
  • - Inside square brackets, defines a range of characters, e.g. [a-z];
  • () Groups pattern elements together and captures the matched text;
  • [] Matches any single character listed between the square brackets;
  • {min,max} Matches the preceding element at least min and at most max times; {n} means exactly n times and {n,} at least n times;
  • \d Matches any single digit;
  • \D Matches any single non-digit character;
  • \w Matches any alphanumeric character, including the underscore;
  • \W Matches any character that is not alphanumeric or an underscore;
  • \s Matches any whitespace character (space, tab, newline, etc.);
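A minimal sketch of the operators above in action, using PHP's preg_match(), which returns 1 on a match and 0 otherwise. The patterns and sample strings are made-up illustrations, not part of the original examples.

```php
<?php
// Illustrative pattern/subject pairs exercising the operators listed above.
$checks = [
    ['/^\d+$/', '2011'],          // one or more digits, anchored at both ends
    ['/^\d+$/', '20x1'],          // fails: "x" is not a digit
    ['/^\w+$/', 'user_name1'],    // letters, digits and underscore only
    ['/\W/', 'user_name1'],       // fails: no non-word character present
    ['/\s/', 'two words'],        // contains a whitespace character
    ['/^colou?r$/', 'color'],     // "u" may appear zero or one time
    ['/^(cat|dog)s?$/', 'dogs'],  // alternation inside a group
    ['/^[a-f]{2,4}$/', 'cafe'],   // range inside brackets plus {min,max}
];

$results = [];
foreach ($checks as [$pattern, $subject]) {
    $matched = preg_match($pattern, $subject) === 1;
    $results[] = $matched;
    printf("%-18s %-12s %s\n", $pattern, $subject, $matched ? 'match' : 'no match');
}
```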

Pattern Modifiers

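  • i Case-insensitive matching;
  • m Multi-line mode: ^ and $ also match at the start and end of each line;
  • S Extra analysis of the pattern, to speed up a pattern that is used many times (ignored as of PHP 7.3);
  • u The pattern and subject strings are treated as UTF-8;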
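A short sketch of three of these modifiers, assuming PHP 7 or later (needed for the "\u{e9}" escape, which produces the UTF-8 bytes of "é"). The S modifier is left out because it only affects performance, not results.

```php
<?php
// i: letters match regardless of case.
$caseInsensitive = preg_match('/^hello$/i', 'HeLLo');

// m: ^ and $ match at every line boundary instead of only at the string's ends.
$text = "first line\nsecond line";
$withoutM = preg_match_all('/^\w+/', $text, $plain);   // only "first"
$withM    = preg_match_all('/^\w+/m', $text, $multi);  // "first" and "second"

// u: "." matches one UTF-8 character rather than one byte.
// "\u{e9}" is "é", which is two bytes long in UTF-8.
$withoutU = preg_match('/^.$/', "\u{e9}");   // 0: two bytes need two "."
$withU    = preg_match('/^.$/u', "\u{e9}");  // 1: a single character

var_dump($caseInsensitive, $withoutM, $withM, $withoutU, $withU);
```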

Regular Expression Matches

  • ab Matches if "ab" is present anywhere in the string;
  • ^ab Matches if the string begins with "ab";
  • ab$ Matches if the string ends with "ab";
  • /ab/i Matches "ab" case-insensitively ("ab", "AB", "aB", etc.);
  • ^ab$ The string is exactly "ab";
  • ab* An "a" followed by zero or more "b", e.g. "a", "ab", "abbb", etc.;
  • ab+ An "a" followed by at least one "b", e.g. "ab", "abbb", etc.;
  • ab? An "a" optionally followed by a "b", i.e. "a" or "ab";
  • a?b+$ An optional "a" followed by one or more "b" at the end of the string;
  • ab{2} An "a" followed by exactly two "b", e.g. "abb" (without anchors it also matches inside "abbb");
  • ab{2,} An "a" followed by at least two "b", e.g. "abb", "abbbb", etc.;
  • ab{3,5} An "a" followed by three to five "b", e.g. "abbb", "abbbb", "abbbbb";
  • ab(cde)* "ab" followed by zero or more repetitions of "cde";
  • ab|cde The string contains "ab" or "cde";
  • a.cde An "a", then any single character, then "cde", e.g. "axcde";
  • ^.{5}$ A string of exactly 5 characters;
  • [abc] The string contains an "a", a "b" or a "c";
  • [a-z] The string contains at least one lowercase letter;
  • [a-z]+ One or more lowercase letters in a row;
  • [a-zA-Z] The string contains at least one lowercase or uppercase letter;
  • (a|bc)de A string containing either "ade" or "bcde";
  • [0-9.-] A digit, a dot or a minus sign;
  • ^[a-zA-Z0-9_]{1,}$ The whole string consists of one or more letters, digits or underscores;
  • [^A-Za-z0-9] Any character that is not a letter or a digit, such as a symbol or a space;
  • ([A-Z]{3}|[0-9]{4}) Three uppercase letters or four digits;
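One way to check the list above is to run some of its patterns against sample strings. The sketch below does that with preg_match(); the subject strings and expected results are illustrative choices. Note that a pattern without ^ and $ can match anywhere in the subject.

```php
<?php
// pattern => [subject => expected match?]
$cases = [
    '/^ab$/'                  => ['ab' => true, 'abc' => false],
    '/ab*/'                   => ['a' => true, 'abbb' => true, 'b' => false],
    '/ab+/'                   => ['a' => false, 'ab' => true],
    '/^ab?$/'                 => ['a' => true, 'ab' => true, 'abb' => false],
    '/ab{2}/'                 => ['ab' => false, 'abb' => true, 'abbb' => true],
    '/^ab{2}$/'               => ['abb' => true, 'abbb' => false],
    '/^.{5}$/'                => ['hello' => true, 'hi' => false],
    '/(a|bc)de/'              => ['ade' => true, 'bcde' => true, 'cde' => false],
    '/[^A-Za-z0-9]/'          => ['abc123' => false, 'abc!' => true],
    '/^([A-Z]{3}|[0-9]{4})$/' => ['ABC' => true, '2011' => true, 'AB1' => false],
];

$failures = 0;
foreach ($cases as $pattern => $subjects) {
    foreach ($subjects as $subject => $expected) {
        // Numeric-string keys such as '2011' become integers, so cast back.
        $subject = (string) $subject;
        $actual = preg_match($pattern, $subject) === 1;
        if ($actual !== $expected) {
            $failures++;
        }
        printf("%-26s %-8s %s\n", $pattern, $subject, $actual ? 'match' : 'no match');
    }
}
echo $failures === 0 ? "All checks passed\n" : "$failures check(s) failed\n";
```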

Let's see some examples:


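<?php

// Simplified patterns: they accept the common forms shown below, but they do
// not implement the full email address (RFC 5322) or domain name syntax.
// For example, REGEX_EMAIL rejects a "+" in the part before the "@".
define('REGEX_EMAIL', '/^\w[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)*@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*\.[a-zA-Z]{2,6}$/');
define('REGEX_DOMAIN', '/^(([a-zA-Z0-9-]+)\.?)([a-zA-Z0-9-]+)\.(([a-zA-Z]{2,6})|([a-zA-Z]{2,3}\.[a-zA-Z]{2}))$/');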

// Returns true if $e matches REGEX_EMAIL.
function isEmail($e) {
  return (bool)preg_match(REGEX_EMAIL, $e);
}

// Returns true if $d matches REGEX_DOMAIN.
function isDomain($d) {
  return (bool)preg_match(REGEX_DOMAIN, $d);
}

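// Wraps every whole-word, case-insensitive occurrence of $word in a <span>.
// preg_quote() escapes regex metacharacters (and the "/" delimiter) in $word.
function highlightWords($str, $word) {
  return preg_replace('/\b(' . preg_quote($word, '/') . ')\b/i', '<span class="bold">$1</span>', $str);
}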

// Accepts only letters, digits, underscores, hyphens and spaces.
function validateAlpha($str) {
  return (bool)preg_match('/^[A-Za-z0-9_ -]+$/', $str);
}

// Valid Email:
var_dump(isEmail('john.doe@domain.tld')); // will output: bool(true)
var_dump(isEmail('john_doe@do.main.tld')); // will output: bool(true)
var_dump(isEmail('john-doe@do-main.tld')); // will output: bool(true)
var_dump(isEmail('john@doe.main.tld')); // will output: bool(true)

// Invalid Email:
var_dump(isEmail('john@doe')); // will output: bool(false)

// Domain
var_dump(isDomain('domain.com')); // will output: bool(true)
var_dump(isDomain('sub.domain.com')); // will output: bool(true)
var_dump(isDomain('domain.co.uk')); // will output: bool(true)
var_dump(isDomain('www.domain.co.uk')); // will output: bool(true)

// Invalid Domain:
var_dump(isDomain('johndoe.')); // will output: bool(false)

// Highlight words
echo highlightWords('Hi John, How are you?', 'John'); // will output: Hi <span class="bold">John</span>, How are you?

// Validate letters, numbers, underscores, hyphens and spaces
var_dump(validateAlpha('abc123')); // will output: bool(true)
var_dump(validateAlpha('0a.8@')); // will output: bool(false)
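Groups can also capture text. A minimal sketch, assuming the input has already passed a check such as isEmail(): preg_match() fills its third argument with the full match followed by each captured group. The splitting pattern here is a hypothetical illustration and does not validate anything on its own.

```php
<?php
// Hypothetical splitting pattern: everything before "@" and everything after it.
$pattern = '/^([^@\s]+)@([^@\s]+)$/';

if (preg_match($pattern, 'john.doe@domain.tld', $matches) === 1) {
    // $matches[0] is the whole match; $matches[1] and $matches[2] are the groups.
    [$full, $user, $domain] = $matches;
    echo "user: $user, domain: $domain\n"; // user: john.doe, domain: domain.tld
}
```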