Regular Expressions Quick Reference
Character Escapes
Character Classes
Quantifiers
Assertions
Usage
Character Table (Unicode Basic Latin)
References

Character Escapes
1. Characters used as operators must be escaped to be matched literally.
<.NET> $ ( ) * + . ? [ \ ] ^ { | }
<JavaScript> $ ( ) * + . ? [ \ ] ^ { | }
2. Some control characters can be used with escapes.
<.NET> \a(\u0007), \b(\u0008), \e(\u001B), \f(\u000C), \n(\u000A), \r(\u000D), \t(\u0009), \v(\u000B)
<JavaScript> \0, [\b], \f, \n, \r, \t, \v
3. General Escapes.
<.NET>
Unicode escapes : \\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]
ASCII characters using octal representation : \\0[0-7]{1,3}
ASCII characters using hexadecimal representation : \\x[0-9a-fA-F][0-9a-fA-F]
ASCII control character escapes : \\c[A-Za-z]
* A backslash followed by a character that is not recognized as an escaped character matches that character.
* The escaped character \b is a special case. In a regular expression, \b denotes a word boundary (between \w and \W characters) except within a [] character class, where \b refers to the backspace character. In a replacement pattern, \b always denotes a backspace.
<JavaScript>
\cX Where X is a letter from A - Z. Matches a control character in a string.
\xhh Matches the character with the code hh (two hexadecimal digits).
\uhhhh Matches the character with code hhhh (four hexadecimal digits).
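
The escape rules above can be sketched in JavaScript as follows. The helper name escapeRegExp is hypothetical (it is not a built-in assumed by this reference); it prefixes each operator character from list 1 with a backslash so that arbitrary text can be used as a literal pattern.

```javascript
// Hypothetical helper: prefix each operator character from list 1 with a
// backslash so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[$()*+.?[\\\]^{|}]/g, "\\$&");
}

// Without escaping, "." matches any character; with escaping, only a literal dot.
const unescapedDot = new RegExp("^1.5$");
const escapedDot = new RegExp("^" + escapeRegExp("1.5") + "$");

const dotResults = [
  unescapedDot.test("1x5"), // true: "." matched "x"
  escapedDot.test("1x5"),   // false: only "1.5" is accepted
  escapedDot.test("1.5"),   // true
];

// \xhh, \uhhhh and \cX all name a character by its code.
const codeResults = [
  /\x41/.test("A"),      // true: U+0041
  /\u0041/.test("A"),    // true: U+0041
  /\cI/.test("\t"),      // true: Ctrl+I is the tab character (U+0009)
  /[\b]/.test("\u0008"), // true: inside [] \b is a backspace
];

console.log(escapeRegExp("a+b"), dotResults, codeResults);
```

Escaping matters most when the pattern text comes from user input, where an unescaped operator would silently change what is matched.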

Character Classes
<.NET>
. Matches any character except \n. If modified by the Singleline option, a period character matches any character.
[aeiou] Matches any single character included in the specified set of characters.
[^aeiou] Matches any single character not in the specified set of characters.
[0-9a-fA-F] Use of a hyphen (-) allows specification of contiguous character ranges.
\p{name} Matches any character in the named character class specified by name. Supported names are Unicode groups and block ranges (for example, Ll, Nd, Z, IsGreek, IsBoxDrawing).
\P{name} Matches text not included in groups and block ranges specified in {name}.
\w Matches any word character. Equivalent to the Unicode character categories [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].
If ECMAScript-compliant behavior is specified with the ECMAScript option, \w is the same as [a-zA-Z_0-9].
\W Matches any nonword character. Equivalent to the Unicode character categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].
If ECMAScript-compliant behavior is specified with the ECMAScript option, \W is the same as [^a-zA-Z_0-9].
\s Matches any white-space character. Equivalent to the Unicode character categories [\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \s is the same as [ \f\n\r\t\v].
\S Matches any non-white-space character. Equivalent to the Unicode character categories [^\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \S is the same as [^ \f\n\r\t\v].
\d Matches any decimal digit. Same as \p{Nd} for Unicode and [0-9] for non-Unicode, ECMAScript behavior.
\D Matches any nondigit character. Same as \P{Nd} for Unicode and [^0-9] for non-Unicode, ECMAScript behavior.
<JavaScript>
. (The decimal point) matches any single character except the newline character.
[xyz] A character set. Matches any one of the enclosed characters. You can specify a range of characters by using a hyphen.
[^xyz] A negated or complemented character set. That is, it matches anything that is not enclosed in the brackets. You can specify a range of characters by using a hyphen.
\w Matches any alphanumeric character including the underscore. Equivalent to [A-Za-z0-9_].
\W Matches any non-word character. Equivalent to [^A-Za-z0-9_].
\s Matches a single white space character, including space, tab, form feed, line feed. Equivalent to [ \f\n\r\t\v\u00A0\u2028\u2029].
\S Matches a single character other than white space. Equivalent to [^ \f\n\r\t\v\u00A0\u2028\u2029].
\d Matches a digit character. Equivalent to [0-9].
\D Matches any non-digit character. Equivalent to [^0-9].
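
As a rough check of the JavaScript equivalences listed above, the sketch below compares each shorthand class with its explicit set over the Basic Latin range. The function and variable names are illustrative only.

```javascript
// Each entry pairs a shorthand class with the explicit set given for it above.
const classPairs = [
  [/\w/, /[A-Za-z0-9_]/],
  [/\W/, /[^A-Za-z0-9_]/],
  [/\d/, /[0-9]/],
  [/\D/, /[^0-9]/],
];

// Compare both forms on every Basic Latin character (U+0000 to U+007F).
function agreeOnBasicLatin(shorthand, explicit) {
  for (let code = 0; code < 128; code++) {
    const ch = String.fromCharCode(code);
    if (shorthand.test(ch) !== explicit.test(ch)) return false;
  }
  return true;
}

const classAgreement = classPairs.map(([a, b]) => agreeOnBasicLatin(a, b));

// Ranges and negation inside [].
const hexDigits = "0x1F".match(/[0-9a-fA-F]+/g);    // ["0", "1F"]
const nonVowels = "regex".replace(/[^aeiou]/g, ""); // "ee"
// "." does not match the newline character.
const dotSkipsNewline = /a.b/.test("a\nb");         // false

console.log(classAgreement, hexDigits, nonVowels, dotSkipsNewline);
```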
<PHP PCRE>
. Outside a character class, a dot in the pattern matches any one character in the subject, including a non-printing character, but not (by default) newline. If the PCRE_DOTALL option is set, then dots match newlines as well.
[] A character class matches a single character in the subject.
If a closing square bracket is required as a member of the class, it should be the first data character in the class (after an initial circumflex, if present) or escaped with a backslash.
If a circumflex is actually required as a member of the class, ensure it is not the first character, or escape it with a backslash.
If a minus character is required in a class, it must be escaped with a backslash or appear in a position where it cannot be interpreted as indicating a range, typically as the first or last character in the class.
The octal or hexadecimal representation of "]" can also be used to end a range.
\d any decimal digit
\D any character that is not a decimal digit
\s any whitespace character
\S any character that is not a whitespace character
\w any word character. A word character is any letter or digit or the underscore character.
\W any non-word character


Quantifiers
<.NET> <JavaScript>
* Specifies zero or more matches. Same as {0,}.
+ Specifies one or more matches. Same as {1,}.
? Specifies zero or one matches. Same as {0,1}.
{n} Specifies exactly n matches.
{n,} Specifies at least n matches.
{n,m} Specifies at least n, but no more than m, matches.
*? Specifies the first match that consumes as few repeats as possible (lazy *).
+? Specifies as few repeats as possible, but at least one (lazy +).
?? Specifies zero repeats if possible, or one (lazy ?).
{n}? Equivalent to {n} (lazy {n}).
{n,}? Specifies as few repeats as possible, but at least n (lazy {n,}).
{n,m}? Specifies as few repeats as possible between n and m (lazy {n,m}).
* Lazy quantifier: If ? is used immediately after any of the quantifiers *, +, ?, or {}, it makes the quantifier non-greedy (matching the minimum number of times), as opposed to the default, which is greedy (matching the maximum number of times).
var re = /a{1,3}(\w)/; var ary = re.exec('aaab'); window.alert(ary[1]); // => "b" (greedy: a{1,3} takes "aaa")
var re = /a{1,3}?(\w)/; var ary = re.exec('aaab'); window.alert(ary[1]); // => "a" (lazy: a{1,3}? takes only "a")
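
A further sketch of the quantifier table, assuming a small hypothetical helper (allMatches) that collects every match of a pattern. It shows a greedy and a lazy quantifier on the same input, the equivalence of * and {0,}, and a bounded {n,m} repeat.

```javascript
// Hypothetical helper: return every match of a pattern (the g flag is added here).
function allMatches(source, text) {
  return text.match(new RegExp(source, "g")) || [];
}

const sample = "<b>bold</b>";

const greedy = allMatches("<.+>", sample); // ["<b>bold</b>"]: .+ takes as much as it can
const lazy = allMatches("<.+?>", sample);  // ["<b>", "</b>"]: .+? stops at the first ">"

// A shorthand quantifier and its brace form give the same matches.
const star = allMatches("ab*", "a ab abbb");      // ["a", "ab", "abbb"]
const braces = allMatches("ab{0,}", "a ab abbb"); // ["a", "ab", "abbb"]

// {n,m} bounds the repeat count: here two to three digits.
const bounded = allMatches("\\d{2,3}", "1 22 333 4444"); // ["22", "333", "444"]

console.log(greedy, lazy, star, braces, bounded);
```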


Assertions
<.NET>
^ Specifies that the match must occur at the beginning of the string or the beginning of the line.
$ Specifies that the match must occur at the end of the string, before \n at the end of the string, or at the end of the line.
\A Specifies that the match must occur at the beginning of the string (ignores the Multiline option).
\Z Specifies that the match must occur at the end of the string or before \n at the end of the string (ignores the Multiline option).
\z Specifies that the match must occur at the end of the string (ignores the Multiline option).
\G Specifies that the match must occur at the point at which the current search started (often, this is one character beyond where the last search ended). For example, consider a concatenated string composed of discrete groups of characters, where each group is n characters in length. When searching for a match within each group of characters, the regular expression is successful when it finds a match at the character positions 0, n, 2n, 3n, and so on. Matches are successful only when they occur on a positional group boundary.
\b Specifies that the match must occur on a boundary between \w (alphanumeric) and \W (nonalphanumeric) characters. The match must occur on word boundaries -- that is, at the first or last characters in words separated by spaces.
\B Specifies that the match must not occur on a \b boundary.
<JavaScript>
^ Matches beginning of input. If the multiline flag is set to true, also matches immediately after a line break character.
$ Matches end of input. If the multiline flag is set to true, also matches immediately before a line break character.
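\b Matches a word boundary: the position between a word character (\w) and a non-word character (\W), or between a word character and the beginning or end of input. For example, /\bm/ matches the 'm' in "moon".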
\B Matches a non-word boundary. For example, /\w\Bn/ matches 'on' in "noonday", and /y\B\w/ matches 'ye' in "possibly yesterday."
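
The JavaScript assertions above can be tried out with a sketch like the following. The sample strings and variable names are illustrative; the last two lines repeat the \B examples from the table.

```javascript
const lines = "first line\nsecond line";

// Without the m flag, ^ and $ anchor only at the start and end of the whole input.
const wholeInput = lines.match(/^\w+/g);     // ["first"]
// With the m flag, they also anchor after and before each line break.
const eachLineStart = lines.match(/^\w+/gm); // ["first", "second"]
const eachLineEnd = lines.match(/\w+$/gm);   // ["line", "line"]

// \b is zero-width: it matches a position, so no characters are consumed.
const wholeWord = /\bcat\b/.test("a cat sat");       // true
const insideWord = /\bcat\b/.test("concatenate");    // false
const boundaryWidth = "a cat".match(/\b/)[0].length; // 0
// \B requires the absence of a boundary, so it only matches inside a word.
const embedded = /\Bcat\B/.test("concatenate");      // true

const noonday = "noonday".match(/\w\Bn/)[0];             // "on"
const yesterday = "possibly yesterday.".match(/y\B\w/)[0]; // "ye"

console.log(wholeInput, eachLineStart, eachLineEnd, wholeWord, insideWord, embedded);
```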
<PHP PCRE>
^ In the default matching mode, the circumflex character is an assertion which is true only if the current matching point is at the start of the subject string.
$ A dollar character is an assertion which is TRUE only if the current matching point is at the end of the subject string, or immediately before a newline character that is the last character in the string (by default).
\A start of subject (independent of multiline mode)
\Z end of subject or newline at end (independent of multiline mode)
\z end of subject (independent of multiline mode)
\b word boundary
\B not a word boundary
* These assertions have no special meaning in a character class.


Usage
<JavaScript>
1. Creating a regular expression
literal format: aRegexp = /pattern/flags;
construction function: aRegexp = new RegExp("pattern"[, "flags"]);

2. Methods that use regular expressions
aRegexp.exec(aString);
aRegexp.test(aString);
aString.match(aRegexp);
aString.replace(aRegexp[, aNewSubStr]);
aString.replace(aRegexp[, aFunction]);
aString.search(aRegexp);
aString.split([aRegexp[, limit]]);
* When using the constructor function, the normal string escape rules apply (special characters in a string are preceded with \):
re = new RegExp("\\w+") <=> re = /\w+/
When using the literal format, the forward slash must be escaped:
re = new RegExp("a/b") <=> re = /a\/b/
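
The methods listed above can be exercised together in a short sketch. The date pattern and sample text are made up for illustration and are not part of the original reference.

```javascript
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
const sampleText = "Released 2004-03-15, updated 2005-11-02.";

const execResult = datePattern.exec(sampleText);  // match array: [full, year, month, day]
const found = datePattern.test(sampleText);       // true
const position = sampleText.search(datePattern);  // 9, index of the first match
const allDates = sampleText.match(/\d{4}-\d{2}-\d{2}/g); // both dates

// $1..$3 in the replacement string refer to the captured groups.
const swapped = sampleText.replace(datePattern, "$3/$2/$1");
// A function replacement receives the match followed by its groups.
const yearsOnly = sampleText.replace(/(\d{4})-\d{2}-\d{2}/g, (match, year) => year);

const fields = "a, b;c".split(/[,;]\s*/);      // ["a", "b", "c"]
const firstTwo = "a, b;c".split(/[,;]\s*/, 2); // ["a", "b"]

// Constructor and literal forms of the same pattern behave alike.
const fromString = new RegExp("\\w+/\\d+");
const fromLiteral = /\w+\/\d+/;
const bothMatch = fromString.test("abc/123") && fromLiteral.test("abc/123");

console.log(execResult[1], found, position, allDates, swapped, yearsOnly, fields, firstTwo, bothMatch);
```

Note that exec and test keep their position in lastIndex when the pattern has the g flag, which is why the patterns passed to them here omit it.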
 
<VBScript>
1. Creating a regular expression
Dim aRegexp
Set aRegexp = New RegExp
aRegexp.Pattern = pattern
aRegexp.IgnoreCase = True (or False)
aRegexp.Global = True (or False)
2. Methods that use regular expressions
aRegexp.Execute(aString)
aRegexp.Replace(aString1, aString2)
aRegexp.Test(aString)
 
<PHP PCRE>
array preg_grep ( string pattern, array input [, int flags] )
int preg_match_all ( string pattern, string subject, array &matches [, int flags [, int offset]] )
mixed preg_match ( string pattern, string subject [, array &matches [, int flags [, int offset]]] )
string preg_quote ( string str [, string delimiter] )
mixed preg_replace_callback ( mixed pattern, callback callback, mixed subject [, int limit] )
mixed preg_replace ( mixed pattern, mixed replacement, mixed subject [, int limit] )
array preg_split ( string pattern, string subject [, int limit [, int flags]] )

Character Table (Unicode Basic Latin)


[Table not reproduced. Legend: characters used as operators (should be escaped); control characters that can be used with escapes; Unicode character category.]


References
.NET : Framework General Reference Regular Expression Language Elements
Java : Regular Expressions and the Java™ Programming Language
JavaScript : JavaScript 1.5 Guide, Chapter 4, Regular Expressions
JavaScript : JavaScript 1.5 Reference, RegExp
Perl : perlre - Perl regular expressions
PHP : Regular Expression Functions (Perl-Compatible)
Unicode : Unicode Regular Expression Guidelines
VBScript : Visual Basic Scripting Edition, Introduction to Regular Expressions