A regular expression is a pattern that describes which strings or characters to match in some specified text. For example, with regular expressions it is just as easy to find all forms of the acronym fubar in a text file as it is to find all links in an HTML document that have the property target="...". Regular expression tools are the Swiss Army knives of text searching.
Regular expressions are case sensitive by default. Therefore, it is important to address both upper- and lower-case matching if the capitalization of the input string is unknown, either with character sets such as [fF] or with a case-insensitive option such as RegexOptions.IgnoreCase in .NET. Here's my own super-simplistic guide for using regular expressions...
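To make the case-sensitivity point concrete, here is a minimal C# sketch. The class and method names (CaseDemo, CountFubar, CountFubarWithSets) and the sample text are made up for illustration; only Regex.Matches and RegexOptions.IgnoreCase come from .NET itself.

```csharp
using System;
using System.Text.RegularExpressions;

static class CaseDemo
{
    // Counts occurrences of the literal pattern "fubar" in text.
    // ignoreCase switches RegexOptions.IgnoreCase on or off.
    public static int CountFubar(string text, bool ignoreCase)
    {
        RegexOptions options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
        return Regex.Matches(text, "fubar", options).Count;
    }

    // Same idea, but spelling out both cases with character sets.
    public static int CountFubarWithSets(string text)
    {
        return Regex.Matches(text, "[fF][uU][bB][aA][rR]").Count;
    }

    static void Main()
    {
        string text = "fubar, Fubar and FUBAR"; // hypothetical sample input
        Console.WriteLine(CountFubar(text, false));  // 1
        Console.WriteLine(CountFubar(text, true));   // 3
        Console.WriteLine(CountFubarWithSets(text)); // 3
    }
}
```

The option is usually easier to read than the character sets, but the sets work in any engine and need nothing beyond the basic construction described below.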
Basic Construction:
(...) Logical group
{...} Explicit quantifier
[...] Explicit char set
These help order and combine matches. Example: In the string "abcABC123aBcabc", ([aA][bB]c){2} will match "aBcabc" because the brackets [] treat their contents as an "OR" condition (a OR A, useful for single characters), the parentheses () group the three-character sequence together, and the trailing braces with a number {2} require the preceding group in parentheses () to occur 2 times in a row. The explicit character set brackets [] may also be used to specify ranges. Example: [A-Za-z0-9] will match any single alphanumeric character, both lower and upper case.
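The example above can be checked with a short C# sketch. GroupDemo and FirstMatch are hypothetical helper names; the patterns and the input string are the ones from the text, plus [0-9]{3} as an extra range example.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupDemo
{
    // Returns the text of the first match, or "<no match>" if there is none.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : "<no match>";
    }

    static void Main()
    {
        string input = "abcABC123aBcabc";
        // Two consecutive repetitions of (a or A)(b or B)c.
        Console.WriteLine(FirstMatch(input, "([aA][bB]c){2}")); // aBcabc
        // A range inside a character set, repeated exactly three times.
        Console.WriteLine(FirstMatch(input, "[0-9]{3}"));       // 123
    }
}
```

Note that the leading "abcABC" is not a match: "ABC" fails on the lowercase c, so the two repetitions are first found at "aBcabc".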
^ Start of string
$ End of string
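(^...$) These anchor the match to the start or end of a string. Example: In the string "abcxyz", (^xyz) will match nothing because the string does not start with "xyz", (xyz) will match just "xyz", and (xyz$) will also match just "xyz", but only because it sits at the end of the string. To match the entire string, anchor both ends: (^abcxyz$).
? 0 or 1 instances only (greedy: 1 instance is tried first; written after another quantifier, as in *? or +?, it requests minimal matching)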
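A small C# sketch of the anchors, reporting both the matched text and where it was found. AnchorDemo and Describe are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchorDemo
{
    // Returns "value@index" for the first match, or "no match".
    public static string Describe(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? $"{m.Value}@{m.Index}" : "no match";
    }

    static void Main()
    {
        string input = "abcxyz";
        Console.WriteLine(Describe(input, "^xyz"));     // no match
        Console.WriteLine(Describe(input, "xyz"));      // xyz@3
        Console.WriteLine(Describe(input, "xyz$"));     // xyz@3
        Console.WriteLine(Describe(input, "^abcxyz$")); // abcxyz@0
    }
}
```

The anchors themselves consume no characters; they only restrict where a match is allowed to start or end.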
* 0 or more instances
+ 1 or more instances
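These, as with the other operators below, always apply to the preceding character or group only. Examples:
(abc?xyz) will match "abxyz" and "abcxyz", but will not match "axyz", "abccxyz", etc., because the ? applies to the "c" alone.
(abc*xyz) will match "abxyz", "abcxyz", "abcccxyz", etc., but will not match "xyz" or "aaaxyz".
(abc+xyz) will match "abcxyz", "abccxyz", etc., but will not match "abxyz", "abbcxyz", etc.
To make a whole sequence optional or repeatable, group it first: ((abc)?xyz) will match both "xyz" and "abcxyz".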
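Because it is easy to misjudge what a quantifier attaches to, here is a C# sketch that tests ?, * and + against a few inputs. QuantifierDemo and Test are hypothetical names; Regex.IsMatch reports whether the pattern is found anywhere in the input.

```csharp
using System;
using System.Text.RegularExpressions;

static class QuantifierDemo
{
    // True if pattern is found anywhere in input.
    public static bool Test(string pattern, string input)
    {
        return Regex.IsMatch(input, pattern);
    }

    static void Main()
    {
        // ? : the "c" may appear zero or one time.
        Console.WriteLine(Test("abc?xyz", "abxyz"));    // True
        Console.WriteLine(Test("abc?xyz", "abccxyz"));  // False
        // * : the "c" may appear any number of times, including none.
        Console.WriteLine(Test("abc*xyz", "abcccxyz")); // True
        Console.WriteLine(Test("abc*xyz", "xyz"));      // False
        // + : the "c" must appear at least once.
        Console.WriteLine(Test("abc+xyz", "abccxyz"));  // True
        Console.WriteLine(Test("abc+xyz", "abxyz"));    // False
        // Grouping makes the quantifier apply to the whole sequence.
        Console.WriteLine(Test("^(abc)?xyz$", "xyz"));  // True
    }
}
```

In every ungrouped case only the "c" is quantified; the "ab" in front of it is always required.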
. Any character (except the line break \n; in .NET, \r is matched, and RegexOptions.Singleline makes . match \n as well)
| Alternation (the pipe sign is used for logical OR)
\ Escape notation to specify a character class or a literal special character (ex: \s, \w, \. etc.)
Examples:
(ab.xyz) will match "abcxyz", "abgxyz", "abzxyz", etc., but will not match "abccxyz", etc.
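abc(xy|xyz) will match "abcxy" or "abcxyz", but will not match "abczyx", etc. Alternatives are tried left to right, so in "abcxyz" the matched text is "abcxy"; write abc(xyz|xy) to prefer the longer form.
abc\sxyz will match "abc xyz" because \s is the notation for any whitespace character (space, tab, line break).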
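The three operators above can be tried out with the following C# sketch. DotAltDemo and FirstMatch are hypothetical names; the verbatim string @"..." is used so that the backslash in \s reaches the regex engine unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

static class DotAltDemo
{
    // Returns the text of the first match, or "<no match>" if there is none.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : "<no match>";
    }

    static void Main()
    {
        // The dot stands for exactly one character.
        Console.WriteLine(FirstMatch("abgxyz", "ab.xyz"));       // abgxyz
        Console.WriteLine(FirstMatch("abccxyz", "ab.xyz"));      // <no match>
        // Alternation takes the first alternative that works.
        Console.WriteLine(FirstMatch("abcxyz", "abc(xy|xyz)"));  // abcxy
        Console.WriteLine(FirstMatch("abcxyz", "abc(xyz|xy)"));  // abcxyz
        // \s accepts a tab as well as a space.
        Console.WriteLine(Regex.IsMatch("abc\txyz", @"abc\sxyz")); // True
    }
}
```

The order of alternatives only matters when one is a prefix of another, as with xy and xyz here.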
Declaring RegEx in C# and VB.NET:
(using basic construction methods from above -- not an accurate e-mail filter)
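C# Examples:
using System.Text.RegularExpressions;

// C# joins strings with + and combines options with |.
// IgnorePatternWhitespace keeps the layout spaces from being matched literally.
Regex reEmail = new Regex(
    "(^ (?# Match from start of the string) " +
    "(.){2} (?# Find two characters) " +
    ".* (?# Find zero or more characters) " +
    "(@) (?# Find a @ symbol)" +
    ".+ (?# Find one or more characters)" +
    // Extra parentheses keep the | from splitting the whole pattern in two.
    "((.*)|([.]*)) (?# Find zero or more characters, or zero or more dots)" +
    "[.] (?# Find exactly one dot)" +
    "(.){2} (?# Find two characters)" +
    ".? (?# Find zero or one characters)" +
    "$) (?# Match all the way to end of the string)",
    RegexOptions.Singleline | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

// A verbatim string (@"...") is needed so the backslashes are not read as C# escapes.
// [hH] replaces (h|H?), which also matched a missing letter; www\. matches a literal dot.
Regex reWebsite = new Regex(
    @"^(([hH][tT][tT][pP][sS]?)\://)?(www\.|[a-zA-Z0-9].)[a-zA-Z0-9\-\.]+\.[a-zA-Z]*$",
    RegexOptions.Singleline);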
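The e-mail declaration lays its pattern out with spaces and (?# ...) comments. As far as I know, .NET skips (?# ...) comments in any mode, but it only ignores the spaces when RegexOptions.IgnorePatternWhitespace is set. The sketch below shows the difference on a tiny made-up pattern; LayoutDemo, Pattern and IsMatch are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class LayoutDemo
{
    // Hypothetical pattern: "a", an inline comment, then "b",
    // with spaces that are meant only as layout.
    const string Pattern = "^a (?# first letter) b (?# second letter)$";

    // Matches input against Pattern, with or without IgnorePatternWhitespace.
    public static bool IsMatch(string input, bool ignoreWhitespace)
    {
        RegexOptions options = ignoreWhitespace
            ? RegexOptions.IgnorePatternWhitespace
            : RegexOptions.None;
        return Regex.IsMatch(input, Pattern, options);
    }

    static void Main()
    {
        // With the option, the spaces in the pattern are ignored.
        Console.WriteLine(IsMatch("ab", true));  // True
        // Without it, the spaces must appear in the input.
        Console.WriteLine(IsMatch("ab", false)); // False
    }
}
```

If a commented pattern silently refuses to match, a missing IgnorePatternWhitespace option is one of the first things worth checking.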
VB.NET Examples:
Imports System.Text.RegularExpressions

' IgnorePatternWhitespace keeps the layout spaces from being matched literally.
' The extra parentheses around the alternation keep the | from splitting the whole pattern in two.
Dim reEmail As New Regex( _
    "(^ (?# Match from start of the string) " & _
    "(.){2} (?# Find two characters) " & _
    ".* (?# Find zero or more characters) " & _
    "(@) (?# Find a @ symbol)" & _
    ".+ (?# Find one or more characters)" & _
    "((.*)|([.]*)) (?# Find zero or more characters, or zero or more dots)" & _
    "[.] (?# Find exactly one dot)" & _
    "(.){2} (?# Find two characters)" & _
    ".? (?# Find zero or one characters)" & _
    "$) (?# Match all the way to end of the string)", _
    RegexOptions.Singleline Or RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace)

' [hH] replaces (h|H?), which also matched a missing letter; www\. matches a literal dot.
Dim reWebsite As New Regex( _
    "^(([hH][tT][tT][pP][sS]?)\://)?(www\.|[a-zA-Z0-9].)[a-zA-Z0-9\-\.]+\.[a-zA-Z]*$", _
    RegexOptions.Singleline)
I'd like to thank the authors for their work on the following regular expression articles, guides, and tools. These are excellent web references for regular expressions.