Regular Expression Character Classes

Instead of matching one specific literal or escaped character, a character class defines a set of characters, any one of which can match a character in the input string.

A symbol is used to represent a set of characters, e.g. "." = any character.
Escaped character classes represent a group of specific characters, e.g. \p{unicode name}, \P{unicode name}, \w, \W, \s, \S, \d, \D.
A pair of square brackets, [], is used to specify a set of characters, e.g. [abc] = a or b or c.
- A hyphen character, -, is used as a range separator unless it is the first or last character of the group, e.g. [a-c] = a or b or c.
- A leading caret character, ^, negates the group so that it matches any character not in the set, e.g. [^abc] = any character except a, b, and c.
- A hyphen character, -, followed by a nested group excludes that nested group from the base group, e.g. [a-c-[b]] = a or c.

.NET Any Character, .

The period character, ., matches any single character, including the carriage return character (\r or \u000D), except the newline character (\n or \u000A). Inside a character group, however, a period is treated as a literal period character.

Character Class	Description	Exception
.	Wildcard: Matches any single character except \n. To match a literal period character (. or \u002E), you must precede it with the escape character (\.).	In a character class, a period, ., is treated as a literal period character.
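
The behavior of the period described above can be checked with a small console sketch. The module name DotDemo, the helper MatchValues, and the sample input are made up for illustration; RegexOptions.Singleline is not covered in this article and is shown only for contrast.

```vb
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module DotDemo
    ' Returns the values of all matches, with CR and LF made visible as \r and \n.
    Function MatchValues(input As String, pattern As String, options As RegexOptions) As String()
        Dim matches As MatchCollection = Regex.Matches(input, pattern, options)
        Dim values(matches.Count - 1) As String
        For i As Integer = 0 To matches.Count - 1
            values(i) = matches(i).Value.Replace(vbCr, "\r").Replace(vbLf, "\n")
        Next
        Return values
    End Function

    Sub Main()
        Dim input As String = "a.b" & vbCr & vbLf & "c"
        ' "." matches \r but stops at \n, so the first match keeps the trailing \r.
        Console.WriteLine(String.Join(" | ", MatchValues(input, ".+", RegexOptions.None)))
        ' With RegexOptions.Singleline, "." also matches \n: one match for the whole input.
        Console.WriteLine(String.Join(" | ", MatchValues(input, ".+", RegexOptions.Singleline)))
        ' Inside square brackets the period is a literal period.
        Console.WriteLine(String.Join(" | ", MatchValues(input, "[.]", RegexOptions.None)))
    End Sub
End Module
```

The first line prints two matches, a.b\r and c, which shows that the carriage return is consumed by the period while the newline is not.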

Escaped Character Class \

In a regular expression, the backslash character, \, followed by a designated letter indicates one of the following character classes.

.NET Unicode category or Unicode block: \p{name}

The backslash character, \, followed by the character p and a name in braces matches any single character that belongs to the named Unicode general category or named block. The name is either a category abbreviation (for example, Ll) or a named block name (for example, IsBasicLatin).

.NET Negative Unicode category or Unicode block: \P{name}

The backslash character, \, followed by the character P and a name in braces matches any single character that does not belong to the named Unicode general category or named block. The name is either a category abbreviation or a named block name.
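
As a rough illustration of \p{name} and \P{name}, the sketch below counts matches in a short mixed string. The module name UnicodeCategoryDemo, the helper CountMatches, and the input are hypothetical and chosen only for this example.

```vb
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module UnicodeCategoryDemo
    ' Counts how many single-character matches the pattern finds in the input.
    Function CountMatches(input As String, pattern As String) As Integer
        Return Regex.Matches(input, pattern).Count
    End Function

    Sub Main()
        ' Input: Latin capital A, Greek small alpha (U+03B1), digit 1, space, Latin small b.
        Dim input As String = "A" & ChrW(&H3B1) & "1 b"
        Console.WriteLine(CountMatches(input, "\p{Lu}"))      ' 1: A
        Console.WriteLine(CountMatches(input, "\p{Ll}"))      ' 2: alpha and b
        Console.WriteLine(CountMatches(input, "\P{Ll}"))      ' 3: A, 1, and the space
        Console.WriteLine(CountMatches(input, "\p{IsGreek}")) ' 1: alpha (block 0370 - 03FF)
    End Sub
End Module
```

Note that \p{Ll} and \P{Ll} together account for all five characters, since each character either is or is not in the category.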

.NET Word Character: \w

The backslash character, \, followed by the character w matches any word character. By default, the word characters are the members of certain Unicode categories: \w is equivalent to [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Lm}\p{Mn}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified, \w is equivalent to [a-zA-Z_0-9].

.NET Non-word Character: \W

The backslash character, \, followed by the character W matches any non-word character, that is, any character that is not in the set of word characters. By default, the word characters are the members of certain Unicode categories: \W is equivalent to [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Lm}\p{Mn}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified, \W is equivalent to [^a-zA-Z_0-9].
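
The difference between the default and the ECMAScript-compliant meaning of \w and \W shows up as soon as the input contains a non-ASCII letter. The following is a minimal sketch; the module name WordCharDemo, the helper JoinMatches, and the input are assumptions made for this example.

```vb
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module WordCharDemo
    ' Joins the values of all matches of the pattern with "|".
    Function JoinMatches(input As String, pattern As String, options As RegexOptions) As String
        Dim values As New List(Of String)
        For Each m As Match In Regex.Matches(input, pattern, options)
            values.Add(m.Value)
        Next
        Return String.Join("|", values)
    End Function

    Sub Main()
        ' "na", i with diaeresis (U+00EF, category Ll), "ve_1", space, "x"
        Dim input As String = "na" & ChrW(&HEF) & "ve_1 x"
        ' Default: U+00EF is a word character, so the first word stays in one piece.
        Console.WriteLine(JoinMatches(input, "\w+", RegexOptions.None))
        ' ECMAScript: only [a-zA-Z_0-9] count, so the word is split at U+00EF.
        Console.WriteLine(JoinMatches(input, "\w+", RegexOptions.ECMAScript))
        ' ECMAScript \W therefore matches U+00EF as well as the space.
        Console.WriteLine(JoinMatches(input, "\W", RegexOptions.ECMAScript))
    End Sub
End Module
```

With the default options the first call yields two matches; with RegexOptions.ECMAScript it yields three: na, ve_1, and x.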

.NET Whitespace Character: \s

The backslash character, \, followed by the character s matches any whitespace character. By default, the whitespace characters are a set of escape sequences plus the Unicode separator categories: \s is equivalent to [\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified, \s is equivalent to [ \f\n\r\t\v].

.NET Non-whitespace Character: \S

The backslash character, \, followed by the character S matches any non-whitespace character, that is, any character that is not in the set of whitespace characters. By default, the whitespace characters are a set of escape sequences plus the Unicode separator categories: \S is equivalent to [^\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified, \S is equivalent to [^ \f\n\r\t\v].
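
To see the effect of \p{Z} in the default definition of \s, the sketch below uses a no-break space (U+00A0, category Zs). The module name WhitespaceDemo, the helper CountMatches, and the input are hypothetical.

```vb
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module WhitespaceDemo
    ' Counts the matches of the pattern in the input under the given options.
    Function CountMatches(input As String, pattern As String, options As RegexOptions) As Integer
        Return Regex.Matches(input, pattern, options).Count
    End Function

    Sub Main()
        ' "a", tab, "b", no-break space (U+00A0), "c"
        Dim input As String = "a" & vbTab & "b" & ChrW(&HA0) & "c"
        Console.WriteLine(CountMatches(input, "\s", RegexOptions.None))       ' 2: tab and no-break space
        Console.WriteLine(CountMatches(input, "\s", RegexOptions.ECMAScript)) ' 1: tab only
        Console.WriteLine(CountMatches(input, "\S", RegexOptions.None))       ' 3: a, b, c
    End Sub
End Module
```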

.NET Decimal Digit Character: \d

The backslash character, \, followed by the character d matches any decimal digit character. By default, the decimal digit characters are the members of the Unicode category Nd: \d is equivalent to \p{Nd}. If ECMAScript-compliant behavior is specified, \d is equivalent to [0-9].

.NET Non-decimal Digit Character: \D

The backslash character, \, followed by the character D matches any character that is not a decimal digit. By default, the decimal digit characters are the members of the Unicode category Nd: \D is equivalent to \P{Nd}. If ECMAScript-compliant behavior is specified, \D is equivalent to [^0-9].
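
Because \d covers every character in category Nd by default, a pattern meant to accept only 0-9 may accept more than expected. A minimal sketch, assuming a hypothetical helper IsAllDigits in a module named DigitDemo; the anchors ^ and $ are not covered in this article and are used only to test the whole string.

```vb
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module DigitDemo
    ' True when the whole input consists of characters matched by \d.
    Function IsAllDigits(input As String, options As RegexOptions) As Boolean
        Return Regex.IsMatch(input, "^\d+$", options)
    End Function

    Sub Main()
        ' "4" followed by ARABIC-INDIC DIGIT TWO (U+0662), which is in category Nd.
        Dim input As String = "4" & ChrW(&H662)
        Console.WriteLine(IsAllDigits(input, RegexOptions.None))       ' True
        Console.WriteLine(IsAllDigits(input, RegexOptions.ECMAScript)) ' False
        Console.WriteLine(IsAllDigits("42", RegexOptions.ECMAScript))  ' True
    End Sub
End Module
```

When only ASCII digits should be accepted, either RegexOptions.ECMAScript or the explicit group [0-9] avoids the wider default.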

Character Group, []

.NET Positive Character Group, []

A pair of square brackets specifies a set of characters, any one of which can match a character in the input string. The characters may be listed individually, as a range, or both.

[*character_group*]: character_group can consist of any combination of one or more literal characters, escape characters, or character classes.
[*firstCharacter*-*lastCharacter*]: firstCharacter is the character that begins the range and lastCharacter is the character that ends the range. Two characters are contiguous if they have adjacent Unicode code points. firstCharacter must be the character with the lower code point, and lastCharacter must be the character with the higher code point. A hyphen character, -, is always interpreted as the range separator unless it is the first or last character of the group.
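
The role of the hyphen depends on its position in the group, which the sketch below tries to make visible. The module name PositiveGroupDemo, the helper JoinMatches, and the input are made up for illustration.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module PositiveGroupDemo
    ' Joins the values of all matches of the pattern with "|".
    Function JoinMatches(input As String, pattern As String) As String
        Dim values As New List(Of String)
        For Each m As Match In Regex.Matches(input, pattern)
            values.Add(m.Value)
        Next
        Return String.Join("|", values)
    End Function

    Sub Main()
        Dim input As String = "a-c b"
        ' Hyphen between two characters: the range a through c.
        Console.WriteLine(JoinMatches(input, "[a-c]")) ' a|c|b
        ' Hyphen first or last in the group: a literal hyphen.
        Console.WriteLine(JoinMatches(input, "[-ac]")) ' a|-|c
        Console.WriteLine(JoinMatches(input, "[ac-]")) ' a|-|c
    End Sub
End Module
```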

.NET Negative Character Group, [^]

A pair of square brackets with a leading caret specifies a set of characters that must not appear at that position in the input string: the group matches any single character that is not in the set. The characters may be listed individually, as a range, or both.

[^*character_group*]: the leading caret, ^, indicates a negative character group. character_group can consist of any combination of one or more literal characters, escape characters, or character classes that cannot appear in an input string.
[^*firstCharacter*-*lastCharacter*]: the leading caret, ^, indicates a negative character group. firstCharacter is the character that begins the range and lastCharacter is the character that ends the range. Two characters are contiguous if they have adjacent Unicode code points. firstCharacter must be the character with the lower code point, and lastCharacter must be the character with the higher code point. A hyphen character, -, is always interpreted as the range separator unless it is the first or last character of the group.

A negative character group in a larger regular expression pattern is not a zero-width assertion. That is, after evaluating the negative character group, the regular expression engine advances one character in the input string.
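
That a negative character group consumes a character can be seen when nothing follows in the input. The following sketch is only illustrative: the module name NegativeGroupDemo and the helper FirstMatch are hypothetical, and the negative lookahead (?!u) is not covered in this article and is shown purely as a zero-width contrast.

```vb
Imports System
Imports System.Text.RegularExpressions

Module NegativeGroupDemo
    ' Returns the value of the first match, or an empty string when there is none.
    Function FirstMatch(input As String, pattern As String) As String
        Return Regex.Match(input, pattern).Value
    End Function

    Sub Main()
        ' [^u] must match one character, so a q at the very end of the input fails.
        Console.WriteLine(Regex.IsMatch("Iraq", "q[^u]")) ' False
        ' When a character other than u follows, it becomes part of the match.
        Console.WriteLine(FirstMatch("Iraqi", "q[^u]"))   ' qi
        ' A negative lookahead is zero-width: it only checks and consumes nothing.
        Console.WriteLine(FirstMatch("Iraq", "q(?!u)"))   ' q
    End Sub
End Module
```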

.NET Character Subtraction Group, [base_group-[excluded_group]]

A character subtraction group specifies a set of characters by subtraction: the resulting set contains the characters of a base character group, minus the characters of an excluded character group. Any one of the remaining characters can match a character in the input string.

The form of a character subtraction group is [base_group-[excluded_group]]. The hyphen, -, indicates that the nested group that follows is the excluded_group.
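
A few subtraction groups in action, as a minimal sketch. The module name SubtractionDemo, the helper JoinMatches, and the sample inputs are assumptions made for this example.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module SubtractionDemo
    ' Joins the values of all matches of the pattern with "|".
    Function JoinMatches(input As String, pattern As String) As String
        Dim values As New List(Of String)
        For Each m As Match In Regex.Matches(input, pattern)
            values.Add(m.Value)
        Next
        Return String.Join("|", values)
    End Function

    Sub Main()
        ' a through c, minus b.
        Console.WriteLine(JoinMatches("abc", "[a-c-[b]]"))            ' a|c
        ' Lowercase ASCII letters, minus the vowels.
        Console.WriteLine(JoinMatches("regex", "[a-z-[aeiou]]"))      ' r|g|x
        ' ASCII digits, minus the odd digits.
        Console.WriteLine(JoinMatches("0123456789", "[0-9-[13579]]")) ' 0|2|4|6|8
    End Sub
End Module
```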

Supported Unicode

Supported Unicode general categories and named blocks for \p{name}. A ✔ marks a category that is included in \w, \s, or \d.

Category	Description	\w	\s	\d
Lu	Letter, Uppercase	✔
Ll	Letter, Lowercase	✔
Lt	Letter, Titlecase	✔
Lm	Letter, Modifier	✔
Lo	Letter, Other	✔
L	All letter characters. This includes the Lu, Ll, Lt, Lm, and Lo characters.
Mn	Mark, Nonspacing	✔
Mc	Mark, Spacing Combining
Me	Mark, Enclosing
M	All diacritic marks. This includes the Mn, Mc, and Me categories.
Nd	Number, Decimal Digit	✔		✔
Nl	Number, Letter
No	Number, Other
N	All numbers. This includes the Nd, Nl, and No categories.
Pc	Punctuation, Connector	✔
Pd	Punctuation, Dash
Ps	Punctuation, Open
Pe	Punctuation, Close
Pi	Punctuation, Initial quote (may behave like Ps or Pe depending on usage)
Pf	Punctuation, Final quote (may behave like Ps or Pe depending on usage)
Po	Punctuation, Other
P	All punctuation characters. This includes the Pc, Pd, Ps, Pe, Pi, Pf, and Po categories.
Sm	Symbol, Math
Sc	Symbol, Currency
Sk	Symbol, Modifier
So	Symbol, Other
S	All symbols. This includes the Sm, Sc, Sk, and So categories.
Zs	Separator, Space
Zl	Separator, Line
Zp	Separator, Paragraph
Z	All separator characters. This includes the Zs, Zl, and Zp categories.		✔
Cc	Other, Control
Cf	Other, Format
Cs	Other, Surrogate
Co	Other, Private Use
Cn	Other, Not Assigned (no characters have this property)
C	All other characters. This includes the Cc, Cf, Cs, Co, and Cn categories.

Block name	Code point range
IsBasicLatin	0000 - 007F
IsLatin-1Supplement	0080 - 00FF
IsLatinExtended-A	0100 - 017F
IsLatinExtended-B	0180 - 024F
IsIPAExtensions	0250 - 02AF
IsSpacingModifierLetters	02B0 - 02FF
IsCombiningDiacriticalMarks	0300 - 036F
IsGreek or IsGreekandCoptic	0370 - 03FF
IsCyrillic	0400 - 04FF
IsCyrillicSupplement	0500 - 052F
IsArmenian	0530 - 058F
IsHebrew	0590 - 05FF
IsArabic	0600 - 06FF
IsSyriac	0700 - 074F
IsThaana	0780 - 07BF
IsDevanagari	0900 - 097F
IsBengali	0980 - 09FF
IsGurmukhi	0A00 - 0A7F
IsGujarati	0A80 - 0AFF
IsOriya	0B00 - 0B7F
IsTamil	0B80 - 0BFF
IsTelugu	0C00 - 0C7F
IsKannada	0C80 - 0CFF
IsMalayalam	0D00 - 0D7F
IsSinhala	0D80 - 0DFF
IsThai	0E00 - 0E7F
IsLao	0E80 - 0EFF
IsTibetan	0F00 - 0FFF
IsMyanmar	1000 - 109F
IsGeorgian	10A0 - 10FF
IsHangulJamo	1100 - 11FF
IsEthiopic	1200 - 137F
IsCherokee	13A0 - 13FF
IsUnifiedCanadianAboriginalSyllabics	1400 - 167F
IsOgham	1680 - 169F
IsRunic	16A0 - 16FF
IsTagalog	1700 - 171F
IsHanunoo	1720 - 173F
IsBuhid	1740 - 175F
IsTagbanwa	1760 - 177F
IsKhmer	1780 - 17FF
IsMongolian	1800 - 18AF
IsLimbu	1900 - 194F
IsTaiLe	1950 - 197F
IsKhmerSymbols	19E0 - 19FF
IsPhoneticExtensions	1D00 - 1D7F
IsLatinExtendedAdditional	1E00 - 1EFF
IsGreekExtended	1F00 - 1FFF
IsGeneralPunctuation	2000 - 206F
IsSuperscriptsandSubscripts	2070 - 209F
IsCurrencySymbols	20A0 - 20CF
IsCombiningDiacriticalMarksforSymbols or IsCombiningMarksforSymbols	20D0 - 20FF
IsLetterlikeSymbols	2100 - 214F
IsNumberForms	2150 - 218F
IsArrows	2190 - 21FF
IsMathematicalOperators	2200 - 22FF
IsMiscellaneousTechnical	2300 - 23FF
IsControlPictures	2400 - 243F
IsOpticalCharacterRecognition	2440 - 245F
IsEnclosedAlphanumerics	2460 - 24FF
IsBoxDrawing	2500 - 257F
IsBlockElements	2580 - 259F
IsGeometricShapes	25A0 - 25FF
IsMiscellaneousSymbols	2600 - 26FF
IsDingbats	2700 - 27BF
IsMiscellaneousMathematicalSymbols-A	27C0 - 27EF
IsSupplementalArrows-A	27F0 - 27FF
IsBraillePatterns	2800 - 28FF
IsSupplementalArrows-B	2900 - 297F
IsMiscellaneousMathematicalSymbols-B	2980 - 29FF
IsSupplementalMathematicalOperators	2A00 - 2AFF
IsMiscellaneousSymbolsandArrows	2B00 - 2BFF
IsCJKRadicalsSupplement	2E80 - 2EFF
IsKangxiRadicals	2F00 - 2FDF
IsIdeographicDescriptionCharacters	2FF0 - 2FFF
IsCJKSymbolsandPunctuation	3000 - 303F
IsHiragana	3040 - 309F
IsKatakana	30A0 - 30FF
IsBopomofo	3100 - 312F
IsHangulCompatibilityJamo	3130 - 318F
IsKanbun	3190 - 319F
IsBopomofoExtended	31A0 - 31BF
IsKatakanaPhoneticExtensions	31F0 - 31FF
IsEnclosedCJKLettersandMonths	3200 - 32FF
IsCJKCompatibility	3300 - 33FF
IsCJKUnifiedIdeographsExtensionA	3400 - 4DBF
IsYijingHexagramSymbols	4DC0 - 4DFF
IsCJKUnifiedIdeographs	4E00 - 9FFF
IsYiSyllables	A000 - A48F
IsYiRadicals	A490 - A4CF
IsHangulSyllables	AC00 - D7AF
IsHighSurrogates	D800 - DB7F
IsHighPrivateUseSurrogates	DB80 - DBFF
IsLowSurrogates	DC00 - DFFF
IsPrivateUse or IsPrivateUseArea	E000 - F8FF
IsCJKCompatibilityIdeographs	F900 - FAFF
IsAlphabeticPresentationForms	FB00 - FB4F
IsArabicPresentationForms-A	FB50 - FDFF
IsVariationSelectors	FE00 - FE0F
IsCombiningHalfMarks	FE20 - FE2F
IsCJKCompatibilityForms	FE30 - FE4F
IsSmallFormVariants	FE50 - FE6F
IsArabicPresentationForms-B	FE70 - FEFF
IsHalfwidthandFullwidthForms	FF00 - FFEF
IsSpecials	FFF0 - FFFF
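
Notes on the ✔ columns of the category table:
\s is equivalent to [\f\n\r\t\v\x85\p{Z}], so besides the Z categories it also matches the listed escape characters.
\d matches the standard decimal digits 0-9 as well as the decimal digits of a number of other character sets.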

Examples

Examples of Character Classes

ASP.NET Code Input:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
    <head>
       <title>Sample Page</title>
       <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
       <script runat="server">
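           Sub Page_Load()
               ' Test input: "01 345", Greek capital alpha (U+0391), "67 9abc def", CR LF, "7890", CR LF
               Dim xstring As String = "01 345" & ChrW(913) & "67 9abc def" & Chr(13) & Chr(10) & "7890" & Chr(13) & Chr(10)
               ' Accumulates the HTML report that is shown in the label.
               Dim xmatchstr As String = ""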
               xmatchstr = xmatchstr & "Given string: " & """01 345""&amp;ChrW(913)&amp;""67 9abc def""&amp;Chr(13)&amp;Chr(10)&amp;""7890""&amp;Chr(13)&amp;Chr(10)" & "<br />"
               xmatchstr = xmatchstr & showresult(xstring,".+",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"\P{Ll}",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"\p{Ll}",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"\p{IsBasicLatin}",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"\P{IsBasicLatin}",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"\w",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"\W",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"\s",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"\S",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"\d",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"\D",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"[1357\r]",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"[^1357\r]",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"[1357\r-[5]]",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"[^1357\r-[5]]",RegexOptions.None)
               lbl01.Text = xmatchstr
           End Sub
           ' Runs the pattern against the input and returns an HTML fragment that
           ' lists the match count and the value, index, and length of each match.
           Function showresult(xstring As String, xpattern As String, xoption As RegexOptions) As String
               Dim xmatches As MatchCollection = Regex.Matches(xstring, xpattern, xoption)
               Dim xmatchstr As String = ""
               xmatchstr = xmatchstr & "<br />Result of Regex.Matches(string,""" & xpattern & """," & xoption.ToString() & "): "
               xmatchstr = xmatchstr & "<br />->Result of MatchCollection.Count: """
               xmatchstr = xmatchstr & xmatches.Count & """<br />"
               For xint As Integer = 0 To xmatches.Count - 1
                   xmatchstr = xmatchstr & "->->Result of MatchCollection(" & xint & ").Value, Index, Length: """
                   ' HTML-encode the value so that matched characters are not read as markup.
                   xmatchstr = xmatchstr & Server.HtmlEncode(xmatches(xint).Value) & ", " & xmatches(xint).Index & ", " & xmatches(xint).Length & """<br />"
               Next
               Return xmatchstr
           End Function
       </script>
    </head>
    <body>
       <% Response.Write ("<h1>This is a Sample Page of Character Classes</h1>") %>
       <p>
           <%-- Set on Page_Load --%>
           <asp:Label id="lbl01" runat="server" />
       </p>
    </body>
</html>

Source/Reference

https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference
https://docs.microsoft.com/en-us/dotnet/standard/base-types/character-escapes-in-regular-expressions