Enhance document tagging with regular expressions

In Tagging, there are several places where you can specify patterns or regular expressions to constrain the metadata that is extracted or tagged. Regular expressions let you apply formatting rules, check lengths, and so on, to make sure that text matches a specific pattern. In essence, they validate the metadata before it is extracted from the document or tagged in SharePoint.

Here are some basic examples:

Regular expression     Example matches                                        Description
abc$                   abc, 123abc                                            Any text ending with abc
^abc                   abc, abc123                                            Any text that starts with abc
^[0-9]{5}$             11111, 12345, 99999                                    Any 5-digit number
\d{1,4}                1, 24, 445, 3333                                       Any number that is 1 to 4 digits
[A-Za-z]{4}-\d{4}      ABCD-1234, GYDL-8450                                   4 letters followed by a dash, then 4 digits
[A-Za-z]{4}[-_ ]\d{4}  ABCD-1234, ABCD_1234, ABCD 1234                        4 letters followed by a dash, underscore, or space, then 4 digits
[A-Za-z]{4}[\W_]\d{4}  ABCD-1234, ABCD_1234, ABCD 1234, ABCD+1234, ABCD#1234  4 letters followed by any non-word character or an underscore, then 4 digits
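
The difference between anchored and unanchored patterns can be checked with a short script. The following is a minimal sketch that assumes Python's built-in `re` module; the regular expression engine used by Tagging may differ in details, and the sample text is hypothetical.

```python
import re

# Hypothetical sample text standing in for document content.
text = "Order ABCD-1234 shipped with GYDL_8450."

# Unanchored pattern: finds matching codes anywhere in the text.
codes = re.findall(r"[A-Za-z]{4}[-_ ]\d{4}", text)
print(codes)  # ['ABCD-1234', 'GYDL_8450']

# Without anchors, \d{1,4} also matches part of a longer number.
partial = re.search(r"\d{1,4}", "12345")
print(partial.group())  # 1234

# With ^ and $, the whole value must be 1 to 4 digits.
print(bool(re.search(r"^\d{1,4}$", "12345")))  # False
print(bool(re.search(r"^\d{1,4}$", "445")))    # True
```

Unanchored patterns suit extraction, where the value sits inside longer text. Anchored patterns suit validation, where the whole value must fit the format.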

Some useful regular expressions taken from the resources above:

Field                            Regular expression                                              Example matches                                             Description
Social Security Number           ^\d{3}-\d{2}-\d{4}$                                             111-11-1111                                                 Validates the format, type, and length of the supplied input field. The input must consist of 3 numeric characters followed by a dash, then 2 numeric characters followed by a dash, and then 4 numeric characters.
Phone Number                     ^[01]?[- .]?(\([2-9]\d{2}\)|[2-9]\d{2})[- .]?\d{3}[- .]?\d{4}$  (425) 555-0123, 425-555-0123, 425 555 0123, 1-425-555-0123  Validates a U.S. phone number. The input may start with 0 or 1, followed by a 3-digit area code (optionally in parentheses), then 3 digits and 4 digits, each optionally separated by a dash, space, or period.
E-mail                           ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$                someone@example.com                                         Validates an e-mail address.
ZIP Code                         ^(\d{5}-\d{4}|\d{5}|\d{9})$                                     12345, 12345-6789, 123456789                                Validates a U.S. ZIP Code. The input must consist of 5 digits, 5 digits followed by a dash and 4 digits, or 9 digits.
Currency (non-negative)          ^\d+(\.\d{2})?$                                                 1.00                                                        Validates a non-negative currency amount. If there is a decimal point, it requires 2 numeric characters after the decimal point. For example, 3.00 is valid but 3.1 is not.
Currency (positive or negative)  ^(-)?\d+(\.\d{2})?$                                             1.20, -1.20                                                 Validates a positive or negative currency amount. If there is a decimal point, it requires 2 numeric characters after the decimal point.
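
The field patterns can be tried out before they are used in a tagging rule. The following is a minimal Python sketch, not part of Tagging itself; the dictionary `FIELD_PATTERNS`, its keys, and the helper `is_valid` are illustrative names. The patterns are written with explicit `|` alternation and with `\.`, `\(`, and `\)` escapes, which they need in order to match the listed examples.

```python
import re

# Field patterns for validation; names and keys are illustrative assumptions.
FIELD_PATTERNS = {
    "ssn": r"^\d{3}-\d{2}-\d{4}$",
    "phone": r"^[01]?[- .]?(\([2-9]\d{2}\)|[2-9]\d{2})[- .]?\d{3}[- .]?\d{4}$",
    "email": r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
    "zip": r"^(\d{5}-\d{4}|\d{5}|\d{9})$",
    "currency": r"^\d+(\.\d{2})?$",
    "signed_currency": r"^(-)?\d+(\.\d{2})?$",
}


def is_valid(field: str, value: str) -> bool:
    """Return True if the whole value matches the pattern for the field."""
    # fullmatch requires the entire string to match, so a trailing
    # newline or extra text makes the value invalid.
    return re.fullmatch(FIELD_PATTERNS[field], value) is not None


print(is_valid("phone", "(425) 555-0123"))  # True
print(is_valid("zip", "12345-6789"))        # True
print(is_valid("currency", "3.00"))         # True
print(is_valid("currency", "3.1"))          # False
print(is_valid("currency", "-1.20"))        # False
print(is_valid("signed_currency", "-1.20")) # True
```

Escaping the dot matters: an unescaped `.` matches any character, so a pattern such as `^\d+(.\d{2})?$` would also accept a value like 1x00.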