Regex Formula in Microsoft Excel

It is well known that Regular expressions are ubiquitous in the developer world as they are used to validate the various website input, as well as searching the different types of word patterns in large strings in the texts, and for many other uses as well.

And in Microsoft Excel, the Regular Expressions (VBA RegEx or simply the VBA Regex) are not much advertised neither more popular in action, it is because most of the users are good in using the simpleLEFT, RIGHT, MID, as well as the FIND functions for the manipulation of their string.

In this tutorial, we will be discussing the different concepts used in Microsoft Excel, which are as follows:

  1. What do you understand by the term Regex in Microsoft Excel?
  2. Does the Regex function exist in Microsoft Excel?
  3. What are the Regex cheat sheets in Microsoft Excel?
  4. What do you mean by Custom RegEx functions in Microsoft Excel?
  5. What do you mean by Regex Match Function in Microsoft Excel?
  6. What is meant by Regex Extract Function in Microsoft Excel?
  7. What is meant by Regex Replace Function In Microsoft Excel?

What do you mean by the word Regex in Microsoft Excel?

In Microsoft Excel, aregular expression(akaregexorregexp) is the special encoded sequence of the characters that primarily defines a search pattern. And by using that pattern, we can effectively find the matching character combination in a given string or can validate data input.

And the regular expressions primarily have their own syntax which consists of unique characters, operators, as well as the constructs.

  • For example: [0-5] that matches any individual digit ranging from 0 to 5.

A Regular expression is mainly used in various programming languages, including the Virtual Basic for Applications (VBA) and JavaScript. The 'latter' has a unique RegExp object, which one can utilize in order to create custom functions under their needs.

Does the Regex function exist in Microsoft Excel?

The answer to the above question is that: regrettably, there are no such inbuilt Regex functions which exist in Microsoft Excel, and if someone wants to make use of the regular expressions in their respective formulas, so they need to create their user-defined function (VBA or .NET based), or they can install third-party tools which are usually supporting the regexes as well.

What are the Regex cheat sheets in Microsoft Excel?

We all know that whether the given regex pattern is very simple or highly sophisticated, it is built with the help of common syntax. And we have mentioned a quick reference to the main Regex patterns that will help an individual to get a grasp of the basics. It may also work as our cheat sheet when studying further examples respectively;

And if in case we are comfortable with regular expressions, we can jump straight to the RegExp functions as well:

Characters

Microsoft Excel uses these most frequently used patterns to match certain characters.

PatternDescriptionExampleMatches
.Wildcard character: The Wild symbols are used for matching out any single character except a line break in an Excel sheet..ota lot, hot, got, @ot
\dDigit character: It is considered as the single digit that ranges from 0 to 9\dIna2b, matches2
\DAny character which is NOT a digit\DIn b2c, matches band c
\sWhitespace character: It includes space, tab, new line and carriage return.\s.In3 cents, matches3 c
\SAny non-whitespace character\S+In30 cents, it used to match out30 cents
\wWord character: It is termed to be any ASCII letter, digit or underscore\w+In5_dogs***, matches5_dogs
\WAny character which is NOT an alphanumeric character or an underscore\W+In5_dogs***, matches***
\tTab
\nNew line\n\d+The two-line string below matches 10 5 dogs 10 Rabbit
\Escapes the special meaning of a character so that we can search for it effectively.\. \w+\.Escapes a period so we can find the literal "." character in a string Mr., Mrs., Prof.

Character classes

With the help of these patterns, we can easily match the various elements of different character sets in an excel sheet respectively:

PatternDescriptionExampleMatches
[characters]Used for matching any single character in the given bracketsd[oi]gdoganddig
[^characters]Used for matching any single character which is NOT present in the bracketsd[^oi]gMatchesdag, dug, d1g It does not match the doganddig
[from-to]Used for matching any character in the given range between the brackets[0-9] [a-z] [A-Z]Any single digit ranging from number 0 to 9. And a single lowercase letter from a to z. And a single uppercase letter from A to Z.

Quantifiers

In Microsoft Excel, the Quantifiers are termed to be the special expressions which are responsible for specifying the total number of characters to match; it should be applied to the character before it respectively.

PatternDescriptionExampleMatches
*Occurrences in terms of Zero or more2b*2, 2b, 2bb, 2ccc, etc.
+One or more occurrencessho+Inshort, matchessho In shooting, matchesshoo
?Occurrence in terms of Zero or Oneboa?tboat, bot
*?Occurrence in terms of Zero or more but with the condition of less possibility.2a*?In2a, 2aaand2aaa, matches2a
+?Occurrence in terms of Zero or more but with the condition of less possibility.po+?Inpotandpoor, matchespo
??Zero or one occurrence, but as fewer as possibleroa??Inroadandrod, it matchesro
{n}Matches the preceding pattern n times\d{3}Exactly 3 digits
{n,}It matches the preceding pattern to n or more times\d{5,}5 or more digits
{n,m}It is used to match the preceding pattern between n and m times effectively\d{7,9}From 7 to 9 digits

Grouping

In Microsoft Excel, the Grouping constructs are primarily used to capture a substring from the given source string so we can easily perform some of the operations with it efficiently:

SyntaxDescriptionExampleMatches
(pattern)Capturing group: It is purposely used to capture a matching substring and assign it an ordinal number.(\d+)In5 dogs and 10 rabbits, captures5(group 1) and10(group 2)
(?:pattern)Non-capturing group: This group is used to match a group but does not capture it.(\d+)(?: rabbit)In5 dogs and 10 rabbits, captures10
\1Assigned to be the Contents of group 1(\d+)\+(\d+)=\2\+\1Matches 5+10=10+5and captures5and10, which are in capturing groups, respectively
\2Assigned to be the Contents of group 2

Anchors

In Microsoft Excel, the Anchors specify a position in the input string in which we have to look for a match.

AnchorDescriptionExampleMatches
^It is the start of the string^\d+Any number of digits which are present at the start of the string. In5 dogs and 10 rabbits, it matches5
$It stands for the "End of string."\d+$Any particular number of digits those are present at the end of the string. In10 plus 5 gives it 15, matches 15
\bWord boundary\bjoy\bIt matchesjoyas a separate word, but not inenjoyment.

Alternation (OR) construct

In Microsoft Excel, the alternation operand enables the OR logic so we can effectively match either this or that element respectively:

ConstructDescriptionExampleMatches
|It is used to match any single element which is separated by the vertical bar.(s|sh)ellsInshe sells sea shells,matchessellandshells

Look-around constructs

The Lookaround constructs in Excel are much helpful in action when we want to match something that is or is not followed as well as preceded by something else. And these expressions are sometimes called "zero-width assertions" it is because they effectively match a position rather than actual characters.

Note. In Virtual Basic for Application (VBA) RegEx flavour, look behinds are not supported.

PatternDescriptionExampleMatches
(?=)Positive lookaheadX(?=Y)It is used to match the expression X when it is followed by Y.
(?!)Negative lookaheadX (?!Y)It matches out the expression X if not followed by Y effectively.
(?<=)Positive look behind(?<=Y)XIt usually matches the given expression X when it is preceded by Y
(?<!)Negative look behind(?<!Y)XIt is used to match the expression X when it is NOT preceded by Y

What do you mean by Custom RegEx functions in Microsoft Excel?

From the above, we already know that Microsoft Excel has no built-in RegEx function, so to enable the regular expressions we all need to create three custom VBA (Virtual Basic for Application) functions (aka user-defined functions).

How does a VBA RegExp function work in Excel?

Now in this section, we will discuss the inner mechanics and exactly what happens at the backend.

So, when we start using the regular expressions in VBA (Virtual Basic for Application), we either need to activate the RegEx object reference library or we can use the CreateObject function efficiently.

The RegExp object is encapsulated with 4 properties which are as follows:

  1. Pattern: It is used to match the particular type of the defined pattern in the given input string.
  2. Global: Global is used for the purpose of controlling whether to find all matches in the given input string or just to find the first one.
  3. Multiline: Multiline can be used to determine whether to match the given pattern across the line breaks in multi-line strings or only in the first line.
  4. IgnoreCase: It defines whether a given regular expression is case-sensitive (default) or case-insensitive.

# Example: Microsoft Excel VBA Regex

Now in this example, we will check whether the mentioned pattern is present in the given text or not with the help of the RegEx. Test.

And we will be following the below steps to make use of the VBA RegEx.

Step 1: We need to define a new sub-procedure in order to create a macro.

Code:


Regex Formula in Microsoft Excel

Step 2: After that, we will define the two variables using RegEx as an Object that can be used to create a RegEx object and Str as a string effectively.

Code:


Regex Formula in Microsoft Excel

Step 3: In this step, we will create our RegEx object with the CreateObject function's help.

Code:


Regex Formula in Microsoft Excel

Step 4: Now, we will add the pattern to be tested with the RegEx function.

Code:


Regex Formula in Microsoft Excel

Step 5: After that, we will define the string within which we need to check the given pattern effectively.

Code:


Regex Formula in Microsoft Excel

Step 6: Then we will make use of the RegEx. Test to test whether the given pattern is present within the variable named str and debug it also.

Code:


Regex Formula in Microsoft Excel

Step 7: After that, we will hit the F5 or Run button from the keyboard to run the code in order to get the output as well.

What do you mean by Regex Match Function in Microsoft Excel?

It was well known that theRegExpMatchfunction is used to search an input string for the given text which matches a regular expression and then returns 'TRUE' if a match is found. Else it will return 'FALSE.'

SYNTAX:

In which;

  • Text (required): Contains one or more strings to search in.
  • Pattern (required): It is the regular expression to match.
  • Match_case (optional: It primarily controls whether to match or ignore the text case in terms of TRUE OR FALSE.

# Example: How one can make use of the regular expressions to match the given strings

In the given dataset, we will assume that we want to identify the respective entries containing SKU codes.

And it was seen that each SKU starts with 2 capital letters, followed by a hyphen and 4 digits, so that we can match them with the help of the following expression.

In which:

  • [A-Z]{2} means any 2 uppercase letters ranging from A to Z, and
  • \d{4} means any 4 digits that range from 0 to 9.
  • And a word boundary \b indicates that an SKU is a separate word and not part of a bigger string.

Now with the pattern which is established above, we will start typing a formula just like we normally do, and the respective function's name will get appear in the list that is suggested by Excel's AutoComplete effectively:

Regex Formula in Microsoft Excel

Supposing that the original string is present in A5, the formula goes as follows:

And for our convenience, we can input the given regular expression in a separate cell and make use of anAbsolute reference($A$2) for thepatternargument, as this ensures that the given cell address will remain unchanged when will copying out the formula to other cells:

And to display our text labels instead of displaying TRUE and FALSE, nest RegExpMatch in the IF function and specify the desired texts in thevalue_if_trueandvalue_if_falsearguments:


Regex Formula in Microsoft Excel

What is meant by Regex Extract Function in Microsoft Excel?

TheRegExpExtractfunction in Microsoft Excel searches for the substrings that match a regular expression and then extract all the matches or the specific match.

Formula:

In which,

  • Text: It is the text string which needs to be searched.
  • Pattern: Pattern is considered to be the regular expression to match.
  • Instance_num: Instance_num is optional and primarily a serial number that indicates which instance needs to be extracted.
  • Match_case: It primarily controls whether to match or ignore the text case in terms of TRUE OR FALSE.

# Example: How one can extract strings with the help of the regular expressions

In this example, we are discussing how to extract the invoice numbers and for this we will be making use of the simple regex function that is used to match out any 7-digit number:

Now we will be putting the pattern in the A2 cell, and we will get the job done with this compact and easy formula respectively:

If in case a pattern is matched, then a formula will extract an invoice number, and if in case no match is found, it will return nothing:

Regex Formula in Microsoft Excel

What is meant by the Regex Replace Function in Microsoft Excel?

A RegExpReplacefunction in Microsoft Excel is primarily used for the purpose of replacing out the values that are matching a regex with the text that we have specified.

In which:

  • Text: It is the text string in the Excel sheet which needs to be searched.
  • Pattern: Pattern is the required field and the regular expression to match.
  • Replacement: Replacement is texts that can be used replace the matching substrings with it.
  • Instance_num: the instance to replace. The default is "all matches".
  • Match_case: It primarily controls whether to match or ignore the text case in terms of TRUE OR FALSE.

# Example: How one can replace or remove the given strings with the help of regexes

We all know that some of our records usually contain credit card numbers, which are very confidential. We can replace it with something or delete it altogether, and both tasks can be achieved with the help of theRegExpReplace function.

How? In a second scenario, we will be replaced with an empty string.

In our sample table, all the card numbers have 16 digits that are written in 4 groups and are separated with spaces, so to find them; we will replicate the pattern with the help of the regular expression:

And for the replacement, the following string can be used effectively:

And here is a complete formula that we can used to replacethe credit card numbers with the help of the insensitive information.

Moreover, with the regex as well as with the replacement text which are present in the separate cells A2 and B2, the formula works equally well:

Regex Formula in Microsoft Excel

In Microsoft Excel, "removing" is a particular case of "replacing", and in order toremove the credit card numbers, we can make use of an empty string ("") for thereplacementargument:


Regex Formula in Microsoft Excel




Latest Courses