Google Analytics 4 Regex (Regular Expressions) Tutorial

Last Updated: January 15, 2024

In this tutorial, I explain the building blocks of Regular Expressions (or REGEX), so that you can understand them and use them in Google Analytics (including GA4) and Google Tag Manager.

What is a Regular Expression in GA4 (Google Analytics 4)?

A regular expression (also called ‘regex’) is a pattern used to check for matches in a string.

Regex performs advanced matching and substitution operations that would be difficult or impossible to achieve using other methods.

You can create more sophisticated and accurate reports in GA4 by using regex. You can carry out advanced data analysis.

For example, ^Colou?r$ is a regular expression that matches both the strings ‘Color’ and ‘Colour’.

Google Analytics 4 uses JavaScript regex.

Regular expressions are categorized based on the type of syntax and computer language used for their creation.

Implementation of regex functionality using a particular type of syntax and computer language is called a regex engine.

There are many types of regex engines available. The most popular among them are:

JavaScript
PHP
Python
Ruby
Java
C++
Golang
.NET

Different regex engines support different types of syntax, and the meaning of metacharacters (characters with special meanings in a regex) may change depending on the regex engine used.

Thus, a regular expression considered valid under one regex engine may not be considered valid under another.

Whenever you test a regex using a regex tester tool (like regex101.com), you get the option to select the flavour (aka regex engine) under which you want to test your regular expression:

Since the regex engine used by GA4 and Google Tag Manager is JavaScript, you should always select ‘JavaScript’ as the flavour before testing your regular expressions for GA4/GTM.

GA4 regex is made up of characters and metacharacters.

Metacharacters are the building blocks of a regex. These are the characters that have special meanings in a regex.

Following are the examples of metacharacters for the JavaScript regex:

Other Meta Characters for JavaScript Regex:

Google Analytics 4 uses fully matched Regex

Fully matched regex means the regex fully matches a pattern in a string.

Let us suppose you provided the regex ‘car’.

This regex fully matches only one pattern in a string: ‘car’.

By default, the GA4 property uses fully matched regex.

If you want to use partially matched regex in GA4, you would need to use metacharacters.

Partially matched regex means the regex partially matches a pattern in a string.

Let us suppose you provided the regex ‘car’.

This regex partially matches the following patterns in a string: ‘carbohydrates’, ‘carbon’, ‘caramel’, ‘caravan’, ‘cardiac’ etc.

By default, the GA3 (Universal Analytics) property uses partially matched regex.
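The difference between the two modes can be sketched in plain JavaScript. The helper names partialMatch and fullMatch below are hypothetical (they are not GA4 functions), and wrapping the pattern in ^(?:...)$ is only an assumed way of imitating full matching:

```javascript
// Hypothetical helpers that imitate the two matching modes described above.

// Partial match: the pattern may match anywhere inside the value.
function partialMatch(pattern, value) {
  return new RegExp(pattern).test(value);
}

// Full match: the pattern must account for the entire value.
// Assumption: wrapping the pattern in ^(?:...)$ imitates this behaviour.
function fullMatch(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$").test(value);
}

const values = ["car", "carbon", "caravan"];

for (const value of values) {
  console.log(value, "partial:", partialMatch("car", value), "full:", fullMatch("car", value));
}
```

With the pattern ‘car’, all three values pass the partial check, but only the value ‘car’ passes the full check.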

How to correctly create regex in GA4

Remember the following tips while creating regex in GA4 and Google Tag Manager:

#1 Use the “|” (pipe) symbol wisely. Since “|” represents the ‘or’ condition, do not place the pipe symbol at the beginning or end of a regular expression. A leading or trailing pipe adds an empty alternative, which matches everything and can spoil your required dataset.

For example,

The regex ‘/error|/’ is intended to match the word ‘error’, but because of the pipe at the end, it means ‘error’ or nothing. As a result, it matches “error occurred” and also any string that does not contain ‘error’ at all.

Similarly,

The regex ‘/|error/’ is intended to match ‘error’, but because of the pipe at the beginning, it matches “System error” as well as every other string.
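A quick way to see the effect of a stray pipe is to run the patterns against a string that has nothing to do with ‘error’. The sample strings below are made up for illustration:

```javascript
const intended = /error/;
const trailingPipe = /error|/; // "error" OR an empty pattern
const leadingPipe = /|error/;  // an empty pattern OR "error"

// Made-up sample values
const samples = ["error occurred", "System error", "checkout complete"];

for (const sample of samples) {
  console.log(
    sample,
    "| intended:", intended.test(sample),
    "| trailing pipe:", trailingPipe.test(sample),
    "| leading pipe:", leadingPipe.test(sample)
  );
}
```

The empty alternative created by the stray pipe matches at the very start of any string, so both faulty patterns report a match for “checkout complete”.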

#2 If you are unsure about all the possible combinations in a regex, use “.*” to find a list of all possible combinations in your data set.

#3 Avoid using spaces in regular expressions. White spaces in a regular expression can ruin the results you are expecting.

In regex, spaces are not ignored; they are treated as characters to be matched in the string.

This means that if you include a space in your regex, it will look for that space in the target string.

For example, the regex /cat / will match “A cat “. But it won’t match “A cat” because there’s no space after ‘cat’.

To avoid such issues, ensure that you only include spaces in your regular expressions when they are actually needed as part of the pattern you are trying to match.

#4 Regular expressions are case-sensitive by default. For example, the regular expression ^Cat would match “Cat” but not “cat” or “CAT”.
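Tips #3 and #4 can be checked with a few lines of JavaScript. Note that the i flag shown at the end is JavaScript syntax (usable, for example, in GTM custom JavaScript); whether a particular GA4 field offers an equivalent option is not assumed here:

```javascript
// Tip #3: the trailing space is part of the pattern.
const catWithSpace = /cat /;
console.log(catWithSpace.test("A cat ")); // true
console.log(catWithSpace.test("A cat"));  // false

// Tip #4: matching is case-sensitive unless the "i" flag is added.
const startsWithCat = /^Cat/;
const startsWithCatAnyCase = /^Cat/i;
console.log(startsWithCat.test("Cat"));        // true
console.log(startsWithCat.test("cat"));        // false
console.log(startsWithCatAnyCase.test("CAT")); // true
```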

#5 Google Analytics can support regular expressions with up to 256 characters. If your regular expression exceeds 256 characters, it won’t work. Hence, make sure to keep your regex within 256 characters.

#6 If you use regular expressions in custom JavaScript tags using Google Tag Manager, always remember to add comments in front of regular expressions.

This makes it easier for others (or yourself in the future) to understand the intent behind the regex.
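As a sketch of tip #6, here is a commented regex inside a function written in the style of a GTM custom JavaScript variable. The function name, the URL structure (/blog/year/slug/) and the pattern itself are hypothetical, and the page path is passed in as an argument so that the example runs on its own:

```javascript
// Hypothetical helper: returns true when a page path looks like a blog article.
function isBlogArticle(pagePath) {
  // Matches paths such as /blog/2024/ga4-regex/
  //   ^\/blog\/    path starts with /blog/
  //   \d{4}\/      four-digit year followed by a slash
  //   [a-z0-9-]+   slug made of lowercase letters, digits and hyphens
  //   \/?$         optional trailing slash, then end of the path
  var blogArticlePattern = /^\/blog\/\d{4}\/[a-z0-9-]+\/?$/;
  return blogArticlePattern.test(pagePath);
}

console.log(isBlogArticle("/blog/2024/ga4-regex/")); // true
console.log(isBlogArticle("/blog/"));                // false
```

Breaking the pattern down piece by piece in the comment means the next person does not have to decode the regex to know what it is meant to match.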

Your regex in GA4 will not work if you don’t understand this.

By default, a GA4 property uses fully matched JavaScript regex, which makes using regular regex pretty difficult.

Fully matched regex means the regex must match the entire string, not just a part of it.

This also means that a literal string can work as a fully matched regex in GA4: the pattern then behaves like a straightforward exact string match rather than a regular expression.

So you can provide the following regex https://www.perplexity.ai/, and it will work if it exactly matches the following string: https://www.perplexity.ai/

Similarly, you can provide the following regex Canada , and it will work if it exactly matches the following string: Canada

Let us suppose you want to filter out the names of all countries in your GA4 report that begin with ‘Ca’.

So, you created the following regex: ‘^Ca’.

Normally, this regex would match any string that starts with ‘Ca’. But if you use this regex in GA4, it will not work, because GA4 supports only fully matched regex and ‘^Ca’ is a partially matched regex.

To make it a fully matched regex, you will need to rewrite this regex like the one below:

^Ca.*

This regex would match any string that starts with ‘Ca’ like ‘Canada’, ‘Cameroon’, ‘Cambodia’ etc.
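The behaviour described above can be imitated in JavaScript. The fullMatch helper is hypothetical, and wrapping the pattern in ^(?:...)$ is an assumption about how full matching can be emulated, not GA4’s actual implementation. The list of countries is sample data:

```javascript
// Hypothetical helper: the pattern must account for the entire value.
function fullMatch(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$").test(value);
}

// Sample data
const countries = ["Canada", "Cameroon", "Cambodia", "Chile"];

// '^Ca' only describes the first two characters, so no full country name passes.
const withPartialPattern = countries.filter((country) => fullMatch("^Ca", country));

// '^Ca.*' also describes the rest of the name.
const withFullPattern = countries.filter((country) => fullMatch("^Ca.*", country));

console.log(withPartialPattern); // []
console.log(withFullPattern);    // [ 'Canada', 'Cameroon', 'Cambodia' ]
```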

Any regex builder (including ChatGPT) you use will most likely not create a fully matched regex for you.

You will most likely need to manually convert a partially matched regex into a fully matched regex.

This makes using regex in GA4 difficult and time-consuming.
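Part of this manual conversion can be automated. The function below is a hypothetical helper based on a simple heuristic: it adds .* to whichever side of the pattern is not anchored. It is a sketch, not a complete converter; for instance, a pattern with a top-level | would first need to be wrapped in parentheses.

```javascript
// Hypothetical helper: pads an unanchored pattern with .* so that it can
// satisfy a full match. Simple heuristic only.
function toFullMatchPattern(pattern) {
  const anchoredAtStart = pattern.startsWith("^");
  // A trailing \$ is an escaped (literal) dollar sign, not an anchor.
  const anchoredAtEnd = pattern.endsWith("$") && !pattern.endsWith("\\$");
  const prefix = anchoredAtStart ? "" : ".*";
  const suffix = anchoredAtEnd ? "" : ".*";
  return prefix + pattern + suffix;
}

console.log(toFullMatchPattern("car"));     // .*car.*
console.log(toFullMatchPattern("^Ca"));     // ^Ca.*
console.log(toFullMatchPattern("\\.php$")); // .*\.php$
```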

How to create regex fast in GA4

The biggest problem in using regex is creating the correct regex.

Crafting a pattern that is specific enough to match only the desired text and general enough to cover all relevant cases can be tricky.

A small error or oversight in regex can lead to unexpected matches or failures to match, and these issues can be hard to pinpoint.

Overly complex or inefficient regex can lead to performance issues, especially when processing large amounts of text.

Regex syntax can be intimidating for beginners, and it often requires significant time to learn and understand fully.

Here ChatGPT (an AI-enabled chatbot) can help.

You can ask ChatGPT to create a JavaScript regex that matches your specified pattern.

For example:

Once ChatGPT has created the requested regex, you would need to convert it into a fully matched regex so that it can work in GA4.

You no longer need a PhD in regex to use it.

How to test regex in GA4?

Use the metric/dimension filter provided by a GA4 exploration report to test your regex for GA4.

If you want to use a traditional regex testing tool (like regex101.com), then make sure that you use the ‘JavaScript’ regex engine and that your regex fully matches a string, as GA4 supports only fully matched regex.

Where can you use regex in GA4?

There are many cases where regular expressions are very useful in GA4. Some such cases are:

Setting up subproperties in GA4.
Setting up site search tracking without query parameters.
Setting up Referral Exclusion in GA4.
Setting up data filters in the exploration report in GA4.
Setting up GA4 Custom Events via GTM.
Setting up Content groups in GA4.
Setting up audiences in GA4.
Creating and modifying events in the GA4 UI.

#1 Setting up subproperties in GA4

If you want to create a filtered reporting view in GA4 and you have access to GA4 360 (paid version of GA4), then you can create a subproperty.

A subproperty is like a typical GA4 property, but it gets its data from another property (also called the source property).

To create a subproperty, you will need to use event filter(s). To create an event filter, you will need to specify one or more conditions. You can use regex while defining the conditions:

#2 Setting up site search tracking without query parameters

GA4 automatically tracks site searches once you have enabled Enhanced Measurement Tracking in your GA4 property.

However, there could be a situation in which the site search feature is installed on your website in such a way that the default site search tracking feature provided by GA4 won’t work for you.

In that case, you would need to use GTM to set up site search tracking in your GA4 property.

Let us suppose you have the site search feature installed on your website, but the search term appears in the search URL without a query parameter.

For example:

https://www.optimizesmart.com/search/attribution+modelling

Instead of

https://www.optimizesmart.com/?s=attribution+modelling

In that case, you won’t be able to benefit from the default site search tracking capability of GA4.

You would need to use GTM to set up site search tracking.

While setting up site search tracking via GTM, you need to use regex to extract the search term from the search URL.
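A minimal sketch of such an extraction is shown below. The function name is hypothetical, and it assumes that the search term is the path segment right after /search/ and that plus signs stand for spaces, as in the example URL above:

```javascript
// Hypothetical helper: pulls the search term out of a /search/<term> URL.
function extractSearchTerm(url) {
  const path = new URL(url).pathname;
  //   ^\/search\/   path starts with /search/
  //   ([^\/]+)      capture everything up to the next slash
  const match = path.match(/^\/search\/([^\/]+)/);
  if (!match) {
    return null; // not a search URL of the assumed shape
  }
  // Assumption: plus signs stand for spaces in this URL scheme.
  return decodeURIComponent(match[1].replace(/\+/g, " "));
}

console.log(extractSearchTerm("https://www.optimizesmart.com/search/attribution+modelling"));
// attribution modelling
```

For a URL that carries the term in a query parameter instead (such as ?s=attribution+modelling), this function returns null, because the path does not start with /search/.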

#3 Setting up Referral Exclusion in GA4

Google Analytics 4 allows you to set condition(s) that identify unwanted referrals and prevent them from being reported as referral traffic.

This way, you don’t see the referral traffic from certain domains (like your own domain or from a payment gateway like PayPal) in your GA4 reports.

This functionality was called the referral exclusion list in the earlier version of Google Analytics (Universal Analytics).

In the case of GA4, the ‘referral exclusion list’ is known as “List unwanted referrals”.

You can use regex while setting up ‘List unwanted referrals’:

#4 Setting up data filters in the exploration report in GA4

You can use regex while setting up data filters in an exploration report.

#5 Setting up GA4 Custom Events via GTM

There are four categories of events in GA4: 1. Automatically collected events 2. Enhanced measurement events 3. Recommended events 4. Custom events.

Custom events are the events that you create and use.

Custom events can be any interaction on your website that is not tracked by default.

For example, button clicks, sign-up events, form submissions, etc.

When you set up a custom event via GTM, you create a trigger that fires when the event occurs, and all trigger conditions are true.

You can use regex while creating the trigger conditions:

#6 Setting up Content groups in GA4

In the context of GA4, a content group is a set of web pages that are based on the same theme.

So in the case of a blog, a content group can be a set of web pages based on the same topic, e.g. Attribution Modelling.

In the case of an ecommerce website, a content group can be a set of web pages that sell similar products, e.g. shoes.

Content groups allow you to measure the performance of a set of web pages at the content category or product category level.

Content groups are especially useful if you have a big website with hundreds or thousands of web pages, and you can realistically measure the web pages’ performance only at the group level and not at the individual page level.

While setting up content groups in GA4, you will need to identify all the web pages which will be part of the content group. You can identify such web pages via regular expressions.

For example:

In order to identify all the web pages on my website that belong to the ‘Attribution Modelling’ content group, I can use the following regular expression:

#7 Setting up audiences in GA4

In the context of GA4, an audience is a group of users that you can club together based on any combinations of attributes or experiences in a particular time frame.

The audiences feature in GA4 allows you to segment your users based on the dimensions, metrics, and events important to your business.

While creating/editing an audience, you need to set up one or more conditions that define the audience criteria. You can use regex while setting up these conditions:

#8 Creating and modifying events in the GA4 UI.

You can use regex while creating/modifying events in the GA4 user interface (UI):

This is a game changer as you now have more control over the event definition.

Understanding the various metacharacters used in GA4 regex

Metacharacter – Forward Slash

Forward Slash (/) has a special meaning in a regex.

It is used to mark the beginning and end of a regular expression.

For example:

/shop/

The regular expression /shop/ matches the pattern ‘shop’ in a string.

So this regular expression will match the following patterns in a string:

“I’m going to the shop to buy some milk.”
“The shop is open from 9am to 5pm.”
“I need to stop by the shop to pick up some bread.”

These strings all contain the exact substring ‘shop’, so the regular expression would match them.

Note that this regular expression is not anchored, so it matches ‘shop’ wherever it appears in a string.

It will therefore also match ‘shopping’, ‘shopper’, or any other string that contains ‘shop’ as a substring. (In a GA4 field that uses full matching, the same pattern would match only the exact string ‘shop’.)

If you want to match ‘shop’ only when it is followed by at least one more character, you can add the . character, which matches any single character (except for the newline).

/shop./
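The two patterns can be compared directly in JavaScript:

```javascript
const shop = /shop/;
const shopPlusOne = /shop./; // 'shop' followed by any single character

console.log(shop.test("The shop is open from 9am to 5pm.")); // true
console.log(shop.test("shopping"));                          // true (substring match)
console.log(shopPlusOne.test("shopping"));                   // true
console.log(shopPlusOne.test("shop"));                       // false (nothing after 'shop')
```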

/^[a-z]+$/

In this example, the regular expression /^[a-z]+$/ is used to match a string that consists only of lowercase letters.

The ^ character indicates the beginning of the string, and the $ character indicates the end of the string.

The [a-z] character class matches any lowercase letter, and the + character indicates that one or more of the preceding characters should be matched.

Here are some examples of strings that would match this regular expression:

“abc”
“def”
“ghijklmnopqrstuvwxyz”

These strings all consist only of lowercase letters, so they would be matched by the regular expression.

The regex /^[a-z]+$/ will not match strings “abc123” or “ABC” because they both contain characters other than lowercase letters.
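The same checks, written as JavaScript:

```javascript
const lowercaseOnly = /^[a-z]+$/;

console.log(lowercaseOnly.test("abc"));    // true
console.log(lowercaseOnly.test("abc123")); // false (contains digits)
console.log(lowercaseOnly.test("ABC"));    // false (uppercase letters)
console.log(lowercaseOnly.test(""));       // false (+ needs at least one letter)
```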

/colou?r/

The regular expression /colou?r/ is a pattern that matches the string ‘colour’ or ‘color’.

It uses the metacharacter ?, which indicates that the preceding character or character class should be matched 0 or 1 time.

In this case, the ? character is placed after the ‘u’ in ‘colou’, indicating that the ‘u’ is optional.

This means that the regular expression will match both the string ‘colour’ and the string ‘color’.
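A short check of the optional ‘u’:

```javascript
const colour = /colou?r/;

console.log(colour.test("color"));   // true  (zero 'u')
console.log(colour.test("colour"));  // true  (one 'u')
console.log(colour.test("colouur")); // false (two 'u' characters)
console.log(colour.test("colr"));    // false ('o' before 'r' is missing)
```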

Metacharacter – Back Slash

‘\’ is the escaping character (also known as back slash). It changes the normal way the character that follows it is interpreted.

With the escaping character, you can convert a regular character into a metacharacter or turn a metacharacter into a regular character.

‘n’ is a regular character.

But if you add escaping character (back slash) before it, then it would become a metacharacter: \n, which is a new line character.

If you use the regex /abcd\n/, it won’t match the string abcd\n3456 (where \n is a literal back slash followed by the letter ‘n’) because \n in the regex would be treated as a newline character instead of regular characters.

Using the regex /abcd\\n/ will match the string abcd\n3456 because \\ matches a literal back slash, so the ‘n’ after it is treated as a regular character instead of part of the newline character.

‘s’ is a regular character.

But if you add escaping character (back slash) before it, then it would become a metacharacter: \s, which is used to check for whitespace characters.

The regular expression /\s/ will match the white space character in the string “Hello world!”.

Using the regex /abcd\s/ won’t match the string abcd\s3456 because \s would be treated as a metacharacter (white space) instead of regular characters.

Using the regex /abcd\\s/ will match the string abcd\s3456 because \\ matches a literal back slash, so the ‘s’ after it is treated as a regular character instead of part of the metacharacter.
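These escaping rules can be tried out in JavaScript. The example uses String.raw so that the back slash stays in the test string as a real character:

```javascript
// String.raw keeps the back slash as a literal character in the string.
const textWithBackslashN = String.raw`abcd\n3456`; // a b c d \ n 3 4 5 6
const textWithNewline = "abcd\n3456";              // contains an actual line break
const textWithBackslashS = String.raw`abcd\s3456`; // a b c d \ s 3 4 5 6

console.log(/abcd\n/.test(textWithBackslashN));  // false (no line break in the string)
console.log(/abcd\\n/.test(textWithBackslashN)); // true  (literal back slash, then 'n')
console.log(/abcd\n/.test(textWithNewline));     // true

console.log(/\s/.test("Hello world!"));          // true  (the space)
console.log(/abcd\s/.test(textWithBackslashS));  // false (back slash is not white space)
console.log(/abcd\\s/.test(textWithBackslashS)); // true
```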

How to make forward slash a regular character?

If you want regex to treat forward slash as a forward slash and not some special character, then you need to use it along with the escaping character (back slash) like this: \/

So if you want to check for a pattern, say /shop in the string /shop/collection/men/

then your regex should be: /\/shop/

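Using the regex //shop/ won’t match any pattern in the string /shop/collection/men/ because the unescaped forward slash would be treated as the delimiter that ends the regex instead of a regular forward slash. (In JavaScript code, // is even read as the start of a comment.)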
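In JavaScript, the escaping is only needed in the /.../ literal form. When the pattern is passed as a string to the RegExp constructor, there are no delimiters, so the forward slash can be written as-is:

```javascript
const path = "/shop/collection/men/";

const literalForm = /\/shop/;                // slash escaped inside a regex literal
const constructorForm = new RegExp("/shop"); // no delimiters, so no escaping needed

console.log(literalForm.test(path));     // true
console.log(constructorForm.test(path)); // true
console.log(path.search(literalForm));   // 0 (the match starts at the first character)
```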

How to make ‘?’ a regular character?

‘?‘ is a metacharacter.

To make it a regular character, you need to add the escaping character before it: \?

So if you want to check for a question mark in the string colou?r

then your regex should be: /colou\?r/

If you use the regex /colou?r/, it would match the string color or colour and not colou?r as then ? will be treated as a metacharacter.
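Both cases side by side:

```javascript
const literalQuestionMark = /colou\?r/; // ? is a regular character here
const optionalU = /colou?r/;            // ? makes the 'u' optional

console.log(literalQuestionMark.test("colou?r")); // true
console.log(literalQuestionMark.test("colour"));  // false
console.log(optionalU.test("colou?r"));           // false
console.log(optionalU.test("colour"));            // true
```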

Metacharacter – Caret ^

‘^’ – This is known as ‘Caret’ and is used to denote the beginning of a string (or of a line).

/^\/Colou?r/ => Check for a pattern which starts with ‘/Color’ or ‘/Colour’.

The regular expression /^\/Colou?r/ consists of the following parts:

The start-of-line anchor ^ indicates that the regular expression should only match if the pattern appears at the beginning of a string.

The forward slash “/” is a literal character that the regular expression will try to match.

The string “Colou” is a literal string that the regular expression will try to match.

The question mark (?) and the letter “r“:

The question mark indicates that the preceding character (in this case, the letter “u”) is optional. It will match zero or one occurrence of the preceding character.

The letter “r” is a literal character that the regular expression will try to match.

Together, this regular expression will match strings that start with a forward slash “/”, followed by the characters “Colo”, followed by zero or one occurrence of the character “u”, and then the character “r”.

For example,

This regular expression would match the following strings:

“/Colour”: This string starts with “/Colour”.

“/Color”: This string starts with “/Color”.

/Colour/?proid=3456/review

/Color-red/?proid=3456/review

This regular expression would not match the following string:

“/coloura”: This string has a lowercase “c” after the forward slash, and the regular expression is case-sensitive, so it expects an uppercase “C”.
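The matches and non-matches listed above can be verified as follows (the last path is an extra, made-up example showing the effect of the caret):

```javascript
const startsWithColour = /^\/Colou?r/;

console.log(startsWithColour.test("/Colour"));                       // true
console.log(startsWithColour.test("/Color"));                        // true
console.log(startsWithColour.test("/Colour/?proid=3456/review"));    // true
console.log(startsWithColour.test("/Color-red/?proid=3456/review")); // true
console.log(startsWithColour.test("/coloura"));                      // false (lowercase 'c')
console.log(startsWithColour.test("/shop/Colour"));                  // false (not at the start)
```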

/^[nN]ov(ember)? 28(th)?$/

The regular expression /^[nN]ov(ember)? 28(th)?$/ consists of several parts:

The start-of-line anchor ^: This indicates that the regular expression should only match if the pattern appears at the beginning of a string.
The character set [nN]: This matches either the lowercase letter "n" or the uppercase letter "N".
The string "ov": This is a literal string that the regular expression will try to match.
The group (ember)?: This group consists of the string "ember" and the question mark (?). The question mark indicates that the preceding string is optional. It will match zero or one occurrence of the preceding string.
The string " 28": This is a literal string that the regular expression will try to match.
The group (th)?: This group consists of the string "th" and the question mark (?). The question mark indicates that the preceding string is optional. It will match zero or one occurrence of the preceding string.
The end-of-line anchor $: This indicates that the regular expression should only match if the pattern appears at the end of a string.

Together, this regular expression will match strings that consist of either "nov" or "Nov", optionally followed by "ember", followed by " 28", optionally followed by "th", with nothing before or after.

For example, this regular expression would match the following strings:

"Nov 28": The whole string is "Nov 28".
"Nov 28th": The whole string is "Nov 28th".
"november 28": The whole string is "november 28".
"November 28th": The whole string is "November 28th".

It would not match the following strings:

"Nov": This string does not end with "28".
"Nov 28th 29th": This string has extra text after "28th".
"nov ember 28": This string has a space between "nov" and "ember".

/^\/elearning\.html/ => Check for a pattern which starts with ‘/elearning.html’.

The regular expression /^\/elearning\.html/ consists of three parts:

The start-of-line anchor ^: This indicates that the regular expression should only match if the pattern appears at the beginning of a string.
The forward slash "/": This is a literal character that the regular expression will try to match.
The string "elearning.html": This is a literal string that the regular expression will try to match.

Together, this regular expression will match strings that start with a forward slash "/", followed by the characters "elearning.html".

For example, this regular expression would match the following string:

"/elearning.html": This string starts with "/elearning.html".

It would not match the following strings:

"/elearning": This string does not end with ".html".
"elearning.html": This string does not start with a forward slash.

/^\/.*\.php/ => Check for a pattern which starts with ‘/’ and is followed, anywhere later in the string, by ‘.php’ (for example, the path of any file with the .php extension).

The regular expression /^\/.*\.php/ consists of three parts:

The start-of-line anchor ^: This indicates that the regular expression should only match if the pattern appears at the beginning of a string.
The forward slash "/": This is a literal character that the regular expression will try to match.
The group .*\.php: This group consists of the following two parts:

The dot (.) and the asterisk (*): The dot matches any single character, and the asterisk indicates that the preceding character (in this case, the dot) can be matched zero or more times. This group will therefore match any string of characters.
The string ".php": This is a literal string that the regular expression will try to match.

Together, this regular expression will match strings that start with a forward slash "/", followed by any string of characters, followed by the characters ".php".

For example, this regular expression would match the following strings:

"/abc.php": This string starts with "/abc.php".
"/path/to/file.php": This string starts with "/path/to/file.php".

/^\/product-price\.php/ => Check for a pattern which starts with ‘/product-price.php’.

The regular expression /^\/product-price\.php/ consists of three parts:

The start-of-line anchor ^: This indicates that the regular expression should only match if the pattern appears at the beginning of a string.
The forward slash "/": This is a literal character that the regular expression will try to match.
The string "product-price.php": This is a literal string that the regular expression will try to match.

Together, this regular expression will match strings that start with a forward slash "/", followed by the characters "product-price.php".

For example, this regular expression would match the following string:

"/product-price.php": This string starts with "/product-price.php".

Caret also means NOT when used after the opening square bracket.

/[^a]/ => Check for any single character other than the lowercase letter ‘a’.

The regular expression /[^a]/ consists of two parts:

The character set [^a]: This matches any single character that is NOT the letter "a". The caret (^) inside the square brackets indicates that the character set should match any character that is NOT in the set.
The forward slash "/": This indicates the end of the regular expression.

Together, this regular expression will match any single character that is NOT the letter "a".

For example, this regular expression would match the following strings:

“bcd”
“defg”
“hijkl”

/[^B]/ => Check for any single character other than the uppercase letter ‘B’.

For example: the regex /product-[^B]/ will match the following strings:

/shop/men/sales/product-b

/shop/men/sales/product-c

/[^1]/ => Check for any single character other than the number ‘1’.

For example: the regex /proid=[^1]/ will match the string:

/men/product-b?proid=3456&gclid=153dwf3533

but will not match the string:

/men/product-b?proid=1456&gclid=153dwf3533

/[^ab]/ => Check for any single character other than the lowercase letters ‘a’ and ‘b’.

For example: the regex /location=[^ab]/ will match the string:

/shop/collection/prodID=141?location=canada

but will not match the strings:

/shop/collection/prodID=141?location=america

/shop/collection/prodID=141?location=bermuda
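The three page paths above can be tested like this:

```javascript
// 'location=' followed by any character other than 'a' or 'b'
const locationNotAOrB = /location=[^ab]/;

console.log(locationNotAOrB.test("/shop/collection/prodID=141?location=canada"));  // true
console.log(locationNotAOrB.test("/shop/collection/prodID=141?location=america")); // false
console.log(locationNotAOrB.test("/shop/collection/prodID=141?location=bermuda")); // false
```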

/[^aB]/ => Check for any single character other than the lower case letter ‘a’ and uppercase letter ‘B’.

Here are a few examples of strings that will all be matched by this regular expression:

"c": This string consists of a single character that is not "a" or "B".
"xyz": This string consists of three characters that are not "a" or "B".
"123": This string consists of three characters that are not "a" or "B".
"#$%&": This string consists of four characters that are not "a" or "B".

/[^1B]/ => Check for any single character other than the number ‘1’ and uppercase letter ‘B’

Here are a few examples of strings that will all be matched by this regular expression:

"a": This string consists of a single character that is not "1" or "B".
"xyz": This string consists of three characters that are not "1" or "B".
"123": This string is matched because it contains "2" and "3", which are not "1" or "B".
"#$%&": This string consists of four characters that are not "1" or "B".

/[^Dog]/ => Check for any single character other than the following: uppercase letter ‘D’, lowercase letter ‘o’ and the lowercase letter ‘g’.

For example: the regex /location=[^Dog]/ will match:

/shop/collection/prodID=141?location=canada

/shop/collection/prodID=141?location=denmark

but will not match:

/shop/collection/prodID=141?location=Denver

/shop/collection/prodID=141?location=ontario

/shop/collection/prodID=141?location=greenland

/[^123b]/ => Check for any single character other than the following characters: number ‘1’, number ‘2’, number ‘3’ and lowercase letter ‘b’.

Here are a few examples of strings that will all be matched by this regular expression:

"a": This string consists of a single character that is not "1", "2", "3", or "b".
"xyz": This string consists of three characters that are not "1", "2", "3", or "b".
"#$%&": This string consists of four characters that are not "1", "2", "3", or "b".

/[^1-3]/ => Check for any single character other than the following: number ‘1’, number ‘2’ and number ‘3’.

For example: the regex /prodID=[^1-3]/ will match:

/shop/collection/prodID=45321&cid=1313

/shop/collection/prodID=5321&cid=13442

but will not match:

/shop/collection/prodID=12321&cid=1313

/shop/collection/prodID=2321&cid=1313

/shop/collection/prodID=321&cid=1313

/[^0-9]/ => Check for any single character other than a digit.

For example: the regex /de\/[^0-9]/ will match all those pages in the de/ folder whose name doesn’t start with a number:

/de/school-london

/de/general/

but will not match:

/de/12fggtyooo

/[^a-z]/ => Check for any single character which is not a lowercase letter.

For example: the regex /de\/[^a-z]/ will match all those pages in the de/ folder whose name doesn’t start with a lowercase letter:

/de/1london-school
/de/?productid=423543

but will not match:

/de/school/london
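The two de/ folder examples (names not starting with a digit, and names not starting with a lowercase letter) can be checked together:

```javascript
const notStartingWithDigit = /de\/[^0-9]/;
const notStartingWithLowercase = /de\/[^a-z]/;

console.log(notStartingWithDigit.test("/de/school-london")); // true
console.log(notStartingWithDigit.test("/de/general/"));      // true
console.log(notStartingWithDigit.test("/de/12fggtyooo"));    // false

console.log(notStartingWithLowercase.test("/de/1london-school"));    // true
console.log(notStartingWithLowercase.test("/de/?productid=423543")); // true
console.log(notStartingWithLowercase.test("/de/school/london"));     // false
```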

/[^A-Z]/ => Check for any single character which is not an upper case letter.

Here are a few examples of strings that will all be matched by the regular expression /[^A-Z]/:

"a": This string consists of a single character that is not an uppercase letter.
"xyz": This string consists of three characters that are not uppercase letters.
"123": This string consists of three characters that are not uppercase letters.

Metacharacter – Dollar $

‘$’ – It is used to denote the end of a string (or the end of a line).

Examples

/Colou?r$/ => Check for a pattern which ends with ‘Color’ or ‘Colour’

/Nov(ember)?$/ => Check for a pattern which ends with ‘Nov’ or ‘November’

/elearning\.html$/ => Check for a pattern which ends with ‘elearning.html’

/\.php$/ => Check for a pattern which ends with .php

/product-price\.php$/ => Check for a pattern which ends with ‘product-price.php’
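A few of these patterns in action. The test strings are made up for illustration:

```javascript
const endsWithColour = /Colou?r$/;
const endsWithNov = /Nov(ember)?$/;
const endsWithPhp = /\.php$/;

console.log(endsWithColour.test("Favourite Colour")); // true
console.log(endsWithColour.test("Colourful"));        // false ('ful' comes after 'Colour')

console.log(endsWithNov.test("28 November"));         // true
console.log(endsWithNov.test("November 28"));         // false

console.log(endsWithPhp.test("/index.php"));          // true
console.log(endsWithPhp.test("/index.php?id=3"));     // false (query string after .php)
```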

Metacharacter – Square Bracket []

‘[]’ – This square bracket is used to check for any single character in the character set specified in [].

Examples

/[a]/ => Check for a single character which is a lowercase letter ‘a’.

/[ab]/ => Check for a single character which is either a lower case letter ‘a’ or ‘b’.

/[aB]/ => Check for a single character which is either a lower case letter ‘a’ or uppercase letter ‘B’.

/[1B]/ => Check for a single character which is either a number ‘1’ or an uppercase letter ‘B’.

/[Dog]/ => Check for a single character which can be any one of the following: uppercase letter ‘D’, lower case letter ‘o’ or the lowercase letter ‘g’.

/[123b]/ => Check for a single character which can be any one of the following: number ‘1’, number ‘2’, number ‘3’ or lowercase letter ‘b’.

/[1-3]/ => Check for a single character which can be any one number from 1, 2 and 3.

/[0-9]/ => Check for a single character which is a number.

/[a-d]/ => Check for a single character which can be any one of the following lowercase letter: ‘a’, ‘b’, ‘c’ or ‘d’.

/[a-z]/ => Check for a single character which is a lowercase letter.

/[A-Z]/ => Check for a single character which is an upper case letter.

/[A-T]/ => Check for a single character which can be any uppercase letter from ‘A’ to ‘T’.

/[home.php]/ => Check for a single character which can be any one of the following characters: lowercase letter ‘h’, ‘o’, ‘m’, ‘e’ or ‘p’, or the special character ‘.’.

Repeating ‘h’ and ‘p’ inside the square brackets has no extra effect, and the dot is treated as a literal dot inside square brackets. The regex does not check for the string ‘home.php’.

Note: If you want to check for a letter regardless of its case (upper case or lowercase) then use the regex /[a-zA-Z]/.
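Some of the character sets above, tested against made-up strings:

```javascript
const anyLetter = /[a-zA-Z]/;
const oneToThree = /[1-3]/;
const homePhpSet = /[home.php]/; // a set of single characters, not the word 'home.php'

console.log(anyLetter.test("42a"));       // true  (contains the letter 'a')
console.log(anyLetter.test("42"));        // false (no letters)

console.log(oneToThree.test("prodID=2")); // true
console.log(oneToThree.test("prodID=7")); // false

console.log(homePhpSet.test("."));        // true  (the dot is in the set)
console.log(homePhpSet.test("xyz"));      // false (none of x, y, z is in the set)
```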

Metacharacter – Parenthesis ()

‘()’ – These are known as parentheses and are used to group characters so that they are checked for together as a string.

Examples

/(a)/ => Check for string ‘a’

/(ab)/ => Check for string ‘ab’

/(dog)/ => Check for string ‘dog’

/(dog123)/ => Check for string ‘dog123’

/(0-9)/ => Check for string ‘0-9’

/(A-Z)/ => Check for string ‘A-Z’

/(a-z)/ => Check for string ‘a-z’

/(123dog588)/ => Check for string ‘123dog588’

Note: () is also used to capture (store) the text matched inside it so that it can be reused later. For e.g. /^(.*)$/
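In JavaScript, the text stored by a pair of parentheses is available on the match result. The page path and the pattern below are made up for illustration:

```javascript
const pagePath = "/shop/men/sales/product-b";

// The parentheses store everything that follows '/shop/'.
const match = pagePath.match(/^\/shop\/(.*)$/);
const captured = match ? match[1] : null;
console.log(captured); // men/sales/product-b

// For plain matching, /(dog)/ behaves like /dog/.
console.log(/(dog)/.test("hotdog stand")); // true
console.log(/dog/.test("hotdog stand"));   // true
```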

Metacharacter – Question Mark ?

‘?’ is used to check for zero or one occurrence of the preceding character.

For example:

/[a]?/ => Check for zero or one occurrence of the lowercase letter ‘a’.

The regular expression /[a]?/ consists of two parts:

The character set [a]: This matches a single character, the letter "a".
The question mark (?): This indicates that the preceding character set or pattern is optional. It will match zero or one occurrence of the preceding character set or pattern.

Together, this regular expression will match zero or one occurrence of the letter "a".

For example, the following strings will all be matched by this regular expression:

“” (empty string, no occurrences of “a”)
“a” (one occurrence of “a”)
“abc” (one occurrence of “a”)
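Because zero occurrences are allowed, this pattern reports a match for every string; what differs is the text that is actually matched. A quick check in JavaScript (the string “xyz” is an extra, made-up example):

```javascript
const optionalA = /[a]?/;

console.log(optionalA.test(""));    // true
console.log(optionalA.test("xyz")); // true (zero occurrences of 'a' still counts)

// The matched text shows the difference:
const matchedInAbc = "abc".match(optionalA)[0]; // 'a'
const matchedInXyz = "xyz".match(optionalA)[0]; // '' (empty match)
console.log(JSON.stringify(matchedInAbc), JSON.stringify(matchedInXyz));
```

This is why ? is normally combined with other characters, as in /colou?r/, rather than used on its own.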

/[dog]?/ => Check for zero or one occurrence of the lowercase letter ‘d’, ‘o’ or ‘g’.

The regular expression /[dog]?/ consists of two parts:

The character set [dog]: This matches a single character that is either "d", "o", or "g".
The question mark (?): This indicates that the preceding character set or pattern is optional. It will match zero or one occurrence of the preceding character set or pattern.

Together, this regular expression will match zero or one occurrence of the characters "d", "o", or "g".

For example, the following strings will all be matched by this regular expression:

“” (empty string, no occurrences of “d”, “o”, or “g”)
“d” (one occurrence of “d”)
“o” (one occurrence of “o”)
“g” (one occurrence of “g”)

/[^dog]?/ => Check for zero or one occurrence of a character which is not the lowercase letter ‘d’, ‘o’ or ‘g’.

The regular expression /[^dog]?/ consists of two parts:

The character set [^dog]: This matches any character that is NOT "d", "o", or "g". The caret (^) inside the square brackets indicates that the character set should match any character that is NOT in the set.
The question mark (?): This indicates that the preceding character set or pattern is optional. It will match zero or one occurrence of the preceding character set or pattern.

Together, this regular expression will match zero or one occurrence of any character that is NOT "d", "o", or "g".

For example, the following strings will all be matched by this regular expression:

“” (empty string, no occurrences of characters that are not “d”, “o”, or “g”)
“a” (one occurrence of a letter that is not “d”, “o”, or “g”)
“!” (one occurrence of a punctuation mark that is not “d”, “o”, or “g”)
“a!” (the regex matches at most one character at a time, so the first match is the single character “a”)
“a!0” (likewise, the first match is the single character “a”)

/[0-9]?/ => Check for zero or one occurrence of a number.

The regular expression /[0-9]?/ consists of two parts:

The character set [0-9]: This matches any single digit between 0 and 9, inclusive. The range 0-9 inside the square brackets indicates that the character set should match any character that is within that range.
The question mark (?): This indicates that the preceding character set or pattern is optional. It will match zero or one occurrence of the preceding character set or pattern.

Together, this regular expression will match zero or one occurrence of any single digit between 0 and 9, inclusive.

For example, the following strings will all be matched by this regular expression:

“” (empty string, no occurrences of digits)
“0” (one occurrence of “0”)
“9” (one occurrence of “9”)

/[^a-z]?/ => Check for zero or one occurrence of a character which is not a lowercase letter.

The regular expression /[^a-z]?/ consists of two parts:

The character set [^a-z]: This matches any character that is not a lowercase letter in the alphabet. The caret (^) inside the square brackets indicates that the character set should match any character that is NOT in the set.
The question mark (?): This indicates that the preceding character set or pattern is optional. It will match zero or one occurrence of the preceding character set or pattern.

Together, this regular expression will match zero or one occurrence of any character that is NOT a lowercase letter in the alphabet.

For example, the following strings will all be matched by this regular expression:

“abc123” – This string matches, but not in the way you might expect. Because ‘?’ also allows zero occurrences, the first match is the empty string at the very start of the string (before the a character), not the 1 character.
“Abc” – This string matches because it starts with an uppercase letter, which is not a lowercase letter. The match is the A character.

Note: Because ‘?’ allows zero occurrences, this regular expression can never fail to match. It also matches the following string:

“abc” – This string contains only lowercase letters, so the match is the empty string at the start of the string.

Metacharacter – Plus +

‘+‘ is used to check for one or more occurrences of the preceding character.

For example:

/[a]+/ => Check for one or more occurrences of the lowercase letter ‘a’.

For example, the following strings will all be matched by this regular expression:

“a” (one occurrence of “a”)
“aa” (two occurrences of “a”)
“aaa” (three occurrences of “a”)

/[dog]+/ => Check for one or more occurrences of letters ‘d’, ‘o’ or ‘g’ (in any order).

For example, the following strings will all be matched by this regular expression:

“d” (one occurrence of “d”)
“o” (one occurrence of “o”)
“g” (one occurrence of “g”)
“dog” (one occurrence each of “d”, “o”, and “g”)
“god” (one occurrence each of “g”, “o”, and “d”)
“godog” (two occurrences each of “g” and “o”, and one occurrence of “d”)

/[548]+/ => Check for one or more occurrences of numbers ‘5’, ‘4’ or ‘8’ (in any order).

For example, the following strings will all be matched by this regular expression:

“5” (one occurrence of “5”)
“4” (one occurrence of “4”)
“8” (one occurrence of “8”)
“548” (one occurrence each of “5”, “4”, and “8”)
“854” (one occurrence each of “8”, “5”, and “4”)
“854458” (two occurrences each of “8”, “5”, and “4”)

/[0-9]+/ => Check for one or more occurrences of a number.

For example, the following strings will all be matched by this regular expression:

“0” (one occurrence of “0”)
“9” (one occurrence of “9”)
“01” (one occurrence each of “0” and “1”)
“09” (one occurrence each of “0” and “9”)
“0123456789” (one occurrence each of “0” through “9”)

/[a-z]+/ => Check for one or more occurrences of a lowercase letter.

For example, the following strings will all be matched by this regular expression:

“a” (one occurrence of “a”)
“z” (one occurrence of “z”)
“ab” (one occurrence each of “a” and “b”)
“az” (one occurrence each of “a” and “z”)
“abcdefghijklmnopqrstuvwxyz” (one occurrence each of “a” through “z”)

/[^a-z]+/ => Check for one or more characters which are not lowercase letters.

For example, the following strings will all be matched by this regular expression:

“0” (one occurrence of a digit that is not a lowercase letter)
“!” (one occurrence of a punctuation mark that is not a lowercase letter)
“0!” (one occurrence each of a digit and a punctuation mark that are not lowercase letters)
“0!A” (one occurrence each of a digit, a punctuation mark, and an uppercase letter that are not lowercase letters)

/[a-zA-Z]+/ => Check for one or more occurrences of uppercase and lowercase letters.

For example, the following strings will all be matched by this regular expression:

“a” (one occurrence of “a”)
“z” (one occurrence of “z”)
“A” (one occurrence of “A”)
“Z” (one occurrence of “Z”)
“ab” (one occurrence each of “a” and “b”)
“az” (one occurrence each of “a” and “z”)
“AZ” (one occurrence each of “A” and “Z”)
“aA” (one occurrence each of “a” and “A”)
“abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ” (one occurrence each of “a” through “z” and “A” through “Z”)

/[a-z0-9]+/ => Check for one or more occurrences of lowercase letters and numbers.

For example, the following strings will all be matched by this regular expression:

“a” (one occurrence of “a”)
“z” (one occurrence of “z”)
“0” (one occurrence of “0”)
“9” (one occurrence of “9”)
“ab” (one occurrence each of “a” and “b”)
“az” (one occurrence each of “a” and “z”)
“09” (one occurrence each of “0” and “9”)
“a0” (one occurrence each of “a” and “0”)
“abcdefghijklmnopqrstuvwxyz0123456789” (one occurrence each of “a” through “z” and “0” through “9”)

/[A-Z0-9]+/ => Check for one or more occurrences of uppercase letters and numbers.

For example, the following strings will all be matched by this regular expression:

“A” (one occurrence of “A”)
“Z” (one occurrence of “Z”)
“0” (one occurrence of “0”)
“9” (one occurrence of “9”)
“AB” (one occurrence each of “A” and “B”)
“AZ” (one occurrence each of “A” and “Z”)
“09” (one occurrence each of “0” and “9”)
“A0” (one occurrence each of “A” and “0”)
“ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789” (one occurrence each of “A” through “Z” and “0” through “9”)

/[^9]+/ => Check for one or more occurrences of characters but not the number 9.

For example, the following strings will all be matched by this regular expression:

“a” (one occurrence of a letter that is not “9”)
“!” (one occurrence of a punctuation mark that is not “9”)
“a!” (one occurrence each of a letter and a punctuation mark that are not “9”)
“a!0” (one occurrence each of a letter, a punctuation mark, and a digit that are not “9”)

/31+/ => Check for the number 3 followed by one or more occurrences of the number 1. The ‘+’ applies only to the character immediately before it (here ‘1’), not to the whole sequence ‘31’.

For example, the following strings will all be matched by this regular expression:

“31” (“3” followed by one “1”)
“311” (“3” followed by two “1”s)
“3111” (“3” followed by three “1”s)

However, the following strings will not be matched:

“” (an empty string, which contains no “3”)
“3” (one occurrence of “3”, but not followed by “1”)
“1” (one occurrence of “1”, but not preceded by “3”)
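A quick way to check what ‘+’ actually repeats is to compare a pattern with and without parentheses. The sketch below uses the made-up strings ‘abbb’ and ‘abab’, and I have anchored both patterns with ^ and $ so that the whole string has to match.

```javascript
// Without parentheses, "+" repeats only the character "b".
const repeatLastChar = /^ab+$/;
// With parentheses, "+" repeats the whole group "ab".
const repeatGroup = /^(ab)+$/;

console.log(repeatLastChar.test('abbb')); // true
console.log(repeatLastChar.test('abab')); // false
console.log(repeatGroup.test('abab'));    // true
console.log(repeatGroup.test('abbb'));    // false
```

So if you want to repeat a whole string rather than a single character, wrap it in parentheses first.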

Metacharacter – Multiply *

‘*‘ is used to check for any number of occurrences (including zero occurrences) of the preceding character.

For example:

/[a]*/ => Check for zero or more occurrences of the lowercase letter ‘a’.

For example, the following strings will all be matched by this regular expression:

“” (an empty string, zero occurrences of “a”)
“a” (one occurrence of “a”)
“aa” (two occurrences of “a”)
“aaa” (three occurrences of “a”)

/[dog]*/ => Check for zero or more occurrences of letters ‘d’, ‘o’ or ‘g’ (in any order).

For example, the following strings will all be matched by this regular expression:

“” (an empty string, zero occurrences of “d”, “o”, or “g”)
“d” (one occurrence of “d”)
“g” (one occurrence of “g”)
“o” (one occurrence of “o”)
“dog” (one occurrence each of “d”, “o”, and “g”)
“god” (one occurrence each of “g”, “o”, and “d”)
“ogd” (one occurrence each of “o”, “g”, and “d”)

/[548]*/ => Check for zero or more occurrences of numbers ‘5’, ‘4’ or ‘8’ (in any order).

For example, the following strings will all be matched by this regular expression:

“” (an empty string, zero occurrences of “5”, “4”, or “8”)
“5” (one occurrence of “5”)
“4” (one occurrence of “4”)
“8” (one occurrence of “8”)
“54” (one occurrence each of “5” and “4”)
“85” (one occurrence each of “8” and “5”)
“548” (one occurrence each of “5”, “4”, and “8”)

/[0-9]*/ => Check for zero or more occurrences of a number.

For example, the following strings will all be matched by this regular expression:

“” (an empty string, zero occurrences of “0” through “9”)
“0” (one occurrence of “0”)
“9” (one occurrence of “9”)
“89” (one occurrence each of “8” and “9”)
“1234” (one occurrence each of “1”, “2”, “3”, and “4”)

/[a-z]*/ => Check for zero or more occurrences of a lowercase letter.

For example, the following strings will all be matched by this regular expression:

“” (an empty string, zero occurrences of “a” through “z”)
“a” (one occurrence of “a”)
“z” (one occurrence of “z”)
“az” (one occurrence each of “a” and “z”)
“abc” (one occurrence each of “a”, “b”, and “c”)

/[^a-z]*/ => Check for zero or more characters which are not lowercase letters.

For example, the following strings will all be matched by this regular expression:

“” (an empty string, zero occurrences of any characters other than “a” through “z”)
“0” (one occurrence of a digit that is not a lowercase letter)
“Z” (one occurrence of an uppercase letter that is not a lowercase letter)
“!” (one occurrence of a punctuation mark that is not a lowercase letter)
“0Z!” (one occurrence each of a digit, an uppercase letter, and a punctuation mark that are not lowercase letters)

/[a-zA-Z]*/ => Check for zero or more occurrences of uppercase and lowercase letters.

For example, the following strings will all be matched by this regular expression:

“” (an empty string, zero occurrences of “a” through “z”)
“a” (one occurrence of “a”)
“z” (one occurrence of “z”)
“A” (one occurrence of “A”)
“Z” (one occurrence of “Z”)
“az” (one occurrence each of “a” and “z”)
“azAZ” (one occurrence each of “a”, “z”, “A”, and “Z”)

/[a-z0-9]*/ => Check for zero or more occurrences of lowercase letters and numbers.

For example, the following strings will all be matched by this regular expression:

“” (an empty string, zero occurrences of “a” through “z” or “0” through “9”)
“a” (one occurrence of “a”)
“z” (one occurrence of “z”)
“0” (one occurrence of “0”)
“9” (one occurrence of “9”)
“az” (one occurrence each of “a” and “z”)
“09” (one occurrence each of “0” and “9”)
“abc123” (one occurrence each of “a”, “b”, “c”, “1”, “2”, and “3”)

/[A-Z0-9]*/ => Check for zero or more occurrences of uppercase letters and numbers.

For example, the following strings will all be matched by this regular expression:

“” (an empty string, zero occurrences of “A” through “Z” or “0” through “9”)
“A” (one occurrence of “A”)
“Z” (one occurrence of “Z”)
“0” (one occurrence of “0”)
“9” (one occurrence of “9”)
“AZ” (one occurrence each of “A” and “Z”)
“09” (one occurrence each of “0” and “9”)
“ABC123” (one occurrence each of “A”, “B”, “C”, “1”, “2”, and “3”)

/[^9]*/ => Check for zero or more occurrences of characters but not the number 9.

For example, the following strings will all be matched by this regular expression:

“” (an empty string, zero occurrences of any characters other than “9”)
“a” (one occurrence of a letter that is not “9”)
“0” (one occurrence of a digit that is not “9”)
“!” (one occurrence of a punctuation mark that is not “9”)
“a0!” (one occurrence each of a letter, a digit, and a punctuation mark that are not “9”)

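/31*/ => Check for the number 3 followed by zero or more occurrences of the number 1. The ‘*’ applies only to the character immediately before it (here ‘1’), not to the whole sequence ‘31’.

For example, the following strings will all be matched by this regular expression:

“3” (“3” followed by zero “1”s)
“31” (“3” followed by one “1”)
“311” (“3” followed by two “1”s)
“3111” (“3” followed by three “1”s)

However, the following strings will not be matched, because they do not contain a “3”:

“” (an empty string)
“1” (one occurrence of “1”, but not preceded by “3”)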
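Because ‘*’ also accepts zero occurrences, an unanchored pattern built only from ‘*’ matches any string at all. The sketch below, with test strings I made up, compares an unanchored and an anchored version of the same pattern.

```javascript
// Unanchored: "[0-9]*" can match zero digits, so it matches any string.
const unanchoredDigits = /[0-9]*/;
// Anchored: the whole string must consist of zero or more digits.
const anchoredDigits = /^[0-9]*$/;

console.log(unanchoredDigits.test('abc')); // true (empty match)
console.log(anchoredDigits.test('abc'));   // false
console.log(anchoredDigits.test('2024'));  // true
console.log(anchoredDigits.test(''));      // true (zero digits)
```

If an empty value should not pass, ‘+’ is usually the safer choice than ‘*’.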

Metacharacter – Dot .

‘.’ is used to check for a single character (any character that can be typed via a keyboard) other than a line break character (\n).

Here are some examples of strings that would match the regular expression /./:

a
1
#
hello
goodbye
123
abc

Similarly, the regular expression: /Action ., Scene2/ would match the following strings:

Action 1, Scene2
Action A, Scene2
Action 9, Scene2
Action &, Scene2

Here are some examples of strings that would not match the regular expression /Action ., Scene2/

Action123, Scene2 (does not contain a space character after Action)
Action , Scene2 (does not contain any character between the space and the comma)
Action,Scene2 (does not contain a space character after the comma)
Action Scene2 (does not contain a comma)
Scene2, Action (characters are not in the correct order)
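The difference between the dot metacharacter and a literal dot is easy to check in JavaScript. In the sketch below I have anchored the patterns with ^ and $, and the ‘index.html’ file name is only an assumed example.

```javascript
// "." matches any single character except a line break.
const anyCharPattern = /^Action ., Scene2$/;
// "\." matches only a literal dot.
const literalDotPattern = /^index\.html$/;

console.log(anyCharPattern.test('Action 1, Scene2'));  // true
console.log(anyCharPattern.test('Action 12, Scene2')); // false
console.log(literalDotPattern.test('index.html'));     // true
console.log(literalDotPattern.test('indexyhtml'));     // false
```

Whenever you mean a real dot, as in a file extension or a domain name, escape it with a backslash.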

Metacharacter – Pipe Symbol |

The metacharacter ‘|’ is used to create the logical OR condition.

For example:

The regular expression /His|Her/ will match any string that contains either the string ‘His‘ or the string ‘Her‘.

Here are some examples of strings that would match this regular expression:

His
Her
His book
Her book
His or Her book

Because the pattern is not anchored, it only needs to appear somewhere in the string. So the following strings match this regular expression too:

HisHer (contains both His and Her)
HisOrHer (contains both His and Her)
Hers (contains Her)

Here are some examples of strings that would not match this regular expression:

book (does not contain either His or Her)
this is his book (contains his in lowercase but not His, assuming a case-sensitive match)

Another example:

The regular expression /his|her|^their|its*|our+/ will match any string that contains any of the following patterns:

The string ‘his‘
The string ‘her‘
The string ‘their‘ at the start of the string
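The string ‘it’ followed by zero or more occurrences of the letter ‘s’ (the ‘*’ applies only to the ‘s’)
The string ‘ou’ followed by one or more occurrences of the letter ‘r’ (the ‘+’ applies only to the ‘r’)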

Here are some examples of strings that would match this regular expression:

his
her
their book
their cat
its cat
its cat and its dog
our cat
our cat and our dog

The strings hiss and herr also match this regular expression, because they contain his and her respectively.

Here are some examples of strings that would not match this regular expression:

book (does not contain his, her, their, it, or our)
cat (does not contain his, her, their, it, or our)
cat and dog (does not contain his, her, their, it, or our)
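If you need the whole string to be exactly one of the alternatives, group the alternatives and anchor the group. Here is a minimal JavaScript sketch of the difference, using test strings I picked for illustration.

```javascript
// Unanchored alternation: matches if "his" or "her" appears anywhere.
const containsPronoun = /his|her/;
// Grouped and anchored: the whole string must be exactly "his" or "her".
const exactPronoun = /^(his|her)$/;

console.log(containsPronoun.test('hiss')); // true
console.log(exactPronoun.test('hiss'));    // false
console.log(exactPronoun.test('her'));     // true
console.log(containsPronoun.test('book')); // false
```

The parentheses matter here. Without them, ^his|her$ would mean ‘starts with his’ OR ‘ends with her’.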

Metacharacter – Exclamation !

In JavaScript regex, the exclamation symbol ‘!’ is not a metacharacter on its own. It simply matches the literal character ‘!’. It only gets a special meaning as part of the negative lookahead syntax (?! ), which is covered in the section on inverting regex.

Note: To create a logical NOT condition for a single character, use a negated character set like [^a-z] instead.

Examples

/![a-z]/ => Check for the character ‘!’ followed by a lowercase letter, like ‘!a’.
/[!a-z]/ => Check for a single character, either ‘!’ or a lowercase letter.
/[^a-z]/ => Check for a single character which is not a lowercase letter.
/!(abc)/ => Check for the string ‘!abc’.
/(!abc)/ => Check for the string ‘!abc’.
/![0-9]/ => Check for the character ‘!’ followed by a number, like ‘!7’.
/[!0-9]/ => Check for a single character which is either ‘!’ or a number.
/[^0-9]/ => Check for a single character which is not a number.
/a!b/ => Check for the string ‘a!b’.

Metacharacter – Curly Brackets {}

{} is used to check for an exact number, or a range, of occurrences of the preceding character.

It is just like the metacharacter ‘+’ but it provides more control over the number of occurrences of the preceding character you want to match.

For example:

1{1} => check for 1 occurrence of the character ‘1’. This regex will match 1

1{2} => check for 2 occurrences of the character ‘1’. This regex will match 11

1{3} =>check for 3 occurrences of the character ‘1’. This regex will match 111

1{4} => check for 4 occurrences of the character ‘1’. This regex will match 1111

1{1,4} =>check for 1 to 4 occurrences of the character ‘1’. This regex will match 1,11, 111, 1111

[0-9]{2} => check for 2 occurrences of a number or in other words, check for two digits number like 12

[0-9]{3} => check for 3 occurrences of a number or in other words check for three digits number like 123

[0-9]{4} => check for 4 digits number like 1234

[0-9]{1,4} => check for 1 to 4 digits number.

[a]{1} => check for 1 occurrence of the character ‘a’. This regex will match a

[a]{2} => check for 2 occurrences of the character ‘a’. This regex will match aa

[a]{3} =>check for 3 occurrences of the character ‘a’. This regex will match aaa

[a]{4} => check for 4 occurrences of the character ‘a’. This regex will match aaaa

[a]{1,4} =>check for 1 to 4 occurrences of the character ‘a’. This regex will match a,aa,aaa,aaaa

[a-z]{2} => check for 2 occurrences of a lower case letter. This regex will match aa, ab, bc, zz etc

[A-Z]{3} => check for 3 occurrences of an upper case letter. This regex will match AAA, ABC, XYZ etc

[a-zA-Z]{2} => check for 2 occurrences of a letter (doesn’t matter whether it is upper case or lower case). This regex will match aa, aA, Aa, AA etc

[a-zA-Z]{1,4} => check for 1 to 4 occurrences of a letter (doesn’t matter whether it is upper case or lower case). This regex will match aaaa, AAAA, aAAA, AAAa etc

(rock){1} => check for 1 occurrence of the string ‘rock’. This regex will match: rock

(rock){2} => check for 2 occurrences of the string ‘rock’. This regex will match: rockrock

(rock){3} => check for 3 occurrences of the string ‘rock’. This regex will match: rockrockrock

(rock){1,4} => check for 1 to 4 occurrences of the string ‘rock’. This regex will match: rock, rockrock, rockrockrock, rockrockrockrock
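Here is a short JavaScript sketch of a few of these patterns. I have added the ^ and $ anchors so that the whole string has to fit the count, and the test strings are my own examples.

```javascript
// Exactly four digits, e.g. a year.
const fourDigits = /^[0-9]{4}$/;
// Between one and four digits.
const oneToFourDigits = /^[0-9]{1,4}$/;
// The group "rock" repeated exactly twice.
const rockTwice = /^(rock){2}$/;

console.log(fourDigits.test('1234'));    // true
console.log(fourDigits.test('12345'));   // false
console.log(oneToFourDigits.test('12')); // true
console.log(oneToFourDigits.test(''));   // false
console.log(rockTwice.test('rockrock')); // true
console.log(rockTwice.test('rock'));     // false
```

Without the anchors, [0-9]{4} would also match inside a longer number like 12345, because it only needs to find four digits somewhere in the string.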

Metacharacter – White Spaces

To create white space in a regular expression, just use the white space. For e.g.

/(Himanshu Sharma)/ => Check for the string ‘Himanshu Sharma’

/Himanshu Sharma/ => Check for the string ‘Himanshu Sharma’

Inverting Regex in JavaScript

Inverting a regex means inverting its meaning.

You can invert a regex in JavaScript by using positive and negative lookaheads.

Use positive lookahead if you want to match something that is followed by something else.

Use negative lookahead if you want to match something not followed by something else.

Positive Lookahead starts with (?= and ends with )

Negative Lookahead starts with (?! and ends with )

For example, the regex de\/[^a-z] will match all those pages in the de/ folder whose name doesn’t start with a lower case letter:

/de/1london-school
/de/?productid=423543

but will not match:

/de/school/london

The invert of this regular expression would be: match all those pages in the de/ folder whose name starts with a lower case letter:

For example: the regex de\/(?![^a-z]) will match:

/de/school/london

but will not match:

/de/1london-school
/de/?productid=423543

Note: Modern JavaScript engines (ES2018 and later) support both lookahead and lookbehind, while older engines support only lookahead. Google Analytics doesn’t support either lookahead or lookbehind.
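The sketch below runs the inverted regex from this section against the same example pages in plain JavaScript (for example in a browser console). The positive lookahead version is my own addition for comparison, not something the example above relies on.

```javascript
// Negative lookahead: "de/" NOT followed by a non-lowercase character.
const startsWithLowercase = /de\/(?![^a-z])/;
// Positive lookahead: "de/" followed by a lowercase letter.
const followedByLowercase = /de\/(?=[a-z])/;

const pages = ['/de/school/london', '/de/1london-school', '/de/?productid=423543'];

const negativeResults = pages.map(function (p) { return startsWithLowercase.test(p); });
const positiveResults = pages.map(function (p) { return followedByLowercase.test(p); });

console.log(negativeResults); // [ true, false, false ]
console.log(positiveResults); // [ true, false, false ]

// The two differ on a path that ends right after "de/".
console.log(startsWithLowercase.test('/de/')); // true
console.log(followedByLowercase.test('/de/')); // false
```

Both versions agree on the three example pages. They differ only when nothing follows de/, because the negative lookahead is satisfied by the end of the string while the positive lookahead needs an actual lowercase letter.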

More Regex Examples

^(.*)\.html$ => Check for any number of characters before .html and store them in a variable.

^dog$ => Check for the string ‘dog’

^a+$ => Check for one or more occurrences of a lower case letter ‘a’

^(abc)+$ => Check for one or more occurrences of the string ‘abc’.

^[a-z]+$ => Check for one or more occurrences of a lower case letter.

^(abc)*$ => Check for any number of occurrences of the string ‘abc’.

^a*$ => Check for any number of occurrences of the lower case letter ‘a’

#. Find all the files which start from ‘elearning’ and which have the ‘.html’ file extension

^elearning.*\.html$

#. Find all the PHP files

^.*\.php$
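As a rough check of the two file patterns, here is one way to write and test them in JavaScript. The patterns use ‘.*’ for ‘any number of characters’, and the file names are hypothetical.

```javascript
// Hypothetical file names used only for illustration.
const files = ['elearning-intro.html', 'elearning.php', 'about.html', 'contact.php'];

// Starts with "elearning", then any characters, then ends with ".html".
const elearningHtml = /^elearning.*\.html$/;
// Any characters, then ends with ".php".
const phpFiles = /^.*\.php$/;

const elearningMatches = files.filter(function (f) { return elearningHtml.test(f); });
const phpMatches = files.filter(function (f) { return phpFiles.test(f); });

console.log(elearningMatches); // [ 'elearning-intro.html' ]
console.log(phpMatches);       // [ 'elearning.php', 'contact.php' ]
```

Note that ‘*’ on its own is not a wildcard in regex. It always needs something before it to repeat, which is why the dot is there.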

Advantages of using REGEX in Google Tag Manager

There are many cases where regular expressions are very useful in Google Tag Manager.

Some such cases are:

Setting up complex triggers in GTM.
Using the regex table variable in Google Tag Manager.
Using regex in a custom JavaScript variable.

#1 Setting up complex triggers in GTM:

#2 Using the regex table variable in Google Tag Manager.

#3 Using regex in a custom JavaScript variable.

You can use regex in custom JavaScript variables like when Tracking Site Search without Query Parameter in Google Tag Manager:


Advantages of using REGEX in Universal Analytics

There are many cases where regular expressions are very useful in GA3. Some such cases are:

Setting up a goal which can match multiple-goal pages instead of one.
Setting up a funnel in which a funnel step can match multiple pages instead of one.
Excluding traffic from an IP address range via filters.
Setting up complex custom segments.
Understanding the commercial value of long-tail keywords.
Rewriting URLs in GA3 reports.
Filtering data based on complex patterns within the GA3 reporting interface.
Finding referrer spam in Google Analytics.
Blocking spam referrers through the custom advanced filter in Google Analytics.
Using regex while creating content groups in Google Analytics.
Using regex while creating Channel grouping in Google Analytics.
Using regex in the table filter.
Using regex in dashboard widgets.
Using regex while building audiences.
Using regex while tracking site search without query parameter.
Using regex while debugging Google Analytics tracking issues.

#1 Setting up a goal which can match multiple-goal pages instead of one.

You can create one goal that matches multiple pages.

Suppose that after completing a transaction or generating a lead, your users are redirected to a thank-you page, and that the thank-you page URL is not the same for everyone, like /product/thank-you/ and /product2/thank-you/.

In this case, we can create one goal in Google Analytics that matches every thank-you page, as below:

Destination URL matches regex thank\-you\/$
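Before saving a goal like this, it is worth checking the pattern against a few URLs. Here is a minimal JavaScript sketch. The first two URLs come from the example above, and the last two are ones I made up as negative cases.

```javascript
// The goal pattern from the text: URLs that end with "thank-you/".
const thankYouPattern = /thank\-you\/$/;

const urls = [
  '/product/thank-you/',
  '/product2/thank-you/',
  '/product/thank-you/details/', // hypothetical, should not match
  '/product/'                    // hypothetical, should not match
];

const goalHits = urls.filter(function (u) { return thankYouPattern.test(u); });
console.log(goalHits); // [ '/product/thank-you/', '/product2/thank-you/' ]
```

The $ at the end is what stops the goal from firing on deeper pages that merely contain thank-you/ in the middle of the URL.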

#2 Setting up a funnel in which a funnel step can match multiple pages instead of one.

A proper sales funnel has multiple pages.

For example, in a standard sales funnel, a user lands on a home page which is the first step of the funnel.

After landing on the home page, the user may go to various categories in search of specific products.

In this case, different category pages will have different URLs, and if you want to add all your category URLs into the sales funnels, the regex should be your first option to go with.

In fact, when you set up a funnel, all URLs are treated as regular expressions:

#3. Excluding traffic from an IP address range via filters

Big organizations generally own a range of IP addresses.

Therefore to exclude an organization’s internal traffic, you need to specify an IP range using a regex:

In fact, many filters require regular expressions.
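To give a feel for what an IP range regex can look like, here is a small sketch. The range 192.168.1.1 to 192.168.1.25 is purely an assumed example, so replace it with your own range.

```javascript
// Assumed range for illustration: 192.168.1.1 to 192.168.1.25.
// Last octet: 1-9, 10-19, or 20-25.
const internalIpPattern = /^192\.168\.1\.([1-9]|1[0-9]|2[0-5])$/;

console.log(internalIpPattern.test('192.168.1.1'));  // true
console.log(internalIpPattern.test('192.168.1.25')); // true
console.log(internalIpPattern.test('192.168.1.26')); // false
console.log(internalIpPattern.test('192.168.1.0'));  // false
console.log(internalIpPattern.test('192.168.10.5')); // false
```

The dots are escaped so that they match only a real dot, and the anchors stop the pattern from matching a longer address that merely contains the range.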

#4 Setting up complex custom segments.

In Google Analytics, a segment allows you to narrow down data in a large dataset.

If your segment contains multiple conditions or logic, you can use regex to define all those conditions in the segment builder.

For example, you can create a segment that filters out branded keywords.

#5. Understanding the commercial value of long-tail keywords:

Long-tail keywords give you extra benefits such as less competition, good amounts of traffic, as well as higher conversions.

You can create a segment for the long tail keywords using regular expressions in Google Analytics.

For example, you can create long-tail keyword segments using the following regular expressions

^[^\.\s\-]+([\.\s\-]+[^\.\s\-]+){0}$ =>Filter 1 word keyword phrase

^[^\.\s\-]+([\.\s\-]+[^\.\s\-]+){1}$ =>Filter 2 word keyword phrase

^[^\.\s\-]+([\.\s\-]+[^\.\s\-]+){2}$ =>Filter 3 word keyword phrase

^[^\.\s\-]+([\.\s\-]+[^\.\s\-]+){3}$ =>Filter 4 word keyword phrase

^[^\.\s\-]+([\.\s\-]+[^\.\s\-]+){4}$ =>Filter 5 word keyword phrase
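All five expressions follow the same template, and only the number in the curly brackets changes (the number of words minus one). The sketch below builds the pattern for any word count in JavaScript. The helper name keywordLengthPattern and the sample keywords are my own assumptions.

```javascript
// Builds the tutorial's pattern for a phrase with exactly `wordCount` words.
// Words are runs of characters other than ".", whitespace, and "-".
function keywordLengthPattern(wordCount) {
  return new RegExp(
    '^[^\\.\\s\\-]+([\\.\\s\\-]+[^\\.\\s\\-]+){' + (wordCount - 1) + '}$'
  );
}

console.log(keywordLengthPattern(1).test('analytics'));                   // true
console.log(keywordLengthPattern(2).test('google analytics'));            // true
console.log(keywordLengthPattern(3).test('google analytics'));            // false
console.log(keywordLengthPattern(4).test('google tag manager tutorial')); // true
```

Since a dot or a hyphen also counts as a word separator in this pattern, a keyword like ‘e-commerce’ is treated as two words.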

#6. Rewriting URLs in GA3 reports.

Regex is commonly used when rewriting URLs in GA3 reports.

For example, you can use the following custom advanced filter to append the hostname to the request URI:

Note: You can also rewrite URLs in GA3 reports with the ‘search and replace‘ advanced filter.

This comes in handy when your website has very long ugly dynamic URLs, and you can’t figure out what the page is all about just by looking at its URL.

For example, with the ‘Search & Replace’ advanced filter, you can ask GA to report the following URL:

https://www.abc.com/fder/?catg=2341&pid=428

as:

https://www.abc.com/outdoor/fleeces

#7. Filtering data based on complex patterns within the GA3 reporting interface.

For example, the following regex can segment all the traffic that comes from social media websites:

twitter\.com|facebook\.com|linkedin\.com|plus\.google\.com|t\.co|bit\.ly|reddit\.com
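Here is a quick JavaScript check of this pattern. The referrer values are hypothetical, and I have included pinterest.com on purpose to show a side effect of an unanchored pattern.

```javascript
// The social media pattern from the text.
const socialPattern = /twitter\.com|facebook\.com|linkedin\.com|plus\.google\.com|t\.co|bit\.ly|reddit\.com/;

// Hypothetical referrer values used only for illustration.
const referrers = ['facebook.com', 'm.facebook.com', 't.co', 'example.org', 'google.com'];
const socialReferrers = referrers.filter(function (r) { return socialPattern.test(r); });

console.log(socialReferrers); // [ 'facebook.com', 'm.facebook.com', 't.co' ]

// Side effect: "t\.co" also matches inside any domain ending in "t.com".
console.log(socialPattern.test('pinterest.com')); // true
```

Because t\.co is not anchored, it matches any referrer that contains ‘t.co’, which includes every domain ending in ‘t.com’. If that is too broad for your report, anchor that alternative, for example ^t\.co$.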

#8 Finding referrer spam in Google Analytics.

For example, you can use the following regex (not foolproof) to filter out all the spam referrers in the ‘Referrals’ report:

button|ilovevitaly|darodar|hulfingtonpost|ranksonic|[0-9]{1,3}\.[0-9]{1,3}|website

#9 Blocking spam referrers through the custom advanced filter in Google Analytics.

For example, the following custom exclude filter can block all of the traffic from spam referrers you identified:

#10 Using regex while creating content groups in Google Analytics:

#11 Using regex while creating Channel grouping in Google Analytics:

#12 Using regex in the table filter

In GA3, standard and custom reports are in table format.

And you will find a filter option where you can exclude or include data.

You also have the flexibility to use the advanced filter and use regular expressions within the advanced filter.

Here I used multiple regex expressions to include and exclude the pages I wanted.

#13 Using regex in dashboard widgets

You can also use regular expressions in the GA3 dashboard widgets while creating a dashboard for a specific data set:

#14 Using regex while building audiences

You can use regular expressions while creating audiences for remarketing and targeting a specific group of customers.

#15 Using regex while Tracking Site Search without Query Parameter.

#16 Using regex while debugging Google Analytics tracking issues.

Testing Regular Expressions (REGEX)

Whether you consider yourself a beginner or advanced in the use of regex, you should always test your regular expressions.

You can test regular expressions through the following:

RegExp Tester chrome extension
Regex101.com online tool
The advanced table filter on the reporting interface in GA3 with the Regex option
The preview feature of your Custom Segment in GA3
GTM debug console window for testing regex used in triggers and variables.
Using RegExpObject to test regex in GTM during run time.

Testing Regex Method #1: RegExp Tester chrome extension

RegExp Tester is a chrome extension which is used to create and validate regular expressions (or regex):

Here, the highlighted search result (i.e. optimize smart) is the text that matches my regex.

The job of my regex here is to find two-word keyword phrases.

Testing Regex Method #2: Regex101.com online tool

Regex101.com is an online tool used for creating and testing regular expressions.

Following is the interface of the ‘Regex101’ tool:

Note: Use the ‘ECMAScript (JavaScript)’ flavour as Google Analytics accepts JavaScript regular expressions.

Testing Regex Method #3: Advanced table reporting filter in GA3

You can create and test regex in GA3 by using the advanced table filter on the reporting interface with the Regex option:

Testing Regex Method #4: Preview feature of Custom Segment in GA3

You can create and test regex in GA3 by using the preview feature of your custom segment:

Testing Regex Method #5: GTM debug console window

For GTM, you can use the debug console window to test the regex used in triggers and variables:

Testing Regex Method #6: Using ‘RegExp’ to test regex in GTM during run time

RegExp is a regular expression object which is used to store a regular expression in JavaScript.

For example:

var regex = /^\/search\/(.*)/;

Here,

‘regex’ (as in var regex) is a regular expression object which is used to store the regular expression “/^\/search\/(.*)/“

‘test’ and ‘exec’ Methods of the ‘RegExp’ object

Both ‘test’ and ‘exec’ are methods of the ‘RegExp’ object and are often used in Google Tag Manager to test regular expressions at run time.

‘test’ method is used to test for a match in a string.

It returns a boolean value: ‘true’ if it finds a match otherwise, it returns ‘false’

Syntax: RegExpObject.test(string to be searched)

For example:

function() {
  // Match page paths that start with /search/ and capture the rest
  var regex = /^\/search\/(.*)/;
  var pagePath = '/search/enhanced ecommerce tracking/';
  if (regex.test(pagePath)) {
    // exec()[1] holds the text captured by (.*)
    var searchTerm = regex.exec(pagePath)[1];
    var NewUri = "/search/?s=" + searchTerm;
    return NewUri;
  }
  return false;
}

The ‘exec’ method (as in regex.exec) also tests for a match in a string.

But unlike ‘test’, it returns an array which contains the matched text, if it finds a match.

Otherwise, it returns null.

Syntax: RegExpObject.exec(string to be searched)

The first element of the array returned by ‘exec’ is the full matched text. The elements after it are the texts captured by each set of parentheses in the regex.

So for the regex ^\/search\/(.*) and pagePath = ‘/search/enhanced ecommerce tracking/’

The regex.exec(pagePath) = [‘/search/enhanced ecommerce tracking/’, ‘enhanced ecommerce tracking/’];

The regex.exec(pagePath)[0] = ‘/search/enhanced ecommerce tracking/’;

The regex.exec(pagePath)[1] = ‘enhanced ecommerce tracking/’;

So when we use regex.exec(pagePath)[1] we can extract the search string from the request URI.
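If you want to try this outside GTM (for example in a browser console), here is a standalone sketch of the same idea. The function name buildSearchUri and the second test path are my own assumptions. In GTM itself you would keep the anonymous function() form shown earlier.

```javascript
// Standalone version of the idea above, runnable outside GTM.
function buildSearchUri(pagePath) {
  const regex = /^\/search\/(.*)/;
  const match = regex.exec(pagePath); // array on success, null otherwise
  if (match === null) {
    return false;
  }
  // match[0] is the full match, match[1] is the captured search term.
  return '/search/?s=' + match[1];
}

console.log(buildSearchUri('/search/enhanced ecommerce tracking/'));
// /search/?s=enhanced ecommerce tracking/
console.log(buildSearchUri('/blog/regex/')); // false
```

Calling ‘exec’ once and checking for null does the job of both ‘test’ and ‘exec’, so the regex only has to run one time per page path.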

Using regex with mod_rewrite and configuration directives

In order to block referrer spam in GA3 and use regex for other purposes (like SEO), you will need a good understanding of mod_rewrite, configuration directives (like RewriteEngine) and .htaccess file.

About mod_rewrite

It is a module (function) written in the ‘C’ programming language: ‘mod_rewrite.c‘.

This module works only with Apache server 1.2 or later and is called from the .htaccess file (ASCII file, which contains configuration directives and rules for files and folders).

Through this module, you can:

Re-Write URLs
Redirect URLs
Solve Canonical URL issues
Solve Hotlinking issues
Block visitors from accessing a particular folder, file or the whole website.
Create custom 403 and 404 pages.
Deliver content on the basis of the IP address.
Block referrer spam in Google Analytics.

Types of Configuration Directives

There are 9 types of configuration directives:

RewriteEngine
RewriteOptions
RewriteLog
RewriteLogLevel
RewriteLock
RewriteMap
RewriteBase
RewriteRule
RewriteCond

But here we will talk about only three directives:

RewriteEngine
RewriteRule
RewriteCond.

I have not found any good use of other directives in the context of Google Analytics.

RewriteEngine

This configuration directive is used to enable or disable the mod_rewrite module.

Syntax: RewriteEngine on/off

Default Value: RewriteEngine off

That’s why, in the .htaccess file, we first enable the mod_rewrite module by adding the following code:

Options +FollowSymLinks
RewriteEngine on

RewriteRule

This configuration directive tells the server to interpret the given statement as a rule.

Syntax: RewriteRule <pattern> <substitution> [FLAGS]

Here pattern is a regular expression and substitution is a URL.

FLAGS can be [R], [F], [NC], [QSA], [L], [OR] etc.

[R] => Redirect. Its default status code is 302. It can be assigned any redirect status code from 300 to 399. For e.g.

RewriteRule ^index\.html$ /index.php [r=301]

[F] => Forbidden. It is generally used with a hyphen (-) as the substitution. The hyphen tells the server not to perform any substitution. This flag tells the server not to fulfill the request and to return the ‘403’ response code. For e.g.

RewriteRule ^product-price\.php$ - [F]

[NC] => No case. It tells the server to ignore uppercase or lowercase when checking for patterns. For e.g.

RewriteRule ^him*\.php$ - [NC]

[QSA] => Query String append. It tells the server to pass query string from the original URL to the new URL.

[L] => Last rule. This flag tells the server not to process any more rules.

[OR] => Logical OR. This flag is used as a logical OR for RewriteCond statements.

RewriteCond

This configuration directive tells the server to interpret the given statement as a condition for the rule which immediately follows it.

Syntax:

RewriteCond <test string> <condition pattern> [FLAGS]
RewriteRule <pattern> <substitution> [FLAGS]

Here, mod_rewrite first matches each URL against the RewriteRule pattern.

If a URL does not match the pattern, then mod_rewrite processes the next rule.

If a URL matches the pattern, then mod_rewrite looks for the corresponding RewriteCond.

If no corresponding RewriteCond exists, then the matched URL is replaced by the substitution.

If one or more corresponding RewriteConds exist, then they are processed in the order in which they appear, from top to bottom.

Each RewriteCond is processed by matching its test string against its condition pattern.

If the test string doesn’t match its condition pattern, then mod_rewrite processes the next rule; otherwise, it processes the next RewriteCond.

When all RewriteConds are successfully processed, the matched URL is replaced by the substitution.
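The processing order described above can be sketched in JavaScript. This is only a simplified model to help you follow the logic, not how Apache is implemented. The applyRule function, the shape of the rule object and the host names are all assumptions made for illustration, and the sketch ignores flags such as [OR] and negated conditions.

```javascript
// Simplified model of how one RewriteRule and its RewriteConds are evaluated.
// applyRule is a hypothetical helper; it is not part of Apache.
function applyRule(rule, request) {
  // Step 1: the URL must match the RewriteRule pattern.
  if (!rule.pattern.test(request.path)) {
    return null; // no match, so the server moves on to the next rule
  }
  // Step 2: each RewriteCond is checked in order, from top to bottom.
  for (const cond of rule.conds) {
    const testString = request[cond.variable] || "";
    if (!cond.pattern.test(testString)) {
      return null; // a condition failed, so the server moves on to the next rule
    }
  }
  // Step 3: all conditions passed, so the substitution is applied.
  return request.path.replace(rule.pattern, rule.substitution);
}

// A made-up rule with a single condition on the requested host name.
const rule = {
  pattern: /^oldaddress\.html$/,
  substitution: "/newaddress.html",
  conds: [{ variable: "HTTP_HOST", pattern: /^www\.example\.com$/i }],
};

const rewritten = applyRule(rule, { path: "oldaddress.html", HTTP_HOST: "www.example.com" });
const skipped = applyRule(rule, { path: "oldaddress.html", HTTP_HOST: "other.example.com" });

console.log(rewritten);
console.log(skipped);
```

In the first call, both the rule pattern and the condition match, so the path is rewritten. In the second call, the rule pattern matches but the condition fails, so nothing is rewritten.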

A test string can be:

1. A simple text
2. RewriteRule back reference
3. RewriteCond back reference
4. Server Variable

RewriteRule Back Reference

It is of the form $N, where N can be any number from 0 to 9. It refers to the text captured by the Nth pair of parentheses in the RewriteRule pattern ($0 is the whole match). For e.g.

RewriteRule ^(.*)$ /index.php/$1 [L]
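JavaScript uses the same $N notation in the replacement string of ‘replace’, so you can try out a back reference in the browser console before you touch the server. The sketch below mirrors the rule above. The path ‘about/team.html’ is a made-up value used for illustration.

```javascript
// Sketch: $1 in the substitution is replaced by the text captured by (.*).
const pattern = /^(.*)$/;
const substitution = "/index.php/$1";

// "about/team.html" is a hypothetical request path.
const rewrittenPath = "about/team.html".replace(pattern, substitution);

console.log(rewrittenPath);
```

Here (.*) captures the whole path, and the captured text is inserted where $1 appears in the substitution.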

RewriteCond Back Reference

It is of the form %N, where N can be any number from 1 to 9. It refers to the text captured by the Nth pair of parentheses in the condition pattern of the last matched RewriteCond. For e.g.

RewriteCond %{HTTP_HOST} ^(123\.42\.162\.7)$

RewriteCond %1 ^123\.42\.162\.7$

RewriteRule ……………..

Server Variable

Syntax: %{Variable_Name}

E.g.

1. %{HTTP_HOST} – This variable returns the host name that the visitor requested (the value of the Host request header).

2. %{HTTP_USER_AGENT} – This variable gives information about the user’s operating system and browser.

3. %{QUERY_STRING} – This variable returns the query string.

4. %{HTTP_REFERER} – This variable returns the URL of the referrer.

5. %{REMOTE_ADDR} – This variable returns the IP address of the visitor making the request.

About .htaccess File

It is a plain-text (ASCII) file that contains configuration directives and rules for files, folders, and the whole website.

You can have more than one .htaccess file on a server. In fact, you can have one .htaccess file per folder/directory.

When you put the file in a directory, the rules mentioned in it apply only to the files and sub-directories in that directory.

When you put the file in the root directory, the rules mentioned in it apply to all the files and directories on the website.

To use mod_rewrite rules, a .htaccess file must contain the following two lines:

Options +FollowSymLinks
RewriteEngine on

How to Block Referrer Spam in GA3 via Regex and RewriteCond

Once you have identified spam referrers, block them from visiting your website again.

Since the bot visit is recorded in your server log, you can block such bots through the .htaccess file (or equivalent).

Following are the various methods you can use to block referrer spam:

Block the referrer used by a spambot
Block the IP address used by the spam bot
Block the IP address range used by a spam bot
Block the user agents used by spambots

Method #1: Block the referrer used by a spam bot

Access your .htaccess file and add the following code to block all http and https referrals from a spambot like “blackhatworth.com” and all subdomains of “blackhatworth.com“:

RewriteEngine On
Options +FollowSymLinks
RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)*blackhatworth\.com [NC]
RewriteRule .* - [F]

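Add similar RewriteCond lines to block the referrers used by other spambots. When you list several of them before one RewriteRule, add the [OR] flag to every RewriteCond except the last one.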
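Before you upload a referrer rule, it helps to check which referrers the pattern really matches. The sketch below tests a JavaScript version of the referrer pattern (the forward slashes are escaped and the ‘i’ flag stands in for [NC]). The referrer URLs in the list are made-up values used for illustration.

```javascript
// Sketch: test the spam referrer pattern against a few sample referrers.
const spamReferrer = /^https?:\/\/([^.]+\.)*blackhatworth\.com/i;

// Hypothetical referrer values.
const referrers = [
  "http://blackhatworth.com/",
  "https://www.blackhatworth.com/page",
  "https://a.b.blackhatworth.com",
  "https://www.example.com/?ref=blackhatworth.com",
];

// Keep only the referrers that the rule would block.
const blocked = referrers.filter((referrer) => spamReferrer.test(referrer));

console.log(blocked);
```

The first three referrers are matched because the domain or one of its subdomains comes right after the protocol. The last one is not matched, because the spam domain appears only in the query string.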

Method #2: Block the IP address used by the spam bot

Access your .htaccess file and add code like the one below:

RewriteEngine On
Options +FollowSymLinks
Order Deny,Allow
Deny from 234.45.12.33

Note: Do not copy-paste this code into your .htaccess file as it is. The IP address is only an example, used to show you how to block an IP address in the .htaccess file. Spambots can come from many different IP addresses, so you need to keep adding the IP addresses used by the spambots affecting your website.

Method #3: Block the IP address range used by a spam bot

If you are sure that a particular range of IP addresses is being used by spam bots, then you can block the whole IP address range, as in the example below:

RewriteEngine On
Options +FollowSymLinks
Order Allow,Deny
Allow from all
Deny from 76.149.24.0/24

Here 76.149.24.0/24 is a CIDR range.

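CIDR is a method used for representing a range of IP addresses. The /24 suffix means that the first 24 bits of the address are fixed, so this range covers the 256 addresses from 76.149.24.0 to 76.149.24.255.

Blocking by CIDR is more efficient than blocking individual IP addresses, as a single rule covers the whole range and the .htaccess file stays short.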
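If you want to check whether an IP address from your server log falls inside a CIDR range, the following sketch shows one way to do it for IPv4 addresses. The ipToInt and inCidr functions are hypothetical helpers written for this example, and the sketch assumes well-formed input.

```javascript
// Convert a dotted IPv4 address to an unsigned 32-bit integer.
function ipToInt(ip) {
  return ip
    .split(".")
    .reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

// Return true when `ip` falls inside the CIDR range, e.g. "76.149.24.0/24".
function inCidr(ip, cidr) {
  const [base, bits] = cidr.split("/");
  const prefix = Number(bits);
  // A /24 keeps the first 24 bits and ignores the last 8 bits.
  const mask = prefix === 0 ? 0 : (~0 << (32 - prefix)) >>> 0;
  return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(base) & mask) >>> 0);
}

const insideRange = inCidr("76.149.24.17", "76.149.24.0/24");
const outsideRange = inCidr("76.149.25.1", "76.149.24.0/24");

// Number of addresses covered by a /24 range.
const addressCount = 2 ** (32 - 24);

console.log(insideRange, outsideRange, addressCount);
```

The first address shares its first 24 bits with the range, so it is inside. The second address differs in the third octet, so it is outside.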

Method #4: Block the user agents used by a spam bot

Go through your server log files once a week, and find and ban malicious user agents (user agents used by spambots).

Blocked user agents cannot access your website.

You can block a rogue user agent as in the example below:

RewriteEngine On
Options +FollowSymLinks
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
RewriteRule .* - [F,L]

A simple search on Google will give you a list of websites that maintain records of known rogue user agents.
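You can block several user agents with one pattern by separating them with the pipe (|) metacharacter. The sketch below tests such a pattern in JavaScript. ‘ExampleBadBot’ is a made-up name, and the user agent strings are sample values used only for illustration.

```javascript
// Sample pattern: 'ExampleBadBot' is a made-up name used for illustration.
// The 'i' flag plays the role of [NC].
const rogueAgents = /Baiduspider|ExampleBadBot/i;

// Hypothetical user agent strings taken from a server log.
const userAgents = [
  "Mozilla/5.0 (compatible; Baiduspider/2.0)",
  "examplebadbot/1.0",
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
];

// Keep only the user agents that the pattern would block.
const blockedAgents = userAgents.filter((ua) => rogueAgents.test(ua));

console.log(blockedAgents);
```

Since the pattern has no ^ or $, it matches the name anywhere in the user agent string.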

Other use cases of Regex (Regex in SEO)

Besides Google Analytics and Google Tag Manager, regex is widely used in Search Engine Optimization (SEO).

The following are the advantages of using regex in SEO:

1. You can convert long ugly dynamic URLs into SEO-friendly URLs.
2. You can apply the correct redirects.
3. Prevent people from hotlinking your images
4. Block spam bots
5. Resolve canonical URL issues
6. Resolve duplicate content issues (to an extent)
7. Deliver geo-specific content based on the IP address

Example-1: Redirect all requests for pages in the media folder to a new page ‘media.html’.

RewriteRule ^media/ /media.html [R=301,L]

Example-2: Redirect oldaddress.html page to newaddress.html page

RewriteRule ^oldaddress\.html$ /newaddress.html [r=301,l]

Example-3: Redirect one website to another

Redirect 301 / https://www.anotherwebsite.com/

Example-4: Redirect abc.com/index.html to www.abc.com

RewriteCond %{HTTP_HOST} ^abc\.com$ [NC]
RewriteCond %{REQUEST_URI} ^/index\.html$
RewriteRule ^(.*)$ https://www.abc.com/ [R=301,L]

Example-5: Block a visitor with the IP address 12.34.56.78 from viewing your file product-prices.html

RewriteCond %{REMOTE_ADDR} ^12\.34\.56\.78$
RewriteRule ^product-prices\.html$ - [F]

Example-6: Block a visitor with the IP address 12.34.56.78 from viewing your folder ‘sales-demo’

RewriteCond %{REMOTE_ADDR} ^12\.34\.56\.78$
RewriteRule ^sales-demo/ - [F]

Example-7: Block a visitor with the IP address 12.34.56.78 from viewing your website www.abc.com

RewriteCond %{REMOTE_ADDR} ^12\.34\.56\.78$
RewriteRule ^.*$ - [F]

Example-8: Apply 301 from one file to another file

Redirect 301 /file1.html https://www.mywebsite.com/file2.html

The above code permanently redirects file1.html to file2.html. So whenever a search engine or a visitor requests file1.html, they are automatically redirected to file2.html.

Example-9: Convert Dynamic URL into Static Looking SEO friendly URL

RewriteCond %{QUERY_STRING} ^keyval\=25\&keyval2\=62$ [NC]
RewriteRule ^productdescription\.php$ https://www.example.com/whiteboard-accessories.php? [R=301,L]

This code will redirect https://www.example.com/productdescription.php?keyval=25&keyval2=62 to https://www.example.com/whiteboard-accessories.php

Note: You need to put a question mark (?) at the end of the substitution URL, otherwise query string will be appended at the end of the substitution URL.
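The condition pattern in this example is anchored with ^ and $, so it matches only that exact query string. The sketch below shows the effect in JavaScript, with the ‘i’ flag standing in for [NC]. The extra query strings are made-up values used for illustration.

```javascript
// JavaScript version of the condition pattern used in Example-9.
const queryPattern = /^keyval=25&keyval2=62$/i;

// The exact query string matches.
const exactMatch = queryPattern.test("keyval=25&keyval2=62");

// An extra parameter breaks the match because of the $ anchor.
const extraParameter = queryPattern.test("keyval=25&keyval2=62&page=2");

// The same parameters in a different order do not match either.
const differentOrder = queryPattern.test("keyval2=62&keyval=25");

console.log(exactMatch, extraParameter, differentOrder);
```

If your dynamic URLs can carry extra parameters or a different parameter order, you will need a looser pattern than the one shown here.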

Example-10: Redirect non-www to www

RewriteCond %{HTTP_HOST} ^mywebsite\.com$ [NC]
RewriteRule ^(.*)$ https://www.mywebsite.com/$1 [R=301,L]

Note: Replace ‘mywebsite’ with your website name

Example-11: Create a Custom 404 page

Create a web page that you want to display as your custom 404 page (say, custom404.php) and upload it to the root directory. Now add the following code to your .htaccess file:

Options +FollowSymLinks
RewriteEngine on
ErrorDocument 404 /custom404.php

Use a local path here. If you give a full URL, the server redirects the visitor to that page instead of returning the 404 status code.

Example-12: Block an IP address from accessing your website

Add the following code in your .htaccess file:

Options +FollowSymLinks
RewriteEngine on
Order Deny,Allow
Deny from 61.16.153.67

If you want to block two or more IP addresses:

Options +FollowSymLinks
RewriteEngine on
Order Deny,Allow
Deny from 61.16.153.67
Deny from 124.202.86.42

Example-13: Resolve the Hot Linking Issue

Hot-linking means linking directly to the files on your website (images, videos, etc.) from another website. By preventing hot-linking, you can save your server bandwidth.

Add the following code in your .htaccess file:

Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^https?://(.+\.)?mywebsite\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^$
RewriteRule .*\.(jpg|jpeg|gif|bmp|png|swf)$ - [F]

Replace ‘mywebsite’ with your website name and then use a hotlinking checker tool to find out whether your files (images, videos, etc.) can be hot-linked or not.
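The logic of the two conditions and the rule can be sketched in JavaScript as follows. The isHotlinkBlocked function is a hypothetical helper written for this example, the referrer pattern here accepts both http and https, and the referrers and file paths are made-up values.

```javascript
// Referrers that belong to your own website (with or without a subdomain).
const ownSite = /^https?:\/\/(.+\.)?mywebsite\.com\//i;

// File types protected by the rule.
const protectedFile = /.*\.(jpg|jpeg|gif|bmp|png|swf)$/;

// Hypothetical helper: returns true when the request would get a 403.
function isHotlinkBlocked(referer, path) {
  const fromOtherSite = !ownSite.test(referer); // first RewriteCond
  const hasReferer = referer !== "";            // second RewriteCond (!^$)
  return fromOtherSite && hasReferer && protectedFile.test(path);
}

const ownPage = isHotlinkBlocked("https://www.mywebsite.com/page", "images/logo.png");
const otherSite = isHotlinkBlocked("https://other-site.example/", "images/logo.png");
const directVisit = isHotlinkBlocked("", "images/logo.png");
const htmlPage = isHotlinkBlocked("https://other-site.example/", "index.html");

console.log(ownPage, otherSite, directVisit, htmlPage);
```

Only the second request is blocked: it asks for an image and comes from another website. Requests with an empty referrer are allowed on purpose, because many browsers and privacy tools do not send a referrer at all.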

Example-14: Enable proxy caching for static resources

Add the following code to your .htaccess file:

<FilesMatch "\.(gif|jpe?g|png)$">
Header set Cache-Control "public"
</FilesMatch>

Note: To learn more about regular expressions: https://www.regular-expressions.info/

