Getting Started with Regex: A Practical Reference Guide

Regex, short for regular expressions, is a way of finding patterns in text. It can be used to check whether text matches a certain format, extract part of a string, replace unwanted characters, or split messy text into useful pieces.

Regex can look intimidating at first because it uses a lot of symbols, but most patterns are built from a fairly small set of building blocks. Once you understand what those building blocks mean, regex becomes much less mysterious, and can be incredibly useful.

Use Case                 Example
Validate a format        Check whether an ID, postcode or email address follows the expected structure
Extract text             Pull the domain from an email address
Clean text               Remove punctuation, extra spaces or unwanted characters
Find repeated patterns   Identify duplicated words such as the the
Split strings            Separate dates, codes, names or categories into useful parts

Regex can appear in lots of different tools and languages. The exact syntax can vary slightly, so it is always worth checking the documentation for the tool you are using, but the basic ideas are usually very similar.

Tool                     Where Regex Might Appear
Alteryx                  REGEX_Match, REGEX_Replace, REGEX_CountMatches, REGEX_Parse
Python                   re.search(), re.findall(), re.sub(), re.match()
SQL                      Functions such as REGEXP_LIKE, REGEXP_REPLACE or similar, depending on the SQL version
Tableau / Tableau Prep   Functions such as REGEXP_MATCH, REGEXP_EXTRACT and REGEXP_REPLACE can be used to match, extract or replace text patterns
Text editors             Find and replace tools often support regex for more flexible searching
Power Query              Regex is not as directly built in as it is in some other tools, but similar text cleaning can often be done with text functions or custom approaches
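
To make the Python row a little more concrete, here is a minimal sketch of the four functions listed there. The sample sentence is made up for illustration, and other tools will name and call their functions differently.

```python
import re

# Made-up sample text (note the double space before "London").
text = "Order A123 shipped on 2026-01-15 to  London"

# re.search finds the first match anywhere in the string (or returns None).
first_number = re.search(r"\d+", text).group()

# re.findall returns every non-overlapping match as a list of strings.
all_numbers = re.findall(r"\d+", text)

# re.sub replaces every match; here each run of whitespace becomes one space.
tidied = re.sub(r"\s+", " ", text)

# re.match only succeeds if the pattern matches at the very start of the string.
starts_with_order = re.match(r"Order", text) is not None
starts_with_digits = re.match(r"\d+", text) is not None

print(first_number)   # 123
print(all_numbers)    # ['123', '2026', '01', '15']
print(tidied)         # Order A123 shipped on 2026-01-15 to London
```

The difference between re.search and re.match is easy to trip over: re.match is anchored to the start of the string, so it returns nothing for the digit pattern here even though the text contains digits.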

Matching Different Types of Characters

At its simplest, regex matches characters. You can match exact text by typing the text you want to find. For example, the pattern below would find the letters “cat” in a string.

cat

Regex becomes more powerful when you use special characters to describe the type of character you are looking for. For example, you can search for any digit, any whitespace character, or any uppercase letter.

Pattern   Meaning                                                       Example Match
.         Any single character, usually except a new line               a, 7, !
\d        Any digit                                                     0 to 9
\w        Any word character, usually letters, numbers and underscore   A, 7, _
\s        Any whitespace character                                      Space, tab or new line
\t        A tab character                                               A tab
\n        A new line character                                          A line break
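
As a rough illustration, the sketch below runs these character types against a made-up string using Python's re module. The sample text is an assumption chosen so that each pattern has something to find.

```python
import re

# Made-up sample containing a digit, an underscore, a tab and punctuation.
sample = "Room 7_b\tis free!"

digits = re.findall(r"\d", sample)              # every digit
word_chars = "".join(re.findall(r"\w", sample))  # letters, digits and underscore
whitespace = re.findall(r"\s", sample)          # spaces and the tab
seven_plus_one = re.findall(r"7.", sample)      # a 7 followed by any one character

# By default the dot does not match a new line character.
dot_crosses_newline = re.search(r"a.b", "a\nb") is not None

print(digits)          # ['7']
print(word_chars)      # Room7_bisfree
print(seven_plus_one)  # ['7_']
```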

Square brackets create a character set. This means “match one character from this set”. For example, you could use a set to match any vowel, any uppercase letter, or anything except a number.

One slightly confusing thing is that some symbols behave differently depending on where they are used. For example, ^ means “start of string” when it appears outside square brackets, but when it is the first character inside square brackets it means “not”.

Also, be careful not to use [A-z] in place of [A-Za-z], as this also matches the six characters that sit between Z and a in ASCII/Unicode order: [, \, ], ^, _ and `.

Pattern    Meaning                             Example Match
[A-Z]      Any uppercase letter                A, B, C
[a-z]      Any lowercase letter                a, b, c
[A-Za-z]   Any uppercase or lowercase letter   A, b, Z
[0-9]      Any digit from 0 to 9               4
[aeiou]    Any vowel from the set              a, e, i
[^aeiou]   Anything except a lowercase vowel   E, Q, w, !
[^0-9]     Anything except a digit             A, !, space
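
The [A-z] pitfall is easier to see in action. The following Python sketch uses a made-up string that deliberately includes some of the characters sitting between Z and a, and also shows a negated set being used to strip out everything except digits.

```python
import re

# Made-up sample with an underscore, square brackets and a caret in it.
text = "Zebra_[42]^ok"

strict_letters = re.findall(r"[A-Za-z]", text)  # letters only
loose_letters = re.findall(r"[A-z]", text)      # letters plus some punctuation

# Characters that [A-z] picked up even though they are not letters.
extras = [ch for ch in loose_letters if not ch.isalpha()]

# A negated set: delete every character that is not a digit.
digits_only = re.sub(r"[^0-9]", "", text)

# A caret that is not first inside the brackets is just a literal caret.
caret_literal = re.findall(r"[a^]", "a^b")

print(extras)         # ['_', '[', ']', '^']
print(digits_only)    # 42
print(caret_literal)  # ['a', '^']
```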

Controlling How Many Characters Match

Once you have described what type of character you want, you often need to say how many of them should appear. The symbols that do this are called quantifiers.

A useful way to think about regex is:

type of character + number of characters

For example, you might want exactly four digits, one or more letters, or an optional space.

Pattern   Meaning                  Example
+         One or more              \d+ matches 7 or 123
*         Zero or more             A* matches no As, one A, or many As
?         Optional / zero or one   colou?r matches color and colour
{3}       Exactly 3                \d{3} matches exactly three digits
{2,4}     Between 2 and 4          \d{2,4} matches two, three or four digits
{2,}      2 or more                \d{2,} matches at least two digits

You can combine character types and quantifiers to build useful patterns:

Regex      Meaning
\d{4}      Four digits
[A-Z]{2}   Two uppercase letters
[A-Za-z]+  One or more letters
\w+        One or more word characters
\s?        An optional whitespace character
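
Here is a short Python sketch of a few quantifiers at work. The reference code format used at the end (two uppercase letters, an optional space, then four digits) is a hypothetical one invented for this example.

```python
import re

# ? makes the preceding character optional.
spellings = re.findall(r"colou?r", "color colour colouur")

# {2,4} is greedy: it takes up to four digits at a time, then carries on.
digit_runs = re.findall(r"\d{2,4}", "7 42 123456")

# * allows zero occurrences, so even an empty string matches A*.
empty_matches_star = re.fullmatch(r"A*", "") is not None

# Hypothetical code format: two uppercase letters, optional space, four digits.
code_pattern = r"[A-Z]{2}\s?\d{4}"
with_space = re.fullmatch(code_pattern, "AB 2026") is not None
without_space = re.fullmatch(code_pattern, "AB2026") is not None

print(spellings)   # ['color', 'colour']
print(digit_runs)  # ['42', '1234', '56']
```

Notice that the lone 7 is skipped because {2,4} needs at least two digits, and that 123456 is split into two matches because a single match can take at most four.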

Start, End and Word Boundaries

By default, regex often looks for a pattern anywhere in the string. For example, a pattern for four digits might find 2026 inside a longer piece of text.

That can be useful if you are extracting values, but it is less useful if you are validating whether the whole field matches a specific format. Anchors let you control where the match should happen.

Pattern   Meaning           Example
^         Start of string   ^Hello matches text that starts with Hello
$         End of string     world$ matches text that ends with world
\b        Word boundary     \bcat\b matches cat as a whole word

This distinction is very useful when validating formats.

\d{4} finds four digits somewhere.

^\d{4}$ only matches if the whole string is exactly four digits long.
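
In Python, the difference between finding and validating might look like the sketch below. The two helper function names are made up for this example.

```python
import re

def contains_four_digits(value):
    """True if four digits in a row appear anywhere in the value."""
    return re.search(r"\d{4}", value) is not None

def is_exactly_four_digits(value):
    """True only if the whole value is four digits and nothing else."""
    return re.search(r"^\d{4}$", value) is not None

print(contains_four_digits("Year 2026 report"))    # True
print(is_exactly_four_digits("Year 2026 report"))  # False
print(is_exactly_four_digits("2026"))              # True
print(is_exactly_four_digits("20260"))             # False

# Python-specific detail: $ also matches just before a trailing new line,
# whereas re.fullmatch insists on the entire string.
dollar_allows_newline = re.search(r"^\d{4}$", "2026\n") is not None
fullmatch_allows_newline = re.fullmatch(r"\d{4}", "2026\n") is not None
```

If a stray trailing line break would matter for your validation, re.fullmatch is the stricter choice in Python.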

The word boundary pattern, \b, is useful when you want to match a whole word rather than a sequence of letters inside another word.

It does not match a letter, space or punctuation mark itself. Instead, it matches the position where a word character meets a non-word character, such as the edge between a word and a space, punctuation mark, or the start/end of the string.

For example, you might want to match cat as a complete word, but not the cat inside scatter or category.

Regex     Matches   Does Not Match
\bcat\b   cat       scatter, category
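
A quick way to convince yourself of this is to test the pattern against a few strings. This Python sketch uses made-up test phrases.

```python
import re

whole_word_cat = re.compile(r"\bcat\b")

# Made-up phrases: two contain cat as a whole word, two only contain it inside a word.
phrases = ["the cat sat", "cat!", "scatter", "category"]
results = {phrase: whole_word_cat.search(phrase) is not None for phrase in phrases}

for phrase, matched in results.items():
    print(f"{phrase!r}: {matched}")
```

The punctuation in "cat!" does not stop the match, because the position between t and ! is still a word boundary.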

Groups, OR and Backreferences

Round brackets (parentheses) can be used to group part of a regex pattern. Groups are useful when you want to apply logic to one section of a pattern, such as choosing between two options. They are also useful when you want to refer back to something you have already matched.

Pattern   Meaning                                   Example
()        Creates a group                           (cat) groups the word cat
|         OR                                        cat|dog matches cat or dog
\1        Refers back to the first captured group   (\w+) \1 can find repeated words

For example, the OR symbol lets you match one option or another.

I like (cats|dogs)

Matches I like cats or I like dogs.
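
The group matters here. Without the brackets, the OR applies to everything on either side of it, which is rarely what you want. A small Python sketch of the difference:

```python
import re

grouped = r"I like (cats|dogs)"  # "I like " followed by cats or dogs
ungrouped = r"I like cats|dogs"  # either "I like cats" or just "dogs"

grouped_match = re.fullmatch(grouped, "I like dogs")
captured = grouped_match.group(1)  # the option that was chosen

# The ungrouped version cannot match the whole sentence about dogs,
# but it does match the single word on its own.
ungrouped_sentence = re.fullmatch(ungrouped, "I like dogs") is not None
ungrouped_word = re.fullmatch(ungrouped, "dogs") is not None

print(captured)            # dogs
print(ungrouped_sentence)  # False
print(ungrouped_word)      # True
```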

Backreferences let you reuse a captured group later in the pattern. This can be useful for finding repeated words. For example:

\b(\w+) \1\b

This can match repeated words such as the the, very very or no no.

Here, (\w+) captures the first word. The \1 then says “match that same thing again”. So if the first group captures the, the backreference looks for the again.

Part      Meaning
\b        Start at a word boundary
(\w+)     Capture one or more word characters as a group
(space)   Match the space between the words
\1        Match the same text captured by the first group
\b        End at a word boundary
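
As a sketch of how this might be used in Python, the example below finds the repeated words in a made-up sentence and then collapses each pair into a single word.

```python
import re

repeated_word = re.compile(r"\b(\w+) \1\b")

# Made-up sentence with two accidental repeats.
text = "This is the the very very best"

# group(1) is the word that was captured and then repeated.
found = [match.group(1) for match in repeated_word.finditer(text)]

# In the replacement, \1 puts back one copy of the captured word.
fixed = repeated_word.sub(r"\1", text)

# The closing \b stops "the theme" from counting as a repeat.
false_alarm = repeated_word.search("the theme") is not None

print(found)  # ['the', 'very']
print(fixed)  # This is the very best
```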

Lookaheads and Lookbehinds

Lookarounds are used when you want to match something based on what comes before or after it, without including that surrounding text in the result.

For example, you might want to extract the number after a pound sign, but not include the pound sign itself. Or you might want to extract the number before a percentage sign, but not include the percentage sign. A lookahead looks forwards. A lookbehind looks backwards.

Pattern    Name                  Meaning
(?=...)    Positive lookahead    Match only if this comes next
(?!...)    Negative lookahead    Match only if this does not come next
(?<=...)   Positive lookbehind   Match only if this came before
(?<!...)   Negative lookbehind   Match only if this did not come before

Some useful examples include:

Goal                                   Regex       Example Result
Find digits after a pound sign         (?<=£)\d+   Matches 25 in £25
Find digits before a percentage sign   \d+(?=%)    Matches 75 in 75%
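
Here is a minimal Python sketch of all four lookarounds. The sample sentences are made up, and the two negative patterns are my own illustrations rather than patterns from the tables above.

```python
import re

prices_text = "Total: £25 and 30 items"
growth_text = "Up 75% on 2025"

# Positive lookbehind: digits that come straight after a pound sign.
after_pound = re.findall(r"(?<=£)\d+", prices_text)

# Positive lookahead: digits that come straight before a percentage sign.
before_percent = re.findall(r"\d+(?=%)", growth_text)

# Negative lookbehind: whole numbers NOT preceded by a pound sign.
not_prices = re.findall(r"(?<!£)\b\d+", prices_text)

# Negative lookahead: whole numbers NOT followed by a percentage sign.
not_percentages = re.findall(r"\b\d+\b(?!%)", growth_text)

print(after_pound)      # ['25']
print(before_percent)   # ['75']
print(not_prices)       # ['30']
print(not_percentages)  # ['2025']
```

In each case the pound sign or percentage sign is checked but left out of the result, which is the whole point of a lookaround.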

Escaping Special Characters

Some characters have special meanings in regex.

For example, a full stop means “any character”, not “a literal full stop”. So if you want to match an actual full stop, you usually need to escape it with a backslash.

. means any character.

\. means a literal full stop.

To Match             Use
A full stop          \.
A question mark      \?
An opening bracket   \(
A closing bracket    \)
A plus sign          \+
An asterisk          \*
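
The sketch below shows what goes wrong when a full stop is left unescaped, using made-up sample text. It also uses Python's re.escape, which adds the backslashes for you when you want to search for a piece of text exactly as written.

```python
import re

sample = "3.50 3x50"

# Unescaped, the dot matches any character, so 3x50 sneaks in.
unescaped = re.findall(r"3.50", sample)

# Escaped, only a literal full stop will do.
escaped = re.findall(r"3\.50", sample)

# re.escape builds a pattern that treats every character as literal.
literal_text = "(a+b)*2?"
safe_pattern = re.escape(literal_text)
found_literal = re.search(safe_pattern, "Is (a+b)*2? right") is not None

print(unescaped)      # ['3.50', '3x50']
print(escaped)        # ['3.50']
print(found_literal)  # True
```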

Case-Insensitive Matching

Sometimes you want to match text regardless of whether it uses uppercase or lowercase letters. Depending on the tool, this might be handled by a setting. In some regex flavours, you can use a case-insensitive flag in the pattern itself.

Pattern   Meaning                                           Example Matches
(?i)cat   Case-insensitive match, if supported by the tool  cat, Cat, CAT
[Cc]at    Match either uppercase or lowercase C             cat, Cat

If your tool does not support the case-insensitive flag, you may need to use a setting or write the pattern differently.
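
Python happens to support both approaches: the inline (?i) flag and a separate setting, re.IGNORECASE. A short sketch with a made-up sample string:

```python
import re

text = "cat Cat CAT cAt"

# Inline flag at the start of the pattern.
inline_flag = re.findall(r"(?i)cat", text)

# The same behaviour requested through a setting instead.
flag_argument = re.findall(r"cat", text, flags=re.IGNORECASE)

# A character set only relaxes the one letter you list.
first_letter_only = re.findall(r"[Cc]at", text)

print(inline_flag)        # ['cat', 'Cat', 'CAT', 'cAt']
print(first_letter_only)  # ['cat', 'Cat']
```

The character set version is narrower than it might first appear: it accepts either case for the C, but the a and t must still be lowercase.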

Greedy and Lazy Matching

Regex quantifiers are usually greedy by default. This means they try to match as much as possible. This can be helpful, but sometimes it means regex grabs more text than you expected. Adding a question mark after the quantifier can make it lazy, meaning it matches as little as possible.
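
A minimal Python sketch of the difference, using a made-up sentence with two quoted words. The third pattern is an extra alternative I have added: a negated character set that avoids the problem without relying on laziness.

```python
import re

text = '"apple" and "banana"'

# Greedy: runs from the first quotation mark to the very last one.
greedy = re.findall(r'".*"', text)

# Lazy: stops at the earliest closing quotation mark each time.
lazy = re.findall(r'".*?"', text)

# Alternative: match anything except a quotation mark between the quotes.
negated_set = re.findall(r'"[^"]*"', text)

print(greedy)  # ['"apple" and "banana"']
print(lazy)    # ['"apple"', '"banana"']
```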

In the following example, we can see how the greedy version and the lazy version differ when looking at text inside and outside of quotation marks.

Pattern   Behaviour                             Example Text           Example Match
".*"      Greedy: matches as much as possible   "apple" and "banana"   "apple" and "banana"
".*?"     Lazy: matches as little as possible   "apple" and "banana"   "apple", then "banana"

Useful Example Patterns

Here are a few practical regex patterns using the building blocks above.

Goal                               Regex                               Example Match
Four digit number                  ^\d{4}$                             2026
One or more letters                ^[A-Za-z]+$                         Hello
Simple UK postcode-style pattern   ^[A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}$   SW1A 1AA
Email domain                       (?<=@)[A-Za-z0-9.-]+                gmail.com
Repeated word                      \b(\w+) \1\b                        the the
Digits after a pound sign          (?<=£)\d+                           25 in £25
Optional spelling                  colou?r                             color or colour
Digits before a percentage sign    \d+(?=%)                            75 in 75%

The postcode example above is deliberately labelled as postcode-style rather than a perfect postcode validator. Real UK postcodes have more detailed rules, so this is a useful starting pattern rather than a complete validation system.
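
To show how a couple of these might be wrapped up for everyday use, here is a hedged Python sketch. The function names are made up, and tidying the input (trimming spaces and converting to uppercase) before checking it is my own assumption rather than part of the pattern.

```python
import re

POSTCODE_STYLE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}$")
EMAIL_DOMAIN = re.compile(r"(?<=@)[A-Za-z0-9.-]+")

def looks_like_postcode(value):
    """Rough format check only; not a full UK postcode validator."""
    # Assumption: ignore surrounding spaces and letter case before checking.
    tidy = value.strip().upper()
    return POSTCODE_STYLE.search(tidy) is not None

def email_domain(address):
    """Return the text after the @ sign, or None if there is no match."""
    match = EMAIL_DOMAIN.search(address)
    return match.group() if match else None

print(looks_like_postcode("SW1A 1AA"))        # True
print(looks_like_postcode("sw1a 1aa"))        # True
print(looks_like_postcode("SW1A1AA"))         # False
print(email_domain("someone@example.com"))    # example.com
```

The version without a space fails because the pattern requires exactly one space in the middle, which is a good reminder to decide up front how forgiving your validation should be.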

Final Reference Table

Here is a quick summary of the main symbols.

Regex      Meaning

Character Types
.          Any single character
\d         Any digit
\w         Any word character
\s         Any whitespace character

Character Sets
[A-Z]      Any uppercase letter
[^A-Z]     Anything except an uppercase letter

Quantifiers
+          One or more
*          Zero or more
?          Optional / zero or one
{3}        Exactly three
{2,4}      Between two and four

Anchors and Boundaries
^          Start of string
$          End of string
\b         Word boundary

Groups and Backreferences
()         Group
|          OR
\1         Backreference to the first group

Lookarounds
(?=...)    Positive lookahead
(?!...)    Negative lookahead
(?<=...)   Positive lookbehind
(?<!...)   Negative lookbehind

Regex is easiest to learn by building patterns in small pieces. Rather than trying to write the whole thing at once, start with one part. Choose the type of character you want, choose how many of it you want, then decide whether the pattern needs to appear anywhere or match the whole string.
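
That piece-by-piece approach could look something like this in Python. The reference format here (two uppercase letters, a hyphen, four digits) is a hypothetical one invented for the example.

```python
import re

# Step 1: choose the type of character and how many of each.
letters = r"[A-Z]{2}"
separator = r"-"
digits = r"\d{4}"

# Step 2: join the pieces into one pattern that can match anywhere.
anywhere = letters + separator + digits

# Step 3: add anchors if the whole string must match.
whole_string = "^" + anywhere + "$"

extracted = re.findall(anywhere, "Refs: AB-2026, xCD-1999x")
valid = re.search(whole_string, "AB-2026") is not None
invalid = re.search(whole_string, "xCD-1999x") is not None

print(extracted)  # ['AB-2026', 'CD-1999']
print(valid)      # True
print(invalid)    # False
```

Keeping the pieces in separate variables also makes it much easier to test each one on its own when a pattern is not behaving.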

If you want to practise regex and get to grips with it, I've found https://regex101.com/ especially useful for doing exercises and sense-checking my patterns.

Author: Holly Andersen
© 2026 The Information Lab