Getting Started with Regex: A Practical Reference Guide

Regex, short for regular expressions, is a way of finding patterns in text. It can be used to check whether text matches a certain format, extract part of a string, replace unwanted characters, or split messy text into useful pieces.

Regex can look intimidating at first because it uses a lot of symbols, but most patterns are built from a fairly small set of building blocks. Once you understand what those building blocks mean, regex becomes much less mysterious, and can be incredibly useful.

Use Case	Example
Validate a format	Check whether an ID, postcode or email address follows the expected structure
Extract text	Pull the domain from an email address
Clean text	Remove punctuation, extra spaces or unwanted characters
Find repeated patterns	Identify duplicated words such as `the the`
Split strings	Separate dates, codes, names or categories into useful parts

Regex can appear in lots of different tools and languages. The exact syntax can vary slightly, so it is always worth checking the documentation for the tool you are using, but the basic ideas are usually very similar.

Tool	Where Regex Might Appear
Alteryx	`REGEX_Match`, `REGEX_Replace` and `REGEX_CountMatches` functions, plus the Parse and Tokenize options in the RegEx tool
Python	`re.search()`, `re.findall()`, `re.sub()`, `re.match()`
SQL	Functions such as `REGEXP_LIKE`, `REGEXP_REPLACE` or similar, depending on the SQL dialect
Tableau / Tableau Prep	Functions such as `REGEXP_MATCH`, `REGEXP_EXTRACT`, and `REGEXP_REPLACE` can be used to match, extract or replace text patterns
Text editors	Find and replace tools often support regex for more flexible searching
Power Query	Regex is not as directly built in as it is in some other tools, but similar text cleaning can often be done with text functions or custom approaches
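
As a rough illustration of the Python row above, here is a minimal sketch of how the four functions differ. The sample text is made up for the example, and the patterns use symbols that are explained in the sections that follow.

```python
import re

text = "Order A17 shipped to B204"

# re.search() finds the first match anywhere in the string
first = re.search(r"\d+", text)

# re.match() only looks at the very start of the string
at_start = re.match(r"\d+", text)

# re.findall() returns every non-overlapping match as a list
all_numbers = re.findall(r"\d+", text)

# re.sub() replaces every match with new text
masked = re.sub(r"\d+", "#", text)

print(first.group(0))  # 17
print(at_start)        # None, because the text starts with a letter
print(all_numbers)     # ['17', '204']
print(masked)          # Order A# shipped to B#
```

The `r"..."` prefix makes a raw string, which stops Python from interpreting the backslashes before the regex engine sees them.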

Matching Different Types of Characters

At its simplest, regex matches characters. You can match exact text by typing the text you want to find. For example, the pattern below would find the letters “cat” in a string.

cat

Regex becomes more powerful when you use special characters to describe the type of character you are looking for. For example, you can search for any digit, any whitespace character, or any uppercase letter.

Pattern	Meaning	Example Match
`.`	Any single character, except usually a new line	`a`, `7`, `!`
`\d`	Any digit	`0` to `9`
`\w`	Any word character, usually letters, numbers and underscore	`A`, `7`, `_`
`\s`	Any whitespace character	Space, tab or new line
`\t`	A tab character	A tab space
`\n`	A new line character	A line break
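
To see these character types in action, here is a small Python sketch. It assumes Python's built-in `re` module, and the sample strings are invented for the example. Other tools may treat some characters slightly differently.

```python
import re

text = "Order 66 shipped\ton 4 May"

# \d matches one digit at a time, so each digit is returned separately
digits = re.findall(r"\d", text)

# \w matches letters, digits and the underscore
word_chars = re.findall(r"\w", "a_1 !")

# \s matches whitespace such as spaces and tabs
whitespace = re.findall(r"\s", text)

# . matches any single character except a new line (by default)
any_chars = re.findall(r".", "a7!\n")

print(digits)      # ['6', '6', '4']
print(word_chars)  # ['a', '_', '1']
print(whitespace)  # [' ', ' ', '\t', ' ', ' ']
print(any_chars)   # ['a', '7', '!']
```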

Square brackets create a character set. This means “match one character from this set”. For example, you could use a set to match any vowel, any uppercase letter, or anything except a number.

One slightly confusing thing is that some symbols behave differently depending on where they are used. For example, ^ means “start of string” when it appears outside square brackets, but as the first character inside square brackets it means “not”.

Also, be careful not to use [A-z] in place of [A-Za-z]. The range [A-z] also includes the six characters that sit between Z and a in ASCII/Unicode order: [, \, ], ^, _ and `.

Pattern	Meaning	Example Match
`[A-Z]`	Any uppercase letter	`A`, `B`, `C`
`[a-z]`	Any lowercase letter	`a`, `b`, `c`
`[A-Za-z]`	Any uppercase or lowercase letter	`A`, `b`, `Z`
`[0-9]`	Any digit from 0 to 9	`4`
`[aeiou]`	Any vowel from the set	`a`, `e`, `i`
`[^aeiou]`	Anything except a lowercase vowel	`E`, `Q`, `w`, `!`
`[^0-9]`	Anything except a digit	`A`, `!`, space
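
The sets above, including the [A-z] pitfall, can be checked with a short Python sketch. The sample strings are made up for the example.

```python
import re

text = "Regex_101!"

# A set matches one character at a time from the listed options
vowels = re.findall(r"[aeiou]", text)

# ^ as the first character inside the brackets negates the set
non_digits = re.findall(r"[^0-9]", "A1!2")

# [A-z] accidentally includes the underscore; [A-Za-z] does not
loose = re.findall(r"[A-z]", "a_Z")
strict = re.findall(r"[A-Za-z]", "a_Z")

print(vowels)      # ['e', 'e']
print(non_digits)  # ['A', '!']
print(loose)       # ['a', '_', 'Z']
print(strict)      # ['a', 'Z']
```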

Controlling How Many Characters Match

Once you have described what type of character you want, you often need to say how many of them should appear. These blocks are called quantifiers.

A useful way to think about regex is:

type of character + number of characters

For example, you might want exactly four digits, one or more letters, or an optional space.

Pattern	Meaning	Example
`+`	One or more	`\d+` matches `7` or `123`
`*`	Zero or more	`A*` matches no As, one A, or many As
`?`	Optional / zero or one	`colou?r` matches `color` and `colour`
`{3}`	Exactly 3	`\d{3}` matches exactly three digits
`{2,4}`	Between 2 and 4	`\d{2,4}` matches two, three or four digits
`{2,}`	2 or more	`\d{2,}` matches at least two digits

You can combine character types and quantifiers to build useful patterns:

Regex	Meaning
`\d{4}`	Four digits
`[A-Z]{2}`	Two uppercase letters
`[A-Za-z]+`	One or more letters
`\w+`	One or more word characters
`\s?`	An optional whitespace character
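
Here is a minimal Python sketch of the quantifiers above, using invented sample text. The last example shows a common surprise: without anchors, `\d{2,4}` will happily match the first four digits of a longer number.

```python
import re

# + means one or more
one_or_more = re.findall(r"\d+", "7 apples and 123 pears")

# ? makes the preceding character optional
optional_u = re.findall(r"colou?r", "color or colour")

# * means zero or more, so B on its own still matches
zero_or_more = re.findall(r"BA*", "B BA BAAA")

# {2,4} means between two and four
two_to_four = re.findall(r"\d{2,4}", "1 22 333 4444 55555")

print(one_or_more)   # ['7', '123']
print(optional_u)    # ['color', 'colour']
print(zero_or_more)  # ['B', 'BA', 'BAAA']
print(two_to_four)   # ['22', '333', '4444', '5555']
```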

Start, End and Word Boundaries

By default, regex often looks for a pattern anywhere in the string. For example, a pattern for four digits might find 2026 inside a longer piece of text.

That can be useful if you are extracting values, but it is less useful if you are validating whether the whole field matches a specific format. Anchors let you control where the match should happen.

Pattern	Meaning	Example
`^`	Start of string	`^Hello` matches text that starts with Hello
`$`	End of string	`world$` matches text that ends with world
`\b`	Word boundary	`\bcat\b` matches cat as a whole word

This distinction is very useful when validating formats.

\d{4} finds four digits somewhere.

^\d{4}$ only matches if the whole string is exactly four digits long.
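
The difference between finding and validating can be sketched in Python like this. The sample value is made up, and `re.fullmatch()` is shown as a Python-specific alternative to writing the anchors yourself.

```python
import re

value = "Invoice 2026-A"

# Without anchors, the pattern is found inside the longer text
found = re.search(r"\d{4}", value)

# With anchors, the whole string must be four digits, so this fails
anchored = re.search(r"^\d{4}$", value)

# The same anchored pattern succeeds on a clean four-digit value
exact = re.search(r"^\d{4}$", "2026")

# In Python, re.fullmatch() checks the whole string without ^ and $
full = re.fullmatch(r"\d{4}", "2026")

print(found.group(0))        # 2026
print(anchored)              # None
print(exact is not None)     # True
print(full is not None)      # True
```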

The word boundary pattern, \b, is useful when you want to match a whole word rather than a sequence of letters inside another word.

It does not match a letter, space or punctuation mark itself. Instead, it matches the position where a word character meets a non-word character, such as the edge between a word and a space, punctuation mark, or the start/end of the string.

For example, you might want to match cat as a complete word, but not the cat inside scatter or category.

Regex	Matches	Does Not Match
`\bcat\b`	`cat`	`scatter`, `category`
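
A quick way to check the word boundary behaviour is to run the pattern against a few strings. This is a small Python sketch with invented test strings.

```python
import re

pattern = r"\bcat\b"
samples = ["the cat sat", "scatter", "category", "cat."]

# Map each sample to True or False depending on whether cat appears as a whole word
results = {s: re.search(pattern, s) is not None for s in samples}

for sample, matched in results.items():
    print(sample, matched)
```

The last sample, `cat.`, still matches because the full stop is a non-word character, so there is a boundary between the t and the full stop.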

Groups, OR and Backreferences

Round brackets (parentheses) can be used to group part of a regex pattern. Groups are useful when you want to apply logic to one section of a pattern, such as choosing between two options. They are also useful when you want to refer back to something you have already matched.

Pattern	Meaning	Example
`()`	Creates a group	`(cat)` groups the word cat
`|`	OR	`cat|dog` matches cat or dog
`\1`	Refers back to the first captured group	`(\w+) \1` can find repeated words

For example, the OR symbol lets you match one option or another.

I like (cats|dogs)

Matches I like cats or I like dogs.
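
Here is a short Python sketch of the same pattern, with made-up sentences. It also shows why the group matters: without the brackets, the OR applies to the whole pattern rather than just the last word.

```python
import re

pattern = r"I like (cats|dogs)"

match = re.search(pattern, "I like dogs")
no_match = re.search(pattern, "I like birds")

# group(0) is the whole match, group(1) is what the brackets captured
print(match.group(0))  # I like dogs
print(match.group(1))  # dogs
print(no_match)        # None

# Without the group, this means "I like cats" OR "dogs"
ungrouped = re.search(r"I like cats|dogs", "dogs")
print(ungrouped.group(0))  # dogs
```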

Backreferences let you reuse a captured group later in the pattern. This can be useful for finding repeated words. For example:

\b(\w+) \1\b

This can match repeated words such as the the, very very or no no.

Here, (\w+) captures the first word. The \1 then says “match that same thing again”. So if the first group captures the, the backreference looks for the again.

Part	Meaning
`\b`	Start at a word boundary
`(\w+)`	Capture one or more word characters as a group
` ` (a space)	Match the space between the words
`\1`	Match the same text captured by the first group
`\b`	End at a word boundary
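
As a minimal sketch, the repeated-word pattern can be used in Python both to find duplicates and to remove them. The sample sentence is invented for the example.

```python
import re

pattern = r"\b(\w+) \1\b"
text = "It was the the best, very very good, no change"

# Collect each repeated pair as it appears in the text
repeats = [m.group(0) for m in re.finditer(pattern, text)]

# Replace each repeated pair with a single copy of the captured word
deduplicated = re.sub(pattern, r"\1", text)

print(repeats)       # ['the the', 'very very']
print(deduplicated)  # It was the best, very good, no change
```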

Lookaheads and Lookbehinds

Lookarounds are used when you want to match something based on what comes before or after it, without including that surrounding text in the result.

For example, you might want to extract the number after a pound sign, but not include the pound sign itself. Or you might want to extract the number before a percentage sign, but not include the percentage sign. A lookahead looks forwards. A lookbehind looks backwards.

Pattern	Name	Meaning
`(?=...)`	Positive lookahead	Match only if this comes next
`(?!...)`	Negative lookahead	Match only if this does not come next
`(?<=...)`	Positive lookbehind	Match only if this came before
`(?<!...)`	Negative lookbehind	Match only if this did not come before

Some useful examples include:

Goal	Regex	Example Result
Find digits after a pound sign	`(?<=£)\d+`	Matches `25` in `£25`
Find digits before a percentage sign	`\d+(?=%)`	Matches `75` in `75%`
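
The two examples above can be sketched in Python as follows, using a made-up sentence. I have also added a negative lookahead example. That third pattern is my own illustration rather than one from the table, and it uses `\b` so that the regex cannot match just part of a number.

```python
import re

text = "Price £25, discount 75%, code 40"

# Positive lookbehind: digits that come straight after a pound sign
after_pound = re.findall(r"(?<=£)\d+", text)

# Positive lookahead: digits that come straight before a percentage sign
before_percent = re.findall(r"\d+(?=%)", text)

# Negative lookahead: whole numbers that are NOT followed by a percentage sign
not_percent = re.findall(r"\b\d+\b(?!%)", text)

print(after_pound)     # ['25']
print(before_percent)  # ['75']
print(not_percent)     # ['25', '40']
```

Lookbehind support varies between tools, so it is worth checking the documentation if one of these patterns is rejected.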

Escaping Special Characters

Some characters have special meanings in regex.

For example, a full stop means “any character”, not “a literal full stop”. So if you want to match an actual full stop, you usually need to escape it with a backslash.

. means any character.

\. means a literal full stop.

To Match	Use
A full stop	`\.`
A question mark	`\?`
An opening bracket	`\(`
A closing bracket	`\)`
A plus sign	`\+`
An asterisk	`\*`
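
Here is a small Python sketch of the difference escaping makes. It also uses `re.escape()`, a Python helper that adds the backslashes for you when you want to search for a piece of literal text. The sample strings are invented.

```python
import re

text = "abc a.c"

# Unescaped, the full stop matches any character
unescaped = re.findall(r"a.c", text)

# Escaped, it only matches a literal full stop
escaped = re.findall(r"a\.c", text)

# re.escape() builds a pattern that matches the literal text exactly
literal = "1+1=2?"
safe_pattern = re.escape(literal)
found = re.search(safe_pattern, "Is 1+1=2? Yes.")

print(unescaped)       # ['abc', 'a.c']
print(escaped)         # ['a.c']
print(found.group(0))  # 1+1=2?
```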

Case-Insensitive Matching

Sometimes you want to match text regardless of whether it uses uppercase or lowercase letters. Depending on the tool, this might be handled by a setting. In some regex flavours, you can use a case-insensitive flag in the pattern itself.

Pattern	Meaning	Example Matches
`(?i)cat`	Case-insensitive match, if supported by the tool	`cat`, `Cat`, `CAT`
`[Cc]at`	Match either uppercase or lowercase C	`cat`, `Cat`

If your tool does not support the case-insensitive flag, you may need to use a setting or write the pattern differently.
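
In Python, for example, all three approaches are available. This is a minimal sketch with a made-up string; `re.IGNORECASE` is Python's version of the setting mentioned above.

```python
import re

words = "cat Cat CAT"

# Inline flag at the start of the pattern
inline_flag = re.findall(r"(?i)cat", words)

# The same behaviour using a setting passed to the function
with_setting = re.findall(r"cat", words, flags=re.IGNORECASE)

# A character set only covers the first letter
first_letter_only = re.findall(r"[Cc]at", words)

print(inline_flag)        # ['cat', 'Cat', 'CAT']
print(with_setting)       # ['cat', 'Cat', 'CAT']
print(first_letter_only)  # ['cat', 'Cat']
```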

Greedy and Lazy Matching

Regex quantifiers are usually greedy by default. This means they try to match as much as possible. This can be helpful, but sometimes it means regex grabs more text than you expected. Adding a question mark after the quantifier can make it lazy, meaning it matches as little as possible.

In the following example, we can see how the greedy version and the lazy version differ when looking at text inside and outside of quotation marks.

Pattern	Behaviour	Example Text	Example Match
`".*"`	Greedy: matches as much as possible	`"apple" and "banana"`	`"apple" and "banana"`
`".*?"`	Lazy: matches as little as possible	`"apple" and "banana"`	`"apple"`, then `"banana"`
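
The table above can be reproduced with a short Python sketch using the same example text.

```python
import re

text = '"apple" and "banana"'

# Greedy: .* runs to the last quotation mark it can find
greedy = re.findall(r'".*"', text)

# Lazy: .*? stops at the first quotation mark it can find
lazy = re.findall(r'".*?"', text)

print(greedy)  # ['"apple" and "banana"']
print(lazy)    # ['"apple"', '"banana"']
```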

Useful Example Patterns

Here are a few practical regex patterns using the building blocks above.

Goal	Regex	Example Match
Four-digit number	`^\d{4}$`	`2026`
One or more letters	`^[A-Za-z]+$`	`Hello`
Simple UK postcode-style pattern	`^[A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}$`	`SW1A 1AA`
Email domain	`(?<=@)[A-Za-z0-9.-]+`	`gmail.com`
Repeated word	`\b(\w+) \1\b`	`the the`
Digits after a pound sign	`(?<=£)\d+`	`25` in `£25`
Optional spelling	`colou?r`	`color` or `colour`
Digits before a percentage sign	`\d+(?=%)`	`75` in `75%`

The postcode example above is deliberately labelled as postcode-style rather than a perfect postcode validator. Real UK postcodes have more detailed rules, so this is a useful starting pattern rather than a complete validation system.
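
As a rough sketch, two of the patterns above could be wrapped in small Python helpers like this. The function names and the example email address are my own, made up for illustration, and the postcode check is only as strict as the simple pattern in the table.

```python
import re

# Pattern from the table: a postcode-style check, not a full validator
POSTCODE_STYLE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}$")

# Pattern from the table: everything after the @ that looks like a domain
EMAIL_DOMAIN = re.compile(r"(?<=@)[A-Za-z0-9.-]+")


def looks_like_postcode(value):
    """Return True if the whole value follows the postcode-style pattern."""
    return POSTCODE_STYLE.search(value) is not None


def email_domain(address):
    """Return the text after the @, or None if there is no match."""
    match = EMAIL_DOMAIN.search(address)
    return match.group(0) if match else None


print(looks_like_postcode("SW1A 1AA"))      # True
print(looks_like_postcode("sw1a 1aa"))      # False, the pattern expects uppercase
print(email_domain("someone@example.com"))  # example.com
print(email_domain("no at sign here"))      # None
```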

Final Reference Table

Here is a quick summary of the main symbols.

Regex	Meaning
Character Types
`.`	Any single character
`\d`	Any digit
`\w`	Any word character
`\s`	Any whitespace character
Character Sets
`[A-Z]`	Any uppercase letter
`[^A-Z]`	Anything except an uppercase letter
Quantifiers
`+`	One or more
`*`	Zero or more
`?`	Optional / zero or one
`{3}`	Exactly three
`{2,4}`	Between two and four
Anchors and Boundaries
`^`	Start of string
`$`	End of string
`\b`	Word boundary
Groups and Backreferences
`()`	Group
`|`	OR
`\1`	Backreference to the first group
Lookarounds
`(?=...)`	Positive lookahead
`(?!...)`	Negative lookahead
`(?<=...)`	Positive lookbehind
`(?<!...)`	Negative lookbehind

Regex is easiest to learn by building patterns in small pieces. Rather than trying to write the whole thing at once, start with one part. Choose the type of character you want, choose how many of it you want, then decide whether the pattern needs to appear anywhere or match the whole string.
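
Here is one way that piece-by-piece approach might look in Python. The ID format (two uppercase letters, a hyphen, then four digits) is a hypothetical example I have made up to show the steps.

```python
import re

# Step 1 and 2: choose the type of character, then how many
letters = r"[A-Z]{2}"  # uppercase letters, exactly two
digits = r"\d{4}"      # digits, exactly four
core = letters + "-" + digits

# Step 3: decide whether it can appear anywhere or must be the whole string
anywhere = re.compile(core)
whole_string = re.compile("^" + core + "$")

found = anywhere.search("ref AB-1234 sent")
rejected = whole_string.search("ref AB-1234 sent")
accepted = whole_string.search("AB-1234")

print(found.group(0))        # AB-1234
print(rejected)              # None
print(accepted is not None)  # True
```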

If you want to practise regex to get to grips with it, I've found https://regex101.com/ especially useful for doing exercises and sense-checking my patterns.

Author:

Holly Andersen
