Regular Expressions for Beginners: Complete Regex Tutorial

Regular expressions (regex) are powerful patterns for matching and manipulating text. This guide takes you from zero to writing your own patterns for validation, search, and text processing.

What is Regex?

A regular expression (regex or regexp) is a sequence of characters that defines a search pattern. Think of it as a sophisticated "find" feature that can match complex patterns instead of just exact text.

Common Use Cases

Validation: Check if input matches a format (email, phone, password)
Search: Find patterns in text (all URLs, dates, IP addresses)
Replace: Transform text (reformat dates, clean data)
Extract: Pull specific data from strings (parse logs, scrape content)

Try it yourself: Use our Regex Tester to practice patterns as you learn!

Basic Patterns

Literal Characters

Most characters match themselves literally:

Pattern: cat
Matches: "cat" in "The cat sat on the mat"
         ^^^

The Dot (.)

The dot matches any single character (except newline):

Pattern: c.t
Matches: "cat", "cot", "cut", "c@t", "c1t"
Does NOT match: "ct", "cart"
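
To check the dot's one-character rule in code, here is a minimal sketch using Python's standard `re` module. Python and the `candidates` list are illustrative choices, not part of the original examples. `re.fullmatch` only succeeds when the whole string fits the pattern.

```python
import re

# Candidate strings taken from the example above.
candidates = ["cat", "cot", "cut", "c@t", "c1t", "ct", "cart"]

# fullmatch() succeeds only if the entire string fits the pattern,
# so "ct" (nothing for the dot) and "cart" (two characters) are rejected.
dot_matches = [s for s in candidates if re.fullmatch(r"c.t", s)]

print(dot_matches)  # ['cat', 'cot', 'cut', 'c@t', 'c1t']
```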

Escaping Special Characters

Special characters need a backslash to match literally:

Special characters: . * + ? ^ $ { } [ ] \ | ( )

Pattern: 3\.14
Matches: "3.14" (the literal dot)

Pattern: \$100
Matches: "$100" (the literal dollar sign)
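
The effect of escaping is easiest to see side by side. This is a small Python sketch (the sample strings and variable names are assumptions for illustration). It also uses `re.escape`, a standard-library helper that adds the backslashes for you when the literal text comes from elsewhere.

```python
import re

samples = ["3.14", "3x14", "3-14"]

# Unescaped dot: any single character may sit between "3" and "14".
loose_hits = [s for s in samples if re.fullmatch(r"3.14", s)]
# Escaped dot: only a literal "." is accepted.
strict_hits = [s for s in samples if re.fullmatch(r"3\.14", s)]

# re.escape() turns arbitrary literal text into a safe pattern.
price = re.compile(re.escape("$100"))
found = price.search("Total: $100 due")

print(loose_hits)     # ['3.14', '3x14', '3-14']
print(strict_hits)    # ['3.14']
print(found.group())  # $100
```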

Character Classes

Match one character from a set of characters.

Square Brackets [ ]

Pattern: [aeiou]
Matches: Any single vowel

Pattern: [0-9]
Matches: Any single digit (0 through 9)

Pattern: [a-zA-Z]
Matches: Any letter (upper or lower case)

Pattern: [^0-9]
Matches: Any character that is NOT a digit

Shorthand Character Classes

Shorthand	Equivalent	Meaning
`\d`	`[0-9]`	Any digit
`\D`	`[^0-9]`	Any non-digit
`\w`	`[a-zA-Z0-9_]`	Word character
`\W`	`[^a-zA-Z0-9_]`	Non-word character
`\s`	`[ \t\n\r\f]`	Whitespace
`\S`	`[^ \t\n\r\f]`	Non-whitespace
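
A short Python sketch of these classes in use (the sample text is invented for illustration). One caveat specific to Python 3: on ordinary strings, `\d`, `\w`, and `\s` also match Unicode digits, letters, and whitespace; passing `re.ASCII` restricts them to the ASCII sets shown in the table.

```python
import re

text = "Order 66 shipped to user_42 on 2024-03-15"

# \d+ collects runs of digits; \w+ collects runs of word characters.
digit_runs = re.findall(r"\d+", text)
word_runs = re.findall(r"\w+", text)

# A bracket class matches one character from the listed set.
vowels = re.findall(r"[aeiou]", "regex")

# A negated class is handy for stripping everything except digits.
digits_only = re.sub(r"[^0-9]", "", "(555) 123-4567")

print(digit_runs)   # ['66', '42', '2024', '03', '15']
print(word_runs)    # ['Order', '66', 'shipped', 'to', 'user_42', 'on', '2024', '03', '15']
print(vowels)       # ['e', 'e']
print(digits_only)  # 5551234567
```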

Quantifiers

Specify how many times a pattern should match.

Quantifier	Meaning	Example
`*`	0 or more	`ab*c` matches "ac", "abc", "abbc"
`+`	1 or more	`ab+c` matches "abc", "abbc" (not "ac")
`?`	0 or 1	`colou?r` matches "color", "colour"
`{n}`	Exactly n	`\d{4}` matches "2024"
`{n,}`	n or more	`\d{2,}` matches 2+ digits
`{n,m}`	Between n and m	`\d{2,4}` matches 2-4 digits
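
The table rows can be checked with a small Python sketch. The `matches` helper and the candidate lists are hypothetical, written only for this illustration; `re.fullmatch` is used so that each string must fit the pattern from start to end.

```python
import re

def matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

star = matches(r"ab*c", ["ac", "abc", "abbc", "abd"])          # zero or more b
plus = matches(r"ab+c", ["ac", "abc", "abbc"])                 # at least one b
optional = matches(r"colou?r", ["color", "colour", "colouur"]) # u is optional
ranged = matches(r"\d{2,4}", ["1", "12", "1234", "12345"])     # 2 to 4 digits

print(star)      # ['ac', 'abc', 'abbc']
print(plus)      # ['abc', 'abbc']
print(optional)  # ['color', 'colour']
print(ranged)    # ['12', '1234']
```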

Greedy vs Lazy

Text: <div>Hello</div><div>World</div>

Greedy: <.*>
Matches: "<div>Hello</div><div>World</div>" (everything)

Lazy: <.*?>
Matches: "<div>" then "</div>" then "<div>" then "</div>"

Add ? after a quantifier to make it lazy (match as few as possible).
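
The same comparison in Python, as a minimal sketch (variable names are illustrative). `re.findall` returns every non-overlapping match, which makes the difference between one long match and four short ones visible.

```python
import re

html = "<div>Hello</div><div>World</div>"

# Greedy: .* runs to the end, then backs up to the LAST ">".
greedy = re.findall(r"<.*>", html)
# Lazy: .*? stops at the FIRST ">" it can reach.
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<div>Hello</div><div>World</div>']
print(lazy)    # ['<div>', '</div>', '<div>', '</div>']
```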

Anchors & Boundaries

Match positions, not characters.

Anchor	Meaning	Example
`^`	Start of string/line	`^Hello` matches "Hello world"
`$`	End of string/line	`world$` matches "Hello world"
`\b`	Word boundary	`\bcat\b` matches "cat" not "cats"
`\B`	Not word boundary	`\Bcat` matches "bobcat"

Text: "the cat scattered the cats"

Pattern: cat
Matches: "cat" (3 times - in cat, scattered, cats)

Pattern: \bcat\b
Matches: "cat" (1 time - only the word "cat")
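
A Python sketch of anchors and boundaries, assuming the standard `re` module (the variable names and the two-line sample string are made up for illustration). It reports the start offset of each match so you can see where in the text the pattern hit, and it shows how `re.MULTILINE` changes `^` from "start of string" to "start of each line".

```python
import re

text = "the cat scattered the cats"

# Start offsets of every "cat", including those inside longer words.
anywhere = [m.start() for m in re.finditer(r"cat", text)]
# \b on both sides keeps only the standalone word.
whole_word = [m.start() for m in re.finditer(r"\bcat\b", text)]

# ^ anchors the match to the start of the string.
starts_with_hello = re.search(r"^Hello", "Hello world") is not None
hello_later = re.search(r"^Hello", "Say Hello") is not None

# With re.MULTILINE, ^ also matches at the start of every line.
first_words = re.findall(r"^\w+", "one two\nthree four", flags=re.MULTILINE)

print(anywhere)           # [4, 9, 22]
print(whole_word)         # [4]
print(starts_with_hello)  # True
print(hello_later)        # False
print(first_words)        # ['one', 'three']
```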

Groups & Capturing

Parentheses ( )

Group parts of a pattern together:

Pattern: (ab)+
Matches: "ab", "abab", "ababab"

Pattern: (Mr|Mrs|Ms)\.?\s\w+
Matches: "Mr. Smith", "Mrs Jones", "Ms. Lee"

Capturing Groups

Parentheses also "capture" matched text for later use:

Pattern: (\d{4})-(\d{2})-(\d{2})
Text: "2024-03-15"

Group 0 (full match): "2024-03-15"
Group 1: "2024"
Group 2: "03"
Group 3: "15"
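
In Python, the group numbers above map directly onto the match object. A minimal sketch (variable names are illustrative):

```python
import re

m = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", "2024-03-15")
if m is None:
    raise ValueError("input is not in YYYY-MM-DD form")

full = m.group(0)              # group 0 is always the full match
year, month, day = m.groups()  # groups 1..3, in order

print(full)              # 2024-03-15
print(year, month, day)  # 2024 03 15
```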

Non-Capturing Groups

Pattern: (?:Mr|Mrs|Ms)\.?\s(\w+)
Captures only the name (as group 1), not the title: (?:...) groups without capturing

Backreferences

Pattern: (\w+)\s+\1
Matches repeated words: "the the", "is is"

Pattern: (['"]).*?\1
Matches quoted strings with matching quotes
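
The following Python sketch exercises the non-capturing group and both backreference patterns. The sample sentences are invented. The `\b`-wrapped variant of the repeated-word pattern is a suggested hardening, not part of the original: without boundaries, the pattern also fires on the "is is" hidden inside "this is".

```python
import re

# (?:...) groups the titles without capturing, so group 1 is the name.
m = re.search(r"(?:Mr|Mrs|Ms)\.?\s(\w+)", "Dear Mrs. Jones,")
name = m.group(1) if m else None

# \1 must repeat exactly what group 1 matched.
doubled = re.findall(r"(\w+)\s+\1", "it was the the best")

# Without word boundaries the pattern also matches inside "this is".
naive_hit = re.search(r"(\w+)\s+\1", "this is fine")
strict_hit = re.search(r"\b(\w+)\s+\1\b", "this is fine")

# The backreference forces the closing quote to equal the opening one.
quote_re = re.compile(r"""(['"]).*?\1""")
quoted = [q.group(0) for q in quote_re.finditer("say \"it's fine\" or 'no'")]

print(name)                # Jones
print(doubled)             # ['the']
print(naive_hit.group(0))  # is is
print(strict_hit)          # None
print(quoted)              # ['"it\'s fine"', "'no'"]
```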

Practical Examples

Email Validation

Pattern: ^[\w.-]+@[\w.-]+\.\w{2,}$

Breakdown:
^           Start of string
[\w.-]+     Username: letters, numbers, underscores, dots, hyphens
@           Literal @
[\w.-]+     Domain name
\.          Literal dot
\w{2,}      TLD (2+ word characters; \w also allows digits and _)
$           End of string

Matches: user@toolsdock.com, john.doe@company.co.uk
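
A minimal Python sketch of this check. The function name `looks_like_email` and the extra sample strings are assumptions for illustration. The pattern is copied verbatim; `fullmatch` is used because in Python a bare `$` also matches just before a trailing newline. Treat the result as a rough format check: the pattern is deliberately simple and does not prove an address exists.

```python
import re

EMAIL = re.compile(r"^[\w.-]+@[\w.-]+\.\w{2,}$")

def looks_like_email(candidate: str) -> bool:
    """Rough format check only; says nothing about deliverability."""
    return EMAIL.fullmatch(candidate) is not None

samples = [
    "user@toolsdock.com",
    "john.doe@company.co.uk",
    "user@host",        # no dot + TLD after the domain
    "a@b.c",            # TLD shorter than 2 characters
    "no-at-sign.com",   # missing @
]
results = {s: looks_like_email(s) for s in samples}

for s, ok in results.items():
    print(f"{s!r}: {ok}")
```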

Phone Numbers (US)

Pattern: ^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$

Matches:
(555) 123-4567
555-123-4567
555.123.4567
5551234567
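
Because the pattern captures the three digit blocks, it can normalize input as well as validate it. A hedged Python sketch follows; `normalize_us_phone` and its dash-separated output format are hypothetical choices for illustration.

```python
import re
from typing import Optional

US_PHONE = re.compile(r"^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$")

def normalize_us_phone(raw: str) -> Optional[str]:
    """Return the number as 555-123-4567, or None if it does not match."""
    m = US_PHONE.fullmatch(raw)
    if m is None:
        return None
    area, prefix, line = m.groups()
    return f"{area}-{prefix}-{line}"

inputs = ["(555) 123-4567", "555-123-4567", "555.123.4567", "5551234567", "555-1234"]
normalized = [normalize_us_phone(s) for s in inputs]

print(normalized)
# ['555-123-4567', '555-123-4567', '555-123-4567', '555-123-4567', None]
```

Note that the parentheses are matched independently of each other, so an unbalanced input such as "(555-123-4567" is also accepted.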

URL Matching

Pattern: https?://[\w.-]+(?:/[\w./-]*)?

Matches:
https://toolsdock.com/
https://sub.domain.com/page.html
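
To pull URLs out of running text, the pattern can be passed to `findall`. This Python sketch uses an invented sentence and an `example.com` address as assumptions. It also shows a limit of the pattern: the path class does not include `?`, so query strings are cut off.

```python
import re

URL = re.compile(r"https?://[\w.-]+(?:/[\w./-]*)?")

text = "See https://toolsdock.com/ or http://sub.domain.com/page.html for details"
urls = URL.findall(text)

# "?" is not in the path character class, so the match stops before it.
truncated = URL.findall("https://example.com/search?q=regex")

print(urls)       # ['https://toolsdock.com/', 'http://sub.domain.com/page.html']
print(truncated)  # ['https://example.com/search']
```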

Password Validation

Pattern: ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

Requirements:
- At least 8 characters
- At least one lowercase letter
- At least one uppercase letter
- At least one digit
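- At least one special character from the set @ $ ! % * ? &
- No characters other than letters, digits, and those special characters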
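
Each `(?=...)` is a lookahead: it checks a condition from the start of the string without consuming any characters, so all four conditions are tested before the final character class counts the length. A minimal Python sketch (the function name and sample passwords are illustrative):

```python
import re

PASSWORD = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

def is_valid_password(candidate: str) -> bool:
    return PASSWORD.fullmatch(candidate) is not None

checks = {
    "Passw0rd!": is_valid_password("Passw0rd!"),    # meets every rule
    "password1!": is_valid_password("password1!"),  # no uppercase letter
    "Passw0rd#": is_valid_password("Passw0rd#"),    # "#" is not in the allowed set
    "Pw0!": is_valid_password("Pw0!"),              # shorter than 8 characters
}

for pw, ok in checks.items():
    print(f"{pw!r}: {ok}")
```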

Extract Hashtags

Pattern: #\w+

Text: "Learning #regex is #awesome! #programming"
Matches: #regex, #awesome, #programming
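
In Python this extraction is a single `findall` call. A minimal sketch:

```python
import re

text = "Learning #regex is #awesome! #programming"

# \w+ stops at "!" and at spaces, so punctuation is not included.
hashtags = re.findall(r"#\w+", text)

print(hashtags)  # ['#regex', '#awesome', '#programming']
```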

Date Reformatting

Pattern: (\d{2})/(\d{2})/(\d{4})
Replace: $3-$1-$2

Input:  03/15/2024
Output: 2024-03-15
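
Replacement syntax varies between tools. The `$3-$1-$2` form above is what JavaScript and many editors use; Python's `re.sub` writes group references as `\3-\1-\2` instead. A short sketch, with `us_to_iso` as a hypothetical helper name:

```python
import re

def us_to_iso(text: str) -> str:
    """Rewrite every MM/DD/YYYY date in the text as YYYY-MM-DD."""
    return re.sub(r"(\d{2})/(\d{2})/(\d{4})", r"\3-\1-\2", text)

single = us_to_iso("03/15/2024")
in_sentence = us_to_iso("Due 03/15/2024, paid 12/01/2023")

print(single)       # 2024-03-15
print(in_sentence)  # Due 2024-03-15, paid 2023-12-01
```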

Quick Reference Cheatsheet

Characters

.     Any character except newline
\d    Digit [0-9]
\D    Non-digit
\w    Word char [a-zA-Z0-9_]
\W    Non-word char
\s    Whitespace
\S    Non-whitespace

Quantifiers

*     0 or more
+     1 or more
?     0 or 1
{3}   Exactly 3
{3,}  3 or more
{3,5} Between 3 and 5

Anchors

^     Start of string
$     End of string
\b    Word boundary
\B    Non-word boundary

Groups

(...)   Capturing group
(?:...) Non-capturing
\1      Backreference
|       Alternation (or)

Common Mistakes to Avoid

Forgetting to escape special characters: \. not . for literal dot
Greedy matching: Use .*? instead of .* when needed
Missing anchors: Use ^ and $ for full-string validation
Overcomplicating: Simple patterns are easier to maintain
Not testing edge cases: Always test with various inputs

Practice Tools

Regex Tester

Regex Replacer

Frequently Asked Questions

What does the dot (.) match in regex?

The dot matches any single character except newline. Pattern 'c.t' matches 'cat', 'cot', 'cut', 'c@t', but not 'ct' (needs exactly one character between c and t) or 'cart' (. matches only one character). To match a literal dot, escape it: \.

What is the difference between * and +?

* matches zero or more occurrences (optional, can repeat). + matches one or more (required at least once). Pattern 'ab*c' matches 'ac', 'abc', 'abbc'. Pattern 'ab+c' matches 'abc', 'abbc' but NOT 'ac' because at least one 'b' is required.

How do I match the start or end of a string?

Use ^ for start and $ for end. Pattern '^Hello' matches 'Hello world' but not 'Say Hello'. Pattern 'world$' matches 'Hello world' but not 'world peace'. Combine them: '^exact$' matches only the exact string 'exact'.

What do parentheses do in regex?

Parentheses () create capturing groups that extract matched text for later use. Pattern '(\d{4})-(\d{2})-(\d{2})' on '2024-03-15' captures: group 1 = '2024', group 2 = '03', group 3 = '15'. Use them to extract parts of a match or to refer back to it with backreferences.

Why isn't my regex matching?

Common issues: forgetting to escape special characters (use \. for literal dot), using greedy instead of lazy quantifiers (use .*? instead of .*), missing anchors (^$), or not accounting for whitespace. Test with a regex tester to see exactly what matches.

What is the difference between greedy and lazy matching?

Greedy quantifiers (*, +, {n,}) match as much as possible. Lazy quantifiers (*?, +?, {n,}?) match as little as possible. On '<div>text</div>', pattern '<.*>' greedily matches the entire string, while '<.*?>' lazily matches just '<div>'.

How do I match a whole word only?

Use word boundaries \b around the pattern. Pattern '\bcat\b' matches 'cat' in 'the cat sat' but not in 'category' or 'bobcat'. \b matches the position between a word character (\w) and a non-word character, or between a word character and the start or end of the string.

What do \d, \w, and \s mean?

These are shorthand character classes. \d matches any digit [0-9]. \w matches word characters [a-zA-Z0-9_]. \s matches whitespace (space, tab, newline). Uppercase versions match the opposite: \D (non-digit), \W (non-word), \S (non-whitespace).

Is regex case-sensitive?

Yes, by default regex is case-sensitive: the pattern 'Cat' won't match the text 'cat'. Use the 'i' flag for case-insensitive matching: /cat/i matches 'Cat', 'CAT', 'cat'. In character classes, [a-zA-Z] explicitly matches both cases without flags.
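
The /cat/i notation is the JavaScript-style literal. In Python, the equivalent is the `re.IGNORECASE` flag or the inline `(?i)` prefix, as in this minimal sketch (the sample string is illustrative):

```python
import re

text = "Cat CAT cat"

case_sensitive = re.findall(r"cat", text)
with_flag = re.findall(r"cat", text, flags=re.IGNORECASE)
inline_flag = re.findall(r"(?i)cat", text)  # same effect, set inside the pattern

print(case_sensitive)  # ['cat']
print(with_flag)       # ['Cat', 'CAT', 'cat']
print(inline_flag)     # ['Cat', 'CAT', 'cat']
```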