Regular Expressions for Beginners: Complete Regex Tutorial
Regular expressions (regex) are powerful patterns for matching and manipulating text. This guide takes you from zero to writing your own patterns for validation, search, and text processing.
What is Regex?
A regular expression (regex or regexp) is a sequence of characters that defines a search pattern. Think of it as a sophisticated "find" feature that can match complex patterns instead of just exact text.
Common Use Cases
- Validation: Check if input matches a format (email, phone, password)
- Search: Find patterns in text (all URLs, dates, IP addresses)
- Replace: Transform text (reformat dates, clean data)
- Extract: Pull specific data from strings (parse logs, scrape content)
Basic Patterns
Literal Characters
Most characters match themselves literally:
Pattern: cat
Matches: "cat" in "The cat sat on the mat"
                       ^^^
The Dot (.)
The dot matches any single character (except newline):
Pattern: c.t
Matches: "cat", "cot", "cut", "c@t", "c1t"
Does NOT match: "ct", "cart"
Escaping Special Characters
Special characters need a backslash to match literally:
Special characters: . * + ? ^ $ { } [ ] \ | ( )
Pattern: 3\.14
Matches: "3.14" (the literal dot)
Pattern: \$100
Matches: "$100" (the literal dollar sign)
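As a small sketch of why escaping matters, the Python snippet below compares an escaped and an unescaped dot. The sample sentence and variable names are made up for illustration; `re.escape` is the standard-library helper that adds the backslashes for you.

```python
import re

# Illustrative sample text (not from any real data set).
text = "Total: $100, pi is about 3.14, not 3x14"

# Unescaped dot: matches any character, so "3x14" is found too.
loose = re.findall(r"3.14", text)

# Escaped dot: matches only a literal ".".
strict = re.findall(r"3\.14", text)

# re.escape() backslash-escapes special characters in arbitrary text,
# which is handy when the search term comes from a variable.
needle = re.escape("$100")
found = re.findall(needle, text)

print(loose)   # ['3.14', '3x14']
print(strict)  # ['3.14']
print(found)   # ['$100']
```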
Character Classes
Match one character from a set of characters.
Square Brackets [ ]
Pattern: [aeiou]
Matches: Any single vowel
Pattern: [0-9]
Matches: Any single digit (0 through 9)
Pattern: [a-zA-Z]
Matches: Any letter (upper or lower case)
Pattern: [^0-9]
Matches: Any character that is NOT a digit
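The four bracket patterns above can be tried in Python with `re.findall`, which returns every match as a list. The sample string is an arbitrary choice for this sketch.

```python
import re

text = "Regex 101"  # illustrative input

vowels = re.findall(r"[aeiou]", text)       # lowercase vowels only
digits = re.findall(r"[0-9]", text)         # any single digit
letters = re.findall(r"[a-zA-Z]", text)     # upper or lower case letters
non_digits = re.findall(r"[^0-9]", text)    # ^ inside [] negates the set

print(vowels)      # ['e', 'e']
print(digits)      # ['1', '0', '1']
print(letters)     # ['R', 'e', 'g', 'e', 'x']
print(non_digits)  # ['R', 'e', 'g', 'e', 'x', ' ']
```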
Shorthand Character Classes
| Shorthand | Equivalent | Meaning |
|---|---|---|
| \d | [0-9] | Any digit |
| \D | [^0-9] | Any non-digit |
| \w | [a-zA-Z0-9_] | Word character |
| \W | [^a-zA-Z0-9_] | Non-word character |
| \s | [ \t\n\r\f] | Whitespace |
| \S | [^ \t\n\r\f] | Non-whitespace |
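A quick way to see the shorthands in action is the sketch below. It assumes Python, where `\d` and `\w` are Unicode-aware by default for text patterns; passing the `re.ASCII` flag restricts them to the ASCII equivalents listed in the table. The sample string is illustrative.

```python
import re

text = "Room 42, Bed_7"  # illustrative input

digits = re.findall(r"\d", text)      # single digits
words = re.findall(r"\w+", text)      # runs of word characters
non_words = re.findall(r"\W", text)   # everything that is not a word character
spaces = re.findall(r"\s", text)      # whitespace characters

# re.ASCII limits \d to [0-9] and \w to [a-zA-Z0-9_].
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(digits)     # ['4', '2', '7']
print(words)      # ['Room', '42', 'Bed_7']
print(non_words)  # [' ', ',', ' ']
print(spaces)     # [' ', ' ']
```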
Quantifiers
Specify how many times a pattern should match.
| Quantifier | Meaning | Example |
|---|---|---|
| * | 0 or more | ab*c matches "ac", "abc", "abbc" |
| + | 1 or more | ab+c matches "abc", "abbc" (not "ac") |
| ? | 0 or 1 | colou?r matches "color", "colour" |
| {n} | Exactly n | \d{4} matches "2024" |
| {n,} | n or more | \d{2,} matches 2+ digits |
| {n,m} | Between n and m | \d{2,4} matches 2-4 digits |
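The table rows can be checked with a short Python sketch. The helper `matches_fully` is a hypothetical name introduced here; it uses `re.fullmatch` so the whole string has to fit the pattern.

```python
import re

def matches_fully(pattern: str, text: str) -> bool:
    """Return True when the entire string matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# (pattern, input) pairs taken from the quantifier table.
cases = [
    (r"ab*c", "ac"),        # * allows zero b's
    (r"ab+c", "ac"),        # + needs at least one b
    (r"colou?r", "colour"), # ? makes the u optional
    (r"\d{4}", "2024"),     # exactly four digits
    (r"\d{2,4}", "12345"),  # five digits is too many
]

results = [matches_fully(p, t) for p, t in cases]
print(results)  # [True, False, True, True, False]
```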
Greedy vs Lazy
Text: <div>Hello</div><div>World</div>
Greedy: <.*>
Matches: "<div>Hello</div><div>World</div>" (everything)
Lazy: <.*?>
Matches: "<div>" then "</div>" then "<div>" then "</div>"
Add ? after a quantifier to make it lazy (match as few as possible).
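The difference is easy to reproduce in Python. This sketch runs both patterns over the same text used above.

```python
import re

html = "<div>Hello</div><div>World</div>"

# Greedy: .* grabs as much as possible, then backs off to the last ">".
greedy = re.findall(r"<.*>", html)

# Lazy: .*? stops at the first ">" it can reach.
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<div>Hello</div><div>World</div>']
print(lazy)    # ['<div>', '</div>', '<div>', '</div>']
```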
Anchors & Boundaries
Match positions, not characters.
| Anchor | Meaning | Example |
|---|---|---|
| ^ | Start of string/line | ^Hello matches "Hello world" |
| $ | End of string/line | world$ matches "Hello world" |
| \b | Word boundary | \bcat\b matches "cat" not "cats" |
| \B | Not word boundary | \Bcat matches "bobcat" |
Text: "the cat scattered the cats"
Pattern: cat
Matches: "cat" (3 times - in cat, scattered, cats)
Pattern: \bcat\b
Matches: "cat" (1 time - only the word "cat")
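Here is a minimal Python sketch of the same comparison, plus the string/line distinction for `^`. In Python, `^` and `$` anchor to the whole string unless the `re.MULTILINE` flag is set; the two-line sample is made up for illustration.

```python
import re

text = "the cat scattered the cats"

substring_hits = re.findall(r"cat", text)      # any "cat", even inside words
whole_word_hits = re.findall(r"\bcat\b", text) # only the standalone word

lines = "Hello world\nHello again"
first_only = re.findall(r"^Hello", lines)                      # start of string
every_line = re.findall(r"^Hello", lines, flags=re.MULTILINE)  # start of each line

print(len(substring_hits))   # 3
print(len(whole_word_hits))  # 1
print(len(first_only))       # 1
print(len(every_line))       # 2
```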
Groups & Capturing
Parentheses ( )
Group parts of a pattern together:
Pattern: (ab)+
Matches: "ab", "abab", "ababab"
Pattern: (Mr|Mrs|Ms)\.?\s\w+
Matches: "Mr. Smith", "Mrs Jones", "Ms. Lee"
Capturing Groups
Parentheses also "capture" matched text for later use:
Pattern: (\d{4})-(\d{2})-(\d{2})
Text: "2024-03-15"
Group 0 (full match): "2024-03-15"
Group 1: "2024"
Group 2: "03"
Group 3: "15"
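In Python, the groups listed above are available on the match object. This is a small sketch; the surrounding sentence is an illustrative input.

```python
import re

match = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2024-03-15.")

full = match.group(0)                # the full match
year, month, day = match.groups()    # groups 1, 2, and 3 as a tuple

print(full)              # 2024-03-15
print(year, month, day)  # 2024 03 15
```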
Non-Capturing Groups
Pattern: (?:Mr|Mrs|Ms)\.?\s(\w+)
Only captures the name, not the title
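The effect shows up clearly with Python's `re.findall`, which returns captured groups when a pattern has any. The sample sentence is an assumption made for this sketch.

```python
import re

text = "Mr. Smith met Mrs Jones"  # illustrative input

# Capturing group around the title: each result is a (title, name) tuple.
with_title = re.findall(r"(Mr|Mrs|Ms)\.?\s(\w+)", text)

# Non-capturing group: only the name is captured.
names_only = re.findall(r"(?:Mr|Mrs|Ms)\.?\s(\w+)", text)

print(with_title)  # [('Mr', 'Smith'), ('Mrs', 'Jones')]
print(names_only)  # ['Smith', 'Jones']
```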
Backreferences
Pattern: (\w+)\s+\1
Matches repeated words: "the the", "is is"
Pattern: (['"]).*?\1
Matches quoted strings with matching quotes
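Both backreference patterns can be tried in Python as sketched below. The sample strings are illustrative. Note that without a `\b` the repeated-word pattern can also match inside longer words, so real use usually adds word boundaries.

```python
import re

# Repeated word: \1 must equal whatever group 1 captured.
dup = re.search(r"(\w+)\s+\1", "Paris in the the spring")
repeated = dup.group(0)
word = dup.group(1)

# Quoted strings: the closing quote must be the same character as the opener.
quote_text = "He said \"it's ok\" and 'bye'"
quoted = [m.group(0) for m in re.finditer(r"(['\"]).*?\1", quote_text)]

print(repeated)  # the the
print(word)      # the
print(quoted)
```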
Practical Examples
Email Validation
Pattern: ^[\w.-]+@[\w.-]+\.\w{2,}$
Breakdown:
^ Start of string
[\w.-]+ Username: letters, digits, underscores, dots, hyphens
@ Literal @
[\w.-]+ Domain name
\. Literal dot
\w{2,} TLD (2+ word characters: letters, digits, underscore)
$ End of string
Matches: user@toolsdock.com, john.doe@company.co.uk
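One way to wrap this pattern in Python is sketched below. `is_email` is a hypothetical helper name, and the rejected inputs are illustrative. The pattern is a format check only: it is deliberately permissive and does not prove an address exists.

```python
import re

EMAIL_RE = re.compile(r"^[\w.-]+@[\w.-]+\.\w{2,}$")

def is_email(candidate: str) -> bool:
    """Rough format check; fullmatch requires the whole string to fit."""
    return EMAIL_RE.fullmatch(candidate) is not None

print(is_email("user@toolsdock.com"))      # True
print(is_email("john.doe@company.co.uk"))  # True
print(is_email("no-at-sign.com"))          # False
print(is_email("user@localhost"))          # False (no dot plus TLD)
```

Using `fullmatch` also rejects a trailing newline, which `$` on its own tolerates in Python.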
Phone Numbers (US)
Pattern: ^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$
Matches:
(555) 123-4567
555-123-4567
555.123.4567
5551234567
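Because the pattern captures the three digit groups, it can also normalize the accepted formats. The sketch below assumes Python; `normalize_phone` is a hypothetical helper and the output format is an arbitrary choice.

```python
import re
from typing import Optional

PHONE_RE = re.compile(r"^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$")

def normalize_phone(raw: str) -> Optional[str]:
    """Return the number as NNN-NNN-NNNN, or None if it does not match."""
    match = PHONE_RE.fullmatch(raw)
    if match is None:
        return None
    area, prefix, line = match.groups()
    return f"{area}-{prefix}-{line}"

samples = ["(555) 123-4567", "555-123-4567", "555.123.4567", "5551234567"]
normalized = [normalize_phone(s) for s in samples]

print(normalized)                   # four copies of '555-123-4567'
print(normalize_phone("555-1234"))  # None
```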
URL Matching
Pattern: https?://[\w.-]+(?:/[\w./-]*)?
Matches:
https://toolsdock.com/
https://sub.domain.com/page.html
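A sketch of pulling URLs out of running text in Python follows. The sentence is illustrative. This simple pattern stops at characters outside its sets, so query strings such as `?q=1` are cut off, and a trailing period directly after a path would be included in the match.

```python
import re

URL_RE = re.compile(r"https?://[\w.-]+(?:/[\w./-]*)?")

# Illustrative sentence containing two http(s) URLs and one ftp URL.
text = ("Docs at https://toolsdock.com/ and "
        "https://sub.domain.com/page.html, see also ftp://old.example.org")

# The group is non-capturing, so findall returns the full matches.
urls = URL_RE.findall(text)

print(urls)  # ['https://toolsdock.com/', 'https://sub.domain.com/page.html']
```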
Password Validation
Pattern: ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
Requirements:
- At least 8 characters, using only letters, digits, and @$!%*?&
- At least one lowercase letter
- At least one uppercase letter
- At least one digit
- At least one special character from @$!%*?&
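Each `(?=...)` is a lookahead: it checks a requirement without consuming characters, and the final character class then limits which characters may appear at all. A minimal Python sketch follows, with `is_strong` as a hypothetical helper and made-up sample passwords.

```python
import re

PASSWORD_RE = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

def is_strong(password: str) -> bool:
    """True when every lookahead passes and all characters are allowed."""
    return PASSWORD_RE.fullmatch(password) is not None

print(is_strong("Passw0rd!"))   # True
print(is_strong("password1!"))  # False: no uppercase letter
print(is_strong("Passw0rd"))    # False: no special character
print(is_strong("Pw0!"))        # False: shorter than 8
print(is_strong("Passw0rd!#"))  # False: "#" is not in the allowed set
```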
Extract Hashtags
Pattern: #\w+
Text: "Learning #regex is #awesome! #programming"
Matches: #regex, #awesome, #programming
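In Python this is a one-liner with `re.findall`. Adding a capturing group around `\w+` is an optional variation that returns the tags without the leading `#`.

```python
import re

text = "Learning #regex is #awesome! #programming"

hashtags = re.findall(r"#\w+", text)     # full matches, "#" included
tag_names = re.findall(r"#(\w+)", text)  # only the captured part

print(hashtags)   # ['#regex', '#awesome', '#programming']
print(tag_names)  # ['regex', 'awesome', 'programming']
```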
Date Reformatting
Pattern: (\d{2})/(\d{2})/(\d{4})
Replace: $3-$1-$2
Input: 03/15/2024
Output: 2024-03-15
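The same replacement in Python is sketched below. Python's `re.sub` writes group references as `\1`, `\2`, `\3` rather than `$1`, `$2`, `$3`; the second input string is an illustrative extra.

```python
import re

DATE_RE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")

# \3-\1-\2 reorders month/day/year into year-month-day.
output = DATE_RE.sub(r"\3-\1-\2", "03/15/2024")
in_sentence = DATE_RE.sub(r"\3-\1-\2", "Due 12/01/2023 and 01/05/2024")

print(output)       # 2024-03-15
print(in_sentence)  # Due 2023-12-01 and 2024-01-05
```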
Quick Reference Cheatsheet
. Any character (except newline)
\d Digit [0-9]
\D Non-digit
\w Word char [a-zA-Z0-9_]
\W Non-word char
\s Whitespace
\S Non-whitespace
* 0 or more
+ 1 or more
? 0 or 1
{3} Exactly 3
{3,} 3 or more
{3,5} Between 3 and 5
^ Start of string
$ End of string
\b Word boundary
\B Non-word boundary
(...) Capturing group
(?:...) Non-capturing
\1 Backreference
| Alternation (or)
Common Mistakes to Avoid
- Forgetting to escape special characters: \. not . for literal dot
- Greedy matching: Use .*? instead of .* when needed
- Missing anchors: Use ^ and $ for full-string validation
- Overcomplicating: Simple patterns are easier to maintain
- Not testing edge cases: Always test with various inputs
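Two of these mistakes are easy to demonstrate with a few edge-case inputs. The Python sketch below uses a made-up five-digit code check and the 3.14 example from earlier.

```python
import re

candidate = "123456"  # six digits, so it should fail a five-digit check

# Missing anchors: search() finds "12345" inside the longer string.
unanchored = re.search(r"\d{5}", candidate) is not None

# With ^ and $ the whole string has to be exactly five digits.
anchored = re.search(r"^\d{5}$", candidate) is not None

# Unescaped dot: "3x14" slips through; the escaped version rejects it.
dot_unescaped = re.fullmatch(r"3.14", "3x14") is not None
dot_escaped = re.fullmatch(r"3\.14", "3x14") is not None

print(unanchored, anchored)        # True False
print(dot_unescaped, dot_escaped)  # True False
```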