Regular Expressions Introduction
About 536 wordsAbout 7 min
2025-08-07
Basic Definitions
Alphabet and Regular Expressions
- Alphabet (Σ): A finite set of symbols
- Regular Expression: Built from alphabet symbols using:
- Union (+): r1+r2 matches either r1 or r2
- Concatenation (⋅): r1⋅r2 matches r1 followed by r2
- Kleene Star (∗): r∗ matches zero or more repetitions of r
Formal Definition
A regular expression over alphabet Σ is defined recursively:
Base Cases:
- ∅ is a regular expression denoting the empty language
- ϵ is a regular expression denoting {ϵ}
- For each a∈Σ, a is a regular expression denoting {a}
Inductive Cases:
- If r1 and r2 are regular expressions:
- (r1+r2) is a regular expression
- (r1⋅r2) is a regular expression
- (r1∗) is a regular expression
- If r1 and r2 are regular expressions:
Precedence Rules
When parentheses are omitted, the following precedence applies (highest to lowest):
- Kleene Star (∗)
- Concatenation (⋅)
- Union (+)
Tips
Always use parentheses to clarify precedence when combining operators
Examples
Simple Regular Expressions
Regular Expression | Language Description | Example Strings |
---|---|---|
a | Single 'a' | "a" |
a+b | 'a' or 'b' | "a", "b" |
a⋅b | 'a' followed by 'b' | "ab" |
a∗ | Zero or more 'a's | "", "a", "aa", "aaa", ... |
(a+b)∗ | Any combination of 'a' and 'b' | "", "a", "b", "aa", "ab", "ba", "bb", ... |
More Complex Examples
Regular Expression: (a+b)∗⋅a⋅(a+b)∗
This describes all strings over {a,b} that contain at least one 'a'.
- (a+b)∗ matches any prefix (including empty)
- a matches exactly one 'a'
- (a+b)∗ matches any suffix (including empty)
Fundamental Properties
Union Properties
- Commutative: r1+r2=r2+r1
- Associative: (r1+r2)+r3=r1+(r2+r3)
- Identity: r+∅=r
Concatenation Properties
- Associative: (r1⋅r2)⋅r3=r1⋅(r2⋅r3)
- Identity: r⋅ϵ=ϵ⋅r=r
- Annihilator: r⋅∅=∅⋅r=∅
Kleene Star Properties
- Basic: ∅∗=ϵ∗=ϵ
- Iteration: (r∗)∗=r∗
- Closure: ϵ+r⋅r∗=r∗
Language Operations
Important Language Classes
Finite Languages: Can be expressed using union
- L={w1,w2,...,wn}=w1+w2+...+wn
Infinite Languages: Require Kleene star
- L={an∣n≥0}=a∗
- L={an∣n≥1}=a⋅a∗
Concatenation of Languages
- If L1={ab,a} and L2={b,bc}
- Then L1⋅L2={abb,abbc,ab,abc}
Practical Considerations
Shorthand Notation
In practice, we often use:
- ab instead of a⋅b
- r+ for one or more repetitions: r+=r⋅r∗
- r? for optional elements: r?=ϵ+r
Common Patterns
Pattern | Meaning | Example |
---|---|---|
a∗ | Zero or more 'a's | "", "a", "aa", ... |
a+ | One or more 'a's | "a", "aa", "aaa", ... |
a? | Zero or one 'a' | "", "a" |
an | Exactly n 'a's | "a" (when n=1), "aa" (when n=2) |
Summary
Regular expressions provide a powerful and concise way to describe patterns in strings. By combining the three fundamental operations (union, concatenation, and Kleene star), we can express both finite and infinite languages with remarkable precision.
The key insight is that regular expressions correspond to the class of regular languages, which can be recognized by finite automata, forming an important equivalence in automata theory.
Changelog
2aa48
-web-deploy(Auto): Update base URL for web-pages branchon
Copyright
Copyright Ownership:WARREN Y.F. LONG
License under:Attribution-NonCommercial-NoDerivatives 4.0 International (CC-BY-NC-ND-4.0)