Regular expressions provide a powerful way to search, match, and manipulate text in Java. Java's regex support lives primarily in the java.util.regex package.
Core Classes
- Pattern: Compiled representation of a regular expression
- Matcher: Engine that performs match operations on a character sequence using a pattern
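As a minimal sketch of how the two classes cooperate, the example below compiles one Pattern and asks three different questions through a Matcher. The class and method names are illustrative, and lookingAt() is an extra Matcher method not covered elsewhere in this article.

```java
import java.util.regex.Pattern;

public class CoreClassesDemo {
    // A Pattern is immutable, so one compiled instance can be shared.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // matches(): the ENTIRE input must match the pattern.
    static boolean isAllDigits(String input) {
        return DIGITS.matcher(input).matches();
    }

    // find(): the pattern may match ANYWHERE in the input.
    static boolean containsDigits(String input) {
        return DIGITS.matcher(input).find();
    }

    // lookingAt(): the pattern must match at the START of the input.
    static boolean startsWithDigits(String input) {
        return DIGITS.matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("12345"));       // true
        System.out.println(isAllDigits("123abc"));      // false
        System.out.println(containsDigits("abc123"));   // true
        System.out.println(startsWithDigits("123abc")); // true
        System.out.println(startsWithDigits("abc123")); // false
    }
}
```

Each call to matcher() creates a fresh Matcher, which holds the per-input state; the Pattern itself carries none.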
Basic Syntax
Common Metacharacters
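- .: Any character except line terminators (unless Pattern.DOTALL is set)
- \d: Digit (equivalent to [0-9])
- \D: Non-digit
- \s: Whitespace character
- \S: Non-whitespace character
- \w: Word character (equivalent to [a-zA-Z_0-9])
- \W: Non-word character
- ^: Beginning of input (beginning of each line with Pattern.MULTILINE)
- $: End of input (end of each line with Pattern.MULTILINE)
- \b: Word boundary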
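A few of these metacharacters in action, as a small sketch. The helper names are made up for illustration; note that every backslash is doubled because the regex is written inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetacharacterDemo {
    // \d matches a single digit, so this removes every digit.
    static String stripDigits(String s) {
        return s.replaceAll("\\d", "");
    }

    // \s+ matches a run of whitespace (spaces, tabs, newlines).
    static String collapseWhitespace(String s) {
        return s.replaceAll("\\s+", " ");
    }

    // \b on both sides keeps "cat" from matching inside "concat".
    static int countWholeWord(String text, String word) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(stripDigits("a1b22c"));                   // abc
        System.out.println(collapseWhitespace("a  \t b\nc"));        // a b c
        System.out.println(countWholeWord("cat concat cat", "cat")); // 2
    }
}
```

Pattern.quote is used so that the searched word is treated literally even if it contains metacharacters.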
Quantifiers
- *: 0 or more occurrences
- +: 1 or more occurrences
- ?: 0 or 1 occurrence
- {n}: Exactly n occurrences
- {n,}: n or more occurrences
- {n,m}: Between n and m occurrences (inclusive)
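The quantifiers above can be checked against short inputs, as in this sketch. The fullMatch helper is a hypothetical wrapper around Pattern.matches, which requires the whole input to match.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Pattern.matches compiles the regex and tests it against the entire input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("ab*c", "ac"));        // true  (* allows zero b's)
        System.out.println(fullMatch("ab+c", "ac"));        // false (+ needs at least one b)
        System.out.println(fullMatch("colou?r", "color"));  // true  (? makes u optional)
        System.out.println(fullMatch("\\d{3}", "1234"));    // false (exactly three digits)
        System.out.println(fullMatch("\\d{2,}", "12345"));  // true  (two or more digits)
        System.out.println(fullMatch("\\d{2,4}", "12345")); // false (at most four digits)
    }
}
```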
Character Classes
- [abc]: a, b, or c
- [^abc]: Any character except a, b, or c
- [a-z]: Any lowercase ASCII letter
- [a-zA-Z]: Any ASCII letter
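As a brief illustration of the character classes above, the sketch below uses them with String.replaceAll and String.matches. The method names are invented for the example.

```java
public class CharacterClassDemo {
    // [aeiouAEIOU] matches any single vowel listed in the class.
    static String removeVowels(String s) {
        return s.replaceAll("[aeiouAEIOU]", "");
    }

    // [^a-zA-Z] is a negated class: anything that is NOT an ASCII letter.
    static String keepLetters(String s) {
        return s.replaceAll("[^a-zA-Z]", "");
    }

    // [a-z]+ requires one or more lowercase ASCII letters and nothing else.
    static boolean isLowercaseWord(String s) {
        return s.matches("[a-z]+");
    }

    public static void main(String[] args) {
        System.out.println(removeVowels("Regular Expression")); // Rglr xprssn
        System.out.println(keepLetters("a1-b2_c3"));            // abc
        System.out.println(isLowercaseWord("hello"));           // true
        System.out.println(isLowercaseWord("Hello"));           // false
    }
}
```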
Basic Usage
1. Pattern Matching
import java.util.regex.*;

public class RegexExample {
    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog";
        String patternString = "fox";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(text);
        // find() looks for the next occurrence anywhere in the text
        if (matcher.find()) {
            // start() is inclusive, end() is exclusive
            System.out.println("Found match at index " + matcher.start() +
                    " to " + matcher.end());
        } else {
            System.out.println("No match found");
        }
    }
}
2. Simple Validation (Email Example)
public class EmailValidator {
    public static boolean isValidEmail(String email) {
        // String.matches tests the whole string, so ^ and $ are redundant here but harmless
        String regex = "^[\\w.-]+@[\\w.-]+\\.[a-zA-Z]{2,}$";
        return email.matches(regex);
    }

    public static void main(String[] args) {
        String email1 = "test@example.com";
        String email2 = "invalid.email";
        System.out.println(email1 + " is valid? " + isValidEmail(email1));
        System.out.println(email2 + " is valid? " + isValidEmail(email2));
    }
}
3. Finding Multiple Matches
import java.util.regex.*;

public class MultipleMatches {
    public static void main(String[] args) {
        String text = "cat dog cat dog cat";
        String patternString = "cat";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        // Each call to find() resumes after the previous match
        while (matcher.find()) {
            count++;
            System.out.println("Match " + count + ": " + matcher.group() +
                    " at " + matcher.start());
        }
        System.out.println("Total matches: " + count);
    }
}
4. Group Extraction
import java.util.regex.*;

public class GroupExtraction {
    public static void main(String[] args) {
        String text = "John Doe, age 30; Jane Smith, age 25";
        // Group 1 captures the name, group 2 captures the age
        String regex = "(\\w+ \\w+), age (\\d+)";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            System.out.println("Name: " + matcher.group(1));
            System.out.println("Age: " + matcher.group(2));
            System.out.println();
        }
    }
}
5. Replacement
import java.util.regex.*;

public class RegexReplacement {
    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog";
        String replaced = text.replaceAll("fox", "cat");
        System.out.println(replaced);

        // More complex replacement: $1 refers to the first capture group (the last four digits)
        String phoneNumbers = "Phone: 123-456-7890, 987-654-3210";
        String masked = phoneNumbers.replaceAll("\\d{3}-\\d{3}-(\\d{4})", "XXX-XXX-$1");
        System.out.println(masked);
    }
}
6. Splitting Strings
import java.util.regex.*;

public class RegexSplit {
    public static void main(String[] args) {
        String text = "apple,orange,banana,grape";
        String[] fruits = text.split(",");
        for (String fruit : fruits) {
            System.out.println(fruit);
        }

        // More complex split: \s+ treats any run of whitespace as one separator
        String complexText = "Hello world! How are you?";
        String[] words = complexText.split("\\s+");
        for (String word : words) {
            System.out.println(word);
        }
    }
}
Flags
You can modify regex behavior with flags:
Pattern pattern = Pattern.compile("pattern", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
Common flags:
- Pattern.CASE_INSENSITIVE: Case-insensitive matching (ASCII only unless combined with Pattern.UNICODE_CASE)
- Pattern.MULTILINE: ^ and $ match at line boundaries
- Pattern.DOTALL: . matches any character, including line terminators
- Pattern.UNICODE_CASE: Unicode-aware case folding (used together with Pattern.CASE_INSENSITIVE)
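To see what the flags change in practice, here is a small sketch that runs the same regex with and without them. The countMatches helper and the sample text are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagsDemo {
    // Counts how often the regex is found in the text under the given flags (0 = none).
    static int countMatches(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Error: disk full\nerror: retrying\nok";

        // Without flags, ^ only matches at the start of the input, and "Error" != "error".
        System.out.println(countMatches("^error", 0, text)); // 0

        // With both flags, ^ matches at each line start and case is ignored.
        System.out.println(countMatches("^error",
                Pattern.CASE_INSENSITIVE | Pattern.MULTILINE, text)); // 2

        // By default . does not match the newline between "full" and "error".
        System.out.println(countMatches("full.error", 0, text));              // 0
        System.out.println(countMatches("full.error", Pattern.DOTALL, text)); // 1
    }
}
```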
Common Patterns
The patterns below are written as Java string literals, so every backslash is doubled.
- Email: ^[\\w.-]+@[\\w.-]+\\.[a-zA-Z]{2,}$
- Phone (US): ^\\d{3}-\\d{3}-\\d{4}$ or ^\\(\\d{3}\\) \\d{3}-\\d{4}$
- URL: ^(https?|ftp)://[\\w.-]+(\\.[a-zA-Z]{2,})+(/\\S*)?$
- Date (YYYY-MM-DD): ^\\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$
- IP Address: ^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
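It helps to run patterns like these against both valid and invalid samples before relying on them. The sketch below does so for the IP address and date patterns; the class and method names are hypothetical. It also shows a limit of the date pattern: it checks the shape only, so an impossible date such as 2023-02-31 still passes.

```java
import java.util.regex.Pattern;

public class CommonPatternsDemo {
    private static final Pattern IPV4 = Pattern.compile(
            "^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$");
    private static final Pattern DATE = Pattern.compile(
            "^\\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$");

    static boolean isIpv4(String s) {
        return IPV4.matcher(s).matches();
    }

    // Checks the YYYY-MM-DD shape only, not whether the day exists in that month.
    static boolean looksLikeDate(String s) {
        return DATE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIpv4("192.168.1.1"));       // true
        System.out.println(isIpv4("256.1.1.1"));         // false (256 is out of range)
        System.out.println(looksLikeDate("2024-02-29")); // true
        System.out.println(looksLikeDate("2024-13-01")); // false (month 13)
        System.out.println(looksLikeDate("2023-02-31")); // true  (shape is fine, date is not real)
    }
}
```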
Performance Considerations
- Precompile patterns if used repeatedly:
private static final Pattern EMAIL_PATTERN = Pattern.compile("^[\\w.-]+@[\\w.-]+\\.[a-zA-Z]{2,}$");

public boolean isValidEmail(String email) {
    return EMAIL_PATTERN.matcher(email).matches();
}
- Be careful with greedy quantifiers (*, +), especially when they are nested as in (a+)+: on input that almost matches, they can cause severe slowdowns (catastrophic backtracking)
- Use possessive quantifiers (*+, ++, ?+) or atomic groups when appropriate to prevent backtracking
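The difference between greedy and possessive matching is easiest to see on a tiny input. This sketch deliberately uses a harmless pattern rather than a slow one; the fullMatch helper is an illustrative wrapper around Pattern.matches.

```java
import java.util.regex.Pattern;

public class BacktrackingDemo {
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Greedy: \d+ first takes "123", then gives back "3" so the trailing \d can match.
        System.out.println(fullMatch("\\d+\\d", "123"));     // true

        // Possessive: \d++ takes "123" and never gives anything back, so the trailing \d fails.
        System.out.println(fullMatch("\\d++\\d", "123"));    // false

        // Atomic group: (?>...) discards backtracking positions in the same way.
        System.out.println(fullMatch("(?>\\d+)\\d", "123")); // false

        // Typical safe use: the quoted content can never contain a quote, so there is
        // nothing useful to backtrack into and the possessive form loses no matches.
        System.out.println(fullMatch("\"[^\"]*+\"", "\"hello\"")); // true
    }
}
```

As the second and third lines show, possessive quantifiers and atomic groups can change what a pattern matches, so they are best applied where backtracking could never have helped.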
Advanced Features
Lookahead and Lookbehind
// Positive lookahead (password of at least 8 characters with at least one digit)
String passwordPattern = "^(?=.*\\d).{8,}$";

// Positive lookbehind (digits that are immediately preceded by "USD")
String text = "USD100 EUR200 GBP300";
Pattern pattern = Pattern.compile("(?<=USD)\\d+");
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    System.out.println("USD amount: " + matcher.group());
}
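The two lookaround snippets above can be wrapped into a runnable class, sketched here with hypothetical method names. The point to notice is that lookarounds assert a condition without consuming characters, so "USD" is not part of the returned match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {
    // (?=.*\d) asserts that a digit exists somewhere ahead; .{8,} then checks the length.
    private static final Pattern PASSWORD = Pattern.compile("^(?=.*\\d).{8,}$");

    // (?<=USD) asserts that "USD" sits immediately before the digits.
    private static final Pattern USD_AMOUNT = Pattern.compile("(?<=USD)\\d+");

    static boolean isAcceptablePassword(String candidate) {
        return PASSWORD.matcher(candidate).matches();
    }

    static List<String> usdAmounts(String text) {
        List<String> amounts = new ArrayList<>();
        Matcher m = USD_AMOUNT.matcher(text);
        while (m.find()) {
            amounts.add(m.group()); // digits only; the lookbehind text is not included
        }
        return amounts;
    }

    public static void main(String[] args) {
        System.out.println(isAcceptablePassword("password1")); // true
        System.out.println(isAcceptablePassword("password"));  // false (no digit)
        System.out.println(isAcceptablePassword("pass1"));     // false (too short)
        System.out.println(usdAmounts("USD100 EUR200 GBP300 USD45")); // [100, 45]
    }
}
```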
Named Groups (Java 7+)
String regex = "(?<area>\\d{3})-(?<exchange>\\d{3})-(?<line>\\d{4})";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher("123-456-7890");
// Groups can be read by name instead of by index
if (matcher.matches()) {
    System.out.println("Area code: " + matcher.group("area"));
    System.out.println("Exchange: " + matcher.group("exchange"));
    System.out.println("Line number: " + matcher.group("line"));
}
Unicode Support
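// Match any letter from any language.
// \p{L} covers letters in every script on its own; UNICODE_CHARACTER_CLASS
// additionally makes predefined classes such as \w, \d and \s Unicode-aware.
String unicodeRegex = "\\p{L}+";
Pattern pattern = Pattern.compile(unicodeRegex, Pattern.UNICODE_CHARACTER_CLASS);
Matcher matcher = pattern.matcher("Hello 你好 привет");
while (matcher.find()) {
    System.out.println("Match: " + matcher.group());
}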
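To make the role of the flag concrete, the following sketch compares \w with and without Pattern.UNICODE_CHARACTER_CLASS on a Cyrillic word. The helper names are illustrative, and the word is written with Unicode escapes only so that the example does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

public class UnicodeClassDemo {
    // Tests whether the whole string consists of \w characters under the given flags.
    static boolean isWord(String s, int flags) {
        return Pattern.compile("\\w+", flags).matcher(s).matches();
    }

    // \p{L} is a Unicode property class and needs no flag.
    static boolean isLetters(String s) {
        return Pattern.matches("\\p{L}+", s);
    }

    public static void main(String[] args) {
        // The Russian word for "hello", spelled with Unicode escapes.
        String word = "\u043F\u0440\u0438\u0432\u0435\u0442";

        System.out.println(isLetters(word));                               // true
        System.out.println(isWord(word, 0));                               // false (\w is ASCII-only by default)
        System.out.println(isWord(word, Pattern.UNICODE_CHARACTER_CLASS)); // true
    }
}
```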
Regular expressions are a powerful tool in Java, but they can become complex. Always test your patterns thoroughly and consider readability when creating complex expressions.