DEV Community

Sudhakar V

Regular Expressions (Regex) in Java

Regular expressions provide a powerful way to search, match, and manipulate text in Java. Java's regex support lives primarily in the java.util.regex package, and several String methods (matches, replaceAll, split) are built on top of it.

Core Classes

  1. Pattern: Compiled representation of a regular expression
  2. Matcher: Engine that performs match operations on a character sequence using a pattern

Basic Syntax

Common Metacharacters

  • . - Any character except line terminators (any character at all when Pattern.DOTALL is set)
  • \d - Digit (equivalent to [0-9])
  • \D - Non-digit
  • \s - Whitespace character
  • \S - Non-whitespace character
  • \w - Word character (equivalent to [a-zA-Z_0-9])
  • \W - Non-word character
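  • ^ - Beginning of input (beginning of each line when Pattern.MULTILINE is set)
  • $ - End of input (end of each line when Pattern.MULTILINE is set)
  • \b - Word boundary

In a Java string literal every backslash must itself be escaped, so the regex \d is written as "\\d" in source code.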
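As a rough illustration of the metacharacters above, the sketch below counts matches for a few of them. The class name MetacharDemo, the helper countMatches, and the sample sentence are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample text are illustrative only.
class MetacharDemo {
    // Counts the non-overlapping matches of regex in input.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "Order 66 shipped to cat_lover on day 7";
        System.out.println(countMatches("\\d", input));       // 3: the digits 6, 6 and 7
        System.out.println(countMatches("\\s", input));       // 7: the spaces between tokens
        System.out.println(countMatches("\\bcat\\b", input)); // 0: '_' is a word character, so no boundary after "cat"
        System.out.println(countMatches("\\bcat", input));    // 1: "cat" at the start of "cat_lover"
    }
}
```

The third call is the one that tends to surprise people: because \w includes the underscore, \b does not treat "cat_lover" as two words.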

Quantifiers

  • * - 0 or more occurrences
  • + - 1 or more occurrences
  • ? - 0 or 1 occurrence
  • {n} - Exactly n occurrences
  • {n,} - n or more occurrences
  • {n,m} - Between n and m occurrences
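A small sketch of how these quantifiers behave when the whole input has to match. QuantifierDemo and fullMatch are hypothetical names chosen for this example.

```java
// Hypothetical demo class showing each quantifier against short inputs.
class QuantifierDemo {
    // String.matches succeeds only if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("colou?r", "color"));   // true:  'u' occurs 0 times
        System.out.println(fullMatch("colou?r", "colour"));  // true:  'u' occurs 1 time
        System.out.println(fullMatch("a*", ""));             // true:  * allows 0 occurrences
        System.out.println(fullMatch("a+", ""));             // false: + needs at least 1
        System.out.println(fullMatch("\\d{3}", "1234"));     // false: exactly 3 digits required
        System.out.println(fullMatch("\\d{2,4}", "1234"));   // true:  between 2 and 4 digits
        System.out.println(fullMatch("\\d{2,}", "1"));       // false: at least 2 digits required
    }
}
```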

Character Classes

  • [abc] - a, b, or c
  • [^abc] - Any character except a, b, or c
  • [a-z] - Any lowercase ASCII letter
  • [a-zA-Z] - Any ASCII letter
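Character classes are often used with replaceAll to keep or mask a set of characters. The following is a minimal sketch; CharClassDemo, keepLetters, and maskVowels are illustrative names, not part of any library.

```java
// Hypothetical demo class for positive and negated character classes.
class CharClassDemo {
    // Removes every character that is not an ASCII letter.
    static String keepLetters(String input) {
        return input.replaceAll("[^a-zA-Z]", "");
    }

    // Replaces each lowercase vowel with an asterisk.
    static String maskVowels(String input) {
        return input.replaceAll("[aeiou]", "*");
    }

    public static void main(String[] args) {
        System.out.println(keepLetters("Hello, World 42!"));   // HelloWorld
        System.out.println(maskVowels("regular expression"));  // r*g*l*r *xpr*ss**n
    }
}
```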

Basic Usage

1. Pattern Matching

import java.util.regex.*;

public class RegexExample {
    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog";
        String patternString = "fox";

        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(text);

        if (matcher.find()) {
            System.out.println("Found match at index " + matcher.start() + 
                             " to " + matcher.end());
        } else {
            System.out.println("No match found");
        }
    }
}

2. Simple Validation (Email Example)

public class EmailValidator {
    public static boolean isValidEmail(String email) {
        String regex = "^[\\w.-]+@[\\w.-]+\\.[a-zA-Z]{2,}$";
        return email.matches(regex);
    }

    public static void main(String[] args) {
        String email1 = "test@example.com";
        String email2 = "invalid.email";

        System.out.println(email1 + " is valid? " + isValidEmail(email1));
        System.out.println(email2 + " is valid? " + isValidEmail(email2));
    }
}
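String.matches, like Matcher.matches, succeeds only when the entire input matches the pattern, which is why it suits validation. Matcher.find looks for a match anywhere in the input, and Matcher.lookingAt anchors only at the start. A minimal sketch of the difference, with MatchVsFind and its method names invented for this example:

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the three matching modes of Matcher.
class MatchVsFind {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // True only if the whole input consists of digits.
    static boolean wholeInputIsDigits(String s) {
        return DIGITS.matcher(s).matches();
    }

    // True if digits occur anywhere in the input.
    static boolean containsDigits(String s) {
        return DIGITS.matcher(s).find();
    }

    // True if the input begins with digits; the rest may be anything.
    static boolean startsWithDigits(String s) {
        return DIGITS.matcher(s).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(wholeInputIsDigits("abc123")); // false
        System.out.println(containsDigits("abc123"));     // true
        System.out.println(startsWithDigits("abc123"));   // false
        System.out.println(startsWithDigits("123abc"));   // true
    }
}
```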

3. Finding Multiple Matches

import java.util.regex.*;

public class MultipleMatches {
    public static void main(String[] args) {
        String text = "cat dog cat dog cat";
        String patternString = "cat";

        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(text);

        int count = 0;
        while (matcher.find()) {
            count++;
            System.out.println("Match " + count + ": " + matcher.group() + 
                             " at " + matcher.start());
        }
        System.out.println("Total matches: " + count);
    }
}

4. Group Extraction

import java.util.regex.*;

public class GroupExtraction {
    public static void main(String[] args) {
        String text = "John Doe, age 30; Jane Smith, age 25";
        String regex = "(\\w+ \\w+), age (\\d+)";

        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            System.out.println("Name: " + matcher.group(1));
            System.out.println("Age: " + matcher.group(2));
            System.out.println();
        }
    }
}

5. Replacement

import java.util.regex.*;

public class RegexReplacement {
    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog";
        String replaced = text.replaceAll("fox", "cat");
        System.out.println(replaced);

        // Mask all but the last four digits; $1 refers to the first capture group
        String phoneNumbers = "Phone: 123-456-7890, 987-654-3210";
        String masked = phoneNumbers.replaceAll("\\d{3}-\\d{3}-(\\d{4})", "XXX-XXX-$1");
        System.out.println(masked);
    }
}
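Because $ and \ have special meaning in a replacement string, replacement text that comes from elsewhere (user input, prices, file paths) should usually be passed through Matcher.quoteReplacement first. The sketch below assumes a made-up {amount} placeholder; SafeReplacement and fillPlaceholder are hypothetical names.

```java
import java.util.regex.Matcher;

// Hypothetical helper showing how to insert literal replacement text.
class SafeReplacement {
    // Replaces the literal placeholder {amount} with value, taken verbatim.
    static String fillPlaceholder(String template, String value) {
        // Without quoteReplacement, "$25.00" would be parsed as a group
        // reference ("$2...") rather than as literal text.
        return template.replaceAll("\\{amount\\}", Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        System.out.println(fillPlaceholder("Total: {amount}", "$25.00")); // Total: $25.00
    }
}
```

When neither side needs regex features, String.replace(CharSequence, CharSequence) does a plain literal replacement and avoids the issue entirely.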

6. Splitting Strings

import java.util.regex.*;

public class RegexSplit {
    public static void main(String[] args) {
        String text = "apple,orange,banana,grape";
        String[] fruits = text.split(",");

        for (String fruit : fruits) {
            System.out.println(fruit);
        }

        // Split on runs of one or more whitespace characters
        String complexText = "Hello   world!  How are   you?";
        String[] words = complexText.split("\\s+");
        for (String word : words) {
            System.out.println(word);
        }
    }
}
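Two details of String.split are easy to miss: trailing empty strings are dropped unless a negative limit is passed, and leading whitespace produces an empty first element when splitting on \s+. A short sketch, with SplitDemo and its method names chosen just for illustration:

```java
import java.util.Arrays;

// Hypothetical demo class for two common String.split pitfalls.
class SplitDemo {
    // A negative limit keeps trailing empty fields.
    static String[] splitKeepingEmpties(String csvLine) {
        return csvLine.split(",", -1);
    }

    // Trimming first avoids an empty leading element.
    static String[] splitWords(String text) {
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("a,b,,c,,".split(",")));          // [a, b, , c]
        System.out.println(Arrays.toString(splitKeepingEmpties("a,b,,c,,"))); // [a, b, , c, , ]
        System.out.println(Arrays.toString("  Hello   world!".split("\\s+"))); // [, Hello, world!]
        System.out.println(Arrays.toString(splitWords("  Hello   world!  "))); // [Hello, world!]
    }
}
```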

Flags

You can modify regex behavior with flags:

Pattern pattern = Pattern.compile("pattern", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

Common flags:

  • Pattern.CASE_INSENSITIVE - Case insensitive matching
  • Pattern.MULTILINE - ^ and $ match at line boundaries
  • Pattern.DOTALL - . matches any character including line terminators
  • Pattern.UNICODE_CASE - Unicode-aware case folding (takes effect only together with CASE_INSENSITIVE)
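To make the effect of these flags concrete, here is a small sketch that runs the same patterns with and without them. FlagDemo, its count helper, and the sample log text are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing match counts under different flags.
class FlagDemo {
    // Counts matches of regex in input; pass 0 for "no flags".
    static int count(String regex, int flags, String input) {
        Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String log = "error: disk full\nINFO: retrying\nError: disk full";

        // Without flags, ^ matches only at the very start of the input.
        System.out.println(count("^error", 0, log)); // 1

        // MULTILINE lets ^ match after each line break; CASE_INSENSITIVE accepts "Error".
        System.out.println(count("^error", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE, log)); // 2

        // By default . does not match the newline between "full" and "INFO".
        System.out.println(count("full.INFO", 0, log)); // 0

        // DOTALL lets . match line terminators as well.
        System.out.println(count("full.INFO", Pattern.DOTALL, log)); // 1
    }
}
```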

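Common Patterns

The patterns below are written as Java string literals (hence the doubled backslashes) and are deliberately simplified: they check the general shape of the input, not full validity. The date pattern, for example, accepts 2024-02-31.

  1. Email: ^[\\w.-]+@[\\w.-]+\\.[a-zA-Z]{2,}$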
  2. Phone (US): ^\\d{3}-\\d{3}-\\d{4}$ or ^\\(\\d{3}\\) \\d{3}-\\d{4}$
  3. URL: ^(https?|ftp)://[\\w.-]+(\\.[a-zA-Z]{2,})+(/\\S*)?$
  4. Date (YYYY-MM-DD): ^\\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$
  5. IP Address: ^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
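A minimal sketch of how two of the patterns above might be wrapped in reusable checks. The class CommonPatterns and the method names are invented here; the regex strings are copied from the list.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the date and IPv4 patterns listed above.
class CommonPatterns {
    private static final Pattern DATE =
        Pattern.compile("^\\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$");
    private static final Pattern IPV4 = Pattern.compile(
        "^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$");

    // Checks the YYYY-MM-DD shape only; it does not know month lengths.
    static boolean isDateShaped(String s) {
        return DATE.matcher(s).matches();
    }

    // Checks for four dot-separated numbers in the range 0-255.
    static boolean isIpv4(String s) {
        return IPV4.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDateShaped("2024-13-01")); // false: month 13 is rejected
        System.out.println(isDateShaped("2024-02-30")); // true: shape is fine, the date is not
        System.out.println(isIpv4("192.168.1.1"));      // true
        System.out.println(isIpv4("256.1.1.1"));        // false: 256 is out of range
    }
}
```

If real calendar validity matters, one option is to follow the regex check with java.time.LocalDate.parse, which rejects impossible dates.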

Performance Considerations

  1. Precompile patterns that are used repeatedly; String.matches and String.replaceAll compile the pattern again on every call:
   private static final Pattern EMAIL_PATTERN = Pattern.compile("^[\\w.-]+@[\\w.-]+\\.[a-zA-Z]{2,}$");

   public boolean isValidEmail(String email) {
       return EMAIL_PATTERN.matcher(email).matches();
   }
  2. Be careful with nested or overlapping quantifiers such as (a+)+ or (.*)*, which can cause severe slowdowns (catastrophic backtracking) on inputs that almost match

  3. Use possessive quantifiers (*+, ++, ?+) or atomic groups (?>...) when appropriate to prevent backtracking
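Possessive quantifiers and atomic groups change what a pattern matches, not just how fast it runs, because they never give characters back. The sketch below shows that difference on a tiny input rather than trying to measure timing; PossessiveDemo and its method names are hypothetical.

```java
// Hypothetical demo class contrasting greedy, possessive, and atomic matching.
class PossessiveDemo {
    // Greedy: a* first takes every 'a', then backtracks one so the final 'a' can match.
    static boolean greedy(String s) {
        return s.matches("a*a");
    }

    // Possessive: a*+ takes every 'a' and never gives one back, so the final 'a' cannot match.
    static boolean possessive(String s) {
        return s.matches("a*+a");
    }

    // Atomic group: same no-backtracking behaviour as the possessive form.
    static boolean atomic(String s) {
        return s.matches("(?>a*)a");
    }

    public static void main(String[] args) {
        System.out.println(greedy("aaa"));     // true
        System.out.println(possessive("aaa")); // false
        System.out.println(atomic("aaa"));     // false
    }
}
```

The takeaway is to use the possessive form only where the following part of the pattern can never need characters the quantifier has already consumed.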

Advanced Features

Lookahead and Lookbehind

// Positive lookahead (password with at least one digit)
String passwordPattern = "^(?=.*\\d).{8,}$";

// Positive lookbehind: match digits only when they are preceded by "USD"
String text = "USD100 EUR200 GBP300";
Pattern pattern = Pattern.compile("(?<=USD)\\d+");
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    System.out.println("USD amount: " + matcher.group());
}
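Lookaheads can be stacked, and a negative lookahead (?!...) rejects input that contains something unwanted. The following sketch extends the password idea with an assumed policy (at least 8 characters, one digit, one uppercase letter, no whitespace); the policy and the name PasswordCheck are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical password check combining positive and negative lookaheads.
class PasswordCheck {
    private static final Pattern POLICY = Pattern.compile(
        "^(?=.*\\d)"      // somewhere ahead there is a digit
        + "(?=.*[A-Z])"   // somewhere ahead there is an uppercase ASCII letter
        + "(?!.*\\s)"     // nowhere ahead is there whitespace
        + ".{8,}$");      // at least 8 characters in total

    static boolean isAcceptable(String candidate) {
        return POLICY.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("Passw0rdX"));  // true
        System.out.println(isAcceptable("password1"));  // false: no uppercase letter
        System.out.println(isAcceptable("Pass w0rd1")); // false: contains a space
        System.out.println(isAcceptable("Pw0rd"));      // false: too short
    }
}
```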

Named Groups (Java 7+)

String regex = "(?<area>\\d{3})-(?<exchange>\\d{3})-(?<line>\\d{4})";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher("123-456-7890");
if (matcher.matches()) {
    System.out.println("Area code: " + matcher.group("area"));
    System.out.println("Exchange: " + matcher.group("exchange"));
    System.out.println("Line number: " + matcher.group("line"));
}
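Named groups can also be referenced from a replacement string using the ${name} syntax. A brief sketch that reformats phone numbers with the same group names as above; PhoneFormatter and reformat are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical formatter that rewrites 123-456-7890 as (123) 456-7890.
class PhoneFormatter {
    private static final Pattern PHONE =
        Pattern.compile("(?<area>\\d{3})-(?<exchange>\\d{3})-(?<line>\\d{4})");

    static String reformat(String text) {
        // ${area}, ${exchange} and ${line} refer to the named groups in PHONE.
        return PHONE.matcher(text).replaceAll("(${area}) ${exchange}-${line}");
    }

    public static void main(String[] args) {
        System.out.println(reformat("Call 123-456-7890 or 987-654-3210"));
        // Call (123) 456-7890 or (987) 654-3210
    }
}
```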

Unicode Support

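// Match any letter from any language. \p{L} is Unicode-aware on its own;
// UNICODE_CHARACTER_CLASS additionally makes \w, \d, \s and \b Unicode-aware.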
String unicodeRegex = "\\p{L}+";
Pattern pattern = Pattern.compile(unicodeRegex, Pattern.UNICODE_CHARACTER_CLASS);
Matcher matcher = pattern.matcher("Hello 你好 привет");
while (matcher.find()) {
    System.out.println("Match: " + matcher.group());
}
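The UNICODE_CHARACTER_CLASS flag matters most for the predefined classes such as \w, which cover only ASCII by default. The sketch below checks the Russian word привет ("hello") three ways; the word is written with Unicode escapes so the file compiles regardless of source encoding, and UnicodeWordDemo is an invented name.

```java
import java.util.regex.Pattern;

// Hypothetical demo class showing when UNICODE_CHARACTER_CLASS makes a difference.
class UnicodeWordDemo {
    private static final Pattern ASCII_WORD = Pattern.compile("\\w+");
    private static final Pattern UNICODE_WORD =
        Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);
    private static final Pattern LETTERS = Pattern.compile("\\p{L}+");

    // By default \w covers only [a-zA-Z_0-9].
    static boolean asciiWord(String s) {
        return ASCII_WORD.matcher(s).matches();
    }

    // With the flag, \w also covers letters and digits from other scripts.
    static boolean unicodeWord(String s) {
        return UNICODE_WORD.matcher(s).matches();
    }

    // \p{L} matches letters of any script without any flag.
    static boolean letters(String s) {
        return LETTERS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String hello = "\u043f\u0440\u0438\u0432\u0435\u0442"; // Cyrillic word, six letters
        System.out.println(asciiWord(hello));   // false
        System.out.println(unicodeWord(hello)); // true
        System.out.println(letters(hello));     // true
    }
}
```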

Regular expressions are a powerful tool in Java, but they can become complex. Always test your patterns thoroughly and consider readability when creating complex expressions.
