
Anh Trần Tuấn

Originally published at tuanh.net

Regex in Java: The Ultimate Guide for Developers

1. What is Regex?

Regular Expressions (regex) are sequences of characters that define search patterns. These patterns are used for string matching, replacing, and splitting operations. In Java, regex is handled via the java.util.regex package, specifically the Pattern and Matcher classes.
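As a rough illustration of the three operations mentioned above, here is a minimal sketch using Pattern and Matcher. The class name, method names, and sample input are hypothetical and chosen only for this example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexOperations {
    // One or more digits; compiled once and reused below.
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Matching: returns the first run of digits, or null if there is none.
    static String firstNumber(String text) {
        Matcher matcher = DIGITS.matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    // Replacing: swaps every run of digits for a '#'.
    static String maskNumbers(String text) {
        return DIGITS.matcher(text).replaceAll("#");
    }

    // Splitting: cuts the text wherever a run of digits occurs.
    static String[] splitOnNumbers(String text) {
        return DIGITS.split(text);
    }

    public static void main(String[] args) {
        String text = "a1b22c333d"; // hypothetical sample input
        System.out.println(firstNumber(text));                     // 1
        System.out.println(maskNumbers(text));                     // a#b#c#d
        System.out.println(Arrays.toString(splitOnNumbers(text))); // [a, b, c, d]
    }
}
```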

1.1. Basic Syntax

At the heart of regex is a set of special characters that define the patterns. Here are a few basic ones:

  • . : Matches any single character except line terminators (unless the Pattern.DOTALL flag is set).
  • * : Matches 0 or more occurrences of the preceding element.
  • + : Matches 1 or more occurrences of the preceding element.
  • ? : Matches 0 or 1 occurrence of the preceding element.
  • []: Matches any one of the characters inside the brackets.
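The metacharacters listed above can be tried out with a small sketch like the following. The helper method and the sample words are assumptions made for illustration; Pattern.matches requires the entire input to match.

```java
import java.util.regex.Pattern;

class BasicSyntaxDemo {
    // Pattern.matches requires the whole input to match the pattern.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("c.t", "cat"));        // true:  . stands for 'a'
        System.out.println(matches("c.t", "ct"));         // false: . needs exactly one character
        System.out.println(matches("ab*c", "ac"));        // true:  * allows zero 'b'
        System.out.println(matches("ab+c", "ac"));        // false: + needs at least one 'b'
        System.out.println(matches("ab+c", "abbbc"));     // true:  + accepts several 'b'
        System.out.println(matches("colou?r", "color"));  // true:  ? makes 'u' optional
        System.out.println(matches("gr[ae]y", "grey"));   // true:  [ae] is 'a' or 'e'
        System.out.println(matches("gr[ae]y", "groy"));   // false: 'o' is not in the brackets
    }
}
```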

1.2. Example: Matching Email Addresses

Let’s start with a simple regex to match a valid email address. Here’s the code:

import java.util.regex.*;

public class RegexExample {
    public static void main(String[] args) {
        // The backslash is doubled because "\." is not a valid escape in a Java string literal.
        String emailPattern = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,6}$";
        String email = "test@example.com";

        Pattern pattern = Pattern.compile(emailPattern);
        Matcher matcher = pattern.matcher(email);

        if (matcher.matches()) {
            System.out.println("Valid email");
        } else {
            System.out.println("Invalid email");
        }
    }
}

1.3. Explanation of the Pattern

Breaking down the pattern ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6}$ (in Java source the backslash is doubled, giving \\.):

  • ^ : Asserts position at the start of the string.
  • [a-zA-Z0-9._%+-]+: Matches any letter, digit, or one of the special characters (._%+-), ensuring at least one character before the "@".
  • @ : The literal "@" symbol.
  • [a-zA-Z0-9.-]+: Matches the domain name, allowing letters, digits, dots, and hyphens.
  • \. : Escaped dot, matching the literal dot before the domain extension.
  • [a-zA-Z]{2,6}: Matches the domain extension (e.g., "com", "org"), allowing 2 to 6 letters.
  • $ : Asserts the position at the end of the string.

For the input test@example.com, the output would be:

Valid email

2. Advanced Regex Techniques

Beyond basic matching, regex offers powerful features that can handle more complex scenarios such as capturing groups, lookahead, and lookbehind assertions.

2.1. Capturing Groups

Capturing groups are used to extract specific parts of a string based on the pattern. You define a group by placing a part of the regex inside parentheses ( ).

Example: Extracting date components from a string:

import java.util.regex.*;

public class DateExtractor {
    public static void main(String[] args) {
        // \d must be written as \\d inside a Java string literal.
        String datePattern = "(\\d{2})/(\\d{2})/(\\d{4})";
        String date = "15/09/2024";

        Pattern pattern = Pattern.compile(datePattern);
        Matcher matcher = pattern.matcher(date);

        if (matcher.matches()) {
            System.out.println("Day: " + matcher.group(1));
            System.out.println("Month: " + matcher.group(2));
            System.out.println("Year: " + matcher.group(3));
        }
    }
}

Output:

Day: 15
Month: 09
Year: 2024
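When the dates are embedded in longer text, matches() no longer applies because it requires the whole input to match. A minimal sketch, assuming a hypothetical DateScanner class and sample sentence, uses find() in a loop and reads the same three groups:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateScanner {
    // Three capturing groups: day, month, year.
    private static final Pattern DATE = Pattern.compile("(\\d{2})/(\\d{2})/(\\d{4})");

    // Returns every date found in the text, rewritten as yyyy-MM-dd.
    static List<String> toIsoDates(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = DATE.matcher(text);
        // find() scans for the next match; matches() would require the whole text to be a date.
        while (matcher.find()) {
            result.add(matcher.group(3) + "-" + matcher.group(2) + "-" + matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Released 15/09/2024, patched 02/10/2024."; // hypothetical input
        System.out.println(toIsoDates(text)); // [2024-09-15, 2024-10-02]
    }
}
```

Note that the pattern only checks the shape of the date; it does not reject values such as month 13.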

2.2. Lookahead and Lookbehind

These are special regex assertions that allow you to match a pattern only if it's followed (lookahead) or preceded (lookbehind) by another pattern.

  • Lookahead (?=...): Ensures a match is followed by a specific pattern.
  • Lookbehind (?<=...): Ensures a match is preceded by a specific pattern.

Example: Matching a word only if it's followed by a number:

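String pattern = "\\bword(?=\\d)";

This pattern matches "word" only if it's immediately followed by a digit; the digit itself is not included in the match. There is no \b after "word" here: a digit is a word character, so no word boundary exists between "word" and a following digit, and a trailing \b would prevent the pattern from ever matching.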
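Both assertions can be compared side by side in a short sketch. The lookbehind pattern, the helper method, and the sample strings are assumptions added for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Lookahead: "word" only when a digit comes next; the digit is not part of the match.
    static final Pattern WORD_BEFORE_DIGIT = Pattern.compile("\\bword(?=\\d)");

    // Lookbehind: digits only when a '$' comes immediately before them.
    static final Pattern AMOUNT_AFTER_DOLLAR = Pattern.compile("(?<=\\$)\\d+");

    // Collects the text of every match of the pattern in the input.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll(WORD_BEFORE_DIGIT, "word1 word sword2"));    // [word]
        System.out.println(findAll(AMOUNT_AFTER_DOLLAR, "cost $40, 15 items")); // [40]
    }
}
```

In the first call only the leading "word" qualifies: the second one is followed by a space, and the one inside "sword2" fails the leading \b.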

2.3. Optimizing Regex Performance

Regex can become inefficient when dealing with large texts or complex patterns. To improve performance, keep these tips in mind:

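  • Avoid unnecessary backtracking : Quantifiers like * or + applied to broad patterns such as .* can force the engine to retry many alternatives. Prefer specific character classes (for example, [^,]* instead of .*). The non-greedy versions (*?, +?) change which match is found but do not by themselves eliminate backtracking. Java also supports possessive quantifiers (*+, ++), which never give back what they have matched.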
  • Precompile patterns : Instead of creating a new Pattern object for every match, compile it once and reuse it, especially inside loops.
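The precompilation tip can be sketched as follows. The order-ID format, class name, and sample data are hypothetical; the point is only that Pattern.compile runs once, not once per element.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PrecompiledPatternDemo {
    // Compiled once when the class loads, then shared by every call.
    private static final Pattern ORDER_ID = Pattern.compile("[A-Z]{3}-\\d{4}");

    // Counts the valid IDs without recompiling the pattern per element.
    static long countValid(List<String> ids) {
        long count = 0;
        for (String id : ids) {
            if (ORDER_ID.matcher(id).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("ABC-1234", "abc-1234", "XYZ-99", "QRS-0001");
        System.out.println(countValid(ids)); // 2
    }
}
```

Calling id.matches("[A-Z]{3}-\\d{4}") inside the loop would give the same result, but String.matches compiles the pattern again on every call.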

3. Common Pitfalls and How to Avoid Them

While regex is powerful, it can also be tricky. Here are some common mistakes developers make and how to avoid them.


3.1. Forgetting to Escape Special Characters

Characters like . or ? have special meanings in regex. To match them literally, escape them with a backslash. Because the backslash is itself an escape character in Java string literals, you write it twice in source code (\\).

Example:

Pattern.compile("\\."); // Matches a literal dot, not any character
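A common place this pitfall shows up is String.split, which takes a regex. The sketch below, with hypothetical method names and inputs, contrasts the unescaped and escaped dot and also shows Pattern.quote, which escapes an entire string for literal matching.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class EscapingDemo {
    // Unescaped: '.' matches every character, so every piece is empty and gets dropped.
    static String[] splitUnescaped(String text) {
        return text.split(".");
    }

    // Escaped: "\\." in Java source is the regex \. which matches a literal dot.
    static String[] splitEscaped(String text) {
        return text.split("\\.");
    }

    // Pattern.quote escapes a whole string so it is matched literally.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitUnescaped("a.b.c"))); // []
        System.out.println(Arrays.toString(splitEscaped("a.b.c")));   // [a, b, c]
        System.out.println(containsLiteral("price: 1.50?", "1.50?")); // true
        System.out.println(containsLiteral("price: 1x50", "1.50?"));  // false
    }
}
```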

3.2. Using Greedy Quantifiers When Not Necessary

By default, quantifiers like * and + are greedy, meaning they will match as much as possible. This can lead to unintended matches.

Example:

Pattern.compile(".*"); // Greedy, will match everything
Pattern.compile(".*?"); // Non-greedy, will match the smallest possible string
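The difference becomes visible once the quantifier sits between two delimiters. A minimal sketch, assuming a hypothetical helper and a made-up HTML fragment:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazyDemo {
    // Returns the first match of the regex in the text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>one</b> and <b>two</b>"; // hypothetical input
        // Greedy: .* runs to the last </b> in the input.
        System.out.println(firstMatch("<b>.*</b>", html));  // <b>one</b> and <b>two</b>
        // Lazy: .*? stops at the first </b> it can reach.
        System.out.println(firstMatch("<b>.*?</b>", html)); // <b>one</b>
    }
}
```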

3.3. Not Testing Edge Cases

Always test your regex with a variety of inputs, including edge cases like empty strings, strings without the pattern, or patterns at the boundaries of your match conditions.
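One lightweight way to do this is to run a pattern against a handful of boundary inputs and check each result. The sketch below uses a dd/MM/yyyy date pattern with hypothetical inputs; a real project would more likely put these checks in unit tests.

```java
import java.util.regex.Pattern;

class EdgeCaseChecks {
    private static final Pattern DATE = Pattern.compile("(\\d{2})/(\\d{2})/(\\d{4})");

    // True only when the whole input has the dd/MM/yyyy shape.
    static boolean isDate(String input) {
        return DATE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDate("15/09/2024"));   // true:  typical input
        System.out.println(isDate(""));             // false: empty string
        System.out.println(isDate("no date here")); // false: pattern absent
        System.out.println(isDate("5/09/2024"));    // false: one digit short
        System.out.println(isDate("15/09/2024 "));  // false: trailing space
        System.out.println(isDate("99/99/9999"));   // true:  shape matches, value is not a real date
    }
}
```

The last case is the kind of surprise edge-case testing tends to surface: the regex validates the format, not the calendar.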

4. Conclusion

Mastering regex in Java opens the door to solving many string manipulation challenges efficiently. Whether it’s validation, extraction, or text replacement, a strong understanding of regex will make your Java development smoother. Start small, practice regularly, and remember to test your patterns against different inputs to ensure accuracy and performance.

If you have any questions or need further clarification, feel free to comment below! Happy coding!

