DEV Community

Mustapha Belmokhtar

Java REGEX based scanner

I often use Java's built-in java.util.Scanner to retrieve information from string streams. What I was not able to do with it was extract only the tokens that match a predefined regex while consuming the character sequence until the end.
With the built-in scanner, the regex only defines the delimiters; by default, tokens are separated by whitespace such as a space, \t, or \n.
I could not tell it to retrieve a well-defined piece of information, for example 123a4567-e89b-12d3-a456-123442445670, especially when it sits inside a text mixed with other tokens.
With the regex-based scanner presented here, all I have to do is provide the regex, and I get all the matching tokens from the given text.
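
For comparison, here is a minimal sketch of the delimiter-based behaviour of java.util.Scanner with its default whitespace delimiter. The class name `DelimiterScannerDemo` and the method `whitespaceTokens` are made up for this illustration; only `Scanner`, `hasNext()` and `next()` come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, not part of the author's project.
class DelimiterScannerDemo {

    // Collects every whitespace-separated token, as java.util.Scanner does by default.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "uuid : 6d0a3538-9760-41ae-965d-7aad70021f81\nsome other words";
        // prints [uuid, :, 6d0a3538-9760-41ae-965d-7aad70021f81, some, other, words]
        System.out.println(whitespaceTokens(text));
    }
}
```

Every token comes back, including `uuid`, `:` and the unrelated words, so the caller still has to filter out the values of interest.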

Example

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.junit.Assert; // assuming JUnit 4
import org.junit.Test;

// RegexScanner comes from the author's project (see the GitHub link below).

@Test
public void testUUID() {
    // Lowercase UUID: 8-4-4-4-12 hexadecimal digits
    final String regex = "[0-9abcdef]{8}(-[0-9abcdef]{4}){3}-[0-9abcdef]{12}";
    final String text = "uuid : 6d0a3538-9760-41ae-965d-7aad70021f81\n" +
            "uuid : d7d97fb3-3676-4109-9a94-7acc5f593ace\n" +
            "uuid : 02e87dd3-10ff-43cf-9572-bd9d151bb439\n" +
            "uuid : 632a4c31-8dfe-43a3-8f8d-15b472292cc9";

    final List<String> expectedUUIDs = Arrays.asList("6d0a3538-9760-41ae-965d-7aad70021f81",
            "d7d97fb3-3676-4109-9a94-7acc5f593ace",
            "02e87dd3-10ff-43cf-9572-bd9d151bb439",
            "632a4c31-8dfe-43a3-8f8d-15b472292cc9");

    // Consume the scanner: each call to next() returns the next token matching the regex.
    final List<String> foundUUIDs = new ArrayList<>();
    final RegexScanner regexScanner = new RegexScanner(text, regex);
    while (regexScanner.hasNext()) {
        foundUUIDs.add(regexScanner.next());
    }
    Assert.assertArrayEquals(expectedUUIDs.toArray(), foundUUIDs.toArray());
}

This way, the while loop consumes the scanner until the end of the text is reached, which means all the tokens are read and processed.
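
To give an idea of how such a hasNext()/next() loop can be built on top of java.util.regex, here is a minimal sketch. It is not the author's RegexScanner: the class name `RegexScannerSketch` and its fields are assumptions made for illustration, and the real implementation may work differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, not the author's RegexScanner.
class RegexScannerSketch {
    private final Matcher matcher;
    private String pending;     // next matching token, found but not yet returned
    private boolean exhausted;  // set once the matcher has found no further match

    RegexScannerSketch(String text, String regex) {
        this.matcher = Pattern.compile(regex).matcher(text);
    }

    // Looks ahead for the next match without consuming it.
    boolean hasNext() {
        if (pending == null && !exhausted) {
            if (matcher.find()) {
                pending = matcher.group();
            } else {
                exhausted = true;
            }
        }
        return pending != null;
    }

    // Returns the next matching token and moves past it.
    String next() {
        if (!hasNext()) {
            throw new NoSuchElementException("no more tokens match the regex");
        }
        String token = pending;
        pending = null;
        return token;
    }

    public static void main(String[] args) {
        RegexScannerSketch scanner = new RegexScannerSketch("a1 b22 c333", "[0-9]+");
        List<String> found = new ArrayList<>();
        while (scanner.hasNext()) {
            found.add(scanner.next());
        }
        System.out.println(found); // prints [1, 22, 333]
    }
}
```

Keeping the looked-ahead token in a field means hasNext() can be called several times in a row without skipping a match.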

It is also possible to provide a function that maps each found token to another object, using the next(Function<String, R> mapper) method.
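
The mapping idea can be sketched independently of the scanner class. The example below is an assumption-laden illustration, not the author's next(Function) implementation: `MappedTokensSketch` and `mapTokens` are hypothetical names, and the code relies on `Matcher.results()`, which requires Java 9 or later. It converts every matched token with the supplied function, here `UUID::fromString`.

```java
import java.util.List;
import java.util.UUID;
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the author's project.
class MappedTokensSketch {

    // Finds every match of regex in text and converts each matched token with mapper.
    static <R> List<R> mapTokens(String text, String regex, Function<String, R> mapper) {
        return Pattern.compile(regex)
                .matcher(text)
                .results()                 // Java 9+: one MatchResult per match
                .map(MatchResult::group)   // the matched token as a String
                .map(mapper)               // the caller-supplied conversion
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String regex = "[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}";
        String text = "uuid : 6d0a3538-9760-41ae-965d-7aad70021f81\n"
                + "uuid : d7d97fb3-3676-4109-9a94-7acc5f593ace";
        // Each token is turned into a java.util.UUID instead of staying a String.
        List<UUID> uuids = mapTokens(text, regex, UUID::fromString);
        System.out.println(uuids);
    }
}
```

Passing the conversion as a Function keeps the matching logic generic: the same helper can return UUIDs, integers, or any other type the caller needs.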

The code is available on GitHub via this link.
