DEV Community

david duymelinck

PHP fun: regex builder in 8.5

I just saw the following post.

And the first thing I thought was, this has pipe operator written all over it.

Why?

The main reason for me is that this is a builder that creates text, so why should it need to instantiate an object?

The second reason is that the built-in PHP regex functions should be used, instead of wrapping them in an object method:
duplicates.IsMatch('test') versus preg_match($pattern, 'test').

How will it work?

I'm not going to create the whole library. Instead, I'm going to highlight a few of the possibilities of the C# library in their pipe operator form.

The start is simply an empty string, but it can be a string with a delimiter or even a delimiter function.

$pattern = '' |> anyCharacter(...); // regex: .*

// or

$pattern = '/' |> anyCharacter(...);

// or 

$pattern = delimiter() |> anyCharacter(...);
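
As a rough sketch, the functions used above could be as small as this. The function bodies and the closePattern helper are assumptions for illustration; they are not taken from any existing library. The pipe operator requires PHP 8.5.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: returns the opening delimiter of the pattern.
function delimiter(string $char = '/'): string
{
    return $char;
}

// Hypothetical helper: appends "any character, zero or more times".
function anyCharacter(string $pattern): string
{
    return $pattern . '.*';
}

// Hypothetical helper: appends the closing delimiter and optional modifiers.
function closePattern(string $pattern, string $char = '/', string $modifiers = ''): string
{
    return $pattern . $char . $modifiers;
}

$pattern = delimiter()
    |> anyCharacter(...)
    |> closePattern(...);

// $pattern is now '/.*/' and goes straight into the built-in functions.
$isMatch = preg_match($pattern, 'test') === 1;
```

Because every step only receives and returns a string, the result can be passed to preg_match, preg_replace or any other PCRE function without an adapter.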

The library uses class constants, Pattern.With.LowercaseLetter, to add known character patterns. This can be a backed enum in PHP, and the anyCharacter function can become any with an argument.

enum CharacterPattern: string
{
   case Any = '.';
   case LowercaseLetter = '[a-z]';
   case Word = '\w';
}

function any(string $pattern, CharacterPattern|string $add = CharacterPattern::Any): string
{
  // Accept either a predefined enum case or a raw pattern string.
  $addPattern = $add instanceof CharacterPattern ? $add->value : $add;

  return "$pattern$addPattern*";
}

// examples

$pattern = '' |> any(...);
$pattern = '' |> (fn($pattern) => any($pattern, CharacterPattern::LowercaseLetter));
$pattern = '' |> (fn($pattern) => any($pattern, '(?:sunday|monday)')); // a group, because [sunday|monday] would be a character class

For the last example the library needs a separate method, Literal. In PHP, the string part of the union type hint is more than enough.

To complete the quantifying pattern functions, there should be the exact and atLeast functions.
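
A minimal sketch of those two functions, assuming they take the sub-pattern and a count as extra arguments. The names exact and atLeast come from the text; the signatures are an assumption, and the enum support of any is left out to keep the example short.

```php
<?php
declare(strict_types=1);

// Hypothetical signature: repeat $add exactly $times times, e.g. \d{4}.
function exact(string $pattern, string $add, int $times): string
{
    return sprintf('%s%s{%d}', $pattern, $add, $times);
}

// Hypothetical signature: repeat $add at least $times times, e.g. [a-z]{2,}.
function atLeast(string $pattern, string $add, int $times): string
{
    return sprintf('%s%s{%d,}', $pattern, $add, $times);
}

// Requires PHP 8.5 for the pipe operator.
$pattern = '/^'
    |> (fn(string $p) => exact($p, '\d', 4))
    |> (fn(string $p) => $p . '-')
    |> (fn(string $p) => atLeast($p, '[a-z]', 2))
    |> (fn(string $p) => $p . '$/');

// $pattern is now '/^\d{4}-[a-z]{2,}$/'
$matches = preg_match($pattern, '2025-ab') === 1;
```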

The PositiveLookahead method of the library can result in nested function calls as shown in .PositiveLookahead(Pattern.With.Anything.Repeat.ZeroOrMore.Set(Pattern.With.Literal("!@#$%^&*()_+-="))).
Splitting the method into two functions allows the builder to remain flat.

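function positiveLookaheadStart(string $pattern, string $times = ''): string
{
   // Opens the lookahead; $times is an optional prefix such as '.*'.
   return "$pattern(?=$times";
}

function positiveLookaheadEnd(string $pattern): string
{
  return "$pattern)";
}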

// example

$pattern = ''
  |> (fn($pattern) => positiveLookaheadStart($pattern, '.*'))
  |> (fn($pattern) => any($pattern, '(?:sunday|monday)'))
  |> positiveLookaheadEnd(...);

Another example in the library of a method that should be split in a pipe operator library is the NamedGroup method. I would call the function group because, besides naming the group, it is also possible to create non-capturing groups.
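
One way the split could look, as a sketch. The names groupStart and groupEnd and the $capture flag are assumptions for illustration, following the same start/end approach as the lookahead functions.

```php
<?php
declare(strict_types=1);

// Hypothetical: opens a named, capturing or non-capturing group.
function groupStart(string $pattern, ?string $name = null, bool $capture = true): string
{
    if ($name !== null) {
        return $pattern . '(?<' . $name . '>';
    }

    return $pattern . ($capture ? '(' : '(?:');
}

// Hypothetical: closes whichever group was opened last.
function groupEnd(string $pattern): string
{
    return $pattern . ')';
}

// Requires PHP 8.5 for the pipe operator.
$pattern = '/'
    |> (fn(string $p) => groupStart($p, 'year'))
    |> (fn(string $p) => $p . '\d{4}')
    |> groupEnd(...)
    |> (fn(string $p) => $p . '/');

// $pattern is now '/(?<year>\d{4})/'
preg_match($pattern, 'Released in 2025', $matches);
// $matches['year'] holds '2025'
```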

The back referencing example might look great, but in regex a back reference is nothing more than \1 or \k<name>, depending on whether the group is unnamed or named. A function could be:

function backReference(string $pattern, int|string $add): string
{
   // A string refers to a named group (\k<name>), an integer to a numbered group (\1).
   $reference = is_string($add) ? "k<$add>" : $add;

   return "$pattern\\$reference";
}
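
As a usage sketch, this is how the function could detect a repeated character, similar in spirit to the duplicates example mentioned earlier. The surrounding pattern pieces are written as plain strings here and are assumptions for illustration.

```php
<?php
declare(strict_types=1);

function backReference(string $pattern, int|string $add): string
{
    // A string refers to a named group, an integer to a numbered group.
    $reference = is_string($add) ? "k<$add>" : (string) $add;

    return $pattern . '\\' . $reference;
}

// Requires PHP 8.5 for the pipe operator.
$pattern = '/(\w)'
    |> (fn(string $p) => backReference($p, 1))
    |> (fn(string $p) => $p . '/');

// $pattern is now '/(\w)\1/': a word character followed by itself.
$hasDuplicate = preg_match($pattern, 'letter') === 1; // 'tt' matches
$noDuplicate  = preg_match($pattern, 'abc') === 0;

// The named variant produces \k<char>.
$named = backReference('(?<char>\w)', 'char');
```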

Conclusion

The benefit of using the pipe operator is that there will be less code because the functions do one thing without side effects.

This also means fewer edge cases to test, because the regex is built with plain strings and language features instead of library objects.

A minor negative point is that, because the functions have more generic names, the code might not be as intuitive as a fluent API.
To alleviate that problem you could add your own functions, which is another positive consequence of using the pipe operator: extending a library becomes trivial.
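
A small sketch of such an extension. The any function is reduced to plain strings here, and anyLowercaseLetters is a hypothetical name for a user-defined function that sits next to the library functions.

```php
<?php
declare(strict_types=1);

// Simplified stand-in for the library function (strings only).
function any(string $pattern, string $add = '.'): string
{
    return $pattern . $add . '*';
}

// Hypothetical user-defined function with a more intention-revealing name.
function anyLowercaseLetters(string $pattern): string
{
    return any($pattern, '[a-z]');
}

// Requires PHP 8.5 for the pipe operator.
$pattern = '' |> anyLowercaseLetters(...);

// $pattern is now '[a-z]*'
```

No class needs to be extended or decorated; the new function only has to accept and return a string.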

Next time you think about the builder pattern, ask yourself if using the pipe operator might be a better fit.

PS: Don't run a regex builder in production. Use the output of the library instead, for example from a pre-warmed cache.
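
One possible way to do that, as a sketch: build the patterns once in a deployment step and load them as plain strings at runtime. The file location and the array key are assumptions for illustration.

```php
<?php
declare(strict_types=1);

// Build step: run once during deployment, not on every request.
// In practice the value would be the output of the builder functions.
$patterns = [
    'year' => '/(?<year>\d{4})/',
];

// Hypothetical cache location.
$cacheFile = sys_get_temp_dir() . '/regex-patterns.php';
file_put_contents($cacheFile, '<?php return ' . var_export($patterns, true) . ';');

// Runtime: only plain strings are loaded, no builder code is executed.
$cached = require $cacheFile;
preg_match($cached['year'], 'Released in 2025', $matches);
```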

Top comments (2)

Ahmad Al-Freihat

Thanks for the reference and for exploring the idea from a PHP perspective.

To clarify the scope and goals of the library: it is not intended to replace regex, abstract it away, or optimize runtime performance. Its primary goal is to make complex patterns explicit, readable, and intention-revealing in statically typed, long-lived C# codebases—where raw regex strings often become difficult to review, refactor, and reason about over time.

The fluent, object-based approach is a deliberate trade-off:

  • it favors expressiveness and discoverability over minimalism,
  • it leverages C#’s type system and fluent APIs rather than string composition,
  • and it targets scenarios where maintainability and code review clarity matter more than brevity.

Your functional, pipe-based exploration makes sense in PHP and highlights how different language ecosystems encourage different design choices. I see it less as an alternative implementation and more as a confirmation that the underlying problem—regex readability—is real, even if the solutions vary by language.

Appreciate you engaging with the idea and surfacing the trade-offs so clearly.

david duymelinck • Edited

I found it a great way to show off the new PHP 8.5 pipe operator. But I would never use a library to make a regex more readable. That is one of the reasons I didn't make the effort to create a library.
My first thought would be: does the pattern need to be that complex? Could I solve the problem in another way?

If that is not the case just add comments to the regex.

^                           # Start of the string
(?=.*[a-z])                 # Lookahead to ensure at least one lowercase letter is present
(?=.*[A-Z])                 # Lookahead to ensure at least one uppercase letter is present
(?=.*\d)                    # Lookahead to ensure at least one digit is present
(?=.*[!@#$%^&*()_+\-=])     # Lookahead to ensure at least one special character from the set is present
.{8,}                       # Ensure the string is at least 8 characters long
$                           # End of the string

(I let AI comment the example for me, if there is something wrong blame that)
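
In PHP a commented pattern like the one above needs the x (extended) modifier, so whitespace and # comments are ignored by the regex engine. A sketch using a nowdoc, with the comments shortened; the variable names and test strings are assumptions for illustration.

```php
<?php
declare(strict_types=1);

// Nowdoc: no interpolation, so $ and \ need no extra escaping.
$pattern = <<<'REGEX'
/
^                           # Start of the string
(?=.*[a-z])                 # At least one lowercase letter
(?=.*[A-Z])                 # At least one uppercase letter
(?=.*\d)                    # At least one digit
(?=.*[!@#$%^&*()_+\-=])     # At least one special character
.{8,}                       # At least 8 characters long
$                           # End of the string
/x
REGEX;

$strong = preg_match($pattern, 'Passw0rd!') === 1;
$weak   = preg_match($pattern, 'password') === 1;
```

Note that the comments must not contain the delimiter character, because PHP looks for the closing delimiter before the pattern is compiled.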

Regular expressions can be cryptic because they use meta characters instead of well-named functions/methods. When the comments can't explain the regex, then the comments are to blame.

Adding an extra dependency to make an expression more readable feels to me like someone is not willing to take the time to learn how regular expressions work. It is not like you need to learn a whole new language; the only purpose is text extraction. What is next? Not learning a database query language, not learning HTML?
You created the library with the best intentions, but sometimes they can lead people to choose a bad path.