Germán Alberto Gimenez Silva

Posted on Dec 23, 2025 • Originally published at rubystacknews.com on Dec 23, 2025

Building a DFA-Based Regular Expression Engine in Ruby

#programming #ruby #software #softwaredevelopment


Inspired by a talk by Yudai Takada (@ydah) at Hokuriku Ruby Kaigi 01

Regular expressions are one of the most widely used tools in programming, yet they are often treated as black boxes. We use them daily, but rarely stop to ask a fundamental question: how does a regular expression engine actually work?

In his talk “Walking with Computer Science in Ruby — Building a DFA-based Regular Expression Engine”, Yudai Takada, product engineer at SmartHR and CRuby contributor, takes us on a deep but accessible journey into the internals of regex engines, not through theory alone but by building one step by step in Ruby.

What “Regular Expressions” Really Mean

Takada begins by addressing a common misconception: many features commonly called “regular expressions” are not regular in the strict computational sense.

As he explains, regular expressions are:

“A mathematical and computational concept for describing a set of strings using a single notation.”

However, modern engines often include extensions such as backreferences, lookaheads, and recursion — features that go beyond regular languages. Takada references Larry Wall’s famous remark:

“What we call ‘regular expressions’ are only marginally related to real regular expressions.”
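To make the gap concrete, here is a small sketch (not taken from the talk) using Ruby's built-in `Regexp`. The constant name `BACKREF_PATTERN` is only illustrative. The pattern uses a backreference to require the same run of `a` characters on both sides of a `b`, which describes a language of the form aⁿbaⁿ. No finite automaton can recognize that language, because it would need to remember an unbounded count.

```ruby
# Illustrative only: a backreference (\1) forces the text after "b"
# to repeat exactly what the first group captured.
BACKREF_PATTERN = /\A(a+)b\1\z/

BACKREF_PATTERN.match?("aabaa") # => true  (two a's, b, two a's)
BACKREF_PATTERN.match?("aaba")  # => false (the counts differ)
```

Ruby's engine supports this through backtracking, which is exactly the kind of extension a strictly regular engine has to leave out.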

This distinction matters, because once we restrict ourselves to true regular expressions, we gain powerful theoretical guarantees, including matching in time linear in the length of the input.
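The linear-time guarantee is easiest to see in code. The sketch below is a hand-built DFA for the pattern `(a|b)*abb`, written for this excerpt rather than taken from Takada's implementation. The names `DFA_TRANSITIONS`, `DFA_START`, `DFA_ACCEPTING`, and `dfa_match?` are assumptions made for illustration. Each input character triggers exactly one table lookup and the matcher never backtracks, so the work grows in proportion to the input length.

```ruby
# Hypothetical hand-written DFA for (a|b)*abb.
# State meanings: 0 = no progress, 1 = seen "a", 2 = seen "ab", 3 = seen "abb".
DFA_TRANSITIONS = {
  0 => { "a" => 1, "b" => 0 },
  1 => { "a" => 1, "b" => 2 },
  2 => { "a" => 1, "b" => 3 },
  3 => { "a" => 1, "b" => 0 }
}.freeze
DFA_START = 0
DFA_ACCEPTING = [3].freeze

# Returns true when the whole input is accepted by the DFA.
def dfa_match?(input)
  state = DFA_START
  input.each_char do |char|
    # One lookup per character; a missing entry means the character
    # is outside the alphabet, so the input is rejected.
    state = DFA_TRANSITIONS[state][char]
    return false if state.nil?
  end
  DFA_ACCEPTING.include?(state)
end

dfa_match?("babb") # => true
dfa_match?("abba") # => false
```

A full engine would generate the transition table from the pattern instead of hard-coding it, but the matching loop would keep this same shape.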

This post is an excerpt. The full version — including detailed explanations, diagrams, and code examples — is available on my blog:

👉 https://rubystacknews.com/2025/12/23/building-a-dfa-based-regular-expression-engine-in-ruby/