Understanding the RegexEntityExtractor in RASA

#rasa #yaml #chatbot #llm

Our previous blog explored how the Entity Synonym Mapper helps normalize extracted entities into canonical values.

In this post, we'll go one step earlier in the process and look at how entities are detected in the first place, specifically through pattern-based extraction.
This is where the RegexEntityExtractor comes into play.

Contents of this blog

What is RegexEntityExtractor
YAML configuration
Internal working
When and why to use it

What is the RegexEntityExtractor?
The RegexEntityExtractor is a rule-based entity extractor that uses regular expressions to identify entities in user input.
Unlike ML-based extractors, it does not learn from data.
Instead, it works on a very simple principle:

If the text matches a predefined pattern, extract it as an entity.

This makes it:

Deterministic
Fast
Extremely precise (when patterns are well-defined)

Why do we need it?
Not all entities are ambiguous.
Some entities:

Follow fixed formats
Are numerical or structured
Do not benefit from ML generalization

Examples:

Phone numbers
Email addresses
Order IDs
Dates
ZIP codes

Training an ML model to extract these is often overkill.
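
To make this concrete, here is a minimal Python sketch of patterns for a few of these formats. The `PATTERNS` table and the `find_all` helper are names made up for this illustration, the order ID format is invented, and the expressions are deliberately simplified; real-world formats usually need stricter patterns.

```python
import re

# Simplified, illustrative patterns; assumptions, not production-grade validators.
PATTERNS = {
    "phone_number": r"\b\d{10}\b",                 # assumes 10-digit numbers
    "email": r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b",  # loose email shape
    "order_id": r"\bORD-\d{6}\b",                  # hypothetical order ID format
    "zip_code": r"\b\d{5}(?:-\d{4})?\b",           # US ZIP or ZIP+4
}


def find_all(text: str) -> dict[str, list[str]]:
    """Return every match of each pattern found in the text."""
    return {name: re.findall(pattern, text) for name, pattern in PATTERNS.items()}


result = find_all("Order ORD-123456 for abc@gmail.com, ship to 90210")
print(result)
```

No training data is involved here: a value is either matched by its pattern or it is not, which is exactly why these entity types suit a rule-based extractor.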

YAML Configuration Example
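Regex patterns are defined directly in your NLU YAML file.
Each item under examples is a regular expression, not a sample value, and the name of the regex must match the name of the entity you want to extract.
Rasa also needs a few annotated examples of the entity so that it is registered as an entity at training time.

version: "3.1"

nlu:
  # The regex name must match the entity name.
  - regex: phone_number
    examples: |
      - \d{10}
  # Annotated examples register the entity (the intent name is illustrative).
  - intent: inform
    examples: |
      - my phone number is [9876543210](phone_number)
      - call me on [9123456780](phone_number)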

Pipeline Configuration
To enable it, the extractor must be added to your pipeline:

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexEntityExtractor

Internal Working
At a low level, the RegexEntityExtractor works as follows:

Takes the raw user message
Iterates over each regex pattern defined in YAML
Applies the pattern to the text
If a match is found:
- Extracts the matched substring
- Assigns it as an entity
- Stores start and end character indices

Consider the example:

"My phone number is 9876543210"

Then the entity extracted is:

{
  "entity": "phone_number",
  "value": "9876543210",
  "start": 19,
  "end": 29
}
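
The steps above can be sketched in plain Python with the standard `re` module. This is only an illustration of the idea, not Rasa's actual implementation; `REGEX_PATTERNS` and `extract_entities` are hypothetical names chosen for this example.

```python
import re

# Hypothetical pattern table: entity name -> regex, mirroring the YAML idea.
REGEX_PATTERNS = {"phone_number": r"\d{10}"}


def extract_entities(text: str, patterns: dict[str, str]) -> list[dict]:
    """Apply every pattern to the text and record each match as an entity."""
    entities = []
    for entity_name, pattern in patterns.items():
        for match in re.finditer(pattern, text):
            entities.append({
                "entity": entity_name,
                "value": match.group(0),
                "start": match.start(),  # index of the first matched character
                "end": match.end(),      # index one past the last matched character
            })
    return entities


entities = extract_entities("My phone number is 9876543210", REGEX_PATTERNS)
print(entities)
```

For the message used above, this sketch yields a single `phone_number` entity with value `9876543210`, start 19 and end 29, matching the extracted entity shown earlier.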

Combining with Entity Synonym Mapper
A very common pattern is:

RegexEntityExtractor extracts the entity
Entity Synonym Mapper normalizes it

This combination gives:

Precision
Consistency
Clean downstream data
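
As a rough sketch of this two-step flow, the Python below first extracts a value with a regex and then replaces it with a canonical value from a lookup table. Everything here is an assumption for illustration: the `priority` entity, the `PATTERN` expression, and the `SYNONYMS` table are invented, and in Rasa the normalization is done by the separate Entity Synonym Mapper component rather than by code like this.

```python
import re

# Hypothetical data for illustration: one regex and a synonym table.
PATTERN = r"\bp[0-3]\b"
SYNONYMS = {"p0": "critical", "p1": "high"}  # surface form -> canonical value


def extract_and_normalize(text: str) -> list[dict]:
    """Step 1: regex extraction. Step 2: synonym lookup on the extracted value."""
    entities = []
    for match in re.finditer(PATTERN, text, flags=re.IGNORECASE):
        value = match.group(0)
        # Replace the value only when a synonym is registered for it.
        canonical = SYNONYMS.get(value.lower(), value)
        entities.append({
            "entity": "priority",
            "value": canonical,
            "start": match.start(),
            "end": match.end(),
        })
    return entities


entities = extract_and_normalize("Please mark this ticket as P1")
print(entities)
```

The character offsets still point at the original text ("P1"), while the value passed downstream is the canonical form ("high").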

When should RegexEntityExtractor be used?

When the entity format is predictable
When precision matters more than recall
When you want to reduce ML complexity
When you want deterministic behavior

In the next post, we'll explore the CRFEntityExtractor, where entities are learned statistically rather than matched explicitly.
