
Nikola Mitrovic


I tried parsing emails with regex. It went exactly how you think.

Recently I needed to process incoming emails automatically.

The idea sounded simple:

Email arrives → extract some fields → trigger a webhook

Things like:

  • order confirmations
  • invoice emails
  • shipping notifications
  • support messages

Nothing complicated.

Or so I thought.


Attempt #1 — Regex

Like most developers, I started with regex.

// Capture the digits after "Total: $" (undefined when the label is missing)
const price = email.match(/Total:\s\$(\d+)/)?.[1]

For the first email it worked perfectly.

Then the next email came in and said:

Amount paid: $29

Then another one said:

Total price: USD 29

Then an HTML email arrived with nested tables, inline styles, and formatting from what looked like 2004 Outlook templates.

At this point my regex slowly evolved into something like this:

/(Total|Amount|Price).*?(\$|USD)?\s?(\d+(\.\d+)?)/

Which is usually the moment you realize the approach is already doomed.
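
To see how that pattern goes wrong, here is a quick check against a few sample lines. The first three are the formats from above. The last two are lines I made up for illustration (not real emails): one with an item count before the price, one with a thousands separator.

```javascript
// The "evolved" pattern from above, unchanged.
const pattern = /(Total|Amount|Price).*?(\$|USD)?\s?(\d+(\.\d+)?)/

// Returns the captured number as a string, or null when nothing matches.
function extractAmount(text) {
  const match = text.match(pattern)
  return match ? match[3] : null
}

const samples = [
  'Total: $39',
  'Amount paid: $29',
  'Total price: USD 29',
  // Hypothetical lines, invented for illustration:
  'Total items: 3, Total: $39',
  'Total: $1,299.00',
]

for (const line of samples) {
  console.log(JSON.stringify(line), '->', extractAmount(line))
}
```

The three known formats come back as 39, 29 and 29. The made-up lines come back as 3 and 1, because the lazy `.*?` stops at the first digit it can reach and the pattern knows nothing about commas. No error, just a wrong number sent to your webhook.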


Attempt #2 — Parsing the HTML

Okay fine.

Let's parse the HTML instead.

That led to code like this:

const { JSDOM } = require('jsdom')
const dom = new JSDOM(emailHtml)

Which sometimes worked.

Except email HTML is a special kind of chaos.

You get:

  • tables inside tables
  • inline styles everywhere
  • different layouts for every sender

And suddenly you're maintaining custom parsers for every email format.


The real problem

Emails aren't structured data.

They're written for humans, not machines.

And every sender formats them differently.

Rigid parsing rules get fragile very quickly.


The obvious solution (in hindsight)

Instead of trying to force strict parsing rules…

Why not let AI interpret the email and extract the fields you want?

Example email:

Subject: Order confirmation
Customer: John Smith
Product: T-shirt
Total: $39

Structured output:

{
  "customer": "John Smith",
  "product": "T-shirt",
  "total": 39
}

Now your backend receives clean structured data instead of raw email text.
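
Model output is not guaranteed to match the shape you asked for, so it is worth checking it before your backend trusts it. Here is a minimal sketch, assuming the three fields from the example above. `validateOrder` is a hypothetical helper of mine, not part of any library.

```javascript
// Hypothetical validator for the example output above.
// Returns a list of problems; an empty list means the object looks usable.
function validateOrder(data) {
  if (data === null || typeof data !== 'object' || Array.isArray(data)) {
    return ['payload is not a JSON object']
  }
  const problems = []
  for (const field of ['customer', 'product']) {
    if (typeof data[field] !== 'string' || data[field].trim() === '') {
      problems.push(`${field} must be a non-empty string`)
    }
  }
  if (typeof data.total !== 'number' || !Number.isFinite(data.total) || data.total < 0) {
    problems.push('total must be a non-negative number')
  }
  return problems
}

const extracted = JSON.parse('{"customer": "John Smith", "product": "T-shirt", "total": 39}')
console.log(validateOrder(extracted))

// A deliberately broken object: product is missing and total is a string.
console.log(validateOrder({ customer: 'John Smith', total: '39' }))
```

The first call prints an empty list. The second reports the missing product and the string total, which is the kind of thing you want to catch before it reaches a database.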


So I built a small tool

Mostly because I kept running into this problem again and again.

It's called ParseForce.

The flow is simple:

Incoming email → AI parsing → structured JSON → webhook

You:

  1. Get a unique inbox
  2. Send emails to it
  3. Define the schema you want
  4. Receive structured JSON in your webhook

That's it.
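
On the receiving end, step 4 is just an HTTP endpoint that accepts a JSON POST. Here is a minimal sketch using Node's built-in `http` module. The payload shape (the fields from the order example) and the `/webhook` path are my assumptions for illustration, not ParseForce's documented format.

```javascript
const http = require('http')

// Parses a raw request body and returns { status, body } for the response.
// Kept separate from the server so it is easy to test on its own.
function handleWebhookBody(rawBody) {
  let payload
  try {
    payload = JSON.parse(rawBody)
  } catch (err) {
    return { status: 400, body: 'invalid JSON' }
  }
  if (payload === null || typeof payload !== 'object') {
    return { status: 400, body: 'expected a JSON object' }
  }
  // Assumed fields, taken from the order example earlier in the post.
  console.log('order from', payload.customer, 'total', payload.total)
  return { status: 200, body: 'ok' }
}

// Builds the server without starting it; call .listen(port) to run it.
function createWebhookServer() {
  return http.createServer((req, res) => {
    if (req.method !== 'POST' || req.url !== '/webhook') {
      res.writeHead(404)
      res.end('not found')
      return
    }
    const chunks = []
    req.on('data', (chunk) => chunks.push(chunk))
    req.on('end', () => {
      const result = handleWebhookBody(Buffer.concat(chunks).toString('utf8'))
      res.writeHead(result.status, { 'Content-Type': 'text/plain' })
      res.end(result.body)
    })
  })
}
```

To try it locally, call `createWebhookServer().listen(3000)` and POST some JSON to `http://localhost:3000/webhook`.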


Some things it works well for

So far I've been using it for:

  • parsing order confirmation emails
  • extracting invoice data
  • processing lead emails
  • triggering automation workflows

Basically anything where an email contains data you want your system to understand.


If you're curious

You can check it out here:

👉 https://parseforce.io

I'm also curious how others deal with this problem.

Are you using regex, templates, or something else entirely?


#node #webdev #saas #automation #ai
