We are building Broodnet, email infrastructure for AI agents. Each agent gets its own address and its own inbox. Through the Broodnet CLI, an agent can list its emails, open individual messages, search for specific senders or subjects, and send messages to its owner or other agents in the same account.
While testing with real agent frameworks like Openclaw and hermes, I kept running into the same wall: the agents could receive the email just fine, but when they tried to actually do something with it, the email was unreadable. Every link wrapped in a 200-character tracking redirect. Invisible Unicode characters scattered through the body. OTP codes buried in template noise. Some emails were interpreted instantly; others left the models running in circles.
So, for science, I grabbed 160+ real transactional emails — verification codes, welcome messages, notifications, security alerts — that had been sitting across the Broodnet team's professional and personal inboxes for the last 5 to 10 years. SaaS platforms, games, crypto wallets, dev tools, news services, government portals, you name it. I scored every single one from two perspectives: how good is this email for a human, and how good is it for an agent reading it through a CLI. All scores are normalized to a 0-10 scale so they're comparable across dimensions.
How I scored them
Human side, five metrics: clarity, warmth, visual noise (inverted, 10 means clean), subject line quality, and onboarding helpfulness. Averaged and normalized to 0-10.
Agent side, scored using Claude Opus 4.6: four scaled metrics plus two binary flags. The scaled metrics are extractability (can a plain-text parser get the key info?), sender clarity, URL cleanliness, and body noise level; the flags are whether the code appears in the subject line (yes/no) and whether expiry is explicitly stated (yes/no). Also normalized to 0-10.
Plus a shared metric: CTA URL quality and clarity.
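As a rough sketch of what that normalization step looks like, here's a minimal Python version. The metric names, the 1-5 raw scale, and the choice to fold the binary flags in as 0-or-5 are all assumptions for illustration, not the actual rubric:

```python
# Hypothetical scoring sketch — metric names and the 1-5 raw scale
# are illustrative assumptions, not the study's actual rubric.
def normalize(scores, scale_max):
    """Average a dict of raw metric scores and rescale to 0-10."""
    avg = sum(scores.values()) / len(scores)
    return round(avg * 10 / scale_max, 1)

# Human side: five metrics, assumed here to be rated 1-5.
human = {"clarity": 4, "warmth": 3, "visual_noise_inv": 5,
         "subject": 4, "onboarding": 3}

# Agent side: four scaled metrics (assumed 1-5) plus two binary
# flags, folded in here as 0-or-5 so everything shares one scale.
agent = {"extractability": 4, "sender_clarity": 5, "url_cleanliness": 2,
         "body_noise_inv": 3, "code_in_subject": 0, "expiry_stated": 5}

human_score = normalize(human, 5)  # -> 7.6
agent_score = normalize(agent, 5)  # -> 6.3
```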
The dataset breaks down into 53 welcome emails, 52 verify-link flows, 20 verify-code OTPs, and 35 notifications, with a few others and edge cases.
The tradeoff that doesn't exist
Going into this I assumed there'd be a tradeoff. Emails that look great for humans probably look worse for agents, right? Rich templates, beautiful buttons, all that stuff an LLM can't see. Turns out that's wrong.
The 26 emails that scored "double pristine" (clean URLs and no spacer pollution) averaged 7.0/10 on the human scale. The dataset overall averaged 6.3/10. The cleanest emails for agents were also better for humans. Not by a little, by almost a full point.
Clean emails for agents are also better emails for humans. The supposed tradeoff is a myth.
This surprised me until I thought about where the noise actually comes from. The same ESPs and marketing tools that inject tracking links also inject spacer characters, bloated templates, and broken plain-text fallbacks. Clean emails tend to be clean everywhere.
If you plot human noise score against agent URL cleanliness, 86.5% of emails fall within 1 point of the diagonal. An email that's noisy for you is almost certainly opaque for an agent too. These aren't independent dimensions. They share a root cause.
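The diagonal check itself is simple. A minimal sketch, with made-up sample pairs standing in for the real dataset:

```python
# Given paired (human_noise, agent_url_cleanliness) scores on the same
# 0-10 scale, count how many fall within 1 point of the y = x diagonal.
# The sample pairs below are invented for illustration.
def within_diagonal(pairs, tolerance=1.0):
    hits = sum(1 for h, a in pairs if abs(h - a) <= tolerance)
    return hits / len(pairs)

pairs = [(8.0, 7.5), (3.0, 2.2), (6.0, 8.5), (9.0, 9.0)]
print(within_diagonal(pairs))  # 3 of 4 within 1 point -> 0.75
```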
The onboarding trap
Here's the pattern I didn't expect.
| Onboarding quality | Agent score (out of 10) |
|---|---|
| None | 6.7 |
| Some | 5.6 |
| Good | 5.7 |
| Great | 5.3 |
The emails with the best human onboarding had the worst agent scores. The email that helps a human the most is the email that buries an agent the deepest.
What's happening is that "good onboarding" in practice means multiple sections, step-by-step flows, feature highlights, images, and CTAs. Every one of those CTAs gets a tracking link because the marketing team wants to know which step users click.
The 29 emails that scored high on both dimensions broke this pattern. They include things like Paymo, Pulsetic, AITopTools, IndieHunt, Baselight. What these have in common: they're mostly small companies. They didn't invest in an elaborate ESP with click tracking. Their onboarding emails just... link to the product. Directly. With normal URLs.
The Tracking Tax
42.9% of all emails in the dataset have fully opaque tracking URLs (scored 1 out of 5 on URL cleanliness). Only 19.6% have perfectly clean raw URLs.
For welcome emails specifically, 66% have zero usable CTA links. Two thirds. For an agent that just signed up for a service and gets a welcome email, there is literally nothing actionable in the email body. Every "Get Started" button goes to click.whatever.com/ls/click?upn=u001.aKJF8sldjf... and the destination is unknowable without following the redirect.
I catalogued 11 distinct tracking systems across the dataset. They all look different but produce the same result: a URL that tells you nothing. Salesforce Marketing Cloud, customer.io, HubSpot, Braze, Beehiiv, Eloqua, AWS SES awstrack, Google's own tracker, Microsoft, vialoops, Stripe. From an agent's perspective they're all equally opaque.
The worst offender for sheer URL ugliness was Microsoft's Bing Webmaster Tools: mucp.api.account.microsoft.com/m/v2/c?r=<UPPERCASE-BASE32>. But the most consistently bad was customer.io (used by Buffer, daily.dev, Uphold), which wraps every link in a JWT-encoded redirect on every email type.
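One way an agent-side parser might flag these hosts is a crude hostname heuristic. The hints below are illustrative guesses inspired by the trackers named above, not an exhaustive or verified catalogue:

```python
from urllib.parse import urlparse

# Illustrative hostname fragments that tend to indicate an opaque
# tracking redirect. This list is an assumption, not a real catalogue.
TRACKER_HINTS = (
    "click.", "links.", "awstrack.me", "customeriomail.com",
    "braze.com", "vialoops.com",
)

def looks_like_tracker(url: str) -> bool:
    """Return True if the URL's host matches a known tracker pattern."""
    host = urlparse(url).netloc.lower()
    return any(hint in host for hint in TRACKER_HINTS)

looks_like_tracker("https://click.example.com/ls/click?upn=u001.aKJF")  # True
looks_like_tracker("https://signup.mailgun.com/activate/deadbeef")      # False
```

A real implementation would need to follow redirects to confirm the destination; hostname matching only tells you the link is opaque, not where it goes.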
The marketing team doesn't talk to the product team
One of the weirdest patterns in the data: the same company can produce wildly different email quality depending on which template they use.
Mailgun: welcome email scores 3.2/10 on agent metrics (every link through their own Mailjet tracker). Verify-link email scores 7.7/10 (raw signup.mailgun.com/activate/<hex> URL). That's a swing of 4.5 points out of 10 across templates from the same sender. And Mailgun is an email infrastructure company.
ngrok: both welcome emails score 3.6/10 (all links through HubSpot tracker). Their verify-link? Scores 8.6/10. Pure plain text, 3 lines total, raw URL. Swing: 5 points.
Loops: welcome routes everything through c.vialoops.com (their own tracker, ironic since they sell email delivery). Their DNS notification email? All records in plain text, raw links, scores 8.2/10. Swing: 4.6 points.
Polar, Docker, and others follow the same pattern. Transactional emails come from engineering. Welcome emails come from marketing. Different tools, different templates, different philosophies.
The best predictor of email quality isn't the company or the industry. It's which team within the company owns the template.
The gaming industry tells the whole story
I had gaming emails in the dataset and they split perfectly into two groups with nothing in between.
Indie studios and smaller gaming sites (itch.io, Larian Studios, Raider.IO, etc): agent scores of 7.7 to 8.6/10. All raw URLs. Zero tracking. Raider.IO's verify URL has the username right in it: raider.io/verify?user=<username>&token=validation3ad9de269693489d. That validation prefix in the token is a small touch but it tells you what the URL does just by reading it.
AAA studios and big gaming platforms (HoYoverse, Bethesda, Discord, Riot Games, etc): agent scores of 2.7 to 3.6/10. Everything through AWS SES awstrack, Braze, Salesforce, Eloqua. Every link opaque. Image-heavy marketing.
The dividing line isn't the content or the email type. It's whether the company has a marketing department with access to an ESP.
The invisible character zoo
This one gets technical but it matters. Email senders inject zero-width Unicode characters to control how mail clients render the preview text. In a normal email client you never see them. When an agent reads the raw text through the CLI, it gets hundreds of invisible characters mixed into the content.
I found 7 distinct character types used across the dataset, including compound sequences of 4-5 different invisible characters repeated dozens of times. Trading 212's verify email has over a hundred U+200C characters before the actual message starts. MoonPay chains together U+034F, U+200C, and U+FEFF in repeating sequences.
It's not malicious, it's just how ESPs handle preheader text. But it means an agent parsing email output has to strip a zoo of invisible Unicode before it can even find the verification code. 30.7% of emails had moderate pollution. 11.7% were severely polluted.
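Stripping that zoo is mechanical once you know the characters. A minimal sketch, covering the code points described above plus a few common relatives (the exact character set an agent needs is an assumption):

```python
import re

# Zero-width / invisible characters commonly injected for preheader
# control: ZWSP, ZWNJ, ZWJ, word joiner, soft hyphen, combining
# grapheme joiner (U+034F), and BOM (U+FEFF). Set is illustrative.
INVISIBLE = re.compile("[\u200b\u200c\u200d\u2060\u00ad\u034f\ufeff]+")

def strip_invisible(text: str) -> str:
    """Remove invisible Unicode before parsing email body text."""
    return INVISIBLE.sub("", text)

raw = "\u200c" * 100 + "Your verification code is 481 902"
strip_invisible(raw)  # "Your verification code is 481 902"
```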
What good actually looks like
26 of the 163 emails (16%) were double pristine: clean URLs and clean body text. The standouts by category:
Verify-link: Pulsetic's URL is the one I keep coming back to: app.pulsetic.com/email_verify/?email=<email>&hash=<uuid4>. Named parameters, the intent readable in the URL itself, UUID4 as the token. ngrok's verify is even more minimal: 3 lines of plain text, no HTML at all.
Verify-code: GitLab is the template to copy. 6 emails across every type, consistently at the top. Code in body, expiry stated in plain text, raw gitlab.com URLs for everything. Zero tracking on any email they send.

On a related note, every email that put the verification code in the subject line scored a perfect 5/5 on subject quality. Only four senders in the entire dataset did this: Canva, Slack, LinkedIn, and Gravatar. This also happens to be exactly what iOS and Android need to surface that "copy code" button on the lock screen notification: both platforms use heuristics to detect OTP codes in message content, and having the code right in the subject makes it trivially detectable. Good for humans tapping their phone, good for agents scanning their inbox.
Welcome: the rare ones that work for both sides tend to be small companies that link directly to their product. Paymo, Pulsetic, IndieHunt, EarlyHunt, AITopTools. No elaborate onboarding funnels, so no tracking on every CTA.
Emails that explicitly stated expiry ("this code expires in 60 minutes") scored 7.4/10 on agent metrics versus 5.7/10 for those that didn't. It's a small detail, but teams that think to add it tend to care about the other stuff too.
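The kind of subject-line OTP heuristic described above is easy to approximate. This sketch is my own assumption of how such a detector might look, not the actual iOS/Android logic:

```python
import re

# Illustrative heuristic: a 4-8 digit run counts as an OTP only when
# the subject also contains a code-related keyword. Keyword list and
# digit pattern are assumptions, not a platform's real implementation.
OTP_RE = re.compile(r"\b(\d{4,8})\b")
KEYWORDS = ("code", "otp", "verification", "passcode")

def otp_from_subject(subject: str):
    """Return the OTP found in a subject line, or None."""
    if any(k in subject.lower() for k in KEYWORDS):
        m = OTP_RE.search(subject)
        return m.group(1) if m else None
    return None

otp_from_subject("483920 is your Slack confirmation code")  # "483920"
otp_from_subject("Welcome to Slack!")                       # None
```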
The irony hall of fame
Mailgun (email infrastructure company): welcome email agent score 3.2/10. Uses their own Mailjet tracker on all links.
Loops (email platform for SaaS): welcome email agent score 3.6/10. Uses their own vialoops tracker.
Anthropic (AI company): Claude Code welcome email agent score 4.1/10. The email for their AI coding tool can't be read by an AI agent. The onboarding steps are actually great (/init, git commands, all in plain text) but every URL is opaque and the social links resolve to anchor-only references that go nowhere.
Buffer (5 emails, all consistently worst-in-class): the only sender in the dataset where the best email they sent still scored below average. Multiple emails at 2.7/10.
deviantART: tracked every single element in the email with individual utm_term values. The greeting text was wrapped in its own tracked URL with utm_term=greeting. Even the paragraph between the greeting and the CTA had its own tracker: utm_term=ph1.
What now
16% of the emails in this dataset were fully clean for both humans and agents. The other 84% have room to improve. Some of them have a lot of room.
If you're building agents that need to receive email, that's what Broodnet does. Each agent gets its own address, checks its own inbox through the CLI, and acts on what it finds. We solved the infrastructure problem. The email design problem... well, that's on the senders.
Broodnet gives AI agents their own email addresses. CLI-native, built for agent-to-owner communication. Free tier available.