DEV Community

Nate S

Building Confidence Scoring for Email Open Tracking (Engineering Notes)

Most email open tracking in 2026 is broken. Apple Mail Privacy
Protection (MPP) fires a fake open within minutes of delivery, before
any human has seen the email. Corporate scanners do the same. As a
result, open rates run 2-3x inflated.

The engineering problem: how do you tell a real human open from a
machine pre-fetch given only the HTTP request metadata of the pixel
load?

Signals available

Every open event arrives at the tracking endpoint with:

  • Request IP
  • User-Agent string
  • Request timestamp (relative to email send)
  • Accept-Language, Referer, and other headers
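As a minimal sketch of how those fields might be captured and turned into model inputs, the example below assumes a hypothetical `OpenEvent` record and `extract_features` helper (neither comes from the post). The timestamp becomes a seconds-since-send delay and the headers are reduced to presence flags:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class OpenEvent:
    """One pixel load as seen by the tracking endpoint (hypothetical schema)."""
    ip: str
    user_agent: str
    sent_at: datetime       # when the email was sent
    requested_at: datetime  # when the pixel was fetched
    headers: dict[str, str] = field(default_factory=dict)


def extract_features(event: OpenEvent) -> dict:
    """Reduce raw request metadata to simple model inputs."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    headers = {name.lower(): value for name, value in event.headers.items()}
    return {
        "ip": event.ip,
        "user_agent": event.user_agent,
        "seconds_since_send": (event.requested_at - event.sent_at).total_seconds(),
        "has_accept_language": "accept-language" in headers,
        "has_referer": "referer" in headers,
    }


if __name__ == "__main__":
    sent = datetime(2026, 1, 5, 9, 0, 0, tzinfo=timezone.utc)
    event = OpenEvent(
        ip="203.0.113.7",
        user_agent="Mozilla/5.0",
        sent_at=sent,
        requested_at=datetime(2026, 1, 5, 9, 0, 45, tzinfo=timezone.utc),
        headers={"Accept-Language": "en-US"},
    )
    print(extract_features(event))
```

The IP and User-Agent are passed through raw here. Range lookups and signature matching would turn them into numeric features before they reach a classifier.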

Patterns by source

Apple MPP pre-fetches:

  • IP from Apple-attributable ranges (mostly 17.0.0.0/8)
  • User-Agent: native Mac/iOS mail client string in Apple's tracking-relay format
  • Timing: typically 30 seconds to 5 minutes after delivery
  • No subsequent click activity
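The MPP pattern can be approximated with a rule-of-thumb check. This sketch assumes 17.0.0.0/8 is the only Apple range of interest and measures timing from send rather than from delivery. `looks_like_mpp` is a hypothetical helper, not the production classifier:

```python
import ipaddress

# Assumption: only Apple's 17.0.0.0/8 block is listed. Per the post, Apple
# shifts its MPP allocation, so a real list would need regular refreshing.
APPLE_RANGES = [ipaddress.ip_network("17.0.0.0/8")]


def in_ranges(ip: str, ranges) -> bool:
    """True if `ip` falls inside any network in `ranges` (IPv4 or IPv6)."""
    addr = ipaddress.ip_address(ip)
    # Membership across IP versions simply returns False.
    return any(addr in net for net in ranges)


def looks_like_mpp(ip: str, seconds_since_send: float, clicked: bool) -> bool:
    """Hypothetical rule mirroring the MPP bullets: Apple IP, a
    30 s to 5 min delay, and no follow-up click."""
    return (
        in_ranges(ip, APPLE_RANGES)
        and 30 <= seconds_since_send <= 300
        and not clicked
    )


print(looks_like_mpp("17.58.0.10", 90, clicked=False))  # True
```

A hard rule like this is brittle, which is one reason to feed the same signals into a learned model instead.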

Corporate scanner pre-fetches (Defender, Mimecast, Proofpoint):

  • IP from known scanner ranges (each vendor either publishes them, or they can be discovered via reverse DNS lookup)
  • User-Agent: scanner-specific signatures
  • Timing: under 5 seconds from delivery
  • Multiple link and image requests within a 1-3 second window from the same IP
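The burst pattern in the last bullet can be detected with a sliding window over each IP's request timestamps. The 3-second window and the threshold of three requests are assumptions chosen to match "multiple requests within 1-3 seconds". `find_scanner_bursts` is hypothetical:

```python
from collections import defaultdict


def find_scanner_bursts(requests, window_s=3.0, min_requests=3):
    """Return the set of IPs that made at least `min_requests` requests
    (pixel or link) within any `window_s`-second span.

    `requests` is an iterable of (ip, timestamp_in_seconds) pairs.
    """
    by_ip = defaultdict(list)
    for ip, ts in requests:
        by_ip[ip].append(ts)

    flagged = set()
    for ip, times in by_ip.items():
        times.sort()
        start = 0
        # Two-pointer sliding window over this IP's sorted timestamps.
        for end in range(len(times)):
            while times[end] - times[start] > window_s:
                start += 1
            if end - start + 1 >= min_requests:
                flagged.add(ip)
                break
    return flagged


sample_requests = [
    ("198.51.100.9", 0.0), ("198.51.100.9", 0.8), ("198.51.100.9", 1.9),
    ("203.0.113.7", 0.0), ("203.0.113.7", 40.0), ("203.0.113.7", 95.0),
]
print(find_scanner_bursts(sample_requests))  # {'198.51.100.9'}
```

Sorting per IP keeps the check linear after the sort, so it can run over a day's worth of request logs without a database query per event.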

Gmail image proxy:

  • IP from the googleusercontent.com range
  • User-Agent: Google bot signature
  • Timing: variable
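Both the scanner ranges and the Gmail proxy range can be checked through reverse DNS. A minimal sketch, assuming a hand-maintained suffix table (`PROXY_SUFFIXES` and both helpers are hypothetical) and using the standard library's `socket.gethostbyaddr` for the PTR lookup:

```python
import socket
from typing import Optional

# Hypothetical suffix table. Extend it with scanner vendors' PTR domains as
# they are discovered.
PROXY_SUFFIXES = {
    "googleusercontent.com": "gmail_proxy",
}


def reverse_lookup(ip: str) -> Optional[str]:
    """Return the PTR hostname for `ip`, or None if there is no record."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None


def classify_hostname(hostname: Optional[str]) -> Optional[str]:
    """Map a reverse-DNS hostname to a known proxy label, if any."""
    if not hostname:
        return None
    host = hostname.lower().rstrip(".")
    for suffix, label in PROXY_SUFFIXES.items():
        # Match the exact domain or a true subdomain, not a lookalike
        # such as "evilgoogleusercontent.com".
        if host == suffix or host.endswith("." + suffix):
            return label
    return None


print(classify_hostname("proxy-1.googleusercontent.com"))  # gmail_proxy
```

A PTR record is controlled by whoever owns the IP, so a stricter version would also forward-resolve the hostname and confirm it maps back to the same address. Reverse lookups are slow, so results are worth caching per IP.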

Real human opens:

  • IP from residential or generic corporate range
  • User-Agent: actual mail client used by a human
  • Timing: rarely under 30 seconds from delivery; clusters around typical inbox-check times
  • Often followed by click activity within 30 minutes
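The timing and click signals for human opens translate directly into two boolean features. This sketch hard-codes the 30-second and 30-minute thresholds from the list above and, as in the signal list, measures timing from send. `human_signals` is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone

MIN_HUMAN_DELAY = timedelta(seconds=30)  # opens faster than this are rarely human
CLICK_WINDOW = timedelta(minutes=30)     # a click this soon after open is a strong human cue


def human_signals(sent_at: datetime, opened_at: datetime, click_times) -> dict:
    """Hypothetical helper: two boolean features from the human-open bullets."""
    followed_by_click = any(
        timedelta(0) <= click - opened_at <= CLICK_WINDOW for click in click_times
    )
    return {
        "plausible_delay": opened_at - sent_at >= MIN_HUMAN_DELAY,
        "followed_by_click": followed_by_click,
    }


sent = datetime(2026, 1, 5, 9, 0, tzinfo=timezone.utc)
opened = sent + timedelta(minutes=20)
print(human_signals(sent, opened, [opened + timedelta(minutes=12)]))
# {'plausible_delay': True, 'followed_by_click': True}
```

Because the click signal arrives after the open, a score that uses it has to be recomputed once the 30-minute window closes.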

Model approach

A gradient-boosted classifier trained on the feature set above reaches
95-98% agreement with human-rated labels on a held-out test set. Its
output is a confidence score (0-100%) that maps to Tiers 1-5, where
Tier 1 means most likely human and Tiers 4-5 mean most likely machine.

  • False-positive rate (a machine open graded Tier 1): <2%
  • False-negative rate (a human open graded Tier 4-5): <5%
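The post does not say where the tier boundaries sit or which denominators the error rates use. Assuming five equal 20-point bands and the usual definitions (false positives over all machine opens, false negatives over all human opens), tier assignment and both error rates can be sketched as follows. `score_to_tier` and `error_rates` are hypothetical:

```python
def score_to_tier(score: float) -> int:
    """Map a 0-100 confidence score to Tier 1 (likely human) through
    Tier 5 (likely machine). The 20-point bands are an assumption."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for tier, floor in enumerate((80, 60, 40, 20), start=1):
        if score >= floor:
            return tier
    return 5


def error_rates(tiers, is_human):
    """Return (false_positive_rate, false_negative_rate).

    FP: machine opens graded Tier 1, divided by all machine opens.
    FN: human opens graded Tier 4-5, divided by all human opens.
    """
    machine = [t for t, human in zip(tiers, is_human) if not human]
    humans = [t for t, human in zip(tiers, is_human) if human]
    fpr = sum(t == 1 for t in machine) / len(machine) if machine else 0.0
    fnr = sum(t >= 4 for t in humans) / len(humans) if humans else 0.0
    return fpr, fnr


scores = [95, 85, 50, 10, 5, 90]
labels = [True, True, True, False, False, False]  # human-rated ground truth
tiers = [score_to_tier(s) for s in scores]
print(tiers)                       # [1, 1, 3, 5, 5, 1]
print(error_rates(tiers, labels))  # (0.3333333333333333, 0.0)
```

Under these definitions Tiers 2-3 count as neither error, so the two rates describe only the confident ends of the scale.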

The productionized model and its dashboard are available at:
https://outsolvi.com/features/confidence-scoring

The retraining problem

Apple keeps shifting MPP's IP block allocation. The model needs
retraining every few months as patterns drift. We've automated the
labeled-data collection so retraining is a 1-day job rather than a
1-week job.
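The post does not describe how drift is detected. One common, generic option (not necessarily what Outsolvi does) is the Population Stability Index (PSI), which compares a feature's distribution at training time against recent traffic. Applied to, say, seconds-since-send or the confidence scores themselves, a rising value is a cue to retrain. This sketch uses equal-width bins over the pooled range for simplicity:

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a baseline sample and a recent one.

    Bins are equal-width over the pooled range (a simplification; quantile
    bins from the baseline are also common). `eps` keeps empty bins from
    producing log(0).
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against all-identical values

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


baseline = [float(i) for i in range(100)]  # e.g. training-time feature values
shifted = [float(i + 50) for i in range(100)]
print(psi(baseline, baseline))       # 0.0
print(psi(baseline, shifted) > 0.2)  # True
```

A PSI above roughly 0.2 is a commonly quoted rule of thumb for a meaningful shift, though the right threshold depends on the feature and on how noisy weekly traffic is.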

Anyone else working on email signal filtering? Curious about your
approach to the drift problem specifically.

— Nate Summers
Co-Founder, Outsolvi
