Building Confidence Scoring for Email Open Tracking (Engineering Notes)

#email #engineering #dataengineering #sales

Most email open tracking in 2026 is broken. Apple Mail Privacy
Protection fires a fake open within minutes of delivery, before any
human sees the email. Corporate scanners do the same, often within
seconds. Reported open rates end up inflated 2-3x.

The engineering problem: how do you tell a real human open from a
machine pre-fetch given only the HTTP request metadata of the pixel
load?

**Signals available**
Every open event arrives at the tracking endpoint with:

- Request IP
- User-Agent string
- Request timestamp (relative to email send)
- Accept-Language, Referer, and other headers
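
To make these inputs concrete, here is a minimal Python sketch of one open event and a feature extractor over it. The `OpenEvent` and `extract_features` names, the field layout, and the specific features are illustrative assumptions rather than a production schema; the only hard-coded range is the 17.0.0.0/8 block from the Apple pattern list.

```python
import ipaddress
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical record for one pixel request; field names are assumptions.
@dataclass
class OpenEvent:
    ip: str
    user_agent: str
    requested_at: datetime  # when the pixel was fetched
    sent_at: datetime       # when the email was sent
    headers: dict[str, str] = field(default_factory=dict)

# Apple-attributable range from the pattern list (not exhaustive).
APPLE_RANGE = ipaddress.ip_network("17.0.0.0/8")

def extract_features(event: OpenEvent) -> dict[str, float]:
    """Turn raw request metadata into numeric features for a classifier."""
    addr = ipaddress.ip_address(event.ip)
    delay = (event.requested_at - event.sent_at).total_seconds()
    # HTTP header names are case-insensitive, so compare lowercased names.
    names = {name.lower() for name in event.headers}
    return {
        "delay_seconds": delay,
        # IPv6 addresses never match an IPv4 network, so they test False.
        "ip_in_apple_range": float(addr in APPLE_RANGE),
        "has_accept_language": float("accept-language" in names),
        "has_referer": float("referer" in names),
        "ua_length": float(len(event.user_agent)),
    }
```

Keeping features numeric means the same dictionary can feed either hand-written rules or a model.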

**Patterns by source**

Apple MPP pre-fetches:

- IP from Apple-attributable ranges (mostly 17.0.0.0/8)
- User-Agent: native Mac/iOS client, in Apple's tracking-relay format
- Timing: typically 30 seconds to 5 minutes after delivery
- No subsequent click activity
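
The Apple MPP pattern can also be written as a hard rule, which can serve as a baseline or for seeding labels before a model exists. This is a sketch using the thresholds listed above; `looks_like_apple_mpp` is a hypothetical name, and the User-Agent signal is left out because its exact format isn't specified here.

```python
import ipaddress

# Ranges and thresholds from the pattern list; treat them as starting
# points, since Apple shifts its allocations over time.
APPLE_NETWORKS = [ipaddress.ip_network("17.0.0.0/8")]
MPP_MIN_DELAY_S = 30
MPP_MAX_DELAY_S = 5 * 60

def looks_like_apple_mpp(ip: str, delay_seconds: float, clicked_later: bool) -> bool:
    """Rule-of-thumb check for an Apple MPP pre-fetch.

    All three signals must agree: an Apple-attributable IP, a delay inside
    the typical 30 s to 5 min pre-fetch window, and no follow-up click.
    """
    addr = ipaddress.ip_address(ip)
    in_apple_range = any(addr in net for net in APPLE_NETWORKS)
    in_window = MPP_MIN_DELAY_S <= delay_seconds <= MPP_MAX_DELAY_S
    return in_apple_range and in_window and not clicked_later
```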

Corporate scanner pre-fetches (Defender, Mimecast, Proofpoint):

- IP from known scanner ranges (published by each vendor or discoverable via reverse DNS lookup)
- User-Agent: scanner-specific signatures
- Timing: under 5 seconds after delivery
- Multiple link and image requests from the same IP within a 1-3 second window
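
The burst signal (several requests from one IP inside a 1-3 second window) amounts to a sliding-window count per IP. A minimal sketch, assuming timestamps in seconds and a hypothetical default of 3 requests as the meaning of "multiple"; `burst_ips` is an illustrative name.

```python
from collections import defaultdict

def burst_ips(requests, window_s=3.0, min_requests=3):
    """Return IPs that made at least `min_requests` requests within any
    `window_s`-second span.

    `requests` is an iterable of (ip, timestamp_seconds) pairs covering the
    link and image fetches for a single message.
    """
    by_ip = defaultdict(list)
    for ip, ts in requests:
        by_ip[ip].append(ts)

    flagged = set()
    for ip, times in by_ip.items():
        times.sort()
        start = 0
        # Two-pointer sliding window over the sorted timestamps.
        for end in range(len(times)):
            while times[end] - times[start] > window_s:
                start += 1
            if end - start + 1 >= min_requests:
                flagged.add(ip)
                break
    return flagged
```

Sorting per IP keeps the window check linear after the sort, so the cost is dominated by the O(n log n) sort.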

Gmail image proxy:

- IP from the googleusercontent.com range
- User-Agent: Google bot signature
- Timing: variable
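
For the Gmail proxy, the User-Agent carries most of the signal because timing is variable. The sketch below assumes the token `GoogleImageProxy`, which Gmail's proxy commonly includes in its User-Agent, and uses a hypothetical `tag_gmail_proxy` helper that marks proxy traffic so downstream timing rules can skip it.

```python
# Assumption: Gmail's image proxy includes this token in its User-Agent
# (e.g. "... (via ggpht.com GoogleImageProxy)"). Verify against your own logs.
GMAIL_PROXY_TOKEN = "googleimageproxy"

def tag_gmail_proxy(user_agent: str, delay_seconds: float) -> dict:
    """Tag a request that came through Gmail's image proxy.

    Proxy timing is variable, so the delay is passed through as a model
    feature but flagged as unsuitable for hard timing rules.
    """
    via_proxy = GMAIL_PROXY_TOKEN in user_agent.lower()
    return {
        "via_gmail_proxy": via_proxy,
        "apply_timing_rules": not via_proxy,
        "delay_seconds": delay_seconds,
    }
```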

Real human opens:

- IP from a residential or generic corporate range
- User-Agent: the actual mail client a human is using
- Timing: rarely under 30 seconds after delivery; clusters at typical inbox-check times
- Often followed by click activity within 30 minutes
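
The two timing signals for human opens can be computed per event as follows. The 30-second and 30-minute thresholds come from the list above; `human_open_signals` is a hypothetical helper, and it assumes clicks have already been joined to the same recipient upstream.

```python
from datetime import datetime, timedelta

MIN_HUMAN_DELAY = timedelta(seconds=30)
CLICK_FOLLOW_UP = timedelta(minutes=30)

def human_open_signals(
    delivered_at: datetime, opened_at: datetime, click_times: list[datetime]
) -> dict[str, bool]:
    """Compute the human-open timing signals for one open event."""
    delay = opened_at - delivered_at
    # Only clicks after the open, and within the follow-up window, count.
    followed_by_click = any(
        timedelta(0) <= click - opened_at <= CLICK_FOLLOW_UP for click in click_times
    )
    return {
        "plausible_human_delay": delay >= MIN_HUMAN_DELAY,
        "click_within_30_min": followed_by_click,
    }
```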

**Model approach**
A gradient-boosted classifier trained on the feature set above agrees
with human-rated labels 95-98% of the time on a held-out test set. Its
output is a confidence score (0-100%) that maps to Tiers 1-5, where
Tier 1 means a likely human open and Tiers 4-5 mean a likely machine open.
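
The score-to-tier step might look like the sketch below. The equal 20-point bands are an assumption for illustration (the real cut points aren't stated), as is reading the score as confidence that the open is human. With a library such as scikit-learn, a score like this could come from `100 * clf.predict_proba(X)[:, 1]`, assuming the human class is labeled 1.

```python
def score_to_tier(human_confidence: float) -> int:
    """Map a 0-100 human-open confidence score to Tier 1 (human) .. Tier 5 (machine).

    Assumes five equal-width bands: 80-100 -> Tier 1, 60-80 -> Tier 2, ...,
    0-20 -> Tier 5. Real cut points would be tuned on validation data.
    """
    if not 0.0 <= human_confidence <= 100.0:
        raise ValueError("confidence must be between 0 and 100")
    # min() keeps a perfect 100 inside the top band instead of a sixth band.
    band = min(int(human_confidence // 20), 4)  # 0..4, higher = more human
    return 5 - band
```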

- False-positive rate (machine open graded Tier 1): <2%
- False-negative rate (human open graded Tier 4-5): <5%
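
Both error rates fall out of a labeled test set of (is_human, tier) pairs. In this sketch the denominators are all machine opens and all human opens respectively; that is the usual definition but an assumption about how the figures above were computed. `error_rates` is a hypothetical name.

```python
def error_rates(graded: list[tuple[bool, int]]) -> tuple[float, float]:
    """Compute (false_positive_rate, false_negative_rate).

    FP rate: share of machine opens graded Tier 1.
    FN rate: share of human opens graded Tier 4 or 5.
    Each rate is 0.0 when its class is absent from the test set.
    """
    machine_tiers = [tier for is_human, tier in graded if not is_human]
    human_tiers = [tier for is_human, tier in graded if is_human]
    fp = sum(t == 1 for t in machine_tiers) / len(machine_tiers) if machine_tiers else 0.0
    fn = sum(t >= 4 for t in human_tiers) / len(human_tiers) if human_tiers else 0.0
    return fp, fn
```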

The productionized model and its dashboard are at:
https://outsolvi.com/features/confidence-scoring

**The retraining problem**
Apple keeps shifting MPP's IP block allocation. The model needs
retraining every few months as patterns drift. We've automated the
labeled-data collection so retraining is a 1-day job rather than a
1-week job.
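
One way to notice this kind of drift before accuracy visibly drops is the population stability index (PSI) over the mix of source networks in recent opens. This is a swapped-in technique, not necessarily what the pipeline described here does; the /16 grouping, the `eps` floor, the 0.2 threshold (a commonly quoted rule of thumb), and the function names are all assumptions.

```python
import ipaddress
import math
from collections import Counter

def prefix_shares(ips, prefix_len=16):
    """Share of IPv4 requests falling in each /prefix_len network."""
    counts = Counter(
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False) for ip in ips
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {net: n / total for net, n in counts.items()}

def population_stability_index(reference, current, eps=1e-4):
    """PSI between two categorical share distributions.

    Categories missing on one side get a small floor `eps` so the log is defined.
    """
    psi = 0.0
    for key in set(reference) | set(current):
        r = max(reference.get(key, 0.0), eps)
        c = max(current.get(key, 0.0), eps)
        psi += (c - r) * math.log(c / r)
    return psi

def needs_retraining(reference_ips, recent_ips, threshold=0.2):
    """Flag drift when the PSI of the network mix exceeds the threshold."""
    psi = population_stability_index(
        prefix_shares(reference_ips), prefix_shares(recent_ips)
    )
    return psi > threshold
```

Because this needs no labels, it could run daily and decide when to kick off the labeled-data collection, rather than retraining on a fixed schedule.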

Anyone else working on email signal filtering? Curious about your
approach to the drift problem specifically.

— Nate Summers
Co-Founder, Outsolvi