- Me: gabrielanhaia
Sometime in the mid-1990s, Trey Harris, a sysadmin at a university, got one of those help desk requests that make you question the caller's grip on reality. The head of the statistics department was reporting that email couldn't be sent farther than 500 miles.
Not 500 megabytes. Not 500 recipients. Five hundred miles.
Trey's reaction was everyone's reaction: that's not how email works. It's TCP/IP, not a carrier pigeon. Distance doesn't enter into it. But the guy was a statistician, and he'd actually done the analysis. Emails to certain servers worked. Emails to others didn't. He'd plotted the failures geographically. The cutoff was roughly 500 miles in every direction.
Trey checked the logs. The statistician was right.
Testing the Impossible
Trey started testing systematically. He sent emails to servers at known locations and tracked which succeeded and which failed:
- Nearby campuses within ~300 miles: delivered
- Servers in Atlanta (~400 miles away): delivered
- Providence, Rhode Island (~600 miles): failed
- Memphis, Tennessee (~800 miles): failed
Some long-distance emails did go through, which muddied the picture at first. But those successes were all to large providers with geographically distributed infrastructure. The emails were landing at a nearby edge node, not traveling the full distance. Smaller institutions with a single server location followed the 500-mile rule almost perfectly.
The pattern was real. Emails had a geographic range limit. Which is insane.
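The bracketing step can be sketched in a few lines. This is only an illustration built from the approximate distances listed above; the `RESULTS` table and the `bracket_cutoff` helper are hypothetical names, not anything from Trey's actual session.

```python
# Test results from the list above: (destination, approx. miles, delivered?)
RESULTS = [
    ("nearby campuses", 300, True),
    ("Atlanta", 400, True),
    ("Providence", 600, False),
    ("Memphis", 800, False),
]


def bracket_cutoff(results):
    """Return (farthest successful distance, nearest failed distance) in miles."""
    farthest_ok = max(miles for _, miles, ok in results if ok)
    nearest_fail = min(miles for _, miles, ok in results if not ok)
    return farthest_ok, nearest_fail


low, high = bracket_cutoff(RESULTS)
print(f"Cutoff lies between ~{low} and ~{high} miles")
```

Every extra test destination between the two bounds narrows the bracket, which is how a vague complaint turns into a number you can reason about.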
What Changed?
This is the single most valuable debugging question you can ask, and Trey asked it at exactly the right moment.
The mail server had been recently upgraded. A colleague had updated SunOS and reinstalled Sendmail while Trey was on vacation. Everything seemed to be working fine after the upgrade, so nobody flagged it for review. The "can't email past 500 miles" problem only showed up once someone actually tried to reach a smaller, more distant server.
Trey pulled up the Sendmail configuration. Most of it was correct. But one value was wrong: the timeout for remote SMTP connections was set to zero.
# sendmail.cf (the problematic config)
# Timeout for initial connection to remote server
O Timeout.connect=0
Zero Doesn't Mean What You Think
Sendmail is an old, forgiving, occasionally baffling piece of software. A timeout of 0 didn't mean "no timeout" or "wait forever." It meant "don't wait at all." Connect, and if the response isn't essentially instantaneous, give up.
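The same ambiguity shows up in modern APIs. As a loosely related illustration (this is Python's standard `socket` module, not Sendmail's internals), a timeout of `0` there also means "never wait", while "wait forever" is spelled `None`:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    sock.settimeout(0)      # zero: non-blocking mode, operations never wait
    zero_is_blocking = sock.getblocking()

    sock.settimeout(None)   # None: blocking mode, operations wait indefinitely
    none_is_blocking = sock.getblocking()
finally:
    sock.close()

print(zero_is_blocking, none_is_blocking)  # False True
```

No connection is opened here; the snippet only inspects how the socket interprets the two values.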
But "instantaneous" isn't actually zero in a computer. The operating system has a minimum time resolution. On this system, the OS clock ticked in intervals of a few milliseconds. So the effective timeout wasn't zero; it was the smallest unit of time the OS could measure, which was roughly 3 milliseconds.
If the remote mail server could complete the TCP handshake (SYN → SYN-ACK → ACK) within about 3 milliseconds, the email would send successfully. If the handshake took longer than that, Sendmail gave up and reported a connection failure.
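A rough way to picture this is a toy model in which the timeout is rounded up to whole timer ticks, with a minimum of one tick. The rounding rule, the function names, and the 3 ms tick are assumptions for illustration, not a description of the actual SunOS or Sendmail code.

```python
import math


def effective_timeout_ms(configured_ms: float, tick_ms: float) -> float:
    """Round a configured timeout up to whole ticks, never below one tick (assumed model)."""
    ticks = max(1, math.ceil(configured_ms / tick_ms))
    return ticks * tick_ms


def connect_succeeds(rtt_ms: float, configured_ms: float = 0, tick_ms: float = 3) -> bool:
    """True if the handshake round trip fits inside the effective timeout."""
    return rtt_ms <= effective_timeout_ms(configured_ms, tick_ms)


print(effective_timeout_ms(0, 3))                 # 3
print(connect_succeeds(rtt_ms=2.5))               # True: nearby server answers in time
print(connect_succeeds(rtt_ms=9.0))               # False: distant server is too slow
print(connect_succeeds(9.0, configured_ms=300_000))  # True with a five-minute timeout
```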
What determines how fast a server can respond to a TCP SYN? Mostly physics.
The Speed of Light Enters the Chat
A light pulse in a fiber-optic cable travels at roughly two-thirds the speed of light in a vacuum. The speed of light in a vacuum is about 186,000 miles per second; in fiber, that works out to approximately 124,000 miles per second.
A TCP handshake is a round trip: your server sends a SYN packet to the remote server, the remote server sends back a SYN-ACK. The signal has to travel there and back. So the question becomes: how far can a signal travel round-trip in about 3 milliseconds?
# the_math.py
SPEED_OF_LIGHT_VACUUM = 186_282  # miles per second
FIBER_OPTIC_FACTOR = 0.67  # signal speed in fiber vs vacuum
SPEED_IN_FIBER = SPEED_OF_LIGHT_VACUUM * FIBER_OPTIC_FACTOR  # ~124,809 mi/s

# Max one-way distance at various timeouts
for timeout_ms in [3, 5, 8, 10]:
    timeout_s = timeout_ms / 1000
    # Round trip, so divide by 2 for one-way distance
    max_one_way = (SPEED_IN_FIBER * timeout_s) / 2
    print(f"Timeout {timeout_ms}ms -> max one-way distance: ~{max_one_way:.0f} miles")

# Output:
# Timeout 3ms -> max one-way distance: ~187 miles
# Timeout 5ms -> max one-way distance: ~312 miles
# Timeout 8ms -> max one-way distance: ~499 miles
# Timeout 10ms -> max one-way distance: ~624 miles
At a 3ms timeout with signals moving at fiber speed, the theoretical one-way limit is around 187 miles. Trey observed a limit closer to 500 miles, which under this model implies an effective timeout of about 8ms. Several real-world factors shift the number in one direction or the other:
- Router hops add latency, which shrinks the reachable distance, although nearby servers at the time often had fairly direct routes
- Server processing time for SYN-ACK generation varies by hardware and load, and it eats into the same time budget
- The OS timer resolution on that specific SunOS installation may have been coarser than 3ms (8-10ms, say), which would extend the range
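The calculation can also be run in reverse: given an observed cutoff distance, what effective timeout does it imply? The sketch below reuses the same fiber-speed assumption; `implied_timeout_ms` and its `overhead_ms` parameter (a stand-in for router and server delay) are hypothetical names introduced for illustration.

```python
SPEED_OF_LIGHT_VACUUM = 186_282  # miles per second
FIBER_OPTIC_FACTOR = 0.67        # signal speed in fiber vs vacuum (assumed)
SPEED_IN_FIBER = SPEED_OF_LIGHT_VACUUM * FIBER_OPTIC_FACTOR


def implied_timeout_ms(one_way_miles: float, overhead_ms: float = 0.0) -> float:
    """Timeout needed for a round trip of the given one-way distance, plus fixed overhead."""
    round_trip_s = 2 * one_way_miles / SPEED_IN_FIBER
    return round_trip_s * 1000 + overhead_ms


print(f"{implied_timeout_ms(500):.2f} ms")                   # 8.01 ms
print(f"{implied_timeout_ms(500, overhead_ms=1.0):.2f} ms")  # 9.01 ms
```

Any fixed overhead adds directly to the required timeout, which is why the observed cutoff cannot pin down the timer resolution exactly.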
The precise number doesn't matter. What matters is that a near-zero timeout, combined with the finite speed of light in fiber, created a physical distance limit on email delivery. Physics was enforcing a software misconfiguration.
A config file turned email into a distance-limited protocol. That's beautiful and horrifying at the same time.
The Fix
# sendmail.cf (fixed)
O Timeout.connect=5m
One value changed from 0 to 5m (five minutes). Emails started reaching every server on the planet again.
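A check like this is easy to automate. The following is a minimal sketch of a linter that flags zero-valued `Timeout.*` options in config text shaped like the snippets above; the regular expression is an illustrative assumption, not a full sendmail.cf grammar, and `zero_timeouts` is a hypothetical helper.

```python
import re

# Matches lines such as "O Timeout.connect=0" (illustrative pattern only)
TIMEOUT_LINE = re.compile(r"^O\s+Timeout\.(\w+)\s*=\s*(\S+)")


def zero_timeouts(cf_text: str) -> list[str]:
    """Return the names of Timeout options whose value is zero."""
    flagged = []
    for line in cf_text.splitlines():
        match = TIMEOUT_LINE.match(line.strip())
        # Treat "0", "00", "0s", "0m" and similar as zero
        if match and re.fullmatch(r"0+[smhdw]?", match.group(2)):
            flagged.append(match.group(1))
    return flagged


broken = "# sendmail.cf\nO Timeout.connect=0\n"
fixed = "# sendmail.cf\nO Timeout.connect=5m\n"
print(zero_timeouts(broken))  # ['connect']
print(zero_timeouts(fixed))   # []
```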
Time to fix once identified: about 30 seconds. Time to identify: several hours of systematic debugging, geographic mapping, and an inspired leap connecting network latency to the speed of light.
Why This Story Sticks
This happened over twenty years ago on a mail server running SunOS. The specific technology is ancient. The debugging lessons are permanent.
Take "impossible" reports literally. "Email can't go past 500 miles" sounds like something you'd close the ticket on. It was precisely accurate. Users describe symptoms, not causes. The symptom was correct. Dismissing it would have left the bug in place indefinitely.
Ask "what changed?" before anything else. The server upgrade was the root cause. If Trey had started by analyzing Sendmail's routing logic or debugging DNS, he could've burned weeks going nowhere. "What changed recently?" cuts through noise faster than any other question in debugging.
Audit defaults after every upgrade. The new Sendmail installation came with a timeout of zero. Nobody checked. This still happens constantly with modern tools. A Kubernetes Helm chart with default resource limits that are wrong for your workload. A database connection pool with a default max size of 10. A CDN with a default cache TTL that caches your API responses. After every upgrade, compare the new defaults against the old ones.
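One lightweight way to do that comparison is to dump the effective settings before and after the upgrade and diff them. The sketch below assumes the settings are already available as plain dictionaries; the keys and values are made up for illustration and `changed_settings` is a hypothetical helper.

```python
def changed_settings(old: dict, new: dict) -> dict:
    """Return {key: (old_value, new_value)} for every setting that differs.

    A key missing on one side is reported with None on that side.
    """
    changes = {}
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


# Hypothetical before/after snapshots
old_cfg = {"Timeout.connect": "5m", "pool.max_size": 20}
new_cfg = {"Timeout.connect": "0", "pool.max_size": 20, "cache.ttl": "1h"}

for key, (before, after) in changed_settings(old_cfg, new_cfg).items():
    print(f"{key}: {before!r} -> {after!r}")
```

Unchanged settings drop out of the report, so the review only covers what the upgrade actually touched.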
Every "impossible" bug is rational. Software doesn't violate physics. When something seems impossible, the gap is in your mental model, not in reality. The emails didn't actually have a distance limit. There was a timeout that correlated with distance because of the speed of light. The bug was perfectly rational. Finding the explanation required crossing a mental boundary between "software configuration" and "physics" that most people wouldn't think to cross.
That jump is what makes Trey Harris's debugging legendary. Not the punchline. The process.
Original source: Trey Harris first told this story on a mailing list around 2002. The most commonly referenced version is archived at ibiblio.org/harris/500milemail.html.