DEV Community

Yasir Ansari


Railway URL Timeouts: Why a Healthy Server Can Still Be Unreachable

My deployed backend on Railway kept timing out even though the server, logs, and health checks looked perfect. The culprit wasn't my code, my port configuration, or my deployment—it was my mobile hotspot's DNS resolver caching a stale IP address. This post explains what actually happened, why switching to Cloudflare DNS (1.1.1.1) fixed it instantly, and how DNS resolution can silently break modern cloud deployments.

The Situation

I had just deployed my backend to Railway at
https://0x*******-production.up.railway.app.

Everything worked perfectly locally:

  • curl localhost:8080 → ✅ OK
  • Server logs showed it running smoothly
  • Database connected successfully
  • Health check routes responded

But when I tried accessing the public URL, all I got was ERR_CONNECTION_TIMED_OUT.

My immediate thought: the server must be crashing in production.

The Wild Goose Chase

Like any developer facing a production timeout, I went through the standard checklist:

  • ✅ Verified port configuration (bound the server to 0.0.0.0)
  • ✅ Checked SSL certificates
  • ✅ Reviewed CORS settings
  • ✅ Redeployed multiple times
  • ✅ Checked firewall rules

Nothing changed. The timeout persisted.

Then some random article I read suggested something seemingly unrelated:

"Try switching to Cloudflare DNS (1.1.1.1)"

I was skeptical, but I made the change. And instantly—the site opened. That single DNS change revealed the real problem: my application had been working perfectly the entire time.


Understanding DNS: The Internet's Phone Book

When you visit a URL like https://railway.app, your computer doesn't inherently know where that server lives. Here's what actually happens:

  1. Your browser asks a DNS resolver: "What's the IP address for this domain?"
  2. DNS responds with an IP: 0x*******-production.up.railway.app → 104.26.xx.xx
  3. Your browser connects to that IP address
  4. The server responds
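The steps above can be sketched with Python's standard `socket` module, which asks the system's configured resolver exactly the way a browser does (the hostname here is just `localhost` as a stand-in, since the real deployment URL is redacted above):

```python
import socket

def resolve(hostname):
    """Ask the system's configured resolver for a hostname's addresses."""
    # getaddrinfo consults the hosts file and then the configured DNS
    # resolver -- the same lookup a browser performs before connecting.
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

# "localhost" is answered locally and maps to the loopback address.
print(resolve("localhost"))
```

Whatever address this returns is the machine your browser will actually talk to, regardless of where your backend really lives.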

The critical insight:
If DNS returns the wrong IP, your browser connects to the wrong machine. Your backend can be perfectly healthy and completely unreachable at the same time.


Why Modern Platforms Are Different

Traditional hosting uses fixed IP addresses. Deploy a server, get an IP, done.
Modern platforms like Railway, Vercel, and Cloudflare Pages work differently. They use Anycast CDN routing, which means:

  • The same domain resolves to different edge servers based on geographic location, load balancing, and server availability
  • IP addresses behind your domain change frequently
  • DNS records use extremely low TTL (Time To Live), often 60 seconds

This architecture enables global scale and resilience, but it requires DNS resolvers to respect TTL values and fetch fresh records constantly.

Good resolvers do this. Bad ones don't.
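The difference between a good and a bad resolver comes down to one check. Here is a toy cache in Python that models both behaviors (the domain and IP are made up for illustration):

```python
class DnsCache:
    """Toy resolver cache; honor_ttl=False models a misbehaving resolver."""
    def __init__(self, honor_ttl=True):
        self.honor_ttl = honor_ttl
        self.store = {}  # domain -> (ip, expires_at)

    def put(self, domain, ip, ttl, now):
        self.store[domain] = (ip, now + ttl)

    def get(self, domain, now):
        entry = self.store.get(domain)
        if entry is None:
            return None
        ip, expires_at = entry
        if self.honor_ttl and now >= expires_at:
            return None  # stale: force a fresh upstream lookup
        return ip  # a bad resolver keeps serving this past expiry

good = DnsCache(honor_ttl=True)
bad = DnsCache(honor_ttl=False)
for cache in (good, bad):
    cache.put("app.example", "104.26.0.1", ttl=60, now=0)

# Five minutes later the edge IP has moved; only the good cache notices.
print(good.get("app.example", now=300))  # stale entry evicted
print(bad.get("app.example", now=300))   # stale IP still served
```

When the platform retires that edge server, every client behind the bad cache keeps connecting to a machine that no longer hosts anything.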


The Hidden Problem: My Mobile Hotspot's DNS

Here's what was actually happening on my network:
Laptop → Phone Hotspot → Mobile Carrier DNS → Internet

My laptop queried the hotspot's DNS server (172.20.10.1), which forwarded requests to my mobile carrier's resolver.

The carrier's DNS resolver cached an old Railway edge server IP address.

So every request from my browser went to a server that no longer hosted my application. The result? Connection timeout (not "connection refused").

This is critically deceptive:

  • A crashed server typically returns: connection refused
  • A wrong IP address returns: timeout

One suggests server failure. The other suggests... nothing specific.
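You can observe this distinction programmatically. A minimal sketch in Python (port 1 here is an arbitrary port assumed to be closed on your machine):

```python
import socket

def probe(host, port, timeout=2.0):
    """Classify a TCP connect attempt the way a browser error page would."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        return "refused"   # something answered: machine is up, port closed
    except (socket.timeout, TimeoutError):
        return "timeout"   # nothing answered: wrong IP, firewall, black hole
    except OSError as exc:
        return f"error: {exc}"

# A closed port on a live machine typically answers at once with a RST,
# so this should report "refused" rather than hanging until the timeout.
print(probe("127.0.0.1", 1))
```

A "refused" means you reached a real machine; a "timeout" means your packets went somewhere that never answered, which is exactly what a stale DNS record produces.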


Why Cloudflare DNS (1.1.1.1) Fixed It

When I switched to Cloudflare's public DNS resolver, my network path changed:

Laptop → Cloudflare DNS (1.1.1.1) → Correct Railway Edge → Backend

Cloudflare's resolver:

  • Respects low TTL values (re-fetches records as soon as they expire, often within 60 seconds)
  • Returns the current, correct edge server location
  • Uses a globally distributed infrastructure for reliability

My backend had been working the entire time. I simply wasn't reaching it.


The Most Confusing Part

Here's what made this bug so difficult to diagnose:
Local testing worked perfectly:

curl localhost:8080 # ✅ 200 OK

Why? Because localhost doesn't use DNS at all. It goes directly to the loopback interface (127.0.0.1).
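You can confirm this in one line: resolving localhost is answered from the local hosts file, so a broken upstream resolver cannot affect it.

```python
import socket

# "localhost" never leaves the machine -- it is answered locally,
# not by the network's DNS resolver, and maps to the loopback interface.
print(socket.gethostbyname("localhost"))
```

That's why every local check can pass while the public URL stays dark.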

This created the worst possible debugging experience:

  • ✅ No error logs
  • ✅ Healthy server metrics
  • ✅ Working local environment
  • ❌ Completely unreachable production URL

Everything looked healthy while production appeared completely dead.

How to Recognize a DNS Resolver Issue

You're likely facing a DNS problem if you notice:

  • ✅ Deployed URL times out consistently
  • ✅ localhost works perfectly
  • ✅ Server logs show no errors
  • ✅ Works on mobile data but not Wi-Fi (or vice versa)
  • ✅ Works for colleagues but not you
  • ✅ Suddenly starts working hours later with no code changes
  • ✅ Switching DNS providers fixes it instantly ← smoking gun

That last point is the definitive test.
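That smoking-gun test can be run without changing any system settings. The sketch below (assuming the RFC 1035 wire format; example.com stands in for your deployment URL) builds a raw DNS query and sends it straight to a specific public resolver, bypassing whatever your hotspot hands out:

```python
import socket
import struct

def build_query(domain, qid=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: id, flags (RD=1), 1 question, 0 answer/authority/additional.
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is length-prefixed, terminated by a zero byte.
    qname = b"".join(bytes([len(p)]) + p.encode() for p in domain.split("."))
    return header + qname + b"\x00" + struct.pack(">HH", 1, 1)  # A, IN

def ask(resolver_ip, domain, timeout=3.0):
    """Send the query to one resolver over UDP, return the raw answer."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_query(domain), (resolver_ip, 53))
        return s.recvfrom(512)[0]

if __name__ == "__main__":
    # Ask two trustworthy resolvers directly and compare.
    for resolver in ("1.1.1.1", "8.8.8.8"):
        print(resolver, len(ask(resolver, "example.com")), "bytes")
```

If the answers from 1.1.1.1 and 8.8.8.8 agree with each other but differ from what your default resolver returns, the default resolver is serving stale records. (Parsing the answer section fully is out of scope here; a tool like dig or nslookup against 1.1.1.1 gives the same comparison.)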

The Permanent Solution

Instead of relying on your ISP, router, or hotspot DNS, use a reliable public resolver:

Primary DNS: 1.1.1.1
Secondary DNS: 1.0.0.1
(Cloudflare DNS)

Or alternatively:
Primary DNS: 8.8.8.8
Secondary DNS: 8.8.4.4
(Google DNS)

After changing your DNS settings:

  1. Flush your DNS cache
  2. Reconnect to your network
  3. Test your deployment

Your deployments should now open immediately and consistently.


Why This Matters for Developers

If you frequently work with:

  • Serverless backends (Railway, Vercel, Render)
  • Preview deployment URLs
  • Custom domain configurations
  • Edge-deployed applications

You're constantly creating fresh DNS records that need to propagate quickly.

Unreliable DNS resolvers will:

  • Cache incorrect IPs
  • Ignore low TTL values
  • Create inconsistent behavior across your team
  • Make you think your production system is unstable

The result is a dangerous false signal: you believe your application is broken when the problem is actually upstream networking.


The Broader Lesson

Modern web development has changed.
Debugging isn't just about code anymore.

Your application stack now spans multiple layers:
Code → Container → Platform → CDN → DNS → Resolver → Network

A failure in any of these layers can manifest as what appears to be an application failure.

This experience taught me one critical debugging rule:

If localhost works but production times out, suspect DNS before rewriting your backend.

Sometimes the server isn't down. You're just asking the wrong person for directions.


Final Thoughts

This "bug" cost me hours of debugging—rechecking ports, SSL certificates, firewall rules, and deployment configurations. The actual problem was completely invisible in my application logs.

The fix took 30 seconds: changing two DNS server addresses.

If you're deploying to modern cloud platforms and experiencing unexplained timeouts while your logs look perfect, check your DNS resolver first. It might save you from questioning your entire deployment strategy.

And if you're using a mobile hotspot for development?

Switch to 1.1.1.1 now. Your future self will thank you.

Have you encountered mysterious timeouts that turned out to be DNS issues? I'd love to hear your war stories in the comments.
