137Foundry

Posted on Jun 4

Why Your File Upload Endpoint Times Out at 4GB

#webdev #javascript #backend #productivity

You shipped a file upload endpoint. The unit tests pass, the demo went well, the early adopters seem happy. Then a customer support ticket arrives: "trying to upload a 4GB video file, it fails after 5 minutes." You look at the logs. The request never reached your application server. Or it reached the application server, ran for a while, and then died with a cryptic timeout error.

This is one of the most common production bugs in web applications, and the cause is almost never where developers initially look. The fix is almost always architectural rather than a matter of tuning.

Photo by Brett Sayles on Pexels

The Timeout Layers You Did Not Realize Existed

A request to your upload endpoint passes through more layers than you probably remember. Each layer has its own timeout configuration, and the request has to fit within the tightest one.

Starting from the client side, the browser has a connection timeout. Different browsers handle this differently, but most close requests after some idle period if no progress is made. For active uploads, browsers usually do not enforce a strict total duration, but they do close stalled connections.

Next is whatever NAT or firewall sits between the user and the internet. Home routers, corporate firewalls, and ISP-level connection-tracking devices all keep state for active connections and may close them after a TCP idle timeout (often 60 seconds to several minutes).

The CDN is the next hop. Cloudflare, Fastly, and most other CDNs enforce default timeouts in roughly the 60 to 300 second range, and many also cap the request body size. Some plans allow longer, some do not. Depending on the provider, the limit applies either to the whole request or to how long the CDN waits on your origin, so time spent beyond the CDN's own servers still counts.

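The load balancer is next. AWS ALB, Google Cloud load balancers, and most managed load balancers enforce idle or backend timeouts that default to somewhere between tens of seconds and a few minutes (ALB's idle timeout, for example, defaults to 60 seconds). Some allow extending the value; some cap it or reserve longer limits for higher plan tiers.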

The reverse proxy in front of your application is next, if you have one. Nginx, HAProxy, Envoy, and Traefik all have configurable timeouts that default to something between 30 and 120 seconds.

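Finally, your application server has its own timeouts. Node.js, Python web frameworks, Java application servers, and Go HTTP servers all expose configurable read and write timeouts. The defaults vary widely: recent Node.js versions cap the time to receive a request at 300 seconds by default, while Go's net/http server applies no timeout unless you set one. Check what your runtime and framework actually configure.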

Any one of these can kill an upload request mid-stream. The error symptoms differ by which layer killed it, but the result is the same: the upload fails, the user is angry, and the logs do not always make clear what happened.
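To make the "tightest layer wins" point concrete, here is a minimal sketch that compares how long a transfer needs with the smallest timeout in the chain. The timeout values and the 100 Mbps uplink are made-up assumptions for illustration, and the model treats every timeout as a cap on total request duration, which is a simplification because some layers only enforce idle timeouts.

```typescript
// One entry per layer between the client and the application.
interface LayerTimeout {
  layer: string;
  timeoutSeconds: number;
}

// Seconds needed to send `bytes` at a sustained uplink of `megabitsPerSecond`.
function transferSeconds(bytes: number, megabitsPerSecond: number): number {
  return (bytes * 8) / (megabitsPerSecond * 1e6);
}

// The layer with the smallest timeout is the one the request has to fit within.
// On ties, the first layer in the list is returned.
function tightestLayer(layers: LayerTimeout[]): LayerTimeout {
  if (layers.length === 0) throw new Error("at least one layer is required");
  return layers.reduce((min, l) => (l.timeoutSeconds < min.timeoutSeconds ? l : min));
}

// Hypothetical values; read the real ones from your own configuration.
const assumedLayers: LayerTimeout[] = [
  { layer: "cdn", timeoutSeconds: 100 },
  { layer: "load balancer", timeoutSeconds: 60 },
  { layer: "reverse proxy", timeoutSeconds: 60 },
  { layer: "app server", timeoutSeconds: 300 },
];

const fourGigabytes = 4 * 1000 ** 3;
const secondsNeeded = transferSeconds(fourGigabytes, 100); // 320 seconds
const limit = tightestLayer(assumedLayers);

console.log(
  `needs ${secondsNeeded}s, tightest layer is "${limit.layer}" at ${limit.timeoutSeconds}s, ` +
    `fits: ${secondsNeeded <= limit.timeoutSeconds}`
);
```

Even on a fast uplink, a 4GB body needs more than five minutes in this model, which lines up with the "fails after 5 minutes" support ticket.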

Why Raising Timeouts Does Not Work

The first instinct when an upload times out is to raise the timeout values. This works occasionally. It does not work as a general solution because the upper bound on each timeout is constrained by something else.

CDNs and managed load balancers usually cap their maximum timeout regardless of what you configure. Even if your application server can handle a 30-minute request, the load balancer or CDN in front of it may not allow one.

Long-running requests consume connection pool slots. A web server that allows 1000 concurrent connections cannot sustain 1000 concurrent 30-minute uploads without running out of file descriptors or memory.

Long-running requests are also fragile. The longer the request, the higher the cumulative probability that something between the client and the server will drop the connection. A 30-minute upload has roughly 30 times the connection-drop risk of a 1-minute upload, all else being equal, as long as the per-minute chance of a drop is small.
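The "roughly 30 times" figure can be checked with a small calculation. This sketch assumes each minute carries the same independent drop probability, which is a simplification of real networks; the probabilities used are arbitrary examples.

```typescript
// Probability of at least one connection drop during `minutes`,
// assuming an independent drop probability `perMinute` for every minute.
function dropRisk(perMinute: number, minutes: number): number {
  return 1 - Math.pow(1 - perMinute, minutes);
}

// How many times riskier a long request is than a 1-minute request.
function riskRatio(perMinute: number, minutes: number): number {
  return dropRisk(perMinute, minutes) / dropRisk(perMinute, 1);
}

// On a stable link (0.1% per minute) the ratio stays close to 30.
console.log(riskRatio(0.001, 30));

// On a flaky link (5% per minute) the ratio is lower, but only because
// the 30-minute request is already more likely to fail than to succeed.
console.log(riskRatio(0.05, 30), dropRisk(0.05, 30));
```

The linear rule of thumb holds for reliable connections. On unreliable ones the long request simply fails most of the time, which makes the case for short requests even stronger.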

The architectural solution is not to make a single request long. It is to make many requests short. Chunked uploads split the file into many small requests, none of which runs long enough to hit a timeout. The Mozilla Developer Network documents the File API and Fetch API patterns that support this on the client side.

The Memory Bound on Both Sides

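A second problem with single-request uploads of large files is memory. A client that reads the file into memory before sending it (for example with FileReader, or to hash or encode it) holds the whole file at once, even though browsers can stream a File body from disk. The server holds the request body in memory, or streams it to disk, while receiving. Both sides have constraints.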

Browsers typically handle files up to a few hundred megabytes without significant slowdown. Past a gigabyte, performance degrades. On lower-end devices, the browser tab can become unresponsive or crash.

Servers can usually stream large request bodies to disk without holding them in memory, but this depends on the framework and its middleware. Some stacks buffer the entire request body in memory before invoking the application handler. For Node.js with Express, the raw request is a stream, but body-parsing middleware and in-memory upload storage (such as multer's memoryStorage) buffer it unless you choose a streaming or disk-backed option. For Python with Flask, the behavior depends on how the handler reads the body and on the WSGI server configuration.

Even with streaming, the disk space matters. A server that accepts uploads needs enough disk for the concurrent in-flight uploads plus the assembled files. A single 4GB upload is not a problem; 100 concurrent 4GB uploads is suddenly 400GB of in-flight data.

The Chunked Pattern Removes the Bound

Chunked uploads remove the upper bound on file size because no single request handles the whole file. Each chunk is typically 5MB to 10MB, which fits comfortably in:

The browser's memory budget (only one chunk in flight at a time).
The proxy's request body limit (well under any default).
The proxy's request timeout (a 5MB chunk uploads in seconds on typical connections).
The server's memory budget (one chunk per concurrent upload).

The total upload runs as long as it needs to, but no individual request comes close to any timeout. The probability of any single chunk failing on a noisy connection is low because the chunk is small and the request is short. Even if a chunk fails, only that chunk has to be retried, not the entire upload.
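A minimal client-side sketch of the pattern looks like this. The names planChunks, uploadInChunks, and the send callback are hypothetical, not part of any library; the callback is injected so the loop stays independent of how a chunk is actually transmitted.

```typescript
interface ChunkRange {
  index: number;
  start: number; // inclusive byte offset
  end: number; // exclusive byte offset
}

// Split a file of `totalBytes` into consecutive ranges of at most `chunkBytes`.
function planChunks(totalBytes: number, chunkBytes: number): ChunkRange[] {
  if (chunkBytes <= 0) throw new Error("chunkBytes must be positive");
  const ranges: ChunkRange[] = [];
  for (let start = 0, index = 0; start < totalBytes; start += chunkBytes, index++) {
    ranges.push({ index, start, end: Math.min(start + chunkBytes, totalBytes) });
  }
  return ranges;
}

// Minimal shape that browser Blob and File objects already satisfy.
interface Sliceable {
  slice(start: number, end: number): unknown;
}

// Hypothetical transport: sends one chunk and rejects on failure.
type SendChunk = (range: ChunkRange, body: unknown) => Promise<void>;

// Upload chunks one at a time, retrying only the chunk that failed.
// Resolves with the number of chunks sent.
async function uploadInChunks(
  file: Sliceable,
  totalBytes: number,
  chunkBytes: number,
  send: SendChunk,
  maxAttempts: number = 3
): Promise<number> {
  let sent = 0;
  for (const range of planChunks(totalBytes, chunkBytes)) {
    for (let attempt = 1; ; attempt++) {
      try {
        // Only this slice is handed to the transport, never the whole file.
        await send(range, file.slice(range.start, range.end));
        break;
      } catch (err) {
        if (attempt >= maxAttempts) throw err;
      }
    }
    sent++;
  }
  return sent;
}
```

In a browser, send would typically wrap a fetch call that PUTs the slice to a per-chunk URL of your own design. A production version would also add backoff between retries and report progress.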

The Storage Backend Matters

Where the chunks actually go during the upload determines a lot of the architectural complexity. Two patterns are common in production.

The first pattern is direct-to-storage. The client uploads chunks directly to an object storage service like Amazon S3 using pre-signed URLs. The application server creates the upload session and grants permission but never sees the file bytes. This pattern scales well because the application servers do not handle the upload bandwidth.

The second pattern is application-server-mediated. The client uploads chunks to the application server, which stores them in temporary storage and then assembles the final file. This pattern allows more control over the upload (per-chunk validation, custom checksums, content scanning during upload) but consumes more bandwidth and compute on the application servers.

For high-volume applications, the direct-to-storage pattern is almost always the right choice. The application server's bandwidth becomes a bottleneck quickly otherwise. For applications that need to validate or transform content during upload, the application-server-mediated pattern is worth the extra cost.
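With the direct-to-storage pattern, one thing the application server still decides when it creates the upload session is the part size. S3 multipart upload accepts at most 10,000 parts, and every part except the last must be at least 5 MiB. The sketch below picks a part size within those limits; the function names and the 8 MiB preference are assumptions for illustration, and generating the pre-signed URLs themselves is left to the storage SDK and not shown.

```typescript
const MIB = 1024 * 1024;
const S3_MIN_PART_BYTES = 5 * MIB; // minimum size for every part except the last
const S3_MAX_PARTS = 10000; // maximum number of parts in one multipart upload

// Pick a part size that honors the minimum part size and keeps the
// total part count at or below the maximum.
function choosePartSize(fileBytes: number, preferredBytes: number = 8 * MIB): number {
  const neededForPartLimit = Math.ceil(fileBytes / S3_MAX_PARTS);
  return Math.max(S3_MIN_PART_BYTES, preferredBytes, neededForPartLimit);
}

// Number of parts (and therefore pre-signed URLs) the session needs.
function partCount(fileBytes: number, partBytes: number): number {
  return Math.ceil(fileBytes / partBytes);
}

const fourGiB = 4 * 1024 * MIB;
const partBytes = choosePartSize(fourGiB);
console.log(partBytes / MIB, partCount(fourGiB, partBytes)); // 8 MiB parts, 512 parts
```

For very large files the preferred size is no longer enough to stay under the part limit, so the function grows the part size instead of failing.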

The Resume Story

The other reason chunked uploads matter is resume. A user who has uploaded 80 percent of a 4GB file does not want to restart from zero when the network drops. With chunked uploads, the server tracks which chunks have been received. When the upload resumes, only the missing chunks need to be re-sent.
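The server-side bookkeeping behind resume can be very small. The following sketch tracks received chunk indexes in memory; the ChunkTracker class is hypothetical, and a real service would persist this state (in a database or in the storage backend) so that it survives restarts.

```typescript
// Tracks which chunk indexes of one upload have arrived.
class ChunkTracker {
  private received = new Set<number>();
  private totalChunks: number;

  constructor(totalChunks: number) {
    this.totalChunks = totalChunks;
  }

  // Record a chunk. Re-sending the same chunk is harmless (idempotent).
  markReceived(index: number): void {
    if (!Number.isInteger(index) || index < 0 || index >= this.totalChunks) {
      throw new RangeError(`chunk index ${index} is out of range`);
    }
    this.received.add(index);
  }

  // Indexes the client still has to send, in ascending order.
  missing(): number[] {
    const out: number[] = [];
    for (let i = 0; i < this.totalChunks; i++) {
      if (!this.received.has(i)) out.push(i);
    }
    return out;
  }

  isComplete(): boolean {
    return this.received.size === this.totalChunks;
  }
}

// Example: the network dropped after chunks 0, 1 and 3 of 5 had arrived.
const tracker = new ChunkTracker(5);
[0, 1, 3].forEach((i) => tracker.markReceived(i));
console.log(tracker.missing()); // [ 2, 4 ]
```

On reconnect the client asks the server for the missing list and sends only those chunks.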

The Tus protocol is an open standard for resumable uploads that defines how a client discovers the server's current offset and continues from it. Implementations exist for most major languages. Using Tus saves you from designing the resume wire format yourself and gives you battle-tested client libraries.
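To give a feel for the wire format, here is a rough sketch of the headers involved in resuming under version 1.0.0 of the protocol: the client learns the server's offset from the Upload-Offset header of a HEAD response, then sends the next bytes in a PATCH request. The helper names are hypothetical, and in practice an existing client library would handle all of this for you.

```typescript
const TUS_VERSION = "1.0.0";

// Parse the Upload-Offset value returned by a HEAD request on the upload URL.
function parseUploadOffset(headerValue: string | null): number {
  if (headerValue === null || !/^\d+$/.test(headerValue)) {
    throw new Error("missing or invalid Upload-Offset header");
  }
  return Number(headerValue);
}

// Headers for a PATCH request that appends bytes starting at `offset`.
function tusPatchHeaders(offset: number): Record<string, string> {
  return {
    "Tus-Resumable": TUS_VERSION,
    "Upload-Offset": String(offset),
    "Content-Type": "application/offset+octet-stream",
  };
}

interface NextPatch {
  start: number; // first byte to send (the server's current offset)
  end: number; // exclusive end of the slice to send
  headers: Record<string, string>;
}

// Decide what to send next, or return null when the upload is already complete.
function nextTusPatch(totalBytes: number, serverOffset: number, chunkBytes: number): NextPatch | null {
  if (serverOffset >= totalBytes) return null;
  const end = Math.min(serverOffset + chunkBytes, totalBytes);
  return { start: serverOffset, end, headers: tusPatchHeaders(serverOffset) };
}
```

Because the server's offset is the source of truth, a client that crashed or lost its local state can still resume from the right byte.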

The user-facing experience of a resumable upload is what makes the difference. Instead of "upload failed, please try again from the beginning," the user sees "connection lost, will resume when network is back." When the network returns, the upload continues automatically. The file is still 4GB, but uploading it becomes much less stressful for users on flaky networks.

Photo by iam hogir on Pexels

What to Look For in Logs When Diagnosing

If you have an upload endpoint that is failing for large files, the diagnostic process is mostly about figuring out which layer is killing the request.

Check the CDN logs first. CDNs usually log request duration and termination reason. If the CDN shows the request was closed due to timeout, the timeout limit at the CDN layer is the culprit.

If the CDN logs show success but the application server logs show failure, the load balancer or reverse proxy is suspect. Check their timeout configuration.

If the application server logs show the request starting and then aborting, check the application server's own read timeout. Many frameworks have a separate read timeout for the request body that is shorter than the overall request timeout.

If everything looks fine on the server side but the client reports failure, look at the browser's network panel for the actual error. In Chromium-based browsers, common errors include net::ERR_NETWORK_CHANGED (the user's connection switched) and net::ERR_CONNECTION_RESET (something in the middle dropped the connection).
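The checks above amount to a small decision procedure. The sketch below encodes it as a function; the evidence fields and the order in which they are tested are one reading of the steps, not a definitive triage tool, and you would fill them in by hand from your own logs.

```typescript
// What you found while reading the logs for one failed upload.
interface UploadFailureEvidence {
  cdnLoggedTimeout: boolean;
  appLoggedStartThenAbort: boolean;
  appLoggedFailure: boolean;
  clientReportedFailure: boolean;
}

type Suspect =
  | "cdn timeout"
  | "application server read timeout"
  | "load balancer or reverse proxy timeout"
  | "client network path"
  | "unknown";

// Walk the layers from the outside in and name the first plausible culprit.
function suspectLayer(e: UploadFailureEvidence): Suspect {
  if (e.cdnLoggedTimeout) return "cdn timeout";
  // The request reached the application and died there.
  if (e.appLoggedStartThenAbort) return "application server read timeout";
  // The CDN was satisfied but the application saw a failure.
  if (e.appLoggedFailure) return "load balancer or reverse proxy timeout";
  // Nothing wrong server-side: inspect the browser's network panel.
  if (e.clientReportedFailure) return "client network path";
  return "unknown";
}

console.log(
  suspectLayer({
    cdnLoggedTimeout: false,
    appLoggedStartThenAbort: false,
    appLoggedFailure: true,
    clientReportedFailure: true,
  })
);
```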

When to Stop Patching and Migrate

A single-request upload endpoint that handles files past 100MB will eventually need to be rebuilt as chunked. The migration is a meaningful project but not a heroic one.

The 137Foundry team has done this migration on several SaaS products. The pattern is: ship the chunked endpoint alongside the existing one, route large files to the chunked endpoint, monitor for a few weeks, deprecate the single-request endpoint once the chunked one is handling all real traffic.

The full architectural framing, including the storage layer choices and the resume protocol, is covered in the longer guide on resumable file uploads. The 137Foundry services overview covers the broader engineering work around application infrastructure.

A Practical Migration Sequence

If you have a current upload endpoint that is timing out and you need to fix it, the practical sequence is:

Confirm the failure mode by collecting failure rates by file size. A clear breakpoint above some file size confirms the timeout hypothesis.
Decide on the chunked protocol. For most teams, either the Tus protocol or S3 multipart upload is the right choice.
Implement the chunked endpoint alongside the existing one. Test with files of progressively larger size and progressively more aggressive network failure simulation.
Update the client to use the chunked endpoint for files above some threshold (start with 50MB; lower it as you gain confidence).
Monitor failure rates. The chunked endpoint should show flat failure rates regardless of file size.
Deprecate the single-request endpoint once the chunked endpoint is stable in production.
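The first and fifth steps both rely on failure rates broken down by file size. A rough sketch of that calculation is below; the UploadAttempt shape and the bucket boundaries are assumptions, and in practice the attempts would come from your request logs or metrics store.

```typescript
interface UploadAttempt {
  sizeBytes: number;
  succeeded: boolean;
}

interface SizeBucketStats {
  upToBytes: number; // inclusive upper bound; Infinity for the last bucket
  attempts: number;
  failures: number;
  failureRate: number; // 0 when the bucket has no attempts
}

// Group attempts into size buckets and compute the failure rate of each.
function failureRateBySize(attempts: UploadAttempt[], upperBoundsBytes: number[]): SizeBucketStats[] {
  const bounds = upperBoundsBytes.slice().sort((a, b) => a - b);
  bounds.push(Infinity); // catch-all bucket for everything larger
  const buckets: SizeBucketStats[] = bounds.map((upToBytes) => ({
    upToBytes,
    attempts: 0,
    failures: 0,
    failureRate: 0,
  }));
  for (const attempt of attempts) {
    const bucket = buckets.find((b) => attempt.sizeBytes <= b.upToBytes);
    if (!bucket) continue; // only reachable for invalid sizes such as NaN
    bucket.attempts++;
    if (!attempt.succeeded) bucket.failures++;
  }
  for (const b of buckets) {
    b.failureRate = b.attempts === 0 ? 0 : b.failures / b.attempts;
  }
  return buckets;
}

const MB = 1000 * 1000;
const sampleAttempts: UploadAttempt[] = [
  { sizeBytes: 10 * MB, succeeded: true },
  { sizeBytes: 20 * MB, succeeded: true },
  { sizeBytes: 2000 * MB, succeeded: false },
  { sizeBytes: 3000 * MB, succeeded: false },
  { sizeBytes: 3000 * MB, succeeded: true },
];
console.log(failureRateBySize(sampleAttempts, [100 * MB, 1000 * MB]));
```

A sharp jump in failureRate between two adjacent buckets is the breakpoint the first step looks for. After the migration, the same report should come out roughly flat across buckets.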

The whole migration typically takes 2 to 6 weeks for a team with the relevant skills, depending on how much application logic surrounds the upload flow. The result is an upload system that does not produce mysterious timeout errors and that scales with the user base.
