The Problem
Modern backend systems often integrate OCR, machine learning inference, or heavy data processing jobs that cannot complete within a typical HTTP request lifecycle. When a user sends a request that triggers a long-running operation, keeping the HTTP connection open until processing completes is usually a poor design choice. Long-running synchronous requests can still increase the risk of timeouts, tie up resources, and make failure handling more difficult, even when a platform supports longer request durations.
Although longer synchronous timeouts are possible in some environments, asynchronous APIs are still valuable as an architectural choice for resilience and better user experience. For example, AWS API Gateway increased its integration timeout limit beyond 29 seconds in June 2024 for Regional and private REST APIs, though with trade-offs such as possible reductions in account-level throttle quota.
The Core Idea
A practical way to handle this problem is to accept the request, create a job record, and return immediately, while a background worker processes the job asynchronously. The client can then periodically check the job status until the work is completed or fails.
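The accept-then-process flow can be sketched with an in-memory job store and a worker thread. This is illustrative only; a real deployment would use a durable store (database) and a queue rather than process-local state:

```python
import threading
import time
import uuid

# In-memory job store; a real system would persist jobs in a database.
jobs = {}

def create_job(payload):
    """Accept the request, record a PENDING job, and return immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "PENDING", "result": None}
    # Hand the work to a background worker instead of blocking the caller.
    threading.Thread(target=_run_job, args=(job_id, payload), daemon=True).start()
    return job_id

def _run_job(job_id, payload):
    """Background worker: marks the job PROCESSING, does the work, then DONE."""
    jobs[job_id]["status"] = "PROCESSING"
    time.sleep(0.1)  # stand-in for OCR / ML inference / heavy processing
    jobs[job_id].update(status="DONE", result=payload.upper())

def get_job(job_id):
    """What a GET /jobs/{id} handler would return."""
    return jobs[job_id]
```

The key point is that `create_job` returns before the work finishes; the client learns the outcome later through `get_job`.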
HTTP Contract
A common HTTP approach for long-running work is to return 202 Accepted as soon as the request is accepted, then let the client check a separate status resource for updates. This matters because HTTP cannot later push the final result back into that same original response.
How Clients Receive Updates
In this article, the client checks job progress by polling a status endpoint. Polling is the simplest option, but it is not the only one: systems can also deliver updates using Server-Sent Events (SSE), WebSockets, or Webhooks, depending on the use case.
A Simple Endpoint Shape
A common way to expose this pattern is to separate job creation from job tracking.
- `POST /jobs` → accepts the request and returns a `jobId` or monitor URL
- `GET /jobs/{id}` → returns the current job state, such as `status`, `progress`, `result`, or `error`
This style is commonly used in APIs for long-running jobs, where the initial request stays fast and the client follows a separate status resource until the job reaches a terminal state. See also MDN’s Prefer header for respond-async, and this practical guide on REST API design for long-running tasks.
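With those two endpoints in place, the client side reduces to a small polling loop. The sketch below assumes a `fetch_status` callable that wraps `GET /jobs/{id}` (the name is illustrative, not part of any library):

```python
import time

TERMINAL_STATES = {"DONE", "ERROR"}

def poll_job(fetch_status, interval=1.0, timeout=30.0):
    """Poll fetch_status() until the job reaches a terminal state.

    fetch_status is any callable returning the status payload,
    e.g. a wrapper around GET /jobs/{id}.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status()
        if job["status"] in TERMINAL_STATES:
            return job
        time.sleep(interval)
    raise TimeoutError("job did not reach a terminal state in time")
```

Keeping the loop behind a timeout matters: without it, a stuck job would make the client poll forever.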
The Status Lifecycle
Once the request has been accepted and the client has a way to track it, the job lifecycle can be described with a small set of states:
- PENDING: When the backend accepts the request and creates a job record.
- PROCESSING: When a separate worker starts processing the long-running job.
- DONE: When the long-running job completes successfully.
- ERROR: If the job fails during any stage, the status response should ideally include structured error details such as an error code, a human-readable message, whether the operation is retryable, and the step that failed if that information is known.
Formats such as Problem Details for HTTP APIs (RFC 9457) are useful for standardizing machine-readable error responses, while the initial 202 Accepted response still only means the work was accepted, not completed.
A similar idea appears in production APIs as well. For example, Stripe’s error handling documentation shows how structured error objects can include fields such as code, message, param, type, and links to relevant documentation, making debugging and client-side handling easier.
Example Error Payload
```json
{
  "jobId": "12345",
  "status": "ERROR",
  "error": {
    "code": "OCR_TIMEOUT",
    "message": "Document processing exceeded the allowed time limit.",
    "retryable": true,
    "failedStep": "text-extraction"
  }
}
```
A job normally moves from PENDING to PROCESSING to DONE, but failures can occur during either PENDING or PROCESSING, transitioning the job into ERROR. Optional states may include RETRYING, CANCELLED, and PARTIAL_SUCCESS.
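These transitions can be enforced with a small lookup table so that invalid moves, such as DONE back to PROCESSING, are rejected. A minimal sketch covering only the core states (the optional ones would just add rows):

```python
# Allowed transitions for the lifecycle described above.
# RETRYING, CANCELLED, and PARTIAL_SUCCESS are optional and omitted here.
ALLOWED_TRANSITIONS = {
    "PENDING": {"PROCESSING", "ERROR"},
    "PROCESSING": {"DONE", "ERROR"},
    "DONE": set(),    # terminal
    "ERROR": set(),   # terminal
}

def transition(current, target):
    """Return the new state, rejecting moves the lifecycle does not allow."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"invalid transition {current} -> {target}")
    return target
```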
For better user experience, the status resource can optionally expose a progress field, such as 0–100, and the client can poll the status endpoint to show updates over time. However, this value is not always exact. In practice, progress may be estimate-based, stage-based, or omitted entirely when the backend cannot measure it reliably. See Google Cloud’s guide to long-running operations.
Implementation Approaches
To make this pattern more concrete, let's go through a practical example using Python, AWS, and SAM. You can access the code example here. Follow the instructions provided in the README, and don't forget to shut down any AWS service you launch during the experiment.
Why This Pattern Works Well
Responsive UX: The API can return quickly while the long-running work continues in the background, so the user is not left waiting with no visibility into what is happening. The request is accepted first and completed later through a separate status endpoint.
Retry Capability: Because the job state is stored separately from the original request, the system can apply timeouts, retries, and backoff more safely when transient failures occur. In practice, retries should be paired with idempotency and strategies such as timeouts, retries, and backoff with jitter.
Fault Isolation: The workflow can be split into smaller stages and handled by separate workers, which makes it easier to narrow failures down to a specific step instead of treating the whole process as one opaque unit. This kind of decoupling also prevents one slow or failing stage from directly blocking the initial request-response path.
Observability: When each stage is separated, it becomes easier to attach logs, metrics, and traces to each part of the workflow and understand where time is spent or where failures occur. Tools and standards such as OpenTelemetry’s observability primer help connect those signals into a clearer end-to-end view.
Scalability: Background workers can often be scaled independently from the API layer, which is useful when the long-running job needs more compute, memory, or concurrency than the initial request handler. For example, AWS documents that Lambda functions scale independently with concurrency limits and scaling behavior, which makes this split especially useful in serverless designs.
Real Challenges
Duplicate Execution: In asynchronous systems, retries and queue semantics can cause the same job to be delivered or processed more than once. For that reason, background workers and mutating endpoints should be designed to be idempotent, so repeating the same operation does not produce unintended side effects. With Amazon SQS standard queues, duplicate delivery is expected as part of at-least-once delivery behavior.
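One common safeguard is to key each unit of work by an idempotency key and cache its result, so a duplicate delivery becomes a no-op. A minimal in-memory sketch (a real system would persist the keys, for example in the job table itself):

```python
# idempotency key -> cached result; a real system would persist this map.
processed = {}

def handle_message(idempotency_key, payload, work):
    """Run work(payload) at most once per idempotency key.

    With at-least-once delivery (e.g. SQS standard queues), the same
    message can arrive twice; the second delivery returns the cached
    result instead of re-running side effects.
    """
    if idempotency_key in processed:
        return processed[idempotency_key]
    result = work(payload)
    processed[idempotency_key] = result
    return result
```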
Stuck Jobs: A job can remain in `PENDING` or `PROCESSING` longer than expected if a worker crashes, loses connectivity, or never updates its final state. In production, this usually needs timeouts, heartbeats, lease expiry, or a reconciliation process that detects and recovers stalled work. See AWS Batch alerts for stuck jobs.
Race Conditions: When multiple workers, retries, or client actions try to update the same job at nearly the same time, the system can end up with lost updates or invalid state transitions. This is usually handled with conditional writes, optimistic locking, or version checks so only valid state changes are accepted. See DynamoDB optimistic locking.
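The version-check idea behind optimistic locking can be sketched with an in-memory store; DynamoDB performs the equivalent check server-side with a condition expression:

```python
class ConflictError(Exception):
    """Raised when another writer updated the job first."""

def update_job(store, job_id, expected_version, **changes):
    """Apply changes only if the stored version still matches.

    A concurrent writer that bumped the version first causes this
    write to fail loudly instead of silently losing its update.
    """
    job = store[job_id]
    if job["version"] != expected_version:
        raise ConflictError("job was modified by another writer")
    job.update(changes)
    job["version"] += 1
    return job
```

On `ConflictError`, the caller re-reads the job, re-validates the state transition, and retries the write if it is still valid.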
Retry storms: If many clients or workers retry immediately after a failure, they can create a second wave of load that makes recovery even harder. Exponential backoff with jitter is a standard way to spread retries out and avoid synchronized spikes.
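The "full jitter" variant of this idea fits in a few lines: the delay grows exponentially with the attempt number, but the actual sleep is drawn at random from that window so failed clients do not retry in lockstep:

```python
import random

def backoff_with_jitter(attempt, base=0.5, cap=30.0):
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)].

    Randomizing within the window spreads out clients that all
    failed at the same moment.
    """
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```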
Visibility gaps: Once work moves into queues, workers, and downstream services, it becomes harder to understand where a job is failing or slowing down. Propagating correlation IDs and tracing context across components helps connect logs, traces, and metrics into one end-to-end view.
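Within a single Python service, one common way to carry a correlation ID is a context variable plus a logging filter, so every log line emitted while handling a job is tagged automatically. This sketch covers only the in-process part; propagating the ID across services (for example via an HTTP header or message attribute) is omitted:

```python
import logging
import uuid
from contextvars import ContextVar

# Travels with the logical request, including across async boundaries.
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")

class CorrelationFilter(logging.Filter):
    """Stamps the current correlation ID onto every log record."""
    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True

def start_request():
    """Assign a fresh correlation ID at the edge of the system."""
    cid = str(uuid.uuid4())
    correlation_id.set(cid)
    return cid
```

Attaching `CorrelationFilter` to a handler and including `%(correlation_id)s` in the log format then links every line back to the originating request.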
When NOT to Use This Pattern
- Simple, fast operations: If the job finishes quickly, this pattern can add unnecessary complexity.
- Strong consistency requirements: If the caller must know the final committed outcome immediately, asynchronous processing may be the wrong fit.
- Transactional workflows: If several steps must succeed or fail together, a simple job-status pattern may not be sufficient.
Lessons From Real Usage
- Async APIs improve UX, but production systems still need timeouts, retries, and cleanup rules.
- A plain `ERROR` status is usually not enough; clients need structured error details and retry guidance.
- A `DONE` status is often not enough on its own; clients may also need result metadata, timestamps, or follow-up links.
- Progress values are useful, but they are often estimates rather than exact measurements.
- Idempotency matters once retries and duplicate delivery become possible.
- Polling is a good entry point, but not the only update model.
This pattern is a simple but powerful way to design APIs around long-running work. It improves responsiveness and separates request handling from background execution, but it also introduces operational concerns such as retries, idempotency, stuck jobs, and observability. In a follow-up article, I’ll go deeper into implementation details, production hardening, and more concrete code examples.