Serif COLAKEL

SingleFlight: Smart Request Deduplication

When many clients ask for the same expensive data at once (web scraping, ML inference, DB aggregation), singleflight lets one worker do the work and shares the result with everyone else. This saves CPU and memory, prevents thundering-herd disasters, and often reduces cost by 90%+ for duplicated work.


The Problem: Duplicate Expensive Work

You have an API endpoint that performs an expensive operation — for example:

  • Spinning up a headless browser to scrape a dynamic webpage
  • Running a heavy database aggregation
  • Generating an ML prediction or report

Each call takes 3–5 seconds and consumes significant CPU and memory.

What happens if 100 users request the same data simultaneously?

Without optimization: 100 expensive operations, 100x CPU/memory, possible server crash

With singleflight: 1 operation, result shared with 99 others instantly

In a fund-scraping service I maintain, a sudden traffic spike hit one day: 100 clients requested the same fund code at nearly the same time. Without mitigation we would have launched 100 headless Chrome instances and overloaded the server. With singleflight, we launched only one.

This article explains the pattern, shows code, and contains a benchmarking methodology you can run locally.

Expensive operations that are deterministic for the same input (scrapes, DB aggregations, ML predictions) are prime candidates for request coalescing. When many requests for identical keys arrive concurrently they often trigger the same expensive task, wasting CPU and memory and potentially crashing your service.


The solution: singleflight (request coalescing)

singleflight (from golang.org/x/sync) is a small concurrency primitive: give each expensive operation a key and call Do(key, fn). If a call for the same key is already in progress, additional callers wait and receive the original call’s result instead of re-executing fn.

Basic idea: one execution, many consumers.

Example (simplified; assumes a `model.Fund` type, an `s.scrapeFundInternal` method, and a structured logger on the struct):

```go
import (
    "context"
    "log/slog" // or your logger of choice

    "golang.org/x/sync/singleflight"
)

type Scraper struct {
    fundScrapingSF singleflight.Group
    logger         *slog.Logger
}

func (s *Scraper) ScrapeFund(ctx context.Context, code string) (*model.Fund, error) {
    // All concurrent callers passing the same code share one execution.
    result, err, shared := s.fundScrapingSF.Do(code, func() (interface{}, error) {
        return s.scrapeFundInternal(ctx, code)
    })

    if shared {
        s.logger.Info("shared singleflight result", "code", code)
    }

    if err != nil {
        return nil, err
    }

    return result.(*model.Fund), nil
}
```

shared is true for callers that received a result produced by another caller's execution.


Timeline (what happens under the hood)

  • T+0ms: request A for ABC123 — becomes the in-flight execution
  • T+100ms: request B for ABC123 — joins, waits
  • T+3s–5s: scraping completes — the result is broadcast to A, B, and any later joiners; all receive the same payload

Result: one headless Chrome instance, single CPU/memory footprint, and all clients served.


Benchmarks & Impact (how to measure and what to expect)

Benchmark setup

  • Work: simulated scraping function that sleeps 3s and allocates moderate memory
  • Load: N concurrent clients hitting the same key
  • Metric: number of actual expensive runs vs number of requests; latency distribution; CPU/memory

Representative results (observed in similar experiments): when flooding a server with 100 concurrent requests for the same key, an implementation using singleflight reduced duplicate expensive runs by ~95% and turned many 3–5s waits into a single 3–5s run for all clients (others wait and return quickly on broadcast).

Note: real numbers vary by workload and machine. Always benchmark in your environment. See the “Run this benchmark” section below.


Gotchas and trade-offs

  • Single point stalls: if the in-flight call stalls or deadlocks, all waiting callers are affected. Use context deadlines and timeouts.
  • Stale data / consistency: the first result may lag transient updates; if strict read-after-write consistency is required, be careful.
  • Error blast radius: a failing call causes all waiting callers to see the same error. Consider retry/backoff strategies.
  • Key choice matters: choose a key that matches your deduplication intent (e.g., fund code plus any relevant parameters).

Helpful APIs: use Group.Forget(key) to force a subsequent Do to re-execute.


Production Tips

  • Use separate singleflight.Group instances per operation type (per-endpoint or per-resource class).
  • Combine with caching: use singleflight to prevent duplicate cache rehydration.
  • Add observability: log whether a call was shared, export counters (shared_count, total_calls, in_flight), and trace spans for the in-flight operation.
  • Use pprof and runtime/trace to validate resource savings.

Run This Benchmark Locally (Suggested Methodology)

  1. Implement a fake scrape that sleeps 3s and increments a counter.
  2. Add an HTTP server with two endpoints: /sf (uses singleflight) and /raw (no deduplication).
  3. Use a load generator (wrk, vegeta, or ab) with concurrency 100 and ~100k requests.
  4. Measure: actual expensive runs (a counter), avg latency, p95/p99 latency, CPU and memory.

Checklist: measure both counts (how many times the expensive function actually ran) and resource use (CPU, memory). Instrument the expensive function to increment a Prometheus counter so that you can see how many times it executed.


Conclusion

singleflight is an elegant, low-friction tool to drastically cut duplicate work for identical requests. For scraping, image processing, ML inference, or heavy DB aggregations, it can change a service from fragile under load to resilient and cost-efficient.

If you maintain an endpoint that performs expensive deterministic work for the same inputs — give singleflight a try.

Happy coding! 🚀
