When building gomarklint, a Go-based Markdown linter, I faced a challenge: checking 100,000+ lines of documentation for broken links. Parallelizing this with goroutines seemed like a no-brainer, but it immediately led to flaky tests in CI environments.
Speed is easy in Go; stability is the real challenge. Here are the three patterns I implemented to achieve both.
The Cache: Preventing 'Barrages'
In a large docset, the same URL appears dozens of times. Naive concurrency sends a request for every single occurrence, which looks like a DoS attack to the host.
Using sync.Map, I implemented a simple URL cache to ensure each unique URL is only checked once.
var urlCache sync.Map

// Check if we've seen this URL before
if val, ok := urlCache.Load(url); ok {
    return val.(*checkResult), nil
}
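The miss path is the other half of the pattern: run the real check once, then publish the result so every later occurrence of the same URL is answered from memory. A minimal sketch of that write side, assuming checkURL returns a *checkResult:

// Cache miss: run the real check once, then publish the result
res := checkURL(url) // assumption: checkURL returns a *checkResult
urlCache.Store(url, res)
return res, nil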
The Semaphore: Respecting Resource Limits
Even with a cache, checking 1,000 unique URLs simultaneously can exhaust local file descriptors or trigger rate limits.
I used a buffered channel as a semaphore to cap the number of active goroutines.
var wg sync.WaitGroup
maxConcurrency := 10
sem := make(chan struct{}, maxConcurrency)

for _, url := range urls {
    sem <- struct{}{} // Acquire token (blocks once 10 checks are in flight)
    wg.Add(1)
    go func(u string) {
        defer wg.Done()
        defer func() { <-sem }() // Release token
        checkURL(u)
    }(url)
}
wg.Wait() // Wait for every in-flight check before reporting
The Retry: Tolerating Network 'Whims'
Networks are inherently unreliable, and a momentary blip shouldn't fail your entire CI build. I implemented retries with exponential backoff, treating permanent failures (like 404) as final and transient ones (5xx, timeouts) as worth retrying.
// Only retry if it's a server error or network timeout
if err != nil || status >= 500 {
    time.Sleep(retryDelay << attempt) // exponential backoff
    // retry...
}
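Put together, the retry loop looks roughly like this minimal sketch. maxRetries, baseDelay, and fetchStatus (a hypothetical helper that performs a single HTTP request and returns the status code plus any error) are illustrative names, not gomarklint's actual code:

// retryCheck checks one URL, retrying transient failures with
// exponential backoff and returning permanent failures immediately.
func retryCheck(url string) (int, error) {
    const maxRetries = 3
    baseDelay := 500 * time.Millisecond

    var status int
    var err error
    for attempt := 0; attempt < maxRetries; attempt++ {
        status, err = fetchStatus(url) // hypothetical single HTTP request
        if err == nil && status < 500 {
            return status, nil // success, or a permanent failure like 404
        }
        if attempt < maxRetries-1 {
            time.Sleep(baseDelay << attempt) // 500ms, 1s, 2s, ...
        }
    }
    return status, err
}

A 404 comes back with a nil error here on purpose: it's a definitive answer about the link, so retrying it would only slow the build down.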
The "Negative Caching" Trap
The most elusive bug was caching only the status code. If a request failed with a timeout, I stored status: 0. Subsequent lookups saw 0 with no way to tell that an error had occurred, so the same URL could be reported differently depending on whether it hit the cache.
The Fix: Cache the entire result, including the error.
type checkResult struct {
    status int
    err    error
}

// Store the pointer to this struct in your cache
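Concretely, the miss path now caches failures as well as successes, so a URL that timed out once is reported the same way every time it reappears. A short sketch, reusing the hypothetical retryCheck helper from the retry section:

status, err := retryCheck(url)
res := &checkResult{status: status, err: err}
urlCache.Store(url, res) // cache the error too, not just the status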
Conclusion: Is it 100% Stable?
Not quite. Even with these patterns, cache stampedes (multiple goroutines hitting the same uncached URL at almost the same instant) remain a concern.
I'm currently exploring golang.org/x/sync/singleflight to solve this; a rough sketch of the idea is below. If you have experience tuning http.Client for massive parallel checks, I'd love to hear your thoughts in the comments or on GitHub!
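This is only a sketch of where I'm headed, not shipped gomarklint code: singleflight's Do collapses concurrent callers with the same key into a single in-flight function call, so only one request goes out even if many goroutines miss the cache at once. It needs the golang.org/x/sync/singleflight package and reuses the hypothetical retryCheck helper from above:

var checkGroup singleflight.Group

// checkDeduped answers from the cache when possible; on a miss,
// concurrent callers for the same URL share one in-flight check.
func checkDeduped(url string) *checkResult {
    if val, ok := urlCache.Load(url); ok {
        return val.(*checkResult)
    }
    v, _, _ := checkGroup.Do(url, func() (interface{}, error) {
        status, err := retryCheck(url)
        res := &checkResult{status: status, err: err}
        urlCache.Store(url, res)
        return res, nil
    })
    return v.(*checkResult)
}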