When building gomarklint, a Go-based Markdown linter, I faced a challenge: checking 100,000+ lines of documentation for broken links. Parallelizing this with goroutines seemed like a no-brainer, but it immediately led to flaky tests in CI environments.
Speed is easy in Go; stability is the real challenge. Here are the three patterns I implemented to achieve both.
The Cache: Preventing 'Barrages'
In a large docset, the same URL appears dozens of times. Naive concurrency sends a request for every single occurrence, which looks like a DoS attack to the host.
Using sync.Map, I implemented a simple URL cache to ensure each unique URL is only checked once.
var urlCache sync.Map

// Check if we've seen this URL before
if val, ok := urlCache.Load(url); ok {
    return val.(*checkResult), nil
}
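The miss path is the other half of the pattern: run the real check once, then publish the result so every later occurrence of the same URL is answered from memory. A minimal sketch of that write side, assuming checkURL returns a *checkResult:

// Cache miss: run the real check once, then publish the result
res := checkURL(url) // assumption: checkURL returns a *checkResult
urlCache.Store(url, res)
return res, nil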
The Semaphore: Respecting Resource Limits
Even with a cache, checking 1,000 unique URLs simultaneously can exhaust local file descriptors or trigger rate limits.
I used a buffered channel as a semaphore to cap the number of active goroutines.
var wg sync.WaitGroup
maxConcurrency := 10
sem := make(chan struct{}, maxConcurrency)

for _, url := range urls {
    sem <- struct{}{} // Acquire token (blocks once 10 checks are in flight)
    wg.Add(1)
    go func(u string) {
        defer wg.Done()
        defer func() { <-sem }() // Release token
        checkURL(u)
    }(url)
}
wg.Wait() // Wait for every in-flight check before reporting
The Retry: Tolerating Network 'Whims'
Networks are inherently unreliable, and a momentary blip shouldn't fail your entire CI build. I implemented retries with exponential backoff, treating permanent failures (like 404) as final and transient ones (5xx, timeouts) as worth retrying.
// Only retry if it's a server error or network timeout
if err != nil || status >= 500 {
    time.Sleep(retryDelay << attempt) // exponential backoff
    // retry...
}
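Put together, the retry loop looks roughly like this minimal sketch. maxRetries, baseDelay, and fetchStatus (a hypothetical helper that performs a single HTTP request and returns the status code plus any error) are illustrative names, not gomarklint's actual code:

// retryCheck checks one URL, retrying transient failures with
// exponential backoff and returning permanent failures immediately.
func retryCheck(url string) (int, error) {
    const maxRetries = 3
    baseDelay := 500 * time.Millisecond

    var status int
    var err error
    for attempt := 0; attempt < maxRetries; attempt++ {
        status, err = fetchStatus(url) // hypothetical single HTTP request
        if err == nil && status < 500 {
            return status, nil // success, or a permanent failure like 404
        }
        if attempt < maxRetries-1 {
            time.Sleep(baseDelay << attempt) // 500ms, 1s, 2s, ...
        }
    }
    return status, err
}

A 404 comes back with a nil error here on purpose: it's a definitive answer about the link, so retrying it would only slow the build down.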
The "Negative Caching" Trap
The most elusive bug was caching only the status code. If a request failed with a timeout, I stored status: 0. Subsequent lookups saw 0 with no way to tell that an error had occurred, so the same URL could be reported differently depending on whether it hit the cache.
The Fix: Cache the entire result, including the error.
type checkResult struct {
    status int
    err    error
}

// Store the pointer to this struct in your cache
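Concretely, the miss path now caches failures as well as successes, so a URL that timed out once is reported the same way every time it reappears. A short sketch, reusing the hypothetical retryCheck helper from the retry section:

status, err := retryCheck(url)
res := &checkResult{status: status, err: err}
urlCache.Store(url, res) // cache the error too, not just the status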
Conclusion: Is it 100% Stable?
Not quite. Even with these patterns, cache stampedes (multiple goroutines hitting the same uncached URL at almost the same instant) remain a concern.
I'm currently exploring golang.org/x/sync/singleflight to solve this; a rough sketch of the idea is below. If you have experience tuning http.Client for massive parallel checks, I'd love to hear your thoughts in the comments or on GitHub!
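This is only a sketch of where I'm headed, not shipped gomarklint code: singleflight's Do collapses concurrent callers with the same key into a single in-flight function call, so only one request goes out even if many goroutines miss the cache at once. It needs the golang.org/x/sync/singleflight package and reuses the hypothetical retryCheck helper from above:

var checkGroup singleflight.Group

// checkDeduped answers from the cache when possible; on a miss,
// concurrent callers for the same URL share one in-flight check.
func checkDeduped(url string) *checkResult {
    if val, ok := urlCache.Load(url); ok {
        return val.(*checkResult)
    }
    v, _, _ := checkGroup.Do(url, func() (interface{}, error) {
        status, err := retryCheck(url)
        res := &checkResult{status: status, err: err}
        urlCache.Store(url, res)
        return res, nil
    })
    return v.(*checkResult)
}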