Bulk search sounds easy until real users start pasting spreadsheet data, uploading messy CSVs, and expecting clear results for thousands of records.
I recently built a bulk search workflow in React + Spring Boot, and this article is a practical breakdown of what actually mattered: normalization, validation, chunking, frontend performance, and partial-failure reporting.
Read the full article here: https://medium.com/p/ea69f155054a
Top comments (7)
Bulk search gets messy fast, especially with validation and partial failures.
Curious how you handled chunking on the backend side: did you batch at the request level or use an async queue?
Thanks. I kept it request-level in this implementation.
I normalized and validated first, then processed the input in smaller controlled chunks and merged the results back into one response. I avoided an async queue for now to keep the user flow simpler, but I’d definitely consider that next if scale or latency increased.
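A minimal sketch of that chunk-and-merge flow. The class and method names (`BulkSearch`, `chunk`, `lookupChunk`) and the chunk size are illustrative, not the article's actual code; `lookupChunk` stands in for whatever per-chunk search call the backend makes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class BulkSearch {

    // Split the normalized, validated input into fixed-size chunks.
    public static <T> List<List<T>> chunk(List<T> items, int size) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            chunks.add(items.subList(i, Math.min(i + size, items.size())));
        }
        return chunks;
    }

    // Stand-in for the real per-chunk search call (e.g. a downstream API).
    public static List<String> lookupChunk(List<String> chunk) {
        return chunk.stream().map(id -> id + ":found").collect(Collectors.toList());
    }

    // Process chunks sequentially and merge everything back into one response.
    public static List<String> search(List<String> ids, int chunkSize) {
        List<String> merged = new ArrayList<>();
        for (List<String> c : chunk(ids, chunkSize)) {
            merged.addAll(lookupChunk(c));
        }
        return merged;
    }
}
```

Keeping this loop synchronous and sequential is the "simpler user flow" tradeoff mentioned above: one request in, one merged response out, no queue infrastructure to operate.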
Makes sense, keeping it request-level early is a good tradeoff.
Chunking + merging results is usually enough until scale actually forces async.
Did you face any issues with response size or timeout when handling larger datasets?
Yes — but more on the timeout / connection stability side than raw response size.
I stored the generated CSV results in S3, so I wasn’t trying to return huge datasets in one synchronous response. The bigger issue at larger volumes was downstream API pressure: once concurrency got too aggressive, I started seeing timeout-like failures / premature connection closes from upstream systems.
What helped was tuning parallelism down, keeping chunk sizes controlled, adding retry with backoff, and paginating report viewing separately instead of loading everything at once.
So in practice, response size stayed manageable because of the architecture — the harder problem was stability under scale.
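The tuning described above (capped parallelism + retry with backoff) could be sketched like this. `ChunkRunner`, the attempt count, and the delay values are hypothetical, not the article's implementation; a fixed-size thread pool caps how many chunk calls hit the downstream at once, and each call retries with exponential backoff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

public class ChunkRunner {

    // Retry a flaky call, sleeping baseDelayMs * 2^(attempt-1) between attempts.
    public static <T> T callWithRetry(Supplier<T> call, int maxAttempts, long baseDelayMs) {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) throw e;
                try {
                    Thread.sleep(baseDelayMs << (attempt - 1)); // exponential backoff
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(ie);
                }
            }
        }
    }

    // Run all chunk calls on a small fixed pool so downstream concurrency stays capped.
    public static <T> List<T> runChunks(List<Supplier<T>> chunkCalls, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (Supplier<T> c : chunkCalls) {
                futures.add(pool.submit(() -> callWithRetry(c, 3, 50)));
            }
            List<T> merged = new ArrayList<>();
            for (Future<T> f : futures) {
                try {
                    merged.add(f.get()); // preserves submission order in the merged result
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```

The pool size is the knob: turning it down is exactly the "tuning parallelism down" fix that stopped the premature connection closes.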
This is the part most people underestimate.
It’s rarely the dataset size that breaks things; it’s downstream pressure + uncontrolled parallelism.
Seen the same pattern:
works fine at 1k records
starts flaking at 10k
completely unpredictable after that
Good call on S3 + pagination; that’s usually the turning point from “feature” → “system”.
If you revisit this later, async queue + rate limiting per downstream API might save you a lot of pain
Yes, exactly.
The feature was “working” much earlier, but it only started feeling production-safe once I treated downstream pressure as the real constraint instead of just focusing on record count.
And yes, async queueing + per-downstream rate limiting would probably be the next clean step once the workload grows beyond predictable request windows.
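One simple way to sketch per-downstream limiting (a real setup might use a token-bucket library like Resilience4j instead): give each downstream API its own semaphore that caps in-flight calls. `DownstreamLimiter` and its parameters are made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class DownstreamLimiter {
    // One semaphore per downstream API, created lazily.
    private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();
    private final int maxInFlight;

    public DownstreamLimiter(int maxInFlight) {
        this.maxInFlight = maxInFlight;
    }

    // Wrap any downstream call; blocks when that downstream is saturated.
    public <T> T call(String downstream, Supplier<T> request) {
        Semaphore s = permits.computeIfAbsent(downstream, k -> new Semaphore(maxInFlight));
        s.acquireUninterruptibly();
        try {
            return request.get();
        } finally {
            s.release();
        }
    }
}
```

The point is isolation: a slow downstream only throttles calls to itself, instead of one global cap punishing everything.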
Yeah that shift in thinking is the real unlock.
Once you treat downstream systems as the constraint, the whole design changes from “process fast” to “process safely”.
Seen this go wrong when people only add retries without controlling rate; it just amplifies the problem.
At that point queue + backpressure isn’t even an optimization, it becomes protection.
Sounds like you hit that boundary at the right time instead of after things started breaking in production.
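The "queue + backpressure as protection" idea boils down to a bounded buffer between intake and the downstream worker. A toy sketch (`BackpressureDemo` is hypothetical): `put()` blocks once the queue is full, so intake slows to the pace the downstream can absorb instead of amplifying the overload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BackpressureDemo {
    public static List<String> run(List<String> records, int capacity) {
        // Bounded queue: the capacity IS the backpressure.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        List<String> processed = new ArrayList<>();

        Thread worker = new Thread(() -> {
            try {
                for (int i = 0; i < records.size(); i++) {
                    String r = queue.take(); // drains at the downstream's pace
                    synchronized (processed) {
                        processed.add(r + ":done");
                    }
                }
            } catch (InterruptedException ignored) {
            }
        });
        worker.start();

        try {
            for (String r : records) {
                queue.put(r); // blocks when the queue is full: intake slows down
            }
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return processed;
    }
}
```

With capacity 1, the producer can never get more than one record ahead of the worker, which is the "process safely, not fast" behavior described above.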