DEV Community

Soumya Ranjan Nanda

What I learned building bulk search for large datasets in React + Spring Boot

Bulk search sounds easy until real users start pasting spreadsheet data, uploading messy CSVs, and expecting clear results for thousands of records.

I recently built a bulk search workflow in React + Spring Boot, and this article is a practical breakdown of what actually mattered: normalization, validation, chunking, frontend performance, and partial-failure reporting.

Read the full article here: https://medium.com/p/ea69f155054a

Top comments (7)

buildbasekit

Bulk search gets messy fast, especially with validation and partial failures.

Curious how you handled chunking on the backend side: did you batch at the request level or use an async queue?

Soumya Ranjan Nanda

Thanks. I kept it request-level in this implementation.

I normalized and validated first, then processed the input in smaller controlled chunks and merged the results back into one response. I avoided an async queue for now to keep the user flow simpler, but I’d definitely consider that next if scale or latency increased.
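In case it helps, the chunk-then-merge shape looks roughly like this. This is an illustrative sketch, not the actual implementation; the `chunk` helper and the chunk size are made up for the example:

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedSearch {
    // Split the normalized, validated input into fixed-size chunks;
    // each chunk is processed independently and results are merged back.
    static <T> List<List<T>> chunk(List<T> items, int size) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            chunks.add(items.subList(i, Math.min(i + size, items.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 10; i++) ids.add(i);

        List<List<Integer>> chunks = chunk(ids, 4); // e.g. 4 records per chunk
        System.out.println(chunks.size());          // 3 chunks: 4 + 4 + 2
    }
}
```

Merging is then just collecting each chunk's results into one response list before returning.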

buildbasekit

Makes sense, keeping it request-level early is a good tradeoff.

Chunking + merging results is usually enough until scale actually forces async.

Did you face any issues with response size or timeout when handling larger datasets?

Soumya Ranjan Nanda

Yes — but more on the timeout / connection stability side than raw response size.

I stored the generated CSV results in S3, so I wasn’t trying to return huge datasets in one synchronous response. The bigger issue at larger volumes was downstream API pressure: once concurrency got too aggressive, I started seeing timeout-like failures / premature connection closes from upstream systems.

What helped was tuning parallelism down, keeping chunk sizes controlled, adding retry with backoff, and paginating report viewing separately instead of loading everything at once.
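The retry-with-backoff part is the standard pattern; a minimal sketch of what I mean (attempt counts and delays here are illustrative, not my production values):

```java
import java.util.function.Supplier;

public class Backoff {
    // Retry a flaky call with exponential backoff: delay doubles each attempt.
    static <T> T withRetry(Supplier<T> call, int maxAttempts, long baseDelayMs) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                try {
                    Thread.sleep(baseDelayMs << attempt); // base, 2x, 4x, ...
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(ie);
                }
            }
        }
        throw last; // all attempts exhausted
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = withRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("upstream closed connection");
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The key point is that retries only help once parallelism is already tuned down; retrying at full concurrency just hammers the downstream harder.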

So in practice, response size stayed manageable because of the architecture — the harder problem was stability under scale.

buildbasekit

This is the part most people underestimate.

It’s rarely the dataset size that breaks things; it’s downstream pressure plus uncontrolled parallelism.

Seen the same pattern:

- works fine at 1k records
- starts flaking at 10k
- completely unpredictable after that

Good call on S3 + pagination; that’s usually the turning point from “feature” → “system”.

If you revisit this later, async queue + rate limiting per downstream API might save you a lot of pain

Soumya Ranjan Nanda

Yes, exactly.

The feature was “working” much earlier, but it only started feeling production-safe once I treated downstream pressure as the real constraint instead of just focusing on record count.

And yes, async queueing + per-downstream rate limiting would probably be the next clean step once the workload grows beyond predictable request windows.
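Until the queue exists, the simplest protection I can picture is a hard cap on in-flight calls per downstream, e.g. a semaphore guard. A sketch under that assumption (the class and limit are hypothetical, not something I've shipped):

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class DownstreamGuard {
    // Permits bound the number of concurrent calls to one downstream API.
    private final Semaphore permits;

    DownstreamGuard(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Block until a permit is free, call the downstream, then release.
    // Blocking callers here is crude backpressure: work queues up on our
    // side instead of overwhelming the upstream system.
    <T> T call(Supplier<T> downstream) {
        permits.acquireUninterruptibly();
        try {
            return downstream.get();
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        DownstreamGuard guard = new DownstreamGuard(4); // cap: 4 in-flight calls
        System.out.println(guard.call(() -> "response"));
    }
}
```

One guard instance per downstream API gives the per-downstream limiting you mentioned; a real rate limiter (tokens per second rather than concurrency) would be the next refinement.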

buildbasekit

Yeah that shift in thinking is the real unlock.

Once you treat downstream systems as the constraint, the whole design changes from “process fast” to “process safely”.

Seen this go wrong when people only add retries without controlling rate; it just amplifies the problem.

At that point queue + backpressure isn’t even an optimization, it becomes protection.

Sounds like you hit that boundary at the right time instead of after things started breaking in production.