The Problem
During the modernization of a legacy enterprise application, one feature initially appeared deceptively simple:
Export data to CSV.
The legacy application was built using Java Struts and Hibernate and supported exporting datasets containing more than one million records.
As part of our modernization effort, I was migrating functionality to a new stack:
- React frontend
- Node.js backend
- Oracle database
One key requirement remained unchanged:
The export had to be synchronous.
Users should click Export and immediately receive a downloadable file.
No background jobs.
No queues.
No polling.
No email notifications.
Just a direct download experience.
Why This Is Hard
The first instinct many developers have is:
- Query the database
- Load all rows into memory
- Convert to CSV
- Return the file
For a million records, this approach quickly breaks down.
Problems include:
- High memory consumption
- Event loop blocking
- Slow response times
- Application crashes
- Reverse proxy timeouts
The challenge wasn't generating a CSV.
The challenge was generating a CSV while keeping memory usage nearly constant.
Understanding What the Legacy Java Application Was Doing
Although the implementation details were hidden behind Hibernate, the application was effectively relying on a streaming model:
Database Cursor
→ Fetch Records in Chunks
→ Write to Output Stream
→ Send to Browser
The goal was not to reinvent the feature.
The goal was to reproduce the same streaming philosophy in Node.js.
The Architecture I Chose
Instead of loading all records into memory,I designed a fully streamed pipeline.
Oracle ResultSet Cursor
→ Chunk Fetching
→ CSV Stream
→ GZIP Compression
→ HTTP Response Stream
→ Browser Download
Each component processes data incrementally.
At no point does the application hold the entire dataset in memory.
Key Optimization #1: Bypass the ORM
Our application uses TypeORM for standard CRUD operations.
For exports, however, ORM abstractions become expensive.
Entity hydration, decorators, and object transformation introduce unnecessary overhead.
For this feature I bypassed TypeORM entirely and used the native Oracle driver.
This allowed direct access to Oracle ResultSets and cursor-based fetching.
Key Optimization #2: Cursor-Based Fetching
Rather than executing a query and retrieving every row at once, I used Oracle ResultSets with configurable fetch sizes.
The application fetches rows in batches and immediately pushes them downstream for processing.
This keeps memory consumption stable regardless of whether the export contains ten thousand rows or one million rows.
Key Optimization #3: Stream CSV Generation
Instead of creating a giant array and converting it to CSV afterward, records are converted as they are read from Oracle.
This creates a continuous flow:
Database → CSV → Response
No intermediate buffering.
No giant objects.
No memory spikes.
Key Optimization #4: GZIP Compression
One surprising win came from compression.
CSV files compress extremely well.
A large export that might otherwise require hundreds of megabytes can often be reduced dramatically before transmission.
Benefits include:
- Reduced network transfer time
- Faster downloads
- Lower bandwidth consumption
The compression step became an important part of achieving our performance target.
Key Optimization #5: Dedicated Export Connection Pool
Exports have very different behavior compared to normal API traffic.
Long-running queries and large result sets can monopolize database connections.
To avoid impacting application traffic, I introduced a dedicated Oracle connection pool specifically for export workloads.
This isolated export traffic from normal user requests and improved system stability.
Results
After implementing:
- Native Oracle driver
- Dedicated export connection pool
- Cursor-based fetching
- Streaming CSV generation
- GZIP compression
I successfully handled exports containing over one million records while maintaining a stable memory footprint and meeting our performance expectations.
Lessons Learned
The biggest lesson was that exporting large datasets is primarily a streaming problem.
Most performance issues come from unnecessary buffering and object creation.
When working with large exports:
- Stream everything
- Avoid ORM abstractions where appropriate
- Fetch data in chunks
- Compress responses
- Measure memory usage continuously
Modern Node.js is fully capable of handling enterprise-scale exports when designed around streaming principles.
Final Thoughts
Migration projects often focus on feature parity.
In this case, understanding the architectural philosophy behind the legacy Java implementation was more important than translating code line-by-line.
The result was a Node.js solution that preserved the user experience while leveraging modern streaming capabilities to efficiently handle millions of records.
Top comments (0)