Prashanth Mangipudi

Posted on Jun 1

How I Exported 1 Million+ Oracle Records to CSV in Under 30 Seconds Using Node.js

#oracle #sql #node

The Problem

During the modernization of a legacy enterprise application, one feature initially appeared deceptively simple:

Export data to CSV.

The legacy application was built using Java Struts and Hibernate and supported exporting datasets containing more than one million records.

As part of our modernization effort, I was migrating functionality to a new stack:

React frontend
Node.js backend
Oracle database

One key requirement remained unchanged:

The export had to be synchronous.

Users should click Export and immediately receive a downloadable file.

No background jobs.
No queues.
No polling.
No email notifications.

Just a direct download experience.

Why This Is Hard

The first instinct many developers have is:

Query the database
Load all rows into memory
Convert to CSV
Return the file

For a million records, this approach quickly breaks down.

Problems include:

High memory consumption
Event loop blocking
Slow response times
Application crashes
Reverse proxy timeouts

The challenge wasn't generating a CSV.

The challenge was generating a CSV while keeping memory usage nearly constant.

Understanding What the Legacy Java Application Was Doing

Although the implementation details were hidden behind Hibernate, the application was effectively relying on a streaming model:

Database Cursor
→ Fetch Records in Chunks
→ Write to Output Stream
→ Send to Browser

The goal was not to reinvent the feature.

The goal was to reproduce the same streaming philosophy in Node.js.

The Architecture I Chose

Instead of loading all records into memory,I designed a fully streamed pipeline.

Oracle ResultSet Cursor
→ Chunk Fetching
→ CSV Stream
→ GZIP Compression
→ HTTP Response Stream
→ Browser Download

Each component processes data incrementally.

At no point does the application hold the entire dataset in memory.

Key Optimization #1: Bypass the ORM

Our application uses TypeORM for standard CRUD operations.

For exports, however, ORM abstractions become expensive.

Entity hydration, decorators, and object transformation introduce unnecessary overhead.

For this feature I bypassed TypeORM entirely and used the native Oracle driver.

This allowed direct access to Oracle ResultSets and cursor-based fetching.

Key Optimization #2: Cursor-Based Fetching

Rather than executing a query and retrieving every row at once, I used Oracle ResultSets with configurable fetch sizes.

The application fetches rows in batches and immediately pushes them downstream for processing.

This keeps memory consumption stable regardless of whether the export contains ten thousand rows or one million rows.

Key Optimization #3: Stream CSV Generation

Instead of creating a giant array and converting it to CSV afterward, records are converted as they are read from Oracle.

This creates a continuous flow:

Database → CSV → Response

No intermediate buffering.

No giant objects.

No memory spikes.

Key Optimization #4: GZIP Compression

One surprising win came from compression.

CSV files compress extremely well.

A large export that might otherwise require hundreds of megabytes can often be reduced dramatically before transmission.

Benefits include:

Reduced network transfer time
Faster downloads
Lower bandwidth consumption

The compression step became an important part of achieving our performance target.

Key Optimization #5: Dedicated Export Connection Pool

Exports have very different behavior compared to normal API traffic.

Long-running queries and large result sets can monopolize database connections.

To avoid impacting application traffic, I introduced a dedicated Oracle connection pool specifically for export workloads.

This isolated export traffic from normal user requests and improved system stability.

Results

After implementing:

Native Oracle driver
Dedicated export connection pool
Cursor-based fetching
Streaming CSV generation
GZIP compression

I successfully handled exports containing over one million records while maintaining a stable memory footprint and meeting our performance expectations.

Lessons Learned

The biggest lesson was that exporting large datasets is primarily a streaming problem.

Most performance issues come from unnecessary buffering and object creation.

When working with large exports:

Stream everything
Avoid ORM abstractions where appropriate
Fetch data in chunks
Compress responses
Measure memory usage continuously

Modern Node.js is fully capable of handling enterprise-scale exports when designed around streaming principles.

Final Thoughts

Migration projects often focus on feature parity.

In this case, understanding the architectural philosophy behind the legacy Java implementation was more important than translating code line-by-line.

The result was a Node.js solution that preserved the user experience while leveraging modern streaming capabilities to efficiently handle millions of records.

DEV Community