DEV Community

Abhiram Kandiraju

Streaming Large Financial Transaction Exports Without Breaking Your API

Large financial transaction exports can easily overwhelm traditional REST APIs.

When datasets reach hundreds of thousands or even millions of records, generating export files entirely in memory becomes inefficient and sometimes unstable.

Many financial platforms provide some version of an “Export transactions” feature for reconciliation, accounting, tax preparation, or compliance reporting.

At small scale, this is straightforward: query the transactions, generate a file, and return it to the client. Problems start appearing when those exports grow large.

This post explores a practical streaming pattern for handling those exports efficiently while keeping memory usage predictable.

The architectural approach discussed here is also described in more detail in my research paper:
Streaming REST APIs for Large Financial Transaction Exports from Relational Databases

Why Traditional Export APIs Struggle

A typical export endpoint might look something like this:

@GET
@Path("/transactions/export")
@Produces("text/csv")
public Response exportTransactions(@QueryParam("accountId") String accountId) {

    List<Transaction> transactions = transactionRepository.findAllTransactions(accountId);

    String csv = generateCsv(transactions);

    return Response.ok(csv)
            .header("Content-Disposition", "attachment; filename=transactions.csv")
            .build();
}

This pattern works well when datasets for the requested range are small. However, as the dataset grows, major problems begin to appear.

  • The API server needs to hold the entire dataset in memory while the export file is being generated.
  • The client must wait until the entire file has been generated before the download even begins.

For large exports, this can lead to:

  • Significant memory usage
  • Delayed response times
  • Poor scalability under concurrent requests

When exports reach hundreds of thousands or millions of records, these issues become noticeable very quickly.

A Simpler Way: Stream the Data

Instead of building the entire export file first, a more scalable approach is to stream the data from the database directly to the HTTP response.

The idea is simple:

  1. Fetch transactions incrementally from the database
  2. Process each record as it arrives
  3. Immediately write it to the HTTP response

Conceptually the pipeline looks like this:

Database → API → Format Encoder → HTTP Response → Browser

Each transaction flows through this pipeline and is delivered to the client immediately.

The server never needs to hold the full dataset in memory.

Architecture Overview

At a high level, a streaming export pipeline avoids assembling the full export file in memory by moving records through a continuous response path.


Each record flows through the pipeline independently.

  1. The database returns rows incrementally using a cursor.
  2. The API processes one record at a time.
  3. The encoder formats the record into the target export format.
  4. The formatted data is immediately written to the HTTP response stream.

Because records are processed sequentially, the server never needs to hold the full dataset in memory.

Streaming the Database Query

The first step is ensuring the database driver retrieves rows incrementally instead of loading the entire result set.

Using JDBC, this can be achieved with a forward-only cursor and a fetch size.

try (
    Connection connection = dataSource.getConnection();
    PreparedStatement stmt = connection.prepareStatement(
        "SELECT date, description, amount FROM transactions WHERE account_id = ?",
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_READ_ONLY
    )
) {
    stmt.setString(1, accountId);

    // Hint to the driver to fetch rows in batches rather than all at once.
    // Note that this behavior is driver-specific: PostgreSQL honors the
    // fetch size only when auto-commit is disabled, and MySQL Connector/J
    // streams rows only with setFetchSize(Integer.MIN_VALUE) or with
    // useCursorFetch=true in the connection URL.
    stmt.setFetchSize(1000);

    try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
            java.sql.Date date = rs.getDate("date");
            String description = rs.getString("description");
            BigDecimal amount = rs.getBigDecimal("amount");

            processRow(date, description, amount);
        }
    }
}

This allows the application to process transactions one row at a time, keeping memory usage predictable even for very large datasets.

Streaming the HTTP Response

Once rows are processed incrementally, they can be written directly to the HTTP response stream.

JAX-RS provides a convenient mechanism for this using StreamingOutput.

Here is a simplified example of a streaming export endpoint:

@GET
@Path("/transactions/export")
@Produces("text/csv")
public Response exportTransactions(
        @QueryParam("accountId") String accountId,
        @QueryParam("startDate") String startDate,
        @QueryParam("endDate") String endDate) {

    StreamingOutput stream = output -> {
        try (
            Connection conn = dataSource.getConnection();
            PreparedStatement stmt = conn.prepareStatement(
                "SELECT date, description, amount " +
                "FROM transactions " +
                "WHERE account_id = ? " +
                "AND date BETWEEN ? AND ? " +
                "ORDER BY date",
                ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY
            )
        ) {
            stmt.setString(1, accountId);
            // valueOf expects ISO dates (yyyy-MM-dd); validate user input
            // before this point in a real endpoint.
            stmt.setDate(2, java.sql.Date.valueOf(startDate));
            stmt.setDate(3, java.sql.Date.valueOf(endDate));
            stmt.setFetchSize(1000);

            try (
                ResultSet rs = stmt.executeQuery();
                // Use an explicit charset; a bare PrintWriter would fall
                // back to the platform default encoding.
                PrintWriter writer = new PrintWriter(
                    new OutputStreamWriter(output, StandardCharsets.UTF_8))
            ) {
                writer.println("date,description,amount");

                int rows = 0;
                while (rs.next()) {
                    String description = rs.getString("description");
                    if (description == null) {
                        description = "";
                    }

                    writer.print(rs.getDate("date"));
                    writer.print(",");
                    // Quote the free-text field so embedded commas or
                    // quotes do not corrupt the CSV.
                    writer.print("\"" + description.replace("\"", "\"\"") + "\"");
                    writer.print(",");
                    writer.println(rs.getBigDecimal("amount"));

                    // Flush in batches so the client receives data steadily
                    // without forcing a tiny write for every row.
                    if (++rows % 1000 == 0) {
                        writer.flush();
                    }
                }
                writer.flush();
            }
        } catch (SQLException e) {
            // StreamingOutput.write may only throw IOException.
            throw new IOException("Export failed", e);
        }
    };

    return Response.ok(stream)
            .header("Content-Disposition", "attachment; filename=transactions.csv")
            .build();
}

Once the response begins streaming, the browser starts downloading the file immediately.

There is no need to wait for the entire dataset to be processed.

Supporting Multiple Export Formats

Financial platforms often support multiple export formats depending on the client system being used.

Common examples include:

  • CSV
  • OFX
  • QFX
  • QBO

A clean way to support these formats is to separate the export pipeline from the encoding logic.

For example:

public interface ExportEncoder {

    void start(OutputStream outputStream) throws IOException;

    void writeTransaction(Transaction transaction) throws IOException;

    void finish() throws IOException;
}

Each format can then implement its own encoder while the streaming pipeline remains unchanged.

This makes the export system easy to extend as new formats are required. This separation also keeps transport logic independent from file-format concerns, which makes testing and maintenance simpler.
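To make this concrete, here is a minimal sketch of a CSV encoder. The `Transaction` record and its accessor names are assumptions for illustration (the article does not define the class), and the interface is repeated so the example is self-contained:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;

// Assumed minimal Transaction shape for this sketch.
record Transaction(LocalDate date, String description, BigDecimal amount) {}

interface ExportEncoder {
    void start(OutputStream outputStream) throws IOException;
    void writeTransaction(Transaction transaction) throws IOException;
    void finish() throws IOException;
}

// A minimal CSV implementation; other formats (OFX, QFX, QBO)
// would implement the same three lifecycle methods.
class CsvExportEncoder implements ExportEncoder {

    private Writer writer;

    @Override
    public void start(OutputStream outputStream) throws IOException {
        writer = new OutputStreamWriter(outputStream, StandardCharsets.UTF_8);
        writer.write("date,description,amount\n"); // header row
    }

    @Override
    public void writeTransaction(Transaction t) throws IOException {
        // Quote the free-text field so embedded commas and quotes survive.
        writer.write(t.date() + ",\""
                + t.description().replace("\"", "\"\"") + "\","
                + t.amount() + "\n");
    }

    @Override
    public void finish() throws IOException {
        writer.flush(); // closing the underlying stream is the caller's job
    }
}
```

The streaming loop then becomes format-agnostic: call `start` once, `writeTransaction` for each row as the cursor advances, and `finish` after the last row.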

Memory Behavior: Buffered vs Streaming

To understand the impact of streaming, it helps to compare how memory behaves in the two approaches.

Buffered Export

Traditional implementations load the entire dataset first.

List<Transaction> transactions = repository.findAllTransactions();
generateCSV(transactions);

Memory usage grows with dataset size because the full collection of transactions must be held in memory.

Streaming Export

With streaming, rows are processed one at a time.

while (rs.next()) {
    encode(rs);
    writeToResponse();
}

Only a small working set is required for the current fetch batch and output buffer.

As a result:

  • Memory usage stays relatively constant
  • Large exports do not require large heap allocations
  • API servers remain stable under concurrent export requests

What This Approach Improves

Streaming exports provide several practical advantages.

Memory usage remains low because transactions are processed incrementally rather than stored in large collections.

Users also experience faster response times because the download begins immediately instead of waiting for the server to build the entire export file.

Most importantly, the API infrastructure remains stable even when multiple large exports are requested concurrently.

For platforms that frequently generate large transaction exports, this architecture can significantly improve reliability and scalability.

Production Considerations

While streaming exports solve many scalability issues, there are a few practical considerations when implementing them in production systems.

Database timeouts

Long-running exports may require increased query timeouts (for example via JDBC's `Statement.setQueryTimeout`) depending on the database configuration.

Connection management

Streaming queries keep database connections open while data is being processed. Connection pool sizes should account for this behavior. In systems with frequent exports, it may also be useful to isolate export traffic from latency-sensitive request paths.

Backpressure

If the client download speed is slow, the server may block while writing to the response stream. Proper thread management is important to avoid tying up request threads unnecessarily.
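One way to keep slow downloads off the shared request pool is a small executor dedicated to export work. This is a sketch, not a recommendation: the class name is hypothetical and the pool sizes are illustrative.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: a small, bounded pool reserved for export work, so that a slow
// client download blocks an export thread rather than a general request
// thread. Sizes here are illustrative only.
class ExportExecutorHolder {
    static final ExecutorService EXPORT_POOL = new ThreadPoolExecutor(
            2,                      // core threads
            4,                      // max threads
            60, TimeUnit.SECONDS,   // idle keep-alive
            new ArrayBlockingQueue<>(8),
            // When the queue is full, run the task in the caller's thread,
            // which naturally slows down new export submissions.
            new ThreadPoolExecutor.CallerRunsPolicy());
}
```

In a JAX-RS service this pairs naturally with asynchronous processing (`@Suspended AsyncResponse`), so the container's worker thread is released while the export runs on the dedicated pool.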

Export limits

Some platforms enforce export limits or pagination windows to prevent excessively large exports from overwhelming infrastructure.
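A simple way to enforce such a limit is a permit-based guard around the export endpoint. The class below is a hypothetical sketch using a `Semaphore`; requests beyond the cap are rejected immediately rather than queued.

```java
import java.util.concurrent.Semaphore;

// Sketch: cap how many exports may stream concurrently. The endpoint
// calls tryStart() before opening the database cursor, returns an
// HTTP 429 if it fails, and calls release() when the stream finishes.
class ExportLimiter {
    private final Semaphore permits;

    ExportLimiter(int maxConcurrentExports) {
        this.permits = new Semaphore(maxConcurrentExports);
    }

    /** Returns true if the export may proceed; pair with release(). */
    boolean tryStart() {
        return permits.tryAcquire();
    }

    void release() {
        permits.release();
    }
}
```

Releasing in a `finally` block around the streaming loop ensures permits are returned even when a client disconnects mid-download.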

Even with these considerations, streaming remains one of the most effective techniques for handling large dataset exports.

Final Thoughts

Exporting large datasets is one of those features that seems trivial at first but becomes challenging as systems scale.

Streaming the export path from database to HTTP response is a simple and effective way to handle large exports at scale.

By processing transactions incrementally and delivering them directly to the client, APIs can handle large exports efficiently without excessive memory consumption.

In systems where large exports are common, adopting a streaming architecture can often be the difference between an export feature that works occasionally and one that scales reliably.

Although this article focuses on financial transaction exports, the same streaming approach can be applied to any API that returns large datasets—such as reporting endpoints, audit logs, analytics exports, or bulk data downloads.
