DEV Community

Yuki
When JS Libraries Fail at 1M Rows: Generating XLSX via DuckDB SQL

Introduction

I've been building LeapRows — a browser-based CSV analysis tool that runs entirely client-side using DuckDB-WASM, with no server involved.

At some point I needed to add XLSX export. But every existing approach I tried fell apart at any meaningful scale.

I eventually landed on a solution: generate Excel-compatible XML directly via DuckDB SQL, then compress it into an XLSX file using JSZip. This post covers why I went that route and how it works.

Note: My day job is SEO, not software engineering — so please forgive any imprecise terminology. 🙏

The Problem: Exporting 1 Million Rows to XLSX in the Browser

DuckDB-WASM's Built-in XLSX Export Was Unreliable

My first attempt was DuckDB-WASM's native XLSX output (the COPY TO command via the Excel extension). It was unstable — Excel would throw "The file format or file extension is not valid" on some outputs and refuse to open the file.

I ruled it out for production use and started looking for alternatives.

JS Libraries Ran Out of Memory

Next I tried SheetJS and ExcelJS. Both worked fine on small datasets, but memory usage exploded as file size grew, and the conversion itself became painfully slow.

Testing with 1 million rows, progress would reach 95% ... and then the browser would freeze for nearly 20 seconds. After the conversion finally finished, the browser stayed sluggish — and sometimes just crashed entirely.

I Still Wanted 1 Million Rows to Work

LeapRows is built around the promise of handling 1M+ rows quickly. I didn't want to quietly cap the export at some lower limit. If Excel itself supports up to ~1,048,576 rows, I wanted LeapRows to be able to fill every one of them.

The Insight: XLSX Is Just ZIP + XML

Something that might not be obvious: an XLSX file is actually a ZIP archive containing XML files. You can verify this yourself by renaming any .xlsx file to .zip — you'll see the contents directly.

[Screenshot: Windows File Explorer showing the contents of a renamed XLSX file as a ZIP archive, revealing the folders _rels, docProps, xl, and a [Content_Types].xml file.]

Those XML files contain a lot of styling information (fonts, colors, borders, conditional formatting, etc.), but the actual data lives in just one file: xl/worksheets/sheet1.xml.

[Screenshot: A code editor displaying XML content from inside an XLSX file, showing font styling definitions including Japanese font names like 游ゴシック.]

Since LeapRows only needs to export raw data from a DuckDB table — no styling required — I realized: the only file I need to generate dynamically is sheet1.xml. The other 4–5 files in the archive can be static strings.
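Those static files can simply be hard-coded as constant strings. Here's a minimal sketch of what that might look like, based on the OOXML package structure (this is my illustration, not LeapRows's actual code; note that a date cell with s="1", as used later, also needs an xl/styles.xml defining that style):

```javascript
// Minimal static members of the XLSX archive (everything except sheet1.xml).
// Sketch based on the OOXML package structure; an export that uses s="1"
// date cells also needs an xl/styles.xml defining that cell style.
const STATIC_PARTS = {
  "[Content_Types].xml":
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' +
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">' +
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>' +
    '<Default Extension="xml" ContentType="application/xml"/>' +
    '<Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>' +
    '<Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>' +
    '</Types>',
  "_rels/.rels":
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' +
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">' +
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="xl/workbook.xml"/>' +
    '</Relationships>',
  "xl/workbook.xml":
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' +
    '<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" ' +
    'xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">' +
    '<sheets><sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets></workbook>',
  "xl/_rels/workbook.xml.rels":
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' +
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">' +
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet1.xml"/>' +
    '</Relationships>',
};
```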

Generating XML Directly with DuckDB SQL

The data is already in a DuckDB table. And DuckDB SQL is extremely fast at string operations — CONCAT, REPLACE, and so on.

That led to the idea: "What if I use a SQL query to output XML cell strings directly, then ZIP them up with JSZip?"

Before:

DuckDB → JS objects → library → XML → ZIP

After:

DuckDB → XML strings (via SQL) → JSZip → ZIP

Skipping the intermediate JS object conversion means no memory pressure from large object graphs, and no GC pauses that were causing the browser to freeze.

The approach uses a CASE expression to handle each column type appropriately — numbers pass through as-is, dates get converted to Excel serial values, and strings get XML-escaped.

What the SQL Looks Like

SELECT
    '<row r="' || CAST(rn + 1 AS VARCHAR) || '">'
    -- String column: XML-escaped, output as inline string
    || CASE WHEN "name" IS NOT NULL
       THEN '<c r="A' || CAST(rn + 1 AS VARCHAR) || '" t="inlineStr"><is><t>'
         || REPLACE(REPLACE(REPLACE("name", '&', '&amp;'), '<', '&lt;'), '>', '&gt;')
         || '</t></is></c>'
       ELSE '' END
    -- Numeric column: output value directly
    || CASE WHEN "age" IS NOT NULL
       THEN '<c r="B' || CAST(rn + 1 AS VARCHAR) || '"><v>'
         || CAST("age" AS VARCHAR)
         || '</v></c>'
       ELSE '' END
    -- Date column: convert to Excel serial value (days since 1899-12-30)
    || CASE WHEN "created_at" IS NOT NULL
       THEN '<c r="C' || CAST(rn + 1 AS VARCHAR) || '" s="1"><v>'
         || CAST(date_diff('day', DATE '1899-12-30', "created_at") AS VARCHAR)
         || '</v></c>'
       ELSE '' END
    || '</row>' AS xml_row
FROM (
    SELECT *, ROW_NUMBER() OVER () AS rn
    FROM my_table
    LIMIT 50000 OFFSET 0
)
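The date_diff part maps dates onto Excel's serial-number scheme. A quick way to sanity-check the same conversion in JS (a hypothetical helper, not part of LeapRows):

```javascript
// Excel stores dates as serial numbers. For modern dates this equals the
// day count since 1899-12-30; the odd epoch absorbs Excel's historical
// "1900 was a leap year" bug, so it only holds from 1900-03-01 onward.
function excelSerial(dateUtc) {
  const EPOCH = Date.UTC(1899, 11, 30); // 1899-12-30, UTC midnight
  return Math.floor((dateUtc.getTime() - EPOCH) / 86400000);
}
```

For example, 2024-01-11 comes out as 45302, matching the sample output below.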

Each row in the result set becomes one XML element:

<row r="2"><c r="A2" t="inlineStr"><is><t>John Doe</t></is></c><c r="B2"><v>30</v></c><c r="C2" s="1"><v>45302</v></c></row>
<row r="3"><c r="A3" t="inlineStr"><is><t>Jane Smith</t></is></c><c r="B3"><v>25</v></c><c r="C3" s="1"><v>45150</v></c></row>

Combine these rows with a header row and the static XML files, compress with JSZip, and you have a valid XLSX file.
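The assembly step can be sketched like this (a hypothetical helper, assuming the query's row strings have been collected batch by batch):

```javascript
// Wrap pre-built <row> strings (like the ones the DuckDB query emits,
// header row first) into a complete xl/worksheets/sheet1.xml document.
function buildSheetXml(rowXmlChunks) {
  return (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' +
    '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">' +
    '<sheetData>' +
    rowXmlChunks.join('') +
    '</sheetData></worksheet>'
  );
}
```

The returned string then goes into the archive alongside the static files using standard JSZip calls: zip.file('xl/worksheets/sheet1.xml', xml) followed by zip.generateAsync({ type: 'blob', compression: 'DEFLATE' }).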

Results

With ExcelJS, exporting 1 million rows took around 20 seconds and consumed over 1 GB of memory (unfortunately I didn't capture a "before" screenshot for comparison).

After switching to the SQL-based XML approach:

  • XML generation: 5.5s
  • ZIP compression: 7.3s
  • Memory usage: significantly reduced
  • Post-export browser freeze: gone

The reduction in GC-eligible objects was what solved the post-conversion sluggishness.

[Screenshot: Processing time breakdown for the 1-million-row XLSX export: DuckDB Init 4 ms, File Load 1 ms, Table Creation 439 ms, Schema Analysis 1 ms, Preview Generation 4 ms, Build XML 5.5 s (41%), ZIP Compress 7.3 s (55%).]

Side note: compression could probably be moved to the WASM layer for even better performance, but that felt out of scope for now. 😇

Trade-offs

This approach doesn't support cell styling — no colors, borders, or Excel-specific formatting. For LeapRows that's fine, since the goal is pure data export. But it's worth keeping in mind if you need rich formatting.

Wrapping Up

Once I let go of the assumption that "XLSX is a complex format," it became clear that for plain data export, a minimal XML structure is completely sufficient.

DuckDB's string processing is fast enough to make "generate XML via SQL" a genuinely practical approach — and I suspect the same idea could be applied in other creative ways beyond XLSX export.
