Introduction
I've been building LeapRows — a browser-based CSV analysis tool that runs entirely client-side using DuckDB-WASM, with no server involved.
At some point I needed to add XLSX export. But every existing approach I tried fell apart at any meaningful scale.
I eventually landed on a solution: generate Excel-compatible XML directly via DuckDB SQL, then compress it into an XLSX file using JSZip. This post covers why I went that route and how it works.
Note: My day job is SEO, not software engineering — so please forgive any imprecise terminology. 🙏
The Problem: Exporting 1 Million Rows to XLSX in the Browser
DuckDB-WASM's Built-in XLSX Export Was Unreliable
My first attempt was DuckDB-WASM's native XLSX output (the COPY TO command via the Excel extension). It was unstable — Excel would throw "The file format or file extension is not valid" on some outputs and refuse to open the file.
I ruled it out for production use and started looking for alternatives.
JS Libraries Ran Out of Memory
Next I tried SheetJS and ExcelJS. Both worked fine on small datasets, but memory usage exploded as file size grew, and the conversion itself became painfully slow.
Testing with 1 million rows, progress would reach 95% ... and then the browser would freeze for nearly 20 seconds. After the conversion finally finished, the browser stayed sluggish — and sometimes just crashed entirely.
I Still Wanted 1 Million Rows to Work
LeapRows is built around the promise of handling 1M+ rows quickly. I didn't want to quietly cap the export at some lower limit. Excel itself supports exactly 1,048,576 rows per sheet, and I wanted LeapRows to be able to fill every one of them.
The Insight: XLSX Is Just ZIP + XML
Something that might not be obvious: an XLSX file is actually a ZIP archive containing XML files. You can verify this yourself by renaming any .xlsx file to .zip — you'll see the contents directly.
Those XML files contain a lot of styling information (fonts, colors, borders, conditional formatting, etc.), but the actual data lives in just one file: xl/worksheets/sheet1.xml.
Since LeapRows only needs to export raw data from a DuckDB table — no styling required — I realized: the only file I need to generate dynamically is sheet1.xml. The other 4–5 files in the archive can be static strings.
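To make that concrete, here's a minimal sketch in Python — using the standard-library zipfile module as a stand-in for the JSZip code the browser version actually uses. The part names and XML below are roughly the minimum Excel needs for a single unstyled sheet; everything except sheet1.xml is a static string:

```python
import zipfile

# Static parts: identical for every export, so they can be hard-coded strings.
STATIC_PARTS = {
    "[Content_Types].xml": (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
        '<Default Extension="xml" ContentType="application/xml"/>'
        '<Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>'
        '<Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>'
        '</Types>'
    ),
    "_rels/.rels": (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
        '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="xl/workbook.xml"/>'
        '</Relationships>'
    ),
    "xl/workbook.xml": (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" '
        'xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">'
        '<sheets><sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets>'
        '</workbook>'
    ),
    "xl/_rels/workbook.xml.rels": (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
        '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet1.xml"/>'
        '</Relationships>'
    ),
}

def build_xlsx(path, xml_rows):
    # The only dynamic part: sheet1.xml, wrapping the pre-built <row> strings.
    sheet = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
        '<sheetData>' + "".join(xml_rows) + '</sheetData></worksheet>'
    )
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, xml in STATIC_PARTS.items():
            zf.writestr(name, xml)
        zf.writestr("xl/worksheets/sheet1.xml", sheet)

build_xlsx("minimal.xlsx", ['<row r="1"><c r="A1" t="inlineStr"><is><t>hello</t></is></c></row>'])
```

Five files in a ZIP, one of them dynamic — that's the whole format, as far as plain data export is concerned.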
Generating XML Directly with DuckDB SQL
The data is already in a DuckDB table. And DuckDB SQL is extremely fast at string operations — CONCAT, REPLACE, and so on.
That led to the idea: "What if I use a SQL query to output XML cell strings directly, then ZIP them up with JSZip?"
Before:
DuckDB → JS objects → library → XML → ZIP
After:
DuckDB → XML strings (via SQL) → JSZip → ZIP
Skipping the intermediate JS object conversion means no memory pressure from large object graphs, and no GC pauses that were causing the browser to freeze.
The approach uses a CASE expression to handle each column type appropriately — numbers pass through as-is, dates get converted to Excel serial values, and strings get XML-escaped.
What the SQL Looks Like
```sql
SELECT
  '<row r="' || CAST(rn + 1 AS VARCHAR) || '">'
  -- String column: XML-escaped, output as inline string
  || CASE WHEN "name" IS NOT NULL
       THEN '<c r="A' || CAST(rn + 1 AS VARCHAR) || '" t="inlineStr"><is><t>'
            || REPLACE(REPLACE(REPLACE("name", '&', '&amp;'), '<', '&lt;'), '>', '&gt;')
            || '</t></is></c>'
       ELSE '' END
  -- Numeric column: output value directly
  || CASE WHEN "age" IS NOT NULL
       THEN '<c r="B' || CAST(rn + 1 AS VARCHAR) || '"><v>'
            || CAST("age" AS VARCHAR)
            || '</v></c>'
       ELSE '' END
  -- Date column: convert to Excel serial value (days since 1899-12-30)
  || CASE WHEN "created_at" IS NOT NULL
       THEN '<c r="C' || CAST(rn + 1 AS VARCHAR) || '" s="1"><v>'
            || CAST(date_diff('day', DATE '1899-12-30', "created_at") AS VARCHAR)
            || '</v></c>'
       ELSE '' END
  || '</row>' AS xml_row
FROM (
  SELECT *, ROW_NUMBER() OVER () AS rn
  FROM my_table
  LIMIT 50000 OFFSET 0
)
```
Each row in the result set becomes one XML element:
```xml
<row r="2"><c r="A2" t="inlineStr"><is><t>John Doe</t></is></c><c r="B2"><v>30</v></c><c r="C2" s="1"><v>45302</v></c></row>
<row r="3"><c r="A3" t="inlineStr"><is><t>Jane Smith</t></is></c><c r="B3"><v>25</v></c><c r="C3" s="1"><v>45150</v></c></row>
```
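To make the per-type handling easier to follow, here are the same three conversions written out as plain Python — purely illustrative (in LeapRows they happen inside the SQL above), with the function names my own. Note that the ampersand must be escaped first, and that the date epoch 1899-12-30 absorbs Excel's historical 1900 leap-year quirk:

```python
from datetime import date

def xml_escape(s: str) -> str:
    # '&' must be replaced first, or it would re-escape the other entities.
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

def excel_serial(d: date) -> int:
    # Excel stores dates as day counts from the 1899-12-30 epoch.
    return (d - date(1899, 12, 30)).days

def make_row(r: int, name: str, age: int, created_at: date) -> str:
    # Mirrors the SQL: inline string, raw number, date-as-serial.
    return (
        f'<row r="{r}">'
        f'<c r="A{r}" t="inlineStr"><is><t>{xml_escape(name)}</t></is></c>'
        f'<c r="B{r}"><v>{age}</v></c>'
        f'<c r="C{r}" s="1"><v>{excel_serial(created_at)}</v></c>'
        f'</row>'
    )

print(make_row(2, "John Doe", 30, date(2024, 1, 11)))
# → <row r="2"><c r="A2" t="inlineStr"><is><t>John Doe</t></is></c><c r="B2"><v>30</v></c><c r="C2" s="1"><v>45302</v></c></row>
```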
Combine these rows with a header row and the static XML files, compress with JSZip, and you have a valid XLSX file.
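The LIMIT 50000 OFFSET 0 in the query above hints at how the full export stays responsive: the table is pulled in fixed-size chunks, one query per chunk, rather than one giant result set. A sketch of that pagination (the chunk size matches the query above, but the query-builder itself is my illustration, not LeapRows' actual code):

```python
CHUNK_SIZE = 50_000

def chunk_queries(total_rows: int, chunk_size: int = CHUNK_SIZE):
    # Yield one XML-generating query per chunk; each chunk's rows can be
    # appended to sheet1.xml (and the UI given a chance to update) before
    # the next query runs.
    for offset in range(0, total_rows, chunk_size):
        yield (
            "SELECT ... AS xml_row FROM ("
            "SELECT *, ROW_NUMBER() OVER () AS rn FROM my_table "
            f"LIMIT {chunk_size} OFFSET {offset})"
        )

queries = list(chunk_queries(1_000_000))
# 1,000,000 rows at 50,000 per chunk → 20 queries; the last starts at OFFSET 950000
```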
Results
With ExcelJS, exporting 1 million rows took around 20 seconds and consumed over 1GB of memory (unfortunately I didn't capture a "before" screenshot for comparison).
After switching to the SQL-based XML approach:
- XML generation: 5.5s
- ZIP compression: 7.3s
- Memory usage: significantly reduced
- Post-export browser freeze: gone
The reduction in GC-eligible objects was what solved the post-conversion sluggishness.
Side note: compression could probably be moved to the WASM layer for even better performance, but that felt out of scope for now. 😇
Trade-offs
This approach doesn't support cell styling — no colors, borders, or Excel-specific formatting. For LeapRows that's fine, since the goal is pure data export. But it's worth keeping in mind if you need rich formatting.
Wrapping Up
Once I let go of the assumption that "XLSX is a complex format," it became clear that for plain data export, a minimal XML structure is completely sufficient.
DuckDB's string processing is fast enough to make "generate XML via SQL" a genuinely practical approach — and I suspect the same idea could be applied in other creative ways beyond XLSX export.
![Windows File Explorer showing the contents of a renamed XLSX file as a ZIP archive, revealing folders: _rels, docProps, xl, and a [Content_Types].xml file.](https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbsayknhxlb9nhu52tel.png)

