By Design — Episode 02
No specification. No schema. No data types. No standard encoding. No committee. No owner. No version number.
In 1972, IBM's Fortran compiler started accepting comma-separated values as input. Nobody wrote a design document. Nobody proposed a standard. Someone needed to move data from one place to another, separated values by commas, and it worked. That was the entire specification.
Thirty-three years later, Yakov Shafranovich wrote RFC 4180, an informational memo that documented what was already everywhere. The format had moved faster than its standardisation.
The Complaint
"CSV is not a real format. No types, no schema, no validation. One misplaced comma and your import breaks. One semicolon-delimited file from Germany and your pipeline explodes. It is amateur hour in a text file."
Every data engineer has said this. Most of them said it today.
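The semicolon complaint is easy to reproduce. The sketch below is illustrative only: the sample data and the helper name `read_rows` are made up for this example, and it assumes a locale that uses `;` as the delimiter and `,` as the decimal separator. It uses Python's standard `csv` module.

```python
import csv
import io

# Hypothetical export: ';' separates fields, ',' is the decimal separator.
german_export = "artikel;preis\nBrot;1,99\nMilch;0,89\n"

def read_rows(text, delimiter):
    """Parse CSV text with an explicitly chosen delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

# Reading with the wrong delimiter raises no error; it just splits in the wrong place.
wrong = read_rows(german_export, ",")
# Reading with the delimiter the producer actually used recovers the columns.
right = read_rows(german_export, ";")

print(wrong[1])  # ['Brot;1', '99']
print(right[1])  # ['Brot', '1,99']
```

Nothing fails loudly here. The wrong read still returns rows, which is why this kind of breakage tends to surface downstream rather than at import time.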
The Decision
Nobody decided. That is the decision. No committee means no politics. No schema means no version conflicts. No types means every system on earth can read it: your database, your spreadsheet, your shell, your thirty-year-old mainframe. grep finds rows. awk splits columns. sort orders them. The entire Unix toolchain works on CSV without knowing what CSV is.
For simple data it requires no parser beyond "split at delimiter." It requires no agreement beyond "the first row might be headers." It requires no dependency, no library, no runtime.
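As a rough illustration of "split at delimiter," a minimal sketch might look like this. The function name `parse_simple_csv` and the sample data are hypothetical, and the sketch assumes the first row holds headers and that no field contains the delimiter, a quote, or a line break.

```python
def parse_simple_csv(text, delimiter=","):
    """Split each non-empty line at the delimiter; treat the first row as headers."""
    rows = [line.split(delimiter) for line in text.splitlines() if line]
    header, body = rows[0], rows[1:]
    # Pair each value with its column name.
    return [dict(zip(header, row)) for row in body]

sample = "name,city\nAda,London\nGrace,New York\n"
records = parse_simple_csv(sample)

print(records[1]["city"])  # New York
```

No imports and no dependencies, only string methods that nearly every language has.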
The Trade-Off
Encoding chaos. Delimiter conflicts. No escaping rule that every producer actually follows, even though RFC 4180 describes one. A quoted field containing a comma, inside a comma-delimited file, inside a system that does not handle quotes. One has been there. It was not pleasant.
The format pays for its universality with fragility at the edges. Every edge case is a surprise. Every surprise is a Friday afternoon.
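The quoted-comma case can be shown in a few lines. This is a sketch with made-up sample data: it compares a naive split against Python's standard `csv` module, which by default treats double quotes as field delimiters in the way RFC 4180 describes.

```python
import csv
import io

# Hypothetical data: the second field of the data row contains a comma
# and is therefore wrapped in double quotes.
text = 'id,name,note\n1,"Smith, Jane",ok\n'

# A system that does not handle quotes: split every line at every comma.
naive = [line.split(",") for line in text.splitlines()]

# A quote-aware reader keeps the quoted field in one piece.
quoted = list(csv.reader(io.StringIO(text)))

print(len(naive[1]), naive[1])    # 4 ['1', '"Smith', ' Jane"', 'ok']
print(len(quoted[1]), quoted[1])  # 3 ['1', 'Smith, Jane', 'ok']
```

The naive reader finds four fields under a three-column header, and whether that surfaces as an error or as silently shifted columns depends entirely on the consumer.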
The Proof
60% of enterprises use CSV for data exchange between systems. Every spreadsheet application. Every database export. Every CRM, ERP, and accounting tool. RFC 4180 came in 2005: by then, billions of CSV files already existed.
XML tried to replace it: too verbose. JSON tried: no tabular structure. Parquet tried: requires a runtime. Avro tried: requires a schema registry. CSV survived them all, because it requires nothing but a text editor and the ability to count commas.
The Principle
The format that requires no agreement will always beat the format that requires consensus. CSV has no governance, no authority, no design document. That is not a flaw. It is the reason it outlived every format that tried to replace it.
Nobody designed CSV. Fifty-three years later, everybody uses it.
Read the full article on vivianvoss.net →
By Vivian Voss — System Architect & Software Developer. Follow me on LinkedIn for daily technical writing.
