Why I Built Parquet Data

Parquet is my favourite format for storing tabular data.

Parquet compresses well, has a strong schema, and is efficient to analyse. CSV, the de facto standard for storing tabular data, has none of these properties.

For all its benefits, Parquet has some clear downsides.

Parquet is a binary format. To see inside a Parquet file you need to either write some code to parse it, or use a specialized Parquet Viewer.
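To give a feel for what "binary format" means here, below is a minimal Python sketch that uses only the standard library. It relies on the documented layout of an unencrypted Parquet file: the file starts and ends with the 4-byte magic PAR1, and the 4 bytes just before the trailing magic hold the little-endian length of the footer metadata. The function name parquet_footer_length is hypothetical, and the sketch only inspects the envelope. Decoding the metadata itself or any column data needs a real Parquet library.

```python
import struct
from pathlib import Path

MAGIC = b"PAR1"  # magic bytes at the start and end of an unencrypted Parquet file


def parquet_footer_length(path):
    """Return the footer metadata length in bytes, or None if the file
    does not look like a Parquet file.

    Hypothetical helper for illustration; it does not parse the metadata.
    """
    size = Path(path).stat().st_size
    # Smallest possible envelope: leading magic (4) + footer length (4) + trailing magic (4).
    if size < 12:
        return None
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, 2)  # 2 == os.SEEK_END: jump to the last 8 bytes
        tail = f.read(8)
    if head != MAGIC or tail[4:] != MAGIC:
        return None
    # The footer length is a 4-byte little-endian unsigned integer.
    (footer_len,) = struct.unpack("<I", tail[:4])
    # A footer cannot be larger than the file minus the 12 envelope bytes.
    if footer_len > size - 12:
        return None
    return footer_len
```

Calling this on a CSV file returns None, while a well-formed Parquet file yields the size of a metadata block that a text editor cannot display meaningfully.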

Without a Parquet Viewer, the easiest way to look inside a Parquet file is to convert it to CSV and then open it in a regular text editor or Excel.

CSV files are interoperable: they are understood by many kinds of software. But they are harder to analyse than Parquet files, and many text editors struggle to load a CSV file bigger than 20 MB. It would be great to keep data in Parquet format while doing analysis and debugging.

I built Parquet Data because I love the Parquet format but had trouble working with it. I made the Parquet Viewer and the converters to make it easier to debug some code that I was working on.

DEV Community
