DEV Community

Aleksei Aleinikov

💾 Parquet or Avro? CSV or JSON?

For a data engineer, picking the wrong format can slow queries, bloat disks, or leave analysts crying in a text editor.

Here’s my go-to cheat sheet:
✅ Parquet: columnar and compressed, so big analytical scans read only the columns they need; nested data is fine.
✅ CSV: the universal handshake, opens in any spreadsheet.
✅ JSON: flexible for APIs and webhooks.
✅ Avro: schema-safe, perfect for streaming.
✅ ORC: dense and fast for heavy Hive/Spark crunching.
✅ YAML: configs your teammates can actually read.
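
To make the CSV vs. JSON trade-off from the cheat sheet concrete, here is a minimal sketch using only Python's standard library. The sample records, the `|` separator for tags, and the helper names `roundtrip_json` and `roundtrip_csv` are assumptions made up for illustration, not something from the linked article. It round-trips the same rows through both formats to show what survives.

```python
import csv
import io
import json

# Hypothetical sample records, invented for illustration only.
records = [
    {"id": 1, "amount": 9.5, "tags": ["new", "promo"]},
    {"id": 2, "amount": 12.0, "tags": []},
]


def roundtrip_json(rows):
    """Serialize rows to JSON text and parse them back."""
    return json.loads(json.dumps(rows))


def roundtrip_csv(rows):
    """Write rows to CSV text and read them back with DictReader."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "amount", "tags"])
    writer.writeheader()
    for row in rows:
        # CSV has no nested types, so the list is flattened to a string by hand.
        writer.writerow({**row, "tags": "|".join(row["tags"])})
    buf.seek(0)
    return list(csv.DictReader(buf))


json_rows = roundtrip_json(records)
csv_rows = roundtrip_csv(records)

print(json_rows[0])  # numbers and the nested list survive the round trip
print(csv_rows[0])   # every value comes back as a plain string
```

JSON keeps numbers and nesting intact, while CSV hands everything back as text and leaves type recovery to the reader. That is the price of opening cleanly in any spreadsheet.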

💡 The full article includes real code snippets and use cases (yes, even screenshots for your future self):

https://medium.com/data-engineer-things/top-file-formats-every-data-engineer-should-know-in-2025-8ec8f20205d0
