
Aleksei Aleinikov

💾 Parquet or Avro? CSV or JSON?

For a data engineer, picking the wrong format can slow queries, bloat disks, or leave analysts crying in a text editor.

Here's my go-to cheat sheet:
✅ Parquet: columnar and compressed, so analytical scans over huge datasets read only the columns they need; nested data is fine.
✅ CSV: the universal handshake, opens in any spreadsheet.
✅ JSON: flexible for APIs and webhooks.
✅ Avro: schema-safe, a good fit for streaming.
✅ ORC: dense and fast for heavy Hive/Spark crunching.
✅ YAML: configs your teammates can actually read.
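
To make the CSV versus JSON trade-off from the cheat sheet concrete, here is a minimal sketch using only the Python standard library. The sample records and the helper names `csv_round_trip` and `json_round_trip` are made up for illustration; Parquet, Avro, and ORC need third-party libraries, so they are left out here.

```python
import csv
import io
import json

# Hypothetical sample records, invented for this illustration.
records = [
    {"id": 1, "name": "sensor-a", "reading": 21.5},
    {"id": 2, "name": "sensor-b", "reading": 19.0},
]


def csv_round_trip(rows):
    """Write rows to CSV text in memory, then read them back."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    buffer.seek(0)
    # CSV carries no type information, so every value comes back as a string.
    return list(csv.DictReader(buffer))


def json_round_trip(rows):
    """Serialize rows to JSON text, then parse them back."""
    # JSON keeps numbers, strings, and nesting distinct.
    return json.loads(json.dumps(rows))


csv_rows = csv_round_trip(records)
json_rows = json_round_trip(records)

print(type(csv_rows[0]["id"]).__name__)   # str
print(type(json_rows[0]["id"]).__name__)  # int
```

The takeaway is that CSV is easy to open anywhere but leaves type handling to the reader, while JSON preserves basic types at the cost of more verbose files.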

💡 Real code snippets and use cases included (yes, even screenshots for your future self).

https://medium.com/data-engineer-things/top-file-formats-every-data-engineer-should-know-in-2025-8ec8f20205d0
