DEV Community

Pacharapol Withayasakpunt

Please ELI5 what Parquet is for, and NOT for

I am trying to understand how good Apache Parquet is as a:

  • Data storage format (when you do NOT have Hadoop; only your local computer)
    • How large are the files?
    • How reliable is it?
  • Query-able format
    • Do I have to build an index first? (Unique indices are probably not possible?)
    • Speed?
    • Resource usage?

As far as I understand, Parquet may not be good for frequent writes or updates, but is it good enough for a static database?

You can use the ever-popular SQLite as the benchmark for comparison, disregarding SQLite-specific features such as foreign keys, unique indices, full-text search, and multiple tables.

BTW, I have seen a SQLite file grow to 700 MB when the final CSV data was only a few megabytes, so I am not sure whether it is reliable as storage anymore...
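
On the SQLite size point, one possible explanation (not confirmed for my file) is leftover free pages: with auto-vacuum off, which is the usual default, SQLite keeps the pages freed by deleted rows inside the file instead of shrinking it, and `VACUUM` rebuilds the file compactly. Below is a minimal sketch of how that could be checked, assuming Python's built-in `sqlite3` module; the function name, table name `t`, and file name `demo.sqlite` are made up for illustration.

```python
import os
import sqlite3
import tempfile


def sizes_around_vacuum(row_count: int = 20_000) -> tuple[int, int, int]:
    """Return file sizes in bytes: (after insert, after delete, after VACUUM)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite")  # hypothetical file name
        conn = sqlite3.connect(path)
        try:
            # Hypothetical table: one integer key plus a 200-character payload.
            conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
            conn.executemany(
                "INSERT INTO t (payload) VALUES (?)",
                (("x" * 200,) for _ in range(row_count)),
            )
            conn.commit()
            size_full = os.path.getsize(path)

            # Delete about 90% of the rows; the freed pages normally stay in the file.
            conn.execute("DELETE FROM t WHERE id > ?", (row_count // 10,))
            conn.commit()
            size_after_delete = os.path.getsize(path)

            # VACUUM rewrites the database without the free pages.
            conn.execute("VACUUM")
            size_after_vacuum = os.path.getsize(path)
        finally:
            conn.close()
    return size_full, size_after_delete, size_after_vacuum


if __name__ == "__main__":
    full, after_delete, after_vacuum = sizes_around_vacuum()
    print(f"after insert: {full} bytes")
    print(f"after delete: {after_delete} bytes")
    print(f"after VACUUM: {after_vacuum} bytes")
```

If a real file shrinks a lot after `VACUUM`, the 700 MB was probably free space rather than data; if it does not, something else (indices, duplicated rows) is likely responsible.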
