So you’ve got a dataset — a few names, marks, maybe your friend’s secret crush score 😏 — and you’re told to store it. But how?
That’s where our six heroes of data storage enter:
CSV, SQL, JSON, Parquet, XML, and Avro.
Each has its own personality — some simple, some complicated, and some that just exist to confuse you at 3 AM.
Let’s meet the squad 👇
🧩 Our Mini Dataset
Let’s keep it simple:
| Name | Register Number | Subject | Marks |
|---|---|---|---|
| Praba | 21CS001 | Cloud Data | 92 |
| Alex | 21CS002 | Big Data | 88 |
| Sam | 21CS003 | AI | 95 |
1️⃣ CSV — The Simplicity King 👑
CSV (Comma-Separated Values) is the humble plain-text format that stores your data one record per line, with fields separated by commas.
No fancy metadata, no data types, no drama. Just pure simplicity.
Name,Register Number,Subject,Marks
Praba,21CS001,Cloud Data,92
Alex,21CS002,Big Data,88
Sam,21CS003,AI,95
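One side effect of "no fancy metadata" is that CSV carries no type information, so every field comes back as text. Here is a minimal Python sketch using the standard-library `csv` module. The data is embedded as a string (my choice, so the snippet runs without a file on disk):

```python
import csv
import io

# The sample CSV, embedded as a string so the example is self-contained.
CSV_TEXT = """Name,Register Number,Subject,Marks
Praba,21CS001,Cloud Data,92
Alex,21CS002,Big Data,88
Sam,21CS003,AI,95
"""

# DictReader uses the header line as the keys for every row.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# CSV has no types: Marks arrives as the string "92", not the integer 92.
raw_marks = [row["Marks"] for row in rows]

# Converting to numbers is the reader's job.
marks = [int(m) for m in raw_marks]
average = sum(marks) / len(marks)

print(raw_marks)
print(marks, average)
```

If a column ever contains a comma or a line break, the writer has to quote it, which is where "pure simplicity" starts to wobble a little.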
🗣️ “I’m small, fast, and open in Excel. What else do you need?” — CSV, probably
2️⃣ SQL — The Organized Perfectionist 🧮
SQL databases store data in structured tables with rows and columns, like a well-maintained hostel attendance sheet.
It loves order and rules — “no duplicate primary keys, please.”
```sql
CREATE TABLE Students (
    Name           VARCHAR(50),
    RegisterNumber VARCHAR(10) PRIMARY KEY,  -- unique per student, so duplicates are rejected
    Subject        VARCHAR(50),
    Marks          INT
);

INSERT INTO Students (Name, RegisterNumber, Subject, Marks) VALUES
    ('Praba', '21CS001', 'Cloud Data', 92),
    ('Alex',  '21CS002', 'Big Data',   88),
    ('Sam',   '21CS003', 'AI',         95);
```
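To see the "no duplicate primary keys" rule actually bite, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. It declares `RegisterNumber` as the primary key, which is an assumption on my part for this demo, and the "Impostor" row is made up purely to trigger the error:

```python
import sqlite3

# In-memory SQLite database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Students (
        Name           VARCHAR(50),
        RegisterNumber VARCHAR(10) PRIMARY KEY,
        Subject        VARCHAR(50),
        Marks          INT
    )
    """
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?)",
    [
        ("Praba", "21CS001", "Cloud Data", 92),
        ("Alex", "21CS002", "Big Data", 88),
        ("Sam", "21CS003", "AI", 95),
    ],
)

# Ask the database a question instead of scanning a file by hand.
top = conn.execute(
    "SELECT Name, Marks FROM Students ORDER BY Marks DESC LIMIT 1"
).fetchone()

# A second row with an existing RegisterNumber violates the primary key.
try:
    conn.execute(
        "INSERT INTO Students VALUES (?, ?, ?, ?)",
        ("Impostor", "21CS001", "AI", 50),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(top, duplicate_rejected)
```

Other database engines report the violation with their own error types, but the idea is the same: the schema enforces the rule so your code doesn't have to.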
🗣️ “I believe in relationships… relational databases, to be precise.” — SQL
3️⃣ JSON — The Developer’s BFF 💻
JSON (JavaScript Object Notation) is loved by APIs and front-end devs everywhere.
It’s structured yet flexible — perfect for sending data between systems.
[
{ "Name": "Praba", "RegisterNumber": "21CS001", "Subject": "Cloud Data", "Marks": 92 },
{ "Name": "Alex", "RegisterNumber": "21CS002", "Subject": "Big Data", "Marks": 88 },
{ "Name": "Sam", "RegisterNumber": "21CS003", "Subject": "AI", "Marks": 95 }
]
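Unlike CSV, JSON keeps a basic distinction between strings and numbers. A quick round trip with Python's standard `json` module shows it (the variable names are just mine for this sketch):

```python
import json

students = [
    {"Name": "Praba", "RegisterNumber": "21CS001", "Subject": "Cloud Data", "Marks": 92},
    {"Name": "Alex", "RegisterNumber": "21CS002", "Subject": "Big Data", "Marks": 88},
    {"Name": "Sam", "RegisterNumber": "21CS003", "Subject": "AI", "Marks": 95},
]

# Serialize to a JSON string, which is what you would send between systems.
payload = json.dumps(students)

# Parse it back on the receiving side.
decoded = json.loads(payload)

# Marks is still an integer after the round trip; no manual conversion needed.
marks = [s["Marks"] for s in decoded]

print(payload)
print(marks)
```

The price for that flexibility is that every record repeats every field name, which adds up quickly on large datasets.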
🗣️ “I speak fluently with JavaScript. JSON.stringify me!”
4️⃣ Parquet — The Speed Demon 🏎️
Parquet is a columnar storage format used in Big Data tools like Spark and Hadoop.
It’s highly compressed and optimized for reading specific columns fast — like “give me all the marks” instead of scanning the whole file.
Example?
Parquet is a binary format, so you can’t read it in a text editor, but conceptually the data is laid out like this:
Columns:
Name → ["Praba", "Alex", "Sam"]
RegisterNumber → ["21CS001", "21CS002", "21CS003"]
Subject → ["Cloud Data", "Big Data", "AI"]
Marks → [92, 88, 95]
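The column-by-column idea can be sketched in plain Python. To be clear, this is only an illustration of row-oriented versus column-oriented layout, not the actual Parquet file format (which adds row groups, encodings, compression, and metadata, and is normally read through a library such as pyarrow):

```python
# Row-oriented layout: each record keeps all of its fields together
# (this is how CSV, JSON, and Avro arrange data).
rows = [
    ("Praba", "21CS001", "Cloud Data", 92),
    ("Alex", "21CS002", "Big Data", 88),
    ("Sam", "21CS003", "AI", 95),
]
field_names = ["Name", "RegisterNumber", "Subject", "Marks"]

# Column-oriented layout: pivot so that all values of one field sit together.
columns = {
    name: list(values)
    for name, values in zip(field_names, zip(*rows))
}

# "Give me all the marks" now touches one list instead of every record.
marks = columns["Marks"]
average = sum(marks) / len(marks)

print(columns)
print(marks, average)
```

Keeping similar values next to each other is also what makes columnar data compress well: a column of marks is far more repetitive than a row mixing names, IDs, and numbers.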
🗣️ “I’m not pretty, but I’m fast. Ask any data engineer.”
5️⃣ XML — The Drama Queen 📜
XML (Extensible Markup Language) loves tags — open tags, close tags, nested tags… it’s basically HTML’s over-serious cousin.
<Students>
<Student>
<Name>Praba</Name>
<RegisterNumber>21CS001</RegisterNumber>
<Subject>Cloud Data</Subject>
<Marks>92</Marks>
</Student>
<Student>
<Name>Alex</Name>
<RegisterNumber>21CS002</RegisterNumber>
<Subject>Big Data</Subject>
<Marks>88</Marks>
</Student>
<Student>
<Name>Sam</Name>
<RegisterNumber>21CS003</RegisterNumber>
<Subject>AI</Subject>
<Marks>95</Marks>
</Student>
</Students>
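Reading those tags back is straightforward with Python's standard `xml.etree.ElementTree`. A minimal sketch, with the document embedded as a string so it runs on its own:

```python
import xml.etree.ElementTree as ET

XML_TEXT = """<Students>
  <Student>
    <Name>Praba</Name><RegisterNumber>21CS001</RegisterNumber>
    <Subject>Cloud Data</Subject><Marks>92</Marks>
  </Student>
  <Student>
    <Name>Alex</Name><RegisterNumber>21CS002</RegisterNumber>
    <Subject>Big Data</Subject><Marks>88</Marks>
  </Student>
  <Student>
    <Name>Sam</Name><RegisterNumber>21CS003</RegisterNumber>
    <Subject>AI</Subject><Marks>95</Marks>
  </Student>
</Students>"""

root = ET.fromstring(XML_TEXT)

# Each <Student> element becomes a dict of {tag name: text content}.
students = [
    {child.tag: child.text for child in student}
    for student in root.findall("Student")
]

# Like CSV, element text is always a string, so numbers need converting.
marks = [int(s["Marks"]) for s in students]

print(students[0])
print(marks)
```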
🗣️ “I may be verbose, but at least I have structure!” — XML
6️⃣ Avro — The Efficient Coder 🧠
Avro is a row-based binary format designed for fast serialization and compact storage.
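It’s schema-driven and widely used with Kafka and other streaming pipelines.
Human-readable version (simplified; a real Avro file stores the schema as JSON in its header and the records as binary, and the "data" key below is only for illustration):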
{
"type": "record",
"name": "Student",
"fields": [
{"name": "Name", "type": "string"},
{"name": "RegisterNumber", "type": "string"},
{"name": "Subject", "type": "string"},
{"name": "Marks", "type": "int"}
],
"data": [
{"Name": "Praba", "RegisterNumber": "21CS001", "Subject": "Cloud Data", "Marks": 92},
{"Name": "Alex", "RegisterNumber": "21CS002", "Subject": "Big Data", "Marks": 88},
{"Name": "Sam", "RegisterNumber": "21CS003", "Subject": "AI", "Marks": 95}
]
}
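To get a feel for why Avro is compact, here is a hand-rolled sketch of how one record would be binary-encoded under the schema's field order: strings as a length followed by UTF-8 bytes, integers as zig-zag variable-length numbers, and no field names at all. The helper names (`zigzag_varint`, `encode_string`, `encode_student`) are hypothetical and written just for this post; in real code you would use an Avro library, and this sketch skips the file header and block framing entirely:

```python
import json


def zigzag_varint(n: int) -> bytes:
    """Encode an integer as zig-zag followed by a base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small negatives become small positives
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """Encode a string as its byte length (zig-zag varint) plus UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


def encode_student(record: dict) -> bytes:
    """Concatenate the fields in schema order: no names, no separators."""
    return (
        encode_string(record["Name"])
        + encode_string(record["RegisterNumber"])
        + encode_string(record["Subject"])
        + zigzag_varint(record["Marks"])
    )


praba = {"Name": "Praba", "RegisterNumber": "21CS001", "Subject": "Cloud Data", "Marks": 92}

avro_bytes = encode_student(praba)
json_bytes = json.dumps(praba).encode("utf-8")

print(len(avro_bytes), "bytes as Avro-style binary")
print(len(json_bytes), "bytes as JSON")
```

The binary record is smaller because the field names live once in the schema instead of being repeated in every record. That is also why a reader needs the schema to make sense of the bytes.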
🗣️ “I’m compact, I’m fast, and I don’t need XML’s drama.” — Avro
💬 Written by Prabakaran SR — a caffeine-fueled data explorer trying not to break the pipeline again ☕💾