<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: DHANYAA R S </title>
    <description>The latest articles on DEV Community by DHANYAA R S  (@dhanyaa_rs).</description>
    <link>https://dev.to/dhanyaa_rs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3456167%2F8bb9f221-5d51-48aa-bb9a-10ecaaff63d4.png</url>
      <title>DEV Community: DHANYAA R S </title>
      <link>https://dev.to/dhanyaa_rs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dhanyaa_rs"/>
    <language>en</language>
    <item>
      <title>6 Common Data Formats in Data Analytics</title>
      <dc:creator>DHANYAA R S </dc:creator>
      <pubDate>Wed, 08 Oct 2025 07:17:03 +0000</pubDate>
      <link>https://dev.to/dhanyaa_rs/6-common-data-formats-in-data-analytics-4f5</link>
      <guid>https://dev.to/dhanyaa_rs/6-common-data-formats-in-data-analytics-4f5</guid>
      <description>&lt;p&gt;In the world of data analytics, information can come in many formats. Each format serves different purposes—some are human-readable, others are optimized for storage or speed. In this article, we’ll explore six popular data formats used in analytics: CSV, SQL, JSON, Parquet, XML, and Avro. We’ll use a simple dataset to demonstrate each format.&lt;br&gt;
&lt;strong&gt;Sample Dataset&lt;/strong&gt;&lt;br&gt;
[{'Name': 'Dhanyaa', 'Register_No': 'KPR23CB007', 'Subject': 'Data Analytics', 'Marks': 92}, {'Name': 'Krishna', 'Register_No': 'KPR23CB009', 'Subject': 'Cloud Computing', 'Marks': 88}, {'Name': 'Aarav', 'Register_No': 'KPR23CB011', 'Subject': 'AI &amp;amp; ML', 'Marks': 95}]&lt;br&gt;
&lt;strong&gt;1. CSV (Comma Separated Values)&lt;/strong&gt;&lt;br&gt;
CSV is one of the simplest and most widely used data formats. It stores data in plain text, where each line represents a record and columns are separated by commas.&lt;br&gt;
Name,Register_No,Subject,Marks&lt;br&gt;
Dhanyaa,KPR23CB007,Data Analytics,92&lt;br&gt;
Krishna,KPR23CB009,Cloud Computing,88&lt;br&gt;
Aarav,KPR23CB011,AI &amp;amp; ML,95&lt;/p&gt;
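As a quick sanity check, the CSV round trip can be sketched with Python's standard csv module (a minimal sketch: the dataset is trimmed to three fields, and an in-memory buffer stands in for a file):

```python
import csv
import io

# The sample dataset, trimmed to three fields for brevity
students = [
    {"Name": "Dhanyaa", "Register_No": "KPR23CB007", "Marks": 92},
    {"Name": "Krishna", "Register_No": "KPR23CB009", "Marks": 88},
    {"Name": "Aarav", "Register_No": "KPR23CB011", "Marks": 95},
]

# Write the records as CSV text (a StringIO buffer stands in for a file)
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["Name", "Register_No", "Marks"])
writer.writeheader()
writer.writerows(students)
csv_text = buf.getvalue()

# Read it back; note that every value comes back as a string,
# so Marks is "92", not 92 -- CSV carries no type information
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows)
```

The loss of types on the way back is CSV's main trade-off against formats like JSON or Avro.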

&lt;p&gt;&lt;strong&gt;2. SQL (Relational Table Format)&lt;/strong&gt;&lt;br&gt;
SQL databases store data in tables with defined columns and rows. You can create, read, update, and delete records using SQL queries.&lt;br&gt;
CREATE TABLE students (&lt;br&gt;
    Name VARCHAR(50),&lt;br&gt;
    Register_No VARCHAR(20),&lt;br&gt;
    Subject VARCHAR(50),&lt;br&gt;
    Marks INT&lt;br&gt;
);&lt;/p&gt;

&lt;p&gt;INSERT INTO students VALUES&lt;br&gt;
('Dhanyaa', 'KPR23CB007', 'Data Analytics', 92),&lt;br&gt;
('Krishna', 'KPR23CB009', 'Cloud Computing', 88),&lt;br&gt;
('Aarav', 'KPR23CB011', 'AI &amp;amp; ML', 95);&lt;/p&gt;
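The same table can be exercised end to end with Python's built-in sqlite3 module (a sketch against an in-memory database, with the Subject column dropped for brevity):

```python
import sqlite3

# In-memory SQLite database to try out the students table
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students (
        Name VARCHAR(50),
        Register_No VARCHAR(20),
        Marks INT
    )
""")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [
        ("Dhanyaa", "KPR23CB007", 92),
        ("Krishna", "KPR23CB009", 88),
        ("Aarav", "KPR23CB011", 95),
    ],
)

# Read the rows back, highest marks first
top = conn.execute("SELECT Name, Marks FROM students ORDER BY Marks DESC").fetchall()
print(top)
conn.close()
```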

&lt;p&gt;&lt;strong&gt;3. JSON (JavaScript Object Notation)&lt;/strong&gt;&lt;br&gt;
JSON is a lightweight data-interchange format that’s easy for humans to read and machines to parse. It’s widely used in APIs and data transmission.&lt;br&gt;
{&lt;br&gt;
  "students": [&lt;br&gt;
    {"Name": "Dhanyaa", "Register_No": "KPR23CB007", "Subject": "Data Analytics", "Marks": 92},&lt;br&gt;
    {"Name": "Krishna", "Register_No": "KPR23CB009", "Subject": "Cloud Computing", "Marks": 88},&lt;br&gt;
    {"Name": "Aarav", "Register_No": "KPR23CB011", "Subject": "AI &amp;amp; ML", "Marks": 95}&lt;br&gt;
  ]&lt;br&gt;
}&lt;/p&gt;
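Serializing and parsing this structure takes two calls with Python's standard json module (a trimmed sketch of the document above):

```python
import json

payload = {
    "students": [
        {"Name": "Dhanyaa", "Register_No": "KPR23CB007", "Marks": 92},
        {"Name": "Krishna", "Register_No": "KPR23CB009", "Marks": 88},
        {"Name": "Aarav", "Register_No": "KPR23CB011", "Marks": 95},
    ]
}

# Serialize to a JSON string, then parse it back
text = json.dumps(payload, indent=2)
parsed = json.loads(text)

# Unlike CSV, JSON preserves the integer type of Marks through the round trip
print(type(parsed["students"][0]["Marks"]))
```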

&lt;p&gt;&lt;strong&gt;4. Parquet (Columnar Storage Format)&lt;/strong&gt;&lt;br&gt;
Parquet is a columnar storage format optimized for big data processing frameworks like Apache Spark. It stores data by columns instead of rows, making queries faster for analytical workloads.&lt;br&gt;
Example representation (simplified for illustration):&lt;br&gt;
| Column Name | Values                   |&lt;br&gt;
|--------------|--------------------------|&lt;br&gt;
| Name         | Dhanyaa, Krishna, Aarav    |&lt;br&gt;
| Register_No  | KPR23CB007, KPR23CB009, KPR23CB011 |&lt;br&gt;
| Subject      | Data Analytics, Cloud Computing, AI &amp;amp; ML |&lt;br&gt;
| Marks        | 92, 88, 95               |&lt;/p&gt;
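Writing actual Parquet files requires a library such as pyarrow or pandas; the plain-Python sketch below only illustrates the row-to-column pivot that columnar formats perform (it is not real Parquet encoding, which adds compression and binary framing):

```python
# Row-oriented records, as CSV or JSON would store them
rows = [
    {"Name": "Dhanyaa", "Register_No": "KPR23CB007", "Marks": 92},
    {"Name": "Krishna", "Register_No": "KPR23CB009", "Marks": 88},
    {"Name": "Aarav", "Register_No": "KPR23CB011", "Marks": 95},
]

# Pivot to a column-oriented layout: one contiguous list per column.
# This is the shape Parquet stores on disk, which is why a query that
# touches one column never has to read the others.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# An analytical query like AVG(Marks) scans a single column
avg_marks = sum(columns["Marks"]) / len(columns["Marks"])
print(columns["Marks"], avg_marks)
```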

&lt;p&gt;&lt;strong&gt;5. XML (Extensible Markup Language)&lt;/strong&gt;&lt;br&gt;
XML uses custom tags to define and structure data. Although more verbose, it’s useful for hierarchical data representation and data exchange.&lt;br&gt;
&amp;lt;students&amp;gt;&lt;br&gt;
  &amp;lt;student&amp;gt;&lt;br&gt;
    &amp;lt;Name&amp;gt;Dhanyaa&amp;lt;/Name&amp;gt;&lt;br&gt;
    &amp;lt;Register_No&amp;gt;KPR23CB007&amp;lt;/Register_No&amp;gt;&lt;br&gt;
    &amp;lt;Subject&amp;gt;Data Analytics&amp;lt;/Subject&amp;gt;&lt;br&gt;
    &amp;lt;Marks&amp;gt;92&amp;lt;/Marks&amp;gt;&lt;br&gt;
  &amp;lt;/student&amp;gt;&lt;br&gt;
  &amp;lt;student&amp;gt;&lt;br&gt;
    &amp;lt;Name&amp;gt;Krishna&amp;lt;/Name&amp;gt;&lt;br&gt;
    &amp;lt;Register_No&amp;gt;KPR23CB009&amp;lt;/Register_No&amp;gt;&lt;br&gt;
    &amp;lt;Subject&amp;gt;Cloud Computing&amp;lt;/Subject&amp;gt;&lt;br&gt;
    &amp;lt;Marks&amp;gt;88&amp;lt;/Marks&amp;gt;&lt;br&gt;
  &amp;lt;/student&amp;gt;&lt;br&gt;
  &amp;lt;student&amp;gt;&lt;br&gt;
    &amp;lt;Name&amp;gt;Aarav&amp;lt;/Name&amp;gt;&lt;br&gt;
    &amp;lt;Register_No&amp;gt;KPR23CB011&amp;lt;/Register_No&amp;gt;&lt;br&gt;
    &amp;lt;Subject&amp;gt;AI &amp;amp;amp; ML&amp;lt;/Subject&amp;gt;&lt;br&gt;
    &amp;lt;Marks&amp;gt;95&amp;lt;/Marks&amp;gt;&lt;br&gt;
  &amp;lt;/student&amp;gt;&lt;br&gt;
&amp;lt;/students&amp;gt;&lt;br&gt;
&lt;/p&gt;
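Python's standard xml.etree.ElementTree can build and walk this hierarchy; the sketch below constructs a students tree programmatically (Subject omitted for brevity), serializes it, and reads it back:

```python
import xml.etree.ElementTree as ET

# Build the students tree element by element
root = ET.Element("students")
for name, reg, marks in [
    ("Dhanyaa", "KPR23CB007", 92),
    ("Krishna", "KPR23CB009", 88),
    ("Aarav", "KPR23CB011", 95),
]:
    student = ET.SubElement(root, "student")
    ET.SubElement(student, "Name").text = name
    ET.SubElement(student, "Register_No").text = reg
    # XML text content is always a string, so numbers must be converted
    ET.SubElement(student, "Marks").text = str(marks)

# Serialize to text, then parse it back and walk the hierarchy
xml_text = ET.tostring(root, encoding="unicode")
parsed = ET.fromstring(xml_text)
names = [s.findtext("Name") for s in parsed.findall("student")]
marks = [int(s.findtext("Marks")) for s in parsed.findall("student")]
print(names, marks)
```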

&lt;p&gt;&lt;strong&gt;6. Avro (Row-Based Storage Format)&lt;/strong&gt;&lt;br&gt;
Avro is a binary row-based format developed under Apache Hadoop. It stores data along with its schema, which makes it efficient for serialization.&lt;br&gt;
Schema Example:&lt;br&gt;
{&lt;br&gt;
  "type": "record",&lt;br&gt;
  "name": "Student",&lt;br&gt;
  "fields": [&lt;br&gt;
    {"name": "Name", "type": "string"},&lt;br&gt;
    {"name": "Register_No", "type": "string"},&lt;br&gt;
    {"name": "Subject", "type": "string"},&lt;br&gt;
    {"name": "Marks", "type": "int"}&lt;br&gt;
  ]&lt;br&gt;
}&lt;br&gt;
Data Example (in JSON-like representation):&lt;br&gt;
{"Name": "Dhanyaa", "Register_No": "KPR23CB007", "Subject": "Data Analytics", "Marks": 92}&lt;br&gt;
{"Name": "Krishna", "Register_No": "KPR23CB009", "Subject": "Cloud Computing", "Marks": 88}&lt;br&gt;
{"Name": "Aarav", "Register_No": "KPR23CB011", "Subject": "AI &amp;amp; ML", "Marks": 95}&lt;/p&gt;
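Real Avro serialization needs a library such as fastavro or the official avro package; as a stdlib-only sketch, the schema can at least be loaded with json and used for a naive type check of records against its fields (this illustrates why the embedded schema matters, not how Avro encodes bytes):

```python
import json

# The record schema from above, loaded as plain JSON
schema = json.loads("""
{
  "type": "record",
  "name": "Student",
  "fields": [
    {"name": "Name", "type": "string"},
    {"name": "Register_No", "type": "string"},
    {"name": "Subject", "type": "string"},
    {"name": "Marks", "type": "int"}
  ]
}
""")

# Map Avro primitive type names to Python types for a naive check
PY_TYPES = {"string": str, "int": int}

def matches_schema(record, schema):
    """Return True if the record has exactly the schema's fields, each with the right type."""
    fields = {f["name"]: PY_TYPES[f["type"]] for f in schema["fields"]}
    return set(record) == set(fields) and all(
        isinstance(record[name], t) for name, t in fields.items()
    )

print(matches_schema(
    {"Name": "Dhanyaa", "Register_No": "KPR23CB007", "Subject": "Data Analytics", "Marks": 92},
    schema,
))
```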

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Each data format serves a unique purpose depending on the use case. While CSV and JSON are great for readability, Parquet and Avro are more efficient for large-scale analytics. Understanding these formats helps data professionals choose the right tools for data storage, transfer, and processing.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>analytics</category>
      <category>datascience</category>
      <category>beginners</category>
    </item>
    <item>
      <title>🚀 My Hilarious Journey Into MongoDB Atlas (with Yelp Reviews, JSON, and “good” vibes)</title>
      <dc:creator>DHANYAA R S </dc:creator>
      <pubDate>Sun, 24 Aug 2025 15:14:09 +0000</pubDate>
      <link>https://dev.to/dhanyaa_rs/my-hilarious-journey-into-mongodb-atlas-with-yelp-reviews-json-and-good-vibes-576j</link>
      <guid>https://dev.to/dhanyaa_rs/my-hilarious-journey-into-mongodb-atlas-with-yelp-reviews-json-and-good-vibes-576j</guid>
      <description>&lt;p&gt;So there I was, innocently sipping chai ☕ when I thought: “Hey, let’s play around with MongoDB Atlas. How hard could it be?” Spoiler alert: it was part comedy, part tragedy, but in the end—success tasted sweeter than Gulab Jamun. 🍯&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Step 1: Logging into MongoDB Atlas&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;MongoDB Atlas greeted me like a strict professor: “Welcome, young padawan. Ready to suffer with connection strings?”&lt;br&gt;
I bravely clicked Create Cluster, gave it a free-tier hug, and promised not to blow up the cloud.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Step 2: Building the yelp_demo.reviews Collection&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I created a database called yelp_demo and a collection named reviews. Then came the fun part—manually inserting 10 reviews. Imagine me, typing fake reviews like:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;{&lt;br&gt;
  "business_id": "B003",&lt;br&gt;
  "review": "The biryani here is sooo good!",&lt;br&gt;
  "rating": 5,&lt;br&gt;
  "date": "2025-08-20"&lt;br&gt;
}&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fui3dk58ktld47ilg70q4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fui3dk58ktld47ilg70q4.png" alt=" " width="800" height="374"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Yes, I felt like an undercover Yelp critic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Step 3: Query Magic 🪄&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Top 5 businesses with highest average rating&lt;br&gt;
Using the Aggregation Pipeline:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;db.reviews.aggregate([&lt;br&gt;
  { $group: { _id: "$business_id", avgRating: { $avg: "$rating" } } },&lt;br&gt;
  { $sort: { avgRating: -1 } },&lt;br&gt;
  { $limit: 5 }&lt;br&gt;
])&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Translation: “Dear MongoDB, please rank these food joints before my stomach makes decisions for me.”&lt;/p&gt;
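For anyone who wants to see what that pipeline is doing, the same group-average-sort-limit logic can be sanity-checked in plain Python over a few hypothetical documents shaped like the reviews collection:

```python
from collections import defaultdict

# Hypothetical documents shaped like the yelp_demo.reviews collection
reviews = [
    {"business_id": "B003", "rating": 5},
    {"business_id": "B003", "rating": 4},
    {"business_id": "B001", "rating": 3},
    {"business_id": "B002", "rating": 5},
]

# Equivalent of $group with $avg: collect ratings per business
ratings = defaultdict(list)
for doc in reviews:
    ratings[doc["business_id"]].append(doc["rating"])

# Equivalent of $sort (descending on the average) plus $limit: 5
top5 = sorted(
    ((biz, sum(r) / len(r)) for biz, r in ratings.items()),
    key=lambda pair: pair[1],
    reverse=True,
)[:5]
print(top5)
```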

&lt;p&gt;Count reviews containing “good”&lt;br&gt;
But first, MongoDB whispered: “Thou shall create a text index.”&lt;/p&gt;

&lt;p&gt;&lt;em&gt;db.reviews.createIndex({ review: "text" })&lt;br&gt;
db.reviews.countDocuments({ $text: { $search: "good" } })&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy34an36dz8js1pybdf8a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy34an36dz8js1pybdf8a.png" alt=" " width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Result: Apparently, everyone thinks food is “good.” My dataset looked like it was sponsored by the word “good.” 😂&lt;/p&gt;

&lt;p&gt;Get all reviews for a specific business (B003)&lt;/p&gt;

&lt;p&gt;&lt;em&gt;db.reviews.find({ business_id: "B003" }).sort({ date: -1 })&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yup, sorted by date, because reviews age faster than bananas. 🍌&lt;/p&gt;

&lt;p&gt;Update a review&lt;br&gt;
&lt;em&gt;db.reviews.updateOne(&lt;br&gt;
  { business_id: "B003" },&lt;br&gt;
  { $set: { review: "Actually, the biryani was legendary!" } }&lt;br&gt;
)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Because sometimes, you realize you were too harsh.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdyq8mymh56igjo6lxtd1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdyq8mymh56igjo6lxtd1.png" alt=" " width="800" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Delete a record&lt;/p&gt;

&lt;p&gt;&lt;em&gt;db.reviews.deleteOne({ business_id: "B010" })&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Farewell, random fake café. You shall not be missed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6whjj6z1hwh8cfl9jljr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6whjj6z1hwh8cfl9jljr.png" alt=" " width="800" height="375"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Step 4: The Export Saga 🎭&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I thought: “Cool, I’ll just click Export in Atlas!” But Atlas laughed in my face: there’s no export button in the browser UI.&lt;/p&gt;

&lt;p&gt;So here’s the trick I used:&lt;/p&gt;

&lt;p&gt;Switch to JSON view in Atlas, copy everything, paste into VS Code, save as .json.&lt;/p&gt;

&lt;p&gt;If CSV was needed, I tossed the JSON into an online converter.&lt;br&gt;
Not elegant, but hey—it worked! 🎉&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Moral of the Story&lt;/strong&gt; 🧘&lt;/p&gt;

&lt;p&gt;MongoDB Atlas is like a desi auntie at a wedding—confusing at first, but once you understand her, she’ll feed you endless data love. I inserted, queried, updated, deleted, and even counted “good” vibes, all while laughing at my own mistakes.&lt;/p&gt;

&lt;p&gt;So, if you’re diving into #DataEngineering or #DataAnalysis, don’t be afraid to get your hands messy. MongoDB will test your patience, but trust me, the JSON rewards are worth it.&lt;/p&gt;

&lt;p&gt;Hashtags: #DataEngineering #DataAnalysis #LearningJourney #MongoDB #DevHumor&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
