1. CSV (Comma Separated Values)
What it is:
CSV is one of the simplest and most widely used data formats. It stores data as plain text: each line represents one record, and the values within a record are separated by commas. The first line usually holds the column names, as in the example below.
Example:
name,reg_no,subject,marks
Asha Rao,R001,Maths,89
Vikram S,R002,Physics,76
Meera K,R003,Chemistry,92
Rohit P,R004,Maths,68
Where it’s used:
CSV is used in spreadsheets, data imports/exports, and small-scale analytics.
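Because CSV is plain text with no type information, the reading program has to convert values itself. Here is a minimal Python sketch using the standard-library csv module, with the sample data above embedded as a string. The helper name read_students is hypothetical.

```python
import csv
import io

# The sample records from the example above, embedded as a string.
CSV_TEXT = """name,reg_no,subject,marks
Asha Rao,R001,Maths,89
Vikram S,R002,Physics,76
Meera K,R003,Chemistry,92
Rohit P,R004,Maths,68
"""


def read_students(text):
    """Parse CSV text into dicts keyed by the header row (hypothetical helper)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # CSV carries no types: every value arrives as a string.
        row["marks"] = int(row["marks"])
        rows.append(row)
    return rows


students = read_students(CSV_TEXT)
print(students[0]["name"], students[0]["marks"])  # Asha Rao 89
```

The csv module also handles quoted fields. A value such as "Rao, Asha" wrapped in double quotes is read as one value instead of being split at the comma.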
2. SQL (Relational Table Format)
What it is:
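SQL (Structured Query Language) is the language used to define and query relational databases, so the "format" here is really the relational table. Data is organized in tables with named columns and declared data types, and each row represents one record.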
Example:
TABLE: students
name | reg_no | subject | marks
Asha Rao | R001 | Maths | 89
Vikram S | R002 | Physics | 76
Meera K | R003 | Chemistry | 92
Rohit P | R004 | Maths | 68
Where it’s used:
Used in databases like MySQL, PostgreSQL, and SQL Server for structured data and transactional operations.
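This is a minimal sketch of the same table in a relational database, using Python's built-in sqlite3 module with an in-memory database. The column types and the choice of reg_no as the primary key are assumptions made for illustration. The example shows a declared schema rejecting a duplicate key and an aggregate query running over the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Declared columns and types; making reg_no the PRIMARY KEY is an assumption.
conn.execute(
    """
    CREATE TABLE students (
        name    TEXT NOT NULL,
        reg_no  TEXT PRIMARY KEY,
        subject TEXT NOT NULL,
        marks   INTEGER NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [
        ("Asha Rao", "R001", "Maths", 89),
        ("Vikram S", "R002", "Physics", 76),
        ("Meera K", "R003", "Chemistry", 92),
        ("Rohit P", "R004", "Maths", 68),
    ],
)
conn.commit()

# The primary key rejects a second row with reg_no R001.
try:
    conn.execute("INSERT INTO students VALUES ('Someone', 'R001', 'Maths', 50)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Average marks per subject, computed by the database engine.
avg_by_subject = conn.execute(
    "SELECT subject, AVG(marks) FROM students GROUP BY subject ORDER BY subject"
).fetchall()
print(avg_by_subject)  # [('Chemistry', 92.0), ('Maths', 78.5), ('Physics', 76.0)]
```

SQLite is used here only because it ships with Python. Its type checking is looser than that of MySQL, PostgreSQL, or SQL Server because it uses type affinity, but it does enforce the PRIMARY KEY uniqueness constraint.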
3. JSON (JavaScript Object Notation)
What it is:
JSON stores data as key-value pairs (objects) and ordered lists (arrays), and either can be nested inside the other. It is lightweight, human-readable, and commonly used in APIs and modern web applications.
Example:
[
{"name": "Asha Rao", "reg_no": "R001", "subject": "Maths", "marks": 89},
{"name": "Vikram S", "reg_no": "R002", "subject": "Physics", "marks": 76},
{"name": "Meera K", "reg_no": "R003", "subject": "Chemistry", "marks": 92},
{"name": "Rohit P", "reg_no": "R004", "subject": "Maths", "marks": 68}
]
Where it’s used:
APIs, web applications, configuration files, and document (NoSQL) databases such as MongoDB, which stores a binary JSON variant called BSON.
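This short Python sketch uses the standard-library json module to parse the example array above. It illustrates two points from the description. First, numbers keep their type after parsing. Second, flat records can be reshaped into a nested document. The grouping by subject is only an illustrative transformation.

```python
import json

JSON_TEXT = """[
  {"name": "Asha Rao", "reg_no": "R001", "subject": "Maths", "marks": 89},
  {"name": "Vikram S", "reg_no": "R002", "subject": "Physics", "marks": 76},
  {"name": "Meera K", "reg_no": "R003", "subject": "Chemistry", "marks": 92},
  {"name": "Rohit P", "reg_no": "R004", "subject": "Maths", "marks": 68}
]"""

students = json.loads(JSON_TEXT)  # a list of dicts
print(type(students[0]["marks"]).__name__)  # int: JSON numbers stay numbers

# Nesting: group the flat records under their subject.
by_subject = {}
for s in students:
    by_subject.setdefault(s["subject"], []).append(
        {"name": s["name"], "marks": s["marks"]}
    )

print(json.dumps(by_subject, indent=2))
```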
4. Parquet (Columnar Storage Format)
What it is:
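Parquet is a binary, columnar storage format used for big data analytics. Unlike row-based formats such as CSV, Parquet stores the values of each column together. Similar values stored side by side compress well, which reduces storage space. A query can also read only the columns it needs instead of whole rows, which speeds up analytical queries.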
Example (conceptual view, values stored column by column):
name: Asha Rao | Vikram S | Meera K | Rohit P
reg_no: R001 | R002 | R003 | R004
subject: Maths | Physics | Chemistry | Maths
marks: 89 | 76 | 92 | 68
Where it’s used:
Big data platforms such as Apache Spark, Hadoop, and Amazon Athena, for fast analytics and efficient cloud storage.
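Parquet files are binary, so there is nothing meaningful to show as text. Writing a real file takes a library such as pyarrow (pyarrow.parquet.write_table). The pure-Python sketch below only illustrates two ideas behind the format. The first is transposing rows into columns, so that a query can read just the column it needs. The second is dictionary encoding, one of the encodings Parquet can apply to columns with repeated values. The helper names are hypothetical, and the output is not Parquet.

```python
# Row-based layout: one tuple per record, as in CSV or Avro.
ROWS = [
    ("Asha Rao", "R001", "Maths", 89),
    ("Vikram S", "R002", "Physics", 76),
    ("Meera K", "R003", "Chemistry", 92),
    ("Rohit P", "R004", "Maths", 68),
]
COLUMNS = ("name", "reg_no", "subject", "marks")


def to_columns(rows, columns):
    """Transpose row records into one list of values per column (hypothetical helper)."""
    return {col: [row[i] for row in rows] for i, col in enumerate(columns)}


def dictionary_encode(values):
    """Store each distinct value once and replace values with integer codes."""
    dictionary, index, codes = [], {}, []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


columns = to_columns(ROWS, COLUMNS)

# "Average marks" only needs the marks column, not every full row.
avg_marks = sum(columns["marks"]) / len(columns["marks"])
print(avg_marks)  # 81.25

# Repeated subjects shrink to small integer codes plus one lookup list.
subject_dict, subject_codes = dictionary_encode(columns["subject"])
print(subject_dict, subject_codes)  # ['Maths', 'Physics', 'Chemistry'] [0, 1, 2, 0]
```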
5. XML (Extensible Markup Language)
What it is:
XML is a tag-based format used to represent structured data. It is similar to HTML but designed to store and transport data rather than display it.
Example:
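<students>
  <student>
    <name>Asha Rao</name>
    <reg_no>R001</reg_no>
    <subject>Maths</subject>
    <marks>89</marks>
  </student>
  <student>
    <name>Vikram S</name>
    <reg_no>R002</reg_no>
    <subject>Physics</subject>
    <marks>76</marks>
  </student>
</students>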
Where it’s used:
Web services (SOAP), configuration files, and systems that require strong data validation through schemas.
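This minimal sketch reads XML records with Python's xml.etree.ElementTree. The element names students and student are assumptions, and only the first two records are included. ElementTree parses a document but does not validate it. Checking a document against an XSD schema needs another library, such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical document layout: one <student> element per record.
XML_TEXT = """<students>
  <student>
    <name>Asha Rao</name>
    <reg_no>R001</reg_no>
    <subject>Maths</subject>
    <marks>89</marks>
  </student>
  <student>
    <name>Vikram S</name>
    <reg_no>R002</reg_no>
    <subject>Physics</subject>
    <marks>76</marks>
  </student>
</students>"""

root = ET.fromstring(XML_TEXT)
students = []
for student in root.findall("student"):
    # Each child element becomes one field; element text is always a string.
    record = {child.tag: child.text for child in student}
    record["marks"] = int(record["marks"])
    students.append(record)

print([s["name"] for s in students])  # ['Asha Rao', 'Vikram S']
```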
6. Avro (Row-based Storage Format)
What it is:
Avro is a compact, row-based binary format. An Avro data file stores its schema (written in JSON) in the file header, followed by the records in binary form. It is designed for fast serialization and supports schema evolution, which makes it well suited to real-time data pipelines.
Example (logical representation):
{"name": "Asha Rao", "reg_no": "R001", "subject": "Maths", "marks": 89}
{"name": "Vikram S", "reg_no": "R002", "subject": "Physics", "marks": 76}
{"name": "Meera K", "reg_no": "R003", "subject": "Chemistry", "marks": 92}
{"name": "Rohit P", "reg_no": "R004", "subject": "Maths", "marks": 68}
Where it’s used:
Data streaming (Apache Kafka), data serialization, and large-scale data pipelines.
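Real Avro files are written with a library such as the official avro package or fastavro. The pure-Python sketch below instead hand-encodes one record according to the binary encoding rules in the Avro specification, to show why the format is compact. Fields are written in schema order with no field names. Int and long values use zig-zag variable-length encoding. Strings are a length prefix followed by UTF-8 bytes. The schema uses Avro's JSON schema syntax. The record name Student and the helper functions are hypothetical, and only string and int fields are handled.

```python
import json

# Avro schema in JSON syntax; the record name "Student" is hypothetical.
SCHEMA = {
    "type": "record",
    "name": "Student",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "reg_no", "type": "string"},
        {"name": "subject", "type": "string"},
        {"name": "marks", "type": "int"},
    ],
}


def encode_long(n):
    """Zig-zag encode a signed 64-bit integer, then emit it as a varint."""
    z = (n << 1) ^ (n >> 63)  # maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s):
    """Length prefix (encoded as a long) followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_record(schema, record):
    """Concatenate field values in schema order; field names are not written."""
    out = b""
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] == "string":
            out += encode_string(value)
        elif field["type"] in ("int", "long"):
            out += encode_long(value)
        else:
            raise ValueError(f"type not covered by this sketch: {field['type']}")
    return out


record = {"name": "Asha Rao", "reg_no": "R001", "subject": "Maths", "marks": 89}
encoded = encode_record(SCHEMA, record)
print(len(encoded), len(json.dumps(record)))  # 22 71
```

In an actual Avro data file, these encoded records follow a header that holds the schema as JSON. This lets a reader decode them without the field names being repeated in every record.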
Conclusion:
Each data format serves a different purpose in the analytics and cloud ecosystem:
- CSV is simple and universal.
- SQL tables enforce structure and relationships.
- JSON adds flexibility and nesting.
- Parquet optimizes analytical queries.
- XML emphasizes structure and validation.
- Avro focuses on efficient, schema-based data transport.
Understanding when and how to use these formats is a core skill for any data analyst, data engineer, or cloud professional.