DEV Community

Abijith Raja B

DATA FLOATING IN THE CLOUD

Introduction

In today’s digital world, data moves faster than ever before. From online classes to global business systems, one invisible force connects it all: the cloud.
But when we say data is in the cloud, it doesn’t mean our information is literally floating in the sky. Instead, it’s stored on servers in large, distributed data centers, which let us access files, photos, and applications anytime, anywhere.
Let’s explore how the same data is represented in six formats widely used in data analytics and cloud platforms.

Data Formats in Cloud Analytics

Every time you store, share, or query data in the cloud, you’re likely dealing with one of these six formats:

CSV – Simple text-based, comma-separated data

SQL – Relational, structured data tables

JSON – Lightweight, flexible key-value data

Parquet – Efficient, columnar storage for big data

XML – Markup-based hierarchical data

Avro – Binary, schema-driven data for streaming

To make it easy to understand, let’s take a small dataset and represent it in all six formats.

Sample Dataset

Name    | Roll_No | Course          | Grade
Aadhira | 201     | Data Science    | A
Niveth  | 202     | AI              | B+
Rahul   | 203     | Cloud Computing | A+

1️⃣ CSV (Comma Separated Values)

CSV is one of the simplest and most human-readable formats. Each record is written on one line, and fields are separated by commas. A field that itself contains a comma must be wrapped in double quotes.

Example:

Name,Roll_No,Course,Grade
Aadhira,201,Data Science,A
Niveth,202,AI,B+
Rahul,203,Cloud Computing,A+

✅ Pros

  • Easy to read and edit
  • Works with almost every tool, including Excel, Python, and Google Sheets

⚠️ Cons

  • No data types or schema
  • Inefficient for very large datasets (no built-in compression, and the whole file must be scanned even to read one column)
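
The "no data types" limitation is easy to see in practice. Here is a minimal sketch using Python's standard csv module, with the sample data held in a string instead of a real file (the variable names are only illustrative):

```python
import csv
import io

# The sample dataset as CSV text (in practice this would be a file or cloud object).
csv_text = """Name,Roll_No,Course,Grade
Aadhira,201,Data Science,A
Niveth,202,AI,B+
Rahul,203,Cloud Computing,A+
"""

# DictReader uses the header row as the keys for each record.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Every value comes back as a string, so numbers must be converted by hand.
print(type(rows[0]["Roll_No"]))  # <class 'str'>
roll_numbers = [int(row["Roll_No"]) for row in rows]
print(roll_numbers)  # [201, 202, 203]
```

Because the file carries no schema, the reader has to know that Roll_No is meant to be a number.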

2️⃣ SQL (Structured Query Language)

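SQL is the language of relational databases. Strictly speaking, it is a query language and not a file format: the database stores the data in tables with defined columns, and SQL is how you define, load, and query those tables.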

Example:

CREATE TABLE Students (
  Name VARCHAR(50),
  Roll_No INT,
  Course VARCHAR(50),
  Grade CHAR(2)
);

INSERT INTO Students VALUES
('Aadhira', 201, 'Data Science', 'A'),
('Niveth', 202, 'AI', 'B+'),
('Rahul', 203, 'Cloud Computing', 'A+');

✅ Pros

  • Structured and organized
  • Perfect for queries, filters, and joins

⚠️ Cons

  • Rigid schema
  • Not suitable for nested data
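
To try the statements above without setting up a database server, one option is Python's built-in sqlite3 module. This is only a sketch: an in-memory SQLite database stands in for whatever relational database your cloud platform provides, and the query is an illustrative example.

```python
import sqlite3

# An in-memory SQLite database stands in for a cloud-hosted relational database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students ("
    "Name VARCHAR(50), Roll_No INT, Course VARCHAR(50), Grade CHAR(2))"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?)",
    [
        ("Aadhira", 201, "Data Science", "A"),
        ("Niveth", 202, "AI", "B+"),
        ("Rahul", 203, "Cloud Computing", "A+"),
    ],
)

# Filtering and sorting happen inside the database, not in application code.
a_grades = conn.execute(
    "SELECT Name, Grade FROM Students WHERE Grade LIKE 'A%' ORDER BY Roll_No"
).fetchall()
print(a_grades)  # [('Aadhira', 'A'), ('Rahul', 'A+')]
conn.close()
```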

3️⃣ JSON (JavaScript Object Notation)

JSON is the go-to format for APIs and NoSQL databases. It’s lightweight and great for representing hierarchical data.

Example:

[
  {"Name": "Aadhira", "Roll_No": 201, "Course": "Data Science", "Grade": "A"},
  {"Name": "Niveth", "Roll_No": 202, "Course": "AI", "Grade": "B+"},
  {"Name": "Rahul", "Roll_No": 203, "Course": "Cloud Computing", "Grade": "A+"}
]

✅ Pros

  • Easy to parse in web apps
  • Supports nested structures

⚠️ Cons

  • No strict schema
  • Becomes bulky for large datasets because the key names repeat in every record
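
A short sketch with Python's standard json module shows both points: values keep their types after parsing, and records can hold nested structures. The "Electives" field is a hypothetical addition for illustration and is not part of the sample dataset.

```python
import json

json_text = """
[
  {"Name": "Aadhira", "Roll_No": 201, "Course": "Data Science", "Grade": "A"},
  {"Name": "Niveth", "Roll_No": 202, "Course": "AI", "Grade": "B+"},
  {"Name": "Rahul", "Roll_No": 203, "Course": "Cloud Computing", "Grade": "A+"}
]
"""

students = json.loads(json_text)

# Unlike CSV, numbers keep their type after parsing.
print(type(students[0]["Roll_No"]))  # <class 'int'>

# Nesting: "Electives" is a hypothetical field added only for this example.
students[0]["Electives"] = [{"Title": "Statistics", "Credits": 3}]
print(json.dumps(students[0], indent=2))
```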

4️⃣ Parquet (Columnar Storage Format)

Parquet is built for big data analytics. It stores data column by column, which improves compression (similar values sit together) and lets a query read only the columns it needs. That makes it a good fit for tools like Amazon Athena or Apache Spark.

Example (a conceptual view of the column layout; a real Parquet file is binary):

Name: ["Aadhira", "Niveth", "Rahul"]
Roll_No: [201, 202, 203]
Course: ["Data Science", "AI", "Cloud Computing"]
Grade: ["A", "B+", "A+"]

✅ Pros

  • High compression
  • Fast analytical queries

⚠️ Cons

  • Not human-readable
  • Needs specialized tools (e.g., PyArrow, Spark)
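
The column-wise idea itself can be sketched in plain Python. This is not the Parquet file format, which adds encoding, compression, and metadata on top and is normally written through a library such as PyArrow. It only illustrates why reading one column is cheap once the data is pivoted.

```python
# Row-oriented records, as they would appear in CSV or JSON.
rows = [
    {"Name": "Aadhira", "Roll_No": 201, "Course": "Data Science", "Grade": "A"},
    {"Name": "Niveth", "Roll_No": 202, "Course": "AI", "Grade": "B+"},
    {"Name": "Rahul", "Roll_No": 203, "Course": "Cloud Computing", "Grade": "A+"},
]

# Pivot the rows into one list per column (the layout shown in the example above).
columns = {key: [row[key] for row in rows] for key in rows[0]}

# A query that needs only Roll_No touches a single list and skips the rest.
print(max(columns["Roll_No"]))  # 203
print(columns["Grade"])  # ['A', 'B+', 'A+']
```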

5️⃣ XML (Extensible Markup Language)

XML represents data using nested opening and closing tags. It’s structured and self-descriptive, and is often used in web services and configuration files.

Example:

<Students>
  <Student>
    <Name>Aadhira</Name>
    <Roll_No>201</Roll_No>
    <Course>Data Science</Course>
    <Grade>A</Grade>
  </Student>
  <Student>
    <Name>Niveth</Name>
    <Roll_No>202</Roll_No>
    <Course>AI</Course>
    <Grade>B+</Grade>
  </Student>
  <Student>
    <Name>Rahul</Name>
    <Roll_No>203</Roll_No>
    <Course>Cloud Computing</Course>
    <Grade>A+</Grade>
  </Student>
</Students>


✅ Pros

  • Self-descriptive and structured
  • Great for hierarchical data

⚠️ Cons

  • Verbose and heavy
  • Slower to parse
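
Reading the document back takes a little more code than CSV or JSON. Below is a minimal sketch with Python's standard xml.etree.ElementTree module; the conversion to dictionaries is just one possible way to flatten the tree.

```python
import xml.etree.ElementTree as ET

xml_text = """<Students>
  <Student>
    <Name>Aadhira</Name><Roll_No>201</Roll_No>
    <Course>Data Science</Course><Grade>A</Grade>
  </Student>
  <Student>
    <Name>Niveth</Name><Roll_No>202</Roll_No>
    <Course>AI</Course><Grade>B+</Grade>
  </Student>
  <Student>
    <Name>Rahul</Name><Roll_No>203</Roll_No>
    <Course>Cloud Computing</Course><Grade>A+</Grade>
  </Student>
</Students>"""

root = ET.fromstring(xml_text)

# Turn each <Student> element into a dict of tag name -> text content.
students = [
    {field.tag: field.text for field in student}
    for student in root.findall("Student")
]

# Element text is always a string, so Roll_No is '201', not 201.
print(students[0])
```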

6️⃣ Avro (Row-Based Storage Format)

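Avro is used for data streaming and serialization. Records are stored in a compact binary form, while the schema that describes them is written in JSON and travels with the data. Because readers and writers can compare schemas, the format stays compatible as fields are added or removed over time.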

Schema Example:

{
  "type": "record",
  "name": "Student",
  "fields": [
    {"name": "Name", "type": "string"},
    {"name": "Roll_No", "type": "int"},
    {"name": "Course", "type": "string"},
    {"name": "Grade", "type": "string"}
  ]
}


✅ Pros

  • Compact binary format
  • Schema evolution supported

⚠️ Cons

  • Not human-readable
  • Requires Avro libraries
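
To get a feel for why Avro is compact, here is a simplified sketch of how a single record body is encoded under the schema above, following the binary encoding rules in the Avro specification (zigzag variable-length integers, length-prefixed UTF-8 strings, fields written in schema order). The function names are made up for this example. A real Avro file also has a header containing the schema, so in practice you would use an Avro library and not hand-written code like this.

```python
def encode_long(value: int) -> bytes:
    """Encode an int/long the way Avro does: zigzag, then 7 bits per byte."""
    zigzag = (value << 1) ^ (value >> 63)
    out = bytearray()
    while zigzag > 0x7F:
        out.append((zigzag & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        zigzag >>= 7
    out.append(zigzag)
    return bytes(out)


def encode_string(text: str) -> bytes:
    """Encode a string as its byte length followed by the UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_long(len(data)) + data


def encode_student(name: str, roll_no: int, course: str, grade: str) -> bytes:
    """Concatenate the fields in schema order; field names are never written."""
    return (
        encode_string(name)
        + encode_long(roll_no)
        + encode_string(course)
        + encode_string(grade)
    )


record = encode_student("Aadhira", 201, "Data Science", "A")
print(len(record))  # 25
print(record.hex())
```

The same record takes several times as many bytes in the JSON example, mostly because JSON repeats the field names in every record.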

Conclusion

Each data format serves a unique purpose in the cloud ecosystem:

Use Case -> Recommended Format

Simple exports/logs -> CSV
Relational databases -> SQL
APIs or nested data -> JSON
Big data analytics -> Parquet
Hierarchical data -> XML
Real-time streaming -> Avro

In essence, data in the cloud isn’t just about storage; it’s about choosing the right format for the right purpose.
