<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: SAHANA S</title>
    <description>The latest articles on DEV Community by SAHANA S (@sahana_s_723583d985050944).</description>
    <link>https://dev.to/sahana_s_723583d985050944</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3482811%2F8cda5376-9625-4596-868d-de208553c11e.png</url>
      <title>DEV Community: SAHANA S</title>
      <link>https://dev.to/sahana_s_723583d985050944</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sahana_s_723583d985050944"/>
    <language>en</language>
    <item>
      <title>Data in the Cloud: 6 Common Data Formats</title>
      <dc:creator>SAHANA S</dc:creator>
      <pubDate>Wed, 08 Oct 2025 10:09:57 +0000</pubDate>
      <link>https://dev.to/sahana_s_723583d985050944/data-in-the-cloud-6-common-data-formats-10pd</link>
      <guid>https://dev.to/sahana_s_723583d985050944/data-in-the-cloud-6-common-data-formats-10pd</guid>
      <description>&lt;p&gt;In today's world of data analytics and cloud computing, how we store and exchange data can have a huge impact on performance, scalability, and compatibility. Whether you’re working on a data pipeline or exporting reports, understanding different data formats is essential.&lt;/p&gt;

&lt;p&gt;In this article, I’ll walk you through six popular data formats — CSV, SQL, JSON, Parquet, XML, and Avro — using a simple example dataset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sample Dataset&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let’s use a small dataset of student marks:&lt;/p&gt;

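&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Name&lt;/th&gt;&lt;th&gt;RegisterNumber&lt;/th&gt;&lt;th&gt;Subject&lt;/th&gt;&lt;th&gt;Marks&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Alice&lt;/td&gt;&lt;td&gt;1001&lt;/td&gt;&lt;td&gt;Math&lt;/td&gt;&lt;td&gt;85&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Bob&lt;/td&gt;&lt;td&gt;1002&lt;/td&gt;&lt;td&gt;Math&lt;/td&gt;&lt;td&gt;90&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Charlie&lt;/td&gt;&lt;td&gt;1003&lt;/td&gt;&lt;td&gt;Math&lt;/td&gt;&lt;td&gt;78&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;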

&lt;p&gt;&lt;strong&gt;CSV (Comma-Separated Values)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What is it?&lt;br&gt;
CSV is one of the simplest data formats. Each row is a line, and each value is separated by a comma. It's lightweight and easy to read.&lt;/p&gt;

&lt;p&gt;When to use:&lt;br&gt;
Exporting data from spreadsheets or databases&lt;br&gt;
Quick sharing and inspection of tabular data&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Name,RegisterNumber,Subject,Marks
Alice,1001,Math,85
Bob,1002,Math,90
Charlie,1003,Math,78
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
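
&lt;p&gt;One detail is easier to see in code: CSV carries no type information, so every value is read back as a string. The sketch below uses Python's standard csv module on the sample rows. The inline csv_text string is only a stand-in for a real file.&lt;/p&gt;

```python
import csv
import io

# Inline copy of the sample data; in practice this would come from a file.
csv_text = """Name,RegisterNumber,Subject,Marks
Alice,1001,Math,85
Bob,1002,Math,90
Charlie,1003,Math,78
"""

# DictReader uses the header row as the keys of each record.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# CSV has no types, so numeric columns must be converted explicitly.
for row in rows:
    row["RegisterNumber"] = int(row["RegisterNumber"])
    row["Marks"] = int(row["Marks"])

average = sum(row["Marks"] for row in rows) / len(rows)
print(rows[0])
print(round(average, 2))
```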



&lt;p&gt;&lt;strong&gt;SQL (Relational Table Format)&lt;/strong&gt;&lt;br&gt;
What is it?&lt;br&gt;
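SQL is the standard language for working with relational databases, where data is stored in tables with rows and columns. When relational data is exchanged as a file, it usually travels as a SQL script (a dump) made of CREATE TABLE and INSERT statements.&lt;/p&gt;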

&lt;p&gt;When to use:&lt;br&gt;
Structured data that fits into relational models&lt;br&gt;
OLTP and OLAP systems&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE StudentMarks (
  Name TEXT,
  RegisterNumber INT,
  Subject TEXT,
  Marks INT
);


INSERT INTO StudentMarks VALUES ('Alice', 1001, 'Math', 85);
INSERT INTO StudentMarks VALUES ('Bob', 1002, 'Math', 90);
INSERT INTO StudentMarks VALUES ('Charlie', 1003, 'Math', 78);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
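
&lt;p&gt;To try the statements above without installing a database server, one option is Python's built-in sqlite3 module. This is a minimal sketch that assumes an in-memory SQLite database. Other engines accept the same SQL but may treat the column types slightly differently.&lt;/p&gt;

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE StudentMarks ("
    "Name TEXT, RegisterNumber INT, Subject TEXT, Marks INT)"
)

students = [
    ("Alice", 1001, "Math", 85),
    ("Bob", 1002, "Math", 90),
    ("Charlie", 1003, "Math", 78),
]
# Parameter placeholders (?) keep the values separate from the SQL text.
conn.executemany("INSERT INTO StudentMarks VALUES (?, ?, ?, ?)", students)

# Query the table: highest mark and the class average.
top = conn.execute(
    "SELECT Name, Marks FROM StudentMarks ORDER BY Marks DESC LIMIT 1"
).fetchone()
average = conn.execute("SELECT AVG(Marks) FROM StudentMarks").fetchone()[0]
print(top, round(average, 2))
conn.close()
```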



&lt;p&gt;&lt;strong&gt;JSON (JavaScript Object Notation)&lt;/strong&gt;&lt;br&gt;
What is it?&lt;br&gt;
JSON is a lightweight text-based format used for data interchange. It’s structured like a dictionary or object and is widely used in APIs.&lt;/p&gt;

&lt;p&gt;When to use:&lt;br&gt;
Web services and REST APIs&lt;br&gt;
Semi-structured data&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[
  {
    "Name": "Alice",
    "RegisterNumber": 1001,
    "Subject": "Math",
    "Marks": 85
  },
  {
    "Name": "Bob",
    "RegisterNumber": 1002,
    "Subject": "Math",
    "Marks": 90
  },
  {
    "Name": "Charlie",
    "RegisterNumber": 1003,
    "Subject": "Math",
    "Marks": 78
  }
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
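
&lt;p&gt;Unlike CSV, JSON keeps basic types such as numbers and strings, and records do not all need the same shape. Here is a small sketch using Python's standard json module. The Contact field and its email address are made up purely to illustrate nesting.&lt;/p&gt;

```python
import json

json_text = """
[
  {"Name": "Alice", "RegisterNumber": 1001, "Subject": "Math", "Marks": 85},
  {"Name": "Bob", "RegisterNumber": 1002, "Subject": "Math", "Marks": 90},
  {"Name": "Charlie", "RegisterNumber": 1003, "Subject": "Math", "Marks": 78}
]
"""

# json.loads turns the array into a list of dicts; numbers stay numbers.
students = json.loads(json_text)
marks_by_name = {s["Name"]: s["Marks"] for s in students}

# Semi-structured: one record can carry an extra, nested field
# (hypothetical data, not part of the sample dataset).
students[0]["Contact"] = {"Email": "alice@example.com"}

print(marks_by_name)
print(json.dumps(students[0], indent=2))
```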



&lt;p&gt;&lt;strong&gt;Parquet (Columnar Storage Format)&lt;/strong&gt;&lt;br&gt;
What is it?&lt;br&gt;
Parquet is an Apache columnar storage format, designed for efficient data compression and query performance. It’s a binary format and not human-readable.&lt;/p&gt;

&lt;p&gt;When to use:&lt;br&gt;
Big data workloads (Spark, Hive)&lt;br&gt;
Analytics in cloud data lakes&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
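Because Parquet is binary, there is no readable text to show. Instead, here’s how you could write the dataset to a Parquet file with pandas (this needs a Parquet engine such as pyarrow or fastparquet installed):&lt;br&gt;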
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd

data = [
    {"Name": "Alice", "RegisterNumber": 1001, "Subject": "Math", "Marks": 85},
    {"Name": "Bob", "RegisterNumber": 1002, "Subject": "Math", "Marks": 90},
    {"Name": "Charlie", "RegisterNumber": 1003, "Subject": "Math", "Marks": 78},
]

df = pd.DataFrame(data)
df.to_parquet("student_marks.parquet")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
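
&lt;p&gt;To make "columnar" concrete, here is a pure-Python sketch of the idea. It only illustrates the layout and is not how Parquet is actually implemented: the real format adds row groups, encodings, and compression on top.&lt;/p&gt;

```python
from itertools import groupby

rows = [
    {"Name": "Alice", "RegisterNumber": 1001, "Subject": "Math", "Marks": 85},
    {"Name": "Bob", "RegisterNumber": 1002, "Subject": "Math", "Marks": 90},
    {"Name": "Charlie", "RegisterNumber": 1003, "Subject": "Math", "Marks": 78},
]

# Row-oriented layout: each record keeps all of its fields together.
# Column-oriented layout: each field keeps all of its values together.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# An aggregate over Marks only needs to read one list.
average_marks = sum(columns["Marks"]) / len(columns["Marks"])

# Repeated values end up next to each other, which helps compression.
# A simple run-length encoding of the Subject column shows the effect.
run_length = [(value, len(list(group))) for value, group in groupby(columns["Subject"])]

print(columns["Marks"], round(average_marks, 2), run_length)
```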



&lt;p&gt;&lt;strong&gt;XML (Extensible Markup Language)&lt;/strong&gt;&lt;br&gt;
What is it?&lt;br&gt;
XML is a markup language that uses custom tags to define data. It’s more verbose than JSON but was a standard for data exchange in the early web.&lt;/p&gt;

&lt;p&gt;When to use:&lt;br&gt;
Legacy systems&lt;br&gt;
Document-centric data (RSS feeds, SOAP APIs)&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;Students&amp;gt;
  &amp;lt;Student&amp;gt;
    &amp;lt;Name&amp;gt;Alice&amp;lt;/Name&amp;gt;
    &amp;lt;RegisterNumber&amp;gt;1001&amp;lt;/RegisterNumber&amp;gt;
    &amp;lt;Subject&amp;gt;Math&amp;lt;/Subject&amp;gt;
    &amp;lt;Marks&amp;gt;85&amp;lt;/Marks&amp;gt;
  &amp;lt;/Student&amp;gt;
  &amp;lt;Student&amp;gt;
    &amp;lt;Name&amp;gt;Bob&amp;lt;/Name&amp;gt;
    &amp;lt;RegisterNumber&amp;gt;1002&amp;lt;/RegisterNumber&amp;gt;
    &amp;lt;Subject&amp;gt;Math&amp;lt;/Subject&amp;gt;
    &amp;lt;Marks&amp;gt;90&amp;lt;/Marks&amp;gt;
  &amp;lt;/Student&amp;gt;
  &amp;lt;Student&amp;gt;
    &amp;lt;Name&amp;gt;Charlie&amp;lt;/Name&amp;gt;
    &amp;lt;RegisterNumber&amp;gt;1003&amp;lt;/RegisterNumber&amp;gt;
    &amp;lt;Subject&amp;gt;Math&amp;lt;/Subject&amp;gt;
    &amp;lt;Marks&amp;gt;78&amp;lt;/Marks&amp;gt;
  &amp;lt;/Student&amp;gt;
&amp;lt;/Students&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
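
&lt;p&gt;For reading and writing this structure in code, Python's standard xml.etree.ElementTree module is one option. The sketch below builds the same Students tree, serializes it, and parses it back. Like CSV, XML element text is always a string, so numbers have to be converted.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

students = [
    ("Alice", 1001, "Math", 85),
    ("Bob", 1002, "Math", 90),
    ("Charlie", 1003, "Math", 78),
]

# Build the Students/Student tree from the example.
root = ET.Element("Students")
for name, register_number, subject, marks in students:
    student = ET.SubElement(root, "Student")
    ET.SubElement(student, "Name").text = name
    ET.SubElement(student, "RegisterNumber").text = str(register_number)
    ET.SubElement(student, "Subject").text = subject
    ET.SubElement(student, "Marks").text = str(marks)

# Serialize to a text string.
xml_text = ET.tostring(root, encoding="unicode")

# Parse it back; element text is a string, so convert the marks to int.
parsed = ET.fromstring(xml_text)
marks_by_name = {
    s.findtext("Name"): int(s.findtext("Marks"))
    for s in parsed.findall("Student")
}
print(marks_by_name)
```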



&lt;p&gt;&lt;strong&gt;Avro (Row-based Binary Format)&lt;/strong&gt;&lt;br&gt;
What is it?&lt;br&gt;
Avro is a row-based binary data format from Apache. It supports rich data structures and stores the schema (written in JSON) together with the data, which makes it well suited to streaming and serialization.&lt;/p&gt;

&lt;p&gt;When to use:&lt;br&gt;
Kafka message serialization&lt;br&gt;
Schema evolution&lt;br&gt;
Compact binary storage&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
Here’s how you define a schema and write Avro data using Python (a sketch that assumes the Apache avro package is installed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json

import avro.datafile
import avro.io
import avro.schema

# Avro schemas are written in JSON. This one is usually kept in its own
# file (for example student.avsc); it is inlined here so the script runs as is.
schema_json = json.dumps({
    "type": "record",
    "name": "Student",
    "fields": [
        {"name": "Name", "type": "string"},
        {"name": "RegisterNumber", "type": "int"},
        {"name": "Subject", "type": "string"},
        {"name": "Marks", "type": "int"},
    ],
})

schema = avro.schema.parse(schema_json)
data = [
    {"Name": "Alice", "RegisterNumber": 1001, "Subject": "Math", "Marks": 85},
    {"Name": "Bob", "RegisterNumber": 1002, "Subject": "Math", "Marks": 90},
    {"Name": "Charlie", "RegisterNumber": 1003, "Subject": "Math", "Marks": 78},
]

# The writer stores the schema in the file header, followed by the records.
with open("students.avro", "wb") as out_file:
    writer = avro.datafile.DataFileWriter(out_file, avro.io.DatumWriter(), schema)
    for record in data:
        writer.append(record)
    writer.close()  # flushes any buffered records to the file
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Understanding these formats helps you choose the right tool for the job, whether you're building a modern data pipeline or integrating with legacy systems.&lt;/p&gt;

</description>
      <category>data</category>
      <category>dataengineering</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Exploring MongoDB</title>
      <dc:creator>SAHANA S</dc:creator>
      <pubDate>Sat, 06 Sep 2025 03:59:35 +0000</pubDate>
      <link>https://dev.to/sahana_s_723583d985050944/exploring-mongodb-5605</link>
      <guid>https://dev.to/sahana_s_723583d985050944/exploring-mongodb-5605</guid>
      <description>&lt;p&gt;As a part of exploring MongoDB using MongoDB Compass—the friendliest GUI for Mongo's document store. I wanted to explore MongoDB beyond the shell, so I cooked up a mini “Yelp-style” reviews dataset in a local yelp DB. My goal? Get hands-on with real-world tasks: inserting sample data, running queries and aggregations, tweaking entries, performing deletions, and even exporting results as files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Importing Dataset&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For convenience, I prepared a small sample in JSON format—modelled after Yelp reviews. I then used the Add Data → Import Data feature in Compass to upload the JSON file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inserting Records Manually&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To get hands-on, I also manually added at least 10 review entries with fields like business_id, name, rating, and review. This helped me understand data structure and ensure variety in entries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3y41gimarjp0nf7ga2m3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3y41gimarjp0nf7ga2m3.jpg" alt=" " width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Queries &amp;amp; Aggregations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Top 5 Businesses by Average Rating&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjrs8z11e6qr54ui3tbbd.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjrs8z11e6qr54ui3tbbd.jpg" alt=" " width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;
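
&lt;p&gt;The screenshot shows the result in Compass. As a rough text equivalent, an aggregation pipeline for this question could look like the sketch below, written as Python data in the form pymongo accepts. The field names business_id, name, and rating come from the records described earlier. The collection name reviews and the output names avg_rating and review_count are assumptions, and the exact stages in the screenshot may differ.&lt;/p&gt;

```python
# Aggregation pipeline: group reviews per business, average the rating,
# sort from highest to lowest, and keep the first five.
top_5_pipeline = [
    {
        "$group": {
            "_id": "$business_id",
            "name": {"$first": "$name"},
            "avg_rating": {"$avg": "$rating"},
            "review_count": {"$sum": 1},
        }
    },
    {"$sort": {"avg_rating": -1}},
    {"$limit": 5},
]

# With pymongo installed and a local server running, it could be run as:
#   from pymongo import MongoClient
#   reviews = MongoClient()["yelp"]["reviews"]  # "reviews" is a hypothetical name
#   for doc in reviews.aggregate(top_5_pipeline):
#       print(doc)
print(top_5_pipeline)
```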

&lt;p&gt;&lt;strong&gt;Count of Reviews Containing the Word “good”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6m6ss6irzna61ig13s16.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6m6ss6irzna61ig13s16.jpg" alt=" " width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;
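
&lt;p&gt;A count like this is typically done with a regex filter. The sketch below shows one possible filter document and, so that it runs without a database, applies the same case-insensitive match to a few made-up reviews with Python's re module. The sample documents and the collection name reviews are hypothetical.&lt;/p&gt;

```python
import re

# Filter document: match reviews whose text contains "good", ignoring case.
good_filter = {"review": {"$regex": "good", "$options": "i"}}

# With pymongo and a running server, the count would be something like:
#   reviews.count_documents(good_filter)  # "reviews" is a hypothetical collection

# The same match applied locally to made-up sample documents.
sample_reviews = [
    {"business_id": "B1", "review": "Good food and quick service"},
    {"business_id": "B2", "review": "Too noisy for my taste"},
    {"business_id": "B1", "review": "Pretty good coffee"},
]
pattern = re.compile(good_filter["review"]["$regex"], re.IGNORECASE)
good_count = sum(1 for doc in sample_reviews if pattern.search(doc["review"]))
print(good_count)
```

&lt;p&gt;Note that a plain substring pattern also matches longer words such as "goodbye". Adding word boundaries to the pattern is one way to restrict the match to the whole word.&lt;/p&gt;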

&lt;p&gt;&lt;strong&gt;All Reviews for a Specific Business ID&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffon4g7o3nv4bzc54vp6d.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffon4g7o3nv4bzc54vp6d.jpg" alt=" " width="800" height="265"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Delete a Record&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I selected a review and clicked the delete/trash icon to remove it. Compass prompted for confirmation and then removed the document.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikwbb6x65pk1vh7kaez5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikwbb6x65pk1vh7kaez5.jpg" alt=" " width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
By doing this practical exercise, I learned how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Insert, query, update, and delete documents in MongoDB Compass&lt;/li&gt;
&lt;li&gt;Run aggregation pipelines to analyze data&lt;/li&gt;
&lt;li&gt;Use regex to search text fields&lt;/li&gt;
&lt;li&gt;Export results for further analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MongoDB’s flexible schema and Compass’s visual interface make it a powerful pairing for real-world data tasks.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
