<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Prakalya Sambathkumar</title>
    <description>The latest articles on DEV Community by Prakalya Sambathkumar (@prakalya_sambathkumar_54e).</description>
    <link>https://dev.to/prakalya_sambathkumar_54e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3459524%2F45863ea6-3d7a-48b2-8415-e2423850e2f5.jpg</url>
      <title>DEV Community: Prakalya Sambathkumar</title>
      <link>https://dev.to/prakalya_sambathkumar_54e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/prakalya_sambathkumar_54e"/>
    <language>en</language>
    <item>
      <title>Understanding Data Formats in Cloud &amp; Data Analytics</title>
      <dc:creator>Prakalya Sambathkumar</dc:creator>
      <pubDate>Mon, 06 Oct 2025 07:09:54 +0000</pubDate>
      <link>https://dev.to/prakalya_sambathkumar_54e/understanding-data-formats-in-cloud-data-analytics-3cpc</link>
      <guid>https://dev.to/prakalya_sambathkumar_54e/understanding-data-formats-in-cloud-data-analytics-3cpc</guid>
      <description>&lt;p&gt;When working with data in cloud systems or analytics projects, the format you store your data in can make a huge difference in performance, scalability, and compatibility.&lt;/p&gt;

&lt;p&gt;Different data formats are designed for different purposes — some are easy to read, while others are optimized for large-scale analytics.&lt;/p&gt;

&lt;p&gt;In this article, we’ll explore six widely used data formats in analytics:&lt;br&gt;
CSV, SQL, JSON, Parquet, XML, and Avro, using a simple student dataset as an example.&lt;/p&gt;

&lt;p&gt;🎯 Sample Dataset&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4je6cdmm4fu01nw1p0mx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4je6cdmm4fu01nw1p0mx.png" alt=" " width="589" height="197"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;1️⃣ CSV (Comma-Separated Values)&lt;/p&gt;

&lt;p&gt;CSV is the simplest and most familiar data format. Each record is stored as one line of text, with commas separating the values; the first line usually holds the column names.&lt;br&gt;
📄 Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Name,Register_No,Subject,Marks
Abi,201,Statistics,100
Mano,250,Computer Science,99
Priya,260,English,95
Riya,265,Maths,100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ Pros: Easy to create, read, and share.&lt;/p&gt;

&lt;p&gt;⚠️ Cons: No data types or schema (every value is read back as text); struggles with nested or complex data.&lt;/p&gt;
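
&lt;p&gt;The missing types are easy to see in practice. Below is a minimal sketch using Python’s built-in csv module, with the sample data embedded as a string so it runs on its own: every field comes back as text, so the program has to decide which columns are numbers. Treating Register_No and Marks as integers is our assumption, based on the sample table.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import csv
import io

# The sample CSV from above, embedded as a string so the sketch is self-contained.
CSV_TEXT = """Name,Register_No,Subject,Marks
Abi,201,Statistics,100
Mano,250,Computer Science,99
Priya,260,English,95
Riya,265,Maths,100
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
print(repr(rows[0]["Marks"]))  # '100' (a string, not a number)

# CSV carries no schema, so types are applied by hand.
# Assumption: Register_No and Marks are integer columns.
for row in rows:
    row["Register_No"] = int(row["Register_No"])
    row["Marks"] = int(row["Marks"])

average = sum(row["Marks"] for row in rows) / len(rows)
print(average)  # 98.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;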

&lt;p&gt;2️⃣ SQL (Structured Query Language)&lt;/p&gt;

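&lt;p&gt;SQL is a query language rather than a file format; here it stands for data kept in relational tables of rows and typed columns, which can be shared as a script of CREATE TABLE and INSERT statements (a "SQL dump").&lt;/p&gt;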

&lt;p&gt;📄 Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE Students (
    Name VARCHAR(50),
    Register_No INT,
    Subject VARCHAR(50),
    Marks INT
);

INSERT INTO Students (Name, Register_No, Subject, Marks) VALUES
('Abi', 201, 'Statistics', 100),
('Mano', 250, 'Computer Science', 99),
('Priya', 260, 'English', 95),
('Riya', 265, 'Maths', 100);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ Pros: Enforces a schema and data types; easy to query and manage structured data.&lt;/p&gt;

&lt;p&gt;⚠️ Cons: Not suitable for unstructured or deeply nested (hierarchical) data.&lt;/p&gt;
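
&lt;p&gt;As a quick sketch of querying this table, the statements above can be run with Python’s built-in sqlite3 module against an in-memory database (our choice of engine; any SQL database would do). One caveat: SQLite applies flexible type affinity and ignores the VARCHAR length, so strictly typed engines such as PostgreSQL enforce the declared types more rigidly.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sqlite3

# In-memory SQLite database (our choice of engine for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (
    Name VARCHAR(50),
    Register_No INT,
    Subject VARCHAR(50),
    Marks INT
);

INSERT INTO Students (Name, Register_No, Subject, Marks) VALUES
('Abi', 201, 'Statistics', 100),
('Mano', 250, 'Computer Science', 99),
('Priya', 260, 'English', 95),
('Riya', 265, 'Maths', 100);
""")

# Students with full marks, ordered by register number.
full_marks = conn.execute(
    "SELECT Name, Marks FROM Students WHERE Marks = 100 ORDER BY Register_No"
).fetchall()
print(full_marks)  # [('Abi', 100), ('Riya', 100)]

# Average mark across all students.
avg = conn.execute("SELECT AVG(Marks) FROM Students").fetchone()[0]
print(avg)  # 98.5

conn.close()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;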

&lt;p&gt;3️⃣ JSON (JavaScript Object Notation)&lt;/p&gt;

&lt;p&gt;JSON stores data as objects of key-value pairs, which can be collected in arrays and nested inside one another. It’s lightweight, flexible, and widely used for web APIs and document-oriented NoSQL databases.&lt;/p&gt;

&lt;p&gt;📄 Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[
  {
    "Name": "Abi",
    "Register_No": 201,
    "Subject": "Statistics",
    "Marks": 100
  },
  {
    "Name": "Mano",
    "Register_No": 250,
    "Subject": "Computer Science",
    "Marks": 99
  },
  {
    "Name": "Priya",
    "Register_No": 260,
    "Subject": "English",
    "Marks": 95
  },
  {
    "Name": "Riya",
    "Register_No": 265,
    "Subject": "Maths",
    "Marks": 100
  }
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ Pros: Human-readable, easy to share, and able to represent nested data.&lt;/p&gt;

&lt;p&gt;⚠️ Cons: Takes more space than binary formats (key names repeat in every record) and is slower to parse at large scale.&lt;/p&gt;
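
&lt;p&gt;A minimal sketch with Python’s built-in json module, using two of the sample records: unlike CSV, numbers are still numbers after parsing, while every record repeats its key names, which is one reason JSON files grow larger than binary formats that store field names once.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json

# Two of the sample records above, as JSON text.
TEXT = """[
  {"Name": "Abi", "Register_No": 201, "Subject": "Statistics", "Marks": 100},
  {"Name": "Mano", "Register_No": 250, "Subject": "Computer Science", "Marks": 99}
]"""

students = json.loads(TEXT)  # a list of dicts

# JSON distinguishes numbers from strings, so no manual casting is needed.
print(type(students[0]["Marks"]).__name__)  # int

# Even the most compact serialization repeats each key name per record.
compact = json.dumps(students, separators=(",", ":"))
print(compact.count('"Register_No"'))  # 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;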

&lt;p&gt;4️⃣ Parquet (Columnar Storage Format)&lt;/p&gt;

&lt;p&gt;Parquet is a binary, columnar format built for efficient data analytics: it stores each column’s values together, so a query can read only the columns it needs.&lt;br&gt;
It is widely supported by tools such as Apache Spark, Hadoop, Amazon Athena, and Google BigQuery.&lt;/p&gt;

&lt;p&gt;📄 Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd

data = {
    "Name": ["Abi", "Mano", "Priya", "Riya"],
    "Register_No": [201, 250, 260, 265],
    "Subject": ["Statistics", "Computer Science", "English", "Maths"],
    "Marks": [100, 99, 95, 100]
}

df = pd.DataFrame(data)
df.to_parquet("students.parquet", engine="pyarrow", index=False)

print("✅ Parquet file created successfully!")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;⚡ Parquet files are not human-readable — they store compressed binary data for faster processing.&lt;/p&gt;

&lt;p&gt;✅ Pros: Strong compression and fast analytical queries, because engines can read only the columns a query needs.&lt;/p&gt;

&lt;p&gt;⚠️ Cons: Needs dedicated libraries (such as pyarrow) to read and write; not ideal for simple text sharing.&lt;/p&gt;
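
&lt;p&gt;Reading and writing real Parquet files needs a library such as pyarrow, so the sketch below is only a toy in plain Python: it reorganizes the sample rows into one list per column, which is the idea behind a columnar layout. The run-length helper is our own illustration of why side-by-side values compress well; actual Parquet files combine row groups, per-column encodings (dictionary and run-length among them), compression, and column statistics.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Toy illustration of a columnar layout using plain Python lists.
# Assumption: lists stand in for storage; real Parquet adds row groups,
# per-column encodings, compression and min/max statistics.
rows = [
    ("Abi", 201, "Statistics", 100),
    ("Mano", 250, "Computer Science", 99),
    ("Priya", 260, "English", 95),
    ("Riya", 265, "Maths", 100),
]
field_names = ("Name", "Register_No", "Subject", "Marks")

# Column-oriented: one list per field instead of one tuple per student.
columns = {name: list(values) for name, values in zip(field_names, zip(*rows))}

# An average over Marks touches only the Marks column.
marks = columns["Marks"]
print(sum(marks) / len(marks))  # 98.5


def run_length_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


# Similar values stored side by side compress well; sorting makes repeats adjacent.
print(run_length_encode(sorted(marks)))  # [[95, 1], [99, 1], [100, 2]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With pandas and pyarrow installed, the same column pruning applies to the file written earlier: pd.read_parquet("students.parquet", columns=["Marks"]) loads only the Marks column.&lt;/p&gt;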

&lt;p&gt;5️⃣ XML (Extensible Markup Language)&lt;/p&gt;

&lt;p&gt;XML represents data using a tag-based structure, making it hierarchical and self-descriptive.&lt;/p&gt;

&lt;p&gt;📄 Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;Students&amp;gt;
  &amp;lt;Student&amp;gt;
    &amp;lt;Name&amp;gt;Abi&amp;lt;/Name&amp;gt;
    &amp;lt;Register_No&amp;gt;201&amp;lt;/Register_No&amp;gt;
    &amp;lt;Subject&amp;gt;Statistics&amp;lt;/Subject&amp;gt;
    &amp;lt;Marks&amp;gt;100&amp;lt;/Marks&amp;gt;
  &amp;lt;/Student&amp;gt;
  &amp;lt;Student&amp;gt;
    &amp;lt;Name&amp;gt;Mano&amp;lt;/Name&amp;gt;
    &amp;lt;Register_No&amp;gt;250&amp;lt;/Register_No&amp;gt;
    &amp;lt;Subject&amp;gt;Computer Science&amp;lt;/Subject&amp;gt;
    &amp;lt;Marks&amp;gt;99&amp;lt;/Marks&amp;gt;
  &amp;lt;/Student&amp;gt;
  &amp;lt;Student&amp;gt;
    &amp;lt;Name&amp;gt;Priya&amp;lt;/Name&amp;gt;
    &amp;lt;Register_No&amp;gt;260&amp;lt;/Register_No&amp;gt;
    &amp;lt;Subject&amp;gt;English&amp;lt;/Subject&amp;gt;
    &amp;lt;Marks&amp;gt;95&amp;lt;/Marks&amp;gt;
  &amp;lt;/Student&amp;gt;
  &amp;lt;Student&amp;gt;
    &amp;lt;Name&amp;gt;Riya&amp;lt;/Name&amp;gt;
    &amp;lt;Register_No&amp;gt;265&amp;lt;/Register_No&amp;gt;
    &amp;lt;Subject&amp;gt;Maths&amp;lt;/Subject&amp;gt;
    &amp;lt;Marks&amp;gt;100&amp;lt;/Marks&amp;gt;
  &amp;lt;/Student&amp;gt;
&amp;lt;/Students&amp;gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ Pros: Self-descriptive and structured; well suited to hierarchical data.&lt;/p&gt;

&lt;p&gt;⚠️ Cons: Verbose and storage-heavy (every value sits between an opening and a closing tag); typically slower to parse than JSON.&lt;/p&gt;
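
&lt;p&gt;A minimal sketch with Python’s standard xml.etree.ElementTree module: it builds the same Students/Student tree as the example above, serializes it, and parses it back. Like CSV, XML element text carries no type, so Marks has to be converted to a number after parsing.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import xml.etree.ElementTree as ET

students = [
    ("Abi", 201, "Statistics", 100),
    ("Mano", 250, "Computer Science", 99),
    ("Priya", 260, "English", 95),
    ("Riya", 265, "Maths", 100),
]

# Build the same Students/Student tree as the XML sample above.
root = ET.Element("Students")
for name, reg_no, subject, marks in students:
    student = ET.SubElement(root, "Student")
    for tag, value in (("Name", name), ("Register_No", reg_no),
                       ("Subject", subject), ("Marks", marks)):
        ET.SubElement(student, tag).text = str(value)

xml_text = ET.tostring(root, encoding="unicode")

# Parse it back; element text is always a string until we convert it.
parsed = ET.fromstring(xml_text)
for s in parsed.findall("Student"):
    print(s.findtext("Name"), int(s.findtext("Marks")))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;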

&lt;p&gt;6️⃣ Avro (Row-based Storage Format)&lt;/p&gt;

&lt;p&gt;Avro is a row-based binary format designed for fast data serialization.&lt;br&gt;
It is schema-based (the schema is written in JSON and stored in the file header alongside the data) and is often used in Apache Kafka and Hadoop ecosystems.&lt;/p&gt;

&lt;p&gt;📄 Schema (students.avsc):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "type": "record",
  "name": "Student",
  "fields": [
    {"name": "Name", "type": "string"},
    {"name": "Register_No", "type": "int"},
    {"name": "Subject", "type": "string"},
    {"name": "Marks", "type": "int"}
  ]
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;📄 Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from fastavro import writer

schema = {
    "type": "record",
    "name": "Student",
    "fields": [
        {"name": "Name", "type": "string"},
        {"name": "Register_No", "type": "int"},
        {"name": "Subject", "type": "string"},
        {"name": "Marks", "type": "int"}
    ]
}

records = [
    {"Name": "Abi", "Register_No": 201, "Subject": "Statistics", "Marks": 100},
    {"Name": "Mano", "Register_No": 250, "Subject": "Computer Science", "Marks": 99},
    {"Name": "Priya", "Register_No": 260, "Subject": "English", "Marks": 95},
    {"Name": "Riya", "Register_No": 265, "Subject": "Maths", "Marks": 100}
]

with open("students.avro", "wb") as out:
    writer(out, schema, records)

print("✅ Avro file created successfully!")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ Pros: Schema-based and consistent; compact binary records that suit streaming and support schema evolution.&lt;/p&gt;

&lt;p&gt;⚠️ Cons: Binary, so not human-readable; requires an Avro library (such as fastavro) to parse.&lt;/p&gt;
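
&lt;p&gt;To show why Avro output is compact but unreadable without its schema, here is a hand-written sketch of how one Student record is encoded, following the Avro specification’s rules: int and long values use zig-zag variable-length encoding, and a string is its byte length followed by its UTF-8 bytes. It is for illustration only; the .avro file written by fastavro above also contains a header with the schema and sync markers, and real code should rely on the library.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Minimal sketch of how Avro's binary encoding lays out one Student record.
# Assumption: hand-written from the Avro spec's rules for int/long and
# string values, for illustration only; use fastavro in real code.

def zigzag(n):
    # Interleave signed values: 0 to 0, -1 to 1, 1 to 2, -2 to 3, ...
    return 2 * n if n == abs(n) else -2 * n - 1


def varint(u):
    # 7 bits per byte, lowest group first; adding 128 flags "more bytes follow".
    out = bytearray()
    while True:
        low, u = u % 128, u // 128
        if u:
            out.append(low + 128)
        else:
            out.append(low)
            return bytes(out)


def encode_long(n):
    # Avro writes both "int" and "long" as a zig-zag varint.
    return varint(zigzag(n))


def encode_string(s):
    # A string is its UTF-8 byte length (as a long) followed by the bytes.
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_student(rec):
    # Fields follow schema order; no field names or separators are stored.
    return (encode_string(rec["Name"]) + encode_long(rec["Register_No"])
            + encode_string(rec["Subject"]) + encode_long(rec["Marks"]))


record = {"Name": "Abi", "Register_No": 201, "Subject": "Statistics", "Marks": 100}
encoded = encode_student(record)
# prints 06 41 62 69 92 03 14 53 74 61 74 69 73 74 69 63 73 c8 01
print(encoded.hex(" "))
print(len(encoded), "bytes")  # 19 bytes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The 19 bytes contain no field names or separators: a reader only knows that the first bytes form a string and the next ones an int because it has the schema, which is why Avro data always travels with its schema.&lt;/p&gt;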

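&lt;p&gt;In short, CSV and JSON work well for simple exchange, SQL tables for structured querying, XML for document-style hierarchies, Parquet for large analytical queries, and Avro for streaming and schema-managed records. Knowing when to use each helps you build efficient, scalable, and cloud-ready data pipelines. 🌥️&lt;/p&gt;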

</description>
      <category>cloud</category>
      <category>dataengineering</category>
      <category>datascience</category>
    </item>
    <item>
      <title>MongoDB: The Yelp Review Chronicles #dataengineering #mongodb #database #learningjourney</title>
      <dc:creator>Prakalya Sambathkumar</dc:creator>
      <pubDate>Tue, 26 Aug 2025 05:01:50 +0000</pubDate>
      <link>https://dev.to/prakalya_sambathkumar_54e/mongodb-the-yelp-review-chronicles-dataengineering-mongodb-database-learningjourney-53l</link>
      <guid>https://dev.to/prakalya_sambathkumar_54e/mongodb-the-yelp-review-chronicles-dataengineering-mongodb-database-learningjourney-53l</guid>
      <description>&lt;p&gt;&lt;strong&gt;Episode 1: The Data Adventure Begins&lt;/strong&gt;&lt;br&gt;
Dive into the vast world of Yelp reviews, where every customer’s opinion shapes the experience of millions. Our stage? MongoDB — the NoSQL powerhouse perfect for handling this diverse data. Our mission? Insert, query, update, and explore insights hidden in these reviews.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Act 1: Setting the Scene — MongoDB Setup&lt;/strong&gt;&lt;br&gt;
Like any great data story, we started by setting up MongoDB. Whether running it locally or in the cloud with MongoDB Atlas, we created a database named yelpDB and a collection named reviews (MongoDB creates both on the first insert), the heart of our review operations.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Forlmsbxzg5rh4tydjalv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Forlmsbxzg5rh4tydjalv.png" alt=" " width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Act 2: Rolling Out the Cast — Insert Records&lt;/strong&gt;&lt;br&gt;
With our stage ready, it was time to introduce the actors. We manually inserted 10 sample Yelp reviews, each carrying key fields such as business_id, review_id, the review text, and rating.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frteux7lmggbfduam8b0b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frteux7lmggbfduam8b0b.png" alt=" " width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Act 3: The Rating Royalty — Top 5 Businesses&lt;/strong&gt;&lt;br&gt;
Who reigns supreme in the Yelp kingdom? MongoDB’s aggregation framework helped us uncover the top 5 businesses with the highest average ratings, proving once again that stars have power.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxq1tv2pjior8qhiqlx7u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxq1tv2pjior8qhiqlx7u.png" alt=" " width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Act 4: The Good Word Mystery&lt;/strong&gt;&lt;br&gt;
What’s the hype about the word “good”? We counted how many reviews mentioned “good” to catch the pulse of positivity (or criticism) in the community.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ecl71i53z9yfnqijbb5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ecl71i53z9yfnqijbb5.png" alt=" " width="800" height="33"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Act 5: Reviews Spotlight — A Business Tale&lt;/strong&gt;&lt;br&gt;
Because every business has its story, we drilled down to look at all reviews for a particular business_id — say "b2" — gathering the voices behind the numbers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdbyrjhhtlmx0mv6d6mxy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdbyrjhhtlmx0mv6d6mxy.png" alt=" " width="800" height="231"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Act 6: The Plot Twists — Update &amp;amp; Delete&lt;/strong&gt;&lt;br&gt;
No story remains static. We updated one review’s rating (review "r1") and deleted another review ("r4") that no longer fit the narrative.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3qqmk3tr1ar671wwmh0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3qqmk3tr1ar671wwmh0.png" alt=" " width="800" height="167"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28merxpfohdrj4mkhn6y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28merxpfohdrj4mkhn6y.png" alt=" " width="800" height="79"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exporting the Chronicles&lt;/strong&gt;&lt;br&gt;
Every great story deserves to be shared. We exported our curated review dataset and query results into JSON and CSV formats for further analysis, archival, and storytelling across platforms.&lt;/p&gt;
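
&lt;p&gt;MongoDB’s mongoexport command-line tool is the usual way to do this (it writes one JSON document per line by default, or CSV with --type=csv and a --fields list). The sketch below does the same job in Python with the standard json and csv modules over hypothetical documents, writing to in-memory buffers instead of files. If the documents still contain _id, its ObjectId values are not serializable by the standard json module; excluding _id, or using bson.json_util from pymongo, avoids that.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import csv
import io
import json

# Hypothetical query results, not the article's data; in practice these
# could come from db.reviews.find({}, {"_id": 0}).
docs = [
    {"review_id": "r1", "business_id": "b1", "rating": 5, "text": "Good food, friendly staff."},
    {"review_id": "r2", "business_id": "b2", "rating": 3, "text": "Not bad."},
]

# JSON export: one document per line (mongoexport's default layout).
json_out = io.StringIO()
for doc in docs:
    json_out.write(json.dumps(doc) + "\n")

# CSV export: a fixed column list, since CSV cannot hold nested fields.
fields = ["review_id", "business_id", "rating", "text"]
csv_out = io.StringIO()
writer = csv.DictWriter(csv_out, fieldnames=fields)
writer.writeheader()
writer.writerows(docs)  # values containing commas are quoted automatically

print(json_out.getvalue())
print(csv_out.getvalue())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;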

&lt;p&gt;By the end of our Yelp Reviews MongoDB journey, we had:&lt;br&gt;
✅ Inserted sample review records&lt;br&gt;
✅ Aggregated businesses by average rating&lt;br&gt;
✅ Counted reviews whose text mentions “good”&lt;br&gt;
✅ Queried reviews by business_id&lt;br&gt;
✅ Updated and deleted records&lt;br&gt;
✅ Exported data for external use&lt;/p&gt;

&lt;p&gt;💡 This hands-on journey mirrors real-world data engineering workflows — from ETL to insights, data maintenance, and exporting essential data products.&lt;/p&gt;

&lt;p&gt;Stay tuned for more seasons of MongoDB exploration and data adventures!&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
