<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: CIBIN S </title>
    <description>The latest articles on DEV Community by CIBIN S  (@cibin_s).</description>
    <link>https://dev.to/cibin_s</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3482844%2F6ba7a6df-94b2-4c46-8827-7d86e6572a7b.png</url>
      <title>DEV Community: CIBIN S </title>
      <link>https://dev.to/cibin_s</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cibin_s"/>
    <language>en</language>
    <item>
      <title>🧠Understanding 6 Common Data Formats in Cloud Data Analytics</title>
      <dc:creator>CIBIN S </dc:creator>
      <pubDate>Mon, 10 Nov 2025 04:41:24 +0000</pubDate>
      <link>https://dev.to/cibin_s/understanding-6-common-data-formats-in-cloud-data-analytics-n9f</link>
      <guid>https://dev.to/cibin_s/understanding-6-common-data-formats-in-cloud-data-analytics-n9f</guid>
      <description>&lt;p&gt;Data analytics relies heavily on how data is stored, exchanged, and processed. Different data formats are optimized for different use cases — from simple spreadsheets to large-scale distributed processing. In this blog, let’s explore six popular data formats used in cloud-based analytics: CSV, SQL, JSON, Parquet, XML, and Avro.&lt;/p&gt;

&lt;p&gt;We’ll use a simple dataset throughout all examples 👇&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Name    Register_No Subject Marks
Arjun   101 Math    90
Priya   102 Science 88
Kavin   103 English 92
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;CSV (Comma Separated Values)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Explanation:&lt;br&gt;
CSV is the simplest and most human-readable format for storing tabular data. Each line represents a row, and commas separate individual values. It’s widely used for data import/export in spreadsheets and analytics tools.&lt;/p&gt;

&lt;p&gt;Example (data.csv):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Name,Register_No,Subject,Marks
Arjun,101,Math,90
Priya,102,Science,88
Kavin,103,English,92
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
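&lt;p&gt;As a quick check, the same rows can be read with Python’s standard csv module (the file text is inlined below so the sketch runs without data.csv on disk):&lt;/p&gt;

```python
import csv
import io

# Inline the data.csv contents so the snippet needs no file on disk.
csv_text = """Name,Register_No,Subject,Marks
Arjun,101,Math,90
Priya,102,Science,88
Kavin,103,English,92
"""

# DictReader maps each data row to a dict keyed by the header line.
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0]["Name"])                      # Arjun
print(sum(int(r["Marks"]) for r in rows))   # 270
```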

&lt;p&gt;&lt;strong&gt;SQL (Relational Table Format)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Explanation:&lt;br&gt;
SQL itself is a query language rather than a storage format; a .sql script captures data as the statements that create and populate structured tables in a relational database, where it can be queried, joined, and managed efficiently.&lt;/p&gt;

&lt;p&gt;Example (data.sql):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE Students (
  Name VARCHAR(20),
  Register_No INT,
  Subject VARCHAR(20),
  Marks INT
);

INSERT INTO Students VALUES ('Arjun', 101, 'Math', 90);
INSERT INTO Students VALUES ('Priya', 102, 'Science', 88);
INSERT INTO Students VALUES ('Kavin', 103, 'English', 92);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
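&lt;p&gt;The same statements can be tried locally with Python’s built-in sqlite3 module (an in-memory database here; SQLite accepts these column declarations as-is):&lt;/p&gt;

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Students (
  Name VARCHAR(20),
  Register_No INT,
  Subject VARCHAR(20),
  Marks INT
)""")
students = [
    ("Arjun", 101, "Math", 90),
    ("Priya", 102, "Science", 88),
    ("Kavin", 103, "English", 92),
]
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?)", students)

# Query it back: highest mark first.
top = conn.execute(
    "SELECT Name, Marks FROM Students ORDER BY Marks DESC"
).fetchone()
print(top)  # ('Kavin', 92)
```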

&lt;p&gt;&lt;strong&gt;JSON (JavaScript Object Notation)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Explanation:&lt;br&gt;
JSON is a lightweight data-interchange format that stores data as key-value pairs. It is easy for both humans and machines to read and is widely used in APIs, web applications, and NoSQL databases.&lt;/p&gt;

&lt;p&gt;Example (data.json):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[
  {"Name": "Arjun", "Register_No": 101, "Subject": "Math", "Marks": 90},
  {"Name": "Priya", "Register_No": 102, "Subject": "Science", "Marks": 88},
  {"Name": "Kavin", "Register_No": 103, "Subject": "English", "Marks": 92}
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
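&lt;p&gt;Parsing this with Python’s json module turns the text into a list of dicts, and json.dumps round-trips it back to a string:&lt;/p&gt;

```python
import json

json_text = """[
  {"Name": "Arjun", "Register_No": 101, "Subject": "Math", "Marks": 90},
  {"Name": "Priya", "Register_No": 102, "Subject": "Science", "Marks": 88},
  {"Name": "Kavin", "Register_No": 103, "Subject": "English", "Marks": 92}
]"""

students = json.loads(json_text)   # list of dicts
print(students[1]["Subject"])      # Science
print(json.dumps(students[0]))     # serialize one record back to JSON text
```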

&lt;p&gt;&lt;strong&gt;Parquet (Columnar Storage Format)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Explanation:&lt;br&gt;
Parquet is an efficient, columnar storage format used in big data systems like Hadoop, Spark, and AWS Athena. It stores data by columns instead of rows, allowing faster read performance and better compression for analytical queries.&lt;/p&gt;

&lt;p&gt;Example (Conceptual View):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Column 1: Name → [Arjun, Priya, Kavin]
Column 2: Register_No → [101, 102, 103]
Column 3: Subject → [Math, Science, English]
Column 4: Marks → [90, 88, 92]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;(In reality, Parquet is a binary format, so the data is stored in compressed column chunks rather than text.)&lt;/p&gt;
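&lt;p&gt;Writing real Parquet files needs a library such as pyarrow; the pure-Python sketch below only illustrates the row-to-column pivot the format is built on:&lt;/p&gt;

```python
rows = [
    {"Name": "Arjun", "Register_No": 101, "Subject": "Math", "Marks": 90},
    {"Name": "Priya", "Register_No": 102, "Subject": "Science", "Marks": 88},
    {"Name": "Kavin", "Register_No": 103, "Subject": "English", "Marks": 92},
]

# Pivot row-oriented records into column-oriented arrays: the core idea
# behind Parquet's layout and why analytical scans compress so well.
columns = {key: [row[key] for row in rows] for key in rows[0]}
print(columns["Marks"])  # [90, 88, 92]

# A column scan (e.g. averaging Marks) now touches one array only.
avg = sum(columns["Marks"]) / len(columns["Marks"])
print(avg)               # 90.0

# With pyarrow installed, the real write would be:
#   import pyarrow as pa, pyarrow.parquet as pq
#   pq.write_table(pa.table(columns), "data.parquet")
```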

&lt;p&gt;&lt;strong&gt;XML (Extensible Markup Language)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Explanation:&lt;br&gt;
XML represents data in a tree structure with tags. It’s commonly used for configuration files, data exchange, and web services (SOAP). Each element is enclosed in start and end tags, providing structure and hierarchy.&lt;/p&gt;

&lt;p&gt;Example (data.xml):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;Students&amp;gt;
  &amp;lt;Student&amp;gt;
    &amp;lt;Name&amp;gt;Arjun&amp;lt;/Name&amp;gt;
    &amp;lt;Register_No&amp;gt;101&amp;lt;/Register_No&amp;gt;
    &amp;lt;Subject&amp;gt;Math&amp;lt;/Subject&amp;gt;
    &amp;lt;Marks&amp;gt;90&amp;lt;/Marks&amp;gt;
  &amp;lt;/Student&amp;gt;
  &amp;lt;Student&amp;gt;
    &amp;lt;Name&amp;gt;Priya&amp;lt;/Name&amp;gt;
    &amp;lt;Register_No&amp;gt;102&amp;lt;/Register_No&amp;gt;
    &amp;lt;Subject&amp;gt;Science&amp;lt;/Subject&amp;gt;
    &amp;lt;Marks&amp;gt;88&amp;lt;/Marks&amp;gt;
  &amp;lt;/Student&amp;gt;
  &amp;lt;Student&amp;gt;
    &amp;lt;Name&amp;gt;Kavin&amp;lt;/Name&amp;gt;
    &amp;lt;Register_No&amp;gt;103&amp;lt;/Register_No&amp;gt;
    &amp;lt;Subject&amp;gt;English&amp;lt;/Subject&amp;gt;
    &amp;lt;Marks&amp;gt;92&amp;lt;/Marks&amp;gt;
  &amp;lt;/Student&amp;gt;
&amp;lt;/Students&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
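&lt;p&gt;With Python’s xml.etree.ElementTree the same tree can be built and walked programmatically (built in code here to keep the sketch self-contained; ET.parse("data.xml") would load the file version instead):&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

# Build the Students tree element by element.
students = ET.Element("Students")
for name, reg, subject, marks in [
    ("Arjun", "101", "Math", "90"),
    ("Priya", "102", "Science", "88"),
    ("Kavin", "103", "English", "92"),
]:
    s = ET.SubElement(students, "Student")
    ET.SubElement(s, "Name").text = name
    ET.SubElement(s, "Register_No").text = reg
    ET.SubElement(s, "Subject").text = subject
    ET.SubElement(s, "Marks").text = marks

# Walk the hierarchy just like any parsed XML document.
names = [s.findtext("Name") for s in students.findall("Student")]
print(names)  # ['Arjun', 'Priya', 'Kavin']
```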



&lt;p&gt;&lt;strong&gt;Avro (Row-Based Storage Format)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Explanation:&lt;br&gt;
Avro is a compact binary format often used in Apache Hadoop and Kafka. It stores data along with its schema, making it ideal for data streaming and serialization between services.&lt;/p&gt;

&lt;p&gt;Example (Schema + Data):&lt;br&gt;
Schema (avro_schema.json):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "type": "record",
  "name": "Student",
  "fields": [
    {"name": "Name", "type": "string"},
    {"name": "Register_No", "type": "int"},
    {"name": "Subject", "type": "string"},
    {"name": "Marks", "type": "int"}
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Data (conceptual view):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"Name": "Arjun", "Register_No": 101, "Subject": "Math", "Marks": 90}
{"Name": "Priya", "Register_No": 102, "Subject": "Science", "Marks": 88}
{"Name": "Kavin", "Register_No": 103, "Subject": "English", "Marks": 92}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;(Stored in binary format during real usage.)&lt;/p&gt;
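&lt;p&gt;Real Avro serialization is handled by a library such as fastavro, which writes the schema alongside the binary records; the sketch below only mimics the schema check in plain Python:&lt;/p&gt;

```python
# The record schema from avro_schema.json, as a Python dict.
schema = {
    "type": "record",
    "name": "Student",
    "fields": [
        {"name": "Name", "type": "string"},
        {"name": "Register_No", "type": "int"},
        {"name": "Subject", "type": "string"},
        {"name": "Marks", "type": "int"},
    ],
}

PYTHON_TYPES = {"string": str, "int": int}

def matches_schema(record, schema):
    # A record must have exactly the declared fields, each with the declared type.
    fields = schema["fields"]
    if set(record) != {f["name"] for f in fields}:
        return False
    return all(
        isinstance(record[f["name"]], PYTHON_TYPES[f["type"]]) for f in fields
    )

ok = matches_schema(
    {"Name": "Arjun", "Register_No": 101, "Subject": "Math", "Marks": 90}, schema
)
bad = matches_schema(
    {"Name": "Arjun", "Register_No": "101", "Subject": "Math", "Marks": 90}, schema
)
print(ok, bad)  # True False

# With fastavro installed, actual binary I/O would be:
#   from fastavro import writer
#   with open("students.avro", "wb") as f:
#       writer(f, schema, records)
```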

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each data format serves a unique purpose:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CSV → Simple and human-readable&lt;/li&gt;
&lt;li&gt;SQL → Structured and relational&lt;/li&gt;
&lt;li&gt;JSON → Flexible and web-friendly&lt;/li&gt;
&lt;li&gt;Parquet → Optimized for analytics&lt;/li&gt;
&lt;li&gt;XML → Hierarchical and descriptive&lt;/li&gt;
&lt;li&gt;Avro → Compact and schema-based&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Choosing the right data format depends on your use case, data size, and processing tools.&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>data</category>
      <category>dataengineering</category>
      <category>cloudcomputing</category>
    </item>
    <item>
      <title>MongoDB Hands-On</title>
      <dc:creator>CIBIN S </dc:creator>
      <pubDate>Sat, 06 Sep 2025 04:29:43 +0000</pubDate>
      <link>https://dev.to/cibin_s/mongodb-hands-on-30cf</link>
      <guid>https://dev.to/cibin_s/mongodb-hands-on-30cf</guid>
      <description>&lt;p&gt;Hello All,&lt;br&gt;
    I’ve been exploring how NoSQL databases work, and MongoDB was the perfect place to start. Its document-oriented structure makes it simple to model real-world data. Unlike relational databases, it doesn’t enforce a predefined schema, which allows far more flexibility. This article covers my step-by-step MongoDB practice project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Setup&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Installed MongoDB Compass locally for an easy GUI-based interaction.&lt;br&gt;
Created a database named yelpDB and a collection named reviews.&lt;br&gt;
Imported and manually added a dataset of sample Yelp-style business reviews.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Tasks Performed&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
Insert Records&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I manually inserted at least 10 records into the reviews collection.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvbeyet9ut78usp6471o.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvbeyet9ut78usp6471o.jpeg" alt="MongoDB Compass screenshot of the inserted review records" width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;
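&lt;p&gt;The insert step as a pymongo sketch (field names are my assumption from the screenshots, not the original dataset; the connection lines are commented out so the snippet stands alone):&lt;/p&gt;

```python
# Assumed shape of the Yelp-style review documents.
reviews = [
    {"business_id": f"B{i}", "business_name": f"Business {i}",
     "rating": 3 + i % 3, "text": "good food and service"}
    for i in range(1, 11)
]

# With a running local MongoDB this is one call via pymongo:
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")
#   client["yelpDB"]["reviews"].insert_many(reviews)

print(len(reviews))  # 10
```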

&lt;p&gt;&lt;strong&gt;3. Queries&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Top 5 Businesses with Highest Average Rating&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using the aggregation pipeline with $group and $sort, I retrieved the top 5 businesses with the highest average ratings.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7o3qufqjteu6cbs5zpn5.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7o3qufqjteu6cbs5zpn5.jpeg" alt="MongoDB Compass screenshot of the top 5 businesses by average rating" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;
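&lt;p&gt;The pipeline in the shape pymongo would send it (field names assumed), with the same $group/$sort/$limit logic replayed in plain Python over sample documents:&lt;/p&gt;

```python
from collections import defaultdict

# Run against a live collection with: collection.aggregate(pipeline)
pipeline = [
    {"$group": {"_id": "$business_name", "avg_rating": {"$avg": "$rating"}}},
    {"$sort": {"avg_rating": -1}},
    {"$limit": 5},
]

# Same logic replayed over sample documents:
docs = [
    {"business_name": "Healthy Bites", "rating": 5},
    {"business_name": "Healthy Bites", "rating": 4},
    {"business_name": "Pizza Hub", "rating": 3},
]
ratings = defaultdict(list)
for d in docs:
    ratings[d["business_name"]].append(d["rating"])
top5 = sorted(
    ((name, sum(r) / len(r)) for name, r in ratings.items()),
    key=lambda pair: pair[1],
    reverse=True,
)[:5]
print(top5)  # [('Healthy Bites', 4.5), ('Pizza Hub', 3.0)]
```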

&lt;ul&gt;
&lt;li&gt;Count Reviews Containing the Word “good”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To analyze sentiment, I searched for reviews that contained the word "good" using a regex query.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsl6nmbou6c1z75gciyfo.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsl6nmbou6c1z75gciyfo.jpeg" alt="MongoDB Compass screenshot of the regex query counting reviews containing the word good" width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;
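&lt;p&gt;A sketch of the regex filter (pymongo form in the comment; the field name is assumed), with the same case-insensitive match replayed over sample texts:&lt;/p&gt;

```python
import re

# pymongo form against a live collection:
#   collection.count_documents({"text": {"$regex": "good", "$options": "i"}})
query = {"text": {"$regex": "good", "$options": "i"}}

# Same check over sample review texts:
texts = ["Good food", "Average experience", "really good coffee"]
pattern = re.compile(query["text"]["$regex"], re.IGNORECASE)
count = sum(1 for t in texts if pattern.search(t))
print(count)  # 2
```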

&lt;ul&gt;
&lt;li&gt;Get All Reviews for a Specific Business ID&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I queried for all reviews belonging to a specific business ID (e.g., B7 – Healthy Bites).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0cd9dmabdc1dpy8k5wqw.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0cd9dmabdc1dpy8k5wqw.jpeg" alt="MongoDB Compass screenshot of all reviews for business B7 (Healthy Bites)" width="800" height="265"&gt;&lt;/a&gt;&lt;/p&gt;
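&lt;p&gt;The equality filter in pymongo form (field name assumed), mirrored over sample documents:&lt;/p&gt;

```python
# pymongo form against a live collection:
#   list(collection.find({"business_id": "B7"}))
query = {"business_id": "B7"}

docs = [
    {"business_id": "B7", "text": "fresh salads"},
    {"business_id": "B2", "text": "slow service"},
    {"business_id": "B7", "text": "great smoothies"},
]
# find() with an equality filter keeps documents whose field matches exactly.
matches = [d for d in docs if d["business_id"] == query["business_id"]]
print(len(matches))  # 2
```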

&lt;ul&gt;
&lt;li&gt;Update a Review &amp;amp; Delete a Record&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I performed both update and delete operations on the dataset:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq44dzd7dbqte716p11hs.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq44dzd7dbqte716p11hs.jpeg" alt="MongoDB Compass screenshot of the update and delete operations" width="800" height="421"&gt;&lt;/a&gt;&lt;br&gt;
Updated a review to reflect improved service.&lt;br&gt;
Deleted one record flagged for removal.&lt;/p&gt;
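&lt;p&gt;The update and delete in pymongo form (comments; filter fields assumed), with their semantics mirrored in plain Python:&lt;/p&gt;

```python
# pymongo forms against a live collection:
#   collection.update_one({"business_id": "B7"},
#                         {"$set": {"text": "Service has improved a lot"}})
#   collection.delete_one({"flagged": True})

docs = [
    {"business_id": "B7", "text": "okay service", "flagged": False},
    {"business_id": "B2", "text": "spam review", "flagged": True},
]

# update_one with $set: overwrite one field on the first matching document.
for d in docs:
    if d["business_id"] == "B7":
        d["text"] = "Service has improved a lot"
        break

# delete_one: drop the first document matching the filter.
docs = [d for d in docs if not d["flagged"]]
print(len(docs), docs[0]["text"])
```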

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
This hands-on exercise provided practical exposure to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Managing data in MongoDB Compass.&lt;/li&gt;
&lt;li&gt;Performing CRUD operations (Create, Read, Update, Delete).&lt;/li&gt;
&lt;li&gt;Writing queries &amp;amp; aggregation pipelines.&lt;/li&gt;
&lt;li&gt;Exporting data for reporting.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Overall, I learned how powerful MongoDB is for handling unstructured data and performing flexible queries without rigid schema constraints.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
