<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rethan Kumar cv</title>
    <description>The latest articles on DEV Community by Rethan Kumar cv (@rethan_kumarcv_c7864238f).</description>
    <link>https://dev.to/rethan_kumarcv_c7864238f</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3461001%2F00b86349-a0d0-4112-8ed3-6fcd8b85b0c9.png</url>
      <title>DEV Community: Rethan Kumar cv</title>
      <link>https://dev.to/rethan_kumarcv_c7864238f</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rethan_kumarcv_c7864238f"/>
    <language>en</language>
    <item>
      <title>📊 Understanding 6 Common Data Formats in Data Analytics</title>
      <dc:creator>Rethan Kumar cv</dc:creator>
      <pubDate>Tue, 07 Oct 2025 04:21:58 +0000</pubDate>
      <link>https://dev.to/rethan_kumarcv_c7864238f/understanding-6-common-data-formats-in-data-analytics-35gi</link>
      <guid>https://dev.to/rethan_kumarcv_c7864238f/understanding-6-common-data-formats-in-data-analytics-35gi</guid>
      <description>&lt;p&gt;&lt;strong&gt;Data formats&lt;/strong&gt; are the backbone of analytics — they determine how efficiently data is stored, transferred, and processed. Whether you’re working with simple spreadsheets or massive big data systems, understanding data formats helps you pick the right tool for the job.&lt;/p&gt;

&lt;p&gt;In this post, let’s explore &lt;strong&gt;six popular data formats&lt;/strong&gt; used in data analytics:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;CSV (Comma-Separated Values)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQL (Relational Table Format)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSON (JavaScript Object Notation)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parquet (Columnar Storage Format)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;XML (Extensible Markup Language)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avro (Row-based Storage Format)&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We’ll use a simple dataset throughout this article to represent the same data in all formats.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧩 Our Sample Dataset
&lt;/h2&gt;

&lt;p&gt;Let’s take a small dataset of student marks:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Register Number&lt;/th&gt;
&lt;th&gt;Subject&lt;/th&gt;
&lt;th&gt;Marks&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Riya Sharma&lt;/td&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;Math&lt;/td&gt;
&lt;td&gt;95&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arjun Patel&lt;/td&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;Science&lt;/td&gt;
&lt;td&gt;88&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Meera Nair&lt;/td&gt;
&lt;td&gt;103&lt;/td&gt;
&lt;td&gt;English&lt;/td&gt;
&lt;td&gt;92&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  1️⃣ CSV (Comma-Separated Values)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;CSV&lt;/strong&gt; is one of the simplest and most human-readable data formats. It stores data as plain text: each line is a record, and the values in a line are separated by commas (a value that itself contains a comma must be wrapped in double quotes).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easy to read and write
&lt;/li&gt;
&lt;li&gt;Supported by almost every data tool (Excel, Python, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Disadvantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No data types (everything is text)&lt;/li&gt;
&lt;li&gt;Doesn’t handle nested or hierarchical data well&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example (CSV):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Name,Register Number,Subject,Marks
Riya Sharma,101,Math,95
Arjun Patel,102,Science,88
Meera Nair,103,English,92
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
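&lt;p&gt;To see the “everything is text” limitation in practice, here is a small sketch using Python’s standard &lt;code&gt;csv&lt;/code&gt; module on the sample data. The data is embedded as a string so the sketch runs without a file. It also shows how a value that contains a comma gets quoted on write. The name &lt;code&gt;"Nair, Meera"&lt;/code&gt; is a made-up example.&lt;/p&gt;

```python
import csv
import io

# The CSV example above, embedded as a string so the sketch runs without a file.
CSV_TEXT = """Name,Register Number,Subject,Marks
Riya Sharma,101,Math,95
Arjun Patel,102,Science,88
Meera Nair,103,English,92
"""

# DictReader maps each line to a dict keyed by the header row.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
print(rows[0]["Marks"], type(rows[0]["Marks"]).__name__)  # 95 str

# CSV carries no types, so the reader has to convert values itself.
marks = [int(row["Marks"]) for row in rows]
print(round(sum(marks) / len(marks), 2))  # 91.67

# On write, a value containing the delimiter is wrapped in double quotes.
buf = io.StringIO()
csv.writer(buf).writerow(["Nair, Meera", 103, "English", 92])
quoted = buf.getvalue()
print(repr(quoted))  # '"Nair, Meera",103,English,92\r\n'
```

&lt;p&gt;With a real file, the &lt;code&gt;csv&lt;/code&gt; documentation recommends opening it with &lt;code&gt;newline=""&lt;/code&gt; rather than reading from an in-memory string.&lt;/p&gt;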






&lt;h2&gt;
  
  
  2️⃣ SQL (Relational Table Format)
&lt;/h2&gt;

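&lt;p&gt;&lt;strong&gt;SQL&lt;/strong&gt; (Structured Query Language) is a language rather than a file format. Relational databases such as MySQL, PostgreSQL, and SQLite store data in tables with defined columns and data types, and SQL is used to define, populate, and query those tables. Data can also be exchanged as a script of SQL statements, like the example below.&lt;/p&gt;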

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Structured and enforceable schema
&lt;/li&gt;
&lt;li&gt;Supports expressive querying (filtering, sorting, joins, aggregation)
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Disadvantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rigid structure (changing the schema requires altering the table)
&lt;/li&gt;
&lt;li&gt;Not suitable for unstructured data
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example (SQL):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;Students&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;RegisterNumber&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Subject&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;Marks&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;Students&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RegisterNumber&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Marks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Riya Sharma'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;101&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Math'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Arjun Patel'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;102&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Science'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;88&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Meera Nair'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;103&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'English'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;92&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
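&lt;p&gt;You can try the statements above without installing a database server by using Python’s built-in &lt;code&gt;sqlite3&lt;/code&gt; module with an in-memory database. The sketch below creates the table, inserts the rows with &lt;code&gt;?&lt;/code&gt; placeholders, and runs two declarative queries. SQLite accepts &lt;code&gt;VARCHAR(50)&lt;/code&gt; but does not enforce the length limit, so stricter engines such as PostgreSQL may behave differently.&lt;/p&gt;

```python
import sqlite3

# In-memory SQLite database; the schema mirrors the CREATE TABLE above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (Name VARCHAR(50), RegisterNumber INT, "
    "Subject VARCHAR(30), Marks INT)"
)
conn.executemany(
    "INSERT INTO Students (Name, RegisterNumber, Subject, Marks) VALUES (?, ?, ?, ?)",
    [
        ("Riya Sharma", 101, "Math", 95),
        ("Arjun Patel", 102, "Science", 88),
        ("Meera Nair", 103, "English", 92),
    ],
)

# Declarative querying: filter and sort without writing any loops.
top = conn.execute(
    "SELECT Name, Marks FROM Students WHERE Marks >= 90 ORDER BY Marks DESC"
).fetchall()
print(top)  # [('Riya Sharma', 95), ('Meera Nair', 92)]

# Aggregation runs inside the database engine.
avg = conn.execute("SELECT AVG(Marks) FROM Students").fetchone()[0]
print(round(avg, 2))  # 91.67
conn.close()
```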






&lt;h2&gt;
  
  
  3️⃣ JSON (JavaScript Object Notation)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;JSON&lt;/strong&gt; stores data as objects (key-value pairs) and arrays, and it is widely used in APIs and web applications. It supports nested structures, and almost every programming language can parse it with a standard or widely used library.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Human-readable
&lt;/li&gt;
&lt;li&gt;Supports nested and complex data structures
&lt;/li&gt;
&lt;li&gt;Widely used in web and API data
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Disadvantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Larger files than CSV, because every record repeats its keys
&lt;/li&gt;
&lt;li&gt;Parsing overhead compared to CSV
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example (JSON):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Riya Sharma"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"RegisterNumber"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;101&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Subject"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Math"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Marks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Arjun Patel"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"RegisterNumber"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;102&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Subject"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Science"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Marks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;88&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Meera Nair"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"RegisterNumber"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;103&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Subject"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"English"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Marks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;92&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
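&lt;p&gt;Here is a short sketch with Python’s standard &lt;code&gt;json&lt;/code&gt; module that illustrates three points from this section. Numbers keep their type after parsing, records can be regrouped into a nested shape, and every key is repeated in every record. The nested &lt;code&gt;report&lt;/code&gt; layout is a hypothetical example, not part of the original dataset.&lt;/p&gt;

```python
import json

# The JSON example above, written one record per line.
JSON_TEXT = """[
  {"Name": "Riya Sharma", "RegisterNumber": 101, "Subject": "Math", "Marks": 95},
  {"Name": "Arjun Patel", "RegisterNumber": 102, "Subject": "Science", "Marks": 88},
  {"Name": "Meera Nair", "RegisterNumber": 103, "Subject": "English", "Marks": 92}
]"""

students = json.loads(JSON_TEXT)

# Unlike CSV, numbers keep their type after parsing.
print(type(students[0]["Marks"]).__name__)  # int

# Nesting: regroup the flat records into a hypothetical nested shape.
report = {
    "students": [
        {"name": s["Name"], "scores": {s["Subject"]: s["Marks"]}} for s in students
    ]
}
print(json.dumps(report["students"][0]))  # {"name": "Riya Sharma", "scores": {"Math": 95}}

# Even compact JSON repeats every key in every record, which inflates file size.
compact = json.dumps(students, separators=(",", ":"))
print(compact.count('"Marks"'))  # 3
```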






&lt;h2&gt;
  
  
  4️⃣ Parquet (Columnar Storage Format)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Parquet&lt;/strong&gt; is a &lt;strong&gt;columnar&lt;/strong&gt; storage format used in big data frameworks like Apache Spark and Hadoop. It stores data column by column instead of row by row. This lets analytical queries read only the columns they need, and because values of the same type are stored together, they usually compress well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Highly compressed and efficient
&lt;/li&gt;
&lt;li&gt;Great for analytical queries (e.g., aggregate functions)
&lt;/li&gt;
&lt;li&gt;Supports complex data types
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Disadvantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not human-readable
&lt;/li&gt;
&lt;li&gt;Needs a dedicated library or engine (e.g., Spark, or pandas with pyarrow) to read and write
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example (Conceptual Representation):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Column 1: Name -&amp;gt; ["Riya Sharma", "Arjun Patel", "Meera Nair"]
Column 2: RegisterNumber -&amp;gt; [101, 102, 103]
Column 3: Subject -&amp;gt; ["Math", "Science", "English"]
Column 4: Marks -&amp;gt; [95, 88, 92]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;(Actual Parquet files are binary and not viewable as text.)&lt;/em&gt;&lt;/p&gt;
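&lt;p&gt;Writing real Parquet files requires a library such as pyarrow, so this sketch uses plain Python only to illustrate the idea behind the conceptual representation above. It is not the Parquet file format itself. The sketch transposes the rows into one list per column, aggregates a single column, and applies run-length encoding to a hypothetical, more repetitive column. Run-length encoding is one of the encoding schemes Parquet can use, and the example shows why columnar data tends to compress well.&lt;/p&gt;

```python
from itertools import groupby

# Row-oriented records, the way CSV, JSON, and Avro lay them out.
rows = [
    ("Riya Sharma", 101, "Math", 95),
    ("Arjun Patel", 102, "Science", 88),
    ("Meera Nair", 103, "English", 92),
]
columns = ["Name", "RegisterNumber", "Subject", "Marks"]

# Transpose into one list per column, mirroring the conceptual layout above.
table = {name: [row[i] for row in rows] for i, name in enumerate(columns)}
print(table["Subject"])  # ['Math', 'Science', 'English']

# The aggregate only accesses the Marks list; a columnar reader can
# likewise skip the other columns entirely.
marks = table["Marks"]
print(max(marks), round(sum(marks) / len(marks), 2))  # 95 91.67

# Run-length encoding on a hypothetical, longer Subject column:
# adjacent repeated values collapse into (value, count) pairs.
subjects = ["Math"] * 4 + ["Science"] * 3
rle = [(value, len(list(run))) for value, run in groupby(subjects)]
print(rle)  # [('Math', 4), ('Science', 3)]
```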




&lt;h2&gt;
  
  
  5️⃣ XML (Extensible Markup Language)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;XML&lt;/strong&gt; represents data using custom tags, similar to HTML. It’s structured and self-descriptive but more verbose than JSON.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Self-descriptive tags
&lt;/li&gt;
&lt;li&gt;Good for hierarchical data
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Disadvantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verbose syntax
&lt;/li&gt;
&lt;li&gt;Generally slower to parse than JSON
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example (XML):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;Students&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;Student&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;Name&amp;gt;&lt;/span&gt;Riya Sharma&lt;span class="nt"&gt;&amp;lt;/Name&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;RegisterNumber&amp;gt;&lt;/span&gt;101&lt;span class="nt"&gt;&amp;lt;/RegisterNumber&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;Subject&amp;gt;&lt;/span&gt;Math&lt;span class="nt"&gt;&amp;lt;/Subject&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;Marks&amp;gt;&lt;/span&gt;95&lt;span class="nt"&gt;&amp;lt;/Marks&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/Student&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;Student&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;Name&amp;gt;&lt;/span&gt;Arjun Patel&lt;span class="nt"&gt;&amp;lt;/Name&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;RegisterNumber&amp;gt;&lt;/span&gt;102&lt;span class="nt"&gt;&amp;lt;/RegisterNumber&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;Subject&amp;gt;&lt;/span&gt;Science&lt;span class="nt"&gt;&amp;lt;/Subject&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;Marks&amp;gt;&lt;/span&gt;88&lt;span class="nt"&gt;&amp;lt;/Marks&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/Student&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;Student&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;Name&amp;gt;&lt;/span&gt;Meera Nair&lt;span class="nt"&gt;&amp;lt;/Name&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;RegisterNumber&amp;gt;&lt;/span&gt;103&lt;span class="nt"&gt;&amp;lt;/RegisterNumber&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;Subject&amp;gt;&lt;/span&gt;English&lt;span class="nt"&gt;&amp;lt;/Subject&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;Marks&amp;gt;&lt;/span&gt;92&lt;span class="nt"&gt;&amp;lt;/Marks&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/Student&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/Students&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
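&lt;p&gt;Python’s standard &lt;code&gt;xml.etree.ElementTree&lt;/code&gt; module can both build and parse this structure. The sketch below builds the same &lt;code&gt;Students&lt;/code&gt;/&lt;code&gt;Student&lt;/code&gt; tree in code rather than pasting the markup, serializes it, and parses it back. Note that &lt;code&gt;Marks&lt;/code&gt; has to be converted to an integer after parsing.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

records = [
    ("Riya Sharma", 101, "Math", 95),
    ("Arjun Patel", 102, "Science", 88),
    ("Meera Nair", 103, "English", 92),
]
fields = ["Name", "RegisterNumber", "Subject", "Marks"]

# Build the same Students/Student tree as the example above.
root = ET.Element("Students")
for record in records:
    student = ET.SubElement(root, "Student")
    for tag, value in zip(fields, record):
        # Without a schema (such as XSD), element text is just a string.
        ET.SubElement(student, tag).text = str(value)

xml_bytes = ET.tostring(root, encoding="utf-8")

# Parse the serialized document back and pull values out by tag name.
parsed = ET.fromstring(xml_bytes)
names = [s.findtext("Name") for s in parsed.findall("Student")]
marks = [int(s.findtext("Marks")) for s in parsed.findall("Student")]
print(names)  # ['Riya Sharma', 'Arjun Patel', 'Meera Nair']
print(marks)  # [95, 88, 92]
```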






&lt;h2&gt;
  
  
  6️⃣ Avro (Row-based Storage Format)
&lt;/h2&gt;

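&lt;p&gt;&lt;strong&gt;Avro&lt;/strong&gt;, an Apache Software Foundation project, is a &lt;strong&gt;row-based&lt;/strong&gt; binary format often used for data serialization in big data pipelines. Data is always written against a &lt;strong&gt;schema&lt;/strong&gt;, and Avro data files store that schema in their header. Because the schema is known, encoded records do not need to repeat field names, which keeps them compact for transmission between systems.&lt;/p&gt;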

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compact binary format
&lt;/li&gt;
&lt;li&gt;Schema-based (ensures consistency)
&lt;/li&gt;
&lt;li&gt;Ideal for streaming and big data
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Disadvantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not human-readable
&lt;/li&gt;
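&lt;li&gt;Schema changes must follow Avro’s schema-resolution rules (for example, newly added fields need default values) so that old and new readers stay compatible
&lt;/li&gt;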
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example (Conceptual Representation):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"record"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Student"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fields"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"RegisterNumber"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"int"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Subject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Marks"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"int"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Data:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[
  {"Name": "Riya Sharma", "RegisterNumber": 101, "Subject": "Math", "Marks": 95},
  {"Name": "Arjun Patel", "RegisterNumber": 102, "Subject": "Science", "Marks": 88},
  {"Name": "Meera Nair", "RegisterNumber": 103, "Subject": "English", "Marks": 92}
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;(Actual Avro data is stored in binary format, not plain text.)&lt;/em&gt;&lt;/p&gt;
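&lt;p&gt;To show why Avro is compact, here is a minimal sketch of how one record from the schema above is laid out in Avro’s binary encoding. It follows the rules in the Avro specification: &lt;code&gt;int&lt;/code&gt; values use zig-zag variable-length encoding, a &lt;code&gt;string&lt;/code&gt; is a length prefix followed by UTF-8 bytes, and a record is its fields concatenated in schema order. The sketch handles only the two types used here. It produces a bare record body, not a complete Avro data file, which would also contain a header with the schema, sync markers, and optional compression. In practice you would use a library such as &lt;code&gt;avro&lt;/code&gt; or &lt;code&gt;fastavro&lt;/code&gt;.&lt;/p&gt;

```python
import json

# The schema from the example above; field order determines the byte layout.
SCHEMA = {
    "type": "record",
    "name": "Student",
    "fields": [
        {"name": "Name", "type": "string"},
        {"name": "RegisterNumber", "type": "int"},
        {"name": "Subject", "type": "string"},
        {"name": "Marks", "type": "int"},
    ],
}


def encode_long(n: int) -> bytes:
    """Zig-zag encode a signed int, then write it as a base-128 varint."""
    z = n * 2 if n >= 0 else -n * 2 - 1  # 0 to 0, -1 to 1, 1 to 2, -2 to 3, ...
    out = bytearray()
    while True:
        low7, z = z % 128, z // 128
        if z:
            out.append(low7 + 128)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_record(record: dict, schema: dict) -> bytes:
    """Concatenate field values in schema order; field names are not stored."""
    out = bytearray()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] == "int":
            out += encode_long(value)
        elif field["type"] == "string":
            data = value.encode("utf-8")
            out += encode_long(len(data)) + data  # length prefix, then bytes
        else:
            raise ValueError("type not handled in this sketch: " + field["type"])
    return bytes(out)


riya = {"Name": "Riya Sharma", "RegisterNumber": 101, "Subject": "Math", "Marks": 95}
encoded = encode_record(riya, SCHEMA)
print(encoded)  # b'\x16Riya Sharma\xca\x01\x08Math\xbe\x01'
print(len(encoded), len(json.dumps(riya)))  # binary record vs. JSON text size
```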




&lt;h2&gt;
  
  
  🧠 Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Structure&lt;/th&gt;
&lt;th&gt;Readability&lt;/th&gt;
&lt;th&gt;Common Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CSV&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Row-based&lt;/td&gt;
&lt;td&gt;✅ Human-readable&lt;/td&gt;
&lt;td&gt;Spreadsheets, simple data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SQL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tables (typed columns)&lt;/td&gt;
&lt;td&gt;✅ Human-readable (as SQL statements)&lt;/td&gt;
&lt;td&gt;Relational databases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JSON&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Key-value&lt;/td&gt;
&lt;td&gt;✅ Human-readable&lt;/td&gt;
&lt;td&gt;Web APIs, configurations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Parquet&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Columnar&lt;/td&gt;
&lt;td&gt;❌ Binary&lt;/td&gt;
&lt;td&gt;Big data analytics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;XML&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tag-based&lt;/td&gt;
&lt;td&gt;✅ Human-readable&lt;/td&gt;
&lt;td&gt;Legacy web services&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Avro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Row-based (binary)&lt;/td&gt;
&lt;td&gt;❌ Binary&lt;/td&gt;
&lt;td&gt;Data streaming, serialization&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🚀 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Each data format has its own strengths: CSV for simplicity, JSON for flexibility, SQL for structure, and Parquet or Avro for performance in big data environments.&lt;/p&gt;

&lt;p&gt;Choosing the right one depends on &lt;strong&gt;your data size, structure, and use case&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;em&gt;In modern analytics, you’ll often see multiple formats working together — for example, CSV input files transformed into Parquet for efficient querying in Spark.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  ✍️ Author: Rethan Kumar
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;UI/UX Designer | Tech Enthusiast | Exploring Data and Design&lt;/em&gt;&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>analytics</category>
      <category>datascience</category>
      <category>database</category>
    </item>
    <item>
      <title>MongoDB CRUD in the Shell — InsertMany, Update, Regex Count, Delete, and Top N Sort</title>
      <dc:creator>Rethan Kumar cv</dc:creator>
      <pubDate>Tue, 26 Aug 2025 16:57:48 +0000</pubDate>
      <link>https://dev.to/rethan_kumarcv_c7864238f/mongodb-crud-in-the-shell-insertmany-update-regex-count-delete-and-top-n-sort-46pa</link>
      <guid>https://dev.to/rethan_kumarcv_c7864238f/mongodb-crud-in-the-shell-insertmany-update-regex-count-delete-and-top-n-sort-46pa</guid>
      <description>&lt;p&gt;Recently, I had the chance to work hands-on with MongoDB using a sample games dataset. This activity gave me a deeper understanding of how NoSQL databases operate, from storing and managing data to performing meaningful analysis.&lt;/p&gt;

&lt;h2&gt;Setting Up the Environment&lt;/h2&gt;

&lt;p&gt;I began by installing and running MongoDB Compass, a graphical interface for MongoDB that also includes an embedded MongoDB shell. I then created a new database called gameDB to store and manage all the records.&lt;/p&gt;

&lt;h2&gt;Importing the Dataset&lt;/h2&gt;

&lt;p&gt;The dataset was provided in JSON format and included the following fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Game ID&lt;/li&gt;
&lt;li&gt;Title&lt;/li&gt;
&lt;li&gt;Genre&lt;/li&gt;
&lt;li&gt;Rating&lt;/li&gt;
&lt;li&gt;Review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once imported, the data appeared neatly structured inside a collection, making it easy to explore.&lt;/p&gt;

&lt;h2&gt;Hands-On with CRUD Operations&lt;/h2&gt;

&lt;p&gt;🔹 Inserting Records&lt;br&gt;
To get comfortable, I wrote 10 new game entries by hand and added them in a single batch with &lt;code&gt;insertMany()&lt;/code&gt;. This exercise helped me understand how MongoDB handles data insertion.&lt;/p&gt;

&lt;p&gt;🔹 Querying the Data&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Top 5 games by rating:&lt;/strong&gt; I sorted the games by rating, highest first, and kept the top five results.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Keyword search in reviews:&lt;/strong&gt; I counted the reviews that match a regular expression for the term “good”, which showed how MongoDB handles pattern-based text queries.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reviews by game ID:&lt;/strong&gt; I retrieved the reviews for one specific game, which showed how targeted queries work.&lt;/li&gt;
&lt;/ul&gt;
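&lt;p&gt;The post doesn’t show the queries themselves. In mongosh they would typically look like &lt;code&gt;db.games.find().sort({ rating: -1 }).limit(5)&lt;/code&gt;, &lt;code&gt;db.games.countDocuments({ review: { $regex: "good", $options: "i" } })&lt;/code&gt;, and &lt;code&gt;db.games.find({ game_id: 3 })&lt;/code&gt;. The collection name &lt;code&gt;games&lt;/code&gt;, the field names, and the case-insensitive &lt;code&gt;i&lt;/code&gt; option are all assumptions. The plain-Python sketch below shows what each query computes, using a few made-up documents.&lt;/p&gt;

```python
import re

# Hypothetical documents; field names and values are made up for illustration.
games = [
    {"game_id": 1, "title": "Star Quest", "genre": "RPG", "rating": 9.1, "review": "Really good story"},
    {"game_id": 2, "title": "Turbo Kart", "genre": "Racing", "rating": 7.4, "review": "Fun but short"},
    {"game_id": 3, "title": "Block Town", "genre": "Sandbox", "rating": 8.8, "review": "Good for kids"},
    {"game_id": 4, "title": "Night Ops", "genre": "Shooter", "rating": 6.9, "review": "Not very good"},
    {"game_id": 5, "title": "Puzzle Pier", "genre": "Puzzle", "rating": 8.2, "review": "Clever levels"},
    {"game_id": 6, "title": "Sky Farm", "genre": "Simulation", "rating": 7.9, "review": "Relaxing"},
]


def top_n(docs, n):
    """Sort by rating, highest first, and keep n documents (sort + limit)."""
    return sorted(docs, key=lambda d: d["rating"], reverse=True)[:n]


def count_regex(docs, field, pattern):
    """Count documents whose field matches the pattern, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return sum(1 for d in docs if rx.search(d.get(field, "")))


def find_by(docs, field, value):
    """Equality filter on a single field."""
    return [d for d in docs if d.get(field) == value]


print([g["title"] for g in top_n(games, 5)])
# ['Star Quest', 'Block Town', 'Puzzle Pier', 'Sky Farm', 'Turbo Kart']
print(count_regex(games, "review", "good"))  # 3
print(find_by(games, "game_id", 3)[0]["review"])  # Good for kids
```

&lt;p&gt;The regex count includes “Not very good”. A plain pattern match finds the word but does not judge sentiment, so keyword counts are best treated as a rough signal.&lt;/p&gt;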

&lt;p&gt;🔹 Updating &amp;amp; Deleting&lt;br&gt;
I experimented with updating a review to reflect new feedback and deleting an outdated record. This highlighted MongoDB’s flexibility in handling dynamic data.&lt;/p&gt;
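&lt;p&gt;The exact commands aren’t shown in the post. In mongosh, the typical calls would be &lt;code&gt;db.games.updateOne({ game_id: 2 }, { $set: { review: "Fun, with great new tracks" } })&lt;/code&gt; and &lt;code&gt;db.games.deleteOne({ game_id: 3 })&lt;/code&gt;, where the collection name, field names, and values are hypothetical. The plain-Python sketch below mimics the core semantics of these calls on an in-memory list. Only the first matching document is changed or removed, and &lt;code&gt;$set&lt;/code&gt; overwrites only the listed fields.&lt;/p&gt;

```python
# Hypothetical documents; field names and values are made up for illustration.
games = [
    {"game_id": 1, "title": "Star Quest", "review": "Really good story"},
    {"game_id": 2, "title": "Turbo Kart", "review": "Fun but short"},
    {"game_id": 3, "title": "Old Racer", "review": "Outdated"},
]


def matches(doc, filt):
    """True if every field in the filter equals the document's value."""
    return all(doc.get(key) == value for key, value in filt.items())


def update_one(docs, filt, set_fields):
    """Like updateOne(filter, {$set: ...}): change only the first match."""
    for doc in docs:
        if matches(doc, filt):
            doc.update(set_fields)
            return 1  # matched count
    return 0


def delete_one(docs, filt):
    """Like deleteOne(filter): remove only the first match."""
    for i, doc in enumerate(docs):
        if matches(doc, filt):
            del docs[i]
            return 1  # deleted count
    return 0


matched = update_one(games, {"game_id": 2}, {"review": "Fun, with great new tracks"})
deleted = delete_one(games, {"game_id": 3})
print(matched, deleted)  # 1 1
print([g["game_id"] for g in games])  # [1, 2]
```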

&lt;h2&gt;Exporting Results&lt;/h2&gt;

&lt;p&gt;After performing the queries, I exported results into JSON and CSV formats. This step was useful for sharing insights and conducting further analysis outside the database.&lt;/p&gt;
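&lt;p&gt;Compass can export results directly. If you want to do the same in code, here is a rough sketch with Python’s standard &lt;code&gt;json&lt;/code&gt; and &lt;code&gt;csv&lt;/code&gt; modules. It assumes the query results are already a list of dicts, and the documents and field names are hypothetical. CSV needs a fixed list of columns, so nested fields would have to be flattened first. The sketch writes to in-memory buffers so it runs anywhere; for a real export, swap in files opened with &lt;code&gt;newline=""&lt;/code&gt;.&lt;/p&gt;

```python
import csv
import io
import json

# Hypothetical query results, e.g. the top-rated games.
results = [
    {"game_id": 1, "title": "Star Quest", "rating": 9.1},
    {"game_id": 3, "title": "Block Town", "rating": 8.8},
]

# JSON export: one array, with numeric types preserved.
json_text = json.dumps(results, indent=2)

# CSV export: a header row from the chosen field names, then one line per document.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["game_id", "title", "rating"])
writer.writeheader()
writer.writerows(results)
csv_text = buf.getvalue()
print(csv_text)
```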

&lt;h2&gt;Key Learnings&lt;/h2&gt;

&lt;p&gt;Working with MongoDB on this dataset gave me valuable insights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How CRUD operations (Create, Read, Update, Delete) work in a NoSQL environment&lt;/li&gt;
&lt;li&gt;How to query, filter, and analyze semi-structured data effectively&lt;/li&gt;
&lt;li&gt;How easily results can be exported for use outside the database&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;This project showed me how MongoDB can be an excellent choice for handling semi-structured datasets like game reviews. It combines flexibility, scalability, and powerful querying features, making it well-suited for modern data-driven applications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkeceh30scy1sfxkwtg4m.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkeceh30scy1sfxkwtg4m.jpg" alt=" " width="800" height="170"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjpdeez7v70gfvcwe32v1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjpdeez7v70gfvcwe32v1.jpg" alt=" " width="800" height="290"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzs0waycrl3lhgzpi9g9h.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzs0waycrl3lhgzpi9g9h.jpg" alt=" " width="800" height="58"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm7bfr0yjun6pab5b30f1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm7bfr0yjun6pab5b30f1.jpg" alt=" " width="800" height="39"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1y9a3ikx4y6iowhgpwg6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1y9a3ikx4y6iowhgpwg6.jpg" alt=" " width="800" height="162"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foavx3bqdmdk3hvcwd6a7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foavx3bqdmdk3hvcwd6a7.jpg" alt=" " width="800" height="435"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>mongodb</category>
      <category>database</category>
      <category>tutorial</category>
      <category>javascript</category>
    </item>
  </channel>
</rss>
