<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vignesh K</title>
    <description>The latest articles on DEV Community by Vignesh K (@vignesh_k_165855f8c465905).</description>
    <link>https://dev.to/vignesh_k_165855f8c465905</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3462461%2F63e79408-62bb-452c-8880-30160e7f9179.png</url>
      <title>DEV Community: Vignesh K</title>
      <link>https://dev.to/vignesh_k_165855f8c465905</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vignesh_k_165855f8c465905"/>
    <language>en</language>
    <item>
      <title>Data Formats</title>
      <dc:creator>Vignesh K</dc:creator>
      <pubDate>Mon, 06 Oct 2025 04:45:24 +0000</pubDate>
      <link>https://dev.to/vignesh_k_165855f8c465905/data-formats-217i</link>
      <guid>https://dev.to/vignesh_k_165855f8c465905/data-formats-217i</guid>
      <description>&lt;h1&gt;
  
  
  Understanding Popular Data Formats: CSV, SQL, JSON, XML, Avro, and Parquet
&lt;/h1&gt;

&lt;p&gt;When working in data engineering, analytics, or backend systems, you’ll come across many data formats. Each has its own strengths, weaknesses, and use cases. Let’s explore the most common ones:&lt;/p&gt;




&lt;h2&gt;
  
  
  1. CSV (Comma-Separated Values)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; A plain text file in which each line is a record and values are separated by commas (or another delimiter such as a tab or semicolon).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use case:&lt;/strong&gt; Simple tabular data, easy import/export.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; Human-readable, lightweight.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; No support for nested data, no schema enforcement, and no data types (every value is read back as text).
&lt;/li&gt;
&lt;/ul&gt;
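
&lt;p&gt;As a minimal sketch of the "no schema enforcement" point, the Python snippet below (standard library only; the column names and values are made up for illustration) writes two rows to an in-memory CSV and reads them back. Every value returns as a string, so the reader has to convert numeric columns itself.&lt;/p&gt;

```python
import csv
import io

# Hypothetical sample rows; "stars" starts out as an integer.
rows = [
    {"name": "Cafe A", "stars": 4},
    {"name": "Diner B", "stars": 5},
]

# Write the rows to an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "stars"])
writer.writeheader()
writer.writerows(rows)

# Read the same text back; CSV carries no type information to restore.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))

print(loaded[0])  # the stars value is now the string "4"

# An explicit conversion is needed before doing arithmetic.
total = sum(int(row["stars"]) for row in loaded)
print(total)
```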

&lt;h2&gt;
  
  
  2. SQL (Structured Query Language)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
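&lt;strong&gt;What it is:&lt;/strong&gt; A language for interacting with relational databases rather than a file format in the strict sense. It is included here because relational tables, with rows, columns, and a defined schema, are one of the most common ways to store structured data.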
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use case:&lt;/strong&gt; Storing structured data in relational databases (MySQL, PostgreSQL, etc.).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; Strong schema, powerful queries, and ACID transactions in most relational databases.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Less flexible for unstructured or frequently changing data.
&lt;/li&gt;
&lt;/ul&gt;
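
&lt;p&gt;To illustrate the "strong schema" and "powerful queries" points, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The table name, columns, and rows are assumptions chosen for the example; other relational databases behave similarly but differ in details.&lt;/p&gt;

```python
import sqlite3

# In-memory database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews ("
    " id INTEGER PRIMARY KEY,"
    " business TEXT NOT NULL,"
    " stars INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO reviews (business, stars) VALUES (?, ?)",
    [("Cafe A", 4), ("Cafe A", 5), ("Diner B", 3)],
)

# The declared schema rejects a row that is missing a required column.
try:
    conn.execute("INSERT INTO reviews (business) VALUES ('Diner B')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# A declarative query: average stars per business.
averages = conn.execute(
    "SELECT business, AVG(stars) FROM reviews "
    "GROUP BY business ORDER BY business"
).fetchall()
print(rejected, averages)
```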

&lt;h2&gt;
  
  
  3. JSON (JavaScript Object Notation)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; A lightweight text format for representing structured, nested data using key-value pairs and arrays.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use case:&lt;/strong&gt; Web APIs, configurations, NoSQL databases.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; Human-readable, supports hierarchy.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Verbose for large datasets, because field names are repeated in every record.
&lt;/li&gt;
&lt;/ul&gt;
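
&lt;p&gt;The hierarchy support can be seen in a short round trip with Python's json module. The record below is invented for illustration; the point is only that nested objects, lists, and number types survive serialization.&lt;/p&gt;

```python
import json

# Hypothetical nested record: a business with a list of reviews.
record = {
    "business": "Cafe A",
    "location": {"city": "Springfield", "zip": "00000"},
    "reviews": [
        {"stars": 4, "text": "Good coffee"},
        {"stars": 5, "text": "Great service"},
    ],
}

text = json.dumps(record, indent=2)  # serialize to human-readable text
restored = json.loads(text)          # parse back into dicts and lists

print(text)
print(restored["reviews"][0]["stars"])  # nested value is still a number
```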

&lt;h2&gt;
  
  
  4. XML (eXtensible Markup Language)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; A markup-based format using tags to represent data.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use case:&lt;/strong&gt; Legacy systems, document storage, SOAP APIs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; Supports metadata, validation with DTD/XSD.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Verbose and harder to read than JSON.
&lt;/li&gt;
&lt;/ul&gt;
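
&lt;p&gt;As a rough sketch of how tags and attributes carry data and metadata, the snippet below builds and re-parses a tiny document with Python's xml.etree.ElementTree. The element and attribute names are assumptions made up for the example, and no DTD/XSD validation is shown.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

# Build a small document; element and attribute names are illustrative.
business = ET.Element("business", attrib={"id": "b1"})
review = ET.SubElement(business, "review", attrib={"stars": "4"})
review.text = "Good coffee"

# Serialize to a text string.
xml_text = ET.tostring(business, encoding="unicode")
print(xml_text)

# Parse the text back and read the attribute (metadata) and element text.
parsed = ET.fromstring(xml_text)
first = parsed.find("review")
print(first.get("stars"), first.text)
```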

&lt;h2&gt;
  
  
  5. Avro
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; A row-based binary serialization format from the Apache Avro project, with schemas defined in JSON, commonly used in data pipelines.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use case:&lt;/strong&gt; Kafka messaging, big data serialization.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; Compact; supports schema evolution.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Not human-readable (binary format).
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  6. Parquet
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; A columnar storage format optimized for analytics.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use case:&lt;/strong&gt; Big data processing (Spark, Hadoop, Amazon Athena).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; Compressed, fast query performance, great for large-scale analytics.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Not human-readable; poorly suited to frequent row-level updates.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Quick Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Readable?&lt;/th&gt;
&lt;th&gt;Best Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CSV&lt;/td&gt;
&lt;td&gt;Row-based&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;Simple tabular data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQL&lt;/td&gt;
&lt;td&gt;Relational&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;Databases with strong schema&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON&lt;/td&gt;
&lt;td&gt;Hierarchical&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;APIs, configs, NoSQL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;XML&lt;/td&gt;
&lt;td&gt;Hierarchical&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;Legacy systems, structured documents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avro&lt;/td&gt;
&lt;td&gt;Row-based binary&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;Messaging, streaming pipelines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parquet&lt;/td&gt;
&lt;td&gt;Columnar binary&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;Big data analytics, fast queries&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  Final Thoughts
&lt;/h3&gt;

&lt;p&gt;Each format shines in different scenarios:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;CSV&lt;/strong&gt; for small tabular data.
&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;SQL&lt;/strong&gt; for structured databases.
&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;JSON&lt;/strong&gt; for modern APIs.
&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;XML&lt;/strong&gt; when working with older systems.
&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;Avro&lt;/strong&gt; for messaging pipelines.
&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;Parquet&lt;/strong&gt; for analytics at scale.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd39kwcowr51dmygqqwel.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd39kwcowr51dmygqqwel.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>data</category>
      <category>analytics</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Working with Yelp Data in MongoDB: CRUD Operations and Queries</title>
      <dc:creator>Vignesh K</dc:creator>
      <pubDate>Wed, 27 Aug 2025 10:27:50 +0000</pubDate>
      <link>https://dev.to/vignesh_k_165855f8c465905/working-with-yelp-data-in-mongodb-crud-operations-and-queries-5d0o</link>
      <guid>https://dev.to/vignesh_k_165855f8c465905/working-with-yelp-data-in-mongodb-crud-operations-and-queries-5d0o</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhq9rte5f0397axhfkcr3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhq9rte5f0397axhfkcr3.png" alt=" " width="800" height="696"&gt;&lt;/a&gt;In this blog post, we'll explore how to perform basic CRUD operations and queries on a MongoDB database named mydb with a collection called yelp, based on a Yelp dataset. We'll use MongoDB's JavaScript-based query syntax to insert records, find top-rated businesses, count reviews containing specific words, retrieve reviews for a business, update a review, and delete a record. Let's dive in!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Inserting Records into the Yelp Collection&lt;/strong&gt;&lt;br&gt;
To populate the yelp collection, we can use the insertMany method to add multiple documents at once. Below is an example of inserting 10 review documents, each with fields like business_id, date, review_id, stars, text, type, user_id, cool, useful, and funny.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F59w3x9unbaapme6o7f75.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F59w3x9unbaapme6o7f75.png" alt=" " width="800" height="463"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F22r30sspv38oeo4wil0n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F22r30sspv38oeo4wil0n.png" alt=" " width="800" height="681"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftad4xp1zx7bfjqd6i88l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftad4xp1zx7bfjqd6i88l.png" alt=" " width="800" height="688"&gt;&lt;/a&gt;&lt;/p&gt;
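
&lt;p&gt;Since the code above is shown only as screenshots, here is a rough text sketch of the same step in Python. It assumes a PyMongo-style collection object with an insert_many method; the helper names and the sample documents are hypothetical and are not the ten documents from the screenshots. Only the field names come from the description above.&lt;/p&gt;

```python
from datetime import datetime


def make_review(business_id, review_id, stars, text):
    """Build one review document with the fields listed above."""
    return {
        "business_id": business_id,
        "date": datetime(2025, 1, 15),  # placeholder date
        "review_id": review_id,
        "stars": stars,
        "text": text,
        "type": "review",
        "user_id": "U_EXAMPLE",  # hypothetical user id
        "cool": 0,
        "useful": 0,
        "funny": 0,
    }


def insert_reviews(collection, docs):
    """Insert all docs at once and return how many were inserted.

    `collection` is assumed to be a PyMongo Collection.
    """
    result = collection.insert_many(docs)
    return len(result.inserted_ids)


# Made-up sample documents, not the ones from the screenshots.
sample_docs = [
    make_review("B_EXAMPLE_1", "R_EXAMPLE_1", 5, "Good food, friendly staff"),
    make_review("B_EXAMPLE_1", "R_EXAMPLE_2", 4, "Nice place"),
    make_review("B_EXAMPLE_2", "R_EXAMPLE_3", 3, "Average service"),
]
```

&lt;p&gt;With a local server running and PyMongo installed, the collection could be obtained as MongoClient()["mydb"]["yelp"] and passed to insert_reviews together with sample_docs.&lt;/p&gt;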

&lt;p&gt;&lt;strong&gt;2. Finding the Top 5 Businesses by Average Rating&lt;/strong&gt;&lt;br&gt;
To find the top 5 businesses with the highest average rating, we use MongoDB's aggregation pipeline. The pipeline groups reviews by 'business_id', computes the average 'stars', sorts in descending order, and limits to 5 results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffag4v7p7t9f6kpttrfrh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffag4v7p7t9f6kpttrfrh.png" alt=" " width="800" height="191"&gt;&lt;/a&gt;&lt;br&gt;
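For the inserted data, this query returns the following (business ID, then average rating). The three businesses tied at 4.5 may come back in any order, because the sort uses only the average:&lt;/p&gt;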

&lt;ul&gt;
&lt;li&gt;'C3D4E5F6G7H8I9J0K1L2M3': 4.5&lt;/li&gt;
&lt;li&gt;'E5F6G7H8I9J0K1L2M3N4O5': 4.5&lt;/li&gt;
&lt;li&gt;'A1B2C3D4E5F6G7H8I9J0K1': 4.5&lt;/li&gt;
&lt;li&gt;'D4E5F6G7H8I9J0K1L2M3N4': 4&lt;/li&gt;
&lt;li&gt;'B2C3D4E5F6G7H8I9J0K1L2': 3.5&lt;/li&gt;
&lt;/ul&gt;
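
&lt;p&gt;The pipeline described in this section (group, average, sort, limit) might look like this in Python. This is a sketch rather than the exact code from the screenshot: it assumes a PyMongo-style collection, and the output field name avg_stars is a made-up choice.&lt;/p&gt;

```python
# Stages mirror the description: group by business, average the stars,
# sort from highest to lowest, and keep the first five.
TOP_RATED_PIPELINE = [
    {"$group": {"_id": "$business_id", "avg_stars": {"$avg": "$stars"}}},
    {"$sort": {"avg_stars": -1}},
    {"$limit": 5},
]


def top_rated(collection):
    """Return (business_id, average) pairs for the top five businesses.

    `collection` is assumed to be a PyMongo Collection.
    """
    cursor = collection.aggregate(TOP_RATED_PIPELINE)
    return [(doc["_id"], doc["avg_stars"]) for doc in cursor]
```

&lt;p&gt;To make the order of tied businesses repeatable, the sort stage could be extended to {"avg_stars": -1, "_id": 1}.&lt;/p&gt;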

&lt;p&gt;&lt;strong&gt;3. Counting Reviews Containing the Word “Good”&lt;/strong&gt;&lt;br&gt;
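To count reviews with the word 'good' in the 'text' field, we use countDocuments with a case-insensitive regex. A plain pattern also matches longer words such as 'goodness'; add word boundaries (\bgood\b) if only the whole word should count.&lt;/p&gt;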

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo4ovrhedmpuy7qbi4okd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo4ovrhedmpuy7qbi4okd.png" alt=" " width="800" height="85"&gt;&lt;/a&gt;&lt;br&gt;
This returns '3' for the inserted data, as three reviews contain 'good' (review IDs 'R4V5W6X7Y8Z9A0B1C2D3E4', 'R6X7Y8Z9A0B1C2D3E4F5G6', 'R9A0B1C2D3E4F5G6H7I8J9').&lt;/p&gt;
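
&lt;p&gt;A possible Python equivalent of this count, assuming a PyMongo-style collection (count_documents is PyMongo's counterpart to countDocuments). The function name is hypothetical.&lt;/p&gt;

```python
def count_reviews_containing(collection, word):
    """Count reviews whose text matches `word`, ignoring case.

    `collection` is assumed to be a PyMongo Collection. The word is used
    as a regex pattern, so it also matches inside longer words.
    """
    query = {"text": {"$regex": word, "$options": "i"}}
    return collection.count_documents(query)
```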

&lt;p&gt;&lt;strong&gt;4. Retrieving All Reviews for a Specific Business&lt;/strong&gt;&lt;br&gt;
To fetch all reviews for a specific 'business_id', we use the find method. For example, to get reviews for '9yKzy9PApeiPPOUJEtnvkg' (from the sample Yelp data):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Forv5gvhynoqrvi02e5dy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Forv5gvhynoqrvi02e5dy.png" alt=" " width="767" height="81"&gt;&lt;/a&gt;&lt;/p&gt;
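
&lt;p&gt;In text form, the lookup could be sketched as below. It assumes a PyMongo-style collection whose find method returns a cursor; the function name is made up for the example.&lt;/p&gt;

```python
def reviews_for_business(collection, business_id):
    """Return every review document for one business as a list.

    `collection` is assumed to be a PyMongo Collection; find() returns a
    cursor, which list() reads to the end.
    """
    return list(collection.find({"business_id": business_id}))
```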

&lt;p&gt;&lt;strong&gt;5. Updating a Review&lt;/strong&gt;&lt;br&gt;
To update a review, we use updateOne to modify specific fields. For example, to update the review with 'review_id' 'fWKvX83p0-ka4JS3dc6E5A' (from the sample data, if inserted):&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F23s8mbsdi09f4ac0dusi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F23s8mbsdi09f4ac0dusi.png" alt=" " width="703" height="85"&gt;&lt;/a&gt;&lt;br&gt;
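As a hedged sketch of the same update in Python, assuming a PyMongo-style collection. Which fields the screenshot changes is not stated in the text, so updating 'stars' and 'text' here is an assumption, as is the function name.&lt;br&gt;

```python
def update_review(collection, review_id, new_stars, new_text):
    """Set new stars and text on one review.

    `collection` is assumed to be a PyMongo Collection. Returns the
    number of documents that were actually modified (0 or 1).
    """
    result = collection.update_one(
        {"review_id": review_id},
        {"$set": {"stars": new_stars, "text": new_text}},
    )
    return result.modified_count
```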
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;These queries assume you're using MongoDB with the 'yelp' collection in the 'mydb' database.&lt;/li&gt;
&lt;li&gt;The 'date' field uses MongoDB's Date object for proper storage. When importing a CSV, ensure dates are parsed correctly (e.g., new Date("YYYY-MM-DD")).&lt;/li&gt;
&lt;li&gt;The regex in the 'good' query is case-insensitive ('$options: "i"').&lt;/li&gt;
&lt;li&gt;To import a large 'yelp.csv' file, you can use MongoDB's mongoimport tool (with --type csv and --headerline) or a script to parse and insert the data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This setup provides a solid foundation for working with Yelp data in MongoDB. You can extend these queries for more complex analyses, like filtering by date or combining conditions. Happy coding!&lt;/p&gt;

</description>
      <category>mongodb</category>
      <category>programming</category>
      <category>beginners</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
