<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Amit Mishra</title>
    <description>The latest articles on DEV Community by Amit Mishra (@kingsterdam).</description>
    <link>https://dev.to/kingsterdam</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3943868%2F4aa9e2b5-35ba-46ce-b237-227fcda1ca5d.gif</url>
      <title>DEV Community: Amit Mishra</title>
      <link>https://dev.to/kingsterdam</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kingsterdam"/>
    <language>en</language>
    <item>
      <title>Big Data Is Not Just About “Huge Data”</title>
      <dc:creator>Amit Mishra</dc:creator>
      <pubDate>Thu, 21 May 2026 10:36:05 +0000</pubDate>
      <link>https://dev.to/kingsterdam/-big-data-is-not-just-about-huge-data-4bp7</link>
      <guid>https://dev.to/kingsterdam/-big-data-is-not-just-about-huge-data-4bp7</guid>
      <description>&lt;p&gt;When I first started learning about Big Data, I thought it was mainly about storing massive amounts of information.&lt;/p&gt;

&lt;p&gt;But after working around real enterprise systems and large-scale pipelines, I realized the real challenge is not simply the size of the data.&lt;/p&gt;

&lt;p&gt;It’s everything that comes with it.&lt;/p&gt;

&lt;p&gt;As systems grow, data starts arriving from everywhere:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;APIs&lt;/li&gt;
&lt;li&gt;applications&lt;/li&gt;
&lt;li&gt;IoT devices&lt;/li&gt;
&lt;li&gt;logs&lt;/li&gt;
&lt;li&gt;databases&lt;/li&gt;
&lt;li&gt;streaming platforms&lt;/li&gt;
&lt;li&gt;user interactions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Very quickly, managing that ecosystem becomes more difficult than writing the actual transformation logic.&lt;/p&gt;

&lt;p&gt;One thing that surprised me while working with large datasets was how small inefficiencies suddenly become major production issues at scale.&lt;/p&gt;

&lt;p&gt;A query that runs fine on a few million rows can become extremely slow once the dataset grows 100x. Similarly, a poorly optimized Spark job can consume huge amounts of resources without anyone noticing right away.&lt;/p&gt;

&lt;p&gt;That’s when concepts like partitioning, distributed processing, incremental loading, and monitoring stop being theory and start to matter in practice.&lt;/p&gt;
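
&lt;p&gt;To make incremental loading concrete, here is a minimal sketch in plain Python. It assumes each row carries an &lt;code&gt;updated_at&lt;/code&gt; timestamp and that the pipeline stores a "watermark" from its last successful run. The function and field names are illustrative, not taken from any specific tool.&lt;/p&gt;

```python
from datetime import datetime


def incremental_load(rows, watermark):
    """Return the rows newer than the watermark, plus the advanced watermark."""
    # Only rows updated after the last successful run are picked up.
    new_rows = [r for r in rows if r["updated_at"] > watermark]
    # Advance the watermark to the newest row seen; keep it if nothing is new.
    new_watermark = max((r["updated_at"] for r in new_rows), default=watermark)
    return new_rows, new_watermark


# Hypothetical source table with three rows.
rows = [
    {"id": 1, "updated_at": datetime(2026, 5, 1)},
    {"id": 2, "updated_at": datetime(2026, 5, 10)},
    {"id": 3, "updated_at": datetime(2026, 5, 20)},
]
batch, watermark = incremental_load(rows, datetime(2026, 5, 5))
```

&lt;p&gt;Instead of reprocessing the whole table on every run, the job only touches rows that changed since the stored watermark, which is roughly why this pattern matters once tables get large.&lt;/p&gt;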

&lt;p&gt;Another interesting thing about Big Data is how much engineering discipline matters.&lt;/p&gt;

&lt;p&gt;People often focus heavily on tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spark&lt;/li&gt;
&lt;li&gt;Kafka&lt;/li&gt;
&lt;li&gt;Hadoop&lt;/li&gt;
&lt;li&gt;Delta Lake&lt;/li&gt;
&lt;li&gt;Fabric&lt;/li&gt;
&lt;li&gt;Databricks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But architecture decisions usually matter even more than the technology itself.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how data is partitioned&lt;/li&gt;
&lt;li&gt;how pipelines recover from failures&lt;/li&gt;
&lt;li&gt;how retries are handled&lt;/li&gt;
&lt;li&gt;how monitoring is implemented&lt;/li&gt;
&lt;li&gt;how teams access shared datasets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These decisions quietly affect performance, scalability, and operational stability later.&lt;/p&gt;
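
&lt;p&gt;As one small illustration of the retry question, the sketch below retries a failing step with exponential backoff. It is a simplified example under my own assumptions: &lt;code&gt;run_with_retries&lt;/code&gt; and its parameters are hypothetical names, and a real pipeline would usually retry only errors known to be transient.&lt;/p&gt;

```python
import time


def run_with_retries(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run task(), retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Wait 1x, 2x, 4x ... the base delay before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

&lt;p&gt;The &lt;code&gt;sleep&lt;/code&gt; argument is injectable so the waiting behavior can be tested without actually pausing.&lt;/p&gt;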

&lt;p&gt;One thing I personally enjoy about data engineering is that it sits somewhere between software engineering and infrastructure engineering.&lt;/p&gt;

&lt;p&gt;You are not just writing code.&lt;/p&gt;

&lt;p&gt;You are designing systems that continuously move and process large amounts of data reliably.&lt;/p&gt;

&lt;p&gt;And honestly, the operational side becomes very real once pipelines move into production.&lt;/p&gt;

&lt;p&gt;Sometimes the hardest problem is not processing the data.&lt;br&gt;
It’s figuring out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;why a job failed at 2 AM&lt;/li&gt;
&lt;li&gt;why a cluster suddenly slowed down&lt;/li&gt;
&lt;li&gt;why downstream reports show incomplete data&lt;/li&gt;
&lt;li&gt;or why one dependency broke an entire workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s where monitoring and observability become just as important as the data pipeline itself.&lt;/p&gt;
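
&lt;p&gt;A very simple form of this is a completeness check that compares how many rows were expected with how many actually landed. The sketch below is only an illustration; the function name and the &lt;code&gt;tolerance&lt;/code&gt; parameter are my own assumptions, not part of any monitoring product.&lt;/p&gt;

```python
def check_completeness(source_count, loaded_count, tolerance=0.0):
    """Return (ok, missing_fraction) comparing loaded rows to source rows."""
    if source_count == 0:
        # Nothing was expected, so the load is complete only if nothing arrived.
        return loaded_count == 0, 0.0
    # Fraction of source rows that never arrived (extra rows are not counted).
    missing_fraction = max(source_count - loaded_count, 0) / source_count
    return tolerance >= missing_fraction, missing_fraction


# Hypothetical run: 1000 rows expected, 950 loaded, 1% loss tolerated.
ok, missing = check_completeness(1000, 950, tolerance=0.01)
```

&lt;p&gt;Running a check like this at the end of a load and alerting when it fails is one way to catch incomplete data before it reaches downstream reports.&lt;/p&gt;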

&lt;p&gt;I also think Big Data is becoming even more interesting now because of AI.&lt;/p&gt;

&lt;p&gt;Modern AI systems depend heavily on data quality, scalable storage, fast processing, and reliable pipelines. In many ways, data engineering has quietly become one of the foundations they are built on.&lt;/p&gt;

&lt;p&gt;The more I explore this space, the more I feel that Big Data engineering is less about handling “big files” and more about building reliable systems that can survive complexity at scale.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
