<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rocio Baigorria</title>
    <description>The latest articles on DEV Community by Rocio Baigorria (@tuni56).</description>
    <link>https://dev.to/tuni56</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3799944%2Fb65f81d7-eb72-4bca-b3c4-986071aada7f.png</url>
      <title>DEV Community: Rocio Baigorria</title>
      <link>https://dev.to/tuni56</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tuni56"/>
    <language>en</language>
    <item>
      <title>Your Serverless Data Lake is Lying to You (Add Observability or Lose Data)</title>
      <dc:creator>Rocio Baigorria</dc:creator>
      <pubDate>Mon, 20 Apr 2026 13:15:30 +0000</pubDate>
      <link>https://dev.to/tuni56/your-serverless-data-lake-is-lying-to-you-add-observability-or-lose-data-5cpf</link>
      <guid>https://dev.to/tuni56/your-serverless-data-lake-is-lying-to-you-add-observability-or-lose-data-5cpf</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Serverless data lakes scale well, but can fail silently.&lt;/li&gt;
&lt;li&gt;Without observability, you risk incomplete or incorrect data.&lt;/li&gt;
&lt;li&gt;Add a DLQ to capture failed events.&lt;/li&gt;
&lt;li&gt;Use Amazon CloudWatch + Amazon SNS for real visibility.&lt;/li&gt;
&lt;li&gt;Trade-off: More components, but far more reliable pipelines.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Moment I Stopped Trusting "Successful Pipelines"
&lt;/h3&gt;

&lt;p&gt;It was 2 AM.&lt;br&gt;
The pipeline had "completed successfully."&lt;br&gt;
Amazon Athena was returning results.&lt;br&gt;
But the numbers didn’t match.&lt;/p&gt;

&lt;p&gt;Digging into Amazon CloudWatch logs, I found the issue:&lt;br&gt;
Messages were stuck in a queue no one was monitoring.&lt;br&gt;
No alerts. No visible errors. Just missing data.&lt;/p&gt;

&lt;p&gt;Serverless systems don’t fail loudly. They fail silently.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Typical Setup (and the Hidden Risk)
&lt;/h3&gt;

&lt;p&gt;Most people build serverless data lakes like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amazon S3 → storage&lt;/li&gt;
&lt;li&gt;AWS Glue → transformations&lt;/li&gt;
&lt;li&gt;Amazon Athena → querying&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It works.&lt;br&gt;
But it assumes that if the pipeline runs… the data is correct.&lt;br&gt;
That assumption is dangerous.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Was Missing: Observability
&lt;/h3&gt;

&lt;p&gt;The problem wasn’t compute or storage. It was visibility.&lt;/p&gt;

&lt;p&gt;I couldn’t answer basic questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did all events get processed?&lt;/li&gt;
&lt;li&gt;Did anything fail permanently?&lt;/li&gt;
&lt;li&gt;Is data delayed or missing?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you can’t answer those, you don’t have a production system.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Fix: Design for Failure
&lt;/h3&gt;

&lt;p&gt;I reworked the architecture for an e-commerce analytics demo with one rule: Every failure must be visible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Add a Buffer (S3 → SQS)&lt;/strong&gt;&lt;br&gt;
Instead of triggering jobs directly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amazon S3 emits events&lt;/li&gt;
&lt;li&gt;Amazon SQS captures them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why it matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Decoupling&lt;/li&gt;
&lt;li&gt;Retry control&lt;/li&gt;
&lt;li&gt;No lost events on spikes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Add a DLQ (Non-Negotiable)&lt;/strong&gt;&lt;br&gt;
Give every queue a dead-letter queue (DLQ).&lt;br&gt;
Once a message has exhausted its retries, it moves to the DLQ instead of being dropped.&lt;/p&gt;

&lt;p&gt;Now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nothing disappears&lt;/li&gt;
&lt;li&gt;You can inspect failures&lt;/li&gt;
&lt;li&gt;You can replay data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without a DLQ, you’re guessing.&lt;/p&gt;
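
&lt;p&gt;As a rough illustration, the retry-then-DLQ behaviour is configured on the source queue through a redrive policy. The sketch below only builds the queue attributes; &lt;code&gt;RedrivePolicy&lt;/code&gt;, &lt;code&gt;deadLetterTargetArn&lt;/code&gt;, and &lt;code&gt;maxReceiveCount&lt;/code&gt; are standard SQS attribute names, while the ARN, the queue names, the retry count of 5, and the helper name are assumptions made up for this example.&lt;/p&gt;

```python
import json


def queue_attributes_with_dlq(dlq_arn: str, max_receive_count: int = 5) -> dict:
    """Build SQS queue attributes that route repeatedly failing messages to a DLQ.

    max_receive_count is how many times a message may be received before
    SQS moves it to the dead-letter queue. The default of 5 is an assumption.
    """
    if 1 > max_receive_count:
        raise ValueError("max_receive_count must be at least 1")
    redrive_policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    }
    # SQS expects the policy serialized as a JSON string.
    return {"RedrivePolicy": json.dumps(redrive_policy)}


# Hypothetical ARN for the demo; replace it with your own DLQ.
attrs = queue_attributes_with_dlq("arn:aws:sqs:us-east-1:123456789012:ingest-dlq")
print(attrs["RedrivePolicy"])
# With boto3 this dict could be passed as:
#   sqs.create_queue(QueueName="ingest-queue", Attributes=attrs)
```

&lt;p&gt;A low retry count surfaces poison messages quickly; a higher one tolerates transient errors. Which is right depends on how flaky the downstream job is.&lt;/p&gt;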

&lt;p&gt;&lt;strong&gt;3. Keep Orchestration Simple&lt;/strong&gt;&lt;br&gt;
AWS Lambda consumes messages from SQS and triggers AWS Glue jobs.&lt;br&gt;
No heavy orchestrators needed for this use case.&lt;/p&gt;
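
&lt;p&gt;A minimal sketch of what such a Lambda function might look like. It assumes the SQS message body is a standard S3 event notification and that the event source mapping has partial batch responses (&lt;code&gt;ReportBatchItemFailures&lt;/code&gt;) enabled. The &lt;code&gt;start_job&lt;/code&gt; callable is a hypothetical stand-in injected for illustration; in a real function it could wrap boto3's &lt;code&gt;glue.start_job_run&lt;/code&gt;.&lt;/p&gt;

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(sqs_event: dict) -> list:
    """Return (message_id, bucket, key) for every S3 object in an SQS batch."""
    found = []
    for record in sqs_event.get("Records", []):
        body = json.loads(record["body"])
        # An S3 event notification carries its own "Records" list.
        for s3_record in body.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 notifications.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            found.append((record["messageId"], bucket, key))
    return found


def make_handler(start_job):
    """Create a Lambda-style handler around an injected job starter.

    start_job(bucket, key) is a hypothetical callable; injecting it lets
    the sketch run without AWS credentials.
    """
    def handler(event, context):
        failures = []
        for message_id, bucket, key in extract_s3_objects(event):
            try:
                start_job(bucket, key)
            except Exception:
                # Reported messages return to the queue and, after enough
                # failed receives, end up in the DLQ.
                entry = {"itemIdentifier": message_id}
                if entry not in failures:
                    failures.append(entry)
        return {"batchItemFailures": failures}

    return handler
```

&lt;p&gt;Returning only the failed message ids means one bad file does not force the whole batch to be retried.&lt;/p&gt;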

&lt;p&gt;&lt;strong&gt;4. Optimize for Analytics&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Raw data in S3 (CSV/JSON)&lt;/li&gt;
&lt;li&gt;Transform to Parquet&lt;/li&gt;
&lt;li&gt;Partition by date&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This keeps costs down and queries fast in Amazon Athena.&lt;/p&gt;

&lt;h3&gt;
  
  
  Observability (The Part Most People Skip)
&lt;/h3&gt;

&lt;p&gt;This is the difference between "it works" and "it’s reliable".&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Metrics (Amazon CloudWatch)
&lt;ul&gt;
&lt;li&gt;Queue depth&lt;/li&gt;
&lt;li&gt;DLQ size&lt;/li&gt;
&lt;li&gt;Glue job failures&lt;/li&gt;
&lt;li&gt;Lambda errors&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Alerts (Amazon SNS)
&lt;ul&gt;
&lt;li&gt;DLQ &amp;gt; 0 → alert&lt;/li&gt;
&lt;li&gt;Glue job fails → alert&lt;/li&gt;
&lt;li&gt;Pipeline inactivity → alert&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If something breaks, you should know immediately.&lt;/p&gt;
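
&lt;p&gt;The "DLQ not empty" alert can be sketched as a CloudWatch alarm on the queue's visible-message count. The function below only assembles the arguments; the parameter names match CloudWatch's &lt;code&gt;PutMetricAlarm&lt;/code&gt; API, while the alarm name, the 60-second period, and the helper itself are assumptions for illustration.&lt;/p&gt;

```python
def dlq_alarm_params(dlq_name: str, sns_topic_arn: str) -> dict:
    """Keyword arguments for put_metric_alarm: alert when the DLQ is not empty."""
    return {
        "AlarmName": f"{dlq_name}-not-empty",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": dlq_name}],
        "Statistic": "Maximum",
        "Period": 60,  # assumed evaluation period, in seconds
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        # An idle queue with no datapoints should not page anyone.
        "TreatMissingData": "notBreaching",
        # Notify the SNS topic when the alarm fires.
        "AlarmActions": [sns_topic_arn],
    }


# With boto3 the dict could be applied as:
#   boto3.client("cloudwatch").put_metric_alarm(**dlq_alarm_params(name, arn))
```

&lt;p&gt;The same shape works for queue depth or Lambda errors by swapping the namespace, metric, and threshold.&lt;/p&gt;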

&lt;h3&gt;
  
  
  Trade-Offs
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What you gain:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reliable data pipelines&lt;/li&gt;
&lt;li&gt;Full visibility&lt;/li&gt;
&lt;li&gt;Faster debugging&lt;/li&gt;
&lt;li&gt;Confidence in your data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What you pay:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More moving parts (SQS, DLQ, Lambda)&lt;/li&gt;
&lt;li&gt;Slight increase in cost&lt;/li&gt;
&lt;li&gt;Extra setup for monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Real Decision
&lt;/h3&gt;

&lt;p&gt;You’re not choosing between simple and complex.&lt;br&gt;
You’re choosing between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A simple system that hides failures&lt;/li&gt;
&lt;li&gt;A system that tells you when it breaks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For production systems, that’s not optional.&lt;/p&gt;

&lt;h3&gt;
  
  
  Final Thought
&lt;/h3&gt;

&lt;p&gt;Serverless removes infrastructure.&lt;br&gt;
It does NOT remove responsibility.&lt;/p&gt;

&lt;p&gt;If you don’t design for observability:&lt;br&gt;
Your system will fail quietly—and you won’t know when.&lt;/p&gt;

&lt;p&gt;How are you handling failures in your pipelines?&lt;br&gt;
Do you have a DLQ… or are you trusting logs? 👇&lt;/p&gt;

</description>
      <category>aws</category>
      <category>dataengineering</category>
      <category>serverless</category>
      <category>observability</category>
    </item>
    <item>
      <title>Stop Babysitting Servers: Build a Scalable Serverless Data Lake on AWS</title>
      <dc:creator>Rocio Baigorria</dc:creator>
      <pubDate>Mon, 06 Apr 2026 21:07:48 +0000</pubDate>
      <link>https://dev.to/tuni56/stop-babysitting-servers-build-a-scalable-serverless-data-lake-on-aws-2pn7</link>
      <guid>https://dev.to/tuni56/stop-babysitting-servers-build-a-scalable-serverless-data-lake-on-aws-2pn7</guid>
      <description>&lt;p&gt;Building data pipelines shouldn't feel like babysitting servers. If you’ve ever managed a dedicated cluster just to run a few SQL queries, you know the pain: capacity planning, idle costs, and the "fun" of scaling infrastructure at 3 AM.&lt;/p&gt;

&lt;p&gt;As a Data Engineering professional, I always follow a simple mantra: Design, then exist. (Or in this case: Design serverless, then relax.)&lt;/p&gt;

&lt;p&gt;Today, we’re breaking down how to centralize your fragmented data into a Serverless Data Lake using the "Big Three" of AWS: S3, Glue, and Athena.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Serverless?
&lt;/h2&gt;

&lt;p&gt;The beauty of a serverless approach is the decoupling of storage from compute. You only pay for what you store and what you process.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Amazon S3&lt;/strong&gt; (The Backbone)&lt;br&gt;
S3 is your central repository. A professional setup doesn't just "dump" data; it organizes it into layers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Raw Layer: The "Source of Truth." Data exactly as it arrived (CSV, JSON, logs).&lt;/li&gt;
&lt;li&gt;Curated Layer: Cleaned, partitioned, and optimized data (usually in Parquet format).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AWS Glue&lt;/strong&gt; (The Librarian)&lt;br&gt;
You don't want to manually define schemas. Glue Crawlers scan your S3 buckets, infer the data types, and populate the Glue Data Catalog, which acts as a central metadata repository.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Amazon Athena&lt;/strong&gt; (The Engine)&lt;br&gt;
Athena is an interactive query service that lets you run standard SQL directly against your files in S3. There are no clusters to spin up and no infrastructure to manage.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Quick Implementation: From S3 to SQL
&lt;/h2&gt;

&lt;p&gt;Ingest: Upload your dataset into your raw S3 bucket.&lt;/p&gt;

&lt;p&gt;Catalog: Point a Glue Crawler at that bucket. Once it finishes, you'll see a new table in your Data Catalog.&lt;/p&gt;

&lt;p&gt;Query: Open the Athena Console and run your analysis:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Aggregating sales data directly from S3 files
SELECT
    region,
    SUM(amount) AS total_sales
FROM "data_lake_db"."sales_curated"
GROUP BY region
ORDER BY total_sales DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Data Engineer Pro-Tips
&lt;/h2&gt;

&lt;p&gt;If you're moving from a POC to production, keep these two things in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Friends don't let friends use CSV for Analytics: Convert your data to Apache Parquet. Because it’s a columnar format, Athena only reads the columns you actually query. This can reduce your query costs by up to 90%.&lt;/li&gt;
&lt;li&gt;Partitioning is King: Organize your S3 paths by date (e.g., s3://my-bucket/year=2026/month=04/). This limits the amount of data Athena has to scan, making your queries lightning-fast.&lt;/li&gt;
&lt;/ul&gt;
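
&lt;p&gt;To make the partitioning tip concrete, here is a small sketch that builds Hive-style date-partitioned object keys of the form shown above. The &lt;code&gt;sales&lt;/code&gt; prefix, the file name, and the helper name are assumptions for illustration.&lt;/p&gt;

```python
from datetime import date


def partitioned_key(prefix: str, day: date, filename: str) -> str:
    """Build a Hive-style partitioned S3 key (year=YYYY/month=MM)."""
    return f"{prefix}/year={day.year:04d}/month={day.month:02d}/{filename}"


print(partitioned_key("sales", date(2026, 4, 6), "part-0000.parquet"))
# prints: sales/year=2026/month=04/part-0000.parquet
```

&lt;p&gt;Zero-padding the month keeps the paths sorted lexically, and a query that filters on &lt;code&gt;year&lt;/code&gt; and &lt;code&gt;month&lt;/code&gt; only needs to touch the matching prefixes.&lt;/p&gt;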

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Serverless Data Lakes allow us to experiment fast. You can build a proof-of-concept in an afternoon and scale it to petabytes without ever touching a Linux terminal.&lt;/p&gt;

&lt;p&gt;Are you using a Data Lake at your company, or are you still sticking with traditional Data Warehouses? Let's talk about the pros and cons in the comments!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>dataengineering</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Flink + AI: Building Real-Time Decision Systems (Not Just Data Pipelines)</title>
      <dc:creator>Rocio Baigorria</dc:creator>
      <pubDate>Tue, 31 Mar 2026 11:57:20 +0000</pubDate>
      <link>https://dev.to/tuni56/flink-ai-building-real-time-decision-systems-not-just-data-pipelines-2j89</link>
      <guid>https://dev.to/tuni56/flink-ai-building-real-time-decision-systems-not-just-data-pipelines-2j89</guid>
      <description>&lt;p&gt;&lt;strong&gt;The problem is no longer moving data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For years, “real-time” meant pushing data from transactional systems into dashboards as fast as possible.&lt;/p&gt;

&lt;p&gt;That’s no longer enough.&lt;/p&gt;

&lt;p&gt;Today, while events are still happening, something — or someone — needs to decide.&lt;/p&gt;

&lt;p&gt;The bottleneck isn’t speed anymore.&lt;br&gt;
It’s context.&lt;/p&gt;

&lt;p&gt;An AI model without fresh context makes poor decisions.&lt;br&gt;
A pipeline without governance creates noise.&lt;br&gt;
A stateless system cannot understand what’s actually happening.&lt;/p&gt;

&lt;p&gt;In a world measured in milliseconds, moving data isn’t the goal.&lt;br&gt;
We need systems that understand context and act while the data is still valuable.&lt;/p&gt;

&lt;p&gt;This forces a shift in mindset:&lt;/p&gt;

&lt;p&gt;from data pipelines → to decision architectures&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The power stack: Flink + AI agents&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is where Apache Flink enters the picture.&lt;/p&gt;

&lt;p&gt;Flink is not just another streaming engine.&lt;br&gt;
It’s designed to process events where state and time are first-class citizens.&lt;/p&gt;

&lt;p&gt;Two capabilities make it critical:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stateful processing → it keeps memory across events. You don’t just see the current data point; you see its recent history.&lt;/li&gt;
&lt;li&gt;Windowing → it groups events over time (seconds, minutes, hours) to detect patterns instead of isolated signals.&lt;/li&gt;
&lt;/ul&gt;
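
&lt;p&gt;To illustrate the two ideas without any framework, here is a plain-Python sketch (not the Flink API) of a keyed tumbling-window count. The dictionary stands in for keyed state, and the 60-second window size and the sample events are assumptions.&lt;/p&gt;

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds: int = 60) -> dict:
    """Count events per (key, window start).

    events is an iterable of (key, epoch_seconds) pairs. The dictionary plays
    the role of keyed state: it remembers what was seen earlier for each key.
    """
    state = defaultdict(int)
    for key, timestamp in events:
        # Align each timestamp to the start of its fixed-size window.
        window_start = timestamp - (timestamp % window_seconds)
        state[(key, window_start)] += 1
    return dict(state)


clicks = [("user-1", 5), ("user-1", 42), ("user-2", 59), ("user-1", 61)]
print(tumbling_window_counts(clicks))
# {('user-1', 0): 2, ('user-2', 0): 1, ('user-1', 60): 1}
```

&lt;p&gt;A real streaming engine does this incrementally and persists the state with checkpoints, but the grouping logic is the same.&lt;/p&gt;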

&lt;p&gt;Now combine that with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an event backbone like Kafka&lt;/li&gt;
&lt;li&gt;AI agents (for example, powered by Amazon Bedrock or similar platforms)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The flow changes completely:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Events enter through Kafka&lt;/li&gt;
&lt;li&gt;Flink processes, cleans, aggregates, and maintains state&lt;/li&gt;
&lt;li&gt;The output feeds an AI agent with fresh, structured context&lt;/li&gt;
&lt;li&gt;The agent doesn’t just answer — it acts&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is the critical shift:&lt;/p&gt;

&lt;p&gt;You’re no longer asking&lt;br&gt;
“What happened?”&lt;/p&gt;

&lt;p&gt;You’re asking&lt;br&gt;
“What should I do now?”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use case: the data “purifier”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Think about it this way.&lt;/p&gt;

&lt;p&gt;You wouldn’t drink water directly from a raw source.&lt;br&gt;
You need a purifier to remove impurities and make it safe.&lt;/p&gt;

&lt;p&gt;Data works the same way.&lt;/p&gt;

&lt;p&gt;An AI agent fed with raw event streams will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;mix old and new signals&lt;/li&gt;
&lt;li&gt;lose temporal context&lt;/li&gt;
&lt;li&gt;produce inconsistent or “hallucinated” decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Flink plays the role of that purifier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deduplicates events&lt;/li&gt;
&lt;li&gt;corrects out-of-order data&lt;/li&gt;
&lt;li&gt;enriches streams with state&lt;/li&gt;
&lt;li&gt;filters noise&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is a clean, reliable stream of truth.&lt;/p&gt;
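
&lt;p&gt;A rough plain-Python sketch of two of those purifier steps, deduplication and reordering. It is a batch simplification rather than Flink code, and the &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;ts&lt;/code&gt; field names are hypothetical.&lt;/p&gt;

```python
def purify(events: list) -> list:
    """Drop duplicate event ids, then return events in event-time order.

    Each event is a dict with hypothetical "id" and "ts" fields. A streaming
    engine would do this incrementally, keeping seen ids in state and using
    watermarks to decide when late events can no longer arrive.
    """
    seen = set()
    unique = []
    for event in events:
        if event["id"] in seen:
            continue  # duplicate delivery, skip it
        seen.add(event["id"])
        unique.append(event)
    # Restore event-time order for anything that arrived out of order.
    return sorted(unique, key=lambda e: e["ts"])


raw = [{"id": "b", "ts": 2}, {"id": "a", "ts": 1}, {"id": "b", "ts": 2}]
print(purify(raw))
# [{'id': 'a', 'ts': 1}, {'id': 'b', 'ts': 2}]
```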

&lt;p&gt;When that stream reaches the AI agent, everything changes.&lt;/p&gt;

&lt;p&gt;The agent is no longer reacting to fragmented inputs.&lt;br&gt;
It operates on a coherent, real-time representation of reality.&lt;/p&gt;

&lt;p&gt;And in real-time systems, that’s the difference between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;automating decisions&lt;/li&gt;
&lt;li&gt;or scaling mistakes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;From pipelines to systems that decide&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We’re entering a phase where the value is no longer in visualizing data, but in acting on it at the right moment.&lt;/p&gt;

&lt;p&gt;Flink is not just a processing tool.&lt;br&gt;
It’s a foundational layer for building systems that understand context.&lt;/p&gt;

&lt;p&gt;AI agents don’t replace this layer.&lt;br&gt;
They depend on it.&lt;/p&gt;

&lt;p&gt;Right now, I’m going deep into this stack — preparing for the Data Streaming World Tour and working toward Flink certification — with a clear focus:&lt;/p&gt;

&lt;p&gt;designing systems where data doesn’t just flow, but drives real-time decisions&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The real question&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;How are you managing state in your AI agents in production?&lt;/p&gt;

</description>
      <category>eventdriven</category>
      <category>agenticai</category>
      <category>dataengineering</category>
      <category>flink</category>
    </item>
    <item>
      <title>Kafka and Data Streaming: From Batch Thinking to Real-Time Systems</title>
      <dc:creator>Rocio Baigorria</dc:creator>
      <pubDate>Tue, 24 Mar 2026 14:38:54 +0000</pubDate>
      <link>https://dev.to/tuni56/kafka-and-data-streaming-from-batch-thinking-to-real-time-systems-h80</link>
      <guid>https://dev.to/tuni56/kafka-and-data-streaming-from-batch-thinking-to-real-time-systems-h80</guid>
      <description>&lt;p&gt;Most systems don’t fail because of scale. They fail because they were designed for a world that no longer exists.&lt;/p&gt;

&lt;p&gt;A world where data arrives late, gets processed in batches, and decisions can wait.&lt;/p&gt;

&lt;p&gt;That world is gone.&lt;/p&gt;

&lt;p&gt;Today, data moves continuously. Payments, user behavior, logistics, fraud signals — everything is happening in motion. If your system waits, you lose.&lt;/p&gt;

&lt;p&gt;This is where data streaming — and Apache Kafka — changes the game.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Data Streaming?
&lt;/h2&gt;

&lt;p&gt;Data streaming is the practice of processing data as it is generated, instead of storing it first and analyzing it later.&lt;/p&gt;

&lt;p&gt;Think of it like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Batch processing:&lt;/strong&gt; collect → store → process → act&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming:&lt;/strong&gt; produce → process → act (in real time)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The shift is not technical. It’s architectural.&lt;/p&gt;

&lt;p&gt;Streaming forces you to think in &lt;strong&gt;events&lt;/strong&gt;, not tables.&lt;/p&gt;




&lt;h2&gt;
  
  
  Enter Apache Kafka
&lt;/h2&gt;

&lt;p&gt;Apache Kafka is a distributed event streaming platform designed to handle high-throughput, real-time data feeds.&lt;/p&gt;

&lt;p&gt;At its core, Kafka is built around a simple idea:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Everything is an event.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;An event can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A payment&lt;/li&gt;
&lt;li&gt;A user click&lt;/li&gt;
&lt;li&gt;A sensor reading&lt;/li&gt;
&lt;li&gt;A log entry&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These events are written to &lt;strong&gt;topics&lt;/strong&gt;, which act like append-only logs.&lt;/p&gt;

&lt;p&gt;From there:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Producers&lt;/strong&gt; send events into Kafka&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consumers&lt;/strong&gt; read events from Kafka&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consumer groups&lt;/strong&gt; allow systems to scale horizontally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kafka doesn’t just move data. It becomes the backbone of your system.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Kafka Matters for Data Engineers
&lt;/h2&gt;

&lt;p&gt;Kafka is not just another tool. It represents a shift in how systems are designed.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Decoupling Systems
&lt;/h3&gt;

&lt;p&gt;Instead of services calling each other directly, they communicate through events.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fewer dependencies&lt;/li&gt;
&lt;li&gt;More resilience&lt;/li&gt;
&lt;li&gt;Easier scaling&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  2. Real-Time Processing
&lt;/h3&gt;

&lt;p&gt;You don’t wait for data pipelines to run every hour.&lt;/p&gt;

&lt;p&gt;You react instantly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fraud detection&lt;/li&gt;
&lt;li&gt;Recommendations&lt;/li&gt;
&lt;li&gt;Monitoring and alerting&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  3. Replayability
&lt;/h3&gt;

&lt;p&gt;Kafka stores events for a configurable period.&lt;/p&gt;

&lt;p&gt;That means you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reprocess data&lt;/li&gt;
&lt;li&gt;Fix bugs retroactively&lt;/li&gt;
&lt;li&gt;Build new consumers without touching producers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a massive advantage over traditional pipelines.&lt;/p&gt;
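
&lt;p&gt;The idea of a topic as an append-only log with replay can be sketched with a toy in-memory class. This is not the Kafka client API: it leaves out partitions, retention, and persistence, and all names are assumptions for illustration.&lt;/p&gt;

```python
class Topic:
    """A toy in-memory append-only log with per-group read offsets."""

    def __init__(self):
        self._log = []
        self._offsets = {}

    def append(self, event) -> int:
        """Producer side: append an event and return its offset."""
        self._log.append(event)
        return len(self._log) - 1

    def poll(self, group: str) -> list:
        """Consumer side: return the events this group has not read yet."""
        start = self._offsets.get(group, 0)
        self._offsets[group] = len(self._log)
        return self._log[start:]

    def seek_to_beginning(self, group: str) -> None:
        """Rewind a group so it can reprocess the whole log."""
        self._offsets[group] = 0


topic = Topic()
topic.append("OrderPlaced:1")
topic.append("OrderPlaced:2")
print(topic.poll("billing"))   # ['OrderPlaced:1', 'OrderPlaced:2']
print(topic.poll("billing"))   # []
topic.seek_to_beginning("billing")
print(topic.poll("billing"))   # ['OrderPlaced:1', 'OrderPlaced:2']
```

&lt;p&gt;Because reading never removes events, a new group such as a hypothetical "analytics" consumer can start from offset 0 without the producer knowing it exists.&lt;/p&gt;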




&lt;h2&gt;
  
  
  The Mental Shift: Thinking in Events
&lt;/h2&gt;

&lt;p&gt;Most people struggle with Kafka not because it’s complex, but because it requires a different way of thinking.&lt;/p&gt;

&lt;p&gt;Instead of asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What data do I have?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What just happened?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That single shift changes everything.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You stop designing databases first&lt;/li&gt;
&lt;li&gt;You start designing flows&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  A Simple Example
&lt;/h2&gt;

&lt;p&gt;Imagine an e-commerce platform.&lt;/p&gt;

&lt;p&gt;Instead of updating multiple services directly after a purchase, you emit an event:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;OrderPlaced
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From there:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inventory service consumes the event&lt;/li&gt;
&lt;li&gt;Payment service processes it&lt;/li&gt;
&lt;li&gt;Notification service sends confirmation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each service reacts independently.&lt;/p&gt;

&lt;p&gt;No tight coupling. No fragile chains.&lt;/p&gt;




&lt;h2&gt;
  
  
  Common Mistakes When Starting with Kafka
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Treating Kafka like a message queue&lt;/li&gt;
&lt;li&gt;Ignoring partitioning strategy&lt;/li&gt;
&lt;li&gt;Not planning for schema evolution&lt;/li&gt;
&lt;li&gt;Overcomplicating the architecture too early&lt;/li&gt;
&lt;/ol&gt;








&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;Streaming is not a trend. It’s the default.&lt;/p&gt;

&lt;p&gt;If you’re still designing batch-first systems, you’re building latency into your architecture from day one.&lt;/p&gt;

&lt;p&gt;Kafka is not the only tool in this space — but understanding it forces you to level up as a data engineer.&lt;/p&gt;

&lt;p&gt;And that’s the real value.&lt;/p&gt;




&lt;p&gt;If you're getting into data engineering, don’t just learn tools.&lt;/p&gt;

&lt;p&gt;Learn how data moves.&lt;/p&gt;

&lt;p&gt;That’s where the leverage is.&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>data</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>From Kafka to the Cloud: Designing a Real-Time Event-Driven Data Pipeline on AWS</title>
      <dc:creator>Rocio Baigorria</dc:creator>
      <pubDate>Mon, 16 Mar 2026 12:05:31 +0000</pubDate>
      <link>https://dev.to/tuni56/from-kafka-to-the-cloud-designing-a-real-time-event-driven-data-pipeline-on-aws-5gbm</link>
      <guid>https://dev.to/tuni56/from-kafka-to-the-cloud-designing-a-real-time-event-driven-data-pipeline-on-aws-5gbm</guid>
      <description>&lt;p&gt;Modern data platforms are increasingly built around event-driven architectures. Instead of systems constantly polling databases or relying on synchronous APIs, services react to events as they happen.&lt;/p&gt;

&lt;p&gt;In this article I’ll walk through the design of a real-time streaming pipeline capable of processing 15,000+ events per second with sub-50ms latency.&lt;/p&gt;

&lt;p&gt;The project started as a distributed system built with open-source technologies and later evolved into a cloud-native architecture on AWS.&lt;/p&gt;

&lt;p&gt;The key idea is simple:&lt;/p&gt;

&lt;p&gt;Understand the fundamentals first, then move the architecture to managed cloud services.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Original Architecture (Local Distributed System)
&lt;/h2&gt;

&lt;p&gt;The first version of the project was implemented using the following stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apache Kafka for event streaming&lt;/li&gt;
&lt;li&gt;Kafka Streams for real-time processing&lt;/li&gt;
&lt;li&gt;Spring Boot for the processing services&lt;/li&gt;
&lt;li&gt;PostgreSQL for durable storage&lt;/li&gt;
&lt;li&gt;Redis for low-latency read projections&lt;/li&gt;
&lt;li&gt;Prometheus and Grafana for monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fug60yyapglmi5hzr0tcl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fug60yyapglmi5hzr0tcl.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Event Flow
&lt;/h3&gt;

&lt;p&gt;The pipeline follows a typical streaming architecture.&lt;/p&gt;

&lt;p&gt;Producer → Schema Registry → Kafka → Stream Processing → Storage → Analytics&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A producer publishes transaction events to Kafka&lt;/li&gt;
&lt;li&gt;Each event is serialized using Avro and validated against Schema Registry&lt;/li&gt;
&lt;li&gt;Kafka partitions allow parallel consumption&lt;/li&gt;
&lt;li&gt;A streaming service processes events using Kafka Streams&lt;/li&gt;
&lt;li&gt;Results are stored in PostgreSQL and Redis&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This architecture enables real-time anomaly detection by applying sliding-window aggregations to the event stream.&lt;/p&gt;
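
&lt;p&gt;As a simplified illustration of sliding-window anomaly detection, here is a plain-Python sketch rather than the Kafka Streams implementation. The window size of 5, the threshold of 3 standard deviations, and the class name are assumptions, not values from the project.&lt;/p&gt;

```python
from collections import deque
from statistics import mean, pstdev


class SlidingWindowDetector:
    """Flag values that deviate strongly from the recent window average."""

    def __init__(self, size: int = 5, threshold: float = 3.0):
        self.window = deque(maxlen=size)
        self.threshold = threshold

    def is_anomaly(self, value: float) -> bool:
        anomalous = False
        # Only judge a value once the window holds a full history.
        if len(self.window) == self.window.maxlen:
            spread = pstdev(self.window)
            if spread == 0:
                # With zero spread, any different value counts as an anomaly.
                anomalous = value != self.window[0]
            else:
                score = abs(value - mean(self.window)) / spread
                anomalous = score > self.threshold
        # The deque drops the oldest value automatically, so the window slides.
        self.window.append(value)
        return anomalous


detector = SlidingWindowDetector()
amounts = [10, 11, 9, 10, 10, 100]
print([detector.is_anomaly(a) for a in amounts])
# [False, False, False, False, False, True]
```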

&lt;h2&gt;
  
  
  Performance Benchmarks
&lt;/h2&gt;

&lt;p&gt;The system was designed with performance and reliability in mind.&lt;/p&gt;

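&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Throughput&lt;/td&gt;
&lt;td&gt;15K+ events/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P99 Latency&lt;/td&gt;
&lt;td&gt;&amp;lt;50ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Availability&lt;/td&gt;
&lt;td&gt;99.95%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Loss&lt;/td&gt;
&lt;td&gt;0% (exactly-once processing)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;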

&lt;p&gt;Several optimizations helped achieve these results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Producer batching (32KB batch size)&lt;/li&gt;
&lt;li&gt;Snappy compression&lt;/li&gt;
&lt;li&gt;Parallel consumers&lt;/li&gt;
&lt;li&gt;Connection pooling&lt;/li&gt;
&lt;li&gt;Transactional event processing&lt;/li&gt;
&lt;/ul&gt;
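
&lt;p&gt;For reference, the batching and compression items above correspond to standard Kafka producer properties. The sketch below shows them as a plain Python dictionary; only the 32KB batch size and Snappy compression come from this article, while the &lt;code&gt;linger.ms&lt;/code&gt;, &lt;code&gt;acks&lt;/code&gt;, and &lt;code&gt;enable.idempotence&lt;/code&gt; values are assumptions about what a setup like this would typically use.&lt;/p&gt;

```python
# Standard Kafka producer property names. Values other than the batch size
# and compression type are assumptions for illustration.
producer_config = {
    "batch.size": 32 * 1024,       # 32KB batches, as listed above
    "compression.type": "snappy",  # compress each batch before sending
    "linger.ms": 5,                # assumed: wait briefly so batches can fill
    "acks": "all",                 # assumed: wait for all in-sync replicas
    "enable.idempotence": True,    # assumed: avoid duplicates on retry
}

for name, value in producer_config.items():
    print(f"{name}={value}")
```

&lt;p&gt;Larger batches and a small linger trade a few milliseconds of latency for throughput, so the values need tuning against the latency target.&lt;/p&gt;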

&lt;h2&gt;
  
  
  Distributed Systems Patterns Implemented
&lt;/h2&gt;

&lt;p&gt;This project demonstrates several architectural patterns commonly used in modern data platforms.&lt;/p&gt;

&lt;h3&gt;
  
  
  Event Sourcing
&lt;/h3&gt;

&lt;p&gt;Kafka acts as the immutable event log. Every state change is stored as an event.&lt;/p&gt;

&lt;h3&gt;
  
  
  CQRS
&lt;/h3&gt;

&lt;p&gt;Write operations store events while Redis maintains optimized read models.&lt;/p&gt;

&lt;h3&gt;
  
  
  Outbox Pattern
&lt;/h3&gt;

&lt;p&gt;Ensures reliable event publishing from the database.&lt;/p&gt;

&lt;h3&gt;
  
  
  Saga Pattern
&lt;/h3&gt;

&lt;p&gt;Coordinates distributed workflows without synchronous transactions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Circuit Breaker
&lt;/h3&gt;

&lt;p&gt;Improves resilience by isolating failing components.&lt;/p&gt;

&lt;h2&gt;
  
  
  Moving the Architecture to AWS
&lt;/h2&gt;

&lt;p&gt;After implementing the pipeline locally, the next step was mapping the same design to managed cloud services on AWS.&lt;/p&gt;

&lt;p&gt;The goal was not to redesign the system, but to replace infrastructure with managed services.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloud Architecture
&lt;/h3&gt;

&lt;p&gt;Producer → EventBridge / MSK → Lambda processing → Step Functions orchestration → DynamoDB / RDS → CloudWatch monitoring&lt;/p&gt;

&lt;h3&gt;
  
  
  Event Ingestion
&lt;/h3&gt;

&lt;p&gt;Events can be published to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amazon EventBridge for event routing&lt;/li&gt;
&lt;li&gt;Amazon MSK for managed Kafka streaming&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Processing Layer
&lt;/h3&gt;

&lt;p&gt;Events are processed by AWS Lambda, which allows the pipeline to scale automatically based on event volume.&lt;/p&gt;

&lt;h3&gt;
  
  
  Workflow Orchestration
&lt;/h3&gt;

&lt;p&gt;Complex workflows are coordinated using AWS Step Functions, which define the pipeline as a series of steps such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;event validation&lt;/li&gt;
&lt;li&gt;enrichment&lt;/li&gt;
&lt;li&gt;anomaly detection&lt;/li&gt;
&lt;li&gt;persistence&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Storage
&lt;/h3&gt;

&lt;p&gt;Data can be stored depending on the access pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DynamoDB for high-scale key-value access&lt;/li&gt;
&lt;li&gt;Amazon RDS for relational workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Observability
&lt;/h3&gt;

&lt;p&gt;Monitoring and logs are handled by Amazon CloudWatch, allowing engineers to track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;throughput&lt;/li&gt;
&lt;li&gt;errors&lt;/li&gt;
&lt;li&gt;latency&lt;/li&gt;
&lt;li&gt;workflow executions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Key Insight
&lt;/h2&gt;

&lt;p&gt;The most important lesson from this project is that the architecture itself does not change when moving to the cloud.&lt;/p&gt;

&lt;p&gt;The same principles remain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;events are immutable&lt;/li&gt;
&lt;li&gt;services react asynchronously&lt;/li&gt;
&lt;li&gt;systems scale through partitioned streams&lt;/li&gt;
&lt;li&gt;state is derived from event logs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cloud services simply remove the burden of managing infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Understanding how streaming systems work internally makes it much easier to design reliable cloud-native data platforms.&lt;/p&gt;

&lt;p&gt;Instead of thinking only in terms of tools, focus on the system flow:&lt;/p&gt;

&lt;p&gt;Event → Stream → Process → Persist → Observe&lt;/p&gt;

&lt;p&gt;Once those fundamentals are clear, migrating the system to cloud platforms like AWS becomes a natural evolution.&lt;/p&gt;

&lt;p&gt;Design, therefore I exist.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>dataengineering</category>
      <category>kafka</category>
      <category>eventdriven</category>
    </item>
  </channel>
</rss>
