<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hamdi Mechelloukh</title>
    <description>The latest articles on DEV Community by Hamdi Mechelloukh (@hamdi_mechelloukh_628620a).</description>
    <link>https://dev.to/hamdi_mechelloukh_628620a</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3835195%2F19c24699-3727-4940-937e-3968ab4d8085.png</url>
      <title>DEV Community: Hamdi Mechelloukh</title>
      <link>https://dev.to/hamdi_mechelloukh_628620a</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hamdi_mechelloukh_628620a"/>
    <language>en</language>
    <item>
      <title>Real-time streaming pipeline with Apache Flink 2.0, Kafka and Iceberg</title>
      <dc:creator>Hamdi Mechelloukh</dc:creator>
      <pubDate>Tue, 31 Mar 2026 11:09:58 +0000</pubDate>
      <link>https://dev.to/hamdi_mechelloukh_628620a/real-time-streaming-pipeline-with-apache-flink-20-kafka-and-iceberg-2ah8</link>
      <guid>https://dev.to/hamdi_mechelloukh_628620a/real-time-streaming-pipeline-with-apache-flink-20-kafka-and-iceberg-2ah8</guid>
      <description>&lt;p&gt;It's 2:03 PM. A flash sale just started.&lt;/p&gt;

&lt;p&gt;In the warehouse, an operator is entering incoming orders into the management system. He types a quantity, makes a mistake, corrects it immediately. Two events, one reality. Thirty seconds apart.&lt;/p&gt;

&lt;p&gt;The batch job that runs at 2 AM will see both. It won't know which one is right. Depending on how the reconciliation logic is written, if it exists at all, it picks one of the two, often non-deterministically. And if the correction falls into the next batch window, the problem doesn't surface right away: the morning's numbers are wrong, cleanly, with no technical error in sight.&lt;/p&gt;

&lt;p&gt;This is a real and recurring source of data quality problems in data teams.&lt;/p&gt;

&lt;p&gt;Processing events as they arrive, in order, with their temporal context intact, fundamentally changes how this problem is handled. That's the starting point for this project: an end-to-end streaming pipeline on the Olist e-commerce dataset, built with Apache Flink 2.0, Kafka and Iceberg.&lt;/p&gt;

&lt;h2&gt;
  
  
  The dataset and the problem
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce" rel="noopener noreferrer"&gt;Olist dataset&lt;/a&gt; is a public Brazilian e-commerce dataset: orders, products, sellers, customers, reviews. 100,000 orders over two years.&lt;/p&gt;

&lt;p&gt;I had already built a batch lakehouse on this same dataset. The logical next step was to go to the other extreme: stream processing, one-minute aggregation windows, anomaly detection at second-level latency. Three concrete needs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Revenue by category in real time&lt;/strong&gt; — know which category is performing at every minute&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anomaly detection&lt;/strong&gt; — a customer placing multiple orders within a few minutes, or an order at an abnormal price&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Global KPIs&lt;/strong&gt; — average order value, order rate, total revenue in real time&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These are the three jobs that make up the pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Apache Flink?
&lt;/h2&gt;

&lt;p&gt;The question is worth asking. There are other options for streaming in Java:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kafka Streams&lt;/strong&gt; — easy to operate, no separate cluster, but limited to Kafka-in/Kafka-out topologies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache Spark Structured Streaming&lt;/strong&gt; — micro-batch model, so latencies range from sub-second to a few seconds, but familiar if you already know Spark&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flink&lt;/strong&gt; — true event-by-event streaming, native event-time processing, built-in CEP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Flink was the natural choice for two reasons.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CEP (Complex Event Processing).&lt;/strong&gt; Detecting "3 orders from the same customer within 5 minutes" is not an aggregation, it's a temporal correlation between events. Flink CEP handles this natively with a pattern DSL. In Kafka Streams, it requires maintaining manual state and writing the temporal logic by hand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flink 2.0.&lt;/strong&gt; Version 2.0 brought native Java 21 support. Working on the current version rather than an end-of-life one was a deliberate choice.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Olist CSV → Simulator → Kafka (orders)
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
    RevenueAggregation  AnomalyDetection  RealtimeKpi
    (tumbling 1 min)    (CEP 5 min)       (windowAll 1 min)
              │               │               │
              ▼               ▼               ▼
      Kafka (revenue)  Kafka (alerts)  Kafka (kpis)
              │               │               │
              └───────────────┴───────────────┘
                              │
                    Apache Iceberg (MinIO)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three independent jobs, one shared source topic, three output topics, and an optional Iceberg data lake.&lt;/p&gt;

&lt;p&gt;The independence of the jobs is a deliberate choice. In production, you want to be able to restart &lt;code&gt;AnomalyDetectionJob&lt;/code&gt; without affecting &lt;code&gt;RevenueAggregationJob&lt;/code&gt;. Each job has its own checkpoint, its own state, its own topology.&lt;/p&gt;

&lt;h2&gt;
  
  
  Job 1: RevenueAggregationJob
&lt;/h2&gt;

&lt;p&gt;The simplest of the three. It aggregates revenue by product category over one-minute windows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;orders → filter nulls → map to RevenueByCategory → keyBy(category)
       → TumblingWindow(1 min) → reduce + ProcessWindowFunction
       → Kafka sink + Iceberg sink (optional)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few details that matter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Watermark strategy.&lt;/strong&gt; The pipeline uses event time, meaning the event timestamp in the Kafka message, not the arrival time. The strategy is &lt;code&gt;forBoundedOutOfOrderness(10 seconds)&lt;/code&gt; with a 5-second idleness timeout.&lt;/p&gt;

&lt;p&gt;Why idleness? If a stream is empty for several minutes (the simulator is stopped, for example), Flink can no longer advance its watermark. Without &lt;code&gt;withIdleness&lt;/code&gt;, windows never close. With &lt;code&gt;withIdleness(5s)&lt;/code&gt;, Flink ignores silent partitions and advances anyway.&lt;/p&gt;
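&lt;p&gt;To make the rule concrete, here is a minimal plain-Java sketch of the bounded out-of-orderness logic (a simplified model, not the Flink API; the class and field names are illustrative): the watermark trails the highest event timestamp seen by the configured bound, so an event up to 10 seconds late still lands ahead of the watermark.&lt;/p&gt;

```java
// Sketch of the bounded-out-of-orderness rule: the watermark trails the
// highest event timestamp seen so far by a fixed bound. A window [start, end)
// can close once watermark >= end. Names are illustrative, not Flink's.
public class BoundedOutOfOrderness {
    private final long maxOutOfOrdernessMs;
    private long maxTimestampMs;

    public BoundedOutOfOrderness(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
        // Start low enough that the first watermark cannot underflow.
        this.maxTimestampMs = Long.MIN_VALUE + maxOutOfOrdernessMs;
    }

    // Called for each event; returns the watermark after seeing it.
    public long onEvent(long eventTimestampMs) {
        maxTimestampMs = Math.max(maxTimestampMs, eventTimestampMs);
        return maxTimestampMs - maxOutOfOrdernessMs; // trails the max by the bound
    }
}
```

&lt;p&gt;Note that a late event never pushes the watermark backwards; at worst it fails to advance it.&lt;/p&gt;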

&lt;p&gt;&lt;strong&gt;Side outputs.&lt;/strong&gt; Invalid events (null price, missing timestamp) are not silently dropped. They are routed to a side output that logs them. This avoids the scenario where events disappear without a trace.&lt;/p&gt;
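&lt;p&gt;The routing rule can be sketched in plain Java, outside Flink (&lt;code&gt;OrderRecord&lt;/code&gt; and its fields are hypothetical stand-ins for the project's event type):&lt;/p&gt;

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the side-output routing logic: invalid events go to a separate
// collection instead of being silently dropped. OrderRecord is a stand-in.
public class SideOutputSketch {
    public record OrderRecord(String id, Double price, Long timestamp) {}
    public record Routed(List<OrderRecord> main, List<OrderRecord> invalid) {}

    // Route events with a null price or missing timestamp to the side output.
    public static Routed route(List<OrderRecord> events) {
        List<OrderRecord> main = new ArrayList<>();
        List<OrderRecord> invalid = new ArrayList<>();
        for (OrderRecord e : events) {
            if (e.price() == null || e.timestamp() == null) {
                invalid.add(e); // logged via the side output in the real job
            } else {
                main.add(e);
            }
        }
        return new Routed(main, invalid);
    }
}
```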

&lt;p&gt;&lt;strong&gt;Two-phase reduction.&lt;/strong&gt; Before the window is applied, a &lt;code&gt;reduce&lt;/code&gt; combines events by category on the fly. The &lt;code&gt;ProcessWindowFunction&lt;/code&gt; then only attaches the window start and end timestamps. Less state to store, less work at window closure.&lt;/p&gt;
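&lt;p&gt;The incremental reduce step amounts to something like this (a sketch with a simplified &lt;code&gt;RevenueByCategory&lt;/code&gt;, not the project's actual class): two partial results for the same category collapse into one, so the window holds a single record per key at any time.&lt;/p&gt;

```java
// Sketch of the ReduceFunction applied before the window fires: partial
// results for the same category merge on the fly. Simplified stand-in class.
public class RevenueReduceSketch {
    public record RevenueByCategory(String category, double revenue, long orderCount) {}

    public static RevenueByCategory reduce(RevenueByCategory a, RevenueByCategory b) {
        // Both inputs share the same key (category) thanks to keyBy upstream.
        return new RevenueByCategory(a.category(),
                a.revenue() + b.revenue(),
                a.orderCount() + b.orderCount());
    }
}
```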

&lt;h2&gt;
  
  
  Job 2: AnomalyDetectionJob
&lt;/h2&gt;

&lt;p&gt;This one is more interesting. It detects two types of anomalies through two different mechanisms.&lt;/p&gt;

&lt;h3&gt;
  
  
  Threshold detection: price anomaly
&lt;/h3&gt;

&lt;p&gt;A filter on price:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;ordersStream&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;filter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getPrice&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
                  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getPrice&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;compareTo&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;priceThreshold&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;map&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;OrderAlert&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;priceAnomaly&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getCustomerId&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getPrice&lt;/span&gt;&lt;span class="o"&gt;()))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The threshold (500 BRL by default) is configurable via environment variable. One subtlety: the filter is &lt;code&gt;&amp;gt; 0&lt;/code&gt;, not &lt;code&gt;&amp;gt;= 0&lt;/code&gt;. An order at exactly 500 BRL is not an anomaly. This behavior is covered by a specific test.&lt;/p&gt;
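&lt;p&gt;The boundary behavior, isolated as a plain-Java sketch (the class name is illustrative):&lt;/p&gt;

```java
import java.math.BigDecimal;

// Isolates the strict comparison described above: compareTo(...) > 0 flags
// strictly-above-threshold prices only, so exactly 500 BRL passes through.
public class PriceThresholdSketch {
    static final BigDecimal THRESHOLD = new BigDecimal("500");

    public static boolean isAnomalous(BigDecimal price) {
        return price != null && price.compareTo(THRESHOLD) > 0;
    }
}
```

&lt;p&gt;A side benefit of &lt;code&gt;compareTo&lt;/code&gt; over &lt;code&gt;equals&lt;/code&gt;: it ignores scale, so &lt;code&gt;500.00&lt;/code&gt; and &lt;code&gt;500&lt;/code&gt; compare as equal.&lt;/p&gt;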

&lt;h3&gt;
  
  
  Pattern detection: suspicious frequency
&lt;/h3&gt;

&lt;p&gt;This is where CEP comes in.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Pattern&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;OrderEvent&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Pattern&lt;/span&gt;&lt;span class="o"&gt;.&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;OrderEvent&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;begin&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"orders"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;timesOrMore&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;suspiciousOrderCount&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;within&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMinutes&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

&lt;span class="nc"&gt;PatternStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;OrderEvent&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;patternStream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;CEP&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ordersStream&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;keyBy&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nl"&gt;OrderEvent:&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;getCustomerId&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;pattern&lt;/span&gt;
&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern says: if the same customer places 3 or more orders within a 5-minute window, it's suspicious.&lt;/p&gt;

&lt;p&gt;The key is &lt;code&gt;keyBy(customerId)&lt;/code&gt;. Without it, Flink would compare orders from different customers. With &lt;code&gt;keyBy&lt;/code&gt;, each customer has their own independent CEP state.&lt;/p&gt;
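&lt;p&gt;What &lt;code&gt;keyBy(customerId)&lt;/code&gt; plus &lt;code&gt;timesOrMore(3).within(5 min)&lt;/code&gt; amounts to can be sketched in plain Java, one customer's timestamps at a time (a simplified model of the CEP semantics, not Flink code):&lt;/p&gt;

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified model of the pattern's semantics for a single key: do any
// `count` consecutive orders from this customer span at most `windowMs`?
public class FrequencySketch {
    public static boolean isSuspicious(List<Long> timestampsMs, int count, long windowMs) {
        List<Long> ts = new ArrayList<>(timestampsMs);
        Collections.sort(ts); // event time may arrive out of order
        for (int i = 0; i + count - 1 < ts.size(); i++) {
            if (ts.get(i + count - 1) - ts.get(i) <= windowMs) {
                return true;
            }
        }
        return false;
    }
}
```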

&lt;p&gt;Both streams, price alerts and frequency alerts, are then merged with &lt;code&gt;union()&lt;/code&gt; before being sent to the Kafka output topic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Job 3: RealtimeKpiJob
&lt;/h2&gt;

&lt;p&gt;Global KPIs: average order value, orders per minute, total revenue. The calculation is straightforward, but the implementation reveals an interesting trade-off.&lt;/p&gt;

&lt;h3&gt;
  
  
  windowAll: the acknowledged bottleneck
&lt;/h3&gt;

&lt;p&gt;To calculate total revenue across all orders in one minute, you need to aggregate all events together, without splitting by key. In Flink, this is called &lt;code&gt;windowAll&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;windowAll&lt;/code&gt; forces all events through a single processing instance. It's a bottleneck by design. At this volume (50 events per second), it's more than sufficient. If throughput rose to 50,000 events per second, a pre-aggregation by key followed by a merge would be necessary. We don't do that here because adding complexity for a hypothetical need is not good engineering.&lt;/p&gt;
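&lt;p&gt;For reference, the scaling alternative can be sketched in plain Java (class and record names are illustrative): partial sums computed per key in parallel, then a single merge step that sees one small map per shard rather than every raw event.&lt;/p&gt;

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of pre-aggregation by key followed by a global merge, the pattern
// you would reach for if windowAll became a real bottleneck.
public class PreAggregateSketch {
    public record Sale(String category, double amount) {}

    // Phase 1 (parallel, keyed): one partial sum per category per shard.
    public static Map<String, Double> partialSums(List<Sale> shard) {
        Map<String, Double> sums = new HashMap<>();
        for (Sale s : shard) {
            sums.merge(s.category(), s.amount(), Double::sum);
        }
        return sums;
    }

    // Phase 2 (single task): merge the small per-key maps into a global total.
    public static double globalTotal(List<Map<String, Double>> partials) {
        return partials.stream()
                .flatMap(m -> m.values().stream())
                .mapToDouble(Double::doubleValue)
                .sum();
    }
}
```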

&lt;h3&gt;
  
  
  Two-phase aggregation
&lt;/h3&gt;

&lt;p&gt;The KPI calculation uses the &lt;code&gt;AggregateFunction&lt;/code&gt; + &lt;code&gt;ProcessAllWindowFunction&lt;/code&gt; pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;KpiAggregateFunction&lt;/code&gt; accumulates the count and sum as events arrive, continuously&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;KpiWindowFunction&lt;/code&gt; computes the average and derived metrics at window closure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This separation maintains minimal state (two numbers) instead of buffering all raw events. The &lt;code&gt;ProcessAllWindowFunction&lt;/code&gt; only receives the final accumulator.&lt;/p&gt;
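&lt;p&gt;The accumulator behind this pattern can be sketched in plain Java (a simplified stand-in, not the project's actual &lt;code&gt;KpiAggregateFunction&lt;/code&gt;): only two numbers are held per window, and the average is derived once at the end.&lt;/p&gt;

```java
// Sketch of an AggregateFunction-style accumulator: state is two numbers,
// the derived metric (average) is computed only when the window closes.
public class KpiAccumulatorSketch {
    private long count;
    private double sum;

    public void add(double orderValue) {
        count++;
        sum += orderValue;
    }

    // Flink calls an equivalent of this when merging accumulators.
    public KpiAccumulatorSketch merge(KpiAccumulatorSketch other) {
        count += other.count;
        sum += other.sum;
        return this;
    }

    public long count() { return count; }
    public double total() { return sum; }
    public double average() { return count == 0 ? 0.0 : sum / count; }
}
```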

&lt;h3&gt;
  
  
  The BoundedHistogram
&lt;/h3&gt;

&lt;p&gt;An optional but interesting detail: a custom Flink &lt;code&gt;Histogram&lt;/code&gt; implementation.&lt;/p&gt;

&lt;p&gt;The Flink Metrics API exposes three standard types: &lt;code&gt;Counter&lt;/code&gt;, &lt;code&gt;Gauge&lt;/code&gt;, &lt;code&gt;Histogram&lt;/code&gt;. For a &lt;code&gt;Histogram&lt;/code&gt;, Flink expects an implementation that returns percentiles, mean and standard deviation via a &lt;code&gt;HistogramStatistics&lt;/code&gt; interface.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;BoundedHistogram&lt;/code&gt; is a fixed-size circular buffer (1000 values). When the buffer is full, new values overwrite the oldest ones.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;synchronized&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;writeIndex&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;writeIndex&lt;/span&gt;&lt;span class="o"&gt;++;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simple, thread-safe, bounded memory. It allows Grafana to show the distribution of average order values, not just the latest single value.&lt;/p&gt;
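&lt;p&gt;The read side can be sketched similarly (an assumed design, not necessarily the project's exact implementation): percentiles are computed on a sorted snapshot of the filled portion of the buffer, so they always reflect the last N values.&lt;/p&gt;

```java
import java.util.Arrays;

// Sketch of computing a percentile from a bounded buffer's snapshot, the kind
// of logic a HistogramStatistics implementation would expose.
public class HistogramSnapshotSketch {
    public static long percentile(long[] values, int filledCount, double quantile) {
        long[] snapshot = Arrays.copyOf(values, filledCount); // copy, then sort
        Arrays.sort(snapshot);
        int index = (int) Math.ceil(quantile * snapshot.length) - 1;
        return snapshot[Math.max(0, index)];
    }
}
```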

&lt;h2&gt;
  
  
  Iceberg integration: what I didn't anticipate
&lt;/h2&gt;

&lt;p&gt;The Apache Iceberg integration was optional in the initial architecture. In practice, this is where I spent the most time.&lt;/p&gt;

&lt;h3&gt;
  
  
  The classloader problem
&lt;/h3&gt;

&lt;p&gt;Flink 2.0 loads its filesystem plugins (including &lt;code&gt;flink-s3-fs-hadoop&lt;/code&gt;) in an isolated classloader, invisible to user code. When &lt;code&gt;iceberg-flink-runtime&lt;/code&gt; tries to instantiate &lt;code&gt;S3AFileSystem&lt;/code&gt; at write time, it can't find the class provided by the Flink plugin.&lt;/p&gt;

&lt;p&gt;The solution: bundle &lt;code&gt;hadoop-aws&lt;/code&gt; and the AWS SDK directly in the fat JAR, with aggressive exclusions to avoid dependency conflicts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"org.apache.hadoop:hadoop-aws:3.4.1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;exclude&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;group&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"com.amazonaws"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;module&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"aws-java-sdk-bundle"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"com.amazonaws:aws-java-sdk-s3:1.12.780"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"com.amazonaws:aws-java-sdk-sts:1.12.780"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fat JAR reaches ~710 MB. Not ideal, but that's the real cost of an Iceberg + Flink + S3 integration outside a managed service.&lt;/p&gt;

&lt;h3&gt;
  
  
  Credential timing
&lt;/h3&gt;

&lt;p&gt;Second surprise: &lt;code&gt;HadoopCatalog&lt;/code&gt; reads its S3 configuration at construction time, not after. The intuitive pattern of creating the catalog and then injecting configuration doesn't work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Credentials are injected too late&lt;/span&gt;
&lt;span class="nc"&gt;HadoopCatalog&lt;/span&gt; &lt;span class="n"&gt;catalog&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;HadoopCatalog&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;catalog&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setConf&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hadoopConf&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Credentials must be in the Configuration before construction&lt;/span&gt;
&lt;span class="nc"&gt;Configuration&lt;/span&gt; &lt;span class="n"&gt;hadoopConf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Configuration&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toProperties&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;forEach&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nl"&gt;hadoopConf:&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;HadoopCatalog&lt;/span&gt; &lt;span class="n"&gt;catalog&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;HadoopCatalog&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hadoopConf&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;warehouse&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same applies to &lt;code&gt;CatalogLoader.hadoop()&lt;/code&gt;. This behavior is not prominently documented. It's the kind of error you only discover through end-to-end testing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Docker Compose and .env resolution
&lt;/h3&gt;

&lt;p&gt;A less expected issue: Docker Compose v2 resolves the &lt;code&gt;.env&lt;/code&gt; file from the directory containing &lt;code&gt;docker-compose.yml&lt;/code&gt;, not from the current working directory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# From the project root, this command ignores the .env at the root&lt;/span&gt;
docker compose &lt;span class="nt"&gt;-f&lt;/span&gt; docker/docker-compose.yml up &lt;span class="nt"&gt;-d&lt;/span&gt;

&lt;span class="c"&gt;# You need to pass the path explicitly&lt;/span&gt;
docker compose &lt;span class="nt"&gt;--env-file&lt;/span&gt; .env &lt;span class="nt"&gt;-f&lt;/span&gt; docker/docker-compose.yml up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this, &lt;code&gt;ICEBERG_ENABLED=true&lt;/code&gt; in the &lt;code&gt;.env&lt;/code&gt; is ignored and jobs start without an Iceberg sink, with no error message.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability
&lt;/h2&gt;

&lt;p&gt;Flink exposes its metrics via Prometheus on port 9249. Each job exposes custom metrics:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Job&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;windowsEmitted&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Counter&lt;/td&gt;
&lt;td&gt;RevenueAggregationJob&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;kpiWindowsEmitted&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Counter&lt;/td&gt;
&lt;td&gt;RealtimeKpiJob&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;lastWindowOrderCount&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Gauge&lt;/td&gt;
&lt;td&gt;RealtimeKpiJob&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;orderValueDistribution&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Histogram&lt;/td&gt;
&lt;td&gt;RealtimeKpiJob&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;priceAnomalyAlertsEmitted&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Counter&lt;/td&gt;
&lt;td&gt;AnomalyDetectionJob&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;suspiciousFrequencyAlertsEmitted&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Counter&lt;/td&gt;
&lt;td&gt;AnomalyDetectionJob&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;deserializationErrors&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Counter&lt;/td&gt;
&lt;td&gt;All&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These metrics land in Prometheus every 15 seconds and are visualized in Grafana. The &lt;code&gt;deserializationErrors&lt;/code&gt; metric is particularly useful: if the simulator sends a malformed message, the counter rises and you see it immediately in the dashboard, without the job crashing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing
&lt;/h2&gt;

&lt;p&gt;The tests use Flink's &lt;code&gt;MiniCluster&lt;/code&gt;, an embedded Flink cluster that runs in the test process, with no external infrastructure.&lt;/p&gt;

&lt;p&gt;This choice has a cost: tests are slower (a few seconds each). But they test the actual behavior of Flink operators, not a mock. The &lt;code&gt;AnomalyDetectionJobTest&lt;/code&gt; specifically validates CEP edge cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2 orders in 5 minutes → no alert&lt;/li&gt;
&lt;li&gt;3 orders in 5 minutes → alert triggered&lt;/li&gt;
&lt;li&gt;Order at exactly 500 BRL → no price alert&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;18 tests in total, covering all three jobs, the &lt;code&gt;BoundedHistogram&lt;/code&gt; and the deserialization schema. The CI (GitHub Actions) compiles and runs all tests on every push, with a JaCoCo report as an artifact.&lt;/p&gt;

&lt;h2&gt;
  
  
  Batch or streaming: the real debate
&lt;/h2&gt;

&lt;p&gt;Back to the opening scene.&lt;/p&gt;

&lt;p&gt;Streaming is often perceived as expensive: the cluster runs continuously, the infrastructure never shuts down. That's true. But this comparison is incomplete.&lt;/p&gt;

&lt;p&gt;A batch pipeline that handles events which correct themselves over time accumulates its own debt. Timeline reconciliation logic. Re-processing when an event arrives late. Alerts, manual interventions, data engineers spending time explaining why numbers are inconsistent across two windows. This cost is diffuse: it doesn't appear on any cloud bill, but it accumulates in sprints, in support, in technical debt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In practice, almost nobody puts that diffuse cost next to the infrastructure bill, because the comparison is too costly to conduct seriously.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This project doesn't claim to settle the debate. What it shows is that an end-to-end streaming pipeline with Flink 2.0 is accessible today without managed infrastructure, without Databricks, without Confluent Cloud. A &lt;code&gt;docker compose up&lt;/code&gt; and the pipeline runs. The complexity is in the integration details, not in the paradigm itself.&lt;/p&gt;

&lt;p&gt;The code is on &lt;a href="https://github.com/HamdiMechelloukh/olist-flink-streaming" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. The &lt;code&gt;start-e2e.sh&lt;/code&gt; script launches the entire pipeline in a single command.&lt;/p&gt;




&lt;p&gt;You can also read this and other articles on &lt;a href="https://www.hamdimechelloukh.com" rel="noopener noreferrer"&gt;my portfolio&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>java</category>
      <category>dataengineering</category>
      <category>kafka</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Building an open-source vendor-neutral lakehouse</title>
      <dc:creator>Hamdi Mechelloukh</dc:creator>
      <pubDate>Fri, 20 Mar 2026 11:05:38 +0000</pubDate>
      <link>https://dev.to/hamdi_mechelloukh_628620a/building-an-open-source-vendor-neutral-lakehouse-f2c</link>
      <guid>https://dev.to/hamdi_mechelloukh_628620a/building-an-open-source-vendor-neutral-lakehouse-f2c</guid>
      <description>&lt;p&gt;When you work in data, you always end up asking the same question: &lt;strong&gt;what happens if we need to switch platforms tomorrow?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I've seen firsthand that software vendors can be aggressive with pricing, and they won't hesitate to sunset a product that isn't generating enough revenue. When that happens, you're forced into a rushed migration and absorb massive costs in redevelopment and lost time.&lt;/p&gt;

&lt;p&gt;This conviction led me to build &lt;strong&gt;an end-to-end open-source, vendor-neutral lakehouse&lt;/strong&gt;, from messaging to visualization. Here are the architecture choices, the trade-offs, and what I learned.&lt;/p&gt;

&lt;h2&gt;
  
  
  The stack: Kafka → Spark → Iceberg
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Sources → Kafka → Spark (Bronze) → Spark (Silver) → Spark (Gold) → Streamlit
                                                                   ↑
                                                          Great Expectations
                                                          (quality at each layer)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Kafka for ingestion
&lt;/h3&gt;

&lt;p&gt;Choosing an event-driven approach for data transfer is not a trivial decision. In computing, &lt;strong&gt;managing time is one of the hardest problems&lt;/strong&gt;. On the operational side, we're moving more and more toward event-driven architectures precisely for this reason: an event arrives when it arrives, and the system processes it. No batch window to respect, no "the file should have arrived at 6 AM".&lt;/p&gt;

&lt;p&gt;Kafka is the de facto standard for this kind of architecture. Open-source, battle-tested, and crucially: no vendor lock-in. You can deploy it on any cloud or on-premise.&lt;/p&gt;

&lt;h3&gt;
  
  
  Spark for compute
&lt;/h3&gt;

&lt;p&gt;You might ask: why Spark in an event-driven architecture? My position is pragmatic. Pure streaming via Kafka works well for ingestion into bronze, or even silver, to &lt;strong&gt;handle temporality upstream&lt;/strong&gt;. But once you reach heavy transformations (aggregations, joins, enrichments), Spark remains the most battle-tested and portable tool.&lt;/p&gt;

&lt;p&gt;Spark's advantage is that it runs everywhere: on a YARN cluster, on Kubernetes, on Databricks, on EMR, or locally for development. It's one of the few compute tools that doesn't lock you in.&lt;/p&gt;

&lt;h3&gt;
  
  
  Iceberg for the table format
&lt;/h3&gt;

&lt;p&gt;Iceberg is the open table format that's gaining momentum. My choice was partly technical curiosity: I use Delta Lake daily at work, so I wanted to explore the alternative.&lt;/p&gt;

&lt;p&gt;But beyond curiosity, Iceberg has properties that make it particularly suited for a vendor-neutral lakehouse:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Open format&lt;/strong&gt; — no dependency on a specific vendor&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time travel&lt;/strong&gt; — query data at any point in time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema evolution&lt;/strong&gt; — add or modify columns without rewriting data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partition evolution&lt;/strong&gt; — change partitioning scheme without migration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compatible with all engines&lt;/strong&gt; — Spark, Trino, Flink, Dremio, Athena...&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The project could just as well run with Delta Lake or Hudi. In fact, it would be interesting to offer a choice of table format to anyone forking the project.&lt;/p&gt;

&lt;h2&gt;
  
  
  The layered architecture: bronze, silver, gold
&lt;/h2&gt;

&lt;p&gt;The medallion pattern (bronze/silver/gold) structures data in three levels of refinement:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bronze&lt;/strong&gt; — raw data as it arrives, no transformation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Silver&lt;/strong&gt; — cleaned, deduplicated, properly typed data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gold&lt;/strong&gt; — aggregated data ready for business consumption&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Honestly, these terms are recent. A few years ago, we called them dataraw, dataprep, dataset. The vocabulary changes, the principle stays the same. What matters is to &lt;strong&gt;follow this progressive refinement logic without being rigid.&lt;/strong&gt; Functional reality always takes precedence over technical rules. If data doesn't need three layers, it doesn't need three layers.&lt;/p&gt;

&lt;h2&gt;
  
  
  MinIO: S3-compatible without the lock-in
&lt;/h2&gt;

&lt;p&gt;One point that might surprise you: why MinIO rather than S3 directly?&lt;/p&gt;

&lt;p&gt;Because &lt;strong&gt;S3 is an AWS service&lt;/strong&gt;, and using S3 means locking yourself into AWS. MinIO implements the S3 API identically: every tool that speaks S3 speaks MinIO without modification. You can develop and test locally, deploy on any cloud, and migrate to S3, GCS or Azure Blob Storage without changing a single line of application code.&lt;/p&gt;

&lt;p&gt;That's exactly the vendor-neutral principle: &lt;strong&gt;use open standards rather than proprietary managed services&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data quality: Great Expectations and its limits
&lt;/h2&gt;

&lt;p&gt;Great Expectations is the most widely used data validation tool in the Python/Spark ecosystem. I integrated it at each pipeline layer to validate data on input and output.&lt;/p&gt;

&lt;p&gt;The tool does its job well for simple quality rules: nullability, uniqueness, value ranges, formats. It's also a tool I've seen used in enterprise settings, which validated the choice.&lt;/p&gt;
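&lt;p&gt;The simple rules above (nullability, uniqueness, value ranges) can be sketched in a few lines of plain Python. This is not the Great Expectations API, just the shape of the checks it handles well:&lt;/p&gt;

```python
# Plain-Python sketch of simple data quality rules; not the Great
# Expectations API, just the kind of checks it expresses well.

def expect_not_null(rows, column):
    failures = [r for r in rows if r.get(column) is None]
    return {"rule": f"{column} not null", "success": not failures,
            "failures": len(failures)}

def expect_unique(rows, column):
    values = [r[column] for r in rows]
    return {"rule": f"{column} unique", "success": len(values) == len(set(values)),
            "failures": len(values) - len(set(values))}

def expect_between(rows, column, low, high):
    failures = [r for r in rows if not (low <= r[column] <= high)]
    return {"rule": f"{column} in [{low}, {high}]", "success": not failures,
            "failures": len(failures)}

orders = [
    {"order_id": "A1", "qty": 1},
    {"order_id": "B2", "qty": 250},   # out of range
    {"order_id": "B2", "qty": None},  # null value and duplicate id
]
report = [
    expect_not_null(orders, "qty"),
    expect_unique(orders, "order_id"),
    expect_between([r for r in orders if r["qty"] is not None], "qty", 1, 100),
]
```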

&lt;p&gt;But it has &lt;strong&gt;real limitations&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex quality rules (cross-table consistency, conditional business rules) are hard to express&lt;/li&gt;
&lt;li&gt;Resource-intensive checks (massive joins for cross-source duplicate detection) don't scale easily&lt;/li&gt;
&lt;li&gt;And most importantly: &lt;strong&gt;discovering quality issues is not enough&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This last point is crucial and comes directly from my production experience at Decathlon. You can set up all the quality alerts in the world. If source teams have no commitment to fix the issues, nothing will change. You need to work on &lt;strong&gt;data quality service-level agreements&lt;/strong&gt;: SLAs on fix turnaround, shared responsibilities, clear escalation paths. Without that, source teams will make little effort to resolve quality problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  The difficulty of vendor-neutral
&lt;/h2&gt;

&lt;p&gt;The biggest challenge of this project wasn't technical in the traditional sense. It was &lt;strong&gt;resisting the temptation of managed services&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;At every step, there's a managed option that saves time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why manage your own Kafka when there's Amazon MSK or Confluent Cloud?&lt;/li&gt;
&lt;li&gt;Why MinIO when S3 is there, configured in 2 clicks?&lt;/li&gt;
&lt;li&gt;Why self-hosted Airflow when there's MWAA?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The answer is always the same: &lt;strong&gt;because the day the pricing changes or the service is deprecated, you need to be able to leave&lt;/strong&gt;. This doesn't mean you should never use managed services. It means you should do it knowingly, and make sure the abstraction layer allows switching.&lt;/p&gt;

&lt;p&gt;In practice, building vendor-neutral requires more upfront effort:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Terraform for declarative, multi-cloud infrastructure management&lt;/li&gt;
&lt;li&gt;Docker for isolation and portability&lt;/li&gt;
&lt;li&gt;Standard interfaces everywhere (S3 API, JDBC, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But once it's in place, the freedom it provides is invaluable.&lt;/p&gt;
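&lt;p&gt;In code, that abstraction layer can be as thin as an interface plus configuration. A minimal sketch (the names are illustrative, not from the project):&lt;/p&gt;

```python
from abc import ABC, abstractmethod

# Sketch of the "standard interface" idea: pipeline code depends on an
# object store contract, and only configuration picks the backend.
# Names are illustrative, not from the project.
class ObjectStore(ABC):
    @abstractmethod
    def put(self, key, data): ...

    @abstractmethod
    def get(self, key): ...

class InMemoryStore(ObjectStore):
    """Stand-in for any S3-compatible backend (MinIO, S3, GCS adapter)."""
    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]

def write_bronze(store: ObjectStore, key, rows):
    # Pipeline code never names a vendor: swapping backends is config.
    store.put(key, repr(rows))

store = InMemoryStore()
write_bronze(store, "bronze/orders/2026-03-31.json", [{"order": "A1"}])
```

&lt;p&gt;Because pipeline code only sees the &lt;code&gt;ObjectStore&lt;/code&gt; contract, pointing it at MinIO, S3 or any other S3-compatible backend is a wiring decision, not a code change.&lt;/p&gt;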

&lt;h2&gt;
  
  
  Orchestration: Airflow
&lt;/h2&gt;

&lt;p&gt;Airflow is the natural choice for orchestration in a vendor-neutral stack. Open-source, extensible, and above all: the community is massive. When you have an Airflow problem, someone has already had it and posted the solution on Stack Overflow.&lt;/p&gt;

&lt;p&gt;Alternatives would be Dagster or Prefect, but Airflow remains the most widely deployed in production and the most in-demand on the market. Pragmatism.&lt;/p&gt;

&lt;h2&gt;
  
  
  IaC: Terraform for multi-cloud
&lt;/h2&gt;

&lt;p&gt;Terraform is the piece that makes vendor-neutral viable at scale. Infrastructure is described in code, versioned in Git, and deployable on AWS, GCP or Azure by swapping providers rather than rewriting everything.&lt;/p&gt;

&lt;p&gt;In this project, Terraform modules provision AWS infrastructure, but the same logic could be ported to another cloud without rebuilding the application architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I took away
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Vendor-neutral has a cost, but so does lock-in
&lt;/h3&gt;

&lt;p&gt;Building vendor-neutral requires more upfront work. But lock-in has a hidden cost that explodes the day you need to migrate. And that day always comes sooner than you think.&lt;/p&gt;

&lt;h3&gt;
  
  
  Open formats are your data's life insurance
&lt;/h3&gt;

&lt;p&gt;Iceberg, Parquet, Avro: as long as your data is in an open format, you can switch compute engines without losing your data. It's the most important decision in a data architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data quality is an organizational problem, not a technical one
&lt;/h3&gt;

&lt;p&gt;Tools like Great Expectations are necessary but not sufficient. Without service-level agreements with sources, quality alerts are just noise.&lt;/p&gt;

&lt;h3&gt;
  
  
  Functional reality takes precedence over patterns
&lt;/h3&gt;

&lt;p&gt;Bronze/silver/gold is a good guide, not a religion. If your data only needs two layers, don't make three to respect a pattern. Architecture should serve the business need, not the other way around.&lt;/p&gt;

&lt;h3&gt;
  
  
  Streaming doesn't replace batch, it complements it
&lt;/h3&gt;

&lt;p&gt;Kafka for real-time ingestion, Spark for heavy transformations. The two coexist, and that's healthy. Trying to do everything in streaming is as dogmatic as doing everything in batch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Going further
&lt;/h2&gt;

&lt;p&gt;The source code is available on &lt;a href="https://github.com/HamdiMechelloukh/olist-lakehouse" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. The project uses the Olist dataset (Brazilian e-commerce) as a data source, making it testable without heavy infrastructure.&lt;/p&gt;




&lt;p&gt;You can also read this and other articles on &lt;a href="https://www.hamdimechelloukh.com" rel="noopener noreferrer"&gt;my portfolio&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>opensource</category>
      <category>kafka</category>
      <category>spark</category>
    </item>
    <item>
      <title>Lessons from 2 years as Production Manager at Decathlon Digital</title>
      <dc:creator>Hamdi Mechelloukh</dc:creator>
      <pubDate>Fri, 20 Mar 2026 10:59:30 +0000</pubDate>
      <link>https://dev.to/hamdi_mechelloukh_628620a/lessons-from-2-years-as-production-manager-at-decathlon-digital-a4b</link>
      <guid>https://dev.to/hamdi_mechelloukh_628620a/lessons-from-2-years-as-production-manager-at-decathlon-digital-a4b</guid>
      <description>&lt;p&gt;For two and a half years, I stepped away from code to manage data production for sales at Decathlon Digital. A role I discovered upon arrival: the job title said "Production Expert", and I quickly realized it was going to be a full-time commitment.&lt;/p&gt;

&lt;p&gt;Here's what I learned from switching to the other side.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context: Perfeco and sales data
&lt;/h2&gt;

&lt;p&gt;Perfeco was the data product that served the company's economic performance and sales data. In practice, it meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An ingestion pipeline built on &lt;strong&gt;Talend&lt;/strong&gt; and &lt;strong&gt;Redshift&lt;/strong&gt; — data was processed and stored in Redshift, then pushed to S3&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2 to 3 million sales per day&lt;/strong&gt; ingested&lt;/li&gt;
&lt;li&gt;Data exposed in the datalake and via an API consumed by multiple business teams&lt;/li&gt;
&lt;li&gt;Kafka messages with XML payloads converted to CSV before loading&lt;/li&gt;
&lt;li&gt;A scheduler (OpCon) to orchestrate ingestion jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My role: &lt;strong&gt;make sure all of this runs&lt;/strong&gt;, every day, without interruption.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "Production Manager" actually means day-to-day
&lt;/h2&gt;

&lt;p&gt;Coming from development, you'd think production is about monitoring and a few alerts. Reality is very different.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reducing the operational burden
&lt;/h3&gt;

&lt;p&gt;My main goal wasn't to react to incidents, but to &lt;strong&gt;reduce their frequency&lt;/strong&gt;. That meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Proactive alerting&lt;/strong&gt; — setting up the right dashboards (QuickSight, Tableau) and alerts to detect anomalies before they become incidents. Automatic Jira ticket creation when a threshold is breached.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data quality at the source&lt;/strong&gt; — analyzing and detecting quality issues upstream, then escalating them to source teams. This is facilitation work, not code: convincing an upstream team that their data is poorly formatted takes time and diplomacy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run documentation&lt;/strong&gt; — writing and maintaining on-call procedures so that any team member can intervene at 3 AM without relying on one person's memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run KPIs&lt;/strong&gt; — scripting metrics collection to objectively measure stability: incident count, resolution time, data availability.&lt;/li&gt;
&lt;/ul&gt;
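&lt;p&gt;The KPI scripting in the last point doesn't need heavy tooling. A minimal sketch of the idea, with illustrative fields and values:&lt;/p&gt;

```python
from datetime import datetime

# Sketch of run-KPI computation: incident count and mean time to
# resolution from an incident log. Fields and values are illustrative.
incidents = [
    {"opened": "2026-03-01T02:10", "resolved": "2026-03-01T04:10"},
    {"opened": "2026-03-09T08:00", "resolved": "2026-03-09T09:00"},
]

def mttr_hours(rows):
    """Mean time to resolution, in hours."""
    fmt = "%Y-%m-%dT%H:%M"
    durations = [
        datetime.strptime(r["resolved"], fmt) - datetime.strptime(r["opened"], fmt)
        for r in rows
    ]
    return sum(d.total_seconds() for d in durations) / 3600 / len(rows)

kpis = {"incident_count": len(incidents), "mttr_hours": mttr_hours(incidents)}
```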

&lt;h3&gt;
  
  
  Facilitating, not coding
&lt;/h3&gt;

&lt;p&gt;The biggest surprise was the &lt;strong&gt;relational dimension&lt;/strong&gt; of the role. I spent more time managing people than technology:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Facilitating consumer teams&lt;/strong&gt; — improving incident communication. When an ingestion is delayed, 5 different teams need to know why and when it will be resolved. You need a clear channel, a clear message, and consistency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Facilitating source teams&lt;/strong&gt; — working with teams that produce upstream data so they fix quality issues at the root.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On-call planning&lt;/strong&gt; — organizing rotations for the team, making sure everyone is trained and the load is fairly distributed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Postmortems&lt;/strong&gt; — I organized regular meetings with both data sources and consumers. Postmortems were filled collaboratively during these sessions: what happened, why, and what actions to take to prevent recurrence. This collaborative format aligned everyone and avoided the blame game.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The typical incident: when the scheduler crashes
&lt;/h2&gt;

&lt;p&gt;My nemesis during those two years was the OpCon scheduler client crashing on the machine. Silently.&lt;/p&gt;

&lt;p&gt;The scenario was always the same:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The OpCon client crashes → no jobs are launched&lt;/li&gt;
&lt;li&gt;Sales keep arriving via Kafka (messages with XML payloads)&lt;/li&gt;
&lt;li&gt;Messages pile up, hundreds of thousands within hours&lt;/li&gt;
&lt;li&gt;When we restart the scheduler, the XML → CSV conversion job faces a massive backlog&lt;/li&gt;
&lt;li&gt;The Talend job struggles, processing times explode, Redshift is overwhelmed, data arrives late in S3&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The biggest incidents we had were all tied to this problem. What made it frustrating was that the client crash was silent: no alert, no explicit log. We'd only discover it by noticing the absence of data downstream.&lt;/p&gt;

&lt;p&gt;The lesson: &lt;strong&gt;monitoring the absence of events is as important as monitoring errors&lt;/strong&gt;. If a job that runs every 15 minutes hasn't executed in 30 minutes, that's a strong signal.&lt;/p&gt;
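&lt;p&gt;That kind of signal takes very little code to check. A minimal sketch, with illustrative job names and thresholds:&lt;/p&gt;

```python
import time

# Sketch of "monitoring the silence": alert when a recurring job has not
# reported a heartbeat within its expected window. Job names and
# thresholds are illustrative.
def silent_jobs(last_heartbeat, now, max_silence_s):
    """Return the jobs whose last heartbeat is older than the threshold."""
    return [job for job, ts in last_heartbeat.items()
            if now - ts > max_silence_s]

now = time.time()
heartbeats = {
    "xml_to_csv": now - 10 * 60,   # ran 10 minutes ago: fine
    "opcon_poll": now - 45 * 60,   # silent for 45 minutes: alert
}
# The job runs every 15 minutes; alert after 30 minutes of silence.
alerts = silent_jobs(heartbeats, now, max_silence_s=30 * 60)
```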

&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Production is engineering
&lt;/h3&gt;

&lt;p&gt;Reducing operational burden isn't just "adding alerts". It's designing an observability system, automating detection, documenting procedures, and measuring improvement. It's engineering work in its own right.&lt;/p&gt;

&lt;h3&gt;
  
  
  Communication is a technical skill
&lt;/h3&gt;

&lt;p&gt;Writing a clear incident message, running a blameless postmortem, convincing a source team to fix a data format. These are skills as important as writing code. And they can be practiced.&lt;/p&gt;

&lt;h3&gt;
  
  
  Proactive alerting changes everything
&lt;/h3&gt;

&lt;p&gt;The difference between a PM who reacts and one who manages is proactivity. When you discover an incident from an automatic alert at 8 AM instead of a call from a business team at 10 AM, you've gained 2 hours and a lot of peace of mind.&lt;/p&gt;

&lt;h3&gt;
  
  
  Monitor the silence
&lt;/h3&gt;

&lt;p&gt;The most dangerous incidents don't generate errors: they generate silence. A pipeline that stops running, a scheduler that has crashed, a message that never arrives. Alerts on the absence of activity saved me more often than alerts on errors.&lt;/p&gt;

&lt;h3&gt;
  
  
  Documentation is not optional
&lt;/h3&gt;

&lt;p&gt;In dev, you can sometimes get by with readable code and a few comments. In production, if the on-call procedure isn't written down, it doesn't exist. The person on call at 3 AM doesn't have time to guess.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I went back to technical work
&lt;/h2&gt;

&lt;p&gt;After two and a half years, I decided to return to a Data Engineer role. The reason is simple: &lt;strong&gt;I felt I was regressing technically&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The PM day-to-day is fascinating: the diversity of problems, the human dimension, the direct impact on data reliability. But I was spending my days facilitating, documenting and communicating, and less and less designing and coding.&lt;/p&gt;

&lt;p&gt;I was afraid of falling behind, of no longer being up to speed on fast-evolving technologies: Spark, Databricks, lakehouse architectures. The risk of becoming a purely managerial profile without technical expertise didn't sit well with me.&lt;/p&gt;

&lt;p&gt;Today, looking back, I don't regret the experience. It gave me an understanding of production that many developers don't have. When I design a pipeline now, I naturally think about observability, error recovery, and operational documentation. These are reflexes that code alone wouldn't have given me.&lt;/p&gt;

&lt;h2&gt;
  
  
  In summary
&lt;/h2&gt;

&lt;p&gt;If you're a developer and someone offers you a production-oriented role, here's what I'd tell you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;It's a real job&lt;/strong&gt;, not a support role. It requires engineering, rigor, and a lot of soft skills.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You'll learn things that development will never teach you&lt;/strong&gt; — crisis communication, priority management under pressure, the end-to-end view of a data product.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set a time limit&lt;/strong&gt;. It's enriching, but if your core expertise is technical, don't stay too long or you risk falling behind.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bring those reflexes back into your code&lt;/strong&gt;. Observability, documentation, monitoring the silence — these are skills that make better engineers.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;You can also read this and other articles on &lt;a href="https://www.hamdimechelloukh.com" rel="noopener noreferrer"&gt;my portfolio&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>production</category>
      <category>devops</category>
      <category>career</category>
    </item>
    <item>
      <title>AgenticDev: a multi-LLM framework for generating tested code</title>
      <dc:creator>Hamdi Mechelloukh</dc:creator>
      <pubDate>Fri, 20 Mar 2026 10:58:27 +0000</pubDate>
      <link>https://dev.to/hamdi_mechelloukh_628620a/agenticdev-a-multi-llm-framework-for-generating-tested-code-184</link>
      <guid>https://dev.to/hamdi_mechelloukh_628620a/agenticdev-a-multi-llm-framework-for-generating-tested-code-184</guid>
      <description>&lt;p&gt;In late 2025, after spending hours prompting LLMs one by one to generate code, a question kept nagging me: &lt;strong&gt;what if multiple LLM agents could collaborate to produce a complete project?&lt;/strong&gt; Not a single agent doing everything, but a specialized team (an architect, a developer, a tester), each with its own role, tools, and constraints.&lt;/p&gt;

&lt;p&gt;That's how &lt;strong&gt;AgenticDev&lt;/strong&gt; was born: a Python framework that orchestrates 4 LLM agents to turn a plain-text request into tested, documented code.&lt;/p&gt;

&lt;p&gt;In this article, I share the architecture decisions, the problems I ran into, and the lessons learned.&lt;/p&gt;

&lt;h2&gt;
  
  
  Starting point: testing the limits of multi-agent collaboration
&lt;/h2&gt;

&lt;p&gt;My initial goal was simple: explore how far LLM agents can collaborate autonomously. Not a throwaway POC, but a real pipeline where each agent has a clear responsibility:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Architect&lt;/strong&gt; — analyzes the request and produces a technical specification (&lt;code&gt;spec.md&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Designer&lt;/strong&gt; — generates SVG assets from the spec&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer&lt;/strong&gt; — implements the code following the spec and integrating the assets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tester&lt;/strong&gt; — writes and runs tests, then sends failures back to the Developer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The idea is the &lt;strong&gt;Agent as Tool&lt;/strong&gt; pattern: each agent is a node in an execution graph, not an LLM calling other LLMs chaotically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture: why LangGraph over an LLM orchestrator
&lt;/h2&gt;

&lt;p&gt;My first approach was letting an orchestrator agent (Gemini) dynamically decide which sub-agent to call, via function calls. It worked, but I quickly identified a problem: &lt;strong&gt;the more generic the system, the more unpredictable it became.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The LLM orchestrator could decide to skip the Designer, call the Tester before the Developer, or loop indefinitely. For a framework that needs to produce reliable code, that's a deal-breaker.&lt;/p&gt;

&lt;p&gt;So I chose to &lt;strong&gt;delegate orchestration to LangGraph&lt;/strong&gt;, a deterministic graph framework. The pipeline becomes explicit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Architect → Designer → Developer → Tester
                                      │
                                      ▼ (tests fail?)
                                   Developer ← fix loop (max 3×)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each node is an autonomous agent, but &lt;strong&gt;execution order and retry logic are deterministic&lt;/strong&gt;. The LLM controls the &lt;em&gt;what&lt;/em&gt; (generated content), but not the &lt;em&gt;when&lt;/em&gt; (execution flow).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;_builder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PipelineState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;architect&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;architect&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;designer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;designer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;developer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;developer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tester&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tester&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;should_fix_or_end&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fix&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fix_developer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fix_developer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tester&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;should_fix_or_end&lt;/code&gt; function is pure Python: it parses the Tester's output and decides whether to rerun the Developer or finish. No LLM in the decision loop.&lt;/p&gt;
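&lt;p&gt;Its shape might look like this (the state keys here are assumptions for illustration; the point is that the routing decision is deliberately LLM-free):&lt;/p&gt;

```python
# Sketch of a pure-Python routing function for the LangGraph pipeline.
# The state keys ("tests_passed", "fix_attempts") are illustrative.
MAX_FIX_ATTEMPTS = 3  # mirrors the "max 3x" fix loop

def should_fix_or_end(state: dict) -> str:
    """Route after the Tester node: retry the Developer on failure, up to a cap."""
    if state.get("tests_passed"):
        return "end"
    if state.get("fix_attempts", 0) >= MAX_FIX_ATTEMPTS:
        return "end"  # give up rather than loop indefinitely
    return "fix"
```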

&lt;h2&gt;
  
  
  The prompt caching problem and the switch to full Gemini
&lt;/h2&gt;

&lt;p&gt;During the exploration phase, I very quickly hit &lt;strong&gt;API rate limits&lt;/strong&gt; on Gemini. Every agent call sent the full system prompt, tool definitions, and project context: thousands of tokens per request.&lt;/p&gt;

&lt;p&gt;The solution: &lt;strong&gt;prompt caching&lt;/strong&gt;. But Gemini and Claude handle it very differently.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gemini: implicit caching
&lt;/h3&gt;

&lt;p&gt;Gemini automatically caches repeated prefixes. If the system prompt and initial instructions are identical between two calls, Google reuses the cached context. On the code side, there's nothing to do: caching is transparent.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Savings show up in usage metadata
&lt;/span&gt;&lt;span class="n"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cached_content_token_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt_token_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache hit: %d/%d tokens (%d%%)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Claude: explicit caching
&lt;/h3&gt;

&lt;p&gt;Claude requires explicit &lt;code&gt;cache_control: ephemeral&lt;/code&gt; markers on the blocks you want cached: the system prompt, tool definitions, and the first user message.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;system&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}]&lt;/span&gt;

&lt;span class="n"&gt;claude_tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_fn_to_claude_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;claude_tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;claude_tools&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why I switched to full Gemini
&lt;/h3&gt;

&lt;p&gt;I started with a multi-LLM architecture: Gemini for the Architect and Tester, Claude for the Developer. The idea was appealing: use each LLM where it excels.&lt;/p&gt;

&lt;p&gt;In practice, &lt;strong&gt;Claude's API cost quickly made this approach unsustainable&lt;/strong&gt;. A full pipeline run with Claude as Developer cost significantly more than with Gemini, especially during fix iterations where the context grows with each turn. So I decided to switch to &lt;strong&gt;full Gemini&lt;/strong&gt; as the default pipeline, while keeping the &lt;code&gt;ClaudeAgent&lt;/code&gt; in the framework as a configurable option.&lt;/p&gt;

&lt;p&gt;This pragmatic choice also let me fully benefit from Gemini's implicit caching across the entire pipeline, without managing two different caching strategies in production.&lt;/p&gt;

&lt;p&gt;The contrast between the two approaches still pushed me to design the class hierarchy to isolate these differences:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;BaseAgent (ABC)
├── GeminiAgent    → implicit caching, google-genai SDK
│   ├── ArchitectAgent
│   ├── DesignerAgent
│   ├── DeveloperAgent
│   └── TesterAgent
└── ClaudeAgent    → explicit caching, anthropic SDK
    └── DeveloperAgent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each agent inherits its backend's caching strategy without having to worry about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent hierarchy: ABC and specialization
&lt;/h2&gt;

&lt;p&gt;The core of the framework relies on a simple hierarchy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;BaseAgent&lt;/code&gt;&lt;/strong&gt; (ABC) — defines the contract: &lt;code&gt;run(context) → AgentResult&lt;/code&gt;, tool management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;GeminiAgent&lt;/code&gt;&lt;/strong&gt; — implements the agentic loop for Gemini (chat + tool calls)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ClaudeAgent&lt;/code&gt;&lt;/strong&gt; — implements the agentic loop for Claude (messages + tool_use blocks)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Specialized agents (Architect, Developer, Tester) inherit from &lt;code&gt;GeminiAgent&lt;/code&gt; and only define their &lt;strong&gt;instructions&lt;/strong&gt; and &lt;strong&gt;tools&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ArchitectAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GeminiAgent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Architect&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a software architect...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;web_search&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;write_file&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3.1-pro-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To add a new agent, just create a class, define its instructions, and add it as a node in the LangGraph pipeline. No need to touch the chat logic, tool calling, or caching.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Designer: a special case
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;DesignerAgent&lt;/code&gt; is an interesting case. Unlike other agents that use the standard agentic loop (chat → tool call → response → tool call → ...), the Designer makes &lt;strong&gt;direct API calls&lt;/strong&gt; to generate SVG.&lt;/p&gt;

&lt;p&gt;Why? Because SVG generation is a well-defined two-step workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Planning&lt;/strong&gt; — "what assets does this project need?" → returns JSON&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generation&lt;/strong&gt; — "generate these N SVG sprites" → returns parsable text&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No need for an agentic loop with tools here. The Designer still inherits from &lt;code&gt;GeminiAgent&lt;/code&gt; (for the API client and key validation), but it &lt;strong&gt;overrides &lt;code&gt;run()&lt;/code&gt;&lt;/strong&gt; with its own logic.&lt;/p&gt;
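The shape of that override might look like this — a sketch in which `generate` stands in for the backend API call (e.g. the google-genai client), injected so the two-step flow can be shown without a live API:

```python
import json


class DesignerAgent:
    """Sketch of the Designer's two-step run(): no tool loop, two direct calls.

    `generate` is a stand-in for the real model call; the actual class
    inherits the client from GeminiAgent instead.
    """

    def __init__(self, generate):
        self.generate = generate

    def run(self, context: dict) -> dict:
        # Step 1 — planning: ask which assets the project needs, expect JSON.
        plan_text = self.generate(
            f"List the SVG assets this project needs as JSON: {context['brief']}"
        )
        plan = json.loads(plan_text)

        # Step 2 — generation: ask for the planned sprites, expect parsable text.
        svg_text = self.generate(
            f"Generate these {len(plan['assets'])} SVG sprites: {plan['assets']}"
        )
        return {"plan": plan, "svg": svg_text}
```

The prompts and field names here are placeholders; the point is the fixed plan-then-generate sequence replacing the open-ended tool loop.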

&lt;h2&gt;
  
  
  The automatic fix loop
&lt;/h2&gt;

&lt;p&gt;One of the most useful aspects of the pipeline is the &lt;strong&gt;fix loop&lt;/strong&gt;. When the Tester detects failures, the Developer is relaunched in &lt;strong&gt;FIX MODE&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;should_fix_or_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PipelineState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fix&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="nf"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nf"&gt;_has_test_failures&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fix_iterations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;MAX_FIX_ITERATIONS&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fix&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Developer then receives the test output in its context, with a clear instruction:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"You are in FIX MODE — read existing files and fix these. Do NOT rewrite all files from scratch."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In practice, three fix iterations are enough in most cases to go from 60–70% of tests passing to 100%.&lt;/p&gt;
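Stripped of LangGraph, the same routing can be exercised as a plain loop — a simplified sketch in which `run_developer` and `run_tester` are stand-ins for the real agent nodes:

```python
MAX_FIX_ITERATIONS = 3


def _has_test_failures(test_results: str) -> bool:
    # Simplified predicate; the real one parses the test runner's output.
    return "FAILED" in test_results


def run_fix_loop(state: dict, run_developer, run_tester) -> dict:
    """Relaunch the Developer in FIX MODE until tests pass or the cap is hit."""
    while (
        _has_test_failures(state.get("test_results", ""))
        and state.get("fix_iterations", 0) < MAX_FIX_ITERATIONS
    ):
        run_developer(state, fix_mode=True)        # Developer sees the failures
        state["test_results"] = run_tester(state)  # Tester re-runs the suite
        state["fix_iterations"] = state.get("fix_iterations", 0) + 1
    return state
```

The iteration cap is what keeps the pipeline deterministic: a persistently broken test suite ends the run instead of looping forever.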

&lt;h2&gt;
  
  
  Shared tools
&lt;/h2&gt;

&lt;p&gt;Agents interact with the file system through four simple tools:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;write_file(path, content)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Write a file (creates parent directories)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;read_file(path)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Read an existing file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;execute_code(command)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Execute a shell command&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;web_search(query)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Web search via DuckDuckGo&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These tools are plain Python functions, passed to agents through their constructor. The framework handles exposing them to the LLM in the right format (Gemini function declarations or Claude tool definitions).&lt;/p&gt;
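As an illustration, a `write_file` tool of this kind can be as small as the following — a sketch, not the framework's exact implementation:

```python
from pathlib import Path


def write_file(path: str, content: str) -> str:
    """Write a file, creating any missing parent directories.

    Returns a short status string so the LLM gets feedback it can act on.
    """
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(content, encoding="utf-8")
    return f"Wrote {len(content)} characters to {path}"
```

Because it is a plain function with typed parameters and a docstring, the framework can derive a Gemini function declaration or a Claude tool definition from its signature.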

&lt;h2&gt;
  
  
  The limits: a solid foundation, not a finished product
&lt;/h2&gt;

&lt;p&gt;Let's be honest about what the framework can and can't do. &lt;strong&gt;AgenticDev excels at generating a functional project base&lt;/strong&gt;: file structure, initial code, tests, documentation. For simple projects (CLI tools, libraries, small APIs), the output is often usable as-is.&lt;/p&gt;

&lt;p&gt;But as complexity grows (intricate business logic, multiple integrations, performance constraints), &lt;strong&gt;the generated code will be a starting point, not the final product&lt;/strong&gt;. There will be technical limitations (overly naive architectures, uncovered edge cases) and functional gaps (the LLM doesn't know your business context) that you'll need to fix manually or by vibe-coding with a tool like Claude Code or Cursor.&lt;/p&gt;

&lt;p&gt;This is actually the workflow I recommend: let AgenticDev generate the skeleton, then iterate on it with a coding assistant to refine the details. The framework saves you the first hours of setup, not the last hours of polish.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Specialization beats generality
&lt;/h3&gt;

&lt;p&gt;An agent that "does everything" is less reliable than a team of specialized agents. The Architect can't code, the Developer can't test, and that's by design. Each agent has precise instructions and a limited scope.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deterministic orchestration is non-negotiable
&lt;/h3&gt;

&lt;p&gt;Letting an LLM decide the execution flow means accepting that the pipeline behaves differently on every run. For a code generation tool, that's unacceptable. LangGraph let me keep the LLMs' creativity while enforcing a predictable execution order.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt caching is essential in multi-agent systems
&lt;/h3&gt;

&lt;p&gt;Without caching, a 4-agent pipeline easily consumes 100k+ tokens per run, 80% of which is repeated context. Caching significantly reduces both costs and latency.&lt;/p&gt;
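On the explicit-caching side (Claude), this amounts to marking the stable prefix of the request — the long, run-invariant instructions — with a `cache_control` block. A sketch of the request payload only (the SDK call itself is omitted, and the model id is illustrative):

```python
def build_claude_request(instructions: str, user_task: str) -> dict:
    """Build a messages-API payload whose stable prefix is marked cacheable.

    The Anthropic API caches everything up to the block tagged with
    cache_control, so repeated runs only pay full price for the new suffix.
    """
    return {
        "model": "claude-sonnet-4-5",  # illustrative model id
        "max_tokens": 4096,
        "system": [
            {
                "type": "text",
                "text": instructions,  # long and identical across runs -> cache it
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_task}],
    }
```

On the Gemini side the equivalent is implicit: the SDK caches repeated prefixes automatically, which is why each backend subclass can own its strategy without the specialized agents knowing.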

&lt;h3&gt;
  
  
  Cost dictates architecture
&lt;/h3&gt;

&lt;p&gt;Starting with multi-LLM was intellectually satisfying, but economic reality caught up. Keeping the multi-backend abstraction while using a single provider by default is the right trade-off: you only pay for what you use, without sacrificing flexibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent instructions are code
&lt;/h3&gt;

&lt;p&gt;Agent prompts aren't vague sentences: they're precise specifications with rules, examples, and edge cases. For instance, the Developer's prompt includes rules on Python vs TypeScript conventions, placeholder handling, and a mandatory completion audit before returning its response.&lt;/p&gt;

&lt;h2&gt;
  
  
  Going further
&lt;/h2&gt;

&lt;p&gt;The source code is available on &lt;a href="https://github.com/HamdiMechelloukh/AgenticDev" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. The framework is designed to be extended: adding a new agent takes about ten lines of code.&lt;/p&gt;

&lt;p&gt;You can also read this and other articles on &lt;a href="https://www.hamdimechelloukh.com" rel="noopener noreferrer"&gt;my portfolio&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Next steps I'm considering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Support for new LLM backends (Mistral, Llama)&lt;/li&gt;
&lt;li&gt;Quality metrics on generated code&lt;/li&gt;
&lt;li&gt;Interactive mode with human validation between each step&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>python</category>
      <category>langchain</category>
    </item>
  </channel>
</rss>
