<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Pratik Dhanave</title>
    <description>The latest articles on DEV Community by Pratik Dhanave (@pratikdhanave).</description>
    <link>https://dev.to/pratikdhanave</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F225118%2Fec05069a-629a-4ef8-a962-12705bd0b599.jpg</url>
      <title>DEV Community: Pratik Dhanave</title>
      <link>https://dev.to/pratikdhanave</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pratikdhanave"/>
    <language>en</language>
    <item>
      <title>Architecture Over Alerts: How We Cut BigQuery Costs by 57% ($12M) for a Fortune 500</title>
      <dc:creator>Pratik Dhanave</dc:creator>
      <pubDate>Mon, 08 Jun 2026 06:02:02 +0000</pubDate>
      <link>https://dev.to/pratikdhanave/architecture-over-alerts-how-we-cut-bigquery-costs-by-5712m-for-a-fortune-500-34mb</link>
      <guid>https://dev.to/pratikdhanave/architecture-over-alerts-how-we-cut-bigquery-costs-by-5712m-for-a-fortune-500-34mb</guid>
      <description>&lt;p&gt;I helped redesign a large BigQuery-based enterprise data warehouse and cut spend by 57%. The biggest savings didn't come from dashboards or one-off query tuning. They came from architecture decisions — partitioning, clustering, incremental MERGE patterns, and a better capacity model.&lt;/p&gt;

&lt;p&gt;Here's how I approach cost as an architectural problem in large systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;I joined as the Solution Architect for a Fortune 500 financial company running BigQuery at scale — hundreds of analysts, dozens of automated pipelines, 4+ TB of daily ingestion across finance, risk, and regulatory reporting workloads.&lt;/p&gt;

&lt;p&gt;The platform had grown organically. Datasets were added without architectural standards. Queries ran without cost awareness. The pricing model was never revisited as workloads matured. Leadership was asking "how much are we spending?" We needed to answer a different question: "why does our architecture force us to spend this much?"&lt;/p&gt;

&lt;p&gt;In BigQuery, cost scales with data scanned, so table design and query patterns directly determine spend. One unoptimized query scanning 10 TB hourly, without caching or partition pruning, reads roughly 87,600 TB a year and can burn through a massive annual budget on its own. We had dozens of patterns like this across hundreds of queries.&lt;/p&gt;
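
&lt;p&gt;To make the scale concrete, here is a minimal arithmetic sketch. The function name and the &lt;code&gt;usd_per_tb&lt;/code&gt; rate are assumptions for illustration, not figures from the engagement; substitute the rate that applies to your region and pricing model.&lt;/p&gt;

```python
HOURS_PER_YEAR = 24 * 365


def annual_scan_cost(tb_per_run: float, runs_per_hour: float, usd_per_tb: float) -> float:
    """Yearly on-demand cost of a query that scans the same volume on every run."""
    tb_per_year = tb_per_run * runs_per_hour * HOURS_PER_YEAR
    return tb_per_year * usd_per_tb


# usd_per_tb is a placeholder rate, not a quoted price.
cost = annual_scan_cost(tb_per_run=10, runs_per_hour=1, usd_per_tb=5.0)
print(f"{cost:,.0f} USD per year")
```

&lt;p&gt;The point of the sketch is the multiplier: an hourly schedule turns one careless scan into 8,760 scans a year.&lt;/p&gt;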

&lt;p&gt;The cost drivers fell into four areas: compute (unoptimized queries and MERGE operations), storage (no partitioning, clustering, or lifecycle policies), governance (no cost attribution or alerting), and capacity (on-demand pricing for predictable workloads).&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Found
&lt;/h2&gt;

&lt;p&gt;The first four weeks were pure analysis. No changes. Using BigQuery's INFORMATION_SCHEMA and Jobs logs — no third-party tools — we profiled every query by cost, frequency, and scan volume, mapped every dataset by partitioning state and access patterns, traced pipelines end-to-end to understand what would break, and modeled 90 days of workload behavior for capacity planning.&lt;/p&gt;
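
&lt;p&gt;As a rough illustration of the profiling step, the sketch below ranks query patterns by total bytes billed. It assumes the job rows were already exported from the &lt;code&gt;INFORMATION_SCHEMA&lt;/code&gt; jobs views as dictionaries; &lt;code&gt;total_bytes_billed&lt;/code&gt; is a column of those views, while the &lt;code&gt;pattern&lt;/code&gt; label and the function name are hypothetical.&lt;/p&gt;

```python
from collections import defaultdict


def rank_by_bytes_billed(jobs: list[dict], top_n: int = 3) -> list[tuple[str, int, int]]:
    """Group job rows by pattern label and rank patterns by total bytes billed."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [bytes, runs]
    for job in jobs:
        entry = totals[job["pattern"]]
        entry[0] += job["total_bytes_billed"]
        entry[1] += 1
    ranked = sorted(totals.items(), key=lambda item: item[1][0], reverse=True)
    return [(pattern, total, runs) for pattern, (total, runs) in ranked[:top_n]]


# Hypothetical sample rows; real rows would come from the jobs view.
sample_jobs = [
    {"pattern": "daily_merge", "total_bytes_billed": 10_000},
    {"pattern": "adhoc_report", "total_bytes_billed": 500},
    {"pattern": "daily_merge", "total_bytes_billed": 12_000},
]
for pattern, total, runs in rank_by_bytes_billed(sample_jobs):
    print(pattern, total, runs)
```

&lt;p&gt;Ranking by total bytes rather than per-run bytes surfaces the cheap-looking queries that run often enough to dominate the bill.&lt;/p&gt;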

&lt;p&gt;Across hundreds of pipelines, a small number of recurring architectural patterns dominated total spend.&lt;/p&gt;

&lt;p&gt;The single largest: what we internally called the &lt;strong&gt;MERGE anti-pattern&lt;/strong&gt; — MERGE operations doing full-table read + write on every execution, even when only a fraction of rows had changed. This one pattern accounted for more waste than any other issue we found.&lt;/p&gt;

&lt;p&gt;We explicitly avoided query-level tuning as a primary strategy. At this scale, tuning individual queries reduces cost — but redesigning the architecture that generates those queries changes the cost curve. That decision shaped everything that followed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Five Architecture Decisions
&lt;/h2&gt;

&lt;p&gt;Every decision was constrained by zero-downtime requirements — hundreds of downstream pipelines, regulatory reporting jobs, and cross-team dependencies all depended on these tables. Nothing could break.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Zero-downtime wasn't a constraint — it was the primary design input.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Partitioning:&lt;/strong&gt; Ingestion-date partitioning on high-traffic tables, over business-key partitioning. We traded optimal query performance for migration safety — accepting slightly higher scan costs on edge-case query patterns to avoid breaking production pipelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Clustering:&lt;/strong&gt; High-cardinality filter columns, over composite clustering. 80% of the benefit for 20% of the complexity. We accepted the remaining gap for operational simplicity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The MERGE anti-pattern: the biggest win.&lt;/strong&gt; The underlying pattern: treating BigQuery like a traditional RDBMS. Every MERGE operation was reading and rewriting the entire target table on every run. We shifted from full-table DML to idempotent, partition-scoped operations, where each MERGE joins only against the affected partitions using staging tables with predicate filters. Write amplification dropped ~90%. One architectural pattern change eliminated the platform's largest cost driver.&lt;/p&gt;
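
&lt;p&gt;A minimal sketch of what a partition-scoped MERGE can look like, rendered from Python. Every name here (the function, the table names, the &lt;code&gt;ingest_date&lt;/code&gt; column) is a placeholder, and the statement assumes the staging table only holds rows that belong to the listed partitions; a staged key that lives in an unlisted partition would be inserted again. Whether the engine actually prunes depends on the table definition and statement shape, so confirm the bytes processed before relying on it.&lt;/p&gt;

```python
from datetime import date


def build_partition_scoped_merge(
    target: str,
    staging: str,
    key: str,
    partition_col: str,
    update_cols: list[str],
    partitions: list[date],
) -> str:
    """Render a MERGE whose join condition only admits the listed target partitions."""
    if not partitions:
        raise ValueError("at least one affected partition is required")
    date_list = ", ".join(f"DATE '{d.isoformat()}'" for d in sorted(set(partitions)))
    assignments = ", ".join(f"{col} = S.{col}" for col in update_cols)
    return (
        f"MERGE {target} T\n"
        f"USING {staging} S\n"
        f"ON T.{key} = S.{key}\n"
        f"  AND T.{partition_col} IN ({date_list})\n"
        f"WHEN MATCHED THEN UPDATE SET {assignments}\n"
        f"WHEN NOT MATCHED THEN INSERT ROW"
    )


sql = build_partition_scoped_merge(
    target="ds.orders",
    staging="ds.orders_staging",
    key="order_id",
    partition_col="ingest_date",
    update_cols=["status"],
    partitions=[date(2024, 1, 2), date(2024, 1, 1)],
)
print(sql)
```

&lt;p&gt;Because the partition list is derived from the staged batch, rerunning the same batch touches the same partitions and produces the same result, which is what makes the operation idempotent.&lt;/p&gt;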

&lt;p&gt;&lt;strong&gt;Capacity model:&lt;/strong&gt; Hybrid — 70/30 split between committed slots for predictable workloads and on-demand burst for peaks. We traded slightly higher peak cost for 40% lower total cost and predictable billing. Finance could finally forecast cloud spend annually — that alone changed the stakeholder conversation from reactive to strategic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Monitoring:&lt;/strong&gt; Real-time cost attribution with automated multi-tier alerting, replacing the monthly bill-review cycle. It required instrumentation effort upfront, but it shifted governance from reactive to proactive — and gave every team visibility into their own cost footprint for the first time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;These were structural changes, not temporary fixes. The savings persist because the architecture prevents regression.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;57% reduction in data warehouse spend&lt;/li&gt;
&lt;li&gt;Significant realised annual savings at multi-petabyte scale with thousands of daily queries&lt;/li&gt;
&lt;li&gt;3-5x performance improvement on the highest-cost query patterns&lt;/li&gt;
&lt;li&gt;Top pipelines reduced from ~10 TB scans per run to under 500 GB — a 95% reduction through partition pruning and clustering&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The engagement evolved from a tactical intervention into a strategic FinOps roadmap. The client established a permanent Architecture Review Board to vet all future high-scale pipelines.&lt;/p&gt;

&lt;p&gt;The approach was repeatable: diagnostic → cost driver isolation → architecture intervention → governance. We've since applied the same model across multiple enterprise environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Do Differently: The Trust Gap
&lt;/h2&gt;

&lt;p&gt;I underestimated how long it takes to get stakeholder buy-in without real-time proof. For nine weeks, leadership was trusting my word that the architecture changes were working — I couldn't show them live cost-attribution data until the monitoring layer went live in Phase 3. That was a risk I shouldn't have taken.&lt;/p&gt;

&lt;p&gt;If I reran this, I'd deploy cost instrumentation in Week 1, alongside the first architecture changes. Not because dashboards solve cost problems — they don't. But because architects need to show their work in real-time, or the next budget review becomes a trust exercise instead of a data conversation. I'd also implement &lt;strong&gt;Cost-as-Code&lt;/strong&gt; — CI/CD gates that reject queries exceeding a scan threshold — so cost governance becomes automated rather than advisory.&lt;/p&gt;
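
&lt;p&gt;One way such a gate could look, as a sketch rather than a finished tool: the check takes any callable that estimates bytes scanned, so it can be tested without a warehouse. In a real pipeline the estimator might wrap a BigQuery dry run, which reports the bytes a query would process without executing it. The names &lt;code&gt;enforce_scan_budget&lt;/code&gt; and &lt;code&gt;ScanBudgetExceeded&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
from typing import Callable


class ScanBudgetExceeded(Exception):
    """Raised when a query's estimated scan is above the allowed threshold."""


def enforce_scan_budget(sql: str, estimate_bytes: Callable[[str], int], max_bytes: int) -> int:
    """Return the estimate if it fits the budget, otherwise raise so the CI job fails."""
    estimated = estimate_bytes(sql)
    if estimated > max_bytes:
        raise ScanBudgetExceeded(
            f"query would scan {estimated:,} bytes, limit is {max_bytes:,}"
        )
    return estimated


# Stand-in estimator for the example; a real one would ask the warehouse.
def fake_estimator(sql: str) -> int:
    return 2_000_000 if "WHERE" in sql else 9_000_000_000


ok_bytes = enforce_scan_budget(
    "SELECT id FROM t WHERE d = '2024-01-01'", fake_estimator, max_bytes=1_000_000_000
)
print(ok_bytes)
```

&lt;p&gt;Raising an exception, rather than logging a warning, is what turns the threshold from advice into a gate.&lt;/p&gt;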

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;At scale, cloud cost is an architectural outcome — not a reporting problem. Monitoring explains spend. Architecture determines it.&lt;/p&gt;

&lt;p&gt;What's the most expensive architecture mistake you've seen in a cloud data warehouse?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Pratik Dhanave — Architect, 7+ years. Enterprise FinOps &amp;amp; Cloud Architecture. Google Cloud Next Speaker. GSoC Mentor (2019–present).&lt;/em&gt;&lt;/p&gt;

</description>
      <category>bigquery</category>
      <category>finops</category>
      <category>cloudarchitecture</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Building Production Multi-Agent Systems: Real-World Lessons from Genie</title>
      <dc:creator>Pratik Dhanave</dc:creator>
      <pubDate>Thu, 04 Jun 2026 03:18:27 +0000</pubDate>
      <link>https://dev.to/pratikdhanave/building-production-multi-agent-systems-real-world-lessons-from-genie-2f6m</link>
      <guid>https://dev.to/pratikdhanave/building-production-multi-agent-systems-real-world-lessons-from-genie-2f6m</guid>
      <description>&lt;h1&gt;
  
  
  Building Production Multi-Agent Systems: Real-World Lessons from Genie
&lt;/h1&gt;

&lt;p&gt;I shipped a &lt;strong&gt;15-agent financial assistant&lt;/strong&gt; on Microsoft's Multi-Agent Reference Architecture (MARA). It processes 40M requests per month, orchestrates complex workflows, and costs roughly 40% less per request than the single-agent setup it replaced.&lt;/p&gt;

&lt;p&gt;This isn't a "how to build agents" tutorial. This is what I learned when agents &lt;em&gt;broke&lt;/em&gt; in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem With Single-Agent Systems
&lt;/h2&gt;

&lt;p&gt;You start with one agent: "Give me a financial recommendation."&lt;/p&gt;

&lt;p&gt;It works. Then users ask for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Account analysis + fraud detection + investment advice &lt;em&gt;in the same request&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Real-time responses (P99 latency &amp;lt; 2 seconds)&lt;/li&gt;
&lt;li&gt;Cost control (don't spend more on LLM calls than we make in profit)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A single agent trying to do all three becomes a bottleneck. Enter: &lt;strong&gt;multi-agent orchestration&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture That Worked
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Customer Request
    ↓
Supervisor Agent ← (routes intelligently)
    ├→ Analyst Agent ← (account data)
    ├→ Risk Agent ← (fraud detection)
    ├→ Recommender Agent ← (investment logic)
    └→ Compliance Agent ← (regulatory checks)
    ↓
Aggregator (merge results)
    ↓
Response to customer (&amp;lt; 2 seconds)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; Each agent is independent. If Risk Agent is slow, it doesn't block Analyst Agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 1: Hierarchical Routing
&lt;/h2&gt;

&lt;p&gt;The Supervisor agent doesn't run LLM logic. It's dumb and fast:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SupervisorAgent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;specialists&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;specialists&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;specialists&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;budget&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;budget&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;spent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Determine which agents to call based on request type
&lt;/span&gt;        &lt;span class="n"&gt;required_agents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;classify_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Call them in parallel
&lt;/span&gt;        &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;required_agents&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Aggregate results
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aggregate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this works:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supervisor is lightweight (no LLM call)&lt;/li&gt;
&lt;li&gt;Specialists run in parallel (fast)&lt;/li&gt;
&lt;li&gt;Each specialist is reusable (easy to test, modify, scale)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pattern 2: Cost-Aware Routing
&lt;/h2&gt;

&lt;p&gt;This is where the 40% savings came from. Route by &lt;strong&gt;value&lt;/strong&gt;, not by capability.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CostGuardian&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Enforce budget limits on every agent call.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_with_budget&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;estimated_cost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# High-value decision (customer asking about $50K portfolio)?
&lt;/span&gt;        &lt;span class="c1"&gt;# Use expensive model (GPT-4)
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_with_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Low-value decision (quick lookup)?
&lt;/span&gt;        &lt;span class="c1"&gt;# Use a cheap local model (served via Ollama)
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_with_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama:3b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Medium value? Tradeoff model
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_with_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; 30-40% cost reduction without sacrificing quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 3: Structured Message Passing
&lt;/h2&gt;

&lt;p&gt;Don't let agents communicate via free-form strings. Use schemas:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asdict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;

&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AnalystResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;account_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;monthly_income&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;recent_transactions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;  &lt;span class="c1"&gt;# 0-1, how confident in this data?
&lt;/span&gt;
&lt;span class="c1"&gt;# Supervisor validates before passing to next agent
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_and_forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AnalystResult&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;low confidence, retry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;asdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why:&lt;/strong&gt; Catch errors early. Agents can validate input before processing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 4: Timeout &amp;amp; Fallback Chains
&lt;/h2&gt;

&lt;p&gt;In production, agents time out. Have a plan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;


&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_with_fallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;primary_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fallback_agents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Try primary. If timeout, try fallback 1. If timeout, try fallback 2.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;primary_agent&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;fallback_agents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;TimeoutError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;  &lt;span class="c1"&gt;# Try next agent
&lt;/span&gt;
    &lt;span class="c1"&gt;# All failed
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all agents timed out&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Real example: the Risk Agent is slow because of a database query backlog. The chain falls back to the Risk Agent's cached result, so the user gets a response instead of a timeout.&lt;/p&gt;
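
&lt;p&gt;A cached fallback can itself be modelled as just another agent in the chain. The sketch below is one possible shape, not the production implementation: &lt;code&gt;CachedResultAgent&lt;/code&gt;, &lt;code&gt;SlowRiskAgent&lt;/code&gt;, the &lt;code&gt;account_id&lt;/code&gt; key and the &lt;code&gt;stale&lt;/code&gt; flag are all hypothetical names.&lt;/p&gt;

```python
import asyncio


class CachedResultAgent:
    """Fallback that replays the last good result recorded for an account."""

    def __init__(self) -> None:
        self._cache: dict[str, dict] = {}

    def remember(self, account_id: str, result: dict) -> None:
        self._cache[account_id] = result

    async def process(self, request: dict) -> dict:
        cached = self._cache.get(request["account_id"])
        if cached is None:
            return {"error": "no cached result"}
        # Flag the answer as stale so downstream agents can weigh it accordingly.
        return {**cached, "stale": True}


class SlowRiskAgent:
    async def process(self, request: dict) -> dict:
        await asyncio.sleep(1.0)  # simulates a backed-up database query
        return {"risk": "low"}


async def demo() -> dict:
    fallback = CachedResultAgent()
    fallback.remember("acct-1", {"risk": "low"})
    request = {"account_id": "acct-1"}
    try:
        return await asyncio.wait_for(SlowRiskAgent().process(request), timeout=0.05)
    except asyncio.TimeoutError:
        return await fallback.process(request)


result = asyncio.run(demo())
print(result)
```

&lt;p&gt;Marking the replayed result as stale keeps the degradation visible: the user still gets an answer, and anything consuming it knows it did not come from a fresh check.&lt;/p&gt;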

&lt;h2&gt;
  
  
  Observability: The Game Changer
&lt;/h2&gt;

&lt;p&gt;You need to see &lt;strong&gt;every&lt;/strong&gt; agent call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.trace&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;

&lt;span class="n"&gt;tracer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_tracer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;agent_execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_as_current_span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent.&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;request.type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;request.value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency_ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_cost&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In production, I can see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which agent is slow (latency heatmap)&lt;/li&gt;
&lt;li&gt;Which agent is expensive (cost per call)&lt;/li&gt;
&lt;li&gt;Which agent fails most (error rate by agent)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This data drove our optimization decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Numbers
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Before multi-agent:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;P99 latency: 5 seconds (single agent doing everything)&lt;/li&gt;
&lt;li&gt;Cost per request: $0.032&lt;/li&gt;
&lt;li&gt;Error rate: 2.1%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;After multi-agent + patterns above:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;P99 latency: 1.2 seconds (76% reduction)&lt;/li&gt;
&lt;li&gt;Cost per request: $0.018 (about 44% savings)&lt;/li&gt;
&lt;li&gt;Error rate: 0.3% (better due to fallbacks)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scale:&lt;/strong&gt; 40M requests/month × $0.014 savings = &lt;strong&gt;$560K/month saved&lt;/strong&gt;.&lt;/p&gt;
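
&lt;p&gt;The headline figures can be recomputed from the before and after numbers listed above. This is only a sanity-check sketch; &lt;code&gt;pct_reduction&lt;/code&gt; is a helper name introduced for the example.&lt;/p&gt;

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage drop from the before value to the after value."""
    return (before - after) / before * 100


latency_drop = pct_reduction(5.0, 1.2)        # P99 latency, seconds
cost_drop = pct_reduction(0.032, 0.018)       # cost per request, USD
monthly_savings = 40_000_000 * (0.032 - 0.018)

print(f"latency: {latency_drop:.0f}% lower")
print(f"cost per request: {cost_drop:.2f}% lower")
print(f"monthly savings: ${monthly_savings:,.0f}")
```

&lt;p&gt;Keeping a check like this next to the dashboard numbers makes it easy to spot when a quoted percentage and the underlying figures drift apart.&lt;/p&gt;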

&lt;h2&gt;
  
  
  The Trap: Premature Multi-Agent
&lt;/h2&gt;

&lt;p&gt;Don't build this until you need it. Signals you need multi-agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single agent request handling time &amp;gt; 1 second&lt;/li&gt;
&lt;li&gt;One request needs &amp;gt;3 different LLM calls&lt;/li&gt;
&lt;li&gt;Cost per request &amp;gt; $0.01&lt;/li&gt;
&lt;li&gt;Error rate &amp;gt; 0.5%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before that? Single agent is simpler and faster to build.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start with one agent, add observability immediately&lt;/strong&gt; — then you'll see where multi-agent helps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost budgets from day 1&lt;/strong&gt; — not an afterthought&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test with mocks first&lt;/strong&gt; — multi-agent systems are hard to test with real LLM calls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document processor agreements&lt;/strong&gt; — especially important if agents share data&lt;/li&gt;
&lt;/ol&gt;
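
&lt;p&gt;For the mocks-first point, a minimal sketch using the standard library's &lt;code&gt;unittest.mock.AsyncMock&lt;/code&gt;. The &lt;code&gt;run_specialists&lt;/code&gt; helper is a stand-in for the supervisor's fan-out and is an assumption of this example, as are the agent names and payloads.&lt;/p&gt;

```python
import asyncio
from unittest.mock import AsyncMock


async def run_specialists(agents: list, request: dict) -> list:
    """Stand-in for the supervisor's fan-out: call every agent concurrently."""
    return await asyncio.gather(*(agent.process(request) for agent in agents))


def test_fan_out_calls_every_agent_once() -> None:
    # Mocked agents return canned results, so no LLM call is made.
    analyst = AsyncMock()
    analyst.process.return_value = {"balance": 1200.0}
    risk = AsyncMock()
    risk.process.return_value = {"fraud_score": 0.02}

    request = {"type": "advice", "value": 500}
    results = asyncio.run(run_specialists([analyst, risk], request))

    assert results == [{"balance": 1200.0}, {"fraud_score": 0.02}]
    analyst.process.assert_awaited_once_with(request)
    risk.process.assert_awaited_once_with(request)


test_fan_out_calls_every_agent_once()
```

&lt;p&gt;Tests like this pin down the orchestration logic (who gets called, with what, in what order the results come back) before any real model is involved.&lt;/p&gt;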

&lt;h2&gt;
  
  
  The Hardest Part
&lt;/h2&gt;

&lt;p&gt;Not the architecture. The hardest part is &lt;strong&gt;testing&lt;/strong&gt;. You can't mock everything. You need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unit tests (mock all LLM calls)&lt;/li&gt;
&lt;li&gt;Integration tests (real agents, fake data)&lt;/li&gt;
&lt;li&gt;Load tests (does it handle 30K RPS?)&lt;/li&gt;
&lt;li&gt;Cost audits (are we actually saving money?)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;Multi-agent systems aren't magic. They're &lt;strong&gt;cost-aware, observable, fault-tolerant orchestration&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you're building agents:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start with one&lt;/li&gt;
&lt;li&gt;Add observability (OpenTelemetry)&lt;/li&gt;
&lt;li&gt;Scale to multiple agents only when you hit limits&lt;/li&gt;
&lt;li&gt;Enforce budgets from the start&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Questions?&lt;/strong&gt; I detail implementation patterns for compliance-driven multi-agent systems (GDPR, PSD2) on my &lt;a href="https://pratikdhanave.github.io/articles/27-multi-agent-systems/" rel="noopener noreferrer"&gt;portfolio&lt;/a&gt;.&lt;/p&gt;





</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>systemdesign</category>
    </item>
  </channel>
</rss>
