DEV Community: Sayed Ali Alkamel

Open Knowledge Format (OKF): A Technical Leader's Adoption Guide

Sayed Ali Alkamel — Fri, 24 Jul 2026 08:18:50 +0000

Short version: Google Cloud published the Open Knowledge Format (OKF) on June 12, 2026. It is a directory of markdown files with YAML frontmatter that any AI agent can read without an SDK, an account, or an integration. If your teams are shipping agents against internal data, OKF is a cheap way to stop each team rebuilding the same context layer.

What is the Open Knowledge Format?

OKF is an open specification for representing the metadata, context, and curated knowledge that AI systems need, published as version 0.1 by the Google Cloud Data Cloud team (Google Cloud blog). A bundle is a directory of markdown files. Each file is one concept: a table, a metric, an API, a runbook. The file path is the concept's identity, so tables/orders.md has the concept ID tables/orders (OKF v0.1 spec).

That is the whole idea. No runtime, no registry, no required tooling. Google calls v0.1 a starting point rather than a finished standard, and ships it under Apache 2.0.

Hold onto one boundary: OKF is a format, not a platform and not a search signal. It is loaded by your agents, not crawled from a public URL.

What problem does it actually solve?

The context an agent needs is mostly internal. What a metric means in your business, which table joins to which, why an API was deprecated last quarter. Google's framing is that these facts sit in metadata catalogs with their own APIs, in wikis and shared drives, in code comments, and in the heads of a few senior engineers.

So every new agent assembles the same answer from incompatible surfaces, and every catalog vendor models the same objects again. That shows up on your budget as duplicated integration work, not as a missing product, which is why the answer here is a format rather than another service.

OKF or your existing data catalog?

This is not a replacement decision. The spec lists prescribing storage, serving, or query infrastructure as an explicit non-goal, and it does not subsume domain schemas such as Avro, Protobuf, or OpenAPI. It references them instead.

Rule of thumb: if the question is where knowledge is governed, that stays a catalog decision. If the question is how knowledge travels to an agent that your catalog vendor did not build, that is OKF.

How does OKF actually work?

Frontmatter carries the few fields you want to query or filter on. The body carries what humans and models actually read.

```markdown
---
type: BigQuery Table
title: Orders
description: One row per completed customer order.
resource: https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders
tags: [sales, revenue]
timestamp: 2026-05-28T14:30:00Z
---

# Schema

| Column        | Type    | Description                              |
|---------------|---------|------------------------------------------|
| `order_id`    | STRING  | Globally unique order identifier.        |
| `customer_id` | STRING  | FK to [customers](/tables/customers.md). |
```

Exactly one field is required: type. Everything else, including title, description, resource, tags, and timestamp, is recommended but optional. Concepts link to each other with ordinary markdown links, which makes the directory a graph rather than a tree. Two filenames are reserved: index.md for progressive disclosure and log.md for change history.
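Because identity comes from the path and relationships come from plain markdown links, a consumer can rebuild the concept graph with nothing but the standard library. The sketch below is a minimal illustration, not part of any OKF tooling: the function names are hypothetical, it only understands inline [text](target) links, and it assumes a link target is either rooted at the bundle (like /tables/customers.md above) or relative to the linking file.

```python
import posixpath
import re
from typing import Dict, List

# Inline markdown links only: [text](target). Reference-style links are not handled.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")

def concept_id(path: str) -> str:
    """Map a bundle path to its concept ID: tables/orders.md -> tables/orders."""
    path = path.lstrip("/")
    return path[:-3] if path.endswith(".md") else path

def link_targets(source_path: str, body: str) -> List[str]:
    """Concept IDs linked from one document, resolving relative links."""
    targets = []
    for target in LINK_RE.findall(body):
        target = target.split("#", 1)[0]  # drop in-page anchors
        if "://" in target or not target.endswith(".md"):
            continue  # external URLs and non-concept links are not graph edges
        if target.startswith("/"):
            resolved = target  # already anchored at the bundle root
        else:
            resolved = posixpath.join(posixpath.dirname(source_path), target)
        targets.append(concept_id(posixpath.normpath(resolved)))
    return targets

def build_graph(docs: Dict[str, str]) -> Dict[str, List[str]]:
    """docs maps bundle-relative paths to markdown text; returns an adjacency list."""
    return {concept_id(path): link_targets(path, body) for path, body in docs.items()}

docs = {
    "tables/orders.md": "FK to [customers](/tables/customers.md).",
    "tables/customers.md": "Each customer has many [orders](orders.md).",
}
graph = build_graph(docs)
```

Links may point in both directions, and cycles are allowed, which is why the result is a graph rather than a tree.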

Conformance is three rules. Every non-reserved .md file has a parseable frontmatter block, every block has a non-empty type, and any index.md or log.md follows its defined structure. Consumers are told not to reject a bundle over unknown types, unknown extra keys, or broken links.
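The first two conformance rules are mechanical enough to check in a few lines. This sketch is hypothetical and deliberately lenient, in the spirit of the spec's consumer guidance: it uses a flat key: value reader instead of a real YAML parser, ignores unknown keys, and does not model the index.md and log.md structures, so rule three is out of scope.

```python
import posixpath
from typing import Dict, List, Optional

RESERVED = {"index.md", "log.md"}  # rule 3 covers these; not modeled in this sketch

def parse_frontmatter(text: str) -> Optional[Dict[str, str]]:
    """Return top-level `key: value` pairs, or None if there is no terminated block.
    Simplified: flat keys only, not a full YAML parser."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if ":" in line and not line[:1].isspace():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip("\"'")
    return None  # opening delimiter without a closing one

def check_bundle(docs: Dict[str, str]) -> List[str]:
    """Report violations of rules 1 and 2; an empty list means both pass.
    Unknown keys are ignored rather than rejected."""
    problems = []
    for path, text in sorted(docs.items()):
        if not path.endswith(".md") or posixpath.basename(path) in RESERVED:
            continue
        fields = parse_frontmatter(text)
        if fields is None:
            problems.append(f"{path}: missing or unterminated frontmatter")
        elif not fields.get("type"):
            problems.append(f"{path}: frontmatter has no non-empty 'type'")
    return problems
```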

How do you roll it out?

Start where your agents are already wrong. Metric definitions and join paths are the usual first win, and they are small enough to finish.

Google shipped a reference enrichment agent that walks a BigQuery dataset, writes one concept document per table and view, then runs a second pass that crawls documentation URLs you seed it with and adds citations and join paths. It also shipped a self-contained HTML visualizer and three browsable sample bundles. Treat all of it as a proof of concept, because nothing in the format requires a particular agent framework or model.

The step that matters most organizationally comes after generation. Put the bundle in git, give each directory a named owner, and review changes through pull requests. Knowledge curation becomes a normal engineering activity with diffs, blame, and an approver, which is the governance mechanism most wikis never had.

Three things to know before you start

Links are untyped. A link asserts that a relationship exists, but the kind of relationship is carried by the surrounding prose, not by the link. If you need depends_on versus supersedes, you will add your own frontmatter keys. The spec permits that and tells consumers to preserve keys they do not recognize.
A writable bundle is an attack surface. If an agent enriches concepts from crawled pages or ticket text, untrusted content lands in a file that other agents read as authority. That is indirect prompt injection, the top entry in OWASP's LLM risk list. Keep agent writes behind human review and scope what the enrichment pass may fetch. Google's reference agent enforces a hard page cap and a same-host filter for exactly this reason.
Adoption is the open question, not the format. The value of any exchange format is how many parties speak it, and at launch the reference producer and consumer were both Google's. Minor versions are promised to be backward compatible; major versions may break. Your downside is bounded: if OKF stalls, you are left with well-organized, version-controlled documentation.
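The mitigations in the second point, a same-host filter and a hard page cap, can be sketched as a small guard the enrichment pass consults before every fetch. This is not the reference agent's code: the class name and the default cap of 20 pages are assumptions, and a guard like this narrows what gets fetched without replacing human review of what gets written.

```python
from urllib.parse import urlsplit

class FetchScope:
    """Hypothetical guard: allow only http(s) pages on the seed host, up to max_pages."""

    def __init__(self, seed_url: str, max_pages: int = 20):
        self.host = urlsplit(seed_url).hostname
        self.max_pages = max_pages
        self.fetched = 0

    def allow(self, url: str) -> bool:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            return False  # refuse file:, data:, and similar schemes
        if parts.hostname != self.host:
            return False  # same-host filter
        if self.fetched >= self.max_pages:
            return False  # hard page cap
        self.fetched += 1
        return True
```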

FAQ

Does OKF improve SEO or AI search visibility?
No. It is an internal bundle your own agents load, not a public file published at a well-known URL. Schema.org markup is still what makes your web pages legible to search engines.

Does OKF replace MCP or RAG?
No. MCP is how an agent reaches tools and live systems. OKF is the durable context it reads before acting, and an MCP server can serve a bundle as a resource.

What is the license?
Apache 2.0, in the GoogleCloudPlatform/knowledge-catalog repository. Google also updated Knowledge Catalog, formerly Dataplex, to ingest OKF and serve it to agents.

What does adopting it cost?
No license and no SDK. The real cost is curation time, plus answering who owns and approves each concept document.

Is v0.1 safe to build on?
It is a draft. Bundles may declare okf_version: "0.1" in the root index.md frontmatter, and consumers that do not understand a declared version are told to attempt best-effort consumption rather than refuse the bundle.
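Best-effort consumption in practice means a consumer warns on an unfamiliar okf_version instead of failing. A minimal sketch, assuming the root index.md frontmatter has already been parsed into a dict and that this hypothetical consumer was written against 0.1 only:

```python
import warnings
from typing import Optional

KNOWN_VERSIONS = {"0.1"}  # versions this hypothetical consumer was written against

def declared_version(root_frontmatter: dict) -> Optional[str]:
    """Read okf_version from the root index.md frontmatter; warn, never refuse."""
    raw = root_frontmatter.get("okf_version")
    if raw is None:
        return None  # declaring a version is optional
    version = str(raw).strip().strip("\"'")
    if version not in KNOWN_VERSIONS:
        warnings.warn(f"OKF version {version!r} not recognized; attempting best-effort consumption")
    return version
```

Quoting the value, as in okf_version: "0.1", also keeps a YAML parser from reading it as a float.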

Sources

Introducing the Open Knowledge Format, Google Cloud blog: https://cloud.google.com/blog/products/data-analytics/how-the-open-knowledge-format-can-improve-data-sharing/
OKF v0.1 specification: https://github.com/GoogleCloudPlatform/knowledge-catalog/blob/main/okf/SPEC.md
OKF reference agent, visualizer, and sample bundles: https://github.com/GoogleCloudPlatform/knowledge-catalog/tree/main/okf
OWASP LLM01, Prompt Injection: https://genai.owasp.org/llmrisk/llm01-prompt-injection/
Karpathy, LLM wiki pattern: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f

Code Knowledge Graphs: How to Evaluate Graphify, GitNexus, and CodeGraph

Sayed Ali Alkamel — Fri, 24 Jul 2026 07:51:35 +0000

Short version: Graphify, GitNexus, and CodeGraph all parse your repo with tree-sitter and serve a graph to an AI coding agent over MCP. They occupy the same square of the market, so raw capability will not separate them. Three things will: the license, how long the graph stays out of date, and how much of the call graph the tool can actually resolve.

What is a code knowledge graph, and what is it not?

A code knowledge graph is a pre-built index of every symbol in a repo and every edge between them: calls, imports, extends, implements. An agent queries that index instead of running grep and read until it reconstructs your architecture. In CodeGraph's benchmark across seven open source repos, the agent's file reads dropped to roughly zero on all seven (README).

What it is not is ground truth. Every edge was either read from source or guessed by a heuristic, and some edges never get built at all. Treat the output as a lower bound on impact, never a guarantee. That reframe is the difference between using a graph to explore a system and using it to approve a change.

Where do these tools sit in the market?

Two axes matter: how deep the analysis goes, and where it runs.

Text match. grep. Free, exhaustive, no structure.

Symbol index. Serena runs real language servers and answers at the symbol level on your machine, and it also edits by symbol path rather than line number. Sourcegraph indexes with SCIP and returns every reference rather than a ranked list, at multi-repo scale, on a server (Sourcegraph).

Call graph. All three tools in this article, plus FalkorDB's code-graph, which needs a graph database running.

Data flow. Joern builds a code property graph (syntax tree plus control flow plus program dependence) for questions like whether user input reaches a query unsanitized. GitNexus reaches into this tier with an opt-in --pdg mode, currently TypeScript and JavaScript only.
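To make the data-flow tier concrete, here is a toy version of the question it answers: is there a path from user input to a query that never passes through a sanitizer? The graph is hand-built and the node names are invented; Joern derives such graphs from source and has its own query language, which this sketch does not reproduce.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set

def unsanitized_path(flows: Dict[str, Iterable[str]], sources: Set[str],
                     sinks: Set[str], sanitizers: Set[str]) -> Optional[List[str]]:
    """Breadth-first search from any source to any sink that never enters a sanitizer.
    flows maps a node to the nodes its value flows into. Returns one path or None."""
    queue = deque([[s] for s in sources])
    seen = set(sources)
    while queue:
        path = queue.popleft()
        if path[-1] in sinks:
            return path
        for nxt in flows.get(path[-1], ()):
            if nxt in seen or nxt in sanitizers:
                continue  # a sanitized value no longer counts as tainted
            seen.add(nxt)
            queue.append(path + [nxt])
    return None

flows = {
    "http_param": ["user_id"],
    "user_id": ["escape", "raw_query"],
    "escape": ["safe_query"],
    "safe_query": ["db_execute"],
    "raw_query": ["db_execute"],
}
leak = unsanitized_path(flows, {"http_param"}, {"db_execute"}, {"escape"})
```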

The three land in the same cell. Evaluate them on operations and risk, not on feature lists.

How stale does the graph get?

This is the failure mode nobody demos. Between your edit and the next index, the tool answers from a graph that describes code you no longer have.

CodeGraph watches the filesystem with native OS events and re-indexes after a 2 second debounce (tunable via CODEGRAPH_WATCH_DEBOUNCE_MS). Inside that window its responses prepend a banner naming the pending file and telling the agent to read it directly. It also reconciles on reconnect, so edits made while no server was running get absorbed on the next query.
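The debounce-and-banner behavior can be modeled in a few lines. This is a hypothetical sketch, not CodeGraph's implementation: it reads the same CODEGRAPH_WATCH_DEBOUNCE_MS variable name with a 2000 ms default, treats every edit as restarting a single quiet period, and takes explicit timestamps so the behavior is easy to reason about.

```python
import os

DEBOUNCE_MS = int(os.environ.get("CODEGRAPH_WATCH_DEBOUNCE_MS", "2000"))

class DebouncedIndex:
    """Hypothetical model: re-index once edits go quiet; flag stale answers until then."""

    def __init__(self, debounce_ms: int = DEBOUNCE_MS):
        self.debounce_ms = debounce_ms
        self.pending = set()  # files edited since the last index run
        self.last_edit_ms = None
        self.index_runs = 0

    def on_edit(self, path: str, now_ms: int) -> None:
        self.pending.add(path)
        self.last_edit_ms = now_ms  # every edit restarts the quiet period

    def tick(self, now_ms: int) -> None:
        if self.pending and now_ms - self.last_edit_ms >= self.debounce_ms:
            self.pending.clear()  # stand-in for re-parsing the changed files
            self.index_runs += 1

    def answer(self, result: str, now_ms: int) -> str:
        self.tick(now_ms)
        if not self.pending:
            return result
        files = ", ".join(sorted(self.pending))
        return f"[index pending for {files}; read these files directly]\n{result}"
```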

Graphify ties refresh to commits. Between commits, the graph drifts from your working tree.

GitNexus needs you to re-run gitnexus analyze. Incremental indexing is still listed under "actively building", though a post-commit hook can prompt a reindex in Claude Code and Codex.

If your team points agents at a repo they are actively editing, measure this first. An index that lags by a working session is worse than no index, because the agent stops hedging.

Can you trust the blast radius?

Every one of these tools reports impact analysis. None of them can report the edges it failed to build, so read the provenance model before you trust a number.

Graphify tags each edge EXTRACTED, INFERRED, or AMBIGUOUS, so you can tell what was read from source and what was reasoned (graphify.com). GitNexus attaches a confidence score and, when several symbols share a name, returns a ranked candidate list instead of guessing. CodeGraph publishes measured cross-file coverage per language against a named benchmark repo and does not hide the weak results: 95.8% on TypeScript, 86.7% on Rust, 73.8% on Liquid.

The residual is a static analysis ceiling, not a bug to file. Dynamic dispatch, reflection, dependency injection containers, and framework conventions resolve at runtime, so no AST parser will ever see them.
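Provenance tags are what let you choose how optimistic a blast-radius query is. The sketch below walks callers in reverse over an invented edge list tagged with Graphify-style labels; the data and function are hypothetical. Restricting to EXTRACTED edges gives the most defensible answer, and even the widest answer is still a lower bound, because edges that were never built cannot show up.

```python
from collections import deque

# Invented (caller, callee, provenance) edges for illustration.
EDGES = [
    ("checkout", "charge_card", "EXTRACTED"),
    ("refund", "charge_card", "INFERRED"),
    ("admin_panel", "refund", "EXTRACTED"),
    ("plugin_hook", "charge_card", "AMBIGUOUS"),
]

def blast_radius(edges, changed, allowed=("EXTRACTED",)):
    """Every symbol that transitively calls `changed`, following only edges
    whose provenance tag is in `allowed`."""
    callers = {}
    for caller, callee, tag in edges:
        if tag in allowed:
            callers.setdefault(callee, []).append(caller)
    seen, queue = set(), deque([changed])
    while queue:
        for caller in callers.get(queue.popleft(), []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen
```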

Which gate eliminates a tool first?

| | Graphify | GitNexus | CodeGraph |
|---|---|---|---|
| License | MIT | PolyForm Noncommercial | MIT |
| Telemetry | none | none | anonymous, on by default |
| Languages | 36 | 14 | 30+ |
| Query language | none | Cypher | none |
| Human-readable output | `graph.html` | browser UI | none |
| Non-code input | docs, PDFs, SQL, Terraform | no | no |

Run the license gate first. GitNexus ships under PolyForm Noncommercial, so commercial use needs a license from Akon Labs (README). That is a procurement conversation, not an install command, and it should happen before anyone builds a proof of concept on it. CodeGraph's telemetry is anonymous and documented, and it is on until you run codegraph telemetry off.

What to test on your own repo

Edit a file, then immediately ask about it. This measures the staleness window, which is the only number that reflects daily use.
Pick a symbol whose blast radius you already know. Compare what the tool returns against what you know breaks. The gap is your resolution ceiling.
Check your real language mix, including infrastructure code. Terraform, SQL, and legacy modules are where coverage varies most.
Decide who reads the output. If only agents do, visualization is dead weight. If a lead needs to review architecture, its absence is a blocker.
Time a cold index on your largest repo, then a warm one. Adoption dies on the first number, not the second.
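For the second test, once you have exported the tool's reported impact set (how you export it is tool-specific and not shown), the comparison itself is simple set arithmetic. A hypothetical helper:

```python
from typing import Dict, Set

def resolution_gap(known: Set[str], reported: Set[str]) -> Dict[str, object]:
    """Compare a tool's reported blast radius with one you already know.
    Recall below 1.0 means edges the tool never resolved."""
    return {
        "recall": len(known & reported) / len(known) if known else 1.0,
        "missed": sorted(known - reported),       # your resolution ceiling
        "unexpected": sorted(reported - known),   # worth checking by hand
    }
```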

Do not adopt any of these on a small repo. On a few hundred files an agent's own search is fine and the index is overhead. The category earns its keep on large, tangled, polyglot codebases, which is exactly where it is hardest to evaluate cheaply.

FAQ

Is a code knowledge graph better than vector RAG for code?
For structural questions, yes. Vector search returns chunks ranked by similarity and hopes the model reconnects them, while a graph returns an explicit path with file and line citations. That makes the answer auditable rather than merely plausible.

Does my code leave my machine?
Not for source parsing. All three parse locally with tree-sitter and make zero model calls on your code. Graphify's optional pass over docs and PDFs uses a model API you configure, which can point at a local Ollama backend.

Can I gate a pull request on graph impact analysis?
Only as a signal, not as a merge condition. The graph cannot enumerate the edges it failed to resolve, so a clean impact report is weak evidence of safety.

Which one supports the most languages?
Graphify lists 36 grammars, CodeGraph more than 30 in its language table, GitNexus 14. CodeGraph reaches the unusual ones: COBOL, Solidity, Terraform, Nix, Erlang, VB.NET, Delphi.

Do I still need an LSP tool like Serena?
Often yes. Graph tools find the blast radius, Serena makes the edit at symbol level. They solve different halves of the problem and run side by side.

Sources

Graphify: https://graphify.com/
Graphify repo: https://github.com/Graphify-Labs/graphify
GitNexus repo: https://github.com/abhigyanpatwari/GitNexus
CodeGraph repo: https://github.com/colbymchenry/codegraph
PolyForm Noncommercial license: https://polyformproject.org/licenses/noncommercial/1.0.0/
Serena: https://github.com/oraios/serena
Sourcegraph context comparison: https://sourcegraph.com/resources/context-compare
Joern code property graph: https://cpg.joern.io/

Star counts, versions, and benchmark figures in this category move weekly. Everything above was checked on 24 July 2026.

Flutter Material and Cupertino Leave the SDK: What Changes for Your App

Sayed Ali Alkamel — Sun, 19 Jul 2026 07:54:46 +0000

Short version: Flutter is moving Material and Cupertino out of the SDK and onto pub.dev as material_ui and cupertino_ui. Contributions to both libraries inside flutter/flutter froze on April 7, 2026, and both package names are already reserved by the flutter.dev publisher. Nothing in your app breaks today, but every import 'package:flutter/material.dart' in the ecosystem eventually becomes a package dependency.

What is the Material and Cupertino decoupling?

Flutter has always shipped its two design systems inside the framework. You import package:flutter/material.dart, you get Material's widget set, and you declare nothing. The decoupling project, tracked in flutter/flutter#101479, ends that arrangement and republishes both libraries as first-party packages on pub.dev.

The first milestone landed on April 7, 2026, when all contributions to Material and Cupertino inside flutter/flutter were frozen (Flutter blog). No further changes are allowed in the framework copies. Development resumes in flutter/packages once the new packages ship.

One correction worth making, because a lot of write-ups still get it wrong: the packages are named material_ui and cupertino_ui, not material and cupertino. Both already exist on pub.dev at version 0.0.1 under the verified flutter.dev publisher, currently unlisted placeholders (pub.dev). Even as placeholders, material_ui has collected 152 likes and 4.28k downloads.

Why move Material and Cupertino out of the SDK?

Release cadence is the headline reason. Flutter plans four stable releases for 2026 (Flutter blog), so a one-line design fix waits roughly a quarter to reach anyone. Contributors in the tracking issue documented exactly that, including a CupertinoAlertDialog divider rendering the wrong color because a value failed to resolve. Community design packages like fluent_ui and macos_ui ship on their own schedule and update far more often.

The architectural reason matters more. Material and Cupertino were entangled with each other and with the core widgets layer. An audit posted in the issue found 21 of Material's 179 source files importing Cupertino, while Cupertino imported Material zero times. Untangling that meant pushing shared behavior down into widgets, which is why the team has spent years relocating pieces like ToggleableStateMixin and RawMenuAnchor out of Material.

The payoff is a dependency inversion. Core widgets stop knowing about design systems, and design systems depend on widgets instead. That is what puts fluent_ui, yaru, and your own in-house design system on the same footing as Material.

[Figure: Two-lane diagram comparing how a design fix reaches your app today, waiting for the quarterly Flutter SDK release, versus after decoupling, when it ships as a material_ui release on pub.dev.]

What actually changes in your pubspec?

Two dependency lines and an import. That is the whole mechanical change for most apps:

```yaml
dependencies:
  flutter:
    sdk: flutter
  material_ui: ^1.0.0
  cupertino_ui: ^1.0.0
```

```dart
// before
import 'package:flutter/material.dart';

// after
import 'package:material_ui/material_ui.dart';
```

The part that is not mechanical is what that import was hiding. package:flutter/material.dart re-exports the entire widgets library, so plenty of files import Material only to reach Column, Text, or StatelessWidget. Those files never needed a design system. They needed package:flutter/widgets.dart.

When does this hit your app?

Not yet, and not all at once. Flutter 3.44 shipped on May 18, 2026 with Material and Cupertino still inside the SDK. The in-framework copies are scheduled for deprecation in the stable release after 3.44, and deletion some time after that. The team has said it does not anticipate removing the old code within about a year (flutter/flutter#184093).

Four steps, and only the first one has happened:

[Figure: Four-step timeline of the Flutter Material and Cupertino decoupling: code freeze in April 2026, material_ui and cupertino_ui 1.0 published, SDK copies deprecated, SDK copies deleted.]

Three things to do before the packages land

Audit your material.dart imports. Any file that imports Material but only touches widgets primitives should move to package:flutter/widgets.dart now. This compiles on current stable, needs no new dependency, and removes those files from your migration entirely.
Package authors, plan for two support windows. If you publish something that re-exports Material or Cupertino widgets, you will need to work against both the SDK libraries and the new packages during the deprecation window. Decide early whether that is a major version bump or a compatibility shim, because your users will hit the deprecation warnings before you do.
Split your upgrade policy in two. The whole point is that you can hold material_ui at a version your design team signed off on while still taking SDK upgrades for performance and security fixes. Start treating the design system version and the Flutter version as two separate decisions.
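The import audit in the first item can be roughed out with a script. This is a heuristic sketch, not a migration tool: the set of Material-only names is a small hypothetical sample, and the identifier scan also matches comments and strings. Treat its output as a candidate list, then switch each file's import to package:flutter/widgets.dart and confirm with flutter analyze.

```python
import re
from pathlib import Path
from typing import List

MATERIAL_IMPORT = re.compile(r"""import\s+['"]package:flutter/material\.dart['"]""")

# Hypothetical, incomplete sample of names that only material.dart provides.
MATERIAL_ONLY = {
    "MaterialApp", "Scaffold", "AppBar", "ElevatedButton", "TextButton",
    "FloatingActionButton", "Theme", "ThemeData", "Colors", "Icons",
    "Card", "ListTile", "InkWell", "Material", "TextField", "SnackBar",
}

def widgets_only_candidate(source: str) -> bool:
    """True if a file imports material.dart but uses none of MATERIAL_ONLY."""
    if not MATERIAL_IMPORT.search(source):
        return False
    identifiers = set(re.findall(r"\b[A-Z][A-Za-z0-9_]*\b", source))
    return not (identifiers & MATERIAL_ONLY)

def audit(root: str) -> List[str]:
    """List .dart files under root that may only need package:flutter/widgets.dart."""
    return sorted(str(p) for p in Path(root).rglob("*.dart")
                  if widgets_only_candidate(p.read_text(encoding="utf-8")))
```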

FAQ

Will my app break when this ships?
No. The SDK copies stay through a deprecation window, and the Flutter team does not expect to delete them within about a year of the April 2026 freeze. You migrate on your own schedule.

What are the package names?
material_ui and cupertino_ui, both published by flutter.dev. Articles referring to material and cupertino predate the naming decision.

Is Google dropping Material design in Flutter?
No. Material stays first-party and Google-maintained. It moves onto its own release train so fixes and new Material specs land without waiting for a quarterly SDK cut.

Can I use material_ui today?
Not for real work. The published 0.0.1 is an unlisted placeholder, and its API reference still says the library is coming.

Does this shrink my app?
Probably not much. Tree shaking already drops unused widgets, so the real win here is architectural, not binary size.

Sources

Move the material and cupertino packages outside of Flutter (tracking issue): https://github.com/flutter/flutter/issues/101479
Flutter's Material and Cupertino code freeze: https://blog.flutter.dev/flutters-material-and-cupertino-code-freeze-d32d94c59c38
Material and Cupertino are now frozen (issue #184093): https://github.com/flutter/flutter/issues/184093
material_ui on pub.dev: https://pub.dev/packages/material_ui/versions
cupertino_ui on pub.dev: https://pub.dev/packages/cupertino_ui/versions
What's new in Flutter 3.41 (2026 release cadence): https://blog.flutter.dev/whats-new-in-flutter-3-41-302ec140e632
Flutter release notes: https://docs.flutter.dev/release/release-notes

Agent Design Patterns: Google and Anthropic, Side by Side

Sayed Ali Alkamel — Fri, 17 Jul 2026 19:25:36 +0000

Short version: Agent design patterns are reusable ways to structure how a language model plans, delegates, and checks its own work. Anthropic and Google both published official guides, and they mostly agree. The real skill is not memorizing patterns; it is choosing the simplest one that solves your task, then adding structure only when it pays off.

What are agent design patterns?

An agent design pattern is a common architectural approach for organizing an agentic system: how the model connects to tools, and how one or more agents are orchestrated to finish a task (Google Cloud). Anthropic frames the same idea as "simple, composable patterns" that beat complex frameworks in practice (Anthropic).

The single most useful distinction comes from Anthropic. Workflows are systems where the control flow is fixed in code. Agents are systems where the model decides the control flow at runtime. That one line predicts a pattern's cost, latency, and how hard it is to debug.
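The workflow/agent distinction shows up directly in code. In this sketch, llm is any function from prompt to text (a stand-in for a real model client), and the tool:argument reply format is a hypothetical convention; real APIs return structured tool calls.

```python
from typing import Callable, Dict

LLM = Callable[[str], str]

def workflow(ticket: str, llm: LLM) -> str:
    """Workflow: control flow is fixed in code. Always two calls, always in this order."""
    summary = llm(f"Summarize this ticket: {ticket}")
    return llm(f"Draft a reply to: {summary}")

def agent(ticket: str, llm: LLM, tools: Dict[str, Callable[[str], str]],
          max_steps: int = 5) -> str:
    """Agent: the model picks each next action; code supplies tools and a step budget."""
    context = ticket
    for _ in range(max_steps):
        # Hypothetical text protocol: "tool:argument" or "final:answer".
        action, _, arg = llm(f"Decide the next action.\n{context}").partition(":")
        if action == "final":
            return arg.strip()
        tool = tools.get(action)
        observation = tool(arg) if tool else f"unknown tool {action!r}"
        context += f"\n{action}({arg}) -> {observation}"
    return "stopped: step budget exhausted"
```

The workflow's cost is known before it runs; the agent's depends on how many steps the model chooses, which is why it needs the explicit budget.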

How the two official guides line up

Google's guidance lives in two places: a Developer Blog post on eight multi-agent patterns in the Agent Development Kit (ADK) (Google Developers), and a Cloud Architecture Center document that adds trade-offs and a decision table (Google Cloud). Anthropic's guidance is a single engineering post, "Building Effective Agents" (Anthropic). Here is how the names map.

| What you want to do | Anthropic | Google |
|---|---|---|
| Start from one capable unit | Augmented LLM | Single-agent system |
| Run fixed steps in order | Prompt chaining | Sequential pipeline |
| Send each input to a specialist | Routing | Coordinator / dispatcher |
| Do independent work at once | Parallelization (sectioning, voting) | Parallel fan-out and gather |
| Break a task down at runtime | Orchestrator-workers | Hierarchical task decomposition |
| Critique and improve output | Evaluator-optimizer | Generator-critic, iterative refinement |
| Let the model own the loop | Autonomous agents | ReAct |
| Pause for a person | Human oversight, checkpoints | Human-in-the-loop |
| Let peers collaborate freely | (not named) | Swarm |
| Combine several patterns | Combining and customizing | Composite / custom logic |

The overlap is not a coincidence. Both teams watched the same production systems and named the same shapes.

The one rule both guides repeat

Start simple. Anthropic says to find the simplest solution possible and add complexity only when it demonstrably improves outcomes. Google says to begin with a single agent, refine its prompt and tools, and reach for multi-agent designs only when one agent starts to struggle. The reason is money and reliability: multi-agent systems can use many more model calls, and every extra model call adds latency, cost, and a new place for errors to compound.

A useful test before you add a pattern: can this task run as a single model call with good retrieval and a few examples? If yes, you do not need an agent at all (Google Cloud).

The patterns, one article each

Each link below is a short, illustrated guide with an animated diagram, when to use it, when not to, and the known failure modes.

The Augmented LLM: the foundation, one model with retrieval, tools, and memory.
Prompt Chaining: fixed steps, each feeding the next.
Routing: classify an input, then send it to a specialist.
Parallelization: run subtasks at the same time, then merge.
Orchestrator-Workers: decide the subtasks at runtime and delegate.
Evaluator-Optimizer: generate, critique, and refine in a loop.
Autonomous Agents: the ReAct loop, where the model owns control.
Human-in-the-Loop: a person authorizes high-stakes steps.
Swarm: peer agents that debate and converge with no central boss.
Composite Patterns: how real systems chain patterns together.

How to choose, in practice

Group the decision by the shape of your task (Google Cloud). If the steps are known and fixed, you are in workflow territory: sequential, parallel, or a refinement loop. If the model must plan and decide at runtime, you need dynamic orchestration: a single agent, a coordinator, hierarchical decomposition, or a swarm. If quality comes from cycles of feedback, use ReAct, a loop, or a generator-critic. If the task carries real risk, add a human-in-the-loop checkpoint regardless of the base pattern.

FAQ

Are Google and Anthropic's agent patterns different?
Mostly no. The names differ, but the shapes match: chaining equals a sequential pipeline, routing equals a coordinator, and so on. Google adds a few multi-agent patterns Anthropic does not name, such as swarm.

What is the difference between a workflow and an agent?
In a workflow the control flow is written in code, so it is predictable. In an agent the model decides the control flow at runtime, so it is flexible but costs more and is harder to debug (Anthropic).

Do I need a framework to use these patterns?
No. Anthropic recommends starting with direct API calls, since many patterns take only a few lines of code. Frameworks like Google ADK help once you standardize on multi-agent orchestration.

Which pattern should I start with?
A single augmented LLM. Add another pattern only when that one agent measurably falls short.

Sources

Anthropic, Building Effective Agents: https://www.anthropic.com/engineering/building-effective-agents
Google Developers Blog, Developer's guide to multi-agent patterns in ADK: https://developers.googleblog.com/developers-guide-to-multi-agent-patterns-in-adk/
Google Cloud Architecture Center, Choose a design pattern for your agentic AI system: https://docs.cloud.google.com/architecture/choose-design-pattern-agentic-ai-system
Google Agent Development Kit docs: https://google.github.io/adk-docs/

Composite Patterns: How Real Agent Systems Combine the Basics

Sayed Ali Alkamel — Fri, 17 Jul 2026 19:22:38 +0000

Short version: Real agent systems rarely use one pattern. They chain several: route the request, fan out a search, then run a critic before replying. Google calls the mix a composite pattern, and gives you a custom logic pattern when even that is not enough. Anthropic frames the same idea as combining and customizing the building blocks.

What are composite patterns?

Composite patterns combine the basic patterns to build production-grade applications (Google Developers). Anthropic makes the same point about its building blocks: they are not prescriptive, they are common patterns you shape and combine to fit your use case (Anthropic).

Google's worked example is a customer support system: a coordinator routes the request, a technical issue triggers a parallel search of documentation and user history, and the final answer passes through a generator-critic loop to keep the tone consistent before it reaches the user (Google Developers).
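Google's support example can be sketched in plain Python to show how the three shapes compose. Every function below is a stub standing in for a model-backed agent, the keyword router and tone rule are invented for illustration, and this is not ADK code. Note the hard cap on critic rounds, which keeps the loop from running forever.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional

def route(request: str) -> str:
    """Coordinator stand-in: a keyword rule where a real system would ask a model."""
    return "technical" if "error" in request.lower() else "general"

def search_docs(request: str) -> str:
    return f"docs hits for {request!r}"

def search_history(request: str) -> str:
    return f"user history for {request!r}"

def generate(request: str, evidence: List[str], feedback: Optional[str]) -> str:
    draft = f"Based on {len(evidence)} sources: try restarting the sync."
    return f"Thanks for reaching out. {draft}" if feedback else draft

def critique(draft: str) -> Optional[str]:
    """Return feedback, or None when the tone check passes."""
    return None if draft.startswith("Thanks") else "Open with a courteous line."

def handle(request: str, max_rounds: int = 3) -> str:
    if route(request) != "technical":                 # 1. routing
        return "Routed to general support."
    with ThreadPoolExecutor() as pool:                # 2. parallel fan-out and gather
        evidence = list(pool.map(lambda f: f(request), [search_docs, search_history]))
    feedback = None
    for _ in range(max_rounds):                       # 3. generator-critic, capped
        draft = generate(request, evidence, feedback)
        feedback = critique(draft)
        if feedback is None:
            return draft
    return draft  # cap reached: return the last draft, or escalate to a person
```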

When one pattern is not enough: custom logic

Sometimes the mix needs real branching that no single pattern provides. Google's custom logic pattern gives you maximum flexibility by letting you implement orchestration in code, using constructs like conditional statements to create workflows with multiple branching paths (Google Cloud).

The example is a refund agent. A coordinator runs a parallel verifier that checks the purchaser and refund eligibility at once. It then calls a tool to decide eligibility. If the user is eligible, it routes to a refund processor. If not, it routes to a separate sequential flow for store credit. Whichever path runs, a final agent writes the answer (Google Cloud). That mix of a parallel check plus a conditional branch to two different downstream processes is the textbook case for custom logic.
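The refund flow maps naturally onto ordinary control flow. A hypothetical sketch in plain Python rather than ADK: the order data, the 30-day window, and every function name are invented, and each function stands in for an agent or tool.

```python
from concurrent.futures import ThreadPoolExecutor

ORDERS = {  # invented data
    "A1": {"buyer_verified": True, "days_since_purchase": 10},
    "B2": {"buyer_verified": True, "days_since_purchase": 90},
}

def verify_purchaser(order_id: str) -> bool:
    return ORDERS[order_id]["buyer_verified"]

def within_refund_window(order_id: str, window_days: int = 30) -> bool:
    return ORDERS[order_id]["days_since_purchase"] <= window_days

def is_eligible(purchaser_ok: bool, window_ok: bool) -> bool:
    """The eligibility 'tool': plain code, so the rule is auditable."""
    return purchaser_ok and window_ok

def process_refund(order_id: str) -> str:
    return f"refund issued for {order_id}"

def store_credit_flow(order_id: str) -> str:
    steps = ["check credit balance", "issue credit", "email voucher"]  # sequential steps
    return f"store credit for {order_id} via " + " -> ".join(steps)

def write_answer(outcome: str) -> str:
    return f"Update on your request: {outcome}."

def refund_coordinator(order_id: str) -> str:
    with ThreadPoolExecutor() as pool:  # parallel verifier: both checks at once
        purchaser = pool.submit(verify_purchaser, order_id)
        window = pool.submit(within_refund_window, order_id)
        eligible = is_eligible(purchaser.result(), window.result())
    # The conditional branch no single standard pattern expresses.
    outcome = process_refund(order_id) if eligible else store_credit_flow(order_id)
    return write_answer(outcome)  # whichever path ran, one final writer
```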

When to use composite and custom logic

Use a composite pattern when a single task naturally spans several shapes: a decision, then concurrent work, then a quality check. Most production systems land here, because real requests are not uniform.

Use the custom logic pattern specifically when you need fine-grained control or your workflow does not fit any standard pattern (Google Cloud). It is the right tool for complex, branching logic that mixes predefined rules with model reasoning.

When not to use it

Do not combine patterns before a single one has failed you. Google's own advice is to start simple, get a sequential chain working, debug it, and only then add complexity (Google Developers). A composite system built on day one is a debugging nightmare with no baseline to compare against.

And do not jump to custom logic when a supported pattern already fits. Hand-rolled orchestration is the most work to maintain, so use it only when the standard shapes genuinely cannot express your flow.

Known problems

The cost of custom logic is ownership. Google is explicit: this approach increases development and maintenance complexity, because you are responsible for designing, implementing, and debugging the entire orchestration flow, which is more error-prone than a predefined pattern supported by a tool like ADK (Google Cloud).

Composite systems also inherit every failure mode of their parts. A composite that includes a loop can loop forever, one that includes parallel work can hit race conditions, and one that routes can misroute. The more patterns you stack, the more places there are to fail, which is why the discipline of adding one at a time matters so much.

Three pro tips before you combine patterns

Google closes its guide with three that apply directly here (Google Developers):

Treat state as your whiteboard. Use descriptive keys when one agent writes output, so downstream agents know exactly what they are reading.
Write precise descriptions. In any routing step, a sub-agent's description is the documentation the model uses to decide. Be exact.
Start simple. Do not build a nested loop system on day one. Get a chain working, then add complexity in measured steps.

FAQ

What is a composite agent pattern?
A composite pattern combines basic patterns, such as routing, parallelization, and a critic loop, into one system. Most production agents are composites.

When should I use custom logic instead of a standard pattern?
When your workflow needs branching that no single pattern expresses, or when you need fine-grained control that mixes coded rules with model reasoning.

Why not just start with a composite system?
Because you lose the ability to debug. Start with one pattern, prove it, then add the next. A day-one composite has no baseline and hides its bugs.

Do composites cost more?
Usually, since they run more patterns and more model calls. They also inherit the failure modes of every pattern they include, so plan safeguards for each.

Sources

Google Developers Blog, Developer's guide to multi-agent patterns in ADK: https://developers.googleblog.com/developers-guide-to-multi-agent-patterns-in-adk/
Google Cloud Architecture Center, Choose a design pattern for your agentic AI system: https://docs.cloud.google.com/architecture/choose-design-pattern-agentic-ai-system
Anthropic, Building Effective Agents: https://www.anthropic.com/engineering/building-effective-agents

The Swarm Pattern: Peer Agents That Debate and Converge

Sayed Ali Alkamel — Fri, 17 Jul 2026 19:22:24 +0000

Short version: In a swarm, several specialized agents talk to each other directly, share findings, and refine a solution together, with no central orchestrator. Google names it in its Cloud Architecture Center guide as the most powerful and the most expensive multi-agent pattern. Reach for it only when a problem truly benefits from debate.

What is the swarm pattern?

The swarm pattern uses a collaborative, all-to-all communication approach, where multiple specialized agents work together to iteratively refine a solution to a complex problem (Google Cloud). The defining feature is that each agent can communicate with every other agent, sharing findings, critiquing proposals, and building on each other's work.

This is what sets a swarm apart from the coordinator pattern. A swarm typically has no central supervisor keeping the process on track. A dispatcher agent routes the initial request and facilitates communication, but it does not orchestrate the workflow the way a coordinator does (Google Cloud).

How it actually works

A dispatcher interprets the request and decides which agent should start. From there, any agent can hand the task to another it judges better suited to the next step, or return the final answer to the user through the dispatcher (Google Cloud).

Because there is no orchestrator to stop the process, you must define an explicit exit condition. Google is clear that this is often a maximum number of iterations, a time limit, or the achievement of a specific goal such as reaching consensus (Google Cloud). Without one, the swarm has no natural place to end.
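The three exit conditions can be sketched in plain Python. This is not an ADK or Google implementation. The peers are hypothetical functions that each accept a proposal inside their own range or counter with the nearest value they can live with. A fixed handoff table stands in for each agent's own choice of who goes next, and "consensus" is assumed to mean that every peer's latest proposal is identical.

```python
import time
from typing import Callable, Optional

# A peer reads the last proposal (None at the start) and returns its own.
Peer = Callable[[Optional[int]], int]


def make_peer(low: int, high: int, preferred: int) -> Peer:
    # Hypothetical peer: keeps a proposal that fits its acceptable range,
    # otherwise counters with the nearest value it can accept.
    def peer(last: Optional[int]) -> int:
        value = preferred if last is None else last
        return max(low, min(high, value))
    return peer


def dispatcher(request: str) -> str:
    # Hypothetical dispatcher: it only picks who speaks first.
    return "finance" if "cost" in request else "market"


def run_swarm(request: str, peers: dict[str, Peer], handoff: dict[str, str],
              max_turns: int = 12, time_limit_s: float = 2.0) -> tuple[str, Optional[int], int]:
    speaker = dispatcher(request)
    latest: dict[str, int] = {}
    last: Optional[int] = None
    deadline = time.monotonic() + time_limit_s
    for turn in range(1, max_turns + 1):
        if time.monotonic() > deadline:
            return "time_limit", last, turn - 1    # exit: time limit
        last = peers[speaker](last)
        latest[speaker] = last
        if len(latest) == len(peers) and len(set(latest.values())) == 1:
            return "consensus", last, turn         # exit: goal reached
        speaker = handoff[speaker]                 # peer-to-peer handoff
    return "max_turns", last, max_turns            # exit: iteration cap


peers = {"market": make_peer(5, 8, 8), "engineering": make_peer(3, 6, 6),
         "finance": make_peer(2, 5, 4)}
handoff = {"market": "engineering", "engineering": "finance", "finance": "market"}
stubborn = {"market": make_peer(7, 9, 9), "engineering": make_peer(3, 6, 6),
            "finance": make_peer(1, 2, 2)}
print(run_swarm("design a product spec", peers, handoff))     # ('consensus', 5, 5)
print(run_swarm("design a product spec", stubborn, handoff))  # ('max_turns', 2, 12)
```

When the peers' ranges overlap, the run converges in five turns. When they do not, only the iteration cap ends it, which is exactly the case the exit condition exists for.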

When to use it

Use a swarm for ambiguous or highly complex problems that benefit from debate and iterative refinement (Google Cloud). Google's example is product design: a market researcher agent, an engineering agent, and a financial modeling agent share ideas, debate the trade-offs between features and cost, and converge on a specification that balances the competing requirements.

The value is the synthesis of multiple expert perspectives. When the answer depends on several specialists genuinely reacting to each other, a swarm can produce results a single agent or a rigid pipeline cannot.

When not to use it

Do not use a swarm when a simpler pattern fits. If the task has a known structure, a sequential pipeline or a coordinator is cheaper and far easier to control. Do not use it when you need predictable latency or cost, because dynamic, all-to-all conversation is neither. And do not use it when you cannot define a solid exit condition, since an open-ended debate with no stopping rule is a recipe for runaway cost.

Google's own comparison table flags the profile plainly: high latency and operational cost due to dynamic, all-to-all communication between agents (Google Cloud). That is the price of admission.
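A back-of-envelope count shows why the bill grows. These are illustrative formulas under stated assumptions, not Google's figures. The sketch assumes that in a swarm each agent makes one model call per round and reads every peer's latest message. It assumes that a coordinator makes one routing call, one call per worker, and one synthesis call.

```python
def swarm_cost(agents: int, rounds: int) -> dict[str, int]:
    # Assumption: one call per agent per round; each agent reads every
    # peer's latest message, so reads grow as agents * (agents - 1).
    return {
        "model_calls": agents * rounds,
        "messages_read": agents * (agents - 1) * rounds,
    }


def coordinator_cost(workers: int) -> dict[str, int]:
    # Assumption: route once, call each worker once, synthesize once.
    # Each worker reads one instruction; the synthesizer reads every result.
    return {"model_calls": workers + 2, "messages_read": 2 * workers}


print(swarm_cost(3, 4))       # {'model_calls': 12, 'messages_read': 24}
print(coordinator_cost(3))    # {'model_calls': 5, 'messages_read': 6}
print(swarm_cost(6, 4))       # {'model_calls': 24, 'messages_read': 120}
```

Doubling the number of agents doubles the calls but roughly quadruples the context each round has to carry.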

Known problems

The swarm is the most complex and costly multi-agent pattern to implement (Google Cloud). Two problems drive that.

First, convergence is not guaranteed. Because no agent uses a model to orchestrate, the pattern can fall into unproductive loops or fail to converge on a solution (Google Cloud). Agents can talk past each other without ever settling.

Second, the communication itself is hard to manage. You have to design sophisticated logic to control the inter-agent communication, manage the iterative workflow, and handle the significant cost and latency of a multi-turn conversation between agents (Google Cloud). This is the pattern where multi-agent designs most often turn fragile, so many teams keep it as a last resort rather than a default.

Three things to know before you start

Define an exit condition first. With no orchestrator, an iteration cap, a time limit, or a consensus goal is the only thing that ends the run.
Expect the highest bill in the catalog. All-to-all conversation multiplies model calls. Budget accordingly.
Try a coordinator first. If a central agent can route and combine, you probably do not need a swarm. Use the swarm only when debate is the point.

FAQ

How is a swarm different from a coordinator?
A coordinator uses a model to orchestrate and route tasks. A swarm has no orchestrator; peers communicate all-to-all and hand off to each other, with a dispatcher only facilitating.

Why is the swarm so expensive?
Dynamic, all-to-all communication means many model calls across a multi-turn conversation, which drives both latency and cost higher than any other pattern here.

How do I stop a swarm from looping forever?
Define an explicit exit condition: a maximum number of iterations, a time limit, or reaching consensus. Without one, it may not converge.

Is swarm an Anthropic pattern too?
No. Google names it in its Cloud Architecture Center guide. Anthropic does not list a swarm in Building Effective Agents.

Sources

Google Cloud Architecture Center, Choose a design pattern for your agentic AI system: https://docs.cloud.google.com/architecture/choose-design-pattern-agentic-ai-system
Google Developers Blog, Developer's guide to multi-agent patterns in ADK: https://developers.googleblog.com/developers-guide-to-multi-agent-patterns-in-adk/
Google Agent Development Kit docs: https://google.github.io/adk-docs/

Human-in-the-Loop: Let a Person Authorize the Risky Step

Sayed Ali Alkamel — Fri, 17 Jul 2026 19:22:12 +0000

Short version: Human-in-the-loop inserts a checkpoint where the agent pauses and waits for a person to approve, correct, or add input before it continues. Google names it as a first-class pattern, and Anthropic builds the same checkpoints into its agents. Use it whenever an action is irreversible or high-stakes.

What is the human-in-the-loop pattern?

The human-in-the-loop pattern integrates points for human intervention directly into an agent's workflow. At a predefined checkpoint, the agent pauses and calls an external system to wait for a person to review its work, so the person can approve a decision, correct an error, or provide input before the agent continues (Google Cloud).

Anthropic describes the same behavior for autonomous agents: they can pause for human feedback at checkpoints or when they hit a blocker (Anthropic). It is less a separate architecture and more a control you add to any pattern.

How it actually works

The agent runs normally until it reaches a step that needs sign-off. It then calls an approval step that halts execution and requests a human decision. In Google ADK you implement this with a custom tool: the agent calls an approval tool that pauses execution or triggers an external system to request human intervention (Google Developers).

Google's example is a transaction agent that handles routine work but calls an approval tool for high-stakes checks, which pauses and waits for a human reviewer to say yes or no (Google Developers).

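# Google ADK, sketch; request_approval is a hypothetical stand-in for your approval system
from google.adk.agents import LlmAgent
def request_approval(action: str) -> dict:
    """Hypothetical: a real version would notify a reviewer and pause until they answer."""
    return {"status": "pending_human_review", "action": action}
agent = LlmAgent(name="TransactionAgent",
                 instruction="Handle routine work. If high stakes, call request_approval.",
                 tools=[request_approval])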

When to use it

Use human-in-the-loop for tasks that need human oversight, subjective judgment, or final approval on critical actions (Google Cloud). Google is specific about which actions qualify: executing financial transactions, deploying code to production, or acting on sensitive data rather than merely processing it (Google Developers).
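One way to encode that bar is a small policy check that runs before any tool call and decides whether the action must wait for a person. The action categories and the `ProposedAction` type below are hypothetical. A real system would derive them from its own tool inventory.

```python
from dataclasses import dataclass

# Hypothetical categories mirroring the examples above: moving money,
# shipping code, and acting on sensitive data.
HIGH_STAKES = {"financial_transaction", "production_deploy", "sensitive_data_action"}


@dataclass(frozen=True)
class ProposedAction:
    kind: str
    reversible: bool
    description: str


def needs_human_approval(action: ProposedAction) -> bool:
    # Gate anything in a high-stakes category, plus anything irreversible.
    # Routine, reversible actions run without a pause.
    return action.kind in HIGH_STAKES or not action.reversible


routine = ProposedAction("read_report", reversible=True, description="Summarize a report")
payment = ProposedAction("financial_transaction", reversible=False, description="Pay an invoice")
print(needs_human_approval(routine), needs_human_approval(payment))  # False True
```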

A concrete case: an agent that anonymizes a patient dataset redacts the protected information automatically, then pauses for a human compliance officer to validate and approve the release, so no sensitive data leaks (Google Cloud). The pattern shows up in coding too, where Anthropic notes that automated tests verify functionality but human review remains crucial for aligning a solution with broader system requirements (Anthropic).

When not to use it

Do not add a human checkpoint to routine, low-stakes, reversible actions. A pause there just adds latency and annoys the user for no safety gain. Do not use it where you need fully automated throughput and the risk is genuinely low. And do not treat it as a substitute for good guardrails; it is the last line, not the only line.

Every checkpoint trades speed and autonomy for safety, so put one only where the downside of a wrong action is worse than the delay.

Known problems

The main cost is engineering. Google notes this pattern can add significant architectural complexity, because you have to build and maintain the external system for user interaction (Google Cloud). Pausing an agent, notifying a person, holding state while you wait, and resuming cleanly is real infrastructure.
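Here is a minimal sketch of that pause, hold-state, and resume cycle. It uses an in-memory store in place of the external system Google mentions. A real deployment would persist pending requests durably and notify the reviewer through its own channel. All names are hypothetical.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingApproval:
    request_id: str
    action: str
    agent_state: dict                 # snapshot needed to resume the run
    decision: Optional[bool] = None


class ApprovalStore:
    """Hypothetical in-memory stand-in for an external approval system."""

    def __init__(self) -> None:
        self._pending: dict[str, PendingApproval] = {}

    def pause(self, action: str, agent_state: dict) -> str:
        # Save what the agent needs to continue, and return a ticket ID
        # that the notification to the reviewer would carry.
        request_id = str(uuid.uuid4())
        self._pending[request_id] = PendingApproval(request_id, action, dict(agent_state))
        return request_id

    def decide(self, request_id: str, approved: bool) -> None:
        # Called from the human-facing side (UI, chat, ticket) with the verdict.
        self._pending[request_id].decision = approved

    def resume(self, request_id: str) -> dict:
        # Continue only after a person has decided; the ticket is consumed.
        pending = self._pending[request_id]
        if pending.decision is None:
            raise RuntimeError("still waiting for a human decision")
        del self._pending[request_id]
        state = dict(pending.agent_state)
        state["approval"] = "approved" if pending.decision else "rejected"
        return state


store = ApprovalStore()
ticket = store.pause("transfer 500 EUR to vendor", {"step": 3, "cart_id": "A7"})
# The agent run stops here; later, a reviewer responds.
store.decide(ticket, approved=True)
resumed = store.resume(ticket)
print(resumed)  # {'step': 3, 'cart_id': 'A7', 'approval': 'approved'}
```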

The second issue is human factors. If checkpoints fire too often, reviewers rubber-stamp them, and the safety benefit evaporates. Place checkpoints where judgment actually matters, so each one gets real attention.

Three things to know before you start

Reserve it for irreversible actions. Money moving, code shipping, sensitive data leaving. That is the bar Google draws.
Design the pause and resume. Holding state while a human decides, then continuing, is the hard part. Plan it before you add the checkpoint.
Do not over-gate. Too many approvals train reviewers to click through. Fewer, well-placed checkpoints stay meaningful.

FAQ

What is a human-in-the-loop agent?
An agent that pauses at a checkpoint and waits for a person to approve, correct, or add input before it continues, usually for high-stakes steps.

When should an agent pause for a human?
Before irreversible or high-stakes actions: financial transactions, production deploys, or acting on sensitive data. Routine steps should not pause.

Is human-in-the-loop a separate pattern or an add-on?
Both. Google lists it as a pattern, but in practice you layer it onto other patterns wherever a step carries real risk.

Does it slow the agent down?
Yes, by design. You accept the delay at specific checkpoints in exchange for safety and accountability.

Sources

Google Cloud Architecture Center, Choose a design pattern for your agentic AI system: https://docs.cloud.google.com/architecture/choose-design-pattern-agentic-ai-system
Google Developers Blog, Developer's guide to multi-agent patterns in ADK: https://developers.googleblog.com/developers-guide-to-multi-agent-patterns-in-adk/
Anthropic, Building Effective Agents: https://www.anthropic.com/engineering/building-effective-agents

Autonomous Agents and the ReAct Loop: When the Model Owns Control

Sayed Ali Alkamel — Fri, 17 Jul 2026 19:21:55 +0000

Short version: An autonomous agent is a model using tools in a loop, deciding its own next step from what it observes. Anthropic calls it an agent; Google calls the core loop ReAct: thought, action, observation. It is the most flexible pattern and the most expensive, so you use it only when you cannot hardcode the path.

What is an autonomous agent?

Agents are systems where the model dynamically directs its own processes and tool usage, keeping control over how it accomplishes a task (Anthropic). That is the line that separates an agent from a workflow: in a workflow, code owns the control flow, but here the model owns it.

Google names the loop that powers this the ReAct pattern, after the 2022 research paper. The agent runs an iterative loop of thought, action, and observation until an exit condition is met (Google Cloud, ReAct paper).

The loop, step by step

Google breaks ReAct into three moves (Google Cloud):

Thought: the model reasons about the task and decides what to do next, judging whether the request is fully answered.
Action: it either picks a tool and forms a query to gather more information, or, if the task is done, writes the final answer and ends the loop.
Observation: it reads the tool output and saves what matters to memory, so it can build on past observations instead of repeating itself.
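The three moves above can be sketched as a plain-Python loop. `scripted_model` is a hypothetical, deterministic stand-in for the model's thought step, and the two arithmetic tools are toys. The part that carries over is the structure: thought, action, an observation saved to memory, and an iteration cap.

```python
from typing import Callable

TOOLS: dict[str, Callable[[float, float], float]] = {
    "multiply": lambda a, b: a * b,
    "add": lambda a, b: a + b,
}


def scripted_model(task: dict, memory: list[tuple[str, float]]) -> tuple[str, object]:
    # Thought: judge whether the request is answered and, if not, pick the next tool.
    # Hypothetical stand-in for an LLM; it reasons only over saved observations.
    done = {name for name, _ in memory}
    if "multiply" not in done:
        return "call", ("multiply", task["price"], task["quantity"])
    if "add" not in done:
        return "call", ("add", dict(memory)["multiply"], task["shipping"])
    return "finish", dict(memory)["add"]


def react(task: dict, max_iterations: int = 5) -> tuple[object, int]:
    memory: list[tuple[str, float]] = []
    for step in range(1, max_iterations + 1):
        kind, payload = scripted_model(task, memory)   # Thought
        if kind == "finish":                            # Action: final answer
            return payload, step
        tool_name, a, b = payload                       # Action: tool call
        observation = TOOLS[tool_name](a, b)            # Observation
        memory.append((tool_name, observation))         # save what matters
    return None, max_iterations                         # stopping condition hit


print(react({"price": 12.5, "quantity": 4, "shipping": 7.0}))  # (57.0, 3)
```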

Anthropic frames the same idea and stresses one thing: at each step the agent must gain ground truth from the environment, such as tool results or code execution, to assess its progress (Anthropic). The loop ends on completion, or on a stopping condition like a maximum number of iterations.

How it actually works

An agent is, in Anthropic's words, typically just an LLM using tools based on environmental feedback in a loop. Because so much rides on those tools, Anthropic spent more time optimizing tools than prompts when building its coding agent, and small changes mattered: switching to absolute file paths removed a whole class of mistakes (Anthropic). Google adds a debugging benefit: the model's thinking gives you a transcript of its reasoning, which helps you see where it went wrong (Google Cloud).
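In the spirit of Anthropic's absolute-path fix, here is a sketch of a file-reading tool. It refuses ambiguous input and returns an error the model can act on, instead of silently resolving a relative path against whatever directory the agent happens to be in. The tool name and return shape are assumptions, not Anthropic's actual tool.

```python
import tempfile
from pathlib import Path


def read_file_tool(path: str) -> dict:
    """Hypothetical agent tool: read a text file given an absolute path."""
    p = Path(path)
    if not p.is_absolute():
        # Reject instead of guessing; the message tells the model how to retry.
        return {"ok": False,
                "error": f"'{path}' is relative; pass an absolute path such as /repo/src/app.py"}
    if not p.is_file():
        return {"ok": False, "error": f"no file at {path}"}
    return {"ok": True, "content": p.read_text(encoding="utf-8")}


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "notes.txt"
    target.write_text("hello", encoding="utf-8")
    good = read_file_tool(str(target))
bad = read_file_tool("notes.txt")
print(good, bad["ok"])  # {'ok': True, 'content': 'hello'} False
```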

When to use it

Use an autonomous agent for open-ended problems where you cannot predict the number of steps and cannot hardcode a fixed path (Anthropic). The agent may run for many turns, so you need some trust in its decisions. Anthropic's own examples are a coding agent that resolves real GitHub issues across many files, and a computer-use agent that operates a desktop to finish tasks.

Google's fit is the same: complex, dynamic tasks that need continuous planning and adaptation, such as a robotics agent that recomputes its path as new obstacles appear (Google Cloud). If the constraints change while the agent works, the loop lets it adjust.

When not to use it

Do not use an autonomous agent when a workflow will do. If the steps are predictable, a fixed pipeline is cheaper, faster, and easier to trust. Do not use it for high-frequency, low-complexity tasks, where deterministic code beats both workflows and agents on cost and latency. And avoid it in untrusted environments without strong guardrails, because autonomy plus a risky action surface is how small mistakes become big ones.

Known problems

Autonomy has a price. Anthropic is direct: the autonomous nature of agents means higher costs and the potential for compounding errors, so it recommends extensive testing in sandboxed environments with appropriate guardrails (Anthropic).

Google names the two specific risks. The multi-step loop can raise end-to-end latency compared to a single query. And the agent's quality depends heavily on the model's reasoning, so an error or a misleading tool result in one observation can propagate and make the final answer wrong (Google Cloud). One bad observation early can derail everything after it.

Three things to know before you start

Set a stopping condition. A maximum iteration count keeps a wandering agent from running forever.
Invest in tools, not just prompts. The agent acts through tools, so a clear, mistake-proof tool interface is where reliability comes from.
Sandbox first. Test in an isolated environment with guardrails before you let an agent act on anything that matters.

FAQ

What is the ReAct pattern?
ReAct is the reasoning loop behind autonomous agents: the model thinks, takes an action such as a tool call, observes the result, and repeats until it has an answer (ReAct paper).

How is an agent different from a workflow?
In a workflow, the control flow is fixed in code. In an agent, the model decides the control flow at runtime. That flexibility costs more and is harder to debug.

Why do autonomous agents cost more?
They run many turns, each a model call, and every call adds latency and cost. Errors can also compound across steps, requiring more testing.

How do I keep an autonomous agent safe?
Test in a sandbox, add guardrails, set a maximum iteration count, and keep humans in the loop for high-stakes actions.

Sources

Anthropic, Building Effective Agents: https://www.anthropic.com/engineering/building-effective-agents
Google Cloud Architecture Center, Choose a design pattern for your agentic AI system: https://docs.cloud.google.com/architecture/choose-design-pattern-agentic-ai-system
Yao et al., ReAct: Synergizing Reasoning and Acting in Language Models: https://arxiv.org/abs/2210.03629

Evaluator-Optimizer: Generate, Critique, and Refine in a Loop

Sayed Ali Alkamel — Wed, 15 Jul 2026 17:49:22 +0000

Short version: The evaluator-optimizer pattern has one agent generate output while another evaluates it and gives feedback, in a loop. Anthropic calls it evaluator-optimizer. Google splits the same idea into a generator-critic loop for correctness and an iterative refinement loop for quality. Both live or die by their exit condition.

What is the evaluator-optimizer?

In the evaluator-optimizer workflow, one LLM call generates a response while another provides evaluation and feedback in a loop (Anthropic). Google frames it as two specialized agents: a generator creates an output, and a critic evaluates it against set criteria, then approves it, rejects it, or returns it with feedback for revision (Google Cloud).

It mirrors how a writer works: draft, get notes, revise, repeat until it is good.

Two flavors of the same loop

Google draws a line worth keeping (Google Developers):

Generator-critic focuses on correctness. The critic returns a pass or fail against hard criteria, such as valid syntax or passing unit tests. On a pass, the loop breaks. On a fail, specific feedback goes back to the generator.
Iterative refinement focuses on quality. A generator drafts, a critic suggests optimizations, and a refiner rewrites. It repeats to polish, not just to pass.

Both are implementations of a loop agent, and both need a way out (Google Cloud).

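# Google ADK, sketch
from google.adk.agents import LlmAgent, LoopAgent
from google.adk.tools.tool_context import ToolContext
def exit_loop(tool_context: ToolContext) -> dict:
    tool_context.actions.escalate = True  # tells the LoopAgent to stop before the cap
    return {}
generator = LlmAgent(name="Generator", instruction="Draft the output. Fix these issues, if any: {feedback?}", output_key="draft")
critic    = LlmAgent(name="Critic", instruction="Check {draft}. If it passes, call exit_loop; otherwise list the errors.", output_key="feedback", tools=[exit_loop])
loop      = LoopAgent(name="CriticLoop", sub_agents=[generator, critic], max_iterations=5)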

How it actually works

The generator produces a draft. The critic evaluates it against explicit criteria and either approves it or sends feedback. The loop repeats until the output passes or hits an iteration cap. In ADK you can set a hard limit with a maximum iteration count, and an agent can also signal early completion when the quality bar is met before the cap (Google Developers).
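The same cycle can be sketched without ADK, using a hard pass/fail criterion: valid Python syntax, checked with the standard `ast` module. The generator is a hypothetical stand-in that returns canned drafts. A real generator would use the critic's feedback text to revise.

```python
import ast
from typing import Optional

# Hypothetical drafts a generator might produce on successive attempts.
DRAFTS = [
    "def add(a, b)\n    return a + b\n",    # missing colon
    "def add(a, b):\n    return a + b\n",
]


def generator(attempt: int, feedback: Optional[str]) -> str:
    # Stand-in for a model call; feedback is ignored here for determinism.
    return DRAFTS[min(attempt, len(DRAFTS) - 1)]


def critic(draft: str) -> Optional[str]:
    # Hard criterion: the draft must parse. None means PASS.
    try:
        ast.parse(draft)
        return None
    except SyntaxError as err:
        return f"SyntaxError on line {err.lineno}: {err.msg}"


def generate_until_pass(max_iterations: int = 3) -> tuple[Optional[str], list[str]]:
    feedback: Optional[str] = None
    history: list[str] = []
    for attempt in range(max_iterations):
        draft = generator(attempt, feedback)
        feedback = critic(draft)
        if feedback is None:
            return draft, history       # early exit on PASS
        history.append(feedback)
    return None, history                # cap reached without a pass


draft, history = generate_until_pass()
print(len(history), draft is not None)  # 1 True
```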

When to use it

Anthropic gives two clear signals of fit: an answer that measurably improves when a human articulates feedback, and a model that can produce that kind of feedback itself (Anthropic). If both hold, a critic loop tends to pay off. Anthropic's examples are literary translation, where the first pass misses nuance an evaluator can catch, and complex search, where the evaluator decides whether more searching is warranted.

Google's version fits tasks where output must be highly accurate or must meet strict constraints before use, such as a generator writing code and a critic auditing it for vulnerabilities or checking that it passes tests (Google Cloud).

When not to use it

Do not use it when you have no reliable way to judge good from bad. The loop becomes circular when the evaluator cannot tell a good output from a weak one, and you spend calls without converging. Do not use it when a single pass is already good enough, since the extra critic call is pure overhead. And avoid it for latency-critical paths, because each cycle adds at least one more model call.

Known problems

The direct cost is latency and money. The workflow needs at least one extra model call for the critic, and if revision loops kick in, both latency and cost accumulate with each iteration (Google Cloud).

The dangerous failure is the loop that never ends. Google warns that if the termination condition is not defined correctly, or the subagents never produce the state that stops the loop, it can run indefinitely, leading to excessive cost, high resource use, and potential system hangs (Google Cloud). A maximum iteration count is not optional; it is the safety belt.

Three things to know before you start

Always cap the iterations. A hard limit prevents runaway cost and hangs, even when your quality check is imperfect.
Make the criteria explicit. The critic can only enforce standards you spell out. Vague criteria produce vague critiques.
Pick your flavor. Pass or fail for correctness, iterative rewrite for quality. They want different exit conditions.
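To make the last tip concrete, an iterative-refinement loop exits on a quality signal rather than a pass. This sketch assumes the critic returns a numeric score. The loop stops when the score clears a target, stops improving, or hits the cap. The scoring function and the refiner are hypothetical placeholders.

```python
from typing import Callable

REQUIRED = ["exit condition", "iteration cap", "explicit criteria"]


def toy_score(text: str) -> float:
    # Hypothetical critic: fraction of required topics the draft covers.
    return sum(term in text for term in REQUIRED) / len(REQUIRED)


def toy_improve(text: str) -> str:
    # Hypothetical refiner: adds the first topic the critique says is missing.
    missing = [t for t in REQUIRED if t not in text]
    return text + (f" Mention {missing[0]}." if missing else "")


def refine(draft: str, score: Callable[[str], float], improve: Callable[[str], str],
           target: float = 0.9, min_gain: float = 0.01,
           max_iterations: int = 5) -> tuple[str, float, str]:
    best, best_score = draft, score(draft)
    for _ in range(max_iterations):
        if best_score >= target:
            return best, best_score, "target_met"
        candidate = improve(best)
        candidate_score = score(candidate)
        if candidate_score - best_score < min_gain:
            return best, best_score, "no_improvement"   # polishing has stalled
        best, best_score = candidate, candidate_score
    return best, best_score, "max_iterations"


print(refine("Loop notes: exit condition.", toy_score, toy_improve)[1:])  # (1.0, 'target_met')
print(refine("nothing", toy_score, lambda t: t)[2])                       # no_improvement
```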

FAQ

Is evaluator-optimizer the same as generator-critic?
They are the same loop. Anthropic uses "evaluator-optimizer." Google splits it into a generator-critic loop for correctness and iterative refinement for quality.

How do I avoid an infinite loop?
Set a maximum number of iterations and a clear exit condition. Optionally let the agent signal early completion when the quality bar is met.

When does this pattern actually help?
When feedback demonstrably improves the output and the model can generate that feedback. If either is missing, the loop wastes calls.

Is this the same as chain-of-thought?
No. Chain-of-thought is reasoning inside one call. This pattern uses a separate evaluation step and can loop across multiple calls.

Sources

Anthropic, Building Effective Agents: https://www.anthropic.com/engineering/building-effective-agents
Google Developers Blog, Developer's guide to multi-agent patterns in ADK: https://developers.googleblog.com/developers-guide-to-multi-agent-patterns-in-adk/
Google Cloud Architecture Center, Choose a design pattern for your agentic AI system: https://docs.cloud.google.com/architecture/choose-design-pattern-agentic-ai-system

Orchestrator-Workers: Decide the Subtasks at Runtime

Sayed Ali Alkamel — Mon, 13 Jul 2026 08:46:09 +0000

Short version: In orchestrator-workers, a central agent breaks a task into subtasks at runtime, delegates them to worker agents, and synthesizes the results. Anthropic calls it orchestrator-workers; Google calls it hierarchical task decomposition. The defining trait is that the subtasks are not known in advance.

What is orchestrator-workers?

In the orchestrator-workers workflow, a central LLM dynamically breaks down tasks, delegates them to worker LLMs, and synthesizes their results (Anthropic). Google's hierarchical task decomposition organizes agents into a multi-level hierarchy: a root agent receives a complex task, decomposes it into smaller subtasks, and delegates each to a specialized subagent, which can decompose further (Google Cloud).

The one line to remember is Anthropic's: this looks like parallelization, but the subtasks are not pre-defined; they are determined by the orchestrator based on the specific input (Anthropic).

How it actually works

The orchestrator reads the task, decides what subtasks it needs right now, and hands each to a worker. Google's ADK guide shows a report writer that does not research anything itself. It delegates to a research assistant, which in turn manages its own web-search and summarizer tools (Google Developers).

A neat trick makes this composable: you can wrap a whole sub-agent as a tool. In ADK, wrapping an agent in an agent-tool lets the parent call the entire sub-workflow as if it were a single function (Google Developers).

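# Google ADK, sketch
from google.adk.agents import LlmAgent
from google.adk.tools.agent_tool import AgentTool
# Placeholder specialists; a real web-search agent would also carry a search tool.
web_search_agent = LlmAgent(name="WebSearch", description="Searches the web for sources.")
summarizer_agent = LlmAgent(name="Summarizer", description="Summarizes gathered findings.")
research = LlmAgent(name="ResearchAssistant",
                    description="Gathers and summarizes information on a topic.",
                    sub_agents=[web_search_agent, summarizer_agent])
report   = LlmAgent(name="ReportWriter",
                    instruction="Write a report. Use ResearchAssistant to gather info.",
                    tools=[AgentTool(agent=research)])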

When to use it

Use orchestrator-workers when you cannot predict the subtasks a task will need. Anthropic's example is coding: the number of files to change and the nature of each change depend on the task, so you cannot hardcode them (Anthropic). Search is similar, where you gather and analyze from sources you do not know ahead of time.
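Here is a plain-Python sketch of that coding example. The orchestrator inspects the input to decide how many subtasks exist, which is what separates it from a fixed fan-out. The in-memory `repo`, the rename task, and the worker are hypothetical. In a real system, the planning step and each worker would be model calls.

```python
def plan(task: dict, repo: dict[str, str]) -> list[str]:
    # Orchestrator: the subtask list depends on the input, so it is
    # computed at runtime instead of hardcoded.
    return sorted(path for path, text in repo.items() if task["old"] in text)


def worker(text: str, task: dict) -> str:
    # Worker: handles exactly one file.
    return text.replace(task["old"], task["new"])


def orchestrate(task: dict, repo: dict[str, str]) -> dict[str, str]:
    results = {path: worker(repo[path], task) for path in plan(task, repo)}
    # Synthesis: the orchestrator decides how results fit back together.
    return {**repo, **results}


repo = {"app.py": "from util import get_user\nget_user(1)",
        "util.py": "def get_user(i): ...",
        "README.md": "Docs"}
task = {"old": "get_user", "new": "fetch_user"}
print(plan(task, repo))  # ['app.py', 'util.py']
updated = orchestrate(task, repo)
```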

Google positions hierarchical decomposition for ambiguous, open-ended problems that need multi-step reasoning: research, planning, and synthesis. A coordinator decomposes a research project into gathering, analysis, and writing, then delegates each to a specialist (Google Cloud). Decomposing the ambiguity is the whole point.

When not to use it

Do not reach for this when the subtasks are fixed. If you already know the steps, use parallelization or a chain, both of which are cheaper and easier to debug. Do not use it for simple tasks, because the multi-level structure adds real overhead you will not recover. And if a single agent with good tools already handles the task, the hierarchy is premature.

Known problems

The cost is complexity, on two fronts. The multi-level structure adds considerable architectural complexity, which makes the system harder to design, debug, and maintain (Google Cloud). And the layers of delegation and reasoning produce a high number of model calls, which significantly increases both latency and operational cost (Google Cloud).
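A rough count shows why depth matters. The sketch makes simplifying assumptions for illustration; these are not Google's figures. Every manager agent makes one call to plan and one to synthesize, every leaf makes one call, and every manager delegates to the same number of children.

```python
def hierarchy_calls(branching: int, depth: int) -> int:
    # depth 0 means a single agent that does not delegate.
    total = 0
    for level in range(depth + 1):
        agents_at_level = branching ** level
        calls_per_agent = 1 if level == depth else 2   # managers plan + synthesize
        total += agents_at_level * calls_per_agent
    return total


print([hierarchy_calls(3, d) for d in range(4)])  # [1, 5, 17, 53]
```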

There is a subtler risk when this pattern is pushed toward autonomous multi-agent territory. When parallel workers assume shared context they do not actually have, they can make conflicting choices that do not compose, which is why some teams keep writes single-threaded even in a hierarchy. The safe version keeps the orchestrator firmly in charge of how the pieces fit back together.

Three things to know before you start

This is dynamic, not parallel. If you can list the subtasks in advance, you do not need an orchestrator; you need a fan-out.
Wrap sub-agents as tools. Treating a sub-workflow as a single callable keeps the parent's reasoning clean and the system composable.
Budget for the model calls. Each layer multiplies calls. Watch cost and latency before you add another level.

FAQ

How is orchestrator-workers different from parallelization?
Parallelization uses fixed, predefined subtasks. Orchestrator-workers decides the subtasks at runtime based on the input. Same shape on a diagram, different flexibility.

Is hierarchical task decomposition the same pattern?
Yes, it is Google's name for it, extended across multiple levels. A root agent decomposes and delegates, and subagents can decompose further.

Why is this pattern so expensive?
Every layer of decomposition and delegation adds model calls, and those calls drive latency and cost. Deep hierarchies multiply the effect.

When does the model decide versus the code?
Here the orchestrator model decides the subtasks. That is what separates it from a coded pipeline, where the control flow is fixed.

Sources

Anthropic, Building Effective Agents: https://www.anthropic.com/engineering/building-effective-agents
Google Developers Blog, Developer's guide to multi-agent patterns in ADK: https://developers.googleblog.com/developers-guide-to-multi-agent-patterns-in-adk/
Google Cloud Architecture Center, Choose a design pattern for your agentic AI system: https://docs.cloud.google.com/architecture/choose-design-pattern-agentic-ai-system

Parallelization: Run Subtasks at Once, Then Merge

Sayed Ali Alkamel — Thu, 09 Jul 2026 18:35:30 +0000

Short version: Parallelization runs independent subtasks at the same time and aggregates their outputs. Anthropic splits it into sectioning and voting, and Google calls it the parallel fan-out and gather pattern. Use it for speed, or to get several perspectives on the same problem for higher confidence.

What is parallelization?

Parallelization has LLMs work simultaneously on a task and then aggregates their outputs programmatically (Anthropic). Google's parallel pattern runs multiple specialized subagents on a task or subtasks at the same time, then synthesizes their outputs into one consolidated response (Google Cloud).

Anthropic names two variations (Anthropic):

Sectioning: break a task into independent subtasks and run them in parallel.
Voting: run the same task several times to get diverse outputs, then combine them.
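Both variations can be sketched with Python's standard `concurrent.futures`, with plain functions standing in for model calls. The checker functions, the seeded toy classifier, and the majority rule are illustrative assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def sectioning(text: str, checks: dict[str, Callable[[str], str]]) -> dict[str, str]:
    # Different independent subtasks on the same input, run concurrently.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, text) for name, fn in checks.items()}
        return {name: f.result() for name, f in futures.items()}


def voting(text: str, attempt: Callable[[str, int], str], n: int = 5) -> tuple[str, int]:
    # The same task n times (varied here by a seed), then a majority vote.
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda seed: attempt(text, seed), range(n)))
    label, count = Counter(answers).most_common(1)[0]
    return label, count


checks = {"length": lambda t: f"{len(t.split())} words",
          "question": lambda t: "yes" if t.endswith("?") else "no"}
print(sectioning("Is this safe to run?", checks))   # {'length': '5 words', 'question': 'yes'}
noisy = lambda t, seed: "safe" if seed == 3 else "unsafe"   # toy, occasionally disagrees
print(voting("rm -rf /tmp/cache", noisy))           # ('unsafe', 4)
```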

How it actually works

A dispatcher fans the work out, the workers run concurrently, and a final synthesizer gathers the results. Google's canonical example is automated code review: spawn a security auditor, a style enforcer, and a performance analyst on a pull request at the same time, then have a synthesizer combine their feedback into one review comment (Google Developers).

One warning from Google's ADK guide: parallel agents run in separate threads but share session state, so each agent must write to a unique key to avoid race conditions (Google Developers).

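# Google ADK, sketch
from google.adk.agents import LlmAgent, ParallelAgent, SequentialAgent
sec  = LlmAgent(name="Security", instruction="Audit the pull request for security issues.", output_key="security_report")
sty  = LlmAgent(name="Style", instruction="Check the pull request against the style guide.", output_key="style_report")
perf = LlmAgent(name="Performance", instruction="Flag performance problems in the pull request.", output_key="performance_report")
fan_out = ParallelAgent(name="ReviewFanOut", sub_agents=[sec, sty, perf])
merge = LlmAgent(name="Synthesizer",
                 instruction="Combine {security_report}, {style_report}, {performance_report}.")
workflow = SequentialAgent(name="CodeReview", sub_agents=[fan_out, merge])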
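Why distinct keys matter can be shown without ADK. In this sketch, threads write into one shared dict that stands in for session state, and the worker function is a hypothetical stand-in for a model call. With one key per worker, every result survives. With a shared key, only the last writer's result remains, and which worker wins depends on scheduling.

```python
import threading


def run_workers(keys: list[str]) -> dict[str, str]:
    state: dict[str, str] = {}
    lock = threading.Lock()

    def worker(name: str, key: str) -> None:
        result = f"{name} findings"      # stand-in for a model call
        with lock:                       # serializes writes, but cannot stop
            state[key] = result          # the last writer from winning

    threads = [threading.Thread(target=worker, args=(f"agent{i}", key))
               for i, key in enumerate(keys)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state


unique = run_workers(["security_report", "style_report", "performance_report"])
shared = run_workers(["report", "report", "report"])
print(len(unique), len(shared))  # 3 1
```

The lock prevents torn writes, not lost results. Only distinct keys keep every worker's output.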

When to use it

Use parallelization when the subtasks are independent and can run at once for speed, or when several attempts raise your confidence in the result (Anthropic). For complex tasks with many considerations, models often do better when each consideration gets its own focused call rather than one prompt trying to hold all of them.

Two Anthropic examples make the split concrete. Sectioning fits guardrails: one model handles the user query while another screens it for problems, which beats making one call do both. Voting fits code review for vulnerabilities: several prompts each look for a problem, and you flag the code if any of them find one (Anthropic).

When not to use it

Do not parallelize when the subtasks depend on each other. If step two needs step one's output, they are a chain, not a fan-out. Do not use it when the subtasks are decided at runtime rather than fixed. That is orchestrator-workers, which looks similar but lets the lead agent choose the subtasks. And if a single call is fast and good enough, running many calls just multiplies cost.

Known problems

The clearest cost is exactly that: running multiple agents at once raises immediate resource use and token consumption, which drives up operational cost (Google Cloud). You trade money for latency.

The harder problem is the gather step. Google notes that synthesizing potentially conflicting results requires complex logic, which adds to development and maintenance overhead (Google Cloud). When two workers disagree, someone has to decide who wins, and that logic is where parallel systems get messy. Add the shared-state race condition risk, and careful key management becomes non-negotiable.
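Here is a sketch of a deliberate gather step using two common strategies: a vote threshold and a priority order. The verdict format and the priority ranking are assumptions; each team would choose its own.

```python
from collections import Counter

# Assumed ranking: when reviewers disagree, security outranks
# performance, which outranks style.
PRIORITY = ["security", "performance", "style"]


def merge_by_threshold(verdicts: dict[str, str], threshold: int) -> str:
    # Block the change if at least `threshold` reviewers ask to block it.
    blocks = Counter(verdicts.values())["block"]
    return "block" if blocks >= threshold else "approve"


def merge_by_priority(verdicts: dict[str, str]) -> str:
    # Take the verdict of the highest-priority reviewer that reported one.
    for reviewer in PRIORITY:
        if reviewer in verdicts:
            return verdicts[reviewer]
    raise ValueError("no verdicts to merge")


verdicts = {"security": "block", "style": "approve", "performance": "approve"}
print(merge_by_threshold(verdicts, threshold=2), merge_by_priority(verdicts))  # approve block
```

The same three verdicts produce opposite outcomes under the two rules, which is why the merge policy deserves a decision before the fan-out is built.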

Three things to know before you start

Give every worker a unique output key. Shared state plus concurrent writes equals race conditions. Distinct keys prevent them.
Design the merge first. The synthesizer is the hard part, not the fan-out. Decide up front how you resolve conflicts.
Know your variant. Sectioning splits different subtasks, and voting repeats the same task for confidence. They solve different problems.

FAQ

What is the difference between sectioning and voting?
Sectioning breaks a task into different independent subtasks that run in parallel. Voting runs the same task multiple times and combines the outputs to raise confidence.

Is parallelization the same as orchestrator-workers?
They look alike, but parallelization uses fixed, predefined subtasks, while orchestrator-workers decides the subtasks at runtime. Fixed versus dynamic is the dividing line.

Does parallelization always cost more?
It uses more tokens and compute at once, so it usually costs more per request. You accept that to gain speed or higher confidence.

How do I combine conflicting outputs?
That is the gather step, and it needs deliberate logic: a vote threshold, a priority order, or a synthesizer prompt. Plan it before you build the fan-out.

Sources

Anthropic, Building Effective Agents: https://www.anthropic.com/engineering/building-effective-agents
Google Developers Blog, Developer's guide to multi-agent patterns in ADK: https://developers.googleblog.com/developers-guide-to-multi-agent-patterns-in-adk/
Google Cloud Architecture Center, Choose a design pattern for your agentic AI system: https://docs.cloud.google.com/architecture/choose-design-pattern-agentic-ai-system

Routing: Classify the Input, Then Send It to a Specialist

Sayed Ali Alkamel — Wed, 08 Jul 2026 17:18:04 +0000

Short version: Routing classifies an input and hands it to the specialist best suited to it. Anthropic calls it routing, and Google calls it the coordinator or dispatcher pattern. It lets you write focused prompts for each case instead of one bloated prompt that tries to do everything.

What is routing?

Routing classifies an input and directs it to a specialized followup task (Anthropic). Google's coordinator pattern describes the same move: a central agent analyzes a request, decomposes it, and dispatches it to the specialized agent that handles that function (Google Cloud).

The reason it works is separation of concerns. Without routing, tuning a single prompt for one kind of input tends to hurt performance on the others (Anthropic). Splitting lets each specialist stay sharp.

How it actually works

A classifier sits at the front. It can be an LLM or a plain classification model, whichever labels the input accurately. The label picks the downstream prompt, tools, and even the model size.
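The classifier-then-dispatch step can be sketched without any framework. In this minimal sketch, the labels, prompts, tool names, and model names are hypothetical placeholders. The classifier is a plain keyword matcher, which stands in for whatever LLM or trained model labels your inputs reliably. The label alone selects the specialist's prompt, tools, and model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    prompt: str
    tools: tuple[str, ...]
    model: str

# Hypothetical routing table: every value here is a placeholder.
ROUTES = {
    "billing": Route("You handle invoices and refunds.", ("lookup_invoice",), "small-model"),
    "technical": Route("You troubleshoot technical issues.", ("search_docs", "read_logs"), "large-model"),
    "general": Route("You answer general questions.", (), "small-model"),
}

def classify(text: str) -> str:
    # A plain keyword classifier; an LLM or trained model could sit here instead.
    lowered = text.lower()
    if any(w in lowered for w in ("invoice", "refund", "charge")):
        return "billing"
    if any(w in lowered for w in ("error", "crash", "bug")):
        return "technical"
    return "general"

def route(text: str) -> Route:
    # The label selects the downstream prompt, tools, and model in one lookup.
    return ROUTES[classify(text)]

print(route("I was charged twice on my invoice").model)  # small-model
print(route("The app crashes on login").tools)           # ('search_docs', 'read_logs')
```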

In Google ADK this is model-driven delegation. You define a coordinator with a list of specialist sub-agents, and the framework transfers execution based on each specialist's description (Google Developers). The description field is doing real work here: it is effectively the API doc the model reads to decide where to send the request.

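# Google ADK, sketch (assumes the google-adk package is installed)
from google.adk.agents import LlmAgent

# Each description is what the coordinator's model reads when choosing where to delegate.
billing = LlmAgent(name="Billing", description="Handles invoices and billing.")
tech = LlmAgent(name="TechSupport", description="Troubleshoots technical issues.")
router = LlmAgent(
    name="Coordinator",
    model="gemini-2.0-flash",  # example model; sub-agents without a model inherit it
    instruction="Route billing issues to Billing and bugs to TechSupport.",
    sub_agents=[billing, tech],
)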
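ADK performs the delegation step internally. The following is a framework-free approximation that makes the idea concrete, not ADK's actual prompt. The `Specialist` type, the prompt wording, and `parse_choice` are all hypothetical. It shows that the router sees only names and descriptions. It also shows that a reply naming an unknown specialist should be rejected rather than guessed at.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Specialist:
    name: str
    description: str

SPECIALISTS = [
    Specialist("Billing", "Handles invoices and billing."),
    Specialist("TechSupport", "Troubleshoots technical issues."),
]

def build_router_prompt(request: str, specialists: list[Specialist]) -> str:
    # Names and descriptions are the router's entire basis for choosing.
    catalog = "\n".join(f"- {s.name}: {s.description}" for s in specialists)
    return (
        "Pick the one specialist best suited to this request.\n"
        f"Specialists:\n{catalog}\n"
        f"Request: {request}\n"
        "Answer with the specialist name only."
    )

def parse_choice(reply: str, specialists: list[Specialist]) -> Optional[str]:
    # Reject replies that do not name a known specialist instead of guessing.
    names = {s.name for s in specialists}
    choice = reply.strip()
    return choice if choice in names else None

print(build_router_prompt("My invoice total is wrong", SPECIALISTS))
print(parse_choice("Billing\n", SPECIALISTS))  # Billing
print(parse_choice("Refunds", SPECIALISTS))    # None
```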

A second win: route by difficulty

Routing is not only about topic. Anthropic points out you can route easy, common questions to a small, cost-efficient model and hard, unusual ones to a larger model (Anthropic). That single decision can cut cost sharply without hurting the answers that actually need the big model.
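Routing by difficulty can be as simple as a rule that picks a model tier. This is a minimal sketch. The tier names, the set of "common" intents, and the word-count cutoff are all assumptions you would replace with your own signals, such as the classifier's label or a learned difficulty score.

```python
# Hypothetical tier names; substitute the small and large models you actually use.
SMALL_MODEL = "small-model"
LARGE_MODEL = "large-model"

# Assumption: these intents are frequent and well covered by the small model.
COMMON_INTENTS = {"reset_password", "order_status", "opening_hours"}

def pick_model(intent: str, text: str, max_easy_words: int = 40) -> str:
    # "Easy" here means a known, common intent phrased in a short message.
    is_easy = intent in COMMON_INTENTS and len(text.split()) <= max_easy_words
    return SMALL_MODEL if is_easy else LARGE_MODEL

print(pick_model("order_status", "Where is my order #123?"))  # small-model
print(pick_model("contract_dispute", "Our enterprise contract terms conflict with the invoice."))  # large-model
```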

When to use it

Use routing when your inputs fall into distinct categories that are better handled separately, and when classification can be done accurately (Anthropic). Customer service is the canonical case: general questions, refunds, and technical support each want their own prompt, tools, and process. Google frames the fit as structured business processes that need adaptive routing, like sending an order-status, return, or refund request to the right specialist (Google Cloud).

When not to use it

Skip routing when the categories are fuzzy or the classifier is unreliable. A wrong label sends the whole request down the wrong path, and the specialist has no way to know. If you cannot classify accurately, a single flexible agent may do better.

Also skip it when one prompt already handles every input well. Routing adds at least one extra decision step, so if there is nothing to separate, you are paying for structure you do not need. And note the difference from parallelization: routing picks one destination, it does not run several at once.

Known problems

The main cost is more model calls. Because the coordinator and each specialist rely on a model to reason, this pattern makes more calls than a single agent, which raises token throughput, cost, and overall latency (Google Cloud). You get higher-quality, more focused answers, but you pay for the routing step.

The second problem is misrouting. Classification is never perfect, and a confidently wrong route does more damage than a fallback that admits uncertainty. Weak specialist descriptions make this worse: the model routes on those descriptions, so vague ones cause quiet, hard-to-trace errors.
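One common mitigation is a confidence threshold that sends low-confidence requests to a general agent instead of a guessed specialist. This technique is an assumption on my part, not something the cited sources prescribe. In this minimal sketch, the classifier, its confidence scores, and the 0.8 threshold are all hypothetical.

```python
from typing import Callable

# Hypothetical classifier output: a label plus a confidence score in [0, 1].
def classify_with_confidence(text: str) -> tuple[str, float]:
    lowered = text.lower()
    if "refund" in lowered:
        return "refunds", 0.92
    if "slow" in lowered:
        return "technical", 0.55
    return "general", 0.30

def dispatch(text: str, handlers: dict[str, Callable[[str], str]], min_confidence: float = 0.8) -> str:
    label, confidence = classify_with_confidence(text)
    # Below the threshold (or for an unknown label), prefer the general agent over a guess.
    if confidence < min_confidence or label not in handlers:
        label = "general"
    return handlers[label](text)

handlers = {
    "refunds": lambda t: "refunds specialist",
    "technical": lambda t: "technical specialist",
    "general": lambda t: "general agent",
}
print(dispatch("I want a refund", handlers))            # refunds specialist
print(dispatch("The site feels slow today", handlers))  # general agent
```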

Three things to know before you start

Write specialist descriptions like API docs. In model-driven routing, the description is how the model decides. Be precise about what each specialist does and does not handle.
Route by cost, not just topic. Sending easy requests to a smaller model is one of the cheapest wins in this whole catalog.
Log the label. When an answer is wrong, the first question is whether the route was wrong. Store the classification so you can find out.
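Logging the label takes only a few lines with the standard library. This is a minimal sketch. The field names and the 80-character input preview are assumptions. The point is to store the route next to the request ID, so that a wrong answer can be traced back to the routing decision.

```python
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("router")

def log_route(request_id: str, text: str, label: str, model: str) -> dict:
    # One structured record per routing decision; the field names are hypothetical.
    record = {
        "request_id": request_id,
        "label": label,
        "model": model,
        "input_preview": text[:80],  # a short preview, not the full input
    }
    log.info(json.dumps(record))
    return record

log_route("req-001", "Where is my refund?", "refunds", "small-model")
```

Structured JSON lines like these can be filtered by `label` later, when you want to check whether a bad answer started with a bad route.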

FAQ

Is routing the same as the coordinator pattern?
Yes. Anthropic's routing and Google's coordinator or dispatcher pattern both classify an input and send it to one specialist.

How is routing different from parallelization?
Routing sends an input to one path. Parallelization runs several paths at once and merges the results. Routing chooses, parallelization spreads.

Do I need an LLM to do the routing?
No. The classifier can be a traditional model or algorithm if that labels your inputs accurately. Use whatever is reliable.

What happens if the router misclassifies?
The request goes to the wrong specialist and usually fails silently. Accurate classification and clear specialist boundaries are what keep this pattern safe.

Sources

Anthropic, Building Effective Agents: https://www.anthropic.com/engineering/building-effective-agents
Google Developers Blog, Developer's guide to multi-agent patterns in ADK: https://developers.googleblog.com/developers-guide-to-multi-agent-patterns-in-adk/
Google Cloud Architecture Center, Choose a design pattern for your agentic AI system: https://docs.cloud.google.com/architecture/choose-design-pattern-agentic-ai-system