DEV Community: Daathwi Naagh

What Would Gemma4 Look Like as a Human?

Daathwi Naagh — Sat, 23 May 2026 20:35:34 +0000

This is a submission for the Gemma 4 Challenge: Write About Gemma 4

I couldn't stop thinking about this question. So I built the answer.

Hear me out.

Every time a new model drops, we do the same thing. We look at the benchmarks. We run a few prompts. We compare it to the last one. We move on.

But I've been sitting with a different question lately one that I think gets closer to what's actually happening with Gemma 4:

If this model were a person, what kind of person would it be?

Not as a metaphor. As a serious design exercise. Because if you look closely at what Gemma 4 can do, really look, you'll find that Google DeepMind didn't just release a language model. They assembled something that maps, piece by piece, onto the full architecture of a human being.

A brain that thinks before it speaks. Eyes that read the world. Ears that hear any language. A mouth that answers in yours. Hands that reach out and do work. And the ability to learn, really learn from the domain you put in front of it.

Let's build this person. From scratch. One piece at a time.

The Brain — `<|think|>`

What kind of person never thinks before they speak? Not a trustworthy one.

Every person you've ever relied on a good doctor, a careful lawyer, a thoughtful friend, shares one quality: they don't just react. They deliberate. They weigh what they know, consider the edge cases, check themselves before they answer.

Gemma 4's brain works exactly this way. Drop one token into your system prompt:

<|turn>system
<|think|> You are a careful, expert reasoner.<turn|>

And before the model says a word to the user, it opens a private channel:

<|channel>thought
...weighing the possibilities...
checking edge cases...
cross-referencing what it knows...
<channel|>

This is the model talking to itself. The way you work through a hard problem in your head before saying anything out loud. Internal. Private. Honest. The user never sees it they only get the answer that survived the thinking.

The benchmarks tell you how well that thinking works. 89.2% on AIME 2026 math problems. 84.3% on GPQA Diamond — a benchmark designed to stump PhD-level experts. That's not a system that pattern-matches its way to answers. That's a system that actually reasons.

And you can tune how hard it thinks. Use a system instruction to push it toward deeper deliberation on complex problems, lighter thinking on simple ones. The docs call it "adaptive thought efficiency." A person who knows when to try hard and when to be quick.

This person thinks before they speak. That already makes them rare.

The Brain Learns — Fine-tuning

A person who can't be taught is just a statue with opinions.

Here's what separates a brilliant person from a brilliant colleague: the colleague has learned your context. Your terminology. Your domain's quirks. The way your particular community talks about the things that matter to it.

The base Gemma 4 model is brilliant but general. Fine-tuning is how it becomes yours.

LoRA attaches small trainable adapters to specific layers like installing a new module without touching the underlying architecture. The base intelligence stays intact. The specialization layers on top. Runs on a GPU most developers already own.

QLoRA shrinks the base weights first, then applies LoRA on top. Fine-tuning on a consumer GPU. A hospital can teach this person to speak their clinical documentation format. A regional newsroom can teach them their style guide.

Full fine-tuning rebuilds every layer around your domain. Reserved for when you need someone who doesn't just know your field they are your field.

A general model knows what a medical record looks like. A fine-tuned model knows what your hospital's records look like. A general model can speak Hindi. A fine-tuned model speaks your community's Hindi its idioms, its register, its warmth.

The community has already shown what this looks like at scale. Over 100,000 fine-tuned variants of the Gemma family exist today. 100,000 specialized people. Each one shaped by someone who looked at the base model and said: I can make this more useful for my corner of the world.

You can be the 100,001st.

This person doesn't just know things. They learn your things.

The Eyes — `<|image|>`

A person who can only process text is missing most of the world.

The real world isn't text. It's a handwritten note on a whiteboard. A chart in a research paper. A screenshot of a broken UI. A scanned form with faded ink. A wound on an animal in a field.

<|turn>user
Describe this image: <|image|><turn|>

That <|image|> token is where pixels become meaning. Gemma 4 handles object detection, document and PDF parsing, UI understanding, chart comprehension, OCR across languages, and handwriting recognition.

And like a human, it doesn't see everything at the same zoom level. You squint to read small print. You glance at a landscape. Gemma 4 adjusts through a configurable visual token budget:

Token budget	What it's like
70	A quick glance
280	Normal reading
1120	Leaning in, reading every word

On MMMU Pro — multimodal reasoning — the 31B scores 76.9%. On OmniDocBench for document parsing, an edit distance of 0.131. Near-perfect.

This person doesn't just read. They look.

The Ears — `<|audio|>`

A person who can't hear you has already failed half the conversation.

The E2B and E4B models — built to run on phones and laptops — have ears. Real ones.

<|turn>user
a. <|audio|>
b. <|audio|><turn|>

Pass raw audio bytes to the model and it hears what was said. Not just transcribes — understands. And translates.

Transcribe the following speech segment in Hindi,
then translate it into English.

That's the whole instruction. The model hears it, transcribes it in Hindi, renders it in English. In one pass. On one device. No network call.

On FLEURS, the E4B scores 0.08 word error rate — near-perfect speech recognition. On CoVoST for translation, 35.54 BLEU score.

Ears that work across 140 languages. Ears that handle accents. Ears that don't need the internet to function.

This person hears you — in whatever language you actually speak.

The Mouth — Text generation + TTS

Intelligence that can't communicate isn't intelligence. It's a locked room.

Gemma 4 generates text. But text is the raw material of voice. Pipe its output into any TTS engine and this person speaks — in the same 140+ languages they were trained on, delivered back in the language the question came in.

You ask in Tamil. It thinks in Tamil. It responds in Tamil. It speaks to you in Tamil.

This is what a mouth does. It takes what the brain worked out and makes it real for someone else — in the language they think in, not the language that was convenient to build for.

This person answers you in your language. Not theirs.

The Hands — Function Calling

A thinker who can't act is just a philosopher. A person with hands changes things.

A brilliant person without the ability to do anything is ultimately useless in a crisis. What makes someone powerful is that they can reach out — run a search, check a database, file a form, call a service, place an order.

Gemma 4's hands are its function calling system. Define a tool, and when the model decides it needs it, it reaches out, executes the function, reads the result, and answers naturally.

The thinking and the tool-calling are woven together. In a single agentic turn, this person can reason privately about which tool to reach for before they reach. No seams. One continuous loop of thought and action.

The full lifecycle of a person solving a problem:

Someone asks a question
They think privately about what they need
They reach out to get the information
They get it back
They answer

This person doesn't just know things. They go and find them.

Choosing Your Person: The Versions of Gemma 4

Here's the part that makes Gemma 4 genuinely unusual: this person comes in four sizes, running on everything from a mid-range phone to a workstation. Same DNA. Different scale.

	E2B	E4B	26B A4B (MoE)	31B Dense
Lives on	Phone	Laptop / tablet	Consumer GPU	Workstation
RAM needed	~4 GB	~8 GB	~14 GB	~19 GB
Eyes	✅	✅	✅	✅
Ears	✅ Native	✅ Native	❌	❌
Context window	128K	128K	256K	256K
Architecture	Dense	Dense	MoE (4B active)	Dense
Personality	Quick, offline, multilingual voice	Voice + vision, portable	Fast thinker, production-ready	Deep thinker, thorough
MMLU Pro	60.0%	69.4%	82.6%	85.2%
AIME 2026	37.5%	42.5%	88.3%	89.2%
Codeforces ELO	633	940	1,718	2,150

The E2B is the field version — ears, eyes, voice, no internet required. 4 GB of RAM. Runs on a mid-range phone. When the person using your app has one hand occupied and needs an answer in thirty seconds, this is the one.

The 26B A4B is the everyday version — nearly as capable as the 31B, but runs almost as fast as a 4B model because only 3.8B parameters activate during inference. The sweet spot for most production use cases. Start here.

The 31B is the deep thinker — when correctness matters more than speed. Medical reasoning. Legal analysis. Complex multi-step problems. Give it time and it will reason its way through things the smaller versions would stumble on.

The Complete Person

Put all the pieces together and here's who you've built:

Human quality	Gemma 4 equivalent
Thinks before speaking	Thinking mode — private reasoning channel
Learns your domain	Fine-tuning — LoRA, QLoRA, full weights
Sees the world	Image tokens — vision, OCR, documents, handwriting
Hears you	Audio tokens — speech recognition + translation, 140+ languages
Speaks your language	Text generation → TTS → any language, any voice
Does things	Function calling — agentic action in the world
Remembers context	Up to 256K token context window
Belongs to you	Apache 2.0 — no rent, no terms change, no vendor lock-in

What This Person Can Do That You Can't

They remember everything. 256,000 tokens of active working memory. An entire codebase. A five-year medical history. A full legal archive. All in context, all at once.

They speak 140 languages natively. Trained on them from the ground up — not translated into, but grown from.

They never have a bad day. Never tired, never defensive, never carrying yesterday's frustration into today's conversation. Thinks harder when you ask. Lighter when you don't need it.

They're unconditionally yours. Not rented. Not metered by the query. Apache 2.0 means you can take the weights, fine-tune them, deploy them, build a business on them. No one can change the terms on you next quarter.

The Last Question

Here's the thing about building a person, even a digital one.

The body is the easy part. Brain, eyes, ears, mouth, hands — those are engineering problems. Gemma 4 solved them. Beautifully.

The hard part is the question that comes after: what does this person do with all of that?

A doctor who can't afford a cloud subscription but can run a local model that reads scans, hears patient descriptions in their local language, and reasons carefully before it speaks. A teacher in a school with no reliable internet, whose AI assistant lives on a tablet and never drops the connection. A developer building an agent that thinks before it acts, reaches out to the right tools, and reports back in the language its users actually speak.

The box is open. The pieces — brain, learning, eyes, ears, mouth, hands — are all there. I have built TharVA: An Offline, Mobile based AI Assistant for camel herders of Rural Rajasthan using Gemma4

So let me ask you what I keep asking myself:

If you could build this person for your community, your domain, your language, what would they do?

📖 Gemma 4 docs — ai.google.dev/gemma/docs
🤗 Download Gemma 4 — Hugging Face

Everything is a prompt. Everything is possible. Start building.

An offline multilingual AI assistant built with Gemma 4 for camel herders in Rajasthan. Voice, vision, local language support, grounded knowledge, and real-world usability designed for regions with low connectivity and limited digital access.

Daathwi Naagh — Sat, 23 May 2026 07:09:19 +0000

Gemma 4 Challenge: Build With Gemma 4 Submission

Daathwi Naagh

May 22

TharVA : Keeping India's Desert Heritage Alive with Offline AI (Gemma4)

#discuss #devchallenge #gemmachallenge #gemma

5 min read

TharVA : Keeping India's Desert Heritage Alive with Offline AI (Gemma4)

Daathwi Naagh — Fri, 22 May 2026 16:22:18 +0000

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

TharVA : Thar Virtual Assistant
A mobile-first, fully offline, multilingual AI assistant for camel herders in the Thar Desert.

Not a general assistant. Not a chatbot.
A field tool built specifically for Camel herders in rural Rajasthan who raise camels in one of the world's harshest environments, have no reliable internet, work with their hands, and need answers in Hindi or any other language, fast, when something goes wrong with an animal.

The spark came from time I spent in Bikaner, talking to Ashok Bishnoi, a social entrepreneur near the National Research Centre on Camel in Jorbeer and to Raika Community camel keepers whose generational knowledge of camel behavior, calving, and desert survival isn't written down anywhere accessible. What they lacked wasn't expertise. It was fast access to reliable guidance at the right moment.

One conversation stayed with me:

A calf had been rejected by its mother.
A time-critical emergency where the first hours determine survival.

The formal channels couldn't give clear enough answers fast enough. What actually helped was a Raika Community elder who had seen it before and knew exactly what to do.

TharVA is an attempt to make that knowledge reachable in a field, with no signal, in Hindi / any language, with one hand free.

The Two Interaction Modes

It has two interaction modes, built around how field users actually work:

Quick Call — Voice-in, voice-out. Hold a button, speak your question, hear a short direct answer. Streaming generation so TTS begins before the full response finishes. Designed for when you're standing next to a distressed animal and have thirty seconds, not three minutes.

Detailed Chat — Text or Voice input, with image support. Attach a photo of a camel's wound or a skin condition. Get a thorough, structured response. Same model, different prompt, different temperature completely different feel.

Answers are grounded in curated camel husbandry reference material from actual veterinary literature and NRCC research, injected into the system instruction at session start. The model isn't improvising from general training data. It knows the domain because it was given the domain.

How I Used Gemma 4

Model chosen: Gemma 4 E2B

Not the 26B.
Not the 31B.
The smallest one in the family — and that was entirely intentional.

The people TharVA is built for don't have high-end phones or reliable connectivity. The rule I held myself to for the entire build was: if it doesn't load and respond reliably on a mid-range Android phone in realistic conditions, nothing else matters.

The E2B — 2.3 billion effective parameters, running on as little as 4 GB of RAM, is the only model in the Gemma 4 family that makes that possible while still being genuinely capable. I have set the context length to 4096 tokens which shaped all the technical decisions I have made.

The entire inference stack runs on-device through the flutter_gemma package, wrapping Google AI Edge's LiteRT-LM runtime.

No cloud API.
No data leaving the phone.
No signal required.
For a community where privacy matters and internet is genuinely unreliable.

Offline-first wasn't a feature preference, it was the baseline.

TharVA's Application Architecture

Multimodal is no longer a premium feature

Ears (<|audio|>) — Voice input bypasses device-level speech recognition entirely. I record audio as a raw WAV file (PCM 16kHz, 16-bit, mono) and pass the bytes directly to the model. This removed the requirement to pre-install language packs through obscure settings menus that field users would never find. Unexpectedly, the E2B handled local Hindi accents and regional speech patterns from around Bikaner better than device-level ASR did. Voice input that understands your accent is voice input people will actually use.

Eyes (<|image|>) — Users can photograph a wound, a skin condition, or an animal's posture and include that in their question. I capped image support at one image per turn — a deliberate product decision, not a temporary limitation. Allowing multiple images per turn caused context overflow failures mid-conversation that were impossible to handle gracefully in the field. One image per turn gives stable, predictable behavior under real conditions.

Brain (<|think|> + system prompt) — Quick Call and Detailed Chat use the same weights but entirely different system prompts and temperatures. Quick Call prompts bias heavily toward short, direct outputs with lower sampling temperature. Detailed Chat allows longer, structured responses. The model adapts its behavior completely based on what the prompt asks. same brain, different mode.

Mouth / Vocal (streaming TTS) — I use generateChatResponseAsync() to feed tokens into text-to-speech as they arrive. The user starts hearing the response before generation finishes. Without streaming, you wait for full generation then wait for TTS. With streaming, those processes overlap. The perceived latency difference in Quick Call is the difference between an app that feels usable and one that feels broken.

Grounded knowledge (context injection) — Curated camel husbandry reference text is loaded into the system instruction at session start, truncated to a fixed character budget. Every per-turn input is kept lean, a language reminder, an optional location/battery prefix, and the actual question. The knowledge base is in context from the start without consuming fresh tokens on every turn. This was forced by the 4,096-token on-device context limit, which is the real constraint that shaped almost every other technical decision.

Multilingual behavior — Any language or mode change triggers a full session reset: close the inference session, rebuild the system prompt with fresh language reminders, start fresh. Without hard resets, KV cache state bleeds between contexts, the wrong script, the wrong tone, the wrong response length in ways that undermine trust in the app entirely.

The hardest engineering in this project was completely unglamorous: a download recovery system that detects partial model files and restarts cleanly, a runtime compatibility fix for a silent mismatch between the LiteRT-LM version and the updated Hugging Face artifact format, and a turn cap that forces session rotation before context overflow causes silent failures mid-conversation.

None of that shows up in a demo. All of it determines whether the app actually works in a field in Rajasthan.

Code

https://github.com/daathwi/TharVA

The Raika community have kept camels alive in the Thar Desert for centuries. They don't need an AI to tell them what they already know. What TharVA tries to do is make the knowledge that exists in community memory and veterinary literature reachable at the moment when someone needs it with no signal, in Hindi, with one hand free.

That's a narrow goal. I think it's the right one.

Google ADK vs LangGraph — The Definitive Comparison for Agent Builders

Daathwi Naagh — Fri, 10 Apr 2026 01:11:39 +0000

Choosing the wrong framework doesn't just slow you down. It shapes every architectural decision that follows. Here's how to get it right.

Why This Decision Matters More Than You Think

The agent framework you pick isn't just a tooling choice. It determines:

How you model state and orchestration
How you debug when things go wrong in production
What cloud infrastructure you're implicitly committing to
How fast your team can move from prototype to deployment

Both LangGraph and Google ADK are serious, production-capable frameworks. But they have fundamentally different philosophies — and that gap matters enormously depending on what you're building.

Let's go deep.

At a Glance

Dimension	LangGraph	Google ADK
Released by	LangChain Team	Google (Google Cloud NEXT, April 2025)
Core philosophy	Graph-based state machines	Code-first, hierarchical agent trees
Abstraction level	Low-level, explicit control	Higher-level, batteries-included
Model support	Fully model-agnostic	Optimized for Gemini, but model-agnostic
Cloud tie-in	Deploy anywhere	Native Vertex AI / GCP integration
Learning curve	Medium–High	Medium
Best for	Precision, auditability, complex flows	Speed, multi-agent systems, GCP environments
Observability	LangSmith / Langfuse	OpenTelemetry-native
State management	Built-in checkpointing + time travel	Session state with pluggable backends
Streaming	Per-node token streaming	Bidirectional audio/video + text (Gemini Live API)
Production maturity	High (battle-tested)	Early–Medium (growing fast)

Philosophy: Where They Diverge Fundamentally

LangGraph — "You Are the Architect"

LangGraph is an extension of LangChain that treats your agent as a directed graph (or DAG). Every step, every branch, every loop — you define it explicitly.

You construct nodes (LLM calls, tool calls, custom logic) and edges (transitions, conditions, cycles). The agent's entire execution path is a graph you designed.

# LangGraph — you define the graph explicitly
from langgraph.graph import StateGraph

builder = StateGraph(AgentState)
builder.add_node("classify", classify_intent)
builder.add_node("search", run_search)
builder.add_node("respond", generate_response)

builder.add_conditional_edges("classify", route_by_intent, {
    "search_needed": "search",
    "direct": "respond"
})
builder.add_edge("search", "respond")
builder.set_entry_point("classify")

graph = builder.compile(checkpointer=MemorySaver())

This is powerful. And demanding. You own the architecture completely.

Google ADK — "You Define the Agents, ADK Handles the Rest"

ADK treats agents as hierarchical tree structures. A root agent delegates to specialized sub-agents. Orchestration is handled through pattern primitives: SequentialAgent, ParallelAgent, LoopAgent.

# ADK — declare agents and their roles, ADK orchestrates
from google.adk.agents import LlmAgent, SequentialAgent

research_agent = LlmAgent(
    name="researcher",
    model="gemini-2.5-flash",
    instruction="Research the given topic thoroughly.",
    tools=[google_search]
)

writer_agent = LlmAgent(
    name="writer",
    model="gemini-2.5-flash",
    instruction="Write a concise summary based on research.",
)

pipeline = SequentialAgent(
    name="research_pipeline",
    sub_agents=[research_agent, writer_agent]
)

ADK provides the scaffolding. You define the logic, roles, and tools. The framework manages context, routing, state, and lifecycle.

Deep Dive: Feature by Feature

1. Orchestration Model

LangGraph uses a graph model — nodes and edges. It shines when your workflow has:

Complex conditional branching
Loops and retries with custom exit conditions
Parallel branches that must merge at specific points
Precise control over which step runs when

ADK uses a hierarchical agent tree. It shines when your workflow looks like:

A root "manager" agent that delegates to specialists
Tasks that can run sequentially or in parallel by design
Multi-agent workflows where each agent has a clear, encapsulated role

The key difference: LangGraph models flow as a graph. ADK models flow as a team.

2. State Management

This is where LangGraph has a significant technical edge for complex use cases.

LangGraph has built-in checkpointing — state is persisted at every node. This enables:

Time travel debugging: replay your agent from any prior state
Resuming interrupted runs
Human-in-the-loop flows (pause, wait for approval, continue)
Fault tolerance in long-running workflows

# LangGraph time-travel: rewind to any past state
config = {"configurable": {"thread_id": "abc123"}}
graph.update_state(config, {"messages": [...]}, as_node="classify")

ADK manages state through Session objects — short-term state per conversation, with pluggable backends for longer-term memory. It's cleaner for conversational flows and multi-session memory, but doesn't natively offer the time-travel / checkpoint replay that LangGraph does.

Winner for complex state: LangGraph. Winner for conversational memory across sessions: ADK.

3. Multi-Agent Systems

Both frameworks support multi-agent architectures, but they approach it very differently.

LangGraph: You build multi-agent systems by composing graphs. One graph can invoke another as a subgraph. Communication between agents is via shared state passed through the graph. It's powerful but requires you to design the topology explicitly.

ADK: Multi-agent is a first-class primitive. ADK is explicitly designed for hierarchical agent teams. Sub-agents can be:

Invoked sequentially (SequentialAgent)
Invoked in parallel (ParallelAgent)
Looped until a condition is met (LoopAgent)
Called as tools by a root agent (AgentTool)

ADK also supports Agent2Agent (A2A) Protocol — a standardized interface allowing ADK agents to call agents built in LangGraph, CrewAI, or other frameworks. This is a major interoperability win.

# ADK: Run flight and hotel agents IN PARALLEL
from google.adk.agents import ParallelAgent

booking_pipeline = ParallelAgent(
    name="booking_pipeline",
    sub_agents=[flight_agent, hotel_agent]  # runs concurrently
)

Winner for multi-agent-first design: ADK.

4. Observability and Debugging

LangGraph integrates tightly with LangSmith (and Langfuse via callbacks). You get:

Step-by-step trace of every node execution
Token usage per node
Visual graph replay of agent runs
LangGraph Studio: visual debugging UI

ADK is built with OpenTelemetry natively. This means:

Plugs into any OTel-compatible backend (Jaeger, Grafana, Datadog, etc.)
One-click integrations with Langfuse and other LLM observability platforms
Built-in evaluation framework for both final responses and intermediate steps
Visual Web UI + CLI for local debugging
When deployed on Vertex AI: Cloud Trace integration out of the box

LangGraph's edge: LangSmith is mature and deeply integrated.
ADK's edge: OpenTelemetry-first avoids vendor lock-in.

5. Tool Ecosystem

LangGraph/LangChain has a massive, mature ecosystem — thousands of pre-built integrations, tools, and chains built over years. It's hard to beat for breadth.

ADK brings:

Pre-built tools: Google Search, Code Execution, BigQuery, AlloyDB
MCP (Model Context Protocol) tool support
LangChain tools usable inside ADK (interoperability)
Other agent frameworks (CrewAI, LangGraph agents) usable as tools
Support for 200+ models via LiteLLM

ADK's tool interoperability story is strong — it can consume LangChain tools, which largely closes the ecosystem gap.

6. Deployment

LangGraph: Deploy anywhere. Containerize your graph and run it on any infrastructure. LangGraph Cloud (managed service) available for scale. Truly cloud-agnostic.

ADK: "Deploy anywhere" is the stated goal, and it works — but the native experience is GCP:

One-command deploy to Vertex AI Agent Engine
Native Cloud Run, GKE support
Managed sessions, auth, and tracing on Vertex AI automatically

If you're on GCP, ADK's deployment story is a genuine competitive advantage. If you're on AWS, Azure, or self-hosted, LangGraph is simpler.

7. Streaming

LangGraph: Per-node token streaming. Standard LLM streaming, solid and reliable.

ADK: Bidirectional audio and video streaming via the Gemini Live API. This is unique — no other major framework natively supports this. For voice agents, customer support bots, or multimodal applications, ADK is in a different league here.

8. Developer Experience

LangGraph feels like a graph DSL — powerful, but you're working at a lower abstraction level. It rewards engineers who want transparency and deterministic behavior. The cost: more boilerplate, steeper learning curve, fragmented documentation.

ADK feels like a full-stack Python application framework — Web UI, CLI, API server, test harness, deploy pipelines, all included. It rewards engineers who want to move fast and think in terms of agents and roles rather than nodes and edges.

The Honest Tradeoffs

LangGraph Strengths

Unmatched precision and control over execution flow
Best-in-class state checkpointing and time-travel debugging
Mature ecosystem, battle-tested in production
Truly model-agnostic and cloud-agnostic
Excellent for compliance-heavy environments (every decision is auditable)

LangGraph Weaknesses

Verbose code for straightforward multi-agent patterns
Steeper learning curve — graph thinking isn't intuitive for all teams
Documentation can be fragmented
No native multimodal streaming

ADK Strengths

Fastest path to hierarchical multi-agent systems
Built-in evaluation, Web UI, CLI — production-grade DX out of the box
Native A2A protocol for cross-framework agent interoperability
OpenTelemetry-native observability (no vendor lock-in)
Best multimodal/streaming support of any major framework
Actively backed by Google, powering internal products (Agentspace, Customer Engagement Suite)

ADK Weaknesses

Newer — less production battle-testing than LangGraph
GCP ecosystem makes it awkward outside Google Cloud
Less fine-grained control than LangGraph for complex cyclical flows
Gemini optimization means other models are second-class (though supported)

Decision Guide: When to Choose What

Choose LangGraph when...

You need surgical precision over every execution step.
Compliance systems, financial workflows, healthcare automation — any domain where you must prove exactly what happened and why.

Your workflow has complex, custom loops and branching logic.
Non-standard patterns that don't fit "sequential" or "parallel" — LangGraph lets you model any flow you can imagine.

You're building long-running tasks that must survive interruptions.
Checkpointing + resume is a LangGraph superpower. Multi-day agent runs, workflows requiring human approval mid-execution.

You're multi-cloud or cloud-agnostic.
If AWS, Azure, or self-hosted infrastructure is non-negotiable, LangGraph is the frictionless path.

Your team already knows LangChain.
The ecosystem familiarity is a real productivity advantage.

You need the widest model support without friction.
OpenAI, Anthropic, Mistral, local models — all first-class citizens.

Choose Google ADK when...

You're building on Google Cloud (Vertex AI, GCP).
One-command deployment, managed sessions, Cloud Trace — the native experience is genuinely excellent.

Speed to production matters more than architectural customization.
ADK's batteries-included approach gets you from prototype to deployed agent faster than anything else.

You're building hierarchical multi-agent systems.
Agent teams with clear roles and delegation are ADK's native strength.

You need multimodal or voice agents.
Bidirectional audio/video streaming via Gemini Live API is uniquely available here.

You want cross-framework agent interoperability via A2A.
If your org is mixing ADK agents, LangGraph agents, and CrewAI agents — the A2A protocol makes ADK the best orchestration hub.

Your model choice is Gemini (or you want access to Gemini 3 Pro/Flash).
ADK and Gemini are deeply co-designed. You'll get the best performance, streaming, and tooling here.

Situational Cheatsheet

Situation	Recommendation
Compliance/audit-critical workflow	LangGraph
GCP-native enterprise deployment	ADK
Complex custom loops and cycles	LangGraph
Multi-agent delegation with clear roles	ADK
Long-running tasks with resume/replay	LangGraph
Voice/multimodal agents	ADK
AWS or Azure infrastructure	LangGraph
Fast prototyping to production	ADK
You need every model under the sun	LangGraph
Google Gemini is your primary model	ADK
HITL (Human-in-the-loop) workflows	LangGraph
Cross-framework agent interop (A2A)	ADK
Team prefers explicit flow control	LangGraph
Team prefers role-based agent design	ADK

Can You Use Both?

Yes — and increasingly, teams do.

ADK can treat a LangGraph agent as an AgentTool. LangGraph can call ADK-built agents as subgraphs via API. With MCP and A2A protocol support in ADK, the two frameworks are becoming interoperable rather than mutually exclusive.

A pragmatic architecture some teams use:

Root Orchestrator (ADK — hierarchical multi-agent)
├── Research Agent (ADK — Google Search, BigQuery)
├── Processing Agent (LangGraph — complex stateful loop)
│   ├── Validate Node
│   ├── Transform Node
│   └── Retry Node (with checkpointing)
└── Output Agent (ADK — Gemini streaming response)

Use each where it's strongest.

Final Verdict

LangGraph is the engineer's framework. It gives you the surgical control, auditability, and state management that complex production systems demand. You pay for it in learning curve and boilerplate. For compliance-heavy, custom-flow, cloud-agnostic workloads — it's the right tool.

ADK is the product team's framework. It's fast, cohesive, and opinionated in the right ways. Multi-agent orchestration is a first-class citizen, deployment is frictionless on GCP, and the multimodal streaming story is unmatched. For hierarchical agent teams, GCP environments, and teams that want to move fast — it's compelling and only getting better.

The framework you pick should match your workflow pattern, your infrastructure, and your team's mental model. Neither is universally better.

Pick the one that makes your specific problem easier to model — then go build.

Have you shipped production agents on either? I'd be curious what failure modes you've hit in practice — that's where the real framework comparison happens.

100s of Tools in Your Agent — Here's How to Actually Pick the Right One

Daathwi Naagh — Fri, 10 Apr 2026 00:49:45 +0000

The Problem Nobody Talks About

You've built an agent. You've wired up 100+ tools. You feel good about it.

Then it starts hallucinating. Picking the wrong tool. Collapsing entire workflows over a single misclassified query.

The failure isn't the LLM. It's the architecture.

My previous post covered a genuine use case I found for Gemma4 — and this is exactly where it fits in.

The Naive Approach (and Why It Fails)

Load all tools into LLM context and let it decide.

Sounds simple. It is. And it breaks at scale.

Hallucinations increase with context length
Bloated context = slower, more expensive calls
LLM gets confused choosing between 50+ tool descriptions

Result: Slow. Unreliable. Expensive agents.

The "Slightly Smarter" Approach (Still Broken)

RAG over tool descriptions. Seems reasonable:

User query → embedding → top 5 matches → LLM picks

Sounds clean. But embeddings can't distinguish intent.

Similar words ≠ same meaning.

Example:

User says: "I need iPhone"
Tool: check_product_catalog

Embedding search may:

Miss the tool completely
Retrieve irrelevant tools
Break the entire downstream workflow

Wrong tool gets selected. Entire workflow collapses.

The problem isn't the embedding model. It's that you're asking a single layer to carry too much responsibility.

What Actually Worked: A Layered Filtering Stack

The approach that works in production is layered filtering — not pure semantic search, not raw LLM reasoning. Both together, in the right order.

Here's the stack I use:

Layer	Tool	Role
Intent Classification	`gemma4:e4b` via Ollama (9.6 GB, local)	Fast, private intent routing
Semantic Search	`nomic-embed-text` via Ollama (274 MB)	Embedding over filtered subset

The 5-Step Architecture

Step 1 — Classify Intent First

A lightweight LLM maps the query to a high-level category before any search happens.

This eliminates entire irrelevant domains upfront. If the user is asking about orders, you never even look at inventory tools.

"I need iPhone" → category: product_discovery

Step 2 — Hard Filter by Metadata

Deterministic rules. Not embeddings.

Only tools matching the classified intent category are eligible. The search space collapses from 100+ tools to maybe 10–15.

category: product_discovery → eligible tools: [check_catalog, search_products, get_inventory, ...]

Step 3 — Semantic Search Within the Clean Subset

Now RAG works — because it's running over a small, relevant set, not noisy hundreds.

nomic-embed-text finds the closest semantic matches within your filtered pool. False positives drop dramatically.

Step 4 — Score and Rank

Confidence scoring on the top candidates. Auditable. Explainable.

No black box decisions. You can log exactly why a tool was selected — which matters when things break in production.

Step 5 — LLM Final Pick

Send the top candidates + the original user query to the LLM.

Now it's choosing between 3–5 relevant tools, not 100+. The context is clean. The decision is accurate.

[check_product_catalog, search_products, get_item_details] + "I need iPhone"
→ LLM picks: check_product_catalog

Why This Works

Each layer does one job. No single layer carries too much responsibility.

Intent classifier  → reduces domain space
Metadata filter    → reduces tool count (deterministic, fast)
Semantic search    → finds closest match in clean subset
Scoring            → adds confidence + auditability
LLM               → makes final call with minimal context

That's why it's reliable. That's why it's fast.

The Part No One Talks About

Even the best architecture fails if your tool descriptions are written like API docs.

# Bad — written for engineers
def check_product_catalog(sku: str, region: str) -> dict:
    """Queries product catalog by SKU with region filtering."""

# Good — written for users
def check_product_catalog(...):
    """
    Use this when someone wants to find a product, check if something 
    is available, look up an item by name or model, or browse what's 
    in stock. Works for queries like 'do you have iPhone 15?' or 
    'show me your laptop options'.
    """

Write descriptions in the language your users actually speak.

Think about how someone types a message to an agent — not how an engineer names a function.

The system learns from the language you give it.

Results

Metric	Value
End-to-end latency (all 4 steps)	< 2 seconds
Tool selection accuracy	Significantly higher than pure RAG
Model infra	Fully local, private
Bottleneck	1000+ concurrent users (next problem to solve)

The Real Lesson

We don't need smarter models.

We don't need infinite context windows.

We need better system design and the discipline to think like a user.

A layered architecture with good tool descriptions, running on lightweight local models, will outperform a bloated LLM context every time.

That's the real work.

Built this in production. Happy to discuss the concurrent scaling problem — that's a different beast entirely.

Found this useful? Follow for more on agent architecture and production ML systems.

DEV Community: Daathwi Naagh

What Would Gemma4 Look Like as a Human?

The Brain — <|think|>

The Brain Learns — Fine-tuning

The Eyes — <|image|>

The Ears — <|audio|>

The Mouth — Text generation + TTS

The Hands — Function Calling

Choosing Your Person: The Versions of Gemma 4

The Complete Person

What This Person Can Do That You Can't

The Last Question

An offline multilingual AI assistant built with Gemma 4 for camel herders in Rajasthan. Voice, vision, local language support, grounded knowledge, and real-world usability designed for regions with low connectivity and limited digital access.

TharVA : Keeping India's Desert Heritage Alive with Offline AI (Gemma4)

TharVA : Keeping India's Desert Heritage Alive with Offline AI (Gemma4)

What I Built

The Two Interaction Modes

How I Used Gemma 4

TharVA's Application Architecture

Multimodal is no longer a premium feature

Code

Google ADK vs LangGraph — The Definitive Comparison for Agent Builders

Why This Decision Matters More Than You Think

At a Glance

Philosophy: Where They Diverge Fundamentally

LangGraph — "You Are the Architect"

Google ADK — "You Define the Agents, ADK Handles the Rest"

Deep Dive: Feature by Feature

1. Orchestration Model

2. State Management

3. Multi-Agent Systems

4. Observability and Debugging

5. Tool Ecosystem

6. Deployment

7. Streaming

8. Developer Experience

The Honest Tradeoffs

LangGraph Strengths

LangGraph Weaknesses

ADK Strengths

ADK Weaknesses

Decision Guide: When to Choose What

Choose LangGraph when...

Choose Google ADK when...

Situational Cheatsheet

Can You Use Both?

Final Verdict

100s of Tools in Your Agent — Here's How to Actually Pick the Right One

The Problem Nobody Talks About

The Naive Approach (and Why It Fails)

The "Slightly Smarter" Approach (Still Broken)

What Actually Worked: A Layered Filtering Stack

The 5-Step Architecture

Step 1 — Classify Intent First

Step 2 — Hard Filter by Metadata

Step 3 — Semantic Search Within the Clean Subset

Step 4 — Score and Rank

Step 5 — LLM Final Pick

Why This Works

The Part No One Talks About

Results

The Real Lesson

The Brain — `<|think|>`

The Eyes — `<|image|>`

The Ears — `<|audio|>`