<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Amit Maraj</title>
    <description>The latest articles on DEV Community by Amit Maraj (@agenticamit).</description>
    <link>https://dev.to/agenticamit</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3652060%2Fe9c989f4-c531-4cc8-9d61-eb552f2f60d8.png</url>
      <title>DEV Community: Amit Maraj</title>
      <link>https://dev.to/agenticamit</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/agenticamit"/>
    <language>en</language>
    <item>
      <title>Agent Factory Recap: Building with Gemini 3, AI Studio, Antigravity, and Nano Banana</title>
      <dc:creator>Amit Maraj</dc:creator>
      <pubDate>Thu, 04 Jun 2026 13:04:00 +0000</pubDate>
      <link>https://dev.to/googleai/agent-factory-recap-building-with-gemini-3-ai-studio-antigravity-and-nano-banana-186h</link>
      <guid>https://dev.to/googleai/agent-factory-recap-building-with-gemini-3-ai-studio-antigravity-and-nano-banana-186h</guid>
      <description>&lt;p&gt;Welcome back to &lt;a href="https://www.youtube.com/playlist?list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs" rel="noopener noreferrer"&gt;The Agent Factory!&lt;/a&gt; This week, we went beyond the hype to dissect the technical details of Google's massive wave of AI releases. We were joined by &lt;strong&gt;Paige Bailey&lt;/strong&gt;, the UTL for Developer Relations at DeepMind, to break down everything from the new &lt;a href="https://blog.google/products/gemini/gemini-3/" rel="noopener noreferrer"&gt;Gemini 3&lt;/a&gt; model to the &lt;a href="https://antigravity.google/" rel="noopener noreferrer"&gt;Antigravity IDE&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Google has been shipping at a breakneck pace—literally a new model or feature nearly every day—and this episode is all about how developers can harness these tools right now.&lt;/p&gt;

&lt;p&gt;This post guides you through the key ideas from our conversation. Use it to quickly recap topics or dive deeper into specific segments with links and timestamps.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz46kac6059k1ado53olm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz46kac6059k1ado53olm.png" width="800" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Tech Stack - What is it?
&lt;/h2&gt;

&lt;p&gt;We tossed around a few new names in this episode. Here is a quick primer on the tech discussed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://blog.google/products/gemini/gemini-3/" rel="noopener noreferrer"&gt;Gemini 3&lt;/a&gt;: The latest iteration of Google's model family. While Gemini 1 was about understanding and Gemini 2 was about reasoning, Gemini 3 is designed for &lt;strong&gt;acting and coding&lt;/strong&gt;. It features improved tool use and function calling.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://antigravity.google/" rel="noopener noreferrer"&gt;Antigravity&lt;/a&gt;: Google's new AI-native IDE (Integrated Development Environment) designed to integrate Gemini 3 directly into the coding workflow, allowing for multimodal inputs like screenshots to drive code changes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://blog.google/technology/ai/nano-banana-pro/" rel="noopener noreferrer"&gt;Nano Banana Pro&lt;/a&gt;: The newest iteration in the media generation series, capable of creating high-fidelity images, voxel art, and game assets.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Factory Floor
&lt;/h2&gt;

&lt;p&gt;The Factory Floor is our segment for getting hands-on. Here, we moved from high-level concepts to practical code with live demos.&lt;/p&gt;

&lt;h3&gt;
  
  
  Building "Nordic Shield" with Gemini 3
&lt;/h3&gt;

&lt;p&gt;Timestamp: &lt;a href="https://youtu.be/JKW8InX3mdQ?t=681" rel="noopener noreferrer"&gt;11:20&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Paige demonstrated the "Build" feature in AI Studio to create a complex React application from scratch. The goal was to test the model's ability to self-correct and handle specific design constraints.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Prompt:&lt;/strong&gt; Create an insurance cataloging app using the webcam and microphone. It needed a "Nordic/IKEA" design theme, an inventory list, and the ability to estimate item value using Google Search grounding.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Process:&lt;/strong&gt; Gemini 3 generated a React Native app, set up the directory structure, and wrote its own prompts for the agents.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Result:&lt;/strong&gt; The app, named "Nordic Shield," successfully cataloged items (like a Pixel 7 and a soda can) via video. When it encountered audio issues, it generated a reasoning trace to debug the problem live. It successfully utilized &lt;strong&gt;Gemini Live&lt;/strong&gt; for the conversation and executed a secondary "agentic" step to search Google for the estimated value of the items.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Redesigning a Website with Antigravity
&lt;/h3&gt;

&lt;p&gt;Timestamp: &lt;a href="https://youtu.be/JKW8InX3mdQ?t=1827" rel="noopener noreferrer"&gt;30:27&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F05624txssc1vr68ljcrn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F05624txssc1vr68ljcrn.png" width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We shifted gears to look at Google's new IDE, Antigravity. The goal was to update an existing, text-heavy website to match a new, vibrant "neo-brutalist" design aesthetic using only screenshots as a guide.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Input:&lt;/strong&gt; The existing codebase plus two screenshots of the desired visual style (doodly, pastel, notebook-esque).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Implementation:&lt;/strong&gt; Antigravity analyzed the images to understand the design philosophy. It created a task list and an implementation plan to ensure it stayed grounded.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Outcome:&lt;/strong&gt; The IDE successfully refactored the site to match the brand guidelines, introducing "jiggling pill" UI elements and updating the color palette to match the provided screenshots perfectly.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Paige Bailey on The Evolution of Gemini
&lt;/h2&gt;

&lt;p&gt;We sat down with Paige to understand how DeepMind is approaching the rapid evolution of their models and what it means for developers building agents today.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Three Stages of Gemini
&lt;/h3&gt;

&lt;p&gt;Timestamp: &lt;a href="https://youtu.be/JKW8InX3mdQ?t=169" rel="noopener noreferrer"&gt;2:49&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feb2b9kag6zyvzcilbnqa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feb2b9kag6zyvzcilbnqa.png" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Paige outlined the clear evolutionary path of the Gemini family. She explained that the original Gemini was focused on &lt;strong&gt;multimodal understanding&lt;/strong&gt; (video, text, audio). Gemini 2 introduced &lt;strong&gt;thinking&lt;/strong&gt;—the ability to reason and plan step-by-step. Gemini 3, the current iteration, is all about &lt;strong&gt;acting&lt;/strong&gt;. This model is optimized for acting on its reasoning, specifically through coding and tool use, allowing for composite architectures where models work together rather than in isolation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pre-Training vs. Post-Training
&lt;/h3&gt;

&lt;p&gt;Timestamp: &lt;a href="https://youtu.be/JKW8InX3mdQ?t=295" rel="noopener noreferrer"&gt;4:55&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We discussed the "schooling" of these models. Paige used a great analogy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pre-training&lt;/strong&gt; is like sending the model to school. It involves giving Gemini access to massive amounts of tokens (internet data, synthetic data, video game footage) to learn the basics.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Post-training&lt;/strong&gt; is "on-the-job experience." This is where DeepMind provides specific, hand-curated examples of complex workflows, such as multi-turn conversations that involve editing websites or using multiple tools to accomplish a single task.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The "Vending Bench"
&lt;/h3&gt;

&lt;p&gt;Timestamp: &lt;a href="https://youtu.be/JKW8InX3mdQ?t=408" rel="noopener noreferrer"&gt;6:48&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Benchmarks are changing. Paige introduced us to a fascinating new evaluation metric called &lt;strong&gt;Vending Bench&lt;/strong&gt;. This test gauges a model's ability to run a passive business—specifically, a vending machine. The model must figure out stock, reorder items, deploy restockers, and do long-range planning to maximize uptime. The score is determined by how much profit the model generates in a year. Currently, Gemini 3 Pro is generating around &lt;strong&gt;$5,462&lt;/strong&gt; per machine, showing significant improvements in long-term strategic decision-making.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creative Multimodality with Nano Banana
&lt;/h3&gt;

&lt;p&gt;Timestamp: &lt;a href="https://youtu.be/JKW8InX3mdQ?t=1714" rel="noopener noreferrer"&gt;28:34&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff63eiip1gdu8b7od1qhx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff63eiip1gdu8b7od1qhx.png" width="800" height="646"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We also touched on the creative side of the stack. Paige highlighted that when you combine reasoning with multimodal outputs, the possibilities explode. She shared examples of Nano Banana Pro being used to generate game assets, orthographic blueprints for 3D modeling (like castles), and detailed physics explainers. The key takeaway is the power of combining these media models with search grounding to create accurate, high-fidelity visual assets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;It is incredible to see not just the models, but the entire ecosystem Google is building—from the hardware to the IDEs like Antigravity. The ability to deploy these agents directly to Google Cloud with a single click bridges the gap between a cool demo and a production-ready application.&lt;/p&gt;

&lt;p&gt;As Paige mentioned, the trajectory is exponential. Whether you are building passive businesses or complex coding agents, the tools are ready.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your turn to build
&lt;/h2&gt;

&lt;p&gt;If you haven't yet, head over to &lt;strong&gt;AI Studio&lt;/strong&gt; or try out the &lt;strong&gt;Gemini API&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;Try the "Vending Bench" challenge yourself—can you build an agent that runs a better business than Gemini 3? &lt;/p&gt;

&lt;p&gt;Let us know what you build!&lt;/p&gt;

&lt;h2&gt;
  
  
  Connect with us
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Amit Maraj → &lt;a href="https://www.linkedin.com/in/amit-maraj/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/agenticamit" rel="noopener noreferrer"&gt;X&lt;/a&gt;, &lt;a href="https://www.tiktok.com/@agenticamit" rel="noopener noreferrer"&gt;TikTok&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Paige Bailey → &lt;a href="https://www.linkedin.com/in/dynamicwebpaige/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/DynamicWebPaige" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>nanobanana</category>
      <category>antigravity</category>
      <category>agents</category>
    </item>
    <item>
      <title>Agent Factory Recap: Cracking Open an Open Model</title>
      <dc:creator>Amit Maraj</dc:creator>
      <pubDate>Fri, 06 Feb 2026 19:10:28 +0000</pubDate>
      <link>https://dev.to/googleai/agent-factory-recap-cracking-open-an-open-model-42e6</link>
      <guid>https://dev.to/googleai/agent-factory-recap-cracking-open-an-open-model-42e6</guid>
      <description>&lt;p&gt;Welcome back to &lt;a href="https://www.youtube.com/playlist?list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs" rel="noopener noreferrer"&gt;The Agent Factory&lt;/a&gt;! In this episode, we’re joined by Ravin Kumar, a Research Engineer at DeepMind, to tackle one of the biggest topics in AI right now: building and training open-source agentic models. We wanted to go beyond just using agents and understand what it takes to build the entire factory line—from gathering data and supervised fine-tuning to reinforcement learning and evaluations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent Industry Pulse
&lt;/h2&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=7YgUDf_JXN8&amp;amp;t=54s" rel="noopener noreferrer"&gt;2:00&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fql8gfwe2x33lom1ydlpd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fql8gfwe2x33lom1ydlpd.png" width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before diving into the deep research, we looked at the latest developments in the fast-moving world of AI agents.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://blog.google/technology/google-deepmind/gemini-computer-use-model/" rel="noopener noreferrer"&gt;Gemini 2.5 Computer Use&lt;/a&gt;: Google's new model can act as a virtual user, interacting with computer screens, clicking buttons, typing in forms, and scrolling. It’s a shift from agents that just know things to agents that can do tasks directly in a browser.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://blog.google/technology/developers/introducing-vibe-coding-in-google-ai-studio/" rel="noopener noreferrer"&gt;Vibe Coding in AI Studio&lt;/a&gt;: A new approach to app building where you describe the "vibe" of the application you want, and the AI handles the boilerplate. It includes an Annotation Mode to refine specific UI elements with simple instructions like "Change this to green."&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2510.18234" rel="noopener noreferrer"&gt;DeepSeek-OCR and Context Compression&lt;/a&gt;: DeepSeek introduced a method that treats documents like images to understand layout, compressing 10-20 text tokens into a single visual token. This drastically improves speed and reduces cost for long-context tasks.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://blog.google/technology/ai/veo-updates-flow/" rel="noopener noreferrer"&gt;Google Veo 3.1 and Flow&lt;/a&gt;: The new update to the AI video model adds rich audio generation and powerful editing features. You can now use "Insert" to add characters or "Remove" to erase objects from existing video footage, giving creators iterative control.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Ravin Kumar on Building Open Models
&lt;/h2&gt;

&lt;p&gt;We sat down with Ravin to break down the end-to-end process of creating an open model with agent capabilities. It turns out the process mirrors a traditional ML lifecycle but with significantly more complex components.&lt;/p&gt;

&lt;h3&gt;
  
  
  Defining Agent Data
&lt;/h3&gt;

&lt;p&gt;Timestamp: &lt;a href="https://youtu.be/7YgUDf_JXN8?si=r8PP24GP0o--DmQc&amp;amp;t=895" rel="noopener noreferrer"&gt;14:55&lt;/a&gt;&lt;br&gt;
Ravin explained that training data for agents looks vastly different from standard text datasets. It starts with identifying what users actually need. The data itself is a collection of trajectories, complex examples of the model making decisions and using tools. Ravin noted that they use a mix of human-curated data and synthetic data generated by their own internal "teacher" models and APIs to create a playground for the open models to learn in.&lt;/p&gt;

&lt;h3&gt;
  
  
  Training Techniques: SFT and Reinforcement Learning
&lt;/h3&gt;

&lt;p&gt;Timestamp: &lt;a href="https://youtu.be/7YgUDf_JXN8?si=lGRLwhn00IBx5Vj0&amp;amp;t=1034" rel="noopener noreferrer"&gt;17:14&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Once the data is ready, the training process involves a two-phase approach. First comes Supervised Fine-Tuning (SFT), where frameworks update the model's weights to nudge it into new behaviors based on the examples. However, to handle generalization (new situations not in the original training data), they rely on Reinforcement Learning (RL). Ravin highlighted the difficulty of setting rewards in RL, warning that models are prone to "reward hacking," where they might collect intermediate rewards without ever completing the final task.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Stakes of Evaluation
&lt;/h3&gt;

&lt;p&gt;Timestamp: &lt;a href="https://youtu.be/7YgUDf_JXN8?si=CiWVnqgYaDPV3MV7&amp;amp;t=1211" rel="noopener noreferrer"&gt;20:10&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ravin emphasized that evaluation is the most critical and high-stakes part of the process. You can't just trust the training process; you need a rigorous "final exam." They use a combination of broad public benchmarks to measure general capability and specific, custom evaluations to ensure the model is safe and effective for its intended use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This conversation with Ravin Kumar really illuminated that building open agentic models is a highly structured, rigorous process. It requires creating high-quality trajectories for data, a careful combination of supervised and reinforcement learning, and, crucially, intense evaluation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your turn to build
&lt;/h2&gt;

&lt;p&gt;As Ravin advised, the best place to start is at the end. Before you write a single line of training code, define what success looks like by building a small, 50-example final exam for your agent. If you can't measure it, you can't improve it. We also encourage you to try mixing different approaches; for example, using a powerful API model like Gemini as a router and a specialized open-source model for specific tasks.&lt;/p&gt;

&lt;p&gt;Check out the full episode for more details, and catch us next time!&lt;/p&gt;

&lt;h2&gt;
  
  
  Connect with us
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Ivan Nardini → &lt;a href="https://www.linkedin.com/in/ivan-nardini/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/ivnardini" rel="noopener noreferrer"&gt;X&lt;/a&gt;, &lt;a href="https://bsky.app/profile/ivnardini.bsky.social" rel="noopener noreferrer"&gt;Bsky&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Amit Maraj → &lt;a href="https://www.linkedin.com/in/amit-maraj/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/agenticamit" rel="noopener noreferrer"&gt;X&lt;/a&gt;, &lt;a href="https://www.tiktok.com/@agenticamit" rel="noopener noreferrer"&gt;TikTok&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Ravin Kumar → &lt;a href="https://www.linkedin.com/in/ravinakumar/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>gemini</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Building a Multi-Agent Deep Research Tool with Google ADK, A2A, &amp; Cloud Run</title>
      <dc:creator>Amit Maraj</dc:creator>
      <pubDate>Tue, 30 Dec 2025 03:29:32 +0000</pubDate>
      <link>https://dev.to/googleai/building-a-multi-agent-deep-research-tool-with-google-adk-a2a-cloud-run-2ldj</link>
      <guid>https://dev.to/googleai/building-a-multi-agent-deep-research-tool-with-google-adk-a2a-cloud-run-2ldj</guid>
      <description>&lt;p&gt;"Research" is a loaded word. It’s not just Googling a keyword. It’s reading papers, verifying facts, finding that &lt;em&gt;one&lt;/em&gt; perfect diagram, and synthesizing it all into something coherent.&lt;/p&gt;

&lt;p&gt;Asking a single AI agent to do all of that sequentially is not very efficient. It will hallucinate, it will get stuck, and it will definitely be slow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0x1rl0mywjtyugavrtuq.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0x1rl0mywjtyugavrtuq.gif" alt="Deep Researcher Tool" width="600" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(TL;DR: Want the code? Check out the &lt;a href="https://github.com/amitkmaraj/deep-research-agentic-architecture" rel="noopener noreferrer"&gt;&lt;strong&gt;Deep Research Agent code&lt;/strong&gt; on GitHub&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;I wanted a system that could take a topic—say, "The History of Recurrent Neural Networks"—and produce a comprehensive, illustrated report. Additionally, I wanted to learn how to build a Deep Research Tool from scratch.&lt;/p&gt;

&lt;p&gt;The first attempt? A single loop. It researched, then it looked for images, then it checked its work. It took forever.&lt;/p&gt;

&lt;p&gt;So I asked: &lt;strong&gt;Can I make this faster?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this post, we’re going to build a &lt;strong&gt;Parallel Research Squad&lt;/strong&gt;. Instead of one agent doing everything, we’ll spin up three specialized agents that run &lt;em&gt;simultaneously&lt;/em&gt;, coordinated by a central Orchestrator. We’ll use &lt;a href="https://github.com/google/adk-python" rel="noopener noreferrer"&gt;&lt;strong&gt;Google’s Agent Development Kit (ADK)&lt;/strong&gt;&lt;/a&gt; for the brains, the &lt;a href="https://google.github.io/adk-docs/a2a/" rel="noopener noreferrer"&gt;&lt;strong&gt;Agent-to-Agent (A2A) Protocol&lt;/strong&gt;&lt;/a&gt; for communication, and &lt;a href="https://cloud.google.com/run?utm_campaign=CDR_0x5f9e213a_default_b472372936&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;&lt;strong&gt;Google's Cloud Run&lt;/strong&gt;&lt;/a&gt; to let them scale infinitely.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmoep0obqflw5dk2m73zy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmoep0obqflw5dk2m73zy.png" alt="Architecture" width="800" height="696"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1: Agentic Design Patterns
&lt;/h2&gt;

&lt;p&gt;We aren't just writing prompts anymore; we are doing &lt;strong&gt;System Engineering&lt;/strong&gt;. To build a robust system, we leverage three key design patterns:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Orchestrator Pattern
&lt;/h3&gt;

&lt;p&gt;Instead of a "God Agent" that decides everything, we have a central &lt;strong&gt;Orchestrator&lt;/strong&gt;. Think of it as the Editor-in-Chief. It doesn't write the articles; it assigns stories to reporters. It manages the state, handles errors, and ensures the final product meets the deadline.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Parallelization
&lt;/h3&gt;

&lt;p&gt;This is our speed hack. Most agent frameworks run sequentially (Step A -&amp;gt; Step B -&amp;gt; Step C). But "Reading arXiv Papers" and "Searching for Images" are independent tasks. By running them in parallel, we reduce the total latency to the duration of the slowest task, not the sum of all tasks.&lt;/p&gt;
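
&lt;p&gt;To make the latency claim concrete, here is a minimal sketch in plain &lt;code&gt;asyncio&lt;/code&gt;, not ADK. The &lt;code&gt;fake_agent&lt;/code&gt; coroutine and the durations are hypothetical stand-ins for the three agents; sleeping simulates model and tool latency.&lt;/p&gt;

```python
import asyncio
import time

# Hypothetical per-agent latencies in seconds (made up for illustration).
DURATIONS = {"researcher": 0.3, "academic_scholar": 0.2, "asset_gatherer": 0.1}


async def fake_agent(name, seconds):
    # Stand-in for a real agent call; the sleep simulates model/tool latency.
    await asyncio.sleep(seconds)
    return name


async def run_sequential():
    # Step A, then Step B, then Step C: total time is the sum of all tasks.
    start = time.perf_counter()
    for name, seconds in DURATIONS.items():
        await fake_agent(name, seconds)
    return time.perf_counter() - start


async def run_parallel():
    # All tasks start together: total time is the slowest task.
    start = time.perf_counter()
    await asyncio.gather(*(fake_agent(n, s) for n, s in DURATIONS.items()))
    return time.perf_counter() - start


sequential_time = asyncio.run(run_sequential())
parallel_time = asyncio.run(run_parallel())
print(f"sequential: {sequential_time:.2f}s, parallel: {parallel_time:.2f}s")
```

&lt;p&gt;With these made-up durations, the sequential run takes roughly the sum (about 0.6s), while the parallel run takes roughly as long as the slowest task (about 0.3s).&lt;/p&gt;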

&lt;h3&gt;
  
  
  3. The Evaluator-Optimizer
&lt;/h3&gt;

&lt;p&gt;We don't trust the first draft. Our system includes a &lt;strong&gt;Judge&lt;/strong&gt; agent. The Orchestrator sends the research to the Judge, who returns a strict Pass/Fail grade with feedback. If it fails, the Orchestrator loops back (Optimizer) to fix the gaps.&lt;/p&gt;
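
&lt;p&gt;The control flow of this loop can be sketched in a few lines of plain Python. This is only an illustration of the pattern: &lt;code&gt;judge&lt;/code&gt;, &lt;code&gt;revise&lt;/code&gt;, and &lt;code&gt;REQUIRED_TOPICS&lt;/code&gt; are hypothetical stand-ins, and in the real system the Judge is an LLM agent rather than a keyword check.&lt;/p&gt;

```python
# Hypothetical pass criteria: the draft must mention every required topic.
REQUIRED_TOPICS = ("LSTM", "GRU")


def judge(draft):
    # Stand-in Judge: returns a strict pass/fail verdict plus feedback.
    missing = [topic for topic in REQUIRED_TOPICS if topic not in draft]
    return {"passed": not missing, "feedback": missing}


def revise(draft, feedback):
    # Stand-in Optimizer step: fills the gaps the Judge pointed out.
    return draft + " " + " ".join(feedback)


def evaluate_and_optimize(draft, max_rounds=3):
    # Loop until the Judge passes the draft or the round budget runs out.
    for round_number in range(1, max_rounds + 1):
        verdict = judge(draft)
        if verdict["passed"]:
            return draft, round_number
        draft = revise(draft, verdict["feedback"])
    # Budget exhausted: return the latest draft even though it never passed.
    return draft, max_rounds


final_draft, rounds = evaluate_and_optimize("RNN history: Elman networks.")
print(rounds, final_draft)
```

&lt;p&gt;Capping the number of rounds is a design choice in this sketch: it keeps a stubborn draft from looping forever.&lt;/p&gt;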




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4kdfh5qb8m8fq8szqz1h.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4kdfh5qb8m8fq8szqz1h.jpeg" alt="Sequential Processing" width="800" height="812"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 2: The Need for Speed (Parallel Execution)
&lt;/h2&gt;

&lt;p&gt;The biggest bottleneck in AI agents is latency. Waiting for a model to "think" and browse the web takes time.&lt;/p&gt;

&lt;p&gt;With ADK, we implement a &lt;code&gt;ParallelAgent&lt;/code&gt;. This isn't just a concept; it's a primitive in the framework that handles the async complexity for us. A &lt;code&gt;ParallelAgent&lt;/code&gt; runs its sub-agents concurrently, and the Orchestrator waits for all of them to finish before moving on. It's a simple way to cut latency when the sub-agents don't depend on each other's output.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# orchestrator/app/agent.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ParallelAgent&lt;/span&gt;

&lt;span class="c1"&gt;# The "Squad" runs together
&lt;/span&gt;&lt;span class="n"&gt;research_squad&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ParallelAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research_squad&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Runs the researcher, academic scholar, and asset gatherer in parallel.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sub_agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;researcher&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;academic_scholar&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;asset_gatherer&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This one change cut our total processing time by &lt;strong&gt;60%&lt;/strong&gt;. While the Scholar is reading a dense PDF, the Asset Gatherer is already validating image URLs.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Favhhaz1eamg01ts2jgtw.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Favhhaz1eamg01ts2jgtw.jpeg" alt="A2A Handshake" width="800" height="812"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 3: The Universal Language (A2A Protocol)
&lt;/h2&gt;

&lt;p&gt;How do these agents talk? They are separate microservices. The Researcher might be on a high-memory instance, while the Orchestrator is on a tiny one.&lt;/p&gt;

&lt;p&gt;We use the &lt;strong&gt;Agent-to-Agent (A2A) Protocol&lt;/strong&gt;. It’s like a standardized API for AI agents, built on top of JSON-RPC.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why A2A?
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Decoupling&lt;/strong&gt;: The Orchestrator doesn't need to know &lt;em&gt;how&lt;/em&gt; the Researcher works, just &lt;em&gt;where&lt;/em&gt; it is.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Interoperability&lt;/strong&gt;: You could write the Researcher in Python and the Judge in Go. As long as they speak A2A, they can collaborate.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Service Discovery&lt;/strong&gt;: In development, we map agents to &lt;code&gt;localhost&lt;/code&gt; ports. In production, we map them to Cloud Run URLs.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# orchestrator/app/agent.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents.remote_a2a_agent&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RemoteA2aAgent&lt;/span&gt;

&lt;span class="c1"&gt;# The Orchestrator calls the remote Scholar service
&lt;/span&gt;&lt;span class="n"&gt;academic_scholar&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RemoteA2aAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;academic_scholar&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# In prod, this is an internal Cloud Run URL
&lt;/span&gt;    &lt;span class="n"&gt;agent_card&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://scholar-service:8000/.well-known/agent.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Searches for academic papers.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
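&lt;p&gt;The switch between &lt;code&gt;localhost&lt;/code&gt; ports in development and Cloud Run URLs in production can be sketched as a small lookup. This is a minimal illustration, not the project's actual code: the &lt;code&gt;resolve_agent_card&lt;/code&gt; helper, the port numbers, and the &lt;code&gt;*_URL&lt;/code&gt; environment variable naming are all assumptions made for this example.&lt;/p&gt;

```python
import os

# Hypothetical local ports for development; the real project may use others.
LOCAL_AGENT_PORTS = {"academic_scholar": 8001, "judge": 8002}

# Path taken from the agent_card URL used in the Orchestrator snippet.
AGENT_CARD_PATH = "/.well-known/agent.json"


def resolve_agent_card(agent_name, env=None):
    """Return the agent card URL for the given agent.

    Looks for an override such as ACADEMIC_SCHOLAR_URL (an assumed naming
    scheme) and falls back to a localhost port when it is not set.
    """
    env = os.environ if env is None else env
    base_url = env.get(agent_name.upper() + "_URL")
    if base_url is None:
        base_url = f"http://localhost:{LOCAL_AGENT_PORTS[agent_name]}"
    return base_url.rstrip("/") + AGENT_CARD_PATH


if __name__ == "__main__":
    # Development: no override set, so the localhost port is used.
    print(resolve_agent_card("academic_scholar", env={}))
    # Production: the deployment sets the service URL (placeholder value).
    print(resolve_agent_card(
        "academic_scholar",
        env={"ACADEMIC_SCHOLAR_URL": "https://scholar.example.internal/"},
    ))
```

&lt;p&gt;The returned string could then be passed as the &lt;code&gt;agent_card&lt;/code&gt; argument of &lt;code&gt;RemoteA2aAgent&lt;/code&gt;, so the same Orchestrator code runs unchanged in both environments.&lt;/p&gt;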






&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsfbbymggjjdyk3pn8h2c.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsfbbymggjjdyk3pn8h2c.jpeg" alt="Scaling Graph" width="800" height="812"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 4: Infrastructure as a Superpower (Cloud Run)
&lt;/h2&gt;

&lt;p&gt;We deploy this system on &lt;strong&gt;Google Cloud Run&lt;/strong&gt;. This gives us the "Grocery Store" scaling model.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "Grocery Store" Model
&lt;/h3&gt;

&lt;p&gt;Imagine a grocery store with one checkout lane. If 50 people show up, the line goes out the door.&lt;br&gt;
In our system, each agent is a checkout lane.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Monolith&lt;/strong&gt;: One lane. With 50 requests, the last person in line waits roughly 50x as long.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Microservices on Cloud Run&lt;/strong&gt;: With 50 requests, Cloud Run can scale out to as many as &lt;strong&gt;50 instances&lt;/strong&gt; of the Researcher (the exact number depends on the service's concurrency setting). Everyone gets checked out in parallel.&lt;/li&gt;
&lt;/ul&gt;
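&lt;p&gt;The arithmetic behind the analogy can be sketched in a few lines. This is a simplified model under stated assumptions: every request takes the same fixed time, each lane (instance) handles one request at a time, and cold starts are ignored. The &lt;code&gt;finish_time&lt;/code&gt; function and the 2-second request duration are made up for illustration.&lt;/p&gt;

```python
import math


def finish_time(requests, lanes, seconds_per_request):
    """Seconds until the last request completes.

    Assumes `lanes` is a positive integer, each lane serves one request
    at a time, and every request takes `seconds_per_request`.
    """
    batches = math.ceil(requests / lanes)
    return batches * seconds_per_request


if __name__ == "__main__":
    # One lane (monolith) versus one lane per request (fully scaled out).
    print("1 lane:  ", finish_time(50, 1, 2.0), "seconds")
    print("10 lanes:", finish_time(50, 10, 2.0), "seconds")
    print("50 lanes:", finish_time(50, 50, 2.0), "seconds")
```

&lt;p&gt;Under these assumptions the single lane finishes after 100 seconds and the 50 lanes after 2 seconds. Real numbers will differ, since instances take time to start and can serve several requests concurrently.&lt;/p&gt;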

&lt;h3&gt;
  
  
  Scale to Zero
&lt;/h3&gt;

&lt;p&gt;When no one is using the app, we have &lt;strong&gt;0 instances&lt;/strong&gt; running and we pay &lt;strong&gt;$0&lt;/strong&gt; for compute. This is crucial for cost-effective AI applications. The trade-off is that a service that has scaled to zero needs a cold start when the next request comes in. If that latency matters, you can keep instances warm by configuring a minimum number of instances (for example, &lt;code&gt;--min-instances=1&lt;/code&gt; on &lt;code&gt;gcloud run deploy&lt;/code&gt;), at the cost of paying for idle capacity.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 5: The Frontend (Next.js + Real-Time)
&lt;/h2&gt;

&lt;p&gt;We didn't want a CLI tool. We wanted a product.&lt;/p&gt;

&lt;p&gt;We built a &lt;strong&gt;Next.js&lt;/strong&gt; frontend that connects to the Orchestrator. Because we know the architecture, we can visualize it. When the &lt;code&gt;research_squad&lt;/code&gt; starts, our frontend shows three pulsing indicators side-by-side. You actually &lt;em&gt;see&lt;/em&gt; the parallelism happening.&lt;/p&gt;

&lt;p&gt;It creates a sense of "liveness" and transparency that builds user trust.&lt;/p&gt;
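&lt;p&gt;One way a backend could produce the status updates that drive such indicators is to record an event when each agent starts and finishes. The sketch below is an assumption about how this might look, not the project's implementation: &lt;code&gt;run_agent&lt;/code&gt;, the event tuples, the third agent name, and the sleep durations are all hypothetical stand-ins for real remote calls.&lt;/p&gt;

```python
import asyncio


async def run_agent(name, seconds, events):
    """Stand-in for a remote agent call that records status events."""
    await events.put((name, "started"))
    await asyncio.sleep(seconds)  # placeholder for the real network call
    await events.put((name, "finished"))


async def research_squad():
    """Run three agents concurrently and return the ordered event log."""
    events = asyncio.Queue()
    # Hypothetical agent names and durations, for illustration only.
    agents = {"academic_scholar": 0.03, "gatherer": 0.02, "analyst": 0.01}
    await asyncio.gather(
        *(run_agent(name, secs, events) for name, secs in agents.items())
    )
    log = []
    while not events.empty():
        log.append(events.get_nowait())
    return log


if __name__ == "__main__":
    for name, status in asyncio.run(research_squad()):
        print(f"{name}: {status}")
```

&lt;p&gt;All three "started" events are recorded before any "finished" event, which is the signal a frontend would need to light up its indicators at the same time. In a real system the events would be streamed to the browser as they occur instead of being collected at the end.&lt;/p&gt;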




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;By breaking our monolith into a &lt;strong&gt;Parallel Research Squad&lt;/strong&gt;, we built a system that is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Faster&lt;/strong&gt;: Parallel execution cuts wait times by &amp;gt;50%.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Better&lt;/strong&gt;: Specialized agents (Scholar, Gatherer) do deeper work than one generalist.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Scalable&lt;/strong&gt;: Microservices on Cloud Run scale out with demand and back to zero when idle.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Want to build this yourself?&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/amitkmaraj/deep-research-agentic-architecture" rel="noopener noreferrer"&gt;&lt;strong&gt;Deep Research Agent code&lt;/strong&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/google/adk" rel="noopener noreferrer"&gt;&lt;strong&gt;Google ADK documentation&lt;/strong&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://google.github.io/adk-docs/a2a/" rel="noopener noreferrer"&gt;&lt;strong&gt;Agent-to-Agent (A2A) Protocol&lt;/strong&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/run?utm_campaign=CDR_0x5f9e213a_default_b472372936&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;&lt;strong&gt;Google's Cloud Run&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>adk</category>
      <category>cloud</category>
    </item>
  </channel>
</rss>
