Gate of AI

Posted on • Originally published at gateofai.com

How to Build Long-Running AI Agents with Google Gen AI SDK

🚀 Technical Briefing: This tutorial is part of our deep-dive series on Agentic Workflows at Gate of AI. For the full technical breakdown, interactive code sandbox, and the native Arabic translation, visit the original article here.

Tutorial · Advanced · ⏱ 45 min read · © Gate of AI 2026-05-31

Step away from standard chat APIs. Learn the foundational architecture for building long-running, stateful autonomous agents inspired by the new Gemini Enterprise Unified Inbox.

Prerequisites


  • Python 3.10 or higher
  • Access to the Google Gen AI SDK (Gemini 1.5 Pro or higher)
  • A Google Cloud Project with Billing Enabled
  • Advanced understanding of asynchronous Python (asyncio) and state management

What We're Building


With Google Cloud's announcement of Long-Running Agents in Gemini Enterprise, the development paradigm has officially shifted. In this tutorial, we will construct the foundational "pause-and-resume" architecture required to build these agents.


We won't just build a chatbot. We will build a stateful, asynchronous Python worker that executes a multi-step task, intentionally "pauses" when it requires simulated human approval (mimicking the Unified Inbox), and resumes upon confirmation.

Setup and Installation


We will use the official Google Gen AI SDK and python-dotenv for our environment variables. asyncio ships with the Python standard library, so it does not need to be installed.


pip install google-genai python-dotenv

Secure your API credentials in a .env file.



# .env file
GEMINI_API_KEY=your_gemini_api_key_here

Step 1: Architecting the Stateful Client


Unlike a standard chatbot that forgets data between queries, a long-running agent must maintain a rigid state dictionary. We initialize the official genai.Client and set up our state manager.



import os
import asyncio
from google import genai
from dotenv import load_dotenv

# Load GEMINI_API_KEY from the .env file into the environment
load_dotenv()

class LongRunningAgent:
    def __init__(self):
        # Initialize the official Google Gen AI client
        self.client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
        # Substitute any Gemini model your project can access
        self.model = "gemini-1.5-pro"

        # This dictionary mimics the persistent state stored in a database
        self.state = {
            "status": "idle",  # idle, running, awaiting_approval, completed
            "workflow_history": [],
            "pending_approval_request": None,
        }

    def log_action(self, action):
        """Print an action and append it to the workflow history."""
        print(f"[AGENT LOG]: {action}")
        self.state["workflow_history"].append(action)

Step 2: Building the "Pause-and-Resume" HITL Logic


The core innovation of Gemini's new update is the Human-in-the-Loop (HITL) Inbox. Here, we build the asynchronous logic that allows the agent to pause execution when it hits a restricted action.



    # Methods of LongRunningAgent (continued)

    async def request_human_approval(self, task_description):
        """Simulates pushing a task to the Unified Inbox."""
        self.state["status"] = "awaiting_approval"
        self.state["pending_approval_request"] = task_description
        self.log_action(f"PAUSED: Awaiting human approval for: {task_description}")

        # Simulate waiting for the manager to click "Approve" in the Inbox
        while self.state["status"] == "awaiting_approval":
            await asyncio.sleep(2)  # Check database/state every 2 seconds

        self.log_action("RESUMED: Human approval granted.")
        return True

    def simulate_manager_approval(self):
        """External function called by your UI/Inbox when a user clicks approve."""
        if self.state["status"] == "awaiting_approval":
            self.state["status"] = "running"
            self.state["pending_approval_request"] = None
            print("\n✅ [INBOX]: Manager approved the action.\n")

Step 3: Executing the Asynchronous Workflow


Now, we tie it together. We will use the client.models.generate_content method to process data, but wrap it in our async execution loop.



    # Method of LongRunningAgent (continued)

    async def run_multi_day_workflow(self, initial_prompt):
        self.state["status"] = "running"
        self.log_action("Starting long-running workflow...")

        # Phase 1: Autonomous Processing
        # Note: generate_content is synchronous, so it blocks the event loop until it returns
        self.log_action("Analyzing request via Gemini API...")
        response = self.client.models.generate_content(
            model=self.model,
            contents=f"Analyze this task and propose a 3-step execution plan: {initial_prompt}"
        )
        plan = response.text or ""  # response.text can be None if no text was returned
        self.log_action(f"Plan generated: {plan[:100]}...")

        # Phase 2: Hitting a permission wall (Mimicking the Unified Inbox feature)
        await asyncio.sleep(1)  # Simulating heavy compute time

        # The agent realizes it needs access to a restricted system (e.g., Google Drive)
        await self.request_human_approval("Access restricted Drive Folder: 'Q3 Financials'")

        # Phase 3: Post-Approval Execution
        # Each generate_content call is stateless, so pass the task and plan back in
        self.log_action("Finalizing workflow with approved access...")
        final_response = self.client.models.generate_content(
            model=self.model,
            contents=(
                f"Task: {initial_prompt}\nPlan: {plan}\n"
                "The human approved access. Generate the final summary report."
            )
        )

        self.state["status"] = "completed"
        self.log_action("Workflow Completed.")
        return final_response.text
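Because client.models.generate_content is a synchronous call, invoking it directly inside a coroutine stalls every other task on the event loop until the response arrives. One way to keep the loop responsive is to hand the call to a worker thread with asyncio.to_thread. The sketch below demonstrates the idea with blocking_generate, a hypothetical stand-in that sleeps instead of calling the API, so it runs without credentials. The SDK documentation also describes an async variant under client.aio, which is worth checking against the version you have installed.

```python
import asyncio
import time


def blocking_generate(prompt: str) -> str:
    """Stand-in for a synchronous SDK call; sleeps instead of using the network."""
    time.sleep(0.2)
    return f"plan for: {prompt}"


async def heartbeat(ticks: list[float]) -> None:
    """Records timestamps to show the event loop keeps running."""
    for _ in range(3):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def demo() -> tuple[str, int]:
    ticks: list[float] = []
    # asyncio.to_thread runs the blocking function in a worker thread,
    # so heartbeat() continues to run while the call is in flight.
    text, _ = await asyncio.gather(
        asyncio.to_thread(blocking_generate, "Audit the Q3 Marketing Spend"),
        heartbeat(ticks),
    )
    return text, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the agent class, the equivalent change would be to wrap each generate_content call the same way, passing model and contents as keyword arguments to asyncio.to_thread.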

⚠️ Expert Tip: In a production environment, do not use asyncio.sleep to hold state. You must serialize the self.state dictionary to a persistent database (like Redis or PostgreSQL). When the webhook from your Inbox arrives, you retrieve the state and re-initialize the agent.

Testing the Unified Inbox Architecture


To run this, we will use Python's asyncio.create_task to run the agent in the background while simulating a human checking their inbox.



async def main():
    agent = LongRunningAgent()

    # Start the agent as a background task
    agent_task = asyncio.create_task(
        agent.run_multi_day_workflow("Audit the Q3 Marketing Spend")
    )

    # Simulate the human manager checking their inbox after 5 seconds
    await asyncio.sleep(5)

    # Approve only once the agent is actually paused; if the API call is slow,
    # an early approval would be a no-op and the agent would wait forever
    while agent.state["status"] != "awaiting_approval" and not agent_task.done():
        await asyncio.sleep(0.5)
    agent.simulate_manager_approval()

    # Wait for the agent to finish
    result = await agent_task
    print(f"\n[FINAL OUTPUT]:\n{result}")

if __name__ == "__main__":
    asyncio.run(main())

What to Build Next


  • Replace the simulated wait loop by saving the agent's state to a PostgreSQL database.
  • Build a frontend React/Next.js "Unified Inbox" UI that triggers the webhook to resume the agent.
  • Implement the official genai.types.Tool configurations to let the agent actually execute the actions post-approval.

Top comments (2)

Harjot Singh

Long-running agents are where most of the hard, interesting problems live, because the short demo agent hides everything that matters over time. Once a run spans minutes to hours, three things stop being optional. First, durable state: the process will be interrupted (crash, restart, timeout), so progress has to be checkpointed and resumable, or every failure means starting over and re-spending all the work and tokens. Second, idempotency on side effects, because a long run that retries a step must not re-send the email or re-charge the card it already did before it died. Third, bounded cost and a stop condition, since a long-running loop is exactly where a confused agent quietly burns money for an hour, so caps and a kill switch are load-bearing, not nice-to-have. The SDK gives you the calls; the durability, idempotency, and bounding are the architecture you have to add around them, and that's what separates a long-running agent that's reliable from one that's just a short agent left running too long. Make it resumable, idempotent, and bounded, then long-running becomes safe. That design-for-interruption-and-bounding instinct is core to how I think about Moonshift. For long runs, are you checkpointing state externally so a restart resumes mid-run, or keeping it in-process and accepting a cold start on failure?
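The third point in the comment above (caps and a kill switch) can be sketched as a small guard that every step has to pass through. RunBudget and BudgetExceeded are hypothetical names, and the token figures are whatever estimate the caller supplies; nothing here comes from the SDK.

```python
import time


class BudgetExceeded(RuntimeError):
    """Raised when a run hits one of its caps or has been killed."""


class RunBudget:
    """Hypothetical guard: caps steps, tokens, and wall-clock time for one run."""

    def __init__(self, max_steps: int, max_tokens: int, max_seconds: float) -> None:
        self.max_steps = max_steps
        self.max_tokens = max_tokens
        self.max_seconds = max_seconds
        self.steps = 0
        self.tokens = 0
        self.started = time.monotonic()
        self.killed = False

    def kill(self) -> None:
        """Kill switch: the next charge() call will raise."""
        self.killed = True

    def charge(self, tokens: int) -> None:
        """Call before each model or tool step with its estimated token cost."""
        if self.killed:
            raise BudgetExceeded("run was killed")
        if self.steps + 1 > self.max_steps:
            raise BudgetExceeded("step cap reached")
        if self.tokens + tokens > self.max_tokens:
            raise BudgetExceeded("token cap reached")
        if time.monotonic() - self.started > self.max_seconds:
            raise BudgetExceeded("time cap reached")
        # Only record the step once every check has passed
        self.steps += 1
        self.tokens += tokens
```

Calling charge() before each generate_content or tool call means a confused loop stops at the cap instead of running until someone notices the bill.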

Gate of AI

Spot on, Harjot. In-process state management for long-running agents is a ticking time bomb. Accepting a cold start on failure is essentially admitting the system is a fragile script rather than an autonomous production engine. If a run fails 45 minutes into a multi-step orchestration loop, losing that execution context and re-spending tokens is an architectural failure.
At Gate of AI, we treat state-persistence as an external, first-class citizen. Here is how we handle the checkpointing topology:

  1. External Event Sourcing: Instead of saving a giant, monolithic state blob, we log every single tool input, model response, and schema transition as immutable, versioned events to an external ledger (PostgreSQL/Supabase).
  2. Deterministic Rehydration: When a crash or a network timeout occurs, a fresh agent container spins up, pulls the historical event log for that unique session ID, and replays the states up to the last successful checkpoint. This guarantees an exact mid-run resume without double-invoking non-idempotent tools.
  3. The Gateway Interceptor: To enforce the idempotency you mentioned on side effects (like sending emails or billing), our orchestration layer wraps external API tools in a deterministic token-verification layer. If a replayed step attempts to trigger an external mutation that already contains a 'success' token in our external state ledger, the gateway short-circuits and returns the cached response.

For your Moonshift design, are you looking at a continuous append-only event-sourcing framework for your external checkpoints, or are you utilizing a key-value snapshot model (like Redis) at specific state-machine boundaries? Let's dive into the latency trade-offs of both.
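The event log and gateway described in this reply can be illustrated with a small in-memory sketch. It is not the Gate of AI implementation: EventLedger, Gateway, and the step_id convention are hypothetical, and a real ledger would live in an external database rather than a dictionary.

```python
import json
from typing import Any, Callable


class EventLedger:
    """In-memory stand-in for an append-only event table keyed by session ID."""

    def __init__(self) -> None:
        self._events: dict[str, list[dict[str, Any]]] = {}

    def append(self, session_id: str, event: dict[str, Any]) -> None:
        # Round-trip through JSON to mimic storing a serialized, immutable row
        self._events.setdefault(session_id, []).append(json.loads(json.dumps(event)))

    def replay(self, session_id: str) -> list[dict[str, Any]]:
        """Return a copy of the session's events in the order they were written."""
        return list(self._events.get(session_id, []))


class Gateway:
    """Wraps side-effecting tools so a step that already succeeded is not re-run."""

    def __init__(self, ledger: EventLedger) -> None:
        self.ledger = ledger

    def call(self, session_id: str, step_id: str, tool: Callable[[], Any]) -> Any:
        for event in self.ledger.replay(session_id):
            if event["step_id"] == step_id and event["status"] == "success":
                return event["result"]  # short-circuit with the cached response
        result = tool()
        self.ledger.append(
            session_id, {"step_id": step_id, "status": "success", "result": result}
        )
        return result
```

A restarted agent would replay the ledger for its session ID and route every tool call through Gateway.call, so a step such as sending an email runs at most once per step_id even if the surrounding workflow is re-executed.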