Shir Meir Lador for Google AI

Posted on • Originally published at cloud.google.com

Deploying a Multi-Agent System with Terraform and Cloud Run

In support of our mission to accelerate the developer journey on Google Cloud, we built Dev Signal: a multi-agent system designed to transform raw community signals into reliable technical guidance by automating the path from discovery to expert creation.

In the first three parts of this series, we laid the essential groundwork by establishing its core capabilities and local verification process:

In part 1, we standardized the agent's capabilities through the Model Context Protocol (MCP), connecting it to Reddit for trend discovery and Google Cloud Docs for technical grounding. In part 2, we built a multi-agent architecture and integrated the Vertex AI memory bank, allowing the system to learn and persist user preferences across different conversations. In part 3, we verified the full end-to-end lifecycle locally using a dedicated test runner, ensuring that research, content creation, and cloud-based memory retrieval were perfectly synchronized.

If you'd like to dive straight into the code, you can clone the repository here.

Deployment to Cloud Run and the Path to Production

To help you transition from this local prototype to a production service, this final part focuses on building the production backbone of your agent using the foundational deployment patterns provided by the Agent Starter Pack. We will implement the essential structural components required for monitoring, data integrity, and long-term state management in the cloud. You will learn to implement the application server and helper utilities needed for a production-ready deployment before provisioning secure, reproducible infrastructure with Terraform.

While the Dockerfile packages your agent's code and its specialized dependencies, such as Node.js for the Reddit MCP tool, Terraform is used to build the platform it lives on. Terraform automates the creation of your Artifact Registry, least-privilege service accounts, and Secret Manager integrations to ensure your API keys remain protected.

By the end of this part, you will have a standardized application framework deployed on Google Cloud Run and a roadmap for graduating your prototype through continuous evaluation, CI/CD and advanced observability.

Production Utilities and Server: Building the System's Body

In this section, you implement the structural components required for monitoring and long-term state management in the cloud.

  • The Application Server: Initializing the FastAPI server and establishing a vital connection to the Vertex AI memory bank.
  • Implementing Telemetry: Enabling 'Agent Traces' for visibility into internal reasoning.

The Application Server

The fast_api_app.py file serves as the vital entry point for your agent, transforming the core logic into a production FastAPI server that acts as the "body" of your system. When deploying to Cloud Run, this server is essential because it provides the necessary web interface to listen for incoming HTTP requests and dispatch them to the agent for processing. Beyond basic serving, its most critical role is establishing a connection to the Vertex AI memory bank by defining a MEMORY_URI, which allows the ADK framework to persist and retrieve user preferences across different production sessions. Additionally, the application server initializes production-grade telemetry for real-time monitoring.

Go back to the dev_signal_agent folder.

cd ..

Paste the following code in dev_signal_agent/fast_api_app.py:

import os
from fastapi import FastAPI
from google.adk.cli.fast_api import get_fast_api_app
from google.cloud import logging as cloud_logging
from vertexai import agent_engines
from dev_signal_agent.app_utils.env import init_environment

# --- Initialization & Secure Secret Retrieval ---
# We now unpack the SECRETS dictionary returned by our updated env.py
PROJECT_ID, MODEL_LOC, SERVICE_LOC, SECRETS = init_environment()
logger = cloud_logging.Client().logger(__name__)

# Access sensitive credentials from the SECRETS dictionary
# These keys stay in memory and are NOT injected into os.environ
REDDIT_CLIENT_ID = SECRETS.get("REDDIT_CLIENT_ID")
REDDIT_CLIENT_SECRET = SECRETS.get("REDDIT_CLIENT_SECRET")
REDDIT_USER_AGENT = SECRETS.get("REDDIT_USER_AGENT")
DK_API_KEY = SECRETS.get("DK_API_KEY")

# --- Configuration & Sessions ---
AGENT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
# Non-sensitive configuration uses environment variables
BUCKET = os.environ.get("AI_ASSETS_BUCKET")
USE_IN_MEMORY = os.environ.get("USE_IN_MEMORY_SESSION", "").lower() in ("true", "1")

# --- MEMORY BANK CONNECTION ---
def _get_memory_bank_uri():
    if USE_IN_MEMORY: return None, None
    # We use 'dev_signal_agent' as the display name for the Vertex AI memory bank
    name = os.environ.get("AGENT_ENGINE_MEMORY_BANK_NAME", "dev_signal_agent")
    existing = list(agent_engines.list(filter=f'display_name="{name}"'))
    ae = existing[0] if existing else agent_engines.create(display_name=name)
    uri = f"agentengine://{ae.resource_name}"
    print(f"DEBUG: Connecting to Memory Bank: {uri} (display_name={name})")
    return uri, uri

SESSION_URI, MEMORY_URI = _get_memory_bank_uri()

# --- Initialize FastAPI with ADK ---
app: FastAPI = get_fast_api_app(
    agents_dir=AGENT_DIR,
    web=True,
    artifact_service_uri=f"gs://{BUCKET}" if BUCKET else None,
    allow_origins=os.getenv("ALLOW_ORIGINS", "").split(",") if os.getenv("ALLOW_ORIGINS") else None,
    session_service_uri=SESSION_URI,
    memory_service_uri=MEMORY_URI, # <--- Connects the Memory Bank
    otel_to_cloud=True, # <--- Enables production telemetry
)

if __name__ == "__main__":
    import uvicorn
    # Standard Cloud Run port is 8080
    uvicorn.run(app, host="0.0.0.0", port=8080)
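The server above unpacks four values from init_environment(), which lives in app_utils/env.py and is not shown in this part. For reference, here is a minimal hypothetical sketch of what that helper could look like, assuming a Secret Manager lookup with an environment-variable fallback for local runs; the SECRET_KEYS list, the fallback order, and the SERVICE_LOCATION variable are assumptions, not the series' actual implementation:

```python
# Hypothetical sketch of dev_signal_agent/app_utils/env.py.
# init_environment() resolves project configuration and returns sensitive
# keys in a plain dict, so they stay in memory and never enter os.environ.
import os

SECRET_KEYS = ["REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET",
               "REDDIT_USER_AGENT", "DK_API_KEY"]

def _fetch_secret(project_id: str, name: str):
    """Read the latest version of a secret; return None if unavailable."""
    try:
        from google.cloud import secretmanager  # optional dependency
        client = secretmanager.SecretManagerServiceClient()
        path = f"projects/{project_id}/secrets/{name}/versions/latest"
        resp = client.access_secret_version(name=path)
        return resp.payload.data.decode("utf-8")
    except Exception:
        return None  # no credentials or secret missing; caller falls back

def init_environment():
    project_id = os.environ.get("GOOGLE_CLOUD_PROJECT", "")
    model_loc = os.environ.get("GOOGLE_CLOUD_LOCATION", "global")
    service_loc = os.environ.get("SERVICE_LOCATION", "us-central1")
    secrets = {}
    for key in SECRET_KEYS:
        # Prefer a locally exported variable; otherwise try Secret Manager.
        secrets[key] = os.environ.get(key) or _fetch_secret(project_id, key)
    return project_id, model_loc, service_loc, secrets
```

Returning the secrets as a plain dictionary, rather than writing them into os.environ, is what keeps them out of the container's environment configuration.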

Implementing Telemetry

In a production environment, visibility into your agent's reasoning is critical. We leverage the built-in observability features of the Google ADK by setting the otel_to_cloud=True flag in our application server. This single parameter handles the majority of the instrumentation automatically, exporting "Agent Traces" directly to the Google Cloud Console. These traces provide a "visual waterfall" of the agent's operation, including individual agent thought processes, LLM invocations, and MCP tool calls.

Monitoring vs. Targeted Evaluation

It is essential to understand that production tracing is subject to sampling to balance performance and cost. Because only a subset of requests is traced, not every individual user interaction will be visible.

  • System Traces (Monitoring): Used to analyze behavior "at large," such as identifying latency bottlenecks or system timeouts.
  • Reasoning Traces (Evaluation): High-quality evaluation mandates targeted trace capture. This means calling the agent specifically for a test case where you know you will evaluate that particular request in full detail.
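The targeted trace capture described above can be sketched as a small replay client that sends a fixed evaluation prompt through the deployed service (for example, via the gcloud proxy on localhost:8080). A fixed session id makes the resulting trace easy to locate in the Trace Explorer. The /run endpoint path and payload field names follow common ADK FastAPI conventions and should be treated as assumptions:

```python
# Sketch: replay a known evaluation prompt so you can inspect that one trace
# in full detail. Endpoint path and payload shape are assumptions.
import json
import urllib.request

EVAL_SESSION = "eval-2024-trend-scan"  # fixed id makes the trace easy to find

def build_run_payload(prompt: str) -> dict:
    """Assemble the chat request for a single, fully-evaluated test case."""
    return {
        "app_name": "dev_signal_agent",
        "user_id": "eval-runner",
        "session_id": EVAL_SESSION,
        "new_message": {"role": "user", "parts": [{"text": prompt}]},
    }

def run_eval_case(base_url: str, prompt: str) -> bytes:
    """POST the test case to the (proxied) service; returns the raw response."""
    req = urllib.request.Request(
        f"{base_url}/run",
        data=json.dumps(build_run_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```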

Viewing the Trace

To see your traces, navigate to the Trace Explorer in the Google Cloud Console and filter for your service (e.g., dev-signal). Clicking a specific Trace ID opens a Gantt chart that allows you to distinguish between cognitive reasoning failures (wrong decisions) and physical system issues (timeouts).

Trace Explorer view

For advanced configurations, refer to the Google ADK observability documentation.

Infrastructure as Code: Provisioning Secure Cloud Resources

We utilize the infrastructure-as-code patterns provided by the Agent Starter Pack's security-first design. The starter pack builds the professional platform required to automate the creation of least-privilege service accounts and robust secret management in seconds.

Using Terraform ensures that your entire Google Cloud environment - from IAM roles to Secret Manager versions - is defined in reproducible, secure code. We break our infrastructure into the following logical blocks:

  • Resources & Variables: Define the specific project, region, and sensitive API secrets used by the agent.
  • Core Infrastructure: Enable essential APIs and provision a private Artifact Registry to host your agent's container images.
  • Identity & Access Management (IAM): Configure specialized Service Accounts that strictly follow the Principle of Least Privilege to ensure your system remains secure.
  • Secret Management: Securely ingest API credentials into Google Secret Manager for protected runtime access.
  • Cloud Run Configuration: Define the container environment, resource limits, and automated secret injection for the final deployment.

To begin provisioning, return to the root folder of your project (dev-signal) and create the necessary deployment directories:

cd ..
mkdir deployment
cd deployment
mkdir terraform
cd terraform

Terraform Resources and Variables

The variables.tf file defines the configurable parameters for your deployment, allowing you to customize the infrastructure without altering the underlying logic. It includes variables for the project_id, the deployment region (defaulting to us-central1), and the service_name for your Cloud Run instance. Furthermore, it defines a secrets map used to securely ingest sensitive API credentials—such as Reddit and Developer Knowledge keys—into Google Secret Manager for runtime access. This modular approach ensures your production environment remains reproducible, secure, and adaptable across different projects.

Paste the following code into deployment/terraform/variables.tf:

variable "project_id" {
  description = "The Google Cloud Project ID"
  type        = string
}
variable "region" {
  description = "The Google Cloud region to deploy to"
  type        = string
  default     = "us-central1"
}
variable "service_name" {
  description = "The name of the Cloud Run service"
  type        = string
  default     = "dev-signal"
}
variable "secrets" {
  description = "A map of secret names and their values (e.g., REDDIT_CLIENT_ID, DK_API_KEY)"
  type        = map(string)
  default     = {}
}
variable "ai_assets_bucket" {
  description = "The GCS bucket for storing AI assets"
  type        = string
}

Core Infrastructure Logic

We define our infrastructure in logical blocks. Here is what each part does:

1. Enable APIs: Ensures the project has the necessary services active (Cloud Run, Vertex AI, etc.). We use disable_on_destroy = false to prevent accidental data loss if the Terraform is destroyed.

Paste the following code into deployment/terraform/main.tf:

resource "google_project_service" "services" {
  project = var.project_id
  for_each = toset([
    "run.googleapis.com",
    "artifactregistry.googleapis.com",
    "cloudbuild.googleapis.com",
    "aiplatform.googleapis.com",
    "secretmanager.googleapis.com",
    "logging.googleapis.com"
  ])
  service            = each.key
  disable_on_destroy = false
}

2. Artifact Registry: Creates a private Docker registry to store our agent's container images.

resource "google_artifact_registry_repository" "repo" {
  location      = var.region
  project       = var.project_id
  repository_id = "dev-signal-repo"
  description   = "Docker repository for Dev Signal Agent"
  format        = "DOCKER"
  depends_on    = [google_project_service.services]
}

3. Service Account & IAM: This is a critical security step. In accordance with the Principle of Least Privilege, we avoid the default compute service account and instead provision a dedicated user-managed service account (dev-signal-sa). By designating it as the Cloud Run service identity, we can grant it only the minimum necessary permissions: specifically roles/aiplatform.user, roles/logging.logWriter, and roles/storage.objectAdmin. This granular access control gives the agent exactly the permissions it needs to interact with Vertex AI and Cloud Storage without over-granting access to other sensitive cloud resources, significantly reducing the potential impact of a compromised account. See Google's best practices for using service accounts securely.

resource "google_service_account" "agent_sa" {
  project      = var.project_id
  account_id   = "${var.service_name}-sa"
  display_name = "Dev Signal Agent Service Account"
}
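The block above only creates the account; the three role grants described in the prose are applied with google_project_iam_member resources. A minimal sketch (append to main.tf; the role list mirrors the prose above, so trim or extend it for your project):

```hcl
# Grant the agent's service account only the roles it needs.
resource "google_project_iam_member" "agent_roles" {
  for_each = toset([
    "roles/aiplatform.user",
    "roles/logging.logWriter",
    "roles/storage.objectAdmin",
  ])
  project = var.project_id
  role    = each.key
  member  = "serviceAccount:${google_service_account.agent_sa.email}"
}
```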

4. Secret Management: This handles your API keys securely. It creates secrets in Google Secret Manager and gives the agent's Service Account permission to access them at runtime.

resource "google_secret_manager_secret" "agent_secrets" {
  project  = var.project_id
  for_each = toset(keys(var.secrets))
  secret_id = each.key
  replication {
    auto {}
  }
  depends_on = [google_project_service.services]
}
resource "google_secret_manager_secret_version" "agent_secrets_version" {
  for_each    = toset(keys(var.secrets))
  secret      = google_secret_manager_secret.agent_secrets[each.key].id
  secret_data = var.secrets[each.key]
}
resource "google_secret_manager_secret_iam_member" "secret_accessor" {
  project  = var.project_id
  for_each = toset(keys(var.secrets))
  secret_id = google_secret_manager_secret.agent_secrets[each.key].id
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${google_service_account.agent_sa.email}"
}

5. Cloud Run Configuration:

Security Best Practice: To satisfy production security standards, our main.tf grants the Service Account the secretmanager.secretAccessor role. Our Python application then uses the Secret Manager SDK to pull these credentials directly into local memory at runtime, ensuring they never touch the container's environment configuration.

# 5. Cloud Run Service Deployment
resource "google_cloud_run_v2_service" "default" {
  project  = var.project_id
  name     = var.service_name
  location = var.region
  ingress  = "INGRESS_TRAFFIC_ALL"

  template {
    service_account = google_service_account.agent_sa.email

    containers {
      image = "us-docker.pkg.dev/cloudrun/container/hello" # Placeholder until first build

      env {
        name  = "GOOGLE_CLOUD_PROJECT"
        value = var.project_id
      }
      env {
        name  = "GOOGLE_CLOUD_LOCATION"
        value = "global"
      }
      env {
        name  = "GOOGLE_GENAI_USE_VERTEXAI"
        value = "True"
      }
      env {
        name  = "AI_ASSETS_BUCKET"
        value = var.ai_assets_bucket
      }

      resources {
        limits = {
          cpu    = "1"
          memory = "2Gi"
        }
      }
    }
  }

  traffic {
    type    = "TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST"
    percent = 100
  }
}
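Optionally, you can surface the service URL as a Terraform output so it is printed after terraform apply. This outputs.tf file is a small convenience addition, not part of the original walkthrough:

```hcl
# deployment/terraform/outputs.tf
output "service_url" {
  description = "URL of the deployed Cloud Run service"
  value       = google_cloud_run_v2_service.default.uri
}
```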

Provision the Infrastructure

Before we can deploy our code, we need to provision the Google Cloud infrastructure we just defined.

Initialize Terraform: This downloads the necessary provider plugins. Run this in the deployment/terraform folder:

terraform init

Create a Variables File:

Paste this code in deployment/terraform/terraform.tfvars and update it with your project details and secrets.

project_id       = "your-project-id"
region           = "us-central1"
service_name     = "dev-signal"
ai_assets_bucket = "your-bucket-name"
secrets = {
  REDDIT_CLIENT_ID     = "your_client_id"
  REDDIT_CLIENT_SECRET = "your_client_secret"
  REDDIT_USER_AGENT    = "your_user_agent"
  DK_API_KEY           = "your_dk_api_key"
}

Plan Configuration: This allows you to review the changes before they are applied. Run this in the deployment/terraform folder:

terraform plan -out=plan.tfplan

Apply Configuration: Once you have reviewed the plan and confirmed it does what you want, run:

terraform apply plan.tfplan

Deployment: Containerization and the Cloud Build Pipeline

In this final stage of the build process, we package our agent's "body" and "brain" into a portable, production-ready container. This ensures that every component - from our Python logic to the Node.js environment required for the Reddit MCP tool - is bundled together with its exact dependencies.

We utilize a Dockerfile to define this environment and a Makefile to orchestrate the deployment pipeline. When you trigger the deployment, Google Cloud Build takes your local source code, builds the container image according to the Dockerfile, and stores it in the private Artifact Registry created earlier by Terraform. Finally, the pipeline automatically updates your Cloud Run service to serve traffic using this fresh image, completing the journey from local code to a live, secure cloud workload.

Paste this code in dev-signal/Dockerfile:

FROM python:3.12-slim

# Install Node.js and npm for MCP tools (like reddit-mcp)
RUN apt-get update && apt-get install -y \
    curl \
    && curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
    && apt-get install -y nodejs \
    && npm install -g reddit-mcp \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir uv==0.8.13

WORKDIR /code

COPY ./pyproject.toml ./README.md ./uv.lock* ./
COPY ./dev_signal_agent ./dev_signal_agent

RUN uv sync --frozen

EXPOSE 8080

CMD ["uv", "run", "uvicorn", "dev_signal_agent.fast_api_app:app", "--host", "0.0.0.0", "--port", "8080"]

The Makefile automates the build and deploy steps.

Paste this code in dev-signal/Makefile:

PROJECT_ID ?= $(shell gcloud config get-value project)
REGION     ?= us-central1
IMAGE_REPO ?= dev-signal-repo
IMAGE := $(REGION)-docker.pkg.dev/$(PROJECT_ID)/$(IMAGE_REPO)/agent:latest

# Deploy via Cloud Build & Container
docker-deploy:
	@echo "Building and deploying to $(PROJECT_ID) via Cloud Build..."
	gcloud builds submit --tag $(IMAGE) --project $(PROJECT_ID) .
	gcloud run services update dev-signal \
	    --image $(IMAGE) \
	    --region $(REGION) \
	    --project $(PROJECT_ID) \
	    --labels dev-tutorial=dev-signal-agent

Deploy Application

Now that our infrastructure is ready, we can build and deploy the application code.

Run the following command from the root of your project:

make docker-deploy

What happens when you run this?

  1. Build: Google Cloud Build takes your local code and the Dockerfile, builds a container image, and stores it in the Artifact Registry.
  2. Deploy: It updates the Cloud Run service defined in Terraform to use this new image.

When the deployment completes, you should get a message like this:

Service [dev-signal] revision [dev-signal...] has been deployed and is serving 100 percent of traffic.

Service URL: https://dev-signal-...-.us-central1.run.app

Verification: Accessing and Testing Your Deployed Agent

Since production services are private by default, this section covers how to grant permissions and access the agent securely.

Managing IAM Permissions: Granting the necessary run.invoker role to authorized users.

Secure Access via Cloud Run Proxy: Using the gcloud proxy to interact with your live service.

Granting User Permissions

Before you can invoke the service, you must grant your Google account the roles/run.invoker role for this specific service. Run the following command:

gcloud run services add-iam-policy-binding dev-signal \
  --member="user:$(gcloud config get-value account)" \
  --role="roles/run.invoker" \
  --region=us-central1 \
  --project=$(gcloud config get-value project)

Launch the Proxy

Now, access your private service securely via the proxy:

gcloud run services proxy dev-signal \
  --region us-central1 \
  --project $(gcloud config get-value project)

Visit http://localhost:8080 to chat with your deployed agent! See a possible test scenario in part 3 of the series.

Summary

Congratulations! You have successfully built Dev Signal.

What we covered:

  1. Tooling (MCP): You connected your agent to Reddit, Google Cloud Docs, and a Local Image Generator using the Model Context Protocol.
  2. Architecture: You implemented a Root Orchestrator managing specialized agents (Scanner, Expert, Drafter).
  3. Memory: You integrated Vertex AI memory bank to give your agent long-term persistence across sessions.
  4. Production: You deployed the entire stack to Google Cloud Run using Terraform for secure, reproducible infrastructure.

You now have a solid foundation for building sophisticated, stateful AI applications on Google Cloud.
