DEV Community

Sara Ford for Google AI

The Ultimate Cloud Run Guide 2026

At Cloud Next '26, my teammate Wietse and I gave a talk called the Ultimate Guide to Cloud Run. We wanted to provide a comprehensive walkthrough of Cloud Run to experienced developers who know how to ship software, but might be new to Cloud Run.

This post recaps the Cloud Run fundamentals we went over in our talk. Each segment below has a link to the timestamp in the talk along with examples for you to try at home.

  1. Getting started - deploy a container
  2. Autoscaling / audience participation demo
  3. How gcloud Works Under the Hood
  4. Overview of Cloud Run resources
  5. Reliable Rollouts and Preview Links
  6. Structured Logging
  7. Troubleshooting: "container failed to start on port 8080"
  8. Google Cloud Developer Knowledge MCP server
  9. How to avoid hard-coded API Keys
  10. Ephemeral Disks
  11. Volume mounts
  12. VPC Networking
  13. Two Pricing Models
  14. Scale-to-zero GPUs
  15. Deploying an ADK agent

Prefer to watch the full talk instead? You can watch it here.


1. Getting started - deploy a container

Where to watch:

Code examples:

Recap:

  • Cloud Run is Google Cloud's serverless engine. With Cloud Run you can run any container, on demand, without any infrastructure management. No VMs or clusters to manage.

Hello Cloud Run!

Below is the sample hello container running on Cloud Run.

Don't have a container? It's all good! You can deploy straight from your source code. See section 3, How gcloud Works Under the Hood.


2. Autoscaling / audience participation demo

Where to watch:

Discussion:

Docs:

Recap:

  • Minimum Instances: You can configure minimum instances to keep containers pre-warmed, so requests served by those instances avoid "cold start" latency.
  • Maximum Instances: Set a hard limit on scaling to act as a budget safeguard and protect backend databases from being overwhelmed.
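
To make the effect of those two settings concrete, here is a deliberately simplified sketch. It assumes instance count is driven only by concurrent requests and a per-instance concurrency limit; the real autoscaler weighs more signals than this, and the function name and default values are made up for illustration.

```python
import math


def estimate_instances(concurrent_requests, concurrency, min_instances=0, max_instances=100):
    """Rough model: instances needed for the load, clamped to [min, max].

    Assumption: this ignores CPU utilization and every other signal the
    real autoscaler uses. Defaults are illustrative, not Cloud Run defaults.
    """
    if concurrency <= 0:
        raise ValueError("concurrency must be positive")
    needed = math.ceil(concurrent_requests / concurrency)
    # Minimum instances keep a warm floor; maximum instances cap the spend.
    return max(min_instances, min(needed, max_instances))


if __name__ == "__main__":
    for load in (0, 250, 5000):
        print(load, "->", estimate_instances(load, concurrency=80, min_instances=1, max_instances=10))
```

With a minimum of 1 the service never drops to zero instances, and with a maximum of 10 a traffic spike cannot scale past what your database (or budget) can take.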

3. How gcloud Works Under the Hood

Where to watch

Code examples:

Recap: When you type the simple command gcloud run deploy to deploy from source, gcloud performs the following steps for you:

  1. Upload: Your local directory is uploaded to a Google Cloud Storage (GCS) bucket.
  2. Build: Cloud Build takes over. If you have a Dockerfile, it runs a docker build. If not, it uses open-source Buildpacks to automatically detect your language and compile a container image.
  3. Store: The completed container image is pushed to Artifact Registry.
  4. Create: Cloud Run spins up a new Revision (a read-only, immutable copy of your container and its settings).
  5. Migrate: Once the new revision passes its startup probe (confirming it is healthy), Cloud Run seamlessly migrates 100% of web traffic over to it.
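
The Build step's decision can be sketched in a few lines. This is only an illustration of the rule described above (Dockerfile present or not); `choose_build_strategy` is a hypothetical helper, not part of gcloud or Cloud Build.

```python
from pathlib import Path


def choose_build_strategy(source_dir):
    """Return which build path step 2 would take for a source directory.

    Hypothetical helper: mirrors the rule "Dockerfile if present,
    otherwise Buildpacks" and nothing more.
    """
    if (Path(source_dir) / "Dockerfile").is_file():
        return "dockerfile"
    return "buildpacks"


if __name__ == "__main__":
    print(choose_build_strategy("."))
```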

But what about other resources besides services...


4. Overview of Cloud Run resources

Where to watch

Code examples:

Recap:

  • Services:

    • Purpose: Best for web applications, APIs, and microservices.
    • Features: Automatically scales instances up or down based on incoming traffic. Includes out-of-the-box HTTPS, traffic splitting, and support for WebSockets, gRPC, and HTTP/2.
  • Jobs:

    • Purpose: Best for tasks that run to completion and do not require an active web endpoint.
    • Features: Runs for up to 7 days. Excellent for data processing, database migrations, or nightly scripts. You can parallelize a large job into multiple concurrent tasks.
    • more Jobs codelab examples
  • Worker Pools:

    • Purpose: Best for continuous background tasks.
    • Features: Always-on instances that actively pull for work (e.g., listening to a message queue). Scaling is handled manually.
    • more Worker Pools codelab examples
  • Functions:

    • Purpose: Best for single-purpose pieces of code (e.g., responding to a file upload event).
    • Features: Supports popular runtimes like Python, Node.js, Go, and Java. No Dockerfile is required; Google automatically builds the container for you from source.
    • more Functions codelab examples
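
As a sketch of how a parallel job can split its work: each task of a Cloud Run job receives the CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT environment variables, so a task can pick its own slice of the input. The `shard_for_task` helper and the sample items are assumptions for illustration.

```python
import os


def shard_for_task(items, environ=os.environ):
    """Return the items this task should process.

    Uses the task index and task count that Cloud Run sets for each job
    task. Falls back to a single task (index 0 of 1) when run locally.
    """
    index = int(environ.get("CLOUD_RUN_TASK_INDEX", 0))
    count = int(environ.get("CLOUD_RUN_TASK_COUNT", 1))
    # Round-robin assignment: item i goes to task i % count.
    return [item for i, item in enumerate(items) if i % count == index]


if __name__ == "__main__":
    files = [f"file-{n}.csv" for n in range(10)]  # hypothetical work items
    for name in shard_for_task(files):
        print("processing", name)
```

Round-robin keeps the slices balanced without the tasks needing to talk to each other.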

5. Reliable Rollouts and Preview Links

Where to watch

Code example:

Recap: Cloud Run versioning relies on immutable revisions:

  • Zero-Downtime Updates: The new version scales up fully before traffic begins migrating over.
  • Rollbacks: If a bug gets into production, you can roll back traffic to any healthy previous revision.
  • Preview Links (Traffic Tags):
    • Instead of auto-deploying to the public, you can pin production traffic to your stable revision
    • Apply a "traffic tag" to your latest test revision
    • This generates a private preview URL (e.g., https://latest---[your-service].run.app) where you can test changes.
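
One way to picture the preview-link setup is as a small traffic table: the stable revision holds 100% of traffic and the newest revision holds 0% plus a tag. The sketch below models that with plain dictionaries. The dictionary shape, the `validate_traffic` helper, and the revision names are all illustrative assumptions, not the Cloud Run API.

```python
def validate_traffic(targets):
    """Check a traffic table and return a mapping of tag to revision.

    Illustrative model only: each target is a dict with "revision",
    "percent", and an optional "tag".
    """
    total = sum(t.get("percent", 0) for t in targets)
    if total != 100:
        raise ValueError(f"traffic percentages must sum to 100, got {total}")
    tags = [t["tag"] for t in targets if "tag" in t]
    if len(tags) != len(set(tags)):
        raise ValueError("each tag may point at only one revision")
    return {t["tag"]: t["revision"] for t in targets if "tag" in t}


if __name__ == "__main__":
    traffic = [
        {"revision": "hello-00001-abc", "percent": 100},               # stable, serves the public
        {"revision": "hello-00002-xyz", "percent": 0, "tag": "latest"},  # reachable only via its tag URL
    ]
    print(validate_traffic(traffic))
```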

6. Structured Logging

Where to watch

Recap:

  • No logging libraries required: To log, simply write standard text directly to stdout or stderr.
  • Use Structured Logging: Write your logs in JSON format. This allows you to easily run queries in Cloud Logging for custom fields (such as jsonPayload.user_id = "12345").
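
A minimal sketch of structured logging with nothing but the standard library: one JSON object per line on stdout. The `severity` and `message` keys are the special fields Cloud Logging recognizes; `user_id` is an example custom field, and the `log` helper is a made-up name.

```python
import json
import sys


def log(severity, message, **fields):
    """Write one JSON log line to stdout and return the entry.

    Extra keyword arguments become custom fields, which Cloud Logging
    exposes under jsonPayload (for example jsonPayload.user_id).
    """
    entry = {"severity": severity, "message": message, **fields}
    print(json.dumps(entry), file=sys.stdout, flush=True)
    return entry


if __name__ == "__main__":
    log("INFO", "user signed in", user_id="12345")
    log("ERROR", "payment failed", user_id="12345", order_id="A-1")
```

Keeping each entry on a single line matters, since every line written to stdout becomes its own log entry.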

7. Troubleshooting: "container failed to start on port 8080"

Where to watch

Code example:

  • Deploy hello container but set port to 8081 🙂

Recap:

  • How to troubleshoot the "The container failed to start on port 8080" error:
    • This means that Cloud Run couldn't start the container. This could be for many reasons.
    • First, scroll up in the logs and look for crashes in your startup code.
    • Verify your code is actually listening on the port designated by the PORT environment variable.
    • If your app loads large machine learning models or opens database connections at startup, move that work out of the immediate startup path (for example, load it lazily) so the container starts listening before the startup timeout.
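
For reference, here is a minimal sketch of a server that honors the PORT environment variable, using only the Python standard library. The handler and helper names are made up for this example; any web framework works the same way as long as it binds to that port.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello Cloud Run!\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def resolve_port(environ=os.environ, default=8080):
    """Read the port from the PORT variable, falling back to 8080 locally."""
    return int(environ.get("PORT", default))


def make_server(environ=os.environ):
    # Bind to 0.0.0.0 rather than 127.0.0.1 so traffic from outside
    # the container can reach the server.
    return HTTPServer(("0.0.0.0", resolve_port(environ)), HelloHandler)


def main():
    """Container entrypoint: blocks and serves until the instance stops."""
    make_server().serve_forever()
```

Call `main()` from your container's entrypoint. Hard-coding a port that differs from PORT is exactly what produces the error above.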

8. Google Cloud Developer Knowledge MCP server (in public preview)

Where to watch

Example:

Recap:

  • Great for keeping your agent up to date past its training data cutoff.

Looking for other MCP use cases?


9. How to avoid hard-coded API Keys

Where to watch

Code examples:

Recap:

  • Application Default Credentials (ADC): Google client libraries automatically search for local credentials to handle authentication to Google's APIs for you.
  • Cloud Run Identity: Assign a dedicated Service Account to your Cloud Run service. The client libraries will automatically request credentials from the metadata server to access APIs (like Firestore or Gemini).
  • Localhost Development: Run gcloud auth application-default login on your machine. Local client libraries will then use your personal developer identity (the account shown by gcloud auth list), or you can add --impersonate-service-account <service-account> to act as a service account instead.
  • Secret Manager: For when you absolutely need to store keys, such as database passwords or third-party API keys. You can expose secrets to Cloud Run directly as environment variables or volume mounts (or use the Secret Manager client library).
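
On the application side, consuming a secret then looks like reading configuration rather than hard-coding a key. The sketch below checks an environment variable first and a mounted file second. The variable name DB_PASSWORD, the file path, and the `load_secret` helper are assumptions for illustration.

```python
import os
from pathlib import Path


def load_secret(env_var, file_path=None, environ=os.environ):
    """Return a secret from an environment variable or a mounted file.

    Hypothetical helper: the names passed in are whatever you configured
    when exposing the secret to your service.
    """
    value = environ.get(env_var)
    if value:
        return value
    if file_path and Path(file_path).is_file():
        # Secrets mounted as volumes show up as ordinary files.
        return Path(file_path).read_text(encoding="utf-8").strip()
    raise RuntimeError(f"secret {env_var!r} is not configured")


if __name__ == "__main__":
    try:
        password = load_secret("DB_PASSWORD", "/secrets/db-password")  # hypothetical names
        print("secret loaded, length", len(password))
    except RuntimeError as err:
        print(err)
```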

10. Ephemeral Disks

Where to watch

Code example:

Recap:

  • Ephemeral Disks: Local, high-speed temporary storage. It lives and dies with the container instance and lets you process large scratch files without consuming your instance's memory.
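
A sketch of the scratch-file pattern: stream data to disk chunk by chunk so only one chunk sits in memory at a time. The SCRATCH_DIR variable and the helper name are assumptions; point the directory at wherever you mounted the disk. The fallback to the system temp directory is only there so the sketch runs locally.

```python
import os
import tempfile


def write_scratch_file(chunks, scratch_dir=None):
    """Stream byte chunks to a temporary file and return (path, bytes_written)."""
    # SCRATCH_DIR is a hypothetical variable naming the disk's mount path.
    scratch_dir = scratch_dir or os.environ.get("SCRATCH_DIR") or tempfile.gettempdir()
    fd, path = tempfile.mkstemp(dir=scratch_dir, suffix=".scratch")
    total = 0
    with os.fdopen(fd, "wb") as f:
        for chunk in chunks:
            f.write(chunk)  # only the current chunk is held in memory
            total += len(chunk)
    return path, total


if __name__ == "__main__":
    data = (b"x" * 1024 for _ in range(4))
    path, size = write_scratch_file(data)
    print(path, size)
    os.remove(path)
```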

11. Volume mounts

Where to watch

Code example:

Recap:

  • Volume Mounts: Mount a Cloud Storage bucket or an NFS file share directly as if it were a local directory.
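
"As if it were a local directory" means ordinary file I/O is all your code needs. A small sketch, where the mount path and helper names are hypothetical:

```python
from pathlib import Path


def save_report(mount_dir, name, text):
    """Write a text file under the mounted directory and return its path."""
    target = Path(mount_dir) / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target


def list_reports(mount_dir):
    """Return the sorted file names directly under the mounted directory."""
    return sorted(p.name for p in Path(mount_dir).iterdir() if p.is_file())


if __name__ == "__main__":
    import tempfile

    # Locally a temp directory stands in for the mount path (e.g. a
    # hypothetical /mnt/reports configured on the service).
    with tempfile.TemporaryDirectory() as mount:
        save_report(mount, "daily.txt", "all good\n")
        print(list_reports(mount))
```

The same code runs unchanged on your laptop and on Cloud Run; only the directory it points at differs.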

12. VPC Networking

Where to watch

Code example:

Recap:

  • Direct VPC Egress: Send outbound traffic directly into your internal Google Cloud VPC network. No Serverless VPC Access connector required.
  • Secure Private Backends: You have two options:
    • Option 1 (IAM Authentication): Require authentication and grant the run.invoker role to your frontend's service account.
    • Option 2 (VPC Routing): Configure your backend to exclusively accept traffic originating from your VPC network.
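
For Option 1, the frontend has to attach an identity token to each call. Google's auth libraries can do this for you; the sketch below shows roughly what happens underneath, using only the standard library: ask the metadata server for an ID token whose audience is the backend URL, then send it as a Bearer token. The helper names and the example URL are assumptions, and `call_backend` only works where the metadata server is reachable (that is, on Google Cloud).

```python
import urllib.parse
import urllib.request

METADATA_IDENTITY_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/identity"
)


def build_identity_request(audience):
    """Build the metadata-server request for an ID token."""
    query = urllib.parse.urlencode({"audience": audience})
    return urllib.request.Request(
        f"{METADATA_IDENTITY_URL}?{query}",
        headers={"Metadata-Flavor": "Google"},
    )


def build_backend_request(backend_url, id_token):
    """Build the authenticated request to the private backend."""
    return urllib.request.Request(
        backend_url, headers={"Authorization": f"Bearer {id_token}"}
    )


def call_backend(backend_url):
    """Fetch a token, then call the backend. Requires the metadata server."""
    with urllib.request.urlopen(build_identity_request(backend_url), timeout=5) as resp:
        token = resp.read().decode("utf-8")
    with urllib.request.urlopen(build_backend_request(backend_url, token), timeout=10) as resp:
        return resp.read()
```

The backend accepts the call only if the frontend's service account holds the run.invoker role on it.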

13. Two Pricing Models

Where to watch

Recap:

Cloud Run offers two pricing models so you can optimize your spending.

  1. Request-based pricing (default)

    • Pay for container instance time (CPU and memory)
    • No charge for idle instances (instances that are not handling requests)
    • CPU is throttled while an instance is idle
    • There's a fee per request
    • Minimum instances are charged a lot* less than the full rate when idle
  2. Instance-based pricing

    • Pay for container instance time (CPU and memory), also when idle
    • Idle instances and minimum instances are charged at full rate
    • Idle instances shut down after a maximum of 15 minutes
    • No per-request fee
    • Pay less* for instance time when compared with request-based pricing

* See Cloud Run Pricing docs page for more details

Note: Cloud Run continuously analyzes your actual traffic patterns and will automatically recommend switching to Instance-based if it will save you money.
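
To see how the trade-off plays out, here is a back-of-the-envelope comparison. Every rate in it is a made-up placeholder, not a real Cloud Run price, and the model ignores details such as minimum-instance idle charges and free tiers. It only captures the shape: request-based bills busy time plus a per-request fee, while instance-based bills the whole instance lifetime at a lower rate.

```python
def request_based_cost(busy_seconds, requests, rate_per_second, fee_per_request):
    """Cost when billed only while handling requests, plus a per-request fee."""
    return busy_seconds * rate_per_second + requests * fee_per_request


def instance_based_cost(instance_seconds, rate_per_second):
    """Cost when billed for the full instance lifetime, idle time included."""
    return instance_seconds * rate_per_second


def cheaper_model(busy_seconds, instance_seconds, requests,
                  request_rate, request_fee, instance_rate):
    """Return the name of the cheaper model under the given placeholder rates."""
    a = request_based_cost(busy_seconds, requests, request_rate, request_fee)
    b = instance_based_cost(instance_seconds, instance_rate)
    return "request-based" if a <= b else "instance-based"


if __name__ == "__main__":
    # Placeholder numbers only: a steadily busy service vs. a mostly idle one.
    print(cheaper_model(100, 200, 1000, 0.01, 0.001, 0.008))
    print(cheaper_model(10, 200, 100, 0.01, 0.001, 0.008))
```

Plug in the numbers from the pricing page and your own traffic to get a real answer; the steadier and busier the service, the more instance-based tends to win.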

Pricing comparison chart between Request-based and Instance-based


14. Scale-to-zero GPUs

Where to watch

Code examples:

Recap:

  • Cloud Run fully supports scale-to-zero GPUs.
  • Access NVIDIA L4 and NVIDIA RTX PRO 6000 Blackwell GPUs.
  • Use them for fine-tuning models via Cloud Run Jobs, or for hosting lightweight open-source models like Google's Gemma.

15. Deploying an ADK agent

Where to watch

Code examples:

Docs:
