At Cloud Next '26, my teammate Wietse and I gave a talk called the Ultimate Guide to Cloud Run. We wanted to provide a comprehensive walkthrough of Cloud Run to experienced developers who know how to ship software, but might be new to Cloud Run.
This post recaps the Cloud Run fundamentals we went over in our talk. Each segment below has a link to the timestamp in the talk along with examples for you to try at home.
- Getting started - deploy a container
- Autoscaling / audience participation demo
- How gcloud Works Under the Hood
- Overview of Cloud Run resources
- Reliable Rollouts and Preview Links
- Structured Logging
- Troubleshooting: "container failed to start on port 8080"
- Google Cloud Developer Knowledge MCP server
- How to avoid hard-coded API Keys
- Ephemeral Disks
- Volume mounts
- VPC Networking
- Two Pricing Models
- Scale-to-zero GPUs
- Deploying an ADK agent
Prefer to watch the full talk instead? You can watch it here.
1. Getting started - deploy a container
Where to watch:
Code examples:
Recap:
- Cloud Run is Google Cloud's serverless engine. With Cloud Run you can run any container, on demand, without any infrastructure management. No VMs or clusters to manage.
Hello Cloud Run!
Below is the sample hello container running on Cloud Run.
Don't have a container? It's all good! You can deploy from your source code. See section 3, How gcloud Works Under the Hood.
2. Autoscaling / audience participation demo
Where to watch:
- Demo 2: watch at 6:10. Each audience member who scans the QR code gets their own container.
Discussion:
Docs:
Recap:
- Minimum Instances: You can configure minimum instances to keep containers pre-warmed, so requests served by those instances avoid "cold start" latency.
- Maximum Instances: Set a hard limit on scaling to act as a budget safeguard and protect backend databases from being overwhelmed.
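To build intuition for how the two limits interact, here is a back-of-the-envelope sketch. It assumes the number of instances is roughly the number of concurrent requests divided by the per-instance concurrency setting, clamped between the minimum and maximum. This is a simplification and not Cloud Run's actual autoscaling algorithm; `estimate_instances`, its parameters, and its default values are hypothetical names and numbers chosen for illustration.

```python
import math


def estimate_instances(concurrent_requests: int, concurrency: int,
                       min_instances: int = 0, max_instances: int = 100) -> int:
    """Rough instance estimate for intuition; NOT Cloud Run's real autoscaler."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    needed = math.ceil(concurrent_requests / concurrency)
    # Minimum instances set a warm floor; maximum instances cap the ceiling.
    return max(min_instances, min(needed, max_instances))


print(estimate_instances(0, 80))                      # no traffic, no floor
print(estimate_instances(0, 80, min_instances=2))     # no traffic, warm floor
print(estimate_instances(1000, 80, max_instances=5))  # heavy traffic, capped
```

With a concurrency of 1, as in the audience demo, every simultaneous request maps to its own instance until the maximum is reached.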
3. How gcloud Works Under the Hood
Where to watch
Code examples:
- Quickstart: Build and deploy a Go web app to Cloud Run (or one of the other buildpack-supported languages)
- deploy from source example from talk
Recap: When you type the simple command gcloud run deploy to deploy from source, gcloud performs the following steps for you:
- Upload: Your local directory is safely uploaded to a secure Google Cloud Storage (GCS) bucket.
- Build: Cloud Build takes over. If you have a Dockerfile, it runs a docker build. If not, it uses open-source Buildpacks to automatically detect your language and compile a container image.
- Store: The completed container image is pushed to Artifact Registry.
- Create: Cloud Run spins up a new Revision (a read-only, immutable copy of your container and its settings).
- Migrate: Once the new revision passes its startup probe (confirming it is healthy), Cloud Run seamlessly migrates 100% of web traffic over to it.
But what about other resources besides services...
4. Overview of Cloud Run resources
Where to watch
Code examples:
Recap:
- Services:
  - Purpose: Best for web applications, APIs, and microservices.
  - Features: Automatically scales instances up or down based on incoming traffic. Includes out-of-the-box HTTPS, traffic splitting, and support for WebSockets, gRPC, and HTTP/2.
- Jobs:
  - Purpose: Best for tasks that run to completion and do not require an active web endpoint.
  - Features: Runs for up to 7 days. Excellent for data processing, database migrations, or nightly scripts. You can parallelize a large job into multiple concurrent tasks.
  - more Jobs codelab examples
- Worker Pools:
  - Purpose: Best for continuous background tasks.
  - Features: Always-on instances that actively pull work (e.g., listening to a message queue). Scaling is handled manually.
  - more Worker Pools codelab examples
- Functions:
  - Purpose: Best for single-purpose pieces of code (e.g., responding to a file upload event).
  - Features: Supports popular runtimes like Python, Node.js, Go, and Java. No Dockerfile is required; Google automatically builds the container for you from source.
  - more Functions codelab examples
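To make the "multiple concurrent tasks" idea for Jobs concrete, here is a minimal sketch of how one task might pick its share of the work. It relies on the CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT environment variables that Cloud Run sets for each task of a job; the `shard` helper and the list of file names are hypothetical and only there for illustration.

```python
import os


def shard(items: list, task_index: int, task_count: int) -> list:
    """Return the items this task should process (round-robin split)."""
    return [item for i, item in enumerate(items) if i % task_count == task_index]


def main() -> None:
    # Set by Cloud Run for each task of a job; the defaults allow local runs.
    task_index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    task_count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))
    work = [f"file-{n}.csv" for n in range(10)]  # hypothetical work items
    for item in shard(work, task_index, task_count):
        print(f"task {task_index} of {task_count} processing {item}")


if __name__ == "__main__":
    main()
```

Because every task computes its slice from the same two values, no coordination between tasks is needed.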
5. Reliable Rollouts and Preview Links
Where to watch
- Demo 3 cont'd from talk: 10:10
- Demo 4 from talk part 1 at 12:48
- Demo 4 from talk part 2 at 16:25
Code example:
Recap: Cloud Run versioning relies on immutable revisions:
- Zero-Downtime Updates: The new version scales up fully before traffic begins migrating over.
- Rollbacks: If a bug gets into production, you can roll back traffic to any healthy previous revision.
- Preview Links (Traffic Tags):
  - Instead of auto-deploying to the public, you can pin production traffic to your stable revision.
  - Apply a "traffic tag" to your latest test revision.
  - This generates a private preview URL (e.g., https://latest---[your-service].run.app) where you can test changes.
6. Structured Logging
Where to watch
Recap:
- No logging libraries required: To log, simply write standard text directly to stdout or stderr.
- Use Structured Logging: Write your logs in JSON format. This allows you to easily run queries in Cloud Logging for custom fields (such as jsonPayload.user_id = "12345").
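As a minimal sketch of the two points above, the snippet below writes one JSON object per line to stdout using only the standard library. The `severity` and `message` keys are fields Cloud Logging treats specially; the `log` helper itself and the `user_id` field are illustrative names, not part of any library.

```python
import json
import sys


def log(severity: str, message: str, **fields) -> str:
    """Write one JSON object per line to stdout and return the line."""
    entry = {"severity": severity, "message": message, **fields}
    line = json.dumps(entry)
    print(line, file=sys.stdout, flush=True)
    return line


# The extra field becomes queryable as jsonPayload.user_id.
log("INFO", "checkout completed", user_id="12345")
```

Keeping each entry on a single line matters, since each line written to stdout is treated as one log entry.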
7. Troubleshooting: "container failed to start on port 8080"
Where to watch
Code example:
- Deploy hello container but set port to 8081 🙂
Recap:
- How to troubleshoot the "container failed to start on port 8080" error:
  - This means that Cloud Run couldn't start the container, which can happen for many reasons.
  - First, scroll up in the logs to look for crashes in your startup code.
  - Verify your code is actually listening on the port designated by the PORT environment variable.
  - If your app loads large machine learning models or opens database connections at startup, move that work out of the immediate container startup scope to prevent timeouts.
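For reference, here is a minimal sketch of a server that follows the PORT contract, using only Python's standard library. The names `Hello`, `get_port`, `make_server`, and `serve` are illustrative, and a production app would more likely use a real web framework.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


class Hello(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello Cloud Run!\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def get_port() -> int:
    # Cloud Run injects PORT; fall back to 8080 for local runs.
    return int(os.environ.get("PORT", "8080"))


def make_server() -> HTTPServer:
    # Bind to 0.0.0.0 rather than 127.0.0.1 so traffic from outside
    # the container can reach the process.
    return HTTPServer(("0.0.0.0", get_port()), Hello)


def serve() -> None:
    make_server().serve_forever()
```

Call `serve()` from your entrypoint to start listening. Hard-coding a port, or binding only to localhost, are two common ways to trigger the error above.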
8. Google Cloud Developer Knowledge MCP server (in public preview)
Where to watch
Example:
Recap:
- Great for keeping your agent up to date past its training data cutoff.
Looking for other MCP use cases?
- use the Cloud Run MCP server to deploy your apps
- host your own MCP server on Cloud Run
9. How to avoid hard-coded API Keys
Where to watch
Code examples:
- blog post on local development and ADC (works the same for Cloud Run)
- How to upload and serve images using Cloud Storage, Firestore and Cloud Run | Google Codelabs
- how to use secret manager
Recap:
- Application Default Credentials (ADC): Google client libraries automatically search for local credentials to handle authentication to Google's APIs for you.
- Cloud Run Identity: Assign a dedicated Service Account to your Cloud Run service. The client libraries will automatically request credentials from the metadata server to access APIs (like Firestore or Gemini).
- Localhost Development: Run gcloud auth application-default login on your machine. Local client libraries will securely use your personal developer identity (e.g., gcloud auth list), or you can add --impersonate-service-account <service-account> to act as a service account instead.
- Secret Manager: For when you absolutely need to save keys. Store database passwords or third-party API keys securely. You can mount them into Cloud Run directly as environment variables or volume mounts (or use the Secret Manager client library).
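On the application side, reading a mounted secret needs no special library. The sketch below looks for a secret first in an environment variable and then in a file under a mount directory. The `read_secret` helper, the /secrets mount path, and the DB_PASSWORD name are all hypothetical; use whatever names and paths you configured on your service.

```python
import os
from pathlib import Path


def read_secret(name: str, mount_dir: str = "/secrets") -> str:
    """Return a secret from an env var, else from a file in mount_dir."""
    value = os.environ.get(name)
    if value is not None:
        return value
    path = Path(mount_dir) / name
    if path.is_file():
        return path.read_text().strip()
    raise KeyError(f"secret {name!r} not found in environment or {mount_dir}")
```

For example, `read_secret("DB_PASSWORD")` would work unchanged whether the secret was exposed as an environment variable or as a file, which keeps the deployment choice out of your code.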
10. Ephemeral Disks
Where to watch
Code example:
Recap:
- Ephemeral Disks: Local, high-speed temporary storage. An ephemeral disk lives and dies with the container instance and allows you to process large scratch files without consuming your instance's memory.
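A small sketch of the scratch-file pattern: stream incoming data to a file on the disk in fixed-size chunks so that memory use stays bounded no matter how large the input is. The `spool_to_disk` helper is a hypothetical name, and the scratch directory is assumed to be whatever path you mounted the ephemeral disk at.

```python
import shutil
import tempfile
from typing import BinaryIO


def spool_to_disk(source: BinaryIO, scratch_dir: str,
                  chunk_size: int = 1024 * 1024) -> str:
    """Copy a stream to a scratch file in chunks and return the file path."""
    with tempfile.NamedTemporaryFile(dir=scratch_dir, delete=False) as scratch:
        # copyfileobj reads chunk_size bytes at a time, so only one chunk
        # is held in memory regardless of the total size.
        shutil.copyfileobj(source, scratch, chunk_size)
        return scratch.name
```

The caller is responsible for deleting the file when done, although anything left behind disappears along with the instance.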
11. Volume mounts
Where to watch
Code example:
Recap:
- Cloud Storage Volume Mounts: Mount a Cloud Storage bucket or an NFS file share directly, as if it were a local directory.
12. VPC Networking
Where to watch
Code example:
Recap:
- Direct VPC Egress: Send outbound traffic directly into your internal Google Cloud VPC network. No Serverless VPC Access connector required.
- Secure Private Backends: You have two options:
  - Option 1 (IAM Authentication): Require authentication and grant the run.invoker role to your frontend's service account.
  - Option 2 (VPC Routing): Configure your backend to exclusively accept traffic originating from your VPC network.
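For Option 1, the frontend has to attach an identity token to each call. Here is a minimal standard-library sketch, assuming the code runs somewhere a metadata server is available (such as Cloud Run) and that the token audience is the backend's base URL. The function names are illustrative, and Google's auth libraries can do the same thing with caching and refresh handled for you.

```python
import urllib.parse
import urllib.request

METADATA_IDENTITY_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/identity"
)


def identity_token_request(audience: str) -> urllib.request.Request:
    """Build the metadata-server request for an ID token for `audience`."""
    query = urllib.parse.urlencode({"audience": audience})
    return urllib.request.Request(
        f"{METADATA_IDENTITY_URL}?{query}",
        headers={"Metadata-Flavor": "Google"},
    )


def call_backend(service_url: str, path: str = "/", timeout: float = 10.0) -> bytes:
    """Call an IAM-protected service; needs a reachable metadata server."""
    token_request = identity_token_request(service_url)
    with urllib.request.urlopen(token_request, timeout=timeout) as resp:
        token = resp.read().decode()
    request = urllib.request.Request(
        service_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read()
```

No key file is involved at any point: the token is issued for the service account attached to the calling service.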
13. Two Pricing Models
Where to watch
Recap:
Cloud Run offers two pricing models so you can optimize your spending.
- Request-based pricing (default)
  - Pay for container instance time (CPU and memory)
  - No charge for idle instances (instances that are not handling requests)
  - CPU is slowed down during idle
  - There's a fee per request
  - Minimum instances are charged a lot* less than the full rate when idle
- Instance-based pricing
  - Pay for container instance time (CPU and memory), also when idle
  - Idle instances and minimum instances are charged at full rate
  - Idle instances shut down after a maximum of 15 minutes
  - No per-request fee
  - Pay less* for instance time when compared with request-based pricing
* See Cloud Run Pricing docs page for more details
Note: Cloud Run continuously analyzes your actual traffic patterns and will automatically recommend switching to Instance-based if it will save you money.
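To see where the trade-off tips, here is a simplified comparison of the two models. The rates and traffic figures are made-up placeholders in arbitrary units, not real Cloud Run prices, and the model ignores free tiers and the reduced idle rate for minimum instances; check the pricing page for actual numbers. The function names are hypothetical.

```python
def request_based_cost(busy_seconds: float, requests: int,
                       rate_per_second: float, fee_per_request: float) -> float:
    """Instance time while handling requests, plus a per-request fee."""
    return busy_seconds * rate_per_second + requests * fee_per_request


def instance_based_cost(lifetime_seconds: float, rate_per_second: float) -> float:
    """Instance time for the whole lifetime, idle included; no request fee."""
    return lifetime_seconds * rate_per_second


# Made-up numbers for illustration only; NOT real Cloud Run prices.
busy, lifetime, requests = 20_000, 86_400, 1_000_000
print("request-based: ", request_based_cost(busy, requests, 3.0, 0.05))
print("instance-based:", instance_based_cost(lifetime, 2.0))
```

The general shape is what matters: the busier your instances and the more requests they serve, the more attractive instance-based pricing becomes.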
14. Scale-to-Zero GPUs
Where to watch
Code examples:
Recap:
- Cloud Run fully supports scale-to-zero GPUs.
- Access NVIDIA L4 and NVIDIA RTX PRO 6000 Blackwell GPUs.
- Used for fine-tuning models via Cloud Run Jobs, or hosting lightweight open-source models like Google's Gemma.
