This is Cloud Run: A Decision Guide for Developers

I like to throw spaghetti at the wall and see if it sticks.

Some of my best projects started exactly that way. An idea on a Saturday morning, a container deployed by lunch, a URL shared with a friend by dinner. No infrastructure planning, no provisioning tickets, no three-day detour through VPC configurations before writing a single line of business logic. Just code, deploy, done.

Every single one of them ran on Cloud Run.

Cloud Run is my go-to, whether it's a weekend experiment that might go nowhere or a production solution for a client. More than once, a quick demo I built on Cloud Run ended up maturing into the actual production system, running on the exact same setup.

A recent example: every time I build something with AI, even a quick vibe-coded prototype, I need a server-side component to keep my LLM API calls secure and my API keys out of public view. Cloud Run is perfect for this. In minutes I have a secure backend with HTTPS, and I didn't have to think about infrastructure at all. The prototype works, the keys are safe, and if the project grows into something real, the backend is already production-ready.

The idea that you need days of infrastructure preparation before you can test something with real users has always felt backwards to me. I believe in getting something live as fast as possible, putting it in front of people, and then deciding if it deserves more investment.

But this article isn't a love letter. I want to give you the understanding to make a real architectural decision: when is Cloud Run the right choice, and when isn't it? We'll look at what it actually is under the hood, what you get for free, where its boundaries are, and when you should consider moving to Kubernetes. By the end, you'll know whether Cloud Run belongs in your next project, or whether you should reach for something else entirely.

What Is Cloud Run?

Cloud Run is a fully managed serverless platform on Google Cloud that runs containers. You give it code, it gives you a URL. No clusters to provision, no nodes to manage, no load balancers to configure. You bring the code; Google handles everything else.

But what makes Cloud Run different is its core promise: the same configuration that runs your proof of concept can carry you to production.

Think about that for a moment. Most cloud services force you into one of two buckets: either you use a "quick and dirty" option for prototyping that you'll have to throw away later, or you invest days of infrastructure setup upfront for a production-grade environment. Cloud Run refuses that trade-off. Your weekend project and your production workload run on the same platform, with the same security, the same scaling, the same deployment model.

And when nobody is using your service? It scales to zero. You pay nothing. That means you can spin up ten experimental services, let them sit idle for a month, and your bill is exactly zero.

So where does Cloud Run sit in the serverless landscape? It's not a virtual machine: you don't manage an OS. It's not a Kubernetes cluster: you don't manage nodes or pods. It's not a function: you're not limited to a single entry point with a 15-minute timeout. It's a container-as-a-service: you provide something that can run in a container, and the platform handles everything else (placement, scaling, networking, TLS). If you already know containers, bring your image and Cloud Run runs it. If you don't, just point it at your source code and Cloud Run will package and deploy it for you. And if you don't even want to manage an HTTP framework, Cloud Run functions let you write individual functions and have Cloud Run wrap each one in a server for you. Either way, you end up with a running service.
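The contract your container has to satisfy is minimal: listen on the port Cloud Run injects via the PORT environment variable and answer HTTP. Here's a stdlib-only sketch. On Cloud Run you'd bind 0.0.0.0 and just call serve_forever(); in this local demo I serve from a background thread and make one request against myself to show the contract end to end:

```python
import os
import threading
import urllib.request
from http.server import HTTPServer, BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello from my container\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep request logs off stdout in this sketch

# Cloud Run injects the port via the PORT env var (default 8080).
# Locally we fall back to 0 so the OS picks any free port.
port = int(os.environ.get("PORT", "0"))
server = HTTPServer(("127.0.0.1", port), Handler)
port = server.server_address[1]

# On Cloud Run: server.serve_forever() and let the platform route traffic.
threading.Thread(target=server.serve_forever, daemon=True).start()
with urllib.request.urlopen(f"http://127.0.0.1:{port}/") as resp:
    status, text = resp.status, resp.read().decode().strip()
server.shutdown()
print(status, text)  # 200 Hello from my container
```

That's the whole interface. Anything that does this — any language, any framework — is a valid Cloud Run service.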

What's Behind the Curtain?

You don't need to understand any of this to use Cloud Run. But knowing what's underneath explains why the defaults are production-grade and why you can trust them.

Borg: Google's Internal Engine

Cloud Run doesn't run on some separate, less-proven infrastructure. It runs directly on Borg, Google's internal cluster management system. The same system that powers Gmail, YouTube, Google Search, and virtually every other Google service. Borg has been in production for over a decade, deploying billions of containers per week across clusters of tens of thousands of machines.

If Borg sounds familiar, it should. It was the direct predecessor to Kubernetes. Many of the same engineers and architectural concepts carried over. But while Kubernetes is the open-source version built for the rest of us, Borg is the battle-hardened original that still runs Google internally.

What does this mean for your containers? It means they inherit the same scheduling, failover, and resource management that Google trusts for its own products. It means your service benefits from Google's BeyondProd zero-trust security framework, where trust depends on code provenance and service identity, not network location. It means Binary Authorization for Borg verifies that only reviewed, properly-built code is deployed to the infrastructure.

In short: your containers run on the same infrastructure as Gmail.

Knative: The API Layer

Cloud Run's API is based on Knative Serving, an open-source project originally started by Google for running serverless workloads on Kubernetes. But Cloud Run is not "managed Knative". It reimplements the Knative Serving API on top of Borg, with no Kubernetes underneath.

The practical takeaway: if you define your service using a Knative YAML manifest, that definition is portable between Cloud Run and self-hosted Knative on Kubernetes. And because there's no Kubernetes under the hood, you don't pay the complexity tax of managing a cluster.
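For reference, a minimal Knative Serving manifest looks like this (the service name and image are hypothetical — substitute your own). The same file deploys to Cloud Run with `gcloud run services replace service.yaml` or to a Knative-enabled Kubernetes cluster with `kubectl apply`:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-api                 # hypothetical service name
spec:
  template:
    spec:
      containers:
        - image: gcr.io/my-project/my-api:latest  # hypothetical image
          ports:
            - containerPort: 8080
```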

gVisor and Container Sandboxing

Every Cloud Run instance is sandboxed with two layers of isolation: not just Linux namespaces and cgroups like standard containers, but hardware-backed virtualization on top of that.

Cloud Run offers two execution environments:

Gen1 (gVisor-based): gVisor is an open-source container sandbox developed by Google. It acts as a user-space kernel, a process written in Go that intercepts your container's system calls and reimplements them, so the host kernel is never directly exposed. This gives you a smaller attack surface and faster cold starts, but some software that relies on unusual system calls may be incompatible.

Gen2 (Linux microVM-based): Instead of gVisor, Gen2 runs your container inside a lightweight virtual machine with a full Linux kernel. You get complete system call compatibility, better sustained CPU and network performance, but slightly longer cold starts.

Both environments use the same two-layer approach: a hardware-backed virtual machine monitor (VMM) boundary between instances, plus the software kernel layer (gVisor or microVM). Even if someone found a way to escape the container sandbox, they'd still face the hardware virtualization boundary.

You choose per service. Pricing is identical. Most developers never need to think about it. If you don't specify an execution environment, Cloud Run selects one automatically based on the features your service uses.

The Production-Ready Default

You can happily use Cloud Run without knowing any of this. But when someone asks "is Cloud Run production-ready?", the answer isn't "probably." It's hardware-backed isolation between every instance, zero-trust security, and battle-tested scheduling. It comes hardened.

What You Get Out of the Box

Here's what a single deploy command gives you, before you touch a single config file:

$ gcloud run deploy my-api --source . --region us-central1 --allow-unauthenticated

Building using Dockerfile and deploying container to Cloud Run service [my-api] in project [my-project] region [us-central1]
✓ Building and deploying... Done.
  ✓ Uploading sources...
  ✓ Building Container...
  ✓ Creating Revision...
  ✓ Routing traffic...
Done.
Service [my-api] revision [my-api-00001-abc] has been deployed and is serving 100 percent of traffic.
Service URL: https://my-api-abc123-uc.a.run.app

That URL is live, load-balanced, auto-scaling, and secured with a managed TLS certificate. Let's break down what's included.

Security (Zero Config)

Every Cloud Run service automatically gets:

  • HTTPS with managed TLS certificates. Every *.run.app URL is served over HTTPS with auto-provisioned, auto-renewed certificates. There is no option to serve plain HTTP on the public endpoint. You cannot accidentally deploy an insecure service.

  • DDoS protection. The Google Front End (GFE) sits in front of every Cloud Run service, applying the same DDoS protections that guard Google's own services.

  • Hardware-backed container isolation. As we covered in the "behind the curtain" section, every instance is sandboxed behind a VMM boundary. This isn't namespace isolation; it's virtualization-level separation.

  • Encryption everywhere. Data encrypted at rest using Google-managed keys. All traffic between Google Cloud services encrypted in transit. This is the default and always on.

  • IAM-based access control. Every service integrates with Google Cloud IAM. By default, services require authentication. You explicitly opt in to public access with --allow-unauthenticated.

Scaling (Zero Config)

  • Scale-to-zero by default. No traffic? No instances. No cost. This is the default behavior: you don't configure it, you don't enable it. It just works.

  • Autoscaling up to 100 instances by default. Cloud Run automatically evaluates two key signals: request concurrency (targeting 60% of your configured maximum) and CPU utilization (also targeting 60%). It scales up and down based on real demand, and the default cap of 100 instances can be raised. To put that in perspective: if your service handles one request at a time and you get a sudden spike of 80 concurrent users, Cloud Run spins up roughly 80 instances to absorb the load, then scales back down as traffic drops.

  • Idle instance retention. After the last request, instances may be kept idle for up to 15 minutes before being terminated. This absorbs traffic bursts without cold starts. It's a small detail that makes a big difference in practice.

  • Startup CPU Boost. When an instance starts up, Cloud Run temporarily doubles (or more) its CPU allocation to speed up initialization. A service configured for 2 vCPU gets boosted to 4 vCPU during startup and for 10 seconds after. Google reported up to 50% faster startup times for Java/Spring applications when this feature is enabled.
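The burst example above is easy to sanity-check. This gives a lower bound on instance count; because Cloud Run targets ~60% utilization per instance, the number actually provisioned is usually somewhat higher:

```python
import math

def instances_needed(concurrent_requests: int, max_concurrency: int) -> int:
    # Lower bound: each instance holds at most max_concurrency requests
    # at once. Cloud Run's 60% utilization target means the real number
    # provisioned tends to be somewhat higher than this.
    return math.ceil(concurrent_requests / max_concurrency)

print(instances_needed(80, 1))   # 80 -> one request per instance
print(instances_needed(80, 10))  # 8  -> ten concurrent requests each
```

This is also why raising per-instance concurrency (the default is 80) is usually the cheapest scaling lever: the same burst needs far fewer instances.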

Observability (Zero Config)

  • Automatic logging. Everything your container writes to stdout and stderr is automatically captured in Cloud Logging. No agent to install, no sidecar to configure. Write structured JSON logs and they're automatically parsed into searchable fields. For example, a JSON line like {"severity":"ERROR", "message":"connection refused", "sessionId":"abc-123", "userId":"user-42"} becomes a fully filterable log entry in the Cloud Logging console. That means you can add any custom fields you want to your JSON payload (session IDs, user IDs, request traces, feature flags) and later filter your logs by those exact fields. Debugging a problem for a specific user? Filter by jsonPayload.userId="user-42" and you get every log entry for that user across all instances.

  • Built-in metrics. Cloud Monitoring automatically tracks request count, latency distribution, CPU utilization, memory utilization, and instance count. These show up in the Cloud Run console with no setup.

  • Audit logs always on. Admin Activity audit logs record who deployed what, when, and with what configuration. These are always enabled and cannot be turned off.
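The structured-logging point above is worth a concrete sketch. A structured log is just one JSON object per line on stdout; the field names beyond severity and message here are my own examples:

```python
import json
import sys

def log(severity: str, message: str, **fields):
    # Cloud Logging parses one JSON object per line from stdout/stderr.
    # "severity" and "message" are special-cased; any extra fields land
    # under jsonPayload and become filterable in the console.
    entry = {"severity": severity, "message": message, **fields}
    print(json.dumps(entry), file=sys.stdout)
    return entry  # returned here only for convenience

log("ERROR", "connection refused", sessionId="abc-123", userId="user-42")
```

No logging library required — though most structured-logging libraries (structlog, pino, zap) produce exactly this shape out of the box.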

Infrastructure (Zero Config)

  • Built-in load balancing. Requests are distributed across instances automatically. No load balancer to provision or configure.

  • Zero-downtime deployments. Every deployment creates a new immutable revision. Traffic switches to the new revision only after it passes its startup probe. Old instances keep serving in-flight requests. No deployment strategy to configure. It just happens. And because revisions are immutable and stick around, you can split traffic between them. Send 5% of traffic to the new revision while 95% stays on the current one, monitor the metrics, and gradually shift. Canary deployments without a deployment tool.

  • Automatic health checks. Cloud Run configures a TCP startup probe by default: it waits for your container to listen on the expected port before sending traffic. Your service doesn't receive requests until it's actually ready.

  • OS patching and runtime maintenance. You never patch the underlying OS, kernel, or runtime. Google handles the entire infrastructure stack beneath your container.
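The canary pattern described above is a single command. The revision names here are hypothetical — `gcloud run revisions list` shows yours:

```shell
# Send 5% of traffic to the new revision, keep 95% on the current one.
gcloud run services update-traffic my-api \
  --region us-central1 \
  --to-revisions my-api-00002-xyz=5,my-api-00001-abc=95
```

Watch the error rate and latency metrics, then re-run with a higher percentage (or `--to-latest` to finish the rollout). Rolling back is the same command with the percentages reversed.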

Cloud Run Functions: The Simpler Path

Everything above applies to Cloud Run services, where you bring a container (or source code) that runs an HTTP server. But what if you don't want to deal with an HTTP framework at all?

Cloud Run functions let you skip all of that. You write your functions, point a deployment at one of them, and Cloud Run wraps it in an HTTP server automatically. Your source code can define as many functions as you like. Each deployment serves one entry point, specified by the --function flag. Same codebase, multiple deployments, each with its own URL.

import functions_framework

@functions_framework.http
def hello(request):
    name = request.args.get("name", "World")
    return f"Hello, {name}!"
gcloud run deploy hello-func \
  --source . \
  --function hello \
  --base-image python312 \
  --region us-central1

That's it. No Flask, no FastAPI, no Dockerfile. Cloud Run builds the container, injects the HTTP server, and deploys it. You get the same scaling, the same security, the same zero-config observability that a full Cloud Run service gets.

If this sounds like Cloud Functions, here's the history. Cloud Functions 1st gen ran on older, separate infrastructure with strict limits: 9-minute timeouts, one request per instance, no concurrency. Cloud Functions 2nd gen (GA in 2022) was already built on top of Cloud Run under the hood, which unlocked 60-minute timeouts and multi-request concurrency. In 2024, Google made it official and rebranded 2nd gen as Cloud Run functions, consolidating everything under the Cloud Run name. So this isn't a new product. It's the recognition that the infrastructure was already unified. If your functions outgrow the one-entry-point-per-deployment model and you need routing, middleware, or multiple endpoints behind a single URL, you swap it for a full service on the same platform. No migration, no new infrastructure.

When to use functions vs. services: Cloud Run functions shine for single-purpose endpoints: webhooks, event handlers, lightweight APIs, scheduled tasks. A good example from my own workflow: when I build AI-powered front-end apps, I never call the LLM API directly from the client. That would mean shipping my API keys to the browser. Instead, I deploy a Cloud Run function that sits between my front end and the LLM provider. The function validates the user's authorization, makes the LLM call with my credentials server-side, and returns the response. My keys never leave the server. It takes minutes to set up, and it's exactly the kind of single-purpose endpoint where a function is the right fit. The moment you need multiple routes, middleware, or background processing within the same service, a full Cloud Run service with your own HTTP framework gives you that control. It's not a matter of which is "better." It's about matching the model to the job.
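Stripped of any framework, the proxy pattern is a few lines. This is an illustrative sketch, not a production implementation: call_llm is a placeholder for your provider's SDK, and the bearer-token check stands in for real verification (e.g. validating a Firebase or IAM identity token):

```python
import os

def call_llm(prompt: str) -> str:
    # Placeholder for your provider's SDK call. The point: the key lives
    # in the server's environment (or Secret Manager) and is never
    # shipped to the browser.
    _api_key = os.environ.get("LLM_API_KEY", "")
    return f"echo: {prompt}"  # swap in a real API call here

def handle(headers: dict, body: dict):
    # Stand-in for real identity-token verification.
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return 401, {"error": "missing or invalid token"}
    prompt = body.get("prompt", "")
    return 200, {"response": call_llm(prompt)}

status, payload = handle({"Authorization": "Bearer token"}, {"prompt": "hi"})
print(status, payload)
```

Wrapped in a functions_framework handler, this deploys exactly like the hello example above — and the client only ever sees your function's URL, never the provider's key.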

Supported runtimes include Node.js, Python, Go, Java, .NET, Ruby, and PHP.

When Cloud Run Is NOT the Right Choice

Every platform has boundaries. Cloud Run's have narrowed significantly over the past two years (sidecars, GPU support, volume mounts, and worker pools have all landed), but real limits remain. Knowing them in advance saves you from the painful realization six months into a project that you're fighting the platform instead of building on it.

Statelessness by Design

Cloud Run instances are ephemeral. They can be created and destroyed at any moment. If your architecture requires any of the following, you need to understand the trade-offs:

  • Local disk persistence beyond the instance lifecycle. The local filesystem is ephemeral. When the instance is gone, so is everything on disk. That said, Cloud Run now supports mounting Cloud Storage buckets via FUSE and NFS file shares via Filestore, giving you read/write access to persistent shared storage. Cloud Storage mounts are eventually consistent (no file locking, last write wins), while Filestore gives you full POSIX semantics. Neither is local disk, but for many use cases they close the gap.

  • In-memory caching shared across instances. There are no sticky sessions by default (though session affinity is available on a best-effort basis). Each request might hit a different instance. If you need shared state, you need an external store like Redis or Memorystore.

  • WebSocket connections that must survive beyond ~60 minutes. Cloud Run supports WebSockets, and combined with session affinity this works well for real-time applications. But connections are limited to approximately 60 minutes (the maximum request timeout). If you need connections that live for hours or days, you need dedicated infrastructure.

  • Long-running background workers without HTTP triggers. Cloud Run services are request-driven. But this boundary is softening: Cloud Run worker pools (currently in preview) are designed for pull-based workloads like Kafka consumers and Pub/Sub subscribers, with no public HTTP endpoint required and up to 40% lower pricing than standard services.

Teams that need truly stateful workloads (ML model serving with warm caches that must survive across deploys, game servers with persistent connections beyond 60 minutes) find GKE's persistent volumes and StatefulSets a more honest fit.

Multi-Container Support: Better Than Before, Not Kubernetes

Cloud Run now supports multi-container instances (sidecars). You can run up to 10 containers per instance sharing the same network namespace and in-memory volumes. This enables patterns like running Nginx as a reverse proxy, OpenTelemetry collectors for custom metrics, Envoy for traffic management, or Prometheus for metric export.

But it's not full Kubernetes pod topology. The key differences:

  • Only one container receives inbound HTTP traffic (the "ingress container"). Sidecars can't independently serve external requests.
  • No init containers. You can control startup ordering (a sidecar can start before the ingress container), but unlike Kubernetes init containers, sidecars keep running. They don't run to completion before the main container starts.
  • Maximum 10 containers per instance.

For most sidecar patterns (proxies, observability agents, log processors), Cloud Run's implementation is sufficient. For complex pod topologies with init containers and multiple ingress points, GKE remains the answer.

Networking Depth

Cloud Run's networking has improved with Direct VPC egress (placing instances directly on your VPC without a connector), but teams still hit walls with:

  • Service mesh requirements. Istio and Anthos Service Mesh are native in GKE. You can run an Envoy sidecar on Cloud Run, but a full service mesh with mTLS, traffic policies, and observability across services is a different story.
  • Pod-to-pod direct communication without going through load balancers.
  • Custom network policies for zero-trust internal segmentation.
  • Multi-cluster routing and traffic mirroring.

If your architecture involves sophisticated network topologies or strict internal traffic control, GKE gives you the knobs. Cloud Run gives you simplicity at the cost of that control.

Cold Start Economics

Cloud Run's minimum instances feature mitigates cold starts, and Startup CPU Boost temporarily doubles CPU during initialization to get instances ready faster. For many workloads, these two features together make cold starts a non-issue.

But if your latency requirements are strict and you end up keeping instances always-on, you've lost the serverless cost model. You're now paying for always-on compute. And once you're paying for always-on instances anyway, the economic argument shifts toward GKE, where you have more control over resource packing, node utilization, and cost optimization across multiple services sharing the same cluster.

Workload Heterogeneity

Cloud Run primarily targets HTTP/gRPC workloads, though it keeps expanding. Cloud Run jobs handle batch processing, and GPU support makes AI/ML inference possible with scale-to-zero economics. The NVIDIA L4 (24 GB VRAM) is generally available, and the NVIDIA RTX PRO 6000 Blackwell (96 GB VRAM) is available in preview.

But the moment a team needs:

  • Daemonsets for node-level operations
  • Priority classes and preemption policies
  • Multiple GPUs per instance (Cloud Run supports only one)
  • Large models beyond single-GPU capacity (the L4's 24 GB VRAM limits you to ~9B parameters, though the RTX PRO 6000 with 96 GB VRAM expands this significantly in preview)

...GKE becomes the natural fit. Cloud Run is opinionated about what it runs. That opinion keeps getting broader, but it has limits.
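The ~9B figure above is simple arithmetic: at 16-bit precision each parameter costs 2 bytes, and you need headroom for the KV cache and activations (the exact overhead varies by serving stack, so treat this as a back-of-the-envelope check):

```python
def weights_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    # params_billion * 1e9 parameters * bytes each / 1e9 bytes per GB
    return params_billion * bytes_per_param

L4_VRAM_GB = 24  # NVIDIA L4

print(weights_gb(9))   # 18 GB of weights -> fits, with headroom for KV cache
print(weights_gb(13))  # 26 GB -> over the L4's capacity before any overhead
```

Quantizing to 8-bit or 4-bit shifts the math (bytes_per_param of 1 or 0.5), which is how larger models are squeezed onto a single L4 in practice.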

The Migration Path: Cloud Run to Kubernetes

Here's the good news: if you start on Cloud Run and later need to move to Kubernetes, the migration path is straightforward, at least for the container itself.

If you deployed with a Docker image, that same image runs on GKE without modification. Your container doesn't know or care whether it's running on Cloud Run or Kubernetes. It listens on a port, responds to HTTP requests, and that's it.

If you deployed from source code (using Cloud Run's buildpack-based deployment), converting to a Docker image is trivial. You're adding a Dockerfile to a project that already works. The application code, the dependencies, the runtime behavior, none of that changes. You're just making the packaging step explicit instead of letting buildpacks handle it.
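For a typical Python service, that explicit Dockerfile is only a few lines. The file names here (requirements.txt, main.py) are illustrative — adjust to your project:

```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Cloud Run (and Kubernetes probes) expect the server to listen
# on the port it's given; main.py reads the PORT env var.
CMD ["python", "main.py"]
```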

But here's the honest part: the container isn't the hard part of the migration. The redesign is.

You're moving to Kubernetes because you need something Cloud Run doesn't offer: complex pod topologies, full service mesh, multi-GPU inference, or unlimited connection lifetimes. That means you're not just moving a container; you're evolving your architecture. What was an in-memory cache on Cloud Run becomes a Redis cluster backed by persistent volumes on GKE. What was a single ingress container with an Envoy sidecar becomes a pod with init containers, network policies, and custom scheduling rules. The container image stays the same; everything around it changes.

The container is portable. The architecture might not be. And that's fine. It's the right trade-off. Cloud Run lets you start fast, validate your idea with real users, and build confidence in the solution. When you hit the boundaries we discussed above, you graduate to Kubernetes with a proven container and a clear understanding of what you actually need.

Cloud Run isn't a dead end. It's a deliberate starting point.

Conclusion

So here's the decision framework:

  1. Start with Cloud Run when you're building a containerized service and you want to move fast without worrying about infrastructure.
  2. Stay on Cloud Run as long as your workload fits its model: stateless, request-driven, with scaling needs that the platform handles naturally.
  3. Graduate to GKE when you hit the boundaries. You'll know when you do, because you'll be fighting the platform instead of building on it.

Your container is the unit of portability. Whether it ends up on Cloud Run, GKE, or another platform entirely, the work you put into building and packaging it is never wasted. That's not unique to Cloud Run. It's the power of containers in general. But Cloud Run is the fastest way I've found to prove that a container works in production, without the upfront investment that usually requires.

Part 2 is coming soon. We'll get hands-on: the different ways to deploy to Cloud Run (there are more than you'd expect), and the configuration options that let you tune CPU, memory, scaling, networking, and security for your specific needs. Follow so you don't miss it.
