
This is Cloud Run: Configuration

This is Part 3 of the "This is Cloud Run" series. In Part 1, we covered what Cloud Run is and when to choose it. In Part 2, we walked through the deployment options and revision management. Now let's tune it.

Cloud Run's defaults are good. We covered that in Part 1. But every workload has its own needs, and Cloud Run gives you the knobs to tune for them. This article covers the settings you'll reach for most often.

CPU and Memory

Every Cloud Run instance gets a share of CPU and memory. The defaults (1 vCPU, 512 MiB) are reasonable for a lightweight API, but you'll want to adjust them as you understand your workload's needs.

CPU ranges from 0.08 vCPU (less than a tenth of a core) to 8 vCPUs. Memory ranges from 128 MiB to 32 GiB. The two are linked: higher CPU allocations require minimum memory thresholds, and some memory configurations require minimum CPU.

But the more important decision is the CPU allocation mode:

  • Request-only (default). CPU is only allocated while your instance is actively processing a request. Between requests, CPU is throttled to near-zero. You pay only for the time spent handling requests. This is the serverless model, and it's the right choice for most HTTP APIs.

  • Always-on. CPU is always available, even between requests. This costs more, but it's required for workloads that do work outside of request handling: WebSocket connections that maintain state, background threads that process queues, or services that need to keep in-memory caches warm.

gcloud run deploy my-service \
  --image my-image \
  --cpu 2 \
  --memory 1Gi \
  --no-cpu-throttling \
  --region us-central1

The --no-cpu-throttling flag enables always-on CPU. Without it (or with --cpu-throttling), you get the default request-only mode.

The pricing difference is significant. With request-only allocation, you pay per vCPU-second and GiB-second only while handling requests. With always-on, you pay for the entire lifecycle of the instance. For a service that handles bursty HTTP traffic with idle periods between, request-only can be dramatically cheaper. For a service that runs background tasks or maintains WebSocket connections, always-on is the only option that works correctly.
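To make that difference concrete, here's a back-of-the-envelope comparison in Python. The rates below are placeholders, not Google's actual prices; check the Cloud Run pricing page for current numbers.

```python
# Illustrative comparison of request-only vs. always-on CPU billing.
# Rates are hypothetical placeholders, not Google's published prices.
VCPU_SECOND = 0.000024   # assumed $/vCPU-second
GIB_SECOND = 0.0000025   # assumed $/GiB-second

def hourly_cost(vcpu, gib, billed_seconds):
    """Cost of one instance over one hour, billing only billed_seconds."""
    return billed_seconds * (vcpu * VCPU_SECOND + gib * GIB_SECOND)

# One instance, 1 vCPU / 512 MiB, busy 5 minutes out of each hour:
request_only = hourly_cost(1, 0.5, 5 * 60)   # billed only while serving
always_on = hourly_cost(1, 0.5, 60 * 60)     # billed for the whole hour
print(f"request-only: ${request_only:.4f}/h, always-on: ${always_on:.4f}/h")
```

With 5 busy minutes per hour, always-on costs 12x more for the same instance; the gap shrinks as utilization rises, which is why steady-traffic services feel the difference less.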

Health Checks

Cloud Run won't send traffic to an instance until it's ready. By default, it uses a TCP startup probe: it waits for your container to listen on the expected port, then considers it ready.

For most services, that's enough. But if your application needs time to load data, warm caches, or establish database connections after the port is open, you'll want a custom HTTP startup probe:

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 0
  periodSeconds: 2
  failureThreshold: 15

This tells Cloud Run to GET /healthz every 2 seconds. If it fails 15 times, the instance is marked unhealthy and restarted. Only when the probe succeeds does the instance start receiving traffic. This prevents the 502 errors that happen when a load balancer sends requests to an instance that's technically listening but not yet ready to serve.

Cloud Run also supports liveness probes that run continuously after startup. If a liveness probe fails, Cloud Run restarts the instance. Useful for detecting stuck processes, deadlocks, or memory leaks that don't crash the container but make it unresponsive.

For gRPC services, Cloud Run can probe using the standard gRPC health checking protocol instead of HTTP.

Request Timeout

Every Cloud Run request has a timeout. The default is 300 seconds (5 minutes). The maximum is 3600 seconds (60 minutes).

gcloud run deploy my-service \
  --image my-image \
  --timeout 600 \
  --region us-central1

If your service processes large file uploads, generates reports, or runs long-running computations, you'll want to increase this. But keep in mind: the timeout applies per-request. If a single request takes longer than the timeout, Cloud Run terminates it. WebSocket connections are also subject to this timeout, which is why Part 1 mentioned the ~60-minute connection limit.

A common pattern for long-running work: accept the request, kick off the processing asynchronously (via Cloud Tasks or Pub/Sub), and return a 202 immediately. The client polls for status or receives a callback when the work is done. This keeps your request timeout short and your service responsive.
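A minimal sketch of that accept-then-poll shape, with an in-memory dict standing in for Cloud Tasks/Pub/Sub plus a durable status store (the function names and response shapes are illustrative):

```python
import uuid

# In-memory job store -- a real service would enqueue via Cloud Tasks or
# Pub/Sub and persist status in a durable store (Firestore, a database).
_jobs = {}

def submit(payload):
    """Accept the work, record it, and answer 202 immediately."""
    job_id = str(uuid.uuid4())
    _jobs[job_id] = {"status": "pending", "result": None}
    # enqueue(job_id, payload)  # e.g. create a Cloud Task here
    return 202, {"job_id": job_id, "status_url": f"/jobs/{job_id}"}

def job_status(job_id):
    """Polling endpoint the client hits until the work is done."""
    job = _jobs.get(job_id)
    if job is None:
        return 404, {}
    return 200, job

def complete(job_id, result):
    """Called by the worker when processing finishes."""
    _jobs[job_id] = {"status": "done", "result": result}
```

The request handler never waits on the slow work, so every request finishes in milliseconds regardless of how long the job itself takes.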

If you find yourself regularly hitting the 60-minute maximum, that's a signal your workload might be better suited to Cloud Run jobs (for batch processing) or a different platform entirely.

Scaling: Instances and Concurrency

Cloud Run's autoscaler manages three related settings:

Minimum instances controls how many instances stay warm when there's no traffic. The default is 0 (scale-to-zero). Setting it to 1 or higher eliminates cold starts but means you're paying for idle instances. It's the classic serverless trade-off: latency vs. cost. For latency-sensitive production services, 1 is often the right number. For dev environments, 0 keeps your bill at zero.

Maximum instances caps how far Cloud Run can scale up. The default is 100. This protects you from runaway scaling (and a surprising bill) during unexpected traffic spikes. But set this thoughtfully: if your service talks to a database with a 20-connection pool, 100 instances all trying to connect will overwhelm it. Match your max instances to your backend's capacity.

Concurrency controls how many requests a single instance handles simultaneously. The default is 80. This is one of Cloud Run's key advantages over the old Cloud Functions 1st gen model, which processed one request per instance. With concurrency at 80, a single instance can serve 80 simultaneous requests before Cloud Run spins up another instance.

Lower the concurrency for CPU-heavy workloads where each request needs dedicated processing power. Raise it (up to 1000) for lightweight I/O-bound handlers that spend most of their time waiting on network calls. Setting concurrency to 1 mimics the one-request-per-instance model if your code isn't thread-safe.
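These settings interact, so a little arithmetic helps pick values before you deploy. A sketch with illustrative numbers:

```python
import math

def instances_needed(concurrent_requests, concurrency):
    """Roughly how many instances the autoscaler needs for a given load."""
    return math.ceil(concurrent_requests / concurrency)

def safe_max_instances(db_max_connections, pool_size_per_instance):
    """Cap max instances so the combined pools fit the database limit."""
    return db_max_connections // pool_size_per_instance

# 800 simultaneous requests at the default concurrency of 80:
print(instances_needed(800, 80))    # 10 instances
# The same load with concurrency 1 (code that isn't thread-safe):
print(instances_needed(800, 1))     # 800 instances
# A database allowing 100 connections, 20-connection pool per instance:
print(safe_max_instances(100, 20))  # max-instances should not exceed 5
```

The same load needs 80x more instances at concurrency 1, and the database cap can be far lower than the scaling default of 100; both numbers belong in your capacity planning.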

gcloud run deploy my-service \
  --image my-image \
  --min-instances 1 \
  --max-instances 10 \
  --concurrency 80 \
  --region us-central1

And remember Startup CPU Boost from Part 1: Cloud Run temporarily doubles CPU during instance initialization to get instances ready faster. Combined with minimum instances, this makes cold starts a non-issue for most workloads.

Environment Variables and Secrets

Cloud Run supports two mechanisms for passing configuration to your containers, and it's important to use the right one for the job.

Environment variables are for non-sensitive configuration: feature flags, API endpoints, logging levels, database hostnames. Set them at deploy time with --set-env-vars:

gcloud run deploy my-service \
  --image my-image \
  --set-env-vars "DB_HOST=10.0.0.1,LOG_LEVEL=info,ENV=production" \
  --region us-central1

This follows the 12-Factor App methodology: configuration lives in the environment, not in the code.
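In application code that just means reading the environment, with sane fallbacks for local development. A sketch matching the variable names above (the defaults are illustrative):

```python
import os

def load_config(env=os.environ):
    """Build runtime configuration from environment variables.

    Falls back to local-development defaults when a variable is unset,
    so the same image runs unchanged in every environment.
    """
    return {
        "db_host": env.get("DB_HOST", "localhost"),
        "log_level": env.get("LOG_LEVEL", "info"),
        "env": env.get("ENV", "development"),
    }
```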

Secrets are for sensitive credentials: API keys, database passwords, TLS certificates, OAuth client secrets. These should never be plain environment variables. Plain env vars are visible in the Cloud Run Console, show up in debug logs, and can leak into error reports. Instead, store them in Secret Manager and reference them at deploy time:

gcloud run deploy my-service \
  --image my-image \
  --set-secrets "API_KEY=my-api-key:latest,/secrets/tls.key=tls-private-key:latest" \
  --region us-central1

Secrets can be exposed as environment variables or mounted as files. The first mapping above makes the secret available as an environment variable called API_KEY. The second mounts it as a file at /secrets/tls.key. Secrets are versioned, access-controlled via IAM, and audit-logged. If a secret is compromised, you rotate it in Secret Manager and redeploy. No code changes.
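From the application's point of view, both forms are just an env var or a file. A sketch of a helper that handles either; the function and its fallback order are my own convention, not a Cloud Run API:

```python
import os

def read_secret(env_name, file_path=None, env=os.environ):
    """Fetch a secret exposed as an env var, falling back to a mounted file.

    env_name / file_path correspond to the two --set-secrets forms above
    (API_KEY as an environment variable, /secrets/tls.key as a file).
    """
    value = env.get(env_name)
    if value is not None:
        return value
    if file_path and os.path.exists(file_path):
        with open(file_path) as f:
            return f.read()
    raise RuntimeError(f"secret {env_name} not configured")
```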

Volume Mounts

Cloud Run instances are ephemeral, but sometimes you need temporary storage or access to shared files. Cloud Run supports three types of volume mounts:

In-memory volumes are tmpfs-style mounts backed by your instance's RAM. They're fast but volatile (gone when the instance terminates) and count against your memory limit. Useful for temporary file processing, like downloading a file, transforming it, and uploading the result:

gcloud run deploy my-service \
  --image my-image \
  --add-volume name=scratch,type=in-memory,size-limit=256Mi \
  --add-volume-mount volume=scratch,mount-path=/tmp/work \
  --region us-central1
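A sketch of the transform step using that scratch mount; the compression example and paths are illustrative, and a real service would download the source from and upload the result to Cloud Storage:

```python
import gzip
import os
import shutil

def transform(src_path, scratch_dir):
    """Compress a file, using the in-memory scratch volume as workspace.

    scratch_dir would be the mount path from the deploy command above
    (/tmp/work). Files written there live in RAM: fast, gone when the
    instance terminates, and counted against the memory limit.
    """
    out_path = os.path.join(scratch_dir, os.path.basename(src_path) + ".gz")
    with open(src_path, "rb") as src, gzip.open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```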

Cloud Storage FUSE mounts a Cloud Storage bucket as a local filesystem. Your code reads and writes files normally, and GCS FUSE translates those operations into Cloud Storage API calls:

gcloud run deploy my-service \
  --image my-image \
  --add-volume name=models,type=cloud-storage,bucket=my-ml-models \
  --add-volume-mount volume=models,mount-path=/mnt/models \
  --region us-central1

The catch: it's eventually consistent. No file locking, last write wins. Good for reading shared assets (ML models, configuration files) or writing artifacts (logs, exports). Not good for concurrent writes to the same file.

NFS via Filestore gives you a fully POSIX-compliant network filesystem with proper file locking. Lower latency than GCS FUSE for random reads. Requires VPC connectivity since Filestore instances live on your VPC. Best for workloads that need shared read/write access with file-level consistency.

For most Cloud Run services, you won't need any of these. But when you do (image processing pipelines, ML model serving, shared configuration across instances), they save you from building workarounds.

Network Configuration

Cloud Run's networking defaults are simple: your service is public, and it connects to the internet for outbound traffic. But when you need more control, there are three areas to configure.

Ingress

Ingress settings control who can reach your service:

  • All (default). Accepts traffic from anywhere on the internet. Fine for public APIs and web apps.
  • Internal. Only accepts traffic from within your VPC or from other Google Cloud services (like Pub/Sub, Cloud Scheduler, or Cloud Tasks). The service is invisible to the public internet. Use this for backend services that should never be called directly by external clients.
  • Internal + Cloud Load Balancing. Same as internal, but also accepts traffic through a global external Application Load Balancer. This is the path to custom domains, CDN caching with Cloud CDN, and WAF protection with Cloud Armor. You'll see this load balancer pattern come up again in the Custom Domains and Cloud Armor sections below.

gcloud run deploy my-service \
  --image my-image \
  --ingress internal \
  --region us-central1

Egress and VPC Connectivity

By default, your Cloud Run instances connect to the internet directly. But if your service needs to reach private resources (a Cloud SQL database, a Memorystore Redis instance, an internal API), it needs VPC access.

Two options:

  • Serverless VPC Access connectors. The original approach. You create a connector resource that bridges Cloud Run and your VPC. Works, but adds a network hop and has throughput limits.
  • Direct VPC egress. The newer approach. Cloud Run instances are placed directly on your VPC subnet. No connector needed, no extra hop, no throughput bottleneck. This is the recommended path for new deployments.

If you're starting fresh, go with Direct VPC egress. If you have existing services using connectors, they'll keep working, but consider migrating when convenient.

Custom Domains

Every Cloud Run service gets a *.run.app URL with automatic HTTPS. But for production, you'll want your own domain. Two paths:

  • Cloud Run domain mapping. The simpler option. Map a domain directly to your Cloud Run service. SSL certificates are provisioned and renewed automatically. Works for straightforward setups where you just need api.example.com pointing to your service.
  • Global external Application Load Balancer. The more capable option. Gives you CDN caching, Cloud Armor WAF, multi-region routing, and URL-based routing to different services. More setup, but it unlocks features that domain mapping alone can't provide.

Security Configuration

Cloud Run's security defaults are strong (covered in Part 1). But for production services, you'll want to customize a few settings.

Service Accounts

Every Cloud Run service runs as a service account, which determines what Google Cloud resources it can access. By default, Cloud Run uses the project's default compute service account, which typically has broad permissions.

For production, create a dedicated service account per service with only the permissions it needs. If your service reads from Cloud Storage and writes to Pub/Sub, its service account should have storage.objectViewer and pubsub.publisher. Nothing more. This is the principle of least privilege, and it limits the blast radius if a service is compromised.

gcloud run deploy my-service \
  --image my-image \
  --service-account my-sa@my-project.iam.gserviceaccount.com \
  --region us-central1

IAM Authentication

By default, Cloud Run requires authentication. Every request must include a valid identity token, and the caller must have the roles/run.invoker role on the service. This is the right default for service-to-service communication.

For public-facing services (APIs, webhooks, web apps), you explicitly opt out by granting the roles/run.invoker role to allUsers:

gcloud run services add-iam-policy-binding my-service \
  --member="allUsers" \
  --role="roles/run.invoker" \
  --region us-central1

But even with unauthenticated access enabled, you can implement your own authentication layer in your application code. Cloud Run handles transport security (HTTPS) and platform-level identity (the IAM invoker check). Your app handles application-level identity: user logins, API keys, JWT validation.
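A minimal sketch of such an application-level check, here a bearer API key compared in constant time; the header convention and helper are illustrative, and real services often validate JWTs instead:

```python
import hmac

def is_authorized(authorization_header, expected_key):
    """Check a 'Bearer <key>' header against the key the service holds
    (e.g. injected from Secret Manager as the API_KEY env var above)."""
    if not authorization_header or not authorization_header.startswith("Bearer "):
        return False
    presented = authorization_header[len("Bearer "):]
    # Constant-time comparison avoids leaking key bytes via timing.
    return hmac.compare_digest(presented, expected_key)
```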

Binary Authorization

Binary Authorization enforces deploy-time policies: only container images that have been signed by your CI/CD pipeline can be deployed. This prevents someone from deploying an untested image directly to production, even if they have the IAM permissions to do so.

It's a layer of governance that makes sense for organizations with compliance requirements or strict change management processes.

Cloud Armor

Cloud Armor is Google Cloud's WAF (Web Application Firewall). It sits in front of your Cloud Run service and can enforce:

  • IP allowlists and denylists
  • Geographic restrictions
  • Rate limiting per client
  • Pre-configured WAF rules (SQL injection, XSS, etc.)

Cloud Armor requires a global external Application Load Balancer in front of your Cloud Run service. If you're using the default *.run.app URL without a load balancer, Cloud Armor isn't available. But if your service is public-facing and handles sensitive data, the load balancer + Cloud Armor combination is worth the extra setup.

Conclusion

Cloud Run gives you enough configuration knobs to tune for real workloads. But you don't need to touch all of them at once.

The pattern I recommend: start with the defaults. Deploy your service, see how it behaves under real traffic, then adjust. Bump the memory if you're hitting limits. Lower the concurrency if requests are CPU-heavy. Add a health check if your startup is slow. Set up a dedicated service account before going to production. Every change takes effect on the next deployment, with zero downtime. Nothing is permanent.

If Part 1 was about whether Cloud Run belongs in your architecture, and Part 2 was about getting your code onto it, this article is about making it work well for your specific needs. Start simple. Add complexity when your workload demands it, not before.

If you're just joining the series, Part 1 is the place to start.
