Building event-driven systems with Cloud Run, Pub/Sub, and Eventarc is a powerful pattern, but the gap between "Hello World" tutorials and production stability is wide. These are the hard-won lessons from the trenches of Google Cloud serverless architecture.
1. Choosing Your Transport: Eventarc vs. Pub/Sub
While Eventarc often uses Pub/Sub under the hood, the choice of which to interface with directly depends on your source and required control.
| Feature | Eventarc | Direct Pub/Sub |
|---|---|---|
| Best For | GCP-native events (GCS, Firestore, Audit Logs) | Custom inter-service messaging |
| Setup | Managed wrapper, high convenience | Manual configuration, high control |
| Filtering | Built-in attribute filtering | Fine-grained policies & custom attributes |
| IAM | Uses `eventarc.eventReceiver` | Uses `pubsub.subscriber` |
Pro Tip: Use Eventarc for "plumbing" Google Cloud events. Use Direct Pub/Sub when you need a custom event bus or specific retry/ordering logic.
2. The Infrastructure Reality Check
The Idempotency Requirement
Pub/Sub guarantees at-least-once delivery by default. While exactly-once delivery exists as an opt-in feature, your consumers must assume duplicates will happen. If your handler isn't idempotent, you risk double-charging users or corrupting state.
The Non-Negotiable Flow:
- Extract a unique Idempotency Key (UUID/Hash) from the event.
- Check a fast-access cache (Memorystore or Firestore) for the key.
- If present, discard the message as a duplicate.
- If absent, process the event and store the key with a TTL.
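The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: `InMemoryKeyStore` is a hypothetical stand-in for Memorystore or Firestore, and the `idempotency_key` field name is an assumption about the event shape.

```python
import time
from typing import Callable, Dict


class InMemoryKeyStore:
    """Hypothetical stand-in for Memorystore/Firestore: keeps keys until their TTL expires."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._expiry: Dict[str, float] = {}
        self._clock = clock

    def seen(self, key: str) -> bool:
        expires_at = self._expiry.get(key)
        if expires_at is None:
            return False
        if expires_at <= self._clock():
            del self._expiry[key]  # expired: treat the key as never seen
            return False
        return True

    def remember(self, key: str, ttl_seconds: float) -> None:
        self._expiry[key] = self._clock() + ttl_seconds


def handle_event(event: dict, store: InMemoryKeyStore,
                 process: Callable[[dict], None],
                 ttl_seconds: float = 3600.0) -> bool:
    """Return True if the event was processed, False if it was dropped as a duplicate."""
    key = event["idempotency_key"]    # step 1: extract the key (field name is assumed)
    if store.seen(key):               # steps 2-3: check the store, discard duplicates
        return False
    process(event)                    # step 4: do the work...
    store.remember(key, ttl_seconds)  # ...then record the key with a TTL
    return True
```

Note that the check and the store are two separate steps here, so two concurrent deliveries of the same message could both pass the check. A real deployment would likely need an atomic "set if absent" operation in the backing store to close that window.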
Managing Cold Starts
Cloud Run’s "scale to zero" is a budget-saver but a latency-killer. A container startup typically adds 2–8 seconds to the first request.
- The Fix: Set `min-instances` for latency-critical paths (this incurs cost).
- The Alternative: Use Cloud Run Jobs for asynchronous batch processing where start-time overhead is less relevant.
Eventarc Propagation Delays
When deploying Eventarc triggers via Terraform or CLI, filtering rules can take 60–120 seconds to propagate across the Google network.
- Production Fix: Build a "warm-up" delay into your CI/CD pipeline. Do not run integration tests immediately after a successful deployment, or you will see intermittent, spurious failures caused by events that the not-yet-active trigger never delivered.
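One way to express that delay in a pipeline step is a fixed wait followed by polling, sketched below. The function name, the default timings, and the `probe` callback (for example, "publish a test event and check whether the service received it") are all assumptions for illustration; a plain fixed sleep would also satisfy the advice above.

```python
import time
from typing import Callable


def wait_for_trigger(probe: Callable[[], bool],
                     min_wait_s: float = 60.0,
                     timeout_s: float = 180.0,
                     interval_s: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Wait min_wait_s, then poll probe() until it succeeds or timeout_s is used up.

    Returns True as soon as probe() reports the trigger is live, False on timeout.
    """
    sleep(min_wait_s)          # always give propagation a head start
    waited = min_wait_s
    while True:
        if probe():
            return True
        if waited + interval_s > timeout_s:
            return False       # give up; the caller should fail the pipeline
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; in a pipeline you would call it with the default and gate the integration tests on the return value.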
3. Calculating Realistic Costs
Tutorials often suggest serverless is "pennies," but at scale, the math changes. Pub/Sub is billed at $40 per TiB, with a 1 KB minimum per message.
Example: 10M Events/Day (300M/Month)
- Data Volume: 300M messages × 1 KiB (min size) ≈ 286 GiB (0.279 TiB).
- Ingest & Delivery: (0.279 TiB × $40) + (0.279 TiB × $40) ≈ $22/month.
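- Compute (Cloud Run): 300M invocations (assuming 1s duration, 512MB RAM) ≈ $150–$200/month. This range assumes each instance serves many requests concurrently; because billing follows instance time rather than per-request time, the same workload at a concurrency of 1 would cost far more.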
Total: ~$175–$225/month. Significant, but highly predictable if you monitor message size and invocation duration.
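The Pub/Sub part of this estimate is easy to reproduce and re-run for your own message sizes. The sketch below uses the article's figures ($40 per TiB, a 1 KB minimum per message, one publish plus one delivery); the function name and the choice to treat 1 KB as 1024 bytes are assumptions, and current pricing should be checked before relying on the result.

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def pubsub_monthly_cost(messages: int, avg_bytes: int,
                        usd_per_tib: float = 40.0,
                        min_billable_bytes: int = 1024,
                        deliveries: int = 1) -> float:
    """Estimate monthly Pub/Sub throughput cost in USD.

    Each message is billed once on publish and once per delivery, and small
    messages are rounded up to the minimum billable size.
    """
    billable = max(avg_bytes, min_billable_bytes)
    tib = messages * billable / TIB
    return tib * usd_per_tib * (1 + deliveries)


# 10M events/day for 30 days, with small (200-byte) payloads
small = pubsub_monthly_cost(300_000_000, 200)
# Same volume with 4 KiB payloads: throughput cost scales linearly with size
large = pubsub_monthly_cost(300_000_000, 4096)
print(f"small payloads: ${small:.2f}/month, 4 KiB payloads: ${large:.2f}/month")
```

For the 300M-message example the result lands in the low twenties of dollars per month, and it grows linearly once payloads exceed the minimum billable size, which is why monitoring message size matters.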
4. Advanced Production Patterns
Regional Co-location
Eventarc triggers for multi-region services (like Firestore nam5) are often pinned to specific regions (e.g., us-central1). If your Cloud Run service resides elsewhere, you will incur cross-region egress charges and increased latency. Always verify the region mapping table in the GCP documentation.
Payload Sanitization at the Edge
Using Eventarc Advanced, you can use Common Expression Language (CEL) to transform or redact payloads before they reach the consumer. This is vital for PII compliance.
// Example: Redacting an email address in-flight
message.setField("data.email", re.extract(message.data.email, "(^.).*@(.*)", "\\1***@\\2"))
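To check what the pattern in that expression does before deploying it, the same regular expression can be exercised locally. The Python sketch below is only an analogue for testing the regex, not Eventarc code; `redact_email` is a hypothetical helper name.

```python
import re

# Same pattern as in the CEL example: keep the first character and the domain.
_EMAIL_PATTERN = re.compile(r"(^.).*@(.*)")


def redact_email(email: str) -> str:
    """Mask everything between the first character and the '@' sign.

    Strings that do not match the pattern are returned unchanged.
    """
    return _EMAIL_PATTERN.sub(r"\1***@\2", email, count=1)


print(redact_email("alice@example.com"))  # a***@example.com
```

Because the domain is preserved, this is masking rather than full anonymization; whether that satisfies a given compliance requirement is a separate question.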
Observability and Dead-Letter Topics (DLTs)
In a distributed system, an event can "disappear" if a filter drops it or a consumer fails silently.
- DLTs: Every subscription must have a Dead-Letter Topic.
- Alerting: Monitor the `subscription/dead_letter_message_count` metric. A rising count is your first sign of a logic bug or schema mismatch.
- Tracing: Use OpenTelemetry to inject Trace IDs into event attributes, allowing you to follow a single request from the producer through the bus to the consumer logs.
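The tracing idea comes down to carrying a trace identifier in the message's string attributes. The sketch below hand-rolls a W3C Trace Context `traceparent` value using only the standard library, purely to show the mechanics; in a real service the OpenTelemetry SDK's propagators would do this, and the helper names here are hypothetical.

```python
import re
import secrets
from typing import Dict, Optional

# traceparent layout: version "00", 32-hex trace id, 16-hex span id, 2-hex flags
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject_trace(attributes: Dict[str, str],
                 trace_id: Optional[str] = None) -> Dict[str, str]:
    """Return a copy of the attributes with a traceparent entry added.

    Reuses trace_id when the producer is already inside a trace,
    otherwise starts a new one. A fresh span id is generated each time.
    """
    trace_id = trace_id or secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    out = dict(attributes)
    out["traceparent"] = f"00-{trace_id}-{span_id}-01"
    return out


def extract_trace_id(attributes: Dict[str, str]) -> Optional[str]:
    """Return the trace id from the attributes, or None if absent or malformed."""
    match = _TRACEPARENT.match(attributes.get("traceparent", ""))
    return match.group(1) if match else None
```

A producer would call `inject_trace` on the attributes before publishing, and the consumer would call `extract_trace_id` and include the value in every log line, so that producer and consumer logs can be joined on the same ID.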
5. When to Avoid Event-Driven
Don't "cargo-cult" this architecture if it doesn't fit your needs. Skip it if:
- You require strong consistency and immediate transactions.
- Your end-to-end latency requirements are < 100ms.
- The overhead of debugging distributed traces outweighs the scaling benefits.
In these cases, stick to Synchronous gRPC/HTTP or Cloud Workflows for structured orchestration.
Final Takeaway
The combination of Cloud Run and Eventarc is one of the most robust patterns in 2026. By respecting the boundaries of the platform—accounting for propagation delays, ensuring idempotency, and co-locating regions—you can build a system that scales effortlessly from zero to millions of events.