Çalgan Aygün for Google Developer Experts

Posted on Feb 16

Redefining Event-Driven Architecture on Google Cloud

#architecture #distributedsystems #googlecloud #serverless

Building event-driven systems with Cloud Run, Pub/Sub, and Eventarc is a powerful pattern, but the gap between "Hello World" tutorials and production stability is wide. These are the hard-won lessons from the trenches of Google Cloud serverless architecture.

1. Choosing Your Transport: Eventarc vs. Pub/Sub

While Eventarc often uses Pub/Sub under the hood, the choice of which to interface with directly depends on your source and required control.

Feature	Eventarc	Direct Pub/Sub
Best For	GCP-native events (GCS, Firestore, Audit Logs)	Custom inter-service messaging
Setup	Managed wrapper, high convenience	Manual configuration, high control
Filtering	Built-in attribute filtering	Fine-grained policies & custom attributes
IAM	Uses `roles/eventarc.eventReceiver`	Uses `roles/pubsub.subscriber`

Pro Tip: Use Eventarc for "plumbing" Google Cloud events. Use Direct Pub/Sub when you need a custom event bus or specific retry/ordering logic.

2. The Infrastructure Reality Check

The Idempotency Requirement

Pub/Sub guarantees at-least-once delivery by default. Exactly-once delivery exists as an opt-in feature, but only for pull subscriptions, so push-based consumers such as Cloud Run services must assume duplicates will happen. If your handler isn't idempotent, you risk double-charging users or corrupting state.

The Non-Negotiable Flow:

Extract a unique Idempotency Key (UUID/Hash) from the event.
Check a fast-access cache (Memorystore or Firestore) for the key.
If present, discard the message as a duplicate.
If absent, process the event and store the key with a TTL.
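A minimal sketch of this flow, under stated assumptions: `InMemoryKeyStore` is a hypothetical stand-in for Memorystore or Firestore, and `handle_once`, `idempotency_key`, and the event's `id` field are illustrative names, not part of any Google Cloud API. One deliberate variation from the steps above: the key is claimed before processing and released on failure, so two concurrent deliveries cannot both pass the check. With Redis, a comparable atomic claim would be `SET key value NX EX ttl`.

```python
import hashlib
import json
import time
from typing import Callable, Dict


class InMemoryKeyStore:
    """Hypothetical stand-in for Memorystore/Firestore; keys expire after a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._expiry: Dict[str, float] = {}
        self._clock = clock

    def claim(self, key: str, ttl_seconds: float) -> bool:
        """Return True if the key was absent (or expired) and is now stored."""
        now = self._clock()
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False  # key is still live, so this is a duplicate
        self._expiry[key] = now + ttl_seconds
        return True

    def release(self, key: str) -> None:
        """Forget a key so a redelivered message can be retried."""
        self._expiry.pop(key, None)


def idempotency_key(event: dict) -> str:
    """Prefer an explicit ID; otherwise hash the canonical JSON payload."""
    if "id" in event:
        return str(event["id"])
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def handle_once(
    event: dict,
    store: InMemoryKeyStore,
    process: Callable[[dict], None],
    ttl_seconds: float = 3600.0,
) -> bool:
    """Process the event unless its key was already claimed. Returns True if processed."""
    key = idempotency_key(event)
    if not store.claim(key, ttl_seconds):
        return False  # duplicate: discard
    try:
        process(event)
    except Exception:
        store.release(key)  # let the redelivery try again
        raise
    return True
```

The TTL should be longer than the window in which redeliveries can realistically arrive; the one-hour default here is only a placeholder.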

Managing Cold Starts

Cloud Run’s "scale to zero" is a budget-saver but a latency-killer. A container startup typically adds 2–8 seconds to the first request.

The Fix: Set `min-instances` to at least 1 for latency-critical paths (idle instances are still billed, so this incurs cost).
The Alternative: Use Cloud Run Jobs for asynchronous batch processing where start-time overhead is less relevant.

Eventarc Propagation Delays

When deploying Eventarc triggers via Terraform or CLI, filtering rules can take 60–120 seconds to propagate across the Google network.

Production Fix: Build a "warm-up" delay into your CI/CD pipeline. Do not trigger integration tests immediately after a successful deployment, or you will see intermittent failures that have nothing to do with your code.
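One way to build that delay into a pipeline step is sketched below. It is a variation on a fixed sleep: it polls a readiness probe until a deadline, so the pipeline proceeds as soon as the trigger responds. The `probe` callable is hypothetical (for example, publish a canary event and check that the consumer logged it), and the 180-second default is an assumption chosen to sit above the 60–120 second window.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_seconds: float = 180.0,
    interval_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until it returns True or the timeout would be exceeded."""
    deadline = clock() + timeout_seconds
    while True:
        if probe():
            return True
        # Stop if another full interval would overshoot the deadline.
        if clock() + interval_seconds > deadline:
            return False
        sleep(interval_seconds)
```

A pipeline script would call `wait_until_ready(my_probe)` and only start the integration suite when it returns True. The `sleep` and `clock` parameters exist so the helper can be tested without real waiting.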

3. Calculating Realistic Costs

Tutorials often suggest serverless is "pennies," but at scale, the math changes. Pub/Sub is billed at $40 per TiB, with a 1 KB minimum per message.

Example: 10M Events/Day (300M/Month)

Data Volume: 300M messages × 1 KB (min size, 1,000 bytes) = 300 GB ≈ 279 GiB (0.273 TiB).
Ingest & Delivery: (0.273 TiB × $40) + (0.273 TiB × $40) ≈ $22/month.
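Compute (Cloud Run): 300M invocations (assuming 1s duration, 512MB RAM) ≈ $150–$200/month. This range assumes each instance serves many requests concurrently; at a concurrency of 1, billed instance time and cost would be far higher.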

Total: ~$175–$225/month. Significant, but highly predictable if you monitor message size and invocation duration.
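The Pub/Sub part of this estimate can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted above ($40 per TiB, a 1 KB minimum treated as 1,000 bytes, one publish plus one delivery per message); the function name and parameters are illustrative, and free-tier allowances and batching effects are ignored. The result lands near the figure above, with small differences coming from how KB and TiB are rounded.

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def pubsub_monthly_cost(
    messages_per_month: int,
    avg_message_bytes: int,
    price_per_tib: float = 40.0,
    min_billable_bytes: int = 1000,
    deliveries: int = 1,
) -> float:
    """Estimate monthly Pub/Sub throughput cost in USD (publish + deliveries)."""
    # Small messages are billed at the minimum size.
    billable_bytes = max(avg_message_bytes, min_billable_bytes)
    publish_bytes = messages_per_month * billable_bytes
    # Each delivery moves the same bytes again.
    total_bytes = publish_bytes * (1 + deliveries)
    return total_bytes / TIB * price_per_tib


cost = pubsub_monthly_cost(300_000_000, avg_message_bytes=200)
print(f"${cost:.2f}")  # $21.83
```

Because of the minimum, a 200-byte message costs the same as a 1,000-byte one; above that size, cost grows linearly with payload, which is why monitoring message size matters.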

4. Advanced Production Patterns

Regional Co-location

Eventarc triggers for multi-region services (like Firestore nam5) are often pinned to specific regions (e.g., us-central1). If your Cloud Run service resides elsewhere, you will incur cross-region egress charges and increased latency. Always verify the region mapping table in the GCP documentation.

Payload Sanitization at the Edge

Using Eventarc Advanced, you can use Common Expression Language (CEL) to transform or redact payloads before they reach the consumer. This is vital for PII compliance.

// Example: Redacting an email address in-flight
message.setField("data.email", re.extract(message.data.email, "(^.).*@(.*)", "\\1***@\\2"))
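To sanity-check the redaction pattern before deploying it, the same regular expression can be exercised locally. The sketch below uses Python's `re` module rather than CEL, so it illustrates only the pattern's behavior, not the Eventarc Advanced runtime; `redact_email` is a hypothetical helper name.

```python
import re

# Same pattern as the CEL example: keep the first character and the domain.
EMAIL_PATTERN = re.compile(r"(^.).*@(.*)")


def redact_email(email: str) -> str:
    """Mask everything between the first character and the '@'."""
    return EMAIL_PATTERN.sub(r"\1***@\2", email, count=1)


print(redact_email("alice@example.com"))  # a***@example.com
```

Input without an `@` does not match and is returned unchanged, so a consumer should not treat the output as proof that a value was redacted.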

Observability and Dead-Letter Topics (DLTs)

In a distributed system, an event can "disappear" if a filter drops it or a consumer fails silently.

DLTs: Every subscription must have a Dead-Letter Topic.
Alerting: Monitor the subscription/dead_letter_message_count metric. A rising count is your first sign of a logic bug or schema mismatch.
Tracing: Use OpenTelemetry to inject Trace IDs into event attributes, allowing you to follow a single request from the producer through the bus to the consumer logs.
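The tracing idea can be illustrated without any SDK. The sketch below builds and parses a W3C Trace Context `traceparent` value (`00-<32 hex trace ID>-<16 hex span ID>-<flags>`) carried as a message attribute. It uses only the standard library; a real system would more likely rely on OpenTelemetry's propagators, and the `inject` and `extract` helpers here are hypothetical names, not that library's API.

```python
import re
import secrets
from typing import Dict, Optional, Tuple

_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent() -> str:
    """Create a version-00 traceparent with random IDs and the sampled flag set."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def inject(attributes: Dict[str, str], traceparent: Optional[str] = None) -> Dict[str, str]:
    """Producer side: return a copy of the attributes carrying a traceparent."""
    enriched = dict(attributes)
    enriched["traceparent"] = traceparent or new_traceparent()
    return enriched


def extract(attributes: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Consumer side: return (trace_id, parent_span_id), or None if absent or malformed."""
    match = _TRACEPARENT.match(attributes.get("traceparent", ""))
    if match is None:
        return None
    return match.group(1), match.group(2)
```

The producer would pass the enriched attributes when publishing, and the consumer would log the extracted trace ID with every log line so that a single event can be followed across services.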

5. When to Avoid Event-Driven

Don't "cargo-cult" this architecture if it doesn't fit your needs. Skip it if:

You require strong consistency and immediate transactions.
Your end-to-end latency requirements are under 100 ms.
The overhead of debugging distributed traces outweighs the scaling benefits.

In these cases, stick to synchronous gRPC/HTTP, or use Cloud Workflows for structured orchestration.

Final Takeaway

The combination of Cloud Run and Eventarc is one of the most robust patterns in 2026. By respecting the boundaries of the platform (accounting for propagation delays, ensuring idempotency, and co-locating regions), you can build a system that scales from zero to millions of events.

DEV Community