
Google Pub/Sub

Introduction: The Power of Event-Driven Architectures

Have you ever wondered how large-scale systems like Google, Netflix, or Spotify manage to process millions of real-time events, such as sending notifications, updating dashboards, or recommending personalized content, all without missing a beat? The secret lies in event-driven architectures, and at the heart of such systems is a powerful tool: Google Cloud Pub/Sub.

In a world where businesses demand seamless integration and instant processing of data, traditional point-to-point messaging systems often fall short. Enter Google Pub/Sub, a fully managed messaging service designed to decouple applications, handle massive data streams, and enable real-time communication with unparalleled reliability and scalability.

Why Pub/Sub is Essential for Modern Systems
Modern applications require more than just static workflows; they thrive on dynamic, event-driven processes that can adapt and respond to real-time data. Here are some key reasons why Google Pub/Sub has become indispensable:

  • Real-Time Processing: From IoT devices generating millions of sensor readings to financial systems detecting fraudulent transactions, Pub/Sub ensures these events are captured and processed in real time.

  • Scalability: Pub/Sub seamlessly scales to handle millions of messages per second, supporting the ever-growing demands of modern businesses.

  • Decoupling of Services: It simplifies architectures by allowing independent components to communicate asynchronously, improving maintainability and flexibility.

  • Global Reach: Built on Google's global infrastructure, Pub/Sub offers low-latency message delivery across regions.

  • Integration-Friendly: It integrates seamlessly with other Google Cloud services like BigQuery, Dataflow, Cloud Functions, and Cloud Storage, making it a cornerstone for building robust, interconnected systems.

Google Cloud Pub/Sub is a fully managed messaging service designed for real-time communication, enabling independent applications to send and receive messages seamlessly. It's a powerful tool for building scalable and reliable systems.

In this tutorial, we'll delve into the core concepts of Pub/Sub, explore practical examples using the command-line interface (CLI) and Python SDK, and discuss various use cases.

Key Concepts

  • Topic: A named resource to which messages are published.
  • Subscription: A named resource that receives messages from a specific topic.
  • Publisher: An application that sends messages to a topic.
  • Subscriber: An application that receives messages from a subscription.
  • Message: The core unit of communication in Pub/Sub, consisting of data (payload) and attributes (key-value metadata).
  • Acknowledgment (Ack): Subscribers must confirm receipt of a message by acknowledging it. Unacknowledged messages are redelivered to ensure reliable processing.
  • Push vs. Pull Subscriptions
    • Pull: Subscribers explicitly fetch messages.
    • Push: Pub/Sub pushes messages to an HTTP(S) endpoint.
  • Dead Letter Queue (DLQ): A separate topic for routing undeliverable messages after a defined number of delivery attempts. Useful for troubleshooting.
  • Retention Policy: Controls how long unacknowledged messages (and, optionally, acknowledged ones) are retained. Default: 7 days.
  • Filters: Enable subscribers to receive only messages whose attributes match specific criteria (filters and ordering keys are illustrated in the sketch after this list).
  • Ordering Keys: Ensures messages with the same ordering key are delivered in the order they were published.
  • Exactly Once Delivery: Guarantees no duplicate messages are delivered when configured correctly.
  • Flow Control: Helps subscribers manage the rate of message delivery to prevent being overwhelmed.
  • IAM Permissions and Roles: Pub/Sub uses Google Cloud IAM to control access. Key roles:
    • Publisher: roles/pubsub.publisher
    • Subscriber: roles/pubsub.subscriber
    • Viewer: roles/pubsub.viewer
  • Message Deduplication: Prevents duplicate messages caused by retries or network issues.
  • Message Expiration (TTL): Automatically drops messages from a subscription after a specified time-to-live, reducing storage costs.
  • Backlog: The collection of unacknowledged messages in a subscription, enabling message replay if required.
  • Schema Registry: Defines structured message formats (e.g., Avro, Protocol Buffers), ensuring consistent data validation.
  • Batching: Groups multiple messages together for efficient publishing and delivery, improving performance.
  • Regional Endpoints: Publish messages to region-specific endpoints for compliance and reduced latency.
  • Cross-Project Topics and Subscriptions: Share Pub/Sub resources across projects for multi-project architectures.
  • Monitoring and Metrics: Integrated with Cloud Monitoring, providing insights like throughput, acknowledgment latency, and backlog size.
  • Snapshot: Captures the state of a subscription at a specific time, enabling message replay from that point.
  • Message Encryption: Messages are encrypted in transit and at rest by default, with the option to use Customer-Managed Encryption Keys (CMEK).
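
To make a few of these concepts concrete, here is a minimal Python sketch that publishes a message with attributes and an ordering key, and creates a filtered subscription. It assumes an existing project (your-project-id) and topic (my-topic); the subscription name eu-orders-sub and the region attribute are illustrative placeholders.

```python
from google.cloud import pubsub_v1

project_id = "your-project-id"  # placeholder

# Ordering keys only take effect when message ordering is enabled on the publisher
publisher = pubsub_v1.PublisherClient(
    publisher_options=pubsub_v1.types.PublisherOptions(enable_message_ordering=True)
)
topic_path = publisher.topic_path(project_id, "my-topic")

# Attributes (region=...) travel as key-value metadata alongside the payload;
# messages sharing an ordering key are delivered in publish order.
publisher.publish(topic_path, b"order created", region="eu", ordering_key="customer-42")

# A filtered subscription only receives messages whose attributes match the filter
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, "eu-orders-sub")
subscriber.create_subscription(
    request={
        "name": subscription_path,
        "topic": topic_path,
        "filter": 'attributes.region = "eu"',
        "enable_message_ordering": True,
    }
)
```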

Getting Started

Pre-requisites

  • Set Up Your Google Cloud Project:
    • Create a new Google Cloud project or select an existing one.
    • Enable the Pub/Sub API (see the CLI sketch after this list).
  • Install the Google Cloud SDK:
    • Download and install the SDK from the official website.
    • Authenticate your Google Cloud account:

      # using the CLI
      gcloud auth login
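
Both prerequisites can also be handled from the CLI; a short sketch (the project ID is a placeholder):

```bash
# Select the project and enable the Pub/Sub API
gcloud config set project your-project-id
gcloud services enable pubsub.googleapis.com

# Set up Application Default Credentials for the client libraries
gcloud auth application-default login
```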

Creating a Topic and Subscription

```bash
# Create a topic using the CLI
gcloud pubsub topics create my-topic

# Create a subscription using the CLI
gcloud pubsub subscriptions create my-subscription --topic my-topic
```

```python
# Using the Python SDK
from google.cloud import pubsub_v1

# Create a Publisher client and the topic
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("your-project-id", "my-topic")
publisher.create_topic(request={"name": topic_path})

# Create a Subscriber client and the subscription
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("your-project-id", "my-subscription")
subscriber.create_subscription(request={"name": subscription_path, "topic": topic_path})
```
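
The creation calls fail if the resource already exists; one way to keep setup scripts idempotent (a sketch reusing the clients above) is to catch AlreadyExists:

```python
from google.api_core.exceptions import AlreadyExists

try:
    publisher.create_topic(request={"name": topic_path})
except AlreadyExists:
    print(f"Topic {topic_path} already exists")

try:
    subscriber.create_subscription(request={"name": subscription_path, "topic": topic_path})
except AlreadyExists:
    print(f"Subscription {subscription_path} already exists")
```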

Publishing Messages

```bash
# Using the CLI
gcloud pubsub topics publish my-topic \
    --message "Hello, Pub/Sub!"
```

```python
# Using the Python SDK
# Publish a message; the payload must be bytes
message = "Hello, Pub/Sub!"
future = publisher.publish(topic_path, message.encode("utf-8"))
print(future.result())  # prints the server-assigned message ID
```
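
Attributes can be attached as keyword arguments on publish; a sketch reusing the publisher and topic_path above (the attribute names are only examples):

```python
data = "New user signed up".encode("utf-8")
future = publisher.publish(topic_path, data, event_type="registration", source="web")
print(future.result())  # the server-assigned message ID
```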

Subscribing to Messages

```bash
# Using the CLI
gcloud pubsub subscriptions pull my-subscription --auto-ack
```

```python
# Using the Python SDK
import time

def callback(message):
    print("Received message: {}".format(message.data))
    message.ack()

streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)

# The subscriber will stay alive until interrupted
while True:
    time.sleep(60)
```
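
For long-running subscribers it is common to block on the returned streaming pull future and to cap outstanding messages with flow control; a sketch using the same client and callback as above:

```python
from concurrent.futures import TimeoutError

# Limit how many messages can be outstanding to the callback at once
flow_control = pubsub_v1.types.FlowControl(max_messages=100)

streaming_pull_future = subscriber.subscribe(
    subscription_path, callback=callback, flow_control=flow_control
)
try:
    streaming_pull_future.result(timeout=300)  # block, or omit timeout to run until interrupted
except TimeoutError:
    streaming_pull_future.cancel()  # stop the background consumer
    streaming_pull_future.result()  # wait for shutdown to complete
```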

Use Cases with Practical Examples

Triggers with Cloud Functions

Use Case: Automatically trigger a process whenever a new message is published to a Pub/Sub topic. For example, sending an email notification when a new user registers.

Solution: Use Cloud Functions to listen to Pub/Sub topics and handle messages.

Steps:

Pre-requisites:

  • Enable the APIs run.googleapis.com, cloudbuild.googleapis.com, and eventarc.googleapis.com.
  • Grant the roles/logging.logWriter role to the default compute service account:

```bash
gcloud projects add-iam-policy-binding $GCP_PROJECT \
  --member="serviceAccount:<ID>-compute@developer.gserviceaccount.com" \
  --role="roles/logging.logWriter"
```
  1. Create a Pub/Sub topic:

```bash
gcloud pubsub topics create user-registration
```


  2. Write a Cloud Function:

```python
import base64

def send_email_notification(event, context):
    # The Pub/Sub payload arrives base64-encoded in event['data']
    message = base64.b64decode(event['data']).decode('utf-8')
    print(f"Sending email notification for: {message}")
```

  3. Deploy the Cloud Function:

```bash
gcloud functions deploy sendEmailNotification \
    --runtime python39 \
    --trigger-topic user-registration \
    --entry-point send_email_notification
```

  4. Publish a message to test:

```bash
gcloud pubsub topics publish user-registration --message "New User: John Doe"
```

Result: The Cloud Function logs the email notification process.
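
For completeness, the registration event could also be published from application code rather than the CLI; a hypothetical sketch (the function name and project ID are placeholders):

```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("your-project-id", "user-registration")

def on_user_registered(username: str) -> None:
    # The deployed Cloud Function picks this up and logs the notification
    future = publisher.publish(topic_path, f"New User: {username}".encode("utf-8"))
    print(f"Published registration event {future.result()}")

on_user_registered("John Doe")
```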


Streaming Data to BigQuery

Use Case: Process and analyze large volumes of event data, such as IoT sensor readings or transaction logs, by streaming messages from Pub/Sub to BigQuery.

Solution: Use a pre-built Dataflow template to ingest data into BigQuery.

Steps:

  1. Create a Pub/Sub topic and subscription:

```bash
gcloud pubsub topics create sensor-data
gcloud pubsub subscriptions create sensor-data-sub --topic=sensor-data
```


  2. Enable the BigQuery API and create a BigQuery dataset and table. First, define the table schema in schema.json:

```json
[
  {
    "name": "temperature",
    "type": "FLOAT",
    "mode": "NULLABLE"
  },
  {
    "name": "humidity",
    "type": "FLOAT",
    "mode": "NULLABLE"
  }
]
```

Then create the dataset and table:

```bash
bq mk sensor_dataset
bq mk --table sensor_dataset.sensor_table schema.json
```


  3. Enable the Dataflow API and run the Dataflow job:

```bash
# Enable the Dataflow API
gcloud services enable dataflow.googleapis.com

# Run the job using the Pub/Sub-to-BigQuery template
gcloud dataflow jobs run pubsub-to-bigquery \
    --gcs-location gs://dataflow-templates-us-central1/latest/PubSub_to_BigQuery \
    --region us-central1 \
    --parameters inputTopic=projects/your-project-id/topics/sensor-data,outputTableSpec=your-project-id:sensor_dataset.sensor_table
```


  4. Publish messages to the topic:

```bash
gcloud pubsub topics publish sensor-data --message '{"temperature": 25, "humidity": 60}'
```
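
To generate a steadier stream of test data than one-off CLI publishes, a small publisher loop can stand in for real sensors; a sketch assuming the sensor-data topic and project ID from the steps above:

```python
import json
import random
import time

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("your-project-id", "sensor-data")

# Publish ten simulated readings, one per second
for _ in range(10):
    reading = {
        "temperature": round(random.uniform(20, 30), 1),
        "humidity": round(random.uniform(40, 70), 1),
    }
    publisher.publish(topic_path, json.dumps(reading).encode("utf-8")).result()
    time.sleep(1)
```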
    


Result: The message data should appear in the BigQuery table for further analysis.


Logging Events with Sinks

Use Case: Centralize and process logs by routing Cloud Logging events to a Pub/Sub topic for downstream processing, like alerting or archiving.

Solution: Create a logging sink targeting a Pub/Sub topic.

Steps:

  1. Create a Pub/Sub topic:

```bash
gcloud pubsub topics create log-events
```

  2. Create a logging sink:

```bash
gcloud logging sinks create log-sink \
    pubsub.googleapis.com/projects/your-project-id/topics/log-events
```

  3. Grant the sink's writer service account (shown by gcloud logging sinks describe log-sink) permission to publish to the topic:

```bash
gcloud pubsub topics add-iam-policy-binding log-events \
    --member=serviceAccount:service-account-email \
    --role=roles/pubsub.publisher
```

  4. Create a subscriber to process the logs (this assumes a log-events-sub subscription has been created on the log-events topic):

```python
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path('your-project-id', 'log-events-sub')

def callback(message):
    print(f"Received log: {message.data.decode('utf-8')}")
    message.ack()

streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
print("Listening for log events...")
streaming_pull_future.result()  # block so the process keeps receiving messages
```

Result: Log messages flow into Pub/Sub and can be processed by your custom subscriber.
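
Sinks can also carry a filter so that only matching entries reach Pub/Sub; for example, a sketch that routes only ERROR-and-above entries to the same topic (the sink name is a placeholder):

```bash
gcloud logging sinks create error-log-sink \
    pubsub.googleapis.com/projects/your-project-id/topics/log-events \
    --log-filter='severity>=ERROR'
```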

Streaming Messages to Cloud Storage

Use Case: Archive real-time data, such as application usage metrics or user activity logs, to Cloud Storage for long-term storage.

Solution: Use Dataflow to write Pub/Sub messages to a Cloud Storage bucket.

Steps:

  1. Create a Pub/Sub topic and subscription:

```bash
gcloud pubsub topics create app-metrics
gcloud pubsub subscriptions create app-metrics-sub --topic=app-metrics
```

  2. Create a Cloud Storage bucket:

```bash
gsutil mb gs://your-bucket-name
```

  3. Run the Dataflow job using the Pub/Sub-to-Cloud Storage text template:

```bash
gcloud dataflow jobs run pubsub-to-gcs \
    --gcs-location gs://dataflow-templates-us-central1/latest/Cloud_PubSub_to_GCS_Text \
    --region us-central1 \
    --parameters inputTopic=projects/your-project-id/topics/app-metrics,outputDirectory=gs://your-bucket-name/data/,outputFilenamePrefix=metrics,outputFileSuffix=.json
```

  4. Publish messages:

```bash
gcloud pubsub topics publish app-metrics --message '{"event": "page_view", "user": "123"}'
```

Result: Messages are written as JSON files in the specified Cloud Storage bucket.
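
To verify the output, the bucket can be listed and a file inspected from the CLI; a sketch assuming the bucket and prefix used above:

```bash
# List the archived files written by the Dataflow job
gsutil ls gs://your-bucket-name/data/

# Print the contents of the archived JSON files
gsutil cat gs://your-bucket-name/data/metrics*.json
```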

Conclusion
Google Pub/Sub is a versatile tool for building scalable and reliable applications. By understanding the core concepts and leveraging the CLI and Python SDK, you can effectively utilize Pub/Sub to solve a variety of real-world problems.
