Hey all,
My current stack has event data streaming into Pub/Sub (GCP), on the order of thousands of events every 5 minutes.
I also have an intelligence table in BigQuery that holds all the metadata for those events. The Pub/Sub events and the BQ table share a common join key, but the metadata table is about 3 billion rows. An application queries real-time metrics off the joined data (in ClickHouse) via an API. We need the metadata joined in because it is the group-by key for on-demand aggregations.
My current setup is Pub/Sub → GCS → ClickHouse for events, and BQ → GCS → ClickHouse for metadata; inside ClickHouse the incoming events get enriched by a materialised view. However, due to the size of the metadata table, each materialised view query takes very long and costs a lot as well.
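To make this concrete, the enrichment path looks roughly like this (table and column names are simplified placeholders, not the real schema):

```sql
-- Raw events landing from GCS:
CREATE TABLE events_raw
(
    event_id   String,
    join_key   UInt64,
    value      Float64,
    event_time DateTime
)
ENGINE = MergeTree
ORDER BY (join_key, event_time);

-- Metadata exported from BigQuery via GCS (~3 billion rows):
CREATE TABLE metadata
(
    join_key  UInt64,
    group_col LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY join_key;

-- Enriched target table the API queries against:
CREATE TABLE events_enriched
(
    event_id   String,
    join_key   UInt64,
    value      Float64,
    event_time DateTime,
    group_col  LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (group_col, event_time);

-- The materialised view doing the enrichment; the LEFT JOIN against the
-- 3B-row metadata table on every insert block is what is slow and expensive:
CREATE MATERIALIZED VIEW events_enriched_mv
TO events_enriched AS
SELECT
    e.event_id,
    e.join_key,
    e.value,
    e.event_time,
    m.group_col
FROM events_raw AS e
LEFT JOIN metadata AS m ON e.join_key = m.join_key;
```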
Are there any other tools/solutions I could use for this use case? (If it's something in GCP, amazing.)
Am I making any mistakes here? The ClickHouse join just isn't performant (I tried using a dictionary as well), but there are too many keys to join, most of them wasteful.
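Roughly what the dictionary attempt looked like (again, placeholder names):

```sql
-- A HASHED layout has to keep all ~3B keys resident in memory, even though
-- only a small fraction of them ever show up in the event stream:
CREATE DICTIONARY metadata_dict
(
    join_key  UInt64,
    group_col String
)
PRIMARY KEY join_key
SOURCE(CLICKHOUSE(TABLE 'metadata'))
LIFETIME(MIN 3600 MAX 7200)
LAYOUT(HASHED());

-- The materialised view then used a lookup instead of a JOIN:
-- SELECT ..., dictGet('metadata_dict', 'group_col', join_key) AS group_col
```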
Should I use a different data model for the Pub/Sub events, i.e. push all the metadata into the event itself? But that leaves the problem of backfilling every time we add new metadata columns (which is frequent).
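For clarity, that denormalised option would look something like this on the ClickHouse side (hypothetical columns):

```sql
-- Metadata travels inside each Pub/Sub message, so events land already enriched:
CREATE TABLE events_denormalised
(
    event_id   String,
    join_key   UInt64,
    value      Float64,
    event_time DateTime,
    group_col  LowCardinality(String)  -- copied from the metadata at publish time
)
ENGINE = MergeTree
ORDER BY (group_col, event_time);

-- The catch: every new metadata attribute means a schema change plus a
-- backfill of everything already ingested, e.g.
ALTER TABLE events_denormalised ADD COLUMN new_meta_col String DEFAULT '';
-- ...followed by reprocessing historical rows to actually populate it.
```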
Any help would be appreciated