Marcelo Costa

How to use BigQuery Query Caching with Dynamic Wildcard Tables

The Problem: Caching does not work with wildcard tables

From BigQuery official docs:

Wildcard Limitation

Let's say you have tables named my_data_2023_*, where the asterisk represents the various months, and you want to analyze data across all of them. Since BigQuery can't automatically know when new matching tables are created, it invalidates any available cache and runs a fresh query every time, so the cache is never used.
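You can observe this with the client library: running the exact same wildcard query twice never reports a cache hit. A minimal sketch (the project, dataset, and table names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

# The exact same wildcard query, run twice in a row
sql = "SELECT COUNT(*) FROM `your-project.your_dataset.my_data_2023_*`"

for _ in range(2):
    job = client.query(sql)
    job.result()
    print(job.cache_hit)  # False both times: wildcard queries bypass the cache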

Just for reference, using date-sharded tables is not a good practice in the first place. Recently I faced a scenario where tables were dynamically created based on a business-domain field; the date example here is only for illustration. If you are using date-sharded tables, the better solution is to migrate them to BigQuery partitioned tables instead, as sketched below.
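For completeness, one way to do that migration is a single CREATE TABLE ... AS SELECT over the wildcard. This is just a sketch, assuming monthly shards named my_data_2023_<MM>; the destination table and the month column are illustrative names:

from google.cloud import bigquery

client = bigquery.Client()

# Consolidate the monthly shards into one date-partitioned table
migration_sql = """
CREATE TABLE `your-project.your_dataset.my_data`
PARTITION BY month AS
SELECT
  PARSE_DATE('%Y_%m', CONCAT('2023_', _TABLE_SUFFIX)) AS month,
  *
FROM `your-project.your_dataset.my_data_2023_*`
"""
client.query(migration_sql).result()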

The Solution: Union THEM ALL!

Enter the BigQuery Information Schema:

The BigQuery INFORMATION_SCHEMA views are read-only, system-defined views that provide metadata about your BigQuery objects.


We can use the TABLES view to dynamically generate a list of all tables matching our pattern (e.g., my_data_2023_*), and then combine a per-table query for each of them with UNION ALL.

Here's a sample using Python:

from google.cloud import bigquery

client = bigquery.Client()

# Specify the dataset and wildcard pattern
dataset_id = "your-project.your_dataset"
wildcard_pattern = "my_data_2023_"

# Query the INFORMATION_SCHEMA to get matching table names
query = f"""
    SELECT table_name
    FROM `{dataset_id}.INFORMATION_SCHEMA.TABLES`
    WHERE table_name LIKE '{wildcard_pattern}%'
"""

rows = list(client.query(query))

if not rows:
    raise SystemExit("No tables matched the wildcard pattern")

# __create_sql builds the SELECT statement for a single table (see below)
view_query = __create_sql(rows[0]["table_name"])
for row in rows[1:]:
    view_query = f"""
    {view_query}
    UNION ALL
    {__create_sql(row['table_name'])}
"""

I omitted the __create_sql function; it is just logic that builds a (potentially complex) SELECT statement from each table name.
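For illustration, here is a minimal sketch of what such a helper could look like; the SELECT body is a placeholder, my real one was more involved:

def __create_sql(table_name: str) -> str:
    # Hypothetical per-table SELECT; replace with your own transformation logic
    return f"SELECT * FROM `your-project.your_dataset.{table_name}`"

With the generated SQL you can then create a BigQuery view: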

# Fully qualified destination for the view; the name is just an example
table_ref = f"{dataset_id}.my_data_2023_unioned"
view = bigquery.Table(table_ref)
view.view_query = view_query
client.create_table(view, exists_ok=True)
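Because the view unions fully qualified table names instead of using a wildcard, queries against it are eligible for BigQuery's query cache again. A quick way to verify, reusing the table_ref from above:

sql = f"SELECT COUNT(*) FROM `{table_ref}`"

for _ in range(2):
    job = client.query(sql)
    job.result()
    print(job.cache_hit)  # typically False on the first run, True on the second

Just remember to recreate the view whenever new tables matching the pattern show up.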

Hope that helps, cheers!
