(Co-written & Automated by Antigravity)
Let’s talk about one of the most tedious and, let’s be honest, least sexy parts of data engineering: schema drift.
Keeping your data warehouse tables (like BigQuery) perfectly in sync with your BI layer (like Looker) is a struggle every data team knows too well. Under mounting business demands, warehouse updates get pushed while the corresponding LookML changes linger in the backlog—piling up tech debt.
But what if we could bring sexy back to data engineering by keeping your BigQuery tables and Looker models perfectly, and automatically, synced? Let’s look at how to say “bye bye bye” to manual updates using Google Cloud and Context Engineering.
The Architecture: A Choreographed Dance
The goal is to automate LookML updates whenever BigQuery schemas change. Here’s the big picture of how we choreographed this workflow:
Scheduling: Airflow in Cloud Composer schedules the workflow and triggers multiple Cloud Run microservices.
Schema Extraction: A BigQuery schema service pulls the current table schemas on demand. It’s built to be generalizable, so other teams across the business can use it.
State Comparison: We pull the current LookML state, push both schemas to Google Cloud Storage (GCS), and calculate the diffs with a Python script.
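As a rough sketch, the diff step can be as simple as comparing two name-to-type mappings. The function and field names below are illustrative, not the production script:

```python
# Minimal sketch of the diff step: compare a BigQuery schema snapshot
# against the dimensions currently declared in a LookML view file.
# Names and types here are hypothetical examples.

def diff_schemas(bq_fields: dict[str, str], lookml_dims: dict[str, str]) -> dict:
    """Return dimensions to add, delete, or retype for one table.

    bq_fields:   {column_name: type} pulled from BigQuery
    lookml_dims: {dimension_name: type} parsed from the .view file
    """
    added = sorted(set(bq_fields) - set(lookml_dims))
    removed = sorted(set(lookml_dims) - set(bq_fields))
    changed = sorted(
        name for name in set(bq_fields) & set(lookml_dims)
        if bq_fields[name] != lookml_dims[name]
    )
    return {"add": added, "delete": removed, "retype": changed}

diffs = diff_schemas(
    {"order_id": "STRING", "amount": "NUMERIC", "created_at": "TIMESTAMP"},
    {"order_id": "STRING", "amount": "FLOAT64"},
)
```

The resulting diff dictionaries are what get handed to the next stage as the list of changes each view file needs.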
The Magic: Context Engineering with Gemini
This is where the magic happens. A Cloud Run service uses Gemini in Vertex AI to perform the actual LookML updates. It takes the BigQuery schema, the current LookML, and our company context as inputs.
Of course, with any AI system, the prompt is crucial. We tell Gemini exactly what to do, adding as much determinism as possible.
The Do’s: Add or delete dimensions based on the BigQuery table, and keep fields in alphabetical order.
The Don’ts (Strict Avoid List): Don’t add comments, don’t move existing measures, and don’t re-order anything unless there’s a schema change. Basically, don’t add any flair. Keep to the strict choreography.
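A hedged sketch of how such a prompt might be assembled. The production prompt’s exact wording isn’t shown here, and the real one also carries company context; the rules below simply mirror the do’s and don’ts above:

```python
# Illustrative prompt assembly for the Gemini LookML service.
# The rule text mirrors the do's/don'ts described in the post;
# everything else is a stand-in for the production prompt.

DOS = [
    "Add or delete dimensions so the view matches the BigQuery table.",
    "Keep fields in alphabetical order.",
]
DONTS = [
    "Do not add comments.",
    "Do not move existing measures.",
    "Do not re-order anything unless a schema change requires it.",
]

def build_prompt(bq_schema: str, current_lookml: str, company_context: str) -> str:
    rules = "\n".join(f"- {r}" for r in DOS)
    avoid = "\n".join(f"- {r}" for r in DONTS)
    return (
        "You update LookML view files to match BigQuery schemas.\n\n"
        f"Company context:\n{company_context}\n\n"
        f"BigQuery schema:\n{bq_schema}\n\n"
        f"Current LookML:\n{current_lookml}\n\n"
        f"Rules:\n{rules}\n\n"
        f"Strict avoid list:\n{avoid}\n\n"
        "Return only the updated LookML file."
    )
```

Spelling out the avoid list as its own labeled section, rather than mixing it into the instructions, is what keeps the diffs surgical.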
Gemini then generates the updated LookML files and synthesizes all those technical changes into a cohesive markdown summary for the GitHub PR description, grouped by table so engineers can review them at a glance.
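For illustration, the shape of that grouped summary could look like this — a hand-rolled stand-in, since in the pipeline Gemini itself writes the markdown:

```python
# Hypothetical renderer showing the shape of the PR description:
# a markdown document with one section per table, each listing
# its changes as bullets. In production, Gemini generates this.

def summarize_changes(changes_by_table: dict[str, list[str]]) -> str:
    """Render a markdown PR description, one section per table."""
    lines = ["## Automated LookML sync", ""]
    for table, changes in sorted(changes_by_table.items()):
        lines.append(f"### `{table}`")
        lines.extend(f"- {c}" for c in changes)
        lines.append("")
    return "\n".join(lines)
```

Grouping by table means a reviewer can scan straight to the views they own.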
Validating and Pushing the Code
Once Gemini writes the updated code, our sync service steps in:
It clones the Looker GitHub repository.
It validates the new LookML using the Looker API.
If everything looks good, it creates a Pull Request and alerts the team via Slack.
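The gate itself reduces to simple logic. This sketch stubs out the Looker API call and the Slack webhook, keeping only the decision and the message payload (the PR URL is a placeholder):

```python
# Sketch of the sync service's gate: only announce a PR when LookML
# validation passes; otherwise surface the errors. The validator and
# Slack delivery are stand-ins for the Looker API and a webhook.

def handle_validation(errors: list[str], pr_url: str) -> dict:
    """Build the Slack payload (or a failure report) from validation results."""
    if errors:
        text = "LookML validation failed:\n" + "\n".join(f"• {e}" for e in errors)
        return {"ok": False, "slack": {"text": text}}
    return {
        "ok": True,
        "slack": {"text": f"Automated LookML PR ready for review: {pr_url}"},
    }
```

Putting the validation results in the same Slack message as the PR link is what makes the alert actionable rather than just noisy.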
Overcoming Networking Challenges
Creating the GitHub PR was actually the most technically difficult part of the puzzle because the GitHub instance was behind a VPN. Since Cloud Run uses a dynamic rotating pool of IP addresses, we needed a consistent way to communicate with GitHub.
The Solution: We routed Cloud Run egress through a Virtual Private Cloud (VPC) and configured a Cloud NAT gateway with a reserved static external IP address. This ensured all Cloud Run outbound traffic used the same fixed IP address, which could then be allowlisted by the VPN.
The Result: Trust and Automation
Automation is nothing without trust. The LookML validation results are included directly in the Slack alert alongside the PR link, so the team knows exactly when a new automated PR is ready for review.
By automating this process, we’ve eliminated manual Looker updates and schema drift. Gemini writes in the team’s specific style, engineers review PRs instead of hunting through schemas, and the self-service Explore experience for users remains seamless, leading to more trust and confidence in the data.
Stop babysitting your schemas. Let Gemini be your LookML backup dancer.
Thanks for reading Data Bytes!
Tags: #DataEngineering #GoogleCloud #BigQuery #Looker #Gemini #VertexAI #Automation #AI

Top comments (1)
The strict avoidance list in the Gemini prompt — "don't add comments, don't move measures, don't reorder anything unless there's a schema change" — is the kind of constraint that sounds minor but is probably doing most of the heavy lifting for trust. It's not just about keeping the output clean. It's about making the diff reviewable.
When an AI modifies LookML, the worst outcome isn't a wrong change. It's a correct change wrapped in forty cosmetic reorderings and three new "helpful" comments, because now the human reviewer has to read every line to separate signal from noise. The avoidance list solves that by turning the model into a surgeon rather than a decorator. The PR becomes "here's exactly what changed and nothing else," which is what you need when schema syncs are supposed to be routine, not dramatic.
What I wonder is how that constraint set ages. The first version is easy: you know what annoys you about manual edits, so you forbid it. But over time, as the team's LookML conventions evolve, the avoidance list itself becomes a maintenance surface. Someone has to remember to update it when a new "helpful" behavior emerges that the model starts doing on its own. I'm curious if the team has hit that yet — where the prompt needs its own versioning — or if the strict choreography metaphor has held steady so far.