DEV Community

leo gu

I Tried to Analyze SQL Lineage Across 15 Databases — Everything Broke Until I Did This

The problem nobody talks about

If you’ve ever worked with SQL at scale, you’ve probably run into this:

  • Queries spanning multiple schemas
  • dbt models referencing each other
  • Views built on top of views
  • Different SQL dialects (Snowflake, BigQuery, Spark…)

And then someone asks:

“Where does this column actually come from?”

At that moment, everything falls apart.

Not because the answer doesn’t exist, but because your tools can’t give it to you.


I tried to solve it the “normal” way

I went through the usual stack:

  • dbt lineage graphs
  • Database-native tools
  • SQL IDEs like DataGrip and DBeaver

They work… until they don’t.

Here’s where things break:

  • ❌ Cross-database lineage? Forget it
  • ❌ Offline analysis? Not possible
  • ❌ Large SQL projects? Slow or incomplete
  • ❌ Raw SQL parsing? Surprisingly fragile

Especially when you mix:

  • Snowflake + dbt
  • Spark SQL + Hive
  • BigQuery + custom scripts

So I ran an experiment

I wanted to see how bad it really is.

So I tested SQL lineage across:

  • 10+ SQL dialects
  • dbt projects with hundreds of models
  • Real-world open-source repositories

Including:

  • dbt projects (400+ models)
  • Spark / Hive SQL codebases
  • Data warehouse examples across multiple vendors

The goal was simple:

Can I reliably trace column-level lineage across all of them?

Short answer:

None of the tools I tried handled all of it well.


The workaround that actually worked

Instead of relying on cloud tools or database engines, I tried something different:

👉 Analyze SQL locally, directly inside VS Code

That’s where this comes in:

👉 gudu sql omni (VS Code extension)

It’s essentially:

A local, offline SQL lineage engine that supports multiple databases


What makes it different?

Here’s what stood out immediately:

1. Works across multiple SQL dialects

Not just one database.

It handled:

  • Snowflake
  • BigQuery
  • Spark SQL
  • Hive
  • Redshift
  • Databricks

…in a single workflow.


2. Fully offline

No:

  • uploading SQL
  • connecting to cloud services
  • worrying about sensitive data

Everything runs locally.


3. Actually parses complex SQL

Including:

  • nested queries
  • CTE chains
  • dbt-style transformations
  • multi-layer views

This is where most tools fail.
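
To see why CTE chains trip up simple approaches, here is a minimal Python sketch. It is a deliberately naive regex-based extractor, not the extension's implementation, and the table names are hypothetical. The point it illustrates: a name after FROM or JOIN is only a real source table if it is not a CTE defined in the same statement.

```python
import re

# Toy patterns. They ignore subqueries in FROM, quoted identifiers, comments,
# and expressions such as EXTRACT(year FROM col), which a real parser must handle.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)
CTE_DEF = re.compile(r"(?:\bwith|,)\s*([A-Za-z_]\w*)\s+as\s*\(", re.IGNORECASE)


def source_tables(sql: str) -> set[str]:
    """Return referenced tables, excluding names defined as CTEs in the statement."""
    cte_names = {name.lower() for name in CTE_DEF.findall(sql)}
    referenced = {name.lower() for name in TABLE_REF.findall(sql)}
    return referenced - cte_names


# Hypothetical query with a two-step CTE chain.
sql = """
WITH orders_clean AS (
    SELECT order_id, customer_id, amount FROM raw.orders
),
customer_totals AS (
    SELECT customer_id, SUM(amount) AS total FROM orders_clean GROUP BY customer_id
)
SELECT c.name, t.total
FROM customer_totals t
JOIN crm.customers c ON c.id = t.customer_id
"""

print(sorted(source_tables(sql)))  # ['crm.customers', 'raw.orders']
```

Without the CTE filter, orders_clean and customer_totals would show up as if they were physical tables. A regex like this falls over quickly on real SQL, which is exactly why a proper parser matters.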


What it looks like in practice

Inside VS Code:

  • Open a SQL file (or a project)
  • Run lineage analysis
  • Instantly get:

👉 table-level lineage

👉 column-level lineage

👉 dependency graph

No setup. No infra.
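
Conceptually, column-level lineage is just a mapping from each derived column to the columns it is computed from. Here is a minimal Python sketch of answering “where does this column actually come from?” by walking that mapping back through layered views. The table and column names are hypothetical, and this is an illustration of the idea rather than the extension's actual output format.

```python
# Hypothetical column-level lineage edges: derived column -> columns it is built from.
# Columns with no entry are treated as raw sources.
COLUMN_LINEAGE = {
    "reporting.revenue_by_region.revenue": ["analytics.orders_enriched.net_amount"],
    "analytics.orders_enriched.net_amount": [
        "staging.orders.amount",
        "staging.orders.discount",
    ],
    "staging.orders.amount": ["raw.orders.amount"],
    "staging.orders.discount": ["raw.orders.discount"],
}


def root_sources(column, lineage, _seen=None):
    """Follow lineage edges upstream and return the set of raw source columns."""
    seen = set() if _seen is None else _seen
    if column in seen:  # already visited (guards against cycles and repeats)
        return set()
    seen.add(column)

    parents = lineage.get(column, [])
    if not parents:  # nothing upstream: this is a source column
        return {column}

    roots = set()
    for parent in parents:
        roots |= root_sources(parent, lineage, seen)
    return roots


print(sorted(root_sources("reporting.revenue_by_region.revenue", COLUMN_LINEAGE)))
# ['raw.orders.amount', 'raw.orders.discount']
```

The hard part in practice is building that mapping from raw SQL across dialects. Once you have it, tracing a column is a simple graph walk.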


Real use case: dbt projects

This is where things get interesting.

dbt already provides lineage, but:

  • It’s tied to the dbt ecosystem
  • It requires a working dbt setup
  • It’s not always flexible for raw SQL

With a local parser:

  • You can analyze dbt SQL without the dbt runtime
  • You can inspect edge cases dbt doesn’t visualize well
  • You can debug transformations faster
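
As a rough idea of what “without the dbt runtime” can mean, here is a minimal Python sketch that pulls model dependencies straight out of a dbt model's SQL text by matching {{ ref(...) }} calls. It is a simplification with hypothetical model names: it handles the one-argument and two-argument (package, model) forms of ref with string literals only, and it ignores macros, source(), versioned refs, and anything generated dynamically by Jinja.

```python
import re

# Matches {{ ref('model') }} and {{ ref('package', 'model') }} with string literals.
REF_CALL = re.compile(
    r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*(?:,\s*['\"]([^'\"]+)['\"]\s*)?\)\s*\}\}"
)


def dbt_refs(model_sql):
    """Return upstream model names referenced via ref(), in order of first appearance."""
    refs = []
    for first, second in REF_CALL.findall(model_sql):
        # In the two-argument form, the model name is the second argument.
        name = second or first
        if name not in refs:
            refs.append(name)
    return refs


# Hypothetical dbt model body.
model_sql = """
SELECT o.order_id, c.region
FROM {{ ref('stg_orders') }} AS o
JOIN {{ ref("stg_customers") }} AS c ON c.customer_id = o.customer_id
"""

print(dbt_refs(model_sql))  # ['stg_orders', 'stg_customers']
```

This only gives you model-level dependencies. Column-level lineage still needs the SQL itself to be parsed after the Jinja has been resolved or stubbed out.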

Where this approach wins

After testing across multiple datasets, this approach works best when:

  • You have mixed SQL environments
  • You need offline analysis
  • You deal with large SQL codebases
  • You want fast iteration inside your editor

Where it still needs improvement

To be fair:

  • It’s not a full replacement for dbt
  • Visualization can still improve
  • Edge cases exist (as with any parser)

But as a developer tool, it fills a gap that’s been ignored for years.


Final thoughts

SQL lineage shouldn’t be this hard.

And yet:

  • Most tools are tied to one ecosystem
  • Or require heavy infrastructure
  • Or simply break on real-world SQL

What surprised me most is this:

A lightweight, local approach actually works better in many cases.


Try it yourself

If you work with SQL seriously, it’s worth testing:

👉 https://marketplace.visualstudio.com/items?itemName=gudusoftware.gudu-sql-omni

Takes less than a minute to install.


I’m curious

If you’ve struggled with SQL lineage before:

  • What tools did you try?
  • Where did they fail?

I’d love to compare notes.
