The problem nobody talks about
If you’ve ever worked with SQL at scale, you’ve probably run into this:
- Queries spanning multiple schemas
- dbt models referencing each other
- Views built on top of views
- Different SQL dialects (Snowflake, BigQuery, Spark…)
And then someone asks:
“Where does this column actually come from?”
At that moment, everything falls apart.
Not because the answer doesn’t exist —
but because your tools can’t give it to you.
I tried to solve it the “normal” way
I went through the usual stack:
- dbt lineage graphs
- Database-native tools
- SQL IDEs like DataGrip and DBeaver
They work… until they don’t.
Here’s where things break:
- ❌ Cross-database lineage? Forget it
- ❌ Offline analysis? Not possible
- ❌ Large SQL projects? Slow or incomplete
- ❌ Raw SQL parsing? Surprisingly fragile
Especially when you mix:
- Snowflake + dbt
- Spark SQL + Hive
- BigQuery + custom scripts
So I ran an experiment
I wanted to see how bad it really is.
So I tested SQL lineage across:
- 10+ SQL dialects
- dbt projects with hundreds of models
- Real-world open-source repositories
Including:
- dbt projects (400+ models)
- Spark / Hive SQL codebases
- Data warehouse examples across multiple vendors
The goal was simple:
Can I reliably trace column-level lineage across all of them?
Short answer:
None of the tools I tried handled all of it well.
The workaround that actually worked
Instead of relying on cloud tools or database engines, I tried something different:
👉 Analyze SQL locally, directly inside VS Code
That’s where this comes in:
👉 Gudu SQL Omni (VS Code extension)
It’s essentially:
A local, offline SQL lineage engine that supports multiple databases
What makes it different?
Here’s what stood out immediately:
1. Works across multiple SQL dialects
Not just one database.
It handled:
- Snowflake
- BigQuery
- Spark SQL
- Hive
- Redshift
- Databricks
…in a single workflow.
2. Fully offline
No:
- uploading SQL
- connecting to cloud services
- worrying about sensitive data leaving your machine
Everything runs locally.
3. Actually parses complex SQL
Including:
- nested queries
- CTE chains
- dbt-style transformations
- multi-layer views
This is where most tools fail.
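To make “column-level lineage through multiple layers” concrete, here’s a minimal Python sketch. It is not how the extension works internally; it assumes the per-column dependencies have already been extracted from the SQL, and the table names, column names, and the `trace_to_sources` helper are all hypothetical. It only shows the last step: following a column back through stacked views or CTEs until you hit base columns.

```python
# Hypothetical, hand-written lineage map: each derived column lists the
# columns it is computed from. A real tool would build this by parsing SQL.
LINEAGE = {
    "report.revenue": ["orders_clean.amount", "orders_clean.discount"],
    "orders_clean.amount": ["stg_orders.amount"],
    "orders_clean.discount": ["stg_orders.discount"],
    "stg_orders.amount": ["raw.orders.amount"],
    "stg_orders.discount": ["raw.orders.discount"],
}


def trace_to_sources(column, lineage, _seen=None):
    """Return the set of base columns that `column` ultimately derives from."""
    seen = set() if _seen is None else _seen
    if column in seen:
        # Already visited (shared ancestor or a cycle in a malformed map).
        return set()
    seen.add(column)

    parents = lineage.get(column)
    if not parents:
        # No recorded parents: treat this as a base (source) column.
        return {column}

    sources = set()
    for parent in parents:
        sources |= trace_to_sources(parent, lineage, seen)
    return sources


print(sorted(trace_to_sources("report.revenue", LINEAGE)))
# ['raw.orders.amount', 'raw.orders.discount']
```

The hard part in practice is building that map from real SQL across dialects, which is exactly where parsers tend to break.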
What it looks like in practice
Inside VS Code:
- Open a SQL file (or a project)
- Run lineage analysis
- Instantly get:
👉 table-level lineage
👉 column-level lineage
👉 dependency graph
No setup. No infra.
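Those three outputs are closely related. As a rough illustration (again, my own sketch with made-up table and column names, not the extension’s output format), table-level lineage can be derived by collapsing column-level edges, and the dependency graph then gives you a valid build order. This assumes Python 3.9+ for `graphlib`.

```python
from collections import defaultdict
from graphlib import TopologicalSorter

# Hypothetical column-level edges: (source column, target column).
COLUMN_EDGES = [
    ("raw_orders.amount", "stg_orders.amount"),
    ("raw_orders.discount", "stg_orders.discount"),
    ("stg_orders.amount", "report.revenue"),
    ("stg_orders.discount", "report.revenue"),
    ("raw_customers.region", "report.region"),
]


def table_of(column):
    """Strip the trailing column name: 'stg_orders.amount' -> 'stg_orders'."""
    return column.rsplit(".", 1)[0]


def table_lineage(column_edges):
    """Map each target table to the set of tables it reads from."""
    deps = defaultdict(set)
    for source, target in column_edges:
        src_table, dst_table = table_of(source), table_of(target)
        if src_table != dst_table:
            deps[dst_table].add(src_table)
    return dict(deps)


deps = table_lineage(COLUMN_EDGES)

# TopologicalSorter expects {node: predecessors}, so upstream tables come first.
build_order = list(TopologicalSorter(deps).static_order())

print(deps)
print(build_order)
```

Tables with no dependency on each other (here, the two raw tables) can appear in either order.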
Real use case: dbt projects
This is where things get interesting.
dbt already provides lineage, but:
- It’s tied to the dbt ecosystem
- It requires a working dbt setup
- It’s not always flexible for raw SQL
With a local parser:
- You can analyze dbt SQL without the dbt runtime
- You can inspect edge cases dbt doesn’t visualize well
- You can debug transformations faster
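To show what “without the dbt runtime” can mean, here’s a small sketch that pulls model dependencies straight out of a model’s source text by looking for dbt’s `ref()` calls. The regex, the `extract_refs` helper, and the example model are my own illustration and only cover the simple one-argument form; I’m not claiming this is how the extension handles dbt projects.

```python
import re

# Matches {{ ref('model') }} and {{ ref("model") }}. The two-argument form
# ref('package', 'model') is deliberately not handled in this sketch.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def extract_refs(model_sql):
    """Return the model names referenced via ref(), in first-seen order."""
    seen = []
    for name in REF_PATTERN.findall(model_sql):
        if name not in seen:
            seen.append(name)
    return seen


# Hypothetical dbt model, for illustration only.
EXAMPLE_MODEL = """
select o.order_id, c.region
from {{ ref('stg_orders') }} as o
join {{ ref("stg_customers") }} as c
  on o.customer_id = c.customer_id
"""

print(extract_refs(EXAMPLE_MODEL))
# ['stg_orders', 'stg_customers']
```

This only gets you model-level dependencies. Column-level lineage still needs a real SQL parser on top of it.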
Where this approach wins
After testing across multiple datasets, this approach works best when:
- You have mixed SQL environments
- You need offline analysis
- You deal with large SQL codebases
- You want fast iteration inside your editor
Where it still needs improvement
To be fair:
- It’s not a full replacement for dbt
- Visualization can still improve
- Edge cases exist (as with any parser)
But as a developer tool, it fills a gap that’s been ignored for years.
Final thoughts
SQL lineage shouldn’t be this hard.
And yet:
- Most tools are tied to one ecosystem
- Or require heavy infrastructure
- Or simply break on real-world SQL
What surprised me most is this:
A lightweight, local approach actually works better in many cases.
Try it yourself
If you work with SQL seriously, it’s worth testing:
👉 https://marketplace.visualstudio.com/items?itemName=gudusoftware.gudu-sql-omni
Takes less than a minute to install.
I’m curious
If you’ve struggled with SQL lineage before:
- What tools did you try?
- Where did they fail?
I’d love to compare notes.



