<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: leo gu</title>
    <description>The latest articles on DEV Community by leo gu (@leo_gu_475098bd7f84e326ed).</description>
    <link>https://dev.to/leo_gu_475098bd7f84e326ed</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3860420%2Fb7608389-37d9-434b-ba0d-327f36b160ee.png</url>
      <title>DEV Community: leo gu</title>
      <link>https://dev.to/leo_gu_475098bd7f84e326ed</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/leo_gu_475098bd7f84e326ed"/>
    <language>en</language>
    <item>
      <title>I Tried to Analyze SQL Lineage Across 15 Databases — Everything Broke Until I Did This</title>
      <dc:creator>leo gu</dc:creator>
      <pubDate>Sat, 04 Apr 2026 04:31:00 +0000</pubDate>
      <link>https://dev.to/leo_gu_475098bd7f84e326ed/i-tried-to-analyze-sql-lineage-across-15-databases-everything-broke-until-i-did-this-m5j</link>
      <guid>https://dev.to/leo_gu_475098bd7f84e326ed/i-tried-to-analyze-sql-lineage-across-15-databases-everything-broke-until-i-did-this-m5j</guid>
      <description>&lt;h2&gt;
  
  
  The problem nobody talks about
&lt;/h2&gt;

&lt;p&gt;If you’ve ever worked with SQL at scale, you’ve probably run into this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Queries spanning multiple schemas
&lt;/li&gt;
&lt;li&gt;dbt models referencing each other
&lt;/li&gt;
&lt;li&gt;Views built on top of views
&lt;/li&gt;
&lt;li&gt;Different SQL dialects (Snowflake, BigQuery, Spark…)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And then someone asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Where does this column actually come from?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At that moment, everything falls apart.&lt;/p&gt;

&lt;p&gt;Not because the answer doesn’t exist, but because &lt;strong&gt;your tools can’t give it to you&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  I tried to solve it the “normal” way
&lt;/h2&gt;

&lt;p&gt;I went through the usual stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dbt lineage graphs
&lt;/li&gt;
&lt;li&gt;Database-native tools
&lt;/li&gt;
&lt;li&gt;SQL IDEs like DataGrip and DBeaver
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They work… until they don’t.&lt;/p&gt;

&lt;p&gt;Here’s where things break:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;❌ Cross-database lineage? Forget it
&lt;/li&gt;
&lt;li&gt;❌ Offline analysis? Not possible
&lt;/li&gt;
&lt;li&gt;❌ Large SQL projects? Slow or incomplete
&lt;/li&gt;
&lt;li&gt;❌ Raw SQL parsing? Surprisingly fragile
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Especially when you mix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Snowflake + dbt
&lt;/li&gt;
&lt;li&gt;Spark SQL + Hive
&lt;/li&gt;
&lt;li&gt;BigQuery + custom scripts
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  So I ran an experiment
&lt;/h2&gt;

&lt;p&gt;I wanted to see how bad it really is.&lt;/p&gt;

&lt;p&gt;So I tested SQL lineage across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10+ SQL dialects
&lt;/li&gt;
&lt;li&gt;dbt projects with hundreds of models
&lt;/li&gt;
&lt;li&gt;Real-world open-source repositories
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dbt projects (400+ models)&lt;/li&gt;
&lt;li&gt;Spark / Hive SQL codebases&lt;/li&gt;
&lt;li&gt;Data warehouse examples across multiple vendors
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal was simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Can I reliably trace column-level lineage across all of them?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Short answer:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No existing tool handled all of it well.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The workaround that actually worked
&lt;/h2&gt;

&lt;p&gt;Instead of relying on cloud tools or database engines, I tried something different:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Analyze SQL locally, directly inside VS Code&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That’s where this comes in:&lt;/p&gt;

&lt;p&gt;👉 gudu sql omni (VS Code extension)&lt;/p&gt;

&lt;p&gt;It’s essentially:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A local, offline SQL lineage engine that supports multiple databases&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What makes it different?
&lt;/h2&gt;

&lt;p&gt;Here’s what stood out immediately:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Works across multiple SQL dialects
&lt;/h3&gt;

&lt;p&gt;Not just one database.&lt;/p&gt;

&lt;p&gt;It handled:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Snowflake
&lt;/li&gt;
&lt;li&gt;BigQuery
&lt;/li&gt;
&lt;li&gt;Spark SQL
&lt;/li&gt;
&lt;li&gt;Hive
&lt;/li&gt;
&lt;li&gt;Redshift
&lt;/li&gt;
&lt;li&gt;Databricks
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…in a single workflow.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. Fully offline
&lt;/h3&gt;

&lt;p&gt;No:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;uploading SQL
&lt;/li&gt;
&lt;li&gt;connecting to cloud services
&lt;/li&gt;
&lt;li&gt;worrying about sensitive data
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything runs locally.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. Actually parses complex SQL
&lt;/h3&gt;

&lt;p&gt;Including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;nested queries
&lt;/li&gt;
&lt;li&gt;CTE chains
&lt;/li&gt;
&lt;li&gt;dbt-style transformations
&lt;/li&gt;
&lt;li&gt;multi-layer views
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where most tools fail.&lt;/p&gt;
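&lt;p&gt;To make “column-level lineage through a CTE chain” concrete, here is a minimal Python sketch. It is not how the extension works internally, and it does no SQL parsing at all: the relation and column names are hypothetical, and the mapping is written by hand to stand in for what a parser would extract. It only shows the final step, walking from an output column back to the base-table columns that feed it.&lt;/p&gt;

```python
# Toy column-level lineage: each derived column lists the columns it is built from.
# All names are hypothetical. The chain modelled here is:
#   raw_orders (table), then cleaned (CTE), then daily (CTE), then report (final SELECT)
LINEAGE = {
    ("cleaned", "amount_usd"): [("raw_orders", "amount"), ("raw_orders", "fx_rate")],
    ("cleaned", "order_day"): [("raw_orders", "created_at")],
    ("daily", "revenue"): [("cleaned", "amount_usd")],
    ("daily", "order_day"): [("cleaned", "order_day")],
    ("report", "revenue"): [("daily", "revenue")],
}


def source_columns(column, lineage=LINEAGE, seen=None):
    """Return the set of base-table columns that feed the given (relation, column)."""
    seen = set() if seen is None else seen
    if column in seen:  # guard against accidental cycles in the mapping
        return set()
    seen.add(column)
    parents = lineage.get(column)
    if not parents:  # no recorded parents: treat it as a base-table column
        return {column}
    result = set()
    for parent in parents:
        result |= source_columns(parent, lineage, seen)
    return result


print(sorted(source_columns(("report", "revenue"))))
```

&lt;p&gt;Asking where &lt;code&gt;report.revenue&lt;/code&gt; comes from walks back through both CTEs and lands on &lt;code&gt;raw_orders.amount&lt;/code&gt; and &lt;code&gt;raw_orders.fx_rate&lt;/code&gt;. The hard part in practice is building that mapping reliably from real SQL in every dialect, which is exactly where parsers tend to break.&lt;/p&gt;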




&lt;h2&gt;
  
  
  What it looks like in practice
&lt;/h2&gt;

&lt;p&gt;Inside VS Code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open a SQL file (or a project)&lt;/li&gt;
&lt;li&gt;Run lineage analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You instantly get:&lt;/p&gt;

&lt;p&gt;👉 table-level lineage  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbok5p3dy053y29idk54f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbok5p3dy053y29idk54f.png" alt=" " width="800" height="283"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;👉 column-level lineage  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foen689m1b01nj1apvo9t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foen689m1b01nj1apvo9t.png" alt=" " width="800" height="487"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;👉 dependency graph  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffhw82xbarp1rgkb3tnyb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffhw82xbarp1rgkb3tnyb.png" alt=" " width="800" height="386"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;No setup beyond installing the extension. No infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqjyikutokasng5x6tsxl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqjyikutokasng5x6tsxl.png" alt=" " width="491" height="549"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Real use case: dbt projects
&lt;/h2&gt;

&lt;p&gt;This is where things get interesting.&lt;/p&gt;

&lt;p&gt;dbt already provides lineage — but:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It’s tied to the dbt ecosystem
&lt;/li&gt;
&lt;li&gt;It requires a working dbt setup
&lt;/li&gt;
&lt;li&gt;It isn’t always flexible enough for raw SQL
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With a local parser:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can analyze dbt SQL without the dbt runtime
&lt;/li&gt;
&lt;li&gt;You can inspect edge cases dbt doesn’t visualize well
&lt;/li&gt;
&lt;li&gt;You can debug transformations faster
&lt;/li&gt;
&lt;/ul&gt;
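&lt;p&gt;As a rough illustration of analyzing dbt SQL without the dbt runtime, the sketch below pulls model-level dependencies out of raw model text by matching &lt;code&gt;ref()&lt;/code&gt; calls with a regular expression. This is an assumption-laden toy, not the extension’s approach: the model names and SQL are made up, only the one-argument form of &lt;code&gt;ref()&lt;/code&gt; is handled, and macros or other Jinja logic are ignored.&lt;/p&gt;

```python
import re

# Matches {{ ref('model_name') }} with single or double quotes.
# Only the one-argument form is handled; other forms of ref() are ignored.
REF_PATTERN = re.compile(r"""\{\{\s*ref\(\s*['"]([^'"]+)['"]\s*\)\s*\}\}""")


def model_dependencies(models):
    """Map each model name to the sorted list of models it references via ref()."""
    return {name: sorted(set(REF_PATTERN.findall(sql))) for name, sql in models.items()}


# Hypothetical model files, inlined as strings to keep the example self-contained.
MODELS = {
    "stg_orders": "select id, customer_id, amount from raw.orders",
    "stg_customers": "select id, name from raw.customers",
    "fct_revenue": (
        "select c.name, sum(o.amount) as revenue "
        "from {{ ref('stg_orders') }} o "
        'join {{ ref("stg_customers") }} c on o.customer_id = c.id '
        "group by c.name"
    ),
}

for model, deps in model_dependencies(MODELS).items():
    print(model, "depends on", deps)
```

&lt;p&gt;This only recovers table-level edges between models. Column-level lineage still needs a real SQL parser on top of it, but even this much works on a plain checkout of a dbt repository with nothing installed.&lt;/p&gt;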




&lt;h2&gt;
  
  
  Where this approach wins
&lt;/h2&gt;

&lt;p&gt;After testing across multiple datasets, this approach works best when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have mixed SQL environments
&lt;/li&gt;
&lt;li&gt;You need offline analysis
&lt;/li&gt;
&lt;li&gt;You deal with large SQL codebases
&lt;/li&gt;
&lt;li&gt;You want fast iteration inside your editor
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Where it still needs improvement
&lt;/h2&gt;

&lt;p&gt;To be fair:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It’s not a full replacement for dbt
&lt;/li&gt;
&lt;li&gt;Visualization can still improve
&lt;/li&gt;
&lt;li&gt;Edge cases exist (as with any parser)
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But as a &lt;strong&gt;developer tool&lt;/strong&gt;, it fills a gap that’s been ignored for years.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;SQL lineage shouldn’t be this hard.&lt;/p&gt;

&lt;p&gt;And yet:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Most tools are tied to one ecosystem
&lt;/li&gt;
&lt;li&gt;Or require heavy infrastructure
&lt;/li&gt;
&lt;li&gt;Or simply break on real-world SQL
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What surprised me most is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A lightweight, local approach actually works better in many cases.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;If you work with SQL seriously, it’s worth testing:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://marketplace.visualstudio.com/items?itemName=gudusoftware.gudu-sql-omni" rel="noopener noreferrer"&gt;https://marketplace.visualstudio.com/items?itemName=gudusoftware.gudu-sql-omni&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Takes less than a minute to install.&lt;/p&gt;




&lt;h2&gt;
  
  
  I’m curious
&lt;/h2&gt;

&lt;p&gt;If you’ve struggled with SQL lineage before:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What tools did you try?&lt;/li&gt;
&lt;li&gt;Where did they fail?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’d love to compare notes.&lt;/p&gt;

</description>
      <category>database</category>
      <category>sql</category>
    </item>
  </channel>
</rss>
