Niclas Olofsson

Copy-paste is not a workflow: building dbt-core-mcp

Three tools. Three windows. Clipboard gymnastics for breakfast, lunch, and dinner. I snapped and built something that actually works.

About this article
I wrote this. The ideas are mine; the execution is collaborative.


I write SQL transformations for a living. Medallion architecture, source to final data product, the whole pipeline. It's what data engineers do. And if you do it professionally, the SQL gets complex fast. CTEs stacked on CTEs, proper structure, because the alternative is unmaintainable spaghetti that no one can debug six months later.

Here's the problem: I can't actually develop those transformations in VS Code.

The tooling is primitive. SQL syntax highlighting barely works. Mix in Jinja (which we have to do for dbt) and it falls apart completely. Command completion? Forget it. Compared to what a C# or Python developer has at their fingertips, SQL in an IDE feels like we're still in 2005. dbt Fusion promises a better future, but we're not there yet.

So I develop transformations in a SQL editor. Databricks in my case. That's where I can actually run queries, test CTEs, see results, iterate. I write the transformation there until it works, then I copy it back to my dbt model file in VS Code and manually re-add all the dbt syntax: the ref() calls, the source() references, the Jinja. Hope I didn't break something in translation. Test it with dbt run. Find out I broke something. Repeat.
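Concretely, the translation looks something like this (table and model names are hypothetical): in Databricks I test against physical tables, and back in VS Code every one of them has to become a source() or ref() call again.

```sql
-- What I test in the Databricks SQL editor (hypothetical names):
with orders as (
    select * from erp_bronze.order_headers
)
select customer_id, sum(amount) as total_amount
from orders
group by customer_id;

-- The same query after manually re-adding the dbt syntax:
with orders as (
    select * from {{ source('erp_bronze', 'order_headers') }}
)
select customer_id, sum(amount) as total_amount
from orders
group by customer_id
```

Every round trip means doing that swap by hand, in both directions, without missing a reference.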

I'm developing in two places at once, manually translating between them.

And that's before AI enters the picture. Copilot is actually really good at SQL. It understands dbt syntax, helps review transformations, suggests improvements. So naturally I use it. Which means now I'm shuttling context between three places: VS Code for the model file, my SQL editor for testing, and Copilot for help. Copy this, paste that, translate back, repeat.

It's exhausting. And it's slow.


Why every existing tool is wrong

Let's talk about your options if you want AI help with dbt in VS Code. Spoiler: they all suck in different ways.

Power User for dbt? 388,000 installs, impressive feature list, but the AI parts need an Altimate API key. Which means your schema, your SQL, your metadata—all living on their servers. Datamates says "local-first" in the marketing but funny story: you still need an account at their SaaS platform and they still upload your "metadata" (schema, SQL, task summaries). Turns out "local-first" has a flexible definition.

And Power User itself has environment problems of its own: it tries to install its own dbt environment that conflicts with yours, and it doesn't respect your adapter setup. Not great when you're already juggling version compatibility.

Here's the thing that drives me insane: they don't actually need your data on their servers. This isn't a technical requirement. It's an architecture decision. They built everything around their SaaS platform because that's their business model. Meanwhile, GitHub Copilot already has my code in the editor. There's zero technical reason another vendor needs my proprietary schemas living on their infrastructure.

The official dbt extension from dbt Labs? Actually looks promising—proper IntelliSense and everything. But it needs Fusion, which is still in beta and not production-ready. So that's future, not now.

And here's the kicker: none of them solve the actual problem. I'm still writing transformations in Databricks, then copy-pasting back to VS Code and manually re-adding all the dbt syntax. The tools just gave me a fourth window to juggle.


What should have been obvious

Here's what I'm not willing to compromise on (and frankly, shouldn't have to):

No copy-paste. That's the core requirement. I shouldn't be developing a transformation in one tool and copying it to another. I shouldn't be copying SQL to test it, or copying results back for analysis. Copy-paste means I'm manually shuttling information between disconnected tools. That's not a workflow—that's duct tape.

Syntax highlighting that actually works. SQL mixed with Jinja shouldn't break the editor. I need to see the structure of my queries—CTEs, joins, subqueries—at a glance. This is baseline functionality that's been standard in IDEs for decades.

IntelliSense for dbt. When I type ref(', I should see a list of available models. When I reference a column, the editor should know if it exists. When I change a model's schema, the editor should tell me what breaks downstream. This is how C# developers have worked since Visual Studio existed. There's no reason dbt can't have the same.

Integrated documentation. When I'm using a dbt function, adapter-specific syntax, or just need to remember how window functions work in SQL—I shouldn't have to context-switch to a browser. Whether it's Databricks SQL functions or standard SQL syntax, the IDE should show me what I need, when I need it.

One place to develop. I should be able to write a transformation and test it in the same environment. Not develop in Databricks, then copy it to VS Code and manually re-add all the dbt syntax. The iteration loop—write, test, refine—should happen in one place.

My data stays mine. I work with proprietary business logic. Schema definitions that represent months of modeling decisions. Transformations that encode competitive advantages. None of that belongs on a third-party vendor's servers just because they built their architecture around cloud uploads. A proper IDE works with my code locally.

Use what I already have. I have dbt installed. I have adapters configured. I have virtual environments. Whatever IDE tooling I use should work with my setup, not force me to maintain a parallel dbt installation with different versions and configurations.

These aren't luxury features. This is the foundation of software development that's existed for 30 years. dbt work deserves the same.


Then help arrived (thank god)

Those fundamentals I just listed? Syntax highlighting, IntelliSense, integrated docs—that's table stakes. The baseline we've had for decades in other languages. dbt is finally getting there with Fusion.

But here's what nobody saw coming:

AI assistance. Not the "autocomplete on steroids" kind. The "I understand what you're trying to do and can actually help you do it" kind. A pair programmer who doesn't get tired, doesn't need coffee breaks, and has instant access to documentation you'd spend 20 minutes searching for.

And here's the thing: it changes what we need from the IDE. I still need syntax highlighting and IntelliSense—I'm responsible for the code, I need to review it, own it, understand it. But the workflow shifts. The conversation becomes the anchor. I work through talking to the AI. The AI works with the IDE.

When it works right, you forget you're even working with AI—it just feels like the IDE finally understands what you're doing.

But there's a problem. My shiny new AI colleague can read my code. Suggest improvements. Write SQL. But when it comes to dbt? It's useless. Can't run anything. Can't query models. Can't check dependencies. It's stuck giving advice and waiting for me to be its hands. "You should run dbt list to see what's affected." Thanks, Copilot. Really helpful. Let me just switch to my terminal again.

So the AI can see everything but do nothing. It's a consultant, not a coworker.

That's the gap. And that's what I fixed.


What it actually looks like when it works

I'm troubleshooting a complex transformation. Medallion architecture, bronze through gold. The final mart model has twelve CTEs stacked on each other, and somewhere in that stack the numbers are wrong. I ask Copilot, "Test the intermediate aggregation CTE, just that fragment." It extracts that CTE and all its dependencies (just what's needed, nothing more) and executes it against the warehouse, shows me the results. The bug is in the join logic. I fix it. "Test it again." It does. Numbers look right now.
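Conceptually, the extracted fragment is just the target CTE plus the upstream CTEs it depends on, wrapped in a final select. A sketch of the idea (hypothetical names, not the tool's literal output):

```sql
-- The original model has twelve stacked CTEs. To test
-- intermediate_agg, only its upstream CTEs are kept.
with order_lines as (
    select * from {{ ref('stg_order_lines') }}
),

intermediate_agg as (
    select customer_id, sum(line_amount) as total_amount
    from order_lines
    group by customer_id
)

select * from intermediate_agg
limit 100
```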

[Screenshot: dbt-core-mcp in action. The conversation becomes the workflow: query execution, analysis, and iteration all in one place.]

I need to add a new field to a mart. That means tracing it back through silver, all the way to bronze, potentially adding a new source table we haven't ingested yet. I tell Copilot what I need. It helps me locate the data—we have hundreds of models, complex ERP structures with thousands of tables. It knows the ERP documentation. It suggests which source table to pull from. It helps me add the field through each layer, following whatever style that layer uses. We have legacy code. Different patterns in different areas. It adapts. The edits are almost flawless.

When I'm working with unfamiliar data (and in an ERP system, most data is unfamiliar), Copilot can query using ref() and source() syntax. It can use our macros. It understands the structure. It helps me discover what's actually in these tables, analyze it, figure out if it's what I need. It's like having a colleague who's already memorized the entire data warehouse.
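A typical discovery query it might run on my behalf (source and column names hypothetical), with the Jinja resolved through my own dbt project:

```sql
-- Hypothetical: profile an unfamiliar ERP source table
select
    order_type,
    count(*) as row_count,
    min(created_at) as first_seen,
    max(created_at) as last_seen
from {{ source('erp', 'order_headers') }}
group by order_type
order by row_count desc
```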

I'm validating a new transformation. "Analyze the quality of this output." It runs the standard aggregations we always do as data engineers. Checks distributions. Finds nulls where there shouldn't be any. Traces the issue back to the source. Suggests fixes.
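Those standard aggregations are the sanity checks every data engineer writes by hand. A sketch of the kind of query involved (model and column names hypothetical):

```sql
-- Hypothetical output-quality check on a new mart model
select
    count(*) as total_rows,
    count(distinct customer_id) as distinct_customers,
    sum(case when customer_id is null then 1 else 0 end) as null_customer_ids,
    min(order_date) as min_order_date,
    max(order_date) as max_order_date
from {{ ref('fct_orders') }}
```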

The model works, but it's slow. "How can we optimize this?" It reviews the SQL, suggests simplifications. Then it actually tests them. Runs the original. Runs the optimized version. Extracts query plans from both. Compares execution times. Shows me which approach is faster and why. It's not just advice—it's empirical.
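On Databricks, the empirical part boils down to compiling each variant through dbt, timing both, and comparing their physical plans. The plan side looks something like this (hypothetical model name; EXPLAIN FORMATTED is Spark SQL syntax):

```sql
-- Hypothetical: pull the physical plan for a candidate rewrite.
-- Run once per variant (original vs. optimized) and compare.
explain formatted
select customer_id, sum(amount) as total_amount
from {{ ref('fct_orders') }}
group by customer_id
```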

All of this happens in the conversation. Copilot executes. I review. I decide. I own the code.

This is what I meant by "no copy-paste." Not just avoiding clipboard gymnastics, but eliminating the entire pattern of manually shuttling context between disconnected tools. The AI has the powers it needs to actually help me work. Run models. Query results. Check dependencies. Understand impact. Analyze data. Trace issues.

And here's the part that should go without saying but apparently doesn't: all of this happens locally. dbt-core-mcp calls my dbt CLI. Uses my warehouse connection. Reads my manifest. Copilot sees the results, but my schema and data never leave my environment. No API keys to third-party vendors. No accounts. No uploads. No "metadata" living on someone else's servers.

Just my tools. Doing the work they're supposed to do.

This is flow development. The IDE understands dbt. The AI can execute, not just advise. I state intent, it handles mechanics. The conversation is the workflow.

It is my code. I supervise. I decide. I own it. But I'm no longer doing the AI's legwork. And the AI is no longer stuck waiting for me to be its hands.


Get started

NiclasOlofsson / dbt-core-mcp

dbt Core MCP Server: Interact with dbt projects via Model Context Protocol

dbt Core MCP Server

Install in VS Code · Install in VS Code Insiders · License: MIT · Python 3.9+ · dbt 1.9.0+

Meet your new dbt pair programmer - the one who actually understands your environment, respects your workflow, and does the heavy lifting.

Why This Changes Everything

If you've tried other dbt tools with Copilot (dbt power user, datamate, etc.), you know the pain:

  • They don't respect your Python environment
  • They can't see your actual project structure
  • They fail when adapters are missing from THEIR environment
  • You end up doing the work yourself anyway

dbt-core-mcp is different. It's not just another plugin - it's a true pair programming partner that:

  • Zero dbt Dependencies: Our server needs NO dbt-core, NO adapters - works with YOUR environment
  • Stays in Flow: Keep the conversation going with Copilot while it handles dbt commands, runs tests, and analyzes impact
  • Respects Your Environment: Detects and uses YOUR exact dbt version, YOUR adapter, YOUR Python setup (uv, poetry, venv, conda)

MIT licensed. Works with your existing dbt installation. Click the install buttons in the repo, point it at your dbt project, and you're running. No accounts. No uploads. Just dbt, but now your AI can actually help you use it.
