<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Matthieu</title>
    <description>The latest articles on DEV Community by Matthieu (@mattxchir).</description>
    <link>https://dev.to/mattxchir</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3770932%2Fdeadcfe3-7fe1-4a40-997f-2542a930a8f0.png</url>
      <title>DEV Community: Matthieu</title>
      <link>https://dev.to/mattxchir</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mattxchir"/>
    <language>en</language>
    <item>
      <title>I don’t hate SQL. I hate metadata friction.</title>
      <dc:creator>Matthieu</dc:creator>
      <pubDate>Sun, 15 Feb 2026 07:03:08 +0000</pubDate>
      <link>https://dev.to/mattxchir/i-dont-hate-sql-i-hate-metadata-friction-5g01</link>
      <guid>https://dev.to/mattxchir/i-dont-hate-sql-i-hate-metadata-friction-5g01</guid>
<description>&lt;p&gt;I don’t struggle with writing SQL. What I struggle with is everything around it. You know the drill:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open BigQuery console.&lt;/li&gt;
&lt;li&gt;Write a simple query on top of INFORMATION_SCHEMA.&lt;/li&gt;
&lt;li&gt;Realize you forgot the partition column.&lt;/li&gt;
&lt;li&gt;Copy an old query.&lt;/li&gt;
&lt;li&gt;Tweak it.&lt;/li&gt;
&lt;li&gt;Realize you need another column.&lt;/li&gt;
&lt;li&gt;Open docs.&lt;/li&gt;
&lt;li&gt;Switch to Airflow to check if you can get the answer faster.&lt;/li&gt;
&lt;li&gt;Go back to the BigQuery console.&lt;/li&gt;
&lt;li&gt;Forget what you were checking.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of this is hard. It’s just constant, low-level friction repeated every other day.&lt;/p&gt;

&lt;p&gt;So I built something small to reduce that friction.&lt;/p&gt;

&lt;h2&gt;The problem&lt;/h2&gt;

&lt;p&gt;I regularly ask questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which tables changed schema recently?&lt;/li&gt;
&lt;li&gt;What jobs are consuming the most slots?&lt;/li&gt;
&lt;li&gt;Why was BigQuery slow yesterday?&lt;/li&gt;
&lt;li&gt;Why are BigQuery costs skyrocketing?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;BigQuery exposes many answers through INFORMATION_SCHEMA.&lt;/p&gt;

&lt;p&gt;But the queries are rarely trivial. They’re long. They require joins. You need to remember field names. And I constantly find myself going back to the docs for small details.&lt;/p&gt;
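
&lt;p&gt;For a sense of the friction: a question like “which jobs burned the most slots last week?” turns into something like the sketch below (using the google-cloud-bigquery client and the &lt;code&gt;region-us&lt;/code&gt; JOBS view; the region, filters and thresholds are placeholders).&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Sketch: "top slot consumers over the last 7 days", straight against
# INFORMATION_SCHEMA. Region, project and thresholds are assumptions.
from google.cloud import bigquery

SQL = """
SELECT
  user_email,
  job_id,
  TIMESTAMP_DIFF(end_time, start_time, SECOND) AS runtime_s,
  total_slot_ms / 1000 / 60                    AS slot_minutes,
  total_bytes_processed / POW(1024, 3)         AS gb_processed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time &gt;= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)  -- partition filter
  AND job_type = 'QUERY'
  AND state = 'DONE'
ORDER BY total_slot_ms DESC
LIMIT 20
"""

client = bigquery.Client()
for row in client.query(SQL).result():
    print(row.user_email, row.job_id, row.slot_minutes)
&lt;/code&gt;&lt;/pre&gt;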

&lt;p&gt;So I started modeling the metadata in dbt.&lt;/p&gt;

&lt;p&gt;I created a few summarized tables for jobs, storage, schema changes, dependencies, and slot usage.&lt;/p&gt;

&lt;p&gt;Nothing revolutionary. Just structured the data the way I actually use it. That helped, but I was still writing the same kinds of queries repeatedly.&lt;/p&gt;

&lt;h2&gt;So I threw an LLM at it&lt;/h2&gt;

&lt;p&gt;Once I had clean dbt models with decent descriptions, the next step felt obvious.&lt;/p&gt;

&lt;p&gt;Instead of writing SQL, what if I just asked questions? Most BI tools advertise Text-to-SQL solutions for business users. Why wouldn’t it work for engineers too?&lt;/p&gt;

&lt;p&gt;Here’s what the current setup looks like:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;dbt models summarize BigQuery metadata.&lt;/li&gt;
&lt;li&gt;The data is synced into DuckDB (to avoid scanning BigQuery every time, and because I don't trust the LLMs).&lt;/li&gt;
&lt;li&gt;I pass the schema + column descriptions to an LLM.&lt;/li&gt;
&lt;li&gt;The LLM generates SQL.&lt;/li&gt;
&lt;li&gt;The SQL runs against DuckDB.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Results are shown in a small terminal UI. That’s it.&lt;/p&gt;
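
&lt;p&gt;Stripped of error handling and the TUI, the core loop is small. Here’s a minimal sketch, assuming a local DuckDB file with the synced summaries; &lt;code&gt;ask_llm&lt;/code&gt; is a placeholder for whichever model client you use, and the schema notes would normally come straight from the dbt descriptions.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Minimal sketch of the question-to-SQL-to-DuckDB loop.
# ask_llm() is a placeholder; job_summary and its columns are illustrative.
import duckdb

SCHEMA_NOTES = """
job_summary(run_date DATE, model_name TEXT, runs INT,
            slot_hours DOUBLE, max_runtime_min DOUBLE)
-- one row per day and transformation, built from INFORMATION_SCHEMA.JOBS
"""

def ask_llm(prompt):
    raise NotImplementedError("plug in your LLM client here")

def answer(question, db_path="metadata.duckdb"):
    prompt = (
        "You write DuckDB SQL only. Schema:\n"
        + SCHEMA_NOTES
        + "\nQuestion: " + question
        + "\nReturn a single SELECT statement."
    )
    sql = ask_llm(prompt)
    con = duckdb.connect(db_path, read_only=True)  # read-only: generated SQL can't mutate anything
    try:
        rows = con.execute(sql).fetchall()         # raw rows, shown next to any summary
    finally:
        con.close()
    return sql, rows
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Opening the database read-only and asking for a single SELECT are cheap guardrails; most of the quality comes from how good the table and column descriptions are.&lt;/p&gt;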

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flejlrq1equjdpujp25wf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flejlrq1equjdpujp25wf.png" alt="TUI console" width="800" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It’s basically a TUI where I can “chat” with my metadata.&lt;/p&gt;

&lt;h2&gt;A real example&lt;/h2&gt;

&lt;p&gt;One week, BigQuery was painfully slow. Slot contention seemed likely, probably from a new transformation deployed earlier that week.&lt;/p&gt;

&lt;p&gt;Instead of digging manually, I tried:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;why was bigquery slow this week?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It analyzed slot usage and long-running jobs and identified a new transformation model that was timing out after 6 hours.&lt;/p&gt;
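
&lt;p&gt;The generated SQL behind an answer like that is usually mundane. Roughly this shape, against the synced summary table (table and column names below are illustrative, not the exact ones it produced):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Roughly the shape of what gets generated for "why was bigquery slow this week?"
# job_summary and its columns are illustrative names.
import duckdb

SLOW_JOBS = """
SELECT run_date, model_name, max_runtime_min, slot_hours
FROM job_summary
WHERE run_date &gt;= current_date - INTERVAL 7 DAY
  AND max_runtime_min &gt; 60
ORDER BY max_runtime_min DESC
"""

con = duckdb.connect("metadata.duckdb", read_only=True)
for row in con.execute(SLOW_JOBS).fetchall():
    print(row)
con.close()
&lt;/code&gt;&lt;/pre&gt;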

&lt;p&gt;I checked with the team responsible. They said they’d fix it.&lt;/p&gt;

&lt;p&gt;A few days later, I asked:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Are the slot timeouts gone?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It confirmed that the long-running jobs had disappeared.&lt;/p&gt;

&lt;p&gt;That’s when I realized this was actually useful for first-pass investigations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4fl6dcgz54ijgvms0wxq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4fl6dcgz54ijgvms0wxq.png" alt="Analytics mode" width="800" height="490"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;It’s not perfect&lt;/h2&gt;

&lt;p&gt;There were issues immediately.&lt;/p&gt;

&lt;p&gt;First, long answers. Even simple questions triggered overly detailed reports. So I split it into two modes: Fast and Analytics.&lt;/p&gt;

&lt;p&gt;But the biggest challenge was hallucinations.&lt;/p&gt;

&lt;p&gt;BigQuery job IDs are UUID-like strings. Even when the query result was correct, the LLM sometimes invented job IDs in its explanation.&lt;/p&gt;

&lt;p&gt;That’s obviously not acceptable.&lt;/p&gt;

&lt;p&gt;I tried adding validation steps, but they quickly became too expensive in terms of tokens. Right now, I return raw query results along with summaries.&lt;/p&gt;
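
&lt;p&gt;The idea is that the model’s prose is commentary, and anything factual (job IDs, counts, durations) should be read from the untouched rows. A rough sketch of how answers are shaped, with made-up names:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Sketch: keep the LLM summary, but always attach the untouched rows
# so an invented job ID has nothing to hide behind.
from dataclasses import dataclass

@dataclass
class Answer:
    question: str
    sql: str        # the generated query, expandable in the TUI
    rows: list      # raw DuckDB results, never rewritten by the LLM
    summary: str    # LLM prose, treated as commentary only

def render(ans, max_rows=20):
    lines = [ans.summary, "", "Raw results:"]
    lines += [str(r) for r in ans.rows[:max_rows]]
    return "\n".join(lines)
&lt;/code&gt;&lt;/pre&gt;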

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3o3oqtsemrd5fwg7ck18.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3o3oqtsemrd5fwg7ck18.png" alt="Expand raw results from queries" width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I also hit an infinite loop where the LLM queried the data, interpreted it, then queried again, and so on. Luckily, I only had $5 in credits. I added strict usage limits after that.&lt;/p&gt;
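
&lt;p&gt;The limits are nothing clever: a hard cap on model round-trips per question, along these lines (the cap itself is arbitrary):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Sketch: hard cap on LLM round-trips per question, so a
# "query, interpret, query again" loop can't run away with the bill.
MAX_LLM_CALLS_PER_QUESTION = 4   # arbitrary; pick what you can afford to waste

def investigate(question, ask_llm, run_sql):
    context = question
    for _ in range(MAX_LLM_CALLS_PER_QUESTION):
        sql = ask_llm(context)
        if not sql.strip():              # model signals it's done
            break
        context += "\n\nResults:\n" + str(run_sql(sql))
    return context
&lt;/code&gt;&lt;/pre&gt;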

&lt;h2&gt;Why DuckDB?&lt;/h2&gt;

&lt;p&gt;Mainly cost and speed. I didn’t want accidental expensive scans or slow iteration loops.&lt;/p&gt;

&lt;p&gt;So I sync the relevant metadata into DuckDB and query that instead.&lt;/p&gt;
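
&lt;p&gt;The sync is deliberately boring. One way to do it (dataset and table names here are made up): pull each summary model and overwrite the local copy.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Sketch: refresh the local DuckDB copy of each dbt summary model.
# Dataset and table names are illustrative; assumes pandas is installed.
import duckdb
from google.cloud import bigquery

SUMMARY_TABLES = ["job_summary", "storage_summary", "schema_changes", "slot_usage"]

def sync(dataset="metadata_marts", db_path="metadata.duckdb"):
    bq = bigquery.Client()
    con = duckdb.connect(db_path)
    for table in SUMMARY_TABLES:
        df = bq.query(f"SELECT * FROM `{dataset}.{table}`").to_dataframe()
        con.register("incoming", df)                        # expose the DataFrame to DuckDB
        con.execute(f"CREATE OR REPLACE TABLE {table} AS SELECT * FROM incoming")
        con.unregister("incoming")
    con.close()
&lt;/code&gt;&lt;/pre&gt;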

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F279xum6zyq7smqx7tfm7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F279xum6zyq7smqx7tfm7.png" alt="TUI architecture" width="734" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It works well at this scale, though obviously you can’t mirror unlimited history.&lt;/p&gt;

&lt;h2&gt;What this tool actually is&lt;/h2&gt;

&lt;p&gt;This is not a BI tool or an observability tool. It’s just a faster way to ask operational questions without writing repetitive metadata queries.&lt;/p&gt;

&lt;p&gt;It can be used for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First-level investigations&lt;/li&gt;
&lt;li&gt;Sanity checks&lt;/li&gt;
&lt;li&gt;Avoiding copy-paste SQL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s it. I use it when I need quick answers.&lt;/p&gt;

&lt;p&gt;It doesn’t replace proper analysis. But it lowers the activation energy. And that alone makes it useful.&lt;/p&gt;

&lt;h2&gt;Open-sourcing?&lt;/h2&gt;

&lt;p&gt;I’m considering open-sourcing it. Right now, it works with BigQuery metadata, but technically, you could plug in any BigQuery dataset with proper table definitions.&lt;/p&gt;

&lt;p&gt;But before I invest more time in supporting more data engineering ops metadata, I’d love to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Would you use something like this?&lt;/li&gt;
&lt;li&gt;Does this solve a real annoyance for you?&lt;/li&gt;
&lt;li&gt;What would immediately make it unusable?&lt;/li&gt;
&lt;li&gt;What would make it indispensable?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If there’s interest, I’ll clean up the repo and open-source it.&lt;/p&gt;

&lt;p&gt;I’m trying to validate whether this is just my personal itch or something broader.&lt;/p&gt;

&lt;p&gt;Honest feedback (including brutal feedback) is welcome.&lt;/p&gt;

</description>
      <category>bigquery</category>
      <category>ai</category>
      <category>automation</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
