<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Zeeshan Ghazanfar</title>
    <description>The latest articles on DEV Community by Zeeshan Ghazanfar (@zeeshan_ghazanfar_97).</description>
    <link>https://dev.to/zeeshan_ghazanfar_97</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3899937%2F9fc5658f-ba59-47bb-bb31-b4cbd93de8b0.png</url>
      <title>DEV Community: Zeeshan Ghazanfar</title>
      <link>https://dev.to/zeeshan_ghazanfar_97</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/zeeshan_ghazanfar_97"/>
    <language>en</language>
    <item>
      <title>We measured 72% 91% accuracy on Text-to-SQL over a 600-table ERP - what actually fixed it</title>
      <dc:creator>Zeeshan Ghazanfar</dc:creator>
      <pubDate>Mon, 27 Apr 2026 11:45:17 +0000</pubDate>
      <link>https://dev.to/zeeshan_ghazanfar_97/we-measured-72-91-accuracy-on-text-to-sql-over-a-600-table-erp-what-actually-fixed-it-4a29</link>
      <guid>https://dev.to/zeeshan_ghazanfar_97/we-measured-72-91-accuracy-on-text-to-sql-over-a-600-table-erp-what-actually-fixed-it-4a29</guid>
      <description>&lt;p&gt;We measured 72% → 91% accuracy on Text-to-SQL over a 600-table ERP - what actually fixed it&lt;/p&gt;

&lt;p&gt;We deployed a Text-to-SQL agent on a legacy ERP with 612 tables, ~8,400 columns, and inconsistent naming across modules. Initial offline eval looked acceptable. Production said otherwise.&lt;/p&gt;

&lt;p&gt;Baseline&lt;/p&gt;

&lt;p&gt;Model: GPT-4 class via API&lt;br&gt;
Context: full schema dump + user query&lt;br&gt;
Prompt: standard "generate SQL from question"&lt;br&gt;
Eval set: 220 business questions from analysts&lt;/p&gt;

&lt;p&gt;Results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exact match accuracy: 72%&lt;/li&gt;
&lt;li&gt;Execution success: 81%&lt;/li&gt;
&lt;li&gt;Latency: ~4.8s avg&lt;/li&gt;
&lt;li&gt;Production failure rate (week 1): 31% required human correction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What failed&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Schema overload&lt;br&gt;
600+ tables in context diluted attention&lt;br&gt;
Wrong joins between similarly named tables like orders vs order_hdr vs order_archive&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Semantic mismatch&lt;br&gt;
Business language did not match schema&lt;br&gt;
Example: "revenue" mapped to total_amount instead of net_sales&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Join path ambiguity&lt;br&gt;
Multiple valid joins existed&lt;br&gt;
Model picked shortest path, not correct one&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Silent logical errors&lt;br&gt;
Queries executed but returned wrong numbers&lt;br&gt;
Hardest to detect&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What moved accuracy to 91%&lt;/p&gt;

&lt;p&gt;This was not fixed with prompting alone. We changed the system.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Schema retrieval layer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Instead of passing full schema:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Embedded table names, column names, sample values&lt;/li&gt;
&lt;li&gt;Retrieved top 8–15 relevant tables per query&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Impact:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accuracy: 72% → 84%&lt;/li&gt;
&lt;li&gt;Latency reduced by ~1.2s&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Join graph constraints&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Built join graph from DB metadata + query logs&lt;br&gt;
Forced model to only use valid relationships&lt;/p&gt;

&lt;p&gt;Impact:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accuracy: 84% → 88%&lt;/li&gt;
&lt;li&gt;~60% reduction in incorrect joins&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Business term mapping&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Added translation layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;revenue → net_sales&lt;/li&gt;
&lt;li&gt;customer → client_id&lt;/li&gt;
&lt;li&gt;orders → sales_order_hdr&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hybrid approach (rules + embeddings) worked best&lt;/p&gt;

&lt;p&gt;Impact:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accuracy: 88% → 90%&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;SQL self-check and retry&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Second pass:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Validate generated SQL against schema and intent&lt;/li&gt;
&lt;li&gt;Allow one retry&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Impact:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accuracy: 90% → 91%&lt;/li&gt;
&lt;li&gt;Execution success: 81% → 93%&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Evaluation change&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Switched to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;exact match OR result equivalence&lt;/li&gt;
&lt;li&gt;added real production edge cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Initial accuracy dropped, then stabilized with real signal&lt;/p&gt;

&lt;p&gt;What did not work&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Larger models: &amp;lt;2% gain&lt;/li&gt;
&lt;li&gt;Longer prompts: worse performance&lt;/li&gt;
&lt;li&gt;Few-shot examples: inconsistent results&lt;/li&gt;
&lt;li&gt;Free schema exploration: more hallucinated joins&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Final production metrics&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accuracy: 91%&lt;/li&gt;
&lt;li&gt;Execution success: 93%&lt;/li&gt;
&lt;li&gt;Avg latency: 3.6s&lt;/li&gt;
&lt;li&gt;Human intervention: 31% → 9%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Still unsolved&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex financial aggregations&lt;/li&gt;
&lt;li&gt;Time-based comparisons across inconsistent date fields&lt;/li&gt;
&lt;li&gt;Implicit business rules not present in DB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bottom line&lt;/p&gt;

&lt;p&gt;Text-to-SQL at enterprise scale is not a prompt problem.&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieval&lt;/li&gt;
&lt;li&gt;constraints&lt;/li&gt;
&lt;li&gt;evaluation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without controlling schema exposure and join paths, accuracy plateaus early.&lt;/p&gt;

</description>
      <category>erp</category>
      <category>talktodb</category>
      <category>database</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
