<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Coco</title>
    <description>The latest articles on DEV Community by Coco (@random_coco).</description>
    <link>https://dev.to/random_coco</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3929034%2F13d2894a-75f9-4c11-bf13-9ac3aa9fcd05.jpg</url>
      <title>DEV Community: Coco</title>
      <link>https://dev.to/random_coco</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/random_coco"/>
    <language>en</language>
    <item>
      <title>In some scenarios, relying on LLMs to generate SQL is neither rigorous nor reliable.Right way to teach LLMs to generate SQL is here</title>
      <dc:creator>Coco</dc:creator>
      <pubDate>Tue, 19 May 2026 07:24:05 +0000</pubDate>
      <link>https://dev.to/random_coco/in-some-scenarios-relying-on-llms-to-generate-sql-is-neither-rigorous-nor-reliable-13c6</link>
      <guid>https://dev.to/random_coco/in-some-scenarios-relying-on-llms-to-generate-sql-is-neither-rigorous-nor-reliable-13c6</guid>
      <description>&lt;p&gt;Over the past two years, tools that generate SQL from natural language have become very popular.&lt;br&gt;&lt;br&gt;
Type a question in plain English, and an LLM produces SQL for you. It's convenient.&lt;/p&gt;

&lt;p&gt;But from my experience on real‑world projects, there is a clear problem when business requirements demand &lt;strong&gt;high accuracy&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This isn't because LLMs are weak. It's because of how they work – &lt;strong&gt;non‑deterministically&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three practical issues with LLM‑generated SQL
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Same question, different answers
&lt;/h3&gt;

&lt;p&gt;The same natural language description can lead to different SQL statements. Edge cases may be handled inconsistently.&lt;br&gt;&lt;br&gt;
For financial reports, contract calculations, or any scenario where numbers must be exact, this is unacceptable.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Lack of explainability
&lt;/h3&gt;

&lt;p&gt;If the SQL is wrong, it's very hard to know why the LLM chose a particular table, column, or join condition.&lt;br&gt;&lt;br&gt;
Debugging becomes guesswork.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Security and cost
&lt;/h3&gt;

&lt;p&gt;Every query incurs API costs. And there have been real RCE vulnerabilities caused by LLM‑generated malicious code (for example, in older versions of Vanna).&lt;br&gt;&lt;br&gt;
Running such systems in production requires extra safeguards.&lt;/p&gt;

&lt;h2&gt;
  
  
  My approach: deterministic graph pathfinding
&lt;/h2&gt;

&lt;p&gt;I'm not against LLMs.&lt;br&gt;&lt;br&gt;
In fact, I think they are great for &lt;strong&gt;intent understanding&lt;/strong&gt; and &lt;strong&gt;structured configuration generation&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
But I don't think they should generate SQL directly – especially for complex multi‑table queries.&lt;/p&gt;

&lt;p&gt;So I tried a different approach: turning SQL generation from &lt;strong&gt;probabilistic reasoning&lt;/strong&gt; into &lt;strong&gt;deterministic graph pathfinding&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;How it works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model your database tables and relationships as a &lt;strong&gt;weighted directed graph&lt;/strong&gt; (tables as nodes, possible joins as edges). This graph is defined at deployment time – call it the “queryable graph”.&lt;/li&gt;
&lt;li&gt;Users only need to describe &lt;strong&gt;which fields and filters they want&lt;/strong&gt; using a simple &lt;strong&gt;JSON DSL&lt;/strong&gt; – no need to write any JOINs.&lt;/li&gt;
&lt;li&gt;The engine starts from the tables specified by the user, &lt;strong&gt;statically prunes&lt;/strong&gt; the queryable graph to a relevant subgraph, then uses &lt;strong&gt;Dijkstra's algorithm&lt;/strong&gt; to dynamically find the shortest join path, automatically filling in intermediate tables.&lt;/li&gt;
&lt;li&gt;The final SQL is &lt;strong&gt;100% deterministic&lt;/strong&gt; – same input, same SQL, every time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I built this into an open source tool called &lt;strong&gt;Lexipathos&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two‑level pruning: both flexible and controllable
&lt;/h2&gt;

&lt;p&gt;The key is the &lt;strong&gt;two‑level graph design&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;At deployment time&lt;/strong&gt;: define &lt;code&gt;resourceDictionary&lt;/code&gt; based on your business ER diagram (which tables, field labels, relationships). This defines the upper bound of what the engine can query.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;At runtime&lt;/strong&gt;: the user specifies the tables they care about via &lt;code&gt;dataSources&lt;/code&gt; (e.g., only “contracts” and “buildings”). The engine routes &lt;strong&gt;only within that subgraph&lt;/strong&gt;, automatically filling in intermediate tables (e.g., “contract_units”, “units”).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As a result, each query only touches a closed subgraph of 3–6 tables. The full database graph is never dragged into a single query, which avoids cycles and ambiguity.&lt;/p&gt;

&lt;h2&gt;
  
  
  A quick example
&lt;/h2&gt;

&lt;p&gt;Imagine an office leasing system with tables: &lt;code&gt;contracts&lt;/code&gt;, &lt;code&gt;units&lt;/code&gt;, &lt;code&gt;buildings&lt;/code&gt;, &lt;code&gt;tenants&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You want to query: for all active contracts, show tenant name, unit area, and building address.&lt;/p&gt;

&lt;p&gt;You only need to write this JSON:&lt;/p&gt;

&lt;p&gt;`&lt;code&gt;json&lt;br&gt;
{&lt;br&gt;
  "dataSources": [&lt;br&gt;
    {"table": "contracts"},&lt;br&gt;
    {"table": "buildings"}&lt;br&gt;
  ],&lt;br&gt;
  "rowDims": [&lt;br&gt;
    "tenants.name",&lt;br&gt;
    "units.lease_area",&lt;br&gt;
    "buildings.address"&lt;br&gt;
  ],&lt;br&gt;
  "filters": [&lt;br&gt;
    {"field": "contracts.status", "op": "=", "value": "active"}&lt;br&gt;
  ]&lt;br&gt;
}&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The engine automatically discovers the join path: &lt;code&gt;contracts → contract_units → units → buildings&lt;/code&gt;, and &lt;code&gt;contracts → tenants&lt;/code&gt;, then generates the complete SQL including all necessary JOINs.&lt;/p&gt;

&lt;p&gt;You never write a JOIN again.&lt;/p&gt;

&lt;h2&gt;
  
  
  More than SQL generation: built‑in features
&lt;/h2&gt;

&lt;p&gt;Besides automatic JOIN inference and deterministic generation, Lexipathos includes several practical features for business analytics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Enum auto‑translation&lt;/strong&gt;: internal English codes (e.g., &lt;code&gt;"vacant"&lt;/code&gt;) are automatically translated to human‑readable text in responses, while filters still use the codes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two output formats&lt;/strong&gt;: returns both raw &lt;code&gt;rows&lt;/code&gt; and a flattened &lt;code&gt;flat&lt;/code&gt; structure, plus a pivot &lt;code&gt;matrix&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In‑memory computation&lt;/strong&gt;: supports executing JavaScript on the &lt;code&gt;flat&lt;/code&gt; array (e.g., grouping, ratios, ratings) without changing SQL or restarting the service.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Combined dimensions&lt;/strong&gt;: merge multiple fields into one column, supporting string concatenation or arithmetic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All these features are controlled via the JSON DSL – no code changes required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security note: in‑memory computation has risks
&lt;/h2&gt;

&lt;p&gt;To be honest, the &lt;code&gt;calcFn&lt;/code&gt; and &lt;code&gt;calcFnForMatrix&lt;/code&gt; design is &lt;strong&gt;very aggressive&lt;/strong&gt; – it uses &lt;code&gt;new Function()&lt;/code&gt; to dynamically execute JavaScript strings coming from the request. This introduces risks of code injection, infinite loops, and memory leaks.&lt;/p&gt;

&lt;p&gt;Therefore, &lt;strong&gt;this mechanism is currently only recommended for trusted internal network environments&lt;/strong&gt;. If exposed to the public internet, either disable in‑memory computation or move it to the client side.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pluggable business configuration
&lt;/h2&gt;

&lt;p&gt;Although the project uses “office leasing” as a demonstration case, the engine itself is completely decoupled from the business domain. Switching to another industry only requires changing three configuration files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;resourceDictionary.js&lt;/code&gt;: defines tables, field labels, and relationships&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;values_mapping.js&lt;/code&gt;: defines enum translation rules&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dataLoader.js&lt;/code&gt;: defines how to load data into DuckDB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No engine code needs to be changed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Not anti‑LLM, but AI‑friendly in a different way
&lt;/h2&gt;

&lt;p&gt;If you still want a natural language interface, you can let an LLM generate the JSON configuration, and then Lexipathos executes it deterministically.&lt;br&gt;&lt;br&gt;
This gives you the best of both worlds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flexible frontend: natural language → JSON&lt;/li&gt;
&lt;li&gt;Reliable backend: JSON → deterministic SQL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No more “the AI changed my query and broke the report”.&lt;/p&gt;

&lt;h2&gt;
  
  
  Here is my work's link.
&lt;/h2&gt;

&lt;p&gt;Lexipathos is open source, runs on DuckDB, and is written in JavaScript.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/cocosiu/lexipathos" rel="noopener noreferrer"&gt;cocosiu/lexipathos&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The best natural language interaction is to let an AI agent learn how to use this engine – narrowing down the query semantics and avoiding hallucinations. Lexipathos has already been integrated with OpenClaw. If you have thoughts or want more information, feel free to discuss.I will show you some examples&lt;/p&gt;

</description>
      <category>ai</category>
      <category>sql</category>
      <category>database</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I got tired of calculating commercial lease billing by hand, so I built a tool</title>
      <dc:creator>Coco</dc:creator>
      <pubDate>Wed, 13 May 2026 10:50:09 +0000</pubDate>
      <link>https://dev.to/random_coco/i-got-tired-of-calculating-commercial-lease-billing-by-hand-so-i-built-a-tool-4nij</link>
      <guid>https://dev.to/random_coco/i-got-tired-of-calculating-commercial-lease-billing-by-hand-so-i-built-a-tool-4nij</guid>
      <description>&lt;p&gt;I worked in commercial real estate. Not as a developer — as an operator.&lt;br&gt;
Every month, someone on the team had to sit down and manually calculate billing schedules. Every contract had a free rent period, a rent escalation clause, a non-standard start date. Usually all three. You’d open Excel, start calculating day counts, apply the escalation rate, handle the stub period at the beginning, handle the one at the end.&lt;br&gt;
It took hours. It was error-prone. And when you got it wrong, the tenant pushed back.&lt;br&gt;
I got tired of it. I knew how to code. So I built something.&lt;br&gt;
What the problem actually looks like&lt;br&gt;
A real contract:&lt;br&gt;
Lease starts March 15&lt;br&gt;
Free rent for the first 2 months&lt;br&gt;
From month 3: base rate kicks in&lt;br&gt;
From month 13: base rate × 1.05 (anniversary escalation)&lt;br&gt;
Billing anchor is the 15th of each month&lt;br&gt;
Now generate a clean monthly billing schedule for 3 years. Every period needs the right dates, the right applied rate, the right amount. The stub periods at the start and end need to be handled correctly. The free rent and escalation can overlap.&lt;br&gt;
This is not a hard algorithm problem. It’s a tedious, high-stakes edge case problem. Get it wrong and you have a legal dispute. Get it right and you’ve spent two hours on something that should be automated.&lt;br&gt;
@cosiu/periodix — an npm package that takes a lease contract and returns a fully split billing schedule.&lt;br&gt;
const ContractPeriodMonthSplitter = require('@cosiu/periodix');&lt;/p&gt;

&lt;p&gt;&lt;code&gt;const result = ContractPeriodMonthSplitter.splitContractPeriods({&lt;br&gt;
  startDate: '2024-03-15',&lt;br&gt;
  endDate: '2027-03-14',&lt;br&gt;
  pivotDate: '2024-03-15',&lt;br&gt;
  area: 500,&lt;br&gt;
  baseTotalRentRate: 10000,&lt;br&gt;
  serviceRate: 3,&lt;br&gt;
  freePeriods: [&lt;br&gt;
    { startDate: '2024-03-15', endDate: '2024-05-14' }&lt;br&gt;
  ],&lt;br&gt;
  increaseRules: [&lt;br&gt;
    {&lt;br&gt;
      type: 'ANNIVERSARY',&lt;br&gt;
      anchorDate: '2024-03-14',&lt;br&gt;
      rate: 0.05&lt;br&gt;
    }&lt;br&gt;
  ]&lt;br&gt;
});&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Each period in the output has exact dates, the applied escalation rate, rent and service fee amounts, and a flag for whether it’s a stub period or a full month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;try it:npm install @cosiu/periodix&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;GitHub: &lt;a href="https://github.com/cocosiu/periodix" rel="noopener noreferrer"&gt;https://github.com/cocosiu/periodix&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
If you’re building property management software and you’ve been solving this problem yourself, I’d be curious whether your approach is different. And if you find an edge case this doesn’t handle, open an issue.&lt;/p&gt;

</description>
      <category>realestate</category>
      <category>javascript</category>
      <category>npm</category>
      <category>proptech</category>
    </item>
  </channel>
</rss>
