Natural language to SQL gets expensive when the agent keeps solving the same problem from scratch.
In our Talk-to-DB layer, we added a semantic query cache:
text-embedding-3-large- 3072-dimensional embeddings
- pgvector cosine similarity
- top 5 similar cached queries injected into the system prompt
- table schemas stored with the cached SQL
- user feedback stored as
thumb_uporthumb_down
The failure mode was predictable.
A user would ask:
"Show revenue by product."
The agent would generate a working SQL query.
Next week, another user would ask:
"Which products made the most sales?"
Same intent. Same schema. But the model might regenerate a slightly different query, sometimes with the classic Odoo mistake: using product_product.list_price when the column actually lives on product_template.
The cache does not blindly replay SQL. It gives the agent prior working queries plus the table schemas used by those queries. If the request is close enough, the agent can adapt the known-good pattern instead of rediscovering joins.
For standardized Odoo databases, we also support a global cache because the schema shape is repeatable across deployments. For custom databases, the cache stays scoped to the datasource.
One honest caveat: similarity thresholds matter. Too low and irrelevant SQL leaks into the prompt. Too high and the agent misses useful prior work. We currently use per-instance config with top 5 candidates and a similarity threshold that needs tuning against real query logs.
This is the part of production AI people underestimate.
The first deployment is not the finish line. The system gets better when every query, failure, schema mismatch, and user correction becomes operating memory.
Top comments (0)