When I started working with Snowflake Cortex Analyst, I assumed the hard part would be getting the system to answer questions correctly.
It wasn't. The hard part was deciding which questions it shouldn't answer.
In this post I want to share two things that took more thought than I expected — verified queries and guardrails.
A Quick Overview of Cortex Analyst
Snowflake Cortex Analyst lets users ask questions in plain English and get answers from structured data. Under the hood, it uses a semantic model defined in YAML to understand the data and generate SQL responses.
There are two ways it can respond:
- Verified queries: pre-validated pairs of a question and the SQL that answers it, which you define in the semantic model
- LLM-generated SQL: the model generates SQL on its own when no verified query matches
The goal of a well-structured semantic model is to maximize verified query hits. The more questions route through verified queries, the more controlled and reliable your output.
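A minimal sketch of what one verified query entry might look like in the semantic model YAML. The field names follow the semantic model spec as commonly documented. The table, columns, reviewer, and timestamp are hypothetical, and the way the SQL refers to the logical table (here with a `__` prefix) is an assumption to check against the current Snowflake documentation.

```yaml
# Sketch only: table, columns, reviewer, and timestamp are hypothetical.
verified_queries:
  - name: monthly_revenue_by_region
    question: "What was total revenue by region last month?"
    # Assumption: the SQL references the logical table from the semantic
    # model (written here with a "__" prefix); verify against the docs.
    sql: |
      SELECT region, SUM(revenue) AS total_revenue
      FROM __orders
      WHERE order_date >= DATE_TRUNC('MONTH', DATEADD('MONTH', -1, CURRENT_DATE()))
        AND order_date < DATE_TRUNC('MONTH', CURRENT_DATE())
      GROUP BY region
    verified_at: 1714752000   # Unix timestamp in seconds (hypothetical)
    verified_by: "analytics-team"
```

The point of the entry is that both the question and the SQL have been reviewed by a person, so a matching question does not depend on the model writing the query from scratch.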
The Verified Queries Trade-off
My first instinct was to add as many verified query variations as possible — cover every way a user might ask the same question.
That backfired.
| Approach | Problem |
|---|---|
| Too few variations | Model misses valid questions, falls back to LLM generation |
| Too many variations | Introduces noise, wrong query gets matched |
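One way to picture the middle ground, as a sketch rather than a rule: keep a small number of entries per intent, and add a new phrasing only when it differs meaningfully from the ones already there. All names, questions, and SQL below are hypothetical, the `__orders` logical table is an assumption, and metadata fields such as `verified_at` are omitted for brevity.

```yaml
verified_queries:
  # Intent 1: revenue by region. Two phrasings with different wording,
  # same meaning, same SQL.
  - name: revenue_by_region
    question: "What is total revenue by region?"
    sql: |
      SELECT region, SUM(revenue) AS total_revenue
      FROM __orders
      GROUP BY region
  - name: revenue_by_region_sales_wording
    question: "How much did each region sell?"
    sql: |
      SELECT region, SUM(revenue) AS total_revenue
      FROM __orders
      GROUP BY region
  # Intent 2: order volume. A separate entry because the SQL differs.
  - name: order_count_by_region
    question: "How many orders did each region place?"
    sql: |
      SELECT region, COUNT(*) AS order_count
      FROM __orders
      GROUP BY region
```

Near-duplicate phrasings that map to different SQL are the ones most likely to produce a wrong match, so they deserve the closest review.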
The Guardrail Problem — Define What It Shouldn't Do
This is the part most people skip.
In data engineering we always plan for edge cases. I applied the same thinking here — users will assume this works like any AI tool and ask anything. You can't control that. So instead of trying to restrict users, I put the responsibility on the YAML.
Cortex Analyst has a question_categorization block where you explicitly define categories of questions the system should refuse. Here's a simplified example:
question_categorization:
  - category: unavailable_topics
    examples:
      - "What is the return rate by supplier?"
      - "Show me customer lifetime value"
  - category: greetings
    examples:
      - "Hey"
      - "Can you help me?"
  - category: forecast_or_prediction
    examples:
      - "What will sales look like next month?"
      - "Predict inventory needs for Q4"
  - category: ambiguous_queries
    examples:
      - "Show me something interesting"
      - "What should I look at?"
Without this block, the system will attempt to answer everything, including questions it has no business answering. Refusal doesn't happen by itself; you have to build it in.
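For comparison with the simplified example above, here is a hedged sketch of a free-text form. It assumes the semantic model spec accepts `question_categorization` as a key under `module_custom_instructions` and that it takes natural-language instructions rather than a list of categories. Treat both the placement and the wording as assumptions to verify against the current Snowflake documentation.

```yaml
# Assumption: question_categorization sits under module_custom_instructions
# and takes free-text instructions. The wording is illustrative only.
module_custom_instructions:
  question_categorization: |-
    Refuse questions about topics that are not in this model, such as
    supplier return rates or customer lifetime value, and say the data
    is unavailable.
    Refuse forecasting or prediction requests.
    If a question is too vague to map to a metric, ask the user to
    name the metric and time period they care about.
```

Whichever form the spec accepts, the refusal rules live in the semantic model itself, so they are versioned and reviewed together with the rest of the YAML.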
Summary
- Structure your semantic model to maximize verified query hits, not just expose data.
- Verified queries need enough variation to be useful, but too many create noise.
- Use question_categorization to explicitly define what the system should refuse.
- Think defensively from day one: don't wait for something to break in production.
Still early in this build, but these are the decisions I'm glad I made at the start rather than retrofitting later.

