Gowtham Potureddi

Posted on Jun 6

Databricks Genie (AI/BI): Text-to-SQL Spaces, Trust & Production Use Cases

#python #sql #dataengineering #interview

databricks genie is the chat-first surface every data leader thinks they understand — until the first board demo lands on the wrong number and the analytics team has to explain that the assistant cheerfully hallucinated a join across two fact tables that should never touch. Genie is not a thin GPT wrapper over the warehouse; it is an opinionated AI/BI product built around a semantic layer of curated datasets, plain-English instructions, and a corpus of verified sample queries — and the difference between a magical executive co-pilot and a trust-shattering toy is almost entirely a question of how that semantic layer is built and certified.

This guide is the manual every data engineer and BI lead wishes they had before pointing a Genie space at production. It walks the anatomy of an AI/BI Genie space, the six-stage text-to-sql flow under the hood, the four-label trust spectrum that governs which answers can face an exec, and the production rollout topology — dev → staging → certified — that makes the rollout safe. Each section pairs a teaching block with a Solution-Tail worked answer: code, a step-by-step trace, an output table, then a concept-by-concept breakdown of why it works.

When you want hands-on reps on the SQL that Genie itself drafts under the hood, drill the aggregation practice library →, rehearse on joins problems →, and stack the BI muscles with CTE and subquery drills →.

On this page

What Genie actually is — and what it is not
Anatomy of a Genie space
The text-to-SQL flow under the hood
The trust spectrum — verified, certified, hallucination
Production rollout topology and governance
Cheat sheet — Genie recipes
Frequently asked questions
Practice on PipeCode

1. What Genie actually is — and what it is not

`databricks genie` is a curated, governed AI/BI surface on top of Unity Catalog — not a thin chat-on-warehouse wrapper

The one-sentence invariant: a Databricks Genie space is a curated, permissioned, instruction-rich semantic context that a foundation model uses to draft auditable SQL against a Unity Catalog warehouse — every answer is constrained by what the space was certified to see, never by what the raw warehouse contains. Once you internalise that "Genie answers are only as good as the curated dataset plus instructions plus sample queries," the entire "why does the pilot keep producing wrong numbers?" debate collapses into a list of missing artefacts.

Where Genie sits in the Databricks platform.

AI/BI Genie — chat-first text-to-SQL surface. One Genie space = one domain (one star schema, one fact table, one set of metrics). Lives next to Dashboards and SQL Editor inside the Databricks workspace.
AI/BI Dashboards — the BI canvas: charts, filters, and parameter widgets. Static visualisation; Genie is the conversational counterpart that drafts the SQL behind each chart.
SQL Editor — the power-user IDE: hand-written SQL, query history, query profiles. Genie shares the same Serverless DBSQL warehouse but adds the semantic layer + LLM drafting layer on top.
Unity Catalog — the governance backbone. Every read Genie performs goes through UC row filters, column masks, and table ACLs. Genie cannot leak a row the asker is not permitted to see.

What Genie is not.

Not a generic GPT-on-SQL plugin. Generic plugins point an LLM at the raw schema and hope. Genie spaces add three first-class artefacts the LLM must use: certified datasets, instructions, and sample queries. The LLM is constrained to the curated context, not the entire metastore.
Not a replacement for dashboards. Dashboards are the durable, exec-facing artefact; Genie is the ad-hoc exploration surface that can promote frequent questions into dashboards. The two coexist.
Not a free pass on data modelling. A messy star schema with three "revenue" columns will produce three contradictory Genie answers. Curate the dataset first, point Genie at it second.

The semantic-layer mindset in one paragraph.

Genie is best understood as a semantic layer in plain English. The dataset says "these are the tables Genie can see"; the instructions say "in our business, revenue means net_revenue after returns"; the sample queries say "here is the SQL pattern we use for the top-10 customers question." When the foundation model drafts SQL, it retrieves all three and weaves them into a query that matches both the user's English and the team's certified definitions. No semantic layer, no certified answers.

Cost model in 2026.

DBSQL Serverless is the compute. Every executed query bills the standard Serverless warehouse rate per second.
Foundation models endpoint drafts the SQL. Each turn (question → SQL → optional re-ask) bills per-token through the same pay-as-you-go Databricks foundation models endpoint that powers Mosaic AI.
Storage is the underlying Delta tables in Unity Catalog. Genie does not duplicate or move data.
Net effect: a heavily-used certified prod space tends to bill 60–70% Serverless warehouse + 25–35% foundation models + a residual for embedding/retrieval, depending on the question mix.

What interviewers and leadership listen for.

Do you say "Genie is a semantic layer on top of Unity Catalog," not "Genie is text-to-SQL"? — senior signal.
Do you reach for certified datasets, instructions, sample queries as the three artefacts that make a Genie space trustworthy? — required answer.
Do you mention trust badges (certified / verified / unverified) as the operational contract that decides who can ask what? — senior signal.
Do you recognise that every Genie answer is auditable because the "Show SQL" button + system audit tables capture both the natural-language question and the executed query? — required answer.

Worked example — a "GPT-on-warehouse" pilot that fails without the semantic layer

Detailed explanation. A BI team installs a generic LLM plugin that points at the raw main.sales schema and asks "what was last quarter's revenue by region?" The plugin emits SQL that sums gross_revenue from a staging table, ignores the test-order flag, and quietly drops the EMEA region because the column happened to be named region_code. The board demo is a disaster.

Question. Compare the SQL drafted by a naive GPT-on-warehouse plugin against the SQL Genie drafts when the same question is asked of a curated Genie space with an instruction revenue means net_revenue after returns; exclude is_test = true and a sample query for revenue by region.

Input — table main.sales.daily_orders.

order_id	region	gross_revenue	net_revenue	is_test	quarter
1	EU	100	95	false	2025-Q1
2	EU	200	0	false	2025-Q1
3	US	500	480	false	2025-Q1
4	US	50	50	true	2025-Q1
5	EU	9999	9999	true	2025-Q1

Code.

-- Naive GPT plugin draft — reads gross_revenue, ignores instructions
SELECT region, SUM(gross_revenue) AS revenue
FROM main.sales.daily_orders
WHERE quarter = '2025-Q1'
GROUP BY region;

-- Genie draft — bound by the curated instructions in the space
SELECT region, SUM(net_revenue) AS revenue
FROM main.sales.daily_orders
WHERE quarter = '2025-Q1'
  AND is_test = false
GROUP BY region
ORDER BY revenue DESC;

Step-by-step explanation.

The naive plugin has no access to the instruction "revenue means net_revenue" so it picks the first plausible column (gross_revenue). The number is bigger than the official metric.
The naive plugin also has no access to "exclude is_test = true," so it sums the synthetic load-test row (order_id=5, gross 9999). EU revenue is inflated by almost 10x.
Genie retrieves both instructions from the space and weaves them into the WHERE clause. The result is the team's certified definition of quarterly revenue per region.
Genie also retrieves the sample query for "revenue by region," which canonicalised the ORDER BY revenue DESC shape — so the result is in the order execs expect to read.

Output.

Source	EU revenue	US revenue
Naive plugin	10,299	550
Genie	95	480

Rule of thumb. "Point GPT at the warehouse" is a demo, not a product. A trustworthy AI/BI surface requires a curated dataset, plain-English instructions, and a sample-query corpus — the three artefacts that turn a chat into a certified answer.

Worked example — Unity Catalog ACLs constrain every Genie read

Detailed explanation. A finance analyst opens a Genie space that exposes main.finance.gl_lines. The same space also exists for HR, but HR users do not see the finance dataset. Genie does not invent a back-door — Unity Catalog enforces row filters and table ACLs uniformly, whether the SQL comes from a human in the SQL Editor or from a foundation model inside Genie.

Question. Given a Unity Catalog row filter that limits main.finance.gl_lines to rows where cost_center IN (caller_allowed_centers), what does Genie return when a user asks "what was total spend last month"?

Input — table main.finance.gl_lines with a UC row filter.

line_id	cost_center	amount	period
1	CC-100	1000	2026-05
2	CC-200	2000	2026-05
3	CC-100	500	2026-05
4	CC-300	5000	2026-05

Code.

-- Genie draft (visible via Show SQL)
SELECT SUM(amount) AS total_spend
FROM main.finance.gl_lines
WHERE period = '2026-05';

-- Effective query after UC row filter for a user
-- whose caller_allowed_centers = {'CC-100', 'CC-200'}
SELECT SUM(amount) AS total_spend
FROM main.finance.gl_lines
WHERE period = '2026-05'
  AND cost_center IN ('CC-100', 'CC-200');  -- injected by UC

Step-by-step explanation.

Genie drafts the unconstrained SQL: a simple SUM over the period. Genie cannot bypass UC — it does not even see the row filter at draft time.
At execute time, UC injects the row filter clause for the calling user. The user with {CC-100, CC-200} access sees a SUM over only those rows.
A second user, with {CC-300} access, runs the same English question and the same drafted SQL but UC injects cost_center IN ('CC-300'). They get a different number — correctly.
Genie's Show SQL button reveals the drafted query (before UC injection). The Databricks query profile reveals the executed query (after UC injection). Both are auditable.

Output.

User	Allowed cost centers	total_spend
Finance analyst	CC-100, CC-200	3,500
Plant manager	CC-300	5,000
Auditor	all	8,500

Rule of thumb. Genie does not have a separate permission model — it uses Unity Catalog's. Lock down access at the UC layer (row filters + column masks + table ACLs) and Genie inherits it automatically. There is no Genie ACL to forget to set.

Databricks Genie interview question — explain the cost model

A senior data leader often opens with: "Walk me through the cost model of a Genie space at scale — Serverless DBSQL, foundation model tokens, storage — and how you would govern each layer." It blends platform fluency, FinOps, and an honest accounting of where the bills come from.

Solution Using the three-layer Genie cost model

-- Per-space cost audit (system tables — pseudocode shape)
WITH warehouse_cost AS (
    SELECT
        space_id,
        SUM(billable_seconds * dbsql_serverless_rate) AS warehouse_usd
    FROM system.billing.dbsql_serverless_usage
    WHERE month = '2026-05'
    GROUP BY space_id
),
fm_cost AS (
    SELECT
        space_id,
        SUM(input_tokens) * input_rate
      + SUM(output_tokens) * output_rate AS foundation_models_usd
    FROM system.billing.foundation_models_usage
    WHERE month = '2026-05'
    GROUP BY space_id
),
storage_cost AS (
    SELECT
        space_id,
        SUM(bytes_stored) / 1e12 * delta_storage_rate AS storage_usd
    FROM system.billing.uc_storage_usage
    WHERE month = '2026-05'
    GROUP BY space_id
)
SELECT
    w.space_id,
    w.warehouse_usd,
    f.foundation_models_usd,
    s.storage_usd,
    w.warehouse_usd + f.foundation_models_usd + s.storage_usd AS total_usd
FROM warehouse_cost w
LEFT JOIN fm_cost     f USING (space_id)
LEFT JOIN storage_cost s USING (space_id)
ORDER BY total_usd DESC;

Step-by-step trace.

Layer	What bills	Example driver	Typical share
Serverless DBSQL	seconds the warehouse is hot, executing Genie SQL	each question that hits execute	60–70%
Foundation models	input + output tokens per drafted SQL	each turn (rewrite + draft + follow-up)	25–35%
UC storage	bytes the underlying Delta tables occupy	shared with all readers, not Genie-specific	5–10%

Each layer is billed independently and audited via its own system table. The aggregated CTE above is the shape every FinOps lead writes within the first month of rollout — it surfaces which space is expensive and whether the cost is warehouse-driven or model-driven.

Output:

space_id	warehouse_usd	foundation_models_usd	storage_usd	total_usd
finance_certified	1,820	690	110	2,620
sales_certified	940	320	80	1,340
ops_staging	410	280	60	750

Why this works — concept by concept:

Three independent meters — Serverless warehouse, foundation models endpoint, and UC storage are three separate billing meters. Treating them as one bucket hides which lever to pull.
Per-space attribution — the system tables expose space_id so FinOps can chargeback each LOB without manual tagging. This is the foundation of per-space query budgets.
Warehouse dominates at scale — if foundation models exceed warehouse spend, the space is asking many short questions or the sample-query corpus is thin (forcing the model to draft more from scratch). Either grow the corpus or cap turns per session.
Storage is shared — UC storage is not a Genie-specific cost; it is the underlying Delta table footprint. Genie users pay it whether they query through Genie or SQL Editor.
Cost — the audit query is O(rows in system tables) per month; system tables are partitioned by date, so the cost is negligible compared to the warehouse spend it surfaces.

SQL
Topic — aggregation
Aggregation problems for Genie analytics (SQL)

Practice →

Company
Databricks interview prep
Databricks practice library

Practice →

2. Anatomy of a Genie space

Every certified `ai/bi genie space` is six compartments plus a trust badge — datasets, instructions, sample queries, joins, certified-by, permissions

The mental model in one line: an AI/BI Genie space is a small, well-curated semantic context — datasets + instructions + sample queries + join hints + a named human owner + permissions — that the foundation model uses on every question; the badge in the top-right of the space tells you which compartments are populated and signed-off. Once you can list the six compartments cold, every "why did Genie pick the wrong join?" question reduces to "which compartment was empty?"

The six compartments.

Datasets — what the space can read. Curated tables, views, and materialised metrics from Unity Catalog. The rule of thumb is one star schema or one fact table per space. Pointing Genie at the entire main catalogue is the most common anti-pattern.
Instructions — plain-English domain rules. Definitions ("revenue means net_revenue after returns"), exclusions ("exclude where is_test = true"), vocabulary aliases ("when the user says GMV, they mean gross_merch_value"), and calendar conventions ("fiscal year starts Feb 1"). These are the artefact that turns business language into correct SQL.
Sample queries — the verified Q→SQL corpus. A library of 10–30 question/SQL pairs that anchor retrieval. Every certified answer should map to at least one sample query so retrieval has a strong template to draw from.
Joins + dimensions — explicit FK hints and measure declarations. Tell Genie that orders.customer_id → customers.customer_id and that net_revenue is the canonical measure. Without these, the model picks reasonable but wrong joins.
Certified by — the human owner. A named person (or two-person quorum), not a team alias. The badge in the corner of the space carries this name; an answer is "certified" only because a real human signed off on the dataset + instruction + sample-query bundle.
Permissions — three-layer ACL. Who can chat (analytics_users), who can edit (data_eng), who can certify (governance_leads). Mirrored to the workspace identity system so promotion + revocation are automatic.

Trust badges in the corner of the space.

Certified (green ✓✓) — datasets locked, instructions reviewed, sample-query corpus complete, a named human signed off in the last 90 days.
Verified (blue ✓) — used in production but not yet locked. Instructions cover the core questions, sample queries exist, no SME sign-off this quarter.
Draft / unverified (grey ⚠) — still being built. Analysts can iterate; execs should not see this space.

Backed by.

Serverless DBSQL — the warehouse that runs every executed query.
Foundation models endpoint — the LLM that drafts SQL from natural language.
Unity Catalog ACLs — every read goes through the row filters and column masks the table owner declared.

Common interview probes on space anatomy.

"What is the minimum certified space?" — one fact table, two-to-three dimensions, an instructions list of business definitions, 10+ sample queries, a named human owner.
"Why one space per domain?" — retrieval quality. A narrow context yields high-precision answers; a broad context dilutes the relevance signal and the model picks wrong tables.
"Can a space read across catalogs?" — yes, if UC ACLs allow it. The space lists fully-qualified table names from any catalog the owner has SELECT on.
"Who can certify a space?" — the principals listed in the certify permission. Convention is governance_leads, never the data engineers who wrote the dataset.

Worked example — minimal certified `sales_genie_space`

Detailed explanation. A sales analytics team stands up its first Genie space. The dataset is one fact table (daily_revenue) plus two dimensions (customer, date_dim). The instructions encode the team's revenue definition. The sample queries cover the eight most-asked exec questions. The space is then submitted for certification.

Question. Lay out the six compartments of a certified sales_genie_space for a team whose business definitions are: revenue = net_revenue after returns, exclude test orders, fiscal year starts Feb 1, and the canonical question vocabulary uses "GMV" for gross_merch_value.

Input — the curated dataset list.

Type	Object	Purpose
fact	`main.sales.daily_revenue`	one row per day per region per channel
dim	`main.sales.customer`	customer master with segment + region
dim	`main.analytics.date_dim`	calendar with fiscal year, quarter, week
metric	`main.sales.metrics.net_revenue`	UC metric view: net_revenue after returns

Code.

# space.yml — declarative space definition (managed via Databricks Asset Bundle)
name: sales_genie_space
description: Certified sales / revenue Genie space for FY26
datasets:
  - main.sales.daily_revenue
  - main.sales.customer
  - main.analytics.date_dim
  - main.sales.metrics.net_revenue
instructions: |
  - "revenue" means net_revenue after returns and refunds; never gross
  - exclude rows where is_test = true (synthetic load-test orders)
  - fiscal year starts Feb 1; FY26 = 2026-02-01 to 2027-01-31
  - "GMV" means gross_merch_value (the column on daily_revenue)
  - default time grain is "month" unless the user says day or week
sample_queries:
  - q: "revenue by region last quarter"
    sql: |
      SELECT region, SUM(net_revenue) AS revenue
      FROM main.sales.daily_revenue
      WHERE quarter = current_quarter() - INTERVAL 1 QUARTER
        AND is_test = false
      GROUP BY region
      ORDER BY revenue DESC
  - q: "top 10 customers by revenue YTD"
    sql: |
      SELECT c.customer_id, c.name, SUM(d.net_revenue) AS ytd_revenue
      FROM main.sales.daily_revenue d
      JOIN main.sales.customer c ON d.customer_id = c.customer_id
      WHERE d.fiscal_year = 'FY26'
        AND d.is_test = false
      GROUP BY 1, 2
      ORDER BY ytd_revenue DESC
      LIMIT 10
joins:
  - daily_revenue.customer_id -> customer.customer_id
  - daily_revenue.date_key    -> date_dim.date_key
certified_by:
  - alice.chen@co
  - bob.patel@co
permissions:
  chat:    analytics_users
  edit:    data_eng_sales
  certify: governance_leads

Step-by-step explanation.

The datasets list is intentionally short: one fact, two dimensions, one metric view. Genie cannot read anything outside this list, no matter how the user phrases the question.
The instructions encode the four most-violated business rules. Each is one sentence, present-tense, and audit-friendly. Genie injects this list into every prompt.
The sample_queries corpus anchors retrieval. When a user asks "revenue by region this quarter," retrieval matches the first sample query 1:1 and the drafted SQL is a near-exact reuse — the answer is certified.
The joins block teaches Genie the canonical FK relationships. Without this, the model might join daily_revenue to customer via a string match on email and silently miscount.
The certified_by list is two named humans; the badge in the UI displays both. Revoking either one downgrades the badge from certified back to verified.
The three-layer permissions block maps to UC + workspace identity groups; promotion and revocation are managed in one place.

Output (badge in the space UI).

Field	Value
Space name	`sales_genie_space`
Trust badge	green ✓✓ certified
Certified by	`alice.chen@co`, `bob.patel@co`
Last certified	2026-05-12
Datasets	4
Instructions	5
Sample queries	8

Rule of thumb. A certified space is small — four datasets, five instructions, eight to twenty sample queries, two named humans. If the space is larger than that, it is two spaces pretending to be one. Split before you certify.

Worked example — instructions that fix a hallucination in one line

Detailed explanation. Genie draws a chart of "revenue by quarter" and shows Q1 starting in January. The CFO points out that the company's fiscal year starts February 1 — so Q1 should be Feb–Apr, not Jan–Mar. The fix is a single instruction; no SQL change, no dataset change.

Question. Add the instruction that corrects Genie's quarter alignment and show how the drafted SQL changes on the next "revenue by quarter" question.

Input — daily_revenue table.

order_date	net_revenue
2026-01-15	100
2026-02-20	200
2026-04-30	300
2026-05-10	400

Code.

-- Before the instruction — calendar quarter (wrong)
SELECT QUARTER(order_date) AS q, SUM(net_revenue) AS revenue
FROM main.sales.daily_revenue
GROUP BY q
ORDER BY q;

-- After adding instruction: "fiscal year starts Feb 1; FY26 Q1 = Feb-Apr"
SELECT
    CASE
        WHEN MONTH(order_date) BETWEEN 2 AND 4 THEN 'FY-Q1'
        WHEN MONTH(order_date) BETWEEN 5 AND 7 THEN 'FY-Q2'
        WHEN MONTH(order_date) BETWEEN 8 AND 10 THEN 'FY-Q3'
        ELSE 'FY-Q4'
    END AS fiscal_quarter,
    SUM(net_revenue) AS revenue
FROM main.sales.daily_revenue
WHERE is_test = false
GROUP BY 1
ORDER BY 1;

Step-by-step explanation.

Before the instruction, Genie uses the dialect default (calendar quarter). The Jan-15 row aggregates with the Q1 bucket, putting it before the official FY26 start.
After the instruction is added, retrieval injects "fiscal year starts Feb 1" into the prompt. The model rewrites the quarter expression as a CASE on month.
The Jan-15 row now correctly aggregates into the previous fiscal year's Q4. The Feb-Apr rows aggregate into FY-Q1.
The instruction also pulled is_test = false from the standing rules — even though the question said nothing about test orders, Genie applies the standing exclusion every time.

Output (after fix).

fiscal_quarter	revenue
FY-Q1	500
FY-Q2	400
(prior FY) Q4	100

Rule of thumb. The cheapest hallucination fix is almost always a new sentence in the instructions, not a code change. Audit the unverified-answer log weekly; for each wrong answer, ask "is there a single English sentence that would have prevented this?" If yes, add it.

Databricks Genie interview question — design a certified marketing space

A senior interviewer often poses: "Stand up a certified Genie space for our Marketing team — campaign performance, attribution, channel mix. What datasets, instructions, sample queries, and permissions would you ship on day one?"

Solution Using the six-compartment template

# marketing_genie_space — day-one certified bundle
name: marketing_genie_space
datasets:
  - main.marketing.campaign        # one row per campaign
  - main.marketing.daily_spend     # one row per campaign per channel per day
  - main.marketing.attribution     # multi-touch attribution lines
  - main.analytics.date_dim
  - main.sales.daily_revenue       # for ROAS denominator
instructions: |
  - "spend" means committed_spend, not invoiced_spend
  - "ROAS" means revenue attributed to the campaign / committed_spend
  - exclude campaigns where status = 'cancelled'
  - default attribution model is data_driven; mention "last touch" to override
  - fiscal year starts Feb 1
sample_queries:
  - q: "ROAS by channel last month"
    sql: |
      WITH spend AS (
        SELECT channel, SUM(committed_spend) AS spend
        FROM main.marketing.daily_spend
        WHERE month = date_trunc('month', current_date - INTERVAL 1 MONTH)
        GROUP BY 1
      ),
      attrib AS (
        SELECT channel, SUM(attributed_revenue) AS revenue
        FROM main.marketing.attribution
        WHERE month = date_trunc('month', current_date - INTERVAL 1 MONTH)
          AND attribution_model = 'data_driven'
        GROUP BY 1
      )
      SELECT s.channel,
             a.revenue,
             s.spend,
             ROUND(a.revenue / NULLIF(s.spend, 0), 2) AS roas
      FROM spend s
      LEFT JOIN attrib a USING (channel)
      ORDER BY roas DESC NULLS LAST;
  - q: "top 5 campaigns YTD by attributed revenue"
    sql: |
      SELECT c.campaign_id, c.name,
             SUM(a.attributed_revenue) AS revenue
      FROM main.marketing.campaign c
      JOIN main.marketing.attribution a
        ON a.campaign_id = c.campaign_id
      WHERE c.status != 'cancelled'
        AND a.fiscal_year = 'FY26'
      GROUP BY 1, 2
      ORDER BY revenue DESC
      LIMIT 5;
joins:
  - daily_spend.campaign_id -> campaign.campaign_id
  - attribution.campaign_id -> campaign.campaign_id
  - daily_spend.date_key    -> date_dim.date_key
certified_by:
  - priya.shah@co (marketing analytics lead)
  - jordan.lee@co (governance)
permissions:
  chat:    marketing_users + finance_users
  edit:    data_eng_marketing
  certify: governance_leads

Step-by-step trace.

Compartment	Why this content	Risk it mitigates
Datasets	five tables — fact + spend + attribution + date + revenue	cross-team metric drift; revenue and spend live in one space
Instructions	five sentences covering vocab, exclusions, defaults	"what counts as ROAS" debates in QBR
Sample queries	two — ROAS by channel, top campaigns	retrieval has a strong anchor for the two most-asked questions
Joins	three FK hints	model picking attribution → revenue via date alone
Certified by	two humans, one analytics, one governance	single-point-of-failure on ownership
Permissions	chat for marketing + finance, certify for governance only	finance can ask but cannot edit definitions

Output:

Metric	Day-one value
Datasets	5
Instructions	5
Sample queries	2 (will grow to 15+ in staging)
Certified by	2 humans
Trust badge	green ✓✓ certified

Why this works — concept by concept:

Narrow dataset, multi-table — five tables is still narrow; what matters is that each table earns its place. Adding main.support.tickets because "someone might ask about churn" is the path to a diluted space.
Instructions are the cheapest lever — every English sentence in instructions is injected into every Genie prompt. Five sentences cover the four most-asked-about ambiguities.
Sample queries anchor retrieval — the two seeded queries seed the corpus; staging will grow it to 15+. Without seed queries, Genie is drafting blind on day one.
Two-name certification — one analytics lead (subject expertise) plus one governance owner (process expertise). Either can revoke; both must sign off to recertify.
Split chat from certify — chat is broad (marketing + finance) so the value is felt; certify is narrow (governance only) so the badge means something.
Cost — building the bundle is days, not weeks. The Asset Bundle below versions every artefact in Git; the diff for next quarter's recertify is small.

SQL
Topic — joins
JOIN problems for AI/BI spaces (SQL)

Practice →

SQL
Topic — dimensional modeling
Dimensional modeling drills

Practice →

3. The text-to-SQL flow under the hood

Every Genie answer goes through six stages — rewrite → retrieve → draft → validate → execute → audit — and any of the six can be the failure point

The mental model in one line: Genie's text-to-sql pipeline is six stages, not one — question rewrite + intent classification, retrieval over the semantic context, LLM-drafted SQL, SQL validation against Unity Catalog, execution on Serverless DBSQL, and an auditable result with trust badge. Once you can trace a question through all six, every "why did Genie do X?" debug becomes "which stage went sideways?"

The six stages in detail.

Stage 1 — Question rewrite + intent classification. The model normalises the user's English (resolves pronouns from the chat history, expands abbreviations like "GMV" via the instructions, classifies the intent as aggregation / lookup / trend / comparison). This is the cheapest stage to skip and the easiest to debug.
Stage 2 — Retrieval. A hybrid retriever pulls the most-relevant items from three buckets: dataset schemas, instructions, and sample queries. The retrieved context is the prompt's "semantic layer."
Stage 3 — LLM SQL draft. The Databricks foundation models endpoint drafts a SQL query using the retrieved context. Drafting is constrained by the schema and instructions; the model is not free to invent tables.
Stage 4 — SQL validation. The drafted SQL is parsed, checked for syntactic validity, dry-run against Unity Catalog (does each referenced table exist? does the calling user have SELECT?), and rejected if any check fails. The model retries up to a bounded number of times with the error message in context.
Stage 5 — Execute. The validated SQL runs on the Serverless DBSQL warehouse attached to the space. Unity Catalog row filters and column masks are injected at this layer.
Stage 6 — Result + audit. The result table (and an auto-suggested chart) is returned to the user. The Show SQL button reveals the drafted query; system.access.audit and the Genie usage tables log the question, the SQL, the trust label, the user, and the latency.

What Genie blocks by design.

Cross-space leak. A space cannot read tables outside its datasets list, even if the calling user has UC access to them. The space is its own boundary.
Ungoverned reads. Every read goes through UC. There is no "raw" mode that bypasses row filters.
Unsafe DDL. Genie refuses to draft DROP, DELETE, UPDATE, INSERT, MERGE, ALTER, CREATE, GRANT, REVOKE. It is read-only by construction.
Cross-database joins the space did not declare. If customer.region is not in the dataset list, the model cannot reach for it.

The Show SQL button — your trust anchor.

Every answer the model produces is paired with the exact SQL it ran. Click Show SQL and you see (a) the question, (b) the drafted SQL, (c) the executed SQL after UC injection (if you have admin rights), and (d) the query profile. Genie is not a black box — it is a visible box.

Follow-up turns inherit context.

When the user asks a follow-up ("now break that by channel"), Genie reuses the prior conversation's dataset, filters, and trust label. The context window is per-conversation; a new conversation starts fresh. This is the difference between "GPT chat" (chain-of-thought) and "Genie" (chain-of-context).

Common interview probes on the flow.

"What happens when a drafted SQL fails validation?" — the model retries with the error message in context, up to N attempts (typically 3). If still failing, Genie returns a "low confidence — please rephrase" message with a hallucination flag.
"Can I see the drafted SQL?" — yes, the Show SQL button is on every answer. This is the single most-important trust mechanism.
"What logs the question + SQL pair?" — system.access.audit for the execute step, plus Genie's own usage tables for the question text and trust label.
"Does Genie cache results?" — Serverless DBSQL caches result sets the same way SQL Editor does. Genie itself does not introduce an additional cache layer.

Worked example — tracing one question through the six stages

Detailed explanation. A user asks "how did last quarter's revenue compare to the quarter before, by region?" The trace walks through what each stage produces and what it sends to the next stage.

Question. Trace the natural-language question through stages 1–6 of Genie's text-to-SQL pipeline. Show the intermediate artefacts each stage emits.

Input. The user types into the sales_genie_space chat panel: "how did last quarter's revenue compare to the quarter before, by region?"

Code (the artefacts emitted at each stage).

=== Stage 1 — Question rewrite + intent classification ===
intent          = "comparison + aggregation"
metric          = "revenue" (→ instruction: net_revenue)
group_by        = "region"
time_filter     = "last quarter vs prior quarter"
test_exclusion  = applied from standing instruction (is_test = false)

=== Stage 2 — Retrieval (top-k from space) ===
datasets        : main.sales.daily_revenue, main.analytics.date_dim
instructions    : "revenue means net_revenue", "exclude is_test = true",
                  "fiscal year starts Feb 1"
sample_queries  : "revenue by region last quarter" (0.92 cosine score)

=== Stage 3 — LLM SQL draft ===

WITH q AS (
    SELECT
        region,
        CASE
            WHEN quarter = current_quarter() - INTERVAL 1 QUARTER THEN 'last'
            WHEN quarter = current_quarter() - INTERVAL 2 QUARTER THEN 'prior'
        END AS bucket,
        net_revenue
    FROM main.sales.daily_revenue
    WHERE is_test = false
      AND quarter IN (
          current_quarter() - INTERVAL 1 QUARTER,
          current_quarter() - INTERVAL 2 QUARTER
      )
)
SELECT
    region,
    SUM(CASE WHEN bucket = 'last'  THEN net_revenue ELSE 0 END) AS last_q,
    SUM(CASE WHEN bucket = 'prior' THEN net_revenue ELSE 0 END) AS prior_q,
    ROUND(100.0 * (
        SUM(CASE WHEN bucket = 'last'  THEN net_revenue ELSE 0 END)
      - SUM(CASE WHEN bucket = 'prior' THEN net_revenue ELSE 0 END)
    ) / NULLIF(
        SUM(CASE WHEN bucket = 'prior' THEN net_revenue ELSE 0 END), 0
    ), 2) AS pct_change
FROM q
GROUP BY region
ORDER BY last_q DESC;

=== Stage 4 — Validation ===
parser_ok       = true
uc_ack_ok       = true (caller has SELECT on daily_revenue)
dry_run_ok      = true (estimated cost: 0.012 DBU)

=== Stage 5 — Execute on Serverless DBSQL ===
warehouse       = sales_serverless_xs
duration_ms     = 480
rows_returned   = 4

=== Stage 6 — Result + audit ===
chart_suggested = grouped bar
trust_label     = verified (matched sample query)
audit_logged    = system.access.audit + genie.usage_log

Step-by-step explanation.

Stage 1 normalises the question into structured intent. "Revenue" is expanded via the instructions to mean net_revenue. The standing is_test = false exclusion is attached automatically because it is a standing instruction, not a per-question opt-in.
Stage 2 retrieves three items: two table schemas, three instruction strings, and one sample query whose embedding cosine score (0.92) is above the retrieval threshold. The sample query is the strongest signal.
Stage 3 drafts the SQL by templating the sample query's shape (compare two quarters, group by region) and weaving in the standing exclusion. The PCT-change column is added because the user's word "compare" classified intent as comparison.
Stage 4 validates: SQL parses; the calling user has UC SELECT on daily_revenue; the dry-run cost estimate is low. All three green-light execute.
Stage 5 runs on the attached Serverless warehouse. Sub-second latency on a small fact table; the result has four rows (one per region).
Stage 6 paints the result, suggests a grouped-bar chart, and labels the answer verified because the drafted SQL was a near-1:1 match against a sample query. The audit row is written; the question, SQL, label, and latency are now queryable.

Output (the answer in the chat panel).

region	last_q	prior_q	pct_change
US	1,850,000	1,720,000	+7.56
EU	980,000	920,000	+6.52
APAC	420,000	360,000	+16.67
LATAM	110,000	130,000	-15.38

Rule of thumb. If a Genie answer is wrong, debug by stage: re-ask and click Show SQL to see Stage 3's draft. If the draft is wrong but uses the right tables, fix Stage 1 (rewrite is misclassifying intent) or Stage 2 (retrieval is missing a key instruction). If the draft references the wrong tables, fix the dataset list or add an explicit join hint.

Worked example — when validation rescues a hallucination

Detailed explanation. The model drafts a SQL query referencing a column that does not exist (gross_revenue on a fact table that only has net_revenue). Validation catches it at Stage 4. The model retries with the error message in context, and the second draft is correct.

Question. Trace what happens when the LLM hallucinates a column name. Show how the validation stage prevents the bad SQL from executing.

Input. The user asks: "sum gross revenue by region this month." The instructions say gross_revenue is not the same as revenue and points at a different metric, but the user explicitly typed gross revenue.

Code.

-- Stage 3 attempt #1 (HALLUCINATION — gross_revenue not on this table)
SELECT region, SUM(gross_revenue) AS gross_revenue
FROM main.sales.daily_revenue
WHERE month = '2026-05'
GROUP BY region;

-- Stage 4 validation result:
--   parser_ok = true
--   uc_ack_ok = true
--   dry_run_ok = FALSE
--   error: "column gross_revenue does not exist on main.sales.daily_revenue"

-- Stage 3 attempt #2 (retry with error in context — uses correct column)
SELECT region, SUM(gross_merch_value) AS gross_revenue
FROM main.sales.daily_revenue
WHERE month = '2026-05'
  AND is_test = false
GROUP BY region;

-- Stage 4 validation result:
--   parser_ok = true
--   uc_ack_ok = true
--   dry_run_ok = TRUE

Step-by-step explanation.

The first draft references gross_revenue directly. Validation dry-runs the query against the UC catalog; the parser binds names against the actual schema and rejects the query.
The retry loop feeds the error message back into the prompt: column gross_revenue does not exist. The model is also given the actual column list of daily_revenue.
The second draft picks the closest match — gross_merch_value — which the instructions actually call "GMV." The user's mismatch ("gross revenue" vs the column name) is resolved by the retry.
Validation passes. Execute runs. The user sees the right answer with a verified (not certified) trust label, because the retry path indicates the question is slightly off-spec.

Output (after retry).

region	gross_revenue
US	2,150,000
EU	1,180,000
APAC	510,000

Rule of thumb. Validation is the cheapest hallucination guardrail Genie has. If a column or table is missing, the draft is rejected before it ever runs. The model can retry; if it still cannot draft valid SQL after N tries, Genie surfaces "low confidence, please rephrase" — which is far better than running incorrect SQL.

Databricks Genie interview question — explain the retry budget

A senior interviewer often probes: "What is Genie's retry budget when a drafted SQL fails validation, and what is the cost-vs-correctness trade-off you would tune?"

Solution Using a bounded retry budget with explicit fallback

-- Pseudocode shape of the retry budget logic, expressed as SQL
WITH attempts AS (
    SELECT generate_series(1, max_retries) AS attempt_n
)
SELECT
    attempt_n,
    CASE
        WHEN attempt_n = 1 THEN 'first draft from retrieved context'
        WHEN attempt_n = 2 THEN 'retry with parser error injected'
        WHEN attempt_n = 3 THEN 'retry with column list injected'
        ELSE                    'fallback: low-confidence flag, refuse'
    END AS strategy,
    CASE
        WHEN attempt_n <= 2 THEN 'cheap (input tokens dominate)'
        WHEN attempt_n  = 3 THEN 'expensive (schema injection adds tokens)'
        ELSE                    'no cost (refuse)'
    END AS cost_profile
FROM attempts;

Step-by-step trace.

attempt	what changes	success rate	cost
1	first draft from retrieved context	~85% on certified spaces	baseline
2	retry with parser error in prompt	recovers ~70% of failures	+1 input prompt
3	retry with full column list injected	recovers ~80% of remaining	+1 input prompt + schema tokens
≥4	refuse, hallucination flag, ask user to rephrase	n/a	none

A typical Databricks Genie tenant sets max_retries = 3. The success-rate gains drop sharply after the third attempt; beyond that, the model is mostly thrashing on a question the space cannot answer. Refusing is the correct behaviour — it is the operational signal that drives the "weekly unverified triage" loop.

Output:

Final outcome	Share of questions on a healthy space
Verified on attempt 1	~85%
Verified after retry	~10%
Refused (low confidence)	~5%
Wrong + executed	<1% (target)

Why this works — concept by concept:

Bounded retry budget — three attempts captures almost all transient failures (typo, missing column) without burning compute on questions the space genuinely cannot answer.
Error-in-prompt loop — feeding the validation error back into the prompt is much cheaper than re-retrieval. The model usually self-corrects within one extra pass.
Schema injection as the heavy hammer — the third attempt injects the full column list of the candidate tables. Expensive in tokens, but high-yield when the user's vocabulary differs from the column names.
Explicit refuse is a feature — surfacing "low confidence" to the user is the operational primitive that drives sample-query backlog. Each refusal is a candidate for a new instruction or sample query.
Hallucination floor — the goal is "wrong + executed" below 1% of all questions on certified spaces. That is the published SLA most teams use to justify exec exposure.
Cost — three attempts at ~200 input tokens + ~150 output tokens per attempt is the worst-case envelope. At foundation-models rates, that is fractions of a cent per refused question.

SQL
Topic — CTEs
CTE and subquery problems (SQL)

Practice →

SQL
Topic — case expression
CASE expression drills (SQL)

Practice →

4. The trust spectrum — verified, certified, hallucination

Every Genie answer carries one of four trust labels — `certified`, `verified`, `unverified`, `hallucination flag` — and the badge is the contract between AI/BI and the exec audience

The mental model in one line: trust in a Genie space is not a binary "is it right?" — it is a four-segment spectrum that maps the retrieval evidence + SME sign-off + self-declared confidence into a single colour-coded badge, and the badge determines which audience can act on the answer. Once you can describe the spectrum cold, every "can the CFO see this?" decision collapses into "what badge does the answer carry?"

The four trust labels.

Certified (green ✓✓) — the drafted SQL matched a sample query 1:1 and the space is itself certified by a named human within the last 90 days. This is the only label exec-facing dashboards should display.
Verified (blue ✓) — the drafted SQL is supported by retrieval (datasets + instructions matched), and the question has been asked-and-reviewed in the last 30 days, but it was not a direct sample-query match. Safe for analyst use; not for executive consumption.
Unverified (grey ⚠) — Genie drafted SQL using retrieved context, but nobody has reviewed the answer. Allowed only in sandbox / dev spaces; blocked from certified prod by default.
Hallucination flag (red ✗) — Genie self-declares low retrieval confidence (the retrieved context is thin, or validation failed after N retries). The answer is not surfaced; the user sees "I'm not confident — please rephrase."

How to read the badge in the UI.

Colour — green / blue / grey / red, displayed as a pill at the top of the answer card.
Label — the literal word (certified / verified / unverified / low confidence).
Tooltip — hover reveals the evidence trail: matched sample queries, retrieved instructions, last reviewed date.
Show SQL — click to see the drafted query and the executed query. Even an unverified answer is auditable.

Operational rituals — turning trust into a process.

Weekly — unverified triage. A 30-minute meeting where the analytics lead and an SME walk the past week's unverified answers. Each gets sorted into: certify (promote to sample query), discard (one-off), or instructions backlog (write a new English rule).
Monthly — refresh instructions. The instructions list is reviewed for staleness; obsolete rules are removed, new product/policy changes are added.
Quarterly — re-certify SMEs. The certified_by list is reviewed. Anyone who has left the team or changed roles is replaced; the badge resets to verified during the re-cert window.
Per-question — Show SQL audit. Anyone with admin rights can click Show SQL on any answer to verify the executed query matches the verbal claim.

Mapping trust labels to audience.

Certified spaces → exec-facing dashboards, board prep, CFO-tier reports.
Verified-only spaces → analyst exploration, internal team reports.
Sandbox / dev spaces → analytics engineers iterating; never exposed to non-DE users.

Audit trail — every answer is logged.

system.access.audit — captures the SQL, the calling user, the warehouse, the latency.
Genie usage tables — capture the natural-language question, the trust label, the matched sample-query id, the retrieved instruction ids.
Together they form a complete chain: question → drafted SQL → executed SQL → result rows → trust label → user. A JOIN across the two surfaces the full audit row.

Common interview probes on trust.

"What is the difference between certified and verified?" — certified means a 1:1 sample-query match in a space that itself has a green badge within 90 days; verified means retrieval supported the draft but it was not a direct sample-query match.
"Who promotes an unverified answer to certified?" — the SME during weekly triage. The drafted SQL becomes a new sample query in Git; the next time the question is asked, the badge is green.
"Can Genie certify itself?" — no. Genie can label an answer verified automatically, but certified requires a human to push the corresponding sample query into the space.
"What does the hallucination flag look like to the end user?" — the answer card is replaced by a low-confidence message: "I couldn't find a strong match in the certified context. Try rephrasing or contact the data team." No SQL is executed.

Worked example — moving an unverified question into the certified corpus

Detailed explanation. During weekly triage, the team finds three unverified questions from the past week. One was a real exec question, drafted correctly, but it had no sample-query match. The fix is to promote the drafted SQL into the sample-query corpus, sign it off, and ship it via Asset Bundle.

Question. Take an unverified answer and walk it through the promotion to certified status.

Input — the unverified row from the Genie usage log.

question	drafted_sql	result_rows	trust_label	user
"what was net new revenue from new customers FY26 YTD?"	`SELECT SUM(net_revenue) FROM ...`	1	unverified	cfo@co

Code.

# diff applied to sales_genie_space sample_queries.yml
- q: "net new revenue from new customers FY26 YTD"
  sql: |
    WITH first_order AS (
      SELECT customer_id, MIN(order_date) AS first_order_date
      FROM main.sales.daily_revenue
      WHERE is_test = false
      GROUP BY customer_id
    )
    SELECT SUM(d.net_revenue) AS net_new_revenue
    FROM main.sales.daily_revenue d
    JOIN first_order f ON f.customer_id = d.customer_id
    WHERE d.fiscal_year = 'FY26'
      AND f.first_order_date >= '2026-02-01'  -- new in FY26
      AND d.is_test = false;
  certified_by: alice.chen@co
  certified_on: 2026-05-19

# Promote via Databricks Asset Bundle
databricks bundle deploy --target staging
# SME smoke-tests on staging; if green, promote:
databricks bundle deploy --target prod

Step-by-step explanation.

The CFO's original question went through stages 1–6 of the text-to-SQL flow. Retrieval found a thin match (the dataset retrieved, but no sample query), so the answer was labelled verified (had retrieval) but not certified.
During weekly triage, the analytics lead reviews the question. The drafted SQL is correct. The lead opens a PR to add the question as a new sample query.
The PR ships via Databricks Asset Bundle — first to staging, where an SME smoke-tests the result against a known answer, then to prod.
The next time anyone asks the same question, retrieval matches the new sample query 1:1 and Genie labels the answer certified.

Output (badge journey).

When	Trust badge	Audience
Original CFO ask	unverified	analyst-only
After triage + Git PR	verified (in staging)	data team smoke-test
After bundle promote to prod	certified (next ask)	exec-facing

Rule of thumb. Every CFO-tier question that came back unverified is a candidate for promotion. The promotion path is Git → Asset Bundle → staging → prod. No "edit in the UI" path — the corpus is code.

Worked example — quarterly re-cert ritual

Detailed explanation. Quarterly, the governance team reviews every certified space. Each space must have (a) at least one named human still in role, (b) instructions reviewed in the last 90 days, (c) sample-query corpus exercised in the last 30 days. If any check fails, the badge auto-downgrades from certified to verified until the gap is closed.

Question. Write the audit query that surfaces every certified space whose badge would fail the quarterly re-cert check.

Input — pseudocode tables.

Table	Purpose
`genie.spaces`	space metadata + current badge
`genie.certifications`	one row per `(space_id, certifier_email, certified_on)`
`genie.usage_log`	one row per question, with `space_id`, `sample_query_id`, `asked_on`

Code.

WITH active_certifiers AS (
    -- Certifiers who signed off in the last 90 days AND are still in role
    SELECT space_id, COUNT(*) AS active_cert_count
    FROM genie.certifications c
    JOIN hr.employees           e ON e.email = c.certifier_email
    WHERE c.certified_on >= current_date - INTERVAL 90 DAY
      AND e.is_active = true
    GROUP BY space_id
),
fresh_instructions AS (
    SELECT space_id
    FROM genie.spaces
    WHERE instructions_last_reviewed >= current_date - INTERVAL 90 DAY
),
exercised_corpus AS (
    SELECT space_id, COUNT(DISTINCT sample_query_id) AS recent_matches
    FROM genie.usage_log
    WHERE asked_on >= current_date - INTERVAL 30 DAY
      AND sample_query_id IS NOT NULL
    GROUP BY space_id
)
SELECT
    s.space_id,
    s.name,
    s.badge AS current_badge,
    COALESCE(ac.active_cert_count, 0) AS active_certifiers,
    CASE WHEN fi.space_id IS NULL THEN 'stale' ELSE 'fresh' END AS instructions_state,
    COALESCE(ec.recent_matches, 0) AS recent_sample_query_matches,
    CASE
        WHEN COALESCE(ac.active_cert_count, 0) = 0 THEN 'downgrade: no active certifier'
        WHEN fi.space_id IS NULL                   THEN 'downgrade: instructions stale'
        WHEN COALESCE(ec.recent_matches, 0) = 0    THEN 'downgrade: corpus dormant'
        ELSE                                            'keep certified'
    END AS recert_action
FROM genie.spaces s
LEFT JOIN active_certifiers ac USING (space_id)
LEFT JOIN fresh_instructions fi ON fi.space_id = s.space_id
LEFT JOIN exercised_corpus  ec USING (space_id)
WHERE s.badge = 'certified'
ORDER BY recert_action, s.name;

Step-by-step explanation.

active_certifiers joins the certifications table to HR to drop people who left the company. A space whose only certifier left silently loses its trust contract — re-cert is mandatory.
fresh_instructions flags spaces whose instruction list has not been reviewed in 90 days. Stale instructions accumulate metric drift; the badge cannot survive a stale instruction set.
exercised_corpus checks that the sample-query corpus is actually being matched. A dormant corpus is a sign the space is unused (drop the badge to verified and ask the owner whether to retire) or that retrieval is bypassing the corpus (a separate alarm).
The final CASE expression decides the action per space. Three reasons to downgrade; one to keep.

Output.

space_id	name	current_badge	active_certifiers	instructions_state	recent_matches	recert_action
11	finance_certified	certified	2	fresh	142	keep certified
12	sales_certified	certified	0	fresh	88	downgrade: no active certifier
13	marketing_certified	certified	1	stale	24	downgrade: instructions stale
14	legacy_revenue	certified	1	fresh	0	downgrade: corpus dormant

Rule of thumb. The badge must be earned every quarter. Build the re-cert query above as a scheduled job; spaces whose action is anything other than keep certified get an automatic Slack notification to their owner. No human reminders needed.

Databricks Genie interview question — designing the audit table for trust labels

A senior interviewer often poses: "Design the audit schema for Genie that captures the trust label, the matched sample query, and the retrieved instructions per answer — enough that a regulator could reconstruct any executed query end-to-end."

Solution Using a denormalised audit fact with FK links to the corpus

-- Genie audit fact — one row per answer
CREATE OR REPLACE TABLE governance.genie_audit (
    answer_id          STRING       PRIMARY KEY,
    asked_at           TIMESTAMP,
    space_id           STRING,
    user_email         STRING,

    -- the question
    question_text      STRING,
    intent             STRING,        -- aggregation / lookup / trend / ...

    -- the retrieval evidence (FK links)
    matched_sample_id  STRING,        -- NULL if no 1:1 match
    retrieved_instruction_ids ARRAY<STRING>,
    retrieved_dataset_names   ARRAY<STRING>,

    -- the drafted + executed SQL
    drafted_sql        STRING,        -- before UC injection
    executed_sql       STRING,        -- after UC row filter + column mask injection
    query_profile_url  STRING,

    -- result + trust
    rows_returned      BIGINT,
    chart_type         STRING,
    trust_label        STRING,        -- certified / verified / unverified / low_confidence
    latency_ms         INT,
    foundation_model_tokens INT,
    warehouse_dbu      DECIMAL(10,4)
) USING DELTA
PARTITIONED BY (asked_at);

-- Daily summary view for FinOps + compliance
CREATE OR REPLACE VIEW governance.genie_audit_daily AS
SELECT
    date_trunc('day', asked_at) AS day,
    space_id,
    trust_label,
    COUNT(*)                           AS answers,
    COUNT(DISTINCT user_email)         AS unique_users,
    SUM(foundation_model_tokens)       AS total_tokens,
    SUM(warehouse_dbu)                 AS total_dbu,
    APPROX_PERCENTILE(latency_ms, 0.95) AS p95_latency_ms
FROM governance.genie_audit
GROUP BY 1, 2, 3;

Step-by-step trace.

Column	What it captures	Why a regulator cares
`question_text`	the user's English	establishes intent
`matched_sample_id`	which sample query anchored the answer	shows the answer was on-spec
`retrieved_instruction_ids`	which instructions Genie applied	shows the standing rules were honoured
`drafted_sql`	what the model wrote	shows the model's behaviour
`executed_sql`	what UC actually ran	proves row-filter / column-mask enforcement
`trust_label`	the badge at answer time	shows which audience the answer was scoped to
`latency_ms`, `tokens`, `dbu`	cost + performance	enables FinOps + SLA review

Output:

day	space_id	trust_label	answers	unique_users	p95_latency_ms
2026-05-19	sales_certified	certified	312	47	820
2026-05-19	sales_certified	verified	88	31	940
2026-05-19	sales_certified	unverified	6	4	1,120
2026-05-19	sales_certified	low_confidence	4	3	510

Why this works — concept by concept:

One row per answer — the audit grain is a single Genie answer, so every interaction is reconstructible. Aggregation comes later, not in the source table.
Drafted vs executed SQL — two columns, not one. Drafted shows the model's behaviour; executed shows UC enforcement. Regulators care about both.
FK links to corpus — matched_sample_id and retrieved_instruction_ids point at the certified artefacts, so the audit row remains meaningful even after the corpus evolves.
Partition by day — Delta partitioning + Z-ORDER on space_id and trust_label gives sub-second daily compliance queries on a billion-row audit.
Daily summary view — separates the "raw forensics" grain (one row per answer) from the "operational dashboard" grain (one row per day per label). FinOps reads the view, regulators read the table.
Cost — Delta storage cost on a ~1KB row * ~10K answers/day = ~10MB/day. Negligible. The value of full audit is several orders of magnitude higher.

SQL
Topic — conditional logic
Conditional logic problems (SQL)

Practice →

SQL
Topic — null handling
NULL-handling drills (SQL)

Practice →

5. Production rollout topology and governance

Promote Genie spaces like code — `dev → staging → certified prod` — with a Slack feedback loop and Git-versioned Asset Bundles

The mental model in one line: a Genie rollout is a three-environment topology that mirrors the SDLC — dev for analytics-engineer iteration, staging for SME sign-off, certified prod for exec-facing answers — with instructions and sample queries versioned as code in Git via Databricks Asset Bundles, and an SME feedback channel that converts unanswered questions into new instructions. Once you can draw the three environments + the Git arrow + the feedback loop, the rollout governance debate is settled.

The three environments.

Dev space — analytics engineers iterate on datasets and instructions. Trust badge: none (grey ⚠). Audience: data engineers only. Anything goes here.
Staging space — SMEs add sample queries, fix joins, log gaps. Trust badge: verified (blue ✓). Audience: SME group + data engineers. No exec exposure.
Certified prod space — locked datasets, full sample-query corpus, exec-facing. Trust badge: certified (green ✓✓). Audience: all analysts and execs.

Promotion arrows.

Dev → Staging — when the dataset list is stable, the instructions cover the top 10 questions, and at least 5 sample queries exist. PR to space.yml and sample-queries.yml; bundle deploys to staging.
Staging → Prod — when SMEs have stress-tested the corpus, the trust label has been verified for 30 days, and a governance lead has signed the certify ticket. Bundle deploys to prod; badge flips to certified.

SME review loop.

Slack/Teams channel — #genie-feedback. SMEs and power users drop questions Genie answered wrong (or refused). Every entry is triaged weekly.
Triage outcomes. Add an instruction; add a sample query; flag a missing dataset; mark as out-of-scope.
Cycle time. From "wrong answer reported" to "certified prod has the fix" in <1 week is the target; <2 weeks is acceptable.

Versioning + change-log — Asset Bundles.

What is versioned. space.yml (datasets, joins, permissions), instructions.md (the standing rules), sample-queries.sql (the verified Q→SQL corpus).
How it ships. databricks bundle deploy --target {dev|staging|prod}. The bundle resolves the target from a workspace mapping; secrets and warehouse names are environment-specific.
Code review. Every PR is reviewed by the analytics lead. Sample-query additions require an SME approval; instruction edits require a governance approval.

Cost governance.

Per-space query budget. Each certified space has a monthly cap on Serverless DBSQL DBUs. Exceeding the cap triggers a Slack alert; persistent overspend triggers a budget conversation.
Per-LOB foundation model spend. Each line of business (LOB) has a token budget. Marketing's spend on marketing_genie_space rolls up to marketing's overall AI budget, not the data team's.
Tag-based attribution. Every Genie warehouse is tagged with space_id, lob, env; cost reports group by tag.

Metrics that matter — the four KPIs.

% verified answers — share of answers labelled certified-or-verified. Target: >95% on certified prod.
% questions answered — share of asked questions that got an answer (not "low confidence"). Target: >90%.
p95 latency — end-to-end time from question submitted to chart drawn. Target: <2s on small fact tables; <5s on multi-billion-row joins.
Hallucination rate — share of answers later flagged as wrong by SMEs. Target: <1% on certified prod.

Anti-patterns to avoid.

Pointing Genie at the entire Hive metastore. "Let Genie figure it out" is the surest way to dilute retrieval. Always start narrow.
Skipping staging. Promoting dev directly to certified prod is the most common cause of exec-facing hallucinations. Staging is where the corpus is stress-tested.
Letting a draft space face execs. A grey-badge space is for analytics engineers, not for the CFO. Permission enforcement matters.
Treating instructions as docs. Instructions are prompts. Every sentence is injected; verbosity costs tokens and dilutes the signal. Keep each instruction one sentence, present-tense.

Common interview probes on production rollout.

"How do you version a Genie space?" — Databricks Asset Bundle. space.yml, instructions.md, sample-queries.sql in Git; deploy via databricks bundle deploy.
"Who signs off on a promotion to certified?" — the governance lead listed in the space's certified_by block. Convention is two signoffs (analytics lead + governance).
"What KPIs would you put on a Genie rollout dashboard?" — % verified, % answered, p95 latency, hallucination rate. The Big Four.
"How do you handle a hallucination that reached an exec?" — treat as a P1 incident. Pull the audit row, retrace the six stages, add an instruction or sample query, redeploy via bundle, post-mortem in the next week's triage.

Worked example — the rollout bundle file structure

Detailed explanation. A team adopts Asset Bundles for sales_genie_space. The repo layout, bundle file, and promotion script are the artefacts every reviewer needs to see in code review.

Question. Lay out the Git repo + bundle file for sales_genie_space such that the dev / staging / prod environments share the same code with environment-specific warehouses and certifiers.

Input — directory tree.

genie/
├── databricks.yml                # bundle root
├── targets/
│   ├── dev.yml
│   ├── staging.yml
│   └── prod.yml
└── spaces/
    └── sales/
        ├── space.yml             # datasets, joins, permissions
        ├── instructions.md       # standing rules
        └── sample-queries.yml    # Q -> SQL pairs

Code.

# genie/databricks.yml
bundle:
  name: sales-genie

include:
  - spaces/sales/*.yml

targets:
  dev:
    workspace:
      host: https://dbc-dev.cloud.databricks.com
    variables:
      space_suffix: "-dev"
      warehouse:    dev-serverless-xs
      certifiers:   [] # no certifiers in dev

  staging:
    workspace:
      host: https://dbc-staging.cloud.databricks.com
    variables:
      space_suffix: "-staging"
      warehouse:    stage-serverless-s
      certifiers:   ["alice.chen@co"]

  prod:
    workspace:
      host: https://dbc-prod.cloud.databricks.com
    variables:
      space_suffix: ""
      warehouse:    prod-serverless-m
      certifiers:   ["alice.chen@co", "bob.patel@co"]

# Promotion script — same code, three targets
databricks bundle deploy --target dev      # iterate
databricks bundle deploy --target staging  # SME stress-test
databricks bundle deploy --target prod     # certify

Step-by-step explanation.

The repo has one space.yml, one instructions.md, one sample-queries.yml. The same files are deployed to all three environments — no copy-paste.
The targets/*.yml files override only the environment-specific bits: workspace host, warehouse size, certifier list. Dev has no certifiers (no certified badge); prod has two.
The promotion script is three identical commands with a different --target. Code review on the PR happens once; the same code lands in dev, staging, prod after the appropriate approvals.
The space_suffix variable disambiguates the same space name across environments (sales_genie_space-dev, sales_genie_space-staging, sales_genie_space).

Output (after bundle deploy).

Environment	Space name	Warehouse	Badge	Audience
dev	`sales_genie_space-dev`	dev-serverless-xs	grey ⚠	data_eng
staging	`sales_genie_space-staging`	stage-serverless-s	blue ✓	data_eng + SMEs
prod	`sales_genie_space`	prod-serverless-m	green ✓✓	analytics + execs

Rule of thumb. The same space.yml deploys to all three environments. Anything environment-specific lives in targets/*.yml. If you find yourself copy-pasting space.yml between branches, your bundle is wrong.

Worked example — the four KPIs on one dashboard

Detailed explanation. The analytics lead builds a Genie health dashboard that surfaces the four KPIs at a glance. Each KPI is one SQL query against the governance.genie_audit table.

Question. Write the four KPI queries for a certified Genie space's weekly health dashboard.

Input. governance.genie_audit populated for the last 7 days, partitioned by asked_at.

Code.

-- 1) % verified answers (target >= 95%)
SELECT
    space_id,
    ROUND(100.0 * SUM(CASE
        WHEN trust_label IN ('certified', 'verified') THEN 1 ELSE 0
    END) / COUNT(*), 2) AS pct_verified
FROM governance.genie_audit
WHERE asked_at >= current_date - INTERVAL 7 DAY
GROUP BY space_id;

-- 2) % questions answered (target >= 90%)
SELECT
    space_id,
    ROUND(100.0 * SUM(CASE
        WHEN trust_label != 'low_confidence' THEN 1 ELSE 0
    END) / COUNT(*), 2) AS pct_answered
FROM governance.genie_audit
WHERE asked_at >= current_date - INTERVAL 7 DAY
GROUP BY space_id;

-- 3) p95 end-to-end latency (target < 2000ms on small fact tables)
SELECT
    space_id,
    APPROX_PERCENTILE(latency_ms, 0.95) AS p95_latency_ms
FROM governance.genie_audit
WHERE asked_at >= current_date - INTERVAL 7 DAY
GROUP BY space_id;

-- 4) Hallucination rate (target < 1%) -- requires SME-marked wrong flag
SELECT
    space_id,
    ROUND(100.0 * SUM(CASE
        WHEN sme_marked_wrong = true THEN 1 ELSE 0
    END) / COUNT(*), 3) AS hallucination_rate_pct
FROM governance.genie_audit
WHERE asked_at >= current_date - INTERVAL 7 DAY
GROUP BY space_id;

Step-by-step explanation.

The first KPI buckets answers into "verified-or-certified" vs "not." A weekly window smooths daily noise; a monthly window would hide regressions too long.
The second KPI inverts the low_confidence count to express "answered" as the positive case. This is the metric leadership wants to see going up.
The third KPI uses APPROX_PERCENTILE — Spark SQL's standard implementation. p95 is the convention because p99 is too noisy on small daily samples; p95 captures user-felt latency.
The fourth KPI depends on the sme_marked_wrong flag in the audit row. That flag is populated by the SME during weekly triage — every wrong answer they discover marks the row. The KPI is "rate that we know we got wrong"; the unknown-wrong rate is necessarily harder to measure.

Output.

space_id	pct_verified	pct_answered	p95_latency_ms	hallucination_rate_pct
sales_certified	96.8	93.2	1,420	0.40
marketing_certified	91.5	88.0	1,810	0.85
finance_certified	98.4	95.7	990	0.10

Rule of thumb. The four KPIs all share one window (last 7 days), one table (governance.genie_audit), and one grouping (space_id). Build them as four tiles on one dashboard, refreshed nightly. Any space whose number falls below target gets a weekly Slack notification to its owner.

Databricks Genie interview question — design the SME feedback channel

A senior interviewer often poses: "How would you wire a Slack/Teams feedback loop so that questions Genie answered wrong become tickets in the analytics backlog and, after triage, become commits in the corpus repo?"

Solution Using a Slack-Genie-Linear-Git pipeline

-- The shape of the backlog table that backs the workflow
CREATE OR REPLACE TABLE governance.genie_feedback (
    feedback_id     STRING PRIMARY KEY,
    reported_at     TIMESTAMP,
    space_id        STRING,
    answer_id       STRING REFERENCES governance.genie_audit(answer_id),
    reporter_email  STRING,

    -- the SME's complaint
    issue_type      STRING,   -- 'wrong_answer' / 'missing_data' / 'unclear'
    free_text       STRING,   -- pasted Slack message

    -- triage state
    triage_state    STRING,   -- 'new' / 'queued' / 'in_progress' / 'resolved' / 'wontfix'
    triage_owner    STRING,
    fix_type        STRING,   -- 'new_instruction' / 'new_sample_query'
                              -- / 'new_dataset' / 'out_of_scope'
    fix_commit_sha  STRING,
    resolved_on     TIMESTAMP
) USING DELTA
PARTITIONED BY (reported_at);

-- Weekly triage view -- everything still 'new' or 'queued'
CREATE OR REPLACE VIEW governance.genie_triage AS
SELECT
    f.feedback_id,
    f.space_id,
    f.reported_at,
    f.reporter_email,
    f.issue_type,
    f.free_text,
    a.question_text,
    a.drafted_sql,
    a.trust_label,
    f.triage_state,
    f.triage_owner
FROM governance.genie_feedback f
LEFT JOIN governance.genie_audit a USING (answer_id)
WHERE f.triage_state IN ('new', 'queued')
ORDER BY f.reported_at;

Step-by-step trace.

Step	Where it lives	Who acts
1. SME drops a Slack message in `#genie-feedback`	Slack	SME
2. Slack bot inserts a `genie_feedback` row	Slack -> Databricks	bot
3. Weekly triage meeting reads `genie_triage` view	dashboard	analytics lead + SME
4. Triage chooses fix_type (instruction / sample / dataset / wontfix)	meeting	team
5. Analytics engineer opens a PR against `sample-queries.yml` or `instructions.md`	Git	data_eng
6. PR approved + bundle deploys to staging	CI	CI runner
7. SME smoke-tests on staging	staging	SME
8. Bundle promotes to prod; `genie_feedback.triage_state` → resolved with `fix_commit_sha`	prod	CI runner

The loop is closed when the next ask of the original question lands on the new sample query and the trust label is certified. The feedback row carries the commit SHA, so the audit chain is complete.

Output:

Week	feedbacks_opened	feedbacks_resolved	avg_cycle_days	new_sample_queries
2026-W19	23	19	4.2	11
2026-W20	18	22	3.9	9
2026-W21	15	17	3.6	8

Why this works — concept by concept:

One backlog table backs the workflow — genie_feedback is the source of truth. Slack is just the inbox; Git is just the patch; the table is the ledger.
FK to the audit row — each feedback ties to the exact answer the SME complained about. No "which answer?" confusion at triage time.
Triage view filters to actionable rows — the analytics lead opens one dashboard and sees only new or queued items. Resolved rows stay in the table for historical analysis.
Cycle time is the headline KPI — average days from reported to resolved. Sub-week is the target; weeks-long cycle times mean the feedback loop is broken.
Commit SHA in the resolution — closes the loop. Audit can trace from "answer was wrong" → "this commit fixed it" → "next answer was certified."
Cost — Slack bot + Databricks insert is near-free. The value is in the discipline of writing every fix as a commit, not a UI edit.

SQL
Topic — ETL
ETL pipeline problems (SQL)

Practice →

SQL
Topic — window functions
Window function drills (SQL)

Practice →

Cheat sheet — Genie recipes

Start narrow. One curated star schema or one fact table per space — never the whole metastore. Retrieval quality drops sharply as the dataset count climbs past five.
Write instructions in plain English. Each one sentence, present-tense, audit-friendly: "revenue means net_revenue after returns; exclude test orders where is_test = true; fiscal year starts Feb 1."
Seed 10–20 sample queries per space. Every certified answer should map to at least one. The sample-query corpus is the anchor that turns retrieval into a near-deterministic draft.
Pin certified-by to two real humans. Never an alias like analytics-team. Quarterly re-cert against the HR roster; auto-downgrade when a certifier leaves.
Refresh instructions monthly; sample queries quarterly. Stale instructions are the most common cause of slow metric drift. Quarterly is the floor for the corpus.
Always expose Show SQL to readers. Trust = transparency. Hiding the SQL turns Genie into a black box that nobody can defend in a code review.
Roll out dev → staging → certified. Never let a grey-badge space face execs. Skipping staging is the most common cause of exec-facing hallucinations.
Treat NOT EXISTS as the join cousin of certified. Just like NULL-safe joins, "I'm sure this answer is certifiable" requires positive evidence: matched sample query + fresh instructions + named human owner. No evidence, no certified badge.
Version every artefact in Git. space.yml, instructions.md, sample-queries.yml in a Databricks Asset Bundle. UI-only edits do not survive an environment migration.
Cap per-space Serverless DBSQL spend. Each certified space gets a monthly DBU budget; persistent overspend is the trigger for "is this space being used as intended?"
Watch the four KPIs weekly. % verified, % answered, p95 latency, hallucination rate. One dashboard, one window (7 days), one grouping (space_id).
Promote unverified into certified via Git. Every unverified answer that an SME endorses becomes a PR against sample-queries.yml. No UI edits — the corpus is code.
Refuse > guess. When retrieval confidence is low, surface "I'm not confident — please rephrase." A refusal is a feature, not a failure; it drives the feedback loop.

Frequently asked questions

What is a Databricks Genie space?

A Databricks Genie space is a curated AI/BI surface inside the Databricks workspace that lets users ask questions in natural language and get a SQL-drafted answer with a trust badge. The space bundles six artefacts — datasets (tables the space can read), instructions (plain-English business rules), sample queries (verified Q→SQL pairs), join hints, a named human owner who certifies the space, and a permission matrix for chat / edit / certify. Every read flows through Unity Catalog ACLs and runs on Serverless DBSQL, so the answer respects the asker's row-level permissions while remaining fully auditable via the Show SQL button.

How is Genie different from a generic GPT-on-SQL plugin?

A generic GPT-on-SQL plugin points an LLM at the raw warehouse schema and hopes the model picks the right tables and columns. Databricks Genie inverts the contract: the model is constrained to a curated semantic layer of certified datasets, plain-English instructions ("revenue means net_revenue"), and a sample-query corpus that anchors retrieval. The model cannot reach for an ungoverned table, cannot bypass Unity Catalog row filters, and cannot draft unsafe DDL. Every answer carries a trust badge (certified / verified / unverified / low confidence) and is logged with the drafted SQL, the executed SQL after UC injection, and the matched sample query — so the answer is reconstructible end-to-end. A generic plugin gives you "chat on SQL"; Genie gives you "chat on a certified semantic layer."

What is the difference between verified, certified and unverified answers?

The four trust labels map evidence to audience. Certified (green ✓✓) means the drafted SQL matched a sample query 1:1 inside a space that itself has a named human certifier within the last 90 days — these are the only answers safe for exec-facing dashboards. Verified (blue ✓) means retrieval supported the draft (instructions and datasets matched) but no direct sample-query hit — safe for analyst use, not for executive consumption. Unverified (grey ⚠) means Genie drafted SQL from retrieval but nobody has reviewed the answer — allowed only in sandbox/dev spaces. Hallucination flag / low confidence (red ✗) means Genie self-declared low retrieval confidence and refused to surface a SQL answer — the user is asked to rephrase. The badge in the corner of the answer card is the contract between the AI/BI surface and its audience.

Does Genie write SQL on its own or does it always use sample queries?

Both. The drafting stage of the text-to-SQL pipeline retrieves the top-k items from three buckets — dataset schemas, instructions, and sample queries — and the foundation model weaves them into a query. If a sample query matches the question 1:1 (cosine score above the retrieval threshold), the model essentially templates that sample query and the answer is labelled certified. If the retrieval is weaker (instructions matched but no sample query), the model drafts from scratch using the retrieved context, and the answer is labelled verified or unverified depending on whether the question has been asked-and-reviewed before. Even when drafting from scratch, the model is constrained to the dataset list — it cannot reach for an ungoverned table — and the validation stage rejects any drafted SQL that fails parser or UC checks. The sample-query corpus is what makes the corpus certified; the model can still answer beyond it, just with a weaker trust badge.

Can Genie read data outside Unity Catalog?

No — Genie is a Unity Catalog-native product and every read goes through UC row filters, column masks, and table ACLs. The space's datasets list is itself a UC-qualified set of fully-qualified table names; the space cannot reference Hive metastore tables or arbitrary cloud paths. At execute time, UC injects the caller's row-filter clauses into the SQL, so two users asking the same English question against the same Genie space may see different numbers if their UC permissions differ. The audit row captures both the drafted SQL (before UC injection) and the executed SQL (after injection), so the enforcement is verifiable. This is the design that makes Genie safe to expose across functions — finance, HR, operations — from a single chat surface.

How do I prevent Genie from hallucinating on my warehouse?

Hallucination prevention in Databricks Genie is a process, not a feature. Five levers, in order of impact: (1) start narrow — point each space at one star schema or one fact table, not the entire metastore; (2) write standing instructions for the four-to-five most-violated business rules (revenue definition, test-order exclusion, fiscal calendar, vocabulary aliases) — each as a one-sentence English rule that gets injected on every prompt; (3) seed a sample-query corpus of 10–20 verified Q→SQL pairs and grow it via weekly unverified-answer triage; (4) rely on validation — drafted SQL that fails the UC parser or dry-run is rejected before execute, and the retry loop usually fixes it within one extra pass; (5) ship via Asset Bundles with a dev → staging → certified prod rollout, and require a governance signoff before the badge flips to certified. Combine these and the hallucination rate (SME-flagged wrong answers) lands below 1% on certified prod — the published SLA most teams use to justify exec exposure.

Practice on PipeCode

Drill the aggregation practice library → for the GROUP BY / SUM / AVG patterns Genie drafts most often.
Rehearse on joins problems → for the multi-table BI queries that need explicit FK hints in the space.
Sharpen CTE and subquery drills → for the WITH-clause sample queries every certified space ships.
Stack the case-expression library → for the fiscal-quarter and bucketing logic that the instructions encode.
Layer the conditional-logic drills → for the IF / CASE patterns Genie translates from natural language.
Build the BI muscles with the dimensional-modeling library → for the fact-vs-dimension calls every space owner has to make.
Run the window-functions drills → for the ranking and YoY patterns that show up in sample queries.
Sharpen the ranking practice library → for the "top N customers" answers exec dashboards demand.
Stack the ETL practice library → for the pipelines that prepare the certified Delta tables Genie reads.
Drill the Databricks company practice set → for the interview surface specifically.
For the broader surface, read top data engineering interview questions →.
Stack the prerequisites with the only 5 skills you need to become a data engineer →.
Sharpen the SQL axis with the SQL for data engineering interviews course →.
Build the platform fluency with the ETL system design course → and the data modelling for DE interviews course →.

Pipecode.ai is Leetcode for Data Engineering — every Genie recipe above ships with hands-on practice rooms where you write the GROUP BY, the LEFT JOIN + COALESCE fallback, the window-function ranking, and the CTE-shaped sample queries that the certified corpus is made of. PipeCode pairs every reading with 450+ DE-focused problems and a real-time scoring engine, so by the time you stand up your first AI/BI Genie space, you already know what the foundation model is going to draft — because you have drafted it yourself.

Practice aggregation now →
Databricks company drills →

1. What Genie actually is — and what it is not

databricks genie is a curated, governed AI/BI surface on top of Unity Catalog — not a thin chat-on-warehouse wrapper

Worked example — a "GPT-on-warehouse" pilot that fails without the semantic layer

Worked example — Unity Catalog ACLs constrain every Genie read

Databricks Genie interview question — explain the cost model

Solution Using the three-layer Genie cost model

2. Anatomy of a Genie space

Every certified ai/bi genie space is six compartments plus a trust badge — datasets, instructions, sample queries, joins, certified-by, permissions

Worked example — minimal certified sales_genie_space

Worked example — instructions that fix a hallucination in one line

Databricks Genie interview question — design a certified marketing space

Solution Using the six-compartment template

3. The text-to-SQL flow under the hood

Every Genie answer goes through six stages — rewrite → retrieve → draft → validate → execute → audit — and any of the six can be the failure point

Worked example — tracing one question through the six stages

Worked example — when validation rescues a hallucination

Databricks Genie interview question — explain the retry budget

Solution Using a bounded retry budget with explicit fallback

4. The trust spectrum — verified, certified, hallucination

Every Genie answer carries one of four trust labels — certified, verified, unverified, hallucination flag — and the badge is the contract between AI/BI and the exec audience

Worked example — moving an unverified question into the certified corpus

Worked example — quarterly re-cert ritual

Databricks Genie interview question — designing the audit table for trust labels

Solution Using a denormalised audit fact with FK links to the corpus

5. Production rollout topology and governance

Promote Genie spaces like code — dev → staging → certified prod — with a Slack feedback loop and Git-versioned Asset Bundles

Worked example — the rollout bundle file structure

Worked example — the four KPIs on one dashboard

Databricks Genie interview question — design the SME feedback channel

Solution Using a Slack-Genie-Linear-Git pipeline

Cheat sheet — Genie recipes

Frequently asked questions

What is a Databricks Genie space?

How is Genie different from a generic GPT-on-SQL plugin?

What is the difference between verified, certified and unverified answers?

Does Genie write SQL on its own or does it always use sample queries?

Can Genie read data outside Unity Catalog?

How do I prevent Genie from hallucinating on my warehouse?

Practice on PipeCode

`databricks genie` is a curated, governed AI/BI surface on top of Unity Catalog — not a thin chat-on-warehouse wrapper

Every certified `ai/bi genie space` is six compartments plus a trust badge — datasets, instructions, sample queries, joins, certified-by, permissions

Worked example — minimal certified `sales_genie_space`

Every Genie answer carries one of four trust labels — `certified`, `verified`, `unverified`, `hallucination flag` — and the badge is the contract between AI/BI and the exec audience

Promote Genie spaces like code — `dev → staging → certified prod` — with a Slack feedback loop and Git-versioned Asset Bundles