Naveen Ayalla

Posted on Jun 8 • Originally published at community.databricks.com

# Moving RAG From Demo to Production on Databricks: A Developer-Focused Checklist



This article is adapted from my original post in the Databricks Community and is shared here for developers, data engineers, and GenAI practitioners building production AI workflows.

A RAG demo is far easier to build than a production RAG system.

For a demo, you can upload documents, create embeddings, connect an LLM, ask a question, and return an answer.

That is a great starting point.

But production needs more than a working answer.

A production RAG workflow has to answer questions like:

Is the source data trusted?
Is the user allowed to access this content?
Did the system retrieve the right context?
Is the answer grounded in that context?
Can we monitor quality, latency, cost, and failures?
Who owns the data and the workflow after launch?

Many GenAI projects stall after the demo stage because these questions were never answered.

Below is a practical checklist I use when thinking about RAG workflows on Databricks.

Demo vs. Production

| Area | Demo Thinking | Production Thinking |
| --- | --- | --- |
| Data | Use sample documents. | Use trusted, current, approved data. |
| Access | Assume one access level. | Enforce user permissions and sensitive-data rules. |
| Retrieval | Return similar chunks. | Return the right context for the right user. |
| Response | Generate a helpful answer. | Answer only from supported context. |
| Evaluation | Try a few test prompts. | Measure retrieval quality, groundedness, correctness, and failures. |
| Monitoring | Check usage. | Track quality, latency, cost, errors, and feedback. |
| Ownership | AI team owns everything. | Data owners, platform teams, and business users share ownership. |

1. Start With a Narrow Use Case

The first mistake is trying to index everything.

A better starting point is one clear use case.

Examples:

Help support teams answer product questions faster.
Help analysts search internal documentation.
Help engineers troubleshoot pipeline failures.
Help business users understand policy documents.

A narrow use case helps you choose better data, test better questions, and measure value more clearly.

2. Use Data You Can Trust

Not every document should go into a RAG system.

Before indexing content, ask:

Who owns the data?
Is it current?
Is it approved for this use case?
Does it include sensitive information?
Which users should be allowed to see it?

If the source data is outdated or poorly governed, the generated answer will not be reliable.
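The questions above can be turned into a gate that runs before anything is indexed. The following is a minimal sketch, not a Databricks API: the `SourceDoc` fields, the `index_blockers` helper, and the 365-day freshness window are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class SourceDoc:
    # All field names are hypothetical; map them to your own catalog metadata.
    doc_id: str
    owner: Optional[str]
    last_updated: date
    approved_for_use_case: bool
    contains_sensitive: bool
    access_groups: List[str] = field(default_factory=list)


def index_blockers(doc: SourceDoc, today: date, max_age_days: int = 365) -> List[str]:
    """Return the reasons a document should not be indexed yet (empty list means OK)."""
    blockers = []
    if not doc.owner:
        blockers.append("no owner")
    if today - doc.last_updated > timedelta(days=max_age_days):
        blockers.append("stale")
    if not doc.approved_for_use_case:
        blockers.append("not approved")
    # Sensitive content is only indexable once someone has said who may see it.
    if doc.contains_sensitive and not doc.access_groups:
        blockers.append("sensitive without access group")
    return blockers
```

Returning the reasons rather than a plain boolean makes it easier to hand rejected documents back to their owners with something actionable.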

3. Add Metadata Early

Metadata is easy to skip in a demo, but it becomes very useful in production.

Useful metadata includes:

document owner
source system
updated date
department
product name
region
sensitivity level
access group

Metadata helps with filtering, debugging, governance, and retrieval quality.

For example, if two documents answer the same question but one is newer, metadata can help the system prefer the latest source.
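One way to express "prefer the latest source" is to collapse retrieved chunks so that only the newest version of each document survives. This is a sketch under assumptions: `doc_family` is a hypothetical metadata key shared by all versions of the same document, and `updated` is the updated date from the list above.

```python
from datetime import date
from typing import Dict, List


def keep_latest_version(chunks: List[dict]) -> List[dict]:
    """Keep one chunk per document family, choosing the most recently updated."""
    latest: Dict[str, dict] = {}
    for chunk in chunks:
        family = chunk["doc_family"]  # hypothetical key linking versions of one document
        current = latest.get(family)
        if current is None or chunk["updated"] > current["updated"]:
            latest[family] = chunk
    return list(latest.values())


retrieved = [
    {"doc_family": "refund-policy", "updated": date(2023, 1, 10), "text": "Refunds within 14 days."},
    {"doc_family": "refund-policy", "updated": date(2024, 3, 2), "text": "Refunds within 30 days."},
    {"doc_family": "shipping-faq", "updated": date(2023, 7, 1), "text": "Ships in 3 days."},
]
fresh = keep_latest_version(retrieved)
```

Without the `updated` field there would be no principled way to choose between the two refund-policy chunks, which is the practical argument for capturing metadata at ingestion time.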

4. Build Access Control Into Retrieval

In enterprise RAG, access control cannot be an afterthought.

If a user cannot access a document directly, they should not be able to access it through an AI assistant.

This means the retrieval layer should respect permissions, sensitivity rules, and data ownership.

On Databricks, this is where a governed lakehouse design becomes important. The AI workflow should follow the same governance principles as the rest of the data platform.
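As a rough illustration of permission-aware retrieval, the sketch below drops any retrieved chunk whose access groups do not overlap with the user's groups. The `access_groups` key and the `filter_by_access` helper are assumptions for this example, not part of any Databricks API.

```python
from typing import Iterable, List


def filter_by_access(chunks: List[dict], user_groups: Iterable[str]) -> List[dict]:
    """Return only the chunks the user is allowed to see.

    Fails closed: a chunk with no access groups recorded is treated as restricted.
    """
    allowed = set(user_groups)
    return [
        chunk
        for chunk in chunks
        if allowed & set(chunk.get("access_groups", []))
    ]


chunks = [
    {"text": "Public pricing sheet", "access_groups": ["all-employees"]},
    {"text": "Salary bands", "access_groups": ["hr"]},
    {"text": "Untagged draft"},  # no access metadata, so it is never returned
]
visible = filter_by_access(chunks, ["all-employees", "support"])
```

In a real system the same rule is better applied inside the retrieval query itself, so restricted chunks never leave the index. Filtering after retrieval, as here, only shows the logic.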

5. Evaluate Retrieval and Generation Separately

When a RAG answer is wrong, it is important to know why.

The issue may be retrieval.
The issue may be the model.
The issue may be missing data.
The issue may be stale content.
The issue may be bad chunking.

That is why I prefer to evaluate retrieval and answer generation separately.

| Evaluation Area | Main Question |
| --- | --- |
| Retrieval quality | Did the system retrieve the right context? |
| Answer quality | Did the model use the context correctly? |

This makes debugging much easier.
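A small sketch of what separate evaluation can look like: score retrieval on its own with recall@k, then use that score to decide where a failure came from. The test-case fields (`retrieved_ids`, `expected_ids`, `answer_correct`) are hypothetical, and `answer_correct` is assumed to come from a human or automated grader.

```python
from typing import List


def recall_at_k(retrieved_ids: List[str], expected_ids: List[str], k: int) -> float:
    """Fraction of the expected documents that appear in the top-k retrieved results."""
    expected = set(expected_ids)
    if not expected:
        raise ValueError("expected_ids must not be empty")
    hits = expected & set(retrieved_ids[:k])
    return len(hits) / len(expected)


def diagnose(case: dict, k: int = 5) -> str:
    """Attribute a failed test case to retrieval or generation."""
    if recall_at_k(case["retrieved_ids"], case["expected_ids"], k) == 0.0:
        return "retrieval"   # the model never saw the right context
    if not case["answer_correct"]:
        return "generation"  # the context was there, but the answer was still wrong
    return "ok"
```

A case labeled `retrieval` points toward chunking, metadata, or missing data. A case labeled `generation` points toward the prompt or the model.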

6. Tell the Model When to Stop

One of the most useful production rules is simple:

If the retrieved context is not enough, say that the information is not available instead of guessing.

For internal business users, a confident wrong answer is worse than a clear limitation.

A good RAG system should know when not to answer.
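The stop rule can be enforced in two places: before the model is called, and in the prompt itself. The sketch below assumes each retrieved chunk carries a similarity `score`, that 0.5 is a sensible cutoff for your data, and that `generate` is whatever function calls your LLM. All three are placeholders.

```python
from typing import Callable, List

FALLBACK = "I could not find this information in the available sources."


def answer_or_decline(
    question: str,
    chunks: List[dict],
    generate: Callable[[str], str],
    min_score: float = 0.5,  # assumed threshold; tune it against your evaluation set
) -> str:
    """Answer from retrieved context, or decline when the context is too weak."""
    usable = [c for c in chunks if c["score"] >= min_score]
    if not usable:
        return FALLBACK  # skip the model call entirely
    context = "\n\n".join(c["text"] for c in usable)
    prompt = (
        "Answer using only the context below. "
        f"If the context is not enough, reply exactly: {FALLBACK}\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)
```

The threshold check catches the case where nothing relevant was retrieved. The prompt instruction covers the case where something was retrieved but does not actually answer the question.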

7. Monitor After Launch

A RAG system changes after it goes live.

Users ask new questions.
Documents get updated.
Models change.
Costs change.
Business rules change.

After launch, monitor:

user feedback
failed questions
retrieval quality
latency
cost
error rate
outdated sources
low-confidence answers

Monitoring should feed back into better data preparation, improved metadata, better prompts, and stronger evaluation datasets.
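As a starting point, several of the signals above can be computed from a per-request log. This sketch assumes each log record is a dict with hypothetical keys `latency_ms`, `cost_usd`, `error`, `declined`, and `feedback`. Where those records are stored and how they are collected is left open.

```python
import math
from typing import List


def summarize(logs: List[dict]) -> dict:
    """Aggregate per-request logs into a few headline monitoring metrics."""
    if not logs:
        raise ValueError("no logs to summarize")
    n = len(logs)
    latencies = sorted(r["latency_ms"] for r in logs)
    rated = [r for r in logs if r.get("feedback") is not None]
    return {
        "requests": n,
        "error_rate": sum(r["error"] for r in logs) / n,
        # Share of requests where the system declined to answer.
        "declined_rate": sum(r["declined"] for r in logs) / n,
        # Nearest-rank 95th percentile.
        "p95_latency_ms": latencies[math.ceil(0.95 * n) - 1],
        "total_cost_usd": sum(r["cost_usd"] for r in logs),
        # Only requests that received feedback count toward this rate.
        "negative_feedback_rate": (
            sum(r["feedback"] == "down" for r in rated) / len(rated) if rated else None
        ),
    }
```

Requests that were declined or rated negatively are good candidates for the next version of the evaluation dataset.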

Final Thought

Production RAG is not just an LLM connected to a vector index.

It is a governed data product.

It needs trusted data, metadata, permissions, evaluation, monitoring, and clear ownership.

Databricks can be a strong foundation for this kind of workflow because data engineering, governance, machine learning, and AI workflows can be connected through the lakehouse approach.

I would like to hear from other developers and data engineers:

What has been the hardest part of moving RAG from demo to production: access control, retrieval quality, evaluation, monitoring, cost, or user adoption?

Original post in the Databricks Community: https://community.databricks.com/t5/data-engineering/from-rag-demo-to-production-on-databricks-7-things-teams-should/m-p/158526#M54730
