DEV Community

Naveen Ayalla

Posted on • Originally published at community.databricks.com

# Moving RAG From Demo to Production on Databricks: A Developer-Focused Checklist

By Naveen Ayalla

This article is adapted from my original post in the Databricks Community and is shared here for developers, data engineers, and GenAI practitioners building production AI workflows.

A RAG demo is far easier to build than a production RAG system.

For a demo, you can upload documents, create embeddings, connect an LLM, ask a question, and return an answer.

That is a great starting point.

But production needs more than a working answer.

A production RAG workflow has to answer questions like:

  • Is the source data trusted?
  • Is the user allowed to access this content?
  • Did the system retrieve the right context?
  • Is the answer grounded in that context?
  • Can we monitor quality, latency, cost, and failures?
  • Who owns the data and the workflow after launch?

When these questions go unanswered, many GenAI projects stall after the demo stage.

Below is a practical checklist I use when thinking about RAG workflows on Databricks.

Demo vs. Production

| Area | Demo Thinking | Production Thinking |
| --- | --- | --- |
| Data | Use sample documents. | Use trusted, current, approved data. |
| Access | Assume one access level. | Enforce user permissions and sensitive-data rules. |
| Retrieval | Return similar chunks. | Return the right context for the right user. |
| Response | Generate a helpful answer. | Answer only from supported context. |
| Evaluation | Try a few test prompts. | Measure retrieval quality, groundedness, correctness, and failures. |
| Monitoring | Check usage. | Track quality, latency, cost, errors, and feedback. |
| Ownership | AI team owns everything. | Data owners, platform teams, and business users share ownership. |

1. Start With a Narrow Use Case

The first mistake is trying to index everything.

A better starting point is one clear use case.

Examples:

  • Help support teams answer product questions faster.
  • Help analysts search internal documentation.
  • Help engineers troubleshoot pipeline failures.
  • Help business users understand policy documents.

A narrow use case helps you choose better data, test better questions, and measure value more clearly.

2. Use Data You Can Trust

Not every document should go into a RAG system.

Before indexing content, ask:

  • Who owns the data?
  • Is it current?
  • Is it approved for this use case?
  • Does it include sensitive information?
  • Which users should be allowed to see it?

If the source data is outdated or poorly governed, the generated answer will not be reliable.
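One way to make these questions enforceable is a small gate that runs before indexing. The sketch below is illustrative only: the `SourceDoc` fields, the `indexing_issues` helper, and the 365-day freshness limit are assumptions, not part of any Databricks API.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class SourceDoc:
    doc_id: str
    owner: Optional[str]   # None means nobody has claimed the document
    updated: date          # last time the content was revised
    approved: bool         # approved for this specific use case
    sensitive: bool        # contains data that needs extra review


def indexing_issues(doc: SourceDoc, today: date, max_age_days: int = 365) -> List[str]:
    """Return the reasons a document should not be indexed (empty list = OK)."""
    issues = []
    if not doc.owner:
        issues.append("no owner")
    if today - doc.updated > timedelta(days=max_age_days):
        issues.append("stale")
    if not doc.approved:
        issues.append("not approved")
    if doc.sensitive:
        issues.append("sensitive: needs review")
    return issues


# Usage: only documents with no issues move on to chunking and embedding.
doc = SourceDoc("policy-7", "hr-team", date(2024, 5, 1), approved=True, sensitive=False)
ready = not indexing_issues(doc, today=date(2024, 6, 1))
```

Returning a list of reasons instead of a single boolean makes it easier to report back to data owners why a document was skipped.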

3. Add Metadata Early

Metadata is easy to skip in a demo, but it becomes very useful in production.

Useful metadata includes:

  • document owner
  • source system
  • updated date
  • department
  • product name
  • region
  • sensitivity level
  • access group

Metadata helps with filtering, debugging, governance, and retrieval quality.

For example, if two documents answer the same question but one is newer, metadata can help the system prefer the latest source.
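A minimal sketch of that idea, assuming each retrieved chunk carries an `updated` date and a similarity `score`. The `Chunk` type, the `prefer_recent` helper, and the `margin` value are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Chunk:
    text: str
    source: str
    updated: date   # taken from the document's metadata
    score: float    # similarity score from the retriever, higher is better


def prefer_recent(chunks: List[Chunk], margin: float = 0.05) -> List[Chunk]:
    """Rank by score, but let the newest source win among near-ties."""
    if not chunks:
        return []
    best = max(c.score for c in chunks)
    # Chunks within `margin` of the best score are treated as equally relevant.
    near_top = [c for c in chunks if best - c.score <= margin]
    rest = [c for c in chunks if best - c.score > margin]
    near_top.sort(key=lambda c: c.updated, reverse=True)
    rest.sort(key=lambda c: c.score, reverse=True)
    return near_top + rest


results = prefer_recent([
    Chunk("Refunds within 14 days.", "policy_v1", date(2022, 1, 1), 0.91),
    Chunk("Refunds within 30 days.", "policy_v2", date(2024, 3, 1), 0.89),
    Chunk("Shipping takes 5 days.", "shipping", date(2024, 5, 1), 0.60),
])
# The newer policy_v2 is now ranked ahead of policy_v1.
```

Recency only breaks ties here. A clearly more relevant but older chunk still ranks first, which avoids promoting a fresh but off-topic document.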

4. Build Access Control Into Retrieval

In enterprise RAG, access control cannot be an afterthought.

If a user cannot access a document directly, they should not be able to access it through an AI assistant.

This means the retrieval layer should respect permissions, sensitivity rules, and data ownership.

On Databricks, this is where a governed lakehouse design becomes important. The AI workflow should follow the same governance principles as the rest of the data platform.
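The rule can be sketched in plain Python. This is not a Databricks API: the `Chunk` type, the `access_groups` field, and `retrieve_for_user` are assumptions for illustration, and in a real deployment the platform's own permission model should remain the source of truth.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Chunk:
    text: str
    access_groups: FrozenSet[str]   # groups allowed to read the source document
    score: float                    # similarity score, higher is better


def retrieve_for_user(candidates: List[Chunk], user_groups: FrozenSet[str], k: int = 3) -> List[Chunk]:
    """Drop chunks the user may not read before taking the top k."""
    # A chunk with no access groups matches nobody, so the default is deny.
    allowed = [c for c in candidates if c.access_groups & user_groups]
    allowed.sort(key=lambda c: c.score, reverse=True)
    return allowed[:k]


candidates = [
    Chunk("Salary bands for 2024", frozenset({"hr"}), 0.95),
    Chunk("Holiday policy", frozenset({"hr", "all-staff"}), 0.80),
    Chunk("VPN setup guide", frozenset({"all-staff"}), 0.40),
]
visible = retrieve_for_user(candidates, frozenset({"all-staff"}), k=2)
# The salary chunk never reaches the prompt, even though it scored highest.
```

Filtering before the top-k cut matters. If the filter ran after it, restricted chunks could fill the result set and leave the user with little or no usable context.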

5. Evaluate Retrieval and Generation Separately

When a RAG answer is wrong, it is important to know why.

The issue may be retrieval.
The issue may be the model.
The issue may be missing data.
The issue may be stale content.
The issue may be bad chunking.

That is why I prefer to evaluate retrieval and answer generation separately.

| Evaluation Area | Main Question |
| --- | --- |
| Retrieval quality | Did the system retrieve the right context? |
| Answer quality | Did the model use the context correctly? |

This makes debugging much easier.
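Here is a rough sketch of how that separation could look on a labeled test set. It assumes each case records which document IDs a reviewer marked as relevant and whether the final answer was graded correct. `EvalCase`, `recall_at_k`, and `triage` are hypothetical names, and every case is assumed to have at least one relevant ID.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class EvalCase:
    question: str
    relevant_ids: Set[str]      # doc IDs a reviewer marked as containing the answer
    retrieved_ids: List[str]    # what the retriever returned, in rank order
    answer_correct: bool        # graded by a human or a separate judging step


def recall_at_k(case: EvalCase, k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    found = case.relevant_ids & set(case.retrieved_ids[:k])
    return len(found) / len(case.relevant_ids)


def triage(case: EvalCase, k: int) -> str:
    """Attribute each result to the stage that most likely explains it."""
    retrieved_ok = recall_at_k(case, k) > 0
    if case.answer_correct:
        # A correct answer without supporting context deserves a closer look.
        return "ok" if retrieved_ok else "correct_without_context"
    return "generation" if retrieved_ok else "retrieval"


cases = [
    EvalCase("q1", {"d1", "d2"}, ["d1", "d9", "d2"], True),
    EvalCase("q2", {"d3"}, ["d7", "d8"], False),
    EvalCase("q3", {"d4"}, ["d4"], False),
]
labels = [triage(c, k=2) for c in cases]
```

Counting the labels over a whole test set shows where effort should go: many `retrieval` failures point at chunking, metadata, or missing data, while many `generation` failures point at the prompt or the model.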

6. Tell the Model When to Stop

One of the most useful production rules is simple:

If the retrieved context is not enough, say that the information is not available instead of guessing.

For internal business users, a confident wrong answer is worse than a clear limitation.

A good RAG system should know when not to answer.
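One possible way to apply this rule is in two places: skip the model call when nothing relevant was retrieved, and instruct the model to refuse when the context falls short. In the sketch below, `generate` stands in for whatever LLM call you use, and `FALLBACK`, `build_prompt`, `answer`, and the `min_score` threshold are all assumptions.

```python
from typing import Callable, List, Tuple

FALLBACK = "I could not find this information in the available documents."


def build_prompt(question: str, contexts: List[str]) -> str:
    """Build a prompt that tells the model to refuse rather than guess."""
    joined = "\n---\n".join(contexts)
    return (
        "Answer using only the context below. "
        f"If the context does not contain the answer, reply exactly: {FALLBACK}\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}"
    )


def answer(
    question: str,
    retrieved: List[Tuple[str, float]],   # (chunk text, similarity score)
    generate: Callable[[str], str],       # hypothetical LLM call: prompt -> text
    min_score: float = 0.5,
) -> str:
    usable = [text for text, score in retrieved if score >= min_score]
    if not usable:
        # Nothing to ground the answer on, so do not call the model at all.
        return FALLBACK
    return generate(build_prompt(question, usable))
```

The threshold is a tuning decision. It is worth calibrating against real questions, since similarity scores are not comparable across embedding models.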

7. Monitor After Launch

A RAG system changes after it goes live.

Users ask new questions.
Documents get updated.
Models change.
Costs change.
Business rules change.

After launch, monitor:

  • user feedback
  • failed questions
  • retrieval quality
  • latency
  • cost
  • error rate
  • outdated sources
  • low-confidence answers

Monitoring should feed back into better data preparation, improved metadata, better prompts, and stronger evaluation datasets.
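As a starting point, several of these signals can be computed from per-request logs. The sketch below assumes a simple log record. `RequestLog` and `summarize` are illustrative names, and the fields are assumptions about what the application records.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RequestLog:
    latency_ms: float
    cost_usd: float
    error: bool
    thumbs_up: Optional[bool]   # None when the user gave no feedback


def summarize(logs: List[RequestLog]) -> Dict[str, float]:
    """Aggregate request logs into a few health metrics."""
    if not logs:
        raise ValueError("no logs to summarize")
    n = len(logs)
    latencies = sorted(log.latency_ms for log in logs)
    p95 = latencies[math.ceil(0.95 * n) - 1]   # nearest-rank percentile
    rated = [log for log in logs if log.thumbs_up is not None]
    negative = sum(1 for log in rated if not log.thumbs_up)
    return {
        "requests": n,
        "error_rate": sum(log.error for log in logs) / n,
        "p95_latency_ms": p95,
        "total_cost_usd": sum(log.cost_usd for log in logs),
        # Only rated requests count; 0.0 when nobody left feedback.
        "negative_feedback_rate": negative / len(rated) if rated else 0.0,
    }


summary = summarize([
    RequestLog(100, 0.01, False, True),
    RequestLog(200, 0.01, False, None),
    RequestLog(300, 0.02, True, False),
    RequestLog(400, 0.01, False, True),
])
```

Questions that received negative feedback or triggered errors are good candidates for the evaluation dataset, which closes the loop described above.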

Final Thought

Production RAG is not just an LLM connected to a vector index.

It is a governed data product.

It needs trusted data, metadata, permissions, evaluation, monitoring, and clear ownership.

Databricks can be a strong foundation for this kind of workflow because data engineering, governance, machine learning, and AI workflows can be connected through the lakehouse approach.

I would like to hear from other developers and data engineers:

What has been the hardest part of moving RAG from demo to production: access control, retrieval quality, evaluation, monitoring, cost, or user adoption?

Original post in the Databricks Community: https://community.databricks.com/t5/data-engineering/from-rag-demo-to-production-on-databricks-7-things-teams-should/m-p/158526#M54730
