
clemra

Posted on • Originally published at llamaindex.ai

RAG observability in 2 lines of code with Llama Index & Langfuse

Why you need observability for RAG

There are many different ways to make RAG work for a given use case: which vector store to use, which retrieval strategy to apply, and so on. LlamaIndex makes it easy to try many of them without having to deal with the complexity of integrations, prompts, and memory all at once.

Before building Langfuse, we worked on complex RAG/agent applications ourselves and quickly realized that there is a need for observability and experimentation tooling to tweak and iterate on the details. In the end, these details are what take you from something cool to a reliable RAG application that is safe for users and customers. Think of it this way: if there is a user session of interest in your production RAG application, how quickly can you check whether the retrieved context for that session was actually relevant and whether the LLM response was on point?

What is Langfuse?

Thus, we started working on Langfuse.com (GitHub) to build an open source LLM engineering platform with tightly integrated features for tracing, prompt management, and evaluation. In the beginning we just solved our own and our friends' problems. Today, more than 1,000 projects rely on Langfuse, and the repository has 2.3k stars on GitHub. You can either self-host Langfuse or use the cloud instance we maintain.

We are thrilled to announce our new integration with LlamaIndex today. This feature was highly requested by our community and aligns with our project's focus on native integration with major application frameworks. Thank you to everyone who contributed and tested it during the beta phase!

Integrating Llama Index with Langfuse for open source observability and analytics

The challenge

We love LlamaIndex because its clean, standardized interface abstracts away a lot of complexity. Let’s take this simple example of a VectorStoreIndex and a ChatEngine.

from llama_index.core import SimpleDirectoryReader
from llama_index.core import VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()

index = VectorStoreIndex.from_documents(documents)

chat_engine = index.as_chat_engine()

print(chat_engine.chat("What problems can I solve with RAG?"))
print(chat_engine.chat("How do I optimize my RAG application?"))

In just three lines we loaded our local documents, added them to an index, and initialized a ChatEngine with memory. We then had a stateful conversation with the chat_engine.

This is awesome for getting started, but we quickly run into questions like the following (a manual-inspection sketch follows the list):

  • “What context is actually retrieved from the index to answer the questions?”
  • “How is chat memory managed?”
  • “Which steps add the most latency to the overall execution? How to optimize it?”
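For the first of these questions, you can poke at the response object by hand: LlamaIndex chat responses typically expose the retrieved chunks as source_nodes. Here is a minimal sketch reusing the chat_engine from above (the attribute names assume the standard LlamaIndex response objects), though doing this for every call quickly becomes tedious:

# Inspect what the ChatEngine retrieved for a single call.
response = chat_engine.chat("What problems can I solve with RAG?")

print(response.response)  # the final LLM answer
for node_with_score in response.source_nodes:
    # Each retrieved chunk carries a similarity score and its text content.
    print("score:", node_with_score.score)
    print(node_with_score.node.get_content()[:200])

This works for spot checks in a notebook, but it does not answer the memory or latency questions, and it does not scale to production traffic.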

One-click OSS observability to the rescue

We built the Langfuse integration to be a one-click setup for LlamaIndex via LlamaIndex's global callback manager.

Preparation

  1. Install the community package (pip install llama-index-callbacks-langfuse)
  2. Copy the environment variables from the Langfuse project settings into your Python project: LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, and LANGFUSE_HOST (a sketch for setting them in code follows below)
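If you prefer to set the variables in code (for example in a notebook) instead of a .env file, a minimal sketch with placeholder values looks like this:

import os

# Placeholder credentials; copy the real values from your Langfuse project settings.
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
# Point this at your self-hosted instance or at Langfuse Cloud.
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"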

Now you only need to set the global Langfuse handler:

from llama_index.core import set_global_handler

set_global_handler("langfuse")

And voilà, with just two lines of code you get detailed traces for all aspects of your RAG application in Langfuse. They automatically include latency and usage/cost breakdowns.
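Putting it together with the example from above, the only change is registering the handler before building the index and chatting; everything else stays exactly the same:

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, set_global_handler

# Register Langfuse as the global callback handler (reads the LANGFUSE_* environment variables).
set_global_handler("langfuse")

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
chat_engine = index.as_chat_engine()

# Both calls are now traced in Langfuse, including retrieval, LLM calls, latency, and cost.
print(chat_engine.chat("What problems can I solve with RAG?"))
print(chat_engine.chat("How do I optimize my RAG application?"))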

Group multiple chat threads into a session

Working with lots of teams building GenAI/LLM/RAG applications, we’ve continuously added features that are useful for debugging and improving these applications. One example is session tracking for conversational applications, which lets you see traces in the context of a full message thread.

To activate it, just set an id that identifies the session as a trace parameter before calling the chat_engine.

from llama_index.core import global_handler

global_handler.set_trace_params(
    session_id="your-session-id"
)

chat_engine.chat("What did he do growing up?")
chat_engine.chat("What did he do at USC?")
chat_engine.chat("How old is he?")

All of these chat invocations are then grouped into a session view in Langfuse Tracing:

RAG Observability with Langfuse

In addition to sessions, you can also track individual users or add tags and metadata to your Langfuse traces.
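For illustration, here is a sketch of how that could look. We assume that set_trace_params accepts the usual Langfuse trace attributes (user_id, tags, metadata) alongside session_id; the values themselves are hypothetical:

from llama_index.core import global_handler

# Hypothetical values; user_id, tags, and metadata are standard Langfuse trace
# attributes, assumed here to be forwarded by set_trace_params like session_id.
global_handler.set_trace_params(
    session_id="your-session-id",
    user_id="user-1234",
    tags=["production", "rag-chat"],
    metadata={"experiment": "retrieval-v2"},
)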

Trace more complex applications and use other Langfuse features for prompt management and evaluation

This integration makes it easy to get started with tracing. If your application grows to include custom logic or other frameworks/packages, all Langfuse integrations are fully interoperable.

We have also built additional features to version-control and collaborate on prompts (Langfuse Prompt Management), track experiments, and evaluate production traces. For RAG specifically, we collaborated with the RAGAS team, and it’s easy to run their popular eval suite on traces captured with Langfuse (see cookbook).

Get started

The easiest way to get started is to follow the cookbook and check out the docs.

Feedback? Ping us

We’d love to hear any feedback. Come join us on our community Discord or add your thoughts to this GitHub thread.

Top comments (3)

Matija Sosic

A very nice overview and intro to Langfuse, thanks for sharing!

Maximilian Deichmann

Really happy to have this feature out the door. Looking forward to merging this into the new LlamaIndex observation implementation: docs.llamaindex.ai/en/stable/modul...

Marc Klingen

thanks to everyone who contributed to this

questions -> dm on twitter: x.com/marcklingen