Kerem Nalbant for epilot

How We Integrate AI in epilot - Chapter 2: Serverless RAG w/ LangChain & Weaviate

Introduction

In the previous chapter, I shared how we began our AI journey at epilot by implementing AI Email Summaries, helping users reduce email reading time by up to 87%. Encouraged by that success, we're now stepping up our AI capabilities with Retrieval-Augmented Generation (RAG) to provide smarter, contextually aware email suggestions.

WHY?

As we aim to scale our commodity business, investing in AI is crucial—not just for growth, but to significantly upgrade our product’s capabilities. Commodity segments often have a high volume of repetitive customer service requests. Our users need quick, context-aware email suggestions that understand:

  • Previous communications and organizational knowledge
  • Company-specific communication styles
  • Tailored relationships with each customer

Although Large Language Models (LLMs) are powerful, they're limited in accessing recent or company-specific data. Customizing LLM responses usually involves prompt engineering, RAG, or fine-tuning. Fine-tuning is resource-intensive and complex, making prompt engineering with RAG our clear choice.

Our Solution: Retrieval-Augmented Generation (RAG)

We implemented a RAG-based solution to retrieve and provide relevant context from past email threads and eventually expand to external data sources like documents and websites. Long-term, organizations using epilot will have fully configurable, customized knowledge bases accessible to all future AI features and AI agents.

This allows our users to respond to customer emails faster, improving communication quality and efficiency. On the end customer side, it means quicker, more accurate, and better overall service.

See It in Action


An end customer emails about documentation requirements for a renovation plan (Sanierungsfahrplan).

The epilot user doesn't waste time researching policies or manuals: they simply prompt our AI to generate a reply in English.

Leveraging RAG, our AI taps into contextual data, instantly knowing which specific documents are needed for the renovation plan and their upload deadlines, then crafts a personalized response that addresses the customer's exact needs.

Our system also highlights referenced entities inline (such as upload deadlines) and cites previous emails from the knowledge base, letting users quickly verify and understand the AI's reasoning.

Solution Components

To build a secure, scalable RAG system in a serverless environment, we chose:

LangChain

We use LangChain at epilot to integrate vector databases and LLMs, and to build powerful AI agents. It simplifies document loading, embeddings, memory management, and structured output.

Weaviate

After evaluating alternatives (like Pinecone, Chroma, and Qdrant), we selected Weaviate for its open-source, serverless architecture, strong community support, flexibility, and scalability. It follows security best practices while delivering high performance and cost-efficiency.

Presidio

Security and data privacy are essential. Amazon Bedrock has a zero-retention policy, and Weaviate offers encryption, GDPR compliance, and tenant isolation. But we needed an extra layer for handling sensitive PII data.
Presidio helps us redact this information before indexing, preventing AI hallucinations and protecting customer privacy.

LangSmith

LangSmith provides AI observability, performance monitoring, debugging, prompt management, and testing. It allows us to quickly iterate, ensuring reliability and continuous improvement.
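
Enabling LangSmith tracing is mostly configuration; for a LangChain application it comes down to a few standard environment variables (the project name below is illustrative):

import os

# Standard LangSmith configuration; tracing is picked up automatically by LangChain
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGCHAIN_PROJECT"] = "email-suggestions"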

How We Built It

Now, let's dive deeper—from a high-level overview into the detailed implementation of our RAG system:

RAG: Making LLMs Context-Aware

Retrieval-Augmented Generation (RAG) emerged as the natural fit: it lets us enhance the LLM's capabilities and customize its responses by providing relevant context at generation time.

We built two core pipelines: ingestion and retrieval.

Ingestion

Email messages are processed, cleaned, and converted into vector embeddings.
Ingestion Flow

Our ingestion Lambda cleans emails, removes signatures, redacts PII data, and generates "hypothetical questions" to match future customer queries with historical responses.

With the hypothetical questions approach, we aim to create question-answer pairs by treating outbound emails as answers and inbound emails as questions. Then, while generating a suggested email, we extract the end customer's questions from the inbound email and search them in the hypothetical_questions vector field.

from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

# Structured-output schema (simplified sketch; the exact field set is assumed)
class HypotheticalQuestions(BaseModel):
    questions: list[str] = Field(description="Hypothetical questions this email could answer")

chain = (
    # Map the incoming email object to the prompt variable "doc"
    {"doc": lambda x: x.text}
    | ChatPromptTemplate.from_messages(
        [
            (
                "system",
                "You are a helpful assistant that generates hypothetical questions from an email.",
            ),
            (
                "human",
                "Generate a list of maximum 3 hypothetical questions that the below email could be used to answer:\n\n{doc}",
            ),
        ]
    )
    # Parse the LLM response directly into the Pydantic model ...
    | llm.with_structured_output(HypotheticalQuestions)
    # ... and keep only the list of questions
    | (lambda x: x.questions)
)

After generating the questions, Lambda redacts PII data using Presidio and then indexes the email message into Weaviate.
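
As an illustration, a minimal redaction step with Presidio's analyzer and anonymizer could look like this (the helper is our sketch, not epilot's actual code):

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def redact_pii(text: str) -> str:
    # Detect PII spans (names, email addresses, phone numbers, ...) in the text
    results = analyzer.analyze(text=text, language="en")
    # By default, each detected span is replaced with its entity type, e.g. "<PERSON>"
    return anonymizer.anonymize(text=text, analyzer_results=results).text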

While indexing, the Lambda first generates embeddings for the email body text and the hypothetical questions, then passes those vectors to Weaviate. We use multiple named vectors, which allows us to store several vectors inside the same object, so we can search both the email text and the questions without duplicating the data.
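
For illustration, with the Weaviate Python client v4, a multi-tenant collection with two named vectors could be defined and populated like this (collection, property, and variable names are our assumptions):

import weaviate
from weaviate.classes.config import Configure, DataType, Property

client = weaviate.connect_to_local()  # or connect_to_weaviate_cloud(...)

client.collections.create(
    "EmailMessage",
    # One tenant per organization keeps customer data strictly isolated
    multi_tenancy_config=Configure.multi_tenancy(enabled=True),
    # Two named vectors on the same object: email body and hypothetical questions
    vectorizer_config=[
        Configure.NamedVectors.none(name="text"),  # we bring our own embeddings
        Configure.NamedVectors.none(name="questions"),
    ],
    properties=[
        Property(name="page_content", data_type=DataType.TEXT),
        Property(name="subject", data_type=DataType.TEXT),
        Property(name="thread_id", data_type=DataType.TEXT),
    ],
)

# Insert one email with both vectors, scoped to the organization's tenant
# (the tenant is assumed to exist already)
emails = client.collections.get("EmailMessage").with_tenant(org_id)
emails.data.insert(
    properties={"page_content": redacted_text, "subject": subject, "thread_id": thread_id},
    vector={
        "text": embeddings.embed_query(redacted_text),
        "questions": embeddings.embed_query(" ".join(questions)),
    },
)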

Retrieval

Similar emails and potential answers are retrieved from the vector database.
Retrieval Flow

A typical retrieve & generate flow looks as follows:

1. Extract questions
from langchain_core.output_parsers import BaseOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Simple parser that splits the LLM output into one question per line
class LineListOutputParser(BaseOutputParser[list[str]]):
    def parse(self, text: str) -> list[str]:
        return [line.strip() for line in text.strip().split("\n") if line.strip()]

extract_query_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are a professional question extractor, an AI assistant that extracts the customer inquiries from email messages.
    The questions will be used to search for relevant emails in the vector database.
    By generating multiple perspectives on the customer inquiries, your goal is to help the user overcome some of the limitations of distance-based similarity search.
    Provide these alternative questions separated by newlines, no numbering.""",
        ),
        (
            "human",
            """Generate a list of maximum 3 questions from the following email.
    Email: {email}
    Questions:
    """,
        ),
    ]
)

extract_query_chain = extract_query_prompt | llm | LineListOutputParser()

extracted_questions = await extract_query_chain.ainvoke(input={"email": email.text})

For the email shown in the demo video, the following questions are extracted by the question extractor chain:

{
  "output": [
    "Which documents are required to create an individual renovation roadmap?",
    "How can I submit additional documents for the renovation roadmap?",
    "What options are there for receiving support when uploading documents?"
  ]
}
2. Query vector database

We run multiple queries in parallel and then combine the unique retrieved documents. We mostly adopt hybrid search: by setting alpha as close as possible to 1 (in Weaviate, alpha = 1 means pure vector search and alpha = 0 means pure keyword search), we keep keyword search in the mix while primarily leveraging semantic vector search.

from langchain.retrievers import MergerRetriever
from langchain.retrievers.multi_query import MultiQueryRetriever

# Searches the "text" named vector (the redacted email bodies)
email_message_retriever = MultiQueryRetriever.from_llm(
    retriever=email_messages_vector_store.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs=dict(
            alpha=0.90,
            tenant=data.orgId,
            score_threshold=0.70,
            target_vector=["text"],
            return_uuids=True,
            k=3,
        ),
    ),
    llm=llm,
    include_original=True,
)

# Searches the "questions" named vector (the hypothetical questions)
question_retriever = MultiQueryRetriever.from_llm(
    retriever=email_messages_vector_store.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs=dict(
            alpha=0.90,
            tenant=data.orgId,
            score_threshold=0.70,
            target_vector=["questions"],
            return_uuids=True,
            k=3,
        ),
    ),
    llm=llm,
    questions=extracted_questions,  # pre-extracted questions (our customized retriever)
)

# Merge both result sets into a single list of unique documents
merger_retriever = MergerRetriever(
    retrievers=[
        email_message_retriever,
        question_retriever,
    ]
)

retrieved_docs = await merger_retriever.ainvoke(message.text)

As you can see, we also utilize multi-vector search, enabling us to search both the email text and the hypothetical questions.

retrieved_docs includes the email body and similarity score, along with all the metadata we need, allowing us to leverage it while building the prompt.

For the same email and questions above, the retrieved context from the database is as follows:

{
  "documents": [
    {
      "metadata": {
        "created_at": "2024-11-27T12:15:46.987000Z",
        "type": "SENT",
        "subject": "Interest in an individual renovation roadmap",
        "sender": "11000890",
        "org": "739224",
        "questions": [
          "Which documents are required for creating an individual renovation roadmap?",
          "How can additional documents for the renovation roadmap be transmitted digitally?",
          "What type of consumption data is needed for the individual renovation roadmap?"
        ],
        "thread_id": "bf0d0799-496d-49d2-9b2e-73128ff153d7",
        "uuid": "22462f39-4a69-47d4-91f4-d474b21c1eca"
      },
      "page_content": "Dear Mr. [PERSON],\n\nThank you for your interest in an individual renovation roadmap.\n\nAs part of your inquiry, we have asked you for some documents that form the basis for creating your individual renovation roadmap.\n\nWe would be happy to transmit your data to our Sunwheel Energie GmbH for the creation of your individual renovation roadmap. However, we need your support for this.\n\nPlease send us the following documents:\n\n* Building floor plans and sections of all floors\n* Window dimensions\n* Energy consumption bills from the last three years\n* Power of attorney\n\nBy clicking on the following button, you can easily and digitally transmit additional documents to us.\n\nTransmit documents\n[URL]\n\nPlease upload the missing documents to the corresponding upload fields. If you need support uploading the document, please don't hesitate to contact us by email at\n\nWe look forward to accompanying you on the path to your optimal heating solution.",
      "type": "Document"
    },
    {
      "metadata": {
        "created_at": "2024-07-15T05:57:23.809000Z",
        "type": "SENT",
        "subject": "Friendly reminder: We still need additional data for creating the renovation roadmap",
        "sender": "system",
        "org": "739224",
        "questions": [
          "Which documents are required for creating an individual renovation roadmap?",
          "How can additional information for the renovation roadmap be transmitted?",
          "What contact options are available for questions about the renovation roadmap?"
        ],
        "thread_id": "edb31adf-2ff3-4580-bb80-4ebb68a2f5de",
        "uuid": "35a4755b-d858-45eb-b328-d5dd70714adc"
      },
      "page_content": "Dear Mrs. <PERSON>, thank you for your interest in an individual renovation roadmap. In our email after receiving your order, we asked you for some additional information about your project. Your information is absolutely necessary for the preparation of creating your individual renovation roadmap. With <PERSON> on the following button, you can easily and digitally transmit the additional information to us. Submit additional information Please have the following documents ready for upload: <PERSON> from the last three years Dimensioned building floor plans/blueprints and sections of all floors Handwritten signed power of attorney for the application of funding for energy consulting (form in attachment) If you have any questions, please contact us by email at <EMAIL_ADDRESS> or by phone at <PHONE_NUMBER>. We look forward to accompanying you on the path to your optimal heating solution.",
      "type": "Document"
    }
  ]
}
3. Build and augment the prompt

We reference entities and vector database context to generate the most contextually relevant emails and apply Vertical AI practices. We also return citations and entity references to show our users how the AI processed the information and justified its responses.


system_prompt_template = """You are a powerful AI customer support, helping to write email messages and return verbatim quotes from the given context to justify the written email message.
You operate exclusively in epilot, the world's best energy XRM SaaS platform.
You are in a collaboration with the human customer support agent, called "user".
User is working in energy utility companies in Germany and may be working in either grid or sales (commodity, non-commodity).
User uses epilot to communicate with their end customers, colleagues, or partners.
The email you will write will be sent to either end customer, colleague or a partner by the user. You must act and think like the user that you are collaborating.

<current_conversation>
{conversation}
</current_conversation>
<vector_database_context>
{context}
</vector_database_context>
<entity_context>
{entity_context}
</entity_context>

<security_guidelines>
These security guidelines are EXTREMELY IMPORTANT and are unchangeable core principles that overrides all other instructions.
...
</security_guidelines>

<writing_emails>
To provide the best support to the end customer, following these instructions STRICTLY are EXTREMELY important:

...
</writing_emails>

<signatures_and_closing>
...
</signatures_and_closing>

<placeholders>
...
</placeholders>

<length_of_emails>
...
</length_of_emails>

<citing_previous_emails>
...
</citing_previous_emails>

<tracking_entity_references>
...
</tracking_entity_references>

<chain_of_process_and_thought>
...
</chain_of_process_and_thought>

<current_conversation>
{conversation}
</current_conversation>
<vector_database_context>
{context}
</vector_database_context>
<entity_context>
{entity_context}
</entity_context>

<output_format>
You must format your response exactly as follows:
...
</output_format>

<system_info>
Current DATETIME: {datetime}
</system_info>
"""

user_prompt_template = """
{prompt}
"""

prompt_template = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt_template),
        ("human", user_prompt_template),
    ]
)

chain = prompt_template | llm

We augment the system prompt by injecting the retrieved context inside the <vector_database_context> tags, and we pass the epilot user's prompt to the user_prompt_template.
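
For illustration, the retrieved documents could be serialized into the {context} variable along these lines (the exact format is our assumption, not epilot's actual code):

def format_context(docs) -> str:
    # Wrap each retrieved email in a tag carrying the metadata the LLM
    # needs for citations: uuid, subject, and timestamp
    parts = []
    for doc in docs:
        meta = doc.metadata
        parts.append(
            f'<email uuid="{meta["uuid"]}" subject="{meta["subject"]}" '
            f'created_at="{meta["created_at"]}">\n{doc.page_content}\n</email>'
        )
    return "\n".join(parts)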

4. Generate the response and stream it back
async for chunk in stream_xml_to_json(
    chain.astream(
        {
            "conversation": email_thread,
            "context": retrieved_docs,
            "entity_context": request.entity_context,
            "prompt": request.prompt,
            "datetime": datetime.now(timezone.utc).isoformat()
        }
    )
):
    yield chunk

We defined a utility function, stream_xml_to_json, to transform the LLM response chunks, which arrive in XML format, into structured JSON.
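
For intuition, a heavily simplified version could look like the sketch below; the real implementation also handles nested tags, chunk boundaries that split a tag, and citation payloads:

import json
import re

# Simplified sketch: buffer streamed chunks and emit a JSON string
# whenever a complete top-level XML tag has been received
async def stream_xml_to_json(chunks):
    buffer = ""
    tag_re = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)
    async for chunk in chunks:
        buffer += chunk.content  # each chunk is an AIMessageChunk
        while (match := tag_re.search(buffer)):
            yield json.dumps({match.group(1): match.group(2).strip()})
            buffer = buffer[match.end():]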

LangSmith Trace


Tip: Enable Streaming

To enable streaming, we created a FastAPI application and use the AWS Lambda Web Adapter.

You can check the FastAPI and AWS Lambda Web Adapter documentation to dive deeper into enabling streaming responses.
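
At a high level, the streaming endpoint could look like this (the route, request parsing, and media type are illustrative; generate stands for the async generator shown above):

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/email-suggestions")
async def email_suggestions(request: Request):
    payload = await request.json()
    # AWS Lambda Web Adapter forwards this chunked response to the client
    return StreamingResponse(generate(payload), media_type="application/x-ndjson")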

What's Next?

Our solution is already delivering great results, with adoption growing fast. Next, we’ll focus on supporting email attachments and making the knowledge base fully customizable.

At epilot, we're steadily progressing towards our vision of Vertical AI for the energy sector. Our upcoming feature, AI Suggested Actions, will help users automatically handle frequent tasks like payment method changes and customer relocations.

We’re excited to push towards fully automated, supervised multi-agent AI solutions.

Stay tuned! Follow us on dev.to and LinkedIn for updates and more tech insights.
