Last week, I made a video about DeepSeek-V3, and it caused a huge stir in the global AI community.
Two days ago, the Chinese AI company DeepSeek released its reasoning-focused large language model “DeepSeek-R1” as open source!
It’s said to perform on par with OpenAI’s most capable reasoning model, o1. What’s even more impressive is the pricing: its API costs less than 1/25 of OpenAI o1. On top of that, it’s been released under the highly permissive MIT license, so anyone can download and use the model.
As soon as R1 came out, it not only put to rest earlier speculation that it was merely a distillation of OpenAI o1, but DeepSeek also stated outright that this open-source release can match o1.
It is worth mentioning that R1 breaks with previous training recipes: the R1-Zero variant uses no SFT data at all and is trained purely through RL. This shows the model can learn to reason about problems by itself, which is actually closer to how humans think.
When I set out to develop a RAG chatbot, I found that LangChain had shipped major updates since I last built one with it. Unfortunately, a plain model does not remember the conversation content the way ChatGPT does, nor can it take in new data or be adapted to answer specific questions like a customer service chatbot.
So, let me give you a quick demo of a live chatbot to show you what I mean.
I will upload a PDF that includes images and then ask the chatbot a question: ‘Summarize this PDF.’ Feel free to ask any questions you want. If you look at how the chatbot generates its output, you will see that the app saves the PDF’s content to a temporary file, processes it with PDFPlumberLoader to extract complex data such as tables, and then cleans up the temporary file.
It uses SemanticChunker to split documents into semantic chunks and FAISS for efficient similarity-based search. The retriever finds the top 3 most similar chunks for a query and uses history_aware_retriever to enhance the agent’s ability to incorporate the entire conversation history into the retrieval process.
It also integrates context_chain for external input data and docs_chain for conversation records, using create_retrieval_chain to form a complete retrieval process. It handles context-aware responses by verifying that documents are uploaded, retrieving context, and generating responses with a RAG chain.
By the end of this video, you will understand how DeepSeek-R1 is trained and how to use DeepSeek-R1 with RAG.
Before we start! 🦸🏻♀️
If you like this topic and you want to support me:
like my article; that will really help me out.👏
Follow me on my YouTube channel
Subscribe to me to get the latest article.
Overview of DeepSeek R1 learning method
DeepSeek R1 is characterized by its post-training using reinforcement learning (RL). In general, large-scale language models are developed through the following steps:
Pre-training: The model is trained on a large corpus to “predict the next word.”
Supervised fine-tuning (SFT): High-quality, human-created instruction-response pairs are used to fine-tune the model for a specific task.
RLHF (Reinforcement Learning from Human Feedback): Humans evaluate the model’s output, and the scores are used as rewards to update the model.
DeepSeek R1 reportedly scaled up step 3, the reinforcement learning stage. The team first explored applying RL directly, without the conventional SFT stage (a version called DeepSeek-R1-Zero), and then added SFT to produce the final DeepSeek R1. This method is said to have encouraged the model’s internal thought process (chain-of-thought) and self-verification.
Normally, handling chain-of-thought reasoning effectively requires some SFT data and human labels. However, DeepSeek-R1-Zero is claimed to have acquired its reasoning ability through RL alone, without going through SFT. This approach offers the following advantages:
Reducing the cost of large-scale data collection
Enabling self-correction for unknown tasks and complex situations
However, models that skip SFT entirely still have issues, such as output that is somewhat difficult to read and unintended mixing of languages.
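To make the pure-RL idea more concrete, here is a minimal sketch of the kind of rule-based reward this style of training relies on: a format check that rewards answers wrapped in explicit reasoning tags, plus an accuracy check against a verifiable reference. The tag names and weights below are illustrative assumptions, not DeepSeek’s actual implementation.

import re

def rule_based_reward(completion: str, reference_answer: str) -> float:
    # Illustrative reward shaping: format reward + accuracy reward (weights are assumptions).
    reward = 0.0
    # Format reward: reasoning should appear inside <think> tags, the result inside <answer> tags.
    if re.search(r"<think>.*?</think>", completion, re.DOTALL) and "<answer>" in completion:
        reward += 0.2
    # Accuracy reward: compare the extracted final answer with the known reference.
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match and match.group(1).strip() == reference_answer.strip():
        reward += 1.0
    return reward

# A completion that reasons in <think> tags and answers correctly earns the full reward.
sample = "<think>2 + 2 equals 4.</think><answer>4</answer>"
print(rule_based_reward(sample, "4"))  # 1.2

A reward like this needs no human labeler in the loop, which is exactly why pure-RL training can cut the cost of data collection.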
Let’s Build the App
Let us now go step by step and see how to create the RAG app. First, we will install the libraries that support the model. For this, we run a pip install on the requirements file:
pip install -r requirements.txt
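The article does not list the contents of requirements.txt, but based on the imports used below it would look roughly like this (the package names are my assumption; pin versions as needed):

streamlit
langchain
langchain-community
langchain-experimental
langchain-huggingface
langchain-openai
faiss-cpu
pdfplumber
sentence-transformers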
The next step is the usual one where we will import the relevant libraries, the significance of which will become evident as we proceed.
PDFPlumberLoader is a very powerful tool for processing complex structured data, such as tables, from PDF files. It can extract text, images, tables, fields, etc.
SemanticChunker splits text into chunks based on semantic similarity, ensuring that related content stays together in the same chunk.
And so on. If you want to know more about the latest LangChain upgrade, let me know and I will explain it in detail.
import os
import tempfile

import streamlit as st
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_community.vectorstores import FAISS
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_experimental.text_splitter import SemanticChunker
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.history_aware_retriever import create_history_aware_retriever
from langchain.chains.retrieval import create_retrieval_chain
from langchain_openai.chat_models.base import BaseChatOpenAI
I create a function to handle an uploaded file by storing its content in a temporary file. I then load the PDF with PDFPlumberLoader, which can extract complex structured data such as tables, and clean up the temporary file afterwards to keep the system tidy. Finally, I return the extracted documents for further use.
def process_uploaded_file(uploaded_file):
    # Write the uploaded bytes to a temporary PDF file on disk.
    with tempfile.NamedTemporaryFile(delete=False, suffix='.pdf') as tmp_file:
        tmp_file.write(uploaded_file.getvalue())
    # Extract text, tables and other structured content from the PDF.
    loader = PDFPlumberLoader(tmp_file.name)
    documents = loader.load()
    # Remove the temporary file once its content has been loaded.
    os.unlink(tmp_file.name)
    return documents
I create a function to process a list of documents and build a retriever for similarity-based searches. First, I use the
SemanticChunker with HuggingFaceEmbeddings to split the documents into smaller chunks while preserving semantic relationships, ensuring related content stays grouped.
After confirming the splitting step, I build a vector store using FAISS, which indexes the chunks for efficient similarity search.
Finally, I return a retriever that can find the top 3 most similar chunks for a given query, making the document retrieval process both accurate and efficient.
def get_vs_retriever_from_docs(doc_list):
    # Split the documents into semantically coherent chunks.
    text_splitter = SemanticChunker(HuggingFaceEmbeddings())
    documents = text_splitter.split_documents(doc_list)
    st.write('Document splitting done')
    # Embed the chunks and index them in a FAISS vector store.
    embedder = HuggingFaceEmbeddings()
    vector = FAISS.from_documents(documents, embedder)
    # Return a retriever that fetches the 3 most similar chunks for a query.
    return vector.as_retriever(search_type="similarity", search_kwargs={"k": 3})
I create the init_ui() function to set up a user-friendly interface for uploading and analyzing PDF documents using Streamlit. First, I configure the page with a title and descriptive header.
I then initialize session state variables to manage the retriever, chat history, and upload status. Next, I provide a file uploader widget for users to upload PDF files.
Once a file is uploaded, I process it to extract its content, create a retriever using get_vs_retriever_from_docs, and store it in the session state for further use. Finally, I notify the user of the successful document processing with a success message.
def init_ui():
    st.set_page_config(page_title='Document uploader')
    st.markdown('#### :books:🧙Gao Dalie (高達烈): Your Document Summarizer')
    st.markdown("<h8 style='text-align: right; color: green;'>*Share the pdf of the book you want to read*</h8>", unsafe_allow_html=True)
    # Initialize session state for the retriever, chat history and upload flag.
    if "vector_store" not in st.session_state:
        st.session_state.vector_store = None
    if "chat_history" not in st.session_state:
        st.session_state.chat_history = []
    if "doc_upload" not in st.session_state:
        st.session_state.doc_upload = False
    uploaded_file = st.file_uploader("Upload PDF", type=["pdf"])
    if uploaded_file:
        # Extract the documents and build the retriever once a PDF is uploaded.
        docs = process_uploaded_file(uploaded_file)
        if docs:
            retriever = get_vs_retriever_from_docs(docs)
            st.session_state.vector_store = retriever
            st.session_state.doc_upload = True
            st.success("Document processed successfully")
I create the get_related_context() function to improve the retrieval of relevant information in a conversational system. First, I initialize the DeepSeek reasoning model (deepseek-reasoner via the DeepSeek API; the commented-out Ollama deepseek-r1 line is a local alternative) to process user inputs.
Then, I define a prompt template that combines the chat history and user input, instructing the model to generate a search query based on the ongoing conversation.
Finally, I use create_history_aware_retriever to enhance the agent's ability to incorporate the entire conversation history into the retrieval process.
def get_related_context(vector_store):
    # llm = Ollama(model="deepseek-r1")  # local alternative via Ollama
    llm = BaseChatOpenAI(
        model='deepseek-reasoner',
        openai_api_key=os.getenv('DEEPSEEK_API_KEY'),  # set your DeepSeek API key in the environment
        openai_api_base='https://api.deepseek.com',
        max_tokens=1024
    )
    # Condense the chat history and the new question into a standalone search query.
    prompt = ChatPromptTemplate.from_messages([
        ("system", "Generate a search query based on the conversation."),
        MessagesPlaceholder(variable_name="chat_history"),
        ("user", "{input}")
    ])
    return create_history_aware_retriever(llm, vector_store, prompt)
In this function, I integrate two core chains:
context_chain: responsible for retrieving external input data (the uploaded documents)
docs_chain: responsible for answering from the retrieved documents and the conversation record
Finally, retrieval_chain = create_retrieval_chain(context_chain, docs_chain) combines them into a single chain that can draw on both the conversation and the external input data, enabling natural back-and-forth conversation.
The conversation record itself has to be kept in a list that we maintain ourselves; that is the chat_history part.
def get_context_aware_prompt(context_chain):
    # llm = Ollama(model="deepseek-r1")  # local alternative via Ollama
    llm = BaseChatOpenAI(
        model='deepseek-reasoner',
        openai_api_key=os.getenv('DEEPSEEK_API_KEY'),  # set your DeepSeek API key in the environment
        openai_api_base='https://api.deepseek.com',
        max_tokens=1024
    )
    # Answer the question using the retrieved documents and the chat history.
    prompt = ChatPromptTemplate.from_messages([
        ("system", "Answer questions using the provided context:\n\n{context}"),
        MessagesPlaceholder(variable_name="chat_history"),
        ("user", "{input}")
    ])
    # docs_chain stuffs the retrieved documents into the prompt for the LLM.
    docs_chain = create_stuff_documents_chain(llm, prompt)
    # Combine retrieval (context_chain) and answering (docs_chain) into one chain.
    return create_retrieval_chain(context_chain, docs_chain)
In this function, I handle the process of generating a context-aware response for a given query. I start by checking whether the necessary documents have been uploaded.
If no documents are uploaded, I return an error. Once the documents are verified, I retrieve a context chain to access relevant information from the uploaded documents. If retrieving the context fails, I return an error.
I then create a RAG chain that combines the context with the conversation history and the user’s query to generate a response. Finally, I invoke the RAG chain, process the response, and return the generated answer.
def get_response(query: str) -> str:
    if not st.session_state.vector_store:
        return "Error: Please upload documents first"
    try:
        # Build the history-aware retriever from the stored vector store.
        context_chain = get_related_context(st.session_state.vector_store)
        # st.write(context_chain)
        if not context_chain:
            return "Error: Failed to process context"
        # Build the full RAG chain (retrieval + answer generation).
        rag_chain = get_context_aware_prompt(context_chain)
        # st.write(rag_chain)
        # Pass only the two most recent messages as conversation history.
        current_history = st.session_state.chat_history[-2:] if len(st.session_state.chat_history) > 1 else []
        return rag_chain.invoke({
            "chat_history": current_history,
            "input": query
        })["answer"]
    except Exception as e:
        return f"Error: {str(e)}"
In this function, I set up and display a chat interface for user interaction. First, I loop through the stored chat history and display both user and AI messages in the chat.
Then, I provide an input where the user can type a new question. If the user submits a query, I call the get_response function to generate an answer, and then I update the chat history with the new message and response.
def init_chat_interface():
    # Replay the stored conversation so it stays visible across reruns.
    for message in st.session_state.chat_history:
        with st.chat_message("user" if isinstance(message, HumanMessage) else "assistant"):
            st.write(message.content)
    # The input stays disabled until a document has been uploaded.
    if prompt := st.chat_input("Ask a question...", disabled=not st.session_state.doc_upload):
        st.session_state.chat_history.append(HumanMessage(content=prompt))
        with st.chat_message("user"):
            st.write(prompt)
        response = get_response(prompt)
        st.session_state.chat_history.append(AIMessage(content=response))
        with st.chat_message("assistant"):
            st.write(response)
if __name__ == "__main__":
    init_ui()
    if st.session_state.vector_store:
        init_chat_interface()
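To try it out, save everything above into a single Streamlit script and launch it; the filename app.py is my assumption, and the API key is read from the environment as in the snippets above:

export DEEPSEEK_API_KEY=your_key_here
streamlit run app.py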
Conclusion :
The launch of DeepSeek R1 is a technological breakthrough and a key milestone in AI’s democratization. Its open approach is reshaping our view of AI, allowing more people to contribute to its development.
Understanding concepts like the Retrieval Chain and the Conversation Retrieval Chain is crucial for using LangChain.
Once you understand the principles behind these two chains, you will not only know how customer service chatbots built on language models work under the hood, but you will also be able to build similar applications yourself.
However, LangChain is not just that; there are many more functions and tips to learn for building language-model applications.
Reference :
I would highly appreciate it if you:
❣join to my Patreon:https://www.patreon.com/GaoDalie_AI
Book an Appointment with me:https://topmate.io/gaodalie_ai
Support the Content (every Dollar goes back into the video):https://buymeacoffee.com/gaodalie98d
Subscribe Newsletter for free:https://substack.com/@gaodalie