title: [LangChain] Potential Issues with LangChain Embedding
published: false
date: 2023-07-10 00:00:00 UTC
tags:
canonical_url: http://www.evanlin.com/langchain-embedding-issue/
---

# Preface:
I previously took the new DeepLearning.ai course "[LangChain Chat with Your Data](https://learn.deeplearning.ai/langchain-chat-with-your-data)", which shares many very practical case studies. Here, I want to share a problem that is easy to misunderstand.
## Case 1: Similarity Search Returns Redundant Results and Misses Important Information

Here, four documents are loaded; note that one of the files is intentionally loaded a second time, so the vector store ends up with duplicated, "dirty" data.
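The setup looks roughly like the following (a minimal sketch; the file names, chunk sizes, and the `docs/chroma/` path are my own illustrative choices, not from the original post):

```python
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# One of the PDFs is loaded twice on purpose to simulate duplicated data.
loaders = [
    PyPDFLoader("docs/Lecture01.pdf"),
    PyPDFLoader("docs/Lecture01.pdf"),  # intentional duplicate
    PyPDFLoader("docs/Lecture02.pdf"),
    PyPDFLoader("docs/Lecture03.pdf"),
]
docs = []
for loader in loaders:
    docs.extend(loader.load())

# Split into chunks and embed them into a Chroma vector store.
splits = RecursiveCharacterTextSplitter(
    chunk_size=1500, chunk_overlap=150
).split_documents(docs)

vectordb = Chroma.from_documents(
    documents=splits,
    embedding=OpenAIEmbeddings(),
    persist_directory="docs/chroma/",
)
```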

When you then ask whether there is anything about regression in "Chapter 3", the search also returns results from Chapter 1. Plain similarity search does not treat "Chapter 3" as a constraint; it only matches on the word "regression", and the duplicated data makes the results even noisier.
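A quick way to see the problem is to run a plain similarity search and inspect the metadata of the returned chunks (a sketch reusing the hypothetical setup above; the query wording is illustrative):

```python
# Plain similarity search only matches on the semantics of "regression";
# "Chapter 3" is not treated as a constraint, so chunks from other
# chapters (and duplicated chunks) can come back as well.
question = "is there anything about regression in Chapter 3?"
docs = vectordb.similarity_search(question, k=3)
for d in docs:
    print(d.metadata)  # may show sources/pages outside Chapter 3
```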

One way to deal with this is to use "Maximum Marginal Relevance (MMR)" search instead of looking only for the closest matches: MMR re-ranks the candidates so that the returned results are both relevant to the query and different from each other.

```python
vectordb.max_marginal_relevance_search(query, k=2, fetch_k=3)
```
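Here `fetch_k` is the number of candidates first retrieved by plain similarity, and `k` is the number of results kept after MMR re-ranks them for diversity. A rough before/after comparison (the query is illustrative):

```python
# With duplicated chunks in the store, the two similarity hits are often
# identical, while MMR's second hit is usually a different passage.
question = "what is said about regression?"  # hypothetical query

sim_docs = vectordb.similarity_search(question, k=2)
mmr_docs = vectordb.max_marginal_relevance_search(question, k=2, fetch_k=3)

print(sim_docs[0].page_content[:100])
print(sim_docs[1].page_content[:100])  # may repeat the first chunk verbatim
print(mmr_docs[1].page_content[:100])  # MMR's second result tends to differ
```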

Another way to make the query more precise is to pass a metadata `filter` to the search:

```python
query = "What is the bitcoin?"
vectordb.similarity_search(query, k=2, filter={"page": 1})
```
Through `filter`, you can constrain the search by metadata: which source document a chunk comes from, which page it was cut from, and other metadata comparisons.
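For example, to restrict the search to a single source document (the path below is hypothetical), filter on the chunk's `source` metadata and then check the metadata of the results:

```python
# Only chunks whose "source" metadata matches the filter are considered.
question = "is there anything about regression in Chapter 3?"
docs = vectordb.similarity_search(
    question,
    k=3,
    filter={"source": "docs/Lecture03.pdf"},  # hypothetical path
)
for d in docs:
    print(d.metadata)  # every hit should now come from Lecture03.pdf
```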