How to Build a Legal Information Retrieval Engine Using Mistral, Qdrant, and LangChain

Introduction

Finding relevant legal cases is one of the most important tasks lawyers perform, and also one of the most time-consuming and labor-intensive. There is a vast trove of judicial decisions to search, and existing software mostly offers boolean and keyword-based search.

In this blog we build a simple AI application that lets lawyers upload case documents and then search and analyze them for material relevant to a given new case scenario. We use only open-source components: the Mistral AI model, the Qdrant vector database, and the LangChain library.

To run the code, you need to download the Mistral model (Mistral-7B) either to your local machine or to a cloud instance. Although we quantize the model to reduce its size, it still needs a GPU with at least 16 GB of memory. I recommend running the code on Google Colab, whose free-tier GPU handles this load easily.
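Before loading the model, it is worth confirming that a GPU is actually available in your runtime. A minimal check, assuming a CUDA-capable runtime such as a Colab T4, could look like this:

import torch

# quick sanity check of the available accelerator before loading the model
if torch.cuda.is_available():
    gpu = torch.cuda.get_device_properties(0)
    print(f"GPU: {gpu.name}, memory: {gpu.total_memory / 1024**3:.1f} GB")
else:
    print("No GPU detected - the quantized model will be very slow on CPU")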

The full codebase of this tutorial is available here.

Why Legal Case Discovery Search?

Legal case discovery is the process of identifying and gathering relevant information to support a given legal case. Technically termed case law retrieval, it involves analyzing judicial precedents and decisions so that lawyers can advise their clients on a similar legal case. Case law is one of the two sources of law, along with statutes. While statutes are limited in size and amended or expanded slowly, case law is a rapidly and ever-expanding source. This process can be time-consuming and labor-intensive, especially when dealing with large volumes of data. Large language models (LLMs) can help expedite it by searching semantically, matching a broad yet relevant set of case precedents and statutes. This can help not only legal professionals but also laypeople, who can get a preliminary understanding of similar cases and their outcomes before deciding to proceed with their own case and hire a lawyer.

Architecture

This tutorial uses LLMs and the Retrieval Augmented Generation (RAG) architecture to build a search agent over case law documents. We build a traditional retrieval component using a vector database to narrow down the large collection of case documents based on a user query. The filtered document chunks are then passed to the LLM along with the query, and the LLM's reasoning and semantic understanding capabilities help it extract the exact answer to the query.

About Mistral

Mistral-7B is a relatively recent open-source large language model developed by Mistral AI, a French startup, which has gained attention for outperforming the popular Llama 2 models. Specifically, the 7-billion-parameter Mistral is reported to outperform the 13-billion-parameter Llama 2 and even the 34-billion-parameter Llama 1 on many benchmarks, which is a significant milestone in generative AI, as it means improved latency without sacrificing model performance.

About Qdrant

Qdrant is an open-source vector search engine that enables fast and efficient similarity search. It is designed to work with high-dimensional data, making it suitable for use with large language models like Mistral. The integration of Qdrant in the architecture aims to enhance the search capabilities for legal case discovery, allowing for quick and accurate retrieval of relevant information from a large corpus of legal documents.
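In this tutorial we talk to Qdrant through LangChain, but to give a feel for what happens underneath, here is a tiny standalone sketch using Qdrant's own Python client; the collection name, vector size, and payload are made up purely for illustration:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# in-memory instance, no separate Qdrant server needed
client = QdrantClient(":memory:")

# create a collection that stores 4-dimensional cosine-similarity vectors
client.create_collection(
    collection_name="demo",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)

# index a single point with a text payload
client.upsert(
    collection_name="demo",
    points=[PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4], payload={"text": "a sample chunk"})],
)

# search for the nearest neighbours of a query vector
hits = client.search(collection_name="demo", query_vector=[0.1, 0.2, 0.3, 0.4], limit=1)
print(hits[0].payload)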

About LangChain

LangChain is an open-source library for building applications with LLMs. It offers ready-made modules and functions to iteratively build prompts and connect to vector databases, along with intuitive syntax for chaining these components together as input to an LLM.

About Dataset

The dataset used in this tutorial was constructed as part of the Artificial Intelligence for Legal Assistance track at the FIRE 2019 conference, an important conference in the field of information retrieval. It can be downloaded from here. It contains thousands of case law documents, but a subset of 500 documents (files C1.txt to C500.txt in the Object_casedocs directory) is enough for the purposes of this tutorial.
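If you want to carve out that 500-document subset programmatically, a small helper such as the one below will do; the source path is an assumption about where you placed the downloaded Object_casedocs folder, and the destination matches the file_path used later in this tutorial.

import shutil
from pathlib import Path

# assumed location of the full downloaded corpus
src = Path('/content/drive/MyDrive/Colab Notebooks/data/Object_casedocs/')
# destination directory, used as file_path later in the tutorial
dst = Path('/content/drive/MyDrive/Colab Notebooks/data/Object_casedocs_500/')
dst.mkdir(parents=True, exist_ok=True)

# copy the first 500 case documents (C1.txt to C500.txt)
for i in range(1, 501):
    f = src / f"C{i}.txt"
    if f.exists():
        shutil.copy(f, dst / f.name)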

Guide to building the app

We first start with installing the required libraries.

!pip install -q -U transformers langchain bitsandbytes qdrant-client
!pip install -q -U sentence-transformers accelerate unstructured

Most of these libraries will be clearly visible as and when they are imported and used. The two exceptions are accelerate, which is used under the hood by the transformers model pipeline to map the loaded model onto the GPU or CPU depending on availability, and unstructured, which is used by LangChain's directory loader when loading all files from a directory.

Next, load all required libraries, modules and functions:

import os
import torch
from operator import itemgetter
from transformers import BitsAndBytesConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from langchain.vectorstores import Qdrant
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableParallel
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_community.document_loaders import DirectoryLoader
from langchain.embeddings import HuggingFaceEmbeddings

# directory containing the 500 case documents
file_path = '/content/drive/MyDrive/Colab Notebooks/data/Object_casedocs_500/'

We start by loading the Mistral model and quantizing it to 4-bit weights using the bitsandbytes library.

# preparing config for quantizing the model into 4 bits
quantization_config = BitsAndBytesConfig(
   load_in_4bit=True,
   bnb_4bit_compute_dtype=torch.float16,
   bnb_4bit_quant_type="nf4",
   bnb_4bit_use_double_quant=True,
)

# load the tokenizer and the quantized mistral model
model_id = "mistralai/Mistral-7B-Instruct-v0.2"

model_4bit = AutoModelForCausalLM.from_pretrained(
             model_id, 
             device_map="auto",
             quantization_config=quantization_config,)

tokenizer = AutoTokenizer.from_pretrained(model_id)

# build a HuggingFace text-generation pipeline around the quantized model
pipe = pipeline(
    "text-generation",
    model=model_4bit,
    tokenizer=tokenizer,
    use_cache=True,
    device_map="auto",
    max_new_tokens=5000,
    do_sample=True,
    top_k=1,
    temperature=0.01,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)

# wrap the pipeline so LangChain can use it as an LLM
model = HuggingFacePipeline(pipeline=pipe)

Now that the LLM is ready, let's get the fodder for it: the legal case documents. We first load the documents from disk and then define a text splitter to break them into manageable chunks.

# load the legal case documents and define text splitter
loader = DirectoryLoader(file_path)
docs = loader.load()
print(len(docs))

text_splitter = RecursiveCharacterTextSplitter(
   chunk_size=1000,
   chunk_overlap=20,
   length_function=len,
   is_separator_regex=False,
)

docs = text_splitter.split_documents(docs)

The next step is the key part of RAG: defining the embedding model, embedding the documents, and indexing them into a vector store.

# define the embedding model
emb_model = "sentence-transformers/all-MiniLM-L6-v2"
embeddings = HuggingFaceEmbeddings(
    model_name=emb_model,
    cache_folder=os.getenv('SENTENCE_TRANSFORMERS_HOME'))

# embed the document chunks and index them in an in-memory Qdrant collection
qdrant_collection = Qdrant.from_documents(
    docs,
    embeddings,
    location=":memory:",  # Local mode with in-memory storage only
    collection_name="legal_casedocs",
)

# construct a retriever on top of the vector store
qdrant_retriever = qdrant_collection.as_retriever()

We can check how well the Qdrant retriever has indexed the legal case documents by querying it:

qdrant_retriever.invoke('Cite me a dispute related to electricity board tender')
[Document(page_content="The Chhattisgarh State Electricity Board (for short 'the CSEB') issued an advertisement inviting tender (NIT) bearing No. T- 136/2004 dated 02.06.2004 for its work at Hasedeo Thermal Power Station (Korba West) towards Designing, Engineering, Testing, Supply, Erection & Commission of HEA Ignition system. The applications received there under…", metadata={'source': '/content/drive/MyDrive/Colab Notebooks/data/Object_casedocs_500/C21.txt'}),
 Document(page_content='25. In the present case, the respondent no.1 challenged the impugned advertisement dated 6.12.2004 issued by the Nagar Nigam. We have carefully perused the said advertisement and find no illegality in the same. It has been held by this Court in several decisions that the Court should not ordinarily interfere with the terms mentioned in such an advertisement. Thus in Global Energy Ltd. and Anr. vs. Adani Exports Ltd. and Ors. 2005(4) SCC 435 2005 Indlaw SC 384 this Court observed at para 11:\n\n"The principle is, therefore, well settled…"', metadata={'source': '/content/drive/MyDrive/Colab Notebooks/data/Object_casedocs_500/C401.txt'}),
 Document(page_content="Ram, General Manager of the said Power Station furnished his report dated 28.12.2004 wherein it was summed up that due to the defects in the scanning system, supplied by the respondent, generation had been adversely effected and the said Electricity Board was not satisfied with the equipment supplied by the respondent. In spite of the aforesaid material, the tender Committee acted with caution and even the technical expertise was sought…", metadata={'source': '/content/drive/MyDrive/Colab Notebooks/data/Object_casedocs_500/C21.txt'}),]

We get back the document chunks whose embeddings most closely match the query, and inspecting them we do find mentions of cases related to electricity boards and tenders.
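By default the retriever returns only a handful of chunks. If you want more (or fewer), you can pass search_kwargs when constructing it; this is an optional tweak, the variable name below is just for illustration, and the rest of the tutorial keeps the default.

# optional: a retriever that returns the 5 closest chunks instead of the default
qdrant_retriever_top5 = qdrant_collection.as_retriever(search_kwargs={"k": 5})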

But we want a bit more than search: we want specific question answering and reasoning over these documents, in order to glean new insights. In other words, we want our engine to answer more specific queries about the cases.

In the next step we therefore chain these retrieved document chunks together and pass them to the LLM. Here we start using the LangChain Expression Language (LCEL), a newer syntax in LangChain for easily composing LLM input chains.

An essential concept in LCEL is the Runnable, an atomic unit of work that can be invoked, batched, or streamed; it can also be understood as something that can be 'run' (through an LLM). In Python terms, a Runnable is a class. LCEL composes Runnables in two main ways: RunnableSequence, which invokes a series of runnables sequentially, with one runnable's output serving as the next one's input, and RunnableParallel, which invokes runnables concurrently, providing the same input to each.
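To make this concrete, here is a tiny standalone illustration, unrelated to the legal data, of both kinds of composition using RunnableLambda:

from langchain_core.runnables import RunnableLambda, RunnableParallel

add_one = RunnableLambda(lambda x: x + 1)
double = RunnableLambda(lambda x: x * 2)

# piping runnables builds a RunnableSequence: the output of one feeds the next
sequence = add_one | double
print(sequence.invoke(3))   # 8

# RunnableParallel gives the same input to each runnable and collects the outputs in a dict
parallel = RunnableParallel(plus_one=add_one, doubled=double)
print(parallel.invoke(3))   # {'plus_one': 4, 'doubled': 6}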

The prompt used below takes two inputs: the context of retrieved documents and a user question. We set up these two inputs using the RunnableParallel class, which takes the input question and, in parallel, produces the context as the output of the retriever. Note the RunnablePassthrough, which simply captures the text the chain is invoked with and passes it along.

# define prompt template
template = """<s>[INST] You are a helpful, respectful and honest legal assistant.
Your task is to assist lawyers in legal case discovery.
Answer the question below from context below.
{context}
{question} [/INST] </s>
"""

# create the prompt from the above template
prompt = ChatPromptTemplate.from_template(template)

# combine document chunks into one
def format_docs(docs):
   return "\n\n".join(doc.page_content for doc in docs)

# setup the context and question part of the chain
setup_and_retrieval = RunnableParallel(
   {"context": qdrant_retriever| format_docs, "question": RunnablePassthrough()})

# extend the chain to include the prompt, model, and output parser
rag_chain = setup_and_retrieval | prompt | model | StrOutputParser()

Results

Let's start by passing the same query we gave the Qdrant retriever, this time through the full RAG chain:

rag_chain.invoke("Cite me a dispute related to electricity board tender")
In this dispute, the Chhattisgarh State Electricity Board (CSEB) invited tenders for the Designing, Engineering, Testing, Supply, Erection & Commission of HEA Ignition system at Hasedeo Thermal Power Station (Korba West). The respondent, M/s Control Electronics India (CEI), submitted an application for the tender documents but it was rejected due to incomplete documents, specifically the non-submission of documentary evidence of past performance and experience. The respondent then complained against the appellant for not issuing the tender documents.\n\nHowever, the present case is not about the initial rejection of the respondent's application. Instead, it revolves around the respondent's allegations that the records were fabricated and the tender document was not opened and returned in furtherance of official duties by the appellant.\n\nThe Court has carefully examined the impugned advertisement issued by the Nagar Nigam and found no illegality in it. The Court has also considered th...

Now, one may say that this is not too different from the response of the Qdrant retriever. Indeed, the retriever returned a list of candidate documents and the LLM's response summarizes the most relevant of them. However, the utility of the LLM goes further: it can answer questions from within this document, as we see next:

rag_chain.invoke("In the dispute related to electricity board tender, what was the outcome?")
In this dispute, the respondent had filed a civil suit challenging the decision of the Electricity Board in returning his tender documents due to non-compliance with the pre-qualifying conditions. However, he withdrew the suit, leading to its dismissal for non-prosecution. The respondent's attempt to challenge the decision of the Tender Committee in not considering his tender was unfaulted due to the constructive res judicata effect of the withdrawn suit. The tender of the respondent was rejected due to the defects in the scanning system supplied by him, which adversely affected the generation at Patratu Thermal Power Station. The Tender Committee sought expert opinions and rejected the respondent's tender based on the reports received. The allegations of fabricating records made by the respondent were considered mischievous and an afterthought. The appellant, R.C. Jain, was deputed to verify the claim of the respondent, and he reported that the works carried out by the respondent at Patratu Thermal Power Station were not satisfactory. Based on this information, the respondent's tender document was not opened and returned.
Lastly, you can also construct your chain to return the source documents. You just need to pass them through all the way.

As mentioned earlier, RunnablePassthrough simply takes the input and passes it through. We can use its assign method to add extra keys to the output. The cell below executes a composite chain in which the first chain defined, rag_chain_from_docs, actually runs second. When rag_chain_with_source is invoked, it constructs the context from the retriever alongside the question, passes both to rag_chain_from_docs, and assigns its output to a new key, answer. The final output therefore has three keys: the question, the LLM's answer, and all the retrieved documents as context.

rag_chain_from_docs = (
   RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
   | prompt
   | model
   | StrOutputParser()
)
rag_chain_with_source = RunnableParallel(
   {"context": qdrant_retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)
rag_chain_with_source.invoke("Cite me a dispute related to electricity board tender")
{'context': [Document(page_content="The Chhattisgarh State Electricity Board (for short 'the CSEB') issued an advertisement inviting tender (NIT) bearing No. T- 136/2004 dated 02.06.2004 for its work at Hasedeo Thermal Power Station (Korba West) towards Designing, Engineering, Testing, Supply, Erection & Commission of HEA Ignition system. The applications received there under were required to be processed in three stages successively namely…", metadata={'source': '/content/drive/MyDrive/Colab Notebooks/data/Object_casedocs_500/C21.txt'}),
  Document(page_content='15. As already pointed above, tender was floated by the CSEB and the CEI herein was one of the parties who had submitted its bid through the respondent. However, tender conditions mentioned certain conditions and it was necessary to fulfill those conditions to become eligible to submit the bid and have it considered. As per the appellants, tender of the respondent was rejected on the ground that plant…', metadata={'source': '/content/drive/MyDrive/Colab Notebooks/data/Object_casedocs_500/C21.txt'})],

 'question': 'Cite me a dispute related to electricity board tender',

 'answer': "In this dispute, the Chhattisgarh State Electricity Board (CSEB) invited tenders for the Designing, Engineering, Testing, Supply, Erection & Commission of HEA Ignition system at Hasedeo Thermal Power Station (Korba West). The respondent, M/s Control Electronics India (CEI), submitted an application for the tender documents but it was rejected due to incomplete documents, specifically the non-submission of documentary evidence.."}

Conclusion

We have built a simple case law retrieval system in a few lines of code that proves effective not only at retrieving legal precedents relevant to a query, but also at answering specific questions about those precedents. The secret sauce lies in the capabilities of LLMs, which have been trained on billions of text tokens, and in Qdrant's ability to index and retrieve documents with very low latency.

We hope this serves as a starting point for a larger case retrieval and discovery engine that can index and query millions of legal precedents, so that anyone seeking legal advice and justice has access to their own Mike Ross from Suits.
