Imagine you were given a large contract and asked a very specific question about it: "What is the notice for termination for convenience?" Locating the answer in the contract by hand would be an ordeal.
But what if we could use AI 🤖 to analyze the contract and answer this for us?
What we want here is to perform something known as retrieval-augmented generation (RAG). This is the process by which we give a language model some external sources (such as a contract). The external sources are intended to enhance the model's context, giving it a more comprehensive understanding of a topic. The model should then give us more accurate responses to the questions we ask it on the topic.
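In pseudocode, the pattern looks something like this (a minimal sketch with hypothetical `retrieve` and `generate` helpers, not llmware's actual API):

```python
# Minimal sketch of the RAG pattern (hypothetical helper functions)
question = "What is the notice for termination for convenience?"

# 1. Retrieve passages relevant to the question from the external source
passages = retrieve(question, source="contract.pdf")

# 2. Generate an answer with the retrieved passages added to the model's context
answer = generate(question, context=passages)
```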
Now, a general-purpose model like ChatGPT might be able to answer questions about contracts with RAG, but what if we instead used a model that's been trained and fine-tuned specifically on contract data?
Our AI model 🤖
For this example, we'll be using LLMWare's dragon-yi-6b-gguf model. This model is RAG-finetuned for fact-based question-answering on complex business and legal documents.
This means that it is specialized in giving us short and concise responses to questions involving documents like contracts. This makes it perfect for our example!
This is also a GGUF quantized model, meaning that it is a smaller and faster version of the original 6 billion parameter dragon-yi-6b model. Fortunately for us, this means that we can run it on a CPU 💻 without the need for powerful computational resources like GPUs!
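To see why that matters, here's some back-of-the-envelope memory math (approximate, assuming 16-bit original weights and a 4-bit quantization; actual GGUF file sizes vary with the quantization level chosen):

```python
params = 6e9                  # 6 billion parameters
fp16_gb = params * 2 / 1e9    # ~12 GB at 16 bits (2 bytes) per weight
q4_gb = params * 0.5 / 1e9    # ~3 GB at 4 bits (0.5 bytes) per weight
print(f"fp16: ~{fp16_gb:.0f} GB, 4-bit: ~{q4_gb:.0f} GB")
```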
Now, let's look at an example of using the `llmware` library for contract analysis from start to finish!
Step 1: Loading in files 📁
Let's start off by loading in our contracts to be analyzed. The `llmware` library provides sample contracts via the `Setup` class, but you can also use your own files in this example by replacing the `agreements_path` below.
```python
import os

from llmware.setup import Setup

# download llmware's sample files and point at the "AgreementsLarge" folder
local_path = Setup().load_sample_files()
agreements_path = os.path.join(local_path, "AgreementsLarge")
```
Here, we load in the `AgreementsLarge` set of files.
Next, we'll create a `Library` object and add our sample files to this library. An `llmware` library breaks documents down into text chunks and stores them in a database so that we can access them easily later.
```python
from llmware.library import Library

# parse the agreements into text chunks and store them in the library
msa_lib = Library().create_new_library("msa_lib503_635")
msa_lib.add_files(agreements_path)
```
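If you want to confirm what was parsed, you can peek at the library card, which tracks counts as files are added (a quick sanity check; the exact field names may vary across `llmware` versions):

```python
# Optional sanity check on what the parser produced
card = msa_lib.get_library_card()
print("documents parsed: ", card["documents"])
print("text chunks (blocks): ", card["blocks"])
```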
Step 2: Locating MSA files 🔍
Let's say that we want to consider only the MSA (master services agreement) files from our sample contracts.
We can first create a `Query` object on our library, and then run `text_search_by_page()` to filter for the files that contain "master services agreement" on their first page.
```python
from llmware.retrieval import Query

q = Query(msa_lib)
query = "master services agreement"
results = q.text_search_by_page(query, page_num=1, results_only=False)
msa_docs = results["file_source"]
```
The `results` from the text search will be a dictionary containing detailed information about the text query. However, we're only interested in the `file_source` key, which holds the matching file names.
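One caveat: `file_source` typically lists one entry per matching text chunk, so the same file name can appear more than once. If you only want unique file names, a quick dedupe helps (a small assumption about the result shape; verify against your own output):

```python
# Dedupe in case a file matched on multiple chunks of its first page
msa_docs = list(set(msa_docs))
print("MSA files found: ", msa_docs)
```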
Great! We now have our MSA files.
Step 3: Loading our model 🪫🔋
Now, we can load in our model using the `Prompt` class in the `llmware` library.
```python
from llmware.prompts import Prompt

model_name = "llmware/dragon-yi-6b-gguf"
prompter = Prompt().load_model(model_name)
```
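If you'd like to browse the other models on offer before committing to one, `llmware` also ships a model catalog (assuming your installed version exposes `ModelCatalog` as below):

```python
from llmware.models import ModelCatalog

# list the registered models, e.g. other dragon and bling variants
for model_card in ModelCatalog().list_all_models():
    print(model_card["model_name"])
```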
Step 4: Analyzing our files using AI 🧠💡
Let's now iterate over our MSA files, and for each file, we'll:
- identify the text chunks containing the word "termination",
- add those chunks as a source for our AI call, and
- run the AI call "What is the notice for termination for convenience?"
We can start by performing a text query for the word "termination".
```python
for i, docs in enumerate(msa_docs):

    # restrict the "termination" query to the current document only
    doc_filter = {"file_source": [docs]}
    termination_provisions = q.text_query_with_document_filter("termination", doc_filter)
```
Still inside the loop, we'll then add these `termination_provisions` as a source to our model.
```python
sources = prompter.add_source_query_results(termination_provisions)
```
And with that done, we can call the LLM and ask it our question.
```python
response = prompter.prompt_with_source("What is the notice for termination for convenience?")
```
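Putting the three pieces together, the full loop looks roughly like this. Note the call to `clear_source_materials()` at the end of each iteration, which detaches the current contract's text before the next one is analyzed; the separate snippets above leave this step implicit:

```python
for i, docs in enumerate(msa_docs):

    print("\nAnalyzing contract:", i + 1, "-", docs)

    # find the text chunks in this document that mention "termination"
    doc_filter = {"file_source": [docs]}
    termination_provisions = q.text_query_with_document_filter("termination", doc_filter)

    # attach those chunks as the source for the model call
    sources = prompter.add_source_query_results(termination_provisions)

    # ask the model our question against the attached source
    response = prompter.prompt_with_source("What is the notice for termination for convenience?")

    # clear the source before moving on to the next contract
    prompter.clear_source_materials()
```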
Results! ✅
Let's print out our `response` and see what the output looks like.
```python
for i, resp in enumerate(response):
    print("update: llm response - ", resp)
```
Each response printed above is a Python dictionary with several keys, notably:

- `llm_response`: the answer to our question, which here is "30 days written notice"
- `evidence`: the text where the model found the answer to the question
The dictionary also contains detailed metadata about the AI call, but that isn't relevant to our example and has been omitted here.
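If you only want the answer and its supporting text, you can pull those two keys directly (assuming the response dictionaries are shaped as described above):

```python
for resp in response:
    print("answer:   ", resp["llm_response"])
    print("evidence: ", resp["evidence"])
```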
Human in the loop! 👤
We're not done just yet! If we wanted to generate a CSV report so a human can review the results of our analysis, we can make use of the `HumanInTheLoop` class. All we need to do is save the current state of our `prompter` and call the `export_current_interaction_to_csv()` function.
```python
from llmware.prompts import HumanInTheLoop

prompter.save_state()
csv_output = HumanInTheLoop(prompter).export_current_interaction_to_csv()
```
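The return value points at the generated report; printing it is an easy way to find the file (the exact shape of `csv_output` may vary across `llmware` versions):

```python
print("csv report created: ", csv_output)
```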
Conclusion
And that brings us to the end of our example! To summarize, we used the `llmware` library to:
- Load in sample files
- Filter only the MSA files
- Use the dragon-yi-6b-gguf model to ask questions about termination provisions
And remember that we did all of this on just a CPU! 💻
Check out our YouTube video on this example!
If you made it this far, thank you for taking the time to go through this topic with us ❤️! For more content like this, make sure to visit our dev.to page.
The source code for many more examples like this one is on our GitHub. Find this example here.
Our repository also contains a notebook for this example that you can run yourself using Google Colab, Jupyter or any other platform that supports .ipynb notebooks.
Join our Discord to interact with a growing community of AI enthusiasts of all levels of experience!
Please be sure to visit our website llmware.ai for more information and updates.