DEV Community: Simon Risman

🚀Supercharged SLIM models: Multistep RAG analysis that never leaves your CPU🧑‍💻

Simon Risman — Tue, 02 Jul 2024 13:30:11 +0000

Many of us are used to models running in the cloud: we send API calls to far-away servers, and our prompts get filed away as training data for the next wave of GPTs. And how else would this even work? Surely an individual laptop just doesn't have the power to manage and execute the workflows that a cloud-based service does.

Consider, for a moment, the mighty ant. At first glance, it may seem insignificant—a mere speck in the grand tapestry of nature. Yet, beneath its tiny exterior lies a powerhouse of strength, resilience, and ingenuity.

Enter SLIM - Structured Language Instruction Models.🏋️

These models are tiny and run comfortably on a CPU, but pack a punch when it comes to providing specialized, structured outputs. Instead of an AI summary being more bullet points or, god forbid, paragraphs, SLIM models output a variety of structured data like CSV, JSON, and SQL.

The highly specialized nature of the SLIM models is precisely what makes them so powerful - instead of a general solution to a large problem, stringing together a few SLIM models yields more robust performance with greater flexibility.

To show just how much these models can do, we are going to take a look at a tech tale worthy of invoking Gavin Belson: The partnership-turned-rivalry between Microsoft and IBM.

🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜

0️⃣ Setup 🛠️

Make sure you have installed llmware and imported the libraries we are going to use. The code below should get you all set up.

Run this command in your terminal:

pip install llmware

Add these imports to the top of your code.

import os
import shutil

from llmware.agents import LLMfx
from llmware.library import Library
from llmware.retrieval import Query
from llmware.configs import LLMWareConfig
from llmware.setup import Setup

1️⃣ Build a Knowledge Base of Microsoft Documents 📖

First we need to create a database to query. In your case it can be anything from customer service reports to earnings calls, but for now we will use a range of Microsoft-related documents.

def multistep_analysis():

    """ In this example, our objective is to research Microsoft history and rivalry in the 1980s with IBM. """

    #   step 1 - assemble source documents and create library

    print("update: Starting example - agent-multistep-analysis")

    #   note: the program attempts to automatically pull sample document into local path
    #   depending upon permissions in your environment, you may need to set up directly
    #   if you pull down the samples files with Setup().load_sample_files(), in the Books folder,
    #   you will find the source: "Bill-Gates-Biography.pdf"
    #   if you have pulled sample documents in the past, then to update to latest: set over_write=True

    print("update: Loading sample files")

    sample_files_path = Setup().load_sample_files(over_write=False)
    bill_gates_bio = "Bill-Gates-Biography.pdf"
    path_to_bill_gates_bio = os.path.join(sample_files_path, "Books", bill_gates_bio)

    microsoft_folder = os.path.join(LLMWareConfig().get_tmp_path(), "example_microsoft")

    print("update: attempting to create source input folder at path: ", microsoft_folder)

    if not os.path.exists(microsoft_folder):
        os.mkdir(microsoft_folder)
        os.chmod(microsoft_folder, 0o777)
        shutil.copy(path_to_bill_gates_bio,os.path.join(microsoft_folder, bill_gates_bio))

    #   create library
    print("update: creating library and parsing source document")

    LLMWareConfig().set_active_db("sqlite")
    my_lib = Library().create_new_library("microsoft_history_0210_1")
    my_lib.add_files(microsoft_folder)

2️⃣ Locate Mentions of IBM and Create an Agent to Process Them 🔍

In our first pass we focus on any mention of IBM. Because this is a multistep process, we can then analyze each of these instances on a more granular level.

    query = "ibm"
    search_results = Query(my_lib).text_query(query)
    print(f"update: executing query to filter to key passages - {query} - results found - {len(search_results)}")

    #   create an agent and load several tools that we will be using
    agent = LLMfx()
    agent.load_tool_list(["sentiment", "emotions", "topics", "tags", "ner", "answer"])

    #   load the search results into the agent's work queue
    agent.load_work(search_results)

3️⃣ Pick out Negative Sentiment 🫳

This is where you get to decide the depth of your analysis for each item. For our scenario, we want only the mentions of IBM that carry negative sentiment (evidence of the rivalry).

    while True:

        agent.sentiment()

        if not agent.increment_work_iteration():
            break

    #   analyze sections where the sentiment on ibm was negative
    follow_up_list = agent.follow_up_list(key="sentiment", value="negative")
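Conceptually, this step is just a filter over the per-passage results. The sketch below is not llmware's implementation; it is a plain-Python illustration on made-up data, and the names `results` and `select_indices` are hypothetical.

```python
# Hypothetical per-passage results; the shape only loosely mirrors a report entry.
results = [
    {"sentiment": ["positive"]},
    {"sentiment": ["negative"]},
    {"sentiment": ["neutral"]},
    {"sentiment": ["negative"]},
]


def select_indices(results, key, value):
    """Return the positions of entries whose `key` list contains `value`."""
    return [i for i, entry in enumerate(results) if value in entry.get(key, [])]


follow_up = select_indices(results, key="sentiment", value="negative")
print(follow_up)  # [1, 3]
```

Only the passages at those positions go on to the more expensive deep-dive step.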

4️⃣ Deep Dive Analysis 🤿

Now that we have picked out the instances we want to explore further, we arm our agent with tools - each tool is a SLIM model built to perform at the highest level on each individual task, providing a comprehensive overview of the pertinent results.

    for job_index in follow_up_list:

        # follow-up 'deep dive' on selected text that references ibm negatively
        agent.set_work_iteration(job_index)
        agent.exec_multitool_function_call(["tags", "emotions", "topics", "ner"])
        agent.answer("What is a brief summary?", key="summary")

    my_report = agent.show_report(follow_up_list)

    activity_summary = agent.activity_summary()

    for entries in my_report:
        print("my report entries: ", entries)

    return my_report

Results 🎉🎉🎉

Your multistep local RAG pipeline should return a report whose entries are filled-out dictionaries that look something like this:

report 1 entries:  {'sentiment': ['negative'], 'tags': '["IBM", "COBOL", "PL/1", "BAL", "OS/2", "Presentation Manager", "K.", "OS/2 1.0", "December 1987", "1.0"]', 'emotions': ['anger'], 'topics': ['ibm'], 'people': [], 'organization': ['IBM'], 'misc': ['OS/2', 'Presentation Manager'], 'summary': ['•IBM wrote "clunky" code that was top-heavy with lines of documentation to make the software "easy to service."\t\t•IBM wrote "clunky" code that was top-heavy with lines of documentation to make the software "easy to service."\t\t•IBM wrote "clunky" code that was top-heavy with lines of documentation to make the software "easy to service."\t\t•IBM wrote'], 'source': {'query': 'ibm', '_id': '174', 'text': 'writers were contemptuous of IBM and it\'s coding   culture. In the increasingly irrelevant world of IBM, the classical   languages were COBOL, PL/1, and BAL (Basic Assembly Language),   NOT C!    J.    In addition, IBM wrote "clunky" code that was top-heavy with lines of   documentation to make the software "easy to service."   K.    Finally, in December 1987 OS/2 1.0 without Presentation Manager ', 'doc_ID': 1, 'block_ID': 173, 'page_num': 35, 'content_type': 'text', 'author_or_speaker': 'IBM_User', 'special_field1': '', 'file_source': 'Bill-Gates-Biography.pdf', 'added_to_collection': 'Mon Jul  1 13:14:36 2024', 'table': '', 'coords_x': 162, 'coords_y': 414, 'coords_cx': 34, 'coords_cy': 45, 'external_files': '', 'score': -4.040003091801133, 'similarity': 0.0, 'distance': 0.0, 'matches': [[29, 'ibm'], [100, 'ibm'], [215, 'ibm']], 'account_name': 'llmware', 'library_name': 'microsoft_history_0210_1'}}

The beauty of the output is its structured nature. You could easily hand your report off to another program, one that wouldn't need to waste precious time parsing natural language and could just flip to the right part of the dictionary. Besides saving time, you also gain accuracy and consistency.
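As a rough sketch of what such a downstream program might look like, here is a small function that flattens one report entry into a record. The `entry` below is an abbreviated copy of the sample output above, and `flatten_entry` is a hypothetical helper, not part of llmware.

```python
import json

# Abbreviated copy of the sample entry above; only a few fields are kept.
entry = {
    "sentiment": ["negative"],
    "tags": '["IBM", "COBOL", "PL/1", "BAL", "OS/2"]',
    "emotions": ["anger"],
    "organization": ["IBM"],
    "source": {"page_num": 35, "file_source": "Bill-Gates-Biography.pdf"},
}


def flatten_entry(entry):
    """Pull a few fields out of one report entry into a flat record."""
    tags = entry.get("tags", [])
    # In the sample output, 'tags' arrives as a JSON-encoded string, not a list.
    if isinstance(tags, str):
        tags = json.loads(tags)
    sentiment = entry.get("sentiment") or [None]
    source = entry.get("source", {})
    return {
        "file": source.get("file_source"),
        "page": source.get("page_num"),
        "sentiment": sentiment[0],
        "organizations": entry.get("organization", []),
        "tags": tags,
    }


record = flatten_entry(entry)
print(record["file"], record["page"], record["sentiment"], record["tags"])
```

No natural-language parsing is involved: every field is a dictionary lookup, plus one `json.loads` call for the tags.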

If you want to learn more, below is a video walkthrough for this tutorial.

The full code for this example can be found in our GitHub repo.

If you have any questions, or would like to learn more about LLMWARE, come to our Discord community. Click here to join. See you there!🚀🚀🚀

Please be sure to visit our website llmware.ai for more information and updates.

Are we all prompting wrong? Balancing Creativity and Consistency in RAG.

Simon Risman — Mon, 17 Jun 2024 18:44:07 +0000

For a Boston native like myself, there are few things more heartwarming than Artificial Intelligence understanding the brilliance of Good Will Hunting. A few cursory prompts reveal that it views it as a "must-watch tale of redemption and self discovery".

But a slightly closer look reveals what many users of LLMs have accepted as a given: slight variations in the responses to an otherwise consistent topic. This is the result of stochastic generation.

Stochastic generation 🤖

This is a fairly common term: from online bootcamps to college lectures, students of AI are familiar with the concept. For those who need a quick refresher, here is the three-step generation loop that many LLMs follow.

LLMs are trained using a next-token prediction task, where the model predicts the next token in a sequence based on the previous tokens. This process involves:

Tokenized Input: The input text is converted into a sequence of numbers (tokens).
Probability Distribution: The model generates a probability distribution over the possible next tokens.
Sampling Algorithm: This distribution is passed through a sampling algorithm to select the next token.
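The three steps above can be sketched in a few lines of plain Python. This is a toy illustration only: the vocabulary and the `fake_logits` scores are made up, and a real model would produce the scores with a neural network over a far larger vocabulary.

```python
import math
import random

# Toy vocabulary; real tokenizers use subword units and far larger vocabularies.
VOCAB = ["good", "will", "hunting", "is", "great"]


def tokenize(text):
    """Step 1: map each known word to its integer id."""
    return [VOCAB.index(word) for word in text.lower().split() if word in VOCAB]


def softmax(logits):
    """Step 2: turn raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(probs, rng):
    """Step 3: draw one token id according to the distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)
token_ids = tokenize("Good Will Hunting is")
fake_logits = [0.1, 0.2, 0.3, 0.4, 2.0]  # stand-in for a model's output scores
probs = softmax(fake_logits)
next_id = sample_token(probs, rng)
print(token_ids, VOCAB[next_id])
```

Because step 3 draws from a distribution rather than always taking the top entry, two runs with different random seeds can produce different next tokens.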

The probabilistic elements that this process introduces enable LLMs to generate more captivating dialogue, produce novel images, and creatively praise award-winning films.

Randomness and RAG 🎰

When building RAG-based applications, we are often not as concerned with creativity as we are with facts. When dealing with facts, we want as little randomness involved as possible. In other words, instead of sampling from a probability distribution, it's beneficial to just take the token with the maximum likelihood every time.
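To make the difference concrete, here is a small sketch comparing the two strategies on a made-up distribution. The tokens and probabilities are hypothetical, and `greedy` and `sampled` are illustrative helpers rather than anything from llmware.

```python
import random

# Hypothetical next-token probabilities for a yes/no style question.
probs = {"yes": 0.7, "no": 0.2, "maybe": 0.1}


def greedy(probs):
    """Always return the single most likely token."""
    return max(probs, key=probs.get)


def sampled(probs, rng):
    """Draw a token in proportion to its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


rng = random.Random(42)
greedy_picks = {greedy(probs) for _ in range(100)}
sampled_picks = {sampled(probs, rng) for _ in range(100)}
print(greedy_picks)   # {'yes'}
print(sampled_picks)  # typically contains more than one distinct token
```

Greedy selection gives the same answer on every call, which is usually what you want when the answer should come straight from the retrieved passage.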

LLMWARE allows you to explore how random your generated results are, as well as adjust how random you want them to be. Here's a quick demonstration:

Demo 🙌

Load the model

from llmware.models import ModelCatalog

model = ModelCatalog().load_model("bling-stablelm-3b-tool",
                                  sample=True,
                                  temperature=0.3,
                                  get_logits=True,
                                  max_output=123)

In the load_model method, we make a few important selections. The bling 3B is one of our newest and highest performing models.

Setting the sample parameter to True or False lets you switch between a stochastic approach and a top-token (greedy) approach.

The temperature can be an important tool to control the randomness of the output, with lower values making responses more focused and higher values increasing diversity in the generated text.
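The usual way temperature works is by dividing the model's raw scores by the temperature before the softmax. The sketch below shows that standard technique on made-up scores; it is an illustration of the general idea, not a description of how llmware applies the setting internally.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then normalize with softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for temperature in (0.3, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At a low temperature nearly all of the probability mass lands on the top token, while a high temperature flattens the distribution and gives the runners-up a real chance of being sampled.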

These key settings will allow you to see what kind of approach you want to take when it comes to the probabilistic nature of your model.

Run a simple inference model on some sample text

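# note: `sample` here is a string holding the text passage to analyze, not the sample=True flag above
response = model.inference("What is a list of the key points?", sample)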

This step is where your model does the heavy lifting, analyzing and summarizing the sample text you pass in.

Run a sampling analysis

sampling_analysis = ModelCatalog().analyze_sampling(response)
print("sampling analysis: ", sampling_analysis)

Now you get to see the analytics - giving you a better idea of how heavily your model samples from the lower-probability side of the distribution.

This analysis will include what percentage of the tokens selected by the model were also the highest probability output, and will note cases where the not-top-token was selected.

In cases where the top token was not selected, the below code will print out the exact entries of the outputs, including their token rank.

for i, entries in enumerate(sampling_analysis["not_top_tokens"]):
    print("sampled choices: ", i, entries)
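To see what these numbers mean, here is a sketch that computes the same kind of statistics by hand. The `choices` list is a hypothetical log of generated tokens and their ranks; it is not the actual structure returned by `analyze_sampling`.

```python
# Hypothetical log of sampled tokens: (chosen token, its rank in the distribution).
# Rank 0 means the most likely token was chosen.
choices = [("IBM", 0), ("wrote", 0), ("clunky", 1), ("code", 0), ("slowly", 3)]


def top_token_share(choices):
    """Fraction of generated tokens that were also the most likely token."""
    if not choices:
        return 0.0
    top_hits = sum(1 for _, rank in choices if rank == 0)
    return top_hits / len(choices)


def not_top(choices):
    """Tokens where sampling picked something other than the top candidate."""
    return [(token, rank) for token, rank in choices if rank != 0]


print(f"{top_token_share(choices):.0%}")  # 60%
print(not_top(choices))                   # [('clunky', 1), ('slowly', 3)]
```

A share close to 100% means sampling rarely changed the outcome; a lower share means the output depends more heavily on the random draw.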

All these tools can help you make an informed decision on whether you want your model to think a little outside the box, or stick to the most likely answer. To see this process in action, check out our YouTube video on consistent LLM output generation.

The full code for this example can be found in our GitHub repo.
