<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Simon Risman</title>
    <description>The latest articles on DEV Community by Simon Risman (@simon_risman_1991f73692bc).</description>
    <link>https://dev.to/simon_risman_1991f73692bc</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1586924%2F8ae5f214-f7c1-45fa-a550-cc17159bd85e.jpg</url>
      <title>DEV Community: Simon Risman</title>
      <link>https://dev.to/simon_risman_1991f73692bc</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/simon_risman_1991f73692bc"/>
    <language>en</language>
    <item>
      <title>🚀Supercharged SLIM models Multistep RAG analysis that never leaves your CPU🧑‍💻</title>
      <dc:creator>Simon Risman</dc:creator>
      <pubDate>Tue, 02 Jul 2024 13:30:11 +0000</pubDate>
      <link>https://dev.to/llmware/supercharged-slim-models-multistep-rag-analysis-that-never-leaves-your-cpu-26j0</link>
      <guid>https://dev.to/llmware/supercharged-slim-models-multistep-rag-analysis-that-never-leaves-your-cpu-26j0</guid>
      <description>&lt;p&gt;Many of us are used to models running in the cloud, sending API calls to far-away servers, filed away as training data for the next wave of GPTs. And how else would this even work? Surely an individual laptop just doesn't have the power to manage and execute the workflows that a cloud based service does. &lt;/p&gt;

&lt;p&gt;Consider, for a moment, the mighty ant. At first glance, it may seem insignificant—a mere speck in the grand tapestry of nature. Yet, beneath its tiny exterior lies a powerhouse of strength, resilience, and ingenuity. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExaW1xM2hjeWRpb3MzZDVrNmFyMmZwNW82dGFxcHoxcnVoank1b3UzZiZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/26BRsfBU7ct4jgaCQ/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExaW1xM2hjeWRpb3MzZDVrNmFyMmZwNW82dGFxcHoxcnVoank1b3UzZiZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/26BRsfBU7ct4jgaCQ/giphy.gif" width="400" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Enter &lt;strong&gt;SLIM&lt;/strong&gt; - &lt;strong&gt;S&lt;/strong&gt;tructured &lt;strong&gt;L&lt;/strong&gt;anguage &lt;strong&gt;I&lt;/strong&gt;nstruction &lt;strong&gt;M&lt;/strong&gt;odels.🏋️
&lt;/h2&gt;

&lt;p&gt;These models are tiny and run comfortably on a CPU, but pack a punch when it comes to providing specialized, structured outputs. Instead of an AI summary being more bullet points or god forbid paragraphs, SLIM models output a variety of structured data like CSVs, JSONs, and SQL. &lt;/p&gt;

&lt;p&gt;The highly specialized nature of the SLIM models is precisely what makes them so powerful - instead of a general solution to a large problem, stringing together a few SLIM models yields more robust performance with greater flexibility.&lt;/p&gt;

&lt;p&gt;To show just how much these models can do, we are going to take a look at a tech tale worthy of invoking Gavin Belson: The partnership-turned-rivalry between Microsoft and IBM.&lt;/p&gt;

&lt;p&gt;🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜&lt;/p&gt;

&lt;h2&gt;
  
  
  0️⃣ Setup 🛠️
&lt;/h2&gt;

&lt;p&gt;Make sure you have installed llmware and imported the libraries we are going to use. The code below should get you all set up. &lt;/p&gt;

&lt;p&gt;Run this command in your terminal&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install llmware
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add these imports to the top of your code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;shutil&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llmware.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMfx&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llmware.library&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Library&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llmware.retrieval&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Query&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llmware.configs&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMWareConfig&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llmware.setup&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Setup&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  1️⃣ Build a Knowledge Base of Microsoft Documents 📖
&lt;/h2&gt;

&lt;p&gt;First we need to create a database to query. In your case it can be anything from customer service reports to earnings calls, but for now we will use a range of Microsoft-related documents.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;multistep_analysis&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;

    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt; In this example, our objective is to research Microsoft history and rivalry in the 1980s with IBM. &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;#   step 1 - assemble source documents and create library
&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: Starting example - agent-multistep-analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;#   note: the program attempts to automatically pull sample document into local path
&lt;/span&gt;    &lt;span class="c1"&gt;#   depending upon permissions in your environment, you may need to set up directly
&lt;/span&gt;    &lt;span class="c1"&gt;#   if you pull down the samples files with Setup().load_sample_files(), in the Books folder,
&lt;/span&gt;    &lt;span class="c1"&gt;#   you will find the source: "Bill-Gates-Biography.pdf"
&lt;/span&gt;    &lt;span class="c1"&gt;#   if you have pulled sample documents in the past, then to update to latest: set over_write=True
&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: Loading sample files&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;sample_files_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Setup&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;load_sample_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;over_write&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;bill_gates_bio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bill-Gates-Biography.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;path_to_bill_gates_bio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_files_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Books&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bill_gates_bio&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;microsoft_folder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;LLMWareConfig&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get_tmp_path&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;example_microsoft&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: attempting to create source input folder at path: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;microsoft_folder&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;microsoft_folder&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mkdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;microsoft_folder&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chmod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;microsoft_folder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mo"&gt;0o777&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;shutil&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path_to_bill_gates_bio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;microsoft_folder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bill_gates_bio&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;#   create library
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: creating library and parsing source document&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nc"&gt;LLMWareConfig&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;set_active_db&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqlite&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;my_lib&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Library&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;create_new_library&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;microsoft_history_0210_1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;my_lib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;microsoft_folder&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2️⃣ Locate Mentions of IBM and Create an Agent to Process Them 🔍
&lt;/h2&gt;

&lt;p&gt;In our first pass  we focus on any mention of IBM, and since we have a multistep process we can analyse these instances on a more granular level.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ibm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;search_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_lib&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;text_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: executing query to filter to key passages - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; - results found - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;#   create an agent and load several tools that we will be using
&lt;/span&gt;    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMfx&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_tool_list&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;emotions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tags&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;#   load the search results into the agent's work queue
&lt;/span&gt;    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_work&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3️⃣ Pick out Negative Sentiment 🫳
&lt;/h2&gt;

&lt;p&gt;This is where you get to decide the depth of your analysis for each item. For our scenario, we want only mentions of IBM that carry negative sentiment (evidence of the rivalry.)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sentiment&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;increment_work_iteration&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

    &lt;span class="c1"&gt;#   analyze sections where the sentiment on ibm was negative
&lt;/span&gt;    &lt;span class="n"&gt;follow_up_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;follow_up_list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;negative&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4️⃣ Deep Dive Analysis 🤿
&lt;/h2&gt;

&lt;p&gt;Now that we have picked out the instances we want to explore further, we arm our agent with tools - each tool is a SLIM model built to perform at the highest level on each individual task, providing a comprehensive overview of the pertinent results.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;job_index&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;follow_up_list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

        &lt;span class="c1"&gt;# follow-up 'deep dive' on selected text that references ibm negatively
&lt;/span&gt;        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_work_iteration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job_index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec_multitool_function_call&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tags&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;emotions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is a brief summary?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;my_report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show_report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;follow_up_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;activity_summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;activity_summary&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;my_report&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my report entries: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;my_report&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Results 🎉🎉🎉
&lt;/h2&gt;

&lt;p&gt;Your multi-step local RAG model should return a filled out dictionary that looks something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;report 1 entries:  {'sentiment': ['negative'], 'tags': '["IBM", "COBOL", "PL/1", "BAL", "OS/2", "Presentation Manager", "K.", "OS/2 1.0", "December 1987", "1.0"]', 'emotions': ['anger'], 'topics': ['ibm'], 'people': [], 'organization': ['IBM'], 'misc': ['OS/2', 'Presentation Manager'], 'summary': ['•IBM wrote "clunky" code that was top-heavy with lines of documentation to make the software "easy to service."\t\t•IBM wrote "clunky" code that was top-heavy with lines of documentation to make the software "easy to service."\t\t•IBM wrote "clunky" code that was top-heavy with lines of documentation to make the software "easy to service."\t\t•IBM wrote'], 'source': {'query': 'ibm', '_id': '174', 'text': 'writers were contemptuous of IBM and it\'s coding   culture. In the increasingly irrelevant world of IBM, the classical   languages were COBOL, PL/1, and BAL (Basic Assembly Language),   NOT C!    J.    In addition, IBM wrote "clunky" code that was top-heavy with lines of   documentation to make the software "easy to service."   K.    Finally, in December 1987 OS/2 1.0 without Presentation Manager ', 'doc_ID': 1, 'block_ID': 173, 'page_num': 35, 'content_type': 'text', 'author_or_speaker': 'IBM_User', 'special_field1': '', 'file_source': 'Bill-Gates-Biography.pdf', 'added_to_collection': 'Mon Jul  1 13:14:36 2024', 'table': '', 'coords_x': 162, 'coords_y': 414, 'coords_cx': 34, 'coords_cy': 45, 'external_files': '', 'score': -4.040003091801133, 'similarity': 0.0, 'distance': 0.0, 'matches': [[29, 'ibm'], [100, 'ibm'], [215, 'ibm']], 'account_name': 'llmware', 'library_name': 'microsoft_history_0210_1'}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The beauty of the output is the structured nature. You could easily write a program to hand off your report to, a program that wouldn't need to waste precious time parsing natural language and could just flip to the right part of the dictionary. Besides saving time, you also increase accuracy and consistency. &lt;/p&gt;

&lt;p&gt;If you want to learn more, below is a video walkthrough for this tutorial.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/y4WvwHqRR60"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;The full code for this example can be found in our &lt;a href="https://github.com/llmware-ai/llmware/blob/main/examples/SLIM-Agents/agent-multistep-analysis.py" rel="noopener noreferrer"&gt;Github repo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you have any questions, or would like to learn more about LLMWARE, come to our Discord community. Click &lt;a href="https://discord.gg/6nNVdn7A" rel="noopener noreferrer"&gt;here&lt;/a&gt; to join. See you there!🚀🚀🚀&lt;/p&gt;

&lt;p&gt;Please be sure to visit our website &lt;a href="https://llmware.ai/" rel="noopener noreferrer"&gt;llmware.ai&lt;/a&gt; for more information and updates.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Are we all prompting wrong? Balancing Creativity and Consistency in RAG.</title>
      <dc:creator>Simon Risman</dc:creator>
      <pubDate>Mon, 17 Jun 2024 18:44:07 +0000</pubDate>
      <link>https://dev.to/llmware/are-we-all-prompting-wrong-balancing-creativity-and-consistency-in-rag-20fm</link>
      <guid>https://dev.to/llmware/are-we-all-prompting-wrong-balancing-creativity-and-consistency-in-rag-20fm</guid>
      <description>&lt;p&gt;For a Boston native like myself, there are few things more heartwarming than Artificial Intelligence understanding the brilliance of &lt;em&gt;Good Will Hunting&lt;/em&gt;. A few cursory prompts reveal that it views it as a "must-watch tale of redemption and self discovery". &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Figxw6cfugs9mb2ivue1i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Figxw6cfugs9mb2ivue1i.png" alt="Chat Will Hunting" width="800" height="683"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But a slightly closer look reveals what many users of LLMs have accepted as a given - slight variations on an otherwise consistent topic. This is the result of Stochastic Generation. &lt;/p&gt;

&lt;h2&gt;
  
  
  Stochastic generation 🤖
&lt;/h2&gt;

&lt;p&gt;This is a fairly common term, from online bootcamps to college lectures, students of AI are familiar with this concept. For those who need a quick refresher, here is the 3-step generation loop that many LLMs follow. &lt;/p&gt;

&lt;p&gt;LLMs are trained using a next-token prediction task, where the model predicts the next token in a sequence based on the previous tokens. This process involves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tokenized Input&lt;/strong&gt;: The input text is converted into a sequence of numbers (tokens).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Probability Distribution&lt;/strong&gt;: The model generates a probability distribution over the possible next tokens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sampling Algorithm&lt;/strong&gt;: This distribution is passed through a sampling algorithm to select the next token.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The probabilistic elements that this process introduces enables LLMs to generate more captivating dialogue, novel images, and creatively praise award-winning films. &lt;br&gt;
&lt;a href="https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExbHpta3l3aXZicWp4dDhjbjc1b3p5MjVmMnZvcGQ2d3FqMzNnZDF2ZyZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/7pLv68ItwBaHS/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExbHpta3l3aXZicWp4dDhjbjc1b3p5MjVmMnZvcGQ2d3FqMzNnZDF2ZyZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/7pLv68ItwBaHS/giphy.gif" width="480" height="270"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Randomness and RAG 🎰
&lt;/h2&gt;

&lt;p&gt;When building RAG based applications, we are often not as concerned with creativity as we are with facts. When dealing with facts, we want as little probability involved as possible. In other words, instead of sampling a probability distribution, its beneficial to just take the token with the maximum likelihood every time. &lt;/p&gt;

&lt;p&gt;LLMWARE allows you to explore how random your generated results are, as well as augment how random you want them to be. Heres a quick demonstration:&lt;/p&gt;
&lt;h2&gt;
  
  
  Demo 🙌
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Load the model&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ModelCatalog&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bling-stablelm-3b-tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                  &lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                  &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                  &lt;span class="n"&gt;get_logits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                  &lt;span class="n"&gt;max_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the load_model method, we make a few important selections. The bling 3B is one of our newest and highest performing models. &lt;/p&gt;

&lt;p&gt;Setting the sample attribute to True or False will allow you to change between a stochastic approach and a top-token model. &lt;/p&gt;

&lt;p&gt;The temperature can be an important tool to control the randomness of the output, with lower values making responses more focused and higher values increasing diversity in the generated text.&lt;/p&gt;

&lt;p&gt;These key settings will allow you to see what kind of approach you want to take when it comes to the probabilistic nature of your model. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Run a simple inference model on some sample text&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is a list of the key points?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This step is where your model is doing the heavy lifting, analyzing and summarizing the loaded-in documents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Run a sampling analysis&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;sampling_analysis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ModelCatalog&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;analyze_sampling&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sampling analysis: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sampling_analysis&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you get to see the analytics - giving you a better idea of how heavily your model samples from the lower-probability side of the distribution.&lt;/p&gt;

&lt;p&gt;This analysis will include what percentage of the tokens selected by the model were also the highest probability output, and will note cases where the not-top-token was selected. &lt;/p&gt;

&lt;p&gt;In cases where the top token was not selected, the below code will print out the exact entries of the outputs, including their token rank.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sampling_analysis&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;not_top_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sampled choices: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All these tools can help you make an informed decision on whether you want your model to think a little outside the box, or stick to the most likely answer. To see this process in action, check out our youtube video on consistent LLM output generation.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/iXp1tj-pPjM"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;The full code for this example can be found in our &lt;a href="https://github.com/llmware-ai/llmware/blob/main/examples/Models/adjusting_sampling_settings.py" rel="noopener noreferrer"&gt;Github repo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you have any questions, or would like to learn more about LLMWARE, come to our Discord community. Click &lt;a href="https://discord.gg/6nNVdn7A" rel="noopener noreferrer"&gt;here&lt;/a&gt; to join. See you there!🚀🚀🚀&lt;/p&gt;

&lt;p&gt;Please be sure to visit our website &lt;a href="https://llmware.ai/" rel="noopener noreferrer"&gt;llmware.ai&lt;/a&gt; for more information and updates.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>python</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
