<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jeffrey Ip</title>
    <description>The latest articles on DEV Community by Jeffrey Ip (@guybuildingai).</description>
    <link>https://dev.to/guybuildingai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1101474%2Fb2892dba-7ad6-4ef0-9143-68588fda282a.jpeg</url>
      <title>DEV Community: Jeffrey Ip</title>
      <link>https://dev.to/guybuildingai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/guybuildingai"/>
    <language>en</language>
    <item>
      <title>‼️ Top 5 Arize AI Competitors in 2025 💥⚖️</title>
      <dc:creator>Jeffrey Ip</dc:creator>
      <pubDate>Wed, 19 Mar 2025 08:45:52 +0000</pubDate>
      <link>https://dev.to/guybuildingai/-top-5-arize-ai-competitors-alternatives-compared-30cp</link>
      <guid>https://dev.to/guybuildingai/-top-5-arize-ai-competitors-alternatives-compared-30cp</guid>
      <description>&lt;h1&gt;
  
  
  TL;DR 📚
&lt;/h1&gt;

&lt;p&gt;Arize AI is great for LLM observability. Depending on what you need, though, its feature set might not be ideal for every use case. If you care more about evaluating the performance of your LLM apps, you should be using something like Confident AI or Giskard, while for tracing and observability there are cheaper options such as LangSmith.&lt;/p&gt;

&lt;p&gt;Let's begin!&lt;/p&gt;



&lt;h1&gt;
  
  
  What Do People Like &amp;amp; Dislike About Arize AI?
&lt;/h1&gt;

&lt;p&gt;Arize AI is a platform for monitoring and evaluating LLM applications. Its main product, Phoenix, is great for debugging LLM applications such as AI agents (for customer support, for example), and can be used to evaluate their performance as well. Originally built for more ML-focused workflows, the company has pivoted to focus on LLMs since 2023.&lt;/p&gt;

&lt;p&gt;However, depending on your use case (and budget) 🚩, you may find that Arize AI isn't the right fit. In this article, we'll list the top 5 alternatives you should consider in 2025 before deciding whether Arize is right for you.&lt;/p&gt;



&lt;h1&gt;
  
  
  1. &lt;a href="https://www.confident-ai.com/" rel="noopener noreferrer"&gt;Confident AI&lt;/a&gt; - The Eval-First LLM Observability Platform
&lt;/h1&gt;

&lt;p&gt;Confident AI is an eval-first cloud platform for LLM observability. Its evals are powered by &lt;a href="https://github.com/confident-ai/deepeval" rel="noopener noreferrer"&gt;DeepEval&lt;/a&gt;, one of the world's most popular and widely adopted open-source LLM evaluation frameworks. It is well known for unit-testing LLM applications ✅&lt;/p&gt;
&lt;h2&gt;
  
  
  Key differences
&lt;/h2&gt;

&lt;p&gt;As the name suggests, it is best known for its laser focus on evaluation-first LLM observability. While Arize AI offers one-off evaluations on spans and traces during debugging, Confident AI focuses on custom benchmarking of LLM applications instead.&lt;/p&gt;

&lt;p&gt;This means:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;More controllable and customizable metrics&lt;/li&gt;
&lt;li&gt;Evaluation results are more accurate&lt;/li&gt;
&lt;li&gt;Easier for entire organizations to collaborate on testing LLMs&lt;/li&gt;
&lt;li&gt;Scales to LLM safety testing&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With Confident AI, you can easily A/B test different iterations of your LLM application with a side-by-side, GitHub-like diff view of all regressions and improvements. Arize AI, on the other hand, focuses more on one-off debugging.&lt;/p&gt;
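&lt;p&gt;To make the diff view concrete, here's a minimal, hypothetical sketch of how a regression diff between two evaluation runs could be computed. The run data, test case names, and the 0.5 passing threshold are illustrative only, not Confident AI's actual API:&lt;/p&gt;

```python
# Hypothetical sketch: classify each shared test case between two evaluation
# runs as a regression, improvement, or unchanged. Names and the 0.5 passing
# threshold are illustrative only.

def diff_runs(baseline, candidate, threshold=0.5):
    """Compare two runs mapping test case IDs to metric scores."""
    report = {"regressions": [], "improvements": [], "unchanged": []}
    for case_id in sorted(set(baseline).intersection(candidate)):
        before, after = baseline[case_id], candidate[case_id]
        if before >= threshold > after:
            report["regressions"].append(case_id)   # was passing, now failing
        elif after >= threshold > before:
            report["improvements"].append(case_id)  # was failing, now passing
        else:
            report["unchanged"].append(case_id)
    return report

report = diff_runs(
    baseline={"tc1": 0.9, "tc2": 0.4, "tc3": 0.8},
    candidate={"tc1": 0.3, "tc2": 0.7, "tc3": 0.85},
)
# → {'regressions': ['tc1'], 'improvements': ['tc2'], 'unchanged': ['tc3']}
```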

&lt;p&gt;They also target slightly different stages of the LLM development lifecycle: Arize leans toward production monitoring, while Confident AI leans toward LLM evaluation before deployment. That said, each does the other part well too.&lt;/p&gt;
&lt;h2&gt;
  
  
  Side by side comparison summary
&lt;/h2&gt;

&lt;p&gt;We'll go down the feature list so you can make a more informed decision on which is best for you.&lt;/p&gt;
&lt;h3&gt;
  
  
  Metrics
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Confident AI&lt;/th&gt;
&lt;th&gt;Arize AI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Out-of-the-box metrics&lt;/td&gt;
&lt;td&gt;50+&lt;/td&gt;
&lt;td&gt;10+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAG metrics&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Conversation (chatbot) metrics&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent metrics&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Research-backed custom metrics&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deterministic LLM-as-a-judge metrics&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open-source&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integrates with any LLM&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Can be run locally in code&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Can be run on the cloud&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto improves&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For open-source users, Confident AI allows you to use literally any LLM for evaluation metrics, whereas Arize AI metrics are limited to the LLMs available on their platform.&lt;/p&gt;
&lt;h3&gt;
  
  
  Platform
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Confident AI&lt;/th&gt;
&lt;th&gt;Arize AI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Evaluation&lt;/td&gt;
&lt;td&gt;50+&lt;/td&gt;
&lt;td&gt;10+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dataset Management&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt Management&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Metric Alignment&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human Feedback&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM Observability&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;From afar, there are no big differences here. Let's dive deeper into each feature on the platform.&lt;/p&gt;
&lt;h3&gt;
  
  
  Evaluation
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Confident AI&lt;/th&gt;
&lt;th&gt;Arize AI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Testing Report&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A/B Experimentation&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;🚧&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Regression Testing&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Side-by-side evaluation comparisons&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Statistical metric scores analysis&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Publicly sharable testing report&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Advanced filtering for metrics/test cases&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human labelling for metrics&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Metric score accuracy validation (confusion matrix)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scales to safety testing&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Although Arize supports LLM evaluation features, many of them don't scale well beyond hundreds of test cases. This makes it harder to benchmark LLM applications, which is required for experimentation and for satisfying external stakeholders through publicly sharable testing reports.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://confident-ai.com" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Visit Confident AI Website&lt;/a&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  Dataset management
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Confident AI&lt;/th&gt;
&lt;th&gt;Arize AI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;100% DeepEval integration&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dataset editor&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Uploading datasets from CSV&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Push/pull datasets in code&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create datasets from production data&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create Datasets from testing reports&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Comment on datasets&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PIT recovery&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dataset backup&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Revision history&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom columns&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAG support&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Finalized" flag&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Arize and Confident AI are mostly the same here. Confident AI does have a slight edge in dataset collaboration: domain experts can leave comments on datasets while engineers focus on building to ensure those test cases pass.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt management
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Confident AI&lt;/th&gt;
&lt;th&gt;Arize AI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;100% DeepEval integration&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt editor&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt auto versioning&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dynamic prompt variables&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Can be used for evaluation&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Can be used for observability&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Arize has prompt management support, but it is not as tightly integrated with evaluation and observability.&lt;/p&gt;
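&lt;p&gt;For illustration, here's a minimal, dependency-free sketch of what versioned prompts with dynamic variables look like in practice. The prompt names, versions, and variables are hypothetical, not either platform's API:&lt;/p&gt;

```python
# Hypothetical sketch of versioned prompt management with dynamic variables.
# Prompt names, versions, and variables are illustrative only.
from string import Template

PROMPTS = {
    ("summarize", "v1"): Template("Summarize the following text:\n$document"),
    ("summarize", "v2"): Template("Summarize the following text in $style style:\n$document"),
}

def render(name, version, **variables):
    """Fetch a prompt version and substitute its dynamic variables."""
    return PROMPTS[(name, version)].substitute(**variables)

prompt = render("summarize", "v2", style="bullet-point", document="Arize vs Confident AI...")
```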

&lt;h3&gt;
  
  
  LLM observability
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Confident AI&lt;/th&gt;
&lt;th&gt;Arize AI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LLM output monitoring&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integrated LLM tracing&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom LLM tracing&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Has chatbot specific monitoring&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-time evaluations&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human feedback leaving&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Advance filtering for prompts and models&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Advance filtering for custom properties&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Arize AI focuses more on deep, detailed debugging while Confident AI's observability is for monitoring the output of each LLM interaction, with tracing included.&lt;/p&gt;

&lt;h3&gt;
  
  
  Support, Security &amp;amp; Others
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Confident AI (Premium)&lt;/th&gt;
&lt;th&gt;Arize AI (Pro)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Pricing&lt;/td&gt;
&lt;td&gt;Monthly&lt;/td&gt;
&lt;td&gt;Monthly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User roles &amp;amp; permissions&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SOC2 Type II&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HIPAA&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Retention&lt;/td&gt;
&lt;td&gt;1 year&lt;/td&gt;
&lt;td&gt;6 months&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Support&lt;/td&gt;
&lt;td&gt;Dedicated&lt;/td&gt;
&lt;td&gt;Community + email&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both providers are compliant at their enterprise tiers, however.&lt;/p&gt;

&lt;h2&gt;
  
  
  Which One Should You Choose?
&lt;/h2&gt;

&lt;p&gt;Arize AI is great for debugging, while Confident AI is great for LLM evaluation and benchmarking. Both have their own strengths and weaknesses, and their features overlap, but the choice ultimately depends on whether you care more about evaluation or observability.&lt;/p&gt;

&lt;p&gt;If you want to do both, go for Confident AI, since LLM observability is largely the same across providers anyway.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://confident-ai.com" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Visit Confident AI Website&lt;/a&gt;
&lt;/p&gt;



&lt;h1&gt;
  
  
  2. Giskard - Secure your LLM Agents
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Primary Use Case: Testing and debugging LLMs before deployment&lt;/li&gt;
&lt;li&gt;Features:

&lt;ul&gt;
&lt;li&gt;Focuses on pre-deployment testing and model validation.&lt;/li&gt;
&lt;li&gt;Helps identify biases, vulnerabilities, and errors in LLMs before production.&lt;/li&gt;
&lt;li&gt;Provides automated testing and explainability tools.&lt;/li&gt;
&lt;li&gt;Can be used for unit testing LLMs, similar to software testing frameworks.&lt;/li&gt;
&lt;li&gt;Helps ensure compliance with AI safety and fairness guidelines.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ideal for:&lt;/strong&gt; LLM teams who want to debug models, ensure robustness, and prevent issues before deployment.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Key Differences&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Arize AI&lt;/th&gt;
&lt;th&gt;Giskard&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Focus Area&lt;/td&gt;
&lt;td&gt;Production monitoring &amp;amp; observability&lt;/td&gt;
&lt;td&gt;Pre-deployment testing &amp;amp; debugging&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Drift Detection&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bias &amp;amp; Fairness Testing&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Root Cause Analysis&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Explainability&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automated LLM Testing&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compliance &amp;amp; Safety Checks&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Which One Should You Choose?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;If you need to monitor production LLMs for drift and performance degradation, go with Arize AI.&lt;/li&gt;
&lt;li&gt;If you need to test and debug LLMs before deployment, go with Giskard.&lt;/li&gt;
&lt;li&gt;If you need both testing and monitoring, you might consider using both together.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://www.giskard.ai/" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Visit Giskard Website&lt;/a&gt;
&lt;/p&gt;



&lt;h1&gt;
  
  
  3. Lunary - AI Developer Platform
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Primary Use Case: LLM &lt;strong&gt;chatbot&lt;/strong&gt; observability, evaluation, and debugging&lt;/li&gt;
&lt;li&gt;Features:

&lt;ul&gt;
&lt;li&gt;Provides logging, monitoring, and analytics for LLM chatbots.&lt;/li&gt;
&lt;li&gt;Tracks conversation history, user feedback, and model performance.&lt;/li&gt;
&lt;li&gt;Supports prompt versioning, management, and collaboration.&lt;/li&gt;
&lt;li&gt;Measures cost, latency, token usage, and model performance metrics.&lt;/li&gt;
&lt;li&gt;Offers both cloud-hosted and self-hosted deployment options.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Ideal for: Teams developing and deploying LLM chatbots who need monitoring, evaluation, and debugging capabilities.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Key Differences
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Arize AI&lt;/th&gt;
&lt;th&gt;Lunary&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Focus Area&lt;/td&gt;
&lt;td&gt;LLM agents&lt;/td&gt;
&lt;td&gt;LLM chatbots&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Root Cause Analysis&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logging and Tracing&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt Versioning&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost and Token Tracking&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automated LLM Testing&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compliance and Security&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅ (SOC 2, ISO 27001)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Which One Should You Choose?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;If you need to track, debug, and evaluate LLM applications with logging, analytics, and user feedback, choose Lunary. It helps teams iterate on prompts, detect hallucinations, and analyze costs before and after deployment.
&lt;/li&gt;
&lt;li&gt;If you need a solution focused on production monitoring with real-time performance tracking and drift detection, choose Arize AI. It is designed for LLM observability at scale, ensuring models remain reliable in deployment.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://lunary.ai/" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Visit Lunary Website&lt;/a&gt;
&lt;/p&gt;



&lt;h1&gt;
  
  
  4. Datadog - Modern monitoring &amp;amp; security
&lt;/h1&gt;

&lt;p&gt;Datadog is not LLM-specific, but it does offer some good features compared to Arize AI.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Primary Use Case: General application monitoring, logging, and infrastructure observability
&lt;/li&gt;
&lt;li&gt;Features:

&lt;ul&gt;
&lt;li&gt;Provides monitoring for servers, databases, and cloud services with real-time dashboards.
&lt;/li&gt;
&lt;li&gt;Supports log management, distributed tracing, and security monitoring across applications.
&lt;/li&gt;
&lt;li&gt;Detects anomalies and performance bottlenecks in system infrastructure.
&lt;/li&gt;
&lt;li&gt;Offers alerting and automated incident response for system failures.
&lt;/li&gt;
&lt;li&gt;Integrates with various cloud providers, DevOps tools, and microservices architectures.
&lt;/li&gt;
&lt;li&gt;Focuses on infrastructure observability rather than model-specific insights.
&lt;/li&gt;
&lt;li&gt;Weaker than Arize AI when it comes to LLM evaluation, as it lacks built-in model performance tracking, data drift detection, and detailed LLM-specific analytics.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Key differences
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Datadog&lt;/th&gt;
&lt;th&gt;Arize AI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Focus Area&lt;/td&gt;
&lt;td&gt;Infrastructure and application monitoring&lt;/td&gt;
&lt;td&gt;LLM observability and performance tracking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model Performance Monitoring&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM Drift Detection&lt;/td&gt;
&lt;td&gt;🚧&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logging and Tracing&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Root Cause Analysis&lt;/td&gt;
&lt;td&gt;🚧&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security and Compliance&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Application Performance Monitoring&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM Evaluation and Debugging&lt;/td&gt;
&lt;td&gt;🚧&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Which one should you choose?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;If you need to monitor system infrastructure, application performance, and security events, choose Datadog. It is best suited for DevOps and cloud-native applications that require end-to-end observability.
&lt;/li&gt;
&lt;li&gt;If you need to monitor LLMs in production, detect model drift, and analyze performance issues, choose Arize AI. Arize is significantly stronger in LLM evaluation, providing model-specific insights, drift detection, and performance tracking that Datadog lacks.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://www.datadoghq.com/" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Visit Datadog Website&lt;/a&gt;
&lt;/p&gt;



&lt;h1&gt;
  
  
  5. MLFlow - ML and GenAI made simple
&lt;/h1&gt;

&lt;p&gt;As the name suggests, MLFlow is undecided on whether to focus on traditional ML or GenAI. For this reason, I would not recommend MLFlow unless you also have traditional ML workflows to satisfy.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Primary Use Case: Experiment tracking, model management, and deployment&lt;/li&gt;
&lt;li&gt;Features:

&lt;ul&gt;
&lt;li&gt;Tracks and logs ML experiments, including parameters, metrics, and artifacts.&lt;/li&gt;
&lt;li&gt;Provides a central model registry for versioning and managing models.&lt;/li&gt;
&lt;li&gt;Supports model packaging for deployment in multiple environments.&lt;/li&gt;
&lt;li&gt;Enables reproducibility by logging code, dependencies, and environment configurations.&lt;/li&gt;
&lt;li&gt;Integrates with various ML frameworks, including TensorFlow, PyTorch, and Scikit-learn.&lt;/li&gt;
&lt;li&gt;Allows deployment of models to cloud services, on-premises, and edge devices.&lt;/li&gt;
&lt;li&gt;Offers APIs and a UI for tracking and managing experiments.&lt;/li&gt;
&lt;li&gt;Supports collaborative workflows for ML teams.&lt;/li&gt;
&lt;li&gt;Provides lifecycle management for ML models, from development to production.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
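
&lt;p&gt;To show the shape of what experiment tracking records, here's a dependency-free sketch. MLflow's real API uses calls like &lt;code&gt;mlflow.log_param&lt;/code&gt; and &lt;code&gt;mlflow.log_metric&lt;/code&gt;, so treat this only as an illustration of the data a tracked run holds:&lt;/p&gt;

```python
# Dependency-free sketch of what an experiment-tracking run records.
# MLflow's real API is mlflow.start_run() / mlflow.log_param() /
# mlflow.log_metric(); this only illustrates the recorded data.
import time
import uuid

class Run:
    def __init__(self, experiment):
        self.run_id = uuid.uuid4().hex   # unique ID, like MLflow's run ID
        self.experiment = experiment
        self.started_at = time.time()
        self.params = {}                 # hyperparameters: logged once per run
        self.metrics = {}                # metrics: a history of values per key

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, key, value):
        self.metrics.setdefault(key, []).append(value)

run = Run("prompt-tuning")
run.log_param("temperature", 0.2)
run.log_metric("answer_relevancy", 0.91)
run.log_metric("answer_relevancy", 0.94)
```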
&lt;h2&gt;
  
  
  Which One Should You Choose?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;If you need to track experiments, manage model versions, and handle deployments, choose MLflow. It is best suited for the early stages of the LLM lifecycle, helping teams develop, iterate, and manage models before deployment.&lt;/li&gt;
&lt;li&gt;If you need to monitor LLMs in production, detect performance issues, and analyze model drift, choose Arize AI. It is specifically designed for LLM observability, helping teams detect data drift, hallucinations, and degradation over time.&lt;/li&gt;
&lt;li&gt;If your workflow involves both training and production monitoring, consider using MLflow for experiment tracking and Arize AI for post-deployment monitoring.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://mlflow.org/" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Visit MLFlow Website&lt;/a&gt;
&lt;/p&gt;



&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;So there you have it, the list of the top 5 Arize AI alternatives in 2025. Think there's something I've missed? Comment below to let me know!&lt;/p&gt;

&lt;p&gt;Thank you for reading, and till next time 😊&lt;/p&gt;

</description>
    </item>
    <item>
      <title>🚨🏆 Top 5 Open-source Alternatives for LLM Development You Must Know About 💥</title>
      <dc:creator>Jeffrey Ip</dc:creator>
      <pubDate>Wed, 13 Nov 2024 11:55:27 +0000</pubDate>
      <link>https://dev.to/guybuildingai/top-5-open-source-alternatives-for-llm-development-you-must-know-about-p30</link>
      <guid>https://dev.to/guybuildingai/top-5-open-source-alternatives-for-llm-development-you-must-know-about-p30</guid>
      <description>&lt;h1&gt;
  
  
  TL;DR
&lt;/h1&gt;

&lt;p&gt;I'm not a fan of closed-source, especially when it comes to LLM application development 👎 But thankfully, for every closed-source framework in nature there ought to be an equal open-source counterpart (after all, isn't this the third law of some famous scientist🤔?).&lt;/p&gt;

&lt;p&gt;So in this article, as someone who has soaked and bathed in the LLM development rabbit hole 🐇 for more than two years, I'm going to walk you through the five most important open-source alternatives to closed-source LLM development solutions.&lt;/p&gt;

&lt;p&gt;Here we go!🙌&lt;br&gt;
&lt;/p&gt;


&lt;p&gt;&lt;em&gt;(PS. please star &lt;strong&gt;all&lt;/strong&gt; the open-source repos to help them gain awareness over their closed-source counterparts!)&lt;/em&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  1. &lt;a href="https://github.com/confident-ai/deepeval" rel="noopener noreferrer"&gt;DeepEval&lt;/a&gt; &amp;gt;&amp;gt; Humanloop
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;DeepEval is the open-source LLM evaluation framework&lt;/strong&gt;👍, while Humanloop, well, you guessed it, is a closed-source LLM evaluation solution 👎, with hidden API endpoints instead of open algorithms that let you see how evaluation is carried out.&lt;/p&gt;

&lt;p&gt;This is top of the list because, in my opinion, nothing is more important than open LLM evaluation 💯. Openness allows for transparency, and transparency, especially in LLM development, allows everyone to see what the standard of evaluation is. You wouldn't want some LLM safety evaluation to be done behind closed doors, while you simply get informed of the results, right?&lt;/p&gt;

&lt;p&gt;Open-source code gets scrutinized all the time, which helps make DeepEval much easier to use than Humanloop. Here's how to evaluate your LLM application in DeepEval:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepeval&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;evaluate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepeval.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AnswerRelevancyMetric&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepeval.test_case&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMTestCase&lt;/span&gt;

&lt;span class="n"&gt;test_case&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMTestCase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How many evaluation metrics does DeepEval offers?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;actual_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;14+ evaluation metrics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;metric&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AnswerRelevancyMetric&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_cases&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;test_case&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/confident-ai/deepeval" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Star DeepEval on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkb2pvk36eqd892p30ug.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkb2pvk36eqd892p30ug.png" alt="Github stars" width="800" height="292"&gt;&lt;/a&gt;&lt;br&gt;
(DeepEval's humble mascot wants a star)&lt;/p&gt;



&lt;h1&gt;
  
  
  2. &lt;a href="https://huggingface.co/meta-llama/Llama-3.1-8B" rel="noopener noreferrer"&gt;Llama3.1&lt;/a&gt; &amp;gt;&amp;gt; Open AI GPT-4
&lt;/h1&gt;

&lt;p&gt;I bet you saw this coming, but the next on the list is Meta's Llama3.1 vs OpenAI's GPT-4. Llama3.1 can be self-hosted, with much faster inference times and cheaper token costs than GPT-4, and the best part is it is open-source, with open-weights. What does this mean?&lt;/p&gt;

&lt;p&gt;This means if you want to customize Llama3.1, which by the way performs as well as GPT-4 on several benchmarks, you can do it yourself. The millions (or billions?) of dollars Meta spent on training Llama3.1 can be leveraged by literally anyone, and the weights are available for fine-tuning.&lt;/p&gt;

&lt;p&gt;Use Llama3.1 today:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;

&lt;span class="n"&gt;model_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Llama-3.1-8B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-generation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;torch_dtype&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hey how are you doing today?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;h1&gt;
  
  
  3. LangChain &amp;gt;&amp;gt; OpenAI Assistants
&lt;/h1&gt;

&lt;p&gt;I'm sorry OpenAI, it's you again. LangChain is an LLM application orchestration framework, while OpenAI Assistants is more like a RAG API, with the inner orchestration logic hidden behind closed doors.&lt;/p&gt;

&lt;p&gt;I know, I'm using big words and this sounds very fancy, but let me explain what each of those means. LLM orchestration simply means connecting external data to your LLM, and allowing your LLM to fetch data as it sees fit by giving it access to your APIs. For example, a chatbot built on LangChain that reports the daily weather would let the LLM fetch the latest weather for today. OpenAI Assistants hides all of this away behind an API.&lt;/p&gt;

&lt;p&gt;This means it's not as customizable, and quite frankly, I haven't met a single person who uses the Assistants feature even though it was hugely hyped up as the next big thing.&lt;/p&gt;
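&lt;p&gt;To make "orchestration" concrete, here is a minimal plain-Python sketch of the loop a framework like LangChain manages for you (illustrative only, not LangChain's actual API):&lt;/p&gt;

```python
# Toy orchestration loop: a stand-in "LLM" decides whether it needs a tool,
# and the orchestrator runs the tool and feeds the result back in.
def get_weather(city: str) -> str:
    # A real app would call a weather API here
    return f"It is sunny in {city} today."

TOOLS = {"get_weather": get_weather}

def fake_llm(prompt: str) -> str:
    # Stand-in for a real model: requests the weather tool when relevant
    if "weather" in prompt.lower():
        return "CALL get_weather(Paris)"
    return "Final answer: " + prompt

def orchestrate(user_input: str) -> str:
    reply = fake_llm(user_input)
    if reply.startswith("CALL "):
        # Parse the tool request, run the tool, hand the result back to the model
        name, arg = reply[5:].rstrip(")").split("(")
        tool_result = TOOLS[name](arg)
        reply = fake_llm(f"Tool said: {tool_result}")
    return reply

print(orchestrate("What's the weather like?"))
```

&lt;p&gt;LangChain takes care of this loop for you, along with tool schemas, memory, and error handling.&lt;/p&gt;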

&lt;p&gt;&lt;a href="https://github.com/langchain-ai/langchain" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Star on GitHub&lt;/a&gt;
&lt;/p&gt;



&lt;h1&gt;
  
  
  4. Flowise &amp;gt;&amp;gt; Relevance AI
&lt;/h1&gt;

&lt;p&gt;While LangChain lets you build your LLM application in code, Flowise lets you do it via a UI, in an open-source way. Simply drag and drop to customize what data your LLM has access to, and you're pretty much good to go. &lt;/p&gt;

&lt;p&gt;The alternative? A slightly less pretty paid version of it.&lt;/p&gt;

&lt;p&gt;Look at Flowise:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzenqdt5x00r2h1f4eivy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzenqdt5x00r2h1f4eivy.png" alt="Image description" width="800" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/FlowiseAI/Flowise" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Star on GitHub&lt;/a&gt;
&lt;/p&gt;



&lt;h1&gt;
  
  
  5. LiteLLM &amp;gt;&amp;gt; Martian AI
&lt;/h1&gt;

&lt;p&gt;LiteLLM is an open-source library that lets you swap one LLM for another in a single line of code, while Martian, on the other hand, is a closed-source version of it.&lt;/p&gt;

&lt;p&gt;Ok, not quite. In reality, although both let you swap LLMs, Martian AI is an LLM router, meaning it chooses the best LLM for each input to optimize for accuracy, speed, and cost.&lt;/p&gt;
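&lt;p&gt;A toy sketch of what a router does under the hood (plain Python, purely illustrative, not Martian's actual logic):&lt;/p&gt;

```python
def route(prompt: str) -> str:
    """Pick a model per input: a cheap, fast model for simple prompts,
    a stronger (pricier) one for everything else."""
    # Hypothetical heuristic; real routers score accuracy/speed/cost trade-offs
    return "gpt-3.5-turbo" if len(prompt) < 200 else "gpt-4"

print(route("Hi!"))  # short prompt, routed to the cheaper model
print(route("Summarize this 50-page contract... " * 50))  # routed to the stronger model
```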

&lt;p&gt;On this rare occasion, I'd have to say both are pretty good products.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/BerriAI/litellm" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Star on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;Martian AI here: &lt;a href="https://withmartian.com/" rel="noopener noreferrer"&gt;https://withmartian.com/&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;So there you have it, the list of open-source vs closed-source LLM development tools you should definitely know about. Think there's something I've missed? Comment below to let me know!&lt;/p&gt;

&lt;p&gt;Thank you for reading, and till next time 😊&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>opensource</category>
      <category>ai</category>
    </item>
    <item>
      <title>🚨💥 Top 5 Trending Open-source LLM Tools &amp; Frameworks You Must Know About ✨🚀</title>
      <dc:creator>Jeffrey Ip</dc:creator>
      <pubDate>Tue, 29 Oct 2024 11:00:38 +0000</pubDate>
      <link>https://dev.to/guybuildingai/top-5-trending-open-source-llm-tools-frameworks-you-must-know-about-1fk7</link>
      <guid>https://dev.to/guybuildingai/top-5-trending-open-source-llm-tools-frameworks-you-must-know-about-1fk7</guid>
      <description>&lt;h1&gt;
  
  
  TL;DR
&lt;/h1&gt;

&lt;p&gt;"Just the other day, I was deciding which set of LLM tools to use to build my company's upcoming customer support chatbot, and it was the easiest decision of my life!" - &lt;strong&gt;said no one ever&lt;/strong&gt; 🚩🚩🚩&lt;/p&gt;

&lt;p&gt;It has been a while since GPT-4's release, but it still seems like every week a new open-source LLM framework is launched, each doing the same thing as its 50+ other competitors while desperately explaining how it is better than its predecessor. At the end of the day, what developers like yourself really want is some quick personal anecdotes to weigh the pros and cons of each. 👨🏻‍💻&lt;/p&gt;

&lt;p&gt;So, as someone who has played around with more than a dozen open-source LLM tools, I'm going to tell you my top picks so you don't have to do the boring work yourself. 😌&lt;/p&gt;

&lt;p&gt;Let's begin!&lt;/p&gt;



&lt;h1&gt;
  
  
  1. &lt;a href="https://github.com/confident-ai/deepeval" rel="noopener noreferrer"&gt;DeepEval&lt;/a&gt; - The LLM Evaluation Framework
&lt;/h1&gt;

&lt;p&gt;DeepEval is the LLM tool that will &lt;strong&gt;help you quantify how well your LLM application, such as a customer support chatbot, is performing&lt;/strong&gt; 🎉 &lt;/p&gt;

&lt;p&gt;It takes top spot for &lt;strong&gt;two&lt;/strong&gt; simple reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Evaluating and testing LLM performance is IMO the most important part of building an LLM application.&lt;/li&gt;
&lt;li&gt;It is the best LLM evaluation framework available, and it's open-source 💯&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For those who might not be as familiar, &lt;strong&gt;LLM testing is hard because there are infinite possibilities in the responses an LLM can output&lt;/strong&gt;.😟 DeepEval makes testing LLM applications, such as those built with LlamaIndex or LangChain, extremely easy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Offers 14+ research-backed evaluation metrics&lt;/strong&gt; to test LLM applications built with literally any framework, like LangChain.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple to use&lt;/strong&gt;, great docs, and intuitive to understand. Perfect for those just getting started, but also technical enough for experts to dive deep into this rabbit hole.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrated with Pytest&lt;/strong&gt;, so you can include it in your CI/CD pipeline for deployment checks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Synthetic dataset generation&lt;/strong&gt; - to help you get started with evaluation in case you don't have a dataset ready.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM safety scanning&lt;/strong&gt; - automatically scans for safety risks like your LLM app being biased, toxic, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After testing, simply go back to the LLM tool used for building your application (my pick for which I'll reveal later) to iterate on areas that need improvement. Here's a quick example to test how relevant your LLM chatbot responses are:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepeval&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;evaluate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepeval.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AnswerRelevancyMetric&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepeval.test_case&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMTestCase&lt;/span&gt;

&lt;span class="n"&gt;test_case&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMTestCase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How many evaluation metrics does DeepEval offers?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;actual_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;14+ evaluation metrics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;metric&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AnswerRelevancyMetric&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_cases&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;test_case&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/confident-ai/deepeval" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Star DeepEval on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkb2pvk36eqd892p30ug.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkb2pvk36eqd892p30ug.png" alt="Github stars" width="800" height="292"&gt;&lt;/a&gt;&lt;br&gt;
(DeepEval's humble mascot wants a star)&lt;/p&gt;



&lt;h1&gt;
  
  
  2. LlamaIndex - Data Framework for LLM applications
&lt;/h1&gt;

&lt;p&gt;While DeepEval evaluates, LlamaIndex builds. LlamaIndex is a data framework specifically designed for integrating large language models (LLMs) with various data sources, particularly for applications involving retrieval-augmented generation (RAG).&lt;/p&gt;

&lt;p&gt;For those who haven't heard of RAG, it is the programmatic equivalent of pasting some text into ChatGPT and asking questions about it. RAG simply makes your LLM application aware of context it otherwise wouldn't have through the process of retrieval, and LlamaIndex makes this extremely easy.&lt;/p&gt;

&lt;p&gt;You see, a big problem in RAG is connecting to data sources and parsing unstructured data (like tables in PDFs) from them. It's not hard, but extremely tedious to build out.&lt;/p&gt;

&lt;p&gt;Here's an example of how you can use LlamaIndex to build a customer support chatbot to answer questions on your private data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llama_index.core&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;VectorStoreIndex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SimpleDirectoryReader&lt;/span&gt;

&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SimpleDirectoryReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;load_data&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;VectorStoreIndex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;query_engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_query_engine&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query_engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Some question about the data should go here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/run-llama/llama_index" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Star on GitHub&lt;/a&gt;
&lt;/p&gt;



&lt;h1&gt;
  
  
  3. Ollama - Get up and running with large language models
&lt;/h1&gt;

&lt;p&gt;Evaluating and building is important, but what about data privacy?&lt;/p&gt;

&lt;p&gt;Ollama is an interesting one because it unlocks LLMs to be used locally. It allows users to run, customize, and interact with LLMs directly on their own hardware, which can improve privacy, reduce dependency on cloud providers, and optimize latency for certain use cases. Ollama streamlines working with open-source LLMs, making them more accessible and manageable for individuals and organizations without needing extensive machine learning expertise or cloud infrastructure.&lt;/p&gt;

&lt;p&gt;For instance, using Ollama, you might load a model for customer support automation that runs locally on company servers. This setup keeps customer data private and may reduce response latency compared to a cloud-based setup. Ollama is also suitable for experimentation with open-source LLMs, like customizing models for specific tasks or integrating them into larger applications without relying on external cloud services.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# List available models
ollama list

# Run a model locally (e.g. Llama 3.1), passing a prompt
ollama run llama3.1 "Explain the benefits of using DSPy."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/ollama/ollama" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Star on GitHub&lt;/a&gt;
&lt;/p&gt;



&lt;h1&gt;
  
  
  4. Guidance
&lt;/h1&gt;

&lt;p&gt;Guidance is a framework designed to help developers craft dynamic, efficient prompts for large language models (LLMs). Unlike traditional prompt engineering, which often relies on fixed templates, Guidance allows prompts to be dynamically constructed, leveraging control structures like loops and conditionals directly within the prompt. This flexibility makes it especially useful for generating responses that require complex logic or customized outputs.&lt;/p&gt;

&lt;p&gt;A simple example is customer support bots: use conditionals to create prompts that adapt to the customer's question, providing personalized responses with a consistent tone and style instead of manual prompting.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;guidance&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize the Guidance model (e.g., OpenAI or another model API)
&lt;/span&gt;&lt;span class="n"&gt;gpt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;guidance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# You can specify another model if available
&lt;/span&gt;
&lt;span class="c1"&gt;# Define the dynamic prompt with Guidance
&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;guidance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
{{#if summary}}
Please provide a brief summary of the topic: {{topic}}.
{{else}}
Provide a detailed explanation of the topic: {{topic}}, covering all relevant details.
{{/if}}
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Set up input parameters
&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Machine Learning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;  &lt;span class="c1"&gt;# Toggle between True for summary or False for detailed response
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Run the prompt
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/guidance-ai/guidance" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Star on GitHub&lt;/a&gt;
&lt;/p&gt;



&lt;h1&gt;
  
  
  5. DSPy - Algorithmically optimize LM prompts and weights
&lt;/h1&gt;

&lt;p&gt;DSPy is designed to simplify the process of building applications that use LLMs, like those from OpenAI or Hugging Face. It makes it easier to manage how these models respond to inputs without needing to constantly adjust prompts or settings manually.&lt;/p&gt;

&lt;p&gt;The benefit of DSPy is that it simplifies and speeds up application development with large language models by separating logic from prompts, automating prompt tuning, and enabling flexible model switching. This means developers can focus on defining tasks rather than on technical details, making it easier to achieve reliable and consistent results.&lt;/p&gt;
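&lt;p&gt;The core idea, sketched in plain Python (names here are illustrative, not DSPy's real API): declare &lt;em&gt;what&lt;/em&gt; the task is as a signature, and let the framework decide &lt;em&gt;how&lt;/em&gt; to prompt for it.&lt;/p&gt;

```python
def make_predictor(signature: str):
    """Turn an 'inputs -> output' signature into a callable task.
    (A toy sketch of the declare-don't-prompt idea; not DSPy's actual API.)"""
    inputs, output = [s.strip() for s in signature.split("->")]
    def predict(**kwargs):
        # A real framework would compile an optimized prompt and call an LLM;
        # here we just show the prompt the signature expands to.
        filled = ", ".join(f"{k}: {v}" for k, v in kwargs.items())
        return f"Given {filled}, produce: {output}"
    return predict

qa = make_predictor("question -> answer")
print(qa(question="What does DSPy optimize?"))
```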

&lt;p&gt;However, I've personally found DSPy hard to get started with, which is why it sits lower on the list than the others.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/stanfordnlp/dspy" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Star on GitHub&lt;/a&gt;
&lt;/p&gt;




&lt;p&gt;So there you have it, the list of top LLM open-source trending tools and frameworks on Github you should definitely use to build your next LLM application. Think there's something I've missed? Comment below to let me know!&lt;/p&gt;

&lt;p&gt;Thank you for reading, and till next time 😊&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>opensource</category>
      <category>ai</category>
    </item>
    <item>
      <title>‼️ Top 5 Open-Source LLM Evaluation Frameworks in 2026 🎉🔥</title>
      <dc:creator>Jeffrey Ip</dc:creator>
      <pubDate>Wed, 17 Jan 2024 10:24:02 +0000</pubDate>
      <link>https://dev.to/guybuildingai/-top-5-open-source-llm-evaluation-frameworks-in-2024-98m</link>
      <guid>https://dev.to/guybuildingai/-top-5-open-source-llm-evaluation-frameworks-in-2024-98m</guid>
      <description>&lt;h1&gt;
  
  
  TL;DR
&lt;/h1&gt;

&lt;p&gt;"I feel like there are more LLM evaluation solutions out there than there are problems around LLM evaluation" - said Dylan, a Head of AI at a Fortune 500 company. &lt;/p&gt;

&lt;p&gt;And I couldn't agree more - it seems like every week there is a new open-source repo trying to do the same thing as the other 30+ frameworks that already exist. At the end of the day, what Dylan really wants is a framework, package, library, whatever you want to call it, that would simply quantify the performance of the LLM (application) he's looking to productionize.&lt;/p&gt;

&lt;p&gt;So, as someone who was once in Dylan's shoes, I've compiled a list of the top 5 LLM evaluation frameworks that exist in 2025 😌&lt;/p&gt;

&lt;p&gt;Let's begin!&lt;/p&gt;



&lt;h1&gt;
  
  
  1. &lt;a href="https://github.com/confident-ai/deepeval" rel="noopener noreferrer"&gt;DeepEval&lt;/a&gt; - The Evaluation Framework for LLMs
&lt;/h1&gt;

&lt;p&gt;DeepEval is your favorite evaluation framework's favorite evaluation framework. It takes top spot for a variety of reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Offers &lt;strong&gt;14+ LLM evaluation metrics (both for RAG and fine-tuning use cases)&lt;/strong&gt;, updated with the latest research in the LLM evaluation field. These metrics include:

&lt;ul&gt;
&lt;li&gt;G-Eval&lt;/li&gt;
&lt;li&gt;Summarization&lt;/li&gt;
&lt;li&gt;Hallucination&lt;/li&gt;
&lt;li&gt;Faithfulness&lt;/li&gt;
&lt;li&gt;Contextual Relevancy&lt;/li&gt;
&lt;li&gt;Answer Relevancy&lt;/li&gt;
&lt;li&gt;Contextual Recall&lt;/li&gt;
&lt;li&gt;Contextual Precision&lt;/li&gt;
&lt;li&gt;RAGAS&lt;/li&gt;
&lt;li&gt;Bias&lt;/li&gt;
&lt;li&gt;Toxicity&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Most metrics are self-explaining, which means DeepEval's metrics will literally tell you why the metric score cannot be higher.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Offers &lt;strong&gt;modular components that are extremely simple to plug in and use&lt;/strong&gt;. You can easily mix and match different metrics, or even use DeepEval to build your own evaluation pipeline if needed.&lt;/li&gt;
&lt;li&gt;Treats evaluations as unit tests. With &lt;strong&gt;an integration for Pytest&lt;/strong&gt;, DeepEval is a complete testing suite most developers are familiar with.&lt;/li&gt;
&lt;li&gt;Allows you to generate synthetic datasets using your knowledge base as context, or load datasets from CSVs, JSONs, or Hugging Face.&lt;/li&gt;
&lt;li&gt;Offers a hosted platform with a generous free tier to &lt;strong&gt;run real-time evaluations in production&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With Pytest Integration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepeval&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;assert_test&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepeval.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HallucinationMetric&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepeval.test_case&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMTestCase&lt;/span&gt;

&lt;span class="n"&gt;test_case&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMTestCase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How many evaluation metrics does DeepEval offers?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;actual_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;14+ evaluation metrics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DeepEval offers 14+ evaluation metrics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;metric&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HallucinationMetric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minimum_score&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_hallucination&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
  &lt;span class="nf"&gt;assert_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_case&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then in the CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;deepeval test run test_file.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or, without Pytest (perfect for notebook environments):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepeval&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;evaluate&lt;/span&gt;
&lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;test_case&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/confident-ai/deepeval" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Star DeepEval on GitHub&lt;/a&gt;
&lt;/p&gt;



&lt;h1&gt;
  
  
  2. MLflow LLM Evaluate - LLM Model Evaluation
&lt;/h1&gt;

&lt;p&gt;MLflow is a modular and simplistic package that allows you to run evaluations in your own evaluation pipelines. It offers RAG evaluation and QA evaluation.&lt;/p&gt;

&lt;p&gt;MLflow is good because of its intuitive developer experience. For example, this is how you run evaluations with MLflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;eval_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;targets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ground_truth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question-answering&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/confident-ai/deepeval" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Star MLFlow on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;h1&gt;
  
  
  3. RAGAs - Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
&lt;/h1&gt;

&lt;p&gt;Third on the list, RAGAs was built for RAG pipelines. It offers 5 core metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faithfulness&lt;/li&gt;
&lt;li&gt;Contextual Relevancy&lt;/li&gt;
&lt;li&gt;Answer Relevancy&lt;/li&gt;
&lt;li&gt;Contextual Recall&lt;/li&gt;
&lt;li&gt;Contextual Precision&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These metrics make up the final RAGAs score. DeepEval and RAGAs have very similar implementations, but RAGAs metrics are not self-explaining, making it much harder to debug unsatisfactory results.  &lt;/p&gt;
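&lt;p&gt;As a rough illustration, combining the per-metric scores into one number might look like this (a sketch; not necessarily RAGAs' exact aggregation):&lt;/p&gt;

```python
from statistics import harmonic_mean

# Component scores from a hypothetical evaluation run
scores = {"faithfulness": 0.9, "answer_relevancy": 0.8, "context_recall": 0.7}

# A harmonic mean penalizes any single weak metric more than an arithmetic mean would
overall = harmonic_mean(scores.values())
print(round(overall, 3))  # -> 0.792
```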

&lt;p&gt;RAGAs is third on the list primarily because it also incorporates the latest research into its RAG metrics and is simple to use, but it isn't higher because of its limited features and inflexibility as a framework.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ragas&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;evaluate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Dataset&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-openai-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# prepare your huggingface dataset in the format
# Dataset({
#     features: ['question', 'contexts', 'answer', 'ground_truths'],
#     num_rows: 25
# })
&lt;/span&gt;
&lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dataset&lt;/span&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/explodinggradients/ragas" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Star RAGAs on GitHub&lt;/a&gt;
&lt;/p&gt;



&lt;h1&gt;
  
  
  4. Deepchecks
&lt;/h1&gt;

&lt;p&gt;Deepchecks stands out as it is geared more towards evaluating the LLM itself, rather than LLM systems/applications. &lt;/p&gt;

&lt;p&gt;It is not higher on the list due to its complicated developer experience (seriously, try setting it up yourself and let me know how it goes), but its open-source offering is unique in that it focuses heavily on dashboards and visualization, making it easy for users to explore evaluation results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fknagi6pkcj8d0vednz74.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fknagi6pkcj8d0vednz74.png" alt=" " width="800" height="319"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/deepchecks/deepchecks" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Star Deepchecks on GitHub&lt;/a&gt;
&lt;/p&gt;



&lt;h1&gt;
  
  
  5. Arize AI Phoenix
&lt;/h1&gt;

&lt;p&gt;Last on the list, Arize AI evaluates LLM applications through extensive observability into LLM traces. However, it is quite limited, offering only three evaluation criteria:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;QA Correctness&lt;/li&gt;
&lt;li&gt;Hallucination&lt;/li&gt;
&lt;li&gt;Toxicity&lt;/li&gt;
&lt;/ol&gt;
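&lt;p&gt;For context, criteria like these are typically implemented as LLM-as-a-judge checks that grade an output against its retrieved context. As a rough illustration (a toy sketch, not the Phoenix API), a hallucination score boils down to "what fraction of the answer is unsupported by the context":&lt;/p&gt;

```python
# Illustrative sketch only -- not the Phoenix API. A real hallucination
# eval asks an LLM judge; this toy version just checks whether each
# sentence of the answer shares any words with the retrieved context.

def toy_hallucination_score(answer: str, context: str) -> float:
    """Fraction of answer sentences with zero word overlap with the context."""
    context_words = set(context.lower().split())
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    if not sentences:
        return 0.0
    unsupported = sum(
        1 for s in sentences
        if not set(s.lower().split()) & context_words
    )
    return unsupported / len(sentences)

# "It also brews coffee." shares no words with the context, so half of
# the answer is flagged as unsupported.
score = toy_hallucination_score(
    answer="Phoenix traces LLM calls. It also brews coffee.",
    context="Phoenix offers tracing and observability for LLM calls.",
)
```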

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6b8iomrnqz1djeyva3qy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6b8iomrnqz1djeyva3qy.png" alt=" " width="800" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Arize-ai/phoenix" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Star Phoenix on GitHub&lt;/a&gt;
&lt;/p&gt;




&lt;p&gt;So there you have it, the list of top LLM evaluation frameworks GitHub has to offer in 2025. Think there's something I've missed? Comment below to let me know! &lt;/p&gt;

&lt;p&gt;Thank you for reading, and till next time 😊&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>opensource</category>
      <category>ai</category>
    </item>
    <item>
      <title>🔪 6 Killer Open-Source Libraries to Achieve AI Mastery in 2024 🔥🪄</title>
      <dc:creator>Jeffrey Ip</dc:creator>
      <pubDate>Mon, 11 Dec 2023 10:41:32 +0000</pubDate>
      <link>https://dev.to/confidentai/6-killer-open-source-libraries-to-achieve-ai-mastery-before-2024-4p1c</link>
      <guid>https://dev.to/confidentai/6-killer-open-source-libraries-to-achieve-ai-mastery-before-2024-4p1c</guid>
      <description>&lt;h1&gt;
  
  
  TL;DR
&lt;/h1&gt;

&lt;p&gt;AI has traditionally been a very difficult field for web developers to break into... until now 😌 With the introduction of large language models (LLMs) like ChatGPT, it seems like nowadays anyone can become an AI engineer. But make no mistake, this couldn't be further from the truth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In this article, I will reveal the current top AI libraries that make a mediocre AI engineer exceptional.&lt;/strong&gt; As an ex-Google, ex-Microsoft AI engineer myself, I will show you how exceptional AI engineers use these libraries to build great applications.&lt;/p&gt;

&lt;p&gt;Are you ready to up-skill yourself and be one step closer to becoming an AI wizard before 2024? Let's begin 🤗&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/IN8gg3Gci335S/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/IN8gg3Gci335S/giphy.gif" width="499" height="199"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  1. &lt;a href="https://github.com/confident-ai/deepeval" rel="noopener noreferrer"&gt;DeepEval&lt;/a&gt; - Open-source Evaluation Infrastructure for LLMs
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4mefpmuz0bkbgzyy5ujd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4mefpmuz0bkbgzyy5ujd.png" alt="Image description" width="800" height="407"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A good engineer can build, but an exceptional engineer can communicate the value of what they've built. DeepEval allows you to do exactly that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DeepEval allows you to unit test and debug your large language model (LLM, or just AI) applications at scale in both development and production&lt;/strong&gt; in under 10 lines of code.&lt;/p&gt;

&lt;p&gt;Why is this valuable, you ask? Because companies nowadays want to be seen as innovative AI companies, so stakeholders prefer engineers who can not just build like an indie hacker, but who know how to &lt;strong&gt;ship reliable AI applications&lt;/strong&gt; like a seasoned AI specialist.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepeval&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;assert_test&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepeval.test_case&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMTestCase&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepeval.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AnswerRelevancyMetric&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;chatbot&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_chatbot&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
   &lt;span class="nb"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How to become an AI engineer in 2024?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
   &lt;span class="n"&gt;test_case&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMTestCase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;actual_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;chatbot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
   &lt;span class="n"&gt;answer_relevancy_metric&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AnswerRelevancyMetric&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
   &lt;span class="nf"&gt;assert_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_case&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;answer_relevancy_metric&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/confident-ai/deepeval" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Star DeepEval on GitHub&lt;/a&gt;
&lt;/p&gt;



&lt;h1&gt;
  
  
  2. Unstructured - Pre-processing for Unstructured Data
&lt;/h1&gt;

&lt;p&gt;LLMs thrive because they are versatile and can handle a large variety of inputs, but not all. Unstructured helps you easily transform unstructured data like webpages, PDFs, and tables into formats LLMs can read.&lt;/p&gt;

&lt;p&gt;What does this mean? It means you can now customize your AI application on your internal documents. Unstructured is amazing because, in my opinion, it operates at the right level of abstraction: it handles the boring hard work while giving you enough control as a developer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;unstructured.partition.auto&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;partition&lt;/span&gt;

&lt;span class="n"&gt;elements&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;example-docs/eml/fake-email.eml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;elements&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/Unstructured-IO/unstructured" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Star Unstructured&lt;/a&gt;
&lt;/p&gt;



&lt;h1&gt;
  
  
  3. Airbyte - Data Integration for LLMs
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvvz4bwmtilvs5qwu8qgg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvvz4bwmtilvs5qwu8qgg.png" alt="Image description" width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With Airbyte you can connect data sources and move data around: most of what you need to build a real-time AI application. It allows your LLM to access information beyond the data it was trained on.&lt;/p&gt;

&lt;p&gt;Like Unstructured, Airbyte provides a great level of abstraction over the work an AI engineer does.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/airbytehq/airbyte" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Star Airbyte&lt;/a&gt;
&lt;/p&gt;



&lt;h1&gt;
  
  
  4. Qdrant - Fast Vector Search for LLMs
&lt;/h1&gt;

&lt;p&gt;Ever wondered what happens if you feed too much data to ChatGPT? That's right, you'll hit the model's context length limit.&lt;/p&gt;

&lt;p&gt;That's because LLMs cannot take in infinite information. To help with that, we need a way to feed in only the relevant information, and this process is known as retrieval augmented generation (RAG). &lt;a href="https://www.confident-ai.com/blog/what-is-retrieval-augmented-generation" rel="noopener noreferrer"&gt;Here's another great article on what RAG is.&lt;br&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Qdrant is a vector database that helps you do just that. It stores and retrieves relevant information at blazing-fast speed, ensuring your application stays up to date with the real world.&lt;/p&gt;
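&lt;p&gt;Under the hood, a vector database embeds your documents and then ranks them by similarity to an embedded query. Here's a minimal sketch of the idea, using toy vectors rather than real embeddings and plain Python rather than the Qdrant client:&lt;/p&gt;

```python
# Illustrative sketch of what a vector database does -- not the Qdrant
# client API. Real embeddings would come from an embedding model.
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# A tiny in-memory "collection": document id mapped to its embedding.
collection = {
    "doc_pricing": [0.9, 0.1, 0.0],
    "doc_refunds": [0.1, 0.9, 0.1],
    "doc_signup": [0.0, 0.2, 0.9],
}

def search(query_vector, top_k=1):
    """Return the ids of the top_k documents most similar to the query."""
    ranked = sorted(
        collection,
        key=lambda doc_id: cosine_similarity(collection[doc_id], query_vector),
        reverse=True,
    )
    return ranked[:top_k]
```

In a RAG pipeline, the text of the returned documents is then pasted into the LLM prompt, so the model only sees the relevant information.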

&lt;p&gt;&lt;a href="https://github.com/qdrant/qdrant" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Star Qdrant&lt;/a&gt;
&lt;/p&gt;



&lt;h1&gt;
  
  
  5. MemGPT - Memory Management for LLMs
&lt;/h1&gt;

&lt;p&gt;So Qdrant helps give LLMs "long-term memory", but what happens if there's too much to "remember"? MemGPT helps you manage memory for this exact use case. &lt;/p&gt;

&lt;p&gt;MemGPT is like a cache for vector databases, with its own way of clearing cached entries. It helps you manage redundant information in your knowledge bases, making your AI application more performant and accurate.&lt;/p&gt;
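&lt;p&gt;To make the cache analogy concrete: the core idea is an eviction policy over a bounded memory budget, much like an LRU cache. Here's a minimal toy sketch of that idea (not the MemGPT API):&lt;/p&gt;

```python
# Illustrative sketch of the caching idea behind LLM memory management,
# not the MemGPT API. An LRU policy keeps the most recently used
# memories and evicts the rest once the budget is exceeded.
from collections import OrderedDict

class ToyMemory:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = OrderedDict()

    def remember(self, key: str, value: str) -> None:
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used

    def recall(self, key: str):
        if key not in self.store:
            return None
        self.store.move_to_end(key)  # mark as recently used
        return self.store[key]

memory = ToyMemory(capacity=2)
memory.remember("user_name", "Ada")
memory.remember("user_goal", "learn RAG")
memory.recall("user_name")              # touch: user_name is now most recent
memory.remember("user_tone", "formal")  # over budget: evicts user_goal
```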

&lt;p&gt;&lt;a href="https://github.com/cpacker/MemGPT" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Star MemGPT&lt;/a&gt;
&lt;/p&gt;



&lt;h1&gt;
  
  
  6. LiteLLM - LLM proxy
&lt;/h1&gt;

&lt;p&gt;LiteLLM is a proxy for multiple LLMs. It is great for experimentation and, combined with &lt;a href="https://github.com/confident-ai/deepeval" rel="noopener noreferrer"&gt;DeepEval&lt;/a&gt;, allows you to pick the best model for your use case. The best part? It lets you call any supported model through the same OpenAI interface.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;litellm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;completion&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="c1"&gt;## set ENV variables 
&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-openai-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; 

&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello, how are you?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;

&lt;span class="c1"&gt;# openai call
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/BerriAI/litellm" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Star LiteLLM&lt;/a&gt;
&lt;/p&gt;



&lt;h1&gt;
  
  
  Closing Remarks
&lt;/h1&gt;

&lt;p&gt;That's all folks, thanks for reading, and I hope you learned a few things along the way!&lt;/p&gt;

&lt;p&gt;Please like and comment if you enjoyed this article, and as always, don't forget to give open-source some love by starring their repos as a token of appreciation 🌟.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>opensource</category>
      <category>ai</category>
    </item>
    <item>
      <title>⤴️How to build a Midjourney API with Nest.js 🚀</title>
      <dc:creator>Jeffrey Ip</dc:creator>
      <pubDate>Wed, 29 Nov 2023 10:59:17 +0000</pubDate>
      <link>https://dev.to/confidentai/how-to-build-unofficial-midjourney-api-with-nestjs-1lnd</link>
      <guid>https://dev.to/confidentai/how-to-build-unofficial-midjourney-api-with-nestjs-1lnd</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;In this post I will show you the architecture for building an unofficial Midjourney API with TypeScript and Nest.js.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Famqplf9oby0m0pvbzau4.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Famqplf9oby0m0pvbzau4.gif" alt="Lets go" width="480" height="320"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  DeepEval - open-source evaluation framework for LLM applications
&lt;/h3&gt;

&lt;h4&gt;
  
  
  DeepEval evaluates performance based on metrics such as factual consistency, accuracy, answer relevancy
&lt;/h4&gt;

&lt;p&gt;We are just starting out.&lt;br&gt;
Can you help us with a star, please? 😽&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/confident-ai/deepeval" rel="noopener noreferrer"&gt;https://github.com/confident-ai/deepeval&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkb2pvk36eqd892p30ug.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkb2pvk36eqd892p30ug.png" alt="Github stars" width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  ➡️ Please &lt;strong&gt;Like, Heart and star this article&lt;/strong&gt; 
&lt;/h2&gt;
&lt;h2&gt;
  
  
  What are we going to build?
&lt;/h2&gt;

&lt;p&gt;To start off, let's understand how Midjourney works on Discord. People use simple commands to talk to an AI bot. The bot then takes these commands and creates pictures that match the descriptions given by the users.&lt;/p&gt;

&lt;p&gt;To mimic this behavior, we will need to create an API that interacts with a Discord bot. This bot can interact with Discord and hence send commands to Midjourney. Here is a high-level design of the command flow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb5dyahofq7juij9e83sj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb5dyahofq7juij9e83sj.png" alt="architecture of midjourney" width="800" height="264"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: This tutorial is for educational purposes only, to understand how Discord and Midjourney interact. It is not recommended for use as a production service or in any official project.&lt;/p&gt;

&lt;p&gt;The reason I chose Nest.js is that there are plenty of Python examples of how to build a Midjourney API, but no decent one that shows the process using JavaScript or Node.js. I also prefer Nest.js because it is well organized and makes bootstrapping a project easy.&lt;/p&gt;
&lt;h3&gt;
  
  
  Connecting a Midjourney's Discord bot
&lt;/h3&gt;

&lt;p&gt;In the absence of a formal API, a connection to Midjourney is facilitated via a Discord bot. The process includes the following steps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: In order for this to work, you must have a Midjourney subscription.&lt;/p&gt;


&lt;h4&gt;
  
  
  Step 1: Create a Discord bot.
&lt;/h4&gt;

&lt;p&gt;Take a moment to help me please. I am working very hard to create the best open source for LLM evaluation. &lt;/p&gt;

&lt;p&gt;Please give me a star - I will truly appreciate it. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/confident-ai/deepeval" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 DeepEval on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;The first step towards a complete Midjourney API is to create our Discord bot. Discord has an interface for creating bots for different purposes. Go ahead &lt;a href="https://discord.com/developers/applications" rel="noopener noreferrer"&gt;and create your MJ bot&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzwg95irzw6u0n7i020ii.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzwg95irzw6u0n7i020ii.png" alt="Discord bot configuration" width="800" height="268"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is a &lt;a href="https://www.upwork.com/resources/how-to-make-discord-bot" rel="noopener noreferrer"&gt;great article&lt;/a&gt; for creating a Discord bot.&lt;/p&gt;

&lt;p&gt;Once you've created the bot, you'll receive an invite link. Use it to invite the bot to your Discord server - we'll use this later to generate and receive images.&lt;/p&gt;




&lt;h4&gt;
  
  
  Step 2: Implementing /Imagine command
&lt;/h4&gt;

&lt;p&gt;After creating a &lt;a href="https://nestjs.com/" rel="noopener noreferrer"&gt;Nest.js&lt;/a&gt; app, go ahead and create your &lt;code&gt;discord&lt;/code&gt; module. This module will interact with our Discord server and Midjourney.&lt;/p&gt;

&lt;p&gt;Let's begin with our &lt;em&gt;controller&lt;/em&gt;, which should look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Controller&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;discord&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DiscordController&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kr"&gt;private&lt;/span&gt; &lt;span class="nx"&gt;discordService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DiscordService&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

  &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;imagine&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;imagine&lt;/span&gt;&lt;span class="p"&gt;(@&lt;/span&gt;&lt;span class="nd"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;prompt&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;any&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;discordService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendImagineCommand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, I have created a discord module with a single &lt;code&gt;POST&lt;/code&gt; endpoint. We will pass a &lt;code&gt;prompt&lt;/code&gt; in the body of our &lt;code&gt;discord/imagine&lt;/code&gt; request.&lt;/p&gt;

&lt;p&gt;Next, let's create our discord service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Injectable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DiscordService&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kr"&gt;private&lt;/span&gt; &lt;span class="nx"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;httpService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;HttpService&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;


  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;sendImagineCommand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;any&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;postUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://discord.com/api/v9/interactions&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;uniqueId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generateId&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;postPayload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;application_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;APPLICATION_ID&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;guild_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;GUILD_ID&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;channel_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;CHANNEL_ID&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;SESSION_ID&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;COMMAND_VERSION&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;IMAGINE_COMMAND_ID&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;imagine&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;prompt&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; --no &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;uniqueId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="na"&gt;application_command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;IMAGINE_COMMAND_ID&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;application_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;APPLICATION_ID&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;COMMAND_VERSION&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;default_member_permissions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;nsfw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;imagine&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Create images with Midjourney&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;dm_permission&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;contexts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
          &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;prompt&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;The prompt to imagine&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;attachments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;


    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;postHeaders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;authorization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;your&lt;/span&gt; &lt;span class="nx"&gt;auth&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;httpService&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;postUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;postPayload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;postHeaders&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toPromise&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;


    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;uniqueId&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;


  &lt;span class="nf"&gt;generateId&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will notice a few things here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;We are using the &lt;code&gt;https://discord.com/api/v9/interactions&lt;/code&gt; Discord endpoint to interact with the Discord server and send commands. This is the main entry point for requests to Midjourney. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We mimic a web-browser request to Discord, and here is the real "magic": after signing in to Midjourney on the web, go ahead and send the &lt;code&gt;/imagine&lt;/code&gt; command to Midjourney from your Discord web interface. &lt;br&gt;
Once the request is sent, you will see the imagine command in the &lt;code&gt;Network&lt;/code&gt; tab of your browser's developer tools; its payload is very similar to the one above. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Copy the relevant fields: &lt;code&gt;IMAGINE_COMMAND_ID&lt;/code&gt;, &lt;code&gt;COMMAND_VERSION&lt;/code&gt;, &lt;code&gt;SESSION_ID&lt;/code&gt;, &lt;code&gt;GUILD_ID&lt;/code&gt;, &lt;code&gt;CHANNEL_ID&lt;/code&gt; and &lt;code&gt;APPLICATION_ID&lt;/code&gt;. These will be used in our service. We also need to copy the &lt;code&gt;MIDJOURNEY_TOKEN&lt;/code&gt;, which is sent as part of the request. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Copy the &lt;code&gt;BOT_TOKEN&lt;/code&gt; from the bot application page we created earlier. It is required in order to communicate with our bot. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You will also notice the &lt;code&gt;uniqueId&lt;/code&gt; that we generate with our &lt;code&gt;generateId()&lt;/code&gt; function. We append it to the prompt via Midjourney's &lt;code&gt;--no&lt;/code&gt; parameter so we can later trace the request we sent to Discord and fetch the generated images. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
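&lt;p&gt;For reference, the values copied above typically end up as constants in the service, loaded from environment variables (this is a minimal sketch; the constant names simply mirror the placeholders used in this post, and you should never commit real tokens to source control):&lt;/p&gt;

```javascript
// Values copied from the browser's Network tab and the bot application page.
// Keep real tokens in environment variables, never hard-coded in the repo.
const IMAGINE_COMMAND_ID = process.env.IMAGINE_COMMAND_ID;
const COMMAND_VERSION = process.env.COMMAND_VERSION;
const SESSION_ID = process.env.SESSION_ID;
const GUILD_ID = process.env.GUILD_ID;
const CHANNEL_ID = process.env.CHANNEL_ID;
const APPLICATION_ID = process.env.APPLICATION_ID;
const MIDJOURNEY_TOKEN = process.env.MIDJOURNEY_TOKEN; // user token seen in the request
const BOT_TOKEN = process.env.BOT_TOKEN; // from the bot application page
```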

&lt;p&gt;Once this step is complete, you can call Discord with the &lt;code&gt;/imagine&lt;/code&gt; command and generate images with Midjourney. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reminder&lt;/strong&gt;: This is only a technical post describing how this flow works; using it in any real project is not recommended.&lt;/p&gt;




&lt;h4&gt;
  
  
  Step 3: Fetching generated images.
&lt;/h4&gt;

&lt;p&gt;Let's create a new controller to fetch images:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;mj/results/:id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;getMidjourneyResults&lt;/span&gt;&lt;span class="p"&gt;(@&lt;/span&gt;&lt;span class="nd"&gt;Param&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;discordService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getResultFromMidjourney&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;attachmentUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;image&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;attachments&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;url&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;attachmentUrl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;discordService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;processAndUpload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;attachmentUrl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;image&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We use the unique &lt;code&gt;id&lt;/code&gt; generated when creating our &lt;code&gt;/imagine&lt;/code&gt; request to fetch the results from Discord.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;getResultFromMidjourney&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;any&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;MIDJOURNEY_TOKEN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;channelUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`https://discord.com/api/v9/channels/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;CHANNEL_ID&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/messages?limit=50`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;httpService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;channelUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;headers&lt;/span&gt; &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;toPromise&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;matchingMessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
        &lt;span class="nx"&gt;message&lt;/span&gt;
          &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;components&lt;/span&gt;
          &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;component&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;component&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;components&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;label&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;U1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// means that we can upscale results&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;matchingMessage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;matchingMessage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attachments&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;matchingMessage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attachments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;attachment&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;matchingMessage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attachments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;attachment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchAndEncodeImage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;attachment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;matchingMessage&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
     &lt;span class="c1"&gt;// do something &lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;fetchAndEncodeImage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AxiosResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;any&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;httpService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;responseType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;arraybuffer&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;toPromise&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;base64&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;binary&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;base64&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;`data:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content-type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;&lt;span class="s2"&gt;;base64,&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;https://discord.com/api/v9/channels/${CHANNEL_ID}/messages?limit=50&lt;/code&gt; endpoint fetches the most recent messages in our Discord channel, from which we retrieve our images. &lt;/p&gt;

&lt;p&gt;Since Midjourney generation takes about 60 seconds or more, we need to poll this channel every few seconds to check for results. &lt;/p&gt;
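&lt;p&gt;The polling loop itself can be sketched with a small helper. This is a minimal, illustrative sketch; the helper name, interval, and attempt count are assumptions, not part of the service above:&lt;/p&gt;

```javascript
// Minimal polling helper: call fetchResults until it returns something
// truthy, waiting intervalMs between attempts, up to maxAttempts tries.
async function pollUntil(fetchResults, intervalMs = 10000, maxAttempts = 30) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await fetchResults();
    if (result) return result; // Midjourney finished and the message was found
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return null; // gave up: generation took too long or the id never matched
}
```

A client could then wrap a call to our &lt;code&gt;mj/results/:id&lt;/code&gt; endpoint in this helper and stop as soon as attachments appear.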

&lt;p&gt;Let's give it a try with &lt;code&gt;{ prompt: "a cat" }&lt;/code&gt; :&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe8y1fbxfunpovnju6n8r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe8y1fbxfunpovnju6n8r.png" alt="Midjourney API cat" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;That's it! You now have a working Midjourney API for testing and fun, and you've learned how the Discord bot architecture works. &lt;/p&gt;

&lt;h3&gt;
  
  
  Final thoughts
&lt;/h3&gt;

&lt;p&gt;You now have a bootstrap project that demonstrates how Discord communicates with Midjourney to generate amazing AI images.&lt;br&gt;
You can build a nice UI on top of it and have your own generative AI platform. Good luck!&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>midjourney</category>
      <category>api</category>
      <category>nestjs</category>
    </item>
    <item>
      <title>Why OpenAI Assistants is a Big Win for LLM Evaluation</title>
      <dc:creator>Jeffrey Ip</dc:creator>
      <pubDate>Fri, 24 Nov 2023 12:13:34 +0000</pubDate>
      <link>https://dev.to/confidentai/why-openai-assistants-is-a-big-win-for-llm-evaluation-540l</link>
      <guid>https://dev.to/confidentai/why-openai-assistants-is-a-big-win-for-llm-evaluation-540l</guid>
<description>&lt;p&gt;A week after the famous, or infamous, OpenAI Dev Day, we at Confident AI released JudgementalGPT — an LLM agent built with OpenAI’s Assistants API, designed specifically to evaluate other LLM applications. What started as an experimental idea quickly turned into a prototype we were eager to ship, as users reported that JudgementalGPT gave more accurate and reliable results than other state-of-the-art LLM-based evaluation approaches such as G-Eval.&lt;/p&gt;

&lt;p&gt;Understandably, knowing that &lt;a href="https://www.confident-ai.com/" rel="noopener noreferrer"&gt;Confident AI is the world’s first open-source evaluation infrastructure for LLMs&lt;/a&gt;, many demanded more transparency into how JudgementalGPT was built after our initial public release:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I thought it’s all open source, but it seems like JudgementalGPT, in particular, is a black box for users. It would be great if we had more knowledge on how this is built.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So here you go, dear anonymous internet stranger, this article is dedicated to you.&lt;/p&gt;



&lt;h1&gt;
  
  
  DeepEval - open-source evaluation framework for LLM applications
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;DeepEval is a framework that helps engineers evaluate the performance of their LLM applications by providing default metrics to measure hallucination, relevancy, and much more.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We are just starting out, and we really want to help more developers build safer AI apps. Would you mind giving it a star to spread the word, please? 🥺❤️🥺&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/confident-ai/deepeval" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 DeepEval on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkb2pvk36eqd892p30ug.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkb2pvk36eqd892p30ug.png" alt="Github stars" width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;h1&gt;
  
  
  Limitations of LLM-based evaluations
&lt;/h1&gt;

&lt;p&gt;The authors of G-Eval &lt;a href="https://arxiv.org/pdf/2303.16634.pdf" rel="noopener noreferrer"&gt;state that&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Conventional reference-based metrics, such as BLEU and ROUGE, have been shown to have relatively low correlation with human judgments, especially for tasks that require creativity and diversity.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For those who don’t already know, &lt;a href="https://www.confident-ai.com/blog/a-gentle-introduction-to-llm-evaluation" rel="noopener noreferrer"&gt;G-Eval is a framework that utilizes Large Language Models (LLMs) with chain-of-thought (CoT) processing to evaluate the quality of generated texts in a form-filling paradigm&lt;/a&gt;, and if you’ve ever tried implementing a version of your own, you’ll quickly find that using LLMs for evaluation presents its own set of problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unreliability&lt;/strong&gt; — although G-Eval uses a low-precision grading scale (1–5), which makes scores easier to interpret, scores can still vary a lot under the same evaluation conditions. This variability comes from an intermediate step in G-Eval that dynamically generates the evaluation steps, which increases the stochasticity of the final scores (and is also why providing an initial seed value doesn’t help).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Inaccuracy&lt;/strong&gt; — for certain tasks, one digit usually dominates (e.g., 3 for a grading scale of 1–5 using gpt-3.5-turbo). A way to get around this problem would be to take the probabilities of output tokens from an LLM to normalize the scores and take their weighted summation as the final score. But, unfortunately, this isn’t an option if you’re using OpenAI’s GPT models as an evaluator, since they deprecated the logprobs parameter a few months ago.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
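&lt;p&gt;To make the weighted-summation idea concrete, here is a minimal sketch (plain JavaScript; the function name and token probabilities are invented for illustration, not real model output):&lt;/p&gt;

```javascript
// Combine the probability the evaluator assigns to each score token into a
// single probability-weighted score, instead of taking the most likely digit.
function weightedScore(tokenProbs) {
  const total = Object.values(tokenProbs).reduce((sum, p) => sum + p, 0);
  return Object.entries(tokenProbs).reduce(
    (sum, [score, p]) => sum + Number(score) * (p / total),
    0
  );
}

// Even when "3" is the single most likely token, the weighted score
// reflects the whole distribution:
weightedScore({ 1: 0.05, 2: 0.1, 3: 0.5, 4: 0.25, 5: 0.1 }); // ≈ 3.25
```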

&lt;p&gt;In fact, &lt;a href="https://arxiv.org/pdf/2306.05685.pdf" rel="noopener noreferrer"&gt;another paper that explored LLM-as-a-judge&lt;/a&gt; pointed out that using LLMs as an evaluator is flawed in several ways. For example, GPT-4 gives preferential treatment to self-generated outputs, is not very good at math (but neither am I), and is prone to verbosity bias, meaning it favors longer, more verbose responses over shorter, more accurate alternatives. &lt;em&gt;(In fact, an initial study has shown that GPT-4 exhibits verbosity bias 8.75% of the time.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Can you see how this becomes a problem if you’re trying to evaluate a summarization task?&lt;/p&gt;



&lt;h1&gt;
  
  
  OpenAI Assistants offers a workaround to existing problems
&lt;/h1&gt;

&lt;p&gt;Here’s a surprise — JudgementalGPT isn’t composed of one evaluator built using the new OpenAI Assistant API, but multiple. That’s right, behind the scenes, JudgementalGPT is a proxy for multiple assistants that perform different evaluations depending on the evaluation task at hand. Here are the problems JudgementalGPT was designed to solve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bias&lt;/strong&gt; — we’re still experimenting with this (another reason for close-sourcing JudgementalGPT!), but assistants can write and execute code using the code interpreter tool. This means that, with a bit of prompt engineering, they can handle tasks that are more prone to logical fallacies, such as assessing coding or math problems, or tasks that require factual accuracy rather than preferential treatment of the model’s own outputs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reliability&lt;/strong&gt; — since we no longer require LLMs to dynamically generate CoTs/evaluation steps, we can enforce a set of rules for specific evaluation tasks. In other words, since we’ve pre-defined multiple sets of evaluation steps based on the evaluation task at hand, we have removed the biggest parameter contributing to stochasticity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Accuracy&lt;/strong&gt; — having a set of pre-defined evaluation steps for different tasks also means we can provide more guidance based on what we as humans actually expect from each evaluator, and quickly iterate on the implementation based on user feedback.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another insight we gained while integrating G-Eval into our open-source project DeepEval was that evaluation steps generated by LLMs tend to be arbitrary and generally do not provide helpful guidance for evaluation. Some of you might also wonder what happens when JudgementalGPT can’t find a suitable evaluator for a particular evaluation task. For this edge case, we default back to G-Eval. Here’s a quick architecture diagram of how JudgementalGPT works:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ed8bxf1noyles82a4mp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ed8bxf1noyles82a4mp.png" alt="Image description" width="800" height="193"&gt;&lt;/a&gt;&lt;/p&gt;
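&lt;p&gt;In spirit, the routing can be sketched like this (purely illustrative: the task keys, steps, and function name are invented here, since the real implementation is closed-source):&lt;/p&gt;

```javascript
// Sketch of "one evaluator per task" routing: each known evaluation task
// maps to a fixed, human-written list of evaluation steps.
const EVALUATION_STEPS = {
  summarization: [
    "Check that every claim in the summary appears in the original text.",
    "Penalize omission of key information.",
  ],
  code: [
    "Reason through (or execute) the code to verify correctness.",
    "Judge factual correctness before style.",
  ],
};

// Known tasks get deterministic, pre-defined steps; anything else falls
// back to G-Eval's dynamically generated chain-of-thought.
function selectEvaluator(task) {
  const steps = EVALUATION_STEPS[task];
  return steps
    ? { strategy: "predefined", steps }
    : { strategy: "g-eval", steps: null };
}
```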

&lt;p&gt;As I’m writing this article, I discovered a recent paper introducing &lt;a href="https://arxiv.org/pdf/2310.08491.pdf" rel="noopener noreferrer"&gt;Prometheus&lt;/a&gt;, “a fully open-source LLM that is on par with GPT-4’s evaluation capabilities when the appropriate reference materials (reference answer, score rubric) are accompanied”, which also requires evaluation steps to be explicitly defined.&lt;/p&gt;




&lt;h1&gt;
  
  
  Still, problems with LLM-based evaluation linger
&lt;/h1&gt;

&lt;p&gt;One unresolved issue is the accuracy problem caused by evaluation scores clustering around a single dominant digit. In theory, this phenomenon isn’t exclusive to older models and is likely to affect advanced versions like gpt-4-1106-preview as well. So, I’m keeping an open mind about how this might affect JudgementalGPT. We’re really looking forward to more research that’ll either back up what we think or give us a whole new perspective — either way, I’m all ears.&lt;/p&gt;

&lt;p&gt;Lastly, there can still be intricacies involved in defining our own set of evaluators. For example, just as G-Eval isn’t a one-size-fits-all solution, neither is a summarization or relevancy metric. Any metric that is subject to interpretation is guaranteed to disappoint users who expect something different. For now, the best solution is to have users clearly define their evaluation criteria to rid LLMs of any evaluation ambiguity.&lt;/p&gt;



&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;At the end of the day, there’s no one-size-fits-all solution for LLM-based evaluations, which is why engineers/data scientists are frequently disappointed by non-human evaluation scores. However, by defining specific and concise evaluation steps for different use cases, LLMs are able to navigate ambiguity better, as they are provided more guidance into what a human might expect for different evaluation criteria.&lt;/p&gt;

&lt;p&gt;P.S. By now, those of you who read between the lines will probably know the key to building a better evaluator is to tailor them for specific use cases, and OpenAI’s new Assistant API along with its code interpreter functionality is merely the icing on the cake (and a good marketing strategy!).&lt;/p&gt;

&lt;p&gt;So, dear anonymous internet stranger, I hope you’re satisfied, and till next time.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>opensource</category>
      <category>learning</category>
    </item>
    <item>
      <title>⤴️ Be a prompt engineer: Understanding Midjourney LLM</title>
      <dc:creator>Jeffrey Ip</dc:creator>
      <pubDate>Wed, 15 Nov 2023 09:43:06 +0000</pubDate>
      <link>https://dev.to/confidentai/be-a-prompt-engineer-understanding-midjourney-llm-464k</link>
      <guid>https://dev.to/confidentai/be-a-prompt-engineer-understanding-midjourney-llm-464k</guid>
      <description>&lt;h1&gt;
  
  
  TL;DR
&lt;/h1&gt;

&lt;p&gt;By now, you've probably seen those incredible AI-generated images on your social feeds and thought to yourself, &lt;strong&gt;"How are people making these amazing images?"&lt;/strong&gt; So you jump onto Midjourney, ready to create your own, but somehow, what comes out isn't quite what you pictured.&lt;/p&gt;

&lt;p&gt;Don't worry — I've got you covered.&lt;br&gt;
In order to get amazing images out of Midjourney, you need to be able to write prompts like a pro. Since Midjourney is based on an LLM, it all comes down to understanding its nature and how to get the most out of it.&lt;/p&gt;

&lt;p&gt;Do you want to become a Prompt Hero? Then this guide is for you!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fntctmiu0qh7j4wljj8a3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fntctmiu0qh7j4wljj8a3.png" alt="Midjourney prompt hero" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  DeepEval - open-source evaluation framework for LLM applications
&lt;/h3&gt;
&lt;h4&gt;
  
  
  DeepEval evaluates performance based on metrics such as factual consistency, accuracy, answer relevancy
&lt;/h4&gt;

&lt;p&gt;We are just starting out.&lt;br&gt;
Can you help us with a star, please? 😽&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/confident-ai/deepeval" rel="noopener noreferrer"&gt;https://github.com/confident-ai/deepeval&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkb2pvk36eqd892p30ug.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkb2pvk36eqd892p30ug.png" alt="Github stars" width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Creating your first Midjourney artwork
&lt;/h2&gt;

&lt;p&gt;To get started with Midjourney, sign up for &lt;a href="https://discord.com/register" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; and complete the registration process. Once you have Discord running, open the &lt;a href="//midjourney.com"&gt;Midjourney website&lt;/a&gt; and choose &lt;code&gt;Join Beta&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxkuyy1upc0w6da6ndeg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxkuyy1upc0w6da6ndeg.png" alt="Midjourney website" width="794" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After signing up, you can select a paid or a free plan. &lt;br&gt;
If you are on a free plan, you can generate images in any of the Midjourney newbies channels. Paid users can send commands directly to the Midjourney bot. &lt;/p&gt;

&lt;p&gt;To create your first image, type &lt;code&gt;/&lt;/code&gt; followed by the &lt;code&gt;imagine&lt;/code&gt; command. You can then enter a prompt (a description of the image to generate), for example: &lt;/p&gt;

&lt;p&gt;&lt;code&gt;/imagine&lt;/code&gt; &lt;code&gt;prompt: beautiful colorful horse&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4hkb2mjtottaat2jc4hb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4hkb2mjtottaat2jc4hb.png" alt="beautiful horse" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Midjourney will generate an image based on your prompt. &lt;/p&gt;
&lt;h2&gt;
  
  
  How does Midjourney work?
&lt;/h2&gt;

&lt;p&gt;Midjourney uses an LLM (a large language model) to create images from text descriptions. This model has been trained on a vast array of text-image pairs, enabling it to understand and interpret the text prompts to produce similar images.&lt;/p&gt;

&lt;p&gt;Let's break down this image creation process:&lt;/p&gt;
&lt;h4&gt;
  
  
  Analyzing the Prompt
&lt;/h4&gt;

&lt;p&gt;The LLM starts by dissecting the prompt into its core ideas and terms. If you input something like "a photorealistic portrait of a woman," the system identifies key concepts like "photorealistic," "portrait," and "woman."&lt;/p&gt;

&lt;p&gt;A basic Midjourney prompt looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmyf29vpfbmdgc9ygorm9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmyf29vpfbmdgc9ygorm9.png" alt="basic prompt" width="570" height="165"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A more advanced prompt may look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2d6eodywvyu8wnivfmop.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2d6eodywvyu8wnivfmop.png" alt="advanced prompt" width="800" height="113"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We'll get back to that later. What's important is to understand that whatever you write is used to create the &lt;strong&gt;latent vector&lt;/strong&gt; in the following step.&lt;/p&gt;
&lt;h4&gt;
  
  
  Generating a Latent Vector
&lt;/h4&gt;

&lt;p&gt;Next, the LLM translates these concepts into a latent vector. This is a numerical code that captures all the image details - its color palette, shapes, style, objects, and more.&lt;/p&gt;

&lt;p&gt;All those parameters are used inside the model to understand your request, by matching the vector to data it already knows and has been trained on.&lt;/p&gt;

&lt;p&gt;This is why the following tip from the official Midjourney documentation is important:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The Midjourney Bot works best with simple, short sentences that describe what you want to see. Avoid long lists of requests. Instead of: "Show me a picture of lots of blooming California poppies, make them bright, vibrant orange, and draw them in an illustrated style with colored pencils," try: "Bright orange California poppies drawn with colored pencils."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;Pro tip: use short prompts!&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/confident-ai/deepeval" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 DeepEval on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;h4&gt;
  
  
  Using a Diffusion Model to generate the image
&lt;/h4&gt;

&lt;p&gt;The final step of generating the image involves converting this latent vector into the actual image. This is where a diffusion model comes into play. It's a kind of AI that can form images from seemingly random patterns.&lt;/p&gt;

&lt;p&gt;Starting with a blank canvas, the model slowly refines the image, adding layers of detail until it reflects what the latent vector describes. The way it removes this 'noise' is controlled, making sure the final image is clear and recognizable.&lt;/p&gt;
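&lt;p&gt;As a rough mental model (a toy sketch, not Midjourney's actual model), you can think of each refinement step as moving the noisy image a fraction closer to what the latent vector describes:&lt;/p&gt;

```python
# Toy illustration of diffusion-style generation: start from pure noise
# and repeatedly nudge every "pixel" toward the target description,
# reducing the remaining noise each step. Purely illustrative.
import random

def denoise(target, steps=10, seed=0):
    rng = random.Random(seed)
    image = [rng.uniform(-1.0, 1.0) for _ in target]  # pure noise
    for _ in range(steps):
        # each step removes half of the remaining distance to the target
        image = [px + 0.5 * (t - px) for px, t in zip(image, target)]
    return image

target = [0.2, 0.8, -0.3]  # stand-in for what the latent vector describes
result = denoise(target)
```

&lt;p&gt;Early iterations are mostly noise (the blurry previews you see), while later iterations are nearly indistinguishable from the target.&lt;/p&gt;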

&lt;p&gt;Other well-known generative AI platforms, such as &lt;a href="https://stability.ai/" rel="noopener noreferrer"&gt;Stable Diffusion&lt;/a&gt;, use the same techniques. &lt;/p&gt;

&lt;p&gt;This is also why, while waiting for Midjourney to complete its image creation, you notice blurry images that eventually turn into amazing artwork. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxcwh5azpn7qomzq5f6u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxcwh5azpn7qomzq5f6u.png" alt="Diffusion model" width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The basics
&lt;/h2&gt;

&lt;p&gt;Begin with a short prompt and focus on what you want to create: the subject.&lt;br&gt;
Let's say we are interested in creating a portrait of a woman. We can begin with something like this:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/imagine A portrait of a young woman with light blue eyes&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmayr4bnu62lc28wn1lc8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmayr4bnu62lc28wn1lc8.png" alt="A portrait of a young woma" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once we have our initial image, it is all about iterations and improvements. We can now focus on details that matter, such as &lt;strong&gt;medium, mood, composition, environment&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;Let's say we want to get a more &lt;strong&gt;realistic&lt;/strong&gt; photo:&lt;br&gt;
&lt;code&gt;/imagine A realistic photo of a young woman with light blue eyes&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fau1dmx9xz04f3qfsdk88.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fau1dmx9xz04f3qfsdk88.png" alt="A realistic photo" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This one is more realistic; however, let's give it the look of an old photograph. To achieve that, we can simply add a year, say, 1960.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/imagine A realistic photo of a young woman with light blue eyes, year 1960&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgdd4vboiw2mymve5r5xg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgdd4vboiw2mymve5r5xg.png" alt="year 1960" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We've come a long way by only adding small details, such as the year and the medium type (realistic).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Pro tip: The Midjourney Bot does not comprehend grammar, sentence structure, or words as humans do. Using fewer words means that each one has a more powerful influence.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Now, let's add a composition; for instance, if we want a headshot from above, we can revise our prompt accordingly:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/imagine Bird-eye view realistic photo, of a young woman with light blue eyes, 1960&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdstc4p9qlj4f0ptqkv8h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdstc4p9qlj4f0ptqkv8h.png" alt="Bird-eye view realistic photo" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pretty cool, right?&lt;/p&gt;

&lt;p&gt;Continue experimenting with various elements such as environment, emotions, colors, and more to discover the diverse outcomes they can produce.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq6ymaqcfnurgkdn3hazk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq6ymaqcfnurgkdn3hazk.png" alt="Midjourney styles" width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Midjourney, utilizing a well-trained Large Language Model (LLM) and a diffusion model, has the capability to generate a wide range of variations based on your initial image. This allows for a great deal of flexibility and creativity in the image creation process.&lt;/p&gt;

&lt;p&gt;By instructing the bot to produce either strong or weak variations, you can refine the output step by step. You might start with a broad concept and then progressively narrow down the details, or you could begin with a highly specific image and explore slight adjustments. The process continues until you reach a result that meets your vision or preference.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9coeaqhq5mos9ibiq45b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9coeaqhq5mos9ibiq45b.png" alt="Image variation" width="479" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Asking for a strong variation will result in the following images:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffw0nr9x4kytkbjp499yf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffw0nr9x4kytkbjp499yf.png" alt="Image variation" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Advanced techniques
&lt;/h2&gt;

&lt;p&gt;Now that we understand the basics of Midjourney LLM, we can dive into its parameters. Parameters are options added to a prompt that change how an image is generated.&lt;/p&gt;
&lt;h3&gt;
  
  
  Changing aspect ratio
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;Pro tip: parameters are always added at the end of the prompt&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;One of the most important parameters is the aspect ratio. Midjourney's default aspect ratio is square (1:1), but what if we want to create a great cover image (such as this article's cover) or a portrait image?&lt;br&gt;
We just need to add &lt;code&gt;--ar&lt;/code&gt; followed by the desired ratio at the end of the prompt. For example:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/imagine Bird-eye view realistic photo, of a young woman with light blue eyes, 1960 --ar 1:2&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ciaa35gke0glvlq6mrt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ciaa35gke0glvlq6mrt.png" alt="aspect ratio" width="768" height="1536"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice the &lt;code&gt;--ar&lt;/code&gt; followed by the aspect ratio here. &lt;/p&gt;
&lt;h3&gt;
  
  
  Getting more artistic
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Using styles
&lt;/h4&gt;

&lt;p&gt;The &lt;code&gt;--style&lt;/code&gt; parameter replaces the default style of some Midjourney Model Versions. &lt;/p&gt;

&lt;p&gt;Using &lt;code&gt;--style raw&lt;/code&gt; results in images that follow the prompt more literally, with less automatic beautification. Let's look at the following example: &lt;/p&gt;

&lt;p&gt;&lt;code&gt;/imagine cat icon&lt;/code&gt; will generate this kind of image, which is beautiful, but not really an icon:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq7gq5vfr4qysc32krk0b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq7gq5vfr4qysc32krk0b.png" alt="Image icon" width="300" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we add &lt;code&gt;--style raw&lt;/code&gt; to it, Midjourney will generate a much more relevant image:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff0vyf84qf321wh9ecfr0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff0vyf84qf321wh9ecfr0.png" alt="Image icon raw" width="300" height="300"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Niji model
&lt;/h4&gt;

&lt;p&gt;Midjourney has an alternative model called &lt;code&gt;niji 5&lt;/code&gt; which allows you to use other style parameters. &lt;br&gt;
Adding &lt;code&gt;--niji 5&lt;/code&gt; followed by a style such as &lt;code&gt;cute&lt;/code&gt;, &lt;code&gt;expressive&lt;/code&gt;, &lt;code&gt;original&lt;/code&gt;, or &lt;code&gt;scenic&lt;/code&gt; will result in more sophisticated images. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;/imagine cat --niji 5 --style cute&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwwhti1lra9tm7hnpetwv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwwhti1lra9tm7hnpetwv.png" alt="a cute cat" width="300" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As an LLM-based generator, Midjourney is trained on a huge amount of data, incorporating different artistic styles.&lt;br&gt;
Providing a &lt;code&gt;--stylize&lt;/code&gt; parameter influences how strongly this training is applied, with the range being between 0 and 1000; higher values will generate a more artistic image.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/imagine child's drawing of a dog&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0asxrqqdqf8hbi6k8r32.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0asxrqqdqf8hbi6k8r32.png" alt="stylize images" width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Ready to become a pro?
&lt;/h2&gt;

&lt;p&gt;Before moving forward, I would appreciate it if you could like or 'heart' this article — it would help me a lot.&lt;/p&gt;

&lt;p&gt;Also, please check out my open-source GitHub library. Would you mind giving it a star? ❤️&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/confident-ai/deepeval" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 DeepEval on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;Here comes the fun part. But before we start, I would like to share with you the way I create nice photos and understand Midjourney LLM better.&lt;/p&gt;

&lt;h3&gt;
  
  
  Finding inspirations
&lt;/h3&gt;

&lt;p&gt;When looking for inspiration, I head to the &lt;a href="https://www.midjourney.com/showcase/" rel="noopener noreferrer"&gt;Midjourney Showcase page &lt;/a&gt; where I look for inspiring photos. Once I've found one, I download the photo and ask Midjourney to &lt;code&gt;describe&lt;/code&gt; it. This is effectively reverse engineering the LLM: it reveals how Midjourney transforms text into images.&lt;/p&gt;

&lt;p&gt;For example, I have found this image interesting:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiv2awjg9i4w23wlww5s1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiv2awjg9i4w23wlww5s1.png" alt="elephant Midjourney" width="300" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And asked Midjourney to describe it using &lt;code&gt;/describe&lt;/code&gt; command. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhjjowbs5wno5iusu8hwf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhjjowbs5wno5iusu8hwf.png" alt="Describe image" width="800" height="670"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's a good starting point for your next image generation. Take the keywords that created this image and use them to generate images with a similar look and feel.&lt;br&gt;
Here I noticed the text "a polygonal elephant in a dark background", which is dominant, but also &lt;strong&gt;"in the style of graphic design influence, stephen shortridge"&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;Pro tip: Midjourney knows how to generate images in the style of a given artist&lt;/code&gt; &lt;/p&gt;

&lt;p&gt;Prompt &lt;code&gt;/imagine a polygonal elephant, in the style of stephen shortridge&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff194mkbexn6pzhjseevu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff194mkbexn6pzhjseevu.png" alt="A polygon elephant" width="300" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Let's get weird
&lt;/h3&gt;

&lt;p&gt;We can get unconventional images with the &lt;code&gt;--weird&lt;/code&gt; parameter. When using it, Midjourney creates unique and unexpected outcomes. &lt;code&gt;--weird&lt;/code&gt; accepts values from 0 to 3000 (the default is 0); the higher the value, the more unexpected the outcome.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/imagine elephant --weird ...&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52ky52iwiqkjr37d2o88.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52ky52iwiqkjr37d2o88.png" alt="weird elephant" width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Permutations
&lt;/h3&gt;

&lt;p&gt;What if we wish to try different colors, say red/green/blue/yellow elephant?&lt;/p&gt;

&lt;p&gt;We can use permutations by adding &lt;code&gt;{ ... }&lt;/code&gt; to our prompt, comma-separating the options. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;/imagine a { red, green, blue, yellow } elephant&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This will create 4 Midjourney jobs in a single shot. &lt;/p&gt;
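&lt;p&gt;The expansion can be sketched in a few lines of Python (this mimics the behavior for illustration; it is not Midjourney's actual code):&lt;/p&gt;

```python
# Illustrative sketch of how a permutation prompt like
# "a { red, green, blue, yellow } elephant" expands into one job
# per comma-separated option inside the braces.
import re

def expand_permutations(prompt: str) -> list[str]:
    match = re.search(r"\{([^}]*)\}", prompt)
    if not match:
        return [prompt]  # no permutation block: a single job
    options = [opt.strip() for opt in match.group(1).split(",")]
    # substitute each option into the prompt in place of the braces
    return [prompt[:match.start()] + opt + prompt[match.end():] for opt in options]

jobs = expand_permutations("a { red, green, blue, yellow } elephant")
# jobs -> ["a red elephant", "a green elephant", "a blue elephant", "a yellow elephant"]
```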

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5r94baafn5fxcn9cvrjh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5r94baafn5fxcn9cvrjh.png" alt="4 elephants" width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Midjourney Tiles
&lt;/h3&gt;

&lt;p&gt;This is probably one of the most amazing, yet hidden, Midjourney features. The &lt;code&gt;--tile&lt;/code&gt; parameter will generate an image that can be repeated seamlessly as a tile.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/imagine watercolor elephant --tile&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1u9fu3s6ncg41p24buat.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1u9fu3s6ncg41p24buat.png" alt="Midjourney tiles" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Final thoughts
&lt;/h3&gt;

&lt;p&gt;Understanding Midjourney's LLM is the key to generating amazing images and photos. &lt;br&gt;
If you know of any other helpful Midjourney prompt engineering tips that I haven't covered in this article, please share them in the comments section below. 👇🏻&lt;/p&gt;

&lt;p&gt;So, that is it for this article.&lt;/p&gt;

&lt;p&gt;Thank you so much for reading! 🤩🙏&lt;/p&gt;

</description>
      <category>llm</category>
      <category>promptengineering</category>
      <category>midjourney</category>
      <category>productivity</category>
    </item>
    <item>
      <title>What is Retrieval Augmented Generation (RAG)? 🚀</title>
      <dc:creator>Jeffrey Ip</dc:creator>
      <pubDate>Wed, 25 Oct 2023 10:30:40 +0000</pubDate>
      <link>https://dev.to/confidentai/what-is-retrieval-augmented-generation-rag-4n7g</link>
      <guid>https://dev.to/confidentai/what-is-retrieval-augmented-generation-rag-4n7g</guid>
      <description>&lt;h1&gt;
  
  
  TL;DR
&lt;/h1&gt;

&lt;p&gt;In this article, I’m going to talk about what RAG is and how to implement a RAG-based LLM application (yes, with a complete code sample 😚)&lt;/p&gt;

&lt;p&gt;Let’s dive right in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/C2L2bXRnv2chSO1mAH/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/C2L2bXRnv2chSO1mAH/giphy.gif" width="480" height="270"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;h2&gt;
  
  
  DeepEval - open-source evaluation framework for LLM applications
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;DeepEval is a framework that helps engineers evaluate the performance of their LLM applications by providing default metrics to measure hallucination, relevancy, and much more.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We are just starting out, and we really want to help more developers build safer AI apps. Would you mind giving it a star to help spread the word, please? 🥺❤️🥺&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/confident-ai/deepeval" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 DeepEval on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkb2pvk36eqd892p30ug.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkb2pvk36eqd892p30ug.png" alt="Github stars" width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;h1&gt;
  
  
  What is RAG?
&lt;/h1&gt;

&lt;p&gt;Retrieval augmented generation is a technique in NLP that allows LLMs like ChatGPT to generate customized outputs that are outside the scope of the data they were trained on. An LLM application without RAG is akin to asking ChatGPT to summarize an email without providing the actual email as context.&lt;/p&gt;

&lt;p&gt;A RAG system consists of two primary components: the retriever and the generator.&lt;/p&gt;

&lt;p&gt;The retriever is responsible for searching through the knowledge base for the most relevant pieces of information that correlate with the given input, which are referred to as retrieval results. The generator, on the other hand, utilizes these retrieval results to craft a series of prompts based on a predefined prompt template to produce a coherent and relevant response to the input.&lt;/p&gt;

&lt;p&gt;Here’s a diagram of a RAG architecture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7m0u8g9hmpjtq6igc9b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7m0u8g9hmpjtq6igc9b.png" alt="A typical RAG architecture" width="800" height="696"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In most cases, your “knowledge base” consists of vector embeddings stored in a vector database like ChromaDB, and your “retriever” will 1) embed the given input at runtime, 2) search through the vector space containing your data to find the top K most relevant retrieval results, and 3) rank the results based on relevancy (or distance to your vectorized input embedding). This will then be processed into a series of prompts and passed onto your “generator”, which is your LLM of choice (GPT-4, Llama 2, etc.).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhb75hcfkkj8sit0idgbe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhb75hcfkkj8sit0idgbe.png" alt="Image description" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For more curious users, here are the models a retriever commonly employs to extract the most pertinent retrieval results:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Neural Network Embeddings&lt;/strong&gt; (eg. OpenAI/Cohere’s embedding models): ranks documents based on their locational proximity in a multidimensional vector space, enabling an understanding of textual relationships and relevance between an input and the document corpus.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Best Match 25 (BM25)&lt;/strong&gt;: a probabilistic retrieval model that enhances text retrieval precision. By considering term frequencies with inverse document frequencies, it takes into account term significance, ensuring that both common and rare terms influence the relevance ranking.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;TF-IDF (Term Frequency — Inverse Document Frequency)&lt;/strong&gt;: calculates the significance of a term within a document relative to the broader corpus. By juxtaposing a term’s occurrence in a document with its rarity across the corpus, it ensures a comprehensive relevance ranking.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hybrid Search&lt;/strong&gt;: optimizes the relevance of the search results by assigning distinctive weights to different methodologies, such as Neural Network Embeddings, BM25, and TF-IDF.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
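&lt;p&gt;To make the TF-IDF idea above concrete, here's a minimal, self-contained sketch. This is my own toy illustration, not production retrieval code, and the corpus and query are made up:&lt;/p&gt;

```python
import math
from collections import Counter

def tf_idf_rank(query, documents, k=2):
    """Rank documents by TF-IDF-weighted overlap with the query (toy sketch)."""
    tokenized = [doc.lower().split() for doc in documents]
    n = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in tokenized for term in set(doc))
    # Inverse document frequency: rarer terms across the corpus score higher.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}

    def score(doc_tokens):
        tf = Counter(doc_tokens)
        return sum(
            (tf[term] / len(doc_tokens)) * idf.get(term, 0.0)
            for term in query.lower().split()
        )

    ranked = sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)
    return [documents[i] for i in ranked[:k]]

documents = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock prices rose sharply today",
]
print(tf_idf_rank("cat mat", documents, k=1))  # → ['the cat sat on the mat']
```

&lt;p&gt;In practice you wouldn't roll your own, of course; libraries such as rank-bm25, or the hybrid search built into most vector databases, handle this for you.&lt;/p&gt;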



&lt;h1&gt;
  
  
  Applications
&lt;/h1&gt;

&lt;p&gt;RAG has various applications across different fields due to its ability to combine retrieval and generation of text for enhanced responses. Having worked with numerous companies building LLM applications at Confident, here are the top four use cases I’ve seen:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Customer support / user onboarding chatbots&lt;/strong&gt;: No surprises here, retrieve data from internal documents to generate more personalized responses. &lt;a href="https://www.confident-ai.com/blog/building-a-customer-support-chatbot-using-gpt-3-5-and-llamaindex" rel="noopener noreferrer"&gt;Click here to read a full tutorial on how to build one yourself using LlamaIndex.&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Extraction&lt;/strong&gt;. Interestingly, we can use RAG to extract relevant data from documents such as PDFs. &lt;a href="https://www.confident-ai.com/blog/how-to-build-a-pdf-qa-chatbot-using-openai-and-chromadb" rel="noopener noreferrer"&gt;You can find a tutorial on how to do it here.&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sales enablement&lt;/strong&gt;: retrieve data from LinkedIn profiles and email threads to generate more personalized outreach messages&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Content creation and enhancement&lt;/strong&gt;: retrieve data from past message conversations to generate suggested message replies&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In the following code walkthrough, we’ll be building a very generalized chatbot, and you’ll be able to customize its functionality into any of the use cases listed above by tweaking the prompts and data stored in your vector database.&lt;/p&gt;



&lt;h1&gt;
  
  
  Project Setup
&lt;/h1&gt;

&lt;p&gt;For this project, we’re going to build a question-answering (QA) chatbot based on your knowledge base. We’re not going to cover the part on how to index your knowledge base, as that’s a discussion for another day.&lt;/p&gt;

&lt;p&gt;We’re going to be using python, ChromaDB for our vector database, and OpenAI for both vector embeddings and chat completion. We’re going to build a chatbot on your favorite Wikipedia page.&lt;/p&gt;

&lt;p&gt;First, set up a new project directory and install the dependencies we need.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;rag-llm-app
&lt;span class="nb"&gt;cd &lt;/span&gt;rag-llm-app
python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv
&lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your terminal should now start with something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;(&lt;/span&gt;venv&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Installing dependencies
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;openai chromadb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, create a new main.py file — the entry point to your LLM application.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;touch &lt;/span&gt;main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Getting your API keys
&lt;/h1&gt;

&lt;p&gt;Lastly, go ahead and get your OpenAI API key here if you don’t already have one, and set it as an environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-openai-api-key"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’re good to go! Let’s start coding.&lt;/p&gt;



&lt;h1&gt;
  
  
  Building a RAG-based LLM application
&lt;/h1&gt;

&lt;p&gt;Begin by creating a Retriever class that will retrieve the most relevant data from ChromaDB for a given user question.&lt;/p&gt;

&lt;p&gt;Open main.py and paste in the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;chromadb&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;chromadb.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;embedding_functions&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chromadb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;heartbeat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Retriver&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;pass&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_retrieval_results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;openai_ef&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedding_functions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAIEmbeddingFunction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-openai-api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-embedding-ada-002&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;collection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my_collection&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding_function&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;openai_ef&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;retrieval_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;query_texts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;n_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;retrieval_results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;openai_ef&lt;/code&gt; is the embedding function used under the hood by ChromaDB to vectorize an input. When a user sends a question to your chatbot, a vector embedding will be created from this question using OpenAI’s &lt;code&gt;text-embedding-ada-002&lt;/code&gt; model. This vector embedding will then be used by ChromaDB to perform a vector similarity search in the collection vector space, which contains data from your knowledge base (remember, we’re assuming you’ve already indexed data for this tutorial). This process allows you to search for the top K most relevant retrieval results on any given input.&lt;/p&gt;
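&lt;p&gt;If you're curious what that similarity search does under the hood, here's a stripped-down sketch with made-up 3-dimensional embeddings (real &lt;code&gt;text-embedding-ada-002&lt;/code&gt; embeddings have 1,536 dimensions). This is an illustration, not ChromaDB's actual implementation:&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_embedding, index, k=2):
    """Return the k documents whose embeddings are closest to the query's."""
    ranked = sorted(
        index,
        key=lambda pair: cosine_similarity(query_embedding, pair[1]),
        reverse=True,
    )
    return [doc for doc, _ in ranked[:k]]

# (document, embedding) pairs, as a vector database would store them
index = [
    ("Our refund policy lasts 30 days", [0.9, 0.1, 0.0]),
    ("Shipping takes 3-5 business days", [0.1, 0.9, 0.0]),
    ("We are hiring engineers", [0.0, 0.1, 0.9]),
]
query_embedding = [0.8, 0.2, 0.0]  # pretend embedding of a refund question
print(top_k(query_embedding, index, k=1))  # → ['Our refund policy lasts 30 days']
```

&lt;p&gt;ChromaDB does essentially this (plus indexing tricks so it doesn't have to scan every vector) when you call &lt;code&gt;collection.query&lt;/code&gt;.&lt;/p&gt;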

&lt;p&gt;Now that you’ve created your retriever, paste in the following code to create a generator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Generator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;openai_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;openai_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai_model&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt_template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
            You&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re a helpful assistant with a thick country accent. Answer the question below and if you don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t know the answer, say you don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t know.

            {text}
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrieval_results&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;prompts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;retrieval_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt_template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;prompts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;prompts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reverse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ChatCompletion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;openai_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;prompts&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we construct a series of prompts in the &lt;code&gt;generate_response&lt;/code&gt; method based on a list of &lt;code&gt;retrieval_results&lt;/code&gt; that will be provided by the retriever we built earlier. We then send this series of prompts to OpenAI to generate an answer. Using RAG, your QA chatbot can now produce more customized outputs by enhancing the generation with retrieval results!&lt;/p&gt;

&lt;p&gt;To wrap things up, let’s put everything together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Chatbot&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retriver&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Retriver&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Generator&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;retrieval_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retriver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_retrieval_results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retrieval_results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# Creating an instance of the Chatbot class
&lt;/span&gt;&lt;span class="n"&gt;chatbot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Chatbot&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Taking user input from the CLI
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chatbot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Chatbot: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s all folks! You just built your very first RAG-based chatbot.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;In this article, you’ve learnt what RAG is, some use cases for RAG, and how to build your own RAG-based LLM application. However, you might have noticed that building your own RAG application is pretty complicated, and indexing your data is often a non-trivial task. Luckily, there are existing open-source frameworks like LangChain and LlamaIndex that allow you to implement what we’ve demonstrated in a much simpler way.&lt;/p&gt;

&lt;p&gt;If you like the article, don’t forget to give us a star on Github ❤️: &lt;a href="https://github.com/confident-ai/deepeval" rel="noopener noreferrer"&gt;https://github.com/confident-ai/deepeval&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also find the full code example here: &lt;a href="https://github.com/confident-ai/blog-examples/tree/main/rag-llm-app" rel="noopener noreferrer"&gt;https://github.com/confident-ai/blog-examples/tree/main/rag-llm-app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Till next time!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>programming</category>
      <category>opensource</category>
    </item>
    <item>
      <title>The one thing everyone's doing wrong with ChatGPT... 🤫🤔</title>
      <dc:creator>Jeffrey Ip</dc:creator>
      <pubDate>Tue, 03 Oct 2023 09:08:39 +0000</pubDate>
      <link>https://dev.to/confidentai/the-one-thing-everyones-doing-wrong-with-chatgpt-3api</link>
      <guid>https://dev.to/confidentai/the-one-thing-everyones-doing-wrong-with-chatgpt-3api</guid>
      <description>&lt;h1&gt;
  
  
  TL;DR
&lt;/h1&gt;

&lt;p&gt;Most developers don't evaluate their GPT outputs when building applications, even if that means introducing unnoticed breaking changes, because evaluation is very, very hard. In this article, you're going to learn how to evaluate ChatGPT (LLM) outputs the right way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🔥 On the agenda&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what are LLMs and why they're difficult to evaluate&lt;/li&gt;
&lt;li&gt;different ways to evaluate LLM outputs&lt;/li&gt;
&lt;li&gt;how to evaluate in python&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Enjoy! 🤗&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/QJvwBSGaoc4eI/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/QJvwBSGaoc4eI/giphy.gif" width="500" height="363"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;h2&gt;
  
  
  DeepEval - open-source evaluation framework for LLM applications
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;DeepEval is a framework that helps engineers evaluate the performance of their LLM applications by providing default metrics to measure hallucination, relevancy, and much more.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We are just starting out, and we really want to help more developers build safer AI apps. Would you mind giving it a star to help spread the word, please? 🥺❤️🥺&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/confident-ai/deepeval" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 DeepEval on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkb2pvk36eqd892p30ug.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkb2pvk36eqd892p30ug.png" alt="Github stars" width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;h1&gt;
  
  
  What are LLMs and what makes them so hard to evaluate?
&lt;/h1&gt;

&lt;p&gt;To understand why LLMs are difficult to evaluate and why they're oftentimes referred to as a "black box", let's break down what LLMs are and how they work.&lt;/p&gt;

&lt;p&gt;ChatGPT is an example of a large language model (LLM) and was trained on huge amounts of data. To be exact, around 300 billion words from articles, tweets, r/tifu, stack-overflow, how-to-guides, and other pieces of data that were scraped off the internet 🤯&lt;/p&gt;

&lt;p&gt;Anyway, the GPT behind "Chat" stands for Generative Pre-trained Transformer. A transformer is a specific neural network architecture that is particularly good at predicting the next few tokens (a token is roughly 4 characters for ChatGPT, but it can be as short as one character or as long as a word, depending on the specific encoding strategy). &lt;/p&gt;

&lt;p&gt;So in fact, LLMs don't really "know" anything, but instead "understand" linguistic patterns due to the way in which they were trained, which often makes them pretty good at figuring out the right thing to say. Pretty manipulative huh?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/bQjaedezBNDNvyyGHT/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/bQjaedezBNDNvyyGHT/giphy.gif" width="480" height="270"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All jokes aside, if there's one thing you need to remember, it's this: the process of predicting the next plausible "best" token is probabilistic in nature. This means that, &lt;strong&gt;LLMs can generate a variety of possible outputs for a given input, instead of always providing the same response&lt;/strong&gt;. It is exactly this non-deterministic nature of LLMs that makes them challenging to evaluate, as there's often more than one appropriate response.&lt;/p&gt;
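&lt;p&gt;To see why sampling makes outputs non-deterministic, here's a toy next-token sampler. The logits are made up and this is nothing like ChatGPT's real decoder, but the temperature scaling shown is the same idea the OpenAI API's &lt;code&gt;temperature&lt;/code&gt; parameter controls:&lt;/p&gt;

```python
import math
import random

def sample_next_token(logits, temperature=1.0, rng=random):
    """Pick a token index from a categorical distribution over logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]  # softmax over the scaled logits
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [5.0, 1.0, 0.5]  # hypothetical scores for 3 candidate tokens
# Near-zero temperature: effectively greedy, always the top-scoring token.
print({sample_next_token(logits, temperature=0.05) for _ in range(100)})  # → {0}
# Higher temperature: lower-scoring tokens get picked too, so outputs vary.
print({sample_next_token(logits, temperature=5.0) for _ in range(100)})
```

&lt;p&gt;This is why the exact same prompt can yield different responses from run to run at the default temperature.&lt;/p&gt;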



&lt;h1&gt;
  
  
  Why do we need to evaluate LLM applications?
&lt;/h1&gt;

&lt;p&gt;When I say LLM applications, here are some examples of what I'm referring to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chatbots&lt;/strong&gt;: For customer support, virtual assistants, or general conversational agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Assistance&lt;/strong&gt;: Suggesting code completions, fixing code errors, or helping with debugging.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Legal Document Analysis&lt;/strong&gt;: Helping legal professionals quickly understand the essence of long contracts or legal texts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Personalized Email Drafting&lt;/strong&gt;: Helping users draft emails based on context, recipient, and desired tone.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LLM applications usually have one thing in common - they perform better when augmented with proprietary data to help with the task at hand. Want to build an internal chatbot that helps boost your employees' productivity? OpenAI certainly doesn't keep tabs on your company's internal data (hopefully 😥). &lt;/p&gt;

&lt;p&gt;This matters because it is now not only OpenAI's job to ensure ChatGPT is performing as expected ⚖️ but also yours to make sure your LLM application is generating the desired outputs by using the right prompt templates, data retrieval pipelines, model architecture (if you're fine-tuning), etc.&lt;/p&gt;

&lt;p&gt;Evaluation (I'll just call them evals from hereon) helps you measure how well your application is handling the task at hand. Without evals, you will be introducing unnoticed breaking changes and would have to manually inspect all possible LLM outputs each time you iterate on your application 👀 which to me sounds like a terrible idea 💀&lt;/p&gt;



&lt;h1&gt;
  
  
  How to evaluate LLM outputs
&lt;/h1&gt;

&lt;p&gt;There are two ways everyone should know about when it comes to evals - with and without ChatGPT. &lt;/p&gt;
&lt;h2&gt;
  
  
  Evals without ChatGPT
&lt;/h2&gt;

&lt;p&gt;A nice way to evaluate LLM outputs without using ChatGPT is to use other machine learning models derived from the field of NLP. You can use specific models to judge your outputs on different metrics such as factual correctness, relevancy, bias, and helpfulness (just to name a few, but the list goes on), despite non-deterministic outputs.&lt;/p&gt;

&lt;p&gt;For example, we can use natural language inference (NLI) models (which output an entailment score) to determine how factually correct a response is based on some provided context. The higher the entailment score, the more factually correct an output is, which is particularly helpful if you're evaluating a long output that's not so black and white in terms of factual correctness.&lt;/p&gt;

&lt;p&gt;You might also wonder how these models can possibly "know" whether a piece of text is factually correct 🤔 It turns out you can provide context to these models for them to take at face value 🥳 In fact, we call these contexts &lt;strong&gt;ground truths&lt;/strong&gt; or &lt;strong&gt;references&lt;/strong&gt;. A collection of these references is often referred to as an evaluation dataset.&lt;/p&gt;

&lt;p&gt;But not all metrics require references. For example, relevancy can be calculated using cross-encoder models (another kind of ML model), and all you need to supply is the input and output for it to determine how relevant they are to each other.&lt;/p&gt;

&lt;p&gt;Off the top of my head, here's a list of reference-less metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;relevancy&lt;/li&gt;
&lt;li&gt;bias&lt;/li&gt;
&lt;li&gt;toxicity&lt;/li&gt;
&lt;li&gt;helpfulness&lt;/li&gt;
&lt;li&gt;harmlessness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And here is a list of reference-based metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;factual correctness&lt;/li&gt;
&lt;li&gt;conceptual similarity &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that reference-based metrics don't require you to provide the initial input, as they only judge the output against the provided context.&lt;/p&gt;
&lt;h2&gt;
  
  
  Using ChatGPT for Evals
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzscwwk58fwpkkg29l64x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzscwwk58fwpkkg29l64x.png" alt="Image description" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There's an emerging trend of using state-of-the-art LLMs (aka ChatGPT) to evaluate themselves or even other LLMs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;G-Eval is a recently developed framework that uses LLMs for evals.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I'll attach an image from the &lt;a href="https://arxiv.org/pdf/2303.16634.pdf" rel="noopener noreferrer"&gt;research paper that introduced G-Eval&lt;/a&gt; below, but in a nutshell G-Eval is a two-part process - the first part generates evaluation steps, and the second uses those generated evaluation steps to output a final score.&lt;/p&gt;

&lt;p&gt;Let's run through a concrete example. First, to generate evaluation steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;introduce an evaluation task to ChatGPT (e.g. rate this summary from 1 - 5 based on relevancy)&lt;/li&gt;
&lt;li&gt;introduce the evaluation criteria (e.g. relevancy will be based on the collective quality of all sentences)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once the evaluation steps have been generated:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;concatenate the input, evaluation steps, context, and the actual output&lt;/li&gt;
&lt;li&gt;ask the LLM to generate a score between 1 and 5, where higher is better&lt;/li&gt;
&lt;li&gt;(Optional) take the probabilities of the output tokens from the LLM to normalize the score and take their weighted summation as the final result&lt;/li&gt;
&lt;/ol&gt;
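Here's a rough sketch of what the concatenation in step 1 might look like. The template wording below is my own for illustration, not the paper's verbatim prompt:

```python
def build_geval_prompt(evaluation_steps, input_text, context, actual_output):
    """Concatenate the generated evaluation steps with the test case
    to form the final scoring prompt. Illustrative wording only."""
    steps = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(evaluation_steps))
    return (
        "Evaluation steps:\n" + steps + "\n\n"
        "Input:\n" + input_text + "\n\n"
        "Context:\n" + context + "\n\n"
        "Actual output:\n" + actual_output + "\n\n"
        "Following the steps above, give a score from 1 to 5 (5 is best)."
    )

# Hypothetical evaluation steps, as if generated by ChatGPT in part one
prompt = build_geval_prompt(
    ["Read the context.", "Check the output only uses facts from the context."],
    "What if these shoes don't fit?",
    "All customers are eligible for a 30 day full refund.",
    "We offer a 30-day full refund.",
)
print(prompt)
```

You'd then send this prompt to the LLM and parse the score out of its reply.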

&lt;p&gt;Step 3 is actually pretty complicated 🙃 because to get the probability of the output tokens, you would typically need access to the raw model outputs, not just the final generated text. This step was introduced in the paper because it offers more fine-grained scores that better reflect the quality of outputs.&lt;/p&gt;
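The weighted summation itself is simple once you have the probabilities of the score tokens (the numbers below are made up for illustration):

```python
def normalized_score(token_probs):
    """token_probs maps each candidate score token ("1".."5") to the
    probability the LLM assigns to it. The final score is the
    probability-weighted sum, which gives finer-grained results than
    just taking the single most likely token."""
    total = sum(token_probs.values())
    return sum(int(tok) * p for tok, p in token_probs.items()) / total

# Hypothetical output-token probabilities from the raw model
probs = {"3": 0.1, "4": 0.6, "5": 0.3}
print(normalized_score(probs))
```

So instead of a flat "4", you'd get a score like 4.2 that reflects how much probability mass sat on neighboring scores.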

&lt;p&gt;Here's a diagram taken from the paper that can help you visualize what we've learnt:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcjlzg9wh6dbpwenjty2f.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcjlzg9wh6dbpwenjty2f.jpg" alt="Image description" width="800" height="573"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Utilizing GPT-4 with G-Eval outperformed traditional metrics in areas such as coherence, consistency, fluency, and relevancy 😳 but evaluations using LLMs can often be very expensive.&lt;/p&gt;

&lt;p&gt;So, my recommendation would be to evaluate with G-Eval as a starting point to establish a performance standard and then transition to more cost-effective traditional methods where suitable.&lt;/p&gt;



&lt;h1&gt;
  
  
  Evaluating LLM outputs in Python
&lt;/h1&gt;

&lt;p&gt;By now, you probably feel inundated by all the jargon and definitely wouldn't want to implement everything from scratch. Imagine having to research what's the best way to compute each individual metric, train your own model for it, and code up an evaluation framework... 😰&lt;/p&gt;

&lt;p&gt;Luckily, there are a few open source packages such as ragas and DeepEval that provide an evaluation framework so you don't have to write your own 😌&lt;/p&gt;

&lt;p&gt;As the cofounder of Confident (the company behind DeepEval), I'm going to go ahead and shamelessly show you how you can unit test your LLM applications using DeepEval 😊 (but seriously, we have an amazing Pytest-like developer experience, it's easy to set up, and we offer a free platform for you to visualize your evaluation results)&lt;/p&gt;

&lt;p&gt;Let's wrap things up with some coding.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz6udlfxpxh54ipk8qynj.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz6udlfxpxh54ipk8qynj.gif" alt="Image description" width="500" height="281"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Setting up your test environment
&lt;/h2&gt;

&lt;p&gt;To implement our much-anticipated evals, create a project folder and initialize a Python virtual environment by running the commands below in your terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir evals-example
cd evals-example
python3 -m venv venv
source venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your terminal prompt should now start with something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(venv)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Installing dependencies
&lt;/h2&gt;

&lt;p&gt;Run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install deepeval
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Setting your OpenAI API Key
&lt;/h2&gt;

&lt;p&gt;Lastly, set your OpenAI API key as an environment variable. We'll need OpenAI for G-Eval later (which basically means using LLMs for evaluation). In your terminal, paste this in with your own API key (get yours &lt;a href="https://openai.com/blog/openai-api" rel="noopener noreferrer"&gt;here&lt;/a&gt; if you don't already have one):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export OPENAI_API_KEY="your-api-key-here"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Writing your first test file
&lt;/h2&gt;

&lt;p&gt;Let's create a file called &lt;code&gt;test_evals.py&lt;/code&gt; (note that test files must start with "test"):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;touch test_evals.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Paste in the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepeval.metrics.factual_consistency&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FactualConsistencyMetric&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepeval.metrics.answer_relevancy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AnswerRelevancyMetric&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepeval.metrics.conceptual_similarity&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ConceptualSimilarityMetric&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepeval.metrics.llm_eval&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMEvalMetric&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepeval.test_case&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMTestCase&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepeval.run_test&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;assert_test&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_factual_correctness&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What if these shoes don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t fit?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;All customers are eligible for a 30 day full refund at no extra costs.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;We offer a 30-day full refund at no extra costs.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;factual_consistency_metric&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FactualConsistencyMetric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minimum_score&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;test_case&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMTestCase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;assert_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_case&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;factual_consistency_metric&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_relevancy&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What does your company do?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Our company specializes in cloud computing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;relevancy_metric&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AnswerRelevancyMetric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minimum_score&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;test_case&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMTestCase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;assert_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_case&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;relevancy_metric&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_conceptual_similarity&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What did the cat do?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The cat climbed up the tree&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;expected_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The cat ran up the tree.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;conceptual_similarity_metric&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConceptualSimilarityMetric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minimum_score&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;test_case&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMTestCase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;expected_output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;assert_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_case&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;conceptual_similarity_metric&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_humor&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;make_chat_completion_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ChatCompletion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write me something funny related to programming&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Why did the programmer quit his job? Because he didn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t get arrays!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;llm_metric&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMEvalMetric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;criteria&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How funny it is&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;completion_function&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;make_chat_completion_request&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;test_case&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMTestCase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;assert_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_case&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;llm_metric&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now run the test file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;deepeval test run test_evals.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For each of the test cases, there is a predefined metric provided by DeepEval, and each of these metrics outputs a score from 0 to 1. For example, &lt;code&gt;FactualConsistencyMetric(minimum_score=0.5)&lt;/code&gt; means we want to evaluate how factually correct an output is, where &lt;code&gt;minimum_score=0.5&lt;/code&gt; means the test will only pass if the output score is higher than a 0.5 threshold. &lt;/p&gt;
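Conceptually, the passing logic boils down to every metric's score clearing its threshold. Here's a sketch of that idea (not DeepEval's actual implementation):

```python
def all_metrics_pass(scores, minimum_score=0.5):
    """A test case passes only when every metric's 0 to 1 score
    clears the threshold. Conceptual sketch, not DeepEval's code."""
    return all(score >= minimum_score for score in scores)

print(all_metrics_pass([0.72, 0.61]))  # every score clears 0.5
print(all_metrics_pass([0.72, 0.41]))  # the second score fails
```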

&lt;p&gt;Let's go over the test cases one by one:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;test_factual_correctness&lt;/code&gt; tests how factually correct your LLM output is relative to the provided context. &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;test_relevancy&lt;/code&gt; tests how relevant the output is relative to the given input.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;test_conceptual_similarity&lt;/code&gt; tests how conceptually similar the LLM output is relative to the expected output.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;test_humor&lt;/code&gt; tests how funny your LLM output is. This test case is the only test case that uses ChatGPT for evaluation.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Notice how there are up to four moving parameters for a single test case - the input, the expected output, the actual output (of your application), and the context (that was used to generate the actual output). Depending on the metric you're testing, some parameters are optional, while others are mandatory. &lt;/p&gt;

&lt;p&gt;Lastly, what if you want to test more than one metric on the same input? Here's how you can aggregate metrics on a single test case:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_everything&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What did the cat do?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The cat climbed up the tree&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;expected_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The cat ran up the tree.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The cat ran up the tree.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;conceptual_similarity_metric&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConceptualSimilarityMetric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minimum_score&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;relevancy_metric&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AnswerRelevancyMetric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minimum_score&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;factual_consistency_metric&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FactualConsistencyMetric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minimum_score&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;test_case&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMTestCase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;expected_output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;assert_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_case&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;conceptual_similarity_metric&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relevancy_metric&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;factual_consistency_metric&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not so hard after all huh? Write enough of these (10-20), and you'll have much better control over what you're building 🤗&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PS. And here's a bonus feature DeepEval offers: a free web platform for you to view data on all your test runs.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Try running the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;deepeval login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Follow the instructions (login, get your API key, paste it in the CLI), and run this again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;deepeval test run test_example.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let me know what happens!&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;In this article, you've learnt:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how ChatGPT works&lt;/li&gt;
&lt;li&gt;examples of LLM applications&lt;/li&gt;
&lt;li&gt;why it's hard to evaluate LLM outputs&lt;/li&gt;
&lt;li&gt;how to evaluate LLM outputs in python&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With evals, you can stop making breaking changes to your LLM application ✅ quickly iterate on your implementation to improve on metrics you care about ✅ and most importantly be confident in the LLM application you build 😇&lt;/p&gt;

&lt;p&gt;If you enjoyed this article, don't forget to &lt;a href="https://github.com/confident-ai/deepeval" rel="noopener noreferrer"&gt;give us a star on GitHub!&lt;/a&gt; The source code for this tutorial is available here:&lt;br&gt;
&lt;a href="https://github.com/confident-ai/blog-examples/tree/main/evals-example" rel="noopener noreferrer"&gt;https://github.com/confident-ai/blog-examples/tree/main/evals-example&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thank you for reading, and till next time 🫡&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/DMHEccCwpNxCQBZlvQ/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/DMHEccCwpNxCQBZlvQ/giphy.gif" width="480" height="200"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>programming</category>
      <category>opensource</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to build a PDF QA chatbot using OpenAI and ChromaDB 🤗</title>
      <dc:creator>Jeffrey Ip</dc:creator>
      <pubDate>Tue, 26 Sep 2023 10:38:37 +0000</pubDate>
      <link>https://dev.to/confidentai/how-to-build-a-pdf-qa-chatbot-using-openai-and-chromadb-4mcj</link>
      <guid>https://dev.to/confidentai/how-to-build-a-pdf-qa-chatbot-using-openai-and-chromadb-4mcj</guid>
      <description>&lt;h1&gt;
  
  
  TL;DR
&lt;/h1&gt;

&lt;p&gt;In this article, you'll learn how to build a RAG-based chatbot to chat with any PDF of your choice so you can achieve your lifelong dream of talking to PDFs 😏 In the end, I'll also show you how to test what you've built ✅&lt;/p&gt;

&lt;p&gt;I know, I wrote something similar in my &lt;a href="https://www.confident-ai.com/blog/building-a-customer-support-chatbot-using-gpt-3-5-and-llamaindex" rel="noopener noreferrer"&gt;last article on building a customer support chatbot&lt;/a&gt; 😅 but this week we're going to dive deep into how to use the raw OpenAI API to chat with PDF data (including text trapped in visuals like tables) stored in ChromaDB, as well as how to use Streamlit to build the chatbot UI.&lt;/p&gt;

&lt;h1&gt;
  
  
  A small request 🙏🏻
&lt;/h1&gt;

&lt;p&gt;I'm trying to get DeepEval to &lt;strong&gt;5k stars&lt;/strong&gt; by the end of 2023, can you please help me out by starring my repo? It helps me create more weekly high quality content ❤️ thank you very very much!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/confident-ai/deepeval" rel="noopener noreferrer"&gt;https://github.com/confident-ai/deepeval&lt;/a&gt;&lt;br&gt;
&lt;a href="https://i.giphy.com/media/5xtDarmwsuR9sDRObyU/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/5xtDarmwsuR9sDRObyU/giphy.gif" width="443" height="250"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Introducing RAG, Vector Databases, and OCR
&lt;/h1&gt;

&lt;p&gt;Before we dive into the code, let's unpack what we're going to implement 🕵️ To begin, &lt;strong&gt;OCR&lt;/strong&gt; (Optical Character Recognition) is a technology within the field of computer vision that recognizes the characters present in a document and converts them into text - this is particularly helpful in the case of tables and charts in documents 😬 We'll be using the OCR provided by Azure Cognitive Services in this tutorial.&lt;/p&gt;

&lt;p&gt;Once text chunks are extracted using OCR, they are converted into high-dimensional vectors (aka vectorized) using embedding models like Word2Vec, FastText, or BERT. These vectors, which encapsulate the semantic meaning of the text, are then indexed in a &lt;strong&gt;vector database&lt;/strong&gt;. We'll be using ChromaDB as our in-memory vector database 🥳&lt;/p&gt;

&lt;p&gt;Now, let's see what happens when a user asks their PDF something. First, the user query is vectorized using the same embedding model used to vectorize the extracted PDF text chunks. Then, the top K most semantically similar text chunks are fetched by searching through the vector database, which, remember, contains the text chunks from our PDF. The retrieved text chunks are then provided as context for ChatGPT to generate an answer based on information in the PDF. This is the process of &lt;strong&gt;retrieval-augmented generation (RAG)&lt;/strong&gt;.&lt;/p&gt;
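Stripped of the database machinery, the retrieval step is just a similarity search over embeddings. Here's a toy sketch with made-up 3-dimensional vectors standing in for real embeddings (a vector database does this at scale with approximate search):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, chunk_vecs, k=2):
    """Rank stored chunk vectors by cosine similarity to the query
    vector and return the indices of the k closest, most similar first."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy 3-dimensional "embeddings" standing in for a real model's output
chunks = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
print(top_k([1.0, 0.05, 0.0], chunks, k=2))
```

The indices returned are the chunks you'd pass to ChatGPT as context.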

&lt;p&gt;&lt;a href="https://i.giphy.com/media/QVt7Jq9Sd7ZraYohxi/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/QVt7Jq9Sd7ZraYohxi/giphy.gif" width="480" height="270"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feeling educated? 😊 Let's begin. &lt;/p&gt;
&lt;h1&gt;
  
  
  Project Setup
&lt;/h1&gt;

&lt;p&gt;First, I'm going to guide you through how to set up your project folders and any dependencies you need to install.&lt;/p&gt;

&lt;p&gt;Create a project folder and a Python virtual environment by running the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir chat-with-pdf
cd chat-with-pdf
python3 -m venv venv
source venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your terminal prompt should now start with something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(venv)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Installing dependencies
&lt;/h2&gt;

&lt;p&gt;Run the following command to install the OpenAI SDK, ChromaDB, Azure Form Recognizer, Streamlit, and tabulate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install openai chromadb azure-ai-formrecognizer streamlit tabulate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's briefly go over what each of those packages does: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;streamlit&lt;/code&gt; - sets up the chat UI, which includes a PDF uploader (thank god 😌)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;azure-ai-formrecognizer&lt;/code&gt; - extracts textual content from PDFs using OCR &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;chromadb&lt;/code&gt; - is an in-memory vector database that stores the extracted PDF content&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;openai&lt;/code&gt; - we all know what this does (receives relevant data from chromadb and returns a response based on your chatbot input)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next, create a new &lt;code&gt;main.py&lt;/code&gt; file - the entry point to your application&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;touch main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Getting your API keys
&lt;/h2&gt;

&lt;p&gt;Lastly, get your OpenAI and Azure API keys ready (click the hyperlinks to get them if you don't already have one).&lt;/p&gt;

&lt;p&gt;Note: It's pretty troublesome to sign up for an account on Azure Cognitive Services. You'll need a card (although they won't charge you automatically) and a phone number 😔 but do give it a try if you're trying to build something serious!&lt;/p&gt;

&lt;h1&gt;
  
  
  Building the Chatbot UI with Streamlit
&lt;/h1&gt;

&lt;p&gt;Streamlit is an easy way to build frontend applications using Python.&lt;/p&gt;

&lt;p&gt;Let's import streamlit along with setting up everything else we'll need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import streamlit as st
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential
from tabulate import tabulate
from chromadb.utils import embedding_functions
import chromadb
import openai

# You'll need this client later to store PDF data
client = chromadb.Client()
client.heartbeat()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Give our chat UI a title and create a file uploader:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
st.write("# Chat with PDF")

uploaded_file = st.file_uploader("Choose a PDF file", type="pdf")
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Listen for a change event in &lt;code&gt;uploaded_file&lt;/code&gt;. This will be triggered when you upload a file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
if uploaded_file is not None:
    # Create a temporary file to write the bytes to
    with open("temp_pdf_file.pdf", "wb") as temp_file:
        temp_file.write(uploaded_file.read())
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;View your streamlit app by running &lt;code&gt;main.py&lt;/code&gt; (we'll implement the chat input UI later):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;streamlit run main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the easy part done 🥳! Next comes the not so easy part...&lt;/p&gt;

&lt;h1&gt;
  
  
  Extracting text from PDFs
&lt;/h1&gt;

&lt;p&gt;Carrying on from the previous code snippet, we're going to send &lt;code&gt;temp_file&lt;/code&gt; to Azure Cognitive Services for OCR:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    ...
    # you can set this up in the azure cognitive services portal
    AZURE_COGNITIVE_ENDPOINT = "your-custom-azure-api-endpoint"
    AZURE_API_KEY = "your-azure-api-key"
    credential = AzureKeyCredential(AZURE_API_KEY)
    AZURE_DOCUMENT_ANALYSIS_CLIENT = DocumentAnalysisClient(AZURE_COGNITIVE_ENDPOINT, credential)

    # Open the temporary file in binary read mode and pass it to Azure
    with open("temp_pdf_file.pdf", "rb") as f:
        poller = AZURE_DOCUMENT_ANALYSIS_CLIENT.begin_analyze_document("prebuilt-document", document=f)
        doc_info = poller.result().to_dict()
    ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;doc_info&lt;/code&gt; is a dictionary containing information on the extracted text chunks. It's a pretty complicated dictionary, so I would recommend printing it out and seeing for yourself what it looks like.&lt;/p&gt;
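&lt;p&gt;If you'd rather not print the whole thing, here's a made-up, heavily trimmed sketch of the shape the processing code relies on - the real dictionary from Azure has many more fields:&lt;/p&gt;

```python
# An invented, minimal illustration of the structure returned by
# poller.result().to_dict(); field names match what the code accesses,
# the data itself is fake.
doc_info = {
    "pages": [
        {
            "page_number": 1,
            "lines": [
                {"content": "Quarterly Report"},
                {"content": "Revenue grew 12% year over year."},
            ],
        }
    ],
    "tables": [],
}

# Join each page's lines into one text chunk, as the processing loop does.
for page in doc_info["pages"]:
    page_text = " ".join(line["content"] for line in page["lines"])
    print(page_text)  # Quarterly Report Revenue grew 12% year over year.
```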

&lt;p&gt;Paste in the following to finish processing the data received from Azure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   ...
   res = []
   CONTENT = "content"
   PAGE_NUMBER = "page_number"
   TYPE = "type"
   RAW_CONTENT = "raw_content"
   TABLE_CONTENT = "table_content"

   for p in doc_info['pages']:
        dict = {}
        page_content = " ".join([line["content"] for line in p["lines"]])
        dict[CONTENT] = str(page_content)
        dict[PAGE_NUMBER] = str(p["page_number"])
        dict[TYPE] = RAW_CONTENT
        res.append(dict)

    for table in doc_info["tables"]:
        dict = {}
        dict[PAGE_NUMBER] = str(table["bounding_regions"][0]["page_number"])
        col_headers = []
        cells = table["cells"]
        for cell in cells:
            if cell["kind"] == "columnHeader" and cell["column_span"] == 1:
                for _ in range(cell["column_span"]):
                    col_headers.append(cell["content"])

        data_rows = [[] for _ in range(table["row_count"])]
        for cell in cells:
            if cell["kind"] == "content":
                for _ in range(cell["column_span"]):
                    data_rows[cell["row_index"]].append(cell["content"])
        data_rows = [row for row in data_rows if len(row) &amp;gt; 0]

        markdown_table = tabulate(data_rows, headers=col_headers, tablefmt="pipe")
        dict[CONTENT] = markdown_table
        dict[TYPE] = TABLE_CONTENT
        res.append(dict)
    ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we accessed various properties of the dictionary returned by Azure to get the text on each page and the data stored in tables. The logic is pretty complex because of all the nested structures 😨 but from personal experience, Azure OCR works well even for complex PDF structures, so I highly recommend giving it a try :)&lt;/p&gt;
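&lt;p&gt;&lt;code&gt;tabulate&lt;/code&gt; does the markdown conversion for us above; purely to illustrate the output format, here's a hand-rolled (and simplified - no alignment or escaping) pipe table builder:&lt;/p&gt;

```python
def to_pipe_table(headers, rows):
    # Minimal markdown "pipe" table; tabulate(..., tablefmt="pipe") produces
    # a similar, properly aligned result.
    lines = ["| " + " | ".join(headers) + " |"]
    lines.append("|" + "|".join("---" for _ in headers) + "|")
    for row in rows:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)

table = to_pipe_table(["Item", "Price"], [["Widget", "$9.99"], ["Gadget", "$19.99"]])
print(table)
```

&lt;p&gt;Markdown tables like this keep the tabular structure legible to the LLM once the chunk is retrieved as context.&lt;/p&gt;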

&lt;p&gt;&lt;a href="https://i.giphy.com/media/eq8uOgcZ95PV5P97vq/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/eq8uOgcZ95PV5P97vq/giphy.gif" width="480" height="318"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Storing PDF content in ChromaDB
&lt;/h1&gt;

&lt;p&gt;Still with me? 😅 Great, we're almost there so hang in there!&lt;/p&gt;

&lt;p&gt;Paste in the code below to store extracted text chunks from &lt;code&gt;res&lt;/code&gt; in ChromaDB.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    ...
    try:
        client.delete_collection(name="my_collection")
        st.session_state.messages = []
    except Exception:
        # The collection doesn't exist yet on the first upload - that's fine
        pass

    openai_ef = embedding_functions.OpenAIEmbeddingFunction(api_key="your-openai-api-key", model_name="text-embedding-ada-002")
    collection = client.create_collection(name="my_collection", embedding_function=openai_ef)
    data = []
    id = 1
    for dict in res:
        content = dict.get(CONTENT, '')
        page_number = dict.get(PAGE_NUMBER, '')
        type_of_content = dict.get(TYPE, '')

        content_metadata = {   
            PAGE_NUMBER: page_number,
            TYPE: type_of_content
        }

        collection.add(
            documents=[content],
            metadatas=[content_metadata],
            ids=[str(id)]
        )
        id += 1
    ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first try block ensures that we can continue uploading PDFs without having to refresh the page. &lt;/p&gt;

&lt;p&gt;You might have noticed that we add data into a &lt;code&gt;collection&lt;/code&gt; and not to the database directly. A collection in ChromaDB is a vector space. When a user enters a query, ChromaDB performs a search inside this collection instead of the entire database. In Chroma, a collection is identified by a unique &lt;code&gt;name&lt;/code&gt;, and with a single line of code you can add all extracted text chunks to this collection via &lt;code&gt;collection.add(...)&lt;/code&gt;.&lt;/p&gt;
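&lt;p&gt;If the idea of a collection feels abstract, here's a toy, dependency-free stand-in (&lt;code&gt;ToyCollection&lt;/code&gt; is invented for illustration and scores by word overlap rather than embeddings) that mimics the &lt;code&gt;add&lt;/code&gt;/&lt;code&gt;query&lt;/code&gt; shape we're using:&lt;/p&gt;

```python
class ToyCollection:
    """A stripped-down stand-in for a vector-database collection: it stores
    documents with ids and metadata, and searches only within itself."""

    def __init__(self, name):
        self.name = name
        self.docs = {}  # id -> (document, metadata)

    def add(self, documents, metadatas, ids):
        for doc, meta, id_ in zip(documents, metadatas, ids):
            self.docs[id_] = (doc, meta)

    def query(self, query_text, n_results=2):
        # Score by crude word overlap; a real collection compares embeddings.
        q_words = set(query_text.lower().split())
        ranked = sorted(
            self.docs.values(),
            key=lambda pair: len(q_words.intersection(set(pair[0].lower().split()))),
            reverse=True,
        )
        return [doc for doc, _ in ranked[:n_results]]

collection = ToyCollection("my_collection")
collection.add(
    documents=["Refunds take 5 days.", "Support is open 24/7."],
    metadatas=[{"page_number": "1"}, {"page_number": "2"}],
    ids=["1", "2"],
)
result = collection.query("How long do refunds take?", n_results=1)
print(result)  # ['Refunds take 5 days.']
```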

&lt;p&gt;&lt;a href="https://i.giphy.com/media/aTGEmpKWeojBs7BCSC/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/aTGEmpKWeojBs7BCSC/giphy.gif" width="480" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Generating a response using OpenAI
&lt;/h1&gt;

&lt;p&gt;I get asked a lot about how to build a RAG chatbot without relying on frameworks like LangChain and LlamaIndex. Well, here's how you do it - you construct a list of prompts dynamically based on the retrieved results from your vector database. &lt;/p&gt;

&lt;p&gt;Paste in the following code to wrap things up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat messages from history on app rerun
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if prompt := st.chat_input("What do you want to say to your PDF?"):
    # Display your message
    with st.chat_message("user"):
        st.markdown(prompt)
    # Add your message to chat history
    st.session_state.messages.append({"role": "user", "content": prompt})

    # Query ChromaDB based on your prompt, taking the top 5 most relevant results. These results are ordered by similarity.
    q = collection.query(
        query_texts=[prompt],
        n_results=5,
    )
    results = q["documents"][0]

    prompts = []
    for r in results:
        # Construct a prompt for each retrieved text chunk. Use a fresh variable
        # so we don't overwrite the original user prompt on each iteration.
        chunk_prompt = "Please extract the following: " + prompt + " solely based on the text below. Use an unbiased and journalistic tone. If you're unsure of the answer, say you cannot find the answer. \n\n" + r
        prompts.append(chunk_prompt)
    prompts.reverse()

    openai_res = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "assistant", "content": prompt} for prompt in prompts],
        temperature=0,
    )

    response = openai_res["choices"][0]["message"]["content"]
    with st.chat_message("assistant"):
        st.markdown(response)

    # append the response to chat history
    st.session_state.messages.append({"role": "assistant", "content": response})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice how we reversed &lt;code&gt;prompts&lt;/code&gt; after constructing the list of prompts from the retrieved text chunks. This is because the results returned from ChromaDB are ordered by descending similarity, meaning the most relevant text chunk is always first in the results list. However, ChatGPT tends to weigh the last prompt in a list of prompts more heavily, which is why we have to reverse it.&lt;/p&gt;
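&lt;p&gt;Here's a minimal sketch of that ordering trick in isolation:&lt;/p&gt;

```python
# Retrieved chunks come back most-relevant-first from the vector database...
retrieved = ["most relevant chunk", "second", "third"]

# ...but we want the most relevant chunk to be the LAST message sent,
# since later messages tend to carry more weight in the completion.
prompts = [f"Answer based on: {chunk}" for chunk in retrieved]
prompts.reverse()

print(prompts[-1])  # Answer based on: most relevant chunk
```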

&lt;p&gt;Run the streamlit app and try things out for yourself 😙:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;streamlit run main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🎉 Congratulations, you made it to the end! &lt;/p&gt;

&lt;h1&gt;
  
  
  Taking it a step further
&lt;/h1&gt;

&lt;p&gt;As you know, LLM applications are a black box, so for production use cases you'll want to safeguard the performance of your PDF chatbot to keep your users happy. To learn how to build a simple evaluation framework that could get you set up in less than 30 minutes, &lt;a href="https://www.confident-ai.com/blog/how-to-evaluate-llm-applications" rel="noopener noreferrer"&gt;click here.&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;In this article, you've learnt:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what a vector database is and how to use ChromaDB&lt;/li&gt;
&lt;li&gt;how to use the raw OpenAI API to build a RAG based chatbot without relying on 3rd party frameworks&lt;/li&gt;
&lt;li&gt;what OCR is and how to use Azure's OCR services&lt;/li&gt;
&lt;li&gt;how to quickly set up a beautiful chatbot UI using streamlit, which includes a file uploader. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This tutorial walked you through an example of how you can build a "chat with PDF" application using just Azure OCR, OpenAI, and ChromaDB. With what you've learnt, you can build powerful applications that help increase the productivity of workforces (at least that's the most prominent use case I've come across). &lt;/p&gt;

&lt;p&gt;The source code for this tutorial is available here:&lt;br&gt;
&lt;a href="https://github.com/confident-ai/blog-examples/tree/main/chat-with-pdf" rel="noopener noreferrer"&gt;https://github.com/confident-ai/blog-examples/tree/main/chat-with-pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thank you for reading!&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>programming</category>
      <category>chatgpt</category>
      <category>python</category>
    </item>
    <item>
      <title>Building a customer support chatbot using GPT-3.5 and LlamaIndex🚀</title>
      <dc:creator>Jeffrey Ip</dc:creator>
      <pubDate>Tue, 19 Sep 2023 07:18:42 +0000</pubDate>
      <link>https://dev.to/confidentai/building-a-customer-support-chatbot-using-gpt-35-and-llamaindex-3d1l</link>
      <guid>https://dev.to/confidentai/building-a-customer-support-chatbot-using-gpt-35-and-llamaindex-3d1l</guid>
      <description>&lt;h1&gt;
  
  
  TL;DR
&lt;/h1&gt;

&lt;p&gt;In this article, you'll learn how to create a customer support chatbot using GPT-3.5 and LlamaIndex. Also, stay tuned for bonus tips and tricks on how to evaluate your chatbot at the end of this article :)&lt;/p&gt;

&lt;h1&gt;
  
  
  A small request 🥺
&lt;/h1&gt;

&lt;p&gt;I produce weekly content and your support would really help me continue. Please support me and my company &lt;a href="https://www.confident-ai.com" rel="noopener noreferrer"&gt;Confident AI&lt;/a&gt; by starring our GitHub library. We're building a platform to unit test your chatbot. Thank you very very much! ❤️&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/confident-ai/deepeval" rel="noopener noreferrer"&gt;https://github.com/confident-ai/deepeval&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Introducing the OpenAI API and LlamaIndex
&lt;/h1&gt;

&lt;p&gt;In this tutorial, we're going to use GPT-3.5 provided by the OpenAI API. GPT-3.5 is a machine learning model made by OpenAI - think of it as a super-smart computer buddy. It's been trained on tons of data from the internet, so it can chat, answer questions, and help with all sorts of language tasks.&lt;/p&gt;

&lt;p&gt;But, you might wonder, can raw, out-of-the-box GPT-3.5 answer customer support questions that are specific to my own internal data?&lt;/p&gt;

&lt;p&gt;Unfortunately, the answer is no 😔 because as you may know, GPT models have only been trained on public data up until 2021. This is precisely why we need open source frameworks like LlamaIndex! These frameworks help connect your internal data sources with GPT-3.5, so your chatbot can output tailored responses based on data that regular ChatGPT doesn't know about 😊&lt;/p&gt;

&lt;p&gt;Pretty cool, huh? Let's begin.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/QJvwBSGaoc4eI/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/QJvwBSGaoc4eI/giphy.gif" width="500" height="363"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Project Setup 🚀
&lt;/h1&gt;

&lt;p&gt;First, I'll guide you through how to set up a project for your chatbot.&lt;/p&gt;

&lt;p&gt;Create the project folder and a Python virtual environment by running the code below in your terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir customer-support-chatbot
cd customer-support-chatbot
python3 -m venv venv
source venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your terminal prompt should now look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(venv)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Installing dependencies
&lt;/h2&gt;

&lt;p&gt;Run the following code to install LlamaIndex:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install llama-index
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that we don't need to install &lt;code&gt;openai&lt;/code&gt; separately because LlamaIndex already provides a wrapper that calls the OpenAI API under the hood.&lt;/p&gt;

&lt;p&gt;Create a new &lt;code&gt;main.py&lt;/code&gt; file - the entry point to your chatbot, and &lt;code&gt;chatbot.py&lt;/code&gt; - your chatbot implementation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;touch main.py chatbot.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Setting up your internal knowledge base
&lt;/h2&gt;

&lt;p&gt;Create a new &lt;code&gt;data.txt&lt;/code&gt; file in a new &lt;code&gt;data&lt;/code&gt; folder that will contain fake data on MadeUpCompany:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir data
cd data
touch data.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This file will contain the data that your chatbot is going to base its responses on. Luckily for us, ChatGPT prepared some fake information on MadeUpCompany 😌 Paste the following text in &lt;code&gt;data.txt&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;About MadeUpCompany
MadeUpCompany is a pioneering technology firm founded in 2010, specializing in cloud computing, data analytics, and machine learning. Our headquarters is based in San Francisco, California, with satellite offices spread across New York, London, and Tokyo. We are committed to offering state-of-the-art solutions that help businesses and individuals achieve their full potential. With a diverse team of experts from various industries, we strive to redefine the boundaries of innovation and efficiency.

Products and Services
We offer a suite of services ranging from cloud storage solutions, data analytics platforms, to custom machine learning models tailored for specific business needs. Our most popular product is CloudMate, a cloud storage solution designed for businesses of all sizes. It offers seamless data migration, top-tier security protocols, and an easy-to-use interface. Our data analytics service, DataWiz, helps companies turn raw data into actionable insights using advanced algorithms.

Pricing
We have a variety of pricing options tailored to different needs. Our basic cloud storage package starts at $9.99 per month, with premium plans offering more storage and functionalities. We also provide enterprise solutions on a case-by-case basis, so it’s best to consult with our sales team for customized pricing.

Technical Support
Our customer support team is available 24/7 to assist with any technical issues. We offer multiple channels for support including live chat, email, and a toll-free number. Most issues are typically resolved within 24 hours. We also have an extensive FAQ section on our website and a community forum for peer support.

Security and Compliance
MadeUpCompany places the utmost importance on security and compliance. All our products are GDPR compliant and adhere to the highest security standards, including end-to-end encryption and multi-factor authentication.

Account Management
Customers can easily manage their accounts through our online portal, which allows you to upgrade your service, view billing history, and manage users in your organization. If you encounter any issues or have questions about your account, our account management team is available weekdays from 9 AM to 6 PM.

Refund and Cancellation Policy
We offer a 30-day money-back guarantee on all our products. If you're not satisfied for any reason, you can request a full refund within the first 30 days of your purchase. After that, you can still cancel your service at any time, but a prorated refund will be issued based on the remaining term of your subscription.

Upcoming Features
We’re constantly working to improve our services and offer new features. Keep an eye out for updates on machine learning functionalities in DataWiz and more collaborative tools in CloudMate in the upcoming quarters.

Your customer support staff can use these paragraphs to build their responses to customer inquiries, providing both detailed and precise information to address various questions.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lastly, navigate back to &lt;code&gt;customer-support-chatbot&lt;/code&gt; containing &lt;code&gt;main.py&lt;/code&gt;, and set your OpenAI API key as an environment variable. In your terminal, paste in this with your own API key (get yours &lt;a href="https://openai.com/blog/openai-api" rel="noopener noreferrer"&gt;here&lt;/a&gt; if you don't already have one):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export OPENAI_API_KEY="your-api-key-here"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All done! Let's start coding.&lt;/p&gt;

&lt;h1&gt;
  
  
  Building a Chatbot with LlamaIndex 🦄
&lt;/h1&gt;

&lt;p&gt;To begin, we first have to chunk and index the text we have in &lt;code&gt;data.txt&lt;/code&gt; into a format that's readable for GPT-3.5. So you might wonder, what do you mean by "readable"? 🤯&lt;/p&gt;

&lt;p&gt;Well, GPT-3.5 has something called a context limit, which refers to how much text the model can "see" or consider at one time. Think of it like the model's short-term memory. If you give it a really long paragraph or a big conversation history, it might reach its limit and not be able to add much more to it. If you hit this limit, you might have to shorten your text so the model can understand and respond properly. &lt;/p&gt;

&lt;p&gt;In addition, GPT-3.5 performs worse if you supply it with way too much text, kind of like how someone loses focus if you tell too long a story. This is exactly where LlamaIndex shines 🦄 LlamaIndex helps us break down large bodies of text into chunks that can be consumed by GPT-3.5 🥳&lt;/p&gt;
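&lt;p&gt;Under the hood, chunking can be as simple as a sliding window over words. Here's a naive sketch (real frameworks like LlamaIndex chunk by tokens and respect sentence boundaries, so treat this as illustration only):&lt;/p&gt;

```python
def chunk_text(text, chunk_size=8, overlap=2):
    # Naive word-based chunker: each chunk holds chunk_size words, and
    # consecutive chunks share `overlap` words so context isn't cut mid-thought.
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks

text = " ".join(f"word{i}" for i in range(20))
for c in chunk_text(text):
    print(c)
```

&lt;p&gt;Each chunk is then small enough to embed and to fit comfortably inside the model's context limit.&lt;/p&gt;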

&lt;p&gt;&lt;a href="https://i.giphy.com/media/o1i0XbXsqd4CALDbnj/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/o1i0XbXsqd4CALDbnj/giphy.gif" width="480" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In a few lines of code, we can build our chatbot using LlamaIndex. Everything from chunking the text from &lt;code&gt;data.txt&lt;/code&gt; to calling the OpenAI APIs is handled by LlamaIndex. Paste the following code into &lt;code&gt;chatbot.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

def query(user_input):
    return query_engine.query(user_input).response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the following in &lt;code&gt;main.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from chatbot import query

while True:
    user_input = input("Enter your question: ")
    response = query(user_input)
    print("Bot response:", response)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now try it for yourself by running the code below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Feel free to switch out the text in &lt;code&gt;data/data.txt&lt;/code&gt; with your own knowledge base!&lt;/p&gt;

&lt;h1&gt;
  
  
  Improving your Chatbot
&lt;/h1&gt;

&lt;p&gt;You might start to run into situations where the chatbot isn't performing as well as you hope for certain questions/inputs. Luckily there are several ways to improve your chatbot 😊&lt;/p&gt;

&lt;h2&gt;
  
  
  Parsing your data into smaller/bigger chunks
&lt;/h2&gt;

&lt;p&gt;The quality of output from your chatbot is directly affected by the size of the text chunks (scroll down for a better explanation of why). &lt;/p&gt;

&lt;p&gt;In &lt;code&gt;chatbot.py&lt;/code&gt;, add &lt;code&gt;service_context = ServiceContext.from_defaults(chunk_size=1000)&lt;/code&gt; and pass it to &lt;code&gt;VectorStoreIndex&lt;/code&gt; to alter the chunk size:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(chunk_size=1000)
documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()

def query(user_input):
    return query_engine.query(user_input).response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Play around with the size parameter to find what works best :)&lt;/p&gt;

&lt;h2&gt;
  
  
  Providing more context to GPT-3.5
&lt;/h2&gt;

&lt;p&gt;Depending on your data, you might benefit from supplying fewer or more text chunks to GPT-3.5. You can do this by setting &lt;code&gt;query_engine = index.as_query_engine(similarity_top_k=5)&lt;/code&gt; in &lt;code&gt;chatbot.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(chunk_size=1000)
documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine(similarity_top_k=5)

def query(user_input):
    return query_engine.query(user_input).response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Evaluating your chatbot ⚖️
&lt;/h1&gt;

&lt;p&gt;By now, you might have run into the problem of eyeballing your chatbot's output. You make a small configuration change, such as changing the number of retrieved text chunks, run &lt;code&gt;main.py&lt;/code&gt;, type in the same old query, and wait 5 seconds to see if the result has gotten any better 😰 Sound familiar? &lt;/p&gt;

&lt;p&gt;The problem becomes worse if you want to inspect outputs from not just one, but several different queries. &lt;a href="https://www.confident-ai.com/blog/how-to-evaluate-llm-applications" rel="noopener noreferrer"&gt;Here is a great read on how you can build your own evaluation framework in less than 20 minutes&lt;/a&gt;, but if you'd prefer not to reinvent the wheel, consider using a free open source package like &lt;a href="https://github.com/confident-ai/deepeval" rel="noopener noreferrer"&gt;DeepEval&lt;/a&gt;. It helps you evaluate your chatbot so you don't have to do it yourself 😌&lt;/p&gt;

&lt;p&gt;Since I'm slightly biased as the cofounder of Confident AI (which is the company behind DeepEval), I'm going to go ahead and show you how DeepEval can help with evaluating your chatbot (no but seriously, we offer unit testing for chatbots, have a stellar developer experience, and a &lt;a href="https://app.confident-ai.com" rel="noopener noreferrer"&gt;free platform&lt;/a&gt; for you to holistically view your chatbot's performance 🥵)&lt;/p&gt;

&lt;p&gt;Install by running the code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install deepeval
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a new test file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;touch test_chatbot.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Paste in the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pytest
from deepeval.metrics.factual_consistency import FactualConsistencyMetric
from deepeval.test_case import LLMTestCase
from deepeval.run_test import assert_test
from chatbot import query

def test_1():
    input = "What does your company do?"
    output = query(input)
    context = "Our company specializes in cloud computing, data analytics, and machine learning. We offer a range of services including cloud storage solutions, data analytics platforms, and custom machine learning models."
    factual_consistency_metric = FactualConsistencyMetric(minimum_score=0.7)
    test_case = LLMTestCase(output=output, context=context)
    assert_test(test_case, [factual_consistency_metric])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;deepeval test run test_chatbot.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your test should have passed! Let's break down what happened. The variable &lt;code&gt;input&lt;/code&gt; mimics a user input, and &lt;code&gt;output&lt;/code&gt; is what your chatbot outputs based on this query. The variable &lt;code&gt;context&lt;/code&gt; contains the relevant information from your knowledge base, and &lt;code&gt;FactualConsistencyMetric(minimum_score=0.7)&lt;/code&gt; is an out-of-the-box metric provided by DeepEval for you to assess how factually correct your chatbot's output is based on the provided context. The score ranges from 0 to 1, and &lt;code&gt;minimum_score=0.7&lt;/code&gt; ultimately determines whether your test passes.&lt;/p&gt;
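&lt;p&gt;To demystify the metric a little, here's a crude, dependency-free sketch of threshold-based scoring (&lt;code&gt;word_overlap_score&lt;/code&gt; is invented for illustration - DeepEval's actual factual consistency metric is model-based, not word overlap):&lt;/p&gt;

```python
def word_overlap_score(output, context):
    # Crude stand-in for a factual consistency metric: the fraction of
    # context words that also appear in the output.
    out_words = set(output.lower().split())
    ctx_words = set(context.lower().split())
    if not ctx_words:
        return 0.0
    return len(ctx_words.intersection(out_words)) / len(ctx_words)

context = "we offer cloud storage and data analytics"
output = "We offer cloud storage and data analytics to businesses"

score = word_overlap_score(output, context)
minimum_score = 0.7
print(score, score >= minimum_score)  # 1.0 True
```

&lt;p&gt;The key idea is the same as in the test above: compute a 0-1 score, then pass or fail against a minimum threshold.&lt;/p&gt;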

&lt;p&gt;&lt;a href="https://i.giphy.com/media/aNbGyHcDYphNbhe4EE/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/aNbGyHcDYphNbhe4EE/giphy.gif" width="480" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Add more tests to stop wasting time on fixing breaking changes to your chatbot 😙&lt;/p&gt;

&lt;h1&gt;
  
  
  How does your chatbot work under the hood?
&lt;/h1&gt;

&lt;p&gt;The chatbot we just built actually relies on an architecture called &lt;strong&gt;Retrieval Augmented Generation (RAG)&lt;/strong&gt;. Retrieval Augmented Generation is a way to make GPT-3.5 smarter by letting it pull in fresh or specific information from an outside source, in this case &lt;code&gt;data.txt&lt;/code&gt;. So, when you ask it something, it can give you a more current and relevant answer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdah2071nsip7cvlpl5j7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdah2071nsip7cvlpl5j7.png" alt="RAG architecture" width="800" height="1519"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the previous two sections, we looked at how tweaking two parameters, the text chunk size and the number of text chunks used, can impact the quality of answers you get from GPT-3.5. This is because when you ask your chatbot a question, LlamaIndex &lt;strong&gt;retrieves&lt;/strong&gt; the most relevant text chunks from &lt;code&gt;data.txt&lt;/code&gt;, which GPT-3.5 then uses to &lt;strong&gt;generate&lt;/strong&gt; a data-&lt;strong&gt;augmented&lt;/strong&gt; answer.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;In this article, you've learnt:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what OpenAI GPT-3.5 is,&lt;/li&gt;
&lt;li&gt;how to build a simple chatbot on your own data using LlamaIndex,&lt;/li&gt;
&lt;li&gt;how to improve the quality of your chatbot,&lt;/li&gt;
&lt;li&gt;how to evaluate your chatbot using DeepEval,&lt;/li&gt;
&lt;li&gt;what RAG is and how it works.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This tutorial walked you through an example of a chatbot you can build using LlamaIndex and GPT-3.5. With LlamaIndex, you can create powerful personalized chatbots useful in various applications, such as customer support, user onboarding, sales enablement, and more 🥳&lt;/p&gt;

&lt;p&gt;The source code for this tutorial is available here:&lt;br&gt;
&lt;a href="https://github.com/confident-ai/blog-examples/tree/main/customer-support-chatbot" rel="noopener noreferrer"&gt;https://github.com/confident-ai/blog-examples/tree/main/customer-support-chatbot&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thank you for reading!&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>chatgpt</category>
      <category>programming</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
