<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Zilliz</title>
    <description>The latest articles on DEV Community by Zilliz (@zilliz).</description>
    <link>https://dev.to/zilliz</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F7647%2Fc86e3bfa-c8ad-40ff-9d53-0fbb6b3108a6.png</url>
      <title>DEV Community: Zilliz</title>
      <link>https://dev.to/zilliz</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/zilliz"/>
    <language>en</language>
    <item>
      <title>How to Install and Run OpenClaw (Previously Clawdbot/Moltbot) on Mac: A Step-by-Step Tutorial</title>
      <dc:creator>Chloe Williams</dc:creator>
      <pubDate>Fri, 13 Feb 2026 10:03:05 +0000</pubDate>
      <link>https://dev.to/zilliz/how-to-install-and-run-openclaw-previously-clawdbotmoltbot-on-mac-a-step-by-step-tutorial-3jc5</link>
      <guid>https://dev.to/zilliz/how-to-install-and-run-openclaw-previously-clawdbotmoltbot-on-mac-a-step-by-step-tutorial-3jc5</guid>
      <description>&lt;p&gt;&lt;a href="https://openclaw.ai/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; is an open-source, self-hosted gateway that bridges your everyday messaging apps to AI coding agents. Instead of switching between tabs, apps, and interfaces, you send a message from WhatsApp or Telegram and get an AI-powered response right in your pocket. It's MIT licensed, runs on your hardware, and keeps you in full control of your data.&lt;/p&gt;

&lt;p&gt;In this tutorial, we'll walk through everything you need to install and run OpenClaw on macOS — from prerequisites to your first working chat.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You'll Need
&lt;/h2&gt;

&lt;p&gt;Before you begin, make sure you have the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;macOS&lt;/strong&gt; (any recent version)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node.js 22 or newer&lt;/strong&gt; — the installer script will handle this for you if it's not already installed, but it's good to check&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An API key&lt;/strong&gt; — Anthropic is recommended by the OpenClaw team&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;About 5 minutes&lt;/strong&gt; of your time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To check your current Node version, open Terminal and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;node &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you see &lt;code&gt;v22.x.x&lt;/code&gt; or higher, you're good to go. If not, don't worry — the installer will take care of it.&lt;/p&gt;
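&lt;p&gt;If you prefer to manage Node yourself rather than let the installer handle it, one common route on macOS is Homebrew. This is a sketch under the assumption that Homebrew is already installed and that you want the versioned &lt;code&gt;node@22&lt;/code&gt; formula:&lt;/p&gt;

```shell
# Install Node 22 with Homebrew (assumes Homebrew is already set up)
brew install node@22

# node@22 is keg-only, so put it on PATH for the current shell
export PATH="$(brew --prefix node@22)/bin:$PATH"

# Confirm the version; it should report v22.x.x
node --version
```

&lt;p&gt;The installer script in Step 1 performs its own Node detection, so either route should work.&lt;/p&gt;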

&lt;h2&gt;
  
  
  Step 1: Install OpenClaw via the Installer Script (Recommended)
&lt;/h2&gt;

&lt;p&gt;The fastest way to install OpenClaw on macOS is the one-line installer script. It handles Node detection, CLI installation, and launches the onboarding wizard — all in one step.&lt;/p&gt;

&lt;p&gt;Open Terminal and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://openclaw.ai/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The script will download the CLI, install it globally via npm, and kick off the onboarding wizard automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Alternative: Install via npm Directly
&lt;/h3&gt;

&lt;p&gt;If you already have Node 22+ and prefer manual control, you can install OpenClaw with npm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; openclaw@latest
openclaw onboard &lt;span class="nt"&gt;--install-daemon&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Alternative: Install via pnpm
&lt;/h3&gt;

&lt;p&gt;If pnpm is your package manager of choice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pnpm add &lt;span class="nt"&gt;-g&lt;/span&gt; openclaw@latest
pnpm approve-builds &lt;span class="nt"&gt;-g&lt;/span&gt;   &lt;span class="c"&gt;# approve openclaw, node-llama-cpp, sharp, etc.&lt;/span&gt;
openclaw onboard &lt;span class="nt"&gt;--install-daemon&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; pnpm requires explicit approval for packages with build scripts. After the first install shows the "Ignored build scripts" warning, run &lt;code&gt;pnpm approve-builds -g&lt;/code&gt; and select the listed packages.&lt;/p&gt;

&lt;h3&gt;
  
  
  Troubleshooting: &lt;code&gt;sharp&lt;/code&gt; Build Errors
&lt;/h3&gt;

&lt;p&gt;If you have &lt;code&gt;libvips&lt;/code&gt; installed globally (common on macOS via Homebrew) and &lt;code&gt;sharp&lt;/code&gt; fails during installation, force prebuilt binaries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;SHARP_IGNORE_GLOBAL_LIBVIPS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; openclaw@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you see the error &lt;code&gt;sharp: Please add node-gyp to your dependencies&lt;/code&gt;, either install build tooling (Xcode Command Line Tools + &lt;code&gt;npm install -g node-gyp&lt;/code&gt;) or use the environment variable above.&lt;/p&gt;
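&lt;p&gt;Put together, the two fixes look like this — a sketch combining the build-tooling route with the prebuilt-binary workaround shown above:&lt;/p&gt;

```shell
# Option A: install build tooling so sharp can compile from source
xcode-select --install            # Xcode Command Line Tools (errors harmlessly if present)
npm install -g node-gyp

# Option B: skip compilation by forcing sharp's prebuilt binaries
SHARP_IGNORE_GLOBAL_LIBVIPS=1 npm install -g openclaw@latest
```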

&lt;h2&gt;
  
  
  Step 2: Run the Onboarding Wizard
&lt;/h2&gt;

&lt;p&gt;If the installer script didn't automatically launch it, start the onboarding wizard manually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw onboard &lt;span class="nt"&gt;--install-daemon&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The wizard walks you through configuring auth, gateway settings, and optional channel connections. It also installs OpenClaw as a background service (daemon), so the Gateway stays running even after you close Terminal.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Wizard Configures
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Authentication&lt;/strong&gt; — generates a token for the Gateway so local and remote clients must authenticate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gateway settings&lt;/strong&gt; — port, bind address, and service installation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Channel connections&lt;/strong&gt; — optional setup for WhatsApp, Telegram, Discord, and more&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 3: Verify the Gateway Is Running
&lt;/h2&gt;

&lt;p&gt;Once onboarding completes, check that the Gateway is up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw gateway status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see confirmation that the Gateway is running. If you want to run it in the foreground for debugging or quick testing, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw gateway &lt;span class="nt"&gt;--port&lt;/span&gt; 18789
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a full health check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw health
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Open the Control UI (Dashboard)
&lt;/h2&gt;

&lt;p&gt;The fastest way to start chatting with OpenClaw is through the browser-based Control UI — no channel setup required.&lt;/p&gt;

&lt;p&gt;Run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This copies the dashboard URL, opens your browser if possible, and displays the link. By default, the Control UI is served at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://127.0.0.1:18789/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the dashboard prompts for authentication, paste the token from your Gateway config. You can retrieve it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw config get gateway.auth.token
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Security note:&lt;/strong&gt; The Control UI is an admin surface — it provides access to chat, configuration, and execution approvals. Do not expose it publicly. Stick to localhost, Tailscale Serve, or an SSH tunnel.&lt;/p&gt;
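&lt;p&gt;As an illustration of the SSH-tunnel option, a standard local port forward keeps the Control UI reachable only through an encrypted hop. The hostname below is a placeholder; 18789 is the default port from this tutorial:&lt;/p&gt;

```shell
# Forward local port 18789 to the Gateway on the Mac running OpenClaw.
# "you@your-mac.example.com" is a placeholder for your own SSH target.
ssh -N -L 18789:127.0.0.1:18789 you@your-mac.example.com

# Then open http://127.0.0.1:18789/ on the local machine.
```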

&lt;h2&gt;
  
  
  Step 5: Connect a Chat Channel (Optional)
&lt;/h2&gt;

&lt;p&gt;While the Control UI gives you instant access to chat, the real power of OpenClaw is messaging your AI agent from the apps you already use. Here's a quick overview of supported channels:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Channel&lt;/th&gt;
&lt;th&gt;Setup Complexity&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Telegram&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Easiest&lt;/td&gt;
&lt;td&gt;Simple bot token&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;WhatsApp&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Easy&lt;/td&gt;
&lt;td&gt;QR pairing required; stores more state on disk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Discord&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;Bot API + Gateway&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;iMessage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;Recommended via BlueBubbles macOS server&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IRC&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Easy&lt;/td&gt;
&lt;td&gt;Classic IRC; channels + DMs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Slack&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;Bolt SDK; workspace apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Signal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;Privacy-focused; uses signal-cli&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Multiple channels can run simultaneously — configure as many as you want and OpenClaw routes messages per chat.&lt;/p&gt;

&lt;p&gt;For a Slack-specific walkthrough, see &lt;a href="https://milvus.io/blog/stepbystep-guide-to-setting-up-openclaw-previously-clawdbotmoltbot-with-slack.md?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;OpenClaw Tutorial: Connect to Slack for Local AI Assistant - Milvus Blog&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick Example: Pair WhatsApp
&lt;/h3&gt;

&lt;p&gt;To connect WhatsApp, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw channels login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Follow the QR pairing flow, and you'll be able to message your AI agent directly from WhatsApp.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6: Send a Test Message
&lt;/h2&gt;

&lt;p&gt;With a channel configured, send a test message from the CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw message send &lt;span class="nt"&gt;--target&lt;/span&gt; +15555550123 &lt;span class="nt"&gt;--message&lt;/span&gt; &lt;span class="s2"&gt;"Hello from OpenClaw"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace the phone number with your own. If everything is wired correctly, you'll see the message arrive in your messaging app — and OpenClaw's AI agent will respond.&lt;/p&gt;

&lt;h2&gt;
  
  
  Optional: Build from Source
&lt;/h2&gt;

&lt;p&gt;For contributors or anyone who wants to run from a local checkout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/openclaw/openclaw.git
&lt;span class="nb"&gt;cd &lt;/span&gt;openclaw
pnpm &lt;span class="nb"&gt;install
&lt;/span&gt;pnpm ui:build
pnpm build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Link the CLI globally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pnpm &lt;span class="nb"&gt;link&lt;/span&gt; &lt;span class="nt"&gt;--global&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run onboarding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw onboard &lt;span class="nt"&gt;--install-daemon&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For hot-reload during development, use &lt;code&gt;pnpm gateway:watch&lt;/code&gt; instead of the standard gateway command.&lt;/p&gt;

&lt;h2&gt;
  
  
  Optional: macOS App Onboarding
&lt;/h2&gt;

&lt;p&gt;OpenClaw also offers a native macOS app (menu bar) with its own onboarding flow. If you're using the app:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Approve the macOS security warning&lt;/strong&gt; when first launching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Allow "Find Local Networks"&lt;/strong&gt; permission&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose Local vs. Remote&lt;/strong&gt; — select "This Mac" for a local-only Gateway&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grant permissions&lt;/strong&gt; — the app may request Automation, Notifications, Accessibility, and other TCC permissions depending on your use case&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Install the CLI&lt;/strong&gt; (optional) — the app can install the global &lt;code&gt;openclaw&lt;/code&gt; CLI via npm so terminal workflows work alongside the app&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chat in the onboarding session&lt;/strong&gt; — the app opens a dedicated chat so the agent can introduce itself&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The stable workflow recommended by the OpenClaw docs is to install and launch &lt;code&gt;OpenClaw.app&lt;/code&gt;, complete the onboarding checklist, and then link your channels with &lt;code&gt;openclaw channels login&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuration Basics
&lt;/h2&gt;

&lt;p&gt;OpenClaw stores its configuration at &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt;. Out of the box, it uses the bundled Pi binary in RPC mode with per-sender sessions — no configuration needed.&lt;/p&gt;
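&lt;p&gt;To see what onboarding actually wrote, you can inspect the file directly or query individual values through the CLI (the token lookup was shown in Step 4):&lt;/p&gt;

```shell
# Dump the whole config file created by onboarding
cat ~/.openclaw/openclaw.json

# Or read a single value via the CLI
openclaw config get gateway.auth.token
```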

&lt;p&gt;If you want to restrict who can message your agent, add an &lt;code&gt;allowFrom&lt;/code&gt; rule:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"channels"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"whatsapp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"allowFrom"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"+15555550123"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"groups"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"requireMention"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"groupChat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"mentionPatterns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"@openclaw"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key File Locations on macOS
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Path&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Main configuration file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;~/.openclaw/workspace/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Skills, prompts, memories&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;~/.openclaw/credentials/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Channel credentials&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;~/.openclaw/agents/&amp;lt;agentId&amp;gt;/sessions/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Agent session data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/tmp/openclaw/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Logs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Useful Environment Variables
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variable&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;OPENCLAW_HOME&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Override home directory for internal path resolution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;OPENCLAW_STATE_DIR&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Override the state directory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;OPENCLAW_CONFIG_PATH&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Override the config file path&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
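&lt;p&gt;As a quick illustration of the overrides above — the paths here are examples, useful for running a second, isolated profile alongside your main one:&lt;/p&gt;

```shell
# Point OpenClaw at a separate state directory and config file
# (both paths are examples, not defaults)
export OPENCLAW_STATE_DIR="$HOME/openclaw-test/state"
export OPENCLAW_CONFIG_PATH="$HOME/openclaw-test/openclaw.json"

# Commands run in this shell now use the alternate profile
openclaw gateway status
```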

&lt;h2&gt;
  
  
  Troubleshooting: &lt;code&gt;openclaw&lt;/code&gt; Not Found
&lt;/h2&gt;

&lt;p&gt;If your shell can't find the &lt;code&gt;openclaw&lt;/code&gt; command after installation, the issue is almost always a missing PATH entry.&lt;/p&gt;

&lt;p&gt;Quick diagnosis:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;node &lt;span class="nt"&gt;-v&lt;/span&gt;
npm &lt;span class="nt"&gt;-v&lt;/span&gt;
npm prefix &lt;span class="nt"&gt;-g&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the output of &lt;code&gt;npm prefix -g&lt;/code&gt; plus &lt;code&gt;/bin&lt;/code&gt; isn't in your &lt;code&gt;$PATH&lt;/code&gt;, add it to your shell startup file (&lt;code&gt;~/.zshrc&lt;/code&gt; for modern macOS):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;npm prefix &lt;span class="nt"&gt;-g&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;/bin:&lt;/span&gt;&lt;span class="nv"&gt;$PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then open a new Terminal window or run &lt;code&gt;source ~/.zshrc&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting: "Unauthorized" / 1008 Error in Dashboard
&lt;/h2&gt;

&lt;p&gt;If the Control UI shows an unauthorized error:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Make sure the Gateway is reachable: &lt;code&gt;openclaw status&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Retrieve the token: &lt;code&gt;openclaw config get gateway.auth.token&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;In the dashboard settings, paste the token into the auth field and reconnect&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you need to generate a fresh token:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw doctor &lt;span class="nt"&gt;--generate-gateway-token&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What You Now Have
&lt;/h2&gt;

&lt;p&gt;After completing this tutorial, you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;running Gateway&lt;/strong&gt; on your Mac&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authentication configured&lt;/strong&gt; for secure access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Control UI access&lt;/strong&gt; for browser-based chat&lt;/li&gt;
&lt;li&gt;Optionally, one or more &lt;strong&gt;connected messaging channels&lt;/strong&gt; (WhatsApp, Telegram, Discord, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From here, you can explore multi-agent routing, workspace isolation, media support, and the full range of OpenClaw's capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://milvus.io/blog/we-extracted-openclaws-memory-system-and-opensourced-it-memsearch.md?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;We Extracted OpenClaw’s Memory System and Open-Sourced It (memsearch)&lt;/a&gt; &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://milvus.io/blog/openclaw-formerly-clawdbot-moltbot-explained-a-complete-guide-to-the-autonomous-ai-agent.md?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;What Is OpenClaw? Complete Guide to the Open-Source AI Agent&lt;/a&gt; &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://milvus.io/blog/clawdbot-long-running-ai-agents-langgraph-milvus.md?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Build Clawdbot-Style AI Agents with LangGraph &amp;amp; Milvus&lt;/a&gt; &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://milvus.io/blog/why-claude-code-feels-so-stable-a-developers-deep-dive-into-its-local-storage-design.md?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;How Claude Code Manages Local Storage for AI Agents&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>community</category>
    </item>
    <item>
      <title>Our Journey to 35K+ GitHub Stars: The Real Story of Building Milvus from Scratch</title>
      <dc:creator>Chloe Williams</dc:creator>
      <pubDate>Mon, 30 Jun 2025 10:35:10 +0000</pubDate>
      <link>https://dev.to/zilliz/our-journey-to-35k-github-stars-the-real-story-of-building-milvus-from-scratch-5b1j</link>
      <guid>https://dev.to/zilliz/our-journey-to-35k-github-stars-the-real-story-of-building-milvus-from-scratch-5b1j</guid>
      <description>&lt;p&gt;For the past few years, we've been focused on one thing: building an enterprise-ready vector database for the AI era. The hard part isn't building &lt;em&gt;a&lt;/em&gt; database—it's building one that's scalable, easy to use, and actually solves real problems in production.&lt;/p&gt;

&lt;p&gt;This June, we reached a new milestone: Milvus hit &lt;a href="https://github.com/milvus-io/milvus?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;35,000 stars on GitHub&lt;/a&gt; (35.5K+ at the time of writing). We're not going to pretend this is just another number—it means a lot to us.&lt;/p&gt;

&lt;p&gt;Each star represents a developer who took the time to look at what we've built, found it useful enough to bookmark, and in many cases, decided to use it. Some of you have gone further: filing issues, contributing code, answering questions in our forums, and helping other developers when they get stuck.&lt;/p&gt;

&lt;p&gt;We wanted to take a moment to share our story—the real one, with all the messy parts included.&lt;/p&gt;

&lt;h2&gt;
  
  
  We Started Building Milvus Because Nothing Else Worked
&lt;/h2&gt;

&lt;p&gt;Back in 2017, we started with a simple question: As AI applications were starting to emerge and unstructured data was exploding, how do you efficiently store and search the vector embeddings that power semantic understanding?&lt;/p&gt;

&lt;p&gt;Traditional databases weren't built for this. They're optimized for rows and columns, not high-dimensional vectors. The existing technologies and tools were either unworkable or painfully slow for what we needed.&lt;/p&gt;

&lt;p&gt;We tried everything available. Hacked together solutions with Elasticsearch. Built custom indexes on top of MySQL. Even experimented with FAISS, but it was designed as a research library, not production database infrastructure. Nothing provided the complete solution we envisioned for enterprise AI workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So we started building our own.&lt;/strong&gt; Not because we thought it would be easy—databases are notoriously hard to get right—but because we could see where AI was heading and knew it needed purpose-built infrastructure to get there.&lt;/p&gt;

&lt;p&gt;By 2018, we were deep into developing what would become &lt;a href="https://milvus.io/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Milvus&lt;/a&gt;. The term "&lt;strong&gt;vector database&lt;/strong&gt;" didn't even exist yet. We were essentially creating a new category of infrastructure software, which was both exciting and terrifying.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open-Sourcing Milvus: Building in Public
&lt;/h2&gt;

&lt;p&gt;In November 2019, we decided to open-source Milvus version 0.10.&lt;/p&gt;

&lt;p&gt;Open-sourcing means exposing all your flaws to the world. Every hack, every TODO comment, every design decision you're not entirely sure about. But we believed that if vector databases were going to become critical infrastructure for AI, they needed to be open and accessible to everyone.&lt;/p&gt;

&lt;p&gt;The response was overwhelming. Developers didn't just use Milvus—they improved it. They found bugs we'd missed, suggested features we hadn't considered, and asked questions that made us think harder about our design choices.&lt;/p&gt;

&lt;p&gt;In 2020, we joined the &lt;a href="https://lfaidata.foundation/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;LF AI &amp;amp; Data Foundation&lt;/a&gt;. This wasn't just for credibility—it taught us how to maintain a sustainable open-source project. How to handle governance, backward compatibility, and building software that lasts years, not months.&lt;/p&gt;

&lt;p&gt;By 2021, we released Milvus 1.0 and &lt;a href="https://lfaidata.foundation/projects/milvus/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;graduated from LF AI &amp;amp; Data Foundation&lt;/a&gt;. That same year, we won the &lt;a href="https://big-ann-benchmarks.com/neurips21.html?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;BigANN global challenge&lt;/a&gt; for billion-scale vector search. That win felt good, but more importantly, it validated that we were solving real problems the right way.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hardest Decision: Starting Over
&lt;/h2&gt;

&lt;p&gt;Here's where things get complicated. By 2021, Milvus 1.0 was working well for many use cases, but enterprise customers kept asking for the same things: better cloud-native architecture, easier horizontal scaling, more operational simplicity.&lt;/p&gt;

&lt;p&gt;We had a choice: patch our way forward or rebuild from the ground up. We chose to rebuild.&lt;/p&gt;

&lt;p&gt;Milvus 2.0 was essentially a complete rewrite. We introduced a fully decoupled storage-compute architecture with dynamic scalability. It took us two years and was honestly one of the most stressful periods in our company's history. We were throwing away a working system that thousands of people were using to build something unproven.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But when we released Milvus 2.0 in 2022, it transformed Milvus from a powerful vector database into production-ready infrastructure that could scale to enterprise workloads.&lt;/strong&gt; That same year, we also completed a &lt;a href="https://zilliz.com/news/vector-database-company-zilliz-series-b-extension?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Series B+ funding round&lt;/a&gt;—not to burn money, but to double down on product quality and support for global customers. We knew this path would take time, but every step had to be built on a solid foundation.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Everything Accelerated with AI
&lt;/h2&gt;

&lt;p&gt;2023 was the year of &lt;a href="https://zilliz.com/learn/Retrieval-Augmented-Generation?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;RAG&lt;/a&gt; (retrieval-augmented generation). Suddenly, semantic search went from an interesting AI technique to essential infrastructure for chatbots, document Q&amp;amp;A systems, and AI agents.&lt;/p&gt;

&lt;p&gt;Milvus's GitHub stars spiked. Support requests multiplied. Developers who had never heard of vector databases were suddenly asking sophisticated questions about indexing strategies and query optimization.&lt;/p&gt;

&lt;p&gt;This growth was exciting but also overwhelming. We realized we needed to scale not just our technology, but our entire approach to community support. We hired more developer advocates, completely rewrote our documentation, and started creating educational content for developers new to vector databases.&lt;/p&gt;

&lt;p&gt;We also launched &lt;a href="https://zilliz.com/cloud?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Zilliz Cloud&lt;/a&gt;—our fully managed version of Milvus. Some people asked why we were "commercializing" our open-source project. The honest answer is that maintaining enterprise-grade infrastructure is expensive and complex. Zilliz Cloud allows us to sustain and accelerate Milvus development while keeping the core project completely open source.&lt;/p&gt;

&lt;p&gt;Then came 2024. &lt;a href="https://zilliz.com/blog/zilliz-named-a-leader-in-the-forrester-wave-vector-database-report?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;strong&gt;Forrester named us a leader&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;in the vector database category.&lt;/strong&gt; Milvus passed 30,000 GitHub stars. &lt;strong&gt;And we realized: the road we'd been paving for seven years had finally become the highway.&lt;/strong&gt; As more enterprises adopted vector databases as critical infrastructure, our business growth accelerated rapidly—validating that the foundation we'd built could scale both technically and commercially.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Team Behind Milvus: Zilliz
&lt;/h2&gt;

&lt;p&gt;Here's something interesting: many people know Milvus but not Zilliz. We're actually fine with that. &lt;a href="https://zilliz.com/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;strong&gt;Zilliz&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;is the team behind Milvus—we build it, maintain it, and support it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What we care about most are the unglamorous things that make the difference between a cool demo and production-ready infrastructure: performance optimizations, security patches, documentation that actually helps beginners, and responding thoughtfully to GitHub issues.&lt;/p&gt;

&lt;p&gt;We've built a 24/7 global support team across the U.S., Europe, and Asia, because developers need help in their time zones, not ours. We have community contributors we call "&lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSfkVTYObayOaND8M1ci9eF_YWvoKDb-xQjLJYZ-LhbCdLAt2Q/viewform?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Milvus Ambassadors&lt;/a&gt;" who organize events, answer forum questions, and often explain concepts better than we do.&lt;/p&gt;

&lt;p&gt;We've also welcomed integrations with AWS, GCP, and other cloud providers—even when they offer their own managed versions of Milvus. More deployment options are good for users. Though we've noticed that when teams hit complex technical challenges, they often end up reaching out to us directly because we understand the system at the deepest level.&lt;/p&gt;

&lt;p&gt;Many people think of open source as just a toolbox, but it's really an evolutionary process—a collective effort by countless people who love and believe in it. Only those who truly understand the architecture can explain the "why" behind bug fixes, performance bottleneck analysis, data system integration, and architectural adjustments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So if you're using open-source Milvus, or considering vector databases as a core component of your AI system, we encourage you to reach out to us directly for the most professional and timely support.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Real Impact in Production: The Trust from Users
&lt;/h2&gt;

&lt;p&gt;The use cases for Milvus have grown beyond what we initially imagined. We're powering AI infrastructure for some of the world's most demanding enterprises across every industry.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0w7r6rotxew8gm57dmvd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0w7r6rotxew8gm57dmvd.png" alt="zilliz customers.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/customers/bosch?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;strong&gt;Bosch&lt;/strong&gt;&lt;/a&gt;, the global automotive technology leader and pioneer in autonomous driving, revolutionized their data analysis with Milvus achieving 80% reduction in data collection costs and $1.4M annual savings while searching billions of driving scenarios in milliseconds for critical edge cases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/customers/read-ai?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;strong&gt;Read AI&lt;/strong&gt;&lt;/a&gt;, one of the fastest-growing productivity AI companies serving millions of monthly active users, uses Milvus to achieve sub-20-50ms retrieval latency across billions of records and 5× speedup in agentic search. Their CTO says, "Milvus serves as the central repository and powers our information retrieval among billions of records."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/customers/global-fintech-leader?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;strong&gt;A global fintech leader&lt;/strong&gt;&lt;/a&gt;, one of the world's largest digital payment platforms processing tens of billions of transactions across 200+ countries and 25+ currencies, chose Milvus for 5-10× faster batch ingestion than competitors, completing jobs in under 1 hour that took others 8+ hours.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/customers/filevine?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;strong&gt;Filevine&lt;/strong&gt;&lt;/a&gt;, the leading legal work platform trusted by thousands of law firms across the United States, manages 3 billion vectors across millions of legal documents, saving attorneys 60-80% of time in document analysis and achieving "true consciousness of data" for legal case management.&lt;/p&gt;

&lt;p&gt;We're also supporting &lt;strong&gt;NVIDIA, OpenAI, Microsoft, Salesforce, Walmart,&lt;/strong&gt; and many others in almost every industry. Over 10,000 organizations have made Milvus or Zilliz Cloud their vector database of choice.&lt;/p&gt;

&lt;p&gt;These aren't just technical success stories—they're examples of how vector databases are quietly becoming critical infrastructure that powers the AI applications people use every day.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why We Built Zilliz Cloud: Enterprise-Grade Vector Database as a Service
&lt;/h2&gt;

&lt;p&gt;Milvus is open-source and free to use. But running Milvus well at enterprise scale requires deep expertise and significant resources. Index selection, memory management, scaling strategies, security configurations—these aren't trivial decisions. Many teams want the power of Milvus without the operational complexity, along with enterprise support and SLA guarantees.&lt;/p&gt;

&lt;p&gt;That's why we built &lt;a href="https://zilliz.com/cloud?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Zilliz Cloud&lt;/a&gt;—a fully managed version of Milvus deployed across 25 global regions and 5 major clouds, including AWS, GCP, and Azure, designed specifically for enterprise-scale AI workloads that demand performance, security, and reliability.&lt;/p&gt;

&lt;p&gt;Here's what makes Zilliz Cloud different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Massive Scale with High Performance:&lt;/strong&gt; Our proprietary AI-powered AutoIndex engine delivers 3-5× faster query speeds than open-source Milvus, with zero index tuning required. The cloud-native architecture supports billions of vectors and tens of thousands of concurrent queries while maintaining sub-second response times.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://zilliz.com/trust-center?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;strong&gt;Built-in Security &amp;amp; Compliance&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;:&lt;/strong&gt; Encryption at rest and in transit, fine-grained RBAC, comprehensive audit logging, SAML/OAuth2.0 integration, and &lt;a href="https://zilliz.com/bring-your-own-cloud?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;BYOC&lt;/a&gt; (bring your own cloud) deployments. We're compliant with GDPR, HIPAA, and other global standards that enterprises actually need. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Optimized for Cost-Efficiency:&lt;/strong&gt; Tiered hot/cold data storage, elastic scaling that responds to real workloads, and pay-as-you-go pricing can reduce total cost of ownership by 50% or more compared to self-managed deployments.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Truly Cloud-Agnostic:&lt;/strong&gt; Deploy on AWS, Azure, GCP, Alibaba Cloud, or Tencent Cloud without vendor lock-in. We ensure global consistency and scalability regardless of where you run.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These capabilities might not sound flashy, but they solve real, daily problems that enterprise teams face when building AI applications at scale. And most importantly: it's still Milvus under the hood, so there's no proprietary lock-in or compatibility issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next: Vector Data Lake
&lt;/h2&gt;

&lt;p&gt;We coined the term "&lt;a href="https://zilliz.com/learn/what-is-vector-database?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;vector database&lt;/a&gt;" and were the first to build one, but we're not stopping there. We're now building the next evolution: &lt;strong&gt;Vector Data Lake.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here's the problem we're solving: not every vector search needs millisecond latency.&lt;/strong&gt; Many enterprises have massive datasets that are queried occasionally, including historical document analysis, batch similarity computations, and long-term trend analysis. For these use cases, a traditional real-time vector database is both overkill and expensive. &lt;/p&gt;

&lt;p&gt;Vector Data Lake uses a storage-compute separated architecture specifically optimized for massive-scale, infrequently accessed vectors while keeping costs dramatically lower than real-time systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core capabilities include:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unified Data Stack:&lt;/strong&gt; Seamlessly connects online and offline data layers with consistent formats and efficient storage, so you can move data between hot and cold tiers without reformatting or complex migrations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compatible Compute Ecosystem:&lt;/strong&gt; Works natively with frameworks like Spark and Ray, supporting everything from vector search to traditional ETL and analytics. This means your existing data teams can work with vector data using tools they already know.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost-Optimized Architecture:&lt;/strong&gt; Hot data stays on SSD or NVMe for fast access; cold data automatically moves to object storage like S3. Smart indexing and storage strategies keep I/O fast when you need it while making storage costs predictable and affordable.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't about replacing vector databases—it's about giving enterprises the right tool for each workload. Real-time search for user-facing applications, cost-effective vector data lakes for analytics and historical processing.&lt;/p&gt;

&lt;p&gt;We still believe in the logic behind Moore's Law and Jevons Paradox: as the unit cost of computing drops, adoption scales. The same applies to vector infrastructure.&lt;/p&gt;

&lt;p&gt;By improving indexes, storage structures, caching, and deployment models—day in, day out—we hope to make AI infrastructure more accessible and affordable for everyone, and to help bring unstructured data into the AI-native future.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Big Thanks to You All!
&lt;/h2&gt;

&lt;p&gt;Those 35K+ stars represent something we're genuinely proud of: a community of developers who find Milvus useful enough to recommend and contribute to.&lt;/p&gt;

&lt;p&gt;But we're not done. Milvus has bugs to fix, performance improvements to make, and features our community has been asking for. Our roadmap is public, and we genuinely want your input on what to prioritize.&lt;/p&gt;

&lt;p&gt;The number itself isn't what matters—it's the trust those stars represent. Trust that we'll keep building in the open, keep listening to feedback, and keep making Milvus better.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;To our contributors:&lt;/strong&gt; your PRs, bug reports, and documentation improvements make Milvus better every day. Thank you so much. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;To our users:&lt;/strong&gt; thank you for trusting us with your production workloads and for the feedback that keeps us honest.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;To our community:&lt;/strong&gt; thank you for answering questions, organizing events, and helping newcomers get started.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're new to vector databases, we'd love to help you get started. If you're already using Milvus or Zilliz Cloud, we'd love to &lt;a href="https://zilliz.com/share-your-story?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;hear about your experience&lt;/a&gt;. And if you're just curious about what we're building, our community channels are always open.&lt;/p&gt;

&lt;p&gt;Let's keep building the infrastructure that makes AI applications possible—together.&lt;/p&gt;




&lt;p&gt;Find us here: &lt;a href="https://github.com/milvus-io/milvus?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Milvus on GitHub&lt;/a&gt; | &lt;a href="https://zilliz.com/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Zilliz Cloud&lt;/a&gt; | &lt;a href="https://discuss.milvus.io/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; | &lt;a href="https://www.linkedin.com/company/the-milvus-project/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://x.com/zilliz_universe?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;X&lt;/a&gt; | &lt;a href="https://www.youtube.com/@MilvusVectorDatabase/featured?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;YouTube&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://meetings.hubspot.com/chloe-williams1/milvus-office-hour?__hstc=175614333.dc4bcf53f6c7d650ea8978dcdb9e7009.1727350436713.1751017913702.1751029841530.667&amp;amp;__hssc=175614333.3.1751029841530&amp;amp;__hsfp=3554976067" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftt6b1za8ozv1byrtl2ds.png" width="800" height="150"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>product</category>
    </item>
    <item>
      <title>Top 5 Open Source Vector Search Engines: A Comprehensive Comparison Guide for 2025</title>
      <dc:creator>Chloe Williams</dc:creator>
      <pubDate>Sun, 01 Jun 2025 15:09:10 +0000</pubDate>
      <link>https://dev.to/zilliz/top-5-open-source-vector-search-engines-a-comprehensive-comparison-guide-for-2025-26p6</link>
      <guid>https://dev.to/zilliz/top-5-open-source-vector-search-engines-a-comprehensive-comparison-guide-for-2025-26p6</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/learn/vector-similarity-search?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;strong&gt;Vector search&lt;/strong&gt;&lt;/a&gt;, also known as vector similarity search, has quickly evolved from an experimental technology to a must-have component in many AI applications. As developers and technical leaders, we're increasingly looking for ways to handle similarity-based queries that traditional databases simply weren't designed to handle efficiently.&lt;/p&gt;

&lt;p&gt;Whether you're building a product recommendation system or implementing semantic search, the underlying challenge is the same: how do you efficiently find the "nearest neighbors" to a query vector in a potentially massive dataset? That's where vector search engines come in.&lt;/p&gt;

&lt;p&gt;The good news is that the open source community has stepped up with multiple high-quality options. The challenging part? Figuring out which one is right for your specific use case, technical requirements, and team expertise. &lt;/p&gt;

&lt;p&gt;In this guide, we'll walk through the most popular open-source vector search engines available today, compare their strengths and limitations, and provide practical insights to help you make an informed decision. We'll cover everything from the technical foundations to specific implementation considerations, with a focus on real-world applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Vector Search: Core Concepts
&lt;/h2&gt;

&lt;p&gt;Before diving into specific engines, let's establish some shared understanding of what vector search actually involves.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Are Vector Embeddings?
&lt;/h3&gt;

&lt;p&gt;At its core, vector search relies on embedding data into &lt;a href="https://zilliz.com/glossary/vector-embeddings?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;vectors&lt;/a&gt;—essentially converting information (text, images, audio, or any other data type) into lists of floating-point numbers that capture semantic meaning. These vectors typically range from dozens to thousands of dimensions.&lt;/p&gt;

&lt;p&gt;For example, &lt;a href="https://zilliz.com/ai-models?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;a text embedding model&lt;/a&gt; might encode the sentence "The weather is nice today" into a 384-dimensional vector where semantically similar sentences like "It's a beautiful day" would be positioned nearby in this high-dimensional space.&lt;/p&gt;
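&lt;p&gt;To make the idea concrete, here is a minimal sketch of similarity between embeddings, using tiny hand-picked 4-dimensional vectors (hypothetical values; a real embedding model produces hundreds of dimensions):&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" with hand-picked values; a real text
# embedding model would produce hundreds of dimensions.
nice_weather = [0.8, 0.6, 0.1, 0.0]   # "The weather is nice today"
beautiful_day = [0.7, 0.7, 0.2, 0.1]  # "It's a beautiful day"
stock_report = [0.0, 0.1, 0.9, 0.8]   # an unrelated sentence

print(round(cosine_similarity(nice_weather, beautiful_day), 2))  # 0.98
print(round(cosine_similarity(nice_weather, stock_report), 2))   # 0.12
```

&lt;p&gt;Semantically close sentences score near 1, while unrelated ones score much lower; this is the distance computation a vector search engine runs at scale.&lt;/p&gt;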

&lt;h3&gt;
  
  
  Vector Search vs. Traditional Search
&lt;/h3&gt;

&lt;p&gt;Traditional search engines typically use inverted indices and exact keyword matching. Vector search, in contrast, measures &lt;a href="https://zilliz.com/blog/similarity-metrics-for-vector-search?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;the distance between vectors&lt;/a&gt; to find similar items, regardless of exact keyword overlap.&lt;/p&gt;

&lt;p&gt;Consider these approaches:&lt;/p&gt;

&lt;p&gt;Traditional keyword search matches "red leather jacket" only with documents containing exactly those words. Vector search, however, can match "red leather jacket" with conceptually similar items, even ones described as "scarlet biker coat," because it measures semantic similarity rather than requiring exact term matches.&lt;/p&gt;
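&lt;p&gt;The contrast can be sketched in a few lines, using made-up embedding values for illustration (real scores would come from an embedding model):&lt;/p&gt;

```python
def keyword_overlap(a, b):
    # Exact-term matching: count shared words.
    return len(set(a.split()).intersection(set(b.split())))

def dot(a, b):
    # For roughly unit-length vectors, the dot product approximates
    # cosine similarity.
    return sum(x * y for x, y in zip(a, b))

query = "red leather jacket"
item = "scarlet biker coat"

print(keyword_overlap(query, item))  # 0: no shared words at all

# Hypothetical, roughly unit-length embeddings that place both
# phrases close together in "outerwear" space.
query_vec = [0.70, 0.70, 0.10]
item_vec = [0.65, 0.74, 0.15]
print(round(dot(query_vec, item_vec), 2))  # 0.99: semantically close
```

&lt;p&gt;A keyword engine would miss this item entirely; a vector engine ranks it near the top.&lt;/p&gt;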

&lt;h3&gt;
  
  
  Key Performance Metrics
&lt;/h3&gt;

&lt;p&gt;When evaluating vector search engines, several metrics matter:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query speed&lt;/strong&gt; is measured in milliseconds or queries per second (QPS), indicating how quickly results are returned. &lt;strong&gt;Recall&lt;/strong&gt; represents the percentage of relevant results actually retrieved compared to what should have been retrieved. &lt;strong&gt;Index build time&lt;/strong&gt; tells you how long it takes to create the search index, while &lt;strong&gt;memory usage&lt;/strong&gt; reflects RAM requirements for both indexing and querying. &lt;strong&gt;Scalability&lt;/strong&gt; refers to a system's ability to handle increasing data volumes and query loads without experiencing performance degradation.&lt;/p&gt;
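&lt;p&gt;Recall is straightforward to compute once you have ground-truth neighbors. A minimal sketch, with hypothetical result IDs:&lt;/p&gt;

```python
def recall_at_k(retrieved_ids, relevant_ids):
    # Fraction of the true nearest neighbors the engine actually
    # returned.
    retrieved = set(retrieved_ids)
    relevant = set(relevant_ids)
    return len(retrieved.intersection(relevant)) / len(relevant)

# Hypothetical run: ground truth says items 1-5 are the true top-5
# neighbors, but the approximate index returned 1, 2, 3, 7, 9.
print(recall_at_k([1, 2, 3, 7, 9], [1, 2, 3, 4, 5]))  # 0.6
```

&lt;p&gt;Approximate indexes deliberately trade some recall for query speed; benchmarks report both numbers together for exactly this reason.&lt;/p&gt;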

&lt;p&gt;Understanding these fundamentals will help frame our exploration of the specific engines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Popular Vector Search Use Cases
&lt;/h2&gt;

&lt;p&gt;Vector search isn't just a theoretical concept—it's powering some of the most innovative applications being built today. Here are the key use cases where vector search engines are making a significant impact:&lt;/p&gt;

&lt;h3&gt;
  
  
  Retrieval Augmented Generation (RAG)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/learn/Retrieval-Augmented-Generation?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;RAG&lt;/a&gt; has become one of the most common applications of vector search, combining the power of large language models with knowledge retrieval. In RAG implementations, documents are converted to vector embeddings and stored in a vector database like &lt;a href="https://milvus.io/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Milvus&lt;/a&gt;, &lt;a href="https://zilliz.com/learn/faiss?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Faiss&lt;/a&gt;, and &lt;a href="https://zilliz.com/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Zilliz Cloud&lt;/a&gt;. When a query arrives, the system retrieves the most relevant documents based on vector similarity. These retrieved documents provide context to an LLM, allowing for more accurate, up-to-date responses.&lt;/p&gt;

&lt;p&gt;This approach helps address the hallucination problem in LLMs while enabling them to access domain-specific information that wasn't included in their training data.&lt;/p&gt;
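&lt;p&gt;The retrieval step of RAG can be sketched as a brute-force nearest-neighbor search over a toy corpus. The documents, embeddings, and query vector below are all made up for illustration; a production system would use a vector database and a real embedding model:&lt;/p&gt;

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical corpus of (text, embedding) pairs.
docs = [
    ("Milvus supports HNSW and IVF indexes.", [0.9, 0.1, 0.0]),
    ("Our refund policy lasts 30 days.", [0.0, 0.2, 0.9]),
    ("Vector search finds nearest neighbors.", [0.8, 0.3, 0.1]),
]

def retrieve(query_embedding, k=2):
    # Rank every document by similarity to the query and keep top-k.
    ranked = sorted(docs, key=lambda d: cosine(d[1], query_embedding), reverse=True)
    return [text for text, _ in ranked[:k]]

# Hypothetical embedding for the question "how do I index vectors?"
context = retrieve([0.85, 0.2, 0.05])
prompt = "Answer using this context:\n" + "\n".join(context)
print(prompt)
```

&lt;p&gt;The retrieved context would then be prepended to the user's question and sent to the LLM, grounding its answer in your own documents.&lt;/p&gt;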

&lt;h3&gt;
  
  
  AI Agents and Knowledge Retrieval
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/blog/function-calling-vs-mcp-vs-a2a-developers-guide-to-ai-agent-protocols?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;AI agents&lt;/a&gt; often need to make decisions based on relevant information scattered across various sources. Vector search enables these agents to quickly retrieve context-relevant information from large knowledge bases, identify similar past interactions or decisions, and construct memory systems that understand semantic similarity.&lt;/p&gt;

&lt;p&gt;For developers building AI agents, the choice of vector database can significantly impact both performance and capabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  Recommendation Systems
&lt;/h3&gt;

&lt;p&gt;E-commerce platforms, streaming services, and content sites rely heavily on recommendation engines to increase engagement. Vector search powers these systems by representing user preferences and item features as vectors, finding items similar to those a user has liked previously, and identifying users with similar taste profiles.&lt;/p&gt;

&lt;p&gt;The right vector search engine can make the difference between recommendations that feel random versus those that seem to understand user preferences intuitively.&lt;/p&gt;

&lt;h3&gt;
  
  
  Semantic Search Applications
&lt;/h3&gt;

&lt;p&gt;Text search that understands meaning rather than just keywords is transforming how we interact with information. Vector search enables finding conceptually similar documents even when terminology differs, understanding user intent behind queries, and supporting multilingual search where concepts align across languages.&lt;/p&gt;

&lt;h3&gt;
  
  
  Image and Multimedia Similarity Search
&lt;/h3&gt;

&lt;p&gt;Beyond text, vector search excels at finding similar images, audio, or videos. This capability powers applications like identifying visually similar products in e-commerce, finding music with similar acoustic properties, and detecting near-duplicate media assets.&lt;/p&gt;

&lt;p&gt;These applications require vector engines that can handle diverse embedding types efficiently.&lt;/p&gt;

&lt;p&gt;Now that we've covered the essentials of vector search and its common use cases, let's explore the top vector databases, with a focus on open-source options.&lt;/p&gt;

&lt;h2&gt;
  
  
  Milvus
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://milvus.io/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Milvus&lt;/a&gt; is the most popular open-source vector database with more than &lt;a href="https://github.com/milvus-io/milvus?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;35,000 stars&lt;/a&gt; on GitHub. It first appeared in 2019 and has since gained significant traction in the developer community. Created specifically to handle large-scale similarity searches, Milvus was designed from the ground up to address the unique challenges of vector data management.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture and Technical Capabilities
&lt;/h3&gt;

&lt;p&gt;Milvus uses a cloud-native architecture with separated storage and compute layers. Stateless query nodes handle search requests, storage nodes manage data persistence, and coordinator nodes handle cluster management. This separation allows Milvus to scale horizontally as data volumes and query loads increase—a critical consideration for production deployments.&lt;/p&gt;

&lt;p&gt;The platform supports multiple index types, including &lt;a href="https://milvus.io/blog/understand-hierarchical-navigable-small-worlds-hnsw-for-vector-search.md?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;HNSW&lt;/a&gt; (Hierarchical Navigable Small World), &lt;a href="https://zilliz.com/learn/choosing-right-vector-index-for-your-project?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;IVF&lt;/a&gt; (Inverted File), &lt;a href="https://milvus.io/blog/diskann-explained.md?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;DiskANN&lt;/a&gt;, and others, providing developers with flexibility to optimize for different workloads. Milvus also offers &lt;a href="https://milvus.io/blog/get-started-with-hybrid-semantic-full-text-search-with-milvus-2-5.md?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;hybrid search capabilities&lt;/a&gt;, combining vector similarity with scalar filtering and full-text search, which proves valuable when search needs to consider both semantic similarity and keyword matching, as well as metadata constraints.&lt;/p&gt;

&lt;p&gt;Milvus supports multiple distance metrics, including Euclidean, Cosine, and Inner Product, making it adaptable to various embedding types and similarity definitions. Its storage architecture includes time travel capabilities, allowing point-in-time queries and backups.&lt;/p&gt;

&lt;p&gt;Milvus can be used to build various types of AI applications, from demos running locally in Jupyter Notebooks to massive-scale Kubernetes clusters handling tens of billions of vectors. Currently, there are three &lt;a href="https://milvus.io/docs/install-overview.md?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Milvus deployment options&lt;/a&gt;: Milvus Lite, Milvus Standalone, and Milvus Distributed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance Characteristics
&lt;/h3&gt;

&lt;p&gt;In benchmarks, Milvus demonstrates query latency typically in single-digit milliseconds for million-scale datasets, making it suitable for real-time applications. The platform supports ANNS (Approximate Nearest Neighbor Search) algorithms that trade perfect recall for substantial speed improvements—an essential trade-off for practical applications.&lt;/p&gt;

&lt;p&gt;Memory usage in Milvus is managed through disk-based storage with memory caching, allowing it to handle datasets larger than available RAM. This approach makes Milvus more cost-effective for large vector collections compared to purely in-memory solutions.&lt;/p&gt;

&lt;p&gt;For most production workloads, Milvus strikes a balance between recall accuracy and query speed, with tunable parameters that enable adjustments tailored to specific requirements. However, this flexibility comes with added complexity in configuration and optimization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Migration Simplicity
&lt;/h3&gt;

&lt;p&gt;A notable advantage of Milvus is the straightforward migration path from other vector databases. The open-source &lt;a href="https://github.com/zilliztech/vts?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Vector Transport Service (VTS)&lt;/a&gt; tool simplifies moving data from other vector search engines to Milvus, supporting automated schema mapping, incremental data migration, and data validation during the transfer process. This makes Milvus particularly attractive for teams that have outgrown their current solution or want to standardize on a single platform.&lt;/p&gt;

&lt;p&gt;That said, migration always involves some effort and risk, so thorough testing remains necessary, despite the use of these tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Zilliz Cloud: Fully Managed Milvus
&lt;/h3&gt;

&lt;p&gt;While open-source Milvus is powerful on its own, deploying, operating, and maintaining it for production-level applications requires dedicated hardware and engineering resources. Zilliz, the engineering team behind Milvus, has created a fully managed Milvus service on &lt;a href="https://zilliz.com/cloud?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Zilliz Cloud&lt;/a&gt;, eliminating that operational overhead so customers can invest in their products and business rather than infrastructure management.&lt;/p&gt;

&lt;p&gt;Zilliz Cloud provides additional features, simplified deployment and operations, automatic scaling and resource management, advanced security, and SLA-backed reliability. The managed service also includes continuous updates and optimizations, reducing the need for in-house operational expertise.&lt;/p&gt;

&lt;p&gt;For teams focused on building applications rather than managing infrastructure, Zilliz Cloud provides a way to leverage Milvus without operational overhead. &lt;/p&gt;

&lt;h3&gt;
  
  
  Community and Ecosystem
&lt;/h3&gt;

&lt;p&gt;The Milvus ecosystem has grown substantially, with an active GitHub repository that features regular releases. The project provides client SDKs for Python, Java, Go, and other languages, as well as integration with popular AI models and ML frameworks, including LangChain and LlamaIndex. Additionally, it features a growing community forum and comprehensive documentation.&lt;/p&gt;

&lt;p&gt;This ecosystem maturity reduces implementation risks and provides multiple resources for troubleshooting. However, like any open-source project, community support can sometimes be unpredictable compared to paid support options.&lt;/p&gt;

&lt;h2&gt;
  
  
  Faiss
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/learn/faiss?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Faiss&lt;/a&gt;, short for Facebook AI Similarity Search, is a popular vector search library that was developed and open-sourced by Facebook AI Research (now Meta) in 2017. Unlike some other options in this comparison, Faiss was created by researchers for researchers, initially focusing on academic and experimental workloads before being adopted for production systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Technical Overview
&lt;/h3&gt;

&lt;p&gt;Faiss takes a different approach from some other vector search solutions. It's implemented in C++ with Python bindings for performance and designed as a library rather than a standalone service. One distinguishing feature is its optimization for both CPU and GPU execution, with certain workloads seeing dramatic speedups on GPU hardware.&lt;/p&gt;

&lt;p&gt;The library offers multiple index types tailored for various scenarios. IndexFlatL2 offers exact search with L2 distance for perfect accuracy. IndexIVFFlat implements an inverted file with flat storage for improved query speed. IndexHNSW leverages Hierarchical Navigable Small World graphs for efficient approximate search. IndexPQ utilizes product quantization for memory efficiency, allowing even modest hardware to search billions of vectors.&lt;/p&gt;
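&lt;p&gt;To illustrate the idea behind IndexPQ, here is a minimal product-quantization sketch. The codebooks below are tiny hand-picked stand-ins; Faiss learns its codebooks with k-means and typically uses many more centroids per sub-space:&lt;/p&gt;

```python
def squared_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Two sub-spaces, each with a codebook of three 2-dimensional
# centroids (hand-picked for illustration; Faiss trains these with
# k-means and commonly uses 256 centroids per sub-space).
codebooks = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],   # covers dimensions 0-1
    [[0.5, 0.5], [1.0, 1.0], [0.0, 0.0]],   # covers dimensions 2-3
]

def encode(vector):
    # Replace each sub-vector with the index of its nearest centroid.
    codes = []
    for i, book in enumerate(codebooks):
        sub = vector[2 * i : 2 * i + 2]
        codes.append(min(range(len(book)), key=lambda j: squared_dist(book[j], sub)))
    return codes

def decode(codes):
    # Reconstruct an approximate vector from the stored codes.
    out = []
    for i, code in enumerate(codes):
        out.extend(codebooks[i][code])
    return out

vec = [0.9, 0.1, 0.6, 0.4]
codes = encode(vec)
print(codes)          # [1, 0]: two small integers instead of four floats
print(decode(codes))  # [1.0, 0.0, 0.5, 0.5], an approximation of vec
```

&lt;p&gt;Each vector is stored as a few small integers rather than full floats, which is where the order-of-magnitude memory savings come from.&lt;/p&gt;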

&lt;h3&gt;
  
  
  Strengths and Limitations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;One of Faiss's major strengths is raw performance. It's often the fastest option for in-memory vector search when properly configured.&lt;/strong&gt; The library achieves memory efficiency through clever compression techniques, such as &lt;a href="https://zilliz.com/learn/harnessing-product-quantization-for-memory-efficiency-in-vector-databases?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;product quantization&lt;/a&gt;, which can reduce vector storage requirements by an order of magnitude.&lt;/p&gt;

&lt;p&gt;Faiss also stands out with native GPU support for even faster processing, making it ideal for research environments with access to GPU resources. The library offers fine-grained control with detailed parameter tuning options for those who want to optimize their workloads.&lt;/p&gt;

&lt;p&gt;However, Faiss comes with notable limitations. It has no built-in persistence layer, meaning developers must handle saving and loading indexes themselves. Because it's a library rather than a service, it requires more integration work than turnkey solutions, and it's less suited to distributed deployments without additional engineering. As a result, many developers reach for Faiss when experimenting or prototyping rather than as a complete production backend.&lt;/p&gt;

&lt;p&gt;Perhaps most significantly, Faiss has a steeper learning curve than some alternatives. The documentation, while comprehensive, assumes a strong understanding of the underlying algorithms and techniques.&lt;/p&gt;

&lt;h2&gt;
  
  
  Annoy
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/learn/what-is-annoy?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Annoy&lt;/a&gt;, which stands for "Approximate Nearest Neighbors Oh Yeah," was developed by Spotify and open-sourced in 2013, making it one of the older solutions in this comparison. Created specifically to power Spotify's music recommendation system, Annoy takes a distinct approach optimized for read-heavy workloads with relatively static data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approximate Nearest Neighbors Approach
&lt;/h3&gt;

&lt;p&gt;Annoy uses random projection binary search trees as its core algorithm. Each tree splits the vector space differently, creating a forest of trees that collectively provide good approximations of the true nearest neighbors. As more trees are added to the forest, the probability of finding the true nearest neighbors increases, allowing a trade-off between accuracy and resource usage.&lt;/p&gt;

&lt;p&gt;This approach differs significantly from the graph-based methods used by many newer vector search engines.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance Trade-offs
&lt;/h3&gt;

&lt;p&gt;Annoy makes specific trade-offs that distinguish it from more general-purpose solutions. It's read-optimized, delivering very fast performance at query time, but this comes at the cost of write flexibility. Once built, Annoy indexes don't change—new data requires rebuilding the index.&lt;/p&gt;

&lt;p&gt;The system is disk-based, with indexes that can be memory-mapped for efficiency. This allows Annoy to handle datasets larger than available RAM while maintaining good query performance. However, Annoy offers limited functionality beyond core approximate nearest neighbor search, lacking many features found in more comprehensive solutions.&lt;/p&gt;

&lt;p&gt;These design choices make Annoy different from databases designed for frequent updates and complex queries.&lt;/p&gt;

&lt;h3&gt;
  
  
  Integration Options
&lt;/h3&gt;

&lt;p&gt;Annoy offers Python bindings with scikit-learn compatibility, making it accessible to data scientists and ML engineers. Its C++ core provides good performance despite the simplified API. The library supports easy serialization and deserialization of indexes, facilitating offline build processes.&lt;/p&gt;

&lt;p&gt;The API is simple and focused exclusively on nearest neighbor search, making it easy to learn, but it is limited in functionality. Unlike more comprehensive vector databases, Annoy requires additional infrastructure for features like persistence, scaling, and query filtering.&lt;/p&gt;

&lt;h2&gt;
  
  
  Weaviate
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/comparison/milvus-vs-weaviate?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Weaviate&lt;/a&gt; emerged in 2019 as a different approach to vector search. Unlike pure vector databases, Weaviate combines vector search capabilities with a knowledge graph, creating a hybrid system designed to add contextual understanding to similarity queries.&lt;/p&gt;

&lt;p&gt;What sets Weaviate apart is its graph-based data model. In Weaviate, data objects can be connected through semantic relationships, and these connections add context to vector-based queries. This allows queries to blend vector similarity with graph traversal, supporting more sophisticated searches than simple nearest-neighbor matching. For instance, a deployment might store product embeddings and also model relationships between products, categories, and brands. A user query could then return not only similar items but also those connected through shared attributes or behaviors.&lt;/p&gt;
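
&lt;p&gt;To make this concrete, here is a hedged sketch of the kind of GraphQL query Weaviate supports, blending &lt;code&gt;nearVector&lt;/code&gt; similarity with traversal of a reference property; the &lt;code&gt;Product&lt;/code&gt;/&lt;code&gt;Category&lt;/code&gt; schema and vector values are hypothetical.&lt;/p&gt;

```python
# A hypothetical Weaviate GraphQL query: vector similarity plus a hop
# across a reference property to enrich results with category context.
query = """
{
  Get {
    Product(
      nearVector: { vector: [0.12, 0.34, 0.56] }
      limit: 5
    ) {
      name
      inCategory {
        ... on Category { title }
      }
    }
  }
}
"""
print("nearVector" in query)
```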

&lt;p&gt;This hybrid model enables expressive querying, but it also introduces additional complexity in data modeling and indexing. Developers must manage both vector embeddings and graph relationships, which can increase the learning curve and operational overhead.&lt;/p&gt;

&lt;p&gt;Weaviate uses HNSW-based indexing for efficient vector search and supports flexible filtering applied either pre- or post-search. It scales through sharding, allowing it to handle growing datasets and query loads. However, distributed setups can become more complex to configure and operate, particularly at larger scales.&lt;/p&gt;

&lt;p&gt;While Weaviate performs well across a variety of use cases, it's not always the top performer in pure vector search benchmarks. Its additional graph features, while powerful, can lead to slower response times when executing complex queries that combine vector search with multiple relationship traversals. This makes it better suited to applications that benefit from contextual enrichment, rather than those requiring ultra-low latency on high-throughput vector-only workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Qdrant
&lt;/h2&gt;

&lt;p&gt;Qdrant (pronounced "quadrant") is a newer entrant to the vector database space, first appearing in 2021. Qdrant provides both REST and gRPC APIs for interacting with the database, making it accessible from virtually any programming language. Its storage is isolated in collections, similar to tables in traditional databases, providing logical separation of different data types. The architecture offers point-in-time consistency guarantees and ACID-compliant operations for data reliability. This approach makes Qdrant more familiar to developers coming from traditional database backgrounds, reducing the learning curve.&lt;/p&gt;

&lt;p&gt;A key strength of Qdrant is its ability to combine vector search with traditional filtering. The platform offers rich filter expressions that execute efficiently as part of the search process. Its payload-based filtering integrates directly into the search rather than being applied as a post-processing step. It also supports complex boolean conditions, including AND, OR, and NOT operations across multiple fields, and allows boosting results based on specific filter conditions—useful for nuanced ranking in hybrid search.&lt;/p&gt;
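
&lt;p&gt;As a sketch, this is roughly how such a filter appears inside a Qdrant REST search request; the field names and values here are hypothetical.&lt;/p&gt;

```python
# Qdrant-style search body: the filter runs inside the search itself,
# not as a post-processing step. Field names are hypothetical examples.
search_request = {
    "vector": [0.05, 0.61, 0.76, 0.74],
    "limit": 10,
    "filter": {
        "must": [                                     # AND across conditions
            {"key": "category", "match": {"value": "shoes"}},
            {"key": "price", "range": {"lte": 100}},
        ],
        "must_not": [                                 # NOT
            {"key": "discontinued", "match": {"value": True}},
        ],
    },
}
```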

&lt;p&gt;However, this filtering flexibility comes with trade-offs. As filter expressions become more complex or datasets grow, query performance may degrade, particularly when many filters are applied to high-cardinality fields. Additionally, while Qdrant supports distributed deployments, its horizontal scaling features are still evolving compared to more mature systems, and operational tooling around large-scale clustering remains relatively limited. These factors should be considered when evaluating Qdrant for high-scale or highly dynamic workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparison Table: Key Features of Top Vector Search Engines
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Engine&lt;/th&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;th&gt;Filtering&lt;/th&gt;
&lt;th&gt;Managed Option&lt;/th&gt;
&lt;th&gt;Distributed&lt;/th&gt;
&lt;th&gt;Update Frequency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Milvus&lt;/td&gt;
&lt;td&gt;Cloud-native, storage/compute separation&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Zilliz Cloud&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Real-time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Faiss&lt;/td&gt;
&lt;td&gt;Library, C++ with Python bindings&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;Batch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Annoy&lt;/td&gt;
&lt;td&gt;Forest of binary trees&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Offline only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Weaviate&lt;/td&gt;
&lt;td&gt;Knowledge graph + vector DB&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Weaviate Cloud&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Real-time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qdrant&lt;/td&gt;
&lt;td&gt;Rust-based, collections&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Qdrant Cloud&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Real-time&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Other Notable Vector Search Options
&lt;/h2&gt;

&lt;p&gt;Beyond the purpose-built options highlighted above, many traditional databases have started to offer vector search capabilities as an add-on. &lt;/p&gt;

&lt;h3&gt;
  
  
  Elasticsearch with Vector Search
&lt;/h3&gt;

&lt;p&gt;Elasticsearch, already widely adopted for text search, has added vector search capabilities in recent versions. This functionality introduces kNN (k-Nearest Neighbors) search to the Elasticsearch ecosystem, enabling organizations to utilize their existing infrastructure for vector search requirements.&lt;/p&gt;

&lt;p&gt;The integration with existing Elasticsearch features enables teams to combine traditional text search, faceting, and aggregations with vector similarity on a single platform. The familiar API reduces the learning curve for teams already using Elasticsearch.&lt;/p&gt;
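
&lt;p&gt;A hedged sketch of what such a combined request can look like in Elasticsearch 8.x; the index mapping and field names are assumptions for illustration.&lt;/p&gt;

```python
# Elasticsearch kNN search body combining vector similarity with a
# traditional term filter. Field names are hypothetical.
body = {
    "knn": {
        "field": "title_vector",          # a dense_vector field
        "query_vector": [0.1, 0.2, 0.3],
        "k": 10,
        "num_candidates": 100,            # accuracy/latency trade-off
        "filter": {"term": {"status": "published"}},
    },
    "_source": ["title", "status"],
}
```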

&lt;p&gt;This approach works well for organizations already invested in the Elastic ecosystem who need to add vector capabilities without adopting an entirely new database. However, performance may not match purpose-built vector databases for large-scale, vector-only workloads.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vespa
&lt;/h3&gt;

&lt;p&gt;Vespa is Yahoo's open source search engine that combines traditional search, vector search, and sophisticated ranking in a single platform. It offers real-time indexing and searching, with updates immediately available for query, unlike some solutions that require batch processing or index rebuilding.&lt;/p&gt;

&lt;p&gt;The platform provides sophisticated ranking frameworks that can combine multiple signals, including vector similarity, text relevance, and business rules. It scales to large deployments with a distributed architecture and has been battle-tested in production at major internet companies.&lt;/p&gt;

&lt;p&gt;Vespa's comprehensive feature set makes it suitable for complex search applications, though this comes with increased complexity compared to more focused solutions. It requires more resources to deploy and maintain than simpler vector search options.&lt;/p&gt;

&lt;h3&gt;
  
  
  pgvector
&lt;/h3&gt;

&lt;p&gt;pgvector is an extension that adds vector data types and operations to PostgreSQL, allowing vector search within a traditional relational database. It supports multiple index types including IVF and HNSW for efficient similarity search on vector columns.&lt;/p&gt;

&lt;p&gt;The key advantage is the ability to use SQL queries combining vector and relational data, making it easy to add vector search to existing applications without adopting a separate database. This option leverages existing PostgreSQL infrastructure and expertise, potentially reducing operational overhead.&lt;/p&gt;
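
&lt;p&gt;A minimal sketch of what this looks like in SQL, assuming a hypothetical &lt;code&gt;items&lt;/code&gt; table; the statements would be executed through any PostgreSQL driver.&lt;/p&gt;

```python
# pgvector sketch: DDL plus a nearest-neighbor query in plain SQL.
# `vector(3)` is a 3-dimensional vector column; `<->` is L2 distance.
setup_sql = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
"""

# Vector and relational predicates mix freely in one statement.
query_sql = """
SELECT id FROM items
WHERE id < 1000
ORDER BY embedding <-> '[0.1, 0.2, 0.3]'
LIMIT 5;
"""
```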

&lt;p&gt;The main limitation is that performance may not match dedicated vector databases for very large vector collections or high query volumes. It represents a pragmatic compromise rather than an optimized solution for vector-only workloads. More fundamentally, &lt;a href="https://milvus.io/blog/why-ai-databases-do-not-need-sql.md?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;is SQL even necessary for AI workloads in the future&lt;/a&gt;?&lt;/p&gt;

&lt;h3&gt;
  
  
  Emerging Options
&lt;/h3&gt;

&lt;p&gt;The vector database space continues to evolve with newer projects entering the field. Chroma focuses specifically on embeddings for LLM applications, with simplified APIs for RAG implementations. Marqo emphasizes simplicity and cloud-native operations, aiming to reduce the operational burden of vector search. LanceDB offers embedded vector search capabilities, targeting edge devices and applications that need to operate offline.&lt;/p&gt;

&lt;p&gt;These emerging options show the continued innovation in the space, though they generally lack the production history and ecosystem maturity of more established solutions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing the Right Vector Search Engine 
&lt;/h2&gt;

&lt;p&gt;With so many options available, selecting the right vector search engine requires careful consideration of your specific needs and constraints.&lt;/p&gt;

&lt;h3&gt;
  
  
  Decision Framework
&lt;/h3&gt;

&lt;p&gt;When evaluating vector search engines, start by considering your scale requirements—how many vectors will you store and query, both now and in the future? Different engines have different scaling characteristics and sweet spots.&lt;/p&gt;

&lt;p&gt;Next, assess your query patterns. Will you perform pure vector search, or do you need to combine vector similarity with filtering, relationship traversal, or other operations? Some engines excel at pure vector search but struggle with complex hybrid queries.&lt;/p&gt;

&lt;p&gt;Update frequency is another important consideration. If your data changes frequently or requires real-time updates, solutions like Annoy that require rebuilding indexes will be problematic. Conversely, if your data is relatively static, simpler architectures may offer performance advantages.&lt;/p&gt;

&lt;p&gt;Integration needs matter as well. Do you need a standalone service, a library to embed in your application, or an extension to an existing database? Your current infrastructure and team expertise may make certain options more practical than others.&lt;/p&gt;

&lt;p&gt;Finally, consider your team's expertise with specific technologies. The best technical solution on paper may not be the best choice if your team lacks the skills to implement and maintain it effectively.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scaling Considerations
&lt;/h3&gt;

&lt;p&gt;Different engines approach scaling in different ways, and understanding these differences is crucial for achieving long-term success. Milvus offers horizontal scaling with separated storage and compute, allowing independent scaling of different components as needs change. Faiss excels at vertical scaling, particularly with GPU acceleration, but requires more custom work for distributed deployments.&lt;/p&gt;

&lt;p&gt;Your anticipated growth trajectory should influence your choice, with some solutions better suited to gradual scaling while others may require significant re-architecture as you grow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Total Cost of Ownership
&lt;/h3&gt;

&lt;p&gt;When selecting a vector search engine, consider all aspects of total cost of ownership. Infrastructure costs include RAM and CPU requirements, which vary significantly between solutions. Some engines require substantial memory for optimal performance, while others can operate effectively with more modest resources.&lt;/p&gt;

&lt;p&gt;Operational complexity affects ongoing maintenance costs. Deployment, monitoring, and maintenance effort varies widely, with some solutions requiring specialized expertise while others integrate more easily with standard DevOps practices.&lt;/p&gt;

&lt;p&gt;Development time is another important factor. The learning curve and integration complexity of different engines can significantly impact project timelines and success rates. Solutions with better documentation, more examples, and more intuitive APIs typically result in faster implementation.&lt;/p&gt;

&lt;p&gt;Support options range from community forums to commercial support agreements. Consider your organization's requirements for response times and support guarantees when evaluating options.&lt;/p&gt;

&lt;p&gt;Finally, consider potential migration costs. If your needs change, how difficult would it be to switch to a different solution? Engines with standard APIs and export capabilities provide more future flexibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  Future-Proofing
&lt;/h3&gt;

&lt;p&gt;Vector search technology is evolving rapidly; therefore, selecting a solution that can adapt to your changing needs is crucial. Examine community activity and release cadence to assess ongoing development. Projects with regular updates and active discussion forums are more likely to remain relevant and up-to-date.&lt;/p&gt;

&lt;p&gt;Corporate backing and sustainability matter for long-term viability. Projects supported by established companies or foundations generally have more stable development trajectories.&lt;/p&gt;

&lt;p&gt;Aligning the feature roadmap with your anticipated needs helps ensure the solution grows in directions that benefit your use cases. Finally, flexibility to adapt as requirements change provides insurance against unexpected shifts in project requirements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Benchmarking with Real-world Workloads
&lt;/h3&gt;

&lt;p&gt;Benchmark results are often the first thing teams look at when comparing vector search engines, but many published benchmarks fail to reflect real-world usage. Synthetic tests tend to focus on idealized conditions—fixed datasets, uniform queries, and read-heavy workloads—while ignoring the complexities of real applications. In production, your system may need to support frequent updates, concurrent queries, multi-modal filtering, and hybrid search across structured and unstructured data. These challenges can drastically affect actual performance, scalability, and reliability. &lt;/p&gt;

&lt;p&gt;To make an informed choice, prioritize benchmarks that replicate your expected workload patterns as closely as possible. Testing with real datasets, realistic query volumes, and operational constraints will provide a more accurate picture of how a vector search engine performs in your environment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zilliztech/VectorDBBench?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;VDBBench&lt;/a&gt; is an open-source benchmark designed from the ground up to simulate production reality. Unlike synthetic tests that cherry-pick scenarios, VDBBench pushes databases through continuous ingestion, rigorous filtering conditions, and diverse scenarios, just like your actual production workloads. &lt;/p&gt;

&lt;p&gt;VDBBench GitHub: &lt;a href="https://github.com/zilliztech/VectorDBBench" rel="noopener noreferrer"&gt;https://github.com/zilliztech/VectorDBBench&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion and Next Steps
&lt;/h2&gt;

&lt;p&gt;Vector search has moved beyond niche applications to become a fundamental building block for many modern applications. The open source ecosystem offers multiple strong options, each with distinct advantages and trade-offs.&lt;/p&gt;

&lt;p&gt;For most teams just starting with vector search, Milvus provides a good balance of features, performance, and operational simplicity. Its comprehensive functionality and growing ecosystem make it suitable for a wide range of use cases, while fully managed options like Zilliz Cloud reduce operational overhead.&lt;/p&gt;

&lt;p&gt;For specific needs, alternatives like Faiss (performance-focused), Weaviate (knowledge graph integration), Qdrant (filtering capabilities), or Annoy (read-optimized workloads) may be better fits.&lt;/p&gt;

&lt;p&gt;Whatever you choose, start small, benchmark thoroughly against your specific workload, and validate assumptions before committing to a production deployment. Vector search technology continues to evolve rapidly, so staying engaged with the community around your chosen solution is essential for long-term success.&lt;/p&gt;

&lt;p&gt;Ready to get started? Most of these projects offer excellent quickstart guides, Docker containers for easy experimentation, and active communities eager to help newcomers. The best way to evaluate is to build a small proof of concept with your actual data and query patterns.&lt;/p&gt;

&lt;p&gt;Happy searching!&lt;/p&gt;

</description>
      <category>community</category>
    </item>
    <item>
      <title>Popular Video AI Models Every Developer Should Know</title>
      <dc:creator>Chloe Williams</dc:creator>
      <pubDate>Fri, 09 May 2025 21:19:26 +0000</pubDate>
      <link>https://dev.to/zilliz/popular-video-ai-models-every-developer-should-know-33ha</link>
      <guid>https://dev.to/zilliz/popular-video-ai-models-every-developer-should-know-33ha</guid>
      <description>&lt;p&gt;Ever wondered how Netflix recommends the perfect movie trailer, how security cameras detect unusual activity, or how sports broadcasters create instant highlights? The secret lies in Video AI models. Video AI models enable automated analysis of complex visual data in real-time, enhancing efficiency, accuracy, and decision-making across sports analytics, surveillance, and content creation. With the explosion of video content (~500 hours of video are uploaded to YouTube every minute), video-centric AI applications are more critical than ever. &lt;/p&gt;

&lt;p&gt;These applications are possible only because of the underlying AI models that allow machines to analyze, interpret, and even predict events from videos. Video AI models enable advanced tasks like object detection, action recognition, and semantic segmentation, where traditional computer vision models fall short. In this blog, we’ll dive into some of the most popular Video AI models every developer should know, and how they’re shaping the future of video intelligence.&lt;/p&gt;

&lt;h2&gt;
  
  
  YOLO (You Only Look Once): Real-Time Object Detection
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/1506.02640?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;YOLO (You Only Look Once)&lt;/a&gt; is a real-time object detection model that processes an entire image in a single pass through its neural network. Unlike traditional methods like R-CNN or Fast R-CNN that rely on region proposals and multiple stages, YOLO treats detection as a regression problem, predicting bounding boxes and class probabilities directly. This makes YOLO exceptionally fast and ideal for real-time applications with great accuracy. &lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features and Benefits
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Real-time speed&lt;/strong&gt; - Delivers predictions at high frame rates suitable for real-time object detection. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Accuracy&lt;/strong&gt; - Generates fewer false positives and effectively detects multiple objects in a single frame due to consideration of global context during training. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalable and lightweight&lt;/strong&gt; - Optimized versions of YOLO can be easily deployed on edge devices with limited computational power. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multi-scale prediction&lt;/strong&gt; - Later versions of YOLO perform predictions at multiple scales, improving accuracy for small objects. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  YOLO v8: Latest Advances
&lt;/h3&gt;

&lt;p&gt;YOLO v8 offers an improved architecture compared to previous versions, with gains in both speed and accuracy, and can perform object detection, classification, and segmentation tasks. Some key modifications are as follows. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj98rt5mut2lzqbkqq0m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj98rt5mut2lzqbkqq0m.png" width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Basic architecture of YOLO v8 (&lt;a href="https://www.researchgate.net/figure/The-improved-YOLOv8-network-architecture-includes-an-additional-module-for-the-head_fig2_372207753?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Source&lt;/a&gt;)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Anchor-free detection&lt;/strong&gt; - The model directly predicts the center of an object instead of offsets, which helps improve the learning speed for custom datasets. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mosaic data augmentation&lt;/strong&gt; - For better generalization, four images are mixed during training to provide variable locations, occlusions, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;C2f module&lt;/strong&gt; - The model’s backbone consists of a C2f module instead of C3, which helps speed up the training process with improved gradient flow. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Decoupled head&lt;/strong&gt; - Classification and regression are performed by separate heads, improving model performance. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Common Use Cases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Surveillance:&lt;/strong&gt; YOLO powers real-time crowd detection, intruder alerts, and suspicious activity monitoring in security cameras, enhancing public safety.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Retail:&lt;/strong&gt; Tasks like object counting, customer behavior analysis, and automated inventory tracking help retailers optimize stock management and improve the shopping experience.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sports:&lt;/strong&gt; YOLO enables real-time tracking of players, balls, and equipment, providing performance analytics and enhancing live broadcasts. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  MoViNet: Efficient Action Recognition for Embedding Extraction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/2103.11511?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;MoViNet (Mobile Video Networks)&lt;/a&gt; is an advanced video recognition model for real-time action recognition tasks. MoViNet can stream videos for online inference while being computationally efficient and requiring minimal memory, as opposed to previous 2D CNN-based architectures, which were resource-intensive.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faduyhsqtsnrgcab21mdi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faduyhsqtsnrgcab21mdi.png" width="800" height="253"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MoViNet architecture for streaming eval (&lt;a href="https://www.analyticsvidhya.com/blog/2024/08/exploring-movinets-efficient-mobile-video-recognition/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Source&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;MoViNet leverages techniques such as Neural Architecture Search (NAS) for generating 3D CNN architectures, stream buffering, and temporal ensembles of streaming MoViNets to improve accuracy and efficiency. This makes MoViNets optimized to run on edge devices such as smartphones and wearables, where low latency is critical. &lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features and Benefits
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Low latency&lt;/strong&gt; - Optimized for real-time inference with low latency due to the stream buffer technique.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;High performance&lt;/strong&gt; - Achieves competitive accuracy on various video recognition benchmarks. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalability&lt;/strong&gt; - MoViNet models are available in different variants, allowing developers to choose a model that suits their resource constraints and performance needs. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  MoViNet for Edge Computing
&lt;/h3&gt;

&lt;p&gt;MoViNet is ideally suited to edge devices due to its low memory requirements, computational efficiency, and low latency. Techniques like stream buffering (processing video streams frame by frame, keeping the memory footprint constant, and reducing processing time) and causal convolutions (processing video frames sequentially) allow on-the-fly detection of various actions, which makes MoViNets suitable for use in fitness trackers, sports analytics, and autonomous vehicles.&lt;/p&gt;
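
&lt;p&gt;The stream-buffer idea can be illustrated with a toy sketch: frames arrive one at a time and only a fixed-size window of past state is retained, so memory stays constant regardless of clip length (the "model" below is just a moving average, not MoViNet itself).&lt;/p&gt;

```python
# Toy causal streaming: constant memory no matter how long the video runs.
from collections import deque

def stream_scores(frames, buffer_size=4):
    buffer = deque(maxlen=buffer_size)   # fixed-size buffer of past frames
    scores = []
    for frame in frames:                 # causal: only past frames are seen
        buffer.append(frame)
        scores.append(sum(buffer) / len(buffer))
    return scores

print(stream_scores([1.0, 2.0, 3.0, 4.0, 5.0], buffer_size=2))
# [1.0, 1.5, 2.5, 3.5, 4.5]
```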

&lt;h3&gt;
  
  
  Common Use Cases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sports Analytics&lt;/strong&gt; - MoViNet can analyze player actions and movements, helping coaches and analysts track performance, strategy adherence, and injury risks in real-time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Healthcare Monitoring&lt;/strong&gt; - In physical therapy or elder care, MoViNet can analyze patient movements to ensure exercise compliance, track rehabilitation progress, and detect falls or unusual behavior.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Smart Homes&lt;/strong&gt; - MoViNet enables activity recognition and gesture control, allowing users to intuitively interact with smart home devices or automate home security with anomaly recognition.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  SlowFast: Temporal Modeling for Action Recognition
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/1812.03982?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;SlowFast&lt;/a&gt; is a two-stream convolutional neural network designed for action recognition. It treats spatial structures and temporal events separately, as not all spatiotemporal orientations change equally fast in a video. Hence, it processes video at two different frame rates: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft0xnqr77kfpshplpxs2k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft0xnqr77kfpshplpxs2k.png" width="800" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Low frame rate and high frame rate in SlowFast networks (&lt;a href="https://openaccess.thecvf.com/content_ICCV_2019/papers/Feichtenhofer_SlowFast_Networks_for_Video_Recognition_ICCV_2019_paper.pdf?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Source&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;1) The &lt;strong&gt;Slow pathway&lt;/strong&gt; operates at a low frame rate to capture spatial semantics (e.g., objects or appearance). &lt;/p&gt;

&lt;p&gt;2) The &lt;strong&gt;Fast pathway&lt;/strong&gt; operates at a high frame rate, capturing rapid actions and fine temporal information (e.g., clapping, waving, etc.). &lt;/p&gt;

&lt;p&gt;By combining high spatial resolution from the Slow stream with fine-grained motion cues from the Fast stream, SlowFast models achieve state-of-the-art performance in complex action recognition tasks.&lt;/p&gt;
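The two frame rates can be illustrated with a small sampling sketch in plain NumPy. The speed ratio of 8 between the pathways mirrors the paper's default α; the clip length and base stride are arbitrary toy values:

```python
import numpy as np

# A dummy clip: 64 frames of 224x224 RGB video, laid out as (T, H, W, C).
clip = np.random.rand(64, 224, 224, 3)

ALPHA = 8  # the Fast pathway samples ALPHA times more frames than the Slow one

# Slow pathway: sample sparsely (every 16th frame -> 4 frames).
slow_input = clip[::16]
# Fast pathway: sample ALPHA times more densely (every 2nd frame -> 32 frames).
fast_input = clip[::16 // ALPHA]

print(slow_input.shape[0], fast_input.shape[0])  # 4 32
```

The Fast pathway compensates for its higher frame count by using far fewer channels, which is why the extra frames stay cheap.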

&lt;h3&gt;
  
  
  Key Features and Benefits
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dual-Stream Architecture&lt;/strong&gt; - The slow and fast streams capture rich spatial information and temporal changes, merged through lateral connections, making action recognition efficient. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Computational Efficiency&lt;/strong&gt; - The reduced number of channels in the fast stream makes the model lightweight and computationally efficient. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Action Recognition Performance&lt;/strong&gt; - The model achieved state-of-the-art performance on the Kinetics-400, Kinetics-600, Charades, and AVA datasets.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  SlowFast’s Strengths in Complex Video Analysis
&lt;/h3&gt;

&lt;p&gt;The SlowFast network is particularly effective for complex action recognition tasks. The slow stream maintains a global understanding of the scene over longer time spans, while the fast stream precisely models rapid temporal changes. The fast stream has temporal convolutions in every block, which help capture fine-grained temporal details. As a result, SlowFast can detect and recognize complex behaviors and actions. &lt;/p&gt;

&lt;h3&gt;
  
  
  Common Use Cases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sports Analytics&lt;/strong&gt; - SlowFast can track player movements, capture team movement patterns, and support strategy analysis in matches. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Surveillance&lt;/strong&gt; - SlowFast can help detect suspicious behaviors by analyzing both rapid movements and long-term patterns, making it ideal for real-time security monitoring. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Entertainment&lt;/strong&gt; - SlowFast can classify action scenes in movies, enable automated content tagging, and offer personalized recommendations on entertainment platforms. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  TimeSformer: Transformers for Video Understanding and Embedding
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/2102.05095?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;TimeSformer&lt;/a&gt; (Time-Space Transformer) is a video classification and action recognition model based purely on transformers, which are popularly used for Natural Language Processing tasks. The model uses self-attention mechanisms across both spatial and temporal dimensions, allowing efficient and accurate modeling of video data. Compared to modern 3D CNN models, TimeSformer is three times faster to train and requires less than one-tenth the amount of compute for inference, making it ideal for processing videos in real-time. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu84gxwpymu5g4f4o19i8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu84gxwpymu5g4f4o19i8.png" width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Divided space-time attention in TimeSformer architecture (&lt;a href="https://ai.meta.com/blog/timesformer-a-new-architecture-for-video-understanding/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Source&lt;/a&gt;)&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features and Benefits
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalability&lt;/strong&gt; - Because expensive 3D convolutions are eliminated, large models can be trained on longer video clips (a temporal extent of 102 seconds), enabling understanding of complex human actions. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Low Computational Cost&lt;/strong&gt; - The input video is processed as a small set of patches, and the form of self-attention used avoids an exhaustive comparison of all patch pairs, saving time and resources.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Divided Space-Time Attention&lt;/strong&gt; - The self-attention mechanism is split into two sub-parts - temporal attention and spatial attention - which increases the efficiency and accuracy of the model. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  TimeSformer for Video Classification and Recognition
&lt;/h3&gt;

&lt;p&gt;TimeSformer is a novel transformer-based architecture that overcomes a key limitation of 3D convolutional filters: they cannot model space-time dependencies beyond their small receptive field. First, the input video is represented as a time-space sequence of image patches extracted from individual frames. These patches are then processed through a self-attention mechanism (comparing each patch to other patches) across both spatial and temporal dimensions. This lets TimeSformer perform video classification and recognition by capturing fine temporal details precisely and efficiently.&lt;/p&gt;
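A back-of-the-envelope count shows why dividing attention helps. With T frames and S patches per frame, joint space-time attention compares every patch with every other patch, while divided attention runs a temporal pass and a spatial pass separately. The numbers below are toy values; the patch count assumes ViT-style 14x14 patching of each frame:

```python
# Pairwise-comparison cost of joint vs. divided space-time attention.
T, S = 8, 196  # 8 frames, 14*14 = 196 patches per frame (assumed patching)

joint = (T * S) ** 2           # every patch attends to every other patch
divided = (T * S) * (T + S)    # temporal pass (T targets) + spatial pass (S targets)

print(joint, divided, round(joint / divided, 1))  # 2458624 319872 7.7
```

For this configuration divided attention needs roughly 7.7 times fewer comparisons, and the gap widens as clips get longer.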

&lt;h3&gt;
  
  
  Common Use Cases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Action Recognition&lt;/strong&gt; - TimeSformer works great to detect specific activities in videos, such as identifying different sports movements or human interactions in surveillance footage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Video Classification&lt;/strong&gt; - As TimeSformer can model long-range video sequences, it can categorize entire videos into genres or themes (e.g., comedy, sports, documentaries). &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Content Moderation&lt;/strong&gt; - TimeSformer can identify inappropriate or harmful content in videos for automated filtering on platforms like YouTube and TikTok.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  CLIP (Contrastive Language-Image Pre-Training): Bridging Text and Video
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/learn/exploring-openai-clip-the-future-of-multimodal-ai-learning?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;CLIP (Contrastive Language-Image Pretraining)&lt;/a&gt; is a multimodal AI model developed by OpenAI that learns visual concepts through natural language prompts. Having been trained on large-scale internet data, it generalizes well to various tasks such as video classification, action recognition, and OCR. CLIP associates text and images in a shared embedding space through contrastive pretraining which is then used for zero-shot classification. CLIP can be further extended to video by processing frames as image inputs and aligning them with textual descriptions which makes it highly effective for multimodal search and retrieval. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjt9b0621d8lch0j3sxl1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjt9b0621d8lch0j3sxl1.png" width="800" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Overview of CLIP architecture (&lt;a href="https://openai.com/index/clip/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Source&lt;/a&gt;)&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features and Benefits
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Text-video alignment&lt;/strong&gt; - CLIP can easily work with natural language to search within video content, e.g., ‘find all scenes with cats’.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost efficiency and performance&lt;/strong&gt; - CLIP removes the need for costly labeled datasets and delivers strong real-world performance with out-of-the-box predictions. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Efficiency and flexibility&lt;/strong&gt; - CLIP learns from unfiltered, noisy, and varied data, making it highly effective across a variety of tasks and flexible enough to adapt to new ones. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Versatility&lt;/strong&gt; - CLIP works across various video tasks such as retrieval, captioning, and action recognition. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  CLIP for Video Search and Retrieval
&lt;/h3&gt;

&lt;p&gt;CLIP can be used for video search and retrieval by processing the video as individual frames and embedding them, along with text descriptions, in a shared embedding space. Users can then leverage a vector database like &lt;a href="https://milvus.io/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Milvus&lt;/a&gt; or &lt;a href="https://zilliz.com/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Zilliz Cloud&lt;/a&gt; to search the video with natural language queries. For example, when searching for ‘all scenes with a beach’ in a video of Miami, the clips that best match this description can be retrieved. Furthermore, CLIP can be combined with other AI models such as TimeSformer or SlowFast for enhanced analysis of motion dynamics, paving the way for advanced applications such as automated content tagging and video summarization.&lt;/p&gt;
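The retrieval step can be sketched in a few lines of NumPy. The vectors here are random stand-ins for real CLIP embeddings: in practice you would encode sampled frames with CLIP's image encoder, encode the query with its text encoder, and store the frame vectors in a vector database such as Milvus rather than an in-memory array:

```python
import numpy as np

def cosine_sim(query, frames):
    # Cosine similarity between one query vector and a matrix of frame vectors.
    query = query / np.linalg.norm(query)
    frames = frames / np.linalg.norm(frames, axis=1, keepdims=True)
    return frames @ query

rng = np.random.default_rng(0)
# Stand-ins for CLIP image embeddings of 1000 sampled frames (512-d).
frame_embeddings = rng.normal(size=(1000, 512))
# A "text" query embedding, deliberately placed near frame 42 for the demo.
query_embedding = frame_embeddings[42] + 0.01 * rng.normal(size=512)

scores = cosine_sim(query_embedding, frame_embeddings)
top_k = np.argsort(scores)[::-1][:5]  # indices of the 5 best-matching frames
print(int(top_k[0]))  # 42
```

A vector database replaces the argsort with an approximate nearest-neighbor index, which is what makes this scale past a few thousand frames.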

&lt;h3&gt;
  
  
  Common Use Cases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multimodal Search and Retrieval&lt;/strong&gt; – Users can search for content within images or videos using natural language queries, making it useful for media libraries and stock footage platforms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Automated Content Moderation&lt;/strong&gt; – CLIP can detect inappropriate or harmful content in images or videos, helping with social media moderation and copyright enforcement. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Image and Video Captioning&lt;/strong&gt; – CLIP can generate descriptive captions for images and videos, which is valuable for automating annotations or for enhancing content discovery. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  I3D (Inflated 3D ConvNet): 3D Convolutions for Video Embeddings
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;I3D (Inflated 3D ConvNet)&lt;/strong&gt; is a convolutional architecture introduced by DeepMind that extends traditional 2D Convolutional Neural Networks (CNNs) into 3D spatiotemporal models by inflating pre-trained 2D CNN filters into 3D volumetric filters. The 3D filters are created by repeating the 2D weights along an added temporal dimension, which allows a single image to be treated as a (static) video during training. These 3D convolutions let the model capture temporal dynamics, improving performance on video understanding tasks. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F50x1jyjumzkvtod21cch.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F50x1jyjumzkvtod21cch.png" width="800" height="312"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;An example of inflated Inception-V1 architecture (left) and its inception submodule (right) (&lt;a href="https://arxiv.org/pdf/1705.07750v3.pdf?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Source&lt;/a&gt;)&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features and Benefits
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Inflated 3D Convolutions&lt;/strong&gt; - Pre-trained 2D CNN weights are used for initialization to enhance training efficiency. 2D CNN filters are inflated to 3D by replicating them across time. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Large Scale Training on Kinetics Dataset&lt;/strong&gt; - I3D has been pre-trained on the Kinetics dataset (human action recognition) making it robust for transfer learning. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Two-Stream Architecture&lt;/strong&gt; - I3D also incorporates an optional two-stream architecture where the RGB stream processes the raw video frames for appearance features whereas the optical flow stream uses precomputed optical flow for motion cues to improve accuracy. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
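The inflation trick in the first bullet can be sketched in a few lines of NumPy (assuming a channels-last weight layout; real implementations inflate framework-specific tensors, but the idea is the same):

```python
import numpy as np

def inflate_2d_filter(w2d, t):
    # Repeat a pretrained 2D kernel of shape (kh, kw, c_in, c_out) along a new
    # leading time axis and rescale by 1/t, so a static ("boring") video
    # produces the same activations the 2D filter produced on a single image.
    return np.repeat(w2d[np.newaxis], t, axis=0) / t

w2d = np.random.rand(3, 3, 3, 64)   # a 3x3 kernel from a pretrained 2D CNN
w3d = inflate_2d_filter(w2d, t=3)   # -> shape (3, 3, 3, 3, 64)

# Sanity check: summing the inflated filter over time recovers the 2D filter.
print(w3d.shape, np.allclose(w3d.sum(axis=0), w2d))
```

This is what lets I3D bootstrap from ImageNet-pretrained weights instead of learning 3D filters from scratch.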

&lt;h3&gt;
  
  
  Common Use Cases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Human Action Recognition&lt;/strong&gt; - I3D is great for action recognition tasks in surveillance, healthcare, or sports analytics (e.g., ‘running’, ‘playing soccer’, etc.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Gesture Recognition&lt;/strong&gt; - I3D can also detect hand gestures for sign language translation or AR/VR interactions. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Video Anomaly Detection&lt;/strong&gt; - I3D can be leveraged to identify unusual events such as accidents or theft in security footage. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Video AI models are changing how we perceive video data by extracting as much relevant information as possible. They are transforming industries such as surveillance, sports analytics, and healthcare through various applications. In this blog, we discussed the most popular video AI models every developer should know to unlock new use cases and research. YOLO excels in object detection, MoViNet in efficient action recognition, and SlowFast in temporal modeling. TimeSformer leverages transformers for long-range video understanding, while CLIP bridges text and video for multimodal search, and I3D uses 3D convolutions for spatiotemporal modeling. Together, these cutting-edge models can empower the future of video intelligence. &lt;/p&gt;

</description>
      <category>community</category>
    </item>
    <item>
      <title>The Great AI Agent Protocol Race: Function Calling vs. MCP vs. A2A</title>
      <dc:creator>Chloe Williams</dc:creator>
      <pubDate>Tue, 29 Apr 2025 06:17:37 +0000</pubDate>
      <link>https://dev.to/zilliz/the-great-ai-agent-protocol-race-function-calling-vs-mcp-vs-a2a-2k5b</link>
      <guid>https://dev.to/zilliz/the-great-ai-agent-protocol-race-function-calling-vs-mcp-vs-a2a-2k5b</guid>
      <description>&lt;p&gt;If you’ve been keeping an eye on the AI dev world lately, you’ve probably noticed something: everyone is now talking about &lt;a href="https://zilliz.com/blog/what-exactly-are-ai-agents-why-openai-and-langchain-are-fighting-over-their-definition?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;strong&gt;AI Agents&lt;/strong&gt;&lt;/a&gt; — not just smart chatbots, but full-blown autonomous programs that can use tools, call APIs, and even collaborate with each other. LangChain and OpenAI even had a &lt;a href="https://zilliz.com/blog/what-exactly-are-ai-agents-why-openai-and-langchain-are-fighting-over-their-definition?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;debate&lt;/a&gt; over the definition of “AI Agents.”&lt;/p&gt;

&lt;p&gt;But as soon as you start building serious AI Agent systems, one big headache hits you: &lt;strong&gt;there’s no clear, universal way for Agents to work with tools — or with each other.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Right now, three major approaches are competing to define the future of AI agent architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Function Calling&lt;/strong&gt;: OpenAI's pioneering approach — teaching LLMs to make API calls like junior developers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MCP (Model Context Protocol)&lt;/strong&gt;: Anthropic’s attempt to create a standard toolkit interface across models and services.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A2A (Agent-to-Agent Protocol)&lt;/strong&gt;: Google’s brand-new spec for letting different Agents &lt;em&gt;talk to each other&lt;/em&gt; and &lt;em&gt;work as a team&lt;/em&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every major AI player — OpenAI, Anthropic, Google — is quietly betting that &lt;strong&gt;whoever defines these standards will shape the future agent ecosystem&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;For developers building beyond basic chatbots, understanding these protocols isn't just about keeping up — it's about avoiding painful rewrites down the road.&lt;/p&gt;

&lt;p&gt;Here's what we'll cover in this post:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;What &lt;strong&gt;Function Calling&lt;/strong&gt; is, why it made tool use possible, and why it’s not enough.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How &lt;strong&gt;MCP&lt;/strong&gt; tries to fix the mess by creating a real protocol for tools and models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What &lt;strong&gt;A2A&lt;/strong&gt; adds by making Agents work together like teams, not loners.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How you should &lt;strong&gt;actually think about using them&lt;/strong&gt; (without wasting time chasing hype).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Function Calling: The Pioneer with Growing Pains
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Function Calling&lt;/strong&gt;, popularized by OpenAI and now adopted by Meta, Google, and others, was the first mainstream approach to connecting LLMs with external tools. Think of it as teaching your LLM to write API calls based on natural language requests.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fknlsqkd4qq3ku4nh0aen.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fknlsqkd4qq3ku4nh0aen.png" alt="Figure 1- Function calling workflow (Credit @Google Cloud)" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Figure 1: Function calling workflow (Credit @Google Cloud)&lt;/p&gt;

&lt;p&gt;The workflow is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;User asks a question ("What's the weather in Seattle?")&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;LLM recognizes it needs external data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It selects the appropriate function from your predefined list&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It formats parameters following the JSON Schema you defined:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "location": "Seattle",
  "unit": "celsius"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="5"&gt;
&lt;li&gt;&lt;p&gt;Your application executes the actual API call&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The LLM incorporates the returned data into its response&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
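The whole loop can be sketched end to end in Python. The tool schema below follows the JSON-Schema style from step 4, but the model's tool call is simulated with a hard-coded dict rather than a real LLM API response, and the weather lookup is a stub:

```python
import json

# Steps 3-4: the function the model may select, described in JSON-Schema style.
TOOLS = {
    "get_weather": {
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    }
}

def get_weather(location, unit="celsius"):
    # Step 5: your application executes the actual API call (stubbed here).
    return {"location": location, "temp": 11, "unit": unit}

# A simulated model response: the chosen function plus JSON-encoded arguments.
model_tool_call = {
    "name": "get_weather",
    "arguments": json.dumps({"location": "Seattle", "unit": "celsius"}),
}

# Steps 5-6: dispatch the call, then hand the result back to the LLM.
args = json.loads(model_tool_call["arguments"])
result = globals()[model_tool_call["name"]](**args)
print(result)
```

Note that the orchestration (parsing arguments, dispatching, returning results) lives entirely in your application code, which is exactly the part each provider handles differently.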

&lt;p&gt;For developers, Function Calling feels like giving your AI a cookbook of API recipes it can follow. For simple applications with a single model, it's nearly plug-and-play. To learn more about how to use function calling for building applications, check out the following articles: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://zilliz.com/blog/function-calling-ollama-llama-3-milvus?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;How to Use Function Calling with Ollama, Llama3 and Milvus - Zilliz blog&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://zilliz.com/blog/harnessing-function-calling-to-build-smarter-llm-apps?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Understanding Function Calling in LLMs - Zilliz blog&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But there's a significant drawback when scaling: &lt;strong&gt;no cross-model consistency&lt;/strong&gt;. Each LLM provider implements function calling differently. Want to support both Claude and GPT? You'll need to maintain separate function definitions and handle different response formats.&lt;/p&gt;

&lt;p&gt;It's like having to rewrite your restaurant order in a different language for each chef in the kitchen. This M×N problem becomes unwieldy fast as you add more models and tools.&lt;/p&gt;

&lt;p&gt;Function Calling also lacks native support for &lt;strong&gt;multi-step function chains&lt;/strong&gt;. If the output from one function needs to feed into another, you're handling that orchestration yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP (Model Context Protocol): The Universal Translator for AI and Tools
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://modelcontextprotocol.io/introduction?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;MCP (Model Context Protocol)&lt;/a&gt; addresses precisely these scaling issues. Backed by Anthropic and gaining support across models like Claude, GPT, Llama, and others, MCP introduces a standardized way for LLMs to interact with external tools and data sources.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How MCP Works&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Think of MCP as the "USB standard for AI tools" — a universal interface that ensures compatibility:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tools advertise their capabilities&lt;/strong&gt; using a standardized format, describing available actions, required inputs, and expected outputs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AI models read these descriptions&lt;/strong&gt; and can automatically understand how to use the tools&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Applications integrate once&lt;/strong&gt; and gain compatibility across the AI ecosystem&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;MCP transforms the messy M×N integration problem into a more manageable M+N problem: each model and each tool integrates with the protocol once, instead of every model integrating with every tool.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The MCP Architecture&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;MCP uses a client-server model with four key components:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftoepsxxzgsyu5wntbvoi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftoepsxxzgsyu5wntbvoi.png" alt="Figure 2- The MCP architecture (Credit @Anthropic)" width="800" height="551"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Figure 2: The MCP architecture (Credit @Anthropic)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MCP Hosts&lt;/strong&gt;: The applications where users interact with AI (like Claude Desktop or AI-enhanced code editors)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MCP Clients&lt;/strong&gt;: The connectors that manage communication between hosts and servers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MCP Servers&lt;/strong&gt;: Tool implementations that expose functionality through the MCP standard&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Sources&lt;/strong&gt;: The underlying files, databases, APIs and services that provide information&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
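As a concrete illustration, here is roughly what a tool description looks like when an MCP server advertises it. The name/description/inputSchema shape follows the MCP specification's tools-listing response; the vector-search tool itself is a hypothetical example:

```python
# A tool as an MCP server would advertise it in response to a tools/list
# request. Any MCP client can read this and know how to call the tool,
# with no per-model glue code -- the M+N payoff.
tool = {
    "name": "search_vectors",  # hypothetical Milvus-backed tool
    "description": "Search a Milvus collection for nearest neighbors",
    "inputSchema": {
        "type": "object",
        "properties": {
            "collection": {"type": "string"},
            "query": {"type": "string"},
            "top_k": {"type": "integer", "default": 5},
        },
        "required": ["collection", "query"],
    },
}

print(tool["name"], tool["inputSchema"]["required"])
```

Compare this with Function Calling, where the same description would have to be re-expressed in each provider's own tool format.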

&lt;p&gt;If Function Calling is like having to speak multiple languages to different chefs, MCP is like having a universal translator in the kitchen. Define your tools once, and any MCP-compatible model can use them without custom code. This dramatically reduces the marginal cost of adding new models or tools to your application. As someone who's dealt with integration headaches, that's music to my ears.&lt;/p&gt;

&lt;h2&gt;
  
  
  A2A (Agent-to-Agent Protocol): The Team Coordinator for AI Agents
&lt;/h2&gt;

&lt;p&gt;While Function Calling and MCP focus on model-to-tool interaction, &lt;a href="https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;A2A&lt;/a&gt; (Agent-to-Agent Protocol), introduced by Google, tackles a different challenge: &lt;strong&gt;How do we get multiple specialized agents to collaborate effectively?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As AI agent architectures grow more complex, it quickly becomes clear that no single agent should handle everything. You might have one agent specialized in document summarization, another in database queries, and another in user interaction.&lt;/p&gt;

&lt;p&gt;A2A defines a lightweight, open protocol that lets different Agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Discover&lt;/strong&gt; each other and advertise their capabilities,&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Delegate&lt;/strong&gt; tasks dynamically to the best-suited Agent,&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Coordinate&lt;/strong&gt; progress and share real-time updates securely.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjvwbufyvympn1dnfzz2s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjvwbufyvympn1dnfzz2s.png" alt="Figure 3- How A2A works (credit @Google)" width="800" height="569"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Figure 3: How A2A works (credit @Google)&lt;/p&gt;

&lt;p&gt;A2A facilitates communication between a "client" agent that manages tasks and a "remote" agent that executes them. If Function Calling gives an agent access to tools, A2A lets agents form effective teams.&lt;/p&gt;

&lt;p&gt;Consider hiring a software engineer: A hiring manager could task their agent to find candidates matching specific criteria. This agent then collaborates with specialized agents to source candidates, schedule interviews, and facilitate background checks — all through a unified interface.&lt;/p&gt;
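In A2A, the discovery step works through an "Agent Card": a JSON document each agent publishes (conventionally at /.well-known/agent.json) describing who it is and what skills it offers. The sketch below models the hiring scenario; the fields are simplified from Google's spec and the agent itself is hypothetical:

```python
# A simplified A2A Agent Card for the candidate-sourcing agent from the
# hiring example. A client agent fetches this card, inspects the skills,
# and delegates tasks to the best match.
agent_card = {
    "name": "candidate-sourcing-agent",
    "description": "Finds engineering candidates matching given criteria",
    "url": "https://agents.example.com/sourcing",  # hypothetical endpoint
    "skills": [
        {"id": "source_candidates", "description": "Search job boards for matches"},
        {"id": "schedule_interview", "description": "Book interview slots"},
    ],
}

# Discovery: the hiring manager's agent lists what this remote agent can do.
skill_ids = [skill["id"] for skill in agent_card["skills"]]
print(skill_ids)
```

The card plays the same role for agents that MCP's tool descriptions play for tools: a standard, machine-readable advertisement of capabilities.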

&lt;h2&gt;
  
  
  Quick Comparison: Function Calling vs MCP vs A2A
&lt;/h2&gt;

&lt;p&gt;It's tempting to see these protocols as competitors, but they actually solve different pieces of the agent ecosystem puzzle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Function Calling&lt;/strong&gt; connects models to individual tools (limited but simple)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MCP&lt;/strong&gt; standardizes tool access across different models (more scalable)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A2A&lt;/strong&gt; enables collaboration between independent agents (higher-level orchestration)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Function Calling&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;MCP&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;A2A&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;What it solves&lt;/td&gt;
&lt;td&gt;Model → API calls&lt;/td&gt;
&lt;td&gt;Model → Tools access, standardized&lt;/td&gt;
&lt;td&gt;Agent → Agent collaboration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Good for&lt;/td&gt;
&lt;td&gt;Simple real-time queries&lt;/td&gt;
&lt;td&gt;Scalable tool ecosystems&lt;/td&gt;
&lt;td&gt;Distributed multi-agent workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pain points&lt;/td&gt;
&lt;td&gt;No standard, messy multi-model support&lt;/td&gt;
&lt;td&gt;Need to set up servers&lt;/td&gt;
&lt;td&gt;Still early days, limited support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-world analogy&lt;/td&gt;
&lt;td&gt;Teaching your AI to make phone calls&lt;/td&gt;
&lt;td&gt;Having any smart app access any database/API easily&lt;/td&gt;
&lt;td&gt;Having teams of bots working together like coworkers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In architectural terms, MCP answers "what tools can my agent use?" while A2A handles "how can my agents work together?"&lt;/p&gt;

&lt;p&gt;This resembles how we structure complex software: individual components with well-defined interfaces, composed into larger systems. An effective agent ecosystem needs both tool interfaces (Function Calling/MCP) and inter-agent communication (A2A).&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Developers
&lt;/h2&gt;

&lt;p&gt;So, what should you, as a developer building with AI, do with these competing standards?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;For simple applications&lt;/strong&gt;: Function Calling remains the quickest path to adding tool use to your LLM application, especially if you're only using one model provider.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;For cross-model compatibility&lt;/strong&gt;: Consider adopting MCP, which gives you broader model support without duplicating integration work.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;For complex multi-agent systems&lt;/strong&gt;: Keep an eye on A2A, which could become crucial as agent ecosystems mature.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The smart play might be to layer these approaches: use Function Calling for quick prototyping, but implement MCP adapters for better scalability, with A2A orchestration for multi-agent workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Road Ahead
&lt;/h2&gt;

&lt;p&gt;The conversation around what makes an "AI Agent" is still evolving — sometimes even debated between companies like OpenAI, Anthropic, and LangChain.&lt;/p&gt;

&lt;p&gt;But regardless of definitions, one thing is clear: &lt;strong&gt;Standards like Function Calling, MCP, and A2A are laying the foundation for the next generation of AI applications.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For developers, understanding these patterns early is an investment in future-proofing your work. It's how we move from toy demos to production-ready systems — the kind that solve real problems at scale. The agent ecosystem is developing rapidly, and building on these protocols now means positioning your applications for what's coming next.&lt;/p&gt;

&lt;p&gt;What do you think? Which protocols are you using in your AI projects? Are you betting on one standard winning out, or preparing for a multi-protocol future?&lt;/p&gt;

&lt;h2&gt;
  
  
  More Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://zilliz.com/blog/function-calling-ollama-llama-3-milvus?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;How to Use Function Calling with Ollama, Llama3 and Milvus - Zilliz blog&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://zilliz.com/blog/harnessing-function-calling-to-build-smarter-llm-apps?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Understanding Function Calling in LLMs - Zilliz blog&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://zilliz.com/blog/how-to-use-anthropic-mcp-server-with-milvus?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;How to Use Anthropic MCP Server with Milvus - Zilliz blog&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://zilliz.com/blog/what-exactly-are-ai-agents-why-openai-and-langchain-are-fighting-over-their-definition?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;What are AI Agents? Why LangChain Fights with OpenAI? - Zilliz blog&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://zilliz.com/blog/top-10-ai-agents-to-watch-in-2025?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Top 10 AI Agents to Watch in 2025 🚀 - Zilliz blog&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://zilliz.com/blog/critical-role-of-vectordbs-in-building-intelligent-ai-agents?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;How VectorDBs Power Intelligent AI Agents - Zilliz blog&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>community</category>
    </item>
    <item>
      <title>What Exactly Are AI Agents? Why OpenAI and LangChain Are Fighting Over Their Definition?</title>
      <dc:creator>Chloe Williams</dc:creator>
      <pubDate>Wed, 23 Apr 2025 06:53:28 +0000</pubDate>
      <link>https://dev.to/zilliz/what-exactly-are-ai-agents-why-openai-and-langchain-are-fighting-over-their-definition-4bl</link>
      <guid>https://dev.to/zilliz/what-exactly-are-ai-agents-why-openai-and-langchain-are-fighting-over-their-definition-4bl</guid>
      <description>&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;At the simplest level, &lt;strong&gt;AI agents&lt;/strong&gt; are software programs powered by artificial intelligence that can perceive their environment, make decisions, and take actions to achieve a goal—often autonomously. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;OpenAI and LangChain recently debated what truly defines an agent — simplicity vs. flexibility is the core divide.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Agents differ from LLMs, chatbots, and workflows by being goal-driven, tool-using, and proactive.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AI Agents are already used in coding, business ops, healthcare, education, personal productivity, and many other areas. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🥊 The OpenAI vs. LangChain “AI Agent” Debate
&lt;/h2&gt;

&lt;p&gt;The AI community witnessed a fascinating debate in early 2025 when OpenAI released its &lt;a href="https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf?ref=blog.langchain.dev&amp;amp;utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;comprehensive guide to AI agents&lt;/a&gt;, which prompted a &lt;a href="https://blog.langchain.dev/how-to-think-about-agent-frameworks/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;swift response from LangChain&lt;/a&gt;. This public exchange highlighted fundamental differences in how major players conceptualize AI agents and revealed important distinctions that every developer should understand.&lt;/p&gt;

&lt;p&gt;Let’s talk drama first. 🙂&lt;/p&gt;

&lt;h3&gt;
  
  
  What Happened? What Sparked the Controversy?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;OpenAI, in their new documentation for the Assistants API, explained how to build agents using their platform, complete with tools, memory, threads, and a planning architecture.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;However, they described AI agents in a high-level, somewhat simplified manner: as large language models (LLMs) with memory and tools that can achieve goals.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Then, LangChain, whose entire framework revolves around agent workflows, dropped a response blog: &lt;a href="https://blog.langchain.dev/how-to-think-about-agent-frameworks/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;“How to Think About Agent Frameworks”&lt;/a&gt;. And it didn't pull punches.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  LangChain’s Core Argument:
&lt;/h3&gt;

&lt;p&gt;LangChain argued that OpenAI’s guide:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Oversimplifies what agents are&lt;/strong&gt; – reducing them to just tool-using LLMs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Misrepresents existing frameworks&lt;/strong&gt; – implying LangChain-style agents are unstable or unreliable because of flaws in the architecture, not because of current limitations in LLM reasoning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ignores the core “agent loop”&lt;/strong&gt; – the concept of an agent continuously reasoning and deciding what to do next is critical, and it’s not front and center in OpenAI’s model.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Why Do They See It Differently?
&lt;/h3&gt;

&lt;p&gt;This isn’t just a clash of opinions — it’s a difference in &lt;strong&gt;philosophy&lt;/strong&gt; and &lt;strong&gt;design priorities&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Perspective&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;OpenAI&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;LangChain&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Focus&lt;/td&gt;
&lt;td&gt;API-first, productized “agent-like” experience for devs&lt;/td&gt;
&lt;td&gt;Open-source, modular framework for complex agent systems&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Design&lt;/td&gt;
&lt;td&gt;Abstracts away the inner loop for stability and ease&lt;/td&gt;
&lt;td&gt;Embraces reasoning loops and flexibility, even if fragile&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Goal&lt;/td&gt;
&lt;td&gt;Make it simple to add memory, tools, and goals to your assistant&lt;/td&gt;
&lt;td&gt;Let devs build sophisticated, customizable multi-step agents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tradeoff&lt;/td&gt;
&lt;td&gt;More controlled and user-friendly, but maybe less “agentic”&lt;/td&gt;
&lt;td&gt;More powerful and flexible, but higher risk of tool misuse or reasoning errors&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Who’s “Right”?
&lt;/h3&gt;

&lt;p&gt;Honestly? Both have good points.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;OpenAI wants to productize agents safely and cleanly for the average developer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;LangChain wants to push the boundaries of autonomy and reasoning, even if it’s messier.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So if you’re just getting started and want something that works? OpenAI’s Assistants API is solid. If you’re building ambitious workflows and need total control? LangChain might be the better fit.&lt;/p&gt;

&lt;p&gt;The good news: this debate is fueling clarity in the space. It’s pushing the whole AI world to ask: “What does it really mean to build an autonomous, intelligent, goal-driven AI system?”&lt;/p&gt;

&lt;p&gt;And that’s the question we’ll dig into for the rest of this post.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔍 So, What Exactly Are AI Agents?
&lt;/h2&gt;

&lt;p&gt;Imagine waking up to find your coffee already brewing, your calendar optimized for the day, and your inbox sorted with draft responses ready for your approval. Meanwhile, your code repository has been scanned overnight, bugs fixed, and tests automatically generated. Welcome to the future. &lt;/p&gt;

&lt;p&gt;At the simplest level, &lt;strong&gt;AI agents are software programs powered by artificial intelligence that can perceive their environment, make decisions, and take actions to achieve a goal—often autonomously&lt;/strong&gt;. Unlike traditional software that follows rigid, pre-programmed instructions, AI agents can operate with varying degrees of autonomy, learning from their interactions and adapting their behavior accordingly.&lt;/p&gt;

&lt;p&gt;Think of an AI agent as a digital assistant on steroids – one that doesn't just respond to your commands but anticipates needs, solves problems, and accomplishes tasks with minimal human supervision. The key distinction is autonomy and goal-orientation: agents are built to pursue objectives rather than simply process inputs.&lt;/p&gt;

&lt;p&gt;To put it in everyday terms, if traditional software is like a bicycle that goes exactly where you steer it, an AI agent is more like a self-driving car that gets you to your destination while handling the navigation details itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  How AI Agents Work
&lt;/h2&gt;

&lt;p&gt;Let's peek under the hood of these AI agents. At their core, AI agents follow what we call a &lt;strong&gt;"perception-think-action loop"&lt;/strong&gt; – but don't let the fancy term intimidate you. It's actually pretty intuitive when you break it down:&lt;/p&gt;

&lt;h3&gt;
  
  
  The Perception-Think-Action Loop
&lt;/h3&gt;

&lt;p&gt;Think of this as the agent's basic rhythm:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Perception&lt;/strong&gt;: First, your agent takes in information. This could be your typed request, data from APIs, sensor readings, or even the content of files. It's basically gathering all the context it needs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reasoning&lt;/strong&gt;: Now comes the thinking part. The agent (usually powered by a Large Language Model or LLM) processes what it's perceived. It's asking itself: "What's really being asked here? What's the goal? What information do I have and what do I need?"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Planning&lt;/strong&gt;: This is where agents really shine compared to simpler AI systems. The agent maps out a sequence of steps to achieve the goal. If the task is complex, it might break it down into sub-tasks and determine dependencies.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Action&lt;/strong&gt;: Time to get things done! The AI agent executes its plan by utilizing the tools at its disposal – it may call an API, query a &lt;a href="https://zilliz.com/learn/what-is-vector-database?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;vector database&lt;/a&gt;, generate code, or even control physical devices if they are connected to it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Learning &amp;amp; Adaptation&lt;/strong&gt;: After taking action, the agent evaluates the results. Did it work? If not, why? It uses this feedback to adjust its approach, either immediately for the current task or to improve future performance.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let me share how this works with a concrete example. Say you tell your coding agent: "Create a weather dashboard for my city."&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Perception&lt;/strong&gt;: It processes your request and understands you want a weather dashboard application.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reasoning&lt;/strong&gt;: It determines it needs to: find your location, access weather data, create a visualization interface, and package it as a usable application.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Planning&lt;/strong&gt;: It maps out steps like:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;First, determine your location (either ask you or use default settings)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Research weather APIs that offer the needed data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Design a UI layout with key weather metrics&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Write front-end code for visualization&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Set up API connections to fetch real-time data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Package everything into a deployable application&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="4"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Action&lt;/strong&gt;: The agent starts executing these steps. It might ask you for your location, generate API authentication code for a weather service, create HTML/CSS/JS for the dashboard, and test that the data flows correctly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Learning&lt;/strong&gt;: If you say the temperature display is too small, it adapts and regenerates that component with a larger font. It remembers this preference for future tasks.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
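&lt;p&gt;The loop just walked through can be compressed into a toy sketch. Everything here (the &lt;code&gt;MiniAgent&lt;/code&gt; class, the fake LLM planner, and the canned action results) is an illustrative stand-in, not any real framework's API:&lt;/p&gt;

```python
# A toy sketch of the perception-reason-plan-act loop. The MiniAgent class,
# the fake LLM planner, and the canned action results are all illustrative
# stand-ins, not any real framework's API.

def fake_llm_plan(goal):
    """Stand-in for an LLM call that decomposes a goal into steps."""
    return [f"research {goal}", f"draft {goal}", f"review {goal}"]

class MiniAgent:
    def __init__(self):
        self.memory = []  # episodic memory of observations and actions

    def perceive(self, user_input):
        # 1. Perception: record the incoming request as context.
        self.memory.append(("observation", user_input))
        return user_input

    def plan(self, goal):
        # 2-3. Reasoning and planning: let the "LLM" decompose the goal.
        return fake_llm_plan(goal)

    def act(self, step):
        # 4. Action: a real agent would call a tool or API here.
        result = f"done: {step}"
        self.memory.append(("action", result))
        return result

    def run(self, user_input):
        goal = self.perceive(user_input)
        # 5. Learning is omitted; a fuller agent would score each result
        # and feed that evaluation back into future plans.
        return [self.act(step) for step in self.plan(goal)]

agent = MiniAgent()
results = agent.run("weather dashboard")
```

&lt;p&gt;Real agents replace &lt;code&gt;fake_llm_plan&lt;/code&gt; with model calls and &lt;code&gt;act&lt;/code&gt; with tool invocations, but the control flow stays this shape.&lt;/p&gt;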

&lt;h3&gt;
  
  
  The Secret Sauce: Tool Use
&lt;/h3&gt;

&lt;p&gt;What makes today's agents truly powerful is their ability to use tools – they're not limited to just generating text responses. An advanced agent might:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Write and execute code in various programming languages&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Call external APIs to get real-time data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Search the web for information&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Interact with databases&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Control browser automation tools&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Generate and manipulate images or other media&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This tool-use capability is what transforms a "smart chatbot" into a genuine AI agent. By leveraging these external tools, the agent can extend its capabilities beyond what's built into its core model.&lt;/p&gt;
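&lt;p&gt;Mechanically, tool use usually boils down to the model emitting a structured call that the host program dispatches to a registered function. A minimal, framework-agnostic sketch — the decorator, tool name, and JSON shape below are all invented for illustration:&lt;/p&gt;

```python
# Hypothetical tool registry and dispatcher. The decorator, tool name,
# and JSON call format are invented for illustration; real frameworks
# (OpenAI function calling, MCP, etc.) define their own schemas.
import json

TOOLS = {}

def tool(fn):
    """Register a plain function as an agent-callable tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_weather(city: str) -> dict:
    # A real tool would hit a weather API; this returns canned data.
    return {"city": city, "temp_c": 21}

def dispatch(llm_output: str):
    """Execute a tool call the model emitted as a JSON string."""
    call = json.loads(llm_output)
    return TOOLS[call["name"]](**call["arguments"])

result = dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}')
```

&lt;p&gt;The important part is the separation of concerns: the model only decides &lt;em&gt;what&lt;/em&gt; to call; the host program validates and executes it.&lt;/p&gt;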

&lt;h2&gt;
  
  
  Key Components of an AI Agent
&lt;/h2&gt;

&lt;p&gt;Modern AI agents are complex systems composed of several critical components working together to create intelligent, goal-oriented behavior. Let's break down these essential building blocks: &lt;/p&gt;

&lt;h3&gt;
  
  
  1. Foundation AI Models
&lt;/h3&gt;

&lt;p&gt;At the core of most AI agents is a foundation model, typically a Large Language Model (LLM) like GPT-4, Claude, or Llama that provides the reasoning capabilities. These models act as the "brain" of the agent, enabling it to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Process and generate natural language&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Understand context and nuance&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Apply common sense reasoning to novel situations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Generate plans and evaluate alternatives&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The choice of foundation model significantly impacts an agent's capabilities, with more advanced models generally offering better reasoning but at higher computational costs.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Memory Systems 
&lt;/h3&gt;

&lt;p&gt;Unlike simple chatbots, sophisticated AI agents maintain various types of memory:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Short-term memory&lt;/strong&gt;: Keeps track of the current conversation or task context&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Long-term memory&lt;/strong&gt;: Stores persistent information like user preferences or learned knowledge&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Episodic memory&lt;/strong&gt;: Records specific interactions or "experiences" for future reference&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For instance, a customer service agent remembering your previous issues when you contact support again exemplifies effective memory utilization.&lt;/p&gt;

&lt;p&gt;Vector databases like &lt;a href="https://milvus.io/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Milvus&lt;/a&gt; and &lt;a href="https://zilliz.com/cloud?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Zilliz Cloud&lt;/a&gt; usually play a key role in powering the memory system of AI agents.&lt;/p&gt;
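&lt;p&gt;To make the idea concrete, here is a toy similarity-based memory in pure Python. The word-count &lt;code&gt;embed()&lt;/code&gt; function is a crude stand-in for a real embedding model, and the plain list stands in for a vector database such as Milvus:&lt;/p&gt;

```python
# A toy long-term memory with similarity-based recall. The word-count
# embed() is a crude stand-in for a real embedding model, and the plain
# list stands in for a vector database such as Milvus.
import math
from collections import Counter

def embed(text):
    """Bag-of-words counts: a placeholder for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class Memory:
    def __init__(self):
        self.items = []  # (embedding, original text) pairs

    def store(self, text):
        self.items.append((embed(text), text))

    def recall(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]

mem = Memory()
mem.store("user prefers large fonts on dashboards")
mem.store("user lives in Berlin")
mem.recall("which city does the user live in")
```

&lt;p&gt;Swapping in real embeddings and an indexed vector store changes the quality and scale of recall, not the interface: store experiences, retrieve the most similar ones at decision time.&lt;/p&gt;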

&lt;h3&gt;
  
  
  3. Tool Use Systems
&lt;/h3&gt;

&lt;p&gt;Today's most capable agents can leverage external tools to overcome the limitations of language models alone:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;API connections to external services&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Search engines and knowledge bases&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Database access&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Code execution environments&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Other specialized AI models (like image generators)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This tool use capability transforms agents from passive responders to active problem-solvers that can affect the world outside their language model.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Planning and Reasoning Systems
&lt;/h3&gt;

&lt;p&gt;Advanced agents incorporate explicit planning components that help them break down complex goals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Task decomposition&lt;/strong&gt;: Breaking larger goals into manageable subtasks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reasoning chains&lt;/strong&gt;: Using techniques like &lt;a href="https://zilliz.com/learn/chain-of-agents-large-language-models-collaborating-on-long-context-tasks?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;chain-of-thought&lt;/a&gt; (CoT) to work through problems step-by-step&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Self-reflection&lt;/strong&gt;: Evaluating the quality of their own plans and outputs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Feedback incorporation&lt;/strong&gt;: Learning from successes and failures to improve future plans&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Agent Frameworks and Orchestration
&lt;/h3&gt;

&lt;p&gt;Most production AI agents are built on specialized frameworks that handle the complex integration of the above components. For example: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LangChain&lt;/strong&gt;: Provides modular components for building agents with memory, tool-use capabilities, and prompt management in a flexible architecture&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LlamaIndex&lt;/strong&gt;: Specializes in knowledge-intensive applications, particularly for retrieving and reasoning over document collections&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OpenAI Agents SDK&lt;/strong&gt;: Offers a simplified framework focused on reliable tool use with OpenAI's models&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These frameworks handle the complex plumbing needed for agents to function reliably, providing developers with abstractions for common agent patterns. Check out this blog for the most popular AI frameworks: &lt;a href="https://zilliz.com/blog/10-open-source-llm-frameworks-developers-cannot-ignore-in-2025?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;10 Open-Source LLM Frameworks Developers Can’t Ignore in 2025&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Knowledge Retrieval Mechanisms
&lt;/h3&gt;

&lt;p&gt;Truly useful agents need access to specific knowledge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;RAG (&lt;/strong&gt;&lt;a href="https://zilliz.com/learn/Retrieval-Augmented-Generation?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;strong&gt;Retrieval-Augmented Generation&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;)&lt;/strong&gt;: Allows agents to pull relevant information from documents or databases before generating responses&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Knowledge graphs&lt;/strong&gt;: Provide structured relationships between concepts for more precise reasoning&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Vector search&lt;/strong&gt;: Enables semantic similarity matching rather than just keyword lookups&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hybrid retrieval&lt;/strong&gt;: Combines multiple approaches for more robust information access&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The knowledge component is often what transforms a generic agent into a domain-specific expert that can provide genuinely valuable insights or assistance.&lt;/p&gt;
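&lt;p&gt;A bare-bones retrieve-then-generate (RAG) sketch makes the pattern visible. Naive word-overlap retrieval stands in for real vector search, and the corpus and prompt template below are invented for illustration:&lt;/p&gt;

```python
# A bare-bones RAG sketch: retrieve the best-matching passage, then build
# a grounded prompt. Word-overlap retrieval stands in for vector search,
# and the corpus and prompt template are invented for illustration.

DOCS = [
    "Milvus supports HNSW and IVF indexes for approximate search.",
    "RAG retrieves supporting passages before the model generates an answer.",
]

def retrieve(query, docs):
    """Pick the document sharing the most words with the query."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query, docs):
    # Ground the eventual LLM call in retrieved context.
    context = retrieve(query, docs)
    return f"Context: {context}\nQuestion: {query}\nAnswer using only the context."

prompt = build_prompt("What indexes does Milvus support?", DOCS)
```

&lt;p&gt;In production, retrieval uses embedding similarity over a vector store rather than word overlap, but the shape is the same: fetch relevant knowledge first, then generate against it.&lt;/p&gt;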

&lt;h3&gt;
  
  
  7. Security and Safety Systems
&lt;/h3&gt;

&lt;p&gt;As agents gain more capabilities, safeguards become increasingly important:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Input filtering&lt;/strong&gt;: Screens requests for harmful content&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Output moderation&lt;/strong&gt;: Ensures responses meet safety guidelines&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Authorization boundaries&lt;/strong&gt;: Limits what actions agents can take&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Monitoring systems&lt;/strong&gt;: Tracks agent behavior and performance&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Explainability tools&lt;/strong&gt;: Makes agent reasoning transparent to users and developers&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These systems transform experimental agents into reliable, production-ready systems that can be trusted in real-world environments. &lt;/p&gt;

&lt;h2&gt;
  
  
  Vector Databases: The Backbone of Long-Term Agent Memory
&lt;/h2&gt;

&lt;p&gt;As mentioned above, for AI agents to function effectively, they need a robust memory system that extends beyond short-term context. This is where vector databases emerge as a critical infrastructure component powering sophisticated agent architectures.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/learn/what-is-vector-database?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Vector databases&lt;/a&gt; such as &lt;a href="https://zilliz.com/what-is-milvus?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Milvus&lt;/a&gt; and &lt;a href="https://zilliz.com/cloud?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Zilliz Cloud&lt;/a&gt; store information as high-dimensional vectors—mathematical representations that capture the semantic meaning of data whether it's text, images, audio, or other unstructured formats. This approach allows agents to perform similarity searches and retrieve contextually relevant information based on meaning rather than exact keyword matches. For example, when an agent encounters a new query, it can access its memory system to retrieve similar past interactions or relevant knowledge, enabling it to make informed decisions and adapt to new situations. Without such memory, agents would lack the continuity required for advanced reasoning and adaptive learning.&lt;/p&gt;

&lt;p&gt;To get started quickly on building an AI agent yourself, check out the tutorials below.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Tutorial: &lt;a href="https://zilliz.com/blog/build-graphrag-agent-with-neo4j-and-milvus?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Building a GraphRAG Agent With Neo4j and Milvus&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tutorial: &lt;a href="https://zilliz.com/blog/agentic-rag-using-claude-3.5-sonnet-llamaindex-and-milvus?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Agentic RAG with Claude 3.5 Sonnet, LlamaIndex, and Milvus&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tutorial: &lt;a href="https://zilliz.com/blog/build-ai-agent-for-rag-with-milvus-and-llamaindex?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Building an AI Agent for RAG with Milvus and LlamaIndex&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tutorial: &lt;a href="https://zilliz.com/blog/build-your-voice-assistant-agentic-rag-with-milvus-and-llama-3-2?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Stop Waiting, Start Building: Voice Assistant With Milvus and Llama 3.2&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  AI Agents vs. Other AI Systems
&lt;/h2&gt;

&lt;p&gt;OK, so now you're probably wondering, "How are AI agents different from all the other AI stuff I've been using?" Great question! Let's clear up some confusion by comparing agents with their AI cousins: &lt;/p&gt;

&lt;h3&gt;
  
  
  AI Agents vs. LLMs (Even Advanced Ones) 
&lt;/h3&gt;

&lt;p&gt;Think of modern LLMs like GPT-4, Claude, or DeepSeek as incredibly powerful brains waiting for direction. Here's what separates them from true agents:&lt;/p&gt;

&lt;p&gt;LLMs by themselves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Function as "stateless" systems – forgetting context between sessions unless explicitly reminded&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Generate impressive text, but can't take actions beyond the chat interface&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Respond to prompts rather than independently pursuing objectives&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even cutting-edge models with reasoning capabilities (like Claude 3.7 Sonnet with extended thinking or DeepSeek R1) and built-in search:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Can break down complex problems step-by-step&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Access real-time information beyond their training data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Produce sophisticated analysis and explanations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;But still operate within a reactive, prompt-response paradigm&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What transforms an LLM into an agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Persistent memory architecture using vector databases and state management&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tool integration frameworks that enable a diverse action space&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Planning systems that maintain progress toward defined goals&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Feedback loops that allow adaptation based on outcomes&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The difference is like having a brilliant consultant (LLM) versus an autonomous colleague (agent). The consultant gives excellent advice when asked but forgets you between meetings. The agent remembers your preferences, anticipates needs, takes initiative on your behalf, and learns from each interaction to serve you better over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  AI Agents vs. AI Assistants
&lt;/h3&gt;

&lt;p&gt;This is a subtle but important distinction that confuses many developers. AI assistants (like the basic versions of Siri, Alexa, or even Claude) are designed primarily to help users through conversation and simple predefined actions. They're focused on the human-AI interaction.&lt;/p&gt;

&lt;p&gt;AI agents go a step further:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;They can operate independently, even when you're not directly interacting with them&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;They have more agency to make decisions within their scope&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;They often work in the background on longer-running tasks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;They can be more proactive rather than just reactive&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, an AI assistant might help you book a flight when you ask it to. An AI agent might notice you've been discussing a trip, proactively research flight options based on your calendar availability, and then suggest the best times to book based on price trends it's been monitoring.&lt;/p&gt;

&lt;h3&gt;
  
  
  AI Agents vs. Chatbots
&lt;/h3&gt;

&lt;p&gt;Traditional chatbots were designed for one thing: conversation. Even modern LLM-powered chatbots are primarily interfaces for communication. The differences from agents are stark:&lt;/p&gt;

&lt;p&gt;Chatbots:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;are conversation-first, with actions as an afterthought;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;typically wait for user prompts before doing anything;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;usually operate within a limited domain of knowledge.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  AI Agents vs. AI Workflows
&lt;/h3&gt;

&lt;p&gt;If you've built AI applications before, you might have created workflow chains or pipelines. These are predetermined sequences of AI operations linked together. While useful, they differ from agents in critical ways:&lt;/p&gt;

&lt;p&gt;AI workflows are like assembly lines – efficient but rigid. They follow the same steps every time, and if something unexpected happens, they often break down. Agents are more like skilled workers who can adapt their approach based on circumstances.&lt;/p&gt;

&lt;h2&gt;
  
  
  Types of AI Agents
&lt;/h2&gt;

&lt;p&gt;Not all AI agents are created equal. Let me walk you through the main types seen in the wild, with real examples that might help you understand their unique characteristics:&lt;/p&gt;

&lt;h3&gt;
  
  
  Task-Specific Agents
&lt;/h3&gt;

&lt;p&gt;These are specialized agents designed to excel at particular jobs. They're like expert contractors you bring in for specific work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: GitHub Copilot for Docs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This coding documentation agent doesn't just generate documentation – it reads codebases, understands function signatures and dependencies, analyzes existing documentation patterns, and then creates contextually appropriate docs that match team styles. It can work across multiple files, maintaining consistency in terminology and approach.&lt;/p&gt;

&lt;h3&gt;
  
  
  Autonomous Agents
&lt;/h3&gt;

&lt;p&gt;These agents can work independently over extended periods with limited supervision. They're more like employees than tools. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: AutoGPT&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One of the first autonomous agents that caught widespread attention. You give it a high-level goal like "Create a successful blog about renewable energy," and it breaks this down into subtasks: researching current trends, identifying target audiences, planning content categories, drafting articles, finding relevant images, setting up publishing schedules, and analyzing traffic patterns to optimize future content. It can spend days or weeks pursuing these goals, making adjustments based on results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-Agent Systems
&lt;/h3&gt;

&lt;p&gt;These involve multiple specialized agents working together, like a team with different roles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2308.10848?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;strong&gt;AgentVerse&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This framework exemplifies the multi-agent approach. In a content production environment, it might deploy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A research agent that gathers information on trending topics&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A planning agent that outlines content structure&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multiple specialist writers focused on different aspects (technical details, beginner explanations, etc.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;An editor agent that ensures consistency across pieces&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A feedback agent that analyzes user engagement&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A coordinator agent that manages workflows and resolves conflicts&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The magic happens in the interactions – agents can debate approaches, request clarification from each other, and collaboratively solve problems in ways none could individually.&lt;/p&gt;

&lt;h3&gt;
  
  
  Embodied Agents
&lt;/h3&gt;

&lt;p&gt;These agents control or interact with physical systems in the real world.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Amazon's Warehouse Robots&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;These have evolved from simple path-following machines to sophisticated agents that adaptively navigate dynamic environments. They can reroute around obstacles, prioritize packages based on shipping deadlines, coordinate with other robots to prevent bottlenecks, and even predict and preposition themselves for anticipated order volumes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Cases for AI Agents
&lt;/h2&gt;

&lt;p&gt;Let's explore how AI agents are actually being used right now across different industries. These examples represent what's truly possible with today's technology:&lt;/p&gt;

&lt;h3&gt;
  
  
  Software Development
&lt;/h3&gt;

&lt;p&gt;In modern development workflows, coding agents transform productivity. A modern coding agent doesn't just write code snippets – it functions as a true development partner. Feed it a product spec, and it will architect a solution, generate the code across multiple files and functions, create appropriate tests, and then help debug any issues.&lt;/p&gt;

&lt;p&gt;For example, at recent hackathons, teams have used agents to build entire image processing applications. The agent handles everything from setting up the React frontend to implementing the backend APIs and database schema. When teams run into performance bottlenecks with large image processing, the agent analyzes the code, identifies the issue, and implements a more efficient algorithm, complete with proper error handling and edge case management. What would take days of work is accomplished in hours.&lt;/p&gt;

&lt;h3&gt;
  
  
  Business Operations
&lt;/h3&gt;

&lt;p&gt;Finance departments have been early adopters of agent technology. Many CFOs deploy accounting agents that completely transform month-end close processes. These agents don't just process transactions – they reconcile accounts across multiple systems, identify discrepancies, follow up on missing documentation, prepare financial statements with explanatory notes, and even suggest journal entries to correct issues they discover.&lt;/p&gt;

&lt;p&gt;The game-changer is how they handle exceptions. Rather than simply flagging problems for humans to resolve, they can reason through complex accounting rules to suggest appropriate treatments for unusual transactions. When encountering truly novel situations, they research accounting standards, propose solutions with citations to relevant guidance, and learn from accountants' feedback to handle similar situations autonomously in the future.&lt;/p&gt;

&lt;h3&gt;
  
  
  Healthcare
&lt;/h3&gt;

&lt;p&gt;Healthcare providers are using monitoring agents that go far beyond traditional alert systems. Hospitals implement patient monitoring agents that integrate data from electronic health records, bedside monitors, medication administration systems, and lab results. These agents don't just notify staff when readings exceed thresholds – they understand clinical context.&lt;/p&gt;

&lt;p&gt;For instance, when a patient's oxygen saturation drops, the agent checks recent medication administration, position changes, and historical patterns for that patient. It can distinguish between temporary fluctuations and concerning trends, only alerting staff when truly necessary. Over time, it learns each patient's baseline and normal variations, dramatically reducing false alarms while catching subtle early warning signs of deterioration that static monitoring would miss.&lt;/p&gt;
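&lt;p&gt;The baseline-aware alerting described here boils down to comparing each reading against the patient's own recent history instead of a fixed threshold. A simplified sketch (window size, values, and limits are illustrative, not clinical guidance):&lt;/p&gt;

```python
# Alert only when a reading deviates sharply from the patient's own
# rolling baseline, rather than when it crosses a static threshold.
from statistics import mean, stdev

def should_alert(history, reading, window=10, z_limit=3.0):
    """True when `reading` sits more than z_limit standard deviations
    below the rolling baseline computed from recent history."""
    recent = history[-window:]
    if len(recent) > 2:
        baseline = mean(recent)
        spread = stdev(recent) or 1e-6  # guard against zero spread
        return (baseline - reading) / spread > z_limit
    return False  # not enough history to judge

spo2 = [97, 96, 97, 98, 97, 96, 97, 97, 96, 97]
print(should_alert(spo2, 96))   # normal fluctuation for this patient
print(should_alert(spo2, 88))   # sharp drop from this patient's baseline
```

Because the baseline is per-patient, the same absolute reading can be routine for one patient and alarming for another, which is exactly how false alarms get reduced.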

&lt;h3&gt;
  
  
  Education
&lt;/h3&gt;

&lt;p&gt;Educational agents are evolving from simple tutoring programs to comprehensive learning companions. University professors develop research mentor agents to support graduate students. These agents don't just answer questions – they help shape the entire research process.&lt;/p&gt;

&lt;p&gt;When a student begins a project, the agent helps refine research questions, suggests methodological approaches, identifies potential difficulties, and maps out a realistic timeline. As the student progresses, it reviews drafts, suggests improvements to experimental design, helps interpret results, and provides guidance on presenting findings effectively. Most impressively, it adapts its support based on each student's strengths, weaknesses, and learning style – providing more structure for those who need it while encouraging independence in others.&lt;/p&gt;

&lt;h3&gt;
  
  
  Personal Productivity
&lt;/h3&gt;

&lt;p&gt;Personal productivity agents are perhaps the most accessible use case for most people. A robust productivity agent transforms workload management. It's not just a glorified to-do list – it's a genuine workload management partner.&lt;/p&gt;

&lt;p&gt;It tracks projects across multiple tools (email, task managers, documents, calendar), identifies dependencies and potential conflicts, and proactively suggests schedule adjustments. When receiving new requests, it evaluates them against current commitments and helps determine what to prioritize or delegate. It drafts appropriate responses based on communication style and relationship with each person.&lt;/p&gt;

&lt;p&gt;What makes it truly valuable is how it learns preferences and working patterns over time. It recognizes which times of day are most suited for creative work versus meetings, which tasks tend to be procrastinated on, and how long similar tasks have typically taken in the past. It uses this knowledge to suggest realistic schedules that work with actual habits rather than some idealized productivity system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges and Considerations
&lt;/h2&gt;

&lt;p&gt;While AI agents present incredible opportunities, they also come with significant challenges that we need to address as developers and users:&lt;/p&gt;

&lt;h3&gt;
  
  
  Alignment Problems: When Agents Go Off-Track
&lt;/h3&gt;

&lt;p&gt;Consider an email management agent designed to prioritize inbox messages. Despite clear instructions about what "important" means, the agent might flag all messages from a manager as urgent (including lunch invitations) while categorizing client emergency requests as "can wait until tomorrow." Why? Because it observed the user responding quickly to their boss several times and learned the wrong pattern from this behavior.&lt;/p&gt;

&lt;p&gt;This is what's called an alignment problem – when agents optimize for goals that don't match the user's actual intentions. As agents gain more capabilities and autonomy, ensuring they accurately understand true objectives becomes critically important. The issue isn't about malicious AI but rather misunderstandings that can have significant consequences when agents have meaningful power to act independently.&lt;/p&gt;
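&lt;p&gt;The failure mode is easy to reproduce in miniature: an agent that scores importance by a proxy signal (observed reply speed) will confidently rank a lunch invite above an emergency. A toy sketch with made-up numbers:&lt;/p&gt;

```python
# Toy illustration of proxy misalignment: "how fast the user replied"
# is learned as a stand-in for "how important the sender is".
observed_reply_minutes = {"boss": 2, "client": 45}  # learned behavior

def proxy_priority(sender):
    # Faster observed replies are (wrongly) read as higher importance.
    return 1.0 / observed_reply_minutes.get(sender, 60)

emails = [("boss", "lunch?"), ("client", "PROD OUTAGE - need help now")]
ranked = sorted(emails, key=lambda e: proxy_priority(e[0]), reverse=True)
print(ranked[0])  # the lunch invite outranks the outage
```

The fix is not more optimization pressure but a better objective: incorporating explicit urgency signals and user corrections rather than a single behavioral proxy.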

&lt;h3&gt;
  
  
  The Black Box Problem: Why Did It Do That?
&lt;/h3&gt;

&lt;p&gt;Have you ever had an agent make a decision that left you scratching your head? I remember reviewing code changes made by an agent that completely restructured our authentication system. The changes worked, but I had no idea why the agent thought this approach was better.&lt;/p&gt;

&lt;p&gt;Without transparency into agent reasoning, it's difficult to trust their decisions or learn from their approaches. The most effective agent systems I've worked with provide clear explanations of their decision-making process – not just what they did, but why they chose that approach over alternatives.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security Headaches: New Attack Surfaces
&lt;/h3&gt;

&lt;p&gt;Giving agents access to systems creates new security considerations. A colleague of mine built an agent to help manage their AWS infrastructure. It was incredibly useful until it accidentally exposed sensitive configuration details in logs because it didn't understand the security implications.&lt;/p&gt;

&lt;p&gt;Agents often need broad access privileges to be useful, but this creates potential security vulnerabilities. Careful permission design, monitoring systems, and appropriate guardrails are essential – especially when agents interact with critical systems.&lt;/p&gt;
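&lt;p&gt;One concrete form of "careful permission design" is an explicit allowlist between the agent and its tools, so the default is deny rather than allow. A minimal sketch (tool names are hypothetical):&lt;/p&gt;

```python
# Gate every tool call through an allowlist: the agent gets read-only
# tools by default, and anything destructive must be added deliberately.
ALLOWED_TOOLS = {"read_logs", "list_buckets"}  # no write/delete by default

def run_tool(name, action):
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"agent may not call {name!r}")
    return action()

print(run_tool("read_logs", lambda: "ok"))
try:
    run_tool("delete_bucket", lambda: "boom")
except PermissionError as err:
    print("blocked:", err)
```

In production you would pair this with audit logging and scoped credentials, so even an allowed tool runs with the narrowest privileges that still make it useful.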

&lt;h3&gt;
  
  
  The Responsibility Question: Who's Accountable?
&lt;/h3&gt;

&lt;p&gt;Imagine your automated trading agent makes a series of questionable trades that lose money. The question immediately arises: who's responsible? The developer who built it? You, who deployed it? The company that created the underlying AI model?&lt;/p&gt;

&lt;p&gt;As agents take more autonomous actions in the world, we need clearer frameworks for accountability. This isn't just a legal question – it's also about designing appropriate human oversight and intervention mechanisms that preserve the efficiency benefits of automation while maintaining appropriate control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;If you're just starting to explore this world of AI agents, don't be intimidated. Start small – maybe with a personal productivity agent or a code assistant. Watch how it works, learn its strengths and limitations, and gradually expand the tasks you entrust to it. Before you know it, you'll be designing multi-agent systems to tackle complex workflows that previously required entire teams.&lt;/p&gt;

&lt;p&gt;For those already building agents, consider the human-agent relationship carefully. The most successful implementations I've seen don't aim to replace human workers, but rather to enhance their capabilities – handling routine tasks so that people can focus on creative problem-solving, strategic thinking, and interpersonal connections.&lt;/p&gt;

&lt;p&gt;Whether you're looking to build AI agents or just understand how they'll impact your work, there's no better time to dive in. The tools are becoming increasingly accessible, their capabilities more impressive, and their applications more diverse with each passing month.&lt;/p&gt;

&lt;h2&gt;
  
  
  References 
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf?ref=blog.langchain.dev&amp;amp;utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;OpenAI’s guide on building agents&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.anthropic.com/engineering/building-effective-agents?ref=blog.langchain.dev&amp;amp;utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Anthropic’s guide on building effective agents&lt;/a&gt; &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;LangChain’s post: &lt;a href="https://blog.langchain.dev/how-to-think-about-agent-frameworks/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;How to think about agent frameworks&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://zilliz.com/blog/10-open-source-llm-frameworks-developers-cannot-ignore-in-2025?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;10 Open-Source LLM Frameworks Developers Can’t Ignore in 2025&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>community</category>
    </item>
    <item>
      <title>Build RAG Chatbot 🤖 with LangChain, Milvus, Mistral AI Pixtral, and NVIDIA bge-m3</title>
      <dc:creator>Chloe Williams</dc:creator>
      <pubDate>Fri, 14 Mar 2025 07:00:00 +0000</pubDate>
      <link>https://dev.to/zilliz/build-rag-chatbot-with-langchain-milvus-mistral-ai-pixtral-and-nvidia-bge-m3-30om</link>
      <guid>https://dev.to/zilliz/build-rag-chatbot-with-langchain-milvus-mistral-ai-pixtral-and-nvidia-bge-m3-30om</guid>
      <description>&lt;h2&gt;
  
  
  Introduction to RAG
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/learn/Retrieval-Augmented-Generation" rel="noopener noreferrer"&gt;Retrieval-Augmented Generation (RAG)&lt;/a&gt; is a game-changer for GenAI applications, especially in conversational AI. It combines the power of pre-trained large language models (&lt;a href="https://zilliz.com/glossary/large-language-models-(llms)" rel="noopener noreferrer"&gt;LLMs&lt;/a&gt;) like OpenAI’s GPT with external knowledge sources stored in &lt;a href="https://zilliz.com/learn/what-is-vector-database" rel="noopener noreferrer"&gt;vector databases&lt;/a&gt; such as &lt;a href="https://zilliz.com/what-is-milvus" rel="noopener noreferrer"&gt;Milvus&lt;/a&gt; and &lt;a href="https://zilliz.com/cloud" rel="noopener noreferrer"&gt;Zilliz Cloud&lt;/a&gt;, allowing for more accurate, contextually relevant, and up-to-date response generation. A RAG pipeline usually consists of four basic components: a vector database, an embedding model, an LLM, and a framework.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Components We'll Use for This RAG Chatbot
&lt;/h2&gt;

&lt;p&gt;This tutorial shows you how to build a simple RAG chatbot in Python using the following components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://zilliz.com/blog/langchain-ultimate-guide-getting-started" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;: An open-source framework that helps you orchestrate the interaction between LLMs, vector stores, embedding models, and more, making it easier to assemble a RAG pipeline.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://milvus.io/?__hstc=220948871.b077ca1e9148f3f681770b3bfdf39e1f.1740691563293.1741723505748.1741726822764.46&amp;amp;__hssc=220948871.4.1741726822764&amp;amp;__hsfp=3541243462" rel="noopener noreferrer"&gt;Milvus&lt;/a&gt;: An open-source vector database optimized to store, index, and search large-scale vector embeddings efficiently, perfect for use cases like RAG, semantic search, and recommender systems. If you'd rather not manage your own infrastructure, we recommend using &lt;a href="https://zilliz.com/cloud" rel="noopener noreferrer"&gt;Zilliz Cloud&lt;/a&gt;, which is a fully managed vector database service built on Milvus and offers a free tier supporting up to 1 million vectors.&lt;/li&gt;
&lt;li&gt;Mistral AI Pixtral: Pixtral is Mistral AI's multimodal model, pairing a vision encoder with a Mistral language backbone so it can reason over both images and text and generate text responses. In this tutorial it serves as the LLM that turns retrieved context into answers, and its image understanding also makes it a natural fit for multimodal RAG applications.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;NVIDIA bge-m3&lt;/em&gt;: BGE-M3 is a versatile text embedding model from BAAI, notable for supporting dense, sparse, and multi-vector retrieval across more than 100 languages. Here we access it through NVIDIA's hosted API endpoints to convert documents and queries into vectors for similarity search.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the end of this tutorial, you’ll have a functional chatbot capable of answering questions based on a custom knowledge base.&lt;/p&gt;

&lt;p&gt;Note: Since we may use proprietary models in our tutorials, make sure you have the required API key beforehand.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Install and Set Up LangChain
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;%pip install --quiet --upgrade langchain-text-splitters langchain-community langgraph
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Step 2: Install and Set Up Mistral AI Pixtral
&lt;/h2&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -qU "langchain[mistralai]"

import getpass
import os

if not os.environ.get("MISTRAL_API_KEY"):
  os.environ["MISTRAL_API_KEY"] = getpass.getpass("Enter API key for Mistral AI: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("pixtral-12b-2409", model_provider="mistralai")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Step 3: Install and Set Up NVIDIA bge-m3
&lt;/h2&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -qU langchain-nvidia-ai-endpoints

import getpass
import os

if not os.environ.get("NVIDIA_API_KEY"):
  os.environ["NVIDIA_API_KEY"] = getpass.getpass("Enter API key for NVIDIA: ")

from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

embeddings = NVIDIAEmbeddings(model="baai/bge-m3")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Step 4: Install and Set Up Milvus
&lt;/h2&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -qU langchain-milvus

from langchain_milvus import Milvus

vector_store = Milvus(embedding_function=embeddings)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
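&lt;p&gt;The one-liner above relies on the library's default connection behavior, which varies by langchain-milvus version. If your Milvus server runs elsewhere, or you are using Zilliz Cloud, you can pass the connection details explicitly; the &lt;code&gt;uri&lt;/code&gt;/&lt;code&gt;token&lt;/code&gt; keys below follow the langchain-milvus convention, and the values are placeholders to replace with your own:&lt;/p&gt;

```python
# Explicit connection settings, shown as a plain dict so you can adapt them.
# The uri may point at a local Milvus server, a Milvus Lite file, or a
# Zilliz Cloud endpoint; token is only needed for authenticated clusters.
connection_args = {
    "uri": "http://localhost:19530",  # placeholder: swap in your endpoint
    # "token": "YOUR_API_KEY",        # placeholder: cloud/auth deployments only
}
# Then construct the store with, e.g.:
# vector_store = Milvus(embedding_function=embeddings,
#                       connection_args=connection_args,
#                       collection_name="rag_tutorial")
print(connection_args["uri"])
```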
&lt;h2&gt;
  
  
  Step 5: Build a RAG Chatbot
&lt;/h2&gt;

&lt;p&gt;Now that you’ve set up all components, let’s start to build a simple chatbot. We’ll use the &lt;a href="https://milvus.io/docs/overview.md?__hstc=220948871.b077ca1e9148f3f681770b3bfdf39e1f.1740691563293.1741723505748.1741726822764.46&amp;amp;__hssc=220948871.4.1741726822764&amp;amp;__hsfp=3541243462" rel="noopener noreferrer"&gt;Milvus introduction doc&lt;/a&gt; as a private knowledge base. You can replace it with your own dataset to customize your RAG chatbot.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://milvus.io/docs/overview.md",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("doc-style doc-post-content")
        )
    ),
)

docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

# Index chunks
_ = vector_store.add_documents(documents=all_splits)

# Define prompt for question-answering
prompt = hub.pull("rlm/rag-prompt")


# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


# Define application steps
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h3&gt;
  
  
  Test the Chatbot
&lt;/h3&gt;

&lt;p&gt;Yeah! You've built your own chatbot. Let's ask the chatbot a question.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response = graph.invoke({"question": "What data types does Milvus support?"})

print(response["answer"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Example Output
&lt;/h3&gt;

&lt;p&gt;Milvus supports various data types including sparse vectors, binary vectors, JSON, and arrays. Additionally, it handles common numerical and character types, making it versatile for different data modeling needs. This allows users to manage unstructured or multi-modal data efficiently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimization Tips
&lt;/h2&gt;

&lt;p&gt;As you build your RAG system, optimization is key to ensuring peak performance and efficiency. While setting up the components is an essential first step, fine-tuning each one will help you create a solution that works even better and scales seamlessly. In this section, we’ll share some practical tips for optimizing all these components, giving you the edge to build smarter, faster, and more responsive RAG applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  LangChain optimization tips
&lt;/h3&gt;

&lt;p&gt;To optimize LangChain, focus on minimizing redundant operations in your workflow by structuring your chains and agents efficiently. Use caching to avoid repeated computations, speeding up your system, and experiment with modular design to ensure that components like models or databases can be easily swapped out. This will provide both flexibility and efficiency, allowing you to quickly scale your system without unnecessary delays or complications.&lt;/p&gt;
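&lt;p&gt;The caching advice above can be illustrated without any framework: at its core it is memoization of repeated prompts. LangChain also ships its own LLM cache (for example &lt;code&gt;set_llm_cache&lt;/code&gt; with an in-memory cache), but the principle is the same as this plain-Python sketch:&lt;/p&gt;

```python
# Memoize responses for repeated prompts so identical calls skip the
# expensive model invocation entirely.
from functools import lru_cache

calls = {"count": 0}  # tracks how many "real" model calls happened

@lru_cache(maxsize=256)
def cached_llm(prompt):
    calls["count"] += 1          # stands in for a slow, costly API call
    return "answer to: " + prompt

cached_llm("What is Milvus?")
cached_llm("What is Milvus?")    # identical prompt: served from cache
print(calls["count"])            # -> 1
```

In a real pipeline you would key the cache on the fully rendered prompt (including retrieved context), since two questions with different context must not share an answer.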

&lt;h3&gt;
  
  
  Milvus optimization tips
&lt;/h3&gt;

&lt;p&gt;Milvus serves as a highly efficient vector database, critical for retrieval tasks in a RAG system. To optimize its performance, ensure that indexes are properly built to balance speed and accuracy; consider utilizing HNSW (Hierarchical Navigable Small World) for efficient nearest neighbor search where response time is crucial. Partitioning data based on usage patterns can enhance query performance and reduce load times, enabling better scalability. Regularly monitor and adjust cache settings based on query frequency to avoid latency during data retrieval. Employ batch processing for vector insertions, which can minimize database lock contention and enhance overall throughput. Additionally, fine-tune the model parameters by experimenting with the dimensionality of the vectors; higher dimensions can improve retrieval accuracy but may increase search time, necessitating a balance tailored to your specific use case and hardware infrastructure.&lt;/p&gt;
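&lt;p&gt;As a concrete starting point for the HNSW suggestion, here is what index parameters look like in Milvus's &lt;code&gt;index_params&lt;/code&gt; format. The values are common starting points rather than tuned recommendations, and how you pass them (for example through langchain-milvus) may vary by version:&lt;/p&gt;

```python
# Illustrative HNSW index parameters in Milvus's index_params format.
# M and efConstruction trade build time and memory for recall.
index_params = {
    "index_type": "HNSW",
    "metric_type": "L2",       # or "IP"/"COSINE", matching your embedding model
    "params": {
        "M": 16,               # graph degree: higher means better recall, more memory
        "efConstruction": 200, # build-time search width
    },
}
# With langchain-milvus this can typically be supplied as
# Milvus(..., index_params=index_params).
print(index_params["index_type"])
```

Raising `ef` at query time (search width) is the usual first knob when recall is too low with an otherwise fixed index.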

&lt;h3&gt;
  
  
  Mistral AI Pixtral optimization tips
&lt;/h3&gt;

&lt;p&gt;Pixtral is optimized for multimodal RAG applications, requiring careful management of both textual and visual data retrieval. Improve retrieval efficiency by using specialized embeddings for different modalities—vector search for text and CLIP-based embeddings for images. Implement a multimodal ranking system to prioritize the most contextually relevant passages and images. Optimize model performance by structuring input prompts effectively, ensuring text and visual information are well-integrated without unnecessary repetition. Fine-tune temperature settings based on response requirements—lower values (0.1–0.2) for accuracy-driven applications, higher values for creative outputs. If deploying at scale, use parallel inference for handling large multimodal datasets efficiently. Streamline inference by leveraging batching and caching strategies, especially when handling frequently queried images and text pairs.&lt;/p&gt;

&lt;h3&gt;
  
  
  NVIDIA bge-m3 optimization tips
&lt;/h3&gt;

&lt;p&gt;To optimize the NVIDIA bge-m3 in a Retrieval-Augmented Generation (RAG) setup, ensure you're using the latest driver and CUDA toolkit for improved performance. Fine-tune the model hyperparameters such as learning rate and batch size based on your specific dataset to enhance efficiency. Employ mixed precision training to speed up computations and reduce memory usage. Utilize data augmentation techniques to increase the variability of your training dataset, helping the model generalize better. Additionally, streamline your retrieval process by implementing efficient indexing methods and caching frequently accessed data, which can significantly reduce latency during inference. Finally, monitor resource utilization with NVIDIA’s profiling tools to identify and address bottlenecks dynamically.&lt;/p&gt;

&lt;p&gt;By implementing these tips across your components, you'll be able to enhance the performance and functionality of your RAG system, ensuring it’s optimized for both speed and accuracy. Keep testing, iterating, and refining your setup to stay ahead in the ever-evolving world of AI development.&lt;/p&gt;

&lt;h2&gt;
  
  
  RAG Cost Calculator: A Free Tool to Calculate Your Cost in Seconds
&lt;/h2&gt;

&lt;p&gt;Estimating the cost of a Retrieval-Augmented Generation (RAG) pipeline involves analyzing expenses across vector storage, compute resources, and API usage. Key cost drivers include vector database queries, embedding generation, and LLM inference.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/rag-cost-calculator/" rel="noopener noreferrer"&gt;RAG Cost Calculator&lt;/a&gt; is a free tool that quickly estimates the cost of building a RAG pipeline, including chunking, embedding, vector storage/search, and LLM generation. It also helps you identify cost-saving opportunities and achieve up to 10x cost reduction on vector databases with the serverless option.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/rag-cost-calculator/" rel="noopener noreferrer"&gt;Calculate your RAG cost now.&lt;/a&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0i8l16pgrop4ohq1ny1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0i8l16pgrop4ohq1ny1.png" alt="Calculate your RAG cost" width="800" height="404"&gt;&lt;/a&gt;Calculate your RAG cost&lt;/p&gt;

&lt;h2&gt;
  
  
  What Have You Learned?
&lt;/h2&gt;

&lt;p&gt;By diving into this tutorial, you’ve unlocked the power of combining cutting-edge tools to build a robust RAG system from scratch! You learned how LangChain acts as the glue, orchestrating the entire pipeline by seamlessly connecting your data sources, retrieval logic, and generative AI. With Milvus as your vector database, you saw firsthand how to store and query dense embeddings at scale, ensuring lightning-fast similarity searches that pull the most relevant context for your queries. Then came Mistral AI’s Pixtral, the LLM powerhouse that transforms retrieved snippets into coherent, human-like answers—showcasing its knack for multimodal understanding and creative problem-solving. And let’s not forget NVIDIA’s bge-m3, the embedding model that turns text into rich, multidimensional vectors, capturing semantic nuances so your system understands &lt;em&gt;exactly&lt;/em&gt; what users are asking for. Together, these tools form a dynamic quartet, turning raw data into actionable insights with precision and flair.&lt;/p&gt;

&lt;p&gt;But this tutorial didn’t stop at the basics—you also picked up pro tips for optimizing performance, like tweaking chunk sizes for better retrieval or fine-tuning prompts to guide Pixtral’s outputs. The cherry on top? That free RAG cost calculator you explored, which helps you balance accuracy and expenses as you scale. Now, imagine what you can build next! Whether it’s a customer support bot, a research assistant, or a personalized learning tool, you’ve got the blueprint to innovate. So fire up your IDE, experiment with these tools, and let your creativity run wild. The future of intelligent applications is in your hands—go build something amazing, share it with the world, and keep pushing the boundaries of what RAG can do! 🚀&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Resources
&lt;/h2&gt;

&lt;p&gt;🌟 In addition to this RAG tutorial, unleash your full potential with these incredible resources to level up your RAG skills.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://milvus.io/docs/multimodal_rag_with_milvus.md?__hstc=220948871.b077ca1e9148f3f681770b3bfdf39e1f.1740691563293.1741723505748.1741726822764.46&amp;amp;__hssc=220948871.4.1741726822764&amp;amp;__hsfp=3541243462" rel="noopener noreferrer"&gt;How to Build a Multimodal RAG&lt;/a&gt; | Documentation&lt;/li&gt;
&lt;li&gt;&lt;a href="https://milvus.io/docs/how_to_enhance_your_rag.md?__hstc=220948871.b077ca1e9148f3f681770b3bfdf39e1f.1740691563293.1741723505748.1741726822764.46&amp;amp;__hssc=220948871.4.1741726822764&amp;amp;__hsfp=3541243462" rel="noopener noreferrer"&gt;How to Enhance the Performance of Your RAG Pipeline&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://milvus.io/docs/graph_rag_with_milvus.md?__hstc=220948871.b077ca1e9148f3f681770b3bfdf39e1f.1740691563293.1741723505748.1741726822764.46&amp;amp;__hssc=220948871.4.1741726822764&amp;amp;__hsfp=3541243462" rel="noopener noreferrer"&gt;Graph RAG with Milvus&lt;/a&gt; | Documentation&lt;/li&gt;
&lt;li&gt;&lt;a href="https://zilliz.com/learn/How-To-Evaluate-RAG-Applications" rel="noopener noreferrer"&gt;How to Evaluate RAG Applications - Zilliz Learn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://zilliz.com/learn/generative-ai" rel="noopener noreferrer"&gt;Generative AI Resource Hub | Zilliz&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  We'd Love to Hear What You Think!
&lt;/h2&gt;

&lt;p&gt;We’d love to hear your thoughts! 🌟 Leave your questions or comments below or join our vibrant &lt;a href="https://discord.com/invite/milvus" rel="noopener noreferrer"&gt;Milvus Discord community&lt;/a&gt; to share your experiences, ask questions, or connect with thousands of AI enthusiasts. Your journey matters to us! If you like this tutorial, show your support by giving our &lt;a href="https://github.com/milvus-io/milvus" rel="noopener noreferrer"&gt;Milvus GitHub&lt;/a&gt; repo a star ⭐—it means the world to us and inspires us to keep creating! 💖&lt;/p&gt;

</description>
      <category>vectordatabase</category>
      <category>rag</category>
      <category>ai</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Tutorial: Build a RAG Chatbot with LangChain 🦜, Zilliz Cloud, Anthropic Claude 3 Opus, and Google Vertex AI text-embedding-004</title>
      <dc:creator>Chloe Williams</dc:creator>
      <pubDate>Wed, 12 Mar 2025 07:00:00 +0000</pubDate>
      <link>https://dev.to/zilliz/tutorial-build-a-rag-chatbot-with-langchain-zilliz-cloud-anthropic-claude-3-opus-and-google-24gg</link>
      <guid>https://dev.to/zilliz/tutorial-build-a-rag-chatbot-with-langchain-zilliz-cloud-anthropic-claude-3-opus-and-google-24gg</guid>
      <description>&lt;h2&gt;
  
  
  Introduction to RAG
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/learn/Retrieval-Augmented-Generation" rel="noopener noreferrer"&gt;Retrieval-Augmented Generation (RAG)&lt;/a&gt; is a game-changer for GenAI applications, especially in conversational AI. It combines the power of pre-trained large language models (&lt;a href="https://zilliz.com/glossary/large-language-models-(llms)" rel="noopener noreferrer"&gt;LLMs&lt;/a&gt;) like OpenAI’s GPT with external knowledge sources stored in &lt;a href="https://zilliz.com/learn/what-is-vector-database" rel="noopener noreferrer"&gt;vector databases&lt;/a&gt; such as &lt;a href="https://zilliz.com/what-is-milvus" rel="noopener noreferrer"&gt;Milvus&lt;/a&gt; and &lt;a href="https://zilliz.com/cloud" rel="noopener noreferrer"&gt;Zilliz Cloud&lt;/a&gt;, allowing for more accurate, contextually relevant, and up-to-date response generation. A RAG pipeline usually consists of four basic components: a vector database, an embedding model, an LLM, and a framework.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Components We'll Use for This RAG Chatbot
&lt;/h2&gt;

&lt;p&gt;This tutorial shows you how to build a simple RAG chatbot in Python using the following components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://zilliz.com/blog/langchain-ultimate-guide-getting-started" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;: An open-source framework that helps you orchestrate the interaction between LLMs, vector stores, embedding models, etc, making it easier to integrate a RAG pipeline.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://zilliz.com/cloud" rel="noopener noreferrer"&gt;Zilliz Cloud&lt;/a&gt;: a fully managed vector database-as-a-service platform built on top of the open-source &lt;a href="https://milvus.io/?__hstc=220948871.b077ca1e9148f3f681770b3bfdf39e1f.1740691563293.1741723505748.1741726822764.46&amp;amp;__hssc=220948871.3.1741726822764&amp;amp;__hsfp=3541243462" rel="noopener noreferrer"&gt;Milvus&lt;/a&gt;, designed to handle high-performance vector data processing at scale. It enables organizations to efficiently store, search, and analyze large volumes of unstructured data, such as text, images, or audio, by leveraging advanced vector search technology. It offers a free tier supporting up to 1 million vectors.&lt;/li&gt;
&lt;li&gt;Anthropic Claude 3 Opus: This advanced model in the Claude 3 series is designed for complex reasoning and nuanced conversations. It combines deep understanding with ethical considerations, making it ideal for sensitive applications like customer support, therapy chatbots, and content generation where context and empathy are paramount.&lt;/li&gt;
&lt;li&gt;Google Vertex AI text-embedding-004: This model specializes in creating high-quality text embeddings for diverse natural language processing tasks. Its strength lies in capturing semantic meaning and relationships effectively, making it suitable for applications such as semantic search, clustering, and recommendation systems. Ideal for developers seeking to enhance AI-driven insights from textual data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the end of this tutorial, you’ll have a functional chatbot capable of answering questions based on a custom knowledge base.&lt;/p&gt;

&lt;p&gt;Note: Since we may use proprietary models in our tutorials, make sure you have the required API key beforehand.&lt;/p&gt;
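&lt;p&gt;Before wiring up the real components, it helps to see the retrieve-then-generate loop in miniature. The sketch below uses pure-Python stand-ins (a toy character-frequency "embedding", an in-memory store, and a template "LLM"); every name in it is hypothetical and exists only to show the data flow:&lt;/p&gt;

```python
# Toy RAG loop with stand-ins for the four components:
# embedding model, vector database, LLM, and a small framework function.

def embed(text):
    # Hypothetical embedding: normalized character-frequency vector
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

def similarity(a, b):
    # Cosine similarity (vectors are already normalized)
    return sum(x * y for x, y in zip(a, b))

class ToyVectorStore:
    def __init__(self):
        self.items = []  # (vector, document) pairs

    def add(self, doc):
        self.items.append((embed(doc), doc))

    def search(self, query, k=1):
        ranked = sorted(self.items, key=lambda it: -similarity(it[0], embed(query)))
        return [doc for _, doc in ranked[:k]]

def toy_llm(prompt):
    # Stand-in LLM: a real model would generate an answer from the prompt
    return "Based on the context: " + prompt

def rag_answer(store, question):
    context = " ".join(store.search(question, k=1))
    return toy_llm(f"Context: {context} Question: {question}")

store = ToyVectorStore()
store.add("Milvus is a vector database for similarity search.")
store.add("LangChain orchestrates LLM pipelines.")
answer = rag_answer(store, "What is Milvus?")
```

&lt;p&gt;In the steps below, LangChain plays the framework role, Zilliz Cloud replaces the toy store, text-embedding-004 replaces the toy embedding, and Claude 3 Opus replaces the stand-in LLM.&lt;/p&gt;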

&lt;h2&gt;
  
  
  Step 1: Install and Set Up LangChain
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;%pip install --quiet --upgrade langchain-text-splitters langchain-community langgraph
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Step 2: Install and Set Up Anthropic Claude 3 Opus
&lt;/h2&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -qU "langchain[anthropic]"

import getpass
import os

if not os.environ.get("ANTHROPIC_API_KEY"):
  os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("Enter API key for Anthropic: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("claude-3-opus-latest", model_provider="anthropic")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Step 3: Install and Set Up Google Vertex AI text-embedding-004
&lt;/h2&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -qU langchain-google-vertexai

from langchain_google_vertexai import VertexAIEmbeddings

embeddings = VertexAIEmbeddings(model="text-embedding-004")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Step 4: Install and Set Up Zilliz Cloud
&lt;/h2&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -qU langchain-milvus

from langchain_milvus import Zilliz

vector_store = Zilliz(
    embedding_function=embeddings,
    connection_args={
        "uri": ZILLIZ_CLOUD_URI,
        "token": ZILLIZ_CLOUD_TOKEN,
    },
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Step 5: Build a RAG Chatbot
&lt;/h2&gt;

&lt;p&gt;Now that you’ve set up all components, let’s start to build a simple chatbot. We’ll use the &lt;a href="https://milvus.io/docs/overview.md?__hstc=220948871.b077ca1e9148f3f681770b3bfdf39e1f.1740691563293.1741723505748.1741726822764.46&amp;amp;__hssc=220948871.3.1741726822764&amp;amp;__hsfp=3541243462" rel="noopener noreferrer"&gt;Milvus introduction doc&lt;/a&gt; as a private knowledge base. You can replace it with your own dataset to customize your RAG chatbot.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://milvus.io/docs/overview.md",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("doc-style doc-post-content")
        )
    ),
)

docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

# Index chunks
_ = vector_store.add_documents(documents=all_splits)

# Define prompt for question-answering
prompt = hub.pull("rlm/rag-prompt")


# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


# Define application steps
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
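&lt;p&gt;The splitter above uses &lt;code&gt;chunk_size=1000&lt;/code&gt; and &lt;code&gt;chunk_overlap=200&lt;/code&gt;. A simplified fixed-window version (the real RecursiveCharacterTextSplitter also respects separators such as paragraphs and sentences) shows what the overlap does:&lt;/p&gt;

```python
# Simplified fixed-size chunking with overlap; consecutive chunks share
# chunk_overlap characters so context is not cut off at chunk boundaries.
def chunk(text, chunk_size, chunk_overlap):
    step = chunk_size - chunk_overlap
    stop = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, stop, step)]

chunks = chunk("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
# chunks[1] starts with the last 4 characters of chunks[0]
```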
&lt;h3&gt;
  
  
  Test the Chatbot
&lt;/h3&gt;

&lt;p&gt;Yeah! You've built your own chatbot. Let's ask the chatbot a question.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response = graph.invoke({"question": "What data types does Milvus support?"})
print(response["answer"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Example Output
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Milvus supports various data types including sparse vectors, binary vectors, JSON, and arrays. Additionally, it handles common numerical and character types, making it versatile for different data modeling needs. This allows users to manage unstructured or multi-modal data efficiently.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Optimization Tips
&lt;/h2&gt;

&lt;p&gt;As you build your RAG system, optimization is key to ensuring peak performance and efficiency. While setting up the components is an essential first step, fine-tuning each one will help you create a solution that works even better and scales seamlessly. In this section, we’ll share some practical tips for optimizing all these components, giving you the edge to build smarter, faster, and more responsive RAG applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  LangChain optimization tips
&lt;/h3&gt;

&lt;p&gt;To optimize LangChain, focus on minimizing redundant operations in your workflow by structuring your chains and agents efficiently. Use caching to avoid repeated computations, speeding up your system, and experiment with modular design to ensure that components like models or databases can be easily swapped out. This will provide both flexibility and efficiency, allowing you to quickly scale your system without unnecessary delays or complications.&lt;/p&gt;
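&lt;p&gt;The caching tip can be sketched in a few lines. Here &lt;code&gt;cached_llm&lt;/code&gt; is a hypothetical stand-in; in a real LangChain app you would enable one of its built-in LLM caches instead:&lt;/p&gt;

```python
# Memoize repeated prompts so identical queries never hit the backend twice.
from functools import lru_cache

backend_calls = {"count": 0}

@lru_cache(maxsize=128)
def cached_llm(prompt):
    backend_calls["count"] += 1   # counts real backend hits
    return "answer to: " + prompt

cached_llm("What is Milvus?")
cached_llm("What is Milvus?")     # second call is served from the cache
```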

&lt;h3&gt;
  
  
  Zilliz Cloud optimization tips
&lt;/h3&gt;

&lt;p&gt;Optimizing Zilliz Cloud for a RAG system involves efficient index selection, query tuning, and resource management. Use Hierarchical Navigable Small World (HNSW) indexing for high-speed, approximate nearest neighbor search while balancing recall and efficiency. Fine-tune ef_construction and M parameters based on your dataset size and query workload to optimize search accuracy and latency. Enable dynamic scaling to handle fluctuating workloads efficiently, ensuring smooth performance under varying query loads. Implement data partitioning to improve retrieval speed by grouping related data, reducing unnecessary comparisons. Regularly update and optimize embeddings to keep results relevant, particularly when dealing with evolving datasets. Use hybrid search techniques, such as combining vector and keyword search, to improve response quality. Monitor system metrics in Zilliz Cloud’s dashboard and adjust configurations accordingly to maintain low-latency, high-throughput performance.&lt;/p&gt;
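&lt;p&gt;As a concrete reference point, the &lt;code&gt;M&lt;/code&gt; and &lt;code&gt;efConstruction&lt;/code&gt; parameters mentioned above are passed as index parameters. The values below are illustrative starting points in the Milvus/Zilliz style, not tuned recommendations:&lt;/p&gt;

```python
# Hedged sketch of HNSW index and search parameters.
hnsw_index_params = {
    "index_type": "HNSW",
    "metric_type": "COSINE",
    "params": {
        "M": 16,                # graph degree: higher improves recall, costs memory
        "efConstruction": 200,  # build-time beam width: higher builds a better graph, slower
    },
}

# At query time, "ef" controls the search beam width (keep it at least top-k)
hnsw_search_params = {"metric_type": "COSINE", "params": {"ef": 64}}
```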

&lt;h3&gt;
  
  
  Anthropic Claude 3 Opus optimization tips
&lt;/h3&gt;

&lt;p&gt;Claude 3 Opus is a powerful model for RAG applications requiring deep reasoning and high-quality responses. Optimize performance by structuring retrieval results effectively, ensuring that only the most relevant context is provided to avoid unnecessary token usage. Utilize a ranker to prioritize key passages before sending them to the model, preventing information overload and improving response quality. Fine-tune hyperparameters like temperature (0.1–0.3 for factual tasks) and top-k sampling to maintain accuracy while controlling response variation. If cost and speed are concerns, use Claude 3 Opus selectively for complex queries while relying on a smaller model like Claude 3 Haiku for simpler tasks. Implement caching for repeated or high-frequency queries to minimize API calls and improve latency. Use Claude’s parallel processing capabilities where applicable to handle multiple document queries efficiently.&lt;/p&gt;

&lt;h3&gt;
  
  
  Google Vertex AI text-embedding-004 optimization tips
&lt;/h3&gt;

&lt;p&gt;Google Vertex AI text-embedding-004 offers high-quality embeddings suitable for a wide range of RAG applications. To improve retrieval efficiency, reduce redundancy in input text by preprocessing data and focusing on key concepts and relevant context. For large-scale deployments, utilize batch processing to generate embeddings in parallel, reducing latency. Optimize search performance by implementing hybrid search strategies that combine traditional keyword matching with dense vector similarity. Fine-tune temperature settings to balance between creativity and precision, and adjust the model’s top-k and top-p parameters to control the variability of results. Cache embeddings for high-demand queries to reduce unnecessary processing, and refresh embeddings periodically to maintain relevance as new data is ingested.&lt;/p&gt;
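&lt;p&gt;The hybrid search idea mentioned above can be blended with a single weight. The scorers below are toy stand-ins (a real system would use BM25 and your embedding model), so treat the snippet as a sketch of the weighting, not an implementation:&lt;/p&gt;

```python
# Blend a keyword-overlap score with a dense vector similarity score.
def keyword_score(query, doc):
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q.intersection(d)) / max(len(q), 1)

def hybrid_score(query, doc, vector_sim, alpha=0.5):
    # alpha weights the dense side; (1 - alpha) weights the keyword side
    return alpha * vector_sim + (1 - alpha) * keyword_score(query, doc)

score = hybrid_score("vector database", "milvus is a vector database", vector_sim=0.9)
```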

&lt;p&gt;By implementing these tips across your components, you'll be able to enhance the performance and functionality of your RAG system, ensuring it’s optimized for both speed and accuracy. Keep testing, iterating, and refining your setup to stay ahead in the ever-evolving world of AI development.&lt;/p&gt;

&lt;h2&gt;
  
  
  RAG Cost Calculator: A Free Tool to Calculate Your Cost in Seconds
&lt;/h2&gt;

&lt;p&gt;Estimating the cost of a Retrieval-Augmented Generation (RAG) pipeline involves analyzing expenses across vector storage, compute resources, and API usage. Key cost drivers include vector database queries, embedding generation, and LLM inference.&lt;/p&gt;
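&lt;p&gt;The arithmetic behind such an estimate is simple. The per-token prices below are placeholders, not real rates; always check your providers' current pricing pages:&lt;/p&gt;

```python
# Back-of-the-envelope monthly RAG cost. Both prices are hypothetical.
EMBED_PRICE_PER_1K_TOKENS = 0.0001  # assumption
LLM_PRICE_PER_1K_TOKENS = 0.015     # assumption

def monthly_cost(corpus_tokens, queries_per_month, tokens_per_query):
    embedding = corpus_tokens / 1000 * EMBED_PRICE_PER_1K_TOKENS
    generation = queries_per_month * tokens_per_query / 1000 * LLM_PRICE_PER_1K_TOKENS
    return round(embedding + generation, 2)

# e.g. a 10M-token corpus and 1,000 queries/month at 2,000 tokens each
estimate = monthly_cost(10_000_000, 1000, 2000)
```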

&lt;p&gt;&lt;a href="https://zilliz.com/rag-cost-calculator/" rel="noopener noreferrer"&gt;RAG Cost Calculator&lt;/a&gt; is a free tool that quickly estimates the cost of building a RAG pipeline, including chunking, embedding, vector storage/search, and LLM generation. It also helps you identify cost-saving opportunities and achieve up to 10x cost reduction on vector databases with the serverless option.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/rag-cost-calculator/" rel="noopener noreferrer"&gt;Calculate your RAG cost now.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0i8l16pgrop4ohq1ny1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0i8l16pgrop4ohq1ny1.png" alt="Calculate your RAG cost" width="800" height="404"&gt;&lt;/a&gt;Calculate your RAG cost&lt;/p&gt;

&lt;h2&gt;
  
  
  What Have You Learned?
&lt;/h2&gt;


&lt;p&gt;Wow, what an exciting journey you've embarked on! In this tutorial, you’ve seen how the integration of various cutting-edge technologies can culminate in a powerful RAG system. You started with LangChain as the robust framework that effortlessly ties all components together, orchestrating their collaboration seamlessly. It’s truly the backbone of your architecture, allowing for a smooth flow of data and requests.&lt;/p&gt;

&lt;p&gt;Next, we dove into how the Zilliz Cloud vector database enhances your application by enabling lightning-fast searches, ensuring that retrieving relevant information is not only efficient but also scalable. This rapid retrieval capability is fundamental for delivering a stellar user experience.&lt;/p&gt;

&lt;p&gt;We then explored how the Anthropic Claude 3 Opus LLM elevates your application’s conversational intelligence, empowering your system to generate engaging and contextually aware responses. With its capabilities, your user interactions can now feel more natural and dynamic.&lt;/p&gt;

&lt;p&gt;The magic doesn’t stop there! The Google Vertex AI text-embedding-004 model generates rich semantic representations, giving unique context to searches and responses. You also picked up on optimizing techniques and learned about using a free cost calculator to manage potential expenses.&lt;/p&gt;

&lt;p&gt;Now, it’s your turn! With the knowledge and tools you've gathered, you have an incredible opportunity to build, innovate, and optimize your very own RAG applications. Get out there, experiment, and let your creativity shine! The future is bright, and the possibilities are endless. Happy building!&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Resources
&lt;/h2&gt;

&lt;p&gt;🌟 In addition to this RAG tutorial, unleash your full potential with these incredible resources to level up your RAG skills.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://milvus.io/docs/multimodal_rag_with_milvus.md?__hstc=220948871.b077ca1e9148f3f681770b3bfdf39e1f.1740691563293.1741723505748.1741726822764.46&amp;amp;__hssc=220948871.3.1741726822764&amp;amp;__hsfp=3541243462" rel="noopener noreferrer"&gt;How to Build a Multimodal RAG&lt;/a&gt; | Documentation&lt;/li&gt;
&lt;li&gt;&lt;a href="https://milvus.io/docs/how_to_enhance_your_rag.md?__hstc=220948871.b077ca1e9148f3f681770b3bfdf39e1f.1740691563293.1741723505748.1741726822764.46&amp;amp;__hssc=220948871.3.1741726822764&amp;amp;__hsfp=3541243462" rel="noopener noreferrer"&gt;How to Enhance the Performance of Your RAG Pipeline&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://milvus.io/docs/graph_rag_with_milvus.md?__hstc=220948871.b077ca1e9148f3f681770b3bfdf39e1f.1740691563293.1741723505748.1741726822764.46&amp;amp;__hssc=220948871.3.1741726822764&amp;amp;__hsfp=3541243462" rel="noopener noreferrer"&gt;Graph RAG with Milvus&lt;/a&gt; | Documentation&lt;/li&gt;
&lt;li&gt;&lt;a href="https://zilliz.com/learn/How-To-Evaluate-RAG-Applications" rel="noopener noreferrer"&gt;How to Evaluate RAG Applications - Zilliz Learn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://zilliz.com/learn/generative-ai" rel="noopener noreferrer"&gt;Generative AI Resource Hub | Zilliz&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  We'd Love to Hear What You Think!
&lt;/h2&gt;

&lt;p&gt;We’d love to hear your thoughts! 🌟 Leave your questions or comments below or join our vibrant &lt;a href="https://discord.com/invite/milvus" rel="noopener noreferrer"&gt;Milvus Discord community&lt;/a&gt; to share your experiences, ask questions, or connect with thousands of AI enthusiasts. Your journey matters to us!&lt;/p&gt;

&lt;p&gt;If you like this tutorial, show your support by giving our &lt;a href="https://github.com/milvus-io/milvus" rel="noopener noreferrer"&gt;Milvus GitHub&lt;/a&gt; repo a star ⭐—it means the world to us and inspires us to keep creating! 💖&lt;/p&gt;

</description>
      <category>vectordatabase</category>
      <category>rag</category>
      <category>tutorial</category>
      <category>ai</category>
    </item>
    <item>
      <title>How to Build a RAG Chatbot with LangChain, Milvus, Together AI Mixtral 8x7B Instruct v0.1, and OpenAI text-embedding-3-large</title>
      <dc:creator>Chloe Williams</dc:creator>
      <pubDate>Wed, 05 Mar 2025 17:00:00 +0000</pubDate>
      <link>https://dev.to/zilliz/how-to-build-a-rag-chatbot-with-langchain-milvus-together-ai-mixtral-8x7b-instruct-v01-and-324l</link>
      <guid>https://dev.to/zilliz/how-to-build-a-rag-chatbot-with-langchain-milvus-together-ai-mixtral-8x7b-instruct-v01-and-324l</guid>
      <description>&lt;h2&gt;
  
  
  Introduction to RAG
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/learn/Retrieval-Augmented-Generation" rel="noopener noreferrer"&gt;Retrieval-Augmented Generation (RAG)&lt;/a&gt; is a game-changer for GenAI applications, especially in conversational AI. It combines the power of pre-trained large language models (&lt;a href="https://zilliz.com/glossary/large-language-models-(llms)" rel="noopener noreferrer"&gt;LLMs&lt;/a&gt;) like OpenAI’s GPT with external knowledge sources stored in &lt;a href="https://zilliz.com/learn/what-is-vector-database" rel="noopener noreferrer"&gt;vector databases&lt;/a&gt; such as &lt;a href="https://zilliz.com/what-is-milvus" rel="noopener noreferrer"&gt;Milvus&lt;/a&gt; and &lt;a href="https://zilliz.com/cloud" rel="noopener noreferrer"&gt;Zilliz Cloud&lt;/a&gt;, allowing for more accurate, contextually relevant, and up-to-date response generation. A RAG pipeline usually consists of four basic components: a vector database, an embedding model, an LLM, and a framework.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Components We'll Use for This RAG Chatbot
&lt;/h2&gt;

&lt;p&gt;This tutorial shows you how to build a simple RAG chatbot in Python using the following components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://zilliz.com/blog/langchain-ultimate-guide-getting-started" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;: An open-source framework that helps you orchestrate the interaction between LLMs, vector stores, embedding models, etc., making it easier to assemble a RAG pipeline.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://milvus.io/?__hstc=220948871.b077ca1e9148f3f681770b3bfdf39e1f.1740691563293.1741036355418.1741044630283.13&amp;amp;__hssc=220948871.4.1741044630283&amp;amp;__hsfp=2816761639" rel="noopener noreferrer"&gt;Milvus&lt;/a&gt;: An open-source vector database optimized to store, index, and search large-scale vector embeddings efficiently, perfect for use cases like RAG, semantic search, and recommender systems. If you hate to manage your own infrastructure, we recommend using &lt;a href="https://zilliz.com/cloud" rel="noopener noreferrer"&gt;Zilliz Cloud&lt;/a&gt;, which is a fully managed vector database service built on Milvus and offers a free tier supporting up to 1 million vectors.&lt;/li&gt;
&lt;li&gt;Together AI Mixtral 8x7B Instruct v0.1: This model offers a powerful blend of instruction-based learning and advanced natural language understanding. With its 8x7B architecture, it excels in generating coherent and context-aware responses. Ideal for applications like chatbots, content creation, and educational tools where user guidance and high-quality interaction are essential.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://zilliz.com/ai-models/text-embedding-3-large" rel="noopener noreferrer"&gt;text-embedding-3-large&lt;/a&gt;: OpenAI's text embedding model, generating embeddings with 1536 dimensions, designed for tasks like semantic search and similarity matching.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the end of this tutorial, you’ll have a functional chatbot capable of answering questions based on a custom knowledge base.&lt;/p&gt;

&lt;p&gt;Note: Since we may use proprietary models in our tutorials, make sure you have the required API key beforehand.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Install and Set Up LangChain
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;%pip install --quiet --upgrade langchain-text-splitters langchain-community langgraph
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Step 2: Install and Set Up Together AI Mixtral 8x7B Instruct v0.1
&lt;/h2&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -qU "langchain[together]"

import getpass
import os

if not os.environ.get("TOGETHER_API_KEY"):
  os.environ["TOGETHER_API_KEY"] = getpass.getpass("Enter API key for Together AI: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("mistralai/Mixtral-8x7B-Instruct-v0.1", model_provider="together")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Step 3: Install and Set Up OpenAI text-embedding-3-large
&lt;/h2&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -qU langchain-openai

import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
  os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Step 4: Install and Set Up Milvus
&lt;/h2&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -qU langchain-milvus

from langchain_milvus import Milvus

vector_store = Milvus(embedding_function=embeddings)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Step 5: Build a RAG Chatbot
&lt;/h2&gt;

&lt;p&gt;Now that you’ve set up all components, let’s start to build a simple chatbot. We’ll use the &lt;a href="https://milvus.io/docs/overview.md?__hstc=220948871.b077ca1e9148f3f681770b3bfdf39e1f.1740691563293.1741036355418.1741044630283.13&amp;amp;__hssc=220948871.4.1741044630283&amp;amp;__hsfp=2816761639" rel="noopener noreferrer"&gt;Milvus introduction doc&lt;/a&gt; as a private knowledge base. You can replace it with your own dataset to customize your RAG chatbot.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://milvus.io/docs/overview.md",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("doc-style doc-post-content")
        )
    ),
)

docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

# Index chunks
_ = vector_store.add_documents(documents=all_splits)

# Define prompt for question-answering
prompt = hub.pull("rlm/rag-prompt")


# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


# Define application steps
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h3&gt;
  
  
  Test the Chatbot
&lt;/h3&gt;

&lt;p&gt;Yeah! You've built your own chatbot. Let's ask the chatbot a question.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response = graph.invoke({"question": "What data types does Milvus support?"})
print(response["answer"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Example Output
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Milvus supports various data types including sparse vectors, binary vectors, JSON, and arrays. Additionally, it handles common numerical and character types, making it versatile for different data modeling needs. This allows users to manage unstructured or multi-modal data efficiently.&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Optimization Tips
&lt;/h2&gt;

&lt;p&gt;As you build your RAG system, optimization is key to ensuring peak performance and efficiency. While setting up the components is an essential first step, fine-tuning each one will help you create a solution that works even better and scales seamlessly. In this section, we’ll share some practical tips for optimizing all these components, giving you the edge to build smarter, faster, and more responsive RAG applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  LangChain Optimization Tips
&lt;/h3&gt;

&lt;p&gt;To optimize LangChain, focus on minimizing redundant operations in your workflow by structuring your chains and agents efficiently. Use caching to avoid repeated computations, speeding up your system, and experiment with modular design to ensure that components like models or databases can be easily swapped out. This will provide both flexibility and efficiency, allowing you to quickly scale your system without unnecessary delays or complications.&lt;/p&gt;

&lt;h3&gt;
  
  
  Milvus optimization tips
&lt;/h3&gt;

&lt;p&gt;Milvus serves as a highly efficient vector database, critical for retrieval tasks in a RAG system. To optimize its performance, ensure that indexes are properly built to balance speed and accuracy; consider utilizing HNSW (Hierarchical Navigable Small World) for efficient nearest neighbor search where response time is crucial. Partitioning data based on usage patterns can enhance query performance and reduce load times, enabling better scalability. Regularly monitor and adjust cache settings based on query frequency to avoid latency during data retrieval. Employ batch processing for vector insertions, which can minimize database lock contention and enhance overall throughput. Additionally, fine-tune the model parameters by experimenting with the dimensionality of the vectors; higher dimensions can improve retrieval accuracy but may increase search time, necessitating a balance tailored to your specific use case and hardware infrastructure.&lt;/p&gt;
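&lt;p&gt;The batch-insertion tip boils down to slicing your vectors before calling the client. In the sketch below, &lt;code&gt;insert_fn&lt;/code&gt; stands for a hypothetical insert call (e.g. a Milvus collection insert); here it just collects batches so the helper runs on its own:&lt;/p&gt;

```python
# Insert vectors in fixed-size batches instead of one giant call.
def insert_in_batches(vectors, insert_fn, batch_size=1000):
    for i in range(0, len(vectors), batch_size):
        insert_fn(vectors[i:i + batch_size])

batches = []
insert_in_batches(list(range(2500)), batches.append, batch_size=1000)
# 2500 vectors produce batches of 1000, 1000, and 500
```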

&lt;h3&gt;
  
  
  Together AI Mixtral 8x7B Instruct v0.1 optimization tips
&lt;/h3&gt;

&lt;p&gt;Together AI’s Mixtral 8x7B Instruct v0.1 uses a mixture-of-experts (MoE) architecture to balance efficiency and performance. Optimize retrieval by dynamically adjusting the number of retrieved documents based on query complexity to prevent overloading the context window. Structure prompts effectively, ensuring that critical details are at the start of the input to guide the model’s focus. Use a temperature of 0.1–0.3 for factual accuracy while tweaking top-k and top-p for balanced response generation. Together AI’s inference stack allows for optimized execution, so enable expert pruning to limit active pathways when full capacity isn’t needed. Implement caching strategies for common queries to minimize redundant processing. If integrating multiple models, use Mixtral 8x7B for medium-to-high complexity reasoning while offloading simpler queries to smaller, more efficient models.&lt;/p&gt;

&lt;h3&gt;
  
  
  OpenAI text-embedding-3-large optimization tips
&lt;/h3&gt;

&lt;p&gt;OpenAI text-embedding-3-large is a high-capacity embedding model designed for precise and rich semantic representation, making it ideal for RAG systems with complex document retrieval needs. Optimize efficiency by preprocessing and normalizing text to reduce noise before embedding generation. Use dimensionality reduction techniques, such as PCA, if storage or computational limits become a concern. When querying, leverage HNSW-based approximate nearest neighbor (ANN) search to accelerate retrieval while maintaining accuracy. Batch process embedding requests to reduce latency and optimize resource utilization. Implement re-ranking models to further refine top results based on query context. Regularly update the embedding store with newly ingested data to maintain retrieval relevance.&lt;/p&gt;

&lt;p&gt;By implementing these tips across your components, you'll be able to enhance the performance and functionality of your RAG system, ensuring it’s optimized for both speed and accuracy. Keep testing, iterating, and refining your setup to stay ahead in the ever-evolving world of AI development.&lt;/p&gt;
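&lt;p&gt;The re-ranking step mentioned above retrieves a wide candidate set by vector score, then re-orders it with a second, more precise scorer. The word-overlap scorer below is a toy stand-in for a real cross-encoder re-ranker:&lt;/p&gt;

```python
# Re-rank retrieved candidates with a second scorer.
def overlap_score(query, doc):
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q.intersection(d)) / max(len(q), 1)

def rerank(query, candidates, k=2):
    return sorted(candidates, key=lambda doc: -overlap_score(query, doc))[:k]

candidates = [
    "Milvus supports sparse and binary vectors",
    "LangChain chains LLM calls together",
    "Milvus is a vector database",
]
top = rerank("What vectors does Milvus support?", candidates, k=1)
```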

&lt;h2&gt;
  
  
  RAG Cost Calculator: A Free Tool to Calculate Your Cost in Seconds
&lt;/h2&gt;

&lt;p&gt;Estimating the cost of a Retrieval-Augmented Generation (RAG) pipeline involves analyzing expenses across vector storage, compute resources, and API usage. Key cost drivers include vector database queries, embedding generation, and LLM inference.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/rag-cost-calculator/" rel="noopener noreferrer"&gt;RAG Cost Calculator&lt;/a&gt; is a free tool that quickly estimates the cost of building a RAG pipeline, including chunking, embedding, vector storage/search, and LLM generation. It also helps you identify cost-saving opportunities and achieve up to 10x cost reduction on vector databases with the serverless option.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/rag-cost-calculator/" rel="noopener noreferrer"&gt;Calculate your RAG cost now.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0i8l16pgrop4ohq1ny1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0i8l16pgrop4ohq1ny1.png" alt="Calculate your RAG cost" width="800" height="404"&gt;&lt;/a&gt;Calculate your RAG cost&lt;/p&gt;

&lt;h2&gt;
  
  
  What Have You Learned?
&lt;/h2&gt;

&lt;p&gt;This tutorial has taken you on an exciting journey through the integration of a powerful framework, a vector database, a state-of-the-art large language model (LLM), and an innovative embedding model to build a cutting-edge Retrieval-Augmented Generation (RAG) system. You've seen how the framework elegantly ties everything together, orchestrating the flow of data and commands like a maestro leading a symphony.&lt;/p&gt;

&lt;p&gt;With Milvus as your vector database, you've harnessed the power of speedy and efficient searches, allowing your system to quickly locate relevant information without breaking a sweat. The Together AI Mixtral 8x7B Instruct model has shown you how to infuse conversational intelligence into applications, helping your users interact with utmost ease and understanding. Meanwhile, the OpenAI text-embedding-3-large model has equipped you with the capability to create rich semantic representations, ensuring that your data isn’t just accurate but deeply meaningful.&lt;/p&gt;

&lt;p&gt;Don’t forget the optimization tips you've picked up along the way, and the handy free cost calculator that empowers you to budget your resources wisely. Now, it’s time to roll up your sleeves and dive into building, optimizing, and innovating your own RAG applications! The possibilities are immense, and with the knowledge you’ve gained, you’re well-prepared to create solutions that can truly make a difference. Go ahead, let your creativity run wild—your RAG adventure starts now!&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Resources
&lt;/h2&gt;

&lt;p&gt;🌟 In addition to this RAG tutorial, unleash your full potential with these incredible resources to level up your RAG skills.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://milvus.io/docs/multimodal_rag_with_milvus.md?__hstc=220948871.b077ca1e9148f3f681770b3bfdf39e1f.1740691563293.1741036355418.1741044630283.13&amp;amp;__hssc=220948871.4.1741044630283&amp;amp;__hsfp=2816761639" rel="noopener noreferrer"&gt;How to Build a Multimodal RAG&lt;/a&gt; | Documentation&lt;/li&gt;
&lt;li&gt;&lt;a href="https://milvus.io/docs/how_to_enhance_your_rag.md?__hstc=220948871.b077ca1e9148f3f681770b3bfdf39e1f.1740691563293.1741036355418.1741044630283.13&amp;amp;__hssc=220948871.4.1741044630283&amp;amp;__hsfp=2816761639" rel="noopener noreferrer"&gt;How to Enhance the Performance of Your RAG Pipeline&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://milvus.io/docs/graph_rag_with_milvus.md?__hstc=220948871.b077ca1e9148f3f681770b3bfdf39e1f.1740691563293.1741036355418.1741044630283.13&amp;amp;__hssc=220948871.4.1741044630283&amp;amp;__hsfp=2816761639" rel="noopener noreferrer"&gt;Graph RAG with Milvus&lt;/a&gt; | Documentation&lt;/li&gt;
&lt;li&gt;&lt;a href="https://zilliz.com/learn/How-To-Evaluate-RAG-Applications" rel="noopener noreferrer"&gt;How to Evaluate RAG Applications - Zilliz Learn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://zilliz.com/learn/generative-ai" rel="noopener noreferrer"&gt;Generative AI Resource Hub | Zilliz&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  We'd Love to Hear What You Think!
&lt;/h2&gt;

&lt;p&gt;We’d love to hear your thoughts! 🌟 Leave your questions or comments below or join our vibrant &lt;a href="https://discord.com/invite/milvus" rel="noopener noreferrer"&gt;Milvus Discord community&lt;/a&gt; to share your experiences, ask questions, or connect with thousands of AI enthusiasts. Your journey matters to us!&lt;/p&gt;

&lt;p&gt;If you like this tutorial, show your support by giving our &lt;a href="https://github.com/milvus-io/milvus" rel="noopener noreferrer"&gt;Milvus GitHub&lt;/a&gt; repo a star ⭐—it means the world to us and inspires us to keep creating! 💖&lt;/p&gt;

</description>
      <category>openai</category>
      <category>rag</category>
      <category>vectordatabase</category>
      <category>ai</category>
    </item>
    <item>
      <title>RAG Chatbot: Build with LangChain, Milvus, Fireworks AI 🔥Llama 3.1 8B Instruct, and Cohere embed-multilingual-v2.0</title>
      <dc:creator>Chloe Williams</dc:creator>
      <pubDate>Mon, 03 Mar 2025 23:51:21 +0000</pubDate>
      <link>https://dev.to/zilliz/rag-chatbot-build-with-langchain-milvus-fireworks-ai-llama-31-8b-instruct-and-cohere-34hg</link>
      <guid>https://dev.to/zilliz/rag-chatbot-build-with-langchain-milvus-fireworks-ai-llama-31-8b-instruct-and-cohere-34hg</guid>
      <description>&lt;h2&gt;
  
  
  Introduction to RAG
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/learn/Retrieval-Augmented-Generation" rel="noopener noreferrer"&gt;Retrieval-Augmented Generation (RAG)&lt;/a&gt; is a game-changer for GenAI applications, especially in conversational AI. It combines the power of pre-trained large language models (&lt;a href="https://zilliz.com/glossary/large-language-models-(llms)" rel="noopener noreferrer"&gt;LLMs&lt;/a&gt;) like OpenAI’s GPT with external knowledge sources stored in &lt;a href="https://zilliz.com/learn/what-is-vector-database" rel="noopener noreferrer"&gt;vector databases&lt;/a&gt; such as &lt;a href="https://zilliz.com/what-is-milvus" rel="noopener noreferrer"&gt;Milvus&lt;/a&gt; and &lt;a href="https://zilliz.com/cloud" rel="noopener noreferrer"&gt;Zilliz Cloud&lt;/a&gt;, allowing for more accurate, contextually relevant, and up-to-date response generation. A RAG pipeline usually consists of four basic components: a vector database, an embedding model, an LLM, and a framework.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Components We'll Use for This RAG Chatbot
&lt;/h2&gt;

&lt;p&gt;This tutorial shows you how to build a simple RAG chatbot in Python using the following components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://zilliz.com/blog/langchain-ultimate-guide-getting-started" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;: An open-source framework that helps you orchestrate the interaction between LLMs, vector stores, embedding models, etc, making it easier to integrate a RAG pipeline.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://milvus.io/?__hstc=220948871.b077ca1e9148f3f681770b3bfdf39e1f.1740691563293.1741036355418.1741044630283.13&amp;amp;__hssc=220948871.3.1741044630283&amp;amp;__hsfp=2816761639" rel="noopener noreferrer"&gt;Milvus&lt;/a&gt;: An open-source vector database optimized to store, index, and search large-scale vector embeddings efficiently, perfect for use cases like RAG, semantic search, and recommender systems. If you hate to manage your own infrastructure, we recommend using &lt;a href="https://zilliz.com/cloud" rel="noopener noreferrer"&gt;Zilliz Cloud&lt;/a&gt;, which is a fully managed vector database service built on Milvus and offers a free tier supporting up to 1 million vectors.&lt;/li&gt;
&lt;li&gt;Fireworks AI Llama 3.1 8B Instruct: This model is designed to deliver precise instructions and guidance through advanced reasoning capabilities. With its 8 billion parameters, it excels in generating coherent responses across various domains, making it ideal for educational tools, virtual assistants, and interactive content creation. Its strength lies in user engagement through personalized interactions.&lt;/li&gt;
&lt;li&gt;Cohere embed-multilingual-v2.0: This model specializes in generating high-quality multilingual embeddings, enabling effective cross-lingual understanding and retrieval. Its strengths lie in capturing semantic relationships in diverse languages, making it suitable for applications such as multilingual search, recommendation systems, and global content analysis where language diversity is a critical factor.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the end of this tutorial, you’ll have a functional chatbot capable of answering questions based on a custom knowledge base.&lt;/p&gt;

&lt;p&gt;Note: Since we may use proprietary models in our tutorials, make sure you have the required API key beforehand.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Install and Set Up LangChain
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;%pip install --quiet --upgrade langchain-text-splitters langchain-community langgraph
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Step 2: Install and Set Up Fireworks AI Llama 3.1 8B Instruct
&lt;/h2&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -qU "langchain[fireworks]"


import getpass
import os

if not os.environ.get("FIREWORKS_API_KEY"):
  os.environ["FIREWORKS_API_KEY"] = getpass.getpass("Enter API key for Fireworks AI: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("accounts/fireworks/models/llama-v3p1-8b-instruct", model_provider="fireworks")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Step 3: Install and Set Up Cohere embed-multilingual-v2.0
&lt;/h2&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -qU langchain-cohere


import getpass
import os

if not os.environ.get("COHERE_API_KEY"):
  os.environ["COHERE_API_KEY"] = getpass.getpass("Enter API key for Cohere: ")

from langchain_cohere import CohereEmbeddings

embeddings = CohereEmbeddings(model="embed-multilingual-v2.0")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Step 4: Install and Set Up Milvus
&lt;/h2&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -qU langchain-milvus


from langchain_milvus import Milvus

vector_store = Milvus(embedding_function=embeddings)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Step 5: Build a RAG Chatbot
&lt;/h2&gt;

&lt;p&gt;Now that you’ve set up all components, let’s start to build a simple chatbot. We’ll use the &lt;a href="https://milvus.io/docs/overview.md?__hstc=220948871.b077ca1e9148f3f681770b3bfdf39e1f.1740691563293.1741036355418.1741044630283.13&amp;amp;__hssc=220948871.3.1741044630283&amp;amp;__hsfp=2816761639" rel="noopener noreferrer"&gt;Milvus introduction doc&lt;/a&gt; as a private knowledge base. You can replace it with your own dataset to customize your RAG chatbot.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://milvus.io/docs/overview.md",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("doc-style doc-post-content")
        )
    ),
)

docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

# Index chunks
_ = vector_store.add_documents(documents=all_splits)

# Define prompt for question-answering
prompt = hub.pull("rlm/rag-prompt")


# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


# Define application steps
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Test the Chatbot
&lt;/h3&gt;

&lt;p&gt;You've now built your own chatbot. Let's ask it a question.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response = graph.invoke({"question": "What data types does Milvus support?"})
print(response["answer"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Example Output
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Milvus supports various data types including sparse vectors, binary vectors, JSON, and arrays. Additionally, it handles common numerical and character types, making it versatile for different data modeling needs. This allows users to manage unstructured or multi-modal data efficiently.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Optimization Tips
&lt;/h2&gt;

&lt;p&gt;As you build your RAG system, optimization is key to ensuring peak performance and efficiency. While setting up the components is an essential first step, fine-tuning each one will help you create a solution that works even better and scales seamlessly. In this section, we’ll share some practical tips for optimizing all these components, giving you the edge to build smarter, faster, and more responsive RAG applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  LangChain Optimization Tips
&lt;/h3&gt;

&lt;p&gt;To optimize LangChain, focus on minimizing redundant operations in your workflow by structuring your chains and agents efficiently. Use caching to avoid repeated computations, speeding up your system, and experiment with modular design to ensure that components like models or databases can be easily swapped out. This will provide both flexibility and efficiency, allowing you to quickly scale your system without unnecessary delays or complications.&lt;/p&gt;
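&lt;p&gt;As a rough, framework-agnostic illustration of the caching idea, the sketch below memoizes answers so that repeated questions never re-run the expensive chain call; the answer_question function here is a hypothetical stand-in for a real chain invocation. LangChain also ships built-in LLM caches (for example, set_llm_cache with InMemoryCache); check the current LangChain docs for the exact import path.&lt;/p&gt;

```python
from functools import lru_cache

call_count = 0  # tracks how often the "expensive" chain actually runs

@lru_cache(maxsize=1024)
def answer_question(question: str) -> str:
    """Hypothetical stand-in for invoking a LangChain chain or LLM."""
    global call_count
    call_count += 1
    return f"answer to: {question}"

# The second identical question hits the cache instead of re-running the chain.
answer_question("What is Milvus?")
answer_question("What is Milvus?")
print(call_count)  # 1
```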

&lt;h3&gt;
  
  
  Milvus optimization tips
&lt;/h3&gt;

&lt;p&gt;Milvus serves as a highly efficient vector database, critical for retrieval tasks in a RAG system. To optimize its performance, ensure that indexes are properly built to balance speed and accuracy; consider utilizing HNSW (Hierarchical Navigable Small World) for efficient nearest neighbor search where response time is crucial. Partitioning data based on usage patterns can enhance query performance and reduce load times, enabling better scalability. Regularly monitor and adjust cache settings based on query frequency to avoid latency during data retrieval. Employ batch processing for vector insertions, which can minimize database lock contention and enhance overall throughput. Additionally, fine-tune the model parameters by experimenting with the dimensionality of the vectors; higher dimensions can improve retrieval accuracy but may increase search time, necessitating a balance tailored to your specific use case and hardware infrastructure.&lt;/p&gt;
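&lt;p&gt;To make the HNSW suggestion concrete, here is a minimal sketch of index and search parameters for a Milvus collection. The specific values are illustrative starting points rather than tuned recommendations, and the commented pymilvus calls assume a collection with a vector field named "embedding".&lt;/p&gt;

```python
# Sketch of HNSW index and search parameters for a Milvus collection.
index_params = {
    "index_type": "HNSW",
    "metric_type": "COSINE",
    "params": {
        "M": 16,               # graph connectivity: higher means better recall, more memory
        "efConstruction": 200  # build-time search width: higher means a better graph, slower build
    },
}
search_params = {"metric_type": "COSINE", "params": {"ef": 64}}  # query-time search width

# With pymilvus, this would be applied roughly as:
# collection.create_index(field_name="embedding", index_params=index_params)
# results = collection.search(data=[query_vec], anns_field="embedding",
#                             param=search_params, limit=5)
print(index_params["index_type"])  # HNSW
```

Raising ef at query time trades latency for recall, so it is usually the first knob to experiment with.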

&lt;h3&gt;
  
  
  Fireworks AI Llama 3.1 8B Instruct optimization tips
&lt;/h3&gt;

&lt;p&gt;Llama 3.1 8B Instruct is a cost-efficient model that delivers strong performance in RAG applications with moderate complexity. Optimize retrieval by limiting context length to only the most relevant passages, ensuring efficient token usage. Structure prompts clearly, with short, well-organized sections that guide the model’s focus. Keep temperature around 0.1–0.3 for accuracy and fine-tune top-k and top-p for flexibility. Cache high-frequency queries to minimize redundant processing and reduce API costs. Take advantage of Fireworks AI’s infrastructure to batch requests, optimizing efficiency for large-scale operations. Use response streaming to enhance interactivity in applications requiring fast feedback. If deploying multiple models, leverage 8B for simple queries and hand off more complex tasks to larger models.&lt;/p&gt;
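&lt;p&gt;The last tip, routing simple queries to the 8B model and handing harder ones to a larger model, can be sketched with a simple heuristic. The thresholds below are arbitrary, and the 70B model ID is an assumption; check the Fireworks AI model catalog for current model names.&lt;/p&gt;

```python
# Sketch: send short, single questions to the 8B model and longer or
# multi-part questions to a larger model. The heuristic thresholds and
# the 70B model ID below are assumptions, not Fireworks recommendations.
SMALL_MODEL = "accounts/fireworks/models/llama-v3p1-8b-instruct"
LARGE_MODEL = "accounts/fireworks/models/llama-v3p1-70b-instruct"

def pick_model(question: str) -> str:
    word_count = len(question.split())
    multi_part = question.count("?") > 1 or ";" in question
    return LARGE_MODEL if (word_count > 40 or multi_part) else SMALL_MODEL

print(pick_model("What data types does Milvus support?") == SMALL_MODEL)  # True
```

The chosen model ID can then be passed to init_chat_model exactly as in Step 2.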

&lt;h3&gt;
  
  
  Cohere embed-multilingual-v2.0 optimization tips
&lt;/h3&gt;

&lt;p&gt;Cohere embed-multilingual-v2.0 supports a variety of languages, making it ideal for cross-lingual RAG setups. To optimize efficiency, preprocess text to remove language-specific noise and handle encoding issues, ensuring clean input for embedding generation. Implement efficient ANN algorithms, like FAISS with hierarchical indexing, to support fast retrieval across multilingual datasets. Compress embeddings using techniques such as product quantization or HNSW to optimize storage and speed. Use language detection models to route queries to the appropriate language-specific embeddings, minimizing unnecessary computation. Batch embedding operations and take advantage of parallel processing to handle large amounts of multilingual data efficiently. Regularly update embeddings to ensure the model reflects any language shifts or evolving trends.&lt;/p&gt;
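&lt;p&gt;A minimal sketch of the batching tip: split your documents into fixed-size batches before calling the embedding API. Cohere's embed endpoint accepts a limited number of texts per request (96 at the time of writing; check the current API docs), so 96 is used here as an assumed batch size.&lt;/p&gt;

```python
# Sketch: split documents into batches before calling the embedding API.
def batched(texts, batch_size=96):
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

docs = [f"doc {i}" for i in range(200)]
batches = list(batched(docs))
print([len(b) for b in batches])  # [96, 96, 8]

# Each batch would then be embedded in one call, e.g.
# vectors = embeddings.embed_documents(batch)  # CohereEmbeddings from Step 3
```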

&lt;p&gt;By implementing these tips across your components, you'll be able to enhance the performance and functionality of your RAG system, ensuring it’s optimized for both speed and accuracy. Keep testing, iterating, and refining your setup to stay ahead in the ever-evolving world of AI development.&lt;/p&gt;

&lt;h2&gt;
  
  
  RAG Cost Calculator: A Free Tool to Calculate Your Cost in Seconds
&lt;/h2&gt;

&lt;p&gt;Estimating the cost of a Retrieval-Augmented Generation (RAG) pipeline involves analyzing expenses across vector storage, compute resources, and API usage. Key cost drivers include vector database queries, embedding generation, and LLM inference.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/rag-cost-calculator/" rel="noopener noreferrer"&gt;RAG Cost Calculator&lt;/a&gt; is a free tool that quickly estimates the cost of building a RAG pipeline, including chunking, embedding, vector storage/search, and LLM generation. It also helps you identify cost-saving opportunities and achieve up to 10x cost reduction on vector databases with the serverless option.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zilliz.com/rag-cost-calculator/" rel="noopener noreferrer"&gt;Calculate your RAG cost now.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0i8l16pgrop4ohq1ny1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0i8l16pgrop4ohq1ny1.png" alt="Calculate your RAG cost" width="800" height="404"&gt;&lt;/a&gt;Calculate your RAG cost&lt;/p&gt;

&lt;h2&gt;
  
  
  What Have You Learned?
&lt;/h2&gt;

&lt;p&gt;Wow, what an incredible journey we've taken together through the world of Retrieval-Augmented Generation (RAG)! You’ve successfully integrated a powerful framework with a cutting-edge vector database, an impressive large language model, and a sophisticated embedding model to create a next-gen RAG system. The joy of seeing these components work together is just fantastic, isn't it?&lt;/p&gt;

&lt;p&gt;You explored how the framework elegantly ties all the parts together, creating seamless workflows that make your projects feel more like magic than mere code. The lightning-fast searches powered by the vector database not only enhance performance but open up a universe of possibilities for retrieving relevant information at remarkable speeds! With the conversational intelligence provided by the LLM—Fireworks AI Llama 3.1—you can engage users like never before, making interactions feel natural and intuitive.&lt;/p&gt;

&lt;p&gt;Furthermore, the embedding model, Cohere embed-multilingual-v2.0, has given you remarkable capabilities in generating rich semantic representations, enabling you to capture nuances in language that can significantly enhance user experience. And let's not forget those handy optimization tips and that free cost calculator—tools designed to ensure you get the most value from your RAG application.&lt;/p&gt;

&lt;p&gt;So, what's next? Don’t let your newfound knowledge sit idle! Dive in, start building, optimizing, and innovating your own RAG applications. The world is eager for fresh ideas, and with the skills you’ve acquired, you’re more than ready to make a difference. Go ahead and unleash your creativity—your adventure in AI has just begun!&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Resources
&lt;/h2&gt;

&lt;p&gt;🌟 In addition to this RAG tutorial, unleash your full potential with these incredible resources to level up your RAG skills.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://milvus.io/docs/multimodal_rag_with_milvus.md?__hstc=220948871.b077ca1e9148f3f681770b3bfdf39e1f.1740691563293.1741036355418.1741044630283.13&amp;amp;__hssc=220948871.3.1741044630283&amp;amp;__hsfp=2816761639" rel="noopener noreferrer"&gt;How to Build a Multimodal RAG&lt;/a&gt; | Documentation&lt;/li&gt;
&lt;li&gt;&lt;a href="https://milvus.io/docs/how_to_enhance_your_rag.md?__hstc=220948871.b077ca1e9148f3f681770b3bfdf39e1f.1740691563293.1741036355418.1741044630283.13&amp;amp;__hssc=220948871.3.1741044630283&amp;amp;__hsfp=2816761639" rel="noopener noreferrer"&gt;How to Enhance the Performance of Your RAG Pipeline&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://milvus.io/docs/graph_rag_with_milvus.md?__hstc=220948871.b077ca1e9148f3f681770b3bfdf39e1f.1740691563293.1741036355418.1741044630283.13&amp;amp;__hssc=220948871.3.1741044630283&amp;amp;__hsfp=2816761639" rel="noopener noreferrer"&gt;Graph RAG with Milvus&lt;/a&gt; | Documentation&lt;/li&gt;
&lt;li&gt;&lt;a href="https://zilliz.com/learn/How-To-Evaluate-RAG-Applications" rel="noopener noreferrer"&gt;How to Evaluate RAG Applications - Zilliz Learn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://zilliz.com/learn/generative-ai" rel="noopener noreferrer"&gt;Generative AI Resource Hub | Zilliz&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  We'd Love to Hear What You Think!
&lt;/h2&gt;

&lt;p&gt;We’d love to hear your thoughts! 🌟 Leave your questions or comments below or join our vibrant &lt;a href="https://discord.com/invite/milvus" rel="noopener noreferrer"&gt;Milvus Discord community&lt;/a&gt; to share your experiences, ask questions, or connect with thousands of AI enthusiasts. Your journey matters to us!&lt;/p&gt;

&lt;p&gt;If you like this tutorial, show your support by giving our &lt;a href="https://github.com/milvus-io/milvus" rel="noopener noreferrer"&gt;Milvus GitHub&lt;/a&gt; repo a star ⭐—it means the world to us and inspires us to keep creating! 💖&lt;/p&gt;

</description>
      <category>rag</category>
      <category>vectordatabase</category>
      <category>llm</category>
      <category>ai</category>
    </item>
    <item>
      <title>Scaling Audio Similarity Search with Vector Databases</title>
      <dc:creator>Chloe Williams</dc:creator>
      <pubDate>Sat, 01 Mar 2025 00:01:50 +0000</pubDate>
      <link>https://dev.to/zilliz/scaling-audio-similarity-search-with-vector-databases-2nn3</link>
      <guid>https://dev.to/zilliz/scaling-audio-similarity-search-with-vector-databases-2nn3</guid>
      <description>&lt;p&gt;Imagine being able to find a song you can’t quite remember—just by humming a few notes into an app and instantly having all the details pop up. Sounds like magic, right? Well, it’s not—it's audio similarity search in action. In today’s world of exponential data growth, where audio content is exploding, efficient audio similarity search is crucial for powering everything from music recommendations to real-time content retrieval and even complex audio classifications. As the sheer volume of audio data soars into the millions (and even billions), traditional search methods simply can’t keep up. Enter vector databases, the game-changer in enabling scalable and ultra-fast similarity searches by turning audio signals into high-dimensional embeddings. Let’s dig into how vector databases make large-scale audio similarity search a reality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Audio Similarity Search 
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is audio similarity search? 
&lt;/h3&gt;

&lt;p&gt;At its core, &lt;a href="https://milvus.io/docs/audio_similarity_search.md?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;audio similarity search&lt;/a&gt; involves finding and retrieving audio that closely matches a given query. Instead of relying on traditional keyword searches, which depend on metadata or transcriptions, this technology uses machine learning models to analyze audio characteristics like pitch, timbre, rhythm, and more, offering a much more nuanced and accurate retrieval.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common use-cases 
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Music Recommendation&lt;/strong&gt; - Apps such as Spotify analyze audio features of the songs played often to suggest similar tracks, enhancing user experience. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Podcast Search&lt;/strong&gt; - Users can easily look for podcasts with similar content, voices, tones, or themes based on their preferences. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Speech Similarity&lt;/strong&gt; - Used in security applications and voice assistants to detect speaker identity or match spoken phrases. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Environmental Sound Recognition&lt;/strong&gt; - Used for wildlife monitoring by recognizing animal calls or for disaster response management by tracking the severity of earthquakes or landslides through audio cues. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Challenges in traditional audio search 
&lt;/h3&gt;

&lt;p&gt;Traditionally, audio search has relied on keywords: manually assigned tags or transcriptions of the audio data. This approach requires very precise metadata and ignores rich acoustic features, making retrieval difficult and inaccurate. Additionally, as datasets grow, manually tagging and indexing audio files becomes impractical. Modern approaches built on embeddings and vector databases make large-scale audio search possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Magic of Vector Databases in Audio Search
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is a vector database, anyway? 
&lt;/h3&gt;

&lt;p&gt;A vector database is a specialized database that can store, index, and retrieve any kind of unstructured data (text, images, video, or audio) in the form of &lt;a href="https://zilliz.com/glossary/vector-embeddings?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;vector embeddings&lt;/a&gt;. Embeddings are high-dimensional numerical representations (vectors) that capture the essential features of the data. They enable similarity search by mathematically comparing a query vector with the stored vectors, allowing efficient and accurate retrieval. Vector databases offer scalability for large datasets, real-time processing, and high-speed retrieval, making real-world applications possible.&lt;/p&gt;
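&lt;p&gt;The "mathematical comparison" is typically a distance or similarity measure between vectors. Here is a minimal cosine-similarity sketch with toy 3-dimensional vectors; real audio embeddings have hundreds or thousands of dimensions:&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.9, 0.1, 0.0]    # embedding of the query clip
clip_a = [1.0, 0.0, 0.0]   # acoustically similar clip
clip_b = [0.0, 0.0, 1.0]   # very different clip

print(cosine_similarity(query, clip_a) > cosine_similarity(query, clip_b))  # True
```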

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnwkzb7fnseqvls01hdtt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnwkzb7fnseqvls01hdtt.png" width="800" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Creating vector embeddings from unstructured data&lt;/p&gt;

&lt;h3&gt;
  
  
  How do vector databases store and index embeddings? 
&lt;/h3&gt;

&lt;p&gt;A vector embedding is stored in a vector database along with its metadata, which assists in efficient retrieval. Vector indexing organizes the stored embeddings so that search time is minimized. Common indexing techniques are &lt;a href="https://zilliz.com/learn/vector-index?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;IVF (Inverted File Index)&lt;/a&gt; and &lt;a href="https://zilliz.com/learn/hierarchical-navigable-small-worlds-HNSW?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;HNSW (Hierarchical Navigable Small World)&lt;/a&gt;; both partition the dataset to minimize search time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Popular vector databases - Milvus and Zilliz Cloud
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://milvus.io/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Milvus&lt;/a&gt; is an open-source vector database that supports GPU acceleration and &lt;a href="https://zilliz.com/glossary/anns?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Approximate Nearest Neighbor (ANN)&lt;/a&gt; algorithms like HNSW, IVF, and PQ, making it ideal for applications such as audio similarity search, image retrieval, and recommendation systems. &lt;a href="https://zilliz.com/cloud?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Zilliz Cloud&lt;/a&gt; is the fully managed, cloud-native version of Milvus, offering a serverless infrastructure with auto-scaling, high availability, and enterprise-grade security. These databases enable efficient handling of large-scale vector search tasks with minimal operational overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Audio Embeddings Enable Similarity Search 
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What are audio embeddings?
&lt;/h3&gt;

&lt;p&gt;Audio embeddings are numerical representations of audio signals that capture key sound characteristics such as pitch, tempo, rhythm, and timbre. These embeddings enable direct comparison of audio clips based on their inherent acoustic characteristics instead of relying on textual metadata. &lt;/p&gt;

&lt;h3&gt;
  
  
  What are the different techniques to generate audio embeddings?
&lt;/h3&gt;

&lt;p&gt;Before creating embeddings, the raw audio signals undergo preprocessing steps such as resampling (standardizing the sample rate for consistency), noise reduction (removing unwanted background sounds), and segmentation (dividing audio into meaningful chunks). &lt;/p&gt;
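&lt;p&gt;As a small illustration of the segmentation step, the sketch below slices a sample array into fixed-length, overlapping windows. The window and hop sizes are arbitrary toy values; real systems derive them from the sample rate:&lt;/p&gt;

```python
# Sketch: segment a mono sample array into fixed-length, overlapping windows.
def segment(samples, window_size, hop_size):
    segments = []
    for start in range(0, len(samples) - window_size + 1, hop_size):
        segments.append(samples[start:start + window_size])
    return segments

samples = list(range(10))  # stand-in for audio samples
windows = segment(samples, window_size=4, hop_size=2)
print(windows)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```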

&lt;p&gt;Next, key audio features are extracted using different techniques:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mel-Frequency Cepstral Coefficients (MFCCs)&lt;/strong&gt;: These features mimic human auditory perception by capturing the spectral shape of a sound, making them useful for speech and music analysis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Spectrograms&lt;/strong&gt;: A visual representation of how frequency content varies over time, highlighting variations in pitch, intensity, and harmonic structure; spectrograms are widely used as input for deep learning models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Chroma-based Features&lt;/strong&gt;: These capture the tonal content of an audio signal by emphasizing pitch class distribution.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once features are extracted, deep learning-based models further process them to generate high-dimensional embeddings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OpenL3&lt;/strong&gt;: A deep audio representation model trained on multimodal datasets, capturing a wide range of audio patterns for tasks like environmental sound recognition and music similarity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;YAMNet&lt;/strong&gt;: A model based on MobileNet, trained on the AudioSet dataset, which classifies and extracts embeddings for over 500 sound categories, including speech, instruments, and ambient noises.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;VGGish&lt;/strong&gt;: A deep neural network inspired by VGG, trained on YouTube videos, designed to extract generic audio features applicable to tasks like audio event detection and content-based retrieval.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once embeddings are generated, they are stored and indexed in a vector database, allowing for fast and scalable similarity search. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnn4pbl47uxk9sl0p01g0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnn4pbl47uxk9sl0p01g0.png" width="800" height="229"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Audio similarity search with Zilliz Cloud&lt;/p&gt;

&lt;h2&gt;
  
  
  Scaling Audio Similarity Search with Vector Databases
&lt;/h2&gt;

&lt;p&gt;Since audio datasets can contain millions of files, efficient search and retrieval becomes a challenge. Vector databases play a crucial role in making audio search systems scalable by offering advanced search algorithms and optimized indexing strategies.&lt;/p&gt;

&lt;h3&gt;
  
  
  Managing Large-Scale Audio Datasets
&lt;/h3&gt;

&lt;p&gt;Handling massive audio datasets is possible with the help of techniques such as &lt;a href="https://zilliz.com/glossary/batch-processing?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;batch processing&lt;/a&gt;, distributed storage, and GPU-accelerated indexing offered by vector databases. They allow the processing of large volumes of audio embeddings without compromising performance. &lt;/p&gt;

&lt;h3&gt;
  
  
  Indexing Strategies for Efficient Search
&lt;/h3&gt;

&lt;p&gt;Vector databases optimize similarity search using indexing techniques like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HNSW (Hierarchical Navigable Small World)&lt;/strong&gt; - A graph-based indexing method that builds multiple layers of proximity-based connections between embeddings. The top layers contain sparsely connected nodes, whereas the lower layers have denser connections. When a query comes in, the search traverses the graph from the top layer down, moving toward ever-closer neighbors at each layer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1byz6nkxei0kwxwrkfkq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1byz6nkxei0kwxwrkfkq.png" width="800" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Search in HNSW algorithm (&lt;a href="https://towardsdatascience.com/similarity-search-part-4-hierarchical-navigable-small-world-hnsw-2aad4fe87d37/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;Source&lt;/a&gt;)&lt;/p&gt;
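To build intuition for the traversal shown above, here is a minimal sketch of greedy best-first search on a single proximity-graph layer. Real HNSW maintains multiple layers and a candidate list; the toy graph and 2-D vectors below are invented purely for illustration:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_search(graph, vectors, query, entry_point):
    """Greedy best-first search on one proximity-graph layer.

    graph: node -> list of neighbor nodes; vectors: node -> embedding.
    Repeatedly moves to the closest neighbor until no neighbor is closer,
    which is how HNSW descends toward the query within each layer.
    """
    current = entry_point
    current_dist = euclidean(vectors[current], query)
    while True:
        best, best_dist = current, current_dist
        for neighbor in graph[current]:
            d = euclidean(vectors[neighbor], query)
            if d < best_dist:
                best, best_dist = neighbor, d
        if best == current:  # local minimum: no neighbor is closer
            return current, current_dist
        current, current_dist = best, best_dist

# Toy 2-D "embeddings" and a small neighbor graph (hypothetical data).
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 1.0), 3: (3.0, 3.0)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
node, dist = greedy_search(graph, vectors, (2.1, 1.2), entry_point=0)
```

The search walks 0 → 1 → 2 and stops at node 2, the vector closest to the query.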

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;IVF (Inverted File Index)&lt;/strong&gt; - It splits the dataset into clusters using techniques like k-means clustering; when a new query arrives, the most similar cluster is identified first and the search continues only within it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PQ (Product Quantization)&lt;/strong&gt; - It compresses high-dimensional vectors into smaller sub-vectors, improving storage efficiency and search speed.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
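A toy illustration of the IVF idea, with hand-picked centroids standing in for the k-means step (the vectors and centroids below are invented for illustration):

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, centroids):
    """Assign each vector to its nearest centroid, forming inverted lists."""
    lists = {i: [] for i in range(len(centroids))}
    for vid, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda i: dist(v, centroids[i]))
        lists[nearest].append(vid)
    return lists

def ivf_search(query, vectors, centroids, lists):
    """Probe only the nearest cluster, then scan its inverted list exhaustively."""
    probe = min(range(len(centroids)), key=lambda i: dist(query, centroids[i]))
    return min(lists[probe], key=lambda vid: dist(query, vectors[vid]))

# Toy 2-D embeddings grouped around two hypothetical centroids.
vectors = [(0.1, 0.2), (0.3, 0.1), (5.2, 5.1), (4.9, 5.3)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
lists = build_ivf(vectors, centroids)
best = ivf_search((5.0, 5.2), vectors, centroids, lists)
```

Only the vectors in the probed cluster are compared against the query, which is what makes IVF faster than a full scan; production systems usually probe several clusters to trade speed for recall.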

&lt;h3&gt;
  
  
  Handling High-Dimensional Data
&lt;/h3&gt;

&lt;p&gt;Audio embeddings are often high-dimensional, leading to the &lt;a href="https://zilliz.com/glossary/curse-of-dimensionality-in-machine-learning?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;curse of dimensionality&lt;/a&gt;, which causes increased computational cost and less effective indexing. Therefore, dimensionality reduction techniques such as Principal Component Analysis (PCA) or t-SNE can help reduce dimensions without losing critical audio features. Additionally, quantization techniques such as Product Quantization (PQ) and Scalar Quantization (SQ) can compress vectors to make them storage efficient. &lt;/p&gt;
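As an example of how quantization shrinks storage, here is a minimal scalar quantization sketch that maps each float to an 8-bit code over the vector's own range. The values are invented for illustration; production systems such as Milvus implement SQ and PQ internally:

```python
def scalar_quantize(vec, bits=8):
    """Map each component to an integer code in [0, 2**bits - 1] over the vector's range."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / (2 ** bits - 1) if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in vec]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Approximately reconstruct the original floats from the integer codes."""
    return [lo + c * scale for c in codes]

vec = [0.0, 0.5, 1.0, 0.25]
codes, lo, scale = scalar_quantize(vec)
approx = dequantize(codes, lo, scale)
# Each code now fits in one byte instead of a 4- or 8-byte float,
# at the cost of a small reconstruction error bounded by scale / 2.
```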

&lt;h3&gt;
  
  
  Performing Real-time Search
&lt;/h3&gt;

&lt;p&gt;All in all, vector databases enable real-time search by maintaining low latency through efficient Approximate Nearest Neighbor (ANN) search algorithms, fast indexing techniques, distributed processing, quantization, in-memory operations, GPU acceleration, and effective handling of high-dimensional data. &lt;/p&gt;

&lt;h2&gt;
  
  
  Tools and Frameworks for Scaling Audio Similarity Search 
&lt;/h2&gt;

&lt;p&gt;To build a scalable audio similarity search system, choosing the right vector database and suitable embedding models or libraries is crucial. Here’s how to pick the ones relevant to you. &lt;/p&gt;

&lt;h3&gt;
  
  
  Which vector database is appropriate for you?
&lt;/h3&gt;

&lt;p&gt;Vector databases store and index high-dimensional embeddings at the scale of millions or billions, making real-world applications fast and scalable. Some of the most popular vector databases are: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Milvus&lt;/strong&gt; - Milvus is a highly scalable vector database (supports billion-scale vector search) built for real-time search and retrieval with efficient indexing methods such as HNSW and IVF. It is ideal for enterprise applications or for someone wanting an open-source yet scalable option. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Zilliz Cloud&lt;/strong&gt; - It is a fully managed, cloud-native version of Milvus, optimized for seamless scaling and deployment. It supports serverless architecture and integrates easily with AWS, Google Cloud, and other cloud providers. It is ideal for teams without dedicated DevOps resources who want a plug-and-play vector search solution. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;FAISS (Facebook AI Similarity Search)&lt;/strong&gt; - It is Facebook’s open-source library for performing quick similarity searches leveraging GPU acceleration. It is best suitable for offline, batch-based similarity search and research applications. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Which audio embedding model should you choose?
&lt;/h3&gt;

&lt;p&gt;Audio embeddings transform raw audio into meaningful feature vectors that can be compared in a vector space. The following models provide pre-trained embeddings: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OpenL3&lt;/strong&gt;: A deep learning-based model that extracts general-purpose audio embeddings using self-supervised learning on multimodal datasets.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;VGGish&lt;/strong&gt;: A CNN-based model trained on YouTube-8M, commonly used for music and audio classification.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;YAMNet&lt;/strong&gt;: A MobileNet-based model trained on Google's AudioSet, specializing in environmental sound classification.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Other models like &lt;strong&gt;CLAP (Contrastive Language-Audio Pretraining)&lt;/strong&gt; and &lt;strong&gt;DEEP Audio Embeddings&lt;/strong&gt; provide domain-specific embeddings for speech processing and music retrieval. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
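Whichever model you choose, the embeddings it produces are compared the same way. Here is a minimal cosine-similarity sketch over mock embedding vectors; the 4-dimensional vectors are invented stand-ins for real model outputs, which are much larger (e.g. 512 dimensions for OpenL3, 128 for VGGish):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Mock "audio embeddings" (hypothetical values for illustration only).
clip_a = [0.9, 0.1, 0.3, 0.0]
clip_b = [0.8, 0.2, 0.4, 0.1]   # a similar-sounding clip
clip_c = [0.0, 0.9, 0.0, 0.8]   # a dissimilar clip
sim_ab = cosine_similarity(clip_a, clip_b)
sim_ac = cosine_similarity(clip_a, clip_c)
```

The similar pair scores close to 1.0 while the dissimilar pair scores near 0, which is the signal a vector database ranks on.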

&lt;h2&gt;
  
  
  Optimizing Performance and Efficiency in Large-Scale Audio Search 
&lt;/h2&gt;

&lt;p&gt;Performance and efficiency in large-scale audio search systems can be optimized by considering the following aspects. &lt;/p&gt;

&lt;h3&gt;
  
  
  Techniques to improve search speed and accuracy
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Approximate Nearest Neighbor (ANN) Search&lt;/strong&gt; - ANN algorithms quickly approximate the closest matches instead of exhaustively comparing every audio embedding. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Optimizing Memory Usage and Compute&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using dimensionality reduction techniques like PCA (Principal Component Analysis) or autoencoders reduces the size of embeddings, improving efficiency.&lt;/li&gt;
&lt;li&gt;Processing queries in batches instead of one at a time reduces computational overhead.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
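A sketch of the batching idea above: scoring many queries in a single pass over the database amortizes the per-scan overhead, instead of rescanning for every query (toy data, squared-L2 scoring, invented for illustration):

```python
def batch_nearest(queries, database):
    """One pass over the database scores every query, instead of one scan per query."""
    best = [(None, float("inf"))] * len(queries)
    for vid, vec in enumerate(database):
        for qi, q in enumerate(queries):
            d = sum((x - y) ** 2 for x, y in zip(q, vec))  # squared L2 distance
            if d < best[qi][1]:
                best[qi] = (vid, d)
    return [vid for vid, _ in best]

# Toy database of 2-D embeddings and a batch of two queries.
database = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
queries = [(0.1, 0.1), (4.8, 5.1)]
ids = batch_nearest(queries, database)
```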

&lt;h3&gt;
  
  
  Ways to balance accuracy with computational efficiency
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Tuning the index and search parameters of the vector database; for example, increasing the ‘ef’ search parameter in Milvus improves accuracy at the cost of some speed. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Using domain-specific embeddings and training custom models on task-specific datasets helps reduce noise and improve search quality. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
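To make the accuracy/efficiency trade-off concrete, here is what Milvus-style HNSW index and search parameter dictionaries typically look like. The values below are hypothetical starting points, not tuned recommendations:

```python
# Index-time parameters: built once when the collection is indexed.
index_params = {
    "index_type": "HNSW",
    "metric_type": "L2",
    # M = max graph degree; efConstruction = build-time candidate-list width.
    # Larger values give a better graph but a slower, bigger build.
    "params": {"M": 16, "efConstruction": 200},
}

# Search-time parameters: tunable per query without rebuilding the index.
# Raising `ef` widens the candidate list: higher recall, slower search.
search_params = {"metric_type": "L2", "params": {"ef": 64}}
```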

&lt;h3&gt;
  
  
  Techniques to reduce latency in real-time applications
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Preloading embeddings into memory, performing distributed search, and using multi-GPU processing are some of the ways to reduce latency and speed up operations. &lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Challenges and Considerations 
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Privacy and Security&lt;/strong&gt; - Audio data such as personal voice notes, biometric speech patterns, or medical audio must be carefully protected, as unauthorized access could lead to privacy violations. Encryption and secure access control mechanisms that allow fine-grained permission management (Zilliz Cloud, for instance, offers role-based access control) can be used to safeguard user data. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalability Challenges&lt;/strong&gt; - As the volume of audio datasets can keep increasing (millions to billions), the system must scale efficiently without compromising retrieval speed. Techniques like vector quantization, sharding, and HNSW indexing are essential to improve performance. Employing distributed storage solutions (Milvus deployed on Kubernetes) allows the system to handle high query loads while maintaining low latency. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model Drift&lt;/strong&gt; - Audio embeddings can become outdated as new sounds, voices, or music styles emerge, making the search system less accurate. Therefore, continuous retraining on fresh data is necessary to keep embeddings relevant. Implementing drift detection to monitor performance and embedding versioning to track updates can help keep search results accurate and up to date. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ethical Considerations&lt;/strong&gt; - Mitigating bias in audio datasets is essential to ensure fair results. An embedding model trained predominantly on certain accents or languages may not serve others well, leading to unfair retrieval results. Therefore, having diverse and representative data is crucial. Additionally, explainability techniques can provide transparency, helping users trust and interpret the results. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion 
&lt;/h2&gt;

&lt;p&gt;Audio similarity search powered by vector databases is transforming industries, from music recommendation to environmental monitoring. With the ability to handle vast datasets and offer lightning-fast retrieval, this technology opens up countless possibilities. But like any powerful tool, it requires careful handling of data privacy, scalability, and model relevance. As AI continues to evolve, audio similarity search will remain a foundational technology, unlocking new potential in the world of audio AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Reading 
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://zilliz.com/learn/top-10-most-used-embedding-models-for-audio-data" rel="noopener noreferrer"&gt;https://zilliz.com/learn/top-10-most-used-embedding-models-for-audio-data&lt;/a&gt; &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://zilliz.com/learn/unlocking-pre-trained-models-developers-guide-to-audio-ai-tasks" rel="noopener noreferrer"&gt;https://zilliz.com/learn/unlocking-pre-trained-models-developers-guide-to-audio-ai-tasks&lt;/a&gt; &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://zilliz.com/vector-database-use-cases/audio-similarity-search" rel="noopener noreferrer"&gt;https://zilliz.com/vector-database-use-cases/audio-similarity-search&lt;/a&gt; &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://milvus.io/" rel="noopener noreferrer"&gt;https://milvus.io/&lt;/a&gt; &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://zilliz.com/cloud" rel="noopener noreferrer"&gt;https://zilliz.com/cloud&lt;/a&gt; &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>community</category>
    </item>
    <item>
      <title>DeepSeek vs. OpenAI: A Battle of Innovation in Modern AI</title>
      <dc:creator>Chloe Williams</dc:creator>
      <pubDate>Sat, 01 Mar 2025 00:01:47 +0000</pubDate>
      <link>https://dev.to/zilliz/deepseek-vs-openai-a-battle-of-innovation-in-modern-ai-50j8</link>
      <guid>https://dev.to/zilliz/deepseek-vs-openai-a-battle-of-innovation-in-modern-ai-50j8</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The rapid advancements in AI technology have led to models that not only excel in complex tasks but also seamlessly adapt to a wide range of applications, enhancing their utility across industries. OpenAI, a pioneer in this space, continues to push the boundaries with innovative models that redefine natural language processing and machine learning capabilities. These advancements have sparked a wave of innovation, thereby making AI more accessible, efficient, and capable of performing tasks that once seemed out of reach.&lt;/p&gt;

&lt;p&gt;However, OpenAI's dominance is now being challenged by emerging competitors, such as DeepSeek, a Chinese AI company that has introduced &lt;a href="https://github.com/deepseek-ai/DeepSeek-R1?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;strong&gt;DeepSeek R1&lt;/strong&gt;&lt;/a&gt;, an open-source model that rivals some of the most advanced models available. DeepSeek R1 stands out due to its focus on cost efficiency and its ability to match the performance of high-end models while keeping operational costs significantly lower. This emerging player has begun to capture attention, especially for organizations and developers seeking a balance between performance and affordability.&lt;/p&gt;

&lt;p&gt;In this blog, we will explore two standout models from OpenAI: OpenAI o1, which is renowned for its advanced reasoning capabilities and deliberate "thinking before responding" approach, and OpenAI o3-mini, a faster, more cost-efficient model optimized for STEM applications. Additionally, we will compare these models to DeepSeek R1, which offers similar performance and capabilities, but at a fraction of the cost. By examining their key features, performance benchmarks, and use cases, we aim to provide you with insights on how to choose the right AI model for your specific needs, whether you're working in research, software development, healthcare, or other fields where complex reasoning and cost efficiency are paramount.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenAI o1 Overview
&lt;/h2&gt;

&lt;p&gt;Launched in September 2024 (in its preview version, and the full version released in December 2024), OpenAI o1 represents a significant leap forward in AI reasoning. Unlike its predecessors, o1 is specifically designed to tackle complex, multi-step tasks using a &lt;a href="https://zilliz.com/glossary/chain-of-thoughts?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;strong&gt;chain-of-thought&lt;/strong&gt;&lt;/a&gt; reasoning approach, which is enhanced by &lt;strong&gt;large-scale&lt;/strong&gt; &lt;a href="https://zilliz.com/glossary/deep-reinforcement-learning?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;strong&gt;reinforcement learning&lt;/strong&gt;&lt;/a&gt;. This innovative method enables the model to think through problems step-by-step, improving its problem-solving capabilities and making it particularly effective for logical reasoning and decision-making in challenging scenarios.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Model Architecture&lt;/strong&gt;: OpenAI o1 is built on a &lt;a href="https://zilliz.com/glossary/transformer-models?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;transformer-based&lt;/a&gt; architecture optimized for reasoning and problem-solving. It employs a unique mechanism for generating extended &lt;a href="https://zilliz.com/glossary/chain-of-thoughts?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;chains-of-thought&lt;/a&gt;, allowing the model to perform deeper and more thorough analyses before providing an answer. This extended reasoning process enhances accuracy and reliability for complex queries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fze3ed6r2u4cvd8rwdn6j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fze3ed6r2u4cvd8rwdn6j.png" width="800" height="580"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 1: Multi-step conversation with reasoning tokens (&lt;/em&gt;&lt;a href="https://platform.openai.com/docs/guides/reasoning?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;em&gt;Source&lt;/em&gt;&lt;/a&gt;&lt;em&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Training Data&lt;/strong&gt;: OpenAI o1 was trained on a combination of filtered publicly available datasets and proprietary data from partnerships to enhance its reasoning and technical capabilities. Its training data includes web data, open-source datasets, reasoning datasets, paywalled content, specialized archives, and industry-specific resources, ensuring strong performance in both general knowledge and complex problem-solving. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance Benchmarks&lt;/strong&gt;: While OpenAI o1 is slower than earlier models like GPT-4o due to its reasoning processes, it consistently ranks higher in accuracy for complex tasks, particularly in STEM fields like science, mathematics, and coding. It achieved an impressive 83% on the American Invitational Mathematics Examination (AIME), and ranked in the 89th percentile on Codeforces competitive programming challenges. Additionally, it has demonstrated PhD-level accuracy in benchmarks for physics, biology, and chemistry problems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flxk5ex0t7ai9i28cq02h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flxk5ex0t7ai9i28cq02h.png" width="800" height="311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 2: Performance Benchmarks OpenAI o1 (&lt;/em&gt;&lt;a href="https://openai.com/index/learning-to-reason-with-llms/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;em&gt;Source&lt;/em&gt;&lt;/a&gt;&lt;em&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Cases and Application Areas&lt;/strong&gt;: OpenAI o1 is widely applicable in fields requiring advanced reasoning, such as scientific research (data analysis, hypothesis testing), software development (multi-step workflows, debugging), healthcare (diagnosis development), and educational applications (solving complex puzzles or crosswords).&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenAI o3-mini Overview
&lt;/h2&gt;

&lt;p&gt;The OpenAI o3-mini was officially released in late January 2025, following a preview in December 2024. This model continues OpenAI's progression in enhancing reasoning capabilities for complex tasks, offering notable improvements over previous models like o1, particularly in speed, efficiency, and performance across coding, mathematics, and science challenges.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Model Architecture&lt;/strong&gt;: Similar to OpenAI o1, o3-mini is based on a &lt;a href="https://zilliz.com/glossary/transformer-models?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;transformer&lt;/a&gt; architecture specifically optimized for advanced reasoning. It leverages the &lt;a href="https://zilliz.com/glossary/chain-of-thoughts?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;chain-of-thought&lt;/a&gt; technique to enable step-by-step problem-solving, combined with large-scale &lt;a href="https://zilliz.com/glossary/deep-reinforcement-learning?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;reinforcement learning&lt;/a&gt; to enhance reasoning. A standout feature of o3-mini is its reduced latency compared to o1, allowing for faster results while maintaining high accuracy in complex tasks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fef2dwx2lltdqfb1c5rac.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fef2dwx2lltdqfb1c5rac.png" width="800" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 3: Latency Comparison o3-mini vs. o1 (&lt;/em&gt;&lt;a href="https://openai.com/index/openai-o3-mini/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;em&gt;Source&lt;/em&gt;&lt;/a&gt;&lt;em&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Training Data:&lt;/strong&gt; Like its predecessor, o3-mini was trained on a combination of publicly available datasets, proprietary OpenAI data, and advanced filtering techniques to ensure a safe and effective training setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance Benchmarks&lt;/strong&gt;: OpenAI o3-mini has demonstrated exceptional performance on benchmark datasets, achieving 87.3% accuracy in competition-level math problems, 79.7% accuracy on PhD-level science questions, and 49.3% accuracy in software programming, surpassing OpenAI o1 at higher reasoning levels.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpuxdswxa090e3ms8e0ov.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpuxdswxa090e3ms8e0ov.png" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 4: Performance Benchmarks o3-mini vs. o1: Mathematics, AIME (&lt;/em&gt;&lt;a href="https://openai.com/index/openai-o3-mini/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;em&gt;Source&lt;/em&gt;&lt;/a&gt;&lt;em&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frw1eb39ilq4rxh0fgbwk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frw1eb39ilq4rxh0fgbwk.png" alt="Figure 5: Performance Benchmarks o3-mini vs. o1: PhD-level science" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 5: Performance Benchmarks o3-mini vs. o1: PhD-level science (&lt;/em&gt;&lt;a href="https://openai.com/index/openai-o3-mini/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;em&gt;Source&lt;/em&gt;&lt;/a&gt;&lt;em&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28lbwbkldou6zc5dxfx1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28lbwbkldou6zc5dxfx1.png" width="800" height="709"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 6: Performance Benchmarks o3-mini vs. o1: Software Engineering (&lt;/em&gt;&lt;a href="https://openai.com/index/openai-o3-mini/?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;em&gt;Source&lt;/em&gt;&lt;/a&gt;&lt;em&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Cases and Application Areas&lt;/strong&gt;: While OpenAI o3-mini shares many use cases with o1, such as scientific research, software development, healthcare, and educational problem-solving, it stands out in domains where high-level reasoning with lower latency is essential. For instance, in financial analysis, o3-mini can efficiently handle complex risk forecasting, fraud detection, and investment strategy simulations, all while quickly processing large volumes of data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deepseek R1 Overview
&lt;/h2&gt;

&lt;p&gt;DeepSeek R1, released in January 2025, is an open-source AI model developed by the Chinese company DeepSeek. It is specifically designed for advanced reasoning and problem-solving, leveraging a combination of &lt;a href="https://zilliz.com/glossary/chain-of-thoughts?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;strong&gt;chain-of-thought&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;reasoning, supervised&lt;/strong&gt; &lt;a href="https://zilliz.com/glossary/fine-tuning?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;strong&gt;fine-tuning&lt;/strong&gt;&lt;/a&gt;, and &lt;a href="https://zilliz.com/glossary/deep-reinforcement-learning?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;strong&gt;reinforcement learning&lt;/strong&gt;&lt;/a&gt; to enhance logical inference. One of its standout features is its ability to achieve performance levels comparable to leading AI models, such as OpenAI o1, while maintaining significantly lower operational costs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Model Architecture&lt;/strong&gt;: DeepSeek R1 employs a &lt;a href="https://zilliz.com/glossary/transformer-models?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;transformer-based&lt;/a&gt; architecture optimized for reasoning tasks. Its training methodology builds on &lt;a href="https://zilliz.com/blog/why-deepseek-v3-is-taking-the-ai-world-by-storm?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;DeepSeek V3&lt;/a&gt; and includes multiple steps: large-scale reinforcement learning, supervised &lt;a href="https://zilliz.com/glossary/fine-tuning?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;fine-tuning&lt;/a&gt;, and a curated dataset of &lt;a href="https://zilliz.com/glossary/chain-of-thoughts?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;chain-of-thought&lt;/a&gt; examples. What sets it apart is the combination of three key elements, which significantly improve efficiency and problem-solving depth: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mixture-of-Experts (MoE):&lt;/strong&gt; This technique dynamically selects a subset of specialized &lt;a href="https://zilliz.com/glossary/neural-networks?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;neural network&lt;/a&gt; "experts" for each input, reducing computation while improving efficiency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multi-Head Latent Attention (MLA)&lt;/strong&gt;: The core idea behind MLA is low-rank joint compression for attention keys and values, which helps optimize the Key-Value (KV) cache during inference.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multi-Token Prediction (MTP)&lt;/strong&gt;: Enables the model to predict multiple future &lt;a href="https://zilliz.com/glossary/tokenization?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;tokens&lt;/a&gt; at once.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fet4t90mt1xnus9iutb0c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fet4t90mt1xnus9iutb0c.png" width="800" height="646"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 7: Basic architecture of DeepSeek-V3 (&lt;/em&gt;&lt;a href="https://arxiv.org/pdf/2412.19437?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;em&gt;Source&lt;/em&gt;&lt;/a&gt;&lt;em&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4ecpamqw0d8tx4qauzl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4ecpamqw0d8tx4qauzl.png" width="800" height="372"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 8: Illustration of our Multi-Token Prediction (MTP) implementation (&lt;/em&gt;&lt;a href="https://arxiv.org/pdf/2412.19437?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;em&gt;Source&lt;/em&gt;&lt;/a&gt;&lt;em&gt;)&lt;/em&gt;&lt;/p&gt;
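A minimal sketch of the MoE routing idea described above: a softmax gate scores the experts and only the top-k are actually evaluated. The experts here are scalar toy functions standing in for feed-forward sub-networks, and the whole setup is invented for illustration; DeepSeek's actual router is far more elaborate:

```python
import math

def top_k_gate(gate_scores, k=2):
    """Softmax over expert scores, then keep only the top-k experts (renormalized)."""
    exps = [math.exp(s) for s in gate_scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: -probs[i])[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

def moe_layer(x, experts, gate_scores, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    weights = top_k_gate(gate_scores, k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Four toy "experts"; only the two with the highest gate scores are computed.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
out = moe_layer(3.0, experts, gate_scores=[0.1, 2.0, 1.5, -1.0], k=2)
```

Because only k of the experts run per input, compute grows with k rather than with the total number of experts, which is the efficiency gain MoE architectures exploit.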

&lt;p&gt;&lt;strong&gt;Training Data&lt;/strong&gt;: DeepSeek R1 was trained on a combination of two proprietary datasets that are not publicly available. One dataset adds reasoning capabilities, while the other enhances general-purpose tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reasoning&lt;/strong&gt;: cold start &lt;a href="https://zilliz.com/glossary/chain-of-thoughts?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;chain-of-thought &lt;/a&gt;data to &lt;a href="https://zilliz.com/glossary/fine-tuning?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;fine-tune&lt;/a&gt; DeepSeek V3. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Non-reasoning&lt;/strong&gt;: labeled data for the subsequent supervised &lt;a href="https://zilliz.com/glossary/fine-tuning?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;fine-tuning&lt;/a&gt; step to enhance general-purpose tasks such as writing, translation or factual QA.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Performance Benchmarks&lt;/strong&gt;: DeepSeek R1 excels in tasks requiring reasoning and deep analytical thinking, achieving 79.8% on AIME, 97.3% on MATH-500, and a 96.3 percentile rank on Codeforces. It also performed strongly in general knowledge benchmarks, with 90.8% on &lt;a href="https://zilliz.com/glossary/mmlu-benchmark?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;MMLU&lt;/a&gt; and 71.5% on GPQA Diamond.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1rhvtzee6j9nmy6kcmos.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1rhvtzee6j9nmy6kcmos.png" width="800" height="606"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 9: Performance Benchmarks DeepSeek (&lt;/em&gt;&lt;a href="https://arxiv.org/pdf/2412.19437?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;em&gt;Source&lt;/em&gt;&lt;/a&gt;&lt;em&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Cases and Application Areas&lt;/strong&gt;: Due to its strong performance in math and coding tasks, DeepSeek R1 is well suited for scientific research, software development, and academic education. Its open-source nature makes it accessible to a wide range of users and industries at a low cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  Model Versions
&lt;/h2&gt;

&lt;p&gt;OpenAI has released three versions of o1 (preview, mini, and full) and one version of o3-mini, differing in &lt;a href="https://zilliz.com/glossary/context-window?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;context window&lt;/a&gt; size and maximum output tokens. Meanwhile, DeepSeek has introduced two DeepSeek R1 models. The initial Zero version represented their first training attempt, excelling in reasoning benchmarks but facing challenges with readability and language mixing. Additionally, DeepSeek has developed distilled models by &lt;a href="https://zilliz.com/glossary/fine-tuning?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;fine-tuning&lt;/a&gt; open-source models such as Qwen and Llama.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm9hl75tw2mj7mb5uet7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm9hl75tw2mj7mb5uet7.png" width="800" height="516"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 10: OpenAI o1 model versions (&lt;/em&gt;&lt;a href="https://platform.openai.com/docs/models#o1?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;em&gt;Source&lt;/em&gt;&lt;/a&gt;&lt;em&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F59gtet25uxp4pmj4h6oi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F59gtet25uxp4pmj4h6oi.png" width="800" height="204"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 11: OpenAI o3-mini model versions (&lt;/em&gt;&lt;a href="https://platform.openai.com/docs/models#o1?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;em&gt;Source&lt;/em&gt;&lt;/a&gt;&lt;em&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fose2lzlmtfezcbowjidh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fose2lzlmtfezcbowjidh.png" width="800" height="140"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 12: DeepSeek R1 model versions (&lt;/em&gt;&lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-R1?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;em&gt;Source&lt;/em&gt;&lt;/a&gt;&lt;em&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrs73xp0pmsaqj7sf57m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrs73xp0pmsaqj7sf57m.png" width="644" height="370"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 13: DeepSeek R1 distill model versions (&lt;/em&gt;&lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-R1?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;em&gt;Source&lt;/em&gt;&lt;/a&gt;&lt;em&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Comparison
&lt;/h2&gt;

&lt;p&gt;Pricing is a crucial factor when selecting an AI model for specific use cases. Below is a comparison of the costs of OpenAI o1, OpenAI o1-mini, OpenAI o3-mini, and DeepSeek R1, including input and output token pricing per 1 million tokens, as well as cached input prices.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Cached Price&lt;/strong&gt; &lt;strong&gt;(per 1M tokens)&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Input Token Price (per 1M tokens)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Output Token Price&lt;/strong&gt; &lt;strong&gt;(per 1M tokens)&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI o1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$7.50&lt;/td&gt;
&lt;td&gt;$15.00&lt;/td&gt;
&lt;td&gt;$60.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI o1-mini&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.55&lt;/td&gt;
&lt;td&gt;$1.10&lt;/td&gt;
&lt;td&gt;$4.40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI o3-mini&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.55&lt;/td&gt;
&lt;td&gt;$1.10&lt;/td&gt;
&lt;td&gt;$4.40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DeepSeek R1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.14&lt;/td&gt;
&lt;td&gt;$0.55&lt;/td&gt;
&lt;td&gt;$2.19&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This table highlights the cost differences between the models. DeepSeek R1 is significantly cheaper, with input and output tokens priced at roughly half the rate of OpenAI's most affordable models (o1-mini and o3-mini), making it the more cost-efficient and scalable option.&lt;/p&gt;
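&lt;p&gt;To make these prices concrete, the per-request arithmetic can be sketched directly. The prices below come from the table above; the workload (10k input tokens, 2k output tokens per request) is a hypothetical example, and cache discounts are ignored:&lt;/p&gt;

```python
# Price table (USD per 1M tokens), taken from the cost comparison above.
PRICES = {
    "openai-o1":      {"input": 15.00, "output": 60.00},
    "openai-o3-mini": {"input": 1.10,  "output": 4.40},
    "deepseek-r1":    {"input": 0.55,  "output": 2.19},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request, ignoring cached-input discounts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical workload: 10k input tokens, 2k output tokens per request.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 2_000):.4f}")
# openai-o1:      $0.2700
# openai-o3-mini: $0.0198
# deepseek-r1:    $0.0099
```

&lt;p&gt;At this workload, a request to o1 costs roughly 27x a request to DeepSeek R1, and o3-mini sits about 2x above R1, matching the table's pricing ratios.&lt;/p&gt;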

&lt;h2&gt;
  
  
  Comparison Table: DeepSeek R1 vs. OpenAI o1 vs. OpenAI o3-mini
&lt;/h2&gt;

&lt;p&gt;To provide a clearer comparison, the table below outlines key features, performance benchmarks, areas of application, and cost considerations. OpenAI's key strength lies in the low latency of the o3-mini model, while DeepSeek R1 stands out for its cost efficiency.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Feature&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;OpenAI o1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;OpenAI o3-mini&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;DeepSeek R1&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Release Date&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dec 2024&lt;/td&gt;
&lt;td&gt;Jan 2025&lt;/td&gt;
&lt;td&gt;Jan 2025&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Transformer-based, chain-of-thought reasoning&lt;/td&gt;
&lt;td&gt;Transformer-based, chain-of-thought optimized for low latency&lt;/td&gt;
&lt;td&gt;Transformer-based, MoE, MLA, MTP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Training Data&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Public + proprietary&lt;/td&gt;
&lt;td&gt;Public + proprietary&lt;/td&gt;
&lt;td&gt;Proprietary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Benchmarks&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;83% (AIME), 89% (Codeforces), 79.7% (PhD-level STEM accuracy)&lt;/td&gt;
&lt;td&gt;87.3% (AIME), 79.7% (Science), 49.3% (Coding)&lt;/td&gt;
&lt;td&gt;79.8% (AIME), 97.3% (MATH-500), 96.3% (Codeforces)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Higher due to extended reasoning&lt;/td&gt;
&lt;td&gt;Lower latency, faster responses&lt;/td&gt;
&lt;td&gt;Moderate latency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Use Cases&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Scientific research, software development, healthcare, education&lt;/td&gt;
&lt;td&gt;Financial analysis, real-time decision-making, software development, research&lt;/td&gt;
&lt;td&gt;Scientific research, academic education, software development&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Open-Source&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost Efficiency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High-cost proprietary model&lt;/td&gt;
&lt;td&gt;High-cost proprietary model&lt;/td&gt;
&lt;td&gt;Lower operational cost&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbnt2rdz5l4gmwr6jrjmg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbnt2rdz5l4gmwr6jrjmg.png" width="800" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 14: Benchmark performance comparison DeepSeek vs. OpenAI (&lt;/em&gt;&lt;a href="https://arxiv.org/pdf/2412.19437?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;&lt;em&gt;Source&lt;/em&gt;&lt;/a&gt;&lt;em&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Because of its open-source nature, DeepSeek R1 is available on multiple platforms, including Hugging Face and Ollama, while OpenAI models are integrated into various enterprise solutions and cloud platforms. However, OpenAI has not yet released its model weights, and users cannot download a &lt;a href="https://zilliz.com/glossary/fine-tuning?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;fine-tuned&lt;/a&gt; model from the OpenAI platform. DeepSeek R1 also features a curated list of integrations in the following &lt;a href="https://github.com/deepseek-ai/awesome-deepseek-integration/tree/main?utm_medium=referral&amp;amp;utm_channel=devto" rel="noopener noreferrer"&gt;repository&lt;/a&gt;, including LiteLLM, Langfuse, and Ragflow. Additionally, it can be found on AWS and Azure.&lt;/p&gt;
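&lt;p&gt;As a minimal sketch of what calling DeepSeek R1 through its hosted, OpenAI-compatible API looks like, the snippet below builds a chat-completion request with only the standard library. The endpoint URL and the &lt;code&gt;deepseek-reasoner&lt;/code&gt; model name follow DeepSeek's published API conventions, but verify them against the current DeepSeek documentation before use; the prompt and key are placeholders:&lt;/p&gt;

```python
import json
import urllib.request

# OpenAI-compatible chat endpoint (check DeepSeek's docs for the current URL).
API_URL = "https://api.deepseek.com/chat/completions"

def build_r1_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build a chat-completion request for DeepSeek R1 ("deepseek-reasoner")."""
    payload = {
        "model": "deepseek-reasoner",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

# Sending the request requires a DeepSeek API key, e.g.:
# with urllib.request.urlopen(build_r1_request("Prove 2+2=4.", "sk-...")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

&lt;p&gt;Because the request shape mirrors OpenAI's chat API, existing OpenAI client code can typically be pointed at DeepSeek by swapping the base URL, model name, and key, which is what integrations such as LiteLLM rely on.&lt;/p&gt;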

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;As we move further into the age of AI, the competition between leading models like OpenAI's o1 and o3-mini, and newcomers like DeepSeek R1, is only going to intensify. OpenAI's models have proven their effectiveness in a wide variety of applications, especially where high-level reasoning, scalability, and robust performance are essential. Their innovation, speed, and advanced features set a high bar for the industry.&lt;/p&gt;

&lt;p&gt;However, DeepSeek R1 offers a compelling alternative, especially for those seeking to leverage advanced AI capabilities without the high operational costs associated with other models. Its open-source nature and impressive performance benchmarks make it an attractive option for organizations and developers looking for a cost-effective solution without sacrificing the depth of reasoning and problem-solving capabilities.&lt;/p&gt;

&lt;p&gt;Ultimately, the choice between these models comes down to your specific use case. If you're working on high-complexity tasks that demand rigorous, step-by-step reasoning, OpenAI o1 might be the best fit. If speed and cost-efficiency are more important, especially in STEM-related fields or financial applications, OpenAI o3-mini might be the right choice. For those seeking an open-source, budget-friendly solution that still delivers exceptional performance in tasks like math and software development, DeepSeek R1 presents an excellent alternative.&lt;/p&gt;

</description>
      <category>community</category>
    </item>
  </channel>
</rss>
