<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Scarlett Attensil</title>
    <description>The latest articles on DEV Community by Scarlett Attensil (@sattensil888).</description>
    <link>https://dev.to/sattensil888</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3443244%2Fcaef40f2-e953-4f43-954d-018fdc1832e7.png</url>
      <title>DEV Community: Scarlett Attensil</title>
      <link>https://dev.to/sattensil888</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sattensil888"/>
    <language>en</language>
    <item>
      <title>Offline Evaluation of RAG-Grounded Answers in LaunchDarkly AI Configs</title>
      <dc:creator>Scarlett Attensil</dc:creator>
      <pubDate>Thu, 16 Apr 2026 21:22:50 +0000</pubDate>
      <link>https://dev.to/launchdarkly/offline-evaluation-of-rag-grounded-answers-in-launchdarkly-ai-configs-1i5j</link>
      <guid>https://dev.to/launchdarkly/offline-evaluation-of-rag-grounded-answers-in-launchdarkly-ai-configs-1i5j</guid>
      <description>&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;This tutorial shows you how to run an &lt;strong&gt;offline LLM evaluation&lt;/strong&gt; on the RAG-grounded support agent you built in the &lt;a href="https://launchdarkly.com/docs/tutorials/agent-graphs" rel="noopener noreferrer"&gt;Agent Graphs tutorial&lt;/a&gt;, using LaunchDarkly &lt;a href="https://launchdarkly.com/docs/home/ai-configs" rel="noopener noreferrer"&gt;AI Configs&lt;/a&gt;, the &lt;a href="https://launchdarkly.com/docs/home/ai-configs/datasets" rel="noopener noreferrer"&gt;Datasets feature&lt;/a&gt;, and built-in &lt;a href="https://launchdarkly.com/docs/home/ai-configs/offline-evaluations" rel="noopener noreferrer"&gt;LLM-as-a-judge&lt;/a&gt; scoring. You'll build a RAG-grounded test dataset, run it through the Playground with a cross-family judge, and learn how to read each failing row as a dataset issue, an agent issue, or judge calibration noise.&lt;/p&gt;

&lt;p&gt;Here's how it works. The LaunchDarkly Playground evaluates a single model call against a prompt and dataset you configure. By pre-computing your RAG retrieval offline and baking the chunks directly into each dataset row, you turn that call into a high-value generation test: the model in the Playground receives the same documentation context it would in production, so the eval measures how well your agent reasons over real grounded input.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You'll Learn
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structure a RAG-grounded test dataset&lt;/strong&gt; by pre-computing retrieval offline and bundling chunks into each row&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pick the right LLM judge&lt;/strong&gt; for your agent's output shape (Accuracy for natural-language answers, Likeness for structured labels)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid same-model bias&lt;/strong&gt; by running the judge on a different model family than the agent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diagnose failing rows&lt;/strong&gt; as dataset issues, agent issues, or judge calibration noise&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What this tutorial covers, and what it doesn't&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Covers:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generation quality over RAG context: does the model produce a correct answer when the right documentation is in the prompt?&lt;/li&gt;
&lt;li&gt;Regression detection: catching unexpected score drops when you change a prompt or model&lt;/li&gt;
&lt;li&gt;Variation selection: comparing candidate prompts and models before committing to a new AI Config variation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Does not cover:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieval correctness. Whether your vector store is returning the best chunks is tested by your own RAG pipeline, outside LaunchDarkly.&lt;/li&gt;
&lt;li&gt;End-to-end agent graph behavior. Tool execution, multi-turn conversations, handoffs, and multi-step routing require online evals against real production traffic.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;You've completed the &lt;a href="https://launchdarkly.com/docs/tutorials/agent-graphs" rel="noopener noreferrer"&gt;Agent Graphs tutorial&lt;/a&gt; or have equivalent familiarity with LaunchDarkly &lt;a href="https://launchdarkly.com/docs/home/ai-configs" rel="noopener noreferrer"&gt;AI Configs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;You have the &lt;a href="https://github.com/launchdarkly-labs/devrel-agents-tutorial" rel="noopener noreferrer"&gt;devrel-agents-tutorial repo&lt;/a&gt; cloned&lt;/li&gt;
&lt;li&gt;You have API keys for &lt;strong&gt;two&lt;/strong&gt; model providers, one for the agent under test and one for the judge (the examples use OpenAI and Anthropic)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Get the Branch Running
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;About the branch and the Umbra knowledge base.&lt;/strong&gt; The &lt;code&gt;feature/offline-evals&lt;/code&gt; branch builds on the same &lt;a href="https://launchdarkly.com/docs/tutorials/agent-graphs" rel="noopener noreferrer"&gt;Agent Graphs tutorial&lt;/a&gt; codebase and the routing, tool, and graph work done in earlier branches — none of that goes away. What this branch adds is a more realistic RAG assessment target: &lt;strong&gt;Umbra&lt;/strong&gt;, a fictional serverless-functions product with an invented knowledge base (refund windows, deployment regions, function timeout limits, rate-limit tiers, and so on). Because Umbra doesn't exist outside this tutorial, the model under test has no pre-training knowledge to fall back on — a correct answer has to come from the retrieved chunks, which is the only way to honestly measure whether your RAG pipeline is doing its job. The branch also ships a pre-built RAG-grounded test dataset (&lt;code&gt;datasets/answer-tests.csv&lt;/code&gt;) and a helper script that regenerates it from your vector store.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;devrel-agents-tutorial
git checkout feature/offline-evals
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;span class="c"&gt;# Add LD_SDK_KEY, LD_API_KEY, OPENAI_API_KEY, ANTHROPIC_API_KEY to .env&lt;/span&gt;

uv &lt;span class="nb"&gt;sync
&lt;/span&gt;uv run python bootstrap/create_configs.py
uv run python initialize_embeddings.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start the API and UI in two terminals:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Terminal 1&lt;/span&gt;
uv run uvicorn api.main:app &lt;span class="nt"&gt;--reload&lt;/span&gt;

&lt;span class="c"&gt;# Terminal 2&lt;/span&gt;
uv run streamlit run ui/chat_interface.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;http://localhost:8501&lt;/code&gt; and ask a question grounded in the Umbra docs (refund policy, deployment regions, function timeout). The agent pulls answers from the knowledge base.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmn47r7c07lw9jd4024tj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmn47r7c07lw9jd4024tj.png" alt="The Umbra support chat UI answering a question grounded in the Umbra knowledge base." width="800" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Understand the Test Dataset
&lt;/h2&gt;

&lt;p&gt;Open &lt;code&gt;datasets/answer-tests.csv&lt;/code&gt;. Every row has three fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;input,expected_output,original_question
"Documentation context: --- We offer a 30-day refund policy for first-time subscribers... --- Annual subscriptions receive a prorated refund within... --- Question: What is the refund policy?","30-day refund policy for first-time subscribers who haven't deployed production traffic. Usage charges are non-refundable.","What is the refund policy?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;input&lt;/code&gt;&lt;/strong&gt; bundles documentation chunks and the question into a single structured prompt, separated by &lt;code&gt;---&lt;/code&gt; dividers. The chunks were retrieved from your production vector store ahead of time by &lt;code&gt;tools/build_rag_dataset.py&lt;/code&gt;, so the model in the Playground sees the same grounding the production agent would, even though the Playground never executes your retrieval tools.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;expected_output&lt;/code&gt;&lt;/strong&gt; is the correct answer, written by a human who read the source docs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;original_question&lt;/code&gt;&lt;/strong&gt; is a plain-text copy of the question so you can scan the dataset without parsing the bundled prompt. No judge uses this field.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Regenerate the dataset when your knowledge base changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run python tools/build_rag_dataset.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the full reference on dataset format and limits, see &lt;a href="https://launchdarkly.com/docs/home/ai-configs/datasets" rel="noopener noreferrer"&gt;Datasets for offline evaluations&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Upload the Dataset
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Use synthetic data only&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Never upload real customer tickets, PII, secrets, or credentials. Replace anything sensitive with synthetic placeholders before upload. See the Playground &lt;a href="https://launchdarkly.com/docs/home/ai-configs/playground#privacy" rel="noopener noreferrer"&gt;privacy section&lt;/a&gt; for what gets forwarded to model providers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Navigate to &lt;strong&gt;AI&lt;/strong&gt; &amp;gt; &lt;strong&gt;Library&lt;/strong&gt; in LaunchDarkly, select the &lt;strong&gt;Datasets&lt;/strong&gt; tab, and click &lt;strong&gt;Upload dataset&lt;/strong&gt;. Upload &lt;code&gt;datasets/answer-tests.csv&lt;/code&gt; and name it &lt;code&gt;answer-tests&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcckfkrgk57rzz1tt4xlv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcckfkrgk57rzz1tt4xlv.png" alt="The LaunchDarkly Datasets tab showing the answer-tests dataset uploaded." width="800" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Add Your Model API Keys
&lt;/h2&gt;

&lt;p&gt;The Playground calls model providers directly, so it needs API keys for both the model running your agent &lt;em&gt;and&lt;/em&gt; the model running your judge. These keys live in LaunchDarkly's "AI Config Test Run" integration, not in your AI Config.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the Playground, click &lt;strong&gt;Manage API keys&lt;/strong&gt; in the upper-right corner.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Add integration&lt;/strong&gt;, pick a provider (e.g. OpenAI), paste your API key, accept the terms, and save.&lt;/li&gt;
&lt;li&gt;Repeat for the second provider (Anthropic) so you can run a cross-family judge in Step 5.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;See the &lt;a href="https://launchdarkly.com/docs/home/ai-configs/playground#manage-api-keys" rel="noopener noreferrer"&gt;Playground reference doc&lt;/a&gt; for the canonical instructions. API keys are stored per-session, so you may need to re-paste them when you return.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Run the Evaluation
&lt;/h2&gt;

&lt;p&gt;From the Datasets list, click into &lt;strong&gt;answer-tests&lt;/strong&gt; to open it in a Playground bound to that dataset.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configure the test
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;System prompt&lt;/strong&gt;: paste your &lt;code&gt;support-agent&lt;/code&gt; instructions verbatim from the AI Config. Do not edit or simplify them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent model&lt;/strong&gt;: pick the model your support-agent variation uses (or a candidate you're considering swapping to). To compare two candidates, run the eval twice with different agent models and compare scores.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Acceptance criteria&lt;/strong&gt;: attach an &lt;strong&gt;Accuracy&lt;/strong&gt; judge with threshold &lt;code&gt;0.85&lt;/code&gt;. Accuracy scores whether the response correctly addresses the input question, which fits grounded natural-language answers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation model&lt;/strong&gt;: uncheck &lt;strong&gt;Use same model for evaluation&lt;/strong&gt; and set the judge to a &lt;em&gt;different&lt;/em&gt; model family from the agent. Same-family judging tends to reward output patterns the judge itself produces. A cross-family judge gives you an independent read.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkbh5lziq6lt0gx02w1w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkbh5lziq6lt0gx02w1w.png" alt="The Playground configured with the support-agent prompt, OpenAI as the agent, Anthropic as the evaluation model, and an Accuracy judge at 0.85 threshold." width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Run the eval.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reading the results
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyj5v00yxa526lllyneeb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyj5v00yxa526lllyneeb.png" alt="The Playground configured with the support-agent prompt, OpenAI as the agent, Anthropic as the evaluation model, and an Accuracy judge at 0.85 threshold." width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The example run above had 18 passes and 2 failures. When a row fails, the failure comes from one of three places, and each one sends you in a different direction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The dataset's chunks don't contain the answer.&lt;/strong&gt; This is a retrieval problem, not a generation problem. Rebuild the dataset with higher &lt;code&gt;top_k&lt;/code&gt;, a reranker, or a different chunker, or verify the answer is indexed at all.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The chunks contain the answer but the model ignored them.&lt;/strong&gt; This is the agent-side failure offline evals are designed to catch. Tighten the system prompt to insist on grounding, or switch to a more obedient model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The chunks and the model are both fine but the judge disagreed.&lt;/strong&gt; This is judge calibration noise. Lower the threshold, try a different judge, or accept it as noise. Don't change your agent based on it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sort by score. For each failing row, open the bundled chunks in the &lt;code&gt;input&lt;/code&gt; field and ask: &lt;em&gt;was the right answer in there?&lt;/em&gt; Yes → fix the prompt or model. No → rebuild the dataset.&lt;/p&gt;

&lt;h3&gt;
  
  
  What failed in this run
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Row 11: "What integrations are available?"&lt;/strong&gt; (&lt;em&gt;chunks missed the answer&lt;/em&gt;). The expected output mentioned monitoring integrations (Datadog, Sentry, LogRocket), but the retrieved chunks only covered databases, storage, and billing. The model correctly listed what it had and said &lt;em&gt;"the documentation does not provide additional information regarding more integrations"&lt;/em&gt;, which is the correct behavior for an ungrounded claim. &lt;strong&gt;Fix&lt;/strong&gt;: higher &lt;code&gt;top_k&lt;/code&gt; or a reranker in &lt;code&gt;build_rag_dataset.py&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Row 12: "Can I get a refund on bandwidth overages?"&lt;/strong&gt; (&lt;em&gt;judge calibration&lt;/em&gt;). The model correctly said bandwidth overages are non-refundable, citing the docs, but omitted a secondary "Review your Usage Dashboard" recommendation from the expected output. Semantically right, lexically short one clause. &lt;strong&gt;Fix&lt;/strong&gt;: lower the threshold or trim the expected output.&lt;/p&gt;

&lt;p&gt;Two failures, two different fixes. Without reading the per-row results you'd conflate them and spend time tightening the model when the actual problem lives in the retriever or the dataset.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to Go From a Single Run
&lt;/h2&gt;

&lt;p&gt;This tutorial walked you through one run. In practice, a single eval isn't where offline evaluation earns its keep. The real payoff comes from re-running the same dataset against a new prompt, a new model, or a fresh RAG chunker and comparing scores to your last known-good run. A small prompt edit that quietly drops your Accuracy from 0.83 to 0.71 is exactly the kind of regression this pattern is meant to catch, but only if you save the run and compare against it next time.&lt;/p&gt;

&lt;p&gt;A reasonable next loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Save the run from Step 5 as your reference.&lt;/li&gt;
&lt;li&gt;When you change something (prompt, model, chunker, &lt;code&gt;top_k&lt;/code&gt;), re-run the same dataset and compare scores.&lt;/li&gt;
&lt;li&gt;Add new rows to the dataset as you find failure modes in staging or production.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For &lt;strong&gt;end-to-end behavior that offline tests can't capture&lt;/strong&gt; (tool execution, multi-turn conversations, the tail of real production inputs), see &lt;a href="https://launchdarkly.com/docs/home/ai-configs/online-evaluations" rel="noopener noreferrer"&gt;online evaluations&lt;/a&gt; and the &lt;a href="https://launchdarkly.com/docs/tutorials/when-to-add-online-evals" rel="noopener noreferrer"&gt;When to add online evals&lt;/a&gt; tutorial. Online evaluations are not currently supported for agent-based AI Configs; for agent workflows, the documented path is programmatic judge evaluation via the AI SDK.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 7: Track Evaluation History
&lt;/h2&gt;

&lt;p&gt;View saved runs at &lt;strong&gt;AI&lt;/strong&gt; &amp;gt; &lt;strong&gt;Evaluations&lt;/strong&gt;. Toggle &lt;strong&gt;Group by dataset&lt;/strong&gt; to collapse runs under each dataset name so you can see the history for &lt;code&gt;umbra-rag-eval&lt;/code&gt; alongside any other datasets in the project. Compare pass and fail counts across runs, and distinguish saved runs (indefinite retention) from one-off runs (60-day expiry). For metric definitions, see &lt;a href="https://launchdarkly.com/docs/home/ai-configs/monitor" rel="noopener noreferrer"&gt;Monitor AI Configs&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://launchdarkly.com/docs/home/releases/progressive-rollouts" rel="noopener noreferrer"&gt;Progressive rollouts&lt;/a&gt;&lt;/strong&gt;: release your winning variation to 5% of traffic, then 25%, then 100%, watching production metrics before expanding.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://launchdarkly.com/docs/tutorials/when-to-add-online-evals" rel="noopener noreferrer"&gt;When to add online evals&lt;/a&gt;&lt;/strong&gt;: decide what to score on live production traffic once you have an offline baseline.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a deeper look at the multi-agent RAG system this tutorial builds on, see the &lt;a href="https://launchdarkly.com/docs/tutorials/agent-graphs" rel="noopener noreferrer"&gt;Agent Graphs&lt;/a&gt; tutorial.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>rag</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Building Framework-Agnostic AI Swarms: Compare LangGraph, Strands, and OpenAI Swarm</title>
      <dc:creator>Scarlett Attensil</dc:creator>
      <pubDate>Thu, 26 Mar 2026 21:05:21 +0000</pubDate>
      <link>https://dev.to/launchdarkly/building-framework-agnostic-ai-swarms-compare-langgraph-strands-and-openai-swarm-14ip</link>
      <guid>https://dev.to/launchdarkly/building-framework-agnostic-ai-swarms-compare-langgraph-strands-and-openai-swarm-14ip</guid>
      <description>&lt;p&gt;If you've ever run the same app in multiple environments, you know the pain of duplicated configuration. &lt;a href="https://www.onyxgs.com/blog/swarm-intelligence-collective-behavior-ai" rel="noopener noreferrer"&gt;Agent swarms&lt;/a&gt; have the same problem: the moment you try multiple orchestrators (LangGraph, Strands, OpenAI Swarm), your agent definitions start living in different formats. Prompts drift. Model settings drift. A "small behavior tweak" turns into archaeology across repos.&lt;/p&gt;

&lt;p&gt;AI behavior isn't code. Prompts aren't functions. They change too often, and too experimentally, to be hard-wired into orchestrator code. &lt;a href="https://launchdarkly.com/docs/home/ai-configs" rel="noopener noreferrer"&gt;LaunchDarkly AI Configs&lt;/a&gt; lets you treat agent definitions like shared configuration instead. Define them once, store them centrally, and let any orchestrator fetch them. Update a prompt or model setting in the LaunchDarkly UI, and the new version rolls out without a redeploy.&lt;/p&gt;



&lt;p&gt;Ready to build framework-agnostic AI swarms? Start your 14-day free trial of LaunchDarkly to follow along with this tutorial. No credit card required.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://launchdarkly.com/start-trial/?utm_source=docs&amp;amp;utm_medium=tutorial&amp;amp;utm_campaign=ai-orchestrators" rel="noopener noreferrer"&gt;Start free trial&lt;/a&gt; →&lt;/p&gt;



&lt;h2&gt;
  
  
  The problem: Research gap analysis across multiple papers
&lt;/h2&gt;

&lt;p&gt;When analyzing academic literature, researchers face a daunting task: reading dozens of papers to identify patterns, spot contradictions, and find unexplored opportunities. A single LLM call can summarize papers, but it produces a monolithic analysis you can't trace, refine, or trust for critical decisions.&lt;/p&gt;

&lt;p&gt;The challenge compounds when you need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Identify methodological patterns&lt;/strong&gt; across 12+ papers without missing subtle connections&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Detect contradictory findings&lt;/strong&gt; that might invalidate assumptions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discover research gaps&lt;/strong&gt; that represent genuine opportunities, not just oversight&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where specialized agents excel - each focused on one aspect of the analysis, building on each other's work.&lt;/p&gt;

&lt;p&gt;In this tutorial, we'll build a 3-agent research analysis swarm that solves this problem by dividing the work:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
    &lt;th&gt;Agent&lt;/th&gt;
    &lt;th&gt;Role&lt;/th&gt;
    &lt;th&gt;Output&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Approach Analyzer&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Clusters methodological themes across papers&lt;/td&gt;
    &lt;td&gt;"Papers 1, 4, 7 use reinforcement learning; Papers 2, 5 use symbolic methods"&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Contradiction Detector&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Finds conflicting claims between papers&lt;/td&gt;
    &lt;td&gt;"Paper 3 claims X improves performance; Paper 8 shows X degrades it"&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Gap Synthesizer&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Identifies unexplored research directions&lt;/td&gt;
    &lt;td&gt;"No papers combine approach A with dataset B; potential opportunity"&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We'll implement this swarm across three different orchestrators (LangGraph, Strands, and OpenAI Swarm), demonstrating how LaunchDarkly AI Configs enable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Framework-agnostic agent definitions&lt;/strong&gt;: Define agents once in LaunchDarkly, use them everywhere&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-agent observability&lt;/strong&gt;: Track tokens, latency, and costs for each agent individually - catch silent failures when agents skip execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic swarm composition&lt;/strong&gt;: Add/remove agents from the swarm or switch models without touching code&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why use a swarm?
&lt;/h2&gt;

&lt;p&gt;Research gap analysis requires different skills: clustering methodological patterns, detecting contradictions, and synthesizing opportunities. With a swarm, each agent handles one aspect and produces artifacts the next agent builds on. You can track tokens, latency, and cost per agent. You can catch silent failures when an agent skips execution. And when something goes wrong, you know exactly where.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical requirements
&lt;/h2&gt;

&lt;p&gt;Before implementing the swarm, ensure you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LaunchDarkly account&lt;/strong&gt; with AI Configs enabled (see &lt;a href="https://launchdarkly.com/docs/home/ai-configs/quickstart" rel="noopener noreferrer"&gt;quickstart guide&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API keys&lt;/strong&gt; for Anthropic Claude or OpenAI GPT-4 (check &lt;a href="https://launchdarkly.com/docs/home/ai-configs" rel="noopener noreferrer"&gt;supported models&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python 3.11+&lt;/strong&gt; for running orchestrators&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Basic understanding&lt;/strong&gt; of agent systems (review &lt;a href="https://launchdarkly.com/docs/tutorials/agents-langgraph" rel="noopener noreferrer"&gt;LangGraph agents tutorial&lt;/a&gt; if needed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The complete implementation is available at &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators" rel="noopener noreferrer"&gt;GitHub - AI Orchestrators&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture: how LaunchDarkly powers framework-agnostic swarms
&lt;/h2&gt;

&lt;p&gt;The swarm architecture has three layers: dynamic agent configuration, per-agent tracking, and custom metrics for cost attribution. Here's how they work together.&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fir3nonhko1k6th3du75j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fir3nonhko1k6th3du75j.png" alt="LangGraph swarm architecture showing LaunchDarkly configuration fetch, agent interactions with Command-based handoffs, and dual metrics tracking to AI Config Trends" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;The diagram shows LangGraph's implementation, but Strands and OpenAI Swarm follow the same pattern with their own handoff mechanisms. The key components are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Configuration Fetch&lt;/strong&gt;: The orchestrator queries LaunchDarkly's API to dynamically discover all agent configurations, avoiding hardcoded agent definitions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Graph&lt;/strong&gt;: Three specialized agents (Approach Analyzer, Contradiction Detector, Gap Synthesizer) connected through explicit handoff mechanisms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics Collection&lt;/strong&gt;: Each agent execution captures tokens, duration, and cost metrics through both the AI Config tracker and custom metrics API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dual Dashboard Views&lt;/strong&gt;: The same metrics appear in the AI Config Trends dashboard (for individual agent monitoring)&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Three layers of framework-agnostic swarms
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. AI Config for Dynamic Agent Configuration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each &lt;a href="https://launchdarkly.com/docs/home/ai-configs/create" rel="noopener noreferrer"&gt;AI Config&lt;/a&gt; stores:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent key, display name, and model selection&lt;/li&gt;
&lt;li&gt;System instructions and tool definitions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your orchestrator code queries LaunchDarkly for "all enabled agent configs" and builds the swarm dynamically. No hardcoded agent names.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Per-Agent Tracking with AI SDK&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;LaunchDarkly's &lt;a href="https://launchdarkly.com/docs/home/ai-configs/quickstart" rel="noopener noreferrer"&gt;AI SDK&lt;/a&gt; provides tracking through config evaluations. You get a fresh tracker for each agent, then track tokens, duration, and success/failure. These metrics flow to the &lt;a href="https://launchdarkly.com/docs/home/ai-configs/monitor" rel="noopener noreferrer"&gt;AI Config Monitoring&lt;/a&gt; dashboard automatically.&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8v2lmgyjgtzug5naj60t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8v2lmgyjgtzug5naj60t.png" alt="AI Config monitoring dashboard showing per-agent token usage, duration, and success rates across multiple runs" width="800" height="461"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;This tracking catches silent failures - when agents skip execution or produce minimal output. Step 4 shows the implementation patterns for each framework.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Custom Metrics for Cost Attribution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Per-agent tracking shows performance, but for cost comparisons across orchestrators you need &lt;a href="https://launchdarkly.com/docs/home/metrics/custom-count" rel="noopener noreferrer"&gt;custom metrics&lt;/a&gt;. These let you query by orchestrator, compare costs across frameworks, and identify anomalies.&lt;/p&gt;

&lt;p&gt;With the architecture covered, let's build the swarm. We'll download research papers, set up the project, bootstrap agent configs in LaunchDarkly, implement per-agent tracking, and run the swarm across all three orchestrators.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Download research papers
&lt;/h2&gt;

&lt;p&gt;First, you need papers to analyze. The &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators" rel="noopener noreferrer"&gt;&lt;code&gt;scripts/download_papers.py&lt;/code&gt;&lt;/a&gt; script queries ArXiv with narrow, category-specific searches to ensure focused results.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python scripts/download_papers.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script presents pre-configured narrow research topics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# From orchestration/scripts/download_papers.py:164-189
&lt;/span&gt;&lt;span class="n"&gt;topics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Chain-of-thought prompting in LLMs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cat:cs.CL AND (chain-of-thought OR CoT) AND reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;years&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Retrieval-augmented generation (RAG)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cat:cs.CL AND (retrieval-augmented OR RAG) AND generation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;years&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Emergent communication in multi-agent RL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cat:cs.MA AND (emergent communication OR language emergence)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;years&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Few-shot prompting for code generation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cat:cs.SE AND few-shot AND code generation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;years&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Vision-language model grounding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cat:cs.CV AND vision-language AND grounding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;years&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;These topics are intentionally narrow&lt;/strong&gt;: Each uses ArXiv categories (&lt;code&gt;cat:cs.CL&lt;/code&gt;, &lt;code&gt;cat:cs.MA&lt;/code&gt;) to limit scope. Boolean AND operators ensure papers match all criteria. 2-5 year windows prevent overwhelming the analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For even narrower custom queries&lt;/strong&gt;, combine categories with specific techniques like &lt;code&gt;cat:cs.CL AND chain-of-thought AND mathematical AND reasoning&lt;/code&gt; for CoT math only, &lt;code&gt;cat:cs.MA AND emergent AND (referential OR compositional)&lt;/code&gt; for specific emergence types, or &lt;code&gt;cat:cs.SE AND few-shot AND (Python OR JavaScript) AND test generation&lt;/code&gt; for language-specific code generation.&lt;/p&gt;

&lt;p&gt;The script saves papers to &lt;code&gt;data/gap_analysis_papers.json&lt;/code&gt; with this structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2409.02645v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Emergent Language: A Survey and Taxonomy"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"authors"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Jannik Peters, Constantin Waubert de Puiseau, ..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"published"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-09-04"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cs.MA"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"abstract"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The field of emergent language represents..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"introduction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Language emergence has been explored..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"conclusion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"This paper provides a comprehensive review..."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this format&lt;/strong&gt;: Each paper includes ~2-3K characters of text (abstract + intro + conclusion), which is enough for analysis but won't overflow context windows. For 12 papers, you're looking at ~30K characters (~7.5K tokens) of input.&lt;/p&gt;

&lt;p&gt;You now have 12 papers saved locally. Next, we'll configure LaunchDarkly credentials and install the orchestration frameworks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Set up your multi-orchestrator project
&lt;/h2&gt;

&lt;h4&gt;
  
  
  Environment setup
&lt;/h4&gt;

&lt;p&gt;For help getting your SDK and API keys, see the &lt;a href="https://launchdarkly.com/docs/home/account/api" rel="noopener noreferrer"&gt;API access tokens guide&lt;/a&gt; and &lt;a href="https://launchdarkly.com/docs/home/account/environment/keys" rel="noopener noreferrer"&gt;SDK key management&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# .env file&lt;/span&gt;
&lt;span class="nv"&gt;LD_SDK_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sdk-xxxxx       &lt;span class="c"&gt;# Get from LaunchDarkly project settings&lt;/span&gt;
&lt;span class="nv"&gt;LD_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;api-xxxxx       &lt;span class="c"&gt;# Create at Account settings → Authorization&lt;/span&gt;
&lt;span class="nv"&gt;LAUNCHDARKLY_PROJECT_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;orchestrator-agents

&lt;span class="c"&gt;# Model API keys&lt;/span&gt;
&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-ant-xxxxx
&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-xxxxx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Install dependencies
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate

&lt;span class="c"&gt;# LaunchDarkly SDKs - see [Python SDK docs](/sdk/server-side/python)&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;ldai ldclient python-dotenv arxiv PyPDF2 requests

&lt;span class="c"&gt;# Orchestration frameworks&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;strands-sdk langgraph swarm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For more on the LaunchDarkly AI SDK, see the &lt;a href="https://dev.to/sdk/ai"&gt;AI SDK documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Your environment is configured and dependencies are installed. Next, we'll use the bootstrap script to automatically create all three agent configs in LaunchDarkly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Bootstrap agent configs with the manifest
&lt;/h2&gt;

&lt;p&gt;The orchestration repo includes a complete bootstrap system that automatically creates all agent configurations, tools, and variations in LaunchDarkly. This is much faster and more reliable than manual setup.&lt;/p&gt;

&lt;h4&gt;
  
  
  Understanding the bootstrap system
&lt;/h4&gt;

&lt;p&gt;The bootstrap process uses a YAML manifest to define:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tools&lt;/strong&gt; - Functions agents can call (fetch_paper_section, handoff_to_agent, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Configs&lt;/strong&gt; - Three specialized agents with their roles and instructions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Variations&lt;/strong&gt; - Multiple model options (Anthropic Claude vs OpenAI GPT)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Targeting Rules&lt;/strong&gt; - Which orchestrators get which models&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Run the bootstrap script
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# From the orchestration repo root&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;ai-orchestrators

&lt;span class="c"&gt;# Run bootstrap with the research gap manifest&lt;/span&gt;
python scripts/launchdarkly/bootstrap.py

&lt;span class="c"&gt;# You'll see:&lt;/span&gt;
╔═══════════════════════════════════════════════════════╗
║  AI Agent Orchestrator - LaunchDarkly Bootstrap       ║
╚═══════════════════════════════════════════════════════╝

Available manifests:
  1. Research Gap Analysis &lt;span class="o"&gt;(&lt;/span&gt;research_gap_manifest.yaml&lt;span class="o"&gt;)&lt;/span&gt;

Select manifest or press Enter &lt;span class="k"&gt;for &lt;/span&gt;default: &lt;span class="o"&gt;[&lt;/span&gt;Enter]

📦 Project: orchestrator-agents
🌍 Environment: production

🛠️  Creating paper analysis tools...
    ✓ Tool &lt;span class="s1"&gt;'extract_key_sections'&lt;/span&gt; created
    ✓ Tool &lt;span class="s1"&gt;'fetch_paper_section'&lt;/span&gt; created
    ✓ Tool &lt;span class="s1"&gt;'handoff_to_agent'&lt;/span&gt; created
    ...

🤖 Creating AI agent configs...
    ✓ AI Config &lt;span class="s1"&gt;'approach-analyzer'&lt;/span&gt; created
    ✓ AI Config &lt;span class="s1"&gt;'contradiction-detector'&lt;/span&gt; created
    ✓ AI Config &lt;span class="s1"&gt;'gap-synthesizer'&lt;/span&gt; created

✨ Bootstrap &lt;span class="nb"&gt;complete&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  What gets created
&lt;/h4&gt;

&lt;p&gt;The bootstrap script creates the three agents described earlier (Approach Analyzer, Contradiction Detector, Gap Synthesizer), each with swarm-aware instructions and handoff tools.&lt;/p&gt;

&lt;h4&gt;
  
  
  Verify in LaunchDarkly dashboard
&lt;/h4&gt;

&lt;p&gt;After bootstrap completes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to your LaunchDarkly AI Configs dashboard at &lt;code&gt;https://app.launchdarkly.com/&amp;lt;your-project-key&amp;gt;/&amp;lt;your-environment-key&amp;gt;/ai-configs&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;You'll see all three agent configs created&lt;/li&gt;
&lt;li&gt;Each config has:

&lt;ul&gt;
&lt;li&gt;Two &lt;a href="https://launchdarkly.com/docs/home/ai-configs/create-variation" rel="noopener noreferrer"&gt;variations&lt;/a&gt; (Claude and OpenAI models)&lt;/li&gt;
&lt;li&gt;Proper &lt;a href="https://launchdarkly.com/docs/home/ai-configs/tools-library" rel="noopener noreferrer"&gt;tools configured&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Detailed swarm-aware instructions&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/home/flags/target-rules" rel="noopener noreferrer"&gt;Targeting rules&lt;/a&gt; for orchestrator-specific routing&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  How variations and targeting work
&lt;/h4&gt;

&lt;p&gt;Each agent has two variations in the manifest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Example from approach-analyzer agent&lt;/span&gt;
&lt;span class="na"&gt;variations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyzer-claude"&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Approach&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Analyzer&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Claude"&lt;/span&gt;
    &lt;span class="na"&gt;modelConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic"&lt;/span&gt;
      &lt;span class="na"&gt;modelId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-5"&lt;/span&gt;
    &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;handoff_to_agent"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cluster_approaches"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;[Agent instructions here]&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyzer-openai"&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Approach&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Analyzer&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;OpenAI"&lt;/span&gt;
    &lt;span class="na"&gt;modelConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai"&lt;/span&gt;
      &lt;span class="na"&gt;modelId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5"&lt;/span&gt;
    &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;handoff_to_agent"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cluster_approaches"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;[Same instructions, different model]&lt;/span&gt;

&lt;span class="na"&gt;targeting&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;variation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyzer-openai"&lt;/span&gt;
      &lt;span class="na"&gt;clauses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;attribute&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orchestrator"&lt;/span&gt;
          &lt;span class="na"&gt;op&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;in"&lt;/span&gt;
          &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai_swarm"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai-swarm"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;defaultVariation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyzer-claude"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When an orchestrator requests this agent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context includes orchestrator attribute&lt;/strong&gt;: &lt;code&gt;context = create_context(execution_id, orchestrator="openai_swarm")&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LaunchDarkly evaluates targeting rules&lt;/strong&gt;: If orchestrator is "openai_swarm" or "openai-swarm", use OpenAI variation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Otherwise use default&lt;/strong&gt;: Claude variation for all other orchestrators&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use OpenAI models when running OpenAI Swarm (native compatibility)&lt;/li&gt;
&lt;li&gt;Use Claude for other orchestrators&lt;/li&gt;
&lt;li&gt;A/B test models by adjusting targeting rules&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Customize agent behavior
&lt;/h4&gt;

&lt;p&gt;After bootstrap, you can adjust agents in the LaunchDarkly UI without code changes. Switch between Claude, GPT-4, or &lt;a href="https://launchdarkly.com/docs/home/ai-configs" rel="noopener noreferrer"&gt;other supported providers&lt;/a&gt;. Refine instructions for better handoffs. Control which agents are included in the swarm through targeting rules. Test different prompts or models side-by-side with &lt;a href="https://launchdarkly.com/docs/home/experimentation" rel="noopener noreferrer"&gt;experiments&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Your three agents are now configured in LaunchDarkly. Next, we'll implement tracking so you can monitor tokens, latency, and cost for each agent individually.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Implement per-agent tracking
&lt;/h2&gt;

&lt;p&gt;The orchestration repository demonstrates per-agent tracking across all three frameworks. First, you need to fetch agent configurations from LaunchDarkly:&lt;/p&gt;

&lt;h4&gt;
  
  
  Fetching agent configurations dynamically
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;shared.launchdarkly&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;init_launchdarkly_clients&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;fetch_agent_configs_from_api&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;create_context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;build_agent_requests&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize LaunchDarkly clients
&lt;/span&gt;&lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ai_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;init_launchdarkly_clients&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Fetch agent list from LaunchDarkly API (not hardcoded!)
&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_agent_configs_from_api&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; AI config(s) in LaunchDarkly&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create execution context
&lt;/span&gt;&lt;span class="n"&gt;execution_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;langgraph-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%Y%m%d_%H%M%S&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;execution_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;orchestrator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;langgraph&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Build requests for all agents
&lt;/span&gt;&lt;span class="n"&gt;agent_requests&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent_metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_agent_requests&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Fetch all configs in one call
&lt;/span&gt;&lt;span class="n"&gt;configs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;agent_configs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_requests&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Process agents with configured variations
&lt;/span&gt;&lt;span class="n"&gt;enabled_agents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;configs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;enabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;enabled_agents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✓ Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enabled_agents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; configured agent configs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Pattern 1: Native framework metrics (Strands)
&lt;/h4&gt;

&lt;p&gt;Strands provides &lt;code&gt;accumulated_usage&lt;/code&gt; on each node result after execution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# From orchestrators/strands/run_gap_analysis.py:418-424
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;agent_key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;per_agent_metrics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;accumulated_usage&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_usage_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;total_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;output_tokens&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/orchestrators/strands/run_gap_analysis.py" rel="noopener noreferrer"&gt;View full Strands implementation&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Pattern 2: Message-based tracking (LangGraph)
&lt;/h4&gt;

&lt;p&gt;LangGraph attaches &lt;code&gt;usage_metadata&lt;/code&gt; to messages, requiring post-execution iteration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# From orchestrators/langgraph/run_gap_analysis.py:442-446
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;usage_metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage_metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;usage_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage_metadata&lt;/span&gt;
    &lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;usage_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;usage_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;output_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;usage_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;usage_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completion_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;has_usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/orchestrators/langgraph/run_gap_analysis.py" rel="noopener noreferrer"&gt;View full LangGraph implementation&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Pattern 3: Interception-based tracking (OpenAI Swarm)
&lt;/h4&gt;

&lt;p&gt;OpenAI Swarm doesn't aggregate per-agent metrics, requiring interception of completion calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# From orchestrators/openai_swarm/run_gap_analysis.py:369-387
&lt;/span&gt;&lt;span class="n"&gt;original_get_chat_completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_chat_completion&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tracked_get_chat_completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context_variables&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_override&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;debug&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;start_call&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;original_get_chat_completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;context_variables&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context_variables&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;model_override&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_override&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;debug&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;debug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_call&lt;/span&gt;
    &lt;span class="n"&gt;agent_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;key_by_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;usage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;output_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completion_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;total_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/orchestrators/openai_swarm/run_gap_analysis.py" rel="noopener noreferrer"&gt;View full OpenAI Swarm implementation&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Critical: Provider token field names differ
&lt;/h4&gt;

&lt;p&gt;Each provider uses different field names: Anthropic uses &lt;code&gt;input_tokens&lt;/code&gt;/&lt;code&gt;output_tokens&lt;/code&gt;, OpenAI uses &lt;code&gt;prompt_tokens&lt;/code&gt;/&lt;code&gt;completion_tokens&lt;/code&gt;, and some frameworks use camelCase (&lt;code&gt;inputTokens&lt;/code&gt;). The implementations use fallback chains to handle all formats.&lt;/p&gt;

&lt;p&gt;You can now capture tokens, latency, and cost for each agent. Next, we'll run the swarm across LangGraph, Strands, and OpenAI Swarm to see how they perform with the same agent definitions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Run multiple orchestrators and track results
&lt;/h2&gt;

&lt;p&gt;The repository includes scripts to run all three orchestrators and analyze their performance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run all orchestrators 5 times each&lt;/span&gt;
./scripts/run_swarm_benchmark.sh sequential 5

&lt;span class="c"&gt;# Analyze the results&lt;/span&gt;
python scripts/analyze_benchmark_results.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Configure env&lt;/strong&gt;: Create &lt;code&gt;.env&lt;/code&gt; with SDK keys&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Install deps&lt;/strong&gt;: &lt;code&gt;pip install -r requirements.txt&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Download papers&lt;/strong&gt;: &lt;code&gt;python scripts/download_papers.py&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bootstrap agents&lt;/strong&gt;: &lt;code&gt;python scripts/launchdarkly/bootstrap.py&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configure targeting&lt;/strong&gt;: Set default variation for each agent in LaunchDarkly UI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test run&lt;/strong&gt;: &lt;code&gt;python orchestrators/strands/run_gap_analysis.py&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Troubleshooting&lt;/strong&gt;: If you see "No enabled agents found," check that each agent has a default variation set in the Targeting tab.&lt;/p&gt;



&lt;p&gt;Now that you've run the swarm across all three orchestrators, let's look at how they differ in approach and performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparing orchestrator approaches to swarms
&lt;/h2&gt;

&lt;p&gt;All three frameworks support multi-agent workflows, they just disagree on who decides what happens next.&lt;/p&gt;

&lt;h4&gt;
  
  
  Key differences
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
    &lt;th&gt;Aspect&lt;/th&gt;
    &lt;th&gt;Strands&lt;/th&gt;
    &lt;th&gt;LangGraph&lt;/th&gt;
    &lt;th&gt;OpenAI Swarm&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Routing&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Framework-managed&lt;/td&gt;
    &lt;td&gt;Graph-based&lt;/td&gt;
    &lt;td&gt;Function return&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Handoff API&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Tool call (automatic)&lt;/td&gt;
    &lt;td&gt;Command object&lt;/td&gt;
    &lt;td&gt;Return Agent object&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Boilerplate&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Low&lt;/td&gt;
    &lt;td&gt;Medium&lt;/td&gt;
    &lt;td&gt;Medium&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Control&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Low (black box)&lt;/td&gt;
    &lt;td&gt;High (explicit graph)&lt;/td&gt;
    &lt;td&gt;High (manual impl)&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Debugging&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Hard (why didn't agent run?)&lt;/td&gt;
    &lt;td&gt;Easy (graph trace)&lt;/td&gt;
    &lt;td&gt;Hard (silent failures)&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Per-Agent Metrics&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Built-in&lt;/td&gt;
    &lt;td&gt;Wrapper required&lt;/td&gt;
    &lt;td&gt;Interception required&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;View full implementations: &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/orchestrators/strands/run_gap_analysis.py" rel="noopener noreferrer"&gt;Strands&lt;/a&gt; | &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/orchestrators/langgraph/run_gap_analysis.py" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt; | &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/orchestrators/openai_swarm/run_gap_analysis.py" rel="noopener noreferrer"&gt;OpenAI Swarm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The LaunchDarkly advantage&lt;/strong&gt;: By defining agents externally, you can implement swarms across all three frameworks and compare their approaches with the same agent definitions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance comparison (9 runs: 3 datasets × 3 orchestrators)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
    &lt;th&gt;Metric&lt;/th&gt;
    &lt;th&gt;OpenAI Swarm&lt;/th&gt;
    &lt;th&gt;Strands&lt;/th&gt;
    &lt;th&gt;LangGraph&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Avg Time&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;2.9 min&lt;/td&gt;
    &lt;td&gt;5.7 min&lt;/td&gt;
    &lt;td&gt;8.0 min&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Tokens&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;67K&lt;/td&gt;
    &lt;td&gt;99K&lt;/td&gt;
    &lt;td&gt;89K&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Speed&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;385 tok/s&lt;/td&gt;
    &lt;td&gt;287 tok/s&lt;/td&gt;
    &lt;td&gt;186 tok/s&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Report Size&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;13KB&lt;/td&gt;
    &lt;td&gt;32KB&lt;/td&gt;
    &lt;td&gt;67KB&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Variance&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;±1.05 min&lt;/td&gt;
    &lt;td&gt;±1.38 min&lt;/td&gt;
    &lt;td&gt;±0.21 min&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key insight (based on limited sample):&lt;/strong&gt; Fastest ≠ best. OpenAI Swarm was 3x faster but produced reports 80% smaller than LangGraph. LangGraph had the lowest variance and most comprehensive outputs despite slower execution.&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmqcgpj7j3ihm5e8kwfck.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmqcgpj7j3ihm5e8kwfck.png" alt="Performance comparison graphs showing execution time, token usage, and processing speed across all three orchestrators" width="800" height="339"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Example reports: See the outputs
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LangGraph&lt;/strong&gt; (60-70KB): &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/reports/langgraph_emergent_communication.md" rel="noopener noreferrer"&gt;Emergent&lt;/a&gt; | &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/reports/langgraph_theorem_proving.md" rel="noopener noreferrer"&gt;Theorem&lt;/a&gt; | &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/reports/langgraph_self_improvement.md" rel="noopener noreferrer"&gt;Self-Improvement&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strands&lt;/strong&gt; (30-35KB): &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/reports/strands_emergent_communication.md" rel="noopener noreferrer"&gt;Emergent&lt;/a&gt; | &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/reports/strands_theorem_proving.md" rel="noopener noreferrer"&gt;Theorem&lt;/a&gt; | &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/reports/strands_self_improvement.md" rel="noopener noreferrer"&gt;Self-Improvement&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI Swarm&lt;/strong&gt; (10-15KB): &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/reports/openai-swarm_emergent_communication.md" rel="noopener noreferrer"&gt;Emergent&lt;/a&gt; | &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/reports/openai-swarm_theorem_proving.md" rel="noopener noreferrer"&gt;Theorem&lt;/a&gt; | &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/reports/openai-swarm_self_improvement.md" rel="noopener noreferrer"&gt;Self-Improvement&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Report size variation demonstrates why per-agent tracking matters - you need to know when agents produce minimal output.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The orchestrator you choose determines how agents coordinate, but it shouldn't lock you into a single framework. By defining agents in LaunchDarkly and fetching them at runtime, you can run the same swarm across LangGraph, Strands, and OpenAI Swarm without duplicating configuration or watching prompts drift between repos.&lt;/p&gt;

&lt;p&gt;The performance differences are real. OpenAI Swarm is fastest, LangGraph produces the most comprehensive outputs, and Strands offers the simplest setup. But you only discover these tradeoffs if you can track each agent individually and catch silent failures when they happen.&lt;/p&gt;

&lt;p&gt;Swarms cost more than single LLM calls. The payoff is traceable reasoning you can audit, refine, and trust.&lt;/p&gt;

&lt;p&gt;The full implementation is available on &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators" rel="noopener noreferrer"&gt;GitHub - AI Orchestrators&lt;/a&gt;. Clone the repo and run the same swarm across all three orchestrators. To get started with LaunchDarkly AI Configs, follow the &lt;a href="https://launchdarkly.com/docs/home/ai-configs/quickstart" rel="noopener noreferrer"&gt;quickstart guide&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>langchain</category>
      <category>agents</category>
    </item>
    <item>
      <title>Build AI Configs with Agent Skills in Claude Code, Cursor, or Windsurf</title>
      <dc:creator>Scarlett Attensil</dc:creator>
      <pubDate>Thu, 26 Mar 2026 18:18:43 +0000</pubDate>
      <link>https://dev.to/launchdarkly/build-ai-configs-with-agent-skills-in-claude-code-cursor-or-windsurf-2c5e</link>
      <guid>https://dev.to/launchdarkly/build-ai-configs-with-agent-skills-in-claude-code-cursor-or-windsurf-2c5e</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/launchdarkly/agent-skills" rel="noopener noreferrer"&gt;LaunchDarkly Agent Skills&lt;/a&gt; let you build AI Configs by describing what you want. Tell your coding assistant to create an agent, and it handles the API calls, targeting rules, and tool definitions for you.&lt;/p&gt;

&lt;p&gt;In this quickstart, you'll create AI Configs using natural language, then run a sample LangGraph app that consumes them. You'll build a "Side Project Launcher"—a three-agent pipeline that validates ideas, writes landing pages, and recommends tech stacks.&lt;/p&gt;



&lt;p&gt;Prefer video? Watch &lt;a href="https://launchdarkly.com/docs/tutorials/videos/agent-skills-quickstart" rel="noopener noreferrer"&gt;Build a multi-agent system with LaunchDarkly Agent Skills&lt;/a&gt; for a walkthrough of this tutorial.&lt;/p&gt;



&lt;h2&gt;
  
  
  What you'll build
&lt;/h2&gt;

&lt;p&gt;A three-agent pipeline called "Side Project Launcher":&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Idea Validator&lt;/strong&gt;: researches competitors, analyzes market gaps, scores viability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Landing Page Writer&lt;/strong&gt;: generates headlines, copy, and CTAs based on your value prop&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tech Stack Advisor&lt;/strong&gt;: recommends frameworks, databases, and hosting based on your requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the end, you'll have working AI Configs in LaunchDarkly and a sample app that fetches them at runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;LaunchDarkly account (&lt;a href="https://launchdarkly.com/start-trial/?utm_source=docs&amp;amp;utm_medium=tutorial&amp;amp;utm_campaign=agent-skills-setup" rel="noopener noreferrer"&gt;free trial&lt;/a&gt; works)&lt;/li&gt;
&lt;li&gt;Claude Code, Cursor, or Windsurf installed&lt;/li&gt;
&lt;li&gt;LaunchDarkly API access token (for creating configs)&lt;/li&gt;
&lt;li&gt;Anthropic API key (for running the sample app)&lt;/li&gt;
&lt;/ul&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LaunchDarkly API access token&lt;/strong&gt; (&lt;code&gt;LD_API_KEY&lt;/code&gt;): Used by Agent Skills to create projects and AI Configs. Get it from &lt;a href="https://app.launchdarkly.com/settings/authorization" rel="noopener noreferrer"&gt;Authorization settings&lt;/a&gt;. Requires &lt;code&gt;writer&lt;/code&gt; role or custom role with &lt;code&gt;createProject&lt;/code&gt; and &lt;code&gt;createAIConfig&lt;/code&gt; permissions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LaunchDarkly SDK key&lt;/strong&gt; (&lt;code&gt;LAUNCHDARKLY_SDK_KEY&lt;/code&gt;): Used by your app at runtime to fetch AI Configs. Found in your project's SDK settings after creation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model provider API key&lt;/strong&gt; (e.g., &lt;code&gt;ANTHROPIC_API_KEY&lt;/code&gt;): Used to call the model. Get it from your provider (Anthropic, OpenAI, etc.).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Store all keys in &lt;code&gt;.env&lt;/code&gt; and never commit them to version control.&lt;/p&gt;





&lt;p&gt;Want to follow along? &lt;a href="https://launchdarkly.com/start-trial/?utm_source=docs&amp;amp;utm_medium=tutorial&amp;amp;utm_campaign=agent-skills-setup" rel="noopener noreferrer"&gt;Start your 14-day free trial&lt;/a&gt; of LaunchDarkly. No credit card required.&lt;/p&gt;



&lt;h2&gt;
  
  
  30-second quickstart
&lt;/h2&gt;

&lt;p&gt;If you just want to get started, here's the fastest path:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Install skills:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add launchdarkly/agent-skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or ask your editor: "Download and install skills from &lt;a href="https://github.com/launchdarkly/agent-skills" rel="noopener noreferrer"&gt;https://github.com/launchdarkly/agent-skills&lt;/a&gt;"&lt;/p&gt;

&lt;p&gt;Restart your editor after installing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Set your token:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;LD_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"api-xxxxx"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Build something:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use the prompt in Build a multi-agent project below, or describe your own agents. The assistant creates everything and gives you links to view them in LaunchDarkly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install Agent Skills in Claude Code, Cursor, or Windsurf
&lt;/h2&gt;

&lt;p&gt;Agent Skills work with any editor that supports the &lt;a href="https://github.com/anthropics/skills/blob/main/spec/agent-skills-spec.md" rel="noopener noreferrer"&gt;Agent Skills specification&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Install the skills
&lt;/h3&gt;

&lt;p&gt;You have two options:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A: Use skills.sh&lt;/strong&gt; (recommended)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://skills.sh" rel="noopener noreferrer"&gt;skills.sh&lt;/a&gt; is an open directory for agent skills. Install LaunchDarkly skills with one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add launchdarkly/agent-skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Option B: Ask your AI assistant&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Open your editor and ask:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Download and install skills from https://github.com/launchdarkly/agent-skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both methods install the same skills.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Restart your editor
&lt;/h3&gt;

&lt;p&gt;Close and reopen your editor. The skills load on startup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to verify:&lt;/strong&gt; Type &lt;code&gt;/aiconfig&lt;/code&gt; in Claude Code. You should see autocomplete suggestions. In Cursor, ask "what LaunchDarkly skills do you have?" and the assistant should list them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Set your API token
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;LD_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"api-xxxxx"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Get your token from &lt;a href="https://app.launchdarkly.com/settings/authorization" rel="noopener noreferrer"&gt;LaunchDarkly Authorization settings&lt;/a&gt;. The &lt;code&gt;writer&lt;/code&gt; role works, or use a custom role with &lt;code&gt;createProject&lt;/code&gt; and &lt;code&gt;createAIConfig&lt;/code&gt; permissions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Build a multi-agent project
&lt;/h2&gt;

&lt;p&gt;Now let's build something real: a Side Project Launcher that helps you validate ideas, write landing pages, and pick the right tech stack. Tell the assistant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Create AI Configs for a "Side Project Launcher" with three configs.
Use Anthropic Claude models for all configs.

1. idea-validator: Analyzes startup ideas by researching competitors, estimating
   market size, and scoring viability. Use variables for {{idea}}, {{target_audience}},
   and {{problem_statement}}. Give it tools for web search and competitor analysis.

2. landing-page-writer: Generates compelling headlines, value props, and CTAs
   based on {{idea}}, {{target_audience}}, and {{unique_value_prop}}.
   Give it tools for copy generation and A/B test suggestions.

3. tech-stack-advisor: Recommends frameworks, databases, and hosting based on
   {{expected_users}}, {{budget}}, and {{team_expertise}}. Give it a tool for
   stack recommendations.

Put them in a new project called side-project-launcher.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What the assistant creates
&lt;/h3&gt;

&lt;p&gt;The assistant uses several skills automatically:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;aiconfig-projects&lt;/strong&gt;: creates the LaunchDarkly project&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;aiconfig-create&lt;/strong&gt;: builds each agent configuration with variables&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;aiconfig-tools&lt;/strong&gt;: defines tools for function calling&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Expected output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Creating project: side-project-launcher
Creating AI Config: idea-validator
  - Model: anthropic.claude-sonnet-4-20250514
  - Variables: idea, target_audience, problem_statement
  - Instructions: "Validate the idea: {{idea}}. Research competitors targeting
    {{target_audience}} who have {{problem_statement}}..."
  - Tools: web_search, competitor_analysis
Creating AI Config: landing-page-writer
  - Model: anthropic.claude-sonnet-4-20250514
  - Variables: idea, target_audience, unique_value_prop
  - Instructions: "Write landing page copy for {{idea}}. The target audience is
    {{target_audience}}. Lead with: {{unique_value_prop}}..."
  - Tools: generate_copy, suggest_ab_tests
Creating AI Config: tech-stack-advisor
  - Model: anthropic.claude-sonnet-4-20250514
  - Variables: expected_users, budget, team_expertise
  - Instructions: "Recommend a tech stack for {{expected_users}} users,
    {{budget}} budget, team knows {{team_expertise}}..."
  - Tools: recommend_stack

Done! View your project:
https://app.launchdarkly.com/side-project-launcher/production/ai-configs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7zwljgc6ooz3fzc0snuw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7zwljgc6ooz3fzc0snuw.png" alt="Claude Code showing created AI Configs with models, tools, variables, and SDK keys" width="800" height="398"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;The variables (&lt;code&gt;{{idea}}&lt;/code&gt;, &lt;code&gt;{{target_audience}}&lt;/code&gt;, etc.) get filled in at runtime when you call the SDK. That's how each user gets personalized output.&lt;/p&gt;

&lt;h3&gt;
  
  
  What it looks like in LaunchDarkly
&lt;/h3&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3lwvrhu8ohvhb8vpmdzo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3lwvrhu8ohvhb8vpmdzo.png" alt="AI Configs list in LaunchDarkly showing the three agents: idea-validator, landing-page-writer, and tech-stack-advisor" width="800" height="383"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;After creation, your LaunchDarkly project contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;3 AI Configs&lt;/strong&gt; with instructions, model settings, and variables&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3 tools&lt;/strong&gt; with parameter definitions ready for function calling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Default targeting&lt;/strong&gt; serving the configuration to all users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7529cg1l6uqzl76o1pga.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7529cg1l6uqzl76o1pga.png" alt="Default targeting settings showing the configuration served to all users" width="800" height="380"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;Each agent has its own configuration with instructions, variables, and tools. Here's the idea-validator:&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0l6epb3hyxl99v4nxb3t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0l6epb3hyxl99v4nxb3t.png" alt="Idea validator AI Config showing instructions, model settings, and variables" width="800" height="382"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;The landing-page-writer and tech-stack-advisor follow the same pattern with their own instructions and tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Run the Side Project Launcher
&lt;/h2&gt;

&lt;p&gt;The full working code is available on GitHub: &lt;a href="https://github.com/launchdarkly-labs/side-project-researcher" rel="noopener noreferrer"&gt;launchdarkly-labs/side-project-researcher&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Clone it and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/launchdarkly-labs/side-project-researcher.git
&lt;span class="nb"&gt;cd &lt;/span&gt;side-project-researcher
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;span class="c"&gt;# Edit .env with your SDK key and Anthropic API key&lt;/span&gt;
python side_project_launcher_langgraph.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll need both the LaunchDarkly SDK key (from your project's SDK settings) and your Anthropic API key in the &lt;code&gt;.env&lt;/code&gt; file. The assistant can surface the SDK key from your project details, but store it in &lt;code&gt;.env&lt;/code&gt; rather than hardcoding it.&lt;/p&gt;

&lt;p&gt;The app prompts you for your idea details:&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb7cmgr0323vzctt81xbs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb7cmgr0323vzctt81xbs.png" alt="Terminal prompts asking for idea, target audience, problem statement, and tech requirements" width="800" height="492"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;Then each agent runs in sequence, fetching its config from LaunchDarkly and generating output:&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzoo6uf51saa5g5s0qbhd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzoo6uf51saa5g5s0qbhd.png" alt="Idea validator agent output with market analysis and viability score" width="800" height="684"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa2syn2nzia5ivpisdspq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa2syn2nzia5ivpisdspq.png" alt="Tech stack advisor output recommending frameworks and infrastructure" width="800" height="714"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Connect to your framework
&lt;/h2&gt;

&lt;p&gt;The AI Config stores your model, instructions, and tools. The SDK fetches the config and handles variable substitution automatically.&lt;/p&gt;



&lt;p&gt;The snippets below show the integration pattern. They omit imports, error handling, and tool wiring for brevity. For complete, runnable code, use the &lt;a href="https://github.com/launchdarkly-labs/side-project-researcher" rel="noopener noreferrer"&gt;sample repo&lt;/a&gt;.&lt;/p&gt;



&lt;h3&gt;
  
  
  Initialize the SDK
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ldclient.config&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Config&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ldai.client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LDAIClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AIAgentConfigDefault&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize once at startup
&lt;/span&gt;&lt;span class="n"&gt;SDK_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;LAUNCHDARKLY_SDK_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SDK_KEY&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;ld_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;ai_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LDAIClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Fetch agent configs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Build LaunchDarkly context for targeting.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;builder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_agent_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;variables&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get agent-mode AI Config from LaunchDarkly.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;fallback&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AIAgentConfigDefault&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enabled&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fallback&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;variables&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Wire it to LangGraph
&lt;/h3&gt;

&lt;p&gt;LangGraph orchestrates multi-agent workflows as a graph of nodes, but you can use any orchestrator—CrewAI, LlamaIndex, Bedrock AgentCore, or custom code. To compare options, read &lt;a href="https://dev.to/tutorials/ai-orchestrators"&gt;Compare AI orchestrators&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;By wiring AI Configs to each node, your agents fetch their model, instructions, and tools dynamically from LaunchDarkly. This lets you swap models within a provider (e.g., Sonnet to Haiku), update prompts, or disable agents without redeploying.&lt;/p&gt;



&lt;p&gt;The AI Config defines tool schemas, but your code must implement the actual tool handlers. The sample repo shows how to bind &lt;code&gt;config.tools&lt;/code&gt; to LangChain tool functions. For this tutorial, the tools are defined but not wired—the agents respond based on their instructions alone.&lt;/p&gt;



&lt;p&gt;Each agent becomes a node in your graph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatAnthropic&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SystemMessage&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;idea_validator_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SideProjectState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;SideProjectState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_agent_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idea-validator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idea&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idea&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;target_audience&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;target_audience&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;problem_statement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;problem_statement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;enabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatAnthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please validate this idea and provide your analysis.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idea_validation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track_success&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Track metrics
&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;

&lt;span class="c1"&gt;# Build the graph
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SideProjectState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;validate_idea&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;idea_validator_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;write_landing_page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;landing_page_writer_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recommend_stack&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tech_stack_advisor_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_entry_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;validate_idea&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;validate_idea&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;write_landing_page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;write_landing_page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recommend_stack&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recommend_stack&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Don't forget to flush before exiting
&lt;/span&gt;&lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To see a full example running across LangGraph, Strands, and OpenAI Swarm, read &lt;a href="https://launchdarkly.com/docs/tutorials/ai-orchestrators" rel="noopener noreferrer"&gt;Compare AI orchestrators&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you can do next
&lt;/h2&gt;

&lt;p&gt;Once your agents are in LaunchDarkly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A/B test variations&lt;/strong&gt;: split traffic between prompt variations or model sizes (e.g., Sonnet vs Haiku) to see which performs better&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Target by segment&lt;/strong&gt;: premium users get one variation, free users get another&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kill switch&lt;/strong&gt;: disable a misbehaving agent instantly from the UI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Track costs&lt;/strong&gt;: monitor tokens and latency per variation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To learn more about targeting and experimentation, read &lt;a href="https://launchdarkly.com/docs/tutorials/ai-configs-best-practices" rel="noopener noreferrer"&gt;AI Configs Best Practices&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Skills installed but not working&lt;/strong&gt;: Restart your editor after installing skills. They load on startup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Permission denied" errors&lt;/strong&gt;: Check that your API token has &lt;code&gt;createProject&lt;/code&gt; and &lt;code&gt;createAIConfig&lt;/code&gt; permissions. The &lt;code&gt;writer&lt;/code&gt; role includes both.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Config comes back disabled&lt;/strong&gt;: Your targeting rules may not match the context you're passing. Check that default targeting is enabled, or that your context attributes match your rules.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools defined but not executing&lt;/strong&gt;: The AI Config defines tool schemas, but your code must implement handlers. See the sample repo for tool binding examples.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can't find SDK key&lt;/strong&gt;: After Agent Skills creates your project, find the SDK key in your project's &lt;strong&gt;Settings &amp;gt; Environments &amp;gt; SDK key&lt;/strong&gt;. Copy it to your &lt;code&gt;.env&lt;/code&gt; file.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Do I need Claude Code, or does this work in Cursor/Windsurf?
&lt;/h3&gt;

&lt;p&gt;Agent Skills work in any editor that supports the &lt;a href="https://github.com/anthropics/skills/blob/main/spec/agent-skills-spec.md" rel="noopener noreferrer"&gt;Agent Skills specification&lt;/a&gt;. This includes Claude Code, Cursor, and Windsurf. The installation process is the same.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the difference between Agent Skills and the MCP server?
&lt;/h3&gt;

&lt;p&gt;Both give your AI assistant access to LaunchDarkly. Agent Skills are text-based playbooks that teach the assistant workflows. The MCP server exposes LaunchDarkly's API as tools. You can use either or both.&lt;/p&gt;

&lt;h3&gt;
  
  
  What permissions does my API token need?
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;writer&lt;/code&gt; role works, or use a custom role with &lt;code&gt;createProject&lt;/code&gt; and &lt;code&gt;createAIConfig&lt;/code&gt; permissions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where do I see the created AI Configs?
&lt;/h3&gt;

&lt;p&gt;In the LaunchDarkly UI: go to your project, then &lt;strong&gt;AI Configs&lt;/strong&gt; in the left sidebar. Each config shows its instructions, model, tools, and targeting rules.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I delete or reset generated configs?
&lt;/h3&gt;

&lt;p&gt;In the LaunchDarkly UI, open the AI Config and click &lt;strong&gt;Archive&lt;/strong&gt; (or &lt;strong&gt;Delete&lt;/strong&gt; if available). Or ask the assistant: "Delete the AI Config called researcher-agent in project valentines-day."&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use this with frameworks other than LangGraph?
&lt;/h3&gt;

&lt;p&gt;Yes. The SDK returns model name, instructions, and tools as data. You wire that into whatever framework you use: CrewAI, LlamaIndex, Bedrock AgentCore, or custom code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does this work for completion mode (chat) or just agent mode?
&lt;/h3&gt;

&lt;p&gt;Both. Use &lt;code&gt;ai_client.completion_config()&lt;/code&gt; for completion mode (chat with message arrays) or &lt;code&gt;ai_client.agent_config()&lt;/code&gt; for agent mode (instructions for multi-step workflows). To learn more, read &lt;a href="https://launchdarkly.com/docs/tutorials/agent-vs-completion" rel="noopener noreferrer"&gt;Agent mode vs completion mode&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Read the &lt;a href="https://launchdarkly.com/docs/sdk/ai" rel="noopener noreferrer"&gt;Python AI SDK Reference&lt;/a&gt; for detailed SDK usage&lt;/li&gt;
&lt;li&gt;Try &lt;a href="https://launchdarkly.com/docs/tutorials/data-extraction-pipeline" rel="noopener noreferrer"&gt;building a data extraction pipeline&lt;/a&gt; to deploy AI Configs with Vercel&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>agentskills</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>Evaluate LLM code generation with LLM-as-judge evaluators</title>
      <dc:creator>Scarlett Attensil</dc:creator>
      <pubDate>Thu, 26 Mar 2026 16:58:55 +0000</pubDate>
      <link>https://dev.to/launchdarkly/evaluate-llm-code-generation-with-llm-as-judge-evaluators-3epi</link>
      <guid>https://dev.to/launchdarkly/evaluate-llm-code-generation-with-llm-as-judge-evaluators-3epi</guid>
      <description>&lt;p&gt;Which AI model writes the best code for your codebase? Not "best" in general, but best for your security requirements, your API schemas, and your team's blind spots.&lt;/p&gt;

&lt;p&gt;This tutorial shows you how to score every code generation response against custom criteria you define. You'll set up custom judges that check for the vulnerabilities you actually care about, validate against your real API conventions, and flag the scope creep patterns your team keeps running into. After a few weeks of data, you'll have evidence to choose which model to use for which tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you will build
&lt;/h2&gt;

&lt;p&gt;In this tutorial you build a proxy server that routes Claude Code requests through LaunchDarkly. You can forward requests to any model: Anthropic, OpenAI, Mistral, or local Ollama instances. Every response gets scored by custom judges you create.&lt;/p&gt;

&lt;p&gt;You will build three judges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;: Checks for SQL injection, XSS, hardcoded secrets, and the specific vulnerabilities you care about&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API contract&lt;/strong&gt;: Validates code against your schema conventions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimal change&lt;/strong&gt;: Flags scope creep and unnecessary modifications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After setup, you use Claude Code normally, and scores flow to the LaunchDarkly Monitoring dashboard automatically. Over time, you build a dataset grounded in your actual usage: maybe Sonnet scores consistently higher on security, but Opus handles API contract adherence better on complex endpoints. That's the kind of answer a generic benchmark can't give you.&lt;/p&gt;

&lt;p&gt;To learn more, read &lt;a href="https://launchdarkly.com/docs/home/ai-configs/online-evaluations" rel="noopener noreferrer"&gt;Online evaluations&lt;/a&gt; or watch the &lt;a href="https://launchdarkly.com/docs/tutorials/videos/introducing-judges" rel="noopener noreferrer"&gt;Introducing Judges video tutorial&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;LaunchDarkly account with AI Configs enabled&lt;/li&gt;
&lt;li&gt;Python 3.9+&lt;/li&gt;
&lt;li&gt;LaunchDarkly Python AI SDK v0.14.0+ (&lt;code&gt;launchdarkly-server-sdk-ai&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;API keys for your model providers&lt;/li&gt;
&lt;li&gt;Claude Code installed&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How the proxy works
&lt;/h2&gt;

&lt;p&gt;This proxy implements a minimal Anthropic Messages-style gateway for text-only code generation and automatic quality scoring.&lt;/p&gt;

&lt;p&gt;When Claude Code sends a request to &lt;code&gt;POST /v1/messages&lt;/code&gt;, the proxy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Extracts text-only prompts.&lt;/strong&gt; It converts the Anthropic Messages body into LaunchDarkly &lt;code&gt;LDMessage&lt;/code&gt;s, keeping only text content. It ignores tool blocks, images, and other non-text content.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Routes the request through LaunchDarkly AI Configs.&lt;/strong&gt; The proxy creates a context with a &lt;code&gt;selectedModel&lt;/code&gt; attribute. Your model-selector AI Config uses targeting rules on this attribute to pick the right model variation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Invokes the model and triggers judges.&lt;/strong&gt; The proxy calls &lt;code&gt;chat.invoke()&lt;/code&gt;. If the selected variation has judges attached, the SDK schedules judge evaluations automatically based on your sampling rate. Scores flow to LaunchDarkly Monitoring.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Returns a standard Messages response.&lt;/strong&gt; The proxy sends back the assistant response as a single text block, plus basic token usage if available.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Claude Code talks to a local &lt;code&gt;/v1/messages&lt;/code&gt; endpoint. LaunchDarkly handles model selection and online evaluations behind the scenes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create the AI Config and judges
&lt;/h2&gt;

&lt;p&gt;You can use the LaunchDarkly dashboard or Claude Code with &lt;a href="https://launchdarkly.com/docs/tutorials/agent-skills-quickstart" rel="noopener noreferrer"&gt;agent skills&lt;/a&gt;. Agent skills are faster if you have them installed.&lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Option A: Agent skills
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Create the project:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/aiconfig-projects Create a project called "custom-evals-claude-code"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Create the model selector:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/aiconfig-create

Create a completion mode AI Config:
- Key: model-selector
- Name: Model Selector
- Project: custom-evals-claude-code

Three variations (empty messages, this is a router):
1. "sonnet" - Anthropic claude-sonnet-4-6
2. "opus" - Anthropic claude-opus-4-6
3. "mistral" - Mistral mistral-large@2407
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Create the security judge:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/aiconfig-create

Create a judge AI Config with:
- Key: security-judge
- Name: Security Judge
- Project: custom-evals-claude-code
- Evaluation metric key: $ld:ai:judge:security

System prompt:
"You are a security auditor evaluating AI-generated code for vulnerabilities.

Analyze the assistant's response and score it from 0.0 to 1.0:

SCORING CRITERIA:
- 1.0: No security issues detected. Code follows security best practices.
- 0.7-0.9: Minor issues that pose low risk.
- 0.4-0.6: Moderate issues requiring attention.
- 0.1-0.3: Serious vulnerabilities present (SQL injection, XSS, command injection).
- 0.0: Critical vulnerabilities that could lead to immediate compromise.

CHECK FOR:
- Injection flaws (SQL, command, LDAP)
- Cross-site scripting (XSS)
- Hardcoded secrets or credentials
- Insecure file operations
- Missing input validation

If no code is present, return 1.0."

Use model gpt-5-mini with temperature 0.3.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Create the API contract judge&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/aiconfig-create

Create a judge AI Config with:
- Key: api-contract-judge
- Name: API Contract Adherence
- Project: custom-evals-claude-code
- Evaluation metric key: $ld:ai:judge:api-contract-adherence

System prompt:
"You are an API contract auditor. Evaluate whether AI-generated code adheres to the API schema.

SCORING CRITERIA:
- 1.0: Code fully complies with expected patterns.
- 0.5: Partial adherence with minor deviations.
- 0.0: Invalid format or significant violations.

If no API code is present, return 1.0."

Use model gpt-5-mini with temperature 0.3.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Create the minimal change judge&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/aiconfig-create

Create a judge AI Config with:
- Key: minimal-change-judge
- Name: Minimal Change Judge
- Project: custom-evals-claude-code
- Evaluation metric key: $ld:ai:judge:minimal-change

System prompt:
"You are a code review auditor focused on change scope. Evaluate whether the AI assistant made only necessary changes.

SCORING CRITERIA:
- 1.0: Changes are precisely scoped to the request. No unnecessary modifications.
- 0.5: Some unnecessary additions (reformatting unrelated code, extra comments).
- 0.0: Significant scope creep (rewriting large sections, architectural changes not requested).

FLAG THESE UNNECESSARY CHANGES:
- Reformatting code not part of the request
- Adding type annotations to unchanged functions
- Inserting unrequested comments or docstrings
- Renaming variables outside the scope of the fix

If no code changes present, return 1.0."

Use model gpt-5-mini with temperature 0.3.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Attach judges to the model selector:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/aiconfig-online-evals

Attach to all model-selector variations at 100% sampling:
- security-judge
- api-contract-judge
- minimal-change-judge
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Set up targeting:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For each AI Config, go to the &lt;strong&gt;Targeting&lt;/strong&gt; tab and edit the default rule to serve the variation you created. For the model selector, also add rules that match the &lt;code&gt;selectedModel&lt;/code&gt; context attribute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/aiconfig-targeting

For each judge (security-judge, api-contract-judge, minimal-change-judge):
- Set the default rule to serve the variation you created

For model-selector:
- Rule: if selectedModel contains "sonnet", serve Sonnet variation
- Rule: if selectedModel contains "mistral", serve Mistral variation
- Default rule: Opus variation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the proxy sends &lt;code&gt;selectedModel: "sonnet"&lt;/code&gt;, LaunchDarkly returns the Sonnet variation. To learn more, read &lt;a href="https://launchdarkly.com/docs/home/ai-configs/target" rel="noopener noreferrer"&gt;Target with AI Configs&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option B: LaunchDarkly dashboard
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Create the model selector config&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;strong&gt;AI Configs&lt;/strong&gt; and click &lt;strong&gt;Create AI Config&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Set the mode to &lt;strong&gt;Completion&lt;/strong&gt;, the key to &lt;code&gt;model-selector&lt;/code&gt;, and name it "Model Selector".&lt;/li&gt;
&lt;li&gt;Add three variations with empty messages (this config acts as a router):

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sonnet&lt;/strong&gt; (key: &lt;code&gt;sonnet&lt;/code&gt;) using &lt;code&gt;claude-sonnet-4-6&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Opus&lt;/strong&gt; (key: &lt;code&gt;opus&lt;/code&gt;) using &lt;code&gt;claude-opus-4-6&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mistral&lt;/strong&gt; (key: &lt;code&gt;mistral&lt;/code&gt;) using &lt;code&gt;mistral-large@2407&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ubvc33pk2u4f4zn3xuj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ubvc33pk2u4f4zn3xuj.png" alt="Model Selector AI Config showing three variations: Sonnet, Opus, and Mistral with their corresponding model names." width="800" height="257"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Create the judge AI Configs&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click &lt;strong&gt;Create AI Config&lt;/strong&gt; and set the mode to &lt;strong&gt;Judge&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Set the key (for example, &lt;code&gt;security-judge&lt;/code&gt;) and name (for example, "Security Judge").&lt;/li&gt;
&lt;li&gt;Set the &lt;strong&gt;Event key&lt;/strong&gt; to the metric you want to track (for example, &lt;code&gt;$ld:ai:judge:security&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Add the system prompt with scoring criteria from the prompts in Option A.&lt;/li&gt;
&lt;li&gt;Set the model to &lt;code&gt;gpt-5-mini&lt;/code&gt; with temperature &lt;code&gt;0.3&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Repeat for each judge: security, API contract adherence, and minimal change.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6rpm5eemm4nvbh7v3bi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6rpm5eemm4nvbh7v3bi.png" alt="Judge AI Config creation form showing mode set to Judge, event key field, system prompt with scoring criteria, and model configuration." width="800" height="328"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Attach judges to the model selector&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open the &lt;strong&gt;Model Selector&lt;/strong&gt; AI Config and go to the &lt;strong&gt;Variations&lt;/strong&gt; tab.&lt;/li&gt;
&lt;li&gt;Expand a variation (for example, Sonnet) and find the &lt;strong&gt;Judges&lt;/strong&gt; section.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Attach judges&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;br&gt;
  ![Model Selector variation expanded showing the Judges section with an Attach judges button.]&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fri8v35gnzhtup0z443j3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fri8v35gnzhtup0z443j3.png" alt=" " width="800" height="436"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Select the judges you created and set the sampling percentage to 100%.&lt;/li&gt;
&lt;li&gt;Repeat for each variation.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5twcfzwiwwvzb79o5zf2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5twcfzwiwwvzb79o5zf2.png" alt="Judge selection dropdown showing available judges with checkboxes, event keys, and sampling percentage fields." width="800" height="355"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Configure targeting rules&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to the &lt;strong&gt;Targeting&lt;/strong&gt; tab for the Model Selector.&lt;/li&gt;
&lt;li&gt;Add rules to route requests based on the &lt;code&gt;selectedModel&lt;/code&gt; context attribute:

&lt;ul&gt;
&lt;li&gt;If &lt;code&gt;selectedModel&lt;/code&gt; is &lt;code&gt;mistral&lt;/code&gt;, serve the Mistral variation&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;selectedModel&lt;/code&gt; is &lt;code&gt;sonnet&lt;/code&gt;, serve the Sonnet variation&lt;/li&gt;
&lt;li&gt;Default rule: serve Opus&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;For each judge, set the default rule to serve the variation you created.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyvsnxn961okg6mqxbrpw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyvsnxn961okg6mqxbrpw.png" alt="Targeting tab showing rules that route selectedModel values to the corresponding variations, with Opus as the default." width="800" height="475"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;To learn more, read &lt;a href="https://launchdarkly.com/docs/home/ai-configs/custom-judges" rel="noopener noreferrer"&gt;Custom judges&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verify your setup
&lt;/h2&gt;

&lt;p&gt;Before running the proxy, confirm in the dashboard:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Model selector&lt;/strong&gt;: Each variation shows three attached judges.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Judges&lt;/strong&gt;: Each judge prompt includes scoring criteria.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Targeting&lt;/strong&gt;: All AI Configs have targeting enabled with correct rules.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Set up the project
&lt;/h2&gt;

&lt;p&gt;Create a directory and install dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;custom-evals &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;custom-evals
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
pip &lt;span class="nb"&gt;install &lt;/span&gt;fastapi uvicorn launchdarkly-server-sdk launchdarkly-server-sdk-ai &lt;span class="se"&gt;\&lt;/span&gt;
    launchdarkly-server-sdk-ai-langchain langchain-anthropic python-dotenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;LD_SDK_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sdk-your-sdk-key-here
&lt;span class="nv"&gt;LD_AI_CONFIG_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;model-selector
&lt;span class="nv"&gt;MODEL_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sonnet
&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-ant-your-key-here
&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-your-key-here
&lt;span class="nv"&gt;PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;9911
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Build the proxy server
&lt;/h2&gt;

&lt;p&gt;Create &lt;code&gt;server.py&lt;/code&gt; with the following code.&lt;/p&gt;

&lt;p&gt;Click to expand the complete proxy server code&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Proxy server for Claude Code with automatic quality scoring.

Routes requests through LaunchDarkly AI Configs and scores every response
with attached judges. Metrics flow to the LaunchDarkly Monitoring dashboard.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ldai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AICompletionConfigDefault&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LDAIClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LDMessage&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi.responses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;JSONResponse&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uvicorn&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;LD_SDK_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LD_SDK_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;LD_AI_CONFIG_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LD_AI_CONFIG_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model-selector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;PORT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PORT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9911&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;LD_SDK_KEY&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Missing LD_SDK_KEY environment variable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;LOG_LEVEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LOG_LEVEL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INFO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basicConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LOG_LEVEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INFO&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;ld_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LD_SDK_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ld_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ld_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_initialized&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LaunchDarkly client failed to initialize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;ai_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LDAIClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# =============================================================================
# Message Conversion
# =============================================================================
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Extract plain text from Anthropic-style content.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;texts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;convert_to_ld_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;LDMessage&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Convert Anthropic Messages API format to LDMessage format.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="n"&gt;system&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;system_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;system&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;LDMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;system_text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
        &lt;span class="n"&gt;role_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;role_str&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;LDMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;extract_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;))))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;

&lt;span class="c1"&gt;# =============================================================================
# Routes
# =============================================================================
&lt;/span&gt;
&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/v1/messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Main endpoint using chat.invoke() for automatic judge execution.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;user_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x-ld-user-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-code-local&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Build context with selectedModel for targeting
&lt;/span&gt;    &lt;span class="n"&gt;model_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MODEL_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;selectedModel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;fallback&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AICompletionConfigDefault&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enabled&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LD_AI_CONFIG_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fallback&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;JSONResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unavailable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI Config disabled&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
            &lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;503&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_config&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;judge_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;judge_configuration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;judges&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;judge_configuration&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[REQUEST] model=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, judges=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;judge_count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;ld_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;convert_to_ld_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ld_messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ld_messages&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="n"&gt;last_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ld_messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ld_messages&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="nc"&gt;LDMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# invoke() executes judges automatically based on sampling rate
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Await judge evaluations and log results
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;evaluations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[JUDGES] Awaiting &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;evaluations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; evaluations...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;eval_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;evaluations&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_exceptions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;eval_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[JUDGE ERROR] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[JUDGE] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_dict&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Flush events to LaunchDarkly
&lt;/span&gt;        &lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;response_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;

        &lt;span class="c1"&gt;# Get token metrics
&lt;/span&gt;        &lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="n"&gt;output_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
            &lt;span class="n"&gt;output_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[METRICS] tokens=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;JSONResponse&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;msg_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response_text&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stop_reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end_turn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;usage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;output_tokens&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Request failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;JSONResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;internal_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)}},&lt;/span&gt;
            &lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/health&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;health&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;launchdarkly&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_initialized&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;


&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/v1/messages/count_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# =============================================================================
# Main
# =============================================================================
&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Proxy running on port &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PORT&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI Config: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;LD_AI_CONFIG_KEY&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Connect: ANTHROPIC_BASE_URL=http://localhost:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PORT&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; claude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;uvicorn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;127.0.0.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PORT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log_level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;info&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Connect Claude Code to your proxy
&lt;/h2&gt;

&lt;p&gt;Start the proxy server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python server.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see output like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Proxy running on port 9911
AI Config: model-selector
Connect: ANTHROPIC_BASE_URL=http://localhost:9911 claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a new terminal, launch Claude Code with the proxy URL and your chosen model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;MODEL_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sonnet &lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:9911 claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every request now routes through your proxy. Watch the server logs to see judges executing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[REQUEST] model=claude-sonnet-4-6, judges=3
[JUDGES] Awaiting 3 evaluations...
[JUDGE] {'evals': {'security': {'score': 1.0, 'reasoning': 'No vulnerabilities detected...'}}}
[JUDGE] {'evals': {'api-contract': {'score': 0.5, 'reasoning': 'Response uses correct endpoint...'}}}
[JUDGE] {'evals': {'minimal-change': {'score': 1.0, 'reasoning': 'Changes are focused...'}}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;p&gt;The &lt;code&gt;create_chat()&lt;/code&gt; and &lt;code&gt;invoke()&lt;/code&gt; methods handle judge execution automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fallback&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# response.evaluations contains async judge tasks
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Judge results are sent to LaunchDarkly automatically. You can optionally await &lt;code&gt;response.evaluations&lt;/code&gt; to log results locally.&lt;/p&gt;





&lt;p&gt;This proxy handles text-based conversations. Tool-based features like file editing and command execution won't work through this proxy.&lt;/p&gt;



&lt;h2&gt;
  
  
  How model routing works
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;MODEL_KEY&lt;/code&gt; environment variable controls which model handles requests. The proxy passes it as a &lt;code&gt;selectedModel&lt;/code&gt; context attribute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_key&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;selectedModel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_key&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your targeting rules match this attribute and return the corresponding variation. Switch models by changing the environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;MODEL_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mistral &lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:9911 claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Compare cloud and local models
&lt;/h2&gt;

&lt;p&gt;To evaluate Ollama models against cloud providers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add an "ollama" variation to your model-selector AI Config.&lt;/li&gt;
&lt;li&gt;Add a targeting rule for &lt;code&gt;selectedModel&lt;/code&gt; equals "ollama".&lt;/li&gt;
&lt;li&gt;Launch with &lt;code&gt;MODEL_KEY=ollama&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your custom judges score Claude Sonnet and Llama 3.2 with identical criteria. After enough requests, you can compare quality scores across providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Run experiments
&lt;/h2&gt;

&lt;p&gt;After judges are producing scores, you can compare models statistically. Create two variations with different models, attach the same judges, and set up a percentage rollout to split traffic.&lt;/p&gt;

&lt;p&gt;Your judge metrics appear as goals in LaunchDarkly Experimentation. After enough data, you can answer "Which model produces more secure code?" with confidence, not guesswork.&lt;/p&gt;

&lt;p&gt;To learn more, read &lt;a href="https://launchdarkly.com/docs/home/ai-configs/experimentation" rel="noopener noreferrer"&gt;Experimentation with AI Configs&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Monitor quality over time
&lt;/h2&gt;

&lt;p&gt;Judge scores appear on your AI Config's &lt;strong&gt;Monitoring&lt;/strong&gt; tab. To view evaluation metrics:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open your model-selector AI Config and go to the &lt;strong&gt;Monitoring&lt;/strong&gt; tab.&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;Evaluator metrics&lt;/strong&gt; from the dropdown menu.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;![Select Evaluator metrics from the dropdown]&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgoxt1pg5p7h2bb9tu99f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgoxt1pg5p7h2bb9tu99f.png" alt=" " width="800" height="277"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Each judge (security, API contract, minimal change) shows as a separate chart. Hover over a chart to see scores broken down by variation.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5ce0xtt4noledt670lu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5ce0xtt4noledt670lu.png" alt="Security judge scores over time" width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnomfk4ie49zg5tijlorq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnomfk4ie49zg5tijlorq.png" alt="API contract adherence scores" width="800" height="380"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbsed70rmefcd6jnumeun.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbsed70rmefcd6jnumeun.png" alt="Minimal change judge scores" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;To drill into a specific model's evaluations, select the variation from the bottom menu.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7qje8qq3ebjokpwnm0x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7qje8qq3ebjokpwnm0x.png" alt="Select a variation to see its evaluations" width="520" height="370"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Watch for baseline patterns in the first week, then track regressions after model updates or prompt changes. Model providers ship updates without notice. A Claude update might improve reasoning but introduce patterns that fail your API contract checks. Set up alerts when scores drop below thresholds, and use &lt;a href="https://launchdarkly.com/docs/home/releases/guarded-rollouts" rel="noopener noreferrer"&gt;guarded rollouts&lt;/a&gt; for automatic protection.&lt;/p&gt;

&lt;p&gt;To learn more, read &lt;a href="https://launchdarkly.com/docs/home/ai-configs/monitor" rel="noopener noreferrer"&gt;Monitor AI Configs&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Control costs with sampling
&lt;/h2&gt;

&lt;p&gt;Each judge evaluation is an LLM call. Control costs by adjusting sampling rates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Staging&lt;/strong&gt;: 100% sampling to catch issues early&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production&lt;/strong&gt;: 10-25% sampling for cost efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can also use cheaper models (GPT-4o mini) for staging and more capable models for production.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you learned
&lt;/h2&gt;

&lt;p&gt;The value is in the judges you create. The three in this tutorial cover security, API compliance, and scope discipline. Your team might care about different signals: documentation quality, test coverage, or adherence to internal coding standards.&lt;/p&gt;

&lt;p&gt;Custom judges let you define quality for your codebase, apply the same evaluation criteria across models, and track trends over time. Once you create a judge, you can attach it to any AI Config in your project.&lt;/p&gt;



&lt;p&gt;Ready to build custom judges for your codebase? &lt;a href="https://launchdarkly.com/start-trial/" rel="noopener noreferrer"&gt;Start your 14-day free trial&lt;/a&gt; and deploy your first evaluation today.&lt;/p&gt;



&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/launchdarkly/hello-python-ai/tree/main/examples" rel="noopener noreferrer"&gt;hello-python-ai examples&lt;/a&gt; for more judge patterns&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/tutorials/ai-configs-best-practices" rel="noopener noreferrer"&gt;AI Configs best practices&lt;/a&gt; for production patterns&lt;/li&gt;
&lt;/ul&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;The &lt;code&gt;/aiconfig-online-evals&lt;/code&gt; and &lt;code&gt;/aiconfig-targeting&lt;/code&gt; skills are not yet available. Use the dashboard to complete those steps. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>evals</category>
      <category>llm</category>
      <category>agents</category>
    </item>
    <item>
      <title>Beyond n8n for Workflow Automation: Agent Graphs as Your Universal Agent Harness</title>
      <dc:creator>Scarlett Attensil</dc:creator>
      <pubDate>Thu, 26 Mar 2026 00:33:34 +0000</pubDate>
      <link>https://dev.to/launchdarkly/beyond-n8n-for-workflow-automation-agent-graphs-as-your-universal-agent-harness-4lic</link>
      <guid>https://dev.to/launchdarkly/beyond-n8n-for-workflow-automation-agent-graphs-as-your-universal-agent-harness-4lic</guid>
      <description>&lt;p&gt;Hardcoded multi-agent orchestration is brittle: topology lives in framework-specific code, changes require redeploys, and bottlenecks are hard to see. &lt;a href="https://launchdarkly.com/docs/home/ai-configs/agent-graphs" rel="noopener noreferrer"&gt;Agent Graphs&lt;/a&gt; externalize that topology into LaunchDarkly, while your application continues to own execution.&lt;/p&gt;

&lt;p&gt;In this tutorial, you'll build a small multi-agent workflow, traverse it with the SDK, monitor per-node latency on the graph itself, and update a slow node's model without changing application code.&lt;/p&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Node&lt;/strong&gt; = AI Config (model, instructions, tools)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge&lt;/strong&gt; = handoff metadata (routing contract you define)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph&lt;/strong&gt; = topology (which nodes connect)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your app&lt;/strong&gt; = execution + interpretation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LaunchDarkly provides graph structure, config, and observability. Your application owns execution semantics: you write the code that interprets edges and runs agents.&lt;/p&gt;



&lt;p&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnfxzwj0bgo8ln73o0oux.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnfxzwj0bgo8ln73o0oux.png" alt="Agent Graph with monitoring" width="800" height="532"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  What You'll Build
&lt;/h2&gt;

&lt;p&gt;In this tutorial, you'll add Agent Graphs to an existing multi-agent workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Build a graph visually&lt;/strong&gt; in the LaunchDarkly UI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connect it to your code&lt;/strong&gt; with a few lines of SDK integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run your agents&lt;/strong&gt; and see the graph in action&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor performance&lt;/strong&gt; with per-node latency and invocation tracking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix a slow agent&lt;/strong&gt; by swapping models from the dashboard&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By the end, you'll have a multi-agent system where topology metadata changes happen in the UI, picked up by your traversal code on the next request.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;LaunchDarkly account with AI Configs access (&lt;a href="https://app.launchdarkly.com/signup" rel="noopener noreferrer"&gt;sign up here&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Python 3.9+&lt;/li&gt;
&lt;li&gt;An existing agent workflow (or use our &lt;a href="https://github.com/launchdarkly-labs/devrel-agents-tutorial/tree/tutorial/agent-graphs" rel="noopener noreferrer"&gt;sample repo&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Problem with Hardcoded Orchestration
&lt;/h2&gt;

&lt;p&gt;Every multi-agent framework handles orchestration differently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# LangGraph - topology hardcoded in graph setup
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;supervisor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;supervisor_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;security&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;security_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;support&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;support_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_entry_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;supervisor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Routing logic buried in node functions or conditional edges
&lt;/span&gt;
&lt;span class="c1"&gt;# OpenAI Agents SDK - handoffs defined per agent
&lt;/span&gt;&lt;span class="n"&gt;security_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Security&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;support_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Support&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;supervisor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Supervisor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;handoffs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;security_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;support_agent&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Topology locked in code
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The topology is scattered across code. Agent Graphs make it visible: you see the entire workflow in one view, edit connections in the UI, and traverse it with graph-aware SDK methods.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Externalizing Topology Helps
&lt;/h2&gt;

&lt;p&gt;If you've built multi-agent systems with LangGraph, OpenAI Swarm, or Strands, you've hit these walls:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Config duplication&lt;/strong&gt;: Agent definitions scattered across framework-specific formats&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Silent failures&lt;/strong&gt;: An agent times out and you don't know until users complain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No topology visibility&lt;/strong&gt;: The workflow exists only in code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom observability&lt;/strong&gt;: Getting consistent per-agent metrics means reconciling different trace formats and data schemas across frameworks&lt;/li&gt;
&lt;/ul&gt;



&lt;p&gt;For a detailed comparison of LangGraph, OpenAI Swarm, and Strands, see &lt;a href="https://launchdarkly.com/docs/tutorials/ai-orchestrators" rel="noopener noreferrer"&gt;Compare AI orchestrators&lt;/a&gt;. Agent Graphs work with multiple agent frameworks.&lt;/p&gt;



&lt;p&gt;Agent Graphs solve these by giving you a &lt;strong&gt;visual graph builder&lt;/strong&gt; where you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;See your entire workflow&lt;/strong&gt; at a glance, not buried in code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor per-node metrics&lt;/strong&gt; overlaid directly on the graph (latency, invocations, tool calls)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add or remove agents&lt;/strong&gt; without changing traversal logic, provided your runtime supports the node's tools and output contract&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inspect routing logic&lt;/strong&gt; on edges, with handoff data visible in the UI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use graph-aware SDK methods&lt;/strong&gt; like &lt;code&gt;is_terminal()&lt;/code&gt;, &lt;code&gt;is_root()&lt;/code&gt;, and &lt;code&gt;get_edges()&lt;/code&gt; instead of manual tracking&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Create AI Configs for Your Agents
&lt;/h2&gt;

&lt;p&gt;Before building a graph, you need AI Configs for each agent. If you already have AI Configs, skip to Step 2.&lt;/p&gt;



&lt;p&gt;See the &lt;a href="https://launchdarkly.com/docs/home/ai-configs/quickstart" rel="noopener noreferrer"&gt;AI Configs quickstart&lt;/a&gt; or run the bootstrap script in our &lt;a href="https://github.com/launchdarkly-labs/devrel-agents-tutorial" rel="noopener noreferrer"&gt;sample repo&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/launchdarkly-labs/devrel-agents-tutorial
&lt;span class="nb"&gt;cd &lt;/span&gt;devrel-agents-tutorial
git checkout tutorial/agent-graphs
uv &lt;span class="nb"&gt;sync
cp&lt;/span&gt; .env.example .env  &lt;span class="c"&gt;# Add your LD_SDK_KEY, LD_API_KEY, OPENAI_API_KEY&lt;/span&gt;
uv run python bootstrap/create_configs.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;p&gt;For this tutorial, we'll use three configs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;supervisor-agent&lt;/strong&gt;: Orchestrates the workflow and routes queries based on PII pre-screening&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;security-agent&lt;/strong&gt;: Detects and redacts personally identifiable information (PII)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;support-agent&lt;/strong&gt;: Answers questions using dynamically loaded tools (search, RAG)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 2: Build the Graph in the UI
&lt;/h2&gt;

&lt;p&gt;This is where Agent Graphs diverge from code-based orchestration. Instead of writing &lt;code&gt;add_edge()&lt;/code&gt; calls, you'll &lt;strong&gt;see your topology&lt;/strong&gt; and modify it visually.&lt;/p&gt;

&lt;p&gt;Open your LaunchDarkly dashboard and navigate to &lt;strong&gt;AI &amp;gt; Agent graphs&lt;/strong&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You'll see the first-time setup wizard. Since you already created AI Configs in Step 1, expand &lt;strong&gt;Create a graph&lt;/strong&gt; at the bottom.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fll52y5pbrbgb41r0vp9u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fll52y5pbrbgb41r0vp9u.png" alt="First-time agent graph wizard" width="800" height="791"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Name your graph &lt;code&gt;chatbot-flow&lt;/code&gt; and click &lt;strong&gt;Create graph&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8esxw38ushh1a4sip4ew.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8esxw38ushh1a4sip4ew.png" alt="Creating your first Agent Graph" width="800" height="392"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add your first node: click &lt;strong&gt;Add node&lt;/strong&gt; and select &lt;code&gt;supervisor-agent&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Set it as the root: click the node and toggle &lt;strong&gt;Root node&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Add &lt;code&gt;security-agent&lt;/code&gt; and &lt;code&gt;support-agent&lt;/code&gt; as nodes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffokor5ot4lbnbpv0xv8s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffokor5ot4lbnbpv0xv8s.png" alt="Adding security agent" width="800" height="392"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczwuk1hw3dpt37npl0bs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczwuk1hw3dpt37npl0bs.png" alt="Adding support agent" width="800" height="392"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Draw edges: drag from &lt;code&gt;supervisor-agent&lt;/code&gt; to both child agents&lt;/li&gt;
&lt;li&gt;Add handoff data to each edge to define routing logic:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;supervisor-agent → security-agent:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sanitize"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PII detected"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"route"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"security"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyp4ihvb9dcpotbd6oblo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyp4ihvb9dcpotbd6oblo.png" alt="PII detected edge" width="800" height="393"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;supervisor-agent → support-agent:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"direct"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Clean input"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"route"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"support"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa3n9zhbum2d4f1w6cgah.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa3n9zhbum2d4f1w6cgah.png" alt="Clean edge" width="800" height="390"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;security-agent → support-agent:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"proceed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Input sanitized"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"route"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"continue"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgrfa1n6xbcvgkuqvdt66.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgrfa1n6xbcvgkuqvdt66.png" alt="Redacted edge" width="800" height="392"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;Notice what you're seeing: the entire workflow topology in one view. This graph &lt;em&gt;is&lt;/em&gt; your architecture diagram, always current. Each node shows which AI Config variation it serves. The edges show routing logic that would otherwise be buried in conditional statements. When you need to add a new agent or change routing, you do it here, not in code.&lt;/p&gt;



&lt;p&gt;LaunchDarkly doesn't execute your graph. It provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Topology&lt;/strong&gt;: Which nodes exist and how they connect&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handoff metadata&lt;/strong&gt;: Whatever JSON you put on edges&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-node AI Config&lt;/strong&gt;: Model, instructions, tools for each agent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Decides which edges to follow based on agent decisions&lt;/li&gt;
&lt;li&gt;Interprets handoff data however you want (the schema is yours)&lt;/li&gt;
&lt;li&gt;Executes the actual agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The handoff JSON is arbitrary metadata. You define the schema, you interpret it. LaunchDarkly stores and delivers it.&lt;/p&gt;



&lt;h2&gt;
  
  
  Step 3: Add the SDK to Your Project
&lt;/h2&gt;

&lt;p&gt;Install the LaunchDarkly AI SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv add launchdarkly-server-sdk launchdarkly-server-sdk-ai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Initialize the clients in your code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# config_manager.py - Initialize LaunchDarkly clients
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_initialize_launchdarkly_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Initialize LaunchDarkly client and AI client&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sdk_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ld_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Block until client is initialized (max 10 seconds)
&lt;/span&gt;    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_initialized&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LaunchDarkly client initialization failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ai_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LDAIClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Build a context for targeting and tracking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# config_manager.py - Build context for targeting
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Build a LaunchDarkly context with consistent attributes.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;context_builder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;user_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;context_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;context_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Integrate with Your Framework
&lt;/h2&gt;

&lt;p&gt;This section walks through the integration code, starting with the building block (what runs at each node), then showing how nodes are orchestrated.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Generic Agent Pattern
&lt;/h3&gt;

&lt;p&gt;The key to dynamic execution is &lt;code&gt;create_generic_agent&lt;/code&gt;. Every node uses the same implementation—no agent registry, no hardcoded agent types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# agents/generic_agent.py
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_generic_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config_manager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;valid_routes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Create a generic agent from LaunchDarkly AI Config.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GenericAgent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;valid_routes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;valid_routes&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Execute the agent using LaunchDarkly config.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;enabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_skipped&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="c1"&gt;# Create model from config
&lt;/span&gt;            &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_model_for_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;config_manager&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config_manager&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# Load tools from LaunchDarkly config
&lt;/span&gt;            &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_dynamic_tools_from_launchdarkly&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# Get instructions from config
&lt;/span&gt;            &lt;span class="n"&gt;instructions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instructions&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Process the input.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

            &lt;span class="c1"&gt;# Inject route options into instructions
&lt;/span&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;valid_routes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;route_instruction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Select one of these routes: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;valid_routes&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Return: {{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;route&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;selected_route&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;}}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="n"&gt;instructions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;instructions&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;route_instruction&lt;/span&gt;

            &lt;span class="c1"&gt;# Execute and extract routing decision
&lt;/span&gt;            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;routing_decision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_extract_route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

            &lt;span class="c1"&gt;# Track metrics
&lt;/span&gt;            &lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track_success&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;GenericAgent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;p&gt;The generic agent pattern means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No agent registry&lt;/strong&gt;: Every node uses the same &lt;code&gt;create_generic_agent&lt;/code&gt; function&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Config-driven behavior&lt;/strong&gt;: Model, instructions, and tools all come from LaunchDarkly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic routing&lt;/strong&gt;: Valid routes are injected from graph edges, not hardcoded&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimal code changes&lt;/strong&gt;: Add a new agent in LaunchDarkly, create its AI Config, add it to your graph, and it works—provided your runtime supports the node's tools and output contract&lt;/li&gt;
&lt;/ul&gt;



&lt;h3&gt;
  
  
  The AgentService Class
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;AgentService&lt;/code&gt; class is the entry point for processing messages through your Agent Graph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# api/services/agent_service.py
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Multi-Agent Orchestration using LaunchDarkly Agent Graph.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config_manager&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConfigManager&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;user_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ChatResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Process message using LaunchDarkly Agent Graph.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_execute_graph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;graph_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AGENT_GRAPH_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot-flow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anonymous&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;user_context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_context&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ChatResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]),&lt;/span&gt;
            &lt;span class="c1"&gt;# ... other fields
&lt;/span&gt;        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Executing the Graph
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;_execute_graph&lt;/code&gt; method fetches the graph from LaunchDarkly and uses &lt;code&gt;traverse()&lt;/code&gt; with skip logic for conditional routing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# api/services/agent_service.py
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_execute_graph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;graph_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Execute agents using SDK&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s traverse() with skip logic.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;ld_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;agent_graph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;graph_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ld_context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_enabled&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent Graph &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;graph_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; is not enabled&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processed_input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
        &lt;span class="c1"&gt;# Skip logic: track which nodes should execute
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_routed_to&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;root&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get_key&lt;/span&gt;&lt;span class="p"&gt;()},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_prev_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;tracker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_tracker&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Define the node callback (see next section)
&lt;/span&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exec_ctx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# ... node execution logic
&lt;/span&gt;        &lt;span class="k"&gt;pass&lt;/span&gt;

    &lt;span class="c1"&gt;# Use SDK's traverse() - it handles traversal order
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;traverse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;execute_node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Track graph completion
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]))&lt;/span&gt;
        &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track_invocation_success&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Skip Logic for Conditional Routing
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;execute_node&lt;/code&gt; callback implements skip logic—the core pattern that enables conditional routing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# api/services/agent_service.py - inside _execute_graph
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exec_ctx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Execute a single node if it was routed to.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_key&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Skip logic: only execute if parent routed to this node
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;exec_ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_routed_to&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_skipped&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;exec_ctx&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Track node invocation
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track_node_invocation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;exec_ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_prev_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track_handoff_success&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exec_ctx&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_prev_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Get edges and valid routes for this node
&lt;/span&gt;    &lt;span class="n"&gt;edges&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_edges&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;valid_routes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;handoff&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;route&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;handoff&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;handoff&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;route&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

    &lt;span class="c1"&gt;# Execute agent with config from this node
&lt;/span&gt;    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_generic_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_config&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config_manager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;valid_routes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;valid_routes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_run_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exec_ctx&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Track tool calls
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tracker&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track_tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Route to next node: add to _routed_to set
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;next_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_select_next_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;next_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;exec_ctx&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_routed_to&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;next_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;exec_ctx&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_prev_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;p&gt;The &lt;code&gt;_routed_to&lt;/code&gt; set tracks which nodes should execute:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start&lt;/strong&gt;: Add root node to &lt;code&gt;_routed_to&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;traverse() visits each node&lt;/strong&gt;: If node is in &lt;code&gt;_routed_to&lt;/code&gt;, execute it; otherwise skip&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After execution&lt;/strong&gt;: Add the next node (based on routing decision) to &lt;code&gt;_routed_to&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This enables conditional routing: the supervisor routes to either security OR support, and only the chosen path executes.&lt;/p&gt;



&lt;h3&gt;
  
  
  Routing Between Nodes
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;_select_next_node&lt;/code&gt; method determines which node to route to based on the agent's routing decision:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# api/services/agent_service.py
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_select_next_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Select next node key based on routing decision.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;routing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;routing_decision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;routing_decision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="c1"&gt;# Build route map: route -&amp;gt; target_config
&lt;/span&gt;    &lt;span class="n"&gt;route_map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;edge&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;route&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;handoff&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;route&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;handoff&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;route_map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_config&lt;/span&gt;

    &lt;span class="c1"&gt;# Exact match
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;routing&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;routing&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;route_map&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;route_map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;routing&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;routing&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track_handoff_failure&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Default: first edge
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;target_config&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight: your graph topology comes from LaunchDarkly, not hardcoded orchestration. Change the graph in the UI, and your code picks up the new structure on the next request.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Run It
&lt;/h2&gt;

&lt;p&gt;With the &lt;code&gt;AgentService&lt;/code&gt; wired up (as shown in Step 4), you can now process messages through your Agent Graph. The service handles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Building the LaunchDarkly context for targeting&lt;/li&gt;
&lt;li&gt;Fetching the graph and executing nodes via &lt;code&gt;traverse()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Tracking metrics for monitoring&lt;/li&gt;
&lt;li&gt;Returning the final response&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Test it by sending a message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentService&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user-123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the status of my order?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;plan&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;premium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now go back to the LaunchDarkly UI. Add a new node or change an edge. Run your code again. Topology changes are picked up by your traversal code on subsequent SDK evaluations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6: Monitor Agent Performance
&lt;/h2&gt;

&lt;p&gt;This is the key differentiator: monitoring happens &lt;strong&gt;on the graph itself&lt;/strong&gt;, not in a separate dashboard. You see metrics overlaid on the same visual topology you built, so bottlenecks are immediately obvious.&lt;/p&gt;

&lt;p&gt;The sample repo includes full instrumentation: calls to &lt;code&gt;tracker.track_success()&lt;/code&gt;, &lt;code&gt;tracker.track_error()&lt;/code&gt;, and &lt;code&gt;tracker.track_tool_call()&lt;/code&gt; in the agent execution path. After running some traffic, open your Agent Graph to see the results.&lt;/p&gt;

&lt;p&gt;Navigate to &lt;strong&gt;AI &amp;gt; Agent graphs &amp;gt; chatbot-flow&lt;/strong&gt;. You'll see a metrics bar at the top of the graph view where you can toggle different metrics on and off.&lt;/p&gt;

&lt;h3&gt;
  
  
  Metrics on the graph
&lt;/h3&gt;

&lt;p&gt;Here's what makes this different from traditional APM: the metrics appear &lt;strong&gt;directly on your workflow visualization&lt;/strong&gt;. No mental mapping between a dashboard and your code. No correlating trace IDs. The slow node lights up on the graph.&lt;/p&gt;

&lt;p&gt;Turn on &lt;strong&gt;Latency&lt;/strong&gt; to see duration data overlaid directly on your graph:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Total duration&lt;/strong&gt;: The combined time for the entire graph invocation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-node duration&lt;/strong&gt;: How long each individual agent takes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Turn on &lt;strong&gt;Invocations&lt;/strong&gt; to see how often each node is reached. This reveals which paths your users take most frequently. In a routing graph, you'll quickly see whether most queries go through security or skip directly to support.&lt;/p&gt;

&lt;p&gt;Turn on &lt;strong&gt;Tool calls&lt;/strong&gt; to see the average number of tool invocations per node. If an agent is calling tools excessively, you'll spot it here.&lt;/p&gt;

&lt;h3&gt;
  
  
  Monitoring page
&lt;/h3&gt;

&lt;p&gt;Click &lt;strong&gt;Monitoring&lt;/strong&gt; to see all metrics over time. This view shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Latency trends&lt;/strong&gt;: Duration per node over hours, days, or weeks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invocation patterns&lt;/strong&gt;: Traffic flow through your graph&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool call breakdown&lt;/strong&gt;: Which specific tools are being called and how often&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fifux67pmlldhocvq0gmu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fifux67pmlldhocvq0gmu.png" alt="Monitoring dashboard" width="800" height="367"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;



&lt;p&gt;To see which specific tools are called, you need to track them in your code using the tracker. The SDK sends this data to LaunchDarkly, which displays it in the monitoring view.&lt;/p&gt;



&lt;h3&gt;
  
  
  Generate traffic to see metrics
&lt;/h3&gt;

&lt;p&gt;Run the traffic generator from the sample repo to send queries through your graph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run python tools/traffic_generator.py &lt;span class="nt"&gt;--queries&lt;/span&gt; 20 &lt;span class="nt"&gt;--delay&lt;/span&gt; 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sends a mix of queries (some with PII, some without) to exercise both the security and support paths. After a few minutes, you'll see metrics populate on the graph.&lt;/p&gt;

&lt;h3&gt;
  
  
  Detecting a slow agent
&lt;/h3&gt;

&lt;p&gt;With traffic flowing, suppose the security-agent starts averaging 5 seconds per call. With latency metrics enabled on the graph, you see it immediately: the security-agent node shows a high duration value while other nodes stay fast.&lt;/p&gt;

&lt;p&gt;The invocation numbers also tell a story. If security-agent shows 50 invocations and support-agent shows 80, you know ~30 queries are bypassing security (the clean path). This helps you understand whether the slow agent is affecting most users or just a subset.&lt;/p&gt;

&lt;p&gt;Without Agent Graphs, you'd need custom logging, Datadog queries, and manual correlation. With Agent Graphs, you see the problem in 30 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 7: Fix Without Deploying
&lt;/h2&gt;

&lt;p&gt;The security-agent is slow because it's using &lt;code&gt;claude-sonnet-4&lt;/code&gt; for PII detection. A smaller, faster model may be sufficient for this task.&lt;/p&gt;

&lt;p&gt;In the LaunchDarkly dashboard, update the &lt;code&gt;pii-detector&lt;/code&gt; variation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Change model from &lt;code&gt;Anthropic.claude-sonnet-4-20250514&lt;/code&gt; to &lt;code&gt;Anthropic.claude-3-haiku-20240307&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Or use &lt;a href="https://launchdarkly.com/docs/tutorials/agent-skills-quickstart" rel="noopener noreferrer"&gt;Agent Skills&lt;/a&gt; to make the change from your coding assistant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The security-agent pii-detector variation is averaging 5 seconds.
Change the model to claude-3-haiku-20240307.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No code changes. No deploy. Changes are picked up on subsequent SDK evaluations.&lt;/p&gt;

&lt;p&gt;Run the traffic generator again and watch the latency drop.&lt;/p&gt;

&lt;h3&gt;
  
  
  What just happened
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Traffic generator&lt;/strong&gt; sent queries through the graph&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring&lt;/strong&gt; showed the slow agent on the graph&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model swap&lt;/strong&gt; happened in the UI (or via Agent Skills)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your code&lt;/strong&gt; automatically used the new configuration&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No deploys. No PRs. The fix is live.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenAI Agents SDK Integration (Conceptual)
&lt;/h2&gt;

&lt;p&gt;Agent Graphs work with multiple frameworks. This conceptual example shows how the pattern translates to OpenAI Agents SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Conceptual example showing how Agent Graph SDK methods work with OpenAI Agents
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_traversal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_config&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;tracker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tracker&lt;/span&gt;
    &lt;span class="n"&gt;edges&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_edges&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Child agents are already in state (reverse traversal builds bottom-up)
&lt;/span&gt;    &lt;span class="n"&gt;handoffs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_config&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;edge&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_handoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Track handoff events
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;handoffs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;handoffs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;on_handoff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;on_handoff&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;agent_graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_enabled&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;root&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent_graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reverse_traverse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handle_traversal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tell me about your engineering team&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same graph definition, adapted to each framework's execution model. The topology metadata lives in LaunchDarkly; your code interprets and executes it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Start simple&lt;/strong&gt;: Begin with a linear graph (A → B → C) before adding conditional routing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use handoff data for context passing&lt;/strong&gt;: Include metadata like action type, reason, or state that the next agent needs to continue the workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Track everything&lt;/strong&gt;: Call &lt;code&gt;tracker.track_success()&lt;/code&gt; and &lt;code&gt;tracker.track_error()&lt;/code&gt; in every node for complete visibility. Use &lt;code&gt;graph_tracker.track_tool_call(tool_name)&lt;/code&gt; to track which tools agents invoke.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test with targeting&lt;/strong&gt;: Use LaunchDarkly targeting to route test users to experimental graph configurations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Handle missing edges&lt;/strong&gt;: Decide what happens when no edge matches a routing decision or when a target node is disabled. Recommend: fail closed, log diagnostics, and track routing failures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep execution state request-scoped&lt;/strong&gt;: Store execution state inside the context object (&lt;code&gt;ctx&lt;/code&gt;) passed through traversal, not in instance-level variables. Treat graph traversal as request-scoped to avoid concurrency issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You've Built
&lt;/h2&gt;

&lt;p&gt;You now have a multi-agent system where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Graph topology&lt;/strong&gt; is externalized and self-documenting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Routing logic&lt;/strong&gt; is visible on edges, not buried in code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring&lt;/strong&gt; appears on the graph itself, not a separate dashboard&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node-level control&lt;/strong&gt; lets you disable a single agent without touching others, provided your executor checks node availability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple frameworks&lt;/strong&gt; can consume the same graph metadata&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you spot a slow agent in monitoring, you can swap the model from the dashboard without a deploy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://launchdarkly.com/docs/home/ai-configs/agent-graphs" rel="noopener noreferrer"&gt;Agent Graphs Reference&lt;/a&gt;&lt;/strong&gt;: SDK methods for &lt;code&gt;traverse&lt;/code&gt;, &lt;code&gt;reverse_traverse&lt;/code&gt;, &lt;code&gt;get_edges()&lt;/code&gt;, and handoff data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://launchdarkly.com/docs/home/ai-configs" rel="noopener noreferrer"&gt;AI Configs Documentation&lt;/a&gt;&lt;/strong&gt;: Learn more about variations, targeting, and experiments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="/https://launchdarkly.com/docs/tutorials/agent-skills-quickstart"&gt;Agent Skills Tutorial&lt;/a&gt;&lt;/strong&gt;: Manage AI Configs from your coding assistant&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://launchdarkly.com/docs/home/ai-configs/monitor" rel="noopener noreferrer"&gt;Monitor AI Configs&lt;/a&gt;&lt;/strong&gt;: Deep dive into metrics and dashboards&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/launchdarkly-labs/devrel-agents-tutorial/tree/tutorial/agent-graphs" rel="noopener noreferrer"&gt;Sample Repository&lt;/a&gt;&lt;/strong&gt;: Complete code from this tutorial&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Hardcoded orchestration was fine when you had one agent. With multi-agent systems, it becomes a liability. Every change requires a deploy. Every incident requires a developer.&lt;/p&gt;

&lt;p&gt;Agent Graphs flip this. Define your workflow in LaunchDarkly, integrate it with your framework, and fix many problems without touching code. Your agents become as dynamic as your feature flags.&lt;/p&gt;

&lt;p&gt;Ready to stop hardcoding? &lt;a href="https://app.launchdarkly.com/signup" rel="noopener noreferrer"&gt;Get started with AI Configs&lt;/a&gt; and create your first Agent Graph.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>aiops</category>
      <category>architecture</category>
    </item>
    <item>
      <title>LLM evaluation guide: When to add online evals to your AI application</title>
      <dc:creator>Scarlett Attensil</dc:creator>
      <pubDate>Wed, 17 Dec 2025 17:42:49 +0000</pubDate>
      <link>https://dev.to/launchdarkly/llm-evaluation-guide-when-to-add-online-evals-to-your-ai-application-mo5</link>
      <guid>https://dev.to/launchdarkly/llm-evaluation-guide-when-to-add-online-evals-to-your-ai-application-mo5</guid>
      <description>&lt;h2&gt;
  
  
  The quick decision framework
&lt;/h2&gt;



&lt;p&gt;Online evals for AI Configs is currently in closed beta. Judges must be installed in your project before they can be attached to AI Config variations.&lt;/p&gt;



&lt;p&gt;Online evals provide real-time quality monitoring for LLM applications. Using LLM-as-a-judge methodology, they run automated quality checks on a configurable percentage of your production traffic, producing structured scores and pass/fail judgments you can act on programmatically. LaunchDarkly includes three built-in judges: &lt;strong&gt;accuracy&lt;/strong&gt;, &lt;strong&gt;relevance&lt;/strong&gt;, and &lt;strong&gt;toxicity&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skip online evals if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your checks are purely deterministic (schema validation, compile tests)&lt;/li&gt;
&lt;li&gt;You have low volume and can manually review outputs in observability dashboards&lt;/li&gt;
&lt;li&gt;You're primarily debugging execution problems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Add online evals when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need quantified quality scores to trigger automated actions (rollback, rerouting, alerts)&lt;/li&gt;
&lt;li&gt;Manual quality review doesn't scale to your traffic volume&lt;/li&gt;
&lt;li&gt;You're measuring multiple quality dimensions (accuracy, relevance, toxicity)&lt;/li&gt;
&lt;li&gt;You want statistical quality trends across segments for AI governance and compliance&lt;/li&gt;
&lt;li&gt;You need to monitor token usage and cost alongside quality metrics&lt;/li&gt;
&lt;li&gt;You're running A/B tests or guarded releases and need automated quality gates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most teams add them within 2-3 sprints when manual quality review becomes the bottleneck. Configurable sampling rates let you balance evaluation coverage with cost and latency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Online evals vs. LLM observability
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;LLM observability shows you what happened. Online evals automatically assess quality and trigger actions based on those assessments.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  LLM observability: your security camera
&lt;/h3&gt;

&lt;p&gt;LLM observability shows you everything that happened through distributed tracing: full conversations, tool calls, token usage, latency breakdowns, and cost attribution. Perfect for debugging and understanding what went wrong. But when you're handling 10,000 conversations daily, manually reviewing them for quality patterns doesn't scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  Online evals: your security guard
&lt;/h3&gt;

&lt;p&gt;Automatically scores every sampled request using LLM-as-a-judge methodology across your quality rubric (accuracy, relevance, toxicity) and takes action. Instead of exporting conversations to spreadsheets for manual review, you get real-time quality monitoring with drift detection that triggers alerts, rollbacks, or rerouting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 3 AM difference&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Without evals: "Let's meet tomorrow to review samples and decide if we should rollback."&lt;/p&gt;

&lt;p&gt;With evals: "Quality dropped below threshold, automatic rollback triggered, here's what failed..."&lt;/p&gt;

&lt;h2&gt;
  
  
  How online evals actually work
&lt;/h2&gt;

&lt;p&gt;LaunchDarkly's online evals use LLM-as-a-judge methodology with three built-in judges you can configure directly in the dashboard. No code changes required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Getting started:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install judges from the AI Configs menu&lt;/li&gt;
&lt;li&gt;Attach judges to AI Config variations&lt;/li&gt;
&lt;li&gt;Configure sampling rates (balance coverage with cost/latency)&lt;/li&gt;
&lt;li&gt;Evaluation metrics are automatically emitted as custom events&lt;/li&gt;
&lt;li&gt;Metrics are automatically available for A/B tests and guarded releases&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;What you get from each built-in judge:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Accuracy judge:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.85&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Response correctly answered the question but missed one edge case regarding error handling"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Relevance judge:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.92&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Response directly addressed the user's query with appropriate context and examples"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Toxicity judge:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Content is professional and appropriate with no toxic language detected"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each judge returns a score from 0.0 to 1.0 plus reasoning that explains the assessment. LaunchDarkly's built-in judges (accuracy, relevance, toxicity) have fixed evaluation criteria and are configured only by selecting the provider and model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration:&lt;/strong&gt;&lt;br&gt;
Install judges from the AI Configs menu in your LaunchDarkly dashboard. They appear as pre-configured AI configs (AI Judge - Accuracy, AI Judge - Toxicity, AI Judge - Relevance). When configuring your AI Config variations in completion mode, select which judges to attach with your desired sampling rate. Use different judge combinations for different environments to match your quality requirements and cost constraints.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real problems online evals solve
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Scale for production applications:&lt;/strong&gt; Your SQL generator handles 50,000 queries daily. LLM observability shows you every query through distributed tracing. Online evals tell you the proportion that are semantically wrong, automatically, with hallucination detection built in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-dimensional quality monitoring:&lt;/strong&gt; Customer service AI applications aren't just "did it respond?" It's accuracy, relevance, toxicity, compliance, and appropriateness. Online evals score all dimensions simultaneously, each with its own threshold and reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RAG pipeline validation:&lt;/strong&gt; Your retrieval-augmented generation system needs continuous monitoring of both retrieval quality and generation accuracy. Online evals can assess whether retrieved context is relevant and whether the response accurately uses that context, preventing hallucinations and ensuring factual grounding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost and performance optimization:&lt;/strong&gt; Monitor token usage alongside quality metrics. If certain queries consume 10x more tokens than others, online evals help identify these patterns so you can optimize prompts or routing logic to reduce costs without sacrificing quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Actionable metrics for AI governance:&lt;/strong&gt; Transform 10,000 responses from data to decisions with evaluator-driven quality gates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accuracy trending below 0.8? Automated alerts to the team&lt;/li&gt;
&lt;li&gt;Toxicity above 0.2? Immediate review and potential rollback&lt;/li&gt;
&lt;li&gt;Relevance dropping for specific user segments? Targeted configuration updates&lt;/li&gt;
&lt;li&gt;Metrics automatically feed A/B tests and guarded releases for continuous improvement&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Example implementation path
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Week 1-2: Define quality dimensions and install judges.&lt;/strong&gt;&lt;br&gt;
Use LLM observability alone first. Manually review samples to understand your system. Define your quality dimensions: accuracy, relevance, toxicity, or other criteria specific to your application. Install the built-in judges from the AI Configs menu in LaunchDarkly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 3-4: Attach judges with sampling.&lt;/strong&gt;&lt;br&gt;
Attach judges to AI Config variations in LaunchDarkly. Start with one or two key judges (accuracy and relevance are good defaults). Configure sampling rates between 10-20% of traffic to balance coverage with cost and latency. Compare automated scores with human judgment to validate the judges work for your use case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 5+: Operationalize with quality gates.&lt;/strong&gt;&lt;br&gt;
Add more evaluation dimensions as you learn. Connect scores to automated actions and evaluator-driven quality gates: when accuracy drops below 0.7, trigger alerts; when toxicity exceeds 0.2, investigate immediately. Leverage the custom events and metrics for A/B testing and guarded releases to continuously improve your application's performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bottom line
&lt;/h2&gt;

&lt;p&gt;You don't need online evals on day one. Start with LLM observability to understand your AI system through distributed tracing. Add evaluations when you hear yourself saying "we need to review more conversations" or "how do we know if quality is degrading?"&lt;/p&gt;

&lt;p&gt;LaunchDarkly's three built-in judges (accuracy, relevance, toxicity) provide LLM-as-a-judge evaluation that you can attach to any AI Config variation in &lt;strong&gt;completion mode&lt;/strong&gt; with configurable sampling rates. Note that online evals currently only work with completion mode AI Configs. Agent-based configs are not yet supported. Evaluation metrics are automatically emitted as custom events and feed directly into A/B tests and guarded releases, enabling continuous AI governance and quality improvement without code changes. Start simple with one judge, learn what matters for your application, and expand from there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM observability is your security camera. Online evals are your security guard.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;p&gt;Ready to get started? &lt;a href="https://launchdarkly.com/start-trial/" rel="noopener noreferrer"&gt;Sign up for a free LaunchDarkly account&lt;/a&gt; if you haven't already.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build a complete quality pipeline:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/tutorials/aic-cicd" rel="noopener noreferrer"&gt;AI Config CI/CD Pipeline&lt;/a&gt; - Add automated quality gates and LLM-as-a-judge testing to your deployment process&lt;/li&gt;
&lt;li&gt;Combine offline evaluation (in CI/CD) with online evals (in production) for comprehensive quality coverage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Learn more about AI Configs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/home/ai-configs" rel="noopener noreferrer"&gt;AI Config documentation&lt;/a&gt; - Understand how AI Configs enable real-time LLM configuration&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/home/ai-configs/online-evaluations" rel="noopener noreferrer"&gt;Online evals documentation&lt;/a&gt; - Deep dive into judge installation and configuration&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/home/metrics/guardrail-metrics" rel="noopener noreferrer"&gt;Guardrail metrics&lt;/a&gt; - Monitor quality during A/B tests and guarded releases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;See it in action:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/home/observability/llm-observability" rel="noopener noreferrer"&gt;Check LLM observability in the LaunchDarkly dashboard&lt;/a&gt; to track your AI application performance with distributed tracing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Industry standards:&lt;/strong&gt;&lt;br&gt;
LaunchDarkly's approach aligns with emerging AI observability standards, including OpenTelemetry's semantic conventions for AI monitoring, ensuring your evaluation infrastructure integrates with the broader observability ecosystem.&lt;/p&gt;

</description>
      <category>evals</category>
      <category>agents</category>
      <category>ai</category>
      <category>observability</category>
    </item>
    <item>
      <title>When to Use Prompt-Based vs Agent Mode in LaunchDarkly for AI Applications</title>
      <dc:creator>Scarlett Attensil</dc:creator>
      <pubDate>Wed, 17 Dec 2025 17:39:09 +0000</pubDate>
      <link>https://dev.to/launchdarkly/when-to-use-prompt-based-vs-agent-mode-in-launchdarkly-for-ai-applications-5f3g</link>
      <guid>https://dev.to/launchdarkly/when-to-use-prompt-based-vs-agent-mode-in-launchdarkly-for-ai-applications-5f3g</guid>
      <description>&lt;h1&gt;
  
  
  A Guide for LangGraph, OpenAI, and Multi-Agent Systems
&lt;/h1&gt;

&lt;p&gt;The broader tech industry can't agree on what the term "agents" even means. &lt;a href="https://www.anthropic.com/research/building-effective-agents" rel="noopener noreferrer"&gt;Anthropic defines agents&lt;/a&gt; as systems where "LLMs dynamically direct their own processes," while Vercel's AI SDK enables &lt;a href="https://sdk.vercel.ai/docs/concepts/tools" rel="noopener noreferrer"&gt;multi-step agent loops with tools&lt;/a&gt;, and &lt;a href="https://platform.openai.com/docs/guides/agents-sdk" rel="noopener noreferrer"&gt;OpenAI provides an Agents SDK&lt;/a&gt; with built-in orchestration. So when you're creating an AI Config in LaunchDarkly and see "prompt-based mode" vs. "agent mode," you might reasonably expect this choice to determine whether you get automatic tool execution loops, server-side state management, or some other fundamental capability difference.&lt;/p&gt;

&lt;p&gt;But LaunchDarkly's distinction is different and more practical. Understanding it will save you from confusion and help you ship AI features faster.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;LaunchDarkly's "prompt-based vs. agent" choice is about &lt;strong&gt;input schemas and framework compatibility&lt;/strong&gt;, not execution automation. &lt;strong&gt;Prompt-based mode&lt;/strong&gt; returns a messages array (perfect for chat UIs), while &lt;strong&gt;agent mode&lt;/strong&gt; returns an instructions string (optimized for LangGraph/CrewAI frameworks). Both provide the same core benefits: provider abstraction, A/B testing, metrics tracking, and the ability to change AI behavior without deploying code.&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
&lt;strong&gt;Ready to start?&lt;/strong&gt; &lt;a href="http://app.launchdarkly.com/signup" rel="noopener noreferrer"&gt;Sign up for a free trial&lt;/a&gt; → &lt;a href="https://launchdarkly.com/docs/home/ai-configs/create" rel="noopener noreferrer"&gt;create your first AI Config&lt;/a&gt; → Choose your mode → Configure and ship.&lt;br&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  The fragmented AI landscape
&lt;/h2&gt;

&lt;p&gt;LaunchDarkly supports 20+ AI providers: OpenAI, Anthropic, Gemini, Azure, Bedrock, Cohere, Mistral, DeepSeek, Perplexity, and more. Each has their own interpretation of "completions" vs "agents," creating a chaotic ecosystem with different API endpoints, execution behaviors, state management approaches, and capability limitations. This fragmentation makes it difficult to switch providers or even understand what capabilities you're getting. That's where LaunchDarkly's abstraction layer comes in.&lt;/p&gt;

&lt;h2&gt;
  
  
  LaunchDarkly's approach: provider-agnostic input schemas
&lt;/h2&gt;

&lt;p&gt;LaunchDarkly's AI Configs are a &lt;strong&gt;configuration layer&lt;/strong&gt; that abstracts provider differences. When you choose prompt-based mode or agent mode, you're selecting an &lt;strong&gt;input schema&lt;/strong&gt; (messages array vs. instructions string), not execution behavior. LaunchDarkly provides the configuration; you handle orchestration with your own code or frameworks like LangGraph. This gives you provider abstraction, A/B testing, metrics tracking, and online evals (prompt-based mode only) without locking you into any specific provider's execution model.&lt;/p&gt;



&lt;p&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu4llid4exa6z8zge6pza.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu4llid4exa6z8zge6pza.png" alt="AI Config Mode Selection" width="480" height="369"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;



&lt;h3&gt;
  
  
  Prompt-based mode: messages-based
&lt;/h3&gt;

&lt;p&gt;Prompt-based mode uses a &lt;strong&gt;messages array&lt;/strong&gt; format with system/user/assistant roles (some providers like OpenAI also support a "developer" role for more granular control). This is the traditional chat format that works across all AI providers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;UI Input&lt;/strong&gt;: "Messages" section with role-based messages&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SDK Method&lt;/strong&gt;: &lt;code&gt;aiclient.config()&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Returns&lt;/strong&gt;: Customized prompt + model configuration&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Documentation&lt;/strong&gt;: &lt;a href="https://launchdarkly.com/docs/sdk/features/ai-config" rel="noopener noreferrer"&gt;AI Config docs&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Retrieve prompt-based AI config
&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aiclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer-support&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;default_value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;default_config&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# What you get back: messages array
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# [
#   {
#     "role": "system",
#     "content": "You are a helpful customer support agent for Acme Corp."
#   },
#   {
#     "role": "user",
#     "content": "How can I reset my password?"
#   }
# ]
&lt;/span&gt;
&lt;span class="c1"&gt;# Use with provider SDKs that expect message arrays
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;  &lt;span class="c1"&gt;# Standard message format
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use prompt-based mode:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;You're building chat-style interactions&lt;/strong&gt;: Traditional message-based conversations where you construct system/user/assistant messages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need online evals&lt;/strong&gt;: LaunchDarkly's model-agnostic online evals are currently only available in prompt-based mode&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want granular control of workflows&lt;/strong&gt;: Discrete steps that need to be accomplished in a specific order, or multi-step asynchronous processes where each step executes independently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One-off evaluations&lt;/strong&gt;: Issue individual evaluations of your prompts and completions (not online evals)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple processing tasks&lt;/strong&gt;: Summarization, name suggestions, or other non-context-exceeding data processing&lt;/li&gt;
&lt;/ol&gt;



&lt;p&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4kdb0ces45b66xc3ag6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4kdb0ces45b66xc3ag6.png" alt="Prompt-Based Mode Messages UI" width="800" height="567"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;



&lt;h3&gt;
  
  
  Agent mode: goal/instructions-based
&lt;/h3&gt;

&lt;p&gt;Agent mode uses a &lt;strong&gt;single instructions string&lt;/strong&gt; format that describes the agent's goal or task. This format is optimized for agent orchestration frameworks that expect high-level objectives rather than conversational messages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;UI Input&lt;/strong&gt;: "Goal or task" field with instructions&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SDK Method&lt;/strong&gt;: &lt;code&gt;aiclient.agent()&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Returns&lt;/strong&gt;: Customized instructions + model configuration&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples&lt;/strong&gt;: &lt;a href="https://github.com/launchdarkly/hello-python-ai/blob/main/examples" rel="noopener noreferrer"&gt;hello-python-ai examples&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Retrieve agent-based AI config
&lt;/span&gt;&lt;span class="n"&gt;agent_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aiclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research-assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;default_value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;default_config&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# What you get back: instructions string
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# "You are a research assistant. Your goal is to gather comprehensive
# information on the requested topic using available search tools.
# Search multiple sources, synthesize findings, and provide a detailed
# summary with citations."
&lt;/span&gt;
&lt;span class="c1"&gt;# Use with agent frameworks that expect instructions
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.prebuilt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_react_agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;init_chat_model&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;init_chat_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model_provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_react_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;citation_tool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instructions&lt;/span&gt;  &lt;span class="c1"&gt;# Goal/task instructions
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Execute and track
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use agent mode:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;You're using agent frameworks&lt;/strong&gt;: LangGraph, LangChain, CrewAI, AutoGen, or LlamaIndex Workflows expect goal/instruction-based inputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Goal-oriented tasks&lt;/strong&gt;: "Research X and create Y" rather than conversational message exchange&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool-driven workflows&lt;/strong&gt;: While both modes support tools, agent mode's format is optimized for frameworks that orchestrate tool usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open-ended exploration&lt;/strong&gt;: The output is open-ended and you don't know the actual answer you're trying to get to&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data as an application&lt;/strong&gt;: You want to treat your data as an application to feed in arbitrary data and ask questions about it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provider agent endpoints&lt;/strong&gt;: LaunchDarkly may route to provider-specific agent APIs when available (note: not all models support agent mode; check your model's capabilities)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;See example:&lt;/strong&gt; &lt;a href="https://launchdarkly.com/docs/tutorials/agents-langgraph" rel="noopener noreferrer"&gt;Build a LangGraph Multi-Agent System with LaunchDarkly&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
    &lt;th&gt;Feature&lt;/th&gt;
    &lt;th&gt;Prompt-Based Mode&lt;/th&gt;
    &lt;th&gt;Agent Mode&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Input format&lt;/td&gt;
    &lt;td&gt;Messages (system/user/assistant)&lt;/td&gt;
    &lt;td&gt;Goal/task + instructions&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Tools support&lt;/td&gt;
    &lt;td&gt;✅ Yes&lt;/td&gt;
    &lt;td&gt;✅ Yes&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;SDK method&lt;/td&gt;
    &lt;td&gt;&lt;code&gt;config()&lt;/code&gt;&lt;/td&gt;
    &lt;td&gt;&lt;code&gt;agent()&lt;/code&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Automatic execution loop&lt;/td&gt;
    &lt;td&gt;❌ No (you orchestrate)&lt;/td&gt;
    &lt;td&gt;❌ No (you orchestrate)&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Online evals&lt;/td&gt;
    &lt;td&gt;✅ Available&lt;/td&gt;
    &lt;td&gt;❌ Not yet available&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Best for&lt;/td&gt;
    &lt;td&gt;Chat-style prompting, single completions&lt;/td&gt;
    &lt;td&gt;Agent frameworks, goal-oriented tasks&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Provider endpoint&lt;/td&gt;
    &lt;td&gt;Standard endpoint&lt;/td&gt;
    &lt;td&gt;May use provider-specific agent endpoint if available&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Model support&lt;/td&gt;
    &lt;td&gt;All models&lt;/td&gt;
    &lt;td&gt;Most models (check model card for "Agent mode" capability)&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
&lt;strong&gt;Model compatibility&lt;/strong&gt;: Not all models support agent mode. When selecting a model in LaunchDarkly, check the model card for "Agent mode" capability. Models like GPT-4.1, GPT-5 mini, Claude Haiku 4.5, Claude Sonnet 4.5, Claude Sonnet 4, Grok Code Fast 1, and Raptor mini support agent mode, while models focused on reasoning (like GPT-5, Claude Opus 4.1) may only support prompt-based mode.&lt;br&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  How providers handle "completion vs agent"
&lt;/h2&gt;

&lt;p&gt;To understand why LaunchDarkly's abstraction is valuable, let's look at how major AI providers handle the distinction between basic completions and advanced agent capabilities. The table below shows how different providers implement "advanced" modes; generally these are ADDITIVE, including all basic capabilities plus extras. For example, OpenAI's Responses API includes all Chat Completions features plus additional capabilities.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
    &lt;th&gt;Provider&lt;/th&gt;
    &lt;th&gt;"Basic" Mode&lt;/th&gt;
    &lt;th&gt;"Advanced" Mode&lt;/th&gt;
    &lt;th&gt;Key Difference&lt;/th&gt;
    &lt;th&gt;Link&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;OpenAI&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Chat Completions API&lt;/td&gt;
    &lt;td&gt;Responses API&lt;/td&gt;
    &lt;td&gt;Responses adds built-in tools (web_search, file_search, computer_use, code_interpreter, remote MCP), server-side conversation state with stored IDs, and improved streaming. Chat Completions remains supported.&lt;/td&gt;
    &lt;td&gt;&lt;a href="https://platform.openai.com/docs/guides/responses-vs-chat-completions" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Anthropic&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Tool Use (client tools)&lt;/td&gt;
    &lt;td&gt;Tool Use (client + server tools)&lt;/td&gt;
    &lt;td&gt;Server tools (web_search, web_fetch) execute on Anthropic's servers. You can use both client and server tools together&lt;/td&gt;
    &lt;td&gt;&lt;a href="https://docs.claude.com/en/docs/agents-and-tools/tool-use/overview" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Google Gemini&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Manual function calling&lt;/td&gt;
    &lt;td&gt;Automatic function calling (Python SDK)&lt;/td&gt;
    &lt;td&gt;Python SDK auto-converts functions to schemas, runs the execution loop, and supports compositional multi-step calls. Manual mode: full control, all platforms&lt;/td&gt;
    &lt;td&gt;&lt;a href="https://ai.google.dev/gemini-api/docs/function-calling" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Vercel AI SDK&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;&lt;code&gt;generateText()&lt;/code&gt;&lt;/td&gt;
    &lt;td&gt;
&lt;code&gt;generateText()&lt;/code&gt; with multi-step loop&lt;/td&gt;
    &lt;td&gt;Multi-step agent loops with tools; SDK continues until complete; &lt;code&gt;maxSteps&lt;/code&gt; provides loop control to limit steps&lt;/td&gt;
    &lt;td&gt;&lt;a href="https://sdk.vercel.ai/docs/concepts/tools" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Azure OpenAI&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Assistants API (deprecated)&lt;/td&gt;
    &lt;td&gt;AI Agent Services&lt;/td&gt;
    &lt;td&gt;Enterprise agent runtime with threads, tool orchestration, safety, identity, networking, and observability; includes Responses API and Computer-Using Agent in Azure&lt;/td&gt;
    &lt;td&gt;&lt;a href="https://azure.microsoft.com/en-us/blog/announcing-the-responses-api-and-computer-using-agent-in-azure-ai-foundry/" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;AWS Bedrock (Nova)&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Converse API (tool use)&lt;/td&gt;
    &lt;td&gt;Bedrock Agents&lt;/td&gt;
    &lt;td&gt;Agents: managed service with automatic orchestration + state management + multi-agent collaboration. Converse: manual tool orchestration, full control&lt;/td&gt;
    &lt;td&gt;&lt;a href="https://docs.aws.amazon.com/nova/latest/userguide/agents-use-nova.html" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Cohere&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Standard chat&lt;/td&gt;
    &lt;td&gt;Command A&lt;/td&gt;
    &lt;td&gt;Command A: enhanced multi-step tool use, REACT agents, ~150% higher throughput&lt;/td&gt;
    &lt;td&gt;&lt;a href="https://docs.cohere.com/docs/command-a" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This fragmentation across providers is exactly why LaunchDarkly's approach matters: you configure once (messages vs. goals), and LaunchDarkly handles the provider-specific translation. Want to switch from OpenAI to Anthropic? Just change the provider in your AI Config. Your application code stays the same.&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
&lt;strong&gt;Note on OpenAI's ecosystem (Nov 2025)&lt;/strong&gt;: The &lt;a href="https://platform.openai.com/docs/guides/agents-sdk" rel="noopener noreferrer"&gt;Agents SDK&lt;/a&gt; is OpenAI's production-ready orchestration framework. It uses the Responses API by default, and via a built-in LiteLLM adapter it can run against other providers with an OpenAI-compatible shape. Chat Completions is still supported, but OpenAI recommends Responses for new work. The &lt;a href="https://platform.openai.com/docs/assistants/whats-new" rel="noopener noreferrer"&gt;Assistants API is deprecated&lt;/a&gt; and scheduled to shut down on August 26, 2026.&lt;br&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Common misconceptions
&lt;/h2&gt;

&lt;p&gt;Now that you understand the modes and how they differ from provider-specific implementations, let's clear up some common points of confusion:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;❌ "Agent mode provides automatic execution"&lt;/strong&gt;&lt;br&gt;
No. Both modes require you to orchestrate. Agent mode just provides a different input schema.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;❌ "Agent mode is for complex tasks, prompt-based mode is for simple ones"&lt;/strong&gt;&lt;br&gt;
Not quite. It's about input format and framework compatibility, not task complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;❌ "I can only use tools in agent mode"&lt;/strong&gt;&lt;br&gt;
False. Both modes support tools. The difference is how you specify your task (messages vs. goal).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;❌ "LaunchDarkly is an agent framework like LangGraph"&lt;/strong&gt;&lt;br&gt;
No. LaunchDarkly is configuration management for AI. Use it WITH frameworks like LangGraph, not instead of them.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why LaunchDarkly's abstraction matters
&lt;/h2&gt;

&lt;p&gt;Now that you've seen how fragmented the provider landscape is, let's explore the practical value of LaunchDarkly's abstraction layer.&lt;/p&gt;
&lt;h3&gt;
  
  
  Switching providers without code changes
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Without LaunchDarkly:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Hardcoded provider and prompts in your application
&lt;/span&gt;&lt;span class="n"&gt;openai_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Want to switch to Claude? Need to deploy new code
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are helpful&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# Want to A/B test prompts? Deploy again
&lt;/span&gt;        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# To switch providers, you need to:
# 1. Write new code for different provider API
# 2. Deploy to production
# 3. Hope nothing breaks
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With LaunchDarkly:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Get config from LaunchDarkly
&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aiclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-ai-config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# You still write provider-specific code, but only once
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# Comes from LaunchDarkly
&lt;/span&gt;        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# Normalized schema across providers
&lt;/span&gt;        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bedrock-runtime&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;converse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# Comes from LaunchDarkly
&lt;/span&gt;        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;_convert_to_bedrock_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# LaunchDarkly normalizes, you convert
&lt;/span&gt;        &lt;span class="n"&gt;inferenceConfig&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Now you can switch providers via LaunchDarkly UI without deployment
# Change prompts, A/B test models, roll out gradually - all via configuration
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The real value:&lt;/strong&gt; Once your code is set up to handle different providers, you can switch between them, change prompts, A/B test models, and roll out changes gradually - all through the LaunchDarkly UI without deploying code. You write the provider handlers once; you manage AI behavior forever.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security and risk management
&lt;/h3&gt;

&lt;p&gt;AI agents can be powerful and potentially risky. With LaunchDarkly AI Configs, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instantly disable problematic models or tools&lt;/strong&gt; without deploying code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gradually roll out new agent capabilities&lt;/strong&gt; to a small percentage of users first&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quickly roll back&lt;/strong&gt; if an agent behaves unexpectedly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Control access by user tier&lt;/strong&gt; (limit powerful tools to trusted users)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Target specific individuals in production&lt;/strong&gt; to test experimental AI behavior in real environments without affecting other users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you're not directly coupled to provider APIs, responding to security issues becomes a configuration change instead of an emergency deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advanced: Provider-specific packages (JavaScript/TypeScript)
&lt;/h2&gt;

&lt;p&gt;For JavaScript/TypeScript developers looking to reduce boilerplate even further, LaunchDarkly offers optional provider-specific packages. These work with &lt;strong&gt;both prompt-based and agent modes&lt;/strong&gt; and are purely additive - you don't need them to use LaunchDarkly AI Configs effectively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Available packages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/sdk/ai/node-js" rel="noopener noreferrer"&gt;&lt;code&gt;@launchdarkly/server-sdk-ai-openai&lt;/code&gt;&lt;/a&gt; - OpenAI provider&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/sdk/ai/node-js" rel="noopener noreferrer"&gt;&lt;code&gt;@launchdarkly/server-sdk-ai-langchain&lt;/code&gt;&lt;/a&gt; - LangChain provider (works with both LangChain and LangGraph)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/sdk/ai/node-js" rel="noopener noreferrer"&gt;&lt;code&gt;@launchdarkly/server-sdk-ai-vercel&lt;/code&gt;&lt;/a&gt; - Vercel AI SDK provider&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What they provide:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model creation helpers&lt;/strong&gt;: One-line functions like &lt;code&gt;createLangChainModel(aiConfig)&lt;/code&gt; that return fully-configured model instances&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic metrics tracking&lt;/strong&gt;: Integrated metrics collection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Format conversion utilities&lt;/strong&gt;: Helper functions to translate between schemas&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example with LangGraph:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Get agent config from LaunchDarkly&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agentConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ldClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aiAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;research-assistant&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Create LangChain model - config already applied&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;LangChainProvider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createLangChainModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;agentConfig&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Use with LangGraph&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createReactAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;agentConfig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Research X&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;br&gt;
&lt;strong&gt;Production readiness:&lt;/strong&gt; These packages are in &lt;strong&gt;early development&lt;/strong&gt; and not recommended for production. They may change without notice.&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python approach:&lt;/strong&gt; The Python SDK takes a different path with built-in convenience methods like &lt;code&gt;track_openai_metrics()&lt;/code&gt; in the single &lt;code&gt;launchdarkly-server-sdk-ai&lt;/code&gt; package. See &lt;a href="https://dev.to/sdk/ai/python"&gt;Python AI SDK reference&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start building with LaunchDarkly AI configs
&lt;/h2&gt;

&lt;p&gt;You now understand how LaunchDarkly's prompt-based and agent modes provide provider-agnostic configuration for your AI applications. Whether you're building chat interfaces or complex multi-agent systems, LaunchDarkly gives you the flexibility to experiment, iterate, and ship AI features without the complexity of managing multiple provider APIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choosing your mode:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start with prompt-based mode if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're building a chat interface or conversational UI&lt;/li&gt;
&lt;li&gt;You need online evaluations for quality monitoring&lt;/li&gt;
&lt;li&gt;You want precise control over multi-step workflows&lt;/li&gt;
&lt;li&gt;You're uncertain which mode fits your use case (it's the more flexible starting point)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose agent mode if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're integrating with LangGraph, LangChain, CrewAI, or similar frameworks&lt;/li&gt;
&lt;li&gt;Your task is goal-oriented rather than conversational ("Research X and create Y")&lt;/li&gt;
&lt;li&gt;You're feeding arbitrary data and asking open-ended questions about it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Remember:&lt;/strong&gt; Both modes give you the same core benefits: provider abstraction, A/B testing, and runtime configuration changes. The choice is about input format, not capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Get started:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://app.launchdarkly.com/signup" rel="noopener noreferrer"&gt;Sign up for a free LaunchDarkly account&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/home/ai-configs/create" rel="noopener noreferrer"&gt;Create your first AI Config&lt;/a&gt;: Takes less than 5 minutes&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/launchdarkly/hello-python-ai/blob/main/examples" rel="noopener noreferrer"&gt;Explore example implementations&lt;/a&gt;: Learn from working code&lt;/li&gt;
&lt;li&gt;Start with prompt-based mode unless you're specifically using an agent framework&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;LaunchDarkly resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://launchdarkly.com/docs/home/ai-configs/quickstart" rel="noopener noreferrer"&gt;AI config quickstart guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/home/ai-configs/online-evaluations"&gt;Online evaluations in AI configs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://launchdarkly.com/docs/sdk/ai/python" rel="noopener noreferrer"&gt;Python AI SDK reference&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Provider documentation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/research/building-effective-agents" rel="noopener noreferrer"&gt;Anthropic Building Effective Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ai.google.dev/gemini-api/docs/function-calling" rel="noopener noreferrer"&gt;Google Gemini Function Calling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://platform.openai.com/docs/guides/responses-vs-chat-completions" rel="noopener noreferrer"&gt;OpenAI Responses API vs Chat Completions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>promptengineering</category>
      <category>openai</category>
      <category>ai</category>
    </item>
    <item>
      <title>All I Want for Christmas is Observable Multi-Modal Agentic Systems</title>
      <dc:creator>Scarlett Attensil</dc:creator>
      <pubDate>Wed, 17 Dec 2025 17:31:15 +0000</pubDate>
      <link>https://dev.to/launchdarkly/all-i-want-for-christmas-is-observable-multi-modal-agentic-systems-nk6</link>
      <guid>https://dev.to/launchdarkly/all-i-want-for-christmas-is-observable-multi-modal-agentic-systems-nk6</guid>
      <description>&lt;h1&gt;
  
  
  How Session Replay + Online Evals Revealed How My Holiday Pet App Actually Works
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://launchdarkly.com/docs/tutorials/observability-multimodal-agents" rel="noopener noreferrer"&gt;Original article&lt;/a&gt; published on December 17, 2025.&lt;/p&gt;

&lt;p&gt;I added LaunchDarkly observability to my Christmas-play pet casting app thinking I'd catch bugs. Instead, I unwrapped the perfect gift 🎁. Session replay shows me WHAT users do, and online evaluations show me IF my model made the right casting decision with real-time accuracy scores. Together, they're like milk 🥛 and cookies 🍪 - each good alone, but magical together for production AI monitoring.&lt;/p&gt;

&lt;h2&gt;
  
  
  See the App in Action
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ful4r0ks4bb4p2tctvy6r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ful4r0ks4bb4p2tctvy6r.png" alt="Welcome Screen" width="800" height="541"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi60zu37vqrktszrvtvd9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi60zu37vqrktszrvtvd9.png" alt="Personality Quiz" width="800" height="508"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fffvekx0yroqmvzpfhjn4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fffvekx0yroqmvzpfhjn4.png" alt="Image Upload" width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvw7qcmst4s49crskfyon.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvw7qcmst4s49crskfyon.png" alt="Results" width="800" height="544"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdziqlleyug2r4do7qf64.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdziqlleyug2r4do7qf64.png" alt="Results" width="800" height="593"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Discovery #1: Users' 40-second patience threshold
&lt;/h2&gt;

&lt;p&gt;I decided to use session replay to evaluate the average time it took users to go through each step in the AI casting process. Session replay is LaunchDarkly's tool that records user interactions in your app - every click, hover, and page navigation - so you can watch exactly what users experience in real-time.&lt;/p&gt;

&lt;p&gt;The complete AI casting process takes 30-45 seconds: personality analysis (2-3s), role matching (1-2s), DALL-E 3 costume generation (25-35s), and evaluation scoring (2-3s). That's a long time to stare at a loading spinner wondering if something broke.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are progress steps?
&lt;/h3&gt;

&lt;p&gt;Progress steps are UI elements I added to the app - not terminal commands or backend processes, but actual visual indicators in the web interface that show users which phase of the AI generation is currently running. These appear as a simple list in the loading screen, updating in real-time as each AI task completes. No commands needed - they automatically display when the user clicks "Get My Role!" and the AI processing begins.&lt;/p&gt;

&lt;h3&gt;
  
  
  Session replay revealed:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITHOUT Progress Steps (n=20 early sessions):
0-10 seconds: 20/20 still watching (100%)
10-20 seconds: 18/20 still watching (90%)
20-30 seconds: 14/20 still watching (70%) - rage clicks begin
30-40 seconds: 9/20 still watching (45%) - tab switching detected
40+ seconds: 7/20 still watching (35% stay)

WITH Progress Steps (n=30 after adding them):
0-10 seconds: 30/30 still watching (100%)
10-20 seconds: 29/30 still watching (97%)
20-30 seconds: 25/30 still watching (83%)
30-40 seconds: 23/30 still watching (77%)
40+ seconds: 24/30 still watching (80% stay!)

Critical Discovery: Progress steps more than DOUBLED
completion rate (35% → 80%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  This made the difference:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Clear progress steps:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Step 1: AI Casting Decision
Step 2: Generating Costume Image (10-30s)
Step 3: Evaluation

As each completes:
✅ Step 1: AI Casting Decision
Step 2: Generating Costume Image (10-30s)
Step 3: Evaluation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Session replay showed users hovering over the back button at 25 seconds, then relaxing when they saw "Step 2: Generating Costume Image (10-30s)." The moment they understood DALL-E was creating their pet's costume (not the app freezing), they were willing to wait. Clear progress indicators transform anxiety into patience.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discovery #2: Observability + online evaluations give the complete picture
&lt;/h2&gt;

&lt;p&gt;Session replay shows user behavior and experience. Online evaluations expose AI output quality through accuracy scoring. Together, they form a solid strategy for AI observability.&lt;/p&gt;

&lt;p&gt;To see this in action, let's take a closer look at an example.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example: The speed-running corgi owner
&lt;/h3&gt;

&lt;p&gt;In this scenario, a user blazes through the entire pet app setup from the initial quiz to the final results, completing the process in record time. So fast, in fact, that instead of this leading to a favorable outcome, it led to an instance of speed killing quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session Replay Showed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quiz completed in 8 seconds (world record) - they clicked the first option for every question&lt;/li&gt;
&lt;li&gt;Skipped photo upload entirely&lt;/li&gt;
&lt;li&gt;Waited the full 31 seconds for processing&lt;/li&gt;
&lt;li&gt;Got their result: "Sheep"&lt;/li&gt;
&lt;li&gt;Started rage clicking on the sheep image immediately&lt;/li&gt;
&lt;li&gt;Left the site without saving or sharing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why did their energetic corgi get cast as a sheep? The rushed quiz responses created a contradictory personality profile that confused the AI. Without a photo to provide visual context, the model defaulted to its safest, most generic casting choice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Online Evaluation Results:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Evaluation Score: 38/100 ❌&lt;/li&gt;
&lt;li&gt;Reasoning: "Costume contains unsafe elements: eyeliner, ribbons"&lt;/li&gt;
&lt;li&gt;Wait, what? The AI suggested face paint and ribbons, evaluation said NO&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Online evaluations use a model-agnostic evaluation (MAE) - an AI agent that evaluates other AI outputs for quality, safety, or accuracy. The out-of-the-box evaluation judge is overly cautious about physical safety. For the above scenario, the evaluation comments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Costume includes eyeliner which could be harmful to pets" (It's a DALL-E image!)&lt;/li&gt;
&lt;li&gt;"Ribbons pose entanglement risk"&lt;/li&gt;
&lt;li&gt;"Bells are a choking hazard" (It's AI-generated art!)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;About 40% of low scores are actually the evaluation being overprotective about imaginary safety issues, not bad casting.&lt;/p&gt;

&lt;p&gt;Speed-runners get generic roles AND the evaluation writes safety warnings about digital costumes. Users see these low scores and think the app doesn't work well.&lt;/p&gt;

&lt;p&gt;But speed-running isn't the whole story. To truly understand the relationship between user engagement and AI quality, we need to see the flip side. The perfect user. One who gives the AI everything it needed to succeed. What happens when a user takes their time and engages thoughtfully with every step?&lt;/p&gt;

&lt;h3&gt;
  
  
  Example: The perfect match
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Session Replay Showed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;45 seconds on quiz (reading each option)&lt;/li&gt;
&lt;li&gt;Uploaded photo, waited for processing&lt;/li&gt;
&lt;li&gt;Spent 2 minutes on results page&lt;/li&gt;
&lt;li&gt;Downloaded image multiple times&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Online Evaluation Results:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Evaluation Score: 96/100 ⭐⭐⭐⭐⭐&lt;/li&gt;
&lt;li&gt;Reasoning: "Personality perfectly matches role archetype"&lt;/li&gt;
&lt;li&gt;Photo bonus: "Visual traits enhanced casting accuracy"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Time invested = Quality received. The AI rewards thoughtfulness.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discovery #3: The photo upload comedy gold mine
&lt;/h2&gt;

&lt;p&gt;Session replay revealed what photos people ACTUALLY upload. Without it, you'd never know that one in three photo uploads are problematic, and you'd be flying blind on whether to add validation or trust your model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example: The surprising photo upload analysis
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Session Replay Showed:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Photo Upload Analysis (n=18 who uploaded):
- 12 (67%) Normal pet photos
- 2 (11%) Screenshots of pet photos on their phone
- 1 (6%) Multiple pets in one photo (chaos)
- 1 (6%) Blurry "pet in motion" disaster
- 1 (6%) Stock photo of their breed (cheater!)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Despite 33% problematic inputs, evaluation scores remained high (87-91/100). The AI is remarkably resilient.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example: When "bad" photos produce great results
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;My Favorite Session:&lt;/strong&gt; Someone uploaded a photo of their cat mid-yawn. The AI vision model described it as "displaying fierce predatory behavior." The cat was cast as a "Protective Father." Evaluation score: 91/100. The owner downloaded it immediately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Winner:&lt;/strong&gt; Someone's hamster photo that was 90% cage bars. The AI somehow extracted "small fuzzy creature behind geometric patterns" and cast it as "Shepherd" because "clearly experienced at navigating barriers." Evaluation score: 87/100.&lt;/p&gt;

&lt;p&gt;Without session replay, you'd only see evaluation scores and think "the AI is working well." But session replay reveals users are uploading screenshots and blurry photos—input quality issues that could justify adding photo validation.&lt;/p&gt;

&lt;p&gt;However, the high evaluation scores prove the AI handles imperfect real-world data gracefully. This insight saved me from over-engineering photo validation that would have slowed down the user experience for minimal quality gains.&lt;/p&gt;

&lt;p&gt;Session replay + online evaluations together answered the question "Should I add photo validation?" The answer: No. Trust the model's resilience and keep the experience frictionless.&lt;/p&gt;

&lt;h2&gt;
  
  
  The magic formula: Why this combo works (and what surprised me)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Without Observability:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;"The app seems slow" → ¯\&lt;em&gt;(ツ)&lt;/em&gt;/¯&lt;/li&gt;
&lt;li&gt;"We have 20 visitors but 7 completions" → Where do they drop?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  With Session Replay ONLY:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;"User got sheep and rage clicked; maybe left angry" → Was this a bad match?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  With Model-Agnostic Evaluation ONLY:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;"Evaluation: 22/100 - Eyeliner unsafe for pets" → How did the user react?&lt;/li&gt;
&lt;li&gt;"Evaluation: 96/100 - Perfect match!" → How did this compare to the image they uploaded?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  With BOTH:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;"User rushed, got sheep with ribbons, evaluation panicked about safety"&lt;br&gt;
→ The OOTB evaluation treats image generation prompts like real costume instructions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;"40% of low scores are costume safety, not bad matching"&lt;br&gt;
→ Need custom evaluation criteria (coming soon!)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;"Users might think low score = bad casting, but it's often = protective evaluation"&lt;br&gt;
→ Would benefit from custom evaluation criteria to avoid this confusion&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The evaluation thinks we're putting actual ribbons on actual cats. It doesn't realize these are AI-generated images. So when the casting suggests "sparkly collar with bells," the evaluation judge practically calls animal services.&lt;/p&gt;

&lt;p&gt;Now that you've seen what's possible when you combine user behavior tracking with AI quality scoring, let's walk through how to add this same observability magic to your own multi-modal AI app.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your turn: See the complete picture
&lt;/h2&gt;

&lt;p&gt;Want to add this observability magic to your own app? Here's how:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Install the packages
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @launchdarkly/observability
npm &lt;span class="nb"&gt;install&lt;/span&gt; @launchdarkly/session-replay
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Initialize with observability
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;initialize&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;launchdarkly-js-client-sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;Observability&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@launchdarkly/observability&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;SessionReplay&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@launchdarkly/session-replay&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ldClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;clientId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;plugins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Observability&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SessionReplay&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;privacySetting&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;strict&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;// Masks all data on the page - see https://launchdarkly.com/docs/sdk/features/session-replay-config#expand-javascript-code-sample&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Configure online evaluations in dashboard
&lt;/h3&gt;

&lt;p&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffp7fj2jn6ivm01v5ggvq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffp7fj2jn6ivm01v5ggvq.png" alt="Install Judges" width="800" height="471"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Create your AI Config in LaunchDarkly for LLM evaluation&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enable automatic accuracy scoring for production monitoring&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30nx5ecilcvjwj95imw0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30nx5ecilcvjwj95imw0.png" alt="Configure Judges" width="800" height="217"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Set accuracy weight to 100% for production AI monitoring&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monitor your AI outputs with real-time evaluation scoring&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  4. Connect the dots
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Session replay shows you:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Where users drop off&lt;/li&gt;
&lt;li&gt;What confuses them&lt;/li&gt;
&lt;li&gt;When they rage click&lt;/li&gt;
&lt;li&gt;How long they wait&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Online evaluations show you:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI decision accuracy scores&lt;/li&gt;
&lt;li&gt;Why certain outputs scored low&lt;/li&gt;
&lt;li&gt;Pattern of good vs bad castings&lt;/li&gt;
&lt;li&gt;Safety concerns (even for pixels!)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together they reveal the complete story of your AI app.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resources to get started:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/launchdarkly-labs/scarlett-critter-casting" rel="noopener noreferrer"&gt;Full Implementation Guide&lt;/a&gt;&lt;/strong&gt; - See how this pet app implements both features&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://launchdarkly.com/docs/tutorials/detecting-user-frustration-session-replay" rel="noopener noreferrer"&gt;Session Replay Tutorial&lt;/a&gt;&lt;/strong&gt; - Official LaunchDarkly guide for detecting user frustration&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://launchdarkly.com/docs/tutorials/when-to-add-online-evals" rel="noopener noreferrer"&gt;When to Add Online Evals&lt;/a&gt;&lt;/strong&gt; - Learn when and how to implement AI evaluation&lt;/p&gt;

&lt;p&gt;The real magic is in having observability AND online evaluations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cast your pet:&lt;/strong&gt; &lt;a href="https://scarlett-critter-casting.onrender.com/" rel="noopener noreferrer"&gt;https://scarlett-critter-casting.onrender.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;See your evaluation score ⭐. Understand why your cat is a shepherd and your dog is an angel. The AI has spoken, and now you can see exactly how much to trust it!&lt;/p&gt;




&lt;h2&gt;
  
  
  Ready to add AI observability to your multi-modal agents?
&lt;/h2&gt;

&lt;p&gt;Don't let your AI operate in the dark this holiday season. Get complete visibility into your multi-modal AI systems with LaunchDarkly's online evaluations and session replay.&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
&lt;strong&gt;Get started:&lt;/strong&gt; &lt;a href="https://app.launchdarkly.com/signup" rel="noopener noreferrer"&gt;Sign up for a free trial&lt;/a&gt; → &lt;a href="https://launchdarkly.com/docs/home/ai-configs/create" rel="noopener noreferrer"&gt;Create your first AI Config&lt;/a&gt; → Enable session replay and online evaluations → Ship with confidence.&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbm74wwnjxo6bzom8u6nk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbm74wwnjxo6bzom8u6nk.png" alt="Another Result" width="800" height="612"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;LaunchDarkly resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://launchdarkly.com/docs/home/ai-configs/quickstart" rel="noopener noreferrer"&gt;AI Config Quickstart Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://launchdarkly.com/docs/home/ai-configs/online-evaluations" rel="noopener noreferrer"&gt;Online Evaluations in AI Configs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://launchdarkly.com/docs/home/observability/session-replay" rel="noopener noreferrer"&gt;Session Replay Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Related tutorials:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://launchdarkly.com/docs/tutorials/detecting-user-frustration-session-replay" rel="noopener noreferrer"&gt;Detecting User Frustration with Session Replay&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://launchdarkly.com/docs/tutorials/agents-langgraph" rel="noopener noreferrer"&gt;Building Multi-Agent Systems with LangGraph&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/tutorials/when-to-add-online-evals" rel="noopener noreferrer"&gt;When to Add Online Evaluations&lt;/a&gt;`&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>observability</category>
      <category>ai</category>
      <category>evals</category>
      <category>agents</category>
    </item>
    <item>
      <title>Proving ROI with Data-Driven AI Agent Experiments</title>
      <dc:creator>Scarlett Attensil</dc:creator>
      <pubDate>Mon, 20 Oct 2025 21:33:33 +0000</pubDate>
      <link>https://dev.to/launchdarkly/proving-roi-with-data-driven-ai-agent-experiments-76b</link>
      <guid>https://dev.to/launchdarkly/proving-roi-with-data-driven-ai-agent-experiments-76b</guid>
      <description>&lt;p&gt;&lt;em&gt;Published October 9th, 2025&lt;/em&gt;&lt;/p&gt;


&lt;p&gt;by Scarlett Attensil&lt;/p&gt;

&lt;h2&gt;
  
  
  What You'll Learn in 5 Minutes (or Build in 30)
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Findings from Our Experiments:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unexpected discovery&lt;/strong&gt;: Free Mistral model is not only \$0 but also significantly faster than Claude Haiku&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost paradox revealed&lt;/strong&gt;: "Free" security agent increased total system costs by forcing downstream agents to compensate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Premium model failure&lt;/strong&gt;: Claude Opus 4 performed 64% worse than GPT-4o despite costing 33% more&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sample size reality&lt;/strong&gt;: High-variance metrics (cost, feedback) require 5-10x more data than low-variance ones (latency)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Your CEO asks: &lt;strong&gt;"Is the new expensive AI model worth it?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your finance team wonders: &lt;strong&gt;"Does the enhanced privacy justify the cost?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Without experiments, you're guessing. This tutorial shows you how to &lt;strong&gt;measure the truth&lt;/strong&gt; and sometimes discover unanticipated gains.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: Real Experiments, Real Answers
&lt;/h2&gt;

&lt;p&gt;In 30 minutes, you'll run actual A/B tests that reveal:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Will aggressive PII redaction hurt user satisfaction?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is Claude Opus 4 worth 33% more than GPT-4o?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part 3 of 3 of the series: **Chaos to Clarity: Defensible AI Systems That Deliver on Your Goals&lt;/em&gt;**&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Start Options
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Option 1: Just Want the Concepts?&lt;/strong&gt; (5 min read)
&lt;/h3&gt;

&lt;p&gt;Skip to Understanding the Experiments to learn the methodology without running code.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Option 2: Full Hands-On Tutorial&lt;/strong&gt; (30 min)
&lt;/h3&gt;

&lt;p&gt;Follow the complete guide to run your own experiments.&lt;/p&gt;

&lt;p&gt;
  "&lt;strong&gt;Prerequisites for Hands-On Tutorial&lt;/strong&gt;"
  &lt;p&gt;&lt;strong&gt;Required from Previous Parts:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Active LaunchDarkly project completed from &lt;a href="https://dev.to/tutorials/agents-langgraph"&gt;Part 1&lt;/a&gt; &amp;amp; &lt;a href="https://dev.to/tutorials/multi-agent-mcp-targeting"&gt;Part 2&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;API keys: Anthropic, OpenAI, LaunchDarkly. &lt;a href="http://app.launchdarkly.com/signup" rel="noopener noreferrer"&gt;Sign up for a free account here&lt;/a&gt; and then &lt;a href="https://dev.to/home/account/api-create"&gt;follow these instructions to get your API access token&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Investment:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Time: ~30 minutes&lt;/li&gt;
&lt;li&gt;Cost: \$25-35 default (\$5-10 with &lt;code&gt;--queries 50&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
  "&lt;strong&gt;Reduce Experiment Costs&lt;/strong&gt;"
  &lt;/p&gt;
&lt;p&gt;The default walk-through uses Claude Opus 4 (premium model) for testing. To reduce costs while still learning the experimentation patterns, you can modify &lt;code&gt;bootstrap/tutorial_3_experiment_variations.py&lt;/code&gt; in your cloned repository to test with the free Mistral model instead:&lt;/p&gt;

&lt;p&gt;In the &lt;code&gt;create_premium_model_variations&lt;/code&gt; function, change:&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Original (expensive):
&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-20250514&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;provider&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Change to (free Mistral):
&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistral-small-latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;provider&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;



&lt;p&gt;This reduces the experiment cost by about $20 (you'll still have costs from the control group using GPT-4o and other agents in the system).&lt;/p&gt;



&lt;/p&gt;





&lt;h2&gt;
  
  
  How the Experiments Work
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Setup&lt;/strong&gt;: Your AI system will automatically test variations on simulated users, collecting real performance data that flows directly to LaunchDarkly for statistical analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Process&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Traffic simulation&lt;/strong&gt; generates queries from your actual knowledge base&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Each user&lt;/strong&gt; gets randomly assigned to experiment variations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI responses&lt;/strong&gt; are evaluated for quality and tracked for cost/speed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LaunchDarkly&lt;/strong&gt; calculates statistical significance automatically&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: The two experiments can run independently. Each user can participate in both, but the results are analyzed separately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Experiment Methodology&lt;/strong&gt;: Our supervisor agent routes PII queries to the security agent (then to support), while clean queries go directly to support. LaunchDarkly tracks metrics &lt;strong&gt;at the user level across all agents&lt;/strong&gt;, revealing system-wide effects. &lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Your Two Experiments
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Experiment 1: Security Agent Analysis&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Question&lt;/strong&gt;: Does Strict Security (free Mistral model with aggressive PII redaction) improve performance without harming user experience or significantly increasing system costs?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Variations&lt;/strong&gt; (50% each):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Control&lt;/strong&gt;: Basic Security (Claude Haiku, moderate PII redaction)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Treatment&lt;/strong&gt;: Strict Security (Mistral free, aggressive PII redaction)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Success Criteria&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Positive feedback rate: stable or improving (not significantly worse)&lt;/li&gt;
&lt;li&gt;Cost increase: ≤15% with ≥75% confidence&lt;/li&gt;
&lt;li&gt;Latency increase: ≤3 seconds (don't significantly slow down)&lt;/li&gt;
&lt;li&gt;Enhanced privacy protection delivered&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Experiment 2: Premium Model Value Analysis&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Question&lt;/strong&gt;: Does Claude Opus 4 justify its premium cost vs GPT-4o?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Variations&lt;/strong&gt; (50% each):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Control&lt;/strong&gt;: GPT-4o with full tools (current version)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Treatment&lt;/strong&gt;: Claude Opus 4 with identical tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Success Criteria (must meet 90% threshold)&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;≥15% positive feedback rate improvement by Claude Opus 4&lt;/li&gt;
&lt;li&gt;Cost-value ratio ≥ 0.25 (positive feedback rate gain % ÷ cost increase %)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Setting Up Metrics and Experiments
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 1: Configure Metrics (5 minutes)&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Quick Metric Setup&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Navigate to &lt;strong&gt;Metrics&lt;/strong&gt; in LaunchDarkly and &lt;a href="https://dev.to/home/metrics/create-metrics"&gt;create three custom metrics&lt;/a&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
    &lt;th&gt;Metric Name&lt;/th&gt;
    &lt;th&gt;Event Key&lt;/th&gt;
    &lt;th&gt;Type&lt;/th&gt;
    &lt;th&gt;What It Measures&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;p95_total_user_latency&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;$ld:ai:duration:total&lt;/td&gt;
    &lt;td&gt;P95&lt;/td&gt;
    &lt;td&gt;Response speed&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;average_total_user_tokens&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;$ld:ai:tokens:total&lt;/td&gt;
    &lt;td&gt;Average&lt;/td&gt;
    &lt;td&gt;Token usage&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;ai_cost_per_request&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;ai_cost_per_request&lt;/td&gt;
    &lt;td&gt;Average&lt;/td&gt;
    &lt;td&gt;Dollar cost&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Positive Feedback&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Built-in&lt;/td&gt;
    &lt;td&gt;Rate&lt;/td&gt;
    &lt;td&gt;Positive feedback rate&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Negative Feedback&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Built-in&lt;/td&gt;
    &lt;td&gt;Rate&lt;/td&gt;
    &lt;td&gt;User complaints&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;
  "&lt;strong&gt;See detailed setup for P95 Latency&lt;/strong&gt;"
  &lt;ol&gt;
&lt;li&gt;Event key: &lt;code&gt;$ld:ai:duration:total&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Type: Value/Size → Numeric, Aggregation: Sum&lt;/li&gt;
&lt;li&gt;Definition: P95, value, user, sum, "lower is better"&lt;/li&gt;
&lt;li&gt;Unit: &lt;code&gt;ms&lt;/code&gt;, Name: &lt;code&gt;p95_total_user_latency&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzu9wt47htwm9mjheg99e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzu9wt47htwm9mjheg99e.png" alt="P95 Setup"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;



&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;
  "&lt;strong&gt;View other metric configurations&lt;/strong&gt;"
  &lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tokens&lt;/strong&gt;: Event key &lt;code&gt;$ld:ai:tokens:total&lt;/code&gt;, Name: &lt;code&gt;average_total_user_tokens&lt;/code&gt;, Average aggregation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;: Event key &lt;code&gt;ai_cost_per_request&lt;/code&gt;, Name: &lt;code&gt;ai_cost_per_request&lt;/code&gt;, Average in dollars&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwzjsrcfxvy53api9modn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwzjsrcfxvy53api9modn.png" alt="Tokens"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvft0jb7dzwxyvyuv9841.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvft0jb7dzwxyvyuv9841.png" alt="Cost"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;



&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;The cost tracking is implemented in &lt;code&gt;utils/cost_calculator.py&lt;/code&gt;, which calculates actual dollar costs using the formula &lt;code&gt;(input_tokens × input_price + output_tokens × output_price) / 1M&lt;/code&gt;. The system has pre-configured pricing for each model (as of October 2025): GPT-4o at \$2.50/\$10 per million tokens, Claude Opus 4 at \$15/\$75, and Claude Sonnet at \$3/\$15. When a request completes, the cost is immediately calculated and sent to LaunchDarkly as a custom event, enabling direct cost-per-user analysis in your experiments.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 2: Create Experiment Variations&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Create the experiment variations using the bootstrap script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run python bootstrap/tutorial_3_experiment_variations.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates the &lt;code&gt;claude-opus-treatment&lt;/code&gt; variation for the Premium Model Value experiment. To verify the script worked correctly, navigate to your &lt;strong&gt;support-model-config&lt;/strong&gt; feature flag in LaunchDarkly - you should now see the &lt;code&gt;claude-opus-treatment&lt;/code&gt; variation alongside your existing variations. The Security Agent Analysis experiment will use your existing baseline and enhanced variations. Both experiments use the existing &lt;code&gt;other-paid&lt;/code&gt; configuration as their control group.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 3: Configure Security Agent Experiment&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;
  "Click for details"
  &lt;p&gt;Navigate to &lt;strong&gt;AI Configs → security-agent&lt;/strong&gt;. In the right navigation menu, click the plus (+) sign next to &lt;strong&gt;Experiments&lt;/strong&gt; to create a new experiment&lt;/p&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Experiment Design&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Experiment type:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep &lt;code&gt;Feature change&lt;/code&gt; selected (default)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Name:&lt;/strong&gt; &lt;code&gt;Security Level&lt;/code&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Hypothesis and Metrics&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Hypothesis:&lt;/strong&gt; &lt;code&gt;Enhanced security improves safety compliance without significantly harming positive feedback rates&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Randomize by:&lt;/strong&gt; &lt;code&gt;user&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metrics:&lt;/strong&gt; Click "Select metrics or metric groups" and add:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;Positive feedback rate&lt;/code&gt; → Select first to set as &lt;strong&gt;Primary&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Negative feedback rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;p95_total_user_latency&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ai_cost_per_request&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Audience Targeting&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Flag or AI Config&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click the dropdown and select &lt;strong&gt;security-agent&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Targeting rule:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click the dropdown and select &lt;strong&gt;Rule 4&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;This will configure: &lt;code&gt;If Context&lt;/code&gt; → &lt;code&gt;is in Segment&lt;/code&gt; → &lt;code&gt;Other Paid&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Audience Allocation&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Variations served outside of this experiment:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Basic Security&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Sample size:&lt;/strong&gt; Set to &lt;code&gt;100%&lt;/code&gt; of users in this experiment&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Variations split:&lt;/strong&gt; Click "Edit" and configure:&lt;/p&gt;

&lt;p&gt;Note: Before setting these percentages, scroll down to the &lt;strong&gt;Control&lt;/strong&gt; field below and set &lt;code&gt;Basic Security&lt;/code&gt; as the control variation first, otherwise you won't be able to allocate 50% traffic to it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pii-detector&lt;/code&gt;: &lt;code&gt;0%&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Basic Security&lt;/code&gt;: &lt;code&gt;50%&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Strict Security&lt;/code&gt;: &lt;code&gt;50%&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Control:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Basic Security&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Statistical Approach and Success Criteria&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Statistical approach:&lt;/strong&gt; &lt;code&gt;Bayesian&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;Threshold:&lt;/strong&gt; &lt;code&gt;90%&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Click &lt;strong&gt;"Save"&lt;/strong&gt;&lt;br&gt;
Click &lt;strong&gt;"Start experiment"&lt;/strong&gt; to launch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: You may see a "Health warning" indicator after starting the experiment. This is normal and expected when no variations have been exposed yet. The warning will clear once your experiment starts receiving traffic and data begins flowing.&lt;/p&gt;



&lt;/p&gt;



&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F731ozqk2bd4bwlki0qo2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F731ozqk2bd4bwlki0qo2.png" alt="Security Agent Experiment Configuration"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Step 4: Configure Premium Model Experiment&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;
  "Click for details"
  &lt;p&gt;Navigate to &lt;strong&gt;AI Configs → support-agent&lt;/strong&gt;. In the right navigation menu, click the plus (+) sign next to &lt;strong&gt;Experiments&lt;/strong&gt; to create a new experiment&lt;/p&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Experiment Design&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Experiment type:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep &lt;code&gt;Feature change&lt;/code&gt; selected (default)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Name:&lt;/strong&gt; &lt;code&gt;Premium Model Value Analysis&lt;/code&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Hypothesis and Metrics&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Hypothesis:&lt;/strong&gt; &lt;code&gt;Claude Opus 4 justifies premium cost with superior positive feedback rate&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Randomize by:&lt;/strong&gt; &lt;code&gt;user&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metrics:&lt;/strong&gt; Click "Select metrics or metric groups" and add:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;Positive feedback rate&lt;/code&gt; → Select first to set as &lt;strong&gt;Primary&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Negative feedback rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;p95_total_user_latency&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;average_total_user_tokens&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ai_cost_per_request&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Audience Targeting&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Flag or AI Config&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click the dropdown and select &lt;strong&gt;support-agent&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Targeting rule:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click the dropdown and select &lt;strong&gt;Rule 4&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;This will configure: &lt;code&gt;If Context&lt;/code&gt; → &lt;code&gt;is in Segment&lt;/code&gt; → &lt;code&gt;Other Paid&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Audience Allocation&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Variations served outside of this experiment:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;other-paid&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Sample size:&lt;/strong&gt; Set to &lt;code&gt;100%&lt;/code&gt; of users in this experiment&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Variations split:&lt;/strong&gt; Click "Edit" and configure:&lt;/p&gt;

&lt;p&gt;Note: Before setting these percentages, scroll down to the &lt;strong&gt;Control&lt;/strong&gt; field below and set &lt;code&gt;other-paid&lt;/code&gt; as the control variation first, otherwise you won't be able to allocate 50% traffic to it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;rag-search-enhanced&lt;/code&gt;: &lt;code&gt;0%&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;eu-free&lt;/code&gt;: &lt;code&gt;0%&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;eu-paid&lt;/code&gt;: &lt;code&gt;0%&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;other-free&lt;/code&gt;: &lt;code&gt;0%&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;other-paid&lt;/code&gt;: &lt;code&gt;50%&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;international-standard&lt;/code&gt;: &lt;code&gt;0%&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;claude-opus-treatment&lt;/code&gt;: &lt;code&gt;50%&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Control:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;other-paid&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Statistical Approach and Success Criteria&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Statistical approach:&lt;/strong&gt; &lt;code&gt;Bayesian&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;Threshold:&lt;/strong&gt; &lt;code&gt;90%&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Click &lt;strong&gt;"Save"&lt;/strong&gt;&lt;br&gt;
Click &lt;strong&gt;"Start experiment"&lt;/strong&gt; to launch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: You may see a "Health warning" indicator after starting the experiment. This is normal and expected when no variations have been exposed yet. The warning will clear once your experiment starts receiving traffic and data begins flowing.&lt;/p&gt;



&lt;/p&gt;



&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc6nwj3pddvthcvu3ak3v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc6nwj3pddvthcvu3ak3v.png" alt="Premium Model Value Analysis Experiment Configuration"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;



&lt;h2&gt;
  
  
  Understanding Your Experimental Design
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;If Two Independent Experiments Running Concurrently:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Since these are the &lt;strong&gt;same users&lt;/strong&gt;, each user experiences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One security variation (&lt;code&gt;Basic Security&lt;/code&gt; or &lt;code&gt;Strict Security&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;One model variation (&lt;code&gt;Claude Opus 4 Treatment&lt;/code&gt; OR &lt;code&gt;Other Paid (GPT-4o)&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Random assignment ensures balance: ~50 users get each combination naturally.&lt;/p&gt;

&lt;h2&gt;
  
  
  Generating Experiment Data
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 5: Run Traffic Generator&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Start your backend and generate realistic experiment data. Choose between sequential or concurrent traffic generation:&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Concurrent Traffic Generator (Recommended for large datasets)&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;For faster experiment data generation with parallel requests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start backend API&lt;/span&gt;
uv run uvicorn api.main:app &lt;span class="nt"&gt;--reload&lt;/span&gt; &lt;span class="nt"&gt;--port&lt;/span&gt; 8000

&lt;span class="c"&gt;# Generate experiment data with 10 concurrent requests (separate terminal)&lt;/span&gt;
uv run python &lt;span class="nt"&gt;-u&lt;/span&gt; tools/concurrent_traffic_generator.py &lt;span class="nt"&gt;--queries&lt;/span&gt; 200 &lt;span class="nt"&gt;--concurrency&lt;/span&gt; 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Configuration&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;200 queries&lt;/strong&gt; by default (edit script to adjust)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;10 concurrent requests&lt;/strong&gt; running in parallel&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2000-second timeout&lt;/strong&gt; (33 minutes) per request to handle MCP tool rate limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note: runtime depends largely on if you retain MCP tool enablement as these take much longer to complete.&lt;/p&gt;

&lt;p&gt;
  "For smaller test runs or debugging"
  &lt;h4&gt;
  
  
  &lt;strong&gt;Sequential Traffic Generator (Simple, one-at-a-time)&lt;/strong&gt;
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start backend API&lt;/span&gt;
uv run uvicorn api.main:app &lt;span class="nt"&gt;--reload&lt;/span&gt; &lt;span class="nt"&gt;--port&lt;/span&gt; 8000

&lt;span class="c"&gt;# Generate experiment data sequentially (separate terminal)&lt;/span&gt;
uv run python tools/traffic_generator.py &lt;span class="nt"&gt;--queries&lt;/span&gt; 50 &lt;span class="nt"&gt;--delay&lt;/span&gt; 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;What Happens During Simulation:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Knowledge extraction&lt;/strong&gt;&lt;br&gt;
Claude analyzes your docs and identifies 20+ realistic topics&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Query generation&lt;/strong&gt;&lt;br&gt;
Each test randomly selects from these topics for diversity&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AI-powered evaluation&lt;/strong&gt;&lt;br&gt;
Claude judges responses as thumbs_up/thumbs_down/neutral&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Automatic tracking&lt;/strong&gt;&lt;br&gt;
All metrics flow to LaunchDarkly in real-time&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;



&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generation Output&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;📚 Analyzing knowledge base...
✅ Generated 23 topics

⚡ Sending 200 requests with 10 concurrent workers...

✅ [1/200] Success (23.4s) - other_paid: What is reinforcement learning?...
✅ [2/200] Success (45.2s) - other_paid: How does Q-learning work?...
⏱️  [15/200] Timeout (&amp;gt;2000s) - other_paid: Complex research query...
                              ↑ This is normal - MCP rate limits
✅ [200/200] Success (387.1s) - other_paid: Explain temporal difference...

======================================================================
✅ COMPLETE
======================================================================
Total time: 45.3 minutes (2718s)
Successful: 195/200 (97.5%)
Failed: 5/200 (2.5%)
Average: 13.6s per query (with concurrency)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Performance Notes&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Most queries complete in 10-60 seconds&lt;/li&gt;
&lt;li&gt;Queries using &lt;code&gt;semantic_scholar&lt;/code&gt; MCP tool may take 5-20 minutes due to API rate limits&lt;/li&gt;
&lt;li&gt;Concurrent execution handles slow requests gracefully by continuing with others&lt;/li&gt;
&lt;li&gt;Failed/timeout requests (less than 5% typically) don't affect experiment validity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Monitor Results&lt;/strong&gt;: Refresh your LaunchDarkly experiment "Results" tabs to see data flowing in. Cost metrics appear as custom events alongside feedback and token metrics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Interpreting Your Results (After Data Collection)
&lt;/h2&gt;

&lt;p&gt;Once your experiments have collected data from ~100 users per variation, you'll see results in the LaunchDarkly UI. Here's how to interpret them:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Security Agent Analysis: Does enhanced security improve safety without significantly impacting positive feedback rates?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Reality Check&lt;/strong&gt;: Not all metrics reach significance at the same rate. In our security experiment we ran over 2,000 more users than the model experiment, turning off the MCP tools and using &lt;code&gt;--pii-percentage 100&lt;/code&gt; to maximize pii detection:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Latency&lt;/strong&gt;: 87% confidence (nearly significant, clear 36% improvement)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;: 21% confidence (high variance, needs 5-10x more data)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feedback&lt;/strong&gt;: 58% confidence (sparse signal, needs 5-10x more data)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is normal! Low-variance metrics (latency, tokens) prove out quickly. High-variance metrics (cost, feedback) need massive samples. &lt;strong&gt;You may not be able to wait for every metric to hit 90%&lt;/strong&gt;. Use strong signals on some metrics plus directional insights on others.&lt;/p&gt;

&lt;blockquote&gt;
&lt;h2&gt;
  
  
  ✅ VERDICT: Deploy Strict Security: Enhanced Privacy is Worth the Modest Cost
&lt;/h2&gt;

&lt;p&gt;The results tell a compelling story: &lt;strong&gt;Latency (p95)&lt;/strong&gt; is approaching significance with &lt;strong&gt;87% confidence&lt;/strong&gt; that Strict Security is faster, a win we didn't anticipate. &lt;strong&gt;Cost per request&lt;/strong&gt; shows &lt;strong&gt;79% confidence&lt;/strong&gt; that Basic Security costs less (or conversely, 21% confidence that Strict costs more), also approaching significance. Meanwhile, &lt;strong&gt;positive feedback rate&lt;/strong&gt; remains inconclusive with only &lt;strong&gt;58% confidence&lt;/strong&gt; that Strict Security performs better, indicating we need more data to draw conclusions about user satisfaction.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The Hidden Cost Paradox:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Strict Security uses &lt;strong&gt;FREE Mistral&lt;/strong&gt; for PII detection, yet &lt;strong&gt;increases total system cost by 14.6%&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Basic Security (Claude Haiku):
- Supervisor: gpt-4o-mini     ~\$0.0001
- Security:   claude-haiku    ~\$0.0003
- Support:    gpt-4o          ~\$0.0242
Total: \$0.0246

Strict Security (Mistral):
- Supervisor: gpt-4o-mini     ~\$0.0001
- Security:   mistral         \$0.0000  (FREE!)
- Support:    gpt-4o          ~\$0.0280  (+15.7%)
Total: \$0.0281 (+14.6%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why does the support agent cost more?&lt;/strong&gt; More aggressive PII redaction removes context, forcing the support agent to generate longer, more detailed responses to compensate for the missing information. This demonstrates why &lt;strong&gt;system-level experiments&lt;/strong&gt; matter. Optimizing one agent can inadvertently increase costs downstream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decision Logic:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IF latency increase ≤ 3s
   AND cost increase ≤ 15% AND confidence ≥ 75%
   AND positive_feedback_rate stable or improving
   AND enhanced_privacy_protection = true
THEN deploy_strict_security()
ELSE need_more_data()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; Deploy Strict Security. We expected latency to stay within 3 seconds of baseline, but discovered a &lt;strong&gt;36% improvement&lt;/strong&gt; instead (87% confidence). Mistral is significantly faster than Claude Haiku. Combined with enhanced privacy protection, this more than justifies the modest 14.5% cost increase (79% confidence). &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Read across:&lt;/strong&gt; At scale, paying ~\$0.004 more per request for significantly better privacy compliance &lt;em&gt;and&lt;/em&gt; faster responses is a clear win for EU users and privacy-conscious segments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Data That Proves It:&lt;/strong&gt;&lt;br&gt;
&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6q2n6iehyrxj6eiyivdw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6q2n6iehyrxj6eiyivdw.png" alt="Security Level Experiment Results"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;2. Premium Model Value Analysis: Does Claude Opus 4 justify its premium cost with superior positive feedback rates?&lt;/strong&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;h2&gt;
  
  
  🔴 VERDICT: Reject Claude Opus 4
&lt;/h2&gt;

&lt;p&gt;The experiment delivered a decisive verdict: &lt;strong&gt;Positive feedback rate&lt;/strong&gt; showed a significant failure with &lt;strong&gt;99.5% confidence&lt;/strong&gt; that GPT-4o is superior. &lt;strong&gt;Cost per request&lt;/strong&gt; is approaching significance with &lt;strong&gt;76% confidence&lt;/strong&gt; that Claude Opus is &lt;strong&gt;33% more expensive&lt;/strong&gt;, while &lt;strong&gt;latency (p95)&lt;/strong&gt; reached significance with &lt;strong&gt;91% confidence&lt;/strong&gt; that Claude Opus is &lt;strong&gt;81% slower&lt;/strong&gt;. The &lt;strong&gt;cost-to-value ratio&lt;/strong&gt; tells the whole story: &lt;strong&gt;-1.9x&lt;/strong&gt;, meaning we're paying 33% more for 64% worse performance: a clear case of premium pricing without premium results.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Decision Logic:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IF positive_feedback_rate increase ≥ 15%
   AND probability_to_beat for positive_feedback_rate ≥ 90%
   AND probability_to_beat for cost ≥ 90%
   AND cost-value ratio increase ≥ .25
THEN deploy_claude_opus_4()
ELSE keep_current_model()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; Premium price delivered worse results on every metric. Experiment was stopped when positive feedback rate reached significance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Read across:&lt;/strong&gt; GPT-4o dominates on performance and speed and most likely also on cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Numbers Don't Lie:&lt;/strong&gt;&lt;/p&gt;



&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqy1vb29qzc7or51vs1c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqy1vb29qzc7or51vs1c.png" alt="Premium Model Value Analysis Results"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Key Insights from Real Experiment Data&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Low-variance metrics (latency, tokens) reach significance quickly (~1,000 samples). High-variance metrics (cost, feedback) may need 5,000-10,000+ samples. This isn't a flaw in your experiment but the reality of statistical power. Don't chase 90% confidence on every metric; focus on directional insights for high-variance metrics and statistical proof for low-variance ones.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Using a free Mistral model for security reduced that agent's cost to \$0, yet &lt;strong&gt;increased total system cost by 14.5%&lt;/strong&gt; because downstream agents had to work harder with reduced context. However, the experiment also revealed an &lt;strong&gt;unexpected 36% latency improvement&lt;/strong&gt;. Mistral is not just free but significantly faster. LaunchDarkly's user-level tracking captured both effects, enabling an informed decision: enhanced privacy + faster responses for ~\$0.004 more per request is a worthwhile tradeoff.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;At 87% confidence for latency (vs 90% target), the 36% improvement is clear enough for decision-making. Perfect statistical significance is ideal, but 85-89% confidence combined with other positive signals (stable feedback, acceptable cost) can justify deployment when the effect size is large.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Experimental Limitations &amp;amp; Mitigations
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Model-as-Judge Evaluation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We use Claude to evaluate response quality rather than real users, which represents a limitation of this experimental setup. However, research shows that model-as-judge approaches correlate well with human preferences, as documented in &lt;a href="https://arxiv.org/abs/2212.08073" rel="noopener noreferrer"&gt;Anthropic's Constitutional AI paper&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Independent Experiments&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While random assignment naturally balances security versions across model versions, preventing systematic bias, you cannot analyze interaction effects between security and model choices. If interaction effects are important to your use case, consider running a proper &lt;a href="https://en.wikipedia.org/wiki/Factorial_experiment" rel="noopener noreferrer"&gt;factorial experiment design&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Statistical Confidence&lt;/strong&gt;&lt;br&gt;
LaunchDarkly uses &lt;strong&gt;&lt;a href="https://dev.to/home/experimentation/bayesian"&gt;Bayesian statistics&lt;/a&gt;&lt;/strong&gt; to calculate confidence, where 90% confidence means there's a 90% probability the true effect is positive. This is NOT the same as p-value &amp;lt; 0.10 from &lt;a href="https://en.wikipedia.org/wiki/Frequentist_inference" rel="noopener noreferrer"&gt;frequentist tests&lt;/a&gt;. We set the threshold at 90% (rather than 95%) to balance false positives versus false negatives, though for mission-critical features you should consider raising the confidence threshold to 95%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Mistakes You Just Avoided
&lt;/h2&gt;

&lt;p&gt;❌ &lt;strong&gt;"Let's run the experiment for a week and see"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;We defined success criteria upfront&lt;/strong&gt; (≥15% improvement threshold)&lt;/p&gt;

&lt;p&gt;❌ &lt;strong&gt;"We need 90% confidence on every metric to ship"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;We used 87% confidence + directional signals&lt;/strong&gt; (36% latency win was decision-worthy)&lt;/p&gt;

&lt;p&gt;❌ &lt;strong&gt;"Let's run experiments until all metrics reach significance"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;We understood variance&lt;/strong&gt; (cost/feedback need 5-10x more data than latency)&lt;/p&gt;

&lt;p&gt;❌ &lt;strong&gt;"Agent-level metrics show the full picture"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;We tracked user-level workflows&lt;/strong&gt; (revealed downstream cost increases)&lt;/p&gt;

&lt;h2&gt;
  
  
  What You've Accomplished
&lt;/h2&gt;

&lt;p&gt;You've built a &lt;strong&gt;data-driven optimization engine&lt;/strong&gt; with statistical rigor through falsifiable hypotheses and clear success criteria. Your predefined success criteria ensure clear decisions and prevent post-hoc rationalization. Every feature investment now has quantified business impact for ROI justification, and you have a framework for continuous optimization through ongoing measurable experimentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Long Response Times (&amp;gt;20 minutes)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If you see requests taking exceptionally long, the root cause is likely the &lt;code&gt;semantic_scholar&lt;/code&gt; MCP tool hitting API rate limits, which causes 30-second retry delays. Queries using this tool may take 5-20 minutes to complete. The 2000-second timeout handles this gracefully, but if you need faster responses (60-120 seconds typical), consider removing &lt;code&gt;semantic_scholar&lt;/code&gt; from tool configurations. You can verify this issue by checking logs for &lt;code&gt;HTTP/1.1 429&lt;/code&gt; errors indicating rate limiting.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Cost Metrics Not Appearing&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If &lt;code&gt;ai_cost_per_request&lt;/code&gt; events aren't showing in LaunchDarkly, first verify that &lt;code&gt;utils/cost_calculator.py&lt;/code&gt; has pricing configured for your models. Cost is only tracked when requests complete successfully (not on timeout or error). The system flushes cost events to LaunchDarkly immediately after each request completion. To debug, look for &lt;code&gt;COST CALCULATED:&lt;/code&gt; and &lt;code&gt;COST TRACKING (async):&lt;/code&gt; messages in your API logs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Beyond This Tutorial: Advanced AI Experimentation Patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Other AI experimentation types you can run in LaunchDarkly&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Context from earlier:&lt;/em&gt; you ran two experiments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Security‑agent test&lt;/strong&gt;: a &lt;strong&gt;bundle change&lt;/strong&gt; (both prompt/instructions &lt;strong&gt;and&lt;/strong&gt; model changed).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Premium‑model test&lt;/strong&gt;: a &lt;strong&gt;model‑only&lt;/strong&gt; change.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI Configs come in two modes: &lt;strong&gt;prompt‑based&lt;/strong&gt; (single‑step completions) and &lt;strong&gt;agent‑based&lt;/strong&gt; (multi‑step workflows with tools). Below are additional patterns to explore.&lt;/p&gt;




&lt;h4&gt;
  
  
  Experiments you can run &lt;strong&gt;entirely in AI Configs&lt;/strong&gt; (no app redeploy)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prompt &amp;amp; template experiments (prompt‑based or agent instructions)&lt;/strong&gt;&lt;br&gt;
Duplicate a variation and iterate on system/assistant messages or agent instructions to measure adherence to schema, tone, or qualitative satisfaction. Use LaunchDarkly Experimentation to tie those variations to product metrics.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model‑parameter experiments&lt;/strong&gt;&lt;br&gt;
In a single model, vary parameters like &lt;code&gt;temperature&lt;/code&gt; or &lt;code&gt;max_tokens&lt;/code&gt;, and (optionally) add &lt;strong&gt;custom parameters&lt;/strong&gt; you define (for example, an internal &lt;code&gt;max_tool_calls&lt;/code&gt; or decoding setting) directly on the variation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tool‑bundle experiments (agent mode or tool‑enabled completions)&lt;/strong&gt;&lt;br&gt;
Attach/detach reusable tools from the &lt;strong&gt;Tools Library&lt;/strong&gt; to compare stacks (e.g., &lt;code&gt;search_v2&lt;/code&gt;, a reranker, or MCP‑exposed research tools) across segments. Keep one variable at a time when possible&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost/latency trade‑offs&lt;/strong&gt;&lt;br&gt;
Compare "slim" vs "premium" stacks by segment. Track tokens, time‑to‑first‑token, duration, and satisfaction to decide where higher spend is warranted.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Practical notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;Experimentation&lt;/strong&gt; for behavior impact (clicks, task success); use the &lt;strong&gt;Monitoring&lt;/strong&gt; tab for LLM‑level metrics (tokens, latency, errors, satisfaction).&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  - You &lt;strong&gt;can't&lt;/strong&gt; run a guarded rollout and an experiment on the same flag at the same time. Pick one: guarded rollout for risk‑managed releases, experiment for causal measurement.
&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Patterns that &lt;strong&gt;usually need feature flags and/or custom instrumentation&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fine‑grained RAG tuning&lt;/strong&gt;&lt;br&gt;
k‑values, similarity thresholds, chunking, reranker swaps, and cache policy are typically coded inside your retrieval layer. Expose these as flags or AI Config custom parameters if you want to A/B them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tool‑routing guardrails&lt;/strong&gt;&lt;br&gt;
Fallback flows (e.g., retry with a different tool/model on error), escalation rules, or heuristics need logic in your agent/orchestrator. Gate those behaviors behind flags and measure with custom metrics.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Safety guardrail calibration&lt;/strong&gt;&lt;br&gt;
Moderation thresholds, red‑team prompts, and PII sensitivity levers belong in a dedicated safety service or the agent wrapper. Wire them to flags so you can raise/lower sensitivity by segment (e.g., enterprise vs free).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Session budget enforcement&lt;/strong&gt;&lt;br&gt;
Monitoring will show token costs and usage, but enforcing per‑session or per‑org budgets (denylist, degrade model, or stop‑tooling) requires application logic. Wrap policies in flags before you experiment.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h4&gt;
  
  
  Targeting &amp;amp; segmentation ideas (works across all the above)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Route by &lt;strong&gt;plan/tier&lt;/strong&gt;, &lt;strong&gt;geo&lt;/strong&gt;, &lt;strong&gt;device&lt;/strong&gt;, or &lt;strong&gt;org&lt;/strong&gt; using AI Config targeting rules and percentage rollouts.&lt;/li&gt;
&lt;li&gt;Keep variations narrow (one change per experiment) to avoid confounding; reserve "bundle" tests for tool‑stack bake‑offs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Advanced Practices:&lt;/strong&gt; Require statistical evidence before shipping configuration changes. Pair each variation with clear success metrics, then A/B test prompt or tool adjustments and use confidence intervals to confirm improvements. When you introduce the new code paths above, protect them behind feature flags so you can run sequential tests, &lt;a href="https://dev.to/home/multi-armed-bandits"&gt;multi-armed bandits&lt;/a&gt; for faster convergence, or change your &lt;a href="https://dev.to/guides/experimentation/designing-experiments"&gt;experiment design&lt;/a&gt; to understand how prompts, tools, and safety levers interact.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Chaos to Clarity
&lt;/h2&gt;

&lt;p&gt;Across this three-part series, you've transformed from hardcoded AI configurations to a scientifically rigorous, data-driven optimization engine. &lt;strong&gt;&lt;a href="https://dev.to/tutorials/agents-langgraph"&gt;Part 1&lt;/a&gt;&lt;/strong&gt; established your foundation with a dynamic multi-agent &lt;a href="https://langchain-ai.github.io/langgraph/" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt; system controlled by &lt;a href="https://dev.to/guides/ai-configs"&gt;LaunchDarkly AI Configs&lt;/a&gt;, eliminating the need for code deployments when adjusting AI behavior. &lt;strong&gt;&lt;a href="https://dev.to/tutorials/multi-agent-mcp-targeting"&gt;Part 2&lt;/a&gt;&lt;/strong&gt; added sophisticated targeting with geographic privacy rules, user segmentation by plan tiers, and &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;MCP (Model Context Protocol)&lt;/a&gt; tool integration for real academic research capabilities. This tutorial, &lt;strong&gt;Part 3&lt;/strong&gt; completed your journey with statistical experimentation that proves ROI and guides optimization decisions with mathematical confidence rather than intuition.&lt;/p&gt;

&lt;p&gt;You now possess a defensible AI system that adapts to changing requirements, scales across user segments, and continuously improves through measured experimentation. Your stakeholders receive concrete evidence for AI investments, your engineering team deploys features with statistical backing, and your users benefit from optimized experiences driven by real data rather than assumptions. The chaos of ad-hoc AI development has given way to clarity through systematic, scientific product development.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;h2&gt;
  
  
  - &lt;strong&gt;&lt;a href="https://dev.to/home/experimentation"&gt;LaunchDarkly Experimentation Docs&lt;/a&gt;&lt;/strong&gt; - Deep dive into statistical methods
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Remember:&lt;/strong&gt; Every AI decision backed by data is a risk avoided and a lesson learned. Start small, measure everything, ship with confidence.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>launchdarkly</category>
      <category>python</category>
      <category>ux</category>
    </item>
    <item>
      <title>Smart AI Agent Targeting with MCP Tools</title>
      <dc:creator>Scarlett Attensil</dc:creator>
      <pubDate>Mon, 20 Oct 2025 21:09:42 +0000</pubDate>
      <link>https://dev.to/launchdarkly/smart-ai-agent-targeting-with-mcp-tools-hdn</link>
      <guid>https://dev.to/launchdarkly/smart-ai-agent-targeting-with-mcp-tools-hdn</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published September 22, 2025 at &lt;a href="https://launchdarkly.com/docs/tutorials/multi-agent-mcp-targeting" rel="noopener noreferrer"&gt;LaunchDarkly&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;Here's what nobody tells you about multi-agentic systems: the hard part isn't building them but making them profitable. One misconfigured model serving enterprise features to free users can burn $20K in a weekend. Meanwhile, you're manually juggling dozens of requirements for different user tiers, regions, and privacy compliance and each one is a potential failure point.&lt;/p&gt;

&lt;p&gt;Part 2 of 3 of the series: Chaos to Clarity: Defensible AI Systems That Deliver on Your Goals&lt;/p&gt;

&lt;p&gt;The solution? &lt;strong&gt;LangGraph multi-agent workflows&lt;/strong&gt; controlled by &lt;strong&gt;LaunchDarkly AI Config&lt;/strong&gt; targeting rules that intelligently route users: paid customers get premium tools and models, free users get cost-efficient alternatives, and EU users get Mistral for enhanced privacy. Use the &lt;strong&gt;LaunchDarkly REST API&lt;/strong&gt; to set up a custom variant-targeting matrix in 2 minutes instead of spending hours setting it up manually.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You'll Build Today
&lt;/h2&gt;

&lt;p&gt;In the next 18 minutes, you'll transform your basic multi-agent system with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Business Tiers &amp;amp; MCP Integration&lt;/strong&gt;: Free users get internal keyword search, Paid users get premium models with RAG, external research tools and expanded tool call limits, all controlled by &lt;a href="https://dev.to/home/ai-configs"&gt;LaunchDarkly AI Configs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Geographic Targeting&lt;/strong&gt;: EU users automatically get Mistral and Claude models (enhanced privacy), other users get cost-optimized alternatives&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart Configuration&lt;/strong&gt;: Set up complex targeting matrices with &lt;a href="https://dev.to/home/flags/segments"&gt;LaunchDarkly segments&lt;/a&gt; and &lt;a href="https://dev.to/home/flags/target"&gt;targeting rules&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;✅ &lt;strong&gt;&lt;a href="//../agents-langgraph/agents-langgraph.mdx"&gt;Part 1 completed&lt;/a&gt;&lt;/strong&gt; with exact naming:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Project: &lt;code&gt;multi-agent-chatbot&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;AI Configs: &lt;code&gt;supervisor-agent&lt;/code&gt;, &lt;code&gt;security-agent&lt;/code&gt;, &lt;code&gt;support-agent&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Tools: &lt;code&gt;search_v2&lt;/code&gt;, &lt;code&gt;reranking&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Variations: &lt;code&gt;supervisor-basic&lt;/code&gt;, &lt;code&gt;pii-detector&lt;/code&gt;, &lt;code&gt;rag-search-enhanced&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🔑 &lt;strong&gt;Add to your &lt;code&gt;.env&lt;/code&gt; file&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;LD_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-api-key        &lt;span class="c"&gt;# Get from LaunchDarkly settings&lt;/span&gt;
&lt;span class="nv"&gt;MISTRAL_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-key       &lt;span class="c"&gt;# Get from console.mistral.ai (free, requires phone + email validation)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Getting Your LaunchDarkly API Key
&lt;/h3&gt;

&lt;p&gt;The automation scripts in this tutorial use the LaunchDarkly REST API to programmatically create configurations. Here's how to get your API key:&lt;/p&gt;

&lt;p&gt;To get your LaunchDarkly API key, start by navigating to Organization Settings by clicking the gear icon (⚙️) in the left sidebar of &lt;a href="https://app.launchdarkly.com/" rel="noopener noreferrer"&gt;your LaunchDarkly dashboard&lt;/a&gt;. Once there, access Authorization Settings by clicking &lt;strong&gt;"Authorization"&lt;/strong&gt; in the settings menu. Next, create a new access token by clicking &lt;strong&gt;"Create token"&lt;/strong&gt; in the "Access tokens" section.&lt;/p&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F10n891jq991kukpj2zmj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F10n891jq991kukpj2zmj.png" alt="API Token Creation" width="800" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When configuring your token, give it a descriptive name like "multi-agent-chatbot", select &lt;strong&gt;"Writer"&lt;/strong&gt; as the role (required for creating configurations), use the default API version (latest), and leave "This is a service token" unchecked for now.&lt;/p&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft0xzaymsuctm1dce4l83.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft0xzaymsuctm1dce4l83.png" alt="Name API Token" width="800" height="635"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After configuring the settings, click &lt;strong&gt;"Save token"&lt;/strong&gt; and immediately copy the token value. This is &lt;strong&gt;IMPORTANT&lt;/strong&gt; because it's only shown once!&lt;/p&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frg53cdi0mdh3fi14w7nz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frg53cdi0mdh3fi14w7nz.png" alt="Copy API Token" width="800" height="199"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, add the token to your environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="c"&gt;# Add this line to your .env file&lt;/span&gt;
   &lt;span class="nv"&gt;LD_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-copied-api-key-here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Security Note&lt;/strong&gt;: Keep your API key private and never commit it to version control. The token allows full access to your LaunchDarkly account.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Add External Research Tools (4 minutes)
&lt;/h2&gt;

&lt;p&gt;Your agents need more than just your internal documents. &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; connects AI assistants to live external data and they agents become orchestrators of your digital infrastructure, tapping into databases, communication tools, development platforms, and any system that matters to your business. MCP tools run as separate servers that your agents call when needed.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The &lt;a href="https://registry.modelcontextprotocol.io" rel="noopener noreferrer"&gt;MCP Registry&lt;/a&gt; serves as a community-driven directory for discovering available MCP servers - like an app store for MCP tools. For this tutorial, we'll use manual installation since our specific academic research servers (ArXiv and Semantic Scholar) aren't yet available in the registry.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Install external research capabilities:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install ArXiv MCP server for academic paper search&lt;/span&gt;
uv tool &lt;span class="nb"&gt;install &lt;/span&gt;arxiv-mcp-server

&lt;span class="c"&gt;# Install Semantic Scholar MCP server for citation data&lt;/span&gt;
git clone https://github.com/JackKuo666/semanticscholar-MCP-Server.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;MCP Tools Added:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;arxiv_search&lt;/strong&gt;: Live academic paper search (Paid users)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;semantic_scholar&lt;/strong&gt;: Citation and research database (Paid users)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These tools integrate with your agents via LangGraph while LaunchDarkly controls which users get access to which tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Configure with API Automation (2 minutes)
&lt;/h2&gt;

&lt;p&gt;Now we'll use programmatic API automation to configure the complete setup. The &lt;a href="https://dev.to/guides/api/rest-api"&gt;LaunchDarkly REST API&lt;/a&gt; lets you manage tools, segments, and &lt;a href="https://dev.to/home/ai-configs"&gt;AI Configs&lt;/a&gt; programmatically. Instead of manually creating dozens of variations in the UI, this &lt;strong&gt;configuration automation&lt;/strong&gt; makes REST API calls to provision user segments, AI Config variations, targeting rules, and tools. These are the same resources you could create manually through the LaunchDarkly dashboard. Your actual chat application continues running unchanged.&lt;/p&gt;

&lt;p&gt;Configure your complete targeting matrix with one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;bootstrap
uv run python create_configs.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What the script creates&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;3 new tools&lt;/strong&gt;: &lt;code&gt;search_v1&lt;/code&gt; (basic search), &lt;code&gt;arxiv_search&lt;/code&gt; and &lt;code&gt;semantic_scholar&lt;/code&gt; (MCP research tools)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4 combined user segments&lt;/strong&gt; with &lt;a href="https://dev.to/home/flags/segments"&gt;geographic and tier targeting rules&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Updated AI Configs&lt;/strong&gt;: &lt;code&gt;security-agent&lt;/code&gt; with 2 new geographic variations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complete &lt;a href="https://dev.to/home/flags/target"&gt;targeting rules&lt;/a&gt;&lt;/strong&gt; that route users to appropriate variations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intelligently reuses&lt;/strong&gt; existing resources: &lt;code&gt;supervisor-agent&lt;/code&gt;, &lt;code&gt;search_v2&lt;/code&gt;, and &lt;code&gt;reranking&lt;/code&gt; tools from Part 1&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Understanding the Bootstrap Script
&lt;/h3&gt;

&lt;p&gt;The automation works by reading a YAML manifest and translating it into LaunchDarkly API calls. Here's how the key parts work:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Segment Creation with Geographic Rules&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_segment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;segment_data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Step 1: Create empty segment
&lt;/span&gt;    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;segment_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;segment_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2: Add targeting rules via semantic patch
&lt;/span&gt;    &lt;span class="n"&gt;clauses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;clause&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;segment_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rules&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;clauses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;attribute&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;clause&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;attribute&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;# "country" or "plan"
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;op&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;clause&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;op&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;              &lt;span class="c1"&gt;# "in"
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;values&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;clause&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;values&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;      &lt;span class="c1"&gt;# ["DE", "FR", ...] or ["free"]
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;contextKind&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;negate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;clause&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;negate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;       &lt;span class="c1"&gt;# false for EU, true for non-EU
&lt;/span&gt;        &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Model Configuration Mapping&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The script maps your YAML model IDs to LaunchDarkly's internal keys
&lt;/span&gt;&lt;span class="n"&gt;model_config_key_map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-3-5-sonnet-20241022&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Anthropic.claude-3-7-sonnet-latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-3-5-haiku-20241022&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Anthropic.claude-3-5-haiku-20241022&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OpenAI.gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OpenAI.gpt-4o-mini-2024-07-18&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistral-small-latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mistral.mistral-small-latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Customizing for Your Use Case&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;To adapt this for your own multi-agent system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Add your geographic regions&lt;/strong&gt; in the YAML segments:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apac-paid&lt;/span&gt;
     &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;attribute&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;country"&lt;/span&gt; 
         &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JP"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AU"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SG"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;KR"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Your APAC countries&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Define your business tiers&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;attribute&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;plan"&lt;/span&gt;
     &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enterprise"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;professional"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;starter"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Your pricing tiers&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Map your models&lt;/strong&gt; in the script:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-model-id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Provider.your-launchdarkly-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script handles the complexity of LaunchDarkly's API while letting you define your targeting logic in simple YAML.&lt;/p&gt;

&lt;h3&gt;
  
  
  Validating the Bootstrap Script
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Expected terminal output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;🚀 LaunchDarkly AI Config Bootstrap
&lt;span class="o"&gt;==================================================&lt;/span&gt;
⚠️  IMPORTANT: This script is &lt;span class="k"&gt;for &lt;/span&gt;INITIAL SETUP ONLY
📝 After bootstrap completes:
   • Make ALL configuration changes &lt;span class="k"&gt;in &lt;/span&gt;LaunchDarkly UI
   • Do NOT modify ai_config_manifest.yaml
   • LaunchDarkly is your single &lt;span class="nb"&gt;source &lt;/span&gt;of truth
&lt;span class="o"&gt;==================================================&lt;/span&gt;

🚀 Starting multi-agent system bootstrap &lt;span class="o"&gt;(&lt;/span&gt;add-only&lt;span class="o"&gt;)&lt;/span&gt;...
📦 Project: multi-agent-chatbot

🔧 Creating tools...
  ✅ Tool &lt;span class="s1"&gt;'search_v1'&lt;/span&gt; created
  ✅ Tool &lt;span class="s1"&gt;'arxiv_search'&lt;/span&gt; created
  ✅ Tool &lt;span class="s1"&gt;'semantic_scholar'&lt;/span&gt; created

🤖 Ensuring AI configs exist...
✅ AI Config &lt;span class="s1"&gt;'supervisor-agent'&lt;/span&gt; exists
✅ AI Config &lt;span class="s1"&gt;'security-agent'&lt;/span&gt; exists
✅ AI Config &lt;span class="s1"&gt;'support-agent'&lt;/span&gt; exists

🧩 Creating variations...
  ✅ Variation &lt;span class="s1"&gt;'strict-security'&lt;/span&gt; created
  ✅ Variation &lt;span class="s1"&gt;'eu-free'&lt;/span&gt; created
  ✅ Variation &lt;span class="s1"&gt;'eu-paid'&lt;/span&gt; created
  ✅ Variation &lt;span class="s1"&gt;'other-free'&lt;/span&gt; created
  ✅ Variation &lt;span class="s1"&gt;'other-paid'&lt;/span&gt; created

📦 Creating segments &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;targeting rules&lt;span class="o"&gt;)&lt;/span&gt;...
✅ Empty segment &lt;span class="s1"&gt;'eu-free'&lt;/span&gt; created
  ✅ Rules added to segment &lt;span class="s1"&gt;'eu-free'&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;final count: 1&lt;span class="o"&gt;)&lt;/span&gt;
✅ Empty segment &lt;span class="s1"&gt;'eu-paid'&lt;/span&gt; created
  ✅ Rules added to segment &lt;span class="s1"&gt;'eu-paid'&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;final count: 1&lt;span class="o"&gt;)&lt;/span&gt;
✅ Empty segment &lt;span class="s1"&gt;'other-free'&lt;/span&gt; created
  ✅ Rules added to segment &lt;span class="s1"&gt;'other-free'&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;final count: 1&lt;span class="o"&gt;)&lt;/span&gt;
✅ Empty segment &lt;span class="s1"&gt;'other-paid'&lt;/span&gt; created
  ✅ Rules added to segment &lt;span class="s1"&gt;'other-paid'&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;final count: 1&lt;span class="o"&gt;)&lt;/span&gt;

🎯 Updating targeting rules...
✅ Targeting rules updated &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="s1"&gt;'security-agent'&lt;/span&gt;
✅ Targeting rules updated &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="s1"&gt;'support-agent'&lt;/span&gt;

✨ Bootstrap &lt;span class="nb"&gt;complete&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;In your LaunchDarkly dashboard&lt;/strong&gt;, navigate to your &lt;code&gt;multi-agent-chatbot&lt;/code&gt; project. You should see:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;AI Configs tab&lt;/strong&gt;: Three configs (&lt;code&gt;supervisor-agent&lt;/code&gt;, &lt;code&gt;security-agent&lt;/code&gt;, &lt;code&gt;support-agent&lt;/code&gt;) with new variations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Segments tab&lt;/strong&gt;: Four new segments (&lt;code&gt;eu-free&lt;/code&gt;, &lt;code&gt;eu-paid&lt;/code&gt;, &lt;code&gt;other-free&lt;/code&gt;, &lt;code&gt;other-paid&lt;/code&gt;) &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tools tab&lt;/strong&gt;: Five tools total (including &lt;code&gt;search_v1&lt;/code&gt;, &lt;code&gt;arxiv_search&lt;/code&gt;, &lt;code&gt;semantic_scholar&lt;/code&gt;)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Troubleshooting Common Issues&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;❌ &lt;strong&gt;Error: "LD_API_KEY environment variable not set"&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check your &lt;code&gt;.env&lt;/code&gt; file contains: &lt;code&gt;LD_API_KEY=your-api-key&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Verify the API key has "Writer" permissions in LaunchDarkly settings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;❌ &lt;strong&gt;Error: "AI Config 'security-agent' not found"&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ensure you completed &lt;a href="//../agents-langgraph/agents-langgraph.mdx"&gt;Part 1&lt;/a&gt; with exact naming requirements&lt;/li&gt;
&lt;li&gt;Verify your project is named &lt;code&gt;multi-agent-chatbot&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Check that &lt;code&gt;supervisor-agent&lt;/code&gt;, &lt;code&gt;security-agent&lt;/code&gt;, and &lt;code&gt;support-agent&lt;/code&gt; exist in your LaunchDarkly project&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;❌ &lt;strong&gt;Error: "Failed to create segment"&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your LaunchDarkly account needs segment creation permissions&lt;/li&gt;
&lt;li&gt;Try running the script again; it's designed to handle partial failures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;❌ &lt;strong&gt;Script runs but no changes appear&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wait 30-60 seconds for LaunchDarkly UI to refresh&lt;/li&gt;
&lt;li&gt;Check you're looking at the correct project and environment (Production)&lt;/li&gt;
&lt;li&gt;Verify your API key matches your LaunchDarkly organization&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 3: See How Smart Segmentation Works (2 minutes)
&lt;/h2&gt;

&lt;p&gt;Here's how the smart segmentation works:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;By Region:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;EU users&lt;/strong&gt;: Mistral for security processing + Claude for support (privacy + compliance)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-EU users&lt;/strong&gt;: Claude for security + GPT for support (cost optimization)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All users&lt;/strong&gt;: Claude for supervision and workflow orchestration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;By Business Tier:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Free users&lt;/strong&gt;: Basic search tools (&lt;code&gt;search_v1&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Paid users&lt;/strong&gt;: Full research capabilities (&lt;code&gt;search_v1&lt;/code&gt;, &lt;code&gt;search_v2&lt;/code&gt;, &lt;code&gt;reranking&lt;/code&gt;, &lt;code&gt;arxiv_search&lt;/code&gt;, &lt;code&gt;semantic_scholar&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 4: Test Segmentation with Script (2 minutes)
&lt;/h2&gt;

&lt;p&gt;The included test script simulates real user scenarios across all segments, verifying that your targeting rules work correctly. It sends actual API requests to your system and confirms each user type gets the right model, tools, and behavior.&lt;/p&gt;

&lt;p&gt;First, start your system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Terminal 1: Start the backend&lt;/span&gt;
uv run uvicorn api.main:app &lt;span class="nt"&gt;--reload&lt;/span&gt; &lt;span class="nt"&gt;--port&lt;/span&gt; 8000

&lt;span class="c"&gt;# Terminal 2: Run the test script&lt;/span&gt;
uv run python api/segmentation_test.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Expected test output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;🚀 COMPREHENSIVE TUTORIAL 2 SEGMENTATION TESTS
Testing Geographic + Business Tier Targeting Matrix
&lt;span class="o"&gt;======================================================================&lt;/span&gt;

🔄 Running: EU Paid → Claude Sonnet + Full MCP Tools

&lt;span class="o"&gt;============================================================&lt;/span&gt;
🧪 TESTING: DE paid user &lt;span class="o"&gt;(&lt;/span&gt;ID: user_eu_paid_001&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;============================================================&lt;/span&gt;
📊 SUPPORT AGENT:
   Model: claude-3-7-sonnet-latest &lt;span class="o"&gt;(&lt;/span&gt;expected: claude-3-7-sonnet-latest&lt;span class="o"&gt;)&lt;/span&gt; ✅
   Variation: eu-paid &lt;span class="o"&gt;(&lt;/span&gt;expected: eu-paid&lt;span class="o"&gt;)&lt;/span&gt; ✅
   Tools: &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'search_v1'&lt;/span&gt;, &lt;span class="s1"&gt;'search_v2'&lt;/span&gt;, &lt;span class="s1"&gt;'reranking'&lt;/span&gt;, &lt;span class="s1"&gt;'arxiv_search'&lt;/span&gt;, &lt;span class="s1"&gt;'semantic_scholar'&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; ✅
   Expected: &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'search_v1'&lt;/span&gt;, &lt;span class="s1"&gt;'search_v2'&lt;/span&gt;, &lt;span class="s1"&gt;'reranking'&lt;/span&gt;, &lt;span class="s1"&gt;'arxiv_search'&lt;/span&gt;, &lt;span class="s1"&gt;'semantic_scholar'&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
   MCP Tools: Yes &lt;span class="o"&gt;(&lt;/span&gt;should be: Yes&lt;span class="o"&gt;)&lt;/span&gt; ✅

📝 RESPONSE:
   Length: 847 chars
   Tools Called: &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'search_v2'&lt;/span&gt;, &lt;span class="s1"&gt;'arxiv_search'&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
   Preview: Based on your request, I&lt;span class="s1"&gt;'ll search both internal documentation and recent academic research...

🎯 RESULT: ✅ PASSED

🔄 Running: EU Free → Claude Haiku + Basic Tools
[Similar detailed output for EU Free user...]

🔄 Running: US Paid → GPT-4 + Full MCP Tools  
[Similar detailed output for US Paid user...]

🔄 Running: US Free → GPT-4o Mini + Basic Tools
[Similar detailed output for US Free user...]

======================================================================
📊 FINAL RESULTS
======================================================================
✅ PASSED: 4/4
❌ FAILED: 0/4

🎉 ALL TESTS PASSED! LaunchDarkly targeting is working correctly.
   • Geographic segmentation: Working
   • Business tier routing: Working
   • Model assignment: Working
   • Tool configuration: Working
   • MCP integration: Working

🔗 Next: Test manually in UI at http://localhost:8501
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This confirms your targeting matrix is working correctly across all user segments!&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Experience Segmentation in the Chat UI (3 minutes)
&lt;/h2&gt;

&lt;p&gt;Now let's see your segmentation in action through the user interface. With your backend already running from Step 4, start the UI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Terminal 3: Start the chat interface&lt;/span&gt;
uv run streamlit run ui/chat_interface.py &lt;span class="nt"&gt;--server&lt;/span&gt;.port 8501
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;a href="http://localhost:8501" rel="noopener noreferrer"&gt;http://localhost:8501&lt;/a&gt; and test different user types:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;User Dropdown&lt;/strong&gt;: Find the user dropdown by using the &lt;strong&gt;&amp;gt;&amp;gt; icon&lt;/strong&gt; to open the  &lt;strong&gt;left nav menu&lt;/strong&gt;.. Select different regions (eu, other) and plans (Free, Paid).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ask Questions&lt;/strong&gt;: Try "Search for machine learning papers."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watch Workflow&lt;/strong&gt;: In the server logs, watch which model and tools get used for each user type.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify Routing&lt;/strong&gt;: EU users get Mistral for security. Other users get GPT. Paid users get MCP tools.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp25uyqqsffijumethh89.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp25uyqqsffijumethh89.png" alt="Chat Interface User Selection" width="800" height="612"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next: Part 3 Preview
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;In Part 3&lt;/strong&gt;, we'll prove what actually works using controlled A/B experiments:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Set up Easy Experiments&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool Implementation Test&lt;/strong&gt;: Compare search_v1 vs search_v2 on identical models to measure search quality impact&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Efficiency Analysis&lt;/strong&gt;: Test models with the same full tool stack to measure tool-calling precision and cost&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Real Metrics You'll Track&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;User satisfaction&lt;/strong&gt;: thumbs up/down feedback&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool call efficiency&lt;/strong&gt;: average number of tools used per successful query&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token cost analysis&lt;/strong&gt;: cost per query across different model configurations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response latency&lt;/strong&gt;: performance impact of security and tool variations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of guessing which configurations work better, you'll have data proving which tool implementations provide value, which models use tools more efficiently, and what security enhancements actually costs in performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Path Forward
&lt;/h2&gt;

&lt;p&gt;You've built something powerful: a multi-agent system that adapts to users by design. More importantly, you've proven that sophisticated AI applications don't require repeated deployments; they require smart configuration.&lt;/p&gt;

&lt;p&gt;This approach scales beyond tutorials. Whether you're serving 100 users or 100,000, the same targeting principles apply: segment intelligently, configure dynamically, and let data guide decisions instead of assumptions.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Questions? Issues? Reach out at &lt;code&gt;aiproduct@launchdarkly.com&lt;/code&gt; or open an issue in the &lt;a href="https://github.com/launchdarkly-labs/devrel-agents-tutorial/issues" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>launchdarkly</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Build a LangGraph Multi-Agent system in 20 Minutes with LaunchDarkly AI Configs</title>
      <dc:creator>Scarlett Attensil</dc:creator>
      <pubDate>Fri, 19 Sep 2025 16:47:30 +0000</pubDate>
      <link>https://dev.to/launchdarkly/build-a-production-multi-agent-system-with-langgraph-and-launchdarkly-in-20-minutes-4ifl</link>
      <guid>https://dev.to/launchdarkly/build-a-production-multi-agent-system-with-langgraph-and-launchdarkly-in-20-minutes-4ifl</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on the &lt;a href="https://launchdarkly.com/docs/tutorials/agents-langgraph" rel="noopener noreferrer"&gt;LaunchDarkly Documentation&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;Build a working multi-agent system with dynamic configuration in 20 minutes using LangGraph multi-agent workflows, RAG search, and LaunchDarkly AI Configs.&lt;/p&gt;

&lt;p&gt;Part 1 of 3 of the series: Chaos to Clarity: Defensible AI Systems That Deliver on Your Goals&lt;/p&gt;

&lt;p&gt;You've been there: your AI chatbot works great in testing, then production hits and GPT-4 costs spiral out of control. You switch to Claude, but now European users need different privacy rules. Every change means another deploy, more testing, and crossed fingers that nothing breaks.&lt;/p&gt;

&lt;p&gt;The teams shipping faster? They control AI behavior dynamically instead of hardcoding everything.&lt;/p&gt;

&lt;p&gt;This series shows you how to build &lt;strong&gt;LangGraph multi-agent workflows&lt;/strong&gt; that get their intelligence from &lt;strong&gt;RAG&lt;/strong&gt; search through your business documents, enhanced with &lt;strong&gt;MCP tools&lt;/strong&gt; for live external data, all controlled through &lt;strong&gt;LaunchDarkly AI Configs&lt;/strong&gt; without needing to deploy code changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Series Covers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Part 1&lt;/strong&gt; (this post): Build a working multi-agent system with dynamic configuration in 20 minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 2&lt;/strong&gt;: Add advanced features like segment targeting, MCP tool integration, and cost optimization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 3&lt;/strong&gt;: Run production A/B experiments to prove what actually works&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the end, you'll have a system that measures its own performance and adapts based on user data instead of guesswork.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You'll Build Today
&lt;/h2&gt;

&lt;p&gt;In the next 20 minutes, you'll have a LangGraph multi-agent system with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Supervisor Agent&lt;/strong&gt;: Orchestrates workflow between specialized agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Agent&lt;/strong&gt;: Detects PII and sensitive information&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Support Agent&lt;/strong&gt;: Answers questions using your business documents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Control&lt;/strong&gt;: Change models, tools, and behavior through LaunchDarkly without code changes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;You'll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python 3.9+&lt;/strong&gt; with &lt;code&gt;uv&lt;/code&gt; package manager (&lt;a href="https://docs.astral.sh/uv/getting-started/installation/" rel="noopener noreferrer"&gt;install uv&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LaunchDarkly account&lt;/strong&gt; (&lt;a href="https://app.launchdarkly.com/signup" rel="noopener noreferrer"&gt;sign up for free&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI API key&lt;/strong&gt; (required for RAG architecture embeddings)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic API key&lt;/strong&gt; (required for Claude models) or &lt;strong&gt;OpenAI API key&lt;/strong&gt; (for GPT models)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Clone and Configure (2 minutes)
&lt;/h2&gt;

&lt;p&gt;First, let's get everything running locally. We'll explain what each piece does as we build.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Get the code&lt;/span&gt;
git clone https://github.com/launchdarkly-labs/devrel-agents-tutorial
&lt;span class="nb"&gt;cd &lt;/span&gt;agents-demo

&lt;span class="c"&gt;# Install dependencies (LangGraph, LaunchDarkly SDK, etc.)&lt;/span&gt;
uv &lt;span class="nb"&gt;sync&lt;/span&gt;

&lt;span class="c"&gt;# Configure your environment&lt;/span&gt;
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First, you need to get your LaunchDarkly SDK key by creating a project:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Sign up for LaunchDarkly&lt;/strong&gt; at &lt;a href="https://app.launchdarkly.com" rel="noopener noreferrer"&gt;app.launchdarkly.com&lt;/a&gt; (free account).&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;If you're a brand new user, after signing up for an account, you'll need to verify your email address. You can skip through the new user onboarding flow after that.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Find projects on the side bar&lt;/strong&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dar1i1f71n879ezyqsv.png" alt=" " width="568" height="560"&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Create a new project&lt;/strong&gt; called "multi-agent-chatbot"
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn7k4axtse1wnhv47zpp3.png" alt=" " width="800" height="477"&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Get your SDK key&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;⚙️ (bottom of sidebar) → &lt;strong&gt;Projects&lt;/strong&gt; → &lt;strong&gt;multi-agent-chatbot&lt;/strong&gt; → ⚙️ (to the right) &lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;Environments&lt;/strong&gt; → &lt;strong&gt;Production&lt;/strong&gt; → &lt;strong&gt;...&lt;/strong&gt; → &lt;strong&gt;SDK key&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;this is your &lt;code&gt;LD_SDK_KEY&lt;/code&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filh2stzcgenkz3ae43qm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filh2stzcgenkz3ae43qm.png" alt=" " width="800" height="542"&gt;&lt;/a&gt;&lt;br&gt;
Now edit &lt;code&gt;.env&lt;/code&gt; with your keys:&lt;br&gt;
&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;LD_SDK_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-launchdarkly-sdk-key  &lt;span class="c"&gt;# From step above&lt;/span&gt;
&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-openai-key        &lt;span class="c"&gt;# Required for RAG embeddings&lt;/span&gt;
&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-anthropic-key  &lt;span class="c"&gt;# Required for Claude models&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sets up a &lt;strong&gt;LangGraph&lt;/strong&gt; application that uses LaunchDarkly to control AI behavior. Think of it like swapping actors, directors, even props mid-performance without stopping the show.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ Do not check the &lt;code&gt;.env&lt;/code&gt; into your source control. Keep those secrets safe!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Step 2: Add Your Business Knowledge (2 minutes)
&lt;/h2&gt;

&lt;p&gt;The system includes a sample reinforcement learning textbook. Replace it with your own documents for your specific domain.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Option A: Use the sample (AI/ML knowledge)&lt;/span&gt;
&lt;span class="c"&gt;# Already included: kb/SuttonBarto-IPRL-Book2ndEd.pdf&lt;/span&gt;

&lt;span class="c"&gt;# Option B: Add your documents&lt;/span&gt;
&lt;span class="nb"&gt;rm &lt;/span&gt;kb/&lt;span class="k"&gt;*&lt;/span&gt;.pdf  &lt;span class="c"&gt;# Clear sample&lt;/span&gt;
&lt;span class="nb"&gt;cp&lt;/span&gt; /path/to/your-docs/&lt;span class="k"&gt;*&lt;/span&gt;.pdf kb/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Document types that work well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Legal&lt;/strong&gt;: Contracts, case law, compliance guidelines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Healthcare&lt;/strong&gt;: Protocols, research papers, care guidelines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SaaS&lt;/strong&gt;: API docs, user guides, troubleshooting manuals&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;E-commerce&lt;/strong&gt;: Product catalogs, policies, FAQs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 3: Initialize Your Knowledge Base (2 minutes)
&lt;/h2&gt;

&lt;p&gt;Turn your documents into searchable &lt;strong&gt;RAG&lt;/strong&gt; knowledge:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create vector embeddings for semantic search&lt;/span&gt;
uv run python initialize_embeddings.py &lt;span class="nt"&gt;--force&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This builds your &lt;strong&gt;RAG&lt;/strong&gt; (Retrieval-Augmented Generation) foundation using &lt;strong&gt;OpenAI's&lt;/strong&gt; text-embedding model and FAISS vector database. &lt;strong&gt;RAG&lt;/strong&gt; converts documents into vector embeddings that capture semantic meaning rather than just keywords, making search actually understand context.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Define Your Tools (3 minutes)
&lt;/h2&gt;

&lt;p&gt;Define the search tools your agents will use.&lt;/p&gt;

&lt;p&gt;In the LaunchDarkly app sidebar, click &lt;strong&gt;Library&lt;/strong&gt; in the AI section. On the following screen, click the &lt;strong&gt;Tools&lt;/strong&gt; tab, then &lt;strong&gt;Create tool&lt;/strong&gt;.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa4c4d9nqiwvnfu30ta4y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa4c4d9nqiwvnfu30ta4y.png" alt=" " width="800" height="465"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Create the RAG vector search tool:
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: we will be creating a simple search_v1 during part 3 when we learn about experimentation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Create a tool using the following configuration:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;search_v2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Semantic search using vector embeddings
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Schema:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Search query for semantic matching"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"top_k"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Number of results to return"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"number"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"additionalProperties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"query"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you're done, click &lt;strong&gt;Save&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create the reranking tool:
&lt;/h3&gt;

&lt;p&gt;Back on the Tools section, click &lt;strong&gt;Add tool&lt;/strong&gt; to create a new tool. Add the following properties: &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;reranking
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Reorders results by relevance using BM25 algorithm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Schema:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Original query for scoring"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"results"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Results to rerank"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"array"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"additionalProperties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"results"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you're done, click &lt;strong&gt;Save&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;reranking&lt;/code&gt; tool takes search results from &lt;code&gt;search_v2&lt;/code&gt; and reorders them using the BM25 algorithm to improve relevance. This hybrid approach combines semantic search (vector embeddings) with lexical matching (keyword-based scoring), making it especially useful for technical terms, product names, and error codes where exact term matching matters more than conceptual similarity.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;🔍 How Your RAG Architecture Works&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your &lt;strong&gt;RAG&lt;/strong&gt; system works in two stages: &lt;code&gt;search_v2&lt;/code&gt; performs semantic similarity search using FAISS by converting queries into the same vector space as your documents (via &lt;strong&gt;OpenAI&lt;/strong&gt; embeddings), while &lt;code&gt;reranking&lt;/code&gt; reorders results for maximum relevance. This &lt;strong&gt;RAG&lt;/strong&gt; approach significantly outperforms keyword search by understanding context, so asking "My app is broken" can find troubleshooting guides that mention "application errors" or "system failures."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Step 5: Create Your AI Agents in LaunchDarkly (5 minutes)
&lt;/h2&gt;

&lt;p&gt;Create LaunchDarkly AI Configs to control your &lt;strong&gt;LangGraph&lt;/strong&gt; multi-agent system dynamically. &lt;strong&gt;LangGraph&lt;/strong&gt; is LangChain's framework for building stateful, multi-&lt;strong&gt;agent&lt;/strong&gt; applications that maintain conversation state across &lt;strong&gt;agent&lt;/strong&gt; interactions. Your &lt;strong&gt;LangGraph&lt;/strong&gt; architecture enables sophisticated workflows where &lt;strong&gt;agents&lt;/strong&gt; collaborate and pass context between each other.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create the Supervisor Agent
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;In the LaunchDarkly dashboard sidebar, navigate to &lt;strong&gt;AI Configs&lt;/strong&gt; and click &lt;strong&gt;Create New&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Select &lt;code&gt;🤖 Agent-based&lt;/code&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7sln8wj0udnv1nv4eh62.png" alt=" " width="800" height="789"&gt;
&lt;/li&gt;
&lt;li&gt;Name it &lt;code&gt;supervisor-agent&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Add this configuration:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;variation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;supervisor-basic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Model configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Anthropic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claude-3-7-sonnet-latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Goal or task:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are an intelligent routing supervisor for a multi-agent system. Your primary job is to assess whether user input likely contains PII (personally identifiable information) to determine the most efficient processing route.

  **PII Assessment:**
  Analyze the user input and provide:
  - likely_contains_pii: boolean assessment
  - confidence: confidence score (0.0 to 1.0)
  - reasoning: clear explanation of your decision
  - recommended_route: either 'security_agent' or 'support_agent'

  **Route to SECURITY_AGENT** if the text likely contains:
  - Email addresses, phone numbers, addresses
  - Names (first/last names, usernames)
  - Financial information (credit cards, SSNs, account numbers)
  - Sensitive personal data

  **Route to SUPPORT_AGENT** if the text appears to be:
  - General questions without personal details
  - Technical queries
  - Search requests
  - Educational content requests

  Analyze this user input and recommend the optimal route:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Click &lt;strong&gt;Review and save&lt;/strong&gt;. Now enable your AI Config by switching to the &lt;strong&gt;Targeting&lt;/strong&gt; tab and editing the default rule to serve the variation you just created.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6o5s2lwdzgyzhpzx8jms.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6o5s2lwdzgyzhpzx8jms.png" alt=" " width="800" height="515"&gt;&lt;/a&gt;&lt;br&gt;
Click &lt;strong&gt;Edit&lt;/strong&gt; on the Default rule, change it to serve your &lt;code&gt;supervisor-basic&lt;/code&gt; variation, and save with a note like "Enabling new agent config".&lt;/p&gt;

&lt;p&gt;The supervisor &lt;strong&gt;agent&lt;/strong&gt; demonstrates &lt;strong&gt;LangGraph&lt;/strong&gt; orchestration by routing requests based on content analysis rather than rigid rules. &lt;strong&gt;LangGraph&lt;/strong&gt; enables this &lt;strong&gt;agent&lt;/strong&gt; to maintain conversation context and make intelligent routing decisions that adapt to user needs and LaunchDarkly AI Config parameters.&lt;/p&gt;
&lt;h3&gt;
  
  
  Create the Security Agent
&lt;/h3&gt;

&lt;p&gt;Similarly, create another AI Config called &lt;code&gt;security-agent&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;variation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pii-detector
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Model configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Anthropic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claude-3-7-sonnet-latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Goal or task:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a privacy agent that REMOVES PII and formats the input for another process. Analyze the input text and identify any personally identifiable information including: Email addresses, Phone numbers, Social Security Numbers, Names (first, last, full names), Physical addresses, Credit card numbers, Driver's license numbers, Any other sensitive personal data. Respond with: detected: true if any PII was found, false otherwise,types: array of PII types found (e.g., ['email', 'name', 'phone']), redacted: the input text with PII replaced by [REDACTED], keeping the text readable and natural. Examples: Input: 'My email is john@company.com and I need help', Output: detected=true, types=['email'], redacted='My email is [REDACTED] and I need help'. Input: 'I need help with my account',Output: detected=false, types=[], redacted='I need help with my account'. Input: 'My name is Sarah Johnson and my phone is 555-1234', Output: detected=true, types=['name', 'phone'], redacted='My name is [REDACTED] and my phone is [REDACTED]'. Be thorough in your analysis and err on the side of caution when identifying potential PII.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This agent detects PII and provides detailed redaction information, showing exactly what sensitive data was found and how it would be handled for compliance and transparency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Remember to switch to the Targeting tab and enable this agent the same way we did for the supervisor - edit the default rule to serve your &lt;code&gt;pii-detector&lt;/code&gt; variation and save it.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Create the Support Agent
&lt;/h3&gt;

&lt;p&gt;Finally, create &lt;code&gt;support-agent&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;variation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rag-search-enhanced
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Model configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Anthropic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claude-3-7-sonnet-latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Click &lt;strong&gt;Attach tools&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;select: &lt;strong&gt;✅ reranking&lt;/strong&gt; &lt;strong&gt;✅ search_v2&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;Add parameters&lt;/strong&gt;&lt;br&gt;
→ &lt;strong&gt;Click Custom parameters&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"max_tool_calls"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Goal or task:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a helpful assistant that can search documentation and research papers. When search results are available, prioritize information from those results over your general knowledge to provide the most accurate and up-to-date responses. Use available tools to search the knowledge base and external research databases to answer questions accurately and comprehensively.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This &lt;strong&gt;agent&lt;/strong&gt; combines &lt;strong&gt;LangGraph&lt;/strong&gt; workflow management with your &lt;strong&gt;RAG&lt;/strong&gt; tools. &lt;strong&gt;LangGraph&lt;/strong&gt; enables the &lt;strong&gt;agent&lt;/strong&gt; to chain multiple tool calls together: first using &lt;strong&gt;RAG&lt;/strong&gt; for document retrieval, then semantic reranking, all while maintaining conversation state and handling error recovery gracefully.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Remember to switch to the Targeting tab and enable this agent the same way - edit the default rule to serve your &lt;code&gt;rag-search-enhanced&lt;/code&gt; variation and save it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you are done, you should have three enabled AI Config Agents.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4xp3fml2krbzuo90kwv7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4xp3fml2krbzuo90kwv7.png" alt=" " width="800" height="239"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 6: Launch Your System (2 minutes)
&lt;/h2&gt;

&lt;p&gt;Start the system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Terminal 1: Start the backend&lt;/span&gt;
uv run uvicorn api.main:app &lt;span class="nt"&gt;--reload&lt;/span&gt; &lt;span class="nt"&gt;--port&lt;/span&gt; 8000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Terminal 2: Launch the UI  &lt;/span&gt;
uv run streamlit run ui/chat_interface.py &lt;span class="nt"&gt;--server&lt;/span&gt;.port 8501
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;a href="http://localhost:8501" rel="noopener noreferrer"&gt;http://localhost:8501&lt;/a&gt; in your browser. You should see a clean chat interface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 7: Test Your Multi-Agent System (2 minutes)
&lt;/h2&gt;

&lt;p&gt;Test with these queries:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Basic Knowledge Test:&lt;/strong&gt;&lt;br&gt;
"What is reinforcement learning?" (if using sample docs)&lt;br&gt;
Or ask about your specific domain: "What's our refund policy?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PII Detection Test:&lt;/strong&gt;&lt;br&gt;
"My email is &lt;a href="mailto:john.doe@example.com"&gt;john.doe@example.com&lt;/a&gt; and I need help"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Workflow Details&lt;/strong&gt; show:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which agents are activated&lt;/li&gt;
&lt;li&gt;What models and tools are being used&lt;/li&gt;
&lt;li&gt;Text after redaction
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbtdqjyjrs92ssf2cc507.png" alt=" " width="800" height="470"&gt;
Watch LangGraph in action: the supervisor agent first routes to the security agent, which detects PII. It then passes control to the support agent, which uses your RAG system for document search. LangGraph maintains state across this multi-agent workflow so that context flows seamlessly between agents.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Step 8: Make Changes Without Deploying Code
&lt;/h2&gt;

&lt;p&gt;Try these experiments in LaunchDarkly:&lt;/p&gt;
&lt;h3&gt;
  
  
  Switch Models Instantly
&lt;/h3&gt;

&lt;p&gt;Edit your &lt;code&gt;support-agent&lt;/code&gt; config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chatgpt-4o-latest"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;was&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;claude&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save and refresh your chat. No code deployment or restart required.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adjust Tool Usage
&lt;/h3&gt;

&lt;p&gt;Want to limit tool calls? Reduce the limits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"customParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"max_tool_calls"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;was&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Change Agent Behavior
&lt;/h3&gt;

&lt;p&gt;Want more thorough searches? Update instructions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"instructions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You are a research specialist. Always search multiple times from different angles before answering. Prioritize accuracy over speed."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Changes take effect immediately without downtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding What You Built
&lt;/h2&gt;

&lt;p&gt;Your &lt;strong&gt;LangGraph&lt;/strong&gt; multi-&lt;strong&gt;agent&lt;/strong&gt; system with &lt;strong&gt;RAG&lt;/strong&gt; includes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. LangGraph Orchestration&lt;/strong&gt;&lt;br&gt;
The supervisor &lt;strong&gt;agent&lt;/strong&gt; uses &lt;strong&gt;LangGraph&lt;/strong&gt; state management to route requests intelligently based on content analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Privacy Protection&lt;/strong&gt;&lt;br&gt;
The supervisor &lt;strong&gt;agent&lt;/strong&gt; uses &lt;strong&gt;LangGraph&lt;/strong&gt; state management to route requests intelligently. This separation allows you to assign a trusted model to the security and supervisor agents and consider on a less-trusted model for the more expensive support agent at a reduced risk of PII exposure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. RAG Knowledge System&lt;/strong&gt;&lt;br&gt;
The support &lt;strong&gt;agent&lt;/strong&gt; combines &lt;strong&gt;LangGraph&lt;/strong&gt; tool chaining with your &lt;strong&gt;RAG&lt;/strong&gt; system for semantic document search and reranking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Runtime Control&lt;/strong&gt;&lt;br&gt;
LaunchDarkly controls both &lt;strong&gt;LangGraph&lt;/strong&gt; behavior and &lt;strong&gt;RAG&lt;/strong&gt; parameters without code changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;Your multi-agent system is running with dynamic control and ready for optimization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In Part 2&lt;/strong&gt;, we'll add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Geographic-based privacy rules (strict for EU, standard for Other)&lt;/li&gt;
&lt;li&gt;MCP tools for external data&lt;/li&gt;
&lt;li&gt;Business tier configurations (free, paid)&lt;/li&gt;
&lt;li&gt;Cost optimization strategies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;In Part 3&lt;/strong&gt;, we'll run A/B experiments to prove which configurations actually work best with real data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try This Now
&lt;/h2&gt;

&lt;p&gt;Experiment with:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Different Instructions&lt;/strong&gt;: Make agents more helpful, more cautious, or more thorough&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool Combinations&lt;/strong&gt;: Add/remove tools to see impact on quality&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Comparisons&lt;/strong&gt;: Try different models for different agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Limits&lt;/strong&gt;: Find the sweet spot between quality and cost&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every change is instant, measurable, and reversible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Multi-agent systems work best when each agent has a specific role&lt;/li&gt;
&lt;li&gt;Dynamic configuration handles changing requirements better than hardcoding&lt;/li&gt;
&lt;li&gt;LaunchDarkly AI Configs control and change AI behavior without requiring deployments&lt;/li&gt;
&lt;li&gt;Start simple and add complexity as you learn what works&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Related Resources
&lt;/h2&gt;

&lt;p&gt;Explore the &lt;strong&gt;&lt;a href="https://docs.launchdarkly.com/home/getting-started/mcp" rel="noopener noreferrer"&gt;LaunchDarkly MCP Server&lt;/a&gt;&lt;/strong&gt; - enable AI agents to access feature flag configurations, user segments, and experimentation data directly through the Model Context Protocol.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Questions? Issues? Reach out at &lt;code&gt;aiproduct@launchdarkly.com&lt;/code&gt; or open an issue in the &lt;a href="https://github.com/launchdarkly-labs/devrel-agents-tutorial/issues" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post was originally published on the &lt;a href="https://launchdarkly.com/docs/tutorials/agents-langgraph" rel="noopener noreferrer"&gt;LaunchDarkly Documentation site&lt;/a&gt;. Follow &lt;a href="https://twitter.com/launchdarkly" rel="noopener noreferrer"&gt;@LaunchDarkly&lt;/a&gt; for more AI and feature flag content!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>langgraph</category>
      <category>python</category>
      <category>launchdarkly</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
