<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Joshua</title>
    <description>The latest articles on DEV Community by Joshua (@bigdata5911).</description>
    <link>https://dev.to/bigdata5911</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2940671%2Fe3a8c556-2f59-4c1f-963e-0eb5a59d7488.png</url>
      <title>DEV Community: Joshua</title>
      <link>https://dev.to/bigdata5911</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/bigdata5911"/>
    <language>en</language>
    <item>
      <title>Building Smarter AI Agents with Schema-Guided Reasoning</title>
      <dc:creator>Joshua</dc:creator>
      <pubDate>Fri, 07 Nov 2025 07:42:37 +0000</pubDate>
      <link>https://dev.to/bigdata5911/building-smarter-ai-agents-with-schema-guided-reasoning-m3n</link>
      <guid>https://dev.to/bigdata5911/building-smarter-ai-agents-with-schema-guided-reasoning-m3n</guid>
<description>&lt;p&gt;Please give the repo a star ⭐ if it helps your work.&lt;br&gt;
&lt;strong&gt;GitHub Repo&lt;/strong&gt;: &lt;a href="https://github.com/bigdata5911/schema-guided-reasoning" rel="noopener noreferrer"&gt;bigdata5911/schema-guided-reasoning&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I came across (and tried out) a really interesting project called &lt;strong&gt;Schema-Guided Reasoning (SGR)&lt;/strong&gt; — a small but powerful demo showing how to make AI agents that can &lt;em&gt;reason&lt;/em&gt;, &lt;em&gt;plan&lt;/em&gt;, and &lt;em&gt;take action&lt;/em&gt; using structured logic.&lt;/p&gt;

&lt;p&gt;Instead of just chatting, this agent can actually &lt;em&gt;do things&lt;/em&gt; — issue invoices, send emails, or apply business rules — all based on clear, validated schemas. It’s a great example of how you can combine reasoning with structured outputs to make AI more reliable and explainable.&lt;/p&gt;


&lt;h2&gt;
  
  
  So what exactly is Schema-Guided Reasoning?
&lt;/h2&gt;

&lt;p&gt;The idea behind &lt;strong&gt;SGR&lt;/strong&gt; is simple but clever: instead of letting an AI respond freely in text, you &lt;em&gt;guide&lt;/em&gt; its reasoning through a &lt;strong&gt;schema&lt;/strong&gt; — basically, a blueprint that defines what kind of outputs it can produce.&lt;/p&gt;

&lt;p&gt;By doing that, the AI can plan its steps, pick tools to call, and execute them safely without breaking anything.&lt;/p&gt;

&lt;p&gt;In this demo, the schema-driven agent works inside a &lt;strong&gt;mini in-memory CRM system&lt;/strong&gt;. It can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Look up customers and products&lt;/li&gt;
&lt;li&gt;Issue or void invoices&lt;/li&gt;
&lt;li&gt;Send emails&lt;/li&gt;
&lt;li&gt;Apply business rules&lt;/li&gt;
&lt;/ul&gt;
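&lt;p&gt;To make that concrete, here’s a minimal sketch of what such a schema might look like with &lt;strong&gt;Pydantic&lt;/strong&gt; (the model names and fields here are illustrative, not the repo’s exact ones):&lt;/p&gt;

```python
from typing import Literal, Union
from pydantic import BaseModel, Field

# Illustrative tool schemas -- names and fields are assumptions, not the repo's.
class SendEmail(BaseModel):
    action: Literal["send_email"] = "send_email"
    to: str
    subject: str
    body: str

class IssueInvoice(BaseModel):
    action: Literal["issue_invoice"] = "issue_invoice"
    customer_id: str
    amount: float = Field(gt=0)

class NextStep(BaseModel):
    reasoning: str                        # the model must explain its plan
    tool: Union[SendEmail, IssueInvoice]  # ...and pick a schema-valid action

# Whatever JSON the model emits has to validate before anything executes.
step = NextStep.model_validate({
    "reasoning": "Customer C1 placed an order, so issue an invoice.",
    "tool": {"action": "issue_invoice", "customer_id": "C1", "amount": 99.0},
})
print(type(step.tool).__name__)  # IssueInvoice
```

&lt;p&gt;Because the &lt;code&gt;tool&lt;/code&gt; field only accepts one of the declared schemas, a malformed or invented action fails validation instead of executing.&lt;/p&gt;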

&lt;p&gt;It’s like a lightweight business assistant that &lt;em&gt;understands&lt;/em&gt; structure and can think through actions before executing them.&lt;/p&gt;


&lt;h2&gt;
  
  
  Two ways to run it
&lt;/h2&gt;

&lt;p&gt;The repo gives you two different setups — one that uses OpenAI’s API and another that runs completely locally with &lt;strong&gt;Qwen3-4B&lt;/strong&gt; via &lt;code&gt;llama.cpp&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;1. OpenAI API (schema-guided-reasoning.py)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This version uses the OpenAI model &lt;code&gt;gpt-4o&lt;/code&gt; and runs everything through the cloud.&lt;/p&gt;

&lt;p&gt;Setup is super simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pydantic annotated-types rich openai requests
&lt;span class="nv"&gt;$env&lt;/span&gt;:OPENAI_API_KEY &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"YOUR_API_KEY"&lt;/span&gt;
python schema-guided-reasoning.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once it runs, you’ll see the agent print out each task, plan the next step, call tools, and validate everything using &lt;strong&gt;Pydantic&lt;/strong&gt; schemas. The output looks clean in the console thanks to the &lt;code&gt;rich&lt;/code&gt; package.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;2. Local llama.cpp version (sgr_assistant.py)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If you prefer to stay offline or just like running models locally (like me), there’s a &lt;strong&gt;Qwen3-4B&lt;/strong&gt; version that connects to a &lt;code&gt;llama.cpp&lt;/code&gt; HTTP server.&lt;/p&gt;

&lt;p&gt;You can spin it up with something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./llama-server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; /path/to/Qwen3-4B-Instruct-2507-Q8_0.gguf &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-ngl&lt;/span&gt; 999 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 12345 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--threads&lt;/span&gt; &lt;span class="nt"&gt;-1&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--ctx-size&lt;/span&gt; 20000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then just run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python sgr_assistant.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This one includes a little bit of cleanup logic to strip out &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; tags and formatting issues that local models sometimes produce — nice touch.&lt;/p&gt;
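&lt;p&gt;That cleanup can be as simple as a couple of regexes, something along these lines (a sketch, not the repo’s actual code):&lt;/p&gt;

```python
import re

def clean_local_output(raw: str) -> str:
    """Strip <think>...</think> blocks and stray code fences that local
    models sometimes emit (illustrative sketch, not the repo's exact code)."""
    text = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    text = re.sub(r"^```(?:json)?|```$", "", text.strip(), flags=re.MULTILINE)
    return text.strip()

raw = '<think>plan the tool call</think>\n```json\n{"action": "send_email"}\n```'
print(clean_local_output(raw))  # {"action": "send_email"}
```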




&lt;h2&gt;
  
  
  What’s happening under the hood
&lt;/h2&gt;

&lt;p&gt;Both versions share a similar core:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An &lt;strong&gt;in-memory database&lt;/strong&gt; with mock data (customers, products, invoices, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema definitions&lt;/strong&gt; for tools like &lt;code&gt;SendEmail&lt;/code&gt;, &lt;code&gt;IssueInvoice&lt;/code&gt;, or &lt;code&gt;GetCustomerData&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;dispatcher&lt;/strong&gt; that simulates what happens when those tools are called&lt;/li&gt;
&lt;li&gt;And a &lt;strong&gt;task list&lt;/strong&gt; that the model executes step-by-step&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s all in pure Python — easy to read, easy to extend. You could add your own tool or new logic in just a few lines.&lt;/p&gt;
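&lt;p&gt;As a rough sketch of how those pieces fit together (the names below are my own, not the repo’s):&lt;/p&gt;

```python
# A rough sketch of the shared core (names are illustrative, not the repo's).
DB = {
    "customers": {"C1": {"name": "Acme", "email": "billing@acme.test"}},
    "invoices": {},
}

def dispatch(tool_call: dict) -> dict:
    """Simulate what happens when a validated tool call is executed."""
    action = tool_call["action"]
    if action == "get_customer":
        return DB["customers"].get(tool_call["customer_id"], {})
    if action == "issue_invoice":
        invoice_id = f"INV-{len(DB['invoices']) + 1}"
        DB["invoices"][invoice_id] = {
            "customer_id": tool_call["customer_id"],
            "amount": tool_call["amount"],
        }
        return {"invoice_id": invoice_id}
    raise ValueError(f"Unknown tool: {action}")

# The task list is just data; the agent walks it step by step.
TASKS = ["Look up customer C1", "Issue a 42.00 invoice for C1"]
print(dispatch({"action": "issue_invoice", "customer_id": "C1", "amount": 42.0}))
# {'invoice_id': 'INV-1'}
```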




&lt;h2&gt;
  
  
  Why this approach is cool
&lt;/h2&gt;

&lt;p&gt;A lot of AI “agents” today are just prompt wrappers around chat models. They can do some planning, but often they’re unpredictable — one small formatting issue, and everything breaks.&lt;/p&gt;

&lt;p&gt;SGR fixes that by forcing the model to stay inside a &lt;strong&gt;strict JSON schema&lt;/strong&gt;. Every output has to validate before it runs. That means fewer hallucinations, clearer reasoning steps, and easier debugging.&lt;/p&gt;

&lt;p&gt;In other words, you’re not just getting an answer — you’re getting a process you can &lt;em&gt;trust and inspect&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Customize it your way
&lt;/h2&gt;

&lt;p&gt;The best part is how easy it is to tweak.&lt;/p&gt;

&lt;p&gt;You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Edit the &lt;code&gt;TASKS&lt;/code&gt; list to make it do new things&lt;/li&gt;
&lt;li&gt;Add more tools with &lt;code&gt;pydantic&lt;/code&gt; models&lt;/li&gt;
&lt;li&gt;Change the &lt;code&gt;system_prompt&lt;/code&gt; to give it different rules or products&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything’s local and lightweight, so you can experiment freely without breaking anything.&lt;/p&gt;




&lt;h2&gt;
  
  
  A few quick tips
&lt;/h2&gt;

&lt;p&gt;If you run into issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Make sure all dependencies are installed:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  pip &lt;span class="nb"&gt;install &lt;/span&gt;pydantic annotated-types rich openai requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Double-check your OpenAI API key (if using the API version).&lt;/li&gt;
&lt;li&gt;For local models, confirm the &lt;code&gt;llama.cpp&lt;/code&gt; server is running and reachable.&lt;/li&gt;
&lt;li&gt;If the model outputs invalid JSON, try lowering temperature or adjusting cleanup logic.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why it matters
&lt;/h2&gt;

&lt;p&gt;Projects like this might seem small, but they hint at something big — &lt;strong&gt;how structured reasoning could make AI agents more dependable&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of “guessing” what the next step is, the model is guided by schemas, validated by code, and executed deterministically. It’s the difference between a chat assistant and a reasoning engine.&lt;/p&gt;




&lt;p&gt;If you want to check it out yourself, the repo’s here:&lt;br&gt;
👉 &lt;a href="https://github.com/bigdata5911/schema-guided-reasoning" rel="noopener noreferrer"&gt;bigdata5911/schema-guided-reasoning&lt;/a&gt;&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>agents</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>⚡ Rethinking Prompt Engineering: How Agent Lightning’s APO Teaches Agents to Write Better Prompts</title>
      <dc:creator>Joshua</dc:creator>
      <pubDate>Thu, 06 Nov 2025 11:46:36 +0000</pubDate>
      <link>https://dev.to/bigdata5911/rethinking-prompt-engineering-how-agent-lightnings-apo-teaches-agents-to-write-better-prompts-hon</link>
      <guid>https://dev.to/bigdata5911/rethinking-prompt-engineering-how-agent-lightnings-apo-teaches-agents-to-write-better-prompts-hon</guid>
<description>&lt;p&gt;⭐ If this helps your work, please give it a star ⭐&lt;br&gt;
&lt;strong&gt;GitHub Repo:&lt;/strong&gt; &lt;a href="https://github.com/bigdata5911/agent-lightning-automatic-prompt-optimization" rel="noopener noreferrer"&gt;bigdata5911/agent-lightning-automatic-prompt-optimization&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For years, we’ve obsessed over improving model weights and architectures.&lt;/strong&gt;&lt;br&gt;
But what if the real breakthrough in AI performance comes not from &lt;strong&gt;training the model&lt;/strong&gt;, but from &lt;strong&gt;training the prompt&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;That’s the premise behind &lt;strong&gt;Agent Lightning&lt;/strong&gt;, a new framework from Microsoft that allows AI agents to improve themselves.&lt;br&gt;
It introduces two key algorithms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;VERL&lt;/strong&gt; — for reinforcement learning at the policy level&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;APO (Automatic Prompt Optimization)&lt;/strong&gt; — for learning &lt;em&gt;textual gradients&lt;/em&gt; that refine prompts based on performance feedback&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article, I’ll show how APO works, why it’s a game-changer, and how I used it to enhance a &lt;strong&gt;Text-to-SQL agent&lt;/strong&gt; built with LangGraph — improving accuracy from 84% to 88% in just two rounds of optimization.&lt;/p&gt;


&lt;h2&gt;
  
  
  🌩️ The Idea: Prompts That Learn
&lt;/h2&gt;

&lt;p&gt;Prompt engineering has always been a manual, intuition-driven process. You tweak a few words, rerun your agent, and hope it performs better.&lt;br&gt;
APO replaces that guesswork with &lt;strong&gt;data-grounded self-improvement&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It doesn’t retrain the underlying model — instead, it trains the &lt;em&gt;text&lt;/em&gt; of the prompt itself.&lt;br&gt;
Think of it as “gradient descent in natural language.”&lt;/p&gt;

&lt;p&gt;At the heart of APO are two cooperating LLMs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;Critic&lt;/strong&gt; that examines what went wrong in failed tasks&lt;/li&gt;
&lt;li&gt;An &lt;strong&gt;Editor&lt;/strong&gt; that rewrites the prompt to address those weaknesses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each iteration produces multiple improved prompts, scores them on validation data, and preserves the best through &lt;strong&gt;beam search&lt;/strong&gt; — a form of controlled exploration.&lt;/p&gt;

&lt;p&gt;The result? A system that &lt;em&gt;writes its own better prompt&lt;/em&gt; with every round.&lt;/p&gt;


&lt;h2&gt;
  
  
  🧮 The Science of Textual Gradients
&lt;/h2&gt;

&lt;p&gt;APO builds on ideas from two research papers — &lt;strong&gt;ProTeGi (EMNLP 2023)&lt;/strong&gt; and &lt;strong&gt;TextGrad (Nature 2024)&lt;/strong&gt; — which formalize how text itself can encode gradient-like feedback.&lt;/p&gt;

&lt;p&gt;Here’s what happens inside one APO cycle:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Run current prompt&lt;/strong&gt; on a small batch of tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Score results&lt;/strong&gt; using an objective metric (for example, SQL correctness)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Critic model&lt;/strong&gt; reviews (input, output, reward) pairs and summarizes failures in natural language&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Editor model&lt;/strong&gt; applies that feedback to produce refined prompt candidates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Beam search&lt;/strong&gt; evaluates several rewritten prompts and keeps the top performers&lt;/li&gt;
&lt;/ol&gt;
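&lt;p&gt;The cycle above can be sketched as a toy beam search in plain Python. The &lt;code&gt;critic&lt;/code&gt; and &lt;code&gt;editor&lt;/code&gt; stubs stand in for LLM calls, and the scorer is a placeholder; none of this is Agent Lightning’s real API:&lt;/p&gt;

```python
def score(prompt: str, tasks: list) -> float:
    # Stand-in scorer: in the real system this would run the agent on each task.
    return sum(1.0 for t in tasks if len(prompt) > t) / len(tasks)

def critic(prompt: str, failures: list) -> str:
    # Stand-in for the Critic LLM: returns a "textual gradient".
    return "Add a rule about type casting in JOIN columns."

def editor(prompt: str, critique: str) -> list:
    # Stand-in for the Editor LLM: proposes rewritten prompt candidates.
    return [prompt + " " + critique, prompt + " Validate columns before use."]

def apo_round(beam: list, tasks: list, beam_width: int = 2) -> list:
    candidates = list(beam)
    for p in beam:
        candidates += editor(p, critic(p, failures=[]))
    # Beam search: score every candidate, keep only the top performers.
    candidates.sort(key=lambda p: score(p, tasks), reverse=True)
    return candidates[:beam_width]

beam = ["Write SQLite for the question."]
for _ in range(2):                 # two optimization rounds, as in the article
    beam = apo_round(beam, tasks=[10, 40, 80])
print(score(beam[0], [10, 40, 80]))  # 1.0
```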

&lt;p&gt;Example critique:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The prompt doesn’t specify how to handle type mismatches in JOIN columns.&lt;br&gt;
When Singer_ID is INTEGER in one table but TEXT in another, use CAST(col_text AS INTEGER) and filter invalid values.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This text acts as a &lt;em&gt;direction of improvement&lt;/em&gt; — like a gradient — but expressed entirely in language.&lt;/p&gt;


&lt;h2&gt;
  
  
  🧩 A Practical Experiment: Teaching a SQL Agent to Self-Optimize
&lt;/h2&gt;

&lt;p&gt;To test APO, I applied it to a &lt;strong&gt;Text-to-SQL agent&lt;/strong&gt; that converts natural language questions into SQL queries.&lt;br&gt;
I used the &lt;strong&gt;Spider dataset&lt;/strong&gt; — a well-known text-to-SQL benchmark — and ran 50 examples for training, 50 for validation.&lt;/p&gt;


&lt;h3&gt;
  
  
  🏗️ The Setup
&lt;/h3&gt;

&lt;p&gt;The agent was built in &lt;strong&gt;LangGraph&lt;/strong&gt;, following a self-correcting workflow.&lt;br&gt;
Agent Lightning handled the optimization loop; I only needed to define the &lt;code&gt;@rollout&lt;/code&gt; function that executes the task and returns a reward.&lt;/p&gt;

&lt;p&gt;Here’s a minimal setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agentlightning&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Trainer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agentlightning.algorithm.apo&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;APO&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AsyncOpenAI&lt;/span&gt;

&lt;span class="n"&gt;openai_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncOpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;algo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;APO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;openai_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;val_batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;gradient_batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;beam_width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;branch_factor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;beam_rounds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;trainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Trainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;algorithm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;algo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;n_runners&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;initial_resources&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt_template&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;prompt_template_baseline&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;train_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_spider_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data/dev.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[:&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;val_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_spider_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data/dev.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sql_agent_rollout&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;train_dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;train_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;val_dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;val_data&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  🧠 The Rollout Function
&lt;/h3&gt;

&lt;p&gt;This is where APO gets its feedback signal — the reward for how well a generated SQL query matches the ground truth.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agentlightning&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;rollout&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agentlightning.types&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PromptTemplate&lt;/span&gt;

&lt;span class="nd"&gt;@rollout&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sql_agent_rollout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt_template&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PromptTemplate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SQLAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;db_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;databases/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;db_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;db_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.sqlite&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;write_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt_template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;dialect&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SQLite&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;table_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;get_schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;db_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;evaluate_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;db_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each rollout returns a numeric reward (1 for correct, 0 for incorrect), giving APO objective feedback for learning.&lt;/p&gt;
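&lt;p&gt;A reward like that is typically computed by executing both queries and comparing result sets. Here’s a hypothetical version of &lt;code&gt;evaluate_query&lt;/code&gt; (the repo’s implementation may differ):&lt;/p&gt;

```python
import os
import sqlite3
import tempfile

def evaluate_query(predicted_sql: str, gold_sql: str, db_path: str) -> float:
    """Reward 1.0 if the predicted query returns the same rows as the gold
    query, else 0.0 (illustrative sketch; the repo's version may differ)."""
    conn = sqlite3.connect(db_path)
    try:
        pred = conn.execute(predicted_sql).fetchall()
        gold = conn.execute(gold_sql).fetchall()
    except sqlite3.Error:
        return 0.0                      # invalid SQL earns no reward
    finally:
        conn.close()
    if "order by" in gold_sql.lower():  # order matters only if gold asks for it
        return 1.0 if pred == gold else 0.0
    return 1.0 if sorted(pred) == sorted(gold) else 0.0

# Demo on a throwaway database
db_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE singer (singer_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [(1, "Ann"), (2, "Bo")])
conn.commit()
conn.close()
print(evaluate_query("SELECT name FROM singer",
                     "SELECT name FROM singer ORDER BY singer_id", db_path))
```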




&lt;h2&gt;
  
  
  ⚙️ From Draft to Expert Prompt
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Baseline (v0)
&lt;/h3&gt;

&lt;p&gt;The initial prompt was something you’d write on your first try — short and vague:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Be careful not to query for columns that do not exist.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Accuracy: &lt;strong&gt;84% (42/50)&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  After Optimization (v5)
&lt;/h3&gt;

&lt;p&gt;After two rounds of APO, the prompt evolved into a structured specification over 350 words long, defining explicit rules for schema validation, safe joins, deterministic ordering, and fallback responses.&lt;/p&gt;

&lt;p&gt;Accuracy: &lt;strong&gt;88% (44/50)&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Round&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;v0&lt;/td&gt;
&lt;td&gt;84%&lt;/td&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;v3&lt;/td&gt;
&lt;td&gt;86%&lt;/td&gt;
&lt;td&gt;Added type casting logic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;v5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;88%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Added rule hierarchy and validation checks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  Example Improvements
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Use the tables listed below.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Use only the tables and columns in {table_info}.&lt;br&gt;
If a required column is missing, respond with an empty result or 'UNABLE TO ANSWER' rather than guessing.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The optimized prompt became longer, yes — but also &lt;em&gt;far more robust&lt;/em&gt;, preventing many subtle SQL errors.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔍 Why APO Feels Different
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;It Learns from Real Mistakes&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Critiques come directly from actual task failures, not from hand-written advice.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;It Explores Multiple Futures&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Beam search means the optimizer doesn’t get trapped in one idea of “better.” It keeps multiple hypotheses alive.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;It’s Transparent&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Every edit is interpretable. You can read the critic’s feedback and understand &lt;em&gt;why&lt;/em&gt; the prompt changed.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;It’s Objective&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Rewards are computed from measurable outcomes — in this case, SQL correctness — not subjective LLM scoring.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧭 What We Learned
&lt;/h2&gt;

&lt;p&gt;After two APO rounds, the system showed clear, measurable gains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📈 &lt;strong&gt;Accuracy:&lt;/strong&gt; 84% → 88%&lt;/li&gt;
&lt;li&gt;📜 &lt;strong&gt;Prompt length:&lt;/strong&gt; 90 → 360 words&lt;/li&gt;
&lt;li&gt;⚖️ &lt;strong&gt;Rules:&lt;/strong&gt; 3 vague hints → 19 explicit constraints&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Validation:&lt;/strong&gt; added schema checks and safe SQL handling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In essence, APO &lt;em&gt;taught the agent how to write its own better instructions.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🧰 Try It Yourself
&lt;/h2&gt;

&lt;p&gt;You can reproduce this entire setup:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub Repo (please give it a star ⭐):&lt;/strong&gt; &lt;a href="https://github.com/bigdata5911/agent-lightning-automatic-prompt-optimization" rel="noopener noreferrer"&gt;bigdata5911/agent-lightning-automatic-prompt-optimization&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Agent Lightning Docs:&lt;/strong&gt; &lt;a href="https://microsoft.github.io/agent-lightning/stable/" rel="noopener noreferrer"&gt;microsoft.github.io/agent-lightning&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Spider Dataset:&lt;/strong&gt; &lt;a href="https://yale-lily.github.io/spider" rel="noopener noreferrer"&gt;yale-lily.github.io/spider&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Requirements
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.8+&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;uv&lt;/code&gt; package manager&lt;/li&gt;
&lt;li&gt;OpenAI API key (GPT-5 access)&lt;/li&gt;
&lt;li&gt;Sufficient disk space for Spider dataset&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv &lt;span class="nb"&gt;sync&lt;/span&gt;
./setup_data.sh
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-api-key"&lt;/span&gt;
uv run python train.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🌟 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Agent Lightning’s &lt;strong&gt;Automatic Prompt Optimization (APO)&lt;/strong&gt; is more than an automation trick — it’s a paradigm shift.&lt;/p&gt;

&lt;p&gt;Instead of endlessly hand-crafting prompts, you can let your agent &lt;strong&gt;learn from its own mistakes&lt;/strong&gt;, guided by measurable outcomes and transparent reasoning.&lt;/p&gt;

&lt;p&gt;In my experiments, APO transformed a generic baseline into a specialized, rule-driven prompt that performed better, explained itself better, and could continue improving indefinitely.&lt;/p&gt;

&lt;p&gt;Prompt engineering just got an upgrade — now, the prompts engineer &lt;em&gt;themselves&lt;/em&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Follow me for more explorations into autonomous agents, self-optimizing prompts, and data-driven LLM workflows.&lt;/em&gt; ⚡&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>promptengineering</category>
      <category>apo</category>
    </item>
    <item>
      <title>How to Build Your Own AI-Powered Voice Agent with LiveKit and Twilio: Step-by-Step Implementation Guide</title>
      <dc:creator>Joshua</dc:creator>
      <pubDate>Thu, 24 Apr 2025 08:56:08 +0000</pubDate>
      <link>https://dev.to/bigdata5911/how-to-build-your-own-ai-powered-voice-agent-with-livekit-and-twillio-step-by-step-implementation-2i8k</link>
      <guid>https://dev.to/bigdata5911/how-to-build-your-own-ai-powered-voice-agent-with-livekit-and-twillio-step-by-step-implementation-2i8k</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F95ahwgqu05cdceoxo7x2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F95ahwgqu05cdceoxo7x2.png" alt=" " width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Create a Twilio Account
&lt;/h2&gt;

&lt;p&gt;Start by signing up for a Twilio account if you haven’t already. Simply visit &lt;a href="https://www.twilio.com/" rel="noopener noreferrer"&gt;Twilio’s website&lt;/a&gt; and follow the registration process to set up your account.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Create a Phone Number
&lt;/h2&gt;

&lt;p&gt;Once your account is ready, navigate to the Twilio Console and create a phone number. You don’t need to configure any additional settings at this stage—just select a number and you’re good to go. This number will be used to handle incoming and outgoing calls in the later steps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Get Your API Credentials from Twilio
&lt;/h2&gt;

&lt;p&gt;Next, you’ll need your Twilio API credentials to integrate with LiveKit. These include your Account SID and Auth Token. Follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to the Twilio Console.&lt;/li&gt;
&lt;li&gt;Navigate to the Account Info section.&lt;/li&gt;
&lt;li&gt;Copy your Account SID, Auth Token, and Twilio phone number; you’ll use these in the next steps.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiiilcchdmstqy4pg187n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiiilcchdmstqy4pg187n.png" alt=" " width="800" height="568"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Create a LiveKit Account and Project
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Create a LiveKit Account&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Sign up for a LiveKit account if you don’t have one already by visiting LiveKit’s website.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Create a Project&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After signing up, log in and create a new project within LiveKit. This project will be used to handle real-time audio and video interactions.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Get the Project URL and SIP URI Parameters&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Navigate to the Settings section of your newly created project and locate the Project URL and SIP URI parameters. These will be crucial in the later steps when configuring the integration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxug7nh3bgrydbao1zee8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxug7nh3bgrydbao1zee8.png" alt=" " width="800" height="253"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Simplify Complex Settings with a Script
&lt;/h2&gt;

&lt;p&gt;To streamline the configuration of Twilio and LiveKit, use the pre-built Twilio &amp;amp; LiveKit integration script included in the repository (scripts/create_inbound_trunk.py).&lt;/p&gt;

&lt;p&gt;Here’s what you need to do:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Download or clone the script from the link above.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Replace the placeholders in the script with the necessary details:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Account SID&lt;/li&gt;
&lt;li&gt;Auth Token&lt;/li&gt;
&lt;li&gt;Phone Number&lt;/li&gt;
&lt;li&gt;SIP URI (found in previous steps)&lt;/li&gt;
&lt;/ul&gt;
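&lt;p&gt;Rather than hard-coding those placeholders in the script, one simple pattern is to read them from environment variables and fail fast if any is missing. This is an illustrative sketch; the variable names below are assumptions, not the script’s actual ones:&lt;/p&gt;

```python
import os

# Settings the Step 5 script needs (names are illustrative assumptions)
REQUIRED = [
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_PHONE_NUMBER",
    "LIVEKIT_SIP_URI",
]

def load_config(env=None) -> dict:
    """Collect the required settings, raising early if anything is unset."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED}
```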

&lt;ol start="3"&gt;
&lt;li&gt;To ensure your environment is ready for Twilio, LiveKit, and OpenAI integration, install the necessary Python packages. Run the following command in your terminal:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;Install the LiveKit CLI&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Install the LiveKit CLI for your platform:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;livekit-cli &lt;span class="c"&gt;# MacOS&lt;/span&gt;
winget &lt;span class="nb"&gt;install &lt;/span&gt;LiveKit.LiveKitCLI &lt;span class="c"&gt;# Windows&lt;/span&gt;
curl &lt;span class="nt"&gt;-sSL&lt;/span&gt; https://get.livekit.io/cli | bash &lt;span class="c"&gt;# Linux&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="5"&gt;
&lt;li&gt;Authenticate with LiveKit&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After installation, authenticate to your LiveKit account by running the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;lk cloud auth
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="6"&gt;
&lt;li&gt;Run the script. It will automatically create a SIP Trunk in Twilio and apply all the required configuration, minimizing manual setup.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This script will handle most of the heavy lifting, simplifying the integration between Twilio and LiveKit for real-time communication.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F19y49c6t660avn8yjj0p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F19y49c6t660avn8yjj0p.png" alt=" " width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6: Log in to Twilio and Update Voice Configuration on SIP Trunk
&lt;/h2&gt;

&lt;p&gt;After the script has automatically created the SIP Trunk on Twilio, you’ll need to manually update the Voice Configuration to ensure everything works correctly.&lt;/p&gt;

&lt;h1&gt;
  
  
  Run Voice Agent
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI Realtime Voice AI Agent
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python scripts/openai_realtime_voice_ai_agent.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Run the Voice Pipeline AI Agent with function calling and chat-message saving
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python scripts/save_chatctx.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
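&lt;p&gt;The chat-message-saving idea can be sketched with the standard library alone: append each message to a JSON-lines transcript and read it back on demand. This is an illustrative sketch, not the actual save_chatctx.py:&lt;/p&gt;

```python
import json
from pathlib import Path

def save_message(log_path: Path, role: str, text: str) -> None:
    """Append one chat message as a JSON line (illustrative sketch)."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"role": role, "text": text}) + "\n")

def load_transcript(log_path: Path) -> list:
    """Read the transcript back as a list of message dicts."""
    if not log_path.exists():
        return []
    return [
        json.loads(line)
        for line in log_path.read_text(encoding="utf-8").splitlines()
        if line.strip()
    ]
```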



&lt;h1&gt;
  
  
  Useful livekit-cli commands
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lk sip inbound list
lk sip inbound create inbound_trunk.json
lk sip inbound delete SIP_ID
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
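&lt;p&gt;For reference, a minimal inbound_trunk.json for the lk sip inbound create command above might look like the following. The trunk name and number are placeholders; check the LiveKit SIP documentation for the full schema:&lt;/p&gt;

```json
{
  "trunk": {
    "name": "Inbound trunk",
    "numbers": ["+15105550100"]
  }
}
```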



&lt;h1&gt;
  
  
  References
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://docs.livekit.io/agents/overview/" rel="noopener noreferrer"&gt;https://docs.livekit.io/agents/overview/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://docs.livekit.io/agents/quickstarts/voice-agent/" rel="noopener noreferrer"&gt;https://docs.livekit.io/agents/quickstarts/voice-agent/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://agents-playground.livekit.io/" rel="noopener noreferrer"&gt;https://agents-playground.livekit.io/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://kitt.livekit.io/" rel="noopener noreferrer"&gt;https://kitt.livekit.io/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://cartesia-assistant.vercel.app/" rel="noopener noreferrer"&gt;https://cartesia-assistant.vercel.app/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/livekit/agents/tree/main/examples/voice-pipeline-agent/llamaindex-rag" rel="noopener noreferrer"&gt;https://github.com/livekit/agents/tree/main/examples/voice-pipeline-agent/llamaindex-rag&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.datavise.ai/blog/usage-of-realtime-openai-api-with-twillio-and-livekit" rel="noopener noreferrer"&gt;https://www.datavise.ai/blog/usage-of-realtime-openai-api-with-twillio-and-livekit&lt;/a&gt;&lt;br&gt;
&lt;a href="https://gist.github.com/ShayneP/51eabe243f9e7126929ea7e9db1dc683" rel="noopener noreferrer"&gt;https://gist.github.com/ShayneP/51eabe243f9e7126929ea7e9db1dc683&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Author
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://github.com/bigdata5911" rel="noopener noreferrer"&gt;Github&lt;/a&gt;&lt;br&gt;
&lt;a href="https://t.me/bigdata5911" rel="noopener noreferrer"&gt;Telegram&lt;/a&gt;&lt;br&gt;
&lt;a href="https://discord.gg/pSEtb9sJf6" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;&lt;br&gt;
&lt;a href="mailto:worker.opentext@gmail.com"&gt;Email&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Building a Scalable SQL AI Agent</title>
      <dc:creator>Joshua</dc:creator>
      <pubDate>Wed, 23 Apr 2025 18:55:28 +0000</pubDate>
      <link>https://dev.to/bigdata5911/building-a-scalable-sql-ai-agent-4akn</link>
      <guid>https://dev.to/bigdata5911/building-a-scalable-sql-ai-agent-4akn</guid>
      <description>&lt;p&gt;This time, I am going to share my small experience in developing SQL AI Agent.&lt;/p&gt;

&lt;p&gt;In today’s data-driven world, accessing databases and retrieving information efficiently is crucial.&lt;/p&gt;

&lt;p&gt;However, not everyone is proficient in SQL, and businesses often struggle to extract insights from their data without technical help.&lt;/p&gt;

&lt;p&gt;That’s where SQL agents come in: they make that barrier disappear, giving non-technical users easy, secure, and fast access to complex analytics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&amp;gt; What is a SQL Agent?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SQL agents bridge the gap between natural language and structured database queries. They allow users, technical or not, to ask questions in plain English and receive answers derived from complex relational data.&lt;/p&gt;

&lt;p&gt;This post shares my experience building a SQL LLM Agent: a system that takes natural language queries, converts them into SQL, runs them on a large PostgreSQL database, and returns human-readable responses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&amp;gt; Project Requirements&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To meet modern enterprise demands, the system was designed with the following capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Large Database Support: Designed to handle PostgreSQL databases with 100+ relational tables and hundreds of gigabytes of data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;High Concurrency: Supports 50–100 concurrent users without slowing down.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Performance-Centric: Low latency and fast response times are key, even under heavy load.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Token Efficiency: Optimized to minimize token usage with LLMs — reducing cost and improving speed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Caching: Implements Redis and in-memory caching to store frequently asked queries and results.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Asynchronous Processing: Handles simultaneous user queries using async I/O and task queues.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Natural Language Interface: Users can interact in plain English — no SQL knowledge required.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Smart Query Handling: translates natural language into optimized SQL, executes the queries on PostgreSQL, and summarizes the results in clear, readable natural language.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&amp;gt; Tech Stack&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Backend Framework: FastAPI&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Database: PostgreSQL (100+ tables, hundreds of GBs)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Caching: Redis&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Task Queue: Celery&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Containerization: Docker + Docker Compose&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Agent Framework: LangChain + LangGraph&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;LLM: OpenAI (GPT-4 / GPT-4o)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
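&lt;p&gt;The TTL-based in-memory caching mentioned above can be illustrated with a minimal standard-library sketch (Redis handles the shared tier; this is not the project’s actual cache code):&lt;/p&gt;

```python
import time

class TTLCache:
    """Tiny TTL cache for query results (illustrative sketch)."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Expired: evict lazily on read
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)
```

&lt;p&gt;A repeated question then hits the cache instead of triggering another LLM call and database round trip.&lt;/p&gt;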

&lt;p&gt;&lt;strong&gt;&amp;gt; System Architecture Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The SQL LLM Agent is an intelligent, scalable pipeline that transforms natural language queries into executable SQL and returns the results in a conversational format.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&amp;gt; Core Components&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;FastAPI Application: hosts the /api/query endpoint; Dockerized with auto-reload (port 8000)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SQL Agent (Main Engine): defined in app/agents/sql_agent.py; powered by LangGraph’s StateGraph to control multi-step processing; connects language models with real-time database operations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Database Layer: uses PostgreSQL with SQLAlchemy (AsyncSession) for non-blocking queries&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Caching System: Redis stores previously run queries and their results, reducing redundant LLM and DB calls; a TTL-based memory cache handles hot data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;LLM Integration: OpenAI’s GPT-4 / GPT-4o translates NL to SQL, summarizes SQL output in plain English, auto-corrects faulty SQL, and generates follow-up questions&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
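&lt;p&gt;The multi-step processing that the StateGraph controls can be sketched framework-free. The step names below mirror this post’s workflow; the bodies are stubs for illustration, not the project’s real logic:&lt;/p&gt;

```python
# Framework-free sketch of the agent's multi-step pipeline. The real project
# drives these steps with LangGraph's StateGraph; bodies here are stubs.
def choose_tables(state: dict) -> dict:
    state["tables"] = ["users"]  # stub: an LLM call would pick relevant tables
    return state

def get_ddls(state: dict) -> dict:
    # stub: would introspect the live schema for each chosen table
    state["ddls"] = {t: f"CREATE TABLE {t} (...)" for t in state["tables"]}
    return state

def generate_sql(state: dict) -> dict:
    state["sql"] = f"SELECT count(*) FROM {state['tables'][0]}"  # stub NL-to-SQL
    return state

def suggest_followups(state: dict) -> dict:
    state["followups"] = ["Break this down by month?"]  # stub
    return state

PIPELINE = [choose_tables, get_ddls, generate_sql, suggest_followups]

def run(query: str) -> dict:
    state = {"query": query}
    for step in PIPELINE:
        state = step(state)
    return state
```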

&lt;p&gt;&lt;strong&gt;&amp;gt; LangGraph Workflow&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent follows a directed graph workflow using LangGraph, broken into these modular steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;choose_tables - Identifies the tables relevant to the user query&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;get_ddls - Retrieves the DDL (schema) for those tables&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;generate_sql - Converts the natural-language query to SQL, executes it, and handles errors or retries&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;suggest_followups - Offers relevant follow-up questions&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&amp;gt; Deployment Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Deployed using Docker Compose with three primary services:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&amp;gt; Service Roles&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;app - FastAPI backend (exposes port 8000)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;db - PostgreSQL container with persisted volume pgdata&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;redis - In-memory cache for faster data access&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
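&lt;p&gt;An illustrative docker-compose.yml matching those three services might look like this (image tags and build settings are assumptions, not the project’s actual file):&lt;/p&gt;

```yaml
# Sketch of the three-service deployment described above
services:
  app:
    build: .
    ports: ["8000:8000"]
    depends_on: [db, redis]
  db:
    image: postgres:16
    volumes: ["pgdata:/var/lib/postgresql/data"]
  redis:
    image: redis:7
volumes:
  pgdata:
```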

&lt;p&gt;&lt;strong&gt;&amp;gt; Key Features&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;High Performance: Handles large-scale databases under heavy load.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Smart Caching: Avoids repeated work using Redis and memory-based caching.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Error Resilience: Automatically corrects broken or malformed SQL queries.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Schema Introspection: Dynamically understands and adapts to the DB structure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Conversational Interaction: Natural language input and output, no SQL required.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modular Workflow: Built on LangGraph for flexible, stateful processing.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&amp;gt; Final Thoughts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This project was a powerful learning experience that combined LLMs, database engineering, performance tuning, and API design. It’s a strong step toward democratizing data access — making it simple, fast, and intuitive for everyone.&lt;/p&gt;

&lt;p&gt;If you’ve ever struggled with getting insights from a complex database or want to make your data more accessible to business teams — this is the direction to explore.&lt;/p&gt;

&lt;p&gt;Let me know what you think or if you’re building something similar — I’d love to connect!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/CodeMaster1022/sql-agent" rel="noopener noreferrer"&gt;https://github.com/CodeMaster1022/sql-agent&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
