<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Joshua</title>
    <description>The latest articles on DEV Community by Joshua (@bigdata5911).</description>
    <link>https://dev.to/bigdata5911</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2940671%2Fe3a8c556-2f59-4c1f-963e-0eb5a59d7488.png</url>
      <title>DEV Community: Joshua</title>
      <link>https://dev.to/bigdata5911</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/bigdata5911"/>
    <language>en</language>
    <item>
      <title>Building Smarter AI Agents with Schema-Guided Reasoning</title>
      <dc:creator>Joshua</dc:creator>
      <pubDate>Fri, 07 Nov 2025 07:42:37 +0000</pubDate>
      <link>https://dev.to/bigdata5911/building-smarter-ai-agents-with-schema-guided-reasoning-m3n</link>
      <guid>https://dev.to/bigdata5911/building-smarter-ai-agents-with-schema-guided-reasoning-m3n</guid>
<description>&lt;p&gt;Please give the repo a star ⭐ if it helps your work.&lt;br&gt;
&lt;strong&gt;GitHub Repo&lt;/strong&gt;: &lt;a href="https://github.com/bigdata5911/schema-guided-reasoning" rel="noopener noreferrer"&gt;bigdata5911/schema-guided-reasoning&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I came across (and tried out) a really interesting project called &lt;strong&gt;Schema-Guided Reasoning (SGR)&lt;/strong&gt; — a small but powerful demo showing how to make AI agents that can &lt;em&gt;reason&lt;/em&gt;, &lt;em&gt;plan&lt;/em&gt;, and &lt;em&gt;take action&lt;/em&gt; using structured logic.&lt;/p&gt;

&lt;p&gt;Instead of just chatting, this agent can actually &lt;em&gt;do things&lt;/em&gt; — issue invoices, send emails, or apply business rules — all based on clear, validated schemas. It’s a great example of how you can combine reasoning with structured outputs to make AI more reliable and explainable.&lt;/p&gt;


&lt;h2&gt;
  
  
  So what exactly is Schema-Guided Reasoning?
&lt;/h2&gt;

&lt;p&gt;The idea behind &lt;strong&gt;SGR&lt;/strong&gt; is simple but clever: instead of letting an AI respond freely in text, you &lt;em&gt;guide&lt;/em&gt; its reasoning through a &lt;strong&gt;schema&lt;/strong&gt; — basically, a blueprint that defines what kind of outputs it can produce.&lt;/p&gt;

&lt;p&gt;By doing that, the AI can plan its steps, pick tools to call, and execute them safely without breaking anything.&lt;/p&gt;

&lt;p&gt;In this demo, the schema-driven agent works inside a &lt;strong&gt;mini in-memory CRM system&lt;/strong&gt;. It can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Look up customers and products&lt;/li&gt;
&lt;li&gt;Issue or void invoices&lt;/li&gt;
&lt;li&gt;Send emails&lt;/li&gt;
&lt;li&gt;Apply business rules&lt;/li&gt;
&lt;/ul&gt;
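&lt;p&gt;To make that concrete, here’s a minimal sketch of what such a schema might look like with &lt;strong&gt;Pydantic&lt;/strong&gt; (the model names and fields here are illustrative, not the repo’s exact ones):&lt;/p&gt;

```python
from typing import Literal, Union
from pydantic import BaseModel, Field

# Illustrative tool schemas -- names and fields are assumptions, not the repo's.
class SendEmail(BaseModel):
    action: Literal["send_email"] = "send_email"
    to: str
    subject: str
    body: str

class IssueInvoice(BaseModel):
    action: Literal["issue_invoice"] = "issue_invoice"
    customer_id: str
    amount: float = Field(gt=0)

class NextStep(BaseModel):
    reasoning: str                        # the model must explain its plan
    tool: Union[SendEmail, IssueInvoice]  # ...and pick a schema-valid action

# Whatever JSON the model emits has to validate before anything executes.
step = NextStep.model_validate({
    "reasoning": "Customer C1 placed an order, so issue an invoice.",
    "tool": {"action": "issue_invoice", "customer_id": "C1", "amount": 99.0},
})
print(type(step.tool).__name__)  # IssueInvoice
```

&lt;p&gt;Because the &lt;code&gt;tool&lt;/code&gt; field only accepts one of the declared schemas, a malformed or invented action fails validation instead of executing.&lt;/p&gt;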

&lt;p&gt;It’s like a lightweight business assistant that &lt;em&gt;understands&lt;/em&gt; structure and can think through actions before executing them.&lt;/p&gt;


&lt;h2&gt;
  
  
  Two ways to run it
&lt;/h2&gt;

&lt;p&gt;The repo gives you two different setups — one that uses OpenAI’s API and another that runs completely locally with &lt;strong&gt;Qwen3-4B&lt;/strong&gt; via &lt;code&gt;llama.cpp&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;1. OpenAI API (schema-guided-reasoning.py)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This version uses the OpenAI model &lt;code&gt;gpt-4o&lt;/code&gt; and runs everything through the cloud.&lt;/p&gt;

&lt;p&gt;Setup is super simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pydantic annotated-types rich openai requests
&lt;span class="nv"&gt;$env&lt;/span&gt;:OPENAI_API_KEY &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"YOUR_API_KEY"&lt;/span&gt;
python schema-guided-reasoning.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once it runs, you’ll see the agent print out each task, plan the next step, call tools, and validate everything using &lt;strong&gt;Pydantic&lt;/strong&gt; schemas. The output looks clean in the console thanks to the &lt;code&gt;rich&lt;/code&gt; package.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;2. Local llama.cpp version (sgr_assistant.py)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If you prefer to stay offline or just like running models locally (like me), there’s a &lt;strong&gt;Qwen3-4B&lt;/strong&gt; version that connects to a &lt;code&gt;llama.cpp&lt;/code&gt; HTTP server.&lt;/p&gt;

&lt;p&gt;You can spin it up with something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./llama-server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; /path/to/Qwen3-4B-Instruct-2507-Q8_0.gguf &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-ngl&lt;/span&gt; 999 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 12345 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--threads&lt;/span&gt; &lt;span class="nt"&gt;-1&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--ctx-size&lt;/span&gt; 20000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then just run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python sgr_assistant.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This one includes a little bit of cleanup logic to strip out &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; tags and formatting issues that local models sometimes produce — nice touch.&lt;/p&gt;
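&lt;p&gt;That cleanup can be as simple as a couple of regexes, something along these lines (a sketch, not the repo’s actual code):&lt;/p&gt;

```python
import re

def clean_local_output(raw: str) -> str:
    """Strip <think>...</think> blocks and stray code fences that local
    models sometimes emit (illustrative sketch, not the repo's exact code)."""
    text = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    text = re.sub(r"^```(?:json)?|```$", "", text.strip(), flags=re.MULTILINE)
    return text.strip()

raw = '<think>plan the tool call</think>\n```json\n{"action": "send_email"}\n```'
print(clean_local_output(raw))  # {"action": "send_email"}
```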




&lt;h2&gt;
  
  
  What’s happening under the hood
&lt;/h2&gt;

&lt;p&gt;Both versions share a similar core:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An &lt;strong&gt;in-memory database&lt;/strong&gt; with mock data (customers, products, invoices, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema definitions&lt;/strong&gt; for tools like &lt;code&gt;SendEmail&lt;/code&gt;, &lt;code&gt;IssueInvoice&lt;/code&gt;, or &lt;code&gt;GetCustomerData&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;dispatcher&lt;/strong&gt; that simulates what happens when those tools are called&lt;/li&gt;
&lt;li&gt;And a &lt;strong&gt;task list&lt;/strong&gt; that the model executes step-by-step&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s all in pure Python — easy to read, easy to extend. You could add your own tool or new logic in just a few lines.&lt;/p&gt;
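&lt;p&gt;As a rough sketch of how those pieces fit together (the names below are my own, not the repo’s):&lt;/p&gt;

```python
# A rough sketch of the shared core (names are illustrative, not the repo's).
DB = {
    "customers": {"C1": {"name": "Acme", "email": "billing@acme.test"}},
    "invoices": {},
}

def dispatch(tool_call: dict) -> dict:
    """Simulate what happens when a validated tool call is executed."""
    action = tool_call["action"]
    if action == "get_customer":
        return DB["customers"].get(tool_call["customer_id"], {})
    if action == "issue_invoice":
        invoice_id = f"INV-{len(DB['invoices']) + 1}"
        DB["invoices"][invoice_id] = {
            "customer_id": tool_call["customer_id"],
            "amount": tool_call["amount"],
        }
        return {"invoice_id": invoice_id}
    raise ValueError(f"Unknown tool: {action}")

# The task list is just data; the agent walks it step by step.
TASKS = ["Look up customer C1", "Issue a 42.00 invoice for C1"]
print(dispatch({"action": "issue_invoice", "customer_id": "C1", "amount": 42.0}))
# {'invoice_id': 'INV-1'}
```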




&lt;h2&gt;
  
  
  Why this approach is cool
&lt;/h2&gt;

&lt;p&gt;A lot of AI “agents” today are just prompt wrappers around chat models. They can do some planning, but often they’re unpredictable — one small formatting issue, and everything breaks.&lt;/p&gt;

&lt;p&gt;SGR fixes that by forcing the model to stay inside a &lt;strong&gt;strict JSON schema&lt;/strong&gt;. Every output has to validate before it runs. That means fewer hallucinations, clearer reasoning steps, and easier debugging.&lt;/p&gt;

&lt;p&gt;In other words, you’re not just getting an answer — you’re getting a process you can &lt;em&gt;trust and inspect&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Customize it your way
&lt;/h2&gt;

&lt;p&gt;The best part is how easy it is to tweak.&lt;/p&gt;

&lt;p&gt;You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Edit the &lt;code&gt;TASKS&lt;/code&gt; list to make it do new things&lt;/li&gt;
&lt;li&gt;Add more tools with &lt;code&gt;pydantic&lt;/code&gt; models&lt;/li&gt;
&lt;li&gt;Change the &lt;code&gt;system_prompt&lt;/code&gt; to give it different rules or products&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything’s local and lightweight, so you can experiment freely without breaking anything.&lt;/p&gt;




&lt;h2&gt;
  
  
  A few quick tips
&lt;/h2&gt;

&lt;p&gt;If you run into issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Make sure all dependencies are installed:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  pip &lt;span class="nb"&gt;install &lt;/span&gt;pydantic annotated-types rich openai requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Double-check your OpenAI API key (if using the API version).&lt;/li&gt;
&lt;li&gt;For local models, confirm the &lt;code&gt;llama.cpp&lt;/code&gt; server is running and reachable.&lt;/li&gt;
&lt;li&gt;If the model outputs invalid JSON, try lowering temperature or adjusting cleanup logic.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why it matters
&lt;/h2&gt;

&lt;p&gt;Projects like this might seem small, but they hint at something big — &lt;strong&gt;how structured reasoning could make AI agents more dependable&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of “guessing” what the next step is, the model is guided by schemas, validated by code, and executed deterministically. It’s the difference between a chat assistant and a reasoning engine.&lt;/p&gt;




&lt;p&gt;If you want to check it out yourself, the repo’s here:&lt;br&gt;
👉 &lt;a href="https://github.com/bigdata5911/schema-guided-reasoning" rel="noopener noreferrer"&gt;bigdata5911/schema-guided-reasoning&lt;/a&gt;&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>agents</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>⚡ Rethinking Prompt Engineering: How Agent Lightning’s APO Teaches Agents to Write Better Prompts</title>
      <dc:creator>Joshua</dc:creator>
      <pubDate>Thu, 06 Nov 2025 11:46:36 +0000</pubDate>
      <link>https://dev.to/bigdata5911/rethinking-prompt-engineering-how-agent-lightnings-apo-teaches-agents-to-write-better-prompts-hon</link>
      <guid>https://dev.to/bigdata5911/rethinking-prompt-engineering-how-agent-lightnings-apo-teaches-agents-to-write-better-prompts-hon</guid>
<description>&lt;p&gt;⭐ If this helps your work, please give it a star ⭐&lt;br&gt;
&lt;strong&gt;GitHub Repo:&lt;/strong&gt; &lt;a href="https://github.com/bigdata5911/agent-lightning-automatic-prompt-optimization" rel="noopener noreferrer"&gt;bigdata5911/agent-lightning-automatic-prompt-optimization&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For years, we’ve obsessed over improving model weights and architectures.&lt;/strong&gt;&lt;br&gt;
But what if the real breakthrough in AI performance comes not from &lt;strong&gt;training the model&lt;/strong&gt;, but from &lt;strong&gt;training the prompt&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;That’s the premise behind &lt;strong&gt;Agent Lightning&lt;/strong&gt;, a new framework from Microsoft that allows AI agents to improve themselves.&lt;br&gt;
It introduces two key algorithms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;VERL&lt;/strong&gt; — for reinforcement learning at the policy level&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;APO (Automatic Prompt Optimization)&lt;/strong&gt; — for learning &lt;em&gt;textual gradients&lt;/em&gt; that refine prompts based on performance feedback&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article, I’ll show how APO works, why it’s a game-changer, and how I used it to enhance a &lt;strong&gt;Text-to-SQL agent&lt;/strong&gt; built with LangGraph — improving accuracy from 84% to 88% in just two rounds of optimization.&lt;/p&gt;


&lt;h2&gt;
  
  
  🌩️ The Idea: Prompts That Learn
&lt;/h2&gt;

&lt;p&gt;Prompt engineering has always been a manual, intuition-driven process. You tweak a few words, rerun your agent, and hope it performs better.&lt;br&gt;
APO replaces that guesswork with &lt;strong&gt;data-grounded self-improvement&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It doesn’t retrain the underlying model — instead, it trains the &lt;em&gt;text&lt;/em&gt; of the prompt itself.&lt;br&gt;
Think of it as “gradient descent in natural language.”&lt;/p&gt;

&lt;p&gt;At the heart of APO are two cooperating LLMs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;Critic&lt;/strong&gt; that examines what went wrong in failed tasks&lt;/li&gt;
&lt;li&gt;An &lt;strong&gt;Editor&lt;/strong&gt; that rewrites the prompt to address those weaknesses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each iteration produces multiple improved prompts, scores them on validation data, and preserves the best through &lt;strong&gt;beam search&lt;/strong&gt; — a form of controlled exploration.&lt;/p&gt;

&lt;p&gt;The result? A system that &lt;em&gt;writes its own better prompt&lt;/em&gt; with every round.&lt;/p&gt;


&lt;h2&gt;
  
  
  🧮 The Science of Textual Gradients
&lt;/h2&gt;

&lt;p&gt;APO builds on ideas from two research papers — &lt;strong&gt;ProTeGi (EMNLP 2023)&lt;/strong&gt; and &lt;strong&gt;TextGrad (Nature 2024)&lt;/strong&gt; — which formalize how text itself can encode gradient-like feedback.&lt;/p&gt;

&lt;p&gt;Here’s what happens inside one APO cycle:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Run current prompt&lt;/strong&gt; on a small batch of tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Score results&lt;/strong&gt; using an objective metric (for example, SQL correctness)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Critic model&lt;/strong&gt; reviews (input, output, reward) pairs and summarizes failures in natural language&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Editor model&lt;/strong&gt; applies that feedback to produce refined prompt candidates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Beam search&lt;/strong&gt; evaluates several rewritten prompts and keeps the top performers&lt;/li&gt;
&lt;/ol&gt;
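&lt;p&gt;The cycle above can be sketched as a toy beam search in plain Python. The &lt;code&gt;critic&lt;/code&gt; and &lt;code&gt;editor&lt;/code&gt; stubs stand in for LLM calls, and the scorer is a placeholder; none of this is Agent Lightning’s real API:&lt;/p&gt;

```python
def score(prompt: str, tasks: list) -> float:
    # Stand-in scorer: in the real system this would run the agent on each task.
    return sum(1.0 for t in tasks if len(prompt) > t) / len(tasks)

def critic(prompt: str, failures: list) -> str:
    # Stand-in for the Critic LLM: returns a "textual gradient".
    return "Add a rule about type casting in JOIN columns."

def editor(prompt: str, critique: str) -> list:
    # Stand-in for the Editor LLM: proposes rewritten prompt candidates.
    return [prompt + " " + critique, prompt + " Validate columns before use."]

def apo_round(beam: list, tasks: list, beam_width: int = 2) -> list:
    candidates = list(beam)
    for p in beam:
        candidates += editor(p, critic(p, failures=[]))
    # Beam search: score every candidate, keep only the top performers.
    candidates.sort(key=lambda p: score(p, tasks), reverse=True)
    return candidates[:beam_width]

beam = ["Write SQLite for the question."]
for _ in range(2):                 # two optimization rounds, as in the article
    beam = apo_round(beam, tasks=[10, 40, 80])
print(score(beam[0], [10, 40, 80]))  # 1.0
```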

&lt;p&gt;Example critique:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The prompt doesn’t specify how to handle type mismatches in JOIN columns.&lt;br&gt;
When Singer_ID is INTEGER in one table but TEXT in another, use CAST(col_text AS INTEGER) and filter invalid values.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This text acts as a &lt;em&gt;direction of improvement&lt;/em&gt; — like a gradient — but expressed entirely in language.&lt;/p&gt;


&lt;h2&gt;
  
  
  🧩 A Practical Experiment: Teaching a SQL Agent to Self-Optimize
&lt;/h2&gt;

&lt;p&gt;To test APO, I applied it to a &lt;strong&gt;Text-to-SQL agent&lt;/strong&gt; that converts natural language questions into SQL queries.&lt;br&gt;
I used the &lt;strong&gt;Spider dataset&lt;/strong&gt; — a well-known text-to-SQL benchmark — and ran 50 examples for training, 50 for validation.&lt;/p&gt;


&lt;h3&gt;
  
  
  🏗️ The Setup
&lt;/h3&gt;

&lt;p&gt;The agent was built in &lt;strong&gt;LangGraph&lt;/strong&gt;, following a self-correcting workflow.&lt;br&gt;
Agent Lightning handled the optimization loop; I only needed to define the &lt;code&gt;@rollout&lt;/code&gt; function that executes the task and returns a reward.&lt;/p&gt;

&lt;p&gt;Here’s a minimal setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agentlightning&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Trainer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agentlightning.algorithm.apo&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;APO&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AsyncOpenAI&lt;/span&gt;

&lt;span class="n"&gt;openai_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncOpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;algo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;APO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;openai_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;val_batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;gradient_batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;beam_width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;branch_factor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;beam_rounds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;trainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Trainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;algorithm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;algo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;n_runners&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;initial_resources&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt_template&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;prompt_template_baseline&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;train_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_spider_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data/dev.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[:&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;val_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_spider_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data/dev.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sql_agent_rollout&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;train_dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;train_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;val_dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;val_data&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  🧠 The Rollout Function
&lt;/h3&gt;

&lt;p&gt;This is where APO gets its feedback signal — the reward for how well a generated SQL query matches the ground truth.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agentlightning&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;rollout&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agentlightning.types&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PromptTemplate&lt;/span&gt;

&lt;span class="nd"&gt;@rollout&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sql_agent_rollout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt_template&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PromptTemplate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SQLAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;db_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;databases/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;db_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;db_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.sqlite&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;write_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt_template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;dialect&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SQLite&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;table_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;get_schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;db_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;evaluate_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;db_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each rollout returns a numeric reward (1 for correct, 0 for incorrect), giving APO objective feedback for learning.&lt;/p&gt;
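&lt;p&gt;A reward like that is typically computed by executing both queries and comparing result sets. Here’s a hypothetical version of &lt;code&gt;evaluate_query&lt;/code&gt; (the repo’s implementation may differ):&lt;/p&gt;

```python
import os
import sqlite3
import tempfile

def evaluate_query(predicted_sql: str, gold_sql: str, db_path: str) -> float:
    """Reward 1.0 if the predicted query returns the same rows as the gold
    query, else 0.0 (illustrative sketch; the repo's version may differ)."""
    conn = sqlite3.connect(db_path)
    try:
        pred = conn.execute(predicted_sql).fetchall()
        gold = conn.execute(gold_sql).fetchall()
    except sqlite3.Error:
        return 0.0                      # invalid SQL earns no reward
    finally:
        conn.close()
    if "order by" in gold_sql.lower():  # order matters only if gold asks for it
        return 1.0 if pred == gold else 0.0
    return 1.0 if sorted(pred) == sorted(gold) else 0.0

# Demo on a throwaway database
db_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE singer (singer_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [(1, "Ann"), (2, "Bo")])
conn.commit()
conn.close()
print(evaluate_query("SELECT name FROM singer",
                     "SELECT name FROM singer ORDER BY singer_id", db_path))
```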




&lt;h2&gt;
  
  
  ⚙️ From Draft to Expert Prompt
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Baseline (v0)
&lt;/h3&gt;

&lt;p&gt;The initial prompt was something you’d write on your first try — short and vague:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Be careful not to query for columns that do not exist.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Accuracy: &lt;strong&gt;84% (42/50)&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  After Optimization (v5)
&lt;/h3&gt;

&lt;p&gt;After two rounds of APO, the prompt evolved into a structured specification over 350 words long, defining explicit rules for schema validation, safe joins, deterministic ordering, and fallback responses.&lt;/p&gt;

&lt;p&gt;Accuracy: &lt;strong&gt;88% (44/50)&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Round&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;v0&lt;/td&gt;
&lt;td&gt;84%&lt;/td&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;v3&lt;/td&gt;
&lt;td&gt;86%&lt;/td&gt;
&lt;td&gt;Added type casting logic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;v5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;88%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Added rule hierarchy and validation checks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  Example Improvements
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Use the tables listed below.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Use only the tables and columns in {table_info}.&lt;br&gt;
If a required column is missing, respond with an empty result or 'UNABLE TO ANSWER' rather than guessing.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The optimized prompt became longer, yes — but also &lt;em&gt;far more robust&lt;/em&gt;, preventing many subtle SQL errors.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔍 Why APO Feels Different
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;It Learns from Real Mistakes&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Critiques come directly from actual task failures, not from hand-written advice.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;It Explores Multiple Futures&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Beam search means the optimizer doesn’t get trapped in one idea of “better.” It keeps multiple hypotheses alive.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;It’s Transparent&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Every edit is interpretable. You can read the critic’s feedback and understand &lt;em&gt;why&lt;/em&gt; the prompt changed.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;It’s Objective&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Rewards are computed from measurable outcomes — in this case, SQL correctness — not subjective LLM scoring.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧭 What We Learned
&lt;/h2&gt;

&lt;p&gt;After two APO rounds, the system showed clear, measurable gains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📈 &lt;strong&gt;Accuracy:&lt;/strong&gt; 84% → 88%&lt;/li&gt;
&lt;li&gt;📜 &lt;strong&gt;Prompt length:&lt;/strong&gt; 90 → 360 words&lt;/li&gt;
&lt;li&gt;⚖️ &lt;strong&gt;Rules:&lt;/strong&gt; 3 vague hints → 19 explicit constraints&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Validation:&lt;/strong&gt; added schema checks and safe SQL handling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In essence, APO &lt;em&gt;taught the agent how to write its own better instructions.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🧰 Try It Yourself
&lt;/h2&gt;

&lt;p&gt;You can reproduce this entire setup:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub Repo (please give it a star ⭐):&lt;/strong&gt; &lt;a href="https://github.com/bigdata5911/agent-lightning-automatic-prompt-optimization" rel="noopener noreferrer"&gt;bigdata5911/agent-lightning-automatic-prompt-optimization&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Agent Lightning Docs:&lt;/strong&gt; &lt;a href="https://microsoft.github.io/agent-lightning/stable/" rel="noopener noreferrer"&gt;microsoft.github.io/agent-lightning&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Spider Dataset:&lt;/strong&gt; &lt;a href="https://yale-lily.github.io/spider" rel="noopener noreferrer"&gt;yale-lily.github.io/spider&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Requirements
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.8+&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;uv&lt;/code&gt; package manager&lt;/li&gt;
&lt;li&gt;OpenAI API key (GPT-5 access)&lt;/li&gt;
&lt;li&gt;Sufficient disk space for Spider dataset&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv &lt;span class="nb"&gt;sync&lt;/span&gt;
./setup_data.sh
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-api-key"&lt;/span&gt;
uv run python train.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🌟 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Agent Lightning’s &lt;strong&gt;Automatic Prompt Optimization (APO)&lt;/strong&gt; is more than an automation trick — it’s a paradigm shift.&lt;/p&gt;

&lt;p&gt;Instead of endlessly hand-crafting prompts, you can let your agent &lt;strong&gt;learn from its own mistakes&lt;/strong&gt;, guided by measurable outcomes and transparent reasoning.&lt;/p&gt;

&lt;p&gt;In my experiments, APO transformed a generic baseline into a specialized, rule-driven prompt that performed better, explained itself better, and could continue improving indefinitely.&lt;/p&gt;

&lt;p&gt;Prompt engineering just got an upgrade — now, the prompts engineer &lt;em&gt;themselves&lt;/em&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Follow me for more explorations into autonomous agents, self-optimizing prompts, and data-driven LLM workflows.&lt;/em&gt; ⚡&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>promptengineering</category>
      <category>apo</category>
    </item>
    <item>
      <title>How to Build Your Own AI-Powered Voice Agent with LiveKit and Twilio: Step-by-Step Implementation Guide</title>
      <dc:creator>Joshua</dc:creator>
      <pubDate>Thu, 24 Apr 2025 08:56:08 +0000</pubDate>
      <link>https://dev.to/bigdata5911/how-to-build-your-own-ai-powered-voice-agent-with-livekit-and-twillio-step-by-step-implementation-2i8k</link>
      <guid>https://dev.to/bigdata5911/how-to-build-your-own-ai-powered-voice-agent-with-livekit-and-twillio-step-by-step-implementation-2i8k</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F95ahwgqu05cdceoxo7x2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F95ahwgqu05cdceoxo7x2.png" alt=" " width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Create a Twilio Account
&lt;/h2&gt;

&lt;p&gt;Start by signing up for a Twilio account if you haven’t already. Simply visit &lt;a href="https://www.twilio.com/" rel="noopener noreferrer"&gt;Twilio’s website&lt;/a&gt; and follow the registration process to set up your account.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Create a Phone Number
&lt;/h2&gt;

&lt;p&gt;Once your account is ready, navigate to the Twilio Console and create a phone number. You don’t need to configure any additional settings at this stage—just select a number and you’re good to go. This number will be used to handle incoming and outgoing calls in the later steps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Get Your API Credentials from Twilio
&lt;/h2&gt;

&lt;p&gt;Next, you’ll need your Twilio API credentials to integrate with LiveKit. These include your Account SID and Auth Token. Follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to the Twilio Console.&lt;/li&gt;
&lt;li&gt;Navigate to the Account Info section.&lt;/li&gt;
&lt;li&gt;Copy your Account SID, Auth Token, and Twilio phone number; you’ll use these in the next steps.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiiilcchdmstqy4pg187n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiiilcchdmstqy4pg187n.png" alt=" " width="800" height="568"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Create a LiveKit Account and Project
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Create a LiveKit Account&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Sign up for a LiveKit account if you don’t have one already by visiting LiveKit’s website.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Create a Project&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After signing up, log in and create a new project within LiveKit. This project will be used to handle real-time audio and video interactions.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Get the Project URL and SIP URI Parameters&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Navigate to the Settings section of your newly created project and locate the Project URL and SIP URI parameters. These will be crucial in the later steps when configuring the integration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxug7nh3bgrydbao1zee8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxug7nh3bgrydbao1zee8.png" alt=" " width="800" height="253"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Simplify Complex Settings with a Script
&lt;/h2&gt;

&lt;p&gt;To streamline the configuration of Twilio and LiveKit, use the pre-built Twilio &amp;amp; LiveKit integration script included in the repository (scripts/create_inbound_trunk.py).&lt;/p&gt;

&lt;p&gt;Here’s what you need to do:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Download or clone the script from the link above.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Replace the placeholders in the script with the necessary details:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Account SID&lt;/li&gt;
&lt;li&gt;Auth Token&lt;/li&gt;
&lt;li&gt;Phone Number&lt;/li&gt;
&lt;li&gt;SIP URI (found in previous steps)&lt;/li&gt;
&lt;/ul&gt;
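&lt;p&gt;Rather than hard-coding those placeholders in the script, one simple pattern is to read them from environment variables and fail fast if any is missing. This is an illustrative sketch; the variable names below are assumptions, not the script’s actual ones:&lt;/p&gt;

```python
import os

# Settings the Step 5 script needs (names are illustrative assumptions)
REQUIRED = [
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_PHONE_NUMBER",
    "LIVEKIT_SIP_URI",
]

def load_config(env=None) -> dict:
    """Collect the required settings, raising early if anything is unset."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED}
```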

&lt;ol start="3"&gt;
&lt;li&gt;To ensure your environment is ready for Twilio, LiveKit, and OpenAI integration, install the necessary Python packages. Run the following command in your terminal:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;Install the LiveKit CLI&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Install the LiveKit CLI for your platform:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;livekit-cli &lt;span class="c"&gt;# MacOS&lt;/span&gt;
winget &lt;span class="nb"&gt;install &lt;/span&gt;LiveKit.LiveKitCLI &lt;span class="c"&gt;# Windows&lt;/span&gt;
curl &lt;span class="nt"&gt;-sSL&lt;/span&gt; https://get.livekit.io/cli | bash &lt;span class="c"&gt;# Linux&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="5"&gt;
&lt;li&gt;Authenticate with LiveKit&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After installation, authenticate to your LiveKit account by running the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;lk cloud auth
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="6"&gt;
&lt;li&gt;Run the script. It will automatically create a SIP Trunk in Twilio and apply all the required configuration, minimizing manual setup.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This script will handle most of the heavy lifting, simplifying the integration between Twilio and LiveKit for real-time communication.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F19y49c6t660avn8yjj0p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F19y49c6t660avn8yjj0p.png" alt=" " width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6: Log in to Twilio and Update Voice Configuration on SIP Trunk
&lt;/h2&gt;

&lt;p&gt;After the script has automatically created the SIP Trunk on Twilio, you’ll need to manually update the Voice Configuration to ensure everything works correctly.&lt;/p&gt;

&lt;h1&gt;
  
  
  Run Voice Agent
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI Realtime Voice AI Agent
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python scripts/openai_realtime_voice_ai_agent.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Run the Voice Pipeline AI Agent with function calling and chat-message saving
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python scripts/save_chatctx.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
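&lt;p&gt;The chat-message-saving idea can be sketched with the standard library alone: append each message to a JSON-lines transcript and read it back on demand. This is an illustrative sketch, not the actual save_chatctx.py:&lt;/p&gt;

```python
import json
from pathlib import Path

def save_message(log_path: Path, role: str, text: str) -> None:
    """Append one chat message as a JSON line (illustrative sketch)."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"role": role, "text": text}) + "\n")

def load_transcript(log_path: Path) -> list:
    """Read the transcript back as a list of message dicts."""
    if not log_path.exists():
        return []
    return [
        json.loads(line)
        for line in log_path.read_text(encoding="utf-8").splitlines()
        if line.strip()
    ]
```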



&lt;h1&gt;
  
  
  Useful livekit-cli commands
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lk sip inbound list
lk sip inbound create inbound_trunk.json
lk sip inbound delete SIP_ID
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
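&lt;p&gt;For reference, a minimal inbound_trunk.json for the lk sip inbound create command above might look like the following. The trunk name and number are placeholders; check the LiveKit SIP documentation for the full schema:&lt;/p&gt;

```json
{
  "trunk": {
    "name": "Inbound trunk",
    "numbers": ["+15105550100"]
  }
}
```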



&lt;h1&gt;
  
  
  References
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://docs.livekit.io/agents/overview/" rel="noopener noreferrer"&gt;https://docs.livekit.io/agents/overview/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://docs.livekit.io/agents/quickstarts/voice-agent/" rel="noopener noreferrer"&gt;https://docs.livekit.io/agents/quickstarts/voice-agent/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://agents-playground.livekit.io/" rel="noopener noreferrer"&gt;https://agents-playground.livekit.io/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://kitt.livekit.io/" rel="noopener noreferrer"&gt;https://kitt.livekit.io/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://cartesia-assistant.vercel.app/" rel="noopener noreferrer"&gt;https://cartesia-assistant.vercel.app/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/livekit/agents/tree/main/examples/voice-pipeline-agent/llamaindex-rag" rel="noopener noreferrer"&gt;https://github.com/livekit/agents/tree/main/examples/voice-pipeline-agent/llamaindex-rag&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.datavise.ai/blog/usage-of-realtime-openai-api-with-twillio-and-livekit" rel="noopener noreferrer"&gt;https://www.datavise.ai/blog/usage-of-realtime-openai-api-with-twillio-and-livekit&lt;/a&gt;&lt;br&gt;
&lt;a href="https://gist.github.com/ShayneP/51eabe243f9e7126929ea7e9db1dc683" rel="noopener noreferrer"&gt;https://gist.github.com/ShayneP/51eabe243f9e7126929ea7e9db1dc683&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Author
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://github.com/bigdata5911" rel="noopener noreferrer"&gt;Github&lt;/a&gt;&lt;br&gt;
&lt;a href="https://t.me/bigdata5911" rel="noopener noreferrer"&gt;Telegram&lt;/a&gt;&lt;br&gt;
&lt;a href="https://discord.gg/pSEtb9sJf6" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;&lt;br&gt;
&lt;a href="mailto:worker.opentext@gmail.com"&gt;Email&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Building a Scalable SQL AI Agent</title>
      <dc:creator>Joshua</dc:creator>
      <pubDate>Wed, 23 Apr 2025 18:55:28 +0000</pubDate>
      <link>https://dev.to/bigdata5911/building-a-scalable-sql-ai-agent-4akn</link>
      <guid>https://dev.to/bigdata5911/building-a-scalable-sql-ai-agent-4akn</guid>
      <description>&lt;p&gt;This time, I am going to share my small experience in developing SQL AI Agent.&lt;/p&gt;

&lt;p&gt;In today’s data-driven world, accessing databases and retrieving information efficiently is crucial.&lt;/p&gt;

&lt;p&gt;However, not everyone is proficient in SQL, and businesses often struggle to extract insights from their data without technical help.&lt;/p&gt;

&lt;p&gt;That’s where SQL agents come in: they make that barrier disappear, giving non-technical users easy, secure, and fast access to complex analytics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&amp;gt; What is a SQL Agent?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SQL agents bridge the gap between natural language and structured database queries. They allow users, technical or not, to ask questions in plain English and receive answers derived from complex relational data.&lt;/p&gt;

&lt;p&gt;This post shares my experience building a SQL LLM Agent: a system that takes natural language queries, converts them into SQL, runs them on a large PostgreSQL database, and returns human-readable responses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&amp;gt; Project Requirements&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To meet modern enterprise demands, the system was designed with the following capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Large Database Support: Designed to handle PostgreSQL databases with 100+ relational tables and hundreds of gigabytes of data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;High Concurrency: Supports 50–100 concurrent users without slowing down.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Performance-Centric: Low latency and fast response times are key, even under heavy load.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Token Efficiency: Optimized to minimize token usage with LLMs — reducing cost and improving speed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Caching: Implements Redis and in-memory caching to store frequently asked queries and results.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Asynchronous Processing: Handles simultaneous user queries using async I/O and task queues.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Natural Language Interface: Users can interact in plain English — no SQL knowledge required.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Smart Query Handling: translates natural language into optimized SQL, executes the queries on PostgreSQL, and summarizes the results in clear, readable natural language.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&amp;gt; Tech Stack&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Backend Framework: FastAPI&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Database: PostgreSQL (100+ tables, hundreds of GBs)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Caching: Redis&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Task Queue: Celery&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Containerization: Docker + Docker Compose&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Agent Framework: LangChain + LangGraph&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;LLM: OpenAI (GPT-4 / GPT-4o)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
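&lt;p&gt;The TTL-based in-memory caching mentioned above can be illustrated with a minimal standard-library sketch (Redis handles the shared tier; this is not the project’s actual cache code):&lt;/p&gt;

```python
import time

class TTLCache:
    """Tiny TTL cache for query results (illustrative sketch)."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Expired: evict lazily on read
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)
```

&lt;p&gt;A repeated question then hits the cache instead of triggering another LLM call and database round trip.&lt;/p&gt;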

&lt;p&gt;&lt;strong&gt;&amp;gt; System Architecture Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The SQL LLM Agent is an intelligent, scalable pipeline that transforms natural language queries into executable SQL and returns the results in a conversational format.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&amp;gt; Core Components&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;FastAPI Application: hosts the /api/query endpoint; Dockerized with auto-reload (port 8000)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SQL Agent (Main Engine): defined in app/agents/sql_agent.py; powered by LangGraph’s StateGraph to control multi-step processing; connects language models with real-time database operations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Database Layer: uses PostgreSQL with SQLAlchemy (AsyncSession) for non-blocking queries&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Caching System: Redis stores previously run queries and their results, reducing redundant LLM and DB calls; a TTL-based memory cache handles hot data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;LLM Integration: OpenAI’s GPT-4 / GPT-4o translates NL to SQL, summarizes SQL output in plain English, auto-corrects faulty SQL, and generates follow-up questions&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
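&lt;p&gt;The multi-step processing that the StateGraph controls can be sketched framework-free. The step names below mirror this post’s workflow; the bodies are stubs for illustration, not the project’s real logic:&lt;/p&gt;

```python
# Framework-free sketch of the agent's multi-step pipeline. The real project
# drives these steps with LangGraph's StateGraph; bodies here are stubs.
def choose_tables(state: dict) -> dict:
    state["tables"] = ["users"]  # stub: an LLM call would pick relevant tables
    return state

def get_ddls(state: dict) -> dict:
    # stub: would introspect the live schema for each chosen table
    state["ddls"] = {t: f"CREATE TABLE {t} (...)" for t in state["tables"]}
    return state

def generate_sql(state: dict) -> dict:
    state["sql"] = f"SELECT count(*) FROM {state['tables'][0]}"  # stub NL-to-SQL
    return state

def suggest_followups(state: dict) -> dict:
    state["followups"] = ["Break this down by month?"]  # stub
    return state

PIPELINE = [choose_tables, get_ddls, generate_sql, suggest_followups]

def run(query: str) -> dict:
    state = {"query": query}
    for step in PIPELINE:
        state = step(state)
    return state
```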

&lt;p&gt;&lt;strong&gt;&amp;gt; LangGraph Workflow&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent follows a directed graph workflow using LangGraph, broken into these modular steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;choose_tables - Identifies the tables relevant to the user query&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;get_ddls - Retrieves the DDL (schema) for those tables&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;generate_sql - Converts the natural-language query to SQL, executes it, and handles errors or retries&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;suggest_followups - Offers relevant follow-up questions&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&amp;gt; Deployment Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Deployed using Docker Compose with three primary services:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&amp;gt; Service Roles&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;app - FastAPI backend (exposes port 8000)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;db - PostgreSQL container with persisted volume pgdata&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;redis - In-memory cache for faster data access&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
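&lt;p&gt;An illustrative docker-compose.yml matching those three services might look like this (image tags and build settings are assumptions, not the project’s actual file):&lt;/p&gt;

```yaml
# Sketch of the three-service deployment described above
services:
  app:
    build: .
    ports: ["8000:8000"]
    depends_on: [db, redis]
  db:
    image: postgres:16
    volumes: ["pgdata:/var/lib/postgresql/data"]
  redis:
    image: redis:7
volumes:
  pgdata:
```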

&lt;p&gt;&lt;strong&gt;&amp;gt; Key Features&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;High Performance: Handles large-scale databases under heavy load.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Smart Caching: Avoids repeated work using Redis and memory-based caching.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Error Resilience: Automatically corrects broken or malformed SQL queries.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Schema Introspection: Dynamically understands and adapts to the DB structure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Conversational Interaction: Natural language input and output, no SQL required.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modular Workflow: Built on LangGraph for flexible, stateful processing.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&amp;gt; Final Thoughts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This project was a powerful learning experience that combined LLMs, database engineering, performance tuning, and API design. It’s a strong step toward democratizing data access — making it simple, fast, and intuitive for everyone.&lt;/p&gt;

&lt;p&gt;If you’ve ever struggled with getting insights from a complex database or want to make your data more accessible to business teams — this is the direction to explore.&lt;/p&gt;

&lt;p&gt;Let me know what you think or if you’re building something similar — I’d love to connect!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/CodeMaster1022/sql-agent" rel="noopener noreferrer"&gt;https://github.com/CodeMaster1022/sql-agent&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
