Please give it a star ⭐ if this is helpful for your work.
GitHub Repo: bigdata5911/schema-guided-reasoning
I came across (and tried out) a really interesting project called Schema-Guided Reasoning (SGR) — a small but powerful demo showing how to make AI agents that can reason, plan, and take action using structured logic.
Instead of just chatting, this agent can actually do things — issue invoices, send emails, or apply business rules — all based on clear, validated schemas. It’s a great example of how you can combine reasoning with structured outputs to make AI more reliable and explainable.
So what exactly is Schema-Guided Reasoning?
The idea behind SGR is simple but clever: instead of letting an AI respond freely in text, you guide its reasoning through a schema — basically, a blueprint that defines what kind of outputs it can produce.
By doing that, the AI can plan its steps, pick tools to call, and execute them safely without breaking anything.
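As a rough sketch of that idea (the class and field names here are illustrative, not taken from the repo), each "next step" the model produces must parse as exactly one validated action:

```python
# Hypothetical sketch of the SGR idea: the model may only reply with JSON
# matching one of these schemas, so every step is a well-formed action.
from typing import Literal, Union
from pydantic import BaseModel, TypeAdapter

class SendEmail(BaseModel):
    tool: Literal["send_email"]
    to: str
    subject: str
    body: str

class IssueInvoice(BaseModel):
    tool: Literal["issue_invoice"]
    customer_id: str
    amount: float

# The agent's reply must validate as exactly one of these actions.
NextStep = TypeAdapter(Union[SendEmail, IssueInvoice])

raw = '{"tool": "issue_invoice", "customer_id": "C-1", "amount": 99.5}'
step = NextStep.validate_json(raw)
print(type(step).__name__)  # IssueInvoice
```

If the model emits anything outside the schema, validation fails before any tool runs.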
In this demo, the schema-driven agent works inside a mini in-memory CRM system. It can:
- Look up customers and products
- Issue or void invoices
- Send emails
- Apply business rules
It’s like a lightweight business assistant that understands structure and can think through actions before executing them.
Two ways to run it
The repo gives you two different setups — one that uses OpenAI’s API and another that runs completely locally with Qwen3-4B via llama.cpp.
1. OpenAI API (schema-guided-reasoning.py)
This version uses the OpenAI model gpt-4o and runs everything through the cloud.
Setup is super simple:
```shell
pip install pydantic annotated-types rich openai requests
$env:OPENAI_API_KEY = "YOUR_API_KEY"  # PowerShell; on macOS/Linux: export OPENAI_API_KEY="YOUR_API_KEY"
python schema-guided-reasoning.py
```
Once it runs, you’ll see the agent print out each task, plan the next step, call tools, and validate everything using Pydantic schemas. The output looks clean in the console thanks to the rich package.
2. Local llama.cpp version (sgr_assistant.py)
If you prefer to stay offline or just like running models locally (like me), there’s a Qwen3-4B version that connects to a llama.cpp HTTP server.
You can spin it up with something like this:
```shell
./llama-server \
  -m /path/to/Qwen3-4B-Instruct-2507-Q8_0.gguf \
  -ngl 999 \
  --port 12345 \
  --threads -1 \
  --ctx-size 20000
```
Then just run:
```shell
python sgr_assistant.py
```
This one includes a little bit of cleanup logic to strip out <think> tags and formatting issues that local models sometimes produce — nice touch.
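That cleanup step might look something like this (a hedged sketch; the repo's actual logic may differ): strip `<think>…</think>` blocks and any markdown fences wrapping the JSON payload before parsing.

```python
import re

def clean_model_output(text: str) -> str:
    # Remove reasoning blocks some local models emit before the answer.
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # Drop markdown code fences that wrap the JSON payload.
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())
    return text.strip()

raw = "<think>planning...</think>\n```json\n{\"tool\": \"send_email\"}\n```"
print(clean_model_output(raw))  # {"tool": "send_email"}
```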
What’s happening under the hood
Both versions share a similar core:
- An in-memory database with mock data (customers, products, invoices, etc.)
- Schema definitions for tools like `SendEmail`, `IssueInvoice`, or `GetCustomerData`
- A dispatcher that simulates what happens when those tools are called
- And a task list that the model executes step-by-step
It’s all in pure Python — easy to read, easy to extend. You could add your own tool or new logic in just a few lines.
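The dispatcher pattern is worth a quick illustration (names below are illustrative, not the repo's exact code): a validated action object is routed to a handler that works against the in-memory data.

```python
from typing import Literal
from pydantic import BaseModel

class GetCustomerData(BaseModel):
    tool: Literal["get_customer_data"]
    customer_id: str

# Mock in-memory "database".
CUSTOMERS = {"C-1": {"name": "Acme Corp", "email": "billing@acme.test"}}

def dispatch(action: BaseModel) -> dict:
    # Route each validated action to its handler; real tools would go here.
    if isinstance(action, GetCustomerData):
        return CUSTOMERS.get(action.customer_id, {})
    raise ValueError(f"Unknown tool: {action!r}")

result = dispatch(GetCustomerData(tool="get_customer_data", customer_id="C-1"))
print(result["name"])  # Acme Corp
```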
Why this approach is cool
A lot of AI “agents” today are just prompt wrappers around chat models. They can do some planning, but often they’re unpredictable — one small formatting issue, and everything breaks.
SGR fixes that by forcing the model to stay inside a strict JSON schema. Every output has to validate before it runs. That means fewer hallucinations, clearer reasoning steps, and easier debugging.
In other words, you’re not just getting an answer — you’re getting a process you can trust and inspect.
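Here's a sketch of that validate-before-execute guarantee (an assumed shape, not the repo's exact code): malformed model output is rejected before any tool runs.

```python
from typing import Literal
from pydantic import BaseModel, ValidationError

class IssueInvoice(BaseModel):
    tool: Literal["issue_invoice"]
    customer_id: str
    amount: float

# "lots" is not a number, so validation fails and nothing executes.
bad_output = '{"tool": "issue_invoice", "customer_id": "C-1", "amount": "lots"}'
err = None
try:
    IssueInvoice.model_validate_json(bad_output)
except ValidationError as e:
    err = e
print("rejected before execution" if err else "accepted")
```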
Customize it your way
The best part is how easy it is to tweak.
You can:
- Edit the `TASKS` list to make it do new things
- Add more tools with `pydantic` models
- Change the `system_prompt` to give it different rules or products
Everything’s local and lightweight, so you can experiment freely without breaking anything.
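For instance, adding a new tool is just another Pydantic model plus a handler (the tool name and fields below are hypothetical, not from the repo):

```python
from typing import Literal
from pydantic import BaseModel

class ApplyDiscount(BaseModel):
    tool: Literal["apply_discount"]
    invoice_id: str
    percent: float

def handle_apply_discount(action: ApplyDiscount) -> str:
    # A real handler would update the in-memory invoice record.
    return f"{action.percent:.0f}% discount applied to {action.invoice_id}"

msg = handle_apply_discount(
    ApplyDiscount(tool="apply_discount", invoice_id="INV-7", percent=10)
)
print(msg)  # 10% discount applied to INV-7
```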
A few quick tips
If you run into issues:
- Make sure all dependencies are installed: `pip install pydantic annotated-types rich openai requests`
- Double-check your OpenAI API key (if using the API version).
- For local models, confirm the `llama.cpp` server is running and reachable.
- If the model outputs invalid JSON, try lowering temperature or adjusting cleanup logic.
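One quick way to confirm the server is reachable (recent `llama.cpp` server builds expose a `GET /health` endpoint; adjust the port to match your `--port` flag):

```python
import requests

def server_is_up(base_url: str = "http://127.0.0.1:12345") -> bool:
    try:
        return requests.get(f"{base_url}/health", timeout=2).status_code == 200
    except requests.RequestException:
        return False

print("llama.cpp server reachable:", server_is_up())
```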
Why it matters
Projects like this might seem small, but they hint at something big — how structured reasoning could make AI agents more dependable.
Instead of “guessing” what the next step is, the model is guided by schemas, validated by code, and executed deterministically. It’s the difference between a chat assistant and a reasoning engine.
If you want to check it out yourself, the repo’s here:
👉 bigdata5911/schema-guided-reasoning