AI agents are more than “LLM + prompt.” A useful agent can plan, use tools, remember context, and act safely in the real world (files, APIs, databases). In this post, we’ll build a small but capable agent in Python using an open-source stack.
We’ll implement:
- A minimal agent loop (think/plan → tool call → observe → repeat)
- A tool registry with typed inputs
- Lightweight memory (conversation + notes)
- Basic guardrails (tool allowlist + timeouts + validation)
- A working example: an agent that can search docs (locally), summarize, and draft a response
This is aimed at intermediate Python developers who want to understand the moving parts and keep the architecture flexible.
What is an “AI agent” (in practice)?
A practical agent typically includes:
- Model: an LLM that can reason over text and choose actions.
- Tools: functions the model can call (HTTP requests, DB queries, file I/O).
- Memory: state across turns (chat history, scratchpad, retrieved notes).
- Policy/Loop: logic that decides when to call tools and when to stop.
- Safety: constraints to avoid dangerous actions.
A key design choice: don’t hide the loop. You’ll debug and extend agents more easily when the control flow is visible.
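To make that concrete before we build the real thing, here is the loop in skeletal form, with a scripted stand-in for the model. Everything in this sketch (`fake_model`, the `echo` tool) is illustrative only; the rest of the post replaces each piece with a real implementation:

```python
# A bare-bones agent loop: the model either requests a tool or answers.
# `fake_model` and `TOOLS` are stand-ins for illustration only.

TOOLS = {"echo": lambda text: text.upper()}

def fake_model(history):
    # Pretend the model calls a tool once, then answers with the tool result.
    if not any(m["role"] == "tool" for m in history):
        return {"type": "tool", "name": "echo", "args": {"text": "hello"}}
    return {"type": "final", "answer": history[-1]["content"]}

def agent_loop(user_input, max_steps=5):
    history = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        action = fake_model(history)          # think/plan
        if action["type"] == "final":
            return action["answer"]           # stop
        result = TOOLS[action["name"]](**action["args"])  # tool call
        history.append({"role": "tool", "content": result})  # observe
    return "Max steps reached."

print(agent_loop("hi"))  # "HELLO"
```

Every agent in this post is a refinement of that visible control flow: plan → act → observe → repeat.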
Project setup
We’ll use:
- Python 3.11+
- pydantic for tool input validation
- httpx (optional) for web calls
- An LLM client (examples include OpenAI-compatible APIs or local models). I’ll show an OpenAI-compatible interface, but the agent architecture is model-agnostic.
Install dependencies:
```bash
pip install pydantic httpx
```
If you’re using an OpenAI-compatible endpoint:
```bash
pip install openai
```
Step 1: Define tools (the agent’s capabilities)
Tools are just Python callables plus metadata:
- Name
- Description (for the model)
- Input schema
- Function to execute
We’ll implement a tiny tool framework.
```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Type

from pydantic import BaseModel, ValidationError


@dataclass
class Tool:
    name: str
    description: str
    input_model: Type[BaseModel]
    fn: Callable[..., Any]

    def run(self, raw_args: Dict[str, Any]) -> Any:
        args = self.input_model(**raw_args)  # raises ValidationError on bad input
        return self.fn(**args.model_dump())


class ToolRegistry:
    def __init__(self):
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"Tool already registered: {tool.name}")
        self._tools[tool.name] = tool

    def get(self, name: str) -> Tool:
        return self._tools[name]

    def list(self) -> Dict[str, Tool]:
        return dict(self._tools)
```
Example tools
We’ll add two tools:
- search_local_docs: search a local folder of markdown/text files
- summarize_text: a non-LLM “tool” (simple truncation) to show that tools can be deterministic
```python
import re
from pathlib import Path
from typing import List

from pydantic import BaseModel, Field


class SearchLocalDocsInput(BaseModel):
    query: str = Field(..., min_length=2)
    folder: str = Field(..., description="Folder containing .md/.txt files")
    max_results: int = Field(5, ge=1, le=20)


def search_local_docs(query: str, folder: str, max_results: int = 5) -> List[dict]:
    q = query.lower().strip()
    folder_path = Path(folder)
    results = []
    for path in folder_path.rglob("*"):
        if path.suffix.lower() not in {".md", ".txt"}:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue
        if q in text.lower():
            # Grab a small snippet around the first match
            m = re.search(re.escape(q), text, re.IGNORECASE)
            start = max(0, m.start() - 120) if m else 0
            end = min(len(text), (m.end() + 120) if m else 240)
            snippet = text[start:end].replace("\n", " ")
            results.append({"file": str(path), "snippet": snippet})
    return results[:max_results]


class SummarizeTextInput(BaseModel):
    text: str = Field(..., min_length=1)
    max_chars: int = Field(600, ge=100, le=5000)


def summarize_text(text: str, max_chars: int = 600) -> str:
    text = re.sub(r"\s+", " ", text).strip()
    if len(text) <= max_chars:
        return text
    return text[: max_chars - 3] + "..."
```
Register them:
```python
registry = ToolRegistry()

registry.register(
    Tool(
        name="search_local_docs",
        description="Search local markdown/text files for a query and return file snippets.",
        input_model=SearchLocalDocsInput,
        fn=search_local_docs,
    )
)
registry.register(
    Tool(
        name="summarize_text",
        description="Summarize text by truncating to a max character length.",
        input_model=SummarizeTextInput,
        fn=summarize_text,
    )
)
```
Step 2: Define messages + memory
We’ll store a basic conversation history plus a “notes” field the agent can update.
```python
from dataclasses import dataclass, field
from typing import Literal, List

Role = Literal["system", "user", "assistant", "tool"]


@dataclass
class Message:
    role: Role
    content: str
    name: str | None = None  # used for tool name


@dataclass
class AgentState:
    messages: List[Message] = field(default_factory=list)
    notes: str = ""

    def add(self, role: Role, content: str, name: str | None = None) -> None:
        self.messages.append(Message(role=role, content=content, name=name))
```
Step 3: The model interface (OpenAI-compatible)
Many providers (and local gateways) implement an OpenAI-compatible Chat Completions API. We’ll keep this thin so you can swap it out.
We’ll ask the model to respond in a structured JSON format:
- Either a final answer: { "type": "final", "answer": "..." }
- Or a tool call: { "type": "tool", "name": "...", "args": { ... } }
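A small helper that parses and validates this envelope defensively might look like this (`parse_action` is a hypothetical name; the agent loop later in the post inlines similar logic):

```python
import json

def parse_action(raw: str) -> dict:
    """Parse the model's JSON envelope, normalizing common failure modes."""
    try:
        payload = json.loads(raw.strip())
    except json.JSONDecodeError:
        return {"type": "error", "reason": "invalid JSON"}
    kind = payload.get("type")
    if kind == "final" and isinstance(payload.get("answer"), str):
        return payload
    if kind == "tool" and isinstance(payload.get("name"), str):
        payload.setdefault("args", {})  # tolerate a missing args object
        return payload
    return {"type": "error", "reason": f"unknown action: {kind!r}"}
```

Normalizing errors into the same envelope shape keeps the calling loop to a single dispatch on `type`.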
```python
import json
from typing import Any, Dict


class LLMClient:
    def __init__(self, model: str = "gpt-4o-mini"):
        from openai import OpenAI

        self.client = OpenAI()
        self.model = model

    def chat(self, messages: list[dict]) -> str:
        resp = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            temperature=0.2,
        )
        return resp.choices[0].message.content or ""


def to_openai_messages(state: AgentState) -> list[dict]:
    msgs = []
    for m in state.messages:
        if m.role == "tool":
            # The API only accepts role "tool" alongside a tool_call_id from
            # native tool calling, which our JSON protocol doesn't use, so we
            # relay tool output as a user message instead.
            msgs.append({"role": "user", "content": f"TOOL RESULT ({m.name}): {m.content}"})
        else:
            msgs.append({"role": m.role, "content": m.content})
    return msgs
```
Step 4: Build the agent loop
The agent loop:
- Send system prompt + history + notes
- Parse model output
- If tool call: validate args, run tool, append tool result
- If final: return answer
- Stop after N steps
We’ll also add basic guardrails:
- Tool allowlist: only registered tools can run
- Validation: Pydantic schemas
- Step limit: prevents infinite loops
```python
SYSTEM_PROMPT = """
You are a helpful AI agent.

You can either:
1) Call a tool, by responding with strict JSON:
   {"type":"tool","name":"...","args":{...}}
2) Or answer the user, by responding with strict JSON:
   {"type":"final","answer":"..."}

Rules:
- Only call tools that are available.
- If you call a tool, keep args minimal and valid.
- Use the agent notes when helpful.
- Output MUST be valid JSON and nothing else.
""".strip()
```
```python
class Agent:
    def __init__(self, llm: LLMClient, tools: ToolRegistry):
        self.llm = llm
        self.tools = tools

    def run(self, user_input: str, state: Optional[AgentState] = None, max_steps: int = 8) -> str:
        state = state or AgentState()

        # Add system prompt once at the start
        if not state.messages or state.messages[0].role != "system":
            state.messages.insert(0, Message("system", SYSTEM_PROMPT))

        state.add("user", user_input)

        for step in range(max_steps):
            # Provide notes as context once per run (simple approach)
            if state.notes and step == 0:
                state.add("system", f"Agent notes: {state.notes}")

            raw = self.llm.chat(to_openai_messages(state))

            try:
                payload = json.loads(raw)
            except json.JSONDecodeError:
                # If the model misbehaves, nudge it back to the protocol and retry
                state.add("user", "Your last reply was not valid JSON. Respond with strict JSON only.")
                continue

            if payload.get("type") == "final":
                answer = payload.get("answer", "")
                state.add("assistant", answer)
                return answer

            if payload.get("type") == "tool":
                name = payload.get("name")
                args = payload.get("args") or {}

                if name not in self.tools.list():
                    state.add("tool", f"ERROR: tool not allowed: {name}", name=name)
                    continue

                tool = self.tools.get(name)
                try:
                    result = tool.run(args)
                    state.add("tool", json.dumps(result, ensure_ascii=False), name=name)
                except ValidationError as ve:
                    state.add("tool", f"VALIDATION_ERROR: {ve}", name=name)
                except Exception as e:
                    state.add("tool", f"TOOL_ERROR: {e}", name=name)
                continue

            # Unknown response type
            state.add("assistant", "I couldn't determine the next action.")
            return "I couldn't determine the next action."

        return "Max steps reached without a final answer."
```
Step 5: Try it end-to-end
Create a docs/ folder with a couple of .md files (project notes, API docs, etc.). Then run:
```python
if __name__ == "__main__":
    llm = LLMClient(model="gpt-4o-mini")
    agent = Agent(llm=llm, tools=registry)

    question = "Search my docs for 'rate limit' and explain what it says in 3 bullet points. Folder is docs."
    print(agent.run(question))
```
A typical interaction looks like:
- Model calls search_local_docs with {query: "rate limit", folder: "docs"}
- Tool returns snippets
- Model calls summarize_text (optional)
- Model returns a final bullet list
Making it more agentic (without making it fragile)
Once the basics work, here are practical upgrades.
1) Add a “planner” step
Instead of letting the model decide everything in one shot, add an explicit planning phase:
- Step A: produce a plan (no tools)
- Step B: execute the next tool call
This reduces randomness and improves debuggability.
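A minimal sketch of that split, reusing the Agent and LLMClient interfaces from above (the planner prompt and the `plan_then_execute` helper are hypothetical, not part of the code so far):

```python
# Hypothetical planner prompt; tune the wording for your model.
PLANNER_PROMPT = (
    "Break the user's request into a short numbered plan of tool calls. "
    "Respond with plain text only; do not call tools."
)

def plan_then_execute(llm, agent, user_input: str) -> str:
    # Step A: ask for a plan with no tools available.
    plan = llm.chat([
        {"role": "system", "content": PLANNER_PROMPT},
        {"role": "user", "content": user_input},
    ])
    # Step B: hand the plan to the normal agent loop as extra context.
    return agent.run(f"Plan:\n{plan}\n\nTask: {user_input}")
```

Because the planner call exposes no tools, the plan itself can't trigger side effects, and you get a human-readable trace of intent before anything runs.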
2) Add retrieval (RAG) properly
Our search_local_docs is naive substring matching. For real projects, use embeddings:
- sentence-transformers for local embeddings
- A vector store like FAISS, Chroma, or SQLite-based solutions
Then create a tool like retrieve_context(query) -> passages.
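To show the shape of such a tool without extra dependencies, here is a sketch that ranks passages by cosine similarity over a pluggable embedding function. The bag-of-words `bow_embed` is a deliberately crude stand-in; in practice you would swap it for real sentence-transformers embeddings and precompute the passage vectors:

```python
import math
from collections import Counter

def bow_embed(text: str) -> Counter:
    # Stand-in embedding (bag of words). Replace with a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_context(query: str, passages: list[str], k: int = 3) -> list[str]:
    # Rank all passages against the query and return the top k.
    q = bow_embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, bow_embed(p)), reverse=True)
    return ranked[:k]
```

The agent-facing interface stays the same whether the embeddings come from this toy function or a vector store, which is exactly why retrieval works well as a tool boundary.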
3) Add tool timeouts and cancellation
Tools that hit networks should use timeouts:
```python
import httpx

def fetch_url(url: str) -> str:
    with httpx.Client(timeout=10.0, follow_redirects=True) as client:
        return client.get(url).text
```
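Network timeouts cover HTTP, but any tool can hang. One stdlib-only approach is to run the tool in a worker thread with a deadline — a sketch (`run_with_timeout` is hypothetical), with the caveat that the runaway thread is abandoned, not killed:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout

def run_with_timeout(fn, args: dict, timeout_s: float = 10.0):
    # Run a tool call in a worker thread and give up after `timeout_s` seconds.
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, **args)
    try:
        return future.result(timeout=timeout_s)
    except FuturesTimeout:
        return {"ok": False, "error": f"tool timed out after {timeout_s}s"}
    finally:
        # wait=False so a stuck tool doesn't block the agent loop on shutdown.
        pool.shutdown(wait=False)
```

For true cancellation you need process-level isolation (e.g. a subprocess you can terminate), but for well-behaved tools a thread deadline keeps the loop responsive.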
4) Add a strict allowlist and “capabilities” policy
A common mistake is giving agents broad file/network access. Prefer:
- A small set of tools
- Explicit path sandboxing (only within a workspace directory)
- Read-only tools by default
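Path sandboxing can be a one-function check. Here is a hypothetical helper (`resolve_in_workspace`) you could call at the top of every file tool before touching the filesystem:

```python
from pathlib import Path

def resolve_in_workspace(workspace: str, requested: str) -> Path:
    """Resolve a requested path and refuse anything outside the workspace."""
    root = Path(workspace).resolve()
    target = (root / requested).resolve()  # collapses ".." and symlink tricks
    if not target.is_relative_to(root):
        raise PermissionError(f"path escapes workspace: {requested}")
    return target
```

Resolving both paths before comparing is the important part: a naive string prefix check is defeated by `../` segments and symlinks.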
5) Add structured tool outputs
Returning JSON strings is fine for demos, but you’ll want consistent schemas. Consider:
- Tool output models (Pydantic)
- A standardized envelope: { ok: bool, data: ..., error: ... }
Open-source note: keep the architecture swappable
If you later adopt a framework (LangGraph, LlamaIndex, Haystack, Semantic Kernel), you’ll still benefit from understanding:
- How tools are validated
- Where memory lives
- How the loop terminates
- How errors are handled
A good rule: frameworks should reduce boilerplate, not hide control flow.
Summary
You now have a minimal, extensible Python AI agent with:
- A clear agent loop
- Typed tools with validation
- Basic memory
- Guardrails (allowlist, step limit)
From here, the biggest improvements come from:
- Better retrieval (embeddings)
- Better planning (explicit plan/execute)
- Better safety (sandboxing + permissions)
A natural follow-up post would add:
- Embeddings + FAISS for retrieval
- A planner/executor split
- Streaming outputs and better tracing/logging