AI agents are more than “LLM + prompt.” A useful agent can plan, use tools, remember context, and act safely in the real world (files, APIs, databases). In this post, we’ll build a small but capable agent in Python using an open-source stack.
We’ll implement:
- A minimal agent loop (think/plan → tool call → observe → repeat)
- A tool registry with typed inputs
- Lightweight memory (conversation + notes)
- Basic guardrails (tool allowlist + timeouts + validation)
- A working example: an agent that can search docs (locally), summarize, and draft a response
This is aimed at intermediate Python developers who want to understand the moving parts and keep the architecture flexible.
What is an “AI agent” (in practice)?
A practical agent typically includes:
- Model: an LLM that can reason over text and choose actions.
- Tools: functions the model can call (HTTP requests, DB queries, file I/O).
- Memory: state across turns (chat history, scratchpad, retrieved notes).
- Policy/Loop: logic that decides when to call tools and when to stop.
- Safety: constraints to avoid dangerous actions.
A key design choice: don’t hide the loop. You’ll debug and extend agents more easily when the control flow is visible.
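To make that concrete before we build the real thing, here is the loop in skeletal form, with a scripted stand-in for the model. Everything in this sketch (`fake_model`, the `echo` tool) is illustrative only; the rest of the post replaces each piece with a real implementation:

```python
# A bare-bones agent loop: the model either requests a tool or answers.
# `fake_model` and `TOOLS` are stand-ins for illustration only.

TOOLS = {"echo": lambda text: text.upper()}

def fake_model(history):
    # Pretend the model calls a tool once, then answers with the tool result.
    if not any(m["role"] == "tool" for m in history):
        return {"type": "tool", "name": "echo", "args": {"text": "hello"}}
    return {"type": "final", "answer": history[-1]["content"]}

def agent_loop(user_input, max_steps=5):
    history = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        action = fake_model(history)          # think/plan
        if action["type"] == "final":
            return action["answer"]           # stop
        result = TOOLS[action["name"]](**action["args"])  # tool call
        history.append({"role": "tool", "content": result})  # observe
    return "Max steps reached."

print(agent_loop("hi"))  # "HELLO"
```

Every agent in this post is a refinement of that visible control flow: plan → act → observe → repeat.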
Project setup
We’ll use:
- Python 3.11+
- pydantic for tool input validation
- httpx (optional) for web calls
- An LLM client (examples include OpenAI-compatible APIs or local models). I’ll show an OpenAI-compatible interface, but the agent architecture is model-agnostic.
Install dependencies:
```bash
pip install pydantic httpx
```
If you’re using an OpenAI-compatible endpoint:
```bash
pip install openai
```
Step 1: Define tools (the agent’s capabilities)
Tools are just Python callables plus metadata:
- Name
- Description (for the model)
- Input schema
- Function to execute
We’ll implement a tiny tool framework.
```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Type

from pydantic import BaseModel, ValidationError


@dataclass
class Tool:
    name: str
    description: str
    input_model: Type[BaseModel]
    fn: Callable[..., Any]

    def run(self, raw_args: Dict[str, Any]) -> Any:
        args = self.input_model(**raw_args)  # raises ValidationError on bad input
        return self.fn(**args.model_dump())


class ToolRegistry:
    def __init__(self):
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"Tool already registered: {tool.name}")
        self._tools[tool.name] = tool

    def get(self, name: str) -> Tool:
        return self._tools[name]

    def list(self) -> Dict[str, Tool]:
        return dict(self._tools)
```
Example tools
We’ll add two tools:
- search_local_docs: search a local folder of markdown/text files
- summarize_text: a non-LLM “tool” (simple truncation) to show that tools can be deterministic
```python
import re
from pathlib import Path
from typing import List

from pydantic import BaseModel, Field


class SearchLocalDocsInput(BaseModel):
    query: str = Field(..., min_length=2)
    folder: str = Field(..., description="Folder containing .md/.txt files")
    max_results: int = Field(5, ge=1, le=20)


def search_local_docs(query: str, folder: str, max_results: int = 5) -> List[dict]:
    q = query.lower().strip()
    folder_path = Path(folder)
    results = []
    for path in folder_path.rglob("*"):
        if path.suffix.lower() not in {".md", ".txt"}:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue
        if q in text.lower():
            # Grab a small snippet around the first match
            m = re.search(re.escape(q), text, re.IGNORECASE)
            start = max(0, m.start() - 120) if m else 0
            end = min(len(text), (m.end() + 120) if m else 240)
            snippet = text[start:end].replace("\n", " ")
            results.append({"file": str(path), "snippet": snippet})
    return results[:max_results]


class SummarizeTextInput(BaseModel):
    text: str = Field(..., min_length=1)
    max_chars: int = Field(600, ge=100, le=5000)


def summarize_text(text: str, max_chars: int = 600) -> str:
    text = re.sub(r"\s+", " ", text).strip()
    if len(text) <= max_chars:
        return text
    return text[: max_chars - 3] + "..."
```
Register them:
```python
registry = ToolRegistry()

registry.register(
    Tool(
        name="search_local_docs",
        description="Search local markdown/text files for a query and return file snippets.",
        input_model=SearchLocalDocsInput,
        fn=search_local_docs,
    )
)
registry.register(
    Tool(
        name="summarize_text",
        description="Summarize text by truncating to a max character length.",
        input_model=SummarizeTextInput,
        fn=summarize_text,
    )
)
```
Step 2: Define messages + memory
We’ll store a basic conversation history plus a “notes” field the agent can update.
```python
from dataclasses import dataclass, field
from typing import Literal, List

Role = Literal["system", "user", "assistant", "tool"]


@dataclass
class Message:
    role: Role
    content: str
    name: str | None = None  # used for tool name


@dataclass
class AgentState:
    messages: List[Message] = field(default_factory=list)
    notes: str = ""

    def add(self, role: Role, content: str, name: str | None = None) -> None:
        self.messages.append(Message(role=role, content=content, name=name))
```
Step 3: The model interface (OpenAI-compatible)
Many providers (and local gateways) implement an OpenAI-compatible Chat Completions API. We’ll keep this thin so you can swap it out.
We’ll ask the model to respond in a structured JSON format:
- Either a final answer: { "type": "final", "answer": "..." }
- Or a tool call: { "type": "tool", "name": "...", "args": { ... } }
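A small helper that parses and validates this envelope defensively might look like this (`parse_action` is a hypothetical name; the agent loop later in the post inlines similar logic):

```python
import json

def parse_action(raw: str) -> dict:
    """Parse the model's JSON envelope, normalizing common failure modes."""
    try:
        payload = json.loads(raw.strip())
    except json.JSONDecodeError:
        return {"type": "error", "reason": "invalid JSON"}
    kind = payload.get("type")
    if kind == "final" and isinstance(payload.get("answer"), str):
        return payload
    if kind == "tool" and isinstance(payload.get("name"), str):
        payload.setdefault("args", {})  # tolerate a missing args object
        return payload
    return {"type": "error", "reason": f"unknown action: {kind!r}"}
```

Normalizing errors into the same envelope shape keeps the calling loop to a single dispatch on `type`.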
```python
import json
from typing import Any, Dict


class LLMClient:
    def __init__(self, model: str = "gpt-4o-mini"):
        from openai import OpenAI

        self.client = OpenAI()
        self.model = model

    def chat(self, messages: list[dict]) -> str:
        resp = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            temperature=0.2,
        )
        return resp.choices[0].message.content or ""


def to_openai_messages(state: AgentState) -> list[dict]:
    msgs = []
    for m in state.messages:
        if m.role == "tool":
            # The API only accepts role "tool" alongside a tool_call_id from
            # native tool calling, which our JSON protocol doesn't use, so we
            # relay tool output as a user message instead.
            msgs.append({"role": "user", "content": f"TOOL RESULT ({m.name}): {m.content}"})
        else:
            msgs.append({"role": m.role, "content": m.content})
    return msgs
```
Step 4: Build the agent loop
The agent loop:
- Send system prompt + history + notes
- Parse model output
- If tool call: validate args, run tool, append tool result
- If final: return answer
- Stop after N steps
We’ll also add basic guardrails:
- Tool allowlist: only registered tools can run
- Validation: Pydantic schemas
- Step limit: prevents infinite loops
```python
SYSTEM_PROMPT = """
You are a helpful AI agent.

You can either:
1) Call a tool, by responding with strict JSON:
   {"type":"tool","name":"...","args":{...}}
2) Or answer the user, by responding with strict JSON:
   {"type":"final","answer":"..."}

Rules:
- Only call tools that are available.
- If you call a tool, keep args minimal and valid.
- Use the agent notes when helpful.
- Output MUST be valid JSON and nothing else.
""".strip()
```
```python
class Agent:
    def __init__(self, llm: LLMClient, tools: ToolRegistry):
        self.llm = llm
        self.tools = tools

    def run(self, user_input: str, state: Optional[AgentState] = None, max_steps: int = 8) -> str:
        state = state or AgentState()

        # Add system prompt once at the start
        if not state.messages or state.messages[0].role != "system":
            state.messages.insert(0, Message("system", SYSTEM_PROMPT))

        state.add("user", user_input)

        for step in range(max_steps):
            # Provide notes as context once per run (simple approach)
            if state.notes and step == 0:
                state.add("system", f"Agent notes: {state.notes}")

            raw = self.llm.chat(to_openai_messages(state))

            try:
                payload = json.loads(raw)
            except json.JSONDecodeError:
                # If the model misbehaves, nudge it back to the protocol and retry
                state.add("user", "Your last reply was not valid JSON. Respond with strict JSON only.")
                continue

            if payload.get("type") == "final":
                answer = payload.get("answer", "")
                state.add("assistant", answer)
                return answer

            if payload.get("type") == "tool":
                name = payload.get("name")
                args = payload.get("args") or {}

                if name not in self.tools.list():
                    state.add("tool", f"ERROR: tool not allowed: {name}", name=name)
                    continue

                tool = self.tools.get(name)
                try:
                    result = tool.run(args)
                    state.add("tool", json.dumps(result, ensure_ascii=False), name=name)
                except ValidationError as ve:
                    state.add("tool", f"VALIDATION_ERROR: {ve}", name=name)
                except Exception as e:
                    state.add("tool", f"TOOL_ERROR: {e}", name=name)
                continue

            # Unknown response type
            state.add("assistant", "I couldn't determine the next action.")
            return "I couldn't determine the next action."

        return "Max steps reached without a final answer."
```
Step 5: Try it end-to-end
Create a docs/ folder with a couple of .md files (project notes, API docs, etc.). Then run:
```python
if __name__ == "__main__":
    llm = LLMClient(model="gpt-4o-mini")
    agent = Agent(llm=llm, tools=registry)

    question = "Search my docs for 'rate limit' and explain what it says in 3 bullet points. Folder is docs."
    print(agent.run(question))
```
A typical interaction looks like:
- Model calls search_local_docs with {query: "rate limit", folder: "docs"}
- Tool returns snippets
- Model calls summarize_text (optional)
- Model returns a final bullet list
Making it more agentic (without making it fragile)
Once the basics work, here are practical upgrades.
1) Add a “planner” step
Instead of letting the model decide everything in one shot, add an explicit planning phase:
- Step A: produce a plan (no tools)
- Step B: execute the next tool call
This reduces randomness and improves debuggability.
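A minimal sketch of that split, reusing the Agent and LLMClient interfaces from above (the planner prompt and the `plan_then_execute` helper are hypothetical, not part of the code so far):

```python
# Hypothetical planner prompt; tune the wording for your model.
PLANNER_PROMPT = (
    "Break the user's request into a short numbered plan of tool calls. "
    "Respond with plain text only; do not call tools."
)

def plan_then_execute(llm, agent, user_input: str) -> str:
    # Step A: ask for a plan with no tools available.
    plan = llm.chat([
        {"role": "system", "content": PLANNER_PROMPT},
        {"role": "user", "content": user_input},
    ])
    # Step B: hand the plan to the normal agent loop as extra context.
    return agent.run(f"Plan:\n{plan}\n\nTask: {user_input}")
```

Because the planner call exposes no tools, the plan itself can't trigger side effects, and you get a human-readable trace of intent before anything runs.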
2) Add retrieval (RAG) properly
Our search_local_docs is naive substring matching. For real projects, use embeddings:
- sentence-transformers for local embeddings
- A vector store like FAISS, Chroma, or SQLite-based solutions
Then create a tool like retrieve_context(query) -> passages.
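To show the shape of such a tool without extra dependencies, here is a sketch that ranks passages by cosine similarity over a pluggable embedding function. The bag-of-words `bow_embed` is a deliberately crude stand-in; in practice you would swap it for real sentence-transformers embeddings and precompute the passage vectors:

```python
import math
from collections import Counter

def bow_embed(text: str) -> Counter:
    # Stand-in embedding (bag of words). Replace with a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_context(query: str, passages: list[str], k: int = 3) -> list[str]:
    # Rank all passages against the query and return the top k.
    q = bow_embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, bow_embed(p)), reverse=True)
    return ranked[:k]
```

The agent-facing interface stays the same whether the embeddings come from this toy function or a vector store, which is exactly why retrieval works well as a tool boundary.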
3) Add tool timeouts and cancellation
Tools that hit networks should use timeouts:
```python
import httpx

def fetch_url(url: str) -> str:
    with httpx.Client(timeout=10.0, follow_redirects=True) as client:
        return client.get(url).text
```
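Network timeouts cover HTTP, but any tool can hang. One stdlib-only approach is to run the tool in a worker thread with a deadline — a sketch (`run_with_timeout` is hypothetical), with the caveat that the runaway thread is abandoned, not killed:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout

def run_with_timeout(fn, args: dict, timeout_s: float = 10.0):
    # Run a tool call in a worker thread and give up after `timeout_s` seconds.
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, **args)
    try:
        return future.result(timeout=timeout_s)
    except FuturesTimeout:
        return {"ok": False, "error": f"tool timed out after {timeout_s}s"}
    finally:
        # wait=False so a stuck tool doesn't block the agent loop on shutdown.
        pool.shutdown(wait=False)
```

For true cancellation you need process-level isolation (e.g. a subprocess you can terminate), but for well-behaved tools a thread deadline keeps the loop responsive.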
4) Add a strict allowlist and “capabilities” policy
A common mistake is giving agents broad file/network access. Prefer:
- A small set of tools
- Explicit path sandboxing (only within a workspace directory)
- Read-only tools by default
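Path sandboxing can be a one-function check. Here is a hypothetical helper (`resolve_in_workspace`) you could call at the top of every file tool before touching the filesystem:

```python
from pathlib import Path

def resolve_in_workspace(workspace: str, requested: str) -> Path:
    """Resolve a requested path and refuse anything outside the workspace."""
    root = Path(workspace).resolve()
    target = (root / requested).resolve()  # collapses ".." and symlink tricks
    if not target.is_relative_to(root):
        raise PermissionError(f"path escapes workspace: {requested}")
    return target
```

Resolving both paths before comparing is the important part: a naive string prefix check is defeated by `../` segments and symlinks.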
5) Add structured tool outputs
Returning JSON strings is fine for demos, but you’ll want consistent schemas. Consider:
- Tool output models (Pydantic)
- A standardized envelope: { ok: bool, data: ..., error: ... }
Open-source note: keep the architecture swappable
If you later adopt a framework (LangGraph, LlamaIndex, Haystack, Semantic Kernel), you’ll still benefit from understanding:
- How tools are validated
- Where memory lives
- How the loop terminates
- How errors are handled
A good rule: frameworks should reduce boilerplate, not hide control flow.
Summary
You now have a minimal, extensible Python AI agent with:
- A clear agent loop
- Typed tools with validation
- Basic memory
- Guardrails (allowlist, step limit)
From here, the biggest improvements come from:
- Better retrieval (embeddings)
- Better planning (explicit plan/execute)
- Better safety (sandboxing + permissions)
A natural follow-up post would add:
- Embeddings + FAISS for retrieval
- A planner/executor split
- Streaming outputs and better tracing/logging