If you’ve tinkered with Gen AI agents, you know the gap between cool demos and dependable systems is big. This article distills what actually works when going from a single-script agent to production-ready, multi-agent pipelines. It’s based on reusable patterns from typical agent service modules and use-case templates, adapted into generic snippets you can copy into your own stack.
Note: This is a vendor-agnostic, client-agnostic write-up with no company-specific details. All code is illustrative; adapt it to your stack before relying on it in production.
Why Agentic Architectures Matter
Autonomy without chaos: Agents plan, act, and reflect, but need guardrails.
Tool use is essential: Real utility comes from reliable integration with data, APIs, storage, and retrieval.
Memory and context: Short-term scratchpads plus durable episode/task memory improve success rates.
Orchestration beats monoliths: Separate concerns (planning, execution, observation, correction).
A Minimal Agent: Plan–Act–Observe–Reflect
This skeleton shows a single agent loop that plans, executes tools, observes results, and reflects to update its strategy.
from typing import Callable, Dict, Any, List

class Tool:
    def __init__(self, name: str, runner: Callable[[Dict[str, Any]], Dict[str, Any]]):
        self.name = name
        self.run = runner

class Memory:
    def __init__(self):
        self.events: List[Dict[str, Any]] = []

    def add(self, event: Dict[str, Any]):
        self.events.append(event)

    def last(self, n: int = 5) -> List[Dict[str, Any]]:
        return self.events[-n:]

class Agent:
    def __init__(self, planner: Callable[[str, List[Dict[str, Any]]], Dict[str, Any]],
                 reflector: Callable[[List[Dict[str, Any]]], str],
                 tools: Dict[str, Tool], memory: Memory):
        self.planner = planner
        self.reflector = reflector
        self.tools = tools
        self.memory = memory

    def step(self, goal: str) -> Dict[str, Any]:
        # Plan: ask the planner for the next tool call, given recent memory
        plan = self.planner(goal, self.memory.last())
        tool_name = plan.get("tool")
        args = plan.get("args", {})
        # Act: run the chosen tool; fall back to a no-op for unknown tool names
        fallback = Tool("noop", lambda _: {"error": "Unknown tool", "done": False})
        result = self.tools.get(tool_name, fallback).run(args)
        # Observe: record the step in memory
        event = {"goal": goal, "plan": plan, "result": result}
        self.memory.add(event)
        # Reflect: summarize recent events to steer the next step
        feedback = self.reflector(self.memory.last())
        return {"event": event, "feedback": feedback}

    def run(self, goal: str, max_steps: int = 5) -> List[Dict[str, Any]]:
        trace = []
        for _ in range(max_steps):
            trace.append(self.step(goal))
            if trace[-1]["event"]["result"].get("done"):
                break
        return trace
Key idea: keep the loop simple and pure. Inject model/planner/reflector functions rather than hard-coding vendor calls.
Tools: Keep Interfaces Consistent
Every tool takes a params dict and returns a result dict with an optional "done" flag, so the loop can treat all tools uniformly:
from typing import Dict, Any

def search_tool(params: Dict[str, Any]) -> Dict[str, Any]:
    query = params.get("query", "")
    # Replace with your search implementation (API, vector DB, etc.)
    return {"items": [f"Result for: {query}"], "done": False}

def write_file_tool(params: Dict[str, Any]) -> Dict[str, Any]:
    path = params.get("path")
    content = params.get("content", "")
    if not path:
        return {"error": "Missing path", "done": False}
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(content)
        return {"ok": True, "done": True}
    except Exception as e:
        return {"error": str(e), "done": False}
Plugging in LLMs for Planning and Reflection
Use any LLM provider; what matters is keeping the contract shape: the planner returns a tool-plus-args plan, and the reflector returns a short feedback string.
from typing import List, Dict, Any

def planner_llm(goal: str, recent_events: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Prompt craft is omitted; produce a {"tool": ..., "args": ...} plan
    # Simple heuristic plan (replace with an LLM call)
    if "write" in goal.lower():
        return {"tool": "write_file", "args": {"path": "output.txt", "content": goal}}
    return {"tool": "search", "args": {"query": goal}}

def reflector_llm(recent_events: List[Dict[str, Any]]) -> str:
    # Summarize last results and propose improvements
    return f"Reflect: {len(recent_events)} events processed. Consider narrowing the query or validating outputs."
Wire It Up
from agent_loop import Agent, Memory, Tool
from tools import search_tool, write_file_tool
from llm_adapters import planner_llm, reflector_llm

def build_agent() -> Agent:
    tools = {
        "search": Tool("search", search_tool),
        "write_file": Tool("write_file", write_file_tool),
    }
    memory = Memory()
    return Agent(planner_llm, reflector_llm, tools, memory)

if __name__ == "__main__":
    agent = build_agent()
    trace = agent.run("Write a short note about agent patterns", max_steps=3)
    for t in trace:
        print(t)
Multi-Agent Pattern: Coordinator + Specialists
When tasks are complex, split into roles: Planner, Researcher, Implementer, Reviewer. The coordinator decomposes, routes, and reconciles.
from typing import Dict, Any, List
from agent_loop import Agent, Memory

class Coordinator:
    def __init__(self, agents: Dict[str, Agent]):
        self.agents = agents

    def run(self, goal: str) -> List[Dict[str, Any]]:
        # Naive decomposition; replace with an LLM planner
        subtasks = [
            {"role": "researcher", "goal": f"Find info: {goal}"},
            {"role": "implementer", "goal": f"Draft output for: {goal}"},
            {"role": "reviewer", "goal": f"Check draft for: {goal}"},
        ]
        trace = []
        for st in subtasks:
            # Route each subtask to its specialist; default to the implementer
            agent = self.agents.get(st["role"]) or self.agents.get("implementer")
            trace.append(agent.run(st["goal"], max_steps=2))
        return trace

def make_specialist(planner, reflector, tools) -> Agent:
    return Agent(planner, reflector, tools, Memory())
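To replace the naive decomposition, the coordinator can ask an LLM to emit subtasks as JSON. A sketch under the same assumption of a hypothetical complete(prompt) -> str callable; the roles mirror the hard-coded list above:

import json
from typing import Callable, Dict, Any, List

def decompose(goal: str, complete: Callable[[str], str]) -> List[Dict[str, Any]]:
    prompt = (
        "Split the goal into subtasks. Reply with a JSON list of "
        "{\"role\": one of [researcher, implementer, reviewer], \"goal\": str}.\n"
        f"Goal: {goal}"
    )
    try:
        subtasks = json.loads(complete(prompt))  # hypothetical provider call
    except (json.JSONDecodeError, TypeError):
        subtasks = []
    # Keep only well-formed subtasks; fall back to a single implementer task
    subtasks = [st for st in subtasks
                if isinstance(st, dict) and "role" in st and "goal" in st]
    return subtasks or [{"role": "implementer", "goal": goal}]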
Use Case Templates
Typical use-case templates include:
RAG agents: Retrieval-Augmented Generation using chunking, embeddings, and retrievers.
ReAct agents: Emphasizing step-by-step reasoning and tool use.
Text extraction agents: Focused on parsing documents and transforming unstructured data.
Example: a generic RAG tool for an agent.
from typing import Dict, Any, List

def rag_query(params: Dict[str, Any]) -> Dict[str, Any]:
    question = params.get("question", "")
    # Plug in your embedder, vector store, and reader components:
    # docs = retriever.search(question)
    # answer = reader.synthesize(question, docs)
    docs: List[str] = ["Doc A", "Doc B"]
    answer = f"Answer synthesized for: {question} using {len(docs)} docs"
    return {"answer": answer, "sources": docs, "done": True}
Then mount it as a tool:
from agent_loop import Agent, Memory, Tool
from llm_adapters import planner_llm, reflector_llm
from simple_rag_tool import rag_query
tools = {"rag": Tool("rag", rag_query)}
agent = Agent(planner_llm, reflector_llm, tools, Memory())
trace = agent.run("What is agentic RAG?", max_steps=1)
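The retrieval side of a RAG template usually starts with chunking. A minimal fixed-size chunker with overlap; the sizes are illustrative, so tune them for your embedder's context window:

from typing import List

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> List[str]:
    # Slide a fixed-size window across the text; the overlap preserves
    # context that would otherwise be cut at chunk boundaries.
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks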
Guardrails and Safety
Input validation on tools (types, ranges, allowlists); a combined guardrail sketch follows this list.
Sandboxed execution for file/network operations.
Rate limiting and circuit breakers for external APIs.
Observability: structured logs and traces per agent step.
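A minimal sketch combining the first and last items: an allowlist check, a crude per-tool rate limit, and one structured log line per call. The interval and logger name are illustrative:

import logging
import time
from typing import Callable, Dict, Any

logger = logging.getLogger("agent.tools")

def guarded(name: str, runner: Callable[[Dict[str, Any]], Dict[str, Any]],
            allowed_params: set, min_interval_s: float = 1.0):
    last_call = {"t": 0.0}

    def wrapper(params: Dict[str, Any]) -> Dict[str, Any]:
        # Allowlist: reject unexpected params instead of passing them through
        unexpected = set(params) - allowed_params
        if unexpected:
            return {"error": f"Unexpected params: {sorted(unexpected)}", "done": False}
        # Rate limit: refuse calls that arrive too quickly
        now = time.monotonic()
        if now - last_call["t"] < min_interval_s:
            return {"error": "Rate limited", "done": False}
        last_call["t"] = now
        result = runner(params)
        # Observability: one structured log line per tool call
        logger.info("tool=%s params=%s ok=%s", name, params, "error" not in result)
        return result
    return wrapper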
Testing Strategy
Test agents like workflows:
Unit-test tools with deterministic inputs/outputs.
Mock LLM planners/reflectors to stabilize tests.
Scenario tests for end-to-end goals (success criteria + timeouts); a sketch follows the unit test below.
from agent_loop import Agent, Memory, Tool

def planner_stub(goal, _):
    return {"tool": "echo", "args": {"text": goal}}

def reflector_stub(_):
    return "reflect"

def echo_tool(params):
    return {"echo": params.get("text", ""), "done": True}

def test_agent_runs_one_step():
    tools = {"echo": Tool("echo", echo_tool)}
    agent = Agent(planner_stub, reflector_stub, tools, Memory())
    trace = agent.run("hello", max_steps=3)
    assert trace[-1]["event"]["result"].get("done") is True
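Scenario tests layer a success criterion and a time budget on top of the same stubs. A sketch using a wall-clock budget; the 2-second limit is arbitrary:

import time
# Reuses Agent, Memory, Tool and the stubs defined above

def test_goal_completes_within_budget():
    tools = {"echo": Tool("echo", echo_tool)}
    agent = Agent(planner_stub, reflector_stub, tools, Memory())
    start = time.monotonic()
    trace = agent.run("hello", max_steps=5)
    elapsed = time.monotonic() - start
    # Success criterion: the goal finished, and within the time budget
    assert trace[-1]["event"]["result"].get("done") is True
    assert elapsed < 2.0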
Deployment Tips
Package agents as stateless workers with externalized memory (DB/object store); see the memory sketch after this list.
Use queues for long-running tasks; record step traces for resumability.
Keep prompts modular and versioned; migrate gradually.
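Externalizing memory can be as simple as swapping the in-process Memory for a store-backed class with the same add/last interface. A SQLite sketch using only the standard library; swap it for your DB or object store in production:

import json
import sqlite3
from typing import Dict, Any, List

class SqliteMemory:
    # Same interface as Memory, but durable across worker restarts
    def __init__(self, path: str = "memory.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, body TEXT)"
        )

    def add(self, event: Dict[str, Any]):
        self.conn.execute("INSERT INTO events (body) VALUES (?)",
                          (json.dumps(event),))
        self.conn.commit()

    def last(self, n: int = 5) -> List[Dict[str, Any]]:
        rows = self.conn.execute(
            "SELECT body FROM events ORDER BY id DESC LIMIT ?", (n,)
        ).fetchall()
        # Reverse so events come back in chronological order
        return [json.loads(body) for (body,) in reversed(rows)]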
Wrap-Up
Agentic systems shine when you architect for reliability, testability, and observability. Start with a clean loop, consistent tool interfaces, memory separation, and optional multi-agent coordination. Then plug in your LLM vendor and domain-specific tools. Template folders for agents (e.g., foundation, RAG, text extraction) give you a solid starting point to adapt.
10 Open-Source Agent Projects to Explore
Here are widely used, open-source agent frameworks and projects you can learn from and adapt. Each highlights different patterns: planning, tool use, multi-agent collaboration, memory, and orchestration.
Auto-GPT — Autonomous task-driven agent built on GPT models; showcases long-horizon planning and tool use. Link: https://github.com/Significant-Gravitas/AutoGPT
BabyAGI — Lightweight task management loop (create, prioritize, execute) with vector memory; great for understanding minimal agent cycles. Link: https://github.com/yoheinakajima/babyagi
Microsoft AutoGen — Framework for multi-agent conversations and collaboration with tooling and customization; strong for role-based agent teams. Link: https://github.com/microsoft/autogen
CrewAI — Python framework for multi-agent workflows with roles, tools, and processes; emphasizes structured collaboration. Link: https://github.com/joaomdmoura/crewai
LangGraph — Graph-based orchestration for agent loops, memory, and control; ideal for building reliable, inspectable agent pipelines. Link: https://github.com/langchain-ai/langgraph
LangChain Agents — Tool-using agents with planners, executors, and memory; integrates with a vast ecosystem of tools and vector DBs. Link: https://python.langchain.com/docs/modules/agents
OpenAI Agents SDK — Defines agents with tools and resources and handles orchestration; useful for standardized tool schemas and governance. Link: https://github.com/openai/openai-agents-python
CAMEL — Role-playing multi-agent framework with task decomposition and negotiation; useful for research on collaboration dynamics. Link: https://github.com/camel-ai/camel
AgentGPT (Web) — Browser-based autonomous agent setup for quick experiments; helpful to visualize prompts and iterative action loops. Link: https://github.com/reworkd/AgentGPT
ReAct Pattern Implementations — Combine reasoning traces with tool actions; many open implementations are available for studying prompt design and action validation. Link: https://arxiv.org/abs/2210.03629
Use these as references to pressure-test your design choices: planning reliability, tool APIs, memory schema, observability, and recovery strategies.
About the Author
Written by Suraj Khaitan — Gen AI Architect | Working on serverless AI & cloud platforms.