Mukunda Rao Katta

Posted on May 25

Why I Built 50 Small Libraries Instead of One Big Agent Framework

#hermeschallenge #ai #python #agents

Every serious agent project I have worked on ends up in the same place six months after launch. The team picked a big framework early, moved fast, and then hit the wall. The framework does not support the exact retry behavior you need. The egress filter is baked in at the wrong layer. The context management assumes a conversation structure your use case does not have. You file issues. Maintainers are busy. You monkey-patch things. The monkey patches accumulate. Now you own a fork of a framework you do not fully understand.

I wanted to find out if there was a better way. The bet: fifty small libraries, each solving exactly one problem, with zero inter-dependencies, installable in any combination. If it works, you get a stack that composes cleanly and can be swapped out piece by piece.

The Five Layers

Agent infrastructure breaks into five layers. Every project needs some subset of all five.

Safety is about keeping the agent from doing things it should not do. This includes arg validation before tool execution, egress allowlists that block requests to unauthorized hosts, prompt injection detection, and secret scrubbing from logs.

Observability is about knowing what the agent did and what it cost. Token counts, USD per run, per-step logs, trace snapshots, and decision rationale all live here.

Reliability is about keeping the agent running when things go wrong. Provider failover, retry with jitter, circuit breakers, loop detection, and time limits per run are the main pieces.

Context is about managing what goes into the prompt. Context window fitting, message window with paired tool_use/tool_result management, rolling history rotation, and token estimation live here.

Tool Infrastructure is about making tool calls safe and debuggable. This covers timeout enforcement, call deduplication, result caching, schema generation from function signatures, and side-effect tagging.

These five layers are genuinely independent. You can have great reliability and terrible safety. You can have excellent observability and no context management at all. A framework that bundles all five forces you to adopt its opinions about every layer even when you only care about one.

Single Responsibility in Practice

Here is what "one responsibility" looks like concretely. Take tool-loop-guard. It does exactly one thing: it tracks which tools have been called in the current session and raises an exception when the same tool is called with the same args more than N times in a sliding window. The entire public API is:

from tool_loop_guard import LoopGuard

guard = LoopGuard(max_calls=3, window=10)

def my_tool(query: str) -> str:
    guard.check("my_tool", {"query": query})
    # ... actual tool logic
    return result

That is it. No config file. No class inheritance. No plugin registration. If you want loop detection, you add two lines. If you decide loop detection is wrong for your use case, you remove two lines. Nothing else changes.

Compare this to how a framework handles the same feature. The framework has a ToolExecutor class. The loop detection is a policy hook. You configure it through a YAML file or a builder pattern. If you want custom behavior, you subclass BaseLoopPolicy. The test for loop detection mocks five framework internals. Six months later, the framework updates ToolExecutor and your subclass breaks.

Small libs do not have this problem because there is nothing to inherit from and nothing to mock.

Zero Dependencies is a Feature

Every library in this stack has zero runtime dependencies outside the standard library (or, for Rust ports, zero external crates beyond what the algorithm requires). This is intentional.

When a library has dependencies, your project inherits version conflicts. When agentguard requires httpx>=0.24, and your main app requires httpx==0.23.3 for a different reason, you have a problem. Multiply this by fifty libraries and you have an unsolvable dependency resolution puzzle.

Zero deps means pip install agentguard works every time, in every environment, with no conflicts. The library brings no transitive surface.

The API Contract Is Smaller

A framework's API surface is large by necessity. It has to support every use case it covers. You learn the framework, not just the feature you need.

A single-responsibility library has a small API because it only does one thing. llm-cost-cap exposes three things: a CostCap class, a check() method, and a CostLimitExceeded exception. That is the entire public surface. You can read the README in three minutes and understand it completely. You can audit the source in ten minutes.

This matters for security-sensitive components. If agentguard is enforcing your egress allowlist, you want to be able to audit it. A 200-line library with no dependencies is auditable. A 20,000-line framework is not.

What This Does NOT Solve

This approach does not solve orchestration. If you need a high-level agent loop that decides which tool to call next, these libraries do not provide that. You still need to write the loop, or use a framework for that layer specifically.

It does not solve long-term memory. Retrieval-augmented memory, episodic memory, and semantic search are different problems that require infrastructure (a vector database, embedding models, chunking strategies). None of the fifty libraries in this stack address that.

It does not solve discovery. If your agent needs to discover available tools at runtime from a registry, you need a registry. These libraries assume you already know what tools you have.

It also does not solve the cold-start problem. If you are starting an agent project from scratch and have no idea how to structure it, a framework gives you a starting point. These libraries assume you already have a structure and want to add specific capabilities to it.

When This Applies

Use small composable libs when you have a specific production failure to prevent or a specific capability to add. The pattern works best when you know what you need. "We are getting infinite loops in production" means tool-loop-guard. "We are spending too much per run" means llm-cost-cap and agenttrace. "We keep leaking API keys in logs" means tool-secret-scrubber.

Use a framework when you are starting from scratch and want a full agent scaffold. Frameworks are better for getting something working fast. They become liabilities when your requirements diverge from the framework's opinions.

The two approaches are not mutually exclusive. You can use a framework for the high-level loop and small libs for specific safety or reliability concerns that the framework does not handle well.

Quick Start

pip install agentvet agentguard agentcast agenttrace

from agentvet import ArgValidator, schema_from_fn
from agentguard import EgressGuard
from agentcast import StructuredOutput
from agenttrace import RunTracer

# Validate tool args before execution
validator = ArgValidator(schema_from_fn(my_tool))
validator.validate(tool_args)

# Block requests to unauthorized hosts
guard = EgressGuard(allowed=["api.openai.com", "api.anthropic.com"])
guard.check(target_url)

# Enforce structured output
output = StructuredOutput(schema=MySchema).parse(llm_response)

# Track cost per run
with RunTracer() as tracer:
    result = run_agent()
    print(tracer.total_usd)

The Five Layers, Library by Library

Layer	Problem	Library
Safety	Arg validation before tool exec	agentvet
Safety	Egress allowlist	agentguard
Safety	Prompt injection detection	prompt-shield
Safety	Secret scrubbing from logs	tool-secret-scrubber
Safety	PII redaction	llm-pii-redact
Observability	Per-run cost tracking	agenttrace
Observability	Trace snapshots	agentsnap
Observability	Decision log	agent-decision-log
Reliability	Provider failover	llm-fallback-router
Reliability	Retry with jitter	llm-retry-py
Reliability	Circuit breaker	llm-circuit-breaker-py
Reliability	Loop detection	tool-loop-guard
Reliability	Time limit per run	agent-deadline
Context	Context window fitting	agentfit
Context	Message window management	agent-message-window
Tool Infra	Per-tool timeout	tool-timeout-wrap
Tool Infra	Result caching	tool-result-cache
Tool Infra	Schema from function	tool-schema-from-fn

What Is Next

The missing piece right now is a lightweight registry that lets you declare which libs are active for a given agent run, and a shared context object that lets them coordinate without being coupled. Something like a run-scoped dependency container that each lib can read from without importing each other.

If you are building agents and hitting specific production problems, check the GitHub org at MukundaKatta. Most of these are already published. The ones that are pending PyPI publication have the source available and can be installed from git in the meantime.

The Hermes Agent Challenge sprint was the forcing function for building all fifty at once. The next step is to pick the ones that are solving real problems in real codebases and invest more deeply in those.

Top comments (1)

Mehmet Can Farsak • Jun 14

Completely agree with the small-libs-over-frameworks approach. The composability wins every time once you're past the hello-world stage. I built Brainstorm-Mode (mehmetcanfarsak on GitHub) following this philosophy — single-purpose plugin that plugs into the hook system, adds three ideation modes (divergent, actionable, academic), and blocks tool calls during brainstorming. Zero dependencies, drops in via the marketplace, doesn't touch anything else. The kind of library you add or remove without a second thought.