Your Coding Agent Doesn't Need a Bigger Context Window. It Needs Coworkers.

#python #ai #opensource #programming

I've been building with Claude Code for months. It's genuinely impressive — until your codebase gets big enough that the agent starts drowning in its own context.

The 1M-token context window sounds huge. But feed it a real project (a few hundred files, import chains six layers deep, config scattered across YAML and env files) and you start hitting walls. Responses slow down. Quality degrades. Costs climb. The model spends a large share of its tokens reading code it doesn't need for your question.

The instinct is to wait for bigger context windows. But I think that's the wrong fix. A smarter CTO doesn't mean the CTO should personally review every line of code.

So I built AgentHub: a lightweight Python library that splits your codebase into specialized agents, routes queries to the right one, and lets them collaborate when a question spans multiple domains.

Here's how it works, layer by layer.

The Problem in Practice

Say you're working on a Django project. You ask your coding agent: "How does the payment flow work?" The agent now has to process your auth middleware, your URL routing, your serializers, your payment service, your Stripe webhooks, your database models, your test fixtures — all of it loaded into one context window, most of it irrelevant.

The real answer lives in maybe 4-5 files across 2-3 modules. But the agent doesn't know that upfront. So it ingests everything, burns through tokens, and gives you a diluted answer because it's trying to hold too many things in its head at once.

This is a routing problem, not a capacity problem.

Layer 1: Auto-Generated Domain Agents

The first insight is that your codebase already tells you where the natural boundaries are. Folders map to domains. Files map to responsibilities. Import graphs map to dependencies. Why not use that structure to create agents automatically?

AgentHub's build command scans your repo and generates what I call Tier B agents — one per logical domain, each preloaded with only the files it owns.

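from agenthub import AgentHub
from agenthub.auto import discover_all_agents

# Scan the repo and build one Tier B agent per logical domain
hub, summary = discover_all_agents("./my-project")

# The hub routes the question to the best-matching agent
response = hub.run("How does user authentication work?")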

Under the hood, the build step does three things. First, it walks your file tree and maps the structure — which folders exist, how big they are, what languages they use. Second, it builds an import graph by parsing import statements across your codebase, so it knows that api/views.py depends on services/auth.py which depends on models/user.py. Third, it uses those two signals to create focused agents, each with a context window loaded only with the files in its domain.
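To make the import-graph step concrete, here's a minimal sketch using Python's standard ast module. This is not AgentHub's actual code: the function names (module_name, build_import_graph, group_by_top_level_folder) are made up for illustration, and the sketch assumes a pure-Python repo, handles absolute imports only, and treats each top-level folder as one domain.

```python
import ast
from pathlib import Path


def module_name(path: Path, root: Path) -> str:
    """Turn a file path into a dotted module name relative to root."""
    parts = path.relative_to(root).with_suffix("").parts
    if parts[-1] == "__init__":
        parts = parts[:-1]  # a package's __init__.py stands for the package itself
    return ".".join(parts)


def build_import_graph(root: str) -> dict[str, set[str]]:
    """Map each local module to the set of local modules it imports."""
    root_path = Path(root)
    files: dict[str, Path] = {}
    for path in root_path.rglob("*.py"):
        name = module_name(path, root_path)
        if name:  # skip an __init__.py sitting directly in the root
            files[name] = path

    graph: dict[str, set[str]] = {name: set() for name in files}
    for name, path in files.items():
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # leave unparseable files in the graph with no edges
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                targets = [node.module]  # relative imports are ignored in this sketch
            else:
                continue
            # Keep only edges that point at files inside the repo
            graph[name].update(t for t in targets if t in files and t != name)
    return graph


def group_by_top_level_folder(graph: dict[str, set[str]]) -> dict[str, list[str]]:
    """Assign each module to a domain named after its first path component."""
    domains: dict[str, list[str]] = {}
    for module in sorted(graph):
        domains.setdefault(module.split(".")[0], []).append(module)
    return domains
```

On the example from above, build_import_graph would record an edge from api.views to services.auth and another from services.auth to models.user, while standard-library and third-party imports are dropped because they don't resolve to files in the repo.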

When a query comes in, a router scores it against each agent's keywords and domain, and dispatches to the best match. The src/api/ agent answers API questions. The src/models/ agent answers schema questions. Each agent reads only its own files, so far fewer tokens are wasted.
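Keyword scoring can be as simple as token overlap. The sketch below is an illustration under that assumption, not the router AgentHub actually uses: AgentProfile, score, route, and the threshold value are all hypothetical names and numbers.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentProfile:
    name: str
    keywords: set[str] = field(default_factory=set)


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into word-like tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def score(query: str, agent: AgentProfile) -> float:
    """Fraction of query tokens that appear in the agent's keywords."""
    tokens = tokenize(query)
    if not tokens:
        return 0.0
    return len(tokens & agent.keywords) / len(tokens)


def route(query: str, agents: list[AgentProfile], threshold: float = 0.1) -> Optional[AgentProfile]:
    """Return the best-scoring agent, or None if no agent clears the threshold."""
    best = max(agents, key=lambda a: score(query, a), default=None)
    if best is None or score(query, best) < threshold:
        return None  # the caller can fall back to a general agent or a team
    return best
```

Returning None below a threshold matters more than it looks: a router that always picks something will confidently send vague questions to the wrong agent.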

There's also a Tier A layer — hand-written agents for business logic that can't be inferred from code structure. A pricing agent that knows your margin rules. An analytics agent that knows your KPI definitions. These are optional but powerful when you need domain expertise that lives in people's heads, not in source files.

Layer 2: DAG Teams for Cross-Cutting Questions

Single-agent routing works great for scoped questions ("What does the UserSerializer class do?"). It breaks down for cross-cutting ones ("How does the checkout flow work end to end?"), where the answer lives across your API layer, your service layer, your models, and your payment integration.

For these, AgentHub spins up a DAG team. The process works like this: a complexity classifier detects that the query spans multiple domains. A decomposer identifies which agents are relevant. The import graph provides the dependency edges between them. Independent agents execute in parallel, dependent ones run in sequence, and a synthesizer merges their results into a coherent answer.

The key decision here was making the DAG dynamic rather than hardcoded. It's built per-query from the import graph, so the collaboration structure reflects how your code actually connects, not how you think it connects.
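The "parallel where independent, sequential where dependent" scheduling can be sketched with the standard library's graphlib. Everything here is an assumption for illustration: the depends_on mapping (agent name to the agents it depends on), the ask and synthesize callables, and the choice to run upstream agents first and pass their answers downstream as context.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Callable


def execution_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group agents into waves; every agent in a wave can run in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are cyclic
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def run_team(
    query: str,
    depends_on: dict[str, set[str]],
    ask: Callable[[str, str, dict[str, str]], str],
    synthesize: Callable[[str, dict[str, str]], str],
) -> str:
    """Run each wave in parallel, feeding upstream answers to dependents."""
    answers: dict[str, str] = {}
    with ThreadPoolExecutor() as pool:
        for wave in execution_waves(depends_on):
            futures = {
                agent: pool.submit(
                    ask,
                    agent,
                    query,
                    # Context is the set of answers from this agent's dependencies
                    {dep: answers[dep] for dep in depends_on.get(agent, ())},
                )
                for agent in wave
            }
            for agent, future in futures.items():
                answers[agent] = future.result()
    return synthesize(query, answers)
```

One caveat: real import graphs often contain cycles, and prepare() will raise on them, so a production version would need to collapse cyclic modules into a single node first.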

Layer 3: Parallel Sessions

This is where it gets ambitious. When a developer says "Add a save button to the toolbar and also build a time series chart component," those are two independent tasks touching completely different files. Today, coding agents handle them sequentially. But why?

AgentHub's parallel sessions layer decomposes multi-part requests into tasks, analyzes file overlap using the import graph, and when tasks are independent, spins up separate Claude Code sessions on separate git branches. Each session is scoped — it only sees the files relevant to its task. When all sessions complete, a merge coordinator brings the branches together, running tests to catch semantic conflicts that git can't detect.
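Here's one way the overlap check could look. This is a sketch under assumptions, not AgentHub's implementation: it assumes the decomposer has already mapped each task to a few seed files, and it treats a task's footprint as those seeds plus everything they transitively import. Function names are hypothetical.

```python
def reachable_files(seeds: list[str], import_graph: dict[str, set[str]]) -> set[str]:
    """Seed files plus everything they transitively import."""
    seen: set[str] = set()
    stack = list(seeds)
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(import_graph.get(current, ()))
    return seen


def group_independent_tasks(
    task_files: dict[str, list[str]],
    import_graph: dict[str, set[str]],
) -> list[list[str]]:
    """Group tasks whose footprints overlap; separate groups can run in parallel."""
    groups: list[tuple[set[str], set[str]]] = []  # (task names, files touched)
    for task, seeds in task_files.items():
        merged_tasks = {task}
        merged_files = reachable_files(seeds, import_graph)
        remaining = []
        for tasks, files in groups:
            if files & merged_files:
                # Shared files: fold the existing group into the new one
                merged_tasks |= tasks
                merged_files |= files
            else:
                remaining.append((tasks, files))
        groups = remaining + [(merged_tasks, merged_files)]
    return [sorted(tasks) for tasks, _ in groups]
```

This is deliberately conservative: two tasks that merely read the same utility module end up in one group and run sequentially. Distinguishing files a task will edit from files it only reads would allow more parallelism, at the cost of more guesswork up front.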

I think of this as the company model. When a startup is two people, the CTO (your coding agent) knows everything. As the company grows, you hire specialists (Tier B agents). When departments get big, you organize teams (DAG teams). And when there's a company-wide initiative, teams work simultaneously on their own tracks, syncing at boundaries.

The merge step is where it gets tricky. Textual conflicts are easy — git handles those. Semantic conflicts are harder: two branches that merge cleanly but break each other's assumptions. For those, AgentHub runs the test suite post-merge. If tests fail, a resolution agent examines the failure, traces it back to the conflicting changes, and either fixes it or escalates to the developer.
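The control flow of that merge step can be sketched roughly like this. It is an illustration, not AgentHub's merge coordinator: MergeResult, merge_and_test, and the pytest command are assumptions, and running the tests after every single merge (rather than once at the end) is a choice made here so that a failure can be pinned on one branch.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], int]  # runs a command, returns its exit code


def shell_runner(cwd: str) -> Runner:
    """Build a runner that executes commands inside the given repository."""
    def run(cmd: Sequence[str]) -> int:
        return subprocess.run(list(cmd), cwd=cwd).returncode
    return run


@dataclass
class MergeResult:
    merged: list[str]
    status: str  # "ok", "textual_conflict", or "semantic_conflict"
    failed_branch: Optional[str] = None


def merge_and_test(
    branches: Sequence[str],
    run: Runner,
    test_cmd: Sequence[str] = ("pytest", "-q"),
) -> MergeResult:
    """Merge branches one at a time into the current branch, testing after each."""
    merged: list[str] = []
    for branch in branches:
        if run(["git", "merge", "--no-ff", "--no-edit", branch]) != 0:
            run(["git", "merge", "--abort"])  # restore the pre-merge state
            return MergeResult(merged, "textual_conflict", branch)
        if run(list(test_cmd)) != 0:
            # The merge was clean but the tests broke: a semantic conflict.
            # The merge commit is left in place for a resolution step to inspect.
            return MergeResult(merged, "semantic_conflict", branch)
        merged.append(branch)
    return MergeResult(merged, "ok")
```

Passing the runner in as a parameter keeps the decision logic testable without a real repository; in real use you would call merge_and_test(branches, shell_runner("/path/to/repo")) from the integration branch.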

Where It Stands

AgentHub is at v0.1.0. The core auto-agent generation, DAG teams, and parallel session infrastructure are all implemented. It ships as a Python CLI and integrates with Claude Code via MCP, so the agents show up as tools right in your terminal.

The honest status: auto-agent generation works well — point it at a repo and you get useful agents. DAG teams work but need tuning on the decomposer prompts. The routing layer — deciding which agent gets a query — still needs some manual tweaking to perform reliably across different codebases. And parallel sessions are the most experimental; the git orchestration is solid, but semantic conflict resolution is still rough around the edges.

I'm not open-sourcing this yet. There are enough rough edges that I want to get the routing and agent quality to a place I'm happy with before putting it out there. But I'm sharing the architecture because I think the core ideas (using import graphs for agent boundaries, DAG-based collaboration, the company growth model) are useful regardless of implementation.

If you're building with coding agents and hitting context limits, I'd love to hear how you're solving it.

Built by John. The entire AgentHub codebase was developed collaboratively with Claude — which felt appropriately recursive.