No manual tuning. No architecture redesign. Just a plain-English instruction and a feedback loop.
## The Setup
I maintain KISS, a minimalist multi-agent framework built on one principle: keep it simple, stupid. The framework's flagship coding agent, RelentlessCodingAgent, is a single-agent system with smart auto-continuation — it runs sub-sessions of an LLM-powered coding loop, tracks progress across sessions, and keeps hammering at a task until it succeeds or exhausts its budget. The agent was self-evolved to run relentlessly.
It works. But it was expensive. A single run with Claude Sonnet 4.5 cost $3–5 and took 600–800 seconds. For an agent framework that preaches simplicity and efficiency, that felt like hypocrisy.
So I built a 69-line Python script and told it, in plain English, to fix the problem.
## The Tool: repo_optimizer.py
The entire optimizer is a RelentlessCodingAgent pointed at its own source code. Here is the core of it:
```python
from kiss.agents.coding_agents.relentless_coding_agent import RelentlessCodingAgent

TASK = """
Your working directory is {work_dir}.
Can you run the command {command}
in the background so that you can monitor the output in real time,
and correct the code in the working directory if needed? I MUST be able to
see the command output in real time.
If you observe any repeated errors in the output or the command is not able
to complete successfully, please fix the code in the working directory and run the
command again. Repeat the process until the command can finish successfully.
After the command finishes successfully, run the command again
and monitor its output in real time. You can add diagnostic code which will print
metrics {metrics} information at a finer level of granularity.
Check for opportunities to optimize the code
on the basis of the metrics information---you need to minimize the metrics.
If you discover any opportunities to minimize the metrics based on the code
and the command output, optimize the code and run the command again.
Note down the ideas you used to optimize the code and the metrics you achieved in a file,
so that you can use the file to not repeat ideas that have already been tried and failed.
You can also use the file to combine ideas that have been successful in the past.
Repeat the process. Do not forget to remove the diagnostic
code after the optimization is complete....
"""

agent = RelentlessCodingAgent("RepoAgent")
result = agent.run(
    prompt_template=TASK,
    arguments=...,  # elided in the original post
    model_name="claude-opus-4-6",
    work_dir=PROJECT_ROOT,
)
```
That's it. The agent runs itself, watches the output, diagnoses problems, edits its own code, and runs itself again — in a loop — until the numbers drop.
No gradient descent. No hyperparameter grid search. No reward model. Just an LLM reading logs and rewriting source files.
## What the Optimizer Actually Does
The feedback loop works like this:
1. Run the target agent on a benchmark task and capture the output.
2. Monitor the logs in real time. If the agent crashes or hits repeated errors, fix the code and rerun.
3. Analyze a successful run: wall-clock time, token count, dollar cost.
4. Optimize the source code using strategies specified in plain English — compress prompts, switch models, eliminate wasted steps.
5. Repeat until the metrics plateau or the target reduction is hit.
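The skeleton of that loop is simple. Here is a minimal sketch in plain Python; `run_benchmark` and `parse_metrics` are hypothetical stand-ins for a real harness, and the code-fixing step (the LLM's job) is elided:

```python
def optimize(run_benchmark, parse_metrics, baseline, target_reduction=0.9, max_rounds=10):
    """Run, measure, keep the best result; stop at the target or the round cap."""
    best = dict(baseline)
    for _ in range(max_rounds):
        log = run_benchmark()          # steps 1-2: run the agent, capture output
        metrics = parse_metrics(log)   # step 3: time, tokens, dollars
        if metrics["cost"] < best["cost"]:
            best = metrics             # keep the cheapest successful run
        if best["cost"] <= baseline["cost"] * (1 - target_reduction):
            break                      # target reduction reached
    return best
```

In the real optimizer, the "optimize the source code" step between runs is the LLM editing files, not anything in this loop.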
The strategies themselves are just bullet points in the task prompt:
- Shorter system prompts that preserve meaning
- Remove redundant instructions
- Minimize conversation turns
- Batch operations, use early termination
- Search the web for agentic patterns that improve efficiency and reliability
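Because the strategies live in the prompt, changing the optimizer's behavior is string editing. A hedged sketch of how such bullets might be assembled into a task prompt (the `build_task` helper and the prompt shape are illustrative, not KISS's actual code):

```python
STRATEGIES = [
    "Shorter system prompts that preserve meaning",
    "Remove redundant instructions",
    "Minimize conversation turns",
    "Batch operations, use early termination",
]

def build_task(command, metrics):
    # Append the strategy bullets to a task prompt; swapping strategies
    # means editing this list, not rewriting any pipeline.
    bullets = "\n".join(f"- {s}" for s in STRATEGIES)
    return (
        f"Run {command}, monitor its output, and minimize {', '.join(metrics)}.\n"
        f"Strategies to consider:\n{bullets}"
    )
```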
The optimizer isn't hard-coded to apply any particular technique. It reads, reasons, experiments, and iterates. Which techniques it picks depends on what the logs reveal.
## The Results
After running overnight, the optimizer produced this report:
| Metric | Before (Claude Sonnet 4.5) | After (Gemini 2.5 Flash) | Reduction |
|---|---|---|---|
| Time | ~600–800s | 169.5s | ~75% |
| Cost | ~$3–5 | $0.12 | ~96–98% |
| Tokens | millions | 300,729 | massive |
All three benchmark tests passed after optimization: diamond dependency resolution, circular detection, and failure propagation.
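As a sanity check on the $0.12 figure: token cost is just tokens times price. The sketch below uses Gemini 2.5 Flash's prices ($0.30 input / $2.50 output per million tokens) and an assumed 280k/20k input/output split; only the 300,729 total comes from the report:

```python
def run_cost(in_tokens, out_tokens, in_price_per_m, out_price_per_m):
    # Dollar cost of one run: tokens (in millions) times price per million.
    return in_tokens / 1e6 * in_price_per_m + out_tokens / 1e6 * out_price_per_m

# ASSUMED split of the reported 300,729 total tokens.
flash = run_cost(280_729, 20_000, 0.30, 2.50)
```

With that split the estimate comes out around $0.13, in line with the reported $0.12.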
## What the Optimizer Changed
The optimizer made nine concrete modifications, all discovered autonomously:
- Model switch: Claude Sonnet 4.5 ($3/$15 per million tokens) to Gemini 2.5 Flash ($0.30/$2.50 per million tokens) — 10x cheaper input, 6x cheaper output.
- Compressed prompts: Stripped verbose `CODING_INSTRUCTIONS` boilerplate, shortened `TASK_PROMPT` and `CONTINUATION_PROMPT` without losing meaning.
- Added `Write()` tool: The original agent only had `Edit()`, which fails on uniqueness conflicts. Each failure wasted 2–3 steps. Adding `Write()` eliminated that.
- Stronger finish instruction: "IMMEDIATELY call finish once tests pass. NO extra verification." — stopped the agent from burning tokens on redundant confirmation runs.
- Bash timeout guidance: "set `timeout_seconds=120` for test runs" — prevented hangs on parallel bash execution.
- Bounded poll loops: "use bounded poll loops, never unbounded waits" — eliminated infinite-loop risks on background processes.
- Reduced `max_steps`: 25 down to 15. Forced the agent to be efficient. Still enough to complete the task.
- Simplified step threshold: Always `max_steps - 2` instead of a complex adaptive calculation.
- Removed `CODING_INSTRUCTIONS` import: Eliminated unnecessary token overhead loaded into every prompt.
None of these changes are exotic. Each one is obvious in hindsight. But together they compound into a 98% cost reduction. The point is that no human sat down and applied them — the optimizer discovered and validated each one through experimentation.
## Why This Works
The RelentlessCodingAgent is a general-purpose coding loop: it gets a task in natural language, has access to Bash, Read, Edit, and Write tools, and runs sub-sessions until it succeeds. The repo_optimizer.py simply reuses this same loop, pointed inward.
This is possible because of three properties of the KISS framework:
- Agents are just Python functions. There's no config ceremony or deployment pipeline. An agent is a class you instantiate and call `.run()` on. So an agent can instantiate and run another agent — or itself.
- Tools are just Python functions. `Bash()`, `Read()`, `Edit()`, `Write()` — plain functions with type hints. The agent calls them natively. No wrappers, no adapters.
- Tasks are just strings. The optimization strategy, the constraints, the success criteria — all expressed in the task prompt. Changing what the optimizer does means editing a paragraph, not rewriting a pipeline.
The result is a self-improving system built from the same primitives as every other KISS agent.
## The Bigger Picture: repo_agent.py
The optimizer is actually a specialization of an even simpler tool: repo_agent.py. This is a 28-line script that takes any task as a command-line argument and executes it against your project root:
```bash
uv run python -m kiss.agents.coding_agents.repo_agent "Add retry logic to the API client."
```
The repo agent and the repo optimizer share the same engine (RelentlessCodingAgent) and the same interface (a string). The only difference is the task. The optimizer's task happens to be "optimize this agent for speed and cost." It could just as easily be "add comprehensive test coverage" or "migrate from REST to GraphQL."
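A self-contained sketch of that string-only interface (the `agent_run` callable is injected here to keep the example runnable; in KISS it would be the RelentlessCodingAgent's `run`):

```python
def run_task(agent_run, argv):
    # The entire interface is one string: take the task from the command
    # line and hand it to whatever agent loop you have.
    task = argv[1]
    return agent_run(task)

# Example with a stub agent standing in for a real LLM loop:
result = run_task(lambda t: f"done: {t}", ["repo_agent", "Add retry logic."])
```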
The agents in KISS don't care what you ask them to do. They care about doing it relentlessly until it's done.
## Try It Yourself
First, [install KISS](https://github.com/ksenxx/kiss_ai/README.md). Then:

```bash
# Run the repo optimizer on your own codebase
uv run python -m kiss.agents.coding_agents.repo_optimizer

# Or give the repo agent any task in plain English
uv run python -m kiss.agents.coding_agents.repo_agent "Refactor the database layer for connection pooling."
```
The framework, the agents, and the optimizer are all open source: github.com/ksenxx/kiss_ai.
KISS is built by Koushik Sen. Contributions welcome.
