Fight Club

We built an AI coding agent you design step by step

You don't know what your AI coding agent actually does. There's a system prompt you've never read, written by someone who's not on your team. A tool restriction policy you can't inspect. A retry rule nobody documented. A refusal posture tuned by a trust-and-safety team that doesn't know you're testing your own honeypot. All of that, together, is the actual product you're paying for. The vendor just calls it the assistant.

You can swap the model. You can write a longer system message. You cannot change what the agent does.

ko makes the agent loop your code. Every task runs through a pipeline you compose, with the model, system prompt, tool allowlist, retry budget and transitions set per step. The Plan step can't rm. The Verify step can't write files. The Reviewer runs on a different provider than the Implementer ran on, so the second opinion is actually a second opinion.

$ ko "Add Stripe subscription billing with usage metering"
[PLAN]   3 models debating approach... consensus reached
[ACTION] Implementing: 6 files to create, 3 to modify
[VERIFY] Running tests... 14 passed, 0 failed
Done. 9 files changed, 847 lines. $0.03. 42 seconds.

That's the standard pipeline (KO_STANDARD_MULTILLM) on a typical SaaS task. Plan with three frontier models debating. Implement on whoever's strongest right now. Verify on something cheap, because verifying is reading. Save the JSON and ko uses it the next time you run.

What the loop actually looks like

Click "Graph" in the dashboard and you see the workflow your tasks run through. No hidden system prompt. No invisible retry policy. The whole state machine is in front of you.

Screenshot: a three-step pipeline rendered as a graph in the Knockout visual editor. Plan with role Planner and claude-opus-4.6, connecting to Implement with the implementer running claude-opus-4.6 plus gpt-5 plus gemini-2.5-pro in 30-second consensus mode, connecting to Verify with gemini-2.5-flash-lite. Edges show the transition conditions including default and tests_fail.

Each node is a step. Each edge is a transition with a condition. success goes here, tests_fail loops back there, security_issue jumps to a SecurityAudit step. The graph is the pipeline. The pipeline is JSON in your repo at .ko/pipeline.json. Drill into any node and you get this.

Screenshot: detail view of the Plan step. Role Planner, model claude-opus-4.6, temperature 0.3, retry 1. All 18 tool chips are visible but none selected, because this Planner is configured to think in text only. The transitions table routes success to Implement, failure to _done and default to Implement. Pre and post hook slots sit below, with the planner system prompt textarea at the bottom.

Model, temperature, retry budget, tool allowlist, pre and post hooks, full system prompt. All editable. All visible. The seven canonical roles compose with eighteen tools wired in today. Bash, Read, Write, Edit, Glob, Grep, WebFetch, WebSearch, Diff, Clipboard, Docker, SQL, Patch and LSP for the work, plus AskUser, KoPlan, StepResult and Classify for agent handoffs.

{
  "name": "Plan",
  "role": "planner",
  "models": ["claude-opus-4.6"],
  "temperature": 0.3,
  "maxRetries": 1,
  "tools": [],
  "systemPrompt": "You are Knockout in PLANNER mode. Your only job is to write a short implementation plan in text. You have no tools this turn. The next step (Implement) has the write/edit/bash tools and will execute your plan.",
  "transitions": [
    { "condition": "success", "goto": "Implement" },
    { "condition": "failure", "goto": "_done" },
    { "condition": "default", "goto": "Implement" }
  ]
}

That's a real Plan step. Notice "tools": []. The Planner is given no tools deliberately, so it has to think in plain text, and the next step does the actual work. There's no equivalent in any agent loop on the market.
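
To make the allowlist and the transitions table concrete, here is a minimal Python sketch of how a runner might read a step like the one above. The field names come from the Plan step JSON. The helper names (tool_allowed, next_step) and the resolution order (exact match first, then default, then _done) are assumptions for illustration, not ko's documented behavior.

```python
import json

# Trimmed copy of the Plan step shown above; only the fields this sketch reads.
PLAN_STEP = json.loads("""
{
  "name": "Plan",
  "tools": [],
  "transitions": [
    { "condition": "success", "goto": "Implement" },
    { "condition": "failure", "goto": "_done" },
    { "condition": "default", "goto": "Implement" }
  ]
}
""")


def tool_allowed(step: dict, tool: str) -> bool:
    """Hypothetical helper: a tool may run only if it is on the step's allowlist."""
    return tool in step["tools"]


def next_step(step: dict, outcome: str) -> str:
    """Hypothetical helper: exact condition match wins, otherwise the default edge."""
    default_target = "_done"  # assumed fallback when a step has no default edge
    for edge in step["transitions"]:
        if edge["condition"] == outcome:
            return edge["goto"]
        if edge["condition"] == "default":
            default_target = edge["goto"]
    return default_target


print(tool_allowed(PLAN_STEP, "Bash"))     # False: the Planner has no tools
print(next_step(PLAN_STEP, "failure"))     # _done
print(next_step(PLAN_STEP, "tests_fail"))  # Implement (no matching edge, so default)
```

With an empty allowlist every tool check fails, which is the whole point of the text-only Planner.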

Multi-model on any step

Multi-model in ko is a step-level setting. You apply it wherever you need it, with a strategy that fits the step.

  • consensus runs N models in parallel and auto-picks if their outputs agree above a similarity threshold. If they diverge, a judge model picks the winner.
  • fallback tries models sequentially until one succeeds. Cheap insurance for routine steps that occasionally hit a rate limit.
  • chain pipes one model's output into the next for sequential refinement.
  • present_all runs them in parallel and offers you a numbered choice in the CLI.

Stick a consensus step in front of Implement. Run verification through two verifiers to cross-check. Put a security-audit consensus on top of any edit step. Any step, any pipeline.
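
The consensus and fallback strategies above can be sketched in a few lines of Python. This is an illustration under stated assumptions: a "model" here is just a function from prompt to text, similarity is measured with difflib from the standard library, and the 0.8 threshold is made up. The post does not say how ko measures agreement or what threshold it uses.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, List, Sequence

Model = Callable[[str], str]  # hypothetical stand-in for a provider call


def consensus(prompt: str, models: Sequence[Model],
              judge: Callable[[List[str]], str], threshold: float = 0.8) -> str:
    """Auto-pick when every pair of outputs is similar enough, else ask the judge."""
    if not models:
        raise ValueError("consensus needs at least one model")
    # Called one after another here for brevity; the post describes parallel runs.
    outputs = [model(prompt) for model in models]
    agree = all(
        SequenceMatcher(None, a, b).ratio() >= threshold
        for a, b in combinations(outputs, 2)
    )
    return outputs[0] if agree else judge(outputs)


def fallback(prompt: str, models: Sequence[Model]) -> str:
    """Try models in order and return the first answer that does not raise."""
    last_error = None
    for model in models:
        try:
            return model(prompt)
        except Exception as exc:  # e.g. a rate limit from one provider
            last_error = exc
    raise RuntimeError("all models failed") from last_error
```

Swapping the similarity function or the judge changes the strategy without touching the step that uses it.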

Templates for the work you actually do

Nine pipelines ship out of the box. They cover most real workflows.

  • KO_oneStep is the default. One Implement step, full permissions. The pipeline equivalent of "just do the thing".
  • KO_economy runs Implement on DeepSeek V3 and Verify on Gemini Flash. Cheapest end to end.
  • KO_standard is the multi-model consensus pipeline from the demo above.
  • KO_consultant routes first. Trivial tasks skip straight to Implement. Complex ones get the full multi-model treatment.
  • KO_TDD enforces tests-first. WriteTest has to pass before Implement runs. Refactor cleans up. Verify confirms.
  • KO_SECURE adds a SecurityAudit step after Implement that loops back on any vulnerability found.
  • KO_consensus runs three frontier models in parallel with a judge picking the winner. Maximum quality, when you actually need it.
  • KO_review is read-only. Analyze, then Review, then Report. No writes. Good for code review on a branch you don't own.
  • KO_superpowers is the full-discipline pipeline. Brainstorm, Plan, WriteTest, Implement, CodeReview, Verify.

Fork any of them, save the JSON, and ko uses it the next time you run.

Guardrails are configurable categories

This is where ko differs hardest from every other agent on the market. Every other tool ships one refusal posture, one set of safety rules, tuned to a median user, applied identically to every task. ko gives you 15 guardrail categories you set per pipeline.

Categories include refusal_posture, anti_hallucination, anti_fabrication, objectivity, tone_verbosity, destructive_ops, credentials, code_file_discipline, professional_disclaimers, citation_format, profanity and others. Each one has levels: strict, normal, off, plus extras like lenient for refusal_posture and terse or verbose for tone_verbosity.

So KO_SECURE on a production migration runs destructive_ops at strict. KO_legal on a contract review runs professional_disclaimers at strict. KO_review for code review on someone else's branch sets refusal_posture to lenient so the model doesn't refuse to comment on the questionable patterns it finds. The guardrails reflect the task you're running, not a median user across all possible tasks.
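
As a rough sketch of what per-pipeline guardrails could look like in code, the Python below validates a category-to-level mapping. The category and level names are the ones listed in this post. The dict layout and the validate_guardrails helper are assumptions, not ko's actual config schema.

```python
# Levels every category accepts, per the post.
BASE_LEVELS = {"strict", "normal", "off"}

# Extra levels the post mentions for specific categories.
EXTRA_LEVELS = {
    "refusal_posture": {"lenient"},
    "tone_verbosity": {"terse", "verbose"},
}

# The 11 categories named in the post (it says there are 15 in total).
KNOWN_CATEGORIES = {
    "refusal_posture", "anti_hallucination", "anti_fabrication", "objectivity",
    "tone_verbosity", "destructive_ops", "credentials", "code_file_discipline",
    "professional_disclaimers", "citation_format", "profanity",
}


def validate_guardrails(config: dict) -> list:
    """Hypothetical validator: return a list of problems, empty if the config is fine."""
    errors = []
    for category, level in config.items():
        if category not in KNOWN_CATEGORIES:
            errors.append(f"unknown category: {category}")
            continue
        allowed = BASE_LEVELS | EXTRA_LEVELS.get(category, set())
        if level not in allowed:
            errors.append(f"{category}: level {level!r} not allowed")
    return errors


# A review-style pipeline: lenient refusals, strict on destructive operations.
print(validate_guardrails({"refusal_posture": "lenient", "destructive_ops": "strict"}))  # []
```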

The model is a variable

ko works with Anthropic, OpenAI, Google, Mistral, Cohere, Groq, Cerebras, Together, Fireworks, OpenRouter, Ollama, DeepSeek, Perplexity, xAI, SambaNova, Azure OpenAI, AWS Bedrock, Hugging Face and Replicate today, with new ones added as workloads demand. Bring your own keys, or use the platform pool with auto top-up.

Your pipeline treats the model as a variable. Vendors don't get to pin you to their roadmap. When the next frontier model ships, you change one field and your pipelines run on it. The OpenRouter catalog syncs live too, so new models appear in your dropdown the day they release, with current wholesale pricing auto-updated the same day.

Parallel agents, file locking, replayable transcripts

Run multiple ko sessions in parallel. Different tasks, different branches, different machines, one account. They show up in one dashboard. Within a single session, pipeline steps can run in parallel too. A ParallelGroup with concurrency 5 runs five sub-agents at once, each with its own sub-session ID and forked conversation history. On the CLI side, there's a file lock table with read/write locks per file. Write tools fail fast on conflict. The same agent can re-entrantly write to a file it has already locked. No race conditions on parallel edits.
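
A fail-fast, re-entrant lock table of the kind described above might look like this in Python. It is a sketch under assumptions: the class and method names are hypothetical, agents are identified by plain strings, and a single release drops the lock no matter how many times the same agent acquired it. ko's real lock table lives in the Go binary and may differ on all of these.

```python
import threading


class LockConflict(Exception):
    """Raised immediately instead of waiting when another agent holds the file."""


class FileLockTable:
    def __init__(self):
        self._mutex = threading.Lock()  # guards the two dicts below
        self._readers = {}              # path -> set of agent ids holding a read lock
        self._writer = {}               # path -> agent id holding the write lock

    def acquire_read(self, path: str, agent: str) -> None:
        with self._mutex:
            writer = self._writer.get(path)
            if writer is not None and writer != agent:
                raise LockConflict(f"{path} is being written by {writer}")
            self._readers.setdefault(path, set()).add(agent)

    def acquire_write(self, path: str, agent: str) -> None:
        with self._mutex:
            writer = self._writer.get(path)
            if writer is not None and writer != agent:
                raise LockConflict(f"{path} is being written by {writer}")
            other_readers = self._readers.get(path, set()) - {agent}
            if other_readers:
                raise LockConflict(f"{path} is being read by {sorted(other_readers)}")
            self._writer[path] = agent  # same agent re-acquiring is a no-op

    def release(self, path: str, agent: str) -> None:
        with self._mutex:
            if self._writer.get(path) == agent:
                del self._writer[path]
            self._readers.get(path, set()).discard(agent)
```

Failing fast keeps a blocked sub-agent from sitting on its budget while it waits. The caller decides whether to retry or move on.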

Every run produces a complete replayable transcript. Every prompt, every tool call, every model response, every cost, stamped with the step that issued it. Stored for audit, review, dispute resolution, or just "what the hell did I run yesterday". One-time stream tickets (30-second TTL, single-use) let you share a live session view without leaking long-lived tokens.
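
The one-time stream tickets are a small pattern worth seeing in code. The Python sketch below assumes an in-memory store and a random URL-safe token. Only the 30-second TTL and single-use rule come from the post. The class name and storage are hypothetical.

```python
import secrets
import time


class StreamTickets:
    """Hypothetical in-memory store for short-lived, single-use tickets."""

    def __init__(self, ttl_seconds: float = 30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock   # injectable so expiry can be tested without sleeping
        self._expiry = {}     # ticket -> time after which it is no longer valid

    def issue(self) -> str:
        ticket = secrets.token_urlsafe(32)
        self._expiry[ticket] = self._clock() + self._ttl
        return ticket

    def redeem(self, ticket: str) -> bool:
        # pop() removes the ticket whether or not it is still valid,
        # so a second redeem of the same ticket always fails.
        expires_at = self._expiry.pop(ticket, None)
        return expires_at is not None and self._clock() <= expires_at
```

The viewer exchanges the ticket once for a live stream, so a leaked link is useless after first use or after 30 seconds, whichever comes first.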

Run the dashboard against the cloud, or run it against your own ko daemon on your own machine if you'd rather not stream to a SaaS. Both modes ship.

MCP mode for your IDE

ko ships an MCP server. Works in Cursor, Claude Desktop, Zed, VS Code, JetBrains and Windsurf. Same pipelines, same budgets, same vault, called from inside your editor. The CLI binary and the MCP server are the same Go binary running in different modes.

Vault and signed skills

Your API keys are encrypted with a scrypt-derived key from a password we never store. Per-user vault, AES-256-GCM throughout. Sessions, prompts, saved pipelines and cross-machine snapshots are all encrypted with the same key. Lose the password and there's no recovery. You reset the vault. That's deliberate.
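
For readers who haven't used scrypt, here is what deriving a 32-byte key (the size AES-256 needs) from a password looks like with Python's standard library. The cost parameters and salt size are illustrative assumptions, not ko's actual settings. The AES-256-GCM step is left out because it needs a third-party library.

```python
import hashlib
import os


def derive_vault_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte key from a password. n, r and p are assumed values."""
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=2**14,   # CPU/memory cost (assumption)
        r=8,       # block size (assumption)
        p=1,       # parallelism (assumption)
        dklen=32,  # 32 bytes = 256 bits, the key size for AES-256
    )


salt = os.urandom(16)  # stored next to the ciphertext; it is not secret
key = derive_vault_key("correct horse battery staple", salt)
print(len(key))  # 32
```

The same password and salt always produce the same key, which is why nothing but the salt has to be stored, and why a lost password can't be recovered.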

Skills are Ed25519-signed and verified on the client before they run. Three ship pre-installed. Superpowers covers TDD, debugging, planning and review patterns. Context7 fetches live library docs so the agent stops hallucinating APIs that don't exist. Security Guidance scans every edit for vulnerabilities before it lands. Community submissions go through an admin review pipeline before they're signed.

Active sessions run in memory on mTLS-secured compute nodes with no disk persistence and zeroed buffers at session end. Cross-machine code sync encrypts on your machine before upload, so plaintext never reaches the cloud.

Cost and billing

ko is cheaper too, as a consequence of the architecture. Each of the seven canonical roles declares what tier of model it actually needs. Router and Verifier are cheap and fast. Reasoning falls to Planner and Reviewer. The implementing roles do the heavy lifting on whatever frontier model fits the step. Spread across a real workflow, that's around 80% less than running one frontier model on every step. The standard pipeline does "Add JWT auth with refresh tokens" on an Express codebase for about $0.62, against $2.78 for Opus end to end. The savings come with cross-checking. Three frontier minds debated the plan. The tests got written by two providers with different blind spots. Two verifiers had to sign off before the run completed. Cheaper and more rigorous than running one expensive model on everything.
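
The arithmetic behind the JWT example, using only the two prices quoted above:

```python
standard_cost = 0.62  # standard pipeline, from the post
opus_cost = 2.78      # one frontier model end to end, from the post

savings = 1 - standard_cost / opus_cost
print(f"{savings:.1%}")  # 77.7%
```

So this one task lands a little under the "around 80%" figure, which is quoted as an average across a workflow.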

The Knockout platform itself is free to sign up. You pay for AI usage from there, with a small pass-through on the hosted models to cover infra. The exact token math is visible on every run, no surprise line items. If you'd rather use your own Anthropic or OpenAI accounts at zero markup, there's a small monthly fee that switches that on. Per-task and per-session budget caps, with a daily cap layered on top. Live spend meter as the run executes. CSV export of everything. Low-balance alerts before you're blocked, auto top-up if you want to set it and forget it.
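
The layered caps can be pictured as a check that runs before each charge. This Python sketch is an assumption about how such a check might be ordered. The Caps fields and blocking_cap are hypothetical names, and the dollar values in the usage lines are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Caps:
    per_task: float
    per_session: float
    daily: float


def blocking_cap(caps: Caps, task_spend: float, session_spend: float,
                 daily_spend: float, next_charge: float) -> Optional[str]:
    """Return the name of the first cap the next charge would exceed, else None."""
    checks = [
        ("per_task", task_spend, caps.per_task),
        ("per_session", session_spend, caps.per_session),
        ("daily", daily_spend, caps.daily),
    ]
    for name, spent, limit in checks:
        if spent + next_charge > limit:
            return name
    return None


caps = Caps(per_task=0.50, per_session=2.00, daily=5.00)
print(blocking_cap(caps, 0.10, 0.40, 1.00, next_charge=0.05))  # None
print(blocking_cap(caps, 0.48, 0.60, 1.00, next_charge=0.05))  # per_task
```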

Typical task prices land at $0.01-$0.05 for a quick fix, $0.05-$0.50 for a full feature, $0.50-$2.00 for a bespoke project from a template wizard.

The engine is general

Coding is the deepest catalog, but the same engine drives non-coding pipelines too. Legal does contract review with adversarial advocate-vs-critic debate. Finance has bull/bear investment analysis. The HR catalog runs employee and employer angles on policy design in parallel. Plus pipelines for business strategy, research, creative work and personal decision support. Eight verticals total, one engine, one wallet.

Because the engine is a pipeline orchestrator, you can build pipelines for whatever your team actually needs. A pipeline for triaging support tickets. Another for analyzing incident postmortems. A third for the on-call runbook your senior wrote in Notion. Each one is JSON in your repo or your team's shared library.

Try it now

curl -fsSL https://fightclub.pro/ko/install.sh | sh
ko auth login
ko "your first prompt"

The installer auto-detects your OS and architecture (macOS, Linux, WSL). Free to sign up, no credit card needed. Hosted models from minute one, or your own keys for a small monthly fee. Five-minute tutorial at fightclub.pro/tutorial.

Knockout is one of three products on the platform. Ringside is the sister API at api.fightclub.pro/v1/*, an OpenAI-compatible endpoint with assistants, batch, files, audio, image, embeddings and response caching. It's billed per token against a real customer wallet, for when you want to build your own AI product without re-implementing all this from scratch. Fight Club is the original public site at fightclub.pro, where you pit LLMs against each other in structured debates and the public votes on the result. Your login and wallet from any of them work in the other two.



Top comments (1)

Christopher Karatzinis

Easy to install, works fine