Same model. Different results. — AgentKit Benchmark + OpenCode Integration

#opensource #ai #gemma #devtools

We open-sourced AgentKit two weeks ago with zero guarantees anyone would care.

400+ clones later, we're shipping our biggest update yet, and we have benchmark data to back it up.

Quick note: AgentKit Preview is our closed, in-development intelligence layer. The fully open-source AgentKit is live and ready to use today at github.com/Ajaysable123/AgentKit — npx agentkit-ai@latest init gets you running in seconds.

Live Benchmark — Gemma 4 31b · Same Model · Same Task

Both runs used Gemma 4 31b via OpenCode. The only difference between them was whether AgentKit Preview's workflow enforcement, skill injection, and plan gates were enabled.

Benchmark	Vanilla OpenCode	+ AgentKit Preview
Structured planning before coding	0%	100%
Plan approved before first edit	—	✅ Yes (40.6s review)
Task interruptions	1x	0x
Task completion	20% (scaffolding only)	80% (DER parser implemented)
Hard problem solved	❌ No	✅ Yes

Without AgentKit: Gemma 4 31b gave up on the hard part and shipped placeholder strings ([ASN.1 Decoding Required]). It produced no plan, ran no verification, and was interrupted once.

With AgentKit: the same Gemma 4 31b implemented a real custom ASN.1 DER parser, handled both UTCTime and GeneralizedTime, and built the expiration logic. It solved the hard part of the task instead of stubbing it out.

The model didn't get smarter. AgentKit's workflow gates changed its behavior:

Plan gate forced it to think through the DER parsing approach before writing code
Approval step made it commit to solving the hard problem instead of sidestepping it
State machine kept it accountable through RESEARCH → PLAN → EXECUTE → REVIEW
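To make the gate idea concrete, here is a toy version of a gated state machine. It is a sketch under assumptions, not AgentKit's actual implementation: the class and method names (Workflow, approve_plan, can_edit) are hypothetical, and the only rules modeled are the ones described above, namely that stages run in order and EXECUTE is blocked until the plan is approved.

```python
from enum import Enum


class Stage(Enum):
    RESEARCH = "RESEARCH"
    PLAN = "PLAN"
    EXECUTE = "EXECUTE"
    REVIEW = "REVIEW"


# Stages must be visited in this order; skipping ahead is rejected.
ORDER = [Stage.RESEARCH, Stage.PLAN, Stage.EXECUTE, Stage.REVIEW]


class GateError(Exception):
    """Raised when a transition or approval violates the workflow rules."""


class Workflow:
    def __init__(self) -> None:
        self.stage = Stage.RESEARCH
        self.plan_approved = False

    def approve_plan(self) -> None:
        # Approval only makes sense once a plan exists, i.e. in the PLAN stage.
        if self.stage is not Stage.PLAN:
            raise GateError("nothing to approve outside the PLAN stage")
        self.plan_approved = True

    def transition(self, target: Stage) -> None:
        next_index = ORDER.index(self.stage) + 1
        if next_index >= len(ORDER) or ORDER[next_index] is not target:
            raise GateError(f"cannot go from {self.stage.value} to {target.value}")
        # The plan gate: no execution until the plan has been approved.
        if target is Stage.EXECUTE and not self.plan_approved:
            raise GateError("plan must be approved before EXECUTE")
        self.stage = target

    def can_edit(self) -> bool:
        # Code edits are only allowed while executing an approved plan.
        return self.stage is Stage.EXECUTE
```

The point of the design is that the model cannot reach a state where edits are allowed without first passing through PLAN and an explicit approval, so sidestepping the hard problem has to be visible in the plan rather than hidden in the code.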

What else just landed

🔌 Native OpenCode Integration

AgentKit now ships a native TUI plugin for OpenCode that lives inside the terminal UI — not just in the system prompt.

Select the agentkit agent from the agent switcher and you get:

Pre-loaded skills injected automatically
Workflow gates (RESEARCH → PLAN → EXECUTE → REVIEW → SHIP)
Mandatory approval dialogs before any code edit
Memory context from previous sessions

🤖 Works With Any Model

The skill router, workflow engine, and marketplace run entirely via CLI — no Claude API required. Tested on Gemma 4 31b, MiniMax M2.5, and Claude.

# Works with any model in OpenCode
agentkit workflow transition RESEARCH
agentkit workflow approve
agentkit workflow transition EXECUTE

Get started

Open-source AgentKit (free — stable & ready to use):

npx agentkit-ai@latest init

👉 github.com/Ajaysable123/AgentKit

AgentKit Preview (closed beta — in active development)

To everyone who cloned, starred, or tried AgentKit — thank you. This is just getting started. 🚀



