I build agentic AI systems for a living — multi-agent compliance pipelines, document orchestration, RAG-powered assistants. Claude Code is my daily driver.
Last month, my Claude Code bill hit $213.
Not because I was doing anything unusual. Standard development work. But I was burning tokens on skills that weren't relevant to my current task, re-explaining my project architecture every new session, and running Opus for tasks that Haiku could handle fine.
So I spent a few weeks studying the most popular tools, each trying to solve a piece of this problem:
| Repo | ⭐ Stars | What it solves | What it doesn't |
|---|---|---|---|
| Superpowers | 108K | Workflow methodology, TDD, subagent development | No memory, no token optimization, no cost tracking |
| claude-mem | 39.9K | Session memory persistence | No skills, no workflow, no model routing |
| awesome-claude-code | 30.9K | Curates 1,234+ skills | It's a directory — no intelligence, no routing |
| ruflo | 24.8K | Multi-agent swarm orchestration | Complex, heavy, uses 7x tokens |
| ui-ux-pro-max-skill | ~500 | Design-specific SKILL.md | Single domain only |
The pattern was obvious: everyone built one layer. Nobody built the intelligence layer that ties them together while cutting your costs.
I built that layer. It's called AgentKit.
```bash
npx agentkit init
```
One command. Detects your platform. Installs everything. Starts saving tokens immediately.
## What AgentKit actually does
Five layers, each solving a specific problem:
### Layer 1: Intelligent Skill Router
This is the single biggest token saver.
The problem: You install 50 skills or dump everything into CLAUDE.md. All of it loads into context on every prompt — even when you're debugging Python and your React, Docker, and GraphQL skills are just sitting there burning tokens.
The fix: A 3-tier classifier that runs on every prompt:
```text
# Tier 1: Keyword regex (instant, free)
# Tier 2: Heuristic scoring (instant, free)
# Tier 3: Haiku fallback for ambiguous prompts (~$0.0003)

"Python AttributeError..."    → debugging (confidence: 1.00)
"Write jest tests..."         → testing   (confidence: 1.00)
"Add JWT auth to REST API..." → api-work  (confidence: 0.50)
```
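The first two tiers can be sketched in a few lines. This is an illustrative reconstruction, not AgentKit's actual classifier — the category names, regex tables, and the confidence heuristic in `classify` are all assumptions:

```python
import re

# Hypothetical keyword tables; AgentKit's real patterns will differ.
KEYWORDS = {
    "debugging": [r"\btraceback\b", r"\bexception\b", r"attributeerror"],
    "testing":   [r"\bjest\b", r"\bpytest\b", r"\btests?\b"],
    "api-work":  [r"\bjwt\b", r"\brest\b", r"\bendpoint\b"],
}

def classify(prompt: str) -> tuple[str, float]:
    """Return (category, confidence). A low confidence is what would
    trigger the tier-3 Haiku fallback; here we just report it."""
    text = prompt.lower()
    scores = {
        cat: sum(bool(re.search(p, text)) for p in pats)
        for cat, pats in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return "general", 0.0                # tier-3 territory
    return best, min(1.0, scores[best] / 2)  # crude heuristic score
```

The key property is that the common case never touches an LLM: regex and scoring are free, and the paid fallback only fires on genuinely ambiguous prompts.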
It loads 2-5 relevant skills instead of all of them. And it uses progressive disclosure — skills load at 3 detail levels:
- Level 1: ~50 tokens (trigger description — always loaded)
- Level 2: ~500 tokens (core instructions — loaded when task confirmed)
- Level 3: ~2,000 tokens (full references — loaded for complex work)
Result: 45,000 tokens/session → ~5,000 tokens/session. 89% reduction.
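A minimal sketch of what progressive disclosure looks like mechanically, assuming each skill stores its three detail levels as strings (the skill name and text here are invented):

```python
# Illustrative skill store; real skills live in SKILL.md files.
SKILLS = {
    "python-debugging": {
        1: "Trigger: Python errors, tracebacks, pdb.",              # ~50 tok
        2: "Core steps: reproduce, read the traceback bottom-up.",  # ~500 tok
        3: "Full reference: pdb commands, logging recipes, etc.",   # ~2,000 tok
    },
}

def load_skills(relevant: list[str], depth: int) -> str:
    """Assemble context up to the requested detail level (1-3)."""
    parts = []
    for name in relevant:
        for level in range(1, depth + 1):
            parts.append(f"[{name} L{level}] {SKILLS[name][level]}")
    return "\n".join(parts)
```

Level 1 stays resident for every installed skill; levels 2 and 3 are appended only for the 2-5 skills the router actually selected.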
Plus a forced-eval hook that bumps skill activation from 20% to 84%:
```bash
# hooks/forced_eval.sh — PreToolUse hook
LOADED_SKILLS="${AGENTKIT_LOADED_SKILLS:-}"
if [ -z "$LOADED_SKILLS" ]; then
  exit 0
fi
echo "SKILL_EVAL: Before proceeding, check if any active skill applies: ${LOADED_SKILLS}"
```
This one hook is probably worth the entire install.
### Layer 2: Project Memory Graph
The problem: Claude forgets everything between sessions. Every morning you re-explain your architecture, re-discover API patterns, re-establish conventions.
The fix: A SQLite knowledge graph that automatically captures entities from your coding session:
```sql
-- memory/schema.sql
CREATE TABLE entities (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  type TEXT NOT NULL,  -- file, function, api_route, package, command
  context TEXT,
  confidence REAL DEFAULT 1.0,
  session_id TEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE decisions (
  id INTEGER PRIMARY KEY,
  description TEXT NOT NULL,
  rationale TEXT,
  session_id TEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- FTS5 for fast full-text search
CREATE VIRTUAL TABLE entities_fts USING fts5(name, context, content=entities);
```
At session end, it generates a Haiku-compressed handoff (~$0.0015). Next session, it injects only the memory nodes relevant to your current task — not everything.
Result: 10,000 tokens of context → ~2,000 tokens. 80% reduction. And your agent actually knows what happened yesterday.
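The "inject only relevant nodes" step comes almost for free with FTS5. Here is a sketch using the same table shapes as the schema above — the rows and the `relevant_memory` helper are invented for illustration:

```python
import sqlite3

# In-memory stand-in for memory/agentkit.db; rows are made-up examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT, context TEXT);
CREATE VIRTUAL TABLE entities_fts USING fts5(
    name, context, content=entities, content_rowid=id
);
INSERT INTO entities (name, context) VALUES
    ('auth.py', 'JWT middleware for the REST API'),
    ('Dockerfile', 'multi-stage build for the worker image');
INSERT INTO entities_fts (rowid, name, context)
    SELECT id, name, context FROM entities;
""")

def relevant_memory(task: str, limit: int = 5) -> list[str]:
    # bm25() ranks matches; only the top few nodes reach the next prompt.
    rows = conn.execute(
        "SELECT name FROM entities_fts WHERE entities_fts MATCH ? "
        "ORDER BY bm25(entities_fts) LIMIT ?",
        (task, limit),
    ).fetchall()
    return [r[0] for r in rows]
```

Instead of dumping the whole graph into context, the next session asks the database which handful of entities matter for the task at hand.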
### Layer 3: Token Budget Intelligence
Three automatic optimizations that stack:
Auto model routing:
```python
def route_model(prompt: str, is_subagent: bool = False) -> str:
    if is_subagent:
        return "claude-haiku-4-5"    # Always cheapest for subagents
    if has_complex_signals(prompt):  # "architect", "security audit", etc.
        return "claude-opus-4-6"     # $15/M tokens — only when needed
    if has_simple_signals(prompt):   # "find", "list", "rename", etc.
        return "claude-haiku-4-5"    # $0.25/M tokens
    return "claude-sonnet-4-6"       # $3/M — the 80% default
```
Thinking budget tuning:
- Trivial tasks (file search, formatting): 0 thinking tokens → saves $0.48/request
- Moderate tasks (bug fixes, features): 8,192 thinking tokens → saves $0.36/request
- Complex tasks (architecture, security): 32,000 thinking tokens → full power
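The tuning itself can be a simple signal-to-budget mapping. The budget sizes below come from the tiers above, but the signal words and thresholds are my own illustrative assumptions:

```python
# Budgets mirror the three tiers above; signal lists are hypothetical.
BUDGETS = {"trivial": 0, "moderate": 8_192, "complex": 32_000}

COMPLEX_SIGNALS = ("architect", "security audit", "design review")
TRIVIAL_SIGNALS = ("find", "list", "rename", "format")

def thinking_budget(prompt: str) -> int:
    p = prompt.lower()
    if any(s in p for s in COMPLEX_SIGNALS):
        return BUDGETS["complex"]   # full extended thinking
    if any(s in p for s in TRIVIAL_SIGNALS):
        return BUDGETS["trivial"]   # skip thinking entirely
    return BUDGETS["moderate"]      # sensible default
```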
Real-time cost dashboard:
```text
💰 $0.034 | 🧠 Sonnet | ⚡ 12,450 tok | 📈 saved 32% vs baseline
```
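The arithmetic behind a dashboard line like that is straightforward. A back-of-envelope version, using the per-million-token prices from the routing snippet above (token counts are illustrative):

```python
# Prices follow the routing snippet's comments; assumed, not canonical.
PRICE_PER_M = {
    "claude-haiku-4-5": 0.25,
    "claude-sonnet-4-6": 3.00,
    "claude-opus-4-6": 15.00,
}

def cost_line(model: str, tokens: int, baseline_tokens: int) -> str:
    """Format a one-line cost summary for the status bar."""
    cost = tokens / 1_000_000 * PRICE_PER_M[model]
    saved = 1 - tokens / baseline_tokens
    return f"${cost:.3f} | {model} | {tokens:,} tok | saved {saved:.0%} vs baseline"
```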
Combined result: ~$200/mo → ~$60/mo. 70% cost reduction.
### Layer 4: Workflow Engine
The problem: AI agents jump straight to coding. No research. No plan. Then you spend 3 hours debugging something that a 5-minute plan would have prevented.
The fix: An enforced state machine:
```text
IDLE → RESEARCH → PLAN → EXECUTE → REVIEW → SHIP
```
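As a sketch, enforcement boils down to a transition table plus a gate check. The table below is inferred from the pipeline, not taken from AgentKit's actual enforcer:

```python
# Hypothetical transition table inferred from the pipeline above.
TRANSITIONS = {
    "IDLE": {"RESEARCH"},
    "RESEARCH": {"PLAN"},
    "PLAN": {"EXECUTE"},
    "EXECUTE": {"REVIEW"},
    "REVIEW": {"SHIP", "EXECUTE"},  # review can bounce work back
    "SHIP": {"IDLE"},
}

class Workflow:
    def __init__(self) -> None:
        self.state = "IDLE"

    def advance(self, target: str) -> None:
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {target}")
        self.state = target

    def can_edit(self) -> bool:
        # Edits are only legal while planning or executing.
        return self.state in {"PLAN", "EXECUTE"}
```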
The plan gate hook literally blocks Edit/Write operations until a plan exists:
```bash
# hooks/plan_gate.sh — PreToolUse hook
# Blocks Edit/Write tools if workflow state is not PLAN or EXECUTE
TOOL="$1"
STATE=$(python3 "${AGENTKIT_HOME}/workflow/enforcer.py" --action check)
if [[ "$TOOL" =~ ^(Edit|Write) ]] && [[ "$STATE" != "PLAN" ]] && [[ "$STATE" != "EXECUTE" ]]; then
  echo "BLOCK: Cannot edit files without an approved plan. Run research first."
  exit 1
fi
```
Quality gates run after every edit — syntax, lint, type checks, tests. Five languages supported: Python, TypeScript, JavaScript, Go, Rust.
### Layer 5: Universal Platform Layer
One SKILL.md file → 10 platforms:
- Claude Code → Native SKILL.md + full hooks
- Cursor → .mdc rules in .cursor/rules/
- Codex CLI → AGENTS.md sections
- Gemini CLI → .gemini/GEMINI.md
- Antigravity → Plugin YAML
- OpenCode → Config JSON system prompt
- Windsurf → Cascade rules
- Aider → .aider.conf.yml
- Kilo Code → Plugin YAML
- Augment → Context file
`npx agentkit init` detects which platforms you have installed and configures the right format for each. Zero manual conversion.
## The numbers
Everything above has been smoke-tested with real prompts:
| What | Before AgentKit | After AgentKit | Change |
|---|---|---|---|
| Tokens per session (skills) | ~45,000 | ~5,000 | 89% ↓ |
| Memory context tokens | ~10,000 | ~2,000 | 80% ↓ |
| Monthly cost | ~$200 | ~$60 | 70% ↓ |
| Skill activation rate | 20% | 84% | 4.2x ↑ |
| Platforms supported | 1 | 10 | 10x |
| Can skip planning | Always | Never | Enforced |
## How it works with existing tools
AgentKit doesn't replace Superpowers or claude-mem — it complements them:
- With Superpowers: AgentKit adds the memory, token optimization, and model routing that Superpowers doesn't have. Use Superpowers for methodology + AgentKit for intelligence.
- With claude-mem: AgentKit's memory graph is more structured (entities + relationships + decisions vs flat text), but they solve the same core problem. Use whichever fits your workflow.
- With Ruflo swarms: AgentKit can optimize Ruflo swarm costs by routing worker agents to Haiku and loading only relevant skills per agent. (Phase 3 roadmap.)
## Try it
```bash
# One-command install
npx agentkit init

# Check what's running
npx agentkit status

# See your savings
npx agentkit costs
```
GitHub: github.com/Ajaysable123/AgentKit
npm: npm install -g agentkit-ai
MIT licensed. There are 16 open issues tagged "good first issue" if you want to contribute. Our first external contributor submitted a PR with 4 new skills within 48 hours of launch.
If it saves you money, star it ⭐. If something breaks, open an issue. PRs welcome — especially skills for languages and frameworks I haven't covered yet.
I'm Ajay — a Senior Gen AI Developer building agentic systems in production for FinTech and Logistics clients. I built AgentKit because I was tired of paying $200/month for Claude Code when 70% of those tokens were wasted. Follow me on GitHub or LinkedIn for updates on AgentKit and agentic AI development.