AJAY SABLE
I studied 5 repos with 200K+ combined stars and built the tool they were all missing

I build agentic AI systems for a living — multi-agent compliance pipelines, document orchestration, RAG-powered assistants. Claude Code is my daily driver.

Last month, my Claude Code bill hit $213.

Not because I was doing anything unusual. Standard development work. But I was burning tokens on skills that weren't relevant to my current task, re-explaining my project architecture every new session, and running Opus for tasks that Haiku could handle fine.

So I spent a few weeks studying the most popular tools trying to solve pieces of this problem:

| Repo | ⭐ Stars | What it solves | What it doesn't |
|---|---|---|---|
| Superpowers | 108K | Workflow methodology, TDD, subagent development | No memory, no token optimization, no cost tracking |
| claude-mem | 39.9K | Session memory persistence | No skills, no workflow, no model routing |
| awesome-claude-code | 30.9K | Curates 1,234+ skills | It's a directory — no intelligence, no routing |
| ruflo | 24.8K | Multi-agent swarm orchestration | Complex, heavy, uses 7x tokens |
| ui-ux-pro-max-skill | ~500 | Design-specific SKILL.md | Single domain only |

The pattern was obvious: everyone built one layer. Nobody built the intelligence layer that ties them together while cutting your costs.

I built that layer. It's called AgentKit.

```shell
npx agentkit init
```

One command. Detects your platform. Installs everything. Starts saving tokens immediately.


What AgentKit actually does

Five layers, each solving a specific problem:

Layer 1: Intelligent Skill Router

This is the single biggest token saver.

The problem: You install 50 skills or dump everything into CLAUDE.md. All of it loads into context on every prompt — even when you're debugging Python and your React, Docker, and GraphQL skills are just sitting there burning tokens.

The fix: A 3-tier classifier that runs on every prompt:

```
# Tier 1: Keyword regex (instant, free)
# Tier 2: Heuristic scoring (instant, free)
# Tier 3: Haiku fallback for ambiguous prompts (~$0.0003)

"Python AttributeError..."      debugging    (confidence: 1.00)
"Write jest tests..."           testing      (confidence: 1.00)
"Add JWT auth to REST API..."   api-work     (confidence: 0.50)
```

It loads 2-5 relevant skills instead of all of them. And it uses progressive disclosure — skills load at 3 detail levels:

```
Level 1:  ~50 tokens    (trigger description — always loaded)
Level 2:  ~500 tokens   (core instructions — loaded when task confirmed)
Level 3:  ~2,000 tokens (full references — loaded for complex work)
```

Result: 45,000 tokens/session → ~5,000 tokens/session. 89% reduction.
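To make the tiering concrete, here's a minimal sketch of what the tier-1 keyword pass could look like. The keyword table and the `classify` name are my own illustration, not AgentKit's actual API:

```python
import re

# Hypothetical tier-1 rules: compiled pattern -> (skill, confidence).
# A real table would be generated from each skill's trigger description.
KEYWORD_RULES = [
    (re.compile(r"\b(traceback|attributeerror|exception|stack trace)\b", re.I),
     ("debugging", 1.0)),
    (re.compile(r"\b(jest|pytest|unit test|write tests?)\b", re.I),
     ("testing", 1.0)),
    (re.compile(r"\b(jwt|auth|rest api|endpoint)\b", re.I),
     ("api-work", 0.5)),
]

def classify(prompt: str):
    """Tier 1: return (skill, confidence) on a keyword hit, else None.

    A None result would fall through to tier-2 heuristic scoring and,
    if still ambiguous, the cheap Haiku call in tier 3.
    """
    for pattern, result in KEYWORD_RULES:
        if pattern.search(prompt):
            return result
    return None
```

Because tiers 1 and 2 are plain string work, the paid model call only happens for the minority of prompts that neither tier can resolve.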

Plus a forced-eval hook that bumps skill activation from 20% to 84%:

```bash
# hooks/forced_eval.sh — PreToolUse hook
LOADED_SKILLS="${AGENTKIT_LOADED_SKILLS:-}"
if [ -z "$LOADED_SKILLS" ]; then
  exit 0
fi
echo "SKILL_EVAL: Before proceeding, check if any active skill applies: ${LOADED_SKILLS}"
```

This one hook is probably worth the entire install.


Layer 2: Project Memory Graph

The problem: Claude forgets everything between sessions. Every morning you re-explain your architecture, re-discover API patterns, re-establish conventions.

The fix: A SQLite knowledge graph that automatically captures entities from your coding session:

```sql
-- memory/schema.sql
CREATE TABLE entities (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    type TEXT NOT NULL,  -- file, function, api_route, package, command
    context TEXT,
    confidence REAL DEFAULT 1.0,
    session_id TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE decisions (
    id INTEGER PRIMARY KEY,
    description TEXT NOT NULL,
    rationale TEXT,
    session_id TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- FTS5 for fast full-text search
CREATE VIRTUAL TABLE entities_fts USING fts5(name, context, content=entities);
```

At session end, it generates a Haiku-compressed handoff (~$0.0015). Next session, it injects only the memory nodes relevant to your current task — not everything.
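Under that schema, capturing and querying memory is plain `sqlite3` work. A minimal sketch, with the DDL trimmed to the columns used; the sample entity and the `jwt` query are illustrative, not AgentKit internals:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    type TEXT NOT NULL,
    context TEXT,
    session_id TEXT
);
CREATE VIRTUAL TABLE entities_fts USING fts5(name, context, content=entities);
""")

# Capture an entity observed during the session
cur = conn.execute(
    "INSERT INTO entities (name, type, context, session_id) VALUES (?, ?, ?, ?)",
    ("POST /api/login", "api_route", "JWT issued here; see auth handlers", "sess-42"),
)
# External-content FTS tables must be kept in sync explicitly
conn.execute(
    "INSERT INTO entities_fts (rowid, name, context) VALUES (?, ?, ?)",
    (cur.lastrowid, "POST /api/login", "JWT issued here; see auth handlers"),
)

# Next session: pull only the nodes relevant to the current task
rows = conn.execute(
    "SELECT entities.name, entities.type FROM entities_fts "
    "JOIN entities ON entities.id = entities_fts.rowid "
    "WHERE entities_fts MATCH 'jwt'"
).fetchall()
```

The point of the FTS5 join is selective injection: only rows matching the current task's terms get loaded into context, which is where the 80% reduction comes from.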

Result: 10,000 tokens of context → ~2,000 tokens. 80% reduction. And your agent actually knows what happened yesterday.


Layer 3: Token Budget Intelligence

Three automatic optimizations that stack:

Auto model routing:

```python
def route_model(prompt: str, is_subagent: bool = False) -> str:
    if is_subagent:
        return "claude-haiku-4-5"          # Always cheapest for subagents

    if has_complex_signals(prompt):        # "architect", "security audit", etc.
        return "claude-opus-4-6"           # $15/M tokens — only when needed

    if has_simple_signals(prompt):         # "find", "list", "rename", etc.
        return "claude-haiku-4-5"          # $0.25/M tokens

    return "claude-sonnet-4-6"             # $3/M — the 80% default
```
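The two `has_*_signals` helpers referenced there aren't defined above. A plausible sketch, assuming simple substring lists; the exact signal words are my guess, not AgentKit's:

```python
# Hypothetical signal lists; a real router would tune these over time
COMPLEX_SIGNALS = ("architect", "security audit", "design a system", "migration plan")
SIMPLE_SIGNALS = ("find", "list", "rename", "format", "grep")

def has_complex_signals(prompt: str) -> bool:
    """True if the prompt contains any marker of high-stakes work."""
    p = prompt.lower()
    return any(s in p for s in COMPLEX_SIGNALS)

def has_simple_signals(prompt: str) -> bool:
    """True if the prompt looks like cheap mechanical work."""
    p = prompt.lower()
    return any(s in p for s in SIMPLE_SIGNALS)
```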

Thinking budget tuning:

```
Trivial tasks (file search, formatting):  0 thinking tokens      → saves $0.48/request
Moderate tasks (bug fixes, features):     8,192 thinking tokens  → saves $0.36/request
Complex tasks (architecture, security):   32,000 thinking tokens → full power
```
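In practice this maps a task class to the extended-thinking parameter passed to the Messages API. A sketch; the `trivial`/`moderate`/`complex` labels are my own, and `None` here means omitting the `thinking` parameter entirely:

```python
def thinking_budget(task_class: str):
    """Map a task class to an extended-thinking parameter dict.

    Budgets mirror the tiers above. Returning None means the request
    is sent without a thinking budget at all (zero thinking tokens).
    """
    budgets = {"trivial": 0, "moderate": 8192, "complex": 32000}
    tokens = budgets.get(task_class, 8192)  # default to moderate
    if tokens == 0:
        return None
    return {"type": "enabled", "budget_tokens": tokens}
```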

Real-time cost dashboard:

```
💰 $0.034 | 🧠 Sonnet | ⚡ 12,450 tok | 📈 saved 32% vs baseline
```

Combined result: ~$200/mo → ~$60/mo. 70% cost reduction.


Layer 4: Workflow Engine

The problem: AI agents jump straight to coding. No research. No plan. Then you spend 3 hours debugging something that a 5-minute plan would have prevented.

The fix: An enforced state machine:

```
IDLE → RESEARCH → PLAN → EXECUTE → REVIEW → SHIP
```
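A state machine like that is a few lines of Python. A minimal sketch; the transition table is inferred from the states above, not copied from AgentKit's enforcer:

```python
# Legal transitions for the workflow state machine
TRANSITIONS = {
    "IDLE": {"RESEARCH"},
    "RESEARCH": {"PLAN"},
    "PLAN": {"EXECUTE"},
    "EXECUTE": {"REVIEW"},
    "REVIEW": {"SHIP", "EXECUTE"},  # review can send work back
    "SHIP": {"IDLE"},
}

class Workflow:
    def __init__(self):
        self.state = "IDLE"

    def advance(self, target: str) -> None:
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {target}")
        self.state = target

    def can_edit(self) -> bool:
        # Mirrors the plan-gate idea: edits only once a plan exists
        return self.state in {"PLAN", "EXECUTE"}
```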

The plan gate hook literally blocks Edit/Write operations until a plan exists:

```bash
# hooks/plan_gate.sh — PreToolUse hook
# Blocks Edit/Write tools if workflow state is not PLAN or EXECUTE
TOOL="$1"
STATE=$(python3 "${AGENTKIT_HOME}/workflow/enforcer.py" --action check)

if [[ "$TOOL" =~ ^(Edit|Write) ]] && [[ "$STATE" != "PLAN" ]] && [[ "$STATE" != "EXECUTE" ]]; then
    echo "BLOCK: Cannot edit files without an approved plan. Run research first."
    exit 1
fi
```

Quality gates run after every edit — syntax, lint, type checks, tests. Five languages supported: Python, TypeScript, JavaScript, Go, Rust.


Layer 5: Universal Platform Layer

One SKILL.md file → 10 platforms:

```
Claude Code   → Native SKILL.md + full hooks
Cursor        → .mdc rules in .cursor/rules/
Codex CLI     → AGENTS.md sections
Gemini CLI    → .gemini/GEMINI.md
Antigravity   → Plugin YAML
OpenCode      → Config JSON system prompt
Windsurf      → Cascade rules
Aider         → .aider.conf.yml
Kilo Code     → Plugin YAML
Augment       → Context file
```

npx agentkit init detects which platforms you have installed and configures the right format for each. Zero manual conversion.
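Internally, that detection step amounts to a platform → target-path map. A toy sketch using paths from the table above; real detection is surely richer than a directory check, and the `TARGETS` dict is my own simplification:

```python
from pathlib import Path

# Platform -> where its converted config lives (paths from the table above)
TARGETS = {
    "cursor": ".cursor/rules/",
    "codex": "AGENTS.md",
    "gemini": ".gemini/GEMINI.md",
    "aider": ".aider.conf.yml",
}

def detect_platforms(project: Path) -> list:
    """Naive detection: a platform counts as installed if its config path exists."""
    return [name for name, target in TARGETS.items()
            if (project / target).exists()]
```

Each detected platform then gets the SKILL.md content rendered into its own format, so one skill definition stays the single source of truth.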


The numbers

Everything above has been smoke-tested with real prompts:

| What | Before AgentKit | After AgentKit | Change |
|---|---|---|---|
| Tokens per session (skills) | ~45,000 | ~5,000 | 89% ↓ |
| Memory context tokens | ~10,000 | ~2,000 | 80% ↓ |
| Monthly cost | ~$200 | ~$60 | 70% ↓ |
| Skill activation rate | 20% | 84% | 4.2x ↑ |
| Platforms supported | 1 | 10 | 10x |
| Can skip planning | Always | Never | Enforced |

How it works with existing tools

AgentKit doesn't replace Superpowers or claude-mem — it complements them:

  • With Superpowers: AgentKit adds the memory, token optimization, and model routing that Superpowers doesn't have. Use Superpowers for methodology + AgentKit for intelligence.
  • With claude-mem: AgentKit's memory graph is more structured (entities + relationships + decisions vs flat text), but they solve the same core problem. Use whichever fits your workflow.
  • With Ruflo swarms: AgentKit can optimize Ruflo swarm costs by routing worker agents to Haiku and loading only relevant skills per agent. (Phase 3 roadmap.)

Try it

```shell
# One command install
npx agentkit init

# Check what's running
npx agentkit status

# See your savings
npx agentkit costs
```

GitHub: github.com/Ajaysable123/AgentKit

npm: npm install -g agentkit-ai

MIT licensed. There are 16 open issues tagged "good first issue" if you want to contribute. Our first external contributor submitted 4 new skills via PR within 48 hours of launch.

If it saves you money, star it ⭐. If something breaks, open an issue. PRs welcome — especially skills for languages and frameworks I haven't covered yet.


I'm Ajay — a Senior Gen AI Developer building agentic systems in production for FinTech and Logistics clients. I built AgentKit because I was tired of paying $200/month for Claude Code when 70% of those tokens were wasted. Follow me on GitHub or LinkedIn for updates on AgentKit and agentic AI development.
