Axit

Posted on • Originally published at aumiqx.com

Claude Code Agents: How I Run Two AI Agents as My Full Engineering Team

Solo Dev, Big Product, 22-Week Clock

SalesClawd is an AI marketing platform for small businesses. Three autonomous agents — SEO, Email, and Booking — run a business's entire marketing engine 24/7. A real-time dashboard where humans and agents collaborate. 10+ third-party integrations. Encrypted credentials. Multi-tenant security.

That's the product. Here's the team: me.

Originally, two developers were supposed to build this. Both dropped out before we wrote a single line of code. I was left with the same 22-week timeline, the same feature list, and zero teammates.

Options:

  1. Extend the timeline (no — the market won't wait)

  2. Cut scope (no — half a product is no product)

  3. Hire (slow, expensive, and onboarding takes months)

  4. Build a system that lets one person move like a team

I chose #4. And it's working.

This isn't "using ChatGPT to write code faster." I built an actual multi-agent development system using Claude Code agents and Gemini CLI — specialized AI agents running in parallel, reviewing each other's code, with nothing shipping to production until both sign off.

The result? Three products in three weeks: a 342-page website, a multi-agent marketing SaaS, and a real-time meeting intelligence tool. All with the same setup.

If you've read the explainers — alexop.dev has a great one on Claude Code's architecture — you know what agents, hooks, and skills are. This guide shows you what happens when you actually use them to build products. Every day. For months.

What Claude Code Agents Actually Are (Skip the Marketing)

Let's cut through the hype. Claude Code agents are isolated Claude instances with their own context window, tool access, and instructions. That's it. No magic. No AGI. Just scoped AI workers that do one thing well.

There are two types:

Subagents are spawned by a parent agent. They get a task, execute it, and return results. They don't see each other. The parent coordinates. Think of them as contract workers — you brief them, they deliver, they leave.

Agent Teams (shipped February 2026 with Opus 4.6) are peer-to-peer. They can message each other, share state, and coordinate without a central boss. Think of them as a squad.

But here's what nobody tells you: the agents themselves aren't the secret weapon. The configuration is.

Claude Code has two modes:

  • Deterministic: CLAUDE.md files and hooks. These run every time, no exceptions. Your coding standards, your file structure rules — these are laws.
  • Probabilistic: Skills and agents. Claude uses judgment about when and how to apply these — these are advisors.

When people complain that "AI agents don't work," they've usually put probabilistic trust where they needed deterministic rules. If you need strict TypeScript types, don't ask an agent — put it in CLAUDE.md.

According to Anthropic's 2026 Agentic Coding Trends Report, 95% of professional developers now use AI coding tools weekly. But most use a single agent, in a single context, doing one thing at a time. Multi-agent development is the jump that changes everything.

The Core Idea: One Builds, One Audits

I have access to two AI coding tools: Claude Code (Opus 4.6) and Gemini CLI (3.1 Pro). Most developers use one or the other. I use both — but not for the same thing.

The insight: one AI shouldn't review its own work.

If Claude writes a function and Claude reviews it, the same blind spots persist. The same assumptions go unchallenged. It's like grading your own exam.

But if Claude writes and Gemini reviews? Different training data. Different reasoning patterns. Different things they notice. Suddenly you have actual cross-review — the same benefit you get from two developers, but without the meetings or the Slack threads.

The system:

  • Claude Code = fast implementer. 4 parallel sessions, each with sub-agents. Builds features at speed.
  • Gemini CLI = strict auditor. 2 sessions. Reviews every piece of code for security bugs. Builds security-critical modules independently.

Neither agent merges to main without the other's sign-off. And I review everything before it ships.

The Architecture: 6 Sessions, 14-26 Parallel Operations

Here's the actual orchestration map from our repo:

```
+-----------------------------------------------------------------------+
|                     AGENT ORCHESTRATION MAP                           |
|                                                                       |
|  CLAUDE CODE (4 Primary Sessions)                                     |
|  +----------------+ +----------------+ +----------------+ +----------+|
|  | Session 1      | | Session 2      | | Session 3      | | Session 4||
|  | BACKEND        | | AGENT ENGINE   | | FRONTEND       | | INTEGR.  ||
|  |                | |                | |                | |          ||
|  | Sub-agents:    | | Sub-agents:    | | Sub-agents:    | | Sub-agts:||
|  | - Auth module  | | - Planner      | | - SEO panel    | | - WP plug||
|  | - Workspace    | | - Executor     | | - Email panel  | | - Google ||
|  | - Approval     | | - Verifier     | | - Booking view | | - Twilio ||
|  +----------------+ +----------------+ +----------------+ +----------+|
|                                                                       |
|  GEMINI CLI (2 Sessions)                                              |
|  +---------------------------+ +---------------------------+          |
|  | Session G1: REVIEWER      | | Session G2: BUILDER       |          |
|  | - Security audits         | | - Parallel module build   |          |
|  | - Code review (all PRs)   | | - Test generation         |          |
|  | - Schema validation       | | - Notification adapters   |          |
|  +---------------------------+ +---------------------------+          |
|                                                                       |
|  AXIT (Commander)                                                     |
|  +-- Reviews all PRs before merge                                     |
|  +-- Approves/rejects agent decisions via DECISIONS.md                |
|  +-- Steers priorities across sessions via SPRINT.md                  |
+-----------------------------------------------------------------------+
```

Each Claude session runs 3-5 sub-agents. That's 12-20 Claude operations plus 2 Gemini sessions — 14-22 effective concurrent operations at any time. In burst mode: up to 26.

The key constraint: clear ownership boundaries. Session 1 owns backend. Session 2 owns the agent engine. Session 3 owns frontend. Session 4 owns integrations. Gemini G1 owns security reviews. Gemini G2 owns crypto, RLS middleware, and notification adapters. No overlap. No file collisions.

The entire 22-week plan maps each of these 6 slots to specific work across 7 phases. Phase overlap is allowed — if Session 1 finishes its Phase 1 tasks early, it pulls Phase 2 tasks from the sprint board.

The Config: CLAUDE.md, Skills, Hooks, and Communication Scripts

Agents without config are expensive autocomplete. Here's the infrastructure that makes our Claude Code agents actually useful.

CLAUDE.md — The Rulebook

Every project has a CLAUDE.md at the root. Ours enforces: TypeScript strict (no `any`), Tailwind v4 with CSS variables, named exports over defaults, the path alias `@/*` → `src/*`, and the golden rule: read before writing, match existing patterns, no unnecessary changes.

These rules are deterministic. Every agent, every session, every time. We also have directory-level CLAUDE.md files — the monorepo's API code has different rules than the frontend.
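
As a rough sketch of what a rulebook like this can look like (the rule names here are illustrative, not our literal file):

```markdown
# CLAUDE.md — project rules (deterministic, loaded every session)

## Code style
- TypeScript strict mode; `any` is forbidden
- Named exports only; no default exports
- Imports use the `@/*` path alias, never deep relative paths

## Golden rule
- Read the surrounding code before writing
- Match existing patterns; make no unnecessary changes
- Do what has been asked; nothing more, nothing less
```

Because this file is injected into every session, these rules hold even when an agent's context window is mostly full of tool output.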

Skills — The Workflow Library

We maintain 30+ Claude Code skills:

| Skill | What It Does |
| --- | --- |
| `/aumiqx` | Full 6-phase pipeline: brainstorm → design → implement → validate → ship |
| `/ship` | typecheck → lint → test → build → PR in one command |
| `/review` | Security + correctness review before pushing |
| `/fix` | Bug fix with full context reading first |
| `/meeting` | Real-time meeting intelligence with agent swarm |

Skills are "how we work" encoded as repeatable processes. Write them once, use them forever.
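
Skills live as markdown files under `.claude/skills/`. A hypothetical sketch of a `/ship`-style skill (the frontmatter shape follows Claude Code's SKILL.md format; the steps are illustrative, not our exact file):

```markdown
---
name: ship
description: Run the full pre-merge pipeline and open a PR
---

# Ship

1. Run `pnpm typecheck`; abort on any error.
2. Run `pnpm lint` and `pnpm test`.
3. Run `pnpm build` to catch bundler-only failures.
4. Commit on the current `claude/*` branch and open a PR. Never push to `main`.
```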

Hooks — Self-Learning Automation

Claude Code hooks fire automatically on lifecycle events: pre-task routing (routes tasks to the right agent type), post-edit formatting, session memory save/restore, and intelligence learning that tracks which routing decisions succeed and adjusts over time.
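
To illustrate the post-edit formatting case, here's roughly what such a hook can look like in `.claude/settings.json` (a sketch based on Claude Code's documented hooks schema, where the hook command receives a JSON event on stdin; the prettier command is an example, not our exact hook):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path' | xargs npx prettier --write"
          }
        ]
      }
    ]
  }
}
```

The `matcher` scopes the hook to file-editing tools, so it fires deterministically on every edit without burning model tokens.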

Communication Layer — How the Agents Talk

The agents don't communicate directly. They use shared files:

  • Context Bridge (SYNC.md) — A living document both agents read and write. Tracks active work, review status, blockers, and architectural decisions both agents have agreed on.
  • Sprint Board (SPRINT.md) — Task assignments, statuses, branch names.
  • Review Reports — Gemini writes security findings to .claude/reports/gemini-review-*.md. Claude writes verification reports. Both are auditable.
  • Decision Log (DECISIONS.md) — Every architectural choice with date, context, and rationale.
  • Git Branches — Claude works on claude/*, Gemini on gemini/*. Neither pushes directly to main.
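
None of these files need special tooling. A sprint board entry is just a markdown table row; a hypothetical SPRINT.md fragment (statuses and task numbers illustrative, drawn from the examples in this post):

```markdown
# SPRINT.md — Phase 1

| Task | Owner | Branch | Status |
| --- | --- | --- | --- |
| 0.A Crypto module (AES-256-GCM) | Gemini G2 | gemini/crypto-utils | in-review |
| 1.x Auth module | Claude S1 | claude/auth-module | in-progress |
| 1.x SEO panel scaffold | Claude S3 | claude/seo-panel | todo |
```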

Five shell scripts automate the cross-agent workflow:

| Script | Purpose |
| --- | --- |
| `gemini-review.sh` | Sends diff to Gemini with mode-specific prompts (quick/full/security) |
| `claude-verify.sh` | Runs TypeScript check + test suite + diff analysis |
| `merge-gate.sh` | Dual-gate verification — both agents must pass or merge is blocked |
| `dual-verify.sh` | Full 4-step verification: tests, typecheck, Gemini security review, summary |
| `gemini-implement.sh` | Delegates a build task to Gemini on a `gemini/*` branch |

When the agents disagree? Both write their position to a conflicts/ folder. I read both, make the final call, and log it in DECISIONS.md. No merge proceeds until the conflict is resolved.

A Real Bug Catch: The Crypto Module Story

This happened on day one. It's the best demonstration of why cross-review works.

Step 1: Gemini Builds the Crypto Module

Task 0.A: build AES-256-GCM encryption for storing OAuth credentials. Gemini's first implementation used scryptSync with a hardcoded salt:

```typescript
// Gemini's FIRST version (the one with bugs)
function deriveKey(masterKey: string): Buffer {
  return crypto.scryptSync(masterKey, "salesclawd-salt", 32);
}
```

It worked. Tests passed. Round-trip encryption succeeded. Gemini pushed to gemini/crypto-utils.

Step 2: Claude Reviews — Finds 2 Critical Bugs

The actual review report found:

Critical #1: ENCRYPTION_KEY not in env schema. Gemini's code imported env.ENCRYPTION_KEY but never added it to the Zod env schema. TypeScript error. Runtime crash on import. The app wouldn't even start.

Critical #2: Hardcoded salt. "salesclawd-salt" means every key derivation produces the same derived key. Per NIST SP 800-132, salts must be random per credential. A hardcoded salt defeats the purpose of key derivation entirely.

Plus warnings: scryptSync is designed for password hashing (intentionally slow), not for encrypting credentials where the key is already high-entropy — HKDF is more appropriate.

Step 3: Gemini Fixes Everything

The fixed version (now in production):

```typescript
// FIXED — HKDF with random per-encryption salts
const SALT_LENGTH = 16;
const IV_LENGTH = 12;  // standard GCM nonce length
const KEY_LENGTH = 32; // 256-bit AES key

function deriveKey(masterKey: string | Buffer, salt: Buffer): Buffer {
  return Buffer.from(
    crypto.hkdfSync("sha256", masterKey, salt,
      "salesclawd-encryption-v1", KEY_LENGTH)
  );
}

export function encrypt(plaintext: string, masterKey = env.ENCRYPTION_KEY): string {
  const salt = crypto.randomBytes(SALT_LENGTH);  // random salt per encryption
  const key = deriveKey(masterKey, salt);
  const iv = crypto.randomBytes(IV_LENGTH);

  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const encrypted = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();

  // Format: salt:iv:tag:ciphertext (all hex)
  return [salt, iv, tag, encrypted].map(b => b.toString("hex")).join(":");
}
```

Two bugs found. Two bugs fixed. Total time: minutes, not days. No meeting. No Slack thread. Just an automated review, a structured report, and a fix.

A single AI reviewing its own code would likely miss its own assumptions. A second AI, from a completely different angle, spotted critical issues immediately.
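
The payload format above implies a matching decrypt. Here's a sketch reconstructed from the `salt:iv:tag:ciphertext` layout, with a minimal encrypt alongside so it runs standalone — this is illustrative, not the repo's actual module:

```typescript
import * as crypto from "node:crypto";

const SALT_LENGTH = 16;
const IV_LENGTH = 12;  // assumed standard GCM nonce length
const KEY_LENGTH = 32; // 256-bit AES key

function deriveKey(masterKey: string | Buffer, salt: Buffer): Buffer {
  return Buffer.from(
    crypto.hkdfSync("sha256", masterKey, salt, "salesclawd-encryption-v1", KEY_LENGTH)
  );
}

export function encrypt(plaintext: string, masterKey: string): string {
  const salt = crypto.randomBytes(SALT_LENGTH); // random salt per encryption
  const key = deriveKey(masterKey, salt);
  const iv = crypto.randomBytes(IV_LENGTH);
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const encrypted = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return [salt, iv, cipher.getAuthTag(), encrypted].map(b => b.toString("hex")).join(":");
}

export function decrypt(payload: string, masterKey: string): string {
  // Split the hex payload back into its four parts
  const [salt, iv, tag, ciphertext] = payload.split(":").map(h => Buffer.from(h, "hex"));
  const key = deriveKey(masterKey, salt); // same HKDF derivation as encrypt
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // GCM: final() throws if the payload was tampered with
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

The auth tag check is what makes GCM tamper-evident: a flipped ciphertext bit makes `final()` throw rather than return garbage.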

The Merge Gate: Nothing Ships Without Both Agents

The merge gate is a shell script that runs two verification passes before any code reaches main:

```bash
#!/usr/bin/env bash
# merge-gate.sh — Dual-agent verification gate
set -euo pipefail

BRANCH="${1:-$(git branch --show-current)}"

echo "============================================"
echo " MERGE GATE — Dual Agent Verification"
echo "============================================"

PASS_COUNT=0; FAIL_COUNT=0

# Gate 1: Claude verification (typecheck + tests)
echo "--- Gate 1: Claude Verification ---"
if bash scripts/claude-verify.sh "$BRANCH"; then
  echo "Claude: PASS"; PASS_COUNT=$((PASS_COUNT + 1))
else
  echo "Claude: FAIL"; FAIL_COUNT=$((FAIL_COUNT + 1))
fi

# Gate 2: Gemini security review
echo "--- Gate 2: Gemini Security Review ---"
if bash scripts/gemini-review.sh --security; then
  LATEST_REPORT=$(ls -t .claude/reports/gemini-review-*.md | head -n 1)
  if grep -q "### Critical" "$LATEST_REPORT"; then
    echo "Gemini: FAIL (critical findings)"
    FAIL_COUNT=$((FAIL_COUNT + 1))
  else
    echo "Gemini: PASS"; PASS_COUNT=$((PASS_COUNT + 1))
  fi
else
  echo "Gemini: SKIPPED (CLI unavailable); manual security review required"
fi

if [ "$FAIL_COUNT" -eq 0 ]; then
  echo " MERGE GATE: OPEN ($PASS_COUNT/2 passed)"
else
  echo " MERGE GATE: BLOCKED ($FAIL_COUNT failures)"; exit 1
fi
```

Gate 1: Claude runs TypeScript strict checking across the full monorepo + the entire Vitest test suite. Zero errors allowed.

Gate 2: Gemini receives the diff with a security-focused prompt covering SQL injection, XSS, CSRF, auth bypass, credential exposure, and multi-tenant isolation. If any finding is "Critical," the gate fails.

Both must pass. Even when they do, I still review before approving. Safety valve: if Gemini CLI is unavailable, Gate 2 is marked SKIPPED — but I manually verify security in that case.

What Breaks: Honest Failures and How We Fixed Them

Agentic coding isn't magic. Here's what went wrong.

Context Window Overflow

When we tried running 12 agents on a complex feature, half lost track. The context filled up with tool results and the agent forgot the original task.

Fix: Keep swarms at 6-8 max. Hierarchical topology — one coordinator, focused workers. If you need more parallelism, batch into sequential swarm runs.

File Collision

Two agents edited the same file simultaneously. The second write overwrote the first.

Fix: Clear file ownership per agent. CLAUDE.md declares boundaries. The coordinator resolves conflicts before they happen.

The "Helpful" Agent Problem

Agents sometimes "improve" code they weren't asked to touch — add comments, refactor functions, rename variables. All helpful in isolation, all destructive to a coordinated build.

Fix: CLAUDE.md rule: Do what has been asked; nothing more, nothing less. Deterministic. Fires every time.

Coordination Overhead Beyond 8 Agents

Past eight agents, coordination costs dominate: agents spend more time reading shared memory than doing actual work.

Fix: SPARC methodology — Specification → Pseudocode → Architecture → Refinement → Completion. Good architecture creates such clear boundaries that agents rarely need to coordinate at all.

The Numbers: Day One Results

One day. One developer. Two AI agents.

| Metric | Count |
| --- | --- |
| Tasks completed | 12 |
| Modules built | 7 (auth, MCP Gateway, crypto, RLS, notifications, BullMQ, database) |
| Tests passing | 20+ |
| Security reviews completed | 3 |
| Critical bugs caught by cross-review | 2 |
| Bugs that reached main | 0 |
| Meetings held | 0 |
| Slack messages sent | 0 |

Actual dual-verify output from today:

```
$ ./scripts/dual-verify.sh

==========================================
 DUAL AGENT VERIFICATION — Branch: main
==========================================

[1/4] Running test suite...
 ✓ tests/health.test.ts (6 tests) 58ms
 Test Files  1 passed (1)
      Tests  6 passed (6)
   Duration  293ms

[2/4] Running typecheck...
 Tasks:    6 successful, 6 total
   Time:   156ms >>> FULL TURBO

[3/4] Gemini security review...
 Review saved to: .claude/reports/gemini-review-20260326.md

[4/4] Verification complete.
=== All gates passed ===
```

156ms typecheck (Turbo caches everything). 293ms tests. Total verification: under 30 seconds.

The Workflow That Ships Products

Every product follows the same flow:

  1. Scribe starts — recording every decision from minute one

  2. Brainstorm — interactive with the human. Research, clarify, spec out.

  3. Architect designs — file paths, data flow, component hierarchy. Runs on Opus for maximum reasoning.

  4. Builders implement — frontend and backend in parallel, reading the architect's spec from shared memory.

  5. Cross-review — Claude verifies (typecheck + tests), Gemini audits (security). Both must pass.

  6. Ship — build, test, commit, push. One command: /ship.

Phases 1-2 are interactive (they need human taste). Phases 3-5 are autonomous (they need speed). Phase 6 is a checkpoint (human approves).

This is the /aumiqx command — a single slash command that orchestrates the entire multi-agent pipeline. Three products in three weeks. That's not hustle culture. That's systems thinking applied to agentic coding.

Try This Yourself

You don't need the exact same setup. The pattern works with any two AI tools.

Step 1: Give Each Agent a Different Job

  • Builder: Claude Code, Cursor, Windsurf — whatever writes code fastest
  • Auditor: Gemini CLI, a second Claude instance with a security prompt, or ChatGPT with a strict review prompt

Step 2: Create a Shared Communication Layer

Create a SYNC.md in your repo. Both agents read it before starting work. Both update it after completing work. Cheapest, most effective coordination — just a markdown file.
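
What goes in it is up to you; a minimal hypothetical shape:

```markdown
# SYNC.md — shared agent state

## Active work
- builder: rate limiting on claude/rate-limit (do not touch src/middleware/)

## Awaiting review
- gemini/crypto-utils — security review requested

## Blockers
- (none)

## Agreed decisions
- Credentials are encrypted at rest; plaintext never leaves the crypto module
```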

Step 3: Write a Merge Gate

```bash
#!/bin/bash
# Simple merge gate
npm test || { echo "BLOCKED: tests failed"; exit 1; }
gemini -p "Review for security bugs: $(git diff main)" > review.md
if grep -qi "critical" review.md; then
  echo "BLOCKED: critical security findings"; exit 1
fi
echo "MERGE GATE: OPEN"
```

Step 4: Never Skip the Gate

The moment you merge without running the gate "just this once," the system breaks down.

Step 5: Log Everything

Keep a DECISIONS.md. When you — or the agents — revisit a decision in week 12, the context is right there.
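
An entry only needs three things: date, context, rationale. A hypothetical example, based on the crypto story above:

```markdown
## 2026-03-26 — Use HKDF, not scrypt, for credential key derivation

**Context:** First crypto module used scryptSync with a hardcoded salt.
**Decision:** Switch to hkdfSync with a random 16-byte salt per encryption.
**Rationale:** The master key is already high-entropy, so scrypt's deliberate
slowness buys nothing, and per-credential salts are required (NIST SP 800-132).
```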

The tools don't matter. The pattern does: one agent builds, another audits, nothing ships without both signing off, and a human makes the final call.

What's Next: 22 Weeks, 100+ Tasks, Building in Public

We're in Phase 1 of a 7-phase, 22-week build. 100+ tasks across backend, agent engine, frontend, integrations, and security. Every task follows the same lifecycle: claim, implement, cross-review, merge gate, approve.

The next 21 weeks will test whether this scales. Phase 2 (autonomous agent execution loops)? Phase 5 (SEO tools with real APIs)? Phase 7 (production Terraform deployment)?

The future of building software isn't about replacing developers. It's about giving one developer the leverage of an entire team.

Solo doesn't mean alone anymore.

Further Reading

Follow the build on LinkedIn. Read more about our setup: Claude Code skills + MCP guide, SPARC + Swarm methodology, or how we built aumiqx.com.


Originally published on aumiqx.com. Follow the build on LinkedIn.
