Most AI-assisted development follows a simple pattern: open a chat, describe the problem, copy the output, adjust, repeat. One AI, one conversation, one step at a time.
aeoptimize was built differently.
It's an open-source CLI that scores websites for AI readability — how likely your content is to be cited by ChatGPT, Perplexity, or Google AI Overview. During development, we dispatched different components to different AI systems in parallel. The tool does the same thing at runtime.
The division of labor
We split the work by capability fit, not convenience:
- Claude: core architecture, scoring engine, security audit
- Gemini: Vite plugin
- Copilot: Next.js plugin
Claude handles long-context reasoning well, which suited the 17-rule scoring engine and the adversarial review. Gemini and Copilot each received a self-contained plugin with a clear interface, where code generation speed mattered more than sustained context. The two plugins also needed to conform to their respective framework conventions, and each AI had stronger exposure to one ecosystem's idioms.
What each AI produced
Claude built the core: the HTML/Markdown parser, 17 deterministic scoring rules across 5 dimensions, the llms.txt generator, JSON-LD schema generator, and the pre-commit hook system. Four rounds of security review — SSRF protection, shell injection prevention, staged content scanning — were also Claude's work.
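One of those generators is small enough to sketch. What follows is a hypothetical FAQPage generator, not aeoptimize's source; only the JSON-LD shape itself comes from schema.org, while the function and type names are mine:

```typescript
// Hypothetical sketch of a FAQPage JSON-LD generator. The schema shape
// follows schema.org's FAQPage type; aeoptimize's real generator may differ.
export interface Faq {
  question: string;
  answer: string;
}

export function faqPageJsonLd(faqs: Faq[]): string {
  return JSON.stringify(
    {
      "@context": "https://schema.org",
      "@type": "FAQPage",
      mainEntity: faqs.map((f) => ({
        "@type": "Question",
        name: f.question,
        acceptedAnswer: { "@type": "Answer", text: f.answer },
      })),
    },
    null,
    2
  );
}
```

The output drops into a `<script type="application/ld+json">` tag; the deterministic rules can then check for its presence without any AI in the loop.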
Gemini built the Vite plugin (~52 lines). It hooks into `configResolved` and `closeBundle` to auto-generate llms.txt and JSON-LD at build time. Clean on the first pass.
Copilot built the Next.js plugin (~55 lines). Same behavior, different framework conventions. Also clean.
Both ship under the same npm package with separate export paths (aeoptimize/vite, aeoptimize/next).
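For a sense of scale, a plugin with that shape looks roughly like this. This is a hypothetical sketch, not aeoptimize's actual source: `configResolved` and `closeBundle` are real Vite plugin hooks, but the options shape and the llms.txt body here are illustrative.

```typescript
// Hypothetical sketch of a build-time llms.txt plugin for Vite.
// configResolved and closeBundle are real Vite hooks; everything else
// (option names, file contents) is illustrative.
import { writeFileSync } from "node:fs";
import { join } from "node:path";

export interface LlmsTxtOptions {
  siteName: string;
  description: string;
}

export function aeoptimizePlugin(options: LlmsTxtOptions) {
  let outDir = "dist";

  return {
    name: "aeoptimize-llms-txt",
    // Capture the resolved output directory once Vite has merged config.
    configResolved(config: { build: { outDir: string } }) {
      outDir = config.build.outDir;
    },
    // After the bundle is written, emit llms.txt next to it.
    closeBundle() {
      const body = `# ${options.siteName}\n\n> ${options.description}\n`;
      writeFileSync(join(outDir, "llms.txt"), body);
    },
  };
}
```

In `vite.config.ts` it would register like any other plugin: `plugins: [aeoptimizePlugin({ siteName, description })]`.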
The adversarial review
After the first stable version, we ran a Codex security review — treating it as an attacker, not a collaborator.
Four HIGH-severity findings:
- GitHub Action using unpinned `npx aeoptimize@latest`: any future compromised version runs automatically in CI
- Pre-commit hook scoring the working tree instead of the staged diff
- Hook install/uninstall replacing the entire `.git/hooks/pre-commit`, destroying other hooks in the process
- SSRF bypass via redirects: following hops without validating each destination, which could resolve a public URL to an internal IP
All four were fixed in v0.5.2. The GitHub Action now runs the bundled `$GITHUB_ACTION_PATH/../dist/cli/index.js`. The hook reads staged content via `git show :"$FILE"`. Install/uninstall uses `# BEGIN/END aeoptimize` delimited blocks. Each redirect hop is now validated against private IP ranges before following.
The redirect issue is worth dwelling on. The original code validated the initial URL — that seemed sufficient. An adversarial reviewer asked what happened when a validated URL redirected somewhere else. The answer was not good. That kind of question is harder to ask about code you wrote.
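To make that fix concrete, here is a hedged sketch of per-hop validation. None of this is aeoptimize's actual code: the helper names and the exact range list are illustrative, and it assumes Node 18+ for the global `fetch`. The only point that matters is that the check runs before every hop, not just the first.

```typescript
// Hypothetical sketch: validate every redirect hop, not just the initial URL.
// Helper names and the range list are illustrative, not aeoptimize's code.
// Assumes Node 18+ for the global fetch.
import { lookup } from "node:dns/promises";
import { isIP } from "node:net";

export function isPrivateIp(ip: string): boolean {
  return (
    /^(10\.|127\.|192\.168\.|169\.254\.)/.test(ip) ||
    /^172\.(1[6-9]|2\d|3[01])\./.test(ip) ||
    ip === "::1" ||
    ip.toLowerCase().startsWith("fc") ||
    ip.toLowerCase().startsWith("fd")
  );
}

// Resolve the hostname and refuse private or loopback destinations.
export async function assertPublicHost(url: string): Promise<void> {
  const host = new URL(url).hostname;
  const ip = isIP(host) ? host : (await lookup(host)).address;
  if (isPrivateIp(ip)) throw new Error(`Refusing private address: ${url}`);
}

export async function safeFetch(url: string, maxHops = 5) {
  for (let hop = 0; hop < maxHops; hop++) {
    await assertPublicHost(url); // the fix: re-check before EVERY hop
    const res = await fetch(url, { redirect: "manual" });
    const location = res.headers.get("location");
    if (res.status < 300 || res.status >= 400 || !location) return res;
    url = new URL(location, url).toString(); // resolve relative redirects
  }
  throw new Error("Too many redirects");
}
```

Note the `redirect: "manual"` option: letting `fetch` follow redirects automatically is exactly what reintroduces the bypass.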
Multi-AI scoring at runtime
The same pattern extends to the tool itself:
npx aeoptimize scan your-site.com --multi-ai
With `--multi-ai`, the tool checks whether the `gemini` and `copilot` CLIs are installed, sends each the page content for independent evaluation, then merges the results with the rule engine:
- 2 AIs available: 50/50 between rule score and AI consensus
- 1 AI available: 60/40 in favor of the rule engine
- No AIs: 100% rule engine
The rule engine runs offline and is fully deterministic — the base scan has no cost and no rate limits. The AI layer adds qualitative insight when available.
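The merge itself is only a few lines. This sketch is my reading of the weights above; `mergeScores` is an illustrative name, not aeoptimize's actual API:

```typescript
// Hypothetical sketch of the weighting described above; the function name
// and signature are illustrative, not aeoptimize's actual API.
export function mergeScores(ruleScore: number, aiScores: number[]): number {
  if (aiScores.length === 0) return ruleScore; // offline: rule engine only
  const consensus = aiScores.reduce((a, b) => a + b, 0) / aiScores.length;
  const ruleWeight = aiScores.length >= 2 ? 0.5 : 0.6; // 50/50 or 60/40
  return Math.round(ruleWeight * ruleScore + (1 - ruleWeight) * consensus);
}
```

With both AIs available, a rule score of 61 and an AI consensus of 83 average to 72, which is the shape of the sample output below.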
Score: 72/100 (Rule Engine: 61 | AI Consensus: 83)
AI Insights:
Gemini: "Missing llms.txt reduces discoverability by AI crawlers"
Copilot: "FAQ section present but lacks FAQPage schema markup"
Why this works
The redirect vulnerability explains it well enough. The developer who wrote the code also validated it. The validation made sense given what the developer intended. An external reviewer — one not trying to make the code work — looked at the same function and found the gap.
Independent evaluation surfaces assumptions the original author couldn't see, because those assumptions were load-bearing when the code was written.
That applies to security review. It also applies to content quality, schema completeness, and whatever else you're trying to score consistently.
Try it
npx aeoptimize scan your-site.com
npx aeoptimize scan your-site.com --multi-ai
GitHub: https://github.com/dexuwang627-cloud/aeoptimize
npm: https://www.npmjs.com/package/aeoptimize
If you've used a multi-AI workflow in your own work, I'm curious how you split it up.