Vilius

Posted on Jun 26 • Edited on Jun 30

How to onboard an existing project with AI tools

#ai #opensource #tutorial #agents

You cloned a mature project, pointed an AI agent at it, and it produced garbage. It didn't know the auth flow, tripped on schema quirks, and kept writing code that didn't fit. You blamed the model. It wasn't the model's fault.

The problem isn't the agent's ability to code. It's your project's ability to be coded by an agent. This guide fixes that — one phase at a time. You don't need to finish all of them today. Do one, come back next week. The goal is progress, not perfection.

Before you start — take stock

Not every project needs the same treatment. Spend 10 minutes assessing what you're working with:

Does it have docs? Are they accurate?
Is there a test suite? Unit, integration, E2E?
What's the auth situation? Service account, MFA, SSO?
What does the README claim vs what's actually true?
Is there a committed schema (GraphQL, OpenAPI) or is the contract tribal knowledge?

You're looking for gaps in what an agent needs to work effectively. You'll fill them one at a time.
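The checklist above can be partly automated. This is a minimal sketch, not a standard tool: the function name `take_stock` and the filename patterns are assumptions, so swap in whatever your stack uses. It only tells you whether something exists, not whether it is accurate.

```shell
# take_stock DIR: rough inventory of what an agent would have to work with.
# Labels and filename patterns are illustrative guesses; adjust them to your stack.
take_stock() {
  local root label pattern hit
  root="${1:-.}"
  while IFS='|' read -r label pattern; do
    # Skip dependency and VCS folders, keep only the first match.
    hit="$(find "$root" \( -name node_modules -o -name .git \) -prune \
             -o -name "$pattern" -print 2>/dev/null | head -n 1)"
    if [ -n "$hit" ]; then
      printf 'found    %-18s %s\n' "$label" "$hit"
    else
      printf 'MISSING  %-18s (nothing named %s)\n' "$label" "$pattern"
    fi
  done <<'EOF'
agent docs|AGENTS.md
docs folder|docs
readme|README*
test specs|*.spec.*
test files|*.test.*
GraphQL schema|*.graphql
OpenAPI contract|openapi.*
EOF
}
```

Run `take_stock .` from the project root. Every MISSING line is a candidate gap; every found line still needs a human to confirm it is true.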

Phase one: Documentation baseline

An agent relearns your project every session unless you give it somewhere to look. That somewhere is a docs/ folder at project root. Every plan, spec, architecture decision, and bug fix goes there. Organized home, reduced mental load.

What lives in `docs/`, plus one file that sits at the project root:

AGENTS.md (project root): how the agent should write code for this project. Conventions to follow, patterns to avoid, gotchas that aren't obvious from the code. This is more important than the README in 2026. Start with one, keep it honest. (If your tool prefers CLAUDE.md or Cursor's .cursor/rules, use those. Same purpose, different name.)

docs/BUGS.md: bug catalog with root causes, symptoms, and fixes. The same issue doesn't need to be fixed twice. The agent reads this before starting new work.

docs/LESSONS.md: things that went wrong. Architectural decisions that aged poorly. What the agent should never do. This is durable institutional memory, more valuable than any lint rule. New team members (human or agent) read this first.

docs/TECH-DEBT.md: anti-pattern inventory with a phased fix plan. The agent checks this before refactoring so it doesn't step into known traps.

docs/SPECS/: feature specs and implementation plans. The agent works from a written spec, not memory. One plan per feature, organised by status.

docs/SCHEMA.md (optional): data model reference. Committed types, API contracts, field descriptions. If your project has a formal schema, document it. If not, the existing code is the contract, and that's fine.

Frontend docs cross-reference backend docs. One entry point. The agent goes there first, guesses second.
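A bug catalog only helps if every entry has the same shape. Here is one possible helper for appending entries with the symptom, root cause, and fix fields described above. The function name `log_bug` and the exact entry layout are assumptions; any consistent format works.

```shell
# log_bug TITLE SYMPTOM ROOT_CAUSE FIX [FILE]
# Appends one structured entry to the bug catalog (default: docs/BUGS.md).
log_bug() {
  local title symptom cause fix file
  title="$1"; symptom="$2"; cause="$3"; fix="$4"
  file="${5:-docs/BUGS.md}"
  mkdir -p "$(dirname "$file")"
  {
    printf '%s\n\n' "## $title"
    printf '%s\n' "- Logged: $(date +%Y-%m-%d)"
    printf '%s\n' "- Symptom: $symptom"
    printf '%s\n' "- Root cause: $cause"
    printf '%s\n\n' "- Fix: $fix"
  } >> "$file"
}
```

Example: `log_bug "Dropdown shows wrong values" "options differ from API response" "stale cache key" "key the cache by locale"`. The agent can grep the headings before starting new work.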

Setup in one paste — single source of truth

mkdir -p docs/SPECS
# Keep an existing AGENTS.md; only write the starter file if none is there.
[ -e AGENTS.md ] || cat > AGENTS.md << 'EOF'
# AGENTS.md

Before writing code: read this file, then check docs/ for known issues,
the data model, and project conventions. Tests must pass after changes.

All project knowledge lives in docs/. Single source of truth.
Read docs/BUGS.md, docs/LESSONS.md, and docs/TECH-DEBT.md before any work.
EOF
# touch creates missing files and leaves existing content alone.
touch docs/BUGS.md docs/LESSONS.md docs/TECH-DEBT.md
echo "✅ docs/ + AGENTS.md ready: agent entry point is set"

Paste that in your project root. Creates docs/ as the single source of truth and AGENTS.md as the agent's entry point. The agent reads AGENTS.md first, everything else branches from docs/. Works with Claude Code, Codex, Cursor, Pi — any agent that respects project docs.

Your agent tooling only needs to know about AGENTS.md. That file points to docs/, so everything else is discoverable from one entry point.
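Because AGENTS.md is the single entry point, a stale path in it sends the agent nowhere. A small check like the following can catch that. It is a sketch under assumptions: the function name `check_agent_links` is made up, and it only looks for paths that start with `docs/` and contain no spaces.

```shell
# check_agent_links [ENTRY_FILE]
# Lists every docs/... path mentioned in the entry file that does not exist on disk.
# Returns 0 when all referenced paths exist, 1 otherwise.
check_agent_links() {
  local entry root status path
  entry="${1:-AGENTS.md}"
  root="$(dirname "$entry")"
  status=0
  # grep -o pulls out each docs/... token; sed drops trailing sentence punctuation.
  for path in $(grep -oE 'docs/[A-Za-z0-9._/-]+' "$entry" | sed 's/[.,]$//' | sort -u); do
    if [ ! -e "$root/$path" ]; then
      echo "missing: $path"
      status=1
    fi
  done
  return "$status"
}
```

Running it in CI or a pre-commit hook keeps the entry point honest as files get renamed.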

⚠️ The `/init` trap

Your favourite CLI will scan your project and generate documentation. Some of it will be wrong: wrong architecture labels, wrong folder purposes, wrong dependency descriptions. Service-layered and atomic architectures look identical to a scanner. They aren't.

Read every line it wrote. Find what's incorrect. Fix it. The effort isn't generating text — it's curation.

Phase two: Testing — the trust threshold

This is the hard part. It determines how much AI can assist and how much trust you can hand over. Evals, regression safety, autonomous debugging — everything lives or dies on whether this works.

Don't have a full suite yet? Start with one smoke test. One spec that proves auth works. Expand later — the first domino matters more than full coverage.

Authentication — the blocker

Most mature apps won't let an agent in without MFA, SSO, or some redirect chain. If a service account exists, use it. Otherwise:

Ideally, use a persistent browser profile. Log in once, save the session, and reuse it until it expires. That covers MFA, SSO, and redirect chains on day one. The command below saves the browser's storage state (cookies and local storage) to a file, which Playwright makes trivial:

npx playwright open --save-storage=.auth/profile.json
# login manually once in the headed browser window
# profile.json now saves the session — reuse everywhere

If you can't create a dedicated profile, clone your actual profile. Less ideal — you're coupling test setup to your personal session — but it works and gets you moving.
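A saved session file contains live cookies, so it should stay out of version control, and it will eventually go stale. The sketch below covers both concerns. The function names, the `.auth/profile.json` default (taken from the command above), and the seven-day refresh window are assumptions; tune the window to your identity provider's session lifetime.

```shell
# ignore_auth_dir [DIR]: make sure the session folder is listed in .gitignore (idempotent).
ignore_auth_dir() {
  local dir
  dir="${1:-.auth}"
  touch .gitignore
  grep -qxF "$dir/" .gitignore || printf '%s\n' "$dir/" >> .gitignore
}

# auth_state_ready [FILE] [MAX_AGE_DAYS]: succeed only if a non-empty, recent session file exists.
auth_state_ready() {
  local state max_age_days
  state="${1:-.auth/profile.json}"
  max_age_days="${2:-7}"   # assumed refresh window
  if [ ! -s "$state" ]; then
    echo "no saved session at $state; run the login step first"
    return 1
  fi
  # find prints the file only when it was last modified more than max_age_days ago.
  if [ -n "$(find "$state" -mtime +"$max_age_days" 2>/dev/null)" ]; then
    echo "session at $state is older than $max_age_days days; log in again"
    return 1
  fi
  echo "session ok: $state"
}
```

Call `auth_state_ready || exit 1` at the top of any script that launches tests, so a dead session fails fast instead of producing confusing test failures. To reopen a browser with the saved session, `npx playwright open --load-storage=.auth/profile.json` is the counterpart of the save command.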

Don't use CDP. Chrome DevTools Protocol sounds elegant. Every time I've tried it, it flakes. Connection drops, session expires mid-test, weird race conditions. More time debugging the connection than writing tests.

After auth — the details

Login scripts. Cookie popup dismissals. Wait-for-element by specific selectors. Build these once, commit them, forget them. The agent inherits the same setup — same profile, same scripts, same waits.

Phase three: MCP servers — extending what the agent can do

Once docs and auth are in place, the next step is giving the agent tools it doesn't have natively. MCP servers do that — they're project-local services the agent discovers and calls automatically.

Two that matter for project onboarding:

Playwright MCP
Drives a real browser. The agent navigates to URLs, clicks buttons, reads the DOM, takes screenshots. Uses the same persistent profile from phase two, so auth just works. Drop a link and describe the bug — the agent reproduces it, inspects the mismatch, and writes a fix. No manual reproduction steps, no screenshot pasting.

Configured in the project's .mcp.json, pointing at a dedicated Playwright instance with the persistent Chrome profile. The agent discovers it automatically on session start.

Context7 MCP
Gives the agent up-to-date documentation and code examples for any library or framework. When the agent needs to use an unfamiliar API, it queries Context7 instead of guessing or hallucinating. Covers the full ecosystem — React, Fastify, GraphQL, Playwright, everything with published docs.

Where to get these

Playwright: npm install -D @playwright/test, then npx playwright install chromium. The MCP server is published on npm as @playwright/mcp, or you can configure a server manually via the MCP specification.
Context7: Available at context7.com. Install their MCP server (published on npm as @upstash/context7-mcp) and configure it in your project.
Auth setup: Run npx playwright open --save-storage=.auth/profile.json, login once, reuse everywhere.
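For a sense of what the committed config might look like, here is a sketch in the same one-paste style. It assumes a client that reads a project-level .mcp.json with an `mcpServers` map (other tools use different files). The package names `@playwright/mcp` and `@upstash/context7-mcp` and the `--isolated` / `--storage-state` flags are assumptions to verify against each project's README or `--help` output before you commit. The function name `write_mcp_config` is hypothetical.

```shell
# write_mcp_config [FILE]: write a starter MCP config unless one already exists.
write_mcp_config() {
  local target
  target="${1:-.mcp.json}"
  if [ -e "$target" ]; then
    echo "$target already exists; leaving it alone"
    return 0
  fi
  # Package names and flags below are assumptions; check them before committing.
  cat > "$target" <<'EOF'
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--isolated", "--storage-state=.auth/profile.json"]
    },
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp"]
    }
  }
}
EOF
  echo "wrote $target"
}
```

The config file itself holds no secrets, so it is safe to commit. The session file it points at is not.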

Phase four: Feedback loop

Now you have docs, auth'd tests, and tool access. What connects them is the workflow:

Write a spec
Agent implements against it
Tests run — pass or fail
Agent fixes or you review the output
Commit

This is the cadence. It doesn't need tooling — a single cycle proves the chain works. The loop matters before the automation does.
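If you do want to script one cycle, the shape is small. This sketch assumes you pass in two commands of your own: one that runs the tests and one that hands the failure back to the agent. Both are placeholders, as is the function name `feedback_loop`. It retries a bounded number of times and then stops for a human.

```shell
# feedback_loop TEST_CMD FIX_CMD [MAX_ATTEMPTS]
# Returns 0 when tests pass, 1 when attempts run out, 2 when the fix command itself fails.
feedback_loop() {
  local test_cmd fix_cmd max attempt
  test_cmd="$1"; fix_cmd="$2"; max="${3:-3}"
  attempt=1
  while :; do
    if sh -c "$test_cmd"; then
      echo "tests passed on attempt $attempt; ready to review and commit"
      return 0
    fi
    if [ "$attempt" -ge "$max" ]; then
      echo "still failing after $max attempts; needs a human"
      return 1
    fi
    echo "tests failed on attempt $attempt; handing back for a fix"
    sh -c "$fix_cmd" || return 2
    attempt=$((attempt + 1))
  done
}
```

The bounded retry count is the design choice that matters: an agent that loops forever on a failing test burns tokens without telling you the spec was wrong.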

Useful add-ons (not built in, install them yourself)

Beyond the setup, there are third-party tools worth knowing about. These aren't built into any framework — you find and install them yourself:

Caveman — compressed specification writing for agent-friendly specs. Cuts token count ~75% while staying precise. Useful when you need the agent to work from a spec that's dense enough to fit in context and precise enough to not hallucinate around.

RTK (Rust Token Killer) — CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Filters and compresses command outputs (ls, git, cargo test, pytest, etc.) before they reach your LLM context. Single Rust binary, 100+ supported commands, <10ms overhead. Install via brew install rtk or rtk init --agent hermes. When work needs millions of tokens, RTK is the lever — cuts the noise, keeps the signal. github.com/rtk-ai/rtk

Ponytail — efficiency patterns for agentic development. Developer-laziness-as-virtue philosophy: ship what you own, validate what you don't. Covers code compression, structural laziness patterns, and avoiding over-engineering.

What this unlocks

Once docs, auth, and MCP tools exist, you've passed the threshold. Not the finish line — but the point where the agent becomes useful instead of dead weight.

Drop a link, describe the bug — the agent debugs it itself.
Playwright MCP opens the browser, navigates, interacts, reads the DOM. One prompt: "the dropdown on this page shows wrong values" → agent reproduces, identifies the mismatch between API response and render, fixes it.

Feed design screenshots — the agent builds from them.
Pixel-perfect? Not yet. Close enough that your job shifts from building to reviewing.

Ask for a new field — the agent checks the schema, migrates if needed, builds.
If your project has a committed schema (GraphQL, OpenAPI, types file), the agent reuses what exists or creates the migration. If it doesn't, the agent works from the existing code as the implicit contract. Either way, backend and frontend stay consistent without manual coordination.

Hand off to a colleague on another OS.
Windows, macOS, Linux — same setup, same agent, same reliability. I handed over a solution on a 2-hour call. The agent drove a persistent Chrome browser until deployment issues were resolved. No SSH. Just "fix this."

Ask it to fix code smells across the codebase.
With the test suite as a safety net, the agent can refactor confidently.

2026 best practices that changed the game

Agent-specific docs over human READMEs.
CLAUDE.md / AGENTS.md is the most important file in the project now. READMEs tell humans how to run it. Agent docs tell the agent how to think about it. Different audiences.

Document the data model.
If your project has a formal schema (GraphQL, OpenAPI, protobuf), commit it and make it the source of truth. If it doesn't, document the implicit contract — key types, API shapes, field meanings — in docs/SCHEMA.md. The agent reads this before touching data-layer code. The goal isn't a perfect schema, it's less guesswork.

Auth persistence as infrastructure.
Persistent browser profiles aren't a convenience. They're infrastructure. Without them, the agent burns context and time re-authenticating at the start of every session. With them, every session starts ready to work.

LESSONS.md as institutional memory.
What went wrong last sprint. Why the agent should never mutate cache in a certain path. This prevents more bugs than any lint rule.

Document the anti-patterns, not just the patterns.
What not to do is more valuable than what to do. An agent can guess the pattern. It can't guess the mistake you made three months ago unless you wrote it down.

MCP servers extend tool reach.
The trend in 2026 is project-local MCP servers that give the agent capabilities it doesn't have natively — browser driving (Playwright MCP), documentation lookup (Context7 MCP). Configure them in the project, commit the config, the agent discovers them automatically.

What still needs you

Cross-file refactors with implicit dependencies — the agent misses ripple effects across distant modules
Product judgment — anything that needs taste, not correctness
First-time architectural patterns — you scaffold the structure, the agent fills it in
Curating the docs baseline — the initial curation is still a human task

The ratio shifts hard. Most of what took hours now takes minutes of review.

When to stop

You don't need full coverage. You need enough coverage to trust the output. One auth'd E2E test, one docs baseline, one MCP server — that's the minimum viable setup. Everything else is compound returns.

If today wasn't the day — that's fine. The project will be here tomorrow. Pick one thing, move it forward, stop when you've made progress.

Phases one and two are the hard ones. The rest is where the compound returns live. You can stop after any of them.

Top comments (1)

Raju Dandigam • Jun 30

This is a useful way to think about AI adoption in real codebases. The question is not just “which agent should we use?” but “is this project legible enough for an agent to work safely?” Accurate docs, committed schemas, reliable tests, and clear auth flows are now part of the developer experience for both humans and AI tools. I especially like the idea of improving the project one phase at a time instead of trying to make the whole codebase agent-ready overnight.