I built 6 AI app boilerplates that actually compile (RAG, lead scoring, support triage, resume parser, Slack bot, web scraper)

#ai #typescript #python #programming

Every time I need to prototype an AI feature, I go looking for a clean starting point. Ninety percent of the time, what I find is a Medium article with 40 lines of Python that imports three packages without pinning their versions, calls openai.ChatCompletion.create (the pre-1.0 API, which no longer exists in current versions of the openai package), and has no error handling. I run it, it breaks, and I spend an hour debugging the wrong thing.

So I built the versions I actually want to start from.

What I built

Six complete AI app repos — TypeScript and Python, all using the Anthropic SDK:

rag-pdf-chat — Upload a PDF, chunk it with token-aware splitting, embed it, store the embeddings in a vector store (in-memory by default, Pinecone via env), and chat with the document. The answers include citations back to source chunks, so you can verify the model isn't hallucinating. It's a Next.js 14 App Router project with API routes and a minimal UI page.
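To make the chunking step concrete, here is a minimal sketch, written in Python for brevity even though this app is a Next.js project. It is not the repo's code. It assumes tokens can be approximated by whitespace-separated words (a real tokenizer would count differently), and the names Chunk, split_into_chunks, max_tokens, and overlap are hypothetical. Each chunk carries an index that a citation could point back to.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    index: int  # position of the chunk; usable as a citation id
    text: str


def split_into_chunks(text: str, max_tokens: int = 200, overlap: int = 20) -> list[Chunk]:
    """Split text into overlapping windows of at most max_tokens tokens.

    Assumption: tokens are approximated by whitespace-separated words.
    """
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")
    words = text.split()
    step = max_tokens - overlap  # how far the window advances each time
    chunks: list[Chunk] = []
    for start in range(0, len(words), step):
        window = words[start:start + max_tokens]
        chunks.append(Chunk(index=len(chunks), text=" ".join(window)))
        if start + max_tokens >= len(words):
            break  # the window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of embedding a few tokens twice.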

lead-scoring-api — POST /score with a lead JSON payload, get back { score: 0-100, reasons: string[], tier }. Weights are configurable (JSON file or env vars). Fires a webhook when score crosses a threshold, so you can pipe hot leads straight into your CRM or notification system.
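A weighted score of this shape can be sketched in a few lines. This is an illustration under stated assumptions, not the repo's implementation: the signal names, the weights, the tier cut-offs (80 and 50), and the helper names score_lead and should_fire_webhook are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical weights; the real app reads its own from a JSON file or env vars.
DEFAULT_WEIGHTS = {"company_size": 50, "budget": 30, "engagement": 20}


@dataclass
class ScoreResult:
    score: int          # 0-100
    reasons: list[str]
    tier: str           # "hot", "warm", or "cold" (assumed tier names)


def score_lead(signals: dict[str, float], weights: dict[str, int] = DEFAULT_WEIGHTS) -> ScoreResult:
    """Combine per-signal values in [0, 1] into a 0-100 weighted score."""
    total_weight = sum(weights.values())
    raw = 0.0
    reasons: list[str] = []
    for name, weight in weights.items():
        value = min(max(signals.get(name, 0.0), 0.0), 1.0)  # clamp to [0, 1]
        raw += weight * value
        if value >= 0.7:
            reasons.append(f"strong {name}")
    score = round(100 * raw / total_weight) if total_weight else 0
    tier = "hot" if score >= 80 else "warm" if score >= 50 else "cold"
    return ScoreResult(score=score, reasons=reasons, tier=tier)


def should_fire_webhook(result: ScoreResult, threshold: int = 80) -> bool:
    """True when the score is high enough to notify a CRM or channel."""
    return result.score >= threshold
```

Keeping the webhook decision in its own function makes the threshold easy to test without any HTTP call.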

support-triage-agent — A FastAPI webhook that receives a support ticket and classifies it ({ category: bug|billing|howto|feature, priority: P0-P3, confidence }). When confidence is above a threshold it drafts a reply; when it's below, it escalates to a human. No LangChain — just the Anthropic SDK and Pydantic v2 models throughout.
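The confidence gate is the core of that flow. A minimal sketch follows, using a plain dataclass to stay dependency-free (the app itself uses Pydantic v2 models). The threshold value of 0.75, the function name route, and the choice to treat a confidence exactly equal to the threshold as confident are assumptions.

```python
from dataclasses import dataclass
from typing import Literal

Category = Literal["bug", "billing", "howto", "feature"]
Priority = Literal["P0", "P1", "P2", "P3"]


@dataclass
class Classification:
    category: Category
    priority: Priority
    confidence: float  # expected to be between 0.0 and 1.0


def route(result: Classification, threshold: float = 0.75) -> str:
    """Return 'draft_reply' for confident classifications, else 'escalate'."""
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Assumption: a confidence equal to the threshold counts as confident.
    return "draft_reply" if result.confidence >= threshold else "escalate"
```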

resume-parser — POST /parse with a PDF or DOCX file, get back structured JSON validated by a Pydantic schema: contact info, skills array, experience array (with dates, company, role, bullets), education array. Uses pypdf and python-docx for text extraction. A note documents the OCR fallback for scanned-image PDFs.

slack-ai-bot — Replies to app_mention events with a Claude-generated response that uses the last N messages of the channel as context, so replies stay coherent with the conversation. A /summarize slash command summarizes a thread. Per-thread conversation history is kept in an in-memory store behind a pluggable interface.
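One way to picture a pluggable history store is an interface plus an in-memory default. The sketch below is an assumption about the shape, not the repo's code: HistoryStore, InMemoryHistoryStore, and max_messages are hypothetical names.

```python
from collections import deque
from typing import Protocol


class HistoryStore(Protocol):
    """Interface any backing store (memory, Redis, a database) would satisfy."""

    def append(self, thread_id: str, role: str, text: str) -> None: ...
    def recent(self, thread_id: str) -> list[dict[str, str]]: ...


class InMemoryHistoryStore:
    """Keeps only the last max_messages messages per thread."""

    def __init__(self, max_messages: int = 20) -> None:
        self._max = max_messages
        self._threads: dict[str, deque] = {}

    def append(self, thread_id: str, role: str, text: str) -> None:
        history = self._threads.setdefault(thread_id, deque(maxlen=self._max))
        history.append({"role": role, "content": text})  # oldest entry drops off when full

    def recent(self, thread_id: str) -> list[dict[str, str]]:
        return list(self._threads.get(thread_id, ()))
```

Because handlers would depend only on the two methods, swapping the in-memory class for a persistent one should not touch the bot logic.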

web-scraper-llm — POST /scrape with { url, schema } where schema is a JSON shape describing what you want extracted. The app fetches the page (rotating User-Agent pool), cleans the HTML to text, passes it to Claude, and returns data matching your schema. Local-fixture mode so tests don't hit the network.
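The "clean the HTML to text" step can be sketched with only the standard library. This is a minimal illustration, not the app's implementation: the names html_to_text and _TextExtractor are hypothetical, and the set of skipped tags is an assumption.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and ignores script, style, and noscript content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document as one space-joined string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

Stripping markup before the model call keeps the prompt shorter, which matters for both cost and context limits.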

The thing that makes them actually useful: offline tests

Every app has a test suite that runs without an API key or any network access. The Anthropic client is injected via a factory function — tests swap in a fake. Pinecone, Slack, and HTTP are all mocked.
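The factory pattern described above can be sketched as follows. The names LLMClient, FakeClient, summarize, and the complete method are hypothetical; a production factory would return a thin wrapper around the real SDK client instead of the fake.

```python
from typing import Callable, Protocol


class LLMClient(Protocol):
    """The small surface the app code depends on (assumed shape)."""

    def complete(self, prompt: str) -> str: ...


class FakeClient:
    """Test double: records prompts and returns a canned reply, no network."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.prompts: list[str] = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.reply


def summarize(text: str, client_factory: Callable[[], LLMClient]) -> str:
    """App code asks the factory for a client instead of constructing one."""
    client = client_factory()
    return client.complete(f"Summarize in one sentence:\n{text}").strip()
```

In a test, passing `lambda: FakeClient("canned reply")` as the factory is enough to exercise the whole code path without an API key.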

app                   tests   result
──────────────────────────────────────
rag-pdf-chat          19      ✓ passed
lead-scoring-api      14      ✓ passed
support-triage-agent  12      ✓ passed
resume-parser          4      ✓ passed
slack-ai-bot           5      ✓ passed
web-scraper-llm        9      ✓ passed
──────────────────────────────────────
Total                 63      all green

When I say they "actually compile" I mean: npm install && npm run build && npm test goes green on a fresh clone of each TypeScript app, and pip install -r requirements.txt && pytest goes green on each Python app. No secret environment variables are required for CI.

Why this matters more than you'd think

The reason AI-generated code is often useless in practice isn't that the logic is wrong — it's that it's missing the boring parts:

No lockfile, so npm install resolves different versions on your machine
No .env.example, so you don't know what to set
No retry logic, so one 429 from the LLM crashes the process
No error handling, so you get an unhandled exception instead of a { error: "..." } response
No tests, so you have no way to verify your changes don't break the happy path

These apps have all of that. Not because it's impressive — because it's what makes the difference between "I pasted this and it worked" and "I spent a day debugging before giving up."
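As an illustration of the retry item in the list above, here is a minimal sketch of exponential backoff with jitter. It is a generic pattern, not the repo's code: RateLimited stands in for whatever error the SDK raises on a 429, and with_retries, max_attempts, and base_delay are hypothetical names and defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Stand-in for the error raised on an HTTP 429 response."""


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on RateLimited with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller turn this into an error response
            delay = base_delay * 2 ** (attempt - 1)  # 0.5s, 1s, 2s, ...
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
    raise AssertionError("unreachable")
```

Injecting sleep lets a test record the delays instead of actually waiting.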

Shared conventions and the prompts library

All six apps follow the same conventions: TypeScript apps use Vitest, ESM, strict mode, and Zod for input validation. Python apps use pytest and httpx for the test client. Every server exposes GET /health.

The prompt templates live in apps/_shared/prompts/ — one Markdown file per app. The apps load them at runtime, so you can tune a prompt without touching application code.
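Loading a prompt at runtime could look roughly like this. The directory comes from the post; everything else is an assumption for illustration: the file naming (one `<app-name>.md` per app), the `$placeholder` syntax, and the function name load_prompt.

```python
from pathlib import Path
from string import Template

PROMPTS_DIR = Path("apps/_shared/prompts")


def load_prompt(app_name: str, values: dict[str, str], prompts_dir: Path = PROMPTS_DIR) -> str:
    """Read <prompts_dir>/<app_name>.md and fill in its $placeholders.

    Assumption: templates use string.Template syntax. substitute() raises
    KeyError when a placeholder has no value, which surfaces typos early.
    """
    path = prompts_dir / f"{app_name}.md"
    template = Template(path.read_text(encoding="utf-8"))
    return template.substitute(values)
```

Since the file is read on each call, editing the Markdown changes the prompt on the next request without a code change.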

Cost estimates

Lead scoring and resume parsing are cheap — those run on claude-haiku-4-5 and cost under $1 for 1,000 requests. RAG chat and web scraping run on claude-sonnet-4-6 for quality and cost more. Full estimates (1k/10k/100k requests) are in COSTS.md.
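The arithmetic behind an estimate like this is simple enough to sketch. The function below is a generic calculation, not taken from COSTS.md. The token counts and per-million-token prices in the usage note are placeholders chosen for illustration, so check them against current pricing and your own traffic before relying on the result.

```python
def estimate_cost_usd(
    requests: int,
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimate total cost from average tokens per request and prices per million tokens."""
    input_cost = requests * input_tokens / 1_000_000 * input_price_per_mtok
    output_cost = requests * output_tokens / 1_000_000 * output_price_per_mtok
    return input_cost + output_cost
```

With placeholder inputs of 1,000 requests, 400 input and 100 output tokens per request, and assumed prices of $1 and $5 per million tokens, the result is 0.40 + 0.50 = $0.90. Output tokens dominate quickly when they are priced several times higher than input tokens.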