How We Built RepoSherpa to Onboard Engineers in Minutes

#python #slackapp #ai #webdev

The Pain: “Where’s the README?”

Every new engineer joins Slack, asks a question, and waits for someone to paste the same doc link. We were spending hours re‑explaining how to run tests or deploy. RepoSherpa removes that bottleneck by answering onboarding questions directly inside Slack, backed by the repo’s own docs.

High-Level Architecture

Ingestion pipeline: clone a repo, chunk markdown, embed with OpenAI text-embedding-3-large, store vectors in PostgreSQL (pgvector-ready).
Retrieval + LLM: on each question, fetch relevant chunks and craft a prompt for gpt-4o-mini, streaming responses back to Slack.
Slack interface: slash commands (/onboard, /readme, /repo-status) plus an app_mention handler, served by Slack Bolt on FastAPI.
Background jobs: Celery + Redis coordinate ingestion, reindexing, and scheduled refreshes.

Building Blocks

FastAPI + Slack Bolt

FastAPI exposes /slack/events and dev-only helper routes (/dev/onboard, /dev/question) so we can test without Slack. The Slack Bolt AsyncApp registers slash commands and app mentions, delegating to an OnboardingService.

Chunking + Embedding

Markdown files are split into ~1KB chunks that carry their section context. We embed each chunk with OpenAI’s text-embedding-3-large and store the vectors as JSON in Postgres, in a layout that can move to pgvector later. This keeps the stack SQLModel-friendly and similarity search simple, with pgvector as the upgrade path for faster search.
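As an illustration of chunking with section context, here is a minimal sketch using only the standard library. The Chunk type, the chunk_markdown name, and the "A > B" heading-path format are assumptions, not RepoSherpa's actual implementation.

```python
import re
from dataclasses import dataclass

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Chunk:
    section: str  # heading path, e.g. "Setup > Tests"
    text: str


def chunk_markdown(markdown: str, max_bytes: int = 1024) -> list[Chunk]:
    """Split markdown into chunks of roughly max_bytes, tagged with their heading path."""
    chunks: list[Chunk] = []
    path: list[str] = []    # current stack of headings
    buffer: list[str] = []  # lines of the chunk being built
    size = 0

    def flush() -> None:
        nonlocal size
        text = "\n".join(buffer).strip()
        if text:
            chunks.append(Chunk(" > ".join(path), text))
        buffer.clear()
        size = 0

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the chunk under the previous heading
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(match.group(2).strip())
            continue
        line_size = len(line.encode("utf-8")) + 1  # +1 for the newline
        if buffer and size + line_size > max_bytes:
            flush()
        buffer.append(line)
        size += line_size
    flush()
    return chunks
```

This sketch treats any line starting with # as a heading, so comments inside fenced code blocks would be misread, and a single line longer than max_bytes becomes its own oversized chunk. A production chunker would need to handle both.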

Retrieval-Augmented Generation

When someone types /readme How do I run migrations?, RepoSherpa:

Finds repo-channel mapping.
Runs similarity search over stored chunks.
Renders a prompt summarizing the chunks + question.
Calls OpenAI gpt-4o-mini with streaming enabled.
Returns Slack blocks: final answer + context citations.
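
The similarity search and prompt rendering steps above can be sketched with the standard library alone. This assumes embeddings are stored as JSON arrays and compared by cosine similarity. The row shape, function names, and prompt wording are hypothetical, and the query embedding would come from the same embedding model in practice.

```python
import json
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_chunks(query_vec, rows, k=3):
    """rows: (section, text, embedding_json) tuples, as they might be read from Postgres."""
    scored = [
        (cosine(query_vec, json.loads(embedding)), section, text)
        for section, text, embedding in rows
    ]
    scored.sort(key=lambda item: item[0], reverse=True)  # best match first
    return scored[:k]


def render_prompt(question, hits):
    """Number each retrieved chunk so the answer can cite its sources."""
    context = "\n\n".join(
        f"[{i}] ({section})\n{text}" for i, (_, section, text) in enumerate(hits, 1)
    )
    return (
        "Answer the question using only the numbered context below, "
        "and cite the context numbers you relied on.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Scanning every row in Python is linear in the number of chunks, which is why a vector index becomes attractive as the corpus grows.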

Operational Glue

Celery workers handle embeddings, ingestion, and scheduled reindexing.
Redis serves as the Celery broker and backs distributed locks.
Alembic migrations manage the Postgres schema.
Environment-aware DB names (sherpa_dev, sherpa_test, etc.) keep developers isolated.
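
For the environment-aware database names, one way to derive them is a small lookup keyed on ENVIRONMENT. This is a sketch only: the mapping from ENVIRONMENT values to suffixes is an assumption (only sherpa_dev and sherpa_test appear above), and database_name is a hypothetical helper.

```python
import os
from typing import Optional

# Assumed mapping from ENVIRONMENT values to database-name suffixes.
_SUFFIXES = {"development": "dev", "test": "test"}


def database_name(environment: Optional[str] = None) -> str:
    """Return the database name for the given (or current) environment."""
    env = environment or os.environ.get("ENVIRONMENT", "development")
    if env not in _SUFFIXES:
        # Fail loudly rather than silently connecting to the wrong database.
        raise ValueError(f"unknown ENVIRONMENT: {env!r}")
    return f"sherpa_{_SUFFIXES[env]}"
```

Raising on an unknown environment keeps a typo in ENVIRONMENT from pointing a test run at a developer's database.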

Local Dev Workflow

Run uvicorn app.main:app --reload and celery -A worker.celery_app worker.
Expose FastAPI via ngrok http 8000; paste the /slack/events URL into Slack Event Subscriptions and Slash Commands.
Use /dev/onboard + /dev/question routes (only when ENVIRONMENT=development) to test ingestion + retrieval without Slack.
Once happy, install the Slack app, map a channel to a repo using /onboard, and start asking /readme questions.

Lessons Learned

Small chunk sizes win: sub-1KB markdown slices reduced hallucinations because context stays tight.
Streaming LLM answers boost UX; Slack users see replies build in real time.
Slash command hygiene: every command shares the same Request URL, so we document the required scopes (commands, chat:write, app_mentions:read) and remind folks to reinstall the app after changing them.
Dev-mode APIs remove friction; teammates can curl ingestion endpoints before wiring Slack.
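
On the streaming point: posting one Slack update per token would be noisy, so a common approach is to coalesce the streamed deltas and update the message only after enough new text has arrived. The post does not say how RepoSherpa batches its updates. The sketch below shows one possible approach, and coalesce_stream and min_chars are hypothetical names.

```python
from typing import Iterable, Iterator


def coalesce_stream(deltas: Iterable[str], min_chars: int = 80) -> Iterator[str]:
    """Yield the full text so far each time at least min_chars of new text has arrived."""
    text = ""
    flushed = 0  # length of text at the last yield
    for delta in deltas:
        text += delta
        if len(text) - flushed >= min_chars:
            flushed = len(text)
            yield text
    if len(text) > flushed:
        yield text  # final flush so a trailing partial batch is not lost


# Each snapshot would replace the Slack message body (for example via chat.update).
snapshots = list(coalesce_stream(["Run ", "alembic ", "upgrade ", "head."], min_chars=10))
# snapshots == ["Run alembic ", "Run alembic upgrade head."]
```

Yielding the full text rather than the delta suits Slack, since updating a message replaces its whole body.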

What’s Next

pgvector-native storage for faster ANN search.
Fine-grained repo access controls across Slack channels.
Git webhook triggers for automatic reindex when docs change.
Additional connectors (Confluence, Notion) feeding the same retrieval stack.

Try RepoSherpa

RepoSherpa is open source (Python 3.11). Clone it, set SLACK_BOT_TOKEN, SLACK_SIGNING_SECRET, OPENAI_API_KEY, and DATABASE_URL, then run the dev stack. Contributions welcome — especially around chunking strategies, retrieval heuristics, and new Slack workflows.