“You’re not underqualified — your resume just isn’t speaking the same language as the job description.”
That thought became the seed for AlignCV, an open-source AI tool that helps people align their resumes with job descriptions using semantic analysis and rewriting.
This is the story of how it evolved from a local FastAPI script to a fully modular AI resume engine — with real-time job matching, Supabase backend, Qdrant vector search, and plenty of late-night debugging.
The Idea
Every developer or job seeker has experienced this:
You apply to multiple roles. You wait. And you hear nothing back.
Most of the time, it’s not about skill — it’s about alignment. Recruiters use automated filters that look for specific keywords and phrasing. If your resume doesn’t match that pattern, it never reaches human eyes.
I wanted to fix that.
So I built something that could compare a resume to a job description using semantic similarity — highlighting missing keywords, strengths, and areas to improve. That’s how AlignCV started.
Building V1 — FastAPI, Streamlit, and Pure Focus
The first version of AlignCV was minimal:
- Backend: FastAPI + Sentence-BERT (MiniLM-L6-v2)
- Frontend: Streamlit single-page app
- Endpoints: /analyze and /health
- Privacy: 100% local — no accounts, no tracking
The analysis was simple: paste your resume and job description, get a similarity score, and view what you could improve.
Performance mattered too. With LRU caching, response times went from 3–5s down to <1s. Memory stayed around 500MB.
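For flavor, here's a minimal sketch of what that V1 flow looks like; the request fields, cache size, and exact endpoint shapes are illustrative rather than the production code.

```python
from functools import lru_cache

from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer, util

app = FastAPI()
model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings

class AnalyzeRequest(BaseModel):
    resume_text: str
    job_description: str

@lru_cache(maxsize=256)
def embed(text: str):
    # Cache embeddings so repeat analyses of the same text skip the model call
    return model.encode(text, convert_to_tensor=True)

@app.post("/analyze")
def analyze(req: AnalyzeRequest):
    score = util.cos_sim(embed(req.resume_text), embed(req.job_description)).item()
    return {"similarity": round(score, 3)}

@app.get("/health")
def health():
    return {"status": "ok"}
```

Caching the embeddings is what cuts the repeat-analysis latency: the second time the same text comes in, the model call is skipped entirely.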
By October 2025, V1 shipped — stable, private, fully documented, and passing all 38 unit tests. It was simple but it worked.
Scaling Up — Turning a Prototype into a Product
After the first release, I wanted AlignCV to do more than compare text.
It needed to help users actually act on the insights — rewrite resumes, manage uploads, and find matching jobs.
That’s when V2 was born.
What Changed
- Modular FastAPI routers for auth, documents, AI, jobs, notifications
- Supabase as the new backend (Postgres + Row-Level Security)
- Qdrant for vector search and job recommendations
- Groq’s LLaMA 3.1 8B Instant model for resume rewriting
- Expanded test coverage for all new routes
It went from a cool NLP demo to a complete AI-driven resume platform.
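The "modular routers" part is just the standard FastAPI pattern of one APIRouter per feature area. The sketch below compresses it into a single file with placeholder endpoints; the real project splits each router into its own module.

```python
from fastapi import APIRouter, FastAPI

# One APIRouter per feature area; in the real project each lives in its own module
auth = APIRouter(prefix="/auth", tags=["auth"])
jobs = APIRouter(prefix="/jobs", tags=["jobs"])

@auth.get("/me")
def current_user():
    return {"user": "demo"}

@jobs.get("/matches")
def job_matches():
    return {"jobs": []}

app = FastAPI(title="AlignCV API", version="2.0.0")
for router in (auth, jobs):
    app.include_router(router)
```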
The AI Rewrite Engine
The rewrite engine was the first big leap in V2.
It analyzed a resume, suggested better phrasing, and generated new versions for different roles.
Initially, I used Mistral, but after several API 422 errors and response mismatches, I switched to Groq’s LLaMA 3.1 8B Instant model.
That decision changed everything — faster inference, better formatting, and a reliable free tier for experimentation.
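Under the hood, the rewrite step is essentially one chat completion against Groq's Python SDK. The prompt and function below are a simplified stand-in for the real pipeline.

```python
import os

from groq import Groq  # pip install groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def rewrite_resume(resume_text: str, job_description: str) -> str:
    # Illustrative prompt; the production prompt is more structured than this
    completion = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=[
            {
                "role": "system",
                "content": "Rewrite resume bullet points so they align with the target job description.",
            },
            {
                "role": "user",
                "content": f"Job description:\n{job_description}\n\nResume:\n{resume_text}",
            },
        ],
        temperature=0.3,
    )
    return completion.choices[0].message.content
```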
Of course, nothing came easy. The frontend sent resume_text, while the backend expected resume_id. One tiny mismatch caused hours of debugging.
The fix? Align the schemas and refresh the cached frontend files.
Lesson learned: always validate payloads on both sides.
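A strict Pydantic request model is one way to enforce that on the backend side. The field names here are illustrative, but the idea is to make a wrong payload fail loudly with a 422 instead of being silently ignored.

```python
from pydantic import BaseModel, ConfigDict

class RewriteRequest(BaseModel):
    # extra="forbid" rejects unknown keys, so a stray resume_text in the body
    # surfaces as a 422 instead of being silently dropped
    model_config = ConfigDict(extra="forbid")

    resume_id: str
    target_role: str | None = None
```

Combined with FastAPI's auto-generated OpenAPI schema, this gives the frontend a single source of truth for what each endpoint actually expects.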
Vector Search and Job Matching
After rewriting resumes, it made sense to find matching jobs automatically.
I added Qdrant as the vector store and upgraded embeddings from MiniLM (384-dim) → BAAI/bge-base-en-v1.5 (768-dim).
This unlocked accurate semantic matching for job descriptions — users could now see which openings best aligned with their resume content.
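The matching flow itself is straightforward: embed each job description once, store it in Qdrant, then query with the resume embedding. The collection name and payload fields below are illustrative, not the exact schema.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("BAAI/bge-base-en-v1.5")  # 768-dim embeddings
client = QdrantClient(url="http://localhost:6333")      # or a Qdrant Cloud URL

# One-time setup: a cosine-distance collection sized for BGE vectors
client.create_collection(
    collection_name="jobs",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)

# Index job descriptions as they are ingested
jobs = [{"id": 1, "title": "Backend Engineer", "description": "FastAPI, Postgres, vector search"}]
client.upsert(
    collection_name="jobs",
    points=[
        PointStruct(id=job["id"], vector=encoder.encode(job["description"]).tolist(), payload=job)
        for job in jobs
    ],
)

# Match a resume against every indexed job
hits = client.search(
    collection_name="jobs",
    query_vector=encoder.encode("Python developer with FastAPI and Postgres experience").tolist(),
    limit=5,
)
for hit in hits:
    print(hit.payload["title"], round(hit.score, 3))
```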
But then came the infamous ID bug:
The frontend sent a UUID, while the backend expected a TEXT job_id.
The result? Every “apply” and “bookmark” action failed silently.
A one-line fix — standardizing the payload key — solved it.
It’s always the small things that break the biggest flows.
Supabase Migration
At this point, Firebase and local SQLite setups felt too limited.
I migrated everything to Supabase.
The migration process was... bumpy.
Indexes tried to create themselves before tables existed.
Schema updates failed mid-run.
Fresh databases crashed tests.
To fix this, I wrote idempotent SQL scripts:
- supabase_complete_schema.sql
- supabase_performance_indexes.sql
They could safely rerun without breaking anything.
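In spirit, the scripts lean on IF NOT EXISTS everywhere, so rerunning them on a half-migrated database is a no-op. A stripped-down sketch, with an illustrative table and a plain Postgres connection string (Supabase exposes one):

```python
import os

import psycopg2  # Supabase databases accept standard Postgres connections

IDEMPOTENT_SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX IF NOT EXISTS idx_documents_user_id ON documents (user_id);
"""

def migrate() -> None:
    # IF NOT EXISTS makes this safe to rerun on a half-migrated database
    with psycopg2.connect(os.environ["SUPABASE_DB_URL"]) as conn, conn.cursor() as cur:
        cur.execute(IDEMPOTENT_SCHEMA)

if __name__ == "__main__":
    migrate()
```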
That move gave AlignCV the structure it needed for long-term growth.
Notifications and Final Touches
By the late stages of V2, AlignCV had notifications, unread counts, and document tracking.
And of course — more bugs.
One of the weirdest ones:
'str' object has no attribute 'get'
Turns out the backend returned a dict, while the frontend parsed it as a list.
Once I fixed the JSON structure, everything started working again.
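The durable fix was making the frontend parser explicit about which shapes it accepts; a small sketch with illustrative keys:

```python
def parse_notifications(payload):
    """Accept the current dict shape or a bare list, and fail loudly on anything else."""
    if isinstance(payload, dict):
        return payload.get("notifications", []), payload.get("unread_count", 0)
    if isinstance(payload, list):
        return payload, sum(1 for item in payload if not item.get("read", False))
    raise TypeError(f"Unexpected notifications payload: {type(payload).__name__}")
```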
These moments — frustrating but rewarding — were the essence of building AlignCV.
The Stack Today
- Backend: FastAPI (modular V2 architecture)
- AI: Groq LLaMA 3.1 8B Instant
- Database: Supabase (Postgres + RLS)
- Storage: Supabase Storage
- Vector Store: Qdrant
- Doc Processing: PyMuPDF + python-docx
- Embeddings: BGE-base-en-v1.5
- Caching: Redis (Upstash)
- Frontend: Streamlit multi-page app
- Task Queue: Celery
- Email: SendGrid
- NLP: SpaCy (en_core_web_sm)
Every major feature is tested, documented, and versioned.
What I Learned
- Ship early, optimize later. V1’s simplicity gave me confidence to expand.
- Schemas break silently. Test both ends of your API often.
- Docs are not optional. Future you will thank past you for every README.
- Switch tools when needed. Mistral → Groq made a massive difference.
- Vibe coding works. Some of the best features came from planned AI coding sessions.
AlignCV isn’t finished — it’s evolving.
You can check out the code here:
👉 github.com/Pratham-Dabhane/AlignCV
Learn in public. Code in flow. Ship the vibe.
That’s the mantra that built AlignCV.
Top comments (2)
Love the "ship early, optimize later" mentality! Your Mistral → Groq pivot is a masterclass in pragmatic engineering—switching tools mid-project when data shows it's better is underrated.
The semantic alignment angle is brilliant. Most ATS-beating tools just do keyword stuffing, but using BGE-base-en-v1.5 for embeddings means you're capturing actual meaning. A resume that says "led engineering teams" will correctly match "managed technical staff" even though there's zero keyword overlap.
Quick architecture question: Why BGE-base-en-v1.5 for embeddings but LLaMA 3.1 8B for generation? I'm curious if you tested using the same model family end-to-end (like LLaMA for both embedding and rewriting) to see if semantic consistency improved. Or was BGE's embedding quality just too good to pass up?
Also, 38 unit tests for V1 is impressive discipline for a weekend project. Most builders would've shipped tests after users complained 😅
Starred the repo—excited to see where you take the job matching features!
Thanks for the thoughtful note, Alex 😇—exactly the goal: semantic alignment over keyword stuffing. The early 38 tests kept regressions in check during the Mistral → Groq pivot. Next up: tightening job matching UX and adding small quality heuristics to the ranking pipeline.
Initially, I did try "one family end-to-end" (Mistral for both rewriting and embeddings by pooling its hidden states), but retrieval was weak (top-5 recall ~74% on ~500 jobs, more false positives). BGE-base-en-v1.5 is much stronger for search (MTEB-leading, high recall, 768-dim), so I switched to BGE for embeddings and LLaMA 3.1 8B for generation (fast, better instruction following on Groq). Best accuracy + speed combo.