“You’re not underqualified — your resume just isn’t speaking the same language as the job description.”
That thought became the seed for AlignCV, an open-source AI tool that helps people align their resumes with job descriptions using semantic analysis and rewriting.
This is the story of how it evolved from a local FastAPI script to a fully modular AI resume engine — with real-time job matching, Supabase backend, Qdrant vector search, and plenty of late-night debugging.
The Idea
Every developer or job seeker has experienced this:
You apply to multiple roles. You wait. And you hear nothing back.
Most of the time, it’s not about skill — it’s about alignment. Recruiters use automated filters that look for specific keywords and phrasing. If your resume doesn’t match that pattern, it never reaches human eyes.
I wanted to fix that.
So I built something that could compare a resume to a job description using semantic similarity — highlighting missing keywords, strengths, and areas to improve. That’s how AlignCV started.
Building V1 — FastAPI, Streamlit, and Pure Focus
The first version of AlignCV was minimal:
- Backend: FastAPI + Sentence-BERT (MiniLM-L6-v2)
- Frontend: Streamlit single-page app
- Endpoints: /analyze and /health
- Privacy: 100% local — no accounts, no tracking
The analysis was simple: paste your resume and job description, get a similarity score, and view what you could improve.
Performance mattered too. With LRU caching, response times went from 3–5s down to <1s. Memory stayed around 500MB.
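For flavor, here's a minimal sketch of what that V1 flow looks like; the request fields, cache size, and exact endpoint shapes are illustrative rather than the production code.

```python
from functools import lru_cache

from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer, util

app = FastAPI()
model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings

class AnalyzeRequest(BaseModel):
    resume_text: str
    job_description: str

@lru_cache(maxsize=256)
def embed(text: str):
    # Cache embeddings so repeat analyses of the same text skip the model call
    return model.encode(text, convert_to_tensor=True)

@app.post("/analyze")
def analyze(req: AnalyzeRequest):
    score = util.cos_sim(embed(req.resume_text), embed(req.job_description)).item()
    return {"similarity": round(score, 3)}

@app.get("/health")
def health():
    return {"status": "ok"}
```

Caching the embeddings is what cuts the repeat-analysis latency: the second time the same text comes in, the model call is skipped entirely.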
By October 2025, V1 shipped — stable, private, fully documented, and passing all 38 unit tests. It was simple but it worked.
Scaling Up — Turning a Prototype into a Product
After the first release, I wanted AlignCV to do more than compare text.
It needed to help users actually act on the insights — rewrite resumes, manage uploads, and find matching jobs.
That’s when V2 was born.
What Changed
- Modular FastAPI routers for auth, documents, AI, jobs, notifications
- Supabase as the new backend (Postgres + Row-Level Security)
- Qdrant for vector search and job recommendations
- Groq’s LLaMA 3.1 8B Instant model for resume rewriting
- Expanded test coverage for all new routes
It went from a cool NLP demo to a complete AI-driven resume platform.
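The "modular routers" part is just the standard FastAPI pattern of one APIRouter per feature area. The sketch below compresses it into a single file with placeholder endpoints; the real project splits each router into its own module.

```python
from fastapi import APIRouter, FastAPI

# One APIRouter per feature area; in the real project each lives in its own module
auth = APIRouter(prefix="/auth", tags=["auth"])
jobs = APIRouter(prefix="/jobs", tags=["jobs"])

@auth.get("/me")
def current_user():
    return {"user": "demo"}

@jobs.get("/matches")
def job_matches():
    return {"jobs": []}

app = FastAPI(title="AlignCV API", version="2.0.0")
for router in (auth, jobs):
    app.include_router(router)
```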
The AI Rewrite Engine
The rewrite engine was the first big leap in V2.
It analyzed a resume, suggested better phrasing, and generated new versions for different roles.
Initially, I used Mistral, but after several API 422 errors and response mismatches, I switched to Groq’s LLaMA 3.1 8B Instant model.
That decision changed everything — faster inference, better formatting, and a reliable free tier for experimentation.
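Under the hood, the rewrite step is essentially one chat completion against Groq's Python SDK. The prompt and function below are a simplified stand-in for the real pipeline.

```python
import os

from groq import Groq  # pip install groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def rewrite_resume(resume_text: str, job_description: str) -> str:
    # Illustrative prompt; the production prompt is more structured than this
    completion = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=[
            {
                "role": "system",
                "content": "Rewrite resume bullet points so they align with the target job description.",
            },
            {
                "role": "user",
                "content": f"Job description:\n{job_description}\n\nResume:\n{resume_text}",
            },
        ],
        temperature=0.3,
    )
    return completion.choices[0].message.content
```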
Of course, nothing came easy. The frontend sent resume_text, while the backend expected resume_id. One tiny mismatch caused hours of debugging.
The fix? Align the schemas and refresh the cached frontend files.
Lesson learned: always validate payloads on both sides.
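A strict Pydantic request model is one way to enforce that on the backend side. The field names here are illustrative, but the idea is to make a wrong payload fail loudly with a 422 instead of being silently ignored.

```python
from pydantic import BaseModel, ConfigDict

class RewriteRequest(BaseModel):
    # extra="forbid" rejects unknown keys, so a stray resume_text in the body
    # surfaces as a 422 instead of being silently dropped
    model_config = ConfigDict(extra="forbid")

    resume_id: str
    target_role: str | None = None
```

Combined with FastAPI's auto-generated OpenAPI schema, this gives the frontend a single source of truth for what each endpoint actually expects.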
Vector Search and Job Matching
After rewriting resumes, it made sense to find matching jobs automatically.
I added Qdrant as the vector store and upgraded embeddings from MiniLM (384-dim) → BAAI/bge-base-en-v1.5 (768-dim).
This unlocked accurate semantic matching for job descriptions — users could now see which openings best aligned with their resume content.
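The matching flow itself is straightforward: embed each job description once, store it in Qdrant, then query with the resume embedding. The collection name and payload fields below are illustrative, not the exact schema.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("BAAI/bge-base-en-v1.5")  # 768-dim embeddings
client = QdrantClient(url="http://localhost:6333")      # or a Qdrant Cloud URL

# One-time setup: a cosine-distance collection sized for BGE vectors
client.create_collection(
    collection_name="jobs",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)

# Index job descriptions as they are ingested
jobs = [{"id": 1, "title": "Backend Engineer", "description": "FastAPI, Postgres, vector search"}]
client.upsert(
    collection_name="jobs",
    points=[
        PointStruct(id=job["id"], vector=encoder.encode(job["description"]).tolist(), payload=job)
        for job in jobs
    ],
)

# Match a resume against every indexed job
hits = client.search(
    collection_name="jobs",
    query_vector=encoder.encode("Python developer with FastAPI and Postgres experience").tolist(),
    limit=5,
)
for hit in hits:
    print(hit.payload["title"], round(hit.score, 3))
```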
But then came the infamous ID bug:
The frontend sent a UUID, while the backend expected a TEXT job_id.
The result? Every “apply” and “bookmark” action failed silently.
A one-line fix — standardizing the payload key — solved it.
It’s always the small things that break the biggest flows.
Supabase Migration
At this point, Firebase and local SQLite setups felt too limited.
I migrated everything to Supabase.
The migration process was... bumpy.
Indexes tried to create themselves before tables existed.
Schema updates failed mid-run.
Fresh databases crashed tests.
To fix this, I wrote idempotent SQL scripts:
- supabase_complete_schema.sql
- supabase_performance_indexes.sql
They could safely rerun without breaking anything.
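In spirit, the scripts lean on IF NOT EXISTS everywhere, so rerunning them on a half-migrated database is a no-op. A stripped-down sketch, with an illustrative table and a plain Postgres connection string (Supabase exposes one):

```python
import os

import psycopg2  # Supabase databases accept standard Postgres connections

IDEMPOTENT_SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX IF NOT EXISTS idx_documents_user_id ON documents (user_id);
"""

def migrate() -> None:
    # IF NOT EXISTS makes this safe to rerun on a half-migrated database
    with psycopg2.connect(os.environ["SUPABASE_DB_URL"]) as conn, conn.cursor() as cur:
        cur.execute(IDEMPOTENT_SCHEMA)

if __name__ == "__main__":
    migrate()
```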
That move gave AlignCV the structure it needed for long-term growth.
Notifications and Final Touches
By the late stages of V2, AlignCV had notifications, unread counts, and document tracking.
And of course — more bugs.
One of the weirdest ones:
'str' object has no attribute 'get'
Turns out the backend returned a dict, while the frontend parsed it as a list.
Once I fixed the JSON structure, everything started working again.
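The durable fix was making the frontend parser explicit about which shapes it accepts; a small sketch with illustrative keys:

```python
def parse_notifications(payload):
    """Accept the current dict shape or a bare list, and fail loudly on anything else."""
    if isinstance(payload, dict):
        return payload.get("notifications", []), payload.get("unread_count", 0)
    if isinstance(payload, list):
        return payload, sum(1 for item in payload if not item.get("read", False))
    raise TypeError(f"Unexpected notifications payload: {type(payload).__name__}")
```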
These moments — frustrating but rewarding — were the essence of building AlignCV.
The Stack Today
- Backend: FastAPI (modular V2 architecture)
- AI: Groq LLaMA 3.1 8B Instant
- Database: Supabase (Postgres + RLS)
- Storage: Supabase Storage
- Vector Store: Qdrant
- Doc Processing: PyMuPDF + python-docx
- Embeddings: BGE-base-en-v1.5
- Caching: Redis (Upstash)
- Frontend: Streamlit multi-page app
- Task Queue: Celery
- Email: SendGrid
- NLP: SpaCy (en_core_web_sm)
Every major feature is tested, documented, and versioned.
What I Learned
- Ship early, optimize later. V1’s simplicity gave me confidence to expand.
- Schemas break silently. Test both ends of your API often.
- Docs are not optional. Future you will thank past you for every README.
- Switch tools when needed. Mistral → Groq made a massive difference.
- Vibe coding works. Some of the best features came from planned AI coding sessions.
AlignCV isn’t finished — it’s evolving.
You can check out the code here:
👉 github.com/Pratham-Dabhane/AlignCV
Learn in public. Code in flow. Ship the vibe.
That’s the mantra that built AlignCV.
Top comments (2)
Love the "ship early, optimize later" mentality! Your Mistral → Groq pivot is a masterclass in pragmatic engineering—switching tools mid-project when data shows it's better is underrated.
The semantic alignment angle is brilliant. Most ATS-beating tools just do keyword stuffing, but using BGE-base-en-v1.5 for embeddings means you're capturing actual meaning. A resume that says "led engineering teams" will correctly match "managed technical staff" even though there's zero keyword overlap.
Quick architecture question: Why BGE-base-en-v1.5 for embeddings but LLaMA 3.1 8B for generation? I'm curious if you tested using the same model family end-to-end (like LLaMA for both embedding and rewriting) to see if semantic consistency improved. Or was BGE's embedding quality just too good to pass up?
Also, 38 unit tests for V1 is impressive discipline for a weekend project. Most builders would've shipped tests after users complained 😅
Starred the repo—excited to see where you take the job matching features!
Thanks for the thoughtful note, Alex 😇—exactly the goal: semantic alignment over keyword stuffing. The early 38 tests kept regressions in check during the Mistral → Groq pivot. Next up: tightening job matching UX and adding small quality heuristics to the ranking pipeline.
Initially, I did try "one family end-to-end" (Mistral for both rewriting and embeddings by pooling its hidden states), but retrieval was weak (top-5 recall ~74% on ~500 jobs, more false positives). BGE-base-en-v1.5 is much stronger for search (MTEB-leading, high recall, 768-dim), so I switched to BGE for embeddings and LLaMA 3.1 8B for generation (fast, better instruction following on Groq). Best accuracy + speed combo.