Your resume says: "Led team of 5 engineers to deliver the platform on time"
The job description says: "Experience managing engineers required"
Keyword matching scores that at or near zero. Apart from the generic noun "engineers", the two lines share no vocabulary. Filtered out.
Semantic matching scores it 0.48 — a strong match. Same meaning, different vocabulary.
This is not an edge case. It happens every time a candidate describes their experience in their own words rather than mirroring the exact phrasing of the job description. Which is most of the time.
I built Resume AI Tailor to solve resume matching. The first version used keyword extraction — pull terms from the job description, check how many appear in the resume. It worked well enough to ship. But the "led team of 5" problem was always there.
So I spent two weeks replacing keyword matching with semantic search. This post documents what I built, the data that came out of it, and what I learned that isn't in any RAG tutorial.
What Resume AI Tailor does
Resume AI Tailor is a SaaS product I'm building under NanoCrafts. You upload a resume PDF, paste a job description, and get back:
- A match score and analysis from GPT-4o-mini
- A list of matched and missing skills
- Rewritten resume bullets tailored to the role
The stack is Next.js, gpt-4o-mini, Clerk, Neon/Drizzle, and Vercel.
The keyword approach worked for the obvious cases — if your resume mentions React and the job requires React, that's a match. But it failed silently on vocabulary mismatches:
- "built APIs" vs "REST services experience"
- "coached junior developers" vs "mentored engineers"
- "Amazon Web Services" vs "AWS"
All the same skills. All scored zero or near zero by keyword matching.
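To make the failure mode concrete, here is a minimal sketch of a keyword-overlap scorer. It is an illustration, not the matcher Resume AI Tailor shipped: `tokenize`, `keywordScore`, and the stopword list are hypothetical, and the score is assumed to be the fraction of job-description terms that also appear in the resume text.

```typescript
// Hypothetical stopword list; a real matcher would use a larger one.
const STOPWORDS = new Set(["a", "an", "and", "in", "of", "or", "the", "to", "with"]);

// Lowercase, split on non-alphanumerics, drop stopwords, deduplicate.
function tokenize(text: string): Set<string> {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return new Set(words.filter((w) => !STOPWORDS.has(w)));
}

// Assumed scoring rule: share of JD terms that literally occur in the resume.
function keywordScore(resumeText: string, jdText: string): number {
  const resumeTerms = tokenize(resumeText);
  const jdTerms = tokenize(jdText);
  if (jdTerms.size === 0) return 0;
  let shared = 0;
  jdTerms.forEach((term) => {
    if (resumeTerms.has(term)) shared += 1;
  });
  return shared / jdTerms.size;
}

// A literal match scores above zero; the synonym pairs do not.
const reactScore = keywordScore("Built dashboards in React", "React experience required");
const apiScore = keywordScore("built APIs", "REST services experience");
const awsScore = keywordScore("Amazon Web Services", "AWS");
```

Under this rule the React pair scores above zero because one token matches exactly, while the API and AWS pairs score zero: exact-token matching has no way to know that two different strings name the same skill.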
What semantic matching produces
The headline number from my comparison test across 5 resume profiles:
| Matcher | Average score |
|---|---|
| Keyword | 0.030 |
| Semantic | 0.410 |
Roughly 14× higher average score (0.410 / 0.030 ≈ 13.7). The synonym profile (same skills, different vocabulary) showed the starkest gap: keyword 0.029, semantic 0.466, a +0.437 delta.
How it works — the technical details
Embeddings: text as vectors
text-embedding-3-small takes a string and returns 1536 floats encoding meaning, not words. Two strings with the same meaning produce vectors that point in roughly the same direction in 1536-dimensional space.
On Day 1 I tested this directly:
const sentences = [
'software engineer with 5 years of React experience',
'frontend developer specialising in JavaScript and TypeScript',
'chef with 10 years experience in fine dining restaurants',
];
// Pairwise cosine similarity results:
// software engineer vs frontend developer: 0.5741
// software engineer vs chef: 0.3843
// frontend developer vs chef: 0.2868
Engineer and developer are closest. Chef is furthest from both. No shared keywords between those sentences — the math is working on meaning.
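The pairwise numbers above come from cosine similarity between embedding vectors. A minimal sketch of that calculation follows; the function name is my own and the toy vectors stand in for real 1536-dimensional embeddings.

```typescript
// Cosine similarity: dot product divided by the product of the vector lengths.
// 1 means same direction, 0 means orthogonal, -1 means opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 2-dimensional vectors, for illustration only.
const sameDirection = cosineSimilarity([1, 2], [2, 4]);
const orthogonal = cosineSimilarity([1, 0], [0, 1]);
```

Because the result depends only on direction, a short bullet and a long paragraph can still score as close when they are about the same thing.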
Why text-embedding-3-small:
- $0.02 per million tokens — effectively zero cost at Resume AI Tailor's scale
- 1536 dimensions — sufficient for resume/JD matching
- 8191 token limit — a full resume fits comfortably
- text-embedding-3-large costs 6.5× more for ~2 MTEB points improvement, which is not worth it for single-document comparison
The synonym test from Day 5 confirmed the practical implication:
| Pair | Keyword | Semantic |
|---|---|---|
| "led team of 5" vs "managed engineers" | 0.000 | 0.480 |
| "built APIs" vs "REST services experience" | 0.000 | 0.470 |
| "Amazon Web Services" vs "AWS" | 0.018 | 0.630 |
| "coached junior devs" vs "mentored engineers" | 0.036 | 0.482 |
Zero or near-zero keyword scores on all four. Meaningful semantic scores on all four.
pgvector on Neon: the vector store decision
The two obvious options were Pinecone and pgvector on Neon.
| Factor | Pinecone | pgvector on Neon |
|---|---|---|
| Cost | $0–$70/month | $0 additional |
| Setup | New account, new SDK, sync logic | One SQL command |
| Performance at scale | Sub-10ms at 100M+ vectors | Excellent under 1M vectors |
Resume AI Tailor will not reach 1M vectors for years. Pinecone's performance advantage is irrelevant at this scale. pgvector costs nothing additional and requires no new service.
Enabling it:
CREATE EXTENSION IF NOT EXISTS vector;
That's the entire setup.
The schema:
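import { index, pgTable, text, timestamp, uuid, vector } from 'drizzle-orm/pg-core';
// `resumes` and `jobDescriptions` are tables defined elsewhere in the schema.
export const resumeEmbeddings = pgTable(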
'resume_embeddings',
{
id: uuid('id').defaultRandom().primaryKey(),
resumeId: uuid('resume_id').references(() => resumes.id, { onDelete: 'cascade' }),
jdId: uuid('jd_id').references(() => jobDescriptions.id, { onDelete: 'cascade' }),
chunkText: text('chunk_text').notNull(),
chunkType: text('chunk_type').notNull(),
// text-embedding-3-small at 1536 dimensions.
// Switching models requires drop column + re-embed everything.
embedding: vector('embedding', { dimensions: 1536 }).notNull(),
contentHash: text('content_hash'),
createdAt: timestamp('created_at', { withTimezone: true }).defaultNow(),
},
table => ({
embeddingIndex: index('embedding_cosine_idx').using(
'hnsw',
table.embedding.op('vector_cosine_ops')
),
})
);
The HNSW index trades a small amount of recall for much faster queries: approximately O(log n) per search, against O(n) for a brute-force scan. At current scale brute force would work fine; the index future-proofs query latency.
The similarity query uses pgvector's <=> operator (cosine distance, not similarity — subtract from 1 to convert):
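SELECT
chunk_text,
1 - (embedding <=> $1::vector) AS score
FROM resume_embeddings
WHERE resume_id = $2::uuid
AND chunk_type = 'bullet'
-- Order by the raw distance, ascending: pgvector only uses the HNSW index
-- when ORDER BY is the distance operator itself, not an expression on it.
ORDER BY embedding <=> $1::vector
LIMIT 1;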
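For readers who think more easily in application code, the following sketch does in memory what the query does in Postgres: filter to one resume's bullet chunks, convert cosine distance to similarity, and keep the best row. `EmbeddingRow`, `cosineDistance`, and `bestBulletMatch` are hypothetical names, and this is the brute-force equivalent rather than the indexed search.

```typescript
interface EmbeddingRow {
  resumeId: string;
  chunkType: string;
  chunkText: string;
  embedding: number[];
}

interface BestMatch {
  chunkText: string;
  score: number;
}

// Cosine distance as pgvector's <=> operator defines it: 1 - cosine similarity.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force version of the SQL: WHERE filter, score = 1 - distance, top 1.
function bestBulletMatch(
  rows: EmbeddingRow[],
  queryEmbedding: number[],
  resumeId: string
): BestMatch | null {
  let best: BestMatch | null = null;
  for (const row of rows) {
    if (row.resumeId !== resumeId || row.chunkType !== "bullet") continue;
    const score = 1 - cosineDistance(row.embedding, queryEmbedding);
    if (best === null || score > best.score) {
      best = { chunkText: row.chunkText, score };
    }
  }
  return best;
}

// Toy rows with 2-dimensional vectors, for illustration only.
const sampleRows: EmbeddingRow[] = [
  { resumeId: "r1", chunkType: "bullet", chunkText: "Built REST APIs", embedding: [1, 0] },
  { resumeId: "r1", chunkType: "bullet", chunkText: "Mentored engineers", embedding: [0, 1] },
  { resumeId: "r1", chunkType: "skills", chunkText: "TypeScript, SQL", embedding: [1, 0] },
  { resumeId: "r2", chunkType: "bullet", chunkText: "Another resume", embedding: [1, 0] },
];
const top = bestBulletMatch(sampleRows, [0.9, 0.1], "r1");
```

Sorting by smallest distance and sorting by largest similarity pick the same row, which is why the SQL can order on the raw operator and still report a similarity score.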
Chunking strategy: why bullet-level beats everything else
The most important design decision in a RAG pipeline is not the model or the vector store — it's how you split your documents.
The problem with paragraph-level chunking for resumes:
Paragraph embedding of:
"Built REST APIs serving 2M daily requests" ← backend signal
"Owned CI/CD pipeline reducing deploy time 40%" ← DevOps signal
"Mentored 6 junior engineers" ← leadership signal
"Attended daily standups" ← no signal
The resulting vector blends all four and behaves much like an average of them. A query for "REST API experience" has to compete with leadership signal, DevOps signal, and standup-attendance noise folded into one vector. The strong signal gets diluted.
Bullet-level chunking preserves the signal. Each bullet gets its own embedding. "Built REST APIs serving 2M daily requests" produces a vector that points strongly toward backend engineering concepts — regardless of what the other bullets say.
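The dilution effect can be shown with toy numbers. The sketch below is a deliberately simplified model: it assumes each bullet is a unit vector on its own axis and that a paragraph embedding is the mean of its bullet vectors. Real embedding models do not literally average, so treat this as intuition rather than a measurement.

```typescript
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Element-wise mean of equally sized vectors.
function meanVector(vectors: number[][]): number[] {
  const dims = vectors[0].length;
  const mean = new Array<number>(dims).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < dims; i++) mean[i] += v[i] / vectors.length;
  }
  return mean;
}

// Toy axes: [backend, devops, leadership, standups].
const bulletVectors = [
  [1, 0, 0, 0], // "Built REST APIs serving 2M daily requests"
  [0, 1, 0, 0], // "Owned CI/CD pipeline reducing deploy time 40%"
  [0, 0, 1, 0], // "Mentored 6 junior engineers"
  [0, 0, 0, 1], // "Attended daily standups"
];
const restApiQuery = [1, 0, 0, 0];

// Paragraph-level: one blended vector for all four bullets.
const paragraphScore = cosine(meanVector(bulletVectors), restApiQuery);

// Bullet-level: each bullet scored on its own, best one wins.
const bestBulletScore = Math.max(...bulletVectors.map((v) => cosine(v, restApiQuery)));
```

In this toy setup the paragraph vector scores 0.5 against the query while the matching bullet on its own scores 1.0. The gap is what bullet-level chunking recovers.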
The chunking map:
type ResumeChunkType =
| 'bullet' // one per experience bullet — most important
| 'skills' // one for the full skills section
| 'summary' // one for the full summary/profile text
| 'education'; // one per degree
Skills get one chunk because co-occurrence matters. Summary gets one chunk because it's prose — context flows across sentences.
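A sketch of how that chunking map might be applied to parsed resume JSON. The `ParsedResume` shape and the `chunkResume` function are assumptions made for illustration; the post does not show the actual resume schema.

```typescript
type ResumeChunkType = "bullet" | "skills" | "summary" | "education";

interface ResumeChunk {
  chunkType: ResumeChunkType;
  chunkText: string;
}

// Hypothetical shape of the structured resume JSON.
interface ParsedResume {
  summary?: string;
  skills: string[];
  experience: { role: string; company: string; bullets: string[] }[];
  education: { degree: string; institution: string }[];
}

function chunkResume(resume: ParsedResume): ResumeChunk[] {
  const chunks: ResumeChunk[] = [];

  // Summary is prose, so it stays together as one chunk.
  const summary = resume.summary?.trim();
  if (summary) chunks.push({ chunkType: "summary", chunkText: summary });

  // Skills stay together so co-occurrence is preserved.
  if (resume.skills.length > 0) {
    chunks.push({ chunkType: "skills", chunkText: resume.skills.join(", ") });
  }

  // One chunk per experience bullet.
  for (const job of resume.experience) {
    for (const bullet of job.bullets) {
      const text = bullet.trim();
      if (text) chunks.push({ chunkType: "bullet", chunkText: text });
    }
  }

  // One chunk per degree.
  for (const edu of resume.education) {
    chunks.push({ chunkType: "education", chunkText: `${edu.degree}, ${edu.institution}` });
  }

  return chunks;
}

const sampleChunks = chunkResume({
  summary: "Backend engineer focused on APIs.",
  skills: ["TypeScript", "PostgreSQL"],
  experience: [
    { role: "Engineer", company: "Acme", bullets: ["Built REST APIs", "  ", "Mentored 6 junior engineers"] },
  ],
  education: [{ degree: "BSc Computer Science", institution: "Example University" }],
});
```

Blank bullets are skipped here so that empty strings never reach the embedding call; whether the real pipeline does the same is not stated in the post.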
JD chunks are typed by section with weights:
type JDChunkType =
| 'jd_requirement' // weight 1.0 — must-have signal
| 'jd_responsibility' // weight 0.8 — day-to-day work
| 'jd_summary' // weight 0.5 — context only
| 'jd_nice_to_have'; // weight 0.3 — bonus signal
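One plausible way to use these weights is a weighted average of each JD chunk's best similarity score. The post gives the weights but not the aggregation rule, so the formula below is an assumption, and `JDMatch` and `weightedMatchScore` are hypothetical names.

```typescript
type JDChunkType =
  | "jd_requirement"
  | "jd_responsibility"
  | "jd_summary"
  | "jd_nice_to_have";

// Weights as listed in the type comments above.
const JD_WEIGHTS: Record<JDChunkType, number> = {
  jd_requirement: 1.0,
  jd_responsibility: 0.8,
  jd_summary: 0.5,
  jd_nice_to_have: 0.3,
};

interface JDMatch {
  chunkType: JDChunkType;
  bestScore: number; // best resume-chunk similarity for this JD chunk
}

// Assumed aggregation: sum(weight * score) / sum(weight).
function weightedMatchScore(matches: JDMatch[]): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const m of matches) {
    const weight = JD_WEIGHTS[m.chunkType];
    weightedSum += weight * m.bestScore;
    totalWeight += weight;
  }
  return totalWeight === 0 ? 0 : weightedSum / totalWeight;
}

const overall = weightedMatchScore([
  { chunkType: "jd_requirement", bestScore: 0.5 },
  { chunkType: "jd_nice_to_have", bestScore: 0.1 },
]);
```

With this rule a weak match on a nice-to-have pulls the overall score down far less than a weak match on a requirement would.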
The full pipeline: PDF to semantic_matches
PDF upload
↓
/api/extract — parse PDF, extract structured resume JSON
↓
sessionStorage — resume JSON stored client-side
↓
/api/analyse — user pastes JD, submits
↓
Step 1 (sequential):
extractKeywords(jobDescription) ~2–4s
Step 2 (parallel):
analyseMatch() (GPT-4o-mini) ~3–6s
embedResume() + ingestJobDescription() ~0ms warm / ~400ms cold
Step 3 (sequential):
getSemanticMatches(resumeId, jdId)
batch embed JD chunks → parallel DB queries ~0.1–1s
Response: { analysis, keywords, semantic_matches, rag_status }
The key optimisation in getSemanticMatches: all JD chunks embedded in one batch API call, all similarity queries run in parallel. Before this — sequential embed calls — the semantic matching step took ~15s for 8 chunks. After: ~0.5–1s.
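The batch-then-parallel shape can be sketched as follows. The embedding call and the database lookup are passed in as functions so the example stays self-contained; `EmbedBatch`, `FindBestMatch`, and `matchJdChunks` are hypothetical names, not the real getSemanticMatches internals.

```typescript
interface ChunkMatch {
  chunkText: string;
  score: number;
}

interface JdMatchResult {
  jdChunk: string;
  match: ChunkMatch | null;
}

// Assumed signatures: one call embeds many texts; one call runs one similarity query.
type EmbedBatch = (texts: string[]) => Promise<number[][]>;
type FindBestMatch = (embedding: number[]) => Promise<ChunkMatch | null>;

async function matchJdChunks(
  jdChunks: string[],
  embedBatch: EmbedBatch,
  findBestMatch: FindBestMatch
): Promise<JdMatchResult[]> {
  if (jdChunks.length === 0) return [];

  // One embedding request for all chunks instead of one request per chunk.
  const embeddings = await embedBatch(jdChunks);

  // Start every similarity query at once and wait for all of them together.
  const matches = await Promise.all(embeddings.map((e) => findBestMatch(e)));

  return jdChunks.map((jdChunk, i) => ({ jdChunk, match: matches[i] }));
}
```

With N chunks, the sequential version pays N embedding round trips plus N query round trips back to back. This version pays one embedding round trip plus roughly the slowest single query.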
The caching layer checks content hashes for both resume and JD embeddings. On warm requests (same resume, same JD) zero embedding API calls are made. The entire RAG pipeline takes ~100ms on a warm request.
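A minimal sketch of content-hash caching. In the product the hash lives in the content_hash column of the embeddings table; here an in-memory Map stands in for the database. SHA-256 is my assumption for the hash function, and `contentHash` and `createCachedEmbedder` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

type Embed = (text: string) => Promise<number[]>;

// Same text always produces the same key; any change produces a different one.
function contentHash(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Wraps an embed function so identical text is only embedded once.
function createCachedEmbedder(
  embed: Embed,
  cache: Map<string, number[]> = new Map()
): Embed {
  return async function embedCached(text: string): Promise<number[]> {
    const key = contentHash(text);
    const hit = cache.get(key);
    if (hit !== undefined) return hit; // warm path: no API call

    const embedding = await embed(text); // cold path: pay for the call once
    cache.set(key, embedding);
    return embedding;
  };
}
```

Keying on a hash of the content rather than on a resume ID means an edited resume misses the cache automatically, with no invalidation logic to maintain.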
If semantic matching returns empty, the route falls back to keyword matching automatically, and a rag_status field tells the frontend (and future-me debugging at 11pm) which path was taken.
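The fallback can be sketched as a small selection function. The post names the rag_status field but not its values, so the `"semantic"` and `"keyword_fallback"` strings here are placeholders, as are `MatchRow` and `selectMatches`.

```typescript
interface MatchRow {
  jdChunk: string;
  resumeChunk: string;
  score: number;
}

// Hypothetical status values; the real field may use different strings.
type RagStatus = "semantic" | "keyword_fallback";

interface MatchSelection {
  matches: MatchRow[];
  rag_status: RagStatus;
}

// Prefer semantic results; only run the keyword matcher when they are empty.
function selectMatches(
  semanticMatches: MatchRow[],
  keywordFallback: () => MatchRow[]
): MatchSelection {
  if (semanticMatches.length > 0) {
    return { matches: semanticMatches, rag_status: "semantic" };
  }
  return { matches: keywordFallback(), rag_status: "keyword_fallback" };
}
```

Passing the keyword matcher as a function means it only runs when it is needed, and the status string records which branch produced the response.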
Results
The comparison table
5 resume profiles against the same Senior Software Engineer JD:
| Resume profile | Keyword | Semantic | Delta |
|---|---|---|---|
| Senior engineer — exact match | 0.061 | 0.444 | +0.383 |
| Synonym terminology — no shared keywords | 0.029 | 0.466 | +0.437 |
| Partial match — some relevant skills | 0.018 | 0.390 | +0.372 |
| Career changer — unrelated background | 0.004 | 0.297 | +0.293 |
| Junior developer — relevant but underseniored | 0.036 | 0.451 | +0.415 |
| Average | 0.030 | 0.410 | +0.380 |
Semantic outperforms keyword on every single profile.
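The averages in the table can be re-derived from the five rows. The sketch below only does arithmetic on the published numbers; the variable names are my own.

```typescript
const profileResults = [
  { profile: "Senior engineer", keyword: 0.061, semantic: 0.444 },
  { profile: "Synonym terminology", keyword: 0.029, semantic: 0.466 },
  { profile: "Partial match", keyword: 0.018, semantic: 0.390 },
  { profile: "Career changer", keyword: 0.004, semantic: 0.297 },
  { profile: "Junior developer", keyword: 0.036, semantic: 0.451 },
];

const mean = (xs: number[]): number => xs.reduce((sum, x) => sum + x, 0) / xs.length;

const avgKeyword = mean(profileResults.map((r) => r.keyword)); // about 0.0296
const avgSemantic = mean(profileResults.map((r) => r.semantic)); // about 0.4096
const avgDelta = avgSemantic - avgKeyword; // about 0.380
const ratio = avgSemantic / avgKeyword; // about 13.8

// True when semantic beats keyword on every row.
const semanticWinsEverywhere = profileResults.every((r) => r.semantic > r.keyword);
```

The unrounded averages round to the 0.030 and 0.410 shown in the table, and the ratio between them comes out just under 14.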
The synonym example
The synonym resume shares almost no keywords with the JD (keyword score 0.029): completely different vocabulary, identical skills.
| Resume bullet | JD requirement | Keyword | Semantic |
|---|---|---|---|
| "Developed backend web services handling millions of daily transactions" | "work on high-scale distributed systems" | 0.036 | 0.491 |
| "Coached and grew a team of four early-career engineers" | "Mentor junior engineers and conduct code reviews" | 0.036 | 0.482 |
| "Deployed cloud infrastructure using Amazon Web Services" | "Experience with AWS or other cloud platforms" | 0.018 | 0.503 |
The AWS example is the clearest. "Amazon Web Services" vs "AWS": the acronym never matches the spelled-out name, so the keyword score is 0.018, while semantic similarity is 0.503 against a core requirement. A candidate filtered out for writing the service name in full.
There's also a ranking inversion worth noting. Keyword matching ranks the exact-match senior engineer above the synonym candidate (0.061 vs 0.029). Semantic matching ranks the synonym candidate above the senior engineer (0.466 vs 0.444). A keyword-only system deprioritises a qualified candidate purely due to vocabulary choice.
Latency and cost
Warm request timing (Vercel, June 2026):
user + resumeId lookup: ~300–500ms
extractKeywords (GPT): ~2–5s sequential
analyseMatch + RAG (parallel): ~3–6s bounded by analyseMatch
getSemanticMatches: ~0.1–1s batch embed + parallel queries
TOTAL warm request: 8–10s
The embedding step, the part I expected to be slow, is essentially free on warm requests. The latency is GPT, not vectors.
The path to sub-3s requires streaming the GPT response to the client and moving getSemanticMatches to a background job. That's a Week 3 architectural change.
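The warm-request timings compose as a sum of sequential steps, with the parallel step costing as much as its slowest branch. The sketch below models that with min/max ranges taken from the figures above. The helper names are hypothetical, and the 0 to 0.4s range for the embedding branch comes from the pipeline diagram's warm and cold figures.

```typescript
interface SecondsRange {
  min: number;
  max: number;
}

// Sequential steps add up.
function sequential(...steps: SecondsRange[]): SecondsRange {
  return {
    min: steps.reduce((sum, s) => sum + s.min, 0),
    max: steps.reduce((sum, s) => sum + s.max, 0),
  };
}

// Parallel branches finish when the slowest one finishes.
function parallel(...branches: SecondsRange[]): SecondsRange {
  return {
    min: Math.max(...branches.map((b) => b.min)),
    max: Math.max(...branches.map((b) => b.max)),
  };
}

const lookup: SecondsRange = { min: 0.3, max: 0.5 };
const extractKeywords: SecondsRange = { min: 2, max: 5 };
const analyseMatch: SecondsRange = { min: 3, max: 6 };
const embedAndIngest: SecondsRange = { min: 0, max: 0.4 };
const semanticMatches: SecondsRange = { min: 0.1, max: 1 };

const totalLatency = sequential(
  lookup,
  extractKeywords,
  parallel(analyseMatch, embedAndIngest),
  semanticMatches
);
```

The envelope works out to roughly 5.4s to 12.5s, and the measured 8–10s total sits inside it. The two GPT calls contribute 5s to 11s of that range, which is why streaming the GPT response matters more for perceived latency than anything on the vector side.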
Embedding cost:
Typical session: ~1050 tokens
Cost: 1050 / 1,000,000 × $0.02 = $0.000021
At 10,000 sessions/month: $0.21 total
The caching strategy eliminates embedding cost on repeat requests. A user who submits the same resume against five different JDs pays the resume embedding cost once. At any realistic usage level this rounds to zero.
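The cost arithmetic above, written as a small function so other session sizes can be plugged in. The price constant is the $0.02 per million tokens quoted earlier; the function name is my own.

```typescript
const PRICE_PER_MILLION_TOKENS_USD = 0.02;

// Cost in USD to embed a given number of tokens with text-embedding-3-small.
function embeddingCostUsd(tokens: number): number {
  return (tokens / 1_000_000) * PRICE_PER_MILLION_TOKENS_USD;
}

const costPerSession = embeddingCostUsd(1050); // about $0.000021
const costPerMonth = costPerSession * 10_000; // about $0.21 at 10,000 sessions
```

These are cold-path figures. Cached resume embeddings only lower them.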
What's next (Week 3)
- Recency weighting: the dateRange metadata is stored on every bullet chunk but not yet applied; a 2024 bullet should score higher than a 2016 bullet
- matched_role and matched_company: returning role context alongside each matched bullet so users can see which of their roles each semantic match came from
- Sub-3s latency: streaming GPT response, moving semantic matches to a background job
The code is on GitHub: github.com/Azeez1314
The product is live at: resumetailor.cv
If you're building semantic search into a product — leave a comment. Happy to compare notes.
Part of a build-in-public series documenting the NanoCrafts portfolio. Previous post: Improving AI resume matching with prompt iteration — 7.37 to 8.37/10