Azeez Roheem

Posted on Jun 16

How I added semantic search to a resume matcher — and what the numbers showed

#ai #rag #nextjs #typescript

Your resume says: "Led team of 5 engineers to deliver the platform on time"

The job description says: "Experience managing engineers required"

Keyword matching scores that zero. Apart from the generic "engineers", the two lines share no terms. Filtered out.

Semantic matching scores it 0.48 — a strong match. Same meaning, different vocabulary.

This is not an edge case. It happens every time a candidate describes their experience in their own words rather than mirroring the exact phrasing of the job description. Which is most of the time.

I built Resume AI Tailor to solve resume matching. The first version used keyword extraction — pull terms from the job description, check how many appear in the resume. It worked well enough to ship. But the "led team of 5" problem was always there.

So I spent two weeks replacing keyword matching with semantic search. This post documents what I built, the data that came out of it, and what I learned that isn't in any RAG tutorial.

What Resume AI Tailor does

Resume AI Tailor is a SaaS product I'm building under NanoCrafts. You upload a resume PDF, paste a job description, and get back:

A match score and analysis from GPT-4o-mini
A list of matched and missing skills
Rewritten resume bullets tailored to the role

The stack is Next.js, gpt-4o-mini, Clerk, Neon/Drizzle, and Vercel.

The keyword approach worked for the obvious cases — if your resume mentions React and the job requires React, that's a match. But it failed silently on vocabulary mismatches:

"built APIs" vs "REST services experience"
"coached junior developers" vs "mentored engineers"
"Amazon Web Services" vs "AWS"

All the same skills. All scored zero by keyword matching.
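To make the failure mode concrete, here is a minimal sketch of an overlap-based keyword scorer. It is not the scorer Resume AI Tailor ships (that one extracts terms from the job description first); `tokenize`, `keywordScore`, and the stop-word list are hypothetical names chosen for illustration.

```typescript
// Hypothetical baseline: the fraction of JD terms that also appear in the resume text.
const STOP_WORDS = new Set(["a", "an", "and", "of", "or", "the", "to", "with"]);

function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9+#]+/)
    .filter((t) => t.length > 0 && !STOP_WORDS.has(t));
}

function keywordScore(resumeText: string, jdText: string): number {
  const resumeTerms = new Set(tokenize(resumeText));
  const jdTerms = Array.from(new Set(tokenize(jdText)));
  if (jdTerms.length === 0) return 0;
  const hits = jdTerms.filter((t) => resumeTerms.has(t)).length;
  return hits / jdTerms.length;
}

console.log(keywordScore("built APIs", "REST services experience")); // 0
console.log(keywordScore("coached junior developers", "mentored engineers")); // 0
console.log(keywordScore("Amazon Web Services", "AWS")); // 0
```

Any scorer built on exact token overlap behaves this way: with no shared tokens, the score is zero regardless of meaning.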

What semantic matching produces

The headline number from my comparison test across 5 resume profiles:

Matcher	Average score
Keyword	0.030
Semantic	0.410

That is roughly a 13.7× higher average score (0.410 / 0.030). The synonym profile (same skills, different vocabulary) showed the starkest gap: keyword 0.029, semantic 0.466, a +0.437 delta.

How it works — the technical details

Embeddings: text as vectors

text-embedding-3-small takes a string and returns 1536 floats encoding meaning, not words. Two strings with the same meaning produce vectors that point in roughly the same direction in 1536-dimensional space.

On Day 1 I tested this directly:

const sentences = [
  'software engineer with 5 years of React experience',
  'frontend developer specialising in JavaScript and TypeScript',
  'chef with 10 years experience in fine dining restaurants',
];

// Pairwise cosine similarity results:
// software engineer vs frontend developer: 0.5741
// software engineer vs chef:               0.3843
// frontend developer vs chef:              0.2868

Engineer and developer are closest. Chef is furthest from both. The engineer and developer sentences share no keywords, while the engineer and chef sentences share "years" and "experience" yet score lower. The math is working on meaning.
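For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal sketch follows; `cosineSimilarity` is a hypothetical helper name, and the tiny vectors stand in for real 1536-dimensional embeddings.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1 (same direction)
console.log(cosineSimilarity([1, 0], [0, 1])); // 0 (orthogonal)
```

The score depends only on direction, not magnitude, which is why two texts of very different length can still score as similar.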

Why text-embedding-3-small:

$0.02 per million tokens — effectively zero cost at Resume AI Tailor's scale
1536 dimensions — sufficient for resume/JD matching
8191 token limit — a full resume fits comfortably
text-embedding-3-large costs 6.5× more for ~2 MTEB points improvement — not worth it for single-document comparison

The synonym test from Day 5 confirmed the practical implication:

Pair	Keyword	Semantic
"led team of 5" vs "managed engineers"	0.000	0.480
"built APIs" vs "REST services experience"	0.000	0.470
"Amazon Web Services" vs "AWS"	0.018	0.630
"coached junior devs" vs "mentored engineers"	0.036	0.482

Zero or near-zero keyword scores on all four. Meaningful semantic scores on all four.

pgvector on Neon: the vector store decision

The two obvious options were Pinecone and pgvector on Neon.

Factor	Pinecone	pgvector on Neon
Cost	$0–$70/month	$0 additional
Setup	New account, new SDK, sync logic	One SQL command
Performance at scale	Sub-10ms at 100M+ vectors	Excellent under 1M vectors

Resume AI Tailor will not reach 1M vectors for years. Pinecone's performance advantage is irrelevant at this scale. pgvector costs nothing additional and requires no new service.

Enabling it:

CREATE EXTENSION IF NOT EXISTS vector;

That's the entire setup.

The schema:

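import { index, pgTable, text, timestamp, uuid, vector } from 'drizzle-orm/pg-core';
// `resumes` and `jobDescriptions` are the existing tables, defined elsewhere in the schema.

export const resumeEmbeddings = pgTable(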
  'resume_embeddings',
  {
    id:          uuid('id').defaultRandom().primaryKey(),
    resumeId:    uuid('resume_id').references(() => resumes.id, { onDelete: 'cascade' }),
    jdId:        uuid('jd_id').references(() => jobDescriptions.id, { onDelete: 'cascade' }),
    chunkText:   text('chunk_text').notNull(),
    chunkType:   text('chunk_type').notNull(),
    // text-embedding-3-small at 1536 dimensions.
    // Switching models requires drop column + re-embed everything.
    embedding:   vector('embedding', { dimensions: 1536 }).notNull(),
    contentHash: text('content_hash'),
    createdAt:   timestamp('created_at', { withTimezone: true }).defaultNow(),
  },
  table => ({
    embeddingIndex: index('embedding_cosine_idx').using(
      'hnsw',
      table.embedding.op('vector_cosine_ops')
    ),
  })
);

The HNSW index trades a small amount of recall accuracy for dramatically faster query times — O(log n) vs O(n) brute force. At current scale brute force would work fine; the index future-proofs the query latency.

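The similarity query uses pgvector's <=> operator, which returns cosine distance, not similarity (subtract from 1 to convert). The ORDER BY uses the raw distance in ascending order, because that is the only form the HNSW index can serve; ordering by the computed score descending ranks rows the same way but falls back to a sequential scan:

SELECT
  chunk_text,
  1 - (embedding <=> $1::vector) AS score
FROM resume_embeddings
WHERE resume_id = $2::uuid
AND chunk_type = 'bullet'
-- smallest cosine distance first = highest similarity first
ORDER BY embedding <=> $1::vector
LIMIT 1;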
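The `$1::vector` parameter is passed as text in pgvector's bracketed format, for example `[0.1,0.2,0.3]`. Below is a sketch of how the parameters might be assembled; `toVectorLiteral` and `buildBestBulletQuery` are hypothetical helpers, and the `{ text, values }` shape is the one node-postgres accepts, which may differ from how the query is issued through Drizzle in the actual codebase.

```typescript
// pgvector's text input format is a bracketed, comma-separated list of numbers.
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length === 0) {
    throw new Error("Embedding must not be empty");
  }
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error("Embedding contains a non-finite value");
  }
  return "[" + embedding.join(",") + "]";
}

interface ParameterizedQuery {
  text: string;
  values: string[];
}

// Builds the best-matching-bullet query for one JD chunk embedding.
function buildBestBulletQuery(queryEmbedding: number[], resumeId: string): ParameterizedQuery {
  const text = [
    "SELECT chunk_text, 1 - (embedding <=> $1::vector) AS score",
    "FROM resume_embeddings",
    "WHERE resume_id = $2::uuid AND chunk_type = 'bullet'",
    "ORDER BY embedding <=> $1::vector", // ascending distance, so an HNSW index can be used
    "LIMIT 1",
  ].join("\n");
  return { text, values: [toVectorLiteral(queryEmbedding), resumeId] };
}
```

Passing the embedding as a bound parameter rather than interpolating it into the SQL string keeps the query plan reusable and avoids injection concerns.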

Chunking strategy: why bullet-level beats everything else

The most important design decision in a RAG pipeline is not the model or the vector store — it's how you split your documents.

The problem with paragraph-level chunking for resumes:

Paragraph embedding of:

"Built REST APIs serving 2M daily requests" ← backend signal

"Owned CI/CD pipeline reducing deploy time 40%" ← DevOps signal

"Mentored 6 junior engineers" ← leadership signal

"Attended daily standups" ← no signal

The resulting vector blends all four. A query for "REST API experience" has to compete with leadership signal, DevOps signal, and standup-attendance noise folded into one vector. The strong signal gets diluted.

Bullet-level chunking preserves the signal. Each bullet gets its own embedding. "Built REST APIs serving 2M daily requests" produces a vector that points strongly toward backend engineering concepts — regardless of what the other bullets say.
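The dilution effect can be shown with toy numbers. This is only an illustration under a strong simplifying assumption: each bullet is a unit vector on its own axis, and the paragraph vector is their plain mean. Real embedding models do not literally average sentences, but the direction of the effect is the same.

```typescript
// Toy 4-dimensional "embeddings", one axis per topic (assumed for illustration only).
const backend = [1, 0, 0, 0];    // "Built REST APIs serving 2M daily requests"
const devops = [0, 1, 0, 0];     // "Owned CI/CD pipeline reducing deploy time 40%"
const leadership = [0, 0, 1, 0]; // "Mentored 6 junior engineers"
const standups = [0, 0, 0, 1];   // "Attended daily standups"

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Component-wise mean of several vectors of equal length.
function mean(vectors: number[][]): number[] {
  return vectors[0].map((_, i) => vectors.reduce((sum, v) => sum + v[i], 0) / vectors.length);
}

const query = [1, 0, 0, 0]; // stands in for "REST API experience"
const paragraphVector = mean([backend, devops, leadership, standups]);

const bulletScore = cosine(query, backend);            // 1
const paragraphScore = cosine(query, paragraphVector); // 0.5

console.log({ bulletScore, paragraphScore });
```

In this toy setup the matching bullet alone scores 1, while the same bullet mixed with three unrelated ones scores 0.5.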

The chunking map:

type ResumeChunkType =
  | 'bullet'      // one per experience bullet — most important
  | 'skills'      // one for the full skills section
  | 'summary'     // one for the full summary/profile text
  | 'education';  // one per degree

Skills get one chunk because co-occurrence matters. Summary gets one chunk because it's prose — context flows across sentences.
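A chunker following this map could look like the sketch below. The `StructuredResume` shape is an assumption (the real `/api/extract` output is not shown in this post), and `chunkResume` is a hypothetical function name.

```typescript
type ResumeChunkType = "bullet" | "skills" | "summary" | "education";

interface ResumeChunk {
  chunkType: ResumeChunkType;
  chunkText: string;
}

// Assumed shape of the structured resume JSON; the real extraction output may differ.
interface StructuredResume {
  summary?: string;
  skills: string[];
  experience: { role: string; company: string; bullets: string[] }[];
  education: { degree: string; institution: string }[];
}

function chunkResume(resume: StructuredResume): ResumeChunk[] {
  const chunks: ResumeChunk[] = [];

  // One chunk for the whole summary: it is prose, so context flows across sentences.
  if (resume.summary && resume.summary.trim()) {
    chunks.push({ chunkType: "summary", chunkText: resume.summary.trim() });
  }

  // One chunk for the whole skills section, so co-occurrence is preserved.
  if (resume.skills.length > 0) {
    chunks.push({ chunkType: "skills", chunkText: resume.skills.join(", ") });
  }

  // One chunk per experience bullet; blank bullets are skipped.
  for (const job of resume.experience) {
    for (const bullet of job.bullets) {
      const text = bullet.trim();
      if (text) chunks.push({ chunkType: "bullet", chunkText: text });
    }
  }

  // One chunk per degree.
  for (const edu of resume.education) {
    chunks.push({ chunkType: "education", chunkText: edu.degree + ", " + edu.institution });
  }

  return chunks;
}
```

Each returned chunk maps onto one row of `resume_embeddings`, with `chunkType` and `chunkText` filling the `chunk_type` and `chunk_text` columns.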

JD chunks are typed by section with weights:

type JDChunkType =
  | 'jd_requirement'    // weight 1.0 — must-have signal
  | 'jd_responsibility' // weight 0.8 — day-to-day work
  | 'jd_summary'        // weight 0.5 — context only
  | 'jd_nice_to_have';  // weight 0.3 — bonus signal
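How the weights feed into a final number is not spelled out here, so the following is one plausible reading rather than the actual implementation: take the best resume-bullet score for each JD chunk and compute a weighted mean. `JD_WEIGHTS`, `JDMatch`, and `weightedMatchScore` are hypothetical names.

```typescript
type JDChunkType =
  | "jd_requirement"
  | "jd_responsibility"
  | "jd_summary"
  | "jd_nice_to_have";

// Weights copied from the type comments above.
const JD_WEIGHTS: Record<JDChunkType, number> = {
  jd_requirement: 1.0,
  jd_responsibility: 0.8,
  jd_summary: 0.5,
  jd_nice_to_have: 0.3,
};

interface JDMatch {
  chunkType: JDChunkType;
  bestScore: number; // best cosine similarity found for this JD chunk
}

// Assumption: the overall score is the weighted mean of per-chunk best scores.
function weightedMatchScore(matches: JDMatch[]): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const m of matches) {
    const w = JD_WEIGHTS[m.chunkType];
    weightedSum += w * m.bestScore;
    totalWeight += w;
  }
  return totalWeight === 0 ? 0 : weightedSum / totalWeight;
}
```

Under this scheme a weak match on a nice-to-have pulls the overall score down far less than a weak match on a requirement.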

The full pipeline: PDF to semantic_matches

PDF upload
↓
/api/extract — parse PDF, extract structured resume JSON
↓
sessionStorage — resume JSON stored client-side
↓
/api/analyse — user pastes JD, submits
↓
Step 1 (sequential):
  extractKeywords(jobDescription): ~2–4s
↓
Step 2 (parallel):
  analyseMatch(): GPT-4o-mini, ~3–6s
  embedResume() + ingestJobDescription: ~0ms warm / ~400ms cold
↓
Step 3 (sequential):
  getSemanticMatches(resumeId, jdId): batch embed JD chunks → parallel DB queries, ~0.1–1s
↓
Response: { analysis, keywords, semantic_matches, rag_status }

The key optimisation in getSemanticMatches: all JD chunks embedded in one batch API call, all similarity queries run in parallel. Before this — sequential embed calls — the semantic matching step took ~15s for 8 chunks. After: ~0.5–1s.
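The shape of that optimisation, one batched embedding call followed by concurrent similarity queries, can be sketched as below. The two dependencies are injected and hypothetical (`embedBatch`, `findBestBullet`); this is not the real `getSemanticMatches`, only the pattern it describes.

```typescript
interface SemanticMatch {
  jdChunk: string;
  resumeChunk: string;
  score: number;
}

// Hypothetical dependencies, injected so the sketch is independent of any SDK or driver.
interface MatchDeps {
  // Embeds all texts in a single API round trip, returning one vector per text, in order.
  embedBatch(texts: string[]): Promise<number[][]>;
  // Runs one similarity query and returns the best resume bullet, or null if none.
  findBestBullet(embedding: number[]): Promise<{ chunkText: string; score: number } | null>;
}

async function getSemanticMatchesSketch(
  jdChunks: string[],
  deps: MatchDeps
): Promise<SemanticMatch[]> {
  if (jdChunks.length === 0) return [];

  // One batched call instead of one call per chunk.
  const embeddings = await deps.embedBatch(jdChunks);

  // All similarity queries start immediately and are awaited together.
  const results = await Promise.all(embeddings.map((e) => deps.findBestBullet(e)));

  const matches: SemanticMatch[] = [];
  results.forEach((best, i) => {
    if (best) {
      matches.push({ jdChunk: jdChunks[i], resumeChunk: best.chunkText, score: best.score });
    }
  });
  return matches;
}
```

With sequential calls, total latency grows with the number of chunks; with this pattern it is roughly one embedding round trip plus the slowest single query.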

The caching layer checks content hashes for both resume and JD embeddings. On warm requests (same resume, same JD) zero embedding API calls are made, and the entire RAG pipeline takes ~100ms.
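A content-hash check of this kind can be sketched as a pure function that decides which chunks still need an embedding call. The choice of SHA-256 and the whitespace normalisation are assumptions, as are the names `contentHash` and `chunksNeedingEmbedding`; the post only establishes that a `content_hash` is stored per chunk.

```typescript
import { createHash } from "node:crypto";

// Hash of the normalised chunk text (SHA-256 is an assumed choice).
function contentHash(text: string): string {
  // Collapse whitespace so trivially different copies of the same text share a hash.
  const normalised = text.trim().replace(/\s+/g, " ");
  return createHash("sha256").update(normalised, "utf8").digest("hex");
}

interface PendingChunk {
  chunkText: string;
  contentHash: string;
}

// Returns only the chunks whose hash is not already stored,
// i.e. the ones that still need an embedding API call.
function chunksNeedingEmbedding(chunkTexts: string[], storedHashes: Set<string>): PendingChunk[] {
  const pending: PendingChunk[] = [];
  const seen = new Set<string>();
  for (const chunkText of chunkTexts) {
    const hash = contentHash(chunkText);
    if (storedHashes.has(hash) || seen.has(hash)) continue;
    seen.add(hash);
    pending.push({ chunkText, contentHash: hash });
  }
  return pending;
}
```

On a fully warm request every hash is already stored, the function returns an empty array, and no embedding call is made.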

If semantic matching returns empty, the route falls back to keyword matching automatically, and a rag_status field tells the frontend (and future-me debugging at 11pm) which path was taken.
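The fallback decision itself is small. A sketch, with the caveat that the status strings (`"semantic"`, `"keyword_fallback"`) and the names `resolveMatches` and `MatchResult` are made up for illustration; only the existence of a `rag_status` field comes from the actual route.

```typescript
interface Match {
  jdChunk: string;
  resumeChunk: string;
  score: number;
}

// Assumed status values; the real route may use different strings.
type RagStatus = "semantic" | "keyword_fallback";

interface MatchResult {
  matches: Match[];
  rag_status: RagStatus;
}

// Uses semantic matches when there are any; otherwise runs the keyword fallback lazily.
function resolveMatches(semantic: Match[], keywordFallback: () => Match[]): MatchResult {
  if (semantic.length > 0) {
    return { matches: semantic, rag_status: "semantic" };
  }
  return { matches: keywordFallback(), rag_status: "keyword_fallback" };
}
```

Taking the fallback as a function means the keyword path only runs when it is needed.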

Results

The comparison table

5 resume profiles against the same Senior Software Engineer JD:

Resume profile	Keyword	Semantic	Delta
Senior engineer — exact match	0.061	0.444	+0.383
Synonym terminology — no shared keywords	0.029	0.466	+0.437
Partial match — some relevant skills	0.018	0.390	+0.372
Career changer — unrelated background	0.004	0.297	+0.293
Junior developer — relevant but underseniored	0.036	0.451	+0.415
Average	0.030	0.410	+0.380

Semantic outperforms keyword on every single profile.

The synonym example

The synonym resume uses almost none of the JD's keywords: completely different vocabulary, identical skills.

Resume bullet	JD requirement	Keyword	Semantic
"Developed backend web services handling millions of daily transactions"	"work on high-scale distributed systems"	0.036	0.491
"Coached and grew a team of four early-career engineers"	"Mentor junior engineers and conduct code reviews"	0.036	0.482
"Deployed cloud infrastructure using Amazon Web Services"	"Experience with AWS or other cloud platforms"	0.018	0.503

The AWS example is the clearest. "Amazon Web Services" vs "AWS": a keyword score of 0.018, semantic similarity 0.503 against a core requirement. A candidate filtered out for writing the service name in full.

There's also a ranking inversion worth noting. Keyword matching ranks the exact-match senior engineer above the synonym candidate (0.061 vs 0.029). Semantic matching ranks the synonym candidate above the senior engineer (0.466 vs 0.444). A keyword-only system deprioritises a qualified candidate purely due to vocabulary choice.
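The inversion can be reproduced directly from the comparison table. A small sketch, using the table's scores with shortened profile labels; `rankBy` is a hypothetical helper.

```typescript
interface ProfileScore {
  profile: string;
  keyword: number;
  semantic: number;
}

// Scores copied from the comparison table above.
const scores: ProfileScore[] = [
  { profile: "Senior engineer", keyword: 0.061, semantic: 0.444 },
  { profile: "Synonym terminology", keyword: 0.029, semantic: 0.466 },
  { profile: "Partial match", keyword: 0.018, semantic: 0.39 },
  { profile: "Career changer", keyword: 0.004, semantic: 0.297 },
  { profile: "Junior developer", keyword: 0.036, semantic: 0.451 },
];

// Returns profile names sorted from highest to lowest score for the given matcher.
function rankBy(rows: ProfileScore[], key: "keyword" | "semantic"): string[] {
  return rows
    .slice()
    .sort((a, b) => b[key] - a[key])
    .map((r) => r.profile);
}

console.log(rankBy(scores, "keyword")[0]);  // Senior engineer
console.log(rankBy(scores, "semantic")[0]); // Synonym terminology
```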

Latency and cost

Warm request timing (Vercel, June 2026):
user + resumeId lookup: ~300–500ms

extractKeywords (GPT): ~2–5s sequential

analyseMatch + RAG (parallel): ~3–6s bounded by analyseMatch

getSemanticMatches: ~0.1–1s batch embed + parallel queries

TOTAL warm request: 8–10s

The embedding step, the part I expected to be slow, is essentially free on warm requests. The latency is GPT, not vectors.

The path to sub-3s requires streaming the GPT response to the client and moving getSemanticMatches to a background job. That's a Week 3 architectural change.

Embedding cost:
Typical session: ~1050 tokens

Cost: 1050 / 1,000,000 × $0.02 = $0.000021
At 10,000 sessions/month: $0.21 total

The caching strategy eliminates embedding cost on repeat requests. A user who submits the same resume against five different JDs pays the resume embedding cost once. At any realistic usage level this rounds to zero.
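The cost arithmetic is easy to parameterise. In the sketch below the price and token count come from the figures above, while `cacheHitRate` is a hypothetical knob for the share of sessions served entirely from cache.

```typescript
// text-embedding-3-small price quoted in this post.
const PRICE_PER_MILLION_TOKENS_USD = 0.02;

function embeddingCostUsd(
  tokens: number,
  pricePerMillion: number = PRICE_PER_MILLION_TOKENS_USD
): number {
  return (tokens / 1_000_000) * pricePerMillion;
}

// cacheHitRate is hypothetical: 0 means every session pays, 1 means none do.
function monthlyCostUsd(sessions: number, tokensPerSession: number, cacheHitRate: number = 0): number {
  const billableSessions = sessions * (1 - cacheHitRate);
  return embeddingCostUsd(billableSessions * tokensPerSession);
}

console.log(embeddingCostUsd(1050).toFixed(6));      // 0.000021
console.log(monthlyCostUsd(10000, 1050).toFixed(2)); // 0.21
```

Even with no cache hits at all, the monthly figure stays at cents for this volume, so caching matters more for latency than for the bill.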

What's next (Week 3)

Recency weighting — the dateRange metadata is stored on every bullet chunk but not yet applied; a 2024 bullet should score higher than a 2016 bullet
matched_role and matched_company — returning role context alongside each matched bullet so users can see which of their roles each semantic match came from
Sub-3s latency — streaming GPT response, moving semantic matches to a background job

The code is on GitHub: github.com/Azeez1314

The product is live at: resumetailor.cv

If you're building semantic search into a product — leave a comment. Happy to compare notes.

Part of a build-in-public series documenting the NanoCrafts portfolio. Previous post: Improving AI resume matching with prompt iteration — 7.37 to 8.37/10

Top comments (1)

Alex Shev • Jun 16

The useful part here is the move from keyword matching to measurable retrieval quality. Resume matching is one of those domains where "seems relevant" is not enough; the numbers matter because false positives waste recruiter and candidate time.