Every job applicant knows the feeling — you find a great role,
read through the requirements, and spend 20 minutes manually
figuring out which of your skills to highlight. I wanted to
automate that using the OpenAI API. What I didn't expect was
that the hardest part wouldn't be the AI — it would be
controlling what the AI actually returns.
What I Set Out to Build
Job applicants face two problems. First, their CV often gets
filtered out because it doesn't match the language the employer
used in the job description. Second, going through each job
description manually is slow — it limits how many quality
applications you can submit in a day.
I wanted to build a script that solves both. Give it any job
description, get back a structured list of the exact keywords
to highlight on your resume. Simple idea. Harder to build
than I expected.
What I Tried That Didn't Work
My first attempt looked like it was working. I sent a job
description to the API and got something back that looked
like JSON. But when I tried to use it in my script,
everything broke.
Here's what the raw output actually looked like:
```json
{
  "keywords": [
    "Senior Frontend Engineer",
    "React",
    "TypeScript",
    "REST APIs"
  ]
}
```
Notice the backticks wrapping it. That's markdown formatting,
not valid JSON. When I ran JSON.parse() on that raw string, it
threw a SyntaxError and my script crashed immediately.
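The obvious workaround is to strip the fence before parsing. Here's a sketch of that stopgap (the helper name is mine, not from the script) — it papers over the backticks but does nothing about the deeper inconsistency described below:

```javascript
// Strip a markdown code fence (```json ... ```) from model output
// before parsing. A stopgap, not a real fix.
function stripFence(raw) {
  const trimmed = raw.trim();
  // Remove an opening fence like ``` or ```json and the closing ```
  const match = trimmed.match(/^```(?:json)?\s*([\s\S]*?)\s*```$/);
  return match ? match[1] : trimmed;
}

const raw = '```json\n{ "keywords": ["React", "TypeScript"] }\n```';
const data = JSON.parse(stripFence(raw));
console.log(data.keywords); // [ 'React', 'TypeScript' ]
```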
But the backticks were only part of the problem. I ran the
same job description 5 times and got 5 slightly different
results. Keywords were dropped between runs. Phrasing changed.
On one run "strong communication skills" appeared, on another
just "communication skills." The structure itself changed —
sometimes the model added explanation text before the JSON,
sometimes not.
The problem wasn't dramatic failures. It was subtle
inconsistency that would silently corrupt a real application
over time. I was being polite with my prompt — "return the
results in JSON format" — when I needed to be strict.
I also made the mistake of setting temperature to 1 while
testing. Higher temperature introduces randomness — useful
for creative tasks, catastrophic for structured extraction.
Temperature 1 didn't just vary the content. It varied the
structure itself, meaning different field names on every run.
For extraction tasks, always use temperature 0.
The Moment response_format Changed Everything
I spent longer than I'd like to admit fighting with output
format before I found this:
response_format: { type: "json_object" }
This single line changed everything. Here's the difference
between asking and enforcing:
Before — prompt instruction only:
content: `Extract keywords and return them in JSON format.`
Result: inconsistent, backtick-wrapped, unparseable output.
After — API-level enforcement:
const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  temperature: 0,
  response_format: { type: "json_object" },
  messages: [...]
});
Result: clean, consistent, directly parseable JSON every time.
When you add response_format you're not giving the model
another instruction. You're flipping a switch at the API
level that constrains the model to emit syntactically valid
JSON, so JSON.parse() succeeds on every complete response.
Two caveats worth knowing: the API requires the word "JSON"
to appear somewhere in your messages when you use json_object
mode, and a response cut off by the token limit (finish_reason
of "length") can still be truncated mid-object.
No backticks. No explanation text. No inconsistency.
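With the switch flipped, the parse step collapses to a few lines. Here's a sketch — the helper name and the mocked response object are mine, shaped like the standard chat completions response:

```javascript
// Pull the JSON payload out of a chat completions response.
// With response_format: { type: "json_object" } the content is
// valid JSON -- unless the reply was cut off by the token limit,
// which finish_reason reports as "length".
function parseCompletion(response) {
  const choice = response.choices[0];
  if (choice.finish_reason === "length") {
    throw new Error("Response truncated: raise max_tokens and retry");
  }
  return JSON.parse(choice.message.content);
}

// Shape of a (mocked) successful response:
const response = {
  choices: [
    {
      finish_reason: "stop",
      message: { content: '{ "keywords": ["React", "REST APIs"] }' },
    },
  ],
};
console.log(parseCompletion(response).keywords); // [ 'React', 'REST APIs' ]
```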
What the Final Script Does
The finished extractor takes any job description and returns
a structured JSON object with 7 fields:
{
  "job_title": "",
  "experience_level": "",
  "required_skills": [],
  "nice_to_have_skills": [],
  "tools_and_technologies": [],
  "soft_skills": [],
  "ats_keywords": []
}
Here's a real example. Input:
Senior React Developer. 3+ years experience required.
Must know React, TypeScript, and REST APIs.
Nice to have: GraphQL, AWS.
Tools: Figma, GitHub, Jira.
Strong communication skills required.
Output:
{
  "job_title": "Senior React Developer",
  "experience_level": "3+ years",
  "required_skills": ["React", "TypeScript", "REST APIs"],
  "nice_to_have_skills": ["GraphQL", "AWS"],
  "tools_and_technologies": ["Figma", "GitHub", "Jira"],
  "soft_skills": ["strong communication skills"],
  "ats_keywords": ["Senior React Developer", "React",
    "TypeScript", "REST APIs", "Figma"]
}
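Valid JSON isn't the same as a useful result, so the parsed object still gets a shape check. A sketch of what that validation can look like for the seven fields (the function name and exact rules are mine, not necessarily the script's):

```javascript
// Check that a parsed result has the seven expected fields with the
// right types, and that the extraction isn't meaninglessly empty.
const ARRAY_FIELDS = [
  "required_skills", "nice_to_have_skills", "tools_and_technologies",
  "soft_skills", "ats_keywords",
];

function validateExtraction(result) {
  const errors = [];
  if (typeof result.job_title !== "string" || !result.job_title.trim()) {
    errors.push("job_title missing or empty");
  }
  if (typeof result.experience_level !== "string") {
    errors.push("experience_level missing");
  }
  for (const field of ARRAY_FIELDS) {
    if (!Array.isArray(result[field])) errors.push(`${field} is not an array`);
  }
  // A result with no keywords at all is "valid JSON" but useless.
  if (Array.isArray(result.ats_keywords) && result.ats_keywords.length === 0) {
    errors.push("ats_keywords is empty");
  }
  return errors; // empty array means the result passed
}
```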
The script also includes retry logic with exponential backoff,
input validation that rejects empty or very short descriptions
before spending tokens, output validation that catches
meaningless results, and token tracking on every call so
you're always aware of cost.
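The retry piece can be sketched as a generic exponential-backoff wrapper — the attempt count and 1s/2s/4s delays here are my choices for illustration, not necessarily the script's:

```javascript
// Retry an async operation with exponential backoff: wait 1s, 2s, 4s
// between attempts, then give up and rethrow the last error.
async function withRetry(fn, attempts = 3, baseDelayMs = 1000) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        const delay = baseDelayMs * 2 ** i; // 1000, 2000, 4000, ...
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

Wrapping the API call (`withRetry(() => client.chat.completions.create(...))`) keeps the transient-failure handling out of the extraction logic itself.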
Full code is on GitHub: github.com/Azeez1314/ai-keyword-extractor
What I'd Do Differently
Add response_format on day one. I spent hours fighting
inconsistent output that vanished the moment I added that
single line. Enforce structure at the API level — not just
through prompt instructions.
Take prompt engineering more seriously from the start.
I assumed the model would figure out what I meant. It didn't
— it did exactly what I said, which was often not what I
meant. Every vague instruction became a bug. Every specific
constraint became a feature.
Track token usage from the first call. My system prompt
grew from 133 tokens to 700 tokens through refinement. That's
more than a 5x increase in prompt cost per call that I only
noticed at the end. Log tokens on every call from day one.
What's Next
This extractor is the first step in a larger pipeline. The
next version will accept both a job description and a
candidate's resume, compare the extracted keywords against
the candidate's actual experience, and suggest specific
improvements to make the resume a closer match.
If you're building something similar or have questions about
any part of this, the full code is available at:
github.com/Azeez1314/ai-keyword-extractor