<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Meet Kelwa</title>
    <description>The latest articles on DEV Community by Meet Kelwa (@meet_kelwa_e9fd8ff2ada1bd).</description>
    <link>https://dev.to/meet_kelwa_e9fd8ff2ada1bd</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3976539%2Fcdd902cb-9be3-424a-99e2-a372775f4fcd.jpg</url>
      <title>DEV Community: Meet Kelwa</title>
      <link>https://dev.to/meet_kelwa_e9fd8ff2ada1bd</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/meet_kelwa_e9fd8ff2ada1bd"/>
    <language>en</language>
    <item>
      <title>CareerPilot AI: AI Resume Analyzer</title>
      <dc:creator>Meet Kelwa</dc:creator>
      <pubDate>Tue, 09 Jun 2026 19:58:00 +0000</pubDate>
      <link>https://dev.to/meet_kelwa_e9fd8ff2ada1bd/careerpilot-aiai-resume-analyzer-1dmi</link>
      <guid>https://dev.to/meet_kelwa_e9fd8ff2ada1bd/careerpilot-aiai-resume-analyzer-1dmi</guid>
      <description>&lt;p&gt;In the modern job market, hiring managers and talent acquisition teams face an overwhelming influx of job applications. For a single opening, hundreds of resumes are submitted, each with unique formatting, fonts, layouts, and styles. Manually reading through each file is a huge bottleneck that costs teams countless hours.&lt;/p&gt;

&lt;p&gt;To solve this, I built the AI Resume Analyzer, a lightweight, cloud-hosted application that uses Natural Language Processing (NLP) and Machine Learning (ML) to parse PDF resumes automatically, classify candidates into primary professional domains (e.g., DevOps, Frontend, Data Science), analyze their skills, and suggest the missing competencies they need to close the gap.&lt;/p&gt;

&lt;p&gt;In this post, I will walk you through the architecture, the machine learning pipeline, the NLP extraction, and how I deployed it for free on the cloud.&lt;/p&gt;

&lt;p&gt;🚀 Key Project Features&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instant PDF Extraction: Extracts and cleans raw text from unstructured PDF files in under 300 ms.&lt;/li&gt;
&lt;li&gt;AI-Driven Domain Classification: Classifies resumes into matching job roles (like Data Science or Design) with confidence percentages using an ML model.&lt;/li&gt;
&lt;li&gt;Contact Details Miner: Extracts phone numbers and email addresses automatically using regular expressions.&lt;/li&gt;
&lt;li&gt;NLP Entity &amp;amp; Skill Extraction: Uses spaCy's named entity recognizer and word-boundary regex matching to identify entities and technical skills.&lt;/li&gt;
&lt;li&gt;Career Gap Recommendations: Compares the candidate's skills against core domain standards to suggest missing technologies they should learn.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🛠️ The Architecture &amp;amp; Data Flow&lt;br&gt;
The app is built as a REST API using FastAPI with a responsive front-end UI. The overall flow is as follows:&lt;/p&gt;

&lt;p&gt;[Resume PDF Upload] &lt;br&gt;
       ↓&lt;br&gt;
[PyPDF2 Text Extraction &amp;amp; Cleaning] &lt;br&gt;
       ↓&lt;br&gt;
[spaCy Named Entity Recognition (NER)] → Extracts Names, Companies &amp;amp; Locations&lt;br&gt;
       ↓&lt;br&gt;
[Word-Boundary Skills Extraction] → Identifies 30+ Tech Skills&lt;br&gt;
       ↓&lt;br&gt;
[TF-IDF Vectorization] → Converts cleaned text to numerical weights&lt;br&gt;
       ↓&lt;br&gt;
[Random Forest Classification] → Predicts Primary Job Category (e.g., Backend Engineering)&lt;br&gt;
       ↓&lt;br&gt;
[Skill Gap Analysis &amp;amp; Career Mapping] → Compares extracted skills with target profile&lt;br&gt;
       ↓&lt;br&gt;
[Interactive HTML Dashboard] → Displays Profile, Confidence, Contacts, &amp;amp; Recommended Skills&lt;/p&gt;

&lt;p&gt;🧠 The Machine Learning Core&lt;br&gt;
To categorize resumes, we implemented a supervised machine learning pipeline composed of TF-IDF Vectorization and a Random Forest Classifier in Python.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Vectorization (TF-IDF)&lt;br&gt;
A classifier cannot work on raw English text, so we use TfidfVectorizer (Term Frequency-Inverse Document Frequency) from scikit-learn to convert the cleaned text into a sparse matrix of TF-IDF weights. TF-IDF down-weights words that appear in most resumes (like experience, projects, or work) and gives higher weights to discriminative technical terms (like TensorFlow, Kubernetes, Figma, or React).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Classification (Random Forest)&lt;br&gt;
We use a RandomForestClassifier (an ensemble of decision trees) to predict the candidate's career category. Random Forest is a reasonable choice here because it accepts the sparse TF-IDF matrix directly, combines the predictions of many trees to reduce variance, and is less prone to overfitting than a single decision tree on a small training set.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
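
&lt;p&gt;To make the weighting concrete, here is a minimal pure-Python sketch of TF-IDF on three toy resume snippets. It is not the project's code: the documents and the tfidf helper are invented for illustration, tokenization is a plain whitespace split with no stop-word removal, and the formula is the smoothed IDF with L2 normalization that scikit-learn documents as the TfidfVectorizer default.&lt;/p&gt;

```python
import math
from collections import Counter

# Toy documents, invented for illustration only.
docs = [
    "experience with react and figma projects",
    "experience with kubernetes and docker projects",
    "experience with tensorflow and pandas projects",
]

def tfidf(documents):
    """Return one {term: weight} dict per document (smoothed IDF, L2-normalized)."""
    tokenized = [doc.split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Term count multiplied by the smoothed inverse document frequency.
        raw = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        # Scale each row to unit length so long and short resumes are comparable.
        norm = math.sqrt(sum(weight ** 2 for weight in raw.values()))
        rows.append({term: weight / norm for term, weight in raw.items()})
    return rows

weights = tfidf(docs)
first = weights[0]
print(round(first["experience"], 3), round(first["react"], 3))
```

&lt;p&gt;In this toy corpus, react receives a larger weight than experience, because experience appears in every document while react appears in only one.&lt;/p&gt;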

&lt;p&gt;Here is the Python implementation used for the training stage (train.py):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# 1. Load Data
# We generated specialized training text profiles for backend, frontend, data science, devops, and design.
df = load_training_data()

# 2. Extract TF-IDF Features
vectorizer = TfidfVectorizer(stop_words='english', max_features=5000)
X_vectorized = vectorizer.fit_transform(df['text'])
y = df['category']

# 3. Split &amp;amp; Train
X_train, X_test, y_train, y_test = train_test_split(X_vectorized, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# 4. Save Models for API Inference
os.makedirs("models", exist_ok=True)
joblib.dump(model, "models/model.pkl")
joblib.dump(vectorizer, "models/vectorizer.pkl")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;🔍 NLP Parsing: Entity &amp;amp; Skill Extraction&lt;br&gt;
While the ML model determines the high-level category, an NLP parsing engine extracts details from the text block.&lt;/p&gt;

&lt;p&gt;We used spaCy (en_core_web_sm) to analyze the grammatical structure of the sentences and retrieve named entities.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Named Entity Recognition (NER): spaCy identifies organizations (ORG), person names (PERSON), and geopolitical entities (GPE) such as cities and countries.&lt;/li&gt;
&lt;li&gt;Regular Expressions: We use strict regex patterns to extract emails and phone numbers.&lt;/li&gt;
&lt;li&gt;Skill Extraction: Each entry in a list of core technical skills is matched against the document text using word boundaries (\b), so a skill is counted only when it appears as a whole word.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is a snippet of our parser (resume_parser.py):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import re
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_entities(text):
    doc = nlp(text)
    entities = {}

    # Extract Names, Companies, and Locations
    allowed_labels = ['ORG', 'PERSON', 'GPE']
    for ent in doc.ents:
        if ent.label_ in allowed_labels:
            clean_text = re.sub(r'\s+', ' ', ent.text).strip()
            if 2 &amp;lt; len(clean_text) &amp;lt; 30:
                if ent.label_ not in entities:
                    entities[ent.label_] = []
                entities[ent.label_].append(clean_text)

    # Add Technical Skills via boundary matches
    entities['Skills'] = extract_skills(text)
    return entities
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
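
&lt;p&gt;The parser calls extract_skills, and the post mentions regex-based contact extraction, but neither is shown. The following is a minimal sketch of how both could work, not the project's actual code: the EMAIL_RE and PHONE_RE patterns, the KNOWN_SKILLS list, and the extract_contacts helper are assumptions made for illustration.&lt;/p&gt;

```python
import re

# Hypothetical patterns; real resumes may need broader ones.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{8,}\d")

# Hypothetical skill list; the post says the real one covers 30+ skills.
KNOWN_SKILLS = ["Python", "Java", "JavaScript", "React", "Docker", "Kubernetes", "SQL"]

def extract_contacts(text):
    """Return all email addresses and phone-like numbers found in the text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": [match.strip() for match in PHONE_RE.findall(text)],
    }

def extract_skills(text):
    """Return known skills that occur as whole words (case-insensitive)."""
    found = []
    for skill in KNOWN_SKILLS:
        # \b keeps "Java" from matching inside "JavaScript".
        pattern = r"\b" + re.escape(skill) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(skill)
    return found

sample = "Jane Doe, jane.doe@example.com, +1 415-555-0132. Built JavaScript and React apps on Docker."
print(extract_contacts(sample))
print(extract_skills(sample))
```

&lt;p&gt;One caveat of this approach: \b only matches between a word character and a non-word character, so skills that end in symbols, such as C++ or C#, would need a different boundary check.&lt;/p&gt;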

&lt;p&gt;⚡ Interactive Frontend &amp;amp; FastAPI Server&lt;br&gt;
We integrated this pipeline into a FastAPI web server (app.py). When a user uploads a PDF, the file is loaded temporarily, its text is extracted and parsed, and the result is passed through the saved TF-IDF vectorizer and Random Forest model.&lt;/p&gt;

&lt;p&gt;The API returns a JSON response containing the category, confidence percentage, parsed entities, contact details, and a list of recommended job titles.&lt;/p&gt;
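
&lt;p&gt;As a rough illustration of how the category and confidence percentage in that response could be derived, the sketch below picks the most probable class from a probability vector of the kind scikit-learn's predict_proba returns (one value per entry in model.classes_). The build_response helper, the field names, and the sample numbers are hypothetical and not taken from app.py. The recommended job titles are left out because the post does not describe how they are chosen.&lt;/p&gt;

```python
import json

def build_response(classes, probabilities, entities, contacts):
    """Assemble a JSON-serializable result from per-class probabilities."""
    # Pair each class label with its probability and keep the most likely one.
    best_label, best_prob = max(zip(classes, probabilities), key=lambda pair: pair[1])
    return {
        "category": best_label,
        "confidence": round(best_prob * 100, 1),  # expressed as a percentage
        "entities": entities,
        "contacts": contacts,
    }

# Sample values invented for illustration.
classes = ["Backend", "Data Science", "DevOps"]
probabilities = [0.12, 0.81, 0.07]
response = build_response(
    classes,
    probabilities,
    entities={"Skills": ["Python", "TensorFlow"]},
    contacts={"emails": ["jane.doe@example.com"]},
)
print(json.dumps(response, indent=2))
```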

&lt;p&gt;To make the application user-friendly, the server also serves a responsive HTML front-end with a dark theme, glowing accents, and micro-animations. The interface gives candidates a visual report card of their skills, comparing the skills found in the resume with the ones they should learn to land a job in their domain.&lt;/p&gt;
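
&lt;p&gt;The comparison behind that report card can be sketched as a set difference between a domain's core skills and the skills found in the resume. The CORE_SKILLS mapping and the skill_gap helper below are hypothetical; the post does not list the project's actual domain standards.&lt;/p&gt;

```python
# Hypothetical core skills per domain, for illustration only.
CORE_SKILLS = {
    "DevOps": ["Docker", "Kubernetes", "Terraform", "Linux", "CI/CD"],
    "Frontend": ["HTML", "CSS", "JavaScript", "React", "Figma"],
}

def skill_gap(category, found_skills):
    """Split a domain's core skills into matched and missing lists."""
    # Compare case-insensitively so "docker" and "Docker" count as the same skill.
    found = {skill.lower() for skill in found_skills}
    core = CORE_SKILLS.get(category, [])
    matched = [skill for skill in core if skill.lower() in found]
    missing = [skill for skill in core if skill.lower() not in found]
    return {"matched": matched, "missing": missing}

report = skill_gap("DevOps", ["docker", "Linux", "Python"])
print(report)
```

&lt;p&gt;Skills outside the predicted domain (Python in this example) are simply ignored, and an unknown category yields two empty lists.&lt;/p&gt;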

&lt;p&gt;☁️ Deployment Strategy&lt;br&gt;
We hosted the application on Render.com.&lt;/p&gt;

&lt;p&gt;Originally, we set up a Docker configuration. However, the free tier is limited to 512 MB of RAM, and heavy libraries like scikit-learn, pandas, and spaCy need significant memory while the Docker image is being built, which caused the build to crash.&lt;/p&gt;

&lt;p&gt;To solve this, we updated our configuration (render.yaml) to run in a Native Python Environment instead of Docker:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;services:
  - type: web
    name: ai-resume-analyzer-api
    env: python
    buildCommand: "pip install -r requirements.txt &amp;amp;&amp;amp; python -m spacy download en_core_web_sm &amp;amp;&amp;amp; python train.py"
    startCommand: "uvicorn app:app --host 0.0.0.0 --port $PORT"
    plan: free
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;By switching to native Python, Render installs packages using pre-built wheels, reducing memory overhead and allowing our application to deploy successfully!&lt;/p&gt;

&lt;p&gt;📊 Project Metrics&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Average Inference Latency: &amp;lt;300 milliseconds&lt;/li&gt;
&lt;li&gt;Model Accuracy: ~95% classification accuracy on test resumes&lt;/li&gt;
&lt;li&gt;Deployment Cost: $0 (Utilizing Render free web service plan)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🔗 Live Links &amp;amp; Code&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Live Web Application: &lt;a href="https://careerpilot-ai-s8xe.onrender.com" rel="noopener noreferrer"&gt;https://careerpilot-ai-s8xe.onrender.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;GitHub Repository: &lt;a href="https://github.com/meetkelwa2005-bot/CareerPilot-AI.git" rel="noopener noreferrer"&gt;https://github.com/meetkelwa2005-bot/CareerPilot-AI.git&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>nlp</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
