Haji Rufai

Posted on May 24

Building a Smart Job Application Tracker with FastAPI, TF-IDF Matching, and Analytics

#machinelearning #career #webdev #python

Job hunting is a numbers game, and keeping track of dozens of applications across LinkedIn, Indeed, company sites, and cold emails quickly becomes chaotic. I built AppTrack — a full-stack job application tracker with resume-JD matching, pipeline analytics, and smart follow-up reminders. Here's how.

The Problem

When you're actively job hunting, you need to track:

Where you applied and when
Current status of each application
Which sources (LinkedIn, referral, etc.) actually get responses
When to follow up
How well your resume matches each role

Spreadsheets work initially, but they don't scale. You need filtering, analytics, and automation.

Architecture

┌─────────────────────────────────────┐
│           Frontend (SPA)            │
│   Tailwind CSS + Alpine.js + Chart  │
└──────────────┬──────────────────────┘
               │ REST API
┌──────────────▼──────────────────────┐
│          FastAPI Backend            │
│  ┌─────────┐ ┌─────────┐ ┌──────┐  │
│  │  CRUD   │ │Analytics│ │Match │  │
│  │ Router  │ │ Router  │ │Router│  │
│  └────┬────┘ └────┬────┘ └──┬───┘  │
│       │           │         │       │
│  ┌────▼───────────▼─────────▼───┐   │
│  │      Service Layer           │   │
│  │  ┌──────┐ ┌─────┐ ┌──────┐  │   │
│  │  │App   │ │Stats│ │TF-IDF│  │   │
│  │  │Svc   │ │ Svc │ │Match │  │   │
│  │  └──┬───┘ └──┬──┘ └──┬───┘  │   │
│  └─────┼────────┼───────┼──────┘   │
│        │        │       │           │
│  ┌─────▼────────▼───────▼──────┐   │
│  │     SQLite (aiosqlite)      │   │
│  │  applications | events |     │   │
│  │  contacts | reminders        │   │
│  └─────────────────────────────┘   │
└─────────────────────────────────────┘

Tech Stack

Component	Technology	Why
API Framework	FastAPI	Auto-generated OpenAPI docs, async, type-safe
Database	SQLite + aiosqlite	Zero config, async, perfect for personal tools
Matching	scikit-learn TF-IDF	No external APIs needed, fast, interpretable
Frontend	Tailwind + Alpine.js	Lightweight, no build step needed
Charts	Chart.js	Beautiful charts with minimal code
CLI	Click + Rich	Terminal-first workflow
CI	GitHub Actions	Automated testing on push

Key Feature: Resume-JD Matching

The most interesting feature is the TF-IDF-based resume matcher. It scores how well your resume matches a job description — completely offline, no API costs.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def score_match(resume_text: str, job_description: str) -> dict:
    vectorizer = TfidfVectorizer(
        stop_words="english",
        ngram_range=(1, 2),
        max_features=5000,
        sublinear_tf=True,
    )
    tfidf_matrix = vectorizer.fit_transform([resume_text, job_description])
    similarity = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])
    score = round(float(similarity[0][0]) * 100, 1)

    # Extract matching and missing keywords
    jd_keywords = extract_keywords(job_description)
    resume_keywords = extract_keywords(resume_text)
    matching = jd_keywords & resume_keywords
    missing = jd_keywords - resume_keywords

    return {
        "score": score,
        "matching_keywords": sorted(matching),
        "missing_keywords": sorted(missing),
        "suggestion": generate_suggestion(score, missing),
    }

The key decisions:

ngram_range=(1, 2) captures both single words ("python") and two-word phrases ("data engineering")
sublinear_tf=True replaces each raw term count tf with 1 + log(tf), so a term repeated many times in one document doesn't dominate the score
Keyword extraction uses a curated tech vocabulary plus regex for acronyms/proper nouns

This gives you a practical score plus actionable feedback: which keywords you match and which are missing.
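The snippet above calls `extract_keywords` and `generate_suggestion` without showing them. Here is a minimal sketch of what they could look like, assuming a small hand-picked vocabulary and a simple acronym regex. The vocabulary contents, the regex, and the score threshold are illustrative assumptions, not AppTrack's actual values.

```python
import re

# Hypothetical vocabulary for illustration; a real list would be much longer.
TECH_VOCAB = {"python", "fastapi", "sqlite", "docker", "kubernetes", "data engineering"}

# Two or more consecutive capitals/digits starting with a capital, e.g. "AWS", "SQL".
ACRONYM_RE = re.compile(r"\b[A-Z][A-Z0-9]{1,}\b")


def extract_keywords(text: str) -> set[str]:
    """Return lowercase keywords found in text: vocabulary hits plus acronyms."""
    lowered = text.lower()
    found = {
        term for term in TECH_VOCAB
        if re.search(rf"\b{re.escape(term)}\b", lowered)
    }
    found |= {match.lower() for match in ACRONYM_RE.findall(text)}
    return found


def generate_suggestion(score: float, missing: set[str]) -> str:
    """Turn a score and the missing keywords into one line of advice."""
    if not missing:
        return "Your resume covers every keyword found in the job description."
    top = ", ".join(sorted(missing)[:5])
    if score >= 60:  # assumed threshold, chosen only for this sketch
        return f"Strong match. Consider mentioning: {top}."
    return f"Weak match. Add relevant experience with: {top}."
```

Because both helpers return plain sets and strings, the set arithmetic in `score_match` (`jd_keywords & resume_keywords`, `jd_keywords - resume_keywords`) works unchanged.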

Smart Reminders

When you create an application, AppTrack automatically sets a 7-day follow-up reminder. When you move an application to an interview stage, it creates:

An interview prep reminder (immediate)
A thank-you note reminder (1 day after)

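import uuid
from datetime import datetime, timezone

async def update_status(app_id: str, new_status: str, note: str | None = None):
    # `db` is the shared aiosqlite connection; handling of `note` is omitted in this excerpt
    now = datetime.now(timezone.utc).isoformat()  # assumed timestamp format

    # Read the current status so the event can record the transition
    rows = await db.execute_fetchall(
        "SELECT status FROM applications WHERE id = ?", (app_id,)
    )
    if not rows:
        raise LookupError(f"No application with id {app_id}")
    old_status = rows[0]["status"]

    # Update the status
    await db.execute(
        "UPDATE applications SET status = ?, updated_at = ? WHERE id = ?",
        (new_status, now, app_id),
    )

    # Log the event (column list elided; a UUID event id is an assumption)
    event_id = str(uuid.uuid4())
    await db.execute(
        "INSERT INTO events (...) VALUES (...)",
        (event_id, app_id, "status_change", old_status, new_status, now),
    )

    # Auto-create interview reminders
    if new_status in {"phone_screen", "technical", "onsite"}:
        await create_reminder(app_id, "interview_prep", "Prepare for interview")
        await create_reminder(app_id, "thank_you", "Send thank-you note", days=1)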
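`create_reminder` isn't shown, so here is a rough sketch of the scheduling rules on their own, written as a pure function that is easy to test. The function name `plan_reminders`, the dict layout, and the `follow_up` kind are assumptions made for illustration; the interview stages, the other reminder kinds, and the 7-day and 1-day offsets come from the description above.

```python
from datetime import datetime, timedelta, timezone

INTERVIEW_STAGES = {"phone_screen", "technical", "onsite"}


def plan_reminders(status: str, now: datetime) -> list[dict]:
    """Return the reminders a status change should create, with due dates."""
    if status == "applied":
        # New application: follow up after a week.
        return [{
            "kind": "follow_up",  # hypothetical kind name
            "message": "Follow up on application",
            "due_at": now + timedelta(days=7),
        }]
    if status in INTERVIEW_STAGES:
        return [
            {"kind": "interview_prep", "message": "Prepare for interview", "due_at": now},
            {
                "kind": "thank_you",
                "message": "Send thank-you note",
                "due_at": now + timedelta(days=1),
            },
        ]
    return []  # other statuses create no reminders in this sketch


if __name__ == "__main__":
    for reminder in plan_reminders("technical", datetime.now(timezone.utc)):
        print(reminder["kind"], reminder["due_at"].isoformat())
```

Keeping the rules separate from the database write means they can be unit-tested with a fixed `now`, without touching SQLite.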

The Dashboard

The frontend is a single HTML file using CDN-loaded Tailwind CSS, Alpine.js, and Chart.js. Four tabs:

Applications — Sortable, filterable table with inline status updates
Analytics — Pipeline funnel, weekly trends, source breakdown charts
Match Scorer — Paste a JD, get instant match analysis
Reminders — Pending follow-ups with dismiss functionality

No build step needed. Just serve the HTML.

Pipeline Analytics

The analytics module queries SQLite to calculate:

Response rate: % of applications that moved past "applied"
Source effectiveness: Which sources (LinkedIn vs referral vs cold email) convert best
Pipeline funnel: Visual breakdown of where applications are in the process
Weekly trends: Application velocity over time

async def get_sources():
    rows = await db.execute_fetchall("""
        SELECT
            COALESCE(source, 'unknown') as source,
            COUNT(*) as cnt,
            SUM(CASE WHEN status IN ('phone_screen', 'technical', 'onsite', 'offer', 'accepted')
                THEN 1 ELSE 0 END) as interview_cnt
        FROM applications
        GROUP BY source
        ORDER BY cnt DESC
    """)
    return [{
        "source": r["source"],
        "count": r["cnt"],
        "conversion_rate": round(r["interview_cnt"] / r["cnt"] * 100, 1)
    } for r in rows]

This is the data that actually helps you optimize your job search strategy.
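The other metrics follow the same pattern. As a sketch, response rate and weekly trends could be computed as below. To keep it runnable on its own, this uses the standard-library `sqlite3` module (synchronous) instead of aiosqlite. The `applied_at` column name is an assumption, and so is counting any status other than "applied" (including a rejection) as a response.

```python
import sqlite3


def response_rate(conn: sqlite3.Connection) -> float:
    """Percentage of applications whose status is anything other than 'applied'."""
    total, responded = conn.execute(
        """
        SELECT COUNT(*),
               COALESCE(SUM(CASE WHEN status != 'applied' THEN 1 ELSE 0 END), 0)
        FROM applications
        """
    ).fetchone()
    # Guard against an empty table to avoid dividing by zero.
    return round(responded / total * 100, 1) if total else 0.0


def weekly_counts(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Applications per week, keyed by year and week number, oldest first."""
    return conn.execute(
        """
        SELECT strftime('%Y-W%W', applied_at) AS week, COUNT(*)
        FROM applications
        GROUP BY week
        ORDER BY week
        """
    ).fetchall()
```

SQLite's `%W` treats Monday as the first day of the week, which is usually what you want when tracking job-search activity.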

Full REST API

The API covers everything:

POST   /api/applications          Create application
GET    /api/applications          List with filters/pagination
GET    /api/applications/{id}     Get details + timeline
PUT    /api/applications/{id}     Update fields
PATCH  /api/applications/{id}/status  Update status
DELETE /api/applications/{id}     Delete

GET    /api/analytics/overview    Summary stats
GET    /api/analytics/pipeline    Funnel data
GET    /api/analytics/trends      Weekly trends
GET    /api/analytics/sources     Source effectiveness

POST   /api/match/score           Score resume vs JD
POST   /api/import/csv            Import from CSV
GET    /api/export/csv            Export to CSV
GET    /api/reminders             Pending reminders
PATCH  /api/reminders/{id}        Dismiss/snooze

FastAPI auto-generates interactive Swagger docs at /docs — great for recruiter demos.
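For scripting against the API, a client call might look like the sketch below, which builds a request for the status endpoint using only the standard library. The JSON field names (`status`, `note`) are assumptions inferred from the `update_status` signature; check the generated docs at /docs for the real request schema.

```python
import json
from typing import Optional
from urllib import request

BASE_URL = "http://localhost:8000"  # default local address for the dev server


def build_status_update(
    app_id: str, status: str, note: Optional[str] = None
) -> request.Request:
    """Build (but do not send) a PATCH request for the status endpoint."""
    payload = {"status": status}  # field names are assumed, not confirmed
    if note is not None:
        payload["note"] = note
    return request.Request(
        url=f"{BASE_URL}/api/applications/{app_id}/status",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


if __name__ == "__main__":
    req = build_status_update("abc123", "phone_screen", note="Recruiter call booked")
    # request.urlopen(req) would send it; that only works with the server running.
    print(req.get_method(), req.full_url)
```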

Testing

34 tests covering CRUD, analytics, matching, reminders, and integration scenarios:

$ pytest tests/ -v
========================= test session starts =========================
tests/test_analytics.py::test_overview_empty PASSED
tests/test_analytics.py::test_overview_with_data PASSED
tests/test_analytics.py::test_pipeline PASSED
tests/test_api.py::test_full_application_lifecycle PASSED
tests/test_api.py::test_csv_export PASSED
tests/test_applications.py::test_create_application PASSED
tests/test_applications.py::test_status_change_creates_event PASSED
tests/test_matcher.py::test_score_match_basic PASSED
tests/test_matcher.py::test_score_match_keywords PASSED
tests/test_reminders.py::test_reminders_created_on_apply PASSED
... (34 total)
========================= 34 passed in 0.30s =========================

Tests use an in-memory SQLite database and async HTTP client — fast and isolated.
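The isolation idea can be sketched as follows. This is not the project's test code: it swaps in the standard-library `sqlite3` and `unittest` for aiosqlite, pytest, and the async HTTP client so it runs without extra packages, and the schema is a hypothetical one-table stand-in.

```python
import sqlite3
import unittest

# Hypothetical minimal schema, for illustration only.
SCHEMA = "CREATE TABLE applications (id TEXT PRIMARY KEY, company TEXT, status TEXT)"


def fresh_db() -> sqlite3.Connection:
    """Each call returns a brand-new, empty in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


class ApplicationTests(unittest.TestCase):
    def setUp(self):
        self.db = fresh_db()  # new database for every test

    def tearDown(self):
        self.db.close()

    def test_create_application(self):
        self.db.execute("INSERT INTO applications VALUES ('1', 'Acme', 'applied')")
        count = self.db.execute("SELECT COUNT(*) FROM applications").fetchone()[0]
        self.assertEqual(count, 1)

    def test_starts_empty(self):
        # Passes in any test order because nothing leaks between tests.
        count = self.db.execute("SELECT COUNT(*) FROM applications").fetchone()[0]
        self.assertEqual(count, 0)
```

Since every `:memory:` connection is its own database, there is nothing to clean up on disk and tests can run in any order.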

Running It

# Clone and install
git clone https://github.com/hajirufai/apptrack.git
cd apptrack
pip install -r requirements.txt

# Run
python -m uvicorn app.main:app --reload

# Or with Docker
docker compose up -d

Visit http://localhost:8000 for the dashboard and http://localhost:8000/docs for the interactive API docs.

What I'd Add Next

Email parsing: Auto-extract application data from confirmation emails
Browser extension: Quick-add from job listing pages
Salary tracking: Compare offers with market data
AI cover letter drafts: Generate tailored cover letters from the match analysis

Key Takeaways

SQLite is underrated for personal tools — zero config, fast, and aiosqlite makes it async-compatible
TF-IDF matching gives surprisingly useful results for resume-JD comparison without any API costs
Auto-generated reminders prevent the #1 job search mistake: forgetting to follow up
CDN-loaded frontend (Tailwind + Alpine.js) means zero build complexity for dashboard UIs
Build what you need — the best portfolio projects solve your own problems

Check out the full source on GitHub (https://github.com/hajirufai/apptrack). If you're job hunting, feel free to fork it and track your own applications!

DEV Community