Long Phan

Posted on May 26, 2025

Real-Time Job Recommender with Bright Data MCP

#devchallenge #brightdatachallenge #ai #webdata

This is a submission for the Bright Data AI Web Access Hackathon

What I Built

I built a Real-Time Job Recommender system that lets users search for jobs with natural-language queries and receive up-to-date job listings from across the web. Powered by LLMs and Bright Data’s Model Context Protocol (MCP) server, this AI agent autonomously navigates job websites, retrieves structured data in real time, and matches results based on intent rather than keywords.

This project addresses outdated or irrelevant job recommendations by searching current listings on the open web live and intelligently.

📽️ Demo

🔗 GitHub Repository: https://github.com/longphanquangminh/llm-job-suggestor

How I Used Bright Data's Infrastructure

Bright Data’s Model Context Protocol (MCP) server was essential to building an autonomous AI agent that can operate on the live web.

Here's how MCP enhanced the system:

Discover: The agent locates relevant job listings across multiple platforms and job boards.
Access: MCP enables seamless access to modern, JavaScript-heavy job sites.
Extract: The agent retrieves structured data—job titles, companies, locations, requirements—in real time.
Interact: The system mimics human interaction with dynamic interfaces (e.g., pagination, filters) and adapts to changes in website structure.
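To make the "Extract" step concrete, here is a minimal sketch of turning an LLM's JSON answer into typed job records. It assumes the LLM is prompted to return a JSON array of objects; the `JobListing` class, its fields, and `parse_listings` are hypothetical illustrations, not code from the repository.

```python
import json
from dataclasses import dataclass, field

# Hypothetical schema: the fields mirror the data the post says is extracted.
REQUIRED_FIELDS = ("title", "company", "location")


@dataclass
class JobListing:
    """One normalized job posting (illustrative schema, not the repo's)."""
    title: str
    company: str
    location: str
    requirements: list[str] = field(default_factory=list)
    url: str = ""


def parse_listings(llm_output: str) -> list[JobListing]:
    """Parse a JSON array produced by the LLM, skipping incomplete records."""
    records = json.loads(llm_output)
    listings = []
    for record in records:
        # Drop records whose core fields are missing, non-string, or blank.
        if not all(
            isinstance(record.get(key), str) and record[key].strip()
            for key in REQUIRED_FIELDS
        ):
            continue
        requirements = [
            item.strip()
            for item in (record.get("requirements") or [])
            if isinstance(item, str) and item.strip()
        ]
        listings.append(
            JobListing(
                title=record["title"].strip(),
                company=record["company"].strip(),
                location=record["location"].strip(),
                requirements=requirements,
                url=record.get("url") or "",
            )
        )
    return listings
```

Validating the model's output before storing it keeps half-extracted pages from polluting the dataset; a schema library such as Pydantic would be a natural upgrade.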

MCP acts as a control layer for AI agents, providing real-time interaction with external web contexts. This allowed my job recommender to function as a self-sufficient explorer of the web, without preloaded data or brittle scrapers.
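Under the hood, MCP messages are JSON-RPC 2.0, and an agent invokes a server capability with a `tools/call` request. The sketch below only builds that message to show its shape; in practice a client library (for example, `ClientSession.call_tool` in the official MCP Python SDK) handles transport and framing. The tool name `scrape_as_markdown` and its `url` argument are assumptions here; the real names and input schemas come from the server's `tools/list` response.

```python
import itertools
import json

# Monotonic request IDs so responses can be matched to requests.
_request_ids = itertools.count(1)


def make_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments, for illustration only.
request = make_tool_call(
    "scrape_as_markdown",
    {"url": "https://example.com/jobs?q=backend+remote"},
)
print(request)
```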

Performance Improvements

Real-time web access via MCP brought several advantages over traditional approaches:

Live results: Users receive job opportunities that are currently available—not days or weeks old.
Richer understanding: LLMs analyze complete job descriptions, not just titles or summaries.
Semantic search: Vector search ensures the results match the user’s intent, even with fuzzy or casual phrasing.
Dynamic adaptability: The system can adapt to layout changes or new websites without reprogramming extractors.
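The semantic-search idea boils down to ranking listings by how close their embedding vectors are to the query's embedding. The sketch below shows cosine-similarity ranking in plain Python with toy three-dimensional vectors; the post's system uses OpenAI embeddings stored in Chroma, which performs this kind of nearest-neighbor search itself, so the vectors and job IDs here are purely hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # A zero vector has no direction to compare.
    return dot / (norm_a * norm_b)


def rank_jobs(
    query_vec: list[float], job_vecs: dict[str, list[float]], top_k: int = 3
) -> list[tuple[str, float]]:
    """Return the top_k (job_id, score) pairs, most similar first."""
    scored = [
        (job_id, cosine_similarity(query_vec, vec))
        for job_id, vec in job_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings"; real OpenAI embeddings have far more dimensions.
job_vectors = {
    "remote-backend-fintech": [0.9, 0.1, 0.8],
    "onsite-frontend-retail": [0.1, 0.9, 0.1],
    "remote-data-eng-banking": [0.8, 0.2, 0.6],
}
query = [0.85, 0.05, 0.75]  # stands in for "remote backend roles in fintech"
print(rank_jobs(query, job_vectors, top_k=2))
```

Because similarity depends on meaning captured in the vectors rather than shared words, a casual query can still land on the right listings.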

Compared to static datasets or basic keyword scraping, this approach returns fresher results, interprets loosely worded queries better, and needs less maintenance when job sites change.

Features

✅ Real-time job discovery from multiple online sources
✅ Natural language job search (e.g., “remote backend roles in fintech”)
✅ LLM-based classification, summarization, and sentiment scoring
✅ Vector-based semantic search using OpenAI embeddings
✅ Gradio-powered interactive dashboard
✅ Filtering by industry, experience level, job type, and location
✅ Scheduled crawling to keep the dataset fresh and relevant
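The filtering feature can be sketched as a simple predicate over stored listings. This is a minimal illustration assuming hypothetical field names (`industry`, `experience_level`, `job_type`, `location`) and case-insensitive exact matching; it is not the project's actual filter logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    title: str
    industry: str
    experience_level: str  # e.g. "junior", "mid", "senior"
    job_type: str          # e.g. "full-time", "contract"
    location: str          # e.g. "Remote", "Berlin"


def filter_jobs(
    jobs: list[Job],
    industry: Optional[str] = None,
    experience_level: Optional[str] = None,
    job_type: Optional[str] = None,
    location: Optional[str] = None,
) -> list[Job]:
    """Keep jobs matching every filter that is set; None means 'any value'."""
    criteria = {
        "industry": industry,
        "experience_level": experience_level,
        "job_type": job_type,
        "location": location,
    }
    # Only compare on the filters the user actually chose, ignoring case.
    active = {name: value.lower() for name, value in criteria.items() if value is not None}
    return [
        job
        for job in jobs
        if all(getattr(job, name).lower() == value for name, value in active.items())
    ]


listings = [
    Job("Backend Engineer", "Fintech", "senior", "full-time", "Remote"),
    Job("Frontend Developer", "Retail", "junior", "contract", "Berlin"),
]
print([job.title for job in filter_jobs(listings, industry="fintech", location="remote")])
```

When listings live in a vector database such as Chroma, the same constraints could instead be passed as metadata filters (the `where` argument of a collection query) so filtering and semantic ranking happen in one call.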

Tech Stack

Python 3.11
Bright Data MCP
OpenAI + LangChain
Chroma (Vector DB)
Gradio (Web UI)
Pandas and SQLite for lightweight data storage
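For the lightweight storage layer, a sketch using only Python's built-in `sqlite3` module shows how scheduled crawls could refresh listings and drop stale ones. The table layout, the use of the job URL as the deduplication key, and the helper names are assumptions for illustration; the stored rows could then be loaded into Pandas with `pandas.read_sql_query`.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def _ts(moment: datetime) -> str:
    """UTC ISO-8601 with fixed precision, so text order matches time order."""
    return moment.astimezone(timezone.utc).isoformat(timespec="seconds")


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               url TEXT PRIMARY KEY,
               title TEXT NOT NULL,
               company TEXT NOT NULL,
               seen_at TEXT NOT NULL
           )"""
    )
    return conn


def upsert_job(conn: sqlite3.Connection, url: str, title: str, company: str, seen_at: datetime) -> None:
    """Insert a listing, or refresh it if the same URL was crawled before."""
    conn.execute(
        """INSERT INTO jobs (url, title, company, seen_at) VALUES (?, ?, ?, ?)
           ON CONFLICT(url) DO UPDATE SET
               title = excluded.title,
               company = excluded.company,
               seen_at = excluded.seen_at""",
        (url, title, company, _ts(seen_at)),
    )
    conn.commit()


def prune_stale(conn: sqlite3.Connection, now: datetime, max_age: timedelta) -> int:
    """Delete listings no crawl has seen within max_age; return rows removed."""
    cursor = conn.execute("DELETE FROM jobs WHERE seen_at < ?", (_ts(now - max_age),))
    conn.commit()
    return cursor.rowcount
```

Each crawl upserts what it finds and then prunes anything older than the chosen window, so postings that disappear from the web eventually disappear from recommendations too.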

What's Next

Some future directions I’m excited about:

Personalized job recommendations based on resume data
Real-time salary benchmarking
Company culture analysis using scraped reviews
Auto-application tools for integrated platforms
User profiles with alerts and saved searches

Acknowledgements

Shoutout to Bright Data for building infrastructure that lets agents explore and interact with the web in real time. If you're building anything with AI agents or live web data, MCP is a game-changer.

Want to connect or collaborate? Reach out—I’d love to hear your thoughts 😊

DEV Community