Ben Utting
How I Built a Lead Gen Machine That Finds My Clients on Upwork

Two of my current clients came from the same system: a Python scraper that monitors Upwork every 20 minutes, scores each job with AI, and sends me a Telegram alert when something scores a 6 or higher. I didn't find them. The system did.

This is how it works.

The problem

Upwork's search is fine if you check it manually a few times a day. But good jobs get buried in proposals fast. By the time I see a high-fit post, it already has 20+ applicants. I needed something that watched continuously and told me the moment a job worth bidding on appeared.

The architecture

The system runs on an Ubuntu VM on my home network. No cloud hosting, no SaaS; the one external dependency is the n8n scoring webhook. The full stack:

  • Patchright (Playwright fork) for maintaining a persistent Chromium session with Upwork
  • scrapling with Cloudflare bypass for fetching search results
  • SQLite for storing every job (1,410+ and counting)
  • n8n webhook for scoring and notification
  • systemd timer firing every 20 minutes, 6 AM to 8 PM
  • Telegram bot for real-time alerts
  • FastAPI dashboard on :8080 for browsing the data

It's a one-shot script. The timer fires, the scraper runs, it exits. No long-running process, no memory leaks, no daemon to babysit.
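The one-shot pattern maps directly onto a systemd service plus timer pair. A minimal sketch, with file names and paths that are illustrative, not taken from the real setup:

```ini
# /etc/systemd/system/upwork-scraper.service
[Unit]
Description=One-shot Upwork scrape run

[Service]
Type=oneshot
ExecStart=/usr/bin/python3 /opt/upwork/scraper.py

# /etc/systemd/system/upwork-scraper.timer
[Unit]
Description=Fire the scraper every 20 minutes, 6 AM to 8 PM

[Timer]
# 06:00, 06:20, ... 19:40, plus a final run at 20:00
OnCalendar=*-*-* 06..19:00/20:00
OnCalendar=*-*-* 20:00:00

[Install]
WantedBy=timers.target
```

`Type=oneshot` is what makes the "no daemon to babysit" claim work: systemd considers the unit done as soon as the script exits.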

How the scraper works

Every 20 minutes, the script does this:

  1. Opens the persistent Chromium profile to verify the Upwork session is still valid and extract cookies
  2. For each search query (Automation, Workflow Automation, AI Automation, and a few others), fetches the search results page using a fresh browser profile with the extracted cookies
  3. Parses each job listing, checks if it was posted in the last 10 minutes, and skips anything already in the database
  4. For new jobs, fetches the full detail page to grab budget, client history, proposal count, and tags
  5. Saves to SQLite and fires the n8n webhook
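Steps 3 through 5 boil down to a freshness check plus a dedupe against SQLite. A minimal sketch of that core loop, where the table and field names are my assumptions rather than the real schema:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def init_db(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "  id TEXT PRIMARY KEY, title TEXT, posted_at TEXT)"
    )

def is_fresh(posted_at, now=None, window_minutes=10):
    """True if the job was posted within the last `window_minutes`."""
    now = now or datetime.now(timezone.utc)
    return now - posted_at <= timedelta(minutes=window_minutes)

def save_if_new(conn, job):
    """Insert the job unless its id is already in the database.

    Returns True when the job was new and saved, False when it was
    already seen on a previous run.
    """
    cur = conn.execute("SELECT 1 FROM jobs WHERE id = ?", (job["id"],))
    if cur.fetchone():
        return False
    conn.execute(
        "INSERT INTO jobs (id, title, posted_at) VALUES (?, ?, ?)",
        (job["id"], job["title"], job["posted_at"].isoformat()),
    )
    conn.commit()
    return True
```

The `id` primary key means even a race between overlapping runs can't double-save a job.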

The fresh-profile trick is important. The persistent Chromium profile got fingerprinted by Cloudflare in April and couldn't auto-solve challenges anymore. Splitting login (persistent profile, Patchright) from scraping (fresh profile, scrapling) fixed it. If the scraper ever starts returning 5KB pages instead of full results, this is the first thing to check.

The scoring layer

Raw jobs go to an n8n webhook that scores them 1 to 10 based on fit. Hard disqualifiers kill the job immediately: unverified payment, hire rate below 30%, rating below 3.5. Boosts push the score up: n8n, Claude, OpenClaw, RAG, MCP, workflow automation, AI agent.
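The real logic lives in an n8n workflow, but the rules above translate to a small pure function. This is an illustrative re-implementation; the baseline score, boost weights, and field names are all assumptions:

```python
BOOST_KEYWORDS = {"n8n", "claude", "openclaw", "rag", "mcp",
                  "workflow automation", "ai agent"}

def score_job(job):
    """Return a 1-10 fit score, or 0 for hard-disqualified jobs."""
    # Hard disqualifiers kill the job immediately.
    if not job.get("payment_verified", False):
        return 0
    if job.get("hire_rate", 0) < 0.30:
        return 0
    if job.get("rating", 0) < 3.5:
        return 0

    score = 5  # assumed neutral baseline
    text = job.get("description", "").lower()
    for kw in BOOST_KEYWORDS:
        # Naive substring match; a real version would tokenise first.
        if kw in text:
            score += 1
    return min(score, 10)
```

Jobs that survive the disqualifiers and mention a couple of boost keywords clear the alert threshold of 6.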

Anything scoring 6 or above gets a Telegram alert with the title, budget, and a link. I open it, read the description, and decide whether to bid. The whole loop from job posted to me reading it is usually under 20 minutes.

The enrichment layer

Every 20 minutes, a second systemd timer runs an AI enrichment script over any un-enriched jobs. It sends the job description to Gemini Flash Lite via OpenRouter and extracts structured fields: tools detected, skill requirements, industry, complexity, fit reasoning, and content opportunities.
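OpenRouter exposes an OpenAI-compatible chat endpoint, so the enrichment call is a JSON payload in and a JSON reply out. A sketch of the request builder and reply parser, where the prompt wording, model slug, and field names are my assumptions, not the real script's:

```python
import json

ENRICH_PROMPT = (
    "Extract structured fields from this Upwork job description. "
    "Return JSON with keys: tools, skills, industry, complexity, "
    "fit_reasoning, content_opportunities.\n\n"
)

def build_request(description, model="google/gemini-flash-lite"):
    """Payload for OpenRouter's OpenAI-compatible chat completions endpoint.
    The model slug here is illustrative."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": ENRICH_PROMPT + description}],
        "response_format": {"type": "json_object"},
    }

def parse_enrichment(raw_reply):
    """Pull the structured fields out of the model's JSON reply,
    tolerating any missing keys."""
    data = json.loads(raw_reply)
    return {
        "tools": data.get("tools", []),
        "skills": data.get("skills", []),
        "industry": data.get("industry"),
        "complexity": data.get("complexity"),
        "fit_reasoning": data.get("fit_reasoning"),
        "content_opportunities": data.get("content_opportunities", []),
    }
```

Keeping the parser defensive matters here: a cheap model occasionally drops a field, and a `KeyError` at 6 AM would silently stall the whole enrichment queue.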

After 1,410 jobs enriched, the patterns are clear. GoHighLevel is the most requested tool. n8n + GoHighLevel is the most common combo. 255 jobs scored a 9 for fit, 141 scored a perfect 10. Over 600 jobs were flagged as template opportunities: requests similar enough that one solution could be built once and productised.

The dashboard

A FastAPI app reads the same SQLite database and serves a dashboard on :8080. It has tabs for KPIs, jobs over time, tool/skill breakdowns, budget distribution, client ratings, industry analysis, and a recent jobs table with drill-down.

The dashboard also has a maintenance tab that can start/stop the scraper, trigger immediate runs, and show a live colour-coded log viewer. It uses passwordless sudo for the systemd timer controls.

There's also an AI chat tab powered by OpenRouter that lets me ask questions about the data in natural language. "What percentage of automation jobs this week mention n8n?" gets answered from the actual database, not from a generic model.

What I'd do differently

Session management is fragile. Upwork invalidates the session every few months, and re-login requires opening a desktop session via Proxmox SPICE console and running the login script manually. I'd like to automate this, but Upwork's auth flow with 2FA makes it hard to do headlessly.

No database backup. The SQLite file only exists on this one VM. If the disk dies, 1,400+ enriched jobs are gone. A nightly sqlite3 .backup to a second location is overdue.

The n8n scoring could be local. Right now the webhook goes to n8n, which adds a network hop. Moving the scoring logic into the enrichment script would simplify the stack and remove the cloud dependency.

The result

The system has been running since mid-March 2026. It's scraped over 1,400 jobs, enriched all of them with AI, and surfaced the two clients I'm currently working with. It runs on a 4GB Ubuntu VM that costs nothing beyond the electricity.

More importantly, it changed how I think about freelancing. I don't browse Upwork anymore. I wait for the ping, read the job, and bid if it fits. The system does the searching. I do the selling.

ctrlaltautomate.com
