I went from “why am I seeing the same jobs again?” to a production-grade job search system with dedupe, queue governance, and a dashboard I can actually operate daily.
Why I Built This
Like many people job hunting in tech, I ran into four recurring issues:
- Repeated listings kept appearing.
- No reliable state tracking for approve/apply/reject.
- Scattered visibility across scripts, logs, and chat alerts.
- Fragile operations where one bug could spam alerts or hide real progress.
I wanted a system that could automate aggressively while keeping key decisions under manual control.
For context: OpenClaw is an open-source agent orchestration framework for building automation workflows across tools, scripts, and messaging surfaces.
What I Implemented
I stopped thinking of this as “a script” and started treating it as a product with clear components.
1) Job ingestion + filtering pipeline
- Pull jobs
- Match against resume/profile criteria
- Queue into status model:
`pending_approval` → `approved` → `applied` → `rejected`
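To make the status model concrete, here is a minimal sketch of the allowed transitions (the status names match the queue model above; the map and function names are mine, not the repo's exact code):

```javascript
// Allowed queue-state transitions; terminal states have no outgoing edges.
const TRANSITIONS = {
  pending_approval: ['approved', 'rejected'],
  approved: ['applied', 'rejected'], // approved jobs can still be rejected later
  applied: [],
  rejected: [],
};

// Returns true only if moving a job from `from` to `to` is permitted.
function canTransition(from, to) {
  return (TRANSITIONS[from] ?? []).includes(to);
}
```

Making illegal transitions unrepresentable is what keeps the dashboard honest later on.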
2) Alerting that doesn’t spam
- Canonical dedupe on the normalized link
- Skip already-sent items
- Equal-sized batching for outbound alert messages
- Cleanup/rotation of pending payloads after successful send
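Equal-sized batching is simple but worth spelling out: rather than sending nine full batches and one tiny straggler, compute the batch count first and spread items evenly (a sketch; the function name and the max of 10 per message are my assumptions):

```javascript
// Split outbound alerts into near-equal batches, never exceeding
// `maxPerBatch` items per message.
function batchAlerts(items, maxPerBatch = 10) {
  if (items.length === 0) return [];
  const batchCount = Math.ceil(items.length / maxPerBatch);
  const size = Math.ceil(items.length / batchCount); // balanced batch size
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

For example, 25 alerts with a cap of 10 come out as batches of 9, 9, and 7 instead of 10, 10, and 5.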
3) A real dashboard in Second Brain
I built a dedicated Job Search tab in a custom Second Brain app (React + Vite frontend, Bun backend; grounded in the "second brain" idea of externalized operational memory) that shows:
- queue metrics,
- editable search rules,
- pipeline freshness timestamps,
- pending/approved/applied/rejected sections,
- conversion funnel,
- recent run artifacts.
4) Service reliability
- Always-on runtime with macOS `launchd`
- Fixed process drift by pinning the service to port 3000
- Added `stop.sh` and `status.sh` for operational control
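For reference, a launchd job for this kind of service looks roughly like the plist below. The label, paths, and environment values are placeholders, not the actual config:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <!-- Illustrative label and paths; substitute your own -->
  <key>Label</key><string>com.example.jobsearch</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/bun</string>
    <string>/path/to/server.js</string>
  </array>
  <key>RunAtLoad</key><true/>
  <key>KeepAlive</key><true/>
  <key>EnvironmentVariables</key>
  <dict>
    <key>PORT</key><string>3000</string>
  </dict>
</dict>
</plist>
```

Pinning `PORT` here is what stops the daemon from drifting to a different port than the UI expects.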
5) Semantic search support
- Enabled Gemini embeddings
- Added reindex flow
- Implemented graceful fallback to keyword search when key is unavailable
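The fallback logic can be sketched like this. `semanticSearch` and `keywordSearch` are injected stand-ins for the real calls (which would be async in practice; this is kept synchronous to stay small):

```javascript
// Keyword fallback when the embeddings key is missing or the call fails.
function searchWithFallback(query, { apiKey, semanticSearch, keywordSearch }) {
  if (!apiKey) {
    // No key configured: skip embeddings entirely
    return { mode: 'keyword', results: keywordSearch(query) };
  }
  try {
    return { mode: 'semantic', results: semanticSearch(query, apiKey) };
  } catch {
    // Degrade gracefully instead of surfacing a key error in the UI
    return { mode: 'keyword', results: keywordSearch(query) };
  }
}
```

Returning the `mode` alongside the results lets the UI tell the user which search path actually ran.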
6) Component map (what each part does)
- Collector: ingests source listings
- Matcher: scores relevance vs profile
- Dedupe guard: prevents repeated listings by canonical link
- Batcher: sends alerts in balanced chunks
- Queue orchestrator: controls status transitions
- Policy engine: rule-driven filtering/tuning
- Action layer: approve/reject controls in UI
- Observability layer: timestamps, funnel, run artifacts
- Search layer: keyword + semantic retrieval
- Runtime supervisor: always-on stability and process hygiene
Architecture (High-Level)
Job Sources
-> job_market_intelligence_bot
-> dedupe + match + batch
-> queue/jobs_queue.json
-> Second Brain API (/api/job-search-progress)
-> Job Search UI tab
The split matters:
- Pipeline layer handles ingestion and state,
- UI layer handles observability and decisioning.
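For reference, one entry in queue/jobs_queue.json might look like this (field names are illustrative, not the exact schema):

```json
{
  "id": "a1b2c3",
  "title": "Senior Backend Engineer",
  "company": "ExampleCo",
  "link": "https://job.com/123",
  "fit_score": 0.82,
  "status": "pending_approval",
  "first_seen": "2025-01-10T08:15:00Z",
  "updated_at": "2025-01-10T08:15:00Z"
}
```

Everything downstream (alerts, dashboard, funnel) reads from entries shaped like this rather than keeping its own copy.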
Key Implementation Details
Queue model first, UI second
One of the most important decisions was making queue state explicit and authoritative.
Without that, dashboards lie and actions become risky.
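"Explicit and authoritative" means the dashboard derives its numbers from the queue file instead of keeping parallel tallies. A sketch of that derivation (the function name is mine):

```javascript
// Derive dashboard counters from the queue itself, so the UI
// can never disagree with the underlying state.
function queueCounters(jobs) {
  return jobs.reduce((acc, job) => {
    acc[job.status] = (acc[job.status] ?? 0) + 1;
    return acc;
  }, {});
}
```

If a count looks wrong, the bug is in the queue, not in some second bookkeeping layer.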
Dedupe by canonical identity
I normalized links and deduped against historical alerts. This alone eliminated repeat-notification noise.
```javascript
function normalizeLink(link) {
  const u = new URL(link);
  // Strip common LinkedIn/Indeed/job-board tracking params
  ['utm_source', 'utm_medium', 'utm_campaign', 'ref'].forEach(p => u.searchParams.delete(p));
  u.hash = '';                                // drop fragments
  u.pathname = u.pathname.replace(/\/$/, ''); // drop trailing slash
  return u.toString();
}
```
So `job.com/123?utm_source=linkedin` and `job.com/123` collapse to one identity.
Human-in-the-loop actions
Pending jobs can be approved or rejected, and approved jobs can still be rejected later when needed.
Freshness timestamps
I surfaced update timestamps for queue/new-matches/outbox/runs so stale data is obvious immediately.
Problems I Hit (and Fixes)
Duplicate jobs were sent repeatedly
Cause: send path consumed pending data without strict dedupe checks.
Fix: dedupe by normalized link + skip sent + clear/rotate pending state.
Measured impact: in one cleanup run, 222 duplicate already-sent alerts were removed/skipped.
Dashboard loaded as blank screen
Cause: HTML shell served, but bundled assets were not.
Fix: serve Vite dist/index.html and dist/assets/* correctly.
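The fix boils down to a small routing rule: requests under the bundle's asset path map to real files, and everything else falls back to index.html for client-side routing. A sketch of that rule (the helper name is mine, and it assumes Vite's default `dist/assets/` layout):

```javascript
// Map a request path to a file under dist/. Only /assets/* are served
// as files; all other routes fall back to the SPA shell — the original
// bug was serving the shell while never matching the bundled assets.
function resolveAsset(urlPath, distDir = 'dist') {
  const clean = urlPath.split('?')[0]; // ignore query strings
  if (clean.startsWith('/assets/') && !clean.includes('..')) {
    return `${distDir}${clean}`;
  }
  return `${distDir}/index.html`;
}
```

The `..` check is a cheap guard against path traversal before handing the result to the file server.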
Metrics showed zeros in always-on mode
Cause: wrong workspace context in daemon environment.
Fix: export explicit WORKSPACE in service runner.
Semantic search key errors in UI
Cause: embeddings key not available at runtime.
Fix: configure .env, restart service, reindex cache, add keyword fallback UX.
UX Improvements That Actually Helped
- Collapsible queue sections
- Top-5 with “show more”
- Fit-score color chips
- Relative timestamps (“12m ago”)
- Sticky section headers
- Better rules-form alignment and spacing
Small improvements, big operator-speed gain.
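The relative timestamps are one of those small wins. A minimal formatter in the spirit of the "12m ago" chips (my own sketch, not the app's code):

```javascript
// Render a timestamp as a relative age for the dashboard ("12m ago").
function timeAgo(thenMs, nowMs = Date.now()) {
  const s = Math.max(0, Math.floor((nowMs - thenMs) / 1000));
  if (s < 60) return `${s}s ago`;
  if (s < 3600) return `${Math.floor(s / 60)}m ago`;
  if (s < 86400) return `${Math.floor(s / 3600)}h ago`;
  return `${Math.floor(s / 86400)}d ago`;
}
```

Relative ages make stale pipelines jump out in a way raw ISO timestamps never do.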
The daily operating loop (what changed behavior)
Before: browse jobs, lose context, repeat tomorrow.
Now:
- Check freshness + queue counters
- Process pending approvals
- Re-check approved jobs for quality
- Tune rules based on noise patterns
- Review funnel and recent runs
That loop made the system compounding: each day’s actions improve next day’s signal.
What I Learned
- Automation without observability is brittle.
- Canonical state design beats post-hoc patching.
- Service environment parity is critical (shell vs daemon).
- Graceful degradation is mandatory for UX trust.
- UI polish improves decision quality, not just aesthetics.
What’s Next
- Batch actions for approved jobs
- Better source confidence/ranking
- Saved rule presets and custom views
- Health checks + stale-state self-healing
- Enhanced analytics for conversion trends
Repro Notes
If you’re replicating this approach:
- start with queue states and dedupe key,
- build freshness visibility early,
- add always-on service controls before scaling sources,
- keep secrets in `.env` and ensure `.env` is gitignored,
- and only then optimize UX.
That sequence saved me a lot of rework.
Final take
If your current job search feels noisy and untrackable, don’t start by adding more scripts. Start by designing the operating model: states, dedupe identity, and visibility. Once those are solid, automation becomes trustworthy — and compounding.
Are you building something similar or want the OpenClaw implementation details? Drop a comment or reach out — I’m happy to share the practical playbook.