I went from “why am I seeing the same jobs again?” to a production-grade job search system with dedupe, queue governance, and a dashboard I can actually operate daily.
Why I Built This
Like many people job hunting in tech, I ran into four recurring issues:
- Repeated listings kept appearing.
- No reliable state tracking for approve/apply/reject.
- Scattered visibility across scripts, logs, and chat alerts.
- Fragile operations where one bug could spam alerts or hide real progress.
I wanted a system that could automate aggressively while keeping key decisions under manual control.
For context: OpenClaw is an open-source agent orchestration framework for building automation workflows across tools, scripts, and messaging surfaces.
What I Implemented
I stopped thinking of this as “a script” and started treating it as a product with clear components.
1) Job ingestion + filtering pipeline
- Pull jobs
- Match against resume/profile criteria
- Queue into status model:
`pending_approval` → `approved` → `applied` → `rejected`
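To make the status model concrete, here is a minimal sketch of the allowed transitions (the status names match the queue model above; the map and function names are mine, not the repo's exact code):

```javascript
// Allowed queue-state transitions; terminal states have no outgoing edges.
const TRANSITIONS = {
  pending_approval: ['approved', 'rejected'],
  approved: ['applied', 'rejected'], // approved jobs can still be rejected later
  applied: [],
  rejected: [],
};

// Returns true only if moving a job from `from` to `to` is permitted.
function canTransition(from, to) {
  return (TRANSITIONS[from] ?? []).includes(to);
}
```

Making illegal transitions unrepresentable is what keeps the dashboard honest later on.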
2) Alerting that doesn’t spam
- Canonical dedupe on the normalized link
- Skip already-sent items
- Equal-sized batching for outbound alert messages
- Cleanup/rotation of pending payloads after successful send
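Equal-sized batching is simple but worth spelling out: rather than sending nine full batches and one tiny straggler, compute the batch count first and spread items evenly (a sketch; the function name and the max of 10 per message are my assumptions):

```javascript
// Split outbound alerts into near-equal batches, never exceeding
// `maxPerBatch` items per message.
function batchAlerts(items, maxPerBatch = 10) {
  if (items.length === 0) return [];
  const batchCount = Math.ceil(items.length / maxPerBatch);
  const size = Math.ceil(items.length / batchCount); // balanced batch size
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

For example, 25 alerts with a cap of 10 come out as batches of 9, 9, and 7 instead of 10, 10, and 5.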
3) A real dashboard in Second Brain
I built a dedicated Job Search tab in a custom Second Brain app (React + Vite frontend, Bun backend; grounded in the "second brain" idea of externalized operational memory) that shows:
- queue metrics,
- editable search rules,
- pipeline freshness timestamps,
- pending/approved/applied/rejected sections,
- conversion funnel,
- recent run artifacts.
4) Service reliability
- Always-on runtime with macOS `launchd`
- Fixed process drift by pinning the service to port 3000
- Added `stop.sh` and `status.sh` for operational control
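For reference, a launchd job for this kind of service looks roughly like the plist below. The label, paths, and environment values are placeholders, not the actual config:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <!-- Illustrative label and paths; substitute your own -->
  <key>Label</key><string>com.example.jobsearch</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/bun</string>
    <string>/path/to/server.js</string>
  </array>
  <key>RunAtLoad</key><true/>
  <key>KeepAlive</key><true/>
  <key>EnvironmentVariables</key>
  <dict>
    <key>PORT</key><string>3000</string>
  </dict>
</dict>
</plist>
```

Pinning `PORT` here is what stops the daemon from drifting to a different port than the UI expects.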
5) Semantic search support
- Enabled Gemini embeddings
- Added reindex flow
- Implemented graceful fallback to keyword search when key is unavailable
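The fallback logic can be sketched like this. `semanticSearch` and `keywordSearch` are injected stand-ins for the real calls (which would be async in practice; this is kept synchronous to stay small):

```javascript
// Keyword fallback when the embeddings key is missing or the call fails.
function searchWithFallback(query, { apiKey, semanticSearch, keywordSearch }) {
  if (!apiKey) {
    // No key configured: skip embeddings entirely
    return { mode: 'keyword', results: keywordSearch(query) };
  }
  try {
    return { mode: 'semantic', results: semanticSearch(query, apiKey) };
  } catch {
    // Degrade gracefully instead of surfacing a key error in the UI
    return { mode: 'keyword', results: keywordSearch(query) };
  }
}
```

Returning the `mode` alongside the results lets the UI tell the user which search path actually ran.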
6) Component map (what each part does)
- Collector: ingests source listings
- Matcher: scores relevance vs profile
- Dedupe guard: prevents repeated listings by canonical link
- Batcher: sends alerts in balanced chunks
- Queue orchestrator: controls status transitions
- Policy engine: rule-driven filtering/tuning
- Action layer: approve/reject controls in UI
- Observability layer: timestamps, funnel, run artifacts
- Search layer: keyword + semantic retrieval
- Runtime supervisor: always-on stability and process hygiene
Architecture (High-Level)
Job Sources
-> job_market_intelligence_bot
-> dedupe + match + batch
-> queue/jobs_queue.json
-> Second Brain API (/api/job-search-progress)
-> Job Search UI tab
The split matters:
- Pipeline layer handles ingestion and state,
- UI layer handles observability and decisioning.
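For reference, one entry in queue/jobs_queue.json might look like this (field names are illustrative, not the exact schema):

```json
{
  "id": "a1b2c3",
  "title": "Senior Backend Engineer",
  "company": "ExampleCo",
  "link": "https://job.com/123",
  "fit_score": 0.82,
  "status": "pending_approval",
  "first_seen": "2025-01-10T08:15:00Z",
  "updated_at": "2025-01-10T08:15:00Z"
}
```

Everything downstream (alerts, dashboard, funnel) reads from entries shaped like this rather than keeping its own copy.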
Key Implementation Details
Queue model first, UI second
One of the most important decisions was making queue state explicit and authoritative.
Without that, dashboards lie and actions become risky.
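"Explicit and authoritative" means the dashboard derives its numbers from the queue file instead of keeping parallel tallies. A sketch of that derivation (the function name is mine):

```javascript
// Derive dashboard counters from the queue itself, so the UI
// can never disagree with the underlying state.
function queueCounters(jobs) {
  return jobs.reduce((acc, job) => {
    acc[job.status] = (acc[job.status] ?? 0) + 1;
    return acc;
  }, {});
}
```

If a count looks wrong, the bug is in the queue, not in some second bookkeeping layer.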
Dedupe by canonical identity
I normalized links and deduped against historical alerts. This alone eliminated repeat-notification noise.
```javascript
function normalizeLink(link) {
  const u = new URL(link);
  // Strip common LinkedIn/Indeed/job-board tracking params
  ['utm_source', 'utm_medium', 'utm_campaign', 'ref'].forEach(p => u.searchParams.delete(p));
  u.hash = '';                                // drop fragments
  u.pathname = u.pathname.replace(/\/$/, ''); // drop trailing slash
  return u.toString();
}
```
So `job.com/123?utm_source=linkedin` and `job.com/123` collapse to one identity.
Human-in-the-loop actions
Pending jobs can be approved or rejected, and approved jobs can still be rejected later when needed.
Freshness timestamps
I surfaced update timestamps for queue/new-matches/outbox/runs so stale data is obvious immediately.
Problems I Hit (and Fixes)
Duplicate jobs were sent repeatedly
Cause: send path consumed pending data without strict dedupe checks.
Fix: dedupe by normalized link + skip sent + clear/rotate pending state.
Measured impact: in one cleanup run, 222 duplicate already-sent alerts were removed/skipped.
Dashboard loaded as blank screen
Cause: HTML shell served, but bundled assets were not.
Fix: serve Vite dist/index.html and dist/assets/* correctly.
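The fix boils down to a small routing rule: requests under the bundle's asset path map to real files, and everything else falls back to index.html for client-side routing. A sketch of that rule (the helper name is mine, and it assumes Vite's default `dist/assets/` layout):

```javascript
// Map a request path to a file under dist/. Only /assets/* are served
// as files; all other routes fall back to the SPA shell — the original
// bug was serving the shell while never matching the bundled assets.
function resolveAsset(urlPath, distDir = 'dist') {
  const clean = urlPath.split('?')[0]; // ignore query strings
  if (clean.startsWith('/assets/') && !clean.includes('..')) {
    return `${distDir}${clean}`;
  }
  return `${distDir}/index.html`;
}
```

The `..` check is a cheap guard against path traversal before handing the result to the file server.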
Metrics showed zeros in always-on mode
Cause: wrong workspace context in daemon environment.
Fix: export explicit WORKSPACE in service runner.
Semantic search key errors in UI
Cause: embeddings key not available at runtime.
Fix: configure .env, restart service, reindex cache, add keyword fallback UX.
UX Improvements That Actually Helped
- Collapsible queue sections
- Top-5 with “show more”
- Fit-score color chips
- Relative timestamps (“12m ago”)
- Sticky section headers
- Better rules-form alignment and spacing
Small improvements, big operator-speed gain.
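The relative timestamps are one of those small wins. A minimal formatter in the spirit of the "12m ago" chips (my own sketch, not the app's code):

```javascript
// Render a timestamp as a relative age for the dashboard ("12m ago").
function timeAgo(thenMs, nowMs = Date.now()) {
  const s = Math.max(0, Math.floor((nowMs - thenMs) / 1000));
  if (s < 60) return `${s}s ago`;
  if (s < 3600) return `${Math.floor(s / 60)}m ago`;
  if (s < 86400) return `${Math.floor(s / 3600)}h ago`;
  return `${Math.floor(s / 86400)}d ago`;
}
```

Relative ages make stale pipelines jump out in a way raw ISO timestamps never do.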
The daily operating loop (what changed behavior)
Before: browse jobs, lose context, repeat tomorrow.
Now:
- Check freshness + queue counters
- Process pending approvals
- Re-check approved jobs for quality
- Tune rules based on noise patterns
- Review funnel and recent runs
That loop made the system compounding: each day’s actions improve next day’s signal.
What I Learned
- Automation without observability is brittle.
- Canonical state design beats post-hoc patching.
- Service environment parity is critical (shell vs daemon).
- Graceful degradation is mandatory for UX trust.
- UI polish improves decision quality, not just aesthetics.
What’s Next
- Batch actions for approved jobs
- Better source confidence/ranking
- Saved rule presets and custom views
- Health checks + stale-state self-healing
- Enhanced analytics for conversion trends
Repro Notes
If you’re replicating this approach:
- start with queue states and dedupe key,
- build freshness visibility early,
- add always-on service controls before scaling sources,
- keep secrets in `.env` and ensure `.env` is gitignored,
- and only then optimize UX.
That sequence saved me a lot of rework.
Final take
If your current job search feels noisy and untrackable, don’t start by adding more scripts. Start by designing the operating model: states, dedupe identity, and visibility. Once those are solid, automation becomes trustworthy — and compounding.
Are you building something similar or want the OpenClaw implementation details? Drop a comment or reach out — I’m happy to share the practical playbook.