Tony Lewis
The Part of My AI Stack That Isn't AI: Human Workers via MCP

Everyone talks about MCP as the protocol for connecting AI to APIs. Stripe, HubSpot, Postgres, Gmail — plug in a server, get tools, let the AI call them.

But here's what nobody's writing about: MCP works just as well for dispatching tasks to humans.

I've been running a system where my AI orchestrates microtask workers — real people — through the same MCP tool interface it uses to call any other API. The AI creates campaigns, assigns tasks, monitors submissions, validates results, rates workers, and stores outputs. All through standard MCP tool calls. No custom integration. No separate dashboard. The human workforce is just another tool in the AI's toolkit.

The setup

Microtask platforms (Microworkers, Amazon Mechanical Turk, Toloka) have REST APIs. You can create tasks, assign them to workers, pull results, and rate quality programmatically. Building one of these as an MCP tool bundle takes the same effort as building tools for any other SaaS API.

Once you do, your AI gets tools like:

  • Create campaign — define a task, set price per completion, specify how many workers you need
  • List submissions — pull all worker responses for a campaign
  • Rate submission — mark work as accepted or rejected, with feedback
  • Get account balance — monitor spend

From the AI's perspective, these are identical to any other MCP tools. It calls them the same way it calls a Stripe API or a database query. The difference is that on the other end, a human does the work.
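To make that concrete, here is a sketch of what the tool declarations might look like on the server side, using the MCP convention of a name, a description, and a JSON Schema for inputs. The tool names and fields are illustrative, not any specific platform's real API:

```python
# Hypothetical MCP tool declarations for a microtask platform.
# Names, fields, and schemas are examples -- the real shapes depend
# on which platform's REST API you wrap.
MICROTASK_TOOLS = [
    {
        "name": "create_campaign",
        "description": "Define a task, set the price per completion, "
                       "and specify how many workers are needed.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "price_per_task": {"type": "number"},    # USD per completion
                "workers_needed": {"type": "integer"},
                "instructions": {"type": "string"},
            },
            "required": ["title", "price_per_task", "workers_needed"],
        },
    },
    {
        "name": "list_submissions",
        "description": "Pull all worker responses for a campaign.",
        "inputSchema": {
            "type": "object",
            "properties": {"campaign_id": {"type": "string"}},
            "required": ["campaign_id"],
        },
    },
    {
        "name": "rate_submission",
        "description": "Accept or reject a submission, with feedback.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "submission_id": {"type": "string"},
                "rating": {"type": "string", "enum": ["accepted", "rejected"]},
                "feedback": {"type": "string"},
            },
            "required": ["submission_id", "rating"],
        },
    },
    {
        "name": "get_account_balance",
        "description": "Return the remaining platform balance in USD.",
        "inputSchema": {"type": "object", "properties": {}},
    },
]
```

Nothing in these declarations hints that a human fulfills the call. That is the point.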

Why this matters

There's a category of tasks that AI handles badly and humans handle trivially. Signing up for a website. Navigating a UI that has no API. Confirming whether a physical location exists. Reading a CAPTCHA. Verifying that a phone number connects to a real business.

The conventional approach is to either skip these tasks or build brittle browser automation. Both are wrong. The correct abstraction is: route each task to whoever does it best.

Sometimes that's GPT-4. Sometimes that's a Python script. Sometimes that's a person in Nairobi who can complete the task in 90 seconds for $0.30.

MCP doesn't care which. The tool returns a result. The AI consumes it and continues.
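The routing decision itself can be made explicit. A minimal sketch, where the task flags (`has_api`, `needs_browser`, `needs_captcha`) are made up for illustration:

```python
def route_task(task: dict) -> str:
    """Pick an executor for a task. The flags checked here are
    hypothetical -- the real taxonomy depends on your pipeline."""
    if task.get("has_api"):
        return "script"   # deterministic work with an API: just call it
    if task.get("needs_browser") or task.get("needs_captcha"):
        return "human"    # no API, or real-world verification: dispatch a person
    return "llm"          # everything else: text reasoning for the model
```

The orchestrator calls this before dispatch, and each branch ends in the same place: an MCP tool call that returns a structured result.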

What I learned running 300+ human tasks through MCP

Speed surprised me

Workers claim tasks within minutes of posting, not hours. The microtask workforce is global and online around the clock. I post a batch of 50 tasks at 3am my time and have results by breakfast.

Price per task is absurdly low for the right work

Simple tasks (visit a website, find a specific piece of information, paste it back) run $0.20–$0.40 each. More complex tasks (create an account, navigate multi-step flows, take screenshots as proof) run $0.40–$0.60. At these prices, redundancy is cheap — assign three workers to the same task and cross-validate their answers.
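At $0.30 a task, triple-assigning costs roughly $0.90 per validated fact. Cross-validation can then be as simple as a majority vote; a minimal sketch:

```python
from collections import Counter

def cross_validate(answers: list[str], min_agreement: int = 2):
    """Majority vote over redundant worker submissions. Returns the
    winning answer, or None when no answer reaches the agreement
    threshold (meaning the task should be re-dispatched)."""
    counts = Counter(a.strip().lower() for a in answers)
    value, hits = counts.most_common(1)[0]
    return value if hits >= min_agreement else None
```

Normalization here is deliberately crude (lowercase, trimmed); fuzzier matching may be needed for free-text answers.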

Worker quality varies wildly, and that's fine

Some workers are meticulous. Some paste garbage. The key insight is that quality control is a data problem, not a people problem. Track worker IDs across tasks. Build a quality score. Workers who consistently deliver good results get routed more work. Workers who submit garbage get excluded.

After a few hundred tasks, I had a clear tier list:

| Tier    | Workers | Behavior                                      |
| ------- | ------- | --------------------------------------------- |
| Hire    | ~50     | Consistently accurate, follows instructions   |
| Neutral | ~200    | Variable quality, acceptable for simple tasks |
| Exclude | 4       | Submitted fake data, duplicated others' work  |

The exclude list is tiny. Most people do honest work when the task is clear and the pay is fair.

Per-task instructions are everything

My first campaign used generic instructions. Results were noisy — 30% of workers completed the wrong variant of the task because they picked whichever looked easiest. When I switched to unique, specific instructions per task (each worker gets exactly one assignment with step-by-step directions), accuracy jumped dramatically.

The AI generates these per-task instructions. It knows what each task requires, formats the instructions with the right URLs and field names, and submits them as template variables in the campaign creation call. The human gets a clear, unambiguous task. The AI gets a structured result back.
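The generation step is plain templating. A sketch, where the template text and field names are examples rather than my production instructions:

```python
# One unique, step-by-step instruction sheet per task. The numbered
# steps and the escape-hatch answers mirror what workers actually see.
INSTRUCTION_TEMPLATE = """\
1. Open {url} in your browser.
2. Find the field labeled "{field_name}".
3. Copy its value exactly and paste it into the answer box.
4. If the site asks for a credit card, answer REQUIRES_CC.
5. If the information does not exist, answer NOT_AVAILABLE.
"""

def render_instructions(task: dict) -> str:
    """Fill the template with this task's URL and target field."""
    return INSTRUCTION_TEMPLATE.format(url=task["url"],
                                       field_name=task["field_name"])
```

Each rendered sheet goes into the campaign-creation call as a template variable, so every worker sees exactly one unambiguous assignment.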

Escape hatches prevent wasted money

Not every task is completable. Sometimes the website requires a credit card. Sometimes the information doesn't exist. Workers need a clean way to say "I can't do this" without getting penalized.

I added explicit escape hatch options to every task: "REQUIRES_CC" and "NOT_AVAILABLE." Workers who correctly identify impossible tasks get rated the same as workers who complete possible ones. This sounds small but it changed the economics — I stopped paying workers to waste time on dead ends, and I stopped rejecting honest workers for reporting real blockers.
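In the rating step, that policy is one early return. A sketch, with `is_valid` standing in for whatever validator the task uses:

```python
ESCAPE_HATCHES = {"REQUIRES_CC", "NOT_AVAILABLE"}

def rate_submission(answer: str, is_valid) -> str:
    """Rate a worker's answer. Escape-hatch reports are accepted
    unconditionally, so an honest blocker report pays the same as a
    completed task. `is_valid` is any validator callable."""
    if answer.strip().upper() in ESCAPE_HATCHES:
        return "accepted"
    return "accepted" if is_valid(answer) else "rejected"
```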

The AI handles the whole lifecycle

Here's what a typical batch looks like from the AI's perspective:

  1. Select tasks — query a database for items that need human work
  2. Generate instructions — create per-task directions from templates
  3. Create campaign — call the microtask platform's API via MCP, submit all tasks
  4. Wait — sleep, check back periodically (also an MCP tool)
  5. Pull results — list all submissions, parse structured answers
  6. Validate — test each result against a known-good source (API call, database lookup, HTTP request)
  7. Rate workers — accept valid submissions, reject garbage
  8. Store results — persist validated data
  9. Update state — mark items as complete in the database

Every step is an MCP tool call. The AI doesn't need a human operator to run this loop. It dispatches to humans, validates their work, manages quality, and continues autonomously.
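The loop above can be sketched in a few lines. `FakeMCP` stands in for a real MCP client session, and the tool names are hypothetical; in production each `call` would be a `call_tool` against the microtask server:

```python
class FakeMCP:
    """Stand-in for an MCP client session, for illustration only."""
    def call(self, tool: str, **args):
        if tool == "create_campaign":
            return {"campaign_id": "c-1"}
        if tool == "list_submissions":
            return [{"task_id": "t-1", "worker": "w-1", "answer": "42"}]
        return {"ok": True}

def run_batch(mcp, tasks, validate):
    # 3. create the campaign with all tasks attached
    campaign = mcp.call("create_campaign", tasks=tasks)
    # 4. in production: sleep and poll here until submissions arrive
    # 5. pull results
    subs = mcp.call("list_submissions", campaign_id=campaign["campaign_id"])
    validated = {}
    for sub in subs:
        ok = validate(sub["answer"])              # 6. check against known-good source
        mcp.call("rate_submission",               # 7. accept or reject, per worker
                 submission_id=sub["task_id"],
                 rating="accepted" if ok else "rejected")
        if ok:
            validated[sub["task_id"]] = sub["answer"]  # 8. keep good data
    return validated                              # 9. caller marks items complete
```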

The cost math

Over 300+ tasks across multiple campaigns:

| Metric                          | Value |
| ------------------------------- | ----- |
| Total spend                     | ~$90  |
| Average cost per task           | $0.30 |
| Tasks completed successfully    | ~75%  |
| Tasks returned via escape hatch | ~20%  |
| Garbage submissions             | ~5%   |
| Unique workers used             | ~250  |
| Workers excluded for quality    | 4     |

$90 for 300 tasks that would have taken me days to do manually, or would have required building and maintaining fragile browser automation that breaks every time a website updates its UI.

The MCP tool definitions are identical to any other integration — same auth, same structured inputs and outputs, same orchestration. The only difference is that the "compute" on the other end is a human brain instead of a server.


What tasks in your pipeline should probably be done by a human instead of an AI? I'd genuinely like to know — drop a comment with your use case.
