📦 Repo: github.com/linkc0829/go-chatgpt-tasks
This is Part 1 of 2. Part 1 builds the scheduler from scratch — the MCP interface, the queue-decoupled execution path, and the AI workflow that drove it. Part 2 takes this prototype and hardens it for multi-tenant production (tenancy, DST-safe recurrence, idempotency, job chains, LLM reliability, observability).
The brief was deceptively small: build a job scheduler with an MCP (Model Context Protocol) interface. Users schedule tasks via MCP tool calls, a background watcher scans for due jobs and pushes them onto a queue, workers pull and execute them, and the whole thing supports create / list / status / cancel. The interesting constraints were the ones hiding behind that one-liner: how do you make it scale (the design target was 10K jobs/sec), and how do you make execution reliable (at-least-once delivery without double-running side effects)?
This post covers two things:
- The AI workflow that turned that one-liner into a designed, sliced, verifiable feature
- The high-level design and the system-design decisions behind the prototype
1. The AI Workflow: Research and Design Before Code
The failure mode of "AI, build me a task scheduler" is that the model starts editing files in minute two — before anyone has agreed on what's actually being built, or even mapped what already exists. I used a staged workflow (internally I call it QRSPI) that produces a written artifact at every gate, and nothing downstream starts until the upstream artifact exists.
1. Question → neutral research questions (no opinions)
2. Research → objective answers, grounded in the actual code
3. Design → where are we going, and why
4. Structure → vertical slices + test checkpoints
5. Plan → the tactical, file-by-file working doc
6. Worktree → isolated git worktree for implementation
7. Implement → execute phase-by-phase, verify each
8. PR → description grounded in design + the real diff
What this bought me on a greenfield feature:
Research mapped the terrain before designing on it. The first pass answered factual questions about the existing codebase: Is there any scheduler/cron/worker precedent? Is Redis wired up? How does the app lifecycle start and stop goroutines? How many inbound transports exist? The answers were sobering and useful. There was no background-loop precedent, Redis was wired but passed around as an unused _ (Go's blank identifier), the lifecycle ran exactly one goroutine, and there was exactly one transport (HTTP). Every later decision referenced those facts instead of a guess.

Design named the non-goals. Before any code, the design doc wrote down what we were NOT doing: no Python server (the ticket's Python verification commands became Go equivalents), no real task side effects (a stub executor), no auth on the MCP transport this pass, and no actual load test at 10K jobs/sec, only the shape that target implies. Naming the non-goals is what kept a "small" scheduler from sprawling.
Structure sliced by demoability. Four phases, each independently valuable: (1) the MCP CRUD surface, (2) the watcher + queue, (3) the worker pool + execution + DLQ, (4) recurring re-scheduling. If phase 3 stalled, phases 1–2 still delivered a working MCP surface and a queue-publishing watcher.
The meta-lesson: on a feature with no local precedent, the research phase is where the value is. Half the design decisions were really "establish the first instance of a pattern this repo will reuse" — and you can only know that if you've mapped what exists first.
2. High-Level Design
The codebase is a hexagonal (ports & adapters) Go backend with a feature-first layout: each feature is one package under internal/<feature>/ following an 11-file structure (domain, service, ports, adapters as separate files), with composition happening only in internal/bootstrap/wire.go. The scheduler became a new internal/task/ slice plus a second inbound transport.
The core idea is a producer/consumer split with a queue in the middle, so the thing that finds due work and the thing that executes it can scale independently.
The lifecycle of a job, read top-down:
- An MCP tool call (task.create) goes through a registry into the shared task.Service, which persists a Job (the definition) and a first JobRun in pending status.
- The watcher periodically queries for pending runs due within a 5-minute window, pushes each to Redis, and marks it queued.
- A worker from the pool consumes the stream, runs the (stub) Executor, and drives the state machine: queued → running → success, or retry → requeue, or, after max attempts, failed → DLQ. Every transition appends a RunEvent.
- The recurring watcher polls those RunEvents; when a recurring job's run reaches a terminal state, it creates the next pending JobRun.
The same task.Service is wired into two processes — cmd/api (HTTP API + the background goroutines) and cmd/mcp (the stdio MCP server) — sharing one Postgres and one Redis.
3. System-Design Decisions
These are the calls that mattered, with the reasoning the workflow forced into writing.
Redis Streams as the queue
XADD to enqueue, a consumer group + XREADGROUP for per-message exclusivity, XACK on success, and XAUTOCLAIM (idle-based reclaim) for the visibility-timeout redelivery the brief required. A separate stream is the DLQ. This gives at-least-once delivery natively; pairing it with job_run_id as an idempotency key gives best-effort exactly-once. Streams hit the sweet spot — richer than a plain list, far less operational weight than Kafka for a prototype.
Why a queue between watcher and worker at all
A single cron that both scans and executes can't keep up at the target rate, and has nowhere to put a failed job. The queue decouples producer from consumer so workers scale horizontally; it lets a worker requeue and retry on failure; it's the layer that provides the at-least-once guarantee; and it gives failed-past-max-retries jobs a DLQ to land in for later review.
Time-bucket partitioning for the watcher query
Instead of SELECT … WHERE scheduled_at <= now() scanning an ever-growing table, job_runs carries an hourly time_bucket column and the watcher filters status='pending' AND time_bucket = $bucket AND scheduled_at <= now()+5min, backed by a composite index. At 600K jobs/minute a naive predicate forces the DB to collect matching rows across the whole table; bucketing keeps each scan local. (Native Postgres range partitioning was the aspiration; the prototype ships the single-table + index form, which satisfies the same query pattern.)
Supervised goroutines, never fire-and-forget
The existing app ran a single goroutine and had a fixed-order sequential shutdown. The watcher, worker pool, and recurring watcher all launch under a derived context.Context and a sync.WaitGroup inside App.Run; App.Shutdown cancels the context and waits for every goroutine to drain before closing Redis and Postgres. No leaked goroutines, no work cut off mid-flight. This deliberately extended the lifecycle pattern rather than bolting background loops on the side.
Domain model: Job / JobRun / RunEvent
Job is the definition (one-off or recurring + schedule spec). JobRun is a single execution attempt with a status. RunEvent is an append-only audit log of every transition, and crucially, it's the thing the recurring watcher polls to decide when to schedule the next run. Entities hold unexported fields, validate invariants in New* constructors, and expose pure, context-free transition methods (MarkRunning, MarkSuccess, MarkRetry, MarkFailed, Cancel). Invalid states, including an unpopulated zero value, can't come out of a constructor.
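Here is a trimmed sketch of the JobRun side of that model. MarkRunning, MarkSuccess, MarkRetry, MarkFailed, and Cancel come from the post. The following are assumptions: the status enum, the hypothetical MarkQueued (used by the watcher, and to move retry back to queued), the allowed-from lists, and the error value. Go still lets code outside the package declare a zero-value JobRun, so the sketch reserves the zero status as invalid. Such a value then fails every transition instead of quietly behaving like pending.

```go
package main

import (
	"errors"
	"fmt"
)

type RunStatus int

const (
	statusInvalid RunStatus = iota // zero value: never produced by NewJobRun
	StatusPending
	StatusQueued
	StatusRunning
	StatusRetry
	StatusSuccess
	StatusFailed
	StatusCancelled
)

var statusNames = [...]string{"invalid", "pending", "queued", "running", "retry", "success", "failed", "cancelled"}

func (s RunStatus) String() string { return statusNames[s] }

var ErrBadTransition = errors.New("task: invalid status transition")

// JobRun is one execution attempt. Fields are unexported; only methods change them.
type JobRun struct {
	id      string
	status  RunStatus
	attempt int
}

// NewJobRun validates invariants up front, so every JobRun it returns is usable.
func NewJobRun(id string) (*JobRun, error) {
	if id == "" {
		return nil, errors.New("task: job run id is required")
	}
	return &JobRun{id: id, status: StatusPending}, nil
}

// transition is the single place that enforces the state machine. It is pure
// (no context, no I/O), so every rule is unit-testable.
func (r *JobRun) transition(to RunStatus, from ...RunStatus) error {
	for _, f := range from {
		if r.status == f {
			r.status = to
			return nil
		}
	}
	return fmt.Errorf("%w: %v -> %v", ErrBadTransition, r.status, to)
}

func (r *JobRun) MarkQueued() error { return r.transition(StatusQueued, StatusPending, StatusRetry) }

func (r *JobRun) MarkRunning() error {
	if err := r.transition(StatusRunning, StatusQueued); err != nil {
		return err
	}
	r.attempt++
	return nil
}

func (r *JobRun) MarkSuccess() error { return r.transition(StatusSuccess, StatusRunning) }
func (r *JobRun) MarkRetry() error   { return r.transition(StatusRetry, StatusRunning) }
func (r *JobRun) MarkFailed() error  { return r.transition(StatusFailed, StatusRunning) }
func (r *JobRun) Cancel() error {
	return r.transition(StatusCancelled, StatusPending, StatusQueued, StatusRetry)
}

func main() {
	r, _ := NewJobRun("run-1")
	fmt.Println(r.MarkQueued(), r.MarkRunning(), r.MarkSuccess()) // <nil> <nil> <nil>
	fmt.Println(r.Cancel())                                        // task: invalid status transition: success -> cancelled
	var zero JobRun
	fmt.Println(errors.Is(zero.MarkRunning(), ErrBadTransition)) // true
}
```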
A registry, not an if-else chain, for MCP tools
Tool dispatch is a map[string]toolHandler with O(1) routing. The pragmatic reason: adding the 20th tool shouldn't mean editing a 20-branch conditional. The naming convention (task.create, not createTask) is a namespace + action-verb pattern that helps the LLM on the other end pick the right tool more reliably.
A pluggable, stubbed Executor
Real task execution sits behind an Executor port; the prototype ships a StubExecutor that logs and marks success (with configurable failure for tests). This was deliberate: it exercises the entire queue / retry / DLQ / event machinery without committing to real side effects, so the reliability behavior is fully testable before any real handler exists. (Part 2 replaces this stub with a real job-type-dispatching executor and an LLM reliability layer.)
A Few Things Worth Stealing
- Research the existing codebase before you design. Half my "decisions" were really "this is the first instance of a pattern" — only visible because research established there was no precedent.
- Decouple producer from consumer with a queue. The watcher finds work; workers execute it; the queue absorbs bursts, enables retries, and gives you a DLQ for free.
- At-least-once + an idempotency key beats chasing exactly-once. Let the queue guarantee delivery and make the work idempotent.
- Supervise your goroutines. A derived context plus a WaitGroup turns "background loops" into something that shuts down cleanly instead of leaking.
- Stub the side effect, exercise the machinery. A pluggable stub executor let me prove the whole retry/DLQ path before writing a single real handler.
That's the prototype: an MCP-driven, queue-decoupled scheduler with a clean state machine and a testable execution path — but single-tenant, timezone-naive, with a no-op executor and no idempotency on real side effects. Part 2 is the production-hardening cycle that fixes every one of those.
Next: **Part 2: Hardening the scheduler for production**, covering tenancy, DST-safe recurrence, idempotency under at-least-once delivery, linear job chains, an LLM reliability layer, and observability.
