Kan-Chen Lin

Posted on Jun 15

Building a ChatGPT Task Scheduler in Go (Part 1) — MCP, Queues, and a Research-First AI Workflow

#ai #architecture #go #systemdesign

📦 Repo: github.com/linkc0829/go-chatgpt-tasks

This is Part 1 of 2. Part 1 builds the scheduler from scratch — the MCP interface, the queue-decoupled execution path, and the AI workflow that drove it. Part 2 takes this prototype and hardens it for multi-tenant production (tenancy, DST-safe recurrence, idempotency, job chains, LLM reliability, observability).

The brief was deceptively small: build a job scheduler with an MCP (Model Context Protocol) interface. Users schedule tasks via MCP tool calls, a background watcher scans for due jobs and pushes them onto a queue, workers pull and execute them, and the whole thing supports create / list / status / cancel. The interesting constraints were the ones hiding behind that one-liner: how do you make it scale (the design target was 10K jobs/sec), and how do you make execution reliable (at-least-once delivery without double-running side effects)?

This post covers two things:

The AI workflow that turned that one-liner into a designed, sliced, verifiable feature
The high-level design and the system-design decisions behind the prototype

1. The AI Workflow: Research and Design Before Code

The failure mode of "AI, build me a task scheduler" is that the model starts editing files in minute two — before anyone has agreed on what's actually being built, or even mapped what already exists. I used a staged workflow (internally I call it QRSPI) that produces a written artifact at every gate, and nothing downstream starts until the upstream artifact exists.

1. Question   → neutral research questions (no opinions)
2. Research   → objective answers, grounded in the actual code
3. Design     → where are we going, and why
4. Structure  → vertical slices + test checkpoints
5. Plan       → the tactical, file-by-file working doc
6. Worktree   → isolated git worktree for implementation
7. Implement  → execute phase-by-phase, verify each
8. PR         → description grounded in design + the real diff

What this bought me on a greenfield feature:

Research mapped the terrain before designing on it. The first pass answered factual questions about the existing codebase: Is there any scheduler/cron/worker precedent? Is Redis wired up? How does the app lifecycle start and stop goroutines? How many inbound transports exist? The answers were sobering and useful — there was no background-loop precedent, Redis was wired but passed around as an unused _, the lifecycle ran exactly one goroutine, and there was exactly one transport (HTTP). Every later decision referenced those facts instead of a guess.
Design named the non-goals. Before any code, the design doc wrote down what we were NOT doing: no Python server (the ticket's Python verification commands became Go equivalents), no real task side effects (a stub executor), no auth on the MCP transport in this pass, and no actual load test at 10K jobs/sec, only the shape that target implies. Naming the non-goals is what kept a "small" scheduler from sprawling.
Structure sliced by demoability. Four phases, each independently valuable: (1) the MCP CRUD surface, (2) the watcher + queue, (3) the worker pool + execution + DLQ, (4) recurring re-scheduling. If phase 3 stalled, phases 1–2 still delivered a working MCP surface and a queue-publishing watcher.

The meta-lesson: on a feature with no local precedent, the research phase is where the value is. Half the design decisions were really "establish the first instance of a pattern this repo will reuse" — and you can only know that if you've mapped what exists first.

2. High-Level Design

The codebase is a hexagonal (ports & adapters) Go backend with a feature-first layout: each feature is one package under internal/<feature>/ following an 11-file structure (domain, service, ports, adapters as separate files), with composition happening only in internal/bootstrap/wire.go. The scheduler became a new internal/task/ slice plus a second inbound transport.

The core idea is a producer/consumer split with a queue in the middle, so the component that finds due work and the component that executes it can scale independently.

The lifecycle of a job, read top-down:

An MCP tool call (task.create) goes through a registry into the shared task.Service, which persists a Job (the definition) and a first JobRun in pending status.
The watcher periodically queries for pending runs due within a 5-minute window, pushes each to Redis, and marks it queued.
A worker from the pool consumes the stream, runs the (stub) Executor, and drives the state machine: queued → running → success on the happy path; on failure, retry → requeue; and once max attempts are exhausted, failed → DLQ. Every transition appends a RunEvent.
The recurring watcher polls those RunEvents; when a recurring job's run reaches a terminal state, it creates the next pending JobRun.

The same task.Service is wired into two processes — cmd/api (HTTP API + the background goroutines) and cmd/mcp (the stdio MCP server) — sharing one Postgres and one Redis.

3. System-Design Decisions

These are the calls that mattered, with the reasoning the workflow forced into writing.

Redis Streams as the queue

The queue maps directly onto Streams commands: XADD to enqueue, a consumer group + XREADGROUP for per-message exclusivity, XACK on success, and XAUTOCLAIM (idle-based reclaim) for the visibility-timeout redelivery the brief required. A separate stream is the DLQ. This gives at-least-once delivery natively; pairing it with job_run_id as an idempotency key gives best-effort exactly-once execution, because a redelivered message can be recognized and skipped instead of re-running its side effect. Streams hit the sweet spot: richer than a plain list, far less operational weight than Kafka for a prototype.
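Here is a minimal sketch of the idempotency half of that pairing. It uses an in-memory set where a real system would more likely rely on a database constraint or a status check on the run. The names `Message`, `Deduper`, and `HandleOnce` are hypothetical and not taken from the repo.

```go
package main

import (
	"fmt"
	"sync"
)

// Message is a hypothetical stand-in for one stream entry.
type Message struct {
	JobRunID string
}

// Deduper remembers which job_run_ids have already been executed.
type Deduper struct {
	mu   sync.Mutex
	done map[string]bool
}

func NewDeduper() *Deduper { return &Deduper{done: make(map[string]bool)} }

// HandleOnce runs effect only the first time a given JobRunID is seen
// and reports whether it ran. The lock is held while effect runs, which
// keeps the sketch simple at the cost of serializing handlers.
func (d *Deduper) HandleOnce(m Message, effect func()) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.done[m.JobRunID] {
		return false // redelivery: acknowledge without re-running
	}
	effect()
	d.done[m.JobRunID] = true
	return true
}

func main() {
	d := NewDeduper()
	runs := 0
	// The same message arrives twice, as can happen after an idle-based reclaim.
	for _, m := range []Message{{"run-1"}, {"run-1"}, {"run-2"}} {
		d.HandleOnce(m, func() { runs++ })
	}
	fmt.Println("side effects executed:", runs) // prints 2
}
```

The "best-effort" qualifier shows up here too: if the process dies after `effect()` but before the ID is recorded, the redelivery runs the effect again.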

Why a queue between watcher and worker at all

A single cron that both scans and executes can't keep up at the target rate, and has nowhere to put a failed job. The queue decouples producer from consumer so workers scale horizontally; it lets a worker requeue and retry on failure; it's the layer that provides the at-least-once guarantee; and it gives failed-past-max-retries jobs a DLQ to land in for later review.

Time-bucket partitioning for the watcher query

Instead of SELECT … WHERE scheduled_at <= now() scanning an ever-growing table, job_runs carries an hourly time_bucket column and the watcher filters status='pending' AND time_bucket = $bucket AND scheduled_at <= now()+5min, backed by a composite index. At the 10K jobs/sec target (600K jobs/minute), a naive predicate forces the DB to collect matching rows across the whole table; bucketing keeps each scan local. (Native Postgres range partitioning was the aspiration; the prototype ships the single-table + index form, which satisfies the same query pattern.)

Supervised goroutines, never fire-and-forget

The existing app ran a single goroutine and had a fixed-order sequential shutdown. The watcher, worker pool, and recurring watcher all launch under a derived context.Context and a sync.WaitGroup inside App.Run; App.Shutdown cancels the context and waits for every goroutine to drain before closing Redis and Postgres. No leaked goroutines, no work cut off mid-flight. This deliberately extended the lifecycle pattern rather than bolting background loops on the side.

Domain model: `Job` / `JobRun` / `RunEvent`

Job is the definition (one-off or recurring + schedule spec). JobRun is a single execution attempt with a status. RunEvent is an append-only audit log of every transition, and crucially, it's the thing the recurring watcher polls to decide when to schedule the next run. Entities hold unexported fields, validate invariants in New* constructors, and expose pure, context-free transition methods (MarkRunning, MarkSuccess, MarkRetry, MarkFailed, Cancel). Code outside the package cannot set an entity's fields directly, so every instance that comes out of a constructor satisfies its invariants, and every later change goes through a checked transition.

A registry, not an if-else chain, for MCP tools

Tool dispatch is a map[string]toolHandler with O(1) routing. The pragmatic reason: adding the 20th tool shouldn't mean editing a 20-branch conditional. The naming convention (task.create, not createTask) is a namespace + action-verb pattern that helps the LLM on the other end pick the right tool more reliably.

A pluggable, stubbed `Executor`

Real task execution sits behind an Executor port; the prototype ships a StubExecutor that logs and marks success (with configurable failure for tests). This was deliberate: it exercises the entire queue / retry / DLQ / event machinery without committing to real side effects, so the reliability behavior is fully testable before any real handler exists. (Part 2 replaces this stub with a real job-type-dispatching executor and an LLM reliability layer.)

A Few Things Worth Stealing

Research the existing codebase before you design. Half my "decisions" were really "this is the first instance of a pattern" — only visible because research established there was no precedent.
Decouple producer from consumer with a queue. The watcher finds work; workers execute it; the queue absorbs bursts, enables retries, and gives you a DLQ for free.
At-least-once + an idempotency key beats chasing exactly-once. Let the queue guarantee delivery and make the work idempotent.
Supervise your goroutines. A derived context plus a WaitGroup turns "background loops" into something that shuts down cleanly instead of leaking.
Stub the side effect, exercise the machinery. A pluggable stub executor let me prove the whole retry/DLQ path before writing a single real handler.

That's the prototype: an MCP-driven, queue-decoupled scheduler with a clean state machine and a testable execution path — but single-tenant, timezone-naive, with a no-op executor and no idempotency on real side effects. Part 2 is the production-hardening cycle that fixes every one of those.

Next: **Part 2: Hardening the scheduler for production (tenancy, DST-safe recurrence, idempotency under at-least-once delivery, linear job chains, an LLM reliability layer, and observability).**
