DEV Community: Kan-Chen Lin

Hardening a Go Task Scheduler for Production (Part 2) — Tenancy, DST, Idempotency & LLM Reliability

Kan-Chen Lin — Mon, 15 Jun 2026 01:53:07 +0000

📦 Repo: github.com/linkc0829/go-chatgpt-tasks

This is Part 2 of 2. Part 1 built the scheduler from scratch — the MCP interface, the queue-decoupled watcher → worker execution path, and the AI workflow that drove it. This part takes that prototype and hardens it for multi-tenant production.

In Part 1 I built an MCP-driven task scheduler: a watcher scans for due jobs, a Redis Streams queue decouples it from a pool of workers, and a clean Job / JobRun / RunEvent state machine ties it together. It worked — but it was a prototype, and prototypes have tells.

It was MCP-only: no HTTP handler, no authentication, driven by an unauthenticated stdio server. There was no tenant_id or user_id anywhere. Recurrence was a naive interval_seconds added to the last run — no timezone or DST awareness. The run_events table was status-only (id, status, created_at) with no event type, payload, or error detail. The executor was a literal no-op stub. And while the Redis queue delivered at-least-once, there was no idempotency on real side effects.

This is the production-hardening cycle that fixed every one of those. Six problem areas, the internal/task/ package rewritten across seven vertical slices, ~8,200 lines of diff. This post covers:

A quick recap of the AI workflow (introduced in Part 1) and how it handled a large change
The high-level design it produced (with a diagram)
The system-design decisions and their trade-offs

1. The AI Workflow at Scale

Part 1 introduced the staged QRSPI workflow — Question → Research → Design → Structure → Plan → Worktree → Implement → PR — where each step produces a written artifact and nothing downstream starts until the upstream artifact exists. Where Part 1 used it to establish patterns in a greenfield feature, Part 2 leaned on it to keep a sprawling six-area change reviewable.

1. Question   → neutral research questions (no opinions)
2. Research   → objective answers, grounded in the actual code
3. Design     → where are we going, and why (+ explicit non-goals)
4. Structure  → vertical slices + test checkpoints
5. Plan       → the tactical, file-by-file working doc
6. Worktree   → isolated git worktree for implementation
7. Implement  → execute phase-by-phase, verify each
8. PR         → description grounded in design + the real diff

Three things mattered most on a change this size:

Research was read-only and opinion-free. Before touching anything, I re-mapped reality: what columns are on jobs / job_runs / run_events today? How does the recurring watcher compute the next run? What does the Redis consumer-group setup actually look like? So decisions referenced recurring_watcher.go:53, not a hallucinated guess.
Design recorded decisions and the rejected alternatives. The design doc explicitly listed the non-goals: fair/weighted scheduling, a real Anthropic adapter, full RFC-5545 RRULE, a DAG engine, exactly-once. Naming what we weren't doing is what stopped a six-area change from becoming a twelve-area one.
Structure forced altitude control. Seven slices, ordered by blast radius — identity first (every later slice needs it on the service signature), then the typed-event/metrics foundation, then quota, recurrence, idempotency, chains, and LLM reliability. Each slice crosses the full stack (migration → domain → service → API/MCP → tests) and ends in a concrete verification checkpoint. Each is independently shippable; drop a later one and the earlier ones still deliver.

The payoff: because the "why" already lived in design.md, the PR description explained reasoning instead of restating the diff — and a reviewer could read the design and structure docs and already know why every decision was made.

2. High-Level Design

The architecture is unchanged from Part 1 — hexagonal (ports & adapters), feature-first — but the task hexagon grew a lot of new surface. The biggest structural change: an authenticated HTTP API alongside the existing MCP transport, both funneling through the same service via an explicit identity.

Read it top-down: both client types funnel into the same service through an explicit Identity; the service talks only to interfaces in ports.go; adapters (Postgres, Redis, the LLM client, and user.Service as a cross-feature TenantLookup port) sit at the edge; and the worker drains the queue, executes idempotently behind the LLM reliability wrapper, and — like the service — emits typed events that feed Prometheus.

That user.Service-as-a-port wiring is worth a note: cross-feature communication never imports another feature directly. task declares a TenantLookup capability interface in its own ports.go, user.Service satisfies it structurally (Go interfaces are implemented implicitly, with no "implements" declaration), and the composition root (bootstrap/wire.go) injects it. That's how task resolves a tenant from a user without ever importing internal/user.

3. System-Design Decisions

These are the calls that mattered, with the reasoning the workflow forced into writing.

Identity is an explicit argument, not a context value

Service methods take Identity{TenantID, UserID} as a first-class first argument, rather than smuggling it through context.Context. The reason: two very different callers (HTTP, where identity comes from a JWT subject, and MCP, where it comes from a service principal) must supply identity the same way. An explicit parameter makes the dependency visible in the signature and impossible to forget. For v1, with no tenants table in the spec, tenant_id is resolved 1:1 from the user via the TenantLookup port.

Stacked migrations: add-nullable → backfill → NOT NULL

The tables already existed in production-shaped form (from Part 1), so each schema change is a three-step dance within one migration pair: add the column nullable, backfill a sensible default (timezone_id = 'UTC', interval_seconds → equivalent recurrence_rule), then enforce NOT NULL. Heavier than a greenfield template needs, but it's the price of evolving a live schema safely.
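The three steps can be sketched as an ordered list of statements. This is an illustration in Postgres syntax, not the repo's migration files: the helper name stackedUp is hypothetical, and only the jobs table, the timezone_id column, and the 'UTC' default are taken from the post.

```go
package main

import "fmt"

// stackedUp returns the three ordered statements for adding a required
// column to a table that already holds rows. Identifiers are interpolated
// directly, so this is only suitable for hard-coded migration inputs.
func stackedUp(table, column, colType, backfillExpr string) []string {
	return []string{
		// 1. Add the column nullable so existing rows stay valid.
		fmt.Sprintf("ALTER TABLE %s ADD COLUMN %s %s", table, column, colType),
		// 2. Backfill only the rows that have no value yet.
		fmt.Sprintf("UPDATE %s SET %s = %s WHERE %s IS NULL", table, column, backfillExpr, column),
		// 3. Enforce the constraint once every row satisfies it.
		fmt.Sprintf("ALTER TABLE %s ALTER COLUMN %s SET NOT NULL", table, column),
	}
}

func main() {
	for i, stmt := range stackedUp("jobs", "timezone_id", "TEXT", "'UTC'") {
		fmt.Printf("%d. %s;\n", i+1, stmt)
	}
}
```

The order is the point: swapping steps 2 and 3 fails on any table that already has rows.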

A hand-rolled RRULE subset, not a dependency

Recurrence supports FREQ=DAILY|WEEKLY (+ optional INTERVAL) with a local_time and an IANA timezone_id, computed over the stdlib's time.LoadLocation. DST is the whole point: 08:00 America/New_York must stay 8 AM local across a spring-forward boundary. The DST policy is explicit — skipped local times roll to the next valid instant, ambiguous ones pick the first occurrence — and the decision is recorded as a RunEvent so it's auditable. No external RFC-5545 library; the surface we need is small, and a full RRULE engine is a non-goal.

At-least-once delivery + idempotency, not exactly-once

Part 1 gave us at-least-once delivery via Redis Streams consumer-group reclaim. Part 2 makes the handlers idempotent rather than chasing distributed exactly-once (a tar pit). An idempotency_records table backs a check → in-progress → side effect → completed contract, and JobRunMsg now carries an idempotency_key. Duplicate delivery of the same message runs the side effect exactly once — verified by a fake handler that counts its own calls.

Single stream now, fairness deferred — but the door is left open

Fair/weighted per-tenant scheduling was explicitly deferred. But tenant_id was added to the queue payload now, and a cheap intra-batch per-tenant round-robin (Worker.fairOrder) reorders only the messages a worker already read. That's a bounded down-payment on fairness — no per-tenant queues, no cross-worker coordination — that avoids a queue rewrite later.

LLM behind a port + a reliability wrapper + a fake

Part 1's no-op StubExecutor is replaced by an executor that dispatches by job type and wraps an LLMClient interface in a reliability layer: context.WithTimeout, output-schema validation, retry within a budget, and a pre-run cost estimate checked against the tenant's max_daily_llm_cost_cents. Invalid LLM output is never marked success — it retries, then fails. A deterministic fake client backs all tests; a real Anthropic adapter is a thin, deliberately-later add. The interesting logic lives in the wrapper, not the vendor SDK, so it's fully testable without a live API key.

Typed events + feature-level metrics

run_events graduated from status-only to event_type TEXT + event_payload JSONB with error code/message fields, and every lifecycle transition emits one. Alongside, the task feature got its own Prometheus vectors (task_runs_total{status}, task_run_duration_seconds, task_dlq_total, per-tenant quota rejections). The diff also ships a Grafana dashboard JSON and alert rules — though actual dashboard provisioning is correctly scoped as infra work, not part of the code change.

A Few Things Worth Stealing

Write the non-goals down. The single most effective scope-control tool was an explicit "What We're NOT Doing" list — and it's where you put the almost-deferred items (like intra-batch fairness) with a note on exactly how far they go.
Slice by blast radius, end every slice in a checkpoint. Identity went first because every later signature depended on it. Each of the seven phases was independently shippable.
Idempotency beats exactly-once. If you're tempted to chase distributed exactly-once, redirect that energy into making handlers idempotent under at-least-once delivery.
Keep external dependencies behind a port with a fake. The LLM reliability layer was fully testable before a single real API call existed.
Let the design doc write the PR. If your "why" already exists in writing, the pull request stops being a chore and becomes genuinely useful to reviewers.

The throughline across both parts: a disciplined, artifact-producing AI workflow turns both a greenfield build and a sprawling six-area hardening pass into something a reviewer can actually follow — because every decision was written down before it was coded, not reverse-engineered from the diff afterward.

Missed the start? Part 1 (Building the scheduler) covers the MCP interface, the watcher → queue → worker execution path, and the design decisions behind the prototype.

Building a ChatGPT Task Scheduler in Go (Part 1) — MCP, Queues, and a Research-First AI Workflow

Kan-Chen Lin — Mon, 15 Jun 2026 01:46:14 +0000

📦 Repo: github.com/linkc0829/go-chatgpt-tasks

This is Part 1 of 2. Part 1 builds the scheduler from scratch — the MCP interface, the queue-decoupled execution path, and the AI workflow that drove it. Part 2 takes this prototype and hardens it for multi-tenant production (tenancy, DST-safe recurrence, idempotency, job chains, LLM reliability, observability).

The brief was deceptively small: build a job scheduler with an MCP (Model Context Protocol) interface. Users schedule tasks via MCP tool calls, a background watcher scans for due jobs and pushes them onto a queue, workers pull and execute them, and the whole thing supports create / list / status / cancel. The interesting constraints were the ones hiding behind that one-liner: how do you make it scale (the design target was 10K jobs/sec), and how do you make execution reliable (at-least-once delivery without double-running side effects)?

This post covers two things:

The AI workflow that turned that one-liner into a designed, sliced, verifiable feature
The high-level design and the system-design decisions behind the prototype

1. The AI Workflow: Research and Design Before Code

The failure mode of "AI, build me a task scheduler" is that the model starts editing files in minute two — before anyone has agreed on what's actually being built, or even mapped what already exists. I used a staged workflow (internally I call it QRSPI) that produces a written artifact at every gate, and nothing downstream starts until the upstream artifact exists.

1. Question   → neutral research questions (no opinions)
2. Research   → objective answers, grounded in the actual code
3. Design     → where are we going, and why
4. Structure  → vertical slices + test checkpoints
5. Plan       → the tactical, file-by-file working doc
6. Worktree   → isolated git worktree for implementation
7. Implement  → execute phase-by-phase, verify each
8. PR         → description grounded in design + the real diff

What this bought me on a greenfield feature:

Research mapped the terrain before designing on it. The first pass answered factual questions about the existing codebase: Is there any scheduler/cron/worker precedent? Is Redis wired up? How does the app lifecycle start and stop goroutines? How many inbound transports exist? The answers were sobering and useful — there was no background-loop precedent, Redis was wired but passed around as an unused _, the lifecycle ran exactly one goroutine, and there was exactly one transport (HTTP). Every later decision referenced those facts instead of a guess.
Design named the non-goals. Before any code, the design doc wrote down what we were NOT doing: no Python server (the ticket's Python verification commands became Go equivalents), no real task side effects (a stub executor), no auth on the MCP transport this pass, no actually load-testing 10K jobs/sec — just building the shape that target implies. Naming the non-goals is what kept a "small" scheduler from sprawling.
Structure sliced by demoability. Four phases, each independently valuable: (1) the MCP CRUD surface, (2) the watcher + queue, (3) the worker pool + execution + DLQ, (4) recurring re-scheduling. If phase 3 stalled, phases 1–2 still delivered a working MCP surface and a queue-publishing watcher.

The meta-lesson: on a feature with no local precedent, the research phase is where the value is. Half the design decisions were really "establish the first instance of a pattern this repo will reuse" — and you can only know that if you've mapped what exists first.

2. High-Level Design

The codebase is a hexagonal (ports & adapters) Go backend with a feature-first layout: each feature is one package under internal/<feature>/ following an 11-file structure (domain, service, ports, adapters as separate files), with composition happening only in internal/bootstrap/wire.go. The scheduler became a new internal/task/ slice plus a second inbound transport.

The core idea is a producer/consumer split with a queue in the middle, so the thing that finds due work and the thing that executes it can scale independently:

The lifecycle of a job, read top-down:

An MCP tool call (task.create) goes through a registry into the shared task.Service, which persists a Job (the definition) and a first JobRun in pending status.
The watcher periodically queries for pending runs due within a 5-minute window, pushes each to Redis, and marks it queued.
A worker from the pool consumes the stream, runs the (stub) Executor, and drives the state machine: queued → running → success, or retry → requeue, or after max attempts failed → DLQ. Every transition appends a RunEvent.
The recurring watcher polls those RunEvents; when a recurring job's run reaches a terminal state, it creates the next pending JobRun.

The same task.Service is wired into two processes — cmd/api (HTTP API + the background goroutines) and cmd/mcp (the stdio MCP server) — sharing one Postgres and one Redis.

3. System-Design Decisions

These are the calls that mattered, with the reasoning the workflow forced into writing.

Redis Streams as the queue

XADD to enqueue, a consumer group + XREADGROUP for per-message exclusivity, XACK on success, and XAUTOCLAIM (idle-based reclaim) for the visibility-timeout redelivery the brief required. A separate stream is the DLQ. This gives at-least-once delivery natively; pairing it with job_run_id as an idempotency key gets close to effectively-once processing, provided the handler actually checks the key. Streams hit the sweet spot: richer than a plain list, far less operational weight than Kafka for a prototype.

Why a queue between watcher and worker at all

A single cron that both scans and executes can't keep up at the target rate, and has nowhere to put a failed job. The queue decouples producer from consumer so workers scale horizontally; it lets a worker requeue and retry on failure; it's the layer that provides the at-least-once guarantee; and it gives failed-past-max-retries jobs a DLQ to land in for later review.

Time-bucket partitioning for the watcher query

Instead of SELECT … WHERE scheduled_at <= now() scanning an ever-growing table, job_runs carries an hourly time_bucket column and the watcher filters status='pending' AND time_bucket = $bucket AND scheduled_at <= now()+5min, backed by a composite index. At 600K jobs/minute (the 10K jobs/sec target × 60) a naive predicate forces the DB to collect matching rows across the whole table; bucketing keeps each scan local. (Native Postgres range partitioning was the aspiration; the prototype ships the single-table + index form, which satisfies the same query pattern.)

Supervised goroutines, never fire-and-forget

The existing app ran a single goroutine and had a fixed-order sequential shutdown. The watcher, worker pool, and recurring watcher all launch under a derived context.Context and a sync.WaitGroup inside App.Run; App.Shutdown cancels the context and waits for every goroutine to drain before closing Redis and Postgres. No leaked goroutines, no work cut off mid-flight. This deliberately extended the lifecycle pattern rather than bolting background loops on the side.

Domain model: Job / JobRun / RunEvent

Job is the definition (one-off or recurring + schedule spec). JobRun is a single execution attempt with a status. RunEvent is an append-only audit log of every transition — and crucially, it's the thing the recurring watcher polls to decide when to schedule the next run. Entities hold unexported fields, validate invariants in New* constructors, and expose pure, context-free transition methods (MarkRunning, MarkSuccess, MarkRetry, MarkFailed, Cancel). Zero-value-invalid states are impossible to construct.

A registry, not an if-else chain, for MCP tools

Tool dispatch is a map[string]toolHandler with O(1) routing. The pragmatic reason: adding the 20th tool shouldn't mean editing a 20-branch conditional. The naming convention (task.create, not createTask) is a namespace + action-verb pattern that helps the LLM on the other end pick the right tool more reliably.

A pluggable, stubbed Executor

Real task execution sits behind an Executor port; the prototype ships a StubExecutor that logs and marks success (with configurable failure for tests). This was deliberate: it exercises the entire queue / retry / DLQ / event machinery without committing to real side effects, so the reliability behavior is fully testable before any real handler exists. (Part 2 replaces this stub with a real job-type-dispatching executor and an LLM reliability layer.)

A Few Things Worth Stealing

Research the existing codebase before you design. Half my "decisions" were really "this is the first instance of a pattern" — only visible because research established there was no precedent.
Decouple producer from consumer with a queue. The watcher finds work; workers execute it; the queue absorbs bursts, enables retries, and gives you a DLQ for free.
At-least-once + an idempotency key beats chasing exactly-once. Let the queue guarantee delivery and make the work idempotent.
Supervise your goroutines. A derived context plus a WaitGroup turns "background loops" into something that shuts down cleanly instead of leaking.
Stub the side effect, exercise the machinery. A pluggable stub executor let me prove the whole retry/DLQ path before writing a single real handler.

That's the prototype: an MCP-driven, queue-decoupled scheduler with a clean state machine and a testable execution path — but single-tenant, timezone-naive, with a no-op executor and no idempotency on real side effects. Part 2 is the production-hardening cycle that fixes every one of those.

Next: Part 2 hardens the scheduler for production: tenancy, DST-safe recurrence, idempotency under at-least-once delivery, linear job chains, an LLM reliability layer, and observability.

From Template to Production-Shaped: An AI-Native Dev Flow for Go Side Projects

Kan-Chen Lin — Tue, 26 May 2026 02:07:42 +0000

I wanted my next side project to look like the kind of code I'd ship at work — hexagonal architecture, sqlc, depguard, integration tests — without the usual side-project tax of spending three evenings on scaffolding before writing the first line of domain logic. So I built it twice. First, I forked a Go backend template I'd been hardening for months. Then I drove every feature on top of it through a structured AI workflow I call qrspi: question → research → structure → plan → implement.

The product itself is unremarkable on purpose: a QR code generator. Paste a URL, get back a scannable PNG and a /r/:token redirect, with per-link scan counts and a soft-delete kill switch. The interesting part — the part I'd want a reviewer to look at — is the process that produced it.

Repo: linkc0829/go-qrcode-generator. Every artifact mentioned in this post is committed there.

Step 1: Choose the template, then commit to its rules

Step zero was actually choosing what to build on. I shortlisted several Go backend templates, walked through each one with Claude to pressure-test the architecture, and landed on the one I'd been hardening for a while: linkc0829/go-backend-template.

The template is a feature-first hexagonal Go backend. Each feature lives in a single package under internal/<feature>/, and inside that package, domain.go, service.go, ports.go, and the adapters sit side-by-side as separate files. The Go package boundary is the hexagon edge.

What makes it stick is depguard in .golangci.yml. The build fails if:

domain.go imports anything beyond stdlib and shared value objects
service.go reaches for a driver or web framework
handler_*.go touches a repo or cache directly
one feature imports another feature

That last rule is the one that pays the most rent. Cross-feature dependencies are forced through capability ports — feature A defines an interface named after the capability it needs, and the composition root in internal/bootstrap/wire.go injects feature B's service to satisfy it. The features never know about each other.

This was the first decision I had to actually live with. The template ships with demo user, order, and payment slices. My project has no orders and no payments. The rule is: don't leave dead code as "future scaffolding." Delete the whole slice — the package, the wire block, the SQL queries, the migration tables, the OpenAPI paths, the depguard block. make lint && make test after each removal flushes out dangling references. By the time I started writing QR code logic, the repo only knew about things that existed.

Step 2: Build the spec from a real system-design prompt

The functional spec came from a system-design exercise I'd worked through separately:

Generate a QR code from a URL
302 redirect through our server on every scan (so we can count, and so we can kill a link)
Targets: redirect latency < 100 ms, 1B codes, 100M users

The high-level design called out the load shape — read-heavy, one write to thousands of reads — and the levers that fall out of it: stateless API behind a gateway, cache qr_token → image_url, CDN the PNGs, index on qr_token. Tokens were originally specified as base62(SHA-256(url + user_secret)).

For the local build, I wrote down explicit deviations from the spec rather than pretending they didn't exist:

Tokens use 96-bit crypto/rand → base64url. Loses idempotency for repeated (user, url) pairs but avoids the deterministic-token leak surface.
The CDN tier is dropped. The browser fetches PNGs directly from MinIO using its anonymous download bucket policy. Same architectural shape as S3+CloudFront, minus the edge cache.
Soft delete and PUT/DELETE endpoints land in a later slice.

Writing the deviations down up front is the part that makes the design honest. It's also the part that makes a portfolio reviewer's job easier — they can see what was traded and why, not just what got built.

Step 3: qrspi — the workflow that does the actual building

The workflow idea started from Research-Plan-Implement (RPI). QRSPI is an 8-phase extension of it that I picked up from community discussions and adapted for this project.

Once the spec was on paper, every feature went through the same eight phases. Each phase is a slash command backed by a skill, and each one writes its artifact to thoughts/qrspi/<date>-<slug>/:

/qrspi:1_question — decompose the ticket into neutral research questions. No opinions yet.
/qrspi:2_research — answer the questions by reading the codebase. Facts only.
/qrspi:3_design — discuss where we're going before how. Trade-offs surface here.
/qrspi:4_structure — outline vertical slices with test checkpoints.
/qrspi:5_plan — the tactical implementation plan; my working document.
/qrspi:6_worktree — isolated git worktree so the main checkout stays clean.
/qrspi:7_implement — execute the plan phase by phase, verifying at each checkpoint.
/qrspi:8_pr — open a PR that carries the design context forward into review.

The MinIO feature shows the whole thing on disk: a ticket.md pulled from the Notion source via MCP, then questions.md, research.md, design.md, structure.md, and plan.md. Each one builds on the last. By the time implementation starts, the agent isn't guessing — it's executing a plan I already agreed with.

The follow-up Redis redirect cache shipped the same way. The plan called out the read-heavy shape, picked a write-behind click-count buffer to avoid hammering Postgres on every scan, and named the cache invariants explicitly. The implementation was almost mechanical because the design phase had already resolved the interesting questions.

What this buys, and what it costs

The cost is real: each feature carries five or six markdown files of design artifacts. For a single-developer side project, that's overhead I wouldn't tolerate in a freeform sketch.

What it buys:

The diff is reviewable. Every commit is small, scoped, and traceable back to a design decision.
The architecture holds. depguard catches the slow-drift violations (handler reaches into a repo, feature A imports feature B) the moment they appear, not three months later.
The agent stays useful past 2,000 LOC. Most AI-coding flows degrade as the codebase grows because the model loses the plot. Writing the plot down — in design.md, in plan.md — keeps the next session grounded.
The portfolio story is the process, not the artifact. Anyone can ship a QR code generator. Shipping one where the architecture, the trade-offs, and the deviations from spec are all written down on disk is a different signal.

The template is on GitHub; the qrspi artifacts are committed alongside the code. If you want to see how a single feature flows from ticket to PR, the MinIO slice is the cleanest example. The architecture ADRs in docs/adr/ cover the two foundational decisions: feature-first hexagonal, and sqlc over an ORM.

Next up: a metrics slice (Prometheus + a Grafana dashboard for redirect latency), and a proper deletion follow-up so the spec's full CRUD surface lands. Both will go through qrspi. That's the point.
