We just finished Sprint 0 of a project to rebuild a LinkedIn campaign management platform from scratch — using AI agents as the primary developers, operating under a strict agile methodology enforced by an MCP (Model Context Protocol) server. This post is the honest record: what we attempted, what we actually built, what failed, and how AI participated in every phase.
What Is This Project?
ORCHESTRATE is a marketing platform that manages content scheduling across LinkedIn pages. The V2 system — a 102-tool MCP server with React UI, Docker deployment, and 4 active LinkedIn pages — has been running in production. V3 is an ambitious expansion: YouTube integration, podcast generation, audio narration, AI-assisted news generation, and multi-channel publishing at scale.
Sprint 0 was pure infrastructure. No new features. No UI changes. Just the foundation that V3 needs to exist.
What We Attempted
The original Sprint 0 plan had 5 stories covering:
- V2-to-V3 Data Migration — Bridge to move posts, pages, and activity data from V2's JSON files to V3's structured format
- Credential Rotation — Automated lifecycle management for API credentials (LinkedIn, Dev.to, Printify)
- Backup System — Create, restore, and verify database backups with integrity checking
- Publishing Reliability — Verify the Dev.to blog publishing pipeline still works
- Data Integrity Suite — Migration verification with 5-check validation and rollback procedures
Two stories were descoped mid-sprint (OAS-040 and OAS-041 were reconstituted as OAS-042 and OAS-073) — the right call, since the original scope was too ambitious for a foundation sprint.
What We Actually Built
18 tickets completed through full documentation-driven TDD. Every ticket went through 8 phases: write docs → bind docs → write failing tests → verify failures → implement → refactor → validate → done.
The concrete deliverables:
- V2 Migration Bridge: Dry-run preview, full migration execution, rollback with audit trail, idempotent re-migration. Every migrated record gets a SHA-256 checksum and an append-only audit entry linking V2 source to V3 target.
- Backup Manager: Create/restore backups with SHA-256 integrity verification. Retention policy: keep last 7 daily + 4 weekly backups, auto-delete beyond that. Restore verifies checksum before returning data.
- Migration Verifier: 5-check validation suite — record count reconciliation, field completeness, referential integrity, stratified sample verification, and analytics preservation (with epsilon 0.01 for float drift). Generates a PASS/FAIL report with detailed diagnostics.
- Credential Rotation Scheduler: Composable CredentialStore + AlertingService. Handles create/rotate/expire lifecycle, configurable rotation windows (30/60/90 day), overlapping validity during rotation, and multi-provider support.
- Publishing Reliability: Verified Dev.to API connectivity, blog format compliance, and repaired publishing path.
- Docker & Database Foundation: Multi-service Docker Compose, health check endpoints, SQLite databases with WAL mode, concurrency testing, startup validation.
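The backup retention rule above (keep the last 7 daily and 4 weekly backups, auto-delete the rest) can be sketched as a pure function. This is a minimal illustration of the policy as described, not the actual Backup Manager API — all names here are hypothetical.

```typescript
// Hypothetical sketch of the retention policy: keep the newest 7 daily
// and 4 weekly backups; everything else is eligible for deletion.
interface Backup {
  id: string;
  createdAt: Date;
  kind: "daily" | "weekly";
}

function backupsToDelete(backups: Backup[]): Backup[] {
  const newestFirst = [...backups].sort(
    (a, b) => b.createdAt.getTime() - a.createdAt.getTime()
  );
  const keepDaily = newestFirst.filter((b) => b.kind === "daily").slice(0, 7);
  const keepWeekly = newestFirst.filter((b) => b.kind === "weekly").slice(0, 4);
  const keep = new Set([...keepDaily, ...keepWeekly].map((b) => b.id));
  return backups.filter((b) => !keep.has(b.id));
}
```

Expressing retention as "compute the delete set" rather than mutating state keeps the rule trivially testable, which matches the sprint's no-I/O testing approach.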
468 tests. 464 passing. 4 pre-existing Dev.to API failures (credential-related, not code bugs). Zero flaky tests.
What Worked
Documentation-Driven TDD is genuinely effective. Writing documentation before tests before code forces contract thinking. You discover edge cases during the doc phase — "what happens if the activity line is malformed JSON?" — before writing a single test. Every service emerged with cleaner interfaces than if we'd coded first.
In-memory architecture was the right Sprint 0 call. All services use TypeScript Maps and arrays — no database, no file I/O, no external dependencies. This made tests fast (~0.01s per test), deterministic (no flaky tests from I/O timing), and eliminated environment-specific configuration. The trade-off: zero data survives a process restart. That's acceptable for Sprint 0 but becomes the primary risk for Sprint 1.
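The pattern is simple enough to show in a few lines. A hypothetical sketch of the Map-backed store shape (not the actual service code):

```typescript
// Sketch of the in-memory pattern: state lives in a Map, so tests need
// no database, no file I/O, and no environment configuration.
class InMemoryStore<T extends { id: string }> {
  private items = new Map<string, T>();

  save(item: T): void {
    this.items.set(item.id, item); // re-saving the same id overwrites: idempotent
  }

  get(id: string): T | undefined {
    return this.items.get(id);
  }

  all(): T[] {
    return [...this.items.values()];
  }
}
```

Everything behind such an interface is synchronous and deterministic, which is exactly why the suite runs at ~0.01s per test — and exactly why nothing survives a restart.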
The audit trail pattern proved its value. The V2 Migration Bridge creates an append-only audit record for every migrated item, linking V2 source ID → V3 target ID with a SHA-256 checksum of the source data. This means rollback isn't "delete everything" — it's "trace each migrated record back to its source and remove precisely that." The migration verifier can then validate the rollback was clean.
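The audit-trail idea can be sketched like this, assuming hypothetical field and function names (the real bridge's types are not shown here):

```typescript
import { createHash } from "node:crypto";

// Sketch of the append-only audit-trail pattern: every migrated record
// gets an entry linking V2 source to V3 target, with a SHA-256 checksum
// of the source data. Entries are appended, never mutated or removed.
interface AuditEntry {
  v2SourceId: string;
  v3TargetId: string;
  checksum: string; // SHA-256 hex digest of the V2 source record
  migratedAt: string;
}

const auditLog: AuditEntry[] = [];

function recordMigration(
  v2SourceId: string,
  v3TargetId: string,
  sourceData: unknown
): AuditEntry {
  const entry: AuditEntry = {
    v2SourceId,
    v3TargetId,
    checksum: createHash("sha256").update(JSON.stringify(sourceData)).digest("hex"),
    migratedAt: new Date().toISOString(),
  };
  auditLog.push(entry);
  return entry;
}

// Rollback traces the log instead of deleting everything: only records
// that the audit trail says were migrated are candidates for removal.
function v3IdsToRemove(): string[] {
  return auditLog.map((e) => e.v3TargetId);
}
```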
Stratified sampling in verification catches issues efficiently. Even at 10% sample rate, the verifier samples at least 1 record per entity type (posts, pages, activity), evenly spaced across the dataset. This means a single corrupted record in any entity type gets caught without scanning everything.
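A minimal sketch of the sampling rule — at least one record per entity type, evenly spaced across each dataset. This illustrates the idea, not the verifier's actual implementation:

```typescript
// Evenly spaced sample indices for one entity type. Math.max(1, ...)
// guarantees at least one sample even when the 10% rate rounds to zero.
function sampleIndices(total: number, rate: number): number[] {
  const count = Math.max(1, Math.floor(total * rate));
  const step = total / count;
  return Array.from({ length: count }, (_, i) => Math.floor(i * step));
}

// Stratified sampling: apply the rule independently per entity type
// (posts, pages, activity), so no type is ever skipped entirely.
function stratifiedSample<T>(byType: Map<string, T[]>, rate = 0.1): Map<string, T[]> {
  const out = new Map<string, T[]>();
  for (const [type, records] of byType) {
    if (records.length === 0) continue;
    out.set(type, sampleIndices(records.length, rate).map((i) => records[i]));
  }
  return out;
}
```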
What Failed
Sprint planning overestimated capacity. We planned 5 stories and had to descope 2 mid-sprint. The documentation-driven TDD ceremony — 8 phases per ticket, with evidence comments at each — adds real overhead. Our retrospective concluded that 18 infrastructure tickets is a realistic velocity, but that feature tickets (with more complex integration) should target 15-16.

No user-visible progress. Sprint 0 delivered zero features that users can see or interact with. The 4 LinkedIn pages continue operating on V2 infrastructure unchanged. This is correct engineering (build foundations first) but creates stakeholder communication challenges.
Test fixtures are duplicated. The V2 test data (pages, posts, activity lines) is copy-pasted across multiple test files. This is technical debt — a change to the fixture format requires updating every copy. Sprint 1 should extract shared fixtures to a tests/fixtures/ module.
Security is deferred. Credentials are stored in-memory with no encryption at rest. No integration with external secret managers. No access audit logging. This is acceptable while everything is in-memory (credentials vanish on restart anyway) but must be addressed before any production deployment with persistence.
The 4 Dev.to API test failures persisted all sprint. These tests fail because they hit a real external API that returns 422 errors — not code bugs, but they pollute the test report and should be fixed or explicitly skipped with documentation.
How AI Participated
This is where it gets interesting. Sprint 0 was executed entirely by AI agents operating under the ORCHESTRATE Agile MCP framework. Here's what that means concretely:
The MCP server mechanically enforces methodology. It's not a suggestion system — it blocks actions that violate the lifecycle. Try to write code without completing inception? Blocked. Try to skip the VERIFY phase in TDD? Blocked. Try to create a ticket without DONE criteria? Blocked. The AI agents operate within these constraints, not around them.
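The enforcement idea can be sketched as a phase state machine that only permits advancing to the next phase in order. This is an illustration of the concept, not the MCP server's actual implementation:

```typescript
// Sketch of mechanical lifecycle enforcement: the only legal transition
// is to the immediately following phase; any skip throws ("Blocked").
const PHASES = [
  "DOCS", "BIND", "TEST", "VERIFY", "IMPLEMENT", "REFACTOR", "VALIDATE", "DONE",
] as const;
type Phase = (typeof PHASES)[number];

class TicketLifecycle {
  private index = 0;

  get phase(): Phase {
    return PHASES[this.index];
  }

  advanceTo(target: Phase): void {
    const targetIndex = PHASES.indexOf(target);
    if (targetIndex !== this.index + 1) {
      throw new Error(`Blocked: cannot jump from ${this.phase} to ${target}`);
    }
    this.index = targetIndex;
  }
}
```

The key property is that the gate is structural, not advisory: an agent cannot reach IMPLEMENT without having passed through VERIFY, because there is no code path that allows it.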
AI personas own specific domains. Each ticket is assigned a named AI persona based on content — Query Quinn (database) for migration work, Guard Ian (security) for credentials, Pip Line (DevOps) for infrastructure. The persona brings domain expertise and decision style to every phase of the ticket. This isn't role-playing; it's scoped expertise application.
The retrospective was AI-conducted. 11 AI personas spoke in turn order (infrastructure-closest first: Pip Line → Query Quinn → Guard Ian → ... → Owen Pro), each giving an honest assessment of what they saw, what worked, what failed, and what they want changed. The retrospective identified 6 cross-persona themes and 3 unresolved tensions that will shape Sprint 1 planning.
Memory persists across conversations. Lessons learned and decisions made are stored in a program-level memory system. When Sprint 1 begins, the agents can recall "in-memory architecture was chosen for velocity but creates persistence risk" without re-discovering it. 5 lessons and 3 decisions from Sprint 0 are now stored and retrievable.
What AI can't do (yet): The MCP's recall_memory search is keyword-based, not semantic — specific technical queries return nothing while broad terms return everything. The TDD ceremony adds overhead that a human developer might shortcut judiciously; AI follows it literally every time. Context window exhaustion forces conversation restarts that create ramp-up overhead.
Provenance
Every claim in this post traces to source artifacts:
- Ticket counts and status: OAS-072-T1 Work Artifacts & Logs document (18 tickets, all DONE)
- Test counts: Sprint 0 test suite summary — 468 total, 464 passing, 4 pre-existing API failures
- Retrospective themes and tensions: OAS-072-T3 Turn-Based Retrospective Ceremony — 11 persona statements, 6 themes, 3 tensions
- Lessons and decisions: OAS-072-T4 Retrospective Summary & Decisions — 5 LESSON entries, 3 DECISION entries, 14 follow-up actions stored in program memory
- Code artifacts: tool/mcp-server/src/services/ — v2-migration.ts, backup-manager.ts, migration-verifier.ts, credential-rotation-scheduler.ts
- Descoping history: OAS-040 → OAS-042 and OAS-041 → OAS-073 reconstitution documented in sprint records
This post was drafted by an AI agent (Pip Line persona, Claude Opus 4.6) as ticket OAS-072-T5, following full DD TDD methodology. The draft was reviewed against three source documents before publication.
What's Next
Sprint 1 priorities (decided in retrospective):
- SQLite persistence — swap in-memory storage to real databases while preserving service APIs
- CI pipeline — automated test execution on push, build verification
- At least one user-visible feature — demonstrate V3 progress to stakeholders
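The first priority — swapping storage while preserving service APIs — hinges on services depending on an interface rather than a concrete store. A hypothetical sketch of that seam (names are illustrative; the actual service contracts may differ):

```typescript
// Services depend on this contract, not on any concrete storage.
interface PostStore {
  save(id: string, post: string): void;
  get(id: string): string | undefined;
}

// Sprint 0's implementation: a Map.
class InMemoryPostStore implements PostStore {
  private posts = new Map<string, string>();
  save(id: string, post: string): void {
    this.posts.set(id, post);
  }
  get(id: string): string | undefined {
    return this.posts.get(id);
  }
}

// A Sprint 1 SqlitePostStore would implement the same interface;
// callers like this one never change.
function publish(store: PostStore, id: string, body: string): void {
  store.save(id, body);
}
```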
Target: 16 tickets. First real test of whether the velocity baseline from Sprint 0 holds when we add integration complexity.
This post is part of a series documenting the ORCHESTRATE Marketing Platform V3 build. Sprint 0 covered infrastructure. Sprint 1 will cover persistence, CI, and the first user-visible feature. All development is AI-agent-driven with full traceability.