
Koustubh

Part 3: Building Station Station - Agent-OS Workflow in Action

In Parts 1 and 2, I introduced Spec-Driven Development and showed you the finished Station Station project—8 features, live on GitHub Pages, solving my real hybrid work compliance problem. But how did we actually get there? What does the agent-os workflow look like in practice?

This part walks you through the complete development process, using real examples from Station Station. No theoretical abstractions—just the actual workflow I followed to go from "I need to track my office attendance" to a deployed web application.

The Five Phases of Agent-OS

Agent-os structures development into five distinct phases, each with specific deliverables and human review checkpoints. For this project, I used Claude, Anthropic's AI assistant, throughout the entire workflow—from spec shaping to implementation.

AGENT-OS DEVELOPMENT WORKFLOW
════════════════════════════

1. Create Product  →  2. Shape Spec  →  3. Write Spec  →  4. Write Tasks  →  5. Implement
   Mission & Roadmap   Requirements      Technical Spec    Task Breakdown    AI + Human Code
         ↓                  ↓                  ↓                 ↓                 ↓
   [Human Review]     [Human Review]     [Human Review]    [Human Review]    [Human Review]

Notice the review checkpoints? That's deliberate. You're not waiting until the end to discover the AI misunderstood your requirements. You're validating assumptions at each phase before moving forward.

Let me show you what each phase actually looks like.

Phase 1: Create Product

The first step isn't writing code—it's defining what you're building and why. You start with just a raw idea, and agent-os helps you shape it into a structured product plan.

Here's how it actually works:

You start with a simple idea:
"I need to track my office attendance using my Myki train card data to meet my company's 50% hybrid work requirement."

Agent-os asks clarifying questions:

  • What problem are you solving? Who's the target user?
  • What are your key constraints? (Budget, timeline, technical preferences)
  • What features are must-haves vs nice-to-haves?
  • Do you have preferred technologies or deployment platforms?

You answer honestly:

  • Problem: I need proactive visibility into my attendance, not reactive manager notifications
  • User: Primarily me, but could be useful for other hybrid workers
  • Constraints: Zero hosting costs, mobile-first (I'll check this on my phone)
  • Tech: Python (I'm comfortable with it), React (modern and fast), GitHub Pages (free)

Agent-os generates three documents:

mission.md - The "why" behind the project:

  • Problem: Hybrid workers need transparency into office attendance compliance
  • Solution: Automated attendance tracking using Melbourne's Myki transit data
  • Target user: Me (and anyone else commuting via train for hybrid work)

roadmap.md - The "what" we're building:

  • 8 features broken into 3 phases
  • Phase 1: Foundation (auth + API discovery)
  • Phase 2: Data Layer (extraction + processing)
  • Phase 3: Integration & UI (frontend + config)

tech-stack.md - The "how" we'll build it:

  • Backend: Python + Playwright for browser automation
  • Frontend: React + Tailwind CSS v4
  • Deployment: GitHub Pages (static site, zero hosting costs)

This took about 30 minutes—mostly answering questions and reviewing the generated documents. You don't write these from scratch; the AI creates them based on your answers. But you review and refine them to ensure they match your vision.
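
These live as plain markdown files in your repository, so they version alongside your code. The layout below is indicative only; exact paths depend on your agent-os version and configuration:

.agent-os/
└── product/
    ├── mission.md      # the "why": problem, solution, target user
    ├── roadmap.md      # the "what": 8 features across 3 phases
    └── tech-stack.md   # the "how": Python, React, GitHub Pages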

Human role here: You provide the raw idea and answer questions about your constraints, priorities, and technical preferences. AI can't guess that you want zero hosting costs or mobile-first design—that's domain knowledge only you have.

Phase 2: Shape the Spec

Once you have the product defined, you pick a feature from the roadmap and start spec shaping. This is where agent-os really shines—the spec-shaper agent asks clarifying questions to gather detailed requirements.

For the React frontend feature, the shaping process went like this:

Spec-shaper: "What data format will the frontend consume?"
Me: "JSON from GitHub raw URL, generated by Python backend"

Spec-shaper: "What visualizations do you need?"
Me: "Calendar view with attended days marked, monthly bar chart showing attendance trends, summary statistics"

Spec-shaper: "Mobile or desktop priority?"
Me: "Mobile-first. I'll be checking this on my phone most mornings."

Spec-shaper: "Should the calendar be read-only or interactive?"
Me: "Interactive—clicking an attended day should show which station and what time I tapped on."

This back-and-forth surfaces implementation details you might not think about upfront. By the end of shaping, you have a clear picture of what "build the frontend" actually means.
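
To make that first answer concrete: the frontend reads a single JSON file from a GitHub raw URL. The shape below is purely illustrative (the field names are mine, not the real schema), but it shows the kind of detail the shaping questions were pinning down:

{
  "generated_at": "2025-01-15T07:30:00+11:00",
  "attended_days": [
    { "date": "2025-01-14", "station": "Flinders Street", "touch_on": "08:42" }
  ],
  "monthly_summary": [
    { "month": "2025-01", "working_days": 22, "days_attended": 11, "percentage": 50.0 }
  ]
}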

Important: This is iterative. The spec-shaper generates a draft spec based on your answers. You review it. If something's not quite right—maybe it misunderstood your intent, or you realized you forgot to mention a key requirement—you provide more input. The spec gets refined. You review again. This continues until you're satisfied.

For Station Station, I went through 2-3 refinement rounds on some specs. The first draft might have missed that I wanted public holidays automatically displayed on the calendar. I'd point that out, and the spec would be updated to include it. No starting over—just iterative improvement.

Human role here: Answer questions honestly about your use case, then review and refine the generated spec. The AI can't guess that you'll primarily use this on mobile, or that you care more about quick glances than detailed analytics. And if the first draft misses something, that's fine—just keep refining until it's right.

Phase 3: Write the Spec

Now the spec-writer agent takes your answers from the shaping phase and generates a detailed technical specification. Here's what the actual spec looked like for the frontend:

# Specification: Attendance Tracker Frontend UI

## Goal
Build a responsive static React web application to visualize work attendance data
from the Myki attendance tracker JSON output, enabling users to view attendance
statistics, explore monthly calendars with marked attended days, analyze trends
through bar charts, and filter data by date ranges across mobile and desktop devices.

## User Stories
- As a user, I want to see a monthly calendar with my attended days visually marked
  so that I can quickly identify when I was at the office
- As a user, I want to view monthly attendance percentages in a bar chart so that I
  can understand my attendance trends over time
- As a user, I want to filter data by date range so that I can focus on specific
  time periods like quarters or financial years

## Specific Requirements

**Calendar View Component**
- Display monthly grid calendar showing current month by default
- Provide previous/next month navigation buttons for browsing history
- Mark attended days with red visual indicators (red background circle or dot)
- Make attended days clickable to show detail modal or tooltip
- Display timestamp and target station name when attended day is clicked
- Use react-calendar library for calendar functionality
- Ensure keyboard navigation support for accessibility
- Mobile-optimized with touch-friendly date selection

**Monthly Bar Chart Visualization**
- Display one bar per month showing attendance percentage (0-100%)
- Use Recharts library for rendering responsive bar charts
- Color bars in red theme to match attended day indicators
- Include tooltips showing exact percentage, working days, and days attended on hover
- Ensure chart is fully responsive and readable on mobile screens

This continues for 9 specific requirement areas, totaling about 100 lines of detailed specifications. The spec-writer captured my shaping answers and translated them into implementable requirements.
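
To give a feel for how the calendar requirements translate into a component, here's a minimal sketch using react-calendar. The prop names and the CSS class are my own shorthand, not something the spec dictates:

import Calendar from 'react-calendar';
import 'react-calendar/dist/Calendar.css';

// attendedDates: Set of 'YYYY-MM-DD' strings derived from the attendance JSON
export default function AttendanceCalendar({ attendedDates, onDaySelected }) {
  const toKey = (d) =>
    `${d.getFullYear()}-${String(d.getMonth() + 1).padStart(2, '0')}-${String(d.getDate()).padStart(2, '0')}`;

  return (
    <Calendar
      // Mark attended days with a red indicator via a CSS class (styling lives in the stylesheet)
      tileClassName={({ date, view }) =>
        view === 'month' && attendedDates.has(toKey(date)) ? 'attended' : null
      }
      // Clicking an attended day opens the detail view (station name + tap-on time)
      onClickDay={(date) => {
        if (attendedDates.has(toKey(date))) onDaySelected(toKey(date));
      }}
    />
  );
}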

Key insight: Notice the specificity. Not "build a chart" but "use Recharts library, red theme, responsive on mobile, tooltips on hover." That level of detail lets the AI implement without guessing.
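
The bar chart requirement is similarly implementable almost verbatim. A minimal sketch with Recharts, again with assumed prop and field names:

import { ResponsiveContainer, BarChart, Bar, XAxis, YAxis, Tooltip } from 'recharts';

// data: [{ month: '2025-01', percentage: 50, workingDays: 22, daysAttended: 11 }, ...]
export default function MonthlyAttendanceChart({ data }) {
  return (
    <ResponsiveContainer width="100%" height={300}>
      <BarChart data={data}>
        <XAxis dataKey="month" />
        <YAxis domain={[0, 100]} unit="%" />
        {/* Tooltip surfaces the exact percentage plus working days / days attended */}
        <Tooltip formatter={(value) => `${value}%`} />
        {/* Red theme (#ef4444) to match the attended-day indicators */}
        <Bar dataKey="percentage" fill="#ef4444" />
      </BarChart>
    </ResponsiveContainer>
  );
}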

Just like shaping, this is iterative too. The spec-writer generates a detailed spec based on the shaped requirements. You review it carefully. Maybe you notice it specified the wrong color theme, or it didn't include a requirement for error handling, or the accessibility requirements aren't strong enough. You provide feedback, and the spec gets updated. Review again. Refine again. Keep going until the spec accurately represents what you want built.

For the frontend spec, I noticed the first draft didn't specify what should happen when the JSON fetch fails. I asked for better error handling requirements—retry option, user-friendly messages, graceful degradation. The spec was updated to include those details. Same with accessibility—I pushed for stronger requirements around keyboard navigation and screen reader support.

Human role here: Review the spec thoroughly and keep refining until it's right. Did it capture your intent? Are there edge cases missing? Requirements that don't make sense? This is your last chance to catch misunderstandings before code gets written, so it's worth taking the time to get it right.

Phase 4: Write Tasks

With an approved spec, the task-writer agent breaks it into granular, actionable tasks. Here's how the frontend spec became 6 task groups with 40+ individual tasks:

# Task Breakdown: Attendance Tracker Frontend UI

## Task List

### Task Group 1: Initial Project Setup
**Dependencies:** None

- [x] 1.1 Create new Vite React project
  - Run: `npm create vite@latest attendance-tracker -- --template react`
  - Navigate into project directory
  - Install base dependencies: `npm install`

- [x] 1.2 Install and configure Tailwind CSS
  - Install: `npm install -D tailwindcss postcss autoprefixer`
  - Initialize: `npx tailwindcss init -p`
  - Configure tailwind.config.js with content paths and custom 'attended' color (#ef4444)
  - Add Tailwind directives to src/index.css

- [x] 1.3 Install required libraries
  - Chart library: `npm install recharts`
  - Calendar library: `npm install react-calendar`
  - Date picker: `npm install react-datepicker`

### Task Group 2: Data Fetching and Processing
**Dependencies:** Task Group 1

- [x] 2.1 Write 2-6 focused tests for data utilities
  - Test JSON fetch success scenario
  - Test error handling for network failures
  - Test date filtering calculation

- [x] 2.2 Create data fetching utility
  - File: src/utils/dataFetcher.js
  - Implement fetchAttendanceData() function
  - URL: https://raw.githubusercontent.com/koustubh25/station-station/main/output/attendance.json
  - Use cache: 'no-cache' for fresh data
  - Handle network errors with descriptive messages

Each task is concrete enough that I could hand it to any developer (or AI) and they'd know exactly what to build. Dependencies are explicit—you can't build the calendar component until data fetching works.
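
Task 2.2, for instance, is specific enough to sketch almost line-for-line. Something like this (the exact error message and structure of the real file will differ):

// src/utils/dataFetcher.js
const DATA_URL =
  'https://raw.githubusercontent.com/koustubh25/station-station/main/output/attendance.json';

export async function fetchAttendanceData() {
  // 'no-cache' forces revalidation so the dashboard always reflects the latest run
  const response = await fetch(DATA_URL, { cache: 'no-cache' });

  if (!response.ok) {
    // Descriptive message rather than a bare status code, per the task notes
    throw new Error(`Could not load attendance data (HTTP ${response.status})`);
  }

  return response.json();
}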

Human role here: Review the task breakdown. Is anything missing? Are tasks sequenced correctly? Do the dependencies make sense? Sometimes AI misses edge cases or creates circular dependencies.

Phase 5: Implement Tasks

This is where AI assistance actually writes code. But it's not fully autonomous—there are specific checkpoints where human review is critical.

For Station Station, the implementation flow looked like this:

  1. AI implements task - The implementer agent writes code according to the task spec
  2. AI runs tests - Verifies the implementation works (if tests exist)
  3. Human reviews output - You check the code, test it manually, and approve or request changes
  4. Move to next task - Repeat for each task in the breakdown

Here's where I learned an important lesson about agent-os CLI permissions. The AI can read files, write files, and run tests, but certain operations require your explicit approval:

  • Git commits - You review and commit changes yourself
  • Git pushes - You decide when to push to remote
  • Workflow triggers - You manually kick off CI/CD pipelines

This is by design. You maintain control over version history and deployments.

But here's what I discovered: After gaining confidence in the AI-generated code—once I'd reviewed a few implementations and saw they were solid—I started allowing the AI to do git push and use the gh CLI to view or trigger GitHub Actions workflows. This let the AI work more autonomously: push code, trigger the build, check if tests passed, and if they failed, fix the issues and try again.

The workflow became: AI implements → AI pushes → AI triggers workflow → AI monitors results → If failures, AI fixes and repeats. I'd check in periodically, but for well-defined tasks, the AI could iterate autonomously until everything passed.
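
Concretely, that autonomy just means letting the AI run ordinary git and gh commands. The workflow name below is a placeholder, but the commands are the standard GitHub CLI ones:

gh workflow run deploy.yml                      # trigger the Actions workflow
gh run list --workflow=deploy.yml --limit 1     # find the run that just started
gh run watch <run-id>                           # wait for it to finish
gh run view <run-id> --log-failed               # on failure, pull the failing step's logs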

This isn't the default (and probably shouldn't be for unfamiliar projects), but once you've established trust through review, you can grant more autonomy where it makes sense.

Note: Agent-os also has an orchestrate-tasks command that provides even more advanced multi-agent coordination and autonomous task execution. But that's beyond the scope of this blog—we'll cover it in detail in a future post. For Station Station, the standard task-by-task implementation workflow was sufficient.

Human role here: Code review, manual testing, and deployment decisions. AI can generate the boilerplate, but you verify it works in your specific context.

The Complete Agent-OS Workflow

Now that we've walked through all five phases, let's see how they fit together into a complete cycle.

The agent-os workflow follows a structured, iterative cycle. Notice the feedback loop where human review catches issues that require debugging before the feature is complete. This isn't full automation—it's a partnership where AI handles implementation and humans guide the architecture and review the results.

[Diagram: Agent-OS workflow as an iterative cycle: Create Product → Shape Spec → Write Spec → Write Tasks → Implement Tasks → Human Review, with a feedback loop for debugging and refinement until the feature is complete]

Human Review at Key Decision Points

This sequence diagram reveals the continuous human-AI collaboration throughout development. Review happens at multiple stages, not just at the end. Each phase includes a human checkpoint where you validate the AI's work before proceeding.

[Sequence diagram: the human provides a feature idea to the Spec Writer, who gathers requirements and generates a detailed spec; the Task Writer breaks the spec into tasks for human approval; the Task Implementer executes each task, writes tests, and submits for review; the human approves or provides guidance for fixes, looping until the feature is complete]

The key insight: this is continuous collaboration, not "AI does everything then human reviews at the end." You're involved throughout, making architectural decisions, reviewing outputs, and course-correcting when needed.

The Iterative Reality

Here's what those diagrams don't show: iteration. They make the workflow look linear, but reality is messier.

For Station Station, I went through multiple rounds:

  • Spec refinement: Realized mid-development I needed manual attendance dates (for days I drove to work instead of taking the train). Went back and updated the spec.
  • Task adjustments: Some tasks were too large and got broken into smaller chunks. Others were unnecessary and got removed.
  • Implementation bugs: AI couldn't fix the manualAttendanceDates field bug after several attempts. I had to review the code, identify the issue location, then let AI implement the fix.
  • New specs added later: The initial roadmap had 8 features, but I added more specs later for enhancements like security improvements and manual attendance features. You don't have to plan everything upfront—you can always create new specs for additional features as needs emerge.

The workflow provides structure, but you'll loop back. That's normal. The key difference from ad-hoc AI chat is that when you loop back, you update the spec or tasks—so the system stays consistent. Future features can reference the updated spec instead of inheriting outdated assumptions.

And when you think of new features? Just create a new spec and go through the same workflow. The product documentation evolves with your project.

What Makes This Different

The structured approach of agent-os SDD provides several key benefits:

Clear Direction Throughout

  • Every feature starts with documented requirements, not assumptions
  • The roadmap gives you a clear view of what's done and what's next
  • When you solve one problem (like Cloudflare bypass), the spec tells you exactly what comes next
  • No more "I got this working, but now what?"

Persistent Context

  • Because you have a record of all specs, tasks, and their completion status, the AI can pick up exactly where you left off—even weeks later
  • Come back after a break: "Task 7 is complete, Task 8 is next, here's what needs to be done"
  • No context loss, no re-explaining what you've already built
  • The documentation serves as persistent memory across sessions

Easier Debugging

  • When something breaks, you can reference the spec to understand intended behavior
  • Task breakdown makes it easy to isolate which component is failing
  • Specs document edge cases and requirements that are easy to forget during implementation

Iterative Refinement

  • Update specs as you learn—they evolve with your understanding
  • Add new specs for new features without disrupting existing work
  • Each iteration is documented, so you can see why decisions were made

The time investment is front-loaded. Spec creation took longer than just prompting Claude to "build a frontend." But I shipped all 8 planned features. The debugging was easier. The resumability was huge. And when I added new features a week later, the specs told me exactly where to hook them in.

That's the ROI of Spec-Driven Development—not faster initial code generation, but fewer surprises, clearer direction, and maintainable progress.

What's Next

We've seen the agent-os workflow in action: creating products, shaping specs, writing detailed specifications, breaking down tasks, and implementing with AI assistance. We have a structured process that transforms vague ideas into working code.

But this is the part where I need to be honest about limitations. The workflow isn't magic. AI still struggles with certain problems, and some features require significant human intervention. In Part 4, we'll dive into the real challenges: debugging stories where AI failed, the collaboration spectrum between AI and human, and when to know if Spec-Driven Development is overkill for your project.

If you're wondering whether this structured approach is always worth it—or where it breaks down—Part 4 has the answers.
