Taufiqul Islam

Posted on Jun 15

This is What AI Development Actually Looks Like in 2026 - Stop Prompting. Start Directing.

#claudecode #ai #productivity #webdev

I want to be honest about something upfront.
Most "I built [anything] with AI" posts are actually "I prompted AI and fixed whatever came out." That is not a workflow. That is improvisation with extra steps.

This post is about a structured approach - one that works whether you are building a frontend portfolio, a full-stack SaaS, a REST API, or a microservice. The phases are the same. The discipline is the same. The results are the same.

Where We Are Right Now

As of 2026, Claude Code has crossed 500,000 active developer users worldwide. GitHub Copilot has over 1.8 million paid subscribers. Stack Overflow's 2024 Developer Survey found that 76% of respondents were using or planning to use AI tools in their development process.

But here is the uncomfortable number: 42% of developers said AI-generated code introduced bugs they had to spend significant time debugging. Another 38% said the code did not match their project's conventions.

Fast but wrong. That is the current state of AI-assisted development for most teams.

The problem is not the tools. The problem is the workflow - or the lack of one.

The Real Problem - AI Forgets Everything

Open a new chat. Paste an error. Get a fix. Copy it back.

Open another chat tomorrow. Explain the whole project again. Paste another error. Get another fix.

Every session starts from zero. The AI has no memory of your folder structure, your naming conventions, your design tokens, your API patterns. You spend half your time giving context instead of building.

This is the chatbox trap. It feels like using AI. It is actually just a slower Stack Overflow.

Before You Write a Single Command - The Design Foundation

This step happens before opening any terminal. It is what makes everything else work.

Take screenshots of your design. Figma, Adobe XD, Canva, PDF mockup - it does not matter. Export screenshots per feature, per page, per state. Close-up screenshots of specific components. Wide screenshots for layout. Light mode and dark mode both.

Organize them like this:

docs/
└── designs/
    ├── system-design.md     ← all design tokens
    └── ss/
        ├── hero-light.png
        ├── hero-dark.png
        └── dashboard-light.png

Write your system-design.md. Every color token with its hex value. Every font family, font size, font weight. Every spacing value. Every border radius. Light mode and dark mode separately.

## Colors
--color-bg-page:      #f7f7f2   (light) / #0d0d0d (dark)
--color-text-primary: #111111   (light) / #f0f0f0 (dark)
--color-accent:       #2d6a4f   (light) / #84cc16 (dark)

## Typography
--font-serif: Playfair Display
--font-sans:  Geist Sans
--font-mono:  Geist Mono

## Spacing
--spacing-section: 80px
--spacing-content: 40px

This file is what the AI reads to understand your visual language. Without it, the AI guesses. With it, every component uses your exact tokens.
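If the project is TypeScript, the same tokens can also live as typed data so the code and system-design.md cannot drift apart silently. This is a minimal sketch, not part of the original setup: the names `Theme`, `colorTokens`, `staticTokens`, and `toCssVariables` are hypothetical, and the values are copied from the sample above.

```typescript
// Hypothetical mirror of system-design.md as typed data.
type Theme = "light" | "dark";

interface ThemedToken {
  light: string;
  dark: string;
}

// Tokens that change between light and dark mode.
const colorTokens: Record<string, ThemedToken> = {
  "--color-bg-page": { light: "#f7f7f2", dark: "#0d0d0d" },
  "--color-text-primary": { light: "#111111", dark: "#f0f0f0" },
  "--color-accent": { light: "#2d6a4f", dark: "#84cc16" },
};

// Tokens that are identical in both modes.
const staticTokens: Record<string, string> = {
  "--font-serif": "Playfair Display",
  "--font-sans": "Geist Sans",
  "--font-mono": "Geist Mono",
  "--spacing-section": "80px",
  "--spacing-content": "40px",
};

// Render one theme as a list of CSS custom property declarations.
function toCssVariables(theme: Theme): string {
  const themed = Object.keys(colorTokens).map(
    (name) => `${name}: ${colorTokens[name][theme]};`
  );
  const shared = Object.keys(staticTokens).map(
    (name) => `${name}: ${staticTokens[name]};`
  );
  return themed.concat(shared).join("\n");
}
```

Calling `toCssVariables("dark")` gives the declarations you would place under a dark-mode selector. Whether you generate CSS from this object or maintain both by hand is a project decision; the point is that every value has exactly one name.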

Write CLAUDE.md. The project bible. Tech stack, folder structure, naming conventions, every rule the AI should follow. You write this once. The AI reads it every session and never forgets.

Write CONTEXT.md. The domain language. What are things actually called in your project? What is the difference between a User and a Consultant? What does assign mean versus link? When terminology is written down the AI uses it consistently across every file it touches.

With these four things in place - screenshots, system-design.md, CLAUDE.md, CONTEXT.md - the AI can reload your project's context from disk at the start of every session. No more explaining from scratch.

The Four-Phase Loop

Four hard stops. Nothing moves forward until the current phase is complete. The stops are not bureaucracy - they are the entire point.
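The hard-stop rule is easy to state as a tiny state machine. The sketch below is only an illustration of the rule, assuming TypeScript; it is not how the slash commands are implemented, and `LoopState`, `startLoop`, `approve`, and `advance` are hypothetical names.

```typescript
// The four phases in order, named after the sections of this post.
const PHASES = ["grilling", "prd", "tdd", "review"] as const;
type Phase = (typeof PHASES)[number];

interface LoopState {
  current: Phase;
  // Phases a human has explicitly signed off on.
  approved: Phase[];
}

function startLoop(): LoopState {
  return { current: "grilling", approved: [] };
}

// Record human approval of the current phase.
function approve(state: LoopState): LoopState {
  if (state.approved.indexOf(state.current) !== -1) return state;
  return {
    current: state.current,
    approved: state.approved.concat(state.current),
  };
}

// The hard stop: moving on is refused until the current phase is approved.
function advance(state: LoopState): LoopState {
  if (state.approved.indexOf(state.current) === -1) {
    throw new Error(`Phase "${state.current}" has not been approved yet`);
  }
  const next = PHASES[PHASES.indexOf(state.current) + 1];
  if (next === undefined) {
    throw new Error("Already at the final phase");
  }
  return { current: next, approved: state.approved };
}
```

`advance(startLoop())` throws, while `advance(approve(startLoop()))` moves to the PRD phase. There is no code path that skips the approval.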

Phase 1 - Grilling

Triggered by: /grill-with-docs build <feature>

The AI reads all your documentation - system-design.md, CLAUDE.md, CONTEXT.md, the relevant screenshots - then interrogates you about the feature. One question at a time. It waits for your answer before asking the next.

For a frontend feature:
What font size is the heading? What happens on hover? What does empty state look like? Which screenshot does this match?

For a backend feature:
What does the request body look like? What HTTP status does a validation error return? Is this endpoint authenticated? What happens when the foreign key does not exist?

For a full-stack feature:
What is the API contract between frontend and backend? Who owns error handling? What does the frontend show when the request is pending? When it fails?

Every question answered here is a bug that never gets written in Phase 3. The phase ends when a task spec file is written at docs/tasks/<feature>.md. Then a hard stop. No code, nothing in src/.
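The answers to the backend and full-stack questions above can be written down as types before any implementation exists. This is a sketch under assumptions: the status codes, error names, and messages are example answers, not a recommendation, and `ApiResult`, `RequestState`, and `toRequestState` are hypothetical names.

```typescript
// What the backend may return for one endpoint, as agreed during grilling.
type ApiResult<T> =
  | { status: 201; data: T }
  | { status: 401; error: "unauthenticated" }
  | { status: 404; error: "related_record_not_found" }
  | { status: 422; error: "validation_failed"; fields: string[] };

// Everything the frontend is allowed to show for this request.
type RequestState<T> =
  | { kind: "idle" }
  | { kind: "pending" }
  | { kind: "success"; data: T }
  | { kind: "failed"; message: string };

// Map each agreed backend outcome to exactly one frontend state.
// The switch is exhaustive, so adding a new status to ApiResult
// becomes a compile error here until it is handled.
function toRequestState<T>(result: ApiResult<T>): RequestState<T> {
  switch (result.status) {
    case 201:
      return { kind: "success", data: result.data };
    case 401:
      return { kind: "failed", message: "Please sign in again." };
    case 404:
      return { kind: "failed", message: "The linked record no longer exists." };
    case 422:
      return {
        kind: "failed",
        message: `Check these fields: ${result.fields.join(", ")}`,
      };
  }
}
```

Who owns error handling stops being a conversation and becomes a type. If the backend later adds a status, the frontend mapping fails to compile instead of failing in production.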

"Every question answered in Phase 1 is a bug that never gets written in Phase 3."

Phase 2 - PRD

Triggered by: /prd or /to-prd

The AI synthesizes everything from the grilling session into a Product Requirements Document. Problem statement, user stories, implementation decisions, API contracts, data shapes, what is explicitly out of scope.

You review it. If something is wrong you flag it. If it aligns you say so. Then another hard stop.
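The PRD has a predictable shape, which makes it easy to review. The sketch below models a subset of the sections listed above and renders them as Markdown. It only illustrates the shape of the document, not what /prd does internally, and the `Prd` field names, `section`, and `renderPrd` are hypothetical.

```typescript
// A subset of the PRD sections described above; field names are hypothetical.
interface Prd {
  title: string;
  problemStatement: string;
  userStories: string[];
  implementationDecisions: string[];
  outOfScope: string[];
}

// Render a bulleted section, making empty sections visible instead of silent.
function section(heading: string, items: string[]): string {
  const body =
    items.length > 0
      ? items.map((item) => `- ${item}`).join("\n")
      : "- (none recorded)";
  return `## ${heading}\n${body}`;
}

function renderPrd(prd: Prd): string {
  return [
    `# ${prd.title}`,
    `## Problem statement\n${prd.problemStatement}`,
    section("User stories", prd.userStories),
    section("Implementation decisions", prd.implementationDecisions),
    section("Out of scope", prd.outOfScope),
  ].join("\n\n");
}
```

The empty-section marker is a deliberate choice here. An "Out of scope" list with nothing in it is usually a sign that the grilling session missed something, and it should be obvious during review.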

With GitHub MCP connected, the PRD is automatically submitted as a GitHub issue. Every feature becomes a tracked issue before a single line of code is written.

# Connect GitHub MCP once
claude mcp add --transport http github https://api.githubcopilot.com/mcp/ \
  --header "Authorization: Bearer YOUR_GITHUB_TOKEN"

Now /prd creates the spec and opens the issue. When Phase 4 finishes and you approve the commit, the AI creates a PR linked to that issue. Your entire feature lifecycle - spec, implementation, review, merge - tracked in GitHub without leaving the terminal.

Phase 3 - TDD

Triggered by: /tdd build <feature>

All the code is written here. The cycle is red → green → refactor. Write a failing test, write the minimum code to pass it, clean up, move to the next behavior.
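One turn of the cycle, sketched in TypeScript. To keep the example runnable without a test framework, it uses a small hand-written helper; in a real project the checks would live in whatever test runner the project already uses. `expectEqual`, `runTests`, and `validateEmail` are hypothetical names.

```typescript
// Minimal test helper so the example runs without a test framework.
function expectEqual<T>(actual: T, expected: T, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${String(expected)}, got ${String(actual)}`);
  }
}

// RED: these checks are written first. They fail until validateEmail
// exists and behaves as the spec says.
function runTests(): void {
  expectEqual(validateEmail("dev@example.com"), null, "valid address");
  expectEqual(validateEmail(""), "Email is required", "empty input");
  expectEqual(validateEmail("not-an-email"), "Email is invalid", "missing @");
}

// GREEN, then REFACTOR: the smallest implementation that passes, with the
// pattern pulled out into a named constant during cleanup.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Returns an error message, or null when the input is acceptable.
function validateEmail(input: string): string | null {
  const value = input.trim();
  if (value === "") return "Email is required";
  return EMAIL_PATTERN.test(value) ? null : "Email is invalid";
}

runTests();
```

The pattern is intentionally loose. It covers the three behaviors the tests name and nothing more, which is the "minimum code to pass" part of the cycle. The next behavior gets its own failing test first.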

For a frontend component:
Types first, then data, then component, then wired into the page. After the build passes, Playwright MCP opens the browser automatically, takes a screenshot, compares it against your design reference, lists every discrepancy, fixes them, and re-screenshots. It repeats until the result matches the design in both light and dark mode.

For a backend endpoint:
Types first, then service logic, then controller, then route registration. Tests cover happy path, validation errors, authentication failures, and edge cases from the PRD.
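The layering is easier to see in code. The sketch below follows the same order with no web framework, so every name is hypothetical (`UserService`, `createUserController`, `registerRoutes`) and the status codes are example answers from a grilling session, not a rule.

```typescript
// 1. Types first.
interface NewUser {
  email: string;
  role: string;
}
interface User extends NewUser {
  id: number;
}
type CreateResult = { ok: true; user: User } | { ok: false; reason: string };
interface HttpResponse {
  status: number;
  body: unknown;
}

// 2. Service logic: business rules only, no HTTP knowledge.
class UserService {
  private users: User[] = [];

  create(input: NewUser): CreateResult {
    if (input.email.indexOf("@") === -1) {
      return { ok: false, reason: "email is invalid" };
    }
    const user: User = {
      id: this.users.length + 1,
      email: input.email,
      role: input.role,
    };
    this.users.push(user);
    return { ok: true, user };
  }
}

// 3. Controller: translates between HTTP and the service.
function createUserController(
  service: UserService,
  authToken: string | undefined,
  body: NewUser
): HttpResponse {
  if (!authToken) return { status: 401, body: { error: "unauthenticated" } };
  const result = service.create(body);
  return result.ok
    ? { status: 201, body: result.user }
    : { status: 422, body: { error: result.reason } };
}

// 4. Route registration: a plain lookup table stands in for a real router.
type Handler = (authToken: string | undefined, body: NewUser) => HttpResponse;

function registerRoutes(service: UserService): Record<string, Handler> {
  return {
    "POST /users": (token, body) => createUserController(service, token, body),
  };
}
```

Each layer can be tested on its own: the service without HTTP, the controller with a fresh service, the route table by key. That is what makes the happy path, validation, and authentication tests cheap to write.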

For an API integration:
Service layer built and tested first with mocked responses. Real API connected. Error states, loading states, empty states all verified against the PRD.

Nothing is committed yet. Phase 4 happens first.

Phase 4 - Review

Triggered by: /review

After /tdd finishes, the AI audits everything it just built against your CLAUDE.md rules.

Hardcoded values that should be tokens? Flagged.
Missing TypeScript types? Flagged.
Components that duplicate existing primitives? Flagged.
API calls missing error handling? Flagged.
Accessibility issues? Flagged.

Then it fixes everything it flagged. The diff you approve is clean code - not just working code.
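Some of these checks are mechanical enough to sketch. The function below is a deliberately naive example of the first one, assuming component files should reference tokens such as var(--color-accent) and never a raw hex value. It is not what /review runs; `Finding` and `findHardcodedColors` are hypothetical names.

```typescript
interface Finding {
  line: number; // 1-based line number in the scanned source
  value: string; // the hex literal that was found
}

// Report every 3- or 6-digit hex color literal in a source string.
// Naive on purpose: it does not understand comments, 8-digit hex,
// or URL fragments that happen to look like colors.
function findHardcodedColors(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    const pattern = /#[0-9a-fA-F]{6}\b|#[0-9a-fA-F]{3}\b/g;
    let match = pattern.exec(text);
    while (match !== null) {
      findings.push({ line: index + 1, value: match[0] });
      match = pattern.exec(text);
    }
  });
  return findings;
}
```

Run against a component file, an empty result means every color goes through a token. A non-empty result is the "hardcoded values that should be tokens" flag, with line numbers attached.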

After /review passes, the AI proposes a commit message, shows the full diff, and waits. You review, approve, and it commits - or creates a PR linked to the GitHub issue automatically.

The Full Setup

Install once globally:

# Claude Code CLI
npm install -g @anthropic-ai/claude-code

# Matt Pocock's skills - /grill-with-docs, /tdd, /prd, /review
npx openskills install mattpocock/skills --global

# Playwright MCP - real browser for visual verification
claude mcp add --transport stdio playwright -- npx -y @playwright/mcp@latest

# GitHub MCP - optional, for automatic issues and PRs
claude mcp add --transport http github https://api.githubcopilot.com/mcp/ \
  --header "Authorization: Bearer YOUR_GITHUB_TOKEN"

Per project - four files:

docs/designs/system-design.md   ← design tokens
docs/designs/ss/*.png            ← feature screenshots
CLAUDE.md                        ← project rules
CONTEXT.md                       ← domain language

Per feature - four commands:

/grill-with-docs build <feature>   # questions + task spec → STOP
/prd                                # PRD + GitHub issue   → STOP
/tdd build <feature>                # build + Playwright   → STOP
/review                             # quality gate → commit approval

What This Looks Like End to End

# Building a user management feature

/grill-with-docs build user management
# AI reads CLAUDE.md, CONTEXT.md, system-design.md
# Asks: what fields does a user have? what roles exist?
# what does the list page look like? (reads screenshot)
# what happens when you delete a user with active sessions?
# ... 10 questions total, you answer each one
# writes docs/tasks/user-management.md
# STOP

/prd
# synthesizes task spec into full PRD
# creates GitHub issue #47 automatically
# STOP

/tdd build user management
# writes failing tests
# builds UserService, UserController, routes
# builds UserTable, UserForm components
# Playwright screenshots list page, compares against design
# fixes spacing mismatch on row height
# re-screenshots, matches - done
# STOP

/review
# flags: UserForm missing loading state from PRD
# flags: one hardcoded hex color in UserTable
# fixes both
# proposes commit: "feat: add user management (#47)"
# you approve
# PR created, linked to issue #47

One feature. Four commands. Complete audit trail. Zero surprises.

Why The Hard Stops Matter

Most AI-assisted workflows fail because the AI keeps moving. It generates code while you are still figuring out what you want. By the time you realize the direction is wrong, there is significant code to undo.

Forcing a stop after grilling means the spec is locked before implementation. Forcing a stop after the PRD means the contract is signed before any code is written. Forcing a stop after TDD means the code is reviewed before it is committed.

By Phase 4 there are no open questions - just a quality check on clean execution.

The Playwright visual check closes the quality loop for frontend. The test suite closes it for backend. The /review phase closes it for code quality. Without automated verification, "looks good" and "seems to work" are the standards. With it, the check is objective and repeatable.

The Numbers

Gartner predicts that by 2028, 75% of enterprise software engineers will use AI code assistants, up from less than 10% in early 2023. Claude Code usage has grown 340% year over year according to Anthropic's 2026 developer report. The percentage of professional developers using AI in their daily workflow crossed 60% in Q1 2026 for the first time.

But the shift that is coming is not about more developers using AI. It is about developers using AI differently - moving from ad-hoc prompting to structured workflows, from chatbox conversations to disciplined collaboration.

The four-phase loop is one version of that structure. The specific commands will evolve. Better tools will emerge. But the underlying discipline - front-load decisions, lock the spec, verify automatically, review before committing - that will remain.

Because that discipline is not about AI. It is about how good software gets built. AI just makes the execution faster.

The Honest Part

This workflow only works if you can answer the grilling questions. Phase 1 exposes every gap in your thinking. If you do not know what you are building, the AI cannot figure it out for you.

AI removes friction between thinking and shipping. It does not replace the thinking.

We spent years learning to write code. The next skill is learning to direct it.

Start Here

1. Take screenshots of your current project's designs
2. Write a system-design.md extracting the tokens
3. Write a CLAUDE.md - ten minutes, basics only
4. Run /grill-with-docs on your next feature
5. Answer every question honestly

The first session will feel slower than prompting and copying. The second will feel natural. By the third you will not want to go back.

Software engineer, 4+ years building web applications.


DEV Community