Smarter Models Made My Workflow Stricter, Not Looser

Hagag — Mon, 01 Jun 2026 10:50:34 +0000

ahmedalaahagag / agentic-os

A lightweight operational template for AI-assisted product engineering with lanes, specs, tickets, handovers, verification, and project memory.

Agentic OS

Agentic OS is the working repo structure for applying the Product Engineer Handbook workflow.

It is mainly designed for multi-repo products where planning, product decisions, execution, verification, and project memory need to stay coordinated without turning one AI chat into the source of truth.

Use lanes, plans, specs, tickets, handovers, verification, archives, and project memory to move from idea to shipped product without relying on one endless AI chat.

For the reasoning behind this workflow, read the Product Engineer Handbook.

Quick Start

Copy this repository as your product meta repo.
Keep your implementation repositories beside it.
Pick the right lane for the work.
Write a plan or specification.
Decompose the work into small tickets.
Create a focused handover for one ticket.
Open the target implementation repository.
Start the execution model from that repo with the handover.
Verify the result…

Smarter Models, Stricter Workflows

What building real multi-repo software with Claude, Codex, Cursor, DeepSeek, and other AI coding tools taught me about agentic coding, token discipline, and why agents need an operating system.

The assumption I had wrong

When I first started using AI agents heavily for real product engineering work, I assumed the path was obvious: give the model more context, give it more autonomy, and let it handle bigger chunks of work.

That assumption was wrong.

The more capable the models became, the more dangerous a loose workflow became. Long sessions drifted. Big prompts expanded scope. Multi-agent experiments created weak handoffs. Expensive models burned tokens doing work that cheaper models could have done if the task had been shaped properly.

The problem was not that AI coding agents were useless.

The problem was worse: they were useful enough to create expensive chaos.

That is why I built the Product Engineer Handbook and Agentic OS.

The real failure mode

Most discussions about AI coding focus on model intelligence: which model is best, which benchmark is higher, which tool edits code faster.

That matters, but it is not the main failure mode I hit.

The main failure mode was operational.

Problem	What happened
Long-running sessions	Context became muddy and old decisions stayed alive too long.
Broad prompts	The model invented scope instead of executing the intended task.
Multi-agent experiments	Agents produced output, but handoffs were weak and ownership was unclear.
Repo-wide exploration	Tokens were spent reading more than was needed.
Autonomous coding	The result still needed human cleanup, review, and verification.
Tool switching	Workflow optimization became a distraction from shipping.
Premium-model execution	Expensive reasoning models did mechanical work.

AI did not remove engineering management.

It moved engineering management closer to the code.

The engineer became the product manager, architect, reviewer, context manager, cost controller, QA gate, and release owner. The model could execute, but only if the surrounding system made the work bounded, inspectable, and verifiable.

Why I needed a playbook

Prompts were not enough.

A prompt is an instruction. A playbook is an operating discipline.

The Product Engineer Handbook exists because AI-assisted product work needs repeatable answers to basic questions:

Question	Playbook answer
What should the agent do?	Use small, scoped, reviewable tickets.
What should the agent not touch?	Define boundaries before execution.
What context does the agent need?	Provide only the relevant plan, spec, ticket, and handover.
How is work verified?	Use deterministic checks before judgment calls.
Who owns the decision?	The human engineer. Always.
When should the session end?	After one bounded task is completed, reviewed, and archived.

The handbook is the reasoning layer. It explains the rules, tradeoffs, and delivery model behind the workflow.

Agentic OS is the operational layer. It turns that reasoning into a copyable repository structure.
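
To make "define boundaries before execution" concrete, here is a minimal sketch of a scope check. The `Ticket` shape, its field names, and the example paths are all hypothetical and are not taken from the Agentic OS templates. The idea is only that a ticket can name what may be edited, and a deterministic check can flag anything outside that.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class Ticket:
    """Hypothetical ticket shape; field names are illustrative only."""
    ticket_id: str
    goal: str
    allowed_paths: tuple[str, ...]          # glob patterns the agent may edit
    forbidden_paths: tuple[str, ...] = ()   # glob patterns that are always off limits


def out_of_scope(ticket: Ticket, changed_files: list[str]) -> list[str]:
    """Return the changed files that violate the ticket's boundaries."""
    violations = []
    for path in changed_files:
        allowed = any(fnmatch(path, pat) for pat in ticket.allowed_paths)
        forbidden = any(fnmatch(path, pat) for pat in ticket.forbidden_paths)
        if forbidden or not allowed:
            violations.append(path)
    return violations


# Example data is invented for illustration.
ticket = Ticket(
    ticket_id="T-001",
    goal="Add rate limiting to the login endpoint",
    allowed_paths=("src/auth/*", "tests/auth/*"),
    forbidden_paths=("src/auth/secrets.py",),
)
changed = ["src/auth/login.py", "src/billing/invoice.py", "src/auth/secrets.py"]
print(out_of_scope(ticket, changed))
```

In practice the list of changed files could come from the diff the agent produced. A non-empty result is a reason to reject the change before anyone spends time reviewing it.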

Why I needed an Agentic OS

Agentic OS is not an automation framework.

It is a control plane for AI-assisted product engineering.

Its purpose is not to make agents independent. Its purpose is to make them constrained, inspectable, and replaceable.

The structure is intentionally boring:

Layer	Purpose
Lanes	Separate product, coding, UI/UX, marketing, and release work.
Plans	Capture the direction before execution starts.
Specs	Turn direction into concrete behavior.
Tickets	Break work into small units.
Handovers	Pass focused context to the execution model.
Verification	Prove what changed and how it was checked.
Lessons	Convert repeated mistakes into standing rules.
Archives	Keep active context small and durable.

That structure matters because AI coding does not fail only when the model is weak.

It also fails when the model is strong but unconstrained.

A strong model with vague instructions can produce a lot of plausible work very quickly. That is useful only when the work is scoped correctly. Otherwise, it creates review debt.

The meta-repo pattern

For multi-repo products, implementation often spans backend services, mobile apps, admin tools, web surfaces, infrastructure, and planning artifacts.

A single AI chat is the wrong source of truth for that kind of product.

So Agentic OS uses a meta-repo pattern:

workspace/
  agentic-os/       # planning, specs, tickets, handovers, memory
  product-server/   # backend implementation
  product-mobile/   # mobile implementation
  product-web/      # web app or landing page

Planning happens in the meta repo.

Execution happens in the target implementation repo.

The handover connects the two.

That separation is important. If the execution model starts in the meta repo and scans everything, the workflow has already failed. The model should receive a focused handover, open the target implementation repo, inspect the named files, make the change, run verification, and stop.
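
The handover step above can be sketched as a small data structure. This is an assumption about what a handover might contain, not the actual Agentic OS template: the `Handover` class, its fields, and the example ticket are hypothetical. It shows the two properties the pattern depends on: the target repo is a sibling of the meta repo, and the brief names only the files and checks the execution model needs.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Handover:
    """Hypothetical handover shape; the real template may differ."""
    ticket_id: str
    target_repo: str            # sibling directory of the meta repo
    files: tuple[str, ...]      # the only files the execution model should open
    task: str
    verify_command: str         # deterministic check to run before stopping


def resolve_target(meta_repo: Path, handover: Handover) -> Path:
    """Implementation repos live beside the meta repo, not inside it."""
    return meta_repo.parent / handover.target_repo


def render(handover: Handover) -> str:
    """Render the handover as the short brief given to the execution model."""
    lines = [
        f"Ticket: {handover.ticket_id}",
        f"Repo: {handover.target_repo}",
        f"Task: {handover.task}",
        "Files:",
        *[f"  - {name}" for name in handover.files],
        f"Verify: {handover.verify_command}",
        "Stop when verification passes. Do not touch other files.",
    ]
    return "\n".join(lines)


# Example values are invented for illustration.
handover = Handover(
    ticket_id="T-001",
    target_repo="product-server",
    files=("src/auth/login.py", "tests/auth/test_login.py"),
    task="Add rate limiting to the login endpoint",
    verify_command="pytest tests/auth",
)
print(resolve_target(Path("workspace/agentic-os"), handover))
print(render(handover))
```

The rendered brief contains nothing from the meta repo beyond the ticket itself, which is what keeps the execution session small and disposable.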

Model routing matters more than model loyalty

The goal is not to find one best model.

The goal is to avoid using the best model for the wrong job.

A practical routing model looks like this:

Work type	Model/tool role
Product direction	Strong reasoning model with human judgment.
Architecture	Premium model, narrow scope.
Ticket writing	Strong model, then human review.
Defined execution	Coding-focused or cheaper model.
Mechanical edits	Cheapest reliable option.
Code review	Strong model with explicit diff context.
Repo-wide audit	Rare, tightly scoped dynamic workflow.
Final decision	Human engineer.

Expensive models should think.

Cheaper models can grind when the task is already shaped.

This is the opposite of how many people start. They open the strongest model, give it the whole repo, ask it to "fix the thing," and then wonder why the bill grows and the output needs cleanup.
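
The routing table above can be written down as a lookup so the choice is made before a session starts rather than by habit. This is a rough sketch: the tier names, the work-type keys, and the fallback to a human for unrecognized work are my assumptions, not part of Agentic OS.

```python
from enum import Enum


class Tier(Enum):
    """Hypothetical tiers; map them to whatever models you actually use."""
    PREMIUM = "premium model, narrow scope"
    STRONG = "strong reasoning model"
    CODING = "coding-focused or cheaper model"
    CHEAPEST = "cheapest reliable option"
    DYNAMIC = "rare, tightly scoped dynamic workflow"
    HUMAN = "human engineer"


# Keys mirror the work types in the routing table.
ROUTES = {
    "product_direction": Tier.STRONG,   # always paired with human judgment
    "architecture": Tier.PREMIUM,
    "ticket_writing": Tier.STRONG,      # followed by human review
    "defined_execution": Tier.CODING,
    "mechanical_edit": Tier.CHEAPEST,
    "code_review": Tier.STRONG,         # given explicit diff context
    "repo_wide_audit": Tier.DYNAMIC,
    "final_decision": Tier.HUMAN,
}


def route(work_type: str) -> Tier:
    """Pick a tier for a work type.

    Assumption: work that does not match a known type is unshaped,
    so it goes back to a human instead of to the strongest model.
    """
    return ROUTES.get(work_type, Tier.HUMAN)


print(route("mechanical_edit").value)
print(route("fix the thing").value)
```

The fallback is the design choice worth noting: a request like "fix the thing" does not match any shaped work type, so the sketch sends it back for shaping rather than spending premium tokens on it.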

Dynamic workflows make discipline more important

Dynamic workflows and subagents are impressive.

They are also dangerous when used casually.

Parallelism does not fix unclear intent. It multiplies it.

A hundred subagents with weak instructions do not produce leverage. They produce distributed ambiguity.

Dynamic workflows fit some kinds of work and not others:

Good fit	Bad fit
Repo-wide audits	Vague feature work
Migration analysis	Open-ended product decisions
Consistency checks	Routine small edits
Multi-file impact mapping	"Go improve this" prompts
Cross-checking assumptions	Unbounded exploration

The stronger the agent, the more important the operating model becomes.

Better models do not remove the need for scope, context limits, handoffs, verification, and review.

They make the absence of those things more expensive.

What changed after adding the system

Agentic OS did not make AI coding effortless.

It made the effort legible.

The useful changes were practical:

Work started from a lane instead of a vague prompt.
Plans and specs became durable artifacts instead of chat history.
Tickets became smaller and easier to review.
Handovers reduced context bloat.
Execution sessions became disposable.
Verification became part of the workflow, not an afterthought.
Lessons turned into rules.
Expensive models were reserved for reasoning-heavy work.
Cheap execution became safer because tasks were better shaped.

The result was not more autonomy.

The result was more control.

The main lessons

These are the rules I would keep even as models improve:

Autonomy without constraints is expensive.
Context is a budget, not a dumping ground.
Small tickets beat giant prompts.
Handoffs matter more than model intelligence.
Expensive models should reason, not grind.
Cheap models are useful when the work is already shaped.
Deterministic verification should happen before subjective review.
The human remains the scheduler, judge, and owner.
Agentic workflows need boring engineering discipline.
The final responsibility always belongs to the engineer.
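
The rule that deterministic verification comes before subjective review can be sketched as a small gate. The function names and the placeholder commands below are hypothetical. In a real repo the commands would be the project's own test, lint, and type-check commands.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool
    output: str


def run_checks(checks: dict[str, list[str]]) -> list[CheckResult]:
    """Run each command in order and stop at the first failure."""
    results = []
    for name, command in checks.items():
        proc = subprocess.run(command, capture_output=True, text=True)
        result = CheckResult(name, proc.returncode == 0, proc.stdout + proc.stderr)
        results.append(result)
        if not result.passed:
            break
    return results


def ready_for_review(results: list[CheckResult]) -> bool:
    """Human review starts only when every deterministic check passed."""
    return bool(results) and all(r.passed for r in results)


# Placeholder commands so the sketch runs anywhere;
# swap in real test, lint, and type-check commands.
checks = {
    "passing check": [sys.executable, "-c", "print('ok')"],
    "failing check": [sys.executable, "-c", "raise SystemExit(1)"],
    "never reached": [sys.executable, "-c", "print('skipped')"],
}
results = run_checks(checks)
for r in results:
    print(f"{r.name}: {'pass' if r.passed else 'FAIL'}")
print("ready for review:", ready_for_review(results))
```

The gate only decides whether a change is worth a human's attention. The judgment call itself still belongs to the engineer.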

The point

I did not build Agentic OS because agents are magic.

I built it because agents are powerful enough to need management.

The future of AI coding is not one giant agent doing everything. It is a disciplined system where humans define intent, agents execute bounded work, and every step is cheap enough, clear enough, and reviewable enough to trust.

Smarter models did not make my workflow looser.

They made it stricter.

What’s your experience?

Do you find that AI agents have simplified your workflow, or have you also had to implement stricter controls to avoid "context drift"? I’m curious to hear how others are managing the overhead of agentic coding. Let's discuss in the comments!

DEV Community: Hagag