Many challenges in AI products stem less from choosing frameworks and more from how user experience (UX) and architecture shape each other.
I first noticed this while using ChatGPT to draft and maintain product requirement documents (PRDs) — reusing prompt variants, manually curating context, and constantly tweaking outputs to stay aligned. The workflow technically worked, but it felt brittle and overly manual.
That experience raised a question: What might it take for an agentic workflow — a coordinated system of specialized LLM sub-agents orchestrated by code rather than a single prompt — to produce and maintain a complex artifact like a PRD without so much manual prompting, context oversight, and guesswork? More broadly, how could changes in architecture and UX design improve usability, predictability, and trust in such systems?
This write-up shares early technical and UX explorations behind building an agentic workflow for structured artifacts, using PRDs as the initial testbed. The goal wasn’t to ship a product but to experiment — testing hypotheses about orchestration, context design, and agent-centric UX. Although exploratory, the lessons may apply to multi-step, artifact-producing AI workflows in general.
Terminology and scope: This is an agentic workflow—LLMs and tools orchestrated through predefined code paths—rather than an autonomous agent that directs its own tools. The hypotheses here are exploratory and intended to inform agents and AI‑powered products more broadly; this project is a working lab, not a shipped product.
Table of Contents
- Audience and Stack
- Iteration 0 – Initial Architecture
- Iteration 1 – Context Awareness
- Iteration 2 – UX-Driven Agentic Workflow
- Creation vs. Editing
- Working Hypotheses
- Conclusion
- References
Audience and Stack
- Audience: AI engineers, architects, product designers, and UX practitioners working on multi-step or long-running agentic workflows and AI-powered products.
- Stack: TypeScript monorepo, Vercel AI SDK, Zod schemas, Next.js frontend, OpenRouter integration.
- Repository: github.com/eabait/product-agents
- Influences: Anthropic on orchestration and context engineering [1, 2], Breunig on context failures and curation [3, 4], Luke Wroblewski on UX patterns in AI [5, 6], and Jakob Nielsen on wait-time transparency and “Slow AI” [7].
Iteration 0 – Initial Architecture
The first version established the core shape of the workflow, introducing three key roles within the agentic system:
- Analyzer - a subagent that extracts or classifies structured information.
- Writer - a subagent responsible for generating a specific section of the PRD.
- Orchestrator - the controller that coordinates analyzers and writers across the workflow. It is implemented directly in code without an LLM (as opposed to the definition given by Anthropic [1]).
Together, these components formed the foundation of the agentic workflow (a rough orchestration sketch follows this list):
- Clarification subagent acts as a gate before analysis — Ensures there’s enough grounding to write anything; the system asks 0–3 targeted questions, then proceeds or returns with gaps. Example: if “target users” is missing, it prompts for personas and primary jobs-to-be-done before any writer runs.
- Centralized analyzers: Context, Risks, Summary — Consolidates one-pass extraction into a structured bundle reused by all writers, avoiding repeated reasoning and drift. Example: the risk list produced once is consumed by multiple sections needing constraints/assumptions.
- Multiple section writers (e.g., context, problem, assumptions, metrics) — Decouples generation so sections can evolve independently and merge deterministically. Example: in a PRD or similar structured artifact, only the Metrics writer reruns when you request “tighten success metrics.”
- Dual paths: full PRD generation and targeted edits — Selects creation vs. editing based on the inputs and the presence of an existing PRD to improve efficiency and stability. Example: if a prior PRD is supplied and the request says “update constraints,” only that slice is scheduled.
- Shared-analysis caching to avoid redundant analyzer runs — Keys analyzer outputs by inputs so subsequent edits reuse results without recomputing. Example: copy tweaks reuse the same context summary instead of re-extracting from scratch.
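As a rough illustration of this shape, the orchestrator can be plain TypeScript that caches the shared analysis and fans out to section writers. This is a minimal sketch under assumed names (`AnalysisBundle`, `runAnalyzers`, and `SectionWriter` are illustrative, not the repository’s actual API):

```typescript
// Minimal sketch of a code-only orchestrator: analyzers run once, writers reuse
// the shared bundle. All identifiers are illustrative, not the repo's real API.
import { createHash } from "node:crypto";

interface AnalysisBundle {
  context: string;
  risks: string[];
  summary: string;
}

type SectionWriter = (analysis: AnalysisBundle, request: string) => Promise<string>;

const analysisCache = new Map<string, AnalysisBundle>();

// Key analyzer output by its inputs so follow-up edits reuse prior analysis.
const cacheKey = (inputs: string) =>
  createHash("sha256").update(inputs).digest("hex");

async function orchestrate(
  request: string,
  writers: Record<string, SectionWriter>,
  runAnalyzers: (request: string) => Promise<AnalysisBundle>,
): Promise<Record<string, string>> {
  const key = cacheKey(request);
  let analysis = analysisCache.get(key);
  if (!analysis) {
    analysis = await runAnalyzers(request); // Context, Risks, Summary in one pass
    analysisCache.set(key, analysis);
  }

  // Every section writer consumes the same shared analysis bundle.
  const sections: Record<string, string> = {};
  for (const [name, write] of Object.entries(writers)) {
    sections[name] = await write(analysis, request);
  }
  return sections;
}
```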
Early issues still surfaced:
- Analyzer overhead and coupling increased latency
- Early UI offered limited visibility into the workflow’s steps
- The edit path existed but lacked the confidence signals and telemetry needed to guide edits
These friction points informed the next iteration, which focused on improving context handling and reducing latency.
Iteration 1 – Context Awareness
Applying ideas from Anthropic and Breunig, the workflow evolved toward planned cognition and curated context (a code sketch follows this list):
- Five-section PRD redesign (Target Users, Solution, Features, Metrics, Constraints) — Aligns the artifact with audience-facing sections and reduces ambiguity about ownership. Example: “add OKRs” maps cleanly to the Metrics section.
- Parallel section writers atop a shared ContextAnalyzer result — Lowers latency and coupling by letting independent writers run concurrently on the same structured inputs. Example: Solution and Features complete in parallel.
- SectionDetectionAnalyzer subagent for edit routing and autodetecting existing PRDs — Interprets requests and selects the affected sections; if a PRD is present, it defaults to editing. Example: “tighten constraints about latency” routes only to Constraints.
- Confidence metadata per section to aid UX transparency — Each output carries a confidence hint so the UI can flag fragile changes. Example: low confidence on Personas nudges the user to review.
- Modularized pipeline helpers (clarification check, shared analysis, parallel sections, assembly) — Improves maintainability and testability; responsibilities are isolated so new writers or analyzers slot in without side effects.
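A hedged sketch of the shared-analysis-plus-parallel-writers shape, using the Vercel AI SDK’s `generateObject` with a Zod schema; the schema fields, prompt, and model name are assumptions for illustration, not the project’s actual contracts:

```typescript
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai"; // any AI SDK provider works; the repo integrates OpenRouter
import { z } from "zod";

// Hypothetical per-section output: content plus a confidence hint for the UI.
const sectionSchema = z.object({
  content: z.string(),
  confidence: z.number().min(0).max(1),
});

async function writeSection(section: string, sharedAnalysis: string) {
  const { object } = await generateObject({
    model: openai("gpt-4o-mini"),
    schema: sectionSchema,
    prompt: `Write the "${section}" section of a PRD based on this analysis:\n${sharedAnalysis}`,
  });
  return { section, ...object };
}

// Independent writers run concurrently on the same ContextAnalyzer result.
async function writeAllSections(sharedAnalysis: string) {
  const sections = ["Target Users", "Solution", "Features", "Metrics", "Constraints"];
  return Promise.all(sections.map((s) => writeSection(s, sharedAnalysis)));
}
```

Because each writer returns a structured object with a confidence field, the UI can flag fragile sections without parsing free-form text.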
Once the workflow became context-aware, the next challenge was making its cognition visible to users — bringing UX principles directly into orchestration.
Iteration 2 – UX-Driven Agentic Workflow
With context reliability established, the project entered a new phase: aligning architectural choices with the UX principles that would make those inner workings transparent.
At this point, the work shifted toward system legibility. Wroblewski highlights recurring UX gaps in AI around context awareness, capability awareness, and readability [5], and Nielsen emphasizes transparency around wait time for “Slow AI” [7].
These insights suggested that UX requirements should shape orchestration decisions, not just react to them.
What changed and why (a streaming sketch follows this list):
- Streaming progress events — Reduces “Slow AI” ambiguity by emitting real-time updates (see Principle #1: Visible Capabilities + Streaming).
- Configurable workers (per-writer runtime settings) — Allows specialization (e.g., different models/temperatures) while enforcing streaming capability for observability. Example: a concise, extractive analyzer vs. a creative section writer.
- Usage and cost accounting — Surfaces telemetry and burn-rate transparency (see Principle #6: Cost Visibility).
- Edit parity and fallbacks — Heuristics prevent silent no-op edits, and top-level fields stay in sync with partial updates to avoid stale PRDs. Example: editing Constraints also updates the summary if needed.
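One way to wire those progress events end to end is a Next.js route handler that streams Server-Sent Events while the orchestrator runs. The event names and the `runWorkflow` hooks below are made-up placeholders, not the repository’s actual interface:

```typescript
// Sketch: a Next.js App Router route streaming orchestration progress as SSE.
export async function POST(req: Request): Promise<Response> {
  const { prompt } = await req.json();
  const encoder = new TextEncoder();

  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      const send = (event: string, data: unknown) =>
        controller.enqueue(
          encoder.encode(`event: ${event}\ndata: ${JSON.stringify(data)}\n\n`),
        );

      // The orchestrator reports each subagent's phase through callbacks.
      await runWorkflow(prompt, {
        onProgress: (step) => send("progress", step), // e.g. { agent: "ContextAnalyzer", status: "done" }
        onUsage: (usage) => send("usage", usage),     // token and cost telemetry
      });

      send("complete", { ok: true });
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
  });
}

// Placeholder so the sketch type-checks; the real orchestrator lives elsewhere.
declare function runWorkflow(
  prompt: string,
  hooks: { onProgress: (s: unknown) => void; onUsage: (u: unknown) => void },
): Promise<void>;
```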
Key UX Principles and Screens
1. Visible Capabilities + Streaming (addressing “Slow AI”)
Starter affordances clarify what the agentic workflow can do. Streaming exposes long-running steps to reduce ambiguity.
2. Context Awareness and Control
Users can inspect, pin, or exclude context items before generation.

3. Structured Output Instead of “Walls of Text”
Structured components allow partial edits and reduce cognitive load.
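For illustration, a structured PRD payload could be described with a Zod schema along these lines (field names are assumptions, not the project’s actual shape); the UI then renders each section as its own editable component instead of one long text blob:

```typescript
import { z } from "zod";

// Illustrative structured PRD shape: a targeted edit only touches one entry in `sections`.
const prdSchema = z.object({
  title: z.string(),
  summary: z.string(),
  sections: z.array(
    z.object({
      id: z.enum(["target-users", "solution", "features", "metrics", "constraints"]),
      heading: z.string(),
      content: z.string(),
      confidence: z.number().min(0).max(1).optional(),
    }),
  ),
});

type Prd = z.infer<typeof prdSchema>;
```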

4. Inspectability and Control (Configuration Drawer)
Exposes per-subagent toggles—temperature, model choice, context filters—without forcing a detour into config files.

5. Localized Updates (Section-level Editing)
When someone says, “Change user personas to be from LATAM,” the system routes only the Personas writer, preserving other sections.

6. Cost Visibility
Surfaces estimated token usage and dollar cost per run. Engineers care about the burn.

Each UI principle emerges directly from the system’s underlying architecture — together, they form a feedback loop between technical design and user experience.
Architectural Foundations Behind the UX
The UI work only “clicked” once the agent runtime supported it. Every visible affordance required a corresponding architectural move. Implementation details are available in the open-source repository; an illustrative audit-record sketch follows the table.
| UX Principle | Architecture Foundation | How It Works |
|---|---|---|
| Localized edits | Section-level writers | Enables partial regeneration of sections (demonstrated in Principle #5: Localized Updates). |
| Explainability | Orchestrator hooks for intermediate artifacts | The orchestrator emits progress events and returns analyzer payloads before final assembly. These feed the status stream that the UI renders as visible steps, making the system’s reasoning legible. |
| Streaming transparency | Event-based updates | Progress callbacks stream over Server-Sent Events, letting the interface update the timeline and status indicators as each subagent completes — no more opaque spinners while the model “thinks.” |
| Inspectable context | Shared analysis bundle + context registry | Powers the context inspector UI (see Principle #2: Context Awareness and Control). |
| Repeatability | Audit logs and metadata | Each run captures usage metrics, cost estimates, and section-level metadata. The audit trail can be replayed so users can trace what changed, which model handled it, and how many tokens it consumed. |
| Configurable workers | Per-worker runtime settings | Each analyzer and writer can run with its own model configuration, temperature, or parameters, as long as it supports streaming for progress visibility. |
| Edit parity & fallbacks | Heuristic coverage + field sync | Heuristics prevent silent no-op edits; top-level PRD fields stay consistent with edited sections, ensuring partial updates don’t produce stale data. |
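As a concrete, purely illustrative example of the repeatability and cost rows, each run could persist an audit record like the one below; the field names and pricing math are assumptions, not the repository’s schema:

```typescript
// Illustrative per-run audit record backing replayability and cost visibility.
interface RunAuditEntry {
  runId: string;
  timestamp: string; // ISO 8601
  workflow: "full-generation" | "targeted-update";
  sectionsChanged: string[]; // e.g. ["Metrics"]
  model: string; // per-worker model id
  promptTokens: number;
  completionTokens: number;
  estimatedCostUsd: number;
}

// Simple linear estimate: tokens / 1M * provider price per million tokens.
function estimateCostUsd(
  promptTokens: number,
  completionTokens: number,
  inputPricePerMillion: number,
  outputPricePerMillion: number,
): number {
  return (
    (promptTokens / 1_000_000) * inputPricePerMillion +
    (completionTokens / 1_000_000) * outputPricePerMillion
  );
}
```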
Together, these shifts align the system’s internals with what the UI promises — when the interface says “only this section will change” or “here’s the context you’re about to send,” the architecture makes that statement true.
Design Principles for Agentic Workflows
These exploratory principles emerged while iterating on an agentic workflow — intended to be useful for agents and AI-powered products more broadly:
- Expose System Cognition — When the system runs/thinks, show its phases (streaming, intermediate artifacts).
- Let Users Curate Context — Treat context as a user-visible surface.
- Structure the Artifact — Use sections and diffs, not monolithic text.
- Localize Change — Architect so edits update only what changed.
- Make Capabilities Legible — Provide affordances and visible configuration.
- Reduce Waiting Ambiguity — If the system must be slow, it should not be silent.
Creation vs. Editing
There’s no toggle between “create” and “edit” in the UI. Instead, the orchestrator inspects the request—and the presence (or absence) of an existing PRD—to decide whether it should synthesize an entire document or focus on specific sections. That inference is handled by the same subagents we’ve already seen: the clarification analyzer checks if the agent has enough information to write anything, and the section-detection analyzer decides which slices of the artifact need attention.
Confidence signals from section detection are surfaced to help users decide when to intervene.
| Detected Workflow | System Behavior | UX Goal | Typical UX Affordances |
|---|---|---|---|
| Full PRD generation | Multi-step synthesis across every section | Transparency | Clarification loop (up to three passes), context preview, streaming timeline, cost meter |
| Targeted update | Regenerate only the sections flagged by the analyzer | Precision | Section highlights, diff view, rollback controls, warnings when edits ripple into adjacent sections |
How the Orchestrator Makes the Call
- Clarification acts as a gatekeeper: When no prior PRD exists, the orchestrator will loop with the clarifier (up to three times) to gather personas, goals, and constraints before any section writers run. If the user supplies an existing PRD, the clarifier usually stands down because the grounding context is already available.
- Section detection scopes the work: The `section-detection-analyzer` infers intent (“update the LATAM personas”) and hands the orchestrator a targeted section list. Only those section writers get invoked unless the analyzer indicates the request touches multiple areas.
- Shared analysis keeps context in sync: Both scenarios reuse cached analyzer outputs whenever possible. A targeted update will draw from the existing analysis bundle and current PRD text instead of regenerating everything from scratch.
- Audit logs reflect the path taken: When the orchestrator opts for full generation, the audit trail captures every section output and the clarifier’s reasoning. For targeted updates it records before/after diffs, confidence scores, and the sections that actually changed—mirroring what the UI presents.
- Edit parity and fallbacks: Heuristics prevent silent no-op edits and keep top-level PRD fields consistent during partial updates. (A simplified routing sketch follows this list.)
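In a simplified TypeScript sketch, the routing decision reduces to something like the following; the helper names stand in for the clarification and section-detection subagents and are invented for illustration:

```typescript
// Sketch of the creation-vs-editing decision. All names are illustrative.
interface RoutingDecision {
  mode: "full-generation" | "targeted-update";
  sections: string[]; // which section writers to schedule
  needsClarification: boolean;
}

async function routeRequest(request: string, existingPrd?: string): Promise<RoutingDecision> {
  if (!existingPrd) {
    // No prior PRD: gate on clarification, then synthesize every section.
    const gaps = await clarificationCheck(request); // looped up to three times in practice
    return {
      mode: "full-generation",
      sections: ALL_SECTIONS,
      needsClarification: gaps.length > 0,
    };
  }

  // A prior PRD was supplied: let section detection scope the work.
  const affected = await detectAffectedSections(request, existingPrd);
  return { mode: "targeted-update", sections: affected, needsClarification: false };
}

// Hypothetical stand-ins for the clarification and section-detection subagents.
declare const ALL_SECTIONS: string[];
declare function clarificationCheck(request: string): Promise<string[]>;
declare function detectAffectedSections(request: string, prd: string): Promise<string[]>;
```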
So while users don’t flip between modes, the system has a working theory about which workflow they expect. Making that inference explicit—and surfacing it through the UX affordances—has reduced surprises when moving between drafting and maintenance tasks.
Working Hypotheses
- Context is a user-facing product surface. Expose it.
- Streaming is not cosmetic. It is trust-preserving UX for “thinking systems.”
- Agent-driven interactive UI and structured outputs outperform walls of text.
- Creation and editing require different mental models.
- UX and agent orchestration must co-evolve. One cannot be downstream of the other.
Conclusion
This exploration began with a practical frustration described in the introduction: using general-purpose agents like ChatGPT to create and maintain complex documents (like PRDs) required repeating prompts, managing context by hand, and working through long, opaque generation cycles. The core friction wasn’t just in the model, but in the UX around the workflow — hidden state, unclear progress, and outputs that were difficult to iterate on.
Building a domain-specific PRD agent became a way to investigate whether orchestration patterns, context design, and UX choices could reduce that friction. The current version now includes structured outputs, context controls, streaming transparency, and targeted editing — enough functionality that, for this specific use case, it feels like a more effective alternative to a general-purpose chat interface.
The project is still evolving, but the journey so far suggests that UX and architecture designed together—from the first iteration—can meaningfully improve how people collaborate with AI on complex, evolving artifacts.
The next steps will focus on validating these ideas with real users, refining orchestration stability, and exploring new mechanisms for consistency and context evolution. While this implementation centers on PRDs, the underlying principles—legibility, localized change, and user-visible cognition—apply broadly to agentic systems and AI-powered products that coordinate multi-step work.
If you find this useful or want to explore the code, star or contribute to the open-source project at https://github.com/eabait/product-agents.
References
1. Anthropic, “Building Effective Agents.”
2. Anthropic, “Effective Context Engineering for AI Agents.”
3. Dan Breunig, “How Contexts Fail (and How to Fix Them).”
4. Dan Breunig, “How to Fix Your Context.”
5. Luke Wroblewski, “Common AI Product Issues.”
6. Luke Wroblewski, “Context Management UI in AI Products.”
7. Jakob Nielsen, “Slow AI.”


