<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vinh Nguyen</title>
    <description>The latest articles on DEV Community by Vinh Nguyen (@vinh_nguyen_1708).</description>
    <link>https://dev.to/vinh_nguyen_1708</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3849046%2F3ac83255-06cc-478e-8f98-7a3f0a2c25eb.jpg</url>
      <title>DEV Community: Vinh Nguyen</title>
      <link>https://dev.to/vinh_nguyen_1708</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vinh_nguyen_1708"/>
    <language>en</language>
    <item>
      <title>Beyond CLAUDE.md and AGENTS.md: when your coding agent needs a behavior spec</title>
      <dc:creator>Vinh Nguyen</dc:creator>
      <pubDate>Wed, 15 Apr 2026 16:31:47 +0000</pubDate>
      <link>https://dev.to/stewiehq/beyond-claudemd-and-agentsmd-when-your-coding-agent-needs-a-behavior-spec-1bpp</link>
      <guid>https://dev.to/stewiehq/beyond-claudemd-and-agentsmd-when-your-coding-agent-needs-a-behavior-spec-1bpp</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; CLAUDE.md and AGENTS.md are excellent at steering how agents write code. They were never designed to capture what the product promises to do. When agents refactor, extend, or integrate code, they need a behavior spec — not more workflow instructions. That's the layer above instruction files.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Your agent refactors the billing module and changes the grace period from 14 days to 30. The code is cleaner. The tests pass. The product promise is broken.&lt;/p&gt;

&lt;p&gt;Your agent simplifies the auth flow and removes an edge case check. It looked redundant in the code — but it was handling a deliberate compliance requirement that existed nowhere except in your head.&lt;/p&gt;

&lt;p&gt;You had CLAUDE.md or AGENTS.md in the repo. You had coding conventions written down. The agent followed those perfectly — and still broke the product.&lt;/p&gt;

&lt;p&gt;This isn't a model capability problem. The models are better than they've ever been. It's a &lt;strong&gt;context architecture problem&lt;/strong&gt; — instruction files tell agents how to write code, but not what the product promises to do.&lt;/p&gt;

&lt;h2&gt;
  
  
  What CLAUDE.md and AGENTS.md actually are
&lt;/h2&gt;

&lt;p&gt;These files work. They solve a real problem. But it's worth being precise about &lt;em&gt;which&lt;/em&gt; problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CLAUDE.md&lt;/strong&gt; tells Claude Code how to operate in your repo. Use pnpm, not npm. Run tests before committing. Don't modify generated files. Keep responses concise. It's a workflow configuration — process knowledge that changes when your conventions change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AGENTS.md&lt;/strong&gt; does the same for Codex and other agents: coding conventions, build commands, architecture patterns, file organization rules. OpenAI's &lt;a href="https://github.com/openai/codex" rel="noopener noreferrer"&gt;AGENTS.md spec&lt;/a&gt; has been adopted across tens of thousands of repos because it solves the "how to work here" problem well.&lt;/p&gt;

&lt;p&gt;Both are instruction files. They answer: &lt;strong&gt;"How should an agent behave in this repo?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's a useful question. But it's not the question that causes production incidents.&lt;/p&gt;

&lt;h2&gt;
  
  
  The question they don't answer
&lt;/h2&gt;

&lt;p&gt;The question that causes production incidents is: &lt;strong&gt;"What does this product promise to do?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When an agent refactors your billing module, it doesn't need to know whether to use pnpm or npm. It needs to know that the 14-day grace period is a confirmed product decision — not a magic number to clean up. It needs to know that the empty &lt;code&gt;tax_id&lt;/code&gt; field is intentionally blank for compliance reasons — not a bug to fix. It needs to know that the auth flow is deliberately minimal because the permission model hasn't been decided yet — not because nobody got around to adding OAuth.&lt;/p&gt;

&lt;p&gt;CLAUDE.md doesn't have this information. Neither does AGENTS.md. Not because they're badly designed — because they were designed for a different purpose.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where instruction files hit their ceiling
&lt;/h2&gt;

&lt;p&gt;The ceiling isn't one thing. It's a pattern that shows up in three ways:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. No semantic structure.&lt;/strong&gt; CLAUDE.md and AGENTS.md are freeform prose. A human reads them and infers what matters. An agent reads them and treats every line as equally weighted. "Use TypeScript" and "never change the grace period without product owner approval" have the same format — a bullet point. One is a preference. The other is load-bearing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. No trust signal.&lt;/strong&gt; Everything in an instruction file has the same status: written down. There's no way to distinguish a confirmed product decision from a provisional assumption from an active experiment. An agent treats them all as current truth — and they aren't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. No verification path.&lt;/strong&gt; After an agent runs, there's no way to check whether it honored the product constraints. You can lint code style. You can run tests. But "did the agent preserve the billing contract?" requires a human to review the diff line by line and remember every product decision in their head.&lt;/p&gt;

&lt;p&gt;These aren't flaws in CLAUDE.md or AGENTS.md. They're the natural limits of instruction files trying to carry behavior specs they were never built for.&lt;/p&gt;

&lt;h2&gt;
  
  
  What sits above instruction files
&lt;/h2&gt;

&lt;p&gt;The layer above instruction files is a &lt;strong&gt;behavior spec&lt;/strong&gt; — an artifact that captures what the product promises to do, in a format that's both human-reviewable and machine-readable.&lt;/p&gt;

&lt;p&gt;A &lt;a href="https://github.com/stewie-sh/pbc-spec" rel="noopener noreferrer"&gt;behavior spec&lt;/a&gt; (&lt;code&gt;.pbc.md&lt;/code&gt; — formally a Product Behavior Contract) sits in your repo alongside CLAUDE.md and AGENTS.md, but it answers different questions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;CLAUDE.md / AGENTS.md&lt;/th&gt;
&lt;th&gt;.pbc.md&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Answers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"How should the agent work here?"&lt;/td&gt;
&lt;td&gt;"What does the product promise?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Contains&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Conventions, commands, patterns&lt;/td&gt;
&lt;td&gt;Behaviors, rules, states, edge cases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Changes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;When your workflow changes&lt;/td&gt;
&lt;td&gt;When a product decision changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Audience&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Agents + new contributors&lt;/td&gt;
&lt;td&gt;Everyone — product owner, eng, QA, agents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Structure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Freeform prose&lt;/td&gt;
&lt;td&gt;Markdown with typed semantic blocks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here's what the same knowledge looks like in each format:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In CLAUDE.md:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Billing rules&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Grace period is 14 days
&lt;span class="p"&gt;-&lt;/span&gt; Don't change billing logic without approval
&lt;span class="p"&gt;-&lt;/span&gt; Tax ID is required for invoices
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;In a &lt;code&gt;.pbc.md&lt;/code&gt; behavior spec:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Grace period enforcement&lt;/span&gt;

&lt;span class="gu"&gt;### When&lt;/span&gt;
A subscription payment fails

&lt;span class="gu"&gt;### Then&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; System enters a 14-day grace period
&lt;span class="p"&gt;-&lt;/span&gt; User retains full access during grace period
&lt;span class="p"&gt;-&lt;/span&gt; Daily retry attempts are made against the payment method
&lt;span class="p"&gt;-&lt;/span&gt; On day 14, if no successful payment: downgrade to free tier

&lt;span class="gu"&gt;### Invariants&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Grace period duration must be exactly 14 days — not configurable per plan
&lt;span class="p"&gt;-&lt;/span&gt; No data deletion occurs during grace period
&lt;span class="p"&gt;-&lt;/span&gt; Grace period cannot be extended manually by support

&lt;span class="gu"&gt;### Edge cases&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; If user upgrades plan during grace period: new payment attempt immediately
&lt;span class="p"&gt;-&lt;/span&gt; If payment method is removed during grace period: grace period continues (retry stops)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CLAUDE.md version tells an agent "don't touch this." The behavior spec tells the agent (and the product owner, and QA, and the next developer) &lt;em&gt;exactly what the product promises&lt;/em&gt; — in enough detail to verify whether the promise is still being kept.&lt;/p&gt;

&lt;h2&gt;
  
  
  The market knows something is missing
&lt;/h2&gt;

&lt;p&gt;This isn't a theoretical gap. The pain is already showing up across the ecosystem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic&lt;/strong&gt; published &lt;a href="https://docs.anthropic.com/en/docs/claude-code/hooks-guide" rel="noopener noreferrer"&gt;hooks documentation&lt;/a&gt; — deterministic controls that run &lt;em&gt;outside&lt;/em&gt; the model, because they recognized that prompt-level instructions aren't reliable enough for enforcement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI&lt;/strong&gt; published a &lt;a href="https://openai.com/index/harness-engineering/" rel="noopener noreferrer"&gt;harness engineering series&lt;/a&gt; describing how they built their entire product with agents — structured documentation directories as the source of truth, architectural constraints enforced mechanically via linters and CI — then didn't productize the approach&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gartner&lt;/strong&gt; started covering &lt;a href="https://www.gartner.com/en/newsroom/press-releases/2026-03-17-gartner-predicts-at-least-80-percent-of-governments-will-deploy-ai-agents-to-automate-routine-decision-making-by-2028" rel="noopener noreferrer"&gt;agentic AI governance&lt;/a&gt; as a distinct market category&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The vocabulary is fragmenting — guardrails, governance, policies, behavior, intent — but the underlying need is converging: &lt;strong&gt;teams need a structured way to specify what agents are and aren't allowed to do, above the instruction file layer.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to start
&lt;/h2&gt;

&lt;p&gt;You don't need to spec your entire product on day one. Start with the module where an agent mistake would hurt most — usually billing, auth, or entitlements.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create &lt;code&gt;billing.pbc.md&lt;/code&gt; in your repo&lt;/li&gt;
&lt;li&gt;Write the 3-5 behaviors that are non-negotiable (grace period, refund window, upgrade logic)&lt;/li&gt;
&lt;li&gt;For each behavior: what must happen, what must not happen, edge cases&lt;/li&gt;
&lt;li&gt;Point your CLAUDE.md or AGENTS.md at it: "Read &lt;code&gt;*.pbc.md&lt;/code&gt; files before modifying any billing, auth, or entitlement logic"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That last step is the bridge — your existing instruction files become the pointer to the behavior spec. They work together, not against each other.&lt;/p&gt;

&lt;h2&gt;
  
  
  The stack, not the replacement
&lt;/h2&gt;

&lt;p&gt;The right mental model isn't "PBC replaces CLAUDE.md." It's a stack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Layer 4: Behavior specs (.pbc.md)                  — product truth — what it promises
Layer 3: Feature specs / PRDs                      — what we plan to build
Layer 2: Session memory / context                  — what we're doing now
Layer 1: Instruction files (CLAUDE.md, AGENTS.md)  — how to work here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each layer is useful. None replaces the others. Most repos have layers 1-3 covered. Layer 4 is the one that prevents the production incident where an agent does exactly what it was told — and breaks a product promise nobody wrote down.&lt;/p&gt;




&lt;p&gt;The PBC spec is open source at &lt;a href="https://github.com/stewie-sh/pbc-spec" rel="noopener noreferrer"&gt;github.com/stewie-sh/pbc-spec&lt;/a&gt;. You can browse example contracts in the &lt;a href="https://pbc.stewie.sh" rel="noopener noreferrer"&gt;PBC viewer&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If your instruction files are working for code conventions but failing for product decisions — this is the layer that's missing.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Disclosure: this article was drafted with AI assistance and reviewed, edited, and fact-checked by the author before publication.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>coding</category>
      <category>opensource</category>
      <category>productivity</category>
    </item>
    <item>
      <title>AI agent context still misses the product layer</title>
      <dc:creator>Vinh Nguyen</dc:creator>
      <pubDate>Sun, 29 Mar 2026 15:35:04 +0000</pubDate>
      <link>https://dev.to/stewiehq/ai-agent-context-still-misses-the-product-layer-4di6</link>
      <guid>https://dev.to/stewiehq/ai-agent-context-still-misses-the-product-layer-4di6</guid>
      <description>&lt;p&gt;If you spend time around AI coding tools, the conversation has clearly shifted.&lt;/p&gt;

&lt;p&gt;People are talking less about prompts and raw model quality, and more about the surrounding system: repo rules, memory, harnesses, evals, and monitoring.&lt;/p&gt;

&lt;p&gt;That shift is correct. But even the better AI agent stacks still miss one important layer: product context.&lt;/p&gt;

&lt;p&gt;Now the serious work is happening one layer above the model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI is writing about &lt;a href="https://openai.com/index/harness-engineering/" rel="noopener noreferrer"&gt;harness engineering&lt;/a&gt; and internal monitoring for coding agents.&lt;/li&gt;
&lt;li&gt;Anthropic is writing about &lt;a href="https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents" rel="noopener noreferrer"&gt;effective harnesses for long-running agents&lt;/a&gt; and even &lt;a href="https://www.anthropic.com/engineering/building-c-compiler" rel="noopener noreferrer"&gt;parallel agent teams building a C compiler&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Every tool ecosystem now has some version of workflow rules, repo instructions, memory files, and spec-driven coding.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Taken together, these point to the same conclusion: the model is only part of the system. Reliable agentic coding depends on the surrounding stack.&lt;/p&gt;

&lt;p&gt;That's real progress. But it still leaves one missing layer.&lt;/p&gt;

&lt;p&gt;The modern agent stack is getting better at telling agents &lt;strong&gt;how to work&lt;/strong&gt;. It still does a poor job telling them &lt;strong&gt;what the product must continue to do&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What AI agent context solves today
&lt;/h2&gt;

&lt;p&gt;Most serious AI-native repos are already building some version of the same stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Repo instructions&lt;/strong&gt; like &lt;code&gt;AGENTS.md&lt;/code&gt;, &lt;code&gt;CLAUDE.md&lt;/code&gt;, or tool-specific rules tell the agent how to work in this codebase.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory files&lt;/strong&gt; preserve what happened in prior sessions so the next run doesn't start cold.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Harnesses&lt;/strong&gt; manage long-running work, handoffs, tool access, and task decomposition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evals and monitors&lt;/strong&gt; check whether the agent stayed within technical or safety boundaries.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each layer solves a real problem.&lt;/p&gt;

&lt;p&gt;Repo instructions reduce workflow mistakes. Memory reduces repeated exploration. Harnesses help agents make progress across long tasks. Evals and monitors catch bad outputs and suspicious behavior.&lt;/p&gt;

&lt;p&gt;If your goal is better software engineering execution, this stack makes sense.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why better harnesses still don't protect product decisions
&lt;/h2&gt;

&lt;p&gt;Here's the problem: an agent can follow every repo rule, use the right harness, pass the tests, and still break the product.&lt;/p&gt;

&lt;p&gt;Not by writing obviously bad code. By changing something that looked reasonable from the code alone.&lt;/p&gt;

&lt;p&gt;That happens because most product decisions are not explicit in the repo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the 14-day refund window a confirmed policy or a placeholder?&lt;/li&gt;
&lt;li&gt;Is the current permission model intentional or just the minimal thing that shipped first?&lt;/li&gt;
&lt;li&gt;Is an empty field a bug, a compliance requirement, or a deliberate product choice?&lt;/li&gt;
&lt;li&gt;Is this validation rule load-bearing, or leftover code that should be removed?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent sees implementation. It does not automatically see product intent, trust level, or business significance.&lt;/p&gt;

&lt;p&gt;That is why teams end up saying the same thing after an agent made a "wrong" change: the code was plausible, but it violated something the team had already decided.&lt;/p&gt;

&lt;p&gt;This is not a prompt quality problem. It is a missing artifact problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is missing from AI agent context today?
&lt;/h2&gt;

&lt;p&gt;The missing layer is &lt;strong&gt;product truth&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Not a PRD. Not a sprint spec. Not a memory log. Not a test suite.&lt;/p&gt;

&lt;p&gt;Product truth answers a narrower and more durable question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What does this product actually promise to do right now, and which behaviors are confirmed enough that agents should treat them as protected?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That layer needs to capture things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;confirmed product behaviors&lt;/li&gt;
&lt;li&gt;forbidden states and actions&lt;/li&gt;
&lt;li&gt;deliberate edge cases&lt;/li&gt;
&lt;li&gt;areas that are still provisional&lt;/li&gt;
&lt;li&gt;decisions that are actively being explored and should not be treated as settled&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without that layer, every agent is forced to infer product meaning from implementation details.&lt;/p&gt;

&lt;p&gt;Sometimes that works. Sometimes it silently introduces product drift.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AGENTS.md and memory banks are not enough
&lt;/h2&gt;

&lt;p&gt;This is where teams get confused, because all of these artifacts look similar from the outside. They're usually text files in the repo. They're all readable by both humans and agents. They all seem like "context."&lt;/p&gt;

&lt;p&gt;But they operate at different levels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AGENTS.md&lt;/strong&gt; tells the agent how to behave as a contributor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory banks&lt;/strong&gt; preserve what happened across sessions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feature specs&lt;/strong&gt; describe what the team plans to build.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tests&lt;/strong&gt; verify implementation behavior in specific scenarios.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitors&lt;/strong&gt; look for dangerous or misaligned actions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of those directly answer: &lt;em&gt;which product behaviors are intentional, protected, and safe to build on top of?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You can have all five and still leave the core product layer implicit.&lt;/p&gt;

&lt;p&gt;That's why a repo can feel "well-instrumented" for agents and still be fragile when they touch billing, entitlements, onboarding logic, permissions, or compliance-sensitive flows.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a Product Behavior Contract adds
&lt;/h2&gt;

&lt;p&gt;A &lt;a href="https://github.com/stewie-sh/pbc-spec" rel="noopener noreferrer"&gt;Product Behavior Contract&lt;/a&gt; adds the missing product layer without replacing the rest of the stack.&lt;/p&gt;

&lt;p&gt;It sits alongside your existing agent context and makes the behavioral contract explicit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;What must happen&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;What must not happen&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Which edge cases are deliberate&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Which behaviors are confirmed, provisional, or still being explored&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Which source files provide evidence&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That changes the quality of agent decisions.&lt;/p&gt;

&lt;p&gt;When the contract says a billing limit is confirmed, the agent stops treating it as an arbitrary number it can refactor freely. When the contract says the permission model is still under exploration, the agent stops extending it as if the design were settled. When a behavior is marked provisional, humans and agents both know not to overfit around it.&lt;/p&gt;

&lt;p&gt;This is the difference between code context and product context.&lt;/p&gt;

&lt;p&gt;Code context tells the agent what exists.&lt;br&gt;
Product context tells the agent what must remain true.&lt;/p&gt;

&lt;h2&gt;
  
  
  The agent stack is converging. The product layer is next.
&lt;/h2&gt;

&lt;p&gt;My read of the current ecosystem is that OpenAI, Anthropic, and the broader tool market are all converging on the same architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;give agents better maps of the repo&lt;/li&gt;
&lt;li&gt;help them work across long time horizons&lt;/li&gt;
&lt;li&gt;break work into clearer sub-tasks&lt;/li&gt;
&lt;li&gt;evaluate and monitor their behavior more rigorously&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the right direction.&lt;/p&gt;

&lt;p&gt;But as agents get better at execution, the cost of missing product truth goes up, not down.&lt;/p&gt;

&lt;p&gt;A stronger coding agent can now move faster, touch more files, and refactor more confidently. If the product layer is still implicit, that extra capability just lets it make bigger product mistakes more efficiently.&lt;/p&gt;

&lt;p&gt;The next mature AI-native repo will not stop at workflow rules, harnesses, and evals. It will also include a durable product artifact that says what the software is actually supposed to do.&lt;/p&gt;

&lt;p&gt;That's the role of a product behavior contract.&lt;/p&gt;

&lt;p&gt;The format is open source. The &lt;a href="https://pbc.stewie.sh" rel="noopener noreferrer"&gt;PBC viewer&lt;/a&gt; lets you browse structured contracts in the browser, and the &lt;a href="https://github.com/stewie-sh/pbc-spec" rel="noopener noreferrer"&gt;PBC spec&lt;/a&gt; is public if you want to see the model directly.&lt;/p&gt;

&lt;p&gt;If you're already using AGENTS.md, memory files, or spec-driven coding, this is the next layer I'd look at: a durable artifact for product truth, not just execution rules.&lt;/p&gt;

&lt;p&gt;Related:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.stewie.sh/blog/agents-md-memory-bank-pbc" rel="noopener noreferrer"&gt;AGENTS.md, Memory Bank, and PBC solve different problems&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.stewie.sh/blog/why-ai-agents-keep-violating-your-product-rules" rel="noopener noreferrer"&gt;Why AI agents keep violating your product rules&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Disclosure: this article was drafted with AI assistance and reviewed, edited, and fact-checked by the author before publication.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>agents</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
