The Week That Felt Productive
Last week I published four posts on this newsletter. They were observation pieces about AI agent products: what current platforms get wrong about persona, about conversation-quality monitoring, about the relationship between the model and its runtime, and about optimizing for proxy metrics instead of real outcomes.
Each post stood on its own. They also connected — four angles on the same underlying claim: most AI agent products today ship simple primitives (static personas, per-call logs, workflow builders) when the more accurate primitives are runtime-level (situational steering, trajectory-level observability, content-aware design).
By the fifth day, I noticed I wasn't just writing. I was building a worldview.
Yesterday I opened a new document to plan product direction for the agent platform I run. Within twenty minutes I had drafted a framework with four improvement tracks, each mapped one-to-one onto those four posts. The plan felt right. It was logically consistent, it looked evidence-backed, and it matched my product's current architecture.
I asked the person I work with most closely — who sometimes functions as my critical second opinion — whether this was the right direction.
The answer: technically sound, strategically premature.
That stopped me. And it taught me something I want to write down before I forget it.
The Trap Has a Name
The pattern I walked into isn't new, but I haven't seen it named, so I'll give it a name: the content-to-product alignment trap.
It works like this:
- You write essays about how the world should be.
- The essays are logically coherent and emotionally satisfying to produce.
- You build a worldview from the act of writing them.
- You then try to build your product around that worldview.
- You mistake the coherence of the worldview for evidence that the product is right.
The coherence is real. But coherence is not demand. Your essays are a theory of what's broken in the industry. Your product has to solve what's actually breaking for the users you're trying to reach. These can overlap. They can also diverge entirely.
Here's the part I didn't expect: the trap gets stronger when you write a series. A single essay is easier to hold at arm's length. Four essays in a week, all reinforcing the same thesis, feel like a position paper. The more consistent the series, the more the author mistakes its internal consistency for external validity.
I wrote posts that argued AI agent platforms should have situation-aware personas, trajectory-level observability, content-first design, and outcome-based evaluation. Each claim is defensible. But I have no user interview data showing my actual users are blocked on any of those four problems. I just wrote about them compellingly, and four posts later I was ready to bet a roadmap on them.
Why This Fails Quietly
The tricky part of this trap is that it doesn't feel like a mistake while you're in it.
When you build a product based on real user pain, there's friction. Users complain, things don't make sense, hypotheses get falsified. The dissonance is productive. You update.
When you build a product based on your own essays, there's no dissonance. You wrote the essays. You agree with yourself. Every decision confirms the worldview you already built. The feedback loop collapses into a loop of one.
And because the essays are public, there's a second reinforcing mechanism: public commitment. You've told readers the world is shaped a certain way. Pivoting your product away from that shape can feel like a reputational cost. So you don't pivot. You build.
The product that results might even be well-engineered. It will just be well-engineered for the wrong problem.
The Evidence I Wasn't Using
When I sat with the pushback, I made a small but important list.
Users of my platform don't write to me saying "my agent's conversation drifts structurally and we lack trajectory-level observability." They write to me saying things like:
- "it's too slow"
- "this integration broke"
- "the output missed the point"
- "I can't figure out what the agent actually did"
These aren't the same problems I wrote about this week. There's a relationship — trajectory observability would help diagnose "can't figure out what the agent did" — but "relationship" is not "direct match."
When I mapped my four-track plan against these actual complaints, I found that one of the four tracks directly addressed a user-facing pain, and the other three were internal quality infrastructure. The three would help my team operate better. They wouldn't be felt by users unless I explicitly surfaced them.
I had written a roadmap where 75% of the work was invisible to the users it was supposedly for.
Author Voice vs. Builder Voice
The fix isn't to stop writing about product observations. The fix is to recognize that the author and the builder are different roles, and they need to hear different things before they commit.
As an author, I'm allowed — encouraged — to write from observation. I can say "here's what I notice about the shape of this category" without being accountable to whether the observation solves anyone's concrete problem. Essays are for sense-making, hypothesis-forming, provocation. They don't need user research to be valuable.
As a builder, I'm not allowed to skip user research. I don't get to substitute my essays for it. A product decision grounded in "I wrote about this and it felt true" is a decision grounded in one person's sense-making. One person's sense-making is a terrible distribution to sample user need from.
The trap is that these two voices live in the same head. The essay I wrote yesterday becomes the premise for the product decision I make today. Unless I consciously split the two, they blur.
In my case the conscious split looks like this:
- Essays go on Substack. They're observations. They commit me to thinking publicly, not to building accordingly.
- Roadmap decisions go through a different filter: user interviews, retention data, competitive positioning, pricing experiments. Essays can be an input to that filter, but not the filter itself.
I had collapsed those two tracks last week without noticing.
The Counter-Example I Should Have Learned From Earlier
There's an irony here. One of the posts I wrote last week was about Claude Design, the product Anthropic launched, which I analyzed by reading the source code of the skill bundles it produces. The deepest claim of that piece was: Claude Design is mostly clever runtime engineering with a model at the entry point. The runtime is the product.
What I missed, while writing enthusiastically about that claim, was how Claude Design actually got built. Anthropic didn't start by writing essays about "the AI design category needs a runtime layer." They built a product that users wanted (fast, good-looking design artifacts), and the runtime emerged as the architecture that made that product work well.
Runtime-as-product is a retrospective description of Claude Design. It is not a prescription for how to decide what to build.
If I had actually internalized that lesson, my own plan wouldn't have led with "let's build runtime infrastructure because it's the more accurate primitive." It would have led with "what user outcome is currently broken, and does a runtime change fix it?"
The worldview from my essay was seductive. The worldview from how the product I admired actually came to exist was more useful, and I walked past it.
What I Changed
After the conversation, I rewrote the plan.
The four improvement tracks stayed. What changed was how they're classified and sequenced.
One track, the one that directly changes what users feel (agent behavior that shifts with user state), became the center. It is the only track that gets to carry product positioning.
Three tracks — the observability layer, the artifact bundle format, the evaluation feedback loop — got reclassified as internal engineering investments. They might make my team's operations better. They don't show up in external messaging. No user cares about "we added trajectory-level scoring" as a product pitch.
I also added a positioning confirmation gate. Before any of this becomes product positioning, the user-facing track has to be validated with actual users. If three out of five test users can point to specific moments where the agent felt like it was reading their state, the positioning is confirmed. If they can't, the work was still useful internally, but the positioning claim doesn't get made.
That last part feels uncomfortable, which is how I know it's right. The honest product position is "we think this direction matters but we don't know yet if users will feel it." That's how the position should sit until the evidence shows up.
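For concreteness, here is a minimal sketch of that gate as code. Everything in it is illustrative, written for this post rather than taken from any real tooling; the only load-bearing content is the rule itself: claim the positioning only when at least three of five test users cite specific moments.

```python
# Minimal sketch of the positioning confirmation gate described above.
# All names are hypothetical; the rule is: the positioning claim is made
# only if at least 3 of 5 test users cite specific moments.

from dataclasses import dataclass, field


@dataclass
class InterviewResult:
    user_id: str
    # Concrete moments where this user felt the agent was reading their state.
    specific_moments: list[str] = field(default_factory=list)


def positioning_confirmed(results: list[InterviewResult],
                          required: int = 3,
                          sample_size: int = 5) -> bool:
    """Gate the external positioning claim on interview evidence."""
    if len(results) < sample_size:
        raise ValueError("Run the gate only after the full test sample is in.")
    hits = sum(1 for r in results if r.specific_moments)
    return hits >= required
```

If the gate returns False, the internal work still ships; only the external claim gets withheld.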
A Practical Test for Other Builder-Writers
If you're both publishing and building, here's the test I now run before letting a piece of writing inform a product decision:
1. Would this observation survive contact with actual user interviews?
If your essay says users need X, and you haven't heard a user articulate X, that's a hypothesis, not a decision.
2. Is the elegance of the essay doing argumentative work the evidence isn't?
Coherent writing is a skill. Coherence inside the essay is not coherence with reality. Test whether the essay's strength is its logic or its evidence.
3. If you had to remove the essay from consideration, would the product decision still look the same?
If no, the essay is the load-bearing reason for the decision. That's dangerous unless the essay is backed by something beyond the author's own sense-making.
4. What is your essay actually an output of — user research, or your own pattern-matching?
Both are valid writing inputs. Only one is a valid product input.
None of these tests disqualify writing from influencing product direction. They just force you to name which track the essay belongs on.
What Stays
I'm going to keep writing these essays. The observation work is valuable on its own. It shapes how I think, it clarifies what I can articulate, and it creates accountability that makes the thinking sharper. A weekly essay is not a waste of time, even if it doesn't steer the roadmap.
What I'm not going to do anymore is let the essay be the plan.
The series I wrote last week, four posts on agent product primitives, is still a real series about real observations. Those observations probably point at real gaps. I just don't know yet which of them are my users' gaps and which are the industry's abstract gaps. That distinction matters more than I gave it credit for.
If you're reading this as a builder-writer yourself, the thing I'd watch for is the fifth day of a week when the series feels especially coherent. That's the moment when the author in your head tries to hand the product direction to the builder in your head. The handoff feels productive. It's actually a step sideways into fiction.
Notice when it happens. Make the split conscious.