Most agent products start from a simple shape:
A user sends something.
The assistant responds.
The app stores the conversation.
That works for a prototype, but it starts to break when the product becomes real. Once an agent supports files, images, tools, approvals, citations, artifacts, resumable streams, long-running tasks, or multiple model providers, the word "conversation" starts hiding two different records.
The first record is what the product needs to store. The second record is what the LLM actually saw. They are related, but they should not be the same stored object.
The product needs a durable record of the user experience: what the user sent, what the assistant displayed, which files appeared, which tools ran, which artifacts were created, which approvals happened, and how the conversation should be rendered, resumed, searched, audited, and migrated. The LLM needs a different record: the instructions, selected history, resolved attachments, tool observations, retrieved facts, provider-specific input parts, model output, token usage, and errors involved in a particular execution.
Those two records serve different systems. They change at different speeds, fail in different ways, and become harder to reason about when they are forced into one message object. The better rule is simple:
Store what the product experienced separately from what the LLM saw.
Or shorter:
Product messages are not LLM messages.
The Hidden Mistake: One Record for Two Worlds
A chat-like interface makes the mistake feel natural. The user sees a sequence of messages, and the model also appears to consume a sequence of messages, so the first implementation often stores one list and uses it for everything.
That list becomes the product history, the model input, the streaming state, the audit log, and eventually the place where tool results, file references, citations, approvals, provider-specific fields, and partial execution state all accumulate. At that point, the message list is no longer a clean product model. It is an accidental mixture of product state and execution state.
This creates problems on both sides. The product record becomes polluted by temporary model concerns: provider-specific content parts, prompt formatting, compressed tool observations, retrieved snippets, model-specific attachment formats, and context-management artifacts. The model input becomes polluted by product concerns: display metadata, timestamps, loading states, preview URLs, frontend editor structures, collapsed panels, UI labels, and other details that exist only so the product can render correctly.
Neither side gets what it wants. The product wants a stable, durable, user-facing record. The LLM wants a focused execution input. Those should not be the same stored object.
Product Messages: What the Product Experienced
Product messages are the canonical record of the conversation from the product's point of view. They should answer questions like:
- What did the user see?
- What did the user send?
- What did the assistant display?
- Which files were attached?
- Which tool runs appeared in the conversation?
- Which artifacts were created?
- Which citations were shown?
- Which approvals were requested or granted?
- Which errors should be visible?
- How should the conversation be resumed later?
This record should be durable, renderable, searchable, paginatable, auditable, and migration-friendly. It should preserve stable IDs, timestamps, authorship, message structure, attachment metadata, artifact references, tool progress, approval state, citations, and user-visible errors.
It may also include rich product-specific structure. A user input might be a structured editor document, not just plain text; it might include images, PDFs, spreadsheets, selected workspace objects, mentions, or pasted screenshots. An assistant response might include text, generated files, citations, tool activity, expandable logs, intermediate status, or custom UI blocks.
That richness is not a problem. It is exactly what makes the product useful. The problem starts when this product-shaped record is treated as the LLM's own record. The LLM does not need every display field, timestamp, collapsed panel state, frontend-only metadata, object URL, or internal UI structure.
Product messages should be allowed to stay product-shaped. They are the source of truth for the user experience, not a prompt format.
LLM Messages: What the LLM Saw
LLM messages are different. They are not the canonical conversation; they are the execution record of a specific model run. They should answer questions like:
- Which model was called?
- Which provider was used?
- What instructions were included?
- Which parts of the conversation were selected?
- Which files or artifacts were resolved?
- Which tool definitions were available?
- Which tool results were included?
- Which retrieved facts or memories were added?
- What exact input did the model receive?
- What did the model output?
- How many tokens were used?
- What error or finish reason occurred?
This record is valuable, but for a different reason. Product messages help the app render and resume the conversation; LLM messages help engineers debug, replay, evaluate, compare, and audit model behavior. They are an execution trace.
That trace should usually be immutable. Once a model run has happened, the record of what the LLM saw should not silently change because the product conversation was edited, an attachment preview changed, a context policy was updated, or a provider adapter was refactored.
This is especially important for debugging. When a user asks, "Why did the agent do that?", the product conversation is not enough. You need to know what the LLM actually saw at that moment: maybe an older summary was included, a tool result was compressed, a file was too large and only an excerpt was sent, one provider received a native image input while another received a text fallback, or an approval state was visible in the product but not included in the model input.
Without a separate execution record, these differences are hard to inspect. With one, they become visible.
A Conversation Is Not a Prompt Log
A useful mental model is:
The conversation is product messages.
The prompt log is LLM messages.
They overlap, but they are not the same thing. The conversation is organized around the user experience, while the prompt log is organized around model execution. The conversation should be pleasant to render and stable to evolve; the prompt log should be exact enough to replay and inspect.
The conversation may contain rich product objects, while the prompt log may contain provider-specific input parts. The conversation may be edited, redacted, migrated, or re-rendered, while the prompt log should preserve what happened during a specific run.
This distinction becomes more important as the product grows. A simple text-only assistant may get away with one message list for a while, but an agent product with files, tools, artifacts, permissions, and multiple models needs a stronger boundary. Do not store the product experience as if it were merely a prompt, and do not store the prompt as if it were the product experience.
Why Storage Separation Matters
It is tempting to describe this as a schema preference, but the real reason is operational. The model layer needs stable prepared context. The product layer needs durable user history. Those are different stability contracts.
A naive chat product treats the stored message list as the model context: take the messages in order, append the new user input, and send the whole thing to the model. That is simple, but it makes product history and prepared context collapse into the same thing. Real agent products cannot live with that assumption for long, because the model input has to be selected, resolved, compacted, cached, adapted, and inspected.
Prompt Caching Needs Stable Context
Prompt caching makes this separation concrete. Caching works best when the model input has stable, repeatable pieces: system instructions, tool definitions, long-lived summaries, selected history, resolved artifacts, and other context blocks that can be arranged in a predictable order. If every request is assembled directly from a noisy product message list, small product changes can disturb the cacheable prefix for reasons that have nothing to do with model behavior.
That instability is not a product problem. The product should be free to evolve its conversation model, add display metadata, change attachment previews, update tool event UI, or migrate how artifacts are rendered. It becomes a model problem only when the same stored object is also treated as the context sent to the LLM.
Context compaction has the same shape. A compacted summary may be exactly what the model should see for a new run, and it may even become part of the stable cacheable prefix. But that summary should not replace the original product conversation, and it should not silently rewrite what an older run saw. It is prepared context, not the conversation itself.
This does not mean every model call should be frozen forever. Context still changes when the user asks something new, when a relevant file is added, when a tool result matters, or when the compaction policy produces a better summary. The point is that these changes should be intentional model-context changes, not accidental cache misses caused by UI state leaking into the prompt.
An uploaded image shows the same boundary in a smaller form. The product may keep one stable image card with a file name, thumbnail, permissions, and placement in the conversation. The model context may receive native image input, compressed bytes, a text fallback, or only an artifact pointer depending on the provider and the task. Those are not competing versions of the same field; they are different records with different stability rules.
Old Runs Need Execution Truth
The second reason is historical truth. Imagine looking at an old agent conversation six months later. The product UI has changed, the file preview component is different, the model provider has changed, and the way you prepare context has changed. Maybe the agent now uses a retrieval system or compaction policy that did not exist when the old run happened.
The conversation should still render as the user experienced it, and the model run should still show what the LLM saw at the time. Those are two different promises. If the same stored message object is responsible for both, old conversations become fragile: a harmless UI migration can make model debugging harder, a better context strategy can accidentally rewrite the meaning of an old run, and a provider migration can leave historical data in a shape that is neither good product state nor accurate model state.
Separate records avoid that confusion. The product message says, "This is what happened in the conversation." The LLM message says, "This is what the model saw during this execution." The bridge between them can evolve, but the records should remain honest.
This is the architectural benefit:
Product messages preserve the user experience. LLM messages preserve the execution reality.
Both are important. They just should not be the same record.
The Feeling of a Better Boundary
A good boundary should make the system feel calmer. The conversation can be redesigned without wondering whether a CSS-era field will leak into the next prompt. The model input strategy can change without rewriting old user-visible history. A provider adapter can evolve without mutating the conversation record. A bad run can be inspected without reconstructing the prompt from today's version of the UI.
This is the feeling I want from the architecture:
- The product keeps the story the user experienced.
- The model run keeps the facts the LLM saw.
- The two are connected by IDs, not collapsed into one object.
That is enough. You do not need to turn the product schema into a prompt schema, and you do not need to turn the prompt log into a UI schema. You just need to let the two records stay honest about what they are.
Product Messages Are Not LLM Messages
The mistake is not using the wrong message format. The mistake is using one stored object for two different realities. The product has to remember what the user experienced, and the model run has to remember what the LLM saw. Those are not the same thing.
A conversation is not a prompt log, and a prompt log is not a conversation. The conversation is product messages: durable, renderable, resumable, searchable, auditable, and migration-friendly. The model run is LLM messages: exact, execution-specific, replayable, debuggable, evaluable, and provider-aware.
They should be connected and they should reference each other, but they should not share the same schema. This separation gives the product room to evolve without losing execution truth, gives the model layer room to improve without corrupting the product history, and makes context management and prompt caching much easier to reason about.
The final rule is simple:
Store the conversation the product experienced.
Store the execution the LLM saw.
Link them, but do not collapse them.
That is the difference between product messages and LLM messages.




Top comments (0)