DEV Community

The Messages Array, in 4 GIFs

Jasmin Virdi on June 09, 2026

This is the third post in the series Building TinyAgent, where we are building a small agent from scratch in Node.js with no frameworks, just the API cal...

Alex Shev • Jun 12

Visualizing the messages array is helpful because most bugs in LLM apps are not in the model call itself. They are in what got included, in what order, and under which role. A small animation can make that state visible in a way a JSON dump usually does not, especially for people new to chat-based architectures.
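For readers who cannot animate their state, a one-line-per-message trace gets part of the way there. The following is a minimal sketch, assuming a hypothetical message shape of `{ role, content }` with string content; `formatTrace` is an illustrative name, not something from the post.

```javascript
// Hypothetical message shape: { role: string, content: string }.
// Prints index, role, and a short preview so inclusion, order, and role are visible at a glance.
function formatTrace(messages, maxLen = 40) {
  return messages.map((m, i) => {
    // Collapse whitespace so multi-line content stays on one line.
    const text = String(m.content).replace(/\s+/g, " ").trim();
    const preview = text.length > maxLen ? text.slice(0, maxLen) + "..." : text;
    return `${String(i).padStart(2, "0")} ${m.role.padEnd(9)} ${preview}`;
  });
}

const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the capital of France?" },
  { role: "assistant", content: "Paris." },
];

console.log(formatTrace(messages).join("\n"));
```

Logging this before every model call makes a missing turn or a wrong role stand out faster than scanning raw JSON.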

Jasmin Virdi • Jun 12

Thanks @alexshev !

Those are exactly the thoughts I had in mind while writing this series.

Alex Shev • Jun 13

Exactly. The messages array is where small misunderstandings turn into real behavior. Visualizing it makes the contract feel concrete: roles, order, context, and tool output all become things you can inspect instead of vague prompt magic.

Manuel Bruña • Jun 15

The messages array looks simple until you debug a real agent trace. Keeping roles, tool calls, and intermediate reasoning boundaries clear makes later failures much easier to inspect. Bad message hygiene becomes bad system behavior.
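One way to keep tool boundaries honest is to lint the array before sending it. This is a rough sketch under an assumed, provider-neutral shape: assistant messages may carry `toolCalls: [{ id, name }]` and tool messages carry `toolCallId`. Both field names and `findToolBoundaryIssues` are hypothetical and would need mapping to whatever format your API actually uses.

```javascript
// Assumed shape (hypothetical, not a specific provider's wire format):
//   assistant: { role: "assistant", content, toolCalls?: [{ id, name }] }
//   tool:      { role: "tool", toolCallId, content }
function findToolBoundaryIssues(messages) {
  const issues = [];
  const pending = new Set(); // tool call ids still waiting for a result

  messages.forEach((m, i) => {
    if (m.role === "assistant") {
      for (const call of m.toolCalls ?? []) pending.add(call.id);
    } else if (m.role === "tool") {
      if (pending.has(m.toolCallId)) pending.delete(m.toolCallId);
      else issues.push(`message ${i}: tool result for unknown call "${m.toolCallId}"`);
    } else if (m.role === "user" && pending.size > 0) {
      issues.push(`message ${i}: user turn arrived with ${pending.size} unanswered tool call(s)`);
    }
  });

  for (const id of pending) issues.push(`end of trace: tool call "${id}" never got a result`);
  return issues;
}

const trace = [
  { role: "user", content: "Weather in Oslo?" },
  { role: "assistant", content: "", toolCalls: [{ id: "c1", name: "get_weather" }] },
  { role: "tool", toolCallId: "c1", content: "{\"tempC\": 4}" },
  { role: "assistant", content: "It is 4 degrees in Oslo." },
];
console.log(findToolBoundaryIssues(trace)); // []
```

An empty result does not prove the trace is correct, but a non-empty one points at the exact index where a boundary blurred.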

Jasmin Virdi • Jun 16

Thanks for adding to this, @tecnomanu

Manuel Bruña • Jul 9

Thanks for the reply. The main point for me is that message arrays become useful when they preserve roles and tool boundaries clearly. Once those blur, agent debugging gets much harder.

Mudassir Khan • Jun 16

the "long system prompts compound" line is the one most teams hit hardest tbh. in our prod RAG setup we inject retrieved docs into the system prompt. at 3k tokens average, that's 90k cumulative input tokens just from the system message by turn 30, before counting any of the conversation itself.
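The 90k figure is just the system message being re-sent on every request. A tiny sketch of that arithmetic, assuming the system prompt stays at a constant size each turn (the function name is illustrative):

```javascript
// Each turn is a fresh request, so the full system prompt is billed as input again.
function cumulativeSystemTokens(systemTokens, turns) {
  let total = 0;
  for (let turn = 1; turn <= turns; turn++) {
    total += systemTokens; // same prompt, paid for once per request
  }
  return total;
}

console.log(cumulativeSystemTokens(3000, 30)); // 90000
```

This counts only the system message; conversation history adds its own growing cost on top.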

went full sliding window after that. the trade-off is the agent loses early context on long sessions, so we snapshot key facts into a context_summary block instead of pure recency truncation.

curious if you're doing hard truncation or summarization to handle the curve?
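A sliding window with a pinned summary could look roughly like this. It is a sketch, not the commenter's implementation: the `<context_summary>` tag format, the `keyFacts` list, and `buildWindowedMessages` are all assumptions made for illustration.

```javascript
// Keeps the last `windowSize` messages verbatim and pins key facts into the system message.
// How key facts get extracted is out of scope here; they are passed in as plain strings.
function buildWindowedMessages({ systemPrompt, keyFacts, history, windowSize }) {
  // Guard windowSize 0 explicitly: slice(-0) would return the whole array.
  const recent = windowSize > 0 ? history.slice(-windowSize) : [];
  const summaryBlock = keyFacts.length
    ? `\n\n<context_summary>\n${keyFacts.map((f) => `- ${f}`).join("\n")}\n</context_summary>`
    : "";
  return [{ role: "system", content: systemPrompt + summaryBlock }, ...recent];
}

const windowed = buildWindowedMessages({
  systemPrompt: "You answer questions about the user's orders.",
  keyFacts: ["User's name is Dana", "Order #1 was refunded"],
  history: [
    { role: "user", content: "Hi, I'm Dana." },
    { role: "assistant", content: "Hello Dana!" },
    { role: "user", content: "Was order #1 refunded?" },
    { role: "assistant", content: "Yes, it was." },
    { role: "user", content: "Thanks. Anything else pending?" },
  ],
  windowSize: 2,
});
console.log(windowed.length); // 3
```

A real version would likely also need to avoid cutting the window between a tool call and its result.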

Marcus Chen • Jun 11

Nice way to make the state visible. The messages array gets real interesting in voice, where every turn you keep is latency you pay at synthesis time. We ended up summarizing everything older than a few turns into one system-side note and keeping only the recent turns verbatim, which felt wrong until we measured that response quality barely moved and time-to-first-token dropped noticeably. The array is a budget, not just a log.
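Treating the array as a budget can be sketched as a split: keep the newest turns that fit, hand everything older to a summarizer. The 4-characters-per-token estimate below is a crude assumption, not a real tokenizer, and `splitByBudget` is an illustrative name.

```javascript
// Rough heuristic only: real token counts depend on the model's tokenizer.
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Walks from newest to oldest, keeping messages verbatim while they fit the budget.
function splitByBudget(history, budgetTokens) {
  let used = 0;
  let start = history.length; // index of the oldest message kept verbatim
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (used + cost > budgetTokens) break;
    used += cost;
    start = i;
  }
  return { older: history.slice(0, start), recent: history.slice(start), used };
}

const turns = [
  { role: "user", content: "x".repeat(40) },
  { role: "assistant", content: "x".repeat(40) },
  { role: "user", content: "x".repeat(40) },
  { role: "assistant", content: "x".repeat(40) },
];
const { older, recent, used } = splitByBudget(turns, 25);
console.log(older.length, recent.length, used); // 2 2 20
```

The `older` slice is what would be condensed into the single system-side note; the summarizing call itself is left out here.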

Jasmin Virdi • Jun 11

Thanks @realmarcuschen

That's a great analysis. How much difference did you observe in cost and output rendering time?

Mininglamp • Jun 11

The messages array abstraction works fine for simple chatbots but starts breaking when you need agents that maintain state across tool calls. The moment you have parallel tool execution or need to inject system context mid-conversation, the linear array model gets awkward fast. Most agent frameworks end up building a graph on top of it anyway.
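For the parallel case specifically, the array can stay linear if results are appended in the order the calls were issued rather than the order they finished. A minimal sketch, assuming a hypothetical call shape `{ id, name, args }` and a plain object of async tool functions:

```javascript
// Runs all tool calls concurrently. Promise.all resolves to results in input order,
// so the appended tool messages are deterministic even when completion order is not.
async function runToolCalls(toolCalls, tools) {
  return Promise.all(
    toolCalls.map(async (call) => {
      try {
        const output = await tools[call.name](call.args);
        return { role: "tool", toolCallId: call.id, content: JSON.stringify(output) };
      } catch (err) {
        // Report the failure as a tool result so the call still has a matching entry.
        return { role: "tool", toolCallId: call.id, content: `error: ${err.message}` };
      }
    })
  );
}

// Demo tools (hypothetical): the first call finishes last.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const demoTools = {
  slow: async ({ value }) => { await sleep(30); return { value }; },
  fast: async ({ value }) => { await sleep(5); return { value }; },
};

runToolCalls(
  [
    { id: "a", name: "slow", args: { value: 1 } },
    { id: "b", name: "fast", args: { value: 2 } },
  ],
  demoTools
).then((results) => console.log(results.map((r) => r.toolCallId))); // [ 'a', 'b' ]
```

This does not address injecting system context mid-conversation, which is where the linear model gets genuinely awkward.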

Jasmin Virdi • Jun 11

Thanks @mininglamp

Nice addition! I will definitely include this point in the upcoming post in the series on tool calls.

Yves Jutard • Jun 10

Exactly the questions I had today. Thanks for this write-up + the cool illustrations 👏

Jasmin Virdi • Jun 10

Thanks @yvem

Sloan the DEV Moderator • Jun 9

Hey, this article appears to have been generated with the assistance of ChatGPT or possibly some other AI tool.

We allow our community members to use AI assistance when writing articles as long as they abide by our guidelines. Please review the guidelines and edit your post to add a disclaimer.

Failure to follow these guidelines could result in DEV admin lowering the score of your post, making it less visible to the rest of the community. Or, if upon review we find this post to be particularly harmful, we may decide to unpublish it completely.

We hope you understand and take care to follow our guidelines going forward!

Nazar Boyko • Jun 10

Great series! 👏
The caching section is the part most agent tutorials skip entirely. Worth adding: you're not limited to a single cache marker. Anthropic gives you up to 4 cache_control breakpoints, so a common layout is one on the tools, one on the system prompt, one on the last message. Since a read checks for the longest matching cached prefix, append-only history keeps hitting the cache without you re-marking every turn. The gotcha you nailed is the real one though! Any edit before a breakpoint (a window drop, a summary swap) busts everything after it, which is exactly why windowing and caching fight each other.
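The three-marker layout could be sketched like this. It only builds a request body and makes no network call. The field names follow the Anthropic Messages API as I understand it (worth checking against the current docs); the model id, `max_tokens` value, and helper names are placeholders, and message content is assumed to be a plain string.

```javascript
const EPHEMERAL = { type: "ephemeral" };

// Places one breakpoint on the last tool, one on the system prompt, one on the last message.
function buildCachedRequest({ model, tools, systemText, messages }) {
  const markedTools = tools.map((tool, i) =>
    i === tools.length - 1 ? { ...tool, cache_control: EPHEMERAL } : tool
  );
  const markedMessages = messages.map((m, i) => {
    if (i !== messages.length - 1) return m;
    // Assumes string content; converts it to a text block so it can carry the marker.
    return { role: m.role, content: [{ type: "text", text: m.content, cache_control: EPHEMERAL }] };
  });
  return {
    model, // placeholder, supplied by the caller
    max_tokens: 1024, // arbitrary value for the sketch
    tools: markedTools,
    system: [{ type: "text", text: systemText, cache_control: EPHEMERAL }],
    messages: markedMessages,
  };
}

// True when `next` only adds messages after `previous`, i.e. the cached prefix is untouched.
function isAppendOnly(previous, next) {
  return (
    previous.length <= next.length &&
    previous.every((m, i) => JSON.stringify(m) === JSON.stringify(next[i]))
  );
}
```

Running `isAppendOnly` on the unmarked history before and after a window drop or summary swap returns false, which is the cache-busting edit described above.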

Jasmin Virdi • Jun 10

Thanks @nazar_boyko !

Your feedback is really motivating. Thanks for adding to it! :)

Athreya aka Maneshwar • Jun 9

Jasmin Virdi • Jun 10

Thanks @lovestaco