Liatmoss

Memory and Context Windows: Best Practices for AI Tools

As new tools and features ship, AI tools grow more capable and more complex, giving us more ways of working. What also comes with these updates and changes is a higher volume of tokens exchanged in both prompts and responses.

From the beginning, we've been told that the more detailed a prompt is, the more accurate the response is likely to be. With added skills like Grill-Me or Superpowers we can refine ideas and craft more purposeful prompts, but there's also an added cost.

What a lot of people don't realise is that prompts and responses are only one way tokens get eaten. What I've seen from a lot of people recently: context windows that stay open longer than required, details being pulled from memory long after caching has expired, and multiple conversations happening within the same context or chat window.

While this may seem like a time saver, and it is, there's a hidden cost hiding in every prompt sent in that window. You might think it's a good idea because "the AI remembers", but in reality the AI is just re-sending and reprocessing previous information it has already retrieved, and you pay for those tokens on every turn.
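To make that hidden cost concrete, here's a rough sketch in Python. It assumes a stateless chat API where every request carries the full conversation history; the token figures are illustrative assumptions, not real API pricing.

```python
# Rough sketch of why long chat windows get expensive: most chat APIs are
# stateless, so every new prompt re-sends the entire conversation so far.

def tokens_sent_over_session(turn_tokens: list[int]) -> int:
    """Total input tokens sent across a session where each request
    carries all previous turns plus the new one."""
    total = 0
    history = 0
    for t in turn_tokens:
        history += t      # the new turn joins the running history
        total += history  # the whole history rides along with this request
    return total

# Ten turns of ~500 tokens each: the final request alone re-sends 5,000
# tokens, and the session total is 27,500 -- far more than the 5,000
# "new" tokens you actually typed.
print(tokens_sent_over_session([500] * 10))  # -> 27500
```

The growth is roughly quadratic in the number of turns, which is why splitting work into short sessions saves so much more than it first appears to.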


There are a few best practices worth keeping in mind when first looking at context windows and memory:

New chat, new session: This is really important because you don't want to change or dilute what's already in your session's context. If you have a question or a new piece of work that's unrelated to your existing chat, start a new session in the CLI or open a new chat window in the IDE.

This is important for two main reasons: it keeps your sessions clean and easy to find later, and it gives the agent a fresh, smaller context to work from, making responses faster and more cost effective.

Summarise your current session: Summarising your current session and opening a new one to start over means your agent can retrieve detail from memory faster. It also frees up the context window, allowing newer information to be set in memory and easily retrieved.

One way I do this is by having my coding agent create a summary document, either in Markdown or Confluence, and using that to seed my prompt in the next session.
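As an illustration, a handoff summary might look something like this (the structure is my own convention, and the project details below are made-up examples, not a tool requirement):

```markdown
# Session summary — payments-service refactor

## What was done
- Extracted the charge client from the legacy billing module
- Added retry logic with exponential backoff

## Decisions made
- Kept the public API unchanged; deprecation comes in a later PR

## Open items for next session
- Unit tests for the retry path
- Update the README examples
```

Pasting (or pointing the agent at) a document like this at the start of the next session gives it everything it needs in a few hundred tokens instead of a sprawling chat history.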

AI DOES NOT REMEMBER: AI does not have a persistent working memory. Every time you ask something of your AI tool, it works from what's in the context window plus whatever it retrieves as relevant (which is also why specific prompts are important). Everything it takes in is stored in the context window so it can be reused should you need it later. Once that window fills up, or you move on to an unrelated question, the older detail gets truncated or compacted to make room for the new material.

If you then want to go back to your first problem, the agent has to retrieve all of that important information again, pushing the newer material out of its context in turn.

Small sessions are good sessions: While this may seem like a lot of work and less efficient at the outset, your output is going to be much more efficient. You'll need to bring your agent back on track less often, debugging problems in agent-generated code will be faster, and the overall chat history stays cleaner.

Utilise your Instructions.md: Most coding agents have a version of this file. For GitHub Copilot (GHCP) it's `.github/copilot-instructions.md`; for Claude Code it's `CLAUDE.md`.

Anything that's repeated across sessions should go into one of these files. This can include specific linting rules you want for your repo or specific coding practices that need to be remembered between sessions. Having that information in one of these files means you don't need to bring it into working memory each session; it's preloaded when you open chat or the CLI.
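As a sketch, a repo-level instructions file might hold the kind of repeated context described above (the specific rules here are invented examples, not recommendations):

```markdown
# Project instructions

- Use TypeScript strict mode; no `any` without a justifying comment.
- Run the repo's lint task before proposing a commit.
- Prefer small, focused functions; keep components short and single-purpose.
- Never edit files under `generated/` — they are build artifacts.
```

Because the agent reads this file automatically at the start of each session, these rules cost you a handful of tokens once, rather than being re-explained in every chat.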

Don't switch models in the same session: This might seem like a good idea for code review or to compare outputs, but as soon as a new model is loaded, the previous model's cached state is gone and the whole context has to be processed again from scratch. While it's valuable to review code or documentation with multiple models for new perspectives, this should be done in a new chat or context window.


As the models get more powerful and our workload with AI increases, it can be tempting to keep windows and chats open for hours, days or even weeks!

It's important to start working with these kinds of practices in mind now, while we're still figuring out how to use these tools effectively, before we're too overwhelmed and stuck in our ways.

Think of the AI tools like an online meeting; each new topic is a new meeting where the relevant team members are invited. You don't sit all day or all week in the same call switching between coding projects, repos and tickets with different people constantly jumping in and out.

AI works the same. The clearer and more focused the session, the better the output we are going to receive.
