Why Your AI is a Goldfish
Part 2 of Knitli's 101 introductions to AI and the economics of AI
tl;dr
Large language models (LLMs) use a fixed-size context window to process input and generate responses, but they don't have memory like humans.
The context window contains all the information the model can consider at once, and when it overflows, older information is lost.
LLMs are trained on outdated data, leading to a preference for older information and potential hallucinations when asked about unknown topics.
Context management is crucial, as including too much or irrelevant information can hinder response accuracy.
Engineers use techniques like prioritizing recent information and filtering out irrelevant details to manage context effectively.
LLMs and Their 'Memories'
When people talk about 'AI' today, they usually mean ChatGPT, Claude, or Gemini. These tools all use large language models (LLMs). LLMs consist of billions of parameters: think of each parameter as a number in a massive mathematical equation. The model combines these parameters with your input to generate responses. It's a huge statistical machine, predicting the most likely output based on its training parameters and the context you gave it (intentionally or otherwise).
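If you want to see that idea in miniature, here's a toy Python sketch of what "predicting the most likely output" means. The words and probabilities below are completely made up; a real model scores tens of thousands of possible next tokens, not four.

```python
import random

# Toy illustration of "predicting the most likely output". Given the text so
# far, a real model assigns a probability to every possible next token; here
# the candidates and their probabilities are invented for the example.
prompt = "the cat sat on the"
next_token_probs = {"mat": 0.55, "floor": 0.25, "roof": 0.15, "moon": 0.05}

tokens = list(next_token_probs.keys())
weights = list(next_token_probs.values())

# Sampling from the distribution: usually "mat", occasionally something else,
# which is part of why the same prompt can produce different answers.
next_token = random.choices(tokens, weights=weights, k=1)[0]
print(prompt, next_token)
```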
The Context Window 'Container'
LLMs don't remember like humans do (or at all). Instead, they work with a fixed-size container called a context window. Everything you send the model (every word, file, or bit of data) and all of the model's previous responses fill this container. When the container overflows, the oldest information disappears. The model can no longer see it, even if you can still see it on your screen.
Think of it this way: the context window contains all the information the LLM can consider at once. Any information not in the window, or that doesn't fit in it, doesn't exist to the model.
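Here's a deliberately tiny Python sketch of that container. Real models count subword tokens rather than whole words, and their truncation strategies vary, so treat this as an illustration of the principle rather than how any specific model works:

```python
from collections import deque

# Toy context window with a tiny budget so the overflow is easy to see.
# Real models measure subword tokens, not whole words, but the principle
# is the same: once the budget is exceeded, the oldest content is gone.
CONTEXT_BUDGET = 12

window = deque()  # (speaker, word) pairs, oldest first

def add_to_context(speaker: str, text: str) -> None:
    """Append new text, evicting the oldest words when we overflow."""
    for word in text.split():
        window.append((speaker, word))
        while len(window) > CONTEXT_BUDGET:
            window.popleft()  # the oldest information disappears

add_to_context("user", "my name is Alex and I love goldfish")
add_to_context("assistant", "nice to meet you, I love goldfish too")
add_to_context("user", "please remind me what my name is")

# By this point "Alex" has already been evicted: the model literally
# cannot see it anymore, even though it's still on your screen.
print(" ".join(word for _, word in window))
```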
Time Bias: Why LLMs Live in the Past
The context window is the only way to provide LLMs with recent or specific information. Companies train these models on huge datasets, but collecting and processing this data takes years. Most of the training data is 2-3 years old or even older. A model might say it was trained up to a month or two ago, but recent information is only a tiny part of its overall training data. Most of what it knows is outdated.
For instance, if you ask an LLM about a programming framework released last month, it won't know about it unless you include the documentation in your context window. The model's training data just doesn't have that recent information. This leads to a strong preference for older information, even when newer details might be more important.
It's also important to note that if you ask an LLM about something it wasn't trained on and has no context for, it will likely produce hallucinations. A hallucination occurs when the LLM generates false or made-up information that can sound real or nearly true. This happens because you've asked for information the model doesn't have, so it generates something plausible-sounding from the patterns in its training data.
Model training is permanent. Context is temporary.
Your AI Friend is a Goldfish: It Has No Memory
Models can't save their context between messages. The system that feeds data to the LLM rebuilds and reprocesses the entire context history every single turn. This process of generating output from the input combined with the model's training is called inference.
Here's what actually happens: You send "hey there" to ChatGPT. Your buddy ChatGPT replies with a friendly response. When you send your second message, the model doesn't just process that new message; it processes your first message, its first response, AND your second message all at once. This context grows with each exchange.
The model treats this entire conversation thread as one giant input until the window reaches its limit and forces older turns to drop. That's why responses can change tone or forget earlier details as conversations grow longer.
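In code, that loop looks roughly like this. The `call_model` function is a stand-in for whatever API you use, and the message format is a common convention rather than a specific vendor's SDK:

```python
# Illustrative chat loop. `call_model` is a placeholder for an LLM API call;
# the key point is that the whole history is sent again on every turn.

def call_model(messages: list[dict]) -> str:
    """Placeholder: send the full message list to a model, get text back."""
    raise NotImplementedError  # imagine a real API call here

messages: list[dict] = []  # the growing conversation thread

def send(user_text: str) -> str:
    messages.append({"role": "user", "content": user_text})
    # The ENTIRE history (your first message, the model's first reply, and
    # everything since) is reprocessed from scratch on every turn.
    reply = call_model(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply

send("hey there")         # the model processes 1 message
send("how's it going?")   # the model processes 3 messages, not 1
```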
The Context Window Paradox
Context window sizes have grown dramatically. A few years ago, models handled only a few thousand tokens (8,192 or 16,384 was typical), roughly a few thousand words. Today's top models can process 128,000 to 2 million tokens worth of information.
Bigger windows allow more context, but they create new problems. Fill a window with irrelevant information, and you're giving the model junk data that makes accurate responses harder. Processing large contexts also takes more time and costs more money.
This creates a paradox: any information you exclude might be crucial, but including too much information can poison the model's ability to respond accurately.
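Some back-of-the-envelope arithmetic makes the cost side concrete. The price below is a made-up placeholder, not any provider's actual rate, but the linear relationship is the point: because the full context is resent every turn, a big window costs you on every single exchange.

```python
# Back-of-the-envelope input cost. The price is a hypothetical placeholder;
# real per-token pricing varies widely by provider and model.
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # invented dollars

def input_cost(tokens: int) -> float:
    return tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS

# Because the full context is resent on every turn, you pay this again
# and again as the conversation grows.
print(f"{input_cost(8_000):.2f}")      # short prompt:          ~$0.02
print(f"{input_cost(128_000):.2f}")    # large window, filled:  ~$0.38
print(f"{input_cost(1_000_000):.2f}")  # million-token context: ~$3.00
```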
The Context Poisoning Problem
Most current tools don't handle context well. For coding tasks, many systems add everything that might be relevant into the model's context without careful selection. They might include:
Entire codebases when only a few functions are needed
Outdated documentation along with current specs
Error logs mixed with successful runs
Multiple conflicting examples
This adds more confusion than clarity, making it difficult for the model to find useful information among irrelevant details.
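Here's a sketch of the difference between "dump everything" and careful selection. The file names and the relevance rule are invented for illustration; real tools use search, embeddings, or dependency analysis to decide what actually matters.

```python
# Invented file names and a toy relevance rule, just to show the contrast
# between stuffing everything in and selecting what the task needs.
repo_files = {
    "auth/login.py": "...",
    "auth/tests/test_login.py": "...",
    "legacy/old_auth_v1.py": "...",    # conflicts with the current code
    "docs/auth_v1.md": "...",          # stale documentation
    "docs/auth_v2.md": "...",          # current documentation
    "logs/ci_run_2023.txt": "...",     # noise for most tasks
}

# Naive tool: concatenate the whole repo into the prompt.
naive_context = "\n".join(repo_files.values())

# More careful tool: keep only what matters for "fix the login bug",
# and drop anything known to be outdated.
def relevant(path: str) -> bool:
    return "auth" in path and "legacy" not in path and "v1" not in path

curated_context = "\n".join(
    content for path, content in repo_files.items() if relevant(path)
)
```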
Why This Matters for You
Understanding context windows helps explain common AI frustrations:
Why an AI assistant "forgets" something you mentioned earlier in a long conversation
Why providing too much background information sometimes makes responses worse
Why the same prompt can give different results depending on what else is in the context (since models are probabilistic, meaning they create output based on statistical likelihood, even the exact same context can lead to different results).
Why AI coding tools sometimes suggest outdated approaches despite having access to current documentation
Working Around the Limitations
Engineers use several techniques to manage context effectively, like prioritizing recent information, summarizing older exchanges, and filtering out irrelevant details. But these approaches have their own trade-offs and limitations.
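Here's a rough sketch of what those techniques can look like stitched together. Every helper below is a placeholder (real systems use a proper tokenizer, and summarization is often another LLM call), but the shape of the strategy is the same.

```python
# Sketch of a simple context-management policy. All helpers are placeholders,
# not a real library.

def count_tokens(text: str) -> int:
    """Placeholder token counter (real systems use the model's tokenizer)."""
    return len(text.split())

def summarize(turns: list[str]) -> str:
    """Placeholder summarizer: crude truncation standing in for an LLM call."""
    return "Earlier conversation, summarized: " + " / ".join(t[:40] for t in turns)

def build_context(turns: list[str], budget: int, keep_recent: int = 4) -> str:
    # (Filtering out irrelevant turns would typically happen before this step.)
    # Prioritize recent information: the newest turns stay word for word.
    recent, older = turns[-keep_recent:], turns[:-keep_recent]
    # Summarize older exchanges into one short block instead of dropping them.
    parts = ([summarize(older)] if older else []) + recent
    context = "\n".join(parts)
    # Last resort: if the result still doesn't fit, trim from the front.
    while count_tokens(context) > budget and len(parts) > 1:
        parts.pop(0)
        context = "\n".join(parts)
    return context
```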
The bottom line: "context is king" for LLMs. Feeding the right amount of the right information in the right order matters more than raw context window size. This makes context management the central engineering challenge for anyone building with LLMs.
Our next post in this series will explore current solutions to these context problems including their strengths, weaknesses, and why even the best approaches today aren't quite "good enough" for complex, long-running tasks.
Visit us at knitli.com to learn how we're fixing the context problem, and sign up for our waitlist!