CharmPic

[For Beginners] A Guide to Tokens and Context in LLMs (like ChatGPT)

This is a guide for anyone wondering, "What are tokens and context?"

1) What are Tokens?
A token is the basic unit for measuring the size of messages you exchange with an LLM like ChatGPT.
To put it simply, a 100-character message might be on the order of 100 tokens, and a 1,000-character message on the order of 1,000 tokens.
It's actually more complex than that (for example, English often uses fewer tokens per character, while languages like Japanese can use more), but for now, it's okay to think of it as being roughly proportional to the number of characters you send.
Tokens are counted not just for what you send (the prompt) but also for what you receive (the response), so you need to consider both.

This is important because API usage fees are usually calculated based on the total number of tokens.
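To make the cost point concrete, here is a minimal sketch in Python. The ~4-characters-per-token heuristic and the per-1,000-token price are illustrative assumptions, not real pricing; a real tokenizer (such as the tiktoken library) gives exact counts.

```python
# Rough sketch of token counting and cost estimation.
# The 4-chars-per-token heuristic and the price are assumptions.

def estimate_tokens(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def estimate_cost(prompt: str, response: str,
                  price_per_1k_tokens: float = 0.002) -> float:
    """Both the prompt you send and the response you receive are billed."""
    total = estimate_tokens(prompt) + estimate_tokens(response)
    return total / 1000 * price_per_1k_tokens

prompt = "Summarize the plot of Hamlet in one sentence."
response = "A Danish prince avenges his father's murder at great cost."
print(estimate_tokens(prompt), estimate_tokens(response))
```

Note that the cost function adds prompt and response tokens together, matching how most LLM APIs bill usage.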

2) What is Context?
It's the LLM's memory space! (Also known as the "Context Window").
The bigger the context window, the more the LLM can "remember" from the conversation.
Conversely, if it's small, it forgets things quickly. It typically discards the oldest information first when it runs out of space.
However, bigger isn't always better. An overly large context window can cause the LLM to take longer to process your request. Balance is key.
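The "discard the oldest first" behavior can be sketched like this. The word-count tokenizer and the 50-token budget are stand-ins for illustration, not how any real system tokenizes.

```python
# Minimal sketch of how a chat client might drop the oldest messages
# once the conversation exceeds a context budget.

def count_tokens(message: str) -> int:
    return len(message.split())  # crude stand-in for a real tokenizer

def trim_history(history: list[str], max_tokens: int = 50) -> list[str]:
    """Discard the oldest messages until the history fits the budget."""
    trimmed = list(history)
    while trimmed and sum(count_tokens(m) for m in trimmed) > max_tokens:
        trimmed.pop(0)  # oldest message is forgotten first
    return trimmed
```

This is why an LLM in a long chat can suddenly "forget" something you said near the beginning: those messages were the first to be trimmed.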

3) How an LLM Interaction Actually Works

Turn 1
You: "Hello"
ChatGPT: "Hello"

Turn 2
You: "How are you?"
ChatGPT: "I'm doing great!"

Turn 3
You: "That's good to hear."
ChatGPT: "It really is!"

Here's the secret: for the second turn, the entire first exchange is sent to the model again, together with your new message.

Turn 1
You: "Hello"
ChatGPT: "Hello"

Turn 2
You: "How are you?"

↑ All of this is sent to the model for the second turn.

And for the third turn, the complete history of turns 1 and 2 is sent along with your new message.

Turn 1
You: "Hello"
ChatGPT: "Hello"

Turn 2
You: "How are you?"
ChatGPT: "I'm doing great!"

Turn 3
You: "That's good to hear."

This is why, as a conversation gets longer, the number of tokens rapidly increases, and processing can become slower and more expensive.
It's something to be especially mindful of when using the API.
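The growth described above can be sketched by replaying the three turns and counting what gets sent each time. The word-count tokenizer is a crude stand-in for a real one.

```python
# Sketch of why tokens grow quickly: each call resends the whole history.

def tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

turns = [
    ("Hello", "Hello"),
    ("How are you?", "I'm doing great!"),
    ("That's good to hear.", "It really is!"),
]

history = []
for user_msg, reply in turns:
    # Everything sent this turn: all prior messages plus the new one.
    sent = sum(tokens(m) for m in history) + tokens(user_msg)
    print(f"Turn {len(history) // 2 + 1}: {sent} tokens sent")
    history += [user_msg, reply]
```

Even in this tiny example, the amount sent grows every turn, because each new request carries the full transcript with it.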

What about the ChatGPT Web Interface?
We don't actually know the exact mechanics of the web version. With the API, the process is clear, but the detailed inner workings of the web interfaces for ChatGPT, Gemini, etc., are kept secret.
However, it's safe to assume that if you keep talking in a single chat window, the amount of data the LLM has to process increases, which can slow things down.
Because of this, it's a good practice to start a new chat after your conversation gets long.

Pro-Tip: If you want to continue a long conversation in a new chat, you can ask the LLM:
"Please summarize our conversation so far. I want to continue it in a new chat."
It will give you a summary of your discussion. Just paste that summary into a new chat window, and you're good to go. It might not be a perfect transfer of context, but it's a very useful trick!
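The hand-off can be sketched as a messages list in the role/content format common to chat APIs; the summary text and prompts here are made up for illustration.

```python
# Sketch: seed a fresh chat with a summary instead of the full history.
# The summary string and wording are illustrative, not real output.

summary = "We discussed tokens, context windows, and API costs."

new_chat = [
    {"role": "system",
     "content": f"Context from a previous chat: {summary}"},
    {"role": "user",
     "content": "Let's continue where we left off."},
]
```

The new conversation starts with a few dozen tokens of summary instead of thousands of tokens of transcript, which is the whole point of the trick.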
