Executive summary
If there is one thing you should take away from this document, it is to "Document what you are doing in a separate file."
If you have time for a second thing, it is "Add context manually."
Understanding Context.
Context window
First, what is a context window? And how does it relate to coding effectively with Cursor?
A context window is all the context an LLM has when answering a prompt.
In Cursor, "context" is the information that is provided to the model. There are two types of context:
Intent context defines what the user wants to get out of the model. For example, a system prompt usually serves as high-level instructions for how the user wants the model to behave. Most of the "prompting" done in Cursor is intent context. "Turn that button from blue to green" is an example of stated intent; it is prescriptive.
State context describes the state of the current world. Providing Cursor with error messages, console logs, images, and chunks of code are examples of context related to state. It is descriptive, not prescriptive.
Providing context to Cursor.
When answering a prompt, Cursor tries to find context by itself. It does this with a proprietary model that pulls in the parts of your codebase it estimates are relevant, such as the current file, semantically similar patterns in other files, and other information from your session. It may also use "tools" and MCPs, like web search, to find relevant context.
This proprietary model is pretty good, but it falls short when dealing with complex problems.
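To make the retrieval idea concrete, here is a minimal sketch of how embedding-based retrieval generally works. This is not Cursor's actual model; the `embed` callable is a hypothetical stand-in for whatever embedding provider you use.

```python
# Conceptual sketch of embedding-based context retrieval (NOT Cursor's actual code).
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(prompt: str, chunks: dict[str, str], embed, top_k: int = 5) -> list[str]:
    """Return the names of the top_k code chunks most similar to the prompt."""
    prompt_vec = embed(prompt)
    scored = [(cosine(prompt_vec, embed(text)), name) for name, text in chunks.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]
```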
I ran an experiment to show where it falls short. The experiment is a variation of the "needle in a haystack" test: I placed a riddle in our code and asked Cursor to solve it.
Context experiment
The experiment consisted of adding a .txt file of ~25k characters (~6k tokens) of random text, with a riddle hidden inside it. In this case, the riddle was a Caesar cipher, with the key at the end of the file. A simplified sketch of the setup is shown below.
- If anyone wants to see the whole experiment, please comment or DM me.
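A haystack file like the one described can be generated with a few lines of Python. This is a simplified reconstruction of the setup, not the exact script from the experiment; the riddle text, key, and filename are placeholders.

```python
# Simplified reconstruction of the needle-in-a-haystack setup (not the exact script).
import random
import string

def caesar(text: str, key: int) -> str:
    """Shift alphabetic characters by `key` positions."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('a') if ch.islower() else ord('A')
            out.append(chr((ord(ch) - base + key) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

random.seed(42)
key = 7
riddle = caesar("the answer is swordfish", key)  # placeholder riddle text

noise = lambda n: "".join(random.choices(string.ascii_lowercase + " ", k=n))
haystack = noise(12_000) + "\n" + riddle + "\n" + noise(12_000) + f"\nkey={key}\n"

with open("haystack.txt", "w") as f:  # ~25k characters, roughly 6k tokens
    f.write(haystack)
```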
The results of the experiment were:
* Gemini 2.5 Pro
  - Finds the file correctly ❌
  - Gets the answer right ❌
* Claude 4 Sonnet
  - Finds the file correctly ✅
  - Gets the answer right ✅
As we can see, even with only ~6k tokens of added context, the Gemini model can't figure out the answer to the riddle. (6k tokens is roughly 850 lines of code; ~4 characters ≈ 1 token.)
- I haven't confirmed this, but I am fairly sure that if I made the puzzle more intricate, with more text and more cross-references, the model would break down.
This is relevant because when Cursor looks for the best way to answer your prompt, it will search online and look at all your files, but it won't actually read them UNLESS you pass the file into context. (You can test this by checking how many tokens are sent when you use a custom API key in Cursor.)
How much context do we have?
In Cursor, Gemini and Claude have 120k tokens of context. That might seem like a lot, but remember our experiment with ~6k tokens of context? Models can degrade with far smaller contexts than that.
Here are the approximate context sizes of a few things, for anyone interested (a quick way to estimate this yourself follows the list):
- An average Stack Overflow page is ~15k tokens.
- The code files in a mid-sized company add up to at least a few million characters, which is at least ~0.75 million tokens.
- An average file is about 4,721 characters ≈ 1k tokens.
- Large files at an average company are ~30k characters ≈ 8k tokens.
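If you want to check how much of the budget a given file would eat, the 4-characters-per-token rule is easy to script. The sketch below uses that rough heuristic (a real tokenizer would be more accurate); the file path is just an example.

```python
# Rough token estimate for a file, using the ~4 characters per token heuristic.
from pathlib import Path

CHARS_PER_TOKEN = 4          # heuristic from above; real tokenizers vary
CONTEXT_BUDGET = 120_000     # tokens available in Cursor for Gemini/Claude

def estimate_tokens(path: str) -> int:
    text = Path(path).read_text(encoding="utf-8", errors="ignore")
    return len(text) // CHARS_PER_TOKEN

if __name__ == "__main__":
    tokens = estimate_tokens("src/main.py")   # example path
    print(f"~{tokens} tokens, {tokens / CONTEXT_BUDGET:.1%} of the context budget")
```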
How Cursor handles a prompt.
This is how Cursor handles a prompt (a conceptual sketch follows the steps):
- You write your prompt and add your context. Then press enter.
- Cursor uses its proprietary model to analyze your prompt and context. You haven't hit the AI model yet (Gemini, Claude, etc.).
- There is no official documentation that says this directly, but there are allusions to it in Cursor's official docs and in Anthropic's system prompt docs (and the Anthropic system prompt that leaked around May 2025).
- Cursor uses a proprietary LLM or algorithm to decide which tools and MCPs to call.
- Again, this isn't officially documented; same caveat and sources as above.
- Cursor takes the context you added, AND whatever it thinks is the relevant context, and appends it to the API request.
- Cursor also adds a small system prompt (I can't find official docs, but Cursor adds roughly 1k tokens to every request; I assume those are its custom instructions).
- The API/LLM then processes the prompt, and you receive the answer.
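None of this is official, but conceptually the request that finally reaches the LLM looks roughly like the sketch below. The structure and numbers are illustrative assumptions drawn from the steps above, not Cursor's real implementation.

```python
# Conceptual sketch of the assembled request (NOT Cursor's actual implementation).
def build_request(user_prompt: str, user_context: list[str], retrieved_context: list[str]) -> dict:
    system_prompt = "<Cursor's hidden instructions, ~1k tokens>"  # assumption from the step above
    context_blob = "\n\n".join(user_context + retrieved_context)
    estimated_tokens = (len(system_prompt) + len(context_blob) + len(user_prompt)) // 4
    return {
        "system": system_prompt,
        "messages": [{"role": "user", "content": context_blob + "\n\n" + user_prompt}],
        "estimated_tokens": estimated_tokens,   # must stay well under the 120k budget
    }
```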
How do I improve my prompts? Keep track of your decisions in a document.
Finally, the good part. Here is what to do if you want to significantly improve the context that you send Cursor:
- Make one or two documents, depending on how big your task is. I usually call them "PLAN.md" and/or "TASKS.md" (a minimal template sketch follows this list).
- Write down the problem, the objective, any insights and thoughts you have, and a plan in "PLAN.md". You can use AI to help you write the plan; it's usually useful to ask the AI for "atomic" steps, so it gives you small steps to follow.
- Start working on the problem. Every time you make progress (progress might be a new insight, a new piece of code, etc.), write down what you did in "TASKS.md"; it's even better if you can reference the corresponding step in "PLAN.md" too. Then commit.
- Keep going. Remember to write down new insights, failed attempts, and sources in your documents too. This development loop forces two things:
- You will think more about what you are doing, which helps a lot as a developer.
- The AI will keep close track of context, and you can easily revert and correct if it goes off track. The AI will also be way better at problem-solving.
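As a starting point, here is one way such documents could look. The headings and the example task are only a suggestion; adapt them to whatever you are working on.

```markdown
<!-- PLAN.md -->
# Problem
Checkout page crashes when the cart is empty.

# Objective
Handle the empty-cart case gracefully and add a regression test.

# Plan (atomic steps)
1. Reproduce the crash locally and note the stack trace.
2. Add a guard for an empty cart in the checkout handler.
3. Write a test covering the empty-cart path.

<!-- TASKS.md -->
- [x] Step 1: reproduced; crash comes from indexing an empty cart list.
- [ ] Step 2: ...
```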