I asked Cursor to clean up a utility file, expecting it to extract some constants and tighten up the formatting. I opened the file later and... all the comments were gone. WTF.
I went back through the diff and saw Cursor had stripped out comments about a deprecation timeline, a legal review warning, compliance notes, and step-by-step explanations for a really complicated auth flow. It kept a few JSDoc blocks and called it a day (??)
I did a test to see what was actually going on
I took a file with 41 comment lines: JSDoc, inline explanations, JIRA references, compliance notes, etc. It was a fake project, but realistic enough to work for this scenario. I ran the same prompt three times: "clean up this code."
Only 20% of comments survived. The model consistently killed JIRA references, date stamps, step-by-step explanations, anything (I assume) it decided was "redundant."
I figured maybe the phrasing mattered. "Clean up" sounds like permission to declutter, right? So I tried "refactor this code" instead and ran it three more times.
28% survival. Slightly better, but not meaningfully different. Both prompts treated comments as noise.
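If you want to run the same measurement, the metric was nothing fancy: comment-line counts before and after the edit. Here's a rough Python sketch of a counter for JS/TS-style comments (the sample snippet and the ticket number in it are made up for illustration):

```python
# Rough comment-survival metric: count comment lines before and after an edit.
# Simplified on purpose: handles // line comments and /* ... */ blocks,
# including JSDoc; ignores comments trailing code on the same line.
def count_comment_lines(source: str) -> int:
    count = 0
    in_block = False
    for line in source.splitlines():
        stripped = line.strip()
        if in_block:
            count += 1
            if "*/" in stripped:
                in_block = False
        elif stripped.startswith("//"):
            count += 1
        elif stripped.startswith("/*"):
            count += 1
            if "*/" not in stripped:
                in_block = True
    return count

def survival_rate(before: str, after: str) -> float:
    return count_comment_lines(after) / count_comment_lines(before)

# Made-up before/after pair (the JIRA ticket is illustrative)
before = """\
/**
 * Validates the session token.
 * JIRA: AUTH-1432 -- do not remove until Q3
 */
// compliance: logging required by SOC 2 audit
function validate(token) {}
"""
after = """\
/** Validates the session token. */
function validate(token) {}
"""
print(f"{survival_rate(before, after):.0%} of comment lines survived")  # -> 20% of comment lines survived
```

It's crude, but it's enough to spot a 20% survival rate in a diff.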
The fix
I added a single .mdc rule:
---
description: preserve comments during code modifications
alwaysApply: true
---
Always preserve all existing code comments during refactoring, cleanup, and optimization.
Then I used the same file and the same "clean up this code" prompt, three more runs for consistency.
41 out of 41 comments survived. Every single run...
In one run, the model actually extracted magic numbers into named constants on top of preserving everything. It explicitly said "per your preserve-comments rule" in its response. It's like the rule changed how the model thought about the task.
Your comments are clutter to an AI
Without the rule, Cursor treats "clean up" as "remove anything that looks like clutter." And to an AI that's processing tokens, your carefully written compliance note is clutter.
I guess it's optimizing for what it thinks you want. Cleaner, shorter code. The problem is that "cleaner" and "well-documented" are sometimes opposites, and the model will pick clean every time unless you tell it otherwise...
I can see a lot of people not catching this, because the diff just shows deleted lines. Comments disappear and nobody notices until someone needs to understand why the code does what it does.
Future-proofing against this issue
I'm using the preserve-comments rule to fix this. I also have cursor-lint checking that my .mdc files don't have broken frontmatter or a missing alwaysApply. I've been finding a lot of silent rule failures lately, and they're their own kind of hell.
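For context, the structural check is simple enough to sketch. This is my simplified illustration of what "broken frontmatter or missing alwaysApply" means, not cursor-lint's actual implementation:

```python
import re

# Simplified .mdc structural check (illustrative, not cursor-lint's code):
# a rule file should open with a ---...--- frontmatter block containing a
# description and an explicit alwaysApply value.
def lint_mdc(text: str) -> list[str]:
    problems = []
    match = re.match(r"^---\n(.*?)\n---\n", text, re.DOTALL)
    if not match:
        return ["missing or unterminated frontmatter block"]
    frontmatter = match.group(1)
    if "description:" not in frontmatter:
        problems.append("frontmatter has no description")
    if "alwaysApply:" not in frontmatter:
        problems.append("frontmatter has no alwaysApply -- rule may silently never fire")
    return problems

good = "---\ndescription: preserve comments\nalwaysApply: true\n---\nAlways preserve all existing code comments.\n"
bad = "---\ndescription: preserve comments\n---\nAlways preserve all existing code comments.\n"
print(lint_mdc(good))  # -> []
print(lint_mdc(bad))   # flags the missing alwaysApply
```

A rule that fails a check like this doesn't error out in Cursor; it just never applies, which is exactly the silent-failure mode I keep running into.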
If you want a starting point, I keep a collection of tested rules on GitHub. I just added the comment preservation one in there and there are around 70 others so far.
cursor-lint catches structural rule issues. If you want someone to review your full setup — rules, project structure, model settings — I do $50 async audits with a written report.
📋 I made a free Cursor Safety Checklist — a pre-flight checklist for AI-assisted coding sessions, based on actual experiments.
Top comments
Is this not more of a user error? "Clean up" is a vague command. While "refactor" might seem more defined to you, for an LLM it is still vague.
Why didn't you specify those tasks?
LLMs want to be as helpful as possible, so they are going to make every change they were trained to see as a good action.
It seems you command an LLM as you would a person who knows what you mean when you say "clean up." An LLM, even within a session, is like a goldfish with a five-second memory (which is false, but it is a good mnemonic).
The problem with the rule is that an LLM will never touch comments.
A better option would be to tell the LLM which comments it needs to keep. It is also a good reflection point when the code base has a lot of comments that lack context.
Maybe the comments could be documentation or should be part of the version system commit.
Blaming the tool is a cheap way out.
Yes, "clean up" is vague, and yeah, I could've been more specific with the prompt. But that's kind of the point I'm trying to make: most people ARE using vague prompts like that, and it turns out the default behavior is to nuke comments without warning.
Regarding the rule making the LLM never touch comments: that's not quite what happened. With the preserve rule, all 41 survived on "clean up this code." Without it, maybe 20% survived. It changed how the model weighted them during the task (if that's the right phrasing).
It's true that some comments should be docs or commit messages instead, but inline compliance notes and JIRA references exist for a reason in a lot of codebases, and right now they're silently disappearing.
Sure, people, including myself, make vague prompts. But when it comes to coding, nothing should be vague. When clients come to you with a vague request, as a developer you try to find out what they want so you can make an estimate. AI didn't change that process; otherwise you are now the middleman that can be cut out.
About the comments: I think Jira ticket references should live in the git commits; they're metadata.
I'm not sure why the compliance comments got removed.
LLMs have their biases because of their training. So it is up to you to find out how to format comments so they're kept, or to train the model to keep them. It is not a magical solution that knows your preferences.
We all have to learn to work with AI, it is not going back in the box. The biggest problem I see is that people try to fix things in a less than ideal way. That is why I add suggestions most of the time when I comment.
Yeah, I could be more specific every time, but I think the default behavior still matters. Even if the fix is just being more precise, it's good to know there are things we think of as basic or expected where the model doesn't behave predictably.
I hadn't thought about diffing comment counts as a gate. A @keep prefix with a pre-commit hook is more targeted than a blanket preserve rule. I expect the other AIs show the same underlying pattern with any model that learned from stripped-down training data.
This bit me hard on a Next.js project a couple months back. Had // TODO: remove after Q2 migration comments scattered around, and after a Cursor refactor session they were all gone. Didn't notice until a teammate asked why we were still shipping the old auth flow.
The .mdc rule approach is smart. I've been doing something similar but more granular: I prefix critical comments with // @keep: and have a pre-commit hook that diffs comment counts. If the count drops by more than 10% it blocks the commit and makes you confirm. Crude but effective.
One thing I'd add: this isn't just a Cursor thing. I've seen the same behavior with Copilot's inline suggestions silently eating comments when you accept multi-line completions. It's a broader pattern in how models learned to treat comments during training.
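That comment-count gate could look roughly like this. This is a simplified sketch, not the actual hook: JS-style comment matching only, with staged and HEAD contents pulled via `git show`; the function names and wiring are illustrative.

```python
import re
import subprocess

# Matches //, /* and leading-* continuation lines only (simplified on purpose).
COMMENT_RE = re.compile(r"^\s*(//|/\*|\*)")

def comment_lines(text: str) -> int:
    return sum(1 for line in text.splitlines() if COMMENT_RE.match(line))

def check(path: str, threshold: float = 0.10) -> bool:
    """Return False (block the commit) if the staged version of `path`
    lost more than `threshold` of its comment lines versus HEAD."""
    head = subprocess.run(["git", "show", f"HEAD:{path}"],
                          capture_output=True, text=True).stdout
    staged = subprocess.run(["git", "show", f":{path}"],
                            capture_output=True, text=True).stdout
    before, after = comment_lines(head), comment_lines(staged)
    if before and (before - after) / before > threshold:
        print(f"{path}: comment lines dropped {before} -> {after}, blocking commit")
        return False
    return True
```

To wire it up, a .git/hooks/pre-commit script would run check() over the files from `git diff --cached --name-only` and exit non-zero on any failure.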
The "clean up" vs "refactor" distinction is interesting, but I think the real lesson is: AI will always interpret vague instructions in unpredictable ways, and we shouldn't have to be perfect prompters to not lose work.
What changed this for me was treating every AI edit session like a potentially destructive operation. Auto-commit before the AI touches anything. Diff review after. If it deleted something it shouldn't have, revert and try again. It adds maybe 30 seconds of overhead but it's saved me from exactly this kind of thing more than once.
The xwero vs nedcodes debate here — "be more specific" vs "defaults should be saner" — I think both are right. But the saner default is at the workflow level: always have a rollback point. Don't rely on either your prompts or the AI's interpretation being perfect.
yeah the vague instruction thing is the core of it. 'clean up' means different things to different people, and the model just picks whatever interpretation costs the least effort. the fix that worked for me was being painfully specific about what 'clean up' means in the rule itself.
I suspect this could also be a product of a lower-end model, not just Cursor itself: while Opus and GPT-5.2 usually add comments to document the code, in my experience the same tasks given to gpt-oss-20B actually strip out comments. I suspect this might be due to model limitations: the data that went into training likely contained just meaningful code fragments to reduce its size.
For what it's worth :)
That's interesting and worth looking into. In this case I ran the comment deletion test on whatever Auto picked. I have done rule compliance tests across Sonnet 4.5, Gemini 3 Flash, and GPT-5.1 Codex Mini and they were identical there, but that was a different experiment.
But what you're saying makes sense; if the training data was mostly stripped-down code, the model just learns "oh, comments are noise, I'll get rid of them".