The Thinking Inside the LLM Clichés

#llms #writing #aiprompts #cliches

There is a list every editor in 2026 knows: "Let's delve into…", "In the era of digitalization…", "It's not just X, it's Y.", "Unlock the potential of…", "In today's fast-paced world…", "revolutionary", "innovative", "cutting-edge". These are the phrases that mark a piece of writing as language-model output before you have read the next paragraph, and the standard editorial reflex is to ask the author to take them out before the piece goes to print.

The reflex is well-founded as far as it goes. The phrases are tired. They cluster in obvious places. They are also part of why a sentence reads as machine-written rather than human-written, and that is currently a social signal you can lose readers over.

This piece is about the second thing that happens when you ban them: something much less obvious, which I think most of the conversation about editing out the clichés has missed.

For a human writer, "not X, it's Y" is a stylistic flourish. The thought beneath it can be expressed a dozen ways. The same goes for "provided that", "in particular", or "if and only if": for a human, these are options on a menu, and the choice among them is largely aesthetic.

For a language model, "not X, it's Y" is the way it executes a particular kind of thinking. Banning the phrase does not move the thinking to a different construction. It blocks the thinking.

The experiment anyone can run

The original-language version of this argument, written by a practitioner who was working on a book about prompting, includes a small replicable test: three prompts, each in a fresh session on the same model (Gemini in the original; it also works on Claude, GPT, and most production models I have tried):

Explain the difference between overfitting and underfitting.
Explain the difference between overfitting and underfitting, but do not use the construction "not X, but Y".
Explain the difference between overfitting and underfitting, explicitly contrasting the two concepts.

The first prompt and the third prompt produce good answers. They cover the bias-variance tradeoff. They describe the causes of each failure mode. They walk through mitigation strategies.

The second prompt — the one that bans the contrasting construction — produces an answer that drops the bias-variance tradeoff almost entirely. The mitigation strategies become vaguer. The causes get conflated. The model has been asked to explain the difference between two things using only the syntactic tools that do not make differences explicit, and it has accordingly produced an explanation in which the differences are less explicit.

That is the part of the result that should be unsettling. The model did not refuse the request. It complied. The output was grammatical, fluent, recognizably about the topic, and substantively worse than the unconstrained version. The ban did not just change the phrasing. It changed what the model could think about.

You can run this on any model you have access to. The size of the gap varies; the direction does not.
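For anyone who wants to script the test rather than paste prompts by hand, here is a minimal Python sketch. It assumes you supply a `generate(prompt)` callable that opens a fresh session on your model and returns the reply as a string; that callable, the `run_experiment` and `concept_coverage` helpers, and the keyword list are hypothetical illustrations, not part of the original test. The keyword check is only a crude proxy for "did the answer cover the bias-variance tradeoff and mitigations"; reading the three answers side by side is still the real comparison.

```python
from typing import Callable, Dict

TOPIC = "Explain the difference between overfitting and underfitting"

# The three prompt variants from the experiment. Each must be sent in a
# fresh session so no variant sees another variant's context.
PROMPTS: Dict[str, str] = {
    "baseline": f"{TOPIC}.",
    "banned": f'{TOPIC}, but do not use the construction "not X, but Y".',
    "contrast": f"{TOPIC}, explicitly contrasting the two concepts.",
}

# Hypothetical proxy terms for content the ban tends to drop. Substring
# matching is crude (e.g. "bias" also matches "unbiased").
KEY_TERMS = ("bias", "variance", "regularization", "cross-validation")


def concept_coverage(answer: str) -> Dict[str, bool]:
    """Report which proxy terms appear in an answer, ignoring case."""
    text = answer.lower()
    return {term: term in text for term in KEY_TERMS}


def run_experiment(generate: Callable[[str], str]) -> Dict[str, Dict[str, bool]]:
    """Send each prompt once through `generate` and score each reply.

    `generate` is a caller-supplied (hypothetical) function assumed to
    open a fresh session per call and return the reply text.
    """
    return {name: concept_coverage(generate(prompt))
            for name, prompt in PROMPTS.items()}


if __name__ == "__main__":
    # Dry run with an echo stub to check the plumbing; swap in a real
    # model client to run the actual experiment.
    for name, hits in run_experiment(lambda prompt: prompt).items():
        print(f"{name:>8}: {sum(hits.values())}/{len(hits)} terms")
```

The harness deliberately reports coverage per prompt instead of a single score, because the interesting signal is the gap between the banned variant and the other two.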

Why this happens

A language model does not have a separate cognitive faculty that produces ideas and a language faculty that then expresses them. It has one stack of weights that does both. During training it learned a distributional pattern in which certain syntactic frames are reliably associated with certain conceptual operations: "because" carries causal inference, "not X but Y" carries category boundaries, and "provided that" carries conditional reasoning.

When you prompt a model to perform one of those conceptual operations, the syntactic frame is part of how the operation gets activated. The frame is the calling convention, not decoration layered on top of the operation. Suppress the frame, and the operation becomes harder to invoke. The model finds neighboring frames that approximate it, but the approximation is fuzzy in exactly the way that shows up as a vaguer explanation.

This is not how language works for a human writer. A human writer can hold the thought "overfitting and underfitting fail in opposite directions" in their head and choose any of dozens of grammatical shells to communicate it. The shell and the thought are separable. For a language model, training its outputs to avoid a syntactic shell is at least partly training it to avoid the thought.

How much it matters depends on the operation. For "let's delve into", the cost is small: "let's examine", "let's consider", and "let's look at" all carry roughly the same semantic load, so the frames are interchangeable. For "not X but Y", the cost is large, because no neighboring frame expresses a sharp category boundary equally well, and the obvious alternatives ("X is different from Y", "X and Y differ") are weaker carriers.

What is safe to ban, and what is not

The split is not between good writing and bad writing. It is between decorative modifiers and relational operators.

Decorative modifiers are word-level choices that carry no conceptual structure. "Revolutionary", "innovative", "cutting-edge", "world-class", "unprecedented": these are intensifiers that inflate a claim without relating it to anything else. Banning them by prompt costs almost nothing in the model's ability to think. The model writes the same paragraph without them, slightly less puffed up. Editor, prompt engineer, and reader all win.

Relational operators are structural choices that encode how two ideas connect. They are the load-bearing syntax of thought itself. Grouped into broad families:

Operator family	Examples	What the model does with it	What you lose if you ban it
Opposition / boundary	not X but Y; X, in contrast to Y; X, whereas Y	Forms a sharp boundary between two concepts	The clean expression of categorical differences; the model softens into "X and Y are different in some ways"
Causal chain	because; therefore; hence; if … then	Forces explanation rather than enumeration	Reasoning collapses into list-of-facts; the why drops out
Hierarchy / generalization	in particular; for example; that is; from the standpoint of X	Moves between general and specific; checks understanding by restatement	The model loses the ability to shift between levels of detail; explanations stay either too abstract or too concrete
Limitation / qualification	provided that; except for; only if; unless	Introduces invariants and exclusions	The model produces unqualified claims, omitting the conditions under which they hold
Temporal sequence	first … then … finally; before; after	Imposes ordering	Causal-sequence reasoning blurs into co-occurrence
Modality	probably; certainly; as a rule, but not always	Calibrates confidence	The model speaks in one flat register; the reader cannot tell fact from speculation

The decorative-modifier list is short and the relational-operator list is long, and that asymmetry reflects how language works: most of what reads as style in writing is, at the relational layer, content. Editors who have internalized this distinction in their own work have always known which line edits change the meaning and which ones don't. The prompt engineer choosing what to ban via system instruction is making exactly the same call, on the same line, with different tools.
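To make the table actionable before a ban list goes into a system prompt, a small audit can sort proposed bans into buckets. This is a minimal sketch: `RELATIONAL_OPERATORS` copies the example phrases from the operator-family table, `DECORATIVE_MODIFIERS` copies the intensifiers named in this piece, and `audit_ban_list` is a hypothetical helper. Anything on neither list is sent to a "review" bucket rather than guessed at, since an unfamiliar phrase may well be a relational operator the table does not mention.

```python
import re
from typing import Dict, List

# Example phrases from the operator-family table. Banning these in a
# generation prompt risks blocking the reasoning they carry.
RELATIONAL_OPERATORS: Dict[str, List[str]] = {
    "opposition / boundary": ["not X but Y", "X, in contrast to Y", "X, whereas Y"],
    "causal chain": ["because", "therefore", "hence", "if ... then"],
    "hierarchy / generalization": ["in particular", "for example", "that is",
                                   "from the standpoint of X"],
    "limitation / qualification": ["provided that", "except for", "only if", "unless"],
    "temporal sequence": ["first ... then ... finally", "before", "after"],
    "modality": ["probably", "certainly", "as a rule, but not always"],
}

# Decorative modifiers this piece treats as safe to ban via prompt.
DECORATIVE_MODIFIERS = [
    "revolutionary", "innovative", "cutting-edge", "world-class",
    "unprecedented", "game-changing", "in today's fast-paced world",
]


def normalize(phrase: str) -> str:
    """Lowercase and drop punctuation other than apostrophes and hyphens."""
    cleaned = re.sub(r"[^\w\s'-]", " ", phrase.lower())
    return " ".join(cleaned.split())


_RELATIONAL = {normalize(p): family
               for family, phrases in RELATIONAL_OPERATORS.items()
               for p in phrases}
_DECORATIVE = {normalize(p) for p in DECORATIVE_MODIFIERS}


def audit_ban_list(bans: List[str]) -> Dict[str, List[str]]:
    """Split proposed prompt bans into safe, do-not-ban, and needs-review."""
    report: Dict[str, List[str]] = {"safe": [], "do_not_ban": [], "review": []}
    for ban in bans:
        key = normalize(ban)
        if key in _RELATIONAL:
            report["do_not_ban"].append(f"{ban} ({_RELATIONAL[key]})")
        elif key in _DECORATIVE:
            report["safe"].append(ban)
        else:
            report["review"].append(ban)
    return report


if __name__ == "__main__":
    print(audit_ban_list(["Cutting-edge", "not X, but Y", "delve"]))
    # {'safe': ['Cutting-edge'], 'do_not_ban': ['not X, but Y (opposition / boundary)'], 'review': ['delve']}
```

Normalizing punctuation lets "not X, but Y" and "not X but Y" land on the same entry; the lists themselves are the part you would expect to maintain by hand.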

The "delve" case

This argument is more than philosophical right now because the model vendors have, by all available indications, been deliberately training their flagship models to use a particular cluster of clichés less often. "Let's delve into" is the famous case. The phrase became a meme around 2024 as a tell for AI-written text, and the vendors were widely reported to have responded by tuning training data and reinforcement signals against it.

The surface reading of that adjustment is that the models got better at sounding human. The deeper reading is that the part of the training distribution "delve" sat inside, the phrases that signal "we are about to go deep on this", was downweighted along with "delve" itself. The author of the original argument speculates that the models, in addition to using the word less, became fractionally less inclined to actually go deep into a problem: on this account, they default to surface treatment because the syntactic flag for deep treatment was attached to a cliché that was being penalized in fine-tuning.

This is speculative, and it cannot be tested from outside the lab. Internal evals at a vendor could confirm or reject the worry, but from where I am sitting there is no way to know whether anyone ran that test. What I can say is that the architectural story is plausible enough to take seriously, and that the practical consequence (don't ban relational operators in your system prompt; ban decorative modifiers instead) does not depend on whether the speculative half is right.

What to do if you are writing the prompt

The practical takeaway breaks cleanly along the operator/modifier line. Here is the kind of system-prompt guidance I would actually use in 2026:

Safe to ban via prompt. Decorative modifiers and empty intensifiers: "revolutionary", "innovative", "cutting-edge", "game-changing", "unprecedented", "in today's fast-paced world". The model writes the same content with these stripped out, and the reader gets a slightly less puffy paragraph.
Not safe to ban via prompt. Relational operators: "not X but Y", "because", "therefore", "in particular", "except for", "provided that", "probably". If you want them thinned out of the final text, do it in an editing pass over the finished draft, not in the generation prompt. The model needs them to think with.
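The editing pass over the finished draft can start as a simple flagger that marks decorative modifiers for a human to cut and deliberately ignores relational operators. A minimal sketch, assuming a hand-maintained word list (the `DECORATIVE` tuple is illustrative, taken from the safe-to-ban examples) and word-boundary regex matching; `flag_decorative` is a hypothetical name. It flags rather than deletes, because mechanically removing an adjective often breaks the sentence around it: "a revolutionary approach" would become "a approach".

```python
import re
from typing import List, Tuple

# Illustrative list drawn from the safe-to-ban examples; extend by hand.
DECORATIVE = (
    "revolutionary", "innovative", "cutting-edge", "game-changing",
    "unprecedented", "in today's fast-paced world",
)

# One case-insensitive pattern with word boundaries; longer phrases are
# tried first so multi-word entries are preferred over shorter ones.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(p) for p in sorted(DECORATIVE, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def flag_decorative(draft: str) -> List[Tuple[int, str]]:
    """Return (offset, phrase) pairs for decorative modifiers in a draft.

    Relational operators such as "because" or "provided that" are never
    flagged; the human editor decides what, if anything, to cut.
    """
    return [(m.start(), m.group(0)) for m in _PATTERN.finditer(draft)]


if __name__ == "__main__":
    draft = ("In today's fast-paced world, this revolutionary method works "
             "because it regularizes, provided that the data is clean.")
    for offset, phrase in flag_decorative(draft):
        print(f"{offset:>4}  {phrase}")
```

Keeping the relational operators out of the pattern entirely, rather than flagging them at lower priority, mirrors the split above: they are not candidates for a mechanical pass at all.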

If you are an editor working over a model's draft after the fact, the same split applies. Decorative-modifier edits cost the reader nothing. Relational-operator edits often cost the reader the thought the writer was trying to express, and the writer often doesn't notice, because the draft they approved was the one in which the thought was still intact.

The bigger picture is that the push to "make AI-generated text indistinguishable from human-written text", currently a stated goal of model vendors and of most editorial workflows, has a hidden cost that the conversation has not priced in. The clichés exist not only because the model is statistically lazy. They exist because the model learned from a corpus in which those phrases carry real conceptual operations, and the operations have not gone anywhere just because we have decided the phrases are tired.

The reader's allergy to "let's delve into" is real, and in some sense it is a good signal: the phrase is tired, and a writer who reaches for it routinely signals that they have not bothered to find anything sharper. The mistake is treating the same allergy as the right reaction to "not X but Y", which is a different kind of phrase doing a different kind of work.

The clichés the reader catches are mostly decoration. The ones the writer also bans inside the prompt are sometimes decoration and sometimes load-bearing. The cost of getting that wrong is paid downstream, in a draft that is fluent but slightly less well reasoned than the one the model would have produced without the ban. The fix is to do the cosmetic pass last and on paper, not first and inside the model's head.

That is the only part of the AI-cliché conversation I would want most editors to take to their next meeting.