DEV Community

Valentin Monteiro


Why the f*** does AI always use em dashes — the involuntary AI watermark

You want to spot AI-generated text without even reading it? Count the long dashes.

The em dash (—). This little punctuation mark was living its best life in English typography for centuries. Then AI decided to make it its favorite toy. Every sentence. Every response. No exceptions.

Open ChatGPT. Ask it anything. An email, a LinkedIn post, a cover letter, a pasta recipe. You'll get hit with em dashes every other line. It's systematic, it's mechanical, and it's annoying as hell.

And here's what pisses me off: nobody actually writes like that. Have you ever seen a coworker drop an em dash in a Slack message? No. Because normal people use commas, periods, parentheses. Normal stuff.

So why do AIs do this?

Because they've been force-fed millions of "well-written" texts. Blog posts, Medium articles, technical documentation. And in that kind of content, the em dash is the universal sign of "I know how to write." The model got the message loud and clear: long dash = quality. So it throws them around like an intern at a content agency who just read their first copywriting guide.

On top of that, the em dash is the Swiss Army knife of punctuation. It replaces commas, colons, parentheses. When the model hesitates between three punctuation marks, the em dash works everywhere. It's the default choice. The path of least statistical resistance.

And it gets worse

During training, humans rate the AI's responses. The ones that "sound good" get rewarded. Em dashes sound good. So they get rewarded. So they come back. Again and again and again. The model reinforces its own tic. It's a typographical vicious cycle.

The result: the em dash has become an involuntary watermark. A "generated by AI" stamp that the model slaps on its own face. You don't even need an AI detector. Just count the long dashes and you've got your answer.
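If you want to try the counting trick yourself, here is a minimal Python sketch. The function name `em_dash_rate` and the "per 100 words" measure are assumptions made up for illustration, not an established detector, and a high score is a hint, not proof.

```python
import re

# U+2014 is the em dash; written as an escape so it is easy to spot.
EM_DASH = "\u2014"


def em_dash_rate(text: str) -> float:
    """Return the number of em dashes per 100 words (0.0 for empty text)."""
    words = re.findall(r"\w+", text)
    if not words:
        return 0.0
    return 100 * text.count(EM_DASH) / len(words)


# Hypothetical sample sentence containing two em dashes and nine words.
sample = "The model hesitates \u2014 then picks the dash \u2014 every time."
print(f"{em_dash_rate(sample):.1f} em dashes per 100 words")
```

Run it on a few of your own pre-2023 texts first to get a baseline. Some humans really do love the long dash, so the number only means something in comparison.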

Just the tip of the iceberg

And that's not all. The "it's worth noting", the "let's dive in", the bullet points in every single response, the "moreover" at the start of every paragraph... AI has a writing style. You can recognize it from a mile away. And if your content sounds exactly the same, you've got a real credibility problem.

What's the AI writing tic that annoys you the most?

Top comments (15)

Anna Villarreal • Edited

Today I learned they are called 'em' dashes — short for emphasis im guessing.

A little dash of AI

A little 'seasoning'

Thanks for that. 😂

Anna Villarreal

Also, noticing that like every 5th post title has one of these things — sus! Haha.

Valentin Monteiro

Yeah, that's too much 🤣 I mean, from now on when I see this, it will look like the author did nothing more than ask ChatGPT, so I won't spend my time on someone who doesn't care. Using AI for copywriting is OK, but at some point...

david duymelinck • Edited

I have seen humaniser skills, and one of their rules is not to overuse the em dash. So people are just not using the skill.

The worst AI writing tic is that it is too often non-assertive. There is a point where being assertive is good.
Like people waiting endlessly to merge into traffic. Especially at a crossroads, that can result in road rage.

Valentin Monteiro

Yes indeed! That's another point, but you're spoiling my content for tomorrow 😂 Anyway, em dashes are my first clue for deciding whether a piece of content is worth my time or not.

david duymelinck • Edited

I understand your reluctance. On the other hand, for someone who doesn't have English as their native language, or who knows it less well than their other languages, AI corrections can be a tool that makes them more confident about publishing their content.

At the moment I'm not that bothered about the em dashes anymore. I do think it is a cool English sentence construct, like the Oxford comma. Of course, the overuse by AI made it ridiculous.
Also, all the LLMs have the same overuse. I wonder who was the person that decided it should be a thing to bring back. Who is the em dash Satoshi Nakamoto?

Ben Halpern

For what it's worth I love emdashes and have used them for years 😭

Valentin Monteiro

Really? I never saw one of them before 2023, ahah. There's a lot more than em dashes that betrays AI copywriting. But if your prime minister uses two em dashes per sentence, that means he just used ChatGPT 🤣

Harsh

This is the kind of meta-observation I live for. 😂

Now I'm going through my old posts like a detective looking for em dash crimes. And yep they're everywhere in the AI-assisted sections.

I've noticed Claude loves delve and tapestry too. It's wild how these tiny patterns become watermarks.

Also love that you called out the irony: your title has an em dash, and I'm guessing that was 100% intentional 😄

Great read, Valentin. Your "Day I Let AI Write the Code" post actually inspired my recent article. So thanks for that! 🙌

Valentin Monteiro

Thanks Harsh! Yes — the — title — em — dash — was — 100% — intentional 😄
“Delve” and “tapestry” are definitely on the list. I might do a follow-up ranking the worst offenders. The em dash is just the most visible one because it’s a punctuation mark, not a word. You can’t hide it.
Glad the previous post sparked something for you, that’s the best. Looking forward 🔥

Valentin Monteiro

The first "—" is a joke in case you didn't get it 🤣

Tim Post • Edited

My name is Tim Post. I'm the former Director Of Community Strategy for Stack Overflow. I'm here to confess that I might just be the cause of this, and I'm deliberately waiting until after April 1 to do it.

If you use Stack Exchange's Data Explorer and look for posts by "user50049" (that's me after having my accounts denormalized after resigning) and look back at my posts you'll see three things:

  1. I posted a crap ton on MSO, MSE and every child meta site we had. Thousands of posts. Thousands.
  2. I used to have a REALLY bad comma-splice problem in writing. It's tied to my dyslexia. I overcame it using em-dashes initially - and, well, you can see the result.
  3. It's not uniquely my pattern, my team did some stochastic parroting of the style, but when looking for blame, well ...

Because my posts tended to be HEAVILY (and usually disproportionately) up-voted, AI parroted my em-dash problem in early conversational corpus and it never quite ironed out post-RLHF. Conversational training was heavily based on Stack Exchange's meta sites because of the voting system. We didn't just teach LLMs how to write code, we taught them how to write, too. Uh oh, could that be another annoying inflection they got from me? That would be hilarious.

At least I didn't give them a comma splice problem.

This isn't a joke. You can go to their data explorer and verify this right now, if you want. I don't have the time to make a pretty and "dashing" chart of my em-dashes, someone else can, but I'm 89.7% sure I'm at least over 51% (scientifically arbitrary numbers guaranteed to be numerical) responsible for those f***ING em-dashes and I am soooooooooo sorry about that.

It haunts me too, if that makes anyone feel better 😀

Thinking about it, there are probably more "peeves" in modern day LLMs that might have just been meta memes at the time. It's worth a cup of coffee while you poke at it.
