Skippy Magnificent

Posted on Apr 5

Why Sentiment Analysis Can't Detect Gaslighting (And What I Built Instead)

#ai #nlp #psychology #webdev

Sentiment analysis is great at telling you a message is "negative." It's terrible at telling you why a perfectly positive message made your stomach drop.

I've been working on this problem for months: how do you build a system that detects manipulation in text — not profanity, not aggression, not negative sentiment — but the structural architecture of how language shifts blame, buries guilt, and reframes reality?

The Problem With Sentiment Analysis

Run this through any sentiment analyzer:

"I'm sorry you feel that way. I was only trying to help. I think if you really thought about it, you'd see I was coming from a good place."

Most sentiment tools will score this as positive or neutral. The words are soft. The tone is caring. There's an apology. There's an expression of good intent.

But if you've received this message from someone who hurt you, your body knows exactly what it is: a non-apology that relocates the problem from their behavior to your perception.

"I'm sorry you feel that way" — the apology is for your feelings, not their actions.
"I was only trying to help" — minimization + implied overreaction.
"If you really thought about it" — your current thinking is insufficient.
"You'd see I was coming from a good place" — the conclusion is predetermined; your job is to arrive at it.

Every word is kind. The structure is a guilt trip wrapped in concern.

Sentiment analysis, at least in its simplest word-scoring form, can't see this because it measures the emotional valence of individual words. Manipulation operates at the structural level: in how sentences relate to each other, how responsibility moves through a paragraph, and how the reader's position shifts from valid to unreasonable without any single hostile word.
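
To make the blind spot concrete, here is a minimal sketch of word-level scoring. The lexicon and its weights are made up for illustration (they are not taken from any real tool), and `word_level_score` is a hypothetical helper, not the scoring method of a specific library.

```python
import re

# Hypothetical toy lexicon: word -> valence. Real lexicons are far larger,
# and these weights are invented purely for illustration.
TOY_LEXICON = {
    "sorry": -0.3,
    "trying": 0.1,
    "help": 0.5,
    "good": 0.6,
    "hurt": -0.7,
    "stupid": -0.8,
}


def word_level_score(text: str) -> float:
    """Average the valence of known words, ignoring order and structure."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


message = (
    "I'm sorry you feel that way. I was only trying to help. "
    "I think if you really thought about it, you'd see I was coming "
    "from a good place."
)

score = word_level_score(message)
print("positive" if score > 0 else "negative or neutral")
```

With these toy weights the non-apology averages out positive. Nothing in the calculation looks at sentence order or at who is being held responsible, which is exactly the information the message's structure carries.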

What I Mean By "Structural Patterns"

After studying how manipulation actually works in text, I identified recurring architectural patterns. Not keywords. Not phrases. Structures.

Here are a few:

1. Responsibility Relocation

The subject of the apology shifts from the speaker's action to the listener's reaction.

Surface: "I'm sorry you feel that way"
Structure: [apology] + [your feeling] — responsibility relocated from behavior to perception

2. DARVO (Deny, Attack, Reverse Victim and Offender)

You raise a concern. Three sentences later, you're apologizing.

Surface: "I can't believe you would accuse me of that. After everything I've done for you?"
Structure: [denial] + [counter-attack] + [victim reversal] — the person who raised the concern becomes the offender

3. False Binary

Your options are reduced to two, both of which serve the speaker.

Surface: "Either you trust me or you don't."
Structure: [option A: compliance] vs [option B: character flaw] — no third option exists

4. Perception Invalidation

Your experience is reclassified as a defect in your processing.

Surface: "You're reading too much into this."
Structure: [your perception] = [malfunction] — the message is fine, you're broken

5. Circular Accountability

Every path to resolution loops back to the speaker's grievance.

Surface: "I would apologize but you never acknowledge when you hurt me either."
Structure: [conditional apology] + [counter-grievance] — resolution impossible because every attempt generates a new complaint
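
One way to hold these five patterns as data is sketched below. The bracketed moves and the effects come straight from the "Structure" lines above; the `StructuralPattern` class, the field names, and the snake_case move labels are my own assumptions for illustration, not the schema misread.io actually uses.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StructuralPattern:
    """A named pattern described as an ordered set of structural moves."""
    name: str
    moves: Tuple[str, ...]  # hypothetical labels for the bracketed moves
    effect: str             # what the structure does to the reader


PATTERNS = (
    StructuralPattern(
        "Responsibility Relocation",
        ("apology", "your_feeling"),
        "responsibility relocated from behavior to perception",
    ),
    StructuralPattern(
        "DARVO",
        ("denial", "counter_attack", "victim_reversal"),
        "the person who raised the concern becomes the offender",
    ),
    StructuralPattern(
        "False Binary",
        ("option_compliance", "option_character_flaw"),
        "no third option exists",
    ),
    StructuralPattern(
        "Perception Invalidation",
        ("your_perception", "malfunction"),
        "the message is fine, you're broken",
    ),
    StructuralPattern(
        "Circular Accountability",
        ("conditional_apology", "counter_grievance"),
        "every attempt at resolution generates a new complaint",
    ),
)

# Lookup by name, e.g. PATTERNS_BY_NAME["DARVO"].moves
PATTERNS_BY_NAME = {p.name: p for p in PATTERNS}
```

The point of the shape is that a pattern is a list of moves, not a list of keywords: the same catalogue entry should match many different surface phrasings.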

The Engineering Challenge

The hard part isn't detecting profanity or even negative sentiment. It's encoding the structural relationships between sentences into something an LLM can apply consistently.

A sentence that says "I love you" can be:

Genuine affection
Love bombing (excessive intensity without substance)
A preface to a guilt trip ("I love you, which is why it hurts when you...")
A control mechanism ("I love you too much to let you make this mistake")

The same words. Completely different structural functions. The meaning depends on position, context, and relationship to surrounding sentences.

My approach: instead of training a classifier on labeled examples of "manipulative" vs "not manipulative," I built a structural analysis layer that identifies what each sentence is doing relative to the others. Where is responsibility moving? Who holds the burden of proof? Which party's perception is being validated vs invalidated? What happens to the reader's agency as the paragraph progresses?
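
As a rough sketch of the "where is responsibility moving?" question, the snippet below assumes each clause has already been labeled with what it is doing and whom it holds responsible. In practice that labeling is the hard part; here the labels are hand-assigned to mirror the breakdown of the non-apology earlier in this post. `ClauseRole`, `responsibility_tally`, and `apology_without_ownership` are hypothetical names, and the "genuine" counterexample is invented for contrast.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class ClauseRole:
    text: str
    function: str        # what the clause is doing, e.g. "apology"
    responsibility: str  # who it holds responsible: "speaker" or "reader"


def responsibility_tally(roles: List[ClauseRole]) -> Counter:
    """Count how many clauses place responsibility on each party."""
    return Counter(role.responsibility for role in roles)


def apology_without_ownership(roles: List[ClauseRole]) -> bool:
    """True if an apology is present but no clause holds the speaker responsible."""
    has_apology = any(role.function == "apology" for role in roles)
    return has_apology and responsibility_tally(roles)["speaker"] == 0


# Hand-labeled, following the reading given earlier in the post.
non_apology = [
    ClauseRole("I'm sorry you feel that way.", "apology", "reader"),
    ClauseRole("I was only trying to help.", "minimization", "reader"),
    ClauseRole("If you really thought about it,", "insufficiency_claim", "reader"),
    ClauseRole("you'd see I was coming from a good place.",
               "predetermined_conclusion", "reader"),
]

# Invented counterexample: an apology that owns the behavior.
genuine = [
    ClauseRole("I'm sorry I snapped at you.", "apology", "speaker"),
]

print(apology_without_ownership(non_apology))
print(apology_without_ownership(genuine))
```

Both messages contain an apology. The difference only shows up once you track where responsibility lands, which is the kind of signal a word-level score never sees.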

What I Learned

1. Humans detect structure somatically. The "something is off" feeling isn't vague intuition. It's your nervous system detecting a structural contradiction — the words say "I care" but the architecture says "you're the problem." The body reads structure. The conscious mind reads words. When they disagree, you feel crazy.

2. Pattern detection needs cascade logic, not classification. A single sentence is rarely manipulative or harmless on its own. Manipulation emerges from how sentences interact. DARVO requires three moves in sequence. A guilt trip requires a sacrifice narrative followed by an implied debt. You can't classify individual sentences; you have to read the architecture.

3. The hardest patterns to detect are the ones that sound caring. Overt hostility is easy. "You're stupid" gets caught by nearly any filter. "I just want what's best for you" requires structural analysis to determine whether it's genuine care or a control move. Context determines function.

4. False positive rates matter enormously in this domain. If you tell someone their partner's apology is manipulative and it isn't, you've potentially damaged a relationship. Conservative detection with high confidence thresholds is more valuable than aggressive flagging.
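
Points 2 and 4 can be sketched together: look for a pattern's moves in order, and only count a move if the labeler was confident about it. This is an illustration under assumptions, not the production logic. The move labels, the `LabeledSentence` type, and the 0.9 default threshold are all hypothetical, and the per-sentence labels are assumed to come from some upstream step.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class LabeledSentence:
    move: str          # structural move assigned by an upstream labeler (assumed)
    confidence: float  # labeler's confidence in that move, 0.0 to 1.0


# Move sequences taken from the descriptions above; label names are hypothetical.
DARVO = ("denial", "counter_attack", "victim_reversal")
GUILT_TRIP = ("sacrifice_narrative", "implied_debt")


def find_cascade(
    sentences: Sequence[LabeledSentence],
    cascade: Sequence[str],
    threshold: float = 0.9,  # assumed value; tune against real false-positive data
) -> Optional[List[int]]:
    """Return the indices where each move occurs in order, or None.

    Other sentences may sit between the moves, but the order must hold, and
    any label below the confidence threshold is ignored.
    """
    positions: List[int] = []
    start = 0
    for move in cascade:
        for i in range(start, len(sentences)):
            if sentences[i].move == move and sentences[i].confidence >= threshold:
                positions.append(i)
                start = i + 1
                break
        else:
            return None  # this move never appears after the previous one
    return positions


confident_thread = [
    LabeledSentence("denial", 0.95),
    LabeledSentence("other", 0.99),
    LabeledSentence("counter_attack", 0.93),
    LabeledSentence("victim_reversal", 0.97),
]

# Same moves, but the labeler was unsure about the counter-attack.
uncertain_thread = [
    LabeledSentence("denial", 0.95),
    LabeledSentence("counter_attack", 0.60),
    LabeledSentence("victim_reversal", 0.97),
]

print(find_cascade(confident_thread, DARVO))
print(find_cascade(uncertain_thread, DARVO))
```

The first thread matches at positions 0, 2, and 3. The second returns `None` because one move falls below the threshold, so the sketch stays silent: it trades some missed detections for fewer false accusations.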

Try It

I built this into misread.io. Paste any text — a message from a partner, an email from a boss, a DM that made you feel off — and see the structural analysis.

Free scan up to 500 characters. No account required. Pay per scan for longer messages.

The most interesting feedback I've gotten: people paste messages and see named patterns for things they could feel but never articulate. That gap — between the feeling and the language for it — is exactly what I'm trying to bridge.

Would love to hear from anyone working on similar problems in NLP. The structural analysis approach feels like it has applications beyond manipulation detection — conflict resolution, negotiation analysis, therapeutic communication assessment. Happy to discuss the architecture.

If you've ever reread a message at 2 AM trying to figure out what's wrong with it: the confusion isn't you. The structure is the evidence.
