AI Security Is Starting To Look Like Social Engineering
When most people think about security, they imagine:
- exploits
- malware
- vulnerabilities
- unauthorized access
Traditional systems are usually attacked technically.
But AI systems are starting to behave differently.
The Strange Thing About AI Systems
While testing AI agents recently, one pattern kept showing up:
Many failures didn’t come from hacking.
They came from persuasion.
A small wording change.
A conflicting instruction.
A more convincing request.
And suddenly:
- safeguards weakened
- outputs changed
- instructions were ignored
No exploit.
No malware.
No crash.
Just conversation.
AI Systems Respond To Language
That changes the security model completely.
Traditional software doesn’t “understand” persuasion.
AI systems do.
And that creates a weird new category of problems where:
- tone matters
- phrasing matters
- instruction order matters
The system may technically function correctly—
while behavior still changes dramatically.
Silent Failures Are The Dangerous Part
What makes this difficult is that most failures are invisible.
The system still responds.
The application still works.
No alerts appear.
Everything looks normal.
Until you realize the behavior changed.
Why Current Testing Isn’t Enough
Most AI systems are tested under normal conditions:
- clean prompts
- expected workflows
- ideal usage
But real-world interactions are messy.
People:
- manipulate instructions
- experiment with wording
- intentionally try to bypass safeguards
And many systems aren’t prepared for that.
The Shift Happening In AI Security
It feels like AI security is slowly becoming partly behavioral.
Not just:
- “Can the system be hacked?”
But:
- “Can the system be convinced?”
That’s a very different question.
Final Thought
The most interesting AI attacks may not look like attacks at all.
They may just look like conversations.
We’ve been exploring these ideas while building Crucible — an open-source framework for testing AI systems under adversarial and behavioral scenarios.
Still early, but one thing is becoming clear:
AI systems don’t always fail technically.
Sometimes they fail socially.
Top comments (0)