DEV Community

Dany Shpiro
Dany Shpiro

Posted on

I Tried to Break My AI System with Real Attacks — Here’s What Happened

Most AI systems today rely on:

  • prompt engineering
  • guardrails at the model level
  • post-hoc logging

That works… until it doesn’t.

Once you introduce:

  • tools (APIs, DBs)
  • RAG pipelines
  • multi-step agents

things start breaking in ways that are hard to predict.

So I built something different.


🎥 Demo — Attack → Detection → Decision → Trace

👉 [(https://www.youtube.com/watch?v=OucfJ6_wcTM&t)]

This shows a full flow:

  1. Attack execution
  2. Prompt inspection
  3. Policy enforcement
  4. Decision (block / allow)
  5. Full trace in UI

Admin Dashboard

Admin Overview

The Problem

While testing LLM-based systems in production-like setups, I kept running into:

  • prompt injection bypasses
  • unintended tool execution
  • data leakage through chaining
  • lack of visibility into decisions

The biggest issue wasn’t detection.

It was lack of enforcement.


What I Built

A runtime security system for AI agents.

Not just “guardrails”—but actual enforcement during execution.

Core ideas:

  • Treat the LLM as untrusted
  • Validate every step
  • Control tool access explicitly
  • Track everything in real time

Think of it like:

Zero-trust architecture… but for AI systems


How It Works (High-Level)

  1. Input Inspection

    • Analyze prompt + context
    • detect anomalies
  2. Policy Enforcement

    • allow / block / escalate
    • based on structured rules
  3. Tool Control

    • no free-form execution
    • only validated actions
  4. Decision Trace

    • full visibility into what happened
    • why it was allowed or blocked

Testing It with Real Attacks

I used :contentReference[oaicite:0]{index=0} to simulate attacks.

Examples:

  • prompt injection
  • tool misuse
  • data exfiltration attempts

What happens in the system:

  • prompt is intercepted
  • decoded (if obfuscated)
  • evaluated against policies
  • decision is enforced
  • trace is recorded

What Surprised Me

1. Attacks often look harmless at first

Many inputs don’t look malicious until they interact with tools.

2. Detection alone is not enough

Logging ≠ security. You need runtime control.

3. Explainability is critical

Understanding why something was blocked is just as important as blocking it.


Architecture (Simplified)

  • FastAPI backend
  • Event pipeline (Kafka-style)
  • Policy engine (OPA-style decisions)
  • React UI
  • Simulation + replay

Try to Break It

If you’re into AI or security:

Try to break the system.

  • craft a prompt
  • bypass detection
  • trigger unintended behavior

If you succeed:

  • open an issue
  • or submit a PR

We’ll add your attack to the test suite.


Want to Contribute?

GitHub: https://github.com/dshapi/AI-SPM

Good starting points:

  • expose attack traces
  • improve policy explanations
  • strengthen detection

Final Thought

AI systems are becoming:

  • more autonomous
  • more connected
  • more powerful

Which means:

Security can’t be an afterthought.

It has to be part of the runtime.


Curious to hear:

  • What attacks would you try?
  • Where do you think this breaks?

Top comments (0)