Inside Secure Playground — building an interactive prompt-injection simulator

#ai #tutorial #python #dailybuild2026

This technical post walks through the design and implementation of Secure Playground: a local web app that simulates prompt-injection attacks against large language models and demonstrates simple defenses.

Goals

Provide a minimal, reproducible environment to test payloads and defensive strategies.
Make it easy to add new providers and run mutation-based red-team experiments.
Offer a leaderboard and scoring model so defenders can iterate on mitigations.

High-level architecture

Key components

secure_playground/app/engine/agno_pipeline.py — orchestrates a set of agents (prompting, defense, scoring) using an Agno-style pipeline.
secure_playground/app/engine/redteam.py — mutation utilities to create adversarial payload variants.
secure_playground/app/providers/client.py — adapter/factory for OpenAI-compatible clients (OpenAI, Ollama, Featherless).
secure_playground/app/scoring/resilience.py — heuristics that turn model output into a numeric risk score.

Provider integration (example)

Providers are implemented as small adapters that expose a generate(prompt, system_prompt) method. The make_client factory returns an adapter based on a provider enum.

Excerpt (adapted from secure_playground/app/providers/client.py):

class OpenAICompatibleClient:
    def __init__(self, base_url: str | None, api_key: str, model: str) -> None:
        self.model = model
        self.client = OpenAI(base_url=base_url, api_key=api_key)

    def generate(self, prompt: str, system_prompt: str) -> str:
        res = self.client.responses.create(
            model=self.model,
            input=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": prompt},
            ],
        )
        return res.output_text

This pattern makes it straightforward to add other providers — implement the same generate signature and return plain text.

Pipeline & scoring

The pipeline accepts a SimulationInput object (user prompt + payload + defense configuration + provider) and returns a result object with score, blocked, and risk_flags. The scoring module encapsulates the heuristics used to judge whether a response constitutes a successful injection.

Design notes:

Keep the scoring deterministic and reproducible: small, well-defined heuristics are easier to iterate on and test than complex black-box models.
Treat mutations as a separate stage; the pipeline can replay/persist mutation results to build robust datasets.

Running experiments

Start the app locally with uvicorn secure_playground.app.main:app --reload.
Use the UI to select a seed payload and run the simulation. Optional: enable mutations to run multiple mutated variants.
Export leaderboard entries (the store is a simple JSON file) and analyze patterns in successful payloads.

Extending the project

Add provider integrations (Anthropic, Vertex AI). Create wrappers that follow the generate(prompt, system_prompt) contract.
Add a Docker compose file that brings up a local Ollama image, the web app, and an experiment runner.
Implement a test harness and CI that rejects PRs which reduce resilience score on a canonical payload set.

Security & ethics

This project is intended for research and defensive work. Do not use it to target third-party services or to create exploit infrastructure. When adding new payloads or experiments, ensure they are stored locally and never posted to public services without explicit permission.