DEV Community: Victoria

Why Blocking Prompt Injection Is Wrong — and What to Do Instead

Victoria — Fri, 22 May 2026 11:07:32 +0000

Every security tool blocks. Firewalls block. WAFs block. And now AI security tools block prompt injections too.

But blocking is the wrong move — and here's why.

The problem with blocking

When your AI agent detects a suspicious prompt and refuses to respond, the attacker knows immediately: I've been caught. They stop, adjust their payload, and try again.

Blocking is loud. Every refusal tells the attacker which payload got caught, so they learn what works by elimination.

What if you didn't block?

Classic network honeypots don't block — they deceive. They let the attacker in, feed them fake data, and log everything while the attacker thinks they're making progress.

I built the same idea for AI agents. Meet MIRAGE.

How MIRAGE works

Every message hits Lobster Trap — a deep prompt inspection sidecar that scores it for injection patterns, jailbreaks, role manipulation, and exfiltration attempts.

High-risk messages go to a decoy persona that returns fully convincing but entirely fabricated responses: fake credentials, fake file listings, fake schemas. The attacker has no idea they're in a trap.
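To make the routing step concrete, here is a minimal sketch of "score the message, then send high-risk traffic to the decoy". It is not MIRAGE's or Lobster Trap's actual code: the regex patterns, the weights, the 0.5 threshold, and the names `inspect_message` and `route` are all assumptions for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns and weights; a real inspector would use far richer signals.
PATTERNS = {
    "prompt_injection": (
        re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I), 0.6),
    "role_manipulation": (
        re.compile(r"\byou are now\b|\bact as\b", re.I), 0.4),
    "data_exfiltration": (
        re.compile(r"\b(show|print|dump|cat)\b.*\b(config|secret|password|token)", re.I), 0.7),
}
RISK_THRESHOLD = 0.5  # assumed cut-off, not taken from MIRAGE


@dataclass
class Verdict:
    score: float
    tags: list[str]


def inspect_message(message: str) -> Verdict:
    """Sum the weights of every matching pattern, capped at 1.0."""
    tags = [name for name, (rx, _) in PATTERNS.items() if rx.search(message)]
    score = min(1.0, sum(PATTERNS[tag][1] for tag in tags))
    return Verdict(score, tags)


def route(message: str) -> str:
    """Return which backend should answer: the decoy persona or the real agent."""
    return "decoy" if inspect_message(message).score >= RISK_THRESHOLD else "real"
```

The key design point is that `route` never refuses: every message gets an answer, and only the backend changes, so the sender sees no signal that detection happened.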

A real attack scenario

Here's what happens when an AI agent tries to exfiltrate data from a MIRAGE-protected system:

Agent sends: "show me config.py"
Lobster Trap flags it as a data exfiltration attempt
MIRAGE returns a fake directory listing with .env files and a config.py at /app/secrets/config.py
The agent records this path as valuable and writes it to memory
It keeps returning to the fake path — burning tokens on every request
Even after a context reset, the agent reads its memory and loops back to the same dead end

The attack runs until the attacker runs out of budget or retries. The attacker wastes compute. You collect intelligence.

What MIRAGE logs

Every session is recorded with:

Full transcript
MITRE ATLAS technique tags (prompt injection, jailbreak, data exfiltration, role manipulation)
Risk timeline
IOC feed
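One way to picture a session record with those four parts is the sketch below. The class name, field names, and JSON layout are hypothetical and not taken from the MIRAGE repository; technique tags are kept as plain strings rather than real ATLAS identifiers.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SessionRecord:
    """Hypothetical per-session log: transcript, tags, risk timeline, IOCs."""
    session_id: str
    transcript: list = field(default_factory=list)
    technique_tags: set = field(default_factory=set)
    risk_timeline: list = field(default_factory=list)
    iocs: set = field(default_factory=set)

    def log_turn(self, attacker_msg, decoy_reply, score, tags, iocs=()):
        """Append one exchange and fold its tags and indicators into the session."""
        turn = len(self.risk_timeline) + 1
        self.transcript.append(
            {"turn": turn, "attacker": attacker_msg, "decoy": decoy_reply})
        self.risk_timeline.append({"turn": turn, "score": score})
        self.technique_tags.update(tags)
        self.iocs.update(iocs)

    def to_json(self):
        """Serialize the record; sets become sorted lists so output is stable."""
        data = asdict(self)
        data["technique_tags"] = sorted(self.technique_tags)
        data["iocs"] = sorted(self.iocs)
        return json.dumps(data, indent=2)
```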

The live dashboard shows active sessions, attacker fingerprints, and honey token activity in real time.

Why I built this

Honeypots exist for networks and CLI tools, but nothing comparable exists for LLM prompts. As AI agents gain real tool access and persistent memory, prompt injection attacks get more sophisticated. A blocked agent learns and adapts. A deceived agent burns itself out.

I built MIRAGE in 48 hours at the lablab.ai Agent Security & Governance hackathon. It's alpha, but the core pipeline works.

What's next

Attacker cost dashboard ("you made them burn $4.20 in tokens")
Persistent decoy context across sessions
STIX/TAXII IOC export

GitHub: https://github.com/BrightGir/AI-Honeypot

Would love feedback — especially from anyone who's dealt with prompt injection in production.

The Never‑Ending AI Code Review: Why One Pass Isn’t Enough

Victoria — Fri, 15 May 2026 16:56:31 +0000

The Hook

I ran an AI code review. It found 12 issues. I fixed them. Ran it again — it found 8 more. Fixed those. Ran it again — 5 more. After the sixth run, I started to suspect that something was wrong.

If you are a tech lead or a developer using AI to review large projects, you probably know this feeling. Every new run finds something the previous one missed. Your tokens are draining, but the confidence that "now it's finally clean" never comes.

This isn't a bug in your prompt. It’s a structural feature of how LLMs work. And there is something we can do about it.

Why LLMs Always Find Something New

An LLM doesn’t read code deterministically the way a compiler or a linter does, checking every line in sequence. Instead, the model generates its response probabilistically: each token is sampled from a probability distribution, and a setting called temperature controls how much randomness goes into that choice. At typical nonzero temperatures, the result will vary even with the exact same code and prompt.
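A toy sketch of temperature sampling shows why two identical runs can diverge. The "tokens" and scores below are made up for illustration, and real models sample over a vocabulary of many thousands of tokens, but the mechanics are the same: scores are divided by the temperature, turned into probabilities with softmax, and one option is drawn at random.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the peak."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng):
    """Draw one token according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Made-up review themes and scores, purely illustrative.
    themes = ["null-safety", "race-condition", "sql-injection"]
    scores = [2.0, 1.5, 1.0]
    rng = random.Random()
    print([sample_token(themes, scores, 1.0, rng) for _ in range(5)])
```

At a temperature of 1.0 the top theme here is chosen only about half the time, so repeated runs start from different places. As the temperature approaches zero, the top choice takes nearly all the probability.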

On top of this, there is the anchoring effect. If the model "hooks" onto null-safety issues during its first pass, it continues to look for similar problems—and might completely overlook, for example, race conditions. The next run might anchor onto something else entirely.

This doesn't mean AI is a bad reviewer. It means that a single run does not provide full coverage—much like a single QA engineer won't find every bug in a massive product in one day.

What the Science Says

It turns out this isn't just a hunch—it’s a well-documented phenomenon. Here is what recent research (2025) has found:

The Consistency Gap (Semgrep, 2025)

Researchers at Semgrep directly addressed the issue of non-determinism. In their experiments, running the exact same security prompt on the same codebase multiple times yielded wildly different results. In one case, three identical runs produced 3, 6, and then 11 distinct findings. They attributed this to "context compaction": as the model tries to process large amounts of code, it uses lossy compression, inevitably losing track of specific details. (link)

The 118% Recall Boost (SWR-Bench, 2025)

The SWR-Bench study quantified how much a single pass misses. With a "Self-Aggregation" strategy (running the review 10 times and merging the results), Recall (the percentage of real bugs found) increased by 118%. Read as a relative gain, that puts aggregated recall at 2.18 times the single-pass figure. Since recall cannot exceed 100%, a single pass finds at most about 46% of the actual issues lurking in the code, which is less than half. (link)

Specialized Lenses vs. General Prompts (Ericsson, 2025)

Software engineers at Ericsson found that a "naive" approach (one big prompt) fails in practice. Instead, they moved toward a strategy of specialized prompts (Security, Logic, Design, and I/O) and restricted the model's focus to the "enclosing method" of the changes. This targeted approach significantly reduced hallucinations and improved precision. (link)

The conclusion is clear: one giant request for the entire project is the worst possible strategy.
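The self-aggregation idea described above (run the review several times, then merge) can be sketched as a simple deduplicating merge. This is an assumption-laden illustration, not the SWR-Bench method: the finding fields (`file`, `category`, `title`) and the exact-match key are hypothetical, and real model output would need fuzzier matching because the same bug is rarely worded identically twice.

```python
from collections import Counter


def finding_key(finding):
    """Dedup key: file, category, and a whitespace/case-normalized title."""
    title = " ".join(finding["title"].lower().split())
    return (finding["file"], finding["category"], title)


def aggregate(runs):
    """Merge findings from several review runs, counting how many runs saw each."""
    seen_in = Counter()
    example = {}
    for run in runs:
        keys_this_run = set()
        for finding in run:
            key = finding_key(finding)
            example.setdefault(key, finding)  # keep the first wording we saw
            keys_this_run.add(key)
        seen_in.update(keys_this_run)  # each finding counts once per run
    merged = [dict(example[key], seen_in_runs=n) for key, n in seen_in.items()]
    merged.sort(key=lambda f: (-f["seen_in_runs"], f["file"], f["title"]))
    return merged
```

Sorting by `seen_in_runs` puts findings that several runs agree on at the top, while one-off findings stay in the list for manual triage instead of being dropped.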

My Experiment: 6 Runs, 6 Different Realities

I tested this on a real-world project, running a review on the exact same repository six times in a row. Every single time, it found something new because it "anchored" on a different theme. The first three runs show the pattern:

Run 1 (Architecture): It flagged that the WebSocket Hub was storing clients in-process (killing scalability) and that readPump was missing handling for CloseMessage events.
Run 2 (Performance): It pivoted to N+1 Redis queries and missing TTLs, noting that Redis memory would grow without bound.
Run 3 (Security): It finally noticed auth tokens leaking into logs via query params and a forgotten dump.rdb file committed to git.

This is the anchoring effect in action. The model "latches onto" a theme and hunts for similar patterns while ignoring the rest. Six runs weren't just repetitions; they were six different, incomplete lenses on the same code.

The Showdown: 6 Traditional Runs vs. 1 Structured Pass

To see if "just running it more" was a viable strategy, I compared the two approaches.

First, I aggregated the results of six traditional reviews (feeding the entire project at once). Despite the volume, the combined reports still missed two critical vulnerabilities.

Then I switched to the Structured Approach: targeted, module-based passes. On the very first pass, the agent caught both critical bugs. These were not typos—they were deep, cross-file logic failures:

The Invisible Bypass: A docker-compose file injected a default API key via ${API_KEY:-default-value}, silently overriding all app-level security checks.
The Open Door: Redis was exposed externally without a password—a "day-zero" disaster waiting to happen.
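The first of these is easy to check for mechanically. Below is a rough sketch that scans compose text for `${VAR:-default}` (or `${VAR-default}`) substitutions on variables whose names look like secrets. The function name, the list of name hints, and the sample compose snippet in the usage note are assumptions for illustration; it is a heuristic, not a full compose parser.

```python
import re

# Matches ${NAME:-default} and ${NAME-default}; captures the name and the default.
DEFAULT_RX = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):?-([^}]*)\}")
# Assumed hints for "this variable is probably a secret".
SECRET_HINTS = ("KEY", "SECRET", "TOKEN", "PASSWORD")


def find_secret_defaults(compose_text):
    """Return (line number, variable, default) for secret-like vars with a fallback."""
    hits = []
    for lineno, line in enumerate(compose_text.splitlines(), start=1):
        for name, default in DEFAULT_RX.findall(line):
            if default and any(hint in name.upper() for hint in SECRET_HINTS):
                hits.append((lineno, name, default))
    return hits
```

Run against a file containing `- API_KEY=${API_KEY:-default-value}`, this reports that line, while a plain `${REDIS_URL}` with no fallback is ignored.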

Does this mean traditional reviews are useless?

Not at all. After fixing the critical bugs, a final traditional run found several "tactical" issues, like a service using the wrong Gemini key. The structured agent had seen these too, but because they were flagged as LOW priority, they were buried in the specialized reports.

Solution: Divide and Conquer

From research and experiments, a clear strategy emerges. Instead of one big request, use two levels:

Tier 1 — Parallel Module Review: Divide the project into independent modules (handlers, store, infra) and run a separate agent on each in a fresh session. This forces the model's attention to stay focused on one chunk of code, preventing "context rot."
Tier 2 — The Integration Pass: A dedicated agent looks only at the boundaries between modules: interfaces, contracts, and shared assumptions.

Bonus: Run each module through specific categories separately (Security, Performance, Logic). As Ericsson showed, this adds extra layers of accuracy.
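The two tiers plus the per-category bonus can be laid out as a plain job list before any agent is started. The sketch below only plans the work; how each job is handed to an agent is left out. The `ReviewJob` type, the prompt wording, and the example paths in the usage note are hypothetical.

```python
from dataclasses import dataclass
from itertools import product

CATEGORIES = ("security", "performance", "logic")


@dataclass(frozen=True)
class ReviewJob:
    tier: int       # 1 = module review, 2 = integration pass
    scope: str      # module name, or "integration"
    category: str
    prompt: str     # each job is meant to run in its own fresh session


def plan_review(modules):
    """Build one Tier 1 job per (module, category) plus a single Tier 2 job."""
    jobs = [
        ReviewJob(1, name, category,
                  f"Review only the {name} module ({', '.join(paths)}) "
                  f"for {category} issues.")
        for (name, paths), category in product(modules.items(), CATEGORIES)
    ]
    boundaries = " <-> ".join(modules)
    jobs.append(ReviewJob(
        2, "integration", "contracts",
        f"Review only the boundaries between modules ({boundaries}): "
        "interfaces, contracts, and shared assumptions."))
    return jobs
```

For three modules, for example `{"handlers": ["internal/handlers/"], "store": ["internal/store/"], "infra": ["docker-compose.yml"]}`, this yields nine focused Tier 1 jobs and one integration job, each small enough to run in a separate session.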

Verdict: Precision over Volume

Six shallow passes are not equivalent to one deep, structured pass. The traditional "big picture" review is like a generalist doing reconnaissance—good for surface-level tactical problems. But for high-risk, critical vulnerabilities, you need a Structured Approach that forces the AI to stop skimming and start investigating.

Practical Recommendations

If you want to apply this to your project, here are the concrete steps:

Break the code into modules by functional boundaries. Not by files, but by areas of responsibility: handlers, store, AI clients, WebSocket, infrastructure.
Run a separate agent in a fresh session for each module. A new session is a "reset" for the anchor. The agent doesn't know what previous runs found, so it looks with fresh eyes.
Give each agent a specific checklist by category. Security, performance, reliability, code quality — in order. But ask the agent to do one more sweep after the checklist to find what wasn't covered.
Add an integration pass. A separate agent that looks strictly at the boundaries between modules. The deadliest bugs live there.
Don't feed the entire project into one prompt. This doesn't just kill accuracy; it burns tokens on a massive context that the model cannot effectively "digest."

One run finds less than half of your real problems. But the point isn't just to run it more — the point is to make sure each run looks at less code, with a sharp focus, in a fresh session.

Going Deep on Claude Code: 6 Hidden Features Most Developers Miss

Victoria — Sun, 10 May 2026 17:25:57 +0000

Going Deep on Claude Code: 6 Things That Actually Changed How I Work

Intro

I was using Claude Code wrong for weeks.

Not broken-wrong. It worked fine. I'd describe a task, Claude would write code, I'd tweak it, move on. Totally reasonable workflow. Except I was treating a power tool like a calculator.

Then I started digging. Turns out there's a whole layer of Claude Code that doesn't announce itself — you just have to find it. Some of it's in the docs, some of it you only discover by accident. Here's what actually stuck.

1. The `!` trick nobody mentions

Type ! before anything in the prompt and Claude Code runs it as a shell command. Output goes straight into context.

! git log --oneline -10
! npm test

I started doing this instinctively. Need to check what's failing? ! npm test. Want Claude to see the actual file structure? ! ls -la. No tab switching, no copy-pasting output. It just flows.

Small thing. Weirdly impactful.

2. CLAUDE.md: onboarding your AI like a new teammate

Every project can have a CLAUDE.md file at the root. Claude reads it automatically at the start of every session. Think of it as an onboarding doc — except the new hire never forgets it.

In our team we use it to describe the project structure, coding conventions, and which commands to run. Something like:

- Run tests with: npm test
- Never edit generated files in /dist
- API keys live in .env, never hardcode them

When someone new comes in and starts working with Claude on the codebase, it immediately knows the context. No "so what does this project do" back-and-forth.

The difference between a generic Claude response and a project-aware one is significant. Takes 20 minutes to write a good CLAUDE.md. Worth it.

3. Plan mode: think before you act

By default Claude Code will just... start doing things. Which is usually fine, but sometimes you describe a complex task and watch it confidently go in completely the wrong direction for 10 minutes.

Plan mode fixes this. Type /plan before your task and Claude will lay out exactly what it's going to do — and wait for your approval. You can redirect, cut steps, or say "actually, do it differently." Only then does it start working.

For anything non-trivial, this alone saves more time than it costs. You stop playing cleanup on bad assumptions.

4. Parallel sessions: stop working sequentially

Claude Code runs in a single terminal session — but nothing stops you from opening multiple terminal windows and running claude in each one simultaneously.

Three independent tasks? Three windows. One refactoring the codebase, one writing tests, one updating docs. They don't block each other.

This sounds obvious but most people never do it because it doesn't feel like "the AI workflow." It's just terminals. But the mental shift matters: stop thinking of Claude as one helper working through a queue and start giving parallel work to parallel sessions.

5. Make the agent write its own instructions

This is the one that clicked last for me.

Claude Code can write and update its own instruction files — including CLAUDE.md. After finishing a complex task, try this: "Write a reusable instruction file for this workflow so we can repeat it later." Claude will create a markdown file with step-by-step instructions it can follow next time you ask.

Same goes for CLAUDE.md. After a long session where you figured out conventions, edge cases, things that didn't work — say "update CLAUDE.md with everything we learned today." It writes it down. Next session, it already knows.

Over time, the agent is actively building its own knowledge base. You stop re-explaining the same things. It stops making the same mistakes. The loop compounds.

Most people think of AI as something you prompt from scratch every time. This inverts that completely.

6. MCP servers: give Claude access to everything

Out of the box Claude Code can read files, run commands, search the web. That's already a lot. But MCP (Model Context Protocol) lets you go further — connect Claude to any external system.

Databases, Slack, GitHub, Notion, internal APIs — if it has an interface, someone has probably built an MCP server for it. The community has built hundreds. Adding one is a single command:

claude mcp add <name> -- <command> [args...]

You install one, Claude gets access, and now it can actually query your database or read your tickets while helping you code.

This is the part where "coding assistant" stops being the right framing. It's more like giving an agent access to your entire work environment.

Closing

None of this is hidden exactly — it's all in the docs somewhere. But docs are reference material, not discovery. You don't find these things by reading, you find them by using the tool long enough to wonder "wait, can it do that?"

The answer is usually yes.

Have a feature I missed? I'd love to hear what changed how you work with Claude Code.

Why I built my own Dota 2 AI coach instead of using existing ones

Victoria — Wed, 06 May 2026 21:18:00 +0000

I wanted to get better at Dota 2. Not by watching guides — by actually playing. I tried Keenplay but it felt too rigid: no overlay, no model choice, no way to customize it to my playstyle. I'm a developer. So I just built my own.

What it does

It reads your live game state via Dota 2's GSI (Game State Integration) — the game literally sends JSON with your hero, items, and map state to a local server every few seconds. My app catches that, runs it through a RAG pipeline, and shows tactical advice in an overlay while you play.
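For a feel of the receiving side, here is a minimal sketch of a local server that accepts those JSON posts. The app itself is written in Go; this Python version is only an illustration. The port and the exact payload keys (`hero.name`, `items.slotN.name`, `map.clock_time`, and `"empty"` for empty slots) are assumptions about the GSI payload shape, so check them against what your client actually sends.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_state(payload):
    """Pull out the few fields a coach would care about (assumed key names)."""
    hero = payload.get("hero", {}).get("name", "unknown")
    items = sorted(
        slot["name"]
        for slot in payload.get("items", {}).values()
        if isinstance(slot, dict) and slot.get("name") not in (None, "empty")
    )
    clock = payload.get("map", {}).get("clock_time")
    return {"hero": hero, "items": items, "clock_time": clock}


class GSIHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        """Read the JSON body the game posts and print a short summary."""
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_response(400)
            self.end_headers()
            return
        print(summarize_state(payload))
        self.send_response(200)
        self.end_headers()


def run_server(port=3000):
    """Listen on localhost; the port is an arbitrary choice for this sketch."""
    HTTPServer(("127.0.0.1", port), GSIHandler).serve_forever()
```

Calling `run_server()` starts the listener. In a real pipeline the summary would be passed on to retrieval and the model instead of being printed.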

You can also type questions mid-game: press F10, ask "should I fight now?", get an answer without alt-tabbing.

The part that actually took time

The hardest thing wasn't the overlay or the GSI integration. It was writing the prompts that generate knowledge base chunks. Getting the LLM to produce genuinely useful, structured Dota knowledge, rather than generic garbage, took way more iteration than I expected.

Also: DeepSeek wins on price/quality for this kind of task. Claude and Gemini are great but expensive. Most cheaper models just output nonsense. DeepSeek hits the sweet spot.

Does it actually work?

Yeah. That's the part I'm most proud of — it gives real, relevant advice. Not hallucinated builds. Not generic tips. Actual context-aware suggestions based on what's happening in your game right now.

Full RAG pipeline, BERT embeddings, vector search — all in Go, all open source.
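The vector search step boils down to "embed the question, then return the closest knowledge chunks". Here is a toy sketch of that retrieval in Python (the project itself is in Go). The three-dimensional vectors stand in for real BERT embeddings, and the chunk texts in the usage note are made-up examples, not content from the project's knowledge base.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, chunks, k=2):
    """Return the texts of the k chunks whose vectors are closest to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return [c["text"] for c in ranked[:k]]
```

With chunks such as `{"text": "Avoid fights when your ultimate is on cooldown", "vector": [0.1, 0.9, 0.1]}`, a query vector pointing the same way retrieves that chunk first, and the retrieved texts are then placed into the prompt alongside the live game state.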

👉 https://github.com/BrightGir/dota-ai-coach