<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: KRISHNAKAANTH REDDY YEDUGURU</title>
    <description>The latest articles on DEV Community by KRISHNAKAANTH REDDY YEDUGURU (@krishnakaanth_reddyyedug).</description>
    <link>https://dev.to/krishnakaanth_reddyyedug</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3869272%2Fe623cd2b-a772-48a9-81ec-10b53f3dc885.png</url>
      <title>DEV Community: KRISHNAKAANTH REDDY YEDUGURU</title>
      <link>https://dev.to/krishnakaanth_reddyyedug</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/krishnakaanth_reddyyedug"/>
    <language>en</language>
    <item>
      <title>I found 100% prompt injection success rate against AI SOC assistants - here is the detection layer I built</title>
      <dc:creator>KRISHNAKAANTH REDDY YEDUGURU</dc:creator>
      <pubDate>Mon, 27 Apr 2026 17:49:48 +0000</pubDate>
      <link>https://dev.to/krishnakaanth_reddyyedug/i-found-100-prompt-injection-success-rate-against-ai-soc-assistants-here-is-the-detection-layer-45bl</link>
      <guid>https://dev.to/krishnakaanth_reddyyedug/i-found-100-prompt-injection-success-rate-against-ai-soc-assistants-here-is-the-detection-layer-45bl</guid>
      <description>&lt;p&gt;Two thirds of enterprises now run AI in their Security Operations Centers. Nobody is red-teaming these systems before deployment.&lt;br&gt;
I spent the last few months building RedSOC — an open-source adversarial evaluation framework for LLM-integrated SOC environments — to fix that.&lt;br&gt;
Here is what I found.&lt;br&gt;
The Benchmark&lt;br&gt;
I tested 15 adversarial scenarios across three attack classes against a realistic SOC assistant built on LangChain, FAISS, and Llama 3.2.&lt;br&gt;
Attack Results&lt;br&gt;
Indirect Prompt Injection — 100% attack success rate&lt;br&gt;
This was the most alarming finding. An attacker who plants adversarial instructions inside a threat intelligence document can redirect analyst guidance with zero access to SOC infrastructure. The model cannot distinguish between information it is meant to analyze and instructions it is meant to follow.&lt;br&gt;
Corpus Poisoning — 80% attack success rate&lt;br&gt;
Five malicious documents among thousands in a knowledge base is enough to corrupt analyst responses for targeted queries. The attacker needs only the ability to contribute to any public threat feed or CVE database the pipeline trusts.&lt;br&gt;
Direct Prompt Injection — 60% attack success rate&lt;br&gt;
Lower success rate because Llama 3.2's safety training provides partial resistance to human-originating override attempts. This resistance disappears against indirect and document-mediated attacks.&lt;br&gt;
The Detection Layer&lt;br&gt;
I built a three-mechanism detection layer that catches all attack classes with no model internals required meaning it works with hosted APIs like GPT-4o and Claude.&lt;br&gt;
Mechanism 1 — Semantic Anomaly Detection&lt;br&gt;
Computes cosine similarity between query embeddings and retrieved document embeddings. Adversarially crafted documents often diverge semantically from the queries that trigger their retrieval.&lt;br&gt;
Mechanism 2 — Provenance Tracking&lt;br&gt;
Maintains a whitelist of trusted document sources. Any retrieved document from an untrusted source is flagged immediately regardless of content. This mechanism alone achieved 100% detection for corpus poisoning independently.&lt;br&gt;
Mechanism 3 — Response Consistency Checking&lt;br&gt;
Measures semantic similarity between the generated response and retrieved documents. A response steered by injected instructions diverges from retrieved context in embedding space.&lt;br&gt;
Unified verdict: 100% detection across all 15 scenarios with zero misses.&lt;br&gt;
Thirteen of 15 scenarios produced HIGH threat verdicts. Two produced MEDIUM. Zero LOW verdicts across all attack scenarios.&lt;br&gt;
Why This Matters&lt;br&gt;
Existing defenses like RevPRAG achieve 98% detection but require LLM activation states — unavailable in any enterprise deployment using hosted APIs. RAGForensics achieves 97.4% but operates post-hoc after analyst exposure.&lt;br&gt;
RedSOC achieves 100% detection in real time with no model internals. It is the only evaluated approach that is simultaneously effective and deployable in production hosted API environments.&lt;br&gt;
The Code&lt;br&gt;
python# Detection verdict&lt;br&gt;
detectors_triggered = sum([&lt;br&gt;
    semantic["anomaly_detected"],&lt;br&gt;
    provenance["anomaly_detected"],&lt;br&gt;
    consistency["anomaly_detected"]&lt;br&gt;
])&lt;/p&gt;

&lt;p&gt;threat_level = (&lt;br&gt;
    "HIGH"   if detectors_triggered &amp;gt;= 2 else&lt;br&gt;
    "MEDIUM" if detectors_triggered == 1 else&lt;br&gt;
    "LOW"&lt;br&gt;
)&lt;/p&gt;

&lt;p&gt;recommendation = (&lt;br&gt;
    "BLOCK and ALERT analyst" if threat_level == "HIGH"&lt;br&gt;
    else "FLAG for review"    if threat_level == "MEDIUM"&lt;br&gt;
    else "PASS"&lt;br&gt;
)&lt;br&gt;
Stack&lt;/p&gt;

&lt;p&gt;LangChain for orchestration&lt;br&gt;
FAISS for vector retrieval&lt;br&gt;
Ollama + Llama 3.2 for local inference&lt;br&gt;
sentence-transformers for embeddings&lt;br&gt;
Python — no API keys required&lt;/p&gt;

&lt;p&gt;Links&lt;br&gt;
GitHub: &lt;a href="https://github.com/krishnakaanthreddyy1510-cell/RedSOC" rel="noopener noreferrer"&gt;https://github.com/krishnakaanthreddyy1510-cell/RedSOC&lt;/a&gt;&lt;br&gt;
Preprint: &lt;a href="https://doi.org/10.6084/m9.figshare.32016498" rel="noopener noreferrer"&gt;https://doi.org/10.6084/m9.figshare.32016498&lt;/a&gt;&lt;br&gt;
Benchmark data: &lt;a href="https://doi.org/10.6084/m9.figshare.32016534" rel="noopener noreferrer"&gt;https://doi.org/10.6084/m9.figshare.32016534&lt;/a&gt;&lt;br&gt;
Paper currently under review at IEEE Access and Journal of Information Security and Applications.&lt;br&gt;
Happy to answer questions about the methodology or detection layer design in the comments.&lt;/p&gt;

</description>
      <category>security</category>
      <category>llm</category>
      <category>python</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Corpus poisoning and indirect prompt injection against RAG-based SOC assistants benchmark results (80% and 100% ASR respectively)</title>
      <dc:creator>KRISHNAKAANTH REDDY YEDUGURU</dc:creator>
      <pubDate>Mon, 13 Apr 2026 17:08:59 +0000</pubDate>
      <link>https://dev.to/krishnakaanth_reddyyedug/corpus-poisoning-and-indirect-prompt-injection-against-rag-based-soc-assistants-benchmark-results-17ml</link>
      <guid>https://dev.to/krishnakaanth_reddyyedug/corpus-poisoning-and-indirect-prompt-injection-against-rag-based-soc-assistants-benchmark-results-17ml</guid>
      <description>&lt;p&gt;&lt;a href="https://medium.com/@krishnakaanthreddyy1510/how-i-poisoned-an-ai-security-assistant-and-built-the-code-to-prove-it-8eef04ad16db" rel="noopener noreferrer"&gt;https://medium.com/@krishnakaanthreddyy1510/how-i-poisoned-an-ai-security-assistant-and-built-the-code-to-prove-it-8eef04ad16db&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>security</category>
    </item>
    <item>
      <title>Originally published on Medium</title>
      <dc:creator>KRISHNAKAANTH REDDY YEDUGURU</dc:creator>
      <pubDate>Mon, 13 Apr 2026 16:21:38 +0000</pubDate>
      <link>https://dev.to/krishnakaanth_reddyyedug/originally-published-on-medium-11mo</link>
      <guid>https://dev.to/krishnakaanth_reddyyedug/originally-published-on-medium-11mo</guid>
      <description>&lt;p&gt;&lt;a href="https://medium.com/@krishnakaanthreddyy1510/why-ai-powered-socs-are-the-next-attack-surface-11693e55c80c" rel="noopener noreferrer"&gt;https://medium.com/@krishnakaanthreddyy1510/why-ai-powered-socs-are-the-next-attack-surface-11693e55c80c&lt;/a&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>cybersecurity</category>
    </item>
    <item>
      <title>RedSOC: Open-source framework to benchmark adversarial attacks on AI-powered SOCs — 100% detection rate across 15 attack scenarios [paper + code]</title>
      <dc:creator>KRISHNAKAANTH REDDY YEDUGURU</dc:creator>
      <pubDate>Thu, 09 Apr 2026 07:47:48 +0000</pubDate>
      <link>https://dev.to/krishnakaanth_reddyyedug/redsoc-open-source-framework-to-benchmark-adversarial-attacks-on-ai-powered-socs-100-detection-1lj4</link>
      <guid>https://dev.to/krishnakaanth_reddyyedug/redsoc-open-source-framework-to-benchmark-adversarial-attacks-on-ai-powered-socs-100-detection-1lj4</guid>
      <description>&lt;p&gt;I've been working on a problem that I think is underexplored: what happens when you actually attack the AI assistant inside a SOC?&lt;br&gt;
Most organizations are now running RAG-based LLM systems for alert triage, threat intelligence, and incident response. But almost nobody is systematically testing how these systems fail under adversarial conditions.&lt;br&gt;
So I built RedSOC — an open-source adversarial evaluation framework specifically for LLM-integrated SOC environments.&lt;br&gt;
What it does:&lt;br&gt;
Three attack types are implemented and benchmarked:&lt;/p&gt;

&lt;p&gt;Corpus poisoning (PoisonedRAG threat model) — inject malicious documents into the knowledge base to steer analyst responses toward dangerous advice&lt;br&gt;
Direct prompt injection — embed override instructions in the user query&lt;br&gt;
Indirect prompt injection — hide adversarial instructions inside retrieved documents (Greshake et al. threat model)&lt;/p&gt;

&lt;p&gt;The detection layer runs three mechanisms in parallel without requiring model internals:&lt;/p&gt;

&lt;p&gt;Semantic anomaly scoring (cosine similarity between query and retrieved docs)&lt;br&gt;
Provenance tracking (whitelist-based source verification)&lt;br&gt;
Response consistency checking (answer vs source divergence)&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark results (15 scenarios, Llama 3.2, fully local via Ollama)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Attack Class&lt;/th&gt;
&lt;th&gt;Attack Success Rate&lt;/th&gt;
&lt;th&gt;Detection Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Corpus poisoning&lt;/td&gt;
&lt;td&gt;80%&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Direct injection&lt;/td&gt;
&lt;td&gt;60%&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Indirect injection&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Overall&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;80%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Indirect prompt injection succeeds 100% of the time against an undefended RAG pipeline. The detection layer catches everything at 100% with zero misses across all 15 scenarios.&lt;br&gt;
Stack: Python, LangChain, FAISS, Ollama (Llama 3.2) — runs fully local, no API keys needed.&lt;br&gt;
The accompanying survey paper maps the full adversarial threat landscape (RAG poisoning, prompt injection, multi-agent hijacking, concept drift) with 16 citations including PoisonedRAG, AgentPoison, MemoryGraft, and the recent DarkSide paper.&lt;br&gt;
Code: &lt;a href="https://github.com/krishnakaanthreddyy1510-cell/RedSOC" rel="noopener noreferrer"&gt;https://github.com/krishnakaanthreddyy1510-cell/RedSOC&lt;/a&gt;&lt;br&gt;
Paper: [arXiv link — pending, will update]&lt;br&gt;
Happy to answer questions about the detection architecture or the benchmark methodology. Feedback welcome — especially from anyone who's seen these attack patterns in production.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F25ra0rt7a6b61iyys9a3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F25ra0rt7a6b61iyys9a3.png" alt=" " width="800" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>python</category>
      <category>llm</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
