DEV Community

AYUSH SINGH

I built a Threat Intelligence RAG System from scratch — here's what actually broke

CVE databases are massive. Searching them manually is painful. I wanted to ask plain English questions like "show me all critical RCE vulnerabilities from 2024" and get real answers — so I built a RAG system to do exactly that.

The stack

🔹 HuggingFace: embeddings
🔹 FAISS: vector store
🔹 Fully local LLM: no OpenAI costs
🔹 AWS: deployment

What actually broke (and how I fixed it)

The local LLM hallucinated CVE numbers confidently. FAISS retrieval returned irrelevant chunks when queries were too short. Chunking strategy mattered way more than I expected. I'll walk through each failure and the fix.
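To make the first failure concrete, here is a minimal sketch of one way to catch hallucinated CVE numbers: flag any CVE ID in the model's answer that does not appear in the retrieved chunks. This is an assumption for illustration, not necessarily the fix used in the repo, and the function names `extract_cve_ids` and `unsupported_cve_ids` are hypothetical.

```python
import re

# CVE IDs look like CVE-YYYY-NNNN, where the sequence part has 4 or more digits.
CVE_ID = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def extract_cve_ids(text: str) -> set[str]:
    """Return every CVE ID found in the text, normalized to upper case."""
    return {match.upper() for match in CVE_ID.findall(text)}


def unsupported_cve_ids(answer: str, retrieved_chunks: list[str]) -> set[str]:
    """Return CVE IDs the answer cites that no retrieved chunk contains.

    Hypothetical helper: a non-empty result suggests the model made up
    an ID instead of copying it from the retrieved context.
    """
    supported: set[str] = set()
    for chunk in retrieved_chunks:
        supported |= extract_cve_ids(chunk)
    return extract_cve_ids(answer) - supported
```

If the returned set is non-empty, the pipeline could reject the answer, re-prompt, or strip the unsupported IDs before showing anything to the user. The check only verifies that an ID appears in the retrieved context, not that the model described it correctly. The built-in generic type hints need Python 3.9 or later.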

What you'll learn

How to build a RAG pipeline without relying on OpenAI, why chunking strategy is underrated, common failure modes in local LLMs, and how to deploy the whole thing to AWS.

Full article on Medium 👉 https://medium.com/p/e9efd48d1799
GitHub: https://github.com/letshck/threat-intelligence-RAG

Building in AI/security? I'd love to connect.