Every time a security analyst pastes a suspicious log entry into a cloud-based AI chatbot, they might be handing adversaries a roadmap. That firewall alert contains your internal IP ranges. That phishing email reveals which executives are being targeted. That threat intelligence report maps your entire attack surface.
I learned this the hard way. As a Senior Software Engineer at Microsoft working on Copilot Search Infrastructure, I spend my days thinking about how AI systems ingest, index, and retrieve sensitive data at scale. That experience taught me a foundational principle: the most dangerous data leak is the one disguised as a productivity tool.
So I built five open-source security AI tools — all powered by local LLMs through Ollama — that never send a single byte to the cloud. Here is why you should do the same, and how to get started.
Why Security Data Must Never Leave Your Network
This is not theoretical paranoia. It is operational reality.
1. Compliance Exposure
NIST SP 800-171, SOC 2, HIPAA, and GDPR all impose strict controls on where sensitive data can be processed. The moment you paste a security alert into a cloud AI service, you have potentially created a compliance violation. Most cloud AI providers explicitly state in their terms of service that they may use input data for model improvement.
2. Adversarial Intelligence Leakage
Security alerts are not just operational noise — they are intelligence. An alert about a brute-force attempt on admin@internal-crm.yourcompany.com tells an attacker three things: you have a CRM system, it uses that naming convention, and it is internet-facing. Sending this to a third-party API, even an encrypted one, expands your blast radius.
3. Supply Chain Risk
Cloud AI providers are themselves targets. A breach at your AI provider could expose every query ever sent — including your security telemetry. Running locally eliminates this entire attack surface.
4. Latency in Incident Response
During an active incident, you cannot afford to wait for API rate limits or deal with cloud outages. Local inference means your AI triage tools work even when the network is compromised — which is exactly when you need them most.
The Local LLM Stack: Ollama + Python
The architecture is simpler than you might expect. Ollama provides a local REST API that is compatible with the interface patterns most developers already know. Here is the foundation every tool in my security suite shares:
```python
import requests
import json


class LocalLLM:
    """Interface to a local Ollama instance for security analysis."""

    def __init__(self, model: str = "gemma4", base_url: str = "http://localhost:11434"):
        self.model = model
        self.base_url = base_url

    def analyze(self, prompt: str, temperature: float = 0.3) -> str:
        """Send a prompt to the local LLM. No data leaves localhost."""
        response = requests.post(
            f"{self.base_url}/api/generate",
            json={
                "model": self.model,
                "prompt": prompt,
                "stream": False,
                # Ollama expects sampling parameters inside "options"
                "options": {"temperature": temperature},
            },
            timeout=120,
        )
        response.raise_for_status()
        return response.json()["response"]

    def health_check(self) -> bool:
        """Verify Ollama is running before processing sensitive data."""
        try:
            resp = requests.get(f"{self.base_url}/api/tags", timeout=5)
            return resp.status_code == 200
        except requests.ConnectionError:
            return False
```
Notice the low temperature setting of 0.3. For security analysis, you want deterministic, factual responses — not creative writing. This is a deliberate architectural choice that differs from most chatbot configurations.
Building a Security Alert Analyzer
Let me walk through a concrete example: triaging a cybersecurity alert. The key insight is that not everything requires an LLM. Pattern extraction (IPs, hashes, CVEs) is best handled by regex, while the LLM handles contextual analysis and summarization.
```python
import re
import json
from dataclasses import dataclass


@dataclass
class SecurityAlert:
    raw_text: str
    iocs: dict
    threat_score: float
    summary: str


def extract_iocs(alert_text: str) -> dict:
    """Extract Indicators of Compromise without an LLM."""
    return {
        "ips": re.findall(r'\b(?:\d{1,3}\.){3}\d{1,3}\b', alert_text),
        "domains": re.findall(r'\b[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b', alert_text),
        "cves": re.findall(r'CVE-\d{4}-\d{4,}', alert_text),
        "md5_hashes": re.findall(r'\b[a-fA-F0-9]{32}\b', alert_text),
        "sha256_hashes": re.findall(r'\b[a-fA-F0-9]{64}\b', alert_text),
    }


def analyze_alert(alert_text: str, llm: LocalLLM) -> SecurityAlert:
    """Full alert analysis: regex extraction + local LLM summarization."""
    iocs = extract_iocs(alert_text)
    prompt = f"""You are a senior SOC analyst. Analyze this security alert and provide:
1. Threat severity (CRITICAL/HIGH/MEDIUM/LOW)
2. Attack type classification
3. Recommended immediate actions
4. IOC summary

Alert:
{alert_text}

Extracted IOCs: {json.dumps(iocs, indent=2)}
"""
    summary = llm.analyze(prompt)

    # Score based on IOC density and keyword severity
    score = len(iocs["cves"]) * 3.0 + len(iocs["ips"]) * 1.5
    if any(kw in alert_text.lower() for kw in ["critical", "exploit", "ransomware"]):
        score += 5.0

    return SecurityAlert(
        raw_text=alert_text,
        iocs=iocs,
        threat_score=min(score, 10.0),
        summary=summary,
    )
```
This hybrid approach — deterministic extraction plus LLM analysis — gives you the reliability of pattern matching with the contextual intelligence of a language model. And everything stays on localhost:11434.
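To see the deterministic layer working in isolation, here is a minimal standalone run of the same regexes and scoring weights against a hypothetical sample alert (the alert text and CVE are illustrative only):

```python
import re

SAMPLE_ALERT = (
    "CRITICAL: Exploit attempt against 10.0.4.17 matching CVE-2024-3400; "
    "payload hash d41d8cd98f00b204e9800998ecf8427e observed."
)

# Same regexes as extract_iocs, shown standalone for clarity
iocs = {
    "ips": re.findall(r'\b(?:\d{1,3}\.){3}\d{1,3}\b', SAMPLE_ALERT),
    "cves": re.findall(r'CVE-\d{4}-\d{4,}', SAMPLE_ALERT),
    "md5_hashes": re.findall(r'\b[a-fA-F0-9]{32}\b', SAMPLE_ALERT),
}

# Same weights as analyze_alert: CVEs count 3.0, IPs 1.5, severity keywords +5.0
score = len(iocs["cves"]) * 3.0 + len(iocs["ips"]) * 1.5
if any(kw in SAMPLE_ALERT.lower() for kw in ["critical", "exploit", "ransomware"]):
    score += 5.0

print(iocs["ips"])       # ['10.0.4.17']
print(iocs["cves"])      # ['CVE-2024-3400']
print(min(score, 10.0))  # 9.5
```

Everything up to this point is pure pattern matching: fast, deterministic, and auditable. Only the summarization step touches the model.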
Five Tools, Zero Cloud Dependencies
I have built and open-sourced a suite of security tools that follow this architecture. Each one solves a real problem I have encountered in production environments:
1. Cybersecurity Alert Summarizer
The flagship tool. It ingests raw security alerts, extracts IOCs (IPs, domains, hashes, CVEs), queries a local CVE database for CVSS scores, calculates weighted threat scores, and generates executive-ready summaries. The correlation engine links related alerts across multiple data sources — critical for spotting coordinated attacks.
Tech: Python, Ollama, Click CLI, FastAPI, Rich, Docker
GitHub: cybersecurity-alert-summarizer
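The correlation engine in the repo is more elaborate, but the core idea can be sketched in a few lines: invert the alert-to-IOC mapping, and any IOC appearing in two or more alerts from different sources is a candidate link. The data shapes below are assumptions for illustration, not the repo's actual schema:

```python
from collections import defaultdict

def correlate(alerts: list[dict]) -> dict[str, list[int]]:
    """Map each IOC to the indices of every alert that mentions it.
    An IOC shared across alerts suggests a coordinated campaign."""
    seen = defaultdict(list)
    for idx, alert in enumerate(alerts):
        for ioc in alert["iocs"]:
            seen[ioc].append(idx)
    return {ioc: idxs for ioc, idxs in seen.items() if len(idxs) > 1}

alerts = [
    {"source": "firewall", "iocs": ["203.0.113.9", "evil.example.com"]},
    {"source": "proxy",    "iocs": ["evil.example.com"]},
    {"source": "edr",      "iocs": ["203.0.113.9"]},
]
print(correlate(alerts))
# {'203.0.113.9': [0, 2], 'evil.example.com': [0, 1]}
```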
2. DocShield — Privacy-First Document Analysis
A multi-agent system using Gemma 4 that reads, explains, and audits sensitive documents. While originally built for medical documents (HIPAA compliance demands local processing), the architecture applies to any document type containing sensitive data — contracts, financial reports, legal discovery. Five specialized agents (Orchestrator, Reader, Explainer, Checker, Bill Analyzer) work in a pipeline, each with a focused responsibility.
Tech: Python, Gemma 4, Flask, Multi-Agent Pipeline, Docker
GitHub: docshield
3. Password Strength Advisor
Goes far beyond "must contain uppercase and special character." This tool calculates Shannon entropy with pattern penalty scoring, checks against a local breach database with leet-speak variation detection, generates NIST SP 800-63B compliant policies, and creates cryptographically secure passwords using Fisher-Yates shuffling. The LLM provides natural-language explanations of why a password is weak.
Tech: Python, Ollama, Click, Streamlit, FastAPI
GitHub: password-strength-advisor
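As a rough sketch of what "entropy with pattern penalty" means (the tool's actual scoring is more sophisticated, and the penalty weight here is an arbitrary illustration), you can compute Shannon entropy over the character distribution and subtract a penalty for sequential runs:

```python
import math
from collections import Counter

def shannon_entropy_bits(password: str) -> float:
    """Entropy per character of the password's own distribution, scaled by length."""
    counts = Counter(password)
    n = len(password)
    per_char = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return per_char * n

def penalized_score(password: str) -> float:
    """Entropy minus a crude penalty for ascending runs like 'abc' or '123'."""
    bits = shannon_entropy_bits(password)
    runs = sum(1 for a, b in zip(password, password[1:]) if ord(b) - ord(a) == 1)
    return max(bits - 2.0 * runs, 0.0)

print(round(penalized_score("abcdef"), 2))    # sequence heavily penalized: 5.51
print(round(penalized_score("x9$Kq2!m"), 2))  # no patterns detected: 24.0
```

Naive entropy alone rates "abcdef" and "x9$Kq2!m" similarly (both have all-unique characters); the pattern penalty is what separates them.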
4. Phishing Email Detector
Analyzes email headers, body text, and embedded URLs to classify phishing attempts. The local LLM examines linguistic patterns (urgency cues, authority impersonation, grammatical anomalies) while deterministic checks handle SPF/DKIM validation and URL reputation lookups against local threat feeds. No email content ever leaves the analysis machine.
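A few of those deterministic checks can be sketched as simple heuristics that run before the LLM ever sees the message. The cue list and flag names below are illustrative assumptions, not the detector's actual rules:

```python
import re

URGENCY_CUES = ("urgent", "immediately", "account suspended", "verify now")

def heuristic_flags(subject: str, body: str) -> list[str]:
    """Cheap deterministic checks applied before LLM analysis."""
    flags = []
    text = f"{subject}\n{body}".lower()
    if any(cue in text for cue in URGENCY_CUES):
        flags.append("urgency_language")
    # Links pointing at a bare IP are a classic phishing tell
    if re.search(r'https?://(?:\d{1,3}\.){3}\d{1,3}', body):
        flags.append("raw_ip_url")
    # Deeply nested subdomains often imitate a trusted brand
    hosts = re.findall(r'https?://([a-z0-9.-]+)', body, re.IGNORECASE)
    if any(h.count('.') >= 3 and any(c.isalpha() for c in h) for h in hosts):
        flags.append("deeply_nested_subdomain")
    return flags

sample = "Your account suspended! Log in at http://192.0.2.44/login now."
print(heuristic_flags("URGENT: action required", sample))
# ['urgency_language', 'raw_ip_url']
```

These flags can then be fed into the LLM prompt as structured context, the same way extracted IOCs are in the alert analyzer.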
5. Threat Intelligence Summarizer
Ingests threat intelligence reports (STIX/TAXII feeds, vendor advisories, CVE bulletins) and produces actionable summaries for different audiences — technical IOC lists for the SOC team, risk assessments for management, patch priority lists for the infrastructure team. The LLM translates dense technical reports into audience-appropriate language.
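Audience targeting is mostly a prompting problem: one report in, one audience-tuned prompt out. A minimal sketch (the briefs below are placeholder examples, not the tool's actual templates):

```python
AUDIENCE_BRIEFS = {
    "soc": "List every IOC with its type and a suggested detection rule. Be terse.",
    "management": "Summarize business risk in plain language. No acronyms without expansion.",
    "infrastructure": "Rank affected products by patch priority, with CVE references.",
}

def build_prompt(report_text: str, audience: str) -> str:
    """Build an audience-specific prompt for the local LLM."""
    brief = AUDIENCE_BRIEFS[audience]
    return (
        f"You are writing for the {audience} team. {brief}\n\n"
        f"Report:\n{report_text}"
    )

prompt = build_prompt("APT activity exploiting CVE-2024-3400 ...", "management")
```

The same report text is summarized three times with three prompts, which is cheap when inference is local and free.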
The Architecture Pattern
Every tool in this suite follows the same layered design:
┌─────────────────────────────────────────────┐
│ Input Layer (CLI / Web / API) │
└──────────────────┬──────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ Deterministic Processing Layer │
│ (Regex, Pattern Matching, Scoring, DB) │
│ → No LLM needed, fast, reliable │
└──────────────────┬──────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ Local LLM Analysis Layer │
│ (Ollama → Gemma 4 / Llama 3.2) │
│ → Contextual analysis, summarization │
│ → 127.0.0.1 only, no external calls │
└──────────────────┬──────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ Output Layer (Rich CLI / Streamlit) │
│ → Formatted reports, threat dashboards │
└─────────────────────────────────────────────┘
The critical design decision is the separation between deterministic and LLM layers. Pattern extraction, scoring, and database lookups do not need an LLM and should not use one. The LLM handles what it is good at: contextual understanding, summarization, and natural-language generation.
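One practical payoff of that separation is testability: if the LLM layer is just a callable taking a prompt and returning a string, the whole pipeline can be exercised with a stub and no running model. A minimal sketch of the wiring (function names are illustrative, not from the repos):

```python
import re

def run_pipeline(raw_text: str, llm_fn) -> dict:
    """Wire the layers: deterministic extraction first, LLM last.
    llm_fn is any callable prompt -> str, so the model is swappable."""
    iocs = re.findall(r'CVE-\d{4}-\d{4,}', raw_text)               # deterministic layer
    summary = llm_fn(f"Summarize for a SOC analyst:\n{raw_text}")  # LLM layer
    return {"iocs": iocs, "summary": summary}                      # output layer

# A stub LLM makes the pipeline testable without Ollama running
result = run_pipeline("Patch CVE-2024-3400 now.", lambda p: "stub summary")
print(result["iocs"])  # ['CVE-2024-3400']
```

In production, `llm_fn` would be `LocalLLM().analyze`; in CI, it is a lambda.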
Getting Started in 5 Minutes
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a security-optimized model
ollama pull gemma4

# Clone any tool from the suite
git clone https://github.com/kennedyraju55/cybersecurity-alert-summarizer.git
cd cybersecurity-alert-summarizer

# Install and run
pip install -r requirements.txt
python -m src.cyber_alert.cli --alert alerts/sample.txt
```
For model selection, I recommend Gemma 4 for its strong reasoning capabilities and multimodal support, or Llama 3.2 (3B) if you need faster inference on limited hardware. Both run comfortably on a machine with 16GB RAM.
The Bottom Line
Cloud AI is transformative for many use cases. Security is not one of them. The data you are analyzing — alerts, logs, threat intel, credentials, internal network topology — is precisely the data that adversaries want. Every cloud API call is an exposure surface.
Local LLMs have reached a capability threshold where they handle security analysis tasks effectively. The tools exist. The models are free. The only cost is the compute you already own.
In my experience building production AI systems that process sensitive data at scale, the architecture that wins is the one that minimizes data movement. For security tooling, that means local inference, local storage, and zero external dependencies.
Build local. Analyze local. Keep your security data where it belongs — on your network.
About the Author
Nrk Raju Guthikonda is a Senior Software Engineer at Microsoft on the Copilot Search Infrastructure team, focused on semantic indexing and retrieval-augmented generation (RAG). He maintains 116+ open-source repositories, including a suite of security AI tools powered by local LLMs. His work explores the intersection of AI, privacy, and practical security tooling.
- GitHub: @kennedyraju55
- Dev.to: nrk_raju
- LinkedIn: nrk-raju-guthikonda