DEV Community

Conor Dobbs

Posted on • Originally published at tools.thesoundmethod.me

the ai cybersecurity hype is real, and so is the gap between the deck and the deployment

somebody on r/cybersecurity hit 765 upvotes asking if anyone else was losing their mind over the "ai cybersecurity" hype. yes. the answer is yes. every practitioner i talk to is having the same conversation, and it's worth saying out loud what's happening so we can stop pretending the emperor is dressed.

the pitch goes like this. vendor X has an "ai-powered" platform. it triages alerts, reduces analyst workload by 90 percent, autonomously investigates threats, contains incidents, and makes the soc analyst job a quaint historical footnote. the deck is gorgeous. the demo is smooth. the price is six figures and growing.

then you deploy it.

what actually shows up in production

a recent practitioner write-up tested an llm-based triage tool against 348 known false positives and one synthetic true positive. the tool got 71 percent accuracy. it called obvious false positives malicious. it missed the planted incident entirely. that's the part nobody puts on the slide.
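
a quick back-of-the-envelope on those numbers shows why accuracy is the wrong headline metric here. this is a sketch, not the write-up's own method: the confusion-matrix counts are reconstructed from the rounded 71 percent figure, so treat them as approximate.

```python
# counts from the write-up: 348 known false positives (benign) plus 1 planted true positive
benign_alerts = 348
malicious_alerts = 1
total = benign_alerts + malicious_alerts

# assumption: 71 percent is a rounded figure, so the number of correct calls is approximate
correct = round(0.71 * total)

# the tool missed the planted incident, so every correct call was on a benign alert
true_positives = 0
false_negatives = malicious_alerts
true_negatives = correct
false_positives = benign_alerts - true_negatives

# recall: share of real incidents the tool actually caught
recall = true_positives / (true_positives + false_negatives)

# baseline for comparison: a "tool" that closes every single alert as benign
baseline_accuracy = benign_alerts / total

print(f"benign alerts wrongly flagged: {false_positives}")
print(f"recall on real incidents: {recall:.0%}")
print(f"accuracy of closing everything as benign: {baseline_accuracy:.1%}")
```

under those assumptions, a script that closes everything as benign would beat the tool on accuracy while catching exactly as many incidents: zero. on a set this lopsided, ask for recall and false positive counts, not a single accuracy number.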

the microsoft and omdia state of the soc 2026 report says 46 percent of alerts are false positives. the 2025 sans detection and response survey says 73 percent of teams name false positives as their top detection challenge. soc teams are looking at around 11,000 alerts per day, and only about 22 per analyst warrant investigation. the ai is supposed to fix this. some of the better tools (radiant, cortex xsiam, sentinel) do meaningfully reduce noise. most do not. and almost none deliver what the vendor promised in the room.

help net security ran a piece in march 2026 that put it bluntly: "ai soc vendors are selling a future that production deployments haven't reached yet." that's the headline. that's the story.

the ai-washing tax

there's a name for what's happening. ai-washing. a vendor takes a regex engine, a correlation rule, or a half-trained classifier, slaps "powered by ai" on the box, and triples the price. one security analyst estimated that 80 percent of "ai-powered" security tools are marketed misleadingly. 77 percent of IT leaders report using ai-driven security products. only 66 percent say they understand how the ai improves outcomes. that 11-point gap is the entire problem in one stat.

ciso enthusiasm for ai security tooling far outpaces operator enthusiasm. the people writing the checks are excited. the people running the consoles at 2 am are not. that disconnect is load-bearing for the whole market right now and it's why so many deployments look great in a board slide and miserable in a queue.

what i'd be skeptical of, every time

if you're a practitioner sitting across the table from a vendor pitch, here's the short list.

  • ask them to run the ai against your last 30 days of alerts. not a curated demo set. yours. with your noise, your environment, your weirdness.
  • ask how an analyst inspects and overrides an ai decision. if the answer is hand-wavy, the audit trail doesn't exist and you can't defend the outcome to a regulator.
  • ask which threat types the ai caught that a rules-based or signature-based system would have missed. specific examples. not "advanced threats." not "novel attacks." names, ttps, cves.
  • ask what the failure mode looks like. when the model is wrong, how do you find out? what's the blast radius? who eats the page?
  • ask if the model retrains on your data, who owns the resulting weights, and whether you can audit what it learned.
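
the first ask on that list is easy to turn into a scoring script. a minimal sketch, assuming you can export each replayed alert with your analysts' final disposition next to the tool's verdict. the record shape, field names, and sample rows here are made up for illustration, not any vendor's export format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplayedAlert:
    alert_id: str
    analyst_disposition: str  # what your team concluded: "malicious" or "benign"
    tool_verdict: str         # what the vendor tool said on replay


def score_replay(alerts):
    """Tally outcomes and collect the real incidents the tool called benign."""
    outcomes = Counter()
    missed = []
    for alert in alerts:
        truly_bad = alert.analyst_disposition == "malicious"
        flagged = alert.tool_verdict == "malicious"
        if truly_bad and flagged:
            outcomes["true_positive"] += 1
        elif truly_bad:
            outcomes["false_negative"] += 1
            missed.append(alert.alert_id)
        elif flagged:
            outcomes["false_positive"] += 1
        else:
            outcomes["true_negative"] += 1
    return outcomes, missed


# made-up sample rows, for illustration only
sample = [
    ReplayedAlert("a-001", "benign", "benign"),
    ReplayedAlert("a-002", "benign", "malicious"),
    ReplayedAlert("a-003", "malicious", "benign"),
    ReplayedAlert("a-004", "malicious", "malicious"),
]
outcomes, missed = score_replay(sample)
print(dict(outcomes))
print("missed incidents:", missed)
```

the missed list is the part to bring back to the vendor. every id on it is an incident your team caught and their tool did not.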

vendors with real ai capability answer those concretely. vendors selling air do not. nist has said this on the record: there are theoretical problems with securing ai algorithms that haven't been solved. anyone claiming otherwise is selling something.

the analyst trust problem

the part that bothers me most. ai-generated alert summaries create a real cognitive failure mode. analysts start deferring to confidence-weighted ai output instead of doing evidence-weighted analysis. the model writes a paragraph that sounds plausible, marks it severity high, and the analyst goes with it even when the raw telemetry suggests something more boring is happening.

the inverse happens too. the model says benign with high confidence and an analyst stops investigating something that needed another five minutes. either failure mode produces worse outcomes than if the analyst had read the logs themselves. the tool was supposed to amplify judgment. instead it replaced it, badly.

this isn't a hypothetical. it's showing up in incident postmortems right now.

where ai in security is actually working

i don't want to be the "ai bad, log4j good" guy. there are places this is genuinely working.

  • noise reduction on well-bounded problems. phishing classification. duplicate alert grouping. log parsing. bounded, plentiful training data, low failure cost.
  • enrichment. the model pulls context (whois, virustotal, geo, asn reputation) into one pane. saves you ten tabs without making the call.
  • writing detection rules from natural language. "alert me when a non-admin account enumerates ad groups outside business hours." that's a sigma rule you can refine, not a black box.
  • behavioral baselining where you have the volume. noisy for the first three months, but you eventually get real signal on insider weirdness.
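
to make the detection-rule bullet concrete, here is the logic that sentence describes, written as plain python instead of sigma so you can read every condition. it's a sketch under assumptions: the event fields, the "group_enumeration" action name, and the 09:00 to 18:00 monday-to-friday window are all hypothetical and would come from your own log schema and policy.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class DirectoryEvent:
    account: str
    is_admin: bool
    action: str          # hypothetical normalized action name
    timestamp: datetime  # assumed to already be in the site's local time


BUSINESS_START = time(9, 0)   # assumption: business hours start at 09:00
BUSINESS_END = time(18, 0)    # assumption: business hours end at 18:00


def outside_business_hours(ts):
    """True on weekends, or on weekdays outside the assumed working window."""
    if ts.weekday() >= 5:  # 5 = saturday, 6 = sunday
        return True
    return not (BUSINESS_START <= ts.time() < BUSINESS_END)


def should_alert(event):
    """Non-admin account enumerating ad groups outside business hours."""
    return (
        not event.is_admin
        and event.action == "group_enumeration"
        and outside_business_hours(event.timestamp)
    )


late_night = DirectoryEvent("jdoe", False, "group_enumeration", datetime(2026, 3, 3, 23, 30))
print(should_alert(late_night))
```

every condition is visible and tunable, which is the whole point: if the model drafts something like this, an analyst can still argue with each line.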

notice what those have in common. the human is still in the loop. the ai is doing the boring scut work. the analyst is making the call. that's the architecture that works.

the next 12 to 24 months

agentic ai is the next wave the vendors are selling. agents that triage, investigate, contain, and remediate end-to-end. some of this will be real. most of it will not be real for at least another product cycle, probably two. agents fail in compounding ways: one bad inference upstream becomes a bad action downstream, which becomes an isolated production host at 4 am.

the agentic story will eventually work, but it will work in narrow domains first (specific alert types, specific containment actions with hard guardrails) and then expand. anyone selling you a fully autonomous soc-in-a-box right now is selling the 2030 version of a 2026 product.
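
a hard guardrail can be as dull as an allowlist. this is a minimal sketch of the idea, not any product's design: the alert types, action names, and host names are hypothetical placeholders.

```python
# hypothetical allowlist: (alert type, containment action) pairs the agent may run unattended
AUTO_ALLOWED = {
    ("commodity_malware", "quarantine_file"),
    ("phishing", "quarantine_email"),
}

# hypothetical hosts that always need a human, whatever the alert says
PROTECTED_HOSTS = {"prod-db-01", "domain-controller-01"}


def route_action(alert_type, action, host):
    """Return "auto" only inside the narrow allowlist; everything else goes to a person."""
    if host in PROTECTED_HOSTS:
        return "needs_human_approval"
    if (alert_type, action) in AUTO_ALLOWED:
        return "auto"
    return "needs_human_approval"


print(route_action("phishing", "quarantine_email", "laptop-114"))
print(route_action("commodity_malware", "isolate_host", "laptop-114"))
print(route_action("commodity_malware", "quarantine_file", "prod-db-01"))
```

the default is the important design choice. anything the allowlist doesn't name falls through to a human, so expanding the agent's scope is a deliberate edit, not something a confident model can talk its way into.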

what to do in the meantime: invest in the unsexy foundations. telemetry quality. identity controls. asset inventory. exposure management. recoverability. ai is a multiplier on whatever foundation you already have. if the foundation is bad, the ai will make decisions on bad data faster.

if you're an analyst watching this

your job is not going away in the next 24 months. the people telling you it is are either selling something or repeating someone who's selling something. what is going to change is which parts of the job you spend time on. less tab-flipping for context. more designing detections, tuning the ai's output, and being the human who pushes back when the model is confidently wrong.

the skill that compounds in this environment is the ability to read raw telemetry and form your own opinion before looking at what the tool said. analysts who can do that are going to be more valuable, not less. analysts who learn to rubber-stamp ai output are going to find themselves replaced by a cheaper analyst who also rubber-stamps ai output.
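
if you want to practice that habit, or enforce it in a training queue, the mechanics are simple: keep the tool's verdict hidden until the analyst has committed their own. a rough sketch, with a hypothetical record type that isn't modeled on any real case-management api.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TriageRecord:
    alert_id: str
    _tool_verdict: str                     # kept out of sight until the analyst commits
    analyst_verdict: Optional[str] = None

    def commit(self, verdict):
        """Record the analyst's own call, once."""
        if self.analyst_verdict is not None:
            raise ValueError("analyst verdict already recorded")
        self.analyst_verdict = verdict

    def reveal_tool_verdict(self):
        """Only show the tool's opinion after the analyst has formed theirs."""
        if self.analyst_verdict is None:
            raise PermissionError("record your own verdict first")
        return self._tool_verdict

    def disagrees(self):
        """True when the analyst and the tool reached different conclusions."""
        return self.reveal_tool_verdict() != self.analyst_verdict


record = TriageRecord("a-003", "benign")
record.commit("malicious")
print(record.reveal_tool_verdict(), record.disagrees())
```

the disagreements are the useful output. each one is either a tuning ticket for the tool or a learning moment for the analyst, and you only get that signal if the two opinions were formed independently.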

learn to read pcaps. learn to read auth logs unaided. learn to write detections. those are the durable skills. the ai-tool-of-the-month is the disposable one.

the takeaway

the r/cybersecurity post wasn't wrong to be losing its mind. the ai cybersecurity market in 2026 has a hype-to-reality ratio that would make a 2017 ico blush. but underneath the marketing there are real, narrow, useful applications that practitioners should be exploiting. the trick is being able to tell which is which, and the only way to do that is to demand evidence and run your own tests instead of trusting the slide deck.

if you're trying to make sense of which ai tools in your stack are pulling weight versus which ones are expensive autocomplete, i've been writing up tool comparisons over at tools.thesoundmethod.me. same evaluation framework, applied to the dev side first because that's where the data is cleanest. the cybersecurity comparison is next.

and if you want the engineering playbook for building reliable agent workflows (the kind of thing the soc-in-a-box vendors are still figuring out), the claude code cookbooks i'm assembling cover the patterns that hold up in production. ping me if you want early access.

stay skeptical. read the logs. don't buy the deck.


I keep the running list of these from-real-use tool teardowns at tools.thesoundmethod.me, written from running the things, not from the vendor deck.
