Your EC2 enumeration detection buckets events into 5-minute windows, counts the distinct instances each actor touched, and alerts when that count passes 10. Reasonable on paper. A burst of instance enumeration is recon, and recon is worth catching early.
It has never fired.
Not because nobody is enumerating your environment. Because the math forbids it from ever firing, at any attacker pace you'll actually see, and it shows up green on your dashboard the entire time.
The detection that looks tuned
index=aws sourcetype=aws:cloudtrail eventName=DescribeInstances
| bin _time span=5m
| stats dc(instance_id) AS distinct_instances by sourceIPAddress, _time
| where distinct_instances > 10
A window and a threshold are what a rate detection is supposed to have. It maps to the Discovery tactic. It looks deliberate. It looks tuned. That's why it sails through review – it has the shape of a careful rule.
The shape is the problem. The numbers inside it can't happen.
Why it can never fire
Three things stack up, and any one of them is enough.
The fixed-bucket boundary. bin doesn't draw windows around your events. It chops the timeline on a fixed origin – :00, :05, :10 – and drops events into whichever slot they land in. An attacker enumerating 12 instances over 12 minutes spreads across at least three buckets. Maybe five instances, then four, then three. Never more than 10 in one. The enumeration absolutely happened. The bucketing sliced it into pieces too small to cross the threshold.
The pace. More than ten DISTINCT instances inside a single 300-second window means touching a new instance faster than once every 30 seconds, sustained. That's a tempo deliberate recon doesn't use and API rate limits don't encourage. Low-and-slow enumeration – the kind you most want to catch – is the kind least likely to pile that many unique instances into five minutes.
The straddle. Even a genuine burst, 12 instances in six minutes, gets split the moment it crosses a boundary. Six before :05, six after. Two buckets of six. Nothing fires. The attacker did nothing clever to earn the miss. They were rewarded for where your clock happened to start.
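To make the straddle concrete, here is a minimal Python sketch that mimics fixed-origin bucketing outside of Splunk. It assumes buckets snap to multiples of the span, which is how a 5-minute bin is described above. The event timestamps and instance IDs are hypothetical, invented for illustration.

```python
from collections import defaultdict

SPAN = 300        # bucket width in seconds, matching span=5m
THRESHOLD = 10    # the rule alerts only when the distinct count is > 10


def fixed_bucket_counts(events, span=SPAN):
    """Count distinct instance IDs per fixed-origin bucket.

    events: iterable of (epoch_seconds, instance_id) for a single actor.
    Assumption: buckets start at multiples of `span`.
    """
    buckets = defaultdict(set)
    for ts, instance_id in events:
        buckets[ts - ts % span].add(instance_id)
    return {start: len(ids) for start, ids in sorted(buckets.items())}


# Hypothetical burst: 12 distinct instances, one every 30 seconds,
# starting 120 seconds into a bucket so it crosses the 300-second boundary.
burst = [(120 + 30 * i, f"i-{i:04d}") for i in range(12)]

counts = fixed_bucket_counts(burst)
fired = any(n > THRESHOLD for n in counts.values())
print(counts, fired)
```

Twelve distinct instances in five and a half minutes, and the largest bucket holds six. Shift the start time and the split changes, which is the point: whether the rule fires depends on the clock, not on the attacker.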
This is a class of bug, not a rule
The specific rule doesn't matter. The category does: any rate detection where the threshold can't be reached inside the window at a realistic pace.
It shows up as a span set tighter than the threshold can fill. As a threshold sized for sliding-window intuition but implemented as a fixed bucket. As a window shorter than the natural cadence of the behavior it's hunting. Different surface, same dead detection underneath.
And it's invisible, because nothing about it errors. The SPL is valid. The schedule runs. The job completes clean. The cell on your ATT&CK heatmap is green. I've written before that the heatmap counts rules instead of coverage – this is one of those rules. It's behind a green cell, and it cannot fire.
How to spot them
- Do the arithmetic. Can the threshold be reached inside one window at a pace a real attacker would use? Multiply it out. If the answer is no, the detection is dead and no tuning saves it.
- Know fixed-bucket from sliding-window. bin and timechart are fixed-origin. A burst that straddles a boundary gets split across windows. If you wrote a fixed bucket but reasoned about it like a sliding window, your intuition and your SPL disagree.
- Backtest for EVER, not for HOW OFTEN. Run it across 90 days of real activity. A noisy detection tells you it's miscalibrated. A detection that fired ZERO times across all that data is telling you something worse – suspect dead-by-construction, not quiet-because-safe.
- Compare the window to the behavior's cadence. Match your span against API rate limits, automation intervals, and how fast the technique actually runs in the wild. Not against your gut sense of "fast."
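The arithmetic in the first check is short enough to script. This is a rough sketch, not a calibrated model: it assumes the actor touches one new instance at a steady interval, and it gives the rule the best possible alignment (the burst starts exactly on a bucket boundary). The function names and the example paces are hypothetical.

```python
def max_distinct_in_bucket(span_seconds, seconds_per_new_instance):
    """Most distinct instances one bucket can hold at a steady pace.

    Best case for the detection: the first event lands exactly on the
    bucket start, so events at 0, d, 2d, ... fall in [0, span).
    Integer ceiling division avoids float rounding.
    """
    d = seconds_per_new_instance
    return (span_seconds + d - 1) // d


def can_ever_fire(threshold, span_seconds, seconds_per_new_instance):
    """True if the count can exceed the threshold even with perfect alignment."""
    return max_distinct_in_bucket(span_seconds, seconds_per_new_instance) > threshold


# Hypothetical paces against the rule above (span=5m, threshold > 10).
for pace in (72, 30, 25):
    print(pace, max_distinct_in_bucket(300, pace), can_ever_fire(10, 300, pace))
```

If can_ever_fire is false at every pace you consider realistic, the rule is dead even before the straddle problem makes things worse. Under these assumptions the rule above needs a new instance faster than once every 30 seconds, for five minutes straight, with lucky alignment.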
The fix
Widen the window, lower the threshold, or both – but only after the arithmetic, not by feel. Then stop using a fixed bucket for a sliding-window question:
index=aws sourcetype=aws:cloudtrail eventName=DescribeInstances
| streamstats time_window=15m dc(instance_id) AS distinct_instances by sourceIPAddress
| where distinct_instances > 10
streamstats time_window counts over a window that moves with the events instead of snapping to a fixed clock. More than ten distinct instances in any rolling 15-minute span trips it, no matter where the boundaries would have fallen. The straddle problem disappears because there are no fixed boundaries to straddle.
Re-backtest after the change, and confirm it fires on a known-good positive control before you trust it.
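One way to reason about a positive control before touching production is to replay synthetic events through a trailing window. The sketch below approximates the rolling-window idea in plain Python. It is not a reimplementation of streamstats, and its boundary handling (an event exactly one window-length old is dropped) is an assumption. The event data is hypothetical.

```python
from collections import deque


def sliding_max_distinct(events, window_seconds):
    """Largest distinct-instance count seen in any trailing window.

    events: list of (epoch_seconds, instance_id), sorted ascending,
    for a single actor.
    """
    window = deque()
    best = 0
    for ts, instance_id in events:
        window.append((ts, instance_id))
        # Drop events that are window_seconds or more older than this one.
        while window[0][0] <= ts - window_seconds:
            window.popleft()
        best = max(best, len({iid for _, iid in window}))
    return best


# Hypothetical positive control: 12 distinct instances, one per minute.
slow = [(200 + 60 * i, f"i-{i:04d}") for i in range(12)]

print(sliding_max_distinct(slow, 900) > 10)   # rolling 15 minutes
print(sliding_max_distinct(slow, 300) > 10)   # rolling 5 minutes
```

The same twelve events clear the threshold in a 15-minute rolling window and never come close in a 5-minute one, so the window width still has to come from the arithmetic. Moving to a sliding window removes the boundary problem, not the pace problem.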
The actual lesson
A detection that cannot fire is worse than no detection. No detection at least leaves a hole you can see and plan around. A dead one fills the hole with a green cell, claims a slot on the coverage report, and tells you you're watching a door that's been welded shut since the day you deployed it.
Do the arithmetic before you trust the threshold. The window has to be big enough to hold the thing you're trying to catch.