In the previous post, we identified three key gaps that Conditional Access cannot address:
- Brute force patterns (e.g. 10 failures in 2 minutes)
- Activity from excluded users (e.g. executives bypassing geo-blocking)
- Behavioural anomalies (e.g. Saturday midnight logins)
This post builds the detection layer that catches what CA misses. Not prevention, but detection: Stream Analytics complements Conditional Access rather than replacing it.
What this system detects:
- Brute force patterns (5+ failures in 10-minute windows)
- Geographic anomalies from excluded users (non-UK access with no CA oversight)
- Behavioural anomalies (off-hours activity from UK locations)
What this system does NOT detect:
- Token theft without anomalous sign-in activity
- Lateral movement after successful authentication
- Data exfiltration post-login
This highlights a critical principle: identity security requires both preventative controls (Conditional Access) and detective controls (event-driven monitoring).
Note: this system is about detection only. In practice, its output should feed a SIEM integration for SOC investigation and response.
Architecture: Event Hub + Stream Analytics
The Pipeline:
- Entra ID sign-ins → Real authentication events (success, CA blocks, password failures)
- Event Hub (signin-events) → Buffers events for stream processing (2 partitions, 1-day retention)
- Stream Analytics → 3 continuous queries running SQL against event stream
- Event Hub (threat-alerts) → Stores detected threats with full investigation context
Why Event Hub? Decouples collection from processing. Events persist even if Stream Analytics fails. Query has bug? Replay events with corrected query. Connection drops? Events buffered.
Why monitor excluded users? When a Senior Level executive logs in from New York, the CA policy doesn't apply (the user is excluded from geo-blocking) → authentication succeeds → no CA oversight. Stream Analytics flags this for investigation: legitimate executive travel or compromised account?
Infrastructure: Terraform deploys everything in ~5 minutes. Event Hub Basic tier (£0.86/day), Stream Analytics 1 SU (£0.10/hour active).
With the ingestion layer in place, we can now define detection logic as continuous queries running against the event stream.
Query 1: Brute Force Detection
Purpose: Detect 5+ authentication failures from same user within 10-minute window.
```sql
SELECT
    userPrincipalName,
    COUNT(*) AS failed_attempts,
    System.Timestamp() AS window_end,
    'Failed Login Spike' AS alert_type
INTO [FailedLoginOutput]
FROM [EventHubInput]
WHERE status.errorCode <> 0
GROUP BY userPrincipalName, TumblingWindow(minute, 10)
HAVING COUNT(*) >= 5;
```
How tumbling windows work:
- Fixed 10-minute intervals: 14:00-14:09, 14:10-14:19, etc.
- Non-overlapping: User with 6 failures in one window → alert fires
- Separate windows: 3 failures in window 1 + 3 in window 2 → no alert (distributed, not burst)
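The tumbling-window grouping above can be sketched in plain Python (the real job does this inside Stream Analytics; the usernames and timestamps below are made up for illustration):

```python
from collections import defaultdict
from datetime import datetime

def tumbling_window_counts(events, window_minutes=10, threshold=5):
    """Group failure events into fixed, non-overlapping windows and flag bursts."""
    buckets = defaultdict(int)
    for user, ts in events:
        # Floor the timestamp to the start of its 10-minute window
        window_start = ts.replace(minute=ts.minute - ts.minute % window_minutes,
                                  second=0, microsecond=0)
        buckets[(user, window_start)] += 1
    return {key: n for key, n in buckets.items() if n >= threshold}

# 6 failures inside one window -> alert; 3 + 3 across two windows -> no alert
burst = [("alice", datetime(2024, 1, 6, 14, 2, i)) for i in range(6)]
spread = [("bob", datetime(2024, 1, 6, 14, 9, i)) for i in range(3)] + \
         [("bob", datetime(2024, 1, 6, 14, 11, i)) for i in range(3)]
alerts = tumbling_window_counts(burst + spread)
```

Note how bob's six failures straddle the 14:10 window boundary and never reach the threshold in either bucket, which is exactly the "distributed, not burst" behaviour described above.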
Why this matters: Prevents alert fatigue from slow distributed attacks while catching concentrated bursts indicative of automated credential stuffing.
Key decision: Include errorCode 53003 (CA blocks) or only 50126 (wrong passwords)?
Early iteration excluded CA blocks (user might have correct password, wrong location—not a credential attack). After testing, included both to capture attackers trying multiple locations during brute force.
Production teams: Adjust based on threat model. Exclude 53003 for pure credential attacks. Include it for comprehensive activity monitoring. Or, better still, split them into separate detections.
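The two failure codes can be handled with a small predicate like this hedged sketch (the helper name and flag are hypothetical; the error codes are the ones discussed above):

```python
WRONG_PASSWORD = 50126  # invalid username or password (credential attack signal)
CA_BLOCK = 53003        # blocked by Conditional Access policy (e.g. wrong location)

def counts_toward_brute_force(error_code: int, include_ca_blocks: bool = True) -> bool:
    """Decide whether a sign-in event should count toward the failure window.

    Mirrors Query 1's WHERE clause: with include_ca_blocks=True any non-success
    counts (errorCode <> 0); with False, CA blocks are excluded so only
    genuine credential failures feed the brute force detection.
    """
    if error_code == 0:          # successful sign-in never counts
        return False
    if error_code == CA_BLOCK and not include_ca_blocks:
        return False
    return True
```

Flipping `include_ca_blocks` is the code-level equivalent of the query decision above: tune it to whether you want a pure credential-attack signal or broader activity monitoring.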
Query 2: Geographic Anomalies
Purpose: Flag ALL non-UK access for investigation. Operational staff will be CA-blocked, but we want to monitor excluded executives.
```sql
SELECT
    userPrincipalName,
    location,
    ipAddress,
    createdDateTime,
    'Non-UK Access' AS alert_type
INTO [HighRiskOutput]
FROM [EventHubInput]
WHERE location NOT LIKE '%United Kingdom%'
    AND location NOT LIKE '%UK%'
    AND location NOT LIKE '%, GB'
    AND location IS NOT NULL
    AND location <> 'Unknown, Unknown';
```
The GB Location Bug (cost me an hour of debugging):
Version 1 (failed):
```sql
WHERE location NOT LIKE '%UK%' AND location NOT LIKE '%United Kingdom%'
```
Looked reasonable. Deployed. Tested.
Alerts fired for Salford, Manchester, Islington—all UK cities!
The problem: Entra ID uses ISO country code "GB", not "UK". Sign-in location appears as "Salford, GB", not "Salford, UK".
My query checked NOT LIKE '%UK%'. Since "Salford, GB" doesn't contain the substring "UK", the filter flagged a UK city as non-UK. False positive cascade.
Version 2 (deployed):
Added NOT LIKE '%, GB' to catch the ISO format. Also added <> 'Unknown, Unknown' to filter geolocation lookup failures.
Lesson: Never assume data format. Always inspect actual payloads before writing filters. The GB vs UK issue is obvious in hindsight—but you only find it by testing with real data.
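One way to catch this class of bug early is to replicate the filter in Python and test it against payloads in the real format. A minimal sketch (the function name is hypothetical; the location strings match the formats discussed above):

```python
def is_non_uk(location):
    """Mirror Query 2's filter: flag anything that doesn't look like a UK location."""
    if location is None or location == "Unknown, Unknown":
        return False  # geolocation lookup failed; don't alert
    uk_markers = ("United Kingdom", "UK")
    # Entra ID emits ISO country codes, so UK locations arrive as "City, GB"
    return not (any(m in location for m in uk_markers) or location.endswith(", GB"))

# Version 1 lacked the ", GB" check and flagged "Salford, GB" as non-UK
assert is_non_uk("Salford, GB") is False
assert is_non_uk("Singapore, SG") is True
```

Running a handful of assertions like these against sample payloads before deploying the query would have surfaced the GB vs UK mismatch in seconds rather than an hour.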
Why alert on excluded users?
Executive logs in from Singapore → Policy doesn't apply (excluded from geo-blocking) → Stream Analytics flags it → Security investigates:
- Check group membership → Senior Level executive
- Verify CA policy exclusion → Correctly excluded
- Conclusion: Legitimate travel (no action) OR unexpected location (escalate)
Alert forces human review of activity from users who bypass CA policies.
Query 3: Off-Hours Activity
Purpose: Flag UK-based logins on weekends or outside 9-5 UTC business hours.
```sql
SELECT
    userPrincipalName,
    location,
    createdDateTime,
    DATEPART(hour, createdDateTime) AS login_hour,
    DATEPART(weekday, createdDateTime) AS day_of_week,
    'Off-Hours Activity' AS alert_type
INTO [OffHoursOutput]
FROM [EventHubInput]
WHERE (
        location LIKE '%, GB'
        OR location LIKE '%United Kingdom%'
        OR location LIKE '%UK%'
    )
    AND (
        DATEPART(weekday, createdDateTime) IN (1, 7) -- weekday numbering: 1=Sunday, 7=Saturday (verify DATEFIRST setting)
        OR DATEPART(hour, createdDateTime) NOT BETWEEN 9 AND 17
    );
```
UK-only location filter: Non-UK logins already trigger Query 2 (geographic anomalies). This query focuses on unusual timing from allowed locations. Prevents duplicate alerts—keeps queries mutually exclusive.
Location matching brittleness: This query uses the same string pattern matching as Query 2 (LIKE '%, GB'). For production, consider extracting countryOrRegion during ingestion and using structured field comparison (WHERE country = 'GB') instead of string matching. More reliable and avoids the GB vs UK inconsistency issues.
Timezone considerations: All Entra ID timestamps are UTC. This query checks UTC hours 9-17, not local UK time. Implications:
- UK summer (BST, UTC+1): "9-5 UK time" = 8-16 UTC, so the query misses the 17:00 UTC hour (18:00 BST, genuinely off-hours) and falsely flags 08:00 UTC (09:00 BST, business hours)
- UK winter (GMT, UTC+0): "9-5 UK time" = 9-17 UTC → query is accurate
For precise UK business hours detection, adjust query to account for BST/GMT transitions or accept UTC-based approximation.
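If you do the timestamp enrichment in the Python ingestion script instead of the query, the stdlib can handle the BST/GMT transition for you. A sketch using `zoneinfo` (assumes tzdata is available on the host; the function name is hypothetical):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")

def is_off_hours_uk(utc_ts: datetime) -> bool:
    """Evaluate 9-5 Mon-Fri in local UK time, handling BST/GMT automatically."""
    local = utc_ts.astimezone(LONDON)
    weekend = local.weekday() >= 5           # Monday=0 ... Sunday=6
    outside_hours = not (9 <= local.hour < 17)
    return weekend or outside_hours

# 16:30 UTC in July is 17:30 BST -> off-hours locally, though inside 9-17 UTC
assert is_off_hours_uk(datetime(2024, 7, 10, 16, 30, tzinfo=timezone.utc)) is True
```

The trade-off: enriching at ingestion means the windowing query stays simple, but the off-hours flag is baked in before the event reaches Stream Analytics.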
Early iteration mistake: Singapore login at 2 AM triggered BOTH Query 2 (non-UK) AND Query 3 (off-hours). Duplicate alerts cause investigation fatigue.
Fix: Added UK-only filter to Query 3. Now each query targets a distinct signal dimension:
- Query 1: Authentication failures (any location, any time)
- Query 2: Geographic anomalies (non-UK access, any time)
- Query 3: Behavioural anomalies (UK access, unusual timing)
Note: A single event can still trigger multiple queries if it matches multiple dimensions. For example, a UK user failing login 10 times at 2 AM on Saturday triggers Query 1 (failures) AND Query 3 (off-hours). This is expected—each query surfaces a different investigation angle for the same suspicious event.
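The routing logic above can be sketched as a single function (the simplified event fields are assumptions for illustration; the real queries operate on the raw stream):

```python
def detection_dimensions(event: dict) -> list:
    """Return which of the three detection queries an event is relevant to."""
    dims = []
    uk = event["location"].endswith(", GB")
    if event["errorCode"] != 0:
        dims.append("query1_failures")        # any location, any time
    if not uk:
        dims.append("query2_geo_anomaly")     # non-UK access, any time
    if uk and (event["weekend"] or not 9 <= event["hour"] < 17):
        dims.append("query3_off_hours")       # UK access, unusual timing
    return dims

# A UK user failing login at 2 AM on Saturday matches two dimensions
evt = {"location": "Salford, GB", "errorCode": 50126, "hour": 2, "weekend": True}
```

Note that a Singapore login at 2 AM now maps only to the geographic dimension, which is exactly the duplicate-alert fix described above.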
Data Collection: Graph API to Event Hub
Python script fetches sign-in logs from Graph API → transforms to simplified schema → sends to Event Hub in batches.
```bash
# Get connection string from Terraform
cd terraform
terraform output -raw eventhub_send_connection_string

# Add to .env file
echo "EVENTHUB_CONNECTION_STRING=$(terraform output -raw eventhub_send_connection_string)" >> ../scripts/.env

# Process events
cd ../scripts
python export_signin_logs_to_eventhub.py
```
Typical completion: ~30 seconds for 3300 events.
Schema transformation:
```python
{
    "userPrincipalName": log["userPrincipalName"],
    "status": {
        "errorCode": log.get("status", {}).get("errorCode", 0)
    },
    "location": f"{log.get('location', {}).get('city')}, {log.get('location', {}).get('countryOrRegion')}",
    "ipAddress": log.get("ipAddress"),
    "createdDateTime": log["createdDateTime"]
}
```
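One edge case in the transformation above: when Graph returns no location object, the f-string renders "None, None" rather than the "Unknown, Unknown" value Query 2 filters on. A defensive sketch (the function name and fallback behaviour are assumptions, not the exact script):

```python
def transform_signin(log: dict) -> dict:
    """Flatten a Graph API sign-in record into the simplified Event Hub schema."""
    loc = log.get("location") or {}
    city = loc.get("city") or "Unknown"
    country = loc.get("countryOrRegion") or "Unknown"
    return {
        "userPrincipalName": log["userPrincipalName"],
        "status": {"errorCode": (log.get("status") or {}).get("errorCode", 0)},
        "location": f"{city}, {country}",  # "Unknown, Unknown" when geolocation failed
        "ipAddress": log.get("ipAddress"),
        "createdDateTime": log["createdDateTime"],
    }
```

Normalising missing data at ingestion keeps the downstream queries' filters simple and predictable.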
Debugging gotcha: Script crashed with a cryptic DNS error (getaddrinfo failed). It looked like a network issue. Spent 20 minutes checking firewalls, DNS settings, network connectivity.
Actually: an extra quote in the .env file: 'Endpoint=sb://.... The parser read the quote, couldn't parse the resulting hostname, and threw a DNS error.
Lesson: Connection string format errors manifest as network failures. Print connection strings (redacted) to verify exact format before debugging network stack.
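A minimal pre-flight check like this hedged sketch (the function name and the checks are illustrative, not from the original script) turns the cryptic DNS failure into an immediate, readable error:

```python
def check_connection_string(conn: str) -> list:
    """Flag common .env formatting mistakes before they surface as network errors."""
    problems = []
    if conn != conn.strip():
        problems.append("leading/trailing whitespace")
    if conn.strip()[:1] in ("'", '"'):
        problems.append("stray quote before Endpoint=")
    if not conn.strip("'\" ").startswith("Endpoint=sb://"):
        problems.append("does not start with Endpoint=sb://")
    return problems

# The bug in this post: an extra quote copied into .env
assert check_connection_string("'Endpoint=sb://ns.servicebus.windows.net/;...") == \
    ["stray quote before Endpoint="]
```

Running this at script startup (with the value redacted in any log output) catches the formatting error before the SDK ever attempts a DNS lookup.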
Investigation Workflows
All 3300 detected threats are stored in the threat-alerts Event Hub with complete investigation context. In the Azure Portal, Event Hub → Data Explorer shows each threat.
Example 1: Brute Force Attack
The sign-in logs show repeated failures (error code 50126) from a UK location.
Individually, these events would not trigger Conditional Access. However, when analysed as a sequence, they form a clear brute force pattern and are worth investigating.
Investigation:
- Check user's normal location: UK (matches profile?)
- Failed attempts: 10 in 2 minutes (definite brute force pattern)
- Action: Contact user to verify, force password reset, review MFA enrollment
Example 2: Excluded User (Critical Scenario)
Investigation:
- Alert received: Non-UK access from New York
- Check Entra ID: Navigate to user → Group memberships
- Discovery: User is member of "Senior Level" group (executives)
- Verify CA policy: "LAB - Block Non UK (Exclude Senior Level)" → Senior Level group excluded
- Conclusion: Expected (excluded executive), NOT breach
Policy doesn't apply to this user (excluded for travel requirements). No remediation needed—log for audit trail.
This demonstrates defense-in-depth: CA policy doesn't apply (excluded), Stream Analytics flagged it for visibility, security team confirms expected behavior.
Without Stream Analytics, this login is invisible. No flag, no investigation, no confirmation that exclusion is being used legitimately vs. account compromise.
Example 3: Off-Hours Activity
Investigation:
- Location: UK (expected for this user)
- Time: Saturday 11:34 (outside Mon-Fri 9-5)
- Action: Log for trend analysis, contact user if pattern emerges
Key Learnings
1. Real Data Reveals Edge Cases: GB vs UK location bug (Entra uses "Salford, GB"), CA blocks (53003) vs auth failures (50126) require different handling, connection string format errors manifest as DNS failures.
2. Event Hub Decouples Collection from Processing: Events persist even if Stream Analytics fails, and historical events can be replayed through corrected or updated detection logic. Critical for production reliability.
3. Alert Design Prevents Fatigue: Early iteration had duplicate alerts (Singapore 2 AM triggered geo + off-hours). Fix: UK-only filter for Query 3. Each query targets distinct signal dimension.
4. IaC Accelerates Iteration: Manual deployment ~2 hours. Terraform apply ~5 minutes. Debugging 2 query bugs took ~30 minutes with IaC vs. 4+ hours manual.
Security Audit: Stream Analytics in Production
This is a LAB environment. The following security gaps would fail production review.
What We Skipped (Intentionally):
| Component | Lab Setup | Production Requirement | Risk |
|---|---|---|---|
| Network | Public Event Hub endpoints | Private Link + VNet integration | Data exposure, unauthorized access |
| Secrets | Connection strings in .env files | Azure Key Vault + managed identities | Credential theft, no rotation |
| Encryption | Basic tier (no encryption at rest) | Premium tier with customer-managed keys | Data breach if storage compromised |
| Query Changes | Manual edits in portal | CI/CD with approval gates | Accidental query breakage, no audit trail |
| Monitoring | No alerting on job failures | Azure Monitor alerts + runbooks | Silent detection failures |
| Data Retention | 1-day Event Hub retention | Long-term storage (SQL/Log Analytics) | Compliance violations, lost audit trail |
Production Must-Haves for Stream Analytics:
1. Network Isolation
- Private Link for Event Hub (~£12/month per endpoint)
- VNet integration for Stream Analytics job
- Network security groups restricting inbound/outbound
2. Identity & Access
- Managed identities for Stream Analytics → Event Hub authentication
- Azure Key Vault for any connection strings (Python script)
- RBAC with least privilege (no account keys in queries)
3. Data Protection
- Premium Event Hub with encryption at rest (~£531/month)
- Customer-managed keys (CMK) for compliance
- TLS 1.2+ for all data in transit
4. Operational Excellence
- Multi-region deployment for disaster recovery
- Automated failover for Event Hub namespace
- Azure Monitor alerts on: job stopped, output errors, SU utilization >80%
- Runbooks for common failure scenarios
5. Change Management
- Version control for query definitions (Git)
- CI/CD pipeline for query deployments
- Approval gates before production changes
- Rollback capability if query breaks
6. Compliance & Audit
- Forward threat-alerts to Log Analytics workspace (7-year retention)
- Immutable audit logs for compliance
- Data residency controls for GDPR/regional requirements
- Regular access reviews for Event Hub/Stream Analytics permissions
Conditional Access evaluates individual sign-in events.
This system detects patterns across events—the difference between blocking risk and understanding it.
Skills demonstrated: Stream Analytics query development, Event Hub event-driven architecture, SQL tumbling windows, real-time threat detection, production security architecture design, infrastructure as code iteration.