Christian

Posted on Mar 24

Conditional Access Realism: Testing Real Sign-Ins to Understand Policy Gaps

#azure #security #conditionalaccess #identity

Your Conditional Access policy blocked risky logins last week. Working as designed.

But can you answer these questions?

Are your executives' travel logins being monitored?
Can you detect 10 failed password attempts in 2 minutes?
What happens when excluded users authenticate from unusual locations?

Conditional Access makes point-in-time decisions: Allow or Block. Binary. CA policies do not aggregate events over time or detect patterns. Entra ID does provide risk detections (e.g., via Identity Protection), and a CA policy can consume the resulting risk level as a condition, but the detection logic is a separate system that cannot be configured in the policy itself. CA can't tell you if a user failed authentication 10 times in 2 minutes. It can't flag unusual behavior from users excluded from your policies.

We built a synthetic login-generator to simulate real-world authentication patterns. The generator intentionally models probabilistic attack behavior and temporal variance to mimic real-world authentication noise rather than uniform test traffic.

UK locations and international travel. Successful logins and deliberate brute force attacks. Business-hours activity and out-of-hours authentications. Policy compliance and executive exclusions.

The result: CA enforced policy exactly as designed. But successful logins happened—some legitimate, some from excluded executives, some exhibiting patterns only visible through cross-event analysis.

We're testing how Conditional Access protects identities—and discovering where it needs help.

The Problem: Conditional Access's Blind Spots

Conditional Access excels at point-in-time decisions:

Is this IP address risky? Block.
Is the sign-in location outside the UK? Block access.

Every authentication request gets evaluated independently. Allow or deny. Binary.

What CA cannot detect:

Patterns over time: Five failed password attempts in 10 minutes? CA evaluates each attempt separately. No aggregation.
Behavioral anomalies: Finance manager who normally works 9-5 suddenly authenticating at 2 AM? CA doesn't track baselines.
Activity from excluded users: Executive excluded from geo-blocking logs in from Singapore at midnight? Policy doesn't apply, so authentication succeeds with no Conditional Access enforcement or policy-driven detection.
Distinction between policy blocks and authentication failures: User blocked by CA (wrong location, correct password) looks different from brute force attempt (wrong password, allowed location).

These aren't CA failures; they're architectural limitations. CA prevents threats at the door. It doesn't monitor patterns that develop across multiple authentication attempts.

To test these limitations, we built realistic authentication data.

Building the Authentication Event Generator

We built a Python script using MSAL (Microsoft Authentication Library) to generate realistic authentication events across hundreds of users. The script randomly selects users from a pool and signs in with each user's credentials. For 5 percent of users, it instead tries the wrong password 10 times in 2 minutes. We used a VPN app to simulate sign-ins from different locations.

The Event Generation Strategy

Event Distribution:

Mix of successful and failed authentications
Geographic diversity:
- UK locations: Salford, Manchester, Islington
- Non-UK locations: New York, Zagreb, Qafsah, Karachi, Kyiv, etc
Temporal patterns: Business hours plus off-hours testing (late night, weekends)
Attack simulation: Concentrated bursts of rapid password failures (10 attempts per user in 2 minutes)

Why this distribution? Real authentication traffic isn't uniform. Most activity happens during business hours from expected locations. Suspicious activity is the minority, but it is buried in normal traffic.
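One way to encode a non-uniform distribution like this is weighted random sampling. The sketch below is illustrative only: the weights (85% business hours, 90% UK) and the helper name `sample_event_plan` are assumptions, not values from our generator. In our lab the location was set by switching VPN endpoints by hand, so the sampled city would simply tell the operator which endpoint to connect to.

```python
import random

UK_CITIES = ["Salford", "Manchester", "Islington"]
NON_UK_CITIES = ["New York", "Zagreb", "Qafsah", "Karachi", "Kyiv"]

# Assumed weights for illustration; tune them to match your own traffic.
BUSINESS_HOURS_WEIGHT = 0.85
UK_WEIGHT = 0.90

BUSINESS_HOURS = list(range(9, 17))  # 09:00 to 16:59
OFF_HOURS = [h for h in range(24) if h not in BUSINESS_HOURS]


def sample_event_plan(rng: random.Random) -> dict:
    """Pick an hour of day and a city for one planned sign-in."""
    in_hours = rng.random() < BUSINESS_HOURS_WEIGHT
    in_uk = rng.random() < UK_WEIGHT
    hour = rng.choice(BUSINESS_HOURS if in_hours else OFF_HOURS)
    city = rng.choice(UK_CITIES if in_uk else NON_UK_CITIES)
    return {"hour": hour, "city": city, "in_uk": in_uk, "business_hours": in_hours}


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so a run is repeatable
    plans = [sample_event_plan(rng) for _ in range(1000)]
    off_hours = sum(not p["business_hours"] for p in plans)
    non_uk = sum(not p["in_uk"] for p in plans)
    print(f"off-hours: {off_hours}/1000, non-UK: {non_uk}/1000")
```

Passing in a seeded `random.Random` keeps a test run reproducible, which matters when you want to compare detection results across runs.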

Technical Implementation: MSAL + ROPC Flow

The script uses Microsoft Authentication Library (MSAL) with Resource Owner Password Credential (ROPC) flow to generate real Entra ID sign-in events:

from msal import PublicClientApplication
import random
import time

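CLIENT_ID = "redacted"  # app registration (client) ID
TENANT_ID = "redacted"  # Entra ID tenant ID
PASSWORD = "redacted"
WRONG_PASSWORD = "redacted"
BRUTE_FORCE_RATE = 0.05  # 5% of users get a simulated brute-force burst
BRUTE_FORCE_ATTEMPTS = 10  # 10 attempts in 2 minutes
user_pool = []  # fill with test-user UPNs (redacted)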

def generate_login(username, password, is_intentional_failure=False):
    """Generate a real sign-in via ROPC flow"""
    app = PublicClientApplication(
        CLIENT_ID,
        authority=f"https://login.microsoftonline.com/{TENANT_ID}"
    )

    result = app.acquire_token_by_username_password(
        username=username,
        password=password,
        scopes=["User.Read"]
    )

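    if "access_token" in result:
        print(f"✅ {username} - Login successful")
        return True
    else:
        # Label simulated attack traffic so it stands out in the console output
        label = "Failed (intentional)" if is_intentional_failure else "Failed"
        print(f"❌ {username} - {label}: {result.get('error')}")
        return False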

# Main loop: randomly select users and generate logins
for user in user_pool:
    trigger_brute_force = random.random() < BRUTE_FORCE_RATE

    if trigger_brute_force:
        # Brute force: 10 failed attempts in 2 minutes
        for attempt in range(BRUTE_FORCE_ATTEMPTS):
            generate_login(user, WRONG_PASSWORD, is_intentional_failure=True)
            time.sleep(12)  # ~12 seconds between attempts
    else:
        # Normal login
        generate_login(user, PASSWORD)

    time.sleep(10)  # Rate limiting between users

Why ROPC for labs:

✅ Creates real Entra ID sign-in events (triggers CA policies)
✅ Generates authentic telemetry (same schema as production)
✅ Scriptable and repeatable
❌ ROPC cannot satisfy MFA or other interactive authentication controls (accounts that require them fail to sign in), making it unsuitable for production and risky even in poorly isolated environments

Geographic diversity: We manually switched VPN endpoints (UK: Salford, Manchester; Non-UK: Miami, Zagreb, Qafsah, Karachi, Kyiv, etc.) before running the script. Each VPN endpoint presents a different IP address, so Entra ID's geolocation service sees real non-UK sign-in attempts that trigger CA policy evaluation.

The Conditional Access Policy Design

We created a policy that reflects realistic business requirements: strict geo-blocking for operational staff, global access for executives.

Policy Name: "LAB - Block Non UK (Exclude Senior Level)"

The policy blocks non-UK locations but excludes the Senior Level group, creating a monitoring gap that the next section addresses.

Policy Configuration:

Setting	Value	Rationale
Users	All users	Policy applies to entire directory
Exclusions	Senior Level group (executives)	Global access requirements (travel, M&A, incident response)
Cloud apps	All cloud apps	Protect all resources
Locations	NOT United Kingdom	Enforce only for non-UK sign-ins
Grant	Block access	Zero tolerance for non-UK access from operational staff

The Three User Populations This Creates:

Operational Staff from UK → Policy applies → Location = UK → Access granted
Operational Staff from Non-UK → Policy applies → Location ≠ UK → Access blocked (errorCode: 53003)
Senior Level Executives from Non-UK → Policy doesn't apply (excluded) → Access granted (monitoring gap)
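The three populations can be expressed as a small decision function. This is a toy model of the policy as described above, not how Entra ID implements evaluation; the `SignIn` class and `evaluate_geo_policy` name are hypothetical, and the function assumes the credentials were already valid.

```python
from dataclasses import dataclass

EXCLUDED_GROUP = "Senior Level"  # group named in the policy's exclusions
ALLOWED_COUNTRY = "GB"


@dataclass(frozen=True)
class SignIn:
    user: str
    groups: frozenset
    country: str  # two-letter country code of the sign-in location


def evaluate_geo_policy(sign_in: SignIn) -> str:
    """Return the modelled conditionalAccessStatus for one sign-in."""
    if EXCLUDED_GROUP in sign_in.groups:
        return "notApplied"  # excluded: the location is never checked
    if sign_in.country != ALLOWED_COUNTRY:
        return "failure"  # blocked, surfaced as errorCode 53003
    return "success"


if __name__ == "__main__":
    staff = frozenset({"Operations"})
    execs = frozenset({"Senior Level"})
    for s in (SignIn("staff-uk", staff, "GB"),
              SignIn("staff-abroad", staff, "HR"),
              SignIn("exec-abroad", execs, "TN")):
        print(s.user, "->", evaluate_geo_policy(s))
```

Note that the exclusion check comes first: for an excluded user the location branch is unreachable, which is exactly the monitoring gap.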

Why the Senior Level group needs exclusions: The CEO travels for board meetings, the CFO accesses systems during international audits, and the security team responds to incidents 24/7 from any location. Strict geo-blocking would block critical business operations.

Critical security consideration: While the policy doesn't apply to excluded users (no CA enforcement), Entra ID still generates sign-in events for every authentication attempt. These events contain full context (user, location, IP, timestamp, outcome) and flow to our monitoring systems. This means:

✅ Excluded users' activity is logged and available for investigation
✅ SOC can detect suspicious patterns from executive accounts (unusual locations, off-hours access)
✅ Forensic analysis possible if executive account is compromised

The gap: Detection isn't automated within the CA policy framework. It requires manual log review, or Stream Analytics (next section) for real-time monitoring.

Understanding CA Evaluation Outcomes

When Conditional Access evaluates an authentication request, there are four possible outcomes.

Outcome 1: SUCCESS (Policy Satisfied)

{
  "userPrincipalName": "chloe.oconnel@acme.onmicrosoft.com",
  "status": { "errorCode": 0 },
  "conditionalAccessStatus": "success",
  "location": { "city": "Manchester", "countryOrRegion": "GB" }
}

What this means: User authenticated successfully, CA evaluated and all conditions met (UK location), access granted. Normal activity.

Outcome 2: FAILURE - CA Block (Policy Enforced)

{
  "userPrincipalName": "Ethan.Bell@acme.onmicrosoft.com",
  "status": {
    "errorCode": 53003,
    "failureReason": "Blocked by Conditional Access"
  },
  "conditionalAccessStatus": "failure",
  "location": { "city": "Zagreb", "countryOrRegion": "HR" }
}

What this means: The user passed credential validation, but CA then blocked the sign-in because the policy condition was not met (non-UK location). No access token issued.

CRITICAL DISTINCTION:

errorCode 53003 = CA block (policy prevented access)
errorCode 50126 = Invalid password (authentication failure, potential brute force)

These are different security events: a CA block means the user was blocked by policy (correct password, wrong location). An auth failure means a wrong password (typo or attack).

Outcome 3: NOT APPLIED (Policy Exemption - Monitoring Gap)

{
  "userPrincipalName": "wei.huang@acme.onmicrosoft.com",
  "status": { "errorCode": 0 },
  "conditionalAccessStatus": "notApplied",
  "location": { "city": "Qafsah", "countryOrRegion": "TN" },
  "appliedConditionalAccessPolicies": [{
    "displayName": "LAB - Block Non UK (Exclude Senior Level)",
    "result": "notApplied"
  }]
}

What this means: User is in excluded group (Senior Level executives). CA policy doesn't evaluate this user at all. Authentication succeeds based on credentials alone.

This is the security gap: Excluded users' activity happens with no CA oversight. If executive account is compromised, attacker gets same exemptions. No automated detection.

Investigation required: Manual review needed. Is this legitimate travel or compromised executive account?

Stream Analytics will monitor conditionalAccessStatus: "notApplied" events to flag excluded users' activity.

Outcome 4: FAILURE - Authentication Failure (Wrong Password)

{
  "userPrincipalName": "rhys.oconnel@acme.onmicrosoft.com",
  "status": {
    "errorCode": 50126,
    "failureReason": "Invalid username or password"
  },
  "conditionalAccessStatus": "notApplied",
  "location": { "city": "Salford", "countryOrRegion": "GB" }
}

What this means: User failed to authenticate (wrong password). CA was not evaluated because authentication failed before the CA stage. This ordering is critical: it explains why brute force attempts (invalid credentials) never reach Conditional Access and therefore skip policy evaluation entirely, yet they are still worth investigating.

Why CA shows notApplied: Authentication pipeline validates credentials first, then evaluates CA. If credentials fail, CA never runs.

This is the signal for brute force detection: Multiple errorCode 50126 events in short time window = potential attack.

The same ordering applies to non-UK sign-ins: a wrong-password attempt from New York would normally be blocked by the policy, but it fails password authentication before reaching the CA engine, so it is also logged as notApplied.

Comparison Table: The Four Outcomes

Outcome	errorCode	conditionalAccessStatus	Meaning	Investigation?
Success	0	`success`	User authenticated, policy met	No
CA Block	53003	`failure`	Policy blocked access	Depends
Not Applied (Excluded)	0	`notApplied`	Policy skipped (user excluded)	YES
Auth Failure	50126	`notApplied`	Wrong password	YES

Key takeaway: Not all "notApplied" events are the same. Excluded users vs. authentication failures both show notApplied, but for different reasons. Check errorCode to distinguish.
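The table above translates into a short classifier. This is a minimal sketch that assumes events shaped like the JSON samples in this section; the function name and the returned labels are our own illustrative choices, and anything outside the four outcomes falls through to "other".

```python
def classify_sign_in(event: dict) -> str:
    """Map one sign-in event to one of the four outcomes in the table."""
    error_code = event.get("status", {}).get("errorCode")
    ca_status = event.get("conditionalAccessStatus")

    if error_code == 53003:
        return "ca_block"  # policy blocked a valid credential
    if error_code == 50126:
        return "auth_failure"  # wrong password, CA never ran
    if error_code == 0 and ca_status == "success":
        return "success"
    if error_code == 0 and ca_status == "notApplied":
        return "excluded_user"  # signed in with no policy applied
    return "other"


if __name__ == "__main__":
    sample = {
        "userPrincipalName": "rhys.oconnel@acme.onmicrosoft.com",
        "status": {"errorCode": 50126},
        "conditionalAccessStatus": "notApplied",
    }
    print(classify_sign_in(sample))
```

Checking errorCode before conditionalAccessStatus is what separates the two notApplied cases. In a tenant with more policies, a successful notApplied sign-in could have causes other than a group exclusion, so treat "excluded_user" as a lab-specific label.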

Real-World Scenarios: Policy in Action

Scenario 1: Operational Staff from Non-UK (CA Block)

User: IT admin attempting login from Zagreb
CA Evaluation: Policy applies → Location = Zagreb, Croatia (non-UK) → Blocked (errorCode 53003)
Result: Access denied. Policy working as designed.

Scenario 2: Senior Level Executive from Non-UK (Monitoring Gap)

User: An executive logging in from Qafsah at 10:24 PM
CA Evaluation: User in Senior Level group → Policy doesn't apply → Access granted (notApplied)
Result: Login succeeds. No CA oversight.

Investigation needed: Is this legitimate travel or compromised executive account? CA can't tell—it never evaluated the risk.

Next section: Stream Analytics will flag all non-UK access from excluded users for investigation.

Scenario 3: Brute Force Attack (CA Can't Detect)

User: A user with 10 wrong password attempts in 2 minutes from UK
CA Evaluation: Each attempt fails authentication → CA never runs on any of them
Result: 10 failed logins in 2 minutes. CA policies do not natively detect or respond to rapid authentication failures. While platform-level protections like smart lockout may mitigate this, they are not visible or tunable within Conditional Access policy logic.

What CA detects: Nothing. Each attempt is handled independently, with no aggregation and no pattern detection.

Next section: Stream Analytics will aggregate errorCode 50126 events over time windows to detect brute force.

Scenario 4: Off-Hours Activity (CA Allows, But Suspicious)

User: HR coordinator at 2:13 AM Saturday from Salford
CA Evaluation: Location = UK → Policy met → Access granted
Result: Login succeeds.

Why this is suspicious: HR coordinator normally works Mon-Fri 9-5. Why 2 AM Saturday? CA doesn't track baselines or detect behavioral anomalies.

Next section: Stream Analytics will flag off-hours activity for investigation.

Results: What CA Caught and Missed

What Conditional Access Successfully Protected

✅ Geo-blocking enforcement: Non-UK login attempts from operational staff blocked
✅ Policy compliance: Every non-UK operational staff login denied
✅ Legitimate access: Successful logins from authorized users in compliant scenarios
✅ Executive mobility: Senior Level global access maintained

CA did exactly what it's designed to do: Evaluate each authentication request against location policy, enforce blocks, allow compliant requests and exempted users.

What Conditional Access Missed

❌ Brute force patterns: Rapid authentication failures went undetected. CA never ran on the failed credentials, and nothing correlated the attempts
❌ Excluded user activity: Non-UK logins from Senior Level executives happened with no CA oversight
❌ Behavioral anomalies: Off-hours logins from operational staff allowed (UK location satisfied policy, but timing unusual)
❌ Attack progression: No visibility into whether failed attempts escalate over time

These aren't CA failures—they're gaps in CA's detection capabilities. CA prevents known threats. It doesn't detect patterns over time, monitor exempted users, or identify behavioral anomalies.

Security Audit: Lab vs. Production Considerations

This is a LAB environment for learning. The following security gaps exist and would fail production review:

What We Skipped (Intentionally)

Secrets Management: Client ID and Tenant ID hardcoded in Python script. Production requires Azure Key Vault with managed identities.

ROPC Flow: Legacy authentication flow that Microsoft recommends against and that cannot satisfy MFA. Production must use interactive flows (authorization code flow, device code flow) that support modern security features.

Password Storage: Plaintext passwords in script constants (redacted in the snippet). Production requires secure credential management and certificate-based authentication.

Rate Limiting: No backoff logic for API throttling. Production needs exponential backoff and retry policies.
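For reference, exponential backoff with jitter can be sketched in a few lines. This is a generic pattern, not code from our script: the `Throttled` exception and `call_with_backoff` helper are hypothetical, and how throttling is actually signalled depends on the client library you use.

```python
import random
import time


class Throttled(Exception):
    """Hypothetical signal that the service asked us to slow down."""


def call_with_backoff(operation, max_attempts=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep, jitter=random.random):
    """Run operation(), retrying on Throttled with exponentially growing waits."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Throttled:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Wait between 50% and 100% of the delay so clients don't sync up
            sleep(delay * (0.5 + 0.5 * jitter()))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        state["calls"] += 1
        if state["calls"] < 3:
            raise Throttled()
        return "ok"

    print(call_with_backoff(flaky, base_delay=0.1))
```

Only throttling should be retried this way. A wrong-password result in the brute force simulation is the intended outcome, not a transient error, so it should not go through the retry path.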

Production Requirements

When deploying authentication testing in production environments:

Use service principals with certificate authentication (not ROPC with passwords)
Store credentials in Azure Key Vault (retrieve via managed identity)
Implement proper error handling (API throttling, network failures, authentication errors)
Use test users in isolated tenant (never test against production user accounts)
Request minimal Graph API scopes (only AuditLog.Read.All for reading sign-in logs, not Directory.Read.All)

Why we use ROPC in this lab: Creates authentic Entra ID sign-in events that trigger real CA policy evaluation. Allows reproducible testing scenarios. Production would use interactive authentication or real user logins for testing, not scripted automation.

The Three Gaps CA Can't Fill

Gap 1: Pattern Detection Over Time (Brute Force)

What CA can't do: Aggregate authentication failures over time windows.

Real-world impact: Attacker tries hundreds of passwords over hours from UK-based VPS. Each attempt: UK location (policy allows), wrong password (auth fails, CA doesn't evaluate). No CA-level detection and no alert. Any lockout comes from platform protections such as smart lockout, not from the policy.

Solution needed: Time-windowed aggregation. "If user fails authentication 5+ times in 10 minutes, trigger alert."
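To make the rule concrete, here is a minimal Python sketch of a per-user sliding window. It is an offline illustration of the idea, not the Stream Analytics query from the next section; the function name, the tuple layout of the events, and the default threshold of 5 failures in 10 minutes are assumptions taken from the example rule above.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

INVALID_PASSWORD = 50126


def find_bursts(events, threshold=5, window=timedelta(minutes=10)):
    """Return users with at least `threshold` invalid-password events in a window.

    events: iterable of (timestamp, user, error_code) tuples.
    """
    recent = defaultdict(deque)  # user -> failure timestamps inside the window
    flagged = set()
    for ts, user, code in sorted(events, key=lambda e: e[0]):
        if code != INVALID_PASSWORD:
            continue
        failures = recent[user]
        failures.append(ts)
        # Drop failures that have aged out of the window
        while ts - failures[0] > window:
            failures.popleft()
        if len(failures) >= threshold:
            flagged.add(user)
    return flagged


if __name__ == "__main__":
    start = datetime(2025, 1, 1, 9, 0, 0)
    # Same shape as the lab burst: 10 attempts, 12 seconds apart
    burst = [(start + timedelta(seconds=12 * i), "attacked.user", 50126)
             for i in range(10)]
    print(find_bursts(burst))
```

A slow attack that stays under the threshold in every window would not be flagged, so the window size and threshold are a trade-off between noise and coverage.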

Gap 2: Monitoring Excluded Users (Executive Compromise)

What CA can't do: Monitor excluded users' activity. No Conditional Access enforcement or policy-driven detection is applied to excluded users.

Real-world impact: Executive account compromised. Attacker logs in from Singapore at 2 AM. Policy doesn't apply (executive excluded) → Access granted → No alert.

Solution needed: Monitoring layer that flags ALL non-UK access (including excluded users) for investigation.
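As a rough sketch of that monitoring layer, the filter below flags every successful sign-in from outside the home country, whatever Conditional Access reported. It assumes events shaped like the JSON samples earlier in this post; the function name and output fields are illustrative.

```python
def flag_non_uk_success(events, home_country="GB"):
    """Return successful sign-ins from outside home_country, whatever CA said."""
    flagged = []
    for event in events:
        succeeded = event.get("status", {}).get("errorCode") == 0
        country = event.get("location", {}).get("countryOrRegion")
        # A missing location is also flagged: unknown is not the same as UK
        if succeeded and country != home_country:
            flagged.append({
                "user": event.get("userPrincipalName"),
                "country": country,
                "caStatus": event.get("conditionalAccessStatus"),
            })
    return flagged


if __name__ == "__main__":
    events = [
        {"userPrincipalName": "chloe.oconnel@acme.onmicrosoft.com",
         "status": {"errorCode": 0}, "conditionalAccessStatus": "success",
         "location": {"city": "Manchester", "countryOrRegion": "GB"}},
        {"userPrincipalName": "wei.huang@acme.onmicrosoft.com",
         "status": {"errorCode": 0}, "conditionalAccessStatus": "notApplied",
         "location": {"city": "Qafsah", "countryOrRegion": "TN"}},
    ]
    for hit in flag_non_uk_success(events):
        print(hit)
```

Because the filter keys on the sign-in outcome and location rather than on the policy result, excluded executives are covered by the same rule as everyone else.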

Gap 3: Behavioral Anomaly Detection (Off-Hours)

What CA can't do: Establish per-user behavioral baselines and detect deviations.

Real-world impact: Finance manager account compromised. Attacker logs in from UK-based VPS at midnight. UK location = policy allows. Unusual timing goes unnoticed for days.

Solution needed: Behavioral analytics that track normal patterns per user and flag outliers.
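A very simple form of per-user baseline is a histogram of sign-in hours. The sketch below is a deliberately naive illustration: the function names and the thresholds (`min_events=20`, `min_share=0.02`) are assumptions, and a real deployment would need to handle time zones, shift work, and baseline drift.

```python
from collections import Counter, defaultdict


def build_hour_baseline(history):
    """history: iterable of (user, hour_of_day). Returns user -> Counter of hours."""
    baseline = defaultdict(Counter)
    for user, hour in history:
        baseline[user][hour] += 1
    return baseline


def is_unusual_hour(baseline, user, hour, min_events=20, min_share=0.02):
    """True if the user has enough history and rarely signs in at this hour."""
    counts = baseline.get(user)
    if not counts:
        return False  # no history at all: nothing to compare against
    total = sum(counts.values())
    if total < min_events:
        return False  # too little history to judge
    return counts[hour] / total < min_share


if __name__ == "__main__":
    # 40 historical sign-ins spread evenly across 09:00 to 16:00
    history = [("finance.manager", h) for h in range(9, 17) for _ in range(5)]
    baseline = build_hour_baseline(history)
    print(is_unusual_hour(baseline, "finance.manager", 2))
    print(is_unusual_hour(baseline, "finance.manager", 10))
```

Returning False for users with little history avoids flooding the SOC with alerts for new accounts, at the cost of a blind spot until the baseline fills in.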

The next section shows how to build it.

Conclusion: CA Prevents, Detection Detects

We tested Conditional Access with realistic authentication data:

What CA does brilliantly:

✅ Blocked non-UK login attempts from operational staff
✅ Maintained executive mobility
✅ Binary decisions executed instantly

What CA cannot do:

❌ Detect patterns over time (brute force)
❌ Monitor excluded users' activity (blind spot)
❌ Identify behavioral anomalies (off-hours, unusual locations)

Conditional Access evaluates context (location, device, risk signals) but not intent. A sequence of low-risk events can still represent high-risk behavior when viewed over time.

This isn't a criticism of CA—it's understanding its design: Conditional Access is a prevention control. It's not designed for continuous monitoring, pattern detection, or behavioral analytics.

Defense-in-depth requires both:

Conditional Access: Prevents threats at the door
Stream Analytics (next section): Detects patterns that slip through

Skills demonstrated:

MSAL authentication with ROPC flow (generating real Entra ID sign-in events for testing)
Realistic authentication data generation (geographic and temporal diversity)
Conditional Access policy design (geo-blocking with executive exclusions)
CA evaluation outcome interpretation (Success, Failure, Not Applied, Auth Failure)
Security gap analysis (what CA detects vs. what it misses)

Next: Event-Driven Stream Analytics Threat Detection, where we'll aggregate these events in real time to catch brute force, monitor excluded users, and flag off-hours anomalies.

We have the authentication data. We understand CA's gaps. Now let's build real-time detection that catches the patterns CA can't see.

Conditional Access evaluates events. Detection systems evaluate sequences.

Security failures happen in the gap between the two.

DEV Community