V.Ray

Posted on Dec 15, 2025

7 Azure Security Gaps I Have Seen in Production (and How to Fix Them)

#azure #devops #security #cloud

Note: The issues described here are from real production environments, security audits, and incident reviews — not lab or demo setups.

Over the past few years, while managing 100+ Azure virtual machines in production environments running critical PeopleSoft and Oracle workloads, I have repeatedly encountered the same security gaps across different organizations.

These are not theoretical vulnerabilities from security blogs. They are real-world issues that surface during security audits, incident response investigations, and routine infrastructure reviews.

In this article, I will walk through seven critical Azure security gaps I have personally identified in production, along with:

Real-world examples
Detection methods using Azure-native tools
Practical, step-by-step remediation strategies

Whether you manage a handful of VMs or hundreds, addressing these gaps will significantly improve your Azure security posture before they turn into incidents.

1. Inadequate Network Security Group (NSG) Rules

The Problem

Network Security Groups (NSGs) are the primary network-level control for Azure VMs. Yet I often find production environments with overly permissive inbound rules such as:

Allow * from 0.0.0.0/0
SSH (22) or RDP (3389) exposed to the public internet

Real-world example:
During a security audit, I found 12 production VMs with SSH open to the internet. NSG flow logs showed 50,000+ failed login attempts in 30 days.

How to Detect:

Azure Portal

Network Security Groups → Inbound rules
Look for:
- Source: Any or 0.0.0.0/0
- Ports: 22, 3389, 1433, 5432
Review Effective security rules per VM
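The portal review above can also be scripted against exported rules. What follows is a minimal sketch, not a definitive audit tool: it assumes the rules were exported as JSON (for example from `az network nsg rule list`) with the fields `direction`, `access`, `sourceAddressPrefix`/`sourceAddressPrefixes`, and `destinationPortRange`/`destinationPortRanges`. The helper names and the sample rules are hypothetical.

```python
# Sketch: flag inbound Allow rules that expose sensitive ports to any source.
# Field names follow the JSON shape assumed above; adjust them to your export.
RISKY_PORTS = {22, 3389, 1433, 5432}
OPEN_SOURCES = {"*", "any", "internet", "0.0.0.0/0"}


def port_spec_covers(spec, port):
    """Return True if a port spec ('*', '22', or '1000-2000') includes port."""
    spec = spec.strip()
    if spec == "*":
        return True
    if "-" in spec:
        low, high = spec.split("-", 1)
        return int(low) <= port <= int(high)
    return int(spec) == port


def find_open_rules(rules):
    """Return (rule name, port) pairs for risky ports open to any source."""
    findings = []
    for rule in rules:
        if rule.get("direction", "").lower() != "inbound":
            continue
        if rule.get("access", "").lower() != "allow":
            continue
        sources = [rule.get("sourceAddressPrefix") or ""]
        sources += rule.get("sourceAddressPrefixes") or []
        if not any(s.lower() in OPEN_SOURCES for s in sources):
            continue
        specs = [rule.get("destinationPortRange") or ""]
        specs += rule.get("destinationPortRanges") or []
        specs = [s for s in specs if s]
        for port in sorted(RISKY_PORTS):
            if any(port_spec_covers(s, port) for s in specs):
                findings.append((rule.get("name", "<unnamed>"), port))
    return findings


# Hypothetical sample data for illustration only.
sample_rules = [
    {"name": "allow-ssh-any", "direction": "Inbound", "access": "Allow",
     "sourceAddressPrefix": "*", "destinationPortRange": "22"},
    {"name": "allow-rdp-vpn", "direction": "Inbound", "access": "Allow",
     "sourceAddressPrefix": "203.0.113.0/24", "destinationPortRange": "3389"},
    {"name": "allow-range", "direction": "Inbound", "access": "Allow",
     "sourceAddressPrefix": "Internet", "destinationPortRange": None,
     "destinationPortRanges": ["1000-2000", "5432"]},
]

for name, port in find_open_rules(sample_rules):
    print(f"{name}: port {port} is reachable from any source")
```

Port ranges are checked as well as single ports because a rule such as `1000-2000` exposes 1433 without ever naming it.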

KQL (NSG Flow Logs)

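AzureNetworkAnalytics_CL
| where SubType_s == "FlowLog"
// Keep only sources outside RFC 1918 space. A bare "172." prefix would also drop
// public addresses, because only 172.16.0.0/12 is private.
// Traffic Analytics may store public peer addresses in separate public-IP columns
// for some flow types; verify against your workspace schema.
| where not(ipv4_is_private(SrcIP_s))
| where DestPort_d in (22, 3389)
| summarize Attempts = count() by SrcIP_s, DestPort_d
| order by Attempts desc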
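When flow records are post-processed outside Log Analytics, the private-range check deserves care: RFC 1918 space is 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16, so a bare `172.` prefix match would also treat public addresses such as 172.32.0.1 as internal. The sketch below uses Python's standard `ipaddress` module for an exact check. The `(source IP, destination port)` record shape and the sample flows are hypothetical.

```python
import ipaddress
from collections import Counter

# The three RFC 1918 private ranges, matched exactly rather than by prefix.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(ip):
    """Return True if ip falls inside one of the RFC 1918 ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918_NETWORKS)


def external_attempts(flows, ports=(22, 3389)):
    """Count flows per (source IP, port) from sources outside RFC 1918 space."""
    counts = Counter()
    for src_ip, dest_port in flows:
        if dest_port in ports and not is_rfc1918(src_ip):
            counts[(src_ip, dest_port)] += 1
    return counts.most_common()  # highest count first


# Hypothetical exported flow tuples for illustration only.
sample_flows = [
    ("172.32.0.1", 22),     # public, despite the "172." prefix
    ("172.32.0.1", 22),
    ("172.16.5.4", 22),     # private
    ("10.1.2.3", 3389),     # private
    ("198.51.100.7", 3389),
    ("198.51.100.7", 443),  # port not in scope
]

for (src, port), attempts in external_attempts(sample_flows):
    print(src, port, attempts)
```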

The Fix
Immediate

Remove 0.0.0.0/0 access for SSH/RDP
Whitelist trusted office or VPN IPs
Use Azure Bastion
Enable Just-In-Time (JIT) VM access via Microsoft Defender for Cloud

Long-term

Use Application Security Groups (ASGs)
Implement Azure Firewall
Enforce NSG standards using Azure Policy

2. Missing Azure Backup Policies

The Problem

Many teams assume Azure automatically backs up VMs. It doesn’t.
I have seen production workloads with no Recovery Services vault and no backups.

How to Detect

Azure Portal

Recovery Services vaults → Backup items
Compare against VM inventory
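The inventory comparison boils down to a set difference on resource IDs. Here is a minimal sketch, assuming both lists were exported beforehand (VM resource IDs from the inventory, and the source resource IDs of the vault's backup items). The function name and the sample IDs are hypothetical.

```python
def unprotected_vms(vm_ids, protected_source_ids):
    """Return VM resource IDs that have no matching backup item.

    Azure resource IDs are case-insensitive, so both sides are
    normalized before comparison.
    """
    protected = {rid.strip().lower() for rid in protected_source_ids}
    return sorted(
        vm_id for vm_id in vm_ids if vm_id.strip().lower() not in protected
    )


# Hypothetical IDs for illustration only.
SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"
vm_inventory = [
    f"{SUB}/resourceGroups/prod-rg/providers/Microsoft.Compute/virtualMachines/app01",
    f"{SUB}/resourceGroups/prod-rg/providers/Microsoft.Compute/virtualMachines/db01",
]
backup_items = [
    # Same VM as app01, but with different casing in the export.
    f"{SUB}/resourcegroups/PROD-RG/providers/Microsoft.Compute/virtualMachines/APP01",
]

for vm in unprotected_vms(vm_inventory, backup_items):
    print("No backup item found for:", vm)
```

Normalizing case matters in practice, since different exports often disagree on the casing of resource group and VM names.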

The Fix

Create Recovery Services vaults per region
Define policies aligned with RPO/RTO
Enable Soft Delete
Test restores quarterly
Configure backup alerts via Azure Monitor

3. Weak Authentication Methods

The Problem

Password-based SSH and RDP access is still common. I have seen environments using the same password across multiple admin accounts, so a single leaked credential compromises all of them.

How to Detect

Linux

grep -i PasswordAuthentication /etc/ssh/sshd_config
grep -i PubkeyAuthentication /etc/ssh/sshd_config
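One caveat with `grep -i`: it also matches commented-out lines, and sshd uses the first value it obtains for each keyword. The sketch below approximates the effective global value from a config file's text. It is deliberately limited: it does not follow `Include` files, stops at the first `Match` block, and assumes the OpenSSH defaults of `yes` for both keywords. The function name and sample config are hypothetical; for an authoritative answer, `sshd -T` prints the effective configuration.

```python
# Assumed OpenSSH defaults when a keyword is not set explicitly.
DEFAULTS = {"passwordauthentication": "yes", "pubkeyauthentication": "yes"}


def effective_setting(config_text, keyword):
    """Approximate the effective global value of an sshd_config keyword."""
    wanted = keyword.lower()
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.replace("=", " ", 1).split()
        name = parts[0].lower()
        if name == "match":
            break  # conditional blocks are out of scope for this sketch
        if name == "include":
            continue  # included files are not followed
        if name == wanted and len(parts) > 1:
            return parts[1].lower()  # first occurrence wins
    return DEFAULTS.get(wanted, "unset")


# Hypothetical config for illustration only.
SAMPLE_CONFIG = """\
# PasswordAuthentication no
PubkeyAuthentication yes
PasswordAuthentication yes
PasswordAuthentication no
"""

for key in ("PasswordAuthentication", "PubkeyAuthentication"):
    print(key, "=", effective_setting(SAMPLE_CONFIG, key))
```

In the sample, a plain grep would show a reassuring `PasswordAuthentication no` twice, while the first uncommented line still enables passwords.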

Windows

Review Microsoft Entra ID (formerly Azure AD) Conditional Access policies
Check Microsoft Entra ID sign-in logs

The Fix

Linux

Disable passwords: PasswordAuthentication no
Use SSH key-based authentication
Store keys in Azure Key Vault
Enable Microsoft Entra ID login for Linux VMs

Windows

Enforce MFA
Reduce local admin usage
Use Privileged Access Workstations (PAWs)

4. Unencrypted Data in Transit

The Problem

HTTP endpoints, databases without TLS, and FTP transfers still exist in production — even for sensitive data.

How to Detect

Application Gateway / WAF Logs

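AzureDiagnostics
| where ResourceType == "APPLICATIONGATEWAYS"
| where Category == "ApplicationGatewayAccessLog"
// requestUri_s holds only the request path, so the scheme cannot be matched there.
// Assumption: sslEnabled_s == "off" marks requests handled without TLS;
// verify this against the access-log schema for your gateway SKU.
| where sslEnabled_s == "off"
| summarize Count = count() by requestUri_s, clientIP_s
| order by Count desc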

The Fix

Enforce HTTPS end-to-end
Redirect HTTP → HTTPS
Enable TLS 1.2+ for databases
Replace FTP with SFTP/FTPS
Manage certificates via Azure Key Vault
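On the client side, the TLS 1.2+ requirement can be enforced in code rather than left to library defaults. This is a small sketch using Python's standard `ssl` module; the function name is hypothetical, and it covers only the client half of the connection.

```python
import ssl


def strict_tls_context():
    """Build a client context that verifies certificates and refuses TLS < 1.2."""
    ctx = ssl.create_default_context()  # certificate and hostname checks are on
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


context = strict_tls_context()
print("Minimum TLS version:", context.minimum_version.name)
```

A socket wrapped with `context.wrap_socket(sock, server_hostname=...)` fails the handshake if the server offers only older protocol versions, which turns a silent downgrade into a visible error.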

5. Improper Role-Based Access Control (RBAC)

The Problem

Developers often hold Contributor or Owner roles at the subscription level, which violates the principle of least privilege.

How to Detect

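authorizationresources
| where type == "microsoft.authorization/roleassignments"
| extend roleDefinitionId = tostring(properties.roleDefinitionId)
// roleDefinitionId ends in the role's GUID, not its display name
| where roleDefinitionId endswith "8e3af657-a8ff-443c-a75c-2fe8c4bcb635" // Owner
    or roleDefinitionId endswith "b24988ac-6180-42a0-ab88-20f7382dd24c" // Contributor
| project principalId = tostring(properties.principalId), scope = tostring(properties.scope), roleDefinitionId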
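Role assignments reference roles by ID: `roleDefinitionId` ends in the role definition's GUID rather than its display name. The sketch below filters an exported assignment list down to Owner and Contributor at subscription scope. It assumes flat records with `principalId`, `roleDefinitionId`, and `scope` keys, which is a hypothetical export shape; the two GUIDs are the built-in role IDs for Owner and Contributor.

```python
import re

# Built-in role definition GUIDs.
BROAD_ROLES = {
    "8e3af657-a8ff-443c-a75c-2fe8c4bcb635": "Owner",
    "b24988ac-6180-42a0-ab88-20f7382dd24c": "Contributor",
}
# Matches a scope that is exactly one subscription, nothing deeper.
SUBSCRIPTION_SCOPE = re.compile(r"^/subscriptions/[0-9a-fA-F-]{36}/?$")


def broad_subscription_assignments(assignments):
    """Return (principalId, role, scope) for Owner/Contributor at subscription scope."""
    findings = []
    for item in assignments:
        role_guid = item["roleDefinitionId"].rstrip("/").rsplit("/", 1)[-1].lower()
        role = BROAD_ROLES.get(role_guid)
        if role and SUBSCRIPTION_SCOPE.match(item["scope"]):
            findings.append((item["principalId"], role, item["scope"]))
    return findings


# Hypothetical records for illustration only.
SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"
ROLE_PATH = SUB + "/providers/Microsoft.Authorization/roleDefinitions/"
sample_assignments = [
    {"principalId": "dev-1", "scope": SUB,
     "roleDefinitionId": ROLE_PATH + "8e3af657-a8ff-443c-a75c-2fe8c4bcb635"},
    {"principalId": "dev-2", "scope": SUB + "/resourceGroups/app-rg",
     "roleDefinitionId": ROLE_PATH + "b24988ac-6180-42a0-ab88-20f7382dd24c"},
    {"principalId": "dev-3", "scope": SUB,
     "roleDefinitionId": ROLE_PATH + "acdd72a7-3385-48ef-bd42-f606fba81ae7"},
]

for principal, role, scope in broad_subscription_assignments(sample_assignments):
    print(f"{principal} holds {role} at {scope}")
```

Management-group scopes are even broader than subscriptions and are not covered by this sketch.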

The Fix

Audit RBAC regularly
Remove unnecessary subscription-level roles
Create custom roles
Use Microsoft Entra Privileged Identity Management (PIM)
Enable access reviews

6. Missing Activity Log Alerts

The Problem

Critical changes happen with no alerts, leaving teams blind.

How to Detect

Azure Monitor → Alerts
Filter by Activity Log

The Fix

Create alerts for:

NSG rule changes
VM deletion/creation
RBAC changes
Defender for Cloud policy updates
Key Vault access changes
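Each item in the list above corresponds to one or more Activity Log operation names. The sketch below shows a filter that could sit in a webhook or log-processing step. The mapping is illustrative, not exhaustive, and the event shape (a dict with an `operationName` string) is a hypothetical simplification; confirm the operation names against your own Activity Log before relying on them.

```python
# Illustrative mapping from resource provider operations to alert categories.
WATCHED_OPERATIONS = {
    "microsoft.network/networksecuritygroups/write": "NSG rule change",
    "microsoft.network/networksecuritygroups/securityrules/write": "NSG rule change",
    "microsoft.network/networksecuritygroups/securityrules/delete": "NSG rule change",
    "microsoft.compute/virtualmachines/write": "VM create/update",
    "microsoft.compute/virtualmachines/delete": "VM deletion",
    "microsoft.authorization/roleassignments/write": "RBAC change",
    "microsoft.authorization/roleassignments/delete": "RBAC change",
    "microsoft.security/policies/write": "Defender for Cloud policy update",
    "microsoft.keyvault/vaults/write": "Key Vault access change",
}


def classify(event):
    """Return the alert category for an event, or None if it is not watched."""
    operation = str(event.get("operationName", "")).lower()
    return WATCHED_OPERATIONS.get(operation)


# Hypothetical events for illustration only.
sample_events = [
    {"operationName": "Microsoft.Compute/virtualMachines/delete"},
    {"operationName": "Microsoft.Authorization/roleAssignments/write"},
    {"operationName": "Microsoft.Storage/storageAccounts/read"},
]

for event in sample_events:
    category = classify(event)
    if category:
        print(category, "->", event["operationName"])
```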

7. Exposed Management and Database Ports

The Problem

Beyond SSH/RDP, I often find database ports and admin interfaces exposed to the internet.

How to Detect

AzureNetworkAnalytics_CL
| where SubType_s == "FlowLog"
| where DestPort_d in (1433, 3306, 5432, 8080, 8443, 27017, 6379)
// Keep only sources outside RFC 1918 space (exact ranges, not a "172." prefix,
// which would also drop public addresses outside 172.16.0.0/12)
| where not(ipv4_is_private(SrcIP_s))
| summarize Count = count() by DestPort_d, DestIP_s
| order by Count desc

The Fix

Close non-essential ports
Use Private Endpoints / Private Link
Deploy Application Gateway with WAF
Use Azure Bastion
Follow Defender for Cloud recommendations

Conclusion

Azure security is not a one-time setup. It requires continuous monitoring, audits, and proactive remediation.

Most of these gaps exist not because of missing tools, but because of missing guardrails. Azure Policy, Defender for Cloud, and Infrastructure as Code can prevent most of them before they reach production.

Quick Security Checklist (≈ 85 Minutes)