7 Azure Security Gaps I Have Seen in Production (and How to Fix Them)
Note: The issues described here are from real production environments, security audits, and incident reviews — not lab or demo setups.
Over the past few years, while managing 100+ Azure virtual machines in production environments running critical PeopleSoft and Oracle workloads, I have repeatedly encountered the same security gaps across different organizations.
These are not theoretical vulnerabilities from security blogs. They are real-world issues that surface during security audits, incident response investigations, and routine infrastructure reviews.
In this article, I will walk through seven critical Azure security gaps I have personally identified in production, along with:
Real-world examples
Detection methods using Azure-native tools
Practical, step-by-step remediation strategies
Whether you manage a handful of VMs or hundreds, addressing these gaps will significantly improve your Azure security posture before they turn into incidents.
1. Inadequate Network Security Group (NSG) Rules
The Problem
Network Security Groups (NSGs) are the primary network-level control for Azure VMs. Yet I often find production environments with overly permissive inbound rules such as:
Allow * from 0.0.0.0/0
SSH (22) or RDP (3389) exposed to the public internet
Real-world example:
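During a security audit, I found 12 production VMs with SSH open to the internet. NSG flow logs showed 50,000+ inbound connection attempts on port 22 in 30 days. Flow logs record connections rather than authentication results, so failed logins have to be confirmed from the VM's own auth logs.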
How to Detect:
Azure Portal
Network Security Groups → Inbound rules
Look for:
- Source: Any or 0.0.0.0/0
- Ports: 22, 3389, 1433, 5432
Review Effective security rules per VM
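The portal checks above can also be scripted against an exported rule list. The following is a minimal sketch, assuming the rules were exported as JSON (for example with `az network nsg rule list`) and carry the ARM security-rule property names `direction`, `access`, `sourceAddressPrefix`, and `destinationPortRange`. The helper names and the sample rules are hypothetical, and rules that use the plural `sourceAddressPrefixes` or `destinationPortRanges` lists are not handled here.

```python
# Sketch: flag inbound allow rules that expose sensitive ports to any source.
# Property names follow the ARM security-rule schema; verify against your own export.

SENSITIVE_PORTS = {22, 3389, 1433, 5432}
OPEN_SOURCES = {"*", "0.0.0.0/0", "any", "internet"}


def port_range_contains(port_range: str, port: int) -> bool:
    """Return True if an NSG port expression ('*', '22', '1000-2000') covers port."""
    if port_range == "*":
        return True
    if "-" in port_range:
        low, high = port_range.split("-", 1)
        return int(low) <= port <= int(high)
    return int(port_range) == port


def find_risky_rules(rules):
    """Return (rule name, exposed sensitive ports) for permissive inbound rules."""
    findings = []
    for rule in rules:
        if (rule.get("direction") or "").lower() != "inbound":
            continue
        if (rule.get("access") or "").lower() != "allow":
            continue
        if (rule.get("sourceAddressPrefix") or "").lower() not in OPEN_SOURCES:
            continue
        port_range = rule.get("destinationPortRange") or ""
        if not port_range:
            continue  # plural port lists are out of scope for this sketch
        exposed = sorted(p for p in SENSITIVE_PORTS if port_range_contains(port_range, p))
        if exposed:
            findings.append((rule.get("name", "<unnamed>"), exposed))
    return findings


# Hypothetical sample data for illustration only
sample_rules = [
    {"name": "allow-ssh-any", "direction": "Inbound", "access": "Allow",
     "sourceAddressPrefix": "*", "destinationPortRange": "22"},
    {"name": "allow-all", "direction": "Inbound", "access": "Allow",
     "sourceAddressPrefix": "0.0.0.0/0", "destinationPortRange": "*"},
    {"name": "allow-rdp-vpn", "direction": "Inbound", "access": "Allow",
     "sourceAddressPrefix": "203.0.113.0/24", "destinationPortRange": "3389"},
]

for name, ports in find_risky_rules(sample_rules):
    print(f"{name}: exposes {ports}")
```

The third rule is not reported because its source is restricted to a specific range, which is the state the fixes below aim for.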
KQL (NSG Flow Logs)
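AzureNetworkAnalytics_CL
| where SubType_s == "FlowLog"
// ipv4_is_private() matches only 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16;
// a startswith "172." filter would also discard public 172.x.x.x sources.
// Depending on your Traffic Analytics schema, public sources may be recorded
// in SrcPublicIPs_s rather than SrcIP_s, so check which field is populated.
| where isnotempty(SrcIP_s) and not(ipv4_is_private(SrcIP_s))
| where DestPort_d in (22, 3389)
| summarize Attempts = count() by SrcIP_s, DestPort_d
| order by Attempts desc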
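When separating internal from external sources, keep in mind that only 172.16.0.0/12 is private, so matching on the text prefix "172." treats many public addresses as internal. The following sketch uses Python's standard `ipaddress` module to show the difference; `is_rfc1918` is a hypothetical helper name.

```python
import ipaddress

# The three RFC 1918 private ranges
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


for candidate in ["10.1.2.3", "172.16.5.4", "172.217.0.46", "192.168.1.10", "8.8.8.8"]:
    label = "private" if is_rfc1918(candidate) else "public"
    print(f"{candidate}: {label}")
```

Here 172.217.0.46 is classified as public even though it starts with "172.", which a plain prefix filter would miss.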
The Fix
Immediate
Remove 0.0.0.0/0 access for SSH/RDP
Whitelist trusted office or VPN IPs
Use Azure Bastion
Enable Just-In-Time (JIT) VM access via Microsoft Defender for Cloud
Long-term
Use Application Security Groups (ASGs)
Implement Azure Firewall
Enforce NSG standards using Azure Policy
2. Missing Azure Backup Policies
The Problem
Many teams assume Azure automatically backs up VMs. It doesn’t.
I have seen production workloads with no Recovery Services vault and no backups.
How to Detect
Azure Portal
Recovery Services vaults → Backup items
Compare against VM inventory
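Azure Resource Graph
Resources
| where type =~ "microsoft.compute/virtualmachines"
| project name, resourceGroup, vmId = tolower(id)
| join kind=leftouter (
    RecoveryServicesResources
    | where type contains "protectedItems"
    // Match on the full resource ID so same-named VMs in different resource groups do not collide
    | extend protectedId = tolower(tostring(properties.sourceResourceId))
    | project protectedId
) on $left.vmId == $right.protectedId
// Strings are never null in KQL, so isnull() on a string column never matches; use isempty()
| where isempty(protectedId)
| project name, resourceGroup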
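The comparison against the VM inventory is essentially a set difference. Here is a minimal sketch of that logic, assuming you have two exported lists of resource IDs (VMs and backup source IDs); the function name `unprotected_vms` and the placeholder IDs are hypothetical.

```python
def unprotected_vms(vm_ids, protected_source_ids):
    """Return VM resource IDs that have no matching backup item.

    Azure resource IDs are case-insensitive, so both sides are compared in lowercase.
    """
    protected = {resource_id.lower() for resource_id in protected_source_ids}
    return sorted(vm_id for vm_id in vm_ids if vm_id.lower() not in protected)


# Placeholder subscription ID and names, for illustration only
PREFIX = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/prod-rg"
vm_inventory = [
    f"{PREFIX}/providers/Microsoft.Compute/virtualMachines/app-vm-01",
    f"{PREFIX}/providers/Microsoft.Compute/virtualMachines/db-vm-01",
]
backup_sources = [
    # Same VM as app-vm-01, but with different casing in the exported ID
    f"{PREFIX}/providers/microsoft.compute/virtualmachines/APP-VM-01",
]

for vm_id in unprotected_vms(vm_inventory, backup_sources):
    print("No backup item found for:", vm_id)
```

Comparing full resource IDs rather than bare VM names avoids false matches when two resource groups contain VMs with the same name.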
The Fix
Create Recovery Services vaults per region
Define policies aligned with RPO/RTO
Enable Soft Delete
Test restores quarterly
Configure backup alerts via Azure Monitor
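Aligning policies with an RPO can be spot-checked by comparing the age of each VM's newest recovery point with the target. The sketch below assumes you have already collected the last successful backup time per VM (for example from a backup report); the function name `rpo_breaches`, the VM names, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def rpo_breaches(last_backup_by_vm, rpo, now):
    """Return VM names whose newest recovery point is missing or older than the RPO."""
    return sorted(
        vm for vm, last_backup in last_backup_by_vm.items()
        if last_backup is None or now - last_backup > rpo
    )


# Fixed reference time so the example is reproducible
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
last_backups = {
    "app-vm-01": datetime(2024, 1, 10, 2, 0, tzinfo=timezone.utc),  # 10 hours old
    "db-vm-01": datetime(2024, 1, 8, 2, 0, tzinfo=timezone.utc),    # 58 hours old
    "batch-vm-01": None,                                            # never backed up
}

print(rpo_breaches(last_backups, rpo=timedelta(hours=24), now=now))
```

With a 24-hour RPO, the VM that was never backed up and the one with a 58-hour-old recovery point are both reported.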
3. Weak Authentication Methods
The Problem
Password-based SSH and RDP access is still common. I have seen environments using the same password across multiple admin accounts, creating a single point of failure.
How to Detect
Linux
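grep -i PasswordAuthentication /etc/ssh/sshd_config
grep -i PubkeyAuthentication /etc/ssh/sshd_config
# Effective values, including drop-in files under /etc/ssh/sshd_config.d/
sudo sshd -T | grep -Ei 'passwordauthentication|pubkeyauthentication'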
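A plain grep also matches commented-out lines and does not show which value wins when a keyword appears more than once. sshd uses the first value it obtains for each keyword, and the sketch below mimics that rule for the global section of a config file. It is a simplification under stated assumptions: `Include` directives and `Match` blocks are not evaluated, and `effective_setting` is a hypothetical helper name.

```python
def effective_setting(config_text: str, keyword: str, default: str) -> str:
    """Return the first global value for keyword, or default if it is not set.

    Simplified: stops at the first Match block and ignores Include directives.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Keyword and value may be separated by whitespace or a single '='
        parts = line.replace("=", " ", 1).split()
        if parts[0].lower() == "match":
            break  # conditional sections are out of scope for this sketch
        if parts[0].lower() == keyword.lower() and len(parts) > 1:
            return parts[1].lower()
    return default


sample_config = """
# PasswordAuthentication no
PasswordAuthentication yes
PubkeyAuthentication yes
PasswordAuthentication no
"""

# The commented line is ignored and the first active value wins
print(effective_setting(sample_config, "PasswordAuthentication", default="yes"))
```

In this sample the later `PasswordAuthentication no` has no effect because an earlier active line already set the keyword, which is easy to overlook when reading grep output.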
Windows
Review Microsoft Entra Conditional Access policies (formerly Azure AD Conditional Access)
Check Microsoft Entra ID sign-in logs
The Fix
Linux
Disable passwords: PasswordAuthentication no
Use SSH key-based authentication
Store keys in Azure Key Vault
Enable Microsoft Entra ID (formerly Azure AD) login for Linux VMs
Windows
Enforce MFA
Reduce local admin usage
Use Privileged Access Workstations (PAWs)
4. Unencrypted Data in Transit
The Problem
HTTP endpoints, databases without TLS, and FTP transfers still exist in production — even for sensitive data.
How to Detect
Application Gateway / WAF Logs
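AzureDiagnostics
| where ResourceType == "APPLICATIONGATEWAYS"
| where Category == "ApplicationGatewayAccessLog"
// requestUri_s normally holds only the request path, not the scheme,
// so filter on the TLS flag instead (verify this field for your gateway SKU)
| where sslEnabled_s == "off"
| summarize Count = count() by host_s, requestUri_s, clientIP_s
| order by Count desc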
The Fix
Enforce HTTPS end-to-end
Redirect HTTP → HTTPS
Enable TLS 1.2+ for databases
Replace FTP with SFTP/FTPS
Manage certificates via Azure Key Vault
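On the client side, a TLS 1.2+ floor can be enforced in code as well as on the server. The following is a minimal sketch using Python's standard `ssl` module; the function name `make_tls12_client_context` is hypothetical, and whether a given database driver accepts an `SSLContext` is something to confirm in that driver's documentation.

```python
import ssl


def make_tls12_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context that refuses anything older than TLS 1.2."""
    # create_default_context() enables certificate and hostname verification
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_tls12_client_context()
print("Minimum TLS version:", context.minimum_version.name)
print("Hostname verification:", context.check_hostname)
```

A context like this can be passed to standard-library clients such as `http.client.HTTPSConnection(host, context=context)`, so a server that only offers older protocol versions fails the handshake instead of silently downgrading.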
5. Improper Role-Based Access Control (RBAC)
The Problem
Developers often have Contributor or Owner access at subscription level, violating least privilege.
How to Detect
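Azure Resource Graph
authorizationresources
| where type == "microsoft.authorization/roleassignments"
| extend roleDefinitionId = tolower(tostring(properties.roleDefinitionId)),
         principalId = tostring(properties.principalId),
         scope = tostring(properties.scope)
// roleDefinitionId is a GUID path, so match the built-in role IDs rather than role names
| where roleDefinitionId endswith "8e3af657-a8ff-443c-a75c-2fe8c4bcb635" // Owner
    or roleDefinitionId endswith "b24988ac-6180-42a0-ab88-20f7382dd24c" // Contributor
| project principalId, scope, roleDefinitionId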
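Once role assignments are exported, narrowing them to Owner or Contributor at subscription scope is a small filtering step. The sketch below assumes each assignment is a dictionary with `principalId`, `roleDefinitionId`, and `scope` keys; the function name `broad_assignments` and the sample IDs are hypothetical, while the two GUIDs are the built-in Owner and Contributor role definition IDs.

```python
import re

# Built-in role definition GUIDs
BUILTIN_ROLE_IDS = {
    "8e3af657-a8ff-443c-a75c-2fe8c4bcb635": "Owner",
    "b24988ac-6180-42a0-ab88-20f7382dd24c": "Contributor",
}

# A subscription-level scope has nothing after the subscription GUID
SUBSCRIPTION_SCOPE = re.compile(r"^/subscriptions/[0-9a-fA-F-]{36}$")


def broad_assignments(assignments):
    """Return (principalId, role name, scope) for Owner/Contributor at subscription scope."""
    findings = []
    for assignment in assignments:
        role_guid = assignment["roleDefinitionId"].rstrip("/").rsplit("/", 1)[-1].lower()
        role_name = BUILTIN_ROLE_IDS.get(role_guid)
        if role_name and SUBSCRIPTION_SCOPE.match(assignment["scope"]):
            findings.append((assignment["principalId"], role_name, assignment["scope"]))
    return findings


# Placeholder IDs for illustration only
SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"
ROLE_PATH = SUB + "/providers/Microsoft.Authorization/roleDefinitions/"
sample_assignments = [
    {"principalId": "dev-user-1", "scope": SUB,
     "roleDefinitionId": ROLE_PATH + "b24988ac-6180-42a0-ab88-20f7382dd24c"},
    {"principalId": "dev-user-2", "scope": SUB + "/resourceGroups/app-rg",
     "roleDefinitionId": ROLE_PATH + "8e3af657-a8ff-443c-a75c-2fe8c4bcb635"},
]

for principal, role, scope in broad_assignments(sample_assignments):
    print(f"{principal} has {role} on {scope}")
```

Only the first assignment is reported: the second is Owner, but scoped to a single resource group, which is closer to least privilege.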
The Fix
Audit RBAC regularly
Remove unnecessary subscription-level roles
Create custom roles
Use Microsoft Entra Privileged Identity Management (PIM, formerly Azure AD PIM) for just-in-time role activation
Enable access reviews
6. Missing Activity Log Alerts
The Problem
Critical changes happen with no alerts, leaving teams blind.
How to Detect
Azure Monitor → Alerts
Filter by Activity Log
The Fix
Create alerts for:
NSG rule changes
VM deletion/creation
RBAC changes
Defender for Cloud policy updates
Key Vault access changes
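Each item in this list corresponds to one or more Activity Log operation names. The mapping below is a sketch of how exported Activity Log entries could be triaged; the operation names are the resource provider operations I would expect for these changes, but treat them as assumptions and confirm them against entries in your own Activity Log. The function name `alert_category` is hypothetical.

```python
# Operation names are compared in lowercase because casing varies between log sources.
# Treat this mapping as a starting point and verify it against your own Activity Log.
ALERT_OPERATIONS = {
    "microsoft.network/networksecuritygroups/securityrules/write": "NSG rule change",
    "microsoft.network/networksecuritygroups/securityrules/delete": "NSG rule change",
    "microsoft.compute/virtualmachines/write": "VM creation or update",
    "microsoft.compute/virtualmachines/delete": "VM deletion",
    "microsoft.authorization/roleassignments/write": "RBAC change",
    "microsoft.authorization/roleassignments/delete": "RBAC change",
    "microsoft.security/policies/write": "Defender for Cloud policy update",
    "microsoft.keyvault/vaults/write": "Key Vault configuration change",
}


def alert_category(operation_name: str):
    """Return the alert category for an operation name, or None if it is not watched."""
    return ALERT_OPERATIONS.get(operation_name.strip().lower())


# Hypothetical sample of operation names taken from exported log entries
sample_operations = [
    "Microsoft.Network/networkSecurityGroups/securityRules/write",
    "Microsoft.Compute/virtualMachines/read",
    "MICROSOFT.AUTHORIZATION/ROLEASSIGNMENTS/DELETE",
]

for operation in sample_operations:
    category = alert_category(operation)
    if category:
        print(f"ALERT [{category}]: {operation}")
```

Read operations are deliberately left out, since alerting on them would drown the signal from the write and delete operations that actually change the environment.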
7. Exposed Management and Database Ports
The Problem
Beyond SSH/RDP, I often find database ports and admin interfaces exposed to the internet.
How to Detect
AzureNetworkAnalytics_CL
| where SubType_s == "FlowLog"
| where DestPort_d in (1433, 3306, 5432, 8080, 8443, 27017, 6379)
// Only 172.16.0.0/12 is private, so use ipv4_is_private() instead of a "172." prefix match
| where isnotempty(SrcIP_s) and not(ipv4_is_private(SrcIP_s))
| summarize Count = count() by DestPort_d, DestIP_s
| order by Count desc
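The same aggregation can be run offline on exported flow records. This is a minimal sketch, assuming each record has been reduced to a `(source IP, destination IP, destination port)` tuple; the function name `exposed_port_hits` and the sample flows are hypothetical. Note that Python's `is_private` is broader than RFC 1918 (it also covers loopback, link-local, and documentation ranges).

```python
import ipaddress
from collections import Counter

# Ports from the query above, with the services commonly found on them
WATCHED_PORTS = {
    1433: "SQL Server", 3306: "MySQL", 5432: "PostgreSQL",
    8080: "HTTP (alt)", 8443: "HTTPS (alt)", 27017: "MongoDB", 6379: "Redis",
}


def exposed_port_hits(flows):
    """Count flows from non-private sources to watched ports, most frequent first."""
    hits = Counter()
    for src_ip, dest_ip, dest_port in flows:
        if dest_port not in WATCHED_PORTS:
            continue
        if ipaddress.ip_address(src_ip).is_private:
            continue  # internal traffic is not an exposure finding
        hits[(dest_port, dest_ip)] += 1
    return hits.most_common()


# Hypothetical flow tuples for illustration only
sample_flows = [
    ("8.8.8.8", "10.0.0.4", 1433),
    ("1.1.1.1", "10.0.0.4", 1433),
    ("10.0.0.9", "10.0.0.4", 1433),   # internal source, ignored
    ("8.8.8.8", "10.0.0.5", 6379),
    ("8.8.8.8", "10.0.0.5", 443),     # port not in the watch list
]

for (port, dest_ip), count in exposed_port_hits(sample_flows):
    print(f"{dest_ip}:{port} ({WATCHED_PORTS[port]}) reached from public sources {count} time(s)")
```

Any destination that appears in this output is reachable from the internet on a database or admin port and is a candidate for the fixes below.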
The Fix
Close non-essential ports
Use Private Endpoints / Private Link
Deploy Application Gateway with WAF
Use Azure Bastion
Follow Defender for Cloud recommendations
Conclusion
Azure security is not a one-time setup. It requires continuous monitoring, audits, and proactive remediation.
Most of these gaps exist not because of missing tools, but because of missing guardrails. Azure Policy, Defender for Cloud, and Infrastructure as Code can prevent most of them before they reach production.
Quick Security Checklist (≈ 85 Minutes)
Audit NSG rules for 0.0.0.0/0 (5 min)
Verify VM backups (10 min)
Review RBAC assignments (15 min)
Check SSH password authentication (10 min)
Enable activity log alerts (20 min)
Scan for exposed DB ports (10 min)
Identify HTTP traffic (15 min)
Let’s Discuss
Have you seen similar Azure security issues in production?
What guardrails do you use to prevent them?