Fog in the sky: logging & visibility issues in the cloud

Security logging and monitoring failures have appeared in the OWASP Top Ten in recent years, and for good reason: you can't catch the bad guys if you can't see them. In a cloud landscape that is evolving this quickly, these failures should be a top concern for cloud providers and cloud customers alike. Yet despite the consequences, the logging & monitoring solutions from the major cloud providers remain imperfect. Read on for a few issues I've found in the logging & monitoring solutions from GCP, Azure, and AWS. Whether it's an inconvenient logging structure or a significant gap in what actually gets logged, being aware of these issues may help you avoid disaster and stay ahead of adversaries.

Background

From personal experience, visibility issues are among the most frustrating problems to deal with as a security team. The thrill of chasing down a red team quickly fades once you discover that security logging was insufficient, or switched off entirely, right when it was needed most. The majority of the time, logging sources are simply disabled to cut costs. But when logging is enabled and the detections are solid, yet the quality of the logs still isn't good enough, that's when it really feels like our hands are tied as security professionals. The cloud can be a big opportunity for attackers, so we should give ourselves every possible chance to detect threat actors, and that means quality logging. Let's discuss some examples I've found where the quality of logs in the cloud could cause problems.

GCP

Our first problem lies with GCP, specifically with the SetIamPolicy API call. This API updates the access control policy on a specified resource, which makes it a common operation in most environments. However, it can also play a part in an attacker's attempt to escalate privileges, so it's important to log these operations for security monitoring & incident response. Inside the log for a SetIamPolicy API call is the "bindingdelta" field (as seen in the docs). The bindingdelta field should be of interest to security teams because it shows exactly what changes are being attempted. Perhaps a security alert should be generated when the changes are suspicious…

Now that we understand why the SetIamPolicy log and the "bindingdelta" field contained within it are valuable for security monitoring, here's the concerning part: the "bindingdelta" field is not present for failed SetIamPolicy API calls. That means detections keyed off this field cannot catch suspicious failed calls. Imagine an attacker gets access to a target environment and attempts to change various IAM policies. If the attacker is acting under the guise of a compromised identity, it may not be obvious to defenders that something malicious is going on. Most environments see plenty of failed SetIamPolicy calls, and without the attempted changes being logged in the bindingdelta field, there is little to observe or scrutinize about any given event. Without visibility into what changes were actually attempted, defenders are left in the dark.
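
To make this concrete, here's a minimal sketch of a compensating detection built on the google-cloud-logging client: since a failed call won't carry the binding delta, about all you can alert on is who attempted the change and against which resource. The filter string and field paths below are my assumptions based on the admin activity audit log schema, so verify them against your own environment.

```python
# Minimal sketch: surface failed SetIamPolicy calls even though they lack the
# binding delta. Assumes the google-cloud-logging client library; filter and
# field paths are assumptions to verify in your own environment.
from google.cloud import logging as gcp_logging

client = gcp_logging.Client(project="my-project")  # hypothetical project ID

LOG_FILTER = (
    'protoPayload.methodName="SetIamPolicy" '
    'AND logName:"cloudaudit.googleapis.com%2Factivity"'
)

for entry in client.list_entries(filter_=LOG_FILTER, page_size=100):
    payload = entry.to_api_repr().get("protoPayload", {})
    status_code = payload.get("status", {}).get("code", 0)
    caller = payload.get("authenticationInfo", {}).get("principalEmail", "unknown")
    deltas = (
        payload.get("serviceData", {})
        .get("policyDelta", {})
        .get("bindingDeltas", [])
    )
    if status_code != 0 and not deltas:
        # Failed call: the attempted changes are not in the log, so all we can
        # alert on is who tried, against which resource, and how often.
        print(f"Failed SetIamPolicy by {caller} on {payload.get('resourceName')}")
```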

Azure

Network security groups (NSGs) are an important part of network security in the cloud. They act as a firewall between VMs and other resources in a given virtual network. Needless to say, they are valuable when it comes to security monitoring.

It can be tricky to know which logs to pay attention to, because the Azure Activity Log generates several events from a single action. For example, when an NSG is edited, you can find multiple "Microsoft.Network/networkSecurityGroups/write" events in the Activity Log. All of these events pertain to the same action but have a different status (Started, Accepted, Succeeded). From a security perspective, you might be especially interested in the Succeeded events, since those show which changes actually went through rather than failing.

However, I found that the actual changes to the network security group rules are not consistently recorded in the "Microsoft.Network/networkSecurityGroups/write" events. The contents of the rule changes most often appear in the Succeeded event, but sometimes they can only be found in the Started or Accepted events. This creates a security gap if your detection looks for particular port ranges and/or IPs being introduced only in events with the status Succeeded. So if you are using a SIEM or another non-Microsoft tool, be careful when creating custom rules around Azure NSG changes; an alarm for sensitive NSG changes may be more reliable as a custom query in Azure Log Analytics.
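
One way to work around the inconsistency is to correlate the Started / Accepted / Succeeded events for a single write operation and take the rule details from whichever event happens to contain them. Here's a rough sketch that assumes you've already exported Activity Log records as JSON dicts; the field names (correlationId, operationName, status, properties) follow the Activity Log event schema, so double-check them against your own export.

```python
# Sketch: group NSG write events by correlationId and pull rule details from
# whichever status (Started / Accepted / Succeeded) happens to carry them.
# Assumes `events` is a list of Activity Log records already parsed from JSON.
from collections import defaultdict

NSG_WRITE = "Microsoft.Network/networkSecurityGroups/write"

def _field(event, name):
    # Fields like operationName and status may be plain strings or
    # {"value": ..., "localizedValue": ...} objects depending on the export.
    value = event.get(name)
    return value.get("value") if isinstance(value, dict) else value

def collect_nsg_changes(events):
    """Return the rule-change details for NSG writes that ultimately succeeded."""
    by_correlation = defaultdict(list)
    for event in events:
        if _field(event, "operationName") == NSG_WRITE:
            by_correlation[event.get("correlationId")].append(event)

    changes = []
    for correlation_id, group in by_correlation.items():
        succeeded = any(_field(e, "status") == "Succeeded" for e in group)
        # The rule contents may live on the Started or Accepted event rather
        # than the Succeeded one, so scan the whole group instead of filtering
        # by status first.
        details = next((e["properties"] for e in group if e.get("properties")), None)
        if succeeded and details:
            changes.append({"correlationId": correlation_id, "details": details})
    return changes
```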

AWS

For our last concern, we'll chat about network ACLs in AWS. While this might not be a massive issue, it is mildly interesting that there's no way to tell via CloudTrail logs that a network ACL was disassociated from a subnet. Fun fact: disassociating a network ACL from a subnet causes the default network ACL to be attached to that subnet, and the default network ACL allows all inbound & outbound traffic. A change to the associated network ACL could be a sign that an attacker in the environment is drawing closer to a target resource in the VPC (virtual private cloud). More research is needed to determine how harmful this could be. This logging gap may not be destructive on its own, but a skillful threat actor with knowledge of multiple monitoring weaknesses is more likely to reach their objectives.
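
Since the association change itself may not show up in CloudTrail, one compensating control is to periodically poll the VPC configuration and flag any subnet that ends up associated with the default (allow-all) network ACL. Here's a rough boto3 sketch; credentials, region, and alerting are placeholders.

```python
# Sketch: flag subnets associated with the default network ACL, since a
# disassociation that falls back to the default may not be visible in CloudTrail.
# Assumes boto3 credentials/region are configured; alerting is a placeholder.
import boto3

ec2 = boto3.client("ec2")

def subnets_on_default_nacl():
    flagged = []
    paginator = ec2.get_paginator("describe_network_acls")
    for page in paginator.paginate():
        for nacl in page["NetworkAcls"]:
            if not nacl.get("IsDefault"):
                continue
            for assoc in nacl.get("Associations", []):
                # Default NACLs allow all inbound and outbound traffic, so any
                # subnet associated with one deserves a closer look.
                flagged.append(assoc["SubnetId"])
    return flagged

if __name__ == "__main__":
    for subnet_id in subnets_on_default_nacl():
        print(f"Subnet {subnet_id} is using the default network ACL")
```

In practice you'd want to diff the results against a known-good baseline, since plenty of environments intentionally leave some subnets on the default ACL.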

In conclusion

High-quality logging and monitoring gives defenders visibility into attacker activity and confidence that detection logic will operate as intended. Hopefully the quality of logging continues to improve. I won't share too much of my own opinion here, but I believe there are strengths and weaknesses among all of the major cloud providers, and the quality of logging & logging documentation is a big item I like to score them on.

Tips

  • Be aware of the various monitoring issues with your cloud provider. Understand the reach of your query + detection logic. Embrace a defense in depth approach, or even consider having multiple detections for actions on sensitive resources.
  • Be careful when creating detection logic for log schemas that are not fully consistent. When possible, check the logging output of simulated attacker behavior in a lab environment to verify that the detection & logging will behave as expected (see the sketch below).
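
For that second tip, even a tiny check helps: simulate the behavior in a lab project or subscription, export the resulting log entries, and assert that the fields your detection depends on are actually present. Here's a generic sketch with placeholder field paths.

```python
# Sketch: after simulating a behavior in a lab environment, verify that the
# exported log entries actually contain the fields a detection keys on.
# The field paths and export format here are placeholders.
def missing_fields(entry, required_paths):
    """Return the dotted field paths absent from a log entry dict."""
    missing = []
    for path in required_paths:
        node = entry
        for key in path.split("."):
            if isinstance(node, dict) and key in node:
                node = node[key]
            else:
                missing.append(path)
                break
    return missing

# Example: a GCP SetIamPolicy detection expects the binding delta to exist.
required = ["protoPayload.serviceData.policyDelta.bindingDeltas"]
exported_entries = []  # load entries exported from your lab simulation here

for log_entry in exported_entries:
    gaps = missing_fields(log_entry, required)
    if gaps:
        print(f"Detection would miss this event; absent fields: {gaps}")
```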

May the future yield a highly visible cloud, where logging is consistent, documentation is there to explain the nuances, and defenders are fully enabled to do their job.
