Jana Hockenberger
IAM Ghost Roles: The Forgotten Cleanup and a Custom Toolkit to Fix It

Hunting Down Orphaned Identities

IAM housekeeping is a topic that is often overlooked when running several AWS accounts. I'm pretty sure most of us have, under time pressure, granted a policy too many permissions just to "make sure" it works, and then forgotten about it. At least I did.

But orphaned roles and policies are also often lying around in our accounts: they were deployed automatically when another resource was created, and that resource no longer exists. Lambda functions are a typical example: you create a function, an IAM role and policy get created along with it, but when the function is deleted, the IAM part stays in the account.

Sometimes you may also prefer creating a new role over reusing an existing one to keep the use case more transparent.

All these situations are realistic, so you should check on a regular basis whether certain roles are still in use or whether permissions are over-permissive.

The High Cost of AWS Native Ghosthunting

AWS offers the "Unused Access" feature for this as part of IAM Access Analyzer. The solution is straightforward and provides you with a list of findings for the detected resources.

The only downside is the cost, which is… a pretty big downside.

When using the unused access functionality, AWS charges you $0.20 per role or user per month.

Imagine you have 100 roles and 10 users in each of 10 AWS accounts. With unused access enabled, you would have the following costs per month:

$0.20 per IAM role or user analyzed per month

100 roles x $0.20 = $20

10 users x $0.20 = $2

$22 x 10 accounts = $220 per month
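The calculation above can be sketched as a quick estimate. The function name is my own illustration; the $0.20 per-identity price is the one quoted from the AWS pricing here:

```python
# Rough cost estimate for IAM Access Analyzer "unused access" analysis.
PRICE_PER_IDENTITY = 0.20  # USD per IAM role or user analyzed, per month

def monthly_unused_access_cost(roles_per_account, users_per_account, accounts):
    """Estimated monthly cost across all accounts."""
    per_account = (roles_per_account + users_per_account) * PRICE_PER_IDENTITY
    return per_account * accounts

print(round(monthly_unused_access_cost(100, 10, 10), 2))  # 220.0
```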

This number can easily grow in evolving environments, making it hard to predict the actual costs. AWS states the following on the cost topic:

“Periodically review and remove unnecessary IAM roles and users. Because IAM Access Analyzer unused access analysis charges are based on the number of roles and users analyzed, removing unused roles and users will help reduce unused access findings cost. This is also a security best practice for IAM.”

Since I didn't want to spend more time adjusting the Analyzer than cleaning up the actual resources, and didn't want to put myself under self-induced pressure trying to keep up with the findings, I decided to build my own solution, which comes at almost zero cost.

Ghostbusting on a Budget

The solution consists of two parts: one Lambda function to identify orphaned 'ghost' resources and another to visualize the findings in a centralized CloudWatch dashboard. The analysis Lambda lets you define a threshold setting to decide when a role has officially turned into a 'ghost'. You also have the possibility to set an exclusion tag for specific resources which shouldn't be checked.

The process starts by iterating through all accounts in the organization. With the help of a cross-account role, the Lambda accesses each account and reads its roles and policies.
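The cross-account access itself is the standard STS assume-role pattern. A minimal sketch, assuming a role named `ghost-analyzer-role` exists in every member account (the actual role name is a deployment parameter):

```python
def member_role_arn(account_id, role_name="ghost-analyzer-role"):
    """Build the ARN of the cross-account role in a member account."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"

def iam_client_for(account_id):
    """Return an IAM client that operates in the given member account."""
    import boto3  # deferred so the pure ARN helper stays importable without boto3

    creds = boto3.client("sts").assume_role(
        RoleArn=member_role_arn(account_id),
        RoleSessionName="ghost-analysis",
    )["Credentials"]
    return boto3.client(
        "iam",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
```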
During the analysis, the following steps are performed:

  • Calculate days since last used (or since creation if never used)
  • Flag as unused if threshold exceeded
  • Analyze managed policies for overly broad access (FullAccess, AdminAccess)
  • Analyze customer managed policies and inline policies for wildcard permissions
  • Analyze trust policy for risky principals (*, root accounts)
from datetime import datetime

def analyze_single_role(iam_client, role, unused_threshold):
    """Analyze a single IAM role including policies"""
    role_name = role["RoleName"]

    try:
        role_details = iam_client.get_role(RoleName=role_name)["Role"]

        # Check last used date
        is_unused = False
        last_used = None
        days_unused = None

        if "RoleLastUsed" in role_details and "LastUsedDate" in role_details["RoleLastUsed"]:
            last_used_date = role_details["RoleLastUsed"]["LastUsedDate"]
            last_used = last_used_date.isoformat()
            days_unused = (datetime.now(last_used_date.tzinfo) - last_used_date).days
            is_unused = days_unused > unused_threshold
        else:
            # Never used - check creation date
            create_date = role["CreateDate"]
            days_unused = (datetime.now(create_date.tzinfo) - create_date).days
            is_unused = days_unused > unused_threshold

        # Analyze attached policies for unused permissions
        unused_permissions = analyze_permissions(iam_client, role_name, "role")

        return {
            "roleName": role_name,
            "roleArn": role["Arn"],
            "lastUsed": last_used,
            "daysSinceLastUsed": days_unused,
            "isUnused": is_unused,
            "hasUnusedPermissions": len(unused_permissions) > 0,
            "unusedPermissions": unused_permissions
        }

    except Exception as e:
        print(f"Error analyzing role {role_name}: {str(e)}")
        return {"roleName": role_name, "roleArn": role["Arn"], "isUnused": False}


For the policy checks, the following steps are performed:

  • Detect excessive Admin rights (FullAccess, AdminAccess, PowerUser policies)
  • Detect wildcard permissions in actions
  • Detect wildcard permissions in resources
def has_wildcard_permissions(policy_document):
    """Check if policy document contains wildcard permissions"""
    statements = policy_document.get("Statement", [])
    if not isinstance(statements, list):
        statements = [statements]

    for statement in statements:
        # Only Allow statements grant permissions; skip Deny statements
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]

        for action in actions:
            if action == "*" or action.endswith(":*"):
                return True

        # Check for wildcard resources (Resource may be a string or a list)
        resources = statement.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        if "*" in resources:
            return True

    return False


After the analysis, the gathered information is published to a CloudWatch dashboard for better readability.
The output could look like this:

At the top of the CloudWatch dashboard you find an overview of the numbers of 'Accounts Analyzed', 'Total Issues Found', 'Unused Resources' and 'Unused Permissions'. Below that, the full list of resources is visible. The 'Issue Type' column shows the resource type, like 'Unused Role' or 'Unused Permissions'. The 'Details' column gives you more insights, like the last usage date for roles and users or the overly broad actions in your permissions.
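Publishing boils down to building a dashboard-body JSON and calling `put_dashboard`. This is a condensed sketch using a single text widget; the real dashboard uses several widgets, and the dashboard name and findings fields here are my own illustration:

```python
import json

def build_dashboard_body(findings):
    """Render findings as a Markdown table inside one CloudWatch text widget."""
    rows = ["| Account | Resource | Issue Type | Details |",
            "| --- | --- | --- | --- |"]
    for f in findings:
        rows.append(f"| {f['account']} | {f['resource']} | {f['issueType']} | {f['details']} |")
    widget = {
        "type": "text",
        "x": 0, "y": 0, "width": 24, "height": 12,
        "properties": {"markdown": "\n".join(rows)},
    }
    return json.dumps({"widgets": [widget]})

def publish_dashboard(findings, name="iam-ghost-findings"):
    import boto3  # deferred so build_dashboard_body stays testable without AWS access

    boto3.client("cloudwatch").put_dashboard(
        DashboardName=name,
        DashboardBody=build_dashboard_body(findings),
    )
```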

Depending on the number of findings, you can also check the CSV report which is generated and uploaded to the S3 bucket defined in the templates. This gives you better filtering options and lets you track your clean-up progress.
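The CSV report is plain `csv` writing plus an S3 upload. A minimal sketch, assuming a simple list-of-dicts findings structure (the field names and object key are illustrative, and the bucket name is passed in from the template):

```python
import csv
import io

def findings_to_csv(findings):
    """Serialize findings into a CSV string with a fixed header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["account", "resource", "issueType", "details"])
    writer.writeheader()
    writer.writerows(findings)
    return buf.getvalue()

def upload_report(findings, bucket, key="ghost-report.csv"):
    import boto3  # deferred import; only needed for the actual upload

    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=findings_to_csv(findings).encode("utf-8")
    )
```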

In my solution I put an EventBridge Scheduler in front of the Lambda, so the analysis is executed every day; of course you can adjust this according to your needs.

With this radar in hand, you're ready to start your cleanup and send those ghosts into the afterlife!

The complete code is available in my GitHub repository. Make sure to roll out the cross-account role yourself beforehand and hand it over as a parameter when deploying the stack!

About Me

Hi! My name is Jana, I live in the southwest of Germany, and when I'm not smashing weights in the gym I love to architect solutions in AWS that make my and my customers' lives easier.

My computer science journey started as an on-premise system administrator and developed over time into an AWS architect role. Since I know both the "old" and the "new" world, I know the common pain points in architectures and am able to provide solutions that make them not only more efficient but also cheaper!

I enjoy learning, and as the AWS portfolio is evolving all the time, I try to stay up to date by getting certified and checking out newly launched products and services.

If you want to lift your environment to the cloud, or want to leverage more cloud services in your already migrated environment, hit me up or check out Public Cloud Group GmbH!

About PCG

Public Cloud Group supports companies in their digital transformation through the use of public cloud solutions.

With a product portfolio designed to accompany organisations of all sizes on their cloud journey, and competence that is synonymous with highly qualified staff that clients and partners like to work with, PCG is positioned as a reliable and trustworthy partner for the hyperscalers, with repeatedly validated competence and credibility.

We have the highest partnership status with the three relevant hyperscalers: Amazon Web Services (AWS), Google, and Microsoft. As experienced providers, we advise our customers independently on cloud implementation, application development, and managed services.
