Running out of IP addresses in your subnets is a real issue that many teams face these days. Most of the time, those IPs are reserved but unused!
This blog post addresses the problem of unused Elastic Network Interfaces (ENIs) and shows how to free up the IP address pool for other services.
The solution consists of a Lambda function that is triggered daily and a CloudWatch alarm that alerts us to any errors generated by the Lambda.
First I will go through the solution and how to create it using the AWS Console, then I will build the same solution using Infrastructure as Code (Terraform).
Creating the Lambda
Before going through the code, let's set up some of the configuration parameters:
1- Timeout: this is the most important one. Make sure it is more than 1 minute (the right value will depend on your workloads).
2- Memory: 200 MB should be enough.
3- There is no need to put the Lambda inside a VPC.
Lambda code using Python:
First, import the AWS SDK for Python (Boto3):
PS: To find out more about the AWS SDK, check out this link
import boto3
client = boto3.client('ec2')
Next step: retrieve the subnets that have a specific tag. In this example the tag is:
key = type
value = private
There are multiple approaches for this. The way that I will be doing it is:
1- Describe all the resources based on tags
2- Add filters on resource type and tag (key and value)
# Get the tags of all subnets that carry the tag type=private.
tags = client.describe_tags(
    Filters=[
        {
            'Name': 'resource-type',
            'Values': [
                'subnet'
            ]
        },
        {
            'Name': 'tag:type',
            'Values': [
                'private'
            ]
        }
    ]
)
A formatted output of this method:
{
    "Tags": [
        {
            "Key": "type",
            "ResourceId": "subnet-062715a13f1fffa54",
            "ResourceType": "subnet",
            "Value": "private"
        },
        {
            "Key": "type",
            "ResourceId": "subnet-0ee66ce86ffe0c073",
            "ResourceType": "subnet",
            "Value": "private"
        }
    ],
    "ResponseMetadata": {}
}
Next, get the subnet IDs from the result dictionary by parsing the data. Example:
# Collect the subnet ID of every returned tag.
list_subnets = []
for tag in tags['Tags']:
    list_subnets.append(tag['ResourceId'])
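A single describe_tags call may return only part of the matching tags when there are many of them, because the EC2 API pages its results. The sketch below wraps the two steps above in one function and walks every page with a boto3 paginator. The signature, with the client and the tag key and value passed in as parameters, is an assumption made for this example so that the function can be exercised with a stub; it is not necessarily how the function used later in the handler is written.

```python
def get_tagged_subnets(client, tag_key="type", tag_value="private"):
    """Return the IDs of all subnets tagged tag_key=tag_value.

    `client` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    The parameter names and defaults are assumptions for this sketch.
    """
    paginator = client.get_paginator("describe_tags")
    pages = paginator.paginate(
        Filters=[
            {"Name": "resource-type", "Values": ["subnet"]},
            {"Name": f"tag:{tag_key}", "Values": [tag_value]},
        ]
    )
    subnet_ids = []
    for page in pages:
        # Each page has the same shape as the formatted output shown above.
        for tag in page["Tags"]:
            subnet_ids.append(tag["ResourceId"])
    return subnet_ids
```

In the Lambda this could be called as get_tagged_subnets(boto3.client('ec2')).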
Now that we have the subnet IDs, the next step is to retrieve all the unused network interfaces in those subnets and delete them.
To narrow down the results to only the ones needed, filters need to be added. Those are:
1- A filter to get ENIs from specific subnets
2- A filter to get ENIs that are unused (status "available")
NOTE: in the subnet-id filter value, put a subnet ID that you retrieved before.
# Delete every unused (available) ENI in the given subnet.
def delete_available_eni(subnetid):
    eni = client.describe_network_interfaces(
        Filters=[
            {
                'Name': 'subnet-id',
                'Values': [
                    subnetid,
                ]
            },
            {
                'Name': 'status',
                'Values': [
                    'available'
                ]
            },
        ]
    )
    for interface in eni["NetworkInterfaces"]:
        # Delete the ENI by ID with the low-level EC2 client.
        client.delete_network_interface(
            NetworkInterfaceId=interface['NetworkInterfaceId']
        )
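An ENI that was listed as available can be attached again before the delete call reaches it, in which case the delete fails. The following is a minimal sketch, not the author's implementation, of a variant that keeps going after such a failure and reports what happened. The function name delete_available_enis, the client parameter, and the returned tuple are assumptions made for this example; it relies on the ClientError class that boto3 clients expose under client.exceptions.

```python
def delete_available_enis(client, subnet_id):
    """Delete unused ENIs in one subnet; return (deleted_ids, failed_ids).

    `client` is expected to be a boto3 EC2 client. Names and return shape
    are assumptions for this sketch.
    """
    paginator = client.get_paginator("describe_network_interfaces")
    pages = paginator.paginate(
        Filters=[
            {"Name": "subnet-id", "Values": [subnet_id]},
            {"Name": "status", "Values": ["available"]},
        ]
    )
    deleted, failed = [], []
    for page in pages:
        for eni in page["NetworkInterfaces"]:
            eni_id = eni["NetworkInterfaceId"]
            try:
                client.delete_network_interface(NetworkInterfaceId=eni_id)
                deleted.append(eni_id)
            except client.exceptions.ClientError as err:
                # For example, the ENI was attached again after it was listed.
                print(f"Could not delete {eni_id}: {err}")
                failed.append(eni_id)
    return deleted, failed
```

Returning the failed IDs leaves it to the caller to decide whether a partial cleanup should count as an error.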
For the handler, we call the two previous functions in sequence for the Lambda to work properly.
# Delete available network interfaces in specific subnets
def lambda_handler(event, context):
    list_subnet = get_tagged_subnets()
    for subnet_id in list_subnet:
        delete_available_eni(subnet_id)
    return {
        "statusCode": 200,
    }
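The CloudWatch alarm in this solution watches the Lambda Errors metric, which counts invocations that end in an error such as an unhandled exception. If the handler caught every exception and still returned 200, the alarm would stay silent. The sketch below shows one possible way to process every subnet and still fail the invocation at the end; clean_subnets and its delete_fn parameter are hypothetical names introduced for this example.

```python
def clean_subnets(subnet_ids, delete_fn):
    """Run delete_fn on every subnet ID, then raise if any call failed.

    `delete_fn` stands in for the per-subnet cleanup function; both names
    are assumptions for this sketch.
    """
    errors = {}
    for subnet_id in subnet_ids:
        try:
            delete_fn(subnet_id)
        except Exception as err:
            # Keep going so one bad subnet does not block the others.
            errors[subnet_id] = str(err)
    if errors:
        # Raising here makes the invocation count as an error,
        # which is what the alarm looks at.
        raise RuntimeError(f"ENI cleanup failed for: {errors}")
    return {"statusCode": 200, "cleaned_subnets": len(subnet_ids)}
```

Inside lambda_handler, this could be used as return clean_subnets(get_tagged_subnets(), delete_available_eni).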
Role of the Lambda
The Lambda needs specific permissions to function properly.
An IAM policy that follows the least-privilege principle is:
{
    "Statement": [
        {
            "Action": [
                "ec2:DescribeTags",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DeleteNetworkInterface"
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "1"
        }
    ],
    "Version": "2012-10-17"
}
Lambda Trigger
Invoking the Lambda with Amazon EventBridge is divided into two steps:
1- Creating a rule that is triggered on a fixed schedule
2- Assigning the Lambda as a target for the rule
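For readers who prefer scripting over the console, the two steps above can be sketched with Boto3 roughly as follows. This is an illustration rather than part of the original setup: the function name schedule_daily, the target Id, and the statement ID are assumptions, and the clients are passed in as parameters (in practice boto3.client('events') and boto3.client('lambda')). The extra add_permission call grants EventBridge the right to invoke the function, which the console normally takes care of when you add the target.

```python
def schedule_daily(events_client, lambda_client, function_arn,
                   rule_name="Clean-Eni-Lambda-Rule"):
    """Create a daily EventBridge rule and register the Lambda as its target.

    All names and defaults here are assumptions for this sketch.
    """
    # Step 1: a rule that fires once a day.
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression="rate(1 day)",
        State="ENABLED",
    )
    # Allow EventBridge to invoke the function for this rule only.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    # Step 2: the Lambda becomes the target of the rule.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "clean-eni-lambda", "Arn": function_arn}],
    )
    return rule["RuleArn"]
```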
SNS for any errors
If the Lambda generates any errors, receiving an email is a must so that we can troubleshoot them.
Three AWS resources are needed:
- SNS Topic
- SNS Subscription
- CloudWatch Alarm
Creating the SNS topic is very straightforward.
Select the Standard type, name it, and leave the rest as default.
The SNS subscription is even easier!
Select the topic you wish to subscribe to, choose the Email protocol, and add your email address.
For the CloudWatch alarm, the screenshots below explain how to set it up:
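As a scripted alternative to the console steps, the topic, subscription, and alarm can be sketched with Boto3 as follows. The values mirror the ones used in the Terraform section of this post (alarm on the Errors metric, Sum over 60 seconds, threshold 1); the function name create_error_alarm and its parameters are assumptions made for this example, and the clients would normally be boto3.client('sns') and boto3.client('cloudwatch').

```python
def create_error_alarm(sns_client, cloudwatch_client, function_name, email):
    """Create an SNS topic with an email subscription and an alarm on Lambda errors.

    Function and parameter names are assumptions for this sketch.
    """
    # Standard topic; the call returns the existing topic if the name is taken.
    topic_arn = sns_client.create_topic(Name="alarm-error")["TopicArn"]
    # The recipient has to confirm the subscription from the email they receive.
    sns_client.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    # Alarm as soon as the function reports at least one error in a minute.
    cloudwatch_client.put_metric_alarm(
        AlarmName="Lambda-clean-eni-error",
        AlarmDescription="Lambda error rate is too high",
        Namespace="AWS/Lambda",
        MetricName="Errors",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        Statistic="Sum",
        Period=60,
        EvaluationPeriods=1,
        Threshold=1,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        AlarmActions=[topic_arn],
    )
    return topic_arn
```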
Infrastructure as code
To benefit from consistency and speed, and to reduce human error, let's deploy our infrastructure using Terraform:
Lambda Function:
PS: Your Python code, named main.py, should be under a src directory in the same directory as your Terraform project.
module "lambda_clean_eni" {
  source = "terraform-aws-modules/lambda/aws"
  function_name = "clean_eni"
  description = "Delete unused (available) ENIs in subnets that contain EKS clusters"
  handler = "main.lambda_handler"
  runtime = "python3.9"
  publish = true
  role_name = "Lambda-Clean-ENI"
  memory_size = 200
  timeout = 600
  attach_cloudwatch_logs_policy = true
  attach_policy_jsons = true
  number_of_policy_jsons = 1
  policy_jsons = [
    data.aws_iam_policy_document.clean_eni.json,
  ]
  source_path = "${path.module}/src"
  hash_extra = filesha256("${path.module}/src/main.py")
  allowed_triggers = {
    DailyRule = {
      principal = "events.amazonaws.com"
      source_arn = aws_cloudwatch_event_rule.clean_eni.arn
    }
  }
}
Lambda IAM Role Policy:
data "aws_iam_policy_document" "clean_eni" {
  statement {
    sid = "1"
    actions = [
      "ec2:DeleteNetworkInterface",
      "ec2:DescribeNetworkInterfaces",
      "ec2:DescribeTags",
    ]
    effect = "Allow"
    resources = ["*"]
  }
}
EventBridge:
resource "aws_cloudwatch_event_rule" "clean_eni" {
  name = "Clean-Eni-Lambda-Rule"
  description = "Fires once every day"
  schedule_expression = "rate(1 day)"
}
resource "aws_cloudwatch_event_target" "clean_eni" {
  rule = aws_cloudwatch_event_rule.clean_eni.name
  arn = module.lambda_clean_eni.lambda_function_arn
}
CloudWatch Alarms:
module "alarm_lambda_clean_eni" {
  source = "terraform-aws-modules/cloudwatch/aws//modules/metric-alarm"
  create_metric_alarm = true
  alarm_name = "Lambda-clean-eni-error"
  alarm_description = "Lambda error rate is too high"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  insufficient_data_actions = []
  evaluation_periods = 1
  threshold = 1
  alarm_actions = [aws_sns_topic.alarm_error.arn]
  metric_query = [{
    # A metric query ID must start with a lowercase letter.
    id = "e1"
    return_data = true
    label = "Error Count"
    metric = [{
      namespace = "AWS/Lambda"
      metric_name = "Errors"
      period = 60
      stat = "Sum"
      unit = "Count"
      dimensions = {
        FunctionName = module.lambda_clean_eni.lambda_function_name
      }
    }]
  }]
}
SNS Topic and Subscription:
resource "aws_sns_topic" "alarm_error" {
  name = "alarm-error"
}
resource "aws_sns_topic_subscription" "alarm_error_sub" {
  topic_arn = aws_sns_topic.alarm_error.arn
  protocol = "email"
  endpoint = "your@email.com"
}
Summary: With this solution, we can mitigate the problem of running out of IP addresses in our subnets.
Side note: You can customize the filters however you like, so feel free to explore how to make them fit your environment!