Picture this: it’s Monday morning, and you’re sipping your coffee while checking your AWS billing dashboard. Your heart sinks as you see that last month’s bill is far higher than expected. The culprit? A couple of m5.xlarge instances running 24/7 for the past month, quietly burning through your budget while sitting completely idle outside business hours.
If this sounds familiar, you’re not alone. In my team, we’ve all been there. Our data processing workloads only run during business hours, yet our EC2 instances were living their best life 24/7, accumulating costs like a taxi meter stuck in traffic. Even with the best intentions and sticky notes reminding us to “TURN OFF THE INSTANCES!”, human error inevitably crept in.
The Simple Solution That Saved Us Up to 70% on EC2 Costs
After one too many awkward meetings explaining why our AWS costs were through the roof, I decided enough was enough. The solution? A simple yet effective automation using AWS Lambda and Terraform that automatically stops our EC2 instances based on a schedule. No more forgotten instances, no more weekend charges for idle resources, and definitely no more awkward budget meetings.
In this post, I’ll walk you through exactly how to implement this solution, complete with tested Python code and Terraform configurations. By the end, you’ll have:
A Lambda function that intelligently manages your EC2 instances.
Terraform code to deploy everything with a single command.
A scheduling system that works around your team’s actual working hours.
Peace of mind knowing your instances will never be forgotten again.
Why This Approach Works
Before diving into the code, let’s talk about why Lambda + EventBridge (formerly CloudWatch Events) is the perfect combo for this use case:
Cost-Effective: Lambda charges only for execution time (a few cents per month for this use case).
Reliable: AWS manages the infrastructure, so there are no servers for you to patch or keep alive.
Flexible: Easy to adjust schedules for holidays, maintenance windows, or team changes.
Auditable: CloudTrail logs every start/stop action for compliance and debugging.
How It All Works Together
1. Define EC2 instances: tag the instances that the automation should manage (e.g., AutoStop = true).
2. Write the Lambda functions: the Python code reads instance tags and starts or stops instances accordingly.
3. Set up scheduled rules: using EventBridge, you create cron-style schedules that invoke the Lambda functions.
4. Provision everything with Terraform: all the resources, including IAM permissions, are managed through Terraform, making the setup portable and easy to maintain.
For the purposes of this post, I’ll assume you already have Terraform, the AWS CLI, and Python configured.
Project structure
Create a new directory for this project
mkdir ec2-scheduler
cd ec2-scheduler
#create a directory for python scripts
mkdir python
This is the project structure we want to work with:
ec2-scheduler/
├── python/
│ ├── EC2InstanceStart.py
│ ├── EC2InstanceStop.py
│ └── requirements.txt
├── lambda_layer/
│ └── python/
│ └── (requests library files)
├── lambda_layer.zip
├── main.tf
├── eventbridge.tf
├── variables.tf
├── outputs.tf
├── provider.tf
└── backend.tf
The requirements.txt file only needs the entry below, since requests is the only third-party library we use (boto3 comes pre-installed in the Lambda runtime). We’ll get to the lambda_layer directory later in this post:
requests==2.31.0
Terraform Setup
Let’s start with the infrastructure. Here’s how we set up the Lambda functions, the necessary permissions, and the EventBridge rules that trigger them.
Lambda function, IAM Role and Policy:
Create a file named main.tf in your working directory; it will hold the main components for this setup (the IAM role, the policy, and the Lambda functions):
resource "aws_iam_role" "LambdaEC2Role" {
  name = "LambdaEC2Role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Action = "sts:AssumeRole",
      Effect = "Allow",
      Principal = {
        Service = "lambda.amazonaws.com"
      }
    }]
  })

  tags = {
    Project = "EC2-Scheduler"
    Purpose = "Cost-Optimization"
  }
}

resource "aws_iam_policy" "LambdaEC2StartStopPolicy" {
  name = "LambdaEC2StartStopPolicy"

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Action   = ["ec2:StartInstances", "ec2:StopInstances"],
        Resource = "arn:aws:ec2:*:*:instance/*",
        Effect   = "Allow"
      },
      {
        Action   = ["ec2:DescribeInstances", "ec2:DescribeTags", "ec2:DescribeInstanceStatus"],
        Resource = "*",
        Effect   = "Allow"
      },
      {
        Action   = ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
        Resource = "*",
        Effect   = "Allow"
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "attach_lambda_policy_to_lambda_role" {
  role       = aws_iam_role.LambdaEC2Role.name
  policy_arn = aws_iam_policy.LambdaEC2StartStopPolicy.arn
}
data "archive_file" "lambda_package" {
  type        = "zip"
  source_dir  = "${path.module}/python/"
  output_path = "${path.module}/lambda_package.zip" # keep the zip outside source_dir
}
resource "aws_lambda_function" "EC2AutoStopLambda" {
  filename                       = data.archive_file.lambda_package.output_path
  function_name                  = "EC2AutoStopLambda"
  role                           = aws_iam_role.LambdaEC2Role.arn
  runtime                        = "python3.12"
  handler                        = "EC2InstanceStop.lambda_handler"
  memory_size                    = 128
  timeout                        = 60
  reserved_concurrent_executions = 10
  source_code_hash               = data.archive_file.lambda_package.output_base64sha256

  environment {
    variables = {
      # AWS_REGION is a reserved key the Lambda runtime sets automatically,
      # so we only pass the webhook URL
      TEAMS_WEBHOOK_URL = var.teams_webhook_url
    }
  }

  # Attach the requests layer only when Teams notifications are enabled
  layers = var.teams_webhook_url != "" ? [aws_lambda_layer_version.requests_layer[0].arn] : []

  tags = {
    Project = "EC2-Scheduler"
    Purpose = "Stop-Instances"
  }

  depends_on = [aws_iam_role_policy_attachment.attach_lambda_policy_to_lambda_role]
}
resource "aws_lambda_function" "EC2AutoStartLambda" {
  filename                       = data.archive_file.lambda_package.output_path
  function_name                  = "EC2AutoStartLambda"
  role                           = aws_iam_role.LambdaEC2Role.arn
  runtime                        = "python3.12"
  handler                        = "EC2InstanceStart.lambda_handler"
  memory_size                    = 128
  timeout                        = 60
  reserved_concurrent_executions = 10
  source_code_hash               = data.archive_file.lambda_package.output_base64sha256

  # No environment block needed: AWS_REGION is reserved and set by the runtime

  tags = {
    Project = "EC2-Scheduler"
    Purpose = "Start-Instances"
  }

  depends_on = [aws_iam_role_policy_attachment.attach_lambda_policy_to_lambda_role]
}
EventBridge rules that trigger the Lambdas
Create another file in the root of your project and name it eventbridge.tf:
resource "aws_cloudwatch_event_rule" "EC2AutoStopRule" {
  name                = "EC2AutoStopRule"
  description         = "Rule to trigger Lambda to stop EC2 instances"
  schedule_expression = var.stop_schedule

  tags = {
    Project = "EC2-Scheduler"
  }
}

resource "aws_cloudwatch_event_rule" "EC2AutoStartRule" {
  name                = "EC2AutoStartRule"
  description         = "Rule to trigger Lambda to start EC2 instances"
  schedule_expression = var.start_schedule

  tags = {
    Project = "EC2-Scheduler"
  }
}

resource "aws_cloudwatch_event_target" "EC2AutoStopRuleTarget" {
  target_id = "EC2AutoStopLambda"
  arn       = aws_lambda_function.EC2AutoStopLambda.arn
  rule      = aws_cloudwatch_event_rule.EC2AutoStopRule.name
}

resource "aws_cloudwatch_event_target" "EC2AutoStartRuleTarget" {
  target_id = "EC2AutoStartLambda"
  arn       = aws_lambda_function.EC2AutoStartLambda.arn
  rule      = aws_cloudwatch_event_rule.EC2AutoStartRule.name
}

resource "aws_lambda_permission" "EC2AutoStopLambdaPermission" {
  statement_id  = "AllowExecutionFromEventBridge"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.EC2AutoStopLambda.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.EC2AutoStopRule.arn
}

resource "aws_lambda_permission" "EC2AutoStartLambdaPermission" {
  statement_id  = "AllowExecutionFromEventBridge"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.EC2AutoStartLambda.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.EC2AutoStartRule.arn
}
Now we define variables to keep the code flexible. Create a file named variables.tf with the following:
variable "aws_region" {
  description = "AWS region for resources"
  type        = string
}

variable "start_schedule" {
  description = "Cron expression for starting instances"
  type        = string
}

variable "stop_schedule" {
  description = "Cron expression for stopping instances"
  type        = string
}

variable "teams_webhook_url" {
  description = "Microsoft Teams webhook URL for notifications"
  type        = string
  default     = "" # Set this via terraform.tfvars or an environment variable
  sensitive   = true
}
Provider and backend config
provider.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
  required_version = ">= 1.0"
}

provider "aws" {
  region = var.aws_region # Defined in variables.tf
}
backend.tf
terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket"
    key            = "ec2-auto-scheduler/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
Make sure the S3 bucket and DynamoDB table exist beforehand; I may cover setting those up in a future post.
Finally, I like to create an outputs.tf file to display useful information, below is what I think is needed for this deployment.
output "start_function_arn" {
  description = "ARN of the start Lambda function"
  value       = aws_lambda_function.EC2AutoStartLambda.arn
}

output "stop_function_arn" {
  description = "ARN of the stop Lambda function"
  value       = aws_lambda_function.EC2AutoStopLambda.arn
}

output "start_schedule" {
  description = "Cron expression for the start schedule"
  value       = aws_cloudwatch_event_rule.EC2AutoStartRule.schedule_expression
}

output "stop_schedule" {
  description = "Cron expression for the stop schedule"
  value       = aws_cloudwatch_event_rule.EC2AutoStopRule.schedule_expression
}
Quick explanation on the Lambda layer:
If you’re using Teams notifications, you’ll need to create a Lambda layer with the requests library. Run these commands from your project root (ec2-scheduler/):
# From the ec2-scheduler/ directory
mkdir -p lambda_layer/python
# Install requests to the directory
pip install requests -t lambda_layer/python/
# Create the layer zip
cd lambda_layer
zip -r ../lambda_layer.zip .
cd ..
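One missing piece: the layers argument in main.tf references an aws_lambda_layer_version resource that we still need to declare. Here is a minimal sketch (the layer name and file path are my own choices; adjust them to your layout) you can add to main.tf:

```hcl
resource "aws_lambda_layer_version" "requests_layer" {
  # Only create the layer when Teams notifications are enabled,
  # matching the conditional in the Lambda's layers argument
  count               = var.teams_webhook_url != "" ? 1 : 0
  layer_name          = "requests-layer"
  filename            = "${path.module}/lambda_layer.zip"
  source_code_hash    = filebase64sha256("${path.module}/lambda_layer.zip")
  compatible_runtimes = ["python3.12"]
}
```

Build lambda_layer.zip with the commands above before running terraform plan, since filebase64sha256 reads the file at plan time.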
Once all of this is in place, up next is the actual Python Lambda code.
The Lambda Functions: Where the Magic Happens
Now for the fun part: the code that actually saves the money. We’ll create two Lambda functions: one to stop instances and another to start them. The stop function has a bonus feature: it monitors instances tagged AutoStop = false and alerts you if they’ve been running for more than 48 hours.
The Stop Lambda Function: Your evening watchdog
With the infrastructure defined in Terraform, the core logic lives inside the AWS Lambda function. This Python script performs two tasks:
- Automatically stops EC2 instances that are running and tagged for auto-stop.
- Sends a Microsoft Teams alert if any untagged instance has been running for more than 48 hours.
Create a folder in your current working directory and name it python, cd into it, then create a file named EC2InstanceStop.py with the following code:
import boto3
import logging
import os
import requests
from datetime import datetime, timedelta

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# AWS_REGION is set automatically by the Lambda runtime
region = os.environ['AWS_REGION']
ec2 = boto3.resource('ec2', region_name=region)

# Teams webhook URL, passed in through Terraform as an environment variable
teams_url = os.environ.get('TEAMS_WEBHOOK_URL', '')

def send_teams_alert(instance_id):
    message = f"Instance {instance_id} has been running for more than 48 hours."
    headers = {"Content-Type": "application/json"}
    payload = {"text": message}
    requests.post(teams_url, headers=headers, json=payload)

def lambda_handler(event, context):
    # Stop running instances that opted in with AutoStop = true
    filters = [
        {'Name': 'tag:AutoStop', 'Values': ['TRUE', 'True', 'true']},
        {'Name': 'instance-state-name', 'Values': ['running']}
    ]
    running_instances = [i.id for i in ec2.instances.filter(Filters=filters)]
    logger.info("Running instances with AutoStop tag: %s", running_instances)

    if running_instances:
        ec2.instances.filter(InstanceIds=running_instances).stop()
        logger.info("Stopped instances: %s", running_instances)
    else:
        logger.info("No running instances with the AutoStop tag set.")

    # Alert on opted-out instances (AutoStop = false) running for more than 48 hours
    filters = [
        {'Name': 'tag:AutoStop', 'Values': ['FALSE', 'False', 'false']},
        {'Name': 'instance-state-name', 'Values': ['running']}
    ]
    for instance in ec2.instances.filter(Filters=filters):
        uptime = datetime.now(instance.launch_time.tzinfo) - instance.launch_time
        if uptime > timedelta(hours=48):
            logger.info("Instance running for more than 48 hours: %s", instance.id)
            if teams_url:
                send_teams_alert(instance.id)
What the code does:
The Python script filters for instances that have the tag AutoStop = true and are in a running state, then stops every match with a single stop() call. Tagging lets you opt in only the instances you want this automation to manage.
The script also checks instances explicitly opted out of auto-stop (AutoStop = false); if any have been running for more than 48 hours, it sends an alert to a configured Microsoft Teams channel.
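The 48-hour check boils down to timezone-aware datetime arithmetic. Here it is isolated as a small, testable helper (the function name is mine, not part of the Lambda code):

```python
from datetime import datetime, timedelta, timezone

def running_longer_than(launch_time, hours, now=None):
    """True if an instance launched at launch_time has been running for
    more than `hours` hours. Uses timezone-aware arithmetic, since boto3
    returns launch_time with tzinfo attached."""
    now = now or datetime.now(launch_time.tzinfo)
    return now - launch_time > timedelta(hours=hours)

launch = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print(running_longer_than(launch, 48, datetime(2024, 1, 3, 10, 0, tzinfo=timezone.utc)))  # True (49 h)
print(running_longer_than(launch, 48, datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)))   # False (24 h)
```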
We pick the instances we want this Lambda to manage by setting tags on them as below:
| Tag Key | Tag Value | Behavior |
| ---------- | --------- | ----------------------------------- |
| `AutoStop` | `true` | Will be stopped automatically |
| `AutoStop` | `false` | Will be left running, but monitored |
| (missing) | — | Ignored entirely |
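If you want to sanity-check the tag logic without touching AWS, the table above can be captured in a few lines of plain Python (this helper is my own illustration, not part of the Lambda code):

```python
def autostop_behavior(tags):
    """Map an instance's tags (dict of key -> value) to what the
    scheduler does with it, mirroring the table above."""
    value = tags.get("AutoStop")
    if value is None:
        return "ignored"      # no AutoStop tag: the automation skips it
    if value.lower() == "true":
        return "stopped"      # stopped on the evening schedule
    return "monitored"        # left running, but alerted after 48 hours

print(autostop_behavior({"AutoStop": "true"}))   # stopped
print(autostop_behavior({"AutoStop": "False"}))  # monitored
print(autostop_behavior({"Name": "web-1"}))      # ignored
```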
The Start Function: Your Morning Coffee Companion
The start function is much simpler; it just needs to wake your instances from their slumber:
import boto3
import logging
import os

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# AWS_REGION is set automatically by the Lambda runtime
region = os.environ['AWS_REGION']
ec2 = boto3.resource('ec2', region_name=region)

def lambda_handler(event, context):
    # Start stopped instances that opted in with AutoStart = true
    filters = [
        {'Name': 'tag:AutoStart', 'Values': ['TRUE', 'True', 'true']},
        {'Name': 'instance-state-name', 'Values': ['stopped']}
    ]
    stopped_instances = [i.id for i in ec2.instances.filter(Filters=filters)]
    logger.info("Stopped instances with AutoStart tag: %s", stopped_instances)

    if stopped_instances:
        ec2.instances.filter(InstanceIds=stopped_instances).start()
        logger.info("Started instances: %s", stopped_instances)
    else:
        logger.info("No stopped instances with the AutoStart tag set.")
This approach gives you automation without losing control. Tags keep things flexible: there’s no need to hardcode instance IDs, and new instances can be enrolled simply by tagging them, which is handy when you run different kinds of workloads.
Deployment: Making It All Come Together
Time to put everything we’ve done so far together and bring the automation to life:
Step 1: Prepare the instances
Make sure your EC2 instances are properly tagged. You can tag them with the commands below or in the management console.
# Tag instances you want to auto-start and auto-stop
aws ec2 create-tags \
  --resources <instance-id> \
  --tags Key=AutoStart,Value=true Key=AutoStop,Value=true

# For instances that should only be stopped (not auto-started)
aws ec2 create-tags \
  --resources <instance-id> \
  --tags Key=AutoStop,Value=true

# For instances you want to monitor but not auto-stop
aws ec2 create-tags \
  --resources <instance-id> \
  --tags Key=AutoStop,Value=false
Step 2: Configure Your Variables
Create a terraform.tfvars file with your environment-specific settings:
aws_region = "us-east-1"
# Adjust these times for your timezone (these are in UTC)
start_schedule = "cron(0 13 ? * MON-FRI *)" # 8 AM EST
stop_schedule = "cron(0 23 ? * MON-FRI *)" # 6 PM EST
# Add your Teams webhook for notifications
teams_webhook_url = "https://outlook.office.com/webhook/YOUR-WEBHOOK-URL"
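EventBridge evaluates cron expressions in UTC, so a quick way to double-check your schedule is to convert your local wall-clock times with Python’s zoneinfo. This helper is just a sanity-check sketch, not part of the deployment:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def utc_hour_for_local(hour, tz, year=2024, month=1, day=15):
    """Return the UTC hour matching a local wall-clock hour on a given
    date. The date matters because of daylight saving time."""
    local = datetime(year, month, day, hour, tzinfo=ZoneInfo(tz))
    return local.astimezone(ZoneInfo("UTC")).hour

print(utc_hour_for_local(8, "America/New_York"))           # 13 (EST, winter)
print(utc_hour_for_local(18, "America/New_York"))          # 23 (EST, winter)
print(utc_hour_for_local(8, "America/New_York", month=7))  # 12 (EDT, summer)
```

Note that the cron expressions themselves stay fixed year-round, so once daylight saving kicks in, your instances will start an hour earlier in local time.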
Step 3: Initialize and Deploy
Time to deploy! Run these commands from your project root:
# Initialize Terraform
terraform init
# Create the Lambda layer (if using Teams notifications)
pip install requests -t lambda_layer/python/
cd lambda_layer && zip -r ../lambda_layer.zip . && cd ..
# Review what will be created
terraform plan
# Deploy everything
terraform apply
When Terraform asks for confirmation, review the resources being created. In our case we should see:
2 Lambda functions
2 EventBridge rules
1 IAM role and policy
Various permissions and attachments
Type yes to proceed.
Step 4: Verify Your Deployment
After a successful deployment, Terraform will print the outputs we defined in outputs.tf earlier, and you can also confirm in the management console that the resources have been created.
The Bottom Line: Your AWS Bill Will Thank You
Let’s recap what you’ve just built:
✅ Automated EC2 scheduling that never forgets.
✅ Proactive monitoring for runaway instances.
✅ Infrastructure as code for easy replication.
But here’s the real win: peace of mind. No more weekend anxiety about whether someone left instances running. No more awkward conversations with finance. Just predictable, optimized AWS costs.
Final Thoughts
As engineers, we’re problem solvers. This solution started from a real pain point and evolved into something that saves our team thousands annually. Your feedback and experiences make these solutions better for everyone.
Remember: The best time to optimize your AWS costs was yesterday. The second best time is now.
Happy cost saving, and may your AWS bills be forever low! 🎉
If you found it helpful, please let me know in the comments. Your feedback helps me improve and motivates me to share more solutions.
Please follow me for more AWS optimization tips. Next up: “How to configure lifecycle rule on S3 buckets” — another fun task for us to explore.