DEV Community

Kodsama


Controlling AWS Lambda Costs

Keeping full control over your Amazon Web Services (AWS) costs isn't that easy. AWS does provide ways to limit execution and to notify you when something goes wrong or when a specified cost (forecasted or actual) is reached.
Instances usually have a fixed maximum cost per month depending on what we run, so I will focus here on one of the scalable parts of the AWS offerings: AWS Lambda.

Of course, one could always tear down services manually when the bill is getting too high, but I am lazy (like many of us) and would rather spend a few hours once to automate this instead.
[Image: XKCD comic on automation]

Here are some steps I would recommend:

 

Setting up Billing Alarms

The first and foremost thing to do is to set up budget alerts for your overall AWS costs.

Billing alarms are the essential tool for monitoring your AWS costs. They are an easy way to get notified if your monthly AWS bill is estimated to cross a set threshold.

The official documentation is great.

  1. Create a Budget:

    • Go to the AWS Management Console.
    • Navigate to the AWS Budgets dashboard.
    • Click on "Create a budget."
    • Follow the steps to create a budget. Set the budget amount to your desired limit.
  2. Configure Alerts:

    • Set up alert notifications for when your budget threshold is reached. You can choose to receive alerts via email or SNS (Simple Notification Service).
    • To use SNS, create an SNS topic if you don't have one, and add subscribers to the topic (e.g., your email address).

Now, when you reach the budget (or the forecasted budget), you will get an email, or you can trigger further actions from the alert through the SNS topic ;)
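
The console steps above can also be scripted. Here is a minimal sketch, assuming boto3 and the Budgets `create_budget` call. The helper names, the budget name, and the 80% threshold are illustrative choices of mine, not values required by AWS:

```python
def build_budget_request(account_id, limit_usd, sns_topic_arn, threshold_pct=80.0):
    """Build keyword arguments for the boto3 Budgets client's create_budget call."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": "monthly-cost-budget",  # hypothetical name
            "BudgetLimit": {"Amount": str(limit_usd), "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                "Notification": {
                    "NotificationType": "ACTUAL",  # "FORECASTED" alerts on expected cost
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": threshold_pct,
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [
                    {"SubscriptionType": "SNS", "Address": sns_topic_arn}
                ],
            }
        ],
    }


def create_budget(budgets_client, request):
    """Send the request; budgets_client is expected to be boto3.client('budgets')."""
    return budgets_client.create_budget(**request)
```

For the SNS route to work, the topic's access policy also has to allow AWS Budgets to publish to it.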

 

Limiting concurrency

Balancing scalability and cost is crucial. After setting billing alarms and budgets, the next step is to prevent unexpectedly large bills. Since Lambda functions are usage-based, you can control costs by limiting their invocations. Here are the main methods:

 

Reduce the global AWS Service Quotas for Lambda

One way is to reduce the total concurrency of ALL Lambda invocations in your account. By default this is set to 1000.

  • In the AWS Management Console, search for "Service Quotas" and select it from the search results.
  • In the Service Quotas console, type "Lambda" in the search bar to filter the quotas related to AWS Lambda.
  • Look for the quota named "Concurrent executions" under the AWS Lambda service.
  • Click on the quota name ("Concurrent executions").
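  • Click the "Request quota increase" button. The console only labels this form as an increase; if it does not accept a value lower than your current quota, you may need to ask AWS Support to lower it.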
  • Fill in the form with the new limit you are requesting and any other required details.
  • After filling out the form, click "Request."
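
Before and after changing the quota, it helps to check what the account limit currently is. A small sketch, assuming boto3 and the Lambda `get_account_settings` call; the helper name `read_concurrency_limits` is my own:

```python
def read_concurrency_limits(lambda_client):
    """Return the account-wide and unreserved concurrency limits.

    lambda_client is expected to be boto3.client('lambda').
    """
    limits = lambda_client.get_account_settings()["AccountLimit"]
    return {
        "total": limits["ConcurrentExecutions"],
        "unreserved": limits["UnreservedConcurrentExecutions"],
    }
```

Calling `read_concurrency_limits(boto3.client("lambda"))` returns both numbers for the region the client is configured for.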

 

Set Concurrency Limits for Individual Lambda Functions

Lambda's Reserved concurrency lets you limit concurrent invocations. For example, setting this to 1 during development can prevent unexpected charges due to programming errors.

Here we will rate-limit our Lambdas to cap costs per hour, day, or month. Setting concurrency limits means deciding how many instances of each Lambda can run simultaneously. Once the limit is reached, additional invocations are throttled: synchronous callers receive a throttling error (HTTP 429), while asynchronous invocations are queued and retried by Lambda for a limited time (up to 6 hours by default) before being dropped.

Setting Concurrency Limits

AWS’s method for setting concurrency limits can be unintuitive. By default, your account has a concurrency limit of 1000, with at least 100 executions pooled for functions without reserved concurrency. You can reserve up to 900 concurrent executions for specific functions.

For example, setting a concurrency limit of 5 for one function means it can only run five instances concurrently, while the remaining functions share the rest of the quota (so 995 if still set at 1000).
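
The arithmetic above can be written down as a tiny helper. This is only an illustration of the pooling rule described here (account limit minus reservations, with at least 100 left unreserved); the function name and the sample function names are made up:

```python
MIN_UNRESERVED = 100  # Lambda keeps at least this much concurrency unreserved


def unreserved_pool(account_limit, reservations):
    """Return the concurrency left for functions without a reservation.

    reservations maps a function name to its reserved concurrency.
    Raises ValueError if the reservations would leave fewer than
    MIN_UNRESERVED executions in the shared pool.
    """
    remaining = account_limit - sum(reservations.values())
    if remaining < MIN_UNRESERVED:
        raise ValueError(
            f"only {remaining} unreserved executions would remain, "
            f"at least {MIN_UNRESERVED} are required"
        )
    return remaining


print(unreserved_pool(1000, {"thumbnailer": 5}))  # prints 995
```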

To set concurrency limits in the AWS web console:

  1. Navigate to the desired Lambda function.
  2. Go to Configuration > Concurrency panel.
  3. In the "Concurrency" section, click the "Edit" button.
  4. Select the "Reserve concurrency" option and enter the maximum number of concurrent executions for this function.
  5. Click "Save" in the top-right corner.

Reserved concurrency applies to all versions of the Lambda function. If you have many functions, you might need to request an increase in your account’s total concurrency limit from AWS.

 

Set Rate Limits on API Gateway

API Gateway, which exposes Lambda functions over HTTP, lets you set rate limits that cap the number of requests forwarded to your backend, and therefore the number of Lambda invocations. Here’s how to set rate limits on API Gateway:

  1. Navigate to the AWS Management Console.
  2. In the AWS Management Console, search for "API Gateway" and select it from the search results.
  3. In the API Gateway console, you will see a list of your APIs. Click on the name of the API for which you want to set rate limits.
  4. In the left-hand navigation pane, click on "Stages", then select the stage (e.g., prod, dev) where you want to set the rate limits.
  5. With the stage selected, you will see several tabs on the right side. Click on the "Stage Editor" tab. Scroll down to the "Throttle" section.
  6. In the "Throttle" section, you can set the following limits:
    • Rate Limit (requests per second): This is the maximum number of requests per second that the API can handle.
    • Burst Limit: This is the maximum number of requests that the API can handle in a short period (a burst).

For example, if you set the Rate Limit to 100 and the Burst Limit to 200, API Gateway will throttle requests that exceed 100 requests per second or exceed 200 requests in a burst.

  7. After setting the desired rate limits, click the "Save Changes" button at the bottom of the page.
  8. If your API changes require redeployment, go to the "Actions" dropdown and select "Deploy API." Choose the appropriate stage and confirm the deployment.

Considerations

  • Rate Limit: This controls the steady-state rate of requests your API can handle per second.
  • Burst Limit: This controls the maximum number of requests your API can handle in a short burst.
  • Throttling Behavior: When the request rate exceeds these limits, API Gateway throttles the requests, returning a 429 Too Many Requests error to the client.
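
To get a feel for how the rate limit and the burst limit interact, here is a simplified token bucket model. API Gateway's documentation describes its throttling in terms of a token bucket, but this class is only my own toy simulation of the idea, not API Gateway's actual implementation:

```python
class TokenBucket:
    """Toy token bucket: `burst` is the bucket size, `rate` the refill per second."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # the bucket starts full
        self.last = 0.0             # time of the previous request, in seconds

    def allow(self, now):
        """Return True if a request arriving at time `now` is accepted."""
        # Refill according to the elapsed time, never above the bucket size
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # API Gateway would answer 429 Too Many Requests here


bucket = TokenBucket(rate=100, burst=200)
# 250 requests arrive at the same instant: only the burst size gets through
print(sum(bucket.allow(0.0) for _ in range(250)))  # prints 200
# One second later the bucket has refilled by `rate` tokens
print(sum(bucket.allow(1.0) for _ in range(150)))  # prints 100
```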

 

Use AWS WAF to avoid HTTP Flooding

AWS WAF (Web Application Firewall) helps protect against HTTP floods and DDoS attacks by rate-limiting requests and blocking IP addresses that exceed the limit. To do this, you create a web ACL (Access Control List) with a rate-based rule and attach it to the resources you want to protect. Here’s how to do it:

Step 1: Set Up AWS WAF

  1. Navigate to the AWS Management Console.
  2. In the AWS Management Console, search for "WAF" and select it from the search results.
  3. In the AWS WAF console, click on "Create web ACL" and configure the web ACL settings.
  4. To add rules to the web ACL, click on "Add rules and rule groups" and choose "Add my own rules and rule groups."
  5. Click "Add rule" and select "Rate-based rule." Give your rule a name and set the rate limit (e.g., 2000 requests per 5 minutes).
  6. Define the conditions for the rate-based rule. Typically, you might want to apply it to all requests, but you can also specify conditions like IP address ranges, HTTP methods, or specific URIs.
  7. Set the action to take when the rate limit is exceeded. You can choose to block the requests, allow them, or count them. Typically, you would set this to "Block" to prevent HTTP flooding.
  8. In the "Resource" section of the web ACL creation, select the resources you want to protect with this ACL. For instance, you can choose API Gateway APIs, CloudFront distributions, or Application Load Balancers.
  9. Review your settings and click "Create web ACL."

Additional Considerations

  • Monitoring: Use Amazon CloudWatch to monitor the activity and effectiveness of your WAF rules. This can help you adjust rate limits and rules as needed.
  • Custom Rules: In addition to rate-based rules, consider adding custom rules to block known bad IP addresses or patterns that indicate abusive behavior.
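
If you prefer to define the rule in code, the rate-based rule from the steps above looks roughly like this. It is a sketch of one entry of the `Rules` list accepted by the boto3 `wafv2` client's `create_web_acl` call; the helper name and the rule name are placeholders of mine:

```python
def build_rate_limit_rule(name, limit, priority=0):
    """Build a WAFv2 rate-based rule that blocks IPs exceeding `limit` requests."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {
                "Limit": limit,             # e.g. 2000, as in the console example
                "AggregateKeyType": "IP",   # count requests per source IP
            }
        },
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }


rule = build_rate_limit_rule("http-flood-limit", 2000)
```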

 

Using a killer lambda

[Image: XKCD comic, MacGyver]

The goal is to prevent AWS API Gateway and AWS Lambda from being invoked once a specific budget is reached. For this we will combine multiple AWS tools.

The idea is this:

  1. The Billing Alert will send a message on SNS (see Setting up Billing Alarms).
  2. The SNS message will trigger a killer lambda.
  3. The killer lambda will store the current parameters of each lambda in a DynamoDB table in order to restore them later.
  4. The killer lambda will change the lambda parameters to prevent their invocation (be careful not to kill the recovery lambda!)
  5. At the beginning of each billing cycle, the recovery lambda will be triggered (via EventBridge Scheduler).
  6. The recovery lambda will read DynamoDB and restore the lambda parameters.

NOTE: The killer lambda must not kill the recovery lambda. Create the recovery lambda first so that its name can be added to the killer lambda's whitelist (the BLACKLISTED_LAMBDA_FUNCTIONS environment variable in the code below).
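
For step 2, it is useful to know what the killer lambda receives. A Lambda triggered by SNS gets a `Records` list whose entries carry the notification under the `Sns` key. The helper below is a hypothetical sketch for logging or inspecting the budget alert before acting on it; it makes no assumption about the format of the message text itself:

```python
def extract_sns_messages(event):
    """Return (subject, message) pairs from an SNS-triggered Lambda event."""
    pairs = []
    for record in event.get("Records", []):
        # Ignore anything that did not come from SNS
        if record.get("EventSource") != "aws:sns":
            continue
        sns = record.get("Sns", {})
        pairs.append((sns.get("Subject"), sns.get("Message", "")))
    return pairs
```

Inside a handler you could call `extract_sns_messages(event)` first and log the result, so that every kill is traceable to the alert that caused it.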

 

General setup

Step 1: Set Up AWS Budgets and send alert to SNS

See Setting up Billing Alarms

Step 2: Create the DynamoDB database

We need to ensure that the DynamoDB table has the following attributes:
* Table Name: LambdaAndApiSettings
* Primary Key / Partition Key: ResourceID (String)

You can create the table using the AWS Management Console:
1. Go to the DynamoDB section.
2. Click on "Create table".
3. Set the table name to LambdaAndApiSettings.
4. Set the partition key to ResourceID with type String (no sort key is needed).
5. Click "Create".
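
The same table can be created from code. A minimal sketch, assuming boto3 and the DynamoDB `create_table` call, using the ResourceID partition key that the Lambda code reads and writes. On-demand billing (`PAY_PER_REQUEST`) is my own choice here, picked because the table is touched only a few times per month:

```python
def build_table_request(table_name="LambdaAndApiSettings"):
    """Build keyword arguments for the DynamoDB client's create_table call."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "ResourceID", "AttributeType": "S"}  # S = String
        ],
        "KeySchema": [
            {"AttributeName": "ResourceID", "KeyType": "HASH"}  # partition key
        ],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    }


def create_settings_table(dynamodb_client, table_name="LambdaAndApiSettings"):
    """dynamodb_client is expected to be boto3.client('dynamodb')."""
    return dynamodb_client.create_table(**build_table_request(table_name))
```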

Step 3: (after creating lambdas) Give proper permissions in IAM

Ensure that the IAM role associated with your Lambda function has the necessary permissions. You need to attach a policy to the role that allows access to DynamoDB and Lambda APIs.

Required Permissions:
* DynamoDB: AmazonDynamoDBFullAccess
* Lambda: AWSLambda_FullAccess
* API Gateway: AmazonAPIGatewayAdministrator
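
These managed policies grant far more than the two Lambdas use. If you want something narrower, the sketch below builds a policy document from the API calls that appear in the code of this article. Treat it as a starting point under assumptions: the helper name is mine, `lambda:GetFunctionConcurrency` is a guess at what saving a function's settings needs, and CloudWatch Logs permissions are not included:

```python
import json


def build_policy(region, account_id, rest_api_id, table_name="LambdaAndApiSettings"):
    """Build an IAM policy document limited to the calls the two Lambdas make."""
    stages = f"arn:aws:apigateway:{region}::/restapis/{rest_api_id}/stages"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
                "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}",
            },
            {
                "Effect": "Allow",
                "Action": [
                    "lambda:ListFunctions",
                    "lambda:GetAccountSettings",
                    "lambda:GetFunctionConcurrency",  # assumption, see above
                    "lambda:PutFunctionConcurrency",
                    "lambda:DeleteFunctionConcurrency",
                ],
                "Resource": "*",
            },
            {
                "Effect": "Allow",
                "Action": ["apigateway:GET", "apigateway:PATCH"],
                "Resource": [stages, f"{stages}/*"],
            },
        ],
    }


print(json.dumps(build_policy("eu-west-1", "123456789012", "abc123"), indent=2))
```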

 

Recovery lambda

[Image: XKCD comic, reset]

To re-enable the services, you can set up a scheduled Lambda function that runs at the beginning of each budget period (e.g., monthly). This function will reset the throttling limits on the API Gateway stages and re-enable the Lambda functions.

NOTE: When no saved settings are found in DynamoDB for a resource, this Lambda falls back to the same default parameters for every function and stage (see the environment variables below).

Step 1: Create the Lambda Function

  • In the AWS Lambda console, create a new Lambda function.
  • Set the runtime to Python
  • Use the following Python code:
import boto3
import os
import logging
import json

# Set up logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Initialize DynamoDB client
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('LambdaAndApiSettings')

def lambda_handler(event, context):
    """
    The main handler function for the recovery Lambda function. It restores API Gateway stages and specified
    Lambda functions using the settings saved in DynamoDB.

    Args:
        event (dict): The event data that triggered the Lambda function.
        context (object): The context object containing metadata about the Lambda invocation.

    Returns:
        dict: A dictionary containing the status code and message.
    """
    try:
        # Get environment variables
        rest_api_id = os.environ.get('API_GATEWAY_ID')
        restore_all_lambdas = os.environ.get('RESTORE_ALL_LAMBDAS', 'false').lower() == 'true'
        function_names = os.environ.get('LAMBDA_FUNCTION_NAMES')
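        # Split the comma-separated list, dropping blanks and surrounding whitespace
        blacklisted_functions = [name.strip() for name in os.environ.get('BLACKLISTED_LAMBDA_FUNCTIONS', '').split(',') if name.strip()]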
        current_function_name = context.function_name  # Get the name of the current Lambda function
        simulate_restore = os.environ.get('SIMULATE_RESTORE', 'false').lower() == 'true'

        # Default values
        burst_limit_default = os.environ.get('BURST_LIMIT', '1000')
        rate_limit_default = os.environ.get('RATE_LIMIT', '500')
        concurrency_default = int(os.environ.get('CONCURRENCY_DEFAULT', '10'))

        # Initialize clients
        api_client = boto3.client('apigateway')
        lambda_client = boto3.client('lambda')

        # Process API Gateway stages
        if rest_api_id:
            restore_api_gateway_stages(rest_api_id, api_client, burst_limit_default, rate_limit_default, simulate_restore)

        # Process Lambda functions
        if restore_all_lambdas:
            restore_all_lambda_functions(lambda_client, current_function_name, blacklisted_functions, concurrency_default, simulate_restore)
        elif function_names:
            restore_specified_lambda_functions(lambda_client, function_names.split(','), current_function_name, blacklisted_functions, concurrency_default, simulate_restore)
        else:
            logger.info('No Lambda functions specified and RESTORE_ALL_LAMBDAS is not true. Skipping restoring Lambda functions.')

        logger.info('API Gateway stages and specified Lambda functions have been restored successfully.')
        return {
            'statusCode': 200,
            'body': 'API Gateway and Lambda functions restoration complete'
        }
    except Exception as e:
        logger.error(f'Error occurred: {str(e)}', exc_info=True)
        return {
            'statusCode': 500,
            'body': 'Error occurred while restoring API Gateway and Lambda functions'
        }

def restore_api_gateway_stages(rest_api_id, api_client, burst_limit_default, rate_limit_default, simulate_restore):
    """
    Restore the stages of an API Gateway using the settings saved in DynamoDB.

    Args:
        rest_api_id (str): The ID of the API Gateway.
        api_client (boto3.client): The API Gateway client.
        burst_limit_default (str): The default burst limit value to use if no saved setting is found.
        rate_limit_default (str): The default rate limit value to use if no saved setting is found.
        simulate_restore (bool): Whether to simulate restoring the API Gateway stages.

    Raises:
        Exception: If an error occurs while restoring the stages.
    """
    try:
        logger.info(f'Restoring stages for API Gateway with ID: {rest_api_id}')
        stages = api_client.get_stages(restApiId=rest_api_id)
        for stage in stages['item']:
            stage_name = stage['stageName']
            logger.info(f'Restoring stage: {stage_name}')

            # Retrieve saved API Gateway stage settings from DynamoDB
            settings = get_from_dynamodb(f'api-{rest_api_id}-{stage_name}')

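            # Use defaults if no settings are found or only the 'default' placeholder was saved
            burst_limit = settings.get('burstLimit', 'default') if settings else 'default'
            rate_limit = settings.get('rateLimit', 'default') if settings else 'default'
            if burst_limit == 'default':
                burst_limit = burst_limit_default
            if rate_limit == 'default':
                rate_limit = rate_limit_default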

            if not simulate_restore:
                # Restore API Gateway stage
                api_client.update_stage(
                    restApiId=rest_api_id,
                    stageName=stage_name,
                    patchOperations=[
                        {
                            'op': 'replace',
                            'path': '/*/*/throttling/burstLimit',
                            'value': str(burst_limit)
                        },
                        {
                            'op': 'replace',
                            'path': '/*/*/throttling/rateLimit',
                            'value': str(rate_limit)
                        }
                    ]
                )
                logger.info(f'Stage {stage_name} restored with burstLimit={burst_limit} and rateLimit={rate_limit}.')
            else:
                logger.info(f'Simulation mode: API Gateway stage {stage_name} would be restored with burstLimit={burst_limit} and rateLimit={rate_limit}.')
    except api_client.exceptions.ClientError as e:
        logger.error(f'Failed to restore API Gateway stages: {e}')
        if e.response['Error']['Code'] == 'AccessDeniedException':
            logger.error('Access denied. Ensure the IAM role has the following permissions: AmazonAPIGatewayAdministrator.')
        raise

def restore_all_lambda_functions(lambda_client, current_function_name, blacklisted_functions, concurrency_default, simulate_restore):
    """
    Restore all Lambda functions except the current one and blacklisted ones using the settings saved in DynamoDB.

    Args:
        lambda_client (boto3.client): The Lambda client.
        current_function_name (str): The name of the current Lambda function.
        blacklisted_functions (list): A list of blacklisted Lambda function names.
        concurrency_default (int): The default concurrency limit to use if no saved setting is found.
        simulate_restore (bool): Whether to simulate restoring the Lambda functions.

    Raises:
        Exception: If an error occurs while restoring the Lambda functions.
    """
    logger.info('Restoring all Lambda functions in the account.')
    paginator = lambda_client.get_paginator('list_functions')
    all_function_names = []
    for page in paginator.paginate():
        for function in page['Functions']:
            all_function_names.append(function['FunctionName'])

    restore_specified_lambda_functions(lambda_client, all_function_names, current_function_name, blacklisted_functions, concurrency_default, simulate_restore)

def restore_specified_lambda_functions(lambda_client, function_names, current_function_name, blacklisted_functions, concurrency_default, simulate_restore):
    """
    Restore specified Lambda functions except the current one and blacklisted ones using the settings saved in DynamoDB.

    Args:
        lambda_client (boto3.client): The Lambda client.
        function_names (list): A list of Lambda function names.
        current_function_name (str): The name of the current Lambda function.
        blacklisted_functions (list): A list of blacklisted Lambda function names.
        concurrency_default (int): The default concurrency limit to use if no saved setting is found.
        simulate_restore (bool): Whether to simulate restoring the Lambda functions.

    Raises:
        Exception: If an error occurs while restoring the Lambda functions.
    """
    try:
        account_settings = lambda_client.get_account_settings()
        # Both values live under 'AccountLimit' in the get_account_settings response
        total_account_concurrency = account_settings['AccountLimit'].get('ConcurrentExecutions', 0)
        available_concurrency = account_settings['AccountLimit'].get('UnreservedConcurrentExecutions', 0)

        logger.info(f'Available concurrency: {available_concurrency}')

        for function_name in function_names:
            if function_name == current_function_name or function_name in blacklisted_functions:
                logger.info(f'Skipping blacklisted or current Lambda function: {function_name}')
                continue

            logger.info(f'Restoring Lambda function: {function_name}')
            try:
                # Retrieve saved Lambda settings from DynamoDB
                settings = get_from_dynamodb(function_name)

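                # Use default concurrency if no settings are found
                concurrency = settings['Concurrency'] if settings and 'Concurrency' in settings else concurrency_default
                # DynamoDB returns numbers as Decimal, but the Lambda API expects an int
                if concurrency is not None:
                    concurrency = int(concurrency)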

                # Check if setting the concurrency will violate the account limits
                if concurrency and concurrency > available_concurrency:
                    logger.error(f'Setting concurrency for {function_name} to {concurrency} would violate account limits. Skipping.')
                    continue

                if not simulate_restore:
                    # Restore concurrency
                    if concurrency is not None:
                        lambda_client.put_function_concurrency(
                            FunctionName=function_name,
                            ReservedConcurrentExecutions=concurrency
                        )
                        logger.info(f'Lambda function {function_name} restored with concurrency={concurrency}.')
                        available_concurrency -= concurrency
                    else:
                        lambda_client.delete_function_concurrency(FunctionName=function_name)
                        logger.info(f'Concurrency limit removed for Lambda function {function_name}.')
                else:
                    logger.info(f'Simulation mode: Lambda function {function_name} would be restored with concurrency={concurrency}.')

            except lambda_client.exceptions.ClientError as e:
                logger.error(f'Failed to restore Lambda function {function_name}: {e}')
                if e.response['Error']['Code'] == 'AccessDeniedException':
                    logger.error(f'Access denied. Ensure the IAM role has the following permissions: AWSLambda_FullAccess.')
                raise
    except Exception as e:
        logger.error(f'Error occurred while calculating available concurrency: {str(e)}')
        raise

def get_from_dynamodb(resource_id):
    """
    Retrieve settings from DynamoDB.

    Args:
        resource_id (str): The unique identifier for the resource.

    Returns:
        dict: The settings retrieved from DynamoDB.

    Raises:
        Exception: If an error occurs while retrieving from DynamoDB.
    """
    logger.info(f'Retrieving settings for {resource_id} from DynamoDB')
    try:
        response = table.get_item(Key={'ResourceID': resource_id})
        if 'Item' in response:
            return response['Item']['Settings']
        else:
            logger.info(f'No settings found for {resource_id} in DynamoDB')
            return None
    except dynamodb.meta.client.exceptions.ClientError as e:
        logger.error(f'Failed to retrieve settings from DynamoDB for {resource_id}: {e}')
        if e.response['Error']['Code'] == 'AccessDeniedException':
            logger.error('Access denied. Ensure the IAM role has the following permissions: AmazonDynamoDBFullAccess.')
        raise
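Environment Variables:
  • API_GATEWAY_ID: The ID of your API Gateway.
  • LAMBDA_FUNCTION_NAMES: A comma-separated list of your Lambda function names.
  • RESTORE_ALL_LAMBDAS: Set to true to restore all Lambda functions in the account, or false to restore only the specified functions.
  • BLACKLISTED_LAMBDA_FUNCTIONS: A comma-separated list of Lambda function names to skip.
  • BURST_LIMIT: The burst limit to apply to API Gateway throttling when no saved value is found (defaults to 1000).
  • RATE_LIMIT: The rate limit to apply to API Gateway throttling when no saved value is found (defaults to 500).
  • CONCURRENCY_DEFAULT: The reserved concurrency to apply to a Lambda function when no saved value is found (defaults to 10).
  • SIMULATE_RESTORE: Set to true to log what would be restored without changing anything.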
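
Reading these variables in one place makes the handler easier to test. The following is a sketch of how that could look, not part of the function above; `RestoreConfig` and `load_restore_config` are names I made up, and the defaults mirror the ones used in the code:

```python
from dataclasses import dataclass
from typing import List, Optional


def as_bool(value: str) -> bool:
    """Interpret 'true' (any case, surrounding spaces allowed) as True."""
    return value.strip().lower() == "true"


def as_name_list(value: str) -> List[str]:
    """Split a comma-separated list, dropping blanks and surrounding spaces."""
    return [name.strip() for name in value.split(",") if name.strip()]


@dataclass
class RestoreConfig:
    api_gateway_id: Optional[str]
    function_names: List[str]
    restore_all: bool
    blacklisted: List[str]
    burst_limit: int
    rate_limit: float
    concurrency_default: int
    simulate: bool


def load_restore_config(env) -> RestoreConfig:
    """Build the configuration from a mapping such as os.environ."""
    return RestoreConfig(
        api_gateway_id=env.get("API_GATEWAY_ID") or None,
        function_names=as_name_list(env.get("LAMBDA_FUNCTION_NAMES", "")),
        restore_all=as_bool(env.get("RESTORE_ALL_LAMBDAS", "false")),
        blacklisted=as_name_list(env.get("BLACKLISTED_LAMBDA_FUNCTIONS", "")),
        burst_limit=int(env.get("BURST_LIMIT", "1000")),
        rate_limit=float(env.get("RATE_LIMIT", "500")),
        concurrency_default=int(env.get("CONCURRENCY_DEFAULT", "10")),
        simulate=as_bool(env.get("SIMULATE_RESTORE", "false")),
    )
```

In the handler this would be called as `load_restore_config(os.environ)`, while a test can pass a plain dictionary.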

Step 2: Give the Lambda function full access in IAM

See General setup

Step 3: Create an EventBridge Scheduler schedule

  1. Go to the EventBridge console and navigate to "EventBridge Schedule".
  2. Click "Create Schedule".
  3. Choose a schedule name and description.
  4. Select a recurring schedule and a cron-based schedule.
  5. If your budget period is monthly, you can use a cron expression like cron(0 0 1 * ? *) to run the function at midnight on the first day of each month.
  6. Select no flexible time window and click Next.
  7. Click on Invoke an AWS Lambda and select the recovery lambda as the target.
  8. Ensure the schedule's execution role has permission to invoke the Lambda function, then click Next.
  9. Review the schedule and create it!
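
The schedule can also be created with boto3. A minimal sketch, assuming the `scheduler` client's `create_schedule` call; the helper names, the schedule name, and the ARNs you pass in are placeholders, and the role must be one that EventBridge Scheduler can assume to invoke the function:

```python
def build_schedule_request(name, lambda_arn, role_arn, timezone="UTC"):
    """Build keyword arguments for the EventBridge Scheduler create_schedule call."""
    return {
        "Name": name,
        # Midnight on the first day of every month
        "ScheduleExpression": "cron(0 0 1 * ? *)",
        "ScheduleExpressionTimezone": timezone,
        "FlexibleTimeWindow": {"Mode": "OFF"},  # run exactly at the scheduled time
        "Target": {"Arn": lambda_arn, "RoleArn": role_arn},
    }


def create_recovery_schedule(scheduler_client, name, lambda_arn, role_arn):
    """scheduler_client is expected to be boto3.client('scheduler')."""
    return scheduler_client.create_schedule(
        **build_schedule_request(name, lambda_arn, role_arn)
    )
```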

 

Killer lambda

Step 1: Create a Lambda Function

  • In the AWS Lambda console, create a new Lambda function.
  • Set the runtime to Python
  • Use the following Python code:
import boto3
import os
import logging
import json

# Set up logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Initialize DynamoDB client
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('LambdaAndApiSettings')

def lambda_handler(event, context):
    """
    The main handler function for the Lambda function. It disables API Gateway stages and specified
    Lambda functions, and saves their settings to DynamoDB.

    Args:
        event (dict): The event data that triggered the Lambda function.
        context (object): The context object containing metadata about the Lambda invocation.

    Returns:
        dict: A dictionary containing the status code and message.
    """
    try:
        # Get environment variables
        rest_api_id = os.environ.get('API_GATEWAY_ID')
        disable_all_lambdas = os.environ.get('DISABLE_ALL_LAMBDAS', 'false').lower() == 'true'
        function_names = os.environ.get('LAMBDA_FUNCTION_NAMES')
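        # Split the comma-separated list, dropping blanks and surrounding whitespace
        blacklisted_functions = [name.strip() for name in os.environ.get('BLACKLISTED_LAMBDA_FUNCTIONS', '').split(',') if name.strip()]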
        current_function_name = context.function_name  # Get the name of the current Lambda function
        simulate_kill = os.environ.get('SIMULATE_KILL', 'false').lower() == 'true'

        # Initialize clients
        api_client = boto3.client('apigateway')
        lambda_client = boto3.client('lambda')

        # Process API Gateway stages
        if rest_api_id:
            disable_api_gateway_stages(rest_api_id, api_client, simulate_kill)

        # Process Lambda functions
        if disable_all_lambdas:
            disable_all_lambda_functions(lambda_client, current_function_name, blacklisted_functions, simulate_kill)
        elif function_names:
            disable_specified_lambda_functions(lambda_client, function_names.split(','), current_function_name, blacklisted_functions, simulate_kill)
        else:
            logger.info('No Lambda functions specified and DISABLE_ALL_LAMBDAS is not true. Skipping disabling Lambda functions.')

        logger.info('API Gateway stages and specified Lambda functions have been processed successfully.')
        return {
            'statusCode': 200,
            'body': 'API Gateway and Lambda functions processing complete'
        }
    except Exception as e:
        logger.error(f'Error occurred: {str(e)}', exc_info=True)
        return {
            'statusCode': 500,
            'body': 'Error occurred while processing API Gateway and Lambda functions'
        }

def disable_api_gateway_stages(rest_api_id, api_client, simulate_kill):
    """
    Disable the stages of an API Gateway and save their current settings to DynamoDB.

    Args:
        rest_api_id (str): The ID of the API Gateway.
        api_client (boto3.client): The API Gateway client.
        simulate_kill (bool): Whether to simulate killing the API Gateway stages.

    Raises:
        Exception: If an error occurs while disabling the stages.
    """
    try:
        logger.info(f'Disabling stages for API Gateway with ID: {rest_api_id}')
        stages = api_client.get_stages(restApiId=rest_api_id)
        for stage in stages['item']:
            stage_name = stage['stageName']
            logger.info(f'Disabling stage: {stage_name}')

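            # Save current API Gateway stage settings to DynamoDB.
            # Stage-wide throttling is reported under the '*/*' method key.
            method_settings = stage.get('methodSettings', {}).get('*/*', {})
            stage_settings = {
                # Stored as strings because DynamoDB rejects Python floats
                'burstLimit': str(method_settings.get('throttlingBurstLimit', 'default')),
                'rateLimit': str(method_settings.get('throttlingRateLimit', 'default'))
            }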
            save_to_dynamodb(f'api-{rest_api_id}-{stage_name}', stage_settings)

            if not simulate_kill:
                # Disable API Gateway stage
                api_client.update_stage(
                    restApiId=rest_api_id,
                    stageName=stage_name,
                    patchOperations=[
                        {
                            'op': 'replace',
                            'path': '/*/*/throttling/burstLimit',
                            'value': '0'
                        },
                        {
                            'op': 'replace',
                            'path': '/*/*/throttling/rateLimit',
                            'value': '0'
                        }
                    ]
                )
                logger.info(f'Stage {stage_name} disabled.')
            else:
                logger.info(f'Simulation mode: API Gateway stage {stage_name} would be disabled.')
    except api_client.exceptions.ClientError as e:
        logger.error(f'Failed to disable API Gateway stages: {e}')
        if e.response['Error']['Code'] == 'AccessDeniedException':
            logger.error('Access denied. Ensure the IAM role has the following permissions: AmazonAPIGatewayAdministrator.')
        raise

def disable_all_lambda_functions(lambda_client, current_function_name, blacklisted_functions, simulate_kill):
    """
    Disable all Lambda functions except the current one and blacklisted ones, and save their settings to DynamoDB.

    Args:
        lambda_client (boto3.client): The Lambda client.
        current_function_name (str): The name of the current Lambda function.
        blacklisted_functions (list): A list of blacklisted Lambda function names.
        simulate_kill (bool): Whether to simulate killing the Lambda functions.

    Raises:
        Exception: If an error occurs while disabling the Lambda functions.
    """
    logger.info('Disabling all Lambda functions in the account.')
    paginator = lambda_client.get_paginator('list_functions')
    all_function_names = []
    for page in paginator.paginate():
        for function in page['Functions']:
            all_function_names.append(function['FunctionName'])

    disable_specified_lambda_functions(lambda_client, all_function_names, current_function_name, blacklisted_functions, simulate_kill)

def disable_specified_lambda_functions(lambda_client, function_names, current_function_name, blacklisted_functions, simulate_kill):
    """
    Disable specified Lambda functions except the current one and blacklisted ones, and save their settings to DynamoDB.

    Args:
        lambda_client (boto3.client): The Lambda client.
        function_names (list): A list of Lambda function names.
        current_function_name (str): The name of the current Lambda function.
        blacklisted_functions (list): A list of blacklisted Lambda function names.
        simulate_kill (bool): Whether to simulate killing the Lambda functions.

    Raises:
        Exception: If an error occurs while disabling the Lambda functions.
    """
    for function_name in function_names:
        if function_name == current_function_name or function_name in blacklisted_functions:
            logger.info(f'Skipping blacklisted or current Lambda function: {function_name}')
            continue

        try:
            # Save current Lambda settings to DynamoDB
            logger.info(f'Saving settings for Lambda {function_name}')
            save_lambda_settings(lambda_client, function_name)
            verify_dynamodb_data(function_name)  # Verify data

            if not simulate_kill:
                logger.info(f'Disabling Lambda function: {function_name}')
                # Disable concurrency
                lambda_client.put_function_concurrency(
                    FunctionName=function_name,
                    ReservedConcurrentExecutions=0
                )
                logger.info(f'Lambda function {function_name} disabled.')
            else:
                logger.info(f'Simulation mode: Lambda function {function_name} would be disabled.')

        except lambda_client.exceptions.ClientError as e:
            logger.error(f'Failed to disable Lambda function {function_name}: {e}')
            if e.response['Error']['Code'] == 'AccessDeniedException':
                logger.error(f'Access denied. Ensure the IAM role has the following permissions: lambda:PutFunctionConcurrency.')
            raise

def save_to_dynamodb(resource_id, settings):
    """
    Save settings to DynamoDB.

    Args:
        resource_id (str): The unique identifier for the resource.
        settings (dict): The settings to save.

    Raises:
        Exception: If an error occurs while saving to DynamoDB.
    """
    logger.info(f'Saving settings for {resource_id} to DynamoDB')
    try:
        table.put_item(
            Item={
                'ResourceID': resource_id,  # Corrected key name
                'Settings': settings
            }
        )
        logger.info(f'Saved settings for {resource_id} to DynamoDB')
    except dynamodb.meta.client.exceptions.ClientError as e:
        logger.error(f'Failed to save settings to DynamoDB for {resource_id}: {e}')
        if e.response['Error']['Code'] == 'AccessDeniedException':
            logger.error('Access denied. Ensure the IAM role has the following permissions: AmazonDynamoDBFullAccess.')
        raise

def verify_dynamodb_data(resource_id):
    """
    Verify that the settings were correctly saved in DynamoDB.

    Args:
        resource_id (str): The unique identifier for the resource.
    """
    try:
        response = table.get_item(Key={'ResourceID': resource_id})
        if 'Item' in response:
            logger.info(f'Verified settings for {resource_id}: {response["Item"]}')
        else:
            logger.error(f'No settings found for {resource_id} in DynamoDB')
    except dynamodb.meta.client.exceptions.ClientError as e:
        logger.error(f'Failed to retrieve settings from DynamoDB for {resource_id}: {e}')
        if e.response['Error']['Code'] == 'AccessDeniedException':
            logger.error('Access denied. Ensure the IAM role has the following permissions: AmazonDynamoDBFullAccess.')
        raise

def save_lambda_settings(lambda_client, function_name):
    """
    Save the current settings of a Lambda function to DynamoDB.

    Args:
        lambda_client (boto3.client): The Lambda client.
        function_name (str): The name of the Lambda function.

    Raises:
        Exception: If an error occurs while retrieving or saving the settings.
    """
    logger.info(f'Saving settings for Lambda {function_name}')
    try:
        # Get current concurrency setting
        concurrency_response = lambda_client.get_function_concurrency(FunctionName=function_name)
        concurrency = concurrency_response.get('ReservedConcurrentExecutions', None)

        # Get current permissions. A function without a resource-based policy
        # raises ResourceNotFoundException here, which must not block the save.
        try:
            permissions_response = lambda_client.get_policy(FunctionName=function_name)
            permissions = json.loads(permissions_response['Policy'])['Statement']
        except lambda_client.exceptions.ResourceNotFoundException:
            permissions = []

        # Save settings to DynamoDB
        save_to_dynamodb(function_name, {'Concurrency': concurrency, 'Permissions': permissions})
    except lambda_client.exceptions.ResourceNotFoundException:
        logger.info(f'Lambda function not found, nothing to save: {function_name}')
    except lambda_client.exceptions.ClientError as e:
        logger.error(f'Failed to retrieve settings for Lambda function {function_name}: {e}')
        if e.response['Error']['Code'] == 'AccessDeniedException':
            logger.error('Access denied. Ensure the IAM role has the following permission: AWSLambda_FullAccess.')
        raise
Environment Variables:
  • API_GATEWAY_ID: The ID of your API Gateway.
  • LAMBDA_FUNCTION_NAMES: A comma-separated list of your Lambda function names.
  • BLACKLISTED_LAMBDA_FUNCTIONS: A comma-separated list of Lambda function names to not kill.
  • DISABLE_ALL_LAMBDAS: Set to true to disable all Lambda functions in the account (apart from the blacklisted ones), or false to disable only the specified functions.
  • SIMULATE_KILL: Set to true to simulate killing the Lambdas (their settings will still be saved in DynamoDB).
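
To make the expected formats concrete, here is a minimal sketch of how these variables could be parsed. The helper names (`parse_list`, `parse_bool`, `load_config`) and the defaults for missing variables are assumptions for illustration; the handler's own parsing may differ.

```python
import os


def parse_list(value):
    """Split a comma-separated string into a list, dropping blanks and spaces."""
    return [item.strip() for item in value.split(',') if item.strip()]


def parse_bool(value):
    """Treat 'true' (in any letter case) as True, anything else as False."""
    return value.strip().lower() == 'true'


def load_config(env=os.environ):
    """Read the kill-switch settings from a mapping (os.environ by default).

    Assumption: missing list variables mean "empty list" and missing
    flags mean "false".
    """
    return {
        'api_gateway_id': env.get('API_GATEWAY_ID', ''),
        'function_names': parse_list(env.get('LAMBDA_FUNCTION_NAMES', '')),
        'blacklisted_functions': parse_list(env.get('BLACKLISTED_LAMBDA_FUNCTIONS', '')),
        'disable_all_lambdas': parse_bool(env.get('DISABLE_ALL_LAMBDAS', 'false')),
        'simulate_kill': parse_bool(env.get('SIMULATE_KILL', 'false')),
    }
```

Passing a plain dict instead of `os.environ` makes it easy to check a configuration locally before deploying it.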

Step 2: Give the Lambda function full access in IAM

See General setup

Step 3: Blacklist the restore Lambda

Don't forget to add the name of the restore Lambda to BLACKLISTED_LAMBDA_FUNCTIONS, otherwise it will be killed as well.

Step 4: Set Up SNS to Trigger the Lambda Function

  1. Find the SNS topic you created in the Billing Alarm stage, in the SNS console.

  2. Subscribe the Lambda Function to the SNS Topic:

    • Add a subscription to the SNS topic.
    • Choose the protocol as "AWS Lambda" and select your Lambda function.
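
If you prefer to script this instead of clicking through the console, the subscription could be sketched with boto3 as below. The function name `subscribe_lambda_to_topic` and the statement ID are hypothetical. The console normally adds the invoke permission for you, whereas a script has to add it explicitly.

```python
def subscribe_lambda_to_topic(sns_client, lambda_client, topic_arn, function_name):
    """Subscribe a Lambda function to an SNS topic and let SNS invoke it.

    sns_client and lambda_client are expected to be boto3 clients,
    e.g. boto3.client('sns') and boto3.client('lambda').
    """
    # Look up the function ARN, which SNS needs as the subscription endpoint
    function = lambda_client.get_function(FunctionName=function_name)
    function_arn = function['Configuration']['FunctionArn']

    # Allow this specific topic to invoke the function
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId='AllowBudgetTopicInvoke',  # hypothetical statement ID
        Action='lambda:InvokeFunction',
        Principal='sns.amazonaws.com',
        SourceArn=topic_arn,
    )

    # Create the subscription with the "lambda" protocol
    response = sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol='lambda',
        Endpoint=function_arn,
        ReturnSubscriptionArn=True,
    )
    return response['SubscriptionArn']


# Example call (requires AWS credentials; the topic ARN and function name
# are your own values):
#   import boto3
#   subscribe_lambda_to_topic(boto3.client('sns'), boto3.client('lambda'),
#                             topic_arn, function_name)
```

Note that calling `add_permission` twice with the same statement ID fails, so this sketch is meant to be run once per function.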

 

Closing Thoughts

Here you go! With these steps in place, your AWS API Gateway and Lambda functions are disabled as soon as you reach a specific budget threshold, which limits any cost overrun, and they are restored automatically at the beginning of the next period. In other words, you keep costs in check by killing your product once you have spent too much on it.

Managing AWS costs can be challenging, but with the right setup, you can keep expenses in check (sort of). Start with billing alarms and budgets to get notified when you're approaching limits. For scalable services like Lambda, use concurrency limits and API Gateway rate limits to control usage. AWS WAF can protect against HTTP flooding, preventing unexpected spikes.

Implementing these steps might take some effort initially, but it will save you from manual intervention and unexpected costs in the long run. With these strategies, you can focus on development, knowing your AWS costs are under control.

Of course, cost explosions should be addressed at the development stage, but at least you can sleep in peace until then...
