<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vibhor Agarwal</title>
    <description>The latest articles on DEV Community by Vibhor Agarwal (@vibhor_agarwal).</description>
    <link>https://dev.to/vibhor_agarwal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1942519%2F6e274193-5fe1-4bc2-bdd9-2e925b9dac37.png</url>
      <title>DEV Community: Vibhor Agarwal</title>
      <link>https://dev.to/vibhor_agarwal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vibhor_agarwal"/>
    <language>en</language>
    <item>
      <title>Custom API Analytics with AWS Serverless</title>
      <dc:creator>Vibhor Agarwal</dc:creator>
      <pubDate>Wed, 09 Oct 2024 12:50:05 +0000</pubDate>
      <link>https://dev.to/vibhor_agarwal/custom-api-analytics-with-aws-serverless-3f54</link>
      <guid>https://dev.to/vibhor_agarwal/custom-api-analytics-with-aws-serverless-3f54</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;This document describes the high-level workflow for capturing request data at various stages of a Lambda-backed API, so that it can be analysed later.&lt;/p&gt;

&lt;h2&gt;
  
  
  Objective
&lt;/h2&gt;

&lt;p&gt;Collect API trace data daily into an S3 bucket, in a custom format with the fields a developer might need for analysis.&lt;br&gt;
Data in S3 must be partitioned by year/month/day so that Athena queries can be run on it later (a sample query is sketched below).&lt;br&gt;
The data in S3 should also be downloadable via an API call.&lt;/p&gt;
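&lt;p&gt;For illustration, once the partitioned files are in S3 and an external table has been defined over them, a daily roll-up can be queried with Athena. The sketch below is only indicative; the database name, table name and results location are assumptions, not part of this setup.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import boto3

# Sketch only: 'analytics_db', 'api_trace' and the results location are hypothetical names.
athena = boto3.client('athena')

query = (
    "SELECT year, month, day, count(*) AS calls "
    "FROM api_trace "
    "WHERE year = '2024' AND month = '10' "
    "GROUP BY year, month, day"
)

response = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={'Database': 'analytics_db'},
    ResultConfiguration={'OutputLocation': 's3://my-bucket/athena-results/'}
)
print(response['QueryExecutionId'])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;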

&lt;ol&gt;
&lt;li&gt;A framework utility (indicative code below) can be used to collect data in a dictionary in your main lambda's memory (the utility itself is not described here). For each API call, the code to capture data in memory may look like this:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lambda_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;any&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# first thing in lambda handler
&lt;/span&gt;    &lt;span class="n"&gt;analytics_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MyInMemoryData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_if_exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AnalyticsData&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;analytics_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;MyInMemoryData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AnalyticsData&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;

    &lt;span class="c1"&gt;# re-initialize this.
&lt;/span&gt;    &lt;span class="n"&gt;MyInMemoryData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AnalyticsData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;time&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
                                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;request_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aws_request_id&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dummy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                                    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="2"&gt;
&lt;li&gt;Capture key outcomes of the API at various stages of the workflow in a dictionary (which lives in Python memory):
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="c1"&gt;# populate data for your in-memory dictionary at various stages of your code flow
&lt;/span&gt;    &lt;span class="n"&gt;MyInMemoryData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AnalyticsData&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json_request_body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;deepcopy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="n"&gt;MyInMemoryData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AnalyticsData&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;col1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;MyInMemoryData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AnalyticsData&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;col2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;some&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Infrastructure as Code for this feature
&lt;/h2&gt;

&lt;p&gt;The snippets below are for the 'serverless' framework with CloudFormation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.serverless.com/framework/docs/providers/aws/guide/serverless.yml" rel="noopener noreferrer"&gt;Serverless Infra as Code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 'analytics' lambda is deployed with two triggers:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;messages arriving on an SQS queue&lt;/li&gt;
&lt;li&gt;an EventBridge rule (CRON) that runs daily&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The 'analytics-download' lambda offloads preparation of the downloadable analytics data on S3&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;messages arriving on a different queue, to which a message is sent when a download API request is received
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;analytics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lambda_analytics.lambda_handler&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Lambda function to read request &amp;amp; response and log into timestream database, also to upload to s3 via CRON schedule&lt;/span&gt;
    &lt;span class="na"&gt;memorySize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;128&lt;/span&gt;
    &lt;span class="na"&gt;module&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;analytics&lt;/span&gt;
    &lt;span class="na"&gt;events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;sqs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;arn:aws:sqs:${self:provider.region}:${aws:accountId}:${self:provider.environment.SQS_ANALYTICS_NAME}&lt;/span&gt; &lt;span class="c1"&gt;# create entry in timestream&lt;/span&gt;
       &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cron(0 0 * * ? *)&lt;/span&gt; &lt;span class="c1"&gt;# run daily, as the date changes, and same lambda then queries time stream and uploads data to S3&lt;/span&gt;
   &lt;span class="na"&gt;analytics-download&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lambda_download_analytics.lambda_handler&lt;/span&gt; &lt;span class="c1"&gt;# on a SQS trigger (SQS sent when API all is received), zips and uploads to S3&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Lambda function to process download analytics data&lt;/span&gt;
    &lt;span class="na"&gt;memorySize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;128&lt;/span&gt;
    &lt;span class="na"&gt;module&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;analytics&lt;/span&gt;
    &lt;span class="na"&gt;events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;sqs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;arn:aws:sqs:${self:provider.region}:${aws:accountId}:${self:provider.environment.SQS_DOWNLOAD_ANALYTICS_NAME}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;We also need some other cloud resources:&lt;/p&gt;

&lt;p&gt;1. AnalyticsSQS: FIFO queue that receives the API data; the 'analytics' lambda above consumes it and persists the data into the Timestream database&lt;/p&gt;

&lt;p&gt;2. AnalyticsTimeStreamDB: Timestream database&lt;/p&gt;

&lt;p&gt;3. AnalyticsTimeStreamTable: Table under the database; note that we don't write to the magnetic store, only to the memory store. The memory store only accepts records with timestamps less than 1 hour old, which is fine because we write almost instantly. Data then moves to the magnetic store (cheaper), where it is retained for 7 days only&lt;/p&gt;

&lt;p&gt;4. AnalyticsAsyncLambdaSNSDestination, AnalyticsAsyncLambdaSNSEmailSubscription1: we also need an SNS topic and an email subscription to that topic, to alert developers via email in case this functionality breaks (since this feature is not exposed via the API, users would not report failures)&lt;/p&gt;

&lt;p&gt;5. AnalyticsS3Bucket: S3 bucket to finally persist the Timestream data. Data stays in the STANDARD tier for 2 months so that it can be downloaded during that period, after which&lt;br&gt;
   it is moved to the INFREQUENT ACCESS tier, where it stays for another 30 days, before being archived in GLACIER. The data expires 1 year after its creation.&lt;br&gt;
   There is another lifecycle rule for files under the tmp/ prefix in this bucket, which deletes them after just 1 day. This supports admin requests to download data&lt;br&gt;
   over a date range: the results are zipped and uploaded under the tmp/ prefix, and served to users via a pre-signed URL (a sketch of generating one follows the resources snippet below). Since this URL is&lt;br&gt;
   active only for a few minutes, we do not need long retention for tmp/ files; the contents of this prefix are cleaned up automatically after 1 day by the S3 lifecycle rule.&lt;/p&gt;

&lt;p&gt;6. AnalyticsDataDownloadSQS: Queue to offload download-data requests. The main lambda that receives API requests can offload the work to this queue, which is then processed by the 'analytics-download' lambda.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;
&lt;span class="na"&gt;Resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;AnalyticsSQS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::SQS::Queue&lt;/span&gt;
      &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;QueueName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${self:provider.environment.SQS_ANALYTICS_NAME}&lt;/span&gt;
        &lt;span class="na"&gt;FifoQueue&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;ReceiveMessageWaitTimeSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt; &lt;span class="c1"&gt;# long polling&lt;/span&gt;
        &lt;span class="na"&gt;MessageRetentionPeriod&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;600&lt;/span&gt; &lt;span class="c1"&gt;# 10 mins&lt;/span&gt;
        &lt;span class="na"&gt;VisibilityTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt; &lt;span class="c1"&gt;# for few seconds, not visible to other consumers once received by a consumer&lt;/span&gt;
    &lt;span class="na"&gt;AnalyticsTimeStreamDB&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Timestream::Database&lt;/span&gt;
      &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;DatabaseName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${self:provider.environment.TIME_STREAM_DB_NAME}&lt;/span&gt;
    &lt;span class="na"&gt;AnalyticsTimeStreamTable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Timestream::Table&lt;/span&gt;
      &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;DatabaseName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${self:provider.environment.TIME_STREAM_DB_NAME}&lt;/span&gt;
        &lt;span class="na"&gt;TableName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${self:provider.environment.TIME_STREAM_TABLE_NAME}&lt;/span&gt;
        &lt;span class="na"&gt;MagneticStoreWriteProperties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;EnableMagneticStoreWrites&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
        &lt;span class="na"&gt;RetentionProperties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;MagneticStoreRetentionPeriodInDays&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;7&lt;/span&gt; &lt;span class="c1"&gt;# move to magnetic store after 1 hr&lt;/span&gt;
          &lt;span class="na"&gt;MemoryStoreRetentionPeriodInHours&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;  &lt;span class="c1"&gt;# retain only minimal period here&lt;/span&gt;
    &lt;span class="na"&gt;AnalyticsAsyncLambdaSNSDestination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::SNS::Topic&lt;/span&gt;
      &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;DisplayName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Analytics&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Lambda&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Failure&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Notiffications'&lt;/span&gt;
        &lt;span class="na"&gt;TopicName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;SNS_TOPIC_NAME&lt;/span&gt;  &lt;span class="c1"&gt;# implement your email notif on failures&lt;/span&gt;
    &lt;span class="na"&gt;AnalyticsAsyncLambdaSNSEmailSubscription1&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::SNS::Subscription&lt;/span&gt;
      &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;Endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${self:provider.environment.ADMIN_USERS}&lt;/span&gt;
        &lt;span class="na"&gt;Protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email"&lt;/span&gt;
        &lt;span class="na"&gt;TopicArn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ref"&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AnalyticsAsyncLambdaSNSDestination"&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
    &lt;span class="na"&gt;AnalyticsS3Bucket&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::S3::Bucket&lt;/span&gt;
      &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;BucketName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${self:provider.environment.ANALYTICS_BUCKET}&lt;/span&gt;
        &lt;span class="na"&gt;AccessControl&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Private&lt;/span&gt;
        &lt;span class="na"&gt;AccelerateConfiguration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;AccelerationStatus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Enabled&lt;/span&gt;
        &lt;span class="na"&gt;LifecycleConfiguration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;Rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TmpFilesDeleteRule&lt;/span&gt; &lt;span class="c1"&gt;# user requests keep tmp zip here, and serve them via pre-signed URLs, they are no longer needed when user downloads&lt;/span&gt;
              &lt;span class="na"&gt;Prefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tmp/&lt;/span&gt;
              &lt;span class="na"&gt;Status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Enabled&lt;/span&gt;
              &lt;span class="na"&gt;ExpirationInDays&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="c1"&gt;# run deletion midnight UTC following creation&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ToIAToGlacierRule&lt;/span&gt; &lt;span class="c1"&gt;# analytics requests collected need to remain in standard tier for several days for users to download historic usage stats&lt;/span&gt;
              &lt;span class="na"&gt;Status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Enabled&lt;/span&gt;
              &lt;span class="na"&gt;ExpirationInDays&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;365&lt;/span&gt;
              &lt;span class="na"&gt;Transitions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;TransitionInDays&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;  &lt;span class="c1"&gt;# allow fetch of upton 2 months of data in standard and then push to IA. Amazon S3 does not transition objects smaller than 128 KB to the Standard-IA&lt;/span&gt;
                  &lt;span class="na"&gt;StorageClass&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;STANDARD_IA&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;TransitionInDays&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;90&lt;/span&gt;  &lt;span class="c1"&gt;# remains in IA for 30 days, then pushed to archive&lt;/span&gt;
                  &lt;span class="na"&gt;StorageClass&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GLACIER&lt;/span&gt;
    &lt;span class="na"&gt;AnalyticsDataDownloadSQS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::SQS::Queue&lt;/span&gt;
      &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;QueueName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${self:provider.environment.SQS_DOWNLOAD_ANALYTICS_NAME}&lt;/span&gt;
        &lt;span class="na"&gt;ReceiveMessageWaitTimeSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt; &lt;span class="c1"&gt;# long polling&lt;/span&gt;
        &lt;span class="na"&gt;MessageRetentionPeriod&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;300&lt;/span&gt; &lt;span class="c1"&gt;# 5 mins&lt;/span&gt;
        &lt;span class="na"&gt;VisibilityTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;31&lt;/span&gt; &lt;span class="c1"&gt;# for few seconds, not visible to other consumers once received by a consumer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
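&lt;p&gt;For the tmp/ objects mentioned under the S3 bucket above, the download API hands back a time-limited pre-signed URL rather than the file itself. A minimal sketch of generating one, assuming the zip has already been uploaded (bucket and key below are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import boto3

s3 = boto3.client('s3')

# Sketch only: bucket and key are placeholders for the zip created by 'analytics-download'
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'tmp/analytics-export.zip'},
    ExpiresIn=300  # active only for a few minutes, matching the short tmp/ retention
)
print(url)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;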



&lt;p&gt;Refer to the architecture diagram below for the numbered flows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvjp5xcdmt95wzyn1z4kb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvjp5xcdmt95wzyn1z4kb.png" alt="Analytics Architecture" width="800" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  FLOW 1: Analytics - collecting API trace in time stream DB
&lt;/h3&gt;

&lt;p&gt;1. Before sending the final API response to the user, the collected analytics data (the in-memory dictionary) is sent to a FIFO queue (why FIFO? to guarantee that whatever is sent first is processed first, though ordering is not critical here).&lt;/p&gt;
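&lt;p&gt;Sending the collected data to the FIFO queue could look like the sketch below. The queue URL environment variable and the message group id are assumptions; note that FIFO queues require a MessageGroupId, plus either content-based deduplication on the queue or an explicit MessageDeduplicationId.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import os

import boto3

sqs = boto3.client('sqs')

# Sketch only: SQS_ANALYTICS_URL is an assumed env var holding the FIFO queue URL
sqs.send_message(
    QueueUrl=os.environ['SQS_ANALYTICS_URL'],
    MessageBody=json.dumps(MyInMemoryData.AnalyticsData),
    MessageGroupId='api-analytics',  # required for FIFO queues
    MessageDeduplicationId=MyInMemoryData.AnalyticsData['request_id']
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;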

&lt;p&gt;2. A message on the queue triggers the lambda function (see lambda_analytics.lambda_handler). For this SQS-based trigger, create a Timestream record.&lt;br&gt;
Create a single entry in the AWS Timestream database (the various attributes in the dict become the various measures) - this is a single multi-measure record. In case of any errors, an email notification with the exception trace can be sent to the admins.&lt;br&gt;
Note that if the lambda throws an error, the message is retained in SQS and retried; you may want to handle the&lt;br&gt;
exception and be notified, rather than keep unprocessed messages in the queue.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/blogs/database/store-and-analyze-time-series-data-with-multi-measure-records-magnetic-storage-writes-and-scheduled-queries-in-amazon-timestream/" rel="noopener noreferrer"&gt;How to create multi measure record&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Implement your own logic to create a single record with its measures. Key points:&lt;/p&gt;

&lt;p&gt;a. Map the right data type to each measure; for example, you can use the measure name to indicate the type of data - string/numeric. 'json' values can be converted to a string and persisted in Timestream.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;prepare_measure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;measure_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;prepare measure with attrs name, value and type
        Convert all JSON to VARCHAR and numeric type as BIGINT
        Args:
            payload: user data as dict
            measure_name: name of measure

        Returns:
            dict of measure
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;VARCHAR&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;measure_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;measure_name&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;measure_name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="c1"&gt;# indicates not relevant for when this measure has no collected data in api
&lt;/span&gt;            &lt;span class="n"&gt;measure_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;measure_name&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;measure_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;BIGINT&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# measures such as request_id, user_id (varchar)
&lt;/span&gt;            &lt;span class="c1"&gt;# always expected in payload
&lt;/span&gt;            &lt;span class="n"&gt;measure_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;measure_name&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;measure_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;VARCHAR&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;measure_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;measure_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;measure_type&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;b. Add measures to the row itself for year, month and day, so that you can query on them later.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;prepare_partition_measures&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;prepare measure to use as partition

        Args:
            timestamp: timestamp as string for this time stream entry

        Returns:
            list of dicts, each as a time stream measure containing Name, Value &amp;amp; Type
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;day&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;split_date_time&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# implement split
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;VARCHAR&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;month&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;VARCHAR&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;day&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;day&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;VARCHAR&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;}]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;c. Prepare the common attributes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;prepare_common_attributes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;prepare common attributes for timestream such as dimensions, measure
       name and type as MULTI
       Args:
           payload: dict of user data

       Returns:
           dict
       &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
           &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Dimensions&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
               &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;api&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;request_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
           &lt;span class="p"&gt;],&lt;/span&gt;
           &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MeasureName&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;input_output&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MeasureValueType&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MULTI&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
       &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;d. Capture the exception when the write fails.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Write records to Timestream
&lt;/span&gt;        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_records&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;DatabaseName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;TableName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;CommonAttributes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;prepare_common_attributes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;Records&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# pylint: disable=broad-except
&lt;/span&gt;    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exceptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RejectedRecordsException&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RejectedRecords: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RejectedRecords&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Rejected Index: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;RecordIndex&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; : &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Reason&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
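&lt;p&gt;The email notification mentioned above relies on the SNS topic (AnalyticsAsyncLambdaSNSDestination) and its email subscription. How the lambda is wired to the topic is not shown in the snippets; one straightforward option is an explicit publish before re-raising. A minimal sketch, where the topic ARN environment variable name is an assumption:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

import boto3

def notify_admins(subject: str, message: str) -&amp;gt; None:
    """Publish a failure notification to the analytics SNS topic (sketch).

    ANALYTICS_SNS_TOPIC_ARN is an assumed env var holding the topic ARN;
    the email subscription on the topic delivers the message to the admins.
    """
    sns = boto3.client('sns')
    sns.publish(
        TopicArn=os.environ['ANALYTICS_SNS_TOPIC_ARN'],
        Subject=subject[:100],  # SNS subjects are limited to 100 characters
        Message=message
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;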



&lt;p&gt;3. Data is retained in the Timestream memory store only for a minimal period (1 hour), after which it moves to the 'magnetic' store and is kept there for a week (the magnetic store can still be queried). Data is written only to the memory store, never to the magnetic store: 'EnableMagneticStoreWrites' is set to 'false'. We don't need magnetic-store writes because the record is written almost as soon as the API request comes in (API processing takes only a few seconds), so the record's timestamp is stale by only a few seconds.&lt;/p&gt;

&lt;p&gt;From AWS: "The memory store is optimized for high throughput data writes and fast point-in-time queries.&lt;br&gt;
The magnetic store is optimized for lower throughput late-arriving data writes, long term data storage, and fast analytical queries."&lt;/p&gt;
&lt;h3&gt;
  
  
  FLOW 2: Analytics - uploading time stream data to S3
&lt;/h3&gt;

&lt;p&gt;1. The lambda "lambda_analytics.lambda_handler" is also triggered daily at 00:00 GMT by an EventBridge rule (CRON job). Since this trigger is a scheduled event rather than an SQS message, the handler branches on the event source and invokes the 'load_timestream_data_to_s3' method (we use the same lambda both to create the Timestream entry and to upload data to S3, to keep the logic together and for traceability).&lt;br&gt;
&lt;/p&gt;
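&lt;p&gt;The branching itself might look like the sketch below; the event fields follow the standard SQS and EventBridge scheduled-event shapes, and write_timestream_record is a hypothetical helper standing in for the Timestream write shown earlier.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def lambda_handler(event: dict, context) -&amp;gt; dict:
    # SQS triggers deliver a 'Records' list with eventSource 'aws:sqs';
    # the daily EventBridge (CRON) trigger arrives as a scheduled event instead.
    if 'Records' in event and event['Records'][0].get('eventSource') == 'aws:sqs':
        for record in event['Records']:
            write_timestream_record(record['body'])  # hypothetical helper: persist one multi-measure record
    elif event.get('source') == 'aws.events':
        load_timestream_data_to_s3()  # daily upload of yesterday's data to S3
    return {'statusCode': 200}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;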

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_timestream_data_to_s3&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;query time stream DB
     Returns:
         count of rows processed
     &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# Initialize the Timestream write client
&lt;/span&gt;    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;timestream-query&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Execute the query
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;QueryString&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;get_query&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="c1"&gt;# Extract the column names and rows
&lt;/span&gt;    &lt;span class="n"&gt;columns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;column&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;column&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ColumnInfo&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;

    &lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Rows&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nothing found in query to upload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; found for upload..&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;buffer_map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_key&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;get key to use
        Include prefix, and partition if any
        Returns:
            get the S3 bucket key to use
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="c1"&gt;# day, month, year are at fixed positions at end
&lt;/span&gt;        &lt;span class="n"&gt;day&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;month&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="c1"&gt;# prepare the partition
&lt;/span&gt;        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my_analytics/year=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/month=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;month&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/day=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;day&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;

    &lt;span class="c1"&gt;# assuming 128 MB would be sufficient to hold data in mem before dumping to s3
&lt;/span&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_buffer_write&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="c1"&gt;# Prepare the CSV data
&lt;/span&gt;        &lt;span class="n"&gt;csv_buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;StringIO&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;csv_writer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;csv_buffer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Write the header
&lt;/span&gt;        &lt;span class="n"&gt;csv_writer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writerow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;buffer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;csv_buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;writer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;csv_writer&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Write the rows
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;row_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ScalarValue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;NA&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
        &lt;span class="n"&gt;obj_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_key&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;obj_key&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;buffer_map&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;buffer_map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;obj_key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_buffer_write&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;buffer_map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;obj_key&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;writer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;writerow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;do_upload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffer_map&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_query&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;create and return query to use

    Returns:
        string query
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# query exactly entire yesterday's data
&lt;/span&gt;
    &lt;span class="c1"&gt;# Define the database and table names
&lt;/span&gt;    &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;TIME_STREAM_DB_NAME&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;TIME_STREAM_TABLE_NAME&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Define the Timestream query, 1 day ago's data
&lt;/span&gt;    &lt;span class="n"&gt;ago_range&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1d&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="c1"&gt;# these cols are expected based on how we inserted the data
&lt;/span&gt;    &lt;span class="n"&gt;query_string&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT col1, col2, coln, year, month, day FROM &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;{db}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;{table}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; WHERE time &amp;gt;= date_trunc(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;day&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, ago(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ago_range&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;))&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; AND time &amp;lt; date_trunc(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;day&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, now()) ORDER BY time ASC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;query_string&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;do_upload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffer_map&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Upload data to s3

    Args:
        buffer_map: with key as object key and value as dict of buffer and writer objects

    Returns:
        nothing
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Define the S3 bucket and object key
&lt;/span&gt;    &lt;span class="n"&gt;bucket_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-bucket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;obj_key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;buffer_map&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;s3_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Upload the CSV data to S3
&lt;/span&gt;        &lt;span class="nb"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nb"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;buffer_map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;obj_key&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;buffer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;  uploading to bucket &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; with key &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;s3_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;s3_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getvalue&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
             &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;2. The code forms a Timestream query for the previous day's data, prepares data buffers in memory, and uses the day/month/year fields in each record to build the partition key under which a file with a unique name is finally uploaded (a sketch of this step follows after this list).&lt;/p&gt;

&lt;p&gt;3. In case of any errors, an email notification with the exception trace can be sent to the administrators; implement this as suits your setup.&lt;/p&gt;

&lt;p&gt;4. This partitioned data lake can now be used to run analytical queries with Athena (create the table and load partitions when needed), or the data can be downloaded for analysis via the console or via the APIs described below.&lt;/p&gt;

&lt;p&gt;5. Note that the timestamp captured here can be used to follow the complete trace in CloudWatch Logs, because it marks the moment the request reached the lambda function. This should be the first thing your lambda does: instantiate the analytics record with the current timestamp.&lt;/p&gt;
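
&lt;p&gt;To make step 2 above concrete, here is a minimal sketch of how the Timestream results could be paged through and grouped into the buffer_map that do_upload expects. The build_buffer_map helper, the column handling and the exact partition prefix format are illustrative assumptions, not the production code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import csv
import io

import boto3


def build_buffer_map(query_string: str) -&amp;gt; dict:
    """Run the Timestream query and group rows into CSV buffers keyed by partition prefix."""
    ts_query = boto3.client('timestream-query')
    paginator = ts_query.get_paginator('query')

    buffer_map = {}
    for page in paginator.paginate(QueryString=query_string):
        columns = [col['Name'] for col in page['ColumnInfo']]
        for row in page['Rows']:
            values = [cell.get('ScalarValue', '') for cell in row['Data']]
            record = dict(zip(columns, values))

            # Partition prefix assumed to mirror the layout used by the download flow
            obj_key = (f"my_analytics/year={record['year']}"
                       f"/month={record['month']}/day={record['day']}")

            if obj_key not in buffer_map:
                buffer = io.StringIO()
                writer = csv.writer(buffer)
                writer.writerow(columns)  # header row
                buffer_map[obj_key] = {"buffer": buffer, "writer": writer}

            buffer_map[obj_key]["writer"].writerow(values)

    return buffer_map
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The scheduled lambda can then call do_upload(build_buffer_map(query_string)) to produce one uniquely named CSV per partition.&lt;/p&gt;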

&lt;h3&gt;
  
  
  FLOW 3: Analytics - download files for analytics
&lt;/h3&gt;

&lt;p&gt;Build an API to download analytics data, such as: /download/usage?start_date=yyyy-MM-dd&amp;amp;end_date=yyyy-MM-dd&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Approach/ Implementation for this API&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;The user provides a date range; the API prepares a pre-signed URL for the "expected" object (which need not exist yet) and sends the request to another lambda for processing via SQS.&lt;/p&gt;

&lt;p&gt;Sending the message to SQS produces an ID, referred to below as 'message_id'.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;start_date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;end_date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;end_date&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;message_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AnalyticsDataDownloadSQS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publish_queue_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;#implement your logic
&lt;/span&gt;
    &lt;span class="c1"&gt;# use the file name : tmp/message_id.zip
&lt;/span&gt;    &lt;span class="n"&gt;download_link&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_presigned_download_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# generate the pre signed URL even when download file is not ready, with a message
&lt;/span&gt;    &lt;span class="c1"&gt;# this avoids sending an email later, when zip is done
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Your file is being prepared, use the link to download file after couple of   minutes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;download_link&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;download_link&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
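

&lt;p&gt;The generate_presigned_download_url helper is not shown above; a minimal sketch using the standard S3 client could look like the following. The bucket name and expiry are illustrative assumptions; the 'tmp/{message_id}.zip' key convention matches the comment in the snippet above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3


def generate_presigned_download_url(message_id: str,
                                    bucket_name: str = "my-bucket",
                                    expires_in: int = 3600) -&amp;gt; str:
    """Pre-sign a GET for the zip that the processing lambda will create later."""
    s3 = boto3.client('s3')
    # The object does not need to exist yet; the URL becomes usable once the zip is uploaded
    return s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': bucket_name, 'Key': f"tmp/{message_id}.zip"},
        ExpiresIn=expires_in)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;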



&lt;p&gt;The processing lambda "lambda_download_analytics.lambda_handler" iterates over the date range, downloads data from the S3 analytics bucket into the '/tmp' location (the lambda's ephemeral storage), zips the files, and uploads the archive to a 'tmp' prefix in S3 under a pre-defined name that matches the download link already returned to the API caller.&lt;/p&gt;

&lt;p&gt;The name of the zip is '{message_id}.zip' - the same file name used when creating the pre-signed URL for the user to download.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;makedirs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/tmp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exist_ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;delete_files_in_directory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/tmp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# implement your logic
&lt;/span&gt;
    &lt;span class="c1"&gt;# Iterate over the date range
&lt;/span&gt;    &lt;span class="n"&gt;current_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strptime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;date&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;end_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strptime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;end_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;date&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Define the S3 bucket and object key
&lt;/span&gt;    &lt;span class="n"&gt;bucket_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ANALYTICS_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;# implement your code
&lt;/span&gt;
    &lt;span class="c1"&gt;# download for both inclusive dates
&lt;/span&gt;    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;current_date&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;end_date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;prefix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my_analytics/year=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;current_date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/month=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;current_date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%B&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/day=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;current_date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;day&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;f_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;current_date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;current_date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%B&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;current_date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;day&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;current_date&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;listing contents under prefix &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_objects_v2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Contents&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Contents&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Key&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="n"&gt;download_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;f_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
                &lt;span class="n"&gt;download_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/tmp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;download_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;download_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;download_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;    downloaded &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; as &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;download_nam&lt;/span&gt;
                &lt;span class="n"&gt;at_least_one_file_found&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Truee&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;   no files found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# name should align  as the pre signed
&lt;/span&gt;    &lt;span class="c1"&gt;# URL was already generated for this name
&lt;/span&gt;    &lt;span class="c1"&gt;# and sent back to caller when request was submitted
&lt;/span&gt;    &lt;span class="n"&gt;s3_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tmp/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;message_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.zip&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="c1"&gt;# Create an in-memory bytes buffer
&lt;/span&gt;    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;BytesIO&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

        &lt;span class="c1"&gt;# zip buffer may be empty if no files were found in the given date range
&lt;/span&gt;        &lt;span class="nf"&gt;zip_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/tmp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;at_least_one_file_found&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;no files were found in the given date range, empty zip file being created !!&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_fileobj&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s3_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Uploaded zip to s3://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;s3_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nf"&gt;delete_files_in_directory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/tmp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
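

&lt;p&gt;The zip_files helper used above is not shown; a minimal sketch with the standard zipfile module, assuming it should add every file under the given directory to the in-memory buffer, might be:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import zipfile


def zip_files(buffer, directory: str) -&amp;gt; None:
    """Write all files found under `directory` into `buffer` as a zip archive."""
    with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                full_path = os.path.join(root, name)
                # Store entries relative to the directory being zipped
                zf.write(full_path, arcname=os.path.relpath(full_path, directory))
    # Rewind so that upload_fileobj reads from the start of the buffer
    buffer.seek(0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;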



</description>
      <category>timestream</category>
      <category>aws</category>
      <category>serverless</category>
      <category>sqs</category>
    </item>
    <item>
      <title>Customized Scaling of AWS ECS</title>
      <dc:creator>Vibhor Agarwal</dc:creator>
      <pubDate>Sat, 17 Aug 2024 16:44:16 +0000</pubDate>
      <link>https://dev.to/vibhor_agarwal/customized-scaling-of-aws-ecs-39ge</link>
      <guid>https://dev.to/vibhor_agarwal/customized-scaling-of-aws-ecs-39ge</guid>
      <description>&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;br&gt;
There are multiple use cases to containerize and host proprietary applications on AWS ECS which is “a fully managed container orchestration service that makes it easy for you to deploy, manage, and scale containerized applications”&lt;/p&gt;

&lt;p&gt;Scaling ECS then is one of the key needs of any application. This article describes challenges with ECS supported scaling and describes a custom solution to alleviate them. Thanks to &lt;a href="https://www.linkedin.com/in/edwin-essenius-1960951/?originalSubdomain=nl" rel="noopener noreferrer"&gt;Edwin Essenius&lt;/a&gt; for the mentorship.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem Statement&lt;/strong&gt;&lt;br&gt;
In this scenario, the application hosted on AWS ECS can process a wide range of requests, each with unpredictable compute usage and uncertain execution time.&lt;/p&gt;

&lt;p&gt;The core back-end architecture on AWS is based on asynchronous processing, where SQS receives the requests. The application on AWS ECS (Fargate) polls SQS continuously for incoming requests and processes them.&lt;/p&gt;

&lt;p&gt;One of our use cases is that users can split a single large request into multiple smaller requests and distribute them asynchronously to the cloud for parallel processing, reducing the overall processing time by 10-12 times (from hours to minutes). ECS needs to scale quickly &amp;amp; accurately to serve these spikes in demand. &lt;/p&gt;

&lt;p&gt;The ECS task needs to run the application for an uncertain amount of time, which could be seconds or days. Compute usage statistics are not a dependable way to tell whether a particular task is active, which means a scale-in action can kill active tasks and jobs - highly undesirable.&lt;br&gt;
Implementing scaling with the policies below to meet these requirements is a challenge.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Target Tracking Scaling Policies&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Link below describe automatic scaling support from ECS.&lt;br&gt;
&lt;a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-auto-scaling.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-auto-scaling.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The default automatic scaling policy support from ECS can increase or decrease the number of tasks that your service runs based on a target value for a specific metric. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; The scale-out action is based on compute usage spikes, but compute usage can be unpredictable: some requests consume negligible compute yet still need to be processed in parallel with others&lt;/li&gt;
&lt;li&gt; Inability to configure the exact number of tasks needed with out-of-the-box basic scaling&lt;/li&gt;
&lt;li&gt; Default scale out takes minutes before application in the container picks requests for processing&lt;/li&gt;
&lt;li&gt; Default scale-in takes 5-15 minutes, wasting compute and adding unnecessary cost. More importantly, the scale-in policy cannot identify &amp;amp; stop the actually idle tasks based on compute usage alone, and may end up stopping active tasks!&lt;/li&gt;
&lt;li&gt; On a scale-in event, the termination signal sent by ECS gives tasks 30 seconds to complete their job, which may not be possible depending on the nature of the application.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;Step Scaling Policies&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The limitations of basic target-tracking policies drove the need to set up customized step auto-scaling policies based on CloudWatch metrics for the request count on SQS (possible with both the defaults from ECS and custom metrics).&lt;br&gt;
These policies use CloudWatch alarms and aggregate metric data points based on the statistic for the metric. On breach of the alarm, the appropriate scaling policy is invoked. &lt;br&gt;
Step scaling policies are complex and incapable of deciding the exact desired count needed. Additionally, CloudWatch alarms are costly &amp;amp; slow to respond. &lt;br&gt;
The core issue of slow scaling (out or in) stays unresolved, and the scaling policy cannot find the idle tasks to stop when scaling in, which is the key requirement. Even advanced customized auto-scaling policies can only “approximate” scaling needs.&lt;/p&gt;

&lt;p&gt;The link below is an interesting read on how cluster auto scaling works, the complexity and math applied in implementing scaling policy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/blogs/containers/deep-dive-on-amazon-ecs-cluster-auto-scaling/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/containers/deep-dive-on-amazon-ecs-cluster-auto-scaling/&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Custom Solution
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Summary&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Given the multiple limitations in what ECS supports by default, there is a need to build a custom scaling solution that updates the desired count on the AWS ECS Fargate service directly. In the example above, the desired count is known by querying SQS for the available message count.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Extend solution with ECS Capacity Providers&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;An EC2 auto-scaling group can provide capacity to ECS instead of the serverless Fargate option; preferred in certain scenarios such as:&lt;/p&gt;

&lt;p&gt;a.  ECS Fargate does not support exceptionally large compute (up to 16 vCPUs at present)&lt;br&gt;
b.  Container image caching is another valuable proposition when using ECS with EC2, especially when images are large; Fargate currently does not offer image caching. Caching allows a single EC2 instance to run multiple instances of the container while downloading the container image only once for that EC2.&lt;br&gt;
c.  An EC2 warm pool saves on instance provisioning time, unlike Fargate instances&lt;br&gt;
d.  Need for more control over infrastructure, for example, specific OS configuration.&lt;/p&gt;

&lt;p&gt;The solution is extended to support ECS with EC2 by simultaneously updating the auto-scaling group (ASG) configured as the “capacity provider” of the ECS service. This is more complex to design &amp;amp; configure.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Solution Components &amp;amp; Configuration&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; AWS ECS runs application container on Fargate or uses auto-scaling group (ASG) as the “capacity provider”&lt;/li&gt;
&lt;li&gt; An EventBridge bus with a set of rules intercepts the scaling events and triggers a lambda (target) that runs the ECS scaling logic, referred to from now on as ecs-scaling-lambda (see the sketch after this list)&lt;/li&gt;
&lt;li&gt; ecs-scaling-lambda’s environment is prepared with required properties to talk to AWS such as queue name, ECS cluster &amp;amp; service details, min/max desired count. With ASG,  attributes such as per EC2 capacity, min/max ASG count.&lt;/li&gt;
&lt;li&gt; Configure ecs-scaling-lambda as the ASG custom termination policy (for ECS with EC2). Per the documentation, Amazon EC2 Auto Scaling uses termination policies to prioritize which instances to terminate first when decreasing the size of your Auto Scaling group (referred to as scaling in). However, this also works fine for stopping EC2 instances to return them to the warm pool.&lt;/li&gt;
&lt;li&gt; Design lambda to respond to scale up and scale down events. Additionally, configure lambda to respond to auto-scaling service scale-down event, with the list of EC2 to stop (needed when ASG is the “capacity provider”)&lt;/li&gt;
&lt;/ol&gt;
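
&lt;p&gt;For point 2 above, the EventBridge rule and its lambda target can be created with boto3 (or with the IaC tool of your choice). A minimal sketch follows; the bus name, rule name and lambda ARN are illustrative assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json

import boto3

events = boto3.client('events')

# Route the custom scaling events to ecs-scaling-lambda (names/ARNs are assumptions)
events.put_rule(
    Name='ecs-scaling-events',
    EventBusName='ecs-scaling-bus',
    EventPattern=json.dumps({'detail-type': ['_scale_up', '_scale_down']}))

events.put_targets(
    Rule='ecs-scaling-events',
    EventBusName='ecs-scaling-bus',
    Targets=[{'Id': 'ecs-scaling-lambda',
              'Arn': 'arn:aws:lambda:us-east-1:123456789012:function:ecs-scaling-lambda'}])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The lambda additionally needs a resource-based permission allowing events.amazonaws.com to invoke it.&lt;/p&gt;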

&lt;p&gt;&lt;em&gt;Solution Architecture&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbwl7bqggkds7z0i88lwq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbwl7bqggkds7z0i88lwq.png" alt="Solution Architecture" width="800" height="412"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ECS Scaling Lambda Handler&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The lambda handler function responds to these event types:&lt;/p&gt;

&lt;p&gt;a.  Scale up for new requests&lt;br&gt;
b.  Scale down when task shuts down &lt;br&gt;
c.  Respond to ASG scale-down event with the idle instance-ids to stop.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def lambda_handler(event, context):
    # Read event type, parse it based on your application
    # ASG when scaling in sends an event with cause SCALE_IN
    event_type = get_event_type(event)

    if event_type == "_scale_up":
        return scale_up()

    if event_type == "_scale_down":
        return scale_down()

    if event_type == "asg_scale_down":
        # Response to auto-scaling service with idle EC2
     # Reset idle_ec2 in environment to empty
        return  {"InstanceIDs": [env.idle_ec2]}

    return None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Scale Up&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;On receiving a request, the application emits a scale-up event. For example, the application may receive a request via an API or via an async trigger such as a file upload.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  # Application when receiving a request
  event_bridge.put_event(source,"_scale_up", event)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
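

&lt;p&gt;In boto3 terms, emitting such an event is a single put_events call; a minimal sketch follows. The event bus name and the detail payload are illustrative assumptions - the EventBridge rule simply has to match the source and detail-type that ecs-scaling-lambda parses.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json

import boto3

events = boto3.client('events')


def emit_scale_up_event(source: str, detail: dict) -&amp;gt; None:
    # Publish a custom event that the EventBridge rule routes to ecs-scaling-lambda
    events.put_events(Entries=[{
        'EventBusName': 'ecs-scaling-bus',  # assumed bus name
        'Source': source,
        'DetailType': '_scale_up',
        'Detail': json.dumps(detail),
    }])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;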



&lt;p&gt;On receiving a scale-up event, ecs-scaling-lambda calculates &amp;amp; updates the new desired count on ECS based on its current running &amp;amp; pending task count, and on the pending requests in the queue.&lt;br&gt;
The lambda caps the maximum desired count as configured in its environment (same as on the ECS service).&lt;br&gt;
When using an ASG, EC2 instances are started on a scale-up event by calculating and updating the ASG’s desired count, again based on the ECS scaling status and the per-EC2 processing capability (e.g., a 32 vCPU EC2 can process four requests when one request uses at most 8 vCPUs). The ASG “capacity provider” itself is configured with the “binpack” placement strategy to maximize compute utilization &amp;amp; run with minimal instances.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def scale_up():
   queued = Queue.get_available_messages_count()
   if queued &amp;lt;= 0:
      return None

   desired, running = ECS.get_task_count()

   pending = desired - running
   to_add = queued - pending

   if to_add &amp;lt;= 0:
      update_asg_desired(desired)
      return desired

   # Limit to count as configured
   tasks_desired = desired + to_add
   tasks_desired = min(tasks_desired, env.max_tasks)

   # Need update of ASG for EC2 deployments and new tasks
   update_asg_desired(tasks_desired)
   ECS.update_ecs_desired(tasks_desired)


def update_asg_desired(tasks_desired):

    if not env.is_asg_providing_capacity:
        return None

    # Query ASG to fetch in service instances, and its desired count
    in_service_instances, desired = AutoScaling.describe()

    # Calculate the total capacity of the currently desired EC2 instances
    capacity = desired * env.ec2_capacity

    if capacity &amp;gt;= tasks_desired:
        return None

    # Calculate new ASG desired count
    new_desired = int(math.ceil((tasks_desired - capacity) / 
                                env.ec2_capacity)) + desired

    # But cap with maximum ASG size
    new_desired = min(new_desired, env.max_ec2)

    # Update ASG to start exact needed EC2
    AutoScaling.set_desired_count(new_desired, desired)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Scale Down&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Container triggers scale down event&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The container application keeps processing requests and checks whether it has been idle for too long (for example, an idle time of 30 seconds). Only the container in the task decides when it is idle; when idle, it requests ECS for its own shutdown, stops accepting any more requests, and generates a scale-down event.&lt;br&gt;
The task itself queries ECS to fetch the running count before requesting shutdown, so that the minimum desired count is preserved.&lt;/p&gt;

&lt;p&gt;Stopping ‘self’ is the key to being able to request a graceful shutdown from the ECS service.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  # ----- Code Inside Container ----

    # Container starting. Define an Exit Handler
    exit_handler = ExitHandler()
    signal.signal(signal.SIGTERM, exit_handler.shutdown)

    while not exit_handler.stop:
        if shutdown_mode:
            # Do not pick any requests.
            # Though the SIGTERM is immediate
            time.sleep(0.5)
            continue
        # Main app processing logic
        if is_request_available():
            process()
        else:
            # Check since how long the process has been idle 
         shutdown_mode = stop_if_idle(last_active_at, timeout)

    def stop_if_idle(last_active_at, timeout):
        # If the task had been idle for too long, stop itself
        if (time.time() - last_active_at) &amp;lt;= timeout:

            # Check ECS for if running more than minimum tasks
        can_i_shut_down()

           # Query ECS metadata service to get own (task) ARN
           ecs.stop_task(cluster=self.cluster,
                         task=self.task_arn,
                         reason="Custom scale in")
           # Scale down event for this task which wants to stop
           EventBridge.put_event(self.service_arn, "_scale_down", 
                                  {'timestamp': str((time.time()))},
                                  [self.task_arn])
           return True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
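

&lt;p&gt;The "query ECS metadata service" step above relies on the task metadata endpoint that ECS injects into every container. A minimal sketch using only the standard library (field names as per the v4 task metadata format) could be:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import os
import urllib.request


def get_own_task_identity():
    """Return (cluster, task_arn) for the task this container is running in."""
    # ECS sets this environment variable inside every container (v4 endpoint)
    metadata_uri = os.environ['ECS_CONTAINER_METADATA_URI_V4']
    with urllib.request.urlopen(f"{metadata_uri}/task") as resp:
        task_metadata = json.loads(resp.read())
    return task_metadata['Cluster'], task_metadata['TaskARN']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;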



&lt;p&gt;The AWS ECS service, on the other hand, on receiving the stop request sends a termination signal (SIGTERM) to this task, which the container process reads before exiting completely. The task finally shuts down gracefully.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ecs-scaling-lambda responds to scale down event&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;ecs-scaling-lambda intercepts the scale-down event and decrements the ECS desired task count. While the ECS service performs the action of stopping the task, the decremented desired count ensures that a replacement task is not spun up. “Stopping the task” and “decrementing the desired count” work in conjunction.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def scale_down():
      ECS.decrement_desired_tasks()
      if env.is_asg_providing_capacity:
         scale_down_asg()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
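

&lt;p&gt;The ECS.decrement_desired_tasks call above maps to a describe/update pair on the ECS service. A minimal sketch follows; the cluster and service environment variable names and the minimum-task handling are illustrative assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os

import boto3

ecs = boto3.client('ecs')


def decrement_desired_tasks(min_tasks: int = 0) -&amp;gt; int:
    cluster = os.environ['ECS_CLUSTER']   # assumed environment variable
    service = os.environ['ECS_SERVICE']   # assumed environment variable
    # Read the current desired count and decrement it, never going below the minimum
    current = ecs.describe_services(cluster=cluster,
                                    services=[service])['services'][0]['desiredCount']
    new_desired = max(current - 1, min_tasks)
    if new_desired != current:
        ecs.update_service(cluster=cluster, service=service, desiredCount=new_desired)
    return new_desired
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;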



&lt;p&gt;As idle tasks shut down one by one, ECS eventually runs only the minimum desired count.&lt;/p&gt;

&lt;p&gt;When an ASG is used, scaling the ASG down to the needed EC2 count means finding the idle EC2 instances, asking the auto-scaling service to shut down only those idle instances, and decrementing the ASG’s desired count at the same time.&lt;/p&gt;

&lt;p&gt;In the code snippet below, the update of the desired count on the ASG results in the auto-scaling service invoking ecs-scaling-lambda again, asking for the list of EC2 instances to stop. Configure a custom termination policy on the ASG so that only the idle instances are stopped.&lt;br&gt;
The lambda responds with the list of idle EC2 instance IDs (see the lambda_handler definition) and the auto-scaling service either stops them to return them to the warm pool, or terminates them when no warm pool is used.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def scale_down_asg():

     # Query auto scaling service for in-service EC2, desired count
     in_service_instances, current_desired = AutoScaling.describe()

    # Query ECS to find EC2 that are in use by the tasks
    ecs_instances = ECS.get_instances_in_use()
    new_desired = len(ecs_instances) if ecs_instances else 0

    # Find idle EC2
    if not ecs_instances:
        idle = in_service_instances
    else:
        idle = in_service_instances - ecs_instances

    if not idle:
         return
    if not ecs_instances:
         # ECS is not using any EC2, ASG desired be 0
         new_asg_desired = 0
    else:
         new_asg_desired = len(ecs_instances)

    # Only decrement ASG desired count
    # This throws an event from AWS auto-scaling service that   
    # Lambda capture and returns actual instance IDS to 
    # Stop (with warm pool) or terminate
    AutoScaling.set_desired_count(new_asg_desired,            
                            current_desired)

    # Update Lambda environment with the set of idle EC2
    # Return them auto scaling service to stop these idle
    env.idle_ec2 = idle
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
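
&lt;p&gt;For reference, the handler that answers the auto-scaling service’s “which instances should I stop?” callback might look roughly like the sketch below. This is illustrative only: the event and response shape follow AWS’s custom termination policy contract for Lambda, and env.idle_ec2 is the set stored by scale_down_asg above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def lambda_handler(event, context):
    # The auto-scaling service invokes the custom termination policy
    # lambda with the candidate instances it is allowed to remove.
    candidates = [i["InstanceId"] for i in event.get("Instances", [])]

    # Return only the instances previously identified as idle;
    # the auto-scaling service stops (warm pool) or terminates them.
    idle = set(env.idle_ec2 or [])
    return {"InstanceIds": [i for i in candidates if i in idle]}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;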



&lt;p&gt;&lt;strong&gt;Design Considerations&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;To accurately read the pending request count on SQS, use a “FIFO” queue and not a “STANDARD” queue; the FIFO queue “almost” guarantees accuracy in synchronizing the queue attributes, with slight delays of up to one second. The scaling lambda therefore waits about a second after the request is made before reading the queue attributes (a sketch of this read follows this list).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run the scaling lambda with a reserved concurrency of one to avoid concurrent updates to the ECS service when multiple scaling events arrive at the same time. The ecs-scaling-lambda responds to events very quickly, so a fixed concurrency of one adds negligible overhead.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The design assumes the capacity is provided 100% by either FARGATE or EC2. Mixing “capacity provider” types would result in undesired behavior.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If using an ASG as the “capacity provider”, use the binpack placement strategy. It leaves the least amount of unused CPU or memory and minimizes the number of container instances in use. Additionally, start with no placement constraints. Turn off “instance protection from scale-in” on the ASG for the custom scaling to work.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The capacity provider should still have “ECS managed scaling” turned on. Reason: if the scaling is “managed” by ECS, the ECS service waits for EC2 instances to come up and does not fail the tasks immediately due to a lack of available instances.&lt;br&gt;
Also, turn off managed termination protection for the capacity provider.&lt;br&gt;
Delete any lifecycle hooks on the ASG that may interfere with the custom scaling service and add overhead.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If using an ASG, use an ASG warm pool (with “reuse on scale-in” turned on) to save time provisioning new instances.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For improved performance, re-use cached AWS connections in the lambda and throughout your application.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
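
&lt;p&gt;As a concrete example for the first point, reading the pending request count might look like the sketch below. It assumes boto3; queue_url is a placeholder, and whether in-flight messages are counted depends on your definition of “pending”.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import time
import boto3

sqs = boto3.client("sqs")

def pending_request_count(queue_url):
    # Give the FIFO queue attributes a second to synchronize
    # before reading them, as noted in the first point above.
    time.sleep(1)
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages",
                        "ApproximateNumberOfMessagesNotVisible"])
    visible = int(attrs["Attributes"]["ApproximateNumberOfMessages"])
    in_flight = int(attrs["Attributes"]["ApproximateNumberOfMessagesNotVisible"])
    return visible + in_flight
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;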

&lt;p&gt;&lt;strong&gt;More Ideas&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Before sending requests for processing, a trusted client can ask for capacity up front. Integrated with ecs-scaling-lambda, the ECS desired count is incremented for the “expected” demand requested by such a smart client.&lt;/li&gt;
&lt;li&gt; With automated deployments, a new deployment would replace tasks and may shut down “active” processes, which is undesired. Abort the deployment if ECS is busy processing, by querying the ECS desired/pending counts to check whether scaling is in progress and by querying CloudWatch log activity from the container.&lt;/li&gt;
&lt;li&gt; To capture scaling metrics, persist the ECS desired count update actions in a Timestream database. One use case is to see the ECS scaling status and analyze in real time how busy the system is.&lt;/li&gt;
&lt;li&gt; Asynchronous invocation of ecs-scaling-lambda means errors may go unnoticed. Configure destinations on the ecs-scaling-lambda to send out SNS email notifications on failed invocations (see the sketch after this list).&lt;/li&gt;
&lt;/ol&gt;
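
&lt;p&gt;For the last point, failure destinations can be set on the function’s asynchronous invocation configuration. A boto3 sketch follows; the function name and SNS topic ARN are placeholders.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3

lambda_client = boto3.client("lambda")

def configure_failure_destination(function_name, sns_topic_arn):
    # Send a notification to SNS whenever an asynchronous
    # invocation of the scaling lambda ultimately fails.
    lambda_client.put_function_event_invoke_config(
        FunctionName=function_name,
        MaximumRetryAttempts=2,
        DestinationConfig={
            "OnFailure": {"Destination": sns_topic_arn}
        })
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;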

&lt;p&gt;&lt;strong&gt;Solution Benefits&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The solution is highly scalable, fast, re-usable, robust, and cost-effective for any AWS ECS deployment that needs to scale.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Lambda itself is a serverless, pay-as-you-go service and runs with minimal compute (128 MB), avoiding the cost and overhead of other components such as CloudWatch metrics and alarms or other resources.&lt;/li&gt;
&lt;li&gt; Control the exact desired count on the ECS service. Custom calculation logic allows maximum control over updates to the desired task count; for instance, start 10% or 4 more tasks than strictly needed to keep a few instances warmed up.&lt;/li&gt;
&lt;li&gt; EventBridge is fast: within a few hundred milliseconds of the scaling event, the lambda can process it and update ECS, compared with a delay of at least two minutes for the out-of-the-box scaling solution. This means an extremely fast response to scaling events. For scale-up events, the update to the ECS desired count is immediate; with FARGATE the tasks start provisioning at once, and with an EC2 “capacity provider” the EC2 instances also start provisioning at once.&lt;/li&gt;
&lt;li&gt; The container can reliably run for as long as needed, even days, and can save considerable cost by shutting itself down after just a few seconds of idle time (using the in-container logic described above). The solution gives the application complete control over how long it runs once it is idle.&lt;/li&gt;
&lt;li&gt; A simple solution that does not use ECS target tracking, step scaling, or other custom metric-based policies.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>aws</category>
      <category>ecs</category>
      <category>autoscaling</category>
      <category>fargate</category>
    </item>
  </channel>
</rss>
