
Oleg Sydorov

AWS Application load balancer logging: a true serverless approach with AWS Athena

Let's say we have an AWS Lambda stack + ALB + %something_else_useful% in place. At this stage we often run into unexpected Lambda or ALB errors, and in any case we need to store access logs somewhere. With the true-serverless concept in mind, AWS Athena is the most convenient tool for querying those logs. How do we configure it?

AWS S3

We go to AWS S3 and create two new buckets. The first will store the ALB logs themselves (gzip-compressed text files), and the second provides physical space for the AWS Athena database. You could download the gzip archives and work with the logs in plain text, but that definitely doesn't scale.
Be sure to choose the AWS Region in which the ALB is located. The bucket name must be unique within the global namespace!

Create a bucket

Other settings can be left at their defaults, but make sure the encryption type is Server-side encryption with Amazon S3 managed keys (SSE-S3).

Care about the correct encryption type

Set Tags according to the policy of your organization.

Bucket permissions

At this stage the bucket must be configured to grant ALB permission to write logs, because out of the box it cannot. To do this, find the Elastic Load Balancing account for your Region (it is the principal that writes logs on behalf of your ALB), decide whether the bucket will use a prefix under which the logs are collected, and determine the ARN of your bucket.
For example, let the regional ELB account be 123456789012, your own AWS account be 777777777777, and the log prefix be 'access'. Look up the ARN of the S3 bucket:

Looking for the ARN

Now, let's construct the full value of the resource: if the bucket ARN = arn:aws:s3:::alb-websrvatms1, path = access, and account = 777777777777, the resulting resource will be arn:aws:s3:::alb-websrvatms1/access/AWSLogs/777777777777/*
The formula is: arn:aws:s3:::{mybucket-name}/{prefix}/AWSLogs/{accountId}/*
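The formula can be sketched in a few lines of Python (the function name is mine; the bucket, prefix, and account id are the example values from this article):

```python
def alb_log_resource_arn(bucket: str, account_id: str, prefix: str = "") -> str:
    """Build the Resource ARN that the bucket policy must allow ALB to write to."""
    # ALB always appends AWSLogs/<account-id>/ after the optional prefix.
    middle = f"{prefix}/" if prefix else ""
    return f"arn:aws:s3:::{bucket}/{middle}AWSLogs/{account_id}/*"

print(alb_log_resource_arn("alb-websrvatms1", "777777777777", "access"))
# arn:aws:s3:::alb-websrvatms1/access/AWSLogs/777777777777/*
```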
As a result, we get the following policy:

{
  "Version": "2012-10-17",
  "Id": "Policy1708481615785",
  "Statement": [
    {
      "Sid": "Stmt1708481607341",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:root"
      },
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::alb-websrvatms1/access/AWSLogs/777777777777/*"
    }
  ]
}

Let's apply the policy in AWS S3 → Buckets → my-bucket → Permissions → Bucket policy.
You can also use the auxiliary AWS Policy Generator tool.

In addition, there is another variant of the Principal: "Principal": { "Service": "logdelivery.elasticloadbalancing.amazonaws.com" } (this service principal is used in AWS Regions made available after August 2022).
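As an illustration, such a policy can be assembled programmatically. A minimal sketch with the service-principal variant (the function name is mine, and it deliberately grants only s3:PutObject rather than the broader s3:* used above):

```python
import json

def alb_logging_policy(bucket: str, account_id: str, prefix: str = "access") -> str:
    """Render a bucket policy letting the ELB log-delivery service write objects."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "logdelivery.elasticloadbalancing.amazonaws.com"},
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{bucket}/{prefix}/AWSLogs/{account_id}/*",
        }],
    }
    return json.dumps(policy, indent=2)

print(alb_logging_policy("alb-websrvatms1", "777777777777"))
```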

Feel free to run your own experiments!

Setting up the ALB

Now that we've dealt with the storage, we need to configure the balancer. Go to EC2 → Load balancers → my-alb → Attributes, turn on access logs, and set an optional prefix:

Setting up the ALB

A trailing slash in the prefix is not allowed.

AWS Athena

Now let's deal with Athena. First, we need to create a database: Amazon Athena → Query editor.
Execute a query that creates the database in the newly created S3 bucket:

CREATE DATABASE IF NOT EXISTS testdb
COMMENT 'test DB'
LOCATION 's3://aws-athena1/DB/'
WITH DBPROPERTIES ('creator'='Oleg Sydorov');

Then, let's execute the request to create the alb_logs table:

CREATE EXTERNAL TABLE IF NOT EXISTS alb_logs (
type string,
time string,
elb string,
client_ip string,
client_port int,
target_ip string,
target_port int,
request_processing_time double,
target_processing_time double,
response_processing_time double,
elb_status_code int,
target_status_code string,
received_bytes bigint,
sent_bytes bigint,
request_verb string,
request_url string,
request_proto string,
user_agent string,
ssl_cipher string,
ssl_protocol string,
target_group_arn string,
trace_id string,
domain_name string,
chosen_cert_arn string,
matched_rule_priority string,
request_creation_time string,
actions_executed string,
redirect_url string,
lambda_error_reason string,
target_port_list string,
target_status_code_list string,
classification string,
classification_reason string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' =
'([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"')
LOCATION 's3://alb-websrvatms1/access/AWSLogs/777777777777/elasticloadbalancing/eu-central-1/'

The table name alb_logs, as well as the LOCATION value 's3://alb-websrvatms1/access/AWSLogs/777777777777/elasticloadbalancing/eu-central-1/', must be adapted to your own paths and names.

No additional configuration of rights is required.
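To make the regex above less opaque, it helps to see what a raw ALB log entry looks like: space-separated fields, some of them double-quoted. A small sketch with a synthetic entry (all values are made up for illustration; the table's regex additionally splits compound fields such as client_ip:port):

```python
import shlex

# A synthetic ALB access-log entry in the documented format.
entry = (
    'http 2024-02-22T10:15:00.123456Z app/my-alb/50dc6c495c0c9188 '
    '203.0.113.10:34567 10.0.1.5:80 0.001 0.002 0.000 200 200 120 512 '
    '"GET http://example.com:80/index.html HTTP/1.1" "curl/8.4.0" - - '
    'arn:aws:elasticloadbalancing:eu-central-1:777777777777:targetgroup/tg/abc '
    '"Root=1-65d73f2e-0123456789abcdef" "-" "-" 0 2024-02-22T10:14:59.999000Z '
    '"forward" "-" "-" "10.0.1.5:80" "200" "-" "-"'
)

fields = shlex.split(entry)  # shlex honours the double-quoted fields
print(fields[0])             # type            -> http
print(fields[8])             # elb_status_code -> 200
print(fields[12])            # the full request line as one field
```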

Conclusion

Now that everything is configured, you can get the necessary data using a simple SELECT (SQL-like) syntax:

SELECT * FROM alb_logs
WHERE time > '2024-02-22'
ORDER BY request_creation_time DESC
LIMIT 10;

Congratulations, it works! Be creative and feel free to perform your own investigations.

Good luck!
