Observability becomes significantly more challenging when transitioning to distributed systems, particularly in serverless architectures. While serverless design is beneficial for decomposition and scalability, its granular nature imposes challenges for observability. It is therefore important to find ways to instrument the software without tightly coupling the instrumentation to the environment dedicated to core software processing.
This article explores two serverless computing services, AWS Lambda and AWS Fargate, and presents straightforward approaches to centralize logging for serverless applications:
CloudWatch Subscription Filters
AWS Lambda Zip package with Extensions
AWS Lambda Custom Image with Extensions
AWS Lambda Web Adapter Image with Extensions
AWS Fargate with sidecar
About Provided Source Code
💡 The complete examples can be found in the source code GitHub repository
The source code is designed as a monorepo using NX and pnpm. The core package is a private library that shares some central helpers with the other modules. The observability-core module is the prerequisite for all other modules and provides:
Lambda Extension Layer
ECR Repository
Base Container Image with Extension
Kinesis Data Stream
IAM Managed Policy
The dependencies are configured via the nx.json file in the root of the repository for the cdk and build targets. This forces the prerequisite module to be built and deployed before the other modules.
{
  "targetDefaults": {
    "cdk": {
      "dependsOn": [
        {
          "projects": "@xaaxaax/observability-core",
          "target": "cdk",
          "params": "forward",
          "required": [ "projects", "target" ]
        }
      ]
    },
    "build": {
      "dependsOn": [
        {
          "projects": "@xaaxaax/observability-core",
          "target": "build",
          "params": "forward",
          "required": [ "projects", "target" ]
        }
      ]
    }
  }
}
The targets are defined in the scripts section of the root package.json file as below:
{
  "scripts": {
    "nx:build:all": "nx run-many --target=build --output-style static --skip-nx-cache",
    "nx:cdk:all": "nx run-many --target=cdk --output-style static --skip-nx-cache --require-approval never"
  }
}
💡 For simplicity, the created functions are configured with function URLs and can be triggered easily. The only caveat is that a function gets triggered twice when invoked from a browser, since the browser makes an additional request for favicon.ico.
CloudWatch Subscription Filters
AWS services like Lambda and Fargate have native integration with CloudWatch, but for critical workloads, the cost of log ingestion can become prohibitive. Depending on usage needs, CloudWatch logs can be utilized selectively, with different approaches available.
When using CloudWatch, Subscription Filters offer a way to forward logs to various destinations, including OpenSearch, Kinesis Data Streams, Amazon Data Firehose, or AWS Lambda.
In this section, CloudWatch Subscription Filters are used to stream logs to an Amazon Kinesis Data Stream for further processing and analysis.
The following code snippet showcases how to implement this log forwarding mechanism using AWS CDK:
const logGroup = new LogGroup(this, 'LogGroup', {
  logGroupName: `/aws/lambda/${lambdaWithCloudwatch.functionName}`,
  retention: RetentionDays.ONE_DAY,
  removalPolicy: RemovalPolicy.DESTROY,
});
const logsDeliveryRole = new Role(this, `LogsDeliveryRole`, {
  assumedBy: new ServicePrincipal('logs.amazonaws.com')
});
logGroup.addSubscriptionFilter('SubscriptionFilter', {
  destination: new KinesisDestination(LogStream, {
    role: logsDeliveryRole
  }),
  filterPattern: {
    logPatternString: ' ', // an empty pattern matches and forwards all log events
  }
});
LogStream.grantWrite(logsDeliveryRole);
Invoking the Lambda function results in log records being sent to Kinesis via the CloudWatch subscription filter, as shown in the following figure.
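On the consuming side, it is worth noting that CloudWatch Logs delivers subscription data gzip-compressed, and the Kinesis record base64-encodes the payload. The following is a minimal consumer sketch (a hypothetical Lambda consumer on the stream, not part of the example stacks) showing how such records can be decoded:
import { gunzipSync } from 'node:zlib';
import type { KinesisStreamEvent } from 'aws-lambda';
export const handler = async (event: KinesisStreamEvent) => {
  for (const record of event.Records) {
    // CloudWatch Logs subscription payloads arrive gzip-compressed and base64-encoded.
    const payload = JSON.parse(
      gunzipSync(Buffer.from(record.kinesis.data, 'base64')).toString('utf8')
    );
    // payload.messageType is 'DATA_MESSAGE' for log data; logEvents holds the individual lines.
    for (const logEvent of payload.logEvents ?? []) {
      console.log(payload.logGroup, logEvent.message);
    }
  }
};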
Lambda Telemetry API
Lambda offers a Telemetry API, which is an excellent choice for capturing function log records without using CloudWatch Logs. The logs received through the Telemetry API follow a straightforward format, as shown below.
{
  "time": "2025-01-30T00:00:00.000Z",
  "type": "function",
  "record": {
    "timestamp": "2025-01-30T00:00:09.429Z",
    "level": "INFO",
    "requestId": "79b4f56e-95b1-4643-9700-2807f4e68189",
    "message": "Log Message HERE"
  }
}
If the Lambda LogFormat is TEXT, the received event is formatted as in the following snippet.
{
  "time": "2025-01-30T00:00:00.000Z",
  "type": "function",
  "record": "2025-01-30T00:00:09.429Z 79b4f56e-95b1-4643-9700-2807f4e68189 [INFO] Log Message HERE"
}
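For reference, a hedged TypeScript shape for these function log events (type names are assumptions, derived from the two snippets above) could be:
// record is a structured object when the function LogFormat is JSON,
// and a plain string when it is TEXT.
type FunctionTelemetryEvent = {
  time: string;
  type: 'function';
  record:
    | { timestamp: string; level: string; requestId: string; message: string }
    | string;
};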
Lambda Extensions
For high-throughput applications, relying on CloudWatch Logs can lead to substantial costs. One way to mitigate this is to deny CloudWatch Logs permissions in the Lambda execution role. This prevents the Lambda service from sending logs to CloudWatch and, as a consequence, prevents the use of Subscription Filters.
However, Lambda provides a Telemetry API that captures all logs, even when CloudWatch logging is disabled. By using the Extension API, you can subscribe to the Telemetry API and register for specific log categories, such as platform, function, or extension logs.
💡 The source code repository provides the extension module here, which is used in the Lambda Zip Package, Custom Image, and Web Adapter Image sections
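To give an idea of what such an extension does, the following is a minimal sketch (not the exact module from the repository) of registering with the Extensions API, subscribing to the Telemetry API for function logs, and forwarding the received batches to a Kinesis Data Stream. The stream name environment variable, extension name, and listener port are assumptions:
import http from 'node:http';
import { KinesisClient, PutRecordsCommand } from '@aws-sdk/client-kinesis';
const RUNTIME_API = process.env.AWS_LAMBDA_RUNTIME_API!;
const LISTENER_PORT = 4243; // assumption: any free port inside the sandbox
const kinesis = new KinesisClient({});
// 1. Register the extension for INVOKE and SHUTDOWN lifecycle events.
const registration = await fetch(`http://${RUNTIME_API}/2020-01-01/extension/register`, {
  method: 'POST',
  headers: { 'Lambda-Extension-Name': 'kinesis-telemetry-extension' },
  body: JSON.stringify({ events: ['INVOKE', 'SHUTDOWN'] }),
});
const extensionId = registration.headers.get('lambda-extension-identifier')!;
// 2. Local HTTP listener that receives telemetry batches and forwards function logs to Kinesis.
http.createServer(async (req, res) => {
  let body = '';
  for await (const chunk of req) body += chunk;
  const events: Array<{ time: string; type: string; record: unknown }> = JSON.parse(body);
  const functionLogs = events.filter((e) => e.type === 'function');
  if (functionLogs.length > 0) {
    await kinesis.send(new PutRecordsCommand({
      StreamName: process.env.TELEMETRY_STREAM_NAME, // assumption: injected by the stack
      Records: functionLogs.map((e) => ({
        Data: Buffer.from(JSON.stringify(e)),
        PartitionKey: e.time,
      })),
    }));
  }
  res.writeHead(200).end();
}).listen(LISTENER_PORT, '0.0.0.0');
// 3. Subscribe to the Telemetry API, pointing it at the local listener.
await fetch(`http://${RUNTIME_API}/2022-07-01/telemetry`, {
  method: 'PUT',
  headers: { 'Lambda-Extension-Identifier': extensionId, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    schemaVersion: '2022-12-13',
    types: ['function'],
    buffering: { maxItems: 1000, maxBytes: 262144, timeoutMs: 100 },
    destination: { protocol: 'HTTP', URI: `http://sandbox.localdomain:${LISTENER_PORT}` },
  }),
});
// 4. Keep polling for the next lifecycle event until SHUTDOWN is received.
while (true) {
  const next = await fetch(`http://${RUNTIME_API}/2020-01-01/extension/event/next`, {
    headers: { 'Lambda-Extension-Identifier': extensionId },
  });
  const event = await next.json();
  if (event.eventType === 'SHUTDOWN') break;
}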
The execution of extensions, whether as a standard ZIP package or a custom image, is managed by the Lambda runtime. The Lambda service scans the /opt/extensions directory and automatically executes any extensions found in that location.
For ZIP package deployments, this attachment occurs during the Lambda initialization phase, where the extensions path is constructed by aggregating all attached layers. However, for custom images, this structure must be manually set up during the container image build process.
This project generates two final assets from the same extension source, along with the other previously mentioned resources. The extension itself is built using esbuild and bundled as JavaScript, with a post-build script handling the folder structure setup.
The provided final assets are:
A Lambda Layer
An ECR Base Container Image
Lambda Layer
For the layer, the build process does all the necessary steps. The only remaining step is to create the layer using CDK. The following snippet demonstrates how to do so.
const extension = new LayerVersion(this, 'kinesis-telemetry-api-extension', {
layerVersionName: `${props?.extensionName}`,
code: Code.fromAsset(resolve(process.cwd(), `build`)),
compatibleArchitectures: [
Architecture.X86_64,
Architecture.ARM_64
],
compatibleRuntimes: [
Runtime.NODEJS_20_X,
Runtime.NODEJS_22_X
],
description: props?.extensionName
});
// Exporting the Layer Arn to parameter store
new StringParameter(this, `LambdaExtensionArnParam`, {
parameterName: `/${props.contextVariables.stage}/${props.contextVariables.context}/telemetry/kinesis/extension/arn`,
stringValue: extension.layerVersionArn
});
The LayerVersion resource points to the build directory generated by the build script. The underlying build folder structure is shown below.
- build
  - extensions
    - kinesis-telemetry-extension
  - kinesis-telemetry-extension
    - index.js
The kinesis-telemetry-extension file under the extensions folder is an executable that serves as the entry point for the Lambda service to detect and run the extension. The file name must be equal to the extension directory name, as in this example executable:
#!/bin/bash
set -euo pipefail
# The extension name is derived from this wrapper's file name and must match
# the directory that contains the bundled entry point.
OWN_FILENAME="$(basename "$0")"
LAMBDA_EXTENSION_NAME="$OWN_FILENAME"
echo "[extension:bash] launching ${LAMBDA_EXTENSION_NAME}"
exec "/opt/${LAMBDA_EXTENSION_NAME}/index.js"
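For context, here is a hedged sketch of a build script that could produce this layout; the repository's actual esbuild configuration and post-build script may differ, and the wrapper's source path is an assumption:
import { build } from 'esbuild';
import { chmodSync, cpSync, mkdirSync } from 'node:fs';
// Bundle the extension source into build/kinesis-telemetry-extension/index.js.
await build({
  entryPoints: ['src/index.ts'],
  bundle: true,
  platform: 'node',
  target: 'node22',
  banner: { js: '#!/usr/bin/env node' }, // the wrapper execs this file directly
  outfile: 'build/kinesis-telemetry-extension/index.js',
});
chmodSync('build/kinesis-telemetry-extension/index.js', 0o755);
// Post-build: place the bash wrapper under build/extensions and mark it executable.
mkdirSync('build/extensions', { recursive: true });
cpSync('scripts/kinesis-telemetry-extension', 'build/extensions/kinesis-telemetry-extension'); // assumed wrapper location
chmodSync('build/extensions/kinesis-telemetry-extension', 0o755);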
Base Container Image
The base custom image with the extension included is created using a Dockerfile. The Dockerfile simply takes the built asset (the contents of the build folder) and copies it into the /opt/ directory of the resulting image.
FROM node:22.13.1-slim
COPY build /opt/
WORKDIR /opt/extensions
The image is built and pushed to the ECR repository created via cdk, since the repository must exist before the image can be pushed. This is done using post scripts, as shown below.
{
  "name": "@xaaxaax/observability-core",
  ...
  "scripts": {
    "build:docker": "docker buildx build --platform linux/arm64 --no-cache -t $ECR_REPOSITORY:latest .",
    "postbuild:docker": "pnpm run build:docker:login && pnpm run build:docker:tag && pnpm run build:docker:push",
    "build:docker:login": "aws ecr get-login-password --region $REGION --profile admin@dev | docker login --username AWS --password-stdin $ECR_URI",
    "build:docker:tag": "docker tag $ECR_REPOSITORY:latest $ECR_URI/$ECR_REPOSITORY:latest",
    "build:docker:push": "docker push $ECR_URI/$ECR_REPOSITORY:latest",
    "cdk": "cdk --profile admin@dev --app 'tsx ./cdk/bin/app.ts' -c env=dev",
    "postcdk": "cross-env REGION=eu-west-1 ECR_URI=904233108557.dkr.ecr.eu-west-1.amazonaws.com ECR_REPOSITORY=lambda-telemetry-image pnpm run build:docker"
  }
  ...
}
Lambda Zip package with Extensions
Dealing with a Zip Lambda package is the simplest option: attach the layer to the Lambda function and grant the associated role the permissions required for the underlying infrastructure the extension interacts with, which in the provided example is the Kinesis Data Stream.
The following CDK code shows how the extension layer is attached to the function and which permissions are required.
const extensionArn = StringParameter.fromStringParameterName(
this,
'extensionId',
`/${props.contextVariables.stage}/logs-collector-lambda-extension/telemetry/kinesis/extension/arn`
).stringValue;
const managedPolicyArn = StringParameter.fromStringParameterName(
this,
'policyName',
`/${props.contextVariables.stage}/logs-collector-lambda-extension/telemetry/kinesis/runtime/policy/arn`
).stringValue;
const functionRole = new Role(
this,
'LambdaFunctionRole', {
assumedBy: new ServicePrincipal('lambda.amazonaws.com'),
managedPolicies: [
ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSLambdaBasicExecutionRole'),
ManagedPolicy.fromManagedPolicyArn(this, 'managed-policy', managedPolicyArn)] });
const lambdaFunction = new NodejsFunction(this, 'LambdaZipFunction', {
entry: resolve(process.cwd(), 'src/handler.ts'),
...
bundling: { ... },
layers: [
LayerVersion.fromLayerVersionArn(this, 'ExtensionArn', extensionArn)
]
});
Lambda Custom Image With Extensions
The custom image example builds an image-based Lambda function from a Dockerfile that uses the provided base image with the extension included.
The Dockerfile is based on both the extension image and the Lambda Node.js 22 image provided by AWS. The interesting aspect of the AWS-provided image is that it can be run locally and invoked, for example using a curl-style request, as sketched below.
FROM 904233108557.dkr.ecr.eu-west-1.amazonaws.com/lambda-telemetry-image:latest AS extensions
FROM public.ecr.aws/lambda/nodejs:22
WORKDIR ${LAMBDA_TASK_ROOT}
COPY dist/* ./
COPY --from=extensions ./opt/ /opt/
CMD ["index.handler"]
The built asset is copied to the /var/task path, which can be accessed via the LAMBDA_TASK_ROOT environment variable, and finally the CMD instruction points to the handler inside the index.js file.
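For local testing, the AWS base image ships the Runtime Interface Emulator, which exposes the standard invocation endpoint. A minimal sketch, assuming the image is built and started locally with docker run -p 9000:8080 <image> (a curl request against the same URL works equally well):
// Invoke the locally running function through the Runtime Interface Emulator.
const response = await fetch(
  'http://localhost:9000/2015-03-31/functions/function/invocations',
  { method: 'POST', body: JSON.stringify({}) }
);
console.log(await response.json());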
💡 In the example, the base image URI is hardcoded in the Dockerfile, but it can be parameterized using Parameter Store and passed as a Docker build ARG, as sketched below.
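A hedged sketch of that parameterization (the parameter name and construct IDs are assumptions): the base image URI is looked up from Parameter Store at synth time and forwarded to the Docker build, with the Dockerfile declaring ARG BASE_IMAGE_URI and using FROM ${BASE_IMAGE_URI} AS extensions.
import { StringParameter } from 'aws-cdk-lib/aws-ssm';
import { DockerImageCode, DockerImageFunction } from 'aws-cdk-lib/aws-lambda';
// Resolve the base image URI at synth time so it can be used as a build argument.
const baseImageUri = StringParameter.valueFromLookup(
  this,
  `/${props.contextVariables.stage}/observability-core/telemetry/image/uri` // assumed parameter name
);
new DockerImageFunction(this, 'LambdaImageFunction', {
  code: DockerImageCode.fromImageAsset(process.cwd(), {
    buildArgs: { BASE_IMAGE_URI: baseImageUri },
  }),
});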
Lambda Web Adapter Image With Extensions
While the Lambda Web Adapter provides a custom runtime, it brings some particularities to the way the Dockerfile must be written. As per the LWA documentation and examples, the base image used is public.ecr.aws/docker/library/node:22.9.0-slim and not public.ecr.aws/lambda/nodejs:22, which means the Lambda API interface can no longer be used for local invocation, e.g. using curl.
The example Dockerfile uses multiple stages:
FROM 904233108557.dkr.ecr.eu-west-1.amazonaws.com/lambda-telemetry-image:latest AS extensions
FROM public.ecr.aws/awsguru/aws-lambda-adapter:0.9.0-aarch64 AS webadapter
FROM public.ecr.aws/docker/library/node:22.9.0-slim
WORKDIR ${LAMBDA_TASK_ROOT}
COPY dist/* ./
COPY --from=extensions ./opt/ /opt/
COPY --from=webadapter /lambda-adapter /opt/extensions/lambda-adapter
CMD ["node", "index.js"]
The image uses the extension base image alongside the Lambda Web Adapter base image, copying the /opt contents of the extension image and the lambda-adapter binary into /opt/extensions. It also copies the built function code (here, an HTTP server listening on 8080, the default LWA port).
The particular behavior of LWA in this example is the way function logs are received in the extension. Only the function logs suffer from this unfortunate behavior: they are not formatted as a valid JSON object but are delivered as text events, even though the Lambda LogFormat is set to JSON. As a result, the official format shown above does not apply and element.record.message resolves to an undefined value. The following shows how the record is received: a representation of a JavaScript object surrounded by double quotes.
{
  "time": "2025-01-29T21:24:33.665Z",
  "type": "function",
  "record": "{ name: 'omid' }"
}
To resolve the problem, the extension is adapted to fall back to element.record when element.record.message has no value. But even that change is not sufficient, since the received record is a double-quoted representation of a JS object rather than valid JSON. So, on the application side, the log data must be serialized using JSON.stringify().
console.log(JSON.stringify( logObject ));
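On the extension side, the fallback described above can be sketched as follows, where element is the received telemetry event:
// Use record.message when the record is a structured object (JSON log format);
// otherwise fall back to the raw record string delivered with LWA.
const message =
  typeof element.record === 'string'
    ? element.record
    : element.record.message ?? JSON.stringify(element.record);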
Fargate with FireLens sidecar
Fargate, as a serverless solution for running containers on demand, supports both short-lived and long-running tasks. Regardless of the use case, enabling containers to communicate and complement each other's capabilities is essential for building scalable and efficient architectures.
In line with the examples in this article, this section demonstrates how to forward container logs to a central Kinesis Data Stream. To achieve this, the Fargate task can include a sidecar container responsible for collecting logs and forwarding them to the data stream.
The application container is built from the following Dockerfile:
FROM --platform=linux/arm64 public.ecr.aws/docker/library/node:22-slim
COPY dist/* ./
CMD ["node", "index.js"]
As mentioned above, there is another Dockerfile for the log forwarder container, which uses the FluentBit image provided by AWS.
FROM amazon/aws-for-fluent-bit:latest
ADD container.conf /container.conf
ADD parsers.conf /parsers.conf
As shown in the Dockerfile, there are two configuration files: one for parsing and one for container-specific configuration such as filtering. The content of both files is as below.
# parsers.conf file
[PARSER]
    Name    log_json
    Format  json

# container.conf file
[SERVICE]
    Parsers_File  parsers.conf

[FILTER]
    Name      parser
    Match     *
    Key_Name  log
    Parser    log_json

[FILTER]
    Name   grep
    Match  *
    Regex  app_name fargate-example-app
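For these filters to keep anything, the application must write one JSON object per log line that includes an app_name field matching the grep pattern. A minimal sketch of such an application container follows; the server port and extra field names are assumptions:
import { createServer } from 'node:http';
// Each log line is a single JSON object; FluentBit parses the `log` key with
// log_json and the grep filter keeps only lines whose app_name matches.
const log = (fields: Record<string, unknown>) =>
  console.log(JSON.stringify({ app_name: 'fargate-example-app', ...fields }));
createServer((req, res) => {
  log({ level: 'INFO', message: 'request received', path: req.url });
  res.end('ok');
}).listen(80); // assumed port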
Let's see how these are deployed and how the resources are created. AWS CDK provides L2 constructs that simplify the infrastructure-as-code steps. The example uses the FargateTaskDefinition and FargateService constructs.
const jobDefinition = new FargateTaskDefinition(this, 'JobDefinition', {
cpu: 256,
memoryLimitMiB: 512,
runtimePlatform: {
cpuArchitecture: CpuArchitecture.ARM64,
operatingSystemFamily: OperatingSystemFamily.LINUX
},
taskRole: jobTaskRole,
executionRole: jobTaskExecutionRole
});
After creating the base task definition, the application container and the FireLens log router are added as below.
jobDefinition.addContainer('Container', {
image: ContainerImage.fromAsset(join(process.cwd())),
logging: LogDrivers.firelens({
options: {
Name: 'kinesis_streams',
region,
stream: props.streamName
}
})
});
jobDefinition.addFirelensLogRouter('LoggingContainer', {
image: ContainerImage.fromAsset(join(process.cwd(), 'fluent-bit')),
logging: LogDrivers.awsLogs({
streamPrefix: 'logging',
logGroup: new LogGroup(this, 'FireLensLogGroup', {
logGroupName: `/ecs/${props.contextVariables.context}`,
retention: RetentionDays.ONE_DAY,
removalPolicy: RemovalPolicy.DESTROY
})
}),
environment: { FLB_LOG_LEVEL: 'info' },
firelensConfig: {
type: FirelensLogRouterType.FLUENTBIT,
options: {
configFileType: FirelensConfigFileType.FILE,
configFileValue: '/container.conf'
}
}
});
A service must be created to run the task consisting of the two side-by-side containers. This is simple and straightforward.
const service = new FargateService(this, 'Service', {
cluster,
capacityProviderStrategies: capacityStrategy,
desiredCount: 1,
platformVersion: FargatePlatformVersion.VERSION1_4,
propagateTags: PropagatedTagSource.TASK_DEFINITION,
taskDefinition: jobDefinition,
assignPublicIp: true,
vpcSubnets: {
subnets: vpc.publicSubnets
},
securityGroups: [taskSecurityGroup]
});
💡 For simplicity, the example allows assigning a public IP address to the task and places the service in public subnets. This is required because FargatePlatformVersion.VERSION1_4 tasks run under the managed awsvpc networking mode, and a public IP is the simplest way to let Fargate pull images from ECR. This is not recommended for production use.
The task role must have permission for the Kinesis PutRecords action. Here, the observability-core stack provides a managed policy that can be attached to the role.
const managedPolicyArn = StringParameter.fromStringParameterName(
this,
'ObservabilityManagedPolicy',
`/${props.contextVariables.stage}/logs-collector-observability-core/telemetry/kinesis/runtime/policy/arn`
).stringValue;
const jobTaskRole = new Role(this, 'JobTaskRole', {
assumedBy: new ServicePrincipal('ecs-tasks.amazonaws.com'),
managedPolicies:[
ManagedPolicy.fromManagedPolicyArn(this, 'TaskRoleManagedPolicy', managedPolicyArn)
]
});
const jobTaskExecutionRole = new Role(this, 'JobTaskExecutionRole', {
assumedBy: new ServicePrincipal('ecs-tasks.amazonaws.com'),
managedPolicies: [
ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonECSTaskExecutionRolePolicy')
]
});
After deployment, the public IP attached to the created ENI can be used over the HTTP protocol. The logs are sent to the Kinesis Data Stream as shown in the following screenshot.
💡 FireLens has the same problem as LWA mentioned before: the log metadata object must be stringified. If a JS object is logged directly, the same behavior as with the Web Adapter will occur.
Conclusion
While serverless offers a wide range of managed services that scale with demand, it is important not to forget the shared responsibility model, which requires engineering teams to stay engaged on their side. As part of software development, the use of processor capacity and memory remains under the engineering team's ownership. This is not far from traditional software principles, but it is somewhat forgotten amid the fascinating nature of managed services.
Using Lambda extensions, multiple containers, or background processes is a way to isolate non-critical processing and make the software running as the foreground process more trustworthy.
This article focused on log aggregation to show how to decouple critical processing from non-critical processing via isolation, and provided examples showcasing the implementation in different scenarios.