AWS has added the ability to attach Elastic File System (EFS) disks to Lambda functions, which opens up new possibilities and use cases. About a month ago, I used EFS with Step Functions to build an ETL process that feeds our data lake. It was both fun and challenging, so I decided to share my experience and solution with you.
This article explains in detail how I configured the EFS disk in Serverless Framework. I hope this knowledge will help you discover new serverless possibilities.
What is AWS EFS?
Let's begin by explaining what we are dealing with. AWS EFS is something like a network drive that can be connected to many devices at the same time. EFS is not a new service: so far it has supported EC2 instances and containers, and only recently has it been integrated with AWS Lambda.
After everything is set up in the infrastructure (as code, of course), the EFS disk becomes available to the Lambda function at /mnt/your_efs_diskpath.
What does it change?
What new scenarios come into play thanks to this integration?
First and foremost, an EFS drive with virtually unlimited capacity removes the problem of running out of space. Normally, functions are bound by a 512 MB limit and can only store files in the /tmp directory. With an EFS disk, we can go well beyond this limit, so we can work with large files!
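To illustrate going past the /tmp limit, here is a minimal Python sketch of a hypothetical helper, choose_scratch_dir, that uses /tmp when a file fits there and otherwise falls back to the EFS mount. It assumes the disk is mounted at /mnt/efs, as configured later in this article. The safety margin is an arbitrary choice.

```python
import shutil


def choose_scratch_dir(needed_bytes: int,
                       tmp_dir: str = "/tmp",
                       efs_dir: str = "/mnt/efs",  # assumed EFS mount path
                       margin_bytes: int = 10 * 1024 * 1024) -> str:
    """Return tmp_dir if it has room for needed_bytes plus a margin, else efs_dir.

    Hypothetical helper: it checks free space only at call time, so other
    writers in the same container could still fill /tmp afterwards.
    """
    free_tmp = shutil.disk_usage(tmp_dir).free
    if needed_bytes + margin_bytes <= free_tmp:
        return tmp_dir
    return efs_dir
```

For example, `choose_scratch_dir(2 * 1024**3)` asks for room for a 2 GB file, which would not fit in a 512 MB /tmp and therefore resolves to the EFS path. The helper is a heuristic rather than a guarantee.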
All machine learning fans are excited by that!
That's right, machine learning is one of the main use cases unlocked by the new functionality. There are already reports on the web about what has been achieved by combining EFS and Lambda. It also unlocks other use cases that require large resources, for example converting video files.
Moreover, the /tmp directory belongs to the individual container in which a given instance of the Lambda function runs. EFS, in contrast, is shared between many instances of a function and even between different functions. This means that a file, once saved, is available to all users of the disk and will not disappear when the AWS Lambda service destroys the function's container.
We can also use the extra space to hold all kinds of libraries. Machine learning, for example, relies on libraries of significant size, so it will certainly take advantage of this.
Second, the integration with the EFS service gives us an alternative to AWS S3. In some scenarios, accessing a regular file on a file system turns out to be faster, more convenient, and cheaper. EFS is billed only for the amount of storage we use (unless we choose extra features). AWS S3, by contrast, also charges for each write and read (PUT and GET) request. With frequent reads and writes, those request charges often exceed the cost of the storage itself.
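The cost argument can be made concrete with a small Python model. It is a sketch, not a pricing calculator. The unit prices are parameters, and the numbers in the usage lines are illustrative placeholders rather than current AWS prices. The model ignores data transfer and EFS extras such as provisioned throughput or Infrequent Access.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    stored_gb: float  # average amount of data kept during the month
    writes: int       # PUT-like operations per month
    reads: int        # GET-like operations per month


def s3_monthly_cost(w: Workload, price_gb_month: float,
                    price_per_1000_puts: float, price_per_1000_gets: float) -> float:
    """S3: pay for storage plus every PUT and GET request."""
    storage = w.stored_gb * price_gb_month
    requests = (w.writes / 1000) * price_per_1000_puts + (w.reads / 1000) * price_per_1000_gets
    return storage + requests


def efs_monthly_cost(w: Workload, price_gb_month: float) -> float:
    """EFS without extra features: pay for storage only."""
    return w.stored_gb * price_gb_month


# Illustrative placeholder prices, NOT current AWS pricing
w = Workload(stored_gb=10, writes=5_000_000, reads=20_000_000)
print(s3_monthly_cost(w, 0.02, 0.005, 0.0004))  # ~33.2, dominated by requests
print(efs_monthly_cost(w, 0.30))                # ~3.0
```

In this example the per-GB storage price is much higher for EFS. Even so, the request charges dominate the S3 bill once reads and writes number in the millions, which is the effect described above.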
This is one of the reasons why I used EFS in my latest application, which takes data from various sources and puts it in my data lake. Together with Lambda and Step Functions, it proved to be a perfect match.
Disadvantages of EFS
Of course, as usual in architecture, there are serious trade-offs here as well. EFS is just a regular file system. It emits no events, so you cannot automatically trigger a Lambda function when someone uploads a file to the disk, as you can with S3. Provisioning and configuring the disk is also very clunky. Compared to S3 it is a real ordeal, as you will see for yourself.
How to connect an EFS disk to a Lambda function?
At the time of writing, Serverless Framework has no official support for EFS yet, although CloudFormation and SAM already support it. I'm sure that once support arrives, adding EFS to a function will be much more convenient. For now, however, I am sharing my solution, which I developed based on articles by Yan Cui, James Beswick, and Peter Sbarski.
First, you need a VPC
To connect to the EFS disk, a Lambda function must be in the same VPC as the disk. Since 2019 this is no longer a problem, because AWS has significantly reduced cold starts for functions running in a VPC.
In order for the function to run in the VPC, it is enough to provide the subnet and security group IDs in the serverless.yml configuration file.
functions:
  writeToEfs:
    handler: src/writeToEfs/function.handler
    vpc:
      securityGroupIds:
        - sg-xxxxxxxx
      subnetIds:
        - subnet-xxxxxxxx
        - subnet-xxxxxxxx
Second, correct privileges
The role under which the Lambda function runs must have the appropriate permissions. I copied them from Yan's blog post:
provider:
  # other configuration
  iamRoleStatements:
    - Effect: Allow
      Action:
        - ec2:CreateNetworkInterface
        - ec2:DescribeNetworkInterfaces
        - ec2:DeleteNetworkInterface
        - elasticfilesystem:ClientMount
        - elasticfilesystem:ClientRootAccess
        - elasticfilesystem:ClientWrite
        - elasticfilesystem:DescribeMountTargets
      Resource: '*'
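Under the hood, Serverless Framework places these statements into an IAM policy on the functions' execution role, next to the logging permissions it adds itself. As a reference point, the Python sketch below builds the equivalent policy document. The helper name is hypothetical, but the actions and Resource are exactly those from the configuration above. The output can help when you compare your setup with what the IAM console shows.

```python
import json
from typing import Any, Dict, List

# Actions copied from the iamRoleStatements above
EFS_LAMBDA_ACTIONS: List[str] = [
    # Elastic network interface management for a function inside a VPC
    "ec2:CreateNetworkInterface",
    "ec2:DescribeNetworkInterfaces",
    "ec2:DeleteNetworkInterface",
    # Mounting, root access, and writing to the EFS file system
    "elasticfilesystem:ClientMount",
    "elasticfilesystem:ClientRootAccess",
    "elasticfilesystem:ClientWrite",
    "elasticfilesystem:DescribeMountTargets",
]


def build_policy_document(actions: List[str], resource: str = "*") -> Dict[str, Any]:
    """Return an IAM policy document with a single Allow statement."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(actions), "Resource": resource},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy_document(EFS_LAMBDA_ACTIONS), indent=2))
```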
Next, the EFS drive itself
Creating a drive is not as easy as it may seem and is far from the convenience of creating and using S3 buckets. In the resources section of serverless.yml, define the following resources:
resources:
  Resources:
    NetworkDrive:
      Type: AWS::EFS::FileSystem
      Properties:
        FileSystemTags:
          - Key: Name
            Value: LambdaDrive-${self:provider.stage}
    MountTargetResourceA:
      Type: AWS::EFS::MountTarget
      Properties:
        FileSystemId: !Ref NetworkDrive
        SubnetId: subnet-xxxxxxxx # change that value to your id
        SecurityGroups:
          - !GetAtt MountPointSecurityGroup.GroupId
    MountTargetResourceB:
      Type: AWS::EFS::MountTarget
      Properties:
        FileSystemId: !Ref NetworkDrive
        SubnetId: subnet-xxxxxxxx # change that value to your id
        SecurityGroups:
          - !GetAtt MountPointSecurityGroup.GroupId
    MountPointSecurityGroup:
      Type: AWS::EC2::SecurityGroup
      Properties:
        GroupDescription: Security group to allow NFS - Lambda communication.
        VpcId: vpc-xxxxxxx
        SecurityGroupIngress:
          - IpProtocol: tcp
            FromPort: 2049
            ToPort: 2049
            SourceSecurityGroupId: sg-xxxxxxxx # change that; same as the Lambda function's group
        SecurityGroupEgress:
          - IpProtocol: '-1'
            CidrIp: 0.0.0.0/0
    AccessPointResource:
      Type: AWS::EFS::AccessPoint
      Properties:
        FileSystemId: !Ref NetworkDrive
        PosixUser:
          Uid: 1001
          Gid: 1001
        RootDirectory:
          CreationInfo:
            OwnerGid: 1001
            OwnerUid: 1001
            Permissions: 770
          Path: /efs
Why so much? I wondered that myself.
First, we have the EFS drive itself, with the logical name NetworkDrive. Unfortunately, without the rest of these resources it is completely useless. We need to connect it to a virtual network, hence MountTargetResourceA and MountTargetResourceB, which make the drive reachable from the given subnets. Then we have MountPointSecurityGroup. It is necessary because otherwise network traffic would be blocked on the port that EFS uses (TCP 2049, the NFS port).

It is essential to provide the correct security group ID in the SourceSecurityGroupId parameter: this is the same group we assigned to the Lambda function at the very beginning.
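Since these IDs are easy to mix up, a hypothetical Python sanity check over the values from serverless.yml can help. It encodes two rules. First, the CloudFormation documentation for AWS::Lambda::Function states that a mount target must be available in every Availability Zone the function connects to. Second, the mount target's security group must accept NFS traffic (TCP 2049) from the function's security group. You would fill in the subnet-to-AZ mapping yourself, for example from the VPC console. The check is simplified: it only looks at ingress rules that reference a source security group, as the template above does.

```python
from typing import Dict, List


def check_efs_wiring(lambda_subnets: List[str],
                     mount_target_subnets: List[str],
                     subnet_az: Dict[str, str],
                     lambda_sg: str,
                     ingress_rules: List[dict]) -> List[str]:
    """Return a list of problems; an empty list means the wiring looks OK.

    subnet_az must map every listed subnet ID to its Availability Zone.
    """
    problems = []

    # Rule 1: every AZ the function connects to needs a mount target
    lambda_azs = {subnet_az[s] for s in lambda_subnets}
    target_azs = {subnet_az[s] for s in mount_target_subnets}
    for az in sorted(lambda_azs - target_azs):
        problems.append(f"no mount target in {az}")

    # Rule 2: NFS (TCP 2049) must be allowed from the Lambda security group
    nfs_open = any(
        r.get("IpProtocol") == "tcp"
        and r.get("FromPort", 0) <= 2049 <= r.get("ToPort", -1)
        and r.get("SourceSecurityGroupId") == lambda_sg
        for r in ingress_rules
    )
    if not nfs_open:
        problems.append(f"TCP 2049 not allowed from {lambda_sg}")

    return problems
```

Usage, with placeholder IDs: `check_efs_wiring(["subnet-a", "subnet-b"], ["subnet-a"], {"subnet-a": "eu-west-1a", "subnet-b": "eu-west-1b"}, "sg-lambda", rules)` would report the missing mount target in eu-west-1b.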
The last item is AccessPointResource, which is, in my opinion, a clever invention by AWS. Since many clients will connect to the same network drive, we need some way of managing users and access to files on that drive. That is what AccessPointResource abstracts away. There can be many access points, but I used one for all my Lambda functions. Contrary to appearances, a lot is happening here, and this configuration is closely tied to the NetworkDrive defined at the beginning. Every request made through the access point uses the POSIX user from PosixUser (UID and GID 1001). If the directory given in Path does not exist yet, EFS creates it on first use with the owner and permissions from CreationInfo. Without CreationInfo, EFS does not create the directory, and mounting a non-existent path fails. Hence Permissions is set to 770 (remember chmod?). Thanks to this, the /efs directory will be created for us. You can find more information about that in the documentation.
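If chmod is a distant memory: 770 is an octal mode that gives the owner and the group read, write, and execute permissions, and gives everyone else nothing. Python's standard stat module can render the mode the way ls -l would. The output shows what the /efs directory created by the access point should look like, owned by UID/GID 1001 from CreationInfo. PosixUser uses the same IDs, so the function gets full access through the owner bits.

```python
import stat

# 0o770: owner rwx (7), group rwx (7), others --- (0)
mode = 0o770

# stat.filemode() renders a mode like `ls -l`; S_IFDIR marks it as a directory
print(stat.filemode(stat.S_IFDIR | mode))  # drwxrwx---

# The same bits checked individually
assert mode & stat.S_IRWXU == stat.S_IRWXU  # owner: read/write/execute
assert mode & stat.S_IRWXG == stat.S_IRWXG  # group: read/write/execute
assert mode & stat.S_IRWXO == 0             # others: no access
```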
Combining EFS disk with Lambda function
At the time of writing, this cannot be done elegantly in Serverless Framework. While we wait for official support, we can use a little-known feature called extensions, a section under resources. It lets you modify the CloudFormation resources that Serverless generates, including the Lambda functions defined in the functions section.
resources:
  Resources:
    NetworkDrive:
      # ...
    # ...
    # ...
  extensions:
    WriteToEfsLambdaFunction:
      Properties:
        FileSystemConfigs:
          - Arn: !GetAtt AccessPointResource.Arn
            LocalMountPath: /mnt/efs
Here we are peeking under the Serverless Framework hood.
Congratulations, you've just become a specialist!
But what's going on here? The snippet above is a piece of CloudFormation that will be merged into the template Serverless Framework generates for the Lambda function we defined at the very beginning and named writeToEfs.
The key here is the name of our extension. Without going into the details of how Serverless Framework works, the convention is "capitalized function name" + "LambdaFunction". This is how writeToEfs becomes WriteToEfsLambdaFunction. Further down in the snippet, we reference AccessPointResource and specify that the drive is to be mounted under the path /mnt/efs. This is a standard Linux mount operation; after all, Lambda runs on Amazon Linux. Note that Lambda requires the local mount path to start with /mnt/.
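The naming convention is simple enough to express as a small Python function. This is a sketch of the rule described above, not Serverless Framework's own code. The function also replaces hyphens with "Dash" and underscores with "Underscore"; that behavior is an assumption about how the framework normalizes other function names. To be sure, check the logical IDs in the template Serverless writes to the .serverless directory, for example after running sls package.

```python
def lambda_logical_id(function_name: str) -> str:
    """Build the CloudFormation logical ID for a function defined in serverless.yml.

    Sketch of the convention: capitalize the first letter and append
    "LambdaFunction". The Dash/Underscore replacement is an assumption;
    verify it against your generated template.
    """
    normalized = function_name.replace("-", "Dash").replace("_", "Underscore")
    return normalized[:1].upper() + normalized[1:] + "LambdaFunction"


print(lambda_logical_id("writeToEfs"))  # WriteToEfsLambdaFunction
```

If you have several functions, as in my project, each one gets its own key under extensions built this way.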
Let's deploy
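That's all the infrastructure code needed to connect your EFS disk to the Lambda function (assuming you already have a VPC). Unfortunately, in my project, deploying this configuration did not work. After some tinkering, I found a workaround: deploying in two steps.
- In the first step, I commented out the extensions section and ran sls deploy, which finished successfully.
- In the second step, I uncommented this fragment and deployed again. My Lambda functions were properly updated and had access to the EFS disk.
A likely cause is resource ordering. The CloudFormation documentation for AWS::Lambda::Function notes that if a template contains an AWS::EFS::MountTarget, the function needs a DependsOn attribute so that the mount target is created or updated first. Adding such a DependsOn on MountTargetResourceA and MountTargetResourceB may therefore make a single deploy sufficient.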
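The article does not show the function code itself. As a hypothetical illustration, assuming a Python runtime, src/writeToEfs/function.py could contain a handler like the one below, matching handler: src/writeToEfs/function.handler. It saves each event as a new file on the disk and reports how many files are stored, which also demonstrates that files outlive the container. The EFS_MOUNT_PATH environment variable and the incoming subdirectory are assumptions for illustration. The default, /mnt/efs, matches LocalMountPath.

```python
import json
import os
import uuid


def handler(event, context):
    """Hypothetical writeToEfs handler: save the event as a new JSON file on EFS.

    EFS_MOUNT_PATH is an assumed environment variable (handy for local tests);
    the default matches LocalMountPath: /mnt/efs from serverless.yml.
    """
    mount_path = os.environ.get("EFS_MOUNT_PATH", "/mnt/efs")
    target_dir = os.path.join(mount_path, "incoming")
    os.makedirs(target_dir, exist_ok=True)

    # A unique file name avoids clobbering when invocations run in parallel
    path = os.path.join(target_dir, f"{uuid.uuid4()}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(event, f)

    # Files from earlier invocations (even in other containers) are still here
    stored = os.listdir(target_dir)
    return {"written": path, "files_stored": len(stored)}
```

Writing one file per invocation sidesteps concurrent writes to a single shared file, which would otherwise need locking.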
Summary
I hope this tutorial helps you set up an EFS drive in your project following the Infrastructure as Code principle. As you can see, it is not simple, but once configured, it works flawlessly.
I am satisfied with this solution in my project. I have four different Lambda functions configured exactly this way. Each of them has its own entry in the extensions section, but they all use the same AccessPointResource.
Good luck with your project!
Update
This article was originally posted on Medium, but I wanted to move it to dev.to.
Since publication, Serverless Framework has added native support for EFS drives. It is described in the Serverless documentation, and it is much simpler to implement:
provider: aws

functions:
  hello:
    handler: handler.hello
    fileSystemConfig:
      localMountPath: /mnt/example
      arn: arn:aws:elasticfilesystem:us-east-1:111111111111:access-point/fsap-0d0d0d0d0d0d0d0d0
    vpc:
      securityGroupIds:
        - securityGroupId1
      subnetIds:
        - subnetId1
Source: Serverless docs.
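With native support, the EFS-specific inputs come down to an access point ARN and a local mount path. A hypothetical pre-deploy check in Python can catch typos in both before CloudFormation does. It relies on two facts. Lambda requires the local mount path to start with /mnt/, and an EFS access point ARN has the form arn:aws:elasticfilesystem:<region>:<account-id>:access-point/fsap-<id>. The function name and the simplified regular expression are illustrative assumptions, and cfg is assumed to be the fileSystemConfig mapping already loaded into a Python dict.

```python
import re
from typing import Any, Dict, List

# Simplified pattern: standard "aws" partition only
ACCESS_POINT_ARN = re.compile(
    r"^arn:aws:elasticfilesystem:[a-z0-9-]+:\d{12}:access-point/fsap-[0-9a-f]+$"
)


def validate_file_system_config(cfg: Dict[str, Any]) -> List[str]:
    """Return problems found in a fileSystemConfig mapping (empty list if none)."""
    problems = []

    mount = cfg.get("localMountPath", "")
    # Lambda requires the local mount path to start with /mnt/
    if not isinstance(mount, str) or not mount.startswith("/mnt/") or mount == "/mnt/":
        problems.append(f"localMountPath must be a path under /mnt/, got {mount!r}")

    arn = cfg.get("arn", "")
    # Non-string values (e.g. a parsed !GetAtt) are resolved by CloudFormation; skip them
    if isinstance(arn, str) and not ACCESS_POINT_ARN.match(arn):
        problems.append(f"arn does not look like an EFS access point ARN: {arn!r}")

    return problems
```

Run against the configuration above, the check should return an empty list. A path such as /data or an S3 ARN would each add one problem.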