Pawel Zubkiewicz for AWS Community Builders

Posted on Jan 24, 2022

How to connect EFS disk to a Lambda function?

#serverless #aws #efs #lambda

AWS added the ability to attach Elastic File System (EFS) disks to Lambda functions, which opened up new possibilities and use cases. About a month ago, I used EFS with Step Functions to build an ETL process that feeds our data lake. It was fun and challenging at the same time, so I decided to share my experience and solution with you.

This article explains in detail how I configured the EFS disk in Serverless Framework. I hope this knowledge will allow you to discover new serverless possibilities. 😀

What is AWS EFS?

Let’s begin by explaining what we are dealing with. AWS EFS is something like a network drive that can be connected to many machines at the same time. EFS is not a new service, but for a long time it supported only EC2 instances and containers. It has only recently been integrated with AWS Lambda.

After everything is set up in the infrastructure (as code, of course), the EFS disk becomes available to the Lambda function at /mnt/your_efs_diskpath.
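To make this concrete, here is a minimal sketch of a handler that treats the mounted drive like any other directory. It assumes a Python runtime (the article does not say which runtime was used) and the mount path /mnt/efs; the EFS_MOUNT_PATH variable and the events.log file name are hypothetical, added only so the sketch can also run outside Lambda.

```python
import json
import os
from pathlib import Path


def mount_path() -> Path:
    # Assumption: the drive is mounted at /mnt/efs. EFS_MOUNT_PATH is a
    # hypothetical override that lets the sketch run on a local machine.
    return Path(os.environ.get("EFS_MOUNT_PATH", "/mnt/efs"))


def handler(event, context):
    """Append the incoming event to a log file on the shared drive."""
    target = mount_path() / "events.log"
    with target.open("a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")
    # Count the lines to show that earlier writes are still there.
    with target.open(encoding="utf-8") as log:
        lines = sum(1 for _ in log)
    return {"path": str(target), "lines": lines}
```

Nothing here is EFS-specific: once the mount is in place, ordinary file operations are all the function needs.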

What does it change?

What new scenarios come into play thanks to this integration?

First and foremost, an EFS drive with virtually unlimited capacity removes the lack-of-space problem. Normally, functions can only store files in the /tmp directory, which is limited to 512 MB by default. With an EFS disk, we can freely cross this limit. So we can work with large files!

All machine learning fans are excited by that!

That’s right, machine learning is one of the main use cases unlocked by the new functionality. There are already reports on the web about what has been achieved with the combination of EFS and Lambda. It also unlocks other use cases that require large resources, for example converting video files.

Moreover, the /tmp directory is an integral part of each container in which a given instance of the Lambda function runs. In contrast, EFS is shared between multiple function instances and even between different functions. This means that a file, once saved, is available to all users of the disk and will not disappear when the AWS Lambda service destroys the function container.

We can also use the unlimited space to hold all kinds of libraries. This is sure to be exploited in the aforementioned machine learning, where libraries are of significant size.
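As an illustration of the library use case, the sketch below adds a directory on the drive to the import path before anything is imported from it. It assumes a Python runtime; the directory /mnt/efs/python-libs is a hypothetical location, not something defined in this article.

```python
import importlib
import sys
from pathlib import Path


def add_library_dir(lib_dir: str = "/mnt/efs/python-libs") -> str:
    """Put a directory on the shared drive at the front of the import path."""
    path = str(Path(lib_dir))
    if path not in sys.path:
        sys.path.insert(0, path)
    # Make sure the import system notices files added after start-up.
    importlib.invalidate_caches()
    return path
```

The packages have to be copied onto the drive beforehand and must be built for the same platform as the Lambda runtime.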

Second, the integration with the EFS service gives us an alternative to AWS S3. In some scenarios, access to a regular file system turns out to be faster, more convenient, and cheaper. EFS is billed only for the amount of space we use (unless we choose extra features). By contrast, AWS S3 also charges for each write and read (PUT and GET) operation. Frequent writing to and reading from S3 often costs more than the storage itself.

This is one of the reasons why I used EFS in my latest application, which takes data from various sources and puts it in my data lake. Together with Lambda and Step Functions, it proved to be a perfect match.

Disadvantages of the EFS

Of course, as usually happens in architecture, there are serious trade-offs here as well. EFS is just a regular file system. There are no events there, so forget about invoking a Lambda function automatically after someone uploads a file to the disk. Provisioning and configuring the disk is very clunky, and compared to S3 it is a real ordeal, which you will see for yourself 😉
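Because the file system emits no events, one possible workaround is a function that runs on a schedule and scans a directory for files changed since its previous run. This is only a sketch of that idea, not something taken from the project described here; the function name is hypothetical, and where the previous-run timestamp is stored is left to the caller.

```python
import os


def files_modified_since(directory: str, since_epoch: float) -> list[str]:
    """Return names of regular files changed after the given Unix timestamp."""
    found = []
    for entry in os.scandir(directory):
        # Skip subdirectories; compare the modification time with the cutoff.
        if entry.is_file() and entry.stat().st_mtime > since_epoch:
            found.append(entry.name)
    return sorted(found)
```

Polling like this trades latency for simplicity: a new file is noticed only on the next scheduled run.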

How to connect EFS disk to Lambda function?

At the time of writing, there is no official support in Serverless Framework yet (although it is already in CloudFormation and SAM). I’m sure that when it is introduced, adding EFS to a function will be much more convenient. For now, however, I am sharing my solution, which I developed based on articles by Yan Cui, James Beswick, and Peter Sbarski.

First, you need a VPC

To connect to the EFS disk, a Lambda function must be in the same VPC as the disk. Since 2019, this is not a problem because AWS has significantly reduced cold starts in a VPC.

In order for the function to run in the VPC, it is enough to provide the subnet and Security Group IDs in the serverless.yml configuration file.

functions:
  writeToEfs:
    handler: src/writeToEfs/function.handler
    vpc:
      securityGroupIds:
        - sg-xxxxxxxx
      subnetIds:
        - subnet-xxxxxxxx
        - subnet-xxxxxxxx

Second, correct privileges

The role under which the Lambda function runs must have the appropriate permissions. I copied these from Yan’s blog post:

provider:
  # other configuration 
  iamRoleStatements:
    - Effect: Allow
      Action:
        - ec2:CreateNetworkInterface
        - ec2:DescribeNetworkInterfaces
        - ec2:DeleteNetworkInterface
        - elasticfilesystem:ClientMount
        - elasticfilesystem:ClientRootAccess
        - elasticfilesystem:ClientWrite
        - elasticfilesystem:DescribeMountTargets
      Resource: '*'

Next, the EFS drive itself

Creating a drive is not as easy as it may seem and is far from the convenience of creating and using S3 buckets. In the resources section of serverless.yml, define the following resources:

resources:
  Resources:
    NetworkDrive:
      Type: AWS::EFS::FileSystem
      Properties:
        FileSystemTags:
          - Key: Name
            Value: LambdaDrive-${self:provider.stage}

    MountTargetResourceA:
      Type: AWS::EFS::MountTarget
      Properties:
        FileSystemId: !Ref NetworkDrive
        SubnetId: subnet-xxxxxxxx # change that value to your id
        SecurityGroups:
          - !GetAtt MountPointSecurityGroup.GroupId

    MountTargetResourceB:
      Type: AWS::EFS::MountTarget
      Properties:
        FileSystemId: !Ref NetworkDrive
        SubnetId: subnet-xxxxxxxx # change that value to your id
        SecurityGroups:
          - !GetAtt MountPointSecurityGroup.GroupId

    MountPointSecurityGroup:
      Type: AWS::EC2::SecurityGroup
      Properties:
        GroupDescription: Security group to allow NFS - Lambda communication.
        VpcId: vpc-xxxxxxx
        SecurityGroupIngress:
          - IpProtocol: tcp
            FromPort: 2049
            ToPort: 2049
            SourceSecurityGroupId: sg-xxxxxxxx # change this value; same group as the one assigned to the Lambda function
        SecurityGroupEgress:
          - IpProtocol: '-1'
            CidrIp: 0.0.0.0/0

    AccessPointResource:
      Type: AWS::EFS::AccessPoint
      Properties:
        FileSystemId: !Ref NetworkDrive
        PosixUser:
          Uid: 1001
          Gid: 1001
        RootDirectory:
          CreationInfo:
            OwnerGid: 1001
            OwnerUid: 1001
            Permissions: '770' # quoted so YAML keeps it as a string of octal digits
          Path: /efs

Why so much? I wonder myself. 🤔

First, we have the EFS drive itself, with the logical name NetworkDrive. Unfortunately, without the rest of the resources, it is completely useless. We need to connect it to a virtual network, hence MountTargetResourceA and MountTargetResourceB, which allow us to reach it from the given subnets. Then we have MountPointSecurityGroup, which is necessary because otherwise network traffic would be blocked on port 2049, the port that EFS uses.

It is essential to provide the correct Security Group ID in the SourceSecurityGroupId parameter. This is the same group we assigned to the Lambda function at the very beginning.

The last item is AccessPointResource, which in my opinion is a clever invention by AWS. Since many clients will connect to the same network drive, we need some way of managing users and access to files on that drive. That is what AccessPointResource abstracts away. There can be many access points, but I used one for all my Lambda functions.

Contrary to appearances, a lot is happening here, and this configuration is closely related to the NetworkDrive defined at the beginning. Here we define the access rights, and depending on them AWS will or will not be able to initialize the file system for us on first use. Hence Permissions set to 770 (remember chmod?). Thanks to this, the /efs directory will be created for us. More information is available in the documentation.
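For readers who do not remember chmod: 770 is an octal mode that gives the owner and the group full access and everyone else none. The sketch below decodes it with Python's standard stat module and applies a simplified version of the POSIX permission check to the user and group IDs from the access point. Both function names are hypothetical, and the check deliberately ignores root and supplementary groups.

```python
import stat


def describe_directory_mode(permissions: str) -> str:
    """Render an octal permission string the way `ls -l` shows a directory."""
    mode = int(permissions, 8)  # "770" -> 0o770
    return stat.filemode(stat.S_IFDIR | mode)


def can_write(permissions: str, owner_uid: int, owner_gid: int,
              uid: int, gid: int) -> bool:
    """Simplified POSIX check: owner bits first, then group bits, then others."""
    mode = int(permissions, 8)
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if gid == owner_gid:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


print(describe_directory_mode("770"))              # drwxrwx---
print(can_write("770", 1001, 1001, 1001, 1001))    # True
print(can_write("770", 1001, 1001, 2000, 2000))    # False
```

With PosixUser set to 1001:1001 and the root directory owned by 1001:1001, every client that connects through this access point acts as the owner of /efs.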

Combining EFS disk with Lambda function

At the time of writing, this cannot be done elegantly in Serverless Framework. While waiting for official support, we can use a little-known feature called extensions, which allows you to modify the settings of the Lambda functions created in the functions section.

resources:
  Resources:
    NetworkDrive:
    # ...
    # ...
    # ...

  extensions:
    WriteToEfsLambdaFunction:
      Properties:
        FileSystemConfigs:
          - Arn: !GetAtt AccessPointResource.Arn
            LocalMountPath: /mnt/efs

Here we are peeking under the Serverless Framework hood.

Congratulations, you’ve just become a specialist!

But what’s going on here? Above is a piece of CloudFormation that will be merged into the template generated by Serverless Framework for the Lambda function that we defined at the very beginning and named writeToEfs.

The key here is the name of our extension. Without going into the details of how Serverless Framework works, the convention is "capitalized-function-name" + "LambdaFunction". This is how writeToEfs becomes WriteToEfsLambdaFunction. Further down in this section, we reference AccessPointResource and say that the drive is to be mounted under the path /mnt/efs. This is actually a standard Linux mount operation; after all, Lambda runs on Amazon Linux.
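The naming rule can be written down as a tiny helper, which is handy for double-checking the key to put under extensions. This is a sketch of the simple case described above only: the helper name is hypothetical, and it refuses names containing dashes, underscores, or other special characters, because the framework transforms those further.

```python
def lambda_logical_id(function_name: str) -> str:
    """Derive the CloudFormation logical ID for a plain alphanumeric name."""
    if not function_name.isalnum():
        # Names with dashes or underscores follow extra rules not covered here.
        raise ValueError(f"unsupported function name: {function_name!r}")
    return function_name[0].upper() + function_name[1:] + "LambdaFunction"


print(lambda_logical_id("writeToEfs"))  # WriteToEfsLambdaFunction
```

If in doubt, the generated template in the .serverless directory shows the exact logical IDs after packaging.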

Let’s do the deploy

That’s all the infrastructure code needed to connect your EFS disk to the Lambda function (assuming you already have a VPC). Unfortunately, in my project, the deployment of such a configuration did not work. After some tinkering, I came up with a solution. I had to do it in two steps.

In the first step, I commented out the extensions section and ran sls deploy, which finished successfully.
In the second step, I uncommented this fragment and deployed again. My Lambda functions were properly updated and had access to the EFS disk.

Summary

I hope this tutorial helps you set up an EFS drive in your project following the Infrastructure as Code principle. As you can see, it is not simple, but once configured it works flawlessly.

In my project, I am satisfied with this solution. I have 4 different Lambda functions configured exactly this way. The only difference is that each of them has its own configuration in the extensions section, but they all use the same AccessPointResource.

Good luck with your project!

Update

This article was originally posted on Medium, but I wanted to move it to dev.to.

Since publication, Serverless Framework has added native support for EFS drives. It is described in the documentation and is much simpler to implement:

provider: aws

functions:
  hello:
    handler: handler.hello
    fileSystemConfig:
      localMountPath: /mnt/example
      arn: arn:aws:elasticfilesystem:us-east-1:111111111111:access-point/fsap-0d0d0d0d0d0d0d0d0
    vpc:
      securityGroupIds:
        - securityGroupId1
      subnetIds:
        - subnetId1