In this article, I explain why we need a Pandas- and NumPy-compatible Lambda layer to run data analytics tasks on AWS Lambda, and how to choose and build a layer for these two packages.
As we all know, Pandas and NumPy are two of the most popular Python packages for machine learning, big data analytics, and data processing. Although a Lambda function has a timeout quota of 900 seconds (15 minutes), AWS Lambda is widely used for lightweight data analytics tasks in the cloud. Moreover, Lambda layers provide a convenient way to package libraries and other dependencies that you can use with your Lambda functions. Layers reduce the size of uploaded deployment archives and make it faster to deploy your code.
Given all of the above, it's best practice to package Pandas and NumPy in a Lambda layer to improve deployment speed and code reusability. What's more, you can use the code editor in the AWS Lambda console to write, test, and view the execution results of your function code, provided that the source code is a .zip archive deployment package smaller than 3 MB.
However, my first attempt failed. I created a Lambda function with the python3.8 runtime and x86_64 architecture from the AWS console, packaged Pandas and NumPy as a .zip archive on my MacBook Air (my local machine), created a Lambda layer from that archive, and attached the layer to the function. When I executed the function, I got the error message below.
[ERROR] Runtime.ImportModuleError: Unable to import module 'pandas_tester': Unable to import required dependencies:
numpy:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.
We have compiled some common reasons and troubleshooting tips at:
https://numpy.org/devdocs/user/troubleshooting-importerror.html
Please note and check the following:
* The Python version is: Python3.9 from "/var/lang/bin/python3.9"
* The NumPy version is: "1.24.3"
and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.
Original error was: No module named 'numpy.core._multiarray_umath'
At first, I thought the root cause was a Python version mismatch: the Python version on my machine is 3.9, but the function runtime is Python 3.8. So I created a virtual environment for Python 3.8 using pipenv and installed the packages in it using pip install pandas -t build/. This installed Pandas and its dependencies, including NumPy, into build/.
However, the same error message still appeared. After several hours of reading about Lambda runtimes and dependency compatibility online, I found the explanation below:
Python packages that contain compiled code (for example: Pandas and NumPy) aren't always compatible with Lambda runtimes by default. If you install these packages using pip, then the packages download and compile a module-name package for the architecture of the local machine. This makes your deployment package incompatible with Lambda if you're not using a Linux operating system.
In my case, AWS Lambda functions and layers run on Amazon Linux or Amazon Linux 2, while the packages I installed using pipenv on my MacBook Air (macOS Monterey, Apple M2, Darwin arm64) are macOS ARM builds (xxx_macosx_xxx_arm64.whl), no matter which Python version is used. On a Darwin arm64 machine, pip always picks wheels built for macOS on arm64, and their compiled extensions cannot be loaded on Linux. The table below shows the relationship among AWS Lambda runtimes, operating systems, and architectures.
Lambda Runtime | Operating System | Architectures |
---|---|---|
Python 3.7 | Amazon Linux | x86_64 |
Python 3.8 | Amazon Linux 2 | x86_64, arm64 |
Python 3.9 | Amazon Linux 2 | x86_64, arm64 |
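One way to spot this kind of mismatch before uploading a layer is to read the tags in the wheel filenames that pip downloaded. The snippet below is only a sketch, not code from the original project: the helper names parse_wheel_tags and matches_lambda are hypothetical, the filenames in the usage note are illustrative, and it assumes the standard wheel filename layout name-version-pythontag-abitag-platformtag.whl. It also ignores abi3 wheels, which can be valid for several Python versions.

```python
# Hypothetical helpers (not from the original repo): inspect a wheel filename
# to see whether it targets a given Lambda runtime and architecture.

# Lambda architecture name -> machine suffix used in Linux wheel platform tags.
LAMBDA_ARCH_TO_WHEEL_ARCH = {"x86_64": "x86_64", "arm64": "aarch64"}


def parse_wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError(f"unexpected wheel filename: {filename}")
    # The last three dash-separated fields are always the three tags.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def matches_lambda(filename, python_version, lambda_arch):
    """Rough check: is this wheel a Linux build for the runtime and architecture?"""
    python_tag, _abi_tag, platform_tag = parse_wheel_tags(filename)
    expected_python = "cp" + python_version.replace(".", "")
    wheel_arch = LAMBDA_ARCH_TO_WHEEL_ARCH[lambda_arch]
    # Tags may be dot-separated sets, e.g. "manylinux_2_17_x86_64.manylinux2014_x86_64".
    linux_ok = any(
        tag.startswith(("manylinux", "linux")) and tag.endswith("_" + wheel_arch)
        for tag in platform_tag.split(".")
    )
    return expected_python in python_tag.split(".") and linux_ok
```

For example, matches_lambda("numpy-1.24.3-cp38-cp38-macosx_11_0_arm64.whl", "3.8", "arm64") is False because the wheel targets macOS, even though both the Python version and the CPU family look right.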
To fix it, there are two options:
- Install Linux-compatible packages in a virtual environment if you don't have Docker installed.
- Install Linux-compatible packages in an AWS-provided base image if you have Docker installed.
Install Linux OS compatible packages in virtual environment
Run the pip install command with a manylinux2014 tag (here manylinux2014_x86_64) as the value for the --platform parameter.
# install-package.sh
# Download prebuilt Linux x86_64 wheels for CPython 3.8 into ./build/python
pip install \
--platform manylinux2014_x86_64 \
--implementation cp \
--python-version 3.8 \
--only-binary=:all: --upgrade \
--target ./build/python \
pandas
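If you build layers for several runtimes and architectures, the same pip options can be generated instead of edited by hand. The following is a minimal sketch under stated assumptions: the function name build_pip_command and the mapping are hypothetical, and it assumes manylinux2014 wheels exist for the package you request. It relies only on pip's --platform, --implementation, --python-version, --only-binary, --upgrade, and --target options.

```python
import shlex

# Lambda architecture name -> pip --platform value (assumes manylinux2014 wheels).
LAMBDA_ARCH_TO_PLATFORM = {
    "x86_64": "manylinux2014_x86_64",
    "arm64": "manylinux2014_aarch64",
}


def build_pip_command(package, python_version, lambda_arch, target="./build/python"):
    """Return the pip argument list for a Lambda-compatible install (hypothetical helper)."""
    platform_tag = LAMBDA_ARCH_TO_PLATFORM[lambda_arch]
    return [
        "pip", "install",
        "--platform", platform_tag,
        "--implementation", "cp",
        "--python-version", python_version,
        "--only-binary=:all:",   # never fall back to compiling from source
        "--upgrade",
        "--target", target,
        package,
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed or pasted into a shell script.
    print(shlex.join(build_pip_command("pandas", "3.8", "x86_64")))
```

Returning a list rather than a string keeps the command safe to pass to subprocess.run without shell quoting issues.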
Install Linux OS compatible packages in AWS provided base images
If you have Docker installed on your local machine, you can use the AWS-provided base images for Lambda and install your Lambda dependencies inside a container. The command below ensures that the dependencies are installed for Python 3.8.
# build.sh
docker run \
-v "$PWD":/var/task \
"public.ecr.aws/sam/build-python3.8" /bin/sh -c "./install-package.sh"
These Base Images contain the Amazon Linux Base operating system, the runtime for a given language, dependencies and the Lambda Runtime Interface Client (RIC), which implements the Lambda Runtime API. The Lambda Runtime Interface Client allows your runtime to receive requests from and send requests to the Lambda service.
Personally, I recommend this option, because you don't need to worry about package compatibility when using AWS Lambda. If you don't have Docker installed on your local machine, you can still choose the first option and install the dependencies in a virtual environment using Conda or another Python package manager of your choice.
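Whichever option you pick, the installed packages end up under ./build/python, and the layer archive needs to keep that python/ folder at its top level so the Lambda Python runtime can find the libraries. Here is a small sketch of the packaging step using only the standard library. The function name zip_layer and the archive name in the usage note are assumptions for illustration, not taken from the original project.

```python
import zipfile
from pathlib import Path


def zip_layer(build_dir, archive_path):
    """Zip build_dir/python into archive_path, keeping 'python/' as the top-level folder."""
    build_dir = Path(build_dir)
    python_dir = build_dir / "python"
    if not python_dir.is_dir():
        raise FileNotFoundError(f"expected a 'python' folder inside {build_dir}")
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(python_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to build_dir, e.g. python/pandas/__init__.py
                archive.write(path, path.relative_to(build_dir).as_posix())
    return archive_path
```

A call such as zip_layer("./build", "pandas-layer.zip") would then produce an archive that can be uploaded as a layer.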
For testing purposes, I built 6 Lambda layers with Pandas and NumPy installed, one for each combination of Python 3.7, 3.8, and 3.9 with x86_64 and arm64 (aarch64) wheels, and added each layer to a Lambda function, pandas_tester.py.
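The contents of pandas_tester.py are not shown here, so the following is only a guess at what such a tester could look like, not the author's actual code. It assumes the handler is named lambda_handler and imports the packages inside the handler, so that a broken layer produces a readable report instead of a Runtime.ImportModuleError at load time.

```python
import importlib
import platform
import sys

# Packages the layer is expected to provide.
MODULES_TO_CHECK = ("numpy", "pandas")


def lambda_handler(event, context):
    """Report whether each module imports in the current runtime (hypothetical tester)."""
    report = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "machine": platform.machine(),
        "modules": {},
    }
    for name in MODULES_TO_CHECK:
        try:
            module = importlib.import_module(name)
            report["modules"][name] = {
                "ok": True,
                "version": getattr(module, "__version__", "unknown"),
            }
        except ImportError as exc:
            # Incompatible compiled wheels surface here as an ImportError.
            report["modules"][name] = {"ok": False, "error": str(exc)}
    return report
```

The same function can be run locally with lambda_handler({}, None) to compare the local environment against the Lambda result.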
I then invoked the Lambda function for each combination of runtime and architecture. The results are shown below.
Lambda Layers | x86_64 (default) Python 3.7 | x86_64 (default) Python 3.8 | x86_64 (default) Python 3.9 | arm64 Python 3.7 | arm64 Python 3.8 | arm64 Python 3.9 |
---|---|---|---|---|---|---|
pandas_cp37_manylinux2014_x86_64 | Pass | Failed | Failed | Pass | Failed | Failed |
pandas_cp37_manylinux2014_aarch64 | Failed | Failed | Failed | Failed | Failed | Failed |
pandas_cp38_manylinux2014_x86_64 | Failed | Pass | Failed | Failed | Pass | Failed |
pandas_cp38_manylinux2014_aarch64 | Failed | Pass | Failed | Failed | Failed | Failed |
pandas_cp39_manylinux2014_x86_64 | Failed | Failed | Pass | Failed | Failed | Pass |
pandas_cp39_manylinux2014_aarch64 | Failed | Failed | Failed | Failed | Failed | Failed |
I pushed the source code that generates the Lambda functions and layers to the GitHub repo https://github.com/camillehe1992/pandas-compatible-on-aws-lambda. In short, choose the --platform and --python-version values that match your Lambda function's runtime when installing your packages.
Reference
- https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html
- https://repost.aws/knowledge-center/lambda-python-package-compatible
- https://github.com/aws/aws-lambda-base-images
Thanks for reading. I appreciate your comments on content and grammar!