James Thomas

Posted on Aug 13, 2018 • Originally published at jamesthom.as

Serverless Machine Learning With TensorFlow.js

#serverless #machinelearning #tensorflow #javascript

In a previous blog post, I showed how to use TensorFlow.js on Node.js to run visual recognition on images from the local filesystem. TensorFlow.js is a JavaScript version of the open-source machine learning library from Google.

Once I had this working with a local Node.js script, my next idea was to convert it into a serverless function. Running this function on IBM Cloud Functions (Apache OpenWhisk) would turn the script into my own visual recognition microservice.

Sounds easy, right? It's just a JavaScript library? So, zip it up and away we go... ahem 👊

Converting the image classification script to run in a serverless environment had the following challenges...

TensorFlow.js libraries need to be available in the runtime.
Native bindings for the library must be compiled against the platform architecture.
Models files need to be loaded from the filesystem.

Some of these issues were more challenging than others to fix! Let's start by looking at the details of each issue, before explaining how Docker support in Apache OpenWhisk can be used to resolve them all.

Challenges

TensorFlow.js Libraries

TensorFlow.js libraries are not included in the Node.js runtimes provided by the Apache OpenWhisk.

External libraries can be imported into the runtime by deploying applications from a zip file. Custom node_modules folders included in the zip file will be extracted in the runtime. Zip files are limited to a maximum size of 48MB.

Library Size

Running npm install for the TensorFlow.js libraries used revealed the first problem... the resulting node_modules directory was 175MB. 😱

Looking at the contents of this folder, the tfjs-node module compiles a native shared library (libtensorflow.so) that is 135M. This means no amount of JavaScript minification is going to get those external dependencies under the magic 48 MB limit. 👎

Native Dependencies

The libtensorflow.so native shared library must be compiled using the platform runtime. Running npm install locally automatically compiles native dependencies against the host platform. Local environments may use different CPU architectures (Mac vs Linux) or link against shared libraries not available in the serverless runtime.

MobileNet Model Files

TensorFlow models files need loading from the filesystem in Node.js. Serverless runtimes do provide a temporary filesystem inside the runtime environment. Files from deployment zip files are automatically extracted into this environment before invocations. There is no external access to this filesystem outside the lifecycle of the serverless function.

Models files for the MobileNet model were 16MB. If these files are included in the deployment package, it leaves 32MB for the rest of the application source code. Although the model files are small enough to include in the zip file, what about the TensorFlow.js libraries? Is this the end of the blog post? Not so fast....

Apache OpenWhisk's support for custom runtimes provides a simple solution to all these issues!

Custom Runtimes

Apache OpenWhisk uses Docker containers as the runtime environments for serverless functions (actions). All platform runtime images are published on Docker Hub, allowing developers to start these environments locally.

Developers can also specify custom runtime images when creating actions. These images must be publicly available on Docker Hub. Custom runtimes have to expose the same HTTP API used by the platform for invoking actions.

Using platform runtime images as parent images makes it simple to build custom runtimes. Users can run commands during the Docker build to install additional libraries and other dependencies. The parent image already contains source files with the HTTP API service handling platform requests.

TensorFlow.js Runtime

Here is the Docker build file for the Node.js action runtime with additional TensorFlow.js dependencies.

FROM openwhisk/action-nodejs-v8:latest

RUN npm install @tensorflow/tfjs @tensorflow-models/mobilenet @tensorflow/tfjs-node jpeg-js

COPY mobilenet mobilenet

openwhisk/action-nodejs-v8:latest is the Node.js action runtime image published by OpenWhisk.

TensorFlow libraries and other dependencies are installed using npm install in the build process. Native dependencies for the @tensorflow/tfjs-node library are automatically compiled for the correct platform by installing during the build process.

Since I'm building a new runtime, I've also added the MobileNet model files to the image. Whilst not strictly necessary, removing them from the action zip file reduces deployment times.

Want to skip the next step? Use this image jamesthomas/action-nodejs-v8:tfjs rather than building your own.

Building The Runtime

In the previous blog post, I showed how to download model files from the public storage bucket.

Download a version of the MobileNet model and place all files in the mobilenet directory.
Copy the Docker build file from above to a local file named Dockerfile.
Run the Docker build command to generate a local image.

docker build -t tfjs .

Tag the local image with a remote username and repository.

docker tag tfjs <USERNAME>/action-nodejs-v8:tfjs

Replace <USERNAME> with your Docker Hub username.

Push the local image to Docker Hub

 docker push <USERNAME>/action-nodejs-v8:tfjs

Once the image is available on Docker Hub, actions can be created using that runtime image. 😎

Example Code

This source code implements image classification as an OpenWhisk action. Image files are provided as a Base64 encoded string using the image property on the event parameters. Classification results are returned as the results property in the response.

Caching Loaded Models

Serverless platforms initialise runtime environments on-demand to handle invocations. Once a runtime environment has been created, it will be re-used for further invocations with some limits. This improves performance by removing the initialisation delay ("cold start") from request processing.

Applications can exploit this behaviour by using global variables to maintain state across requests. This is often use to cache opened database connections or store initialisation data loaded from external systems.

I have used this pattern to cache the MobileNet model used for classification. During cold invocations, the model is loaded from the filesystem and stored in a global variable. Warm invocations then use the existence of that global variable to skip the model loading process with further requests.

Caching the model reduces the time (and therefore cost) for classifications on warm invocations.

Memory Leak

Running the Node.js script from blog post on IBM Cloud Functions was possible with minimal modifications. Unfortunately, performance testing revealed a memory leak in the handler function. 😢

Reading more about how TensorFlow.js works on Node.js uncovered the issue...

TensorFlow.js's Node.js extensions use a native C++ library to execute the Tensors on a CPU or GPU engine. Memory allocated for Tensor objects in the native library is retained until the application explicitly releases it or the process exits. TensorFlow.js provides a dispose method on the individual objects to free allocated memory. There is also a tf.tidy method to automatically clean up all allocated objects within a frame.

Reviewing the code, tensors were being created as model input from images on each request. These objects were not disposed before returning from the request handler. This meant native memory grew unbounded. Adding an explicit dispose call to free these objects before returning fixed the issue.

Profiling & Performance

Action code records memory usage and elapsed time at different stages in classification process.

Recording memory usage allows me to modify the maximum memory allocated to the function for optimal performance and cost. Node.js provides a standard library API to retrieve memory usage for the current process. Logging these values allows me to inspect memory usage at different stages.

Timing different tasks in the classification process, i.e. model loading, image classification, gives me an insight into how efficient classification is compared to other methods. Node.js has a standard library API for timers to record and print elapsed time to the console.

Demo

Deploy Action

Run the following command with the IBM Cloud CLI to create the action.

ibmcloud fn action create classify --docker <IMAGE_NAME> index.js

Replace <IMAGE_NAME> with the public Docker Hub image identifier for the custom runtime. Use jamesthomas/action-nodejs-v8:tfjs if you haven't built this manually.

Testing It Out

Download this image of a Panda from Wikipedia.

wget http://bit.ly/2JYSal9 -O panda.jpg

Invoke the action with the Base64 encoded image as an input parameter.

 ibmcloud fn action invoke classify -r -p image $(base64 panda.jpg)

Returned JSON message contains classification probabilities. 🐼🐼🐼

{
  "results":  [{
    className: 'giant panda, panda, panda bear, coon bear',
    probability: 0.9993536472320557
  }]
}

Activation Details

Retrieve logging output for the last activation to show performance data.

ibmcloud fn activation logs --last

Profiling and memory usage details are logged to stdout

prediction function called.
memory used: rss=150.46 MB, heapTotal=32.83 MB, heapUsed=20.29 MB, external=67.6 MB
loading image and model...
decodeImage: 74.233ms
memory used: rss=141.8 MB, heapTotal=24.33 MB, heapUsed=19.05 MB, external=40.63 MB
imageByteArray: 5.676ms
memory used: rss=141.8 MB, heapTotal=24.33 MB, heapUsed=19.05 MB, external=45.51 MB
imageToInput: 5.952ms
memory used: rss=141.8 MB, heapTotal=24.33 MB, heapUsed=19.06 MB, external=45.51 MB
mn_model.classify: 274.805ms
memory used: rss=149.83 MB, heapTotal=24.33 MB, heapUsed=20.57 MB, external=45.51 MB
classification results: [...]
main: 356.639ms
memory used: rss=144.37 MB, heapTotal=24.33 MB, heapUsed=20.58 MB, external=45.51 MB

main is the total elapsed time for the action handler. mn_model.classify is the elapsed time for the image classification. Cold start requests print an extra log message with model loading time, loadModel: 394.547ms.

Performance Results

Invoking the classify action 1000 times for both cold and warm activations (using 256MB memory) generated the following performance results.

warm invocations

Classifications took an average of 316 milliseconds to process when using warm environments. Looking at the timing data, converting the Base64 encoded JPEG into the input tensor took around 100 milliseconds. Running the model classification task was in the 200 - 250 milliseconds range.

cold invocations

Classifications took an average of 1260 milliseconds to process when using cold environments. These requests incur penalties for initialising new runtime containers and loading models from the filesystem. Both of these tasks took around 400 milliseconds each.

One disadvantage of using custom runtime images in Apache OpenWhisk is the lack of pre-warmed containers. Pre-warming is used to reduce cold start times by starting runtime containers before they are needed. This is not supported for non-standard runtime images.

classification cost

IBM Cloud Functions provides a free tier of 400,000 GB/s per month. Each further second of execution is charged at $0.000017 per GB of memory allocated. Execution time is rounded up to the nearest 100ms.

If all activations were warm, a user could execute more than 4,000,000 classifications per month in the free tier using an action with 256MB. Once outside the free tier, around 600,000 further invocations would cost just over $1.

If all activations were cold, a user could execute more than 1,2000,000 classifications per month in the free tier using an action with 256MB. Once outside the free tier, around 180,000 further invocations would cost just over $1.

Conclusion

TensorFlow.js brings the power of deep learning to JavaScript developers. Using pre-trained models with the TensorFlow.js library makes it simple to extend JavaScript applications with complex machine learning tasks with minimal effort and code.

Getting a local script to run image classification was relatively simple, but converting to a serverless function came with more challenges! Apache OpenWhisk restricts the maximum application size to 50MB and native libraries dependencies were much larger than this limit.

Fortunately, Apache OpenWhisk's custom runtime support allowed us to resolve all these issues. By building a custom runtime with native dependencies and models files, those libraries can be used on the platform without including them in the deployment package.

Top comments (6)

James Thomas • Aug 13 '18

Nope - I checked. Definitely no servers...

erikobryant • Dec 19 '23 • Edited

Hey James Thomas,

My name is Erik O’Bryant and I’m assembling a team of developers to create an AI operating system. An OS like this would use AI to interpret and execute user commands (just imagine being able to type plain English into your terminal and having your computer do exactly what you tell it). You seem to know a lot about AI development and so I was wondering if you’d be interested in joining my team and helping me develop the first ever intelligent operating system. If you’re interested, please shoot me a message at erockthefrog@gmail.com and let me know.

K • Aug 14 '18

Cloud providers like IBM have a bunch of additional services.

Ying Chun Guo • Oct 25 '18

Thanks, it works for me.