<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ddif06</title>
    <description>The latest articles on DEV Community by ddif06 (@ddif06).</description>
    <link>https://dev.to/ddif06</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F342235%2F97871879-3a28-46e6-81fe-acf9f71ac337.jpeg</url>
      <title>DEV Community: ddif06</title>
      <link>https://dev.to/ddif06</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ddif06"/>
    <language>en</language>
    <item>
      <title>How to run Tensorflow.js on a serverless platform : deploying models</title>
      <dc:creator>ddif06</dc:creator>
      <pubDate>Wed, 25 Mar 2020 12:10:13 +0000</pubDate>
      <link>https://dev.to/oxygenit/how-to-run-tensorflow-js-on-a-serverless-platform-deploying-models-3b1i</link>
      <guid>https://dev.to/oxygenit/how-to-run-tensorflow-js-on-a-serverless-platform-deploying-models-3b1i</guid>
      <description>&lt;p&gt;This is the last part of a 3 articles serie.&lt;br&gt;
In the &lt;a href="https://dev.to/warpjs/introduction-to-tensorflow-js-the-ai-stack-for-javascript-2fb0"&gt;first part&lt;/a&gt;, we introduced neural networks and TensorFlow framework basics.&lt;br&gt;
In the &lt;a href=""&gt;second part&lt;/a&gt;, we explained how to convert existing models from Python to &lt;a href="https://www.tensorflow.org/js" rel="noopener noreferrer"&gt;TensorFlow.js&lt;/a&gt;&lt;br&gt;
Finally, we present today, through an example, how to use an online TensorFlow.js model and deploy it rapidly using our &lt;a href="https://scaledynamics.io/warpjs" rel="noopener noreferrer"&gt;WarpJS&lt;/a&gt; JavaScript Serverless Function-as-a-Service (FaaS).&lt;/p&gt;
&lt;h1&gt;
  
  
  Using an online model with TensorFlow.js
&lt;/h1&gt;

&lt;p&gt;Many public models can be retrieved from web databases. TensorFlow Hub, in particular, hosts models usable with TensorFlow.js, which are generally also available as npm packages.&lt;/p&gt;

&lt;p&gt;We'll use the "toxicity" pre-trained model in the next sections as an example.&lt;/p&gt;

&lt;p&gt;The toxicity model detects whether text contains toxic content such as threatening language, insults, obscenities, identity-based hate, or sexually explicit language. The model was trained on the Civil Comments dataset (&lt;a href="https://figshare.com/articles/data_json/7376747" rel="noopener noreferrer"&gt;https://figshare.com/articles/data_json/7376747&lt;/a&gt;), which contains ~2 million comments labeled for toxicity. The model is built on top of the Universal Sentence Encoder (&lt;a href="https://arxiv.org/pdf/1803.11175.pdf" rel="noopener noreferrer"&gt;Cer et al., 2018&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Browser-based usage:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The model can be directly loaded for use in JavaScript at:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cdn.jsdelivr.net/npm/@tensorflow-models/toxicity" rel="noopener noreferrer"&gt;https://cdn.jsdelivr.net/npm/@tensorflow-models/toxicity&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the HTML, add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://cdn.jsdelivr.net/npm/@tensorflow-models/toxicity"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, in the JS code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// sets the minimum prediction confidence&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;
&lt;span class="c1"&gt;// load and init the model&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;toxicity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="c1"&gt;// apply an inference&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;inputText&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Node-based usage:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Toxicity is also available as an npm module for Node.js (the package actually loads the model from the storage link above):&lt;br&gt;
&lt;code&gt;$ npm install @tensorflow-models/toxicity&lt;/code&gt;&lt;br&gt;
Then, in the JS code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;toxicity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@tensorflow-models/toxicity&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// sets the minimum prediction confidence&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;         &lt;span class="c1"&gt;// sets the minimum prediction confidence&lt;/span&gt;
&lt;span class="c1"&gt;// load and init the model&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;toxicity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="c1"&gt;// apply an inference&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;inputText&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Deploying a model with WarpJS
&lt;/h1&gt;

&lt;p&gt;As discussed before, in-browser inference on large data sets quickly falls short in performance, due to model and data loading time as well as limited computing capabilities (even with accelerated backends).&lt;/p&gt;

&lt;p&gt;Node.js pushes the performance limit further by deploying on a high-performance GPU engine in the network neighborhood of the dataset, but the user faces considerable complexity when addressing distributed processing for the next performance step.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://scaledynamics.io/warpjs/" rel="noopener noreferrer"&gt;WarpJS&lt;/a&gt; JavaScript FaaS enables easy serverless process distribution with very little development effort.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: toxicity model serverless deployment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;WarpJS installation guidelines can be found here: &lt;a href="https://warpjs.dev/docs/getting-started" rel="noopener noreferrer"&gt;Getting started with WarpJS&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can request an account on WarpJS &lt;a href="https://scaledynamics.io/warpjs" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This &lt;a href="https://medium.com/warpjs/implementing-a-serverless-api-proxy-in-10-minutes-5730f039a12c" rel="noopener noreferrer"&gt;article&lt;/a&gt; also provides a good tutorial on all steps to operate WarpJS.&lt;/p&gt;

&lt;p&gt;In our WarpJS serverless operation, the browser acts as the primary input/output interface, through an index.html file.&lt;/p&gt;

&lt;p&gt;It contains a text box to submit the input text to be analyzed and a "classify" button triggering the inference process.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;!DOCTYPE html&amp;gt;&lt;/span&gt;
...
  &lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;
   ...
    &lt;span class="nt"&gt;&amp;lt;h1&amp;gt;&lt;/span&gt;TensorFlow.js toxicity demo with WarpJS&lt;span class="nt"&gt;&amp;lt;/h1&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;form&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"form"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
     &lt;span class="nt"&gt;&amp;lt;input&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"classifyNewTextInput"&lt;/span&gt; &lt;span class="na"&gt;placeholder=&lt;/span&gt;&lt;span class="s"&gt;"i.e. 'you suck'"&lt;/span&gt; 
&lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="nt"&gt;&amp;lt;button&amp;gt;&lt;/span&gt;Classify&lt;span class="nt"&gt;&amp;lt;/button&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/form&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;p&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"result"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/p&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;WarpJS is a Function-as-a-Service platform for JavaScript. Instead of creating HTTP endpoints and using HTTP calls to run a remote inference, we just build a client for the inference function, deploy the function on the FaaS, import the client in the main application (via an import statement in index.js or a script tag in the HTML) and then call it like any JavaScript function.&lt;/p&gt;

&lt;p&gt;index.js (code to be deployed on WarpJS FaaS):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Server initialization&lt;/span&gt;
&lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;tensorflow&lt;/span&gt;&lt;span class="sr"&gt;/tfjs’&lt;/span&gt;&lt;span class="err"&gt;)
&lt;/span&gt;&lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;tensorflow&lt;/span&gt;&lt;span class="sr"&gt;/tfjs-node’&lt;/span&gt;&lt;span class="err"&gt;)
&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;toxicity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;tensorflow&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;toxicity&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// The minimum prediction confidence&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;
&lt;span class="c1"&gt;// Load the model&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;modelLoaded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
&lt;span class="nx"&gt;toxicity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tsModel&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;tsModel&lt;/span&gt;
   &lt;span class="nx"&gt;modelLoaded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="c1"&gt;// Force waiting for the async TensorFlow model load.&lt;/span&gt;
&lt;span class="c1"&gt;// The “deasync” lib turns async function into sync via JS wrapper of Node event loop.&lt;/span&gt;
&lt;span class="c1"&gt;// The “loopWhile” function will wait for the condition resolution to continue.&lt;/span&gt;
&lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="nx"&gt;deasync&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;loopWhile&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;modelLoaded&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// Prediction function&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;classify&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nx"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="c1"&gt;// predict with tensorflow model&lt;/span&gt;
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="c1"&gt;// check toxicity results&lt;/span&gt;
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;toxic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;some&lt;/span&gt;&lt;span class="p"&gt;(({&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
                 &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;match&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;toxic&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;exports&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;classify&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
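As an aside, the reduction inside the classify function can be checked against a hand-built mock of the predictions array. The shape below follows the toxicity model's documented output (one entry per label, one result per input sentence); the label names and values here are invented for illustration:

```javascript
// Mock of the predictions array returned by model.classify(), following the
// toxicity model's documented shape. 'match' is true/false when a probability
// crosses the confidence threshold, and null when the score is inconclusive.
const predictions = [
  { label: 'identity_attack', results: [{ match: false }] },
  { label: 'insult',          results: [{ match: true  }] },
  { label: 'threat',          results: [{ match: null  }] }
];

// Same reduction as in the classify function above: the sentence counts as
// toxic when any label's match is not explicitly false (note that this
// treats an inconclusive null score as toxic as well).
const toxic = predictions.some(({ results }) => results[0].match !== false);
console.log(toxic); // true
```

Note the design choice: using `match !== false` rather than `match === true` means a below-threshold (null) score is flagged as toxic, which errs on the side of caution.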



&lt;p&gt;index.js (main, browser-based application):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="cm"&gt;/*
* Copyright 2020 ScaleDynamics SAS. All rights reserved.
* Licensed under the MIT license.
*/&lt;/span&gt;
&lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="nx"&gt;use&lt;/span&gt; &lt;span class="nx"&gt;strict&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;
&lt;span class="c1"&gt;// import WarpJS engine module&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;engine&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;warpjs&lt;/span&gt;&lt;span class="sr"&gt;/engine&lt;/span&gt;&lt;span class="err"&gt;’
&lt;/span&gt;&lt;span class="c1"&gt;// import deployed inference&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;classify&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;warp-server&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;// on submit form&lt;/span&gt;
&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getElementById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="nx"&gt;form&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;addEventListener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="nx"&gt;submit&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;preventDefault&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
   &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerHTML&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;h2&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nx"&gt;Remote&lt;/span&gt; &lt;span class="nx"&gt;inference&lt;/span&gt; &lt;span class="nx"&gt;running&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/h2&amp;gt;&lt;/span&gt;&lt;span class="err"&gt;’
&lt;/span&gt;   &lt;span class="c1"&gt;// scan textbox&lt;/span&gt;
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;classifyNewTextInput&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;
   &lt;span class="c1"&gt;// invoke inference&lt;/span&gt;
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;toxic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;classify&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
   &lt;span class="c1"&gt;// render result&lt;/span&gt;
   &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;toxic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
     &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerHTML&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&amp;lt;h2 style=”color:red”&amp;gt;Your sentence is TOXIC :(&amp;lt;/h2&amp;gt; &amp;lt;img src=”/img/Pdown.png” alt=””&amp;gt;`&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
     &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerHTML&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
     &amp;lt;h2 style=”color:green”&amp;gt;Your sentence is NON TOXIC :)&amp;lt;/h2&amp;gt;
     &amp;lt;img src=”/img/Pup.png” alt=””&amp;gt;`&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deploying to the WarpJS FaaS is straightforward: just run &lt;code&gt;npm run deploy&lt;/code&gt; to get the URL of the deployed site and start playing with TensorFlow.js.&lt;/p&gt;

&lt;p&gt;Feel free to visit &lt;a href="https://warpjs-744h4bixx1x93pg3oxc3hr4cf.storage.googleapis.com/index.html" rel="noopener noreferrer"&gt;https://warpjs-744h4bixx1x93pg3oxc3hr4cf.storage.googleapis.com/index.html&lt;/a&gt; to see the demo in action.&lt;/p&gt;

&lt;p&gt;#MadeWithTFJS&lt;/p&gt;

&lt;p&gt;Thanks!&lt;/p&gt;

&lt;h1&gt;
  
  
  About the author
&lt;/h1&gt;

&lt;p&gt;Dominique d'Inverno holds an MSc in telecommunications engineering. After 20 years of experience spanning embedded electronics design, mobile computing systems architecture and mathematical modeling, he joined the ScaleDynamics team in 2018 as an AI and algorithm development engineer.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How to run Tensorflow.js on a serverless platform : reusing models</title>
      <dc:creator>ddif06</dc:creator>
      <pubDate>Mon, 16 Mar 2020 17:23:45 +0000</pubDate>
      <link>https://dev.to/oxygenit/how-to-run-tensorflow-js-on-a-serverless-platform-500j</link>
      <guid>https://dev.to/oxygenit/how-to-run-tensorflow-js-on-a-serverless-platform-500j</guid>
      <description>&lt;p&gt;In a &lt;a href="https://dev.to/warpjs/introduction-to-tensorflow-js-the-ai-stack-for-javascript-2fb0"&gt;previous article&lt;/a&gt;, we introduced neural networks and TensorFlow framework basics.&lt;/p&gt;

&lt;p&gt;Today, we present how to convert models developed with Python TensorFlow for use with &lt;a href="https://www.tensorflow.org/js" rel="noopener noreferrer"&gt;TensorFlow.js&lt;/a&gt;, and discuss web-based versus server-based deployment.&lt;/p&gt;

&lt;h1&gt;
  
  
  TensorFlow, from Python to JavaScript
&lt;/h1&gt;

&lt;p&gt;As we introduced in the first article, the original Python TensorFlow offered a declarative-style API.&lt;/p&gt;

&lt;p&gt;The declarative style (which by nature requires a specific debug environment, TensorBoard) and the broad API surface meant a relatively long learning curve for developers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;As a first step, the user constructs a "graph" of all TensorFlow operations (from simple operators on tensors to operations with complete networks, including connections with data sources and sinks).&lt;/li&gt;
&lt;li&gt;Then, they create a "session", in which TensorFlow analyses the graph, resolves the operation schedule and executes all computations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of the long time needed to learn and master this API, Google introduced two significant Python TF improvements in user-friendliness:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An imperative execution mode ("eager execution"), which is far more intuitive for Python (and other scripting-language) programmers and makes debugging easier. It was, however, not fully compatible with all existing features.&lt;/li&gt;
&lt;li&gt;The Keras API: a user-friendly set of high-level operations (network assembly, inference and training) dedicated to neural networks, integrated into TensorFlow from the Keras project in 2017.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;TensorFlow.js, the JavaScript version of TensorFlow (imperative execution), does not include all the TF functionality available in "declarative" mode, but it supports, among others, the full Keras API.&lt;/p&gt;
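To make the distinction concrete, here is a plain-JavaScript analogy (no TensorFlow code involved): the declarative style builds a description of the computation first and evaluates it later in a "session", whereas the eager/imperative style evaluates each statement immediately.

```javascript
// Declarative style: build a graph of operations first...
const graph = {
  op: 'add',
  inputs: [{ op: 'const', value: 2 }, { op: 'const', value: 3 }]
};
// ...then "run a session" that walks the graph and computes the result.
function run(node) {
  if (node.op === 'const') return node.value;
  if (node.op === 'add') return node.inputs.map(run).reduce((a, b) => a + b, 0);
  throw new Error(`unknown op: ${node.op}`);
}
console.log(run(graph)); // 5

// Eager/imperative style: each statement computes its value on the spot,
// so intermediate results can be inspected with an ordinary debugger.
const a = 2;
const b = 3;
console.log(a + b); // 5
```

This is only a language-level sketch of the two execution models, not actual TensorFlow code, but it shows why eager execution is easier to debug: there is no separate graph to inspect.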

&lt;h1&gt;
  
  
  Ready to use TensorFlow.js models
&lt;/h1&gt;

&lt;p&gt;Pre-trained models, usable by non-experts in machine learning, are publicly available in the TensorFlow.js model repository for various applications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Image processing: classification, object detection, body/hand pose estimation, body segmentation, face meshing;&lt;/li&gt;
&lt;li&gt;Text processing: toxicity detection, sentence encoding;&lt;/li&gt;
&lt;li&gt;Speech processing: command recognition;&lt;/li&gt;
&lt;li&gt;Language processing: the newly released MobileBERT model enables applications like chatbots.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of these are also hosted on npm. Feel free to visit the repository &lt;a href="https://www.tensorflow.org/js/models" rel="noopener noreferrer"&gt;https://www.tensorflow.org/js/models&lt;/a&gt; for more details.&lt;/p&gt;

&lt;p&gt;More than 1,000 TensorFlow models and variants are centralized in the &lt;a href="https://www.tensorflow.org/hub" rel="noopener noreferrer"&gt;TensorFlow Hub&lt;/a&gt;, which includes models for Python as well as the models mentioned above, usable in JavaScript.&lt;/p&gt;

&lt;p&gt;As mentioned in our previous article, the &lt;a href="https://magenta.tensorflow.org/" rel="noopener noreferrer"&gt;Magenta project&lt;/a&gt; (music and art using ML), also hosted on npm, provides a JavaScript API using models, among which are recurrent neural networks (RNNs).&lt;/p&gt;

&lt;h1&gt;
  
  
  Converting a Python TF model for JavaScript
&lt;/h1&gt;

&lt;p&gt;Although many ready-to-use models are available online, re-training (or at least fine-tuning) is often required for a specific application case, when not a full re-architecting.&lt;/p&gt;

&lt;p&gt;As Python is widely used in model design and training, situations arise where a model developed with Python TF has to be used with JavaScript (browser or Node.js).&lt;/p&gt;

&lt;p&gt;Knowing the Python TF history briefly summarized above, one won't be surprised to find several formats when the time comes to save or export a trained model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;SavedModel format: includes the complete model architecture, weights and optimizer configuration in a single folder. Such a model can be used without access to the original Python code, and training can be resumed from the checkpoint reached when it was saved;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Keras saved model ('hdf5' format): models created with the Keras API can be saved in a single '.h5' file, which basically contains the same information as a SavedModel;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;frozen model ('.pb'): a variant of the SavedModel that can no longer be trained (only the architecture and weights are saved); it is meant for inference only.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;TensorFlow provides a converter for the Python environment: tensorflowjs_converter.&lt;/p&gt;

&lt;p&gt;It can be installed easily using:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;$ pip install tensorflowjs&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This utility converts the various model file formats generated by the TF Python API into a JSON file plus additional binary files containing the weights.&lt;/p&gt;

&lt;p&gt;For details on the model converter, see the links below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.tensorflow.org/js/guide/conversion" rel="noopener noreferrer"&gt;https://www.tensorflow.org/js/guide/conversion&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.tensorflow.org/js/tutorials/conversion/import_saved_model" rel="noopener noreferrer"&gt;https://www.tensorflow.org/js/tutorials/conversion/import_saved_model&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.tensorflow.org/js/tutorials/conversion/import_keras" rel="noopener noreferrer"&gt;https://www.tensorflow.org/js/tutorials/conversion/import_keras&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In addition, the TensorFlow.js team has just released a model conversion wizard (announced at the TensorFlow Dev Summit 2020).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Converting with the shell command-line utility&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Example for a frozen graph model's '.pb' file. The output node of the TensorFlow graph must be specified:&lt;/p&gt;

&lt;p&gt;$ tensorflowjs_converter \&lt;br&gt;
    --input_format=tf_frozen_model \&lt;br&gt;
    --output_node_names='MobilenetV2/Predictions/Reshape_1' \&lt;br&gt;
    /mobilenet/frozen_model.pb \&lt;br&gt;
    /mobilenet/web_model&lt;/p&gt;

&lt;p&gt;Example for a '.h5' keras model file:&lt;/p&gt;

&lt;p&gt;$ tensorflowjs_converter --input_format=keras /my_path/my_model.h5 /my_tfjsmodel_path&lt;/p&gt;

&lt;p&gt;Both examples create a JSON model file and binary weight files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generating a converted model in python code&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For Keras models, the tensorflowjs Python module includes APIs, callable from Python TF code, that directly output the JSON format.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In Python code where the model is created and trained
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tensorflowjs&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tfjs&lt;/span&gt;
&lt;span class="bp"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(...):&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;   &lt;span class="c1"&gt;# create a layered keras model
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;                                 &lt;span class="c1"&gt;# train model
&lt;/span&gt;    &lt;span class="n"&gt;tfjs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;converters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_keras_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;my_tfjsmodel_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once converted, the model can be loaded in a JavaScript environment, depending on its type (Graph or Layers), with the TensorFlow.js model-loading utilities:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// in JavaScript code inferring the converted model&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loadGraphModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;myTfjsmodelPath/model.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loadLayersModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;myTfjsmodelPath/model.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model is then ready for inference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prediction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;inputData&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Operating a JavaScript model
&lt;/h1&gt;

&lt;p&gt;At some point, a neural network model is sufficiently stable to be used on significant data sets. Depending on the application case, this usage may consist of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inference only: analyzing "production" data sets (texts, images or other media content, etc…) without further training (at least during the analysis).&lt;/li&gt;
&lt;li&gt;inference and training: part of the "production" data sets is also used for continuous network training in order to increase performance with application-specific experience.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While the browser-based and Node-based TensorFlow.js APIs are functionally equivalent, several key factors besides performance drive the choice of where to operate the model: data volumes, transfer bandwidth and privacy.&lt;/p&gt;

&lt;p&gt;Browser-based execution is interesting for highly interactive applications, particularly when processing media streamed in or out locally (webcam, graphical user interfaces, sound, …), and for moderate-size NN whose load time does not cripple the user experience.&lt;/p&gt;

&lt;p&gt;For standard-size models, however, browser-based execution has drawbacks that strongly impact the user experience:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model performance is limited and only moderate-size networks are practical, despite the acceleration provided by TensorFlow.js' WebGL and Wasm backends,&lt;/li&gt;
&lt;li&gt;loading a model can take 15 seconds or even a minute, depending on model size and mobile network performance, which is a long wait for the user,&lt;/li&gt;
&lt;li&gt;memory requirements are high, which restricts model use on low-memory devices and can break application features,&lt;/li&gt;
&lt;li&gt;not all phones and browsers are up to date, so the model may not run on every device.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course, this reflects the current state of the art, and Google is making progress on several of these issues. In the short term, server-based execution with Node.js is an excellent solution that addresses all of these drawbacks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model performance is close to Python TF thanks to the native and GPU-accelerated versions of TF.js for Node.js, so there is no longer a limit on model complexity;&lt;/li&gt;
&lt;li&gt;a server has a very fast network connection, so model loading time drops significantly; servers can also stand ready with models preloaded;&lt;/li&gt;
&lt;li&gt;a server can be provisioned with enough memory to run a model of any size;&lt;/li&gt;
&lt;li&gt;the model is guaranteed to run on any server.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The remaining drawbacks relate to remote data transfers to the server; in particular, moving sensitive data off the device must be managed and covered by the service provider's terms.&lt;/p&gt;

&lt;p&gt;Server-side execution also opens the possibility of performing inference or training within, or at the edge of, the network where the data is stored, reducing latency and data transfer times.&lt;/p&gt;

&lt;p&gt;Only the inference results (usually much lighter than the input data flows) then count as payload from the latency and infrastructure-cost viewpoints.&lt;/p&gt;

&lt;p&gt;Finally, on the server side, the broader TensorFlow ecosystem provides TFX (TensorFlow Extended) to deploy production machine-learning pipelines. The AutoML suite (provided by Google Cloud) also offers a GUI-based way to train and deploy custom ML models without requiring deep machine-learning and NN expertise.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/warpjs/how-to-run-tensorflow-js-on-a-serverless-platform-deploying-models-3b1i"&gt;next article&lt;/a&gt;, we’ll show how to use an online TensorFlow.js model and deploy it rapidly using our &lt;a href="https://scaledynamics.io/warpjs" rel="noopener noreferrer"&gt;WarpJS&lt;/a&gt; JavaScript Serverless Function-as-a-Service (FaaS).&lt;/p&gt;

&lt;p&gt;#MadeWithTFJS&lt;/p&gt;

&lt;p&gt;Thanks!&lt;/p&gt;

&lt;h1&gt;
  
  
  About the author
&lt;/h1&gt;

&lt;p&gt;Dominique d'Inverno holds an MSc in telecommunications engineering. After 20 years of experience spanning embedded electronics design, mobile computing systems architecture and mathematical modeling, he joined the ScaleDynamics team in 2018 as an AI and algorithm development engineer.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Introduction to TensorFlow.js, the AI stack for JavaScript</title>
      <dc:creator>ddif06</dc:creator>
      <pubDate>Wed, 26 Feb 2020 17:37:24 +0000</pubDate>
      <link>https://dev.to/oxygenit/introduction-to-tensorflow-js-the-ai-stack-for-javascript-2fb0</link>
      <guid>https://dev.to/oxygenit/introduction-to-tensorflow-js-the-ai-stack-for-javascript-2fb0</guid>
      <description>&lt;p&gt;In my company we're JavaScript fans, and we're working on providing the JavaScript community with cool products to broaden its use to all computing areas. Artificial Intelligence (AI) is currently one of the areas mainly addressed by Python. JavaScript has everything you would need, in the browser or in the cloud thanks to tensorflow.js. This article is an introduction to AI for JavaScript users. You will find everything to get started with basic notions in mind. We will introduce AI concepts, Google's TensorFlow framework, and then the AI stack for JavaScript: TensorFlow.js.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI concepts
&lt;/h2&gt;

&lt;p&gt;Artificial intelligence, and artificial neural networks (NN) in particular, have gained increasing adoption in many applications over the last five years. This is the result of the convergence of two main evolutions: the availability of efficient architectures in the cloud, and key innovations in neural network training algorithms. It opened new avenues such as deep learning (a term designating neural networks that include several hidden layers between inputs and outputs).&lt;/p&gt;

&lt;p&gt;A neural network is built from several layers of neurons to constitute a ready-to-use AI model. Previously limited to a few layers, networks have gained in complexity, depth, efficiency and precision.&lt;/p&gt;

&lt;p&gt;A key step has been accomplished in machine vision with efficient convolutional neural networks, whose architecture is inspired from the human visual cortex.&lt;/p&gt;

&lt;p&gt;Machine vision and image processing are today primary fields for NN, and many pre-trained models of varying complexity exist, as well as collections of training images (such as the MNIST and ImageNet data sets).&lt;/p&gt;

&lt;h3&gt;
  
  
  Perceptron and layers
&lt;/h3&gt;

&lt;p&gt;Without getting into details, the basic artificial neuron (perceptron) model encountered in most neural networks is shown in the figure below and operates as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it takes multiple input values&lt;/li&gt;
&lt;li&gt;it multiplies each input by a weight value&lt;/li&gt;
&lt;li&gt;it sums all these individual products&lt;/li&gt;
&lt;li&gt;it generates an output from this sum through an activation function, which generally normalizes output values and reduces their spread.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F37h5rysvih99x0dmabvx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F37h5rysvih99x0dmabvx.png" alt="Alt Text" width="800" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Single perceptron model&lt;/em&gt;&lt;/p&gt;
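&lt;p&gt;The four steps above can be sketched in a few lines of plain JavaScript (an illustrative toy with invented weights, not how TensorFlow.js represents neurons internally):&lt;/p&gt;

```javascript
// A single perceptron in plain JavaScript, following the steps above.
const sigmoid = (x) => 1 / (1 + Math.exp(-x)); // a common activation function

function perceptron(inputs, weights, bias, activation = sigmoid) {
  // Multiply each input by its weight and sum the products (plus a bias term).
  const sum = inputs.reduce((acc, x, i) => acc + x * weights[i], bias);
  // Pass the sum through the activation function to produce the output.
  return activation(sum);
}

// Two inputs, both weighted 1, with a bias of -1:
perceptron([1, 1], [1, 1], -1); // ≈ 0.73
```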

&lt;p&gt;Various network architectures are obtained by replicating, interconnecting and cascading such cells, as shown in the figure below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fzsq5lig2n9xqeabr2qti.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fzsq5lig2n9xqeabr2qti.png" alt="Alt Text" width="800" height="676"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;A fully-connected 15-perceptron topology&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Inference and training
&lt;/h3&gt;

&lt;p&gt;The "direct" operation of a neural network, that consists in applying series of unknown inputs to a network whose weights are defined in order to obtain outputs (such as a prediction of next data, a classification of an image, an indication on whether a specific pattern exists in an image, etc ….) is called " &lt;strong&gt;inference&lt;/strong&gt;".&lt;/p&gt;

&lt;p&gt;The operation that consists of computing the weight values of a neural network, usually by submitting series of inputs together with their expected output values, is called "&lt;strong&gt;training&lt;/strong&gt;".&lt;/p&gt;

&lt;p&gt;Training of deep networks is performed by calculating the difference, or error, between the output(s) of the network for a given input and the expected output. The gradient of this error is computed for every weight of the output layer, then propagated down through all layers of the network; this process is called the "gradient backpropagation" algorithm. The weights are then adjusted against their gradients to minimize the error, in an iterative algorithm over multiple inputs, with optimization policies that are tunable by the user.&lt;/p&gt;
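&lt;p&gt;The core idea can be illustrated with a toy one-weight example in plain JavaScript (an invented example with a single linear "neuron", far simpler than real backpropagation across layers):&lt;/p&gt;

```javascript
// Toy gradient descent: learn w so that w * x fits samples built with w = 3.
const samples = [[1, 3], [2, 6], [3, 9]]; // pairs of [input, expected output]
let w = 0;       // initial weight
const lr = 0.05; // learning rate

for (let step = 0; step !== 200; step++) {
  for (const [x, y] of samples) {
    const error = w * x - y;    // difference between output and expected output
    const grad = 2 * error * x; // gradient of the squared error w.r.t. w
    w -= lr * grad;             // adjust the weight against the gradient
  }
}
// w has converged very close to 3
```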

&lt;h3&gt;
  
  
  Neural network model examples
&lt;/h3&gt;

&lt;p&gt;Many types and topologies of neural networks (NN) can be built, and ongoing research continuously improves and enriches the available NN collections. Deep networks can be built by assembling reusable modules (pre-trained or not) that have proven efficient at a given task.&lt;/p&gt;

&lt;p&gt;However, typical module architectures exist that turn out to be efficient at specific tasks.&lt;/p&gt;

&lt;p&gt;In particular :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fully connected networks: in these, all outputs of a layer are connected to all inputs of the next layer, which makes the "trellis" of connections complex for deep structures of this type and makes training tricky to tune. They are often used as final decision layers on top of other structures.&lt;/li&gt;
&lt;li&gt;convolutional networks: directly inspired from the human visual cortex, they are very efficient at image analysis (filtering, features detection), can be easily replicated and grouped to analyze picture regions and separate channels, and stacked in deep structures without prohibitive complexity. Moreover, pre-trained submodules for a specific feature, for instance, can advantageously be reused in different networks, which reduces training times. They are usually topped by a few fully-connected layers depending on the required final outputs.&lt;/li&gt;
&lt;li&gt;recurrent networks: these introduce memory elements which make them able to analyze and predict time series in the broad sense, which can go from text analysis (sentiment classification) to music creation and stock prices trends prediction.&lt;/li&gt;
&lt;li&gt;residual networks: networks (or assemblies of networks) of any type in which the output of a given layer is added to the output of a deeper layer ("skip connections"), which accelerates training in very deep networks.&lt;/li&gt;
&lt;/ul&gt;
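&lt;p&gt;For instance, the fully connected case can be sketched in plain JavaScript as a cascade of dense layers (the layer sizes, weights and ReLU activation below are invented for illustration):&lt;/p&gt;

```javascript
// A dense (fully connected) layer: every output neuron sees every input.
const relu = (x) => Math.max(0, x);

function dense(inputs, weights, biases, activation) {
  // weights[j] holds the input weights of output neuron j.
  return weights.map((row, j) =>
    activation(row.reduce((sum, w, i) => sum + w * inputs[i], biases[j]))
  );
}

// Cascade two layers: 3 inputs -> 2 hidden neurons -> 1 output neuron.
const hidden = dense([1, 2, 3], [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], [0, 0], relu);
const output = dense(hidden, [[0.5, 0.25]], [0], relu); // ≈ [1.5]
```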

&lt;h2&gt;
  
  
  TensorFlow framework
&lt;/h2&gt;

&lt;p&gt;The increasing adoption of AI in big data processing drove the need for frameworks that are efficient at creating and editing complex networks, manipulating various types of data sets (with multidimensional matrices as a baseline), and performing inference and training operations from a high-level view, without having to explicitly express every neuron operation and related algorithm.&lt;/p&gt;

&lt;p&gt;As Google had been pioneering the deployment of AI in its infrastructure for a long time, it open-sourced its home-grown internal framework in 2015 under the name "TensorFlow" (referred to hereafter as TF).&lt;/p&gt;

&lt;p&gt;Built on top of Python, an environment widely adopted by scientists and data analysts for its simplicity and its wealth of math and array libraries, TF first consisted of a declarative-style API. It was comprehensive and versatile, and made it possible to solve almost any problem of tensor (multidimensional matrix) calculus, including, of course, complex neural networks.&lt;/p&gt;

&lt;p&gt;Later on, Google added an imperative mode (for more intuitive programming and more straightforward debugging) and the Keras API, an originally third-party, higher-level function set that makes neural network development, training and inference easier.&lt;/p&gt;

&lt;p&gt;Note that Python TF provides an optimized C++ computing backend, as well as a CUDA-based backend for platforms equipped with NVIDIA GPUs, for performant inference and training.&lt;/p&gt;

&lt;h2&gt;
  
  
  TensorFlow.js (using JavaScript)
&lt;/h2&gt;

&lt;p&gt;In 2018, a JavaScript version of TensorFlow was released: &lt;strong&gt;TensorFlow.js&lt;/strong&gt;, enabling its use in browsers and Node.js. At launch it supported the imperative execution mode and the Keras API but lacked full support for "legacy" Python TF functionalities. TensorFlow.js ramped up rapidly, release after release, &lt;a href="https://youtu.be/iH9CS-QYmZs?t=526" rel="noopener noreferrer"&gt;towards alignment with the Keras Python API&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Although TensorFlow.js supports all advanced functionalities and algorithms for both inference and training, it is mainly used for inference of pre-trained models in web browsers, helped by a WebGL-based computation backend that takes advantage of the GPU within the browser.&lt;/p&gt;

&lt;p&gt;Good examples are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Magenta.js (music and art using machine learning): a Google research project on creative neural networks (&lt;a href="https://magenta.tensorflow.org/" rel="noopener noreferrer"&gt;https://magenta.tensorflow.org/&lt;/a&gt;) that offers open-source tools, models and demos (interactive music composition, amongst others).&lt;/li&gt;
&lt;li&gt;Coco-SSD: object detection in webcam-streamed images using a MobileNet-based NN.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the Node.js side, TensorFlow.js is available in two versions: the first has its whole stack written in JavaScript, while the second uses the same C++ backend, and the same CUDA-based backend for NVIDIA GPUs, as the Python version of TensorFlow.&lt;/p&gt;

&lt;p&gt;The TensorFlow.js team recently released a Wasm backend (optimizing performance in browsers through native C++ kernels without using a GPU), and will soon release a WebGPU backend (based on the evolution of the WebGL standard).&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/ddif06/how-to-run-tensorflow-js-on-a-serverless-platform-500j"&gt;second article of this series&lt;/a&gt;, we describe how to operate TensorFlow.js with models converted from Python TensorFlow or ready-to-use models, in the browser and in the cloud with our &lt;a href="https://scaledynamics.io/warpjs" rel="noopener noreferrer"&gt;WarpJS&lt;/a&gt; product. JavaScript fans will have everything they need to start entering this area. In the meantime, visit &lt;a href="https://www.tensorflow.org/js" rel="noopener noreferrer"&gt;https://www.tensorflow.org/js&lt;/a&gt; for API documentation and installation guidelines, including tutorials, guides and demos.&lt;/p&gt;

&lt;h2&gt;
  
  
  About the author
&lt;/h2&gt;

&lt;p&gt;Dominique d'Inverno holds an MSc in telecommunications engineering. After 20 years of experience spanning embedded electronics design, mobile computing systems architecture and mathematical modeling, he joined the ScaleDynamics team in 2018 as an AI and algorithm development engineer.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
