DEV Community

TensorFlow Model Deployment using FastAPI & Docker

Kushal Vala ・4 min read




In this article, we build a TensorFlow (v2) model, expose its predictions through a REST API built with FastAPI, and finally containerize everything using Docker 😃

I want to emphasize FastAPI and how this framework is a game-changer for quickly building fast, easy-to-use APIs for a machine learning pipeline.
Traditionally, we have used Flask microservices to build REST APIs, but that process involves a fair amount of nitty-gritty to understand the framework and implement it.
FastAPI, on the other hand, I found to be user-friendly and very easy to pick up and implement.

And finally, from one game-changer to another: Docker.
As data scientists, our role is loosely defined and keeps changing from year to year. Some skills get added and others become obsolete, but Docker has established itself as one of the most sought-after skills in the market. Docker lets us containerize a solution together with all of its software dependencies and requirements.

The Data

We use a text classification problem, the IMDb dataset, to build the model.

The dataset comprises 50,000 movie reviews and poses a binary classification problem in which the target variable is the sentiment: positive or negative.


We use TensorFlow's TextVectorization layer, which standardizes, tokenizes, and integer-encodes the raw text, and which can be placed directly in the graph of a Sequential or Functional model.

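import tensorflow as tf

encoder = tf.keras.layers.experimental.preprocessing.TextVectorization(
    max_tokens=VOCAB_SIZE, standardize='lower_and_strip_punctuation',
    output_mode='int', output_sequence_length=200)

# Learn the vocabulary from the review text only (labels are dropped).
# train_dataset is assumed to be the tf.data training split of (text, label) pairs.
encoder.adapt(train_dataset.map(lambda text, label: text))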

We can also pass a custom standardization function written for our own use case, but there are some bugs in TensorFlow 2.4.1 that cause trouble when creating the REST API call for the model.
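To make the encoder's job more concrete, here is a rough plain-Python sketch of what `'lower_and_strip_punctuation'` standardization followed by integer encoding and padding amounts to. It is an illustration only, not TensorFlow's implementation: the helper names are made up for this example, and details such as tie-breaking in the vocabulary may differ. Index 0 is used for padding and index 1 for out-of-vocabulary tokens, as in the real layer.

```python
import string
from collections import Counter


def standardize(text: str) -> str:
    """Lowercase the text and drop ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def build_vocabulary(texts, max_tokens):
    """Return a token list: padding, unknown marker, then most frequent tokens."""
    counts = Counter(tok for text in texts for tok in standardize(text).split())
    frequent = [tok for tok, _ in counts.most_common(max_tokens - 2)]
    return ["", "[UNK]"] + frequent


def encode(text, vocabulary, sequence_length):
    """Map tokens to integer ids, then truncate or zero-pad to a fixed length."""
    index = {tok: i for i, tok in enumerate(vocabulary)}
    ids = [index.get(tok, 1) for tok in standardize(text).split()]
    ids = ids[:sequence_length]
    return ids + [0] * (sequence_length - len(ids))


vocab = build_vocabulary(["good good good film film bad"], max_tokens=10)
print(vocab)                                    # ['', '[UNK]', 'good', 'film', 'bad']
print(encode("Good, FILM! unseen", vocab, 5))   # [2, 3, 1, 0, 0]
```

The fixed output length is what lets every review, long or short, enter the Embedding layer as a sequence of the same size.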


As we can see below, the encoder layer sits in front of the Embedding layer, which outputs a 256-dimensional vector for each token.
The rest of the graph is self-explanatory, although note that we emit a single probability through a sigmoid instead of using a 2-class softmax layer: the closer the probability is to 1, the more positive the sentiment of the review, and vice versa.

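# Creating the model
from tensorflow.keras.layers import Dense, Embedding

model = tf.keras.Sequential([
    encoder,  # TextVectorization layer: raw strings -> integer token ids
    Embedding(input_dim=len(encoder.get_vocabulary()), output_dim=256, mask_zero=True),
    # Note: a layer that reduces the token sequence to one vector (pooling or a
    # recurrent layer) is needed before the Dense head; this listing does not show one.
    Dense(128, activation='relu'),
    Dense(1, activation='sigmoid')])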


After initialising the graph, we compile and fit the model:

# Compiling the model
model.compile(optimizer= tf.keras.optimizers.Adam(learning_rate= 0.001), 
              loss = tf.keras.losses.BinaryCrossentropy(from_logits= False), metrics = ['accuracy'])

# Training the model
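# train_dataset is assumed to be the training split; any further fit arguments are not shown here
history = model.fit(train_dataset, epochs=10)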
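Because the loss is created with `from_logits=False`, it treats the model's output as a probability rather than a raw score. As a small worked example, the per-sample binary cross-entropy can be computed by hand as below. The function names and the clipping constant are assumptions made for this sketch; Keras applies a similar clip internally, but this is not its code.

```python
import math


def binary_cross_entropy(y_true: int, y_prob: float, eps: float = 1e-7) -> float:
    """Loss for one sample: -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    p = min(max(y_prob, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def mean_binary_cross_entropy(labels, probs) -> float:
    """Average the per-sample loss over a batch."""
    losses = [binary_cross_entropy(y, p) for y, p in zip(labels, probs)]
    return sum(losses) / len(losses)


# A confident correct prediction is cheap, a confident wrong one is expensive.
print(binary_cross_entropy(1, 0.9))
print(binary_cross_entropy(1, 0.1))
print(mean_binary_cross_entropy([1, 0], [0.9, 0.2]))
```

An uninformative prediction of 0.5 costs log(2), roughly 0.693, so a trained model should report a loss noticeably below that.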


After training, we evaluate the model on the test dataset and get a reasonably satisfactory test accuracy of about 86.3%.
(Our main focus here is the API and Docker, not squeezing more performance out of the model.)

# Evaluating the model on test dataset
loss, accuracy = model.evaluate(test_dataset)
print("Loss: ", loss)
print("Accuracy: ", accuracy)

# Output:
# Loss:  0.3457462191581726
# Accuracy:  0.8626000285148621

# Saving the model
model.save('tf_keras_imdb/')

In TensorFlow, we can save a model in two formats: 'tf' (SavedModel) or 'h5' (HDF5). Our model cannot be saved in the 'h5' format because it uses the TextVectorization layer, so we save it as a SavedModel.

Before we start creating APIs, we need a particular directory structure that will be utilized for creating a Docker image.

The directory contains two pieces: tf_keras_imdb/, the SavedModel from TensorFlow, and a Python file under app/ that creates the REST APIs using the FastAPI framework.

|--- model
|    |______ tf_keras_imdb/
|--- app
|    |_______
|--- Dockerfile

Whenever we build an API using FastAPI, we use pydantic to declare the type of input our API expects, for example a list, dictionary, string, integer, or float.

To declare that input with pydantic, we subclass BaseModel and list each field with its type.
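As a minimal sketch of such an input declaration, assuming the API takes a single review string: the class name `Review` and the field name `text` are hypothetical choices for this example, not names taken from the original app.

```python
from pydantic import BaseModel, ValidationError


class Review(BaseModel):
    """Hypothetical request body: one movie review as plain text."""
    text: str


# Valid input is parsed into a typed object.
review = Review(text="A wonderful film")
print(review.text)

# Input that does not match the declared schema is rejected.
try:
    Review()  # the required 'text' field is missing
except ValidationError as err:
    print("rejected:", len(err.errors()), "validation error")
```

FastAPI runs the same validation on every incoming request body, so malformed input is turned away before it reaches the model.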

One of the reasons FastAPI is faster and more efficient is that it is built on ASGI (Asynchronous Server Gateway Interface) instead of the traditional WSGI (Web Server Gateway Interface) used by Flask and Django.
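To illustrate the difference between the two interfaces, here is a small standard-library-only sketch: a WSGI app is a plain synchronous callable, while an ASGI app is a coroutine that exchanges event messages with the server. The `call_asgi` helper is a made-up stand-in for a real server such as Uvicorn, and it only covers the bare minimum of an HTTP exchange.

```python
import asyncio


def wsgi_app(environ, start_response):
    """WSGI: a synchronous callable that returns the response body."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from WSGI"]


async def asgi_app(scope, receive, send):
    """ASGI: a coroutine that sends the response as a series of events."""
    assert scope["type"] == "http"
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": b"hello from ASGI"})


async def call_asgi(app):
    """Drive an ASGI app by hand and collect the events it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app({"type": "http", "method": "GET", "path": "/"}, receive, send)
    return sent


messages = asyncio.run(call_asgi(asgi_app))
print(messages[0]["status"], messages[1]["body"])  # 200 b'hello from ASGI'
```

Because the ASGI app awaits its I/O, the server can switch to other requests while one of them is waiting, which a blocking WSGI callable cannot do within a single worker.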

The prediction API is served as a POST request, since the client has to post the data and get the results back.

Uvicorn is a lightning-fast ASGI server implementation; it starts a server on our host machine through which our API serves the model.

We can test our API in Swagger UI, which FastAPI serves at /docs.


Finally, to wrap it all up, we create a Dockerfile

FROM tiangolo/uvicorn-gunicorn-fastapi:python3.7

RUN pip install tensorflow==2.4.1

COPY ./model /model/

COPY ./app /app


CMD ["python", ""]

We build on the base image tiangolo/uvicorn-gunicorn-fastapi, which is publicly available on Docker Hub and makes quick work of creating a Docker image with our own functionality.

To create a docker image and deploy it, we run the following commands, and voila!

docker build -t api .

docker run -d -p 8000:8000 api
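Once the container is running, the endpoint can also be exercised from a script instead of the browser. The sketch below uses only the standard library; the `/predict` route and the `text` field are hypothetical and must match whatever the app actually defines, while port 8000 comes from the `docker run` command above.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/predict"  # hypothetical route on the published port


def build_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Create a JSON POST request carrying one review."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_sentiment(text: str, url: str = API_URL) -> dict:
    """Send the review to the running API and decode the JSON reply."""
    with urllib.request.urlopen(build_request(text, url), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_request("One of the best films I have seen this year.")
print(request.get_method(), request.full_url)
```

With the container up, `predict_sentiment("...")` returns whatever JSON the prediction endpoint produces.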


After working with FastAPI and Docker, I feel this skillset is a necessary part of a data scientist's toolkit. Building a service around our model and deploying it has become easier and much more accessible than it was before.

Github Link:

Kushal Vala
Data Scientist
