DEV Community

Nabil Alamin

Posted on • Originally published at

How to deploy a ML model as an API

Hello 👋, this is a quick guide to deploying an ML model as an API, so let's get started.


First off, the model we are going to use is a deepfake model called First Order Motion. Deepfakes allow you to create an artificial version of a person saying or doing something. I first found out about this particular model on Two Minute Papers (an awesome YT channel for lovers of AI ⚡) and wanted to try it for myself. The video below talks more about the model.

In this article we will take this model, which could only be tested using the Jupyter notebook in the repo, and through the power of Python and cloud computing make it accessible as an API.

Tools used

  • Python and Flask: to make the API.
  • Docker: to build the Docker image of the API.
  • Google account with billing and Compute Engine enabled: to create the VM instance where the container will be deployed.


Step 1

Make the

This is the main file of the project; it is where the API's default and POST routes are defined. The route for the home page is defined by routing "/" to a function that renders a landing HTML file, as shown below:

@app.route("/")
def homepage():
    return render_template("index.html", title="JUST WORK")
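
For context, a minimal, self-contained version of the home route might look like the sketch below. It uses `render_template_string` with an inline page so it runs without a `templates/` folder. The inline HTML and the `INDEX_HTML` name are assumptions for illustration; the real project renders `index.html` instead.

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Stand-in for templates/index.html so the sketch runs on its own (assumption).
INDEX_HTML = "<!doctype html><title>{{ title }}</title><h1>{{ title }}</h1>"


@app.route("/")
def homepage():
    # Render the landing page for GET requests to "/".
    return render_template_string(INDEX_HTML, title="JUST WORK")
```

Saved as a module, an app like this can be started with Flask's development server (for example `flask run`), which listens on port 5000 by default.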

Next is defining the POST request, which makes the specified image mirror the specified video. The function that handles this request is based on the model's inference code, as seen in this Colab notebook


This inference code was then given the route "/post" and the appropriate HTTP methods, as seen below:

@app.route('/post', methods=['GET', 'POST'])
def post():
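
The body of the route is not shown here, so the following is only a sketch of how the uploaded inputs could be received, not the author's implementation. It assumes the files arrive as multipart fields named "image" and "video" (the keys used by the test script later in this post); the temporary file names and the JSON replies are hypothetical, and the model inference is left out.

```python
import os
import tempfile

from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/post", methods=["GET", "POST"])
def post():
    if request.method == "GET":
        # Nothing to process on GET; explain how to call the endpoint.
        return jsonify(message="POST an 'image' and a 'video' file to this route.")

    # Field names are assumed to match the keys sent by the client.
    image = request.files.get("image")
    video = request.files.get("video")
    if image is None or video is None:
        return jsonify(error="both 'image' and 'video' files are required"), 400

    # Save the uploads to a temporary folder so they can be read from disk.
    workdir = tempfile.mkdtemp()
    image_path = os.path.join(workdir, "source.png")    # hypothetical name
    video_path = os.path.join(workdir, "driving.mp4")   # hypothetical name
    image.save(image_path)
    video.save(video_path)

    # The model inference would run here (omitted in this sketch).
    return jsonify(image_bytes=os.path.getsize(image_path),
                   video_bytes=os.path.getsize(video_path))
```

Rejecting incomplete uploads with a 400 before any model code runs keeps a bad request from costing minutes of CPU time.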

The post function differs from the original notebook code in how it handles and preprocesses the inputs. After that, it uses essentially the same functions to load the model checkpoints and make the deepfake:

# Imports this snippet relies on (load_checkpoints and make_animation come from the repo's demo.py)
import imageio
from skimage import img_as_ubyte
from demo import load_checkpoints, make_animation

generator, kp_detector = load_checkpoints(config_path='config/vox-256.yaml', checkpoint_path='../vox-cpk.pth.tar', cpu=True)  # set cpu=False to use a GPU
print("generator done")

predictions = make_animation(source_image=image, driving_video=driving_video, generator=generator, kp_detector=kp_detector, relative=True, cpu=True)  # set cpu=False to use a GPU

imageio.mimsave('generatedVideo.mp4', [img_as_ubyte(frame) for frame in predictions], fps=fps)
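
The snippet above stops at writing `generatedVideo.mp4` and does not show what is sent back to the caller. One possibility, assuming the API should return the generated video itself, is Flask's `send_file`. In this sketch the `/result` route and the `OUTPUT_PATH` location are hypothetical and only there to keep the example runnable on its own.

```python
import os
import tempfile

from flask import Flask, send_file

app = Flask(__name__)

# Hypothetical location of the file that imageio.mimsave would write.
OUTPUT_PATH = os.path.join(tempfile.mkdtemp(), "generatedVideo.mp4")


@app.route("/result")
def result():
    if not os.path.exists(OUTPUT_PATH):
        return "no video generated yet", 404
    # Send the file back with a video MIME type.
    return send_file(OUTPUT_PATH, mimetype="video/mp4")
```

In the real project the same `send_file` call could sit at the end of the post function instead of in a separate route.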

The full code can be found here

If you followed the above, you can test it locally and see some nice results. Here's an example test script and the result:

test script ⬇

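import requests

# Send the source image and the driving video as multipart form data
with open('02.png', 'rb') as image, open('test.mp4', 'rb') as video:
    resp = requests.post("http://localhost:5000/post",
                         files={"image": image,   # 94kb file
                                "video": video})  # 10secs vid
## output generation took 03m:03s on cpu (AMD ryzen 7 4800HS)
## for best result use a gpu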

result ⬇

Step 2

Make the Dockerfile

There wasn't much to change from the Dockerfile provided in the repo, except for some additions, as seen below:


RUN DEBIAN_FRONTEND=noninteractive apt-get -qq update \
 && DEBIAN_FRONTEND=noninteractive apt-get -qqy install python3-pip ffmpeg git less nano libsm6 libxext6 libxrender-dev \
 && rm -rf /var/lib/apt/lists/*

COPY . /app/

RUN pip3 install --upgrade pip
RUN pip3 install \ \
  git+ \
  -r requirements.txt

ENTRYPOINT [ "python3" ]

CMD [ "" ]
### requirements.txt

NB: The best thing to do would be to edit the Dockerfile and requirements.txt (as seen above), then add them to a forked version of this repo so the container image can be built successfully.

Step 3

Deploy to Google Cloud Platform as a VM instance on Compute Engine

So first you need to have a Google account. If this is your first time using Google Cloud Platform, you get $300 worth of cloud credit, which comes in handy for this and any other projects later on. Let's get started:

  • Create a project on GCP (Google Cloud Platform), e.g. "photo-mirrors-video"


  • Open your cloud shell editor


  • In the Cloud Shell terminal, run the command below to switch to your project. The project ID in this case is "photo-mirrors-video"
gcloud config set project [PROJECT_ID]


  • Upload a folder containing your version of this project. The uploaded folder should have a structure similar to this


  • Once you have followed the steps up to this point, enter this command in the terminal:
gcloud builds submit --tag[PROJECT_ID]/chosen-image-name
  • Once the container image has finished building, it will be pushed to your Google Container Registry.

  • Go back to the cloud console dashboard, navigate to the Compute Engine option, and select VM instances. Once it opens, click Create Instance.


  • Under machine configuration, choose a minimum of 8 vCPUs to run the container. (A GPU would have been ideal, but the model was built with torch 1.0, so there are compatibility issues with the available configurations.)

  • Check the container option and specify the address of your container image. (Also check all the boxes under advanced.)


  • Specify 30 GB as the size of the container.

  • Allow HTTP traffic in the firewall settings.


  • Give it a few minutes and your API should be live.

For the sake of this example, edit the HTTP firewall rule so that the IP can be reached on all ports.
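
Once the firewall rule is in place, a quick way to check from Python whether the API is answering could look like the sketch below. It uses only the standard library; the helper names `index_url` and `is_live` are made up for this example, and `203.0.113.7` is a documentation-only placeholder for your VM's external IP.

```python
import urllib.request


def index_url(external_ip, port=5000):
    # Build the URL of the index page from the VM's external IP.
    return f"http://{external_ip}:{port}/"


def is_live(url, timeout=5.0):
    # True if the server answers with HTTP 200; False on any connection
    # error, timeout, or HTTP error status.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False


# Placeholder address; replace it with your VM's external IP.
print(index_url("203.0.113.7"))
```

Calling `is_live(index_url(your_ip))` in a loop with a short sleep is a simple way to wait for the container to finish starting.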


  • You can go to the external IP with port 5000 appended, which will take you to the index page and should display this:



If you've followed along up to this point, you have successfully turned an ML model into an API. Congratulations 👏👏. Thanks for sticking with me so far, and stay tuned for more how-to posts. It's been a pleasure sharing what I've learnt this week 👋

