<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Haoli Yin</title>
    <description>The latest articles on DEV Community by Haoli Yin (@haoliyin).</description>
    <link>https://dev.to/haoliyin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F904164%2F0af88a81-2ef9-4c20-97ae-c3e0f74c37a4.jpg</url>
      <title>DEV Community: Haoli Yin</title>
      <link>https://dev.to/haoliyin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/haoliyin"/>
    <language>en</language>
    <item>
      <title>Show Off Your Computer Vision Model with Gradio</title>
      <dc:creator>Haoli Yin</dc:creator>
      <pubDate>Fri, 12 Aug 2022 14:52:00 +0000</pubDate>
      <link>https://dev.to/haoliyin/show-off-your-computer-vision-model-with-gradio-1k3o</link>
      <guid>https://dev.to/haoliyin/show-off-your-computer-vision-model-with-gradio-1k3o</guid>
      <description>&lt;p&gt;Most of machine learning development originates from academia, which is often very technical and hard to understand for the layperson. Personally, I enjoy being able to interact with the end result with my own inputs without having to understand the technicalities of the model until later. Another motivation to write this piece is that I haven't seen many public articles or examples using Gradio and HuggingFace hosting to demo computer vision models with image input and output as HuggingFace is traditionally a hub for natural language processing (NLP) knowledge (but they're shifting to incorporate more of other ML domains like computer vision). &lt;/p&gt;

&lt;h2&gt;
  
  
  What are HuggingFace and Gradio
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1u64v34tov7a3tdqitrz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1u64v34tov7a3tdqitrz.png" alt="Gradio and HuggingFace Logos"&gt;&lt;/a&gt; &lt;br&gt;
No, HuggingFace is not those face-grabbers from the movie &lt;a href="https://en.wikipedia.org/wiki/Aliens_(film)" rel="noopener noreferrer"&gt;Aliens&lt;/a&gt; (although that's what I first thought it was). It's actually the hugging emoji that's now at the top of my most-used emoji keyboard list! HuggingFace is not only a hub for ML models and demos but also a Python library that lets anyone create and train select model architectures (e.g. transformers) without writing everything from scratch. For our case, HuggingFace Spaces lets us host our model for public deployment for free, without extra fees, full-stack development knowledge, or headaches from other platforms. Note that demoing on a GPU does cost extra, so I will show a CPU-only example. &lt;/p&gt;

&lt;p&gt;Gradio, on the other hand, is the Python library that allows for user-friendly functions and front-end development using a single line of code. This interface can be used to create simple visual demos like mine (link in next paragraph) to complex interfaces like &lt;a href="https://huggingface.co/spaces/skytnt/full-body-anime-gan" rel="noopener noreferrer"&gt;this one&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Demoing my Research Project
&lt;/h2&gt;

&lt;p&gt;In this tutorial, I'll show you how to quickly deploy a demo of a model that takes an image input and outputs an instance segmentation map overlaid onto the image. This demo relates to a portion of my research project: detecting &lt;a href="https://en.wikipedia.org/wiki/Specular_reflection" rel="noopener noreferrer"&gt;specular reflection&lt;/a&gt; regions, which are created when light reflects off wet organ surfaces and blinds the camera to the true colors and features in that region. The generated segmentation map is later used to restore the region using information from adjacent frames of the endoscopy video. &lt;strong&gt;Trigger Warning&lt;/strong&gt;: Blood, Biological Tissue. EDIT: the public demo is taken down until further notice. Please &lt;a href="https://haoliyin.me" rel="noopener noreferrer"&gt;contact me&lt;/a&gt; if you want a demo. &lt;/p&gt;
&lt;h2&gt;
  
  
  Creating a Skeleton of the Code
&lt;/h2&gt;

&lt;p&gt;For any code I work on, I like to first write comments and set up version control, such as GitHub (in this case, HuggingFace hosts its own Git repositories on its platform). To get started, create an account on HuggingFace and make a &lt;a href="https://huggingface.co/spaces" rel="noopener noreferrer"&gt;Space&lt;/a&gt;. From there, you can specify a repository name and license type and select Gradio as the display option. &lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fceb1zszkjcg3wii6hdxb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fceb1zszkjcg3wii6hdxb.png" alt="Making spaces"&gt;&lt;/a&gt;&lt;br&gt;
Then, you can either clone the repo to local storage using Git or modify your code directly on HuggingFace. Next, there are two important files to create: requirements.txt and app.py. The requirements.txt file lists all the pip dependencies that must be installed in the HuggingFace hosting environment so that the Python libraries your code uses will run. The app.py file defines how you want your interface to look and function. The most basic app you can make first is essentially a "hello world" program using Gradio, which they show you after you create the repository.&lt;br&gt;
&lt;/p&gt;
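As a rough illustration (the exact packages and versions depend on your model; these names are assumptions based on the libraries used later in this post), a requirements.txt for this kind of demo might look like:

```text
# requirements.txt - pip dependencies installed by the Spaces build
torch
albumentations
```

Gradio itself is typically provided by the Space's Gradio SDK setting, so it usually doesn't need to be listed here.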

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import gradio as gr

def greet(name):
    return "Hello " + name + "!!"

iface = gr.Interface(fn=greet, inputs="text", outputs="text")
iface.launch()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's dissect this code a bit. The import statement tells Python that you want to use Gradio's functionality (assuming you have made it available via requirements.txt). The function "greet" is where you will put all the machine learning model logic to process an input and return a prediction or output from your model to be displayed. Interface is the main class that defines how you want your demo to look - we will dive more into its options later in this post. Then, all you need to do is launch the demo and HuggingFace will deploy all this code for you automatically. It's as easy as that!&lt;/p&gt;

&lt;h2&gt;
  
  
  Defining model behavior
&lt;/h2&gt;

&lt;p&gt;Now that we have a skeleton of the code we want to write, we will fill it in with the desired functionality. Instead of using the greet function, we want to create a function that will accept an image (read by Gradio as a NumPy or PIL image), preprocess it to be fed into the model, run it through the model with a forward pass for prediction, and output a segmentation map overlayed onto the original image.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def speclab(img):

    # initialize the model
    model = torch.hub.load('Nano1337/SpecLab', 'srdetect', force_reload=True) # for some reasons loads the model in src rather than demo
    model.eval()

    # preprocess image to be used as input
    transforms = A.Compose([
        A.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
        ToTensorV2()
    ])
    input = transforms(image=img)['image']
    input = input.unsqueeze(0)

    # model prediction
    output = model(input)

    # overlay output onto original image
    img[output==255] = [0, 255, 0]

    return img
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
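The overlay at the end relies on NumPy boolean-mask indexing: every pixel where the predicted mask equals 255 is painted green. A minimal standalone illustration of that trick, using a toy 2x2 "image" and mask rather than real model output:

```python
import numpy as np

# toy 2x2 RGB image, all gray
img = np.full((2, 2, 3), 128, dtype=np.uint8)

# toy segmentation mask: only the top-left pixel is flagged (255)
mask = np.array([[255, 0],
                 [0, 0]], dtype=np.uint8)

# paint every flagged pixel green, the same indexing pattern as in speclab()
img[mask == 255] = [0, 255, 0]

print(img[0, 0])  # [  0 255   0] - the flagged pixel is now green
print(img[1, 1])  # [128 128 128] - other pixels are untouched
```

The boolean mask selects whole pixels along the first two axes, so a single assignment recolors every flagged location at once.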



&lt;p&gt;A small detail that you may notice is that I used torch.hub to load my model from a saved location in my personal Github account. You can see how I made a &lt;a href="https://github.com/Nano1337/SpecLab/blob/main/hubconf.py" rel="noopener noreferrer"&gt;hubconf.py file&lt;/a&gt; there to specify how I wanted to load my model for external usage. &lt;/p&gt;
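For reference, a hubconf.py entrypoint can be as small as a single function whose name is the string you pass to torch.hub.load. Below is a hedged sketch of what such a file might contain; the model class, import path, and weights URL are placeholders for illustration, not the actual SpecLab code:

```python
# hubconf.py - lives at the root of the GitHub repo.
# torch.hub.load('Nano1337/SpecLab', 'srdetect') looks up this function by name.

dependencies = ['torch']  # pip packages torch.hub checks for before loading

def srdetect(pretrained=True, **kwargs):
    """Entrypoint: build the model and optionally load trained weights."""
    import torch  # imported lazily so the file itself stays lightweight
    from src.model import SpecLabModel  # placeholder import path

    model = SpecLabModel(**kwargs)
    if pretrained:
        state = torch.hub.load_state_dict_from_url(
            'https://example.com/speclab_weights.pth',  # placeholder URL
            map_location='cpu')
        model.load_state_dict(state)
    return model
```

torch.hub treats every top-level function in hubconf.py as a loadable entrypoint, which is why the string 'srdetect' in the demo code matches a function name here.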

&lt;h1&gt;
  
  
  Importing Example Images
&lt;/h1&gt;

&lt;p&gt;Since the subject I'm working on is quite niche, users won't always have appropriate data to input into the demo. We can define ready-to-use examples to be shown. I stored the images in the same personal GitHub repo and fetch them by their raw image addresses, as shown below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from urllib.request import urlretrieve

# get image examples from github
urlretrieve("https://github.com/Nano1337/SpecLab/blob/main/examples/05829.png?raw=true", "05829.png") # make sure to use "copy image address when copying image from Github"
urlretrieve("https://github.com/Nano1337/SpecLab/blob/main/examples/10384.png?raw=true", "10384.png")
examples = [ # need to manually delete cache everytime new examples are added
    ['05829.png'], 
    ["10384.png"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Another small detail you may notice is that I specified later on in the code that I wanted these examples and their outputs to be cached so that users won't have to wait for the example images to be run through the model (which can take significantly more time since the model is run on CPU only). &lt;/p&gt;

&lt;h1&gt;
  
  
  Creating the Interface
&lt;/h1&gt;

&lt;p&gt;Now comes the front-end work, which is thankfully really easy for a webdev noob like me. Below is my code defining the demo's look using the Interface class; Gradio's documentation also covers more customizable and advanced functionality if you want more front-end flexibility.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# define app features and run
title = "SpecLab Demo"
description = "&amp;lt;p style='text-align: center'&amp;gt;Gradio demo for an ASPP model architecture trained on the SpecLab dataset. To use it, simply add your image, or click one of the examples to load them. Since this demo is run on CPU only, please allow additional time for processing. &amp;lt;/p&amp;gt;"
article = "&amp;lt;p style='text-align: center'&amp;gt;&amp;lt;a href='https://github.com/Nano1337/SpecLab'&amp;gt;Github Repo&amp;lt;/a&amp;gt;&amp;lt;/p&amp;gt;"
css = "#0 {object-fit: contain;} #1 {object-fit: contain;}"
demo = gr.Interface(fn=speclab, 
                    title=title, 
                    description=description,
                    article=article,
                    inputs=gr.Image(elem_id=0, show_label=False), 
                    outputs=gr.Image(elem_id=1, show_label=False),
                    css=css, 
                    examples=examples, 
                    cache_examples=True,
                    allow_flagging='never')
demo.launch()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's walk through this code together. The title and description are self-explanatory, and the article shows some more text at the bottom of the screen. If you're front-end savvy (which I definitely am not), you can also define custom CSS to pretty up your interface. We specify both the input and output components to be Gradio images, but these can be any of the &lt;a href="https://gradio.app/docs/#components" rel="noopener noreferrer"&gt;components&lt;/a&gt; that Gradio supports (similar to components in React.js, if you're familiar with that framework). Other parameters are just nice-to-haves that can be found in the &lt;a href="https://gradio.app/docs/" rel="noopener noreferrer"&gt;Interface documentation&lt;/a&gt;. &lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8syiqyelvvw9u7qagn4k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8syiqyelvvw9u7qagn4k.png" alt="End Result"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Congrats
&lt;/h2&gt;

&lt;p&gt;There you have it! After about 50 lines of code (or even less), you have a functioning demo that doesn't cost anything to deploy. Feel free to share this article, and if you like this content, let me know what you would like to see next from me! Happy deep learning my friends :))&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>ai</category>
      <category>computervision</category>
    </item>
    <item>
      <title>Adversarial ML: Making a Turtle Look like a Rifle</title>
      <dc:creator>Haoli Yin</dc:creator>
      <pubDate>Sun, 07 Aug 2022 03:50:02 +0000</pubDate>
      <link>https://dev.to/haoliyin/adversarial-ml-making-a-turtle-look-like-a-rifle-4ba9</link>
      <guid>https://dev.to/haoliyin/adversarial-ml-making-a-turtle-look-like-a-rifle-4ba9</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The use of machine learning (ML) for image classification is very common today, from identifying objects in the original ImageNet dataset to grading levels of cancerous tumors. Machine learning models trained on such datasets, however, aren't robust to Out-Of-Distribution (OOD) test samples: when assessed on these edge cases, they commonly produce inaccurate classifications. One ongoing &lt;a href="https://www.ood-cv.org/"&gt;challenge&lt;/a&gt; held by the European Conference on Computer Vision (ECCV) specifically addresses these edge cases. While testing edge cases for these models may seem boring, a whole field of ML on the rise, called adversarial ML, is dedicated to addressing the limitations of classification models. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tV5zwb3e--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ahxe156ktd2e77a5v1ps.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tV5zwb3e--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ahxe156ktd2e77a5v1ps.jpg" alt="Out-of-Distribution examples provided by the ECCV OOD challenge" width="880" height="248"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Beyond just testing models on their edge cases, adversarial ML is important for maintaining the security and safe usage of ML models. Adversarial attacks are intentional input patterns generated to fool the classification model based on prior knowledge of its weaknesses. There are several strategies for generating such patterns, but one of the easiest to understand starts from the observation that deep learning classifiers are trained by gradient descent to minimize a defined loss function. Instead of descending, we ascend the gradient with respect to the input to maximize the loss, generating a specific pattern that can be placed on the object so the classifier thinks the object is something else. This is called an untargeted attack, since we simply want the model to misclassify the object as anything else. A targeted attack, on the other hand, fools the classifier into predicting another specific label, such as making a turtle look like a rifle. &lt;/p&gt;
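To make the gradient-ascent idea concrete, here is a minimal sketch in NumPy (not the attack from the paper below): a tiny logistic-regression "classifier" with fixed weights, where we step the input in the direction of the sign of the loss gradient, the core move of the Fast Gradient Sign Method. The weights and input values here are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# a fixed, already-"trained" logistic classifier: p(label=1) = sigmoid(w.x + b)
w = np.array([1.5, -2.0, 0.5])
b = 0.1

def cross_entropy(x, y):
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# gradient of the loss w.r.t. the INPUT x (not the weights):
# dL/dx = (p - y) * w for logistic regression with cross-entropy loss
def input_grad(x, y):
    p = sigmoid(w @ x + b)
    return (p - y) * w

x = np.array([0.2, -0.4, 0.7])  # a clean input with true label y = 1
y = 1

eps = 0.3  # attack budget: how far each input coordinate may move
x_adv = x + eps * np.sign(input_grad(x, y))  # ascend the loss (FGSM step)

print(cross_entropy(x, y), cross_entropy(x_adv, y))  # adversarial loss is higher
```

A small perturbation in the gradient's sign direction raises the loss and pushes the classifier's confidence away from the true label, which is the untargeted attack described above; a targeted attack instead descends the gradient of the loss toward a chosen wrong label.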

&lt;h2&gt;
  
  
  Original Paper
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Y0GGPPFB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/s17gn1grtwprhojd324e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Y0GGPPFB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/s17gn1grtwprhojd324e.png" alt="Fooling a classifier using an adversarial attack. Image from the paper below." width="528" height="299"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;The original 2017 &lt;a href="https://arxiv.org/pdf/1707.07397.pdf"&gt;paper&lt;/a&gt;, "Synthesizing Robust Adversarial Examples" by Athalye et al. of MIT's LabSix group, brought the field of adversarial ML to the public eye when it showed how a specific image pattern applied to a turtle's back fooled a classifier into thinking the turtle was a rifle when observed from any angle. While it is a surprising finding that such a targeted attack can make a turtle look like a rifle, the more important question on everyone's minds (especially the US Department of Defense) was: can a rifle be made to look like a turtle?&lt;/p&gt;

&lt;p&gt;With the introduction of machine learning applications to all aspects of our lives, it is not surprising that the defense industry is leveraging this emerging technology in all ways possible. A couple of examples of relevant use today range from image classification of vehicles for targeting to identifying hidden weapons. If the opposing side knows that the defense industry is using such ML models, they can theoretically pose adversarial attacks and fool the models; thus, the topic of adversarial defenses is naturally introduced. &lt;/p&gt;

&lt;p&gt;Much like the parallel evolution of attacks and defenses between bacteria and viruses, adversarial attacks and adversarial defenses are quickly advancing in research to match the development rate of the other. &lt;/p&gt;

&lt;h2&gt;
  
  
  Closing Remarks
&lt;/h2&gt;

&lt;p&gt;Personally, I'm excited to see how this field progresses as it's still in its early stages. One tweet I saw that made me chuckle said that anyone could write a paper on the adversarial attacks or out-of-distribution data samples of a new state-of-the-art model and get it easily published - theoretically pushing out hundreds of papers every year! &lt;/p&gt;

</description>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
