BekahHW

Posted on Nov 15, 2021 • Originally published at bekahhw.github.io on Sep 20, 2021

You're Toxic: Using the Toxicity Model with TensorFlow.js

#tfjs #machinelearning #tensorflow #beginners

One of the things I’ve really enjoyed about learning TensorFlow.js (TFJS) is how quickly you can get it up and running and how many existing projects you can use as well. As someone learning about models, datasets, and things like neural networks that haven’t been a normal part of my vocabulary, I appreciate not being mentally exhausted by the cost of learning a new world of information as I explore this new technology.

As Gant Laborde says in Learning TensorFlow.js:

Solving simple problems you understand with machine learning helps you extrapolate the steps, logic, and trade-offs of solving advanced problems you could never code by hand.

He goes on to say that, in the book, he tries to balance theory and practicality. Below, we’ll dive into a little bit of both, trying to understand the Toxicity model and what’s happening as we get some code up on the screen. (Side note: if you’re interested in exploring other TFJS models, you can check out the official TFJS model list.)

Key Terms

Model: An expression of an algorithm that’s been trained with data to recognize patterns and can make predictions, perform transformations, or react.
Threshold: The minimum prediction confidence that we’ll allow.
Labels: An array of the categories the model checks for, such as insults.
Predictions: An array of objects that shows the raw probabilities for each input, with match set to true or false. However, if neither probability exceeds the threshold, match will be null.
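
To make the relationship between threshold and match concrete, here is a minimal sketch in plain JavaScript. The helper name matchFor is hypothetical and not part of the toxicity package. It only mirrors the behavior described in the terms above, assuming index 0 holds the "does not match" probability and index 1 holds the "matches" probability.

```javascript
// Hypothetical helper (not part of @tensorflow-models/toxicity):
// derives a match value from a pair of probabilities and a threshold.
// Assumes probabilities[0] = "does not match", probabilities[1] = "matches".
function matchFor(probabilities, threshold) {
  const [notMatch, isMatch] = probabilities;
  // Neither probability exceeds the threshold: no decision.
  if (Math.max(notMatch, isMatch) <= threshold) {
    return null;
  }
  return isMatch > notMatch;
}

console.log(matchFor([0.06, 0.94], 0.5)); // true
console.log(matchFor([0.76, 0.24], 0.5)); // false
console.log(matchFor([0.76, 0.24], 0.85)); // null
```

Raising the threshold makes the model more cautious: the same pair of probabilities can move from a clear false to null.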

Now who’s ready to see if you’re toxic?

The Toxicity Classifier

The toxicity model determines if text falls into the following categories: threatening language, insults, obscenities, identity-based hate, or sexually explicit language. You can check out the dataset the model was trained on if you’re interested.

So for our purposes, we’re going to give the model a sentence in the form of a string, and it will classify whether or not it violates one of the categories above.

We’ll see that the predictions give a pair of probabilities for each string in the array. Each pair is represented as two Float32 values between 0 and 1. So, for example, my 5yo called my 7yo “a tiny, little baby poop” one day. If we run that string through the model and get an array that says something like [0.7630404233932495, 0.2369595468044281], it’s 76% likely not a violation and 24% likely a violation.
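
As a rough sketch of how to read one of these pairs, the snippet below converts it to whole-number percentages. The helper name toPercentages is made up for illustration, and it assumes the first value is the "not a violation" probability and the second is the "violation" probability.

```javascript
// Hypothetical helper: turns a [notViolation, violation] probability pair
// into whole-number percentages for easier reading.
function toPercentages(pair) {
  const [notViolation, violation] = pair;
  return {
    notViolation: Math.round(notViolation * 100),
    violation: Math.round(violation * 100),
  };
}

const example = toPercentages([0.7630404233932495, 0.2369595468044281]);
console.log(example); // { notViolation: 76, violation: 24 }
```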

Getting Started

We're going to keep it simple to get your model up and running in this post. The quickest way is using script tags.

In your terminal, cd into the folder you use for your projects, then create a new directory for this project:

mkdir toxicity-classifier

cd into that directory:

cd toxicity-classifier

Using the Script Tag

In your project, create a file called index.html. In that file, we’ll use script tags to get our model up and running quickly.

<!DOCTYPE html>
<html>
  <head>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@3.0.0/dist/tf.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/toxicity@1.2.2"></script>
  </head>
</html>

First, we want to set the threshold. If we don’t set it, the default is 0.85. For this exercise, let’s say that anything 0.5 and above is positive.

const threshold = 0.5

Now, we need to load the model, give it data, and then console.log the predictions. This script goes in the head element, after the two script tags above.

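    <script>
      // The threshold we chose above: the minimum prediction confidence.
      const threshold = 0.5;

      // Load the model, then classify our sentences.
      toxicity.load(threshold).then((model) => {
        const sentences = ["You are a tiny, little baby poop", "My favorite color is blue.", "Shut up!"];

        model.classify(sentences).then((predictions) => {
          // Pretty-print the predictions with two-space indentation.
          console.log(JSON.stringify(predictions, null, 2));
        });
      });
    </script>
    <!-- This is the end of our script tags in the head element. -->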
</head>

VSCode Tip: JSON.stringify

Converts a JavaScript value to a JavaScript Object Notation (JSON) string.

The third argument adds indentation, white space, and line break characters to the return-value JSON text to make it easier to read.

Now let's add just a bit more to let the user know to look at the console. This will come just after the closing head tag:

  <body>
    <h1>Check the console log!</h1>
  </body>
  <!-- Close the html tag we opened above the first head tag. -->
</html>

Now our users know where to look! Looking at the results for our first string, we’d see this for the first two labels:

{
    "label": "identity_attack",
    "results": [
      {
        "probabilities": {
          "0": 0.9983431100845337,
          "1": 0.001656862674281001
        },
        "match": false
      }
    ]
  },
  {
    "label": "insult",
    "results": [
      {
        "probabilities": {
          "0": 0.059056248515844345,
          "1": 0.9409437775611877
        },
        "match": true
      }
    ]
  }

What we see here tells us that “You are a tiny, little baby poop” is not classified as an identity attack, but the model is 94% sure that it is, in fact, an insult.
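
If you only want the labels that matched, a small helper can pull them out of the predictions array. This is a sketch, not part of the toxicity package: matchedLabels is a hypothetical name, and samplePredictions is copied by hand from the output above (first sentence only) so the snippet runs without loading the model.

```javascript
// Sample data copied from the output above (first sentence only).
const samplePredictions = [
  {
    label: "identity_attack",
    results: [
      { probabilities: { 0: 0.9983431100845337, 1: 0.001656862674281001 }, match: false },
    ],
  },
  {
    label: "insult",
    results: [
      { probabilities: { 0: 0.059056248515844345, 1: 0.9409437775611877 }, match: true },
    ],
  },
];

// Hypothetical helper: lists the labels that matched for one sentence.
// sentenceIndex is the position of the sentence in the array given to classify.
function matchedLabels(predictions, sentenceIndex) {
  return predictions
    .filter((prediction) => prediction.results[sentenceIndex].match === true)
    .map((prediction) => prediction.label);
}

console.log(matchedLabels(samplePredictions, 0)); // [ 'insult' ]
```

Comparing against true explicitly means a null match (neither probability above the threshold) is treated as "not matched" here, which is a design choice you may want to change.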

Let’s go back a little bit. We’re calling toxicity.load(), but where is the model coming from? That request is going to TFHub, a model hosting service set up by Google for popular community models.

Although the model can only be loaded from TFHub, it is open source, which means we can fork and modify it!

Next up in our code, we run the classify method. This is where the input is passed through the model and we get the results. We give the model data, which is converted to tensors, and the results are then converted back into normal JS primitives. Whoa. That sounds like a lot, and that’s something I hope to explore in a post down the line. But it’s worth noting here to start peeling back the layers of magic we get to dive into while playing with TFJS.

That's it! You’ll notice that the only thing we’ll see is the message to check the console. We could add more functionality to display the labels and probabilities on the screen, but this is all you need to get started!
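
As one possible starting point for showing results on the page, the sketch below turns predictions into readable lines of text. The helper name describePredictions and the demoPredictions sample are assumptions for illustration; the sample reuses the insult probabilities from the output earlier in this post.

```javascript
// Hypothetical helper: builds one readable line per label for a sentence.
function describePredictions(predictions, sentenceIndex) {
  return predictions.map((prediction) => {
    const { probabilities, match } = prediction.results[sentenceIndex];
    // probabilities[1] is assumed to be the "matches this label" probability.
    const percent = (probabilities[1] * 100).toFixed(1);
    return `${prediction.label}: ${percent}% (match: ${match})`;
  });
}

// Hand-written sample shaped like the model's output.
const demoPredictions = [
  {
    label: "insult",
    results: [
      { probabilities: [0.059056248515844345, 0.9409437775611877], match: true },
    ],
  },
];

console.log(describePredictions(demoPredictions, 0).join("\n"));
// insult: 94.1% (match: true)
```

In the page from this post, you could call a helper like this inside the classify callback and assign the joined lines to the textContent of an element in the body.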

And if you want to explore some more, here are some resources to check out: