
Marcos Gomez Vazquez

Posted on • Originally published at xatkit.com

Implementing a toxicity detector in your chatbots

The Xatkit team cares about good chatbot behavior. Respect is fundamental in online communication, and that applies to chatbots too. We don't want you to offend our bots. And even more importantly, we don't want our clients to waste resources processing toxic comments from trolls and other undesirable visitors.

Today we explain how we have made our chatbots safer by integrating a toxicity detector for the messages they receive.

What is toxicity?

The formal definition of toxicity is "very harmful or unpleasant in a pervasive or insidious way". This kind of language can be offensive to many people or to specific groups, and it should be avoided as much as possible to keep communities healthy and free of hate.

How can it be detected?

As with most Natural Language Processing tasks, a toxicity detector is implemented as a language model that, given an input (e.g. the message we send to a chatbot), produces an output (in this case, the probability that the message is toxic). Most models go further than deciding whether a message is toxic or not: they can also classify the type of toxicity the message contains, such as "insult", "threat", or "obscenity". This can be useful, for instance, when a website admin needs to analyze the main causes of toxicity, ban disrespectful users, or censor inappropriate language.
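To make this more concrete, a toxicity detector can be seen as a function from a message to a score per toxicity label. The interface below is purely illustrative (it is not part of any particular library); it only shows the shape of the input and output such a detector works with.

```java
import java.util.Map;

// Purely illustrative interface: the general shape of a toxicity detector,
// not the API of any real library.
public interface ToxicityDetector {

    // Returns a probability between 0 (0% toxic) and 1 (100% toxic) for each
    // supported label, e.g. {"TOXICITY": 0.92, "INSULT": 0.85, "THREAT": 0.03}.
    Map<String, Double> score(String message);
}
```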

Two ways to add a toxicity detector for your chatbots with Xatkit

Xatkit is a chatbot orchestration platform where we always try to bring you the best NLP solutions out there and make them easy to use in your own bots. For toxicity detection, we have added a new language processor that performs toxicity analysis. Moreover, we provide two different implementations of this toxic language detector, so you're free to choose the one that best fits your needs.

PerspectiveAPI

"Perspective API is the product of a collaborative research effort by Jigsaw and Google’s Counter Abuse Technology team. We open-source experiments, tools, and research data that explore ways to combat online toxicity and harassment." (extracted from Pespective API web page)

This tool can detect many different toxicity labels, some of them in several languages. When using this implementation, Xatkit sends a request to the API for every user utterance and receives a score between 0 (0% toxic) and 1 (100% toxic) for each requested toxicity label. These scores are available to the chatbot designer, who can decide what to do with the message based on its toxicity level.
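For reference, the sketch below shows roughly the kind of request involved, calling the Perspective API directly with Java's built-in HTTP client. The API key, the analyzed text, and the requested labels are placeholders you would adapt to your own setup; in a Xatkit bot you don't write this yourself, the language processor handles it for you.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PerspectiveApiSketch {
    public static void main(String[] args) throws Exception {
        String apiKey = "YOUR_API_KEY"; // placeholder: your own Perspective API key

        // Ask for a TOXICITY score; other labels (INSULT, THREAT, ...) can be added
        // to "requestedAttributes".
        String body = """
                {
                  "comment": {"text": "you are such an idiot"},
                  "languages": ["en"],
                  "requestedAttributes": {"TOXICITY": {}}
                }
                """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=" + apiKey))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The JSON answer contains attributeScores.TOXICITY.summaryScore.value,
        // a number between 0 (not toxic) and 1 (toxic).
        System.out.println(response.body());
    }
}
```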

Detoxify

"At Unitary we are working to stop harmful content online by interpreting visual content in context." (by Laura Hanu from Unitary AI, full article here)

Detoxify is the result of three Kaggle competitions aimed at improving toxicity classifiers, each with a different focus:

  • Toxic Comment Classification Challenge: The first competition aimed to build a generic toxicity classification model covering different kinds of toxicity (insult, threat, obscenity, ...).
  • Unintended Bias in Toxicity Classification: Some words are often used to attack specific groups (e.g. words related to sexual orientation, gender, or race), so biased language models tend to flag them as toxic even when they appear in a perfectly healthy context. The second competition aimed to reduce this unintended bias when classifying toxic messages.
  • Multilingual Toxic Comment Classification: The last competition aimed to classify toxicity in a wide range of languages. The two previous ones worked only in English, so this time the objective was to achieve good results in other languages.

In Xatkit, you can also rely on these models for toxic language detection. As before, the toxicity scores are available to the chatbot designer, who can react and kick trolls out of the conversation. You can check our prototype implementation of a REST API that wraps the Detoxify models and exposes them to Xatkit.
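The exact endpoint and payload of that wrapper are defined in the prototype repository; the snippet below is only a hypothetical illustration of how a bot runtime could query such a service, assuming it runs locally and accepts a JSON body with the message to analyze.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DetoxifyClientSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint and payload: the real URL and JSON shape depend on
        // the prototype REST wrapper, so check its repository before relying on this.
        String endpoint = "http://localhost:8000/detoxify";
        String body = "{\"message\": \"you are such an idiot\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(endpoint))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // The wrapper is expected to answer with one score per toxicity label,
        // mirroring what the underlying Detoxify models return.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```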

Example chatbot

We have built an example bot that detects toxicity in messages, which can be found in our LanguageProcessorsBots directory. Feel free to play with it!

[Image: example usage of the ToxicityPostProcessor in a chatbot]

In this code excerpt, you can see a simple definition of a bot state. We want to get the toxicity value the Perspective API has computed and, based on that value, choose the chatbot response. Here we assume a threshold of 0.7, meaning that every message with a score greater than 0.7 is considered toxic. Note that we also check whether the score has been set properly with the !toxicity.equals(PerspectiveApiScore.DEFAULT_SCORE) boolean expression. A score is equal to the default score when it has not been set properly; it acts as a safety value so the chatbot does not break when accessing the scores.
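As a rough, self-contained sketch of that decision logic: the threshold, the sentinel value, and the reply strings below are placeholders, and the real bot reads the score from the ToxicityPostProcessor through Xatkit's state DSL (see the LanguageProcessorsBots repository for the actual code).

```java
public class ToxicityCheckSketch {

    private static final double TOXICITY_THRESHOLD = 0.7;

    // Stand-in for PerspectiveApiScore.DEFAULT_SCORE: the sentinel a score takes when
    // it has not been set properly (the actual value is defined by Xatkit).
    private static final Double DEFAULT_SCORE = -1.0;

    static String chooseReply(Double toxicity) {
        // Same check as in the excerpt: ignore unset scores, then apply the 0.7 threshold.
        if (!toxicity.equals(DEFAULT_SCORE) && toxicity > TOXICITY_THRESHOLD) {
            return "Please keep it civil.";      // message considered toxic
        }
        return "Thanks for your message!";       // score missing or below the threshold
    }

    public static void main(String[] args) {
        System.out.println(chooseReply(0.93));   // toxic
        System.out.println(chooseReply(0.12));   // not toxic
        System.out.println(chooseReply(-1.0));   // score not set: the bot keeps working
    }
}
```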
