The Internet is arguably one of the most important inventions in human history. However, since its creation, it has also become a source of concern. Not only is there an abundance of misinformation, but the prevalence of harmful and offensive content has turned it into a daunting place. This issue has reached alarming levels, leading to severe consequences in some cases.
As a result, major organizations such as Twitch, YouTube, Facebook, and others that rely on user-generated content have been actively working on measures to filter out offensive content. The underlying structures of these platforms are complex, and just a few years ago, creating an effective profanity-filtering tool would have been a difficult task.
Fortunately, with the advent of AI tools such as Gemini and ChatGPT, as well as pre-trained large language models (LLMs) for moderation, integrating content filtering into your application has become much more feasible.
In this article, I want to explore the Azure AI Content Safety service and demonstrate how it can be used to detect potentially harmful content.
Whether you are building an app where people can leave comments under articles or trying to develop a new Instagram to compete with Mister Zuckerberg, you will need to manage user-generated content - uploaded images, comments, and so on. That's where an AI safety tool can help: perhaps you want to block people from uploading offensive or harmful content, or at least be notified when potentially unsafe content is uploaded.
This tool is one of Azure's AI Services, and it addresses precisely the need described above.
From the docs:
AI Content Safety detects harmful user-generated and AI-generated content in applications and services. Azure AI Content Safety includes text and image APIs that allow you to detect material that is harmful.
It offers 4 types of analysis:
- text analysis
- image analysis
- jailbreak risk detection (scans text for the risk of a jailbreak attack on a Large Language Model)
- protected material text detection (scans AI-generated text for known text content (for example, song lyrics, articles, recipes, selected web content))
Today, we're only going to focus on the first one - text analysis - and how you can integrate it into your application.
But before going into the details of the actual implementation, I'd like to mention one more thing - the interactive Content Safety Studio.
This platform enables users to view, explore, and experiment with sample code for detecting harmful content across various modalities. Additionally, it empowers you as a developer to monitor key performance indicators (KPIs) such as technical metrics (latency, accuracy, recall) and business metrics (block rate, block volume, category proportions, language proportions, and more).
To begin the actual implementation, you'll need to have an Azure account with an active Azure Subscription.
To create a new instance of the AI Content Safety service, navigate to the Azure home page and start typing "content safety" in the search bar. This will bring up relevant services related to content safety.
Let's create a new instance.
Choose the resource group, name, location and pricing tier.
You can also configure networking and identity settings if needed. However, for simplicity, you can leave everything at the defaults.
Now that everything is set up, go ahead and click on the "Create" button to proceed with creating the instance.
Navigate to the newly created resource, then select "Keys and Endpoint" from the menu. This section will provide you with the necessary authentication keys and endpoints to interact with the AI Content Safety service.
If you go to the documentation, you will find several ways to use Content Safety - you can call it directly through the REST API or use the language-specific packages (JS, Python, C#).
I am going to use REST - that's why we needed the endpoint and keys.
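For reference, if you'd rather go the SDK route, the official JavaScript package is @azure-rest/ai-content-safety. A minimal text analysis call looks roughly like the sketch below (based on my reading of the SDK samples - treat it as a starting point, not the definitive usage):

  // Minimal sketch using the official JS/TS package (@azure-rest/ai-content-safety).
  // Endpoint and key are the values from the "Keys and Endpoint" page.
  import ContentSafetyClient, { isUnexpected } from "@azure-rest/ai-content-safety";
  import { AzureKeyCredential } from "@azure/core-auth";

  const endpoint = process.env["CONTENT_SAFETY_ENDPOINT"]!;
  const key = process.env["CONTENT_SAFETY_KEY"]!;

  const client = ContentSafetyClient(endpoint, new AzureKeyCredential(key));

  // Analyze a piece of text across the four harm categories
  const result = await client.path("/text:analyze").post({
    body: {
      text: "Text to check",
      categories: ["Hate", "Sexual", "SelfHarm", "Violence"],
    },
  });

  if (isUnexpected(result)) {
    throw result;
  }

  for (const analysis of result.body.categoriesAnalysis ?? []) {
    console.log(`${analysis.category}: severity ${analysis.severity}`);
  }

In this article, though, I'll stick with plain REST calls.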
For demonstration purposes, I've created a simple Astro page where users can leave comments. These comments will be validated using the Content Safety service across four harm categories: Self-Harm, Sexual, Violence, and Hate & Fairness.
You can read more about those here.
For each category, we also receive a severity rating, which indicates how severe the consequences of displaying the flagged content could be.
For text analysis, the service supports a full 0-7 severity scale, where 0 represents content deemed safe and 7 signifies the highest severity; depending on the outputType you request, you get either all eight values or the trimmed set 0, 2, 4, 6.
You don't have to concern yourself with HTML structure and CSS, as we won't be focusing on them in this article. The only code we care about is the script tag where we make a call to the Content Safety endpoint.
<script>
  // Shape of the response returned by the Content Safety text analysis endpoint
  interface ContentSafetyResponse {
    blocklistsMatch: {
      blocklistItemText: string;
      blocklistItemId: string;
      blocklistName: string;
    }[];
    categoriesAnalysis: { category: string; severity: number }[];
  }

  const comment = document.getElementById("comment") as HTMLTextAreaElement;
  const submitBtn = document.getElementById("submitBtn");

  submitBtn?.addEventListener("click", async () => {
    if (!comment) return;

    // "your-endpoint" is the text analysis URL of your resource, e.g.
    // https://<your-resource>.cognitiveservices.azure.com/contentsafety/text:analyze?api-version=2023-10-01
    const response = await fetch("your-endpoint", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": "key", // one of the keys from "Keys and Endpoint"
      },
      body: JSON.stringify({
        text: comment.value,
        categories: ["Hate", "Sexual", "SelfHarm", "Violence"],
        blocklistNames: ["SwearWords"],
        outputType: "FourSeverityLevels",
      }),
    });

    if (!response.ok) {
      console.error("Content Safety request failed:", response.status);
      return;
    }

    const json: ContentSafetyResponse = await response.json();

    // Show every blocklist item that was matched in the comment
    const blocklist = document.getElementById("blocklist");
    json.blocklistsMatch.forEach((match) => {
      if (blocklist) {
        blocklist.innerText = blocklist.innerText + " " + match.blocklistItemText;
      }
    });

    // Display the severity level reported for each harm category
    json.categoriesAnalysis.forEach((analysis) => {
      const category = analysis.category.toLowerCase();
      if (category === "hate") {
        const hateLevel = document.getElementById("hateLevel");
        if (hateLevel) hateLevel.innerText = analysis.severity.toString();
      }
      if (category === "violence") {
        const violenceLevel = document.getElementById("violenceLevel");
        if (violenceLevel) violenceLevel.innerText = analysis.severity.toString();
      }
      if (category === "sexual") {
        const sexualLevel = document.getElementById("sexualLevel");
        if (sexualLevel) sexualLevel.innerText = analysis.severity.toString();
      }
      if (category === "selfharm") {
        const selfharmLevel = document.getElementById("selfharmLevel");
        if (selfharmLevel) selfharmLevel.innerText = analysis.severity.toString();
      }
    });
  });
</script>
We make a POST request to the endpoint, setting the "Content-Type" header to "application/json" and passing the subscription key in the "Ocp-Apim-Subscription-Key" header.
The request body should contain the following parameters:
- text - content that needs to be validated
- categories - all 4 or some of them
- outputType - can be FourSeverityLevels (possible values: 0,2,4,6) or EightSeverityLevels (possible values: 0,1,2,3,4,5,6,7)
- blocklistNames - names of custom blocklists (we will discuss this in more detail later on)
- haltOnBlocklistHit - when set to true, further analyses of harmful content won't be performed in cases where blocklists are hit
Let's test it out.
As you can see, the severity levels for all 4 categories are 0, which means the content is considered safe.
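In raw form, a response for a harmless comment looks roughly like this (illustrative values, matching the ContentSafetyResponse interface from the snippet above):

  {
    "blocklistsMatch": [],
    "categoriesAnalysis": [
      { "category": "Hate", "severity": 0 },
      { "category": "SelfHarm", "severity": 0 },
      { "category": "Sexual", "severity": 0 },
      { "category": "Violence", "severity": 0 }
    ]
  }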
Now let's try something different (just an example - you should only do this in GTA RP).
As you can see, we changed the context of how we want to use the axe, and now we get a violence severity of 4. It's not just keyword matching - the service understands the context as well.
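What you do with those scores is up to you. Here is a minimal sketch of a policy check, reusing the ContentSafetyResponse interface from the snippet above (the threshold of 2 is my own arbitrary choice, not an official recommendation):

  // Decide whether a comment should be blocked based on the analysis result.
  // The default threshold of 2 is an example value - tune it for your own platform.
  function shouldBlock(result: ContentSafetyResponse, threshold = 2): boolean {
    const matchedBlocklistItem = result.blocklistsMatch.length > 0;
    const tooSevere = result.categoriesAnalysis.some(
      (analysis) => analysis.severity >= threshold
    );
    return matchedBlocklistItem || tooSevere;
  }

You could call shouldBlock right after parsing the response and, for example, refuse to save the comment when it returns true.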
You may have already noticed the "blocklistsMatch" property in the response.
What is that?
In most cases, the default AI classifiers are good enough, but sometimes you need more than just the defaults.
This is where blocklists come into play. Essentially, a blocklist is a curated list of words or phrases that, if detected in the text, will trigger a failure in the content check.
As an example, the word "hoser" is a playful Canadian insult (similar to "loser"). Let's say you do not want any insults on your platform.
If someone uses this word, it will pass the check with the default classifiers (unlike the word "loser").
How to fix that?
You already know the answer - blocklists.
One way to create a blocklist is through the API. We are not going to do that here - instead, let's use Content Safety Studio.
Here you can also create a new blocklist.
I am not creating a new one since I already created one earlier, so let's just use it. It's called SwearWords - the name is important because we are going to pass it as one of the body parameters.
Now let's add the word "hoser" to that list.
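If you'd rather script this step than click through the Studio, the equivalent REST calls look roughly like this (a sketch based on the 2023-10-01 API version - double-check the paths and headers against the API reference before using them):

  const endpoint = "https://<your-resource>.cognitiveservices.azure.com";
  const key = "<your-key>";

  // Create (or update) the blocklist itself
  await fetch(`${endpoint}/contentsafety/text/blocklists/SwearWords?api-version=2023-10-01`, {
    method: "PATCH",
    headers: {
      "Content-Type": "application/json",
      "Ocp-Apim-Subscription-Key": key,
    },
    body: JSON.stringify({ description: "Words we don't want on the platform" }),
  });

  // Add "hoser" to the blocklist
  await fetch(
    `${endpoint}/contentsafety/text/blocklists/SwearWords:addOrUpdateBlocklistItems?api-version=2023-10-01`,
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": key,
      },
      body: JSON.stringify({ blocklistItems: [{ text: "hoser" }] }),
    }
  );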
If you were paying attention, you noticed that in the code above we specified blocklist names - the "SwearWords" list was already there.
"blocklistNames": ["SwearWords"]
Now, let's try to validate the last sentence one more time and see the results.
Did something change? The severity levels are still 0. But look at the second property in the response object - blocklistsMatch. It is no longer empty: it shows that our text contained a word from the SwearWords list, so the comment should fail validation.
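The relevant part of the response now looks something like this (the item ID is made up for illustration):

  {
    "blocklistsMatch": [
      {
        "blocklistName": "SwearWords",
        "blocklistItemId": "<some-item-id>",
        "blocklistItemText": "hoser"
      }
    ],
    "categoriesAnalysis": [
      { "category": "Hate", "severity": 0 },
      { "category": "SelfHarm", "severity": 0 },
      { "category": "Sexual", "severity": 0 },
      { "category": "Violence", "severity": 0 }
    ]
  }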
That was just a sample of how you can use it in your apps and there is so much more you can do with this tool. This article aims to introduce you to the capabilities of the AI Content Safety service and demonstrate how you can leverage it to enhance your projects.
It's worth noting that the AI Content Safety service has undergone extensive testing and is not merely a rough prototype in the development phase. It has already been successfully integrated into various Microsoft products, including Copilot, showcasing its reliability and effectiveness.
There is a great video on its functionality featuring Sarah Bird, who was leading the project. Check it out!
Here is the API documentation.
And that's it, guys.
I hope that you have learned something new today!
I would appreciate it if you could like this post or leave a comment below!
Also, feel free to follow me on GitHub and Medium!
Adios, mis amigos)