This post is part of the 2023 C# Advent Calendar. Please check out the other great posts as well!
In this post, I explain what Named Entity Recognition (NER) is and how to leverage Azure AI to integrate NER into a .NET application using C# and the Azure SDK.
Named Entity Recognition, commonly abbreviated as NER, is the task of extracting entities from unstructured text. Entities include people, locations, organizations, dates, and more. These categories are referred to as entity types, and different NER implementations support different entity types. Applied uses of NER include search engines and chatbots. It is a subtask of Natural Language Processing (NLP).
NLP is a field of computer science concerned with processing and understanding human language. While the field can be traced back to the 1950s, recent advances in NLP and AI have made it more accessible than ever. It includes text classification, information extraction, and question answering systems, to name a few. NER is an information extraction task. The diagram below puts NER in context.
Azure AI Language
Azure AI Language is a cloud service that provides NLP features for analyzing text, including NER. A free tier is available. The Microsoft documentation has a full overview of Azure AI Language.
Demo Application
I created a command-line application to show Azure AI Language in action. The full source code is available on GitHub.
The app has two actions, each selected by a command-line argument. The first is -i followed by the text to extract entities from. The second is -f followed by the path to a file. If -f is used, the app reads the text from the file and extracts the entities found in it. This is useful for longer text; the file must be a plain text file. In both cases, the entities found are displayed to the user.
Here is an example using -i:
.\AzLangExample.CLI.exe -i "Santa Clause visits on December 25th"
Type      Text           Score
Person    Santa Clause   0.97
DateTime  December 25th  1.00
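The handling of the -i and -f arguments described above could be sketched roughly as follows. This is an illustration only, not the code from the repository: the CliInput class and the TryResolveText method are hypothetical names, and the sketch assumes exactly two arguments (the flag and its value).

```csharp
using System;
using System.IO;

public static class CliInput
{
    // Hypothetical helper (not from the repository): turns the command-line
    // arguments into the text that should be sent for entity extraction.
    // Returns false when the arguments are not "-i <text>" or "-f <path>".
    public static bool TryResolveText(string[] args, out string text)
    {
        text = string.Empty;
        if (args.Length != 2)
        {
            return false;
        }

        switch (args[0])
        {
            case "-i":
                // The text is passed directly on the command line.
                text = args[1];
                return true;
            case "-f":
                // The text is read from the file at the given path.
                text = File.ReadAllText(args[1]);
                return true;
            default:
                return false;
        }
    }
}
```

A Main method could call TryResolveText(args, out var text) and print a usage message when it returns false.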
The Code
Now that you have seen an example, let's see how the app extracts the entities. The Azure Text Analytics SDK works like other Azure SDKs. First, a TextAnalyticsClient is instantiated with an endpoint and credentials. Then RecognizeEntitiesBatchAsync is called, and its response wraps a RecognizeEntitiesResultCollection. This collection holds one result per document sent, and each result contains a collection of CategorizedEntity objects that have the properties displayed in the app as well as additional properties. The full documentation of this class is available here. Below is the complete method to get entities. There is a limit on the amount of text that can be sent in one document, so the method also splits the content into chunks of up to 5,000 characters before sending it.
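using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure;
using Azure.AI.TextAnalytics;

public async Task<List<Entity>> GetEntities(string content)
{
    var credentials = new AzureKeyCredential(Settings.Instance.AzLanguageServicesKey);
    var endpoint = new Uri(Settings.Instance.AzLanguageEndpoint);
    var client = new TextAnalyticsClient(endpoint, credentials);

    // Split the content so that each document stays within the size limit.
    var contentParts = SplitStringByLength(content, 5000);
    var response = await client.RecognizeEntitiesBatchAsync(contentParts);
    var collection = response.Value;

    var result = new List<Entity>();
    foreach (var docResponse in collection)
    {
        // Reading Entities on a failed document throws, so skip documents with errors.
        if (docResponse.HasError)
        {
            continue;
        }

        foreach (var entity in docResponse.Entities)
        {
            result.Add(new Entity()
            {
                Text = entity.Text,
                Type = entity.Category.ToString(),
                Score = entity.ConfidenceScore,
                // Offset and Length are relative to the chunk, not the original content.
                Start = entity.Offset,
                Length = entity.Length,
            });
        }
    }
    return result;
}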
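The method above calls SplitStringByLength, which is not shown in this post. A minimal sketch of what such a helper might look like is below. It assumes the simplest possible behavior, cutting the content into consecutive pieces of at most maxLength characters; the TextChunker class name is hypothetical and the version in the repository may differ.

```csharp
using System;
using System.Collections.Generic;

public static class TextChunker
{
    // Assumed behavior (not confirmed by the repository): returns consecutive
    // substrings of at most maxLength characters, in their original order.
    public static List<string> SplitStringByLength(string content, int maxLength)
    {
        if (content == null) throw new ArgumentNullException(nameof(content));
        if (maxLength <= 0) throw new ArgumentOutOfRangeException(nameof(maxLength));

        var parts = new List<string>();
        for (var start = 0; start < content.Length; start += maxLength)
        {
            // The last chunk may be shorter than maxLength.
            var length = Math.Min(maxLength, content.Length - start);
            parts.Add(content.Substring(start, length));
        }
        return parts;
    }
}
```

Cutting at fixed positions can split a word, and therefore an entity, across two chunks. A more careful version would break at whitespace or sentence boundaries near the limit.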
Config file
Credentials are required to access the language service. The repository contains a config file with placeholders for real credentials. Add your credentials to this file and rename it to config.json, and the sample app will work with any Azure AI Language resource.
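As a rough illustration of how the values in config.json could reach the code, here is a sketch that loads them with System.Text.Json. The property names AzLanguageServicesKey and AzLanguageEndpoint come from the Settings.Instance calls in the method above, but the LanguageSettings class, the JSON key names, and the loading logic are assumptions; the repository's Settings class may work differently.

```csharp
using System.IO;
using System.Text.Json;

public sealed class LanguageSettings
{
    // Assumed JSON keys, named after the properties used in GetEntities.
    public string AzLanguageServicesKey { get; set; } = "";
    public string AzLanguageEndpoint { get; set; } = "";

    // Hypothetical parser: fails early when either value is missing,
    // instead of failing later when the client is created.
    public static LanguageSettings Parse(string json)
    {
        var settings = JsonSerializer.Deserialize<LanguageSettings>(json);
        if (settings == null
            || string.IsNullOrWhiteSpace(settings.AzLanguageServicesKey)
            || string.IsNullOrWhiteSpace(settings.AzLanguageEndpoint))
        {
            throw new InvalidDataException(
                "The config must contain AzLanguageServicesKey and AzLanguageEndpoint.");
        }
        return settings;
    }

    // Reads and parses the config file from disk.
    public static LanguageSettings Load(string path = "config.json")
    {
        return Parse(File.ReadAllText(path));
    }
}
```

Whatever the loading mechanism, keep the file with real credentials out of source control.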
Conclusion
NER is an NLP task, and NLP is a broad field with applications across many business domains. Cloud services have made NER more accessible than ever. This post showed one example using the AI services available in Azure. There are other cloud and on-premises solutions as well.