# Sentiment Analysis
You have text. You want to know if it's positive, negative, or neutral.
The usual options:
- Azure Cognitive Services - API call per request, pay per character
- ML.NET - train your own model, bring your own dataset
- Python sidecar - run Flask next to your .NET app, serialize everything as JSON
There's a simpler option. Load a pretrained transformer model and run it locally.
```bash
dotnet add package Kjarni
```

```csharp
using Kjarni;

using var classifier = new Classifier("roberta-sentiment");
Console.WriteLine(classifier.Classify("I love this product!"));
// positive (98.5%)
```
The model downloads on first use and is cached locally. Inference runs on CPU by default; GPU is also supported.
No API key. No Python. No container.
## How Sentiment Models Work
A sentiment classifier is a neural network trained on labeled text.
The model has seen millions of examples like:
| Text | Label |
|---|---|
| "This movie was fantastic" | positive |
| "Terrible customer service" | negative |
| "The package arrived on Tuesday" | neutral |
At inference time, the model reads your input text, encodes it into a
high-dimensional vector, and runs that vector through a classification
head that outputs a probability for each label.
```
Input text -> Tokenizer -> Transformer Encoder -> Classification Head -> Probabilities

positive: 98.5%
neutral:  1.2%
negative: 0.3%
```
You pass in text, you get back a label and a score.
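The last step of that pipeline is a softmax, which turns the head's raw scores (logits) into probabilities that sum to 1. Here is a minimal, self-contained sketch of just that step; the logit values are made up for illustration and are not taken from any real model:

```csharp
using System;
using System.Linq;

class SoftmaxDemo
{
    // Softmax: exponentiate each logit, then normalize so the results sum to 1.
    static double[] Softmax(double[] logits)
    {
        // Subtract the max logit first for numerical stability.
        double max = logits.Max();
        double[] exps = logits.Select(x => Math.Exp(x - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => e / sum).ToArray();
    }

    static void Main()
    {
        // Hypothetical raw scores from a classification head.
        var labels = new[] { "positive", "neutral", "negative" };
        var logits = new double[] { 4.2, -0.2, -1.6 };

        double[] probs = Softmax(logits);
        for (int i = 0; i < labels.Length; i++)
            Console.WriteLine($"{labels[i]}: {probs[i]:P1}");
    }
}
```

A large gap between the top logit and the rest yields a confident prediction; logits close together yield the kind of near-tie you'll see in some examples below.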
## Three-Class Sentiment
The roberta-sentiment model classifies text as positive, negative, or neutral.
It was trained on ~124M tweets and handles informal text, slang, and emoji well.
```csharp
using var classifier = new Classifier("roberta-sentiment");

var inputs = new[] {
    "I love this product!",
    "Terrible quality, broke after one day.",
    "It's okay I guess.",
    "The packaging was nice but the product itself was mediocre.",
    "Just received my order",
};

foreach (var text in inputs)
    Console.WriteLine($"{classifier.Classify(text)} \"{text}\"");
```

```
positive (98.5%) "I love this product!"
negative (94.1%) "Terrible quality, broke after one day."
positive (52.9%) "It's okay I guess."
negative (81.4%) "The packaging was nice but the product itself was mediocre."
positive (57.5%) "Just received my order"
```
Notice the nuance. "It's okay I guess" is technically positive but barely, at 52.9%.
The model picks up on hedging. "The packaging was nice but the product itself was mediocre"
is classified negative because the overall sentiment leans that way despite the
positive clause.
## Getting All Scores
`Classify()` returns the top label. To see the full probability distribution:
```csharp
var result = classifier.Classify("The packaging was nice but the product itself was mediocre.");
Console.WriteLine(result.ToJson());
```

```json
{
  "label": "negative",
  "score": 0.8138,
  "predictions": [
    {"label": "negative", "score": 0.8138},
    {"label": "neutral", "score": 0.1615},
    {"label": "positive", "score": 0.0247}
  ]
}
```
The scores sum to 1.0. The model is 81.4% confident this is negative,
16.2% neutral, 2.5% positive. In production, you might treat anything
below 70% confidence as "uncertain" rather than taking the label at face value.
## Five-Star Sentiment (Multilingual)
If you need finer granularity, bert-sentiment-multilingual maps text to a 1-5 star rating.
It works across English, German, French, Spanish, Italian, and Portuguese.
```csharp
using var classifier = new Classifier("bert-sentiment-multilingual");

var inputs = new[] {
    ("en", "Absolutely amazing!"),
    ("es", "Esta es la peor compra que he hecho."),
    ("de", "Das ist ganz okay."),
    ("fr", "C'est terrible."),
    ("it", "Non male, ma potrebbe essere meglio."),
};

foreach (var (lang, text) in inputs)
    Console.WriteLine($"[{lang}] {classifier.Classify(text)} \"{text}\"");
```

```
[en] 5 stars (96.7%) "Absolutely amazing!"
[es] 1 star (94.1%) "Esta es la peor compra que he hecho."
[de] 3 stars (77.7%) "Das ist ganz okay."
[fr] 1 star (70.4%) "C'est terrible."
[it] 3 stars (83.7%) "Non male, ma potrebbe essere meglio."
```
Same API. The model handles language detection internally.
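Star labels are easy to aggregate because they map back to numbers. A small sketch, assuming the label strings follow the `1 star` / `5 stars` format shown above and that `reviews` is an array of review texts:

```csharp
// Hypothetical helper: parse "3 stars" -> 3.
// Assumes the "<n> star(s)" label format shown in the output above.
static int ToStars(string label) => int.Parse(label.Split(' ')[0]);

// Average rating across a batch of reviews.
var ratings = reviews.Select(r => ToStars(classifier.Classify(r).Label));
Console.WriteLine($"Average rating: {ratings.Average():F1} / 5");
```

This gives you a single number per product that tracks review sentiment over time.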
## Emotion Detection
Sentiment tells you positive or negative. Emotion tells you why.
The distilroberta-emotion model classifies text into seven emotions:
anger, disgust, fear, joy, neutral, sadness, and surprise.
```csharp
using var classifier = new Classifier("distilroberta-emotion");

var inputs = new[] {
    "I just got promoted!",
    "My dog passed away yesterday.",
    "I can't believe they did that to me.",
    "I'm so nervous about the interview tomorrow.",
};

foreach (var text in inputs)
    Console.WriteLine($"{classifier.Classify(text)} \"{text}\"");
```

```
surprise (50.7%) "I just got promoted!"
sadness (98.4%) "My dog passed away yesterday."
surprise (89.2%) "I can't believe they did that to me."
fear (99.4%) "I'm so nervous about the interview tomorrow."
```
"I just got promoted!" is interesting. The model sees it as surprise more than joy.
If you need the full breakdown:
```csharp
var result = classifier.Classify("I just got promoted!");
Console.WriteLine(result.ToJson());
```

```json
{
  "label": "surprise",
  "score": 0.5066,
  "predictions": [
    {"label": "surprise", "score": 0.5066},
    {"label": "anger", "score": 0.2376},
    {"label": "joy", "score": 0.0980},
    {"label": "neutral", "score": 0.0664},
    {"label": "disgust", "score": 0.0658},
    {"label": "sadness", "score": 0.0221},
    {"label": "fear", "score": 0.0035}
  ]
}
```
For finer-grained emotions, the roberta-emotions model detects 28 labels
including admiration, amusement, curiosity, gratitude, and others.
## Toxicity Detection
Content moderation is a specific form of classification. The toxic-bert model
scores text across six categories simultaneously:
```csharp
using var classifier = new Classifier("toxic-bert");
Console.WriteLine(classifier.Classify("You are an idiot").ToDetailedString());
```

```
toxic 98.61%
insult 96.00%
obscene 75.64%
severe_toxic 4.56%
identity_hate 1.41%
threat 0.13%
```
This is a multi-label model: each category gets its own independent score (a sigmoid per label rather than one shared softmax), so multiple categories can be true at the same time. A comment can be both toxic and an insult. The scores are independent, not competing, and do not sum to 1.
Compare with something benign:
```csharp
Console.WriteLine(classifier.Classify("I respectfully disagree with your point").ToDetailedString());
```

```
toxic 0.07%
obscene 0.02%
insult 0.02%
identity_hate 0.01%
threat 0.01%
severe_toxic 0.01%
```
All scores near zero. The model is confident this is not toxic.
In production you'd set a threshold (say 80%) and only flag content above it.
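Because the scores are independent, a moderation gate can also check each category against its own threshold. A sketch, assuming the result exposes the per-label scores (as in the JSON output earlier) through a `Predictions` collection with `Label` and `Score` properties; the thresholds and `RejectComment` handler are hypothetical:

```csharp
// Hypothetical per-category thresholds; "threat" is gated more aggressively.
var thresholds = new Dictionary<string, double>
{
    ["toxic"] = 0.80,
    ["severe_toxic"] = 0.50,
    ["threat"] = 0.30,
    ["insult"] = 0.80,
    ["obscene"] = 0.80,
    ["identity_hate"] = 0.50,
};

var result = classifier.Classify(comment);

// Flag the comment if any category crosses its threshold.
var violations = result.Predictions
    .Where(p => thresholds.TryGetValue(p.Label, out var t) && p.Score >= t)
    .Select(p => p.Label)
    .ToList();

if (violations.Count > 0)
    RejectComment(comment, violations); // hypothetical handler
```

Lower thresholds catch more abuse at the cost of more false positives; tune them per category against a sample of your own traffic.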
## Bulk Processing
For analyzing many texts, loop through them:
```csharp
using var classifier = new Classifier("roberta-sentiment");

var reviews = new[] {
    "Fast shipping, great product",
    "Arrived damaged, no response from support",
    "Does what it says",
    "Best purchase I've made this year",
    "Meh",
};

foreach (var text in reviews)
    Console.WriteLine($"{classifier.Classify(text)} \"{text}\"");
```

```
positive (97.2%) "Fast shipping, great product"
negative (85.3%) "Arrived damaged, no response from support"
neutral (82.0%) "Does what it says"
positive (98.1%) "Best purchase I've made this year"
neutral (62.3%) "Meh"
```
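The same loop scales to a file of reviews. A sketch, assuming a plain-text file with one review per line (the filenames are placeholders):

```csharp
using var classifier = new Classifier("roberta-sentiment");
using var writer = new StreamWriter("sentiment-results.tsv");

// One review per line; write label, score, and text as tab-separated values.
foreach (var line in File.ReadLines("reviews.txt"))
{
    if (string.IsNullOrWhiteSpace(line)) continue;
    var result = classifier.Classify(line);
    writer.WriteLine($"{result.Label}\t{result.Score:F3}\t{line}");
}
```

Since the model stays loaded for the lifetime of the `classifier`, the per-line cost is just inference, not model loading.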
## Practical Patterns

### Confidence Thresholding
Don't trust low-confidence predictions blindly:
```csharp
var result = classifier.Classify(text);

if (result.Score > 0.80)
{
    // High confidence: act on it
    SaveSentiment(text, result.Label);
}
else
{
    // Low confidence: flag for review or use a neutral default
    FlagForReview(text, result);
}
```
### Aggregating Sentiment
To get the overall sentiment of a product from many reviews:
```csharp
var results = reviews.Select(r => classifier.Classify(r)).ToArray();

var summary = results
    .GroupBy(r => r.Label)
    .ToDictionary(g => g.Key, g => g.Count());

// { "positive": 847, "negative": 121, "neutral": 232 }
```
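Counting labels treats a 52% positive vote the same as a 98% one. If that matters, weight each vote by its confidence; a sketch using the same `results` array and the `Label`/`Score` shape shown above:

```csharp
// Map each label to a signed polarity: +1 positive, -1 negative, 0 neutral.
double Polarity(string label) => label switch
{
    "positive" => 1.0,
    "negative" => -1.0,
    _ => 0.0,
};

// Weight each vote by the model's confidence.
// A value near +1 means strongly positive overall; near -1, strongly negative.
double overall = results.Average(r => Polarity(r.Label) * r.Score);
Console.WriteLine($"Overall sentiment: {overall:+0.00;-0.00}");
```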
### Combining with Search
Sentiment pairs well with semantic search.
Find relevant documents first, then analyze their sentiment:
```csharp
using var embedder = new Embedder("minilm-l6-v2");
using var classifier = new Classifier("roberta-sentiment");

// Find reviews about shipping
var query = embedder.Encode("shipping and delivery experience");

var relevant = reviews
    .Select(r => (review: r, score: Embedder.CosineSimilarity(query, embedder.Encode(r))))
    .Where(x => x.score > 0.3)
    .OrderByDescending(x => x.score);

// Classify only the relevant ones
foreach (var (review, score) in relevant)
    Console.WriteLine($"{classifier.Classify(review)} \"{review}\"");
```
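One practical note: the snippet above re-encodes every review for each query. If you run several queries over the same corpus, encode the reviews once and reuse the vectors; a sketch under the same assumed API:

```csharp
// Encode each review once, up front.
var encoded = reviews
    .Select(r => (review: r, vector: embedder.Encode(r)))
    .ToArray();

// Each query then costs one encode plus cheap cosine comparisons.
var query = embedder.Encode("shipping and delivery experience");
var relevant = encoded
    .Select(x => (x.review, score: Embedder.CosineSimilarity(query, x.vector)))
    .Where(x => x.score > 0.3)
    .OrderByDescending(x => x.score);
```

For large corpora, persist the vectors so you only pay the encoding cost once.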
## Which Model to Use

| Need | Model | Labels |
|---|---|---|
| Quick positive/negative | distilbert-sentiment | 2 (pos/neg) |
| Positive/negative/neutral | roberta-sentiment | 3 |
| Star rating (multilingual) | bert-sentiment-multilingual | 5 (1-5 stars) |
| Basic emotion | distilroberta-emotion | 7 |
| Fine-grained emotion | roberta-emotions | 28 |
| Content moderation | toxic-bert | 6 (multi-label) |
Start with roberta-sentiment. If you need more detail, move to the multilingual or emotion models. They all share the same API; only the model name changes.
## How This Works Under the Hood
Kjarni loads HuggingFace models
directly from safetensors using memory-mapped I/O. The inference engine is
written in Rust with hand-tuned SIMD kernels. The C# package wraps a single
native library.
These are the same models used by Python's transformers and sentence-transformers
libraries. The outputs match to four decimal places. The difference is you don't
need a Python runtime or 2GB of pip dependencies.
NuGet: https://www.nuget.org/packages/Kjarni
GitHub: https://github.com/olafurjohannsson/kjarni
## Other Resources
- Semantic Search in C# - Embeddings and similarity from scratch
- Build a Document Search Engine in C# - Full hybrid search with indexing and reranking
- BM25 vs TF-IDF: Keyword Search Explained - How keyword search works under the hood
- What are Vector Embeddings? - How machines understand meaning through numbers