DEV Community: Nitonde Auric Ergeson

I Built a Hate Speech Detector That Actually Knows the Difference Between Offensive and Hateful

Nitonde Auric Ergeson — Sat, 06 Jun 2026 19:12:09 +0000

Most hate speech models get this wrong: they treat "this movie sucked ass" and "heil hitler" as the same category.

They're not. One is someone venting. The other is an ideological statement. Conflating them makes content moderation either useless (too permissive) or annoying (bans people for swearing). So when I built AuricErgeson/hate-speech-detector, I started with that distinction as a hard requirement.

Three classes, not two

The model outputs one of three labels: neither, offensive, or hate_speech.

That middle class, offensive, is where most binary classifiers fail. They either flag everything offensive as hate speech, or they let actual hate speech through because it doesn't contain obvious slurs. "They control the media" is a good example. No profanity, no slur, but it is a well-documented antisemitic dog whistle. A binary clean/hateful model often misses it. Mine doesn't.

The dataset problem

110,585 labeled examples in total, fused from four public datasets (110,492 examples) plus 93 targeted augmentation examples. The four datasets:

Davidson et al. 2017: 24,783 examples of explicit Twitter slurs and offensive language
ImplicitHate: 21,480 examples of coded language and dog whistles
HateXplain: 19,229 examples with multi-annotator labels and rationales
HateDay 2025: 45,000 examples of contemporary Twitter hate speech

Each dataset uses a different label scheme. Harmonizing them into a unified 3-class system took more time than the actual training. Davidson uses hate/offensive/neither, HateXplain uses hate/offensive/normal, and ImplicitHate is binary. You have to make judgment calls about where the boundaries are and apply them consistently across 110K rows.
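A minimal sketch of what that harmonization step could look like. The exact label strings in each source file, and the choice to map ImplicitHate's positive class to hate_speech and its negative class to neither, are assumptions for illustration, not the mapping actually used for the model.

```python
# Unified 3-class scheme the model predicts.
UNIFIED = ("neither", "offensive", "hate_speech")

# Per-dataset label maps. The source label strings and the ImplicitHate
# binary mapping are assumptions for illustration.
LABEL_MAPS = {
    "davidson": {"hate": "hate_speech", "offensive": "offensive", "neither": "neither"},
    "hatexplain": {"hate": "hate_speech", "offensive": "offensive", "normal": "neither"},
    "implicit_hate": {"hate": "hate_speech", "not_hate": "neither"},
}


def harmonize(dataset: str, label: str) -> str:
    """Map a dataset-specific label onto the unified scheme, failing loudly on unknowns."""
    try:
        return LABEL_MAPS[dataset][label.strip().lower()]
    except KeyError:
        raise ValueError(f"no mapping for {dataset!r} label {label!r}") from None
```

Raising on an unknown label instead of defaulting to neither keeps a typo in one source file from quietly turning into thousands of mislabeled rows.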

The other problem is class imbalance. "Neither" dominates naturally, since most text online is not hateful. Without correction, the model just learns to predict "neither" and gets decent accuracy while being useless. I oversampled to 21,621 examples per class, giving 64,863 total training examples.
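The balancing step can be sketched with the standard library alone. This assumes rows are (text, label) pairs and that every class is brought to the same fixed target: classes below the target are sampled with replacement, and classes above it are sampled down. The function name and that handling of large classes are illustrative assumptions.

```python
import random
from collections import defaultdict


def rebalance(rows, per_class, seed=0):
    """Return a shuffled list with exactly `per_class` rows for every label.

    rows: iterable of (text, label) pairs.
    Classes smaller than the target keep every original row and are topped up
    by sampling with replacement; larger classes are sampled without replacement.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))

    balanced = []
    for items in by_label.values():
        if len(items) >= per_class:
            balanced.extend(rng.sample(items, per_class))
        else:
            balanced.extend(items)  # keep every original row once
            balanced.extend(rng.choices(items, k=per_class - len(items)))
    rng.shuffle(balanced)
    return balanced
```

With three classes and per_class=21621 this yields the 64,863 rows quoted above. Resampling should only touch the training split, after the test set has been held out, so duplicated rows cannot leak into evaluation.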

One more thing: general Twitter corpora barely contain neo-Nazi numeric codes like 1488 or phrases like "14 words." They exist in the real world but not in enough volume to train on. I added 93 targeted augmentation examples covering these specifically. 93 is a small number but it moved the needle on those cases noticeably.

The base model choice

I used cardiffnlp/twitter-roberta-base-2022-154m instead of standard RoBERTa. The reason is simple: it was trained on 154 million tweets through 2022. Hate speech on Twitter has its own grammar, abbreviations, and coded vocabulary. A model that has never seen that register will struggle with it no matter how good the fine-tuning data is.

There was one annoying technical issue. This checkpoint uses legacy TensorFlow-style parameter names for LayerNorm: gamma and beta instead of the standard weight and bias. Transformers version 5.0 and above no longer maps these automatically, so in some configurations the LayerNorm weights silently fail to load. I had to reload the checkpoint manually with the names remapped before training. It took a while to debug because there was no error: the model just trained on randomly initialized LayerNorm parameters.
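A minimal sketch of the renaming, written against a plain dict so it does not depend on any particular loader. It assumes the legacy keys end in ".gamma" and ".beta" (for example "encoder.layer.0.output.LayerNorm.gamma"); the function name is hypothetical.

```python
def remap_layernorm_keys(state_dict):
    """Return a copy of `state_dict` with legacy LayerNorm names renamed.

    Keys ending in ".gamma" become ".weight" and keys ending in ".beta"
    become ".bias". Only the final name component is touched, so all
    other keys pass through unchanged.
    """
    suffixes = {".gamma": ".weight", ".beta": ".bias"}
    remapped = {}
    for key, value in state_dict.items():
        for old, new in suffixes.items():
            if key.endswith(old):
                key = key[: -len(old)] + new
                break
        remapped[key] = value
    return remapped
```

A cheap guard against the silent failure is to load the remapped dict with PyTorch's load_state_dict(..., strict=False) and inspect the missing_keys and unexpected_keys it returns before training starts. Any LayerNorm entry in missing_keys means those parameters are still at their initial values.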

Results

Evaluated on a stratified held-out test set of 11,059 examples. Weighted F1 of 0.849, accuracy 0.843. Per-class F1: neither at 0.884, offensive at 0.870, hate_speech at 0.697.

The hate_speech F1 of 0.697 is the honest weak point. It is harder to classify than the other two because it covers a wide range: explicit slurs, coded language, dog whistles, and symbol use all fall in the same bucket. A model that gets 0.88 on "neither" and 0.70 on "hate_speech" is not broken; it reflects how genuinely ambiguous hate speech classification is.

I ran 8 probe cases manually. 7 passed. The one that failed: "1488" as a standalone 4-digit string. In context ("1488 white power") it classifies correctly. As a bare number it predicts neither. That is a known limitation and it is in the model card.

What it gets right that others miss

"they control the media" classifies as hate_speech with 0.87 confidence. This is the antisemitic dog whistle test. Most models miss it because there is no surface-level offensive vocabulary.

"this movie sucked ass" classifies as offensive, not hate_speech. This matters for moderation. You probably do not want to ban people for this.

"I really enjoyed the concert" classifies as neither with 0.97 confidence. Basic sanity check.

Limitations worth knowing

A few honest limitations. Post-2022 coded language may not be recognized since the base model training data ends there and new slang appears constantly. Academic discussion of hate speech can produce false positives because "researchers studying the n-word" and an actual slur look similar at the token level. English only. And do not use this as the sole decision system for automated account bans. It is a classifier, not a human moderator.
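One way to act on that last point is to put a small routing layer between the classifier and any enforcement. The sketch below is hypothetical: the action names and the 0.5 review threshold are illustrative assumptions, and the only fixed idea is that no branch bans anyone automatically.

```python
def route(label: str, score: float, review_threshold: float = 0.5) -> str:
    """Turn a classifier prediction into a moderation action.

    No branch bans an account automatically: hate_speech predictions
    only ever reach a human moderator or a sampling queue.
    Action names and the threshold are hypothetical.
    """
    if label == "hate_speech":
        return "human_review" if score >= review_threshold else "sample_for_audit"
    if label == "offensive":
        return "allow_with_flag"
    if label == "neither":
        return "allow"
    raise ValueError(f"unexpected label: {label!r}")
```

Under these assumptions, the dog whistle example above (hate_speech at 0.87) lands in the human review queue, while the movie complaint is allowed through with a flag.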

The model

AuricErgeson/hate-speech-detector on HuggingFace. MIT license. There is also a live Gradio demo at the same path under Spaces if you want to test it without any code.

If you find edge cases it gets wrong, post them in the Community tab. The limitation around bare numeric codes is known. Other failures I want to hear about.

Auric Ergeson Nitonde is a software development apprentice in Germany, building NLP tools and publishing models at huggingface.co/AuricErgeson.

I Fine-Tuned a 3B Model for Text-to-SQL and It Actually Works

Nitonde Auric Ergeson — Sat, 06 Jun 2026 18:59:52 +0000

I have been building a logistics SaaS on the side called Frachtdok. At some point I needed a way to let non-technical users query data without touching SQL. I looked at the usual options: GPT-4 via API, hosted solutions, the whole thing. They all felt like overkill, or too expensive, or both.

So I decided to fine-tune something small myself.

This is the story of how I built Antelope-textTosql, a fine-tuned Phi-2 model (2.7B parameters, roughly 3B) that converts plain English into SQL queries. It has 534 downloads so far, which honestly surprised me.

Why Phi-2?

At the time, Phi-2 was one of the more interesting small models around. Microsoft trained it to punch above its weight on reasoning tasks, and text-to-SQL is fundamentally a structured reasoning problem: you're mapping a question onto a schema.

The size mattered too. I wanted something that could run on modest hardware. Phi-2 at 2.7B parameters, quantized to 4-bit with QLoRA, fits comfortably on a single GPU in Colab.

I considered Llama 2 7B. But 7B felt like more than I needed, and the fine-tuning cost adds up. Phi-2 let me move faster.

The Dataset

I trained on Spider, a well-known text-to-SQL benchmark from Yale. It has around 7,000 training examples covering 200+ different database schemas: airports, concert halls, HR systems, you name it.

Spider is useful because it forces the model to generalize. You can't just memorize table names; you have to actually learn the mapping from question structure to SQL structure.

I kept the prompt format simple on purpose. Complex prompt engineering can mask model weakness. I wanted to know if the model itself was actually learning.

Training Setup

I used QLoRA: 4-bit quantization plus Low-Rank Adaptation. You freeze the quantized base model, add small trainable adapter layers, and fine-tune just those. Cheaper than full fine-tuning and good enough for focused tasks like this.

The key numbers:

Base model: microsoft/phi-2
Quantization: 4-bit (bitsandbytes)
LoRA rank: 16
LoRA alpha: 32
Target modules: q_proj, v_proj
Learning rate: 2e-4
Epochs: 3
Hardware: A100 on Google Colab
Training time: ~1.5 hours
Adapter size: ~21MB
Merged model size: ~5.56GB
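The adapter size follows from the LoRA settings with a little arithmetic. The sketch below assumes Phi-2 has 32 decoder layers with a hidden size of 2560, that q_proj and v_proj each map 2560 to 2560, and that the adapter is saved in float32. None of those dimensions are stated in the list above.

```python
# Assumed Phi-2 dimensions (not stated in the post): 32 decoder layers,
# hidden size 2560, with q_proj and v_proj both mapping 2560 -> 2560.
NUM_LAYERS = 32
HIDDEN = 2560
RANK = 16               # LoRA rank from the list above
ALPHA = 32              # LoRA alpha from the list above
TARGETS_PER_LAYER = 2   # q_proj and v_proj
BYTES_PER_PARAM = 4     # assumes the adapter is stored in float32


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) next to each target matrix."""
    return rank * (in_features + out_features)


adapter_params = NUM_LAYERS * TARGETS_PER_LAYER * lora_params(HIDDEN, HIDDEN, RANK)
adapter_mb = adapter_params * BYTES_PER_PARAM / 1e6
scaling = ALPHA / RANK  # factor applied to the low-rank update

print(adapter_params)        # 5242880
print(round(adapter_mb, 1))  # 21.0
print(scaling)               # 2.0
```

Under those assumptions the adapter holds about 5.2 million trainable parameters, a small fraction of a percent of the base model, which lines up with the ~21MB figure.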

One thing I got wrong the first time: I set the learning rate too high and the model started hallucinating table names confidently. Dropping to 2e-4 and adding a warmup schedule fixed it.
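For readers unfamiliar with warmup, here is a minimal sketch of one common shape: a linear ramp to the peak rate followed by linear decay. The 2e-4 peak comes from the settings above. The 100 warmup steps and the linear decay are assumptions, since the exact schedule is not stated.

```python
def lr_at(step: int, total_steps: int, peak_lr: float = 2e-4, warmup_steps: int = 100) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / max(total_steps - warmup_steps, 1)
```

The ramp keeps the first updates small while the freshly initialized adapter weights settle. In a real training loop a library scheduler such as get_linear_schedule_with_warmup from Transformers does the same job.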

Does It Actually Work?

Here are some real examples:

Question: How many employees are there?
SQL: SELECT COUNT(*) FROM employees

Question: List all customers from Germany
SQL: SELECT * FROM customers WHERE country = 'Germany'

Question: What is the average salary by department?
SQL: SELECT department, AVG(salary) FROM employees GROUP BY department

It handles these well. Where it starts to struggle is complex multi-join queries with ambiguous column names. If you ask something like "show me orders that have more than three items across all categories" without a clear schema, you'll get creative but wrong SQL.

Spider labels its queries by difficulty: easy, medium, hard, and extra hard. In my informal testing, the model is solid on easy and medium. Hard and extra hard are hit or miss.

What I'd Change

The prompt only takes a database name, not the actual schema. That's the biggest limitation. If the prompt carried real table and column definitions, the model would likely handle complex queries much better. Models like RESDSQL do this and the accuracy difference is significant. I cut that corner to keep the interface simple and I probably shouldn't have.
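A schema-aware prompt could be assembled along these lines. The layout, the "###" section markers, and the function name are hypothetical. This is not the format the published model was trained on, and the model would presumably need fine-tuning on prompts in the same layout to benefit.

```python
def build_prompt(question: str, schema: dict[str, list[str]]) -> str:
    """Render table and column definitions ahead of the question."""
    lines = ["### Schema"]
    for table, columns in schema.items():
        lines.append(f"{table}({', '.join(columns)})")
    lines += ["### Question", question.strip(), "### SQL"]
    return "\n".join(lines)


prompt = build_prompt(
    "What is the average salary by department?",
    {"employees": ["id", "name", "department", "salary"]},
)
print(prompt)
```

Listing columns per table gives the model real names to copy from, which targets the hallucinated table names and ambiguous columns mentioned earlier.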

I also published without running formal benchmarks. The Spider evaluation script gives you an exact match score and I plan to run it soon. Even a modest number is more useful than nothing.
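Until the official script has been run, a rough stand-in is a normalized string comparison. The sketch below is deliberately not the Spider metric, which, as far as I understand it, compares queries clause by clause rather than as raw strings. It only treats case, whitespace, and a trailing semicolon as irrelevant, so it will undercount equivalent queries that are written differently. Function names are hypothetical.

```python
import re


def normalize_sql(sql: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing semicolon."""
    sql = sql.strip().rstrip(";").strip()
    return re.sub(r"\s+", " ", sql).lower()


def string_match_rate(predictions, references) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize_sql(p) == normalize_sql(r) for p, r in zip(predictions, references))
    return hits / len(references)
```

Lowercasing also folds string literals such as 'Germany', so this check is only useful as a smoke test between proper benchmark runs.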

One thing that surprised me: Spider's training data skews toward simple queries. I trained on the full set without compensating for that, so the model is better at easy things than it needs to be, and worse at hard things than it could be.

The Downloads

The model gets pulled about 15-25 times per day without any active promotion. I think it mostly shows up in HuggingFace search results for "text-to-sql" and people grab it to test.

534 downloads is not a lot in absolute terms. But for a personal model with no organization behind it, trained in 1.5 hours on a single Colab GPU, I'll take it.

What's Next

Two more models planned: text-to-regex and text-to-shell. Same idea, small model, one focused task. I'll write about those when they're done.

The model is at AuricErgeson/Antelope-textTosql on HuggingFace. MIT license. If something breaks, drop a comment there. I check it.

Auric Ergeson Nitonde is a software development apprentice (Azubi FIAE) in Germany, building logistics tools and publishing NLP models at huggingface.co/AuricErgeson.