Cross‑posted from Medium:
https://medium.com/@alex6055082i/a-simple-text-moderation-system-with-llm-or-why-we-dont-need-another-agent-d991174e6107
I spent weeks reading papers about multi-agent systems. Autonomous this, reasoning chains that. Everyone seemed convinced that complexity equals power.
Then I realized something: my moderation problem doesn't need a genius. It needs someone who can say "block" or "pass" with confidence.
So I built Lexicont differently. It's an agent, yes. But a stubborn one. It refuses to be "smart" in the conventional sense. No multi-tasking. No autonomous wandering. No pretending to understand philosophy when it just needs to catch bad text.
The result? Something that actually works. Transparently. Predictably. Without the theater.
How It Works: Four Layers, No Loops
The pipeline runs sequentially from fast to slow, without unnecessary cycles:
1. Pre-filter (rules and dictionaries)
It starts with lightning-fast checks:
- Built-in profanity filter in 23 languages
- Detection of leetspeak (when people write "k1ll" or "5ex" instead of normal letters)
- Unicode obfuscation (attempts to hide bad words using similar symbols)
- Custom trigger lists
Examples of what gets caught:
"buy fk documents" → BLOCK
"h4te you" → BLOCK
If the text is caught here, it's immediately BLOCK. We don't go further.
2. ML Classifier (detoxification)
A small, fast ML classifier runs on CPU in about 50ms. It evaluates text across categories: toxicity, threat, insult, identity attack, and others.
Input: "I'll beat you up tomorrow"
Model: threat score 0.95 → BLOCK
If any category score is at or above 0.85, the text is blocked and the early exit triggers; later layers never run.
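The early-exit logic of this layer amounts to a threshold check over per-category scores. The sketch below stubs out the model itself; in Lexicont a real ML classifier produces the scores, and `score_text` here is purely a stand-in.

```python
# Sketch of the layer-2 early exit: a classifier returns per-category
# scores, and any score at or above the threshold blocks immediately.
THRESHOLD = 0.85

def score_text(text: str) -> dict[str, float]:
    # Stub standing in for a real toxicity classifier.
    if "beat you up" in text.lower():
        return {"toxicity": 0.91, "threat": 0.95, "insult": 0.40}
    return {"toxicity": 0.05, "threat": 0.01, "insult": 0.02}

def classify(text: str) -> tuple[str, dict[str, float]]:
    scores = score_text(text)
    if max(scores.values()) >= THRESHOLD:
        return "BLOCK", scores  # early exit: the LLM layer never runs
    return "CONTINUE", scores
```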
3. Contextual Analysis (LLM)
If earlier layers aren't confident (confidence less than 0.8), the LLM kicks in.
Here's the key point: the LLM doesn't make the final decision. It provides category scores, confidence levels, and a brief explanation. The actual verdict (block, review, or pass) comes from a rule engine that weighs the LLM's assessment together with the contributions of all previous layers.
4. Rule Engine (final verdict)
The rule engine makes the final decision, taking into account results from all layers.
This way, even when the LLM is involved, the system stays more controllable and deterministic than complex multi-agent architectures.
Why It's Done This Way
Because in real life, most projects don't need the smartest system. They need one they can trust and configure to their own rules.
The more complex an agent is, the more autonomous it becomes, the higher the risk of getting a black box whose behavior is hard to explain and adjust. When it comes to content moderation, predictability and transparency often matter more than raw model power.
We deliberately built a linear pipeline without loops or repeated analyses. The LLM doesn't drive the process here. It helps where simple rules aren't enough.
The Beauty of Simplicity
Lexicont ended up minimal by design. No complex abstractions, no reinventing the wheel. Just proven libraries.
Because of this:
- Any layer can be turned on or off
- You can add your own triggers via a YAML file
- You can use the library partially, just the filters you need
- All logic is transparent and debuggable
- The system can run on a regular CPU, though speed depends on your hardware
About Models: Small But Capable
Not everyone has a GPU, and large models demand serious resources. I discovered Qwen3-4B, a 4-billion-parameter model from Alibaba Cloud. It offers a surprisingly good balance of quality and speed.
Quantized versions (Q4_K_M and IQ3_M) from bartowski work especially well. Quantization compresses a model's weights to lower precision, shrinking the file while preserving most of the quality.
- Q4_K_M: ~2.5 GB, works well for most tasks
- IQ3_M: ~2.0 GB, if memory is very limited
For text moderation with the ML classifier, processing happens in 50–200ms on a laptop. When the LLM layer is invoked for complex cases, expect additional latency depending on your hardware.
Who This Is For
It's for small and medium projects: Discord bots, Telegram bots, and startups that want their own moderation system but:
- Don't want to pay thousands monthly for cloud APIs
- Want to control what rules apply
- Want to adjust behavior for their own policies
Sometimes a simple, reliable pipeline you fully control is enough.
Try It Now
git clone https://github.com/corefrg/lexicont.git
cd lexicont
poetry install
Before running, you need to set up your own configuration files with your moderation rules and policies. Check the documentation for details on configuring moderation_rules.yaml and moderation_config.yaml to match your specific needs.
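To give a feel for what such a configuration might look like, here is a hypothetical sketch. The real schema for moderation_rules.yaml lives in the project docs; every key below is an illustrative assumption, not the actual format.

```yaml
# Hypothetical sketch of a moderation_rules.yaml; the keys only
# illustrate the kinds of knobs described above, not the real schema.
triggers:
  - "fake documents"
  - "buy followers"
thresholds:
  ml_block: 0.85      # layer-2 early-exit threshold
  llm_invoke: 0.80    # below this confidence, escalate to the LLM
layers:
  prefilter: true
  ml_classifier: true
  llm: false          # any layer can be switched off
```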
You can run the LLM locally using llama.cpp or Ollama:
Using Ollama:
ollama pull qwen3:4b
ollama serve
Or using llama.cpp:
llama-server -m Qwen_Qwen3-4B-Q4_K_M.gguf --host 0.0.0.0 --port 11434 -c 512 --threads 12
Once configured, test it:
poetry run lexicont check "buy fake documents" --log-level DEBUG
For quick experimentation, you can also try running it on Kaggle or any machine with a CPU. No GPU required, though speed depends on your hardware.
GitHub: https://github.com/corefrg/lexicont
PyPI: https://pypi.org/project/lexicont/
Final Thought
This is a text moderation project under active development, and its value may lie in showing that a simple architecture often works well. More documentation and examples will be added as the project evolves.
If you also believe that sometimes building a system you understand and can control matters more than something that works by magic, I'd love to hear your thoughts in the comments.
Simplicity still deserves a place in the world.