Why this series exists
If you are a backend engineer, you already know how to build reliable systems.
You know how requests flow through services.
You know how data gets cleaned before it is useful.
You know how APIs hide complicated internals behind simple contracts.
AI systems are not as magical as they look from the outside.
They are still systems.
They still have inputs, processing stages, outputs, tradeoffs, and production constraints.
In this series, I am going to break down a real project I built:
- a tiny support ticket classifier
- trained in Python
- exported to JSON
- served in pure Go
- fast enough to run in a few milliseconds on CPU
This is not a "train a giant LLM on a cluster" story.
This is a practical story for backend engineers who want to understand how AI products are actually assembled.
GitHub repos:
- Inference: built in pure Go
- Training on the dataset: Python service
What the model does
The input is simple:
- one raw text support ticket
The output is richer than a single label. The model predicts five things at once:
- department
- sentiment
- lead_intent
- churn_risk
- intent
So for one ticket like:
"I was charged twice and need a refund"
the system can produce something like:
- department: billing
- sentiment: negative
- lead_intent: low
- churn_risk: high
- intent: refund
That makes it a multi-task classifier.
Plain-English version:
We built one small brain that answers five related questions about the same ticket.
The full system in layman terms
Before we get technical, here is the project in everyday language.
Raw ticket
->
Clean the text
->
Pull out useful clues
->
Turn those clues into numbers
->
Pass the numbers through a tiny neural network
->
Get 5 answers
->
Package the result for production use
Now let me expand each block.
Block 1: Raw ticket
This is the message a user writes.
Examples:
- "refund nahi mila yet"
- "pricing for enterprise plan?"
- "app is not working after reset"
At this stage, the text is messy.
People type casually.
They make typos.
They mix Hindi and English.
They write with emotion.
Block 2: Clean the text
The model cannot reason about raw text the way a human does.
So first we normalize it.
That means things like:
- convert to lowercase
- replace URLs with `<url>`
- replace emails with `<email>`
- replace numbers with `<num>`
- normalize Hinglish words like `nahi` to `not` and `paisa` to `money`
Plain-English version:
We reduce unnecessary variation so the model sees the same idea in a more consistent form.
Technical term:
This is text preprocessing or normalization.
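As a rough sketch, the normalization step could look like this in Python. The regexes and the Hinglish map here are illustrative, not the project's actual `preprocess.py`:

```python
import re

# Tiny illustrative Hinglish-to-English map; a real one would be much larger.
HINGLISH = {"nahi": "not", "paisa": "money"}

def normalize(text: str) -> str:
    text = text.lower()
    text = re.sub(r"https?://\S+", "<url>", text)      # URLs -> <url>
    text = re.sub(r"\S+@\S+\.\S+", "<email>", text)    # emails -> <email>
    text = re.sub(r"\d+", "<num>", text)               # numbers -> <num>
    # Map known Hinglish words to their English equivalents.
    return " ".join(HINGLISH.get(w, w) for w in text.split())
```

The point is that every rule removes a source of variation before the model ever sees the text.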
Block 3: Pull out useful clues
After cleaning the text, we extract signals.
We do not rely on only one trick.
We use a hybrid set of features:
- bag-of-words counts
- keyword flags
- token embeddings
Why three kinds?
Because each one catches something different.
Bag-of-words helps with direct vocabulary signals.
Keyword flags help with business-important phrases like refund, cancel, or not working.
Embeddings help the model capture softer meaning patterns.
Plain-English version:
Instead of asking the model to "just understand everything," we hand it several different kinds of clues.
Block 4: Turn those clues into numbers
Neural networks do not consume text directly.
They consume arrays of numbers.
So the cleaned ticket becomes:
- one numeric vector for word counts
- one numeric vector for keyword flags
- one numeric vector from averaged token embeddings
Then we combine them into one final feature vector.
Technical term:
This is feature engineering plus vectorization.
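A minimal sketch of that vectorization, assuming a toy vocabulary, keyword list, and embedding table (the real `features.py` will differ in details):

```python
import numpy as np

def vectorize(tokens, vocab, keywords, embeddings, dim=4):
    # Bag-of-words counts over a fixed vocabulary.
    bow = np.zeros(len(vocab))
    for t in tokens:
        if t in vocab:
            bow[vocab[t]] += 1
    # Binary flags for business-important keywords.
    flags = np.array([1.0 if k in tokens else 0.0 for k in keywords])
    # Average of known token embeddings (zeros if none are known).
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    emb = np.mean(vecs, axis=0) if vecs else np.zeros(dim)
    # One final feature vector: counts + flags + averaged embedding.
    return np.concatenate([bow, flags, emb])
```

Each of the three segments carries a different kind of clue, and the network downstream sees all of them at once.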
Block 5: Pass the numbers through a tiny neural network
This project uses a very small network.
At a high level:
Feature vector
->
Dense layer
->
ReLU
->
Dense layer
->
ReLU
->
5 output heads
Each output head is responsible for one prediction task.
Why this shape?
Because the tasks are related.
For example:
- `refund` often points to `billing`
- angry language can affect `sentiment` and `churn_risk`
- pricing questions often affect `department` and `lead_intent`
So the model first learns a shared internal representation and then each task gets its own small output layer.
Technical term:
This is a shared-base multi-head neural network.
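The forward pass of a shared-base multi-head network fits in a few lines of NumPy. Shapes and layer sizes below are illustrative, not the project's actual dimensions:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, W1, b1, W2, b2, heads):
    # Shared trunk: two dense layers with ReLU activations.
    h = relu(W1 @ x + b1)
    h = relu(W2 @ h + b2)
    # Each task applies its own small output layer to the shared representation.
    return {name: Wh @ h + bh for name, (Wh, bh) in heads.items()}
```

Everything up to the heads is computed once per ticket; only the final layers are task-specific.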
Block 6: Get 5 answers
Each head produces a score.
Then the system converts scores into labels:
- `softmax` for multi-class outputs like department or intent
- `sigmoid` for binary outputs like churn risk
Plain-English version:
The model does not directly shout "billing." It first scores all options, then picks the most likely one.
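Both score-to-label conversions are standard; a minimal NumPy version looks like this:

```python
import numpy as np

def softmax(scores):
    # Subtract the max score for numerical stability before exponentiating.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

def sigmoid(score):
    # Squashes a raw score into a probability between 0 and 1.
    return 1.0 / (1.0 + np.exp(-score))
```

For a multi-class head, the predicted label is simply the index with the highest softmax probability.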
Block 7: Package the result for production
Training happens in Python.
Production inference happens in Go.
That design was intentional.
Why?
Because I wanted:
- easy training with PyTorch
- a simple export format
- a lightweight production runtime
- low latency
- low memory usage
- no external ML runtime in production
So the trained model is exported into JSON, and the Go service loads that artifact and runs the forward pass manually.
Plain-English version:
Python is the workshop. Go is the factory floor.
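A toy version of that export/load round trip, in Python for brevity (the project's actual artifact schema in `export.py` is richer, and the real loader is written in Go):

```python
import json
import numpy as np

def export_model(path, W, b, labels):
    # Weights go out as plain nested lists so any runtime can parse them.
    artifact = {"W": W.tolist(), "b": b.tolist(), "labels": labels}
    with open(path, "w") as f:
        json.dump(artifact, f)

def load_model(path):
    # The production side only needs a JSON parser, not an ML runtime.
    with open(path) as f:
        art = json.load(f)
    return np.array(art["W"]), np.array(art["b"]), art["labels"]
```

Because the artifact is plain JSON, the Go service can unmarshal it with the standard library and run the forward pass itself.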
Mapping the real project to these blocks
Here is how the actual project maps to the conceptual flow.
Training side
- `preprocess.py`: cleans text, normalizes Hinglish, replaces URLs/emails/numbers, injects style noise for synthetic data
- `features.py`: builds the bag-of-words vocab, embedding vocab, and keywords, and encodes text
- `datasets.py`: loads Hugging Face datasets, local JSONL files, and corrections, and normalizes everything into one schema
- `synth.py`: generates synthetic support-ticket examples to improve domain coverage
- `model.py`: defines the tiny hybrid neural network
- `train.py`: handles training, validation, class weights, metrics, early stopping, and artifact creation
- `export.py`: writes the trained model to JSON for production
Inference side
- `features/`: rebuilds the same preprocessing and feature extraction logic in Go
- `model/`: loads the exported JSON and defines dense layers, embeddings, softmax, sigmoid, and validation
- `quantization/`: supports int8 inference for a smaller and faster runtime
- `inference/`: orchestrates prediction and builds the final result object
- `benchmark/`: compares the local model against hosted models like GPT-5-mini
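The int8 idea behind the quantization package can be sketched as symmetric per-tensor quantization. This is an illustration of the technique, not the project's actual `quantization/` code:

```python
import numpy as np

def quantize(W):
    # One scale for the whole tensor; weights become int8 in [-127, 127].
    scale = np.max(np.abs(W)) / 127.0
    q = np.round(W / scale).astype(np.int8)
    return q, scale

def quantized_matmul(q, scale, x):
    # Widen back to float and rescale; a real runtime may accumulate in int32.
    return (q.astype(np.float32) * scale) @ x
```

The storage drops from 4 bytes to 1 byte per weight, at the cost of a small, bounded rounding error.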
Why I did not use an LLM for everything
This question comes up a lot now.
Why build a tiny model at all when an LLM can classify text?
Because production engineering is about fit, not hype.
For this use case, the tiny model has real advantages:
- much lower latency
- much lower cost
- predictable output shape
- simpler deployment
- easier control over labels
- easier offline benchmarking
LLMs are great when you need open-ended reasoning or generation.
But if you need:
- narrow labels
- stable routing
- predictable performance
- cheap per-request inference
then a smaller custom model can be the better tool.
The main lesson from this project
The most important idea I want you to take from Part 1 is this:
An AI application is not "a model."
It is a pipeline.
The model matters, yes.
But so do:
- your labels
- your preprocessing
- your synthetic data
- your export format
- your inference runtime
- your benchmark setup
If you only focus on the neural network block, you miss most of the engineering work.
What is coming next
In Part 2, I will go one level deeper into the most underrated part of AI work:
the dataset and label design
That is where this project really starts.
Not in PyTorch.
Not in matrix multiplication.
Not in fancy model architecture.
It starts with deciding:
- what we want the system to predict
- what "good" labels even mean
- how to combine real data, heuristics, and synthetic examples into one usable training set
If you can understand that piece, the rest of the system becomes much easier to follow.
Disclosure: AI was used to frame the article.