Why this series exists
If you are a backend engineer, you already know how to build reliable systems.
You know how requests flow through services.
You know how data gets cleaned before it is useful.
You know how APIs hide complicated internals behind simple contracts.
AI systems are not as magical as they look from the outside.
They are still systems.
They still have inputs, processing stages, outputs, tradeoffs, and production constraints.
In this series, I am going to break down a real project I built:
- a tiny support ticket classifier
- trained in Python
- exported to JSON
- served in pure Go
- fast enough to run in a few milliseconds on CPU
This is not a "train a giant LLM on a cluster" story.
This is a practical story for backend engineers who want to understand how AI products are actually assembled.
GitHub repos:
- Inference: built in pure Go
- Training on the dataset: Python service
What the model does
The input is simple:
- one raw text support ticket
The output is richer than a single label. The model predicts five things at once:
- department
- sentiment
- lead_intent
- churn_risk
- intent
So for one ticket like:
"I was charged twice and need a refund"
the system can produce something like:
- department: billing
- sentiment: negative
- lead_intent: low
- churn_risk: high
- intent: refund
That makes it a multi-task classifier.
Plain-English version:
We built one small brain that answers five related questions about the same ticket.
The full system in layman terms
Before we get technical, here is the project in everyday language.
Raw ticket
->
Clean the text
->
Pull out useful clues
->
Turn those clues into numbers
->
Pass the numbers through a tiny neural network
->
Get 5 answers
->
Package the result for production use
Now let me expand each block.
Block 1: Raw ticket
This is the message a user writes.
Examples:
- "refund nahi mila yet"
- "pricing for enterprise plan?"
- "app is not working after reset"
At this stage, the text is messy.
People type casually.
They make typos.
They mix Hindi and English.
They write with emotion.
Block 2: Clean the text
The model cannot reason about raw text the way a human does.
So first we normalize it.
That means things like:
- convert to lowercase
- replace URLs with `<url>`
- replace emails with `<email>`
- replace numbers with `<num>`
- normalize Hinglish words like `nahi` to `not` and `paisa` to `money`
Plain-English version:
We reduce unnecessary variation so the model sees the same idea in a more consistent form.
Technical term:
This is text preprocessing or normalization.
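As a rough sketch, the normalization step could look like this in Python. The regexes and the Hinglish map here are illustrative, not the project's actual `preprocess.py`:

```python
import re

# Tiny illustrative Hinglish-to-English map; a real one would be much larger.
HINGLISH = {"nahi": "not", "paisa": "money"}

def normalize(text: str) -> str:
    text = text.lower()
    text = re.sub(r"https?://\S+", "<url>", text)      # URLs -> <url>
    text = re.sub(r"\S+@\S+\.\S+", "<email>", text)    # emails -> <email>
    text = re.sub(r"\d+", "<num>", text)               # numbers -> <num>
    # Map known Hinglish words to their English equivalents.
    return " ".join(HINGLISH.get(w, w) for w in text.split())
```

The point is that every rule removes a source of variation before the model ever sees the text.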
Block 3: Pull out useful clues
After cleaning the text, we extract signals.
We do not rely on only one trick.
We use a hybrid set of features:
- bag-of-words counts
- keyword flags
- token embeddings
Why three kinds?
Because each one catches something different.
Bag-of-words helps with direct vocabulary signals.
Keyword flags help with business-important phrases like refund, cancel, or not working.
Embeddings help the model capture softer meaning patterns.
Plain-English version:
Instead of asking the model to "just understand everything," we hand it several different kinds of clues.
Block 4: Turn those clues into numbers
Neural networks do not consume text directly.
They consume arrays of numbers.
So the cleaned ticket becomes:
- one numeric vector for word counts
- one numeric vector for keyword flags
- one numeric vector from averaged token embeddings
Then we combine them into one final feature vector.
Technical term:
This is feature engineering plus vectorization.
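A minimal sketch of that vectorization, assuming a toy vocabulary, keyword list, and embedding table (the real `features.py` will differ in details):

```python
import numpy as np

def vectorize(tokens, vocab, keywords, embeddings, dim=4):
    # Bag-of-words counts over a fixed vocabulary.
    bow = np.zeros(len(vocab))
    for t in tokens:
        if t in vocab:
            bow[vocab[t]] += 1
    # Binary flags for business-important keywords.
    flags = np.array([1.0 if k in tokens else 0.0 for k in keywords])
    # Average of known token embeddings (zeros if none are known).
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    emb = np.mean(vecs, axis=0) if vecs else np.zeros(dim)
    # One final feature vector: counts + flags + averaged embedding.
    return np.concatenate([bow, flags, emb])
```

Each of the three segments carries a different kind of clue, and the network downstream sees all of them at once.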
Block 5: Pass the numbers through a tiny neural network
This project uses a very small network.
At a high level:
Feature vector
->
Dense layer
->
ReLU
->
Dense layer
->
ReLU
->
5 output heads
Each output head is responsible for one prediction task.
Why this shape?
Because the tasks are related.
For example:
- `refund` often points to `billing`
- angry language can affect `sentiment` and `churn_risk`
- pricing questions often affect `department` and `lead_intent`
So the model first learns a shared internal representation and then each task gets its own small output layer.
Technical term:
This is a shared-base multi-head neural network.
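The forward pass of a shared-base multi-head network fits in a few lines of NumPy. Shapes and layer sizes below are illustrative, not the project's actual dimensions:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, W1, b1, W2, b2, heads):
    # Shared trunk: two dense layers with ReLU activations.
    h = relu(W1 @ x + b1)
    h = relu(W2 @ h + b2)
    # Each task applies its own small output layer to the shared representation.
    return {name: Wh @ h + bh for name, (Wh, bh) in heads.items()}
```

Everything up to the heads is computed once per ticket; only the final layers are task-specific.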
Block 6: Get 5 answers
Each head produces a score.
Then the system converts scores into labels:
- `softmax` for multi-class outputs like department or intent
- `sigmoid` for binary outputs like churn risk
Plain-English version:
The model does not directly shout "billing." It first scores all options, then picks the most likely one.
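Both score-to-label conversions are standard; a minimal NumPy version looks like this:

```python
import numpy as np

def softmax(scores):
    # Subtract the max score for numerical stability before exponentiating.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

def sigmoid(score):
    # Squashes a raw score into a probability between 0 and 1.
    return 1.0 / (1.0 + np.exp(-score))
```

For a multi-class head, the predicted label is simply the index with the highest softmax probability.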
Block 7: Package the result for production
Training happens in Python.
Production inference happens in Go.
That design was intentional.
Why?
Because I wanted:
- easy training with PyTorch
- a simple export format
- a lightweight production runtime
- low latency
- low memory usage
- no external ML runtime in production
So the trained model is exported into JSON, and the Go service loads that artifact and runs the forward pass manually.
Plain-English version:
Python is the workshop. Go is the factory floor.
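A toy version of that export/load round trip, in Python for brevity (the project's actual artifact schema in `export.py` is richer, and the real loader is written in Go):

```python
import json
import numpy as np

def export_model(path, W, b, labels):
    # Weights go out as plain nested lists so any runtime can parse them.
    artifact = {"W": W.tolist(), "b": b.tolist(), "labels": labels}
    with open(path, "w") as f:
        json.dump(artifact, f)

def load_model(path):
    # The production side only needs a JSON parser, not an ML runtime.
    with open(path) as f:
        art = json.load(f)
    return np.array(art["W"]), np.array(art["b"]), art["labels"]
```

Because the artifact is plain JSON, the Go service can unmarshal it with the standard library and run the forward pass itself.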
Mapping the real project to these blocks
Here is how the actual project maps to the conceptual flow.
Training side
- `preprocess.py`: cleans text, normalizes Hinglish, replaces URLs/emails/numbers, injects style noise for synthetic data
- `features.py`: builds the bag-of-words vocab, embedding vocab, and keywords, and encodes text
- `datasets.py`: loads Hugging Face datasets, local JSONL files, and corrections, and normalizes everything into one schema
- `synth.py`: generates synthetic support-ticket examples to improve domain coverage
- `model.py`: defines the tiny hybrid neural network
- `train.py`: handles training, validation, class weights, metrics, early stopping, and artifact creation
- `export.py`: writes the trained model to JSON for production
Inference side
- `features/`: rebuilds the same preprocessing and feature extraction logic in Go
- `model/`: loads the exported JSON and defines dense layers, embeddings, softmax, sigmoid, and validation
- `quantization/`: supports int8 inference for a smaller and faster runtime
- `inference/`: orchestrates prediction and builds the final result object
- `benchmark/`: compares the local model against hosted models like GPT-5-mini
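The int8 idea behind the quantization package can be sketched as symmetric per-tensor quantization. This is an illustration of the technique, not the project's actual `quantization/` code:

```python
import numpy as np

def quantize(W):
    # One scale for the whole tensor; weights become int8 in [-127, 127].
    scale = np.max(np.abs(W)) / 127.0
    q = np.round(W / scale).astype(np.int8)
    return q, scale

def quantized_matmul(q, scale, x):
    # Widen back to float and rescale; a real runtime may accumulate in int32.
    return (q.astype(np.float32) * scale) @ x
```

The storage drops from 4 bytes to 1 byte per weight, at the cost of a small, bounded rounding error.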
Why I did not use an LLM for everything
This question comes up a lot now.
Why build a tiny model at all when an LLM can classify text?
Because production engineering is about fit, not hype.
For this use case, the tiny model has real advantages:
- much lower latency
- much lower cost
- predictable output shape
- simpler deployment
- easier control over labels
- easier offline benchmarking
LLMs are great when you need open-ended reasoning or generation.
But if you need:
- narrow labels
- stable routing
- predictable performance
- cheap per-request inference
then a smaller custom model can be the better tool.
The main lesson from this project
The most important idea I want you to take from Part 1 is this:
An AI application is not "a model."
It is a pipeline.
The model matters, yes.
But so do:
- your labels
- your preprocessing
- your synthetic data
- your export format
- your inference runtime
- your benchmark setup
If you only focus on the neural network block, you miss most of the engineering work.
What is coming next
In Part 2, I will go one level deeper into the most underrated part of AI work:
the dataset and label design
That is where this project really starts.
Not in PyTorch.
Not in matrix multiplication.
Not in fancy model architecture.
It starts with deciding:
- what we want the system to predict
- what "good" labels even mean
- how to combine real data, heuristics, and synthetic examples into one usable training set
If you can understand that piece, the rest of the system becomes much easier to follow.
Disclosure: AI was used to frame the article.