Prince Raj
Part 1: What We Built - A Tiny AI System for Support Ticket Classification

Why this series exists

If you are a backend engineer, you already know how to build reliable systems.

You know how requests flow through services.
You know how data gets cleaned before it is useful.
You know how APIs hide complicated internals behind simple contracts.

AI systems are not as magical as they look from the outside.

They are still systems.
They still have inputs, processing stages, outputs, tradeoffs, and production constraints.

In this series, I am going to break down a real project I built:

  • a tiny support ticket classifier
  • trained in Python
  • exported to JSON
  • served in pure Go
  • fast enough to run in a few milliseconds on CPU

This is not a "train a giant LLM on a cluster" story.

This is a practical story for backend engineers who want to understand how AI products are actually assembled.

GitHub repos:

What the model does

The input is simple:

  • one raw text support ticket

The output is richer than a single label. The model predicts five things at once:

  • department
  • sentiment
  • lead_intent
  • churn_risk
  • intent

So for one ticket like:

"I was charged twice and need a refund"

the system can produce something like:

  • department: billing
  • sentiment: negative
  • lead_intent: low
  • churn_risk: high
  • intent: refund

That makes it a multi-task classifier.

Plain-English version:

We built one small brain that answers five related questions about the same ticket.

The full system in layman terms

Before we get technical, here is the project in everyday language.

Raw ticket
   ->
Clean the text
   ->
Pull out useful clues
   ->
Turn those clues into numbers
   ->
Pass the numbers through a tiny neural network
   ->
Get 5 answers
   ->
Package the result for production use

Now let me expand each block.

Block 1: Raw ticket

This is the message a user writes.

Examples:

  • "refund nahi mila yet"
  • "pricing for enterprise plan?"
  • "app is not working after reset"

At this stage, the text is messy.
People type casually.
They make typos.
They mix Hindi and English.
They write with emotion.

Block 2: Clean the text

The model cannot reason about raw text the way a human does.
So first we normalize it.

That means things like:

  • convert to lowercase
  • replace URLs with <url>
  • replace emails with <email>
  • replace numbers with <num>
  • normalize Hinglish words like nahi to not and paisa to money

Plain-English version:

We reduce unnecessary variation so the model sees the same idea in a more consistent form.

Technical term:

This is text preprocessing or normalization.
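As a sketch, the cleaning step above might look like this in Python. The replacement tokens (`<url>`, `<email>`, `<num>`) match the list above; the Hinglish map here is a tiny illustrative subset, not the project's real one:

```python
import re

# Tiny illustrative Hinglish map; the real project's list is larger.
HINGLISH = {"nahi": "not", "paisa": "money"}

def normalize(text: str) -> str:
    text = text.lower()
    text = re.sub(r"https?://\S+", "<url>", text)      # URLs -> <url>
    text = re.sub(r"\S+@\S+\.\S+", "<email>", text)    # emails -> <email>
    text = re.sub(r"\d+", "<num>", text)               # numbers -> <num>
    # map known Hinglish tokens to English equivalents
    tokens = [HINGLISH.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)
```

For example, `normalize("Refund nahi mila for order 1234")` yields `"refund not mila for order <num>"`.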

Block 3: Pull out useful clues

After cleaning the text, we extract signals.

We do not rely on only one trick.
We use a hybrid set of features:

  • bag-of-words counts
  • keyword flags
  • token embeddings

Why three kinds?

Because each one catches something different.

Bag-of-words helps with direct vocabulary signals.
Keyword flags help with business-important phrases like refund, cancel, or not working.
Embeddings help the model capture softer meaning patterns.

Plain-English version:

Instead of asking the model to "just understand everything," we hand it several different kinds of clues.

Block 4: Turn those clues into numbers

Neural networks do not consume text directly.
They consume arrays of numbers.

So the cleaned ticket becomes:

  • one numeric vector for word counts
  • one numeric vector for keyword flags
  • one numeric vector from averaged token embeddings

Then we combine them into one final feature vector.

Technical term:

This is feature engineering plus vectorization.
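Here is a minimal numpy sketch of the three feature families being built and concatenated into one vector. The vocab, keyword list, and embedding table are tiny stand-ins for the real artifacts, not the project's actual values:

```python
import numpy as np

# Tiny stand-in artifacts for illustration only.
VOCAB = ["refund", "charge", "pricing", "working"]      # bag-of-words vocab
KEYWORDS = ["refund", "cancel", "not working"]          # business phrases
EMB = {"refund": np.array([0.9, -0.1]),                 # token embeddings
       "charged": np.array([0.7, 0.2])}
EMB_DIM = 2

def featurize(text: str) -> np.ndarray:
    tokens = text.split()
    # 1) bag-of-words counts
    bow = np.array([tokens.count(w) for w in VOCAB], dtype=np.float32)
    # 2) keyword flags (substring match)
    flags = np.array([1.0 if k in text else 0.0 for k in KEYWORDS],
                     dtype=np.float32)
    # 3) averaged token embeddings (zeros if no token is known)
    vecs = [EMB[t] for t in tokens if t in EMB]
    emb = np.mean(vecs, axis=0) if vecs else np.zeros(EMB_DIM)
    # combine into one final feature vector
    return np.concatenate([bow, flags, emb]).astype(np.float32)
```

With these stand-ins, the ticket "i was charged twice and need a refund" becomes a 9-dimensional vector: 4 counts, 3 flags, and a 2-dimensional averaged embedding.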

Block 5: Pass the numbers through a tiny neural network

This project uses a very small network.

At a high level:

Feature vector
   ->
Dense layer
   ->
ReLU
   ->
Dense layer
   ->
ReLU
   ->
5 output heads

Each output head is responsible for one prediction task.

Why this shape?

Because the tasks are related.

For example:

  • refund often points to billing
  • angry language can affect sentiment and churn_risk
  • pricing questions often affect department and lead_intent

So the model first learns a shared internal representation and then each task gets its own small output layer.

Technical term:

This is a shared-base multi-head neural network.
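Training uses PyTorch, but the forward pass itself is just matrix math, which is exactly what the Go service later reimplements. Here is a plain-numpy sketch of that pass; the sizes (9 input features, 16 hidden units) and the head output sizes are assumptions for illustration, and only two of the five heads are shown:

```python
import numpy as np

rng = np.random.default_rng(0)

IN, HID = 9, 16  # assumed input and hidden sizes
W1, b1 = rng.normal(size=(HID, IN)) * 0.1, np.zeros(HID)
W2, b2 = rng.normal(size=(HID, HID)) * 0.1, np.zeros(HID)
HEADS = {
    # name -> (weights, bias); output sizes are illustrative
    "department": (rng.normal(size=(5, HID)) * 0.1, np.zeros(5)),
    "churn_risk": (rng.normal(size=(1, HID)) * 0.1, np.zeros(1)),
}

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

def forward(x: np.ndarray) -> dict:
    # shared base: Dense -> ReLU -> Dense -> ReLU
    h = relu(W2 @ relu(W1 @ x + b1) + b2)
    # each task gets its own small output layer on top of h
    return {name: W @ h + b for name, (W, b) in HEADS.items()}
```

The key point is visible in the code: every head reads the same `h`, so signals like "refund" can inform both `department` and `churn_risk` through the shared base.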

Block 6: Get 5 answers

Each head produces a score.
Then the system converts scores into labels:

  • softmax for multi-class outputs like department or intent
  • sigmoid for binary outputs like churn_risk

Plain-English version:

The model does not directly shout "billing." It first scores all options, then picks the most likely one.
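A sketch of that decoding step, with an assumed label set for department and a 0.5 threshold for churn_risk (both illustrative choices, not the project's exact values):

```python
import numpy as np

# Assumed label set for illustration.
DEPARTMENTS = ["billing", "technical", "sales", "general", "account"]

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

def decode(dept_logits: np.ndarray, churn_logit: float) -> dict:
    return {
        # score all options, then pick the most likely one
        "department": DEPARTMENTS[int(np.argmax(softmax(dept_logits)))],
        # binary head: threshold the sigmoid probability at 0.5
        "churn_risk": "high" if sigmoid(churn_logit) >= 0.5 else "low",
    }
```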

Block 7: Package the result for production

Training happens in Python.
Production inference happens in Go.

That design was intentional.

Why?

Because I wanted:

  • easy training with PyTorch
  • a simple export format
  • a lightweight production runtime
  • low latency
  • low memory usage
  • no external ML runtime in production

So the trained model is exported into JSON, and the Go service loads that artifact and runs the forward pass manually.

Plain-English version:

Python is the workshop. Go is the factory floor.
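A sketch of what that export step can look like on the Python side; the field names here are assumptions for illustration, not the project's actual schema:

```python
import json

def export_model(weights: dict, vocab: list, path: str) -> None:
    """Write weights and vocab as plain JSON for the Go service to load.

    `weights` is assumed to hold numpy arrays:
      {"layers": [(W, b), ...], "heads": {name: (W, b), ...}}
    """
    artifact = {
        "vocab": vocab,
        "layers": [
            {"w": w.tolist(), "b": b.tolist()} for w, b in weights["layers"]
        ],
        "heads": {
            name: {"w": w.tolist(), "b": b.tolist()}
            for name, (w, b) in weights["heads"].items()
        },
    }
    with open(path, "w") as f:
        json.dump(artifact, f)
```

Because the artifact is plain JSON of nested lists, the Go side needs nothing beyond the standard library to load it and run the forward pass.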

Mapping the real project to these blocks

Here is how the actual project maps to the conceptual flow.

Training side

  • preprocess.py: cleans text, normalizes Hinglish, replaces URLs/emails/numbers, and injects style noise for synthetic data
  • features.py: builds the bag-of-words vocab, embedding vocab, and keyword list, and encodes text
  • datasets.py: loads Hugging Face datasets, local JSONL files, and corrections, and normalizes everything into one schema
  • synth.py: generates synthetic support-ticket examples to improve domain coverage
  • model.py: defines the tiny hybrid neural network
  • train.py: handles training, validation, class weights, metrics, early stopping, and artifact creation
  • export.py: writes the trained model to JSON for production

Inference side

  • features/: rebuilds the same preprocessing and feature-extraction logic in Go
  • model/: loads the exported JSON and defines dense layers, embeddings, softmax, sigmoid, and validation
  • quantization/: supports int8 inference for a smaller and faster runtime
  • inference/: orchestrates prediction and creates the final result object
  • benchmark/: compares the local model against hosted models like GPT-5-mini
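To make the int8 point concrete, here is a sketch of symmetric int8 weight quantization in numpy; the repo's actual scheme in quantization/ may differ in details:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights into int8 with a single per-tensor scale."""
    # one scale so that the largest |weight| maps to 127
    scale = max(float(np.max(np.abs(w))), 1e-8) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # recover approximate float weights at inference time
    return q.astype(np.float32) * scale
```

The payoff is that weights shrink to a quarter of their float32 size, at the cost of a small, bounded rounding error per weight.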

Why I did not use an LLM for everything

This question comes up a lot now.

Why build a tiny model at all when an LLM can classify text?

Because production engineering is about fit, not hype.

For this use case, the tiny model has real advantages:

  • much lower latency
  • much lower cost
  • predictable output shape
  • simpler deployment
  • easier control over labels
  • easier offline benchmarking

LLMs are great when you need open-ended reasoning or generation.

But if you need:

  • narrow labels
  • stable routing
  • predictable performance
  • cheap per-request inference

then a smaller custom model can be the better tool.

The main lesson from this project

The most important idea I want you to take from Part 1 is this:

An AI application is not "a model."
It is a pipeline.

The model matters, yes.
But so do:

  • your labels
  • your preprocessing
  • your synthetic data
  • your export format
  • your inference runtime
  • your benchmark setup

If you only focus on the neural network block, you miss most of the engineering work.

What is coming next

In Part 2, I will go one level deeper into the most underrated part of AI work:

the dataset and label design

That is where this project really starts.
Not in PyTorch.
Not in matrix multiplication.
Not in fancy model architecture.

It starts with deciding:

  • what we want the system to predict
  • what "good" labels even mean
  • how to combine real data, heuristics, and synthetic examples into one usable training set

If you can understand that piece, the rest of the system becomes much easier to follow.

Disclosure: AI was used to frame the article.
