<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Prince Raj</title>
    <description>The latest articles on DEV Community by Prince Raj (@prince_raj).</description>
    <link>https://dev.to/prince_raj</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3882564%2F1ed442f8-5d60-4cec-a854-271d4963a1d3.jpg</url>
      <title>DEV Community: Prince Raj</title>
      <link>https://dev.to/prince_raj</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/prince_raj"/>
    <language>en</language>
    <item>
      <title>Part 1: What We Built - A Tiny AI System for Support Ticket Classification</title>
      <dc:creator>Prince Raj</dc:creator>
      <pubDate>Thu, 16 Apr 2026 14:07:48 +0000</pubDate>
      <link>https://dev.to/prince_raj/part-1-what-we-built-a-tiny-ai-system-for-support-ticket-classification-4ihl</link>
      <guid>https://dev.to/prince_raj/part-1-what-we-built-a-tiny-ai-system-for-support-ticket-classification-4ihl</guid>
      <description>&lt;h2&gt;
  
  
  Why this series exists
&lt;/h2&gt;

&lt;p&gt;If you are a backend engineer, you already know how to build reliable systems.&lt;/p&gt;

&lt;p&gt;You know how requests flow through services.&lt;br&gt;
You know how data gets cleaned before it is useful.&lt;br&gt;
You know how APIs hide complicated internals behind simple contracts.&lt;/p&gt;

&lt;p&gt;AI systems are not as magical as they look from the outside.&lt;/p&gt;

&lt;p&gt;They are still systems.&lt;br&gt;
They still have inputs, processing stages, outputs, tradeoffs, and production constraints.&lt;/p&gt;

&lt;p&gt;In this series, I am going to break down a real project I built:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a tiny support ticket classifier&lt;/li&gt;
&lt;li&gt;trained in Python&lt;/li&gt;
&lt;li&gt;exported to JSON&lt;/li&gt;
&lt;li&gt;served in pure Go&lt;/li&gt;
&lt;li&gt;fast enough to run in a few milliseconds on CPU&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a "train a giant LLM on a cluster" story.&lt;/p&gt;

&lt;p&gt;This is a practical story for backend engineers who want to understand how AI products are actually assembled.&lt;/p&gt;
&lt;h3&gt;
  
  
  GitHub repos:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Inference: &lt;a href="https://github.com/pncraz/tickets-inf" rel="noopener noreferrer"&gt;Built in pure Go&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Training on the dataset: &lt;a href="https://github.com/pncraz/tickets-trainig" rel="noopener noreferrer"&gt;Python service&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  What the model does
&lt;/h2&gt;

&lt;p&gt;The input is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one raw text support ticket&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The output is richer than a single label. The model predicts five things at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;department&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sentiment&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;lead_intent&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;churn_risk&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;intent&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So for one ticket like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I was charged twice and need a refund"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;the system can produce something like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;department: &lt;code&gt;billing&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;sentiment: &lt;code&gt;negative&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;lead_intent: &lt;code&gt;low&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;churn_risk: &lt;code&gt;high&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;intent: &lt;code&gt;refund&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That makes it a &lt;strong&gt;multi-task classifier&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Plain-English version:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We built one small brain that answers five related questions about the same ticket.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  The full system in layman's terms
&lt;/h2&gt;

&lt;p&gt;Before we get technical, here is the project in everyday language.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Raw ticket
   -&amp;gt;
Clean the text
   -&amp;gt;
Pull out useful clues
   -&amp;gt;
Turn those clues into numbers
   -&amp;gt;
Pass the numbers through a tiny neural network
   -&amp;gt;
Get 5 answers
   -&amp;gt;
Package the result for production use
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
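
&lt;p&gt;For readers who think in code, here is that same flow compressed into a runnable Go sketch. Every function body below is a placeholder stand-in, invented purely to show the shape of the pipeline; the real logic comes in the blocks that follow and lives in the repos.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// Placeholder stand-ins for the real features/, model/ and
// inference/ packages described later in this series.
func clean(raw string) string      { return strings.ToLower(raw) }
func featurize(s string) []float64 { return []float64{float64(len(strings.Fields(s)))} }
func forward(x []float64) []float64 { return x } // tiny network stand-in
func decode(scores []float64) map[string]string {
	// Hard-coded placeholder labels, just to show the output shape.
	return map[string]string{"department": "billing"}
}

// Classify chains the blocks: raw ticket in, labels out.
func Classify(raw string) map[string]string {
	return decode(forward(featurize(clean(raw))))
}

func main() {
	fmt.Println(Classify("I was charged twice and need a refund")["department"])
}
```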



&lt;p&gt;Now let me expand each block.&lt;/p&gt;

&lt;h2&gt;
  
  
  Block 1: Raw ticket
&lt;/h2&gt;

&lt;p&gt;This is the message a user writes.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"refund nahi mila yet"&lt;/li&gt;
&lt;li&gt;"pricing for enterprise plan?"&lt;/li&gt;
&lt;li&gt;"app is not working after reset"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At this stage, the text is messy.&lt;br&gt;
People type casually.&lt;br&gt;
They make typos.&lt;br&gt;
They mix Hindi and English.&lt;br&gt;
They write with emotion.&lt;/p&gt;
&lt;h2&gt;
  
  
  Block 2: Clean the text
&lt;/h2&gt;

&lt;p&gt;The model cannot reason about raw text the way a human does.&lt;br&gt;
So first we normalize it.&lt;/p&gt;

&lt;p&gt;That means things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;convert to lowercase&lt;/li&gt;
&lt;li&gt;replace URLs with &lt;code&gt;&amp;lt;url&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;replace emails with &lt;code&gt;&amp;lt;email&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;replace numbers with &lt;code&gt;&amp;lt;num&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;normalize Hinglish words like &lt;code&gt;nahi&lt;/code&gt; to &lt;code&gt;not&lt;/code&gt; and &lt;code&gt;paisa&lt;/code&gt; to &lt;code&gt;money&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Plain-English version:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We reduce unnecessary variation so the model sees the same idea in a more consistent form.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Technical term:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is &lt;strong&gt;text preprocessing&lt;/strong&gt; or &lt;strong&gt;normalization&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
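
&lt;p&gt;A minimal Go sketch of these cleaning steps. The regexes and the two-word Hinglish map are illustrative, not the repo's actual tables, and the placeholder tokens are written with \u escapes so they decode to the usual angle-bracket form:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Placeholder tokens; the \u escapes decode to angle-bracket tokens.
	urlToken   = "\u003curl\u003e"
	emailToken = "\u003cemail\u003e"
	numToken   = "\u003cnum\u003e"

	urlRe   = regexp.MustCompile(`https?://\S+`)
	emailRe = regexp.MustCompile(`\S+@\S+\.\S+`)
	numRe   = regexp.MustCompile(`\d+`)

	// A tiny Hinglish normalization map; the real table is larger.
	hinglish = map[string]string{"nahi": "not", "paisa": "money"}
)

// Normalize applies the cleaning steps from this section.
func Normalize(raw string) string {
	s := strings.ToLower(raw)
	s = urlRe.ReplaceAllString(s, urlToken)
	s = emailRe.ReplaceAllString(s, emailToken)
	s = numRe.ReplaceAllString(s, numToken)
	words := strings.Fields(s)
	for i, w := range words {
		if repl, ok := hinglish[w]; ok {
			words[i] = repl
		}
	}
	return strings.Join(words, " ")
}

func main() {
	fmt.Println(Normalize("Refund nahi mila, order 12345"))
}
```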
&lt;h2&gt;
  
  
  Block 3: Pull out useful clues
&lt;/h2&gt;

&lt;p&gt;After cleaning the text, we extract signals.&lt;/p&gt;

&lt;p&gt;We do not rely on only one trick.&lt;br&gt;
We use a hybrid set of features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;bag-of-words counts&lt;/li&gt;
&lt;li&gt;keyword flags&lt;/li&gt;
&lt;li&gt;token embeddings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why three kinds?&lt;/p&gt;

&lt;p&gt;Because each one catches something different.&lt;/p&gt;

&lt;p&gt;Bag-of-words helps with direct vocabulary signals.&lt;br&gt;
Keyword flags help with business-important phrases like &lt;code&gt;refund&lt;/code&gt;, &lt;code&gt;cancel&lt;/code&gt;, or &lt;code&gt;not working&lt;/code&gt;.&lt;br&gt;
Embeddings help the model capture softer meaning patterns.&lt;/p&gt;

&lt;p&gt;Plain-English version:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Instead of asking the model to "just understand everything," we hand it several different kinds of clues.&lt;/p&gt;
&lt;/blockquote&gt;
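
&lt;p&gt;As a tiny illustration of the keyword-flag clue (the phrase list here is made up for this example, not the repo's curated list):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// keywordPhrases is an illustrative list of business-important
// phrases; the real project maintains its own curated set.
var keywordPhrases = []string{"refund", "cancel", "not working"}

// KeywordFlags returns one 0/1 flag per phrase for a cleaned ticket.
func KeywordFlags(text string) []float64 {
	flags := make([]float64, len(keywordPhrases))
	for i, phrase := range keywordPhrases {
		if strings.Contains(text, phrase) {
			flags[i] = 1
		}
	}
	return flags
}

func main() {
	fmt.Println(KeywordFlags("i was charged twice and need a refund")) // prints [1 0 0]
}
```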
&lt;h2&gt;
  
  
  Block 4: Turn those clues into numbers
&lt;/h2&gt;

&lt;p&gt;Neural networks do not consume text directly.&lt;br&gt;
They consume arrays of numbers.&lt;/p&gt;

&lt;p&gt;So the cleaned ticket becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one numeric vector for word counts&lt;/li&gt;
&lt;li&gt;one numeric vector for keyword flags&lt;/li&gt;
&lt;li&gt;one numeric vector from averaged token embeddings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then we combine them into one final feature vector.&lt;/p&gt;

&lt;p&gt;Technical term:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is &lt;strong&gt;feature engineering&lt;/strong&gt; plus &lt;strong&gt;vectorization&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
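
&lt;p&gt;A toy Go sketch of that combination step, assuming a four-word vocabulary and 2-dimensional embeddings. The real vocabularies and embedding table are built during training:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// Toy vocabularies for illustration only.
var (
	bowVocab = []string{"refund", "charged", "pricing", "working"}
	embed    = map[string][]float64{ // 2-d toy embeddings
		"refund":  {0.9, 0.1},
		"charged": {0.8, 0.2},
	}
)

// Vectorize turns a cleaned ticket into one flat feature vector:
// word counts, then keyword flags, then an averaged embedding.
func Vectorize(text string, flags []float64) []float64 {
	words := strings.Fields(text)

	counts := make([]float64, len(bowVocab))
	for i, v := range bowVocab {
		for _, w := range words {
			if w == v {
				counts[i]++
			}
		}
	}

	// Average the embeddings of known words.
	avg := make([]float64, 2)
	n := 0.0
	for _, w := range words {
		if vec, ok := embed[w]; ok {
			avg[0] += vec[0]
			avg[1] += vec[1]
			n++
		}
	}
	if n > 0 {
		avg[0] /= n
		avg[1] /= n
	}

	// Concatenate the three parts into one final feature vector.
	return append(append(counts, flags...), avg...)
}

func main() {
	fmt.Println(Vectorize("i was charged twice and need a refund", []float64{1, 0, 0}))
}
```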
&lt;h2&gt;
  
  
  Block 5: Pass the numbers through a tiny neural network
&lt;/h2&gt;

&lt;p&gt;This project uses a very small network.&lt;/p&gt;

&lt;p&gt;At a high level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Feature vector
   -&amp;gt;
Dense layer
   -&amp;gt;
ReLU
   -&amp;gt;
Dense layer
   -&amp;gt;
ReLU
   -&amp;gt;
5 output heads
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each output head is responsible for one prediction task.&lt;/p&gt;

&lt;p&gt;Why this shape?&lt;/p&gt;

&lt;p&gt;Because the tasks are related.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;refund&lt;/code&gt; often points to &lt;code&gt;billing&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;angry language can affect &lt;code&gt;sentiment&lt;/code&gt; and &lt;code&gt;churn_risk&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;pricing questions often affect &lt;code&gt;department&lt;/code&gt; and &lt;code&gt;lead_intent&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the model first learns a &lt;strong&gt;shared internal representation&lt;/strong&gt; and then each task gets its own small output layer.&lt;/p&gt;

&lt;p&gt;Technical term:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is a &lt;strong&gt;shared-base multi-head neural network&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
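
&lt;p&gt;A minimal Go sketch of that forward pass, with toy layer shapes. The real layer sizes come from the exported artifact; this only shows the shared-base-plus-heads structure:&lt;/p&gt;

```go
package main

import "fmt"

// Dense computes y = Wx + b for one fully connected layer.
func Dense(w [][]float64, b, x []float64) []float64 {
	y := make([]float64, len(b))
	for i, row := range w {
		sum := b[i]
		for j, v := range row {
			sum += v * x[j]
		}
		y[i] = sum
	}
	return y
}

// ReLU zeroes out negative activations.
func ReLU(x []float64) []float64 {
	for i, v := range x {
		if v > 0 {
			continue
		}
		x[i] = 0
	}
	return x
}

// Forward runs the shared base, then one small output layer per task.
func Forward(x []float64, base1, base2 [][]float64, b1, b2 []float64,
	heads map[string][][]float64, headBias map[string][]float64) map[string][]float64 {

	h := ReLU(Dense(base1, b1, x))
	h = ReLU(Dense(base2, b2, h))

	out := make(map[string][]float64)
	for task, w := range heads {
		out[task] = Dense(w, headBias[task], h)
	}
	return out
}

func main() {
	x := []float64{1, 0}
	base1 := [][]float64{{1, 0}, {0, 1}}
	base2 := [][]float64{{1, 1}}
	heads := map[string][][]float64{"sentiment": {{2}, {-2}}}
	bias := map[string][]float64{"sentiment": {0, 0}}
	out := Forward(x, base1, base2, []float64{0, 0}, []float64{0}, heads, bias)
	fmt.Println(out["sentiment"]) // prints [2 -2]
}
```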

&lt;h2&gt;
  
  
  Block 6: Get 5 answers
&lt;/h2&gt;

&lt;p&gt;Each head produces a score.&lt;br&gt;
Then the system converts scores into labels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;softmax&lt;/code&gt; for multi-class outputs like department or intent&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sigmoid&lt;/code&gt; for binary output like churn risk&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Plain-English version:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The model does not directly shout "billing." It first scores all options, then picks the most likely one.&lt;/p&gt;
&lt;/blockquote&gt;
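
&lt;p&gt;In Go, that scoring-then-picking step looks roughly like this (the label names are illustrative):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math"
)

// Softmax turns raw scores into probabilities over labels.
func Softmax(scores []float64) []float64 {
	max := scores[0]
	for _, s := range scores {
		if s > max {
			max = s
		}
	}
	sum := 0.0
	out := make([]float64, len(scores))
	for i, s := range scores {
		out[i] = math.Exp(s - max) // subtract max for numeric stability
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}

// Sigmoid squashes one score into a probability for a binary task.
func Sigmoid(score float64) float64 {
	return 1 / (1 + math.Exp(-score))
}

// Argmax picks the index of the most likely label.
func Argmax(probs []float64) int {
	best := 0
	for i, p := range probs {
		if p > probs[best] {
			best = i
		}
	}
	return best
}

func main() {
	depts := []string{"billing", "tech_support", "sales"}
	probs := Softmax([]float64{2.0, 0.5, 0.1})
	fmt.Println(depts[Argmax(probs)]) // prints billing
	fmt.Println(Sigmoid(1.2) > 0.5)   // churn_risk above a 0.5 threshold: true
}
```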

&lt;h2&gt;
  
  
  Block 7: Package the result for production
&lt;/h2&gt;

&lt;p&gt;Training happens in Python.&lt;br&gt;
Production inference happens in Go.&lt;/p&gt;

&lt;p&gt;That design was intentional.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because I wanted:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;easy training with PyTorch&lt;/li&gt;
&lt;li&gt;a simple export format&lt;/li&gt;
&lt;li&gt;a lightweight production runtime&lt;/li&gt;
&lt;li&gt;low latency&lt;/li&gt;
&lt;li&gt;low memory usage&lt;/li&gt;
&lt;li&gt;no external ML runtime in production&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the trained model is exported into JSON, and the Go service loads that artifact and runs the forward pass manually.&lt;/p&gt;

&lt;p&gt;Plain-English version:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Python is the workshop. Go is the factory floor.&lt;/p&gt;
&lt;/blockquote&gt;
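
&lt;p&gt;A sketch of what loading such a JSON artifact can look like on the Go side. The field names here are invented for illustration; the actual schema is whatever export.py writes and the model/ package expects:&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Artifact mirrors a hypothetical JSON export from the training side.
type Artifact struct {
	Vocab   []string    `json:"vocab"`
	Weights [][]float64 `json:"weights"`
	Bias    []float64   `json:"bias"`
	Labels  []string    `json:"labels"`
}

// Load parses the exported artifact so the forward pass can use it.
func Load(data []byte) (*Artifact, error) {
	a := new(Artifact)
	if err := json.Unmarshal(data, a); err != nil {
		return nil, err
	}
	return a, nil
}

func main() {
	raw := []byte(`{"vocab":["refund"],"weights":[[0.5]],"bias":[0.1],"labels":["billing"]}`)
	a, err := Load(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(a.Labels[0]) // prints billing
}
```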

&lt;h2&gt;
  
  
  Mapping the real project to these blocks
&lt;/h2&gt;

&lt;p&gt;Here is how the actual project maps to the conceptual flow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Training side
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;preprocess.py&lt;/code&gt;
Cleans text, normalizes Hinglish, replaces URLs/emails/numbers, injects style noise for synthetic data&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;features.py&lt;/code&gt;
Builds bag-of-words vocab, embedding vocab, keywords, and encodes text&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;datasets.py&lt;/code&gt;
Loads Hugging Face datasets, local JSONL files, corrections, and normalizes everything into one schema&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;synth.py&lt;/code&gt;
Generates synthetic support-ticket examples to improve domain coverage&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;model.py&lt;/code&gt;
Defines the tiny hybrid neural network&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;train.py&lt;/code&gt;
Handles training, validation, class weights, metrics, early stopping, and artifact creation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;export.py&lt;/code&gt;
Writes the trained model to JSON for production&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Inference side
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;features/&lt;/code&gt;
Rebuilds the same preprocessing and feature extraction logic in Go&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;model/&lt;/code&gt;
Loads exported JSON and defines dense layers, embeddings, softmax, sigmoid, and validation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;quantization/&lt;/code&gt;
Supports int8 inference for smaller and faster runtime&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;inference/&lt;/code&gt;
Orchestrates prediction and creates the final result object&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;benchmark/&lt;/code&gt;
Compares the local model against hosted models like GPT-5-mini&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why I did not use an LLM for everything
&lt;/h2&gt;

&lt;p&gt;This question comes up a lot now.&lt;/p&gt;

&lt;p&gt;Why build a tiny model at all when an LLM can classify text?&lt;/p&gt;

&lt;p&gt;Because production engineering is about fit, not hype.&lt;/p&gt;

&lt;p&gt;For this use case, the tiny model has real advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;much lower latency&lt;/li&gt;
&lt;li&gt;much lower cost&lt;/li&gt;
&lt;li&gt;predictable output shape&lt;/li&gt;
&lt;li&gt;simpler deployment&lt;/li&gt;
&lt;li&gt;easier control over labels&lt;/li&gt;
&lt;li&gt;easier offline benchmarking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LLMs are great when you need open-ended reasoning or generation.&lt;/p&gt;

&lt;p&gt;But if you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;narrow labels&lt;/li&gt;
&lt;li&gt;stable routing&lt;/li&gt;
&lt;li&gt;predictable performance&lt;/li&gt;
&lt;li&gt;cheap per-request inference&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;then a smaller custom model can be the better tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  The main lesson from this project
&lt;/h2&gt;

&lt;p&gt;The most important idea I want you to take from Part 1 is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An AI application is not "a model."&lt;br&gt;
It is a pipeline.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The model matters, yes.&lt;br&gt;
But so do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;your labels&lt;/li&gt;
&lt;li&gt;your preprocessing&lt;/li&gt;
&lt;li&gt;your synthetic data&lt;/li&gt;
&lt;li&gt;your export format&lt;/li&gt;
&lt;li&gt;your inference runtime&lt;/li&gt;
&lt;li&gt;your benchmark setup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you only focus on the neural network block, you miss most of the engineering work.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is coming next
&lt;/h2&gt;

&lt;p&gt;In Part 2, I will go one level deeper into the most underrated part of AI work:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;the dataset and label design&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is where this project really starts.&lt;br&gt;
Not in PyTorch.&lt;br&gt;
Not in matrix multiplication.&lt;br&gt;
Not in fancy model architecture.&lt;/p&gt;

&lt;p&gt;It starts with deciding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what we want the system to predict&lt;/li&gt;
&lt;li&gt;what "good" labels even mean&lt;/li&gt;
&lt;li&gt;how to combine real data, heuristics, and synthetic examples into one usable training set&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you can understand that piece, the rest of the system becomes much easier to follow.&lt;/p&gt;

&lt;p&gt;Disclosure: AI was used to frame the article.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>go</category>
      <category>backend</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
