
Artyom Molchanov


I Built a Support Ticket Classifier with a Fine-Tuned LLM for $10/month

I fine-tuned Qwen2.5-0.5B to classify telecom support tickets, quantized it to 350MB, and deployed it on a cheap VPS. Here's how.

Live Demo | API Docs

The Problem

Support teams waste hours manually routing tickets. A customer writes "my wifi is slow" — is it a technical issue? Billing? Should it go to L1 or L2 support?

I built a classifier that outputs structured JSON with intent, category, urgency, sentiment, routing target, and extracted entities.
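Concretely, for a ticket like "my wifi is slow" the response looks something like this (field names follow the schema above; the values are illustrative):

```python
import json

# Illustrative classifier output for "my wifi is slow".
# Field names mirror the schema described above; the exact
# values and entity keys are made up for this example.
result = {
    "intent": "report_issue",
    "category": "technical",
    "urgency": "medium",
    "sentiment": "negative",
    "routing": "L1_technical",
    "entities": {"service": "wifi", "symptom": "slow"},
    "is_relevant": True,
}

print(json.dumps(result, indent=2))
```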

Why Not Just Use a Cloud API?

  1. Cost — 50K requests/month through cloud LLMs (OpenAI, Claude, Gemini) runs roughly $100-200; self-hosted is $10-20
  2. Privacy — Some companies can't send customer data to external APIs
  3. Control — Fine-tune for your specific domain

The Stack

  • Qwen2.5-0.5B (fine-tuned) → GGUF Q4_K_M (350MB)
  • llama-cpp-python for inference → FastAPI for API → nginx for reverse proxy
  • Docker → VPS ($10/mo)

Fine-Tuning

Base Model

Qwen2.5-0.5B-Instruct — small enough for CPU inference, smart enough for classification.

Dataset

~1000 synthetic support tickets with labels:

  • Technical issues (internet, TV, mobile)
  • Billing inquiries
  • Cancellation requests
  • General questions
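One synthetic training pair might look like this (the exact label schema is my assumption, based on the output fields and categories described above):

```python
# A single synthetic training example: ticket text paired with its label.
# The label schema here is an assumption inferred from the categories above.
example = {
    "text": "I was charged twice on my last invoice, please check",
    "label": {
        "intent": "billing_inquiry",
        "category": "billing",
        "urgency": "high",
        "sentiment": "negative",
    },
}

print(example["label"]["category"])
```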

Training

Full fine-tuning on Google Colab T4 (free tier):

  • 3 epochs
  • Learning rate: 2e-5
  • bf16 training
  • ~40 minutes total
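For full fine-tuning, each (ticket, label) pair gets serialized into the model's chat template. Qwen2.5 uses the ChatML format with `<|im_start|>`/`<|im_end|>` markers; the system prompt and label schema below are my assumptions, not the author's exact ones:

```python
import json

def to_chat_text(ticket: str, label: dict) -> str:
    """Format one (ticket, label) pair as Qwen's ChatML template.

    Qwen2.5 models use the <|im_start|>/<|im_end|> markers shown here;
    the system prompt wording and label schema are illustrative.
    """
    system = "Classify the telecom support ticket. Reply with JSON only."
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{ticket}<|im_end|>\n"
        f"<|im_start|>assistant\n{json.dumps(label)}<|im_end|>\n"
    )

sample = to_chat_text("my wifi is slow", {"category": "technical"})
print(sample)
```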

Quantization

Converted to GGUF and quantized to 4-bit using llama.cpp tools.

Result: 350MB model that runs on CPU.
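The conversion uses the standard llama.cpp tooling; roughly (file names are mine, and assume a llama.cpp checkout with the tools built):

```shell
# Convert the fine-tuned HF checkpoint to GGUF (f16), then quantize to 4-bit.
python convert_hf_to_gguf.py ./qwen2.5-0.5b-finetuned --outfile model-f16.gguf
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
```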

The API

Simple FastAPI wrapper: load the GGUF model, accept POST requests, construct chat messages with system prompt and user text, parse JSON from model output, log to database.

Filtering Garbage Input

Users will send random stuff. Added a heuristic check:

  • Text too short (< 10 chars) → not relevant
  • Contains telecom keywords (wifi, internet, bill, etc.) → relevant
  • No keywords + category=unknown → not relevant

Now irrelevant queries return is_relevant: false.
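The three rules above fit in a few lines of plain Python (the keyword list and threshold here are illustrative, not the production values):

```python
# Illustrative keyword set -- tune for your own domain.
TELECOM_KEYWORDS = {
    "wifi", "internet", "bill", "invoice", "router", "tv",
    "mobile", "sim", "cancel", "subscription", "payment",
}

def is_relevant(text: str, category: str = "unknown") -> bool:
    """Heuristic pre-filter mirroring the three rules above."""
    if len(text.strip()) < 10:        # too short to classify
        return False
    words = set(text.lower().split())
    if words & TELECOM_KEYWORDS:      # obvious telecom vocabulary
        return True
    return category != "unknown"      # no keywords + unknown category -> reject

print(is_relevant("my wifi is slow"))   # keyword hit
print(is_relevant("asdf"))              # too short
```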

Deployment

VPS Setup

Standard approach:

  1. Install Docker
  2. Deploy with docker compose
  3. Add SSL with Certbot
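A minimal compose file for this kind of setup might look like the following sketch (service name, ports, and paths are assumptions, not the author's actual config):

```yaml
services:
  classifier:
    build: .
    restart: unless-stopped
    ports:
      - "127.0.0.1:8000:8000"   # nginx proxies public traffic here
    volumes:
      - ./models:/app/models    # GGUF model mounted from the host
    deploy:
      resources:
        limits:
          memory: 2g            # ~700 MB for the model plus headroom
```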

Total cost: ~$10-15/month for a 2 vCore, 4GB RAM VPS.

Performance

| Metric              | Value      |
| ------------------- | ---------- |
| Intent accuracy     | ~92%       |
| Category accuracy   | ~89%       |
| Inference (VPS CPU) | 3-5 s      |
| Inference (M1 Mac)  | 150-300 ms |
| Model size          | 350 MB     |
| Memory usage        | ~700 MB    |

Why 3-5 seconds is fine

This isn't a chatbot. It's ticket classification that happens once when a ticket is created. You can also process async via a queue.

For faster inference: use a modern CPU (AMD EPYC) or add a GPU.

When to Fine-Tune vs Use GPT API

Fine-tune when:

  • Data privacy is required (on-premise)
  • High volume of similar requests (>10K/month)
  • Specific domain knowledge needed

Use GPT API when:

  • Low volume
  • Diverse tasks
  • Need best quality regardless of cost

Try It


Want something similar for your company? I build custom LLM solutions that run on your infrastructure.

Reach out on Telegram — let's discuss your use case.
