DEV Community

Tilak Raj

Why I Switched From GPT-4 to Small Language Models for Two of My Products

GPT-4 and Claude Sonnet are not always the right model for the job. After 18 months of running AI products in production, I've moved two of my products from frontier models to small language models — and the results have been better latency, lower cost, and in one case, higher accuracy on the specific task. Here is exactly what I did and why.


Background: The Two Products That Changed

Product 1: AgriIntel — Crop recommendation classification

AgriIntel uses AI to classify incoming sensor data events and route them to the appropriate recommendation workflow.

The classification task is:

Given a set of sensor readings (soil moisture, temperature, nutrient levels, weather forecast), classify what type of agronomic decision is needed:

  • Irrigation
  • Fertilization
  • Pest management
  • Harvest timing
  • No action

This is a classification task with a fixed taxonomy. GPT-4o handled it well, but at $0.005 per classification and 15,000+ classifications per day, that works out to roughly $75/day, or about $2,250/month, just for classification.

Latency was also 800ms–1.2s for a task where users expect near-instant feedback.
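Concretely, the classification call boils down to building a constrained prompt and validating the model's reply against the taxonomy. Here is a minimal sketch in Python; the field names, label strings, and prompt wording are illustrative, not AgriIntel's actual schema:

```python
import json

# The five labels from the fixed taxonomy above.
LABELS = ["irrigation", "fertilization", "pest_management",
          "harvest_timing", "no_action"]

def build_classification_prompt(readings: dict) -> str:
    """Build the prompt for one sensor event.

    Field names (e.g. soil_moisture) are illustrative, not the
    production schema.
    """
    return (
        "Classify the agronomic decision needed for these sensor readings.\n"
        f"Readings: {json.dumps(readings, sort_keys=True)}\n"
        f"Answer with exactly one label from: {', '.join(LABELS)}."
    )

def parse_label(raw: str) -> str:
    """Normalize the model's reply and reject anything outside the taxonomy."""
    label = raw.strip().lower().replace(" ", "_")
    if label not in LABELS:
        raise ValueError(f"out-of-taxonomy label: {raw!r}")
    return label
```

Validating the reply matters: a general-purpose model will occasionally answer with prose instead of a bare label, and routing logic downstream needs a guaranteed-valid value.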


Product 2: CanadaCompliance — Regulation change impact classification

CanadaCompliance.ai monitors regulatory changes and classifies each change by:

  • Industry sector affected
  • Type of obligation (new requirement, amendment, repeal)
  • Urgency level (immediate action, planning horizon, informational)

Again — fixed taxonomy classification with high volume.


Why Small Language Models Made Sense

The key insight:

Frontier models are optimized for general capability. For specific classification tasks, that capability is overkill — and you pay for it in cost and latency.

Small language models (Phi-3, Mistral 7B, Llama 3.2) are:

  • Much faster (50–200ms vs 800ms–2s)
  • Much cheaper (10–100× lower cost)
  • Fine-tuneable to specific tasks
  • Privately hostable for data residency needs

The Fine-Tuning Process for AgriIntel

Step 1: Build training dataset

I generated a labeled dataset with GPT-4o: I used the model I was replacing to label 3,000 examples.

This is a common pattern (a form of knowledge distillation):
Use a strong model to generate training data for a smaller model.

Example workflow:

  • Generate labeled examples
  • Format JSONL dataset
  • Prepare training pipeline
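The dataset format for the workflow above is OpenAI's chat-style fine-tuning JSONL, one example per line. A sketch of the formatting step; the system prompt and helper names are mine, not the production code:

```python
import json

SYSTEM_PROMPT = "Classify the sensor event into exactly one agronomic decision label."

def to_finetune_record(readings: dict, label: str) -> str:
    """Format one GPT-4o-labeled example as a JSONL line in OpenAI's
    chat fine-tuning format (system / user / assistant messages)."""
    record = {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": json.dumps(readings, sort_keys=True)},
            {"role": "assistant", "content": label},
        ]
    }
    return json.dumps(record)

def write_dataset(examples, path="train.jsonl"):
    """Write (readings, label) pairs to a JSONL training file."""
    with open(path, "w") as f:
        for readings, label in examples:
            f.write(to_finetune_record(readings, label) + "\n")
```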

Step 2: Fine-tune the model

I fine-tuned GPT-4o-mini using OpenAI’s fine-tuning API.

Why GPT-4o-mini rather than a self-hosted model like Phi-3 or Mistral 7B?

It is small, cheap, fine-tunes well on narrow tasks, and keeps the simplicity of the OpenAI API, with no hosting infrastructure to run.
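Launching the job with the openai Python SDK (v1.x) takes two calls: upload the JSONL file, then create the fine-tuning job. A sketch under the assumption of a `train.jsonl` file on disk; the model snapshot name and suffix are illustrative, so check OpenAI's docs for the currently fine-tunable gpt-4o-mini snapshot:

```python
# Illustrative snapshot name; verify against OpenAI's fine-tuning docs.
BASE_MODEL = "gpt-4o-mini-2024-07-18"

def finetune_job_params(training_file_id: str,
                        suffix: str = "agriintel-clf") -> dict:
    """Build the arguments for client.fine_tuning.jobs.create."""
    return {
        "model": BASE_MODEL,
        "training_file": training_file_id,
        "suffix": suffix,  # appears in the fine-tuned model's name
    }

# With an API key configured, the actual calls look like:
#
#   from openai import OpenAI
#   client = OpenAI()
#   upload = client.files.create(file=open("train.jsonl", "rb"),
#                                purpose="fine-tune")
#   job = client.fine_tuning.jobs.create(**finetune_job_params(upload.id))
```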


Step 3: Benchmark results

Before switching production traffic, I tested both models on a 500-example dataset:

Results:

GPT-4o:

  • Accuracy: 96.2%
  • Latency: 1100ms
  • Cost: $0.0048 per call

Fine-tuned GPT-4o-mini:

  • Accuracy: 97.1%
  • Latency: 280ms
  • Cost: $0.00048 per call

Improvements:

  • Cost reduction: 90%
  • Latency reduction: 75%
  • Accuracy improvement: +0.9 percentage points
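The improvement percentages fall straight out of the raw numbers above (latency comes out at 74.5%, reported as 75%):

```python
def pct_reduction(before: float, after: float) -> float:
    """Percent reduction from before to after, rounded to one decimal."""
    return round(100 * (1 - after / before), 1)

# Numbers from the 500-example benchmark.
gpt4o   = {"acc": 96.2, "latency_ms": 1100, "cost": 0.0048}
ft_mini = {"acc": 97.1, "latency_ms": 280,  "cost": 0.00048}

cost_cut    = pct_reduction(gpt4o["cost"], ft_mini["cost"])              # 90.0
latency_cut = pct_reduction(gpt4o["latency_ms"], ft_mini["latency_ms"])  # 74.5
acc_gain    = round(ft_mini["acc"] - gpt4o["acc"], 1)   # 0.9 percentage points
```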

Why the Fine-Tuned Model Performed Better

GPT-4o tries to be helpful and nuanced; on a fixed-taxonomy task that shows up as hedging, caveats, and the occasional answer outside the expected label set.

The fine-tuned model learned:

  • Exact taxonomy
  • Expected output structure
  • Domain edge cases

For structured classification tasks, precision beats general capability.

Fine-tuning doesn't add knowledge:
It teaches the model how to apply the knowledge it already has to your specific task and output format.


When NOT to Use Small Language Models

This approach does NOT work for:

  • Open-ended generation (reports, documents)
  • Complex reasoning tasks
  • Low-volume workloads
  • Rapidly changing taxonomies

Use frontier models when flexibility matters more than cost.


Decision Framework

Use fine-tuned SLM when:

  • Volume > 1,000 calls/day
  • Fixed taxonomy
  • Stable task definition
  • Latency matters
  • Cost matters
  • You have training data

Use frontier models when:

  • Volume is low
  • Task requires reasoning
  • Task changes frequently
  • No training data exists
  • Output quality variance is risky
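The framework above is mechanical enough to encode as a predicate. A sketch; the 1,000 calls/day threshold is this article's rule of thumb, not a universal constant:

```python
def should_use_finetuned_slm(
    calls_per_day: int,
    fixed_taxonomy: bool,
    stable_task: bool,
    has_training_data: bool,
    needs_open_ended_reasoning: bool,
) -> bool:
    """Decision framework: fine-tuned SLM vs. frontier model.

    The 1,000 calls/day cutoff is a rule of thumb from the article,
    below which fine-tuning overhead rarely pays for itself.
    """
    if needs_open_ended_reasoning:
        return False  # frontier-model territory regardless of volume
    return (
        calls_per_day > 1000
        and fixed_taxonomy
        and stable_task
        and has_training_data
    )
```

AgriIntel's workload (15,000 calls/day, fixed taxonomy, stable task, training data available) lands squarely on the SLM side.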

Results Summary

AgriIntel improvements:

Cost reduction: 90%
Latency reduction: 75%
Accuracy improvement: +0.9 percentage points

Monthly savings:
$3,100/month (~$37,000/year)


About the Author

Tilak Raj is CEO & Founder of Brainfy AI, building vertical AI SaaS products across agriculture, insurance, aviation compliance, and real estate.

Website:
https://tilakraj.info

Projects:
https://tilakraj.info/projects
