# Claude Haiku for Low-Cost AI Inference: Patterns from a Horse Racing Prediction System
Haiku is fast and cheap. But "it's cheap, so use it" isn't a design strategy. Here are the patterns that actually emerged from building a horse racing prediction system with 11 scoring factors: how to extract maximum value from Haiku, and when to upgrade.
## Model Selection Principle
- Claude Opus 4.7: highest quality, highest cost → architecture decisions, complex design
- Claude Sonnet 4.6: balanced → code review, moderate reasoning
- Claude Haiku 4.5: fast, cheap → routine inference, batch processing
Decision rule: Does this task require deep understanding?
- Take 11 numeric scores, generate a prediction text → haiku is sufficient
- Design the 11-factor scoring system → needs sonnet or above
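That decision rule can be sketched as a small routing table. The task names below are illustrative, not from the actual system; the point is that each task is mapped to the cheapest tier that can handle it:

```typescript
type ModelTier = 'haiku' | 'sonnet' | 'opus';

// Hypothetical task categories, each mapped to the cheapest adequate tier.
const TASK_ROUTING: Record<string, ModelTier> = {
  generate_prediction_text: 'haiku', // numeric scores in, short text out
  summarize_race_results: 'haiku',   // routine, structured transformation
  review_scoring_logic: 'sonnet',    // moderate reasoning over code
  design_scoring_system: 'opus',     // open-ended architecture work
};

function tierFor(task: string): ModelTier {
  // Unknown tasks default to the middle tier, not the cheapest one.
  return TASK_ROUTING[task] ?? 'sonnet';
}
```

Defaulting unknown tasks to Sonnet rather than Haiku errs on the side of quality when the routing table has a gap.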
## Haiku in the Horse Racing AI
```typescript
const CLAUDE_MODELS = {
  haiku: 'claude-haiku-4-5-20251001',
  sonnet: 'claude-sonnet-4-6',
};

async function predictRace(raceData: RaceInput): Promise<string> {
  // High data quality = straightforward numeric analysis → haiku
  const model = raceData.dataQualityScore >= 7
    ? CLAUDE_MODELS.haiku
    : CLAUDE_MODELS.sonnet; // low quality = more inference needed

  const response = await anthropic.messages.create({
    model,
    max_tokens: 800,
    messages: [{
      role: 'user',
      content: buildPredictionPrompt(raceData),
    }],
  });

  // Content blocks are a union type (text / tool_use), so narrow before reading .text
  const block = response.content[0];
  return block.type === 'text' ? block.text : '';
}
```
## Prompt Design for Haiku
Haiku performs best with short, structured, explicit inputs:
```typescript
function buildPredictionPrompt(data: RaceInput): string {
  return `
You are a horse racing prediction specialist. Analyze the data below and output ranked predictions.

<<<RACE_DATA>>>
Race: ${data.raceName}
Starters: ${data.horseCount}

Horses (11-factor scores):
${data.horses.map(h => `
${h.name}: total ${h.totalScore}
- final lap: ${h.finalLapScore} | prev rank: ${h.prevRankScore}
- jockey: ${h.jockeyScore} | weight change: ${h.weightScore}
- odds: ${h.oddsScore} | time: ${h.timeScore}
- popularity: ${h.popularityScore} | margin: ${h.marginScore}
- rest days: ${h.freshnessScore} | age penalty: ${h.agePenaltyScore}
- data quality: ${h.dataQualityScore}/17
`).join('')}
<<<END>>>

Output: Top 3 predicted finishers with one-sentence rationale each. Under 150 words.
`;
}
```
The `<<<RACE_DATA>>>...<<<END>>>` delimiters guard against prompt injection from external data: anything inside the block is clearly marked as data, not instructions. The explicit output format ("under 150 words") keeps Haiku's output consistent.
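Delimiters only help if external data can't smuggle the closing marker in. A minimal sanitizer (the function name is mine, not the system's) strips the delimiter sequences from incoming fields before they enter the prompt:

```typescript
// Strip the guard delimiters from any externally sourced string, so scraped
// race data can't close the RACE_DATA block early and inject instructions.
function sanitizeField(value: string): string {
  return value.replace(/<<<|>>>/g, '');
}
```

Run every scraped string (horse names, race names) through this before interpolating it into the template.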
## Batch Parallelism
```typescript
async function predictAllRaces(races: RaceInput[]): Promise<PredictionResult[]> {
  const batchSize = 5;
  const results: PredictionResult[] = [];

  for (let i = 0; i < races.length; i += batchSize) {
    const batch = races.slice(i, i + batchSize);
    // Per-race catch: one failed prediction doesn't sink the whole batch
    const batchResults = await Promise.all(
      batch.map(race =>
        predictRace(race)
          .then(text => ({ raceName: race.raceName, text }))
          .catch(e => ({ raceName: race.raceName, error: String(e) }))
      )
    );
    results.push(...(batchResults as PredictionResult[]));

    // Brief pause between batches to stay clear of rate limits
    if (i + batchSize < races.length) {
      await new Promise(r => setTimeout(r, 200));
    }
  }
  return results;
}
```
Haiku has lower latency and more generous rate limits than Sonnet or Opus, so parallel batches work well.
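Even so, a batch runner should expect the occasional 429. A sketch of a retry wrapper with exponential backoff (the schedule and attempt count are illustrative, not what the system necessarily uses):

```typescript
// Exponential backoff: 200ms, 400ms, 800ms, ... capped at 5s.
function backoffMs(attempt: number, baseMs = 200, capMs = 5000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Hypothetical wrapper: retry a request a few times before giving up.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (e) {
      lastError = e;
      await new Promise(r => setTimeout(r, backoffMs(attempt)));
    }
  }
  throw lastError;
}
```

Wrapping each `predictRace` call in `withRetry` keeps transient rate-limit errors out of the final results.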
## Cost Math: Haiku vs Sonnet
Per prediction (~800 input tokens / ~200 output tokens):

- Haiku: $0.25/M input + $1.25/M output = $0.00045 per prediction
- Sonnet: $3.00/M input + $15.00/M output = $0.00540 per prediction

Haiku costs roughly 1/12 as much as Sonnet. At 50 races/day:

- Haiku: $0.0225/day ≈ $0.68/month
- Sonnet: $0.2700/day = $8.10/month

That's a $7.40/month difference. Small now, meaningful at scale.
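The arithmetic generalizes to a small helper. The rates below are hardcoded from the figures used in the comparison above; verify against current pricing before relying on them:

```typescript
// $/million tokens, matching the rates in the comparison above.
const PRICING = {
  haiku: { inPerM: 0.25, outPerM: 1.25 },
  sonnet: { inPerM: 3.0, outPerM: 15.0 },
};

// Dollar cost of a single call given token counts.
function costPerCall(
  model: keyof typeof PRICING,
  inTokens: number,
  outTokens: number,
): number {
  const p = PRICING[model];
  return (inTokens * p.inPerM + outTokens * p.outPerM) / 1_000_000;
}
```

`costPerCall('haiku', 800, 200)` reproduces the $0.00045 figure, and multiplying by races/day gives the daily spend.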
## Dynamic Model Selection via DQS
```typescript
function selectModel(dqs: number, factorCount: number): 'haiku' | 'sonnet' {
  if (dqs >= 12 && factorCount >= 10) return 'haiku';
  return 'sonnet';
}
```
High DQS = complete data = numeric analysis = haiku is enough.
Low DQS = missing data = requires contextual inference = upgrade to sonnet.
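The thresholds are worth a quick boundary check. The function is reproduced here so the sketch runs standalone:

```typescript
function selectModel(dqs: number, factorCount: number): 'haiku' | 'sonnet' {
  if (dqs >= 12 && factorCount >= 10) return 'haiku';
  return 'sonnet';
}

// Both conditions must hold: complete data AND enough populated factors.
console.log(selectModel(12, 10)); // haiku: exactly at both thresholds
console.log(selectModel(16, 9));  // sonnet: high DQS but too few factors
console.log(selectModel(11, 11)); // sonnet: DQS just below threshold
```

The AND means a single strong signal isn't enough to downgrade to Haiku; the data has to be complete on both axes.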
## Summary
Five rules for getting value from haiku:
- Routine, structured tasks only. Don't use it where creativity or deep reasoning matters.
- Keep prompts short and structured. Fewer input tokens = lower cost and faster response.
- Specify output format explicitly. "Under 150 words" keeps haiku's output consistent.
- Switch models dynamically based on context quality. Don't fix everything to haiku.
- Parallelize batches. Low latency plus relaxed rate limits = efficient batch processing.
The insight isn't "Haiku is cheap." It's "this task doesn't need more than Haiku." That judgment is the design decision.