DEV Community

Yonatan Naor

Why LLMs Need Structured Math Tools (Not Prompt Engineering)

LLMs are remarkably good at seeming competent at math. They'll talk about compound interest formulas with confidence. They'll describe how TDEE is calculated step by step. Then they'll give you two different answers when you ask the same question twice.

This isn't a hallucination problem. It's an architecture problem — and trying to fix it with prompts is the wrong solution.


The Problem: LLMs Are Probabilistically Inconsistent

Here's a real example. Ask Claude to calculate a 30-year mortgage on $400,000 at 6.5% APR:

Attempt 1: "Your monthly payment would be approximately $2,528."

Attempt 2: "You'd pay around $2,533 per month."

Correct answer: $2,528.27 (using the standard amortization formula)

The difference is small — but in finance, it matters. And TDEE (Total Daily Energy Expenditure) is even worse:

Attempt 1: "Your TDEE is approximately 2,240 calories."

Attempt 2: "Based on your stats, I'd estimate around 2,185 calories."

The model isn't doing arithmetic. It's predicting what a reasonable answer sounds like based on training data patterns. It's not deterministic, and it's not calling a calculator.

Prompt engineering doesn't fix this. "Think step by step" makes the model show its work — but the work can still be wrong. "Use the Harris-Benedict formula" helps, but you have no way to verify it actually used it correctly.
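For reference, the "correct answer" above falls out of the standard fixed-rate amortization formula: M = P · r · (1 + r)^n / ((1 + r)^n − 1), where r is the monthly rate and n the number of payments. A minimal TypeScript sketch (illustrative only, not any particular package's source) shows there is exactly one right answer:

```typescript
// Standard fixed-rate amortization: M = P * r * (1+r)^n / ((1+r)^n - 1)
// where r is the monthly interest rate and n the total number of payments.
function monthlyPayment(principal: number, annualRatePct: number, termYears: number): number {
  const r = annualRatePct / 100 / 12; // monthly rate, e.g. 6.5% APR -> 0.0054166...
  const n = termYears * 12;           // number of monthly payments
  if (r === 0) return principal / n;  // zero-interest edge case
  const growth = Math.pow(1 + r, n);
  return (principal * r * growth) / (growth - 1);
}

// $400,000 at 6.5% APR over 30 years:
console.log(monthlyPayment(400000, 6.5, 30).toFixed(2)); // "2528.27" -- every run
```

Run it a thousand times and you get $2,528.27 a thousand times. That's the bar the LLM keeps missing.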


The Solution: Treat Math Like a Database Query

You wouldn't prompt an LLM to "look up the current price of AAPL." You'd call an API. The same logic applies to calculations.

When you give an LLM a structured tool for a calculation — a function it calls with specific inputs and gets a deterministic output — the math becomes reliable. The LLM's job is to understand what the user wants and extract the right parameters. The tool's job is to compute the answer.

This is the MCP (Model Context Protocol) approach: tools called like functions, with typed, validated inputs and deterministic outputs.
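Concretely, such a tool is just a pure function behind a typed schema: validate, compute, return. Here's a hedged TypeScript sketch of that shape (the names and field layout are hypothetical, chosen to mirror the examples in this post, not the package's actual source):

```typescript
// Hypothetical typed tool handler: typed inputs, explicit validation,
// deterministic output. Field names mirror this post's examples.
interface MortgageInput {
  principal: number;    // loan amount in dollars
  annual_rate: number;  // APR as a percentage, e.g. 6.5
  term_years: number;   // loan term in years
}

interface MortgageOutput {
  monthly_payment: number;
  total_interest: number;
  total_cost: number;
}

function calculateMortgage(input: MortgageInput): MortgageOutput {
  const { principal, annual_rate, term_years } = input;
  // Validation happens before any math: reject nonsense inputs outright.
  if (principal <= 0 || annual_rate < 0 || term_years <= 0) {
    throw new Error("invalid mortgage parameters");
  }
  const r = annual_rate / 100 / 12; // monthly rate
  const n = term_years * 12;        // number of payments
  const growth = Math.pow(1 + r, n);
  const monthly = r === 0 ? principal / n : (principal * r * growth) / (growth - 1);
  const round = (x: number) => Math.round(x * 100) / 100;
  const payment = round(monthly);
  return {
    monthly_payment: payment,
    total_interest: round(payment * n - principal),
    total_cost: round(payment * n),
  };
}
```

The LLM never touches the arithmetic; it only fills in the three fields of `MortgageInput` from the conversation.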


MCP Calculators in Practice

Here's how this looks in a real Claude assistant setup:

// claude_desktop_config.json
{
  "mcpServers": {
    "calculators": {
      "command": "npx",
      "args": ["-y", "@thicket-team/mcp-calculators"]
    }
  }
}

That's it. Now when a user asks Claude "what would my monthly payment be on a $350k house?", Claude calls:

calculate_mortgage({
  principal: 350000,
  annual_rate: 6.5,
  term_years: 30
})

And gets back:

{
  "monthly_payment": 2212.24,
  "total_interest": 446406.40,
  "total_cost": 796406.40,
  "amortization_schedule": [...]
}

Every time. Deterministic. Formula-verified. No probabilistic drift.


Current Tools in the Package

As of v1.0.0, @thicket-team/mcp-calculators includes:

| Tool | What it calculates |
| --- | --- |
| `calculate_mortgage` | Monthly payment, total interest, full amortization schedule |
| `calculate_tdee` | Total Daily Energy Expenditure using Mifflin-St Jeor or Harris-Benedict |
| `calculate_compound_interest` | Future value with compounding periods |
| `calculate_bmi` | Body Mass Index with WHO category |
| `calculate_loan_payoff` | Loan payoff with extra payment scenarios |
| `calculate_percentage` | Percent of, percent change, reverse percentage |
| `calculate_age` | Age in years/months/days from birthdate |

All tools have:

  • TypeScript type definitions
  • Input validation (rejects nonsense like negative ages)
  • Unit test coverage (500+ tests across the package)
  • Deterministic outputs — same inputs, same answer, every time
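To make "input validation" and "deterministic outputs" concrete, here is a sketch of what a TDEE tool's core could look like, using the published Mifflin-St Jeor equation and the standard activity multipliers. The code is hypothetical and illustrative; the actual package's schema may differ:

```typescript
// Hypothetical TDEE tool core. BMR via Mifflin-St Jeor:
//   men:   10*kg + 6.25*cm - 5*age + 5
//   women: 10*kg + 6.25*cm - 5*age - 161
// TDEE = BMR * activity factor.
type Sex = "male" | "female";
type Activity = "sedentary" | "light" | "moderate" | "active" | "very_active";

// Standard multipliers applied to resting energy expenditure.
const ACTIVITY_FACTORS: Record<Activity, number> = {
  sedentary: 1.2,
  light: 1.375,
  moderate: 1.55,
  active: 1.725,
  very_active: 1.9,
};

function calculateTdee(
  weightKg: number,
  heightCm: number,
  ageYears: number,
  sex: Sex,
  activity: Activity
): number {
  // Reject nonsense (negative ages, zero height) before any math happens.
  if (weightKg <= 0 || heightCm <= 0 || ageYears <= 0) {
    throw new Error("weight, height, and age must be positive");
  }
  const bmr = 10 * weightKg + 6.25 * heightCm - 5 * ageYears + (sex === "male" ? 5 : -161);
  return Math.round(bmr * ACTIVITY_FACTORS[activity]); // kcal/day
}
```

Same stats in, same calorie count out, no matter how the question is phrased. The "2,240 vs 2,185" drift from the opening example simply cannot happen here.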

Why This Beats Fine-Tuning

You might think: "just fine-tune the model on math." But:

  1. Fine-tuning is expensive — and you'd need to re-fine-tune with every new model version
  2. Fine-tuning improves averages, not consistency — the model still has variance on edge cases
  3. Tools compose better — a structured mortgage tool works in Claude, GPT-4, Gemini, Mistral, anywhere that supports function calling

Structured tools give you determinism today, across any model, with zero training cost.


Getting Started

npm install @thicket-team/mcp-calculators

Or use it directly in Claude Desktop without any code:

{
  "mcpServers": {
    "calculators": {
      "command": "npx",
      "args": ["-y", "@thicket-team/mcp-calculators"]
    }
  }
}

NPM page: npmjs.com/package/@thicket-team/mcp-calculators


The LLM doesn't need to be better at math. It needs to stop doing math and start calling tools that are.

That's the real fix.
