DEV Community

Yonatan Naor

Why LLMs Need Structured Math Tools (Not Prompt Engineering)

LLMs are remarkably good at seeming competent at math. They'll talk about compound interest formulas with confidence. They'll describe how TDEE is calculated step by step. Then they'll give you two different answers when you ask the same question twice.

This isn't a hallucination problem. It's an architecture problem — and trying to fix it with prompts is the wrong solution.


The Problem: LLMs Are Probabilistically Inconsistent

Here's a real example. Ask Claude to calculate a 30-year mortgage on $400,000 at 6.5% APR:

Attempt 1: "Your monthly payment would be approximately $2,528."

Attempt 2: "You'd pay around $2,533 per month."

Correct answer: $2,528.27 (using the standard amortization formula)

The difference is small — but in finance, it matters. And TDEE (Total Daily Energy Expenditure) is even worse:

Attempt 1: "Your TDEE is approximately 2,240 calories."

Attempt 2: "Based on your stats, I'd estimate around 2,185 calories."

The model isn't doing arithmetic. It's predicting what a reasonable answer sounds like based on training data patterns. It's not deterministic, and it's not calling a calculator.

Prompt engineering doesn't fix this. "Think step by step" makes the model show its work — but the work can still be wrong. "Use the Harris-Benedict formula" helps, but you have no way to verify it actually used it correctly.
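For reference, the "correct answer" above falls out of the standard fixed-rate amortization formula: M = P · r · (1 + r)^n / ((1 + r)^n − 1), where r is the monthly rate and n the number of payments. A minimal TypeScript sketch (illustrative only, not any particular package's source) shows there is exactly one right answer:

```typescript
// Standard fixed-rate amortization: M = P * r * (1+r)^n / ((1+r)^n - 1)
// where r is the monthly interest rate and n the total number of payments.
function monthlyPayment(principal: number, annualRatePct: number, termYears: number): number {
  const r = annualRatePct / 100 / 12; // monthly rate, e.g. 6.5% APR -> 0.0054166...
  const n = termYears * 12;           // number of monthly payments
  if (r === 0) return principal / n;  // zero-interest edge case
  const growth = Math.pow(1 + r, n);
  return (principal * r * growth) / (growth - 1);
}

// $400,000 at 6.5% APR over 30 years:
console.log(monthlyPayment(400000, 6.5, 30).toFixed(2)); // "2528.27" -- every run
```

Run it a thousand times and you get $2,528.27 a thousand times. That's the bar the LLM keeps missing.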


The Solution: Treat Math Like a Database Query

You wouldn't prompt an LLM to "look up the current price of AAPL." You'd call an API. The same logic applies to calculations.

When you give an LLM a structured tool for a calculation — a function it calls with specific inputs and gets a deterministic output — the math becomes reliable. The LLM's job is to understand what the user wants and extract the right parameters. The tool's job is to compute the answer.

This is the MCP (Model Context Protocol) approach: tools called like functions, with typed, validated inputs and deterministic outputs.
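Concretely, such a tool is just a pure function behind a typed schema: validate, compute, return. Here's a hedged TypeScript sketch of that shape (the names and field layout are hypothetical, chosen to mirror the examples in this post, not the package's actual source):

```typescript
// Hypothetical typed tool handler: typed inputs, explicit validation,
// deterministic output. Field names mirror this post's examples.
interface MortgageInput {
  principal: number;    // loan amount in dollars
  annual_rate: number;  // APR as a percentage, e.g. 6.5
  term_years: number;   // loan term in years
}

interface MortgageOutput {
  monthly_payment: number;
  total_interest: number;
  total_cost: number;
}

function calculateMortgage(input: MortgageInput): MortgageOutput {
  const { principal, annual_rate, term_years } = input;
  // Validation happens before any math: reject nonsense inputs outright.
  if (principal <= 0 || annual_rate < 0 || term_years <= 0) {
    throw new Error("invalid mortgage parameters");
  }
  const r = annual_rate / 100 / 12; // monthly rate
  const n = term_years * 12;        // number of payments
  const growth = Math.pow(1 + r, n);
  const monthly = r === 0 ? principal / n : (principal * r * growth) / (growth - 1);
  const round = (x: number) => Math.round(x * 100) / 100;
  const payment = round(monthly);
  return {
    monthly_payment: payment,
    total_interest: round(payment * n - principal),
    total_cost: round(payment * n),
  };
}
```

The LLM never touches the arithmetic; it only fills in the three fields of `MortgageInput` from the conversation.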


MCP Calculators in Practice

Here's how this looks in a real Claude assistant setup:

// claude_desktop_config.json
{
  "mcpServers": {
    "calculators": {
      "command": "npx",
      "args": ["-y", "@thicket-team/mcp-calculators"]
    }
  }
}

That's it. Now when a user asks Claude "what would my monthly payment be on a $350k house?", Claude calls:

calculate_mortgage({
  principal: 350000,
  annual_rate: 6.5,
  term_years: 30
})

And gets back:

{
  "monthly_payment": 2212.24,
  "total_interest": 446406.40,
  "total_cost": 796406.40,
  "amortization_schedule": [...]
}

Every time. Deterministic. Formula-verified. No probabilistic drift.


Current Tools in the Package

As of v1.0.0, @thicket-team/mcp-calculators includes:

| Tool | What it calculates |
| --- | --- |
| `calculate_mortgage` | Monthly payment, total interest, full amortization schedule |
| `calculate_tdee` | Total Daily Energy Expenditure using Mifflin-St Jeor or Harris-Benedict |
| `calculate_compound_interest` | Future value with compounding periods |
| `calculate_bmi` | Body Mass Index with WHO category |
| `calculate_loan_payoff` | Loan payoff with extra payment scenarios |
| `calculate_percentage` | Percent of, percent change, reverse percentage |
| `calculate_age` | Age in years/months/days from birthdate |

All tools have:

  • TypeScript type definitions
  • Input validation (rejects nonsense like negative ages)
  • Unit test coverage (500+ tests across the package)
  • Deterministic outputs — same inputs, same answer, every time
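To make "input validation" and "deterministic outputs" concrete, here is a sketch of what a TDEE tool's core could look like, using the published Mifflin-St Jeor equation and the standard activity multipliers. The code is hypothetical and illustrative; the actual package's schema may differ:

```typescript
// Hypothetical TDEE tool core. BMR via Mifflin-St Jeor:
//   men:   10*kg + 6.25*cm - 5*age + 5
//   women: 10*kg + 6.25*cm - 5*age - 161
// TDEE = BMR * activity factor.
type Sex = "male" | "female";
type Activity = "sedentary" | "light" | "moderate" | "active" | "very_active";

// Standard multipliers applied to resting energy expenditure.
const ACTIVITY_FACTORS: Record<Activity, number> = {
  sedentary: 1.2,
  light: 1.375,
  moderate: 1.55,
  active: 1.725,
  very_active: 1.9,
};

function calculateTdee(
  weightKg: number,
  heightCm: number,
  ageYears: number,
  sex: Sex,
  activity: Activity
): number {
  // Reject nonsense (negative ages, zero height) before any math happens.
  if (weightKg <= 0 || heightCm <= 0 || ageYears <= 0) {
    throw new Error("weight, height, and age must be positive");
  }
  const bmr = 10 * weightKg + 6.25 * heightCm - 5 * ageYears + (sex === "male" ? 5 : -161);
  return Math.round(bmr * ACTIVITY_FACTORS[activity]); // kcal/day
}
```

Same stats in, same calorie count out, no matter how the question is phrased. The "2,240 vs 2,185" drift from the opening example simply cannot happen here.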

Why This Beats Fine-Tuning

You might think: "just fine-tune the model on math." But:

  1. Fine-tuning is expensive — and you'd need to re-fine-tune with every new model version
  2. Fine-tuning improves averages, not consistency — the model still has variance on edge cases
  3. Tools compose better — a structured mortgage tool works in Claude, GPT-4, Gemini, Mistral, anywhere that supports function calling

Structured tools give you determinism today, across any model, with zero training cost.


Getting Started

npm install @thicket-team/mcp-calculators

Or use it directly in Claude Desktop without any code:

{
  "mcpServers": {
    "calculators": {
      "command": "npx",
      "args": ["-y", "@thicket-team/mcp-calculators"]
    }
  }
}

NPM page: npmjs.com/package/@thicket-team/mcp-calculators


The LLM doesn't need to be better at math. It needs to stop doing math and start calling tools that are.

That's the real fix.
