LLMs are remarkably good at seeming competent at math. They'll talk about compound interest formulas with confidence. They'll describe how TDEE is calculated step by step. Then they'll give you two different answers when you ask the same question twice.
This isn't a hallucination problem. It's an architecture problem — and trying to fix it with prompts is the wrong solution.
The Problem: LLMs Are Probabilistically Inconsistent
Here's a real example. Ask Claude for the monthly payment on a 30-year, $400,000 mortgage at 6.5% APR:
Attempt 1: "Your monthly payment would be approximately $2,528."
Attempt 2: "You'd pay around $2,533 per month."
Correct answer: $2,528.27 (using the standard amortization formula)
The difference is small — but in finance, it matters. And TDEE (Total Daily Energy Expenditure) is even worse:
Attempt 1: "Your TDEE is approximately 2,240 calories."
Attempt 2: "Based on your stats, I'd estimate around 2,185 calories."
The model isn't doing arithmetic. It's predicting what a reasonable answer sounds like based on training data patterns. It's not deterministic, and it's not calling a calculator.
Prompt engineering doesn't fix this. "Think step by step" makes the model show its work, but the work can still be wrong. "Use the Harris-Benedict formula" helps, but you have no way to verify that the model applied the formula correctly, short of redoing the calculation yourself.
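For reference, the standard amortization formula behind the $2,528.27 figure fits in a few lines. This is a minimal TypeScript sketch; the helper names monthlyPayment and roundToCents are made up for illustration and are not part of any package.

```typescript
// Standard fixed-rate amortization formula:
//   M = P * r / (1 - (1 + r)^(-n))
// monthlyPayment is a hypothetical helper name used only for this sketch.
function monthlyPayment(principal: number, annualRatePct: number, termYears: number): number {
  const r = annualRatePct / 100 / 12; // monthly interest rate
  const n = termYears * 12;           // number of monthly payments
  if (r === 0) return principal / n;  // zero-interest loan: plain division
  return (principal * r) / (1 - Math.pow(1 + r, -n));
}

// Round to whole cents for display.
const roundToCents = (x: number): number => Math.round(x * 100) / 100;

const first = roundToCents(monthlyPayment(400000, 6.5, 30));
const second = roundToCents(monthlyPayment(400000, 6.5, 30));
console.log(first, first === second); // 2528.27 true
```

Run it as many times as you like: the inputs fully determine the output.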
The Solution: Treat Math Like a Database Query
You wouldn't prompt an LLM to "look up the current price of AAPL." You'd call an API. The same logic applies to calculations.
When you give an LLM a structured tool for a calculation — a function it calls with specific inputs and gets a deterministic output — the math becomes reliable. The LLM's job is to understand what the user wants and extract the right parameters. The tool's job is to compute the answer.
This is the MCP (Model Context Protocol) approach: tools are called like functions, with typed and validated inputs and deterministic outputs.
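To make that split concrete, here's a rough sketch of what a TDEE tool could look like: the model supplies an argument object, and plain code validates it and does the math. It uses the Mifflin-St Jeor equation. The field names, validation ranges, and function names are assumptions for illustration, not the actual implementation of any MCP server.

```typescript
// Hypothetical sketch: names, shapes, and ranges are illustrative assumptions.
type Sex = "male" | "female";

interface TdeeInput {
  sex: Sex;
  weightKg: number;
  heightCm: number;
  ageYears: number;
  activityFactor: number; // commonly 1.2 (sedentary) up to 1.9 (very active)
}

const isSex = (v: unknown): v is Sex => v === "male" || v === "female";

// Validate raw arguments (as a model would supply them) before any math runs.
function parseTdeeInput(raw: Record<string, unknown>): TdeeInput {
  const num = (key: string, min: number, max: number): number => {
    const value = raw[key];
    if (typeof value !== "number" || !isFinite(value) || value < min || value > max) {
      throw new RangeError(`${key} must be a number between ${min} and ${max}`);
    }
    return value;
  };
  const sex = raw["sex"];
  if (!isSex(sex)) {
    throw new TypeError('sex must be "male" or "female"');
  }
  return {
    sex,
    weightKg: num("weightKg", 20, 400),
    heightCm: num("heightCm", 50, 280),
    ageYears: num("ageYears", 1, 120),
    activityFactor: num("activityFactor", 1.2, 1.9),
  };
}

// Mifflin-St Jeor BMR, scaled by the activity factor to get TDEE.
function calculateTdee(input: TdeeInput): number {
  const base = 10 * input.weightKg + 6.25 * input.heightCm - 5 * input.ageYears;
  const bmr = input.sex === "male" ? base + 5 : base - 161;
  return Math.round(bmr * input.activityFactor);
}

// The model's only job is to produce this argument object.
const args = { sex: "male", weightKg: 80, heightCm: 180, ageYears: 30, activityFactor: 1.55 };
console.log(calculateTdee(parseTdeeInput(args))); // 2759
```

Ask the same question twice and you get 2759 twice, because nothing in this path is sampled.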
MCP Calculators in Practice
Here's how this looks in a real Claude Desktop setup. Add the server to claude_desktop_config.json:
{
  "mcpServers": {
    "calculators": {
      "command": "npx",
      "args": ["-y", "@thicket-team/mcp-calculators"]
    }
  }
}
That's it. Now when a user asks Claude "what would my monthly payment be on a $350k mortgage at 6.5% over 30 years?", Claude calls:
calculate_mortgage({
  principal: 350000,
  annual_rate: 6.5,
  term_years: 30
})
And gets back:
{
  "monthly_payment": 2212.24,
  "total_interest": 446406.40,
  "total_cost": 796406.40,
  "amortization_schedule": [...]
}
Every time. Deterministic. Formula-verified. No probabilistic drift.
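The numbers in that response hang together: 360 payments of $2,212.24 come to $796,406.40, and subtracting the $350,000 principal leaves $446,406.40 in interest. Here's a sketch of how a tool might produce them along with the schedule. It assumes a positive interest rate and rounds to cents at each step; buildSchedule and its rounding choices are my assumptions, not the package's documented behavior.

```typescript
// Sketch only: the rounding conventions here are assumptions.
interface ScheduleRow {
  month: number;
  interest: number;   // interest portion of this payment
  principal: number;  // principal portion of this payment
  balance: number;    // remaining balance after this payment
}

const cents = (x: number): number => Math.round(x * 100) / 100;

// Assumes annualRatePct > 0.
function buildSchedule(principal: number, annualRatePct: number, termYears: number) {
  const r = annualRatePct / 100 / 12;
  const n = termYears * 12;
  const monthlyPayment = cents((principal * r) / (1 - Math.pow(1 + r, -n)));

  const schedule: ScheduleRow[] = [];
  let balance = principal;
  for (let month = 1; month <= n; month++) {
    const interest = cents(balance * r);
    // The final payment clears whatever balance remains after rounding.
    const principalPaid = month === n ? balance : cents(monthlyPayment - interest);
    balance = cents(balance - principalPaid);
    schedule.push({ month, interest, principal: principalPaid, balance });
  }

  const totalCost = cents(monthlyPayment * n);
  const totalInterest = cents(totalCost - principal);
  return { monthlyPayment, totalInterest, totalCost, schedule };
}

const result = buildSchedule(350000, 6.5, 30);
console.log(result.monthlyPayment, result.totalInterest, result.totalCost);
// 2212.24 446406.4 796406.4
console.log(result.schedule[0]);
```

The first row splits the payment into $1,895.83 of interest and $316.41 of principal, which is exactly the kind of bookkeeping a language model tends to fumble and a loop gets right.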
Current Tools in the Package
As of v1.0.0, @thicket-team/mcp-calculators includes:
| Tool | What it calculates |
|---|---|
| calculate_mortgage | Monthly payment, total interest, full amortization schedule |
| calculate_tdee | Total Daily Energy Expenditure using Mifflin-St Jeor or Harris-Benedict |
| calculate_compound_interest | Future value with compounding periods |
| calculate_bmi | Body Mass Index with WHO category |
| calculate_loan_payoff | Loan payoff with extra payment scenarios |
| calculate_percentage | Percent of, percent change, reverse percentage |
| calculate_age | Age in years/months/days from birthdate |
All tools have:
- TypeScript type definitions
- Input validation (rejects nonsense like negative ages)
- Unit test coverage (500+ tests across the package)
- Deterministic outputs — same inputs, same answer, every time
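As a rough illustration of what validation plus deterministic output means in practice, here's a sketch of a BMI function using the standard WHO adult cutoffs (18.5, 25, 30). The name calculateBmiSketch, the result shape, and the error messages are hypothetical; the real calculate_bmi tool may differ.

```typescript
// Hypothetical sketch, not the package's calculate_bmi implementation.
type BmiCategory = "underweight" | "normal" | "overweight" | "obese";

interface BmiResult {
  bmi: number; // rounded to one decimal place
  category: BmiCategory;
}

function calculateBmiSketch(weightKg: number, heightCm: number): BmiResult {
  // Reject nonsense before computing anything.
  if (!isFinite(weightKg) || weightKg <= 0) throw new RangeError("weightKg must be positive");
  if (!isFinite(heightCm) || heightCm <= 0) throw new RangeError("heightCm must be positive");

  const heightM = heightCm / 100;
  const exact = weightKg / (heightM * heightM);
  // The category is decided on the unrounded value.
  const category: BmiCategory =
    exact < 18.5 ? "underweight" : exact < 25 ? "normal" : exact < 30 ? "overweight" : "obese";
  return { bmi: Math.round(exact * 10) / 10, category };
}

// A unit test for the validation path can be as small as this.
function throwsOn(weightKg: number, heightCm: number): boolean {
  try {
    calculateBmiSketch(weightKg, heightCm);
    return false;
  } catch (err) {
    return true;
  }
}

console.log(calculateBmiSketch(70, 175)); // { bmi: 22.9, category: 'normal' }
console.log(throwsOn(-70, 175));          // true
```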
Why This Beats Fine-Tuning
You might think: "just fine-tune the model on math." But:
- Fine-tuning is expensive — and you'd need to re-fine-tune with every new model version
- Fine-tuning improves averages, not consistency — the model still has variance on edge cases
- Tools are portable: the same mortgage calculation can be exposed to Claude, GPT-4, Gemini, Mistral, or any other model that supports function calling
Structured tools give you determinism today, across any model, with zero training cost.
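The reason this carries across models is that a tool is mostly a name, a description, and a JSON Schema for its inputs. The sketch below wraps one schema in the shapes used, as far as I know, by MCP tool listings, Anthropic's Messages API, and OpenAI's Chat Completions API; check each provider's current docs before relying on the exact field names. The description strings are placeholder text, and the parameter names come from the calculate_mortgage call shown earlier.

```typescript
// One JSON Schema for the tool's inputs, shared by every wrapper below.
const mortgageInputSchema = {
  type: "object",
  properties: {
    principal: { type: "number", description: "Loan amount in dollars" },
    annual_rate: { type: "number", description: "Annual interest rate in percent, e.g. 6.5" },
    term_years: { type: "number", description: "Loan term in years" },
  },
  required: ["principal", "annual_rate", "term_years"],
} as const;

const toolName = "calculate_mortgage";
const toolDescription = "Compute the monthly payment for a fixed-rate mortgage"; // placeholder text

// Shape of an entry in an MCP tool listing (assumed from the MCP spec).
const mcpTool = { name: toolName, description: toolDescription, inputSchema: mortgageInputSchema };

// Shape of a tool in Anthropic's Messages API (assumed).
const anthropicTool = { name: toolName, description: toolDescription, input_schema: mortgageInputSchema };

// Shape of a tool in OpenAI's Chat Completions API (assumed).
const openAiTool = {
  type: "function",
  function: { name: toolName, description: toolDescription, parameters: mortgageInputSchema },
};

console.log(JSON.stringify([mcpTool, anthropicTool, openAiTool], null, 2));
```

Only the envelope changes between providers. The function that does the arithmetic stays the same.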
Getting Started
npm install @thicket-team/mcp-calculators
Or use it directly in Claude Desktop without any code:
{
  "mcpServers": {
    "calculators": {
      "command": "npx",
      "args": ["-y", "@thicket-team/mcp-calculators"]
    }
  }
}
NPM page: npmjs.com/package/@thicket-team/mcp-calculators
The LLM doesn't need to be better at math. It needs to stop doing math and start calling tools that are.
That's the real fix.