Yonatan Naor

How I Made Claude Actually Reliable at Math (5-Minute Setup)

I spent a week watching Claude confidently give me wrong answers.

Not wrong opinions — wrong numbers. TDEE calculations off by 200 calories. Mortgage amortization that didn't add up. Compound interest that was close-ish but not quite right.

The thing is, Claude sounds confident when it hallucinates math. It walks you through the reasoning, uses the right formula names, and arrives at a number that feels plausible. The problem only shows up when you check the work.

This is a known issue with LLMs. They don't actually "do math" — they pattern-match from training data. Arithmetic is surprisingly unreliable, especially for multi-step calculations.

Here's how I fixed it.


The Problem: LLMs Are Not Calculators

When you ask Claude to calculate your TDEE (Total Daily Energy Expenditure), it might use the Harris-Benedict formula and arrive at approximately the right answer. But "approximately" isn't good enough when you're tracking calories or modeling a 30-year mortgage.

LLMs work by predicting the next token, not by running deterministic calculations. That means:

  • Digits in decimal results can simply be wrong, because the model predicts them token by token instead of computing them
  • Multi-step formulas accumulate small errors into larger wrong answers
  • The model has no way to "check its work" against ground truth

The solution isn't to prompt Claude harder. It's to give Claude actual tools that run real code.

That's what MCP is for.


The Solution: MCP Calculator Server

Model Context Protocol (MCP) lets you extend Claude with tools that run actual code. Instead of estimating a TDEE, Claude calls a function that runs the Mifflin-St Jeor formula as deterministic code, gets back the computed number, and reports that.
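To make the idea concrete, here is a minimal sketch of what handling a tool call looks like, written without the MCP SDK so it runs on its own. The tool name calculate_bmi, the argument names, and the handleToolCall function are hypothetical and are not taken from @thicket-team/mcp-calculators; only the general request shape (a tool name plus arguments) and the text-content result shape follow MCP's tools/call convention.

```typescript
// Hypothetical sketch: tool and function names are illustrative only.
type ToolArgs = Record<string, number>;
type ToolResult = {
  content: { type: "text"; text: string }[];
  isError?: boolean;
};

// Each tool is ordinary deterministic code, not a model prediction.
const tools: Record<string, (args: ToolArgs) => number> = {
  calculate_bmi: (args) => {
    const weightKg = args["weightKg"];
    const heightCm = args["heightCm"];
    if (typeof weightKg !== "number" || typeof heightCm !== "number") {
      throw new Error("weightKg and heightCm are required");
    }
    const heightM = heightCm / 100;
    return weightKg / (heightM * heightM);
  },
};

// Mirrors the params of a "tools/call" request: a tool name plus arguments.
function handleToolCall(params: { name: string; arguments: ToolArgs }): ToolResult {
  const tool = tools[params.name];
  if (!tool) {
    return {
      content: [{ type: "text", text: `Unknown tool: ${params.name}` }],
      isError: true,
    };
  }
  try {
    const value = tool(params.arguments);
    return { content: [{ type: "text", text: value.toFixed(1) }] };
  } catch (err) {
    return { content: [{ type: "text", text: String(err) }], isError: true };
  }
}

const result = handleToolCall({
  name: "calculate_bmi",
  arguments: { weightKg: 70, heightCm: 175 },
});
console.log(result.content.map((c) => c.text).join("")); // 22.9
```

The model's job shrinks to picking the tool and filling in the arguments; the arithmetic itself never passes through token prediction.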

I built @thicket-team/mcp-calculators for exactly this. It's an MCP server with 20+ calculators covering:

  • Health/fitness: TDEE, BMI, body fat percentage, ideal weight
  • Finance: mortgage payments, loan amortization, compound interest, ROI
  • Math/conversion: unit conversions, percentages, basic arithmetic (for when you want exact results)
  • Date/time: age calculator, days between dates

The key difference from asking Claude to "just calculate it": the MCP server runs deterministic TypeScript code with 500+ unit tests. The numbers are correct.


Setup: 5 Minutes

Option 1: Claude Desktop (no coding required)

  1. Open Claude Desktop
  2. Go to Settings → Developer → Edit Config
  3. Add this to your claude_desktop_config.json:
{
  "mcpServers": {
    "calculators": {
      "command": "npx",
      "args": ["-y", "@thicket-team/mcp-calculators"]
    }
  }
}
  4. Restart Claude Desktop
  5. Done. You'll see "calculators" in the tools panel.

Option 2: Claude Code (one command)

claude mcp add calculators -- npx -y @thicket-team/mcp-calculators

Or add it to your project's MCP config so it loads automatically in every session.

Verify it's working

Ask Claude: "What's my TDEE if I'm a 32-year-old male, 185 lbs, 5'11", and moderately active?"

Without MCP: Claude pattern-matches and gives you a plausible-sounding number.

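With MCP: Claude calls calculate_tdee with your parameters and returns the result computed in code from the Mifflin-St Jeor formula. For a 32-year-old male at 185 lbs and 5'11", with the common 1.55 multiplier for moderate activity, that comes to about 2,807 calories/day.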
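For reference, here is a minimal sketch of the Mifflin-St Jeor calculation itself. It is not the package's source: the function names are hypothetical, the activity multipliers are the commonly cited ones (the package may use different values), and the example call assumes a male, since the formula needs sex as an input.

```typescript
type Sex = "male" | "female";
type Activity = "sedentary" | "light" | "moderate" | "active" | "very_active";

// Commonly cited multipliers; an assumption, not the package's confirmed values.
const ACTIVITY_MULTIPLIER: Record<Activity, number> = {
  sedentary: 1.2,
  light: 1.375,
  moderate: 1.55,
  active: 1.725,
  very_active: 1.9,
};

// Mifflin-St Jeor BMR in kcal/day, metric inputs.
function mifflinStJeorBmr(weightKg: number, heightCm: number, ageYears: number, sex: Sex): number {
  const base = 10 * weightKg + 6.25 * heightCm - 5 * ageYears;
  return sex === "male" ? base + 5 : base - 161;
}

// TDEE from US units: convert to metric, then scale BMR by activity level.
function tdee(weightLb: number, heightIn: number, ageYears: number, sex: Sex, activity: Activity): number {
  const weightKg = weightLb * 0.45359237;
  const heightCm = heightIn * 2.54;
  return mifflinStJeorBmr(weightKg, heightCm, ageYears, sex) * ACTIVITY_MULTIPLIER[activity];
}

// 185 lbs, 5'11" (71 in), 32 years old, male, moderately active
console.log(Math.round(tdee(185, 71, 32, "male", "moderate"))); // 2807
```

Every step here is a fixed formula or unit conversion, which is exactly the kind of work that is better done by code than by next-token prediction.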


3 Example Prompts That Now Work Reliably

1. Fitness tracking:

I'm 175 lbs, 5'9", 28 years old, female, lightly active. 
What's my TDEE and what should I eat to lose 1 lb/week 
without losing muscle?

Claude calls calculate_tdee → gets exact TDEE → subtracts 500 calories → gives you a real number.

2. Mortgage modeling:

$450,000 home, 20% down, 6.875% rate, 30-year fixed.
What's my monthly payment and total interest paid?
Show me what happens if I pay an extra $200/month.

Claude calls calculate_mortgage twice → exact numbers, exact comparison. The kind of analysis that used to require a spreadsheet.
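The math behind a comparison like this is the standard fixed-rate amortization formula plus a month-by-month payoff loop. The sketch below is an illustration under assumptions, not the package's implementation: monthlyPayment and simulatePayoff are hypothetical names, interest is assumed to compound monthly, and taxes, insurance, and PMI are ignored.

```typescript
// Standard fixed-rate payment: P * r / (1 - (1 + r)^-n), with monthly rate r.
function monthlyPayment(principal: number, annualRatePct: number, years: number): number {
  const r = annualRatePct / 100 / 12;
  const n = years * 12;
  if (r === 0) return principal / n;
  return (principal * r) / (1 - Math.pow(1 + r, -n));
}

// Walks the loan month by month, optionally with an extra principal payment.
function simulatePayoff(
  principal: number,
  annualRatePct: number,
  years: number,
  extraPerMonth = 0,
): { months: number; totalInterest: number } {
  const r = annualRatePct / 100 / 12;
  const payment = monthlyPayment(principal, annualRatePct, years) + extraPerMonth;
  let balance = principal;
  let totalInterest = 0;
  let months = 0;
  while (balance > 0.005) {
    const interest = balance * r;
    if (payment <= interest) throw new Error("payment does not cover interest");
    totalInterest += interest;
    balance -= Math.min(payment - interest, balance); // final payment may be smaller
    months++;
  }
  return { months, totalInterest };
}

// $450,000 home with 20% down leaves a $360,000 loan.
const loan = 450_000 * 0.8;
const baseline = simulatePayoff(loan, 6.875, 30);
const withExtra = simulatePayoff(loan, 6.875, 30, 200);

console.log("Monthly payment:", monthlyPayment(loan, 6.875, 30).toFixed(2));
console.log("Baseline:", baseline.months, "months,", baseline.totalInterest.toFixed(2), "interest");
console.log("With extra $200:", withExtra.months, "months,", withExtra.totalInterest.toFixed(2), "interest");
```

Under these assumptions the base payment comes out to roughly $2,365/month, and the second run shows how many months and how much interest the extra $200 saves.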

3. Investment compounding:

I have $10,000 to invest. If I add $500/month and 
get 8% annual returns, how much will I have in 20 years?
What about 25 years?

Exact compound interest math, not approximations.
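A calculation like this reduces to the future-value formula for a lump sum plus a stream of regular deposits. The sketch below assumes monthly compounding and deposits made at the end of each month; the package may use different conventions, and futureValue is a hypothetical name.

```typescript
// Future value of an initial lump sum plus fixed end-of-month contributions,
// assuming the annual rate compounds monthly.
function futureValue(
  initial: number,
  monthlyContribution: number,
  annualRatePct: number,
  years: number,
): number {
  const r = annualRatePct / 100 / 12;
  const n = years * 12;
  if (r === 0) return initial + monthlyContribution * n;
  const growth = Math.pow(1 + r, n);
  // Lump sum grows by (1 + r)^n; deposits form an ordinary annuity.
  return initial * growth + monthlyContribution * ((growth - 1) / r);
}

for (const years of [20, 25]) {
  console.log(`${years} years:`, futureValue(10_000, 500, 8, years).toFixed(2));
}
```

Changing the compounding or deposit-timing assumption shifts the result by a noticeable amount, which is one more reason to have the convention fixed in code rather than left to the model.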


Why This Approach Works Better Than Alternatives

Why not just ask Claude to be more careful?

Prompting doesn't fix the underlying issue. The model isn't careless — it genuinely can't do deterministic arithmetic reliably. More detailed prompts just produce more detailed wrong answers.

Why not use a Python code interpreter?

Code interpreter works, but it spins up a Python environment, which is heavier than necessary for standard calculations. The MCP approach is instant — tool call returns in <50ms.

Why not use a different model?

The issue isn't the model, it's the task type. All LLMs have this problem to varying degrees. The right fix is giving the model the right tool, not switching models.


The Numbers So Far

This package has been running for a few months. Current stats:

  • 106 downloads/week (up from 86 → 94 → 106 — accelerating)
  • Available on npm: @thicket-team/mcp-calculators
  • 20+ calculators, 500+ unit tests
  • Works with Claude Desktop, Claude Code, and any MCP-compatible client

The uptick tracks closely with Claude Desktop adoption. As more people use Claude for real work, they hit the math reliability wall faster.


Try It

If you're already using Claude for any kind of quantitative work — fitness, finance, data analysis, even just checking someone else's math — the 5-minute setup is worth it.

npx -y @thicket-team/mcp-calculators

More tools and the source at thicket.sh.

If you try it and find a calculator that gives wrong results (or one that's missing), let me know in the comments. The unit tests cover a lot but real-world usage always finds edge cases.


Raj is a developer and technical writer at Thicket — an experiment in running a portfolio of utility websites autonomously with AI agents.
