DEV Community

shashank ms

Building Mathematics Tools with LLMs

Mathematics remains one of the most reliable stress tests for large language models. Unlike open-ended creative writing, math demands verifiable reasoning, exact symbolic manipulation, and the ability to maintain logical consistency across long derivations. For developers building tutoring systems, theorem provers, or engineering assistants, the challenge is not just selecting a capable model, but architecting a pipeline that combines deep reasoning with deterministic verification.

Why Math Breaks Standard LLM Pipelines

Standard chat completions often fail on mathematics because models are trained to predict likely tokens, not to guarantee that each step follows logically from the last. A single arithmetic error or sign mistake invalidates an entire proof. Long-form derivations also consume a large share of the context window, especially when you include problem statements, prior steps, and reference material. Token-based billing makes this experimentation expensive, particularly when you are running multi-turn agentic loops or feeding large LaTeX documents into the prompt.

Selecting Models for Mathematical Reasoning

Not all models handle symbolic reasoning equally. Prefer models that expose chain-of-thought or extended thinking modes before delivering a final answer.

  • DeepSeek R1 671B MoE: Built for deep reasoning and complex coding. Its extended thinking process surfaces intermediate steps that you can parse, validate, or discard.
  • Kimi K2.6: Designed for advanced reasoning, agentic coding, and vision. The 131K context window accommodates lengthy problem descriptions with diagrams.
  • Qwen 3 32B: Strong multilingual reasoning and agent workflow support, useful when your math pipeline must process non-English source material.
  • GLM 5: A 744B MoE optimized for long-horizon agentic tasks, capable of sustained reasoning across many turns without losing track of constraints.

Oxlo.ai hosts all of these models behind a single OpenAI-compatible endpoint, so you can benchmark them against your specific problem set without rewriting client code.

Architecture Pattern: Reasoning Plus Verification

A production-grade math tool should never trust raw LLM output. The most robust pattern separates generation from verification.

  1. Decomposition: The model breaks the problem into sub-steps.
  2. Generation: Each step is produced with reasoning exposed.
  3. Verification: A symbolic engine or secondary model checks each step.
  4. Repair: If verification fails, the error is fed back into the context for correction.

This loop can run for dozens of iterations. With token-based providers, each round trip bills you for the full prompt history plus new reasoning tokens. Oxlo.ai’s request-based pricing charges one flat rate per API call regardless of how many tokens precede it, which makes iterative verification loops predictable and far more affordable at scale.
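The generate, verify, and repair steps above can be sketched as a small loop. This is a minimal illustration, not a production design: `solve_with_verification`, `make_verifier`, and `scripted_model` are hypothetical names, the "model" is a scripted stand-in that ignores feedback, decomposition is not modeled, and the verifier differentiates polynomials exactly from coefficient lists in place of a full symbolic engine.

```python
from typing import Callable, List, Optional, Tuple

Poly = List[int]  # [c0, c1, c2] means c0 + c1*x + c2*x**2


def exact_derivative(p: Poly) -> Poly:
    """Differentiate a coefficient list term by term."""
    return [power * coeff for power, coeff in enumerate(p)][1:] or [0]


def make_verifier(problem: Poly) -> Callable[[Poly], Optional[str]]:
    """Build a deterministic checker that returns None on success or an error string."""
    expected = exact_derivative(problem)

    def verify(candidate: Poly) -> Optional[str]:
        if len(candidate) != len(expected):
            return f"expected {len(expected)} coefficients, got {len(candidate)}"
        for power, (got, want) in enumerate(zip(candidate, expected)):
            if got != want:
                return f"coefficient of x^{power} is wrong"
        return None

    return verify


def scripted_model(attempts: List[Poly]) -> Callable[[List[str]], Poly]:
    """Stand-in for an LLM call; a real model would condition on the feedback."""
    remaining = iter(attempts)

    def propose(feedback: List[str]) -> Poly:
        return next(remaining)

    return propose


def solve_with_verification(
    propose: Callable[[List[str]], Poly],
    verify: Callable[[Poly], Optional[str]],
    max_rounds: int = 5,
) -> Tuple[Poly, int, List[str]]:
    """Generate, verify, and repair until a candidate passes or rounds run out."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        candidate = propose(feedback)
        error = verify(candidate)
        if error is None:
            return candidate, round_no, feedback
        # Repair step: the failure is fed back for the next attempt.
        feedback.append(f"attempt {candidate} rejected: {error}")
    raise RuntimeError(f"no verified answer after {max_rounds} rounds")


problem = [0, 0, 2, 1]  # x^3 + 2x^2
propose = scripted_model([[0, 2, 3], [0, 4, 3]])  # first attempt has a wrong coefficient
answer, rounds, log = solve_with_verification(propose, make_verifier(problem))
print(answer, rounds)  # [0, 4, 3] 2
print(log)
```

Capping the loop with `max_rounds` keeps a model that never converges from running indefinitely, and returning the feedback log makes each rejected attempt auditable.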

Implementation: A Python Tool-Use Example

The following example uses the OpenAI SDK pointed at Oxlo.ai to solve algebra problems with SymPy verification. Because Oxlo.ai is fully OpenAI SDK compatible, the only change from a standard OpenAI integration is the base URL and API key.

import os
from openai import OpenAI
import sympy
from sympy.parsing.latex import parse_latex

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ.get("OXLO_API_KEY")
)

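def verify_expression(latex_expr):
    """Parse a LaTeX expression and return its simplified form as a string.

    parse_latex needs SymPy's optional ANTLR runtime (antlr4-python3-runtime).
    """
    try:
        expr = parse_latex(latex_expr)
        simplified = sympy.simplify(expr)
        return str(simplified)
    except Exception as e:
        # Return the failure as text so the model can read it and retry.
        return f"ERROR: {e}"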

tools = [
    {
        "type": "function",
        "function": {
            "name": "verify_expression",
            "description": "Simplify or verify a LaTeX math expression using SymPy",
            "parameters": {
                "type": "object",
                "properties": {
                    "latex_expr": {"type": "string"}
                },
                "required": ["latex_expr"]
            }
        }
    }
]

messages = [
    {"role": "system", "content": "You are a math assistant. Solve the problem step by step. When you produce a mathematical expression, call the verify_expression tool to confirm it is correct before continuing."},
    {"role": "user", "content": "Find the derivative of f(x) = x^3 + 2x^2 with respect to x."}
]

response = client.chat.completions.create(
    model="deepseek-r1-671b",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)

print(response.choices[0].message)
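To act on a reply that contains tool calls, the client has to run each requested function and send the results back as `tool` messages. Below is a minimal sketch of that dispatch step, assuming the standard OpenAI chat-completions shape for tool calls (`id`, `function.name`, `function.arguments` as a JSON string). `run_tool_calls` and the registry dictionary are hypothetical helpers, and the demo uses a hand-built stand-in object instead of a live API response.

```python
import json
from types import SimpleNamespace


def run_tool_calls(assistant_message, registry):
    """Execute each requested tool and return OpenAI-style tool-result messages."""
    results = []
    for call in assistant_message.tool_calls or []:
        func = registry.get(call.function.name)
        try:
            # Arguments arrive as a JSON string, not a dict.
            args = json.loads(call.function.arguments)
            if func is None:
                output = f"ERROR: unknown tool {call.function.name}"
            else:
                output = func(**args)
        except (json.JSONDecodeError, TypeError) as exc:
            # Malformed JSON or wrong argument names are reported back as text.
            output = f"ERROR: {exc}"
        results.append(
            {"role": "tool", "tool_call_id": call.id, "content": str(output)}
        )
    return results


# Stand-in for response.choices[0].message, built by hand for illustration.
fake_message = SimpleNamespace(
    tool_calls=[
        SimpleNamespace(
            id="call_1",
            function=SimpleNamespace(
                name="verify_expression",
                arguments='{"latex_expr": "3x^2 + 4x"}',
            ),
        )
    ]
)

# A trivial stub replaces the real SymPy-backed function in this demo.
registry = {"verify_expression": lambda latex_expr: f"received {latex_expr}"}
print(run_tool_calls(fake_message, registry))
```

In a live loop, you would append the assistant message to `messages`, extend `messages` with the returned tool results, and call `client.chat.completions.create` again until the reply contains no tool calls.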

In this pattern, the model generates a step, the tool checks it deterministically, and the result goes back into the conversation before the next step. The snippet above stops after the first response; a full solver repeats the request until the model returns a final answer with no pending tool calls. If you extend this to formal theorem proving or lengthy geometric derivations, the context length grows quickly. On token-based platforms, that growth directly inflates your bill. With Oxlo.ai's per-request pricing, it does not.
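Simplifying an expression confirms that it parses, but not that it answers the question. A stricter check computes the result independently and compares it with the model's claim. The sketch below does this for the derivative example; `is_derivative_correct` is a hypothetical helper, and it takes plain SymPy syntax instead of LaTeX to avoid the extra parser dependency.

```python
import sympy


def is_derivative_correct(function_src: str, claimed_src: str, var: str = "x") -> bool:
    """Return True if claimed_src equals d/dvar of function_src.

    sympify evaluates its input, so only pass strings from a trusted or
    sanitized source.
    """
    x = sympy.Symbol(var)
    f = sympy.sympify(function_src)
    claimed = sympy.sympify(claimed_src)
    # Two expressions are equal when their difference simplifies to zero.
    return sympy.simplify(sympy.diff(f, x) - claimed) == 0


print(is_derivative_correct("x**3 + 2*x**2", "3*x**2 + 4*x"))  # True
print(is_derivative_correct("x**3 + 2*x**2", "3*x**2 + 2*x"))  # False
```

Comparing the difference against zero accepts any algebraically equivalent form of the answer, so a factored result such as `x*(3*x + 4)` also passes.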

Managing Cost at Scale

Mathematics workloads are unusually expensive for LLM infrastructure. A single advanced problem can require thousands of tokens in context, multiple function calls, and several correction passes. When pricing is tied to token volume, costs grow faster than problem difficulty: each extra round re-sends the whole conversation so far, so total billed prompt tokens grow roughly quadratically with the number of rounds.
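A quick calculation shows how prompt tokens accumulate in a multi-round loop. The figures below (a 2,000-token starting prompt that grows by 500 tokens per round) are illustrative assumptions, not measurements, and the sketch ignores output tokens and any prompt caching a provider might offer.

```python
def cumulative_prompt_tokens(base_tokens: int, tokens_added_per_round: int, rounds: int) -> int:
    """Total prompt tokens sent when every round re-sends the full history."""
    total = 0
    context = base_tokens
    for _ in range(rounds):
        total += context                   # this round's prompt is the whole history
        context += tokens_added_per_round  # the history grows before the next round
    return total


for rounds in (10, 20, 40):
    print(rounds, cumulative_prompt_tokens(2000, 500, rounds))
# 10 42500
# 20 135000
# 40 470000
```

Doubling the rounds from 20 to 40 more than triples the prompt tokens sent, while the number of API requests only doubles.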

Oxlo.ai uses request-based pricing: one flat cost per API request regardless of prompt length. For long-context and agentic math pipelines, this can be 10-100x cheaper than token-based alternatives. There are no cold starts on popular models, so iterative solvers do not stall between turns. You can explore the exact pricing structure at https://oxlo.ai/pricing.

Putting It Together

Building reliable mathematics tools requires more than a capable model. You need a reasoning layer, a deterministic verification layer, and an infrastructure layer that does not punish you for long contexts or high iteration counts.

Oxlo.ai provides access to reasoning models such as DeepSeek R1, Kimi K2.6, and GLM 5 through a fully OpenAI-compatible API with flat per-request pricing. Whether you are prototyping a calculus tutor or deploying a formal verification agent, the combination of advanced reasoning models and predictable costs makes Oxlo.ai a practical foundation for mathematical AI tools.
