
Sarah Lindauer for Stream

Originally published at getstream.io

Build an AI Math & Physics Agent with DeepSeek v3.2

DeepSeek recently released a powerful new model, DeepSeek-V3.2, that's now instantly accessible via OpenRouter. In under 5 minutes, you can turn it into a real-time, voice-enabled math and physics agent that not only solves problems but also explains its reasoning out loud.

DeepSeek's latest open-source reasoning and agent-AI model, V3.2, leverages the new DeepSeek Sparse Attention (DSA). This fine-grained mechanism boosts training and inference efficiency for long-context scenarios while delivering output quality on par with DeepSeek-V3.1-Terminus. It's gaining momentum for its superior reasoning and agentic tool-use capabilities.

In this demo, the agent solves a vector addition problem using the law of cosines, calculates the magnitude (≈15.26 units), and then verbally explains every step when asked "How did you get that?" — all in natural conversation.

Here's exactly how to build the same agent yourself.

What You'll Build

Outline of the build for the AI math and physics agent with DeepSeek v3.2

  • A real-time voice AI tutor specialized in Math & Physics

  • Powered by DeepSeek-V3.2 (via OpenRouter)

  • Text-to-speech and speech-to-text → ElevenLabs

  • Real-time audio/video transport → Stream

  • Turn detection → Smart-Turn

  • Built with the open-source Vision Agents framework

Requirements (API Keys)

You'll need API keys from:

  • Stream (for video calls & WebRTC)

  • OpenRouter (hosts DeepSeek-V3.2)

  • ElevenLabs (STT so you can speak to the agent, and TTS so you can hear its replies)
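Before launching the agent, it can save a frustrating debugging session to confirm all four keys are actually set. Here's a small sanity-check helper you can drop into a scratch script — the `missing_keys` function is our own convenience, not part of Vision Agents:

```python
import os

# Keys the agent needs at runtime. EXAMPLE_BASE_URL (used by the demo
# page in Step 3) is optional, so it's not listed here.
REQUIRED_KEYS = [
    "STREAM_API_KEY",
    "STREAM_API_SECRET",
    "OPENROUTER_API_KEY",
    "ELEVENLABS_API_KEY",
]

def missing_keys(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [k for k in REQUIRED_KEYS if not env.get(k)]
```

An empty list from `missing_keys()` means you're ready to run the agent.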

Step 1: Set Up the Project

# Create a new project with uv (highly recommended) or pip
uv init deepseek-physics-tutor
cd deepseek-physics-tutor

# Install Vision Agents + required plugins
uv add vision-agents
uv add "vision-agents[getstream,openrouter,elevenlabs,smart-turn]"

Step 2: Full Working Code

Replace the content of the uv project's main.py with this:

"""
DeepSeek V3.2 Maths and Physics Tutor

This example demonstrates how to use the DeepSeek V3.2 model with the OpenRouter plugin with a Vision Agent.

OpenRouter provides access to multiple LLM providers through a unified API. The DeepSeek V3.2 model is a powerful LLM that is able to solve Maths and Physics problems based on what the user shows you through their camera feed.

Set OPENROUTER_API_KEY environment variables before running.
"""

import  asyncio
import  logging

from  dotenv  import  load_dotenv

from  vision_agents.core  import  User,  Agent,  cli
from  vision_agents.core.agents  import  AgentLauncher
from  vision_agents.plugins  import  (
    openrouter,
    getstream,
    elevenlabs,
    smart_turn,
)

logger  =  logging.getLogger(__name__)

load_dotenv()

async  def  create_agent(**kwargs)  ->  Agent:
    """Create the agent with OpenRouter LLM."""
    #model = "deepseek/deepseek-v3.2"  # Can also use other models like anthropic/claude-3-opus/gemini
    model  =  "deepseek/deepseek-v3.2-speciale"

    # Determine personality based on model
    if  "deepseek"  in  model.lower():
        personality  =  "Talk like a Maths and Physics tutor."
    elif  "anthropic"  in  model.lower():
        personality  =  "Talk like a robot."
    elif  "openai"  in  model.lower()  or  "gpt"  in  model.lower():
        personality  =  "Talk like a pirate."
    elif  "gemini"  in  model.lower():
        personality  =  "Talk like a cowboy."
    elif  "x-ai"  in  model.lower():
        personality  =  "Talk like a 1920s Chicago mobster."
    else:
        personality  =  "Talk casually."

    agent  =  Agent(
        edge=getstream.Edge(),
        agent_user=User(name="OpenRouter AI",  id="agent"),
        instructions=f"""
        You are an expert in Maths and Physics. You help users solve Maths and Physics problems based on what they show you through their camera feed. Always provide concise and clear instructions, and explain the step-by-step process to the user so they can understand how you arrive at the final answer.  
        {personality}
        """,
        llm=openrouter.LLM(model=model),
        tts=elevenlabs.TTS(),
        stt=elevenlabs.STT(),
        turn_detection=smart_turn.TurnDetection(
            pre_speech_buffer_ms=2000,  speech_probability_threshold=0.9
        ),
    )

    return  agent

async  def  join_call(agent:  Agent,  call_type:  str,  call_id:  str,  **kwargs)  ->  None:
    """Join the call and start the agent."""
    # Ensure the agent user is created
    await  agent.create_user()
    # Create a call
    call  =  await  agent.create_call(call_type,  call_id)

    logger.info("🤖 Starting OpenRouter Agent...")

    # Have the agent join the call/room
    with  await  agent.join(call):
        logger.info("Joining call")
        logger.info("LLM ready")

        # Open demo page for the user to join the call
        await  agent.edge.open_demo(call)

        # Wait until the call ends (don't terminate early)
        await  agent.finish()

if  __name__  ==  "__main__":

    cli(AgentLauncher(create_agent=create_agent,  join_call=join_call))
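Since the personality branching above keys off the model id string, swapping models is a one-line change. If you want to experiment without editing main.py each time, one hypothetical tweak (not part of the original example) is to let an environment variable choose the model:

```python
import os

# Fallback model id, matching the commented-out default in main.py
DEFAULT_MODEL = "deepseek/deepseek-v3.2"

def resolve_model(env=None):
    """Return the model id from OPENROUTER_MODEL, falling back to the default."""
    env = os.environ if env is None else env
    return env.get("OPENROUTER_MODEL", "").strip() or DEFAULT_MODEL
```

Then `create_agent` can call `resolve_model()` instead of hardcoding the id, and `OPENROUTER_MODEL=anthropic/claude-3-opus uv run main.py` gives you the robot persona.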

Step 3: Run It

Run the following commands in your Terminal to store the API credentials in your working environment.

export OPENROUTER_API_KEY=sk-...
export ELEVENLABS_API_KEY=...
export STREAM_API_KEY=...
export STREAM_API_SECRET=...
export EXAMPLE_BASE_URL=https://pronto-staging.getstream.io

Lastly, execute the Python script with uv run main.py.
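Because main.py calls load_dotenv(), you can also skip the exports and put the same credentials in a .env file next to main.py — handy if you restart your terminal often:

```
# .env — picked up by load_dotenv() in main.py
OPENROUTER_API_KEY=sk-...
ELEVENLABS_API_KEY=...
STREAM_API_KEY=...
STREAM_API_SECRET=...
EXAMPLE_BASE_URL=https://pronto-staging.getstream.io
```

Just remember to add .env to your .gitignore so the keys never end up in version control.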

A browser tab will open automatically with a simple video call interface, which automatically connects you to the voice agent. You can now enable your microphone and start talking!

Example interaction from the demo:

You: "Two forces of 8 N and 13 N act on an object at an angle of 25 degrees to each other. What is the magnitude of the resultant force?"
Agent: "The magnitude of the resultant force is approximately 15.26 units."
You: "How did you get that?"
Agent: "I used the law of cosines. First, the angle between the vectors is 180° - 25° = 155°. Then..."

Why We Love This Combination  

This stack is powerful yet simple: Vision Agents handles all the heavy lifting, including turn detection, real-time streaming, interruptions, and orchestration with industry-leading voice AI services and LLMs.

OpenRouter gives you instant, no-wait-list access to the latest DeepSeek-V3.2 models (and hundreds more) with unified billing and routing.

Stream delivers mature WebRTC infrastructure that keeps end-to-end latency under 500 ms even on consumer connections, and the entire agent (except the API calls) is fully open-source and runs locally on your machine. 

Links & Resources

Give it a try and play around with the type of tutor: calculus, mechanics, electricity, or something completely different. 🤓

Happy coding!
