DEV Community

john lee

Building an HSK Speaking Test AI: Real-time Tone Grading with Gemini

I built a free Mandarin speaking assessment tool that grades tone + grammar in real time. Here's the engineering behind it.

The Problem

HSK (Chinese proficiency test) has a speaking component (HSKK), but most learners can't self-assess their level. Online tutors are expensive. Generic AI conversation tools don't grade tones.

So I built ToneTutor: a 3-minute spoken-HSK test that estimates your speaking level and identifies weak points.

The Tech Stack

Frontend:

  • Web Audio API (record user voice → PCM → LINEAR16)
  • React + TypeScript (real-time transcript display)

Backend:

  • FastAPI (Python) on Google Cloud Run
  • Gemini 2.5 Flash (real-time conversation + transcript grading)
  • Firestore (user sessions + results)

The Challenge:

The browser's MediaRecorder typically records as WebM (the Web Audio API itself only exposes raw samples). Gemini expects LINEAR16 (WAV). iOS Safari doesn't support WebM. So:

  1. Transcode WebM → PCM in browser (Web Audio context)
  2. Send raw PCM bytes to backend
  3. Backend wraps PCM in WAV header → sends to Gemini Speech-to-Text
  4. Gemini analyzes transcript + provides HSK level estimate
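
Steps 1 to 3 can be sketched roughly as follows. This is a minimal illustration, not ToneTutor's actual code: the function names `float_to_linear16` and `pcm_to_wav` are hypothetical, and the 16 kHz mono format is an assumption. In the real app step 1 runs in the browser; it is shown in Python here only to keep the examples in one language, since the sample arithmetic is the same.

```python
import io
import struct
import wave
from typing import Iterable


def float_to_linear16(samples: Iterable[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian signed 16-bit PCM."""
    out = bytearray()
    for sample in samples:
        # Clamp first so values that overshoot the range do not wrap around.
        clamped = max(-1.0, min(1.0, sample))
        out += struct.pack("<h", int(round(clamped * 32767)))
    return bytes(out)


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit PCM in a WAV container (sample rate and mono are assumptions)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Using the standard-library `wave` module avoids hand-writing the RIFF header, and the sample rate passed to `pcm_to_wav` has to match whatever rate the browser actually captured at.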

The Grading Loop


python
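async def grade_session(transcript: str):
    # Embed the learner's transcript so the model has something to grade.
    prompt = f"""
    Rate this Mandarin response on HSK 1-6 scale.
    Assess: tone accuracy, grammar, vocabulary range.
    Provide: level estimate + weak points.

    Transcript:
    {transcript}
    """
    # `gemini` is the app's async model client. No streaming here,
    # because the whole reply is parsed in one pass.
    response = await gemini.generate_content(prompt)
    return parse_hsk_level(response)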
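
The post doesn't show `parse_hsk_level`, so here is one way it might look. This is a sketch under assumptions: it takes the reply as plain text (not a response object), expects the model to mention the level as "HSK" followed by a digit, and expects weak points as dash or bullet lines. The `HskResult` type is hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HskResult:
    level: Optional[int]  # None when no level could be found in the reply
    weak_points: List[str] = field(default_factory=list)


def parse_hsk_level(text: str) -> HskResult:
    """Pull an HSK level (1-6) and bulleted weak points out of free-form model text."""
    match = re.search(r"HSK\s*([1-6])", text, re.IGNORECASE)
    level = int(match.group(1)) if match else None
    weak_points = [
        line.strip().lstrip("-• ").strip()
        for line in text.splitlines()
        if line.strip().startswith(("-", "•"))
    ]
    return HskResult(level=level, weak_points=weak_points)
```

Free-form parsing like this is fragile: if the model echoes "HSK 1-6 scale" from the prompt, the regex would pick up level 1. Asking the model for a structured reply (JSON, for example) would likely be more robust.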

Results

  • 3-min test
  • Real-time feedback
  • Shareable HSK score card
  • Free (limited sessions)

Open source coming soon. I built this because I'm a native speaker and voice actor who was frustrated with generic tools.

Try it: tonetutor.tefusiang.com (free for 3 sessions)

Curious about the speech-to-text pipeline or tone grading logic? Ask below.