Building an HSK Speaking Test AI: Real-time Tone Grading with Gemini

#ai #gemini #showdev #webdev

I built a free Mandarin speaking assessment tool that grades tone + grammar in real time. Here's the engineering behind it.

The Problem

HSK, China's standard Chinese proficiency test, has a separate speaking exam (HSKK), but most learners can't self-assess their speaking level. Online tutors are expensive, and generic AI conversation tools don't grade tones.

So I built ToneTutor: a 3-minute spoken-HSK test that estimates your speaking level and identifies weak points.

The Tech Stack

Frontend:

- Web Audio API (capture microphone audio → PCM → LINEAR16)
- React + TypeScript (real-time transcript display)
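Web Audio hands you microphone samples as 32-bit floats in [-1, 1], at the AudioContext's sample rate (commonly 44.1 or 48 kHz). Getting to LINEAR16 means resampling and then scaling each sample to a signed 16-bit integer. The arithmetic is sketched here in Python to match the backend code, although in the app it would run in TypeScript in the browser. The 16 kHz target rate and both function names are assumptions for illustration, not ToneTutor's actual code.

```python
import array
import sys


def resample_linear(samples: list[float], src_rate: int, dst_rate: int) -> list[float]:
    """Hypothetical helper: naive linear-interpolation resampler (no anti-aliasing filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    ratio = src_rate / dst_rate
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * ratio  # fractional index into the source signal
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out


def float32_to_linear16(samples: list[float]) -> bytes:
    """Hypothetical helper: clamp to [-1, 1], scale to little-endian signed 16-bit PCM."""
    pcm = array.array("h")
    for s in samples:
        s = max(-1.0, min(1.0, s))
        # -1.0 maps to -32768 and +1.0 to 32767, covering the full int16 range.
        pcm.append(int(s * 32768) if s < 0 else int(s * 32767))
    if sys.byteorder == "big":
        pcm.byteswap()  # LINEAR16 is little-endian regardless of the host
    return pcm.tobytes()
```

In a browser pipeline the same two steps would run on each Float32Array chunk (for example from an AudioWorklet) before the bytes are posted to the backend. Plain linear interpolation is the simplest option; a low-pass filter before downsampling would reduce aliasing.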

Backend:

- FastAPI (Python) on Google Cloud Run
- Gemini 2.5 Flash (real-time conversation + transcript grading)
- Firestore (user sessions + results)

The Challenge:

The browser's MediaRecorder API typically records WebM, iOS Safari doesn't support WebM, and Gemini expects LINEAR16 audio in a WAV container rather than WebM. So:

- Decode the recording to raw PCM in the browser (Web Audio AudioContext)
- Send the raw PCM bytes to the backend
- Backend wraps the PCM in a WAV header → sends it to Gemini for speech-to-text
- Gemini analyzes the transcript and returns an HSK level estimate
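Wrapping raw PCM in a WAV header needs nothing beyond Python's standard-library `wave` module. A minimal sketch, assuming 16 kHz mono LINEAR16 input; `pcm_to_wav` is a hypothetical name, not ToneTutor's code:

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16_000, channels: int = 1) -> bytes:
    """Wrap raw LINEAR16 (little-endian 16-bit) PCM in a standard WAV header."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)  # wave fills in the RIFF and data size fields
    # wave.open() doesn't close a file object it was handed, so buf is still readable.
    return buf.getvalue()
```

The result can be sent to Gemini with the `audio/wav` MIME type. For scale: at the assumed 16 kHz mono, a full 3-minute session is 16,000 samples/s × 2 bytes × 180 s = 5,760,000 bytes of PCM, plus the 44-byte header.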

The Grading Loop


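```python
from google import genai

client = genai.Client()  # reads the API key from the GEMINI_API_KEY env var

GRADING_PROMPT = """Rate this Mandarin response on the HSK 1-6 scale.
Assess: tone accuracy, grammar, vocabulary range.
Provide: level estimate + weak points.
Transcript: {transcript}"""

async def grade_session(transcript: str):
    # Interpolate the transcript; without it the model has nothing to grade.
    prompt = GRADING_PROMPT.format(transcript=transcript)
    response = await client.aio.models.generate_content(
        model="gemini-2.5-flash", contents=prompt
    )
    return parse_hsk_level(response.text)  # parse_hsk_level: helper not shown here
```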
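`parse_hsk_level` isn't shown in the post. A hypothetical version that scans Gemini's free-text reply for the first concrete level might look like this; the regex deliberately skips the "HSK 1-6" range that the prompt itself mentions, since the model may echo it.

```python
import re
from typing import Optional

# Matches "HSK 3" or "hsk3", but not the range "HSK 1-6" or a number like "HSK 12".
_LEVEL_RE = re.compile(r"\bHSK\s*([1-6])(?!\d|\s*-\s*\d)", re.IGNORECASE)


def parse_hsk_level(text: str) -> Optional[int]:
    """Hypothetical helper: return the first concrete HSK level (1-6) in text, else None."""
    match = _LEVEL_RE.search(text)
    return int(match.group(1)) if match else None
```

Free-text parsing is brittle. Asking Gemini for structured output instead (the google-genai SDK's `GenerateContentConfig` accepts `response_mime_type="application/json"` together with a `response_schema`) would remove the need for a regex.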

Results

- 3-min test
- Real-time feedback
- Shareable HSK score card
- Free (limited sessions)

Open source coming soon. Built because I'm a native speaker + voice actor frustrated with generic tools.

Try it: tonetutor.tefusiang.com (free for 3 sessions)

Curious about the speech-to-text pipeline or tone grading logic? Ask below.