samiwise.app — live now
GMAT prep tutors charge $150–200 per hour. For a 3-month prep period, that's $5,000–10,000. Most people preparing for an MBA simply can't afford that — or can't find a good tutor available at 11pm when they finally have time to study.
So I built SamiWISE — a voice AI GMAT tutor that remembers every session, adapts to your weak spots, and explains material in real time using RAG over official GMAT materials. This is the story of how it was built, what I learned, and the technical decisions that made it work.
> The three things no competitor combines: voice + memory + real GMAT content
The Core Problem I Was Solving
Every GMAT prep tool I looked at had the same fundamental issue: they start from scratch every single session. You explain your weak spots again. You get generic explanations that don't account for what confused you last Tuesday. There's no continuity.
A good human tutor doesn't do this. They remember that you always mess up Data Sufficiency with inequalities. They know that analogies work better for you than abstract explanations. They track your trajectory over weeks.
I wanted to build that — but accessible to everyone, available 24/7, at $49/month.
The Architecture
The system has four main layers:
```
User (voice)
  → Deepgram STT (~1s)
  → Orchestrator Agent — Groq llama-3.3-70b (~200ms routing)
  → Specialist Agent — Claude Sonnet + RAG from Pinecone (~3–5s)
  → ElevenLabs TTS (~1s)
  → User hears response
```
Total latency: 5–8 seconds. Not perfect, but feels natural — like a real tutor pausing to think.
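The flow above can be sketched as a sequence of timed stages. This is illustrative only: the stage bodies here are placeholders, where in production each one would be a network call (Deepgram, Groq, Claude + Pinecone, ElevenLabs).

```typescript
// Sketch: the voice pipeline as sequential timed stages. Stage bodies are
// stand-ins for the real STT / routing / tutoring / TTS calls.
type Stage = { name: string; run: (input: string) => Promise<string> };

async function runPipeline(stages: Stage[], input: string) {
  const timings: Record<string, number> = {};
  let current = input;
  for (const stage of stages) {
    const start = Date.now();
    current = await stage.run(current);       // each stage feeds the next
    timings[stage.name] = Date.now() - start; // per-stage latency in ms
  }
  return { output: current, timings };
}
```

Because the stages run strictly in sequence, total latency is just the sum of the parts, which is why shaving the routing step down to ~200ms matters.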
The Agent System
The most interesting architectural decision was the multi-agent routing system.
Instead of one monolithic AI tutor, there are five specialist agents and an invisible orchestrator:
| Agent | Specialization |
|---|---|
| `quantitative` | Problem Solving + Data Sufficiency |
| `verbal` | Critical Reasoning + Reading Comprehension |
| `data_insights` | Table Analysis, MSR, Graphics Interpretation, TPA |
| `strategy` | Timing, exam psychology, study planning |
| `orchestrator` | Routes messages — user never sees this |
The orchestrator runs on Groq (llama-3.3-70b) because it needs to be fast — 200ms routing decisions. Specialist agents run on Claude Sonnet because they need to be smart.
Routing prompt returns structured JSON:
```json
{
  "route": "quantitative",
  "confidence": 0.94,
  "detected_topic": "data sufficiency with inequalities",
  "difficulty": "hard",
  "notes": "user has struggled with DS inequalities in past 3 sessions"
}
```
The user always hears the same voice — Sam. Transitions between agents are completely invisible.
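Before handing off, the router's JSON needs validating, since an LLM can return malformed output or a route name that doesn't exist. A minimal sketch, with the caveat that the low-confidence fallback to the `strategy` agent is my assumption, not necessarily SamiWISE's actual policy:

```typescript
// Sketch: validate the orchestrator's routing JSON before dispatching to a
// specialist. Falls back to the strategy agent on bad or uncertain output
// (fallback policy is an assumption for illustration).
const SPECIALISTS = ["quantitative", "verbal", "data_insights", "strategy"] as const;
type Specialist = (typeof SPECIALISTS)[number];

interface RouteDecision {
  route: Specialist;
  confidence: number;
  detected_topic: string;
  difficulty: string;
  notes: string;
}

function parseRoute(raw: string, minConfidence = 0.5): Specialist {
  try {
    const decision = JSON.parse(raw) as RouteDecision;
    if (SPECIALISTS.includes(decision.route) && decision.confidence >= minConfidence) {
      return decision.route;
    }
  } catch {
    // malformed JSON falls through to the default
  }
  return "strategy";
}
```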
> Every other GMAT tool treats you like a stranger on every visit. Sam carries your entire learning history.
The Memory System — The Hard Part
This is where most AI tutors fail. Building long-term memory that actually improves tutoring quality took the most iteration.
After every session, a Memory Agent runs in the background. It reads the full session transcript and extracts a structured learner profile:
```typescript
interface GmatLearnerProfile {
  weak_topics: string[]
  strong_topics: string[]
  effective_techniques: string[]   // what explanation styles worked
  ineffective_approaches: string[] // what didn't land
  insight_moments: string[]        // "aha" phrases that clicked
  common_error_patterns: string[]  // e.g. "misreads DS question stem"
  learning_style: string
  next_session_plan: string
  score_trajectory: string
  time_pressure_notes: string
}
```
This profile gets stored in Supabase as a JSON field on the User model. At the start of every session, the full profile is injected into the specialist agent's system prompt.
The result: Sam says things like "Last week you struggled with probability in DS — let's approach this one differently than before" without you having to explain anything.
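The injection step itself is simple: fold the stored profile fields into the specialist's system prompt at session start. A sketch using a subset of the profile fields, with illustrative prompt wording:

```typescript
// Sketch: build the specialist's system prompt from the stored learner
// profile. Field names follow the GmatLearnerProfile shape; the prompt
// phrasing is illustrative, not the production prompt.
interface ProfileSlice {
  weak_topics: string[];
  effective_techniques: string[];
  next_session_plan: string;
}

function buildSystemPrompt(base: string, profile: ProfileSlice): string {
  return [
    base,
    `Known weak topics: ${profile.weak_topics.join(", ")}`,
    `Explanation styles that work for this learner: ${profile.effective_techniques.join(", ")}`,
    `Plan for this session: ${profile.next_session_plan}`,
  ].join("\n");
}
```

Because the profile is structured insights rather than raw transcripts, the injected context stays a few hundred tokens instead of tens of thousands, which is exactly the lesson in point 2 below.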
RAG — What I Indexed and Why
The knowledge base lives in Pinecone with 7 namespaces:
- `gmat-quant` → Quantitative problems and methods
- `gmat-verbal` → Verbal problems and methods
- `gmat-di` → Data Insights problems
- `gmat-strategy` → Strategies, timing, test psychology
- `gmat-focus` → GMAT Focus Edition specific content
- `gmat-errors` → Common error patterns
Free sources I used:
- `deepmind/aqua_rat` — 97,467 GMAT/GRE algebra problems with rationales (Apache 2.0)
- `allenai/math_qa` — math word problems with annotated formulas (Apache 2.0)
- `mister-teddy/gmat-database` — DS, PS, CR, SC questions in JSON (MIT)
- ReClor paper — 17 CR question types with examples (research)
- Manhattan Review free PDFs — strategy guides openly distributed
The RAG pipeline uses @xenova/transformers for embeddings (runs locally, no API cost) and retrieves top-5 chunks with reranking before passing to the specialist agent.
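The ranking step at the heart of that retrieval can be shown without the external services. This local sketch ranks chunks by cosine similarity and keeps the top k; in production the embeddings come from @xenova/transformers and the nearest-neighbor search runs inside Pinecone rather than in application code:

```typescript
// Sketch: rank stored chunks by cosine similarity to a query embedding and
// keep the top k. Stands in for the Pinecone query + rerank step.
interface Chunk { id: string; embedding: number[]; text: string }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topK(query: number[], chunks: Chunk[], k = 5): Chunk[] {
  return [...chunks]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k); // most similar first
}
```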
The Tech Stack
Frontend: Next.js 14 + TypeScript + Tailwind CSS
Auth: Supabase Auth
Database: Supabase PostgreSQL + Prisma 6
Vector DB: Pinecone (7 namespaces)
LLM Router: Groq (llama-3.3-70b) — fast, cheap
LLM Agents: Anthropic Claude Sonnet — smart, consistent
STT: Deepgram (Whisper)
TTS: ElevenLabs
Memory: Custom Memory Agent → Supabase JSON
Payments: Paddle (Merchant of Record, handles US tax)
Deploy: Vercel (frontend) + Railway (agents backend)
Why split Vercel + Railway?
Vercel has an 800-second serverless function limit. A 30-minute voice tutoring session would time out. Railway runs persistent containers — no limits, no cold starts for agents.
Practice Mode with FSRS
Beyond voice sessions, I built a visual practice mode where users can work through GMAT questions in exam format.
The interesting part: I implemented FSRS (Free Spaced Repetition Scheduler) — the same algorithm used by Anki. After each answer, the system records:
- Was it correct?
- How long did it take?
- What was the difficulty?
Then it schedules the next review using an exponential forgetting curve. Questions you answered wrong come back sooner. Questions you mastered disappear for weeks.
This means the practice queue automatically prioritizes your weak spots without you having to manage anything.
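The core of the scheduling can be sketched in a few lines. Note this is a deliberately simplified model in the spirit of FSRS, not the real algorithm: actual FSRS fits a per-card stability and difficulty model from review history, while this sketch only captures the shape of the behavior described above.

```typescript
// Simplified spaced-repetition interval update: correct answers grow the
// review interval multiplicatively (mastered questions drift out for weeks),
// wrong answers reset it (they come back tomorrow). The growth factor is an
// illustrative constant, not a fitted FSRS parameter.
interface CardState { stabilityDays: number }

function nextInterval(state: CardState, correct: boolean, growth = 2.5): CardState {
  return correct
    ? { stabilityDays: state.stabilityDays * growth }
    : { stabilityDays: 1 };
}
```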
The Study Journal
Every session automatically updates a daily journal entry:
```typescript
interface StudyJournalEntry {
  date: Date
  totalMinutes: number
  questionsTotal: number
  accuracy: number
  topicsCovered: string[]
  errorTypes: Record<string, number>
  samInsight: string   // AI-generated daily summary
  milestones: string[] // "100 questions solved", "5 hour week"
  streakDay: number
}
```
The streak counter turned out to be unexpectedly powerful for retention — users don't want to break their streak. Same psychology as Duolingo, but for GMAT prep.
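The streak itself falls out of the journal entries: count consecutive days backwards from the most recent entry. A minimal sketch (the UTC day-bucketing is my simplification; a production version would respect the user's timezone):

```typescript
// Sketch: derive the current streak from journal entry dates by counting
// consecutive UTC days back from the most recent entry.
function currentStreak(dates: Date[]): number {
  // de-duplicate to one bucket per day, newest first
  const days = [...new Set(dates.map(d => Math.floor(d.getTime() / 86_400_000)))]
    .sort((a, b) => b - a);
  let streak = days.length > 0 ? 1 : 0;
  for (let i = 1; i < days.length; i++) {
    if (days[i - 1] - days[i] === 1) streak++; // previous calendar day
    else break;                                // gap breaks the streak
  }
  return streak;
}
```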
What Doesn't Work Yet
Being honest about where things stand:
- Voice pipeline not live yet — Deepgram + ElevenLabs keys configured but need production testing with real users
- RAG not indexed — scripts are ready, Pinecone account set up, but haven't pushed the data yet
- No real users — launching next week, zero feedback so far
The architecture is built. The UI works. The agents respond correctly. Next step is connecting all the APIs and getting real people to use it.
Lessons Learned
1. Multi-agent routing is worth the complexity.
A single "GMAT tutor" prompt produces mediocre results across all topics. Specialist agents with deep domain prompts are significantly better. The routing overhead is minimal.
2. Memory quality matters more than memory quantity.
I originally tried to store everything — full transcripts, every message. The prompts became too long and performance degraded. The Memory Agent that extracts structured insights (not raw content) works much better.
3. Split your infrastructure early.
I almost deployed everything to Vercel. The 800-second limit would have killed voice sessions. Railway for long-running processes saved the architecture.
4. Free datasets are better than I expected.
The deepmind/aqua_rat dataset has 97,000 high-quality GMAT-style problems with step-by-step rationales. Apache 2.0 license. This single dataset provides more practice material than most paid prep courses.
5. Paddle for payments if you're targeting the US market.
They handle sales tax across all 50 states automatically. As a Merchant of Record, they handle chargebacks and disputes. The 5% + $0.50 fee is worth it for the peace of mind.
What's Next
- Get first 10 beta users from r/GMAT and GMAT Club
- Connect production APIs (Deepgram, ElevenLabs, Pinecone)
- Run the RAG indexing scripts
- Collect feedback on voice experience quality
If you're interested in trying it or have feedback on the architecture, I'd love to hear from you. The product is live at samiwise.app — 7-day free trial, no credit card required.
Built with Next.js, Claude Sonnet, Groq, Pinecone, Deepgram, ElevenLabs, Supabase, Paddle, and Railway. Full stack TypeScript.