Max Quimby

Posted on • Originally published at agentconn.com

Khanmigo Was 'a Non-Event.' What's Next for AI Tutors

In April 2026, Sal Khan sat down with Chalkbeat and said the quiet part out loud: for most students, Khanmigo "was a non-event." Two and a half years after his TED talk promising that "every child will have an AI tutor that is infinitely patient and infinitely knowledgeable," the founder of the most distribution-ready AI tutor in the United States (700K+ users, Microsoft-subsidized, free for teachers, integrated with the Khan Academy library) admitted that the students who had access "just didn't use it much."

Chalkbeat April 2026 headline: Sal Khan reflects on AI in schools and Khanmigo, where the Khan Academy founder admits that for most students the chatbot tutor was a non-event

Read the Chalkbeat interview →

That admission did not arrive in a vacuum. Dan Meyer published "RIP Khanmigo & Edtech Industry Dreams of AI Tutors" a few days later and turned the disappointment into an obituary for an entire product category. Quizlet had already shut down Q-Chat in June 2025: the first ChatGPT-built tutor at a major edtech brand, killed inside two years because the per-user inference costs ate the margins. Stanford's CEPA documented a 60% engagement drop after three weeks without teacher facilitation. Khanmigo's own Common Sense Media review is generous; the IBL News audit of its math is not. The chatbot insisted 6 × 2 wasn't 12, marked 10,332 ÷ 4 wrong three times before agreeing, and miscalculated 343 − 17.

Dan Meyer Substack: RIP Khanmigo and Edtech Industry Dreams of AI Tutors, April 2026, the most-cited skeptic case in the field

Read 'RIP Khanmigo' on Substack →

And yet the AI-tutoring market is still ballooning. MagicSchool closed a $45M Series B in February 2025 on a base of 6M educators. Brisk Teaching raised a $15M Series A in March 2025 with 1M educators and 20% of US K-12 teachers running its Chrome extension. SchoolAI raised a $25M Series A in April 2025. Flint (Claude 4.5 Sonnet-powered, 500K users) raised a $15M Series A in November 2025. Synthesis Tutor crossed $10M revenue in 2025 at 4.5× year-over-year growth. The field has $2.3B in revenue, $4.2B in 2025 venture capital, and 2,800+ AI-education startups.

Read together, this is not a story of AI tutoring failing. It is a story of AI tutoring bifurcating. The teacher-facing layer won 2024–25. The student-facing layer, the one Sal Khan was talking about, is the open question.

The teacher tools won. The student tools stalled.

The pattern is striking once you draw the line. Every name in the "won" column sells, primarily, to the adult in the room. MagicSchool's pitch is lesson plans, rubrics, and IEP scaffolds: 80-something teacher productivity tools, with a generic student-facing chatbot bolted on the side as "Tutor Me with AI." Brisk is a Chrome extension that lives in the teacher's Docs and Gmail. SchoolAI's differentiator is real-time monitoring of student chats by the teacher. These are not student-tutoring products. They are teacher-orchestration products that include a student surface as a procurement justification.

The companies that did try to build the real thing, a student-facing AI tutor without a human in the loop, have had a much harder year. Khanmigo's engagement numbers shipped with the asterisk that 95% of study participants had to be excluded for the strongest results to appear, what Dan Meyer called the "5 Percent Problem." Q-Chat is dead. Khanmigo had to make itself auto-activate without student invitation in a 2026 product overhaul because, in Khan's own words, "students were not seeking out Khanmigo's help as much as we had hoped."

Hacker News thread on Microsoft Khan Academy partnership to make Khanmigo free for teachers: 134 points, 74 comments, with the top comments framing the unit-economics problem and the math-hallucination concerns

View on Hacker News →

The structural reasons are knowable, and the post-Khanmigo product has to address each one.

The Socratic paradox. Every product worth its salt now promises "doesn't just give answers." The strongest user complaint, especially for K-5, is exactly that: kids who want help get questions. Synthesis succeeds because its game loop disguises the Socratic structure. Khanmigo struggles because its UI is naked Socratic dialogue.

Hallucination in math is documented, real, and unsolved. Khanmigo accepted 272 − 172 = 430. It failed at division. It couldn't reliably compute square roots. This is GPT-4 base behavior. Flint runs Claude 4.5 Sonnet to soften this; Grokkoli refuses generative AI entirely and uses a proprietary adaptive engine; Super Teacher, a Disrupt 2025 Battlefield finalist, explicitly avoids LLMs for content generation. These are reactions to the same problem.

The engagement cliff. Stanford CEPA's 60% drop after three weeks of unfacilitated use is not a Khanmigo problem; it is the median outcome for chat-only student tutors. Synthesis's Trustpilot reviews carry the same shape: kids enthusiastic for the first two months, "ran out of content" by month three, parents canceling.

The COPPA reckoning. The FTC's June 2025 final rule (voiceprints reclassified as personal information, separate parental consent required for AI training, indefinite retention banned) reached its compliance deadline on April 22, 2026. Between the FTC's September 2025 inquiry into seven major AI companies, Character.AI's wrongful-death lawsuit, and Buddy.ai's retrofit after the CARU compliance action, the legal exposure for "kid voice" is now real, and most products were not designed for it.

The unit-economics graveyard. Q-Chat shut down because the per-user generative-tutor economics did not work at Quizlet's price. Khanmigo runs at $4/month for learners only because Microsoft funds the compute. Anyone hand-waving about per-token costs in a children-facing voice product is whistling past Q-Chat's grave.
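To make the shape of that argument concrete, here is a back-of-the-envelope sketch in Python. Every usage and pricing input below is a hypothetical placeholder chosen for illustration, not a figure from Quizlet, Khan Academy, or any model vendor; only the $4/month price comes from the paragraph above. The function names are ours.

```python
def monthly_inference_cost(sessions_per_month: int,
                           turns_per_session: int,
                           tokens_per_turn: int,
                           usd_per_million_tokens: float) -> float:
    """Estimate one learner's monthly model cost in USD.

    tokens_per_turn should include the conversation context that is
    re-sent on every turn, which is why it can be much larger than the
    visible reply.
    """
    total_tokens = sessions_per_month * turns_per_session * tokens_per_turn
    return total_tokens * usd_per_million_tokens / 1_000_000


def gross_margin(price_usd: float, cost_usd: float) -> float:
    """Gross margin as a fraction of price; negative means a loss per user."""
    return (price_usd - cost_usd) / price_usd


# HYPOTHETICAL inputs: 20 sessions a month, 30 turns each,
# 1,500 tokens per turn, a blended $10 per million tokens.
cost = monthly_inference_cost(20, 30, 1500, 10.0)
margin = gross_margin(4.0, cost)
print(f"cost per learner: ${cost:.2f}/month, margin at $4: {margin:.0%}")
```

Under these made-up inputs the compute bill exceeds the subscription price. The point of the sketch is the sensitivity: cost scales with the product of sessions, turns, and tokens per turn, and a voice product pushes all three up at once.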

Hacker News thread "Everyone is cheating their way through college": 118-point discussion on AI use in education, the unsolved tension between AI as tutor versus AI as homework-bypass machine

View on Hacker News →

The field, as of May 2026

Here's the working map. Twelve names that matter, sorted by where they have actually won.

| Audience | Winners | Stalled / contrarian | Notes |
| --- | --- | --- | --- |
| Teacher-facing | MagicSchool (6M), Brisk (1M, 20% of US K-12), SchoolAI ($25M A), Flint ($15M A, Claude 4.5) | (none) | The wins of 2024–25. Adults pay. Adults adopt. |
| K-5 student math | Synthesis ($10M rev, 4.5× YoY) | Khanmigo (Sal Khan: "non-event"), Grokkoli (non-LLM, contrarian) | Synthesis's game loop is the working pattern. |
| K-12 broad subject | (no clear winner) | Khanmigo, MagicSchool Tutor, SchoolAI | The wide-open slot. |
| Language voice (adult) | Speak ($1B, Series C $78M), Duolingo Max | Praktika | Voice "works" for adults at depth comparable to a 30-second human chat. |
| K-2 voice (ESL) | Buddy.ai (20M users, kidSAFE+COPPA) | (none) | One of the few products that designed for COPPA from day one. |
| Discontinued | (none) | Q-Chat (June 2025) | Unit economics. |
| Higher ed | Claude for Education, ChatGPT Edu | (none) | Different motion. No K-12 guardrails. |
| "AI" school operations | Alpha School / 2-Hour Learning | (none) | Adaptive bundle + 5:1 guides. Founders themselves admit it isn't generative AI. |

Field map of AI tutoring agents May 2026: teacher-facing winners MagicSchool Brisk SchoolAI Flint, K-5 math with Synthesis winning while Khanmigo stalls and Grokkoli takes the non-LLM contrarian path, K-12 broad subject wide open with MyTutor making the bet, adult voice Speak Duolingo Max winning, kid voice Buddy.ai holding kidSAFE plus COPPA, Q-Chat discontinued

Two observations from this matrix.

First, the row labeled "K-12 broad subject" has no clear winner. That's the slot Khanmigo was supposed to own. It now sits as the most expensive, most-watched empty seat in edtech.

Second, the products that grow are the ones that picked a narrow lane. Synthesis is K-5 math. Buddy.ai is voice ESL for ages 3–8. Speak is adult language. Flint is school-channel K-12. The general-purpose Socratic chatbot, the thing Q-Chat tried and Khanmigo built, is the lane that hasn't worked.

Hacker News thread for "Synthesis Tutor: Math tutor for children", 106 points, 68 comments, with parent reviews and the early game-loop product positioning that has since scaled to $10M revenue

View on Hacker News →

What the post-Khanmigo product looks like

If you accept the diagnosis, the structural requirements for the next student-facing tutor are not subtle. They map almost one-to-one onto the failure modes above.

  1. Multi-agent orchestration. Single-prompt chatbots are not good enough. You need a planner that decides what to teach next, and a separate executor that actually teaches it. The planner watches mastery, picks the next standard, calls the right tool. The executor does the conversation. Anthropic's multi-agent work has been Claude Code-focused; the same architectural shape is what AI tutoring has been missing. (For more on agent-orchestration patterns we cover regularly, see our field map of orchestration tools.)

  2. Mastery telemetry per standard. Probabilistic. Calibrated. Surfaced to the parent. Bayesian Knowledge Tracing and IRT have been in academic ITS literature for two decades; the shipping question is whether the parent can read a CCSS-aligned mastery card with confidence intervals on Tuesday morning. Most products don't even try.

  3. Kid-native UX with persona depth. Khanmigo's failure was partly that the interface looks like a help desk. Synthesis's success is partly that it looks like a game. K-2 children need an emoji-first surface; high schoolers need a serious one; both have to live in the same product or the lifetime value collapses.

  4. Real voice, real-time, COPPA-compliant. Not TTS-and-STT. Not asynchronous reply. WebSocket audio streaming, sub-300ms latency, turn-lock barge-in so the child can interrupt without freezing the agent, kid-tuned automatic speech recognition (Buddy.ai's 25K-hour BSR corpus is the gold standard here), and a documented two-step verifiable parental consent flow with IP/UA tracking. After April 22, 2026, this is not a feature; it is a license to operate.

  5. Standards-aligned content at scale, not "type your own question." The dirty secret of generic AI tutors is that the lesson is whatever the kid happens to type into the box. Real tutoring is structured. The CCSS cluster they need to work on this week is something a system should know, and it should generate the explainer and the practice. Auto-generated 8–10 minute standards-aligned tutorial videos delivered daily: there is no shipping competitor doing this at K-12 scale that we have been able to find.

  6. Interactive practice that isn't a multiple-choice question bank. Generated React-component practice artifacts paired to each concept. Synthesis's games are hand-authored. Brilliant's interactives are hand-authored. Generating these alongside the video, per topic, every day, is a category that doesn't really have a category yet.
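Requirement 2 names Bayesian Knowledge Tracing. For readers who have not met it, here is a minimal sketch of the standard BKT update in Python. The four parameter values are illustrative defaults we picked, not values from any shipping product, and the standard identifier in the usage note is only an example key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BKTParams:
    p_init: float = 0.20   # P(L0): skill already mastered before any practice
    p_learn: float = 0.15  # P(T): skill is learned at each practice opportunity
    p_slip: float = 0.10   # P(S): mastered, but the answer is wrong anyway
    p_guess: float = 0.25  # P(G): not mastered, but the answer is right anyway


def bkt_update(p_mastery: float, correct: bool, params: BKTParams) -> float:
    """Return P(mastered) after observing one answer on this standard."""
    if correct:
        evidence_mastered = p_mastery * (1 - params.p_slip)
        evidence_unmastered = (1 - p_mastery) * params.p_guess
    else:
        evidence_mastered = p_mastery * params.p_slip
        evidence_unmastered = (1 - p_mastery) * (1 - params.p_guess)
    # Bayes' rule: condition the mastery estimate on the observed answer.
    posterior = evidence_mastered / (evidence_mastered + evidence_unmastered)
    # Learning transition: the attempt itself may have taught the skill.
    return posterior + (1 - posterior) * params.p_learn


def trace(answers: list[bool], params: BKTParams = BKTParams()) -> list[float]:
    """Mastery estimate after each answer, starting from the prior."""
    p = params.p_init
    history = []
    for correct in answers:
        p = bkt_update(p, correct, params)
        history.append(p)
    return history
```

A planner in the sense of requirement 1 could keep one such estimate per standard, for example `mastery["3.OA.C.7"] = trace(answers)[-1]`, and pick the next topic from the lowest value. That selection rule is our assumption, not a documented design.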

MyTutor dual-agent architecture: student interacts via WebSocket voice with turn-lock barge-in, the Tutor agent runs the conversation while the Strategist agent watches Bayesian IRT mastery per CCSS standard, a daily-cron NotebookLM pipeline generates 8 to 10 minute videos and Claude Design generates JSX practice artifacts, parent dashboard receives mastery telemetry, all under COPPA two-step IP and UA tracked consent

MyTutor as the example

MyTutor (youraitutors.com) is one of the products making that bet, end to end. We are flagging it here not as a pitch but as an existence proof that each of the six requirements above is shippable today.

The architecture is a dual-agent Claude orchestration: a Strategist agent that watches mastery and plans the next session, and a Tutor agent that runs the actual conversation. That split is genuinely rare in shipping tutoring products; even Flint, which runs Claude 4.5 Sonnet end-to-end, appears to be a single-agent loop. The Strategist is the thing that catches the executor about to make 6 × 2 = 13, before it reaches the student. (For the broader argument about why agents in production keep failing on reliability when single-loop, see our agents-fail-real-jobs reliability brief.)
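We have not seen MyTutor's code, so the following is only a sketch of what "catching 6 × 2 = 13" could look like: a deterministic arithmetic gate that sits between a drafting agent and the student. The function names and the `redraft` callback (a stand-in for a second model call) are hypothetical.

```python
import operator
import re
from fractions import Fraction

# Unicode escapes: \u00d7 is the multiplication sign, \u00f7 the division
# sign, \u2212 the typographic minus sign.
_OPS = {
    "+": operator.add,
    "-": operator.sub, "\u2212": operator.sub,
    "*": operator.mul, "\u00d7": operator.mul,
    "/": operator.truediv, "\u00f7": operator.truediv,
}
_NUM = r"\d[\d,]*(?:\.\d+)?"
_CLAIM = re.compile(
    rf"({_NUM})\s*([+\-*/\u00d7\u00f7\u2212])\s*({_NUM})\s*=\s*(-?{_NUM})"
)


def _to_number(text: str) -> Fraction:
    """Parse '10,332' or '3.5' exactly, ignoring thousands separators."""
    return Fraction(text.replace(",", ""))


def find_arithmetic_errors(draft: str) -> list[str]:
    """Return a description of every 'a op b = c' claim that is false."""
    errors = []
    for left, op, right, claimed in _CLAIM.findall(draft):
        try:
            actual = _OPS[op](_to_number(left), _to_number(right))
        except ZeroDivisionError:
            continue  # leave division by zero for a different check
        if actual != _to_number(claimed):
            errors.append(
                f"{left} {op} {right} = {claimed} (expected {float(actual):g})"
            )
    return errors


def review(draft: str, redraft, max_attempts: int = 2):
    """Release a draft only if it passes; otherwise ask for a rewrite.

    Returns the approved text, or None if no attempt passed.
    """
    for _ in range(max_attempts):
        errors = find_arithmetic_errors(draft)
        if not errors:
            return draft
        draft = redraft(draft, errors)
    return draft if not find_arithmetic_errors(draft) else None
```

The design point is that the check is not another model opinion. It only covers single-operation claims written out as equations, so it narrows the hallucination surface without closing it.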

Mastery is Bayesian IRT per CCSS standard, with the calibration surfaced to the parent dashboard. Daily quests, weekly digests, and struggle alerts go to the parent under a COPPA-compliant disclosure model. The parent dashboard is the engagement layer Khanmigo never built.
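One plausible reading of "Bayesian IRT with calibration surfaced to the parent" is a posterior over the student's ability on a standard, reported with a credible interval. The sketch below uses a two-parameter logistic item model, a standard normal prior, and a grid approximation. All of that is our assumption for illustration; the item parameters in the usage note are invented, and we do not know which model MyTutor actually uses.

```python
import math
from itertools import accumulate


def irt_prob(theta: float, a: float, b: float) -> float:
    """2PL item response function: P(correct | ability theta).

    a is the item's discrimination, b its difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def ability_posterior(responses, grid_min=-4.0, grid_max=4.0, steps=161):
    """Grid posterior over ability for one standard.

    responses: list of (a, b, correct) tuples, one per answered item.
    Returns (grid, weights) where weights sum to 1.
    """
    step = (grid_max - grid_min) / (steps - 1)
    grid = [grid_min + i * step for i in range(steps)]
    log_post = []
    for theta in grid:
        lp = -0.5 * theta * theta  # log N(0, 1) prior, up to a constant
        for a, b, correct in responses:
            p = irt_prob(theta, a, b)
            lp += math.log(p if correct else 1.0 - p)
        log_post.append(lp)
    peak = max(log_post)  # subtract the peak before exp() for stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


def credible_summary(grid, weights, mass=0.9):
    """Posterior mean plus an equal-tailed credible interval."""
    mean = sum(t * w for t, w in zip(grid, weights))
    tail = (1.0 - mass) / 2.0
    cumulative = list(accumulate(weights))
    lower = next(t for t, c in zip(grid, cumulative) if c >= tail)
    upper = next(t for t, c in zip(grid, cumulative) if c >= 1.0 - tail)
    return mean, lower, upper
```

With a handful of answers, for example `credible_summary(*ability_posterior([(1.0, -1.0, True), (1.0, 0.0, True), (1.0, 1.0, False)]))`, the interval stays wide. Showing that width is what would distinguish a calibrated mastery card from a bare percentage.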

Content runs on a daily-cron NotebookLM pipeline: a CCSS cluster comes in, an 8–10 minute standards-aligned explainer video comes out, the watermark is stripped, and the next morning students see a fresh tutorial for the standard they are working on. Practice ships alongside the video as a Claude Design-generated JSX interactive artifact: a React component, not a PDF. Same standard. Same morning.

Voice is real-time WebSocket with turn-lock interruption, the architecture Buddy.ai uses for its 20M ESL users, applied to the broad K-12 subject span. The consent flow is the two-step verifiable parental model with IP/UA tracking, the kind that ages from "tax" into "moat" the week after April 22, 2026.
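"Turn-lock interruption" is not a term with a published spec that we know of. Our reading is that only one party holds the floor at a time and the child can always take it from the agent. The sketch below shows that behavior with `asyncio` task cancellation; `asyncio.sleep` stands in for sending an audio frame over a WebSocket, and the class and method names are hypothetical.

```python
import asyncio


class TurnLock:
    """One speaker at a time; the child can always interrupt the agent."""

    def __init__(self):
        self._agent_task = None
        self.log = []  # ordered record of who said what, for inspection

    def agent_speak(self, chunks, chunk_seconds=0.05):
        """Start streaming agent audio as a cancellable background task.

        Must be called from inside a running event loop.
        """
        async def play():
            try:
                for chunk in chunks:
                    self.log.append(f"agent:{chunk}")
                    await asyncio.sleep(chunk_seconds)  # one audio frame
            except asyncio.CancelledError:
                self.log.append("agent:interrupted")
                raise

        self._agent_task = asyncio.create_task(play())
        return self._agent_task

    async def child_speaks(self, utterance):
        """Barge-in: stop agent playback first, then give the child the floor."""
        task = self._agent_task
        if task is not None and not task.done():
            task.cancel()
            try:
                await task  # wait until playback has actually stopped
            except asyncio.CancelledError:
                pass
        self.log.append(f"child:{utterance}")


async def demo():
    lock = TurnLock()
    lock.agent_speak(["Let's", "look", "at", "fractions"])
    await asyncio.sleep(0.02)  # the child cuts in during the first chunk
    await lock.child_speaks("wait, what is a fraction?")
    return lock.log
```

Running `asyncio.run(demo())` yields a log in which the agent's remaining chunks never play. In a real product the hard part is upstream of this: deciding quickly and reliably that the child has started speaking.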

The persona system is Max, Dr. Sage, Coach Ace: multiple distinct tutor characters with a paper-studio aesthetic that bridges a K-2 emoji UI through grade-12 depth. Gamification is streaks, daily quests, chests, room customization, and a 1v1 math battle arena that, as far as we can tell, only Edzy in India CBSE is shipping anything like, and not at US K-12 scale.

Map each of those back to the structural requirements list. Multi-agent orchestration: Strategist + Tutor. Mastery telemetry: IRT per CCSS. Kid-native UX: K-2 emoji UI + paper-studio + personas. Real voice + consent: WebSocket + COPPA two-step. Standards content at scale: daily-cron NotebookLM. Interactive practice: Claude Design JSX artifacts. The list is filled. Whether MyTutor ships at the quality bar the architecture implies is the only question that matters, and it is a product question, not an architectural one.

@synthesischool on X: Synthesis Tutor crossed $10M revenue in 2025 with 4.5x year-over-year growth, the working pattern for K-5 math AI tutoring that game-loop-driven products are building on

View original post on X →

Where MyTutor doesn't stand out: honest version

Three things, named directly.

Brand trust at scale. Khan Academy has hundreds of millions of historical learners. MagicSchool has 6M teachers. MyTutor is new. That is a sales problem, not a product problem, but it is real and it doesn't fix itself.

District procurement channel. SchoolAI, Edia, Smartschool, and Flint own the school-purchase lane. MyTutor's parent-pay positioning is a different motion, and worth being explicit about. The teacher-tools winners of 2024–25 won that lane on purpose.

The non-LLM contrarian case. Grokkoli explicitly avoids generative AI for K-5 math. Super Teacher avoids LLMs for content generation. The hypothesis behind those companies is that no amount of orchestration fully solves the hallucination problem at K-5 arithmetic, and the safer engineering bet is to use deterministic adaptive systems. If they are right, MyTutor's Strategist-catches-the-Tutor architecture has to actually work, not just exist in the diagram. That is the technical risk it doesn't get to wave away.

Hacker News thread for "Sal Khan: The amazing AI super tutor for students and teachers" TED video, 46 points, 42 comments, the original 2023 optimism case that the 2026 walkback is now responding to

View on Hacker News →

The wide-open slot after the first wave

The honest read of AI tutoring in May 2026 is two stories. The teacher-facing story is a quiet success: MagicSchool, Brisk, and SchoolAI are real and growing and saving real teacher hours. The student-facing story is a humbling. The category-defining product publicly admitted it didn't matter for most users, and the field has spent the spring of 2026 metabolizing that.

But "Khanmigo failed" is the wrong lesson. The right lesson is that Khanmigo was the first attempt at a hard problem, and the first attempt was a single-LLM chatbot wrapped in a great content library. The second wave (multi-agent, mastery-telemetric, voice-native, standards-aligned, COPPA-clean, kid-native) has not yet picked a winner. The slot is open precisely because the first wave is so visibly empty.

For founders, that is the most interesting state a market can be in. For parents, it is a reason to wait a beat before assuming any of this is a finished product. For schools, it is a reason to keep the teacher in the loop until the student-facing layer earns the absence of one. And for the field, it is the moment after the first wave that picks favorites.

The Khanmigo reckoning is not the end of the story. It is the part where the second wave gets to start.


Related reading on AgentConn: research-agent comparison 2026, best AI agent orchestration tools 2026, agents-fail-real-jobs reliability brief.