Making an AI tutor feel like it remembers you — the user-profile layer behind Elispeak
I build Elispeak — an AI English speaking coach. Most of the interesting product work is not the voice pipeline or the scoring rubric. It's the user-profile layer that sits between a user's sessions and the next conversation Eli (the tutor persona) opens with.
Without it, every session starts with the generic "What would you like to practice today?" With it, Eli opens with something like: "Last time you wanted to sound less stiff in standups — still that, or do you want to prep for Friday's interview instead?"
That one sentence changes retention more than any other single thing we shipped. Here's how the layer actually works.
The problem
LLM apps default to two broken modes:
- Stateless. Every session starts from zero. The user has to re-explain who they are, what their level is, what they're practicing for. That friction kills daily-use intent on week two.
- Full transcript memory. Shove every past message into context. Expensive, slow, leaks old topics into new ones ("you mentioned your mom's surgery three weeks ago — how is she?" when the user just wanted to practice a TOEFL prompt).
What we actually want is somewhere between these two: a compact, structured model of the user that survives across sessions without dragging raw conversation history forward.
What the profile stores
The profile is a JSON-shaped record per user, updated after every session — not during. A few fields that carry weight:
```typescript
type UserProfile = {
  goals: Goal[];              // "TOEFL in May", "sound natural in standups"
  level: { speaking: CEFR; writing: CEFR; listening: CEFR };
  weaknesses: Weakness[];     // "articles", "past perfect", "th sounds"
  strengths: string[];        // short, positive; used for tone, not praise
  interests: string[];        // "football", "indie dev", "sci-fi"
  recentTopics: Topic[];      // last ~10, with timestamps + summaries
  styleSignals: {             // helps Eli pace/tone replies
    wantsCorrection: "immediate" | "end-of-turn" | "summary-only";
    preferredPace: "slow" | "normal" | "fast";
    emotionalRegister: "direct" | "warm" | "playful";
  };
  openLoops: OpenLoop[];      // things the user said they wanted to come back to
  lastSessionAt: Timestamp;
  sessionCount: number;
};
```
Nothing here is free-form prose. Everything is a bounded enum or a short tagged string. That constraint is the whole point — it's what lets the layer stay cheap to read and safe to pass into a prompt.
How it gets populated
Two paths:
1. Explicit onboarding. The first few sessions ask the user a small number of low-friction questions — "what's the closest thing to why you're practicing?" with 4 options, not a text box. These seed goals, level, and styleSignals.emotionalRegister.
2. Post-session enrichment. This is the interesting part. After a session ends, a second, slower model pass runs on the transcript and answers a short, fixed set of questions:
- Did the user mention any new goal, deadline, or context we don't have?
- Which grammatical/phonetic weaknesses showed up at least twice?
- Did the user ask to come back to anything later?
- Did the user's preferred correction cadence shift in this session?
The output of this pass is a structured diff, not a rewrite. Something like:
```json
{
  "addWeaknesses": ["conditional-3rd"],
  "addOpenLoop": { "topic": "salary negotiation", "context": "promo prep" },
  "reinforce": { "goal": "interview prep", "confidence": 0.8 }
}
```
The diff is applied to the profile with simple merge rules (cap recentTopics at 10, cap openLoops at 5, decay confidence on older items). Keeping this as a diff — not a full overwrite — is what keeps the profile stable. One weird session doesn't erase four weeks of accumulated knowledge about the user.
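The merge step described above can be sketched roughly like this. The type names, caps, and decay constant are illustrative assumptions, not Elispeak's actual implementation:

```typescript
type Weakness = string;
type OpenLoop = { topic: string; context: string };
type RecentTopic = { name: string; at: number };

type Profile = {
  weaknesses: Weakness[];
  openLoops: OpenLoop[];
  recentTopics: RecentTopic[];
  goalConfidence: Record<string, number>;
};

type ProfileDiff = {
  addWeaknesses?: Weakness[];
  addOpenLoop?: OpenLoop;
  addTopic?: RecentTopic;
  reinforce?: { goal: string; confidence: number };
};

const MAX_TOPICS = 10;
const MAX_OPEN_LOOPS = 5;
const DECAY = 0.9; // older goal confidence fades a little each session

function applyDiff(p: Profile, d: ProfileDiff): Profile {
  // Decay every existing goal, then reinforce the one the diff names.
  const goalConfidence: Record<string, number> = {};
  for (const [goal, conf] of Object.entries(p.goalConfidence)) {
    goalConfidence[goal] = conf * DECAY;
  }
  if (d.reinforce) {
    const prev = goalConfidence[d.reinforce.goal] ?? 0;
    goalConfidence[d.reinforce.goal] = Math.max(prev, d.reinforce.confidence);
  }
  return {
    // De-duplicate additions; never drop what's already known.
    weaknesses: [...new Set([...p.weaknesses, ...(d.addWeaknesses ?? [])])],
    // Newest entries first, then enforce the caps.
    openLoops: [d.addOpenLoop, ...p.openLoops]
      .filter((x): x is OpenLoop => x != null)
      .slice(0, MAX_OPEN_LOOPS),
    recentTopics: [d.addTopic, ...p.recentTopics]
      .filter((x): x is RecentTopic => x != null)
      .slice(0, MAX_TOPICS),
    goalConfidence,
  };
}
```

Note that the function only ever appends, caps, or decays — there is no code path that wipes a field, which is exactly the "one weird session can't clobber the profile" property.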
How recommendations use it
When the user opens the app, we don't show a flat list of prompts. We compute a small ranked set.
Roughly:
```typescript
function rankTopics(profile: UserProfile, pool: Topic[]): Topic[] {
  return pool
    .map((t) => ({
      topic: t,
      score:
        goalAlignment(t, profile.goals) * 0.45 +
        weaknessHit(t, profile.weaknesses) * 0.25 +
        interestHit(t, profile.interests) * 0.15 +
        noveltyAgainst(t, profile.recentTopics) * 0.15,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 5)
    .map((s) => s.topic); // unwrap back to Topic[] after ranking
}
```
The weights are not magic. They came from watching early users either pick the first card or bounce. Three things moved the needle more than tuning the weights:
- Novelty penalty against recentTopics. If the user practiced "interview: tell me about yourself" two sessions ago, don't put it first again. This was the single biggest retention move. Users reading the same top card twice don't feel "understood," they feel "lazy AI."
- Open-loop surfacing. If the user said "I want to come back to negotiating salary," show that as its own explicit card with the phrase they used. This makes the continuity feel real because the language is theirs, not a paraphrase.
- Goal recency decay. Goals aren't permanent. A TOEFL goal with a May date should rank near 1.0 in April and near 0.2 in July. Hard decay beats soft decay here — users notice when stale goals hang around.
How Eli opens a session
This is where the profile stops being a data structure and starts being a feeling.
The opening line is generated by a small prompt that receives the minimum useful slice of the profile — not the whole thing. Something like:
```
user's top goal: {top_goal}
most recent open loop: {top_open_loop.topic}
last session ended: {days_ago}d ago
preferred register: {emotional_register}
```
That's it. No transcripts. No list of weaknesses. No confidence scores. The LLM isn't asked to decide what matters; the profile ranking already did that. The LLM is only asked to say one natural-sounding sentence that threads those three or four facts together.
Two rules the opening line has to follow:
- Never invent continuity. If there's no recent open loop, don't fake one. "Last time you wanted X" is the fastest way to destroy trust if the user didn't actually say X. When in doubt, ask.
- Match the user's register. A user who set `emotionalRegister: "direct"` gets "Interview prep or something else?" A user with `"warm"` gets "Hey — want to pick up the interview prep, or reset?" Same information, different tone. This is the cheapest personalization we have.
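Assembling the minimal slice with both rules enforced in code (rather than in the prompt) might look like this. The field and function names are illustrative, echoing the schema above:

```typescript
type OpenerProfile = {
  goals: { name: string; weight: number }[];
  openLoops: { topic: string }[];
  lastSessionAt: number; // unix ms
  styleSignals: { emotionalRegister: "direct" | "warm" | "playful" };
};

function buildOpenerContext(p: OpenerProfile, now: number): string {
  const lines: string[] = [];
  const topGoal = [...p.goals].sort((a, b) => b.weight - a.weight)[0];
  if (topGoal) lines.push(`user's top goal: ${topGoal.name}`);
  // Rule 1: never invent continuity. If there is no open loop, omit the
  // line entirely so the model has nothing to improvise "last time" from.
  if (p.openLoops.length > 0) {
    lines.push(`most recent open loop: ${p.openLoops[0].topic}`);
  }
  const daysAgo = Math.floor((now - p.lastSessionAt) / 86_400_000);
  lines.push(`last session ended: ${daysAgo}d ago`);
  // Rule 2: pass the register through so the one-liner matches the user's tone.
  lines.push(`preferred register: ${p.styleSignals.emotionalRegister}`);
  return lines.join("\n");
}
```

Enforcing "no open loop, no line" here, before the LLM ever sees the prompt, is cheaper and more reliable than instructing the model not to fabricate one.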
The privacy line we don't cross
The profile is structured, bounded, and summary-only. Full transcripts are not stored beyond the session's scoring pipeline. That's not just a privacy stance — it's an engineering one. If we kept transcripts, the profile layer would drift toward "shove raw text into context" and we'd be back to the expensive, leaky mode we were avoiding.
The rule we follow internally: if a field can't be expressed as a bounded schema entry, it doesn't belong in the profile. A user saying "I'm nervous about my green card interview next Thursday" becomes { goal: "immigration-interview-prep", deadline: "2026-05-08", register: "warm" } — not a stored quote.
What I'd tell someone building the same thing
Four things in order of how much time they saved us:
- Update the profile after the session, not during. Trying to update live made every turn slower and introduced race conditions between the scoring pass and the conversation turn. A slow async pass post-session is fine — the user won't feel it.
- Diffs over rewrites. Always. One bad session should never clobber the profile.
- Bound every field. Enums, capped arrays, tagged strings. Free-form prose in a profile is technical debt that compounds every session.
- Pass the minimum slice to the opener, not the whole profile. Let the ranker decide what matters. The LLM gets four lines of context, not forty.
Once those four are in place, the "feels like Eli knows me" property shows up almost for free. Users describe it as "the AI remembers me" even though technically nothing from last week's transcript is in this week's prompt.
That gap — between what's actually in context and what the user feels — is where the product lives.
Try it
The free tier is enough to see whether the personalized cold-open lands for you. For paid plans, the launch promo ELISPEAK50 gets you 50% off any plan (no minimum).