DEV Community

Asad marcus for kirodotdev


I Built an Interactive Learning Engine Inside an AI Chat App — Here's Every Technical Decision

There's a problem that every AI chat app eventually runs into that nobody really talks about openly: text is a terrible medium for teaching certain things. Not because the AI gives bad answers, but because the format itself gets in the way.

Ask an AI to explain how a sine wave changes with frequency and it'll give you a mathematically correct, well-worded paragraph. You'll read it, nod, and still not feel what happens when frequency doubles. You need to see the wave. You need to drag a slider and watch it compress in real time. You need the concept to be live in front of you.

I built AimiChat as my AI web app, and this problem was bothering me enough that I spent a few weeks building a full interactive learning visualization engine directly into the chat interface. The AI doesn't just answer questions anymore. It generates live, interactive widgets embedded inside its responses. Adjustable math graphs, step-through code walkthroughs, physics simulations, inline quizzes, the whole thing.

I built this using Kiro as my AI IDE, and honestly the experience taught me a lot about what that kind of tooling is actually good for at a deeper level. This post covers the full technical architecture, the specific problems I ran into and how I solved them, and honest detail about where Kiro genuinely changed how I worked versus where I still had to drive things myself.

If you're building anything with streaming AI responses, interactive DOM components, or you're just curious about Kiro and what it actually means to use an agentic IDE in practice, I hope this is useful.


Understanding the Problem Space First

Before getting into implementation, it's worth understanding why this is hard. The difficulty isn't in building a graph renderer. Canvas graphs are straightforward. The difficulty is the intersection of three things happening at the same time.

The AI returns text. Every AI model, regardless of provider, returns a stream of tokens that your frontend assembles into a string. Your markdown parser turns that string into HTML. There's no native concept of "and also render a live React component here."

Responses stream in progressively. The response doesn't arrive all at once. Tokens come in over one to ten seconds and you're updating the DOM incrementally so the user sees text appearing in real time. This means your parser runs many times on the same message, each time on a slightly longer version of the text.

Interactive widgets have lifecycle requirements. A Canvas graph needs to mount once, attach event listeners once, and stay mounted. If anything resets innerHTML on its parent, which streaming forces you to do constantly, the widget is destroyed and its listeners are orphaned.

These three constraints create a genuine architectural puzzle. You can't mount widgets during streaming. You can't wait until streaming is done to parse because the user would see a blank screen. And you need the widget configs to survive DOM resets. Every approach that seems obvious breaks on one of these three.

The solution I arrived at is a two-phase registry architecture. But understanding why simpler approaches fail is important before understanding why this one works.


Kiro: What It Actually Is and How I Used It

Since this is a community post about building with Kiro, I want to give you an honest picture of what it is rather than just repeating the marketing summary.

Kiro is an AI-powered IDE built on VS Code, developed by AWS. The surface-level description is "it has AI assistance built in," but that undersells what makes it meaningfully different from GitHub Copilot or Cursor-style tab completion.

The key concept in Kiro is specs. When you start a significant piece of work, Kiro generates a spec document: a structured breakdown of what you're building, broken into requirements, then tasks, then implementation steps. This spec lives in your project and Kiro uses it to maintain context across an entire feature's development, not just the current file.

For this project I wrote a prompt describing the interactive learning system. What types of visualizations I wanted, how they'd integrate with the chat streaming system, the security constraints around eval, the JSON contract between the AI model and the renderer. Kiro turned that into a spec with tasks like "implement recursive descent math parser," "build two-phase streaming architecture," "implement graph renderer with parameter sliders," and "add JSON repair pipeline for AI output." Each task had acceptance criteria.

What this gave me practically: when I was three widget types into building eleven, and I asked Kiro to help me implement the simulation renderer, it understood the full contract from the spec without me re-explaining it. That's the real value. Not tab completion. Persistent project-level context that makes each individual task faster because you're not starting from zero each time.

Kiro also has agent hooks, which are automated actions that trigger on file save, test run, or other events. I set up a hook that ran a basic smoke test on the widget registry whenever I added a new renderer type, catching mismatches between the type string I registered and the function I exported. Small thing, but it caught two bugs before I found them manually.
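The registry smoke test itself is only a few lines. This is a sketch of the shape of mine, not the exact hook code; the registry entries here are illustrative stubs:

```javascript
// Hypothetical registry with stub renderers — the real ones mount canvases, quizzes, etc.
const INTERACTIVE_TYPES = {
    'interactive-graph': (el, config) => { /* mount graph */ },
    'interactive-quiz': (el, config) => { /* mount quiz */ },
};

// Smoke test: every registered type follows the naming convention
// and actually maps to a renderer function
function smokeTestRegistry(registry) {
    for (const [type, renderer] of Object.entries(registry)) {
        if (!type.startsWith('interactive-')) {
            throw new Error(`Bad type name: ${type}`);
        }
        if (typeof renderer !== 'function') {
            throw new Error(`No renderer function for: ${type}`);
        }
    }
    return true;
}

smokeTestRegistry(INTERACTIVE_TYPES); // throws on the first mismatch
```

Running this on every save is exactly the kind of mechanical check that's cheap to automate and annoying to do by hand.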

The other Kiro concept worth knowing is steering documents. These are markdown files you put in your project that give Kiro persistent instructions about your codebase conventions. Things like "always use the escapeHtml() utility instead of setting innerHTML directly with user data," or "responsive canvas sizing always uses devicePixelRatio for retina support." Kiro reads these before generating code. It meant I stopped seeing the same category of mistake in generated code after I documented my patterns once.


Designing the Widget Protocol

The first real design decision was how the AI model communicates what widget to render and with what data. I settled on custom fenced code blocks with an interactive- prefix as the language identifier.
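Concretely, when the model wants to show a graph, it emits a fenced block whose language tag is `interactive-graph` and whose body is a JSON payload. Here's an illustrative payload — the field names follow the GraphConfig shape shown at the end of this post, but the values are made up:

```json
{
    "title": "Sine wave",
    "functions": [
        { "expr": "a * sin(b * x)", "label": "y", "color": "#4f9dff" }
    ],
    "parameters": [
        { "name": "a", "label": "Amplitude", "min": 0, "max": 3, "default": 1, "step": 0.1 },
        { "name": "b", "label": "Frequency", "min": 0.5, "max": 8, "default": 1, "step": 0.5 }
    ],
    "xRange": [-6.28, 6.28],
    "yRange": [-3, 3]
}
```

The prefix convention means any markdown renderer that doesn't know about the widget system just displays the block as code, which is a reasonable fallback.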

Before the block format matters, though, you have to understand what streaming does to the DOM. A typical streaming loop rebuilds the message HTML on every incoming token:

// Simplified streaming loop
eventSource.onmessage = (event) => {
    accumulatedText += event.data;
    const html = markdownToHtml(accumulatedText);
    messageDiv.innerHTML = html; // This runs 50-200 times per message
};

Every time innerHTML is assigned, the entire subtree is torn down and rebuilt from scratch. Any Canvas elements lose their contexts. Any event listeners are orphaned. Any widget state is gone. If you mounted a graph on the 30th token, it's destroyed by the 31st.

The instinct is to try to be smart about partial updates, only re-rendering the parts that changed and preserving existing widgets. This sounds reasonable but breaks quickly because streaming text can change earlier parts of the message and tracking which DOM nodes correspond to which parts of the markdown is genuinely complex.

Phase 1: Register, don't render

The solution is to make the markdown parsing phase completely inert with respect to widgets. When the regex finds an interactive-* code block, it does two things and only two things.

First, it stores the config in an external Map keyed by a stable hash:

// The registry lives outside the DOM, so innerHTML resets can't touch it
const _ilBlockRegistry = new Map();

function _hashCode(str) {
    let hash = 0;
    for (let i = 0; i < str.length; i++) {
        hash = ((hash << 5) - hash + str.charCodeAt(i)) | 0;
    }
    return 'il-' + Math.abs(hash).toString(36);
}

The hash takes both the widget type and the JSON string as input, so the same widget config always generates the same ID regardless of how many times streaming re-parses the message. This is the key insight: content-addressed IDs survive DOM resets because they're deterministic.
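To make the determinism concrete, here's the same hash run twice, the way two successive stream ticks would re-parse the same block:

```javascript
// Same hash function as above, shown standalone for the demo
function hashCode(str) {
    let hash = 0;
    for (let i = 0; i < str.length; i++) {
        hash = ((hash << 5) - hash + str.charCodeAt(i)) | 0;
    }
    return 'il-' + Math.abs(hash).toString(36);
}

const config = '{"functions":[{"expr":"sin(x)"}]}';
const first = hashCode('interactive-graph' + config);
const second = hashCode('interactive-graph' + config); // re-parse on a later tick

console.log(first === second); // true — the placeholder keeps the same id across DOM resets
```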

Second, it returns a placeholder div instead of any real widget markup:

function processInteractiveBlocks(html) {
    return html.replace(
        /<pre><code[^>]*class="[^"]*language-(interactive-[\w][\w-]*)[^"]*"[^>]*>([\s\S]*?)<\/code><\/pre>/g,
        (match, type, encodedJson) => {
            if (!INTERACTIVE_TYPES[type]) return match;

            let jsonStr = encodedJson
                .replace(/<[^>]+>/g, '')
                .replace(/&lt;/g, '<')
                .replace(/&gt;/g, '>')
                .replace(/&amp;/g, '&')
                .replace(/&quot;/g, '"')
                .replace(/&#39;/g, "'");

            const id = _hashCode(type + jsonStr);
            _ilBlockRegistry.set(id, { type, jsonStr });

            return `<div id="${id}" class="il-placeholder">
                <div class="il-loading-spinner"></div>
                Loading visualization...
            </div>`;
        }
    );
}

The placeholder is just a div with an id and a loading spinner. Recreating it on every stream tick is fine because it has no state, no listeners, no canvas context. It's just markup.

Phase 2: Mount once, after streaming ends

The rendering phase runs exactly once, triggered by your streaming completion event:

eventSource.addEventListener('done', () => {
    messageDiv.innerHTML = markdownToHtml(accumulatedText);
    renderInteractiveWidgets(messageDiv);
});

function renderInteractiveWidgets(container) {
    const placeholders = container.querySelectorAll('.il-placeholder');

    placeholders.forEach(el => {
        const config = _ilBlockRegistry.get(el.id);
        if (!config) return;

        try {
            const parsed = JSON.parse(config.jsonStr);
            el.innerHTML = '';
            el.classList.replace('il-placeholder', 'il-rendered');
            INTERACTIVE_TYPES[config.type](el, parsed);
        } catch (err) {
            el.innerHTML = `
                <div class="il-error">
                    <div class="il-error-title">Could not render visualization</div>
                    <details>
                        <summary>Show raw data</summary>
                        <pre>${escapeHtml(config.jsonStr)}</pre>
                    </details>
                </div>
            `;
            el.classList.remove('il-placeholder');
        }
    });
}

The renderer gets a clean, stable div that nothing will ever reset. It can mount canvas elements, attach resize observers, start animation loops, all of it safe.
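One dependency of the error path above: escapeHtml(). I never show it in this post, so here's a minimal version — this is the common implementation pattern, not necessarily character-for-character what ships in AimiChat:

```javascript
// Escape the five HTML-significant characters; & must be replaced first
// or it would re-escape the entities produced by the later replacements
function escapeHtml(str) {
    return String(str)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

console.log(escapeHtml('{"a": "<b>"}')); // safe to drop into innerHTML
```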


Building a Safe Math Expression Evaluator

The graph renderer needs to evaluate user-parameterized math expressions like sin(a * x) + b * cos(x / 2) in real time as sliders move. The obvious approach is eval(). The obvious approach is also a serious security risk in a web app where the expression comes from an AI model responding to arbitrary user input.

The correct approach is a recursive descent parser. This is a well-studied technique in compiler design and it's worth understanding even outside this specific context.

How recursive descent parsing works

A recursive descent parser is built around the grammar of your expression language. Every grammar rule becomes a function. The functions call each other recursively, and the call depth at any point mirrors the nesting depth of the expression.

For a math expression language, the grammar looks like this:

expression  := term (('+' | '-') term)*
term        := power (('*' | '/') power)*
power       := unary ('^' unary)?
unary       := '-' primary | '+' primary | primary
primary     := number | constant | function_call | '(' expression ')'

The ordering here encodes operator precedence. Addition and subtraction are at the top level, so they're evaluated last. Multiplication and division are nested one level deeper, so they bind more tightly. Exponentiation is deeper still. This is why 2 + 3 * 4 evaluates to 14 and not 20. The term() call for 3 * 4 completes before the addition at the expression() level sees either operand.

Here's the tokenizer first:

tokenize(expr) {
    const tokens = [];
    let i = 0;
    while (i < expr.length) {
        if (/\s/.test(expr[i])) { i++; continue; }

        if (/[0-9.]/.test(expr[i])) {
            let num = '';
            while (i < expr.length && /[0-9.eE]/.test(expr[i])) {
                num += expr[i++];
            }
            tokens.push({ type: 'num', val: parseFloat(num) });
        }
        else if (/[a-zA-Z_]/.test(expr[i])) {
            let id = '';
            while (i < expr.length && /[a-zA-Z0-9_]/.test(expr[i])) {
                id += expr[i++];
            }
            tokens.push({ type: 'id', val: id });
        }
        else {
            tokens.push({ type: 'op', val: expr[i++] });
        }
    }
    return tokens;
}

And the parser, where each grammar rule is a function:

parse(tokens, vars = {}) {
    let pos = 0;
    const peek = () => tokens[pos];
    const next = () => tokens[pos++];

    const expression = () => {
        let left = term();
        while (peek() && (peek().val === '+' || peek().val === '-')) {
            const op = next().val;
            const right = term();
            left = op === '+' ? left + right : left - right;
        }
        return left;
    };

    const term = () => {
        let left = power();
        while (peek() && (peek().val === '*' || peek().val === '/')) {
            const op = next().val;
            const right = power();
            left = op === '*' ? left * right : left / right;
        }
        return left;
    };

    const power = () => {
        let base = unary();
        if (peek() && peek().val === '^') {
            next();
            base = Math.pow(base, power());
        }
        return base;
    };

    const unary = () => {
        if (peek() && peek().val === '-') { next(); return -primary(); }
        if (peek() && peek().val === '+') { next(); return primary(); }
        return primary();
    };

    const primary = () => {
        const t = peek();
        if (!t) return 0;

        if (t.type === 'num') { next(); return t.val; }

        if (t.type === 'id') {
            next();
            if (MathEval.CONSTS[t.val] !== undefined) return MathEval.CONSTS[t.val];
            if (MathEval.FUNCS[t.val] && peek()?.val === '(') {
                next();
                const args = [expression()];
                while (peek()?.val === ',') { next(); args.push(expression()); }
                if (peek()?.val === ')') next();
                return MathEval.FUNCS[t.val](...args);
            }
            if (vars[t.val] !== undefined) return vars[t.val];
            return 0;
        }

        if (t.val === '(') {
            next();
            const v = expression();
            if (peek()?.val === ')') next();
            return v;
        }

        next();
        return 0;
    };

    return expression();
}

The security guarantee here is total. This code only ever calls functions from the FUNCS whitelist and reads values from the CONSTS and vars objects. There's no code path that executes arbitrary JavaScript.
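For reference, the whitelist itself is just an object mapping names to Math built-ins. My real table has more entries; this is a minimal sketch of the shape:

```javascript
// Whitelist sketch — every callable the parser can ever reach lives here
const MathEval = {
    CONSTS: {
        pi: Math.PI, PI: Math.PI,
        e: Math.E, E: Math.E,
    },
    FUNCS: {
        sin: Math.sin, cos: Math.cos, tan: Math.tan,
        abs: Math.abs, sqrt: Math.sqrt,
        exp: Math.exp, log: Math.log,
        min: Math.min, max: Math.max,
    },
};
```

Any identifier not in these tables falls through to the variables object and finally to 0 in primary(), so a typo in an AI-generated expression degrades to a flat line rather than an exception.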

Kiro helped me get the recursive structure scaffolded quickly. What I had to work through myself was the right-associativity for exponentiation, which is easy to get wrong. If you write power() as left-recursive like term(), you get left-associativity and 2^3^2 gives the wrong answer. I also had to sort out the ordering of checks in primary() because you need to check CONSTS before checking FUNCS since E as a constant and exp as a function share a namespace, and a bug here gives silent wrong answers with no error thrown.
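The associativity difference is easy to check directly with plain arithmetic:

```javascript
// Exponentiation is conventionally right-associative: 2^3^2 means 2^(3^2)
const rightAssoc = Math.pow(2, Math.pow(3, 2)); // 2^(3^2) = 512
// A left-recursive power() would compute (2^3)^2 instead
const leftAssoc = Math.pow(Math.pow(2, 3), 2);  // (2^3)^2 = 64

console.log(rightAssoc); // 512 — what the recursive power() call produces
console.log(leftAssoc);  // 64  — what a left-associative loop would produce
```

This is why power() recurses into itself on the right-hand side instead of looping the way term() does.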


Handling AI-Generated JSON That Isn't Quite Valid

Here's a real-world problem that doesn't appear in blog posts about AI systems often enough. The model's output is almost right, but not quite, in ways that vary by model version, temperature setting, and prompt phrasing.

The specific failure modes I ran into:

Trailing commas. The model sometimes emits { "a": 1, "b": 2, } which is valid JavaScript but not JSON. JSON.parse throws.

Single quotes. Occasionally { 'title': 'Sine Wave' } instead of double quotes.

Colons corrupted to >. This one is subtle. When the markdown parser applies HTML encoding, : can become > in certain highlight span wrappers, giving you "title">"Sine Wave" instead of "title":"Sine Wave".

Highlight spans inside the JSON. Syntax highlighters wrap parts of code blocks in <span> elements for coloring. The JSON content has spans injected into it before you ever see it.

The repair pipeline addresses these in a specific order that matters:

// Strip syntax highlight spans BEFORE any entity decoding.
let jsonStr = encodedJson.replace(/<[^>]+>/g, '');

// Decode HTML entities
jsonStr = jsonStr
    .replace(/&lt;/g, '<').replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&').replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'");

// Collapse whitespace for non-canvas types
if (type !== 'interactive-canvas') {
    jsonStr = jsonStr.trim().replace(/\n\s*/g, ' ');
}

// Remove trailing commas
jsonStr = jsonStr.replace(/,\s*([}\]])/g, '$1');

// Fix > corruption
jsonStr = jsonStr
    .replace(/"(\w+)">([\d.]+)/g, '"$1":$2')
    .replace(/"(\w+)">"([^"]*?)"/g, '"$1":"$2"');

// If still invalid, try single-to-double quote conversion
try {
    JSON.parse(jsonStr);
} catch {
    const singleQuoteFix = jsonStr.replace(/'/g, '"');
    try {
        JSON.parse(singleQuoteFix);
        jsonStr = singleQuoteFix;
    } catch {
        const combined = singleQuoteFix.replace(/"([^"]+)">\s*/g, '"$1":');
        try {
            JSON.parse(combined);
            jsonStr = combined;
        } catch {
            // All repairs failed, store as-is and surface error at render time
        }
    }
}

The ordering of the first two steps is the non-obvious part that Kiro helped me reason through. If you decode HTML entities before stripping tags, then &lt;span&gt; becomes <span> before the tag stripper runs, which is fine. But &lt;10 becomes <10 which looks like a malformed tag to the stripper and gets incorrectly removed. Strip tags first on the encoded string, then decode the entities on the cleaned result.
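Here's a concrete case of the ordering trap, using a comparison operator inside a JSON string value (sample data, not real model output):

```javascript
const encoded = '{"note": "n &lt; 10 &amp;&amp; n &gt; 2"}';

// WRONG order: decode entities first, then strip tags.
// After decoding, "< 10 && n >" looks like a tag and gets eaten.
let bad = encoded
    .replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&amp;/g, '&');
bad = bad.replace(/<[^>]+>/g, '');
console.log(bad); // {"note": "n  2"} — the condition is destroyed

// RIGHT order: strip tags on the still-encoded string (nothing matches),
// then decode entities on the cleaned result.
let good = encoded.replace(/<[^>]+>/g, '');
good = good
    .replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&amp;/g, '&');
console.log(JSON.parse(good).note); // n < 10 && n > 2
```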


Canvas Rendering: DPI, Coordinates, and Performance

The graph renderer draws to an HTML Canvas element. Canvas has a subtle gotcha that causes blurry output on retina displays if you don't handle it. There are two separate size concepts.

The CSS size is what you set with canvas.style.width and canvas.style.height. This controls how much space the element takes up in the layout.

The buffer size is what you set with canvas.width and canvas.height. This is the actual resolution of the pixel buffer the browser renders into.

On a retina display, window.devicePixelRatio is 2, or even 3 on some phones. If your CSS size is 700px wide but your buffer is also 700px, the browser has to upscale the buffer to fill the CSS space and the result is visibly blurry. The fix is to make the buffer devicePixelRatio times the CSS size and then scale the drawing context by the same factor:

function resizeCanvas() {
    const containerWidth = wrapper.getBoundingClientRect().width - 32;
    const dpr = window.devicePixelRatio || 1;

    canvas.width = containerWidth * dpr;
    canvas.height = (containerWidth * 0.55) * dpr;

    canvas.style.width = containerWidth + 'px';
    canvas.style.height = (containerWidth * 0.55) + 'px';

    ctx.setTransform(dpr, 0, 0, dpr, 0, 0);
}

After this, all your drawing code works in CSS pixel coordinates. ctx.fillRect(0, 0, 100, 100) draws a 100 CSS pixel square that looks sharp on any display density.

The coordinate transform from math space to canvas space is then layered on top:

function toCanvasX(x) {
    return pad.left + ((x - xRange[0]) / (xRange[1] - xRange[0])) * plotWidth;
}

function toCanvasY(y) {
    return pad.top + ((yRange[1] - y) / (yRange[1] - yRange[0])) * plotHeight;
}

The y-inversion is important. In math, y increases upward. In canvas, y increases downward. So yRange[1] - y gives you the correct mapping.
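A quick sanity check of the y mapping with concrete numbers (the padding and plot height here are made-up values, not AimiChat's actual layout constants):

```javascript
const pad = { top: 20 };
const plotHeight = 200;
const yRange = [-1, 1];

function toCanvasY(y) {
    return pad.top + ((yRange[1] - y) / (yRange[1] - yRange[0])) * plotHeight;
}

console.log(toCanvasY(1));  // 20  — math top maps to canvas top
console.log(toCanvasY(0));  // 120 — midline of the plot area
console.log(toCanvasY(-1)); // 220 — math bottom maps to canvas bottom
```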

For plotting the actual curve, I sample 400 points across the x range and handle discontinuities explicitly. Functions like tan(x) have vertical asymptotes where the value jumps from large positive to large negative. Without handling this you'd get a visible vertical line at the asymptote:

const points = [];
for (let i = 0; i <= numPoints; i++) {
    const x = xRange[0] + (i / numPoints) * (xRange[1] - xRange[0]);
    const y = MathEval.evaluate(f.expr, { x, ...paramValues });
    if (isNaN(y) || !isFinite(y) || y < yRange[0] - 50 || y > yRange[1] + 50) {
        points.push(null);
    } else {
        points.push({ cx: toCanvasX(x), cy: toCanvasY(y) });
    }
}

ctx.beginPath();
let penDown = false;
points.forEach(p => {
    if (!p) { penDown = false; return; }
    if (!penDown) { ctx.moveTo(p.cx, p.cy); penDown = true; }
    else ctx.lineTo(p.cx, p.cy);
});
ctx.stroke();

Performance: IntersectionObserver for Animation Loops

The interactive-canvas widget type runs a continuous requestAnimationFrame loop for animated visualizations like oscillating waves or particle simulations. On a page with several of these in a long conversation, running all of them all the time is expensive even when they're off-screen.

The fix is IntersectionObserver, which fires a callback whenever an element enters or leaves the viewport:

let active = false;
let animFrame = null;

function loop() {
    if (!active) return;
    animTime += 0.016;
    draw();
    animFrame = requestAnimationFrame(loop);
}

const observer = new IntersectionObserver(entries => {
    const isVisible = entries[0]?.isIntersecting ?? false;

    if (isVisible && !active) {
        active = true;
        loop();
    } else if (!isVisible && active) {
        active = false;
        if (animFrame) cancelAnimationFrame(animFrame);
    }
}, {
    threshold: 0.1
});

observer.observe(widgetElement);

This means only the widgets currently visible on screen are running their animation loops. With five physics simulations in a long conversation, it's the difference between 50ms/frame with every loop running and 10ms/frame with only the visible ones.


LaTeX Rendering with KaTeX

Many of the widget titles and descriptions contain mathematical notation. I integrated KaTeX rather than MathJax because KaTeX renders synchronously. MathJax is more complete but uses an async API that adds complexity when you're updating the DOM incrementally during streaming.

function renderKaTeX(text) {
    if (!text || typeof katex === 'undefined') return escapeHtml(text);

    let result = escapeHtml(text);

    result = result.replace(/\$\$([\s\S]*?)\$\$/g, (_, tex) => {
        try {
            return katex.renderToString(tex, { displayMode: true, throwOnError: false });
        } catch {
            return escapeHtml(tex);
        }
    });

    result = result.replace(/\$([^\$]+?)\$/g, (_, tex) => {
        try {
            return katex.renderToString(tex, { displayMode: false, throwOnError: false });
        } catch {
            return escapeHtml(tex);
        }
    });

    return result;
}

The throwOnError: false option is important for production. If the AI generates slightly malformed LaTeX, you want graceful degradation to raw text rather than an uncaught exception that kills the widget render.

Display math gets processed first. If you process inline $...$ first, the regex will match the first $ of a $$...$$ block and corrupt it.
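You can see the corruption directly if you run the inline pattern first. The placeholder replacements here are just to make the matches visible:

```javascript
const text = 'Identity: $$a + b$$ done';

// Inline pattern run first: it matches "$a + b$" inside the "$$...$$"
// block, leaving one stray "$" stranded on each side
const inlineFirst = text.replace(/\$([^\$]+?)\$/g, (_, tex) => `[inline:${tex}]`);
console.log(inlineFirst); // Identity: $[inline:a + b]$ done

// Display pattern run first consumes the whole block correctly
const displayFirst = text.replace(/\$\$([\s\S]*?)\$\$/g, (_, tex) => `[display:${tex}]`);
console.log(displayFirst); // Identity: [display:a + b] done
```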


How Kiro Actually Changed My Day-to-Day

I want to be specific here because "AI-powered IDE" covers a lot of ground and most descriptions stay vague.

Adding new widget types is where Kiro made me genuinely faster. Once I had the graph and code-trace renderers working, I had a clear pattern. Kiro could read my existing renderers, understand the contract, and produce a solid first draft of the timeline or flowchart renderer that already matched my conventions. I'd typically spend maybe 30-40% of the time I would have spent writing it from scratch, with the rest going toward refinement rather than structure.

The spec-based context also caught bugs I'd have missed. When I refactored the hash function from random IDs to content-based IDs, Kiro flagged three places where I was still generating IDs the old way. Each would have been a silent bug with no error, just widgets that occasionally failed to render.

Visual quality is still something I had to drive manually. The rendering math, the gradient fills, the glow effects on plotted curves, the tooltip positioning, all of that required iteration with eyes on the actual output. Kiro could implement a line renderer. Whether it looked right was something I had to evaluate myself. That's honestly the right boundary. Code structure is something an AI IDE can carry a lot of. Aesthetic judgment isn't.

The steering documents were honestly underrated. I put in conventions like "always use devicePixelRatio for canvas sizing" and "never set innerHTML directly with user data," and Kiro followed them consistently in generated code, so I stopped seeing the same categories of mistake repeatedly.


What I'd Build Differently

I'd go schema-first. Defining TypeScript interfaces for every widget config before writing any renderer would have saved a lot of pain. Right now the JSON repair pipeline and the renderers' defensive defaults are doing work that a proper schema validator could do more clearly. Something like:

interface GraphConfig {
    title?: string;
    description?: string;
    functions: Array<{
        expr: string;
        label?: string;
        color?: string;
    }>;
    parameters?: Array<{
        name: string;
        label?: string;
        min: number;
        max: number;
        default: number;
        step?: number;
    }>;
    xRange: [number, number];
    yRange: [number, number];
}

I'd also separate the streaming buffer from the display layer. Right now they're more coupled than I'd like. The cleaner approach would be an intermediate representation where you parse markdown into a tree of content nodes and widget placeholder nodes, then render the tree to DOM once at the end rather than rebuilding innerHTML incrementally. More complex upfront, cleaner long-term.

And for the canvas widget type specifically, I'd add server-side static analysis of the generated draw functions before shipping at scale. For now the attack surface is limited, but it's something that should be handled properly before this gets in front of a lot of users.


The Result

When a user asks "show me how frequency and amplitude interact in a sine wave," they get a graph with two sliders and a crosshair tooltip. When they ask "walk me through merge sort," they get a step-by-step code trace with variable state visible at each step. When they ask about quantum mechanics, they get an animated canvas simulation they can pause and scrub.

This is what AI-assisted learning should look like. Not longer text answers. Active answers that respond to interaction.

The full app is live at aimichat.app, still in beta. Every interactive learning widget is available on all plans, and the feature works on any topic the AI decides warrants a visual explanation.

If any part of this architecture is something you're working on, whether that's streaming widget systems, safe expression evaluation, canvas rendering, or the two-phase DOM approach, I'm happy to go deeper in the comments.

