DEV Community

"Designing the UI for an AI That Does Not Always Behave the Same Way Twice"

By Dwig Yadav | Frontend Engineer & AI Interaction Designer, Team FireFury
EduRag Project | Stack: React 18, React Router v6, Axios, Hindsight Memory

Nobody talks about this enough: designing for an AI backend is fundamentally different from designing for a regular REST API. When your backend returns a user object, it looks the same every time. When your backend asks Gemini a question, you get back something that is anywhere from one sentence to four paragraphs, sometimes structured, sometimes flowing prose, sometimes starting with an apology.
That unpredictability was my main enemy when building EduRag's frontend. And honestly, solving it taught me more about UI engineering than any project I have worked on before.
What I Was Responsible For
I built the three role-specific dashboards that form the core of EduRag's interface: the student dashboard, the teacher dashboard, and the admin dashboard. Each has a completely different purpose and information density. The student dashboard is the most complex — it has the RAG search interface where students ask questions and get AI-generated answers, a personalized study plan generator, a peer discovery section to find classmates, an anonymous feedback system for teachers, and the Hindsight memory bank view.
Beyond the dashboards, I owned the overall design language — color system, typography, spacing, the animated background component that runs on every page. I also built the AI interaction layer, which is the part of the UI where users actually talk to the system and get responses back.
The Problem That Hit Me in the First Week
I connected the RAG search interface to the real backend endpoint on day three. Until then I had been testing with a hardcoded response string that was always exactly two sentences. The moment I swapped in the real Gemini response, the layout broke.
Some answers were three words. Some were six paragraphs with numbered lists embedded in them. My response card had a fixed height of 200px that looked great in my mock data and looked terrible everywhere else. Short answers left a huge dead zone of empty space. Long answers got clipped without any indication that there was more content.
I spent an afternoon trying to patch the fixed-height card instead of replacing it. That was a mistake. When the underlying model does not have a fixed output contract, your layout cannot have a fixed height constraint either.
Rebuilding the Layout for Unpredictable Content
The response area now uses a card with no fixed height — it grows to fit its content. For very long responses, I set a max-height and overflow scroll, with a subtle gradient at the bottom edge to signal that the content continues. This pattern is borrowed from chat interfaces and it works well because users already understand it from messaging apps.
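The sizing rules above can be sketched as a pair of React inline-style objects. This is illustrative, not the actual EduRag code; the names, the 60vh cap, and the fade color are assumptions.

```javascript
// Response card: no fixed height, a cap for very long answers,
// and an anchor for the bottom fade overlay.
const responseCardStyle = {
  height: 'auto',        // grow to fit short and long answers alike
  maxHeight: '60vh',     // cap very long responses
  overflowY: 'auto',     // scroll inside the card past the cap
  position: 'relative',  // positioning context for the fade
};

// The gradient that signals "content continues below the fold".
const scrollHintStyle = {
  position: 'absolute',
  bottom: 0,
  left: 0,
  right: 0,
  height: '2rem',
  background: 'linear-gradient(transparent, rgba(255,255,255,0.9))',
  pointerEvents: 'none', // let clicks pass through to the content
};
```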
Source citations were the next issue. The RAG endpoint returns source references as chunk identifiers tied to the original PDF. Raw chunk IDs mean nothing to a student — they look like database noise. I built a citation renderer that maps each chunk ID to a human-readable label showing the PDF filename and the approximate page range. The citations appear below each answer as small chips that students can click to understand where the information came from.
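A minimal sketch of that citation mapping, assuming the chunk IDs encode the source document and that an index from ID to filename and page range is built at ingest time (the ID format and index shape here are hypothetical, not EduRag's actual schema):

```javascript
// Hypothetical index built when a PDF is chunked and ingested.
const chunkIndex = {
  'thermo_notes:0041': { file: 'thermo_notes.pdf', pageStart: 12, pageEnd: 14 },
  'thermo_notes:0042': { file: 'thermo_notes.pdf', pageStart: 14, pageEnd: 15 },
};

// Map an opaque chunk ID to a human-readable citation chip label.
function citationLabel(chunkId, index = chunkIndex) {
  const entry = index[chunkId];
  if (!entry) return 'Unknown source'; // never surface raw IDs
  const { file, pageStart, pageEnd } = entry;
  const pages =
    pageStart === pageEnd ? `p. ${pageStart}` : `pp. ${pageStart}-${pageEnd}`;
  return `${file}, ${pages}`;
}
```

The key design choice is the fallback: if a chunk ID is not in the index, the chip degrades to a generic label rather than leaking database noise into the UI.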
Loading states deserved more attention than I initially gave them. Gemini responses take two to four seconds on average. A spinner communicates 'this will take a moment' but does not tell you anything about what is happening. I replaced the spinner with a skeleton loader — a pulsing placeholder in the shape of the response card. It sets a more honest expectation: this is a substantive operation, not a network round-trip. Users in testing said the skeleton felt more 'professional' than the spinner. I think what they actually meant is that it did not feel like the app was broken.
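The decision rule I ended up with can be made explicit as a tiny helper — the 1500 ms threshold and the names here are illustrative, not a hard rule from the codebase:

```javascript
// Pick a loading treatment by expected latency: spinners for quick
// round-trips, skeletons for substantive operations like an AI call.
function loaderFor(expectedMs, thresholdMs = 1500) {
  return expectedMs > thresholdMs ? 'skeleton' : 'spinner';
}
```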
Three Dashboards, Three Very Different Users
The student dashboard needed to feel like a study companion. I wanted students to open it and immediately see something relevant to them — not a blank search box asking them what they want. So the top of the dashboard shows the student's recent searches, then personalized recommendations pulled from their behavior, then the search interface.
The teacher dashboard is much more about information density. Teachers want to see what topics students are asking about, which PDFs have been indexed, and how engaged the class is. I designed it around a scan-first layout — the most important metrics at the top, action items (upload PDF, review feedback) clearly separated from monitoring views.
The admin dashboard is different again. Admins need control, not summaries. The entire top section is user management — adding users, changing roles, deleting accounts. Analytics come second. The Hindsight memory insights panel, where admins can query any student's memory bank and run AI reflections, lives in its own section with a clear visual boundary from the rest of the admin tools.
The Animated Background — Why It Is Not Just a Gimmick
I made a deliberate decision to add an animated gradient background that runs on every page. I know how that sounds — it sounds like something a junior developer does when they run out of real features to add. But the reasoning was specific.
Educational tools tend to split into two camps: clinical and boring (think blackboard interfaces from 2012) or cartoonishly bright and condescending (think edtech apps designed for five-year-olds). Neither was right for a platform used by college students and teachers. I wanted something that felt ambient and calm — present enough that the interface feels alive, slow enough that it stops registering as movement after a few seconds.
The implementation: a CSS keyframe animation on a gradient that cycles over 15 seconds. No JavaScript, no canvas, no performance cost. The gradient is seeded from the brand color palette. It runs on GPU-composited layers so it never affects layout performance. I verified this with Chrome DevTools paint profiler before shipping.
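One way to express that effect, shown here as a CSS string a component could inject once at mount (selector, colors, and drift distances are placeholders). Animating transform on an oversized gradient layer, rather than the gradient itself, is what keeps the work on the compositor:

```javascript
// Ambient background: a fixed, oversized gradient layer drifting
// slowly via a compositor-friendly transform animation.
const ambientBackgroundCss = `
.ambient-bg {
  position: fixed;
  inset: -50%;                /* oversize so edges never show */
  z-index: -1;
  background: linear-gradient(120deg, #1e3a5f, #4b2e83, #1e3a5f);
  animation: ambient-drift 15s ease-in-out infinite alternate;
  will-change: transform;     /* hint: promote to its own layer */
}
@keyframes ambient-drift {
  from { transform: translate3d(-5%, -5%, 0); }
  to   { transform: translate3d(5%, 5%, 0); }
}`;
```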
When Hindsight Changed Everything About the Interface
Before we integrated Hindsight memory, the student dashboard's recommendations section was showing platform-wide trending topics. It was essentially useless — a student studying quantum physics was seeing recommendations about photosynthesis because that happened to be what most other students searched that week. It felt generic and it was generic.
After Hindsight, the recommendations pull from the student's own recalled memory facts. If you have been searching topics in organic chemistry for two weeks, your recommendations are neighboring topics in that space — not whatever is popular on the platform. The change sounds simple. The UX impact was significant. Testers in early access mentioned the recommendations section specifically as something they actually used, which had not happened before.
I also built a 'Pick up where you left off' section on the student dashboard that only appears if the student has prior memory. It shows their last three search topics with a button to continue. That feature took about 40 lines of code. It was the most commented-on UI element in feedback sessions. Memory made that possible.
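The gating logic for that section is roughly this — a sketch assuming Hindsight's recall output arrives newest-first as objects with a topic field, which is my simplification, not the real response shape:

```javascript
// "Pick up where you left off": render only when prior memory exists,
// showing the last three distinct search topics.
function pickUpWhereYouLeftOff(memoryFacts) {
  if (!memoryFacts || memoryFacts.length === 0) return null; // hide section
  const topics = [];
  // Facts assumed newest-first; keep the first occurrence of each topic.
  for (const fact of memoryFacts) {
    if (!topics.includes(fact.topic)) topics.push(fact.topic);
    if (topics.length === 3) break;
  }
  return topics;
}
```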
The Memory Bank UI — Designing for Two Audiences
The Hindsight memory view needed to work for two very different people: students looking at their own history, and admins looking at any student's history. The information is the same but the purpose is completely different.
For students, I show only reflected summaries — natural language descriptions of their learning patterns generated by the reflect endpoint. I do not show raw retained facts, which can look like database entries and feel like surveillance. A student should see 'You have covered topics in thermodynamics and kinetics over the past two weeks' — not a list of every query they made with timestamps.
For admins, I expose the full interface: the ability to query any user's memory bank by topic, run a fresh reflection, and see when memory facts were last updated. This is a diagnostic tool for teachers trying to understand struggling students, and they need the detail.
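The role split boils down to a projection over the same data. A sketch, with field names that are my assumptions about the memory bank's shape:

```javascript
// Project the memory bank down to what each role is allowed to see.
function memoryViewFor(role, bank) {
  if (role === 'admin') {
    return {
      summaries: bank.summaries,
      facts: bank.facts,            // raw retained facts, for diagnosis
      lastUpdated: bank.lastUpdated,
      canReflect: true,             // may trigger a fresh reflection
    };
  }
  // Students see only reflected summaries — no raw facts, no timestamps.
  return { summaries: bank.summaries, canReflect: false };
}
```

Doing the projection in one place, rather than hiding fields conditionally across components, makes the privacy boundary auditable.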
The Bug That Cost Me Half a Day
There was a point where the search interface would occasionally submit the query twice — sending two requests to the RAG endpoint instead of one, generating two overlapping answers in the response area. It was intermittent, which made it worse to debug.
The cause was embarrassingly simple. The search form had an onSubmit handler and the search button inside it had its own onClick handler. Both were firing on a button click. Pressing Enter only triggered onSubmit. Clicking the button triggered both. I had not noticed because I was mostly testing with the keyboard.
The fix was a one-line change — removing onClick from the button and letting the form's onSubmit handle everything. The lesson was not about React event handling, which I knew. The lesson was about testing your UI the way users actually use it. Users click buttons. Test with mouse clicks, not just keyboard.
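Beyond removing the duplicate handler, an in-flight guard is a cheap safety net that makes the submit path idempotent even if two events do fire. A sketch, where sendQuery stands in for the Axios call to the RAG endpoint:

```javascript
// Wrap the real request so that only one query is in flight at a time;
// duplicate submit events are swallowed instead of hitting the backend.
function makeSubmitHandler(sendQuery) {
  let inFlight = false;
  return async function handleSubmit(query) {
    if (inFlight) return null;   // ignore the duplicate event
    inFlight = true;
    try {
      return await sendQuery(query);
    } finally {
      inFlight = false;          // allow the next genuine submit
    }
  };
}
```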
Things I Learned That I Would Pass On
• Design for AI output variability from the first line of layout code. Every AI response card needs fluid height, overflow handling, and a loading skeleton.
• Skeleton loaders over spinners for any operation that takes more than 1.5 seconds. They communicate the nature of the operation, not just its existence.
• When you have memory data available, use it to replace generic content first. Trending topics became personalized recommendations — same UI slot, completely different value.
• Design decisions for memory-surfaced data are product decisions: what to show, to whom, in what form. Get those decisions made explicitly, not by accident.
• Test user-facing features with mouse input, keyboard input, and touch. They are not equivalent interaction paths.
Where I Would Take This Next
The study plan generator on EduRag currently creates a 7-day AI-guided schedule for any topic a student enters. It is useful but it is not personalized — it does not know what the student already understands or where their gaps are. With Hindsight memory injected into the study plan generation prompt, that changes. The plan could start from where the student actually is, not from scratch.
That feature is on the roadmap. It is also a good example of what I mean when I say memory changes the design space of the product. Once you have continuity, features that were not worth building become worth building. The interface can do things it simply could not do before.
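A rough sketch of what injecting memory into the plan generation prompt could look like — the prompt wording and fact shape are my guesses at a roadmap feature, not shipped EduRag code:

```javascript
// Build the study-plan prompt, grounding it in recalled memory facts
// when they exist so the plan starts from the student's actual level.
function buildStudyPlanPrompt(topic, recalledFacts = []) {
  const base = `Create a 7-day study plan for "${topic}".`;
  if (recalledFacts.length === 0) return base; // cold start: generic plan
  const known = recalledFacts.map((f) => `- ${f}`).join('\n');
  return `${base}\nThe student has already covered:\n${known}\n` +
         `Start from their current level and focus on the gaps.`;
}
```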

— Dwig Yadav, Frontend Engineer & AI Interaction Designer, Team FireFury
