Shivangi Gupta

Posted on Jun 6

I taught Hindsight to turn chat into database writes

The hard part was not getting a model to answer a student. The hard part was deciding when a sentence in chat should become a durable change in the system.

That distinction shaped the way I built Student Copilot. I did not want another assistant that could summarize a long thread and then forget the consequence five minutes later. I wanted a workspace where a student could paste an opportunity announcement, forward a team message, write a daily check-in, or ask about overdue work, and the system would keep the operational record in sync.

That sounds simple until you try to ship it. Chat is loose. Databases are not. Users write half-sentences, change their minds, omit dates, paste screenshots, and use names inconsistently. A useful assistant has to handle all of that without turning every ambiguous sentence into a bad write.

The core lesson I took from integrating Hindsight, an agent memory system hosted on GitHub, is that memory should not be treated as a prettier chat log. Memory is part of the execution model. It needs boundaries, evidence, retention rules, and a way to distinguish “remember this fact” from “mutate application state now.”

What Student Copilot does

Student Copilot is a personal workspace for tracking student commitments: opportunities, deadlines, teams, tasks, check-ins, weekly reports, memories, and conversation history. The React client is a single-page app served through a small Express backend. The backend owns the API surface, the local persistence layer, and the model calls.

The app has a few main loops:

Save and update opportunities with deadlines and registration links.
Create task backlogs manually or from pasted text.
Maintain team rosters from forms, messages, or images.
Record daily check-ins and generate weekly summaries.
Let the user ask a memory assistant about what they have committed to.
Store explicit memories separately from raw conversation history.

The structure is intentionally plain. server.ts defines the REST API. src/server/db.ts owns persistence and domain operations. src/server/gemini.ts owns model interaction, fallback parsing, structured extraction, and action execution. src/App.tsx carries the UI state and calls the API.

That split matters because the interesting part of this system is not the chat box. It is the path from natural language to a bounded database operation.

The through-line: chat is input, not state

The first version of any assistant-style app tends to make chat history the center of the universe. The user says something, the model responds, and the transcript becomes the only durable artifact.

That breaks down quickly in a productivity system.

If I say, “I submitted my application and Riya is handling the backend tasks,” I do not want that buried in message history. I want the system to consider at least three possible updates:

The opportunity may need a new status.
The team roster may need a role update.
A memory about Riya’s responsibility may need to be retained.

Those are not the same operation. They have different lifetimes, different query patterns, and different failure modes.

This is where the Hindsight documentation for agent memory influenced the design. Hindsight’s retain, recall, and reflect operations are a useful mental model because they push memory out of “append this transcript” thinking. In this app, I used that idea as an application architecture rule: memory is only useful when it can be recalled in the right context and converted into safer decisions later.

The backend expresses that rule with a small action vocabulary. The model is allowed to propose actions, but the server decides how to execute them.

interface ExtractionResult {
  reply: string;
  extractedActions?: Array<{
    action:
      | 'create_opportunity'
      | 'update_opportunity'
      | 'create_task'
      | 'update_task'
      | 'update_team'
      | 'create_memory'
      | 'daily_checkin';
    data: any;
  }>;
}

This is not elaborate, but it is the important boundary. The model does not get to write arbitrary records. It emits one of a small number of verbs. Each verb maps to code I can inspect, test, and harden.
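One way to enforce that boundary at runtime is an allow-list check before anything reaches the database layer. The sketch below is illustrative only: the action strings come from the interface above, while `ALLOWED_ACTIONS`, `ProposedAction`, `isRecord`, and `toProposedAction` are hypothetical names, not code from the repo.

```typescript
// Hypothetical helper names; only the action strings come from the interface above.
const ALLOWED_ACTIONS = [
  'create_opportunity',
  'update_opportunity',
  'create_task',
  'update_task',
  'update_team',
  'create_memory',
  'daily_checkin',
] as const;

type ActionName = (typeof ALLOWED_ACTIONS)[number];

interface ProposedAction {
  action: ActionName;
  data: Record<string, unknown>;
}

// True for plain objects, false for null, arrays, and primitives.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Returns a typed action, or null when the verb is unknown
// or the payload is not a plain object.
function toProposedAction(raw: unknown): ProposedAction | null {
  if (!isRecord(raw)) return null;
  const { action, data } = raw;
  if (typeof action !== 'string') return null;
  const known = ALLOWED_ACTIONS.filter((candidate) => candidate === action)[0];
  if (known === undefined) return null;
  if (!isRecord(data)) return null;
  return { action: known, data };
}
```

Anything that returns null here can be logged and dropped, so an unexpected verb never reaches a write path.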

Building the memory context

When the user sends a message, the assistant does not answer from the transcript alone. It builds a compact workspace context from the current system state: opportunities, tasks, teams, memories, recent logs, and check-ins.

const contextData = {
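  currentTime: new Date().toISOString(), // current time in UTC
  localDate: '2026-06-05', // literal in this excerpt; a live system would derive the user's local date per request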
  opportunities: opportunities.map(o => ({
    id: o.id,
    title: o.title,
    type: o.type,
    status: o.status,
    deadline: o.deadline,
    notes: o.notes
  })),
  tasks: tasks.map(t => ({
    id: t.id,
    title: t.title,
    dueDate: t.dueDate,
    priority: t.priority,
    status: t.status
  })),
  teams: teams.map(t => ({
    id: t.id,
    name: t.name,
    opportunityId: t.opportunityId,
    members: t.members
  })),
  memories: memories.map(m => ({
    category: m.category,
    title: m.title,
    content: m.content
  }))
};

The important detail is that identifiers are included. If the assistant suggests updating an existing task or opportunity, it can reference a real target instead of guessing from a title. That is a small decision, but it prevents a lot of duplicate records.
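A minimal sketch of how the server could hold the model to that rule: an update is accepted only when the proposed id matches a record that was actually in the context. `TaskSummary`, `TargetResolution`, and `resolveTaskTarget` are hypothetical names used for illustration.

```typescript
// Hypothetical types; the id and title fields mirror the context object above.
interface TaskSummary {
  id: string;
  title: string;
}

type TargetResolution =
  | { kind: 'found'; task: TaskSummary }
  | { kind: 'rejected'; reason: string };

// Resolve the target of an update strictly by id. Titles are never used
// as a fallback, so a near-miss title cannot silently hit the wrong record.
function resolveTaskTarget(tasks: TaskSummary[], proposedId: unknown): TargetResolution {
  if (typeof proposedId !== 'string' || proposedId.length === 0) {
    return { kind: 'rejected', reason: 'update requires an id from the workspace context' };
  }
  for (const task of tasks) {
    if (task.id === proposedId) return { kind: 'found', task };
  }
  return { kind: 'rejected', reason: `no task with id ${proposedId}` };
}
```

A rejected update can be surfaced to the user as a clarifying question instead of being turned into a new record.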

This is also where Vectorize’s explanation of agent memory maps well to the app: the memory layer is not just about retrieval. It is about preserving enough state for the agent to reason across time without asking the user to restate everything.

In Student Copilot, a useful memory might be “I prefer backend tasks in the evening,” “Ananya usually owns UI work,” or “this fellowship requires a transcript before submission.” Those facts are not all tasks. They are context that should shape future task creation, team suggestions, and reminders.

Structured output made the system debuggable

The model call asks for JSON with a reply and optional extracted actions. The reply is what the user sees. The actions are what the server may execute.

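// Type is the schema enum exported by the @google/genai SDK; generateWithRetry is the app's own retry helper.
const response = await generateWithRetry(ai, {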
  model: 'gemini-3.5-flash',
  contents: userMessage,
  config: {
    systemInstruction,
    responseMimeType: 'application/json',
    responseSchema: {
      type: Type.OBJECT,
      properties: {
        reply: { type: Type.STRING },
        extractedActions: {
          type: Type.ARRAY,
          items: {
            type: Type.OBJECT,
            properties: {
              action: { type: Type.STRING },
              data: { type: Type.OBJECT }
            },
            required: ['action', 'data']
          }
        }
      },
      required: ['reply']
    }
  }
});

This is the difference between an “AI feature” and maintainable software. I can log the action list. I can reject malformed actions. I can add validation per action. I can replay a user message and compare the proposed mutations. I can make the UI optimistic or conservative depending on the operation.

I also learned that schema-constrained output does not remove the need for defensive programming. It narrows the failure space. That is still valuable. A malformed JSON response is easier to handle than a persuasive paragraph that vaguely says it updated something.
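As a rough illustration of that defensive layer, the sketch below parses the raw model text, falls back to a safe reply when the JSON is unusable, and counts the actions it had to drop. `ParsedReply`, `FALLBACK_REPLY`, and `parseModelResponse` are hypothetical names; the `reply` and `extractedActions` fields follow the schema shown above.

```typescript
// Hypothetical result type: the usable actions plus a count of discarded ones.
interface ParsedReply {
  reply: string;
  actions: Array<{ action: string; data: Record<string, unknown> }>;
  droppedCount: number;
}

const FALLBACK_REPLY = 'Sorry, I could not process that message. Nothing was changed.';

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Never throws: malformed output yields a safe reply and zero actions.
function parseModelResponse(rawText: string): ParsedReply {
  const failure: ParsedReply = { reply: FALLBACK_REPLY, actions: [], droppedCount: 0 };

  let decoded: unknown;
  try {
    decoded = JSON.parse(rawText);
  } catch {
    return failure;
  }
  if (!isPlainObject(decoded)) return failure;

  const reply = decoded.reply;
  if (typeof reply !== 'string') return failure;

  const rawActions: unknown[] = Array.isArray(decoded.extractedActions)
    ? decoded.extractedActions
    : [];

  const actions: ParsedReply['actions'] = [];
  let droppedCount = 0;
  for (const item of rawActions) {
    if (isPlainObject(item)) {
      const action = item.action;
      const data = item.data;
      if (typeof action === 'string' && isPlainObject(data)) {
        actions.push({ action, data });
        continue;
      }
    }
    droppedCount += 1; // keep a count so malformed actions show up in logs
  }
  return { reply, actions, droppedCount };
}
```

Logging `droppedCount` alongside the user message gives a simple signal for how often the model drifts from the schema.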

The execution boundary

Once the response is parsed, the server saves the conversation and then executes extracted actions. The implementation is intentionally boring.

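// `action` and `data` come from one entry of extractedActions.
switch (action) {
  case 'create_task': {
    // Missing fields fall back to fixed defaults instead of failing the write.
    db.createTask(userId, {
      title: data.title || 'AI Task',
      description: data.description || 'Auto-created via chat message details.',
      dueDate: data.dueDate || '2026-06-15', // placeholder due date when the model omits one
      priority: data.priority || 'Medium',
      status: data.status || 'Pending',
      opportunityId: data.opportunityId // optional link to an existing opportunity
    });
    break;
  }
  case 'create_memory': {
    // Memories are stored separately from tasks and from raw conversation history.
    db.createMemory(userId, {
      title: data.title || 'Factual Note',
      content: data.content || '',
      category: data.category || 'goal'
    });
    break;
  }
}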

There are no hidden side effects here. A model-proposed create_task becomes exactly one task creation. A model-proposed create_memory becomes exactly one memory. If I want stricter validation, deduplication, approval prompts, or audit trails, this is the seam where those controls belong.
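To make the "stricter validation" option concrete, here is one possible shape for a `create_task` validator that reports errors instead of filling in defaults. It is a sketch under assumptions: the priority values other than 'Medium' are guesses, and `validateCreateTask`, `isIsoDate`, and the result types are hypothetical names.

```typescript
// Assumed priority values: only 'Medium' appears in the snippet above.
const PRIORITIES = ['Low', 'Medium', 'High'] as const;
type Priority = (typeof PRIORITIES)[number];

interface ValidTask {
  title: string;
  dueDate: string | null; // null means "no date given", not a guessed date
  priority: Priority;
}

type TaskValidation =
  | { kind: 'valid'; task: ValidTask }
  | { kind: 'invalid'; errors: string[] };

// True only for real calendar dates written as YYYY-MM-DD.
function isIsoDate(value: string): boolean {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(value)) return false;
  const time = Date.parse(value + 'T00:00:00Z');
  // Round-trip so impossible dates such as 2026-02-30 are rejected.
  return !isNaN(time) && new Date(time).toISOString().slice(0, 10) === value;
}

function toPriority(value: unknown): Priority | null {
  for (const candidate of PRIORITIES) {
    if (candidate === value) return candidate;
  }
  return null;
}

function validateCreateTask(data: Record<string, unknown>): TaskValidation {
  const errors: string[] = [];

  const rawTitle = data.title;
  const title = typeof rawTitle === 'string' ? rawTitle.trim() : '';
  if (title.length === 0) errors.push('title is required');

  let dueDate: string | null = null;
  const rawDue = data.dueDate;
  if (rawDue !== undefined && rawDue !== null) {
    if (typeof rawDue === 'string' && isIsoDate(rawDue)) dueDate = rawDue;
    else errors.push('dueDate must be YYYY-MM-DD');
  }

  let priority: Priority = 'Medium'; // same default as the snippet above
  const rawPriority = data.priority;
  if (rawPriority !== undefined && rawPriority !== null) {
    const matched = toPriority(rawPriority);
    if (matched === null) errors.push('priority must be Low, Medium, or High');
    else priority = matched;
  }

  if (errors.length > 0) return { kind: 'invalid', errors };
  return { kind: 'valid', task: { title, dueDate, priority } };
}
```

The trade-off is deliberate: an invalid result can become an approval prompt or a clarifying question, whereas a silent default becomes a record the user has to find and fix later.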

I would not ship this kind of system with the model directly calling a generic database API. The action vocabulary is the contract. It is also the part that makes Hindsight useful: memory can inform the proposed action, but application code still owns authority.

Production systems need boring fallbacks

The least glamorous part of the repo is one of the most important: the assistant still works when the model path is unavailable.

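import { GoogleGenAI } from '@google/genai';

// Lazily created client, reused across requests.
let aiClient: GoogleGenAI | null = null;

export function getGeminiClient(): GoogleGenAI | null {
  if (!aiClient) {
    const key = process.env.GEMINI_API_KEY;
    if (!key) {
      // No key configured: returning null tells callers to use the fallback path.
      console.log('GEMINI_API_KEY environment variable is not defined. Using mock AI response fallback.');
      return null;
    }
    aiClient = new GoogleGenAI({ apiKey: key });
  }
  return aiClient;
}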

The fallback path answers common questions about deadlines, teams, and pending tasks from the database. It can also create simple records from recognizable phrases. That is not as flexible as the model path, but it keeps the product honest: the workspace is still useful because the core state is first-class.
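A deterministic fallback of this kind can be very small. The sketch below is not the repo's implementation; it assumes a hypothetical `fallbackReply` function, simple keyword matching, and ISO `YYYY-MM-DD` due dates so that string comparison orders dates correctly.

```typescript
// Hypothetical task shape; 'Pending' is the status value used in the snippets above.
interface FallbackTask {
  title: string;
  dueDate: string; // assumed ISO format, YYYY-MM-DD
  status: string;
}

// Answers a few recognizable questions straight from stored records.
// `today` is passed in so the function stays deterministic and testable.
function fallbackReply(message: string, tasks: FallbackTask[], today: string): string {
  const text = message.toLowerCase();
  const pending = tasks.filter((t) => t.status === 'Pending');

  if (text.indexOf('overdue') !== -1) {
    // ISO dates sort lexicographically, so plain string comparison works.
    const overdue = pending.filter((t) => t.dueDate < today);
    if (overdue.length === 0) return 'Nothing is overdue.';
    return 'Overdue: ' + overdue.map((t) => `${t.title} (due ${t.dueDate})`).join(', ');
  }

  if (text.indexOf('pending') !== -1 || text.indexOf('task') !== -1) {
    if (pending.length === 0) return 'You have no pending tasks.';
    return 'Pending: ' + pending.map((t) => t.title).join(', ');
  }

  return 'The assistant is offline, but your tasks, deadlines, and teams are still available in their tabs.';
}
```

Because the answers come from the same records the forms and REST endpoints use, the fallback cannot disagree with what the tabs show.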

I prefer this shape to an app where every button is secretly a prompt. Forms, REST endpoints, and deterministic queries remain the foundation. The assistant adds a faster input method and a memory-aware retrieval layer.

A concrete interaction

Here is the kind of flow the system is designed to support.

I paste this into chat:

Registered for the summer research fellowship. Deadline is June 30. Ananya will review the essays, and I need to prepare my transcript this week.

The assistant should not only respond with encouragement. It should propose concrete actions:

Create an opportunity with a June 30 deadline.
Create a task to prepare the transcript.
Save a memory that Ananya is reviewing essays.

The visible answer might say:

I added the fellowship to your opportunity tracker, created a transcript task for this week, and saved Ananya’s review role so I can remember it later.
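For that message, the structured response behind the visible answer might look roughly like the object below. This is an illustration, not captured output: the year in the deadline is assumed, the field names inside `data` follow the context and execution snippets earlier in the post, and `ExampleExtraction` is a hypothetical local copy of the result shape with `data` typed as a plain record.

```typescript
// Local copy of the result shape, limited to the three verbs used here.
interface ExampleExtraction {
  reply: string;
  extractedActions?: Array<{
    action: 'create_opportunity' | 'create_task' | 'create_memory';
    data: Record<string, unknown>;
  }>;
}

const exampleExtraction: ExampleExtraction = {
  reply:
    "I added the fellowship to your opportunity tracker, created a transcript task for this week, " +
    "and saved Ananya's review role so I can remember it later.",
  extractedActions: [
    {
      action: 'create_opportunity',
      // The year is an assumption; the chat message only says "June 30".
      data: { title: 'Summer research fellowship', deadline: '2026-06-30' },
    },
    {
      action: 'create_task',
      // No dueDate: "this week" is left for the server or the user to resolve.
      data: { title: 'Prepare transcript' },
    },
    {
      action: 'create_memory',
      data: { title: 'Essay reviewer', content: 'Ananya will review the fellowship essays.' },
    },
  ],
};

// One line per proposed mutation, suitable for logs or an approval prompt.
const proposedVerbs = (exampleExtraction.extractedActions || []).map((entry) => entry.action);
```

Each entry maps to exactly one bounded server operation, which is what makes the visible reply checkable against the records it claims to have created.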

The next day, if I ask, “What am I blocked on for the fellowship?”, the system should have enough retained state to answer from actual records rather than vibes. It can inspect the opportunity, related tasks, saved memories, and recent check-ins.

That is the product value of Hindsight in this architecture. It helps turn accumulated context into better future behavior without making the transcript the database.

What I learned

The first reusable lesson is that memory should have types. A chat message, a task, a preference, and a durable fact are not interchangeable. Storing them separately makes the assistant more useful and the system easier to debug.

The second lesson is that agents need verbs, not permissions. “You may update the database” is too broad. “You may propose create_task, update_task, and create_memory actions that this server validates” is something I can reason about.

The third lesson is that Hindsight works best when it is part of the application’s control flow. I do not want memory bolted on after response generation. I want recall to shape the context before a decision, retention to happen after meaningful events, and reflection to improve what gets carried forward.

The fourth lesson is that fallback behavior is product behavior. If the model is down or the key is missing, users should still be able to see deadlines, tasks, teams, and memories. The assistant can degrade. The workspace should not disappear.

The fifth lesson is that the UI should expose state, not magic. Student Copilot has tabs for opportunities, tasks, teams, check-ins, memories, reports, and profile data because users need to inspect and correct what the assistant inferred. A memory system without correction paths eventually becomes a liability.

The part I would keep

If I rebuilt this from scratch, I would keep the same core idea: chat is an input surface, not the source of truth.

Hindsight makes that more practical because it gives the agent a way to learn from prior interactions without stuffing the entire past into every prompt. But the application still needs strong boundaries. The database owns records. The server owns mutations. The assistant proposes typed actions. The memory layer improves context over time.

That architecture is less flashy than a fully autonomous assistant, but it is the one I trust. It lets the system get more useful as it learns while keeping the important question visible in code: what exactly is this message allowed to change?