I lost a good idea last year.
Not in some dramatic way. I was in the middle of something else and a thought crossed my mind, a product angle I had been turning over for weeks, finally sharp enough to act on. I told myself I would write it down in a minute. I did not. By evening it was gone. Not fuzzy. Completely gone. Like it never happened.
I tried everything after that. Apple Notes. Notion. Pocket. Obsidian. A folder in my bookmarks bar I called "later" that became a graveyard of 300 unread links. The problem was never capturing. Every app does that. The problem was that nothing ever gave things back. I would save something and it would disappear into the void. Organised, yes. Retrievable, technically. But in practice: gone.
So I built MindStash. An AI-powered system where you drop a thought in under 10 seconds and the AI handles the rest. Categorisation, tagging, urgency detection, reminders, resurfacing, and a conversational agent to find anything in plain English.
Here is how I built it.
The Core Idea: Capture First, Think Later
The insight that drove the whole design was simple. The cost of organising is higher than the cost of capturing, and that is why everything fails.
People do not save things because they do not want to decide where it goes right now. And "later" never comes. So I designed MindStash around one constraint: capturing must be zero-friction. You type, hit save, and you are done. The AI decides everything else.
The 500-character limit was a deliberate product decision, not a technical one. It forces clarity, keeps AI costs predictable, and it turns out almost every real thought fits in 500 chars. If it does not, that is a sign you need to break it into multiple thoughts, which is a feature, not a bug.
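Enforcing that constraint at the API boundary is trivial. Here is a minimal sketch in plain Python (the real service presumably does this via a Pydantic request model with `max_length=500`; the function name is illustrative):

```python
# Guard the capture payload before any AI call is made.
MAX_THOUGHT_LENGTH = 500  # a product constraint, not a technical limit

def validate_thought(content: str) -> str:
    """Reject empty or over-long captures up front."""
    content = content.strip()
    if not content:
        raise ValueError("thought cannot be empty")
    if len(content) > MAX_THOUGHT_LENGTH:
        raise ValueError(
            f"thought is {len(content)} chars; limit is {MAX_THOUGHT_LENGTH}. "
            "Split it into multiple thoughts."
        )
    return content
```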
The Stack
Backend: Python 3.12, FastAPI, SQLAlchemy 2.0, PostgreSQL, Alembic
Frontend: Next.js (App Router), React 19, TypeScript, Tailwind CSS 4, Framer Motion
AI: Anthropic Claude (agent), AI/ML API for categorisation
State: TanStack React Query + custom hooks
Deploy: Vercel (frontend), Railway (backend), Supabase (PostgreSQL)
I want to be honest about stack choices because a lot of "how I built X" posts skip the actual reasoning.
FastAPI over Django/Flask. I wanted async-first, type-safe, auto-generated OpenAPI docs with zero boilerplate. FastAPI with Pydantic gives you that. The developer experience is genuinely better for an API-first service.
Next.js App Router. SSR for the landing page, client components for the dashboard. The split is clean and it deploys free on Vercel.
Supabase over self-managed Postgres. Managed Postgres with a free tier, built-in pgvector extension for future semantic search, and a decent dashboard. No ops overhead for a personal project.
The AI Architecture: Two Models, Not One
This was the most interesting engineering decision I made.
The system has two distinct AI tasks:
- Categorisation — classify an item into one of 12 categories, extract 10 intelligence signals (urgency, intent, priority, time context, etc.), generate tags and a summary
- Conversation — a chat agent that can search, create, update, and delete items via tool calls
I use different models for each:
- Categorisation: AI/ML API (OpenAI-compatible) — cheaper, faster, good enough for structured JSON extraction
- Chat agent: claude-haiku-4-5-20251001 (Anthropic direct) — better reasoning for multi-step agentic tasks, native tool calling
This split saves significant cost. The categorisation endpoint fires on every single save. Haiku on Anthropic would work fine too, but the AI/ML API being OpenAI-compatible meant I could swap models without touching code, which was useful during testing.
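The routing itself can live in one small config table. This is a hypothetical sketch, not the actual MindStash config — the env var names, the categorisation model, and the AI/ML API base URL are illustrative — but it shows why an OpenAI-compatible provider makes model swaps a config change rather than a code change:

```python
import os

# Hypothetical per-task model routing. Swapping the categorisation model
# means changing an env var; the calling code never changes because the
# provider speaks the OpenAI wire format.
MODEL_ROUTES = {
    "categorise": {
        "base_url": os.getenv("AIML_BASE_URL", "https://api.aimlapi.com/v1"),
        "model": os.getenv("CATEGORISE_MODEL", "gpt-4o-mini"),
    },
    "chat": {
        "base_url": "https://api.anthropic.com",
        "model": "claude-haiku-4-5-20251001",
    },
}

def route_for(task: str) -> dict:
    """Pick the provider/model pair for a given AI task."""
    return MODEL_ROUTES[task]
```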
What the AI Extracts
Here is the Pydantic schema for what comes back from the categoriser:
from typing import Literal

from pydantic import BaseModel

class AICategorizationResult(BaseModel):
    category: Literal["read", "watch", "ideas", "tasks", "people",
                      "notes", "goals", "buy", "places", "journal",
                      "learn", "save"]
    tags: list[str]
    summary: str
    confidence: float
    priority: Literal["low", "medium", "high", "critical"]
    time_sensitivity: Literal["none", "flexible", "soon", "urgent", "overdue"]
    intent: str
    action_required: bool
    urgency: Literal["low", "medium", "high"]
    time_context: str | None
    resurface_strategy: Literal["none", "once", "periodic", "deadline"]
    suggested_bucket: str
From "Call Rahul about the project kickoff before Thursday", the AI returns:
category: "tasks"
action_required: true
urgency: "high"
time_sensitivity: "urgent"
time_context: "before Thursday"
resurface_strategy: "deadline"
Ten signals from one sentence. No user input beyond typing the thought.
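Because the model replies with JSON, the closed vocabularies are worth checking before the result is trusted. A minimal sketch of that post-processing step — the field names match the schema above, but the validation logic here is illustrative, not the actual service code:

```python
import json

# Check the model's reply against the closed vocabularies before saving.
CATEGORIES = {"read", "watch", "ideas", "tasks", "people", "notes",
              "goals", "buy", "places", "journal", "learn", "save"}
URGENCY = {"low", "medium", "high"}

def parse_categorisation(raw: str) -> dict:
    """Parse the categoriser's JSON reply and reject out-of-vocabulary values."""
    result = json.loads(raw)
    if result["category"] not in CATEGORIES:
        raise ValueError(f"unknown category: {result['category']}")
    if result["urgency"] not in URGENCY:
        raise ValueError(f"unknown urgency: {result['urgency']}")
    return result

reply = '{"category": "tasks", "urgency": "high", "action_required": true}'
parsed = parse_categorisation(reply)
```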
The 12 categories are fixed and non-configurable. I got pushback on this during early testing. "Why can't I add my own?" My answer: because infinite categories mean the user has to think. The whole product is about not thinking. If I let users create categories, capture friction goes up. The 12 cover every real-world thought type I have encountered.
The Agent: Tool Calling Loop in Python
The chat agent (services/ai/agent.py) runs an asynchronous tool-calling loop and yields Server-Sent Events. Here is the simplified structure:
async def run_agent(
    db: Session,
    user_id: str,
    session_id: str,
    message: str,
) -> AsyncGenerator[dict, None]:
    history = get_chat_history(db, session_id)
    history.append({"role": "user", "content": message})

    while True:
        response = anthropic_client.messages.create(
            model="claude-haiku-4-5-20251001",
            system=AGENT_SYSTEM_PROMPT,
            messages=history,
            tools=tool_registry.get_schemas(),
            max_tokens=2048,
        )

        if response.stop_reason == "end_turn":
            yield {"event": "text_delta", "data": extract_text(response)}
            break

        if response.stop_reason == "tool_use":
            # Pull the tool call out of the response content blocks
            tool_block = next(b for b in response.content if b.type == "tool_use")
            tool_name, tool_input = tool_block.name, tool_block.input

            yield {"event": "tool_start", "data": get_user_message(tool_name)}
            tool_result = await tool_registry.execute(
                tool_name, db, user_id, tool_input
            )
            yield {"event": "tool_result", "data": tool_result}

            # Append assistant response + tool result and continue the loop
            history.append({"role": "assistant", "content": response.content})
            history.append({"role": "user", "content": [{
                "type": "tool_result",
                "tool_use_id": tool_block.id,
                "content": json.dumps(tool_result),
            }]})
The SSE events matter a lot for UX. Users see "Searching your items..." then "Found 4 results" then the final response. The streaming effect makes it feel alive, not like a spinner.
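The dicts yielded by run_agent still have to be framed as SSE before FastAPI can stream them. A minimal sketch of that serialisation, assuming the event name travels inside the JSON payload (which matches how the frontend code in this post parses each `data:` line):

```python
import json

def format_sse(payload: dict) -> str:
    """Frame one agent event as an SSE message: a single `data:` line
    terminated by a blank line, which is what the client splits on."""
    return f"data: {json.dumps(payload)}\n\n"
```

On the FastAPI side, wrapping the generator's output in this and returning it via a StreamingResponse with the `text/event-stream` media type is all the plumbing that is needed.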
Tool Registry Pattern
Tools are registered centrally, not scattered through the codebase:
registry.register(
    name="search_items",
    schema={
        "name": "search_items",
        "description": "Search the user's saved items",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "category": {"type": "string"},
                "limit": {"type": "integer", "default": 10},
            },
        },
    },
    handler=search_items_handler,
)
Currently registered tools: search_items, create_item, update_item, delete_item, mark_complete, get_counts, get_upcoming_notifications, get_digest_preview, generate_daily_briefing.
The agent decides which tools to call based on the user message. "What ideas did I save this week?" triggers search_items. "Mark the React article as done" triggers search_items then mark_complete. Multi-step, no manual orchestration.
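The registry itself is small. Here is an illustrative, synchronous sketch of the pattern — the real one is async and passes db and user_id through to handlers, and the `get_counts` handler below is a stand-in:

```python
# Minimal tool registry: one place to register schemas and handlers,
# one place for the agent loop to look them up.
class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, name: str, schema: dict, handler):
        self._tools[name] = {"schema": schema, "handler": handler}

    def get_schemas(self) -> list[dict]:
        """Schemas in the shape the Anthropic tools parameter expects."""
        return [t["schema"] for t in self._tools.values()]

    def execute(self, name: str, **tool_input):
        return self._tools[name]["handler"](**tool_input)

registry = ToolRegistry()
registry.register(
    name="get_counts",
    schema={"name": "get_counts", "description": "Count saved items",
            "input_schema": {"type": "object", "properties": {}}},
    handler=lambda: {"total": 0},  # stand-in for the real DB query
)
```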
SSE Streaming on the Frontend
The chat endpoint uses Server-Sent Events. The frontend uses native fetch, not axios, because axios does not handle SSE streams cleanly:
const response = await fetch(`${API_URL}/api/chat/`, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${token}`,
  },
  body: JSON.stringify({ session_id: sessionId, message }),
});

const reader = response.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // A chunk can end mid-line, so keep any trailing partial line in a buffer
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop() ?? "";

  for (const line of lines) {
    if (line.startsWith("data: ")) {
      const event = JSON.parse(line.slice(6));
      if (event.event === "text_delta") {
        setCurrentMessage((prev) => prev + event.data);
      }
      if (event.event === "tool_result" && event.data.mutated) {
        queryClient.invalidateQueries({ queryKey: ["items"] });
      }
    }
  }
}
The mutated flag on tool results is the clever bit. When the agent creates or updates an item, it signals the frontend to invalidate the React Query cache. The dashboard updates in real time without any polling.
The Database Migration Chain
Alembic migrations evolved as features shipped:
1. initial tables — users, items
2. 12-category fields — ai metadata, tags, confidence
3. AI intelligence signals — intent, urgency, time_context, resurface_strategy
4. notification fields — next_notification_at, frequency, enabled
5. completion fields — is_completed, completed_at
6. last_surfaced_at — for the Today module resurfacing logic
7. chat + memory tables — ChatSession, ChatMessage, UserMemory
8. google_id — for Google OAuth (hashed_password nullable)
One mistake I made early on: I put everything in an ai_metadata JSONB column. When I needed to query by urgency or filter by action_required, I had to pull the whole JSON blob and filter in Python. Lesson: if you will query on a field, make it a real column. Even if it feels like extra schema up front, you will pay for the shortcut later.
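Promoting a JSONB field to a real column is a small migration. A hypothetical Alembic sketch of the fix (column and table names mirror the schema described above; treat it as an illustration, not the actual migration):

```python
# Hypothetical Alembic migration: lift a queried field out of JSONB.
import sqlalchemy as sa
from alembic import op

def upgrade():
    op.add_column("items", sa.Column("urgency", sa.String(), nullable=True))
    # Backfill from the old JSONB blob, then index the common filter.
    op.execute("UPDATE items SET urgency = ai_metadata->>'urgency'")
    op.create_index("ix_items_urgency", "items", ["urgency"])
```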
What I Would Do Differently
Plan the item schema up front. The migration chain above tells the story. I kept discovering new fields I needed. If I had thought through the full AI signal set on day one, I would have saved four migrations and a couple of hours of head-scratching.
Rate limit earlier. I added rate limiting late using slowapi. In the meantime a single user could hammer the AI endpoint. The 500-char limit helped keep costs low even without limits, but proper rate limiting should be first-class from day one.
The 500-char limit is correct. Do not second-guess it. Every time I tested relaxing it, the UX got worse. Short thoughts get categorised better. The constraint is a feature.
What Is Next
The codebase already has the pgvector extension and embedding schema in place. Once I top up OpenAI API credits, a backfill script runs and semantic search activates across the whole vault. The embeddings are text-embedding-3-small (1536 dimensions), compared via cosine similarity directly in Postgres.
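For reference, pgvector's `<=>` operator computes cosine distance (1 minus cosine similarity), so the query will look roughly like `SELECT id FROM items ORDER BY embedding <=> :query_embedding LIMIT 10`. The maths it performs, in plain Python:

```python
import math

# Cosine similarity between two equal-length vectors; pgvector's <=>
# operator returns 1 - this value as a distance.
def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```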
Other things on the list:
- A browser extension for one-click capture from any tab
- A mobile-first version (currently web only)
- Shared vaults and collaborative capture
Try It
MindStash is live and free to use, no credit card needed.
If you have ever lost a good idea because the moment passed, it is for you.
I am in the comments if you have questions or feedback. Happy to go deeper on any part of the implementation.
Built with FastAPI, Next.js 15, React 19, Tailwind CSS 4, Framer Motion, and Claude. Deployed on Vercel + Railway + Supabase. Live at mindstashhq.space.