DEV Community

milton rojas

I built a personal AI assistant that actually remembers things (Claude + Telegram + SQLite)

Most AI assistants have a fundamental flaw: every conversation starts cold. ChatGPT doesn't know you told it yesterday that you prefer bullet points. Claude forgets you're working on a Node project. The context you built up over weeks evaporates the moment you close the tab.

I wanted something different — a personal assistant that lives in Telegram, runs on my own machine, and actually retains state between sessions. Here's how I built it and why the architecture works.


The shape of the problem

Before writing any code, it's worth being precise about what "remembering things" actually means for a chat assistant. There are two distinct memory types:

Short-term memory — the conversation context Claude needs to give coherent replies. "What did the user just ask?" If you don't pass prior messages back to the API, the model is stateless by definition.

Long-term memory — facts that should persist across sessions. "The user prefers concise answers." "They're building a Node.js project." These shouldn't vanish when the process restarts.

Most tutorials handle neither. They show you how to get a single Claude response and call it a day. This post covers both.


Stack

  • Node.js — runtime
  • node-telegram-bot-api — Telegram interface
  • @anthropic-ai/sdk — Claude Haiku (fast, ~$0.001/message)
  • better-sqlite3 — persistent memory (single .db file, no server)
  • dotenv — config

Total dependencies: 4. The whole thing is one file.


Database schema — two tables, one file

The memory architecture is intentionally simple:

CREATE TABLE IF NOT EXISTS messages (
  id      INTEGER PRIMARY KEY AUTOINCREMENT,
  role    TEXT NOT NULL,        -- 'user' or 'assistant'
  content TEXT NOT NULL,
  created INTEGER NOT NULL DEFAULT (unixepoch())
);

CREATE TABLE IF NOT EXISTS notes (
  id      INTEGER PRIMARY KEY AUTOINCREMENT,
  content TEXT NOT NULL,        -- free-form personal fact
  created INTEGER NOT NULL DEFAULT (unixepoch())
);

messages handles short-term memory — the rolling conversation window. notes handles long-term memory — facts the user explicitly stores with /remember.

They're separate for a reason. Conversation history has a natural expiry: old exchanges stop being relevant after N turns. Personal notes don't expire — "I prefer Celsius" is true forever. Mixing them in one table would mean either polluting your context with stale messages or accidentally pruning permanent facts.
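The askClaude function further down calls getNotes() without showing it. A minimal sketch of the two notes helpers might look like this, assuming db is a better-sqlite3 Database opened on the schema above. The names makeNoteStore and saveNote are hypothetical, picked for illustration rather than taken from the original code.

```javascript
// Sketch only: `db` is assumed to be a better-sqlite3 Database with the
// `notes` table already created. makeNoteStore is a hypothetical name.
function makeNoteStore(db) {
  // Prepare each statement once and reuse it on every call.
  const insertNote = db.prepare('INSERT INTO notes (content) VALUES (?)');
  const selectNotes = db.prepare('SELECT id, content FROM notes ORDER BY id ASC');

  return {
    // Store one free-form fact. Returns false when the input is empty.
    saveNote(content) {
      const trimmed = String(content ?? '').trim();
      if (!trimmed) return false;
      insertNote.run(trimmed);
      return true;
    },
    // All notes, oldest first, ready to inject into the system prompt.
    getNotes() {
      return selectNotes.all();
    },
  };
}
```

Neither method relies on `this`, so `const { saveNote, getNotes } = makeNoteStore(db);` works if you prefer free functions.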


The rolling window — how short-term memory stays bounded

Every Claude API call costs tokens. Passing 500 messages of history is both expensive and counterproductive (the model's attention dilutes over very long contexts). The fix is a rolling window:

function saveMessage(role, content) {
  db.prepare(`INSERT INTO messages (role, content) VALUES (?, ?)`).run(role, content);

  // Prune: keep only the latest MAX_HISTORY * 2 rows
  db.prepare(`
    DELETE FROM messages WHERE id NOT IN (
      SELECT id FROM messages ORDER BY id DESC LIMIT ?
    )
  `).run(MAX_HISTORY * 2);
}

function getHistory() {
  return db
    .prepare(`SELECT role, content FROM messages ORDER BY id ASC`)
    .all();
}

After every save, a DELETE prunes everything outside the window. The default is 10 exchanges (20 rows — user + assistant pairs). This is configurable via MEMORY_WINDOW in .env.

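Why MAX_HISTORY * 2? The window is measured in exchanges, and each exchange is two rows: one user message and one assistant reply. Claude's Messages API is built around alternating user/assistant turns, so the stored history should contain whole pairs that start on a user turn. Pruning to an even row count preserves that, provided each exchange is saved as a pair and the history is read after both rows are written.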
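The pairing rule can also be enforced at read time instead of relying on write order. The sketch below is a pure function over rows shaped like the messages table ({ role, content }). trimToExchanges is a hypothetical helper, not part of the original code. It keeps the newest maxExchanges * 2 rows and drops any leading assistant row, so the list always starts on a user turn.

```javascript
// Hypothetical helper: normalize history right before it is sent to the API.
function trimToExchanges(rows, maxExchanges) {
  if (maxExchanges <= 0) return [];
  // Copy the newest rows; the input array is left untouched.
  const recent = rows.slice(-maxExchanges * 2);
  // If the cut landed mid-exchange, discard the orphaned assistant reply.
  while (recent.length > 0 && recent[0].role !== 'user') {
    recent.shift();
  }
  return recent;
}
```

Calling it as `trimToExchanges(getHistory(), MAX_HISTORY)` would make the alternation explicit even if a crash left a half-saved exchange in the table.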


Long-term notes — injecting facts into the system prompt

This is the key part. Notes aren't passed as messages — they're injected into the system prompt. This matters because:

  1. System prompt content is read before any conversation turn
  2. It doesn't count against the rolling history window (it still costs input tokens on every call)
  3. The model treats it as foundational context, not a thing the user once said

async function askClaude(userMessage) {
  const notes = getNotes();
  let systemText = SYSTEM_PROMPT;

  if (notes.length > 0) {
    const noteList = notes.map((n) => `- ${n.content}`).join('\n');
    systemText += `\n\nPersonal notes from the user (always honor these):\n${noteList}`;
  }

  const history = getHistory();
  const messages = [
    ...history,
    { role: 'user', content: userMessage },
  ];

  const response = await anthropic.messages.create({
    model: CLAUDE_MODEL,
    max_tokens: 1024,
    system: systemText,
    messages,
  });

  return response.content[0].text;
}

When you run /remember I work in Central Time, that note goes into the notes table. Every subsequent API call re-reads all notes and appends them to the system prompt. The model now knows your timezone permanently — across restarts, across cleared histories — because it's always in the system prompt.


The full message handler

The Telegram side is straightforward once the database helpers are in place:

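bot.on('message', async (msg) => {
  if (!isAllowed(msg)) return;   // single allowed chat ID; private by default
  const text = msg.text;
  if (!text || text.startsWith('/')) return;  // commands handled separately

  bot.sendChatAction(msg.chat.id, 'typing');  // show typing indicator

  try {
    // askClaude() appends `text` to the stored history itself, so save the
    // exchange only after the call; saving first would send the message twice.
    const reply = await askClaude(text);
    saveMessage('user', text);
    saveMessage('assistant', reply);
    await bot.sendMessage(msg.chat.id, reply);
  } catch (err) {
    // A failed API call leaves no orphaned user row in the history.
    console.error('Claude request failed:', err);
    await bot.sendMessage(msg.chat.id, 'Something went wrong, please try again.');
  }
});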

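The auth guard (isAllowed) checks msg.chat.id against TELEGRAM_ALLOWED_CHAT_ID from .env. Telegram bots are publicly reachable, so this prevents the bot from responding to anyone who stumbles on its username and would otherwise spend your API credits. It does not protect you if the bot token itself leaks; in that case, revoke the token through @BotFather.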
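The guard itself isn't shown above, so here is one way it could be written. makeIsAllowed is a hypothetical factory name. The sketch assumes the allowed ID arrives as a string from .env while Telegram delivers chat.id as a number, so both sides are compared as strings.

```javascript
// Sketch: build the guard from the configured chat ID.
function makeIsAllowed(allowedChatId) {
  const allowed = String(allowedChatId ?? '').trim();
  return function isAllowed(msg) {
    // Fail closed: with no configured ID, nobody is allowed.
    if (!allowed || !msg || !msg.chat) return false;
    // Telegram sends chat.id as a number; .env values are strings.
    return String(msg.chat.id) === allowed;
  };
}

const isAllowed = makeIsAllowed(process.env.TELEGRAM_ALLOWED_CHAT_ID);
```

Failing closed matters here: if the variable is missing, the bot ignores everyone rather than answering everyone.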


Built-in commands

Three commands ship out of the box:

/remember <text>  — store a permanent note
/forget           — clear conversation history (notes survive)
/status           — uptime, message count, memory stats

/forget clears messages but leaves notes untouched. This is intentional — starting a fresh conversation shouldn't erase the facts you've stored. You'd need a separate /clearnotes command if you wanted that.
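As a rough sketch of how /remember and /forget might be wired up: the code below assumes bot is a node-telegram-bot-api instance (using its onText(regexp, callback) method) and db is a better-sqlite3 Database. registerCommands and the reply strings are hypothetical; the original handlers may differ.

```javascript
// Sketch only: register the two memory commands on an existing bot.
function registerCommands(bot, db, isAllowed) {
  // /remember <text>: store a permanent note.
  bot.onText(/^\/remember\s+([\s\S]+)/, (msg, match) => {
    if (!isAllowed(msg)) return;
    db.prepare('INSERT INTO notes (content) VALUES (?)').run(match[1].trim());
    bot.sendMessage(msg.chat.id, 'Noted.');
  });

  // /forget: wipe the conversation window and leave notes alone.
  bot.onText(/^\/forget$/, (msg) => {
    if (!isAllowed(msg)) return;
    db.prepare('DELETE FROM messages').run();
    bot.sendMessage(msg.chat.id, 'Conversation history cleared. Notes kept.');
  });
}
```

Because /forget only touches the messages table, the notes survive by construction rather than by special-casing.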

/status is handy for debugging:

Assistant Status

Uptime: 2h 14m
Model: claude-haiku-4-5-20251001
Messages this session: 8 in context
Total messages ever: 312
Stored notes: 4
Memory window: 10 exchanges
DB: memory.db

Running cost

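The numbers here assume Haiku pricing of $0.80 per million input tokens and $4.00 per million output tokens; rates differ between Haiku versions, so check Anthropic's pricing page for the model you configure. A typical message exchange (500 tokens in, 200 out) costs about $0.0012. Input tokens include the system prompt, the injected notes, and the replayed history window, so a full window pushes the input side higher than that. Run it 30 times a day and you're at $0.036/day. A $5 Anthropic credit lasts roughly 4-5 months of normal personal use.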

Telegram bots are free. SQLite is free. The only recurring cost is the Anthropic API, and at this usage level it's effectively rounding error.
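The arithmetic behind those figures, as a small sketch you can rerun with your own numbers. The rates are the ones quoted in this post and should be treated as assumptions. exchangeCost and daysOfCredit are hypothetical helper names.

```javascript
// Per-million-token prices as quoted above; substitute current pricing.
const RATES = { inputPerMTok: 0.80, outputPerMTok: 4.00 };

// Cost in USD of one request/response pair.
function exchangeCost(inputTokens, outputTokens, rates = RATES) {
  return (inputTokens * rates.inputPerMTok + outputTokens * rates.outputPerMTok) / 1e6;
}

// How many days a prepaid credit lasts at a steady daily volume.
function daysOfCredit(creditUsd, exchangesPerDay, costPerExchange) {
  return creditUsd / (exchangesPerDay * costPerExchange);
}

const perExchange = exchangeCost(500, 200);
const perDay = perExchange * 30;
const days = daysOfCredit(5, 30, perExchange);
console.log(perExchange.toFixed(4), perDay.toFixed(3), Math.round(days));
// prints: 0.0012 0.036 139
```

139 days is about four and a half months. Raising the input count to reflect a full history window is the quickest way to see how the estimate moves.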


Why SQLite and not a vector DB

Every AI memory tutorial wants to reach for Pinecone, Weaviate, or at minimum Redis. For a personal assistant handling your notes, this is massive overkill.

You will not have 100,000 notes. You might have 50. A SELECT * FROM notes returning 50 rows and injecting them into a system prompt is instant, free, and never breaks. Semantic search across 50 items is not a problem that needs solving.

SQLite also gives you durability for free. better-sqlite3 writes are synchronous — there's no async error surface, no connection pool to manage. The file is just there when you restart.

Scale the vector approach when you have a real semantic retrieval problem (thousands of documents, fuzzy lookup, cross-user search). For one person's notes: a SQLite table is the right tool.


Extending it

The architecture makes extensions clean because each new command is independent. Some examples of what drops in naturally:

Daily briefings via cron:

const cron = require('node-cron');  // extra dependency beyond the core four: npm install node-cron

cron.schedule('0 8 * * *', async () => {
  const briefing = await askClaude(
    'Give me a short morning briefing. Check my notes for context.'
  );
  bot.sendMessage(ALLOWED_CHAT_ID, `Good morning!\n\n${briefing}`);
});

Reminders stored in SQLite:

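bot.onText(/^\/remind (.+) in (\d+)(m|h)$/i, (msg, match) => {
  if (!isAllowed(msg)) return;
  // The /i flag lets "H" and "M" match too, so normalize the unit before comparing.
  const unitMs = match[3].toLowerCase() === 'h' ? 3600000 : 60000;
  const ms = Number(match[2]) * unitMs;
  // Assumes a reminders table with message (TEXT) and fire_at (INTEGER, ms) columns.
  db.prepare('INSERT INTO reminders (message, fire_at) VALUES (?, ?)')
    .run(match[1], Date.now() + ms);
  bot.sendMessage(msg.chat.id, `Reminder set.`);
});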
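Storing a reminder is only half the job; something still has to deliver it. Below is a minimal sketch of that second half. It assumes a reminders table with id, message, and fire_at columns (the schema shown is my guess, matching the INSERT above) and injects db and bot so the function is easy to test. fireDueReminders is a hypothetical name.

```javascript
// Assumed schema (not shown in the post). fire_at is a millisecond
// timestamp, matching the Date.now() arithmetic in the /remind handler.
const REMINDERS_SCHEMA = `
  CREATE TABLE IF NOT EXISTS reminders (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    message TEXT NOT NULL,
    fire_at INTEGER NOT NULL
  )`;

// Deliver every reminder that is due, then delete it so it fires once.
// Returns the number of reminders delivered.
async function fireDueReminders(db, bot, chatId, now = Date.now()) {
  const due = db
    .prepare('SELECT id, message FROM reminders WHERE fire_at <= ? ORDER BY fire_at ASC')
    .all(now);
  for (const row of due) {
    await bot.sendMessage(chatId, `Reminder: ${row.message}`);
    db.prepare('DELETE FROM reminders WHERE id = ?').run(row.id);
  }
  return due.length;
}
```

With better-sqlite3 you would run `db.exec(REMINDERS_SCHEMA)` once at startup and then poll, for example `setInterval(() => fireDueReminders(db, bot, ALLOWED_CHAT_ID).catch(console.error), 30000)`. Because reminders live in SQLite rather than in a setTimeout, they survive restarts.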

Vision — analyze photos you send:

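bot.on('photo', async (msg) => {
  if (!isAllowed(msg)) return;
  const fileId = msg.photo[msg.photo.length - 1].file_id;  // last entry is the largest size
  const fileLink = await bot.getFileLink(fileId);
  const data = Buffer.from(await (await fetch(fileLink)).arrayBuffer()).toString('base64');  // global fetch needs Node 18+
  const response = await anthropic.messages.create({  // claude-haiku handles images natively, no separate vision model needed
    model: CLAUDE_MODEL, max_tokens: 1024,
    messages: [{ role: 'user', content: [
      { type: 'image', source: { type: 'base64', media_type: 'image/jpeg', data } },  // Telegram photos are JPEG
      { type: 'text', text: msg.caption || 'Describe this image.' },  // fallback prompt is a placeholder
    ] }],
  });
  await bot.sendMessage(msg.chat.id, response.content[0].text);
});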

Each of these is independent. Adding one doesn't touch the others.


What this architecture can't do

Being honest about limitations:

  • No semantic search: notes are injected verbatim. If you have 200 notes, you'd want to filter by relevance before injecting. At 20-30 notes this isn't a problem.
  • Single user only: the ALLOWED_CHAT_ID guard is intentional. Multi-user requires per-user memory tables and more careful auth.
  • No streaming: the handler waits for Claude's complete reply before calling bot.sendMessage. For long responses, a streaming approach would show text as it generates. Doable, just not in this minimal version.
  • Local persistence only: memory.db lives wherever you run the process. If you move machines, bring the file. No automatic cloud sync.
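If the notes table does outgrow verbatim injection, a cheap first step short of a vector database is plain keyword overlap. To be clear, this is a naive stand-in and not semantic search: it only matches shared words. selectRelevantNotes and tokenize are hypothetical helpers, and the notes are assumed to be rows with a content field, as in the notes table.

```javascript
// Split text into a set of lowercase words of three or more characters.
function tokenize(text) {
  return new Set(String(text).toLowerCase().match(/[a-z0-9]{3,}/g) || []);
}

// Rank notes by how many words they share with the user's message and keep
// the top `limit`. Notes with zero overlap are dropped.
function selectRelevantNotes(notes, userMessage, limit = 20) {
  const queryWords = tokenize(userMessage);
  return notes
    .map((note) => {
      let score = 0;
      for (const word of tokenize(note.content)) {
        if (queryWords.has(word)) score += 1;
      }
      return { note, score };
    })
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.note);
}
```

The trade-off is that standing preferences such as "be concise" rarely share words with a given question, so you would probably keep a small always-injected set and filter only the rest.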

The full .env

ANTHROPIC_API_KEY=sk-ant-...
TELEGRAM_BOT_TOKEN=your_bot_token_from_botfather
TELEGRAM_ALLOWED_CHAT_ID=your_numeric_chat_id

# Optional — these have sane defaults
CLAUDE_MODEL=claude-haiku-4-5-20251001
MEMORY_WINDOW=10
SYSTEM_PROMPT=You are a personal AI assistant. Be direct and concise.

Get your chat ID by messaging @userinfobot on Telegram.
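For completeness, here is a sketch of how those variables might be read and validated at startup. loadConfig is a hypothetical helper, and the fallback values are assumptions copied from the example .env above, not confirmed defaults from the original code.

```javascript
// Sketch: validate required keys and apply fallbacks for the optional ones.
function loadConfig(env = process.env) {
  const required = ['ANTHROPIC_API_KEY', 'TELEGRAM_BOT_TOKEN', 'TELEGRAM_ALLOWED_CHAT_ID'];
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required .env keys: ${missing.join(', ')}`);
  }

  const parsedWindow = Number.parseInt(env.MEMORY_WINDOW ?? '10', 10);
  return {
    anthropicApiKey: env.ANTHROPIC_API_KEY,
    telegramToken: env.TELEGRAM_BOT_TOKEN,
    allowedChatId: String(env.TELEGRAM_ALLOWED_CHAT_ID),
    // Assumed defaults, mirroring the example values above.
    model: env.CLAUDE_MODEL || 'claude-haiku-4-5-20251001',
    // Fall back to 10 exchanges if the value is not a positive integer.
    maxHistory: Number.isInteger(parsedWindow) && parsedWindow > 0 ? parsedWindow : 10,
    systemPrompt: env.SYSTEM_PROMPT || 'You are a personal AI assistant. Be direct and concise.',
  };
}
```

Call `require('dotenv').config()` before `loadConfig()` so the values from .env are present in process.env. Failing fast on a missing key beats discovering it on the first Telegram message.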


Running it

npm install
cp .env.example .env
# fill in your three required keys
node index.js

For persistent operation, wrap it in PM2:

npm install -g pm2
pm2 start index.js --name assistant
pm2 startup   # auto-restart on reboot
pm2 save

Closing thoughts

The core insight here is that "memory" for an LLM assistant is really just careful data management. Short-term context is a rolling SQLite window passed as the messages array. Long-term facts are a separate table injected into the system prompt. Neither requires a vector database, a cloud service, or anything beyond the four npm packages listed above.

The total implementation is around 280 lines of well-commented JavaScript. That size is deliberate — small enough to read in one sitting, large enough to be a real working system rather than a hello-world skeleton.

This is the same memory pattern used in a larger AI system running 88 MCP tools and full autonomy loops. The difference is 18 months of additions. The core memory architecture hasn't changed.


If you want the full starter kit — the complete index.js, a 30-minute setup guide, and 10 pre-built extensions (weather, reminders, vision, cron briefings, web search, FTS5 note search, Pomodoro timer, and more) — it's at https://milkyway801.gumroad.com/l/hkdaox for $19 one-time. No subscription, no license key. Unzip, fill in .env, run.

Extensions you've built on this pattern, or questions about the memory architecture — drop them in the comments.
