DEV Community

Mateus Medeiros

I Turned Claude Code Into a Personal AI Butler That Runs My Life

Not another chatbot wrapper: a proactive assistant that runs on Claude Code headless, wired into my whole life through MCP.


Introduction

Most generic AI assistants are smart but forgetful. They know a lot about the world and nothing about you. Every conversation starts from scratch. Every question is answered generically. Ask them to help you plan your week and they'll give you a productivity framework from 2019.

I wanted something different. I wanted an assistant that knows I lead engineering at a fintech, that I'm working through a GTD backlog, that I've been putting off a specific task for three weeks, that I go to sleep around midnight, and that I absolutely hate soup for dinner.

So I built Jarvis.

This isn't a tutorial. It's a walkthrough of what I built, how I built it, and why the combination of context, MCP, and Claude Code changed how I think about personal tooling.


Part 1: What Jarvis Does

Let me give you a concrete example before I list anything.

Ten minutes before a call with my CTO, I hadn't opened Slack, hadn't reviewed my notes, hadn't done anything. Then, across the room, my Alexa spoke up in Jarvis's voice. It had pulled the last three technical decisions we had open, flagged an unread DM that was likely to come up, and surfaced something I'd written in my diary two days earlier that was directly relevant. I walked into that call better prepared than I would have been after an hour of manual review, and I hadn't asked for any of it.

And it doesn't stop when the call ends. Afterward, Jarvis goes back over the meeting (the calendar entry, the notes, what was discussed), picks out the loose ends that landed on me, and files them as tasks in my GTD. By the time I'm back at my desk, the follow-ups are already waiting, categorized and prioritized, without me having written a single one down.

That's what "useful" looks like in practice. Not "it can do X," but "it showed up before I even knew I needed it."

But "useful" is only half the story. The other half still catches me off guard: Jarvis improves itself.

A few weekends ago I was driving to the mall with my wife. I wanted to hand a couple of things off to Jarvis, but I wasn't about to type paragraphs into Telegram from behind the wheel, and at that point it only understood text. So, stopped at a red light, I told it to fix exactly that: add support for receiving voice messages on Telegram. It delegated the work to Claude Code, which branched, wired up the transcription pipeline, and opened a pull request. I merged it right there, still in the car, and my very next message to Jarvis was a voice note.

Read that back: I used the assistant to give the assistant a new way for me to use it. It didn't just run a task; it extended its own surface area, on request, while I kept my hands on the wheel.

Here's what makes that possible. Jarvis is connected, via MCP (Model Context Protocol), to everything I actually use:

Productivity & Work
Google Calendar, Gmail (two accounts: personal and professional), Google Drive and Docs, Slack. Full GTD task management with categories, priorities, due dates, and statuses. The assistant doesn't just read these; it can act on them: create events, search emails, list tasks by energy level, or find a document buried in a folder I haven't opened in months.

Personal life
A reading list, a games backlog, a watchlist for films and series, a link library. And the watchlist isn't just a place things sit: every Saturday evening, based on how my week went and what I've already watched, Jarvis suggests something new to add. A personal finance module connected to real bank data, not manual entries. A diary with structured entry types (personal, work, technical, idea, meeting, learning). And a shopping list, because even butlers deal with groceries.

Memory: the part that changes everything
Jarvis has two memory layers. Structured facts, the things I've explicitly told it to remember: my name, my wife's name, the football team I support, the birthdays that matter, the company I work for, and that I hate soup for dinner. And semantic memory, a Pinecone vector store that indexes conversations, diary entries, and patterns over time. Before answering anything contextual, it queries that store. When I mention "that infrastructure decision from last week," it knows what I mean. When I ask about someone by name, it already has context.

Automated Routines
This is where it stops feeling like a tool and starts feeling like an actual assistant. Jarvis has 31 scheduled routines running throughout the week. The ones that matter most:

  • Morning briefing: before I open anything, I already have the day: calendar, tasks, messages worth knowing, how my investments did overnight, and what's going on in the world. The day starts contextualized.
  • Pre-meeting briefing: what you just read above. Polls the calendar every 15 minutes, fires exactly once per meeting, 10 to 25 minutes before it starts.
  • Finance report: weekly summary of spending, investments, and outstanding debts. Finds me; I don't chase it.
  • Weekly review: once a week it walks my GTD with me (what moved, what stalled, what's been sitting in 'waiting' too long) and helps me reset for the week ahead.
  • Deadline alerts: tasks approaching their due date surface automatically.
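
The pre-meeting rule above (poll every 15 minutes, fire once, 10 to 25 minutes ahead) can be sketched as a small pure function. This is a minimal sketch under stated assumptions, not Jarvis's actual code: `should_brief`, `meeting_id`, and the in-memory `already_briefed` set are hypothetical names, and a real deployment would need to persist that set somewhere durable.

```python
from datetime import datetime, timedelta

# Window taken from the article: brief 10 to 25 minutes before the start.
MIN_LEAD = timedelta(minutes=10)
MAX_LEAD = timedelta(minutes=25)


def should_brief(meeting_id: str, starts_at: datetime, now: datetime,
                 already_briefed: set[str]) -> bool:
    """Return True at most once per meeting, and only inside the lead window."""
    if meeting_id in already_briefed:
        return False
    lead = starts_at - now
    if MIN_LEAD <= lead <= MAX_LEAD:
        already_briefed.add(meeting_id)  # hypothetical dedupe store
        return True
    return False
```

The window is 15 minutes wide and the poll interval is 15 minutes, so at least one poll lands inside the window for every meeting; the dedupe set covers the edge case where two polls do.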

The assistant also has a persona: a dry British wit that occasionally points out when you're repeating yourself. It's not neutral, and that's on purpose.


Part 2: How It's Built


The Stack

The backend is a FastAPI application on Python 3.14, running on a Compute Engine VM on GCP behind Nginx. Data lives in Google Cloud Datastore, a managed NoSQL document database.

MCP as the Integration Layer

MCP (Model Context Protocol) is what makes this scale without becoming a mess. Each domain is its own MCP server: Google Workspace, Slack, GTD, Finance, Diary, Memory, Alexa, and more. Twelve servers in total, exposing 35+ tools to the agent.

The agent stays clean. Adding a new integration means writing a new MCP server, not touching the orchestration logic.
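
One way to picture "a new integration is a new server" is the MCP config file handed to the agent. The sketch below builds that file from a registry. The `mcpServers` key with `command` and `args` entries is the common stdio config shape; the server names and module paths are hypothetical and are not the author's layout.

```python
import json

# Hypothetical registry: domain name -> Python module that runs the MCP server.
SERVERS = {
    "gtd": "servers.gtd",
    "diary": "servers.diary",
    "finance": "servers.finance",
}


def build_mcp_config(servers: dict[str, str]) -> dict:
    """Build a stdio MCP config with one entry per domain server."""
    return {
        "mcpServers": {
            name: {"command": "python", "args": ["-m", module]}
            for name, module in servers.items()
        }
    }


def add_server(servers: dict[str, str], name: str, module: str) -> dict[str, str]:
    """Adding an integration is one new registry entry; nothing else changes."""
    return {**servers, name: module}


config = build_mcp_config(add_server(SERVERS, "shopping", "servers.shopping"))
print(json.dumps(config, indent=2))
```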

Claude as the Brain

The agent is a Python runner, but it doesn't call the Anthropic API directly. Instead, it shells out to Claude Code running headless (claude -p), handing it the persona and system instructions, the MCP server config, and an allow-list of tools. Claude Code does the heavy lifting: it builds the context window, connects to every active MCP server, and drives the tool-calling loop. The runner just feeds it the user message and parses the structured JSON ({response, actions}) that streams back. The neat part: the same Claude Code that writes Jarvis's code is also the engine that runs it. The assistant doesn't just get built by an agent; it is one.
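
A minimal sketch of what such a runner could look like follows. It is an illustration, not the author's code: the function names are hypothetical, the flags (`--append-system-prompt`, `--mcp-config`, `--allowedTools`, `--output-format json`) should be checked against `claude --help` for the installed version, and the assumption that the CLI wraps the model's text in a JSON envelope under a `result` key is exactly that, an assumption.

```python
import json
import subprocess


def build_command(message: str, system_prompt: str, mcp_config_path: str,
                  allowed_tools: list[str]) -> list[str]:
    """Assemble a headless Claude Code invocation (flags are assumptions)."""
    return [
        "claude", "-p", message,
        "--append-system-prompt", system_prompt,
        "--mcp-config", mcp_config_path,
        "--allowedTools", ",".join(allowed_tools),
        "--output-format", "json",
    ]


def parse_reply(raw: str) -> dict:
    """Parse the {response, actions} object the agent is instructed to emit."""
    data = json.loads(raw)
    return {"response": data["response"], "actions": data.get("actions", [])}


def run_agent(message: str, system_prompt: str, mcp_config_path: str,
              allowed_tools: list[str]) -> dict:
    """Shell out to Claude Code and return the parsed reply."""
    cmd = build_command(message, system_prompt, mcp_config_path, allowed_tools)
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    envelope = json.loads(done.stdout)      # CLI's JSON envelope (assumed shape)
    return parse_reply(envelope["result"])  # model text assumed under "result"
```

Keeping `build_command` and `parse_reply` free of side effects means both can be unit-tested without a Claude Code install; only `run_agent` touches the subprocess.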

A Persona, On Purpose

Jarvis's personality lives in its own file, persona.md, kept deliberately separate from the operational instructions that tell the runner how to behave (output format, channel rules, when to delegate to Claude Code). The mechanics change often; the identity shouldn't. Splitting them means I can rewrite how Jarvis works without ever touching who Jarvis is, and the character stays consistent across every channel, whether it's answering on Telegram or speaking through an Echo.

And it's a real persona, not a one-line "be friendly and concise." It's a full character brief: an impeccable British butler who happens to run complex systems, dry wit by default, always first person, never the telemarketing enthusiasm of "Great question!" The file even ships with calibration examples so the tone stays sharp:

"Meeting tomorrow at 2pm. I've filed it under 'this time he actually shows up.'"

"The bug is on line 47. The real cause is an architecture decision from two weeks ago, but let's pretend we didn't see that for now."

That's the difference between an assistant that answers and one you'd actually want around. Generic is a choice, and I chose against it.
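
The persona/instructions split described above could be wired up roughly like this. It is a sketch under assumptions: `persona.md` is the file named in the article, while `instructions.md`, the separator, and both function names are hypothetical.

```python
from pathlib import Path


def compose_system_prompt(persona: str, instructions: str) -> str:
    """Join identity and mechanics, keeping the persona first and intact."""
    return f"{persona.strip()}\n\n---\n\n{instructions.strip()}\n"


def load_system_prompt(prompt_dir: Path) -> str:
    # persona.md is named in the article; instructions.md is a hypothetical name.
    persona = (prompt_dir / "persona.md").read_text(encoding="utf-8")
    instructions = (prompt_dir / "instructions.md").read_text(encoding="utf-8")
    return compose_system_prompt(persona, instructions)
```

Because the two files are only joined at load time, the operational file can be rewritten freely without the character brief ever appearing in the diff.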

OpenAI as a Supporting Layer

While Claude drives conversations, OpenAI handles specific tasks: Whisper transcribes the voice messages I send over Telegram, and DALL·E 3 handles image generation (when I asked Jarvis what it thought it looked like, it drew its own self-portrait).

Channels

Telegram for async. The voice side started from a constraint I'd built for myself: I have six Echo devices around the house, one per room, and I wanted to actually put them to use, just not as Alexa. I wanted them to speak as Jarvis. So for voice, the pipeline goes: response text → ElevenLabs → custom audio file → Alexa device. Jarvis speaks in its own voice, not Amazon's default TTS.

But the proactive side is where it gets technically unusual.

Amazon's official Alexa Skills Kit is reactive: it waits to be called. If you want the assistant to speak up on its own, whether a morning briefing, a deadline alert, or a pre-meeting summary, the Skills Kit won't help you. So I reverse-engineered the internal Amazon API. It's undocumented. It's unofficial. Amazon could break it tomorrow. But it's the only path to an assistant with genuine initiative, and to me, that tradeoff was obvious. The fragility is a feature, not a bug: it means I had to care enough to go find it.

Semantic Memory

Conversations, diary entries, and pattern analyses get turned into vectors with OpenAI's text-embedding-3-small and upserted into a Pinecone index, each one tagged with a timestamp and type. Before responding to anything contextual, Jarvis embeds the incoming message and runs a top-k similarity search, pulling back the handful of past moments closest to what I'm asking about right now. A separate nightly routine does the heavier lifting: it reads back over the last day, distills behavioral patterns, and writes those summaries back into the same store, tagged as pattern_analysis, so the next morning's briefing can reference them. The result is what makes it feel continuous: not a chatbot that resets, but something that was paying attention yesterday too.
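
The retrieval step can be illustrated without any external service. In this sketch a plain list stands in for the Pinecone index and hand-written three-dimensional vectors stand in for `text-embedding-3-small` output; the `Memory` class, `top_k`, and the sample entries are all hypothetical toy data.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    kind: str            # e.g. "conversation", "diary", "pattern_analysis"
    timestamp: str       # ISO 8601 string
    vector: list[float]  # toy stand-in for a real embedding


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], store: list[Memory], k: int = 3) -> list[Memory]:
    """Return the k memories whose vectors are closest to the query."""
    return sorted(store, key=lambda m: cosine(query, m.vector), reverse=True)[:k]


store = [
    Memory("infrastructure decision", "diary", "2025-01-02T21:00:00", [0.9, 0.1, 0.0]),
    Memory("dinner preference", "conversation", "2025-01-03T19:30:00", [0.0, 0.2, 0.9]),
    Memory("weekly pattern", "pattern_analysis", "2025-01-04T03:00:00", [0.3, 0.9, 0.1]),
]
query = [0.8, 0.3, 0.0]  # stand-in for the embedded incoming message
print([m.text for m in top_k(query, store, k=2)])
# → ['infrastructure decision', 'weekly pattern']
```

A hosted vector index does the same ranking at scale; the `kind` and `timestamp` fields mirror the metadata tags the article mentions and could be used to filter results before ranking.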

Routines via Cloud Scheduler

The 31 routines are cron jobs in Cloud Scheduler, each fired as an HTTP call to an internal route behind Nginx. Each job fetches context (diary, tasks, semantic memory, calendar, Slack), builds a prompt, calls Claude, and sends the result via Telegram or Alexa.
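
The four steps of a routine (fetch context, build a prompt, call the agent, deliver) can be sketched as one function with its dependencies injected. Everything here is hypothetical scaffolding: `run_routine`, the `sources` mapping, and the stub callables stand in for the real MCP-backed fetchers, the Claude call, and the Telegram or Alexa sender.

```python
from typing import Callable

ContextSource = Callable[[], str]


def run_routine(name: str,
                sources: dict[str, ContextSource],
                ask_agent: Callable[[str], str],
                deliver: Callable[[str], None]) -> str:
    """Fetch context, build a prompt, call the agent, deliver the reply."""
    sections = [f"## {label}\n{fetch()}" for label, fetch in sources.items()]
    prompt = f"Routine: {name}\n\n" + "\n\n".join(sections)
    reply = ask_agent(prompt)
    deliver(reply)
    return reply


# Usage with stubs in place of the real integrations.
outbox: list[str] = []
run_routine(
    "morning_briefing",
    {"calendar": lambda: "09:00 standup", "tasks": lambda: "2 due today"},
    ask_agent=lambda prompt: f"Briefing built from {prompt.count('## ')} sources.",
    deliver=outbox.append,
)
print(outbox)
```

With this shape, an HTTP handler for each scheduled job only has to pick the routine name and the right set of sources.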

Built with Claude Code

Here's the part that still surprises me every time: I built almost all of this using Claude Code, Anthropic's agentic coding tool that runs in the terminal.

The workflow: I describe what I want, Claude Code creates a branch, implements it, commits, and opens a PR. I review and merge. For most features, I never write a line of code manually. The agent understands the codebase, follows the established patterns, and usually makes the call I would have.

This creates a feedback loop that's genuinely strange to describe. The assistant I'm building is also my development partner. When I ask Jarvis via Telegram to add a feature, it delegates to Claude Code, which opens a PR. The system extends itself.


What It Costs to Run

I expected a system this involved to be expensive. It isn't.

  • Claude, the brain: $20/month. Jarvis runs on a Claude Pro subscription through Claude Code, not the metered API, so there's no per-token bill; the subscription's usage limits are the only ceiling. It even routes models by task: Haiku for everyday conversation, Opus when it's writing code.
  • GCE e2-small VM: ~$13/month. A single small Compute Engine instance hosts the whole backend.
  • ElevenLabs voice: $6/month on the Starter plan, and even that is optional: there are free TTS options if you don't mind dropping the custom voice.
  • OpenAI (Whisper, DALL·E, embeddings): under $2/month. Usage is low and these are cheap per call.
  • Pinecone, Pluggy, and the rest of GCP (Datastore, Cloud Scheduler): $0. All comfortably inside their free tiers.

All in, Jarvis runs for around $40 a month, and the single biggest line item is a consumer Claude subscription I'd probably be paying for anyway. The part that should feel expensive, the always-on intelligence, is the part that's basically flat-rate, precisely because it runs on Claude Code instead of billing me by the token.
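
As a quick check on that total, the line items can be added up directly. The figures are the ones quoted above; the only assumptions are reading "~$13" as 13 and taking "under $2" at its ceiling.

```python
# Monthly figures as quoted in the article (USD).
costs = {
    "Claude Pro": 20.0,
    "GCE e2-small VM": 13.0,          # "~$13" read as 13
    "ElevenLabs Starter": 6.0,
    "OpenAI (upper bound)": 2.0,      # "under $2" taken at its ceiling
    "Pinecone, Pluggy, rest of GCP": 0.0,
}

total = sum(costs.values())
largest = max(costs, key=costs.get)
print(f"total: ${total:.0f}/month, largest item: {largest} ({costs[largest] / total:.0%})")
# → total: $41/month, largest item: Claude Pro (49%)
```

So "around $40" holds as an upper bound, with the Claude subscription at roughly half of the bill.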


What Makes It Actually Useful

The gap between "AI assistant" and "useful AI assistant" is almost entirely context. Generic assistants are impressive in demos and mediocre in practice because they have no idea who you are.

What made Jarvis cross that gap:

  1. Deep integrations: not API wrappers, but MCP tools with real access and real schemas
  2. Persistent memory: facts you tell it and patterns it finds on its own
  3. Proactive outputs: it shows up, it doesn't just respond
  4. A persona with opinions: less tool, more collaborator

It's not perfect. MCP servers disconnect. Prompts drift. Occasionally it contradicts something from last week. But it's useful in a way that no off-the-shelf assistant has been, and at this point, I'd notice if it disappeared.
