<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tomoki Ikeda</title>
    <description>The latest articles on DEV Community by Tomoki Ikeda (@tomokiikeda).</description>
    <link>https://dev.to/tomokiikeda</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3826613%2Fd3337bc9-9e88-4416-83ad-ae7d9318e4a7.jpg</url>
      <title>DEV Community: Tomoki Ikeda</title>
      <link>https://dev.to/tomokiikeda</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tomokiikeda"/>
    <language>en</language>
    <item>
      <title>I Gave Claude Code a Memory — Here's How MCP Connects AI Tools to Your Knowledge Base</title>
      <dc:creator>Tomoki Ikeda</dc:creator>
      <pubDate>Tue, 24 Mar 2026 15:21:27 +0000</pubDate>
      <link>https://dev.to/tomokiikeda/i-gave-claude-code-a-memory-heres-how-mcp-connects-ai-tools-to-your-knowledge-base-3l56</link>
      <guid>https://dev.to/tomokiikeda/i-gave-claude-code-a-memory-heres-how-mcp-connects-ai-tools-to-your-knowledge-base-3l56</guid>
      <description>&lt;p&gt;&lt;strong&gt;Claude Code doesn't remember what you built last week.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Neither does ChatGPT. Neither does Cursor. Every AI tool starts fresh — no memory of your architecture decisions, debugging sessions, or that clever workaround you spent two hours on.&lt;/p&gt;

&lt;p&gt;In my &lt;a href="https://dev.to/tomokiikeda/how-i-auto-capture-coding-sessions-from-25-ai-tools-architecture-deep-dive-2p3i"&gt;previous article&lt;/a&gt;, I showed how Nokos auto-captures coding sessions. But capturing is only half the story. The other half: &lt;strong&gt;letting AI tools search that knowledge.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's where MCP comes in.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is MCP?
&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol (MCP) is an open standard that lets AI tools call external functions. Think of it as "APIs for AI" — instead of you copying context into a prompt, the AI calls a tool to fetch what it needs.&lt;/p&gt;

&lt;p&gt;Nokos exposes three MCP tools:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;search_nokos&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Search memos + coding sessions by keyword or meaning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;get_nokos&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fetch full content of any item&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;save_session&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Save the current AI conversation to Nokos&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;With these three tools, any MCP-compatible AI can read from and write to your personal knowledge base.&lt;/p&gt;
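&lt;p&gt;To make the protocol concrete, here is a minimal sketch of the descriptors such a server advertises to clients. The tool names match the table above; the input schemas are illustrative assumptions, not the actual Nokos definitions:&lt;/p&gt;

```typescript
// Sketch: the tool descriptors an MCP server advertises to clients.
// Schemas are illustrative; the real Nokos definitions may differ.
type ToolDescriptor = {
  name: string;
  description: string;
  inputSchema: { type: "object"; properties: object; required: string[] };
};

const nokosTools: ToolDescriptor[] = [
  {
    name: "search_nokos",
    description: "Search memos and coding sessions by keyword or meaning.",
    inputSchema: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
    },
  },
  {
    name: "get_nokos",
    description: "Fetch the full content of a memo or session by id.",
    inputSchema: {
      type: "object",
      properties: { id: { type: "string" } },
      required: ["id"],
    },
  },
  {
    name: "save_session",
    description: "Save the current AI conversation to Nokos.",
    inputSchema: {
      type: "object",
      properties: { transcript: { type: "string" } },
      required: ["transcript"],
    },
  },
];

// A client lists the tools, then calls one by name with schema-valid arguments.
function findTool(name: string): ToolDescriptor | undefined {
  return nokosTools.find((t) => t.name === name);
}
```

&lt;p&gt;An MCP client first lists the available tools, then calls one by name with arguments that satisfy its schema; the descriptions are what guide the model's decision to call.&lt;/p&gt;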

&lt;h2&gt;
  
  
  The "Aha" Moment
&lt;/h2&gt;

&lt;p&gt;Here's what it looks like in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nc"&gt;You &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;Claude&lt;/span&gt; &lt;span class="nx"&gt;Code&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;How did I implement auth in this project?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="nx"&gt;Claude&lt;/span&gt; &lt;span class="nx"&gt;Code&lt;/span&gt; &lt;span class="nx"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;search_nokos&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;auth implementation&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="nx"&gt;Nokos&lt;/span&gt; &lt;span class="nx"&gt;returns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="nx"&gt;relevant&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt;
  &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;Session&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="nx"&gt;Feb&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Implemented Firebase Auth with Device Authorization Flow&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;Memo&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="nx"&gt;Feb&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Auth architecture: stateless JWT + RLS&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;Session&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="nx"&gt;Feb&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Added unified auth middleware for API key, JWT, and Firebase&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="nx"&gt;Claude&lt;/span&gt; &lt;span class="nx"&gt;Code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Based on your previous sessions, you implemented a three-layer
auth approach: Firebase for web, Device Auth JWT for CLI, and API keys
for programmatic access. The unified middleware is in unified-auth.ts...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Code answered with &lt;strong&gt;your actual implementation history&lt;/strong&gt; — not generic advice, not hallucinated code, but what you actually built. The AI searched your knowledge base, found relevant sessions, and synthesized an answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two MCP Servers, One Protocol
&lt;/h2&gt;

&lt;p&gt;We run two MCP servers that expose the same three tools:&lt;/p&gt;

&lt;h3&gt;
  
  
  Local Agent (stdio)
&lt;/h3&gt;

&lt;p&gt;For AI tools running on your machine — Claude Code, Codex, any MCP client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code ←→ stdio ←→ @nokos/cli mcp ←→ Nokos API
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CLI starts an MCP server over stdio. Authentication uses Device Auth JWT tokens stored in &lt;code&gt;~/.nokos/credentials.json&lt;/code&gt;. Setup is one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nokos setup  &lt;span class="c"&gt;# registers MCP server + hooks&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Remote Server (Streamable HTTP)
&lt;/h3&gt;

&lt;p&gt;For cloud-based AI tools — claude.ai, ChatGPT (when MCP-enabled):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claude.ai ←→ HTTPS ←→ mcp.nokos.ai ←→ PostgreSQL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This runs on Cloud Run with OAuth 2.1 + PKCE for authentication. The remote server connects directly to the database — no extra API hop.&lt;/p&gt;

&lt;p&gt;Why two servers? &lt;strong&gt;Latency and access patterns are different.&lt;/strong&gt; The local agent proxies through the API (simple, works everywhere). The remote server has direct DB access (faster for cloud-based tools that need quick responses).&lt;/p&gt;
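&lt;p&gt;For readers unfamiliar with PKCE: it binds the token exchange to the client that started the flow. A standard RFC 7636 pair generation looks like this (generic OAuth client code, not the Nokos implementation):&lt;/p&gt;

```typescript
import { createHash, randomBytes } from "node:crypto";

// Standard PKCE (RFC 7636) pair generation, as used by OAuth 2.1 clients.
function base64url(buf: Buffer): string {
  return buf
    .toString("base64")
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

export function makePkcePair() {
  // High-entropy secret kept by the client.
  const verifier = base64url(randomBytes(32));
  // S256 challenge sent in the authorization URL.
  const challenge = base64url(createHash("sha256").update(verifier).digest());
  return { verifier, challenge };
}
```

&lt;p&gt;The challenge travels in the authorization URL; the verifier is only revealed in the token request, so an intercepted authorization code is useless on its own.&lt;/p&gt;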

&lt;h2&gt;
  
  
  How Search Works Behind the Scenes
&lt;/h2&gt;

&lt;p&gt;When an AI calls &lt;code&gt;search_nokos&lt;/code&gt;, it hits a hybrid search pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Full-text search&lt;/strong&gt; (pg_bigm) — exact keyword matching, great for Japanese + English&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector search&lt;/strong&gt; (pgvector) — semantic matching via embeddings, finds related content even without keyword overlap&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The results are merged and deduplicated. This means "auth implementation" matches sessions that talk about "Firebase authentication middleware" even without the exact word "auth."&lt;/p&gt;
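&lt;p&gt;The merge step can be sketched as a simple interleave with deduplication by id. The shapes below are illustrative, not the actual Nokos types:&lt;/p&gt;

```typescript
// Sketch of the merge step: two ranked result lists (full-text and vector)
// are interleaved and deduplicated by id. Shapes are illustrative.
type Hit = { id: string; score: number; source: "fulltext" | "vector" };

function mergeHits(fulltext: Hit[], vector: Hit[], limit: number): Hit[] {
  const seen = new Set();
  const merged: Hit[] = [];
  // Interleave so neither strategy dominates the top of the list.
  const longest = Math.max(fulltext.length, vector.length);
  for (let i = 0; longest > i; i++) {
    for (const hit of [fulltext[i], vector[i]]) {
      if (hit !== undefined) {
        if (!seen.has(hit.id)) {
          seen.add(hit.id);
          merged.push(hit);
        }
      }
    }
  }
  return merged.slice(0, limit);
}
```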

&lt;p&gt;Both memos and coding sessions live in the same index. Your handwritten notes and AI-generated session logs are searchable together.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Knowledge Loop
&lt;/h2&gt;

&lt;p&gt;The real power isn't any single tool — it's the loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. You work with Claude Code → session auto-captured to Nokos
2. You write a memo in Nokos → indexed and embedded
3. Next Claude Code session → AI searches Nokos for context
4. AI gives better answers → captured again
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each cycle adds to the knowledge base. Over time, your AI gets more useful because it has more of your context.&lt;/p&gt;

&lt;p&gt;This isn't AGI memory. It's a practical, queryable database of your actual work — sessions, notes, decisions — that any AI tool can tap into through a standard protocol.&lt;/p&gt;

&lt;h2&gt;
  
  
  What MCP Gets Right (And What's Still Hard)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What works well:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standard protocol means one integration works across multiple AI tools&lt;/li&gt;
&lt;li&gt;Tool descriptions guide the AI on when to call what&lt;/li&gt;
&lt;li&gt;Structured responses let the AI reason about results&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What's still hard:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI tools don't always know when to search. Sometimes Claude Code will try to answer from training data when your actual implementation is one MCP call away&lt;/li&gt;
&lt;li&gt;Context window limits mean you can't dump everything — the AI needs to be selective about what it retrieves&lt;/li&gt;
&lt;li&gt;OAuth setup for remote MCP still adds friction. The local stdio path is much smoother&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;If you use Claude Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @nokos/cli
nokos login
nokos setup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After setup, try asking Claude Code: "Search my Nokos for recent sessions about [topic]." It will call &lt;code&gt;search_nokos&lt;/code&gt; and return your actual work history.&lt;/p&gt;

&lt;p&gt;For claude.ai, connect via the MCP settings panel at &lt;code&gt;mcp.nokos.ai&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nokos.ai" rel="noopener noreferrer"&gt;&lt;strong&gt;nokos.ai&lt;/strong&gt;&lt;/a&gt; — free plan available.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is article 5 in my series about building a SaaS with AI. &lt;a href="https://dev.to/tomokiikeda/zero-lines-of-code-how-claude-code-and-gemini-built-my-saas-35cn"&gt;Article 1: Zero Lines of Code&lt;/a&gt;. &lt;a href="https://dev.to/tomokiikeda/claude-writes-the-code-gemini-runs-it-how-two-competing-ais-cut-my-saas-costs-by-30x-2hn3"&gt;Article 2: AI Cost Split&lt;/a&gt;. &lt;a href="https://dev.to/tomokiikeda/postgresql-row-level-security-saved-my-saas-from-bugs-i-didnt-know-i-had-48gl"&gt;Article 3: PostgreSQL RLS&lt;/a&gt;. &lt;a href="https://dev.to/tomokiikeda/how-i-auto-capture-coding-sessions-from-25-ai-tools-architecture-deep-dive-2p3i"&gt;Article 4: Session Capture&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Launching on &lt;a href="https://www.producthunt.com/products/nokos" rel="noopener noreferrer"&gt;Product Hunt&lt;/a&gt; March 31st — follow for updates!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>devtools</category>
      <category>architecture</category>
    </item>
    <item>
      <title>How I Auto-Capture Coding Sessions From 25+ AI Tools (Architecture Deep Dive)</title>
      <dc:creator>Tomoki Ikeda</dc:creator>
      <pubDate>Sun, 22 Mar 2026 13:35:30 +0000</pubDate>
      <link>https://dev.to/tomokiikeda/how-i-auto-capture-coding-sessions-from-25-ai-tools-architecture-deep-dive-2p3i</link>
      <guid>https://dev.to/tomokiikeda/how-i-auto-capture-coding-sessions-from-25-ai-tools-architecture-deep-dive-2p3i</guid>
      <description>&lt;p&gt;&lt;strong&gt;How many AI conversations did you have this week?&lt;/strong&gt; 10? 50? 100?&lt;/p&gt;

&lt;p&gt;How many can you find right now?&lt;/p&gt;

&lt;p&gt;That's the problem. AI coding tools generate enormous amounts of knowledge — architecture decisions, debugging sessions, implementation discussions — and all of it vanishes when you close the terminal.&lt;/p&gt;

&lt;p&gt;I built a system that captures every AI conversation automatically. It works with 25+ tools. The entire architecture is a hook, a CLI, and a parser pipeline.&lt;/p&gt;

&lt;p&gt;Here's how it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Knowledge That Disappears
&lt;/h2&gt;

&lt;p&gt;Every AI coding tool stores conversations differently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; writes JSONL to &lt;code&gt;~/.claude/projects/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex&lt;/strong&gt; writes JSONL to &lt;code&gt;~/.codex/sessions/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cursor&lt;/strong&gt; stores data in SQLite&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ChatGPT&lt;/strong&gt; is accessible only via export&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Copilot Chat&lt;/strong&gt; logs to VS Code output channels&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some tools give you hooks. Some give you files. Some give you nothing.&lt;/p&gt;

&lt;p&gt;I needed one system that could ingest all of them, normalize the data, and make it searchable. Not a viewer for each tool's format — a &lt;strong&gt;unified knowledge base&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AI Tool (session ends)
  ↓ hook / file watcher / manual push
@nokos/cli (local)
  ↓ reads session file, gzip compresses
Nokos API (cloud)
  ↓ tool-specific parser extracts messages
  ↓ AI generates title + metadata + summary
  ↓ vector embedding for semantic search
PostgreSQL (storage)
  ↓ full-text search + vector search
Nokos UI (web/mobile)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fully automatic for tools that support hooks. One command for everything else.&lt;/p&gt;

&lt;h2&gt;
  
  
  Capturing Sessions: Three Patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pattern 1: Session Hooks (Claude Code, Codex)
&lt;/h3&gt;

&lt;p&gt;Claude Code has a &lt;code&gt;SessionEnd&lt;/code&gt; hook. When a conversation ends, it fires automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"SessionEnd"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nokos push --from-hook"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"timeout"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30000&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The hook pipes session metadata to stdin. The CLI reads the transcript file, compresses it, and sends it to the API. The user does nothing.&lt;/p&gt;
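&lt;p&gt;The hook-side handling can be sketched like this. The payload fields and endpoint are illustrative placeholders, not the exact hook schema:&lt;/p&gt;

```typescript
import { gzipSync } from "node:zlib";

// Sketch of the hook path: the tool pipes JSON metadata to stdin; the CLI
// reads the transcript and ships it compressed. Field names such as
// "transcript_path" and the endpoint are illustrative, not the real schema.
type HookPayload = { session_id: string; transcript_path: string };

function buildUpload(payload: HookPayload, transcript: string) {
  // gzip the raw transcript before sending it over the wire.
  const body = gzipSync(Buffer.from(transcript, "utf8"));
  return {
    url: "https://api.example.com/sessions", // placeholder endpoint
    headers: {
      "content-encoding": "gzip",
      "x-session-id": payload.session_id,
    },
    body,
  };
}
```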

&lt;p&gt;We also hook into &lt;code&gt;PreCompact&lt;/code&gt; — when Claude Code compresses a long conversation, we capture the intermediate state before it's compacted. This means you don't lose progress during long sessions.&lt;/p&gt;

&lt;p&gt;Codex has a similar model — &lt;code&gt;agent-turn-complete&lt;/code&gt; notifications with the conversation payload. Same CLI, different input format.&lt;/p&gt;

&lt;p&gt;As a safety net, &lt;code&gt;nokos setup&lt;/code&gt; also configures a cron job that runs &lt;code&gt;nokos watch&lt;/code&gt; every 30 minutes. It detects changed session files and pushes them automatically — catching anything the hooks might miss.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 2: Extension APIs (Roo Code, Cline)
&lt;/h3&gt;

&lt;p&gt;For VS Code-based tools, we built an extension that listens for task completion events and file changes, then sends conversations to the API automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 3: Manual Push
&lt;/h3&gt;

&lt;p&gt;For tools without hooks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nokos push session.jsonl &lt;span class="nt"&gt;--tool&lt;/span&gt; cursor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One command. The CLI auto-detects the tool from the file path when possible.&lt;/p&gt;
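&lt;p&gt;Path-based detection can be as simple as a list of substring checks. The Claude Code and Codex patterns follow the storage locations listed earlier; the Aider pattern is an assumption:&lt;/p&gt;

```typescript
// Sketch of path-based tool detection. The Claude Code and Codex patterns
// match the storage locations above; the Aider pattern is an assumption.
function detectTool(filePath: string): string {
  if (filePath.includes("/.claude/projects/")) return "claude-code";
  if (filePath.includes("/.codex/sessions/")) return "codex";
  if (filePath.includes(".aider.chat.history.md")) return "aider";
  return "generic"; // fall back to the generic parser
}
```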

&lt;h2&gt;
  
  
  The Parser Pipeline
&lt;/h2&gt;

&lt;p&gt;This is where it gets interesting. Every AI tool has a different conversation format. Claude Code uses JSONL with content blocks. Codex uses OpenAI-style messages. Copilot Chat uses request/response pairs. The Anthropic format is a JSON array, not JSONL.&lt;/p&gt;

&lt;p&gt;We have &lt;strong&gt;7 dedicated parsers&lt;/strong&gt; and a generic fallback:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parser&lt;/th&gt;
&lt;th&gt;Tools&lt;/th&gt;
&lt;th&gt;Key difference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;userType: "human"&lt;/code&gt;, content blocks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex&lt;/td&gt;
&lt;td&gt;Codex CLI&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;role&lt;/code&gt; field, OpenAI format&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;Roo Code, Cline&lt;/td&gt;
&lt;td&gt;JSON array of &lt;code&gt;MessageParam[]&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Copilot Chat&lt;/td&gt;
&lt;td&gt;Copilot Chat&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;request&lt;/code&gt;/&lt;code&gt;response&lt;/code&gt; pairs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;td&gt;Workspace JSON with threads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aider&lt;/td&gt;
&lt;td&gt;Aider&lt;/td&gt;
&lt;td&gt;Markdown chat history&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini CLI&lt;/td&gt;
&lt;td&gt;Gemini CLI&lt;/td&gt;
&lt;td&gt;Gemini-specific content parts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generic&lt;/td&gt;
&lt;td&gt;Everything else&lt;/td&gt;
&lt;td&gt;Best-effort: JSON, JSONL, plain text&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Every parser produces the same output: a normalized array of &lt;code&gt;{role, content}&lt;/code&gt; messages plus token stats and file change lists.&lt;/p&gt;

&lt;p&gt;The real challenge is &lt;strong&gt;format detection&lt;/strong&gt; — each tool's JSONL looks similar but has different field names, nesting structures, and content representations. Each parser checks for its tool's fingerprint before parsing. The generic fallback handles anything we haven't seen before, and it covers more cases than you'd expect.&lt;/p&gt;
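&lt;p&gt;A fingerprint check can be sketched as a list of detectors tried in order, falling back to generic. The field checks below are illustrative, loosely based on the differences in the table:&lt;/p&gt;

```typescript
// Sketch of format fingerprinting: peek at one parsed record and look for
// fields that identify the producing tool. Checks are illustrative,
// loosely based on the per-tool differences in the table above.
type Detector = { tool: string; matches: (record: any) => boolean };

const detectors: Detector[] = [
  { tool: "claude-code", matches: (r) => r.userType === "human" || r.type === "assistant" },
  { tool: "codex", matches: (r) => typeof r.role === "string" },
  { tool: "copilot-chat", matches: (r) => r.request !== undefined },
];

function detectFormat(firstRecord: any): string {
  for (const d of detectors) {
    if (d.matches(firstRecord)) return d.tool;
  }
  return "generic"; // best-effort fallback parser
}
```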

&lt;h2&gt;
  
  
  AI Summary + Semantic Search
&lt;/h2&gt;

&lt;p&gt;Raw session data is noisy — hundreds of JSONL lines with tool calls, file edits, and internal reasoning. Nobody wants to read that.&lt;/p&gt;

&lt;p&gt;The API generates a structured summary (title, category, tags, sentiment) and a vector embedding for each session. This means you can search by meaning, not just keywords. "That session where I fixed the auth bug" matches even if the word "auth" never appears in the transcript.&lt;/p&gt;
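&lt;p&gt;The "search by meaning" half boils down to cosine similarity between embeddings. A toy sketch (real embeddings are high-dimensional model outputs, not these three-element vectors):&lt;/p&gt;

```typescript
// Toy sketch of the semantic side of search: cosine similarity between a
// query embedding and stored embeddings. Real embeddings come from an
// embedding model; these tiny vectors are illustrative only.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; a.length > i; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topK(query: number[], items: { id: string; embedding: number[] }[], k: number) {
  return items
    .map((it) => ({ id: it.id, score: cosine(query, it.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```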

&lt;p&gt;Sessions and handwritten memos live in the same search index. Ask "What did I work on last week?" and you get both. This also powers &lt;strong&gt;Personal AI (RAG)&lt;/strong&gt; — when you chat with Nokos, it retrieves relevant memos &lt;em&gt;and&lt;/em&gt; sessions as context.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Compaction Problem
&lt;/h2&gt;

&lt;p&gt;Here's a production quirk nobody warns you about: &lt;strong&gt;Claude Code fires "Compacting conversation" multiple times per session.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every compaction triggers the &lt;code&gt;SessionEnd&lt;/code&gt; hook. A single coding session can generate 3-5 hook events. Without deduplication, you'd store 5 copies of the same conversation.&lt;/p&gt;

&lt;p&gt;Our solution: &lt;strong&gt;daily session IDs&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;today&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// "2026-03-22"&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sessionId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;baseSessionId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;_&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;today&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each push upserts by session ID. Multiple pushes in one day update the same record. The next day starts fresh. Simple, and it solved the problem completely.&lt;/p&gt;
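&lt;p&gt;The upsert behavior can be sketched with an in-memory stand-in for the database:&lt;/p&gt;

```typescript
// Sketch of the dedupe behavior: pushes upsert by daily session id, so
// repeated compaction events within one day overwrite one record.
// In-memory stand-in for the real database upsert.
const store = new Map();

function dailyId(baseSessionId: string, date: Date): string {
  return baseSessionId + "_" + date.toISOString().slice(0, 10);
}

function pushSession(baseSessionId: string, transcript: string, date: Date) {
  // Upsert: same key within a day, latest push wins.
  store.set(dailyId(baseSessionId, date), transcript);
}
```

&lt;p&gt;Three pushes across two days leave two records: the second push overwrites the first because they share a daily id.&lt;/p&gt;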

&lt;h2&gt;
  
  
  MCP: Closing the Loop
&lt;/h2&gt;

&lt;p&gt;The capture pipeline gets data &lt;em&gt;into&lt;/em&gt; Nokos. MCP (Model Context Protocol) lets AI tools search &lt;em&gt;from&lt;/em&gt; Nokos.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You (in Claude Code): "What was the approach I used for auth last month?"
  ↓ MCP tool call: search_nokos({ query: "auth approach" })
  ↓ Returns relevant memos and sessions
Claude Code: "Based on your session from Feb 15, you implemented..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three MCP tools: &lt;code&gt;search_nokos&lt;/code&gt;, &lt;code&gt;get_nokos&lt;/code&gt;, &lt;code&gt;save_session&lt;/code&gt;. Available via local MCP (stdio, for Claude Code/Codex) and remote MCP (HTTP, for claude.ai/ChatGPT).&lt;/p&gt;

&lt;p&gt;The loop closes: AI tools generate knowledge → Nokos captures it → AI tools retrieve it. &lt;strong&gt;Your AI remembers what you've built.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;Set up auto-capture in 2 minutes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @nokos/cli
nokos login
nokos setup   &lt;span class="c"&gt;# configures Claude Code SessionEnd hook&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Every Claude Code session is now automatically captured, summarized, and searchable.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nokos.ai" rel="noopener noreferrer"&gt;&lt;strong&gt;nokos.ai&lt;/strong&gt;&lt;/a&gt; — free plan available to try it out.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is article 4 in my series about building a SaaS with AI. &lt;a href="https://dev.to/tomokiikeda/zero-lines-of-code-how-claude-code-and-gemini-built-my-saas-35cn"&gt;Article 1: Zero Lines of Code&lt;/a&gt;. &lt;a href="https://dev.to/tomokiikeda/claude-writes-the-code-gemini-runs-it-how-two-competing-ais-cut-my-saas-costs-by-30x-2hn3"&gt;Article 2: AI Cost Split&lt;/a&gt;. &lt;a href="https://dev.to/tomokiikeda/postgresql-row-level-security-saved-my-saas-from-bugs-i-didnt-know-i-had-48gl"&gt;Article 3: PostgreSQL RLS&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Launching on &lt;a href="https://www.producthunt.com/products/nokos" rel="noopener noreferrer"&gt;Product Hunt&lt;/a&gt; March 31st — follow for updates!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>devtools</category>
      <category>productivity</category>
    </item>
    <item>
      <title>PostgreSQL Row-Level Security Saved My SaaS From Bugs I Didn't Know I Had</title>
      <dc:creator>Tomoki Ikeda</dc:creator>
      <pubDate>Fri, 20 Mar 2026 12:58:04 +0000</pubDate>
      <link>https://dev.to/tomokiikeda/postgresql-row-level-security-saved-my-saas-from-bugs-i-didnt-know-i-had-1bb5</link>
      <guid>https://dev.to/tomokiikeda/postgresql-row-level-security-saved-my-saas-from-bugs-i-didnt-know-i-had-1bb5</guid>
      <description>&lt;p&gt;I build &lt;a href="https://nokos.ai" rel="noopener noreferrer"&gt;Nokos&lt;/a&gt;, an AI note-taking app. Every user's memos, diaries, and coding sessions are stored in one PostgreSQL database. One authorization bug = one user sees another's private data.&lt;/p&gt;

&lt;p&gt;Most apps have &lt;strong&gt;one layer of defense&lt;/strong&gt;: application-level auth checks. We have two. The second layer — PostgreSQL Row-Level Security — has already caught bugs that our application code missed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup: One Function, Total Isolation
&lt;/h2&gt;

&lt;p&gt;Our entire RLS system hinges on one PostgreSQL function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;current_app_user_id&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;NULLIF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_setting&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'app.current_user_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt; &lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="k"&gt;SQL&lt;/span&gt; &lt;span class="k"&gt;STABLE&lt;/span&gt; &lt;span class="k"&gt;SECURITY&lt;/span&gt; &lt;span class="k"&gt;DEFINER&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every table policy checks: &lt;code&gt;WHERE user_id = current_app_user_id()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;On every API request, we set the session variable inside a transaction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;withRLS&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;TransactionClient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$transaction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;$executeRaw&lt;/span&gt;&lt;span class="s2"&gt;`SELECT set_config('app.current_user_id', &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;, true)`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;true&lt;/code&gt; in &lt;code&gt;set_config&lt;/code&gt; makes it transaction-local. When the transaction ends, the variable resets. No leakage between requests.&lt;/p&gt;
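&lt;p&gt;Here is a toy model of why that matters: the filter exists only inside one call, mirroring a transaction-scoped setting. A simplified synchronous stand-in, not real Postgres or Prisma:&lt;/p&gt;

```typescript
// Toy model of transaction-local settings: each "transaction" gets its own
// visibility scope, so one request's user id can never leak into another.
// Simplified synchronous stand-in; the real helper is async and talks to Postgres.
type Row = { id: string; userId: string };

const table: Row[] = [
  { id: "m1", userId: "alice" },
  { id: "m2", userId: "bob" },
];

function withRLS(userId: string, fn: (visible: Row[]) => any) {
  // The filter applies only inside this call, like a tx-scoped set_config.
  const visible = table.filter((r) => r.userId === userId);
  return fn(visible);
}
```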

&lt;h2&gt;
  
  
  The Bug That RLS Caught
&lt;/h2&gt;

&lt;p&gt;We use fire-and-forget patterns for non-critical async work — like generating embeddings after a memo is saved:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// After memo is created, generate embedding async&lt;/span&gt;
&lt;span class="nf"&gt;generateEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contentText&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// BUG: This runs OUTSIDE the withRLS transaction&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$executeRawUnsafe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="s2"&gt;`UPDATE memos SET embedding = $1::vector WHERE id = $2::uuid`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;memoId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This looks correct. The UPDATE has the right memo ID. What could go wrong?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Everything.&lt;/strong&gt; Because &lt;code&gt;memos&lt;/code&gt; has &lt;code&gt;FORCE ROW LEVEL SECURITY&lt;/code&gt;, and this code runs outside &lt;code&gt;withRLS()&lt;/code&gt;. No &lt;code&gt;app.current_user_id&lt;/code&gt; is set, so &lt;code&gt;current_app_user_id()&lt;/code&gt; returns &lt;code&gt;NULL&lt;/code&gt;, and the policy's &lt;code&gt;user_id = NULL&lt;/code&gt; comparison never evaluates to true. Every row is filtered out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result: 0 rows updated. No error. No warning. Complete silence.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The embedding was generated, the API call to Gemini was paid for, and the UPDATE ran successfully — it just matched zero rows. PostgreSQL doesn't consider "zero rows updated" an error.&lt;/p&gt;

&lt;p&gt;The fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;generateEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contentText&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// FIXED: wrap in withRLS so the policy can match&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;withRLS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$executeRawUnsafe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s2"&gt;`UPDATE memos SET embedding = $1::vector WHERE id = $2::uuid`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;memoId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without RLS, this bug would have silently worked in dev and production. The UPDATE would hit the row directly, no policy check. We'd never know the auth boundary was missing — until someone found a way to exploit it.&lt;/p&gt;

&lt;h2&gt;
  
  
  FORCE vs NO FORCE: The Decision That Matters
&lt;/h2&gt;

&lt;p&gt;PostgreSQL RLS has a subtle setting: &lt;code&gt;FORCE ROW LEVEL SECURITY&lt;/code&gt; applies policies &lt;strong&gt;even to the table owner&lt;/strong&gt;. Without &lt;code&gt;FORCE&lt;/code&gt;, the owner role bypasses all policies.&lt;/p&gt;
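&lt;p&gt;For reference, this is two statements per table. A sketch using the &lt;code&gt;memos&lt;/code&gt; table:&lt;br&gt;
&lt;/p&gt;

```sql
ALTER TABLE memos ENABLE ROW LEVEL SECURITY;  -- policies apply to other roles
ALTER TABLE memos FORCE ROW LEVEL SECURITY;   -- policies also apply to the table owner
```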

&lt;p&gt;We use &lt;code&gt;FORCE&lt;/code&gt; on every table except five:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Table&lt;/th&gt;
&lt;th&gt;NO FORCE&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;users&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;Auth middleware looks up users by &lt;code&gt;firebase_uid&lt;/code&gt; &lt;strong&gt;before&lt;/strong&gt; user_id is known&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;user_usage&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;Usage records are created during first login, before RLS context exists&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;api_keys&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;API key auth looks up keys by SHA-256 hash before user_id is known&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;device_refresh_tokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;Token refresh happens before user auth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;device_codes&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;Device Authorization Flow — codes exist before any user is authenticated&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pattern: tables that are accessed &lt;strong&gt;before authentication completes&lt;/strong&gt; need &lt;code&gt;NO FORCE&lt;/code&gt;. Everything else — memos, books, tags, chat sessions, media — uses &lt;code&gt;FORCE&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This means if we accidentally run a query outside &lt;code&gt;withRLS()&lt;/code&gt; on a &lt;code&gt;FORCE&lt;/code&gt; table, it returns zero rows instead of leaking data. &lt;strong&gt;Default deny.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Roles: App vs Batch
&lt;/h2&gt;

&lt;p&gt;We use two PostgreSQL roles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;nokos_app&lt;/code&gt;&lt;/strong&gt; — Used by the API. Subject to RLS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;nokos_batch&lt;/code&gt;&lt;/strong&gt; — Used by batch jobs (diary generation, embedding backfill). Has &lt;code&gt;row_security = off&lt;/code&gt;, completely bypassing RLS.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why? Batch jobs need to process data across all users. The daily diary generator reads all users' memos to create personalized diaries. Running this through RLS would require setting &lt;code&gt;app.current_user_id&lt;/code&gt; for each user in a loop — technically possible, but fragile and slow.&lt;/p&gt;

&lt;p&gt;The trade-off: &lt;code&gt;nokos_batch&lt;/code&gt; has no authorization boundary. We accept this because batch code runs in a controlled environment (Cloud Scheduler → Cloud Run endpoint with secret-based auth), never exposed to user input.&lt;/p&gt;
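&lt;p&gt;A sketch of that role setup (the role names are from this article; the exact attributes and grants are my assumption):&lt;br&gt;
&lt;/p&gt;

```sql
-- API role: connects through withRLS() and is subject to every policy
CREATE ROLE nokos_app LOGIN;

-- Batch role: explicitly allowed to bypass RLS for cross-user jobs
CREATE ROLE nokos_batch LOGIN BYPASSRLS;
ALTER ROLE nokos_batch SET row_security = off;
```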

&lt;h2&gt;
  
  
  Policy Patterns
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Direct ownership&lt;/strong&gt; (most tables):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;POLICY&lt;/span&gt; &lt;span class="n"&gt;memos_select&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;memos&lt;/span&gt;
  &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current_app_user_id&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;JOIN-based&lt;/strong&gt; (junction tables without user_id):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;POLICY&lt;/span&gt; &lt;span class="n"&gt;memo_tags_select&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;memo_tags&lt;/span&gt;
  &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;memos&lt;/span&gt;
            &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;memos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memo_tags&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memo_id&lt;/span&gt;
            &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;memos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current_app_user_id&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Open insert, restricted read&lt;/strong&gt; (users table — anyone can sign up):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;POLICY&lt;/span&gt; &lt;span class="n"&gt;users_insert&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;
  &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;POLICY&lt;/span&gt; &lt;span class="n"&gt;users_select&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;
  &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current_app_user_id&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each policy is applied idempotently (&lt;code&gt;DROP POLICY IF EXISTS&lt;/code&gt; before &lt;code&gt;CREATE POLICY&lt;/code&gt;), so we can safely reapply them all on every deployment.&lt;/p&gt;
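&lt;p&gt;In migration form, reapplying the &lt;code&gt;memos_select&lt;/code&gt; policy above looks like this:&lt;br&gt;
&lt;/p&gt;

```sql
-- Safe to run on every deployment: drop if present, then recreate
DROP POLICY IF EXISTS memos_select ON memos;
CREATE POLICY memos_select ON memos
  FOR SELECT USING (user_id = current_app_user_id());
```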

&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Start with FORCE on every table.&lt;/strong&gt; We initially had some tables without it and had to migrate. Starting strict is easier than tightening later.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Test the silent failure.&lt;/strong&gt; We now have integration tests that verify: without &lt;code&gt;set_config&lt;/code&gt;, a query on a FORCE table returns zero rows. This catches the exact bug class described above.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Without set_config, non-superuser sees no data&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;appPrisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$transaction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// No set_config call — should see nothing&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;memo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findMany&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveLength&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Document the NO FORCE exceptions.&lt;/strong&gt; Every &lt;code&gt;NO FORCE&lt;/code&gt; table should have a comment explaining why. Future developers (or future AI agents writing your code) need to know the reasoning.&lt;/li&gt;
&lt;/ol&gt;
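&lt;p&gt;A low-tech way to keep that reasoning next to the schema is a table comment in the same migration (sketch):&lt;br&gt;
&lt;/p&gt;

```sql
COMMENT ON TABLE users IS
  'NO FORCE ROW LEVEL SECURITY: auth middleware looks up rows by firebase_uid before user_id is known';
```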

&lt;h2&gt;
  
  
  Five Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;RLS is your second line of defense&lt;/strong&gt;, not a replacement for application auth. Both layers should exist.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;FORCE ROW LEVEL SECURITY&lt;/code&gt; is the critical setting.&lt;/strong&gt; Without it, the table owner bypasses all policies, and most ORM connections run as the table owner.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Silent zero-row updates are the real danger.&lt;/strong&gt; RLS doesn't throw errors; it just filters. Test for this explicitly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-auth tables need &lt;code&gt;NO FORCE&lt;/code&gt;.&lt;/strong&gt; Any table accessed before you know the user_id must opt out of FORCE.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Separate roles for app vs batch.&lt;/strong&gt; Don't hack around RLS in batch jobs — give them a role that bypasses it cleanly.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://nokos.ai" rel="noopener noreferrer"&gt;Nokos&lt;/a&gt; is protected by this exact RLS setup across 20+ tables. Free plan available — your data is isolated at the database level, not just the application level.&lt;/p&gt;

&lt;p&gt;Have you implemented RLS in your project? What patterns worked for you?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is article 3 in my series about building a SaaS with AI. &lt;a href="https://dev.to/tomokiikeda/zero-lines-of-code-how-claude-code-and-gemini-built-my-saas-35cn"&gt;Article 1: Zero Lines of Code&lt;/a&gt;. &lt;a href="https://dev.to/tomokiikeda/claude-writes-the-code-gemini-runs-it-how-two-competing-ais-cut-my-saas-costs-by-30x-2hn3"&gt;Article 2: Claude + Gemini Cost Split&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Launching on &lt;a href="https://www.producthunt.com/products/nokos" rel="noopener noreferrer"&gt;Product Hunt&lt;/a&gt; March 31st — follow for updates!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>security</category>
      <category>webdev</category>
      <category>saas</category>
    </item>
    <item>
      <title>Claude Writes the Code, Gemini Runs It: How Two Competing AIs Cut My SaaS Costs by 30x</title>
      <dc:creator>Tomoki Ikeda</dc:creator>
      <pubDate>Wed, 18 Mar 2026 11:20:24 +0000</pubDate>
      <link>https://dev.to/tomokiikeda/claude-writes-the-code-gemini-runs-it-how-two-competing-ais-cut-my-saas-costs-by-30x-2hn3</link>
      <guid>https://dev.to/tomokiikeda/claude-writes-the-code-gemini-runs-it-how-two-competing-ais-cut-my-saas-costs-by-30x-2hn3</guid>
      <description>&lt;h1&gt;
  
  
  Claude Writes the Code, Gemini Runs It: How Two Competing AIs Cut My SaaS Costs by 30x
&lt;/h1&gt;

&lt;p&gt;I build &lt;a href="https://nokos.ai" rel="noopener noreferrer"&gt;Nokos&lt;/a&gt;, an AI-powered note-taking app that auto-captures conversations from 25+ AI tools. Here's the thing — the product itself runs entirely on AI, and picking the &lt;strong&gt;wrong model for the wrong job&lt;/strong&gt; almost killed the economics.&lt;/p&gt;

&lt;p&gt;This is the story of how I went from "this will never be profitable" to "break-even at 150 users" by splitting my AI stack between two competing providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Original Architecture (And Why It Was Bleeding Money)
&lt;/h2&gt;

&lt;p&gt;When I first built Nokos, I used Anthropic's Claude for everything:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Cost per call&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Metadata generation&lt;/td&gt;
&lt;td&gt;Claude Haiku&lt;/td&gt;
&lt;td&gt;~¥0.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Chat&lt;/td&gt;
&lt;td&gt;Claude Sonnet&lt;/td&gt;
&lt;td&gt;~¥4.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Personal AI (RAG)&lt;/td&gt;
&lt;td&gt;Claude Sonnet&lt;/td&gt;
&lt;td&gt;~¥4.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Daily Diary generation&lt;/td&gt;
&lt;td&gt;Claude Haiku&lt;/td&gt;
&lt;td&gt;~¥1.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session summary&lt;/td&gt;
&lt;td&gt;Claude Haiku&lt;/td&gt;
&lt;td&gt;~¥1.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Natural Language Search&lt;/td&gt;
&lt;td&gt;Claude Haiku&lt;/td&gt;
&lt;td&gt;~¥0.5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The per-user costs added up fast. With the Plus plan priced at ¥480/month, I was &lt;strong&gt;losing money on every paying user&lt;/strong&gt;. The math was simple: this product could never work.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Realization: Not Every AI Call Needs a Genius
&lt;/h2&gt;

&lt;p&gt;Here's what I noticed looking at my AI usage patterns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metadata generation&lt;/strong&gt; — Extract title, tags, category, sentiment from a memo. Claude Sonnet is wildly overqualified for this. It's pattern matching, not reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chat responses&lt;/strong&gt; — Most user questions are "What did I write about X last week?" The answer is in the RAG context. The model just needs to synthesize it, not think deeply.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diary generation&lt;/strong&gt; — Take today's memos, write a narrative. This is structured content generation with clear inputs and outputs.&lt;/p&gt;

&lt;p&gt;None of these need the most powerful model. They need a &lt;strong&gt;fast, cheap, good-enough model&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But one thing absolutely &lt;em&gt;does&lt;/em&gt; need the best: &lt;strong&gt;writing the code itself&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Split: Claude for Code, Gemini for Production
&lt;/h2&gt;

&lt;p&gt;I migrated every production AI feature to Google's Gemini Flash in a single day. Here's the new architecture:&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code (Opus) — The Architect
&lt;/h3&gt;

&lt;p&gt;Claude Code writes all the application code — the API routes, the React components, the database migrations, the infrastructure config. This is where reasoning quality matters most. One subtle bug in RLS policy logic or Stripe webhook handling could be catastrophic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; High per-session, but I'm the only "user." Fixed cost, not per-customer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gemini Flash — The Production Workhorse
&lt;/h3&gt;

&lt;p&gt;Every AI feature that runs for actual users:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Cost per call&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Metadata generation&lt;/td&gt;
&lt;td&gt;Gemini Flash&lt;/td&gt;
&lt;td&gt;~¥0.02&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Chat&lt;/td&gt;
&lt;td&gt;Gemini Flash&lt;/td&gt;
&lt;td&gt;~¥0.07&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Personal AI (RAG)&lt;/td&gt;
&lt;td&gt;Gemini Flash&lt;/td&gt;
&lt;td&gt;~¥0.15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Daily Diary&lt;/td&gt;
&lt;td&gt;Gemini Flash&lt;/td&gt;
&lt;td&gt;~¥1.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session summary&lt;/td&gt;
&lt;td&gt;Gemini Flash&lt;/td&gt;
&lt;td&gt;~¥0.15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Natural Language Search&lt;/td&gt;
&lt;td&gt;Gemini Flash&lt;/td&gt;
&lt;td&gt;~¥0.05&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embedding&lt;/td&gt;
&lt;td&gt;gemini-embedding-001&lt;/td&gt;
&lt;td&gt;~¥0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The Result
&lt;/h3&gt;

&lt;p&gt;The chat/RAG queries — the most expensive calls — dropped from ~¥4.5 to ~¥0.07-0.15. That's where the &lt;strong&gt;"30x cheaper"&lt;/strong&gt; comes from.&lt;/p&gt;

&lt;p&gt;Across all plans, per-user costs dropped by &lt;strong&gt;3x to 7x&lt;/strong&gt;. The Free plan became sustainable. The paid plans became profitable.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Cheap AI Unlocked
&lt;/h2&gt;

&lt;p&gt;The cost reduction didn't just improve margins — it changed what the product &lt;em&gt;could be&lt;/em&gt;:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Free users get real AI features
&lt;/h3&gt;

&lt;p&gt;When per-user costs were high, giving Free users AI chat was financial suicide. After the migration, I can afford 50 chat turns/month as a taste of the product. Small cost, huge conversion driver.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. "Kimagure" Diary for Free
&lt;/h3&gt;

&lt;p&gt;Free users get a "whimsical diary" — Nokos (the AI) writes one when it feels like it (triggered on login, a few times per month). At ~¥1/diary, this is viable as a free feature. Magical enough to drive upgrades.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Stamina-based pricing instead of feature gating
&lt;/h3&gt;

&lt;p&gt;Instead of locking features behind plans, every plan gets access to everything — just with different quotas. This only works when the per-call cost is low enough that occasional use doesn't destroy your margins.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Session ingestion at scale
&lt;/h3&gt;

&lt;p&gt;Coding sessions from Claude Code, Codex, Cursor, and others get summarized by Gemini Flash at ~¥0.15/session. At the old Claude Haiku cost (~¥1.0), high-volume session ingestion would have been economically impossible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Break-Even Math
&lt;/h2&gt;

&lt;p&gt;With the migration complete and pricing adjusted (Plus ¥980/month, Pro ¥2,980/month), the break-even point landed at roughly &lt;strong&gt;150 users&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The assumption: the vast majority of users (~90%+) will be on the Free plan — that's standard for freemium SaaS. The cost reduction made this survivable. At the old per-user costs, I would have needed 700+ users just to break even.&lt;/p&gt;

&lt;p&gt;The biggest cost driver to watch? &lt;strong&gt;Session ingestion.&lt;/strong&gt; Claude Code fires "Compacting conversation" 3-5 times per coding session, each counting as a separate ingest. Heavy users can rack up hundreds of sessions per month. This is the line item I monitor most closely.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quality: Did It Actually Get Worse?
&lt;/h2&gt;

&lt;p&gt;Honestly? For these use cases, I can't tell the difference.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Metadata extraction&lt;/strong&gt; — Gemini Flash correctly identifies category, sentiment, tags, people, locations from memo text. The structured JSON output is reliable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chat/RAG&lt;/strong&gt; — When the relevant context is already retrieved by vector search, the model just needs to synthesize an answer. Flash does this well.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diary generation&lt;/strong&gt; — The narrative quality is comparable. Users read their own memos reflected back as a story. The memos provide the substance; the model provides the structure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Where I &lt;em&gt;would&lt;/em&gt; notice a difference: complex multi-step reasoning, nuanced code generation, architectural decisions. That's why Claude Code still writes the code.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Uncomfortable Truth About Model Selection
&lt;/h2&gt;

&lt;p&gt;Most AI features in production apps are &lt;strong&gt;glorified text transformation&lt;/strong&gt;. Extract these fields. Summarize this text. Generate a response given this context.&lt;/p&gt;

&lt;p&gt;You don't need the most intelligent model for text transformation. You need:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reliable structured output (JSON mode)&lt;/li&gt;
&lt;li&gt;Good instruction following&lt;/li&gt;
&lt;li&gt;Low latency&lt;/li&gt;
&lt;li&gt;Low cost&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Gemini Flash delivers all four.&lt;/p&gt;

&lt;p&gt;The hard part — the part that actually requires intelligence — is designing the system, writing the prompts, building the data pipeline, and catching the edge cases. That's where Claude Code (Opus) earns its cost, once, during development.&lt;/p&gt;

&lt;h2&gt;
  
  
  Five Lessons for Your AI Stack
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Audit your AI calls by complexity.&lt;/strong&gt; Most of them are simpler than you think.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The model that builds your product and the model that runs it don't need to be the same.&lt;/strong&gt; Claude builds; Gemini runs. Each does what it's best at.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-call cost determines your product design.&lt;/strong&gt; At ¥4.5/query, you gate features. At ¥0.07/query, you give them away as samples.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session ingestion is a hidden cost bomb.&lt;/strong&gt; If your product processes AI coding sessions, one "session" can actually be 3-5 API calls due to conversation compacting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run the P&amp;amp;L before you pick a model.&lt;/strong&gt; I built a spreadsheet with per-feature costs × expected usage × plan distribution. The answer was obvious once I saw the numbers.&lt;/li&gt;
&lt;/ol&gt;
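&lt;p&gt;The spreadsheet from lesson 5 fits in a few lines. The per-call prices below come from the Gemini Flash table earlier in this article; the monthly call counts are illustrative assumptions, not real Nokos usage data:&lt;br&gt;
&lt;/p&gt;

```typescript
// Per-feature prices (yen per call) taken from the table above;
// call volumes are made-up numbers for one hypothetical active user.
const features = [
  { name: "metadata", yenPerCall: 0.02, callsPerMonth: 100 },
  { name: "chat", yenPerCall: 0.07, callsPerMonth: 50 },
  { name: "rag", yenPerCall: 0.15, callsPerMonth: 30 },
  { name: "diary", yenPerCall: 1.0, callsPerMonth: 30 },
  { name: "sessionSummary", yenPerCall: 0.15, callsPerMonth: 40 },
  { name: "search", yenPerCall: 0.05, callsPerMonth: 20 },
];

let monthlyCostYen = 0;
for (const f of features) {
  monthlyCostYen += f.yenPerCall * f.callsPerMonth;
}

// With these assumed volumes, the total lands around 47 yen per user per month
console.log("AI cost per active user: about ¥" + monthlyCostYen.toFixed(2) + "/month");
```

Multiply the result by your expected plan distribution and compare it to each plan's price; the break-even follows directly.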

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://nokos.ai" rel="noopener noreferrer"&gt;Nokos&lt;/a&gt; is live — free plan with AI chat, diary generation, and session capture from Claude Code, ChatGPT, Cursor, and more. The entire product is powered by the dual-AI architecture described above.&lt;/p&gt;

&lt;p&gt;What's your AI cost optimization story? I'd love to hear how others are handling the model selection tradeoff.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is article 2 in a series about building Nokos as a solo developer. &lt;a href="https://dev.to/tomokiikeda/zero-lines-of-code-how-claude-code-and-gemini-built-my-saas-35cn"&gt;Article 1: Zero Lines of Code&lt;/a&gt; covered how the product was built entirely by AI.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow me for more — I'm launching on &lt;a href="https://producthunt.com" rel="noopener noreferrer"&gt;Product Hunt&lt;/a&gt; on March 31st!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>saas</category>
      <category>webdev</category>
      <category>startup</category>
    </item>
    <item>
      <title>Zero Lines of Code: How Claude Code and Gemini Built My SaaS</title>
      <dc:creator>Tomoki Ikeda</dc:creator>
      <pubDate>Mon, 16 Mar 2026 09:02:00 +0000</pubDate>
      <link>https://dev.to/tomokiikeda/zero-lines-of-code-how-claude-code-and-gemini-built-my-saas-35cn</link>
      <guid>https://dev.to/tomokiikeda/zero-lines-of-code-how-claude-code-and-gemini-built-my-saas-35cn</guid>
      <description>&lt;p&gt;&lt;strong&gt;I didn't write a single line of code.&lt;/strong&gt; Not one.&lt;/p&gt;

&lt;p&gt;Claude Code (Anthropic) wrote every line — frontend, backend, database, infra, tests. Gemini (Google) powers all the AI features in production.&lt;/p&gt;

&lt;p&gt;Two competing AIs built one product. I just told them what to do.&lt;/p&gt;

&lt;p&gt;The product is &lt;a href="https://nokos.ai" rel="noopener noreferrer"&gt;Nokos&lt;/a&gt; — an AI note-taking app that auto-captures your conversations from Claude Code, ChatGPT, Cursor, Copilot, and 20+ other AI tools. It's live and it works. Free tier available.&lt;/p&gt;

&lt;p&gt;Here's exactly how it happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Role: The AI Orchestrator
&lt;/h2&gt;

&lt;p&gt;People ask: "If you didn't code, what did you do for 30 days?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Everything that isn't code:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Product vision &amp;amp; design docs&lt;/strong&gt;: I wrote the project plan and technical design with Claude.ai — describing the product concept, data model, and architecture in conversation. The AI drafted the documents; I made every decision&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture decisions&lt;/strong&gt;: Every technical choice — database schema, auth strategy, document format — was mine. I described them in plain language, and Claude Code implemented them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI cost audit&lt;/strong&gt;: Manually reviewed every endpoint. Found 4 critical bugs Claude Code had written — including a storage limit that was tracked but &lt;strong&gt;never enforced&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing strategy&lt;/strong&gt;: Researched 10+ competitors, designed a soft-gate model where free users taste every premium feature&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Legal&lt;/strong&gt;: Terms of Service (16 articles, 10 languages), downgrade policy, inactive account policy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;QA&lt;/strong&gt;: Ran every flow, filed bugs, described fixes for Claude Code to implement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design review&lt;/strong&gt;: Had Gemini 2.5 Pro review screenshots and critique the UI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This role isn't "non-technical." It's "technical without typing."&lt;/strong&gt; You need to understand databases, APIs, and auth flows to make good decisions. You just don't type the code.&lt;/p&gt;

&lt;p&gt;My workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Me (Product Manager)
  ↓ "Build a session ingest endpoint with rate limiting"
Claude Code (Developer)
  ↓ writes code, runs tests, commits
  ↓ calls Gemini API to test AI features
Gemini Flash (Production AI)
  ↓ generates metadata, writes diaries, powers RAG
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three roles. Two AIs from competing companies. One product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Two AIs?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Claude Code is the best AI developer I've found.&lt;/strong&gt; Its 1M-token context window holds the entire project, and it refactors across dozens of files in a single pass.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini Flash is the best production AI for the price.&lt;/strong&gt; ~30x cheaper than Claude Sonnet. It powers all of Nokos's features: auto-tagging, daily diaries, natural language search, and Personal AI (RAG).&lt;/p&gt;

&lt;p&gt;I didn't pick sides. I picked the best tool for each job.&lt;/p&gt;

&lt;p&gt;Here's the wild part: during development, &lt;strong&gt;Claude Code called the Gemini API directly&lt;/strong&gt; — testing prompts, evaluating outputs, iterating until the AI pipeline worked. An Anthropic AI invoking a Google AI, debugging its responses, and adjusting prompts to improve them.&lt;/p&gt;

&lt;p&gt;But they didn't just coexist silently. They &lt;em&gt;debated&lt;/em&gt;. When I asked Claude Code to consult Gemini on a design decision, Gemini would give its opinion. Claude would consider it, blend it with its own perspective, and then present me with a synthesized recommendation: "Here's what Gemini suggested, here's what I think, and here's my recommendation — what would you like to do?"&lt;/p&gt;

&lt;p&gt;Two AIs from competing companies, having a constructive discussion, with a human making the final call.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Tech&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;Next.js 15&lt;/td&gt;
&lt;td&gt;App Router, RSC, single codebase for web + mobile&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;Hono&lt;/td&gt;
&lt;td&gt;Lightweight, fast, perfect for Cloud Run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;PostgreSQL + pgvector + pg_bigm&lt;/td&gt;
&lt;td&gt;Vector search for RAG, bigram search for Japanese&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production AI&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Gemini Flash&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~30x cheaper than Claude. Fast metadata/diary/report generation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embedding&lt;/td&gt;
&lt;td&gt;gemini-embedding-001&lt;/td&gt;
&lt;td&gt;768 dimensions, fire-and-forget on every save&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Development&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Claude Code (Opus)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1M context, wrote 100% of the codebase&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Design Docs&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Claude.ai (Sonnet)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Co-authored project plan and technical design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Design Review&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Gemini 2.5 Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Screenshot analysis, UI/UX feedback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth&lt;/td&gt;
&lt;td&gt;Firebase Auth&lt;/td&gt;
&lt;td&gt;Google + GitHub + Email&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Billing&lt;/td&gt;
&lt;td&gt;Stripe&lt;/td&gt;
&lt;td&gt;3 plans, multi-currency (JPY/USD)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infra&lt;/td&gt;
&lt;td&gt;GCP (Cloud Run, Cloud SQL, Cloud Storage)&lt;/td&gt;
&lt;td&gt;Managed, auto-scaling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;i18n&lt;/td&gt;
&lt;td&gt;next-intl&lt;/td&gt;
&lt;td&gt;10 languages from day one&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The Claude-Gemini Collaboration, In Practice
&lt;/h2&gt;

&lt;p&gt;Here's a real example. I asked Claude Code to build the AI metadata generation feature — when you save a memo, AI automatically generates a title, tags, category, sentiment, and importance.&lt;/p&gt;

&lt;p&gt;Claude Code:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Wrote the Gemini API client&lt;/li&gt;
&lt;li&gt;Designed the prompt (in both Japanese and English)&lt;/li&gt;
&lt;li&gt;Called Gemini Flash to test the prompt with sample memos&lt;/li&gt;
&lt;li&gt;Evaluated the JSON output quality&lt;/li&gt;
&lt;li&gt;Adjusted the prompt based on Gemini's responses&lt;/li&gt;
&lt;li&gt;Built the API endpoint with proper error handling&lt;/li&gt;
&lt;li&gt;Added fire-and-forget embedding generation&lt;/li&gt;
&lt;li&gt;Wrote tests&lt;/li&gt;
&lt;/ol&gt;
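&lt;p&gt;Step 4, evaluating the JSON output, is where most of the iteration happened. As a rough sketch, a validator along these lines could gate what gets saved; the &lt;code&gt;MemoMetadata&lt;/code&gt; shape, field names, and limits here are my own illustration, not Nokos's actual schema:&lt;/p&gt;

```typescript
// Sketch: validating AI-generated memo metadata before saving it.
// Field names and ranges are illustrative assumptions, not the real schema.

interface MemoMetadata {
  title: string;
  tags: string[];
  category: string;
  sentiment: "positive" | "neutral" | "negative";
  importance: number; // 1–5
}

function parseMetadata(raw: string): MemoMetadata | null {
  // Models often wrap JSON in ```json fences; strip them first.
  const cleaned = raw
    .replace(/^```(?:json)?\s*/i, "")
    .replace(/```\s*$/, "")
    .trim();
  let data: any;
  try {
    data = JSON.parse(cleaned);
  } catch {
    return null; // unparseable output: caller retries or saves without metadata
  }
  if (typeof data.title !== "string" || !Array.isArray(data.tags)) return null;
  const sentiments = ["positive", "neutral", "negative"];
  return {
    title: data.title.slice(0, 120),
    tags: data.tags.filter((t: unknown) => typeof t === "string").slice(0, 10),
    category: typeof data.category === "string" ? data.category : "uncategorized",
    sentiment: sentiments.includes(data.sentiment) ? data.sentiment : "neutral",
    importance: Math.min(5, Math.max(1, Number(data.importance) || 3)),
  };
}
```

&lt;p&gt;Returning &lt;code&gt;null&lt;/code&gt; instead of throwing lets the caller retry the prompt or save the memo without metadata.&lt;/p&gt;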

&lt;p&gt;An Anthropic AI writing code that calls a Google AI, testing the Google AI's outputs, and iterating on prompts to improve them. This happened dozens of times throughout development.&lt;/p&gt;

&lt;p&gt;When it came to pricing strategy, I had Claude Code call Gemini to analyze competitor pricing and evaluate our positioning. Gemini came back with sharp criticism of our approach. Claude didn't just pass it along — it incorporated Gemini's feedback, added its own analysis, and proposed a revised strategy. Then it asked me: "What do you think?" I made the call, and Claude implemented the changes across the codebase and documentation in one session.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Went Wrong
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Claude Code doesn't understand your business
&lt;/h3&gt;

&lt;p&gt;It writes code. It doesn't understand &lt;em&gt;why&lt;/em&gt;. I had to constantly prevent it from adding features I didn't need or over-engineering simple functions.&lt;/p&gt;

&lt;p&gt;My fix: &lt;strong&gt;CLAUDE.md&lt;/strong&gt; — a 300+ line file at the repo root describing every architecture decision, convention, and constraint. Claude Code reads it at the start of every session. It's the most important file in the repo.&lt;/p&gt;
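&lt;p&gt;For readers who want to try this, a CLAUDE.md in that spirit might be structured like the skeleton below. This is an illustration of the idea, not the actual file:&lt;/p&gt;

```markdown
# CLAUDE.md — read before writing any code

## Architecture decisions (do not revisit without asking)
- Backend: Hono on Cloud Run. No Express, no Nest.
- Database access goes through the ORM only; raw SQL needs approval.
- All production AI calls use Gemini Flash; never add another provider.

## Conventions
- Every new endpoint gets a rate limit and a plan-quota check.
- Embeddings are fire-and-forget; never block a save on them.

## Out of scope (do not build)
- Admin dashboards, analytics, anything not in the design doc.
```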

&lt;h3&gt;
  
  
  2. AI-generated billing code is dangerous
&lt;/h3&gt;

&lt;p&gt;In a single audit session, I found 4 critical issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A storage limit constant that was defined but &lt;strong&gt;never enforced&lt;/strong&gt; in the upload handler&lt;/li&gt;
&lt;li&gt;A backward-compatible endpoint that &lt;strong&gt;bypassed all rate limits&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;A memo counter that &lt;strong&gt;never decremented&lt;/strong&gt; on delete (free users could get permanently locked out)&lt;/li&gt;
&lt;li&gt;A monthly reset that only triggered from one code path (users hitting another path first would be blocked with stale counters)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Claude Code wrote all of this. Each piece worked in isolation. None worked together correctly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson: Always manually audit security and billing logic. AI doesn't think about exploit paths.&lt;/strong&gt;&lt;/p&gt;
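&lt;p&gt;One way to make that lesson concrete is to route every mutating request through a single guard that both checks and updates the counters, so no code path can skip enforcement or miss a reset. A minimal sketch, with all names and limits hypothetical:&lt;/p&gt;

```typescript
// Sketch of centralized quota enforcement. All names and limits are
// hypothetical; the point is that every path reads AND writes the
// same counters through the same functions.

interface Quota {
  memoCount: number;
  storageBytes: number;
  periodStart: number; // ms epoch of the current billing period
}

const LIMITS = { maxMemos: 100, maxStorageBytes: 50 * 1024 * 1024 };
const MONTH_MS = 30 * 24 * 60 * 60 * 1000;

// Lazy reset on EVERY request path, so no path can see stale counters.
function rollover(q: Quota, now: number): void {
  if (now - q.periodStart >= MONTH_MS) {
    q.memoCount = 0;
    q.periodStart = now;
  }
}

function canUpload(q: Quota, bytes: number, now: number): boolean {
  rollover(q, now);
  // The audit's bug class: a limit defined elsewhere but never checked
  // here would make this function always return true.
  return (
    q.memoCount < LIMITS.maxMemos &&
    q.storageBytes + bytes <= LIMITS.maxStorageBytes
  );
}

function recordUpload(q: Quota, bytes: number): void {
  q.memoCount += 1;
  q.storageBytes += bytes;
}

function recordDelete(q: Quota, bytes: number): void {
  // Forgetting this decrement is what permanently locks out free users.
  q.memoCount = Math.max(0, q.memoCount - 1);
  q.storageBytes = Math.max(0, q.storageBytes - bytes);
}
```

&lt;p&gt;The lazy reset in &lt;code&gt;rollover&lt;/code&gt; addresses the fourth bug: because every path calls it before checking limits, no path can block users on stale counters.&lt;/p&gt;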

&lt;h3&gt;
  
  
  3. "It works locally" doesn't mean it deploys
&lt;/h3&gt;

&lt;p&gt;Claude Code can't test against real infrastructure. I lost days to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Docker build failing because pnpm strict isolation needed &lt;code&gt;node-linker=hoisted&lt;/code&gt; (one line)&lt;/li&gt;
&lt;li&gt;Cloud SQL Proxy TLS failing because the slim Docker image was missing &lt;code&gt;ca-certificates&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Prisma 7 breaking the &lt;code&gt;url = env()&lt;/code&gt; syntax that Prisma 6 required&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each fix was trivial once found. Finding them was the hard part.&lt;/p&gt;
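&lt;p&gt;For reference, the first two fixes are each a single line. The base image and exact lines below are illustrative; adapt them to your own setup:&lt;/p&gt;

```dockerfile
# .npmrc (repo root) — let pnpm hoist node_modules into a flat tree
# that the Docker build can copy:
#   node-linker=hoisted

# Dockerfile — slim Debian images ship without CA certificates,
# which breaks TLS to the Cloud SQL Proxy:
FROM node:22-slim
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates \
    && rm -rf /var/lib/apt/lists/*
```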

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;After 30 days of solo development with Claude Code + Gemini:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;24 database tables&lt;/strong&gt; with Row-Level Security on every single one&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;19 API routes&lt;/strong&gt;, ~60 endpoints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;15 AI tool integrations&lt;/strong&gt; (Claude Code, Codex, Cursor, Copilot Chat, Aider, and more)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;10 languages&lt;/strong&gt; (Japanese, English, Chinese, Korean, Hindi, Spanish, Portuguese, German, Turkish, French)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;473 tests&lt;/strong&gt; passing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3 deployed services&lt;/strong&gt; on Cloud Run&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;27 Playwright E2E tests&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~$70/month&lt;/strong&gt; infrastructure cost&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;0 lines of code written by a human&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;1 founder&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The PM role becomes more important, not less.&lt;/strong&gt; When AI writes all the code, the bottleneck shifts to decision-making. &lt;em&gt;What&lt;/em&gt; to build, &lt;em&gt;why&lt;/em&gt;, and &lt;em&gt;in what order&lt;/em&gt; — these questions don't go away. They become everything.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use competing AIs — and let them debate.&lt;/strong&gt; Claude Code is great at building. Gemini is great at evaluating. When they disagree, you get a richer perspective. The human's job is to be the tiebreaker.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Design docs matter more than ever.&lt;/strong&gt; I co-authored the project plan and technical design with Claude.ai before writing any code. These documents became the shared context that kept Claude Code on track. Without them, AI writes what &lt;em&gt;it&lt;/em&gt; thinks you want. With them, AI writes what you &lt;em&gt;actually&lt;/em&gt; want.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Audit everything that touches money or security.&lt;/strong&gt; AI generates plausible-looking code that can have subtle, critical bugs. Trust but verify.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ship before you're ready.&lt;/strong&gt; I spent too long on world-building and 10-language support when I should have been getting user feedback.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Try Nokos
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://nokos.ai" rel="noopener noreferrer"&gt;&lt;strong&gt;nokos.ai&lt;/strong&gt;&lt;/a&gt; — unlimited memos, AI chat, and auto-capture from 20+ AI tools. Free to start.&lt;/p&gt;

&lt;p&gt;If you use Claude Code, Cursor, or ChatGPT daily, try connecting them to Nokos. Your AI conversations are full of knowledge that vanishes after every session. Nokos catches it all.&lt;/p&gt;

&lt;p&gt;We're launching on Product Hunt soon — follow &lt;a href="https://x.com/tomoking1122" rel="noopener noreferrer"&gt;@tomoking1122&lt;/a&gt; to catch it.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;I want to hear from you:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Have you used AI to build an entire product? What was your experience?&lt;/li&gt;
&lt;li&gt;Would you trust an AI-built codebase in production?&lt;/li&gt;
&lt;li&gt;Is "Product Manager + AI" the future of solo SaaS?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Drop your thoughts in the comments. I read and reply to every one.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Building in public. Follow the journey on &lt;a href="https://x.com/tomoking1122" rel="noopener noreferrer"&gt;Twitter/X&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>buildinpublic</category>
      <category>webdev</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
