<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ryosuke Tsuji</title>
    <description>The latest articles on DEV Community by Ryosuke Tsuji (@ryosuke_tsuji_f08e20fdca1).</description>
    <link>https://dev.to/ryosuke_tsuji_f08e20fdca1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3843591%2F8b126f91-f561-4e6b-8492-814b18d680ec.jpg</url>
      <title>DEV Community: Ryosuke Tsuji</title>
      <link>https://dev.to/ryosuke_tsuji_f08e20fdca1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ryosuke_tsuji_f08e20fdca1"/>
    <language>en</language>
    <item>
      <title>Cutting Self-Built MCP Server Token Usage by 90% — The Parking Pattern</title>
      <dc:creator>Ryosuke Tsuji</dc:creator>
      <pubDate>Fri, 01 May 2026 01:10:27 +0000</pubDate>
      <link>https://dev.to/ryosuke_tsuji_f08e20fdca1/cutting-self-built-mcp-server-token-usage-by-90-the-parking-pattern-3e7o</link>
      <guid>https://dev.to/ryosuke_tsuji_f08e20fdca1/cutting-self-built-mcp-server-token-usage-by-90-the-parking-pattern-3e7o</guid>
      <description>&lt;p&gt;Hi, I'm &lt;a href="https://x.com/ryantsuji" rel="noopener noreferrer"&gt;Ryan&lt;/a&gt;, CTO at airCloset.&lt;/p&gt;

&lt;p&gt;In my previous posts I introduced &lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/we-built-17-mcp-servers-to-let-ai-run-our-internal-operations-3lk2"&gt;the full picture of our 17 internal MCP servers&lt;/a&gt;, &lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5"&gt;an MCP server that lets you search 991 internal tables in natural language&lt;/a&gt;, &lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/we-built-a-custom-graph-rag-to-let-ai-answer-did-that-initiative-actually-work-3oda"&gt;a Graph RAG MCP for measuring initiative impact&lt;/a&gt;, and &lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/bridging-i-want-to-build-and-i-want-to-publish-safely-for-non-engineers-sandbox-mcp-392a"&gt;the Sandbox MCP that lets non-engineers publish AI-built apps safely&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This time I want to share something that came out of running those in production — &lt;strong&gt;a small trick we use to cut token consumption on self-built MCP servers&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;The Annoyance: MCPs Eat More Tokens Than You'd Think&lt;/h2&gt;

&lt;p&gt;The first surprise when extending an AI agent with MCP is that &lt;strong&gt;token consumption is higher than expected&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;An MCP tool call is, at the end of the day, JSON-RPC over HTTP. Both the arguments the AI sends and the result the tool returns &lt;strong&gt;land directly in the conversation context&lt;/strong&gt;. If you implement things naively:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sending whole files as arguments → thousands of lines of source code stick to the context&lt;/li&gt;
&lt;li&gt;Returning all DB query rows → a multi-thousand-row × multi-column table sticks to the context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A single tool call can easily consume tens of thousands of tokens, putting the Claude Code session straight into compaction.&lt;/p&gt;

&lt;p&gt;It's worse than just inefficiency: above a certain row count, &lt;strong&gt;the response simply fails to come back at all&lt;/strong&gt; because it exceeds MCP's payload size limit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbgsnxmhtggzp1mryc5l6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbgsnxmhtggzp1mryc5l6.png" alt="Naive implementation bloats the context" width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we were ramping up our internal MCP fleet, this mismatch between payload size and context budget was reliably degrading the tool experience.&lt;/p&gt;

&lt;h2&gt;The Pattern: Park the Big Stuff Elsewhere, Pass Only a Key&lt;/h2&gt;

&lt;p&gt;The fix is embarrassingly simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Take the parts that tend to grow and move them off the MCP wire. Pass only a reference key (or URL) through MCP itself.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Both the request side and the response side benefit from the same idea.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Direction&lt;/th&gt;
&lt;th&gt;What to remove&lt;/th&gt;
&lt;th&gt;Where to park it&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Request&lt;/td&gt;
&lt;td&gt;Large files / source code&lt;/td&gt;
&lt;td&gt;GitHub, Drive, or any object store&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Response&lt;/td&gt;
&lt;td&gt;Large list data / query results&lt;/td&gt;
&lt;td&gt;Spreadsheet / GCS / BigQuery&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6781jcptumie8fs413y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6781jcptumie8fs413y.png" alt="The parking pattern" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Two examples from airCloset.&lt;/p&gt;

&lt;h2&gt;Example 1: Lighter Requests — Sandbox MCP × Self-Hosted Git Server&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/bridging-i-want-to-build-and-i-want-to-publish-safely-for-non-engineers-sandbox-mcp-392a"&gt;Last time&lt;/a&gt; I wrote about &lt;strong&gt;Sandbox MCP&lt;/strong&gt;, the platform that lets non-engineers publish AI-built apps internally. The first iteration was fully &lt;strong&gt;MCP tool-driven file uploads&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;sandbox_write_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;app_name: &lt;/span&gt;&lt;span class="s2"&gt;"todo-app"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;path: &lt;/span&gt;&lt;span class="s2"&gt;"index.html"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;content: &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;html&amp;gt;..."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sandbox_write_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;app_name: &lt;/span&gt;&lt;span class="s2"&gt;"todo-app"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;path: &lt;/span&gt;&lt;span class="s2"&gt;"app.js"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;content: &lt;/span&gt;&lt;span class="s2"&gt;"import ..."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sandbox_publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;app_name: &lt;/span&gt;&lt;span class="s2"&gt;"todo-app"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The moment apps got slightly bigger, this collapsed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Constant chunking&lt;/strong&gt;: hitting the payload size limit, the AI looped through "first half of file A → second half → first half of file B → ..."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tokens going up in flames&lt;/strong&gt;: full source code landed in the conversation context — a single deploy of a few-thousand-line app could burn tens of thousands of tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retries made it worse&lt;/strong&gt;: the AI would "verify after sending" by re-reading the same file with &lt;code&gt;sandbox_read_file&lt;/code&gt;. Write → read → write loops&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So we changed the contract: &lt;strong&gt;MCP only returns a URL; the actual content moves over git push&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. MCP returns a git URL — no payload involved&lt;/span&gt;
sandbox_init_repo&lt;span class="o"&gt;(&lt;/span&gt;app_name: &lt;span class="s2"&gt;"todo-app"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="c"&gt;# → https://mcp-sandbox.example.com/git/sandbox/ryan/todo-app.git&lt;/span&gt;

&lt;span class="c"&gt;# 2. AI runs git in the background — MCP isn't involved&lt;/span&gt;
git init &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; git add &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"init"&lt;/span&gt;
git remote add sandbox &amp;lt;returned URL&amp;gt;
git push sandbox main

&lt;span class="c"&gt;# 3. Only the deploy command goes through MCP&lt;/span&gt;
sandbox_publish&lt;span class="o"&gt;(&lt;/span&gt;app_name: &lt;span class="s2"&gt;"todo-app"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;git push gives us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;No file size limit&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Differential transfer — second-time pushes are fast&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Source code never lands in the MCP conversation context&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From the AI's point of view, it's just "I got handed a git URL; I push to it." The token economics are fundamentally different.&lt;/p&gt;

&lt;p&gt;By the way, we &lt;strong&gt;don't use GitHub Organizations&lt;/strong&gt; here. Issuing GitHub seats for every employee wasn't worth the cost or operational overhead, and we already had a self-hosted Git Server on GCE for a different purpose, so we just added one repo (&lt;code&gt;sandbox-apps&lt;/code&gt;). The "park" doesn't have to be something you build from scratch.&lt;/p&gt;

&lt;h2&gt;Example 2: Lighter Responses — DB Graph MCP × Spreadsheet&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5"&gt;DB Graph MCP&lt;/a&gt; is the MCP that lets us search and query 991 internal tables in natural language.&lt;/p&gt;

&lt;p&gt;The annoying-but-common case here is &lt;strong&gt;"give me everything"-style queries&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;service_main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;user&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2026-01-01'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the result is several thousand to tens of thousands of rows, you get either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A multi-million-token response that triggers immediate session compaction&lt;/li&gt;
&lt;li&gt;An MCP error because the payload exceeds the size limit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Or both. The "right" AI behavior is to do &lt;code&gt;LIMIT 100&lt;/code&gt; and analyze a sample — but if the user actually wanted &lt;strong&gt;the full list as a CSV&lt;/strong&gt;, that doesn't help them.&lt;/p&gt;

&lt;p&gt;So we built a &lt;strong&gt;"export to spreadsheet, return only the URL"&lt;/strong&gt; mode into DB Graph MCP. You can opt in explicitly, but the MCP &lt;strong&gt;also auto-falls back to this mode whenever the result exceeds a row-count threshold&lt;/strong&gt;. Even if the AI forgets to add a &lt;code&gt;LIMIT&lt;/code&gt; and the query is about to return 10,000 rows, the server decides "this is too big to return inline," exports to a spreadsheet, and hands back the URL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Conceptual call (the real shape is documented in the tool description)&lt;/span&gt;
&lt;span class="nf"&gt;sql_query_database&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SELECT * FROM ...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;spreadsheet&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;// ← explicit export mode&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// Without `output`, the server still auto-falls back over a threshold (e.g. 500 rows)&lt;/span&gt;
&lt;span class="nf"&gt;sql_query_database&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SELECT * FROM ...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="c1"&gt;// → server detects row count → spreadsheet export + URL response&lt;/span&gt;

&lt;span class="c1"&gt;// Either way, the response shape is the same&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://docs.google.com/spreadsheets/d/{...}/edit&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;12483&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;email&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;created_at&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...],&lt;/span&gt;
  &lt;span class="nx"&gt;exported_reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;row_count_exceeded&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;// set on auto-fallback&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response is just a URL plus metadata. The real data never enters the context. &lt;strong&gt;"Light if you're careful" becomes "light even when you're not"&lt;/strong&gt; — and that's what makes it feel safe in day-to-day operation.&lt;/p&gt;
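&lt;p&gt;To make the fallback concrete, here's a minimal sketch of the server-side decision. The helper name &lt;code&gt;exportToSpreadsheet&lt;/code&gt;, the threshold value, and the exact field names are illustrative assumptions, not our production implementation:&lt;/p&gt;

```typescript
// Minimal sketch of the auto-fallback decision (hypothetical helper names).
// Small results are returned inline; anything over the threshold is parked
// in a spreadsheet and only a URL plus metadata is returned.

const ROW_THRESHOLD = 500; // assumed cutoff, matching the example above

type QueryResult = { columns: string[]; rows: string[][] };

// Hypothetical exporter: a real server would write the rows to a sheet
// and return its document URL.
function exportToSpreadsheet(result: QueryResult): string {
  return "https://docs.google.com/spreadsheets/d/EXAMPLE/edit";
}

function buildToolResponse(result: QueryResult, output?: string) {
  const forceExport = output === "spreadsheet";
  if (forceExport || result.rows.length > ROW_THRESHOLD) {
    return {
      url: exportToSpreadsheet(result),
      rows: result.rows.length,
      columns: result.columns,
      exported_reason: forceExport ? "requested" : "row_count_exceeded",
    };
  }
  // Small result: safe to inline into the conversation context.
  return { columns: result.columns, rows: result.rows };
}
```

&lt;p&gt;The key property is that the branch lives on the server: the AI doesn't have to remember to be careful, because oversized results can never reach the context in the first place.&lt;/p&gt;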

&lt;p&gt;This pattern works because &lt;strong&gt;a surprisingly large fraction of real use cases are just "I want this data somewhere I can use it later"&lt;/strong&gt; — not "let's analyze this in chat with AI." Things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Save it to a spreadsheet I can stare at later&lt;/li&gt;
&lt;li&gt;Share it with another team&lt;/li&gt;
&lt;li&gt;VLOOKUP it against another sheet&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For those, MCP's job ends at "write the query, drop the result somewhere." That's enough.&lt;/p&gt;

&lt;p&gt;If the user genuinely does want AI-side analysis, you do still need the data in context. The standard workflow becomes a two-step: &lt;code&gt;LIMIT 100&lt;/code&gt; for sample analysis, then &lt;code&gt;output: spreadsheet&lt;/code&gt; for the full export once the conclusion is clear.&lt;/p&gt;

&lt;h2&gt;How Much Did It Save?&lt;/h2&gt;

&lt;p&gt;Every MCP we run logs every tool call. After rolling these patterns out, &lt;strong&gt;total token consumption across all tools dropped 70–90%&lt;/strong&gt;.&lt;/p&gt;
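&lt;p&gt;For context on how numbers like that can be measured: a common heuristic is to estimate tokens from serialized payload size (roughly four characters per token for English text). The sketch below illustrates that approach; it is an assumption, not our exact logging code:&lt;/p&gt;

```typescript
// Rough per-call token estimate from serialized payload size.
// chars/4 is a common approximation for English text, not an exact count.
function estimateTokens(payload: unknown): number {
  return Math.ceil(JSON.stringify(payload).length / 4);
}

// Hypothetical shape of one tool-call log record.
function logToolCall(tool: string, args: unknown, result: unknown) {
  return {
    tool,
    request_tokens: estimateTokens(args),
    response_tokens: estimateTokens(result),
  };
}
```

&lt;p&gt;The estimate is rough, but summing it per tool before and after a change is enough to make a shift of this size visible.&lt;/p&gt;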

&lt;h2&gt;Bonus: Google Workspace OAuth Pairs Beautifully With This&lt;/h2&gt;

&lt;p&gt;A note on choosing where to "park" data: &lt;strong&gt;if your MCP authenticates via Google Workspace OAuth, this whole design becomes much easier&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The reason is that you get two things from a single OAuth flow — &lt;strong&gt;two birds with one stone&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Authentication for MCP itself&lt;/strong&gt; — figuring out who's using the tool&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authorization for Workspace apps&lt;/strong&gt; — scoped access to Spreadsheet / Drive / Gmail / Calendar&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzdefw7gcv24d3y6doi6y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzdefw7gcv24d3y6doi6y.png" alt="Two birds with one stone" width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the user has logged into the MCP, you don't have to ask for any additional permissions to write to the park location. Which means you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;the operating user's own permissions&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;To save files to &lt;strong&gt;that user's My Drive&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Without the MCP itself owning a write-anywhere service account&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Files end up in the user's drive, not on a shared service account. "Accidentally world-readable" or "visible to people who shouldn't see it" stops being a realistic accident — it's structurally prevented.&lt;/p&gt;

&lt;p&gt;You also dodge the operational cost of issuing a separate GCP service account, storing its key safely, and managing its IAM policy out of band. The safety property genuinely comes for free.&lt;/p&gt;
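&lt;p&gt;As a sketch of what "the operating user's own permissions" looks like in practice, here's an illustrative request builder for the Google Sheets REST API's spreadsheet-create endpoint, authorized with the user's own OAuth access token (scope setup and the full export flow are simplified):&lt;/p&gt;

```typescript
// Sketch: create a spreadsheet in the operating user's own Drive by calling
// the Google Sheets REST API with that user's OAuth access token (the one
// obtained when they logged in to the MCP). No service account is involved.
function buildCreateSheetRequest(userAccessToken: string, title: string) {
  return {
    // Real Sheets API endpoint for creating a spreadsheet.
    url: "https://sheets.googleapis.com/v4/spreadsheets",
    method: "POST",
    headers: {
      // The user's own token: the file lands in their My Drive, under their ACLs.
      Authorization: "Bearer " + userAccessToken,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ properties: { title } }),
  };
}
```

&lt;p&gt;Because the &lt;code&gt;Authorization&lt;/code&gt; header carries the user's token rather than a service-account credential, the created file inherits that user's normal sharing defaults.&lt;/p&gt;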

&lt;p&gt;There's one catch though:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The AI agent has to be able to read the spreadsheet URL it got back.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Returning a URL alone doesn't help the AI access the underlying data. Claude Code's stock tooling can't read a Google Spreadsheet directly, so you need a separate Workspace-operating MCP.&lt;/p&gt;

&lt;p&gt;At airCloset we run &lt;strong&gt;a dedicated MCP that wraps the Google Workspace APIs&lt;/strong&gt; (Drive / Sheets / Gmail / Calendar). Combined with the export pattern above, it gives us a clean flow: "drop results into a spreadsheet → call into the Workspace MCP later if the AI wants to actually read them."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DB Graph MCP → exports to Spreadsheet → returns URL
                                          ↓
              Workspace MCP ← invoked when the AI decides it needs to read the data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From the user's side, this naturally produces the rhythm of "dump it into a spreadsheet first, ask AI to analyze only when needed."&lt;/p&gt;

&lt;h2&gt;Wrap-Up&lt;/h2&gt;

&lt;p&gt;A few small tricks for keeping self-built MCP server token consumption under control:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Move the parts that tend to grow off the MCP wire&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Park them somewhere — Git server, Spreadsheet, GCS — and only pass keys/URLs through MCP&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pick a park that pairs well with Google Workspace OAuth — you get safety almost for free&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;If you want the AI to read parked data later, run a Workspace-style MCP alongside&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's an unflashy design move, but &lt;strong&gt;the difference in MCP usability before and after is dramatic&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you're running self-built MCP servers internally and feeling the token squeeze, give it a try.&lt;/p&gt;




&lt;p&gt;At airCloset, we're looking for engineers who want to build a new development experience together with AI. If you're interested, please check out our careers page at &lt;a href="https://corp.air-closet.com/recruiting/developers/" rel="noopener noreferrer"&gt;airCloset Quest&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>llm</category>
      <category>mcp</category>
      <category>performance</category>
    </item>
    <item>
      <title>Bridging 'I Want to Build' and 'I Want to Publish Safely' for Non-Engineers — Sandbox MCP</title>
      <dc:creator>Ryosuke Tsuji</dc:creator>
      <pubDate>Mon, 27 Apr 2026 23:04:57 +0000</pubDate>
      <link>https://dev.to/ryosuke_tsuji_f08e20fdca1/bridging-i-want-to-build-and-i-want-to-publish-safely-for-non-engineers-sandbox-mcp-392a</link>
      <guid>https://dev.to/ryosuke_tsuji_f08e20fdca1/bridging-i-want-to-build-and-i-want-to-publish-safely-for-non-engineers-sandbox-mcp-392a</guid>
      <description>&lt;p&gt;Hi, I'm &lt;a href="https://x.com/ryantsuji" rel="noopener noreferrer"&gt;Ryan&lt;/a&gt;, CTO at airCloset.&lt;/p&gt;

&lt;p&gt;In my previous posts, I've introduced our internal MCP servers: &lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5"&gt;an MCP server for natural-language search across all our databases&lt;/a&gt;, &lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/we-built-17-mcp-servers-to-let-ai-run-our-internal-operations-3lk2"&gt;the full picture of our 17 internal MCP servers&lt;/a&gt;, and &lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/we-built-a-custom-graph-rag-to-let-ai-answer-did-that-initiative-actually-work-3oda"&gt;a custom Graph RAG that lets AI answer "Did that initiative actually work?"&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This time I'm covering something a bit different: &lt;strong&gt;Sandbox MCP&lt;/strong&gt; — a platform that lets non-engineer employees deploy apps they built with AI to a safe, internal-only URL &lt;strong&gt;with a single command&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The pitch is simple: "If Claude Code can build an app, why not publish it directly?" The hard part is making "directly" mean &lt;strong&gt;safely&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;The Problem: Building Got Easy. Publishing Safely Did Not.&lt;/h2&gt;

&lt;p&gt;The arrival of Claude Code and other AI coding agents is reshaping how work happens inside our company.&lt;/p&gt;

&lt;p&gt;"Building an app" used to be an engineer's job. You had to do requirements, design, frontend, backend, database, CI/CD, production deploy — all in one head.&lt;/p&gt;

&lt;p&gt;Now PMs, designers, and customer-success folks are talking to Claude Code with "build me a screen that does X" and getting working mockups on the spot. Inside airCloset we're seeing more and more:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mockups for new project proposals&lt;/li&gt;
&lt;li&gt;Interactive reports that visualize research findings&lt;/li&gt;
&lt;li&gt;KPI dashboards used only by a single team&lt;/li&gt;
&lt;li&gt;Small tools for everyday operational improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These &lt;strong&gt;non-engineer outputs&lt;/strong&gt; are growing fast. People are even saying "let's just run with this in production for a bit."&lt;/p&gt;

&lt;p&gt;That's where the wall hits.&lt;/p&gt;

&lt;h3&gt;Easy to Build. Hard to Publish Safely.&lt;/h3&gt;

&lt;p&gt;Anyone can build something that runs locally now. Spin up &lt;code&gt;python -m http.server 8000&lt;/code&gt;, view it on your Mac — five minutes max.&lt;/p&gt;

&lt;p&gt;But the moment it becomes "I want my team to see this" or "I want others to actually use it," the difficulty curve goes vertical.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Where do you run it?&lt;/strong&gt; Cloud means GCP/AWS accounts, IAM, billing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What URL?&lt;/strong&gt; Domain registration, DNS, SSL certificates, Cloudflare.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What about auth?&lt;/strong&gt; If it touches confidential info, you need employees-only. OAuth implementation, domain restriction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;And the data?&lt;/strong&gt; Is localStorage enough, or do you need a real DB? If a DB, who manages the password?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How do you deploy?&lt;/strong&gt; Can you write a Dockerfile? Cloud Run config, env vars, service accounts, IAM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What about security?&lt;/strong&gt; What if the AI-written code has a vulnerability? An auth bypass?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You &lt;em&gt;could&lt;/em&gt; "let the AI write all of it." But then the outcome is &lt;strong&gt;left entirely to the AI&lt;/strong&gt;: Cloudflare misconfigured and exposed to the world, auth bypassed, a service account with production database write access slipped into the code. The more code AI writes, the higher the risk of these accidents.&lt;/p&gt;

&lt;p&gt;When a non-engineer says "I want to try building this," we need to clearly separate &lt;strong&gt;what the builder is responsible for&lt;/strong&gt; from &lt;strong&gt;what the platform must guarantee by default&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;There's also a quieter problem.&lt;/p&gt;

&lt;h3&gt;UI Inconsistency and Data Sprawl&lt;/h3&gt;

&lt;p&gt;When non-engineers build apps independently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One person uses React, another Vue, another raw HTML&lt;/li&gt;
&lt;li&gt;Buttons look and behave differently&lt;/li&gt;
&lt;li&gt;Some store data in localStorage, some in Google Sheets, some in Firebase&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After 10 or 20 such apps, internal tooling becomes &lt;strong&gt;chaos&lt;/strong&gt;. Users wonder "wait, who built this one?" and "why does this button work differently?"&lt;/p&gt;

&lt;p&gt;Even for internal tools, you need &lt;strong&gt;a baseline of consistency&lt;/strong&gt; — both in design and in where data lives.&lt;/p&gt;

&lt;h2&gt;Sandbox MCP — Standing Between "Build" and "Publish"&lt;/h2&gt;

&lt;p&gt;That's why we built &lt;strong&gt;Sandbox MCP&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A non-engineer just says "build this" to Claude Code, and:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;An app is generated using a unified UI Kit&lt;/li&gt;
&lt;li&gt;They can verify it works locally&lt;/li&gt;
&lt;li&gt;A single command deploys it to &lt;code&gt;https://sbx-{nickname}--{app-name}.example.com/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Self-hosted OAuth on the Cloudflare Worker enforces internal-only access&lt;/li&gt;
&lt;li&gt;Data is stored, isolated, in a dedicated Firestore database&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;— all of this completes within a single chat session with the AI.&lt;br&gt;
The builder is only responsible for &lt;strong&gt;functionality&lt;/strong&gt;. &lt;strong&gt;Security, data isolation, domain &amp;amp; SSL, authentication&lt;/strong&gt; are all handled by the Sandbox MCP platform by default.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxv3llcbuytnfvmog7bxr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxv3llcbuytnfvmog7bxr.png" alt="System Overview" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Scale&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MCP tools&lt;/td&gt;
&lt;td&gt;10 (publish, status, schedule, list, delete, write_file, read_file, list_files, init_repo, unschedule)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supported runtimes&lt;/td&gt;
&lt;td&gt;Python (Flask + gunicorn), Node.js, static HTML/SPA, custom Dockerfile&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;URL&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;sbx-{nickname}--{app-name}.example.com&lt;/code&gt; (covered by Universal SSL, no ACM)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Authentication&lt;/td&gt;
&lt;td&gt;Self-hosted OAuth on a Cloudflare Worker (Google Workspace)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data&lt;/td&gt;
&lt;td&gt;Firestore named DB &lt;code&gt;sandbox&lt;/code&gt;, namespaced per nickname × app&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;td&gt;Self-hosted Git Server (GCE) + Cloud Run + Cloudflare Worker + KV&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deploy time&lt;/td&gt;
&lt;td&gt;Typically 2–5 minutes (git push to public URL)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Let's walk through the internals.&lt;/p&gt;
&lt;h2&gt;What It Does — Web, API, DB, and Cron&lt;/h2&gt;

&lt;p&gt;Sandbox MCP supports four app shapes so it can cover almost any "I want to ship something internally" use case.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Detected by&lt;/th&gt;
&lt;th&gt;Use cases&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Python&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;.py&lt;/code&gt; files present&lt;/td&gt;
&lt;td&gt;Flask + gunicorn for APIs, analysis tools with a UI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Node.js&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;package.json&lt;/code&gt; present&lt;/td&gt;
&lt;td&gt;Express APIs + UI; Bun also works&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Static HTML/SPA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;only &lt;code&gt;.html&lt;/code&gt; files (no Python/Node)&lt;/td&gt;
&lt;td&gt;nginx-served, React/Vue dist supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Custom&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;includes a &lt;code&gt;Dockerfile&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Any runtime — Go, Rust, Bun, anything&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Pick any of these and &lt;code&gt;sandbox_publish&lt;/code&gt; deploys it with no extra config.&lt;/p&gt;
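&lt;p&gt;The "Detected by" rules above can be sketched as a simple priority check (illustrative only; the real detector's ordering and edge cases may differ):&lt;/p&gt;

```typescript
// Illustrative runtime detection following the table's rules:
// Dockerfile wins, then Python, then Node.js, then static HTML.
function detectRuntime(files: string[]): string {
  if (files.includes("Dockerfile")) return "custom";
  if (files.some((f) => f.endsWith(".py"))) return "python";
  if (files.includes("package.json")) return "nodejs";
  if (files.some((f) => f.endsWith(".html"))) return "static";
  return "unknown";
}
```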

&lt;p&gt;There's also &lt;code&gt;sandbox_schedule&lt;/code&gt; for &lt;strong&gt;scheduled batch apps via Cloud Scheduler&lt;/strong&gt;. Things like "post a risk summary to Slack at 9 AM every morning" become a single tool call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;sandbox_schedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="ss"&gt;app_name: &lt;/span&gt;&lt;span class="s2"&gt;"risk-alert"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;schedule: &lt;/span&gt;&lt;span class="s2"&gt;"0 9 * * *"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;path: &lt;/span&gt;&lt;span class="s2"&gt;"/api/cron"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;timezone: &lt;/span&gt;&lt;span class="s2"&gt;"Asia/Tokyo"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cloud Scheduler now hits the app's &lt;code&gt;/api/cron&lt;/code&gt; every morning at 9. No need to open the scheduler UI or translate cron syntax into IaC.&lt;/p&gt;
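&lt;p&gt;Under the hood this boils down to creating a single Cloud Scheduler job. A minimal sketch of the payload (the &lt;code&gt;schedule&lt;/code&gt;, &lt;code&gt;timeZone&lt;/code&gt;, and &lt;code&gt;httpTarget&lt;/code&gt; field names follow the Cloud Scheduler REST API; the project, region, nickname, and helper name are placeholders, not our actual server code):&lt;/p&gt;

```javascript
// Sketch: turn sandbox_schedule arguments into a Cloud Scheduler job
// payload. Field names (schedule, timeZone, httpTarget) follow the
// Cloud Scheduler REST API; project/region/nickname are placeholders.
function buildSchedulerJob(appName, schedule, path, timezone) {
  const base = 'https://sbx-ryan--' + appName + '.example.com';
  return {
    name: 'projects/my-project/locations/asia-northeast1/jobs/' + appName,
    schedule: schedule,          // standard cron syntax, e.g. "0 9 * * *"
    timeZone: timezone,
    httpTarget: {
      uri: base + path,          // the app endpoint the scheduler will hit
      httpMethod: 'GET'
    }
  };
}

const job = buildSchedulerJob('risk-alert', '0 9 * * *', '/api/cron', 'Asia/Tokyo');
```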

&lt;h2&gt;
  
  
  Frontend — Unified Design via sandbox-ui-kit
&lt;/h2&gt;

&lt;p&gt;Even apps built by non-engineers should feel &lt;strong&gt;consistent as a tool family&lt;/strong&gt;. That's the job of the &lt;code&gt;sandbox-ui-kit&lt;/code&gt; repo.&lt;/p&gt;

&lt;p&gt;It lives on &lt;code&gt;mcp-sandbox.example.com/git&lt;/code&gt; and provides:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Contents&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox-ui.css&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Design tokens + glass-morphism component styles (dark/light)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox-ui.js&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Theme switcher, modals, toasts, generic JS utilities&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox-db.js&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SandboxDB client SDK (more below)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;index.html&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Storybook-style component catalog&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;README.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Full API documentation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key: it's designed &lt;strong&gt;for AI to read and use&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;sandbox_publish&lt;/code&gt; tool description literally says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When building an app, first read README.md with read_file and use the UI Kit.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When Claude Code builds a new app, it &lt;code&gt;read_file&lt;/code&gt;s this README, learns which CSS/JS to load and which component names to use, then generates code accordingly. &lt;strong&gt;Instead of a human walking the AI through UI guidelines, we centralized the "how to use" in one place targeted at the AI.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The result: apps built by anyone (with AI) end up with consistent buttons, modals, and forms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Backend — Auto-Generated Dockerfile + Cloud Run
&lt;/h2&gt;

&lt;p&gt;"I don't want to write Docker." "I don't want to think about runtime configuration." Classic non-engineer requests.&lt;/p&gt;

&lt;p&gt;Sandbox MCP &lt;strong&gt;inspects the source files and generates a Dockerfile automatically&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// apps/mcp/git-server/src/sandbox/tools.ts&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;hasPy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;dockerfile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generatePythonDockerfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;hasRequirements&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// Auto-create requirements.txt if missing&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;hasRequirements&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;writeFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;requirements.txt&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;flask&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;gunicorn&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;hasPackageJson&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;dockerfile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generateNodeDockerfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;hasHtml&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;dockerfile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generateStaticDockerfile&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, a Python app gets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.12-slim&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; PORT=8080&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["python", "-u", "$(ls *.py | head -1)"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;requirements.txt&lt;/code&gt; is missing, &lt;code&gt;flask&lt;/code&gt; + &lt;code&gt;gunicorn&lt;/code&gt; get added automatically. AI can write &lt;code&gt;from flask import Flask&lt;/code&gt; and the dependencies will resolve — no missing-package surprises.&lt;/p&gt;

&lt;p&gt;Deployment uses &lt;code&gt;gcloud run deploy --source&lt;/code&gt;, with Cloud Build handling the image build. App authors &lt;strong&gt;can&lt;/strong&gt; write a &lt;code&gt;Dockerfile&lt;/code&gt;, but they don't have to: without one you get the standard build, and with one you can customize it. Friendly to both non-engineers and engineers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvy5mw049o12upk2mldy0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvy5mw049o12upk2mldy0.png" alt="Deploy Flow" width="800" height="286"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Database — Transparent Fallback Between localStorage and Firestore
&lt;/h2&gt;

&lt;p&gt;"I want to save data. I don't want to set up a database."&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;SandboxDB SDK&lt;/strong&gt; handles that. The same code uses &lt;code&gt;localStorage&lt;/code&gt; locally and Firestore once deployed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://mcp-sandbox.example.com/api/db/sdk.js"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"module"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SandboxDB&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;googleOAuthAccessToken&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// Save (storage location auto-detected from hostname)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;items&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;test&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// List&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;items&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// Get / update / delete&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;items&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;updated&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;items&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="k"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The SDK internals:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_isLocal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hostname&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;localhost&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
              &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hostname&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;127.0.0.1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_isLocal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_localAdd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// localStorage&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_req&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;                  &lt;span class="c1"&gt;// Firestore REST API&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When running on &lt;code&gt;localhost&lt;/code&gt;, it uses localStorage. The moment it's deployed under &lt;code&gt;sbx-*.example.com&lt;/code&gt;, it switches to Firestore. &lt;strong&gt;No code changes required.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This dramatically improves the experience of building apps with AI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local: no network, no auth, all features work&lt;/li&gt;
&lt;li&gt;Deployed: same code runs, data is properly persisted&lt;/li&gt;
&lt;li&gt;Development data never leaks into systems outside Sandbox (it physically can't reach them)&lt;/li&gt;
&lt;/ul&gt;
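&lt;p&gt;For illustration, the local branch might look something like this. It's a minimal sketch under assumed names (&lt;code&gt;localAdd&lt;/code&gt; and the key scheme are ours, not the real SDK), storing each collection as one JSON array:&lt;/p&gt;

```javascript
// Sketch of the localStorage-backed add(): each collection is kept as a
// JSON array under a single key. `storage` is any localStorage-like
// object (getItem/setItem), so the sketch also runs outside a browser.
function localAdd(storage, appName, collection, data) {
  const key = 'sandboxdb:' + appName + ':' + collection;
  const items = JSON.parse(storage.getItem(key) || '[]');
  const doc = Object.assign({}, data, {
    id: 'local-' + Date.now().toString(36) + '-' + items.length,
    _createdAt: new Date().toISOString()   // mirrors the SDK's auto fields
  });
  items.push(doc);
  storage.setItem(key, JSON.stringify(items));
  return { id: doc.id };                   // same shape as the Firestore path
}
```

&lt;p&gt;The deployed branch replaces this with the Firestore REST call shown above; the calling code never changes.&lt;/p&gt;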

&lt;h3&gt;
  
  
  Firestore Namespace Isolation
&lt;/h3&gt;

&lt;p&gt;Once deployed, data paths are strictly isolated:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sandbox_data/{nickname}--{app}/{collection}/{docId}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;nickname&lt;/code&gt;: user identifier resolved via OAuth&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;app&lt;/code&gt;: Sandbox app name&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;_createdAt&lt;/code&gt; / &lt;code&gt;_updatedAt&lt;/code&gt;: auto-attached by the SDK&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Data from different apps is mutually unreachable: even two apps built by the same person live under different paths.&lt;/p&gt;

&lt;p&gt;The most important point: &lt;strong&gt;we use a dedicated &lt;code&gt;sandbox&lt;/code&gt; named database&lt;/strong&gt;. It's a completely separate Firestore database from the &lt;code&gt;(default)&lt;/code&gt; DB used by other internal systems. No matter how badly an app's code misbehaves, it can never touch data outside Sandbox.&lt;/p&gt;
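&lt;p&gt;The path construction itself is simple enough to sketch. The &lt;code&gt;sandbox_data/{nickname}--{app}&lt;/code&gt; layout comes straight from above; the traversal guard and function name are our own illustrative additions:&lt;/p&gt;

```javascript
// Sketch: build the isolated Firestore document path. The
// sandbox_data/{nickname}--{app}/{collection}/{docId} layout is as
// described above; the '/' guard is an illustrative addition that
// keeps one namespace from escaping into another.
function buildDocPath(nickname, app, collection, docId) {
  const parts = [nickname, app, collection, docId];
  for (const p of parts) {
    if (p.indexOf('/') !== -1) { throw new Error('invalid segment: ' + p); }
  }
  return 'sandbox_data/' + nickname + '--' + app + '/' + collection + '/' + docId;
}
```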

&lt;h2&gt;
  
  
  Infrastructure — Wildcard DNS + Cloudflare Worker + Self-Hosted Git Server
&lt;/h2&gt;

&lt;p&gt;Now for the infrastructure highlights.&lt;/p&gt;

&lt;h3&gt;
  
  
  How URLs Are Determined
&lt;/h3&gt;

&lt;p&gt;The public URL takes the form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://sbx-{nickname}--{app-name}.example.com/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;nickname&lt;/code&gt; is &lt;strong&gt;automatically pulled from the MCP OAuth session&lt;/strong&gt;. When a user logs into Sandbox MCP via Google, the email is looked up in a Firestore &lt;code&gt;users&lt;/code&gt; collection to resolve the nickname. Users never have to repeat "I am ryan" each time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;r.tsuji@air-closet.com → users[r.tsuji@air-closet.com].nickname → "ryan"
                                                       ↓
                                  sbx-ryan--todo-app.example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: The &lt;code&gt;users&lt;/code&gt; collection is &lt;strong&gt;kept in sync from a separate internal pipeline&lt;/strong&gt; (a daily batch that pulls from our HR system and Google Workspace directory). Sandbox MCP just reads from it — no need to maintain its own employee master.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The benefit: you can tell &lt;strong&gt;whose app it is&lt;/strong&gt; just by reading the URL. "Go look at ryan's todo-app" and &lt;code&gt;sbx-ryan--todo-app&lt;/code&gt; say the same thing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Instant Publishing via Cloudflare Worker
&lt;/h3&gt;

&lt;p&gt;Normally, publishing a new subdomain requires:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Adding A/CNAME DNS records&lt;/li&gt;
&lt;li&gt;Issuing an SSL certificate (15–30 minute wait with ACM or Let's Encrypt)&lt;/li&gt;
&lt;li&gt;Configuring a load balancer or DomainMapping&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Sandbox MCP skips all of this with a &lt;strong&gt;Cloudflare Edge Router Worker&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl06g6wirc5z4xsgqtoz9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl06g6wirc5z4xsgqtoz9.png" alt="URL Routing" width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;DNS is fixed as &lt;code&gt;*.example.com&lt;/code&gt; &lt;strong&gt;wildcard&lt;/strong&gt; + Cloudflare proxy, with Universal SSL automatically covering every subdomain. The Cloudflare Worker receives all &lt;code&gt;*.example.com/*&lt;/code&gt; traffic and routes by subdomain.&lt;/p&gt;

&lt;p&gt;The logic is three-tier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// apps/worker/edge-router/src/index.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// ① sbx-* prefix → Sandbox routing&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sandboxSub&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extractSandboxSubdomain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sandboxSub&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;handleSandboxRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sandboxSub&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// ② KV route:{subdomain} registered → Cloud Run proxy&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;subdomain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extractSubdomain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;subdomain&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;proxyResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;handleCloudRunProxy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;subdomain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;proxyResponse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;proxyResponse&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// ③ Otherwise → fetch(request) passthrough&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;sandbox_publish&lt;/code&gt; finishes, all it does is &lt;strong&gt;write a &lt;code&gt;route:{nickname}/{app}&lt;/code&gt; key into Cloudflare KV&lt;/strong&gt;. That single write makes the new subdomain routable instantly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;kvPut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`route:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;nickname&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;appName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;serviceUrl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No DNS setup. No waiting for SSL issuance. No IaC deploy. Everything completes within the MCP tool execution.&lt;/p&gt;
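&lt;p&gt;The Worker-side half of that contract can be sketched as a pure function: map the request hostname to the KV key the publish step wrote. The &lt;code&gt;sbx-&lt;/code&gt; prefix and &lt;code&gt;route:{nickname}/{app}&lt;/code&gt; key format are as described above; the parsing itself is a hypothetical reconstruction, not the actual Worker code:&lt;/p&gt;

```javascript
// Sketch: derive the KV routing key from a sandbox hostname.
// 'sbx-ryan--todo-app.example.com' maps to 'route:ryan/todo-app';
// anything that is not an sbx-* subdomain yields null (passthrough).
function routeKeyFor(hostname) {
  const sub = hostname.split('.')[0];
  if (sub.indexOf('sbx-') !== 0) { return null; }
  const rest = sub.slice(4);            // drop the 'sbx-' prefix
  const sep = rest.indexOf('--');       // nickname/app separator
  if (sep === -1) { return null; }
  return 'route:' + rest.slice(0, sep) + '/' + rest.slice(sep + 2);
}
```

&lt;p&gt;A lookup along the lines of &lt;code&gt;kvGet(routeKeyFor(url.hostname))&lt;/code&gt; then yields the Cloud Run URL to proxy to.&lt;/p&gt;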

&lt;h3&gt;
  
  
  Self-Hosted Git Server for Larger Apps
&lt;/h3&gt;

&lt;p&gt;This setup actually started out &lt;strong&gt;without git at all&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Since the primary users were going to be PMs and CS folks, we figured "git concepts are too high a bar — let's keep everything inside MCP tools." Write files via &lt;code&gt;sandbox_write_file&lt;/code&gt;, deploy via &lt;code&gt;sandbox_publish&lt;/code&gt;. That should be enough, we thought.&lt;/p&gt;

&lt;p&gt;The approach hit two walls quickly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wall 1: Constant chunking&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;MCP tool calls travel over HTTP, with a payload size limit. React/Vue build bundles, SPAs with images, business tools with dozens of files — they don't fit in a single call. We added an &lt;code&gt;append&lt;/code&gt; mode to &lt;code&gt;sandbox_write_file&lt;/code&gt; for chunking, but every "first half of file A → second half of file A → first half of file B → ..." sequence triggered error recovery and retries. Deployments became flaky.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wall 2: Massive token consumption&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This was the real killer. When you tell the AI "deploy this app," it sends the entire source as MCP tool arguments. &lt;strong&gt;The file contents land in the conversation context&lt;/strong&gt;, and a few-thousand-line app burns through tokens fast. A single deploy easily consumed tens of thousands of tokens, and Claude Code sessions hit compaction quickly.&lt;/p&gt;

&lt;p&gt;Worse, the AI tends to "verify after sending" — re-reading the same file via &lt;code&gt;sandbox_read_file&lt;/code&gt;. &lt;strong&gt;Write → read → write loops, with tokens going up in flames.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So we pivoted to &lt;strong&gt;using git push as well&lt;/strong&gt;. With git push:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No file size limit&lt;/li&gt;
&lt;li&gt;Differential transfer — second-time pushes are fast&lt;/li&gt;
&lt;li&gt;Source code stays out of the MCP conversation context (no AI tokens consumed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We never expected business-side employees to run &lt;code&gt;git push&lt;/code&gt; by hand. But if &lt;strong&gt;Claude Code runs git commands in the background&lt;/strong&gt;, it's not a barrier. The user just says "build this and publish it" — the AI runs &lt;code&gt;git init &amp;amp;&amp;amp; git push&lt;/code&gt; on its own when needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why a Self-Hosted Git Server?
&lt;/h3&gt;

&lt;p&gt;Once we adopted git push, the next question was: where do we host the repos? We considered using GitHub Organizations but ruled it out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Issuing and managing GitHub accounts for every employee&lt;/strong&gt; — including non-engineers — wasn't worth the cost or the operational overhead. Paying for a GitHub seat just to ship one app is overkill.&lt;/p&gt;

&lt;p&gt;Fortunately, we already operated &lt;strong&gt;a self-hosted Git Server on GCE for a different purpose&lt;/strong&gt;: hosting an internal "read-only Git MCP for code investigation." A VM with repositories cloned under &lt;code&gt;/mnt/repos/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We just added a &lt;strong&gt;Git Smart HTTP Protocol&lt;/strong&gt; endpoint and one new repo (&lt;code&gt;sandbox-apps&lt;/code&gt;) to it. The VM was already running, so the marginal cost was near zero. Authentication piggybacks on the existing Google OAuth setup. Repository management is just OS directory operations. Borrowing space on the existing internal Git Server was vastly simpler than spinning up new infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Actual Usage Flow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Get the git URL from the MCP tool (nickname is automatic)&lt;/span&gt;
sandbox_init_repo&lt;span class="o"&gt;(&lt;/span&gt;app_name: &lt;span class="s2"&gt;"my-app"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="c"&gt;# → https://mcp-sandbox.example.com/git/sandbox/ryan/my-app.git&lt;/span&gt;

&lt;span class="c"&gt;# 2. Local commit (the AI does this in the background)&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/my-app/
git init &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; git add &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"init"&lt;/span&gt;
git remote add sandbox &amp;lt;returned URL&amp;gt;

&lt;span class="c"&gt;# 3. Push&lt;/span&gt;
git push sandbox main
&lt;span class="c"&gt;# Username: oauth2accesstoken&lt;/span&gt;
&lt;span class="c"&gt;# Password: $(gcloud auth print-access-token)&lt;/span&gt;

&lt;span class="c"&gt;# 4. Deploy&lt;/span&gt;
sandbox_publish&lt;span class="o"&gt;(&lt;/span&gt;app_name: &lt;span class="s2"&gt;"my-app"&lt;/span&gt;, description: &lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Auth uses a Google OAuth token as the Basic Auth password (same pattern as GCP Source Repos). Only &lt;code&gt;@air-closet.com&lt;/code&gt; accounts pass. No GitHub account required — any employee can push.&lt;/p&gt;

&lt;p&gt;The remote repo is configured with &lt;code&gt;receive.denyCurrentBranch=updateInstead&lt;/code&gt;, so the working tree updates server-side on push. Cloud Run uses that directory as &lt;code&gt;--source&lt;/code&gt;, so there's no extra step between push and publish.&lt;/p&gt;

&lt;p&gt;For small apps (a few files, hundreds of lines each), &lt;code&gt;sandbox_write_file&lt;/code&gt; still works fine. &lt;strong&gt;Switch between MCP-only and git push depending on app size.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Security — Four Independent Gates
&lt;/h2&gt;

&lt;p&gt;That covered the "convenient to build" side. Now the &lt;strong&gt;"safe to publish"&lt;/strong&gt; side.&lt;/p&gt;

&lt;p&gt;As I noted at the start, exposing AI-generated code in front of users is risky. So Sandbox MCP layers four independent safety mechanisms that &lt;strong&gt;don't depend on the app's own implementation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp111zuy4cxhyinp13byt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp111zuy4cxhyinp13byt.png" alt="Security Layers" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ① Public-Facing Gate — Self-Hosted OAuth on the Cloudflare Worker
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;sbx-*.example.com&lt;/code&gt; sits behind a &lt;strong&gt;self-hosted OAuth gate built into the same Cloudflare Worker&lt;/strong&gt; that handles routing. When someone visits, the Worker first checks the &lt;code&gt;cortex_session&lt;/code&gt; cookie; if it's missing or invalid, it redirects to a Google Workspace SSO entry point (&lt;code&gt;auth.example.com/__edge/auth/start&lt;/code&gt;). Without an &lt;code&gt;@air-closet.com&lt;/code&gt; account, requests never reach Cloud Run.&lt;/p&gt;

&lt;p&gt;This is &lt;strong&gt;independent of the app's implementation&lt;/strong&gt;. Even if the AI didn't write a single line of auth code, the Worker stops the request first. "Accidentally public" is physically impossible.&lt;/p&gt;
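&lt;p&gt;As a rough sketch (the session shape and helper names here are my simplification for this post, not our actual Worker code), the gate boils down to:&lt;/p&gt;

```typescript
// Hedged sketch of the Worker-level gate. `cortex_session` and the SSO
// entry point come from the setup described above; the session shape and
// lookup function are illustrative assumptions.
const SSO_START = "https://auth.example.com/__edge/auth/start";

type Session = { email: string } | null;

// Pull one cookie's value out of a Cookie header.
function getCookie(header: string | null, name: string): string | null {
  if (!header) return null;
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq !== -1 && part.slice(0, eq).trim() === name) return part.slice(eq + 1);
  }
  return null;
}

// Only @air-closet.com accounts pass; anything else bounces to SSO.
async function gate(
  cookieHeader: string | null,
  lookupSession: (id: string) => Promise<Session>,
): Promise<{ allow: boolean; redirect?: string; email?: string }> {
  const id = getCookie(cookieHeader, "cortex_session");
  const session = id ? await lookupSession(id) : null;
  if (!session || !session.email.endsWith("@air-closet.com")) {
    return { allow: false, redirect: SSO_START };
  }
  // Authenticated: the Worker forwards to Cloud Run with
  // X-Cortex-User-Email and friends attached.
  return { allow: true, email: session.email };
}
```

&lt;p&gt;In the real Worker the session lookup is backed by a store, and the forwarded request carries the authenticated identity as headers.&lt;/p&gt;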

&lt;h4&gt;
  
  
  Why we migrated from ZeroTrust Access to self-hosted OAuth
&lt;/h4&gt;

&lt;p&gt;The first iteration used &lt;strong&gt;Cloudflare ZeroTrust Access&lt;/strong&gt;. You just configure the &lt;code&gt;@air-closet.com&lt;/code&gt; domain restriction in the Cloudflare dashboard and you're done — no auth code at all. As a starting point it was ideal.&lt;/p&gt;

&lt;p&gt;The catch: &lt;strong&gt;ZeroTrust's free tier caps at 50 users&lt;/strong&gt;. As headcount grew and Sandbox MCP usage spread, we approached the cap, and switching to pay-as-you-go (~$7/user/month) wasn't trivially cheap. On top of that we wanted to share the same auth foundation with internal apps in production (KPI dashboards, inventory tools, etc.), so we decided to &lt;strong&gt;consolidate everything into a self-hosted OAuth with no user limit&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Conveniently, the Cloudflare Worker already in front of every &lt;code&gt;*.example.com&lt;/code&gt; request — the routing layer Sandbox MCP relies on — was perfectly positioned for this. A small extension gave us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;auth.example.com/__edge/auth/start&lt;/code&gt; to kick off Google OAuth 2.0&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;auth.example.com/__edge/auth/callback&lt;/code&gt; to exchange tokens, persist the session in Upstash Redis, and issue a &lt;code&gt;cortex_session&lt;/code&gt; cookie scoped to &lt;code&gt;Domain=.example.com&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Worker-level gating for sandbox + internal-app subdomains, injecting &lt;code&gt;X-Cortex-User-Email&lt;/code&gt; and friends into the Cloud Run request when authenticated&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of this fits inside the existing Worker — no extra Cloud Run, no extra VM. Workers do have a CPU-time budget, but &lt;strong&gt;OAuth flows and cookie checks complete in single-digit milliseconds&lt;/strong&gt;, so latency is indistinguishable from ZeroTrust.&lt;/p&gt;

&lt;p&gt;Net result: the user cap is gone, anyone with &lt;code&gt;@air-closet.com&lt;/code&gt; can use Sandbox out of the box, and the auth implementation is fully visible in our own codebase.&lt;/p&gt;

&lt;h3&gt;
  
  
  ② Deploy Gate — MCP OAuth
&lt;/h3&gt;

&lt;p&gt;Operations like &lt;code&gt;sandbox_publish&lt;/code&gt; and &lt;code&gt;sandbox_delete&lt;/code&gt; &lt;strong&gt;enforce Google OAuth on the MCP server side&lt;/strong&gt;. Sandbox MCP implements RFC 8414 (&lt;code&gt;/.well-known/oauth-authorization-server&lt;/code&gt;), so Claude Code runs the OAuth flow automatically on first connection.&lt;/p&gt;

&lt;p&gt;The strongest guarantee is &lt;strong&gt;"you can't accidentally update or delete someone else's app."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When multiple people share a Sandbox MCP, an AI accident like "wait, I overwrote a coworker's app while updating mine" would be devastating. To prevent that, &lt;strong&gt;the AI doesn't get to decide whose app is being touched&lt;/strong&gt;. The server injects &lt;code&gt;nickname&lt;/code&gt; automatically from the OAuth session.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Strip the `nickname` property from the MCP tool schema and have&lt;/span&gt;
&lt;span class="c1"&gt;// the server force-inject the logged-in user's nickname.&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;injectNickname&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;McpTool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;userNickname&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;McpTool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;nickname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;restProperties&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;restProperties&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;nickname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userNickname&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From the AI's perspective, the &lt;code&gt;nickname&lt;/code&gt; input doesn't exist. Even with a prompt injection like "delete ryan's app," there's no mechanism to do so. &lt;strong&gt;"You can only touch your own apps" is enforced at the API spec level.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On top of that, inputs are validated strictly against &lt;code&gt;/^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/&lt;/code&gt;, rejecting shell-injection and path-traversal patterns (&lt;code&gt;..&lt;/code&gt;, &lt;code&gt;/&lt;/code&gt;).&lt;/p&gt;
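&lt;p&gt;Wrapped in a small guard, that validation looks roughly like this (the function name is illustrative, not our actual code):&lt;/p&gt;

```typescript
// The validation pattern quoted above: lowercase alphanumerics and
// hyphens only, no leading/trailing hyphen, max 63 characters.
const SAFE_NAME = /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/;

function assertSafeName(value: string, field: string): string {
  if (!SAFE_NAME.test(value)) {
    // Rejects path traversal ("../x"), separators ("a/b"), shell
    // metacharacters, uppercase, and anything longer than 63 chars.
    throw new Error(field + " is invalid: " + value);
  }
  return value;
}
```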

&lt;h3&gt;
  
  
  ③ Data Gate — SandboxDB Namespace Isolation
&lt;/h3&gt;

&lt;p&gt;As mentioned earlier, data lives at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sandbox_data/{nickname}--{app}/...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Per request, the SandboxDB API resolves the path &lt;strong&gt;server-side&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browser (OAuth): resolve &lt;code&gt;email → users → nickname&lt;/code&gt;, take &lt;code&gt;app&lt;/code&gt; from the &lt;code&gt;Origin&lt;/code&gt; header&lt;/li&gt;
&lt;li&gt;Backend (SA token): take &lt;code&gt;nickname/app&lt;/code&gt; from the &lt;code&gt;X-Sandbox-App&lt;/code&gt; header (required — missing returns 400)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The client cannot spoof the path.&lt;/p&gt;
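&lt;p&gt;A hedged sketch of that resolution (the Origin-to-app mapping and helper shapes here are assumptions for illustration; the point is that the namespace never comes from a client-controlled request body):&lt;/p&gt;

```typescript
// Illustrative server-side namespace resolution for SandboxDB.
type Caller =
  | { kind: "browser"; email: string; origin: string }   // OAuth session
  | { kind: "backend"; sandboxApp: string | null };      // SA token + header

function resolveDataPath(caller: Caller, nicknameByEmail: Map<string, string>): string {
  if (caller.kind === "browser") {
    // email → users → nickname, resolved server-side.
    const nickname = nicknameByEmail.get(caller.email);
    if (!nickname) throw new Error("unknown user");
    // App name comes from the Origin header, e.g. https://sbx-todo.example.com
    // (subdomain format assumed for this sketch).
    const app = new URL(caller.origin).hostname.split(".")[0].replace(/^sbx-/, "");
    return `sandbox_data/${nickname}--${app}`;
  }
  // Backend callers must name the namespace explicitly; missing → 400.
  if (!caller.sandboxApp) throw new Error("400: X-Sandbox-App header required");
  return `sandbox_data/${caller.sandboxApp}`;
}
```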

&lt;p&gt;We deliberately do &lt;strong&gt;not&lt;/strong&gt; use the &lt;code&gt;K-Service&lt;/code&gt; header (the Cloud Run-injected service name). That's a client-spoofable header, and another implementation that relied on it had a "read another app's data" vulnerability disclosed. Requiring &lt;code&gt;X-Sandbox-App&lt;/code&gt; keeps the only valid route through an explicitly server-validated path.&lt;/p&gt;

&lt;p&gt;The clincher: &lt;strong&gt;a dedicated named database for Sandbox&lt;/strong&gt;. Instead of the &lt;code&gt;(default)&lt;/code&gt; DB (which contains data from other systems), we use an independent Firestore database called &lt;code&gt;sandbox&lt;/code&gt;, and the Cloud Run SA gets an IAM Condition that allows access only to the &lt;code&gt;sandbox&lt;/code&gt; DB.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// From infra/mcp/git-server/index.ts&lt;/span&gt;
&lt;span class="c1"&gt;// IAM Condition on roles/datastore.user:&lt;/span&gt;
&lt;span class="c1"&gt;//   resource.name == "projects/.../databases/sandbox" ||&lt;/span&gt;
&lt;span class="c1"&gt;//   resource.name.startsWith("projects/.../databases/sandbox/")&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No matter how badly the AI-written code goes wrong, it physically cannot reach data outside Sandbox.&lt;/p&gt;
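&lt;p&gt;As a sketch, the condition expression itself is just a string (in the Pulumi code it goes into the IAM binding's &lt;code&gt;condition.expression&lt;/code&gt; field; the builder function here is illustrative):&lt;/p&gt;

```typescript
// Builds the IAM Condition that limits roles/datastore.user to the
// dedicated `sandbox` Firestore database (and its documents).
function sandboxDbOnly(projectId: string): string {
  const db = `projects/${projectId}/databases/sandbox`;
  return `resource.name == "${db}" || resource.name.startsWith("${db}/")`;
}
```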

&lt;h3&gt;
  
  
  ④ Execution Gate — Cloud Run SA + IAM
&lt;/h3&gt;

&lt;p&gt;All &lt;code&gt;sandbox-*&lt;/code&gt; Cloud Run services run under &lt;strong&gt;a single shared SA&lt;/strong&gt; (e.g. &lt;code&gt;sandbox-run&lt;/code&gt;). The permissions on that SA are minimal.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;roles/logging.logWriter&lt;/code&gt; (write its own logs)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;roles/bigquery.jobUser&lt;/code&gt; + &lt;code&gt;bigquery.dataViewer&lt;/code&gt; scoped to the &lt;code&gt;sandbox_logs&lt;/code&gt; dataset only (its own access logs, nothing else)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;roles/datastore.user&lt;/code&gt; (IAM Condition limiting to &lt;code&gt;sandbox&lt;/code&gt; DB)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What it does &lt;strong&gt;not&lt;/strong&gt; have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access to the &lt;code&gt;(default)&lt;/code&gt; Firestore that holds data from other systems&lt;/li&gt;
&lt;li&gt;Access to BigQuery datasets used by other internal systems&lt;/li&gt;
&lt;li&gt;Direct access to Secret Manager&lt;/li&gt;
&lt;li&gt;Permission to manage other Cloud Run services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, &lt;strong&gt;even if a Sandbox app goes completely rogue, the blast radius is limited to &lt;code&gt;sandbox_data&lt;/code&gt; and &lt;code&gt;sandbox_logs&lt;/code&gt;&lt;/strong&gt;. Nothing outside Sandbox is affected.&lt;/p&gt;

&lt;h2&gt;
  
  
  Logging — Apps Can Query Their Own Access Logs
&lt;/h2&gt;

&lt;p&gt;Sandbox apps eventually want to look at logs too. "How many views did this page get?" "Who hit that error?"&lt;/p&gt;

&lt;p&gt;We forward Cloud Run request logs to BigQuery via a &lt;strong&gt;Logging Sink&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// From infra/mcp/git-server/index.ts&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sandboxLogSink&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;gcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ProjectSink&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sandbox-logs-sink&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`bigquery.googleapis.com/projects/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;projectId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/datasets/sandbox_logs`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;resource.type="cloud_run_revision"&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;resource.labels.service_name:"sandbox-"&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;logName:"run.googleapis.com%2Frequests"&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; AND &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;bigqueryOptions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;usePartitionedTables&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;sandbox_logs&lt;/code&gt; dataset is locked down with &lt;strong&gt;project-owner-only ACLs&lt;/strong&gt; (it contains PII like remoteIp and User-Agent), and the Sandbox SA gets a tightly scoped &lt;code&gt;bigquery.dataViewer&lt;/code&gt; to it.&lt;/p&gt;

&lt;p&gt;This lets apps query their own access logs from BigQuery. "Post last week's user count for this app to Slack" can be done entirely inside Sandbox.&lt;/p&gt;
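&lt;p&gt;For example, a query along these lines (the table name follows the Logging-sink convention of replacing special characters in the log name; the exact schema here is illustrative, not lifted from our code):&lt;/p&gt;

```typescript
// Illustrative: the kind of SQL a Sandbox app might run against its own
// request logs to count last week's distinct visitors.
function weeklyVisitorsSql(projectId: string, serviceName: string): string {
  return `
    SELECT COUNT(DISTINCT httpRequest.remoteIp) AS visitors
    FROM \`${projectId}.sandbox_logs.run_googleapis_com_requests\`
    WHERE resource.labels.service_name = "${serviceName}"
      AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)`;
}
```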

&lt;h2&gt;
  
  
  Tool Design — Making AI Use Tools Correctly
&lt;/h2&gt;

&lt;p&gt;Let me close with a note on tool definitions. I personally think this is what makes or breaks an MCP design.&lt;/p&gt;

&lt;p&gt;Sandbox MCP exposes 10 tools:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox_publish&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Start deploy (async)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox_deploy_status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Check deploy status&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox_init_repo&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Initialize git push repo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox_write_file&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Write file (overwrite/append)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox_list&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox_delete&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Delete app&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox_schedule&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Configure Cloud Scheduler&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox_unschedule&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Remove Cloud Scheduler&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox_read_file&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Read source code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sandbox_list_files&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List files&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Whether the AI picks the right tool at the right moment is almost entirely determined by &lt;strong&gt;what's written in the tool description&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For example, the description for &lt;code&gt;sandbox_publish&lt;/code&gt; covers not just functionality but also:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supported app types and required files (Python / Node.js / static HTML / custom)&lt;/li&gt;
&lt;li&gt;Startup command and PORT requirement per type&lt;/li&gt;
&lt;li&gt;When to use &lt;code&gt;write_file&lt;/code&gt; vs &lt;code&gt;git push&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;How to use SandboxDB (with SDK code samples)&lt;/li&gt;
&lt;li&gt;How to use the UI Kit (explicit instruction to fetch README.md via &lt;code&gt;read_file&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this in place, the AI can autonomously do:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User says "build me a tool that displays Slack emoji scores"&lt;/li&gt;
&lt;li&gt;→ Reads &lt;code&gt;sandbox_publish&lt;/code&gt; description and sees "first read the UI Kit README"&lt;/li&gt;
&lt;li&gt;→ Calls &lt;code&gt;read_file&lt;/code&gt; on &lt;code&gt;sandbox-ui-kit/README.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;→ Generates HTML/CSS/JS following the guidelines&lt;/li&gt;
&lt;li&gt;→ Sees the SandboxDB SDK usage in the description and integrates persistence&lt;/li&gt;
&lt;li&gt;→ Calls &lt;code&gt;sandbox_publish&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;— without asking the user a single follow-up question. &lt;strong&gt;Writing not just "what it does" but "what to do with it" into the tool definition&lt;/strong&gt; is the secret to AI-friendly design.&lt;/p&gt;

&lt;p&gt;If you write tool definitions tersely, the AI keeps coming back asking "what should I do next?" The description is less of a human-facing doc and more of an &lt;strong&gt;AI-facing runbook&lt;/strong&gt;. That framing helps a lot.&lt;/p&gt;
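&lt;p&gt;To make that concrete, here's an illustrative (not verbatim) shape of a runbook-style description, condensed from the bullet points above:&lt;/p&gt;

```typescript
// Illustrative only — the real sandbox_publish description isn't quoted
// in full here. The point: the description carries a runbook, not a
// one-liner.
const sandboxPublishDescription = [
  "Deploy an app to Sandbox (async; poll with sandbox_deploy_status).",
  "App types: python / nodejs / static HTML / custom. Servers must listen on $PORT.",
  "A few small files → sandbox_write_file. Larger apps → sandbox_init_repo + git push.",
  "For persistence, use the SandboxDB SDK.",
  "Before building UI, read sandbox-ui-kit/README.md via sandbox_read_file and follow it.",
].join("\n");
```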

&lt;h2&gt;
  
  
  Wrap-Up
&lt;/h2&gt;

&lt;p&gt;Sandbox MCP exists to answer two challenges of building internal tools in the AI era:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Building&lt;/strong&gt; is now possible for anyone, thanks to AI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Publishing safely&lt;/strong&gt; remains hard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To close that gap, we:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Standardized every layer&lt;/strong&gt; on the platform side: frontend / backend / DB / infra / auth / domain / SSL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedded a runbook into tool descriptions&lt;/strong&gt; so the AI naturally uses things correctly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layered four access gates&lt;/strong&gt; (Worker-level OAuth / MCP OAuth / namespace isolation / IAM) so safety &lt;strong&gt;doesn't depend on the implementation being correct&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Building this, what struck me again is that &lt;strong&gt;the role of platforms in an AI-powered development era is shifting&lt;/strong&gt;. Platforms used to optimize for "easy for humans." Now they also need to optimize for &lt;strong&gt;"used correctly by AI."&lt;/strong&gt; Tool descriptions are AI-facing docs, and safety must be designed assuming AI will write incorrect code.&lt;/p&gt;

&lt;p&gt;At the same time, by &lt;strong&gt;limiting what the builder is responsible for&lt;/strong&gt;, we drastically lower the barrier to "let me just try something." That's the entry point that turns a non-engineer's "I want to build this" into actual operational improvements.&lt;/p&gt;

&lt;p&gt;I hope this is useful for anyone designing internal platforms.&lt;/p&gt;




&lt;p&gt;At airCloset, we're looking for engineers who want to build a new development experience together with AI. If you're interested, please check out our careers page at &lt;a href="https://corp.air-closet.com/recruiting/developers/" rel="noopener noreferrer"&gt;airCloset Quest&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>mcp</category>
      <category>cloudflarechallenge</category>
    </item>
    <item>
      <title>Still Measuring Initiative Impact Manually? How We Used Graph RAG + MCP to Make It Explorable</title>
      <dc:creator>Ryosuke Tsuji</dc:creator>
      <pubDate>Mon, 20 Apr 2026 15:27:35 +0000</pubDate>
      <link>https://dev.to/ryosuke_tsuji_f08e20fdca1/we-built-a-custom-graph-rag-to-let-ai-answer-did-that-initiative-actually-work-3oda</link>
      <guid>https://dev.to/ryosuke_tsuji_f08e20fdca1/we-built-a-custom-graph-rag-to-let-ai-answer-did-that-initiative-actually-work-3oda</guid>
      <description>&lt;p&gt;Hi, I'm &lt;a href="https://x.com/ryantsuji" rel="noopener noreferrer"&gt;Ryan&lt;/a&gt;, CTO at airCloset.&lt;/p&gt;

&lt;p&gt;In my previous posts, I introduced &lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5"&gt;an MCP server that lets you search all company databases in natural language&lt;/a&gt; and showed &lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/we-built-17-mcp-servers-to-let-ai-run-our-internal-operations-3lk2"&gt;the full picture of our 17 internal MCP servers&lt;/a&gt;. This time, I'm diving deep into what I briefly mentioned as "Biz Graph."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is the story of how we represented the relationship between business initiatives and KPIs as a graph structure, enabling AI to answer "Did that initiative actually work?"&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Graph RAG?
&lt;/h2&gt;

&lt;p&gt;To get more value from AI, what matters is not just feeding it data — it's conveying &lt;strong&gt;the relationships between data&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If your data volume is small enough, tools like NotebookLM can deliver great results. But you can't fit all your business data into a context window. Initiative reports, KPI spreadsheets, marketing weekly reports, logistics daily metrics — you simply cannot dump all of that into a prompt.&lt;/p&gt;

&lt;p&gt;That's why I believe the best available option right now is &lt;strong&gt;Graph RAG&lt;/strong&gt;: making the right data searchable at any time, along with its relationships. When AI is asked "What metrics are related to this initiative?", it can traverse the graph and extract only the information it needs — because that structure was built in advance.&lt;/p&gt;

&lt;p&gt;But there's a catch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Making Non-Graph Data Into a Graph
&lt;/h2&gt;

&lt;p&gt;Many of you have heard of "knowledge graphs" and "GraphRAG." But when people actually try to build one, most hit the same wall:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Business data doesn't naturally form a graph.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With our DB Graph project, things were different. Tables had foreign keys. ORMs had &lt;code&gt;@JoinColumn&lt;/code&gt; and &lt;code&gt;belongsTo&lt;/code&gt;. &lt;strong&gt;Relationships already existed in the data&lt;/strong&gt; — we just had to parse and convert them.&lt;/p&gt;

&lt;p&gt;But the relationship between "initiatives" and "KPIs" has none of that.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A meeting slide says "SNS ad campaign launched"&lt;/li&gt;
&lt;li&gt;A spreadsheet records "This week's new members: 1,234"&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;There's no FK between these. No join key.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;"The SNS campaign affected new member signups" — that relationship &lt;strong&gt;exists only in someone's head&lt;/strong&gt;. It's nowhere in the spreadsheet.&lt;/p&gt;

&lt;p&gt;This is what "business data doesn't form a graph" means. The relationships between entities aren't self-evident — &lt;strong&gt;you have to design the graph structure itself&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: "Did That Initiative Actually Work?"
&lt;/h2&gt;

&lt;p&gt;Every week, our company reports initiative progress in all-hands meetings and group-level standups.&lt;/p&gt;

&lt;p&gt;"We launched the spring SNS ad campaign"&lt;br&gt;
"We improved the recommendation engine"&lt;br&gt;
"We're raising our CS SLA achievement rate"&lt;/p&gt;

&lt;p&gt;— Dozens of initiatives reported weekly. Hundreds per year. &lt;strong&gt;Over 5,000 total&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Meanwhile, a separate spreadsheet tracks 200+ metrics daily and weekly: member count, new signups, retention rate, satisfaction scores, acquisition CPA...&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem: these two worlds are completely disconnected.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"How much did last month's SNS campaign contribute to new member acquisition?"&lt;/p&gt;

&lt;p&gt;Answering this requires:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Confirm the initiative's execution period (which slide was that again?)&lt;/li&gt;
&lt;li&gt;Find KPI data for that period (which sheet, which tab?)&lt;/li&gt;
&lt;li&gt;Align timeframes and compare numbers (week-over-week? month-over-month? year-over-year?)&lt;/li&gt;
&lt;li&gt;Check if other initiatives were running simultaneously (confounding factors?)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This manual analysis takes 30-60 minutes, &lt;strong&gt;and it happens every week, for multiple initiatives&lt;/strong&gt;. Realistically, most initiative effectiveness reviews end with "it probably worked, I think."&lt;/p&gt;
&lt;h2&gt;
  
  
  Biz Graph: The Big Picture
&lt;/h2&gt;

&lt;p&gt;We built &lt;strong&gt;Biz Graph&lt;/strong&gt; to solve this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5h5bh00l1qeisenx8e8t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5h5bh00l1qeisenx8e8t.png" alt="System Overview" width="800" height="460"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Scale
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: The numbers below differ from actual values but convey the order of magnitude. In any case, this is far too much data to fit in an LLM's context window.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Nodes&lt;/td&gt;
&lt;td&gt;~10,000 (14 types)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Edges&lt;/td&gt;
&lt;td&gt;~71,000 (22 types)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Initiatives&lt;/td&gt;
&lt;td&gt;~5,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KPI Metrics&lt;/td&gt;
&lt;td&gt;~4,000 (members/signups/retention/satisfaction/UX/marketing/logistics)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Marketing Channels&lt;/td&gt;
&lt;td&gt;~100 (SEM/LINE/email/CRM etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Sources&lt;/td&gt;
&lt;td&gt;9 tables/spreadsheets&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;
  
  
  Three Components
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Biz Graph Transformer&lt;/strong&gt; — Weekly graph rebuild from all data sources (Cloud Run Job, every Friday 22:00)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Biz Graph MCP Server&lt;/strong&gt; — Graph search + time series analysis accessible from AI (Cloud Run)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Biz Data Loader&lt;/strong&gt; — Daily auto-import of marketing/logistics data (Cloud Run Job, every morning 6:00)&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  The Core Design: The Week Node
&lt;/h2&gt;

&lt;p&gt;Here's the heart of this article.&lt;/p&gt;

&lt;p&gt;How do you connect "initiatives" and "metrics" in a graph? The obvious first thought is direct edges:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Initiative("SNS campaign") ──AFFECTS──→ Metric("new_members")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This design breaks down.&lt;/strong&gt; Three reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Edge explosion&lt;/strong&gt;: 5,000 initiatives × 4,000 metrics = up to 20 million edges&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Causal uncertainty&lt;/strong&gt;: "SNS campaign affected new members" is a hypothesis, not a fact. Direct edges make it look like a confirmed relationship&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Missing temporal info&lt;/strong&gt;: There's no way to express &lt;em&gt;when&lt;/em&gt; the impact occurred&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Instead, we designed &lt;strong&gt;Week nodes as shared anchors for indirect connections&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flklff5l2jw4nqayu1b4o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flklff5l2jw4nqayu1b4o.png" alt="Week Anchor" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Initiative("SNS campaign")     ──ACTIVE_DURING_WEEK──→  Week:2026-03-03
Metric("new_members")          ──HAS_DATA_AT──→         Week:2026-03-03
QualityMetric("avg_rating")    ──HAS_QUALITY_DATA_AT──→ Week:2026-03-03
MarketingChannel("SEM brand")  ──HAS_MARKETING_DATA_AT──→ Week:2026-03-03
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Initiatives and metrics aren't directly connected — they're &lt;strong&gt;indirectly linked through the same week&lt;/strong&gt;.&lt;/p&gt;
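&lt;p&gt;A minimal sketch of the structure (the types and field names are illustrative, not our actual schema):&lt;/p&gt;

```typescript
// Week-anchor idea: initiatives and metrics never point at each other,
// only at shared Week nodes.
type Edge = { from: string; type: string; to: string };

function weekAnchorEdges(
  initiatives: { id: string; activeWeeks: string[] }[],
  metrics: { id: string; dataWeeks: string[] }[],
): Edge[] {
  const edges: Edge[] = [];
  for (const i of initiatives)
    for (const w of i.activeWeeks)
      edges.push({ from: i.id, type: "ACTIVE_DURING_WEEK", to: `Week:${w}` });
  for (const m of metrics)
    for (const w of m.dataWeeks)
      edges.push({ from: m.id, type: "HAS_DATA_AT", to: `Week:${w}` });
  return edges; // grows linearly, not as initiatives × metrics
}

// Co-occurrence query: which initiatives were active the week a metric moved?
function activeDuring(edges: Edge[], week: string): string[] {
  return edges
    .filter((e) => e.type === "ACTIVE_DURING_WEEK" && e.to === `Week:${week}`)
    .map((e) => e.from);
}
```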

&lt;h3&gt;
  
  
  Why This Works
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Prevents edge explosion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Initiatives only connect to "weeks they were active." Metrics only connect to "weeks that have data." Instead of a cross-product, each connects independently to Week nodes — edge count grows linearly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Expresses co-occurrence, not causation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Initiatives that were active the same week as metric fluctuations" — this isn't asserting causation, it's a structure for &lt;strong&gt;discovering causal candidates&lt;/strong&gt;. It leaves room for human or AI judgment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Edge types distinguish data sources&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Same Week node, but &lt;code&gt;HAS_DATA_AT&lt;/code&gt; (business KPIs), &lt;code&gt;HAS_QUALITY_DATA_AT&lt;/code&gt; (service quality), &lt;code&gt;HAS_UX_DATA_AT&lt;/code&gt; (UX metrics), &lt;code&gt;HAS_MARKETING_DATA_AT&lt;/code&gt; (marketing), &lt;code&gt;HAS_LOGI_DATA_AT&lt;/code&gt; (logistics) — "what kind of data" is embedded in the edge type itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Time series traversal is natural&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Week nodes are connected by &lt;code&gt;NEXT_WEEK&lt;/code&gt; edges. "How did metrics change in the 3 weeks before and after initiative start?" can be expressed as graph traversal.&lt;/p&gt;
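&lt;p&gt;A sketch of that traversal (here a plain &lt;code&gt;Map&lt;/code&gt; of &lt;code&gt;NEXT_WEEK&lt;/code&gt; edges stands in for the real graph store):&lt;/p&gt;

```typescript
// Collect up to `radius` weeks before and after a starting week by
// walking NEXT_WEEK edges in both directions.
function weekWindow(nextWeek: Map<string, string>, start: string, radius: number): string[] {
  // NEXT_WEEK edges are stored forward; invert once to walk backwards.
  const prevWeek = new Map([...nextWeek].map(([a, b]) => [b, a] as [string, string]));
  const out = [start];
  let back = prevWeek.get(start);
  for (let i = 0; i < radius && back !== undefined; i++) {
    out.unshift(back);
    back = prevWeek.get(back);
  }
  let fwd = nextWeek.get(start);
  for (let i = 0; i < radius && fwd !== undefined; i++) {
    out.push(fwd);
    fwd = nextWeek.get(fwd);
  }
  return out;
}
```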

&lt;h2&gt;
  
  
  MetricDomain: Bridging Worlds Without Join Keys
&lt;/h2&gt;

&lt;p&gt;Week nodes tell us "what happened the same week," but not &lt;strong&gt;which metrics are relevant to a given initiative&lt;/strong&gt;. There's no point looking at logistics data when analyzing an SNS ad campaign.&lt;/p&gt;

&lt;p&gt;However, there's &lt;strong&gt;no join key&lt;/strong&gt; between initiative categories ("Marketing (Advertising)") and metric groups ("New Acquisition"). The knowledge that "ad initiatives relate to new acquisition" is tacit — it exists only in people's heads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MetricDomain&lt;/strong&gt; (6 domains) turns this tacit knowledge into explicit graph structure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1gk36ev3zjhg301k2frw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1gk36ev3zjhg301k2frw.png" alt="MetricDomain" width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Connected metric types&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;acquisition&lt;/td&gt;
&lt;td&gt;New acquisition&lt;/td&gt;
&lt;td&gt;Marketing channels, new member count, registration CV&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;retention&lt;/td&gt;
&lt;td&gt;Retention / churn prevention&lt;/td&gt;
&lt;td&gt;Member count, churn rate, plan transitions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;service_quality&lt;/td&gt;
&lt;td&gt;Service quality&lt;/td&gt;
&lt;td&gt;Satisfaction, ratings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;operations&lt;/td&gt;
&lt;td&gt;Operations&lt;/td&gt;
&lt;td&gt;Selection, shipping, returns, logistics KPIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ux&lt;/td&gt;
&lt;td&gt;UX experience&lt;/td&gt;
&lt;td&gt;Sessions, funnels&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;revenue&lt;/td&gt;
&lt;td&gt;Revenue / purchases&lt;/td&gt;
&lt;td&gt;Purchase CV, upsell&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These 6 domains aren't fixed — they can be freely added or split as the business grows and the organization evolves. Domain definitions are just mapping tables in code, so the cost of expansion is nearly zero.&lt;/p&gt;

&lt;p&gt;Because &lt;strong&gt;humans define&lt;/strong&gt; the mapping from initiative categories to MetricDomains, and from metric groups to MetricDomains, the system can automatically show acquisition-related metrics when viewing a marketing initiative.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Category("Marketing ads") ──CATEGORY_IN_DOMAIN──→ MetricDomain("acquisition")
                                                           ↑ IN_DOMAIN
                                                  MetricGroup("New Acquisition")
                                                  MarketingChannel("SEM brand")
                                                  UxMetric("registration_completed")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result&lt;/strong&gt;: Pass &lt;code&gt;domain: "acquisition"&lt;/code&gt; to &lt;code&gt;compare_metrics&lt;/code&gt;, and the initiative overlay automatically filters to acquisition-related initiatives only.&lt;/p&gt;
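&lt;p&gt;Conceptually, that filter is a one-hop edge lookup. A minimal sketch with hypothetical names and an in-memory edge list (the real implementation runs as SQL in BigQuery):&lt;/p&gt;

```typescript
// Sketch only: keep initiatives that have a TARGETS_DOMAIN edge
// into the requested MetricDomain node.
type Edge = { edge_type: string; source_id: string; target_id: string };

function filterOverlay(edges: Edge[], initiativeIds: string[], domain: string): string[] {
  const targets = new Set(
    edges
      .filter(e => e.edge_type === 'TARGETS_DOMAIN' && e.target_id === `MetricDomain:${domain}`)
      .map(e => e.source_id),
  );
  return initiativeIds.filter(id => targets.has(id));
}

const edges: Edge[] = [
  { edge_type: 'TARGETS_DOMAIN', source_id: 'Initiative:A', target_id: 'MetricDomain:acquisition' },
  { edge_type: 'TARGETS_DOMAIN', source_id: 'Initiative:B', target_id: 'MetricDomain:operations' },
];
console.log(filterOverlay(edges, ['Initiative:A', 'Initiative:B'], 'acquisition'));
// ['Initiative:A']
```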

&lt;h2&gt;
  
  
  SIMILAR_TO: AI Answers "Have We Done Something Like This Before?"
&lt;/h2&gt;

&lt;p&gt;Another unique design element: &lt;strong&gt;SIMILAR_TO edges&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Initiative text (title + description) is vectorized to 768 dimensions using Vertex AI's gemini-embedding-001, then BigQuery's VECTOR_SEARCH auto-detects similar pairs with cosine similarity &amp;gt;= 0.75.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;VECTOR_SEARCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;cortex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;biz_graph_nodes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s1"&gt;'embedding'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;cortex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;biz_graph_nodes&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;node_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Initiative'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;top_k&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;distance_type&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'COSINE'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;  &lt;span class="c1"&gt;-- distance &amp;lt;= 0.25 = similarity &amp;gt;= 0.75&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Currently &lt;strong&gt;~13,000 SIMILAR_TO edges&lt;/strong&gt; exist. Up to 5 similar initiatives are pre-computed for each one.&lt;/p&gt;

&lt;p&gt;"Didn't we run a similar SNS campaign last summer? How did that one perform?" — traverse similar initiatives on the graph instantly, then compare KPI changes during weeks those initiatives were active.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real Usage Examples
&lt;/h2&gt;

&lt;p&gt;Here's how exploration works via MCP tools.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;All tool execution examples below run through MCP from an AI coding agent. The response format matches the real system, but numbers are dummy values and content is simplified.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  "Find marketing initiatives that drove acquisition"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;search_initiatives(&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SNS advertising for new acquisition"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"domain"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"acquisition"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dateFrom"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-10-01"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dateTo"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-03-31"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"limit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Response (excerpt):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;5 initiatives found (by vector similarity):

1. SNS Ad Spring Collection Campaign (2026-03-09)
   Category: Marketing (Advertising)
   Similarity: 892/1000

2. Instagram Reels Ad Test (2026-02-23)
   Category: Marketing (Advertising)
   Similarity: 845/1000
   ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  "Show me the impact of that initiative"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;get_initiative_context(&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"initiative_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Initiative:2026-03-09:SNS Ad Spring Collection Campaign"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"metric_window_days"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Response (excerpt):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Initiative Context&lt;/span&gt;

Title: SNS Ad Spring Collection Campaign
Execution Period: 2026-03-01 to 2026-03-31
Category: Marketing (Advertising)
Target Domain: acquisition

&lt;span class="gu"&gt;## Similar Initiatives (SIMILAR_TO)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Instagram Reels Ad Test (similarity: 0.82)
&lt;span class="p"&gt;-&lt;/span&gt; 1-Month Free Trial Campaign (similarity: 0.78)

&lt;span class="gu"&gt;## KPI Changes During Initiative (30-day window)&lt;/span&gt;
| Metric | Pre-avg | Post-avg | Change |
|--------|---------|----------|--------|
| new_regular | 50 | 60 | +20.0% |
| new_lite | 30 | 35 | +16.7% |
| monthly | 1,000 | 1,050 | +5.0% |

&lt;span class="gu"&gt;## Service Quality Metrics&lt;/span&gt;
| Metric | Before | After | Change |
|--------|--------|-------|--------|
| avg_rating | 3.50 | 3.60 | +2.9% |

&lt;span class="gu"&gt;## UX Metrics&lt;/span&gt;
| Metric | Before | After | Change |
|--------|--------|-------|--------|
| total_sessions | 10,000 | 12,000 | +20.0% |
| registration_completed | 100 | 130 | +30.0% |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This is the power of the Week node design.&lt;/strong&gt; Identify the weeks an initiative was active, then automatically pull all metrics (KPIs, quality, UX, marketing, logistics) from those same weeks.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Visualize new acquisition YoY with initiative overlay"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;compare_metrics(&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"metrics"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"new_regular"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"new_lite"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"new_monthly"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dateFrom"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-10-01"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dateTo"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-03-31"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"granularity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"weekly"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"overlay_initiatives"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"domain"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"acquisition"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response returns the time series with acquisition-domain initiatives overlaid on the same timeframe. KPI spikes become instantly attributable to "that initiative's timing."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Build Pipeline: 9 Phases
&lt;/h2&gt;

&lt;p&gt;The graph is constructed in 9 phases:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;Content&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Initiative nodes + Category/Business/Team&lt;/td&gt;
&lt;td&gt;Initiative, Category, Business, Team&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Daily KPIs (50 metrics)&lt;/td&gt;
&lt;td&gt;Metric → MetricGroup (10 groups)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Business KPIs + Departments&lt;/td&gt;
&lt;td&gt;Department → Metric (DEPT_TRACKS)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Week nodes (shared anchors)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;HAS_DATA_AT + ACTIVE_DURING_WEEK + NEXT_WEEK&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Service quality metrics (~50)&lt;/td&gt;
&lt;td&gt;QualityMetric → Week&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;UX metrics (~40)&lt;/td&gt;
&lt;td&gt;UxMetric → Week&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Marketing channels (~100)&lt;/td&gt;
&lt;td&gt;MarketingChannel → Week&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;8&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;MetricDomain (semantic bridge)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6 domains + IN_DOMAIN + TARGETS_DOMAIN&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Logistics KPIs (~10 categories)&lt;/td&gt;
&lt;td&gt;LogiMetric → Week&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Phases 4 and 8 are the &lt;strong&gt;key design points&lt;/strong&gt;. The other phases simply turn data into nodes — these two give structure to relationships that exist nowhere in the source data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 4: Week Node Generation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Convert initiative execution period to ISO weeks, generate ACTIVE_DURING_WEEK edges&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;initiative&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;initiatives&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;weeks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getISOWeeksBetween&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;initiative&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;executionStartDate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;initiative&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;executionEndDate&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// Cap at 52 weeks (guard against long-running initiatives)&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;week&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;weeks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;52&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;edge_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ACTIVE_DURING_WEEK&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;source_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;initiative&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;target_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Week:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;week&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Generate HAS_DATA_AT edges for weeks that have metric data&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;metricWeek&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;metricWeeks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;edge_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;HAS_DATA_AT&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;source_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Metric:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;metricWeek&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;metric&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;target_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Week:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;metricWeek&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;week&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// NEXT_WEEK edges for time series traversal&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sortedWeeks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;allWeeks&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;sortedWeeks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;edge_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;NEXT_WEEK&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;source_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Week:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;sortedWeeks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;target_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Week:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;sortedWeeks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Phase 8: MetricDomain Generation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Category → Domain (semantic mapping defined by humans)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;CATEGORY_TO_DOMAINS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Marketing (Advertising)&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;acquisition&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;CRM / Retention&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;retention&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Quality / Service Improvement&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;service_quality&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Operations Improvement&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;operations&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;New Feature&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ux&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;revenue&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Initiative → TARGETS_DOMAIN (main business only — limited to where KPI data exists)&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;initiative&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;initiatives&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;initiative&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;business&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;MAIN_BUSINESS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;domains&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;CATEGORY_TO_DOMAINS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;initiative&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;category&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;domains&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;edge_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TARGETS_DOMAIN&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;source_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;initiative&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;target_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`MetricDomain:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why Not a Dedicated Graph DB or OSS Libraries?
&lt;/h2&gt;

&lt;p&gt;We implemented the graph using &lt;strong&gt;BigQuery alone&lt;/strong&gt;, without Neo4j, Amazon Neptune, or OSS like Microsoft's GraphRAG.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why not a dedicated graph DB?
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Dedicated Graph DB&lt;/th&gt;
&lt;th&gt;BigQuery&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Graph traversal&lt;/td&gt;
&lt;td&gt;Fast (native)&lt;/td&gt;
&lt;td&gt;Fast enough (~10,000 node scale)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vector search&lt;/td&gt;
&lt;td&gt;Requires separate service&lt;/td&gt;
&lt;td&gt;VECTOR_SEARCH built-in&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time series analysis&lt;/td&gt;
&lt;td&gt;Weak&lt;/td&gt;
&lt;td&gt;Native (window functions)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operating cost&lt;/td&gt;
&lt;td&gt;Always-on instances&lt;/td&gt;
&lt;td&gt;Serverless (pay per query)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Joining other data&lt;/td&gt;
&lt;td&gt;ETL required&lt;/td&gt;
&lt;td&gt;Same project, instant JOIN&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For Biz Graph, "graph structure + time series analysis + vector search combined" matters more than "deep graph traversal." BigQuery handles all three in one engine.&lt;/p&gt;

&lt;p&gt;Additionally, BigQuery has announced &lt;a href="https://cloud.google.com/bigquery/docs/graph-overview" rel="noopener noreferrer"&gt;Graph capabilities&lt;/a&gt; — once GA, native graph queries on node/edge tables will be available. Currently we traverse with SQL JOINs, but we expect to migrate to faster, more intuitive queries in the future.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why not OSS libraries / SaaS?
&lt;/h3&gt;

&lt;p&gt;OSS like Microsoft GraphRAG and various Graph RAG SaaS products focus on &lt;strong&gt;automatically extracting entities and relationships from text documents&lt;/strong&gt;. Great for research papers or news articles, but not for our use case.&lt;/p&gt;

&lt;p&gt;The reason is simple: &lt;strong&gt;we need to design the graph structure itself&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The concept of Week nodes as "temporal anchors" doesn't exist in generic tools&lt;/li&gt;
&lt;li&gt;MetricDomain "semantic bridging" reflects our specific business structure&lt;/li&gt;
&lt;li&gt;The Initiative → Week → Metric indirect connection pattern won't emerge from LLM entity extraction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Generic tools "auto-generate graphs from text." What we needed was "design the graph schema ourselves and integrate heterogeneous data sources." Fundamentally different problems.&lt;/p&gt;

&lt;p&gt;Internal query example (&lt;code&gt;get_initiative_context&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Get weeks the initiative was active&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;active_weeks&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;target_id&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;week_id&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;cortex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;biz_graph_edges&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;initiative_id&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;edge_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACTIVE_DURING_WEEK'&lt;/span&gt;
&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="c1"&gt;-- Get metrics that have data in those same weeks&lt;/span&gt;
&lt;span class="n"&gt;co_occurring_metrics&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;metric_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;edge_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;week_id&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;cortex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;biz_graph_edges&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;
  &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;active_weeks&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;week_id&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;edge_type&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'HAS_DATA_AT'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'HAS_QUALITY_DATA_AT'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'HAS_UX_DATA_AT'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'HAS_MARKETING_DATA_AT'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;co_occurring_metrics&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Graph traversal and time series data retrieval complete in a single SQL query. With a dedicated graph DB, you'd need to pass traversal results to another service for time series queries — an extra hop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Initiative Data Ingestion: Auto-Extraction from Meeting Slides
&lt;/h2&gt;

&lt;p&gt;Graph quality depends on source data quality. Initiative data comes from all-hands and group meeting slides.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Frequency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;All-hands&lt;/td&gt;
&lt;td&gt;pptx in Drive → Slides conversion → text extraction&lt;/td&gt;
&lt;td&gt;Weekly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Group standups&lt;/td&gt;
&lt;td&gt;Google Slides (cumulative, latest week appended)&lt;/td&gt;
&lt;td&gt;Weekly&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Text is extracted from meeting slides and structured by AI into the initiative table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;InitiativeRow&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;meetingDate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;// Meeting date&lt;/span&gt;
  &lt;span class="nl"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;            &lt;span class="c1"&gt;// Source (all-hands / group standup etc.)&lt;/span&gt;
  &lt;span class="nl"&gt;business&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;          &lt;span class="c1"&gt;// Business unit&lt;/span&gt;
  &lt;span class="nl"&gt;category&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;          &lt;span class="c1"&gt;// Marketing (Ads), New Feature, ...&lt;/span&gt;
  &lt;span class="nl"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;             &lt;span class="c1"&gt;// Initiative title&lt;/span&gt;
  &lt;span class="nl"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;// Detailed description&lt;/span&gt;
  &lt;span class="nl"&gt;team&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;              &lt;span class="c1"&gt;// Executing team&lt;/span&gt;
  &lt;span class="nl"&gt;executionStartDate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Execution start date&lt;/span&gt;
  &lt;span class="nl"&gt;executionEndDate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;// Execution end date&lt;/span&gt;
  &lt;span class="nl"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;           &lt;span class="c1"&gt;// JSON format numeric metrics&lt;/span&gt;
  &lt;span class="nl"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;            &lt;span class="c1"&gt;// planned / in_progress / retrospective&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Critical: &lt;code&gt;executionStartDate&lt;/code&gt; / &lt;code&gt;executionEndDate&lt;/code&gt;. The meeting date (&lt;code&gt;meetingDate&lt;/code&gt;) differs from when the initiative actually runs. "We started the SNS campaign last week," reported on 3/9, means &lt;code&gt;executionStartDate&lt;/code&gt; is 3/1, not 3/9. This distinction is essential for connecting initiatives to the correct Week nodes.&lt;/p&gt;
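
&lt;p&gt;To make the Week connection concrete, here's a minimal sketch of mapping an execution date to its week anchor. The helper name &lt;code&gt;toWeekId&lt;/code&gt; and the Monday-start convention are illustrative assumptions, not our actual schema:&lt;/p&gt;

```typescript
// Map an execution date to the Monday that starts its week; that Monday
// serves as the week anchor (assumed convention for this sketch).
function toWeekId(dateStr: string): string {
  const d = new Date(dateStr + "T00:00:00Z");
  const day = d.getUTCDay();               // 0 = Sunday ... 6 = Saturday
  const offset = day === 0 ? 6 : day - 1;  // days since Monday
  d.setUTCDate(d.getUTCDate() - offset);
  return d.toISOString().slice(0, 10);     // "YYYY-MM-DD" of that Monday
}

// The initiative connects to Week nodes via executionStartDate,
// not via the meeting date it was reported at.
console.log(toWeekId("2026-03-01")); // → "2026-02-23"
console.log(toWeekId("2026-03-09")); // → "2026-03-09" (the meeting date's own week)
```

&lt;p&gt;Whatever the real key format, the point is that the anchor is derived from &lt;code&gt;executionStartDate&lt;/code&gt;, never from &lt;code&gt;meetingDate&lt;/code&gt;.&lt;/p&gt;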

&lt;h2&gt;
  
  
  Operating Cost
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Vertex AI Embedding (weekly)&lt;/td&gt;
&lt;td&gt;~$0.05/run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code (initiative extraction)&lt;/td&gt;
&lt;td&gt;Within monthly plan&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BQ storage&lt;/td&gt;
&lt;td&gt;A few GB (negligible)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud Run Jobs&lt;/td&gt;
&lt;td&gt;Nearly free (1x weekly + 1x daily)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP Server&lt;/td&gt;
&lt;td&gt;Nearly free (Cloud Run min-instances=0)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;A few dollars per month&lt;/strong&gt; to maintain a 10,000-node, 71,000-edge graph.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparison With Typical Knowledge Graphs
&lt;/h2&gt;

&lt;p&gt;Let's take a step back and see how this design differs from conventional approaches.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Typical Knowledge Graph&lt;/th&gt;
&lt;th&gt;Biz Graph&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Node design&lt;/td&gt;
&lt;td&gt;Entities mapped directly to nodes&lt;/td&gt;
&lt;td&gt;Deliberately designed temporal anchors ("Week")&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Edge semantics&lt;/td&gt;
&lt;td&gt;Relationships described as-is&lt;/td&gt;
&lt;td&gt;Edge types encode data source classification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intermediate nodes&lt;/td&gt;
&lt;td&gt;Taxonomies for classification&lt;/td&gt;
&lt;td&gt;MetricDomain as semantic bridge (structured tacit knowledge)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Graph construction&lt;/td&gt;
&lt;td&gt;Relationships extracted from existing data&lt;/td&gt;
&lt;td&gt;Deliberately designed graph from data with no inherent relationships&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Use case&lt;/td&gt;
&lt;td&gt;Primarily search and navigation&lt;/td&gt;
&lt;td&gt;Extends to exploring causal candidates for initiative impact&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Similarity search&lt;/td&gt;
&lt;td&gt;Text-based search&lt;/td&gt;
&lt;td&gt;Pre-computed SIMILAR_TO edges via Embedding&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;In one sentence:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our DB Graph "made existing relationships discoverable." Biz Graph "designed and created relationships that didn't exist."&lt;/p&gt;

&lt;p&gt;The former is an analysis problem. The latter is a &lt;strong&gt;design problem&lt;/strong&gt; — designing the graph structure from scratch and integrating heterogeneous data sources (meeting slides, spreadsheets, BQ tables) into a single explorable structure. That's the essence of Biz Graph.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Graph RAG Over Flat RAG
&lt;/h2&gt;

&lt;p&gt;Let's revisit the "why Graph RAG?" question from the introduction.&lt;/p&gt;

&lt;p&gt;For initiative effectiveness analysis, consider what happens with standard vector search (flat RAG). Ask "What was the SNS campaign's impact?" — flat RAG returns text chunks similar to the initiative description. You get info about the initiative itself.&lt;/p&gt;

&lt;p&gt;But it won't return &lt;strong&gt;concurrent KPI changes&lt;/strong&gt;. It won't return &lt;strong&gt;results from past similar initiatives&lt;/strong&gt;. It won't return &lt;strong&gt;related domain metrics&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This information is connected "through the graph," not by "text similarity." You can only reach it by traversing Week nodes. This "need to follow relationships" use case is exactly where Graph RAG has a clear advantage over flat RAG.&lt;/p&gt;
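
&lt;p&gt;The traversal that flat RAG can't do is just a two-hop walk over the edge table. A minimal in-memory sketch (the edge shape mirrors the &lt;code&gt;biz_graph_edges&lt;/code&gt; columns; the sample IDs are made up):&lt;/p&gt;

```typescript
interface Edge { sourceId: string; targetId: string; edgeType: string; }

// Hop 1: initiative → the Week nodes it was active in.
// Hop 2: those Week nodes → metrics with data in the same weeks.
// startsWith("HAS_") stands in for the four HAS_*_DATA_AT edge types.
function coOccurringMetrics(edges: Edge[], initiativeId: string): string[] {
  const weeks = new Set(
    edges
      .filter(e => e.sourceId === initiativeId)
      .filter(e => e.edgeType === "ACTIVE_DURING_WEEK")
      .map(e => e.targetId)
  );
  const metrics = new Set(
    edges
      .filter(e => e.edgeType.startsWith("HAS_"))
      .filter(e => weeks.has(e.targetId))
      .map(e => e.sourceId)
  );
  return [...metrics];
}

// Made-up sample graph:
const edges: Edge[] = [
  { sourceId: "ini:sns", targetId: "week:2026-02-23", edgeType: "ACTIVE_DURING_WEEK" },
  { sourceId: "metric:new_signups", targetId: "week:2026-02-23", edgeType: "HAS_DATA_AT" },
  { sourceId: "metric:churn", targetId: "week:2026-03-30", edgeType: "HAS_DATA_AT" },
];
console.log(coOccurringMetrics(edges, "ini:sns")); // → [ "metric:new_signups" ]
```

&lt;p&gt;Text similarity alone would never surface &lt;code&gt;metric:new_signups&lt;/code&gt; here; only the shared Week node connects it to the initiative.&lt;/p&gt;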

&lt;h2&gt;
  
  
  Design Honesty: Not Asserting Causation
&lt;/h2&gt;

&lt;p&gt;One thing I was conscious of in this design: &lt;strong&gt;not asserting causation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Many BI tools and AI analyses want to declare "this initiative impacted this KPI." But in reality, there's no such certainty. Multiple initiatives may have been running simultaneously, it could be seasonal, it could be external market changes.&lt;/p&gt;

&lt;p&gt;Week node indirect connections simply "lay out what happened in the same period." Causal judgment is left to human or AI reasoning. I believe this is a statistically honest approach.&lt;/p&gt;

&lt;p&gt;"A structure for discovering causal candidates" — not "a structure for asserting causation." This distinction matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations: The Designer's Tacit Knowledge Is the Bottleneck
&lt;/h2&gt;

&lt;p&gt;Let me be honest about the weaknesses of this approach.&lt;/p&gt;

&lt;p&gt;MetricDomain mappings ("Marketing Advertising → acquisition domain") are hardcoded by humans. If this design is wrong, the entire graph's exploration results are skewed.&lt;/p&gt;

&lt;p&gt;This is simultaneously the answer to "why build it yourself." Off-the-shelf graph tools can't reflect your business structure — which initiative categories relate to which metric groups. Structuring this tacit knowledge requires someone who knows the business.&lt;/p&gt;

&lt;p&gt;Going forward, we're considering having AI propose these mappings with humans reviewing them. Full automation is hard, but an "AI suggests, humans approve" workflow could reduce the maintenance cost of domain knowledge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Turning business data into a graph is more of a &lt;strong&gt;design challenge&lt;/strong&gt; than a technical one.&lt;/p&gt;

&lt;p&gt;There's no FK between "initiatives" and "KPIs." No join key. But by deliberately designing two structures — &lt;strong&gt;temporal axis (Week nodes)&lt;/strong&gt; and &lt;strong&gt;semantic domains (MetricDomain)&lt;/strong&gt; — it becomes an explorable graph.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Week nodes&lt;/strong&gt;: Indirect connections via "same week" instead of direct initiative-metric edges. A structure for discovering causal candidates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MetricDomain&lt;/strong&gt;: Semantic bridge between initiative categories and metric groups. Structured tacit knowledge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SIMILAR_TO&lt;/strong&gt;: Pre-computed similar initiatives via AI Embedding. Instant answers to "have we done this before?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a result, AI can now autonomously explore the graph to answer questions like "Did that initiative work?", "Find initiatives that drove acquisition", and "Show metrics YoY with initiative overlay".&lt;/p&gt;

&lt;p&gt;Graphs aren't something you "find" — they're something you &lt;strong&gt;design&lt;/strong&gt;. Especially for business data.&lt;/p&gt;




&lt;p&gt;At airCloset, we're looking for people who want to redefine how we work alongside AI. If interested, check out &lt;a href="https://corp.air-closet.com/recruiting/developers/" rel="noopener noreferrer"&gt;airCloset Quest (careers)&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>graphrag</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How We Built an Automated Meeting Intelligence System with Google Meet, Slack, and RAG</title>
      <dc:creator>Ryosuke Tsuji</dc:creator>
      <pubDate>Sat, 11 Apr 2026 09:11:59 +0000</pubDate>
      <link>https://dev.to/ryosuke_tsuji_f08e20fdca1/how-we-built-an-automated-meeting-intelligence-system-with-google-meet-slack-and-rag-42ln</link>
      <guid>https://dev.to/ryosuke_tsuji_f08e20fdca1/how-we-built-an-automated-meeting-intelligence-system-with-google-meet-slack-and-rag-42ln</guid>
      <description>&lt;p&gt;Hi, I'm &lt;a href="https://x.com/ryantsuji" rel="noopener noreferrer"&gt;Ryan&lt;/a&gt;, CTO at airCloset — a fashion subscription service based in Japan.&lt;/p&gt;

&lt;p&gt;In previous posts, I wrote about building a &lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5"&gt;DB Graph MCP server&lt;/a&gt; that lets you query 991 database tables across 15 schemas with natural language, and a &lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/we-built-17-mcp-servers-to-let-ai-run-our-internal-operations-3lk2"&gt;suite of 17 MCP servers&lt;/a&gt; that opened our internal operations to AI.&lt;/p&gt;

&lt;p&gt;This time, it's not about MCP. It's about something more fundamental — &lt;strong&gt;turning meetings into a searchable knowledge base&lt;/strong&gt;. This is the system I wanted to build first when I started thinking about digitizing our company's information assets.&lt;/p&gt;

&lt;p&gt;We built a system that &lt;strong&gt;automatically shares&lt;/strong&gt; Google Meet &lt;strong&gt;recordings and transcripts&lt;/strong&gt; to Slack channels, and makes past meeting content &lt;strong&gt;searchable with natural language&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Context Disappears the Moment a Meeting Ends
&lt;/h2&gt;

&lt;p&gt;Face-to-face communication is fast and dense. A decision that takes 30 minutes over text can happen in 5 minutes in a meeting. That's the biggest advantage of meetings.&lt;/p&gt;

&lt;p&gt;But the problem is that &lt;strong&gt;context starts disappearing the moment the meeting ends&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What did we decide in that meeting again?"&lt;/li&gt;
&lt;li&gt;"There's a recording but I don't have the energy to rewatch an hour-long video"&lt;/li&gt;
&lt;li&gt;"Where did I write those meeting notes?"&lt;/li&gt;
&lt;li&gt;"We keep having the same discussion over and over"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Building a habit of writing meeting notes is one solution, but honestly, getting everyone to consistently write good notes is hard. Even when they do, the nuance of the conversation is lost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Meetings are a treasure trove of information, yet they're not being utilized.&lt;/strong&gt; That's a huge waste.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Built
&lt;/h2&gt;

&lt;p&gt;We built a system that automates four things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;One-click Meet creation from Google Calendar&lt;/strong&gt; — A Chrome extension creates a Meet with recording, transcription, and notes all enabled by default&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic Slack notification when a meeting ends&lt;/strong&gt; — Instant notification, followed by recording and transcript links minutes later&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic permission granting&lt;/strong&gt; — Access is automatically given to Slack channel members, meeting participants, and Calendar invitees&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG search over transcripts and screen shares&lt;/strong&gt; — Ask a Slack Bot "What was the release date we discussed last week?" and get an answer&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  User Flow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Create a Meeting (~10 seconds)
&lt;/h3&gt;

&lt;p&gt;In Google Calendar's event editor, click the "AI Fassy Meet" button added by our Chrome extension.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkymrlq5fa5z5bx41fkbo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkymrlq5fa5z5bx41fkbo.png" alt="Chrome extension button in Google Calendar" width="800" height="696"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The "AI Fassy Meet" button appears next to Google Meet's native video conferencing option&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Select the Slack channel where notifications should be sent. Previously selected channels appear at the top, followed by your most active channels.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xsm3lzazvw8pnu87jx0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xsm3lzazvw8pnu87jx0.png" alt="Slack channel selection dialog" width="800" height="895"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Channel search and selection dialog, sorted by selection history and activity&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Click "Create Meet" and the Meet URL is automatically set on the Calendar event.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfs0rfdcxgoxfes3or3e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfs0rfdcxgoxfes3or3e.png" alt="Setting Meet URL" width="800" height="824"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The Meet URL is set on the event with recording, transcription, and notes all enabled by default. The "Use Gemini to create meeting notes" shown on screen is Google Meet's native feature — our system additionally integrates Gemini 3 Flash for independent transcription and screen share analysis&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recording, transcription, and meeting notes are all ON by default.&lt;/strong&gt; Users don't need to think about settings at all.&lt;/p&gt;

&lt;p&gt;The channel dropdown shows &lt;strong&gt;previously selected channels first&lt;/strong&gt;, then &lt;strong&gt;channels you're a member of, sorted by message activity&lt;/strong&gt;. For recurring meetings, last week's channel is always one click away.&lt;/p&gt;
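
&lt;p&gt;The ordering above can be sketched as a two-key sort. The &lt;code&gt;Channel&lt;/code&gt; shape and field names here are assumptions for illustration, not our actual data model:&lt;/p&gt;

```typescript
interface Channel {
  name: string;
  lastSelectedAt: number;  // epoch ms of last selection from this UI; 0 if never
  messageCount: number;    // recent message activity
}

// Previously selected channels first (most recent selection on top),
// then never-selected channels sorted by message activity.
function sortChannels(channels: Channel[]): Channel[] {
  return [...channels].sort((a, b) => {
    if (a.lastSelectedAt !== b.lastSelectedAt) {
      return b.lastSelectedAt - a.lastSelectedAt;
    }
    return b.messageCount - a.messageCount;
  });
}

const order = sortChannels([
  { name: "random", lastSelectedAt: 0, messageCount: 500 },
  { name: "proj-weekly", lastSelectedAt: 1700000000, messageCount: 40 },
  { name: "dev", lastSelectedAt: 0, messageCount: 900 },
]).map(c => c.name);
console.log(order); // → [ "proj-weekly", "dev", "random" ]
```

&lt;p&gt;The comparator puts selection history strictly before activity, which is what keeps last week's channel on top for recurring meetings.&lt;/p&gt;
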
&lt;h3&gt;
  
  
  Step 2: Hold the Meeting
&lt;/h3&gt;

&lt;p&gt;Just have your meeting normally. Recording and transcription run automatically in the background.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 3: Automatic Notification When the Meeting Ends
&lt;/h3&gt;

&lt;p&gt;When the meeting ends, an instant notification appears in the designated Slack channel.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3jfoejww8nqd1ff0ss3e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3jfoejww8nqd1ff0ss3e.png" alt="Slack meeting ended notification" width="800" height="224"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few minutes later, a follow-up notification arrives in the thread with links to the recording and transcript. Channel members can view them immediately.&lt;/p&gt;
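
&lt;p&gt;The permission step boils down to deduplicating the union of three audiences before granting access. A minimal sketch (function and field names are illustrative; the actual Drive API calls are omitted):&lt;/p&gt;

```typescript
// Union the three audience sources into one deduplicated grant list.
// In production each email would then receive a Drive reader permission;
// here we only build the list.
function buildGrantList(
  channelMembers: string[],
  participants: string[],
  invitees: string[]
): string[] {
  const all = new Set([...channelMembers, ...participants, ...invitees]);
  return [...all].sort();
}

console.log(buildGrantList(
  ["a@example.com", "b@example.com"],
  ["b@example.com", "c@example.com"],
  ["a@example.com"]
)); // → [ "a@example.com", "b@example.com", "c@example.com" ]
```
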
&lt;h3&gt;
  
  
  Step 4: Search Past Meetings with Natural Language
&lt;/h3&gt;

&lt;p&gt;In the same thread, mention the Bot to ask about the meeting content.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs39j5jorq2ka5uggkhe1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs39j5jorq2ka5uggkhe1.png" alt="Full thread flow — end notification → artifact notification → RAG search → answer" width="800" height="837"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Full thread flow: ①Meeting ended notification → ②Recording and transcript links → ③User asks "Give me a summary of this meeting" → ④Bot responds with a structured summary&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The Bot searches past meeting transcripts, summarizes the relevant parts, and responds with source links. Screen-shared slides and code are also searchable.&lt;/p&gt;
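
&lt;p&gt;Under the hood, retrieval is a standard embedding similarity ranking. A stripped-down sketch with cosine similarity (the real system queries stored transcript-chunk embeddings; the vectors here are toy values):&lt;/p&gt;

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, v, i) => s + v * b[i], 0);
  const na = Math.sqrt(a.reduce((s, v) => s + v * v, 0));
  const nb = Math.sqrt(b.reduce((s, v) => s + v * v, 0));
  return dot / (na * nb);
}

// Rank transcript chunks by similarity to the query embedding
// and return the top-k texts.
function topChunks(
  query: number[],
  chunks: { text: string; embedding: number[] }[],
  k: number
): string[] {
  return [...chunks]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k)
    .map(c => c.text);
}

// Toy example: 2-dimensional "embeddings".
const chunks = [
  { text: "Release is set for April 10", embedding: [0.9, 0.1] },
  { text: "Lunch menu discussion", embedding: [0.1, 0.9] },
];
console.log(topChunks([1, 0], chunks, 1)); // → [ "Release is set for April 10" ]
```

&lt;p&gt;The answer the Bot composes then cites the source chunks it ranked highest.&lt;/p&gt;
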



&lt;p&gt;Now let's dive into the technical implementation.&lt;/p&gt;
&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm244hghef2xv24tgaj2u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm244hghef2xv24tgaj2u.png" alt="System Overview" width="800" height="567"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The system consists of four components:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Deployment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chrome Extension + meet-calendar API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Meet creation UI + backend API&lt;/td&gt;
&lt;td&gt;Chrome / Cloud Run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;workspace-pipeline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Workspace Events API subscription management&lt;/td&gt;
&lt;td&gt;Shared package&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;meet-pipeline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Core event processing: artifact storage, permissions, embedding generation&lt;/td&gt;
&lt;td&gt;Cloud Run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Slack Bot&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Meet creation + RAG search&lt;/td&gt;
&lt;td&gt;Cloud Run&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Shared domain logic (Space creation, Firestore operations, Drive access, caching) is extracted into a common package, reused by both the Chrome Extension API and the Slack Bot.&lt;/p&gt;
&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;Chrome Extension (Manifest V3)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;Cloud Run (Hono)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Event Processing&lt;/td&gt;
&lt;td&gt;Cloud Pub/Sub → Cloud Run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Workspace Integration&lt;/td&gt;
&lt;td&gt;Meet REST API, Drive API, Workspace Events API, Calendar API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI/ML&lt;/td&gt;
&lt;td&gt;Vertex AI Embeddings (gemini-embedding-001), Gemini 3 Flash&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Stores&lt;/td&gt;
&lt;td&gt;Firestore, BigQuery, Cloud Storage, Upstash Redis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Notifications&lt;/td&gt;
&lt;td&gt;Slack Block Kit API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;td&gt;Pulumi (TypeScript)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Deep Dive 1: Pre-Pooling Meet Spaces — LIFO Cache
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Problem: Meet Creation Is Slow
&lt;/h3&gt;

&lt;p&gt;Creating a new Google Meet Space via the API takes 1–2 seconds to return. Making users wait that long after clicking a button is unacceptable UX.&lt;/p&gt;
&lt;h3&gt;
  
  
  Solution: Pre-Create and Pool
&lt;/h3&gt;

&lt;p&gt;The idea is simple: &lt;strong&gt;pre-create Meet Spaces via API and return them instantly on request&lt;/strong&gt;. Replenish in the background when consumed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fws3xf9dmh10z7n2fqgu3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fws3xf9dmh10z7n2fqgu3.png" alt="LIFO Cache" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MeetSpaceCache&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nx"&gt;cachePool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CachedMeetSpace&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;targetSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;maxSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;ttlMs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// 24 hours&lt;/span&gt;

  &lt;span class="nf"&gt;getMeetSpaceFromCache&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nx"&gt;CachedMeetSpace&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Filter expired entries, then pop the newest&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cachePool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cachePool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isExpired&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;space&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cachePool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// LIFO&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;space&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;emitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;emit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;spaceConsumed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Trigger background replenishment&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;space&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why LIFO?&lt;/strong&gt; By always returning the newest Space, we minimize the risk of serving an expired one. Older Spaces naturally expire and are filtered out of the pool on the next fetch, before &lt;code&gt;pop()&lt;/code&gt; runs.&lt;/p&gt;

&lt;p&gt;Replenishment is event-driven via &lt;code&gt;EventEmitter&lt;/code&gt;. When a Space is consumed, &lt;code&gt;replenish()&lt;/code&gt; runs in the background after a 100ms delay. A mutex (&lt;code&gt;isReplenishing&lt;/code&gt; flag) prevents concurrent API requests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;initializeMeetCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;createSpace&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;emitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;spaceConsumed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replenish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;createSpace&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="c1"&gt;// Build initial pool on startup&lt;/span&gt;
  &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replenish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;createSpace&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
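&lt;p&gt;For reference, here is a minimal sketch of what the &lt;code&gt;replenish()&lt;/code&gt; side could look like. The &lt;code&gt;POOL_SIZE&lt;/code&gt; constant, the &lt;code&gt;createSpace&lt;/code&gt; signature, and the class shape are assumptions for illustration; only the &lt;code&gt;isReplenishing&lt;/code&gt; flag and &lt;code&gt;cachePool&lt;/code&gt; come from the actual design:&lt;/p&gt;

```typescript
// Sketch of the event-driven replenisher (illustrative, not the real code).
// `isReplenishing` is the mutex flag described above; POOL_SIZE and
// `createSpace` are assumed names.
type CachedMeetSpace = { meetingUri: string; createdAt: number };

class MeetSpaceCache {
  private cachePool: CachedMeetSpace[] = [];
  private isReplenishing = false;         // "mutex": one replenish at a time
  private static readonly POOL_SIZE = 3;  // assumed target pool depth

  async replenish(createSpace: () => Promise<CachedMeetSpace>): Promise<void> {
    if (this.isReplenishing) return;      // another replenish is in flight
    this.isReplenishing = true;
    try {
      // Fill the pool back up with sequential Meet API calls.
      while (this.cachePool.length < MeetSpaceCache.POOL_SIZE) {
        this.cachePool.push(await createSpace());
      }
    } finally {
      this.isReplenishing = false;        // always release the flag
    }
  }

  get size(): number {
    return this.cachePool.length;
  }
}
```

&lt;p&gt;The flag is set synchronously before the first &lt;code&gt;await&lt;/code&gt;, so a second &lt;code&gt;replenish()&lt;/code&gt; fired while one is in flight returns immediately instead of doubling the API traffic.&lt;/p&gt;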



&lt;p&gt;With the pooled cache, most requests return a Meet URL in &lt;strong&gt;under 100ms&lt;/strong&gt;. The cache lives in a shared domain package, reused by both the Chrome Extension API and the Slack Bot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deep Dive 2: Designing for Adoption — Chrome Extension
&lt;/h2&gt;

&lt;h3&gt;
  
  
  We Started with a Slack Command
&lt;/h3&gt;

&lt;p&gt;The first thing we built was a &lt;strong&gt;&lt;code&gt;/meet&lt;/code&gt; command in Slack&lt;/strong&gt;. Run the command and the bot returns a Meet link. Technically, it worked perfectly.&lt;/p&gt;

&lt;p&gt;But &lt;strong&gt;nobody used it&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Why? The meeting creation flow is "create a Calendar event → invite participants → set the Meet URL." The Slack command is &lt;strong&gt;outside&lt;/strong&gt; this flow. Switching to Slack, typing a command, copying the URL, pasting it into Calendar — that's too much friction.&lt;/p&gt;

&lt;h3&gt;
  
  
  Meet Users Where They Already Are
&lt;/h3&gt;

&lt;p&gt;The insight was that &lt;strong&gt;features must be placed on the user's existing path to get adopted&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Google Calendar's event editor is a place &lt;strong&gt;everyone passes through&lt;/strong&gt; when scheduling a meeting. Put a button there and it's one click. That's why we built a Chrome Extension.&lt;/p&gt;

&lt;p&gt;The Slack command still exists and some people use it. But adoption skyrocketed after shipping the Chrome Extension.&lt;/p&gt;

&lt;h3&gt;
  
  
  Optimizing Channel Selection
&lt;/h3&gt;

&lt;p&gt;We also put effort into the channel selection UX. The dropdown order is determined by the following logic:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 1: Personal Selection History (Redis ZSET)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Store in Redis ZSET with score=timestamp&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;saveChannelSelection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Remove duplicate of same channel&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zrem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;existingMember&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// Add with latest timestamp&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zadd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="na"&gt;member&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="c1"&gt;// Cap at 50 entries&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zremrangebyrank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;MAX_RECENT&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Previously selected channels appear at the top. For recurring meetings, last week's channel is always first. Using Redis ZSET with timestamps as scores gives O(log N) insertion and natural chronological ordering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 2: Channel Activity (Firestore &lt;code&gt;sortPriority&lt;/code&gt;)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Channels without selection history are sorted by a pre-computed &lt;code&gt;sortPriority&lt;/code&gt; (based on message volume) in Firestore. Frequently used channels rank higher.&lt;/p&gt;

&lt;p&gt;Both sources are fetched in parallel, with Redis results taking priority in the merge, ensuring a useful list even on first load.&lt;/p&gt;
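&lt;p&gt;That merge is simple enough to sketch as a pure function (the &lt;code&gt;Channel&lt;/code&gt; shape and &lt;code&gt;id&lt;/code&gt; field are assumptions; the real types may differ):&lt;/p&gt;

```typescript
// Sketch of the two-tier merge: personal history (Redis, newest first)
// wins, then activity-ranked channels (Firestore) fill the rest.
type Channel = { id: string; name: string };

function mergeChannelLists(recent: Channel[], byActivity: Channel[]): Channel[] {
  const seen = new Set(recent.map(c => c.id));
  // History keeps its order; de-duplicated activity ranking follows.
  return [...recent, ...byActivity.filter(c => !seen.has(c.id))];
}
```

&lt;p&gt;A user with no history gets pure activity ranking; a heavy user sees their own recent channels first, every time.&lt;/p&gt;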

&lt;h2&gt;
  
  
  Deep Dive 3: Domain-Wide Delegation — Why a "Proxy Account" Is Needed
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The File Ownership Problem
&lt;/h3&gt;

&lt;p&gt;When you enable recording in Google Meet, the recording and transcript files are created in &lt;strong&gt;the organizer's personal Drive&lt;/strong&gt;. This is a Google Workspace behavior that cannot be changed.&lt;/p&gt;

&lt;p&gt;This is a major problem.&lt;/p&gt;

&lt;p&gt;When files are scattered across different organizers' Drives, &lt;strong&gt;the system cannot uniformly access them&lt;/strong&gt;. Copying recordings to GCS, loading transcripts into BQ, granting permissions to channel members — all these automated operations require reliable file access. If the organizer differs each time, you'd have to track which Drive the file is in and manage each person's OAuth tokens. This is operationally untenable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution: Impersonation via a Shared Service Account
&lt;/h3&gt;

&lt;p&gt;We use Domain-Wide Delegation (DWD) to have a &lt;strong&gt;service account act as a Workspace admin&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;auth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;google&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;JWT&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;serviceAccountEmail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Service account&lt;/span&gt;
  &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;privateKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;scopes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.googleapis.com/auth/meetings.space.created&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.googleapis.com/auth/drive&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;workspaceAdminEmail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Act as this admin&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since APIs execute as the Workspace admin specified in &lt;code&gt;subject&lt;/code&gt;, both Meet Space creation and Drive file ownership are consolidated under this shared account.&lt;/p&gt;

&lt;p&gt;When creating a Space, we set recording and transcription to &lt;strong&gt;ON by default&lt;/strong&gt; via &lt;code&gt;artifactConfig&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;accessType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TRUSTED&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;entryPointAccess&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ALL&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;artifactConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;recordingConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;autoRecordingGeneration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ON&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Recording: ON by default&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;transcriptionConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;autoTranscriptionGeneration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ON&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Transcription: ON by default&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}),&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Users never "forget to turn on recording." Every Meet created through this system is guaranteed to be recorded and transcribed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Files are always consolidated in the same account's Drive → uniform system access&lt;/li&gt;
&lt;li&gt;No individual OAuth token management needed&lt;/li&gt;
&lt;li&gt;Same credentials work regardless of who organizes the meeting&lt;/li&gt;
&lt;li&gt;One-time setup in Workspace Admin Console, then it just works with the service account key&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Workspace Admin privileges are required&lt;/strong&gt; for the initial setup, but it's a one-time task.&lt;/p&gt;

&lt;h3&gt;
  
  
  Calendar Search via DWD
&lt;/h3&gt;

&lt;p&gt;When notifying Slack on meeting end, we need the &lt;strong&gt;meeting title&lt;/strong&gt;. But the Meet API doesn't provide it — the title only exists on the Calendar side.&lt;/p&gt;

&lt;p&gt;DWD helps here too. We first search the organizer's Calendar, then iterate through participants' Calendars.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;searchCalendarEventTitle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;meetCode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;creatorEmail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;participants&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// 1. Search the organizer's calendar first&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;creatorEvent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;searchCalendar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;creatorEmail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;meetCode&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;creatorEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;creatorEvent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// 2. Fall back to participants&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;participant&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;participants&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;searchCalendar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;participant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;meetCode&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// 3. Fall back to Firestore cache&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;calendarTitle&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With DWD, you can search any user's Calendar by simply swapping the &lt;code&gt;subject&lt;/code&gt;. No Calendar sharing settings needed.&lt;/p&gt;
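&lt;p&gt;In practice, "swapping the &lt;code&gt;subject&lt;/code&gt;" means rebuilding the same JWT options with a different impersonation target. A hedged sketch (the helper names here are made up, and matching events via &lt;code&gt;hangoutLink&lt;/code&gt; is an assumption about how the lookup works):&lt;/p&gt;

```typescript
// Sketch: DWD JWT options differ per user only in `subject`, and a
// Calendar event can be matched to a Meet code via its Meet URL.
interface JwtOptions {
  email: string;      // the service account
  key: string;        // its private key
  scopes: string[];
  subject: string;    // the user being impersonated
}

function jwtOptionsFor(userEmail: string, saEmail: string, key: string): JwtOptions {
  return {
    email: saEmail,
    key,
    scopes: ['https://www.googleapis.com/auth/calendar.readonly'],
    subject: userEmail,  // the only per-user field
  };
}

type CalendarEvent = { summary?: string; hangoutLink?: string };

// Find the event whose Meet URL contains the meeting code.
function findEventByMeetCode(
  events: CalendarEvent[],
  meetCode: string,
): CalendarEvent | undefined {
  return events.find(e => e.hangoutLink?.includes(meetCode));
}
```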

&lt;h2&gt;
  
  
  Deep Dive 4: Workspace Events API — Real-Time Event-Driven Architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  No Polling
&lt;/h3&gt;

&lt;p&gt;"How do we detect when a Meet ends?" — this was the first challenge.&lt;/p&gt;

&lt;p&gt;Polling the API for status checks lacks real-time responsiveness and increases API call volume.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google Workspace Events API&lt;/strong&gt; lets you receive Meet lifecycle events in real-time via Pub/Sub.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;subscription&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;workspaceEvents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;subscriptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;requestBody&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;targetResource&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`//meet.googleapis.com/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;spaceName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;eventTypes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;google.workspace.meet.conference.v2.ended&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// Meeting ended&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;google.workspace.meet.recording.v2.fileGenerated&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Recording ready&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;google.workspace.meet.transcript.v2.fileGenerated&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Transcript ready&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;notificationEndpoint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;pubsubTopic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`projects/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;projectId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/topics/meet-events`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;payloadOptions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;includeResource&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We create a Subscription when the Meet Space is created, delivering three event types to the &lt;code&gt;meet-events&lt;/code&gt; Pub/Sub topic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fighting the 7-Day Expiration
&lt;/h3&gt;

&lt;p&gt;However, these Subscriptions have a &lt;strong&gt;7-day maximum TTL&lt;/strong&gt; (604,800 seconds). This is a Google API constraint that cannot be changed. Left unattended, subscriptions expire and events stop arriving.&lt;/p&gt;

&lt;p&gt;This becomes a problem in cases like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Recurring meetings&lt;/strong&gt; — A weekly Monday standup reuses the same Meet Space. The subscription expires before next Monday&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Future meetings&lt;/strong&gt; — Creating a Meet in advance for next week's 1:1. If more than 7 days pass from creation, events won't arrive on the meeting day&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, &lt;strong&gt;without automatic subscription renewal, recurring and future meetings won't work&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Daily Batch Auto-Renewal
&lt;/h3&gt;

&lt;p&gt;We run a daily batch via Cloud Scheduler at 5:00 AM JST, processing in two phases:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;renewSubscriptions&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;RenewalResult&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Phase 1: Invalidate old Spaces (run before renewal)&lt;/span&gt;
  &lt;span class="c1"&gt;// → Processing invalidations first excludes them from Phase 2&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;spacesToInvalidate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getMeetSpacesNeedingInvalidation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;thirtyDaysAgo&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;space&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;spacesToInvalidate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;invalidateMeetSpace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;spaceName&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// isValid = false&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Phase 2: Renew Subscriptions&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;spacesToRenew&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getMeetSpacesNeedingRenewal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sixDaysAgo&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;space&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;spacesToRenew&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Create new Subscription (old one auto-expires)&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;newSubscriptionName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createMeetSubscription&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nx"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;spaceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;subscriptionConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;updateMeetSpaceSubscription&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;space&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;spaceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;newSubscriptionName&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Phase 1: Invalidation&lt;/strong&gt; — Spaces whose &lt;code&gt;meetingEndAt&lt;/code&gt; is more than 30 days old are set to &lt;code&gt;isValid: false&lt;/code&gt;. Once 30 days have passed since a meeting ended, no further recording or transcript events will arrive, and invalidating these Spaces excludes them from Phase 2, cutting unnecessary API calls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 2: Renewal&lt;/strong&gt; — Spaces where &lt;code&gt;subscribedAt&lt;/code&gt; is 6+ days ago (one day before expiration) get a new Subscription. Old subscriptions auto-expire, so explicit deletion is unnecessary.&lt;/p&gt;

&lt;h3&gt;
  
  
  Subscription Lifecycle
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Day 0: Meet created → Subscription created (TTL: 7 days)
Day 6: Daily batch → Subscription renewed (new TTL: 7 days)
Day 12: Daily batch → Subscription renewed (new TTL: 7 days)
  ...repeats...
Day 30+: Daily batch → isValid=false → renewal stops
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this mechanism, &lt;strong&gt;even if you create a Meet today for a meeting next month, the subscription is auto-renewed daily so events are guaranteed to arrive on the meeting day&lt;/strong&gt;. Recurring meetings similarly work across multiple weeks with the same Meet Space.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deep Dive 5: Event Processing Pipeline
&lt;/h2&gt;

&lt;p&gt;From meeting end to Slack notification to vector data generation for RAG search — everything starts from receiving a Pub/Sub message.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvxt2h19edh6x4l8alqa0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvxt2h19edh6x4l8alqa0.png" alt="Event Pipeline" width="800" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Event Router: Dispatching to Three Handlers
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleMeetEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pubsubMessage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;eventType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pubsubMessage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ce-type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;spaceName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;normalizeSpaceName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pubsubMessage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ce-subject&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

  &lt;span class="c1"&gt;// Fetch space info from Firestore&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;meetInfo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getMeetSpaceInfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;spaceName&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;eventType&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;google.workspace.meet.conference.v2.ended&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;handleMeetEnded&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pubsubMessage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;google.workspace.meet.recording.v2.fileGenerated&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;handleRecordingGenerated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pubsubMessage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;google.workspace.meet.transcript.v2.fileGenerated&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;handleTranscriptGenerated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pubsubMessage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One caveat: the Pub/Sub event's &lt;code&gt;targetResource&lt;/code&gt; may contain a &lt;code&gt;conferenceRecordId&lt;/code&gt; instead of a &lt;code&gt;spaceName&lt;/code&gt;. Google Meet creates a new conference record for each session in the same Space. In that case, we resolve &lt;code&gt;conferenceRecordId → spaceName&lt;/code&gt; via the Meet API.&lt;/p&gt;
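&lt;p&gt;That resolution step can be sketched roughly as follows. The parsing and the injected lookup are assumptions about our internal helper; the Meet API's &lt;code&gt;conferenceRecords.get&lt;/code&gt; does return the parent &lt;code&gt;space&lt;/code&gt; resource name:&lt;/p&gt;

```typescript
// Sketch: normalize a Workspace Events `ce-subject` into a spaceName.
// Subjects look like "//meet.googleapis.com/spaces/..." or
// "//meet.googleapis.com/conferenceRecords/...". For the latter we
// resolve the parent space via an injected lookup (one Meet API call).
type ConferenceLookup = (recordName: string) => Promise<{ space: string }>;

async function resolveSpaceName(
  subject: string,
  getConferenceRecord: ConferenceLookup,
): Promise<string> {
  const resource = subject.replace('//meet.googleapis.com/', '');
  if (resource.startsWith('spaces/')) return resource;
  if (resource.startsWith('conferenceRecords/')) {
    const record = await getConferenceRecord(resource);
    return record.space;  // e.g. "spaces/abc123"
  }
  throw new Error(`Unexpected ce-subject: ${subject}`);
}
```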

&lt;h3&gt;
  
  
  ① handleMeetEnded — On Meeting End
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Update Firestore status to &lt;code&gt;ended&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Fetch the participant list from the Meet API&lt;/li&gt;
&lt;li&gt;Search the Calendar API for the meeting title (using DWD to search participants' calendars)&lt;/li&gt;
&lt;li&gt;Save participant info to BQ (making "who attended" searchable via RAG)&lt;/li&gt;
&lt;li&gt;Send a "meeting ended" notification to Slack&lt;/li&gt;
&lt;li&gt;Save the notification &lt;code&gt;ts&lt;/code&gt; (timestamp) to Firestore so subsequent notifications thread under it&lt;/li&gt;
&lt;/ol&gt;
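&lt;p&gt;The six steps can be sketched as follows (with injected stub clients whose interface names are purely illustrative, not our production code):&lt;/p&gt;

```typescript
// The six steps above, in order, against injected stubs.
async function handleMeetEnded(meetInfo: any, deps: any) {
  await deps.firestore.updateStatus(meetInfo.spaceName, 'ended');            // 1
  const participants = await deps.meet.listParticipants(meetInfo.spaceName); // 2
  await deps.calendar.findEventByTitle(meetInfo.title);                      // 3 (via DWD)
  await deps.bigquery.saveParticipants(meetInfo.spaceName, participants);    // 4
  const ts = await deps.slack.postMeetingEnded(meetInfo.channelId);          // 5
  await deps.firestore.saveThreadTs(meetInfo.spaceName, ts);                 // 6
  return ts;
}
```

Saving the Slack `ts` last matters: every later notification looks it up to post into the same thread.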

&lt;h3&gt;
  
  
  ② handleRecordingGenerated — On Recording Completion
&lt;/h3&gt;

&lt;p&gt;The recording handler is the most complex:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Drive → GCS copy → Grant permissions → Update Firestore
                 → Gemini transcription (async)
                 → Screen share analysis (async)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Idempotency is critical.&lt;/strong&gt; Pub/Sub guarantees at-least-once delivery, so duplicate messages are possible. We strictly maintain this order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleRecordingGenerated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Idempotency check: skip if already processed&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;recordingReady&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;artifacts&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;recording&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;gcsUri&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// 1. Get file info from Drive&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fileInfo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getFileInfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;driveFileId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// 2. Stream copy to GCS (with existence check)&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;gcsFileExists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gcsPath&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;copyDriveFileToGCS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fileInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;gcsPath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// 3. Grant permissions to channel members ← BEFORE setting the flag&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;shareFileWithChannelMembers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fileInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channelId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// 4. Save artifact info to Firestore&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;updateMeetSpaceArtifact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;spaceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;recording&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;driveFileId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;gcsUri&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// 5. AI processing is async fire-and-forget&lt;/span&gt;
  &lt;span class="nf"&gt;processGeminiTranscription&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gcsUri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logError&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;processScreenShareAnalysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gcsUri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logError&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// 6. Check if both are ready → send Slack notification if so&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;checkAndNotifyArtifacts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;spaceName&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why grant permissions before setting the flag?&lt;/strong&gt; If the flag is set first, a retry would skip via the idempotency check, and permissions would never be granted. Drive permission granting is idempotent (HTTP 400 means permission already exists), so it's safe to execute multiple times.&lt;/p&gt;
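&lt;p&gt;A minimal sketch of such a 400-tolerant grant, assuming an injected Drive client (in production this corresponds to &lt;code&gt;drive.permissions.create&lt;/code&gt;):&lt;/p&gt;

```typescript
// Sketch of a 400-tolerant, notification-free permission grant.
// The Drive client is injected; the request shape is illustrative.
async function grantPermissionIdempotently(drive: any, fileId: string, email: string, role: string) {
  try {
    await drive.createPermission({
      fileId: fileId,
      sendNotificationEmail: false, // suppress "X shared a file with you" emails
      requestBody: { type: 'user', role: role, emailAddress: email },
    });
  } catch (err: any) {
    // HTTP 400 means the permission already exists: treat as success
    if (err.status !== 400) { throw err; }
  }
}
```

Because this is safe to run repeatedly, a Pub/Sub redelivery that re-enters the handler before the flag is set costs nothing.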

&lt;h3&gt;
  
  
  ③ handleTranscriptGenerated — On Transcript Completion
&lt;/h3&gt;

&lt;p&gt;It structurally mirrors the recording handler: it extracts the Google Docs transcript as plain text, saves it to GCS, and feeds it into the embedding pipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  When Both Are Ready: Final Notification + Calendar Attachment
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;checkAndNotifyArtifacts()&lt;/code&gt; executes once both the recording and the transcript are ready:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Send artifact notification to Slack&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Attach recording and transcript files to the Calendar event&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Grant permissions to Calendar invitees&lt;/li&gt;
&lt;/ol&gt;
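&lt;p&gt;The "both are ready" check can be sketched as a pure predicate over the Firestore document (the field names here are illustrative):&lt;/p&gt;

```typescript
// Sketch: gate the final notification on both artifacts existing,
// and on not having notified already (idempotency under redelivery).
function bothArtifactsReady(meetInfo: any): boolean {
  const artifacts = meetInfo.artifacts || {};
  if (!artifacts.recording) { return false; }
  if (!artifacts.transcript) { return false; }
  if (meetInfo.artifactsNotified) { return false; } // already notified once
  return true;
}
```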

&lt;p&gt;Point 2 is key. Normally, Google Meet automatically attaches files to the Calendar event when recording and transcription complete. In our system, DWD creates the Meet under a different account, so that auto-attachment doesn't work. We &lt;strong&gt;explicitly attach files via the Calendar API to preserve the same experience as default Meet&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;attachFilesToCalendarEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;artifacts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;attachments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;artifacts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;recording&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;attachments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;fileUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;artifacts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;recording&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;webViewLink&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Recording&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;artifacts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;attachments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;fileUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;artifacts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;webViewLink&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Transcript&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Deduplicate by fileUrl to be idempotent&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attachments&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;newAttachments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;attachments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fileUrl&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fileUrl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;calendar&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;calendarId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;organizerEmail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;eventId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;requestBody&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;attachments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;newAttachments&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;supportsAttachments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This lets users access recordings and transcripts directly from the Calendar event detail view — whether they come via Slack or Calendar.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deep Dive 6: Three-Layer Permission Model
&lt;/h2&gt;

&lt;p&gt;"Who gets access?" is the most delicate design point. Too narrow and it's useless; too broad and it's a security risk.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fll8zx1n4uhkaq3had67f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fll8zx1n4uhkaq3had67f.png" alt="Permission Model" width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Slack Channel Members
&lt;/h3&gt;

&lt;p&gt;When each artifact is generated, all members of the linked Slack channel get Drive viewer access.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;shareFileWithChannelMembers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fileId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;channelId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Enumerate channel members via Slack API&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;members&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getChannelMembers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;channelId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;member&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;members&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Slack ID → Firestore → email&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;userInfo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getUserInfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;member&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;userInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;endsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@air-closet.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Domain filter&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;member&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;organizerSlackId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;writer&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;reader&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;shareFileWithUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fileId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;userInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;role&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Importantly, &lt;strong&gt;members who join the channel later also get access&lt;/strong&gt;. Since permissions are granted using the latest member list on each Pub/Sub retry, people who joined after the meeting naturally receive access.&lt;/p&gt;

&lt;p&gt;The organizer gets &lt;code&gt;writer&lt;/code&gt; permissions, allowing them to manage the recording file (rename, change sharing settings, etc.).&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: Meeting Participants
&lt;/h3&gt;

&lt;p&gt;On meeting end, participant info from the Meet API is saved to BQ. Participants may be guests not in the Slack channel, requiring a separate permission axis from Layer 1.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3: Calendar Invitees
&lt;/h3&gt;

&lt;p&gt;When both artifacts are ready, permissions are also granted to Calendar event invitees.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;attachToCalendarAndShareWithAttendees&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;artifacts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getCalendarEventByMeetCode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;meetingCode&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Attach files to the Calendar event&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;attachFilesToCalendarEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;artifacts&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Grant permissions to all invitees (organizer = writer, others = reader)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;emails&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attendees&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;shareFilesWithEmails&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;artifacts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;emails&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;organizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;People not in the Slack channel but on the Calendar invite (e.g., a manager who only wants to review meeting notes) also get access.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security Guarantees
&lt;/h3&gt;

&lt;p&gt;Common security rules apply across all three layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Domain filter&lt;/strong&gt;: Only &lt;code&gt;@air-closet.com&lt;/code&gt; email addresses are eligible, preventing shares to external users&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotent permission grants&lt;/strong&gt;: HTTP 400 (permission already exists) is not treated as an error&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Notification suppression&lt;/strong&gt;: &lt;code&gt;sendNotificationEmail: false&lt;/code&gt; prevents a flood of "X shared a file with you" emails&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Deep Dive 7: Embedding Generation &amp;amp; RAG Search Pipeline
&lt;/h2&gt;

&lt;p&gt;This was the most exciting part to build.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F58to5z9rbwk0a9xscill.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F58to5z9rbwk0a9xscill.png" alt="RAG Pipeline" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Three Content Sources
&lt;/h3&gt;

&lt;p&gt;Up to three types of text are extracted from each meeting and vectorized separately:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Content Type&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;transcript&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Google Meet's native transcript (Google Docs)&lt;/td&gt;
&lt;td&gt;Spoken word text&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemini_transcript&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Gemini-generated transcript from the recording&lt;/td&gt;
&lt;td&gt;Higher quality than native&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;screen_share&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Gemini Vision-extracted screen share content&lt;/td&gt;
&lt;td&gt;Slides, code, documents&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Text Chunking: Bilingual Sentence Boundary Detection
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;chunkText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;chunkSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;overlap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;chunkSize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// Find a sentence boundary to avoid cutting mid-sentence&lt;/span&gt;
      &lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;findSentenceBreak&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;overlap&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Overlap preserves context across chunks&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;findSentenceBreak()&lt;/code&gt; searches backward from the chunk boundary for sentence-ending punctuation. It supports both Japanese (&lt;code&gt;。&lt;/code&gt;, &lt;code&gt;！&lt;/code&gt;, &lt;code&gt;？&lt;/code&gt;) and English (&lt;code&gt;.&lt;/code&gt;, &lt;code&gt;!&lt;/code&gt;, &lt;code&gt;?&lt;/code&gt;), with fallback to spaces and fullwidth spaces. A minimum of 100 characters per chunk is enforced.&lt;/p&gt;

&lt;p&gt;Meeting transcripts frequently mix Japanese and English, making bilingual boundary detection essential.&lt;/p&gt;
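&lt;p&gt;A minimal sketch of this boundary search (the character sets below illustrate the idea rather than reproduce the production list):&lt;/p&gt;

```typescript
// Sentence-ending punctuation for Japanese and English, plus space
// fallbacks (including the fullwidth space).
const SENTENCE_ENDS = ['。', '！', '？', '.', '!', '?'];
const SPACE_CHARS = [' ', '　'];

// Search backward from "end" toward "minEnd" for a cut point.
function findSentenceBreak(text: string, end: number, minEnd: number): number {
  const stop = Math.min(Math.max(minEnd, 0), end);
  // Pass 1: prefer real sentence endings.
  for (let i = end; i !== stop; i--) {
    if (SENTENCE_ENDS.includes(text[i - 1])) {
      return i; // cut just after the punctuation
    }
  }
  // Pass 2: fall back to any space so words stay intact.
  for (let i = end; i !== stop; i--) {
    if (SPACE_CHARS.includes(text[i - 1])) {
      return i;
    }
  }
  return end; // no boundary found: hard cut
}
```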

&lt;h3&gt;
  
  
  Screen Share Content Extraction with Gemini
&lt;/h3&gt;

&lt;p&gt;Transcripts alone miss &lt;strong&gt;content shown via screen sharing&lt;/strong&gt; — slides, code, documents. When you need to find "that thing on the slide," a transcript search comes up empty.&lt;/p&gt;

&lt;p&gt;We use Gemini 3 Flash (&lt;code&gt;gemini-3-flash-preview&lt;/code&gt;) multimodal input to extract screen share content directly from the recording video.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;analyzeScreenShareFromVideo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gcsUri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;gemini&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generateContent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;GEMINI_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// gemini-3-flash-preview&lt;/span&gt;
    &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="na"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="na"&gt;fileData&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;mimeType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;video/mp4&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;fileUri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;gcsUri&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="c1"&gt;// Unlike transcription, video frames matter here — higher fps&lt;/span&gt;
        &lt;span class="na"&gt;videoMetadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;fps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Extract the content shown via screen sharing in this video.
               Transcribe any slide text, document content,
               or code that appears.`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="na"&gt;generationConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The fps differentiation is key.&lt;/strong&gt; For transcription, only audio matters, so &lt;code&gt;fps: 0.1&lt;/code&gt; (1 frame per 10 seconds) minimizes video tokens. For screen share analysis, visual content matters, so &lt;code&gt;fps: 0.2&lt;/code&gt; (1 frame per 5 seconds) doubles the sampling rate to preserve on-screen detail.&lt;/p&gt;
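&lt;p&gt;A minimal sketch of how the two rates could be centralized so each pipeline stage asks for the rate its task needs. The task names and the helper itself are illustrative; only the 0.1 / 0.2 values come from the setup above.&lt;/p&gt;

```typescript
// Minimal sketch: map each analysis task to its frame-sampling rate.
// Task names and this helper are illustrative assumptions; the 0.1 / 0.2
// values mirror the article (audio-only transcription vs. visual analysis).
type AnalysisTask = "transcription" | "screenShare";

function videoMetadataFor(task: AnalysisTask): { fps: number } {
  // Transcription only needs audio, so sample video frames sparsely.
  return { fps: task === "transcription" ? 0.1 : 0.2 };
}
```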

&lt;p&gt;For long meetings that hit the input token limit, an automatic fallback splits the video into 30-minute chunks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;transcribeFromVideo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gcsUri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Try processing the full video first&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;callGemini&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gcsUri&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;isTokenLimitError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// Token limit hit → split into 30-minute chunks&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;transcribeVideoInChunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gcsUri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
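&lt;p&gt;The chunked retry needs offset windows to work through. A sketch of how 30-minute windows could be computed — the helper name and shape are assumptions; Gemini's &lt;code&gt;videoMetadata&lt;/code&gt; does accept start/end offsets for partial-video processing.&lt;/p&gt;

```typescript
// Hypothetical helper: split a video's duration (seconds) into fixed-length
// windows. Each window could drive one chunked Gemini call, e.g. via
// videoMetadata start/end offsets.
function chunkWindows(durationSec: number, chunkSec: number) {
  const windows = [];
  let start = 0;
  while (durationSec > start) {
    const end = Math.min(start + chunkSec, durationSec);
    windows.push({ startSec: start, endSec: end });
    start = end;
  }
  return windows;
}
```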



&lt;h3&gt;
  
  
  BigQuery Vector Search
&lt;/h3&gt;

&lt;p&gt;Vector data is stored in per-channel BQ tables (&lt;code&gt;meet_{channelId}&lt;/code&gt;). Splitting tables by channel enables filter-free Vector Search for within-channel queries. A separate aggregated table with &lt;code&gt;channel_id&lt;/code&gt; clustering handles cross-channel search.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;insertMeetChunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;channelTableId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`meet_&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;meetInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channelId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Auto-create table if it doesn't exist (day-partitioned)&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;ensureMeetChannelTable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;channelTableId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;insertRow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;channelTableId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Access Control at Search Time
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;chunkText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;meetingId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;channelId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ML&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'COSINE'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;`meet_chunks`&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;channelId&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;UNNEST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;accessible_channels&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;-- Access control&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;@accessible_channels&lt;/code&gt; is &lt;strong&gt;the list of Slack channel IDs the user is a member of&lt;/strong&gt;. Meeting content from channels you're not in will never appear in results, even if it exists in BQ.&lt;/p&gt;

&lt;p&gt;COSINE distance is converted to a 0–1 relevance score via &lt;code&gt;1 - distance / 2&lt;/code&gt;. Only chunks above the threshold are fed into Gemini's context to generate the answer.&lt;/p&gt;
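&lt;p&gt;The conversion is small enough to show directly. BigQuery's COSINE distance lies in [0, 2], so &lt;code&gt;1 - distance / 2&lt;/code&gt; maps it onto a 0–1 score; the threshold value below is illustrative, not our production setting.&lt;/p&gt;

```typescript
// COSINE distance from BigQuery ranges over [0, 2]; map it to a 0-1
// relevance score and keep only chunks above a threshold. The 0.7
// threshold here is illustrative.
function relevanceScore(cosineDistance: number): number {
  return 1 - cosineDistance / 2;
}

function selectContextChunks(
  rows: { chunkText: string; distance: number }[],
  threshold = 0.7,
) {
  return rows.filter((row) => relevanceScore(row.distance) > threshold);
}
```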

&lt;h2&gt;
  
  
  Deep Dive 8: GCS Operations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Streaming Copy from Drive to GCS
&lt;/h3&gt;

&lt;p&gt;Recording files can run to hundreds of MB. Loading an entire file into memory would exhaust Cloud Run's memory limit, so we pipe the download stream straight into the upload.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;copyDriveFileToGCS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;driveFileId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;gcsPath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Stream download from Drive API&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`https://www.googleapis.com/drive/v3/files/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;driveFileId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;?alt=media`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;Authorization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Stream upload to GCS JSON API&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`https://storage.googleapis.com/upload/storage/v1/b/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/o?name=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;gcsPath&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;amp;uploadType=media`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;Authorization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;mimeType&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Pass ReadableStream directly&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; We use the GCS JSON API directly instead of &lt;code&gt;@google-cloud/storage&lt;/code&gt;'s &lt;code&gt;file.save()&lt;/code&gt; because the latter has a bug where multipart boundary strings get mixed into binary data during upload, corrupting recording files.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  GCS File Structure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gs://bucket/
└── meet/
    └── {channelId}/
        └── {spaceId}/
            ├── recording.mp4              # Recording file
            ├── transcript_original.txt    # Google Docs transcript
            ├── gemini_transcript.txt      # Gemini transcript
            └── screen_share.txt           # Screen share analysis
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The channelId → spaceId hierarchy makes per-channel data management and lifecycle policy application straightforward. GCS lifecycle auto-deletes after 90 days (originals remain on Drive).&lt;/p&gt;
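&lt;p&gt;A 90-day deletion rule like the one described is expressed in standard GCS lifecycle configuration roughly as follows (bucket scoping and any additional conditions are omitted):&lt;/p&gt;

```json
{
  "rule": [
    {
      "action": { "type": "Delete" },
      "condition": { "age": 90 }
    }
  ]
}
```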

&lt;h2&gt;
  
  
  Deep Dive 9: Slack Notification Design
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Two-Phase Notification
&lt;/h3&gt;

&lt;p&gt;To avoid making users wait, we split notifications into two phases:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1 (immediately after meeting end):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🎬 Meeting ended

"Weekly Standup" has ended.
We'll notify you when the recording and transcript are ready.

Created by: @tanaka
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, the recording and transcript are still processing. But users can confirm that the meeting was successfully recorded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 2 (after artifacts are ready — thread reply):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;📹 Recording and transcript are ready!

🎥 Recording
   https://drive.google.com/file/d/xxx

📝 Transcript
   https://docs.google.com/document/d/xxx

ℹ️ Channel members have viewing access
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Phase 2 is sent as a &lt;strong&gt;thread reply&lt;/strong&gt; to Phase 1. The Phase 1 message's &lt;code&gt;ts&lt;/code&gt; (timestamp) is saved to Firestore and used as the thread parent for Phase 2.&lt;/p&gt;
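&lt;p&gt;The flow above can be sketched as follows. &lt;code&gt;chat.postMessage&lt;/code&gt; and its &lt;code&gt;thread_ts&lt;/code&gt; parameter are standard Slack Web API; the client wiring and the Firestore persistence of &lt;code&gt;ts&lt;/code&gt; are elided.&lt;/p&gt;

```typescript
// Sketch of the two-phase notification. Phase 1 posts to the channel and
// returns the message ts; Phase 2 passes that ts as thread_ts so Slack
// renders it as a thread reply. The `slack` client shape follows the Slack
// Web API's chat.postMessage; saving ts to Firestore is elided.
async function notifyMeetingEnded(slack: any, channel: string, title: string) {
  const res = await slack.chat.postMessage({
    channel,
    text: `🎬 "${title}" has ended. We'll notify you when the recording and transcript are ready.`,
  });
  return res.ts; // persist this (e.g. in Firestore), keyed by the meeting
}

async function notifyArtifactsReady(slack: any, channel: string, parentTs: string, text: string) {
  // thread_ts turns this into a reply under the Phase 1 message
  await slack.chat.postMessage({ channel, thread_ts: parentTs, text });
}
```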

&lt;h2&gt;
  
  
  Observability: OpenTelemetry + Grafana + Prometheus
&lt;/h2&gt;

&lt;p&gt;All processing in this system is instrumented with &lt;strong&gt;OpenTelemetry&lt;/strong&gt; and aggregated in &lt;strong&gt;Grafana&lt;/strong&gt;. Meet Space creation, Pub/Sub event processing, Drive→GCS copy, embedding generation, Slack notifications — latency and error rates for each step are visible on a single dashboard.&lt;/p&gt;

&lt;p&gt;Through the Grafana MCP introduced in the &lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/we-built-17-mcp-servers-to-let-ai-run-our-internal-operations-3lk2"&gt;previous article&lt;/a&gt;, these logs and metrics are also accessible via MCP. Investigations like "Show me error logs from yesterday's Meet pipeline" can be done directly from Claude Code.&lt;/p&gt;

&lt;p&gt;For Gemini API costs, we track actual usage and costs via &lt;strong&gt;Prometheus&lt;/strong&gt;. Token consumption for transcription and screen share analysis is visualized in real-time, so cost anomalies are caught immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  Beyond: Meeting Data as a Project Knowledge Base
&lt;/h2&gt;

&lt;p&gt;The system described so far is about "sharing and searching meeting recordings and transcripts." But this data is already being leveraged in a broader context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Project-Level Meeting Data Integration
&lt;/h3&gt;

&lt;p&gt;At airCloset, Slack channels are created per project. The mapping between channels and projects is managed in Firestore, and through our Project Management MCP (described in the &lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/we-built-17-mcp-servers-to-let-ai-run-our-internal-operations-3lk2"&gt;previous article&lt;/a&gt;), &lt;strong&gt;meeting data linked to a project is searchable via MCP&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For example, "Tell me what was discussed about this spec in Project X's past meetings" searches all meeting transcripts from that project's Slack channel and returns relevant excerpts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Unified Search with Slack Messages
&lt;/h3&gt;

&lt;p&gt;Beyond meeting transcripts, &lt;strong&gt;Slack messages themselves are also stored and vectorized in BigQuery&lt;/strong&gt; using the same approach. The same MCP can search across both meeting content and Slack discussions.&lt;/p&gt;

&lt;p&gt;You can trace what was decided in a meeting and how it was implemented in Slack afterward, or, conversely, what was debated in Slack and which meeting made the final call. &lt;strong&gt;Being able to search meetings and chat as one unified communication record&lt;/strong&gt; is remarkably powerful in practice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Exploring Code Review Integration
&lt;/h3&gt;

&lt;p&gt;We're currently exploring whether &lt;strong&gt;business context from meeting and Slack data could be used for specification checks during code reviews&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If we could automatically surface meeting decisions and Slack spec discussions related to code changes in a PR, and verify "Is this change consistent with the spec decided in the meeting on date X?" during review, we might be able to prevent bugs caused by misunderstood requirements. It's still in the conceptual stage, but the potential for meeting data utilization continues to expand.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary: Maximizing Meeting Value
&lt;/h2&gt;

&lt;p&gt;Here's what this system achieves:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Effort of writing meeting notes&lt;/td&gt;
&lt;td&gt;Auto-transcribed and auto-shared&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Effort of rewatching recordings&lt;/td&gt;
&lt;td&gt;Ask in natural language, get a summary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Effort of managing permissions&lt;/td&gt;
&lt;td&gt;Auto-granted to channel members, participants, and invitees&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Effort of creating Meets&lt;/td&gt;
&lt;td&gt;One click from the Chrome extension&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"What was that thing we discussed?"&lt;/td&gt;
&lt;td&gt;Instantly found via RAG search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Screen-shared content not preserved&lt;/td&gt;
&lt;td&gt;Auto-extracted by Gemini Vision&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Technical highlights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LIFO cache&lt;/strong&gt; bringing Meet Space creation to under 100ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chrome Extension&lt;/strong&gt; placing features on users' existing workflow, dramatically boosting adoption&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain-Wide Delegation&lt;/strong&gt; solving the file ownership problem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workspace Events API&lt;/strong&gt; + daily batch covering the 7-day TTL constraint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotent event processing&lt;/strong&gt; handling Pub/Sub's at-least-once delivery&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three-layer permission model&lt;/strong&gt; ensuring access for all stakeholders&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-channel table strategy&lt;/strong&gt; enabling both scoped and cross-channel search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini Vision fps differentiation&lt;/strong&gt; optimizing transcription and screen share analysis costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Meetings are a treasure trove of information. Letting that information lie dormant is a waste.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google Workspace × GCP × Slack&lt;/strong&gt; — maximizing the value of every meeting. I hope this helps anyone facing similar challenges.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://developers.google.com/workspace/events" rel="noopener noreferrer"&gt;Google Workspace Events API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.google.com/meet/api/reference/rest" rel="noopener noreferrer"&gt;Google Meet REST API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.google.com/identity/protocols/oauth2/service-account#delegatingauthority" rel="noopener noreferrer"&gt;Domain-Wide Delegation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings" rel="noopener noreferrer"&gt;Vertex AI Embeddings&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/bigquery/docs/vector-search" rel="noopener noreferrer"&gt;BigQuery Vector Search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developer.chrome.com/docs/extensions/develop" rel="noopener noreferrer"&gt;Chrome Extension Manifest V3&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>showdev</category>
      <category>architecture</category>
      <category>typescript</category>
    </item>
    <item>
      <title>We Built 17 MCP Servers to Let AI Run Our Internal Operations</title>
      <dc:creator>Ryosuke Tsuji</dc:creator>
      <pubDate>Tue, 07 Apr 2026 16:22:59 +0000</pubDate>
      <link>https://dev.to/ryosuke_tsuji_f08e20fdca1/we-built-17-mcp-servers-to-let-ai-run-our-internal-operations-3lk2</link>
      <guid>https://dev.to/ryosuke_tsuji_f08e20fdca1/we-built-17-mcp-servers-to-let-ai-run-our-internal-operations-3lk2</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In a previous article, I introduced "DB Graph MCP" — a system that enables safe, cross-schema search and query execution across our entire database estate of 17 DBs and 994 tables.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5"&gt;https://dev.to/ryosuke_tsuji_f08e20fdca1/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks to the positive response, this time I'd like to introduce &lt;strong&gt;the rest of our MCP server fleet&lt;/strong&gt; beyond DB Graph.&lt;/p&gt;

&lt;p&gt;These were all built in roughly 3 months starting January 2026. We now have &lt;strong&gt;17 MCP servers&lt;/strong&gt; in production, covering databases, infrastructure, documentation, project management, observability, CI/CD, and even code editing and deployment by non-engineers — making virtually every aspect of our operations accessible to AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;Here's the full lineup:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Server&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;DB Graph&lt;/td&gt;
&lt;td&gt;Company-wide DB dictionary + query execution (&lt;a href="https://zenn.dev/aircloset/articles/2731787582881a" rel="noopener noreferrer"&gt;previous article&lt;/a&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GCloud&lt;/td&gt;
&lt;td&gt;GCP resources, read-only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;AWS&lt;/td&gt;
&lt;td&gt;AWS resources, read-only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Docs &amp;amp; Knowledge&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GWS&lt;/td&gt;
&lt;td&gt;Full Google Workspace access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Git Server&lt;/td&gt;
&lt;td&gt;All Git repos, read-only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Graph&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Code Graph&lt;/td&gt;
&lt;td&gt;Codebase analysis (function → API → DB → event dependency tracking)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Product Graph&lt;/td&gt;
&lt;td&gt;Unified knowledge graph: code + DB + docs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Biz Graph&lt;/td&gt;
&lt;td&gt;Business initiative × KPI relationship graph&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Grafana&lt;/td&gt;
&lt;td&gt;Logs, metrics, and alert inspection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CI/CD&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CircleCI&lt;/td&gt;
&lt;td&gt;Pipeline execution, build logs, test results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Project Management&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Project Management&lt;/td&gt;
&lt;td&gt;BQ/Firestore/Sheets-integrated PM support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Domain-Specific&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stylist Insights&lt;/td&gt;
&lt;td&gt;Stylist performance &amp;amp; KPI data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;UX Insights&lt;/td&gt;
&lt;td&gt;UX analytics from BQ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;freee&lt;/td&gt;
&lt;td&gt;Accounting API integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dev Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Workspace&lt;/td&gt;
&lt;td&gt;ACL-gated monorepo editing &amp;amp; deployment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Sandbox&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/bridging-i-want-to-build-and-i-want-to-publish-safely-for-non-engineers-sandbox-mcp-392a"&gt;App deployment for non-engineers&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All servers are implemented in &lt;strong&gt;TypeScript&lt;/strong&gt;, deployed to &lt;strong&gt;GCP via Pulumi&lt;/strong&gt;, and authenticated with &lt;strong&gt;Google OAuth&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Design Philosophy
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why So Many Servers?
&lt;/h3&gt;

&lt;p&gt;We could have built one monolithic MCP server, but we deliberately split them. Here's why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Auth scope isolation&lt;/strong&gt; — GWS needs Workspace API scopes; the DB query server doesn't. Minimizing scopes prevents privilege escalation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy independence&lt;/strong&gt; — A Grafana server change doesn't affect DB queries. Blast radius stays small.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-user selection&lt;/strong&gt; — Engineers add everything; marketing adds only GWS. Just put what you need in &lt;code&gt;.mcp.json&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Shared Foundation
&lt;/h3&gt;

&lt;p&gt;Every server shares common patterns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auth&lt;/strong&gt;: A shared package implements Google OAuth 2.0 + PKCE with RFC 8414 auto-discovery. Just add the URL to &lt;code&gt;.mcp.json&lt;/code&gt; and Claude Code handles the auth flow automatically. For business users, we simply register them as custom connectors in the Claude organization settings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"server-name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://mcp-xxx.your-domain.example/mcp"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No &lt;code&gt;auth&lt;/code&gt; block needed. Same format for every server.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session management&lt;/strong&gt;: Upstash Redis as a shared session store across all servers. SSO cookies mean one login grants access to everything.&lt;/p&gt;
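&lt;p&gt;The session lookup reduces to something like this sketch, with the Upstash Redis client abstracted behind a plain async get/set interface. The key naming and payload shape are assumptions, not the production schema.&lt;/p&gt;

```typescript
// Sketch of the shared session check. `store` abstracts the Upstash Redis
// client as async get/set; the "session:" key prefix and JSON payload are
// illustrative assumptions.
async function resolveSession(store: any, sessionId: string) {
  if (!sessionId) return null;
  const raw = await store.get(`session:${sessionId}`);
  return raw ? JSON.parse(raw) : null;
}
```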

&lt;p&gt;&lt;strong&gt;Tool usage logging&lt;/strong&gt;: Every tool invocation is recorded in BigQuery. Who used what, when — fully auditable. We monitor usage rates, error rates, and usage patterns to drive improvements.&lt;/p&gt;
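&lt;p&gt;Recording every invocation is naturally done with a wrapper around each tool handler. This sketch uses an injected &lt;code&gt;sink&lt;/code&gt; in place of the BigQuery insert; the handler shape and field names are illustrative.&lt;/p&gt;

```typescript
// Hypothetical audit-logging wrapper: records who called which tool, when,
// and whether it succeeded, then returns the handler's result. `sink`
// stands in for the BigQuery streaming insert.
function withUsageLog(toolName: string, handler: any, sink: any) {
  return async (user: string, args: any) => {
    const startedAt = Date.now();
    try {
      const result = await handler(args);
      sink({ toolName, user, startedAt, ok: true });
      return result;
    } catch (err) {
      sink({ toolName, user, startedAt, ok: false });
      throw err;
    }
  };
}
```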

&lt;h2&gt;
  
  
  Infrastructure: GCloud / AWS
&lt;/h2&gt;

&lt;p&gt;Have you ever wanted to let AI investigate your cloud environment? And simultaneously thought: &lt;strong&gt;"Is it safe to let it do that?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In my case, I have admin-level privileges, which makes it even scarier. So I built &lt;strong&gt;MCP servers that are physically incapable of writing anything&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Two key design decisions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;OIDC / STS / Impersonate for secure auth&lt;/strong&gt; — Zero persistent credentials&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-account audit logging&lt;/strong&gt; — Individual email addresses recorded in GCP Audit Log / CloudTrail&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  GCloud MCP
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code → MCP Server → gcloud CLI subprocess → GCP APIs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Runs &lt;code&gt;gcloud&lt;/code&gt; CLI on Cloud Run. The key point: &lt;strong&gt;writes are made impossible at the OAuth scope level&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OAuth scope: &lt;code&gt;cloud-platform.read-only&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;GCP APIs check &lt;strong&gt;both&lt;/strong&gt; scope and IAM — even admin users cannot write&lt;/li&gt;
&lt;li&gt;GCP Audit Log records the user's email address&lt;/li&gt;
&lt;li&gt;Account revocation on departure: just disable the Google Workspace account
&lt;/li&gt;
&lt;/ul&gt;
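&lt;p&gt;Concretely, this amounts to spawning &lt;code&gt;gcloud&lt;/code&gt; with the user's read-only token. &lt;code&gt;CLOUDSDK_AUTH_ACCESS_TOKEN&lt;/code&gt; is gcloud's real mechanism for supplying an access token via the environment; the invocation-builder shape below is an assumption about the server's internals.&lt;/p&gt;

```typescript
// Sketch: build the command and environment for a read-only gcloud call.
// CLOUDSDK_AUTH_ACCESS_TOKEN is how gcloud accepts an externally supplied
// OAuth access token; everything else here is illustrative.
function buildGcloudInvocation(gcloudArgs: string[], userAccessToken: string) {
  return {
    command: "gcloud",
    args: [...gcloudArgs, "--format=json"],
    env: { CLOUDSDK_AUTH_ACCESS_TOKEN: userAccessToken },
  };
}
```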

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# What you can do&lt;/span&gt;
"Show me the Cloud Run services in prod"
"Check the env vars for this service"
"List the Secret Manager secrets"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  AWS MCP
&lt;/h3&gt;

&lt;p&gt;Same philosophy, but AWS can't accept Google OAuth directly, so we use STS as a bridge.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code → MCP Server → GCP metadata → ID Token
                         → AWS STS AssumeRoleWithWebIdentity → temp credentials
                         → aws CLI subprocess → AWS APIs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Two layers of safety&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;IAM Role with &lt;code&gt;ReadOnlyAccess&lt;/code&gt; policy only&lt;/li&gt;
&lt;li&gt;Temporary credentials with 1-hour expiry&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Supports multiple AWS accounts via &lt;code&gt;profile&lt;/code&gt; parameter. CloudTrail records &lt;code&gt;assumed-role/mcp-aws-readonly/user@example.com&lt;/code&gt;.&lt;/p&gt;
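&lt;p&gt;Because the credentials expire after an hour, the server caches them and re-assumes the role shortly before expiry rather than calling STS on every tool invocation. A sketch of that refresh decision (the 5-minute margin is an assumption, not a documented value):&lt;/p&gt;

```python
from datetime import datetime, timedelta

REFRESH_MARGIN = timedelta(minutes=5)  # assumed margin, not a documented value

def needs_refresh(now, expires_at):
    """True when cached STS credentials are within the refresh margin.

    The server only calls AssumeRoleWithWebIdentity again when close to
    the 1-hour expiry, instead of on every tool invocation.
    """
    remaining = expires_at - now
    # remaining is at most REFRESH_MARGIN exactly when max(...) returns the margin
    return max(remaining, REFRESH_MARGIN) == REFRESH_MARGIN
```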

&lt;h2&gt;
  
  
  Docs &amp;amp; Knowledge: GWS / Git Server
&lt;/h2&gt;

&lt;h3&gt;
  
  
  GWS (Google Workspace) MCP
&lt;/h3&gt;

&lt;p&gt;Operate &lt;strong&gt;all Google Workspace services&lt;/strong&gt; from Claude Code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code → MCP Server → gws CLI subprocess → Google Workspace APIs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Runs &lt;a href="https://github.com/nicholasgasior/gws" rel="noopener noreferrer"&gt;gws CLI&lt;/a&gt; remotely, passing the user's OAuth access token directly. &lt;strong&gt;Each user accesses resources with their own permissions&lt;/strong&gt; — you can see your Drive but not someone else's.&lt;/p&gt;

&lt;p&gt;Since OAuth authentication and Google Workspace authorization happen simultaneously, &lt;strong&gt;the moment you connect to the MCP you have immediate access to your Workspace resources&lt;/strong&gt;. No additional login or token setup required — the experience is seamless.&lt;br&gt;
&lt;/p&gt;
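&lt;p&gt;In other words, the proxy holds no service credential of its own; it forwards the caller's OAuth access token into each CLI invocation. A minimal sketch (the &lt;code&gt;GWS_ACCESS_TOKEN&lt;/code&gt; variable name is hypothetical; check the gws CLI docs for the real token-passing mechanism):&lt;/p&gt;

```python
def build_gws_invocation(argv, user_access_token):
    """Build the argv and environment for one per-user gws CLI call.

    NOTE: "GWS_ACCESS_TOKEN" is a hypothetical variable name; the point is
    that the caller's own token, not a shared credential, is injected.
    The result is meant for subprocess.run(cmd, env=env) with no shell.
    """
    env = {"GWS_ACCESS_TOKEN": user_access_token}
    cmd = ["gws"] + list(argv)
    return cmd, env
```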

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# What you can do&lt;/span&gt;
"Summarize the sales data in this spreadsheet"
"Extract meeting notes from last week's calendar"
"Summarize this document"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Git Server MCP
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;read-only&lt;/strong&gt; server for all company Git repositories.&lt;/p&gt;

&lt;p&gt;The motivation: &lt;strong&gt;bypassing GitHub MCP rate limits&lt;/strong&gt;. GitHub's official MCP server hits the GitHub API under the hood, and the rate limit kicks in surprisingly fast when AI is investigating a codebase.&lt;/p&gt;

&lt;p&gt;Git Server MCP keeps main-branch clones of all repos on a GCE VM, operating via &lt;strong&gt;local git commands with zero rate limiting&lt;/strong&gt;. Query as much as you want.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git_blame&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Last change commit per line&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git_log&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Commit history&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git_grep&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cross-repo text search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git_show&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Commit details&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git_diff&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Diff between commits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;read_file&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Read file contents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;list_files&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List directory contents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;search_repos&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Search repositories&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;No GitHub account needed — OAuth authentication is sufficient.&lt;/p&gt;
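&lt;p&gt;Each tool above is a thin wrapper that turns validated parameters into a local &lt;code&gt;git&lt;/code&gt; argv (no shell, no network, hence no rate limits). A sketch of how &lt;code&gt;git_grep&lt;/code&gt; might build its command (the clone path and option set are assumptions):&lt;/p&gt;

```python
REPOS_ROOT = "/srv/repos"  # assumed clone location on the GCE VM

def git_grep_argv(repo, pattern):
    """Build the local git argv behind the git_grep tool."""
    if "/" in repo or repo.startswith("."):
        raise ValueError("invalid repo name")  # block path traversal
    # -n: line numbers, -I: skip binary files; run via subprocess, no shell
    return ["git", "-C", f"{REPOS_ROOT}/{repo}", "grep", "-n", "-I", pattern]
```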

&lt;h2&gt;
  
  
  Observability: Grafana MCP
&lt;/h2&gt;

&lt;p&gt;The official &lt;code&gt;mcp/grafana&lt;/code&gt; Docker image deployed on Cloud Run, with an OAuth proxy in front.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code → OAuth Proxy → mcp-grafana → Grafana Cloud
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Supports PromQL/LogQL queries, dashboard inspection, and alert rule review.&lt;/p&gt;

&lt;p&gt;Crucially, Grafana dashboards and alert rules are defined as &lt;strong&gt;Pulumi (TypeScript)&lt;/strong&gt; code in the same repository as the application. This means:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write application code&lt;/li&gt;
&lt;li&gt;Define alert rules in the same repo&lt;/li&gt;
&lt;li&gt;Alert fires in production&lt;/li&gt;
&lt;li&gt;Claude Code reads logs via Grafana MCP&lt;/li&gt;
&lt;li&gt;Fix the code in the same repo&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;strong&gt;code → infra → observability → investigation → fix&lt;/strong&gt; loop is completely closed.&lt;/p&gt;

&lt;h2&gt;
  
  
  CI/CD: CircleCI MCP
&lt;/h2&gt;

&lt;p&gt;Integrates with CircleCI API v2. A shared CircleCI token sits behind Google SSO, so the whole team can use it without managing individual tokens.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code → OAuth Proxy → CircleCI MCP (sidecar) → CircleCI API v2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cloud Run multi-container setup: the official &lt;code&gt;@circleci/mcp-server-circleci&lt;/code&gt; runs as a sidecar, with our OAuth proxy in front.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# What you can do&lt;/span&gt;
"What's the status of the latest pipeline on main?"
"Show me the failure logs for this build"
"Find flaky tests"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Project Management MCP
&lt;/h2&gt;

&lt;p&gt;A server for managing issues in Firestore and semantically searching Slack/Meet conversations.&lt;/p&gt;

&lt;p&gt;Key capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Issue management&lt;/strong&gt;: Create, update status, and list Issues in Firestore (with spreadsheet dual-write)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context search&lt;/strong&gt;: &lt;strong&gt;Vector search + Gemini summarization&lt;/strong&gt; across Meet notes and Slack conversations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Project overview&lt;/strong&gt;: View milestones, members, design docs, and test cases for your projects&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backlog integration&lt;/strong&gt;: Retrieve ticket parent-child relationships via BQ&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Domain-Specific
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Stylist Insights / UX Insights MCP
&lt;/h3&gt;

&lt;p&gt;Servers providing access to stylist performance/KPI data and UX analytics, respectively. Query interfaces over BQ aggregate tables.&lt;/p&gt;

&lt;h3&gt;
  
  
  freee MCP
&lt;/h3&gt;

&lt;p&gt;An OAuth-authenticated proxy to the freee API for accounting data access.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dev Platform: Workspace / Sandbox
&lt;/h2&gt;

&lt;p&gt;This might be the most distinctive part.&lt;/p&gt;

&lt;h3&gt;
  
  
  Workspace MCP — Code Editing Without a GitHub Account
&lt;/h3&gt;

&lt;p&gt;Provides &lt;strong&gt;ACL-gated file editing, commits, PR creation, and deployment&lt;/strong&gt; for our internal monorepo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No GitHub account required&lt;/strong&gt;. Only a Google Workspace account (OAuth) is needed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. workspace_init          → Create worktree, initialize branch
2. workspace_write_file    → Edit code
3. workspace_diff          → Review changes
4. workspace_commit        → Commit
5. workspace_push          → Push to GitHub
6. workspace_deploy        → Deploy from feature branch (test)
7. Verify it works
8. workspace_create_pr     → Request review
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Access control is managed in Firestore. Admins configure &lt;strong&gt;which stacks (directories) each user can edit and deploy&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"allowedPaths"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"apps/web/xxx/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"apps/api/xxx/"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"allowedStacks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"api-xxx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pages-xxx"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"developer"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Non-engineers can &lt;strong&gt;safely edit and deploy only the stacks they're authorized for&lt;/strong&gt;. In practice, a non-engineer team member is already using AI + Workspace MCP to improve a KPI dashboard built from scratch.&lt;/p&gt;
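&lt;p&gt;The ACL document translates into a simple prefix check on every write. A sketch (field names follow the example above; the helper itself is illustrative):&lt;/p&gt;

```python
def can_edit(acl, file_path):
    """True if the user's Firestore ACL allows editing file_path.

    acl is shaped like the example document above, e.g.
    {"allowedPaths": ["apps/web/xxx/"], "role": "developer"}
    """
    return any(file_path.startswith(prefix) for prefix in acl.get("allowedPaths", []))
```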

&lt;h3&gt;
  
  
  Sandbox MCP — App Deployment for Non-Engineers
&lt;/h3&gt;

&lt;p&gt;Going even further: &lt;strong&gt;non-engineers can deploy their own apps for internal use&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. sandbox_init_repo(app_name: "my-tool")    → Initialize repo
2. sandbox_write_file(...)                    → Write files
3. sandbox_publish(app_name: "my-tool")       → Deploy to Cloud Run
   → https://sbx-{nickname}--my-tool.example.com/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No gcloud, no Docker. Just tell Claude "I want a tool that does X" and it's published on an internal URL.&lt;/p&gt;
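&lt;p&gt;The published hostname is derived from the user's nickname and the app name, so every sandbox app lands on a predictable per-user subdomain. A sketch of the naming (the domain is the placeholder from the flow above):&lt;/p&gt;

```python
def sandbox_url(nickname, app_name):
    """Derive the internal URL for a published sandbox app."""
    # Pattern from the flow above: sbx-{nickname}--{app} on the internal domain
    return f"https://sbx-{nickname}--{app_name}.example.com/"
```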

&lt;p&gt;Deployed apps are protected by &lt;strong&gt;Cloudflare Access with Google Workspace authentication&lt;/strong&gt;, so only internal members can access them. Even though they're on the public internet, access from outside the organization is impossible.&lt;/p&gt;

&lt;p&gt;I wrote a &lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/bridging-i-want-to-build-and-i-want-to-publish-safely-for-non-engineers-sandbox-mcp-392a"&gt;detailed article&lt;/a&gt; about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Graph Servers: Code Graph / Product Graph / Biz Graph
&lt;/h2&gt;

&lt;p&gt;A family of servers that analyze codebases and business logic as graph structures.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Server&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;Key Feature&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DB Graph&lt;/td&gt;
&lt;td&gt;Company-wide DBs (&lt;a href="https://zenn.dev/aircloset/articles/2731787582881a" rel="noopener noreferrer"&gt;previous article&lt;/a&gt;)&lt;/td&gt;
&lt;td&gt;Table dictionary + semantic search + live DB queries + PII anonymization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code Graph&lt;/td&gt;
&lt;td&gt;All source code (cross-repository)&lt;/td&gt;
&lt;td&gt;Static analysis tracking function → API → DB → event dependencies across repos&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Product Graph&lt;/td&gt;
&lt;td&gt;Internal monorepo&lt;/td&gt;
&lt;td&gt;Unified knowledge graph of code + DB + docs. Every node has business context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Biz Graph&lt;/td&gt;
&lt;td&gt;Business initiatives &amp;amp; metrics&lt;/td&gt;
&lt;td&gt;Initiative × metric relationship graph&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each has a different design philosophy and solves different problems. See the previous article for DB Graph; details on the others are coming in future posts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security Model
&lt;/h2&gt;

&lt;p&gt;Here's the security approach shared across all servers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Defense in Depth
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Layer 1: Google Workspace OAuth + domain restriction
  → Organization domain only. External users cannot log in.

Layer 2: SSO + session management
  → Upstash Redis, 7-day TTL, sliding window

Layer 3: Per-server scope restrictions
  → GCloud: cloud-platform.read-only
  → AWS: ReadOnlyAccess policy
  → DB Graph: SELECT only + PII anonymization

Layer 4: Data-level protection
  → Automatic PII anonymization (40+ column patterns)
  → Confidential datasets controlled by BQ IAM
  → Production DBs via read replicas only

Layer 5: Audit logging
  → All tool invocations recorded in BQ
  → Individual email in GCP Audit Log / CloudTrail
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
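&lt;p&gt;Layer 2's sliding window means a session stays alive as long as it is used at least once every 7 days; each request pushes the expiry forward, much like resetting a Redis TTL. A sketch of that decision (the helper models the logic only, not the actual Upstash calls):&lt;/p&gt;

```python
from datetime import datetime, timedelta

SESSION_TTL = timedelta(days=7)

def touch_session(last_seen, now):
    """Sliding-window session check.

    Returns (is_active, new_last_seen). Active sessions get their window
    renewed, mirroring a Redis TTL reset on each request. A session that
    has reached or passed its expiry is inactive.
    """
    expires_at = last_seen + SESSION_TTL
    # active exactly when now is strictly before expires_at
    is_active = max(now, expires_at) == expires_at and now != expires_at
    new_last_seen = now if is_active else last_seen
    return is_active, new_last_seen
```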



&lt;h3&gt;
  
  
  Automatic Revocation on Departure
&lt;/h3&gt;

&lt;p&gt;Since every server depends on Google OAuth, &lt;strong&gt;disabling a Google Workspace account instantly revokes access to all MCP servers&lt;/strong&gt;. No individual token revocation or account cleanup needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;p&gt;Lessons learned from building and operating our MCP server fleet:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Centralize authentication&lt;/strong&gt;&lt;br&gt;
Building OAuth as a shared package made adding new servers dramatically easier. Auth code per server is about 10 lines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Start read-only&lt;/strong&gt;&lt;br&gt;
GCloud, AWS, and Git Server are all read-only. Allow reads first; add writes only when truly needed. This keeps security discussions simple.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Wrap existing tools&lt;/strong&gt;&lt;br&gt;
gcloud CLI, aws CLI, gws CLI, CircleCI MCP — put existing CLIs and MCP servers behind an OAuth proxy and the whole team can use them safely. No need to build from scratch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Non-engineer access is the most exciting frontier&lt;/strong&gt;&lt;br&gt;
Workspace MCP and Sandbox MCP provide the foundation for non-engineers to edit code and deploy without a GitHub account. It's still early and the big wins are ahead, but this is where the most potential lies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Keep everything in one repository&lt;/strong&gt;&lt;br&gt;
Application code, infrastructure (Pulumi), observability (Grafana alert rules), MCP servers — all in a single monorepo. This closes the loop: write code → deploy → monitor → find issues → fix.&lt;/p&gt;




&lt;p&gt;In the DB Graph article, I described how the connections between tables existed only in specific people's heads. Looking at the full MCP server fleet, it's clear this problem isn't limited to databases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure state, code dependencies, document contents, project progress, user behavior logs&lt;/strong&gt; — all of these were trapped in people's heads. Eliminating that is the essential role of our MCP server fleet.&lt;/p&gt;

&lt;p&gt;Externalizing knowledge into a form that AI can access. That's the common theme across all our MCP servers.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>mcp</category>
      <category>showdev</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Ryosuke Tsuji</dc:creator>
      <pubDate>Sat, 28 Mar 2026 13:02:45 +0000</pubDate>
      <link>https://dev.to/ryosuke_tsuji_f08e20fdca1/-3f15</link>
      <guid>https://dev.to/ryosuke_tsuji_f08e20fdca1/-3f15</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5" class="crayons-story__hidden-navigation-link"&gt;Democratizing Internal Data — Building an MCP Server That Lets You Search 991 Tables in Natural Language&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/ryosuke_tsuji_f08e20fdca1" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3843591%2F8b126f91-f561-4e6b-8492-814b18d680ec.jpg" alt="ryosuke_tsuji_f08e20fdca1 profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/ryosuke_tsuji_f08e20fdca1" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Ryosuke Tsuji
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Ryosuke Tsuji
                
              
              &lt;div id="story-author-preview-content-3404451" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/ryosuke_tsuji_f08e20fdca1" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3843591%2F8b126f91-f561-4e6b-8492-814b18d680ec.jpg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Ryosuke Tsuji&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Mar 25&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5" id="article-link-3404451"&gt;
          Democratizing Internal Data — Building an MCP Server That Lets You Search 991 Tables in Natural Language
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag crayons-tag--filled  " href="/t/showdev"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;showdev&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/mcp"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;mcp&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/graphrag"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;graphrag&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/exploding-head-daceb38d627e6ae9b730f36a1e390fca556a4289d5a41abb2c35068ad3e2c4b5.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;11&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/ryosuke_tsuji_f08e20fdca1/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              1&lt;span class="hidden s:inline"&gt; comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            14 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>ai</category>
      <category>mcp</category>
      <category>bigquery</category>
      <category>database</category>
    </item>
    <item>
      <title>Democratizing Internal Data — Building an MCP Server That Lets You Search 991 Tables in Natural Language</title>
      <dc:creator>Ryosuke Tsuji</dc:creator>
      <pubDate>Wed, 25 Mar 2026 18:15:40 +0000</pubDate>
      <link>https://dev.to/ryosuke_tsuji_f08e20fdca1/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5</link>
      <guid>https://dev.to/ryosuke_tsuji_f08e20fdca1/democratizing-internal-data-building-an-mcp-server-that-lets-you-search-991-tables-in-natural-1da5</guid>
      <description>&lt;p&gt;Hi, I'm &lt;a href="https://x.com/ryantsuji" rel="noopener noreferrer"&gt;Ryan&lt;/a&gt;, CTO at &lt;a href="https://www.air-closet.com/" rel="noopener noreferrer"&gt;airCloset&lt;/a&gt; — Japan's leading fashion rental subscription service.&lt;/p&gt;

&lt;p&gt;Today I want to share something I'm genuinely proud of: &lt;strong&gt;DB Graph&lt;/strong&gt; and &lt;strong&gt;DB Graph MCP&lt;/strong&gt; — a Model Context Protocol (MCP) server that lets anyone in our company search and query &lt;strong&gt;15 schemas, 991 tables, 11 SQL databases, and 6 MongoDB instances&lt;/strong&gt; using natural language through Claude Code.&lt;/p&gt;

&lt;p&gt;You don't need to know a single table name. Ask "find tables related to returns" and it gives you the answer — across schemas, across database engines. And yes, it can query production data safely.&lt;/p&gt;

&lt;p&gt;In this post, I'll walk through everything: what it does, how it works, the tool design, actual response formats, how we built the graph, how we operate it, and how we handle permissions and security.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Nobody Knows All 991 Tables
&lt;/h2&gt;

&lt;p&gt;airCloset has been running since 2015 — that's 10 years of accumulated database schema.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SQL Databases&lt;/td&gt;
&lt;td&gt;11 (MySQL 8 + PostgreSQL 3)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MongoDB Databases&lt;/td&gt;
&lt;td&gt;6 (DocumentDB 5 + Atlas 1)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schemas&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tables/Collections&lt;/td&gt;
&lt;td&gt;991&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ORMs&lt;/td&gt;
&lt;td&gt;4 (TypeORM, Sequelize, Drizzle, Mongoose)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Repositories&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Nobody in the company knows all of them. Not even close.&lt;/p&gt;

&lt;p&gt;Here's a real scenario. Customer support asks: "This customer's app shows the return as completed, but has the warehouse actually confirmed receiving it?"&lt;/p&gt;

&lt;p&gt;Think about what you need to investigate this.&lt;/p&gt;

&lt;p&gt;The app-side return status lives in the &lt;code&gt;aircloset&lt;/code&gt; schema's delivery order table. If the delivery status is "RETURNED", the app considers it done. Some people might know this much.&lt;/p&gt;

&lt;p&gt;But the &lt;strong&gt;warehouse-side confirmation&lt;/strong&gt; lives in the &lt;code&gt;bridge&lt;/code&gt; schema. A receive record table's status being "COMPLETE" means the warehouse has physically processed the returned package.&lt;/p&gt;

&lt;p&gt;The problem? These two live in &lt;strong&gt;completely separate databases&lt;/strong&gt;. No foreign key connects them. To bridge the gap, there's an intermediate mapping table in &lt;code&gt;aircloset&lt;/code&gt; that holds a warehouse order code (varchar) — which corresponds to a shipping order code in &lt;code&gt;bridge&lt;/code&gt;. No FK, just a varchar match across schemas.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aircloset delivery order table (status = RETURNED)
  ↓ order_id
aircloset warehouse mapping table
  ↓ warehouse_order_code (varchar)
bridge shipping order table (matched by code — no FK!)
  ↓ shipping_order_id
bridge receive record table (status = COMPLETE = warehouse confirmed)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Table names are generalized for this article.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Four tables, two schemas, a foreign-key-less varchar join. &lt;strong&gt;How many people in the company know this path?&lt;/strong&gt; You could count them on one hand. And if they're on vacation, the investigation stalls.&lt;/p&gt;

&lt;p&gt;This is daily life in a 991-table × 15-schema world. It's not just "I don't know the table name." It's that &lt;strong&gt;the connections between schemas exist only in specific people's heads&lt;/strong&gt;. That was the real problem.&lt;/p&gt;
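&lt;p&gt;Because there is no foreign key, the join has to happen in application code (or in an investigator's head). A sketch of following the path by the shared varchar code (names follow the generalized diagram above; the helper is illustrative):&lt;/p&gt;

```python
def warehouse_confirmed(mappings, shipping_orders, receive_records, order_id):
    """True if the warehouse confirmed the return for an app-side order.

    Follows the FK-less path from the diagram: a mapping row in aircloset
    holds a varchar warehouse_order_code equal to shipping_orders.code in
    bridge; receive_records then links by shipping_order_id.
    """
    codes = [m["warehouse_order_code"] for m in mappings if m["order_id"] == order_id]
    so_ids = [s["id"] for s in shipping_orders if s["code"] in codes]
    return any(
        r["shipping_order_id"] in so_ids and r["status"] == "COMPLETE"
        for r in receive_records
    )
```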

&lt;h2&gt;
  
  
  DB Graph MCP — The Big Picture
&lt;/h2&gt;

&lt;p&gt;This is what we built to solve it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jtx7gjc1tf5w5l22oz6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jtx7gjc1tf5w5l22oz6.png" alt="System Overview" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Four components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;DB Dictionary Graph Builder&lt;/strong&gt; — A daily batch job that parses ORM definitions from 28 repositories and stores table/column/relationship info as a graph in BigQuery&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DB Dictionary Review UI&lt;/strong&gt; — A web app where humans verify AI-generated descriptions, mark deprecated columns, and add annotations. Review data survives daily rebuilds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DB Graph MCP Server&lt;/strong&gt; — An MCP server (Cloud Run) that combines graph search with live DB querying&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DB Account Pipeline&lt;/strong&gt; — Fully automated DB access provisioning: application → approval → account creation → notification&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Seeing It in Action
&lt;/h2&gt;

&lt;p&gt;Let's solve the return investigation from above using DB Graph MCP.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Tool response examples below use generalized table/column names. The response format reflects actual output.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Step 1: Natural Language Table Search
&lt;/h3&gt;

&lt;p&gt;Ask Claude Code: "Find tables related to return processing confirmation." Under the hood, &lt;code&gt;search_tables&lt;/code&gt; runs a semantic search.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; search_tables(query: "return processing confirmation", search_type: "semantic")

5 tables found (by vector similarity):

bridge.return_packages (postgresql) (distance: 0.2557)
bridge.receive_records (postgresql) (distance: 0.2720)
cella.receive_confirmation_results (mysql) (distance: 0.2921)
bridge.receive_record_details (postgresql) (distance: 0.2951)
aircloset.return_status_change_histories (mysql) (distance: 0.3170)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A single search returns tables across &lt;strong&gt;three schemas (bridge, cella, aircloset)&lt;/strong&gt;. The table name "receive_records" doesn't contain the word "return" — but the AI-generated description includes "rental return processing" and "warehouse receiving", so it matches semantically.&lt;/p&gt;
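&lt;p&gt;The distances are vector distances between the query embedding and each table description's embedding; lower means more similar. A sketch of the ranking step (embeddings assumed precomputed; cosine distance is an assumption about the metric):&lt;/p&gt;

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def rank_tables(query_vec, tables, top_k=5):
    """tables: list of (name, embedding). Returns top_k names, nearest first."""
    scored = [(cosine_distance(query_vec, emb), name) for name, emb in tables]
    return [name for _, name in sorted(scored)[:top_k]]
```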

&lt;h3&gt;
  
  
  Step 2: Table Detail
&lt;/h3&gt;

&lt;p&gt;The second hit in &lt;code&gt;bridge&lt;/code&gt; looks promising. Let's get the details.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gt"&gt;&amp;gt; get_table_detail(table_name: "bridge.receive_records")&lt;/span&gt;

&lt;span class="gh"&gt;# bridge.receive_records&lt;/span&gt;
DB: POSTGRESQL / ORM: typeorm / Repository: bridge-api

&lt;span class="gu"&gt;## Columns (9)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; id: int [PK, AI, NOT NULL]
&lt;span class="p"&gt;-&lt;/span&gt; code: varchar [NOT NULL]
&lt;span class="p"&gt;-&lt;/span&gt; shipping_order_id: varchar [NOT NULL]
&lt;span class="p"&gt;-&lt;/span&gt; status: enum [NOT NULL, default=IN_PROGRESS]
&lt;span class="p"&gt;-&lt;/span&gt; type: enum [NOT NULL]
&lt;span class="p"&gt;-&lt;/span&gt; receive_datetime: varchar [NOT NULL]
&lt;span class="p"&gt;-&lt;/span&gt; operated_by: varchar [NOT NULL]
&lt;span class="p"&gt;-&lt;/span&gt; created_at / updated_at: datetime

&lt;span class="gu"&gt;## References (2)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; shipping_order_id → bridge.shipping_orders.id (explicit)
&lt;span class="p"&gt;-&lt;/span&gt; operated_by → bridge.users.id (explicit)

&lt;span class="gu"&gt;## Referenced By (1)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; bridge.receive_record_details.record_id → id (explicit)

&lt;span class="gu"&gt;## Enum Definitions (2)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Status: COMPLETE=Received, IN_PROGRESS=Processing
&lt;span class="p"&gt;-&lt;/span&gt; Type: RENTAL_RETURN=Rental return, BUSINESS_RETURN=Business return,
        RENTAL_RETURN_LACK=Rental return (missing items), BUSINESS_RETURN_LACK=Business return (missing items)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;status = COMPLETE&lt;/code&gt; means "the warehouse has finished receiving."&lt;/strong&gt; Exactly what we needed. Plus &lt;code&gt;type = RENTAL_RETURN&lt;/code&gt; distinguishes rental returns from business returns. Enum definitions with human-readable labels — visible at a glance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Discovering the Cross-Schema Path
&lt;/h3&gt;

&lt;p&gt;Now the question: how do we connect the &lt;code&gt;aircloset&lt;/code&gt; delivery order (app side) to the &lt;code&gt;bridge&lt;/code&gt; receive record (warehouse side)? Let's use &lt;code&gt;trace_relationships&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; trace_relationships(table_name: "bridge.shipping_orders", direction: "both", max_depth: 1)

# Relationship trace: bridge.shipping_orders
Nodes: 23, Edges: 22

## Relationships (excerpt)
- shipping_orders.shop_id → shops.id (explicit)
- shipping_orders.warehouse_id → warehouses.id (explicit)
- receive_records.shipping_order_id → shipping_orders.id (explicit)     ← warehouse confirmation!
- return_packages.shipping_order_id → shipping_orders.id (explicit)     ← return shipment
- shipping_packages.shipping_order_id → shipping_orders.id (explicit)   ← outbound shipment
- shipping_inspections.shipping_order_id → shipping_orders.id (explicit) ← inspection
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Found the path from &lt;code&gt;bridge.shipping_orders&lt;/code&gt; to &lt;code&gt;receive_records&lt;/code&gt;. Next, we find the mapping table connecting &lt;code&gt;aircloset&lt;/code&gt; and &lt;code&gt;bridge&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; search_tables(query: "warehouse_mapping", search_type: "table", adjacent_depth: 1)

aircloset.warehouse_shipping_relations (mysql)

### Related Tables
  → aircloset.delivery_orders (order_id → id)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; get_table_detail(table_name: "aircloset.warehouse_shipping_relations")

## Columns (4)
- order_id: int [PK, NOT NULL]              ← aircloset delivery order ID
- warehouse_order_code: varchar [NOT NULL]   ← bridge shipping order code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Found it.&lt;/strong&gt; &lt;code&gt;order_id&lt;/code&gt; links to the aircloset side, &lt;code&gt;warehouse_order_code&lt;/code&gt; links to the bridge side. There's no foreign key; this varchar code is the only key connecting the two schemas.&lt;/p&gt;
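&lt;p&gt;In application terms, the chain is just two keyed lookups. Here's a minimal Python sketch (the row data and helper name are hypothetical; only the join-key logic follows the article):&lt;/p&gt;

```python
def find_receive_record(order_id, relations, receive_records):
    """Follow aircloset.warehouse_shipping_relations to the bridge side.

    relations:       {order_id: warehouse_order_code}  (aircloset, no FK)
    receive_records: {shipping_order_code: record}     (bridge)
    """
    code = relations.get(order_id)      # aircloset delivery order -> varchar code
    if code is None:
        return None
    return receive_records.get(code)    # varchar code -> bridge receive record


relations = {98765: "SO-2026-00012345"}
receive_records = {
    "SO-2026-00012345": {"receive_status": "COMPLETE", "type": "RENTAL_RETURN"},
}
record = find_receive_record(98765, relations, receive_records)
```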

&lt;h3&gt;
  
  
  Step 4: Querying Real Data
&lt;/h3&gt;

&lt;p&gt;Now we build cross-schema queries. First, get the delivery order and warehouse code from &lt;code&gt;aircloset&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gt"&gt;&amp;gt; sql_query_database(database: "aircloset", sql: "SELECT ... WHERE user_id = 12345 AND status = 'RETURNED'")&lt;/span&gt;

&lt;span class="gs"&gt;**aircloset**&lt;/span&gt; (staging) — 1 row

| id     | status   | returned_date       | warehouse_order_code |
|--------|----------|---------------------|----------------------|
| 98765  | RETURNED | 2026-03-20 10:30:00 | SO-2026-00012345     |
&lt;span class="gt"&gt;
&amp;gt; **Table**: Manages the full lifecycle of delivery orders — styling → shipping → return status tracking&lt;/span&gt;

&lt;span class="gu"&gt;### Column Descriptions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**status**&lt;/span&gt;: Delivery status (1=Awaiting shipment, 2=Ready, 3=Delivered, 4=Returned, 5=Cancelled)
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**returned_date**&lt;/span&gt;: Date/time the warehouse received the customer's return
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**warehouse_order_code**&lt;/span&gt;: Mapping code to bridge shipping order

&lt;span class="gu"&gt;### Related Tables&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; → &lt;span class="gs"&gt;**aircloset.users**&lt;/span&gt; (user_id → id): Customer profile...
&lt;span class="p"&gt;-&lt;/span&gt; → &lt;span class="gs"&gt;**aircloset.plans**&lt;/span&gt; (plan_id → id): Subscription plan definitions...
&lt;span class="p"&gt;-&lt;/span&gt; ← &lt;span class="gs"&gt;**aircloset.styling_feedbacks**&lt;/span&gt; (delivery_id → id): Customer feedback on styling...
&lt;span class="p"&gt;-&lt;/span&gt; ← &lt;span class="gs"&gt;**aircloset.rental_items**&lt;/span&gt; (delivery_id → id): Items in this order...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that &lt;strong&gt;column descriptions and related tables are automatically appended below the query result&lt;/strong&gt;. This metadata is pulled from the graph data cached in Redis (cache-invalidated on graph updates). AI can read this enrichment to determine its next step — like "use the warehouse code to query &lt;code&gt;bridge&lt;/code&gt;."&lt;/p&gt;

&lt;p&gt;Now check the warehouse side:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gt"&gt;&amp;gt; sql_query_database(database: "bridge", sql: "SELECT ... WHERE code = 'SO-2026-00012345'")&lt;/span&gt;

&lt;span class="gs"&gt;**bridge**&lt;/span&gt; (staging) — 1 row

| code             | status  | receive_status | type          | receive_datetime    |
|------------------|---------|---------------|---------------|---------------------|
| SO-2026-00012345 | SHIPPED | COMPLETE      | RENTAL_RETURN | 2026-03-21 14:22:00 |
&lt;span class="gt"&gt;
&amp;gt; **Table**: Records warehouse receiving operations — arrival confirmation and inspection status&lt;/span&gt;

&lt;span class="gu"&gt;### Column Descriptions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**status**&lt;/span&gt;: Shipping order status (ORDERED→ALLOCATED→PICKED→INSPECTED→SHIPPED→CANCELED)
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**receive_status**&lt;/span&gt;: Receive status (IN_PROGRESS=Processing, COMPLETE=Received)
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**type**&lt;/span&gt;: Receive type (RENTAL_RETURN=Rental return, BUSINESS_RETURN=Business return)

&lt;span class="gu"&gt;### Related Tables&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; → &lt;span class="gs"&gt;**bridge.warehouses**&lt;/span&gt; (warehouse_id → id): Source warehouse...
&lt;span class="p"&gt;-&lt;/span&gt; → &lt;span class="gs"&gt;**bridge.shops**&lt;/span&gt; (shop_id → id): Source shop...
&lt;span class="p"&gt;-&lt;/span&gt; ← &lt;span class="gs"&gt;**bridge.receive_record_details**&lt;/span&gt; (record_id → id): Individual item details...
&lt;span class="p"&gt;-&lt;/span&gt; ← &lt;span class="gs"&gt;**bridge.shipping_packages**&lt;/span&gt; (order_id → id): Outbound package info...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;receive_status = COMPLETE&lt;/code&gt; — the warehouse has confirmed receipt.&lt;/strong&gt; Both the app-side return status and the warehouse-side physical confirmation are verified.&lt;/p&gt;
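&lt;p&gt;The final check combines both sides. As a hedged sketch, the predicate the AI effectively evaluates looks like this (field and enum values come from the outputs above; the function itself is illustrative):&lt;/p&gt;

```python
def return_is_confirmed(app_row, warehouse_row):
    """True only when the app marks the order returned AND the warehouse
    has physically confirmed receipt of a rental return."""
    return (
        app_row["status"] == "RETURNED"
        and warehouse_row["receive_status"] == "COMPLETE"
        and warehouse_row["type"] in ("RENTAL_RETURN", "RENTAL_RETURN_LACK")
    )
```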

&lt;p&gt;This enrichment is the key to AI-powered investigation. Claude Code reads the column descriptions and related tables to autonomously decide "what to query next" and "how to interpret these values." No human guidance needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Beyond Operations: Cross-Service Analytics
&lt;/h3&gt;

&lt;p&gt;This isn't limited to operational investigations. &lt;strong&gt;It works for business analytics too.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Try asking Claude Code:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How many customers used our spot rental service last week, what percentage of them are airCloset monthly subscribers, and how frequently do those subscribers use the main service?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Answering this requires crossing the spot rental order table (&lt;code&gt;spot_rental&lt;/code&gt; schema) with the main service's member and usage tables (&lt;code&gt;aircloset&lt;/code&gt; schema).&lt;/p&gt;

&lt;p&gt;Claude Code uses DB Graph MCP to identify the relevant tables via &lt;code&gt;search_tables&lt;/code&gt;, discover join keys via &lt;code&gt;trace_relationships&lt;/code&gt;, and run queries against both databases to produce the aggregated result. &lt;strong&gt;Cross-service analytics from a single natural language question&lt;/strong&gt; — that's the core value.&lt;/p&gt;
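&lt;p&gt;Once the rows come back from both databases, the aggregation itself is simple set arithmetic. An illustrative sketch with made-up IDs:&lt;/p&gt;

```python
# IDs below are fabricated for illustration.
spot_customers = {101, 102, 103, 104}   # spot_rental schema, orders last week
subscribers = {102, 104, 105}           # aircloset schema, active monthly members

overlap = spot_customers & subscribers          # spot users who also subscribe
subscriber_share = len(overlap) / len(spot_customers)
```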

&lt;h3&gt;
  
  
  Without DB Graph MCP
&lt;/h3&gt;

&lt;p&gt;Imagine doing these investigations without any tooling:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Return confirmation:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You need to know the delivery order table exists in &lt;code&gt;aircloset&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;You need to know about the warehouse mapping table that bridges schemas&lt;/li&gt;
&lt;li&gt;You need to know that a varchar warehouse code maps to &lt;code&gt;bridge&lt;/code&gt;'s shipping code&lt;/li&gt;
&lt;li&gt;You need to know that &lt;code&gt;bridge&lt;/code&gt;'s receive record table is the warehouse confirmation&lt;/li&gt;
&lt;li&gt;You need to know what enum values like COMPLETE and RENTAL_RETURN mean&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Cross-service analytics:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You need to know the spot rental DB schema name and table structure&lt;/li&gt;
&lt;li&gt;You need to know the join key to the main service's member table&lt;/li&gt;
&lt;li&gt;You need connection credentials for both databases&lt;/li&gt;
&lt;li&gt;You need to correctly interpret member statuses and usage counts&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In both cases, the required knowledge spans multiple services and schemas. Probably fewer than five people hold all of it in their heads. With DB Graph MCP, &lt;strong&gt;anyone can get there&lt;/strong&gt; through natural language search → table detail → relationship tracing → live queries.&lt;/p&gt;

&lt;p&gt;Now let's dive into &lt;em&gt;how&lt;/em&gt; this works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool Design: 7 Tools in 3 Categories
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Dictionary Tools (no DB credentials required)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;search_tables&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Name search + vector similarity search across tables/columns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;get_table_detail&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Full table info: columns, FKs, enums, DEAD annotations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;trace_relationships&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;BFS traversal of table relationships&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Dictionary tools read pre-built graph data from BigQuery — &lt;strong&gt;no individual DB credentials needed&lt;/strong&gt;. Anyone with a Google OAuth login can use them immediately, with no access request.&lt;/p&gt;

&lt;h3&gt;
  
  
  Query Tools (DB credentials required)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;list_databases&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List databases you have access to&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sql_query_database&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Execute SELECT queries against MySQL/PostgreSQL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;describe_database_table&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Get live schema from actual DB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mongo_query_database&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Execute find/aggregate against DocumentDB/Atlas&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Query tools use per-user credentials stored in Firestore. You only see databases you've been granted access to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This separation is intentional.&lt;/strong&gt; The dictionary is open to everyone; data access is permission-controlled. "Everyone should know what tables exist, but accessing the data requires authorization."&lt;/p&gt;
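&lt;p&gt;A minimal sketch of that gating rule (tool names come from the tables above; the shape of the grant check is an assumption):&lt;/p&gt;

```python
DICTIONARY_TOOLS = {"search_tables", "get_table_detail", "trace_relationships"}

def authorize(tool, database, grants):
    """grants: set of databases this user may query (per-user, from Firestore)."""
    if tool in DICTIONARY_TOOLS:
        return True                  # the dictionary is open to everyone
    return database in grants        # query tools need an explicit grant
```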

&lt;h2&gt;
  
  
  Why BigQuery? — Technology Choices
&lt;/h2&gt;

&lt;p&gt;We use BigQuery as the graph store. "Shouldn't a graph live in a dedicated graph DB like Neo4j?" you might ask.&lt;/p&gt;

&lt;p&gt;We chose BigQuery because &lt;strong&gt;one store handles graph + vector search + analytics&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;VECTOR_SEARCH&lt;/strong&gt;: Store 768-dimensional embeddings and run cosine similarity search natively. No separate vector DB needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph traversal&lt;/strong&gt;: Node + edge table design enables BFS traversal through simple recursive JOINs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSON type&lt;/strong&gt;: &lt;code&gt;JSON_SET&lt;/code&gt; on a properties column lets us flexibly append review data without schema changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serverless&lt;/strong&gt;: No instance management. Pay only for queries, not idle time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vertex AI integration&lt;/strong&gt;: Gemini 3 Flash for description generation and embedding models connect seamlessly within GCP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Workspace integration&lt;/strong&gt;: OAuth uses Google Accounts directly. Domain restriction, nickname resolution, and permission management all flow through the same identity — no separate IdP needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A dedicated graph DB like Neo4j has superior traversal performance, but at 991 tables, BigQuery is more than sufficient. The operational simplicity of "vector search, JSON, analytics, and graph all in one place" far outweighs the performance difference.&lt;/p&gt;
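&lt;p&gt;The node + edge table design means traversal needs nothing graph-specific. Here's a Python equivalent of the BFS that &lt;code&gt;trace_relationships&lt;/code&gt; performs over edge rows (the edge data is illustrative):&lt;/p&gt;

```python
from collections import deque

def trace(start, edges, max_depth=1):
    """BFS over (source, target) edge rows, following edges in both directions."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen - {start}
```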

&lt;h2&gt;
  
  
  How Natural Language Search Works
&lt;/h2&gt;

&lt;p&gt;How does "return processing confirmation" find a receive records table?&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Generate Table Descriptions
&lt;/h3&gt;

&lt;p&gt;The DB Dictionary Graph Builder runs daily at 6:00 AM JST, generating AI descriptions for each table using Gemini 3 Flash:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Example: bridge.receive_records
→ "Records warehouse receiving operations. Tracks rental returns
   and business returns with completion/in-progress status.
   Links to shipping orders to trace which order a return belongs to."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Generate Embeddings
&lt;/h3&gt;

&lt;p&gt;Each description is converted to a 768-dimensional vector using Vertex AI's embedding model and stored in BigQuery.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: VECTOR_SEARCH
&lt;/h3&gt;

&lt;p&gt;The user's query is also converted to a 768-dimensional vector, then matched via BigQuery's &lt;code&gt;VECTOR_SEARCH&lt;/code&gt; using cosine distance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;qualifiedName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;VECTOR_SEARCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="nv"&gt;`project.db_graph_nodes`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s1"&gt;'embedding'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;top_k&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;distance_type&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'COSINE'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nodeType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Table'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even if "return" doesn't appear in the table name, the AI description's mention of "rental return processing" places it close in vector space. That's the core of natural language search.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building the Graph
&lt;/h2&gt;

&lt;h3&gt;
  
  
  6-Phase Pipeline
&lt;/h3&gt;

&lt;p&gt;The builder runs six phases daily:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jtx7gjc1tf5w5l22oz6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jtx7gjc1tf5w5l22oz6.png" alt="System Overview" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;(See the Builder section of the diagram)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;① ORM Parsing&lt;/strong&gt; — Parse 4 ORM types (TypeORM, Sequelize, Drizzle, Mongoose) across 28 repositories to extract table definitions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;② Live DB Validation&lt;/strong&gt; — Query actual staging DBs via Lambda to compare code definitions against real schemas. Auto-exclude tables that exist in code but not in the database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;③ AI Description&lt;/strong&gt; — Generate table/column descriptions with Gemini 3 Flash. Incremental detection regenerates only changed tables to minimize AI cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;④ Graph Construction&lt;/strong&gt; — Generate 4 node types (Schema/Table/Column/Enum) and 5 edge types (HAS_TABLE/HAS_COLUMN/REFERENCES/USES_ENUM/SAME_ENTITY).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;⑤ Embedding Generation&lt;/strong&gt; — Generate 768-dimensional vectors per table via Vertex AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;⑥ BQ MERGE&lt;/strong&gt; — Load into BigQuery using MERGE, &lt;strong&gt;preserving human-written descriptions and DEAD flags&lt;/strong&gt;. Auto-generated data never overwrites manual annotations.&lt;/p&gt;
&lt;h3&gt;
  
  
  Relationship Confidence Levels
&lt;/h3&gt;

&lt;p&gt;Foreign key detection has varying confidence:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Confidence&lt;/th&gt;
&lt;th&gt;Detection Method&lt;/th&gt;
&lt;th&gt;Reliability&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;explicit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Directly from ORM &lt;code&gt;@JoinColumn()&lt;/code&gt; or &lt;code&gt;belongsTo()&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Certain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;inferred&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Naming convention: &lt;code&gt;xxx_id&lt;/code&gt; → &lt;code&gt;xxx&lt;/code&gt; table&lt;/td&gt;
&lt;td&gt;High probability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;manual&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Added by human reviewers&lt;/td&gt;
&lt;td&gt;Certain&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This lets AI judge the reliability of suggested JOIN conditions before using them.&lt;/p&gt;
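&lt;p&gt;The &lt;code&gt;inferred&lt;/code&gt; level, for instance, can be produced by a naming-convention rule like this sketch (the pluralization handling is a simplifying assumption, not the production logic):&lt;/p&gt;

```python
def infer_reference(column, table_names):
    """xxx_id -> a table named xxx (or its plural), flagged 'inferred'."""
    if not column.endswith("_id"):
        return None
    stem = column[: -len("_id")]
    for candidate in (stem, stem + "s"):
        if candidate in table_names:
            return {"target": candidate, "confidence": "inferred"}
    return None
```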
&lt;h3&gt;
  
  
  SAME_ENTITY Edges
&lt;/h3&gt;

&lt;p&gt;The same logical entity sometimes exists in both SQL and MongoDB — for example, a MySQL users table and a MongoDB user statistics collection both represent the same user. &lt;code&gt;SAME_ENTITY&lt;/code&gt; edges express these cross-engine correspondences, enabling seamless cross-database discovery.&lt;/p&gt;
&lt;h2&gt;
  
  
  Human Review: AI Alone Isn't Enough
&lt;/h2&gt;

&lt;p&gt;"Are AI-generated descriptions actually accurate?" Honestly — not always.&lt;/p&gt;

&lt;p&gt;Gemini 3 Flash produces decent high-level descriptions, but 10 years of business context — "this column was migrated 3 years ago but never dropped from the schema", "enum value 5 is actually never used" — that kind of tacit knowledge is something AI alone can't supply.&lt;/p&gt;

&lt;p&gt;That's why we built &lt;strong&gt;human review into the system from day one&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Review Web UI
&lt;/h3&gt;

&lt;p&gt;We have a dedicated review web app for the DB Dictionary.&lt;/p&gt;

&lt;p&gt;The schema list shows review progress bars. The table list supports filtering by "unchecked", "checked", and "has deprecated items."&lt;/p&gt;

&lt;p&gt;The table detail screen displays columns with type badges, FK targets, and enum definitions — with inline editing for descriptions and deprecation flags.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm73nswk2gk7lt4x2mb0e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm73nswk2gk7lt4x2mb0e.png" alt="Review UI — Table Detail" width="800" height="642"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Review UI: FK targets and enum definitions shown as badges. Descriptions can be edited inline.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Available review actions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Edit table description&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supplement or rewrite the AI-generated description&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Edit column description&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per-column annotations ("deprecated", "use XX instead", etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mark as DEAD&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Deprecation flag + reason + empty percentage, at table or column level&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mark as Checked&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Review completion flag — records who checked and when&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bulk DEAD marking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mark up to 500 tables/columns as deprecated at once&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;
  
  
  DEAD Flags: Surfacing 10 Years of Tacit Knowledge
&lt;/h3&gt;

&lt;p&gt;After 10 years, deprecated columns accumulate. A flag that once represented member type — migrated years ago, now NULL in every row — still sits in the schema.&lt;/p&gt;

&lt;p&gt;When a reviewer marks a column as deprecated, the MCP table detail shows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- old_member_flag: int [NOT NULL, default=0, DEAD] ⚠ Deprecated. Use membership_status instead
- cancel_date: datetime [DEAD] ⚠ All rows NULL
- legacy_import_id: varchar [DEAD] ⚠ Legacy CSV import field. No longer used
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This matters because &lt;strong&gt;it prevents AI from writing code that references the wrong column&lt;/strong&gt;. When Claude Code loads table details into context and sees a DEAD flag, it knows to avoid that column.&lt;/p&gt;

&lt;h3&gt;
  
  
  Change Detection and Diff Review
&lt;/h3&gt;

&lt;p&gt;When the daily build detects changes in table structure or AI descriptions, they're recorded as "pending changes." Reviewers can view before/after diffs in the web UI and mark them as reviewed.&lt;/p&gt;

&lt;p&gt;This ensures nothing slips through — if yesterday's build changed something, someone will see it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Review Data Persistence
&lt;/h3&gt;

&lt;p&gt;Review data is stored in Firestore and &lt;strong&gt;never overwritten by daily builds&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The daily build follows this sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;ORM parsing → graph construction&lt;/strong&gt; — Re-extract table definitions from latest code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BQ MERGE&lt;/strong&gt; — Merge while preserving human-written &lt;code&gt;textForEmbedding&lt;/code&gt; and &lt;code&gt;embedding&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-apply Firestore reviews&lt;/strong&gt; — Write &lt;code&gt;humanDescription&lt;/code&gt;, &lt;code&gt;isDead&lt;/code&gt;, &lt;code&gt;deadNote&lt;/code&gt;, &lt;code&gt;checkedAt&lt;/code&gt; back to BQ properties&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Reviews survive unlimited daily rebuild cycles.&lt;/strong&gt; Firestore is the source of truth; BQ is its reflection.&lt;/p&gt;
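&lt;p&gt;The merge rule in step 3 can be sketched as: generated fields refresh on every build, review fields always win. The field names follow the article; the function shape is illustrative:&lt;/p&gt;

```python
REVIEW_FIELDS = ("humanDescription", "isDead", "deadNote", "checkedAt")

def merge_node(generated, review):
    """Daily-build merge: start from the freshly generated node, then
    overlay any review fields a human has set (None means unset)."""
    merged = dict(generated)
    for key in REVIEW_FIELDS:
        if review.get(key) is not None:
            merged[key] = review[key]
    return merged
```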

&lt;h2&gt;
  
  
  Crossing the VPC Wall: Cross-Cloud Architecture
&lt;/h2&gt;

&lt;p&gt;Now for the security design I'm most proud of.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; The MCP server runs on Google Cloud (Cloud Run). The databases are inside AWS VPCs. Cloud Run can't directly reach VPC-internal RDS/DocumentDB instances.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; A three-stage authentication chain — GCP OIDC → AWS STS → VPC Lambda — enables secure cross-cloud connectivity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo8yi9z4popag67qiz86t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo8yi9z4popag67qiz86t.png" alt="Query Dataflow" width="800" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Authentication Flow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Cloud Run (GCP) → Get OIDC token from GCP metadata server
2. OIDC token → AWS STS AssumeRoleWithWebIdentity
3. STS → Return temporary AWS credentials (1-hour TTL)
4. Temporary credentials → Invoke VPC-internal Lambda
5. Lambda → Execute query against VPC-internal RDS/DocumentDB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key points:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero static AWS credentials.&lt;/strong&gt; Dynamically obtained from GCP service account.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temporary credentials cached for 5 minutes.&lt;/strong&gt; Avoids per-request STS overhead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lambda executes inside VPC.&lt;/strong&gt; DB connections never leave the VPC.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production queries use Read Replicas only.&lt;/strong&gt; Never connects to the master.&lt;/li&gt;
&lt;/ul&gt;
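&lt;p&gt;The 5-minute credential cache from the second bullet can be sketched as a small wrapper, with a &lt;code&gt;fetch&lt;/code&gt; callable standing in for the OIDC → STS exchange (names and shape are hypothetical):&lt;/p&gt;

```python
import time

class CredentialCache:
    """Cache temporary STS credentials briefly so each MCP request
    doesn't pay the OIDC + AssumeRoleWithWebIdentity round trip."""

    def __init__(self, fetch, ttl=300, clock=time.monotonic):
        self._fetch, self._ttl, self._clock = fetch, ttl, clock
        self._creds, self._expires = None, 0.0

    def get(self):
        if self._creds is None or self._clock() >= self._expires:
            self._creds = self._fetch()   # OIDC token -> temporary AWS creds
            self._expires = self._clock() + self._ttl
        return self._creds
```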

&lt;h3&gt;
  
  
  SQL Validation (Defense in Depth)
&lt;/h3&gt;

&lt;p&gt;Query safety is enforced at two layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP layer (1st):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Allowed: SELECT, SHOW, DESCRIBE, DESC, EXPLAIN, WITH...SELECT
Blocked: INSERT, UPDATE, DELETE, DROP, CREATE, ALTER, TRUNCATE, multi-statement via semicolons
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lambda layer (2nd):&lt;/strong&gt;&lt;br&gt;
The same validation runs inside Lambda. Even if the MCP layer is somehow bypassed, Lambda blocks it.&lt;/p&gt;
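&lt;p&gt;The article doesn't show the validator itself, but a hedged sketch of one layer might look like this, covering only the allow-list prefix and the multi-statement check (a trailing semicolon is tolerated):&lt;/p&gt;

```python
import re

# Read-only statement prefixes; does not fully parse WITH...SELECT bodies.
READ_ONLY = re.compile(
    r"^\s*(SELECT|SHOW|DESCRIBE|DESC|EXPLAIN|WITH)\b", re.IGNORECASE
)

def is_safe_query(sql):
    if not READ_ONLY.match(sql):
        return False
    # Reject multi-statement input: any semicolon besides a trailing one.
    return ";" not in sql.rstrip().rstrip(";")
```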
&lt;h2&gt;
  
  
  Protecting Production Data — PII Anonymization
&lt;/h2&gt;

&lt;p&gt;Querying production data is powerful, but handling personally identifiable information (PII) requires the most care.&lt;/p&gt;
&lt;h3&gt;
  
  
  Automatic Anonymization Rules
&lt;/h3&gt;

&lt;p&gt;For production + view permission queries, PII column values are &lt;strong&gt;automatically anonymized&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Column Pattern&lt;/th&gt;
&lt;th&gt;Replacement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Email fields&lt;/td&gt;
&lt;td&gt;&lt;code&gt;***@***.com&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Name fields&lt;/td&gt;
&lt;td&gt;&lt;code&gt;***&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phone fields&lt;/td&gt;
&lt;td&gt;&lt;code&gt;***-****-****&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Postal code fields&lt;/td&gt;
&lt;td&gt;&lt;code&gt;***-****&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Address fields&lt;/td&gt;
&lt;td&gt;&lt;code&gt;***&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Password fields&lt;/td&gt;
&lt;td&gt;&lt;code&gt;[REDACTED]&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Date of birth fields&lt;/td&gt;
&lt;td&gt;&lt;code&gt;****-**-**&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Card number fields&lt;/td&gt;
&lt;td&gt;&lt;code&gt;[REDACTED]&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Table-specific rules handle ambiguous columns. For example, a generic &lt;code&gt;name&lt;/code&gt; column isn't PII globally, but &lt;code&gt;users.name&lt;/code&gt; or &lt;code&gt;orders.buyer_name&lt;/code&gt; clearly is. These are configured per-table.&lt;/p&gt;
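&lt;p&gt;The global rules above reduce to a pattern → replacement pass over each result row. An illustrative sketch (the regexes here are assumptions, not the production patterns):&lt;/p&gt;

```python
import re

# Column-name pattern -> masked value, mirroring the table above.
PII_RULES = [
    (re.compile(r"email"), "***@***.com"),
    (re.compile(r"(^|_)name$"), "***"),
    (re.compile(r"phone|tel"), "***-****-****"),
    (re.compile(r"postal|zip"), "***-****"),
    (re.compile(r"address"), "***"),
    (re.compile(r"password"), "[REDACTED]"),
    (re.compile(r"birth"), "****-**-**"),
    (re.compile(r"card_number"), "[REDACTED]"),
]

def anonymize_row(row):
    out = {}
    for column, value in row.items():
        for pattern, masked in PII_RULES:
            if pattern.search(column.lower()):
                out[column] = masked
                break
        else:
            out[column] = value
    return out
```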
&lt;h3&gt;
  
  
  Staging vs Production
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Environment&lt;/th&gt;
&lt;th&gt;PII Anonymization&lt;/th&gt;
&lt;th&gt;Connection Target&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Staging&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Master DB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production (view)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Auto-applied&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Read Replica&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production (edit)&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Read Replica&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Staging uses test data, so no anonymization needed. Only production view queries get automatic PII protection.&lt;/p&gt;
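&lt;p&gt;The table above reduces to a single predicate. A minimal sketch, with the two field values mirroring the table in this post:&lt;/p&gt;

```python
def should_anonymize(environment: str, perm_level: str) -> bool:
    # Staging holds only test data; production "edit" users see raw values.
    # Only production + view queries get automatic PII masking.
    return environment == "production" and perm_level == "view"
```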
&lt;h2&gt;
  
  
  Fully Automated Access Management — DB Account Pipeline
&lt;/h2&gt;

&lt;p&gt;"Who do I talk to about getting database access?"&lt;/p&gt;

&lt;p&gt;This question doesn't get asked anymore. The DB Account Pipeline automates everything.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8hhym019cnixbs153q2v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8hhym019cnixbs153q2v.png" alt="Credential Flow" width="800" height="297"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Flow
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;User submits a workflow request&lt;/strong&gt; — nickname, email, desired databases (multiple allowed)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Manager approves&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Run Job processes automatically&lt;/strong&gt; — reads approved requests, generates CREATE USER statements per DB, executes via Lambda&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Credentials saved to Firestore + Secret Manager&lt;/strong&gt; — passwords never stored in plaintext&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slack DM with connection info&lt;/strong&gt; — includes bastion server guide&lt;/li&gt;
&lt;/ol&gt;
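&lt;p&gt;Step 3 above can be sketched as a function that turns an approved request into per-database SQL statements. The username convention matches the &lt;code&gt;ryan_view_user&lt;/code&gt; example later in this post, but the grant scopes and password policy here are illustrative assumptions, not the real Lambda payload:&lt;/p&gt;

```python
import secrets
import string

# Hypothetical sketch of generating CREATE USER / GRANT statements from an
# approved access request. Grant scopes and password length are assumptions.
def build_account_statements(nickname: str, databases: list[str],
                             perm_level: str = "view") -> tuple[str, list[str]]:
    alphabet = string.ascii_letters + string.digits
    password = "".join(secrets.choice(alphabet) for _ in range(24))
    username = f"{nickname}_{perm_level}_user"
    statements = [f"CREATE USER '{username}'@'%' IDENTIFIED BY '{password}';"]
    grant = "SELECT" if perm_level == "view" else "SELECT, INSERT, UPDATE"
    for db in databases:
        statements.append(f"GRANT {grant} ON `{db}`.* TO '{username}'@'%';")
    return username, statements
```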
&lt;h3&gt;
  
  
  Zero Plaintext Passwords
&lt;/h3&gt;

&lt;p&gt;Passwords are stored &lt;strong&gt;only in Secret Manager&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Firestore db_credentials&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xxx.rds.amazonaws.com"&lt;/span&gt;
  &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3306&lt;/span&gt;
  &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ryan_view_user"&lt;/span&gt;
  &lt;span class="na"&gt;passwordSecretId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;db-cred-xxxxx"&lt;/span&gt;  &lt;span class="s"&gt;← Reference to Secret Manager only&lt;/span&gt;
  &lt;span class="na"&gt;permLevel&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;view"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the MCP Server executes a query, it fetches the password from Secret Manager via &lt;code&gt;passwordSecretId&lt;/code&gt; and caches it in memory for 5 minutes. Cloud Run restarts clear the cache.&lt;/p&gt;
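&lt;p&gt;The 5-minute in-memory cache can be sketched as follows. Here &lt;code&gt;fetch_secret&lt;/code&gt; stands in for the Secret Manager client call and is an assumption, not the real API:&lt;/p&gt;

```python
import time

# Minimal sketch of a 5-minute TTL cache in front of Secret Manager.
# A process restart drops _cache, matching the Cloud Run behavior above.
CACHE_TTL_SECONDS = 300
_cache: dict[str, tuple[str, float]] = {}

def get_password(secret_id: str, fetch_secret) -> str:
    now = time.monotonic()
    hit = _cache.get(secret_id)
    if hit and now - hit[1] < CACHE_TTL_SECONDS:
        return hit[0]                      # fresh cache hit, no API call
    password = fetch_secret(secret_id)     # one Secret Manager access on miss
    _cache[secret_id] = (password, now)
    return password
```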

&lt;p&gt;&lt;strong&gt;No plaintext password exists anywhere&lt;/strong&gt; — this was a deliberate design decision we're particularly proud of.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Daily Cron
&lt;/h3&gt;

&lt;p&gt;A cron job fires at 6:00 AM JST daily, triggering a Cloud Run Job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;6:00 AM JST — Cron fires
├── ORM parsing (28 repos × 4 ORMs)
├── Live DB validation (11 staging DBs)
├── Gemini description generation (incremental only)
├── Graph construction + Embedding
├── BQ MERGE (preserving annotations)
└── Slack notification
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
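&lt;p&gt;Conceptually the job is just the steps above run in sequence, with a Slack summary at the end. A sketch, where the step functions and notifier are placeholders for the real implementations:&lt;/p&gt;

```python
# Hypothetical orchestration of the daily Cloud Run Job; each step is a
# callable placeholder for the real stage (ORM parsing, validation, ...).
def run_daily_pipeline(steps, notify):
    results = {}
    for name, step in steps:
        results[name] = step()  # stages run sequentially, in declared order
    notify(f"DB graph refresh done: {', '.join(results)}")
    return results
```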



&lt;h3&gt;
  
  
  Cost
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3 Flash (daily, incremental)&lt;/td&gt;
&lt;td&gt;~$0.10-0.20/day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vertex AI Embedding&lt;/td&gt;
&lt;td&gt;~$0.01/day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud Run Job&lt;/td&gt;
&lt;td&gt;Near-free (once daily)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BQ Storage&lt;/td&gt;
&lt;td&gt;A few GB (negligible cost)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lambda&lt;/td&gt;
&lt;td&gt;Shared with DB Account Pipeline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Thanks to incremental detection, we maintain an AI-powered dictionary for 991 tables at &lt;strong&gt;under $10/month&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Incremental Detection
&lt;/h3&gt;

&lt;p&gt;Regenerating all table descriptions daily would spike Gemini costs. So we introduced &lt;strong&gt;change detection&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Compare previous property hashes
2. Detect column structure changes (additions/removals/type changes)
3. Identify affected tables via enum dependency graph
→ Regenerate only changed tables
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a status enum changes, all tables using that enum are regenerated. No changes? Skip. This cuts AI costs by roughly 90%.&lt;/p&gt;
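&lt;p&gt;A sketch of that change detection: hash each table's column structure, regenerate only tables whose hash changed, then fan out through the enum dependency graph. The input shapes are assumptions for illustration:&lt;/p&gt;

```python
import hashlib
import json

# Hypothetical sketch of incremental change detection via property hashes
# plus an enum dependency graph. Input shapes are illustrative assumptions.
def table_hash(columns: dict) -> str:
    """Stable hash of a table's column structure."""
    return hashlib.sha256(json.dumps(columns, sort_keys=True).encode()).hexdigest()

def tables_to_regenerate(prev_hashes, current_schemas, enum_users, changed_enums):
    # Tables whose column structure changed since the previous run.
    changed = {t for t, cols in current_schemas.items()
               if prev_hashes.get(t) != table_hash(cols)}
    # A changed enum forces regeneration of every table that uses it.
    for enum in changed_enums:
        changed |= set(enum_users.get(enum, ()))
    return changed
```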

&lt;h2&gt;
  
  
  Security Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Protection&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OAuth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Google Account + corporate domain restriction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Credential Resolution&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;email → nickname → per-user DB credentials&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Permission Filter&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per-user × database × environment × permission level&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SQL Validation (MCP)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SELECT-only enforcement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SQL Validation (Lambda)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Same validation (defense in depth)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PII Anonymization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Production + view queries only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Production Connection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Read Replicas only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Passwords&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Secret Manager only, 5-min TTL memory cache&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-Cloud Auth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GCP OIDC → AWS STS (zero static credentials)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Logging&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Passwords and query results never logged&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
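&lt;p&gt;The SELECT-only validation run in both the MCP server and the Lambda could look roughly like this. The real validator is presumably stricter; this keyword list and the multi-statement check are illustrative assumptions:&lt;/p&gt;

```python
import re

# Minimal sketch of a SELECT-only SQL gate (applied twice: MCP + Lambda,
# for defense in depth). The forbidden-keyword list is an assumption.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|CREATE|GRANT|REVOKE)\b",
    re.IGNORECASE,
)

def validate_select_only(sql: str) -> bool:
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # reject multi-statement payloads outright
        return False
    if not re.match(r"(?is)^\s*(SELECT|WITH)\b", stripped):
        return False
    return not FORBIDDEN.search(stripped)
```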

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;p&gt;DB Graph MCP goes beyond solving the fundamental database problem of "you can't use what you don't know exists." It &lt;strong&gt;enables anyone to search real data without knowing SQL at all&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;As a dictionary&lt;/strong&gt; — Search 991 tables' structure, relationships, and enum definitions in natural language&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;As a query tool&lt;/strong&gt; — Securely query staging and production data with automatic PII protection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;As a knowledge base&lt;/strong&gt; — DEAD flags and column annotations surface 10 years of tacit knowledge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The biggest lesson from building this: &lt;strong&gt;the real value of MCP is giving AI context&lt;/strong&gt;. Table structure, relationships, enum definitions, column warnings — when these enter the AI's context window, the SQL and code Claude Code writes become dramatically more accurate.&lt;/p&gt;

&lt;p&gt;Making that happen required building the graph, securing cross-cloud access, automating permission management, and protecting PII — unglamorous but essential infrastructure, built with care.&lt;/p&gt;

&lt;p&gt;I hope this helps anyone wrestling with internal database management at scale.&lt;/p&gt;




&lt;p&gt;I'm CTO at &lt;a href="https://www.air-closet.com/" rel="noopener noreferrer"&gt;airCloset&lt;/a&gt;, a fashion rental subscription service in Japan. We're building the future of AI-powered development. If you're interested, check out our &lt;a href="https://corp.air-closet.com/recruiting/developers/" rel="noopener noreferrer"&gt;engineering careers page&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>graphrag</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
