A few weeks ago, onboarding a new user to Codens looked like this. They opened .claude/settings.json and pasted five MCP server entries, one for each of our product surfaces. Then they ran five separate login commands. Five OAuth callbacks, five JWTs scattered across config files, five chances to typo a URL. The first time I walked someone through it on a call, they very politely said "this is a lot." They were right.
We shipped codens-mcp to fix it. One PyPI package, one MCP entry, one login. Thirty-one tools across Purple (the orchestrator), Red (auto-fix), Blue (E2E QA), Green (PRD), and Auth (shared SSO and billing). And because half our users live on remote dev boxes where browser-based OAuth is a pain, we added a Device Code Flow login that works over SSH, in containers, and inside GitHub Codespaces. This post is about the design choices, including the ones I'm still slightly unsure about.
There's a particular kind of friction in a five-product onboarding that I want to name before we get into the architecture. It's not the time. The actual install commands take maybe four minutes. It's the suspicion. When you're being asked to paste five entries into a config file you don't fully trust yet, every step makes you wonder whether the product is going to be worth the setup. By the third login prompt, half of the users I watched gave up on configuring it correctly and just used Purple in isolation. The product loss from that wasn't a few users. It was the cross-product workflows that never got tried, because nobody ever saw them light up.
Why one package, not five
Codens grew the same way most multi-product systems grow: by accident. Purple shipped first. It had 16 MCP tools and a CLI called purple-codens-mcp. People liked it. Then Red got its own MCP surface for bug reports and fix plans. Then Blue for test generation. Then Green for PRD consultations. Then Auth for signup and pricing lookups.
Each of those products had its own backend, its own JWT, its own MCP server entry. From a code-organization standpoint that was fine. From a user standpoint it was a mess. People don't want to install five packages from PyPI. They want to install one thing and get all the tools.
We considered three approaches:
The first was: keep five packages, add a meta-package that depends on all of them. Clean dependency graph, but users still need five MCP server entries because each package exposes its own stdio binary. That solves nothing.
The second was: collapse everything into purple-codens-mcp and rename the package later. Tempting, but purple-codens-mcp already had a userbase pinning >=X.Y.Z in their lockfiles. Adding 15 new tools to that package would have been a stealth API expansion and the name would have been wrong forever.
The third option, which is what we shipped: a new package called codens-mcp that re-exports Purple's 16 tools and registers Red/Blue/Green/Auth tools alongside them. The package lives in a sibling directory (purple-codens/codens-mcp/) and declares purple-codens-mcp as a runtime dependency. Users who already have purple-codens-mcp in production keep it working. New users install one thing. The bundling cost is one extra dependency in pip list, which nobody is going to notice.
The thing I like about this layout is that it keeps the auth code in exactly one place. Login logic lives in purple_codens_mcp.auth. The new package imports that module rather than copying it. If a future Auth Codens migration changes the OAuth flow, I fix it in one file. The duplication trap is real and we deliberately walked around it.
A subtler benefit of the bundled approach: the four product surfaces share a credential helper. Each tool that calls Red, Blue, or Green does so via a _red_client(api_url) style accessor that reads the same credentials file Purple uses. There's no per-product login state to reconcile. If a token gets refreshed, everyone sees the new value on the next call. Earlier in the design I had each product's tools tracking auth independently, and the corner cases around expired tokens were the kind of thing I'd debug at midnight. Sharing one credential dict made those bugs go away because they couldn't exist in the first place.
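Roughly what that accessor looks like, as a sketch rather than the shipped code: only the `_red_client` name and the credentials path come from this post; the little client wrapper, the HTTP library, and the `access_token` key name are assumptions.

```python
import json
from dataclasses import dataclass
from pathlib import Path

import requests

CREDENTIALS_PATH = Path.home() / ".purple-codens" / "credentials.json"


def _load_credentials() -> dict:
    # One file, written by Purple's login; every product surface reads it.
    return json.loads(CREDENTIALS_PATH.read_text())


@dataclass
class _Client:
    api_url: str
    session: requests.Session

    def get(self, path: str) -> requests.Response:
        return self.session.get(f"{self.api_url}{path}")


def _red_client(api_url: str) -> _Client:
    # A token refresh rewrites the file, so the next call picks up the new value.
    session = requests.Session()
    session.headers["Authorization"] = f"Bearer {_load_credentials()['access_token']}"  # key name is an assumption
    return _Client(api_url.rstrip("/"), session)
```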
The cross-product registration tool
Once all the tools sit in one package, you can write tools that span products. The first one we shipped is codens_register_project_unified. It takes a GitHub repo and registers it across Purple, Red, Blue, and Green in a single call.
Here's the design tension: Purple, Red, Blue, and Green each have their own database and their own /api/v1/projects endpoint. There's no two-phase commit across them. So what happens when you call this tool and Green's API is having a bad afternoon?
The transactional answer would be: roll everything back, fail loudly. But "rolling back" a project creation is annoying because some of those backends fire off webhooks and Slack notifications on creation, and reversing those side effects is messy. Worse, a user who wants to retry shouldn't be punished by having to delete three half-created projects manually.
So we went best-effort. The tool returns a dict like this:
```python
{
    "purple_project_id": "prj_a1b2c3",
    "red_project_id": "prj_d4e5f6",
    "blue_project_id": None,
    "green_project_id": "prj_g7h8i9",
    "errors": [
        {"product": "blue", "error_message": "503 Service Unavailable"}
    ]
}
```
If three out of four succeed, you get three IDs and a clearly-labeled failure for the one that didn't. The user (or the LLM driving the tool) can re-run the call with products=["blue"] to retry just the failed one. That products parameter defaults to all four, but accepting a subset turns out to be useful in two other situations: when a customer doesn't pay for one of the products yet, and when an LLM is exploring and only wants to register on Purple before committing to the rest.
The honesty of the errors array matters. Earlier drafts of the tool tried to be clever and aggregate failures into a single string. That made it harder for an agent to programmatically decide what to retry. The list-of-dicts shape is uglier in a logfile but trivially correct to parse.
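For illustration, here's roughly how an agent-side caller could turn that errors array into a targeted retry. `register_project_unified` is a stand-in for however the tool gets invoked, not our actual API.

```python
def retry_failed(result: dict, register_project_unified, **project_args) -> dict:
    """Re-run the registration tool for just the products that errored."""
    failed = [e["product"] for e in result.get("errors", [])]
    if not failed:
        return result
    # e.g. products=["blue"] if only Blue's API was having a bad afternoon
    return register_project_unified(products=failed, **project_args)
```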
There's still one edge case I'm not happy with. If Purple succeeds but the network drops before Red is called, the user has a Purple project ID and no record that the tool was even attempted on the others. The current answer is "rerun with products=["red", "blue", "green"]," which works, but it relies on the user noticing. A better long-term answer is probably an idempotency key sent to all four backends so retries dedupe naturally. That's on the list.
One thing the cross-product tool does well: it sets the same name, github_owner, and github_repo everywhere. That sounds trivial but it was a real source of bugs in the five-package world, where users would type the repo name slightly differently in each product's MCP tool and end up with acme/my-app registered in Purple and acme/my_app in Green, and then wonder why the cross-product views showed nothing. Centralizing the registration call removes a whole class of typo-driven mismatch. The LLM driving the tool can't accidentally rename the repo halfway through, because there's only one place the repo name is supplied.
Device Code Flow, because SSH
The classic OAuth flow that Codens shipped originally went like this. You ran purple_login from inside Claude Code. The CLI started a tiny HTTP server on a random local port, opened your browser, you signed in with Google, the browser redirected to http://localhost:54321/callback, the CLI captured the auth code, exchanged it for a JWT, stored the JWT, done.
That works beautifully on a laptop. It falls apart in roughly half the environments our users actually work in.
If you're SSH'd into a dev box, there's no browser to open, and even if there were, the redirect to http://localhost:54321/callback would hit the wrong machine. Same story for dev containers, Docker exec sessions, and GitHub Codespaces. You can sometimes paper over it with port forwarding, but you have to remember to set that up before running the login command, and most people don't. They just see a hung CLI and a broken Google sign-in page.
The fix is RFC 8628, the OAuth 2.0 Device Authorization Grant. It's the same flow that lets you sign into your TV's Netflix app by typing a short code on your phone. The CLI never opens a browser locally. It posts to the auth server and receives a device_code, a short user_code, and a verification URL. It prints the URL and the user code. You open the URL on whatever device has a browser, type the code, approve. The CLI is meanwhile polling a token endpoint every few seconds. The moment you approve, the next poll succeeds and the CLI gets a JWT.
Running it looks like this:
```
$ codens-mcp login
Logging in to https://api.purple.codens.ai via https://api.auth.codens.ai ...
============================================================
Device Authorization Required
============================================================
1. Open this URL on any device:
   https://app.auth.codens.ai/device
2. Enter this code when prompted:
   ABCD-1234
Waiting for authorization (expires in 15 minutes)...
============================================================
Authorization complete!
Logged in as you@example.com
```
You can sign in from your phone while SSH'd into a server. You can sign in from your laptop while pair-programming on a colleague's box. The CLI doesn't care where the browser is.
The polling loop has one detail that's worth calling out. RFC 8628 says the auth server can return a slow_down error to tell the client it's polling too aggressively. When that happens, the spec says the client must add at least 5 seconds to its polling interval. We honor that. It looks unimportant in code, but if you ignore it, a misbehaving client gets rate-limited and the user sees a login that just times out for no obvious reason. The spec is right; pay the 5 seconds.
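A minimal sketch of that polling loop, assuming a requests-style HTTP client and illustrative endpoint details; the real loop lives in purple_codens_mcp.auth.

```python
import time

import requests


def poll_for_token(token_url: str, device_code: str, client_id: str,
                   interval: int = 5, expires_in: int = 900) -> dict:
    deadline = time.monotonic() + expires_in
    while time.monotonic() < deadline:
        time.sleep(interval)
        resp = requests.post(token_url, data={
            "grant_type": "urn:ietf:params:oauth:grant-type:device_code",
            "device_code": device_code,
            "client_id": client_id,
        })
        body = resp.json()
        if resp.ok:
            return body  # contains the JWT / access token
        error = body.get("error")
        if error == "authorization_pending":
            continue       # user hasn't approved yet; keep waiting
        if error == "slow_down":
            interval += 5  # RFC 8628: add at least 5 seconds and keep polling
            continue
        raise RuntimeError(f"device login failed: {error}")
    raise TimeoutError("device authorization expired before approval")
```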
The other interesting bit: we share the credential file with purple-codens-mcp. Both packages read and write ~/.purple-codens/credentials.json, mode 0600. One login authenticates all the Codens product backends, because Auth Codens is the SSO root and every product backend trusts JWTs it issues. If you have both packages installed, they coexist. If you have only codens-mcp, the file is still at the same path, which means a user can install purple-codens-mcp later and not need to log in again.
Mode 0600 is the kind of thing that's easy to forget. Python's default Path.write_text doesn't restrict permissions. We explicitly chmod 0600 after every write. If a credential file gets group-readable on a shared dev box, the JWT inside it is good for the token's full lifetime and there's no MFA prompt to slow an attacker down. The chmod is one line; it should be one line in every credential-storing CLI.
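A sketch of the write path, using the shared path from above; the payload shape is illustrative.

```python
import json
import os
from pathlib import Path


def save_credentials(tokens: dict) -> Path:
    path = Path.home() / ".purple-codens" / "credentials.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(tokens))
    os.chmod(path, 0o600)  # write_text doesn't restrict permissions; do it explicitly
    return path
```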
The polling timeout is set to 15 minutes, which is long enough to walk away from your terminal, find your phone, swear at the captcha, log in, and come back. We considered shorter values. The trade-off is that a 5-minute timeout looks more "responsive" but punishes the user who happened to be in a meeting when they ran the command. Fifteen minutes is what RFC 8628 suggests as a reasonable default and we didn't find a reason to argue with it.
Subcommands, defaults, and not breaking anyone
The CLI structure is argparse with subparsers:
```
codens-mcp [-h]
codens-mcp login [--auth-url URL] [--api-url URL]
codens-mcp whoami [--api-url URL]
codens-mcp serve
```
serve starts the stdio MCP server. login runs Device Code Flow. whoami prints the email, user ID, organization, and remaining JPY credits for the currently authenticated user.
The only mildly unusual choice is that codens-mcp with no arguments runs serve. That's deliberate. MCP servers in .claude/settings.json are configured by command name. Existing entries written before the CLI got subcommands look like "command": "codens-mcp", "args": []. If we'd made serve mandatory, every existing config would break the moment users upgraded. So parse_args() falls through to _cmd_serve when args.command is None. The cost is that running codens-mcp interactively in a terminal blocks on stdio, which feels weird if you're not expecting it, but the tradeoff is that 0.4.0 is a no-config-change upgrade.
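In sketch form, assuming nothing beyond argparse and placeholder handlers, the fall-through looks something like this:

```python
import argparse


def _cmd_serve(args):
    print("would start the stdio MCP server here")


def _cmd_login(args):
    print("would run Device Code Flow here")


def _cmd_whoami(args):
    print("would print the current user here")


def main(argv=None):
    parser = argparse.ArgumentParser(prog="codens-mcp")
    sub = parser.add_subparsers(dest="command")

    login = sub.add_parser("login")
    login.add_argument("--auth-url")
    login.add_argument("--api-url")

    whoami = sub.add_parser("whoami")
    whoami.add_argument("--api-url")

    sub.add_parser("serve")

    args = parser.parse_args(argv)
    # Bare `codens-mcp` falls through to serve, so existing
    # "command": "codens-mcp", "args": [] entries keep working after upgrade.
    handler = {"login": _cmd_login, "whoami": _cmd_whoami, "serve": _cmd_serve}.get(
        args.command, _cmd_serve
    )
    return handler(args)


if __name__ == "__main__":
    main()
```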
The _cmd_login function is twelve lines. All it does is call purple_codens_mcp.auth.device_code_login, then hand the resulting tokens to PurpleCodensClient.login_with_device_token, which writes them to disk and fetches the user's profile. We deliberately don't reimplement the OAuth flow here. If we did, we'd have two copies of an RFC 8628 polling loop, and one of them would inevitably drift. By delegating, the unified package gets new auth features for free whenever purple-codens-mcp ships them.
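Approximately, and with the module path and exact signatures treated as assumptions:

```python
from purple_codens_mcp.auth import device_code_login          # named in the post
from purple_codens_mcp.client import PurpleCodensClient       # module path is a guess


def _cmd_login(args):
    # Delegate the RFC 8628 flow entirely to purple-codens-mcp.
    tokens = device_code_login(auth_url=args.auth_url, api_url=args.api_url)
    client = PurpleCodensClient(api_url=args.api_url)
    profile = client.login_with_device_token(tokens)  # writes credentials, fetches profile
    print(f"Logged in as {profile['email']}")          # field name is an assumption
```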
This is also why we kept purple-codens-mcp published as a separate package. It's the dependency target. It has the auth module, the client class, the credential storage. codens-mcp builds on top of it. Some users with legacy automation still install purple-codens-mcp directly and we don't want to break them. The deprecation path, if there ever is one, would be to make purple-codens-mcp a thin shim that re-exports from codens-mcp. But that's a future problem and probably not worth solving until the userbase says it is.
What setup looks like now
Three lines of config, two commands:
```json
{
  "mcpServers": {
    "codens": { "command": "codens-mcp", "args": [] }
  }
}
```

```
pip install codens-mcp
codens-mcp login
```
That's it. One install. One login. Thirty-one tools available inside Claude Code: Purple's project and credit management, Red's bug analysis and fix plans, Blue's E2E test generation and execution, Green's PRD consultations and kickoff creation, Auth's signup and pricing lookups, plus the cross-product registration tool. The same JWT works against every backend because Auth Codens is the issuer and every product validates against the same public key.
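On the backend side, that validation is conceptually one call. This sketch assumes PyJWT and RS256, neither of which is specified above.

```python
import jwt  # PyJWT


def verify_codens_jwt(token: str, auth_public_key: str) -> dict:
    # Every product backend checks tokens against the same Auth Codens public key,
    # so a JWT minted by one login works everywhere.
    return jwt.decode(token, key=auth_public_key, algorithms=["RS256"])
```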
For users on a laptop, the original browser-based purple_login MCP tool still works exactly as before, and it's slightly faster than Device Code Flow because there's no polling delay. We kept it. The CLI's codens-mcp login is for the headless cases. Both flows write to the same credential file in the same format.
The release timeline was tight. 0.1.0 went out on May 6 with the unified package and the 31 tools. 0.3.0 followed the same day with codens_register_project_unified. 0.4.0 shipped on May 7 with the CLI subcommands and the no-arg serve default. The semver minors reflect API additions, not breakage; every version is install-and-go from the previous one.
If I were starting over, I'd do the unified package first and skip the five-package phase entirely. But the five-package phase is how we figured out which tools each product actually needed, and you can't skip that. The cleanup is the easy part. The hard part is knowing what to keep.
Pointers
- Package on PyPI: pypi.org/project/codens-mcp
- Help docs with the canonical agent reference, including all 31 tool signatures: help.codens.ai/en/
- Codens itself: codens.ai/en/

If you've been hand-rolling MCP server entries for a multi-product setup, the takeaway is: bundle the package, share the credential file, and add Device Code Flow before the first user complains about SSH. The work is small. The friction it removes is not.