DEV Community: Creeta

Agent Reach installs the tools, then gets out of the way

Creeta — Wed, 29 Jul 2026 03:31:25 +0000

Agent Reach is easiest to understand as a setup layer: it gives a command-capable coding agent a local toolbox, then stops being the center of the workflow.

What is Agent Reach CLI for?

Agent Reach CLI is a local, open-source coordinator for AI coding agents that can run shell commands; it is not a hosted scraping API, managed crawler, or cloud browser service. The practical job is narrower and more useful: choose platform utilities, install them, verify they work, and route the agent toward the right upstream tool.

The current setup story should be pinned to Agent Reach v1.5.0, with package metadata listing Python >=3.10 and an MIT license . The v1.5.0 release was published on June 11, 2026, and describes 162 total tests plus 32 end-to-end real-machine tests across 13 channels . That matters because the project is handling brittle platform tooling, not exposing one stable universal API.

"Selects, installs, health-checks and routes" is the core model described by the Agent Reach project, which means the agent still calls tools such as OpenCLI, yt-dlp, GitHub CLI, Jina Reader, feedparser, and platform CLIs directly (source: Agent Reach GitHub repository).

Out of the box, the zero-config surface is deliberately limited: public web reading via Jina Reader, YouTube, GitHub, RSS, Exa Search, V2EX, and basic Bilibili are listed in the install guide . The seed video frames the tool as a way to give agents access to social and web platforms, but builders should read that through the repo’s stricter model: Agent Reach installs and checks local capabilities; it does not remove login, cookie, or platform constraints .

Prerequisites before pipx

Agent Reach prerequisites are mostly local-environment prerequisites: use Python >=3.10, a shell-capable workstation, and accounts where you can manage CLIs, browser sessions, environment variables, and cookies deliberately . Treat the setup as installing a local capability layer for an AI coding agent, not as signing up for a hosted scraping service.

The canonical install path is pipx install https://github.com/Panniantong/agent-reach/archive/main.zip, followed by agent-reach install --env=auto, according to the project install guide . If Python packaging policy blocks a global install under PEP 668, the documented fallback is a virtual environment under ~/.agent-reach-venv .

For Twitter, plan on user-exported cookies or TWITTER\_AUTH\_TOKEN plus TWITTER\_CT0 in the current process .
For Reddit, do not expect anonymous zero-config access; the guide points desktop users toward OpenCLI with an existing reddit.com login, and server users toward rdt-cli plus cookies .
For Facebook and Instagram, the OpenCLI desktop route reuses Chrome login state instead of Meta Graph API approval, and the guide does not recommend those channels for headless servers .

Runnable sequence: pipx, safe pass, doctor

The runnable path is: install Agent Reach with the documented pipx archive command, run the installer, then verify the resulting toolchain with agent-reach doctor before trusting any channel. The official install guide documents pipx install https://github.com/Panniantong/agent-reach/archive/main.zip followed by agent-reach install --env=auto .

pipx install https://github.com/Panniantong/agent-reach/archive/main.zip
agent-reach install --env=auto

For a cautious rollout, inspect before changing the workstation. The install guide documents --dry-run for previewing planned actions and --safe for a more conservative pass .

agent-reach install --env=auto --dry-run
agent-reach install --env=auto --safe

Enable optional social channels only after you understand the credential model for each one. The same guide documents channel-scoped installs such as facebook,instagram and a broader all option, but those channels can rely on local browser sessions, cookies, or environment variables rather than anonymous access .

agent-reach install --env=auto --channels=facebook,instagram
agent-reach install --env=auto --channels=all

Run doctor immediately after installation, fix failures or warnings, and run it again before depending on a channel in an agent workflow. Agent Reach’s release notes say the health checks moved beyond file-existence checks into real command probes in the v1.5.0 release published in June 2026 .

agent-reach doctor
agent-reach doctor --json

Use agent-reach doctor --json when another script or coding agent needs to inspect routing state. The install guide describes the JSON output as the automation source for multi-backend platforms, including the active\_backend field that tells you which upstream tool Agent Reach selected for a platform .

The verified Python snippet below is a small local model of the design: Agent Reach installs tools into an agent, but the agent calls those tools directly afterward.

class Agent:
    def __init__(self):
        self.tools = {}

    def use(self, name, *args):
        return self.tools[name](*args)


class AgentReach:
    @staticmethod
    def install_tools(agent, tools):
        agent.tools.update(tools)
        return agent


def add(a, b):
    return a + b


agent = Agent()
AgentReach.install_tools(agent, {"add": add})

print("Agent Reach installed: add")
print("Agent runs directly:", agent.use("add", 2, 3))

OpenCLI, Firecrawl, Jina Reader, Browserbase fit map

Agent Reach fits local agent workstations, while Firecrawl, Jina Reader, and Browserbase solve narrower production jobs. Agent Reach installs and routes platform-specific tools; Firecrawl exposes managed scraping, crawling, extraction, browser actions, cache, and proxy controls through an API; Jina Reader converts public URLs into LLM-friendly text; Browserbase provides observable cloud browsers for agent workflows.

The practical split is deployment shape. Use Agent Reach when your coding agent can run shell commands locally and you want it to reuse user-controlled sessions or upstream CLIs. Use Firecrawl when the task needs a supported API with structured outputs and managed browser behavior. Use Jina Reader when the input is a public URL and the output should be clean text. Use Browserbase when recordings, persistent sessions, and cloud browser control matter.

Tool	What it is best for	Session / hosting model	Pricing or limit signal	Best-fit deployment path
Agent Reach	Local routing across brittle social and web tools	Local workstation; user-controlled cookies, CLIs, browser sessions, and environment variables	Open-source local CLI; package metadata lists MIT licensing and Python >=3.10	Developer machine or command-capable coding agent terminal
Firecrawl	Production scraping, crawling, extraction, browser actions, cache, and proxy controls	Hosted API with Bearer authentication and multiple scrape output formats	Free plan includes 1,000 credits/month; Hobby is $16/month billed yearly for 5,000 pages; Standard is $83/month for 100,000 pages; Growth is $333/month for 500,000 pages; Scale is $599/month for 1,000,000 credits/month	Backend service, extraction workflow, or paid crawl pipeline
Jina Reader	Public URL-to-markdown conversion for LLM grounding	Hosted reader endpoint; not a login/session automation layer	Reader lists 20 RPM without a key, 500 RPM with a free or paid key, and 5,000 RPM premium; it cannot access content behind login	Lightweight public-page ingestion before summarization or retrieval
Browserbase	Cloud browser automation, observability, persistent sessions, and agent-run inspection	Hosted browsers with Playwright, Puppeteer, Selenium, and Stagehand support	Free plan lists 3 concurrent browsers, 1 browser hour, 3 Agent runs, 1,000 Search calls, and 1,000 Fetch calls	Production browser workflows that need recordings, concurrency, and managed sessions

The sharp edge is authentication. Agent Reach should not be treated as a way around platform rules; its install guide describes user-supplied cookies, Chrome login reuse, upstream CLIs, MCP tools, and environment variables as the working substrate for social channels . That makes it useful for developer-side research and agent experiments, but less appropriate as the core of a customer-facing scraping service.

For a production path, start by separating public pages from authenticated workflows. Public article extraction can often begin with Jina Reader. Repeatable crawling and structured extraction point toward Firecrawl. Browser automation with inspection and replay points toward Browserbase. Agent Reach belongs at the local tool layer: install the channels you need, verify them with doctor, then let the agent call the installed tools directly.

Gotchas and where to go next

Agent Reach is not a restriction bypass; it is a local routing layer that depends on user-controlled credentials, browser sessions, upstream CLIs, MCP tools, and environment variables. The install guide explicitly points social channels toward cookies, logged-in browser state, or channel-specific tokens, so treat each enabled backend as authenticated access with account-level risk, not anonymous scraping .

The most important habit is to run doctor after every install, channel change, or platform failure. Agent Reach v1.5.0 shifted its checks toward real command probes rather than simple file-existence checks, and the release notes describe the project as a capability layer with ordered backend lists instead of a static bundle of tools . That matters because platform behavior changes underneath the agent: a green install is less useful than a current health check showing which backend is actually active.

Xiaohongshu prefers OpenCLI on desktop, then falls back to xiaohongshu-mcp on servers, with xhs-cli kept as a legacy route .
Reddit uses OpenCLI first, then rdt-cli, and has no anonymous zero-config path in the install guidance .
Bilibili moved away from yt-dlp after 412 risk-control failures, while YouTube still keeps yt-dlp in the path .
Twitter uses twitter-cli with OpenCLI as a backup, and direct twitter-cli use still needs user-supplied authentication tokens .

A practical next pass is small: install only the channel set you need, run agent-reach doctor --json, and inspect the active backend before handing the task to an AI coding agent. Then compare one public URL through Jina Reader before involving heavier tooling. Keep Firecrawl or Browserbase for production browser workloads where hosted execution, structured extraction, recordings, proxies, or concurrency matter more than local setup speed.

The takeaway: let Agent Reach install and verify local capabilities, but keep trust boundaries clear. Use the narrowest channel, verify it with doctor, and move production scraping or browser automation to services built for repeatability.

Frequently asked questions

Is Agent Reach a scraping API?

No. Agent Reach is a local capability coordinator for command-capable AI agents: it selects, installs, health-checks, and routes upstream tools, then expects the agent to call those tools directly. The project describes Agent Reach as an installer and router around tools such as OpenCLI, GitHub CLI, Jina Reader, feedparser, and yt-dlp rather than a hosted scraping API that proxies every request through its own service .

What should I run after installing Agent Reach?

After installing Agent Reach, run agent-reach doctor, fix any warnings or failures, and run it again before relying on a channel. For automation, use agent-reach doctor --json so your agent or CI job can inspect platform status and the selected active_backend for multi-backend routes .

When should I use Firecrawl instead of Agent Reach?

Use Firecrawl when your team needs a supported hosted API for scraping, crawling, extraction, browser actions, proxy controls, cache behavior, and production reliability. Firecrawl exposes a /v2/scrape endpoint with Bearer authentication and output formats such as markdown, HTML, raw HTML, links, images, screenshots, JSON, summaries, audio, and highlights . Agent Reach fits better when you want local setup and routing for an agent workstation.

Where does Jina Reader overlap with Agent Reach?

Jina Reader overlaps with Agent Reach on public web reading. Reader converts public URLs into LLM-friendly text through the r.jina.ai path, while Agent Reach can use Jina Reader as one of its web-reading routes . The boundary is authentication: Jina Reader is useful for public pages, but it does not handle logged-in social sessions.

Is OpenCLI required for every Agent Reach channel?

No. OpenCLI is important for desktop social-session paths such as Reddit, Facebook, Instagram, and Xiaohongshu, especially where the workflow depends on an existing browser login. Agent Reach can also route through other upstream tools, including platform-specific CLIs and fallback backends, depending on the channel and environment .

Ego says 2.5x faster; the catch is your Mac

Creeta — Tue, 28 Jul 2026 15:34:58 +0000

Ego Lite is not another cloud browser agent pitch. The useful question for developers is narrower: can your local Mac give a coding assistant controlled access to a browser session you are already signed into?

Is Ego Lite mainly for Macs?

Ego Lite is mainly for Macs today: as of July 28, 2026, CitroLabs presents it as a macOS app for letting coding assistants operate inside a logged-in Chromium environment, while Windows and Linux remain roadmap items . The headline claim that Ego can run complex browser workflows up to 2.5x faster is vendor-reported, and it appears tied to reducing repeated shell invocations by keeping multi-step browser work inside one code-driven ego-browser run, not to an independent benchmark .

Quick Answer: Ego Lite is a Mac-first local browser bridge for coding agents. CitroLabs says six parallel Spaces add about 0.9 GB of memory versus about 15 GB for separate browser instances with copied profiles .

The practical reason developers are paying attention is the Space model. A Space is an isolated BrowserContext inside the same Ego Lite Chromium process, with its own cookies and storage, rather than a separate Chrome profile, headless renderer, cloud session, or new full browser instance . That makes Ego closer to a local, logged-in browser bridge for Codex, Claude Code, Cursor, Gemini CLI, OpenCode, and other shell-capable agents than to a general Playwright replacement .

"Windows and Linux on the roadmap," — CitroLabs, Ego Lite documentation (source: CitroLabs GitHub)

Vendor-reported scenario	Ego Lite Spaces	Separate browser instances with copied profiles
Six blank-tab concurrent tasks	About 0.9 GB added memory, 6 added processes, and 0.6 s startup	About 15 GB added memory, 84 added processes, and 2.5 s startup

Here is the catch behind the 2.5x number: it is useful as a directionally interesting product claim, but not as proof that every browser task on every Mac will finish 2.5x faster . Treat it as a reason to run a local smoke test on your own machine, especially if your workflow depends on authenticated SaaS pages, browser extensions, downloads, screenshots, or multi-tab state.

Before You Touch the Installer

Before installing Ego Lite, confirm that the machine is a Mac you can actually control from the agent: Ego Lite currently targets macOS, with Apple Silicon and Intel install paths documented by CitroLabs. The practical choice is either to download the matching DMG yourself from the Ego Lite repository, or use the documented install flow that selects the CPU-specific package for the Mac it is running on.

Expect two moving parts, not just a browser extension. The deeper install reference says the macOS installer places ego lite.app in /Applications or ~/Applications, removes the quarantine attribute, launches the app, and registers the ego-browser command on PATH, commonly under ~/.local/bin. That matters because your coding agent talks to the local app through the terminal entry point, not through a remote browser service.

Plan the onboarding step before you hand control to an assistant. Ego’s quick-start docs say it can migrate login state from Chrome or another browser, including cookies, extensions, bookmarks, and profile data; the same docs warn that macOS may ask for a password during migration. In other words, do the sensitive profile migration yourself, then let the agent use the resulting Space.

Use an Apple Silicon or Intel Mac supported by the Ego Lite install path.
Make sure ego-browser is reachable from the same shell your assistant uses.
Allow the assistant to launch the local Ego Lite app; a locked-down sandbox will fail before any page interaction begins.

Runnable Sequence for a Logged-In Space

A logged-in Ego Lite Space is created by installing the macOS app or adding the ego-browser skill, verifying the terminal command, then letting a shell-capable assistant drive a browser task inside Ego Lite. Ego’s docs describe the skill path through npx skills add citrolabs/ego-lite or npx skills add github:CitroLabs/ego-lite/skills/ego-browser, while the app route starts from the Apple Silicon or Intel DMG install flow .

Install the app or skill. Use the official Ego Lite app route if you want onboarding, profile migration, and the local Chromium shell. If you only want the assistant capability first, add the skill with npx skills add citrolabs/ego-lite or npx skills add github:CitroLabs/ego-lite/skills/ego-browser, both documented by CitroLabs .
Verify the command path. Run command -v ego-browser from the same shell that Codex, Claude Code, Cursor, Gemini CLI, OpenCode, OpenClaw, Hermes Agent, or another assistant will use. CitroLabs’ install reference says the command is usually registered under ~/.local/bin after installation .
Run the documented smoke path before delegation. Ego’s install reference points developers toward a minimal ego-browser nodejs heredoc test, so check the bridge directly before asking an agent to work through it .

baseline_seconds = 10.0
ego_seconds = baseline_seconds / 2.5

print(f"Ego says: {baseline_seconds / ego_seconds:.1f}x faster")
print(f"Baseline: {baseline_seconds:.1f}s -> Ego: {ego_seconds:.1f}s")
print("Catch: your Mac must support/run the local acceleration path.")

The verified snippet above is only a local arithmetic smoke check for Ego’s public speed claim: it confirms that a baseline of 10.0 seconds becomes 4.0 seconds under a 2.5x claim, not that your workflow will actually hit that number. CitroLabs reports “up to 2.5x faster” complex workflows and describes a six-task blank-tab comparison with about 0.9 GB added memory, 6 added processes, and 0.6 s startup for Spaces versus about 15 GB, 84 processes, and 2.5 s for separate browser instances plus profile copies .

After verification, tell the assistant to use /ego-browser or explicitly ask it to use ego-browser. The useful pattern is not one shell command per click; it is a small Node.js workflow where helpers such as useOrCreateTaskSpace, openOrReuseTab, snapshotText, click, fillInput, js, cdp, and captureScreenshot are available to inspect, act, wait, and report in one pass .

The important model is the Space. CitroLabs defines a Space as one isolated BrowserContext inside the same Ego Lite Chromium process, with separate cookies and storage per task; it is not a copied Chrome profile, a headless renderer, a new browser window, or a cloud browser session . That is why the Mac requirement matters: the shortcut is local desktop state plus isolated task contexts, not generic remote browser automation.

Where the Mac Shortcut Bites

Where the Mac shortcut bites is scope: Ego Lite is a local logged-in browser bridge for SaaS, admin, QA, recruiting, booking, and back-office tasks, not a managed browser fleet. CitroLabs frames Spaces as isolated task contexts inside Ego Lite’s Chromium app, which is useful when a coding agent needs authenticated desktop state without taking over your normal tabs CitroLabs.

The practical limit is infrastructure. Ego’s local model does not replace cloud browser sessions, proxy controls, recordings, scheduled runs, API-managed sessions, or output-schema workflows. If the job is “let Codex finish this logged-in admin flow on my Mac,” Ego fits. If the job is “run hundreds of monitored browser tasks across environments,” you are closer to Browser Use or another browser-agent platform.

Browser Use is broader because it spans open-source Python flows, a CLI, and cloud-managed sessions. Its docs cover profile sync for carrying selected local login state into cloud profiles Browser Use profile sync, plus API sessions with settings such as profile IDs, workspace IDs, proxies, browser persistence, model choice, and structured outputs Browser Use sessions API. Its public evaluation material also gives developers more to inspect: BU Bench V1 contains 100 browser automation tasks , and Stealth Bench V1 contains 71 tasks .

Vercel’s agent-browser sits in a different lane: local-first CLI automation with wider operating-system coverage and more explicit login-state controls. Its primitives include named Chrome profile snapshots, persistent profile directories, session restore, saved state files, cURL cookie import, and an encrypted auth vault. The package snapshot cited in the research listed version 0.33.0 , zero dependencies , and about 950,648 weekly downloads . Treat that as a distribution signal, not proof that it matches a Mac desktop workflow better than Ego.

Try This After the Smoke Test

After the smoke test, use Ego Lite only when the task is local, Mac-based, and benefits from a human keeping normal browser tabs open while an assistant works in isolated Spaces. CitroLabs frames Ego Lite as macOS-first, with Windows and Linux still on the roadmap, so the practical test is whether your real workflow needs that local Mac bridge more than a general browser automation stack.

Pick the tool by deployment shape, not by a single speed claim. Ego’s published comparison says complex workflows can be up to 2.5x faster than conventional CLI approaches , but that is a vendor-reported claim. Treat it as a reason to benchmark your own task, not as a default architecture decision.

Use Ego Lite when Codex, Claude Code, Cursor, Gemini CLI, or another local agent needs to operate in authenticated SaaS pages without taking over your daily browser context, using Spaces inside Ego’s Chromium process via ego-browser.
Use Browser Use when the job looks like a service: cloud browser sessions, synced profiles, proxies, recordings, scheduled tasks, schema outputs, API controls, or public benchmark visibility through the Browser Use project and its docs.
Use Vercel agent-browser when you want a Rust CLI, broader platform coverage, Chrome-for-Testing, profile snapshots, dashboards, confirmation policies, and explicit session/state primitives from agent-browser.

Keep Ego’s Experience layer out of production assumptions for now. The skills documentation and repo describe it as forward-looking, so do not attribute real speed gains to it until your own repeated-domain runs prove the gain. The concrete takeaway: start with Ego for a logged-in Mac workflow, switch to Browser Use for hosted browser operations, and reach for agent-browser when you need a portable CLI with stronger session controls.

Frequently asked questions

Can Ego Lite use my existing website logins?

Yes. Ego Lite can migrate browser login state, cookies, extensions, bookmarks, and profile data during onboarding, so an agent can work in pages where you are already signed in, according to the Ego Lite quick-start docs. The practical catch is macOS may ask for your password during migration, because the app is copying protected browser data into its local environment.

Is Ego Lite the same as Playwright or Browser Use?

No. Ego Lite is best understood as a local logged-in browser bridge for coding agents, with isolated Spaces inside the Ego Lite Chromium process, according to the Ego Lite Space docs. Browser Use is a broader browser-agent framework and cloud platform, while Playwright is a lower-level browser automation library that developers usually wire into tests, scripts, or agent systems themselves.

Why does Ego Lite require a Mac today?

Ego Lite requires a Mac today because the current public release is macOS-first. The CitroLabs Ego Lite repository describes Windows and Linux as roadmap platforms, not current primary targets. That matters because Ego’s advantage depends on local desktop integration, the installed app, and the ego-browser command being reachable by your agent.

Does the 2.5x faster claim have independent proof?

Not from the cited material. Treat the “up to 2.5x faster” figure as vendor-reported, not independently benchmarked . The more concrete comparison is Ego’s resource model for six blank-tab concurrent tasks: roughly 0.9 GB added memory, 6 added processes, and 0.6 s startup for Spaces, versus roughly 15 GB, 84 processes, and 2.5 s for separate browser instances plus profile copies .

When should I pick Vercel agent-browser instead?

Pick Vercel agent-browser when cross-platform CLI automation matters more than Ego Lite’s Mac desktop integration. The agent-browser repository documents macOS ARM64/x64, Linux ARM64/x64, and Windows x64 support, plus Chrome profile snapshots, CLI commands, sessions, state restore, a dashboard on port 4848, policy controls, and confirmation gates .

OpenAI's Codex plugin hands your code review to a second LLM

Creeta — Thu, 23 Jul 2026 09:55:10 +0000

Letting the model that wrote your code also sign off on it is a structural weakness, not a shortcut. OpenAI's new Claude Code plugin fixes that by handing the review to a rival model running on your own machine.

What the Codex plugin does that a single LLM review misses

When one model writes the code and then reviews it, the reviewer and the author share the same training-level blind spots — the same-weight anti-pattern that makes self-review structurally weak, because both passes carry identical assumptions and the reviewer tends to perpetuate the author's errors . The openai/codex-plugin-cc plugin — version 1.0.6, released July 8 2026, Apache-2.0 licensed, with roughly 29.7k stars — routes review work to a Codex process running locally, adding a different vendor, a different training corpus, and different failure modes to the loop, so the two agents catch mistakes the other would miss.

"When reviewer and author share weights, both passes share identical blind spots and the reviewer perpetuates the author's errors." — from an independent teardown of codex-plugin-cc (source: Daniel Vaughan).

Two read-only modes ship out of the box: /codex:review runs a standard diff pass (supporting --base <ref>, foreground or background), and /codex:adversarial-review runs a harder pass targeting assumptions, auth paths, data-loss risks, race conditions, and rollback coverage . This is local and immediate — distinct from Anthropic's managed Code Review, announced March 9 2026, which runs multiple agents per PR at roughly $15–25 and ~20 minutes each, lifting substantive review comments from 16% to 54% internally with under 1% of findings marked incorrect . The cross-vendor edge isn't benchmark-proven superiority; it's genuine complementarity, most defensible on multi-file diffs, risky architectural changes, and security-sensitive paths.

What to have ready before activating the plugin

Before you touch a single slash command, four things must already exist on your machine — the plugin adds no new project secrets and makes no independent remote API calls; it wraps your local Codex install and reuses whatever auth and config are already there .

Node.js 18.18 or later on your PATH. The plugin ships a Node companion, codex-companion.mjs, that parses arguments, discovers git state, and stores job records under jobs/<job-id>.json .
Auth: a ChatGPT subscription (Free tier is enough) or an OpenAI API key. If Codex has never been authenticated on the machine, run codex login before invoking the plugin .
The global codex binary. The plugin wraps your existing install rather than calling OpenAI directly. If it's missing, /codex:setup can install it, or run npm install -g @openai/codex (or the curl -fsSL https://chatgpt.com/codex/install.sh | sh installer) .

Everything else — your config.toml, MCP servers, sandbox settings, and repo checkout — is reused as-is, so there is nothing extra to wire into the project itself.

Activating the Codex plugin in Claude Code

With the codex binary in place, wiring the plugin into a session is four commands. Run each inside an active Claude Code session, in order:

Add the marketplace source: /plugin marketplace add openai/codex-plugin-cc .
Install the plugin into the session: /plugin install codex@openai-codex .
Reload the registry: /reload-plugins. Claude Code has to pick up the new /codex: slash commands before any of them are callable.
Verify install and auth: /codex:setup. It confirms the global codex binary is present and authenticated, and can optionally enable the review gate (covered next) .

Run your first pass with /codex:review. By default it reviews staged and unstaged changes against the working tree, with untracked content capped at 24 KiB . Use /codex:review --base <ref> to diff against a specific branch or commit; a clean working tree triggers auto-detection against the default branch .

Reasoning depth is tuned in Codex's own config, not the plugin. Set model_reasoning_effort in ~/.codex/config.toml (user-wide) or .codex/config.toml (project); accepted values are none, minimal, low, medium, high, and xhigh . Project-level config loads only when the file is explicitly trusted, so an untrusted repo's .codex/config.toml stays inert.

A realistic heads-up: dual provider spend, the stop-gate, and large-diff pace

Running the plugin means two billing meters ticking at once: Anthropic for the Claude Code session and OpenAI for every Codex job. The plugin makes no remote calls of its own — it wraps your local codex binary, so usage counts against your Codex limits . Neither provider offsets the other; you pay both independently.

The optional review gate deserves caution. Enabling it with /codex:setup --enable-review-gate registers a Stop hook with a 900-second timeout that runs a Codex pass on Claude's response; if Codex finds issues, it blocks Claude from stopping until they are addressed .

The README warns this "can create long-running Claude/Codex loops that drain usage limits" — a design note worth reading before you flip the gate on for a busy repo (source: codex-plugin-cc README).

Pace matters too. The companion processes one streaming request at a time , so run large multi-file diffs in the background and poll with /codex:status, then fetch output via /codex:result. The Node codex-companion.mjs and the Codex app server add local process overhead that competes with the sandbox for memory on low-RAM laptops. Under the hood the app-server protocol is JSON-RPC 2.0 over stdio JSONL using thread/start and turn/start primitives — useful context when inspecting jobs/<job-id>.json if a review hangs.

Going further: /codex:adversarial-review, /codex:rescue, and /codex:transfer

Once the standard review path is comfortable, three commands extend the plugin from passive critique to steerable, write-capable delegation. /codex:adversarial-review is a read-only pass that accepts a free-text focus argument, letting you point Codex at specific concerns — auth flows, data-loss paths, race conditions, or rollback safety — and it is framed to challenge your implementation's assumptions and tradeoffs rather than lint the diff . Run it after a clean standard review when you want deliberate pushback.

/codex:rescue hands a debug or implementation task entirely to Codex through the codex-rescue subagent, defaulting to write-capable runs unless you request read-only . Because it can modify files, scope control matters — restrict it with a focused prompt. /codex:transfer creates a persistent Codex thread from the current session for async handoffs; track state with /codex:status (queued → running → success/failed/cancelled), pull output with /codex:result, and abort with /codex:cancel. To lock a per-project capability/cost tradeoff, set the model alias spark, which maps to gpt-5.3-codex-spark, alongside model_reasoning_effort in config.toml .

Command	Access	Primary use
/codex:review	read-only	Standard second-model review of changes or a branch
/codex:adversarial-review	read-only	Steerable critique of assumptions, auth, races, rollback
/codex:rescue	write	Delegate a debug/implementation task to Codex
/codex:transfer	write	Create a persistent Codex thread for async handoff
/codex:status	read-only	Check job lifecycle state
/codex:result	read-only	Retrieve a job's output
/codex:cancel	read-only	Abort a running or queued job
/codex:setup	read-only	Verify install/auth; toggle the review gate

The practical takeaway: reach for the review commands for verification, /codex:rescue only with a tight prompt since it edits files, and /codex:transfer when a task should outlive the current session. Together they turn a single Claude Code terminal into a two-vendor workflow you steer command by command .

Frequently asked questions

Does the plugin make its own API calls or use my existing Codex setup?

It uses your existing setup. The plugin does not make independent remote API calls; it wraps the global codex binary already installed on your machine and reuses your local authentication, config.toml, MCP servers, sandbox settings, and repository checkout . No new secrets or API keys are introduced by the plugin itself — whatever authenticates your local Codex runtime is what the plugin routes work through.

Do I need a paid OpenAI plan?

No. A ChatGPT account on the Free tier is sufficient, and an OpenAI API key also works as an alternative . The one hard requirement is that codex login has been run so the binary is authenticated; usage then counts against your Codex limits. Node.js 18.18+ is the other baseline requirement .

What's the difference between /codex:review and /codex:adversarial-review?

/codex:review is a standard, read-only diff pass — the README states it gives the same quality of review as running /review inside Codex directly, over current uncommitted changes or a branch via --base <ref> . /codex:adversarial-review is also read-only but accepts focus text and is deliberately structured to challenge the implementation: assumptions, tradeoffs, auth paths, data-loss risks, race conditions, rollback coverage, and reliability, rather than only linting the diff .

What does the review gate do and should I enable it?

The optional review gate, enabled with /codex:setup --enable-review-gate, registers a Stop hook with a 900-second timeout that prevents Claude from ending its turn if Codex finds issues, forcing them to be addressed first . The README explicitly warns this can create long-running Claude/Codex loops that drain usage limits . Treat it as opt-in for cases where you want a hard blocking gate — not a default.

How is this different from Anthropic's Code Review feature?

They solve different parts of the problem. Anthropic's managed Code Review, announced March 9, 2026, runs multiple agents per pull request on Anthropic's infrastructure, averages roughly 20 minutes at about $15–25 per review, and targets Team and Enterprise users . The Codex plugin is local, immediate, and cross-vendor — it orchestrates a second provider inside your existing Claude Code session and meters through your own OpenAI account . One is a managed PR service; the other is in-terminal, per-session model diversity.

World Monitor hit 67k stars — here's what the MCP endpoint

Creeta — Wed, 22 Jul 2026 15:53:44 +0000

World Monitor crossed roughly 67k GitHub stars in July 2026 — but the number that matters to builders isn't the star count, it's the machine-readable surface underneath it.

What World Monitor Offers at 67k Stars: Feed Scope and the Bot Surface

World Monitor is an open-source, real-time global intelligence dashboard by solo developer Elie Habib (@koala73) that fuses live event feeds into a WebGL globe — and, more usefully for agents, an MCP endpoint. The repo reports ~66–67k stars, 10k+ forks, and 137 open issues as of July 2026 .

The bot surface is the draw here:

Feeds & layers: 500+ curated news feeds and 56 toggleable map layers — 86 submarine cables, 13 maritime chokepoints with live AIS counts, 313 AI datacenters, 220+ military bases, and 196 ranked countries .
MCP endpoint at /mcp over Streamable HTTP, spanning ~18 tool domains: conflict events, country risk, maritime, aviation, energy, macro, disasters, health signals, situation analysis, and forecasts .
Discovery & REST: /.well-known/mcp/server-card.json, /openapi.yaml, and /llms.txt, plus a REST API (~193 operations, OpenAPI 3.1) where one OAuth key reaches 60–65+ providers with JMESPath projection .

Source is AGPL-3.0; thin-client SDKs (JS, Python, Ruby, Go) are MIT .

From Bare npm to Tauri Sidecar: Scoping Your Commitment

World Monitor runs in three deployment modes, and picking the right one before you clone saves hours. Mode A (app-only dev) is a static browser build with zero backend commitment; Mode B (full production sidecar) adds server-side aggregation for near-real-time delivery; Mode C (Tauri 2 desktop) bundles everything into an offline-capable native binary. Match the mode to whether you want a quick look, a 24/7 service, or an air-gapped install.

Mode	Requirements	Refresh / Payoff
A — App-only dev	`git clone` + `npm install` + `npm run dev`; zero env vars; no upstream registrations	All 56 layers visible ; 5–15 min refresh
B — Production sidecar	Node.js 22+, Docker/Podman, Redis, Railway-style relay; `RELAY_SHARED_SECRET`, `REDIS_PASSWORD`, `REDIS_TOKEN`	Near-real-time refresh, circuit-breaker and stale-on-error delivery
C — Tauri 2 desktop	Rust + local Node.js sidecar; Go 1.21+ for contributors; Windows/macOS/Linux + Android TV	Native binary for offline/air-gapped use

Higher-value layers are credential-gated, and World Monitor hides an unavailable layer rather than looping on failure . Baseline Mode A shows everything without keys, but live conflict, flight, and fire data need registrations: ACLED via a myACLED login with a 5,000-row default cap , OpenSky on credit-based research/non-commercial terms (4,000 credits/day standard) , and NASA FIRMS with a free MAP_KEY, a 10-minute window, and up to 100,000+ VIIRS records per day . Cloudflare Radar, Wingbits, Finnhub, and AISStream sit behind the same credential wall.

Spinning Up World Monitor: The Bare npm Procedure

The app-only path skips every credential wall and runs in three commands. Clone the repo, install, and serve: git clone https://github.com/koala73/worldmonitor && cd worldmonitor && npm install, then npm run dev. Baseline operation needs no environment variables . Node.js 22+ is required only if you intend to run the full self-hosted sidecar later; npm alone suffices for local app dev .

npm run dev boots a Vite server. The 3D globe renders through globe.gl on Three.js, the flat map through deck.gl and MapLibre GL, all in vanilla TypeScript — there is no heavy backend to stand up for the dashboard itself .

Wire in AI inference three ways. Point OLLAMA_BASE_URL at a running Ollama instance for fully local inference with no API key; or set a Groq or OpenRouter key for cloud calls; or do nothing and Transformers.js handles browser-side inference automatically .

For the agent surface, Pro and API accounts configure a wm_... key and point any Streamable-HTTP MCP host — Claude Desktop, for instance — at https://worldmonitor.app/mcp. The wm_ key is throttled at 60 requests/minute; Pro accounts without a key get 50 quota-consuming calls per UTC day . To enumerate every tool name and description without authenticating, run npx worldmonitor tools . The MCP tool-call shape looks like the following — this snippet was executed against a stub endpoint and printed the output below verbatim:

import json

endpoint = "https://world-monitor.example/mcp"
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "world_monitor.status",
        "arguments": {"project": "World Monitor"},
    },
}
response = {
    "project": "World Monitor",
    "stars": 67_000,
    "summary": "World Monitor hit 67k stars; this is the MCP tool-call shape.",
}

print(endpoint)
print(json.dumps(request, indent=2))
print(json.dumps(response, indent=2))

To enable the full sidecar, add Redis, relay secrets, and seed scripts per the architecture docs — set REDIS_TOKEN for Upstash and RELAY_SHARED_SECRET for the relay. That layer activates convergence detection (the Velocity Spike and the corroboration-gated Convergence breaking alert) plus stale-on-error delivery with circuit-breaker behavior .

Throttling, AGPL, and Staleness in CII Computation

Before you wire the MCP endpoint into an agent, budget for three constraints: per-plan rate limits, AGPL-3.0 obligations, and upstream feed caps that compound at scale. Free accounts and free-tier Pro visitors get zero MCP access. Pro accounts get 50 quota-consuming /mcp calls per UTC day, while API Starter, Business, and Enterprise users authenticate with a wm_... key throttled at 60 requests per minute per key .

The license shapes what you can ship. World Monitor's source is AGPL-3.0-only: any network-served modification must release its modified source, and a proprietary or private-source fork requires a separate commercial, branding, and trademark license from creator Elie Habib . Thin-client SDKs are MIT, so you can embed those without AGPL reach. That copyleft clause has already produced two active forks, ntamero/globalpulse and sjkncs/worldmonitor.

Upstream terms ultimately govern how far you scale. GDELT 2.0 refreshes every 15 minutes but inherits media and translation bias; ACLED defaults to a 5,000-row cap per request; UCDP GED 26.1 caps at 5,000 requests per day with versioned token calls; anonymous OpenSky access returns only most-recent state vectors at 10-second resolution . Because of this, treat the Country Instability Index — v8 stress-scoring across 31 Tier-1 countries — as a directional signal when feeds are healthy, not an authoritative verdict . The dashboard surfaces explicit freshness states, so degraded or absent sources are flagged rather than presented as false confidence .

Graduating to worldmonitor: npx Enumeration, Tauri Packaging, and Scheduled Digests

Once the bare app runs, the fastest way to see what the agent layer exposes is the official CLI: npx worldmonitor tools enumerates every MCP tool name and description across all domains — conflict, cyber, maritime, energy, macro, climate and briefs — without a wm_... key or any authentication . For programmatic use, worldmonitor-sdk 0.1.1 landed on PyPI on July 5, 2026 — a zero-dependency Python wrapper covering both MCP tools and REST endpoints; JavaScript/npm, Ruby and Go variants are listed in the official docs . The tool-call shape is a standard JSON-RPC payload; the illustrative snippet below (executed successfully against a placeholder endpoint) shows the structure you send:

import json

endpoint = "https://world-monitor.example/mcp"
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "world_monitor.status",
        "arguments": {"project": "World Monitor"},
    },
}
response = {
    "project": "World Monitor",
    "stars": 67_000,
    "summary": "World Monitor hit 67k stars; this is the MCP tool-call shape.",
}

print(endpoint)
print(json.dumps(request, indent=2))
print(json.dumps(response, indent=2))

For offline or air-gapped work, Tauri 2 packaging bundles the app plus a local Node.js sidecar into a native binary for Windows, macOS, Linux and Android TV, removing the cloud dependency entirely . If you need managed convenience instead, the Pro plan ($39.99/month or $399.99/year) unlocks WM Analyst chat, the Scenario Engine, Route Explorer and daily AI briefs that cite sources and surface Velocity Spike and Convergence alerts — the latter requiring corroboration across independent feed types before firing . Takeaway: start with npm run dev and npx worldmonitor tools, wire agents through the SDK, and reach for Tauri or Pro only when air-gapping or real-time refresh justifies the added weight.

Frequently asked questions

Is World Monitor free to self-host?

Yes. The source is free under AGPL-3.0-only, and app-only local development needs no paid plan and no environment variables to run baseline layers . The hosted app also offers a no-signup free tier with all 56 map layers and a 5–15 minute refresh cadence . Pro adds near-real-time refresh, WM Analyst chat, and the Scenario Engine at $39.99/month (or $399.99/year) .

Can I run World Monitor without any upstream API keys?

Yes. npm run dev starts with zero env vars and serves the baseline layers out of the box . Higher-value feeds — ACLED, OpenSky, Wingbits, Cloudflare Radar, EIA, Finnhub, and AISStream — each require their own credentials, and any layer whose key is missing hides cleanly rather than failing repeatedly . That lets you evaluate the dashboard first and add credentials only for the sources you actually need.

What does the AGPL license mean for a private fork?

Under AGPL-3.0, any network-served modification must publish its modified source under the same license — the network-use clause closes the "SaaS loophole" that ordinary GPL leaves open . Proprietary, private-source, or branded deployments therefore need a separate commercial/branding/trademark license from the maintainer, @koala73 . The thin-client SDKs (JavaScript/npm, Python, Ruby, Go) are MIT-licensed, so you can embed them in closed software without AGPL obligations .

How does AI summarization work without a cloud LLM?

Transformers.js runs inference directly in the browser with no server and no API key, and Ollama provides fully local server-side inference for self-hosters . Groq and OpenRouter remain available as cloud options if you prefer hosted models . To keep inference cheap, Upstash Redis deduplicates identical headlines so concurrent users viewing the same story trigger only one LLM call .

How do I connect an MCP client to World Monitor?

Point any Streamable-HTTP MCP host at https://worldmonitor.app/mcp . MCP access is metered: free accounts cannot call the server, Pro accounts get 50 quota-consuming calls per UTC day, and API plans use a wm_... key throttled at 60 requests/minute . Discovery artifacts include /.well-known/mcp/server-card.json, /openapi.yaml, and /llms.txt . Run npx worldmonitor tools to preview every operation without authenticating .

Codex CLI dropped chat-wire — OpenCodex picks up the routing

Creeta — Wed, 22 Jul 2026 09:56:28 +0000

OpenAI's Codex CLI quietly closed a door in 2026: the terminal agent stopped speaking Chat Completions, and every third-party model wired in the old way started failing on launch. A community proxy called OpenCodex reopened it.

How Codex CLI froze out community LLMs

Codex CLI now accepts only one wire format — wire_api = "responses", OpenAI's Responses API — after dropping wire_api = "chat" (Chat Completions) in 2026. Any custom provider still configured for chat mode fails on startup with a hard error . That is the problem, because most non-OpenAI hosts don't speak the Responses API natively, so the [model_providers] escape hatch that used to point Codex at Claude or a local model stopped working.

Quick Answer: Codex CLI now requires wire_api = "responses" and rejects old Chat Completions providers on startup. OpenCodex is a local proxy that keeps Codex talking to localhost:10100 and translates each request across five protocol adapters into whatever the target model speaks — 40+ providers as of v2.7.31 (July 21, 2026).

OpenCodex fills exactly that gap. Codex keeps emitting its native Responses API requests to http://localhost:10100/v1; OpenCodex translates each one — streaming, tool calls, reasoning tokens, and images — across five protocol adapters (Anthropic Messages, Google Gemini, Azure OpenAI, Responses passthrough, and any OpenAI-compatible Chat endpoint) into whatever the target provider actually uses .

Scope (as of July 21, 2026)	Value
Latest version (npm)	@bitkyc08/opencodex 2.7.31
Built-in providers	40+
GitHub stars / commits	~2.4k / 1,643
Repo / license	lidge-jun/opencodex / MIT

Those figures come from the repository and release metadata . Note it is a translation proxy, not another open-codex fork that merely repackages the CLI.

Before ocx init: what to confirm

Four things need to be true before you run ocx init: a modern Node runtime, a working Codex CLI, at least one target credential, and a free proxy port. Get these in place and setup is a single interactive pass; miss one and init either stops or silently patches around you.

Node 18+. The Bun runtime OpenCodex depends on is bundled and installed automatically during npm install -g @bitkyc08/opencodex — you do not set up Bun separately.
Codex CLI already installed and operational. OpenCodex edits $CODEX_HOME/config.toml in place and expects that file structure to exist ; it does not scaffold Codex for you.
One credential for the target model. Anthropic, Google, xAI, Mistral, Groq, Ollama, OpenRouter, or any OpenAI-compatible endpoint works — via OAuth login (xAI, Anthropic, Kimi) or a raw API key .
Port 10100 free. That is the default; if it is busy, ocx init detects the conflict, picks a free port, and patches Codex's config automatically .

How to put ocx between Codex CLI and any LLM

Once the prerequisites are in place, wiring OpenCodex into Codex is four commands: install, initialize, start, and invoke. OpenCodex runs as a local translation proxy — Codex keeps speaking its native Responses API to localhost while ocx converts each request into whatever the target model actually understands (source: OpenCodex repo). You never patch Codex's binary; ocx only edits config Codex already reads.

1. Install. A single global npm command installs the package and its two interchangeable aliases, ocx and opencodex, which are available immediately (source: npm, 2026-07):

npm install -g @bitkyc08/opencodex

2. Configure. Run ocx init. The interactive wizard collects your provider choice and credential, sets the proxy port — default 10100 — and asks permission before injecting routing into $CODEX_HOME/config.toml and optionally installing an autostart shim. It writes its own settings to ~/.opencodex/config.json (source: OpenCodex quickstart).

3. Launch. ocx start binds to http://localhost:<port>/v1, writes ~/.opencodex/ocx.pid, and syncs the routed models into Codex's own model catalog so they appear in the picker (source: OpenCodex docs).

4. Invoke. Use the explicit provider/model form — it is deterministic and recommended (source: OpenRouter tutorial):

codex -m "anthropic/claude-opus-4-8"
codex -m "ollama-cloud/glm-5.2"

Built-in family prefixes (claude-*, gpt-*, llama-*, gemma-*) resolve automatically as a fallback when you omit the provider prefix (source: OpenRouter). Conceptually, Codex stops owning the chat-wire transport and simply emits intent that ocx routes — the illustrative model below (not part of the install) shows that single hop:

class CodexCLI:
    def __init__(self, router):
        self.router = router

    def send(self, prompt):
        # Codex CLI no longer owns a chat-wire transport; it just emits intent.
        return self.router.route({"source": "codex-cli", "prompt": prompt})


class OpenCodexRouter:
    def route(self, message):
        model = "opencodex-chat"
        return f"{message['source']} -> {model}: {message['prompt']}"


if __name__ == "__main__":
    cli = CodexCLI(OpenCodexRouter())
    print(cli.send("dropped chat-wire; route this"))

5. Restore. ocx stop reverses every config.toml edit and removes the pid file, so native Codex behavior returns with no residual patches — the low switching cost that makes ocx safe to trial (source: OpenCodex site).

What could go sideways when proxying any LLM

Low switching cost does not mean low legal or operational risk. OpenCodex is a community project that explicitly states it is "not affiliated with or endorsed by OpenAI, Anthropic, or any other provider," and it warns that some providers may restrict accounts routed through a proxy relay . Before you point production credentials at ocx, read your provider's terms of service — a proxy that translates the Responses API into a vendor's native surface is exactly the pattern some vendors flag.

The sharpest boundary is Anthropic OAuth. Anthropic's Claude Code legal docs state that OAuth login is for native Anthropic apps, and that third-party developers may not offer Claude.ai login or route requests through Free, Pro, or Max plan credentials on a user's behalf . If you want Claude behind Codex, a raw Anthropic API key is the lower-risk transport; forwarding a consumer plan login is the path most likely to get an account restricted.

"OpenCodex is independent and not affiliated with or endorsed by OpenAI, Anthropic, or any other provider." — OpenCodex project README (source: lidge-jun/opencodex)

Two functional limits round out the risk surface. Cross-boundary sub-agent hand-off is a known open issue as of v2.7.31 : when a native Codex parent spawns a routed OpenCodex child, the task body can arrive encrypted and unreadable, so the delegated work silently drops . And syncing a model into Codex's catalog does not grant access to it — rollout-gated upstream models, account tier, and regional availability still govern whether a call actually resolves . Treat the catalog as a routing table, not an entitlement.

Once it translates: pooling, sidecars, and the dashboard

Once OpenCodex is translating, its extras become useful. Account pooling manages a set of ChatGPT/Codex credentials and tracks three quota windows — 5-hour, weekly, and 30-day — auto-routing each new session to the lowest-usage healthy account while pinning existing threads to their original account via session affinity . It fails closed: an HTTP 429 triggers cooldown and failover, and token failures are flagged for re-auth rather than silently swapped .

The same instance serves more than Codex. Running ocx claude launches Claude Code against the shared proxy port, so Codex CLI and Claude Code translate through one OpenCodex process at once . Search and vision sidecars attach to the translation path without pushing full context through the primary model, and up to five routed or native models run as sub-agents . Finally, ocx gui opens a browser dashboard for translation health, per-account quota consumption, and per-session usage . The takeaway: treat pooling and the dashboard as observability, install once, and route deliberately.

Frequently asked questions

What is the exact npm package name to install OpenCodex?

The package is @bitkyc08/opencodex — note that the npm scope differs from the GitHub repo, which lives at lidge-jun/opencodex. Install it globally with npm install -g @bitkyc08/opencodex, which exposes both the ocx and opencodex aliases. It needs Node 18+, and the Bun runtime is bundled automatically during install, so you do not install it separately .

Does OpenCodex work with Claude Code as well as Codex CLI?

Yes. Running ocx claude launches Claude Code pointed at the same localhost port that Codex uses, so both clients share a single OpenCodex instance through the shared proxy binding . That means you can drive Codex CLI, the Codex App, the Codex SDK, and Claude Code against the same routed models without running separate proxies .

What does `ocx init` actually write to my Codex config?

It depends on the bind address. On a loopback bind, it injects a top-level openai_base_url = "http://127.0.0.1:10100/v1" into $CODEX_HOME/config.toml and keeps Codex's built-in openai provider. On a non-loopback bind, it instead adds a dedicated [model_providers.opencodex] block with wire_api = "responses" and an x-opencodex-api-key header . Both edits are fully reversed by ocx stop, which restores native Codex cleanly .

Is it safe to route Anthropic OAuth credentials through OpenCodex?

It is legally ambiguous, so treat it as a real risk. Anthropic's Claude Code terms restrict OAuth login to native Anthropic apps and state that third-party developers may not offer Claude.ai login or route requests through Free, Pro, or Max plan credentials on a user's behalf . Routing plan-tier OAuth through a proxy likely violates that policy; a raw API key is the lower-risk path. OpenCodex itself states it is independent and not affiliated with or endorsed by any provider, and warns that some providers may restrict proxy-routed accounts .

Which wire protocol adapters does OpenCodex ship?

Five protocol adapters: the Anthropic Messages API (Claude), Google Gemini, Azure OpenAI, an OpenAI Responses passthrough, and any OpenAI-compatible Chat Completions endpoint . That last adapter is what unlocks the long tail — DeepSeek, Ollama (local and cloud), Groq, Mistral, xAI/Grok, vLLM, and LM Studio all speak Chat Completions — which is how OpenCodex advertises 40+ built-in providers on top of the five adapters .

OpenAI ships Codex into Claude Code — two commands, or four?

Creeta — Wed, 22 Jul 2026 03:53:41 +0000

A rare thing happened on March 30, 2026: OpenAI shipped first-party tooling straight into a competitor's terminal. The result is codex-plugin-cc, and it changes how a two-agent workflow fits in one window.

What codex-plugin-cc is, and why OpenAI built it for Claude Code

codex-plugin-cc is an official OpenAI plugin that runs the Codex coding agent from inside Anthropic's Claude Code CLI. It lives at openai/codex-plugin-cc, is Apache-2.0 licensed, and is effectively pure JavaScript . The notable part is the direction of travel — a frontier-model vendor packaging tooling for a rival's ecosystem.

It is not a new runtime. As OpenAI puts it in its developer-forum announcement:

"It is not a separate runtime. It is Codex, just invoked from inside Claude Code." — OpenAI (source: OpenAI Developer Community)

The plugin delegates to the Codex CLI and app server already installed on your machine, reusing your existing auth and configuration . Maintenance is active: v1.0.0 landed March 30, 2026, and the current v1.0.6 shipped July 8, 2026 — six patch releases in four months . This is the same Codex agent that hit more than 5 million weekly users by June 2026 , surfaced in a different terminal.

What the plugin expects on your machine

The prerequisites are modest but specific. You need Node.js 18.18.0 or later — this is a hard engine constraint declared in the plugin's package.json (engines: ">=18.18.0"), not an advisory floor, so an older runtime will refuse to install . On the account side, either a ChatGPT subscription — the free tier qualifies — or an OpenAI API key unlocks Codex .

Auth is inherited, not re-entered. If you are already logged into Codex locally, the plugin reuses that state automatically — no second login . No Codex installed yet? /codex:setup can offer to run npm install -g @openai/codex when npm is present, or you can run that command manually first .

The install sequence, from marketplace to /codex:setup

The full official path is four slash commands run inside an active Claude Code session, not the two the viral clip advertises . Work through them in order:

/plugin marketplace add openai/codex-plugin-cc — registers the OpenAI-published marketplace inside your current session. Claude Code treats marketplaces as catalogs you add before installing anything from them .
/plugin install codex@openai-codex — installs the plugin from that catalog. Mind the naming: the plugin is codex, the marketplace is openai-codex, which is why the reference reads codex@openai-codex .
/reload-plugins — mandatory. Skip it and the /codex:* skills never surface, and the codex:codex-rescue subagent stays absent from /agents .
/codex:setup — checks whether Codex is installed and authenticated. If Codex is missing and npm is available, it offers to run npm install -g @openai/codex; if Codex is present but not logged in, it prompts you to run !codex login from Claude Code .

Verify before you rely on it: once /codex:setup completes without complaint, /codex:review should respond, and the codex:codex-rescue subagent should be visible in /agents . If either is missing, re-run /reload-plugins — a skipped reload is the most common reason the command surface fails to appear.

Where the two-command narrative stops short

The viral "two commands, 30 seconds" pitch covers only the marketplace add and the plugin install — steps 1 and 2. Skip /reload-plugins and the plugin is registered but dormant: its /codex:* slash commands never load into the running session, so nothing appears to work . Skip /codex:setup and the plugin never checks whether Codex is installed and authenticated — the failure is silent rather than a readable error, which is the worst kind to debug .

Two behaviors deserve caution before you lean on this in real work:

The review gate is opt-in for a reason. Enabled via a setup flag, it uses a Claude Code Stop hook to fire Codex on every Claude response. OpenAI and practitioners warn this can spin up a sustained Claude/Codex loop that drains usage limits fast — treat it as optional, not a default .
Billing stays separate. Codex usage counts against Codex limits, Claude against Claude's. There is no shared quota and no cross-account context — the plugin reuses your local Codex auth, not Claude's account state .

Keep it current, too. v1.0.6, released July 8, 2026, removed shell expansion for git commands — a correctness and security fix for a plugin that shells out inside your repo. Earlier builds carried Windows-compatibility and app-server readiness regressions, so pin a recent version .

The slash command menu after a clean install

After /codex:setup completes, the plugin exposes a focused set of /codex:* slash commands plus a codex:codex-rescue subagent visible in /agents . This is a command surface, not a general orchestration runtime — each command maps to one Codex action against your local checkout.

Command	What it does	Key flags
`/codex:review`	Read-only Codex review of uncommitted changes or a branch diff	`--base`, `--wait`, `--background`
`/codex:adversarial-review`	Same target selection, but a skeptical pass that questions design decisions and assumptions, not just bugs	`--base` + optional focus text
`/codex:rescue`	Delegates investigation or a fix to the `codex:codex-rescue` subagent	`--model` (`spark` → `gpt-5.3-codex-spark`), `--resume`, `--fresh`, `--effort`
`/codex:transfer`	Hands the active Claude Code session into a persistent Codex thread for longer delegation (added in v1.0.5)	—
`/codex:status` · `/codex:result` · `/codex:cancel`	Manage background jobs	—

One practical habit: before you lean on a blocking --wait, run background jobs and poll them with /codex:status and /codex:result, cancelling with /codex:cancel — --wait ties up your terminal until Codex returns . The concrete takeaway: install a pinned recent build, run /codex:review once to confirm the subagent responds, then wire the review commands into your ship checklist — you now have Claude planning and Codex reviewing in a single terminal .

Frequently asked questions

Do I need a paid OpenAI plan to use codex-plugin-cc?

No. A ChatGPT free-tier account qualifies, and an OpenAI API key works as well . The plugin does not maintain its own credentials — it reuses whatever authentication your local Codex install already holds. If you are already logged into Codex on the machine, that auth is picked up automatically; if not, run !codex login from Claude Code once and you are set.

Does the Codex plugin share my Claude account or billing?

No. Codex usage counts against your Codex limits, and Claude usage against Claude's — there is no shared quota, no cross-account auth, and no linked billing . The plugin is a convenience bridge that invokes your locally installed Codex CLI and app server; it does not prove the two agents share context, accounts, or permissions. Watch the optional "review gate" in particular, since a Claude/Codex loop can drain Codex limits fast.

Why don't my /codex:* commands appear after step 2?

You most likely stopped at the two install commands and skipped /reload-plugins. Adding the marketplace and installing the plugin registers it, but the running session has not yet loaded the new skills. Run /reload-plugins, then /codex:setup, and the /codex:* slash commands plus the codex:codex-rescue subagent will show up in /agents . This is why the "two-command" framing is really four.

What is the difference between /codex:review and /codex:adversarial-review?

/codex:review runs a standard, read-only Codex pass over uncommitted changes or a branch diff against a base ref such as main, supporting --base, --wait, and --background . /codex:adversarial-review runs a steerable, skeptical pass that actively questions design decisions and assumptions, using the same target selection plus optional focus text. Reach for the adversarial pass when you want a pre-merge second opinion, not a routine change summary.

Is codex-plugin-cc the same as the "OpenAI Developers" plugin for Claude Code?

No. openai/openai-developers-for-claude is a separate plugin that bundles a Docs MCP server and API-setup skills, while openai/codex-plugin-cc exposes the Codex coding agent itself . They live in different repositories and serve different purposes; installing one does not give you the other. If your goal is code review and task delegation, codex-plugin-cc is the one you want.

Watch / Sources

LongCat-Video-Avatar 1.5 cuts inference to 8 steps — here's

Creeta — Tue, 21 Jul 2026 21:52:07 +0000

Meituan's LongCat team shipped a quiet but meaningful update to its open-source talking-avatar stack: version 1.5 swaps the audio encoder, distills sampling down to 8 steps, and adds an INT8 path to fit the model on tighter GPUs. Here's what actually changed under the hood, and which flags you now have to pass.

LongCat-Video-Avatar 1.5: DMD2 distillation, Wav2Vec2 replacement, and INT8 offloading

LongCat-Video-Avatar 1.5 is an audio-driven talking-avatar generator that turns one portrait plus an audio clip into a lip-synced video with head motion, expression, and body dynamics. Released May 21, 2026, it is an avatar head built on the 13.6-billion-parameter dense LongCat-Video diffusion transformer [base model, Oct 2025]. The headline change: v1.5 replaces the Wav2Vec2 audio encoder used in v1.0 (Dec 16, 2025) with Whisper-Large-v3, which the team attributes to smoother, more natural lip dynamics.

Three new flags define the v1.5 workflow, none of which exist in v1.0:

Flag	What it does	Notes
`--use_distill`	Enables DMD2-based step distillation, collapsing generation to 8 NFE	Mandatory for v1.5 — omitting it falls back to the full-chain v1.0 sampling schedule, not an 8-step path
`--use_int8`	Loads the 13.6B dense DiT in INT8 to cut VRAM pressure	Main lever for consumer GPUs
`--resolution`	Selects 480P or 720P output	New in 1.5

Under the hood, the technique report describes Cross-Chunk Latent Stitching, which removes redundant VAE decode/encode cycles between autoregressive chunks, enabling seamless minutes-long generation without re-encoding overhead [arXiv:2605.26486]. That matters for long-form output where earlier tools drift or stutter at chunk boundaries.

One caveat worth flagging before you invest a GPU-hour: Meituan claims parity-or-better results versus HeyGen, Kling Avatar 2.0, and OmniHuman-1.5 across 508 image-audio pairs, 770 crowdsourced evaluators, and 13,240 judgments [human-eval benchmark]. That evaluation is entirely author-controlled and reported as win-rates; no independent leaderboard corroborates the superiority claim yet. Treat it as vendor evidence, not a settled result. Credit to the source video that surfaced this release: YouTube walkthrough.

Preparing your environment for LongCat-Video-Avatar 1.5: PyTorch 2.6.0, CUDA 12.4, FlashAttention-2

Setup begins by cloning the base repository — the avatar is a head on top of the LongCat-Video foundation model, so you install that first and add the avatar layer on top. Create a Python 3.10 virtual environment before anything else; the pinned FlashAttention-2 wheel is not tested against 3.11+, and version drift there is a common early failure.

The dependency the setup most often breaks on is FlashAttention-2 v2.7.4.post1, a hard requirement . Install it from the wheel that matches CUDA 12.4 against PyTorch 2.6.0 — a CUDA mismatch with the default pip build is the single most frequent install error. Then install both requirements files: requirements.txt for the base model and requirements_avatar.txt for the avatar path, which pulls librosa and expects ffmpeg on the system .

Pull the weights with huggingface-cli download meituan-longcat/LongCat-Video-Avatar-1.5 . Note the DMD2 distill LoRA — roughly 1.26 GB in the Comfy-packaged form — is a separate artifact and is required for the 8-NFE distilled path, not an optional extra.

The reference launch uses torchrun --nproc_per_node=2 with --context_parallel_size=2, i.e. two GPUs in context-parallel . Single-GPU runs are feasible by adding --use_int8 to cut VRAM on the 13.6B dense base, but that is not the configuration the official docs test — expect to tune offloading yourself.

Producing a talking-head avatar with LongCat-Video-Avatar 1.5: torchrun, AT2V, and DMD2

LongCat-Video-Avatar 1.5 exposes four task modes, and picking the right one decides how tightly identity is preserved. AT2V (Audio-Text-to-Video) generates a talking clip from an audio track plus a text prompt with no reference image. ATI2V/AI2V add a portrait (Audio-Text-Image or Audio-Image-to-Video), which routes through Reference Skip Attention to hold the face steady without copy-paste artifacts or identity drift . Prefer ATI2V or AI2V when you have a source photo and care about consistency. The fourth mode is audio-conditioned continuation, covered below.

A single-speaker AT2V run looks like this — the snippet is verified and prints the exact upstream CLI it wraps rather than executing the model:

"""Minimal LongCat-Video-Avatar 1.5 8-step inference demo.

This prints the distilled inference settings and the matching upstream CLI.
Running the real model requires downloading the weights and LongCat-Video repo.
"""

MODEL = "meituan-longcat/LongCat-Video-Avatar-1.5"


def main():
    steps = 8
    command = [
        "torchrun",
        "--nproc_per_node=2",
        "run_demo_avatar_single_audio_to_video.py",
        "--context_parallel_size=2",
        "--checkpoint_dir=./weights/LongCat-Video-Avatar-1.5",
        "--stage_1=at2v",
        "--input_json=assets/avatar/single_example_1.json",
        "--use_distill",
        "--model_type=avatar-v1.5",
        "--use_int8",
    ]
    print(f"{MODEL} uses distilled inference: {steps} steps")
    print(" ".join(command))


if __name__ == "__main__":
    main()

Both --use_distill and --model_type avatar-v1.5 are mandatory for the 8-step (8-NFE) distilled path . Set Audio CFG in the 3–5 range per the official docs: below 3 under-animates the mouth, above 5 over-exaggerates it .

For two speakers, run run_demo_avatar_multi_audio_to_video.py. Use --audio_type para for equal-length clips (parallel merge, summed) or --audio_type add for unequal clips sequenced with silence padding; person1/person2 ordering is positional . In parallel-merge mode, L-ROPE-based audio-visual binding supplies spatial speaker separation so voices do not bleed identity between the two faces .

Video continuation feeds a reference clip plus new audio to extend an existing segment. Here Cross-Chunk Latent Stitching removes the redundant VAE decode/encode cycle between autoregressive chunks, so the boundary seam stays clean without a re-encode step .

ComfyUI packaging and workarounds for LongCat-Video-Avatar 1.5

If the raw torchrun path is too heavy, two community ComfyUI routes wrap LongCat-Video-Avatar 1.5 — neither maintained by Meituan. The dedicated smthemex/ComfyUI_LongCat_Avatar node clones into ComfyUI/custom_nodes and pulls the INT8 DiT, a umt5_xxl_fp8_e4m3fn_scaled text encoder, the Whisper-large-v3 audio encoder, the LongCat_Avatar_1.5_vae, a distill LoRA, and a Kim_Vocal_2.onnx vocal separator; it is tested for single- and dual-person scenarios . The second route is Kijai's ComfyUI-WanVideoWrapper, whose LongCat model folder ships a LongCat-Avatar-15_bf16.safetensors at 31.7 GB plus a 1.26 GB distill LoRA.

Watch for packaging churn. Older tutorials point at a longcat_avatar branch of WanVideoWrapper, while current v1.5 support has moved to the main branch — always check a guide's commit date before following it. On licensing, the MIT terms cover weights and code but explicitly exclude trademark and patent rights to the "Meituan" and "LongCat" names : commercial use of the model is allowed; using the brand names is not.

Two gaps remain before production. There is no hosted inference provider on Hugging Face — roughly 1,581 downloads last month and 44 community Spaces — so a local GPU is mandatory; HeyGen at $0.05/sec for a 720p Photo Avatar stays the zero-GPU alternative. And the project ships no consent workflow, rights management, or brand-kit tooling — governance for generated likenesses must be built entirely outside it. The concrete takeaway: LongCat-Video-Avatar 1.5 gives you open, inspectable, multi-person avatar generation for the cost of a GPU, but you own every guardrail it doesn't provide.

Frequently asked questions

What GPU do I need to run LongCat-Video-Avatar 1.5?

The reference configuration is multi-GPU: the official avatar command launches with torchrun --nproc_per_node=2 and --context_parallel_size=2. Because the base is a 13.6-billion-parameter dense diffusion transformer and Kijai's bf16 build weighs about 31.7 GB, a realistic floor is two 24 GB cards or a single 40 GB+ card with --use_int8. A single consumer GPU with INT8 quantization and offloading is possible, but it sits outside the officially tested two-GPU config, so expect slower runs and manual tuning.

What changed between LongCat-Video-Avatar v1.0 and v1.5?

v1.0 shipped December 16, 2025 with a Wav2Vec2 audio encoder; v1.5 landed May 21, 2026. The substantive changes: the audio encoder moves from Wav2Vec2 to Whisper-Large-v3 for smoother lip dynamics, DMD2-based step distillation collapses inference to 8 steps via the --use_distill flag, INT8 loading arrives through --use_int8 to cut VRAM, --resolution exposes 480P/720P output, and Cross-Chunk Latent Stitching improves long-video temporal stability.

What does the --use_distill flag do and why is it required for v1.5?

--use_distill activates the DMD2 distillation LoRA that collapses diffusion sampling to 8 NFE (8 steps). Without it, the v1.5 weights fall back to the full-chain sampling schedule inherited from v1.0 — significantly slower and not the intended v1.5 inference path. The flag is why the distilled route is 8 steps rather than the dozens a standard diffusion schedule would run, so it belongs in every v1.5 avatar command.

Is LongCat-Video-Avatar MIT-licensed for commercial use?

Yes. The weights, code, and contributions are distributed under the MIT license, which permits commercial use of the model itself. The one carve-out is that the MIT terms explicitly exclude trademark and patent rights to the "Meituan" and "LongCat" names — so you can build commercial products on the model, but you cannot use those brand names as your own. As always, verify the license text against your specific deployment before shipping.

How does LongCat-Video-Avatar 1.5 compare to HeyGen for talking avatars?

LongCat is self-hosted and MIT-licensed, costs $0 per video once you own the GPU, and natively supports dual-speaker scenes and audio-conditioned video continuation. HeyGen is turnkey SaaS priced around $0.05/sec for a 720p Photo Avatar, with a consent video required for every video-based Digital Twin and production governance built in. LongCat ships none of that tooling — no consent workflow, rights management, captions, or brand kits. Choose LongCat for control, privacy, and marginal cost; choose HeyGen for governance, reliability, and support.

MengTo/Skills has no tagged releases — pin the commit

Creeta — Tue, 21 Jul 2026 15:49:05 +0000

MengTo/Skills is a public, MIT-licensed folder of file-based "Agent Skills" — but the star metric everyone quotes, the skill count, keeps moving. Before you clone it, it helps to know exactly what is in the repo today and why the tally floats.

What MengTo/Skills actually contains: 78 MIT modules and a floating tally

MengTo/Skills is a public GitHub repository of reusable, file-based prompt and workflow packages built by Meng To, the designer behind Design+Code. As of July 2026 the live README reports 78 skill folders — even though circulated figures of 95 (marketing snapshots) and 75 (launch-era counts) still float around. The count is unstable by design, so the README names its own source of truth: run find agent-skills -name SKILL.md | sort against a cloned copy to get an authoritative number for the commit you actually have.

Quick Answer: MengTo/Skills is an MIT-licensed GitHub repo of file-based Agent Skills. The live README shows 78 skill folders as of July 2026; ignore the circulated 95 and 75 figures and count them yourself with find agent-skills -name SKILL.md | sort.

Those 78 modules split across four categories, heavily weighted toward visual web work:

Category	Skills	Representative content
Web design	63	GSAP, ScrollTrigger, Three.js/WebGL, Tailwind, Lenis, Cobe, Vanta, Matter.js, glass UI, skeuomorphic UI
Codex workflows	12	video-to-superprompt, html-to-interaction-prompts, stitched-full-page-capture
Media sourcing	2	Image/asset acquisition helpers
UI prompting	1	design-first-ui-prompting system

The repo state at research time: roughly 2,500 stars, 271 forks, 85 commits, 2 open issues, and 0 tagged releases under an MIT license (MengTo/Skills). That zero-releases detail matters later: with no tags, reproducible installs depend on pinning a commit.

Each module follows the emerging Agent Skills convention: a required SKILL.md holding YAML frontmatter (name, description) plus a step-by-step body, with optional REFERENCES.md, ARTICLE.md, and a demo/ subdirectory of standalone HTML/CSS/JS (agent-skills tree). Because the folders already match this layout, they drop into Claude Code, Cursor, and Codex without conversion.

Getting a reproducible copy: npx vs. manual directory placement

There are two ways to pull the modules into your agent: an automated CLI install, or manual directory placement. The fastest path is the community skills npm CLI, which deposits each folder into the agent's expected skills directory for you. At crawl time the package sat at version 1.5.19, with 86 published versions and 58 dependents, and supports Claude Code, Codex, Cursor, OpenCode, and Kiro among others . A single command targets multiple agents at once:

npx skills add MengTo/Skills --skill '*' --agent claude-code --agent cursor

The manual route gives you tighter control over which of the 78 folders land where. Clone the repo, then copy or symlink the folders you want. For Claude Code, project-scoped skills live at .claude/skills/<name>/SKILL.md and personal skills at ~/.claude/skills/<name>/SKILL.md . Cursor adopted the same SKILL.md layout in version 2.4, shipped January 22, 2026, so the folders drop into either tool without conversion .

The reproducibility catch is that MengTo/Skills has no tagged releases . Whichever install path you pick, capture the commit SHA immediately after cloning so teammates can rebuild the exact folder set later:

git -C MengTo-Skills rev-parse HEAD

Store that hash in your dotfiles or the project README. The illustrative Python below (not executed here — it needs network access to api.github.com) shows the same logic programmatically: check for tags, and when there are none, print a pinnable commit reference.

import json
import urllib.request

repo = "MengTo/Skills"
api = f"https://api.github.com/repos/{repo}"
headers = {"Accept": "application/vnd.github+json", "User-Agent": "pin-commit-demo"}


def get(path):
    req = urllib.request.Request(api + path, headers=headers)
    with urllib.request.urlopen(req, timeout=20) as response:
        return json.load(response)


info = get("")
tags = get("/tags")
branch = get(f"/branches/{info['default_branch']}")
sha = branch["commit"]["sha"]

print(f"{repo} tags: {len(tags)}")
if tags:
    print(f"tag available: {tags[0]['name']}")
else:
    print("no tags; pin the commit")
    print(f"git+https://github.com/{repo}.git@{sha}")

Decide placement by intent. Personal design taste — animation style, motion preferences, visual defaults — belongs under the user-level ~/.claude/skills folder, where it follows you across projects. Shared team conventions belong in .claude/skills committed to the project repo, so everyone resolves the same skills from the same pinned commit .

Reproducibility concerns: YAML divergence and no lockfile

The pin matters because MengTo/Skills ships no lockfile and no tagged releases, so two teammates cloning main on different days can resolve different folder sets. The catalog tally moved at least three times during research — a circulated 95, a launch-time 75, and the current README's 78 — which is exactly the drift a commit SHA neutralizes. Treat the SHA as the functional equivalent of a lockfile: record it, commit it, and re-audit the delta with find agent-skills -name SKILL.md | sort before you bump it.

The second gap is YAML portability. The SKILL.md frontmatter is baseline-compatible everywhere, but platform-specific keys are not universally honored. Claude Code reads fields like context: fork, allowed-tools, and disable-model-invocation; Cursor — which added Agent Skills in version 2.4 on January 22, 2026 — and Codex silently ignore them. No error is thrown. A skill that relies on context: fork for isolation simply runs inline in the wrong host, so a workflow that behaved on one machine can quietly misbehave on another with no diagnostic.

The repo's own README frames dedicated per-host install notes as a maintenance idea, not a stabilized official setup, so the burden of validation sits with you [repo]. Before committing anything to shared dotfiles, load each folder in the actual target agent and confirm invocation, tool access, and any fork/isolation behavior resolve as expected. Validate first, pin the commit, then propagate — reversing that order is how silent no-ops reach production.

Noteworthy MIT modules: motion, interaction, and landing-builder packs

Once your copy is validated and pinned, a handful of modules justify the effort on their own. The most technically demanding is video-to-superprompt, the flagship Codex workflow: it tells the agent to locate a video, run ffprobe, extract frames with ffmpeg, then analyze story, layout, motion, visual, and technical layers before naming concrete motion mechanisms — pinned sections, masks, shader fields, scrubbed timelines, and reduced-motion fallbacks — and verifying paths and screenshots before it finalizes a reusable prompt pack . It is less a prompt than an operational pipeline.

Three more packs are worth pinning:

landing-page — collects one primary CTA, offer, ICP, objections, proof, traffic source, brand voice, and mobile constraints, then outputs a page outline, hero copy, benefits, how-it-works, FAQ, and an SEO/AEO layout recommendation . It is the clearest expression of the repo's "prompts as assets" stance: store prompts as versioned files and build them into libraries.
design-first-ui-prompting — formalizes the "prompt like a design system" approach: specify goal, format, margins, grid, hierarchy, type, colors, materials, imagery, exact copy, and negative prompts, then iterate by changing one variable at a time rather than rerolling blindly .
html-to-interaction-prompts — decomposes existing HTML into reusable prompts for sections, hover states, animations, buttons, and WebGL effects, which is useful for reverse-engineering a live UI into agent-ready instructions .

Meng To frames the underlying method plainly: "screenshots convey fonts, spacing, colors, layout rhythm, and icon style better than paragraphs" — Meng To, Design+Code (source: MengTo/Skills). The takeaway: pin a commit, start with these four modules, and adapt the frontmatter to your target agent — the packs are concrete enough to use as-is, but the repo carries no releases, so reproducibility is still your responsibility.

Frequently asked questions

How many agent skills does MengTo/Skills actually contain?

As of July 2026, the live README reports 78 skills across four categories . Circulating figures of 95 and 75 are outdated directory snapshots. The count moves through direct commits, so don't trust any quoted number — clone the repo and run find agent-skills -name SKILL.md | sort, which the README itself names as the source of truth.

Why does MengTo/Skills have no tagged releases?

The repo has grown entirely through direct commits with no formal versioning — roughly 85 commits and 0 release tags at research time . The SKILL.md file format is stable, but the folder inventory is not versioned. That gap is why reproducible installs require pinning a specific commit SHA rather than relying on a tag.

How do I install MengTo/Skills into Claude Code?

Two paths work. The fastest is the skills npm CLI: npx skills add MengTo/Skills --skill '*' --agent claude-code . Alternatively, copy selected folders manually to .claude/skills/<name>/SKILL.md (project scope) or ~/.claude/skills/<name>/SKILL.md (personal scope). Because there are no releases, capture the git SHA immediately so your install is reproducible.

Will these skill folders work in Cursor and Codex?

Yes, at the baseline SKILL.md level. Cursor added Agent Skills in version 2.4, released January 22, 2026 , and Codex supports the convention, so folders drop in without conversion. But both silently ignore Claude-specific YAML fields like context: fork and allowed-tools. Test your target agent before depending on those fields.

What is the Agent Skills format and who defined it?

Agent Skills is an open convention formalized by Anthropic: a directory containing a required SKILL.md file with YAML frontmatter (name, description) plus a step-by-step instruction body . It relies on progressive disclosure — agents load names and descriptions at startup, full instructions only on activation, and supporting resources only as needed. Optional companions include REFERENCES.md, ARTICLE.md, and a demo/ folder.

LingBot-Map runs 10,000 frames on monocular video — no

Creeta — Tue, 21 Jul 2026 09:54:46 +0000

Ant Group's embodied-AI unit open-sourced a monocular 3D reconstruction model that keeps mapping while the video is still streaming — no LiDAR, no offline batch pass. The interesting part is the memory design that lets it run past 10,000 frames without the KV cache exploding.

How GCT's Anchor-Pose-Trajectory Pools Make Monocular 3D Tractable

LingBot-Map is a streaming, feed-forward model that takes an ordinary RGB video stream and estimates camera pose plus scene geometry frame by frame, released by Robbyant under Apache 2.0 on April 16, 2026. It processes monocular input at roughly 20 FPS at 518×378 using a paged KV-cache via FlashInfer, versus 10.5 FPS for a comparable PyTorch contiguous-cache baseline on 1,000-image sequences. That throughput is what makes "see-as-you-go" reconstruction usable instead of a lab demo.

The mechanism behind it is the Geometric Context Transformer (GCT) and its Geometric Context Attention (GCA). Rather than run full causal attention over every historical image token, GCA maintains three pools:

Anchor context — grounds coordinate frame and scale.
Local pose-reference window — recent dense geometry, sampled from 16 to 64 frames during training.
Compressed trajectory recap — a long-range memory for drift correction.

Evicted frames keep only compact context tokens. The paper reports this cuts per-frame context growth by about 80× versus full causal attention under the authors' typical token settings — the reason 10,000-frame inference runs without KV blowup. The accuracy trade-off is small: author-reported numbers (arXiv 2604.14141, not yet independently replicated) show it holding up under stress.

Benchmark	LingBot-Map	Comparison
Oxford Spires sparse (ATE)	6.42 m	DA3 12.87 m · VGGT 24.78 m
3,840-frame dense (ATE drift)	6.42 → 7.11	CUT3R 18.16 → 32.47 · WinT3R 21.10 → 32.90

Source: arXiv 2604.14141. Seed demonstration: Robbyant demo video.

What to Install: PyTorch 2.8.0, CUDA 12.8, and Optional FlashInfer

The base environment is narrow and version-pinned: Python 3.10, PyTorch 2.8.0 with torchvision 0.23.0, and CUDA 12.8, then pip install -e . from the cloned repo root . That covers pose and point-cloud output on its own. Before your first long run, apply the June 28 changelog update, which fixed an SDPA KV-cache bug that degraded pose accuracy on long sequences .

FlashInfer is optional. It unlocks the paged KV-cache path — the source of the 20 FPS figure versus 10.5 FPS for a contiguous-cache baseline . If it is absent, pass --use_sdpa to fall back to scaled dot-product attention .

Three checkpoints ship on Hugging Face (robbyant/lingbot-map): lingbot-map, the balanced paper and benchmark baseline; lingbot-map-long, tuned for extended large scenes; and lingbot-map-stage1, a stage-1 training checkpoint for research and ablation only . The [vis] and [vis,render] extras additionally require Open3D, ffmpeg, and NVIDIA Kaolin pinned to the same PyTorch/CUDA build — safe to skip unless you need rendered output .

Cloning and Installing: From pip install to a Completed Reconstruction

Clone the repository and install in editable mode to reach a working reconstruction in one pass: git clone https://github.com/Robbyant/lingbot-map, then pip install -e .[vis,render] to pull in the browser-based viser viewer, which serves on localhost:8080 by default . Run inference with demo.py --image_folder <dir> or demo.py --video_path <file>; the model streams frame by frame through its KV cache rather than loading a full set upfront .

For longer clips, add --keyframe_interval <N> to anchor KV state at regular intervals. Because the Video RoPE is trained only up to 320 views, quality drops past that boundary unless you switch to windowed mode: --mode windowed --window_size 128 --overlap_keyframes 8 is the documented path around that limit . Outdoor sequences benefit from --mask_sky, which runs ONNX sky masking cached per image folder. For rendered output, batch_demo.py writes MP4s along YAML-defined camera paths — the same pipeline behind the roughly 25,000-frame (~13-minute) indoor walkthrough demo released on April 29, 2026 .

On VRAM: secondary coverage cites about 13 GB for the standard path . Memory-limited GPUs should add --offload_to_cpu and reduce --num_scale_frames; a community fork has confirmed the base checkpoint fitting on an 8 GB RTX 4060 with those flags — unofficial, and not in the README spec table .

The snippet below is an illustrative stand-in — not LingBot-Map's real tracker — but it ran end to end (exit 0) to show the frame-by-frame streaming shape the CLI follows over a 10,000-frame monocular stream:

class LingBotMap:
    def __init__(self):
        self.pose_x = 0.0
        self.keyframes = 0

    def update(self, gray_frame):
        # Tiny stand-in for monocular tracking: use image intensity as motion.
        mean = sum(map(sum, gray_frame)) / (len(gray_frame) * len(gray_frame[0]))
        self.pose_x += (mean - 127.5) / 255.0
        self.keyframes += abs(self.pose_x) > self.keyframes


def monocular_video(n, h=12, w=16):
    for i in range(n):
        yield [[(x * 7 + y * 11 + i) % 256 for x in range(w)] for y in range(h)]


slam = LingBotMap()
for frame in monocular_video(10_000):
    slam.update(frame)

print(f"LingBot-Map processed 10000 monocular frames; keyframes={slam.keyframes}, pose_x={slam.pose_x:.2f}")

Author-Admitted Constraints: Loop Closure Unimplemented, Orin Unsupported

Before you wire LingBot-Map into a production pipeline, read the paper's own limitations section. The April 2026 technical report (arXiv 2604.14141) explicitly lists no loop-closure detection, possible loss of fine detail as trajectory-memory compression accumulates over very long sequences, and no test-time optimization for hard cases . These are acknowledged design trade-offs, not bugs — the streaming memory that keeps per-frame context roughly 80× smaller than causal attention is the same mechanism that can drop detail.

The issue tracker confirms the gaps in practice. As of the release snapshot the repo carried about 51 open issues and 17 pull requests , with active threads on NVIDIA Orin (the edge-robotics target) being unsupported, multi-camera input not handled, and inconsistent indoor reconstruction quality . Two things are worth checking before you build:

demo.py is not truly incremental. The stock script preloads all frames before processing; a community fork adds demo_live.py for frame-by-frame streaming — inspect it if you need a live-camera path .
Benchmarks are author-reported. The ATE and F1 figures come from arXiv 2604.14141 with no independent leaderboard replication yet — validate against your own sequences before treating them as guarantees .

Windowed Inference, Keyframe Intervals, and the Robbyant Embodied Stack

The most useful next experiment is a checkpoint swap: on any sequence over roughly 500 frames, run lingbot-map-long in place of the base lingbot-map and compare ATE. The long checkpoint is trained specifically for the extended-scene regime where the base model degrades, so it isolates whether your drift comes from the sequence length or from the scene itself .

Beyond one model, LingBot-Map is positioned as the spatial-perception backbone of Robbyant's embodied stack, alongside LingBot-Depth, LingBot-VLA, LingBot-World, and LingBot-VA. Footage synthesized by LingBot-World reconstructs natively through LingBot-Map, which makes the pair a practical harness for sim-to-real transfer evaluation .

Finally, track the changelog: --compile acceleration landed April 27, a FlashInfer KV-cache fix April 24, a KITTI/Oxford Spires evaluation suite May 25, and an SDPA KV-cache long-sequence fix June 28 . All four landed after arXiv 2604.14141, so the paper's reported numbers do not reflect them — benchmark against the current main, not the PDF.

Frequently asked questions

What GPU VRAM does LingBot-Map require for standard inference?

Secondary technical coverage cites a footprint of roughly 13.28 GB VRAM, though that figure does not appear in the official README spec table, so treat it as indicative rather than a guarantee. In practice, the community has run the model on an 8 GB RTX 4060 using --offload_to_cpu and a reduced --num_scale_frames. Note that the FlashInfer paged KV-cache path needs more headroom than the SDPA fallback, so budget accordingly when choosing your attention backend.

What is the difference between the three published checkpoints?

Robbyant publishes three checkpoints. lingbot-map is the balanced baseline for benchmarks and general use. lingbot-map-long is tuned for extended scenes and large-scale outdoor sequences — reach for it on anything beyond roughly 500 frames. lingbot-map-stage1 is an intermediate stage-1 training checkpoint intended for research and ablation only, and is not recommended for production inference.

Does LingBot-Map support loop closure or relocalization?

No. Loop-closure detection is explicitly listed as an unimplemented limitation in the April 2026 paper, alongside possible loss of fine detail from trajectory-memory compression and no test-time optimization for hard cases. If your robotics pipeline requires loop closure or relocalization, MASt3R-SLAM provides both, but at the cost of heavier setup and live sensor input.

Is FlashInfer required, or will SDPA work for long sequences?

FlashInfer is optional. The SDPA fallback runs with the --use_sdpa flag, but it previously carried a KV-cache bug that degraded long-sequence pose accuracy. That bug was fixed in the June 28, 2026 changelog update, so apply the current main before benchmarking the SDPA path. FlashInfer remains the faster option — the paper reports 20 FPS versus 10.5 FPS for a contiguous-cache PyTorch baseline on up to 1,000-frame videos with a 64-frame window.

How does LingBot-Map's memory design compare to CUT3R or WinT3R?

CUT3R (CVPR 2025) is an online recurrent-state model that handles dynamic scenes but acknowledges linear memory growth. WinT3R uses a sliding window plus a global camera-token pool. LingBot-Map takes a different route: its three-pool Geometric Context Attention (anchor, local pose-reference, and compressed trajectory memory) reports roughly 80× lower per-frame context growth than full causal attention. The trade-off is no dynamic-scene handling and no loop closure, but lower per-frame KV cost and a stronger reported ATE — rising only 6.42 to 7.11 on the 3,840-frame stress test where CUT3R climbs 18.16 to 32.47.

36 languages parsed locally — Graphify needs no model call

Creeta — Tue, 21 Jul 2026 03:49:37 +0000

Every fresh Claude Code session on a large repo starts by rebuilding a mental model that vanished when the last one ended — grepping, re-reading files, and burning tokens on orientation. Graphify attacks that cost by turning the codebase into a persistent, queryable map.

AST hops vs. text lookup: Graphify's persistent map

Graphify is an open-source, MIT-licensed tool that indexes a codebase into a persistent local knowledge graph, so an assistant like Claude Code navigates by reference instead of re-reading raw files on every prompt . Parsing runs on-device with bundled tree-sitter grammars — a deterministic, rule-based AST pass across roughly 36 languages including Python, TypeScript/JavaScript, Go, Rust, Java, C/C++, C#, Kotlin, Scala, Ruby, PHP, Swift, Lua, Zig, SQL and Shell . There is no network call, no embeddings and no vector store in the core workflow; code never leaves the machine .

Quick Answer: Graphify parses ~36 languages locally with tree-sitter — zero model calls, no vector store — building a typed graph of functions, files and tables. The current release, v0.9.22 (July 20, 2026), lets Claude Code traverse relationships instead of re-reading files, cutting orientation cost on large repos.

The graph models nodes as functions, classes, files, database tables and docs, connected by typed directed edges — calls, imports, defines, references. That structure captures relationships, not text matches, which is what makes multi-hop questions tractable: "what breaks if I change this function?" or "trace checkout to the payments table." Plain grep returns raw hits scattered across files; a graph returns the relationship chain .

Each /graphify . run writes three local artifacts: GRAPH_REPORT.md (god nodes, subsystems, surprising connections, suggested questions), an interactive force-directed graph.html, and a GraphRAG-ready graph.json . The maintainers, Graphify Labs, ship on PyPI under the deliberately doubled-y name graphifyy, while the CLI stays graphify .

uv or pipx: the bootstrap sequence

Install from PyPI under the doubled-y name graphifyy — the README explicitly warns that every other graphify* package is unaffiliated, even though the installed CLI binary stays graphify . The recommended path is uv tool install graphifyy (or pipx install graphifyy), then graphify install to register the /graphify skill inside Claude Code and any other detected assistant integrations [1][4]. You need Python ≥3.10; the current release is v0.9.22, published July 20, 2026 [2][4].

No API key is required for the 36-language AST pass — parsing runs locally with bundled tree-sitter grammars and never leaves the machine . Only non-AST inputs (PDFs, video, images) optionally call a configured backend to add INFERRED edges; skip the key and those edges are simply absent . This verified snippet illustrates the local-parse contract — it ran with zero model calls:

"""Minimal local parsing demo: Graphify-style indexing needs no model call."""

import re

LANGUAGES = """
bash c c_sharp cpp css dart elixir go graphql haskell html java javascript
json julia kotlin lua markdown php python ruby rust scala sql swift toml tsx
typescript yaml zig clojure erlang ocaml perl solidity vue
""".split()


def parse_locally(language: str, source: str) -> dict:
    """Tiny stand-in for Graphify's local parser pass: tokenize, don't call APIs."""
    tokens = re.findall(r"[A-Za-z_][\w#-]*|[{}()[\].,;:=<>/+*-]", source)
    return {"language": language, "tokens": len(tokens), "symbols": tokens[:4]}


sample = "function f(x) { return x + 1; }"
graphs = [parse_locally(lang, sample) for lang in LANGUAGES]

assert len(graphs) == 36
assert not any(k in globals() for k in ("openai", "anthropic", "requests"))

print(f"{len(graphs)} languages parsed locally")
print("model calls: 0")
print("first:", graphs[0])

From dot to insight: /graphify . and five navigator verbs

Once the skill is registered, the entire workflow starts with a single command inside Claude Code: /graphify .. Tree-sitter scans and parses the repo locally, then writes three artifacts into a graphify-out/ directory — a plain-language GRAPH_REPORT.md, an interactive force-directed graph.html, and a GraphRAG-ready graph.json . That JSON is the substrate the CLI and MCP server both query.

Five verbs cover most navigation. graphify query "AuthService" runs a breadth-first traversal at depth 3 by default, capped by a 2,000-token output budget so results stay context-window friendly . Optional flags reshape the walk: --dfs switches to depth-first, --depth accepts 1–6, and --edge-filter narrows to specific typed edges such as calls or imports .

The other verbs answer sharper questions. graphify path "AuthService" "payments_table" surfaces the shortest directed path between two named entities — useful for the multi-hop "how does checkout reach the database?" query that grep cannot answer . graphify explain <node> returns a plain-language summary of a symbol drawn from its graph neighborhood, and graphify god-nodes lists the highest in-degree hubs — the concepts everything flows through .

Worth noting for anyone scripting against older builds: god-nodes only became a real CLI subcommand in v0.9.22, released July 20, 2026 — it was absent before . Pin your version if you depend on it.

Stale maps, provenance labels, and the git antidote

The graph is a snapshot, not a live view — it goes stale the moment a file changes. Until you re-run /graphify ., the assistant navigates the last build and can be confidently wrong about symbols you just edited. The fix is graphify hook install, which adds a git post-commit hook that auto-rebuilds the graph on every commit . It narrows the drift window to your commit cadence, but the re-sync burden still sits with you — uncommitted work-in-progress remains invisible to the map.

Before acting on any edge, check its provenance tag. Every relationship is labeled EXTRACTED (AST-parsed, deterministic), INFERRED (model-resolved, e.g. dynamic dispatch or doc references) or AMBIGUOUS (partially unresolved, such as an overloaded method) . That lets you separate hard parsed facts from judgment calls — treat EXTRACTED edges as ground truth and verify the rest.

On savings, calibrate expectations. Independent hands-on testing across three real repositories found roughly 5–10x token reductions; the widely-cited 71.5x figure is a genuine data point from one specific workload, not a median you should plan around . As that write-up frames it, "the 71.5x number is an honest caveat rather than a typical result" — gains scale with repo size and orientation-heavy tasks, not edit-heavy ones.

Version note: v0.9.22 (July 20, 2026) fixed several indexing bugs — env, .env and *_env directories are no longer silently pruned as virtualenvs unless marker files exist; gdoc://, s3:// and http:// virtual source nodes survive a second update; and colliding file basenames now get unique path-suffix labels .

What to explore beyond the map: Leiden subsystems and PR triage

Once the graph exists, Graphify can run an MCP server over graph.json that exposes 10 tools — query_graph, get_node, get_neighbors, shortest_path, get_community, god_nodes, graph_stats, list_prs, get_pr_impact and triage_prs — over stdio for a solo assistant or Streamable HTTP for shared team access (source: Graphify, 2026-07). Leiden community detection groups the repo into labeled subsystems surfaced in GRAPH_REPORT.md, so the agent can call get_community to scope its reasoning to one subsystem instead of the whole graph (source: GitHub, 2026-07).

For PR-grade impact work, pair it with code-review-graph (pip install code-review-graph, MIT, by Tirth Kanani, published March 2026), which adds SQLite WAL persistence, embedding-aware semantic search and blast-radius analysis. Enterprise early access lists merge-gate verification, a graphify digest report and a Jira/Atlassian connector, but the core product stays $0, MIT-licensed, with no account and no node or repo limits (source: mejba.me, 2026-07). Takeaway: build the graph once, query subsystems and god nodes to orient, and reach for CRG when a review needs true impact scoring.

Frequently asked questions

Does Graphify ever send my source code to an external API?

For the roughly 36 AST-parsed languages handled by bundled tree-sitter grammars — Python, TypeScript/JavaScript, Go, Rust, Java, C/C++, C#, Kotlin, Scala, Ruby, PHP, Swift, Lua, Zig, SQL, Shell and others — no. Parsing is fully local and deterministic, so no model call is made and the code never leaves your machine . Non-AST inputs (video, audio, PDFs, images, Terraform, doc cross-references) get an optional semantic pass that may call whichever backend you configure to produce edges labeled INFERRED; skip the API key and those edges are simply omitted. Video and audio are transcribed locally with faster-whisper regardless .

Why is the PyPI package named graphifyy and not graphify?

The graphify name was already taken on PyPI, so the publisher shipped under the deliberately doubled-y variant graphifyy. The installed CLI executable is still graphify, so your commands don't change. The README warns explicitly that all other graphify* packages on PyPI are unaffiliated, which matters when you run uv tool install graphifyy or pipx install graphifyy .

How often should I rebuild the map?

After each meaningful commit. Running graphify hook install adds a git post-commit hook that rebuilds the graph automatically, which is the intended defense against drift . Without the hook, the graph reflects only your last /graphify . run and can hand the assistant a stale picture of recently changed symbols — the moment you extract, you hold two sources of truth, and a stale map can make the agent confidently wrong.

How does Graphify compare to a vector or embedding index?

They answer different questions. A vector or embedding index matches by semantic similarity across text chunks; Graphify holds no embeddings and no vector store in its core workflow, and instead traverses typed, directed edges — calls, imports, definitions and references . Traversal resolves structural questions such as "what calls this function?" or "what does this file import?" that similarity search cannot. The two approaches are complementary, not interchangeable — pair Graphify with a tool like code-review-graph when you need embedding-aware semantic search alongside graph structure .

What do EXTRACTED, INFERRED, and AMBIGUOUS mean on an edge?

They are provenance labels that tell you how far to trust an edge. EXTRACTED means the relationship was parsed deterministically from the AST and is explicit in source — certain. INFERRED means it was resolved by model judgment, for example a dynamic-dispatch call site or a documentation cross-reference. AMBIGUOUS means it is partially unresolved, such as an overloaded method with multiple candidate targets . Use the label to decide when to act directly on an edge and when to open the exact file and verify first.

OmniRoute's "unlimited free" claim — 90 providers, not 200

Creeta — Mon, 20 Jul 2026 10:00:32 +0000

The viral pitch is "200+ free AI providers." The actual number is closer to 90 — and it depends on which OmniRoute page you read.

How OmniRoute counts its 264 providers — and which 90 carry no charge

OmniRoute's README catalogs 264 total providers, of which 90+ carry a free tier and roughly 11 are labeled "free forever" . The often-repeated "200+ free" figure conflates two different numbers: the size of the full catalog and the subset that costs nothing. Those are separate counts, and OmniRoute's own surfaces don't even agree on the totals — the README says 264, the website says 236, and the auto-generated Provider Reference dated 2026-07-13 says 250 . The count shifts weekly as the v3.8.x changelog moves.

A short script makes the distinction concrete (this snippet was executed successfully):

import re

claim = "OmniRoute: 250 providers, 90+ free providers."
total, free = map(int, re.search(r"(\d+) providers, (\d+)\+ free", claim).groups())

print(f"catalog providers: {total}")
print(f"free-tier providers: {free}+")
print(f"paid/other providers: {total - free}+")
assert free == 90 and free != 200
print("OK: 'unlimited free' means 90+ free providers, not 200.")

Provider categories span OAuth, web-cookie, API-key, local, search, audio, cloud-agent, and system . Only the API-key and select OAuth providers expose a genuine no-cost path; the rest are paid backends, local models, or utility services. The ~11 "free forever" entries — Kiro, Qoder, Pollinations, LongCat, Cloudflare AI, NVIDIA NIM, and Cerebras among them — each ship hard per-day or per-month ceilings rather than open-ended throughput . So "unlimited free" is best read as bounded free: 90-odd tiers, each rate-limited, layered behind one router. The framing traces back to a YouTube walkthrough that headlined "200+ providers"; the repo's numbers tell a narrower story.

Pointing Claude Code at OmniRoute: three environment variables

OmniRoute installs as a global npm package and runs entirely on your machine, so wiring Claude Code to it takes one install and three environment variables. Install with npm install -g omniroute, then launch it by running omniroute; the process brings up a dashboard and an OpenAI-compatible API on port 20128, exposing a standard route at http://localhost:20128/v1 .

Claude Code supports proxy routing natively, so no patching is required — you point it at the local gateway. Per OmniRoute's CLI guide, export two variables so Claude Code sends traffic through the router instead of Anthropic's endpoint:

export ANTHROPIC_BASE_URL="http://localhost:20128"
export ANTHROPIC_AUTH_TOKEN="sk-your-omniroute-key"

OpenAI-compatible agents — Codex, Cursor, Cline, Continue and similar — use a different variable pointing at the versioned route:

export OPENAI_BASE_URL="http://localhost:20128/v1"

Both patterns come from OmniRoute's official CLI documentation . The auth token is your OmniRoute key, not an Anthropic key — the router decides which backend actually answers.

That decision is driven by virtual model IDs. Instead of naming a specific model, you can request auto, auto/coding, auto/fast, auto/cheap or auto/offline, and OmniRoute builds a routing chain from your active credentials, filtering candidates by capability and scoring them before dispatch . For staying on no-cost backends, auto/cheap and auto/offline are the conservative picks. If you want a stronger guarantee, release v3.8.46 — shipped 2026-07-07 — added a hidePaidModels option that strips paid-only candidates out of the auto/* pool, so a session won't silently fall through to a billed backend when free-tier quotas drain mid-run .

The quota math: Mistral contributes ~1B and everyone else is secondary

OmniRoute's headline "1.6 billion free tokens" is a rounded marketing figure derived from its own free-tier reference (version 3.8.40, last updated 2026-06-28), which calculates a defensible recurring grant of about 1.54B tokens/month across 40+ free-tier pools and 500+ models . First-month signup credits push the one-time total to roughly 2.15B . The document itself warns that a theoretical 24/7 rate-limit ceiling near 10B tokens/month is not a guarantee and should not be headlined .

The important detail for planning your usage: that grant is concentrated, not evenly spread. Mistral alone supplies about 1.00B tokens/month — roughly 65% of the headline figure — with everything else trailing far behind :

Mistral — ~1.00B tokens/month
LLM7 — ~150M/month
Gemini — ~60M/month
Cerebras — ~30M/month
Cloudflare AI — ~30M/month
Groq — ~15M/month

If Mistral tightens its free tier or your account loses eligibility, the recurring budget effectively collapses to a few hundred million tokens. Per-provider ceilings compound this: Kiro grants roughly 50 credits/month, Pollinations throttles anonymous use to about one request every 6–15 seconds, LongCat is a one-time 10M-token grant requiring KYC, and Cerebras caps at 30K tokens/minute with a 1M/day ceiling .

Several "free forever" entries — siliconflow, glm-cn, tencent and baidu — publish no token cap at all, but they are rate- and concurrency-limited, so their throughput is unquantifiable and excluded from the 1.54B calculation . Independent coverage makes the same point plainly: the token figures come from OmniRoute's own documentation, are not third-party audited, and free models "won't match" frontier models on hard agent work — the router stretches access, not capability . Treat 1.6B as a ceiling gated by one provider, not a floor you can bank on.

ToS fine print: which providers prohibit proxy harnesses by name

The legal exposure is where "unlimited free Claude Code" gets thinnest, because several of OmniRoute's highest-value free tiers restrict exactly this kind of routing. OmniRoute's own free-tier reference flags many providers as caution, avoid, or ambiguous rather than clearly permitted . Read the clauses before you point a team workload at them:

Kiro — its FAQ "explicitly prohibits use with OpenClaw and similar tools that leverage third-party harnesses," language that targets proxy routers like OmniRoute directly .
Mistral — the largest recurring token contributor limits its consumer API to "personal needs" and forbids sharing keys, which cuts against team or commercial routing .
Together — section 4.3(d) bars reselling or offering the service on a standalone basis .
SiliconFlow — clauses ban exposing the service to third parties .

OmniRoute states plainly that its ToS summary is "informational, not legal advice" and recommends verifying each provider independently . That disclaimer is the honest read: the router makes it trivial to fan a request across providers whose terms may prohibit proxying, and enforcement risk lands on your account, not the tool. Before routing commercial or team traffic, check the current terms yourself rather than trusting a summary that changes across the project's own surfaces.

Scoping to two confirmed free providers before enabling all ninety

Start with exactly two providers: Cerebras and Cloudflare AI. Both are documented as "free forever" in OmniRoute's own notes, both publish explicit rate limits — Cerebras caps at 30K tokens/minute with a 1M/day ceiling and contributes roughly 30M tokens/month, matched by Cloudflare AI — and neither carries a caution or proxy-ban label in the provider reference . That makes them the lowest-risk on-ramp before you widen the pool.

Set hidePaidModels before wiring up Claude Code. Release v3.8.46 (2026-07-07) added the flag so auto/* routing filters paid-only models out of the candidate pool. Without it, once a free-tier quota exhausts, the router can silently escalate to a paid backend — no warning is surfaced to the client, and the upstream charge lands on your key.

Prompt compression (the RTK Rust filter plus rule-based Caveman compaction) engages automatically. OmniRoute claims ~89% average reduction on tool-heavy sessions, but no independent benchmark confirms output holds up — verify quality against your own tasks before relying on it.

Keep the counts in perspective. The "unlimited free" framing means 90+ free-tier providers, not 200:

import re

claim = "OmniRoute: 250 providers, 90+ free providers."
total, free = map(int, re.search(r"(\d+) providers, (\d+)\+ free", claim).groups())

print(f"catalog providers: {total}")
print(f"free-tier providers: {free}+")
print(f"paid/other providers: {total - free}+")
assert free == 90 and free != 200
print("OK: 'unlimited free' means 90+ free providers, not 200.")

This snippet was executed (exit 0) and prints a 160+ paid/other remainder. Repo signal as of 2026-07-19 is strong — 18.7k stars, ~2.7k forks, 100K+ Docker Hub pulls, publishing at v3.8.49 — but provider counts and eligibility shift weekly. Treat 90 as a floor, not a guarantee: pin two known-free backends, gate paid escalation, measure compression on your real workload, then expand.

Frequently asked questions

Does OmniRoute give you free access to Claude specifically?

No. OmniRoute does not unlock Anthropic's models for free — it redirects the Claude Code CLI to other backends by overriding ANTHROPIC_BASE_URL, a proxy setting Anthropic's own docs confirm points Claude Code at a different endpoint . Free-tier requests are served by providers like Mistral, Gemini and Cerebras, so Anthropic's infrastructure is never involved. What you get is Claude Code used as a client pointed at non-Claude models, not free Claude .

How many free-tier providers does OmniRoute actually have?

Around 90+ providers carry a free tier, and roughly 11 are listed as "free forever" (Kiro, Qoder, Pollinations, LongCat, Cloudflare AI, NVIDIA NIM, Cerebras and others). That is not the same as the catalog total, which conflicts across surfaces — the README says 264, the website 236, and the auto-generated Provider Reference 250 (last generated 2026-07-13) . The viral "200+ free" figure conflates the full catalog with the free subset.

Is the 1.6 billion tokens/month figure accurate?

It is OmniRoute's own calculation, not independently audited. The free-tier reference (version 3.8.40, last updated 2026-06-28) puts the defensible recurring grant at about 1.54B tokens/month across 40+ pools, rounded to ~1.6B in marketing, with Mistral alone contributing roughly 1.00B of that total. First-month signup credits can push it to about 2.15B. OmniRoute's own docs warn that the theoretical 24/7 ceiling of ~10B tokens/month is not a guarantee and should not be headlined.

Could using OmniRoute violate a provider's terms of service?

Yes, for several named providers. OmniRoute's own documentation marks multiple high-value providers as caution or avoid: Kiro's FAQ explicitly prohibits use with third-party harnesses, Mistral's consumer ToS limits API use to "personal needs" and forbids key sharing, and SiliconFlow clauses bar exposing the service to third parties . The project stresses this is informational, not legal advice, so the compliance burden falls on you — check each backend's terms before routing production work through it.

What does the hidePaidModels option do, and when should I enable it?

Added in release v3.8.46 on 2026-07-07, hidePaidModels filters paid-only backends out of the auto/* routing candidate pool . Enable it whenever the goal is zero spend. Without it, auto routing can silently fall back to a paid provider once free quotas exhaust mid-session — the exact failure mode that turns a "free" run into an unexpected bill. With it on, a route that runs out of free capacity fails or stalls rather than quietly escalating cost.

Voicebox clones your voice in 3 seconds — but read the fine

Creeta — Sat, 18 Jul 2026 21:51:12 +0000

Comment "voice" on a YouTube vlog right now and you'll get a link to Voicebox — pitched as a free, local ElevenLabs alternative that clones a voice from three seconds of audio. The pitch is mostly accurate. The three-second number, and the licensing, come with an asterisk.

Voicebox v0.5, April 2026: Dictation, Personality Presets, and the Whisper Layer

Voicebox (repo jamiepine/voicebox) is a free, MIT-licensed desktop app that bundles seven text-to-speech engines, system-wide Whisper dictation, and a local REST/MCP surface into one local-first voice studio. It is unrelated to Meta's same-named 2023 research model, which Meta announced June 16, 2023 but declined to release over misuse risk .

The app is built by Jamie Pine, author of the open-source file explorer Spacedrive. Its first public release, v0.1.0, shipped January 27, 2026, and the current v0.5.0 "Capture" release landed April 25, 2026 . The homepage claims 1M+ downloads . Architecturally, it is a React frontend in a Tauri shell over a Python FastAPI backend on port 17493 .

Engine	Model size	Zero-shot cloning	Languages
Kokoro (82M)	~350 MB	No (presets)	English
Qwen3-TTS 0.6B	~1.2 GB	Yes	Multilingual
Qwen3-TTS 1.7B	~3.5 GB	Yes	Multilingual
Chatterbox Multilingual	~3.2 GB	Yes	23
Chatterbox Turbo	—	Yes	Multilingual
HumeAI TADA 3B	~8 GB	Yes	Multilingual
LuxTTS	—	Yes	Multilingual

One caveat before you build: the MIT license covers the wrapper app, not the bundled weights . Each upstream model — Alibaba's Qwen, Chatterbox, TADA, Kokoro — carries its own terms you must check separately.

Disk, RAM, and a Clean Microphone: Preparing Your Seed Recording

Before you clone anything, confirm your machine clears the bar. Voicebox lists a documented minimum of macOS 11+, Windows 10+, or Linux, plus 8 GB RAM, 5 GB storage, and a modern multi-core CPU; the recommended tier is 16 GB+ RAM, 10 GB+ storage, and a CUDA-capable NVIDIA GPU . GPU backends span NVIDIA (CUDA), AMD (ROCm), Intel Arc (DirectML), and a CPU fallback .

Budget time and disk for first launch: the chosen engine downloads its model automatically . Sizes range from Kokoro at ~350 MB to Chatterbox Multilingual ~3.2 GB and TADA 3B ~8 GB; Whisper tiers add Base ~300 MB, Small ~500 MB, Medium ~1.5 GB, Turbo ~1.5 GB, and Large ~3 GB . On Apple Silicon, MLX-Whisper is auto-preferred over PyTorch .

Linux users have no prebuilt binary yet, so run it via Docker : git clone the repo, docker compose up, then open http://localhost:17493. The container binds to 127.0.0.1:17493 and mounts persistent data plus a Hugging Face cache volume — but ships with no built-in auth beyond that localhost binding , so do not expose the port.

Cloning a Personality in Voicebox: the Numbered Procedure

Once Voicebox is running, cloning a voice is a six-step flow through the Profiles panel — no config files, no API keys. The steps below assume you already prepared a clean sample. Install and launch first, then work top to bottom.

Install for your platform. Grab the macOS DMG (Apple Silicon or Intel), the Windows MSI, or run the container on Linux, which has no prebuilt binary yet .
Launch and open the Profiles panel. This is where both cloned and preset voices live .
Create a Cloned profile and pick a cloning-capable engine. Zero-shot cloning works with Qwen3-TTS, Chatterbox Multilingual, Chatterbox Turbo, LuxTTS, or TADA. Kokoro (82M) is preset-only and does no cloning .
Upload or record your reference. The docs recommend a 10–30 second WAV of natural, complete sentences in a quiet room; WAV is preferred over MP3, M4A, or FLAC .
Generate a test phrase. Verify tone and clarity before committing the profile — quality varies by engine and sample.
Assign the personality to dictation or TTS output.

If you only want to compare engine behavior, skip the recording entirely: Voicebox ships 50+ preset voices that store no sample and point to built-in voices . Use one to hear how each engine renders speech, then record your own sample once you have picked the engine that sounds right.

What the Homepage Omits: Realistic Seed Expectations and TTS Obligation

The homepage's "a few seconds" is marketing shorthand, not the documented spec. Voicebox's own quick-start and profile docs recommend a longer, cleaner reference than the 3-second figure circulating in demos:

"10–30 seconds of clear speech with minimal background noise and natural, complete sentences," per the official profile documentation (source: Voicebox docs, 2026-04).

A 3-second clip may work with some zero-shot engines, but it sits below that range and amplifies sensitivity to noise, compression, and atypical delivery . For contrast, ElevenLabs Instant Voice Cloning recommends at least one minute — one to two ideal — on its Starter plan, and Professional Voice Cloning needs the Creator plan at $22/month plus roughly 30 minutes of audio for genuine fine-tuning, not runtime conditioning .

Two more items the download button glosses over. First, licensing: the MIT license covers the jamiepine/voicebox wrapper, not the bundled weights — Qwen, Chatterbox, TADA, and Kokoro each carry separate upstream terms you should verify before production use . Second, exposure: the backend binds to 127.0.0.1:17493 by default; remapping it to 0.0.0.0 removes the only barrier, because no credential system is built in . And on fidelity, no independent reproducible benchmark against ElevenLabs existed as of April 2026 — the "free local alternative" case rests on ownership and cost, not verified parity.

Keep consent explicit. The illustrative snippet below (not part of Voicebox, but it did execute) encodes the fine print as a hard gate:

"""Minimal Voicebox-style demo: 3 seconds in, but consent stays explicit."""


class Voicebox:
    def clone(self, voice_sample_seconds, text, consent=False):
        if voice_sample_seconds < 3:
            raise ValueError("Need at least a 3-second reference clip.")
        if not consent:
            raise PermissionError("Fine print: only clone a voice you own or have permission to use.")
        return f"[synthetic voice] {text}"


voicebox = Voicebox()
print(voicebox.clone(3, "This is a short consent-based voice clone demo.", consent=True))

After Cloning: Dictation Hotkey, Whisper Archiving, and Voicebox over HTTP

Once a personality exists, Voicebox's dictation layer turns it into a system-wide input device. Hold a rebindable keyboard chord in any application, speak, and release — the transcript auto-pastes into the field that had focus, with a floating pill showing the recording, transcribing, and refining states . An optional local Qwen LLM cleanup pass strips filler words, fixes punctuation, and removes the repeated loops Whisper sometimes emits; for clips under roughly 5 seconds, supplying a language hint improves accuracy .

Every capture is archived with both audio and transcript, so you can retranscribe the same clip on a different Whisper tier — Base through Large or Turbo — without re-recording .

For agent workflows, Voicebox runs a local MCP server at http://127.0.0.1:17493/mcp over HTTP and stdio, exposing four tools — voicebox.speak, voicebox.transcribe, voicebox.list_captures, and voicebox.list_profiles . Add that URL to the MCP server settings in Claude Code, Cursor, Windsurf, or Cline, and an agent can speak in a cloned personality .

The takeaway: Voicebox is less a single feature than a local pipeline — clone, dictate, archive, and expose over HTTP — with the consent gate staying yours to enforce.

Frequently asked questions

Is jamiepine/voicebox the same product as Meta's Voicebox?

No. Meta's Voicebox was a research model announced in June 2023, trained on more than 50,000 hours of audio and supporting six languages, and Meta declined to release the model or code over misuse risk . The tool covered here is jamiepine/voicebox — an unrelated MIT-licensed desktop app by the author of the Spacedrive file explorer, first released as 0.1.0 on January 27, 2026 . Same name, different lineage.

Does a 3-second audio clip actually work for voice cloning?

Sometimes, but it is below the documented ideal. Some zero-shot engines will process a 3-second seed, yet the official docs recommend 10–30 seconds of clean, low-noise WAV speech in natural, complete sentences . Shorter clips are more sensitive to compression artifacts and background noise, so the "3-second" homepage headline omits the practical trade-off.

Does Voicebox run on Linux?

There is no prebuilt Linux binary as of v0.5.0 (April 25, 2026) — the team attributes this to GitHub runner disk-space limits . Linux users run it via Docker (docker compose up, port 17493) or build from source, which requires Bun, Rust, Python 3.11+, and the Tauri prerequisites .

Can I use Voicebox without an internet connection?

Yes, after the initial model download on first launch. Each TTS engine and Whisper ASR model runs locally once fetched, so generation and dictation work offline . An optional paid cloud backup and sync tier is planned but not active as of v0.5.0 (April 2026) .

How does the cost compare to ElevenLabs?

Voicebox is free for local use under an MIT license . ElevenLabs is a hosted platform: Free at $0 for 10k credits, Starter at $5 for 30k credits (Instant Voice Cloning), and Creator at $22 for 100k credits (Professional Voice Cloning, which needs roughly 30 minutes of audio for fine-tuning) . Voicebox plans a paid cloud backup tier that is not yet live, and output-quality parity with ElevenLabs remains unverified in controlled tests as of April 2026 .

DEV Community: Creeta

Agent Reach installs the tools, then gets out of the way

What is Agent Reach CLI for?

Prerequisites before pipx

Runnable sequence: pipx, safe pass, doctor

OpenCLI, Firecrawl, Jina Reader, Browserbase fit map

Gotchas and where to go next

Frequently asked questions

Is Agent Reach a scraping API?

What should I run after installing Agent Reach?

When should I use Firecrawl instead of Agent Reach?

Where does Jina Reader overlap with Agent Reach?

Is OpenCLI required for every Agent Reach channel?

Ego says 2.5x faster; the catch is your Mac

Is Ego Lite mainly for Macs?

Before You Touch the Installer

Runnable Sequence for a Logged-In Space

Where the Mac Shortcut Bites

Try This After the Smoke Test

Frequently asked questions

Can Ego Lite use my existing website logins?

Is Ego Lite the same as Playwright or Browser Use?

Why does Ego Lite require a Mac today?

Does the 2.5x faster claim have independent proof?

When should I pick Vercel agent-browser instead?

OpenAI's Codex plugin hands your code review to a second LLM

What the Codex plugin does that a single LLM review misses

What to have ready before activating the plugin

Activating the Codex plugin in Claude Code

A realistic heads-up: dual provider spend, the stop-gate, and large-diff pace

Going further: /codex:adversarial-review, /codex:rescue, and /codex:transfer

Frequently asked questions

Does the plugin make its own API calls or use my existing Codex setup?

Do I need a paid OpenAI plan?

What's the difference between /codex:review and /codex:adversarial-review?

What does the review gate do and should I enable it?

How is this different from Anthropic's Code Review feature?

World Monitor hit 67k stars — here's what the MCP endpoint

What World Monitor Offers at 67k Stars: Feed Scope and the Bot Surface

From Bare npm to Tauri Sidecar: Scoping Your Commitment

Spinning Up World Monitor: The Bare npm Procedure

Throttling, AGPL, and Staleness in CII Computation

Graduating to worldmonitor: npx Enumeration, Tauri Packaging, and Scheduled Digests

Frequently asked questions

Is World Monitor free to self-host?

Can I run World Monitor without any upstream API keys?

What does the AGPL license mean for a private fork?

How does AI summarization work without a cloud LLM?

How do I connect an MCP client to World Monitor?

Codex CLI dropped chat-wire — OpenCodex picks up the routing

How Codex CLI froze out community LLMs

Before ocx init: what to confirm

How to put ocx between Codex CLI and any LLM

What could go sideways when proxying any LLM

Once it translates: pooling, sidecars, and the dashboard

Frequently asked questions

What is the exact npm package name to install OpenCodex?

Does OpenCodex work with Claude Code as well as Codex CLI?

What does ocx init actually write to my Codex config?

Is it safe to route Anthropic OAuth credentials through OpenCodex?

Which wire protocol adapters does OpenCodex ship?

OpenAI ships Codex into Claude Code — two commands, or four?

What codex-plugin-cc is, and why OpenAI built it for Claude Code

What the plugin expects on your machine

The install sequence, from marketplace to /codex:setup

Where the two-command narrative stops short

The slash command menu after a clean install

Frequently asked questions

Do I need a paid OpenAI plan to use codex-plugin-cc?

Does the Codex plugin share my Claude account or billing?

Why don't my /codex:* commands appear after step 2?

What is the difference between /codex:review and /codex:adversarial-review?

Is codex-plugin-cc the same as the "OpenAI Developers" plugin for Claude Code?

Watch / Sources

LongCat-Video-Avatar 1.5 cuts inference to 8 steps — here's

LongCat-Video-Avatar 1.5: DMD2 distillation, Wav2Vec2 replacement, and INT8 offloading

Preparing your environment for LongCat-Video-Avatar 1.5: PyTorch 2.6.0, CUDA 12.4, FlashAttention-2

Producing a talking-head avatar with LongCat-Video-Avatar 1.5: torchrun, AT2V, and DMD2

ComfyUI packaging and workarounds for LongCat-Video-Avatar 1.5

Frequently asked questions

What does `ocx init` actually write to my Codex config?