DEV Community: Eli

Open-Source AI Presentation and Office Tools: What Each One Actually Does

Eli — Tue, 14 Jul 2026 11:51:32 +0000

Originally published on the ClawMama Blog. This DEV.to edition uses the same comparison methodology and links back to the canonical article.

Document work is not one job. Making a deck look good, producing a valid .xlsx a colleague can edit, extracting tables out of a scanned PDF, converting Markdown to Word, splitting a 200-page PDF, and collecting a legally meaningful signature are six different jobs — and the open-source projects that keep appearing together in "AI office tools" roundups each specialize in one of them.

This article compares frontend-slides and OfficeCLI directly, because they're the two that overlap with what people usually mean by "AI makes my documents." The other four — Docling, Pandoc, Stirling PDF, and DocuSeal — are covered as what they are: adjacent tools for different jobs, not competitors.

The map

Project	The job it's built for	Output
frontend-slides	Presentation design — visually distinctive decks	Single-file HTML slides; PDF export
OfficeCLI	Creating and editing real Office files programmatically	Valid DOCX / XLSX / PPTX
Docling	Document understanding — structure out of PDFs and Office files	Markdown / JSON for pipelines
Pandoc	Format conversion between markup and document formats	Dozens of formats, both directions
Stirling PDF	PDF operations — merge, split, convert, redact	Processed PDFs, self-hosted
DocuSeal	Signature workflows — fill and sign	Signed documents, audit trail

If a chatbot promises to do all six, it's wrapping several of these (or doing some of them badly). Knowing which layer your problem lives in is most of the decision.

frontend-slides vs OfficeCLI: the real comparison

These two get confused because both can end in "a presentation." The difference is what kind, and for whom.

frontend-slides is a coding-agent skill for producing designed presentations as web pages. Its bet is that the web platform is a better design medium than PowerPoint's shape model: slides are a single HTML file with embedded CSS and JavaScript, exportable to PDF via Playwright, optionally deployable to a live URL. Its most distinctive idea is workflow, not rendering — a "show, don't tell" loop where instead of asking you to describe an aesthetic in words, it generates visual preview options and you pick. It ships 12 curated presets plus 34 bolder templates, and is openly hostile to default-AI styling (the README's phrase: goodbye, purple gradients on white). It runs as a Claude Code plugin or with other coding agents, and can ingest an existing PowerPoint file as source content. MIT licensed.

What it doesn't give you: a .pptx. If your deck must be edited by colleagues in PowerPoint afterwards, frontend-slides is the wrong layer.

OfficeCLI attacks the opposite problem: not "make it beautiful" but "make it a correct file." It's a command-line tool (single self-contained binary; Homebrew, Scoop, or npm; also an MCP server and Python/Node SDKs) that creates, reads, and edits DOCX, XLSX, and PPTX. Elements are addressed with an XPath-like path syntax (/slide[1]/shape[2]), output is JSON for structured parsing, and — the part that matters most when an AI is the operator — it validates against OpenXML schemas and detects issues like text overflow, missing alt text, and formula errors, returning structured error codes an agent can act on. On the spreadsheet side it evaluates 350+ Excel functions and can generate native pivot tables. Apache-2.0 licensed.
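
To make the addressing idea concrete, here is a minimal Python sketch that splits a path such as /slide[1]/shape[2] into (element, index) pairs. It is not OfficeCLI's parser: the helper name parse_path and the assumption that every segment carries a 1-based index are illustrative only, and the real grammar may accept forms this sketch rejects.

```python
import re

# One path segment: an element name followed by an index in brackets.
_SEGMENT = re.compile(r"/([A-Za-z_][A-Za-z0-9_]*)\[(\d+)\]")


def parse_path(path: str) -> list[tuple[str, int]]:
    """Split an XPath-like path into (element, index) pairs.

    Hypothetical helper for illustration, not part of any real tool.
    """
    segments = []
    pos = 0
    while pos < len(path):
        match = _SEGMENT.match(path, pos)
        if match is None:
            raise ValueError(f"cannot parse path at offset {pos}: {path!r}")
        segments.append((match.group(1), int(match.group(2))))
        pos = match.end()
    if not segments:
        raise ValueError("empty path")
    return segments


print(parse_path("/slide[1]/shape[2]"))  # [('slide', 1), ('shape', 2)]
```

Addressing elements this way gives an agent a stable handle on "the second shape on the first slide" without having to reason about raw OpenXML.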

Its honest limitation: a few paths depend on Microsoft Office on Windows (native TOC refresh, for example), with headless fallbacks elsewhere — relevant if you need pixel-faithful native rendering.

The two-sentence version: frontend-slides optimizes for the audience (what they see on screen); OfficeCLI optimizes for the file (what survives being opened, edited, and validated in the Office ecosystem). Teams that present from a browser and teams that live in shared drives full of .docx will make opposite choices, correctly.

The adjacent four — useful, and not PPT generators

Docling (63,096 GitHub Stars as observed on July 14, 2026) goes the opposite direction from everything above: documents in, structure out. It parses PDFs, Office files, images, and HTML into clean Markdown or JSON, with layout analysis, table-structure recovery, and OCR — its tagline is "get your documents ready for gen AI," and its natural habitat is RAG and extraction pipelines. Judging it as a document creator misses the point entirely; it's the tool you run before the AI reads, not after the AI writes. MIT licensed, and actively maintained down to gnarly edge cases (recent fixes cover DOCX tables with late-starting rows and rendering native Excel charts to images).

Pandoc (45,383 Stars, same date) is the veteran of this list — the "universal markup converter" that turns Markdown into DOCX, LaTeX into HTML, and dozens of other pairs, extensible with Lua filters. It's frequently the last mile of an AI writing pipeline: the model writes Markdown, Pandoc produces the deliverable. It even has a PPTX writer, but it converts structure, it doesn't design slides — Pandoc-generated decks look like their template, nothing more. One operational note straight from its manual: exposing Pandoc in a web service has real security considerations the maintainer documents explicitly; read that section before wiring it to user input. GPL-2.0.
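
The "model writes Markdown, Pandoc produces the deliverable" step can be sketched as a thin Python wrapper around the pandoc command line. The file names and helper names are placeholders. The optional --sandbox flag exists in recent Pandoc releases, but whether DOCX output works under it depends on how the binary was built, so treat that switch as something to test against the manual rather than a guaranteed setting.

```python
import subprocess
from pathlib import Path


def pandoc_command(source: Path, target: Path, sandbox: bool = False) -> list[str]:
    """Build the argument list for a conversion; formats are inferred from extensions."""
    cmd = ["pandoc", str(source), "-o", str(target)]
    if sandbox:
        # Limits what readers and writers may touch; see the Pandoc manual
        # for caveats about formats that need data files.
        cmd.append("--sandbox")
    return cmd


def convert(source: Path, target: Path, sandbox: bool = False) -> None:
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(source, target, sandbox), check=True)


# Only builds and prints the command, so it runs without pandoc installed.
print(pandoc_command(Path("report.md"), Path("report.docx")))
```

Passing the arguments as a list rather than a shell string keeps user-supplied file names from being interpreted by a shell, which matters once this sits behind any kind of service.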

Stirling PDF (87,020 Stars, same date — the biggest number in this article, for the least AI-flavored tool) is a self-hosted web application for PDF operations: merge, split, rotate, compress, convert, OCR, redact, watermark, and dozens more, usually deployed via Docker. It's the answer to "our PDF handling shouldn't leave our infrastructure," not to any generation question. Sitting between document creation (upstream) and signing (downstream), it's plumbing — in the best sense.

DocuSeal (17,517 Stars, same date) is an open-source alternative to DocuSign: template building, field placement, filling, and legally meaningful signature collection, self-hosted, with an API and embedding options. AGPL-3.0. It enters the story where documents stop being drafts. A signature workflow is a process with accountability, which is precisely why you want deterministic software rather than a generative model handling it — the right division of labor is AI drafting the contract text, DocuSeal running the signing.

Picking by scenario

"I present next week and want it to look genuinely good." frontend-slides. Present from the browser or export the PDF.
"Our reports must be Word and Excel files others edit." OfficeCLI — and its validation loop is what makes agent-generated files trustworthy rather than merely plausible.
"I need the data out of 500 PDFs." Docling. Different direction entirely.
"I write Markdown; deliverables are DOCX." Pandoc, probably scripted once and forgotten.
"Contracts need filling and signing." DocuSeal, after drafting happens elsewhere.
"Split, merge, OCR, redact — on our own servers." Stirling PDF.

Realistic workflows chain them: Docling extracts from source material → an AI drafts → OfficeCLI or frontend-slides produces the artifact → Pandoc converts where needed → Stirling PDF post-processes → DocuSeal collects signatures. Six tools, six layers, and almost no overlap beyond the two that can both end in a presentation.

Where ClawMama has verified this, specifically

Two of the six run today as verified ClawMama chat agents, and for both we hold Fidelity A verification — meaning the full representative workflow ran and produced checkable artifacts, not a demo:

The AI Presentation Designer, built on frontend-slides: its verification produced a seven-slide Chinese-language deck as HTML and PDF, with every slide passing desktop and mobile overflow/bounds checks and zero browser console errors. Known limits from that run: some small supporting text may read poorly from a distant projector, and the upstream deploy script wasn't exercised.
The Office Document Assistant, built on OfficeCLI (verified at version 1.0.135): its verification created a DOCX quarterly brief, an XLSX sales dashboard, and a PPTX strategy update, all passing OfficeCLI's OpenXML validation — the XLSX and PPTX with zero issues, the DOCX with one formatting suggestion. Not covered: Microsoft Office's native pixel-level rendering, and Windows-only refresh/pagination paths.

The other four projects are qualified on our radar for this category and are described here from their upstream documentation; none of them is a verified ClawMama agent as of this writing, and this article shouldn't be read as implying they are.

FAQ

Which of these is "the open-source AI PPT generator"?
Strictly, none. frontend-slides is the closest for designed presentations, but it outputs HTML/PDF rather than PPTX. OfficeCLI creates real PPTX files but is an engine for correct files, not a designer. The four adjacent tools don't generate presentations at all — that's the point of separating them.

Can I get an editable PowerPoint file from frontend-slides?
No — its output is a self-contained HTML deck (plus PDF export). It can read a PPTX as input content. If colleagues must edit the result in PowerPoint, generate with OfficeCLI instead and accept plainer styling.

How do these fit an AI agent workflow?
Unusually well, by design: frontend-slides is literally a coding-agent skill; OfficeCLI ships an MCP server, JSON output, and structured error codes so agents can self-correct; Docling exists to feed documents to models; Pandoc gives deterministic last-mile conversion. Stirling PDF and DocuSeal are conventional self-hosted apps that slot in before and after.

Are they free for commercial use?
Licenses differ meaningfully: MIT (frontend-slides, Docling), Apache-2.0 (OfficeCLI), GPL-2.0 (Pandoc), and AGPL-3.0 (DocuSeal) — the AGPL matters if you offer it as a service. Stirling PDF's licensing has its own arrangement; check the repository's current terms before deploying commercially.

Do the GitHub star counts mean Stirling PDF is the best tool here?
No — they measure audience size at a moment in time (all figures here observed July 14, 2026). PDF operations are a near-universal need, so Stirling PDF's numbers are large; frontend-slides serves a newer, narrower workflow. None of these six compete on the same job, so ranking them by stars compares apples to filing cabinets.

Open-Source AI Video Tools Compared: OpenMontage, Remotion, and auto-editor

Eli — Tue, 14 Jul 2026 11:50:38 +0000

Originally published on the ClawMama Blog. This DEV.to edition uses the same comparison methodology and links back to the canonical article.

"AI video tool" has become a label broad enough to be useless. OpenMontage, Remotion, and auto-editor all get the label, and comparing them as if they compete head-to-head produces bad decisions — because for most workflows, only one of them is even a candidate.

A cleaner way to think about it: video work splits into deciding what the video should be (script, assets, structure), producing the frames (rendering, compositing), and cleaning up recorded footage (cutting, trimming). Each of these tools lives primarily in one of those layers.

Decision guide, before the details

You have an idea and want a finished video — an explainer, a trailer, a talking-head piece — and you're comfortable working through an AI coding assistant: look at OpenMontage. It orchestrates the whole workflow, from script to composited output.
You need videos generated from data or templates, repeatably — personalized clips, automated social assets, motion graphics defined in code: look at Remotion. It's a developer framework, and that's its strength; it is not an AI editor out of the box.
You have recorded footage with dead air — screencasts, lectures, podcast video, raw takes — and want the silence gone without opening an editor: look at auto-editor. It does one job, quickly, from the command line.

If none of those sentences described your situation, the honest answer may be a conventional editor.

What each project is

	OpenMontage	Remotion	auto-editor
Category	Agent-driven video production system	Programmatic video framework (React)	Automatic rough-cut CLI
You provide	A brief or concept, via an AI coding assistant	React/TypeScript code defining compositions	Recorded video/audio files
It produces	Finished MP4s: explainers, trailers, social clips	Rendered video from code-defined templates	Trimmed media, or timelines for Premiere / Resolve / Final Cut
AI involvement	Central — the agent plans and executes the pipeline	None built in; commonly driven by AI tools	Signal analysis (loudness, motion), not generative
Key dependencies	Python 3.10+, FFmpeg, Node 18+, an AI coding assistant; optional generation providers	Node.js; rendering via headless browser; optional Lambda for scale	Python install; FFmpeg-based processing
License	AGPL-3.0	Source-available Remotion license (free for individuals and small teams; companies license it)	Unlicense (public domain)
Stars	37,503 as observed on July 12, 2026	53,082 as observed on July 14, 2026	4,538 as observed on July 14, 2026

Star counts are dated snapshots and — note the spread — mostly reflect how broad each project's audience is, not how good it is at your job. auto-editor's smaller number reflects a narrower, well-served niche.

OpenMontage: the workflow is the product

OpenMontage's pitch is that video production is a process problem, not a generation problem. Clips from a text-to-video model don't make a video; scripts, shot plans, asset sourcing, narration, captions, music, and composition do. So OpenMontage packages that process as something an AI coding assistant (Claude Code, Cursor, Copilot, Windsurf, or Codex) can execute: 12 production pipelines — animated explainers, cinematic trailers, documentary montages from free archives, talking-head videos, podcast repurposing, and others — backed by 50+ tools and a library of several hundred agent skills.

Two design choices matter for evaluating it:

There's a no-cost path. Piper TTS for narration, archive.org and free stock libraries (Pexels, Pixabay, Unsplash) for footage, FFmpeg for composition. You can produce complete videos without a single paid API key.
The ceiling depends on providers. The impressive end of its range — generated footage via Kling, Runway, Google Veo, or local models like WAN and Hunyuan, voices via ElevenLabs, music via Suno — requires accounts, keys, and in the local-model case, serious GPU hardware. Output quality tracks whichever providers you configure.

The corresponding caution: a system spanning 12 pipelines and dozens of provider integrations has a very large surface, and no one — including us, more on that below — has verified all of it. Treat each pipeline you care about as something to test yourself, starting from the free path. Note also the AGPL-3.0 license if you plan to build a service around it.

Interestingly, OpenMontage doesn't compete with Remotion — it lists Remotion as one of its composition engines. That tells you where the layers sit: OpenMontage orchestrates; Remotion renders.

Remotion: video as a build artifact

Remotion asks a different question: what if a video were a React component? You define compositions in TypeScript — frames as a function of time and props — preview them in Remotion Studio, and render to MP4 via a headless browser, locally or on serverless infrastructure (its Lambda renderer) when you need hundreds of variants.
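
The "frames as a function of time and props" idea is independent of React, so it can be illustrated in a few lines of Python. This is a conceptual sketch only: Remotion itself is a React/TypeScript framework, and the names Props, linear_progress, and render_frame are hypothetical, not part of its API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Props:
    """Data fed into the template; field names are hypothetical."""
    customer: str
    fade_in_frames: int = 30


def linear_progress(frame: int, start: int, end: int) -> float:
    """Map a frame number to 0.0..1.0 linearly, clamped outside [start, end]."""
    if end <= start:
        raise ValueError("end must be greater than start")
    t = (frame - start) / (end - start)
    return max(0.0, min(1.0, t))


def render_frame(frame: int, props: Props) -> dict:
    """Describe one frame as plain data: same inputs give the same frame."""
    return {
        "text": f"Hello, {props.customer}",
        "opacity": linear_progress(frame, 0, props.fade_in_frames),
    }


for frame in (0, 15, 30, 90):
    print(frame, render_frame(frame, Props(customer="Ada")))
```

Because each frame depends only on the frame number and the props, rendering hundreds of variants is a matter of swapping the data, which is the property the rest of this section relies on.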

That makes Remotion the right shape for a specific class of problems:

Data-driven video. Personalized recap videos, per-customer clips, market summaries — anything where the video is a template plus data.
Repeatable brand output. When the same intro, lower-thirds, and animation system must render identically every week, code beats hand animation.
Video inside product. Remotion's player component embeds compositions in web apps directly.

Two clarifications the marketing noise tends to bury. First, Remotion is not an AI video editor. There's no model inside; it renders exactly what your code says. It has become a favorite target for AI coding agents — an LLM writes React well, so "agent writes Remotion code" is a natural pipeline, and the project has leaned into this with agent-facing documentation. But the intelligence is in whatever drives it. Second, it's source-available, not OSI open source: free for individuals and small teams, with companies required to buy a license. For most readers that's fine; for some procurement processes it isn't, so check early.

auto-editor: one job, done well

auto-editor is the least glamorous of the three and, for its audience, the fastest payoff. Point it at a recording and it removes the parts where nothing happens — by audio loudness analysis by default, with motion-based detection as an alternative — adding configurable margins around cuts so the result doesn't feel chopped.
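
A loudness threshold with margins can be sketched in plain Python. This is not auto-editor's implementation: the per-frame loudness values, the threshold, and the function names are assumptions chosen to show the idea of keeping loud frames plus a little padding on either side.

```python
def keep_mask(loudness: list[float], threshold: float, margin: int) -> list[bool]:
    """Mark frames to keep: loud frames plus `margin` frames on each side."""
    n = len(loudness)
    keep = [False] * n
    for i, level in enumerate(loudness):
        if level >= threshold:
            for j in range(max(0, i - margin), min(n, i + margin + 1)):
                keep[j] = True
    return keep


def segments(keep: list[bool]) -> list[tuple[int, int]]:
    """Collapse a keep mask into half-open (start, end) frame ranges."""
    out = []
    start = None
    for i, flag in enumerate(keep):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(keep)))
    return out


# Made-up loudness values: speech at frames 3-4 and 9, silence elsewhere.
levels = [0.0, 0.0, 0.0, 0.9, 0.8, 0.0, 0.0, 0.0, 0.0, 0.7, 0.0, 0.0]
print(segments(keep_mask(levels, threshold=0.5, margin=1)))  # [(2, 6), (8, 11)]
```

The list of ranges is the useful intermediate form: it can drive a render of the trimmed file or be written out as a timeline for a human editor to refine.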

The feature that makes it a professional tool rather than a gadget: it exports timelines, not just files. Alongside rendering a trimmed MP4, it can produce project files for Premiere Pro, DaVinci Resolve, Final Cut Pro, ShotCut, and Kdenlive. The realistic workflow is auto-editor for the rough cut, a human in a real editor for the fine cut — hours of silence-scrubbing gone, editorial judgment retained.

It's a Python CLI, actively maintained (release 31.2.0 shipped in July 2026, and recent commits added an Apple SpeechAnalyzer transcription backend on macOS 26), and released into the public domain. What it is not: generative, creative, or a replacement for editing. It won't choose your best take. "Narrow and practical" is the whole point.

Choosing by workflow

Marketer or founder with briefs, not footage → OpenMontage, driven through an AI coding assistant. Start with the free-provider path and judge results before paying for generation APIs.
Developer automating video output → Remotion. If you later want an agent producing the videos, the agent writes Remotion code — the two compose.
Anyone who records screencasts, lectures, or podcasts → auto-editor, possibly as a permanent part of your pipeline. It coexists with either of the others.
Team building an internal video service → likely Remotion for rendering plus auto-editor for ingest cleanup, with OpenMontage worth studying for its skill/pipeline structure. Mind the licensing on both sides: AGPL-3.0 for OpenMontage, and Remotion's company license for Remotion.

What we've verified ourselves — and what we haven't

Most of this article summarizes upstream documentation and repository state as of July 14, 2026. One part is first-hand: ClawMama runs OpenMontage as a hosted chat agent — the AI Video Production Studio — and we verified a representative local subset of it before listing (our Fidelity B rating). Concretely: installing the OpenMontage skill library surfaced 79 unique skills including the create-video and FFmpeg skills, and the agent produced a five-second 1280×720 H.264 MP4 that we checked with ffprobe.
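
For readers who want to run a similar check on their own output, here is a minimal sketch of an ffprobe-based assertion in Python. It is not the script used in our verification. The sample dictionary is hand-written to mirror the shape of ffprobe's JSON output and is not a captured result, and the helper names are hypothetical.

```python
import json
import subprocess


def probe_video(path: str) -> dict:
    """Ask ffprobe for the first video stream's codec, size, and duration as JSON."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-select_streams", "v:0",
            "-show_entries", "stream=codec_name,width,height,duration",
            "-of", "json",
            path,
        ],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def matches_expectation(probe: dict, codec: str, width: int, height: int) -> bool:
    """Compare the first reported stream with the expected codec and frame size."""
    streams = probe.get("streams", [])
    if not streams:
        return False
    first = streams[0]
    return (first.get("codec_name"), first.get("width"), first.get("height")) == (
        codec, width, height,
    )


# Hand-written sample, so the check runs without ffprobe or a video file.
sample = {"streams": [{"codec_name": "h264", "width": 1280, "height": 720,
                       "duration": "5.000000"}]}
print(matches_expectation(sample, "h264", 1280, 720))  # True
```

In real use, pass the result of probe_video("output.mp4") to matches_expectation instead of the sample.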

Equally important is what that verification does not cover: it doesn't prove the complete OpenMontage pipeline, none of the provider-dependent video generation (Kling, Runway, Veo, and the rest) was executed, and the verified output was a simple local FFmpeg composition without audio. If a hosted OpenMontage matters to you, that's the honest current boundary. Remotion and auto-editor are tracked on our radar as qualified projects in the same category; neither has been through ClawMama verification, and nothing here claims otherwise.

FAQ

Which of these actually uses AI to make videos?
Only OpenMontage has AI at the center — an agent plans the production and can call generative providers for footage, voices, and music. Remotion renders whatever code describes, and auto-editor uses signal analysis (loudness, motion), not generative models. Both of the latter are frequently combined with AI, but don't ship it.

Can I use these together?
Yes, and the layering is natural: auto-editor cleans recorded footage, Remotion renders code-defined compositions, and OpenMontage orchestrates end-to-end production — it even uses Remotion internally as a composition engine.

Do I need paid APIs to get anything out of OpenMontage?
No. Its free path — Piper TTS, archive.org, and free stock libraries, composed with FFmpeg — produces complete videos. Paid providers raise the ceiling on generated footage and voice quality, and local generation models need substantial GPU hardware.

Is Remotion free for commercial use?
It depends on who you are. Remotion is source-available under its own license: free for individuals and small teams, while companies above the threshold need a paid license. Check the current terms in the repository before building a business on it.

What's the fastest win if I just record talks and screencasts?
auto-editor, without much contest. One command removes silence, and if you want human polish afterwards, export a Premiere, Resolve, or Final Cut timeline instead of a rendered file and fine-cut from there.

Open-Source AI Trading Research Tools: Dexter vs Vibe-Trading

Eli — Tue, 14 Jul 2026 11:50:34 +0000

Originally published on the ClawMama Blog. This DEV.to edition uses the same comparison methodology and links back to the canonical article.

If you search for "open-source AI trading agent," two projects dominate the conversation right now: Dexter and Vibe-Trading. They get lumped together because both are agents that talk about markets. In practice they occupy different points on the research spectrum, need different credentials, and carry different risks.

One framing before anything else: neither tool makes money for you, and neither should be trusted to trade for you. Both are research instruments. The interesting question is which kind of research they're built for.

The short answer

Choose Dexter if your job is understanding a company or a market question: reading fundamentals, gathering evidence, building a thesis you can check. It's a focused deep-research agent with a deliberately narrow footprint. It does not backtest and does not trade.
Choose Vibe-Trading if your job is testing strategy ideas against history: turning a hypothesis into code, running it through a backtest engine, and inspecting drawdown, turnover, and factor exposure. It's a much larger system, and that breadth cuts both ways.
Choose neither if what you actually want is a system that autonomously trades your money. Dexter doesn't do it at all; Vibe-Trading treats live execution as an experimental, heavily gated add-on — and in our view that gating is the correct default, not a limitation to work around.

Side by side

	Dexter	Vibe-Trading
Core job	Deep financial research and evidence gathering	Research, strategy generation, and backtesting
Language / runtime	TypeScript, runs with Bun	Python 3.11+, pip / Docker / MCP
Architecture	Single agent with task planning and self-reflection	Multi-agent (LangChain/LangGraph), skill library, MCP server, web UI + terminal UI
Market data	Financial Datasets API (key required)	~19 free sources with fallback chains, plus key-gated providers (Alpha Vantage, Futu, Longbridge, Tushare…)
LLM credentials	OpenAI required; Anthropic, Google, xAI, OpenRouter optional; Ollama for local	Many providers (OpenRouter, OpenAI, DeepSeek, Gemini, Groq…); Ollama for local
Backtesting	None	Multi-market engine; Sharpe, max drawdown, walk-forward, Monte Carlo; 461-factor "Alpha Zoo"
Live execution	None, by design	Experimental broker connectors, mandate-gated, kill switch
Auditability	Tool calls logged to a scratchpad file	Run cards, audit ledger, reproducible backtest configs
License	MIT (per its README)	MIT
GitHub Stars	27,368 as observed on July 14, 2026	22,020 as observed on July 14, 2026

Stars are a point-in-time popularity signal, not a quality ranking — both projects are young and moving quickly, so treat those numbers as a snapshot, nothing more.

Dexter: a research analyst, not a trading system

Dexter describes itself as "an autonomous agent for deep financial research," and the description is accurate in both directions: it goes deep, and it stops at research.

You run it from a terminal (bun start after cloning), give it a question — "analyze this company's margins over five years," "what would invalidate a bull case on this stock" — and it plans a sequence of steps, pulls data, reflects on gaps, and iterates until it has an answer. The design centers on task planning and self-reflection rather than a fixed pipeline, which matters for research questions where the second question depends on the first answer.

Three things stand out for anyone evaluating it seriously:

Evidence comes from defined sources. Fundamentals and financial statements come through the Financial Datasets API; web evidence comes through Exa (or Tavily as fallback). That's a short, inspectable list — you know where every claim originated.
The audit trail is a file. Every tool call gets logged to a scratchpad, so you can reconstruct what the agent looked at and when. For financial research, being able to check "did it actually read the 10-K data or hallucinate it?" is worth more than an impressive-sounding summary.
Its disclaimers are unusually direct. The README states the project is for educational and informational purposes, not for real trading, and that outputs may be incorrect or out of date. Take that at face value.
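
The value of a file-based audit trail is easiest to see in code. The sketch below logs each tool call as one JSON line and reads the log back. It illustrates the pattern only: Dexter is a TypeScript project, its actual scratchpad format is not reproduced here, and logged_call, read_log, and the fake tool with its placeholder number are all hypothetical.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Any, Callable


def logged_call(scratchpad: Path, tool: str, fn: Callable[..., Any], **kwargs: Any) -> Any:
    """Run a tool function and append one JSON line describing the call."""
    result = fn(**kwargs)
    entry = {"ts": time.time(), "tool": tool, "args": kwargs, "result": result}
    with scratchpad.open("a", encoding="utf-8") as fh:
        # default=str keeps the log writable even for non-JSON values.
        fh.write(json.dumps(entry, default=str) + "\n")
    return result


def read_log(scratchpad: Path) -> list[dict]:
    """Load every logged call back for review."""
    with scratchpad.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def fake_fundamentals(ticker: str) -> dict:
    """Stand-in for a data tool; the number is a placeholder, not real data."""
    return {"ticker": ticker, "gross_margin": 0.41}


with tempfile.TemporaryDirectory() as tmp:
    pad = Path(tmp) / "scratchpad.jsonl"
    logged_call(pad, "fundamentals", fake_fundamentals, ticker="XYZ")
    for entry in read_log(pad):
        print(entry["tool"], entry["args"], entry["result"])
```

With a log like this, "did it actually read the data?" becomes a question you answer by opening a file rather than by trusting the summary.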

The credential footprint is small but not zero: you need an OpenAI key at minimum, plus a Financial Datasets key and a search key for the tool to be genuinely useful. None of those credentials can move money, which keeps Dexter's worst-case failure mode at "wrong analysis" rather than "wrong order."

What Dexter doesn't do: backtesting, portfolio tracking, factor analysis, order routing, or anything time-series-quantitative. If your workflow lives in those areas, it's the wrong tool — not a lesser one.

Vibe-Trading: a quant research platform with agents inside

Vibe-Trading, from the HKUDS group, is a different scale of project. Where Dexter is one agent with a handful of tools, Vibe-Trading is a platform: a multi-agent layer built on LangChain/LangGraph, a library of finance skills, a 461-factor library (the "Alpha Zoo," drawing on Qlib's 158 factors, the GTJA 191 set, Kakushadze's 101, academic proxies, and SEC fundamentals), a React web interface, a terminal interface, and an MCP server so external clients can call its tools.

The workflow it's built around: describe a strategy idea in natural language → the system generates strategy code → the backtest engine runs it against historical data → you get metrics (Sharpe, max drawdown, turnover), benchmark comparisons, and trade-level attribution. It supports walk-forward analysis, Monte Carlo confidence intervals, and bootstrap validation — the standard toolkit for asking "is this result luck?"
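
Two of those metrics are simple enough to compute by hand, which is a useful sanity check on any engine's output. The sketch below uses the textbook definitions of annualized Sharpe ratio and maximum drawdown. It is not Vibe-Trading's code, the equity curve is made up, and it assumes a strictly positive equity curve and 252 trading periods per year unless told otherwise.

```python
import math
import statistics


def sharpe_ratio(returns: list[float], periods_per_year: int = 252,
                 risk_free: float = 0.0) -> float:
    """Annualized Sharpe ratio from per-period simple returns."""
    excess = [r - risk_free / periods_per_year for r in returns]
    spread = statistics.stdev(excess)  # sample standard deviation
    if spread == 0:
        raise ValueError("returns have zero variance")
    return statistics.mean(excess) / spread * math.sqrt(periods_per_year)


def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    if not equity:
        raise ValueError("equity curve is empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


# Made-up equity curve: peaks at 120, falls to 90, then recovers.
equity = [100.0, 120.0, 90.0, 110.0, 130.0]
print(max_drawdown(equity))  # 0.25
```

If an engine reports a drawdown that differs from this calculation on the same equity curve, the difference is usually a definitional choice worth understanding before trusting the rest of the report.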

Details that suggest the maintainers take research integrity seriously:

The factor library ships with a lookahead-guard test suite and an AST purity gate intended to catch forward-biased factors before they contaminate results.
Backtests produce reproducible configurations — the same fixed data and config should give the same metrics, which is the minimum bar for trusting any backtest.
Recent commit history shows active correctness work on real edge cases: negative-final-equity metric crashes, silently truncated historical data windows, realized turnover being computed but dropped. This is the unglamorous maintenance that separates research software from demos.
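
One common way to test for forward bias is behavioral: compute a factor on the full series, recompute it with later bars hidden, and confirm that no past value changes. The sketch below shows that technique on two toy factors. It is not Vibe-Trading's test suite or its AST gate, and every name in it is hypothetical.

```python
from typing import Callable, Sequence

Factor = Callable[[Sequence[float]], list[float]]


def momentum(prices: Sequence[float]) -> list[float]:
    """Causal factor: change versus the previous bar (0.0 for the first bar)."""
    return [0.0] + [prices[i] - prices[i - 1] for i in range(1, len(prices))]


def leaky(prices: Sequence[float]) -> list[float]:
    """Forward-biased factor: peeks at the next bar (0.0 for the last bar)."""
    return [prices[i + 1] - prices[i] for i in range(len(prices) - 1)] + [0.0]


def has_lookahead(factor: Factor, prices: Sequence[float]) -> bool:
    """True if any value changes once later bars are hidden from the factor."""
    full = factor(prices)
    for t in range(len(prices)):
        truncated = factor(prices[: t + 1])
        if truncated[t] != full[t]:
            return True
    return False


prices = [10.0, 11.0, 9.0, 12.0]  # made-up series
print(has_lookahead(momentum, prices), has_lookahead(leaky, prices))  # False True
```

A check like this only catches bias on the data it is run against, which is presumably why a static gate is paired with it upstream.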

Data and credentials are more involved than Dexter's. The free tier covers a lot — A-shares via mootdx or AKShare, US/HK equities via yfinance, crypto via OKX/CCXT — with automatic fallback chains when a source throttles. Key-gated providers (Alpha Vantage, Futu, Longbridge, Tushare, and others) extend coverage but each adds credentials to manage. If you go anywhere near the broker connectors, you're storing brokerage credentials, and the calculus changes entirely (more on that below).
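
A fallback chain is a small pattern, and sketching it shows why the source that answered should always be recorded. The code below is a generic illustration, not Vibe-Trading's implementation. The fetchers are stubs with placeholder prices, and the names are hypothetical.

```python
from typing import Callable

Fetcher = Callable[[str], list[float]]


def fetch_with_fallback(symbol: str,
                        sources: list[tuple[str, Fetcher]]) -> tuple[str, list[float]]:
    """Try each source in order; return (source_name, data) from the first success."""
    errors = []
    for name, fetch in sources:
        try:
            data = fetch(symbol)
        except Exception as exc:  # a real chain would catch narrower errors
            errors.append(f"{name}: {exc}")
            continue
        if data:
            return name, data
        errors.append(f"{name}: empty result")
    raise RuntimeError("all sources failed: " + "; ".join(errors))


def throttled(symbol: str) -> list[float]:
    """Stub that simulates a rate-limited provider."""
    raise TimeoutError("rate limited")


def backup(symbol: str) -> list[float]:
    """Stub that returns placeholder closing prices."""
    return [101.2, 102.8]


print(fetch_with_fallback("XYZ", [("primary", throttled), ("backup", backup)]))
# ('backup', [101.2, 102.8])
```

Returning the source name alongside the data makes it possible to notice when a run quietly switched providers, which matters if the providers adjust prices differently.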

On live execution: Vibe-Trading does include broker connectors — some read-only (IBKR via local gateway, Trading 212), some capable of paper trading and order placement behind an explicit "mandate" (symbol universe, size caps, exposure limits, daily limits), a filesystem kill switch, and an audit ledger. The project labels all of this experimental. Our recommendation is simpler: use Vibe-Trading as a research and backtesting system, and treat the execution layer as out of scope until you have independent reasons to trust it with real credentials.
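
To show what "mandate-gated" means in practice, here is a hypothetical pre-trade check in Python. It places no orders and is not Vibe-Trading's code. The field names, the limits, and the convention that the mere presence of a kill-switch file blocks everything are assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Mandate:
    """Hypothetical limits an operator sets before any order is considered."""
    symbols: frozenset[str]
    max_order_notional: float
    max_gross_exposure: float
    max_orders_per_day: int


def check_order(mandate: Mandate, symbol: str, notional: float,
                current_exposure: float, orders_today: int,
                kill_switch: Path) -> list[str]:
    """Return the reasons to reject; an empty list means every gate passed."""
    reasons = []
    if kill_switch.exists():
        reasons.append("kill switch file present")
    if symbol not in mandate.symbols:
        reasons.append("symbol outside mandate universe")
    if notional > mandate.max_order_notional:
        reasons.append("order exceeds size cap")
    if current_exposure + notional > mandate.max_gross_exposure:
        reasons.append("order would exceed exposure limit")
    if orders_today >= mandate.max_orders_per_day:
        reasons.append("daily order limit reached")
    return reasons


mandate = Mandate(frozenset({"AAA", "BBB"}), 1_000.0, 5_000.0, 3)
switch = Path("KILL_SWITCH_EXAMPLE_DOES_NOT_EXIST")
print(check_order(mandate, "CCC", 2_000.0, 4_000.0, 3, switch))
```

Returning every failed gate, rather than stopping at the first, gives the audit trail a complete picture of why an order was refused.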

The risk section most comparisons skip

Both tools inherit every classic failure mode of quantitative and AI-assisted research. If you use either, these are the ones that actually bite:

Stale or wrong data. Free data sources throttle, gap, and revise. Fallback chains (Vibe-Trading) reduce outages but can silently switch you between sources with different adjustment conventions. Paid sources (Dexter's Financial Datasets, Vibe-Trading's key-gated providers) reduce but don't eliminate this. Always check the dates on the data behind a conclusion.
Overfitting. A 461-factor library is a wonderful tool and an overfitting machine. Test enough factors against one historical period and some will look brilliant by chance. Walk-forward analysis and out-of-sample discipline help; nothing cures it.
Transaction costs and slippage. A backtest that ignores costs is fiction. Vibe-Trading surfaces turnover in its metrics precisely because high-turnover strategies die on costs — read that number.
Survivorship bias. Historical universes built from today's listed companies exclude the ones that failed. Neither tool can fix a biased universe for you; it's a property of the data you feed in.
LLM-specific failure. An agent can produce a confident, well-written thesis anchored on a misread number. Dexter's scratchpad and Vibe-Trading's run cards exist so you can verify; use them, don't skim them.
Credentials and autonomous execution. The single largest risk decision is whether an agentic system holds credentials that can place orders. Research-only API keys bound your downside at bad analysis. Brokerage credentials do not. Keep those two credential sets on opposite sides of a wall.
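
The overfitting point can be put in numbers. If 461 factors with no real edge are each tested at a 5% significance level, roughly 23 will pass by chance. The sketch below computes that figure, the chance of at least one false positive, and the Bonferroni-adjusted threshold. The second figure assumes the tests are independent, which real factors are not, so read it as a rough illustration rather than a property of any specific library.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of no-edge tests that pass at significance level alpha."""
    return n_tests * alpha


def prob_at_least_one(n_tests: int, alpha: float) -> float:
    """Chance that at least one of n independent no-edge tests passes."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(n_tests: int, alpha: float) -> float:
    """Per-test threshold that caps the family-wise error rate at alpha."""
    return alpha / n_tests


n, alpha = 461, 0.05
print(f"expected false positives: {expected_false_positives(n, alpha):.2f}")
print(f"chance of at least one:   {prob_at_least_one(n, alpha):.6f}")
print(f"Bonferroni threshold:     {bonferroni_alpha(n, alpha):.6f}")
```

The adjusted threshold is deliberately conservative. The practical takeaway is simply that a factor which looks good after hundreds of tries needs a much higher bar than one tested alone.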

Which should you choose?

You're an individual investor who wants better company research. Dexter. Small setup, inspectable sources, an audit trail, and no temptation to hand it an order button it doesn't have.
You're quant-curious and want to test ideas properly. Vibe-Trading. The backtest engine, validation tooling, and factor library are the point. Run it in research mode with free data sources first; add key-gated data only when a specific gap justifies it.
You want both fundamental research and strategy testing. They coexist without conflict — Dexter for the "should I care about this company" question, Vibe-Trading for the "does this rule have historical evidence" question. The credential sets don't overlap much beyond an LLM key.
You want an unattended money-making bot. Neither, and be skeptical of anything that claims to be one. The honest versions of these tools are labeled "research."

Methodology and limits of this comparison

This comparison is based on both projects' public repositories, READMEs, and commit history as of July 14, 2026, plus point-in-time GitHub metrics recorded the same day. We have not independently benchmarked research quality or backtest accuracy between the two systems, and we haven't validated Vibe-Trading's broker-connector safety mechanisms — we describe them as documented upstream. Both projects move fast; verify current capabilities against the repositories before relying on details here.

Where this intersects with what we build: ClawMama's business radar tracks both projects in its trading-research category. Vibe-Trading currently sits as a qualified challenger — meaning it passed our activity and relevance gate, but hosted fidelity, credential handling, and safety boundaries have not yet been validated for chat use. Neither Dexter nor Vibe-Trading is a verified ClawMama Catalog Agent today. If that changes, the coverage will stay where this article stands: research and backtesting first, no autonomous live trading, and no pretending a backtest is investment proof.

FAQ

Can either tool trade for me automatically?
Dexter cannot — it has no execution capability. Vibe-Trading ships experimental broker connectors behind explicit mandates and a kill switch, but the project itself labels live trading experimental. We'd treat both as research-only and keep brokerage credentials out of any agent's environment.

Do I need paid data to get value from these tools?
For Vibe-Trading, no — its default fallback chains use free sources (yfinance, AKShare, mootdx, CCXT), and that's enough for learning and most strategy prototyping. For Dexter, a Financial Datasets API key is effectively required for the fundamentals work it's designed for, alongside an OpenAI key.

Is a good backtest evidence a strategy will make money?
No. A good backtest is evidence a strategy would have worked on one specific historical dataset, under the cost assumptions you chose. Overfitting, regime change, transaction costs, and survivorship bias all sit between a green backtest and live profitability. Tools like walk-forward analysis narrow the gap; nothing closes it.
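
To make the walk-forward idea concrete, here is a minimal sketch of how rolling train/test windows can be generated. It is a generic illustration, not how Vibe-Trading implements its validation tooling; the function name and the window sizes are hypothetical.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_obs: int, train_size: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges that roll forward through time."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # slide forward by one test window


# Hypothetical sizes: 10 observations, fit on 4, evaluate on the next 2.
splits = list(walk_forward_splits(n_obs=10, train_size=4, test_size=2))
for train, test in splits:
    print(f"train {train.start}-{train.stop - 1}  test {test.start}-{test.stop - 1}")
```

Each test window starts where its training window ends, so a rule is always scored on data it was not fitted to. That narrows the gap between backtest and live results without closing it.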

Which is easier to set up?
Dexter, comfortably — clone, add three or four keys, bun start. Vibe-Trading is a pip install or Docker away for basics, but its surface area (web UI, MCP server, data source configuration, broker connectors) means more decisions before you're productive.

Are these tools related to ClawMama?
Not as products. Both are independent open-source projects we track and compare because readers evaluate them for the same jobs. Vibe-Trading is a qualified challenger on our radar; neither project is a verified ClawMama Catalog Agent as of this writing.

Anyone Can Compete for the $100,000 Prize Pool: An OKX.AI Hackathon Entry Guide + 10 Prize-Worthy Ideas

Eli — Sat, 11 Jul 2026 01:37:35 +0000

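Only a few days remain before the deadline, but this competition is still worth a push: the OKX.AI Genesis Hackathon is looking for Agent Service Providers (ASPs) that solve real problems and earn real usage. The total prize pool is $100,000, and submissions close on July 17, 2026 at 23:59 UTC.

What makes it more exciting is that this is not a contest open only to crypto trading tools or large development teams. The organizers state explicitly that entries may come from the crypto space or have nothing to do with crypto at all. If you can turn your knowledge, workflow, tools, data, or services into a practical AI service that others can call, you have a chance to enter.

In other words: knowing how to code certainly helps, but an idea that is specific enough and that someone actually needs may matter more than "a bot that can chat about anything."

Important: prize money is not paid for registering, and winning is not guaranteed. A project must meet the event requirements, pass OKX.AI review, and go live successfully before the organizers judge it against the criteria for each award. This article is not investment advice; critical actions such as wallet connection, payment, and trading should be confirmed by you personally.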

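First, understand what this hackathon is actually looking for

OKX.AI wants participants to build an Agent Service Provider, or ASP for short.

It is not a chat window that only answers questions, but an AI service that completes a well-defined task. For example:

take in a dataset and produce verifiable analysis results;
filter and compare multiple options against a user's conditions;
continuously monitor a change and deliver a report when conditions are met;
provide other Agents with pay-per-call data, tools, or professional judgment;
turn a process that used to require repeated manual work into a deliverable service.

X Layer's official statement of the goal is blunt: they want Agents that solve real problems and generate real usage, not "yet another chatbot." You can start by reading the official OKX.AI Genesis Hackathon event page, then check X Layer's official countdown announcement.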

How is the prize money split? There is more than one way to win

The officially announced total prize pool is $100,000. The main awards include:

Best Product: 10,000 / 6,000 / 4,000 USDT
Creative Genius: 10,000 / 6,000 / 4,000 USDT
Revenue Rocket: 10,000 / 6,000 / 4,000 USDT
Finance Copilot: 3 winners, 2,500 USDT each
Software Utility: 3 winners, 2,500 USDT each
Lifestyle Companion: 3 winners, 2,500 USDT each
Artistic Excellence: 3 winners, 2,500 USDT each
Social Buzz: 10 winners, 1,000 USDT each
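
As a quick arithmetic check, the listed awards can be summed to confirm they account for the whole pool. This is a minimal sketch using only the amounts listed above; it assumes 1 USDT is counted as 1 US dollar, which the announcement implies but which is stated here as an assumption.

```python
# Tiered awards: first / second / third place, in USDT.
tiered = {
    "Best Product": [10_000, 6_000, 4_000],
    "Creative Genius": [10_000, 6_000, 4_000],
    "Revenue Rocket": [10_000, 6_000, 4_000],
}

# Flat awards: (number of winners, USDT per winner).
flat = {
    "Finance Copilot": (3, 2_500),
    "Software Utility": (3, 2_500),
    "Lifestyle Companion": (3, 2_500),
    "Artistic Excellence": (3, 2_500),
    "Social Buzz": (10, 1_000),
}

tiered_total = sum(sum(places) for places in tiered.values())
flat_total = sum(winners * amount for winners, amount in flat.values())
pool_total = tiered_total + flat_total

print(tiered_total, flat_total, pool_total)  # 60000 40000 100000
```

The three headline awards take 60% of the pool, while the category and Social Buzz awards spread the remaining 40% across 22 winners.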

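Each winner may also receive official OKX exposure and follow-up collaboration opportunities.

This award design sends an important signal: you do not have to build the product with the most features or the most complex technology. You can also pick a primary angle from creativity, product experience, real revenue, a vertical category, or community reach.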

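How an ordinary person can enter: break the competition into 4 steps

Step 1: Start from a real problem

Don't start by asking "how many models can I integrate?" Ask first:

Who will use it?
How do they solve this problem today?
What concrete result can the Agent deliver on each call?
Why would a user call it again, or even pay for a single call?

The topics best suited to a short build window usually share three traits: the need is clear, the inputs and outputs are clear, and it can be demonstrated clearly within 90 seconds.

Step 2: Build a working ASP and submit it to OKX.AI

Under the official rules, participants need to build an ASP that solves a clearly defined real-world problem and submit it for listing through OKX.AI. The project must pass internal review and go live successfully to remain eligible.

So don't spend all your time on a grand roadmap. Build a small but complete version first: what the input is, what the Agent does, what the output is, and how the result is verified.

For background, start with the OKX Onchain OS developer documentation and the X Layer developer documentation.

Step 3: Post your entry introduction and demo on X

The organizers require entries to be posted with #OKXAI, introducing your ASP, explaining the use case, and providing a clear demo or walkthrough. The demo should not exceed 90 seconds.

An effective demo can follow this pacing:

0–15 seconds: what the real-world problem is;
15–35 seconds: what the user submits;
35–70 seconds: how the ASP completes the task;
70–90 seconds: the output, the real value, and the next step.

Step 4: Submit the event form before the deadline

You need to submit the form through the entry point on the official event page before July 17, 23:59 UTC. The form must include your ASP information and the link to your entry post on X.

Before submitting, go back to the official event page and check the latest requirements. Rules, entry points, or timing may change; the official page is authoritative.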

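10 app ideas I think have prize potential

What follows is not a "guaranteed-win question bank." These are ten directions worth validating quickly, selected against what the organizers emphasize: real problems, product completeness, creativity, revenue, and vertical categories.

1. Cross-border freelance task quoting Agent

The user enters a task description, and the Agent breaks down the deliverables, estimates time, identifies risks, and generates a quote with acceptance criteria. It can also rewrite vague requirements into a scope of work both sides can confirm.

Opportunity: strong real commercial demand, and a good way to show how an Agent packages professional work into a tradable service.

2. "Bad review first aid" Agent for small merchants

It reads customer reviews and order context, distinguishes product, logistics, communication, or expectation problems, generates replies in different tones along with remediation plans, and keeps summarizing the most frequent causes.

Opportunity: real test users are easy to find, and the results can be shown within 90 seconds.

3. API and data quality health-check Agent

A developer submits API documentation or an endpoint address, and the Agent checks response stability, field changes, latency, error codes, and consistency with the documentation, then outputs an actionable report.

Opportunity: a particularly good fit for Software Utility, and it can also be designed as a pay-per-call machine service.

4. Smart contract "plain-language risk sheet"

It converts public contract information into a summary ordinary users can understand, covering permissions, asset flows, upgradeability, and abnormal risks, and states explicitly which conclusions cannot yet be confirmed.

Opportunity: the user value is clear, but it must stick to risk disclosure, never promise safety, and never stand in for an audit.

5. Meeting commitment tracking Agent

It reads meeting minutes and extracts "who delivers what by when," dependencies, and open issues; before the next meeting it automatically generates a follow-up checklist.

Opportunity: a non-crypto scenario with a broad audience that easily leads to ongoing use.

6. Creator licensing and asset provenance Agent

A creator uploads works or an asset list, and the Agent organizes sources, license scope, expiry dates, and distribution-channel restrictions, and generates a verification report for each commercial use.

Opportunity: combines Artistic Excellence with practical value, rather than just generating images.

7. "One fridge of ingredients" family meal-planning Agent

The user photographs or enters the ingredients on hand, allergy information, budget, and headcount, and the Agent generates a multi-day menu, the shopping gap, and a cooking order that reduces waste.

Opportunity: strong demo appeal for Lifestyle Companion, with high usage frequency.

8. Community fact-checking collaboration Agent

For a public claim, the Agent splits out the verifiable assertions, looks for primary sources, labels evidence levels, and outputs a structured result of "confirmed, doubtful, or unverifiable."

Opportunity: strong social value and explainability; the point is to show the source chain, not to pretend to be all-knowing.

9. E-commerce return-reason diagnosis Agent

It reads return notes, customer-service records, and product information, classifies the problem as sizing, description mismatch, quality, logistics, or usability obstacles, and proposes improvements to the product page and operations.

Opportunity: value can be measured directly as "reduced return rate," which suits recruiting real merchants for trials.

10. Agent service acceptance inspector

After an Agent claims to have completed a task, this service checks the result against pre-agreed standards: whether the files are complete, whether the data is reproducible, whether the links work, and whether the format is compliant, then outputs a pass or the reasons for rejection.

Opportunity: it serves not a single industry but the trust layer of the whole Agent economy, and it is well suited to being composed with other ASPs.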

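To raise your odds of winning, don't stop at a demo

Pick one award as your primary target

Going for Best Product: polish the end-to-end flow, error handling, and user experience;
Going for Creative Genius: offer a new service that is surprising yet sensible;
Going for Revenue Rocket: get real orders, revenue, and positive reviews as soon as possible;
Going for a vertical category award: make the project a close match for the category;
Going for Social Buzz: tell the product story clearly, and get real users involved in testing and sharing.

Find 3–10 real users instead of testing it yourself over and over

Invite target users to complete one real task. Record where they hesitate, where the output is not credible, and which results are most useful. Real feedback is usually worth more than adding another ten features.

Make the output verifiable

What judges see should not be just "the Agent says it finished." Ideally, provide sources, process logs, structured results, before-and-after comparisons, or an explicit acceptance standard.

State the safety boundaries clearly

If wallets, payments, trading, authorization, or sensitive data are involved, make clear which steps the Agent prepares and which steps the user must confirm. A trustworthy boundary is itself a product capability.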

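No existing Agent? You can still start from a single conversation

For ordinary people, the biggest obstacle is often not a lack of ideas but not knowing how to turn industry experience into an ASP: which scenario to choose, how to narrow the scope, how to design inputs and outputs, and how to prepare a 90-second demo.

ClawMama's OKX.AI OnchainOS Agent comes with the relevant capabilities preconfigured and can help you start from an idea: sorting out your entry direction, choosing among the User / ASP / Evaluator roles, designing an A2A or A2MCP service, and preparing the listing checklist and entry content.

You don't need to deploy an Agent yourself first. You can simply tell it:

"I know restaurant operations / e-commerce / design / finance / education. Please help me turn this experience into an ASP suited to the OKX.AI Genesis Hackathon, and give me a minimum viable version plan."

One last reminder: tools can help you research, design, and prepare materials, but registration, wallet connection, payment, trading, and final submission should be checked and confirmed by you personally.

The window is open and the clock is running. What is worth sprinting for is not the biggest Agent, but the one that can show, in 90 seconds, that it solves a real problem and that someone is willing to use it again.

Resources and registration: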

A Practical Way into the OKX.AI Agent Economy — No Agent Setup Required

Eli — Sat, 11 Jul 2026 00:31:25 +0000

The agent economy is often explained as if everyone already has an agent.

Choose a model. Install a framework. Connect a wallet. Add tools. Deploy a server. Keep it online. Then register it somewhere.

That path works for developers, but it should not be the admission ticket for everyone else.

A more useful starting point is simpler:

What can people do inside an agent economy, and how can they begin before learning to build an agent?

OKX.AI offers a concrete answer. It describes itself as the world's first A2A agent economy: a network where agents can find work, hire services, and settle payments onchain.

The important part is not another collection of AI demos. It is the attempt to build the commercial infrastructure around agent work.

Two marketplaces, two sides of work

OKX.AI connects two marketplaces.

Agent Marketplace

The Agent Marketplace is where a user can discover and hire working agents by capability, price, and onchain reputation.

An agent may provide research, data analysis, content production, onchain intelligence, design, or another professional capability. The marketplace makes those capabilities easier to compare and purchase.

Task Marketplace

The Task Marketplace begins with demand.

A user can assign an agent directly, choose from an automatically matched shortlist, or publish a task so qualified providers can respond.

Together, the two marketplaces create a basic service loop:

A need is described

→ a service is found

→ terms are confirmed

→ work is delivered

→ the result is reviewed

→ payment is settled

→ reputation accumulates

This is a larger idea than “AI can use crypto.” It is an attempt to give agent work a market structure: discovery, identity, payment, escrow, evaluation, and settlement.

You do not need to begin as a developer

OKX.AI defines three roles: User, ASP, and Evaluator.

They matter because operating an agent service is only one way to participate.

User: create useful demand

A User publishes work and hires services.

This is the most accessible role. You can begin with a specific outcome:

compare public onchain activity across several projects;
summarize recent developments in an ecosystem;
monitor a category of public signals;
find a research or content provider;
turn a broad goal into a task with clear acceptance criteria.

Well-defined demand is not a secondary contribution. No service economy works without it.

ASP: provide a service

An ASP, or Agent Service Provider, makes an agent capability available to others.

OKX.AI supports two service models:

A2A (Agent-to-Agent) for complex work where scope, price, and delivery terms may need negotiation. Funds can be held in escrow until the user approves the result.
A2MCP (Agent-to-MCP) for standardized API-like services such as data queries, price feeds, and utility functions. These may be free or paid per call.

This distinction creates room for different kinds of providers.

A researcher might offer a multi-step A2A report. A developer might expose a narrow A2MCP data endpoint. A small studio might provide a repeatable design or content workflow.

The service does not always need to become a full SaaS product first. A clear capability can be made discoverable and paid for inside an agent market.

Evaluator: decide whether the work was completed

An Evaluator helps resolve a hard problem in autonomous work: generating an answer is not always the same as completing the job.

Evaluators check whether delivery matches the agreed requirements and participate in dispute resolution. This gives the economy a quality and trust layer, rather than relying only on an agent's claim that the task is done.

Why x402 matters

A marketplace can help an agent find a service. It still needs a machine-friendly way to pay for it.

That is where x402 becomes important.

The name refers to the HTTP status code 402 Payment Required. x402 turns payment into a flow that software can understand:

An agent requests a service

→ the endpoint returns payment requirements

→ payment is completed within the agent's authorization

→ the service returns the result
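
The four steps above can be sketched as a small in-memory simulation. This is an illustration of the request, 402, pay, retry pattern only: `FakePaidEndpoint`, `AgentWallet`, and `call_with_payment` are hypothetical names, and the real x402 exchange (HTTP headers, signed payment payloads, onchain settlement) is deliberately not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class FakePaidEndpoint:
    """In-memory stand-in for a pay-per-call service (not a real x402 server)."""

    price: int                                   # cost per call, arbitrary units
    issued: int = 0                              # receipts issued so far
    unspent: Set[str] = field(default_factory=set)

    def handle(self, receipt: Optional[str] = None) -> Tuple[int, dict]:
        """Return 402 with payment requirements, or 200 with the result."""
        if receipt in self.unspent:
            self.unspent.discard(receipt)        # each receipt buys exactly one call
            return 200, {"result": "one market-data query"}
        return 402, {"amount": self.price}

    def settle(self, amount: int) -> str:
        """Accept a payment and return a receipt the caller can present."""
        if amount != self.price:
            raise ValueError("payment does not match the quoted amount")
        self.issued += 1
        receipt = f"receipt-{self.issued}"
        self.unspent.add(receipt)
        return receipt


@dataclass
class AgentWallet:
    """Spending authorization the owner has granted to the agent."""

    budget: int


def call_with_payment(endpoint: FakePaidEndpoint, wallet: AgentWallet) -> Tuple[int, dict]:
    status, body = endpoint.handle()             # 1. request the service
    if status != 402:
        return status, body
    amount = body["amount"]                      # 2. read the payment requirements
    if amount > wallet.budget:
        return status, body                      # over the authorized limit: do not pay
    wallet.budget -= amount                      # 3. pay within the authorization
    receipt = endpoint.settle(amount)
    return endpoint.handle(receipt)              # 4. retry and receive the result


endpoint = FakePaidEndpoint(price=5)
wallet = AgentWallet(budget=12)
outcomes = [call_with_payment(endpoint, wallet)[0] for _ in range(3)]
print(outcomes, wallet.budget)  # [200, 200, 402] 2
```

The third call stops at 402 because the remaining budget no longer covers the quoted price. That is the "within the agent's authorization" condition in the flow.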

Most online payment flows were designed for humans: create an account, select a subscription, open a checkout page, enter card details, and confirm the purchase.

Agents need something more granular and programmable.

An agent completing a task may only need to buy:

one market-data query;
one address-risk report;
one image-processing operation;
one structured research result;
one specialized model call.

It should not have to purchase an entire software subscription for a single capability.

According to the official OKX.AI ASP guide, paid A2MCP endpoints must support x402, with the OKX Payment SDK recommended. Onchain OS also advertises native x402 support and gas-free payment operations on X Layer.

This makes x402 more than a checkout feature. It supports specialization between agents.

An agent can accept a job, discover that it needs outside data, pay another service for that data, combine it with its own work, and deliver the final result. The agent does not need to own every capability in the workflow.

That is how a market of composable machine services starts to become possible.

Onchain OS is the capability layer

OKX positions Onchain OS as "Built for AI. Ready for Web3."

It combines Agentic Wallet, payments, trading, and an AI Toolkit, with three main access paths:

Skills and CLI;
MCP;
Open API.

The official page currently lists nine Skills and 72 features across token checks, market monitoring, risk detection, trading and transfers, and onchain broadcasting.

A simplified view of the stack looks like this:

OKX.AI

Markets, tasks, identity, reputation, escrow, and settlement

Onchain OS

Web3 capabilities that agents can use

A2A / A2MCP

Ways to package and expose agent services

x402

Per-call payment for machine-accessible services

The infrastructure is substantial. But infrastructure alone does not make the ecosystem accessible to someone who has never deployed an agent.

The hidden barrier is operations

The official OKX.AI onboarding path begins with an agent environment and the installation of Onchain OS Skills.

A developer can set this up locally. A normal user still faces several operational questions:

Which agent framework should I install?
Where should it run?
How does it stay online?
How are tools and credentials configured?
How do I receive task notifications?
Which actions require human approval?

A laptop demo that works for an hour is not the same as an agent that can remain reachable for task intake, longer workflows, and owner decisions.

This is the gap addressed by the OKX.AI OnchainOS Base Agent on ClawMama.

It gives a user a ready-to-use, continuously available agent with Onchain OS Skills already attached. Instead of beginning with deployment, the user can begin with a conversation:

I want to understand how I could participate in OKX.AI.

The agent can then help with:

choosing between User, ASP, and Evaluator roles;
understanding the Agent and Task Marketplaces;
Agentic Wallet and identity onboarding;
turning knowledge or a business process into a service description;
deciding whether a service fits A2A or A2MCP;
understanding the x402 requirement for paid endpoints;
requesting approval before sensitive wallet, payment, or trading actions.

The important change is the order of onboarding.

Instead of:

Learn a framework

→ configure infrastructure

→ create an agent

→ search for something useful to do

a newcomer can follow:

Describe a real goal

→ start with a working agent

→ complete one useful task

→ identify a repeatable workflow

→ decide whether it should become a service

Four small experiments are enough to begin

A newcomer does not need to understand the entire stack on day one.

1. Start with a read-only task

Ask for public information with a verifiable output. For example:

Compare the recent public onchain activity of three protocols. Name the sources, separate observations from assumptions, and list anything that could not be verified.

This tests useful capabilities without starting with financial execution.

2. Define what “done” means

Turn the request into acceptance criteria:

The result must:

cover all three protocols;

state the time period;

name the data sources;

separate facts from interpretation;

identify missing data;

include a comparison table.

This is the beginning of both task design and evaluation.
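
One way to make such criteria checkable is to treat the deliverable as structured data and list what is missing. The sketch below is hypothetical throughout: the field names and the `unmet_criteria` helper are illustrative assumptions, not an OKX.AI or Evaluator format.

```python
# Hypothetical field names, one per acceptance criterion listed above.
REQUIRED_FIELDS = [
    "protocols",          # which protocols are covered
    "time_period",        # the period the comparison refers to
    "data_sources",       # named sources
    "facts",              # observations
    "interpretation",     # kept separate from facts
    "missing_data",       # may be an empty list, but must be stated
    "comparison_table",
]


def unmet_criteria(deliverable: dict, expected_protocols: int = 3) -> list:
    """Return the criteria a deliverable fails; an empty list means 'done'."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if name not in deliverable
    ]
    covered = len(deliverable.get("protocols", []))
    if covered < expected_protocols:
        problems.append(f"covers {covered} of {expected_protocols} protocols")
    return problems


draft = {
    "protocols": ["Protocol A", "Protocol B"],
    "time_period": "last 30 days",
    "data_sources": ["public block explorer"],
}
for problem in unmet_criteria(draft):
    print(problem)
```

A check like this only verifies that each required part is present. Judging whether the facts are right still needs a human or an Evaluator.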

3. Find one repeatable capability

Look for a narrow step that appears across many tasks:

retrieving a defined set of metrics;
normalizing project information;
detecting changes in public activity;
producing a fixed-format report;
checking whether required fields are present.

A complex workflow may fit A2A. A narrow and predictable function may be a better A2MCP candidate.

4. Choose a role after the experiment

Only after completing a few real tasks, ask:

Do I mainly want to publish work as a User?
Can I offer a reliable service as an ASP?
Am I better at defining standards and checking results as an Evaluator?
Would the service use A2A or A2MCP?
If it is paid per call, how will the endpoint support x402?

Architecture decisions are easier after the workflow is understood.

One person, a network of services

The OKX.AI homepage uses the phrase “One person. One company.”

The useful interpretation is not that an agent automatically creates a successful company. It is that agents can reduce the cost of organizing and selling digital work.

A researcher can package an analysis method. A developer can expose a paid data tool. A designer can take scoped A2A jobs. A domain expert can turn a checklist and judgment process into a repeatable service.

When marketplaces, agent identity, Onchain OS, x402, escrow, and reputation are connected, one person can potentially operate several digital service units without building a conventional software company around each one.

Start with a real need, not a deployment

The most interesting thing about OKX.AI is not that every participant must become an agent developer.

It is that several kinds of participation can exist in the same economy:

Users define demand.
ASPs package capabilities.
Evaluators enforce quality.
Agent Marketplace makes services discoverable.
Task Marketplace gives those services work.
Onchain OS supplies Web3 capabilities.
x402 supports payment for machine-accessible services.

OKX.AI provides the market, identity, payment, reputation, and settlement infrastructure. A ready-to-use environment such as ClawMama provides a lower-friction way for ordinary users to enter: start with a working agent in Telegram, complete a real task, and learn the system through use.

Creating an agent is one possible first step into the agent economy.

Clearly describing one useful job is another.

Start here

To try this path directly, open the OKX.AI OnchainOS Base Agent on ClawMama. The relevant Skills are already attached, so you can begin in chat, explore the available roles, and work toward participating in the OKX.AI ecosystem.

Wallet, trading, transfer, swap, DeFi, payment, staking, registration, and arbitration actions should remain subject to human approval. Never submit private keys, seed phrases, or unprotected API secrets in chat. This article is not financial advice.

How Ordinary People Can Join OKX.AI: Entering the Agent Economy from a Single Conversation

Eli — Sat, 11 Jul 2026 00:31:23 +0000

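AI Agents are moving from "answering questions" to "getting work done."

They can find services, call tools, and deliver results, and they can also pay for data, APIs, or professional capabilities through new payment protocols. Around these behaviors a new market is forming: people set goals, Agents organize the work, different service providers supply capabilities, and settlement happens once results are confirmed.

OKX.AI is building infrastructure for this market. Its official positioning is the world's first A2A Agent economy: Agents can find work, hire services, and settle payments onchain.

That sounds technical, but participants do not have to learn to create an Agent first. For ordinary users, a more realistic entry point is to get a ready-to-use Agent first, then use conversation to learn the ecosystem, publish tasks, or work out which services they could offer.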

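What OKX.AI is building is not an AI tool directory

An ordinary AI tool directory answers "which products exist." The problems OKX.AI sets out to solve are closer to commercial infrastructure:

how to discover a suitable Agent;
how to publish a need as a task;
how to confirm an Agent's identity and reputation;
how to pay for a single service call;
how to hold funds in escrow for complex tasks;
how to confirm delivery and handle disputes.

OKX.AI carries these activities on two interconnected marketplaces.

Agent Marketplace

The Agent Marketplace is for discovering and hiring Agents that already offer services. Users can compare them by capability, price, and onchain credit record.

What circulates here is not just "a chatbot" but individual capabilities that can be purchased: research, data queries, content processing, onchain analysis, design, or other professional work.

Task Marketplace

The Task Marketplace starts from demand. A user can assign an Agent directly, let the system propose candidate Agents, or publish a task openly so that qualified providers can take part.

Once the two marketplaces are connected, a piece of work can flow along this path:

A need is raised

→ a service is matched

→ terms are confirmed

→ the work is carried out

→ the result is accepted

→ settlement is completed

→ reputation accumulates

This is also OKX.AI's larger value: it tries to turn Agents from isolated software features into participants in the digital economy that can enter a market, provide services, and build a long-term record.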

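Three roles ordinary people can choose from

OKX.AI currently defines three main roles.

User: raise demand and hire services

A User is the one who initiates tasks. You can have your own Agent help describe the need, publish the task, screen providers, and follow up on delivery.

For most people this is the most direct way to participate. You don't have to offer a technical service first; you can start from a specific task, for example:

summarize recent developments in an ecosystem;
compare public onchain data across several projects;
find a suitable research or content service;
monitor a category of public signals;
break a complex goal into tasks that can be executed and accepted.

ASP: provide Agent services that can be charged for

ASP stands for Agent Service Provider.

If you have professional knowledge, data, an API, or a stable workflow, you can package it as one of two kinds of service:

A2A (Agent-to-Agent): suited to complex tasks where scope, price, and delivery standards need negotiation, with fees handled through an escrow mechanism;
A2MCP (Agent-to-MCP): suited to standardized services such as data queries, price information, and utility APIs, which can be free or charged per call.

This opens a new possibility for individuals and small teams: instead of first building a complete SaaS product, you can place one well-defined capability in the Agent market and let other Agents discover it, call it, and pay for it.

Evaluator: assess delivery and handle disputes

An Evaluator is responsible for judging whether a task was completed as agreed.

An Agent economy cannot solve only "who does the work"; it must also solve "what counts as done." Task standards, result verification, and dispute handling are an important part of how the market builds trust.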

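Why x402 matters

If the Agent Marketplace answers "where to find a service," x402 answers "how a machine pays for a single service call."

The name x402 comes from the HTTP status code 402 Payment Required. It places the payment requirement inside a network request flow that machines can understand:

The Agent requests a service

→ the server returns the payment requirements

→ the Agent completes payment within its authorized scope

→ the server returns the data or result

Traditional online payments are usually designed around people: register an account, buy a subscription, get redirected to a checkout page, enter a verification code. Agents need payment that is programmable and fine-grained.

For example, to complete a research task an Agent may only need to:

query real-time data once;
buy one address-risk report;
call image processing once;
obtain one standardized analysis result.

It does not need to buy an entire software package or a long-term subscription; it pays only for the capability the current task needs.

According to OKX.AI's ASP guide, paid A2MCP service endpoints need to support x402, and the OKX Payment SDK is officially recommended. Onchain OS also provides native x402 support and supports zero-gas payment scenarios on X Layer.

So x402 is more than a payment feature. It gives specialization among Agents a commercial basis: after taking on a task, an Agent can buy capabilities from other Agents or APIs and combine them into the final deliverable.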

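Onchain OS: making onchain capabilities available to Agents

OKX positions Onchain OS as: Built for AI. Ready for Web3.

It puts Agentic Wallet, payments, trading, and the AI Toolkit into one system and offers three access paths:

Skills/CLI;
MCP;
Open API.

The official page shows that the AI Toolkit currently includes 9 Skills and 72 capabilities, covering token checks, market monitoring, risk detection, trading and transfers, onchain broadcasting, and similar scenarios.

A simple way to think about it:

OKX.AI

Handles markets, tasks, identity, reputation, and settlement

Onchain OS

Lets Agents obtain and use onchain capabilities

x402

Handles machine payments for standardized services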

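The real barrier is getting an Agent that can keep working

OKX.AI's official onboarding flow requires participants to first prepare an Agent and install Onchain OS Skills.

Developers can install a framework and configure the wallet and runtime environment themselves. For ordinary users, though, that is often not the right first step. An Agent that can take part in the ecosystem also needs to stay online, retain context, use tools, and notify the user to confirm important actions.

This is exactly the barrier that ClawMama's OKX.AI OnchainOS Base Agent lowers.

Users can start directly from a ready-to-use, always-online Agent with Onchain OS Skills preinstalled. The process does not have to begin with researching frameworks; it can begin with an ordinary sentence:

I want to understand how I could best participate in OKX.AI.

The Agent can then help the user:

understand the differences between User, ASP, and Evaluator;
learn about the Agent Marketplace and the Task Marketplace;
complete Agentic Wallet and identity registration onboarding;
turn personal knowledge or business capabilities into a service;
decide whether A2A or A2MCP is the better fit;
understand the requirements for x402 paid endpoints;
request human approval before critical actions such as wallet, payment, and trading operations.

This changes the order in which ordinary people participate.

The traditional path is:

Learn a framework

→ install the environment

→ configure tools

→ create an Agent

→ start looking for a use

The lower-barrier path is:

State a goal in chat

→ get an Agent that can work

→ complete a first real task

→ find a repeatable workflow

→ then decide whether to offer a service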

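One person can also run a set of digital services

The OKX.AI homepage sets out the vision "One person. One company."

What really deserves attention is not a promise that Agents will automatically create a successful company, but that Agents are lowering the cost for an individual to organize and sell digital services.

A researcher can package an analysis method as a service; a developer can offer pay-per-call data or tools; a designer can take on A2A tasks; an industry expert can turn their checklists and judgment process into a repeatable workflow.

When the Marketplace, Agent Identity, Onchain OS, x402, escrow, and the reputation system are connected, one person has the chance to manage several digital execution units that keep providing services.

Start from a real need

Joining the Agent economy does not have to start with learning to develop.

A more practical first step is to choose one specific thing: publish a research task, organize a capability you can deliver repeatedly, or learn how a given service gets called and paid for by Agents.

OKX.AI provides the market, identity, payment, reputation, and settlement infrastructure. ClawMama provides an entry point closer to ordinary users: get an Agent that is already set up and start participating through chat.

The first step into the Agent economy is not necessarily creating an Agent.

It can simply be stating one real thing clearly, then letting a working Agent help you keep moving forward.

Start here

To try this path directly, open ClawMama's OKX.AI OnchainOS Base Agent. The relevant Skills are preinstalled, so you can start in chat to learn the roles, organize tasks, and gradually take part in the OKX.AI ecosystem.

Wallet, trading, transfer, swap, DeFi, payment, staking, registration, and arbitration actions should remain subject to human confirmation. Do not submit private keys, seed phrases, or unprotected API keys in chat. This article is not investment advice.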

Insurance Might Be the Most Underrated AI Agent Wedge in YC 2026

Eli — Thu, 09 Jul 2026 09:53:27 +0000

AI founders love the glamorous agent stories: coding agents, sales agents, AI doctors, AI lawyers. But if you dig through the YC 2026 batch data, one of the more interesting signals is decidedly unglamorous: insurance.

Out of 477 real-ish company records in the current snapshot, 25 match insurance-related keywords — about 5.2% — and 8 companies sit in the Fintech → Insurance subindustry. Not a tidal wave. But it's enough to suggest something worth paying attention to: insurance is quietly becoming one of the better wedges for AI agents that actually ship.

The reason is simple. Insurance is wall-to-wall documents, rules, judgment calls, exceptions, approvals, claims, underwriting, and cross-system coordination. In other words: wall-to-wall work that agents can do and humans hate doing.

Insurance is not fintech's leftover category

Most people file insurance under "slow fintech": aging distribution, legacy systems, long processes, heavy regulation. From an AI builder's perspective, that list of flaws reads more like a list of opportunities.

Insurance workflows are highly structured — but not fully structured. Policies, claims files, medical records, photos, repair estimates, payout history, compliance clauses: the inputs are messy and heterogeneous. Yet every step has a crisp objective: is this covered, what documents are missing, how should this risk be priced, can this pass approval.

That's not a chatbot problem. It's an agent problem — reading documents, following procedures, calling systems, leaving audit trails, handling exceptions. And precisely because it's complex, insurance is more likely to command real budget than yet another AI writing tool.

Agents die without boundaries; insurance comes with them built in

The most common failure mode for early agent products: they sound like they can do everything and end up doing nothing well. Insurance workflows hand you boundaries for free:

Inventory and asset processes can be automated end to end
Medical prior authorizations can be assembled and submitted
Claims can start with document verification and status progression
Underwriting can start with extraction, rule matching, and risk flagging

The batch has concrete examples. InventoryQuant's one-liner is "We automate the inventory process in insurance." ClaimGlide's is "AI automated prior-auths for private medical practices." Neither is a vague "enterprise AI assistant" — each cuts in through one specific workflow.

The payoff of a wedge like this is that ROI is legible: fewer documents handled manually, days shaved off a cycle, fewer human reviews, fewer disputed denials. Buyers understand it, and they'll pay for it.

Document-heavy industries are the agent home turf

Zoom out and insurance stops looking like an isolated niche. Across the same 477 records, the keyword screens show:

email / calendar / docs: 71 companies (14.9%)
legal / compliance: 62 companies (13.0%)
healthcare clinical/admin: 68 companies (14.3%)
data / eval / observability / verification / analytics: 167 companies (35.0%)

Put those together and insurance is one piece of a much larger pattern: document-dense, rule-dense, verification-dense work.

The first wave of AI hype was "replacing creativity." The more realistic landing zone is compressing administrative friction — and insurance sits at the exact center of it. Customers want fast payouts, carriers want risk control, regulators want explainable processes, and internal systems are old and fragmented. A good agent here doesn't write pretty copy; it turns a pile of chaotic material into a processable queue.

Why insurance AI wins from the back office first

Don't expect insurance AI to reinvent underwriting on day one. The likelier path starts with small back-office cuts: document collection, form filling, email follow-ups, clause comparison, first-pass review, anomaly flagging, status syncing.

None of this is strategic. All of it eats hours every day. And it suits early-stage startups perfectly: land in one department, solve one pain point, own one clear metric, then expand into adjacent workflows.

That's what a wedge means. You don't swallow the carrier — you become the automation colleague one team can't work without. Once the agent owns the documents, the rules, and the operational record, it has a shot at graduating from "assistant" to "system layer."

The hard parts are the moat

To be clear, insurance is not easy money. Data privacy, regulatory requirements, liability boundaries, system integration, and enterprise procurement all stretch the sales cycle. Error costs are real: one bad call can affect a payout, a compliance posture, or a customer relationship.

But that difficulty is exactly what protects a good product. The heavier the rules, the deeper the processes, the messier the legacy stack — the harder it is for a general-purpose model to flatten the category. Durable value comes from workflow know-how, data pipelines, audit capability, and industry integrations. The winners in insurance AI won't necessarily be the teams with the strongest models. They'll be the teams that understand the workflow best, embed deepest, and prove results most convincingly.

Underrated because it's unphotogenic

Insurance will never go viral like consumer AI, and it doesn't make for good demo videos like robotics. But the best startup opportunities are rarely the prettiest ones — they're the hardest to displace.

Twenty-five insurance keyword hits isn't a wave. It's an early marker of where the tide is going. For AI agents to move from demos to revenue, they have to enter industries with real budgets, repetitive workflows, and well-defined error costs. Insurance checks all three.

So instead of asking "will AI disrupt insurance," ask the sharper question: which insurance workflow gets taken over first by a small, focused agent?

The answer probably isn't on center stage. It's buried in a stack of PDFs, emails, spreadsheets, and rule manuals that nobody wants to read.

Data notes

The numbers above come from a current snapshot of ExploreYC and YC Startup Directory public data, covering the Winter, Spring, Summer, and Fall 2026 batches — the Summer and Fall batches may still be incomplete. The raw export contains 478 records; after excluding one obvious test entry, keyword stats use 477 real-ish records. Keyword screens are heuristic and coarse, and matches can overlap. This is research and analysis, not investment advice.
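
For readers who want to re-derive the percentages quoted in this post, here is a minimal sketch that recomputes each share from the counts and the 477-record base. It uses only figures stated above; the variable names are illustrative.

```python
TOTAL_RECORDS = 477  # 478 raw records minus one obvious test entry

keyword_hits = {
    "insurance": 25,
    "email / calendar / docs": 71,
    "legal / compliance": 62,
    "healthcare clinical/admin": 68,
    "data / eval / observability / verification / analytics": 167,
}

# Share of all records matched by each keyword screen, in percent.
shares = {
    name: round(100 * hits / TOTAL_RECORDS, 1)
    for name, hits in keyword_hits.items()
}

for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Because a single company can match several screens, these shares describe each screen on its own and should not be added together.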

Every slice in this post came from the same dataset. If you want to run your own cuts — by batch, by industry, by keyword — the ExploreYC Startup Research Agent does exactly that; there's a walkthrough of how it works on the ecosystem page, and it runs on ClawMama.

San Francisco's Gravity Is Back: 366 of 477 YC 2026 Startups Are in One City

Eli — Thu, 09 Jul 2026 09:52:58 +0000

If you could pick only one counterintuitive number from the YC 2026 batches, make it this one: out of 477 real-ish company records, 366 list San Francisco as their location — roughly 77%.

For comparison: New York City has 24. London 10. Boston 7. Los Angeles 4. Fully remote? 3 companies. Even if you add the 11 tagged "San Francisco + Remote", the conclusion doesn't budge: AI startups aren't spreading across the map. They're re-concentrating in one city.

This isn't Bay Area nostalgia. It's industry structure casting a vote.

Remote won work. It didn't win startup density.

One of the most popular takes of the past few years: software teams can start anywhere, so companies no longer need the Bay Area. That take wasn't entirely wrong — tooling, cloud services, open models, and online fundraising genuinely lowered the barrier to starting a company.

But the YC 2026 location data is a reminder that a lower barrier is not the same as a vanished advantage.

Building an AI startup isn't just writing code. It runs on model gossip, talent flow, customer pilots, investor feedback, peer pressure, and extremely fast narrative iteration. Much of that works online. But the densest informal information still travels fastest offline. San Francisco's edge was never the office space — it's collision frequency.

AI made same-city learning matter again

In the classic SaaS era, most domain knowledge came from customers and product cycles were relatively stable. You could build a vertical software company in any city and grind toward PMF at your own pace.

The AI era doesn't work like that. Model capabilities turn over every few months. Agent architectures keep getting rewritten. Inference costs, context windows, voice, tool calling, and eval infrastructure are all on rolling release. A seemingly minor technical shift can redraw your product's boundaries overnight.

In that environment, whoever hears real feedback earlier, learns earlier what others tripped over, and understands earlier what investors and customers are actually buying — saves themselves three months of wrong turns. Three months is nothing in ordinary software. In AI, it can be an entire product generation.

The network effect compounds itself

The sheer size of the 2026 cohort strengthens the pull: 478 raw records across Winter (201), Spring (198), Summer (75), and Fall (4). When hundreds of companies orbit the same startup network, geographic concentration reinforces itself — founders meet in person, dogfood each other's products, trade candidates, refer customers, and benchmark fundraising pace. Every node raises the city's network density.

This isn't Silicon Valley mythology. It's the most ordinary network effect in any market: more nodes, more connections; more connections, faster information; faster information, more concentrated opportunity.

Three reasons AI re-centralizes

The internet decentralized distribution. AI is pulling startups back toward the center, for three reasons:

Talent is scarcer. People who simultaneously understand models, product, infrastructure, and an industry's workflow are still rare. Dense cities make assembling that team far easier.
Fundraising is more narrative-driven. AI companies constantly have to explain a fast-changing future, and face-to-face is still the fastest way to build that trust.
Customer pilots need social proof. Many AI products aren't "buy a seat and go" — they touch data, change processes, and carry risk. A referral and a case study from inside the same network cut the cost of trying dramatically.

So AI hasn't inherited the remote-software playbook. It looks more like a new gold rush: you can buy the tools remotely, but the miners still cluster at the mine.

What if you're not in San Francisco?

None of this means you can't build an AI company elsewhere. New York has finance and enterprise buyers. Boston has research and healthcare. London has finance and the European market. Manufacturing and energy have their own geographic centers.

But if you're building horizontal agents, developer tools, model infrastructure, or anything else in the startups-selling-to-startups category, San Francisco's density becomes a hidden competitive advantage for whoever has it. You don't go because the people there are smarter — you go because the feedback is faster, the noise is louder, and the comparisons are more brutal. For an early-stage company, brutal is sometimes good: it forces you to admit sooner that you don't have PMF, and to discover sooner what people will actually pay for.

Gravity is back — but it's not a permanent title

San Francisco getting stronger again doesn't mean it wins forever. A geographic center has never been a moral award; it's an efficiency outcome. The moment another city assembles a denser loop of AI talent, customers, and capital, the gravity moves.

But in the YC 2026 snapshot, at least, the answer is clear: AI didn't flatten the map — it re-elevated certain places. San Francisco isn't the only entrance, but it's once again the strongest one.

The takeaway isn't "move to San Francisco." It's this: in a technology cycle that changes this fast, information density is itself a moat.

Data notes

This analysis is based on a current snapshot of public data from ExploreYC and the YC Startup Directory, covering the Winter / Spring / Summer / Fall 2026 batches (201 / 198 / 75 / 4 companies respectively). The Summer and Fall batches are likely still incomplete. The raw export has 478 records; the geography analysis uses 477 after excluding one obvious mock/test record. Location fields depend on directory syncing and self-reporting — 31 records have an Unknown location. Figures will shift over time, and none of this is investment advice.

The location slices in this post came from the ExploreYC Startup Research Agent — an agent that queries YC company data by city, batch, industry, or keyword, so you can cut the dataset yourself instead of taking my word for it. There's a write-up of how it's built on the ecosystem page, and more agents like it on ClawMama.

Defense Went From Taboo to Product Category: What YC's 2026 Data Shows

Eli — Thu, 09 Jul 2026 09:52:30 +0000

If your mental model of Y Combinator is still collaboration software, payment tools, and consumer apps, the 2026 batch data will feel slightly off-script: out of 477 company records (the 478-record raw export minus one obvious test record), 41 companies (8.6%) match defense / security / safety keywords, and the Industrials → Defense subindustry alone has 12 companies.

This isn't a "defense tech is suddenly trendy" hot take. The more accurate framing: AI startups are moving from on-screen productivity tools back toward real-world security, supply chains, manufacturing, and state-level capability.

From taboo to product category

For a long time, Silicon Valley kept an awkward distance from defense. You could build it, but you didn't lead with it. You could raise for it, but it didn't go on the homepage.

The 2026 numbers suggest that psychological barrier has dropped. Out of 478 raw records, Industrials hit 68 companies, about 14% — up from 6% in 2021 and 2022, and just 2% in 2023. And Industrials is no longer just "cool hardware" or slick robot demos: it includes 22 Manufacturing and Robotics companies and 12 Defense companies.

Defense is shifting from a values debate to an application category.

AI makes dual-use the default

Dual-use isn't new. Satellites, drones, materials, communications, and cybersecurity have always served both commercial and defense markets. But AI blurs the boundary much further:

An autonomous system that inspects power grids can also patrol borders.
A sensor-fusion platform built for industrial safety maps directly onto battlefield awareness.
A supply-chain agent that schedules production for manufacturers can also find alternative suppliers for critical components.

That's why the 41 keyword matches aren't simply a count of "defense companies." They signal that safety is becoming a foundational narrative: models need to be reliable, systems need to be verifiable, robots need to work in non-ideal environments, and infrastructure needs to withstand risk.

Most of these startups won't introduce themselves as defense companies. They'll say robotics, autonomous, security, verification, industrial operations. Once the product goes deep enough into the physical world, defense and public safety show up in the customer list on their own.

The real world needs AI more than SaaS does

The wider context in the 2026 data: Consumer is down to 20 companies (~4%), B2B is 292 (~61%), and Industrials is 68 (~14%). On the keyword side, robot / drone / autonomous / hardware matches 103 companies — 21.6% of the batch.

The main battleground of AI startups is leaving the chat box.

The last generation of software companies sold "a better back office." This generation of AI companies sells "fewer people, faster actions, better judgment." Put that capability in a CRM and it's sales efficiency. Put it in a warehouse and it's inventory turnover. Put it on a drone or a robot and it's real-world operational capability — and the customers most willing to pay for operational capability are rarely consumers. They're enterprises, governments, industrial operators, and security-adjacent organizations.

Normalization doesn't mean everyone should build defense

An easy misread: defense heating up does not mean every AI founder should rewrite their pitch deck around national security. Defense and dual-use customers typically mean long sales cycles, heavy compliance, opaque procurement, and feedback loops far slower than commercial buyers.

What it does change is the imagination space. Many teams used to assume the best AI applications were white-collar desktop scenarios: drafting emails, filling spreadsheets, summarizing meetings. The 2026 data shows AI agents, industrials, robotics, energy, CAD, supply chain, and security all appearing as high-frequency themes at the same time. Together they point to one trend: AI isn't just helping people think — it's helping systems act.

And when a product starts acting, responsibility grows. When responsibility grows, safety stops being a feature and becomes the core selling point.

The new consensus: capability first

The normalization of defense and dual-use is really a narrative switch in Silicon Valley — from "software is eating the world" to "AI is rebuilding capability."

If a team can make unmanned systems more reliable, factories less prone to downtime, critical infrastructure more secure, or high-stakes decisions auditable, it can plausibly be counted as dual-use. That circle will keep getting wider — and more ordinary.

So 8.6% isn't an endpoint. It's a signal that safety, resilience, autonomous systems, and industrial capability are becoming mainstream grammar for AI startups. Short term, expect more policy, ethics, and procurement controversy. Long term, it forces founders to answer a much harder question: can your AI actually bear consequences in the real world?

Data notes

This analysis is based on a current snapshot of public data from ExploreYC and the YC Startup Directory, covering the Winter / Spring / Summer / Fall 2026 batches (201 / 198 / 75 / 4 companies respectively). The Summer and Fall batches are likely still incomplete. The raw export has 478 records; percentages use 477 after excluding one obvious mock/test record. Keyword screens are coarse heuristic matches and may overlap. Figures will shift as the directory syncs, and none of this is investment advice.
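For readers who want to check the percentages, here is a minimal sketch of the arithmetic. The counts and the two denominators come from this post; the `share` helper and the variable names are mine, not part of any ExploreYC tooling.

```python
# Denominators from the data notes above.
RAW_RECORDS = 478    # raw export
CLEAN_RECORDS = 477  # after excluding one obvious mock/test record


def share(count: int, total: int, digits: int = 1) -> float:
    """Return count/total as a percentage rounded to `digits` decimal places."""
    return round(100 * count / total, digits)


# Counts quoted in the post.
counts = {
    "defense / security / safety keywords": 41,
    "robot / drone / autonomous / hardware keywords": 103,
    "Industrials": 68,
    "B2B": 292,
    "Consumer": 20,
}

for label, n in counts.items():
    print(f"{label}: {share(n, CLEAN_RECORDS)}% of 477, {share(n, RAW_RECORDS)}% of 478")
```

At whole-percent precision the one-record difference changes none of these figures. At one decimal place it can: 103 companies is 21.6% of 477 but 21.5% of 478.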

The slices in this post came from the ExploreYC Startup Research Agent — an agent that queries YC company data by batch, industry, or keyword so you can cut the dataset yourself instead of taking my word for it. There's a write-up of how it's built on the ecosystem page, and more agents like it on ClawMama.

Stablecoins Are Back in YC 2026 — and This Time They Look Like Plumbing

Eli — Thu, 09 Jul 2026 09:52:03 +0000

If your mental model of crypto startups is still profile pictures, token communities, and wallet wars, the YC 2026 batch will surprise you — mostly by how boring it is. And boring, here, is the interesting part.

The numbers: 37 companies, 7.8% of the batch

In the current ExploreYC snapshot of YC 2026 (477 companies after excluding one obvious test record), a rough keyword screen for stablecoin / crypto / web3 hits 37 companies — about 7.8% of the batch.

That's not a dominant share, and the count matters less than the composition. The words attached to these companies aren't "next-gen social" or "community." They're payments, accounts, settlement, and on/off-ramps. Fewer narratives, more pipes.

From narrative assets to financial infrastructure

Last cycle's Web3 ran on consumer imagination: NFTs, DAOs, on-chain games, social identity. Plenty of energy, but many of those apps had neither durable cash flow nor a recurring need behind them. Users arrived fast and left faster.

This stablecoin wave answers a different question. Not "do I want an on-chain identity?" but "can money move faster, cheaper, and more globally?" That's unglamorous — and startup history is full of huge outcomes grown in unglamorous soil: payments, reconciliation, accounts, compliance, settlement.

Two examples from the batch:

SpotPay — one-liner: "Stablecoin Global Bank Account"
Unifold — multi-chain deposit and payment infrastructure

Neither is trying to convince consumers to open one more app every day. Both are trying to become the underlying interface that money moves through.

The strongest stablecoin products won't feel like crypto

The clearest sign of maturity may be this: users no longer need to understand the blockchain. A business cares about a short list — can I get paid, how fast does it settle, what are the fees, can I reconcile it, how does compliance work, and what happens when a transfer fails.

Done well, stablecoin infrastructure won't even be marketed as Web3. It ships as global accounts, vendor payouts, cross-border settlement, payroll, merchant acquiring, finance automation. The chain runs in the back; the front is dashboards and buttons a finance team already understands.

That's also why "infrastructure comeback" is a more credible thesis than "consumer revival": consumers owe no loyalty to a technology's ideals, but businesses will absolutely change their processes for cost, speed, and reach.

Cross-border payments are the natural battleground

The problems with traditional cross-border payments are old: slow, expensive, opaque, too many correspondent banks, plus holiday and regional constraints. For distributed teams, remote hiring, cross-border supply chains, and small exporters, that friction is a straight operating cost.

Stablecoins' advantages are equally direct: transfers run 24/7, settlement is fast, reach is global. But the hard part was never the transfer itself. It's the ramps and everything around them:

fiat on-ramps and off-ramps, bank connectivity
compliance review, risk controls, tax handling
reconciliation, refunds, customer support

So the opportunity isn't "ship another wallet." Wallets are thin; infrastructure is thick. Whoever connects on-chain money to real-world financial systems sits closest to the center of the business.

Multi-chain isn't a flex — it's customer de-risking

Directions like Unifold's reveal another shift: customers don't want to bet on a single chain. Businesses care about stability, coverage, and failure handling, not loyalty to a technical community.

Multi-chain support that just means "we list many networks" is worth little. But if it lets a merchant automatically route across assets, networks, regions, and payment paths to the best option, it becomes real infrastructure. Customers want the money to arrive; they don't want to join an ecosystem debate.
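To make "route to the best option" concrete, here is a toy sketch of the selection step. Everything in it is hypothetical: the `Route` fields, the network names, and the rule (cheapest available route that settles within a deadline) are illustrative assumptions, not how Unifold or any other company actually routes payments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    network: str           # hypothetical network label
    asset: str             # hypothetical stablecoin label
    fee_bps: float         # fee in basis points
    settle_minutes: float  # expected time to settlement
    available: bool        # whether the path is currently usable


def pick_route(routes: list[Route], max_settle_minutes: float) -> Optional[Route]:
    """Pick the cheapest available route that settles within the deadline."""
    candidates = [
        r for r in routes
        if r.available and r.settle_minutes <= max_settle_minutes
    ]
    # Ties on fee are broken by faster settlement; None if nothing qualifies.
    return min(candidates, key=lambda r: (r.fee_bps, r.settle_minutes), default=None)


routes = [
    Route("network-a", "usd-coin-x", fee_bps=5, settle_minutes=30, available=True),
    Route("network-b", "usd-coin-x", fee_bps=8, settle_minutes=2, available=True),
    Route("network-c", "usd-coin-y", fee_bps=3, settle_minutes=1, available=False),
]

best = pick_route(routes, max_settle_minutes=10)
print(best.network if best else "no route")
```

The merchant only states a constraint (arrive within ten minutes). Which network carries the payment is an implementation detail, which is the point of the paragraph above.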

This pushes crypto startups from community-driven to operations-driven. The last cycle rewarded narrative, tokens, and growth flywheels. This one rewards licenses, risk management, banking partnerships, API reliability, and a finance-grade user experience.

7.8% is small. The signal is hard.

Thirty-seven companies at 7.8% doesn't mean crypto took over YC 2026. The restraint is the point: stablecoins came back without trying to swallow every consumer use case, aiming instead at high-frequency needs inside financial infrastructure.

In the same snapshot, 44 companies (~9%) carry the Fintech tag and 292 (~61%) carry B2B. Stablecoin startups will most likely overlap both — they're fintech and enterprise software at once. The buyer isn't a retail believer in the future; it's a company that needs better money movement.

That's what makes this wave worth watching: it doesn't prove itself with hype. It proves itself with settlement times, fee rates, delivery success rates, and compliance capability.

The next Web3 may not look like Web3

If stablecoins truly become infrastructure, their success will erase the "crypto feel." Users see a global account. Finance sees automated reconciliation. Vendors see faster payment. Developers see a payments API. The chain becomes a settlement layer, not the hero of a marketing poster.

That's not Web3 retreating after failure — it's a technology maturing into quiet. A lot of infrastructure only truly goes mainstream once people stop talking about it.

Stablecoins are back. They just traded the consumer-internet costume and the grand narrative for something quieter and harder: financial plumbing.

Data notes: This analysis is based on the current snapshot of ExploreYC and YC Startup Directory public data. YC 2026 spans the Winter, Spring, Summer, and Fall batches; Summer and Fall may still be incomplete. The stablecoin/crypto/web3 keyword screen is coarse — themes overlap and some companies may be missed. This is research commentary, not investment advice.

The batch and keyword cuts in this post came from the ExploreYC Startup Research Agent on ClawMama — you can slice the same dataset by batch, industry, or keyword yourself. More on how the integration works on the ecosystem page.

Healthcare AI's First Stop Isn't the Doctor — It's the Paperwork (YC 2026 Data)

Eli — Thu, 09 Jul 2026 09:51:35 +0000

Ask people what "healthcare AI" means and most will describe an AI doctor: a model that diagnoses disease, reads scans, replaces clinical judgment. Look at the actual YC 2026 batch data, though, and the founders building in healthcare are betting on something far less cinematic — and far more fundable: paperwork.

The numbers: 8% labeled, 14% in practice

Across the 478 records in the current ExploreYC snapshot of YC 2026, only 40 carry the Healthcare industry tag, about 8%. But run a rough keyword screen for clinical and administrative healthcare themes and you hit 68 companies, or 14.3% of the 477 records left after excluding one obvious test record.

That gap is the story. Healthcare AI isn't confined to the "Healthcare" label anymore. It's leaking into billing, prior authorization, documentation, life sciences tooling, and back-office operations — categories that file under B2B or vertical SaaS but live entirely inside the healthcare workflow.
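The gap between the tag count and the keyword count is easy to reproduce in miniature. The sketch below uses invented records and an assumed keyword list (the post does not publish the actual screen terms), so treat it as an illustration of the method rather than the real query.

```python
import re

# Invented toy records; names, tags, and one-liners are illustrative only.
companies = [
    {"name": "ExampleBilling", "industry": "B2B",
     "one_liner": "AI-native medical billing for clinics"},
    {"name": "ExampleAuth", "industry": "Healthcare",
     "one_liner": "Automated prior authorization"},
    {"name": "ExampleDocs", "industry": "B2B",
     "one_liner": "Documentation platform for life sciences teams"},
    {"name": "ExampleCRM", "industry": "B2B",
     "one_liner": "CRM for sales teams"},
]

# Assumed keyword list, matched case-insensitively as plain substrings.
KEYWORDS = ["medical", "clinic", "prior auth", "life sciences"]
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)

# Label-based count: only companies filed under the Healthcare tag.
tagged = [c["name"] for c in companies if c["industry"] == "Healthcare"]

# Keyword-based count: each company counts once, however many terms it matches.
matched = [c["name"] for c in companies if PATTERN.search(c["one_liner"])]

print(len(tagged), "tagged vs", len(matched), "keyword matches")
```

In this toy set one company carries the tag while three match the screen, because two healthcare-workflow products are filed under B2B. A substring screen like this is coarse by design: it will also miss companies that describe themselves in other words.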

The biggest pain in healthcare isn't medical

Healthcare sounds high-tech. Day to day, it runs on low-tech friction: chart prep, insurance claims, prior authorizations, billing codes, compliance documents, clinical trial paperwork. What burns out physicians, clinics, and back-office teams usually isn't uncertainty about the medicine — it's a system that demands every step be written down, justified, and formatted correctly.

That's why the 2026 healthcare cohort looks unglamorous on purpose:

Overdrive Health — AI-native medical billing services
ClaimGlide — automated prior authorizations for private practices
Ritivel — an AI-native platform for life sciences documentation
Rhizome AI — an agent platform for life sciences

None of these replaces a doctor. All of them replace the documentation swamp surrounding one.

Prior authorization is a near-perfect wedge

If you were designing an ideal entry point for AI in healthcare, you'd invent prior auth:

It's painful enough. Clinics must prove to insurers that a treatment, test, or drug is "necessary" — endless document requests, rule lookups, and form filling.
It's structured enough. There are medical records, billing codes, published insurance policies, and explicit status feedback loops.
It's valuable enough. Slow authorizations delay revenue recognition, degrade patient experience, and stall treatment timelines.

The AI doesn't need to be a genius clinician. It needs to do three things: read the source material, match it against the rules, and produce a submittable document. Success is verifiable — the auth either goes through or it doesn't. Compared to open-ended diagnosis, this is closer to an industrial task with clear boundaries and computable ROI. That's healthcare AI realism: start with auditable administrative labor.
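The three steps (read, match, produce) can be sketched as a small rule check. This is a deliberately simplified illustration: the `Policy` and `Chart` shapes, the procedure code, and the evidence labels are all made up, and a real system would sit behind document extraction and human review.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Policy:
    procedure_code: str          # hypothetical code, not a real billing code
    required_evidence: set[str]  # items the insurer's policy asks for


@dataclass
class Chart:
    patient_id: str
    procedure_code: str
    documented_evidence: set[str]  # items already extracted from the record


def prepare_request(chart: Chart, policy: Policy) -> dict:
    """Match the chart against the policy and draft a request or list the gaps."""
    if chart.procedure_code != policy.procedure_code:
        raise ValueError("policy does not cover this procedure")
    missing = sorted(policy.required_evidence - chart.documented_evidence)
    return {
        "patient_id": chart.patient_id,
        "procedure_code": chart.procedure_code,
        "evidence": sorted(policy.required_evidence & chart.documented_evidence),
        "missing_evidence": missing,
        "ready_to_submit": not missing,
    }


policy = Policy("PROC-001", {"diagnosis_note", "imaging_report", "failed_first_line_therapy"})
chart = Chart("patient-123", "PROC-001", {"diagnosis_note", "failed_first_line_therapy"})

draft = prepare_request(chart, policy)
print(draft["ready_to_submit"], draft["missing_evidence"])
```

The output is either a submittable draft or an explicit list of what is missing, which is what makes the task auditable.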

Billing and coding: the hidden river of cash flow

Medical billing looks like a back-office function, but it sits directly on the money. A wrong code, a missing document, or a slow submission means delayed or lost revenue for a clinic. The US system is especially gnarly — insurers, providers, patients, and third-party systems pass information back and forth, and any single field can become the blocker.

So AI billing isn't "auto-fill the form." It has to understand clinical records, insurance rules, payment flows, and exception handling — and it has to be reliable, because errors here aren't UX blemishes. They're denials, delays, and compliance exposure measured in real dollars.

This also explains the sequencing. Clinical decision-making carries heavy regulation, high risk, and murky liability. Documentation, billing, and authorization carry risk too, but they can be deployed incrementally, with humans reviewing outputs inside existing processes. The administrative layer is where healthcare AI gets to prove itself first.

Documentation isn't overhead — it's the production system

In life sciences and clinical settings, documentation is routinely underestimated by outsiders. Lab records, compliance narratives, research files, regulatory submissions, quality processes — these determine whether a team can advance a project, pass an audit, or reproduce a result. Documentation is part of the production system.

Companies like Ritivel signal that AI is moving into the knowledge infrastructure layer of healthcare and biotech: not a patient-facing app, but a working foundation for professional teams. Unsexy in the short term, load-bearing in the long term.

And platforms like Rhizome AI point at the likely shape of the endgame: healthcare AI probably won't arrive as one super-doctor. It'll arrive as a fleet of embedded assistants — one handling documentation, one handling retrieval, one handling compliance checks, one handling internal handoffs.

Trust beats demos

Healthcare doesn't hand critical judgment to a black box. A startup whose pitch is "our model summarizes charts" will slam into procurement, compliance, liability, and integration walls fast. The real opportunity is placing AI inside a specific workflow, keeping humans in final control, and stripping out the low-value labor.

Put differently: the first commercial wave of healthcare AI isn't "the AI doctor sees you now." It's "the AI pulls the doctor out of the forms." Less sci-fi, more real business. Healthcare doesn't lack smart people — it lacks time, patience, and an execution layer that can run complex rules end to end. Whoever thins out the paperwork first earns the right to talk about deeper medical intelligence.

Data notes: This analysis is based on the current snapshot of ExploreYC and YC Startup Directory public data. YC 2026 spans the Winter, Spring, Summer, and Fall batches; Summer and Fall may still be incomplete. Keyword screens are coarse and themes overlap. This is research commentary, not investment advice.

The batch and category cuts in this post came from the ExploreYC Startup Research Agent on ClawMama — you can slice the same dataset by batch, industry, or keyword yourself. More on how the integration works on the ecosystem page.

YC 2026 Is 61% B2B — That's Not a Rebound, It's the New Default

Eli — Thu, 09 Jul 2026 09:51:07 +0000

If you remember one number from YC's 2026 batches, make it this: 61% of the companies are B2B. Out of 478 companies in the current snapshot, 292 sell to businesses. Consumer? 20 companies — about 4%.

A decade ago, YC was shorthand for consumer internet. In 2026 it reads more like a map of enterprise workflows. Here's what the data says, and why I think it marks a structural shift rather than a hot sector.

61% is not a rebound — it's the new baseline

Look at the trend in B2B share across YC years since 2021:

Year	B2B share	Consumer share
2021	47%	11%
2022	47%	9%
2023	68%	6%
2024	63%	10%
2025	66%	7%
2026	61%	4%

B2B jumped from 47% to 68% in 2023 and has held above 60% for four straight years. That's not a cycle — it's a regime change. And consumer's curve runs the opposite direction, ending at its lowest point in the series.
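Both claims can be checked directly against the table. A small sketch, with the rows copied from above and helper names of my own choosing:

```python
# (year, B2B share %, consumer share %) copied from the table above.
rows = [
    (2021, 47, 11),
    (2022, 47, 9),
    (2023, 68, 6),
    (2024, 63, 10),
    (2025, 66, 7),
    (2026, 61, 4),
]


def trailing_streak(rows, threshold):
    """Count consecutive years, ending at the latest, with B2B share above threshold."""
    streak = 0
    for _, b2b, _ in reversed(rows):
        if b2b <= threshold:
            break
        streak += 1
    return streak


streak = trailing_streak(rows, 60)
lowest_consumer_year = min(rows, key=lambda r: r[2])[0]

print(f"B2B above 60% for {streak} straight years")
print(f"Consumer share is lowest in {lowest_consumer_year}")
```

Note that the consumer series is not a straight decline (it rebounds to 10% in 2024), but its final value is the minimum of the series.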

The economics are blunt: consumers will try a new app but won't necessarily pay; enterprises move slowly, but once your product is embedded in a workflow, the money, data, permissions, and renewals all live inside it. Post-generative-AI, the default startup idea is no longer "a better interface for users" — it's "eat a specific job, a specific process, a specific system integration."

B2B no longer means "selling SaaS"

The 2026 sub-industry breakdown shows how wide this wave runs:

B2B (general): 97 companies
Infrastructure: 62
Engineering, Product & Design: 39
Productivity: 17, Operations: 15
Security: 10, Marketing: 9, Finance & Accounting: 8, Legal: 8, Supply Chain & Logistics: 8

This isn't a fresh crop of CRMs. It's a map of the inside of a company — code, procurement, compliance, finance, support, logistics, data observability — with AI-native tools growing at every node. A few one-liners from the batch make the pattern concrete:

Canary — "the first AI QA engineer that understands your code"
Rubric AI — reasoning and verification infrastructure for AI
Carrot Labs — track and attribute AI spend across every provider
Pollinate — AI agents for the supply chain

None of these are selling a smarter chat box. They're selling capacity a specific department can put to work.

Software is turning from tool into employee

The keyword data is even more telling. Across the 477 real companies (one obvious test record excluded), 226 (47.4%) match terms like agent, copilot, operator, or assistant in how they describe themselves. Nearly half the cohort packages its product as a job, not a tool.

Old SaaS was a tool: you log in, click buttons, fill forms, export reports. The new wave behaves like a role: it makes the calls, drafts the documents, reviews the contracts, runs QA, processes approvals, flags anomalies. What the buyer purchases isn't seats — it's the automation of a category of repetitive labor.

This is also the deeper reason B2B share climbed. The first place AI creates provable value isn't open-ended creativity; it's closed loops: clear inputs, verifiable outputs, quantifiable labor costs, bounded error. Enterprises are full of exactly those loops.

Two people can now take on a vertical workflow

One more number matters: among companies with known team size, the median team is 2 people (average 3.1). There are 205 two-person teams and 34 solo founders. The B2B surge did not bring back sales-heavy, implementation-heavy org charts.

The opposite happened. AI lets tiny teams wedge into an industry crevice: find one workflow that's frequent, painful, and clearly budgeted, then go deep with models, data connections, and orchestration. Two people used to struggle covering product, sales, support, and delivery at once; with those internal capacities amplified, small teams now iterate faster than big ones.

Which explains why horizontal platforms feel less exciting than vertical wedges this year. Insurance inventory audits, PE deal sourcing, CAD for mechanical engineers — each looks narrow, but each has budget, pain, and headcount it can replace.

A takeover doesn't mean everyone wins

Crowded B2B also means brutal filtering. Enterprise buyers don't pay because you used AI. They ask: can you plug into our existing systems, actually reduce headcount-hours, pass compliance, and deliver reliably? When half the batch says "agent," the real differentiation isn't in the demo video — it's in data access, workflow detail, and cost of deployment.

So the honest reading of YC 2026's B2B takeover isn't "enterprise software is a safe lane." It's that the default battleground moved: from capturing attention to capturing workflows, from building apps to building roles, from selling interfaces to selling outcomes.

In 2021 the founder question was "will users open it every day?" In 2026 it's: "will a company hand you a piece of its work?"

Data notes

Source: public data from ExploreYC and the YC Startup Directory, current snapshot.
"YC 2026" covers the Winter, Spring, Summer, and Fall 2026 batches; Summer and Fall data are likely still incomplete (Fall has only 4 records so far).
Raw 2026 count is 478; one obvious test record is excluded where noted, giving 477.
Keyword screens are coarse heuristic matches and overlap is allowed — treat percentages as directional.
This is research and analysis, not investment advice.

Slice the data yourself

Every cut in this post — by industry, by batch, by keyword — came from queries you can run yourself with the ExploreYC Startup Research Agent. It works against the same public dataset, so you can test your own hypotheses instead of taking mine. Background on the integration is on the ecosystem page, and the agent runs on ClawMama.