I promised an AI agent platform with 3 things fixed. Here's v0.5, delivered.

Tianshu AI — Sat, 11 Jul 2026 15:33:16 +0000

A few weeks ago I posted a video saying I'd build an open-source,
self-hostable AI agent platform from scratch. No code yet — just three
things I kept hitting with existing agents that I wanted to fix:

You can't see progress. You give an agent a job, walk away, come back — and it's been sitting on "npm or pnpm?" for 20 minutes. Or it died half an hour ago and never told you.
No real isolation. Most agents are single-user. On a shared box, a leftover browser session means one person's task quietly uses another person's account.
No continuity across devices. Start a task on your phone and you can't pick it up on your laptop.

This post is me keeping that promise. Tianshu
v0.5 ships all three. Here's each one, running.

Promise 1 — visible progress

A Kanban board backed by a worker pool. The orchestrator drops tasks
onto Ready; workers pick them up, run them, stream their transcript
live, and move them to Done. A task that dies is marked
awaiting intervention — not a silent screen.
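The lifecycle above is essentially a small state machine. Below is a minimal TypeScript sketch of it, not Tianshu's code: the state names mirror the board's columns, while `TaskBoard`, its methods, and the rule that a human re-queues an AwaitingIntervention task back to Ready are assumptions for illustration. What it shows is that a crash becomes an explicit transition with a recorded reason instead of a silent screen. Workers are synchronous here to keep it short; real ones would be async.

```typescript
// Hypothetical sketch of the board's task lifecycle; not Tianshu's code.
type TaskState = "Ready" | "InProgress" | "Done" | "AwaitingIntervention";

interface Task {
  id: string;
  state: TaskState;
  transcript: string[]; // streamed output, kept even if the worker dies
  failure?: string; // why the task needs a human, if it does
}

// Legal moves between columns; anything else is rejected.
const transitions: Record<TaskState, TaskState[]> = {
  Ready: ["InProgress"],
  InProgress: ["Done", "AwaitingIntervention"],
  AwaitingIntervention: ["Ready"], // assumed: a human fixes the cause and re-queues
  Done: [],
};

class TaskBoard {
  private tasks: Task[] = [];

  // The orchestrator drops a new task onto Ready.
  enqueue(id: string): Task {
    const task: Task = { id, state: "Ready", transcript: [] };
    this.tasks.push(task);
    return task;
  }

  move(task: Task, next: TaskState): void {
    if (!transitions[task.state].includes(next)) {
      throw new Error(`illegal transition ${task.state} -> ${next}`);
    }
    task.state = next;
  }

  // A worker claims the oldest Ready task, if there is one.
  claim(): Task | undefined {
    const task = this.tasks.find((t) => t.state === "Ready");
    if (task) this.move(task, "InProgress");
    return task;
  }

  // Run the work, streaming each line into the transcript as it is produced.
  run(task: Task, work: (log: (line: string) => void) => void): void {
    if (task.state !== "InProgress") throw new Error("claim the task first");
    try {
      work((line) => {
        task.transcript.push(line);
      });
    } catch (err) {
      // The failure lands on the card instead of disappearing.
      task.failure = err instanceof Error ? err.message : String(err);
      this.move(task, "AwaitingIntervention");
      return;
    }
    this.move(task, "Done");
  }
}
```

Illegal moves throw, so a bug that tries to jump a card from Ready straight to Done fails loudly instead of quietly corrupting the board.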

(That screenshot is a real run: I dropped two tasks in, and a Coder worker
is mid-flight on one of them.)

Promise 2 — real isolation

Every tenant runs in its own Linux sandbox. The default backend is
OpenShell (Docker), which, unlike a Hypervisor.framework microVM,
barely touches idle CPU on Apple Silicon. It also ships a network
egress policy: the agent can only reach hosts on an allow-list, and
anything else is blocked on the spot and logged.
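An allow-list egress check can be sketched in a few lines. This is a hypothetical TypeScript illustration of the idea, not OpenShell's policy engine or its rule format: the `EgressPolicy` class and the `*.domain` wildcard syntax are assumptions. Anything not matched is refused and recorded.

```typescript
// Hypothetical allow-list egress check; the rule syntax is an assumption,
// not OpenShell's real policy format.
class EgressPolicy {
  readonly blocked: string[] = []; // log of refused hosts, in order

  constructor(private readonly allowList: string[]) {}

  private matches(host: string, rule: string): boolean {
    if (rule.startsWith("*.")) {
      // "*.github.com" matches any subdomain, but not "github.com" itself.
      return host.endsWith(rule.slice(1));
    }
    return host === rule;
  }

  // Returns true if the connection may proceed; otherwise blocks and logs it.
  check(hostname: string): boolean {
    const host = hostname.toLowerCase();
    const allowed = this.allowList.some((rule) => this.matches(host, rule.toLowerCase()));
    if (!allowed) this.blocked.push(host);
    return allowed;
  }
}
```

The wildcard deliberately does not cover the bare apex domain, so `github.com` has to be listed on its own if the agent should reach it.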

Files are isolated the same way: one workspace per tenant, browsable in
the UI, persisted across sessions. Every stored row carries a tenant_id,
and has since the first line of code; this was a hard design rule, not a retrofit.
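Here is a minimal sketch of what "every row carries a tenant_id" can mean at the storage API level, assuming an in-memory store; `TenantStore` and its methods are hypothetical, not Tianshu's storage layer. Every call must name a tenant, and the tenant is assumed to come from the caller's authenticated context rather than from the row payload, so a forged `tenant_id` field cannot move data between tenants.

```typescript
// Hypothetical tenant-scoped store: every row carries tenant_id and every
// read or write has to name a tenant, so there is no unscoped query path.
interface Row {
  tenant_id: string;
  id: string;
  [column: string]: unknown;
}

class TenantStore {
  private rows: Row[] = [];

  insert(tenantId: string, id: string, data: Record<string, unknown>): Row {
    // id and tenant_id are written after the payload, so the payload cannot override them.
    const row: Row = { ...data, id, tenant_id: tenantId };
    this.rows.push(row);
    return row;
  }

  // Lookups always filter on tenant_id as well as the row id.
  get(tenantId: string, id: string): Row | undefined {
    return this.rows.find((r) => r.tenant_id === tenantId && r.id === id);
  }

  list(tenantId: string): Row[] {
    return this.rows.filter((r) => r.tenant_id === tenantId);
  }
}
```

In a SQL backend the same rule usually becomes a `WHERE tenant_id = ...` clause on every query, or PostgreSQL row-level security, so that forgetting the filter is impossible rather than merely discouraged.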

Promise 3 — continuity across devices

A channel system routes messages to the agent regardless of surface.
WeChat is wired first; more channels are on the way.

The bonus round

Things I didn't promise but built anyway:

Workforce Studio. Your whole agent config — main agent, every worker, the enabled plugin set, prompt blocks — pulled into one Solution you can edit in a three-pane IDE, diff against what's running, export/import as a file, and activate in one click. Config becomes something you version, diff, and share. You manage your agent the way you manage code.
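The "diff against what's running" part can be illustrated by flattening two configs into dotted paths and comparing them. This TypeScript sketch assumes a JSON-shaped Solution with hypothetical keys (`mainAgent`, `workers`, `plugins`); the actual Workforce Studio format and diff view may look quite different.

```typescript
// Hypothetical Solution diff; the real Workforce Studio format may differ.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Flatten nested config into "dotted.path -> JSON text", e.g. "workers.coder.enabled".
// Arrays are compared as whole values rather than element by element.
function flatten(value: Json, prefix = "", out = new Map<string, string>()): Map<string, string> {
  if (value !== null && typeof value === "object" && !Array.isArray(value)) {
    for (const [key, child] of Object.entries(value)) {
      flatten(child, prefix ? `${prefix}.${key}` : key, out);
    }
  } else {
    out.set(prefix, JSON.stringify(value));
  }
  return out;
}

// List additions (+), removals (-), and changes (~) going from running to edited.
function diffSolutions(running: Json, edited: Json): string[] {
  const before = flatten(running);
  const after = flatten(edited);
  const changes: string[] = [];
  for (const key of new Set([...before.keys(), ...after.keys()])) {
    const a = before.get(key);
    const b = after.get(key);
    if (a === undefined) changes.push(`+ ${key} = ${b}`);
    else if (b === undefined) changes.push(`- ${key}`);
    else if (a !== b) changes.push(`~ ${key}: ${a} -> ${b}`);
  }
  return changes;
}
```

Flattening to paths is what makes a config diff readable line by line, the same way a code diff is.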

Web search without API keys (web_search + web_fetch).
Point-and-click config for the model provider catalog (keys stay masked and server-side), MCP servers, and per-plugin settings.

Try it

npm install -g @tianshu-ai/tianshu
tianshu setup   # a wizard configures a provider, sandbox, etc.

Open source (Apache-2.0), self-hostable: https://github.com/tianshu-ai/tianshu

⚠️ Heads-up: in v0.5 the admin/Settings pages aren't behind an auth
gate yet — run it as a single trusted operator for now. A proper
auth/role gate is next on the roadmap.

If those three pains sound like your week, I'd love your issues — what's
missing, what I got wrong, what to copy from.

Three things AI agents keep getting wrong (and why I'm rebuilding the platform from scratch)

Tianshu AI — Tue, 02 Jun 2026 14:44:47 +0000

TL;DR. Today's AI agents can call tools. The scaffolding for actually running a job — visible progress, real isolation, continuity across devices — is still missing. After hitting the same three pain points enough times, I'm rebuilding the agent platform from scratch. Open source. Self-hostable. Pre-alpha. Build in public, every week. Star github.com/tianshu-ai if you want the v0.1 demo when it lands — or scroll down and tell me which of the three pains hit you the hardest.

The 3-minute video version

If you'd rather read, the rest of the post says the same thing — with a few more concrete examples and three architecture sketches the video skips.

I've been using AI agents for a while

For a year or so I've been using a handful of agent-shaped tools — to do research, run code, automate boring stuff. Different products, similar shape: a chat box, a tool-using model behind it, sometimes a sandbox.

They work. Until you actually try to put one to work.

Three things keep showing up. They aren't the same bug. But every time, I get the same uneasy feeling — this is supposed to be the easy part, and it isn't.

Pain #1 — You think it's running. It's actually waiting for you to talk

You hand the agent a task. You walk away to make coffee, or to take a meeting.

You come back, and it's stuck on a question:

"Should I use npm or pnpm?"

So much for "automatic." I'm just babysitting a robot.

The deeper problem isn't even the question itself. It's that the agent has no idea which decisions you actually want to be asked about, and which ones it should settle with a sensible default before moving on. Every CLI flag, every package manager, every "do you want to enable telemetry?" prompt becomes a stop. The agent is technically running. In practice, it's been waiting for you for forty minutes.

If a junior engineer Slack-pinged me every time they had to choose between npm and pnpm, I would not call that "an agent." I would call it "an interview I am running."

Pain #2 — You think it finished. It actually crashed half an hour ago

You give it a research task. Twenty open tabs, summarize, dump in a doc. You go to a meeting.

You come back. Silent screen.

Context overflowed. Or the model timed out. Or some tool errored. Task killed mid-way. No notice. No trail.

You have to reverse-engineer where it got to: open the half-written doc, scroll the chat, guess which subtasks finished, then write a new prompt to drag it back on track. Sometimes it's faster to start over than to figure out what happened. Sometimes — be honest — you only realize it died because you noticed the laptop fan stopped.

The thing that bothers me here is the asymmetry. Modern agents have plenty of internal state — chain-of-thought, tool call traces, intermediate scratchpads. None of that is exposed in a way that survives a crash. When it dies, you get nothing. When it finishes, you get the answer. When it half-finishes? You get a vibe.

A human contractor who disappeared mid-job and didn't text you would lose the contract. We somehow accept this from the agents we pay for.

Pain #3 — Your task. Someone else's account

This one is a category most "personal AI assistant" products quietly skip.

A lot of agent products are built single-user: one machine, one human, one set of cookies. That's a fine assumption if your only target is the solo MacBook owner.

But — the family laptop. The shared workstation in the office. The team's agent that everyone in the channel pokes at. Those get used by two or three people back-to-back. The agent fires up a browser, the browser still has the previous user's session open, and now your "summarize this thread" task is reading email as somebody else.

Or worse: the agent acts. Likes a post. Replies to a DM. Submits a form. Under the wrong identity.

Next morning, somebody at the office asks: "Hey — were you on Slack at 1 a.m.?"

This isn't a hypothetical. It's an account-isolation bug, and once you know it exists, you stop wanting to share an "agent" with anybody.

So… what's the actual hole?

These three aren't the same bug. But they're missing the same thing.

Today's agents can call tools. That's the part LLM products got good at. The thing that's still missing is the scaffolding for actually running a job — the layer that lives between "the model can call this tool" and "a human can leave the room and trust this thing."

I've hit those three pains enough times that the obvious patch — "just prompt better" or "use a different agent" — stopped feeling like a real fix. So I'm rebuilding the platform from scratch.

The scaffolding, I think, is three things.

Idea 1 — Make progress visible

A board. Not a chat log. A first-class plan view that tells you:

where the agent is (which step, which subtask)
where it's stuck (which input it's waiting for, which tool failed)
why it stopped (model finished? error? user cancel? context overflow?)

Imagine a Kanban-shaped surface where every running agent is a column, every step is a card, and the card stays around with its log even after the agent dies. You should be able to glance and know whether to walk away or step in. You shouldn't have to guess.
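The three questions above (where, stuck, why stopped) map naturally onto a discriminated union. A hypothetical TypeScript sketch follows; `Card`, `CardStatus`, the stop-reason names, and the rule that finished or cancelled cards need no human are assumptions drawn from the list, not a committed design.

```typescript
// Hypothetical card model for the plan board; names are illustrative only.
type StopReason = "finished" | "error" | "user_cancel" | "context_overflow";

type CardStatus =
  | { kind: "running"; step: string }
  | { kind: "waiting"; step: string; needs: string } // blocked on a human input
  | { kind: "stopped"; step: string; reason: StopReason; detail?: string };

interface Card {
  agent: string; // the column this card sits in
  status: CardStatus;
  log: string[]; // kept after the agent dies, so the trail survives
}

// The glance test: can I walk away, or do I need to step in?
function needsHuman(card: Card): boolean {
  const s = card.status;
  switch (s.kind) {
    case "running":
      return false;
    case "waiting":
      return true;
    case "stopped":
      return s.reason !== "finished" && s.reason !== "user_cancel";
  }
}

// One-line summary for the board: where it is, what it waits on, why it stopped.
function summarize(card: Card): string {
  const s = card.status;
  switch (s.kind) {
    case "running":
      return `${card.agent}: running "${s.step}"`;
    case "waiting":
      return `${card.agent}: waiting on "${s.needs}" at "${s.step}"`;
    case "stopped": {
      const detail = s.detail ? `: ${s.detail}` : "";
      return `${card.agent}: stopped at "${s.step}" (${s.reason}${detail})`;
    }
  }
}
```

Because the status is a typed union rather than free text, "walk away or step in?" becomes a mechanical check instead of a guess.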

The unsexy version of this insight is: agents don't need more autonomy. They need a better status bar.

Idea 2 — Real isolation. One workspace per (person, task)

A workspace is the agent's "user" — its own:

browser profile (cookies, login state, extensions)
file root
credentials / secrets vault
tool config

Two jobs by two people on the same machine should run in two workspaces. Two jobs by the same person, where one is "personal Twitter cleanup" and the other is "company OKR draft," should also run in two workspaces. You can share the machine without sharing the identity.
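One way to make "two workspaces" concrete is to key workspaces by the (person, task) pair. The sketch below is a hypothetical TypeScript illustration: `WorkspaceRegistry`, the directory layout, and the field names are assumptions, and a real implementation would also create the directories and enforce them inside the sandbox.

```typescript
// Hypothetical workspace resolver: one workspace per (person, task) pair.
// Paths and field names are illustrative assumptions.
interface Workspace {
  key: string;
  browserProfileDir: string; // cookies, login state, extensions
  fileRoot: string;
  secrets: Map<string, string>;
  toolConfig: Record<string, unknown>;
}

class WorkspaceRegistry {
  private workspaces = new Map<string, Workspace>();

  constructor(private readonly baseDir: string) {}

  // The same (person, task) pair reuses its workspace; any other pair gets a fresh one.
  resolve(person: string, task: string): Workspace {
    // Encoding keeps a "/" in a name from colliding with another pair's path.
    const key = `${encodeURIComponent(person)}/${encodeURIComponent(task)}`;
    let ws = this.workspaces.get(key);
    if (!ws) {
      const root = `${this.baseDir}/${key}`;
      ws = {
        key,
        browserProfileDir: `${root}/browser`,
        fileRoot: `${root}/files`,
        secrets: new Map(), // starts empty: nothing is inherited from other workspaces
        toolConfig: {},
      };
      this.workspaces.set(key, ws);
    }
    return ws;
  }
}
```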

This is unglamorous infrastructure work. It's also why I think the right primitive for an agent platform isn't "session" — it's tenant. Multi-tenancy as a design assumption from day one, not a Pro-tier feature bolted on later.

Nothing crosses workspace boundaries by default. Cookies don't leak. Files don't leak. Identity doesn't leak.

Idea 3 — Continuity across devices

You should be able to:

send the agent a line from your phone on the bus,
pick it up from your laptop at the desk,
ask one more follow-up from a different chat tool entirely,

…and the agent should keep up. Same plan, same workspace, same memory. The chat surface (Telegram, WhatsApp, the project's own web UI, an iMessage thread, a hardware button on a desk gadget) is just a channel — it's not where the agent lives.

That's a stronger claim than "we have a mobile app." It means the agent identity is portable across surfaces, not duplicated per surface. Channels are pluggable; the agent is one thing.
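A minimal sketch of "channels are pluggable; the agent is one thing": each channel identity is linked to a tenant, and every message from any linked surface lands in the same session. `ChannelRouter`, `link`, and the message shape are hypothetical TypeScript names, not Tianshu's channel API, and how a WeChat account actually gets linked to a web login (the genuinely hard part) is left out.

```typescript
// Hypothetical channel layer: every (channel, external user) identity maps
// onto one agent session per tenant, so all surfaces share one history.
interface InboundMessage {
  channel: string; // e.g. "wechat", "web"
  externalUser: string; // the sender's id on that channel
  text: string;
}

interface AgentSession {
  tenantId: string;
  history: string[]; // one history, shared by every surface
}

class ChannelRouter {
  private links = new Map<string, string>(); // "channel:user" -> tenantId
  private sessions = new Map<string, AgentSession>();

  // Linking is what makes a phone chat and a laptop browser "the same person".
  link(channel: string, externalUser: string, tenantId: string): void {
    this.links.set(`${channel}:${externalUser}`, tenantId);
  }

  route(msg: InboundMessage): AgentSession {
    const tenantId = this.links.get(`${msg.channel}:${msg.externalUser}`);
    if (!tenantId) throw new Error(`unlinked sender on ${msg.channel}`);
    let session = this.sessions.get(tenantId);
    if (!session) {
      session = { tenantId, history: [] };
      this.sessions.set(tenantId, session);
    }
    session.history.push(`[${msg.channel}] ${msg.text}`);
    return session;
  }
}
```

Unlinked senders are rejected rather than given a fresh session, which keeps the isolation rule from Idea 2 intact at the channel edge.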

The agent stays put. The channels come and go.

Naming

I'm calling it Tianshu (天枢).

Tianshu is the first star of the Big Dipper — the one that decides where the whole constellation points. The orchestrator. You give it the direction, and it brings workers along to do the actual job.

The workers, eventually, are going to get names too, drawn from Chinese craftsman gods rather than the usual "Worker-1, Worker-2": Lu Ban (鲁班), the master builder, for code-generation work; Nüwa (女娲), the creator, for synthesis; Xihe (羲和), the sun-charioteer, for scheduled / time-bound jobs. There's a small mythology behind it, which I'll write up properly later. The naming is partly for fun, and partly because every single AI-agent product I see in English is named like a SaaS startup, and I want this one to feel different.

Open source. Self-hostable. The default is your machine, not someone else's cloud.

What's actually built

Honestly: not much yet, by design.

The architecture is being written down as RFCs (this post is the first half of RFC-001 — the "why" half).
v0.1 demo target is the plan board + workspace isolation — the bare minimum to demonstrate Pains #1 and #3 are fixable.
Code drops will follow each RFC, not lead them.

For the architecturally curious, here's the one-page v0.1 sketch (channel layer, planner / dispatcher / aggregator main agent, sandboxed workers, tenant-scoped storage):

I'd rather show up with one running pixel than with a glossy landing page and no repo. So this post lives where I am right now: somewhere between "I have opinions" and "I have a binary."
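For readers who want the main-agent part of the v0.1 sketch (planner / dispatcher / aggregator) in code form, here is a hypothetical plan, dispatch, aggregate loop in TypeScript. The planner is passed in as a plain function (in practice it would presumably be a model call), workers are synchronous for brevity, and every name is illustrative. The aggregator reports failures next to results instead of dropping them, which is Pain #2 in miniature.

```typescript
// Hypothetical main-agent loop: plan -> dispatch -> aggregate.
// Worker kinds, the Subtask shape, and the planner are illustrative only.
interface Subtask {
  id: string;
  kind: string; // which worker type should handle it, e.g. "coder"
  input: string;
}

type Result = { id: string; ok: true; output: string } | { id: string; ok: false; error: string };
type Worker = (input: string) => string;
type Planner = (goal: string) => Subtask[];

// Dispatcher: route each subtask to a worker of the right kind, catching failures.
function dispatch(subtasks: Subtask[], workers: Record<string, Worker>): Result[] {
  return subtasks.map((t): Result => {
    const worker = workers[t.kind];
    if (!worker) return { id: t.id, ok: false, error: `no worker for kind "${t.kind}"` };
    try {
      return { id: t.id, ok: true, output: worker(t.input) };
    } catch (err) {
      return { id: t.id, ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  });
}

// Aggregator: combine outputs, and never drop a failure on the floor.
function aggregate(results: Result[]): { summary: string; failed: string[] } {
  const done = results.filter((r): r is Extract<Result, { ok: true }> => r.ok);
  const failed = results.filter((r): r is Extract<Result, { ok: false }> => !r.ok);
  return {
    summary: done.map((r) => r.output).join("\n"),
    failed: failed.map((r) => `${r.id}: ${r.error}`),
  };
}

function runGoal(goal: string, plan: Planner, workers: Record<string, Worker>) {
  return aggregate(dispatch(plan(goal), workers));
}
```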

What I want from you

Three real questions, not rhetorical:

1. Have you hit any of those three pains? Which one was worst? I want anti-patterns, not validation.

2. What did I miss? Pain #4. The hole I haven't noticed yet. Especially around eval, observability, or "things go fine in dev, weird in prod" stories.

3. Is there a tool out there that already gets one of these right? I'm not looking for a list of every agent framework — I'm looking for the one piece somebody nailed that I should just copy from instead of reinventing.

Reply, file an issue, DM me, email — whatever works. The goal of this post is to be wrong in a useful way before I write much more code.

Subscribe to the build

GitHub: github.com/tianshu-ai — star it to be the first to see v0.1 drop.
X: @tianshuAIdev — short build-in-public updates.
YouTube: @Tianshu-AI — the 3-minute version of this post lives here.

Devlog every week. Once a demo runs, a video. If those three pains sound like your week — see you in the issues.

I'm Tianshu. See you next week.

DEV Community: Tianshu AI