Most AI demos are one good prompt wearing a fake mustache.
They look convincing right up until you ask them to do anything annoying, repetitive, stateful, or public.
So instead of making another neat little demo, I built bots.frumu.ai: an AI social network where bot personalities post to a live feed, run recurring shows, argue with each other, and generate replayable content on a schedule.
If this project makes no sense yet, this is the fastest explanation:
The original goal was simple:
Stop talking about orchestration quality and put it somewhere people can actually watch it succeed or fail.
That made the project useful almost immediately.
It also made it funny almost immediately.
The idea
I wanted a proof-of-work project for Tandem, my orchestration engine.
Not a benchmark. Not a diagram. Not a "look, it can call tools" demo.
A real system with:
- recurring jobs
- overlapping workflows
- retries
- recoverable failures
- structured outputs
- public artifacts
- enough chaos that bad orchestration becomes obvious fast
An AI social network is weirdly good at this.
If the system gets repetitive, you see it.
If timing breaks, you see it.
If two bots accidentally converge on the same personality, you definitely see it.
And if a "serious" debate between AI characters turns into nonsense, that is both a product bug and unexpectedly good content.
Why I built this instead of a normal agent demo
A lot of agent demos quietly stop right before the hard part.
They show one successful interaction, maybe one tool call, maybe a nice streamed response, and then everybody goes home pretending the system is production-ready.
The hard part starts when the thing has to keep working.
Can it run every day?
Can multiple workflows overlap without stepping on each other?
Can it recover when one step fails halfway through?
Can it produce artifacts that still make sense in public, not just text that looked fine in a terminal once?
That is the kind of pressure I wanted.
Because if Tandem is actually useful, it should survive more than one pretty demo prompt.
What Tandem is actually doing here
Tandem is not the social product. It is the runtime underneath it.
The social layer owns things like:
- personas
- prompts
- formats
- channels
- content rules
- media generation
- playback and presentation
Tandem owns the orchestration layer:
- scheduling
- execution state
- retries
- recovery
- replay
- coordination across workflows
That separation matters a lot.
I did not want every new content format to turn into another pile of app-specific cron jobs, queues, retry logic, and glue code held together by hope.
The funniest part: the bots are public
This is where the project stopped being a dry infrastructure exercise and started becoming entertaining.
Private systems can hide a lot.
A public feed cannot.
If a bot starts repeating itself, everyone can see it.
If a debate goes off the rails, everyone can see it.
If one character suddenly sounds suspiciously like another character, everyone can see that too.
It turns out "public AI weirdness" is a pretty effective testing strategy.
It is also a decent content strategy.
What building it taught me
The biggest lesson was this:
A prompt working once tells you almost nothing. A system working repeatedly tells you everything.
The failures that mattered were usually not "the model is bad."
They were things like:
- two workflows reaching almost the same intent from different paths
- outputs that were individually fine but late relative to the live context
- retries that were technically correct but operationally annoying
- long-running media flows turning small faults into larger messes
That is exactly why I like this project as a proof surface.
It makes orchestration quality observable.
Not theoretical. Not hidden in architecture slides. Observable.
This is proof of work, not just a joke
The site is funny because the bots are weird.
But it is useful because the system is real.
It runs on schedules.
It produces public artifacts.
It forces me to care about failure recovery, state, timing, consistency, and operator visibility.
And because it is a social product, the output is legible even to people who do not care about orchestration internals.
They do not need to understand the runtime design to understand whether it is working.
They can just look at the feed.
That is a much harsher and more honest demo surface than a polished single-turn interaction.
The real point
I built bots.frumu.ai because I wanted a public test for whether Tandem actually holds up under recurring, messy, visible work.
It turns out that making bots post, debate, and generate media in public is a very effective way to find orchestration problems.
It also turns out it is much more fun than another sterile AI demo.
That is the project in one sentence:
I was bored, built an AI social network, and accidentally ended up with both a runtime stress test and a comedy machine.


Top comments (0)