floworkos

Posted on Jun 12

Flowork Agent — Self-Hosted AI Agent Operating System with Persistent Memory and Built-In Security

#ai #security #devops #selfhosted

The Memory That Learns From Mistakes

Every Flowork agent carries a private SQLite brain with full-text search (FTS5). When an agent runs a task, it stores:

The prompt it was given
The exact error or correction it received
What it learned from that mistake (not shame — education)
When it happened (so it can recall under similar conditions)

If an agent hallucinates stock data and a user corrects it, that correction goes into the brain as a memory. Next time it sees "stock analysis," FTS5 retrieves similar past episodes, and the agent can tune its response. It doesn't memorize the user — it memorizes the domain.

This is not RAG with external documents. This is the agent's own experience, accessible at inference time through a brain.query(question) capability. An agent can even "dream" in idle time — reviewing its own logs and compiling lessons without being prompted.
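To make the recall loop concrete, here is a minimal Python sketch of an FTS5-backed memory. It is an illustration only, not Flowork's Go implementation: the table name, the column names, and the `open_brain`, `remember`, and `query` helpers are all assumptions, loosely modeled on the `brain.query(question)` capability described above.

```python
import sqlite3
import time


def open_brain(path=":memory:"):
    """Open a brain database with one FTS5 table (hypothetical schema)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS memories "
        "USING fts5(prompt, correction, lesson, created_at UNINDEXED)"
    )
    return db


def remember(db, prompt, correction, lesson):
    """Store one corrected mistake as a searchable memory."""
    db.execute(
        "INSERT INTO memories (prompt, correction, lesson, created_at) "
        "VALUES (?, ?, ?, ?)",
        (prompt, correction, lesson, time.strftime("%Y-%m-%dT%H:%M:%S")),
    )
    db.commit()


def query(db, question, limit=3):
    """Return lessons from past episodes that share words with the question."""
    # Quote each word so user text is never parsed as FTS5 query syntax.
    words = [w.replace('"', '""') for w in question.split()]
    if not words:
        return []
    match = " OR ".join(f'"{w}"' for w in words)
    rows = db.execute(
        "SELECT lesson FROM memories WHERE memories MATCH ? "
        "ORDER BY rank LIMIT ?",
        (match, limit),
    )
    return [lesson for (lesson,) in rows]
```

After `remember(db, "Give me a stock analysis of ACME", "The quoted price was invented", "Fetch live prices before any stock analysis")`, a later `query(db, "stock analysis")` returns that lesson, while an unrelated question returns an empty list.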

🎯 Security Without Theater

Flowork does not do security theater. It ships a real, built-in security radar:

WASM sandbox — every agent's bytecode runs in an isolated WebAssembly runtime (via wazero). No direct OS access. Agents cannot fork processes or read sibling files.
Capability-based grants — before an agent touches the filesystem, calls an LLM, or runs a tool, the kernel checks one grant per capability. You define which agents can call which tools; others get a clean "no."
Frozen kernel guarding — the microkernel code itself is protected. If someone or something tries to patch it at runtime, the Guardian detects tampering and triggers safe-mode. The agent is isolated; the system remains intact.
Built-in scanning arsenal — alongside sandbox & grant checks, you get pluggable security scanners (malware patterns, injection detection, drift monitors) that run before code is deployed as an agent.
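The capability check in the list above can be sketched as a lookup that runs before every privileged call. This is a Python illustration under stated assumptions: the `Kernel` class, the `CapabilityDenied` exception, and the capability names are hypothetical, not Flowork's actual API.

```python
from dataclasses import dataclass, field


class CapabilityDenied(Exception):
    """Raised when an agent uses a capability it was never granted."""


@dataclass
class Kernel:
    # agent name -> set of granted capability names (all names hypothetical)
    grants: dict = field(default_factory=dict)

    def grant(self, agent, capability):
        """Record that this agent may use this capability."""
        self.grants.setdefault(agent, set()).add(capability)

    def call(self, agent, capability, action, *args):
        """Run action only if this agent holds this exact capability."""
        if capability not in self.grants.get(agent, set()):
            raise CapabilityDenied(f"{agent} may not use {capability}")
        return action(*args)
```

The design choice worth noting is default deny: an agent with no entry in `grants` can do nothing, so forgetting to configure an agent fails closed rather than open.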

No other agent framework I'm aware of ships this stack together. Most agent frameworks run agents as plain Python or JavaScript — no sandbox, no capability model, no guard. You're trusting the agent to behave. Flowork trusts the architecture.

🔌 MCP: Both Client and Server

Flowork implements the Model Context Protocol — both sides.

As an MCP client, your agents can call external MCP servers:

Agent → Flowork MCP client → (connects to) → GitHub MCP server → get file from repo
Agent → Flowork MCP client → (connects to) → Filesystem MCP server → read /data

Drop the MCP server address into an agent's tool config, and the tool becomes available immediately. No recompile. No kernel patch.
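On the wire, an MCP tool invocation is a JSON-RPC 2.0 request with the method `tools/call`. The Python sketch below only builds that message; the helper name, the tool name `read_file`, and its `path` argument are placeholders for whatever the connected server actually advertises.

```python
import itertools
import json

# Monotonic request ids, as JSON-RPC expects each request to be identifiable.
_ids = itertools.count(1)


def tools_call_request(tool_name, arguments):
    """Build the JSON-RPC 2.0 message MCP uses to invoke a server tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })
```

For example, `tools_call_request("read_file", {"path": "/data/report.txt"})` yields the request a client would send to a filesystem server after the usual MCP initialization handshake, which this sketch leaves out.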

As an MCP server, Flowork itself exposes your agents to Claude Desktop or Cursor:

Claude Desktop (MCP client) → Flowork MCP server → call agent A's brain query
                            → call agent B's tools
                            → call scanner X

This flips the relationship: instead of your agents calling the LLM, Claude can query your agents as a knowledge base, and they remember what you taught them.
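Seen from the server side, that relationship is a small dispatcher: list the available tools, then run the one the client asks for. The Python sketch below assumes a single hypothetical tool, `agent_a_brain_query`, and omits the initialization handshake and transport that a real MCP server needs.

```python
import json


def brain_query(question):
    # Hypothetical stand-in for an agent's memory lookup.
    return f"lessons matching: {question}"


# Tool names exposed to MCP clients (hypothetical naming scheme).
TOOLS = {"agent_a_brain_query": brain_query}


def handle(raw_message):
    """Answer one JSON-RPC 2.0 request the way an MCP server would."""
    request = json.loads(raw_message)
    reply = {"jsonrpc": "2.0", "id": request.get("id")}
    method = request.get("method")
    params = request.get("params", {})
    if method == "tools/list":
        reply["result"] = {
            "tools": [{"name": n, "inputSchema": {"type": "object"}} for n in TOOLS]
        }
    elif method == "tools/call" and params.get("name") in TOOLS:
        text = TOOLS[params["name"]](**params.get("arguments", {}))
        reply["result"] = {"content": [{"type": "text", "text": text}]}
    elif method == "tools/call":
        reply["error"] = {"code": -32602, "message": "Unknown tool"}
    else:
        reply["error"] = {"code": -32601, "message": "Method not found"}
    return json.dumps(reply)
```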

🧩 117 Tools + Plug-and-Play Modules

Flowork ships with 117 built-in tools across nine command families:

Text — split, join, format, regex, hash, encode/decode
Network — HTTP, DNS, IP geolocation, port scan
Crypto — key generation, signing, encryption (libsodium)
Database — SQLite query, CSV transform
Time — schedule, cron, countdown
AI — LLM completion, embedding, vision (multi-model)
File — read, write, list (sandboxed within agent's folder)
Slack/Discord/Telegram — native channel integrations
Voice — offline STT (Whisper) + free TTS

Everything else is plug-and-play. Drop a new tool folder into tools/, a new agent folder into agents/, a scanner into scanners/, or a new channel binding into channels/ — the kernel auto-discovers it, no restart needed.

Modules are versioned (Git SHA), so you can roll back a broken tool without touching other modules.
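Auto-discovery of this kind can be approximated by rescanning the module folders and diffing against the previous scan. The Python sketch below assumes polling; the real kernel may well use filesystem notifications instead, and the `discover` and `newly_added` helpers are hypothetical names.

```python
from pathlib import Path

# The four module folders named in the text.
MODULE_KINDS = ("tools", "agents", "scanners", "channels")


def discover(root):
    """Map each module kind to the sorted folder names found under it."""
    found = {}
    for kind in MODULE_KINDS:
        base = Path(root) / kind
        names = [p.name for p in base.iterdir() if p.is_dir()] if base.is_dir() else []
        found[kind] = sorted(names)
    return found


def newly_added(before, after):
    """List (kind, name) pairs present in `after` but not in `before`."""
    return [
        (kind, name)
        for kind in MODULE_KINDS
        for name in after[kind]
        if name not in before[kind]
    ]
```

Calling `discover` on a timer and passing consecutive results to `newly_added` gives the list of modules to load without restarting anything.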

🚀 One Binary, No Dependencies

Flowork is written in pure Go 1.25 with no cgo. It compiles to a single static binary for Linux, macOS, or Windows. Drop it on any machine, run ./Flowork_Agent, and you have a full agent OS listening on http://127.0.0.1:1987.

No Python, no Node.js, no Docker.
No service dependencies (no Postgres, no Redis — everything is SQLite or in-process).
No telemetry. No phoning home. Your agents' thoughts stay in your folder.
Works fully offline (or with local Ollama, or with your own private LLM).

📡 Connectors: Telegram, Discord, Slack, WhatsApp, CLI, Web, Cron, MCP

Flowork agents listen to multiple entry points at once:

Channel	How It Works	Auth
Telegram	Webhook or polling	BotToken + ChatID
Discord	Webhook	Bot token + channel ID
Slack	Event subscription	Slack App token
WhatsApp	Twilio integration	Twilio credentials
CLI	`flowork ask --agent mybot "query"`	Local (no auth needed)
Web	Dashboard + chat UI on `:1987`	Optional JWT
Cron	Scheduled tasks	Built-in cron parser
MCP	Claude Desktop / Cursor	MCP server over stdio

A single agent can listen to all of these at once. The same agent logic answers a Telegram user and a Discord server and runs a nightly batch job.
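One way to picture this is a shared message shape that every connector converts into before the agent sees it. The `Message` fields and the `agent_logic` placeholder in this Python sketch are assumptions for illustration, not Flowork's internal types.

```python
from dataclasses import dataclass


@dataclass
class Message:
    channel: str   # e.g. "telegram", "discord", "cron"
    sender: str
    text: str


def agent_logic(text):
    # Hypothetical placeholder for the agent's real reasoning.
    return f"You said: {text}"


def handle(message):
    """Run the same agent logic whatever channel the message came from."""
    return {
        "channel": message.channel,
        "to": message.sender,
        "text": agent_logic(message.text),
    }
```

Only the routing fields differ between a Telegram chat and a nightly cron trigger; the reply text comes from the same function in both cases.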

🎭 Self-Protecting & Tamper-Aware

The microkernel is frozen. It is written once and never edited after deployment. The payoff is radical simplicity: no configuration sprawl, no runtime patches, and no unreviewed kernel changes in production.

A Guardian process monitors the kernel's bytecode. If it detects a modification (a Trojan payload, a supply-chain attack, a bug fix gone wrong), it does not crash the system. Instead:

The agent that attempted the patch is isolated (moved to safe-mode).
A log entry is written (so you know what tried to happen).
The rest of the agents continue working.

This is fault isolation without sacrificing integrity.
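The three steps above can be sketched as a digest comparison followed by isolation. This Python illustration assumes the Guardian compares a SHA-256 fingerprint and is told which agent last wrote to the kernel; both details, along with the class and method names, are hypothetical.

```python
import hashlib


def fingerprint(kernel_bytes):
    """SHA-256 digest of the kernel image, recorded when it is frozen."""
    return hashlib.sha256(kernel_bytes).hexdigest()


class Guardian:
    def __init__(self, frozen_kernel):
        self.expected = fingerprint(frozen_kernel)
        self.safe_mode = set()   # agents isolated after a tamper attempt
        self.log = []

    def check(self, current_kernel, last_writer):
        """Return True if the kernel is intact; otherwise isolate the writer."""
        if fingerprint(current_kernel) == self.expected:
            return True
        self.safe_mode.add(last_writer)
        self.log.append(f"kernel tampering attributed to {last_writer}")
        return False

    def may_run(self, agent):
        """Agents outside safe-mode keep working."""
        return agent not in self.safe_mode
```

A failed check isolates only the offending agent and writes one log line, so `may_run` stays true for every other agent.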

🧯 Mistakes → Lessons (Not Shame)

Flowork ships with an educational error model — a redemptive philosophy baked into the architecture. When an agent makes a mistake:

Capture the error — store the exact input, the agent's reasoning, the mistake, and the correction.
Teach in real-time — during the same conversation, the agent references this memory and adjusts.
Learn offline — in idle time, the agent "dreams," reading its own logs and writing abstract lessons so future similar cases are handled better.
Expose the lesson — via brain.query(), an agent can surface what it learned. A human can verify it, correct it, or approve it.

This is not zero-shot prompting. It's not fine-tuning. It's lived experience, stored and retrieved at inference time, with human oversight built in.
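The offline "dream" step and the human review step can be sketched as two small functions. This Python example is an assumption-heavy illustration: the `Episode` and `Lesson` shapes, the status values, and the rule that only approved lessons are surfaced are not taken from Flowork's code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Episode:
    topic: str        # e.g. "stock analysis"
    mistake: str
    correction: str


@dataclass
class Lesson:
    topic: str
    text: str
    status: str = "draft"   # a human later sets "approved" or "rejected"


def dream(episodes):
    """Compile one draft lesson per topic from the logged corrections."""
    by_topic = defaultdict(list)
    for ep in episodes:
        if ep.correction not in by_topic[ep.topic]:
            by_topic[ep.topic].append(ep.correction)
    return [Lesson(topic, "; ".join(fixes)) for topic, fixes in by_topic.items()]


def usable(lessons):
    """Surface only the lessons a human has approved (an assumed policy)."""
    return [lesson for lesson in lessons if lesson.status == "approved"]
```

Keeping drafts separate from approved lessons is one way to make the human oversight step explicit rather than implied.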

Honest limitation: This works well for domain-specific agents (stock analysis, music trivia, promo routing) where mistakes are domain errors. It does not teach an agent to be honest if its base LLM is dishonest. It does catch hallucinations when a human corrects them, and the stored correction makes the same hallucination less likely to recur in a similar context.

🏗️ Architecture Snapshot

┌─────────────────────────────────────────────────────────┐
│                 Flowork Microkernel                     │
│  (frozen once, guarded, sandboxing + capability router) │
└────────┬────────┬────────┬────────┬────────┬────────────┘
         │        │        │        │        │
      [Bus]    [loket]  [Guard]  [MCP]  [LLM Router]
         │        │        │        │        │
    ┌────┴────┐   │    ┌────────┬──┴───┬─────┴────────┐
    │ Channels│   │    │Scanner │MCP   │Local/Remote  │
    │(Tele/DC/   │    │Radar   │Srv   │LLM (Claude,  │
    │Slack/WA)   │    │        │      │Ollama, etc.) │
    └─────┬──────┘    └────────┼──────┴──────────────┘
          │                    │
    ┌─────┴────────────────────┴──────────────────┐
    │                                               │
    │        Agent Instances (WASM sandboxes)      │
    │                                               │
    │  ┌──────────┐  ┌──────────┐  ┌──────────┐   │
    │  │ Agent A  │  │ Agent B  │  │ Agent C  │   │
    │  │(folder)  │  │(folder)  │  │(folder)  │   │
    │  │SQLite    │  │SQLite    │  │SQLite    │   │
    │  │brain +   │  │brain +   │  │brain +   │   │
    │  │tools     │  │tools     │  │tools     │   │
    │  └──────────┘  └──────────┘  └──────────┘   │
    │                                               │
    └───────────────────────────────────────────────┘

Each agent folder contains:

my-agent/
  ├── config.toml          # Agent personality, tools, grants
  ├── brain.db             # SQLite FTS5 memory (private)
  ├── tools/               # Custom tools (WASM or native)
  ├── log/                 # Conversation history
  └── schedule.cron        # (optional) Cron jobs

When the kernel restarts, agents resume with their memories intact. No session loss. No reset.
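A quick way to sanity-check an agent folder against the layout above is to look for the listed entries. In this Python sketch, treating everything except `schedule.cron` as required is an assumption, and `missing_entries` is a hypothetical helper, not a Flowork command.

```python
from pathlib import Path

# Entries from the layout above; schedule.cron is marked optional there.
REQUIRED = ("config.toml", "brain.db", "tools", "log")


def missing_entries(agent_dir):
    """Return the required files or folders that the agent folder lacks."""
    agent = Path(agent_dir)
    return [name for name in REQUIRED if not (agent / name).exists()]
```

An empty result means the folder has the shape shown above; it says nothing about whether the contents are valid.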

🎯 Real Strengths (and Trade-Offs)

Strengths

True local-first — agents run offline; no cloud dependency.
Memory that compounds — an agent with six months of corrected mistakes outperforms a fresh agent, not because it's fine-tuned, but because it recalls relevant context.
Plug-and-play modularity — break a tool, fix one folder; the kernel and all other agents are unaffected.
Single binary deploy — no DevOps, no containerization, no external service dependencies.
Radical transparency — the kernel code is frozen, every module version is tracked, and every capability grant is auditable.
Built-in security that's not theater — sandbox + capability model + guard, together, by design.

Trade-Offs

WASM sandbox overhead — agents are not as fast as native Python/JavaScript. If you need millisecond-scale performance, this is not your framework. For typical agent workloads (API calls, LLM inference, memory queries), the overhead is negligible.
Smaller module ecosystem (compared to, say, LangChain) — you get 117 built-in tools and plug-and-play extensibility, but not ten thousand community packages. This is intentional: fewer tools, each well-audited.
Educational errors work best in domains with clear ground truth — stock prices, music facts, promo rules. In open-ended creative work, "mistakes" are subjective; the error model is less useful there.
Learning is domain-specific, not general — agents learn from their own experience in their own domain. They do not transfer learning across agents or to a shared model. This is a privacy win but a generalization loss.

🛠️ Getting Started

Clone the repo:

   git clone https://github.com/flowork-os/Flowork_Agent.git
   cd Flowork_Agent

Read the handbook (plain Markdown):

   cat doc/handbook/getting-started.md

Run:

   ./start.sh

(or go run ./cmd/flowork if you want to build from source)

Visit the dashboard:

   http://127.0.0.1:1987

Add a channel (Telegram, Discord, etc.) by pasting tokens into the Web UI.
Create your first agent via the dashboard or CLI, grant it some tools, and talk to it.

🤔 Flowork vs. OpenClaw / Hermes (Same Yard, Different Bets)

Aspect	Flowork	OpenClaw	Hermes
Self-hosted?	✅ Yes, full	✅ Yes (Rust)	✅ Yes (Python)
Memory	Private per-agent, FTS5, recall at inference	Shared context window	Prompt history only
Security	WASM sandbox + capability grants + kernel guard	Native process (no sandbox)	Python (no sandbox)
MCP Support	Both client & server	Client only	Limited
Single Binary	✅ Pure Go, no deps	Rust binary (smaller)	Python (requires Python)
Offline	✅ Full	✅ Mostly	Needs internet for most LLMs
Plug-and-Play Tools	✅ 117 built-in, hot-load new ones	Manual tool registration	Manual setup
Multi-Agent	✅ Seamless	Supported	Supported
Error Learning	✅ Redemptive, FTS-backed	Mistakes in history	Not explicitly supported

The bet Flowork makes: Simplicity, memory, and security matter more than ecosystem size. One frozen kernel, infinite modules. Agents learn from their own mistakes. Run offline. Own your data.

📝 The Handbook

Start here: doc/handbook/

Getting Started — Install, run, first agent.
Concepts — Microkernel, loket, grants, sandbox.
Agents — Create, configure, tune.
Tools — Built-in tools, write custom ones.
Channels — Telegram, Discord, Slack, MCP, etc.
Memory & Learning — How agents remember & improve.
Security — Sandbox, grants, scanning, guard.
Multi-Agent Orchestration — Agents calling agents.

📖 Further Reading

Educational Errors Blueprint — The philosophy & implementation of mistake-as-lesson.
MCP Spec — If you want to write custom MCP servers.
wazero (WASM runtime) — Under the hood of agent sandboxing.
SQLite FTS5 — How agent memories are indexed & retrieved.

🤝 Contributing & Community

Flowork is MIT-licensed. Contributions welcome:

Bug reports — file an issue.
New tools — write a tool module, open a PR.
Integrations — new channels, new MCP servers.
Docs — the handbook is Markdown; help clarify.

The philosophy is radical simplicity — one frozen kernel, infinite well-audited modules. Keep that in mind when contributing.

🔗 Links

GitHub: https://github.

🔗 Flowork is open source — both products

🤖 Flowork Agent (the self-hosted agent OS): https://github.com/flowork-os/Flowork_Agent
🛣️ Flow Router (the sovereign LLM gateway): https://github.com/flowork-os/flowork_Router

💬 Join the Flowork community on Telegram: https://t.me/+55oqrk75lc43YWE1
