DEV Community: Alexandros Stergiakis

slang-workflows: provable multi-agent workflows for Claude Code

Alexandros Stergiakis — Tue, 07 Jul 2026 07:43:36 +0000

Claude Code has moved past improvised subagents: dynamic workflows now let Claude write a JavaScript orchestration script so coordination runs as code, not turn-by-turn. slang-workflows takes that idea one step further and makes it provable. You declare the collaboration — the agents, message routing (stake -> @Agent, -> @all), control flow, convergence, budgets — in a typed .slang file, and a non-LLM executor runs it as a deterministic state machine that enforces the structure rather than just running it.

What the executor enforces

Output contracts — a structural schema (output: { … }) plus a semantic where clause (e.g. where score >= 0 && score <= 100); results are validated and the agent is re-prompted on violation, so it can't fake "done".
Static analysis before you run — validate_workflow catches deadlocks, unknown agent references, and orphaned outputs; a buggy workflow fails loudly up front, not mid-run.
Tool scoping — write_paths / deny restrict which files and tools each agent may touch (an architect that writes the design but not the code), enforced by the runtime, not by a prompt it can ignore.
Provable termination — budgets + per-stake timeouts mean a run always finishes; no livelocks, no hangs.
Diagrams for free — get_topology renders the agent topology as a Mermaid flowchart and get_trace renders the round-by-round execution as a Mermaid sequence diagram (an orchestration script gives you a progress tree, not diagrams).

The model thinks; the executor coordinates

Each slang agent is one Claude Agent SDK session, resumed across turns. The top-level Claude Code session only triggers (run_workflow) and observes (get_workflow_state, get_topology, get_trace) — it never makes a coordination decision. You can even have an LLM generate a .slang on the fly (get_slang_grammar → validate_workflow → run_workflow) and still get deterministic, contract-checked execution — flexibility with guarantees. Runs are synchronous or backgrounded and polled live. The whole executor is a pure state machine, covered by a mock-based test suite, so it's verifiable without spending a token.

Install

Inside a Claude Code session:

/plugin marketplace add shofer-dev/claude-code-slang-orchestrator
/plugin install slang-workflows@shofer-slang-orchestrator

Then run a .slang workflow with run_workflow, and watch it render live via get_topology and get_trace.

Standalone Claude Code plugin — TypeScript, Apache-2.0. Source: github.com/shofer-dev/claude-code-slang-orchestrator

[Boost]

Alexandros Stergiakis — Mon, 06 Jul 2026 20:28:12 +0000

Alexandros Stergiakis

Jul 5

A cheap, persistent memory that learns your repo so your agent stops re-reading it

#ai #mcp #claude #programming

3 min read

A cheap, persistent memory that learns your repo so your agent stops re-reading it

Alexandros Stergiakis — Sun, 05 Jul 2026 15:54:43 +0000

Live Memory: stop re-reading your repo on every Claude Code session

Every Claude Code session re-discovers your codebase from scratch — re-reading files, re-grepping, re-paying for the same context. live-memory runs a separate, cheaper large-context model in a long-running MCP server whose only job is to accumulate knowledge of your repository over time. Your main agent asks it questions through one read-only tool, ask_live_memory, instead of reloading the repo itself.

It learns passively, for free

PostToolUse and FileChanged hooks tee the content of every file your agent reads or edits into the memory, so it learns the code as a side effect of real work — without paying to re-read it — and stays current as the repo changes: an observed edit is authoritative and applied immediately, while an out-of-band change (external editor, git checkout) is flagged stale and re-read on next use. ask_live_memory is the active fallback for anything the passive layer hasn't seen yet.

One singleton, per-workspace, append-only

The server is a singleton that serves every Claude Code session — many agents, concurrently and over time — keying state per workspace (cwd). Its window is append-only between compactions; compaction is a neutral, query-agnostic summarization into a knowledge ledger (a high/low-watermark keeps it rare and batched) — never front-truncation — so older knowledge is distilled, not dropped. A background keep-warm loop holds the KV/prompt cache hot to cut latency and cost.

Provider-pluggable and zero-config

It's provider-pluggable (Anthropic Messages with cache_control, or any OpenAI-compatible endpoint — DeepSeek, gateways) and zero-config: with no API key but a Claude subscription, it uses the subscription OAuth token (auto-refreshed) on Haiku. A two-tier timeout gives the model a budget and returns a best-effort answer before the hard MCP timeout. Everything it can touch is read-only and path-jailed. Human-facing status is a /live-memory-stats slash command, kept off the agent's tool surface.

Does it pay off?

In a claude -p A/B on a real repo, on understanding-heavy work the building (premium) model offloaded ~93% of its codebase-reading tokens to live-memory, cost ~61% less per task and finished ~22% faster — and its cost got more predictable (the without-memory arm occasionally spiraled into long re-reading loops). Honest scope: pure edit/execution work is roughly break-even, and on realistic hybrid (understand-then-edit) tasks the all-in saving is a smaller ~11% per task — this makes understanding cheaper, not typing.

The companion runs on a cheap model — and it can be very cheap

The premium-model saving above is what you keep regardless; the only cost added back is running the small memory model, and that model is pluggable. Default is Haiku, but on our accuracy set (15 Q × 3 reps) deepseek-v4-flash matched — and slightly edged — Haiku (98% vs 91% correct, fewer hallucinations, both perfect on the negative traps) at ~8× lower token price — or point it at a local model for ≈ free. The cheaper the companion, the closer the all-in cost gets to the full premium saving: −25% all-in on Haiku → −57% on deepseek-v4-flash, versus the −61% building-model number.

Install

Live Memory is an HTTP MCP server you run once, plus a plugin that wires it in — start the server first:

# zero-config on a Claude subscription → Haiku (or: cd ../server && pip install -e . && python -m live_memory)
git clone https://github.com/shofer-dev/claude-code-live-memory
cd claude-code-live-memory/deploy && ./install-service.sh

Then, inside a Claude Code session:

/plugin marketplace add shofer-dev/claude-code-live-memory
/plugin install live-memory@shofer-live-memory

Ask your agent a whole-repo question — it'll call ask_live_memory instead of reading files.

Standalone Claude Code plugin — Python, Apache-2.0. Source: github.com/shofer-dev/claude-code-live-memory