<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alex Chernysh</title>
    <description>The latest articles on DEV Community by Alex Chernysh (@alex_chernysh).</description>
    <link>https://dev.to/alex_chernysh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3867467%2F289c1248-2b35-42b9-a79d-2ee8ce4b0a93.jpeg</url>
      <title>DEV Community: Alex Chernysh</title>
      <link>https://dev.to/alex_chernysh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/alex_chernysh"/>
    <language>en</language>
    <item>
      <title>Orchestration primitive or desktop ADE? Choosing your multi-agent coding layer in 2026</title>
      <dc:creator>Alex Chernysh</dc:creator>
      <pubDate>Tue, 21 Apr 2026 14:11:04 +0000</pubDate>
      <link>https://dev.to/alex_chernysh/orchestration-primitive-or-desktop-ade-choosing-your-multi-agent-coding-layer-in-2026-3nnd</link>
      <guid>https://dev.to/alex_chernysh/orchestration-primitive-or-desktop-ade-choosing-your-multi-agent-coding-layer-in-2026-3nnd</guid>
      <description>&lt;p&gt;The multi-agent coding tool category went from a handful of projects in late 2024 to thirty-plus by mid-2026. Along the way it split into two shapes that solve adjacent-but-different problems. Here's when to reach for each, and why you might end up using both.&lt;/p&gt;

&lt;h2&gt;
  
  
  The two shapes
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Desktop ADEs.&lt;/strong&gt; A downloadable desktop application. You install it like any other app, open a window, configure credentials, and see your repo, your agents, and your diffs in a unified UI. Examples in the open-source corner: &lt;a href="https://github.com/generalaction/emdash" rel="noopener noreferrer"&gt;emdash&lt;/a&gt; (Electron app, 23 CLI providers supported, YC W26-funded), &lt;a href="https://conductor.build" rel="noopener noreferrer"&gt;Conductor&lt;/a&gt;, Cline's desktop mode. Closed-source tools in the same category: Claude Code's VS Code extension, Cursor's "run in background" mode.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Orchestration primitives.&lt;/strong&gt; A library or CLI you import into your own workflow. You don't see a window; you see a process you can pipe into other things. Examples: &lt;a href="https://bernstein.run" rel="noopener noreferrer"&gt;Bernstein&lt;/a&gt; (the project this blog belongs to — 18 CLI adapters, Python-importable), &lt;a href="https://github.com/skeet70/workz" rel="noopener noreferrer"&gt;Workz&lt;/a&gt;, certain configurations of Plandex. LangGraph and CrewAI are adjacent but different — they orchestrate LLM calls, not CLI coding agents.&lt;/p&gt;

&lt;p&gt;The distinction is not about which is better. It's about what layer of the problem you're solving.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a desktop ADE does well
&lt;/h2&gt;

&lt;p&gt;A desktop ADE gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A visual workspace. Diffs, PR status, CI checks, agent logs all in one window.&lt;/li&gt;
&lt;li&gt;Zero-config launch. You open the app, it picks up your repo, agents just work.&lt;/li&gt;
&lt;li&gt;Identity handled. Credentials in the OS keychain, not in a &lt;code&gt;.env&lt;/code&gt; file that leaks.&lt;/li&gt;
&lt;li&gt;Simple distribution. Electron installers for macOS, Windows, and Linux. Your non-terminal colleague can use it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This shape is the right answer when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're the kind of developer who keeps an IDE open all day and wants agents integrated into that workflow, not hidden in a &lt;code&gt;tmux&lt;/code&gt; pane.&lt;/li&gt;
&lt;li&gt;You're onboarding teammates who don't live in the terminal.&lt;/li&gt;
&lt;li&gt;You want one tool that covers edit, review, merge, and CI visibility end-to-end.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What it trades off:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not programmable from the outside. You can't &lt;code&gt;import emdash&lt;/code&gt; or write a CI job that kicks off a parallel agent run via emdash's API. It's a UI, not a library.&lt;/li&gt;
&lt;li&gt;Ships with opinionated conventions. Agents live in app-managed worktrees; audit logs live in app databases. Extracting them into another system is possible but not first-class.&lt;/li&gt;
&lt;li&gt;Cross-machine coordination is an extra feature (SSH mode, remote runtime) rather than the default shape.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What an orchestration primitive does well
&lt;/h2&gt;

&lt;p&gt;A primitive gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A process you can script. &lt;code&gt;bernstein run --goal "..." | jq .&lt;/code&gt; works. So does invoking it from a GitHub Actions workflow, or importing &lt;code&gt;bernstein.core&lt;/code&gt; in your Python code.&lt;/li&gt;
&lt;li&gt;Deterministic coordination. The scheduler is a regular event loop. Every run is replay-able from the audit trail.&lt;/li&gt;
&lt;li&gt;MCP server mode. Your agent-of-choice can talk to the orchestrator through the same Model Context Protocol Anthropic publishes for Claude Code.&lt;/li&gt;
&lt;li&gt;Composition. A primitive is one step in a larger pipeline: linter → primitive multi-agent pass → janitor → merge queue → deploy.&lt;/li&gt;
&lt;/ul&gt;
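&lt;p&gt;A minimal sketch of that composability, assuming the JSON output implied by the &lt;code&gt;jq&lt;/code&gt; pipe above. The wrapper functions here are hypothetical, not Bernstein's published Python API; the point is that the same CLI invocation works from a shell pipe, a CI job, or a few lines of your own code:&lt;/p&gt;

```python
# Hypothetical wrapper around the CLI shown above (not Bernstein's
# published API): build the invocation once, reuse it everywhere.
import json
import subprocess


def build_run_command(goal: str) -> list:
    # Compose the same argv a shell pipe or a CI job would use.
    return ["bernstein", "run", "--goal", goal]


def run_goal(goal: str) -> dict:
    # Run the orchestrator as a subprocess and parse its JSON output,
    # assuming machine-readable stdout as the jq pipe above implies.
    result = subprocess.run(
        build_run_command(goal), capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


# The exact command a GitHub Actions step would execute:
print(build_run_command("Add type hints to src/utils.py"))
```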

&lt;p&gt;This shape is the right answer when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want to embed multi-agent coding into a system you already run: CI, internal dev-platform, evaluation harness.&lt;/li&gt;
&lt;li&gt;You care about reproducibility. HMAC-chained audit trails give you "did the agent really do exactly that?" answers days later.&lt;/li&gt;
&lt;li&gt;You're already in a scripting-first workflow and don't want a new app to keep open.&lt;/li&gt;
&lt;/ul&gt;
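&lt;p&gt;For intuition, here is a minimal sketch of the general HMAC-chaining technique, not Bernstein's exact on-disk format: each entry's MAC covers the previous MAC plus the payload, so any retroactive edit invalidates every entry after it.&lt;/p&gt;

```python
# Minimal HMAC-chained audit trail sketch (general technique, not
# Bernstein's exact format). Tampering with any entry breaks the chain.
import hashlib
import hmac

KEY = b"audit-signing-key"  # illustrative; a real key comes from a secret store


def append_entry(chain: list, payload: str) -> None:
    # Each MAC covers the previous MAC plus this entry's payload.
    prev_mac = chain[-1]["mac"] if chain else ""
    mac = hmac.new(KEY, (prev_mac + payload).encode(), hashlib.sha256).hexdigest()
    chain.append({"payload": payload, "mac": mac})


def verify_chain(chain: list) -> bool:
    # Re-walk the chain and recompute every MAC from scratch.
    prev_mac = ""
    for entry in chain:
        expected = hmac.new(
            KEY, (prev_mac + entry["payload"]).encode(), hashlib.sha256
        ).hexdigest()
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev_mac = entry["mac"]
    return True


chain = []
append_entry(chain, "agent-3 edited src/utils.py")
append_entry(chain, "agent-3 ran pytest: 42 passed")
print(verify_chain(chain))          # True
chain[0]["payload"] = "tampered"
print(verify_chain(chain))          # False
```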

&lt;p&gt;What it trades off:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No visual diff/merge UI out of the box. You &lt;code&gt;git diff&lt;/code&gt; the worktree, or plug it into your existing tools.&lt;/li&gt;
&lt;li&gt;Setup needs a terminal. &lt;code&gt;pipx install bernstein &amp;amp;&amp;amp; bernstein init&lt;/code&gt;, not a double-click installer.&lt;/li&gt;
&lt;li&gt;It's one layer of a larger stack. You'll likely pair it with a separate review tool, CI system, and notification channel.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Decision shortcuts
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Building a product on top of multi-agent coding?&lt;/strong&gt; Reach for a primitive. Libraries compose; apps don't.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Onboarding a team that wants a single download?&lt;/strong&gt; Reach for a desktop ADE. The ergonomics of an opinionated, installable app are hard to beat for non-power-users.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Running agents as part of CI / evaluation / internal platform?&lt;/strong&gt; Primitive, nearly always.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Running agents on your own laptop during normal dev work?&lt;/strong&gt; Either works; it's a preference question. Try both for a week.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need to prove to compliance or security "here's exactly what happened"?&lt;/strong&gt; HMAC audit trails live in the primitive layer. ADE output logs are usually app-scoped.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  They often co-exist
&lt;/h2&gt;

&lt;p&gt;Nothing prevents running both. A pattern we've seen among Bernstein's early users:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bernstein in CI for the "every PR gets a lint-plus-refactor agent pass" step.&lt;/li&gt;
&lt;li&gt;Desktop ADE for interactive "I'm pairing with Claude Code on this refactor" flow.&lt;/li&gt;
&lt;li&gt;Bernstein's MCP server mode exposed to the ADE so both see the same audit trail.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're already using a desktop ADE and it covers what you need, keep it. If you hit the "but I want to run this from a shell script / from CI / inside another service" wall, that's the signal to look at a primitive, regardless of which specific one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bernstein's specific positioning
&lt;/h2&gt;

&lt;p&gt;Bernstein is the primitive-shape tool. What we optimize for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deterministic coordinator written in plain Python — no LLM in the scheduling loop, so runs are reproducible.&lt;/li&gt;
&lt;li&gt;HMAC-chained audit trail — every agent action is replay-able bit-for-bit days later.&lt;/li&gt;
&lt;li&gt;MCP server mode — expose Bernstein to any MCP-capable client (Claude Code, Cursor, or your own agent).&lt;/li&gt;
&lt;li&gt;18 CLI adapters including Claude Code, Codex, Cursor, Aider, Gemini CLI, OpenAI Agents SDK, Amp, Cody, Ollama, and more.&lt;/li&gt;
&lt;li&gt;Apache 2.0, BYOK, &lt;code&gt;pipx install bernstein&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What we don't build: a desktop UI. If you need one, emdash and Conductor both do that well and are worth trying.&lt;/p&gt;

&lt;p&gt;The category is large enough to have multiple right answers. The question is which layer of your stack you're optimizing for. A primitive and an ADE are not competing with each other. They're competing with the "write a bunch of glue code to make two agents work on the same repo without destroying it" option — which nearly everyone used until twelve months ago, and which neither shape is going back to.&lt;/p&gt;

</description>
      <category>multiagentcoding</category>
      <category>agentorchestration</category>
      <category>aicodingagents</category>
      <category>developertools</category>
    </item>
    <item>
      <title>From 4,000 Lines to 200: Decomposing Bernstein's Core</title>
      <dc:creator>Alex Chernysh</dc:creator>
      <pubDate>Tue, 21 Apr 2026 14:10:28 +0000</pubDate>
      <link>https://dev.to/alex_chernysh/from-4000-lines-to-200-decomposing-bernsteins-core-2n8h</link>
      <guid>https://dev.to/alex_chernysh/from-4000-lines-to-200-decomposing-bernsteins-core-2n8h</guid>
<description>&lt;p&gt;Bernstein's &lt;code&gt;orchestrator.py&lt;/code&gt; hit 4,198 lines. We used 11 parallel agents, orchestrated by Bernstein itself, to decompose it into 15 sub-packages in the first pass, each under 400 lines. Subsequent refactors extended this to 22 sub-packages. Here's how that worked and what we learned.&lt;/p&gt;

&lt;h2&gt;
  
  
  How a file gets to 4,000 lines
&lt;/h2&gt;

&lt;p&gt;It happens gradually. The orchestrator started as a clean 300-line module that managed a tick loop: check for tasks, spawn agents, collect results. Then it grew. Cost tracking logic. Quality gates. Token monitoring. Git worktree management. Heartbeat detection. Idle agent recycling. Shutdown coordination.&lt;/p&gt;

&lt;p&gt;Each addition was small and reasonable. But after two months of active development, &lt;code&gt;orchestrator.py&lt;/code&gt; was a 4,198-line monolith that imported 47 modules and had 23 public methods. The test file was 2,800 lines. IDE navigation was painful. Merge conflicts were constant because every feature touched the same file.&lt;/p&gt;

&lt;p&gt;The rule we now follow: if a module crosses 600 lines, it's time to decompose.&lt;/p&gt;

&lt;h2&gt;
  
  
  The plan
&lt;/h2&gt;

&lt;p&gt;We defined 15 target sub-packages, each responsible for one concern:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Sub-package&lt;/th&gt;
&lt;th&gt;Responsibility&lt;/th&gt;
&lt;th&gt;Lines (after)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;orchestration/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Lifecycle, tick pipeline&lt;/td&gt;
&lt;td&gt;~350&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;agents/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Spawner, discovery, heartbeat&lt;/td&gt;
&lt;td&gt;~380&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tasks/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Task store, retry, scheduling&lt;/td&gt;
&lt;td&gt;~340&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;quality/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Quality gates, CI monitor&lt;/td&gt;
&lt;td&gt;~290&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cost/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cost tracking, budgets&lt;/td&gt;
&lt;td&gt;~310&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tokens/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Token monitoring, intervention&lt;/td&gt;
&lt;td&gt;~250&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;security/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Audit logs, policy engine&lt;/td&gt;
&lt;td&gt;~270&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Worktree management, merge queue&lt;/td&gt;
&lt;td&gt;~280&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;persistence/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;WAL, checkpointing&lt;/td&gt;
&lt;td&gt;~220&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;planning/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Plan loading, dependencies&lt;/td&gt;
&lt;td&gt;~200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;routing/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Model selection, bandit&lt;/td&gt;
&lt;td&gt;~320&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;communication/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Bulletin board, messaging&lt;/td&gt;
&lt;td&gt;~180&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;server/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Task server, API&lt;/td&gt;
&lt;td&gt;~260&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;config/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Configuration, defaults&lt;/td&gt;
&lt;td&gt;~190&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;observability/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Metrics, tracing&lt;/td&gt;
&lt;td&gt;~240&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The decomposition needed to be backward-compatible. Existing code importing &lt;code&gt;from bernstein.core.orchestrator import Orchestrator&lt;/code&gt; had to keep working.&lt;/p&gt;

&lt;h2&gt;
  
  
  11 agents, 15 packages
&lt;/h2&gt;

&lt;p&gt;Here's the recursive part: we used Bernstein to execute the decomposition. A YAML plan defined 15 extraction stages with dependency edges (e.g., &lt;code&gt;tasks/&lt;/code&gt; had to be extracted before &lt;code&gt;agents/&lt;/code&gt; because the spawner depends on the task store).&lt;/p&gt;
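&lt;p&gt;For flavor, an illustrative plan fragment. The field names here are hypothetical, not Bernstein's exact schema; only the stage-with-dependency-edges shape comes from the text:&lt;/p&gt;

```yaml
# Illustrative plan shape (hypothetical field names, not the real schema):
# stages with dependency edges, so tasks/ lands before the packages that need it.
stages:
  - id: extract-tasks
    goal: "Extract task store, retry, and scheduling into tasks/"
  - id: extract-agents
    goal: "Extract spawner, discovery, and heartbeat into agents/"
    depends_on: [extract-tasks]   # spawner depends on the task store
  - id: extract-git
    goal: "Extract worktree management and merge queue into git/"
    depends_on: [extract-tasks]   # merge queue references task callbacks
```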

&lt;p&gt;11 agents ran in parallel across independent sub-packages. Each agent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Extracted the relevant functions and classes from &lt;code&gt;orchestrator.py&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Created the new sub-package with proper &lt;code&gt;__init__.py&lt;/code&gt; exports&lt;/li&gt;
&lt;li&gt;Updated all internal imports&lt;/li&gt;
&lt;li&gt;Ran the sub-package's tests to verify nothing broke&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The whole decomposition took about 3 hours of wall time. A human doing this manually — carefully moving code, fixing imports, running tests after each change — would spend 2-3 days.&lt;/p&gt;

&lt;h2&gt;
  
  
  The re-export shim pattern
&lt;/h2&gt;

&lt;p&gt;Backward compatibility was the hardest constraint. We solved it with re-export shims. The original &lt;code&gt;orchestrator.py&lt;/code&gt; became a thin file that imports from sub-packages and re-exports:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# src/bernstein/core/orchestrator.py (after — ~200 lines, down from 4,198)
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Orchestrator shim — re-exports from sub-packages for backward compat.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bernstein.core.orchestration.lifecycle&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Orchestrator&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bernstein.core.orchestration.tick&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TickPipeline&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bernstein.core.orchestration.manager&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OrchestratorManager&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bernstein.core.orchestration.shutdown&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ShutdownCoordinator&lt;/span&gt;

&lt;span class="n"&gt;__all__&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Orchestrator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TickPipeline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OrchestratorManager&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ShutdownCoordinator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every existing import path works unchanged. New code imports from the specific sub-package. Over time, the shims can be deprecated.&lt;/p&gt;
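&lt;p&gt;One hedged sketch of what that deprecation step can look like in plain Python, using module-level &lt;code&gt;__getattr__&lt;/code&gt; (PEP 562). The "new home" is stubbed with a local class so the example is self-contained; in a real shim it would be the import from the sub-package:&lt;/p&gt;

```python
# Sketch: turning a re-export shim into a deprecation shim via
# module-level __getattr__ (PEP 562). Local stand-ins keep it runnable.
import sys
import types
import warnings


class Orchestrator:  # stand-in for the class that moved to a sub-package
    pass


_MOVED = {"Orchestrator": Orchestrator}


def _shim_getattr(name):
    # Resolve the old name, but warn callers to migrate their imports.
    if name in _MOVED:
        warnings.warn(
            f"import {name} from its sub-package instead of the shim",
            DeprecationWarning,
            stacklevel=2,
        )
        return _MOVED[name]
    raise AttributeError(name)


# Build a throwaway module to demonstrate the behavior.
shim = types.ModuleType("orchestrator_shim")
shim.__getattr__ = _shim_getattr
sys.modules["orchestrator_shim"] = shim

import orchestrator_shim

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cls = orchestrator_shim.Orchestrator

print(cls is Orchestrator)   # True
print(len(caught) == 1)      # True: one DeprecationWarning emitted
```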

&lt;h2&gt;
  
  
  What we learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Dependency graphs matter more than you think.&lt;/strong&gt; The extraction order was critical. Extracting &lt;code&gt;git/&lt;/code&gt; before &lt;code&gt;tasks/&lt;/code&gt; would have created circular imports because the merge queue references task completion callbacks. We had to map the dependency graph before writing the plan.&lt;/p&gt;
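&lt;p&gt;The ordering itself is a plain topological sort. A sketch using the two edges called out above (the real graph covered all 15 sub-packages):&lt;/p&gt;

```python
# Topological sort of extraction order from "must be extracted first" edges.
# Only the two edges named in the text are modeled here.
from graphlib import TopologicalSorter

# Mapping: package -> set of packages that must be extracted before it.
extract_after = {
    "agents": {"tasks"},   # spawner depends on the task store
    "git": {"tasks"},      # merge queue references task completion callbacks
    "tasks": set(),
}

order = list(TopologicalSorter(extract_after).static_order())
print(order.index("tasks") < order.index("agents"))  # True
print(order.index("tasks") < order.index("git"))     # True
```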

&lt;p&gt;&lt;strong&gt;Tests are the safety net.&lt;/strong&gt; Each extraction step ran the full test suite. We caught 14 import errors, 3 circular dependencies, and 1 subtle bug where a function relied on module-level state that moved to a different file. Without tests, at least half of those would have shipped broken.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;600 lines is a good limit.&lt;/strong&gt; After the decomposition, the largest sub-package is &lt;code&gt;agents/&lt;/code&gt; at ~380 lines. Every module is small enough to read in one sitting, grep effectively, and test in isolation. When a new file starts approaching 600 lines, we split it proactively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Orchestrators can orchestrate themselves.&lt;/strong&gt; There's something satisfying about using your own tool to refactor itself. The decomposition was one of our most complex multi-agent runs, and it validated that the parallel execution model works for real refactoring tasks, not just greenfield code generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The result
&lt;/h2&gt;

&lt;p&gt;Before: 1 file, 4,198 lines, 47 imports, constant merge conflicts.&lt;br&gt;
After: 15 sub-packages in the first pass (extended to 22 in later refactors), ~280 lines average, clean dependency boundaries, agents can work on different packages without conflicts.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/chernistry/bernstein/tree/main/src/bernstein/core" rel="noopener noreferrer"&gt;full source&lt;/a&gt; is on GitHub. The re-export shims are in the top-level files like &lt;code&gt;orchestrator.py&lt;/code&gt;, &lt;code&gt;spawner.py&lt;/code&gt;, and &lt;code&gt;task_lifecycle.py&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/cost-aware-routing"&gt;How Bernstein routes tasks to the right model&lt;/a&gt; — the routing sub-package in action&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/cloudflare-cloud-execution"&gt;Running agents on Cloudflare&lt;/a&gt; — cloud execution built on the decomposed architecture&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/getting-started"&gt;Getting started&lt;/a&gt; — try a multi-agent session yourself&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>pythonrefactoring</category>
      <category>codedecomposition</category>
      <category>multiagentorchestration</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Getting Started: Your First Multi-Agent Run in 5 Minutes</title>
      <dc:creator>Alex Chernysh</dc:creator>
      <pubDate>Tue, 21 Apr 2026 14:09:52 +0000</pubDate>
      <link>https://dev.to/alex_chernysh/getting-started-your-first-multi-agent-run-in-5-minutes-57fj</link>
      <guid>https://dev.to/alex_chernysh/getting-started-your-first-multi-agent-run-in-5-minutes-57fj</guid>
      <description>&lt;p&gt;This guide gets you from zero to a working multi-agent session in under 5 minutes. You'll install Bernstein, configure Claude Code as your agent, run a goal, and understand the output.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Install Bernstein
&lt;/h2&gt;

&lt;p&gt;Bernstein requires Python 3.12+. Install it with pip or uv:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;bernstein
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or if you use &lt;a href="https://docs.astral.sh/uv/" rel="noopener noreferrer"&gt;uv&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv pip &lt;span class="nb"&gt;install &lt;/span&gt;bernstein
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify the installation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bernstein &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;span class="c"&gt;# bernstein 1.8.8&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: Configure your agent
&lt;/h2&gt;

&lt;p&gt;Bernstein needs at least one CLI coding agent installed. The fastest setup uses Claude Code, but &lt;a href="https://bernstein.readthedocs.io/en/latest/ADAPTER_GUIDE/" rel="noopener noreferrer"&gt;18 agents are supported&lt;/a&gt; including Codex, Gemini CLI, the OpenAI Agents SDK, Aider, and more.&lt;/p&gt;

&lt;p&gt;Make sure Claude Code is installed and your API key is set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Claude Code if you haven't&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @anthropic-ai/claude-code

&lt;span class="c"&gt;# Set your API key&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-ant-...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bernstein auto-detects installed agents. Verify it finds yours:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bernstein agents
&lt;span class="c"&gt;# Available agents:&lt;/span&gt;
&lt;span class="c"&gt;#   claude (Claude Code) ✓&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Run your first goal
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;cd&lt;/code&gt; into any Git repository and run a goal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;your-project
bernstein run &lt;span class="nt"&gt;--goal&lt;/span&gt; &lt;span class="s2"&gt;"Add type hints to all functions in src/utils.py"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bernstein will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Decompose&lt;/strong&gt; the goal into concrete tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Assign&lt;/strong&gt; each task a role, priority, and model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spawn&lt;/strong&gt; agents in isolated git worktrees&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor&lt;/strong&gt; progress via heartbeats and output parsing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Merge&lt;/strong&gt; completed work back to your branch&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 4: Read the TUI
&lt;/h2&gt;

&lt;p&gt;The terminal UI shows live progress:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─ Bernstein v1.8.8 ─────────────────────────────────┐
│ Goal: Add type hints to all functions in src/utils  │
│ Tasks: 3 total │ 1 running │ 1 done │ 1 pending    │
│ Agents: 2 active │ Cost: $0.12                      │
├─────────────────────────────────────────────────────┤
│ ✓ task-001  Analyze existing type usage    00:42    │
│ ► task-002  Add type hints to helpers      01:15    │
│ ○ task-003  Add type hints to validators   pending  │
└─────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;✓&lt;/strong&gt; = completed and merged&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;►&lt;/strong&gt; = currently running&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;○&lt;/strong&gt; = pending (waiting for dependencies or an available agent)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Press &lt;code&gt;q&lt;/code&gt; to stop gracefully (agents finish their current task) or &lt;code&gt;Ctrl+C&lt;/code&gt; to force stop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Check the results
&lt;/h2&gt;

&lt;p&gt;When all tasks complete, check what changed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git log &lt;span class="nt"&gt;--oneline&lt;/span&gt; &lt;span class="nt"&gt;-5&lt;/span&gt;
&lt;span class="c"&gt;# a1b2c3d Add type hints to validator functions&lt;/span&gt;
&lt;span class="c"&gt;# d4e5f6g Add type hints to helper functions&lt;/span&gt;
&lt;span class="c"&gt;# h7i8j9k Analyze existing type usage in src/utils.py&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each agent's work is a separate commit, merged through Bernstein's merge queue. If any task failed, its changes are rolled back and the failure is logged in &lt;code&gt;.sdd/dead_letter.json&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to try next
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Run a YAML plan&lt;/strong&gt; for structured, multi-stage projects:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bernstein run plans/my-project.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Plans let you define stages, dependencies, roles, and complexity per task. See the &lt;a href="https://bernstein.readthedocs.io/en/latest/GETTING_STARTED/" rel="noopener noreferrer"&gt;plan file docs&lt;/a&gt; for the full schema.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use multiple agent types&lt;/strong&gt; by installing additional adapters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Bernstein will route tasks to the best available agent&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;codex-cli  &lt;span class="c"&gt;# or install any supported agent&lt;/span&gt;
bernstein agents       &lt;span class="c"&gt;# see all detected agents&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Monitor costs&lt;/strong&gt; across sessions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bernstein cost
&lt;span class="c"&gt;# Session total: $0.47&lt;/span&gt;
&lt;span class="c"&gt;# By model: haiku=$0.03, sonnet=$0.28, opus=$0.16&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Check the API&lt;/strong&gt; for programmatic access:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Task server runs on port 8052 during sessions&lt;/span&gt;
curl http://127.0.0.1:8052/status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/cost-aware-routing"&gt;How Bernstein routes tasks to the right model&lt;/a&gt;: bandit router cuts spend roughly in half in our own runs&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/cloudflare-cloud-execution"&gt;Running agents on Cloudflare&lt;/a&gt; — scale beyond your laptop&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/chernistry/bernstein" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt; — source code, issues, and discussions&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://pypi.org/project/bernstein/" rel="noopener noreferrer"&gt;PyPI package&lt;/a&gt; — release history and downloads&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aicodingagentssetup</category>
      <category>claudecodetutorial</category>
      <category>gettingstarted</category>
      <category>pythonclitools</category>
    </item>
    <item>
      <title>How Bernstein Routes Tasks to the Right Model</title>
      <dc:creator>Alex Chernysh</dc:creator>
      <pubDate>Tue, 21 Apr 2026 14:09:16 +0000</pubDate>
      <link>https://dev.to/alex_chernysh/how-bernstein-routes-tasks-to-the-right-model-379j</link>
      <guid>https://dev.to/alex_chernysh/how-bernstein-routes-tasks-to-the-right-model-379j</guid>
<description>&lt;p&gt;Not every coding task needs Opus. Bernstein's contextual bandit router learns which model handles each task type best, then routes accordingly. In our own runs, it cut spend roughly in half compared to uniform model selection. Measure your own savings with &lt;code&gt;bernstein cost&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The uniform selection problem
&lt;/h2&gt;

&lt;p&gt;Most multi-agent setups use the same model for everything. Every task — whether it's renaming a variable or designing an authentication system — gets routed to the same model at the same effort level. This is wasteful. A &lt;code&gt;docs&lt;/code&gt; task that writes a docstring doesn't need the same model as a &lt;code&gt;security&lt;/code&gt; task that implements credential scoping.&lt;/p&gt;

&lt;p&gt;The cost difference is real. At current API pricing, routing a simple task to Haiku instead of Opus costs roughly 1/30th as much. Over a session with 40-60 tasks, that adds up fast.&lt;/p&gt;
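&lt;p&gt;Back-of-the-envelope math for that claim, with placeholder prices (check current API pricing; the point is the ratio, not the absolute numbers):&lt;/p&gt;

```python
# Illustrative session-cost arithmetic. Prices are placeholders chosen
# only to demonstrate the ~30x ratio, not quoted API pricing.
opus_cost_per_mtok = 15.00    # assumed $ per 1M tokens
haiku_cost_per_mtok = 0.50    # assumed $ per 1M tokens
tokens_per_task = 50_000      # a modest agent task's context + output
tasks = 50                    # mid-range of 40-60 tasks per session

all_opus = tasks * tokens_per_task / 1e6 * opus_cost_per_mtok
all_haiku = tasks * tokens_per_task / 1e6 * haiku_cost_per_mtok

print(f"all-Opus:  ${all_opus:.2f}")           # all-Opus:  $37.50
print(f"all-Haiku: ${all_haiku:.2f}")          # all-Haiku: $1.25
print(f"ratio: {all_opus / all_haiku:.0f}x")   # ratio: 30x
```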

&lt;h2&gt;
  
  
  How the router works
&lt;/h2&gt;

&lt;p&gt;Bernstein's routing pipeline has three layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: Heuristic classification.&lt;/strong&gt; Every task has a &lt;code&gt;complexity&lt;/code&gt; field (low, medium, high) and a &lt;code&gt;role&lt;/code&gt; (backend, frontend, qa, security, etc.). The router uses a rule-based classifier to make an initial model/effort assignment. Low-complexity tasks default to Haiku or Sonnet with standard effort. High-complexity tasks get Opus with max effort.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2: Epsilon-greedy bandit.&lt;/strong&gt; This is where it gets interesting. The bandit maintains per-role reward estimates for each model. When a task arrives, it exploits the best-known model 80% of the time and explores alternatives 20% of the time. Rewards come from task outcomes: did the agent complete the task? Did tests pass? How many retries were needed?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Simplified selection logic
&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sonnet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;opus&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;complexity&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;CASCADE&lt;/span&gt;
&lt;span class="n"&gt;selected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bandit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;candidate_models&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;CASCADE&lt;/code&gt; list includes all available models from cheapest to most capable. For high-complexity tasks, the bandit only considers Sonnet and Opus — sending a hard architecture task to Haiku would waste the agent's time even if it's cheap.&lt;/p&gt;
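&lt;p&gt;Filling in what &lt;code&gt;bandit.select&lt;/code&gt; could look like inside — a hedged sketch of generic epsilon-greedy selection, not the actual implementation:&lt;/p&gt;

```python
import random

# Generic epsilon-greedy selection (illustrative, not Bernstein's code):
# explore a random candidate with probability epsilon, otherwise exploit
# the best-known model for this role.
def select(role, candidate_models, estimates, epsilon=0.2, rng=random):
    if rng.random() < epsilon:
        return rng.choice(candidate_models)
    return max(candidate_models,
               key=lambda m: estimates.get((role, m), 0.0))

estimates = {("backend", "sonnet"): 0.95, ("backend", "haiku"): 0.70}
print(select("backend", ["haiku", "sonnet"], estimates, epsilon=0.0))
# With epsilon=0 this always exploits the best-known arm: 'sonnet'
```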

&lt;p&gt;&lt;strong&gt;Layer 3: Effectiveness seeding.&lt;/strong&gt; The bandit warms up using historical effectiveness data from the &lt;code&gt;.sdd/metrics/&lt;/code&gt; directory. If a previous run showed that &lt;code&gt;backend&lt;/code&gt; tasks succeed 95% of the time with Sonnet but only 70% with Haiku, the bandit starts with that prior. No cold-start problem after the first session.&lt;/p&gt;
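&lt;p&gt;One way the seeding could be implemented. The &lt;code&gt;.sdd/metrics/&lt;/code&gt; directory name is from the post; the per-file JSON shape is my assumption:&lt;/p&gt;

```python
import json
from pathlib import Path

# Hypothetical seeding: turn historical outcomes into initial
# per-(role, model) reward estimates so the bandit skips the cold start.
def seed_priors(metrics_dir: str) -> dict:
    priors = {}
    for f in Path(metrics_dir).glob("*.json"):
        rec = json.loads(f.read_text())
        # assumed shape: {"role": ..., "model": ..., "successes": n, "attempts": m}
        priors[(rec["role"], rec["model"])] = rec["successes"] / max(rec["attempts"], 1)
    return priors
```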

&lt;h2&gt;
  
  
  What the router learns
&lt;/h2&gt;

&lt;p&gt;After a few sessions, clear patterns emerge:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task type&lt;/th&gt;
&lt;th&gt;Typical model&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Docs, docstrings&lt;/td&gt;
&lt;td&gt;Haiku&lt;/td&gt;
&lt;td&gt;Templated output, low reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test writing&lt;/td&gt;
&lt;td&gt;Sonnet&lt;/td&gt;
&lt;td&gt;Needs code understanding, not creativity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bug fixes&lt;/td&gt;
&lt;td&gt;Sonnet&lt;/td&gt;
&lt;td&gt;Pattern matching on error traces&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Refactoring&lt;/td&gt;
&lt;td&gt;Sonnet/Opus&lt;/td&gt;
&lt;td&gt;Depends on scope&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Architecture, security&lt;/td&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;td&gt;Requires deep reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These aren't hardcoded rules — they're learned from outcomes. If your codebase has unusually complex tests, the bandit will learn to route test tasks to a stronger model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuration
&lt;/h2&gt;

&lt;p&gt;The bandit is enabled by default when a metrics directory exists. You can tune exploration rate and model cascade in your config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .sdd/config.yaml&lt;/span&gt;
&lt;span class="na"&gt;routing&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;bandit_epsilon&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.2&lt;/span&gt;          &lt;span class="c1"&gt;# 20% exploration&lt;/span&gt;
  &lt;span class="na"&gt;cascade&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;haiku&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;sonnet&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;opus&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;min_samples_per_arm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;       &lt;span class="c1"&gt;# explore each option at least 5 times&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To disable bandit routing and use pure heuristics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;routing&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;bandit_enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;p&gt;Across our internal runs (self-development sessions where Bernstein improves its own codebase), the bandit router cut per-session spend roughly in half compared to the baseline of Sonnet-for-everything. Task completion rates stayed within a couple of percentage points, so cheaper models handle their assigned tasks fine. Measure your own runs with &lt;code&gt;bernstein cost&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The savings compound. A 10-agent session running 50 tasks might cost $15-20 with uniform Sonnet. With bandit routing, the same session runs $7-10. Over weeks of iterative development, that's the difference between a side project budget and a real expense.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://bernstein.readthedocs.io/en/latest/ARCHITECTURE/" rel="noopener noreferrer"&gt;Architecture overview&lt;/a&gt; for how routing fits into the orchestration pipeline&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/getting-started"&gt;Getting started&lt;/a&gt; to try it yourself&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/chernistry/bernstein/tree/main/src/bernstein/core/routing" rel="noopener noreferrer"&gt;Source code&lt;/a&gt; for the full router implementation&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aicostoptimization</category>
      <category>modelrouting</category>
      <category>contextualbandit</category>
      <category>multiagentorchestration</category>
    </item>
    <item>
      <title>Community Spotlight: April 2026</title>
      <dc:creator>Alex Chernysh</dc:creator>
      <pubDate>Tue, 21 Apr 2026 14:08:40 +0000</pubDate>
      <link>https://dev.to/alex_chernysh/community-spotlight-april-2026-50o5</link>
      <guid>https://dev.to/alex_chernysh/community-spotlight-april-2026-50o5</guid>
      <description>&lt;p&gt;Every month we spotlight the people who make Bernstein better. Here are April's highlights from the first month of public development.&lt;/p&gt;

&lt;h2&gt;
  
  
  What happened in April
&lt;/h2&gt;

&lt;p&gt;Bernstein went from v1.0.0 to v1.8.8 in a few weeks. The pace was intense, and community contributions made a real difference:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Architecture decomposition&lt;/strong&gt;: 52 oversized modules broken into 22 focused sub-packages, each under 600 lines. The orchestrator monolith (4,198 lines) is now navigable, testable, and merge-conflict-free. &lt;a href="https://dev.to/blog/module-decomposition"&gt;Read the full story&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;18 agent adapters&lt;/strong&gt;: We started with 7 adapters and now support 18: Claude Code, Codex, Gemini CLI, OpenAI Agents SDK, Cursor, Aider, Amp, Kiro, Kilo, Qwen, Goose, Ollama, Cody, Continue, OpenCode, Cloudflare Agents, IaC, and a generic wrapper. Each adapter is a focused Python class under 200 lines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost-aware routing&lt;/strong&gt;: The &lt;a href="https://dev.to/blog/cost-aware-routing"&gt;contextual bandit router&lt;/a&gt; learns which model handles each task type best. In our own runs, the bandit cut spend roughly in half compared to sending everything to the same model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare cloud execution&lt;/strong&gt;: Agents can now &lt;a href="https://dev.to/blog/cloudflare-cloud-execution"&gt;run on Cloudflare Workers&lt;/a&gt; with Durable Workflows, R2 artifact storage, and D1 state.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Windows support&lt;/strong&gt;: Full cross-platform compatibility contributed by &lt;a href="https://github.com/oldschoola" rel="noopener noreferrer"&gt;@oldschoola&lt;/a&gt;: environment passthrough, Unicode safety, process management, and terminal handling.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Contributors
&lt;/h2&gt;

&lt;p&gt;Thanks to everyone who contributed PRs, reported bugs, and tested edge cases this month:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/oldschoola" rel="noopener noreferrer"&gt;@oldschoola&lt;/a&gt;: Windows compatibility (3 merged PRs), codex config, task filtering, auto-PR&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Ai-chan-0411" rel="noopener noreferrer"&gt;@Ai-chan-0411&lt;/a&gt;: community spotlight template&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/alexanderxfgl-bit" rel="noopener noreferrer"&gt;@alexanderxfgl-bit&lt;/a&gt;: spotlight generator script&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/forfreedomforrich-eng" rel="noopener noreferrer"&gt;@forfreedomforrich-eng&lt;/a&gt;: &lt;code&gt;--dry-run&lt;/code&gt; flag, trigger URL fix&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/TheCodingDragon0" rel="noopener noreferrer"&gt;@TheCodingDragon0&lt;/a&gt;: &lt;code&gt;bernstein config diff&lt;/code&gt;, glossary&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/internet-dot" rel="noopener noreferrer"&gt;@internet-dot&lt;/a&gt;: HOL workflow&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Beledarian" rel="noopener noreferrer"&gt;@Beledarian&lt;/a&gt;: config path validation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All contributors are listed in &lt;a href="https://github.com/chernistry/bernstein/blob/main/CONTRIBUTORS.md" rel="noopener noreferrer"&gt;CONTRIBUTORS.md&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to get involved
&lt;/h2&gt;

&lt;p&gt;Bernstein is Apache 2.0 and welcomes contributions of all sizes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/chernistry/bernstein/labels/good%20first%20issue" rel="noopener noreferrer"&gt;Good first issues&lt;/a&gt;: curated tasks for newcomers&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/chernistry/bernstein/issues/786" rel="noopener noreferrer"&gt;Write a blog post&lt;/a&gt;: get published on bernstein.run&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/chernistry/bernstein/issues/775" rel="noopener noreferrer"&gt;Adopt an adapter&lt;/a&gt;: become the maintainer for your favorite agent&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/chernistry/bernstein/issues/787" rel="noopener noreferrer"&gt;Submit benchmarks&lt;/a&gt;: share your orchestration metrics&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>community</category>
      <category>opensource</category>
      <category>contributors</category>
      <category>multiagentorchestration</category>
    </item>
    <item>
      <title>Running AI Agents on Cloudflare: Workers, Workflows, and Durable Objects</title>
      <dc:creator>Alex Chernysh</dc:creator>
      <pubDate>Tue, 21 Apr 2026 14:08:39 +0000</pubDate>
      <link>https://dev.to/alex_chernysh/running-ai-agents-on-cloudflare-workers-workflows-and-durable-objects-fl</link>
      <guid>https://dev.to/alex_chernysh/running-ai-agents-on-cloudflare-workers-workflows-and-durable-objects-fl</guid>
      <description>&lt;p&gt;Bernstein v1.8.4 ships with Cloudflare cloud execution. Agents can now run on Workers, multi-step tasks use Durable Workflows, artifacts go to R2, and state persists in D1. Here's the architecture and how to deploy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why local-only limits adoption
&lt;/h2&gt;

&lt;p&gt;Running agents locally works for individual developers, but it has real constraints. Your laptop is the bottleneck: CPU, memory, and network all compete with your actual work. Long-running sessions drain battery. If you close your laptop, the session dies. And scaling beyond 4-5 concurrent agents on a MacBook starts hitting resource limits.&lt;/p&gt;

&lt;p&gt;Cloud execution solves this. Agents run on remote infrastructure while you monitor progress from a dashboard or TUI. Sessions survive disconnects. You can scale to 20+ concurrent agents without melting your machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cloudflare stack
&lt;/h2&gt;

&lt;p&gt;Cloudflare recently became &lt;a href="https://openai.com/index/cloudflare-openai-agent-cloud/" rel="noopener noreferrer"&gt;OpenAI's infrastructure partner for agent cloud computing&lt;/a&gt; — the same infrastructure Bernstein agents can now run on. We chose Cloudflare's stack because it maps cleanly to orchestration primitives:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Workers&lt;/strong&gt; handle lightweight, stateless agent execution. Each agent task runs in an isolated Worker with its own environment. Workers cold-start in under 50ms, so spinning up a new agent is nearly instant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Durable Workflows&lt;/strong&gt; orchestrate multi-step tasks. When an agent needs to clone a repo, run code, execute tests, and report results, the workflow ensures each step completes before the next begins — with automatic retries on failure. If a Worker crashes mid-task, the workflow resumes from the last completed step, not from scratch.&lt;/p&gt;
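&lt;p&gt;The resume-from-last-step behavior is worth illustrating. Below is a toy Python model of the idea only — Cloudflare Workflows themselves are written in JavaScript/TypeScript against a different API:&lt;/p&gt;

```python
# Toy model of durable-workflow semantics: each completed step's result
# is checkpointed, so a rerun after a crash skips finished steps.
def run_workflow(steps, checkpoints):
    """steps: list of (name, fn); checkpoints: state persisted across crashes."""
    for name, fn in steps:
        if name in checkpoints:      # already completed before the crash
            continue
        checkpoints[name] = fn()     # execute the step, persist its result
    return checkpoints

calls = []
steps = [
    ("clone", lambda: calls.append("clone") or "ok"),
    ("test",  lambda: calls.append("test") or "ok"),
]
ckpt = {"clone": "ok"}               # simulate a crash after 'clone' finished
run_workflow(steps, ckpt)
print(calls)  # ['test'] — the clone step was not re-executed
```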

&lt;p&gt;&lt;strong&gt;R2&lt;/strong&gt; stores artifacts. Agent outputs — diffs, test results, generated files — persist in R2 buckets. The orchestrator reads results from R2 when merging completed work back to the main branch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;D1&lt;/strong&gt; holds orchestration state. Task queues, agent assignments, cost metrics, and audit logs all live in D1. This replaces the local &lt;code&gt;.sdd/&lt;/code&gt; file-based state with a durable database that survives restarts and supports concurrent access from multiple Workers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture overview
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Architecture diagram omitted in this cross-post. See the original post on bernstein.run for the rendered version.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The orchestrator itself runs as a Worker with a Durable Object for maintaining tick state. Agent Workers are spawned per-task and communicate results back through R2 and D1.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploying
&lt;/h2&gt;

&lt;p&gt;Prerequisites: a Cloudflare account with Workers, R2, and D1 enabled, and &lt;code&gt;wrangler&lt;/code&gt; installed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Authenticate with Cloudflare&lt;/span&gt;
wrangler login

&lt;span class="c"&gt;# Deploy the Bernstein cloud stack&lt;/span&gt;
bernstein cloud deploy &lt;span class="nt"&gt;--project&lt;/span&gt; my-project

&lt;span class="c"&gt;# This creates:&lt;/span&gt;
&lt;span class="c"&gt;#   - Orchestrator Worker + Durable Object&lt;/span&gt;
&lt;span class="c"&gt;#   - R2 bucket: bernstein-my-project-artifacts&lt;/span&gt;
&lt;span class="c"&gt;#   - D1 database: bernstein-my-project-state&lt;/span&gt;
&lt;span class="c"&gt;#   - Workflow definitions for multi-step tasks&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once deployed, run tasks against the cloud backend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run a goal on cloud infrastructure&lt;/span&gt;
bernstein run &lt;span class="nt"&gt;--goal&lt;/span&gt; &lt;span class="s2"&gt;"Refactor auth module"&lt;/span&gt; &lt;span class="nt"&gt;--cloud&lt;/span&gt;

&lt;span class="c"&gt;# Monitor from your terminal&lt;/span&gt;
bernstein cloud status

&lt;span class="c"&gt;# Or check the dashboard&lt;/span&gt;
bernstein dashboard &lt;span class="nt"&gt;--cloud&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agent API keys (Anthropic, OpenAI, etc.) are stored as Worker secrets via &lt;code&gt;wrangler secret put&lt;/code&gt;. They never leave the Cloudflare network.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost considerations
&lt;/h2&gt;

&lt;p&gt;Cloudflare Workers pricing is request-based, not instance-based. You pay for the compute your agents actually use, not for idle VMs. For a typical 50-task session:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Workers compute: ~$0.50-2.00&lt;/li&gt;
&lt;li&gt;R2 storage: pennies (artifacts are small)&lt;/li&gt;
&lt;li&gt;D1 reads/writes: pennies (state operations are lightweight)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The cloud infrastructure cost is a small fraction of the LLM API costs that agents incur. The real savings come from not needing to keep your machine running and from being able to scale to more concurrent agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;We're working on scheduled runs (trigger a session from a cron or GitHub webhook), multi-region execution (run agents closer to the repos they're working on), and a hosted dashboard for monitoring cloud sessions without a local CLI.&lt;/p&gt;

&lt;p&gt;Try it: &lt;code&gt;pip install bernstein&lt;/code&gt; and check the &lt;a href="https://dev.to/blog/getting-started"&gt;getting started guide&lt;/a&gt;.&lt;br&gt;
Source: &lt;a href="https://github.com/chernistry/bernstein" rel="noopener noreferrer"&gt;github.com/chernistry/bernstein&lt;/a&gt;&lt;/p&gt;

</description>
      <category>cloudflareworkers</category>
      <category>cloudaiagents</category>
      <category>serverlessorchestration</category>
      <category>multiagentorchestration</category>
    </item>
    <item>
      <title>Stop using LLMs to schedule other LLMs</title>
      <dc:creator>Alex Chernysh</dc:creator>
      <pubDate>Wed, 08 Apr 2026 12:56:54 +0000</pubDate>
      <link>https://dev.to/alex_chernysh/why-i-stopped-using-llms-to-schedule-llms-4176</link>
      <guid>https://dev.to/alex_chernysh/why-i-stopped-using-llms-to-schedule-llms-4176</guid>
      <description>&lt;p&gt;Three AI coding agents on the same repo = three agents overwriting each other's work. Claude Code edits &lt;code&gt;auth.py&lt;/code&gt;. Codex edits &lt;code&gt;auth.py&lt;/code&gt; two seconds later. Claude's changes vanish. Meanwhile Gemini "refactors" the test suite and breaks six things.&lt;/p&gt;

&lt;p&gt;Two weeks of this. Here's what fixed it: git worktrees per agent, a deterministic Python scheduler (not an LLM), and a janitor that verifies work before merge.&lt;/p&gt;

&lt;h2&gt;
  
  
  The wrong turn
&lt;/h2&gt;

&lt;p&gt;My first orchestrator used an LLM to coordinate the other LLMs. A manager agent read the backlog, decided assignments, checked progress, re-planned on failure.&lt;/p&gt;

&lt;p&gt;It was slow, expensive, and kept hallucinating priorities. ~40% of total tokens went to coordination overhead instead of code.&lt;/p&gt;

&lt;p&gt;Then the obvious realization hit: scheduling is a solved problem. Operating systems have scheduled concurrent processes since the 1960s. Nobody uses neural networks for &lt;code&gt;cron&lt;/code&gt;. Why use one for task assignment?&lt;/p&gt;

&lt;p&gt;I ripped out the LLM scheduler. The result is &lt;a href="https://github.com/chernistry/bernstein" rel="noopener noreferrer"&gt;Bernstein&lt;/a&gt;, an open-source orchestrator that coordinates any CLI coding agent with &lt;strong&gt;zero LLM tokens on scheduling&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pipeline
&lt;/h2&gt;

&lt;p&gt;Four stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Decompose&lt;/strong&gt;: one LLM call takes your goal, outputs a task graph with roles, owned files, and dependencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spawn&lt;/strong&gt;: each task gets a fresh CLI agent in an isolated git worktree. Parallel execution. Main branch untouched.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify&lt;/strong&gt;: a janitor checks concrete signals. Tests pass, files exist, linter clean, types correct. Binary outcomes, not opinions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Merge&lt;/strong&gt;: verified work lands on main. Failed tasks retry on a different model or get decomposed further.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Goal → Planner (LLM) → Task Graph → Orchestrator (Python) → Agents ‖
                                         ↓
                                    Janitor → Merge
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The orchestrator is a Python event loop that polls a local task server, matches open tasks to available agents, and manages lifecycle. Deterministic, auditable, reproducible. Same inputs produce the same decisions.&lt;/p&gt;
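&lt;p&gt;The matching step can be sketched in a few lines. Names and the task-dict layout below are illustrative, not Bernstein's internals:&lt;/p&gt;

```python
# Minimal deterministic scheduler tick: no LLM involved, just matching
# open tasks whose dependencies are satisfied to idle agents.
def tick(tasks, agents):
    done = {t["id"] for t in tasks if t["status"] == "done"}
    assignments = []
    for task in sorted(tasks, key=lambda t: t["id"]):  # stable order
        if task["status"] != "open" or not set(task["deps"]) <= done:
            continue
        if not agents:
            break
        assignments.append((task["id"], agents.pop(0)))
        task["status"] = "running"
    return assignments

tasks = [
    {"id": 1, "status": "done", "deps": []},
    {"id": 2, "status": "open", "deps": [1]},
    {"id": 3, "status": "open", "deps": [2]},
]
print(tick(tasks, ["agent-a"]))  # [(2, 'agent-a')] — task 3 waits on 2
```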

&lt;h2&gt;
  
  
  Worktrees: the part that unlocked it
&lt;/h2&gt;

&lt;p&gt;Each agent gets its own &lt;a href="https://git-scm.com/docs/git-worktree" rel="noopener noreferrer"&gt;git worktree&lt;/a&gt; on a disposable branch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git worktree add .sdd/worktrees/session-abc123 &lt;span class="nt"&gt;-b&lt;/span&gt; agent/session-abc123
&lt;span class="c"&gt;# agent works in isolation&lt;/span&gt;
&lt;span class="c"&gt;# janitor verifies, then:&lt;/span&gt;
git checkout main
git merge agent/session-abc123 &lt;span class="nt"&gt;--no-ff&lt;/span&gt;
git worktree remove .sdd/worktrees/session-abc123
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each agent thinks it owns the repo. No file locks, no coordination protocol between agents, no conflicts during work. The task graph declares file ownership, so overlapping files never get assigned concurrently.&lt;/p&gt;
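&lt;p&gt;The ownership guard is simple to sketch (illustrative field names, not the real task-graph schema):&lt;/p&gt;

```python
# Hypothetical guard: never schedule two tasks concurrently when their
# declared file-ownership sets overlap.
def conflicts(task_a, task_b):
    return bool(set(task_a["owns"]) & set(task_b["owns"]))

running = [{"id": 1, "owns": ["src/auth.py", "src/session.py"]}]
candidate = {"id": 2, "owns": ["src/auth.py"]}

blocked = any(conflicts(candidate, t) for t in running)
print(blocked)  # True — auth.py is already owned by a running task
```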

&lt;p&gt;Expensive directories (&lt;code&gt;node_modules&lt;/code&gt;, &lt;code&gt;.venv&lt;/code&gt;) get symlinked from the main tree so you don't pay setup cost per agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Model routing without vibes
&lt;/h2&gt;

&lt;p&gt;Renaming a variable doesn't need Opus. But static rules for model selection go stale fast.&lt;/p&gt;

&lt;p&gt;Bernstein uses a &lt;a href="https://en.wikipedia.org/wiki/Contextual_bandit" rel="noopener noreferrer"&gt;LinUCB contextual bandit&lt;/a&gt; that learns from outcomes. Features: complexity tier, file scope, role, estimated token budget. Reward: &lt;code&gt;quality_score * (1 - normalized_cost)&lt;/code&gt;. Cheapest model that passes the janitor wins.&lt;/p&gt;
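&lt;p&gt;The reward shaping is easy to see in numbers. A sketch — normalizing against the priciest candidate's cost is my assumed scheme:&lt;/p&gt;

```python
# reward = quality_score * (1 - normalized_cost): a model only earns a
# high reward by passing the janitor *cheaply*. Dividing by the most
# expensive candidate's cost is an assumption for illustration.
def reward(quality_score, cost, max_cost):
    return quality_score * (1 - cost / max_cost)

# Two models that both pass the janitor (quality 1.0):
print(reward(1.0, cost=0.05, max_cost=1.00))  # ≈ 0.95 (cheap model wins)
print(reward(1.0, cost=0.80, max_cost=1.00))  # ≈ 0.20
```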

&lt;p&gt;With fewer than ~50 recorded completions, it falls back to a static cascade (haiku → sonnet → opus). After warm-up the bandit takes over. The policy persists across runs, so learning accumulates.&lt;/p&gt;

&lt;p&gt;Net effect in my runs: ~23% cost reduction vs. running everything on one top-tier model.&lt;/p&gt;

&lt;h2&gt;
  
  
  New in v1.8: MCP server mode
&lt;/h2&gt;

&lt;p&gt;Since the original post, Bernstein gained a &lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; server. Any MCP-aware client (Claude Desktop, Cursor, VS Code, Zed) can now call Bernstein as a tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bernstein mcp &lt;span class="nt"&gt;--transport&lt;/span&gt; stdio
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your IDE agent decomposes a goal, calls &lt;code&gt;bernstein_run&lt;/code&gt;, and Bernstein fans out the work across 12 parallel CLI agents in worktrees. The IDE agent just waits for results. One cheap router model at the top, a swarm of cheap workers below, one expensive reviewer at the end — instead of one Opus chewing through 40 serialized tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it differs from CrewAI, AutoGen, LangGraph, Composio, emdash
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Bernstein&lt;/th&gt;
&lt;th&gt;CrewAI / AutoGen / LangGraph&lt;/th&gt;
&lt;th&gt;Composio / emdash&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Scheduling&lt;/td&gt;
&lt;td&gt;Deterministic Python&lt;/td&gt;
&lt;td&gt;LLM-driven&lt;/td&gt;
&lt;td&gt;Hosted/UI-driven&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Works with&lt;/td&gt;
&lt;td&gt;20+ CLI agents (Claude Code, Codex, Aider, etc.)&lt;/td&gt;
&lt;td&gt;Their SDK classes&lt;/td&gt;
&lt;td&gt;Their desktop app / web UI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Git isolation&lt;/td&gt;
&lt;td&gt;Worktree per agent&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Varies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Verification&lt;/td&gt;
&lt;td&gt;Janitor + quality gates&lt;/td&gt;
&lt;td&gt;Mostly absent&lt;/td&gt;
&lt;td&gt;Mostly absent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent lifetime&lt;/td&gt;
&lt;td&gt;Short: spawn, work, exit&lt;/td&gt;
&lt;td&gt;Long-running&lt;/td&gt;
&lt;td&gt;Long-running&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;State&lt;/td&gt;
&lt;td&gt;File-based (inspect with &lt;code&gt;cat&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;In-memory / checkpointer&lt;/td&gt;
&lt;td&gt;Cloud/hosted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Interface&lt;/td&gt;
&lt;td&gt;CLI + MCP server&lt;/td&gt;
&lt;td&gt;SDK&lt;/td&gt;
&lt;td&gt;Desktop ADE&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Philosophical difference: CrewAI/AutoGen/LangGraph are frameworks — you write agents in their SDK. Composio and emdash are desktop ADEs — you use their UI. Bernstein is infrastructure — you point it at Claude Code, Codex, or Aider (or all three in one run) and it handles the rest.&lt;/p&gt;

&lt;p&gt;The LLM-driven coordination in those frameworks is non-deterministic and hard to debug. When Bernstein assigns task #47 to Sonnet, you can read the policy file and trace the feature vector that selected it. No prompt archaeology.&lt;/p&gt;

&lt;p&gt;Trade-off: no agent-to-agent chat, no built-in RAG, no hosted option. It's a CLI for people who want their agents to write code and get out.&lt;/p&gt;

&lt;h2&gt;
  
  
  What still sucks
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Agents hallucinate file paths. The janitor catches it, but retries cost tokens.&lt;/li&gt;
&lt;li&gt;Context windows fill up on large codebases. Short-lived agents help; it's still a real constraint.&lt;/li&gt;
&lt;li&gt;12 parallel Opus agents is not cheap. Budgets and the bandit help, but it's not a set-and-forget expense.&lt;/li&gt;
&lt;li&gt;Setup friction. At least one CLI agent must be installed and authenticated.&lt;/li&gt;
&lt;li&gt;File ownership isn't bulletproof. Agents occasionally touch files outside their scope.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is v1.8, not v10. But the core loop is stable and I've been running it against production code for months.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;bernstein
&lt;span class="nb"&gt;cd &lt;/span&gt;your-project
bernstein init
bernstein &lt;span class="nt"&gt;-g&lt;/span&gt; &lt;span class="s2"&gt;"Add rate limiting middleware"&lt;/span&gt;
bernstein live    &lt;span class="c"&gt;# TUI&lt;/span&gt;
bernstein cost    &lt;span class="c"&gt;# spend so far&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For multi-stage work, a YAML plan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;stages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;goal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Add&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;rate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;limiting&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;middleware"&lt;/span&gt;
        &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend&lt;/span&gt;
        &lt;span class="na"&gt;complexity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;medium&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;goal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Integration&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tests&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;rate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;limiter"&lt;/span&gt;
        &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;qa&lt;/span&gt;
        &lt;span class="na"&gt;complexity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;low&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docs&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;goal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Document&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;rate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;limiting&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;OpenAPI&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;spec"&lt;/span&gt;
        &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docs&lt;/span&gt;
        &lt;span class="na"&gt;complexity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;low&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bernstein run plan.yaml              &lt;span class="c"&gt;# deterministic execution&lt;/span&gt;
bernstein run &lt;span class="nt"&gt;--dry-run&lt;/span&gt; plan.yaml    &lt;span class="c"&gt;# preview + cost estimate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mix models in the same run. Claude Code for architecture, Gemini for boilerplate, Aider with a local Ollama model for offline tasks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/chernistry/bernstein" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;. Apache 2.0. Star if it saves you a merge conflict.&lt;/p&gt;

&lt;p&gt;If you've been babysitting one agent at a time, try the worktree-per-agent pattern and tell me what breaks. I'm especially interested in failure modes I haven't hit yet.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>python</category>
      <category>ai</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
