<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alex Chen</title>
    <description>The latest articles on DEV Community by Alex Chen (@alex_chen_45b61c234682eb6).</description>
    <link>https://dev.to/alex_chen_45b61c234682eb6</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3883787%2Ff7c6b285-a545-467d-9f79-594a9e5b4e49.png</url>
      <title>DEV Community: Alex Chen</title>
      <link>https://dev.to/alex_chen_45b61c234682eb6</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/alex_chen_45b61c234682eb6"/>
    <language>en</language>
    <item>
      <title>The Architectural Shape Hint: A Spec-Time Trick That Lets 10 AI Agents Run in Parallel Without Stepping on Each Other</title>
      <dc:creator>Alex Chen</dc:creator>
      <pubDate>Sun, 03 May 2026 06:12:10 +0000</pubDate>
      <link>https://dev.to/alex_chen_45b61c234682eb6/the-architectural-shape-hint-a-spec-time-trick-that-lets-10-ai-agents-run-in-parallel-without-2g69</link>
      <guid>https://dev.to/alex_chen_45b61c234682eb6/the-architectural-shape-hint-a-spec-time-trick-that-lets-10-ai-agents-run-in-parallel-without-2g69</guid>
      <description>&lt;p&gt;I run agent swarms now. Not "an agent" — &lt;em&gt;agents&lt;/em&gt;, plural, in flight at once, each working on a different feature against the same repo. Ten agents per session is normal. Twenty isn't unusual when the spec is well-decomposed. The token math works, the wall-clock math works, the model latency hides inside the swarm because something is always landing while something else is still compiling. The economics make a strong case for parallel execution as the default.&lt;/p&gt;

&lt;p&gt;Until you hit the wall everyone hits: &lt;strong&gt;two agents touched the same file&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I've spent the better part of the year fighting this. I've shipped four layers of runtime defense. They all work and none of them are the answer. The answer turned out to be one attribute on the spec. This is the post about that one attribute.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The four layers nobody told you you'd need
&lt;/h2&gt;

&lt;p&gt;Before I describe the fix, let me describe the disease — because if you're running parallel agents and you &lt;em&gt;don't&lt;/em&gt; recognize this stack, you're probably going to recognize it next week.&lt;/p&gt;

&lt;p&gt;When two agents in flight at once both want to edit &lt;code&gt;src/router/routes.py&lt;/code&gt;, here's what claw-forge (the harness I work in) does:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;File-claim locks.&lt;/strong&gt; Each task declares &lt;code&gt;touches_files=[...]&lt;/code&gt; upfront. The dispatcher refuses to start a second task that wants a file currently held by a running task. The second task defers to the next dispatch cycle (the claim check is sketched after this list).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-dispatch worktree sync.&lt;/strong&gt; Before the agent runs, the harness merges &lt;code&gt;target_branch&lt;/code&gt; into the feature branch &lt;em&gt;inside the worktree&lt;/em&gt;. If &lt;code&gt;target&lt;/code&gt; moved while the task was queued, the merge happens before any token is spent. Conflicts surface as &lt;code&gt;resume_conflict:&lt;/code&gt; failures with the offending file list.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Catch-up rebase inside &lt;code&gt;squash_merge&lt;/code&gt;.&lt;/strong&gt; When the agent's branch finally squash-merges to main and conflicts with concurrent work, the harness merges target into the branch and retries the squash automatically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resume-on-retry preamble.&lt;/strong&gt; If a task fails mid-run, the next attempt picks up the worktree as-is, with a prompt prefix listing what's already committed and what failed last time. The agent doesn't redo the first 60% of the work.&lt;/li&gt;
&lt;/ol&gt;
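
&lt;p&gt;For the curious: the layer-1 claim check itself is small. A minimal sketch of the overlap test (names are illustrative; claw-forge's internals aren't reproduced here):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative sketch of the layer-1 claim check; the real dispatcher's
# names and structure are assumptions here.
from fnmatch import fnmatch

def claims_conflict(requested: list[str], held: list[str]) -&amp;gt; bool:
    """True if any requested file or glob overlaps a held one."""
    return any(
        fnmatch(r, h) or fnmatch(h, r)
        for r in requested
        for h in held
    )

# The second task defers if it wants a file a running task already holds.
assert claims_conflict(["src/router/routes.py"], ["src/router/routes.py"])
assert not claims_conflict(["src/plugins/auth/api.py"], ["src/plugins/profile/**"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;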

&lt;p&gt;This stack is correct. Each layer earns its keep. If I deleted any one of them, real users would file real bug reports within 48 hours. But notice what they all have in common: &lt;strong&gt;they are reactive&lt;/strong&gt;. Every layer is a response to "two agents touched the same file." The conflict has already happened by the time the layer fires.&lt;/p&gt;

&lt;p&gt;What if it never happened?&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Conflicts are usually predictable from architecture
&lt;/h2&gt;

&lt;p&gt;Sit down with a senior engineer who has worked on a codebase for six months. Hand them a list of feature requests. Ask: "If we built these in parallel with one engineer per feature, where would the merge conflicts happen?" They'll answer within five minutes, and they'll be right. They don't run the merges. They look at the codebase's structure and &lt;em&gt;know&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The reason they know is that conflicts cluster around &lt;strong&gt;architectural surfaces&lt;/strong&gt;. A few specific files — the dispatcher, the routes table, the global event bus, the error envelope, the auth middleware — get touched by every feature. Most other files are owned by one feature each. The conflict surface isn't uniformly distributed across the repo. It's concentrated on the structural choke points.&lt;/p&gt;

&lt;p&gt;This is the same insight that drives plugin architectures in big software systems. WordPress plugins don't conflict because each lives in &lt;code&gt;wp-content/plugins/&amp;lt;name&amp;gt;/&lt;/code&gt;. VS Code extensions don't conflict because each lives in its own directory and registers through a stable API. The host is small and stable. The plugins are everything else.&lt;/p&gt;

&lt;p&gt;If you build your codebase as a small core plus many plugins, &lt;em&gt;and&lt;/em&gt; your spec tells the harness which features are plugins versus core, &lt;em&gt;and&lt;/em&gt; the harness honors that distinction at scheduling time — then ten agents working on ten plugins literally cannot conflict. They are editing files in ten different directories. The locks are decorative. The catch-up rebase is dead code. The pre-dispatch sync is a no-op.&lt;/p&gt;

&lt;p&gt;This was the unlock. Encode the architectural intent in the spec. Let the scheduler use it.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Two shapes, one attribute
&lt;/h2&gt;

&lt;p&gt;Every feature in our specs now carries an architectural-shape attribute. There are exactly two shapes that matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;shape="plugin"&lt;/code&gt;&lt;/strong&gt; — vertical features. Live in their own directory, own their own data model, own their own tests. Adding or removing the plugin doesn't touch sibling plugins. Examples: "user can register," "user can edit profile," "task CRUD with tag filtering." Each lives in &lt;code&gt;src/plugins/&amp;lt;name&amp;gt;/&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;shape="core"&lt;/code&gt;&lt;/strong&gt; — cross-cutting concerns. Edit files used by every plugin. Examples: "all endpoints validate JWT," "uniform RFC 7807 error envelope," "global rate limit," "database connection pool." Each lives in &lt;code&gt;src/core/&amp;lt;concern&amp;gt;/&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. No tier, no taxonomy, no UML. Two values. The simplicity is load-bearing — if the classifier had three values it would have ten by next quarter, and the scheduling rule would have to handle a Cartesian product of cases.&lt;/p&gt;

&lt;p&gt;A spec entry now looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;feature&lt;/span&gt; &lt;span class="na"&gt;index=&lt;/span&gt;&lt;span class="s"&gt;"14"&lt;/span&gt; &lt;span class="na"&gt;shape=&lt;/span&gt;&lt;span class="s"&gt;"plugin"&lt;/span&gt; &lt;span class="na"&gt;plugin=&lt;/span&gt;&lt;span class="s"&gt;"auth"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;description&amp;gt;&lt;/span&gt;User can register with email and password&lt;span class="nt"&gt;&amp;lt;/description&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/feature&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;feature&lt;/span&gt; &lt;span class="na"&gt;index=&lt;/span&gt;&lt;span class="s"&gt;"20"&lt;/span&gt; &lt;span class="na"&gt;shape=&lt;/span&gt;&lt;span class="s"&gt;"core"&lt;/span&gt;
         &lt;span class="na"&gt;touches_files=&lt;/span&gt;&lt;span class="s"&gt;"src/core/middleware/auth.py"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;description&amp;gt;&lt;/span&gt;All endpoints validate JWT on incoming requests&lt;span class="nt"&gt;&amp;lt;/description&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/feature&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;plugin="auth"&lt;/code&gt; attribute auto-fills &lt;code&gt;touches_files&lt;/code&gt; to &lt;code&gt;["src/plugins/auth/**"]&lt;/code&gt;. The harness now knows that feature 14 will only touch files inside &lt;code&gt;src/plugins/auth/&lt;/code&gt;. Two &lt;code&gt;shape="plugin"&lt;/code&gt; features with different &lt;code&gt;plugin&lt;/code&gt; names are &lt;em&gt;guaranteed&lt;/em&gt; to be file-disjoint. Not "probably." Not "usually." Guaranteed by directory boundaries.&lt;/p&gt;

&lt;p&gt;For &lt;code&gt;shape="core"&lt;/code&gt; features the auto-derivation can't help — cross-cutting work touches a specific file by name. The author writes &lt;code&gt;touches_files="src/core/middleware/auth.py"&lt;/code&gt; explicitly. The parser refuses any spec where &lt;code&gt;shape="core"&lt;/code&gt; lacks a &lt;code&gt;touches_files&lt;/code&gt; value. Cross-cutting work without a declared file set is a bug in the spec, not a runtime decision the dispatcher gets to make.&lt;/p&gt;
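
&lt;p&gt;Both rules land in a few lines at parse time. A sketch, assuming a simple feature dataclass (field names are mine, not claw-forge's):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass, field

@dataclass
class Feature:
    index: int
    shape: str | None = None          # "plugin" | "core" | None (legacy)
    plugin: str | None = None
    touches_files: list[str] = field(default_factory=list)

def finalize(feat: Feature) -&amp;gt; Feature:
    # Auto-derivation: a plugin feature owns its directory and nothing else.
    if feat.shape == "plugin" and feat.plugin and not feat.touches_files:
        feat.touches_files = [f"src/plugins/{feat.plugin}/**"]
    # Spec-time validation: core work without a declared file set is a bug
    # in the spec, so refuse before the planner spends a single token.
    if feat.shape == "core" and not feat.touches_files:
        raise ValueError(f"feature {feat.index}: shape='core' needs touches_files")
    return feat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;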

&lt;h2&gt;
  
  
  4. The scheduling rule that follows
&lt;/h2&gt;

&lt;p&gt;Once shape is in the spec, the dispatcher gets two new rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;shape="plugin"&lt;/code&gt; tasks dispatch freely up to &lt;code&gt;--concurrency N&lt;/code&gt;.&lt;/strong&gt; Their file sets are disjoint by construction. The file-claim lock layer becomes a sanity check rather than a primary defense. Plugin tasks scale linearly with concurrency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;shape="core"&lt;/code&gt; tasks single-flight.&lt;/strong&gt; At most one cross-cutting task runs at a time, regardless of &lt;code&gt;--concurrency&lt;/code&gt;. Two core tasks both want to edit the auth middleware? They serialize. Always. No clever overlap analysis, no "well actually they touch different lines." Cross-cutting work is cheap to serialize — it's a small minority of features — and the cost of getting it wrong is high.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tasks without &lt;code&gt;shape&lt;/code&gt;&lt;/strong&gt; (legacy specs) fall through to the existing concurrency cap + file-claim lock behavior. Backward compatibility is free because the new rules are gated on &lt;code&gt;task.shape IS NOT NULL&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The scheduler's filter is twelve lines of Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_ready_tasks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;TaskNode&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;ready&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_is_ready&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="c1"&gt;# Cross-cutting (shape="core") tasks single-flight: drop any
&lt;/span&gt;    &lt;span class="c1"&gt;# candidate ``core`` task from the ready set if another core task
&lt;/span&gt;    &lt;span class="c1"&gt;# is already running.
&lt;/span&gt;    &lt;span class="n"&gt;any_core_running&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;running&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;core&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;any_core_running&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;ready&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ready&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;core&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ready&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the entire enforcement mechanism. The scheduler has no opinion about parallelism beyond this. The &lt;code&gt;touches_files&lt;/code&gt; lock layer stays on as the second line of defense for cases where a plugin author lied about their shape (which code review should catch separately).&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Why this works structurally, not just behaviorally
&lt;/h2&gt;

&lt;p&gt;The thing that makes this approach durable is that the safety property is &lt;strong&gt;structural&lt;/strong&gt;: it's a consequence of file-system layout, not of clever runtime detection.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;src/plugins/auth/&lt;/code&gt; and &lt;code&gt;src/plugins/profile/&lt;/code&gt; are the only file sets two agents touch, there is no possible interleaving where they conflict. Not because the harness is smart. Because the files don't overlap. The same way two &lt;code&gt;git worktree&lt;/code&gt; instances on different branches can edit different files without any locking — git just doesn't see them as a conflict.&lt;/p&gt;

&lt;p&gt;Compare this to the old approach: "predict conflicts at runtime by checking which files each agent claims to touch." That works &lt;em&gt;if&lt;/em&gt; every agent honestly declares its file set. In practice, agents trying to wire a plugin into a registry often need to edit the registry too. They forget to declare the registry file. The lock layer doesn't fire. The merge conflicts at squash time. The whole reactive stack kicks in.&lt;/p&gt;

&lt;p&gt;The plugin-shape approach refuses to be in that situation. If your codebase has a registry that every plugin has to edit, that registry is a hotspot and you should restructure it — or declare it as &lt;code&gt;shape="core"&lt;/code&gt; and serialize work on it. The architecture catches up to the parallelism, not the other way around.&lt;/p&gt;

&lt;p&gt;This is also why the harness composes naturally with my project's &lt;code&gt;boundaries&lt;/code&gt; audit pass. That tooling already identifies hotspot files (registries, route tables, dispatch chains) and refactors them into plugin-extensible patterns. After a &lt;code&gt;boundaries apply --auto&lt;/code&gt; pass, the codebase is more amenable to plugin-shape features — fewer surfaces remain that &lt;em&gt;force&lt;/em&gt; a &lt;code&gt;shape="core"&lt;/code&gt; declaration. The two pieces — spec-time architectural intent and codebase structural refactoring — pull in the same direction. Each makes the other more effective.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. The brownfield path: refactor first, then extend
&lt;/h2&gt;

&lt;p&gt;Greenfield projects can be built plugin-shaped from day one. Brownfield projects — i.e. every project worth working on — usually have an existing dispatcher / route table / event bus that gets touched by every feature. You can't bolt plugin-shape semantics onto a codebase whose architecture isn't ready for them.&lt;/p&gt;

&lt;p&gt;So the brownfield workflow has an extra step:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;analyze&lt;/code&gt; — generate a manifest with stack, conventions, test baseline.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;boundaries audit&lt;/code&gt; — emit &lt;code&gt;boundaries_report.md&lt;/code&gt; listing extension hotspots and the refactor pattern best suited to each (registry / split / route-table / extract-collaborators).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;boundaries apply --auto&lt;/code&gt; — refactor each hotspot one at a time on its own feature branch with test gating. Squash-merges to main on green; reverts on red.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/create-spec&lt;/code&gt; — the slash command reads &lt;code&gt;boundaries_report.md&lt;/code&gt; first. If hotspots remain unrefactored, it warns the user before generating any spec. Then it asks &lt;code&gt;shape&lt;/code&gt; per feature.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;claw-forge add&lt;/code&gt; — runs the planner against the now-shape-aware spec.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Skipping step 3 is the costly mistake. New features land as &lt;code&gt;shape="plugin"&lt;/code&gt;, but the file-claim lock catches them when they try to edit the un-refactored hotspot, the dispatcher fails the task with &lt;code&gt;resume_conflict&lt;/code&gt;, and the agent has wasted one full attempt on stale state. Refactoring up front is cheaper than discovering you need to mid-flight. The boundaries harness exists exactly to make that "up front" step automatic.&lt;/p&gt;

&lt;p&gt;The cultural ask is: when adding non-trivial features to an existing codebase, do the structural work &lt;em&gt;first&lt;/em&gt;. That's not a new principle — it's "make the change easy, then make the easy change," Kent Beck, twenty years ago. Plugin-shape specs make this principle observable: if you can't write a clean spec without declaring half your features as &lt;code&gt;shape="core"&lt;/code&gt;, that's a structural signal, not a spec-writing failure.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. What this doesn't solve (be honest)
&lt;/h2&gt;

&lt;p&gt;I want to be careful not to oversell this. Here's what plugin-shape specs explicitly do not do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Semantic conflicts inside a single plugin.&lt;/strong&gt; Two tasks for the same plugin (&lt;code&gt;plugin="auth"&lt;/code&gt;) still serialize via &lt;code&gt;touches_files&lt;/code&gt; locks. Adding "user can reset password" while "user can change email" is in flight will defer the second one until the first finishes. This is fine — it's the correct behavior — but it limits intra-plugin parallelism to one task at a time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-plugin coupling that wasn't designed in.&lt;/strong&gt; If your &lt;code&gt;tasks&lt;/code&gt; plugin imports from your &lt;code&gt;auth&lt;/code&gt; plugin's internals (and your codebase doesn't enforce plugin isolation via lint or import boundaries), edits to &lt;code&gt;auth/&lt;/code&gt; can break &lt;code&gt;tasks/&lt;/code&gt; after merge. The spec doesn't catch this; tests do (see the import-boundary sketch after this list). Treat the spec as a parallelism hint, not an isolation guarantee.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared infrastructure changes.&lt;/strong&gt; A migration that adds a column to the &lt;code&gt;users&lt;/code&gt; table is &lt;code&gt;shape="core"&lt;/code&gt; because the migrations directory is shared. Two such migrations serialize. They have to — concurrent migration writers race on the migration sequence number. Don't try to plugin-ify your migrations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specifications written as shape-agnostic.&lt;/strong&gt; A feature whose acceptance criteria say "the system shall …" without naming a directory or file is hard to classify. Either rewrite the criterion to reference a concrete piece of the system, or accept that the feature won't get a &lt;code&gt;shape&lt;/code&gt; attribute and will fall through to legacy scheduling.&lt;/li&gt;
&lt;/ul&gt;
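
&lt;p&gt;On the second point: if you'd rather enforce plugin isolation than hope for it, a blunt import-boundary lint is enough to start. A sketch (not part of claw-forge; it assumes plugins import each other as &lt;code&gt;plugins.&amp;lt;name&amp;gt;&lt;/code&gt; modules):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import ast
import pathlib

def cross_plugin_imports(root: str = "src/plugins") -&amp;gt; list[str]:
    # Hypothetical lint, not claw-forge code: flag any plugin module that
    # imports a sibling plugin's internals.
    violations = []
    for path in pathlib.Path(root).rglob("*.py"):
        own = path.relative_to(root).parts[0]   # the plugin this file lives in
        for node in ast.walk(ast.parse(path.read_text())):
            if isinstance(node, ast.ImportFrom) and node.module:
                parts = node.module.split(".")
                if parts[0] == "plugins" and len(parts) &amp;gt; 1 and parts[1] != own:
                    violations.append(f"{path}: imports {node.module}")
    return violations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;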

&lt;p&gt;The honest framing: plugin-shape specs make the &lt;em&gt;common&lt;/em&gt; parallelism case (many vertical features against a clean plugin host) trivial-safe. The hard cases — cross-cutting concerns, coupled plugins, shared infrastructure — still require engineering judgment. The win is that the common case becomes the default rather than the exception.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. The cultural shift this enables
&lt;/h2&gt;

&lt;p&gt;There's a meta-point here that's bigger than the technical mechanism.&lt;/p&gt;

&lt;p&gt;Most discussions of "AI agents at scale" focus on the &lt;em&gt;agent's&lt;/em&gt; capabilities — context window, reasoning depth, tool-use accuracy. Those matter, but they're not where the leverage is. The leverage is in &lt;strong&gt;encoding the human's architectural intent in a place the harness can read&lt;/strong&gt;. Specs are not just task descriptions for the agent. They're scheduling hints for the orchestrator. They're isolation declarations for the locks. They're refactoring targets for the boundaries pass. They're documentation for the next human reviewer.&lt;/p&gt;

&lt;p&gt;When you start writing specs that carry this much load, the spec format itself stops being a casual prose blob and becomes a structured contract. XML attributes that look fussy at first — &lt;code&gt;index&lt;/code&gt;, &lt;code&gt;depends_on&lt;/code&gt;, &lt;code&gt;shape&lt;/code&gt;, &lt;code&gt;plugin&lt;/code&gt;, &lt;code&gt;touches_files&lt;/code&gt; — earn their keep because every one of them maps to a runtime decision the harness will otherwise have to guess. Guessing is what produces the four-layer reactive stack. Declaring is what makes that stack a quiet backstop instead of a daily firefight.&lt;/p&gt;

&lt;p&gt;This is the same shift that happened in deployment automation a decade ago: declarative manifests beat imperative shell scripts because the &lt;em&gt;intent&lt;/em&gt; — "I want three replicas behind a load balancer" — was machine-readable rather than buried in a sequence of side-effecting commands. Plugin-shape specs are doing the same thing for AI-agent orchestration: making intent readable so the orchestrator can stop guessing.&lt;/p&gt;

&lt;p&gt;If you're building AI-coding-agent infrastructure right now and your dispatcher is making scheduling decisions based purely on what's in the queue, you're building the imperative-shell-script version of this. The declarative version — where the agents read what the human meant rather than what they typed — is meaningfully better, and it doesn't require a smarter model. It requires a more structured spec.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. The minimum implementation
&lt;/h2&gt;

&lt;p&gt;If you want to try this in your own harness, the minimum viable version is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;One attribute on your task/feature object.&lt;/strong&gt; Call it &lt;code&gt;shape&lt;/code&gt;, &lt;code&gt;kind&lt;/code&gt;, &lt;code&gt;category&lt;/code&gt;, whatever — but pick &lt;em&gt;exactly two&lt;/em&gt; values. "vertical" and "horizontal" works. "feature" and "infra" works. Two values. The temptation to add a third is a trap.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One auto-derivation rule.&lt;/strong&gt; When &lt;code&gt;shape="plugin"&lt;/code&gt; and a &lt;code&gt;plugin="X"&lt;/code&gt; is set, the file-claim list defaults to &lt;code&gt;["plugins/X/**"]&lt;/code&gt;. One line of helper code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One scheduling rule.&lt;/strong&gt; When any &lt;code&gt;shape="core"&lt;/code&gt; task is running, drop other core tasks from the ready set. Twelve lines of Python.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One spec-time validation.&lt;/strong&gt; &lt;code&gt;shape="core"&lt;/code&gt; without an explicit file list raises an error before the planner runs. Five lines.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's the whole ship. Total surface area: maybe 50 lines of harness code, plus the spec schema extension and the docs to teach the spec author what to declare.&lt;/p&gt;

&lt;p&gt;The minimum tests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A round-trip test that parses the documented XML example and asserts the auto-derived file lists match (guards against doc/code drift).&lt;/li&gt;
&lt;li&gt;A scheduler test that adds two &lt;code&gt;shape="core"&lt;/code&gt; tasks and confirms only one is in the ready set when the other is running (both scheduler tests are sketched after this list).&lt;/li&gt;
&lt;li&gt;A scheduler test that confirms &lt;code&gt;shape="plugin"&lt;/code&gt; tasks dispatch freely when a core task is running.&lt;/li&gt;
&lt;/ul&gt;
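
&lt;p&gt;The two scheduler tests fold into one sketch; &lt;code&gt;Scheduler&lt;/code&gt; and &lt;code&gt;TaskNode&lt;/code&gt; stand in for whatever your harness calls them:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def test_core_tasks_single_flight():
    # Hypothetical harness objects; adapt the names to your own scheduler.
    sched = Scheduler()
    sched.add(TaskNode(index=1, shape="core", status="running"))
    queued = sched.add(TaskNode(index=2, shape="core", status="queued"))
    plugin = sched.add(TaskNode(index=3, shape="plugin", status="queued"))

    ready = sched.get_ready_tasks()
    assert queued not in ready   # second core task held back
    assert plugin in ready       # plugin tasks dispatch freely past it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;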

&lt;p&gt;Three tests. Done. The pattern compounds: now your codebase has a place to put new shape-aware behavior, and your spec authors have a place to encode new architectural intent. Future work — auto-derived shape inference via static analysis, telemetry on adoption rates, conflict-prediction at scheduler time — all builds on this primitive.&lt;/p&gt;

&lt;h2&gt;
  
  
  10. Closing thought
&lt;/h2&gt;

&lt;p&gt;The thing that took me too long to internalize is that &lt;strong&gt;parallelism is a property of the architecture, not the runtime&lt;/strong&gt;. You can't bolt safe parallelism onto a codebase whose architecture forces every feature through the same chokepoint. You can build elaborate runtime defenses against the resulting conflicts — and you should, because real codebases always have &lt;em&gt;some&lt;/em&gt; chokepoints — but the runtime defenses are the patch, not the cure.&lt;/p&gt;

&lt;p&gt;The cure is to design codebases where parallelism is structurally safe, and to encode that structural intent in the spec so the orchestrator can lean on it. Two values, one attribute, twelve lines of scheduler logic. That's the surface area of the win. The cost was a year of fighting the four-layer reactive stack to recognize that the layers were treating symptoms, not the disease.&lt;/p&gt;

&lt;p&gt;If your AI-agent harness is dropping conflicts on you, look at your spec format before you look at your dispatcher. The dispatcher is downstream. The spec is where the architecture lives.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Alex Chen builds AI-coding-agent infrastructure shipped to production. He runs ten-agent swarms daily and would like to thank the team's &lt;code&gt;boundaries&lt;/code&gt; harness for finally making it stop hurting.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Building an Autonomous Crypto Trading Bot</title>
      <dc:creator>Alex Chen</dc:creator>
      <pubDate>Sun, 03 May 2026 06:05:58 +0000</pubDate>
      <link>https://dev.to/alex_chen_45b61c234682eb6/building-an-autonomous-crypto-trading-bot-2lc4</link>
      <guid>https://dev.to/alex_chen_45b61c234682eb6/building-an-autonomous-crypto-trading-bot-2lc4</guid>
      <description>&lt;p&gt;I've been spending too much time inside trading bot codebases lately. Most of them are one of two things: a 200-line Jupyter notebook that someone calls a "system," or a sprawling monorepo where the strategy logic and exchange integration are so tangled that you can't swap exchanges without rewriting half the code.&lt;/p&gt;

&lt;p&gt;A few weeks ago I went deep on &lt;strong&gt;AlphaStrike&lt;/strong&gt;, a production-grade crypto perpetual futures bot. Not because the returns were headline-grabbing (though a 2.4 Sharpe is nothing to sneeze at), but because the architecture solves problems most of us hand-wave past. I want to walk through what's interesting, what's novel, and what I'd steal for my own projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem Space
&lt;/h2&gt;

&lt;p&gt;Algorithmic crypto trading sounds simple at the whiteboard: read prices, predict direction, place orders, manage risk. In practice, every layer of that stack will try to kill you.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exchanges are inconsistent.&lt;/strong&gt; WEEX, Binance, Hyperliquid — every one has different symbol formats, different REST paradigms, different WebSocket lifecycles, different ways of representing a position.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Models decay.&lt;/strong&gt; A signal that worked last quarter doesn't work this quarter. Pretending otherwise is how accounts get blown up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Volatility is non-stationary.&lt;/strong&gt; Static leverage and fixed position sizes are a lie you tell yourself until you wake up at -40% drawdown.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pure quant is fragile.&lt;/strong&gt; Numbers don't know that the SEC just sued the second-largest exchange.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AlphaStrike's design isn't trying to be the smartest bot. It's trying to be the bot that's still alive in 12 months. That's a different optimization target, and it shows.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture, Top-Down
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EXCHANGE → DATA GATEWAY → FEATURE LAYER → FEATURE VALIDATOR
                                                    │
                                                    ▼
EXECUTION ← RISK LAYER ← STRATEGY LAYER ← ML LAYER
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Eight stages, every one of them able to halt the pipeline on its own. That's the first lesson: &lt;strong&gt;every layer is a potential circuit breaker.&lt;/strong&gt; If features fail validation (PSI drift, KS test, CUSUM), no signal reaches the model. If the risk layer flags exposure, no order reaches the exchange. Fail-closed by default.&lt;/p&gt;
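
&lt;p&gt;The write-up doesn't show the stage interface, so here's a sketch, under my own naming, of what the fail-closed contract implies:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of a fail-closed pipeline tick; AlphaStrike's actual interfaces
# aren't public, so everything here is illustrative.
import logging

log = logging.getLogger(__name__)

class HaltPipeline(Exception):
    """Any stage raises this to stop the tick before capital is touched."""

def run_tick(stages, payload):
    for stage in stages:
        try:
            payload = stage(payload)   # each stage validates, then transforms
        except HaltPipeline as halt:
            log.warning("tick halted at %s: %s", stage.__name__, halt)
            return None                # fail closed: no order goes out this tick
    return payload
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;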

&lt;p&gt;Let me walk through the four pieces I actually want to talk about.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Exchange Abstraction Done Right
&lt;/h2&gt;

&lt;p&gt;This is where most trading bots rot. AlphaStrike defines two &lt;code&gt;Protocol&lt;/code&gt; classes — &lt;code&gt;ExchangeRESTProtocol&lt;/code&gt; and &lt;code&gt;ExchangeWebSocketProtocol&lt;/code&gt; — and every adapter (WEEX, Hyperliquid, Binance, generic OpenAPI) implements them. The trading logic only talks to the unified protocol.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@runtime_checkable&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ExchangeRESTProtocol&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Protocol&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_ticker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;symbol&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;UnifiedTicker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;place_order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;UnifiedOrder&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;UnifiedOrderResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_positions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;symbol&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;UnifiedPosition&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_leverage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;symbol&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;leverage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The unified data models (&lt;code&gt;UnifiedOrder&lt;/code&gt;, &lt;code&gt;UnifiedPosition&lt;/code&gt;, &lt;code&gt;UnifiedCandle&lt;/code&gt;) are the contract. Every adapter has a &lt;code&gt;mappers.py&lt;/code&gt; that translates between exchange-native shapes and the unified shapes. Symbol normalization happens at the adapter boundary — internally everything is &lt;code&gt;BTCUSDT&lt;/code&gt;, externally it becomes &lt;code&gt;cmt_btcusdt&lt;/code&gt; or whatever WEEX wants this week.&lt;/p&gt;
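
&lt;p&gt;A sketch of what that boundary translation looks like, using the WEEX-style prefix from the example (the real &lt;code&gt;mappers.py&lt;/code&gt; isn't shown in the write-up):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative mapper pair; the "cmt_" prefix is the example from above,
# not a documented constant.
def to_exchange_symbol(unified: str) -&amp;gt; str:
    return f"cmt_{unified.lower()}"             # "BTCUSDT" -&amp;gt; "cmt_btcusdt"

def to_unified_symbol(native: str) -&amp;gt; str:
    return native.removeprefix("cmt_").upper()  # "cmt_btcusdt" -&amp;gt; "BTCUSDT"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;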

&lt;p&gt;&lt;strong&gt;Why I care:&lt;/strong&gt; I've shipped trading code where exchange-specific assumptions leaked into the strategy. It's death by a thousand &lt;code&gt;if exchange == "binance"&lt;/code&gt; cuts. The Protocol-based approach keeps the boundary honest. You add a new exchange by writing one adapter file, not by hunting through the codebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. The ML Layer That Doesn't Trust Itself
&lt;/h2&gt;

&lt;p&gt;The signal pipeline runs &lt;strong&gt;12 categories&lt;/strong&gt; of weak signals — order flow, microstructure, volatility, correlation, sentiment, seasonality, statistical, price action, volume, derivatives, alternative, macro — and combines them through a regime-aware ensemble. This is the explicitly Renaissance/Medallion-inspired bit, and the backtest deltas are real:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Single Signal&lt;/th&gt;
&lt;th&gt;12-Category Ensemble&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sharpe&lt;/td&gt;
&lt;td&gt;1.2&lt;/td&gt;
&lt;td&gt;2.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Win Rate&lt;/td&gt;
&lt;td&gt;52%&lt;/td&gt;
&lt;td&gt;58%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max Drawdown&lt;/td&gt;
&lt;td&gt;-15%&lt;/td&gt;
&lt;td&gt;-8%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;But the part I find genuinely novel is the &lt;strong&gt;signal decay tracker&lt;/strong&gt;. Every signal logs its predictions, the system records outcomes, and signals get auto-retired when their rolling accuracy drops below 48%. Weight is &lt;code&gt;(edge × 2)²&lt;/code&gt;, so signals with real edge get amplified and weak signals fade out without anyone touching code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;edge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;accuracy&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;            &lt;span class="c1"&gt;# 0.52 accuracy → 0.02 edge
&lt;/span&gt;&lt;span class="n"&gt;weight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;edge&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;         &lt;span class="c1"&gt;# quadratic weighting of strong signals
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;accuracy&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.48&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retire&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
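
&lt;p&gt;Run the numbers and the quadratic bite is obvious: a signal at 58% accuracy gets weight (0.08 × 2)² ≈ 0.026, while one at 52% gets (0.02 × 2)² = 0.0016, a 16× spread from a six-point accuracy gap.&lt;/p&gt;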



&lt;p&gt;This is the right way to do it. Most "ensemble" systems use static weights tuned once and forgotten. Here the weights are alive — they update with reality. Models that lose their edge get fired by the system itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Dynamic Leverage as a First-Class Citizen
&lt;/h2&gt;

&lt;p&gt;Static leverage is the crypto equivalent of running with scissors while drunk. AlphaStrike treats leverage as a continuous control variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;leverage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt; &lt;span class="err"&gt;×&lt;/span&gt; &lt;span class="n"&gt;vol_factor&lt;/span&gt; &lt;span class="err"&gt;×&lt;/span&gt; &lt;span class="n"&gt;dd_factor&lt;/span&gt; &lt;span class="err"&gt;×&lt;/span&gt; &lt;span class="n"&gt;perf_factor&lt;/span&gt;

&lt;span class="n"&gt;vol_factor&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;normal_vol&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;current_vol&lt;/span&gt;     &lt;span class="c1"&gt;# clamped 0.3 to 1.5
&lt;/span&gt;&lt;span class="n"&gt;dd_factor&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;        &lt;span class="c1"&gt;# tiered by drawdown
&lt;/span&gt;&lt;span class="n"&gt;perf_factor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;half_kelly_fraction&lt;/span&gt;          &lt;span class="c1"&gt;# 0.6 to 1.2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Real scenarios from the doc:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Conditions&lt;/th&gt;
&lt;th&gt;Leverage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Normal&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5.0x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High vol (5%)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.0x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;In 12% drawdown&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.5x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Strong perf + low vol&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9.0x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;All bad (high vol + DD + losing)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.0x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
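
&lt;p&gt;To check my reading of the formula against those rows, here's a runnable version. The clamps come from the comments above; the drawdown tiers and the 1.0x floor are my inference from the scenario table, not documented constants:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def compute_leverage(base: float, normal_vol: float, current_vol: float,
                     drawdown: float, half_kelly: float) -&amp;gt; float:
    # Inferred sketch, not AlphaStrike source.
    vol_factor = min(max(normal_vol / current_vol, 0.3), 1.5)
    if drawdown &amp;lt; 0.05:
        dd_factor = 1.0
    elif drawdown &amp;lt; 0.10:
        dd_factor = 0.7
    elif drawdown &amp;lt; 0.15:
        dd_factor = 0.5   # reproduces the 12%-drawdown row: 5.0 * 0.5 = 2.5x
    else:
        dd_factor = 0.3
    perf_factor = min(max(half_kelly, 0.6), 1.2)
    return max(1.0, base * vol_factor * dd_factor * perf_factor)  # never below 1x
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;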

&lt;p&gt;The leverage state lives in &lt;code&gt;data/state/leverage_state.json&lt;/code&gt; so it survives restarts. When the system reduces from 5x to 2x because volatility spiked, the next process boot doesn't forget. That detail matters more than it sounds — most bots reset to defaults on restart and quietly take on more risk than the operator thinks.&lt;/p&gt;
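
&lt;p&gt;The persistence itself is trivial, which is exactly why most bots skip it. A sketch against the path named above (the field names are mine):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import pathlib

STATE = pathlib.Path("data/state/leverage_state.json")

def save_leverage(leverage: float) -&amp;gt; None:
    STATE.parent.mkdir(parents=True, exist_ok=True)
    STATE.write_text(json.dumps({"leverage": leverage}))

def load_leverage(default: float = 5.0) -&amp;gt; float:
    # Fall back to the configured default only when no state was ever
    # written, not silently after a restart mid-drawdown.
    if STATE.exists():
        return json.loads(STATE.read_text())["leverage"]
    return default
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;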

&lt;h2&gt;
  
  
  4. The LLM Layer That Knows Its Place
&lt;/h2&gt;

&lt;p&gt;Here's the part that surprised me. AlphaStrike has an LLM decision layer — a local Ollama-served &lt;code&gt;qwen2.5:1.5b&lt;/code&gt; — but its design philosophy is the opposite of what's currently fashionable. The LLM does not generate signals. It does not pick trades. It does not "reason about the market."&lt;/p&gt;

&lt;p&gt;It only intervenes &lt;strong&gt;when performance degrades.&lt;/strong&gt; When the rolling win rate drops below 40%, drawdown crosses 15%, or you stack 5 consecutive losses, the system hands the LLM a structured performance report and a tightly scoped tool palette:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;adjust_conviction(symbol, threshold, reason)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;adjust_position_size(symbol, multiplier, reason)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;adjust_leverage(new_leverage, reason)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;disable_shorts(symbol, reason)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;disable_asset(symbol, duration_hours, reason)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;no_action(reason)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
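
&lt;p&gt;The trigger side of that contract is mechanical. The thresholds below are the ones stated above; only the function name is mine:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def should_invoke_llm(win_rate: float, drawdown: float, loss_streak: int) -&amp;gt; bool:
    # Degradation triggers: rolling win rate below 40%, drawdown past 15%,
    # or five consecutive losses.
    return win_rate &amp;lt; 0.40 or drawdown &amp;gt; 0.15 or loss_streak &amp;gt;= 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;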

&lt;p&gt;Example LLM response when SOL is sitting at a 25% win rate, a 22% drawdown, and a 7-loss streak:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"adjust_position_size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"params"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"symbol"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SOL"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"multiplier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"adjust_conviction"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"params"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"symbol"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SOL"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"new_threshold"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"disable_shorts"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"params"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"symbol"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SOL"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"send_alert"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"params"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"severity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"critical"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the right shape for LLMs in financial systems: &lt;strong&gt;bounded actions, explicit triggers, no inference loops touching live capital.&lt;/strong&gt; The model doesn't have to be smart; it has to be defensive. A 1.5B-parameter local model is more than enough when the action space is six tools wide.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Took Away
&lt;/h2&gt;

&lt;p&gt;Three things I'm stealing:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Protocol-based exchange abstraction.&lt;/strong&gt; No more &lt;code&gt;if exchange ==&lt;/code&gt; chains. Define the contract once, swap implementations behind it. This generalizes way past trading.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Self-retiring signals with quadratic edge weighting.&lt;/strong&gt; Static feature weights are tech debt the moment you ship them. Make signal decay a first-class concept and let the data prune your own model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LLM-as-circuit-breaker, not LLM-as-strategist.&lt;/strong&gt; The hype-cycle take is "use the LLM to pick trades." The mature take is "use the LLM to recognize when your quant system is dying and apply targeted, reversible, well-typed interventions." The hype-cycle take blows up your account. The mature take saves it.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What I'd build next: an offline evaluation harness for the LLM's tool-call decisions. Right now the LLM's interventions only get evaluated by their downstream P&amp;amp;L impact, which is noisy and slow. A counterfactual replay framework — "what would have happened if the LLM had done nothing, or chosen a different tool?" — would let you tune the trigger thresholds and the prompt without burning real capital. That's where I'd put the next two weeks of engineering time.&lt;/p&gt;

&lt;p&gt;Trading bots are not magic. They're software systems that have to survive volatility, exchange flakiness, model decay, and operator panic. The systems that survive are the ones that take all four threats seriously at the architecture level — not the ones with the prettiest backtest curve.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>automation</category>
      <category>cryptocurrency</category>
      <category>softwareengineering</category>
    </item>
  </channel>
</rss>
