<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: akirayuusha</title>
    <description>The latest articles on DEV Community by akirayuusha (@akirayuusha).</description>
    <link>https://dev.to/akirayuusha</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3982044%2Fac27c61c-7c71-4526-853d-040b9c03e256.jpg</url>
      <title>DEV Community: akirayuusha</title>
      <link>https://dev.to/akirayuusha</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/akirayuusha"/>
    <language>en</language>
    <item>
      <title>Launching BabyChain: durable image and video model chains on AWS Aurora and Vercel</title>
      <dc:creator>akirayuusha</dc:creator>
      <pubDate>Sat, 13 Jun 2026 01:34:00 +0000</pubDate>
      <link>https://dev.to/akirayuusha/launching-babychain-durable-image-and-video-model-chains-on-aws-aurora-and-vercel-1p5h</link>
      <guid>https://dev.to/akirayuusha/launching-babychain-durable-image-and-video-model-chains-on-aws-aurora-and-vercel-1p5h</guid>
      <description>&lt;p&gt;Today we are launching &lt;strong&gt;BabyChain&lt;/strong&gt;: a self-hosted canvas studio and durable chain API for image and video model workflows.&lt;/p&gt;

&lt;p&gt;The short version is this: BabyChain lets you design a ComfyUI-style media chain on a canvas, then call that same chain from product code as &lt;code&gt;POST /api/v1/chains/runs&lt;/code&gt;. Every step executes through provider APIs with server-side credentials, every state transition persists to &lt;a href="https://aws.amazon.com/rds/aurora" rel="noopener noreferrer"&gt;AWS Aurora&lt;/a&gt;, and &lt;a href="https://vercel.com" rel="noopener noreferrer"&gt;Vercel&lt;/a&gt; functions stay stateless.&lt;/p&gt;

&lt;p&gt;The product has one invariant: &lt;strong&gt;every output becomes the next input.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://babychain.babysea.live" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/babysea-community/babychain" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you run model chains on a local GPU workbench today, BabyChain is the version of that workflow you can deploy, call from a backend, and keep forever. The canvas is not a demo shell. It is a visual editor on top of the same durable contract your application calls.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why we built it: canvas workflows are not production infrastructure
&lt;/h2&gt;

&lt;p&gt;Real generative media work is rarely one model call. It is an image model feeding an image-to-video model, often with a refine step in the middle and a video-to-video step at the end. Canvas tools made that composable, but most of them are creative workbenches. The workflow lives inside a UI, expects a local GPU or a managed model runtime, and does not naturally become an authenticated API that another product can call.&lt;/p&gt;

&lt;p&gt;We kept hitting the same wall in our own projects: the moment a visual workflow needed to become &lt;em&gt;product infrastructure&lt;/em&gt; (authenticated, retryable, callable from a queue, safe to expose to another backend), we had to rewrite it as glue code.&lt;/p&gt;

&lt;p&gt;BabyChain's design goal was to make that distance zero.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Design the chain on a canvas. Call the same chain from your backend. Those should not be two different systems.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What ships today
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-flow canvas studio.&lt;/strong&gt; Many independent image → video flows side by side on one permanent workspace, with autosave and a library of saved chains plus their results.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;57 image and video models&lt;/strong&gt; across Black Forest Labs, Runway, Alibaba Cloud DashScope, Google, OpenAI, and BytePlus, for &lt;strong&gt;78,948 valid chain combinations&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema-true node cards.&lt;/strong&gt; Every card's fields, enum options, ranges, and defaults are generated from that model's schema, so the UI cannot offer a parameter the API would reject.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BYOK execution.&lt;/strong&gt; Provider keys live in the server environment, never in the browser and never in caller requests. Caller apps authenticate with BabyChain API keys.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Durable runs on Aurora.&lt;/strong&gt; Ordered steps, provider request ids, generation ids, outputs, failures, callbacks, and a timeline are all persisted. One signed callback delivers the terminal run when a webhook URL is supplied.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Self-host it in an afternoon
&lt;/h2&gt;

&lt;p&gt;BabyChain is licensed under Apache-2.0 and built for you to deploy, not for us to host:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/babysea-community/babychain.git
&lt;span class="nb"&gt;cd &lt;/span&gt;babychain &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; pnpm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--frozen-lockfile&lt;/span&gt;
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env.local   &lt;span class="c"&gt;# DATABASE_URL, owner login, provider keys&lt;/span&gt;
pnpm run aurora:migrate      &lt;span class="c"&gt;# applies the schema, idempotent&lt;/span&gt;
pnpm dev                     &lt;span class="c"&gt;# or use the one-click Vercel deploy button&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For production, BabyChain is designed around AWS Aurora. For local development, it can also point at a local PostgreSQL database. The README walks through creating the Aurora cluster, setting &lt;code&gt;DATABASE_URL&lt;/code&gt;, applying the schema, and deploying the app on Vercel.&lt;/p&gt;

&lt;p&gt;The rest of this post is the architecture: how a canvas workflow becomes durable infrastructure on Aurora and Vercel.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture: Aurora remembers, Vercel advances
&lt;/h2&gt;

&lt;p&gt;The naive way to run a multi-model chain on serverless is to hold the whole chain in one function invocation. That dies quickly. A single image → video → video-modify workflow can spend several minutes inside provider queues, and a stateless function should not be asked to babysit that entire wait.&lt;/p&gt;

&lt;p&gt;The durable way is to make the database the only place workflow state lives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Aurora owns every fact about a run.
Vercel functions are stateless workers.
Each invocation advances a run by at most one step.
Any instance can pick up any run mid-chain.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a caller creates a run, BabyChain persists the run and its ordered steps to Aurora, may opportunistically advance the first ready step, and returns without waiting for the full chain. Each subsequent poll of &lt;code&gt;GET /api/v1/chains/get/{runId}&lt;/code&gt;, or a cron sweep, loads the run from Aurora, advances it by at most one provider action (a submit or a poll), persists the result, and returns. Long chains survive serverless limits because &lt;strong&gt;no instance ever needs to outlive a step&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Aurora owns every fact about a run, so a Vercel function is allowed to disappear at any moment. The chain is not.&lt;/p&gt;
&lt;/blockquote&gt;
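&lt;p&gt;A minimal sketch of that one-action-per-invocation loop, written in Python for brevity and not taken from BabyChain's code. The &lt;code&gt;Run&lt;/code&gt; and &lt;code&gt;Step&lt;/code&gt; objects stand in for rows loaded from and persisted to Aurora, and &lt;code&gt;FakeProvider&lt;/code&gt;, the status names, and &lt;code&gt;advance_once&lt;/code&gt; are all hypothetical:&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    name: str
    status: str = "queued"  # assumed lifecycle: queued, submitted, succeeded
    output: Optional[str] = None


@dataclass
class Run:
    steps: list
    status: str = "running"


class FakeProvider:
    """Stand-in for a provider API: each job finishes on its second poll."""

    def __init__(self):
        self.polls = {}

    def submit(self, step_name, prior_output):
        # A real adapter would send prior_output as this step's input.
        self.polls[step_name] = 0

    def poll(self, step_name):
        self.polls[step_name] += 1
        done = self.polls[step_name] >= 2
        return f"{step_name}-output" if done else None


def advance_once(run, provider):
    """Advance the run by at most one provider action, then return."""
    prior_output = None
    for step in run.steps:
        if step.status == "succeeded":
            prior_output = step.output  # every output becomes the next input
            continue
        if step.status == "queued":
            provider.submit(step.name, prior_output)
            step.status = "submitted"
        else:  # already submitted: poll exactly once
            result = provider.poll(step.name)
            if result is not None:
                step.output = result
                step.status = "succeeded"
        break  # stop after one action; the next invocation continues
    if all(s.status == "succeeded" for s in run.steps):
        run.status = "succeeded"
    return run
```

&lt;p&gt;Because each call does a bounded amount of work and all progress lives on the run and its steps, any invocation can pick up where the previous one stopped.&lt;/p&gt;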

&lt;p&gt;Aurora Serverless v2 fits the workload: bursty, low-idle, spiky on demo days. When a cluster is configured to pause, the connection pool's 30-second connection timeout absorbs the wake-up. For Aurora/RDS endpoints, deployers keep &lt;code&gt;?sslmode=require&lt;/code&gt; in &lt;code&gt;DATABASE_URL&lt;/code&gt;; BabyChain strips that driver-level query parameter and connects with TLS, including the RDS CA behavior expected by the Node.js &lt;code&gt;pg&lt;/code&gt; client.&lt;/p&gt;
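&lt;p&gt;The &lt;code&gt;sslmode&lt;/code&gt; handling can be pictured roughly as follows. This is a Python sketch of the idea only; BabyChain itself runs on Node.js, and the function name and the rule that any value other than &lt;code&gt;disable&lt;/code&gt; means "use TLS" are assumptions:&lt;/p&gt;

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def split_sslmode(database_url):
    """Return (url_without_sslmode, tls_required)."""
    parts = urlsplit(database_url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    modes = [value for key, value in pairs if key == "sslmode"]
    kept = [(key, value) for key, value in pairs if key != "sslmode"]
    # Assumption: any sslmode other than "disable" asks for a TLS connection.
    tls_required = any(mode != "disable" for mode in modes)
    cleaned = urlunsplit(parts._replace(query=urlencode(kept)))
    return cleaned, tls_required
```

&lt;p&gt;The caller would pass the cleaned URL to the database driver and set the TLS options explicitly based on the returned flag.&lt;/p&gt;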

&lt;h2&gt;
  
  
  The schema: seven tables, one source of truth
&lt;/h2&gt;

&lt;p&gt;Everything durable lives in one private schema, &lt;code&gt;babychain_private&lt;/code&gt;, applied idempotently by &lt;code&gt;pnpm run aurora:migrate&lt;/code&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Table&lt;/th&gt;
&lt;th&gt;Owns&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;chain_run&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Run lifecycle: status, input, output, error code/message, idempotency key hash, callback intent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;chain_step&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Ordered steps: per-step params, provider request ids, generation ids, output files, failure details&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;canvas&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Saved node graphs as &lt;code&gt;jsonb&lt;/code&gt;, owner-scoped, with a touch trigger and a &lt;code&gt;(owner_email, updated_at desc)&lt;/code&gt; index&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;api_key&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Hashed caller keys with scopes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;audit_event&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Append-only audit trail&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;callback_delivery&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Final signed-webhook delivery state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;babysea_webhook_delivery&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Inbound provider webhook bookkeeping&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two design details earned their keep:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The &lt;code&gt;input_order&lt;/code&gt; sidecar.&lt;/strong&gt; PostgreSQL &lt;code&gt;jsonb&lt;/code&gt; does not preserve key order, but the public run resource echoes the caller's input back in API responses, and key order is part of how people read their own request. Run creation stores a small &lt;code&gt;jsonb&lt;/code&gt; array of the caller's original key order alongside the canonicalized input, and the presenter re-applies it on the way out. It is a small detail, but it matters when an API response is also a debugging surface.&lt;/p&gt;
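&lt;p&gt;As an illustration of the sidecar idea, here is a small Python sketch. Sorting the keys simulates the reordering a &lt;code&gt;jsonb&lt;/code&gt; round trip can introduce; the function names are hypothetical and only top-level keys are handled:&lt;/p&gt;

```python
import json


def store_input(caller_input):
    """Return the two values a run row would keep: canonical input and key order."""
    # Round-trip with sorted keys to mimic storage that does not keep key order.
    canonical = json.loads(json.dumps(caller_input, sort_keys=True))
    input_order = list(caller_input.keys())
    return canonical, input_order


def present_input(canonical, input_order):
    """Rebuild the caller's original key order for the API response."""
    ordered = {key: canonical[key] for key in input_order if key in canonical}
    # Any key missing from the sidecar is appended at the end, never dropped.
    for key in canonical:
        ordered.setdefault(key, canonical[key])
    return ordered
```

&lt;p&gt;A request sent as &lt;code&gt;prompt&lt;/code&gt;, &lt;code&gt;model&lt;/code&gt;, &lt;code&gt;aspect_ratio&lt;/code&gt; is stored in canonical order but comes back in the order the caller wrote it.&lt;/p&gt;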

&lt;p&gt;&lt;strong&gt;Guarded state transitions.&lt;/strong&gt; Steps only leave the &lt;code&gt;queued&lt;/code&gt; state through updates with a &lt;code&gt;where status = 'queued'&lt;/code&gt; guard. That single predicate makes the fail-fast path race-safe: when a step fails, the runner marks the run failed and sweeps every still-queued downstream step to &lt;code&gt;skipped&lt;/code&gt; (their input can never arrive) without ever clobbering a step that a concurrent invocation already started.&lt;/p&gt;
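&lt;p&gt;The guard is easy to demonstrate with an in-memory SQLite table standing in for Aurora PostgreSQL. The table name &lt;code&gt;chain_step&lt;/code&gt; and the &lt;code&gt;queued&lt;/code&gt; and &lt;code&gt;skipped&lt;/code&gt; states come from the description above; the column layout, the &lt;code&gt;running&lt;/code&gt; and &lt;code&gt;failed&lt;/code&gt; state names, and both helper functions are assumptions:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table chain_step (id integer primary key, status text)")
conn.executemany(
    "insert into chain_step (id, status) values (?, ?)",
    [(1, "failed"), (2, "running"), (3, "queued"), (4, "queued")],
)


def claim_step(step_id):
    """Move a step out of 'queued'; return True only for the caller that wins."""
    cur = conn.execute(
        "update chain_step set status = 'running' "
        "where id = ? and status = 'queued'",
        (step_id,),
    )
    return cur.rowcount == 1


def skip_queued_steps():
    """Fail-fast sweep: only rows still in 'queued' are touched."""
    cur = conn.execute(
        "update chain_step set status = 'skipped' where status = 'queued'"
    )
    return cur.rowcount
```

&lt;p&gt;A second &lt;code&gt;claim_step&lt;/code&gt; call for the same step updates zero rows, and the sweep leaves every step that already started untouched.&lt;/p&gt;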

&lt;h2&gt;
  
  
  Idempotency end to end
&lt;/h2&gt;

&lt;p&gt;Generative media is expensive enough that retries must not multiply spend. BabyChain makes idempotency a property of the whole pipeline, not one endpoint:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Run creation&lt;/strong&gt; hashes the caller's &lt;code&gt;Idempotency-Key&lt;/code&gt; per principal and stores it on &lt;code&gt;chain_run&lt;/code&gt; with a unique constraint. A retried create replays the stored run: same id, same response, zero new provider calls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step submission&lt;/strong&gt; derives a deterministic idempotency key per run, step, and chain version. If a function instance dies between submitting to a provider and persisting the result, the retry resubmits with the same key, and providers that honor idempotency can deduplicate server-side.&lt;/li&gt;
&lt;/ol&gt;
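&lt;p&gt;Both layers can be sketched in a few lines of Python. The use of SHA-256, the exact key material, and the in-memory &lt;code&gt;RunStore&lt;/code&gt; (a stand-in for &lt;code&gt;chain_run&lt;/code&gt; and its unique constraint) are assumptions for illustration, not BabyChain's actual implementation:&lt;/p&gt;

```python
import hashlib


def hash_idempotency_key(principal_id, idempotency_key):
    """Hash the caller's key scoped to one principal, so the raw key is not stored."""
    material = f"{principal_id}\n{idempotency_key}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


def step_idempotency_key(run_id, step_index, chain_version):
    """Derive the same key every time for the same run, step, and chain version."""
    material = f"{run_id}:{step_index}:{chain_version}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


class RunStore:
    """In-memory stand-in for a run table with a unique key-hash constraint."""

    def __init__(self):
        self.by_key_hash = {}

    def create_run(self, principal_id, idempotency_key, run_input):
        key_hash = hash_idempotency_key(principal_id, idempotency_key)
        existing = self.by_key_hash.get(key_hash)
        if existing is not None:
            return existing, False  # replay: same run, no new provider calls
        run = {"id": f"run_{len(self.by_key_hash) + 1}", "input": run_input}
        self.by_key_hash[key_hash] = run
        return run, True
```

&lt;p&gt;A retried create with the same principal and key returns the stored run, while the same key from a different principal creates a separate run.&lt;/p&gt;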

&lt;p&gt;The same discipline applies on the way out: when a run includes a webhook URL, the terminal callback is claimed on the run row and each signed delivery attempt is recorded in &lt;code&gt;callback_delivery&lt;/code&gt;, so concurrent instances do not both send the same terminal callback.&lt;/p&gt;

&lt;h2&gt;
  
  
  A canvas that cannot lose your work
&lt;/h2&gt;

&lt;p&gt;The studio is a multi-flow &lt;a href="https://reactflow.dev" rel="noopener noreferrer"&gt;React Flow&lt;/a&gt; canvas. Every edit autosaves to the &lt;code&gt;canvas&lt;/code&gt; table in Aurora, and surviving real-world usage took three iterations:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Debounced autosave is a data-loss bug with good intentions: it can drop the last burst of edits before a reload.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;A debounce-based autosave silently dropped the last burst of edits before a reload. We replaced it with a &lt;strong&gt;dirty flag plus a steady flush loop&lt;/strong&gt;, so a flush is never more than about 1.5 seconds behind your latest keystroke.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;&lt;code&gt;sendBeacon&lt;/code&gt; final flush&lt;/strong&gt; covers tab close, reload, navigation, and tab-hide through an owner-authenticated endpoint.&lt;/li&gt;
&lt;li&gt;Hydration runs exactly once per mount, so router refreshes can never reset live canvas state.&lt;/li&gt;
&lt;/ul&gt;
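&lt;p&gt;The dirty-flag pattern from the first two bullets looks roughly like this. The real studio runs in the browser; this Python sketch only models the control flow, and the class and method names are hypothetical:&lt;/p&gt;

```python
class CanvasAutosave:
    """Dirty flag plus a steady flush tick; new edits never postpone a flush."""

    def __init__(self, save):
        self.save = save      # callable that persists one canvas snapshot
        self.latest = None
        self.dirty = False

    def edit(self, snapshot):
        self.latest = snapshot
        self.dirty = True     # unlike a debounce, no timer is reset here

    def tick(self):
        """Called on a fixed interval, for example every 1.5 seconds."""
        if not self.dirty:
            return False
        self.dirty = False
        self.save(self.latest)
        return True

    def final_flush(self):
        """Called on tab close or hide; sends whatever is still unsaved."""
        return self.tick()
```

&lt;p&gt;A burst of edits costs one save on the next tick, and an edit made just before the tab closes is still sent by &lt;code&gt;final_flush&lt;/code&gt;. A production version would also set the flag again if the save fails.&lt;/p&gt;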

&lt;p&gt;The result is the demo I like most: edit a prompt, log out, log back in on another machine. The edit is there, served from Aurora. Close the tab mid-run, reopen, and the run resumes, because run progress was never in the browser to begin with.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the real pain lived: provider normalization
&lt;/h2&gt;

&lt;p&gt;Six providers (Black Forest Labs, Runway, Alibaba Cloud DashScope, Google Gemini API, OpenAI, BytePlus ARK), 57 supported models, 78,948 valid chain combinations. No two providers agree on what "give me a 16:9 image" means.&lt;/p&gt;

&lt;p&gt;The deepest rabbit hole was Alibaba DashScope output sizes. Each model family has different rules, documented nowhere and discovered only by probing the live API:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;qwen-image&lt;/code&gt; / &lt;code&gt;qwen-image-plus&lt;/code&gt; accept &lt;strong&gt;exactly five sizes&lt;/strong&gt;. Anything else is a 400.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;qwen-image-max&lt;/code&gt; and &lt;code&gt;z-image-turbo&lt;/code&gt; cap each dimension at 2048.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;wan2.6&lt;/code&gt; / &lt;code&gt;wan2.7&lt;/code&gt; families enforce per-model &lt;strong&gt;pixel budgets&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Provider docs are a starting point. The live API is the truth.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So the adapter computes sizes per model. For budgeted models, a requested ratio &lt;code&gt;w:h&lt;/code&gt; is fitted into a pixel budget &lt;code&gt;P_max&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;scale = sqrt(P_max / (w * h))
W = floor(scale * w / 16) * 16
H = floor(scale * h / 16) * 16
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;…and snapped-size models get a lookup table instead, because they allow no freedom at all. With both paths in place, the adapter never constructs a size the provider would reject.&lt;/p&gt;
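&lt;p&gt;Here is the sizing rule as a runnable Python sketch. The formula follows the one above; the function names, the example budget, and the entries in &lt;code&gt;SNAPPED_SIZES&lt;/code&gt; are placeholders, not real provider limits:&lt;/p&gt;

```python
import math


def fit_to_pixel_budget(ratio_w, ratio_h, p_max, multiple=16):
    """Fit ratio_w:ratio_h into p_max pixels, rounding each side down to a multiple."""
    scale = math.sqrt(p_max / (ratio_w * ratio_h))
    width = math.floor(scale * ratio_w / multiple) * multiple
    height = math.floor(scale * ratio_h / multiple) * multiple
    return width, height


# Placeholder table for a snapped-size model; the real allowed sizes come from
# probing the provider and are not reproduced here.
SNAPPED_SIZES = {"1:1": (1024, 1024), "16:9": (1280, 720)}


def snapped_size(ratio):
    """Return the fixed size for a ratio, or raise before any request is sent."""
    if ratio not in SNAPPED_SIZES:
        raise ValueError(f"unsupported ratio for this model: {ratio}")
    return SNAPPED_SIZES[ratio]
```

&lt;p&gt;With an assumed budget of 1,000,000 pixels, a 1:1 request scales to 1000 per side and rounds down to 992 × 992. Rounding both sides down means the product cannot exceed the budget, at the cost of a slightly inexact ratio.&lt;/p&gt;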

&lt;p&gt;The same empirical attitude shaped everything at the boundary: Runway's per-endpoint pixel ratios, OpenAI's permanent quota 429s masquerading as transient rate limits, and BFL output URLs that expire after ~10 minutes (the UI shows honest loading and expiry states instead of leaking alt text).&lt;/p&gt;

&lt;p&gt;One structural decision keeps this manageable: the canvas node cards are &lt;strong&gt;generated from each model's schema&lt;/strong&gt; (fields, enum options, ranges, defaults). The UI cannot offer a parameter the API would reject, because both are projections of the same source of truth.&lt;/p&gt;
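&lt;p&gt;One way to picture "both are projections of the same source of truth" is a single schema object that feeds the card renderer and the validator alike. The schema shape, field names, and both functions below are hypothetical:&lt;/p&gt;

```python
# Hypothetical schema for one model; real schemas are defined per model.
MODEL_SCHEMA = {
    "aspect_ratio": {"type": "enum", "options": ["1:1", "16:9"], "default": "1:1"},
    "duration": {"type": "int", "min": 2, "max": 10, "default": 5},
}


def card_fields(schema):
    """What a node card would render: one field per schema entry."""
    return [{"name": name, **spec} for name, spec in schema.items()]


def validate_params(schema, params):
    """What the API would enforce, driven by the same schema as the card."""
    unknown = [name for name in params if name not in schema]
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")
    resolved = {}
    for name, spec in schema.items():
        value = params.get(name, spec["default"])
        if spec["type"] == "enum" and value not in spec["options"]:
            raise ValueError(f"{name} is not an allowed option: {value}")
        if spec["type"] == "int" and (value > spec["max"] or spec["min"] > value):
            raise ValueError(f"{name} is out of range: {value}")
        resolved[name] = value
    return resolved
```

&lt;p&gt;Adding an option to the schema updates the card and the validator at the same time, so the two cannot drift apart.&lt;/p&gt;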

&lt;h2&gt;
  
  
  What keeps it honest
&lt;/h2&gt;

&lt;p&gt;BabyChain is built around runtime invariants instead of optimistic workflows. A chain should be able to fail cleanly, resume after an interrupted function, reject invalid model roles and normalized inputs before dispatch, and preserve canvas state even if the browser disappears mid-edit.&lt;/p&gt;

&lt;p&gt;The runtime behavior we validated end to end:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Step fails             -&amp;gt; run goes terminal, downstream steps skipped, caller sees the provider's real error
Function instance dies -&amp;gt; next poll resumes the run from Aurora, idempotent resubmit
Client retries create  -&amp;gt; same run replayed, zero duplicate spend
Tab closes mid-edit    -&amp;gt; sendBeacon flush, canvas intact after re-login
Aurora wake-up         -&amp;gt; 30s connection budget absorbs it when pause is enabled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The current project gate is 237 tests plus typecheck, lint, and production build. The tests cover the runner, provider adapters, templates, API behavior, migrations, idempotency errors, callback behavior, and the schema rules that keep the canvas and API aligned.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would do next
&lt;/h2&gt;

&lt;p&gt;BabyChain is already usable as a deployable starter, but the next layer is about making runs cheaper to inspect, easier to share, and safer to operate for teams:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Output archival.&lt;/strong&gt; Copy provider outputs to S3 before short-lived URLs expire, especially for providers whose generated URLs are only useful for minutes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Branching chains.&lt;/strong&gt; The canvas already runs flows side by side; the runner should support fan-out inside one chain, such as one image feeding multiple video treatments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Team workspaces.&lt;/strong&gt; Add multi-user accounts with per-key scopes and quotas on top of the existing &lt;code&gt;api_key&lt;/code&gt; model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run economics.&lt;/strong&gt; Surface per-run cost estimates and provider spend from the data Aurora already stores.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;Statelessness is a feature you design for, not a constraint you fight. Once every fact about a run lives in Aurora (runs, steps, provider ids, outputs, failures, callbacks, canvases, audit), serverless time limits, cold starts, and instance churn stop being the center of the system. Vercel gives the control plane instant deployment; Aurora gives it durable memory.&lt;/p&gt;

&lt;p&gt;Design on the canvas. Ship the same contract as an API. Let the database remember everything.&lt;/p&gt;

&lt;p&gt;Creators and developers: deploy it, chain your own models in your own cloud, and tell us what you automate first.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Try it: &lt;a href="https://babychain.babysea.live" rel="noopener noreferrer"&gt;https://babychain.babysea.live&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Deploy it: &lt;a href="https://github.com/babysea-community/babychain" rel="noopener noreferrer"&gt;https://github.com/babysea-community/babychain&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Hackathon note
&lt;/h2&gt;

&lt;p&gt;BabyChain is our entry to the &lt;strong&gt;H0: Hack the Zero Stack with Vercel v0 &amp;amp; AWS Databases&lt;/strong&gt; hackathon. This post was created for the purposes of entering that hackathon. #H0Hackathon&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
