If you have built anything with long-running stateful workflows — loan approvals, order processing, subscription lifecycles, insurance claims, onboarding funnels — you have probably hit a wall that nobody talks about cleanly.
You need to change the workflow. But you already have thousands of instances running.
The problem nobody has a clean answer to
The standard options are all painful.
Wait for instances to drain naturally. Fine if your workflows complete in minutes. Useless if they run for weeks or months waiting for human approval, document submission, or payment settlement.
Write a migration script. You query your database, move rows between tables, pray nothing is mid-transition, and hope you did not accidentally re-trigger a side effect for 40,000 customers.
Keep old code running forever alongside new code. Now you are maintaining two versions of your business logic indefinitely, and the operational complexity compounds with every release.
Temporal's approach: version markers in your workflow code. This works, but every change that alters how running workflows execute needs getVersion() checks threaded through your workflow function, and a non-determinism error on a long-running production workflow is a genuine incident. We have seen threads from teams whose change, believed to be backwards-compatible, broke in rare production scenarios after deployment.
None of these answers are wrong exactly. They are just the best available options in a space where the fundamental problem — migrating running stateful instances to a new version of their logic — has never been solved cleanly.
What we built
StateKeep is a statechart hosting platform. You upload an XState-compatible machine definition, spawn actors against it, and send events. StateKeep handles persistence, event history, encryption at rest, and version migration.
The part that is different: when you deploy a new version, whether each running actor migrates is decided by its event history fingerprint, not its current state.
Every actor carries a compact hash of every event type it has processed, in order. When you deploy a new version with a historyPath, the platform checks each actor's fingerprint against the path you declared. Actors whose history contains that path migrate. Actors whose history does not contain it stay on the current version.
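A minimal sketch of how such a fingerprint could work, in TypeScript. It assumes a chained SHA-256 over event type names, and it assumes that "contains that path" means the declared historyPath is a prefix of the actor's history (the audit trail compares against a "registered prefix hash", which suggests this reading). StateKeep's real hash scheme and matching rule may differ; `chainHash`, `registerPath`, `matchesPath`, and the `UPLOAD_ID` and `WAIVE_FEE` events are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical chained fingerprint: each step hashes the previous digest
// plus the next event type, so the same events in a different order
// produce a different digest. Not StateKeep's documented scheme.
function chainHash(eventTypes: readonly string[]): string {
  let digest = ""; // the empty history hashes to the empty string
  for (const type of eventTypes) {
    digest = createHash("sha256").update(digest).update(type).digest("hex");
  }
  return digest;
}

// What the platform might register for a deployment's historyPath.
interface RegisteredPath {
  length: number; // number of events in the declared path
  hash: string;   // chainHash of the declared path
}

function registerPath(path: readonly string[]): RegisteredPath {
  return { length: path.length, hash: chainHash(path) };
}

// Assumption: prefix matching. Eligible when the actor's first `length`
// events hash to the registered prefix hash. An empty path registers "",
// which every history matches (the wildcard case).
function matchesPath(history: readonly string[], reg: RegisteredPath): boolean {
  if (history.length < reg.length) return false;
  return chainHash(history.slice(0, reg.length)) === reg.hash;
}

const feePath = registerPath(["START_APPLICATION", "SUBMIT_INFO", "PAY_FEE"]);
matchesPath(["START_APPLICATION", "SUBMIT_INFO", "PAY_FEE", "UPLOAD_ID"], feePath); // true
matchesPath(["START_APPLICATION", "SUBMIT_INFO", "WAIVE_FEE"], feePath);            // false
```

Recomputing the chain from raw history keeps the sketch short; a real store would more likely cache intermediate digests so it does not re-hash full histories on every deployment.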
The consequence: two actors in the same state can receive different migration decisions in the same deployment.
The concrete example
A loan application workflow. Two customers, Alice and Bob. Both are currently in awaiting_documents.
Alice paid the verification fee to get there. Bob waived it.
You deploy a new version that adds an income verification step — but only for customers who paid the fee, because that is the regulatory requirement for that path.
You declare:
```json
{
  "id": "loan-v2",
  "parentId": "loan-v1",
  "historyPath": ["START_APPLICATION", "SUBMIT_INFO", "PAY_FEE"],
  "definition": { ... }
}
```
The platform evaluates every actor. Alice's history contains that path, so she migrates to loan-v2, landing in the new income_verify state. Bob's history does not contain PAY_FEE, so he stays on loan-v1 and remains in awaiting_documents as before.
Both actors keep working. Neither restarts. Neither loses context. No migration script was written. No side effects were re-fired. The decision was made per-actor, based on history, in under a second.
What this looks like in practice
Deploy a new version targeting a specific path:
```typescript
// Uses top-level await, so run this as an ES module.
import { createClient } from '@statekeep/sdk';

const sk = createClient({
  baseUrl: 'https://your-instance.com',
  apiKey: 'sk_...'
});

// loanV2Definition and orderV2Definition are your XState-compatible machine definitions.

// Deploy v2: only actors who paid the fee are eligible
await sk.deploy('loan-v2', loanV2Definition, {
  parentId: 'loan-v1',
  historyPath: ['START_APPLICATION', 'SUBMIT_INFO', 'PAY_FEE'],
});

// Deploy a wildcard version: all actors migrate
await sk.deploy('order-v2', orderV2Definition, {
  parentId: 'order-v1',
  // no historyPath = all actors eligible
});
```
Preview what will happen before committing:
```typescript
const preview = await sk.preview('loan-v2', loanV2Definition, {
  parentId: 'loan-v1',
  historyPath: ['START_APPLICATION', 'SUBMIT_INFO', 'PAY_FEE'],
});

console.log(preview.migration.wouldMigrate.length); // 1,203 actors
console.log(preview.migration.wouldStay.length);    // 847 actors
```
The preview calls the exact same evaluation function as the live deployment. What you see is what will happen.
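If preview and deploy share one pure evaluation function, they can only disagree when the actors or their histories change in between. A sketch under these assumptions: only actors on the parent version are candidates, an omitted historyPath makes every candidate eligible, and eligibility is a prefix check on event names (the platform compares hashes; names are used here for readability). `Actor`, `Deployment`, `evaluate`, `preview`, `deploy`, and the `WAIVE_FEE` event in Bob's history are hypothetical, not the `@statekeep/sdk` API.

```typescript
// Hypothetical shapes; StateKeep's real types are not shown in this post.
interface Actor {
  id: string;
  version: string;   // machine version the actor currently runs
  history: string[]; // event types processed, in order
}

interface Deployment {
  id: string;
  parentId: string;
  historyPath?: string[]; // omitted = wildcard, every candidate eligible
}

interface Evaluation {
  wouldMigrate: Actor[];
  wouldStay: Actor[];
}

// Assumed matching rule: `path` is a prefix of `history`.
function hasPrefix(history: readonly string[], path: readonly string[]): boolean {
  return path.length <= history.length && path.every((type, i) => history[i] === type);
}

// One pure function used by both preview and deploy.
function evaluate(actors: readonly Actor[], d: Deployment): Evaluation {
  const result: Evaluation = { wouldMigrate: [], wouldStay: [] };
  for (const actor of actors) {
    if (actor.version !== d.parentId) continue; // not a candidate for this deployment
    const eligible = d.historyPath === undefined || hasPrefix(actor.history, d.historyPath);
    (eligible ? result.wouldMigrate : result.wouldStay).push(actor);
  }
  return result;
}

// Read-only: reports the decision without applying it.
function preview(actors: readonly Actor[], d: Deployment): Evaluation {
  return evaluate(actors, d);
}

// Applies the same decision, moving eligible actors forward.
function deploy(actors: Actor[], d: Deployment): Evaluation {
  const decision = evaluate(actors, d);
  for (const actor of decision.wouldMigrate) actor.version = d.id;
  return decision;
}

const actors: Actor[] = [
  { id: "alice", version: "loan-v1", history: ["START_APPLICATION", "SUBMIT_INFO", "PAY_FEE"] },
  { id: "bob", version: "loan-v1", history: ["START_APPLICATION", "SUBMIT_INFO", "WAIVE_FEE"] },
];
const loanV2: Deployment = {
  id: "loan-v2",
  parentId: "loan-v1",
  historyPath: ["START_APPLICATION", "SUBMIT_INFO", "PAY_FEE"],
};

const p = preview(actors, loanV2);
console.log(p.wouldMigrate.map(a => a.id), p.wouldStay.map(a => a.id)); // [ 'alice' ] [ 'bob' ]
```

The result is a snapshot of the actors at call time: an actor still sitting after SUBMIT_INFO that processes PAY_FEE between the preview and the deployment moves from wouldStay to wouldMigrate, and actors spawned in between will also be evaluated.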
What StateKeep does and does not do
StateKeep is a state tracker, not a side effect executor. It does not run your action handlers or evaluate your guards.
Guards (guard: 'isEligible') are stubbed to false — guarded transitions never fire. Actions (actions: 'sendEmail') are no-ops — state changes but nothing executes. Your backend reads the new stateValue from the event response and handles side effects in its own code.
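The guard and action rules above amount to a very small transition function. This is a hypothetical sketch over a simplified, XState-style config (string targets, named guards, named actions), not StateKeep's interpreter and not the XState API: guarded candidates are skipped as though the guard returned false, and actions are never looked up or run.

```typescript
// Simplified, hypothetical subset of an XState-style machine config.
type Transition = { target: string; guard?: string; actions?: string | string[] };
type MachineConfig = {
  initial: string;
  states: Record<string, { on?: Record<string, Transition | Transition[]> }>;
};

// Returns the next state value only. Guarded candidates are treated as if
// their guard returned false; actions are ignored entirely.
function nextStateValue(config: MachineConfig, current: string, eventType: string): string {
  const candidates = config.states[current]?.on?.[eventType];
  if (candidates === undefined) return current; // event not handled here: stay put
  const list = Array.isArray(candidates) ? candidates : [candidates];
  const taken = list.find(t => t.guard === undefined); // first unguarded candidate wins
  return taken ? taken.target : current;
}

const config: MachineConfig = {
  initial: "review",
  states: {
    review: {
      on: {
        DECIDE: [
          { target: "fast_track", guard: "isEligible" }, // never taken: guard stubbed to false
          { target: "standard" },
        ],
        NOTIFY: { target: "notified", actions: "sendEmail" }, // state changes, no email is sent
      },
    },
    fast_track: {},
    standard: {},
    notified: {},
  },
};

nextStateValue(config, "review", "DECIDE"); // "standard"
nextStateValue(config, "review", "NOTIFY"); // "notified"
```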
This is intentional. It means migration never accidentally re-fires side effects. An actor migrating from v1 to v2 does not trigger emails, charges, or notifications — because StateKeep never ran any of those in the first place.
The supported pattern: model routing decisions as explicit events rather than guards. Your backend evaluates the condition and sends APPROVE_FAST_TRACK or APPROVE_STANDARD. The machine routes deterministically from there. No guards needed.
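A sketch of that pattern, in TypeScript. The eligibility rule (credit score and amount thresholds), the `Application` shape, and the `SendEvent` function type are hypothetical; `SendEvent` stands in for however your code delivers events to StateKeep and is assumed to resolve with the actor's new stateValue.

```typescript
// Hypothetical application record used only for this example.
interface Application {
  id: string;
  creditScore: number;
  requestedAmount: number;
}

type ApprovalEvent = { type: "APPROVE_FAST_TRACK" } | { type: "APPROVE_STANDARD" };

// The condition that would otherwise be a guard runs in backend code.
// The thresholds are illustrative, not a real underwriting rule.
function chooseApprovalEvent(app: Application): ApprovalEvent {
  const fastTrack = app.creditScore >= 720 && app.requestedAmount <= 50_000;
  return fastTrack ? { type: "APPROVE_FAST_TRACK" } : { type: "APPROVE_STANDARD" };
}

// Guard-free machine fragment: each event has exactly one target.
const underwriting = {
  initial: "review",
  states: {
    review: {
      on: {
        APPROVE_FAST_TRACK: { target: "fast_track" },
        APPROVE_STANDARD: { target: "standard_review" },
      },
    },
    fast_track: {},
    standard_review: {},
  },
} as const;

// Stand-in for however you deliver events to StateKeep (assumed to resolve
// with the actor's new stateValue).
type SendEvent = (actorId: string, event: ApprovalEvent) => Promise<{ stateValue: string }>;

// Side effects run here, after the new state is known, never inside the machine.
async function approve(
  app: Application,
  send: SendEvent,
  notify: (message: string) => void,
): Promise<string> {
  const { stateValue } = await send(app.id, chooseApprovalEvent(app));
  if (stateValue === "fast_track") notify(`Application ${app.id} was fast-tracked`);
  return stateValue;
}
```

Because the decision is an ordinary function in your codebase, you can unit test it and change it without touching the deployed machine, and a migration never has to re-evaluate it.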
Rescue deployments
When a buggy version reaches actors before you catch it, you deploy a rescue version targeting only the actors whose history includes the buggy path:
```typescript
await sk.deploy('loan-v2-rescue', fixedDefinition, {
  parentId: 'loan-v2-buggy',
  historyPath: ['START_APPLICATION', 'SUBMIT_INFO', 'PAY_FEE', 'TRIGGER_BUG'],
});
```
Only actors whose history contains the full declared path, through TRIGGER_BUG, migrate to the fix. Everyone else is unaffected. No system-wide freeze. Forward-only. No rollback.
The audit trail
Every routing decision is logged. For every actor evaluated in a deployment, there is a record of: which version it was on, which version it moved to (or why it stayed), its history fingerprint at decision time, and the registered prefix hash it was compared against.
GET /v1/actors/:id/decisions returns the full routing history for a single actor. When a customer asks "why didn't my application get the new income verification step," the answer is in the database, not in a support ticket.
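A sketch of reading that history from a backend or support tool. Only the `GET /v1/actors/:id/decisions` path comes from this post; the Bearer authorization header, the JSON array response, and every field name in `RoutingDecision` are assumptions modeled on the fields listed above. It uses the global `fetch` available in Node 18 and later.

```typescript
// Hypothetical record shape, modeled on the fields the audit trail stores.
interface RoutingDecision {
  deploymentId: string;
  fromVersion: string;
  toVersion: string | null; // null when the actor stayed
  reason?: string;          // why it stayed, if it stayed
  historyFingerprint: string;
  comparedPrefixHash: string;
}

// Fetch one actor's routing history. Auth scheme and response format are assumed.
async function getDecisions(baseUrl: string, apiKey: string, actorId: string): Promise<RoutingDecision[]> {
  const res = await fetch(`${baseUrl}/v1/actors/${encodeURIComponent(actorId)}/decisions`, {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`decisions request failed: ${res.status}`);
  return (await res.json()) as RoutingDecision[];
}

// Turn raw decisions into one readable line per deployment.
function explain(decisions: readonly RoutingDecision[]): string[] {
  return decisions.map(d =>
    d.toVersion === null
      ? `${d.deploymentId}: stayed on ${d.fromVersion} (${d.reason ?? "history did not match"})`
      : `${d.deploymentId}: moved ${d.fromVersion} -> ${d.toVersion}`,
  );
}
```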
Early access
We are at the early access stage. The platform runs on a VPS, with 432 tests passing and the real migration engine deployed.
We are specifically looking for developers who have hit the workflow migration problem in production — people who have written migration scripts they were not happy with, people who have hit non-determinism errors on Temporal after a versioning change, people who have kept old workflow code running forever because they had no other option.
Free access, no strings attached. We want honest feedback from people who understand the problem space. If that is you, reach out at statekeep.support@gmail.com with a sentence about what you are building. We will get you set up.
We are not looking for validation. We are looking for the edge cases we have not thought of yet.