Chaos engineering tools usually target infrastructure: pods, nodes, networks, CPU pressure. But many real failures happen inside the application logic - in retries, compensations, concurrency, or state transitions.
Infrastructure tools can’t simulate a panic inside a function, a 15ms delay at the wrong moment, or an internal invariant break.
ChaosKit solves that by bringing chaos engineering directly into Go code.
GitHub: https://github.com/rom8726/chaoskit
Why Code-Level Chaos Matters
While building a workflow engine, it became clear that most critical failures were logic-level, not cluster-level:
a compensation step panics;
a goroutine leak accumulates over time;
a tiny delay causes a race condition.
ChaosKit is designed to expose these failures early - inside your unit and integration tests.
How ChaosKit Works
ChaosKit provides controlled "chaos points" you can insert into your code:
chaoskit.MaybeDelay(ctx)
chaoskit.MaybeError(ctx)
chaoskit.MaybePanic(ctx)
By default, these are no-ops.
Chaos injectors activate only when:
the binary is built with -tags=chaos, and
you attach a chaos context:
ctx := chaoskit.AttachChaos(context.Background())
This ensures that chaos never triggers accidentally in production.
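The context half of this gate can be illustrated with a small sketch that uses only the standard library. It is not ChaosKit's implementation: attachChaos, maybeError, and errInjected are hypothetical names, the build-tag half is left out, and the injected error is unconditional so that the example stays deterministic.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// chaosKey is an unexported context key type, so other packages cannot collide with it.
type chaosKey struct{}

// errInjected is the fault returned by an active chaos point (hypothetical name).
var errInjected = errors.New("injected chaos error")

// attachChaos marks ctx as chaos-enabled. It is a hypothetical stand-in
// for the AttachChaos call described above, not the library's code.
func attachChaos(ctx context.Context) context.Context {
	return context.WithValue(ctx, chaosKey{}, true)
}

// maybeError is a no-op unless ctx was marked by attachChaos.
// A real injector would fail with some probability; this one always
// fails when enabled, to keep the sketch deterministic.
func maybeError(ctx context.Context) error {
	if enabled, _ := ctx.Value(chaosKey{}).(bool); !enabled {
		return nil
	}
	return errInjected
}

// demo returns what the chaos point yields for a plain context
// and for a chaos-enabled one.
func demo() (plain, chaotic error) {
	base := context.Background()
	return maybeError(base), maybeError(attachChaos(base))
}

func main() {
	plain, chaotic := demo()
	fmt.Println("plain context:", plain)
	fmt.Println("chaos context:", chaotic)
}
```

Because the flag travels in the context, code paths that never receive a chaos-enabled context see only the cheap lookup and an immediate return.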
Injectors, Validators, Scenarios
ChaosKit consists of three core building blocks:
Injectors
Simulate failures:
delays
random errors
panic injection
ToxiProxy-based network faults
custom logic
Example:
Inject("latency", injectors.RandomDelay(5*time.Millisecond, 20*time.Millisecond))
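To make the idea of a bounded random delay concrete, here is a rough sketch of what such an injector might do internally. It is an assumption-laden illustration, not the code behind injectors.RandomDelay: randomDelay, sleepCtx, and samplesInRange are hypothetical helpers built only on the standard library.

```go
package main

import (
	"context"
	"fmt"
	"math/rand"
	"time"
)

// randomDelay picks a duration uniformly from [lo, hi].
// It returns lo when the range is empty or inverted.
func randomDelay(lo, hi time.Duration) time.Duration {
	if hi <= lo {
		return lo
	}
	return lo + time.Duration(rand.Int63n(int64(hi-lo)+1))
}

// sleepCtx waits for d, but returns early with ctx.Err() if ctx is
// cancelled, so an injected delay never outlives the operation it slows down.
func sleepCtx(ctx context.Context, d time.Duration) error {
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case <-timer.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// samplesInRange draws n delays and reports whether all of them fall in [lo, hi].
func samplesInRange(n int, lo, hi time.Duration) bool {
	for i := 0; i < n; i++ {
		if d := randomDelay(lo, hi); d < lo || d > hi {
			return false
		}
	}
	return true
}

func main() {
	d := randomDelay(5*time.Millisecond, 20*time.Millisecond)
	if err := sleepCtx(context.Background(), d); err != nil {
		fmt.Println("delay interrupted:", err)
		return
	}
	fmt.Println("slept for", d)
}
```

Respecting context cancellation is a design choice of this sketch: it keeps an injected delay from hiding a timeout bug in the code under test.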
Validators
Check invariants after each step:
goroutine count
recursion depth
no infinite loops
custom validation
Example:
Assert("goroutines", validators.GoroutineLimit(100))
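A goroutine-count invariant of this kind can be approximated with runtime.NumGoroutine from the standard library. The sketch below is only an illustration of the technique; goroutineLimit here is a hypothetical local function, and nothing is implied about how validators.GoroutineLimit is actually written.

```go
package main

import (
	"fmt"
	"runtime"
)

// goroutineLimit returns a check that fails when more than max
// goroutines are alive at the moment the check runs.
func goroutineLimit(max int) func() error {
	return func() error {
		if n := runtime.NumGoroutine(); n > max {
			return fmt.Errorf("goroutine count %d exceeds limit %d", n, max)
		}
		return nil
	}
}

func main() {
	check := goroutineLimit(10)
	fmt.Println("before leak:", check())

	// Simulate a leak: these goroutines block until the channel is closed.
	block := make(chan struct{})
	for i := 0; i < 50; i++ {
		go func() { <-block }()
	}
	fmt.Println("after leak:", check())

	close(block) // release the blocked goroutines
}
```

Running a check like this after every step, rather than once at the end, narrows down which step introduced the leak.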
Scenarios
Define steps, injectors, validators, and repetition:
scenario := chaoskit.NewScenario("workflow").
	WithTarget(engine).
	Step("run", ExecuteWorkflow).
	Inject("panic", injectors.PanicWithProbability(0.15)).
	Assert("limit", validators.GoroutineLimit(100)).
	Repeat(50).
	Build()
Chaos Inside go test
ChaosKit integrates directly with Go’s testing package:
chaostest.RunChaos(t, "workflow", target, func(s *chaoskit.ScenarioBuilder) *chaoskit.ScenarioBuilder {
	return s.
		Step("inc", func(ctx context.Context, target chaoskit.Target) error {
			chaoskit.MaybeDelay(ctx)
			target.(*TestTarget).Increment()
			return nil
		}).
		Inject("delay", injectors.RandomDelay(5*time.Millisecond, 20*time.Millisecond))
},
	chaostest.WithRepeat(5),
	chaostest.WithDefaultThresholds(), // 95% success required
)
This allows developers to run chaos experiments during normal test runs, without extra infrastructure.
Useful for:
CI-friendly probabilistic chaos tests
invariant checks under failure
catching race conditions and leaks
validating retry/compensation logic
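The threshold idea behind probabilistic chaos tests (repeat a step, tolerate a bounded share of failures) can be sketched in plain Go. This is a simplified assumption of how such a check could work, not ChaosKit's runner: runStep, successRate, and flakyStep are hypothetical, and the failures are placed on fixed iterations so the result is reproducible.

```go
package main

import (
	"errors"
	"fmt"
)

// runStep executes step and converts a panic into an error,
// so one injected panic cannot abort the whole run.
func runStep(step func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("step panicked: %v", r)
		}
	}()
	return step()
}

// successRate runs step repeat times and returns the fraction of
// runs that finished without an error or a panic.
func successRate(repeat int, step func(i int) error) float64 {
	if repeat <= 0 {
		return 0
	}
	ok := 0
	for i := 0; i < repeat; i++ {
		if runStep(func() error { return step(i) }) == nil {
			ok++
		}
	}
	return float64(ok) / float64(repeat)
}

// flakyStep fails deterministically: it panics on run 7 and
// returns an error on run 13. Every other run succeeds.
func flakyStep(i int) error {
	switch i {
	case 7:
		panic("injected panic")
	case 13:
		return errors.New("injected error")
	}
	return nil
}

func main() {
	const threshold = 0.95 // mirrors the "95% success required" comment above
	rate := successRate(100, flakyStep)
	fmt.Printf("success rate %.2f, pass=%v\n", rate, rate >= threshold)
}
```

With few repetitions a single failure moves the rate a lot, so the repeat count and the threshold need to be chosen together.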
Runtime behavior
ChaosKit is engineered to avoid accidental activation:
Build tag isolation (-tags=chaos): chaos code is excluded from production binaries.
Explicit chaos context: without AttachChaos, everything is a no-op.
Network chaos is external: ToxiProxy never touches production traffic.
Monkey patching disabled by default: used only in tests, and only if explicitly enabled.
When ChaosKit Is Valuable
Great fit for:
workflow/Saga engines
stateful services
concurrency-heavy components
libraries with retry logic
business logic with invariants
Not ideal for:
pure infrastructure chaos (use Chaos Mesh, Litmus)
tests that can’t integrate Maybe* calls
Conclusion
ChaosKit focuses on the part of the system that is hardest to test and easiest to break: your code’s internal failure paths.
If your Go project relies on workflows, compensations, or complex state transitions, ChaosKit helps ensure correctness under unpredictable conditions - without requiring Kubernetes, Docker, or cluster chaos tools.