Roman Chudov

ChaosKit: Code-Level Chaos Engineering for Go

Chaos engineering tools usually target infrastructure: pods, nodes, networks, CPU pressure. But many real failures happen inside the application logic - in retries, compensations, concurrency, or state transitions.

Infrastructure tools can’t simulate a panic inside a function, a 15 ms delay at the wrong moment, or a broken internal invariant.

ChaosKit solves that by bringing chaos engineering directly into Go code.

GitHub: https://github.com/rom8726/chaoskit


Why Code-Level Chaos Matters

While I was building a workflow engine, it became clear that most critical failures were logic-level, not cluster-level:

  • a compensation step panics;

  • a goroutine leak accumulates over time;

  • a tiny delay causes a race condition.

ChaosKit is designed to expose these failures early - inside your unit and integration tests.


How ChaosKit Works

ChaosKit provides controlled "chaos points" you can insert into your code:

chaoskit.MaybeDelay(ctx)
chaoskit.MaybeError(ctx)
chaoskit.MaybePanic(ctx)

By default, these are no-ops.

Chaos injectors activate only when:

  • the binary is built with -tags=chaos, and

  • you attach a chaos context:

   ctx := chaoskit.AttachChaos(context.Background())

This ensures that chaos never triggers accidentally in production.


Injectors, Validators, Scenarios

ChaosKit consists of three core building blocks:

Injectors

Simulate failures:

  • delays

  • random errors

  • panic injection

  • ToxiProxy-based network faults

  • custom logic

Example:

Inject("latency", injectors.RandomDelay(5*time.Millisecond, 20*time.Millisecond))

Validators

Check invariants after each step:

  • goroutine count

  • recursion depth

  • infinite-loop detection

  • custom validation

Example:

Assert("goroutines", validators.GoroutineLimit(100))
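
A goroutine-count check can be approximated with runtime.NumGoroutine from the standard library. The sketch below assumes a validator is simply a function that returns an error when the invariant is violated; goroutineLimit is a hypothetical name and not the signature of validators.GoroutineLimit.

```go
package main

import (
	"fmt"
	"runtime"
)

// goroutineLimit returns a check that fails when more than limit
// goroutines are alive. Hypothetical helper, not ChaosKit's validator.
func goroutineLimit(limit int) func() error {
	return func() error {
		if n := runtime.NumGoroutine(); n > limit {
			return fmt.Errorf("goroutine count %d exceeds limit %d", n, limit)
		}
		return nil
	}
}

func main() {
	check := goroutineLimit(100)
	fmt.Println("before leak:", check())

	// Simulate a leak: goroutines blocked forever on a channel nobody writes to.
	block := make(chan struct{})
	for i := 0; i < 200; i++ {
		go func() { <-block }()
	}
	fmt.Println("after leak:", check())
}
```

Running the check after every step, rather than once at the end, points at the step that started leaking.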

Scenarios

Define steps, injectors, validators, and repetition:

scenario := chaoskit.NewScenario("workflow").
    WithTarget(engine).
    Step("run", ExecuteWorkflow).
    Inject("panic", injectors.PanicWithProbability(0.15)).
    Assert("limit", validators.GoroutineLimit(100)).
    Repeat(50).
    Build()

Chaos Inside go test

ChaosKit integrates directly with Go’s testing package:

chaostest.RunChaos(t, "workflow", target, func(s *chaoskit.ScenarioBuilder) *chaoskit.ScenarioBuilder {
    return s.
        Step("inc", func(ctx context.Context, target chaoskit.Target) error {
            chaoskit.MaybeDelay(ctx)
            target.(*TestTarget).Increment()
            return nil
        }).
        Inject("delay", injectors.RandomDelay(5*time.Millisecond, 20*time.Millisecond))
},
    chaostest.WithRepeat(5),
    chaostest.WithDefaultThresholds(), // 95% success required
)

This allows developers to run chaos experiments during normal test runs, without extra infrastructure.

Useful for:

  • CI-friendly probabilistic chaos tests

  • invariant checks under failure

  • catching race conditions and leaks

  • validating retry/compensation logic


Runtime behavior

ChaosKit is engineered to avoid accidental activation:

  1. Build tag isolation (-tags=chaos): chaos code is excluded from production binaries.

  2. Explicit chaos context: without AttachChaos, everything is a no-op.

  3. Network chaos is external: ToxiProxy never touches production traffic.

  4. Monkey patching is disabled by default: it is used only in tests, and only if explicitly enabled.


When ChaosKit Is Valuable

Great fit for:

  • workflow/Saga engines

  • stateful services

  • concurrency-heavy components

  • libraries with retry logic

  • business logic with invariants

Not ideal for:

  • pure infrastructure chaos (use Chaos Mesh, Litmus)

  • tests that can’t integrate Maybe* calls


Conclusion

ChaosKit focuses on the part of the system that is hardest to test and easiest to break: your code’s internal failure paths.

If your Go project relies on workflows, compensations, or complex state transitions, ChaosKit helps ensure correctness under unpredictable conditions - without requiring Kubernetes, Docker, or cluster chaos tools.

Repo: https://github.com/rom8726/chaoskit
