DEV Community

Krishna Bajpai

Designing for Sub-Microsecond Latency

Lessons from Building a Minimal Execution Engine
Modern systems are fast — but predictable fast is rare.

Most frameworks optimize for throughput, developer velocity, or horizontal scalability. When you care about tail latency, determinism, and sub-microsecond critical paths, those abstractions often become liabilities.

I built SubMicro Execution Engine to explore what happens when latency — not features — is the primary design constraint. Below are a few practical lessons that shaped the system.

1. Latency Lives in the Edges, Not the Core Logic

The actual “work” a system performs is rarely the bottleneck.

Latency hides in:

  • memory allocation
  • cache-line contention
  • branch misprediction
  • scheduler handoffs
  • synchronization primitives

The engine minimizes these by:

  • keeping hot paths allocation-free

  • favoring flat, cache-friendly data layouts

  • avoiding implicit synchronization

  • designing execution flows that fit in L1/L2 cache

If you can’t draw the hot path from memory, you don’t control latency.

2. Determinism Beats Raw Throughput

A system that does 1M ops/sec sometimes is less useful than one that does 200k ops/sec always.

Design choices were guided by:

  • stable execution order
  • predictable scheduling
  • minimal dynamic behavior in hot paths

This trades peak throughput for tight latency distributions, which matter far more in real-time and trading-style systems.
3. Abstractions Have a Cost — Measure Them Ruthlessly

Abstractions aren’t bad, but unmeasured abstractions are dangerous.

In low-latency systems:

  • virtual dispatch can cost more than the logic itself
  • generic containers hide memory access patterns
  • “clean” interfaces often fragment the execution path

The engine favors:

  • explicit control over execution

  • visible data movement

  • simple, inspectable components

Code clarity is preserved by removing layers, not adding them.

4. Scheduling Is a Latency Feature

Schedulers decide when work happens — which is as important as what happens.

Design considerations include:

  • minimal context switching
  • optional busy-polling strategies
  • execution models that avoid OS interference in hot paths

The goal is to keep execution close to the CPU, not bouncing between queues and threads.
5. Measure the Tail, Not the Average

Average latency lies.

The engine is designed with the assumption that:

  • p99 and p99.9 matter more than the mean
  • occasional spikes break real-time systems
  • instrumentation must be lightweight enough for production use

If you don’t measure the tail, you are optimizing blind.

Closing Thoughts
Sub-microsecond systems are not built by adding optimizations — they’re built by removing uncertainty.

This project is intentionally minimal. It is not a framework. It is an exploration of how far you can push latency control when every design decision answers one question:

Does this reduce or increase unpredictability?

Repo: submicro-execution-engine
GitHub: https://github.com/krish567366/submicro-execution-engine
Website: https://submicro.krishnabajpai.me/
