Lessons from Building a Minimal Execution Engine
Modern systems are fast, but predictably fast is rare.
Most frameworks optimize for throughput, developer velocity, or horizontal scalability. When you care about tail latency, determinism, and sub-microsecond critical paths, those abstractions often become liabilities.
I built SubMicro Execution Engine to explore what happens when latency — not features — is the primary design constraint. Below are a few practical lessons that shaped the system.
Latency Lives in the Edges, Not the Core Logic
The actual “work” a system performs is rarely the bottleneck.
Latency hides in:
- memory allocation
- cache-line contention
- branch misprediction
- scheduler handoffs
- synchronization primitives
The engine minimizes these by:
- keeping hot paths allocation-free
- favoring flat, cache-friendly data layouts
- avoiding implicit synchronization
- designing execution flows that fit in L1/L2 cache
If you can’t draw the hot path from memory, you don’t control latency.
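To make "allocation-free hot path" and "flat, cache-friendly layout" concrete, here is a minimal sketch of a fixed-capacity object pool. It is not taken from the engine's source: `Order`, `OrderPool`, and the field names are hypothetical, and the 64-byte cache-line size is an assumption that holds on most current x86-64 parts but should be checked for your target.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record for illustration: flat, trivially copyable, and padded
// to an assumed 64-byte cache line so two slots never share a line.
struct alignas(64) Order {
    std::uint64_t id;
    std::int64_t  price_ticks;
    std::uint32_t quantity;
    std::uint32_t next_free;  // free-list link, meaningful only while the slot is unused
};
static_assert(sizeof(Order) == 64, "Order is expected to occupy exactly one cache line");

// Fixed-capacity pool: all storage lives inside the object, so acquire() and
// release() never call into the allocator.
template <std::size_t N>
class OrderPool {
public:
    static constexpr std::uint32_t kNone = 0xFFFFFFFFu;

    OrderPool() {
        // Thread every slot onto an intrusive free list, in index order.
        for (std::uint32_t i = 0; i < N; ++i) {
            slots_[i].next_free = (i + 1 < N) ? i + 1 : kNone;
        }
        free_head_ = (N > 0) ? 0 : kNone;
    }

    // Returns nullptr when the pool is exhausted instead of falling back to the heap.
    Order* acquire() {
        if (free_head_ == kNone) return nullptr;
        Order* o = &slots_[free_head_];
        free_head_ = o->next_free;
        ++in_use_;
        return o;
    }

    // Returns a slot to the pool; the pointer must have come from acquire().
    void release(Order* o) {
        assert(o >= slots_.data() && o < slots_.data() + N);
        o->next_free = free_head_;
        free_head_ = static_cast<std::uint32_t>(o - slots_.data());
        --in_use_;
    }

    std::size_t in_use() const { return in_use_; }

private:
    std::array<Order, N> slots_{};
    std::uint32_t free_head_ = kNone;
    std::size_t in_use_ = 0;
};
```

Exhaustion is reported to the caller rather than hidden behind a fallback allocation, which keeps the worst case visible. The pool is single-threaded by design; sharing it across threads would reintroduce the synchronization it is meant to avoid.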
Determinism Beats Raw Throughput
A system that sometimes reaches 1M ops/sec is less useful than one that always sustains 200k ops/sec.
Design choices were guided by:
- stable execution order
- predictable scheduling
- minimal dynamic behavior in hot paths
This trades peak throughput for tight latency distributions, which matter far more in real-time and trading-style systems.
Abstractions Have a Cost: Measure Them Ruthlessly
Abstractions aren’t bad, but unmeasured abstractions are dangerous.
In low-latency systems:
- virtual dispatch can cost more than the logic itself
- generic containers hide memory access patterns
- “clean” interfaces often fragment the execution path
The engine favors:
- explicit control over execution
- visible data movement
- simple, inspectable components
Code clarity is preserved by removing layers, not adding them.
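As an illustration of the virtual-dispatch point, the sketch below runs the same trivial handler two ways: through a virtual interface and through a template parameter. All names (`Tick`, `IHandler`, `VirtualSum`, `StaticSum`, `run_virtual`, `run_static`) are hypothetical and not from the engine. Whether the virtual version is actually slower depends on the compiler and call site, since optimizers can sometimes devirtualize, so treat this as something to benchmark rather than a rule.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical event type for illustration.
struct Tick {
    std::int64_t  price_ticks;
    std::uint32_t quantity;
};

// Runtime polymorphism: each on_tick call is an indirect call through the vtable
// unless the compiler can prove the dynamic type.
struct IHandler {
    virtual ~IHandler() = default;
    virtual void on_tick(const Tick& t) = 0;
};

struct VirtualSum final : IHandler {
    std::int64_t notional = 0;
    void on_tick(const Tick& t) override { notional += t.price_ticks * t.quantity; }
};

// Compile-time polymorphism: the handler type is known at the call site,
// so the call can be resolved statically and inlined.
struct StaticSum {
    std::int64_t notional = 0;
    void on_tick(const Tick& t) { notional += t.price_ticks * t.quantity; }
};

inline void run_virtual(IHandler& h, const Tick* ticks, std::size_t n) {
    assert(ticks != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) h.on_tick(ticks[i]);
}

template <typename Handler>
void run_static(Handler& h, const Tick* ticks, std::size_t n) {
    assert(ticks != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) h.on_tick(ticks[i]);
}
```

Both loops compute the same result; the difference is only in how the call is resolved. The body here is a single multiply-add, which is the kind of case where call overhead can be comparable to the work itself.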
Scheduling Is a Latency Feature
Schedulers decide when work happens, which is as important as what happens.
Design considerations include:
- minimal context switching
- optional busy-polling strategies
- execution models that avoid OS interference in hot paths
The goal is to keep execution close to the CPU, not bouncing between queues and threads.
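One common way to realize busy-polling without kernel involvement is a single-producer/single-consumer ring buffer that the consumer spins on. The following is a minimal sketch under stated assumptions, not the engine's implementation: `SpscRing` and `poll_sum` are hypothetical names, the capacity must be a power of two, exactly one thread may push and one may pop, and the 64-byte alignment assumes 64-byte cache lines.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Lock-free ring for exactly one producer thread and one consumer thread.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    // Producer side. Returns false when the ring is full; never blocks.
    bool try_push(const T& v) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        buf_[head & (Capacity - 1)] = v;
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side. Returns false when the ring is empty; never blocks.
    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail == head) return false;
        out = buf_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // hand the slot back
        return true;
    }

private:
    // Separate cache lines so producer and consumer indices do not contend.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
    alignas(64) std::array<T, Capacity> buf_{};
};

// Busy-poll consumer: spins on try_pop instead of sleeping in the kernel.
// Consumes exactly `count` items and returns their sum.
template <std::size_t Capacity>
std::uint64_t poll_sum(SpscRing<std::uint64_t, Capacity>& ring, std::size_t count) {
    std::uint64_t sum = 0;
    std::uint64_t item = 0;
    while (count > 0) {
        if (ring.try_pop(item)) {
            sum += item;
            --count;
        }
        // On an empty ring the loop simply retries; a production loop might add
        // a CPU pause hint here and pin the thread to a dedicated core.
    }
    return sum;
}
```

Spinning burns a core to avoid wake-up latency, which is why it makes sense as an optional strategy rather than a default.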
Measure the Tail, Not the Average
Average latency lies.
The engine is designed with the assumption that:
- p99 and p99.9 matter more than the mean
- occasional spikes break real-time systems
- instrumentation must be lightweight enough for production use
If you don’t measure the tail, you are optimizing blind.
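A fixed-size histogram is one way to keep tail measurement cheap enough to leave on. The sketch below is an assumption about how such instrumentation could look, not the engine's actual code: `LatencyHistogram` and its methods are hypothetical, buckets are powers of two (so resolution is coarse), and percentiles are given in per-mille (990 for p99, 999 for p99.9) to keep the arithmetic in integers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Log2-bucketed latency histogram: recording is a short shift loop plus two
// increments, with no allocation and no locks (single-writer assumed).
class LatencyHistogram {
public:
    void record(std::uint64_t nanos) {
        ++buckets_[bucket_of(nanos)];
        ++count_;
    }

    std::uint64_t count() const { return count_; }

    // Upper bound, in nanoseconds, of the bucket that contains the requested
    // percentile. per_mille is in [1, 1000]; assumes count * 1000 fits in 64 bits.
    std::uint64_t percentile_upper_bound(std::uint32_t per_mille) const {
        assert(per_mille >= 1 && per_mille <= 1000);
        if (count_ == 0) return 0;
        const std::uint64_t rank = (count_ * per_mille + 999) / 1000;  // ceiling
        std::uint64_t seen = 0;
        for (std::size_t b = 0; b < buckets_.size(); ++b) {
            seen += buckets_[b];
            if (seen >= rank) return upper_bound_of(b);
        }
        return std::numeric_limits<std::uint64_t>::max();
    }

private:
    // floor(log2(v)); values 0 and 1 both land in bucket 0.
    // With C++20, std::bit_width could replace this loop.
    static std::size_t bucket_of(std::uint64_t v) {
        std::size_t b = 0;
        while (v > 1) {
            v >>= 1;
            ++b;
        }
        return b;
    }

    // Bucket b covers [2^b, 2^(b+1) - 1], except bucket 0, which also holds 0.
    static std::uint64_t upper_bound_of(std::size_t b) {
        if (b >= 63) return std::numeric_limits<std::uint64_t>::max();
        return (std::uint64_t{1} << (b + 1)) - 1;
    }

    std::array<std::uint64_t, 64> buckets_{};
    std::uint64_t count_ = 0;
};
```

As a worked case, take 99 samples of 100 ns and one of 50,000 ns. The mean is 599 ns, which describes none of the samples. With this histogram, p50 and p99 both fall in the 64 to 127 ns bucket, while p99.9 lands in the 32,768 to 65,535 ns bucket and exposes the spike.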
Closing Thoughts
Sub-microsecond systems are not built by adding optimizations — they’re built by removing uncertainty.
This project is intentionally minimal. It is not a framework. It is an exploration of how far you can push latency control when every design decision answers one question:
Does this reduce or increase unpredictability?
Repo: submicro-execution-engine
GitHub: https://github.com/krish567366/submicro-execution-engine
Website: https://submicro.krishnabajpai.me/