What is TokenGate?
TokenGate is a beta Python concurrency system built around a token-managed execution model. Instead of managing threads directly, you decorate synchronous functions and the system handles routing, admission, and worker assignment automatically.
The core idea is simple: every function call becomes a token. That token moves through a lifecycle — created, waiting, admitted, executing, completed — while the coordinator manages a pool of core-pinned workers underneath. You interact with the public API, TokenGate handles everything else.
As of v0.2.2.0, tokens are natively awaitable. That's what made this test possible to write cleanly:
result = await token
results = await asyncio.gather(*tokens)
The decorated functions stay synchronous. The orchestrator stays async. TokenGate sits between them and keeps the pipeline full.
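That split is easy to picture with a minimal sketch. Nothing below is TokenGate's actual implementation — `SketchToken` and `submit` are hypothetical stand-ins — but it shows how a token can be made awaitable while the decorated function stays synchronous:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

class SketchToken:
    """Illustrative awaitable token: delegates `await token` to a future."""
    def __init__(self, future):
        self._future = future

    def __await__(self):
        return self._future.__await__()

def submit(pool, fn, *args):
    """Run a plain synchronous function on a worker, hand back a token."""
    loop = asyncio.get_running_loop()
    return SketchToken(loop.run_in_executor(pool, fn, *args))

async def main():
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Functions stay synchronous; the orchestrator stays async.
        tokens = [submit(pool, pow, 2, n) for n in range(4)]
        return await asyncio.gather(*tokens)

print(asyncio.run(main()))  # → [1, 2, 4, 8]
```

Because `SketchToken` implements `__await__`, both `await token` and `asyncio.gather(*tokens)` work on it directly — the same surface the real tokens expose.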
What I did
I dispatched 65,536 task tokens simultaneously and completed all of them in about 17 seconds. Zero failures. Here's what the numbers actually showed.
What I found was an unexpected goldilocks zone — a region of sustained throughput that not only exceeded worker capacity but held stable all the way to 65,536 simultaneous submissions:
RESULTS SUMMARY
Wave Tokens OK Fail Time Tok/s Lat(ms) Conc Overlap ΣTask(ms)
-------------------------------------------------------------------------------------
1 4 4 0 0.002s 1718.7 0.582ms 1.00× 2.39× 5.56ms
2 8 8 0 0.002s 3781.8 0.264ms 2.20× 3.79× 8.01ms
3 16 16 0 0.004s 4067.5 0.246ms 2.37× 5.42× 21.31ms
4 32 32 0 0.007s 4392.7 0.228ms 2.56× 11.90× 86.71ms
5 64 64 0 0.015s 4268.0 0.234ms 2.48× 23.01× 345.10ms
6 128 128 0 0.029s 4475.1 0.223ms 2.60× 38.72× 1107.58ms
7 256 256 0 0.059s 4331.8 0.231ms 2.52× 52.89× 3125.85ms
8 512 512 0 0.113s 4532.8 0.221ms 2.64× 60.30× 6810.70ms
9 1024 1024 0 0.246s 4165.0 0.240ms 2.42× 61.87× 15211.02ms
10 2048 2048 0 0.505s 4052.4 0.247ms 2.36× 33.82× 17091.60ms
11 4096 4096 0 0.980s 4181.1 0.239ms 2.43× 33.15× 32476.06ms
12 8192 8192 0 2.163s 3786.8 0.264ms 2.20× 30.14× 65197.90ms
13 16384 16384 0 4.327s 3786.7 0.264ms 2.20× 23.91× 103450.88ms
14 32768 32768 0 8.754s 3743.3 0.267ms 2.18× 18.72× 163876.79ms
15 65536 65536 0 17.314s 3785.2 0.264ms 2.20× 17.16× 297095.85ms
-------------------------------------------------------------------------------------
TOTAL 131068 131068 0 86.845s
Avg latency across waves : 0.268 ms/token
Peak concurrency ratio : 2.64×
Peak overlap ratio : 61.87×
What "Sub-Quadratic Scaling" Really Means
This isn't a special property of AI or machine learning — it's basic math that shows up anywhere work can be parallelized. A calculator running two operations simultaneously instead of sequentially is doing the same thing at a smaller scale; the savings depend on how the operations' running times overlap. The term just describes a curve: double the input, less than double the cost.
The overlap ratio — the rightmost metric — measures Σ(individual task execution times) divided by wave elapsed time. If every task ran back-to-back on a single thread it would read 1.0×. Values above 1× mean real parallel execution is happening. Values well above worker count mean something more interesting is going on.
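Concretely, the metric is a single division. `overlap_ratio` below is my name for the calculation, not a TokenGate API:

```python
def overlap_ratio(task_durations_ms, wave_elapsed_ms):
    """Σ(individual task execution times) / wave elapsed wall time.

    1.0 means strictly sequential execution on one thread;
    values above 1.0 mean tasks genuinely overlapped in time.
    """
    return sum(task_durations_ms) / wave_elapsed_ms

# Four 10 ms tasks all finishing inside a 10 ms wall window: 4.0× overlap.
print(overlap_ratio([10, 10, 10, 10], 10))  # → 4.0
```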
Workers that finish a short task don't wait for the wave to end. They immediately pull the next token. So a worker that cycles through eight short tasks within one wave window contributes eight task-durations to the overlap sum while only occupying one worker-slot in wall time. The total concurrent activity compounds.
This is sub-quadratic scaling — doubling the token count costs less than double the time, because additional tokens slide into gaps that already exist in the schedule rather than adding full sequential cost. The system does more work per unit time as load increases, up to a point. In this run, that point is wave 9.
At 1024 tokens the overlap ratio peaks at 61.87×. Wave 10 transitions — workers saturate fully, rapid cycling slows, and the ratio settles near the hardware worker count. From there it holds. Wave 15 at 65,536 tokens: 17.16× sustained overlap, flat throughput, zero failures. The system found its floor and stayed there even when heavily overloaded.
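You can check the curve directly against the table. The wave sizes and elapsed times below are copied verbatim from the results; the per-doubling time multiplier stays near or below ~2.2× throughout — a linear curve would give ~2×, a quadratic one ~4×:

```python
# Wave sizes and elapsed times, copied verbatim from the results table above.
tokens = [4, 8, 16, 32, 64, 128, 256, 512,
          1024, 2048, 4096, 8192, 16384, 32768, 65536]
times = [0.002, 0.002, 0.004, 0.007, 0.015, 0.029, 0.059, 0.113,
         0.246, 0.505, 0.980, 2.163, 4.327, 8.754, 17.314]

# Time multiplier for each 2× step in token count.
ratios = [t1 / t0 for t0, t1 in zip(times, times[1:])]
for n, r in zip(tokens[1:], ratios):
    print(f"up to {n:>6} tokens: {r:.2f}x time per 2x tokens")
```

In the early waves the multiplier dips well under 2× (that's the goldilocks zone); after saturation it settles around 2×, still far from quadratic cost.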
What do these tasks look like?
These are deliberately varied and non-trivial — a prime sieve, string manipulation, list sorting, and an iterative SHA-256 chain. All four are plain synchronous functions. The decorator is the only thing that makes them token-aware:
@task_token_guard(operation_type='cpu_crunch', tags={'weight': 'light'})
def cpu_crunch(n: int) -> int:
    """Sum primes up to n — lightweight CPU-bound work."""
    total = 0
    for i in range(2, n):
        if all(i % j != 0 for j in range(2, int(i ** 0.5) + 1)):
            total += i
    return total
@task_token_guard(operation_type='string_ops', tags={'weight': 'light'})
def string_transform(seed: int) -> str:
    """Generate and mangle a string — lightweight string work."""
    rng = random.Random(seed)
    chars = [rng.choice(string.ascii_letters) for _ in range(300)]
    text = ''.join(chars)
    return text[::-1].upper().replace('A', '4').replace('E', '3').replace('I', '1')
@task_token_guard(operation_type='data_transform', tags={'weight': 'medium'})
def data_sort(size: int) -> List[int]:
    """Sort a random list — medium CPU work."""
    rng = random.Random(size)
    data = [rng.randint(0, 100_000) for _ in range(size)]
    return sorted(data)
@task_token_guard(operation_type='hash_compute', tags={'weight': 'heavy'})
def hash_chain(seed: str, iterations: int) -> str:
    """Iterative SHA-256 chain — heavier CPU-bound work."""
    h = seed.encode()
    for _ in range(iterations):
        h = hashlib.sha256(h).digest()
    return h.hex()
The entire orchestrator is async and touches no internal controls:
tokens = submit_batch(target)
results = await asyncio.gather(*tokens, return_exceptions=True)
That's it. Submit, await, report.
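For readers without TokenGate installed, the same submit/await/report shape can be mimicked with the standard library alone. `submit_batch` and `busy` below are stand-ins built on a thread pool, not TokenGate's implementation — only the orchestration pattern carries over:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

POOL = ThreadPoolExecutor(max_workers=8)

def busy(n: int) -> int:
    """Stand-in synchronous task."""
    return sum(range(n))

def submit_batch(count: int):
    """Stand-in for a batch submit: one awaitable per task."""
    loop = asyncio.get_running_loop()
    return [loop.run_in_executor(POOL, busy, 1_000) for _ in range(count)]

async def run_wave(count: int):
    # Submit, await, report — the whole orchestrator shape.
    start = time.perf_counter()
    tokens = submit_batch(count)
    results = await asyncio.gather(*tokens, return_exceptions=True)
    elapsed = time.perf_counter() - start
    ok = sum(1 for r in results if not isinstance(r, Exception))
    print(f"{count:>6} tokens | {ok} ok | {elapsed:.3f}s | {count/elapsed:.0f} tok/s")
    return ok

asyncio.run(run_wave(256))
```

Swap the thread pool for TokenGate's decorated functions and the orchestrator body doesn't change — that's the point of the awaitable-token surface.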
Closing
Making this system fail under normal conditions would require hundreds of thousands of simultaneous submissions — a load that isn't realistic for a single server in any real scenario. The architecture stabilizes under load rather than degrading. That's not an accident; it's a consequence of how token admission and worker pinning interact at scale.
Your sweet spot will land in a different place than mine depending on your hardware. The test is in demo/ — run it and see where your system peaks.