<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Joseph Boone</title>
    <description>The latest articles on DEV Community by Joseph Boone (@tavari).</description>
    <link>https://dev.to/tavari</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3827905%2Fa1ad1a92-e5a4-4110-8b34-80c191d448f0.gif</url>
      <title>DEV Community: Joseph Boone</title>
      <link>https://dev.to/tavari</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tavari"/>
    <language>en</language>
    <item>
      <title>Maximum Concurrency (&amp; Sub-Quadratic Scaling) By TokenGate</title>
      <dc:creator>Joseph Boone</dc:creator>
      <pubDate>Sun, 26 Apr 2026 20:00:58 +0000</pubDate>
      <link>https://dev.to/tavari/maximum-concurrency-sub-quadratic-scaling-by-tokengate-14pc</link>
      <guid>https://dev.to/tavari/maximum-concurrency-sub-quadratic-scaling-by-tokengate-14pc</guid>
      <description>&lt;h2&gt;
  
  
  What is TokenGate?
&lt;/h2&gt;

&lt;p&gt;TokenGate is a beta Python concurrency system built around a token-managed execution model. Instead of managing threads directly, you decorate synchronous functions and the system handles routing, admission, and worker assignment automatically.&lt;/p&gt;

&lt;p&gt;The core idea is simple: every function call becomes a token. That token moves through a lifecycle — created, waiting, admitted, executing, completed — while the coordinator manages a pool of core-pinned workers underneath. You interact with the public API, TokenGate handles everything else.&lt;/p&gt;

&lt;p&gt;As of v0.2.2.0, tokens are natively awaitable. That's what made this test possible to write cleanly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;result&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
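&lt;p&gt;TokenGate's internals aren't shown in this post, so purely as an illustration: an awaitable token can be built by wrapping an asyncio future and delegating &lt;code&gt;__await__&lt;/code&gt; to it. The &lt;code&gt;Token&lt;/code&gt; class below is a hypothetical sketch of that pattern, not TokenGate's actual implementation.&lt;/p&gt;

```python
import asyncio

class Token:
    """Hypothetical sketch of an awaitable token: it wraps a future that a
    worker resolves when the underlying task completes."""
    def __init__(self):
        self._future = asyncio.get_running_loop().create_future()

    def resolve(self, result):
        # Called by a worker once the task has finished executing.
        self._future.set_result(result)

    def __await__(self):
        # Delegating to the future is what makes `await token` work.
        return self._future.__await__()

async def main():
    token = Token()
    # Simulate a worker resolving the token shortly after submission.
    asyncio.get_running_loop().call_later(0.01, token.resolve, 42)
    return await token

print(asyncio.run(main()))  # → 42
```

Because `__await__` just forwards to the future, these tokens also compose directly with `asyncio.gather`, which is the property the test relies on.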



&lt;p&gt;The decorated functions stay synchronous. The orchestrator stays async. TokenGate sits between them and keeps the pipeline full.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I did
&lt;/h2&gt;

&lt;p&gt;I dispatched 65,536 task tokens simultaneously and completed every one of them in under 30 seconds, with zero failures. Here's what the numbers actually showed.&lt;/p&gt;

&lt;p&gt;What I found was an unexpected Goldilocks zone: a region of task sustainability that not only exceeded worker capacity but held stable all the way to 65,536 simultaneous submissions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;  RESULTS SUMMARY

  Wave   Tokens   OK    Fail      Time    Tok/s   Lat(ms)    Conc   Overlap   ΣTask(ms)
  -------------------------------------------------------------------------------------
  1      4        4     0       0.002s   1718.7    0.582ms   1.00×     2.39×       5.56ms
  2      8        8     0       0.002s   3781.8    0.264ms   2.20×     3.79×       8.01ms
  3      16       16    0       0.004s   4067.5    0.246ms   2.37×     5.42×      21.31ms
  4      32       32    0       0.007s   4392.7    0.228ms   2.56×    11.90×      86.71ms
  5      64       64    0       0.015s   4268.0    0.234ms   2.48×    23.01×     345.10ms
  6      128      128   0       0.029s   4475.1    0.223ms   2.60×    38.72×    1107.58ms
  7      256      256   0       0.059s   4331.8    0.231ms   2.52×    52.89×    3125.85ms
  8      512      512   0       0.113s   4532.8    0.221ms   2.64×    60.30×    6810.70ms
  9      1024     1024  0       0.246s   4165.0    0.240ms   2.42×    61.87×   15211.02ms
  10     2048     2048  0       0.505s   4052.4    0.247ms   2.36×    33.82×   17091.60ms
  11     4096     4096  0       0.980s   4181.1    0.239ms   2.43×    33.15×   32476.06ms
  12     8192     8192  0       2.163s   3786.8    0.264ms   2.20×    30.14×   65197.90ms
  13     16384    16384 0       4.327s   3786.7    0.264ms   2.20×    23.91×  103450.88ms
  14     32768    32768 0       8.754s   3743.3    0.267ms   2.18×    18.72×  163876.79ms
  15     65536    65536 0      17.314s   3785.2    0.264ms   2.20×    17.16×  297095.85ms
  -------------------------------------------------------------------------------------
  TOTAL  131068   131068 0      86.845s

  Avg latency across waves  : 0.268 ms/token
  Peak concurrency ratio    : 2.64×
  Peak overlap ratio        : 61.87×
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What "Sub-Quadratic Scaling" Really Means
&lt;/h3&gt;

&lt;p&gt;This isn't a special property of AI or machine learning; it's basic math that shows up anywhere work can be parallelized. A calculator running two operations simultaneously instead of sequentially is doing the same thing at a smaller scale. The term just describes a curve: double the input, less than double the cost.&lt;/p&gt;

&lt;p&gt;The overlap ratio (the Overlap column in the table) measures Σ(individual task execution times) divided by wave elapsed time. If every task ran back-to-back on a single thread it would read 1.0×. Values above 1× mean real parallel execution is happening. Values well above the worker count mean something more interesting is going on.&lt;/p&gt;
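&lt;p&gt;As a sanity check, the wave 9 overlap ratio can be recomputed directly from the table (the small difference from the reported 61.87× comes from the elapsed time being rounded to three decimal places):&lt;/p&gt;

```python
# Overlap ratio for wave 9, straight from the RESULTS SUMMARY:
# sum of individual task times divided by wave wall-clock time.
sum_task_ms = 15211.02      # ΣTask(ms) column for wave 9
elapsed_ms = 0.246 * 1000   # wave elapsed time, converted to ms

overlap = sum_task_ms / elapsed_ms
print(f"{overlap:.2f}x")    # 61.83x
```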

&lt;p&gt;Workers that finish a short task don't wait for the wave to end. They immediately pull the next token. So a worker that cycles through eight short tasks within one wave window contributes eight task-durations to the overlap sum while only occupying one worker-slot in wall time. The total concurrent activity compounds.&lt;/p&gt;
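&lt;p&gt;That pull-the-next-token behavior is the classic work-queue loop. A minimal sketch using only the standard library (plain &lt;code&gt;queue&lt;/code&gt; and &lt;code&gt;threading&lt;/code&gt;, not TokenGate's internals):&lt;/p&gt;

```python
import queue
import threading

def worker(tasks, results):
    """Finish a task, then immediately pull the next one. A worker that
    clears eight short tasks in one wave window contributes eight
    task-durations to the overlap sum from a single worker-slot."""
    while True:
        task = tasks.get()
        if task is None:        # sentinel: the wave is over
            break
        results.append(task())  # run it, then loop straight back for more

tasks = queue.Queue()
results = []
for i in range(8):
    tasks.put(lambda i=i: i * i)
tasks.put(None)

t = threading.Thread(target=worker, args=(tasks, results))
t.start()
t.join()
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```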

&lt;p&gt;This is sub-quadratic scaling: doubling the token count costs less than double the time, because additional tokens slide into gaps that already exist in the schedule rather than adding full sequential cost. The system does more work per unit time as load increases, up to a point. That point is wave 9.&lt;/p&gt;

&lt;p&gt;At 1024 tokens the overlap ratio peaks at 61.87×. Wave 10 is the transition: workers saturate fully, rapid cycling slows, and the ratio settles near the hardware worker count. From there it holds. Wave 15, at 65,536 tokens, shows 17.16× sustained overlap, flat throughput, and zero failures. The system found its floor and stayed there even when heavily overloaded.&lt;/p&gt;
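&lt;p&gt;The later waves make the curve concrete. If scaling were quadratic, doubling the token count would roughly quadruple the wave time; in the saturated regime the measured ratios stay near 2×, i.e. effectively linear:&lt;/p&gt;

```python
# Wave elapsed times from the table for waves 12-15.
waves = {8192: 2.163, 16384: 4.327, 32768: 8.754, 65536: 17.314}

times = list(waves.values())
# Cost of each doubling: time(2n) / time(n). Quadratic would be ~4.0.
ratios = [round(b / a, 2) for a, b in zip(times, times[1:])]
print(ratios)  # [2.0, 2.02, 1.98]
```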

&lt;h2&gt;
  
  
  What do these tasks look like?
&lt;/h2&gt;

&lt;p&gt;These are deliberately varied and non-trivial — a prime sieve, string manipulation, list sorting, and an iterative SHA-256 chain. All four are plain synchronous functions. The decorator is the only thing that makes them token-aware:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@task_token_guard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;operation_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cpu_crunch&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;weight&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;light&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;cpu_crunch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Sum primes up to n — lightweight CPU-bound work.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
            &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;


&lt;span class="nd"&gt;@task_token_guard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;operation_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;string_ops&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;weight&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;light&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;string_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Generate and mangle a string — lightweight string work.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;rng&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Random&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;chars&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ascii_letters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chars&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;[::&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;A&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;4&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;E&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="nd"&gt;@task_token_guard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;operation_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data_transform&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;weight&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;data_sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Sort a random list — medium CPU work.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;rng&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Random&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="nd"&gt;@task_token_guard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;operation_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;hash_compute&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;weight&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;heavy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hash_chain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Iterative SHA-256 chain — heavier CPU-bound work.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  The entire orchestrator is async and touches no internal controls:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;submit_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_exceptions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Submit, await, report.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Making this system fail under normal conditions would require hundreds of thousands of simultaneous submissions, a load that isn't realistic for a single server in any real scenario. The architecture stabilizes under load rather than degrading. That's not an accident; it's a consequence of how token admission and worker pinning interact at scale.&lt;/p&gt;

&lt;p&gt;Your sweet spot will land in a different place than mine, depending on your hardware. The test is in demo/; run it and see where your system peaks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://tavari.online/" rel="noopener noreferrer"&gt;https://tavari.online/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>performance</category>
      <category>code</category>
      <category>programming</category>
      <category>python</category>
    </item>
    <item>
      <title>I Built a Threading Engine - I Need Results (Feedback)</title>
      <dc:creator>Joseph Boone</dc:creator>
      <pubDate>Thu, 23 Apr 2026 04:26:37 +0000</pubDate>
      <link>https://dev.to/tavari/i-built-a-threading-engine-i-need-results-feedback-4enf</link>
      <guid>https://dev.to/tavari/i-built-a-threading-engine-i-need-results-feedback-4enf</guid>
      <description>&lt;p&gt;I've been building &lt;strong&gt;TokenGate&lt;/strong&gt; - an experimental Python concurrency engine that uses a token-based model to manage threaded tasks. No manual thread management, no futures, no ThreadPoolExecutor. Just a 3 line coordinator and single line decorators to manage all threading.&lt;/p&gt;

&lt;p&gt;On my machine (Ryzen 7800 / RTX 4070 Super) I'm seeing &lt;strong&gt;7.25x concurrency across 8 tasks of mixed work&lt;/strong&gt; and &lt;strong&gt;6.01x on sustained high variety workloads&lt;/strong&gt; - but that's just one setup. I want to know what it does on yours.&lt;/p&gt;

&lt;p&gt;In my testing scenarios, concurrency is measured across batches of 8 tasks to match my core count. It's possible to get a higher concurrency ratio; my tests are normalized.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm asking
&lt;/h2&gt;

&lt;p&gt;Try the demos (or make an app!), paste your results, that's it.&lt;/p&gt;

&lt;p&gt;The whole demo suite takes about 5 minutes to set up, and the results tell me a lot about how the engine scales across different hardware.&lt;/p&gt;
&lt;h2&gt;
  
  
  How to get started
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Option 1 - Direct download (fastest):&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Grab the beta zip from &lt;a href="https://tavari.online" rel="noopener noreferrer"&gt;tavari.online&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 2 - Clone the repo:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/TavariAgent/Py-TokenGate
&lt;span class="nb"&gt;cd &lt;/span&gt;Py-TokenGate
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then check &lt;a href="https://github.com/TavariAgent/Py-TokenGate/blob/trunk/DOCS/BETA.md" rel="noopener noreferrer"&gt;BETA.md&lt;/a&gt; for the quick start.&lt;/p&gt;

&lt;h2&gt;
  
  
  What TokenGate actually does
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Decorates synchronous functions with &lt;code&gt;@task_token_guard&lt;/code&gt; - one line, done&lt;/li&gt;
&lt;li&gt;Routes tasks through a token-managed thread pool automatically&lt;/li&gt;
&lt;li&gt;Built-in DoS protection to prevent the system from overwhelming itself&lt;/li&gt;
&lt;li&gt;Live telemetry via WebSocket GUI at &lt;code&gt;localhost:5000&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Works standalone or with the WebSocket dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I want to hear back
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Your hardware (CPU model, core count)&lt;/li&gt;
&lt;li&gt;Your experience (how did it work for you?)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drop results here, open an Issue on GitHub or leave me feedback on &lt;a href="https://tavari.online" rel="noopener noreferrer"&gt;tavari.online&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This is an active beta - rough edges are expected. &lt;br&gt;
I'm self-taught; this is the work my learning experience produced. &lt;br&gt;
If it's useful or interesting to you, I'd appreciate the feedback.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/TavariAgent/Py-TokenGate" rel="noopener noreferrer"&gt;GitHub Repo&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>code</category>
      <category>programming</category>
      <category>opensource</category>
    </item>
    <item>
      <title>1m Tokens (&amp; WebSocket)</title>
      <dc:creator>Joseph Boone</dc:creator>
      <pubDate>Thu, 19 Mar 2026 21:32:25 +0000</pubDate>
      <link>https://dev.to/tavari/1m-tokens-websocket-1f0c</link>
      <guid>https://dev.to/tavari/1m-tokens-websocket-1f0c</guid>
      <description>&lt;p&gt;Greetings readers, I made a threading engine with many optimizations (including ML) and WebSocket task controls per operation.  &lt;/p&gt;

&lt;p&gt;Even when computing a slow-converging series like Leibniz π at 1 million token executions, all of the tasks resolved as expected in ~200 seconds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ── LAYER 0: TERM TOKENS ──────────────────────────────────────────────────────
&lt;/span&gt;&lt;span class="nd"&gt;@task_token_guard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;operation_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pi_term&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;weight&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;light&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compute_pi_term&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Compute a single Leibniz term: (-1)^n / (2n + 1)
    Returns as string to preserve Decimal precision across token boundary.
    Light weight — 1,000,000 of these fire simultaneously.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nf"&gt;getcontext&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;prec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DECIMAL_PRECISION&lt;/span&gt;
    &lt;span class="n"&gt;sign&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;
    &lt;span class="n"&gt;term&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sign&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nc"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;term&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ── LAYER 1: CHUNK TOKENS ─────────────────────────────────────────────────────
&lt;/span&gt;&lt;span class="nd"&gt;@task_token_guard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;operation_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pi_chunk&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;weight&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;light&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sum_chunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;term_strings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Sum a batch of Leibniz terms.
    Receives resolved term strings from Layer 0 tokens.
    Light weight — 1,000 of these, each summing 1,000 terms.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nf"&gt;getcontext&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;prec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DECIMAL_PRECISION&lt;/span&gt;
    &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;term_strings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ── LAYER 2: PARTIAL TOKENS ───────────────────────────────────────────────────
&lt;/span&gt;&lt;span class="nd"&gt;@task_token_guard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;operation_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pi_partial&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;weight&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sum_partial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_strings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Sum a batch of chunk sums.
    Receives resolved chunk strings from Layer 1 tokens.
    Medium weight — 10 of these, each summing 100 chunks.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nf"&gt;getcontext&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;prec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DECIMAL_PRECISION&lt;/span&gt;
    &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chunk_strings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The Leibniz series was chosen deliberately because it is among the slowest-converging series for pi: it needs ~10 million terms for just 7 correct digits. That makes it a good stress test: maximum token volume, minimum mathematical payoff.&lt;/p&gt;
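&lt;p&gt;The convergence rate is easy to check outside TokenGate. Here is a minimal standalone sketch (plain Python, no TokenGate imports) of the Leibniz partial sum:&lt;/p&gt;

```python
# Leibniz series: pi = 4 * sum_{k>=0} (-1)^k / (2k + 1)
# The tail after n terms is roughly 1/n, which is why ~10 million
# terms are needed before ~7 digits of pi are correct.
def leibniz_pi(n_terms: int) -> float:
    total = 0.0
    sign = 1.0
    for k in range(n_terms):
        total += sign / (2 * k + 1)
        sign = -sign
    return 4.0 * total

if __name__ == "__main__":
    import math
    for n in (1_000, 100_000):
        approx = leibniz_pi(n)
        print(n, approx, abs(approx - math.pi))
```

&lt;p&gt;At 1,000 terms the error is around 1e-3; at 100,000 terms around 1e-5, consistent with the ~1/n tail of the series.&lt;/p&gt;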

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnk3ei5arzovfacfrunv7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnk3ei5arzovfacfrunv7.png" alt=" " width="742" height="587"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Note: 64 workers with SMT enabled is only ~7% faster on a 7800X3D — more workers doesn't always mean more throughput, especially for micro-ops where execution port contention becomes the real ceiling.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwc3m5s20g23ene0obdk6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwc3m5s20g23ene0obdk6.png" alt=" " width="706" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Tokens move through async admission and resolve on pinned workers: CPU-heavy tasks stay on core 1, while light tasks distribute across the rest. Failure nets, duplication safety, and WebSocket controls prevent runaway execution at the process level.&lt;/p&gt;
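&lt;p&gt;The weight-based routing described above can be sketched with just the standard library. This is a hypothetical illustration of the general pattern (a dedicated single-thread executor standing in for the pinned core), not TokenGate's actual internals:&lt;/p&gt;

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch of weight-based routing, NOT TokenGate's real
# implementation: heavy work gets a dedicated single-thread executor
# (analogous to a pinned core), light work shares a wider pool.
heavy_pool = ThreadPoolExecutor(max_workers=1)   # "core 1" analogue
light_pool = ThreadPoolExecutor(max_workers=7)   # remaining cores

async def run_weighted(func, *args, weight: str = "light"):
    # Route the call to an executor based on its declared weight.
    loop = asyncio.get_running_loop()
    pool = heavy_pool if weight == "heavy" else light_pool
    return await loop.run_in_executor(pool, func, *args)

async def main():
    heavy = run_weighted(sum, range(1_000_000), weight="heavy")
    light = [run_weighted(len, "token") for _ in range(4)]
    # gather preserves argument order: heavy result first.
    return await asyncio.gather(heavy, *light)

if __name__ == "__main__":
    print(asyncio.run(main()))
```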

&lt;p&gt;Take a look at the repo: &lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/TavariAgent" rel="noopener noreferrer"&gt;
        TavariAgent
      &lt;/a&gt; / &lt;a href="https://github.com/TavariAgent/Py-TokenGate" rel="noopener noreferrer"&gt;
        Py-TokenGate
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Beta Python concurrency model using token-managed routing
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;TokenGate&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;Welcome to the TokenGate repository.&lt;/p&gt;




&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;What it is:&lt;/h3&gt;
&lt;/div&gt;

&lt;p&gt;A small experimental system for routing decorated synchronous functions&lt;br&gt;
through a token-managed concurrency model. It is intended to operate as&lt;br&gt;
its own concurrency workflow rather than alongside normal threading patterns.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;What it is not:&lt;/h3&gt;
&lt;/div&gt;
&lt;p&gt;It is not presented as production code.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Overview:&lt;/h3&gt;

&lt;/div&gt;
&lt;p&gt;TokenGate is an exploration of token-managed concurrency: a&lt;br&gt;
concept for coordinating async orchestration with thread-backed&lt;br&gt;
work in a structured way.&lt;/p&gt;
&lt;p&gt;This repository is &lt;strong&gt;a proof of concept, not a finished product&lt;/strong&gt;.&lt;br&gt;
It is experimental, still evolving, and shared in the spirit of&lt;br&gt;
exploration.&lt;/p&gt;
&lt;p&gt;If you'd like the fuller overview, please start here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/TavariAgent/Py-TokenGate/./DOCS/proof-of-concept.md" rel="noopener noreferrer"&gt;Proof of Concept&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If anything here is useful, interesting, or sparks an&lt;br&gt;
idea, that already makes this project worthwhile.&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;How to Use (Two Versions, Two Decorators)&lt;/h2&gt;

&lt;/div&gt;
&lt;blockquote&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Note: Do not attempt to decorate an async function.&lt;/h3&gt;

&lt;/div&gt;
&lt;div class="markdown-heading"&gt;
&lt;h4 class="heading-element"&gt;&lt;em&gt;The token decorator uses asyncio, but the decorated function itself should&lt;/em&gt;&lt;/h4&gt;…&lt;/div&gt;
&lt;/blockquote&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/TavariAgent/Py-TokenGate" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


</description>
      <category>programming</category>
      <category>python</category>
      <category>webdev</category>
      <category>performance</category>
    </item>
    <item>
      <title>Threading Async Together</title>
      <dc:creator>Joseph Boone</dc:creator>
      <pubDate>Mon, 16 Mar 2026 22:39:11 +0000</pubDate>
      <link>https://dev.to/tavari/threading-async-together-hf1</link>
      <guid>https://dev.to/tavari/threading-async-together-hf1</guid>
      <description>&lt;p&gt;Hello readers,&lt;/p&gt;

&lt;p&gt;I built a proof-of-concept application I call TokenGate. It’s a high-performance async/threaded event bus with control mechanisms designed to be extremely minimalist.&lt;/p&gt;

&lt;p&gt;The core concept is to produce parallelism in concurrent operations through async token gathering and coordinated threading workers.&lt;/p&gt;

&lt;p&gt;Here's what "TokenGate" uses to thread an operation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# -- Python 3.12 -- #
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;token_system&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;task_token_guard&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;operations_coordinator&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OperationsCoordinator&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Decorated standard synchronous function for threading
&lt;/span&gt;&lt;span class="nd"&gt;@task_token_guard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;operation_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;string_ops&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;weight&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;light&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;string_operation_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# This function is now threaded
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Starts the coordinator (through a running loop)
&lt;/span&gt;&lt;span class="n"&gt;coordinator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OperationsCoordinator&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;coordinator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# 3. finally or an exception stops on close
&lt;/span&gt;&lt;span class="n"&gt;coordinator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Task tokens are generated by a wrapped decorator.&lt;/p&gt;

&lt;p&gt;Here's some test results on operations in a "release mechanism" that dispatches batches of mixed tasks incrementally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CONCURRENCY BURST: Medium x8 | release 1464 (8 tasks)
======================================================================
  Submit spread (barrier jitter): 0.19ms
  Overall wall-clock:             0.009045s
  Min task duration:              0.007818s
  Max task duration:              0.008432s
  Mean task duration:             0.008148s
  Stdev (clustering indicator):   0.000218s

  Duration per task (tight clustering = true concurrency):
    Task 00: 0.007928s  
    Task 01: 0.008000s  
    Task 02: 0.008136s  
    Task 03: 0.008209s  
    Task 04: 0.008432s  
    Task 05: 0.008300s  
    Task 06: 0.008362s  
    Task 07: 0.007818s  

  Serial estimate (sum):  0.065186s
  Actual wall-clock:      0.009045s
  Concurrency ratio:      7.21x  (concurrent)

CONCURRENCY BURST [Medium x8 | release 1464] PASSED
======================================================================
CONCURRENCY WINDOW: Sustained mixed releases (30s)
======================================================================
  Releases:                       1484
  Total tasks:                    11872
  Overall wall-clock:             30.070291s
  Min task duration:              0.001157s
  Max task duration:              0.105874s
  Mean task duration:             0.014970s
  Stdev (clustering indicator):   0.025983s

  Serial estimate (sum):          177.728067s
  Actual wall-clock:              30.070291s
  Sustained concurrency ratio:    5.91x  (concurrent)

CONCURRENCY WINDOW [Sustained mixed releases (30s)] PASSED

CONCURRENCY SUITE COMPLETE.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(Concurrency ratios of up to 7.21x were observed on an 8-core CPU with ~32 dynamic workers &lt;em&gt;in ideal conditions&lt;/em&gt;, which is roughly 90% of the 8x concurrent-operation ceiling.)&lt;/p&gt;
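&lt;p&gt;The concurrency ratio reported above is simply the serial estimate (the sum of per-task durations) divided by the actual wall-clock time. A small helper, using the per-task figures from the burst test:&lt;/p&gt;

```python
def concurrency_ratio(task_durations, wall_clock: float) -> float:
    """Serial estimate (sum of per-task durations) over actual
    wall-clock time. 1.0 means fully serial; values near the
    worker count mean near-ideal concurrency."""
    return sum(task_durations) / wall_clock

# Per-task durations from the Medium x8 burst above:
durations = [0.007928, 0.008000, 0.008136, 0.008209,
             0.008432, 0.008300, 0.008362, 0.007818]
print(round(concurrency_ratio(durations, 0.009045), 2))  # prints 7.21
```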

&lt;p&gt;I've tested a wide variety of normally threaded operations, and results were delivered as expected.&lt;/p&gt;

&lt;p&gt;It's still just a proof of concept; however, I've used it in various side projects with good results.&lt;/p&gt;

&lt;p&gt;For anyone interested here's my project on GitHub (with proofs):&lt;/p&gt;

&lt;p&gt;Repo link - &lt;a href="https://github.com/TavariAgent/Py-TokenGate" rel="noopener noreferrer"&gt;https://github.com/TavariAgent/Py-TokenGate&lt;/a&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>opensource</category>
      <category>code</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
