<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: nanasi</title>
    <description>The latest articles on DEV Community by nanasi (@nanasi).</description>
    <link>https://dev.to/nanasi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3196651%2F1365ef99-9e3a-4623-9cbb-50c2af6b6cca.jpg</url>
      <title>DEV Community: nanasi</title>
      <link>https://dev.to/nanasi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nanasi"/>
    <language>en</language>
    <item>
      <title>Building a Tokenizer 9.5x Faster than SentencePiece Unigram in Pure Rust 🦀</title>
      <dc:creator>nanasi</dc:creator>
      <pubDate>Fri, 29 May 2026 04:11:55 +0000</pubDate>
      <link>https://dev.to/nanasi/building-a-tokenizer-95x-faster-than-sentencepiece-unigram-in-pure-rust-bkl</link>
      <guid>https://dev.to/nanasi/building-a-tokenizer-95x-faster-than-sentencepiece-unigram-in-pure-rust-bkl</guid>
      <description>&lt;p&gt;Tokenization is one of those silent bottlenecks in the Large Language Model (LLM) world. While GPUs do the heavy lifting of running the model, the CPU is responsible for splitting raw text into token IDs. &lt;/p&gt;

&lt;p&gt;In particular, the &lt;strong&gt;Unigram tokenization algorithm&lt;/strong&gt; (popularized by Google's &lt;strong&gt;SentencePiece&lt;/strong&gt; and used in models like T5 and ALBERT) is notoriously CPU-intensive. During inference, it builds a lattice (a graph) of valid vocabulary matches and runs the &lt;strong&gt;Viterbi algorithm&lt;/strong&gt; (a dynamic programming search) to find the path that maximizes joint probability.&lt;/p&gt;

&lt;p&gt;While mathematically elegant, the graph search and its per-input working memory limit throughput.&lt;/p&gt;

&lt;p&gt;What if we could move that complexity from &lt;em&gt;runtime&lt;/em&gt; to &lt;em&gt;distillation time&lt;/em&gt;?&lt;/p&gt;

&lt;p&gt;Meet &lt;strong&gt;Tenya&lt;/strong&gt;, an experimental &lt;strong&gt;Compiled Distilled Unigram (CDU)&lt;/strong&gt; tokenizer written in Rust that tokenizes text at &lt;strong&gt;23 million tokens per second&lt;/strong&gt;, a &lt;strong&gt;9.5x speedup&lt;/strong&gt; over SentencePiece's core engine, with &lt;strong&gt;zero heap allocations&lt;/strong&gt; in the hot path.&lt;/p&gt;

&lt;p&gt;Here is how it works, the code behind it, and the performance trade-offs.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ The Bottleneck: Probabilistic Dynamic Programming
&lt;/h2&gt;

&lt;p&gt;Under standard SentencePiece Unigram, tokenizing a string requires evaluating multiple matching subwords. For a word like &lt;code&gt;tokenization&lt;/code&gt;, the dictionary might contain &lt;code&gt;token&lt;/code&gt;, &lt;code&gt;to&lt;/code&gt;, &lt;code&gt;ken&lt;/code&gt;, &lt;code&gt;iz&lt;/code&gt;, and &lt;code&gt;ation&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;To choose the best split, standard Unigram:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Builds a graph of possible cuts based on vocabulary overlaps.&lt;/li&gt;
&lt;li&gt;Assigns log-probability scores to each edge.&lt;/li&gt;
&lt;li&gt;Runs a Viterbi path-search to find the path with the highest joint likelihood.&lt;/li&gt;
&lt;/ol&gt;

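&lt;p&gt;Even with memory-optimized lattices, running this search for every input string adds per-call overhead: the lattice has to be built, a best score and back-pointer have to be kept for each position, and the winning path has to be traced back at the end.&lt;/p&gt;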
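
&lt;p&gt;As a rough sketch of step 3, the Viterbi pass fills in one best score per position of the input. The notation is illustrative and not taken from SentencePiece's source: V is the vocabulary, p(w) is the Unigram probability of piece w, |w| is its length, and best(i) is the score of the best segmentation of the first i positions:
$$\text{best}(0) = 0, \qquad \text{best}(i) = \max_{w \in V,\ w \text{ ends at } i} \big[\, \text{best}(i - |w|) + \log p(w) \,\big]$$&lt;/p&gt;

&lt;p&gt;Every position has to consider each vocabulary piece that ends there and remember which one won, so the final split is only known after a backward pass from the end of the string.&lt;/p&gt;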




&lt;h2&gt;
  
  
  🧠 The Tenya Solution: Compiled Distilled Unigrams (CDU)
&lt;/h2&gt;

&lt;p&gt;Tenya shifts this paradigm. It asks: &lt;strong&gt;What if we did the dynamic programming once during training/export, and used a simple, deterministic search during inference?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The architecture works in three phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Train&lt;/strong&gt;: Train a SentencePiece Unigram model (the "Teacher") on a corpus.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distill&lt;/strong&gt;: Tokenize the corpus with the teacher to count how often each vocabulary piece is actually selected. We then calculate a static priority score for each token:
$$\text{Priority} = \text{Teacher Log-Probability} + \alpha \cdot \ln(\text{Corpus Frequency} + 1)$$&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compile&lt;/strong&gt;: Export this vocabulary to a JSON file, which Tenya loads and flattens into a contiguous, cache-friendly prefix Trie in RAM.&lt;/li&gt;
&lt;/ol&gt;
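
&lt;p&gt;To make the scoring in step 2 concrete, here is a worked example with made-up numbers (none of them come from Tenya's actual vocabulary). Suppose a piece has a teacher log-probability of -8.0, the teacher selected it 999 times in the corpus, and α = 0.5:
$$\text{Priority} = -8.0 + 0.5 \cdot \ln(999 + 1) \approx -8.0 + 0.5 \cdot 6.908 \approx -4.55$$&lt;/p&gt;

&lt;p&gt;A piece with the same log-probability that the teacher never selected would stay at -8.0, because ln(0 + 1) = 0. The frequency term therefore lifts pieces the teacher actually uses above pieces that are merely present in the vocabulary.&lt;/p&gt;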

&lt;p&gt;At runtime, Tenya does a single forward pass over the bytes of the text, matching character sequences greedily using this flat Trie and resolving conflicts using the pre-computed priorities. &lt;/p&gt;
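
&lt;p&gt;Written out, the priority-based decision at each offset is a single argmax rather than a search over paths. In the illustrative notation below (not Tenya's own), M(o) is the set of vocabulary pieces that are prefixes of the input starting at byte offset o:
$$w^{*} = \arg\max_{w \in M(o)} \text{Priority}(w), \qquad o \leftarrow o + |w^{*}|$$&lt;/p&gt;

&lt;p&gt;Ties go to the longer piece, and if M(o) is empty a single byte is emitted through the byte fallback.&lt;/p&gt;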




&lt;h2&gt;
  
  
  🦀 Inside the Rust Core: Zero-Allocation Lookups
&lt;/h2&gt;

&lt;p&gt;To make Tenya run as fast as possible, we implemented it in Rust with &lt;strong&gt;zero heap allocations in the hot path&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;Instead of pointer-based nodes scattered across the heap, the prefix tree is compiled into a flat vector of nodes that refer to each other by index:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;TrieNode&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;token_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;u32&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;f64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;children&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;u32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// (byte, next_node_index)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since the children of each node are sorted by their byte values, transitioning to the next node is just a cache-friendly &lt;strong&gt;binary search&lt;/strong&gt; on a contiguous memory block.&lt;/p&gt;

&lt;p&gt;During tokenization, the main loop traverses this Trie byte-by-byte, resolving matches and pushing token IDs into a pre-allocated vector:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;encode_bytes_into&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;u32&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="nf"&gt;.len&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;len&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;best_match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;u32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;f64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;None&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.trie&lt;/span&gt;&lt;span class="nf"&gt;.common_prefix_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="n"&gt;token_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;match_len&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// Pick the match with the highest distilled priority score&lt;/span&gt;
            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;update&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="n"&gt;best_match&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nb"&gt;None&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;best_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;best_priority&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="n"&gt;priority&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;best_priority&lt;/span&gt; &lt;span class="p"&gt;||&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;priority&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;best_priority&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;match_len&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;best_len&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;};&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;update&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;best_match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;token_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;match_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;token_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;match_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;best_match&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="nf"&gt;.push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;match_len&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// Byte fallback for unknown bytes&lt;/span&gt;
            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;byte&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
            &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="nf"&gt;.push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.fallback&lt;/span&gt;&lt;span class="nf"&gt;.lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
            &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  🔄 Guaranteed 100% Reversibility
&lt;/h3&gt;

&lt;p&gt;Many greedy tokenizers lose spaces or fail on emojis. Tenya guarantees &lt;strong&gt;100% round-trip reversibility&lt;/strong&gt; by reserving the final 256 IDs in its vocabulary for &lt;strong&gt;explicit byte fallbacks&lt;/strong&gt;. If a character sequence is not in the Trie, it falls back to the exact bytes of the characters. When decoding, these bytes are reconstructed directly back into valid UTF-8.&lt;/p&gt;




&lt;h2&gt;
  
  
  🏁 Empirical Benchmarks
&lt;/h2&gt;

&lt;p&gt;We benchmarked Tenya against Google's SWIG-wrapped SentencePiece Unigram C++ library on a validation corpus consisting of mixed prose, numbers, and code:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;SentencePiece Unigram (Teacher)&lt;/th&gt;
&lt;th&gt;Tenya (LongestMatch)&lt;/th&gt;
&lt;th&gt;Tenya (PriorityMatch)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;p50 Latency (ms)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.415 ms&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0.047 ms&lt;/strong&gt; (8.8x faster)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0.124 ms&lt;/strong&gt; (3.3x faster)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;p95 Latency (ms)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.890 ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.056 ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.149 ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Throughput (MB/s)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3.67 MB/s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;32.21 MB/s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;12.25 MB/s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Throughput (Tokens/sec)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.67 M tokens/s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;23.30 M tokens/s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;11.28 M tokens/s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Teacher F1 Agreement&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100.00% (Baseline)&lt;/td&gt;
&lt;td&gt;38.57%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;43.80%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reversibility Rate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100.00%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100.00%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100.00%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Why is there a speed difference between strategies?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LongestMatch&lt;/strong&gt; is the fastest (&lt;strong&gt;23.30M tokens/s&lt;/strong&gt;). It takes large steps through the text, matching long words, which means it rarely restarts its search from the root of the Trie.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PriorityMatch&lt;/strong&gt; is slower (&lt;strong&gt;11.28M tokens/s&lt;/strong&gt;) but tracks the teacher's segmentation more closely (&lt;strong&gt;43.80% F1 agreement&lt;/strong&gt; versus 38.57% for LongestMatch). Because it prefers high-priority short sub-words (like common syllables), it has to restart its search from the root node more frequently.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ⚖️ The Trade-offs
&lt;/h2&gt;

&lt;p&gt;Is this a free lunch? No.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Vocabulary Fragmentation&lt;/strong&gt;: Because Tenya uses a greedy prefix matcher instead of dynamic programming path optimization, it tends to split words into noticeably smaller pieces. Tenya generates &lt;strong&gt;1.5x to 1.9x more tokens&lt;/strong&gt; than the teacher, so the same text takes up more of a model's context window.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not a Drop-in Replacement&lt;/strong&gt;: You cannot plug Tenya directly into an already-trained model such as LLaMA, because the token sequences it produces will not match the ones the model was trained on, resulting in garbage outputs.&lt;/li&gt;
&lt;/ol&gt;
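
&lt;p&gt;The fragmentation figure can be cross-checked against the benchmark table by dividing byte throughput by token throughput, which gives the average number of bytes each token covers. This is a back-of-the-envelope calculation that assumes 1 MB means one million bytes and that both throughput rows come from the same runs:
$$\frac{3.67}{1.67} \approx 2.20 \ \text{(teacher)}, \qquad \frac{32.21}{23.30} \approx 1.38 \ \text{(LongestMatch)}, \qquad \frac{12.25}{11.28} \approx 1.09 \ \text{(PriorityMatch)}$$&lt;/p&gt;

&lt;p&gt;That works out to roughly 2.20 / 1.38 ≈ 1.6x and 2.20 / 1.09 ≈ 2.0x as many tokens as the teacher, close to the 1.5x to 1.9x range quoted above.&lt;/p&gt;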

&lt;p&gt;&lt;strong&gt;Where Tenya excels&lt;/strong&gt; is when you are &lt;strong&gt;training a new model from scratch&lt;/strong&gt;, building high-throughput dataset loading pipelines, or running local inference on highly resource-constrained IoT/edge hardware.&lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 Try it yourself!
&lt;/h2&gt;

&lt;p&gt;The repository is open source and includes the entire Python training pipeline, the Rust core runtime, the CLI, and Criterion benchmark targets.&lt;/p&gt;

&lt;p&gt;Get started by cloning the repo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/spellsaif/tenya.git
&lt;span class="nb"&gt;cd &lt;/span&gt;tenya
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the pipeline to train, distill, and export:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set up venv&lt;/span&gt;
python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
pip &lt;span class="nb"&gt;install &lt;/span&gt;sentencepiece

&lt;span class="c"&gt;# Train teacher &amp;amp; distill policy&lt;/span&gt;
python3 python/train_teacher.py &lt;span class="nt"&gt;--input&lt;/span&gt; data/sample.txt &lt;span class="nt"&gt;--vocab-size&lt;/span&gt; 500 &lt;span class="nt"&gt;--model-prefix&lt;/span&gt; teacher
python3 python/distill_policy.py &lt;span class="nt"&gt;--input&lt;/span&gt; data/sample.txt &lt;span class="nt"&gt;--model&lt;/span&gt; teacher.model &lt;span class="nt"&gt;--output&lt;/span&gt; policy.json
python3 python/export_tenya_vocab.py &lt;span class="nt"&gt;--model&lt;/span&gt; teacher.model &lt;span class="nt"&gt;--policy&lt;/span&gt; policy.json &lt;span class="nt"&gt;--output&lt;/span&gt; vocab.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tokenize text using the Rust CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo run &lt;span class="nt"&gt;--release&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; tenya-cli &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nt"&gt;--vocab&lt;/span&gt; vocab.json encode &lt;span class="nt"&gt;--text&lt;/span&gt; &lt;span class="s2"&gt;"hello world"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check out the full repository here:&lt;br&gt;
👉 &lt;strong&gt;&lt;a href="https://github.com/spellsaif/tenya" rel="noopener noreferrer"&gt;github.com/spellsaif/tenya&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let me know what you think of this architecture in the comments!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rust</category>
      <category>llm</category>
      <category>programming</category>
    </item>
    <item>
      <title>I stopped trusting middleware for everything (almost)</title>
      <dc:creator>nanasi</dc:creator>
      <pubDate>Wed, 25 Mar 2026 06:06:06 +0000</pubDate>
      <link>https://dev.to/nanasi/i-stopped-trusting-middleware-for-everything-almost-4f2g</link>
      <guid>https://dev.to/nanasi/i-stopped-trusting-middleware-for-everything-almost-4f2g</guid>
      <description>&lt;p&gt;Not because middleware is bad.&lt;/p&gt;

&lt;p&gt;But because I was using it for things it was never meant to guarantee.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fho1rsecaeszw76vsihrv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fho1rsecaeszw76vsihrv.png" alt=" " width="400" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The pattern we all write
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;authMiddleware&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/me&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works. It’s simple. It’s familiar.&lt;/p&gt;

&lt;p&gt;But it relies on an assumption:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“&lt;code&gt;user&lt;/code&gt; will be there when I need it.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And nothing actually enforces that.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where things break
&lt;/h2&gt;

&lt;p&gt;A few very normal mistakes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You forget to apply middleware to a route&lt;/li&gt;
&lt;li&gt;You register it in the wrong order&lt;/li&gt;
&lt;li&gt;You refactor something and break the chain&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything still compiles. The app still runs.&lt;/p&gt;

&lt;p&gt;The failure shows up later — usually when it matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  Middleware isn’t the problem
&lt;/h2&gt;

&lt;p&gt;Frameworks like &lt;strong&gt;Hono&lt;/strong&gt; and &lt;strong&gt;Elysia&lt;/strong&gt; do middleware really well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hono keeps things minimal and close to Web standards&lt;/li&gt;
&lt;li&gt;Elysia pushes type safety further than most frameworks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And middleware itself is great for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;logging&lt;/li&gt;
&lt;li&gt;compression&lt;/li&gt;
&lt;li&gt;request/response transformations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s exactly what it’s designed for.&lt;/p&gt;




&lt;h2&gt;
  
  
  The real issue
&lt;/h2&gt;

&lt;p&gt;The problem is when we use middleware for something else:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;data dependencies&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When a handler depends on &lt;code&gt;user&lt;/code&gt;, that dependency is implicit.&lt;/p&gt;

&lt;p&gt;It’s not declared anywhere. It’s just assumed.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I tried instead
&lt;/h2&gt;

&lt;p&gt;Instead of replacing middleware, I separated concerns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Middleware → handles &lt;strong&gt;flow&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Relics → enforce &lt;strong&gt;what must exist&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Defining a contract
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;UserCtx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;authRelic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;relic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;UserCtx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;err&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;Unauthorized&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I provide &lt;code&gt;UserCtx&lt;/code&gt;, or I fail.”&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Using it in routes
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;authRelic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/me&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;relic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;UserCtx&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the handler doesn’t assume anything.&lt;/p&gt;

&lt;p&gt;If it runs, the dependency is already satisfied.&lt;/p&gt;




&lt;h2&gt;
  
  
  What changed
&lt;/h2&gt;

&lt;p&gt;Before:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“this should exist”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“this must exist”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And if nothing provides it, the app fails at startup instead of on a live request.&lt;/p&gt;




&lt;h2&gt;
  
  
  Comparing approaches (in good faith)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concern&lt;/th&gt;
&lt;th&gt;Middleware (Hono / Elysia)&lt;/th&gt;
&lt;th&gt;Relics (Tomoe)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Flow control&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Not the goal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-cutting concerns&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Not the goal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data dependencies&lt;/td&gt;
&lt;td&gt;Implicit&lt;/td&gt;
&lt;td&gt;Explicit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Guarantees&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Enforced&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Failure timing&lt;/td&gt;
&lt;td&gt;Runtime&lt;/td&gt;
&lt;td&gt;Startup&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;They’re not competing tools — they solve different problems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why this felt better
&lt;/h2&gt;

&lt;p&gt;Most of my bugs weren’t about routing or performance.&lt;/p&gt;

&lt;p&gt;They were about assumptions.&lt;/p&gt;

&lt;p&gt;Middleware made those assumptions easy to write.&lt;/p&gt;

&lt;p&gt;Relics made them explicit.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;I didn’t replace middleware.&lt;/p&gt;

&lt;p&gt;I stopped asking it to do something it was never designed to do.&lt;/p&gt;




&lt;p&gt;I’m building this idea into a small framework called &lt;strong&gt;Tomoe&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Still early, but I’d love feedback:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/Project-Tomoe/tomoe" rel="noopener noreferrer"&gt;https://github.com/Project-Tomoe/tomoe&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>javascript</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
