<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sergei Vaskov</title>
    <description>The latest articles on DEV Community by Sergei Vaskov (@sergeivaskov).</description>
    <link>https://dev.to/sergeivaskov</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3907631%2F6e815ca7-8019-404b-9f08-fff88c942390.jpg</url>
      <title>DEV Community: Sergei Vaskov</title>
      <link>https://dev.to/sergeivaskov</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sergeivaskov"/>
    <language>en</language>
    <item>
      <title>How I Save 80% of Neural Network Context When Working with Logs</title>
      <dc:creator>Sergei Vaskov</dc:creator>
      <pubDate>Fri, 01 May 2026 13:58:42 +0000</pubDate>
      <link>https://dev.to/sergeivaskov/how-i-save-80-of-neural-network-context-when-working-with-logs-3c48</link>
      <guid>https://dev.to/sergeivaskov/how-i-save-80-of-neural-network-context-when-working-with-logs-3c48</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;I actively use AI to debug issues in my project development. I ran into an annoying problem: logs typically contain massive amounts of repetitive information. This eats up a huge chunk of input context without adding any real value.&lt;/p&gt;

&lt;p&gt;Here's a typical scenario: I notice a bug, copy 1000 lines of logs, and paste them into the chat. The neural network starts parsing the text, but out of those 1000 lines, only about 300 are actually unique. The rest is the same template repeated hundreds of times:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2026-04-18T11:54:02.746 [WinFocusMonitor] computeIsSecureField: queryMsaaProtected == 0   
2026-04-18T11:54:02.747 [WinFocusMonitor] workerMain: hwnd=00000000016000FE idObject=-4 idChild=0 -&amp;gt; secure=0
2026-04-18T11:54:02.765 [WinFocusMonitor] computeIsSecureField: queryMsaaProtected == 0   
2026-04-18T11:54:02.765 [WinFocusMonitor] workerMain: hwnd=00000000003B0640 idObject=-4 idChild=-45107 -&amp;gt; secure=0
2026-04-18T11:54:04.787 [MemDiag] periodic t+9s WorkingSet=352.257812MB Peak=368.761719MB 
2026-04-18T11:54:04.788 [MemDiag] WordIndex en_US READY at t+9s
2026-04-18T11:54:04.788 [MemDiag] after  WordIndex en_US READY WorkingSet=352.320312MB Peak=368.761719MB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see the same components (&lt;code&gt;[WinFocusMonitor]&lt;/code&gt;, &lt;code&gt;[MemDiag]&lt;/code&gt;), the same keys (&lt;code&gt;WorkingSet=&lt;/code&gt;, &lt;code&gt;hwnd=&lt;/code&gt;), the same phrases over and over.&lt;/p&gt;

&lt;p&gt;The idea: copy logs to the clipboard the standard way, then press Ctrl+Alt+V to paste an optimized version. With this simple background utility, I achieved log compression of up to 80%, depending on the volume and the number of repetitions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stage 1: Basic Idea — Dictionary of Repetitions
&lt;/h2&gt;

&lt;p&gt;My first thought was simple: if something repeats, I'll extract it into a "legend" (dictionary) and replace it with a short tag. I asked an AI to write a PowerShell script that analyzes logs and extracts repeating elements into tags like &lt;code&gt;#1#&lt;/code&gt;, &lt;code&gt;#2#&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here's what those same logs look like after the first version of the script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;--- &lt;span class="n"&gt;LEGEND&lt;/span&gt; ---
&lt;span class="c"&gt;#T# = 2026-04-18T11:54:
#0# = [WinFocusMonitor]
#1# = [MemDiag]
#2# = hwnd=
#3# = idObject=
#4# = idChild=
#5# = secure=
#6# = WorkingSet=
#7# = Peak=
#8# = ' computeIsSecureField: queryMsaaProtected == 0'
#9# = ' WordIndex '
&lt;/span&gt;
--- &lt;span class="n"&gt;LOGS&lt;/span&gt; ---
&lt;span class="c"&gt;#T#02.746 #0##8#
#T#02.747 #0# workerMain: #2#00000000016000FE #3#-4 #4#0 -&amp;gt; #5#0
#T#02.765 #0##8#
#T#02.765 #0# workerMain: #2#00000000003B0640 #3#-4 #4#-45107 -&amp;gt; #5#0
#T#04.787 #1# periodic t+9s #6#352.257812MB #7#368.761719MB 
#T#04.788 #1# #9#en_US READY at t+9s
#T#04.788 #1# after  #9#en_US READY #6#352.320312MB #7#368.761719MB
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All basic repeating elements are extracted into the legend at the top.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result of stage one:&lt;/strong&gt; 4072 characters → 2625 characters. &lt;strong&gt;Savings&lt;/strong&gt;: ~35%.&lt;/p&gt;
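&lt;p&gt;To make the mechanism concrete, here is a minimal Python sketch of the Stage 1 idea. This is a toy, not the author's implementation (that was a PowerShell script, later rewritten in Rust): it only tokenizes on whitespace, while the real tool extracts arbitrary repeated substrings.&lt;/p&gt;

```python
import re
from collections import Counter

def compress(log_text, min_count=3, min_len=4):
    """Toy Stage 1: replace repeated whitespace-delimited tokens
    with short #N# tags and prepend a legend."""
    counts = Counter(re.findall(r"\S+", log_text))
    # Longest tokens first, so a token is never mangled by the
    # earlier replacement of a shorter token it contains.
    repeated = sorted(
        (t for t, c in counts.items() if c >= min_count and len(t) >= min_len),
        key=len, reverse=True)
    legend = {}
    body = log_text
    for i, token in enumerate(repeated):
        tag = "#%d#" % i
        legend[tag] = token
        body = body.replace(token, tag)
    legend_lines = "\n".join("%s = %s" % (tag, tok) for tag, tok in legend.items())
    return "--- LEGEND ---\n" + legend_lines + "\n--- LOGS ---\n" + body

def decompress(packed):
    """Reverse the substitution exactly -- the compression is lossless."""
    legend_part, body = packed.split("\n--- LOGS ---\n")
    for line in legend_part.splitlines():
        if line.startswith("#") and " = " in line:
            tag, token = line.split(" = ", 1)
            body = body.replace(tag, token)
    return body
```

&lt;p&gt;Because each tag is substituted back verbatim, &lt;code&gt;decompress(compress(text))&lt;/code&gt; returns the original text unchanged, which is what makes the approach lossless.&lt;/p&gt;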

&lt;h2&gt;
  
  
  Stage 2: Tag Optimization with Base62
&lt;/h2&gt;

&lt;p&gt;When you have many variables, tags start looking like &lt;code&gt;#123&lt;/code&gt;, &lt;code&gt;#456&lt;/code&gt; — that's already 4-5 characters per tag.&lt;/p&gt;

&lt;p&gt;This can be optimized by adding lowercase and uppercase letters to tags. In this system, the first 62 variables can be written in just &lt;strong&gt;two characters&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;#0&lt;/code&gt;, &lt;code&gt;#1&lt;/code&gt;, &lt;code&gt;#2&lt;/code&gt;... &lt;code&gt;#9&lt;/code&gt; (10 items)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;#a&lt;/code&gt;, &lt;code&gt;#b&lt;/code&gt;, &lt;code&gt;#c&lt;/code&gt;... &lt;code&gt;#z&lt;/code&gt; (26 items)
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;#A&lt;/code&gt;, &lt;code&gt;#B&lt;/code&gt;, &lt;code&gt;#C&lt;/code&gt;... &lt;code&gt;#Z&lt;/code&gt; (26 items)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's 62 variables in two characters total. Beyond that come &lt;code&gt;#10&lt;/code&gt;, &lt;code&gt;#11&lt;/code&gt; and so on, but in three characters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;--- &lt;span class="n"&gt;LEGEND&lt;/span&gt; ---
&lt;span class="c"&gt;#T# = 2026-04-18T11:54:
#0# = [WinFocusMonitor]
#1# = [MemDiag]
#2# = hwnd=
#3# = idObject=
#4# = idChild=
#5# = secure=
#6# = WorkingSet=
#7# = Peak=
#8# = ' computeIsSecureField: queryMsaaProtected == 0'
#9# = ' WordIndex '
&lt;/span&gt;
--- &lt;span class="n"&gt;LOGS&lt;/span&gt; ---
&lt;span class="c"&gt;#T#02.746 #0##8#   
#T#02.747 #0# workerMain: #2#16000FE #3#-4 #4#0 -&amp;gt; #5#0
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At larger scales, this can provide &lt;strong&gt;an additional 3-5% compression&lt;/strong&gt;.&lt;/p&gt;
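&lt;p&gt;The base62 numbering above can be sketched as a small helper (the function name is mine, not the tool's):&lt;/p&gt;

```python
# Digits, then lowercase, then uppercase -- the order used in the article.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def tag_symbol(index):
    """Encode a legend index in base62, so indices 0-61 fit in a single
    symbol and only index 62 and up need a second one."""
    if index == 0:
        return ALPHABET[0]
    digits = ""
    while index:
        digits = ALPHABET[index % 62] + digits
        index //= 62
    return digits
```

&lt;p&gt;Wrapped in the tag delimiters, index 10 becomes &lt;code&gt;#a#&lt;/code&gt;, index 36 becomes &lt;code&gt;#A#&lt;/code&gt;, and index 62 rolls over to &lt;code&gt;#10#&lt;/code&gt;.&lt;/p&gt;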

&lt;h2&gt;
  
  
  Stage 3: Removing Leading Zeros
&lt;/h2&gt;

&lt;p&gt;Another detail that catches the eye in C++ logs — hex pointers with fixed width. The Windows API returns HWND as &lt;code&gt;00000000016000FE&lt;/code&gt; — eight leading zeros that carry no meaningful information.&lt;/p&gt;

&lt;p&gt;I optimized this by trimming leading zeros. What was &lt;code&gt;00000000016000FE&lt;/code&gt; became &lt;code&gt;16000FE&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;--- &lt;span class="n"&gt;LOGS&lt;/span&gt; ---
&lt;span class="c"&gt;#T#02.746 #0##8#   
#T#02.747 #0# workerMain: #2#16000FE #3#-4 #4#0 -&amp;gt; #5#0
#T#02.765 #0##8#
#T#02.765 #0# workerMain: #2#3B0640 #3#-4 #4#-45107 -&amp;gt; #5#0
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On 45 KB of logs, this saved about 300 characters. Not much, but still something.&lt;/p&gt;
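&lt;p&gt;A sketch of the trimming step. Matching exactly 16 hex digits (the printed width of a 64-bit handle) is my assumption about the rule; it conveniently avoids touching timestamps and ordinary small numbers:&lt;/p&gt;

```python
import re

# A 16-character fixed-width hex token, e.g. 00000000016000FE.
FIXED_HEX = re.compile(r"\b[0-9A-F]{16}\b")

def trim_hex_zeros(line):
    """Drop leading zeros from fixed-width hex values, keeping a lone 0
    when the whole value is zero."""
    return FIXED_HEX.sub(lambda m: m.group(0).lstrip("0") or "0", line)
```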

&lt;h2&gt;
  
  
  Stage 4: Meta-BPE — Tags Made of Tags
&lt;/h2&gt;

&lt;p&gt;Working with larger logs revealed duplicates among the tags themselves:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#T#19.557 #0##C##8##81# #90#
#T#19.557 #0##K##8##81# #90#
#T#19.601 #0##U##8##81# #90#
#T#19.601 #0##C##8##32# #91#
#T#19.601 #0##w##8##32# #91#
#T#19.601 #0##x##8##32# #91#
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The prefix &lt;code&gt;#T#19.557 #0#&lt;/code&gt; repeats several times in a row. Now we need to find repeating sequences of tags and extract them into new variables.&lt;/p&gt;

&lt;p&gt;I added an additional processing cycle that works on top of the regular one. I started marking new variables with (&lt;code&gt;!&lt;/code&gt;) instead of (&lt;code&gt;#&lt;/code&gt;) to differentiate them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;--- &lt;span class="n"&gt;LEGEND&lt;/span&gt; ---
&lt;span class="c"&gt;#T# = 2026-04-18T11:54:
#0# = [TTDiag]
#8# = vk=
#32# = 881
#81# = 890
#C# = ' &amp;gt; shouldBlockAllFeatures '
#K# = ' &amp;gt; drainPendingReset '
#U# = ' &amp;gt; entry '
#w# = ' &amp;gt; executePendingReplacementIfAny '
#x# = ' &amp;gt; hotkeyManager.processKeyEvent '
&lt;/span&gt;
!&lt;span class="m"&gt;1&lt;/span&gt;! = &lt;span class="c"&gt;#T#19.557 #0##C##8#
&lt;/span&gt;!&lt;span class="m"&gt;2&lt;/span&gt;! = &lt;span class="c"&gt;#T#19.601 #0#
&lt;/span&gt;!&lt;span class="m"&gt;3&lt;/span&gt;! = &lt;span class="c"&gt;#8##32# #91#
&lt;/span&gt;
--- &lt;span class="n"&gt;LOGS&lt;/span&gt; ---
!&lt;span class="m"&gt;1&lt;/span&gt;!&lt;span class="c"&gt;#81# #90#
&lt;/span&gt;!&lt;span class="m"&gt;1&lt;/span&gt;!&lt;span class="c"&gt;#K##81# #90#
&lt;/span&gt;!&lt;span class="m"&gt;2&lt;/span&gt;!&lt;span class="c"&gt;#U##81# #90#
&lt;/span&gt;!&lt;span class="m"&gt;2&lt;/span&gt;!&lt;span class="c"&gt;#C#!3!
&lt;/span&gt;!&lt;span class="m"&gt;2&lt;/span&gt;!&lt;span class="c"&gt;#w#!3!
&lt;/span&gt;!&lt;span class="m"&gt;2&lt;/span&gt;!&lt;span class="c"&gt;#x#!3!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now regular tags combine into meta-tags.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result of stage four:&lt;/strong&gt; &lt;strong&gt;45,839&lt;/strong&gt; characters → &lt;strong&gt;13,398&lt;/strong&gt; characters. &lt;strong&gt;Compression of 70.8%&lt;/strong&gt; from original.&lt;/p&gt;
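&lt;p&gt;A BPE-style sketch of this meta-tag pass, again under my own simplifications: it treats each line as whitespace-separated tokens (the real tool merges adjacent tags with no spaces between them) and repeatedly folds the most frequent adjacent pair into a new &lt;code&gt;!N!&lt;/code&gt; tag until nothing repeats often enough.&lt;/p&gt;

```python
from collections import Counter

def merge_tag_pairs(lines, min_count=2, max_merges=50):
    """Repeatedly fold the most frequent adjacent token pair into a
    new !N! meta-tag, BPE-style, over whitespace-split tokens."""
    seqs = [line.split() for line in lines]
    meta = {}  # meta-tag -> the pair of tokens it replaces
    for n in range(1, max_merges + 1):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if min_count > count:
            break
        tag = "!%d!" % n
        meta[tag] = (a, b)
        merged = []
        for seq in seqs:
            out, i = [], 0
            while len(seq) > i:
                if len(seq) > i + 1 and (seq[i], seq[i + 1]) == (a, b):
                    out.append(tag)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            merged.append(out)
        seqs = merged
    return [" ".join(s) for s in seqs], meta

def expand(line, meta):
    """Undo the merges, demonstrating the pass is lossless."""
    toks, changed = line.split(), True
    while changed:
        changed, out = False, []
        for t in toks:
            if t in meta:
                out.extend(meta[t])
                changed = True
            else:
                out.append(t)
        toks = out
    return " ".join(toks)
```

&lt;p&gt;Note that a meta-tag can itself appear inside a later pair, which is exactly how "tags made of tags" arise.&lt;/p&gt;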

&lt;h2&gt;
  
  
  Stage 5: Macros with Substitution
&lt;/h2&gt;

&lt;p&gt;Further work revealed the following patterns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!26!#o#!3!
!26!#E#!3!
!26!#K#!3!
!26!#D#!3!
!26!#w#!3!
!26!#z#!3!
!26!#x#!3!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here only &lt;strong&gt;one middle element&lt;/strong&gt; changes! The prefix &lt;code&gt;!26!&lt;/code&gt; and suffix &lt;code&gt;!3!&lt;/code&gt; are identical in all lines. We can extract this construction into the pattern &lt;code&gt;!26!#@#!3!&lt;/code&gt;, where &lt;code&gt;@&lt;/code&gt; is a placeholder for substitution; the actual value in the final macro is passed after &lt;code&gt;:&lt;/code&gt;, for example &lt;code&gt;&amp;amp;1:o&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;--- &lt;span class="n"&gt;LEGEND&lt;/span&gt; ---
&lt;span class="c"&gt;#o# = ' &amp;gt; shouldExecute '
#E# = ' &amp;gt; hotkeyDefs-&amp;gt;snapshot '
#K# = ' &amp;gt; drainPendingReset '
#D# = ' &amp;gt; handleCaseSwitchHotkey '
#w# = ' &amp;gt; executePendingReplacementIfAny '
#z# = ' &amp;gt; handleTranslationHotkey '
#x# = ' &amp;gt; hotkeyManager.processKeyEvent '
&lt;/span&gt;!&lt;span class="m"&gt;26&lt;/span&gt;! = &lt;span class="c"&gt;#T#1d#0
&lt;/span&gt;!&lt;span class="m"&gt;3&lt;/span&gt;! = &lt;span class="c"&gt;#881 #90
&lt;/span&gt;
&amp;amp;&lt;span class="m"&gt;1&lt;/span&gt; = !&lt;span class="m"&gt;26&lt;/span&gt;!&lt;span class="c"&gt;#@#!3!
&lt;/span&gt;
--- &lt;span class="n"&gt;LOGS&lt;/span&gt; ---
&amp;amp;&lt;span class="m"&gt;1&lt;/span&gt;:&lt;span class="n"&gt;o&lt;/span&gt;
&amp;amp;&lt;span class="m"&gt;1&lt;/span&gt;:&lt;span class="n"&gt;E&lt;/span&gt;
&amp;amp;&lt;span class="m"&gt;1&lt;/span&gt;:&lt;span class="n"&gt;K&lt;/span&gt;
&amp;amp;&lt;span class="m"&gt;1&lt;/span&gt;:&lt;span class="n"&gt;D&lt;/span&gt;
&amp;amp;&lt;span class="m"&gt;1&lt;/span&gt;:&lt;span class="n"&gt;w&lt;/span&gt;
&amp;amp;&lt;span class="m"&gt;1&lt;/span&gt;:&lt;span class="n"&gt;z&lt;/span&gt;
&amp;amp;&lt;span class="m"&gt;1&lt;/span&gt;:&lt;span class="n"&gt;x&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the logs have turned into a &lt;strong&gt;pure enumeration of legend references&lt;/strong&gt;. Each line &lt;code&gt;&amp;amp;1:o&lt;/code&gt; reads as: "Take template &lt;code&gt;&amp;amp;1&lt;/code&gt; (which expands to &lt;code&gt;!26!#@#!3!&lt;/code&gt;), substitute &lt;code&gt;o&lt;/code&gt; for &lt;code&gt;@&lt;/code&gt;, and you get the original log line."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result of stage five:&lt;/strong&gt; &lt;strong&gt;45,839&lt;/strong&gt; characters → &lt;strong&gt;10,192&lt;/strong&gt; characters. &lt;strong&gt;Final compression of 78.4%&lt;/strong&gt;!&lt;/p&gt;
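&lt;p&gt;A sketch of the macro step. The detection rule here (lines identical except for a single &lt;code&gt;#x#&lt;/code&gt; tag) is my simplification, and the sketch names macros &lt;code&gt;M1&lt;/code&gt;, &lt;code&gt;M2&lt;/code&gt; rather than the tool's &lt;code&gt;&amp;amp;1&lt;/code&gt;, &lt;code&gt;&amp;amp;2&lt;/code&gt;:&lt;/p&gt;

```python
import re
from collections import defaultdict

TAG = re.compile(r"#\w+#")

def macroize(lines, min_count=3):
    """Fold groups of lines that differ only in one #x# tag into
    'TEMPLATE:value' calls, with @ as the placeholder in the template."""
    groups = defaultdict(int)
    keys = []
    for line in lines:
        m = TAG.search(line)
        key = (line[:m.start()], line[m.end():]) if m else None
        keys.append((key, m))
        if key is not None:
            groups[key] += 1
    macros, out = {}, []
    for line, (key, m) in zip(lines, keys):
        if key is not None and groups[key] >= min_count:
            if key not in macros:
                macros[key] = "M%d" % (len(macros) + 1)
            # Emit the macro call: template name, colon, bare tag symbol.
            out.append("%s:%s" % (macros[key], m.group(0).strip("#")))
        else:
            out.append(line)
    legend = {name: "%s#@#%s" % key for key, name in macros.items()}
    return out, legend
```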

&lt;h2&gt;
  
  
  Bonus: Performance
&lt;/h2&gt;

&lt;p&gt;PowerShell is great for quick automation on the fly, but it's slow at loops over string arrays and nested iterations. This caused paste delays of up to 20 seconds. Eventually, an AI rewrote everything for me in Rust, and now pasting happens instantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Does This Work with Neural Networks?
&lt;/h2&gt;

&lt;p&gt;The question arises: "Will neural networks understand this format? Won't the model get confused by these tags?"&lt;/p&gt;

&lt;p&gt;The answer: not only do they understand it, they actually work with it even &lt;strong&gt;better&lt;/strong&gt; than with raw logs.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. LLMs Are Trained on Structured Data
&lt;/h3&gt;

&lt;p&gt;Modern neural networks are trained on code, JSON, YAML, markdown. When a model sees a &lt;code&gt;--- LEGEND ---&lt;/code&gt; block with "key = value" pairs, it automatically understands: this is a dictionary, I need to keep it in context. You don't need to explain how to work with this format — it "expands" the tags in its mind on its own.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Focus on Anomalies
&lt;/h3&gt;

&lt;p&gt;This is the most important part. When you paste 1000 lines of near-identical text like &lt;code&gt;[WinFocusMonitor] workerMain: hwnd=...&lt;/code&gt; into the chat, the model's attention mechanism gets "smeared" across repetitive noise.&lt;/p&gt;

&lt;p&gt;In compressed format, the model sees:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;amp;24:o
&amp;amp;24:E
&amp;amp;24:K
&amp;amp;24:D
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a neural network, this is the perfect signal: &lt;em&gt;"Aha, the line structure is identical, only one parameter changes. Let me check the legend to see what &lt;code&gt;o&lt;/code&gt;, &lt;code&gt;E&lt;/code&gt;, &lt;code&gt;K&lt;/code&gt;, &lt;code&gt;D&lt;/code&gt; mean, and focus on the differences"&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Errors and anomalies in this format literally &lt;strong&gt;shine&lt;/strong&gt;. If suddenly among ten &lt;code&gt;&amp;amp;24:K&lt;/code&gt; entries, &lt;code&gt;&amp;amp;24:X&lt;/code&gt; appears or even &lt;code&gt;&amp;amp;25:K&lt;/code&gt; — the model will instantly spot the pattern violation.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Context Window Savings
&lt;/h3&gt;

&lt;p&gt;Even the most advanced models have a problem: the larger the context, the more "forgetfulness."&lt;/p&gt;

&lt;p&gt;By saving up to 80% of characters on logs, you leave the model more "working memory" for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retaining your project's architecture&lt;/li&gt;
&lt;li&gt;Chat history&lt;/li&gt;
&lt;li&gt;Writing quality code in responses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model won't "forget" the task context due to bloated logs.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. No Hallucinations
&lt;/h3&gt;

&lt;p&gt;The key difference between this approach and "asking the AI to shorten the log" is that we compress &lt;strong&gt;mathematically&lt;/strong&gt;, not through paraphrasing. All milliseconds, hex addresses, and error codes remain in place. The substitution is fully reversible, so the BPE-style algorithm guarantees zero data loss.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Use?
&lt;/h2&gt;

&lt;p&gt;The application is currently available only for Windows and Mac. It works as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Copy raw logs from terminal/file (&lt;code&gt;Ctrl+C&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Switch to Cursor or ChatGPT/Claude web interface&lt;/li&gt;
&lt;li&gt;Press &lt;code&gt;Ctrl+Alt+V&lt;/code&gt; instead of regular &lt;code&gt;Ctrl+V&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Compressed logs are pasted&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bonus:&lt;/strong&gt; original logs are automatically restored to clipboard (in case you need to paste the original somewhere else)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;a href="https://github.com/sergeivaskov/logs-tokenizer" rel="noopener noreferrer"&gt;project source code&lt;/a&gt; is available on GitHub. You can port this algorithm to your platform in a couple of minutes, as well as adapt it to your needs.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rust</category>
      <category>vibecoding</category>
      <category>automation</category>
    </item>
  </channel>
</rss>
