<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Maksim G</title>
    <description>The latest articles on DEV Community by Maksim G (@maksim_g_5e9b4b7a29285350).</description>
    <link>https://dev.to/maksim_g_5e9b4b7a29285350</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3974603%2Fe22085ed-8695-439d-af5d-c07d4f89fe71.jpg</url>
      <title>DEV Community: Maksim G</title>
      <link>https://dev.to/maksim_g_5e9b4b7a29285350</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/maksim_g_5e9b4b7a29285350"/>
    <language>en</language>
    <item>
      <title>[Boost]</title>
      <dc:creator>Maksim G</dc:creator>
      <pubDate>Tue, 09 Jun 2026 07:38:35 +0000</pubDate>
      <link>https://dev.to/maksim_g_5e9b4b7a29285350/-14hl</link>
      <guid>https://dev.to/maksim_g_5e9b4b7a29285350/-14hl</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/maksim_g_5e9b4b7a29285350/machinelearning-nlp-python-onnx-51kp" class="crayons-story__hidden-navigation-link"&gt;I replaced an LLM with a 120 MB ONNX model to read YouTube comments&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/maksim_g_5e9b4b7a29285350" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3974603%2Fe22085ed-8695-439d-af5d-c07d4f89fe71.jpg" alt="maksim_g_5e9b4b7a29285350 profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/maksim_g_5e9b4b7a29285350" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Maksim G
            &lt;/a&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/maksim_g_5e9b4b7a29285350/machinelearning-nlp-python-onnx-51kp" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Jun 8&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/maksim_g_5e9b4b7a29285350/machinelearning-nlp-python-onnx-51kp" id="article-link-3850595"&gt;
          I replaced an LLM with a 120 MB ONNX model to read YouTube comments
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            5 min read
          &lt;/small&gt;
            
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
    <item>
      <title>I replaced an LLM with a 120 MB ONNX model to read YouTube comments</title>
      <dc:creator>Maksim G</dc:creator>
      <pubDate>Mon, 08 Jun 2026 17:41:41 +0000</pubDate>
      <link>https://dev.to/maksim_g_5e9b4b7a29285350/machinelearning-nlp-python-onnx-51kp</link>
      <guid>https://dev.to/maksim_g_5e9b4b7a29285350/machinelearning-nlp-python-onnx-51kp</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2xs22xtjga7jspuhhe83.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2xs22xtjga7jspuhhe83.png" alt=" " width="800" height="684"&gt;&lt;/a&gt;Likes and view counts tell you a video was &lt;em&gt;loud&lt;/em&gt;. They don't tell you whether the audience agreed, pushed back, or just showed up to argue. The signal for that is sitting right there in the comments — you just have to read a few hundred of them and keep score.&lt;/p&gt;

&lt;p&gt;So I built a tool that does exactly that. It's called &lt;a href="https://pjq.life" rel="noopener noreferrer"&gt;PJQ&lt;/a&gt; (Public Judgment Quotient): point it at a YouTube video, and it samples the comments, sorts every one into a stance, and gives you one honest verdict on how the audience actually received the thing.&lt;/p&gt;

&lt;p&gt;The first version ran every comment through an LLM. It worked, and it was also the wrong tool for the job. This is the story of replacing it with a fine-tuned model that fits in 120 MB and runs on a cheap CPU box with no API bill.&lt;/p&gt;

&lt;h2&gt;Why "sentiment" wasn't enough&lt;/h2&gt;

&lt;p&gt;Off-the-shelf sentiment analysis gives you positive / negative / neutral. That collapses the part that's actually interesting.&lt;/p&gt;

&lt;p&gt;A sentiment model scores "This is genuinely wrong and here's why" as negative and "first!! 🔥🔥" as positive-ish or neutral, and neither score says what the comment means for reception: the first is a substantive rebuttal, the second is noise. A coordinated spam wave looks like overwhelming support if you don't separate it out. So I threw away the 3-class scale and wrote a 7-stance rubric instead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SUP&lt;/strong&gt; — substantive support (actually agrees with the point)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AGA&lt;/strong&gt; — substantive disagreement (actually argues against it)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NEU&lt;/strong&gt; — neutral (a question, a clarification, a fact, no position)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OFF&lt;/strong&gt; — off-topic (about something else entirely)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;THIN&lt;/strong&gt; — thin positive (hype with no argument — "🔥 legend")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SUS&lt;/strong&gt; — inorganic (bot / paid / coordinated spam)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AGN&lt;/strong&gt; — own-agenda (a comment carrying the author's external goal: crypto shill, self-promo, account farming)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The reporting rule matters as much as the labels: &lt;strong&gt;mood and support are computed strictly from SUP / AGA / NEU.&lt;/strong&gt; THIN, OFF, SUS and AGN are tracked separately and never inflate the opinion numbers. That single decision is what stops a fan-noise flood or a bot ring from faking a 90%-positive verdict.&lt;/p&gt;
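
&lt;p&gt;To make the reporting rule concrete, here is a minimal sketch of how the split could be computed. The stance codes come from the rubric above; the function name &lt;code&gt;opinion_split&lt;/code&gt; and the toy label list are hypothetical and not taken from PJQ's code.&lt;/p&gt;

```python
from collections import Counter

# Stance codes from the rubric above. Grouping them into an "opinion" set and
# a "side" set follows the reporting rule; the names here are hypothetical.
OPINION = ("SUP", "AGA", "NEU")
SIDE = ("THIN", "OFF", "SUS", "AGN")


def opinion_split(stances):
    """Return (shares over SUP/AGA/NEU only, raw counts of the side labels)."""
    counts = Counter(stances)
    opinion_total = sum(counts[s] for s in OPINION)
    if opinion_total == 0:
        shares = {s: 0.0 for s in OPINION}
    else:
        shares = {s: counts[s] / opinion_total for s in OPINION}
    side = {s: counts[s] for s in SIDE}
    return shares, side


# 90 hype comments do not move the opinion numbers at all.
labels = ["SUP"] * 5 + ["AGA"] * 5 + ["THIN"] * 90
shares, side = opinion_split(labels)
print(shares)        # {'SUP': 0.5, 'AGA': 0.5, 'NEU': 0.0}
print(side["THIN"])  # 90
```

&lt;p&gt;Because the denominator only counts SUP, AGA and NEU, a flood of THIN or SUS comments changes the side counters and nothing else.&lt;/p&gt;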

&lt;h2&gt;The LLM version, and why it had to go&lt;/h2&gt;

&lt;p&gt;For the first cut I batched comments and asked an LLM to label each one against the rubric. Quality was good. Everything else was bad:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost scaled with every comment.&lt;/strong&gt; One video is 300–500 comments. Multiply that by every analysis and the per-call bill becomes a problem for the whole business model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It was a single point of failure.&lt;/strong&gt; One API key, one provider. If that account gets rate-limited or suspended, the product is just &lt;em&gt;down&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It was slow and non-deterministic.&lt;/strong&gt; Same comment, slightly different label across runs. Hard to test, hard to trust.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The task itself is narrow: short text in, one of seven labels out. That is a &lt;em&gt;classification&lt;/em&gt; problem, and classification problems don't need a 70-billion-parameter model reasoning from scratch every time. They need a model that has seen enough examples to know the boundary.&lt;/p&gt;

&lt;h2&gt;SetFit: few-shot fine-tuning without a GPU farm&lt;/h2&gt;

&lt;p&gt;I went with &lt;a href="https://github.com/huggingface/setfit" rel="noopener noreferrer"&gt;SetFit&lt;/a&gt;. It fine-tunes a sentence-transformer with contrastive learning, then trains a tiny classification head on top of the embeddings. It gets competitive accuracy from a few hundred labeled examples and trains on a laptop CPU in minutes — no GPU cluster, no prompt engineering.&lt;/p&gt;

&lt;p&gt;The architecture is deliberately boring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Body:&lt;/strong&gt; &lt;code&gt;paraphrase-multilingual-MiniLM-L12-v2&lt;/code&gt; — a 384-dimension multilingual encoder. Multilingual is non-negotiable here; the comments under one video can be Russian, English and Spanish in the same thread.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Head:&lt;/strong&gt; plain scikit-learn &lt;code&gt;LogisticRegression&lt;/code&gt; over the embeddings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Input trick:&lt;/strong&gt; I prepend the video title as context — &lt;code&gt;title[:200] + " [SEP] " + comment[:300]&lt;/code&gt;. The same three words mean different things under a physics lecture and a boxing highlight, and the title is cheap context that fixes a surprising number of edge cases.&lt;/li&gt;
&lt;/ul&gt;
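
&lt;p&gt;The input trick from the list above fits in one small function. The truncation limits and the separator are the ones quoted in the text; the helper name &lt;code&gt;build_input&lt;/code&gt; and the sample strings are hypothetical.&lt;/p&gt;

```python
# Truncation limits and the " [SEP] " separator are taken from the text above;
# the function name is a hypothetical helper.
TITLE_MAX = 200
COMMENT_MAX = 300


def build_input(title, comment):
    """Prepend the (truncated) video title to the (truncated) comment."""
    return title[:TITLE_MAX] + " [SEP] " + comment[:COMMENT_MAX]


text = build_input("Why the sky is blue", "this aged well")
print(text)  # Why the sky is blue [SEP] this aged well
```

&lt;p&gt;Slicing by characters keeps the helper trivial; the tokenizer still applies its own length limit afterwards.&lt;/p&gt;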

&lt;p&gt;Training uses a stratified 80/10/10 split by (label, language), and evaluation uses macro-F1, not accuracy. With seven imbalanced classes, accuracy lies to you: if 80% of the comments are NEU, a model that calls everything NEU is "80% accurate." Macro-F1 averages the per-class F1 scores with equal weight, so every class has to pull its weight. My acceptance bar was macro-F1 ≥ 0.72, with per-class precision/recall and a per-source breakdown so I could see &lt;em&gt;which&lt;/em&gt; stance was dragging.&lt;/p&gt;
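
&lt;p&gt;A small worked example shows why the metric choice matters. This is a dependency-free sketch with made-up toy labels (three classes instead of seven, to keep it short); it is not the evaluation code used for PJQ.&lt;/p&gt;

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1; a class with no true positives scores 0."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up toy data: 8 NEU comments, 1 SUP, 1 AGA, and a lazy model that
# predicts NEU for everything.
y_true = ["NEU"] * 8 + ["SUP", "AGA"]
y_pred = ["NEU"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                                                    # 0.8
print(round(macro_f1(y_true, y_pred, ["SUP", "AGA", "NEU"]), 3))  # 0.296
```

&lt;p&gt;The lazy model keeps its 0.8 accuracy but drops to roughly 0.3 on macro-F1, because two of the three classes contribute an F1 of zero.&lt;/p&gt;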

&lt;h2&gt;Exporting to ONNX so it runs anywhere&lt;/h2&gt;

&lt;p&gt;A fine-tuned model in a notebook is a demo. To put it in production cheaply I exported the whole thing to ONNX:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the sentence-transformer body via 🤗 &lt;strong&gt;Optimum&lt;/strong&gt; (&lt;code&gt;optimum[exporters-onnx]&lt;/code&gt;),&lt;/li&gt;
&lt;li&gt;the LogisticRegression head via &lt;strong&gt;skl2onnx&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At inference time the runtime is just &lt;code&gt;onnxruntime&lt;/code&gt; on CPU:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;tokenize → run the body ONNX graph → &lt;code&gt;last_hidden_state&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;mean-pool&lt;/strong&gt; over the token dimension using the attention mask (exactly like sentence-transformers does internally — get this wrong and your embeddings are quietly garbage)&lt;/li&gt;
&lt;li&gt;run the head ONNX graph → probabilities → argmax → stance
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# rough shape of the hot path
&lt;/span&gt;&lt;span class="n"&gt;emb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mean_pool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body_session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;attention_mask&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;probs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;head_session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;emb&lt;/span&gt;&lt;span class="p"&gt;})[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;stance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;STANCE_CODES&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;probs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;argmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
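
&lt;p&gt;The &lt;code&gt;mean_pool&lt;/code&gt; helper is not shown in the snippet above. Here is a minimal sketch of masked mean pooling, written with plain nested lists so it runs without dependencies; in a real pipeline the same arithmetic would more likely be a couple of array operations on the ONNX output. The shapes and the toy numbers are assumptions for illustration.&lt;/p&gt;

```python
def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sequence, skipping positions where the mask is 0.

    last_hidden_state: nested lists shaped [batch][tokens][dim]
    attention_mask:    nested lists shaped [batch][tokens], values 0 or 1
    """
    pooled = []
    for token_vectors, mask in zip(last_hidden_state, attention_mask):
        dim = len(token_vectors[0])
        total = [0.0] * dim
        for vector, keep in zip(token_vectors, mask):
            if keep:
                for i, value in enumerate(vector):
                    total[i] += value
        # Guard against an all-padding row to avoid dividing by zero.
        count = max(sum(mask), 1)
        pooled.append([value / count for value in total])
    return pooled


# One sequence, three token slots, the last one is padding.
hidden = [[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]]
mask = [[1, 1, 0]]
print(mean_pool(hidden, mask))  # [[2.0, 3.0]]
```

&lt;p&gt;Averaging over all three slots instead would give about 34.7 and 35.3 here, which is the "quietly garbage" failure mode: the shape is right, so nothing crashes, and only the values are wrong.&lt;/p&gt;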



&lt;p&gt;The payoff:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic.&lt;/strong&gt; Same comment, same label, every time. Now I can actually write tests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No subscription, no per-call cost.&lt;/strong&gt; It runs next to the app on a small VPS. The marginal cost of classifying a comment is some CPU cycles.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batchable and fast.&lt;/strong&gt; 32 comments a batch, a few hundred per video in seconds instead of minutes.&lt;/li&gt;
&lt;/ul&gt;
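
&lt;p&gt;Batching itself needs very little code. A minimal sketch, assuming the comments are already in a list; the batch size of 32 is the one mentioned above, and the helper name &lt;code&gt;batches&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
def batches(items, size=32):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


comments = ["comment %d" % i for i in range(70)]
print([len(b) for b in batches(comments)])  # [32, 32, 6]
```

&lt;p&gt;Each slice would be tokenized and sent through the two ONNX sessions in one call, so a few hundred comments become roughly a dozen inference calls.&lt;/p&gt;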

&lt;p&gt;The model directory ships its own &lt;code&gt;labels.json&lt;/code&gt; and a &lt;code&gt;manifest.json&lt;/code&gt; with the encoder name, embedding dim and rubric version, so the serving code refuses to load a model that doesn't match the rubric it was trained for. Small thing, saves you from a very confusing class of bug.&lt;/p&gt;
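
&lt;p&gt;A load-time guard of that kind could look roughly like this. The two file names come from the paragraph above; the manifest keys (&lt;code&gt;rubric_version&lt;/code&gt;, &lt;code&gt;embedding_dim&lt;/code&gt;), the expected values and the function name are assumptions made for the sketch, not PJQ's actual schema.&lt;/p&gt;

```python
import json
import tempfile
from pathlib import Path

# Hypothetical values the serving code was built against.
EXPECTED_RUBRIC_VERSION = "v1"
EXPECTED_EMBEDDING_DIM = 384


def load_labels(model_dir):
    """Return the label list, or raise if the manifest does not match."""
    model_dir = Path(model_dir)
    manifest = json.loads((model_dir / "manifest.json").read_text(encoding="utf-8"))
    if manifest.get("rubric_version") != EXPECTED_RUBRIC_VERSION:
        raise ValueError("rubric version mismatch: %r" % manifest.get("rubric_version"))
    if manifest.get("embedding_dim") != EXPECTED_EMBEDDING_DIM:
        raise ValueError("embedding dim mismatch: %r" % manifest.get("embedding_dim"))
    return json.loads((model_dir / "labels.json").read_text(encoding="utf-8"))


# Demo with a throwaway directory standing in for a real model folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "labels.json").write_text(json.dumps(["SUP", "AGA", "NEU"]), encoding="utf-8")
    (folder / "manifest.json").write_text(
        json.dumps({"rubric_version": "v0", "embedding_dim": 384}), encoding="utf-8"
    )
    try:
        load_labels(folder)
        outcome = "loaded"
    except ValueError as error:
        outcome = "refused: %s" % error
print(outcome)  # refused: rubric version mismatch: 'v0'
```

&lt;p&gt;Failing at load time is the point: a stale model raises immediately, before it can serve labels in the wrong order.&lt;/p&gt;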

&lt;h2&gt;What you get out the other end&lt;/h2&gt;

&lt;p&gt;Once every sampled comment has a stance, the verdict is just aggregation: the SUP/AGA/NEU split, a net-mood score, a controversy index (how evenly split SUP vs AGA is), and the bot/hype/off-topic counters kept strictly to the side. You can see live verdicts and an &lt;a href="https://pjq.life/trends" rel="noopener noreferrer"&gt;audience-trends dashboard across YouTube categories and regions&lt;/a&gt; without signing up for anything.&lt;/p&gt;
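
&lt;p&gt;The article does not give the formulas behind the net-mood score or the controversy index, so the sketch below uses two plausible definitions purely as assumptions: net mood as the SUP share minus the AGA share, and controversy as one minus the normalized gap between SUP and AGA. The function name and the counts are made up.&lt;/p&gt;

```python
def verdict(sup, aga, neu):
    """Summarise opinion counts. Both formulas below are assumptions."""
    opinion_total = sup + aga + neu
    argued = sup + aga
    # Net mood in [-1, 1]: share of support minus share of disagreement.
    net_mood = (sup - aga) / opinion_total if opinion_total else 0.0
    # Controversy in [0, 1]: 1.0 when SUP and AGA are tied, 0.0 when one side is absent.
    controversy = 1.0 - abs(sup - aga) / argued if argued else 0.0
    return {"net_mood": net_mood, "controversy": controversy}


print(verdict(sup=60, aga=20, neu=20))  # {'net_mood': 0.4, 'controversy': 0.5}
```

&lt;p&gt;Only the three opinion counts go in, so the bot, hype and off-topic counters cannot move either number.&lt;/p&gt;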

&lt;h2&gt;Takeaways if you're doing something similar&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reach for an LLM to &lt;em&gt;prototype&lt;/em&gt; a labeling task, not to &lt;em&gt;serve&lt;/em&gt; it.&lt;/strong&gt; Use it to bootstrap a labeled set, then distill that into a small model you own.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pick your metric before you celebrate.&lt;/strong&gt; Macro-F1 over accuracy the moment your classes are imbalanced.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ONNX is the boring bridge that makes "it works in my notebook" into "it runs on a $5 box."&lt;/strong&gt; Optimum + skl2onnx + onnxruntime, and you've cut the API cord.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Match the model to the shape of the problem.&lt;/strong&gt; A seven-way text classifier does not need to reason. It needs to have seen the boundary.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to see the classifier's verdicts on real videos, it's live at &lt;strong&gt;&lt;a href="https://pjq.life" rel="noopener noreferrer"&gt;pjq.life&lt;/a&gt;&lt;/strong&gt; — first analysis is free. Happy to answer anything about the rubric or the training setup in the comments.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
