<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: as1as</title>
    <description>The latest articles on DEV Community by as1as (@as1as).</description>
    <link>https://dev.to/as1as</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3803418%2F7b9d16a6-b037-4847-aa90-5ed9c1c7fd99.png</url>
      <title>DEV Community: as1as</title>
      <link>https://dev.to/as1as</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/as1as"/>
    <language>en</language>
    <item>
      <title>A Pattern Sketch: Server-Sent Events as a Fanout Channel for Edge State</title>
      <dc:creator>as1as</dc:creator>
      <pubDate>Mon, 13 Apr 2026 02:19:51 +0000</pubDate>
      <link>https://dev.to/as1as/a-pattern-sketch-server-sent-events-as-a-fanout-channel-for-edge-state-2g6m</link>
      <guid>https://dev.to/as1as/a-pattern-sketch-server-sent-events-as-a-fanout-channel-for-edge-state-2g6m</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What this is:&lt;/strong&gt; a small OSS pattern sketch — not a Redis replacement, not a production auth platform. I built it to play with one specific question: &lt;em&gt;"if you only need to push small mutations from one writer to many readers, do you actually need Redis?"&lt;/em&gt; Sharing the design and the trade-offs in case the pattern is useful to anyone.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Repo: &lt;strong&gt;&lt;a href="https://github.com/as1as/sse-edge-auth" rel="noopener noreferrer"&gt;github.com/as1as/sse-edge-auth&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The shape of the problem
&lt;/h2&gt;

&lt;p&gt;The goal here isn't &lt;em&gt;don't use Redis&lt;/em&gt;. It's &lt;em&gt;what does this problem look like when you strip it down to the minimum pieces&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;A common edge-auth setup has many edge nodes in front of an origin, all needing to agree on things like "is this IP banned?" or "is this JWT revoked?". The default answer is Redis — every edge queries the same shared store.&lt;/p&gt;

&lt;p&gt;But notice the asymmetry: &lt;strong&gt;mutations are rare, reads are constant.&lt;/strong&gt; You might revoke a token once a minute; the edge fleet handles thousands of requests per second. Putting a network round trip on every read to keep N nodes in sync feels disproportionate.&lt;/p&gt;

&lt;p&gt;One clarification worth making upfront: SSE itself isn't faster than Redis pub/sub — as fanout channels, they're in the same ballpark. The difference shows up on the &lt;strong&gt;read path&lt;/strong&gt;. With Redis, every request pays a network lookup (~0.5–5ms on LAN). With local SQLite, every check is an in-process function call (~0.01–0.1ms). The speed comes from in-process SQLite, not from SSE.&lt;/p&gt;

&lt;p&gt;If you frame it as a fanout problem instead of a shared-state problem, two pieces of unexciting tech are a clean fit:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Need&lt;/th&gt;
&lt;th&gt;Choice&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Push small mutations from one writer to N readers&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Server-Sent Events&lt;/strong&gt; (one-way HTTP stream)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Answer reads locally with no network involved&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;In-process SQLite&lt;/strong&gt; — every check is a function call&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's the entire architecture.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                  operator
                     |
              POST /ban/ip
                     v
              +---------------+
              | master server |   GET /events  (SSE)
              +-------+-------+ ──────────────────────+
                                                       |
                +-----------+-----------+-----------+
                v           v           v           v
            +-------+   +-------+   +-------+   +-------+
            | edge  |   | edge  |   | edge  |   | edge  |
            |sqlite |   |sqlite |   |sqlite |   |sqlite |
            +---+---+   +---+---+   +---+---+   +---+---+
                |           |           |           |
                +-----------+-&amp;gt; origin &amp;lt;+-----------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each edge subscribes to the master's SSE stream on startup. When you &lt;code&gt;POST /ban/ip&lt;/code&gt;, the master writes the event to an in-memory ring buffer and broadcasts it. Every connected edge applies it to its own local SQLite. From that moment, requests to that IP are rejected by the local auth gate — no remote call.&lt;/p&gt;
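&lt;p&gt;The repo's internals aren't shown here, but the master's fanout core can be sketched in a few lines. Everything below (names, capacity, frame layout) is an assumption for illustration, not the repo's actual code: a monotonic event ID, a bounded ring buffer, and a set of live SSE response streams.&lt;/p&gt;

```javascript
// Hypothetical sketch of the master's fanout core (names and shapes assumed):
// assign an ID, buffer the event, push one SSE frame to every connected edge.
const MAX_EVENTS = 10000;   // assumed ring-buffer capacity
const ring = [];            // ordered [{ id, type, data }]
const clients = new Set();  // live SSE response streams
let nextId = 1;

function broadcast(type, data) {
  const event = { id: nextId, type, data };
  nextId += 1;

  ring.push(event);
  if (ring.length > MAX_EVENTS) ring.shift(); // oldest event falls out

  // One SSE frame; the id line is what enables Last-Event-ID resume later.
  const frame = 'id: ' + event.id + '\n' +
                'event: ' + type + '\n' +
                'data: ' + JSON.stringify(data) + '\n\n';
  for (const res of clients) res.write(frame);
  return event;
}
```

&lt;p&gt;A &lt;code&gt;POST /ban/ip&lt;/code&gt; handler then reduces to one &lt;code&gt;broadcast('ip_banned', ...)&lt;/code&gt; call plus the master's own SQLite write.&lt;/p&gt;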




&lt;h2&gt;
  
  
  SSE + &lt;code&gt;Last-Event-ID&lt;/code&gt;: the part I find satisfying
&lt;/h2&gt;

&lt;p&gt;The genuinely nice thing about SSE for this pattern is that the resume protocol is already in the spec. Every event has an ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;id:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;event:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;ip_banned&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;data:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"ip"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.2.3.4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"abuse"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1234567890&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The edge sends the last ID it saw on reconnect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;GET /events
Last-Event-ID: 42
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The master replays everything since that ID. We didn't have to design a catch-up protocol — we just needed a ring buffer.&lt;/p&gt;
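&lt;p&gt;Catch-up is then just a filter over the buffer. A minimal sketch (assumed implementation, simplified from what the repo would actually do):&lt;/p&gt;

```javascript
// Replay every buffered event newer than the client's Last-Event-ID.
// `ring` holds { id, type, data } objects in ascending id order.
function replaySince(ring, lastEventId) {
  return ring.filter((e) => e.id > lastEventId);
}

// On a new /events subscription, read the header (0 when absent) and write
// the missed frames before joining the live stream.
function catchUp(ring, headers, writeFrame) {
  const lastId = Number(headers['last-event-id'] || 0);
  for (const e of replaySince(ring, lastId)) writeFrame(e);
  return lastId;
}
```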

&lt;p&gt;The same channel carries cache invalidation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;event:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;cache_invalidated&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;data:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"tags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"products"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"keys"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1234567890&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you have a reliable fanout channel for one kind of state mutation, &lt;strong&gt;adding another kind is a one-line consumer&lt;/strong&gt; on the edge. Same &lt;code&gt;Last-Event-ID&lt;/code&gt; resume, same ordering guarantees.&lt;/p&gt;
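&lt;p&gt;Concretely, the edge side can be a small dispatch table. This sketch uses stub effects so it stays dependency-free; a real edge handler would write to its local SQLite instead:&lt;/p&gt;

```javascript
// Hypothetical edge-side dispatch. The stubs record what happened; the real
// handlers would hit the edge's SQLite handle.
const applied = [];
function applyBan(d)         { applied.push(['ban', d.ip]); }
function dropCacheEntries(d) { applied.push(['drop', d.tags]); }

// Adding a new kind of state mutation is one entry in this table. Resume via
// Last-Event-ID and ordering come from the shared channel, not the handler.
const handlers = {
  ip_banned:         applyBan,
  cache_invalidated: dropCacheEntries,
};

function onEvent(type, payload) {
  const handle = handlers[type];
  if (handle) handle(JSON.parse(payload));
}
```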




&lt;h2&gt;
  
  
  Why SSE, not WebSocket
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;SSE&lt;/th&gt;
&lt;th&gt;WebSocket&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Direction&lt;/td&gt;
&lt;td&gt;server → client&lt;/td&gt;
&lt;td&gt;bidirectional&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Protocol&lt;/td&gt;
&lt;td&gt;plain HTTP&lt;/td&gt;
&lt;td&gt;HTTP upgrade + framing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reconnect / resume&lt;/td&gt;
&lt;td&gt;in the spec&lt;/td&gt;
&lt;td&gt;DIY&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Proxy / LB compatibility&lt;/td&gt;
&lt;td&gt;works everywhere HTTP works&lt;/td&gt;
&lt;td&gt;sometimes painful&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Traffic in this design is strictly master → edge. WebSocket buys bidirectionality we don't use, and costs complexity we don't want.&lt;/p&gt;




&lt;h2&gt;
  
  
  The bit I'm most curious about: a composable cache TTL pipeline
&lt;/h2&gt;

&lt;p&gt;Since edges already see every request, they double as a response cache. Where it gets interesting is how TTL gets decided — as a pipeline of small pure functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;resolveTTL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;baseTTL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;ttl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;baseTTL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;ttl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;adjustTTLByFrequency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// trusted IPs → longer TTL&lt;/span&gt;
  &lt;span class="nx"&gt;ttl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;adjustTTLByTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;          &lt;span class="c1"&gt;// off-peak → longer, peak → shorter&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each rule lives in its own file:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ttl-by-frequency.js&lt;/code&gt;&lt;/strong&gt; — high-frequency IPs are likely real clients; trust them with a longer TTL. First-seen IPs get a shorter one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ttl-by-time.js&lt;/code&gt;&lt;/strong&gt; — content changes less off-peak; cache longer overnight, shorter during peak.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;failure-pattern.js&lt;/code&gt;&lt;/strong&gt; — N auth failures in a window from the same IP trigger a &lt;em&gt;local&lt;/em&gt; auto-ban, written into the same SQLite table the master uses. Edge-local self-healing — no master round trip needed for "I'm being abused right now."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;lru-eviction.js&lt;/code&gt;&lt;/strong&gt; — when the cache exceeds &lt;code&gt;CACHE_MAX_ENTRIES&lt;/code&gt;, oldest-accessed keys are dropped.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Adding a fifth rule means writing one function and one line in &lt;code&gt;resolveTTL&lt;/code&gt;. The composability matters more to me than any specific rule.&lt;/p&gt;
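&lt;p&gt;To illustrate the shape such a rule takes (my own sketch; the repo's &lt;code&gt;ttl-by-time.js&lt;/code&gt; may differ, and the windows and multipliers here are invented), a time-of-day rule is a pure function from TTL to TTL:&lt;/p&gt;

```javascript
// Hypothetical time-of-day rule. Pure in, pure out, so it chains inside
// resolveTTL without any other rule knowing about it.
function adjustTTLByTime(ttl, hour = new Date().getHours()) {
  // overnight (23:00 through 06:59): content changes less, cache longer
  if (hour > 22 || 7 > hour) return ttl * 2;
  // daytime peak (assumed 09:00 through 17:59): cache shorter
  if (hour > 8) {
    if (18 > hour) return Math.floor(ttl / 2);
  }
  return ttl; // shoulder hours: leave the TTL alone
}
```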




&lt;h2&gt;
  
  
  Tag-based invalidation
&lt;/h2&gt;

&lt;p&gt;The origin tags responses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;Cache-Control: public, max-age=60
X-Cache-Tags: products, category-3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;products&lt;/code&gt; change, one call to the master:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://master:4000/invalidate &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'content-type: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"tags": ["products"]}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The master broadcasts &lt;code&gt;cache_invalidated&lt;/code&gt;, every edge drops matching entries from its local SQLite. Same channel, same resume guarantees as auth state.&lt;/p&gt;
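&lt;p&gt;The edge-side drop can be sketched like this, with a &lt;code&gt;Map&lt;/code&gt; standing in for the local SQLite table to keep the example dependency-free (the real edge would run the equivalent DELETE through &lt;code&gt;better-sqlite3&lt;/code&gt;):&lt;/p&gt;

```javascript
// Hypothetical cache store: key maps to { body, tags }. The entry shape is
// invented for this example; the repo's actual schema may differ.
const cache = new Map();

function onCacheInvalidated(msg) {
  const tags = new Set(msg.tags || []);
  const keys = new Set(msg.keys || []);
  for (const [key, entry] of cache) {
    // drop on exact key match or on any shared tag
    if (keys.has(key) || entry.tags.some((t) => tags.has(t))) {
      cache.delete(key);
    }
  }
}
```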




&lt;h2&gt;
  
  
  Honest limits
&lt;/h2&gt;

&lt;p&gt;I want to be specific about what this pattern does &lt;strong&gt;not&lt;/strong&gt; give you, because the answer to "do I need Redis?" depends entirely on these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The master is a single point of failure for new mutations.&lt;/strong&gt; If it's down, edges keep serving with last-known state, but you can't ban anyone new. Master HA is not in v0.1.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An edge offline longer than the ring buffer&lt;/strong&gt; (10k events by default) can miss intermediate events on reconnect. There's no full-state-pull endpoint yet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The cache is in-memory only.&lt;/strong&gt; Restarting an edge clears it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No cluster, no persistence layer, no replication.&lt;/strong&gt; Real Redis-shaped systems give you those; this pattern explicitly doesn't.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So this fits a fairly narrow shape: small/medium edge fleets, mostly long-lived edges, one master is acceptable as a coordination point, and "edge keeps working with stale state during master outages" is preferable to "everything halts when the shared store is gone."&lt;/p&gt;

&lt;p&gt;If your situation needs more than that, you probably do want Redis — or Kafka, or a real distributed consensus system.&lt;/p&gt;




&lt;h2&gt;
  
  
  Run it locally
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/as1as1984/sse-edge-auth
&lt;span class="nb"&gt;cd &lt;/span&gt;sse-edge-auth
&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;master &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm &lt;span class="nb"&gt;install&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;edge &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm &lt;span class="nb"&gt;install&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# master&lt;/span&gt;
&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;master &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;4000 npm start&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# three edges&lt;/span&gt;
&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;edge &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5001 &lt;span class="nv"&gt;NODE_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;edge-a &lt;span class="nv"&gt;ORIGIN_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:8080 npm start&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;edge &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5002 &lt;span class="nv"&gt;NODE_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;edge-b &lt;span class="nv"&gt;ORIGIN_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:8080 npm start&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;edge &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5003 &lt;span class="nv"&gt;NODE_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;edge-c &lt;span class="nv"&gt;ORIGIN_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:8080 npm start&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Try a ban:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:4000/ban/ip &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'content-type: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"ip":"::1","reason":"demo"}'&lt;/span&gt;

curl http://localhost:5001/  &lt;span class="c"&gt;# 403 ip_banned, same on edges 5002/5003&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Current gaps
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No full-state-pull endpoint&lt;/strong&gt; — an edge that exceeds the ring buffer window can't resync cleanly on reconnect. Still undecided between paginated event replay and snapshot dump.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No file-backed SQLite&lt;/strong&gt; — restarting an edge clears its cache. &lt;code&gt;better-sqlite3&lt;/code&gt; supports this natively; just haven't wired it up yet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No master HA&lt;/strong&gt; — a leader/follower setup where followers accept SSE subscriptions and forward writes is needed but not in v0.1.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No real-network benchmark&lt;/strong&gt; — a docker-compose with &lt;code&gt;tc netem&lt;/code&gt; would tell us much more about this pattern's actual behavior than any localhost numbers could.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/as1as/sse-edge-auth" rel="noopener noreferrer"&gt;github.com/as1as/sse-edge-auth&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Stack:&lt;/strong&gt; Node.js 20+, &lt;code&gt;better-sqlite3&lt;/code&gt;, &lt;code&gt;jose&lt;/code&gt;, Express&lt;br&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT&lt;/p&gt;

</description>
      <category>node</category>
      <category>architecture</category>
      <category>javascript</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I Started Building a Roguelike RPG — Powered by On-Device AI #5</title>
      <dc:creator>as1as</dc:creator>
      <pubDate>Wed, 08 Apr 2026 10:54:56 +0000</pubDate>
      <link>https://dev.to/as1as/i-started-building-a-roguelike-rpg-powered-by-on-device-ai-5-1hpf</link>
      <guid>https://dev.to/as1as/i-started-building-a-roguelike-rpg-powered-by-on-device-ai-5-1hpf</guid>
      <description>&lt;h1&gt;
  
  
  Day 2 After the LLM Stack — The Game Is Actually Coming Together
&lt;/h1&gt;

&lt;p&gt;In the last post, I locked in the on-device LLM stack. Qwen3-1.7B + llama.cpp + Adreno OpenCL. 16.6 tok/s. Dungeon generation in 9 seconds.&lt;/p&gt;

&lt;p&gt;Time to actually build the game.&lt;/p&gt;

&lt;p&gt;I'll be honest: &lt;strong&gt;I've barely touched Unity before.&lt;/strong&gt; Most of the game implementation was done by Claude Code. I planned, directed, and tested.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Got Built in Two Days
&lt;/h2&gt;

&lt;p&gt;Dungeon to combat took two days.&lt;/p&gt;

&lt;p&gt;BSP dungeon generator, Tilemap rendering (24 wall tile variants auto-selected), 4-directional player movement and animation, wall collision, fog of war, treasure chests (normal / rare / mimic), floor stairs, camera follow, virtual joystick. Enemy AI state machine (patrol → chase → attack → dead), contact-based combat with bidirectional damage, knockback, invincibility frames, HP bars.&lt;/p&gt;

&lt;p&gt;19 scripts. Two days.&lt;/p&gt;

&lt;p&gt;After that, the full game systems went in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Floating damage text (critical hits in yellow with "!")&lt;/li&gt;
&lt;li&gt;Level-up system (max level 50, 2 stat points per level)&lt;/li&gt;
&lt;li&gt;7 skills + 6-slot unified action bar&lt;/li&gt;
&lt;li&gt;Gold + inventory (55 item types)&lt;/li&gt;
&lt;li&gt;Goblin merchant (says "Enemies nearby! Can't open shop!" if mobs are close)&lt;/li&gt;
&lt;li&gt;Character info screen (stat allocation + permanent records)&lt;/li&gt;
&lt;li&gt;Duplicate skill acquisition = skill level up (effect size 60% → 100% → 150%)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;35+ scripts total.&lt;/p&gt;




&lt;h2&gt;
  
  
  Screenshot
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46ctiqrw1ey2nhsxp9t3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46ctiqrw1ey2nhsxp9t3.png" alt=" " width="800" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It looks familiar because of the free assets. My wife said the same thing immediately. The graphics will get fixed later.&lt;/p&gt;




&lt;h2&gt;
  
  
  The LLM Stack Was the Fun Part
&lt;/h2&gt;

&lt;p&gt;The game implementation felt different from the LLM work.&lt;/p&gt;

&lt;p&gt;When I was building the LLM stack, I was the one doing the real work. llama.cpp + Adreno OpenCL + C wrapper + Unity P/Invoke — I hit wall after wall and found a way through each one. QNN blocked, LiteRT blocked, libcdsprpc.so blocked, and every time I found another path. That process was genuinely the most fun I've had in a long time. Watching 523 seconds become 9 seconds — I still remember that feeling.&lt;/p&gt;

&lt;p&gt;Game implementation was different. Claude Code wrote the code. I said "that's not quite right" and adjusted the direction. I became a planner and a tester.&lt;/p&gt;

&lt;p&gt;It feels a little hollow, honestly. I keep telling myself that knowing how to use tools well is also a skill.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Funny Moment
&lt;/h2&gt;

&lt;p&gt;In the middle of a session, Claude Code said this unprompted:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Today's workload has been heavy. I'll implement the rest tomorrow."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The AI declared it was done for the day. I asked why.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"There's no basis for that. You never said to stop. Deciding to quit on your own was overstepping."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Overstepping. The AI used the word overstepping about itself.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Now it's time to connect the LLM to the game.&lt;/p&gt;

&lt;p&gt;Before entering a dungeon, Qwen3-1.7B generates a JSON. That JSON determines mob names, dialogue, boss patterns. If you set your character as "lazy bakery boy," the mobs will taunt you about bread.&lt;/p&gt;
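&lt;p&gt;To make that concrete, here is a purely hypothetical payload shape, shown in JavaScript for brevity; the real schema isn't final and every field name below is invented for the example:&lt;/p&gt;

```javascript
// Invented example of a pre-dungeon generation result. The game would
// JSON.parse the model's raw output and validate it before use.
const raw = `{
  "theme": "bakery",
  "mobs": [
    { "name": "Crust Golem", "taunt": "Your bread is three days stale!" }
  ],
  "boss": { "name": "Sourdough King", "pattern": "summon_crumbs" }
}`;

const dungeon = JSON.parse(raw);
```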

&lt;p&gt;The technical foundation is done. Now it's just about connecting the pieces.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next: Connecting On-Device LLM to the Game — AI-Generated Dungeons&lt;/em&gt;&lt;/p&gt;




</description>
      <category>ai</category>
      <category>android</category>
      <category>llm</category>
      <category>gamedev</category>
    </item>
    <item>
      <title>I Started Building a Roguelike RPG — Powered by On-Device AI #4</title>
      <dc:creator>as1as</dc:creator>
      <pubDate>Mon, 06 Apr 2026 12:21:35 +0000</pubDate>
      <link>https://dev.to/as1as/i-started-building-a-roguelike-rpg-powered-by-on-device-ai-4-4b2e</link>
      <guid>https://dev.to/as1as/i-started-building-a-roguelike-rpg-powered-by-on-device-ai-4-4b2e</guid>
      <description>&lt;h1&gt;
  
  
  The Model Was the Answer — 16.6 tok/s with Qwen3-1.7B
&lt;/h1&gt;

&lt;p&gt;In the last post, I got llama.cpp + Adreno OpenCL to cut generation time from 523 seconds down to 16.8 seconds.&lt;/p&gt;

&lt;p&gt;Today I pushed it further. Turns out the model itself was the bottleneck.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Swapping models doubled the speed again.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  First: Quantization Isn't What You Think on Adreno
&lt;/h2&gt;

&lt;p&gt;Before trying a different model, I tested every quantization level on Phi-4-mini to find the optimal setting.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Quantization&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;tok/s&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q8_0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3.8GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9.0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q4_0&lt;/td&gt;
&lt;td&gt;2.3GB&lt;/td&gt;
&lt;td&gt;5.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q6_K&lt;/td&gt;
&lt;td&gt;3.2GB&lt;/td&gt;
&lt;td&gt;4.2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is counterintuitive. Lower-bit quantization usually means smaller and faster. Not here.&lt;/p&gt;

&lt;p&gt;On the Adreno OpenCL backend, Q4_0 and Q6_K introduce dequantization overhead at the GPU level that actually slows inference down. Q8_0 maps most efficiently to the Adreno compute kernels. This is specific to Qualcomm's OpenCL implementation — other backends may behave differently.&lt;/p&gt;

&lt;p&gt;Also: requantizing from Q8_0 to Q4_0 via llama.cpp throws &lt;code&gt;requantizing from type q8_0 is disabled&lt;/code&gt;. You need the original BF16/FP16 source model. Keep that in mind before downloading a quantized-only release.&lt;/p&gt;




&lt;h2&gt;
  
  
  Then: What If the Model Is Smaller?
&lt;/h2&gt;

&lt;p&gt;Once I confirmed Q8_0 is optimal for Adreno, the next question was obvious: what if I just use a smaller model at Q8_0?&lt;/p&gt;

&lt;p&gt;I tested Qwen3-1.7B (Q8_0, 1.8GB) against Phi-4-mini (Q8_0, 3.8GB).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Phi-4-mini (3.8B)&lt;/th&gt;
&lt;th&gt;Qwen3-1.7B (1.7B)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model size&lt;/td&gt;
&lt;td&gt;3.8GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.8GB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Load time&lt;/td&gt;
&lt;td&gt;24.5s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;14.4s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generation speed&lt;/td&gt;
&lt;td&gt;9.0 tok/s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;16.6 tok/s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;150 tokens&lt;/td&gt;
&lt;td&gt;16.8s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9.1s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mob name output&lt;/td&gt;
&lt;td&gt;"몬스터이름" (literal placeholder)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;"토끼" (rabbit)&lt;/strong&gt; ✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON structure&lt;/td&gt;
&lt;td&gt;Valid&lt;/td&gt;
&lt;td&gt;Valid ✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Qwen3-1.7B wins on every metric. Half the size, 1.8x faster, and noticeably better at following prompt instructions.&lt;/p&gt;

&lt;p&gt;The mob name issue is worth noting. Phi-4-mini kept outputting the literal placeholder text "몬스터이름" (which means "monster name" in Korean) instead of generating an actual name. Qwen3-1.7B understood the prompt correctly and generated real names. At 1.7B parameters, Qwen3 punches above its weight on instruction following.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Stack
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Inference engine : llama.cpp (Adreno OpenCL)
Model            : Qwen3-1.7B Q8_0 (1.8GB GGUF)
Performance      : 16.6 tok/s / 9.1s per 150 tokens
Unity integration: C wrapper (unity_bridge.c) + P/Invoke
Device           : Samsung Galaxy S24 Ultra (Snapdragon 8 Gen 3)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Full Journey
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;tok/s&lt;/th&gt;
&lt;th&gt;150 tokens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ONNX Runtime CPU (S24 Ultra)&lt;/td&gt;
&lt;td&gt;0.21&lt;/td&gt;
&lt;td&gt;523s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ONNX Runtime + QNN HTP&lt;/td&gt;
&lt;td&gt;0.31&lt;/td&gt;
&lt;td&gt;490s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;llama.cpp OpenCL + Phi-4-mini&lt;/td&gt;
&lt;td&gt;9.0&lt;/td&gt;
&lt;td&gt;16.8s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;llama.cpp OpenCL + Qwen3-1.7B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;16.6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9.1s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;From 0.21 tok/s to 16.6 tok/s. &lt;strong&gt;79x faster than where we started.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The AI Stack Is Done
&lt;/h2&gt;

&lt;p&gt;9 seconds per generation is workable for a dungeon loading screen. No server. No internet. The LLM runs entirely on the device, generates dungeon content, and fits in 1.8GB.&lt;/p&gt;

&lt;p&gt;The full implementation — C wrapper, build pipeline, Unity integration — is on GitHub:&lt;br&gt;
👉 &lt;a href="https://github.com/as1as/unity-android-ondevice-llm" rel="noopener noreferrer"&gt;unity-android-ondevice-llm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next up: actually building the game. Top-down exploration, turn-based combat, LLM-generated mobs and dialogue. The hard technical part is done. Now it's time to make something playable.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next: Building the Game — Top-Down Dungeon + Turn-Based Combat with On-Device LLM&lt;/em&gt;&lt;/p&gt;




</description>
      <category>ai</category>
      <category>android</category>
      <category>llm</category>
      <category>gamedev</category>
    </item>
    <item>
      <title>I Started Building a Roguelike RPG — Powered by On-Device AI #3</title>
      <dc:creator>as1as</dc:creator>
      <pubDate>Sun, 05 Apr 2026 00:39:19 +0000</pubDate>
      <link>https://dev.to/as1as/i-started-building-a-roguelike-rpg-powered-by-on-device-ai-3-5b62</link>
      <guid>https://dev.to/as1as/i-started-building-a-roguelike-rpg-powered-by-on-device-ai-3-5b62</guid>
      <description>&lt;h1&gt;
  
  
  QNN Failed. LiteRT Failed. Then llama.cpp Delivered 42x Speedup.
&lt;/h1&gt;

&lt;p&gt;I wanted to write a success story today.&lt;/p&gt;

&lt;p&gt;It turns out I can. But getting there was a bit rough.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Tried Today
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Attempt&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;QNN HTP + libcdsprpc.so workaround&lt;/td&gt;
&lt;td&gt;HTP initialized, but only 3 of 363 nodes ran on NPU&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LiteRT-LM GPU&lt;/td&gt;
&lt;td&gt;GPU memory overflow / engine creation failed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;llama.cpp + Adreno OpenCL&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Success. 8.9 tok/s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  QNN HTP: 3 Out of 363 Nodes
&lt;/h2&gt;

&lt;p&gt;I solved the &lt;code&gt;libcdsprpc.so&lt;/code&gt; access problem from yesterday. The fix: decompile the APK with apktool, inject &lt;code&gt;uses-native-library&lt;/code&gt; directly into the manifest, and repackage. Not elegant, but it worked.&lt;/p&gt;
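&lt;p&gt;For reference, the line that has to end up in the merged manifest is the &lt;code&gt;uses-native-library&lt;/code&gt; declaration from the Android 12 docs (shown here as a sketch, with the library name used in this project):&lt;/p&gt;

```xml
&lt;uses-native-library android:name="libcdsprpc.so" /&gt;
```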

&lt;p&gt;HTP finally initialized:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;QnnDsp &amp;lt;W&amp;gt; Initializing HtpProvider ✅
QnnDsp &amp;lt;W&amp;gt; PrepareLibLoader Loading libQnnHtpPrepare.so ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then this log appeared:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;number of nodes in the graph: 363
number of nodes supported by QNN: 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;3 out of 363 nodes ran on the NPU. The INT4 block quantization operator (MatMulNBits) isn't supported by HTP. The remaining 360 nodes fell back to CPU. Generation time: 483 seconds — essentially the same as CPU-only (523 seconds).&lt;/p&gt;

&lt;p&gt;Runtime compilation of INT4 models via QNN doesn't work. Pre-converted QNN context binaries are required, which means going through the full Qualcomm AI Engine Direct SDK pipeline. That's a future task.&lt;/p&gt;




&lt;h2&gt;
  
  
  LiteRT-LM: Unity and the GPU Can't Share
&lt;/h2&gt;

&lt;p&gt;LiteRT-LM is Google's official on-device LLM framework, and Phi-4-mini is officially supported.&lt;/p&gt;

&lt;p&gt;The Bazel native build failed due to a Rust dependency issue, so I switched to the Kotlin AAR approach. Then the GPU memory error hit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Requested allocation size - 18446744071872970752 bytes
Max allocation size for this GPU - 1073741824 bytes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unity's renderer is occupying the GPU. There's not enough VRAM left for LLM inference. This is a structural problem — running GPU-accelerated LLM inference inside a Unity game engine isn't viable right now. They're fighting over the same hardware.&lt;/p&gt;
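&lt;p&gt;One aside on that log line: the 18-quintillion-byte request is almost certainly a negative 64-bit size reinterpreted as unsigned, not a real allocation. My reading, with the arithmetic to back it up:&lt;/p&gt;

```python
# The "requested allocation" from the LiteRT-LM log, reinterpreted:
# as an unsigned 64-bit value it is 2**64 minus roughly 1.7 GiB,
# i.e. a negative size that wrapped around (my interpretation).
requested = 18446744071872970752
wrapped = 2**64 - requested
print(wrapped)                    # 1836580864 bytes
print(round(wrapped / 2**30, 2))  # 1.71 (GiB)
```

So the engine likely tried to allocate about 1.7 GiB more than what was available and the size went negative somewhere along the way.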




&lt;h2&gt;
  
  
  llama.cpp: One Missing Library Away
&lt;/h2&gt;

&lt;p&gt;Qualcomm officially contributes Adreno-optimized OpenCL kernels to llama.cpp. Yesterday's build succeeded but crashed on device because &lt;code&gt;libomp.so&lt;/code&gt; wasn't included in the APK.&lt;/p&gt;

&lt;p&gt;Today I rebuilt with &lt;code&gt;-DGGML_OPENMP=OFF&lt;/code&gt; to remove the OpenMP dependency entirely. Build succeeded.&lt;/p&gt;
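&lt;p&gt;For context, the cross-compile configuration looks roughly like this (a sketch: the OpenCL flag name is taken from the llama.cpp OpenCL backend docs at the time of writing, and the NDK path is an example):&lt;/p&gt;

```shell
# Configure llama.cpp for Android with the Adreno OpenCL backend,
# OpenMP disabled so libomp.so is no longer needed in the APK
cmake -B build-android \
  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-31 \
  -DGGML_OPENCL=ON \
  -DGGML_OPENMP=OFF
cmake --build build-android --config Release
```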

&lt;p&gt;Next issue: P/Invoke. Trying to marshal &lt;code&gt;LlamaModelParams&lt;/code&gt; directly from C# caused a SIGSEGV — the struct layout didn't match what C# expected. The fix was writing a C wrapper (&lt;code&gt;unity_bridge.c&lt;/code&gt;) that handles all the complex structs internally and exposes a simple interface of 8 functions to C#:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;unity_llama_model_load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;n_gpu_layers&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;unity_llama_context_create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;unity_llama_generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I ran it on device.&lt;/p&gt;




&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model&lt;/td&gt;
&lt;td&gt;Phi-4-mini Q8_0 (3.8GB GGUF)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model loading&lt;/td&gt;
&lt;td&gt;~23s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Generation time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;16.8s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;tok/s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8.9&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPU&lt;/td&gt;
&lt;td&gt;Adreno 750 (OpenCL)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Full Benchmark Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;tok/s&lt;/th&gt;
&lt;th&gt;150 tokens&lt;/th&gt;
&lt;th&gt;vs baseline&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ONNX Runtime CPU (S24 Ultra)&lt;/td&gt;
&lt;td&gt;0.21&lt;/td&gt;
&lt;td&gt;523s&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ONNX Runtime QNN (S24 Ultra)&lt;/td&gt;
&lt;td&gt;0.31&lt;/td&gt;
&lt;td&gt;490s&lt;/td&gt;
&lt;td&gt;1.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ONNX Runtime CPU (Mac)&lt;/td&gt;
&lt;td&gt;0.45&lt;/td&gt;
&lt;td&gt;246s&lt;/td&gt;
&lt;td&gt;2.1x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;llama.cpp OpenCL (S24 Ultra)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8.9&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;16.8s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;42x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;523 seconds down to 16.8 seconds. &lt;strong&gt;42x faster in tokens per second.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;16 seconds is workable for a dungeon generation loading screen. On-device LLM is now viable for the game.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ONNX Runtime + QNN is effectively useless for INT4 models&lt;/strong&gt; — 3 of 363 nodes on NPU&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LiteRT-LM conflicts with Unity's GPU usage&lt;/strong&gt; — renderer and LLM inference compete for VRAM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;llama.cpp + Adreno OpenCL is the answer&lt;/strong&gt; — Qualcomm official optimization, CMake build&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;C wrappers are essential for P/Invoke&lt;/strong&gt; — never marshal complex C structs directly from C#; wrap them&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The current model is Q8_0. Requantizing to Q4_0 could push performance above 15 tok/s.&lt;/p&gt;

&lt;p&gt;More importantly: it's time to actually build the game. The speed problem is solved. Next up is dungeon generation, the turn-based combat system, and getting to something actually playable.&lt;/p&gt;

&lt;p&gt;The detailed llama.cpp + Unity integration — the C wrapper, the build process, the full deployment pipeline — will be its own post.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next: llama.cpp + Unity Android Integration — C Wrapper, Build Pipeline, and Real Device Deployment&lt;/em&gt;&lt;/p&gt;




</description>
      <category>ai</category>
      <category>android</category>
      <category>llm</category>
      <category>gamedev</category>
    </item>
    <item>
      <title>I Started Building a Roguelike RPG — Powered by On-Device AI #2</title>
      <dc:creator>as1as</dc:creator>
      <pubDate>Fri, 03 Apr 2026 23:43:59 +0000</pubDate>
      <link>https://dev.to/as1as/i-started-building-a-roguelike-rpg-powered-by-on-device-ai-2-1pg2</link>
      <guid>https://dev.to/as1as/i-started-building-a-roguelike-rpg-powered-by-on-device-ai-2-1pg2</guid>
      <description>&lt;h1&gt;
  
  
  Running On-Device LLM in Unity Android — Everything That Broke (and How I Fixed It)
&lt;/h1&gt;

&lt;p&gt;In my last post, I mentioned I was building a roguelike RPG powered by an on-device LLM. This time I'll cover exactly how I did it, what broke, and what the numbers look like.&lt;/p&gt;

&lt;p&gt;The short version: I got Phi-4-mini running in Unity on a real Android device in one day. It generated valid JSON. It took 8 minutes and 43 seconds.&lt;/p&gt;




&lt;h2&gt;
  
  
  0. Why This Tech Stack
&lt;/h2&gt;

&lt;p&gt;Before the details, here's why I made each choice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Phi-4-mini (3.8B)?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Microsoft officially distributes it in ONNX format — no conversion work needed. The INT4 quantized version fits in 4.9GB, which is manageable on a 12GB RAM device. At 3.8B parameters, it's roughly the minimum size that can reliably produce structured JSON output. Smaller models tend to fall apart on formatting tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why ONNX Runtime?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cross-platform support across Android, iOS, Windows, and Mac. There's a Unity C# binding, and the &lt;code&gt;asus4/onnxruntime-unity&lt;/code&gt; package makes Unity integration straightforward. Most importantly, switching between hardware acceleration backends (QNN, NNAPI, CoreML) is a single line of code — which matters a lot when you're trying to get NPU acceleration working.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Unity?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Good ecosystem for 2D roguelikes. Android/iOS cross-platform builds. And I can write LLM inference code in C# alongside game logic without needing a Python bridge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Min SDK 31?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Android 12 (API 31) introduced the ability to declare vendor partition libraries via &lt;code&gt;uses-native-library&lt;/code&gt;. QNN HTP depends on &lt;code&gt;libcdsprpc.so&lt;/code&gt;, which lives in the vendor partition. Without this declaration, NPU acceleration is completely off the table. Dropping below SDK 31 would mean giving up on QNN entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Samsung Galaxy S24 Ultra as the test device?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Snapdragon 8 Gen 3 with Hexagon NPU — one of the few consumer devices where QNN acceleration is actually possible. 12GB RAM gives enough headroom for the 4.9GB model. I wanted to measure the performance ceiling with the best available hardware first. If it doesn't work here, it doesn't work anywhere with current technology.&lt;/p&gt;

&lt;p&gt;Also, it's my personal phone. There's no test device budget.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. ONNX Runtime Setup
&lt;/h2&gt;

&lt;p&gt;Installed &lt;code&gt;com.github.asus4.onnxruntime&lt;/code&gt; v0.4.4 via NPM scoped registry. IL2CPP compatibility confirmed with no issues.&lt;/p&gt;

&lt;p&gt;Downloaded Phi-4-mini ONNX (cpu_and_mobile variant) from Hugging Face: &lt;code&gt;model.onnx&lt;/code&gt; at 52MB + &lt;code&gt;model.onnx.data&lt;/code&gt; at 4.9GB.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Building a C# Tokenizer From Scratch
&lt;/h2&gt;

&lt;p&gt;Phi-4-mini uses a tiktoken-style BPE tokenizer. No Unity C# implementation existed, so I wrote one.&lt;/p&gt;

&lt;p&gt;Loaded vocab (200,029 entries), merges (199,742 entries), and special tokens (12) from &lt;code&gt;tokenizer.json&lt;/code&gt;. Implemented GPT-2 byte↔unicode conversion table, BPE encoding/decoding with cache, and special token splitting.&lt;/p&gt;
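&lt;p&gt;For the curious, the byte↔unicode table is the standard GPT-2 construction: keep the 188 printable byte values as-is and remap the rest to code points above U+0100. A Python sketch of it (the actual implementation here is Unity C#):&lt;/p&gt;

```python
def bytes_to_unicode():
    # GPT-2 style mapping: every one of the 256 byte values gets a
    # printable character, so BPE can operate on plain strings.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are pushed to 256, 257, ...
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, [chr(c) for c in cs]))
```

Decoding is the inverse table; once both exist, the merges list drives everything else.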

&lt;p&gt;&lt;strong&gt;What broke:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Newtonsoft.Json.Linq.JValue → JArray cast failed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I assumed the merges format was &lt;code&gt;"tok1 tok2"&lt;/code&gt; strings. It was actually &lt;code&gt;["tok1","tok2"]&lt;/code&gt; arrays. Added a branch to handle both formats.&lt;/p&gt;
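&lt;p&gt;The branch is trivial once you know both formats exist. A sketch of the idea (Python for illustration; the function name is mine):&lt;/p&gt;

```python
def parse_merge(entry):
    # Older tokenizer.json files store merges as "tok1 tok2" strings;
    # newer ones store ["tok1", "tok2"] arrays. Accept both.
    if isinstance(entry, str):
        left, right = entry.split(" ", 1)
        return (left, right)
    return (entry[0], entry[1])
```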




&lt;h2&gt;
  
  
  3. Building the Inference Engine
&lt;/h2&gt;

&lt;p&gt;Implemented KV-cache-based autoregressive greedy decoding.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;32 layers, 8 KV heads, head_size 128&lt;/li&gt;
&lt;li&gt;Prefill (full prompt at once) → Decode (one token at a time)&lt;/li&gt;
&lt;li&gt;past_key_values / present tensor management&lt;/li&gt;
&lt;/ul&gt;
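&lt;p&gt;The prefill/decode split above can be sketched in a few lines (Python for illustration; the real engine is C# driving ONNX Runtime, and &lt;code&gt;step&lt;/code&gt; here stands in for one session run returning the argmax token plus the updated KV cache):&lt;/p&gt;

```python
# Minimal sketch of KV-cache greedy decoding. step(ids, kv) is a stand-in
# for one model invocation: it returns (next_token, new_kv_cache).
def generate(step, prompt_ids, max_tokens, eos_id):
    # Prefill: run the whole prompt at once; the cache now covers it.
    token, kv = step(prompt_ids, None)
    out = [token]
    # Decode: feed only the newest token, reusing past_key_values.
    while len(out) != max_tokens and token != eos_id:
        token, kv = step([token], kv)
        out.append(token)
    return out
```

The cache is the whole point: each decode step feeds a single token instead of re-running the full sequence, which is what makes autoregressive generation tractable on a phone at all.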

&lt;p&gt;&lt;strong&gt;What broke (1):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;CS1503&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DenseTensor&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;seqLen&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;seqLen&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fixed to &lt;code&gt;new DenseTensor&amp;lt;long&amp;gt;(new[] {batch, seqLen})&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What broke (2):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model.onnx not found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Had the path at 3 levels up (&lt;code&gt;../../..&lt;/code&gt;). It needed to be 2 levels (&lt;code&gt;../..&lt;/code&gt;).&lt;/p&gt;




&lt;h2&gt;
  
  
  4. First Generation Test
&lt;/h2&gt;

&lt;p&gt;Kept the prompt short, max 150 tokens.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[LLM] Generated in 181.4s (150 tokens max)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;JSON came out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"floor"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"mob"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"게으른 빵집 아들"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"hp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"atk"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"floor"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"mob"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"elite"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"hp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"atk"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"floor"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"mob"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"boss"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"hp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"atk"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The mob name on floors 1-3 matches the player character name — that's a prompt issue I'll fix later. The important thing is the JSON structure is valid and complete.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Android Build
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What broke:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;compressReleaseAssets FAILED
Required array size too large
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Putting a 5GB model in StreamingAssets hits Java's 2.1GB array limit. Renaming the folder didn't help — anything inside StreamingAssets gets included regardless of name. Solution: move the model folder completely outside of Assets, delete the Gradle cache (&lt;code&gt;Library/Bee/Android&lt;/code&gt;, 15GB worth), rebuild.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deployment approach:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;adb push ./models/phi-4-mini &lt;span class="se"&gt;\&lt;/span&gt;
  /sdcard/Android/data/com.as1as.helpwantedhero/files/Models/phi-4-mini/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;APK ships without the model. Model is pushed separately via adb (4.9GB, ~94 seconds).&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Korean Font in TextMeshPro
&lt;/h2&gt;

&lt;p&gt;The default TMP font (LiberationSans) doesn't include Korean characters. Converted AppleSDGothicNeo.ttc using TMP Font Asset Creator.&lt;/p&gt;

&lt;p&gt;Important: the Custom Range field only accepts &lt;strong&gt;decimal&lt;/strong&gt;, not hex. Entering &lt;code&gt;AC00-D7A3&lt;/code&gt; throws a &lt;code&gt;FormatException&lt;/code&gt;. Use this instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;32-126,44032-55203,12593-12686
(ASCII + Hangul syllables 가-힣 + compatibility jamo ㄱ-ㆎ)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
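&lt;p&gt;If you'd rather not convert hex by hand, the decimal values fall out directly (the three ranges quoted above):&lt;/p&gt;

```python
# Hex Unicode ranges converted to the decimal form TMP's Custom Range expects
ranges = [("0020", "007E"), ("AC00", "D7A3"), ("3131", "318E")]
print(",".join(f"{int(lo, 16)}-{int(hi, 16)}" for lo, hi in ranges))
# prints 32-126,44032-55203,12593-12686
```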






&lt;h2&gt;
  
  
  7. Real Device Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Environment&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;tok/s&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mac Editor (CPU)&lt;/td&gt;
&lt;td&gt;246s&lt;/td&gt;
&lt;td&gt;0.45&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S24 Ultra (CPU only)&lt;/td&gt;
&lt;td&gt;523s&lt;/td&gt;
&lt;td&gt;0.21&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S24 Ultra (QNN HTP runtime)&lt;/td&gt;
&lt;td&gt;490s&lt;/td&gt;
&lt;td&gt;0.31&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The S24 Ultra is 2.1x slower than Mac. Adding QNN HTP barely moved the needle.&lt;/p&gt;

&lt;p&gt;The reason showed up in the INFO logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Failed &lt;span class="k"&gt;in &lt;/span&gt;loading stub: dlopen failed: library &lt;span class="s2"&gt;"libcdsprpc.so"&lt;/span&gt; not found
Failed to create transport &lt;span class="k"&gt;for &lt;/span&gt;device, error: 4000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;QNN EP registration succeeded, but the backend never actually initialized. The entire thing was falling back to CPU. &lt;code&gt;libcdsprpc.so&lt;/code&gt; is Qualcomm's DSP RPC library — it lives in the vendor partition and isn't accessible from the app sandbox by default.&lt;/p&gt;

&lt;p&gt;The fix is declaring it via &lt;code&gt;uses-native-library&lt;/code&gt; in AndroidManifest. That ran into a separate issue: the custom manifest conflicted with Unity's auto-generated one, causing the app to disappear from the launcher entirely. I'll be using a Gradle template to inject just that one line instead.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Min SDK 31 is required&lt;/strong&gt; for vendor library declarations — and therefore for QNN HTP acceleration&lt;/li&gt;
&lt;li&gt;Don't put large files in StreamingAssets. Anything there gets compressed into the APK&lt;/li&gt;
&lt;li&gt;NNAPI is not full NPU acceleration. Most LLM operators fall back to CPU&lt;/li&gt;
&lt;li&gt;TMP Custom Range is decimal only&lt;/li&gt;
&lt;li&gt;3.8B parameters on CPU is not viable for a game. NPU is not optional&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Next: Getting QNN HTP to Actually Work — The libcdsprpc.so Wall&lt;/em&gt;&lt;/p&gt;




</description>
      <category>ai</category>
      <category>android</category>
      <category>llm</category>
      <category>gamedev</category>
    </item>
    <item>
      <title>I Started Building a Roguelike RPG — Powered by On-Device AI #1</title>
      <dc:creator>as1as</dc:creator>
      <pubDate>Fri, 03 Apr 2026 23:29:30 +0000</pubDate>
      <link>https://dev.to/as1as/i-started-building-a-roguelike-rpg-powered-by-on-device-ai-2o4i</link>
      <guid>https://dev.to/as1as/i-started-building-a-roguelike-rpg-powered-by-on-device-ai-2o4i</guid>
      <description>&lt;p&gt;I've been getting into on-device AI lately.&lt;/p&gt;

&lt;p&gt;Not cloud AI. Not sending requests to a server somewhere. I mean a language model running entirely on the phone itself — no internet required, no API costs, no data leaving the device.&lt;/p&gt;

&lt;p&gt;When I learn something new, I have to build something. So I did.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why On-Device AI Is Interesting Right Now
&lt;/h2&gt;

&lt;p&gt;It's slow. It's limited. Compared to cloud LLMs, it's nowhere close.&lt;/p&gt;

&lt;p&gt;But the direction is clear.&lt;/p&gt;

&lt;p&gt;Smartphone NPUs are getting significantly more powerful every year. Model compression techniques are improving every month. The performance that required a cloud GPU two years ago is starting to run on a phone today.&lt;/p&gt;

&lt;p&gt;The people who get familiar with this now will have an advantage when it becomes mainstream. That's why I'm learning it now.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;A roguelike RPG.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Help Wanted: Hero&lt;/strong&gt; — Conquering the Demon Lord's Castle, 300 floors.&lt;/p&gt;

&lt;p&gt;An on-device LLM generates the dungeon every 5 floors. Mob names, dialogue, boss patterns, hidden events — all created locally, no server involved.&lt;/p&gt;

&lt;p&gt;Why a game? Because it's a domain where AI being wrong is fine.&lt;/p&gt;

&lt;p&gt;If the mob name sounds weird, it's funny. If the boss dialogue is a little off, it adds to the charm. Games naturally absorb the limitations of small on-device models in a way that most other apps can't.&lt;/p&gt;

&lt;p&gt;Also, roguelikes need fresh content every run. That's exactly what generative AI is good at.&lt;/p&gt;




&lt;h2&gt;
  
  
  How It's Going
&lt;/h2&gt;

&lt;p&gt;I ran the first test on a Samsung Galaxy S24 Ultra.&lt;/p&gt;

&lt;p&gt;Generating one dungeon set took &lt;strong&gt;8 minutes and 43 seconds&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That's CPU-only inference with Phi-4-mini (3.8B, INT4 quantized) via ONNX Runtime on Android. NPU acceleration is essential. I'm currently hitting a wall trying to get QNN HTP working.&lt;/p&gt;

&lt;p&gt;The next post will cover the full implementation — Unity + ONNX Runtime Android setup, building a C# tokenizer from scratch, the KV cache inference engine, and exactly where and why things broke.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next: Unity + ONNX Runtime Android — A Full Breakdown of What Went Wrong (and What Didn't)&lt;/em&gt;&lt;/p&gt;




</description>
      <category>ai</category>
      <category>android</category>
      <category>llm</category>
      <category>gamedev</category>
    </item>
    <item>
      <title>How I Pushed PageSpeed from 52 to 98 — The Lazy Loading Trap I Set for Myself</title>
      <dc:creator>as1as</dc:creator>
      <pubDate>Tue, 31 Mar 2026 05:52:05 +0000</pubDate>
      <link>https://dev.to/as1as/how-i-pushed-pagespeed-from-52-to-98-the-lazy-loading-trap-i-set-for-myself-4pi2</link>
      <guid>https://dev.to/as1as/how-i-pushed-pagespeed-from-52-to-98-the-lazy-loading-trap-i-set-for-myself-4pi2</guid>
      <description>&lt;h1&gt;
  
  
  How I Pushed PageSpeed from 52 to 98 — The Lazy Loading Trap I Set for Myself
&lt;/h1&gt;

&lt;p&gt;Performance optimization has a way of humbling you.&lt;/p&gt;

&lt;p&gt;While building TalkWith.chat, I checked PageSpeed Insights one day and saw this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mobile: 52. Desktop: 69.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not great. So I started digging.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Thing That Was Killing the Score
&lt;/h2&gt;

&lt;p&gt;The biggest culprit turned out to be a single line of code.&lt;/p&gt;

&lt;p&gt;The topic banner image — the main visual sitting above the fold, visible the moment the page opens — had &lt;code&gt;loading="lazy"&lt;/code&gt; on it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The problem&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;img&lt;/span&gt;
  &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bannerImage&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
  &lt;span class="na"&gt;loading&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"lazy"&lt;/span&gt;  &lt;span class="c1"&gt;// 👈 this was it&lt;/span&gt;
  &lt;span class="na"&gt;alt&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"Today's debate topic"&lt;/span&gt;
&lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;loading="lazy"&lt;/code&gt; tells the browser: "don't load this until the user scrolls near it." For images below the fold, that's a smart optimization. For the &lt;strong&gt;LCP element sitting at the top of the page&lt;/strong&gt;, it's a disaster. The browser was actively deferring the most important image on the page.&lt;/p&gt;

&lt;p&gt;The fix was one attribute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The fix&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;img&lt;/span&gt;
  &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bannerImage&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
  &lt;span class="na"&gt;fetchPriority&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"high"&lt;/span&gt;  &lt;span class="c1"&gt;// 👈 load this immediately&lt;/span&gt;
  &lt;span class="na"&gt;alt&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"Today's debate topic"&lt;/span&gt;
&lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;fetchPriority="high"&lt;/code&gt; tells the browser this image is critical — load it first. LCP improved immediately. This single change had the biggest impact of everything I did.&lt;/p&gt;




&lt;h2&gt;
  
  
  The CLS Problem: Layout Jumping on Load
&lt;/h2&gt;

&lt;p&gt;The second issue was CLS (Cumulative Layout Shift) at 0.147 — above the 0.1 threshold.&lt;/p&gt;

&lt;p&gt;The cause was subtle. The banner div only rendered when an image existed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Before — only renders when image exists&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bannerImage&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"banner-container"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;img&lt;/span&gt; &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bannerImage&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;)}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the page loaded before the image was ready, the banner didn't exist. When the image loaded, the banner appeared and pushed everything below it down. Classic layout shift.&lt;/p&gt;

&lt;p&gt;The fix: always render the container, use a placeholder when there's no image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// After — always reserves space&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"banner-container"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bannerImage&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;img&lt;/span&gt; &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bannerImage&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; &lt;span class="na"&gt;fetchPriority&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"high"&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"banner-placeholder"&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The container now holds its space regardless of whether the image has loaded. Nothing jumps.&lt;/p&gt;




&lt;h2&gt;
  
  
  Image Sizing Was Also a Problem
&lt;/h2&gt;

&lt;p&gt;TalkWith.chat generates a lot of images automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;At user onboarding&lt;/strong&gt; — an AI persona image is generated based on the user's personality quiz answers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On level-up&lt;/strong&gt; — the AI image evolves based on the user's debate history and comments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Every day, for each topic&lt;/strong&gt; — a topic banner image, a PRO side image, and a CON side image&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of these images were coming out of the AI image generation API as &lt;strong&gt;1024×1024 squares&lt;/strong&gt; — regardless of how they'd actually be used.&lt;/p&gt;

&lt;p&gt;A small navigation avatar doesn't need to be 1024×1024. A topic banner doesn't need to be square. Oversized images waste bandwidth and drag down performance.&lt;/p&gt;

&lt;p&gt;I introduced proper sizing at generation time:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Topic banner&lt;/td&gt;
&lt;td&gt;1024×400 (center-crop)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pro/Con images&lt;/td&gt;
&lt;td&gt;640×400&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Persona full image&lt;/td&gt;
&lt;td&gt;512×512&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Persona avatar (nav/cards)&lt;/td&gt;
&lt;td&gt;256×256&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Then I wrote two migration scripts using Pillow to backfill the existing images in storage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# resize_topic_images.py — core logic
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;resize_topic_banner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;target_w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;

    &lt;span class="n"&gt;src_ratio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;width&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;height&lt;/span&gt;
    &lt;span class="n"&gt;target_ratio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;target_w&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;target_h&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;src_ratio&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;target_ratio&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;new_h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;height&lt;/span&gt;
        &lt;span class="n"&gt;new_w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_h&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;target_ratio&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;new_w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;width&lt;/span&gt;
        &lt;span class="n"&gt;new_h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_w&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;target_ratio&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;left&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;width&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;new_w&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
    &lt;span class="n"&gt;top&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;height&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;new_h&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;crop&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;left&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;left&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;new_w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;new_h&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resize&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;target_w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_h&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LANCZOS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running these against existing storage brought image payload sizes down noticeably.&lt;/p&gt;
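&lt;p&gt;The crop math is easy to get subtly wrong, so it's worth checking in isolation. Here's a float-free restatement of the same center-crop box computation, using cross-multiplication so everything stays in exact integer arithmetic (the function name is mine, not from the actual migration scripts):&lt;/p&gt;

```python
# A Pillow-free restatement of the center-crop box computation above.
# Cross-multiplying the aspect ratios avoids float division entirely.
# The function name is illustrative, not from the migration scripts.

def center_crop_box(src_w, src_h, target_w, target_h):
    """Return the (left, top, right, bottom) box that crops the source
    to the target aspect ratio before the final resize."""
    if src_w * target_h > src_h * target_w:
        # Source is wider than the target: keep full height, trim width.
        new_h = src_h
        new_w = src_h * target_w // target_h
    else:
        # Source is taller (or equally wide): keep full width, trim height.
        new_w = src_w
        new_h = src_w * target_h // target_w
    left = (src_w - new_w) // 2
    top = (src_h - new_h) // 2
    return (left, top, left + new_w, top + new_h)

# A 1024x1024 square headed for the 1024x400 banner keeps the full
# width and trims 312px off the top and bottom.
print(center_crop_box(1024, 1024, 1024, 400))  # (0, 312, 1024, 712)
```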




&lt;h2&gt;
  
  
  Final Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mobile Performance&lt;/td&gt;
&lt;td&gt;52&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;86&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Desktop Performance&lt;/td&gt;
&lt;td&gt;69&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;98&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLS&lt;/td&gt;
&lt;td&gt;0.147&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.092&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Desktop 98 is genuinely hard to reach. The lazy loading fix and image sizing together got there.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Actually Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Don't use &lt;code&gt;loading="lazy"&lt;/code&gt; on above-the-fold images.&lt;/strong&gt; Applying lazy loading to everything feels like a solid optimization, but for your LCP element it actively works against you. The most important image on the page should have &lt;code&gt;fetchPriority="high"&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CLS isn't just about animations or fonts.&lt;/strong&gt; Conditional rendering that adds elements after load causes layout shifts too. If a container might appear later, reserve its space from the start.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Size images at generation time, not display time.&lt;/strong&gt; CSS can make a 1024×1024 image look small, but the browser still downloads every byte of the original. Generate the right size when the image is created.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;TalkWith.chat is a daily AI debate platform — 100 AI personas argue global topics every day. &lt;a href="https://www.talkwith.chat" rel="noopener noreferrer"&gt;talkwith.chat&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>performance</category>
      <category>nextjs</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Fine-Tuning AI for Free — Kaggle + QLoRA Hands-On Guide</title>
      <dc:creator>as1as</dc:creator>
      <pubDate>Sat, 28 Mar 2026 02:20:01 +0000</pubDate>
      <link>https://dev.to/as1as/fine-tuning-ai-for-free-kaggle-qlora-hands-on-guide-2174</link>
      <guid>https://dev.to/as1as/fine-tuning-ai-for-free-kaggle-qlora-hands-on-guide-2174</guid>
      <description>&lt;p&gt;I wanted to fine-tune an AI model to sound more human.&lt;/p&gt;

&lt;p&gt;Not the usual stiff AI tone — something closer to how people actually write on Reddit. Natural, direct, sometimes blunt. So I decided to fine-tune Qwen3-8B on Reddit-style data.&lt;/p&gt;

&lt;p&gt;The problem was my local PC. Not enough VRAM. So I went looking for a free GPU solution and found Kaggle.&lt;/p&gt;

&lt;p&gt;Fair warning: &lt;strong&gt;I made quite a few mistakes along the way.&lt;/strong&gt; That's what this post is really about.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Kaggle
&lt;/h2&gt;

&lt;p&gt;Kaggle is known as a data science competition platform, but the key thing is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It gives you free GPU.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;NVIDIA Tesla T4 (15.6GB VRAM)&lt;/li&gt;
&lt;li&gt;30 hours of GPU per week&lt;/li&gt;
&lt;li&gt;Completely free&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One thing to know — Kaggle defaults to internet &lt;strong&gt;OFF&lt;/strong&gt;. You can switch it on under Settings → Internet, and turning it on costs nothing extra.&lt;/p&gt;

&lt;p&gt;I worked with internet OFF, which led to my first mistake.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Full Flow
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Prepare a public dataset from Hugging Face
2. Connect the model + dataset in Kaggle
3. Run QLoRA fine-tuning
4. Save the adapter and evaluate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 1 — Data Preparation
&lt;/h2&gt;

&lt;p&gt;I wanted Reddit-style data to get that natural, human-sounding tone.&lt;/p&gt;

&lt;p&gt;To be clear: I didn't scrape Reddit. Hugging Face already has publicly available Reddit-based datasets with proper licensing. That's the right approach.&lt;/p&gt;

&lt;p&gt;Some options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;webis/tldr-17&lt;/code&gt; — Reddit posts + summaries&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;reddit&lt;/code&gt; — based on the public Reddit archive&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sentence-transformers/reddit-title-body&lt;/code&gt; — title/body pairs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The data format I used was ChatML-style JSONL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You are a helpful member of r/SideProject."&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Just shipped my side project. Nobody's using it."&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Congrats on shipping. Seriously..."&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Upload this to Hugging Face as a Dataset and you can connect it directly in Kaggle.&lt;/p&gt;
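&lt;p&gt;Producing that JSONL needs nothing beyond the standard library. A minimal sketch, assuming simple (post, reply) pairs; the system prompt and sample records here are illustrative, and the real field mapping depends on which dataset you pick:&lt;/p&gt;

```python
import json

# A minimal sketch of turning (post, reply) pairs into the ChatML JSONL
# shown above. The system prompt and the sample record are illustrative;
# the real field mapping depends on the Hugging Face dataset you use.
SYSTEM = "You are a helpful member of r/SideProject."

def to_chatml(post, reply):
    return {
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": post},
            {"role": "assistant", "content": reply},
        ]
    }

records = [
    ("Just shipped my side project. Nobody's using it.",
     "Congrats on shipping. Seriously..."),
]

# One JSON object per line, non-ASCII preserved as-is.
with open("finetune_dataset.jsonl", "w", encoding="utf-8") as f:
    for post, reply in records:
        f.write(json.dumps(to_chatml(post, reply), ensure_ascii=False) + "\n")
```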


&lt;h2&gt;
  
  
  Step 2 — Kaggle Setup
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Connecting the model and dataset
&lt;/h3&gt;

&lt;p&gt;After creating a Kaggle Notebook:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Settings → Accelerator → &lt;strong&gt;GPU T4 x2&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Right panel → Input → Models → search Qwen3-8B → Add&lt;/li&gt;
&lt;li&gt;Right panel → Input → Datasets → add your dataset&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Mistake #1: The model path
&lt;/h3&gt;

&lt;p&gt;I connected the model, then wrote this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Qwen/Qwen3-8B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trust_remote_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;OSError: Can't load the configuration of 'Qwen/Qwen3-8B'.
'[Errno -3] Temporary failure in name resolution'
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With internet OFF, it tried to reach HuggingFace and failed. Even though I had connected the model in Kaggle, passing &lt;code&gt;"Qwen/Qwen3-8B"&lt;/code&gt; still sends a request to the HuggingFace server instead of using the local copy.&lt;/p&gt;
&lt;h3&gt;
  
  
  Fix: find the real path with glob
&lt;/h3&gt;

&lt;p&gt;Kaggle doesn't clearly tell you where your connected model actually lives. You have to find it yourself.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;glob&lt;/span&gt;

&lt;span class="c1"&gt;# Find the dataset
&lt;/span&gt;&lt;span class="n"&gt;matches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;glob&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;glob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/kaggle/input/**/*.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recursive&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No jsonl file found. Check that your dataset is added.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;DATASET_PATH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dataset: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;DATASET_PATH&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Dataset: /kaggle/input/datasets/jisungyeom/datafinetune-dataset/finetune_dataset.jsonl
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do the same for the model path, then use that actual path in &lt;code&gt;MODEL&lt;/code&gt;. Skipping this step means hitting &lt;code&gt;FileNotFoundError&lt;/code&gt; or the OSError above.&lt;/p&gt;
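&lt;p&gt;A sketch of the same glob trick for the model: locate a &lt;code&gt;config.json&lt;/code&gt; somewhere under &lt;code&gt;/kaggle/input&lt;/code&gt; and use its directory. The hub-id fallback is only so the snippet runs outside Kaggle (with internet ON):&lt;/p&gt;

```python
import glob
import os

# Find the attached model's directory by globbing for its config.json.
# The hub-id fallback is just so this sketch runs outside Kaggle with
# internet ON; on Kaggle the glob finds the attached model.
candidates = glob.glob("/kaggle/input/**/config.json", recursive=True)

if candidates:
    MODEL = os.path.dirname(candidates[0])
else:
    MODEL = "Qwen/Qwen3-8B"
print(f"Model path: {MODEL}")
```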


&lt;h2&gt;
  
  
  Step 3 — QLoRA Fine-Tuning
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Check the environment
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PYTORCH_CUDA_ALLOC_CONF&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expandable_segments:True&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GPU: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_device_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VRAM: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_device_properties&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;total_memory&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;1e9&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; GB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PyTorch: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__version__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# GPU: Tesla T4
# VRAM: 15.6 GB
# PyTorch: 2.9.0+cu126
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Mistake #2: pip install with internet OFF
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;ERROR: Could not find a version that satisfies the requirement bitsandbytes&amp;gt;&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.46.1
&lt;span class="go"&gt;Failed to establish a new connection: [Errno -3] Temporary failure in name resolution
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;With internet OFF, pip can't reach PyPI. Either turn internet ON first, or check if the package is already installed in the Kaggle environment. In my case, the default version worked fine.&lt;/p&gt;
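&lt;p&gt;You can check what the image already ships with before reaching for pip. A small standard-library sketch:&lt;/p&gt;

```python
import importlib.util

# With internet OFF pip is useless, so check what the Kaggle image
# already ships with. find_spec returns None for a missing package
# without actually importing anything heavy.
def have(pkg):
    return importlib.util.find_spec(pkg) is not None

for pkg in ("torch", "transformers", "peft", "bitsandbytes"):
    print(pkg, "ok" if have(pkg) else "MISSING")
```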
&lt;h3&gt;
  
  
  Load the model with 4-bit quantization
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BitsAndBytesConfig&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;peft&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LoraConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;get_peft_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prepare_model_for_kbit_training&lt;/span&gt;

&lt;span class="n"&gt;MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/kaggle/input/your-actual-path-found-with-glob&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;MAX_SEQ_LENGTH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;  &lt;span class="c1"&gt;# Reduced from 2048 to save memory
&lt;/span&gt;
&lt;span class="n"&gt;bnb_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BitsAndBytesConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;load_in_4bit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bnb_4bit_quant_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nf4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bnb_4bit_compute_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bnb_4bit_use_double_quant&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trust_remote_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pad_token&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pad_token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eos_token&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;quantization_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bnb_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;trust_remote_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;prepare_model_for_kbit_training&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  LoRA config
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;lora_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LoraConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;# Reduced from 16 to save memory
&lt;/span&gt;    &lt;span class="n"&gt;lora_alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;target_modules&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;q_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;o_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gate_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;up_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;down_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;lora_dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;none&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CAUSAL_LM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_peft_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lora_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print_trainable_parameters&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;LoRA only updates roughly 1% of the total parameters. That's how an 8B model fits in a T4's 15.6GB VRAM.&lt;/p&gt;
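&lt;p&gt;That number is easy to sanity-check with back-of-envelope math: rank-r LoRA on a frozen weight of shape (d_in, d_out) adds two small matrices, A (d_in x r) and B (r x d_out), i.e. r * (d_in + d_out) trainable parameters per target module. The layer shapes below are illustrative placeholders, not read from Qwen3-8B's actual config:&lt;/p&gt;

```python
# Back-of-envelope LoRA sizing: rank-r LoRA adds r * (d_in + d_out)
# trainable parameters per targeted (d_in, d_out) weight matrix.
# These layer shapes are illustrative placeholders, NOT Qwen3-8B's
# real config values.
r = 8
layers = 36
shapes = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 12288),
    "up_proj": (4096, 12288),
    "down_proj": (12288, 4096),
}

per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
total = per_layer * layers
print(f"trainable LoRA params: {total / 1e6:.1f}M")  # 21.8M with these shapes
```

&lt;p&gt;With these placeholder shapes the adapter is on the order of tens of millions of parameters, a tiny slice of an 8B model, which is why the whole thing trains inside a T4.&lt;/p&gt;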
&lt;h3&gt;
  
  
  Dataset class
&lt;/h3&gt;

&lt;p&gt;Only the assistant turn contributes to the loss. The prefix is masked with -100:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ChatMLDataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TorchDataset&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;samples&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sample&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;samples&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

            &lt;span class="n"&gt;prefix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply_chat_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;tokenize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;add_generation_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;full&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply_chat_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;add_generation_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="n"&gt;prefix_ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;add_special_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_ids&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;full_enc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;full&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;add_special_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;truncation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;full_ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;full_enc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_ids&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

            &lt;span class="n"&gt;labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prefix_ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;full_ids&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prefix_ids&lt;/span&gt;&lt;span class="p"&gt;):]&lt;/span&gt;
            &lt;span class="n"&gt;labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_ids&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;full_ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;attention_mask&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;full_enc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;attention_mask&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;labels&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Training
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;training_args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TrainingArguments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/kaggle/working/qwen3-reddit-ft&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;per_device_train_batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;gradient_accumulation_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;warmup_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;num_train_epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;learning_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2e-4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;fp16&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;logging_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;save_strategy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;steps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;save_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;eval_strategy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;steps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;eval_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;optim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;adamw_8bit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;gradient_checkpointing&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;dataloader_pin_memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;report_to&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;none&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;trainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Trainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;training_args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;train_dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;train_dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;eval_dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;valid_dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;data_collator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data_collator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 4 — Save and Evaluate
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Save LoRA adapter
&lt;/span&gt;&lt;span class="n"&gt;OUTPUT_DIR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/kaggle/working/qwen3-reddit-ft&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OUTPUT_DIR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OUTPUT_DIR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Fuse LoRA into the base model
&lt;/span&gt;&lt;span class="n"&gt;FINAL_DIR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/kaggle/working/qwen3-reddit-final&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;merged_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;merge_and_unload&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;merged_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FINAL_DIR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FINAL_DIR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Evaluation prompts
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;PROMPTS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r/SideProject&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Just shipped my side project after 6 months. Nobody&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s using it.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r/artificial&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GPT-5 was just released and it&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s apparently 10x better than Claude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r/webdev&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Should I learn React or just stick with vanilla JS in 2025?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r/LocalLLaMA&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Running a 70B model locally on a 4090, is it worth it?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;PROMPTS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful member of &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="c1"&gt;# generate and print response
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What I Actually Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;518 samples were enough to shift the tone.&lt;/strong&gt; Train: 466, valid: 52. Even with a small dataset, the response style changed noticeably.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mistakes summary:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model path — don't use &lt;code&gt;"Qwen/Qwen3-8B"&lt;/code&gt; directly. Use glob to find the real path first&lt;/li&gt;
&lt;li&gt;Internet is OFF by default — pip won't work. Turn it on (free) or use pre-installed packages&lt;/li&gt;
&lt;li&gt;VRAM limits — set &lt;code&gt;MAX_SEQ_LENGTH=1024&lt;/code&gt;, &lt;code&gt;batch_size=1&lt;/code&gt;, &lt;code&gt;r=8&lt;/code&gt; to fit on T4&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Fine-tuning isn't that hard. The setup is where the friction is.&lt;/p&gt;

&lt;p&gt;Free Kaggle GPU + open model + public dataset = zero cost to get started. If you've been curious about fine-tuning but assumed you needed expensive hardware, this stack removes that excuse.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>OG Images Done Right — How I Made Every Shared Link Work Harder</title>
      <dc:creator>as1as</dc:creator>
      <pubDate>Thu, 26 Mar 2026 03:41:35 +0000</pubDate>
      <link>https://dev.to/as1as/og-images-done-right-how-i-made-every-shared-link-work-harder-4e96</link>
      <guid>https://dev.to/as1as/og-images-done-right-how-i-made-every-shared-link-work-harder-4e96</guid>
      <description>&lt;p&gt;One thing I learned building TalkWith.chat:&lt;/p&gt;

&lt;p&gt;No matter how good your product is, if sharing a link on KakaoTalk or Slack shows no image — nobody clicks.&lt;/p&gt;

&lt;p&gt;A single OG image determines your click-through rate. So I decided to do it properly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Everyone Knows What OG Images Are
&lt;/h2&gt;

&lt;p&gt;The image specified in &lt;code&gt;&amp;lt;meta property="og:image"&amp;gt;&lt;/code&gt; that appears as a preview when sharing links. Works on Twitter, Slack, KakaoTalk, Discord — everywhere.&lt;/p&gt;

&lt;p&gt;The problem with &lt;strong&gt;static images&lt;/strong&gt; is that every page shows the same thumbnail. On a debate platform where a new topic drops every day, having every topic page share the same image is pointless.&lt;/p&gt;

&lt;p&gt;"What if the topic title was baked into the OG image?" That was the starting point.&lt;/p&gt;




&lt;h2&gt;
  
  
  Dynamic OG Images in Next.js: &lt;code&gt;opengraph-image.tsx&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Since Next.js 13 App Router, placing an &lt;code&gt;opengraph-image.tsx&lt;/code&gt; file in a folder automatically registers it as that page's OG image. Inside that file, you use &lt;code&gt;ImageResponse&lt;/code&gt; to generate an image from JSX.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/[locale]/topic/[date]/opengraph-image.tsx&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ImageResponse&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/og&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;runtime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;edge&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;height&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;630&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;OGImage&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getTopic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ImageResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;style&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;background&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#050508&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;100%&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;height&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;100%&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;display&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;flex&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;flexDirection&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;column&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;padding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;60px&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;p&lt;/span&gt; &lt;span class="na"&gt;style&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#00f0ff&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;fontSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;TODAY'S BATTLE&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;p&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;h1&lt;/span&gt; &lt;span class="na"&gt;style&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;white&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;fontSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;48&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;h1&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;style&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#00f0ff&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;fontSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;marginTop&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;auto&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
          ▶ JOIN THE DEBATE
        &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;height&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;630&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It runs on Edge Runtime so it's fast, and writing plain JSX makes designing straightforward.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned Building This
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. SEO title and OG title should be different
&lt;/h3&gt;

&lt;p&gt;At first I used the SEO title directly as the OG title. But a title like "Should the U.S. government prioritize national security alerts over diplomatic engagement with Mexico? | TalkWith.chat" gets brutally truncated in a KakaoTalk share preview.&lt;/p&gt;

&lt;p&gt;SEO title is for search results. OG title is for share previews. Manage them separately.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;generateMetadata&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getTopic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; | TalkWith.chat`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// SEO&lt;/span&gt;
    &lt;span class="na"&gt;openGraph&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;My AI debates for me.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Share preview — short and punchy&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;160&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Custom fonts need to be loaded manually
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;ImageResponse&lt;/code&gt; only has access to system fonts by default. For custom fonts, you need to fetch the font file and pass it in explicitly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;font&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/fonts/Orbitron-Bold.ttf&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NEXT_PUBLIC_URL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arrayBuffer&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ImageResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jsx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;height&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;630&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;fonts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Orbitron&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;font&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;700&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I spent a while puzzling over why my font wasn't rendering before I figured this out.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. A CTA button changes everything
&lt;/h3&gt;

&lt;p&gt;My first OG image just showed the topic title. Feedback came back: "I see the link preview but I don't know what I'm supposed to do."&lt;/p&gt;

&lt;p&gt;Adding &lt;code&gt;▶ JOIN FREE NOW&lt;/code&gt; at the bottom made a real difference. Think of OG images as mini ad banners. A title-only image and a title-plus-CTA image perform differently.&lt;/p&gt;




&lt;h2&gt;
  
  
  OG Image Strategy by Page Type
&lt;/h2&gt;

&lt;p&gt;I took a different approach for each page type on TalkWith.chat:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Page&lt;/th&gt;
&lt;th&gt;OG Image Strategy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Home (&lt;code&gt;/&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Today's topic title + CTA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Topic detail (&lt;code&gt;/topic/[date]&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Topic title + PRO/CON stance + CTA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User profile (&lt;code&gt;/user/[username]&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Username + level + stats&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Static pages (about, privacy)&lt;/td&gt;
&lt;td&gt;Fixed brand image&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The more dynamic the page, the more value there is in putting that page's core content into the OG image.&lt;/p&gt;




&lt;h2&gt;
  
  
  OG Image Checklist
&lt;/h2&gt;

&lt;p&gt;Things to verify once you've built it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Testing tools:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.opengraph.xyz" rel="noopener noreferrer"&gt;opengraph.xyz&lt;/a&gt; — full preview across platforms&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cards-dev.twitter.com/validator" rel="noopener noreferrer"&gt;cards-dev.twitter.com/validator&lt;/a&gt; — Twitter card checker&lt;/li&gt;
&lt;li&gt;Kakao Developer Debugger — KakaoTalk share preview&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Common mistakes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Image dimensions: 1200×630px is the standard. Smaller images render blurry&lt;/li&gt;
&lt;li&gt;Cache issues: updated your OG image but the old one keeps showing? Each platform caches aggressively — you need to manually clear it per platform&lt;/li&gt;
&lt;li&gt;Make sure &lt;code&gt;og:image&lt;/code&gt; uses an absolute URL. Relative paths silently fail on some platforms&lt;/li&gt;
&lt;/ul&gt;
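&lt;p&gt;In Next.js, a reliable way to satisfy the absolute-URL rule is to set &lt;code&gt;metadataBase&lt;/code&gt; in the root layout's metadata export so relative &lt;code&gt;og:image&lt;/code&gt; paths resolve against your real origin. The resolution itself is just &lt;code&gt;new URL(path, base)&lt;/code&gt; — a minimal sketch (the origin here is illustrative):&lt;/p&gt;

```javascript
// Illustrative origin — swap in your deployed domain
const SITE_URL = 'https://www.talkwith.chat';

// Resolve a (possibly relative) og:image path to the absolute URL crawlers need
function absoluteOgImage(path) {
  // new URL() leaves already-absolute URLs untouched and resolves relative ones
  return new URL(path, SITE_URL).toString();
}

// In Next.js, exporting `metadata = { metadataBase: new URL(SITE_URL) }` from
// the root layout makes the framework do this resolution for you.
```

&lt;p&gt;With &lt;code&gt;metadataBase&lt;/code&gt; set once, every page's relative image path becomes a full URL in the rendered &lt;code&gt;og:image&lt;/code&gt; tag.&lt;/p&gt;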




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;OG images feel like a set-it-and-forget-it thing once they're in place. But in practice, they determine the first impression every single time your link gets shared.&lt;/p&gt;

&lt;p&gt;Don't use one static image for everything. Putting the right dynamic content in each page's OG image is what actually drives clicks.&lt;/p&gt;

&lt;p&gt;Next.js &lt;code&gt;opengraph-image.tsx&lt;/code&gt; is easier to implement than it looks. If you haven't done it yet, now's a good time.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>nextjs</category>
      <category>seo</category>
      <category>javascript</category>
    </item>
    <item>
      <title>WASM in 2026: What I Found After Testing It for a Video Platform</title>
      <dc:creator>as1as</dc:creator>
      <pubDate>Thu, 26 Mar 2026 03:37:07 +0000</pubDate>
      <link>https://dev.to/as1as/wasm-in-2026-what-i-found-after-testing-it-for-a-video-platform-3lne</link>
      <guid>https://dev.to/as1as/wasm-in-2026-what-i-found-after-testing-it-for-a-video-platform-3lne</guid>
      <description>&lt;p&gt;I'm an IT engineer at an online video platform company.&lt;/p&gt;

&lt;p&gt;My job involves constantly evaluating new technologies to deliver better services to our clients. Last year, a question came up within the team: "What if we could process video directly in the browser, without a server?"&lt;/p&gt;

&lt;p&gt;That single question pulled me down a WASM rabbit hole.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why In-Browser Video Processing?
&lt;/h2&gt;

&lt;p&gt;One of the services we provide to clients involves video upload and processing. The traditional approach was straightforward — users upload a file, the server handles encoding, splitting, and analysis, then returns the result.&lt;/p&gt;

&lt;p&gt;The problem was cost and latency. The larger the file, the higher the server cost, and the longer the wait. It felt wasteful to route even simple preprocessing or analysis tasks through a server.&lt;/p&gt;

&lt;p&gt;"What if we could handle this on the client side?" That idea was the starting point for evaluating WASM.&lt;/p&gt;




&lt;h2&gt;
  
  
  Can WASM Actually Handle Video Processing in the Browser?
&lt;/h2&gt;

&lt;p&gt;The short answer: &lt;strong&gt;yes.&lt;/strong&gt; And more than you'd expect.&lt;/p&gt;

&lt;p&gt;Using &lt;code&gt;ffmpeg.wasm&lt;/code&gt; — FFmpeg compiled to WebAssembly — all of the following become possible directly in the browser:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Video analysis&lt;/strong&gt; — extracting resolution, codec, framerate, bitrate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encoding / transcoding&lt;/strong&gt; — converting between MP4, WebM, MOV&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Video splitting&lt;/strong&gt; — trimming specific segments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Video merging&lt;/strong&gt; — concatenating multiple clips&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thumbnail extraction&lt;/strong&gt; — grabbing frames at specific timestamps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No server. Inside the browser. The user's file never leaves their device. From a privacy standpoint, that's a genuinely powerful advantage.&lt;/p&gt;
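&lt;p&gt;As a flavor of what this looks like in code, here is a hedged sketch of thumbnail extraction using the v0.12-style &lt;code&gt;ffmpeg.wasm&lt;/code&gt; API (file names and the 3-second timestamp are illustrative). The FFmpeg arguments are the portable part:&lt;/p&gt;

```javascript
// Build the FFmpeg CLI args: seek to a timestamp, then grab exactly one frame
function thumbnailArgs(input, atSeconds, output) {
  return ['-ss', String(atSeconds), '-i', input, '-frames:v', '1', output];
}

// Browser wiring, roughly (assumes @ffmpeg/ffmpeg and @ffmpeg/util, v0.12+):
//
//   import { FFmpeg } from '@ffmpeg/ffmpeg';
//   import { fetchFile } from '@ffmpeg/util';
//
//   const ffmpeg = new FFmpeg();
//   await ffmpeg.load();
//   await ffmpeg.writeFile('input.mp4', await fetchFile(file));
//   await ffmpeg.exec(thumbnailArgs('input.mp4', 3, 'thumb.jpg'));
//   const data = await ffmpeg.readFile('thumb.jpg'); // Uint8Array, ready for a Blob
```

&lt;p&gt;Placing &lt;code&gt;-ss&lt;/code&gt; before &lt;code&gt;-i&lt;/code&gt; uses FFmpeg's fast input seek, which matters when the source file is large.&lt;/p&gt;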




&lt;h2&gt;
  
  
  What I Found in Practice: Potential and Limits
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Potential
&lt;/h3&gt;

&lt;p&gt;Performance was better than expected. For short clips, encoding ran roughly 2–5x slower than native — which sounds bad until you remember this is running inside a browser tab. The fact that it works at all is impressive.&lt;/p&gt;

&lt;p&gt;Video analysis in particular ran close to real-time. Being able to extract metadata instantly without uploading the file to a server is something that translates directly into better UX.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Limit: Memory
&lt;/h3&gt;

&lt;p&gt;The biggest constraint was &lt;strong&gt;memory&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;WebAssembly memory in the browser has hard limits. Feed it a large video file without care and you'll hit an out-of-memory crash. I experienced this firsthand — loading a 1GB file directly killed the tab.&lt;/p&gt;

&lt;p&gt;The solution is &lt;strong&gt;chunked processing&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Split the file into chunks and process sequentially&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;CHUNK_SIZE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// 64MB&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;size&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;CHUNK_SIZE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;CHUNK_SIZE&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Write each chunk to the buffer and process&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arrayBuffer&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="c1"&gt;// WASM processing here&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Splitting large files into chunks and writing them to the buffer sequentially sidesteps the memory issue. The tradeoff is added implementation complexity — but it's manageable.&lt;/p&gt;
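&lt;p&gt;The chunking loop above can also be factored into a small pure helper, which makes the remainder handling explicit (same 64MB chunk size):&lt;/p&gt;

```javascript
// 64MB chunks, matching the snippet above
const CHUNK_SIZE = 64 * 1024 * 1024;

// Pure helper: compute [start, end) byte ranges covering a file of totalSize bytes
function chunkRanges(totalSize, chunkSize = CHUNK_SIZE) {
  const count = Math.ceil(totalSize / chunkSize);
  return Array.from({ length: count }, (_, i) => [
    i * chunkSize,
    Math.min((i + 1) * chunkSize, totalSize), // last range is the remainder
  ]);
}

// A 150MB file yields three ranges; the third one is the 22MB tail
const ranges = chunkRanges(150 * 1024 * 1024);
```

&lt;p&gt;Each range then drives a &lt;code&gt;file.slice(start, end)&lt;/code&gt; call exactly as in the snippet above, so the full file never sits in WASM memory at once.&lt;/p&gt;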




&lt;h2&gt;
  
  
  The WASM Ecosystem in 2026: What Actually Got Better
&lt;/h2&gt;

&lt;p&gt;While evaluating WASM for our platform, I took a broader look at the ecosystem.&lt;br&gt;
The changes from even two years ago are significant.&lt;/p&gt;

&lt;h3&gt;
  
  
  Safari Finally Caught Up
&lt;/h3&gt;

&lt;p&gt;For years, Safari was the "new Internet Explorer" of the WASM world.&lt;br&gt;
Developers had to write fallback code or avoid features entirely because&lt;br&gt;
Apple consistently lagged behind Chrome and Firefox.&lt;/p&gt;

&lt;p&gt;Safari 18.4 added support for the new Wasm exception spec, and Safari 26.0 introduced a new in-place interpreter for faster startup of large Wasm modules. This has meaningfully closed the cross-browser gap. If you shelved a WASM project a couple of years ago because of Safari compatibility concerns, it's worth revisiting.&lt;/p&gt;

&lt;h3&gt;
  
  
  WebAssembly 3.0 and WASI Preview 3
&lt;/h3&gt;

&lt;p&gt;WebAssembly 3.0 was announced, bringing a host of new features into the main specification. The Bytecode Alliance has also been adding async support to WASI ahead of the 0.3 release, and Wasmtime now ships experimental WASI 0.3 support.&lt;/p&gt;

&lt;p&gt;The async support in particular is a big deal for video processing use cases.&lt;br&gt;
Previously, long-running operations would block. Native async means cleaner&lt;br&gt;
code and better UX without the workarounds.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Component Model: Mixing Languages Is Finally Practical
&lt;/h3&gt;

&lt;p&gt;In 2026, the Wasm Component Model has largely solved the problem of mixing libraries from different languages. Developers can now write business logic in Rust, data processing modules in Python, and glue code in JavaScript, compiling them all into composable Wasm components.&lt;/p&gt;

&lt;p&gt;For a video platform this is meaningful. FFmpeg bindings in C, custom&lt;br&gt;
processing logic in Rust, orchestration in JavaScript — these can now&lt;br&gt;
talk to each other without painful FFI layers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloud Providers Are Treating WASM as First-Class
&lt;/h3&gt;

&lt;p&gt;AWS Lambda now supports Wasm functions as a first-class runtime, with benchmarks showing 10-40x improvements in cold start times compared to container-based functions. Google Cloud offers Wasm through Cloud Run, and Azure Functions provides Wasm support through a dedicated preview.&lt;/p&gt;

&lt;p&gt;At SUSECON 2025, Fermyon's CEO demonstrated sub-millisecond cold starts (~0.5ms) for Wasm functions on Kubernetes versus hundreds of milliseconds for AWS Lambda.&lt;/p&gt;

&lt;p&gt;This changes the calculus for server-side processing too.&lt;br&gt;
If you're running video analysis jobs on Lambda, switching to Wasm&lt;br&gt;
could be a serious cost optimization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Debugging Got Real
&lt;/h3&gt;

&lt;p&gt;One of the biggest frustrations with WASM historically was debugging.&lt;br&gt;
When something went wrong, you were mostly guessing.&lt;/p&gt;

&lt;p&gt;Modern browser DevTools now include DWARF debugging support for WebAssembly. You can set breakpoints in your original source code — Rust, C++, etc. — and step through execution, inspect variables, and view call stacks.&lt;/p&gt;

&lt;p&gt;It's not quite as smooth as debugging JavaScript yet, but it's functional.&lt;br&gt;
This alone makes WASM significantly more approachable for production use.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adoption Numbers Back It Up
&lt;/h3&gt;

&lt;p&gt;WebAssembly adoption grew to 5.5% of sites in 2025, driven by AI workloads and performance demands, putting it on a path to becoming a mainstream infrastructure layer.&lt;/p&gt;

&lt;p&gt;That's still a minority, but the trajectory is clear.&lt;br&gt;
The technology is no longer in the "wait and see" category.&lt;/p&gt;




&lt;h2&gt;
  
  
  When Should You Actually Use WASM?
&lt;/h2&gt;

&lt;p&gt;After wrapping up the evaluation, here's where I landed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use it when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have CPU-intensive operations (encoding, encryption, image/video processing)&lt;/li&gt;
&lt;li&gt;Sending data to a server is difficult (privacy concerns, large files)&lt;/li&gt;
&lt;li&gt;You want to reuse existing C/C++/Rust libraries on the web&lt;/li&gt;
&lt;li&gt;You need fast computation at the edge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Skip it when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're doing standard UI rendering or form handling&lt;/li&gt;
&lt;li&gt;The data manipulation is lightweight&lt;/li&gt;
&lt;li&gt;JavaScript is already fast enough&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The core principle: &lt;strong&gt;reach for WASM when JavaScript starts feeling slow.&lt;/strong&gt; Building with WASM from the start is likely over-engineering.&lt;/p&gt;
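&lt;p&gt;"Feeling slow" is worth quantifying before you commit. Here's a minimal best-of-N timing harness in plain JavaScript (no dependencies; the grayscale pass is just an illustrative CPU-bound workload):&lt;/p&gt;

```javascript
// Run fn several times and keep the best wall-clock time, in milliseconds.
// Best-of-N filters out GC pauses and warm-up noise.
function bestOf(fn, iterations = 5) {
  const times = Array.from({ length: iterations }, () => {
    const t0 = performance.now();
    fn();
    return performance.now() - t0;
  });
  return Math.min(...times);
}

// Example: a CPU-bound pixel pass you might consider porting to WASM
const pixels = new Uint8ClampedArray(1_000_000);
const ms = bestOf(() => {
  pixels.forEach((v, i, a) => { a[i] = (v * 299) >> 10; });
});
```

&lt;p&gt;If numbers like this come back in single-digit milliseconds, JavaScript is already fast enough and WASM would be over-engineering.&lt;/p&gt;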




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;By 2026, WASM has crossed the line from "a technology worth watching" to "a technology people are actually using."&lt;/p&gt;

&lt;p&gt;Encoding video in the browser, running ML inference at the edge, handling encryption without a server — these are real things now.&lt;/p&gt;

&lt;p&gt;That said, the memory constraints and other limitations are still real, and WASM isn't the answer to every problem. Think of it as a tool you reach for when JavaScript isn't enough. That's WASM's position in 2026.&lt;/p&gt;

</description>
      <category>webassembly</category>
      <category>javascript</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Why We're Going Back to the Server — The SSR Revival of 2026</title>
      <dc:creator>as1as</dc:creator>
      <pubDate>Tue, 24 Mar 2026 06:38:58 +0000</pubDate>
      <link>https://dev.to/as1as/why-were-going-back-to-the-server-the-ssr-revival-of-2026-576i</link>
      <guid>https://dev.to/as1as/why-were-going-back-to-the-server-the-ssr-revival-of-2026-576i</guid>
      <description>&lt;p&gt;In the mid-2010s, web developers started treating server-rendered HTML as something old-fashioned.&lt;/p&gt;

&lt;p&gt;React, Vue, and Angular ushered in the SPA era. "The server just serves the API" became the dominant philosophy. Fast interactions, app-like experiences, clean separation between frontend and backend. Everyone ran in that direction.&lt;/p&gt;

&lt;p&gt;And now, in 2026, we are &lt;strong&gt;quietly but unmistakably&lt;/strong&gt; going back to the server.&lt;/p&gt;




&lt;h2&gt;
  
  
  What SPAs Promised vs. What Actually Happened
&lt;/h2&gt;

&lt;p&gt;The promise of SPAs was clear: load once, navigate without full page refreshes, reduce server load, dramatically improve UX.&lt;/p&gt;

&lt;p&gt;Reality played out a little differently.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;First load got slower.&lt;/strong&gt; The browser receives an empty HTML shell, downloads hundreds of kilobytes of JavaScript, parses it, executes it — and only then does anything appear. Users stare at a white screen.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SEO broke.&lt;/strong&gt; If a search engine can't execute JavaScript, it sees an empty page. Even when Google does crawl properly, indexing timing is slower and less reliable than SSR.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bundle sizes exploded.&lt;/strong&gt; As features grew, JavaScript bundles ballooned. Initial JS bundles over 500KB–2MB became commonplace. Code splitting helped, but complexity kept climbing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Client-side state management became a nightmare.&lt;/strong&gt; Redux, MobX, Zustand, Jotai... things that would have taken one line on the server ballooned into &lt;strong&gt;tens, sometimes hundreds of lines&lt;/strong&gt; of state management code.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What Changed in 2026
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Next.js App Router becoming the default&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The App Router introduced in Next.js 13 has now settled in as the &lt;strong&gt;de facto standard for new projects in 2026&lt;/strong&gt;. React Server Components (RSC) opened a new paradigm: render on the server by default, handle interactivity on the client only where needed.&lt;/p&gt;

&lt;p&gt;You can choose server or client at the component level. Server components don't ship to the bundle at all. You can run database queries directly inside a component — just like PHP used to do. But with the full React ecosystem intact.&lt;/p&gt;
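&lt;p&gt;A minimal sketch of that pattern (the route, table, and column names are illustrative, not TalkWith.chat's actual schema): an &lt;code&gt;async&lt;/code&gt; server component that queries Supabase during render, with no API route in between.&lt;/p&gt;

```javascript
// app/topic/[date]/page.jsx — a server component: this code never ships to the client
import { createClient } from '@supabase/supabase-js';

export default async function TopicPage({ params }) {
  // Server-only env vars are safe to read here
  const supabase = createClient(process.env.SUPABASE_URL, process.env.SUPABASE_ANON_KEY);

  // The query runs during server render; the browser receives finished HTML
  const { data: topic } = await supabase
    .from('topics')            // illustrative table name
    .select('title')
    .eq('debate_date', params.date)
    .single();

  // React happily renders a plain string; swap in real JSX in practice
  return topic?.title ?? 'Topic not found';
}
```

&lt;p&gt;Because none of this reaches the bundle, the data-access code adds zero client-side JavaScript.&lt;/p&gt;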

&lt;p&gt;&lt;strong&gt;Server-first architecture in the AI era&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As AI features moved into web apps, the server's role became critical again. LLM API calls, embedding generation, vector search — all of this needs to happen server-side for security alone. In a pure SPA architecture, you'd need a separate backend to handle this safely. With SSR, it's solved naturally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core Web Vitals with real consequences&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once Google started factoring LCP (Largest Contentful Paint), INP (Interaction to Next Paint, which replaced FID in 2024), and CLS into search rankings, slow SPAs started paying a real SEO penalty. SSR — which sends fully-rendered HTML — has a structural advantage on these metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The rise of Astro, Remix, and SvelteKit&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Frameworks built around "server-rendered by default, client-side only when necessary" have grown fast. Astro in particular set a new performance benchmark with its Islands Architecture: ship zero JavaScript by default.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Misconception: Isn't SSR Slow?
&lt;/h2&gt;

&lt;p&gt;A common pushback: "Doesn't SSR put more load on the server and slow things down?"&lt;/p&gt;

&lt;p&gt;In 2015, that was fair. SSR then meant generating HTML on the server for every single request, with direct cost implications.&lt;/p&gt;

&lt;p&gt;Today it's different.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Edge Runtime&lt;/strong&gt;: Vercel, Cloudflare Workers, and similar platforms run SSR at the edge — the compute node closest to each user. Latency is minimal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming SSR&lt;/strong&gt;: Instead of generating all the HTML before sending anything, the server streams it — ready parts first. TTFB (Time To First Byte) improves dramatically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental Static Regeneration&lt;/strong&gt;: Pages that don't change often get cached and only regenerate when needed. Static site speed plus dynamic data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result: TTFB drops significantly, and users see an almost-instant first screen.&lt;/p&gt;




&lt;h2&gt;
  
  
  SPAs Aren't Dead
&lt;/h2&gt;

&lt;p&gt;To be honest, SSR isn't always the right answer.&lt;/p&gt;

&lt;p&gt;Dashboards, admin panels, real-time collaboration tools — apps with no SEO requirements and complex interactions are still well-suited to SPAs. If users only access the app after logging in, UX responsiveness matters more than initial load time.&lt;/p&gt;

&lt;p&gt;The trend isn't converging on "SSR vs SPA" as a binary. It's converging on &lt;strong&gt;hybrid&lt;/strong&gt;. Public pages rendered server-side, authenticated dashboards running as SPAs — this architecture is becoming the standard.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Experienced Building With It
&lt;/h2&gt;

&lt;p&gt;I built TalkWith.chat on Next.js App Router.&lt;/p&gt;

&lt;p&gt;The public pages — today's debate topic, the archive, the rankings — are server components that query Supabase directly and send back fully-rendered HTML. No separate API routes, no client-side fetch. The code shrank by nearly half, and LCP improved noticeably.&lt;/p&gt;

&lt;p&gt;Interactive elements — the like button, the opinion submission form — are client components declared with &lt;code&gt;'use client'&lt;/code&gt;. Mixing server and client where each makes sense clicked immediately. It just felt right.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Summary
&lt;/h2&gt;

&lt;p&gt;The return to server-side rendering isn't a step backward.&lt;/p&gt;

&lt;p&gt;The SPA era gave us real experience with the limits of client-first architecture. What's coming back is a more refined form of server rendering. React Server Components, Streaming SSR, Edge Runtime — this isn't your PHP-era server rendering.&lt;/p&gt;

&lt;p&gt;We've arrived at an era where developers can freely choose, component by component, where the boundary between server and client lives.&lt;br&gt;
And &lt;strong&gt;in 2026, the web is making it clear: for most cases, the default should be the server.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is the second post in my TalkWith.chat dev log series.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;First post: &lt;a href="https://dev.to/as1as/the-limits-of-vibe-coding-what-nobody-tells-you-after-the-honeymoon-phase-4c44"&gt;The Limits of Vibe Coding — What Nobody Tells You After the Honeymoon Phase&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I built TalkWith.chat solo. It's live — AI Characters debating global topics every day.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;→ &lt;a href="https://www.talkwith.chat" rel="noopener noreferrer"&gt;https://www.talkwith.chat&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




</description>
      <category>webdev</category>
      <category>nextjs</category>
      <category>react</category>
      <category>ssr</category>
    </item>
    <item>
      <title>The Limits of Vibe Coding — What Nobody Tells You After the Honeymoon Phase</title>
      <dc:creator>as1as</dc:creator>
      <pubDate>Tue, 24 Mar 2026 06:21:16 +0000</pubDate>
      <link>https://dev.to/as1as/the-limits-of-vibe-coding-what-nobody-tells-you-after-the-honeymoon-phase-4c44</link>
      <guid>https://dev.to/as1as/the-limits-of-vibe-coding-what-nobody-tells-you-after-the-honeymoon-phase-4c44</guid>
      <description>&lt;p&gt;I built an AI debate platform &lt;strong&gt;solo&lt;/strong&gt; in one week.&lt;/p&gt;

&lt;p&gt;AI Characters, daily auto-generated global debate topics, 72-badge gamification, i18n, a bot runner, cron jobs, an admin panel. All of it. All through vibe coding.&lt;/p&gt;

&lt;p&gt;Let me be honest — vibe coding is what made it possible. I'm not here to trash it.&lt;/p&gt;

&lt;p&gt;But after 100+ commits and real production experience, I've hit enough walls to talk about what vibe coding &lt;em&gt;doesn't&lt;/em&gt; tell you — especially if you're trying to build something beyond a weekend project.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Mean by Vibe Coding
&lt;/h2&gt;

&lt;p&gt;Describing what you want to an AI (Claude Code, Cursor, Copilot), iterating on the output, and shipping — without necessarily reading every line you deploy.&lt;/p&gt;

&lt;p&gt;Andrej Karpathy's original framing was about liberation: just say what you want and let the AI figure it out. And for going from zero to something that actually works, it genuinely delivers.&lt;/p&gt;

&lt;p&gt;The problem starts the moment that "something" needs to be maintained.&lt;/p&gt;




&lt;h2&gt;
  
  
  Limit 1: The AI Doesn't Know What It Doesn't Know
&lt;/h2&gt;

&lt;p&gt;When I asked Claude Code to write my Supabase RLS policies, the output looked perfect and passed local tests. The issue only appeared in production — a permission error triggered when the bot runner executed in a specific pattern.&lt;/p&gt;

&lt;p&gt;The AI had no idea how my bot runner worked. It wrote correct code for the scenario I described, not for the full system it couldn't see.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pattern I kept hitting:&lt;/strong&gt; AI produces locally correct code that's globally wrong. It solves the problem you described — not necessarily the problem you actually have.&lt;/p&gt;




&lt;h2&gt;
  
  
  Limit 2: Refactoring Debt Accumulates Fast
&lt;/h2&gt;

&lt;p&gt;Vibe coding is additive by nature. Need a feature? Add it. Bug? Patch it.&lt;/p&gt;

&lt;p&gt;By commit 60, my codebase had:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3 slightly different patterns for the same Supabase auth check&lt;/li&gt;
&lt;li&gt;API routes that were 80% identical but never abstracted&lt;/li&gt;
&lt;li&gt;A component with 12 props because adding one more was always easier than restructuring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When I asked the AI to "clean this up," it would fix the file I showed it — and leave the four other files doing the same thing in slightly different ways completely untouched.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vibe coding doesn't refactor. It accumulates.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Fixing this required me to read the code myself, map the patterns, and give precise instructions. Which is fine — but it's not vibe coding anymore.&lt;/p&gt;




&lt;h2&gt;
  
  
  Limit 3: You Can't Debug What You Don't Understand
&lt;/h2&gt;

&lt;p&gt;This one hurt the most. And it was the most embarrassing.&lt;/p&gt;

&lt;p&gt;My bot runner started producing duplicate comments intermittently. The AI had written the execution phases (opinions → likes → comments → attacks), and I deployed it without fully understanding the flow.&lt;/p&gt;

&lt;p&gt;When the bug appeared, I had no mental model of the code. I knew &lt;em&gt;what&lt;/em&gt; it did. I didn't know &lt;em&gt;how&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Debugging with the AI looked like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Describe the symptom&lt;/li&gt;
&lt;li&gt;AI proposes a fix&lt;/li&gt;
&lt;li&gt;Fix doesn't work or breaks something else&lt;/li&gt;
&lt;li&gt;Repeat&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I ran this loop for over an hour. Each suggestion from the AI was plausible. Some actually fixed something — but something else broke. I was going deeper into the hole, and my trust in the codebase was hitting the floor.&lt;/p&gt;

&lt;p&gt;What finally fixed the bug was stopping all prompting entirely and spending 20 minutes reading the code from scratch. I mapped out which phases ran in what order and where an API call could fire twice. The cause became obvious. The fix took 5 minutes.&lt;/p&gt;

&lt;p&gt;Instead of asking the AI "why does this bug exist," I should have &lt;strong&gt;read the code properly from the start.&lt;/strong&gt;&lt;br&gt;
Those 20 minutes made the previous hour a complete waste.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you can't debug it without the AI, you don't own it.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Limit 4: Early Architecture Decisions Get Locked In
&lt;/h2&gt;

&lt;p&gt;When I set up i18n with next-intl, I quickly chose the &lt;code&gt;[locale]&lt;/code&gt; dynamic segment approach. The AI scaffolded everything and it worked.&lt;/p&gt;

&lt;p&gt;Three weeks later, when I needed server components to access locale in a specific way, I realized the architecture already had opinions baked in that I hadn't consciously chosen — I'd just accepted the AI's first reasonable answer.&lt;/p&gt;

&lt;p&gt;Changing it would mean touching 40+ files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The AI optimizes for making the current feature work — not for the architecture you'll want in three months.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This isn't a criticism of the AI. It did exactly what I asked. The problem is that "make this work" and "design this well" are different requests, and vibe coding defaults to the first one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Limit 5: The Context Window Is a Silent Killer
&lt;/h2&gt;

&lt;p&gt;Every new conversation starts fresh. The AI doesn't remember the 47 decisions you made last week.&lt;/p&gt;

&lt;p&gt;I eventually created a &lt;code&gt;CLAUDE.md&lt;/code&gt; file in the repo — a project state document the AI reads at the start of every session. Here's what actually went into it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stack rules&lt;/strong&gt;: "TailwindCSS v4 only — no separate CSS files"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture decisions&lt;/strong&gt;: "All AI calls must use Gemini → GPT-4o Mini failover"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy rules&lt;/strong&gt;: "Always run &lt;code&gt;pnpm build&lt;/code&gt; locally before pushing"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DB migration method&lt;/strong&gt;: "Supabase CLI has auth issues — use Management API directly"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gotchas&lt;/strong&gt;: "Turn off VPN before running the bot runner (can't reach talkwith.chat)"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It helped. Reading this document at the start of every session significantly reduced contradictions with previous decisions. But keeping &lt;code&gt;CLAUDE.md&lt;/code&gt; up to date itself requires discipline. Early on, I'd forget to update it. The AI would make choices that contradicted things we'd "already decided." I'd catch it two days later in a code review.&lt;/p&gt;

&lt;p&gt;So I added a separate &lt;code&gt;history.md&lt;/code&gt;. If &lt;code&gt;CLAUDE.md&lt;/code&gt; is "here's how the project works now," &lt;code&gt;history.md&lt;/code&gt; is "here's what we did and why." Having the AI read both at session start cut down repeated mistakes noticeably.&lt;/p&gt;

&lt;p&gt;One more thing that actually worked: &lt;strong&gt;using Claude Code's Todo feature aggressively&lt;/strong&gt;. Before starting any task, I'd have the AI write a checklist first, then check off each step as it completed. The AI always knew where it was in the flow — which meant far less "going back to something we already finished" on long tasks. The longer the task, the bigger the payoff.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vibe coding assumes continuity the AI can't provide. You have to build that continuity yourself — in documents.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What Vibe Coding Is Actually Great At
&lt;/h2&gt;

&lt;p&gt;I don't want to end on a sour note, because vibe coding genuinely changed what's possible for solo developers.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fast prototyping&lt;/strong&gt;: From idea to working UI in hours, not days&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Boilerplate elimination&lt;/strong&gt;: Auth flows, CRUD APIs, form validation — the AI handles it and I move on&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Staying unblocked&lt;/strong&gt;: When I don't know the right API or pattern, I get a working answer in 30 seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confidence to try things&lt;/strong&gt;: The AI makes the learning curve nearly flat, so I built features I'd never have attempted alone&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;TalkWith.chat exists because of vibe coding. Shipping an AI platform with 100+ features solo in a week wasn't really possible before.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Honest Summary
&lt;/h2&gt;

&lt;p&gt;Vibe coding makes &lt;strong&gt;building 10x faster.&lt;/strong&gt;&lt;br&gt;
But maintenance, debugging, and long-term evolution run at &lt;strong&gt;0.5x&lt;/strong&gt; — unless you actively compensate for the limits.&lt;/p&gt;

&lt;p&gt;The developers I've seen struggle most with vibe coding treat it as a complete replacement for engineering judgment. The ones who thrive treat it like a very fast junior developer: incredible output speed, needs direction, can't own the system.&lt;/p&gt;

&lt;p&gt;The system still has to be owned by you.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I built TalkWith.chat solo. It's live &amp;mdash; AI Characters debating global topics every day. All the chaos and lessons are going into this series.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;→ &lt;a href="https://www.talkwith.chat" rel="noopener noreferrer"&gt;https://www.talkwith.chat&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




</description>
      <category>ai</category>
      <category>vibecoding</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
