<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shivnath Tathe</title>
    <description>The latest articles on DEV Community by Shivnath Tathe (@shivnathtathe).</description>
    <link>https://dev.to/shivnathtathe</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3855910%2F92a9b9bf-4309-41f6-89eb-0aa72677047d.jpg</url>
      <title>DEV Community: Shivnath Tathe</title>
      <link>https://dev.to/shivnathtathe</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shivnathtathe"/>
    <language>en</language>
    <item>
      <title>I Tried Building GPT Without Training — Just Math. Here’s Where It Broke | Shivnath Tathe</title>
      <dc:creator>Shivnath Tathe</dc:creator>
      <pubDate>Fri, 17 Apr 2026 07:01:29 +0000</pubDate>
      <link>https://dev.to/shivnathtathe/i-tried-to-create-gpt-with-pure-math-and-no-training-heres-where-it-broke-shivnath-tathe-23a6</link>
      <guid>https://dev.to/shivnathtathe/i-tried-to-create-gpt-with-pure-math-and-no-training-heres-where-it-broke-shivnath-tathe-23a6</guid>
      <description>&lt;h2&gt;
  
  
  The Question
&lt;/h2&gt;

&lt;p&gt;What if we skipped training entirely?&lt;/p&gt;

&lt;p&gt;Every language model — GPT, LLaMA, BERT — learns by optimising a loss function over millions of gradient steps. But the underlying data is just text: words appearing near other words. Co-occurrence. Counting.&lt;/p&gt;

&lt;p&gt;So I asked: &lt;strong&gt;how far can pure mathematics take us toward text generation, without a single training step?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I built the whole thing from scratch in Python with NumPy. No PyTorch, no TensorFlow, no &lt;code&gt;model.train()&lt;/code&gt;. Just matrices, statistics, and formulas.&lt;/p&gt;

&lt;p&gt;Here's what happened.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup: nanoVectorDB
&lt;/h2&gt;

&lt;p&gt;I started with &lt;a href="https://github.com/shivnathtathe/nanoVectorDB" rel="noopener noreferrer"&gt;nanoVectorDB&lt;/a&gt; — a vector database I'd built from scratch using only NumPy. The original goal was embeddings and similarity search. But then I thought: if I can build word vectors without training, can I also &lt;em&gt;generate&lt;/em&gt; text without training?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The corpus:&lt;/strong&gt; WikiText-103 — 80 million tokens of Wikipedia articles. A 10,000-word vocabulary covering 89% of all tokens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The math pipeline:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Co-occurrence matrix&lt;/strong&gt; — For each word pair, count how often they appear within a window of 5 words. Forward-heavy weighting (0.7 forward, 0.3 backward) because "the king" tells you more about what follows than what came before.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PPMI (Positive Pointwise Mutual Information)&lt;/strong&gt; — Raw counts are dominated by common words. PPMI asks: "does this word pair appear together MORE than chance would predict?" It's the formula: &lt;code&gt;PMI(x,y) = log(P(x,y) / P(x)P(y))&lt;/code&gt;, clamped to zero for negative values.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SVD (Singular Value Decomposition)&lt;/strong&gt; — Compress the sparse 10,000×10,000 PPMI matrix into dense 64-dimensional word embeddings. Each word becomes a vector of 64 numbers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bigram grammar matrix&lt;/strong&gt; — Separately, count every word-to-word transition. &lt;code&gt;P(next | last_word)&lt;/code&gt;. A 10,000×10,000 matrix of raw transition probabilities.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No training. Just counting and matrix factorisation.&lt;/p&gt;
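
&lt;p&gt;For the curious, here is a minimal sketch of that pipeline in plain NumPy/SciPy, assuming &lt;code&gt;corpus&lt;/code&gt; is a list of token-ID sequences over the 10,000-word vocabulary. Names are illustrative, not the actual nanoVectorDB code, and the &lt;code&gt;sqrt(S)&lt;/code&gt; weighting is one common choice:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

V, WINDOW, DIM = 10_000, 5, 64

def cooccurrence(corpus):
    # corpus: iterable of lists of token ids in [0, V)
    C = np.zeros((V, V), dtype=np.float32)
    for tokens in corpus:
        for i, w in enumerate(tokens):
            for j in range(max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)):
                if j == i:
                    continue
                # forward-heavy weighting: 0.7 for words that follow, 0.3 for words before
                C[w, tokens[j]] += 0.7 if j &gt; i else 0.3
    return C

def ppmi(C):
    total = C.sum()
    p_xy = C / total
    p_x = C.sum(axis=1, keepdims=True) / total
    p_y = C.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.where(p_xy &gt; 0, np.log(p_xy / (p_x * p_y)), 0.0)
    return np.maximum(pmi, 0.0)              # clamp negatives to zero: PPMI

M = ppmi(cooccurrence(corpus))
U, S, _ = svds(csr_matrix(M), k=DIM)         # truncated SVD of the PPMI matrix
embeddings = U * np.sqrt(S)                  # one dense 64-d vector per word
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;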




&lt;h2&gt;
  
  
  What Pure Math Gets Right: Meaning
&lt;/h2&gt;

&lt;p&gt;The embeddings were shockingly good.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Word neighbours:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;king  → heir, regent, throne, prince, emperor, pope
queen → princess, duchess, sophia, isabella, catherine
music → indie, pop, hop, jazz, rap, songs, dance
river → lake, creek, valley, upstream, canyon
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Analogies (on real data, 80M tokens):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;king:man :: queen:? → woman ✓
man:woman :: boy:?  → girl ✓
france:paris :: japan:? → tokyo ✓  
king:queen :: prince:? → princess ✓
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;5 out of 8 exact matches at rank 1. 7 out of 8 in the top 5.&lt;/p&gt;

&lt;p&gt;This isn't a toy result. The SVD embeddings understand that king-queen has the same relationship as prince-princess. They understand that France-Paris maps to Japan-Tokyo. All from counting word co-occurrences in Wikipedia, factorising the matrix, and computing cosine similarities.&lt;/p&gt;
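
&lt;p&gt;The analogy test itself is a few lines of vector arithmetic. A sketch, assuming the &lt;code&gt;embeddings&lt;/code&gt; matrix from above plus hypothetical &lt;code&gt;vocab&lt;/code&gt; (word → index) and &lt;code&gt;ivocab&lt;/code&gt; (index → word) lookups:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

def analogy(a, b, c, embeddings, vocab, ivocab, topk=5):
    # a:b :: c:?   e.g. analogy("king", "man", "queen", ...) should rank "woman" first
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    target = E[vocab[b]] - E[vocab[a]] + E[vocab[c]]
    sims = E @ (target / np.linalg.norm(target))   # cosine similarity to every word
    for w in (a, b, c):                            # exclude the query words themselves
        sims[vocab[w]] = -np.inf
    return [ivocab[i] for i in np.argsort(-sims)[:topk]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;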

&lt;p&gt;&lt;strong&gt;This validates Levy &amp;amp; Goldberg (2014)&lt;/strong&gt; — Word2Vec is implicitly factorising a PMI co-occurrence matrix. We just did it explicitly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where It Breaks: Generation
&lt;/h2&gt;

&lt;p&gt;Meaning was solved. Now I tried to generate text. Seed the system with "the king and queen" and predict the next word, then the next, and so on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 1: Semantic only (cosine similarity to context)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;the king and queen → isabella sophia catherine isabella sophia 
catherine isabella sophia catherine...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pure synonym loop. The most similar word to "queen" is "isabella". The most similar word to "isabella" is "sophia". Then back to "catherine". Forever.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 2: Bigram grammar only&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Grammar knew that "queen" is often followed by "anne", and "anne" is followed by "elizabeth". But it produced generic Wikipedia filler with no topic awareness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 3: Two-stage (the breakthrough)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This was the key idea:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Semantic filter:&lt;/strong&gt; Find the 20 words most similar to the current context (cosine similarity of SVD embeddings)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grammar rerank:&lt;/strong&gt; Among those 20, score each by bigram probability — how often does it actually follow the last word in real text?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Combine:&lt;/strong&gt; &lt;code&gt;final = 0.7 × grammar + 0.3 × semantic&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No repeat:&lt;/strong&gt; Block every word that's been used before&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Semantic proposes. Grammar disposes. No-repeat forces forward motion.&lt;/p&gt;
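
&lt;p&gt;Condensed to code, the two-stage picker looks roughly like this. A sketch under the same assumptions: &lt;code&gt;E&lt;/code&gt; is the L2-normalised embedding matrix, &lt;code&gt;bigram&lt;/code&gt; a row-stochastic transition matrix; names are illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

def next_word(context_ids, used, E, bigram, k=20):
    # E: L2-normalised embeddings; bigram[i, j] = P(j | i) from raw counts
    ctx = E[context_ids].mean(axis=0)
    sem = E @ (ctx / np.linalg.norm(ctx))          # 1) semantic filter
    cands = np.argsort(-sem)[:k]                   #    top-20 most similar words
    gram = bigram[context_ids[-1], cands]          # 2) grammar rerank: P(next | last)
    score = 0.7 * gram + 0.3 * sem[cands]          # 3) combine
    score[np.isin(cands, list(used))] = -np.inf    # 4) no repeat
    return cands[np.argmax(score)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;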

&lt;p&gt;&lt;strong&gt;The result:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;the war → guerrilla forces fighting troops advancing germans 
italians retreated retreat battle captured ottoman army soldiers 
surrendered marched garrison surrender siege reinforcements 
assault force deployed units corps cavalry division th infantry 
regiment rd battalion nd brigade headquarters unit commanding 
anzac divisional artillery
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's a &lt;strong&gt;coherent military narrative&lt;/strong&gt; — from guerrilla warfare through retreat, surrender, siege, to specific military units and hierarchy. Every transition makes bigram sense. The semantic filter keeps it on-topic. No-repeat pushes it forward.&lt;/p&gt;

&lt;p&gt;More outputs from the same system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;the school was built → constructed building construction block 
tower walls towers arches columns carved wooden stone wall arch 
roof tiles marble floors panels decorated brick exterior 
decoration decorative sculptures paintings depicting figures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Architecture → materials → decoration → art. A visual journey through a building.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;she won the award → winning medal awarded prize award recipient 
honorary academy graduate school student faculty students 
enrolled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Awards → academia → enrollment. A career trajectory.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 15 Versions That Followed
&lt;/h2&gt;

&lt;p&gt;The two-stage system generated impressive topic walks but not sentences. So I spent the next 15 versions trying to fix it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v3.2 — Union pools.&lt;/strong&gt; Instead of only semantic candidates, I combined semantic top-20 + grammar top-20 into a pool of ~40 candidates. Grammar words like "was", "of", "the" could now compete. &lt;strong&gt;Result:&lt;/strong&gt; grammar words dominated after a few steps. Every seed converged to &lt;code&gt;"...of his own right to be used as well known..."&lt;/code&gt; — the same generic Wikipedia filler.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v3.3 — Dual memory.&lt;/strong&gt; Semantic context tracked only content words (skipping grammar picks). Grammar context used the full sentence. &lt;strong&gt;Result:&lt;/strong&gt; semantic stayed on topic but grammar picks were random glue words. Content and structure weren't coordinated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v3.4 — Forced alternation (SEM/GRAM/SEM/GRAM).&lt;/strong&gt; Forced the system to alternate between semantic and grammar picks. &lt;strong&gt;Result:&lt;/strong&gt; the most readable output yet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;the war → victorious IN surrender OF surrendered TO seized BY 
besieged AND captured ON
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Grammar words (of, in, by, to, and) appeared as glue between content words. Almost readable — but the grammar words weren't chosen &lt;em&gt;for&lt;/em&gt; the content words. "Of" appeared because it has a high bigram score after almost anything, not because the sentence needed it there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v3.5 — Trigram grammar.&lt;/strong&gt; Built a trigram dictionary from the corpus (5.9 million unique contexts). Trigrams captured real phrases that bigrams couldn't:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dining → hall
shopping → centre  
tourist → attraction
nobel → peace
honorary → degree
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are genuine multi-word expressions. The bigram only saw "dining → room" or "dining → area". The trigram saw "dining hall" as a unit. But trigram sparsity meant frequent fallback to bigram.&lt;/p&gt;
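
&lt;p&gt;Building that trigram table is, again, just counting. A sketch (a dictionary of counters, over the same token-ID corpus as before):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter, defaultdict

trigram = defaultdict(Counter)
for tokens in corpus:
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        trigram[(w1, w2)][w3] += 1          # context = the last two words

def trigram_prob(w1, w2, w3):
    ctx = trigram.get((w1, w2))
    if not ctx:
        return None                         # sparsity: caller falls back to the bigram model
    return ctx[w3] / sum(ctx.values())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;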

&lt;p&gt;&lt;strong&gt;v3.7 — 4-gram.&lt;/strong&gt; Even sparser, rarely fired, fell back to trigram → bigram. Marginal improvement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v3.8 — Fuzzy n-grams.&lt;/strong&gt; The most creative attempt. Instead of exact trigram lookup, find &lt;em&gt;similar&lt;/em&gt; contexts via embedding cosine similarity. "emperor empress" could borrow predictions from "king queen" because their embeddings are close. &lt;strong&gt;Result:&lt;/strong&gt; the fuzzy matching was too loose — it matched contexts that sounded similar but had completely different meanings. Pulled in noise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v3.9 — Union pools + fuzzy trigram.&lt;/strong&gt; Combined everything. Same gravity-well problem — converged to generic filler after ~8 steps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v4.0 — Alpha sweep.&lt;/strong&gt; Tested grammar weights from 0.3 to 0.9 across 10 seeds. Different seeds needed different alpha values. No single alpha worked universally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v4.1 — MMR soft diversity.&lt;/strong&gt; Instead of hard-blocking used words, computed max cosine similarity to all previously used word embeddings as a penalty. &lt;code&gt;final = relevance - λ × redundancy&lt;/code&gt;. λ=0.4 forced exploration of adjacent semantic regions. "the war" at λ=0.4 traced history across civilizations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;guerrilla forces fighting retreat battle army troops captured 
turkish soldiers surrendered italians germans retreated 
outnumbered defenders withdrew exhausted armies marched siege 
ottoman turks byzantine empire conquered egypt syria lebanon 
palestine israel occupation vietnam cambodia independence
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From guerrilla warfare → Ottoman Empire → Byzantine Empire → Egypt/Syria → Israel/Palestine → Vietnam/Cambodia → independence. A walk through centuries of military history, forced by diversity to keep exploring.&lt;/p&gt;
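
&lt;p&gt;The MMR scoring itself is essentially one line of NumPy. A sketch, reusing the normalised embedding matrix:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

def mmr(relevance, cand_ids, used_ids, E, lam=0.4):
    # relevance: base score per candidate; E: L2-normalised embeddings
    if not used_ids:
        return relevance
    redundancy = (E[cand_ids] @ E[used_ids].T).max(axis=1)  # max cosine sim to any used word
    return relevance - lam * redundancy     # soft penalty instead of a hard block
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;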




&lt;h2&gt;
  
  
  The Scorecard
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Toy (213 words)&lt;/th&gt;
&lt;th&gt;100k tokens&lt;/th&gt;
&lt;th&gt;80M tokens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Similarity separation&lt;/td&gt;
&lt;td&gt;0.93&lt;/td&gt;
&lt;td&gt;0.21&lt;/td&gt;
&lt;td&gt;0.87&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Analogies @5&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;33%&lt;/td&gt;
&lt;td&gt;70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NTP (token accuracy)&lt;/td&gt;
&lt;td&gt;41%&lt;/td&gt;
&lt;td&gt;2.7%&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generation&lt;/td&gt;
&lt;td&gt;Semantic chains&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Topic walks, no sentences&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;PPMI + SVD solves meaning. Bigrams solve local transitions. Together they generate coherent topic walks. But they cannot generate grammatical sentences.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why It Can't Generate Sentences
&lt;/h2&gt;

&lt;p&gt;After 15 versions, the diagnosis is clear. Every fix solved one problem and created another:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What we tried&lt;/th&gt;
&lt;th&gt;What it fixed&lt;/th&gt;
&lt;th&gt;What it broke&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SVD semantic only&lt;/td&gt;
&lt;td&gt;Meaning&lt;/td&gt;
&lt;td&gt;Loops, no grammar&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ Bigram grammar&lt;/td&gt;
&lt;td&gt;Basic transitions&lt;/td&gt;
&lt;td&gt;Generic glue chains&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ No repeat&lt;/td&gt;
&lt;td&gt;No loops&lt;/td&gt;
&lt;td&gt;Exhausts topic words&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ Dual pool&lt;/td&gt;
&lt;td&gt;Grammar words appear&lt;/td&gt;
&lt;td&gt;Grammar dominates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ Dual memory&lt;/td&gt;
&lt;td&gt;Topic stays alive&lt;/td&gt;
&lt;td&gt;Grammar picks random glue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ Alternation&lt;/td&gt;
&lt;td&gt;Content+glue pattern&lt;/td&gt;
&lt;td&gt;No coordination&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ Trigram&lt;/td&gt;
&lt;td&gt;Real phrases&lt;/td&gt;
&lt;td&gt;Sparsity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ Fuzzy n-gram&lt;/td&gt;
&lt;td&gt;Generalisation&lt;/td&gt;
&lt;td&gt;Too loose, noise&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+ MMR diversity&lt;/td&gt;
&lt;td&gt;Explores new regions&lt;/td&gt;
&lt;td&gt;Still no sentences&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The missing piece is always the same: position-dependent context tracking.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After "the king ruled the", our system needs to know "we need a noun here — specifically an object of 'ruled'." But:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Semantic scoring only knows "what word is RELATED to the recent context" — it doesn't know about syntactic roles.&lt;/li&gt;
&lt;li&gt;Grammar scoring only knows "what word commonly FOLLOWS the last word" — &lt;code&gt;P(next | kingdom)&lt;/code&gt; doesn't know we're in the object position of "ruled".&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A transformer solves this with attention over the full sequence. At position 5, it can look back at position 2 ("ruled") and learn that "ruled the ___" needs a noun object. Our system can only look at the last 1-4 words, and it can't learn positional patterns because there's no learning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Static embeddings give every word one fixed vector regardless of context.&lt;/strong&gt; "King" after "the" (needs a verb next) has the same vector as "king" after "became" (needs a determiner). Dynamic, context-dependent representations require attention — and attention requires training.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Proves
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Levy &amp;amp; Goldberg (2014)&lt;/strong&gt; proved that Word2Vec implicitly factorises a PMI matrix. &lt;strong&gt;Zhao et al. (2025)&lt;/strong&gt; proved that next-token prediction training converges to SVD factors of co-occurrence structure.&lt;/p&gt;

&lt;p&gt;Our experiments confirm both from the other direction: we built the SVD factorisation explicitly and got embeddings that rival Word2Vec quality. But we also proved WHERE that equivalence breaks down — at generation.&lt;/p&gt;

&lt;p&gt;Transformers aren't doing something fundamentally different from SVD for &lt;em&gt;meaning&lt;/em&gt;. But they add the crucial missing piece: &lt;strong&gt;positional, context-dependent reweighting&lt;/strong&gt; of those factors at every step.&lt;/p&gt;

&lt;p&gt;The map of meaning can be built with pure math. The navigator through that map requires learning.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Language:&lt;/strong&gt; Python&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core:&lt;/strong&gt; NumPy, SciPy (sparse SVD)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU acceleration:&lt;/strong&gt; CuPy (&lt;code&gt;cupyx.scatter_add&lt;/code&gt; for co-occurrence matrix building on CUDA)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data:&lt;/strong&gt; WikiText-103 via HuggingFace datasets (80M tokens)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hardware:&lt;/strong&gt; Kaggle T4 GPU&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training:&lt;/strong&gt; Zero. None. Not a single gradient step.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What I'd Build Next
&lt;/h2&gt;

&lt;p&gt;This isn't a dead end — it's a foundation. The experiments point to several directions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Retrieval instead of generation.&lt;/strong&gt; The embeddings are excellent for finding relevant content. Instead of generating word-by-word, use the SVD vectors to RETRIEVE real sentences from the corpus that match the semantic context (see the sketch after this list). That's what vector databases are actually for.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hybrid systems.&lt;/strong&gt; Use the pure-math embeddings as a pre-computed semantic layer, then a small trained model (even a simple RNN) just for the sequential state tracking. The heavy lifting of meaning is already done.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Educational tool.&lt;/strong&gt; This entire pipeline is transparent — every number is interpretable. No black boxes. Perfect for teaching how language models work from first principles.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
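
&lt;p&gt;The retrieval idea from point 1 falls straight out of the existing vectors. A minimal sketch, assuming each corpus sentence is pre-embedded as the L2-normalised mean of its word vectors (names hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

def retrieve(query_ids, sentence_vecs, sentences, E, topk=3):
    # sentence_vecs: one L2-normalised mean word vector per corpus sentence
    q = E[query_ids].mean(axis=0)
    q /= np.linalg.norm(q)
    scores = sentence_vecs @ q                     # cosine similarity to every sentence
    return [sentences[i] for i in np.argsort(-scores)[:topk]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;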




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;All you need is NumPy, SciPy, and WikiText-103. Build a co-occurrence matrix, apply PPMI, run SVD, add a bigram grammar matrix. Two matrices. Two stages. No training. Just math.&lt;/p&gt;

&lt;p&gt;And now you know exactly where the math stops and the learning begins.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This research was conducted as an independent exploration. Thanks to &lt;a href="https://arxiv.org/abs/1405.4053" rel="noopener noreferrer"&gt;Levy &amp;amp; Goldberg, 2014&lt;/a&gt; for the theoretical foundation and to Zhao et al. (2025) for extending the connection to next-token prediction.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>openai</category>
      <category>gpt3</category>
    </item>
    <item>
      <title>Claude Code Just Got a Desktop Redesign — Here's What Changed!</title>
      <dc:creator>Shivnath Tathe</dc:creator>
      <pubDate>Wed, 15 Apr 2026 07:43:34 +0000</pubDate>
      <link>https://dev.to/shivnathtathe/claude-code-just-got-a-desktop-redesign-heres-what-changed-3jg3</link>
      <guid>https://dev.to/shivnathtathe/claude-code-just-got-a-desktop-redesign-heres-what-changed-3jg3</guid>
      <description>&lt;p&gt;Anthropic dropped a major redesign of the Claude Code desktop app today, and it's not just a visual refresh. The whole thing has been rethought around one idea: &lt;strong&gt;you're not waiting on one task anymore — you're orchestrating many.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you've been using Claude Code seriously, you know the friction. Multiple repos, multiple tasks, alt-tabbing between your terminal and the app, losing context mid-session. This update addresses most of that directly.&lt;/p&gt;

&lt;p&gt;Here's what actually changed.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Sidebar With Multi-Project Session Management
&lt;/h2&gt;

&lt;p&gt;The new sidebar groups all your sessions &lt;strong&gt;by project&lt;/strong&gt;. Every repo you've worked on shows up with its sessions listed underneath — you can jump between them instantly without losing context.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4jynaypfsp0y06qg5e7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4jynaypfsp0y06qg5e7.png" alt="Sidebar showing multiple sessions grouped by project — MCP Server, zero-noise-bench" width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the biggest UX shift. Before, managing multiple parallel workstreams felt clunky. Now it's front and center. Kick off a refactor in one repo, a bug fix in another, and switch between them as results come in.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Integrated Terminal — No More Alt-Tabbing
&lt;/h2&gt;

&lt;p&gt;The terminal is now a &lt;strong&gt;built-in pane&lt;/strong&gt; inside the app itself. You can have your chat session on the left and a live terminal on the right, docked together in one window.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frm5cx38q8vhkqjx8jrzq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frm5cx38q8vhkqjx8jrzq.png" alt="Claude Code session on left with integrated terminal pane on right showing PS C:\Users&amp;gt;" width="800" height="486"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This sounds small, but it's significant in practice. Before, you kept a separate terminal open on the side. Now everything lives in one place — run commands, check output, continue the conversation, all without switching windows.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Side Chat — Ask Without Interrupting the Agent
&lt;/h2&gt;

&lt;p&gt;This is my personal favorite addition. Press &lt;strong&gt;&lt;code&gt;Ctrl + ;&lt;/code&gt;&lt;/strong&gt; (or &lt;code&gt;⌘ + ;&lt;/code&gt; on Mac) during any active session to open a &lt;strong&gt;side chat&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqu1ry2h89adej0ivh67u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqu1ry2h89adej0ivh67u.png" alt="Side chat panel open showing " width="800" height="207"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The key detail — straight from the UI itself:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Claude sees the full session context, and nothing here is added to the main conversation.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So you can ask "wait, what does this function actually do?" or "is this approach correct?" without derailing the agent mid-task. It's context-aware but isolated. Genuinely useful.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Model + Effort Control in One Place
&lt;/h2&gt;

&lt;p&gt;Click the model selector at the bottom right, and you get a clean dropdown with two controls together: &lt;strong&gt;model&lt;/strong&gt; and &lt;strong&gt;effort level&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fniwg0hduquha73o62zp5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fniwg0hduquha73o62zp5.png" alt="Model selector dropdown showing Opus 4.6, Sonnet 4.6 (selected), Haiku 4.5 with Effort: Low / Medium (selected) / High" width="800" height="263"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Effort&lt;/strong&gt; setting (Low / Medium / High) is underrated. Medium is the default sweet spot — fast enough, thorough enough. Switch to High when you need Claude to really think through a complex architectural decision. Switch to Low for quick edits and boilerplate.&lt;/p&gt;

&lt;p&gt;Having model + effort together in one click is clean. No digging through settings.&lt;/p&gt;




&lt;h2&gt;
  
  
  Other Notable Changes
&lt;/h2&gt;

&lt;p&gt;Beyond the four big ones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Drag-and-drop pane layout&lt;/strong&gt; — arrange terminal, preview, diff viewer, and chat in any grid you want&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faster diff viewer&lt;/strong&gt; — rebuilt for large changesets, noticeably snappier&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three view modes&lt;/strong&gt; — Verbose, Normal, Summary — dial from full transparency into Claude's tool calls down to just the results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plugin parity with CLI&lt;/strong&gt; — desktop app now fully supports Claude Code plugins&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSH support on Mac&lt;/strong&gt; — previously Linux only&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expanded preview pane&lt;/strong&gt; — open HTML files and PDFs directly in-app&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;The old Claude Code desktop felt like a wrapper around the CLI. This redesign feels like a proper IDE for agentic work — one where you're managing agents, not just chatting with one.&lt;/p&gt;

&lt;p&gt;The parallel session management, integrated terminal, and side chat together tell a coherent story: &lt;strong&gt;you're the orchestrator now, and the UI should reflect that.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're on Pro, Max, Team, or Enterprise — update and restart. It's worth it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Using Claude Code heavily? Drop your workflow in the comments — curious how others are structuring multi-session work.&lt;/em&gt;&lt;/p&gt;




</description>
      <category>ai</category>
      <category>vibecoding</category>
      <category>claude</category>
      <category>coding</category>
    </item>
    <item>
      <title>I Built an AI Agent That Thinks Out Loud While Using Your APIs—Here's the Non-Obvious Part</title>
      <dc:creator>Shivnath Tathe</dc:creator>
      <pubDate>Tue, 14 Apr 2026 13:24:34 +0000</pubDate>
      <link>https://dev.to/shivnathtathe/i-built-an-ai-agent-that-thinks-out-loud-while-using-your-apis-heres-the-non-obvious-part-2bef</link>
      <guid>https://dev.to/shivnathtathe/i-built-an-ai-agent-that-thinks-out-loud-while-using-your-apis-heres-the-non-obvious-part-2bef</guid>
      <description>&lt;p&gt;&lt;em&gt;How MCP + Claude turned a boring hotel search into a reasoning machine&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;You've probably seen the demos. An AI assistant that calls tools, fetches data, and returns an answer. Clean. Impressive. But here's what those demos don't show you — the part where the model &lt;strong&gt;thinks between tool calls&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That's what I actually built. And the bugs I hit along the way taught me more about agentic AI than any tutorial ever could.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is MCP and Why Should You Care
&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol (MCP) is Anthropic's open standard for connecting AI models to external tools and data sources. Think of it as a USB-C port for AI—one protocol, any tool.&lt;/p&gt;

&lt;p&gt;Instead of hardcoding tool logic into your LLM app, you expose tools via an MCP server. The model discovers them, decides when to use them, calls them, and reasons about the results. Your backend stays clean. Your AI gets smarter.&lt;/p&gt;
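
&lt;p&gt;In code, exposing a tool through the official MCP Python SDK looks roughly like this. A hedged sketch: the tool body and the &lt;code&gt;lookup_city_code&lt;/code&gt; helper are hypothetical, and transport names can vary across SDK versions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from mcp.server.fastmcp import FastMCP

mcp = FastMCP("travel")

@mcp.tool()
def city_lookup(city_name: str) -&gt; str:
    """Resolve a city name to the travel API's city code."""
    return lookup_city_code(city_name)   # hypothetical helper wrapping the REST API

if __name__ == "__main__":
    # streamable-HTTP transport, as described later in this post
    mcp.run(transport="streamable-http")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;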

&lt;p&gt;I built an MCP server on top of a corporate travel API—hotels, weather, city lookups—and connected it to a Claude-powered chat interface. What came out the other side surprised me.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Part Nobody Talks About—Agentic Loops
&lt;/h2&gt;

&lt;p&gt;Most tutorials show you a single tool call. User asks, model calls tool, model answers. Done.&lt;/p&gt;

&lt;p&gt;Real agentic behavior is messier and far more interesting.&lt;/p&gt;

&lt;p&gt;Here is what actually happens when a user asks, &lt;em&gt;"Is Mumbai good to visit this month with the best hotels?"&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User:    Is Mumbai good to visit this month? Find best hotels too.

Claude:  I'll help plan your trip. Let me first look up Mumbai's city code,
         then fetch the weather and hotel listings.

[calls city_lookup -&amp;gt; gets city code MUI]

Claude:  Found Mumbai with code MUI. Weather looks great — let me pull hotels now.

[calls get_weather_forecast -&amp;gt; 5 day forecast]
[calls get_hotels -&amp;gt; 20 luxury properties]

Claude:  Perfect. Here's your complete trip summary...
         [full structured answer with weather table + hotel list]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model &lt;strong&gt;narrates what it is about to do&lt;/strong&gt;, executes the tool, &lt;strong&gt;interprets the result&lt;/strong&gt;, decides if it needs more data, and continues. That is not a chatbot. That is a reasoning agent.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bug That Took Half a Day to Find
&lt;/h2&gt;

&lt;p&gt;When I first implemented streaming, tool calls were silently disappearing. The model would narrate perfectly—"Let me search for hotels..."—then stop. No tool call. No error. Just silence.&lt;/p&gt;

&lt;p&gt;The culprit was buried in the Anthropic Python SDK.&lt;/p&gt;

&lt;p&gt;When a tool-use block completes during streaming, the SDK emits a &lt;code&gt;ParsedContentBlockStopEvent&lt;/code&gt;. My code was only handling &lt;code&gt;RawContentBlockStopEvent&lt;/code&gt;. Two different event types, one string comparison apart. Half a day gone.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# What I expected
&lt;/span&gt;&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;etype&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RawContentBlockStopEvent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# handle tool completion
&lt;/span&gt;
&lt;span class="c1"&gt;# What the SDK actually sends for tool_use blocks
&lt;/span&gt;&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;etype&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ParsedContentBlockStopEvent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content_block&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# THIS is where the tool data lives
&lt;/span&gt;        &lt;span class="n"&gt;tool_block&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fix was five lines. The discovery was everything.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture That Actually Works
&lt;/h2&gt;

&lt;p&gt;Here is the full stack I landed on after a lot of iteration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Frontend (React + Vite)
    │  SSE stream
Backend (FastAPI)
    │  Anthropic API (claude-sonnet-4-5)
    │  MCP client
MCP Server (FastAPI + MCP SDK)
    │  REST APIs (Hotels, Weather, City Lookup)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key design decision—the backend runs a &lt;strong&gt;multi-turn agentic loop&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;iteration&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MAX_ITERATIONS&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream_agentic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text_delta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nf"&gt;to_frontend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;   &lt;span class="c1"&gt;# stream narration live
&lt;/span&gt;        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;tool_use_events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;stop_reason&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stop_reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;stop_reason&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;                                  &lt;span class="c1"&gt;# model is done
&lt;/span&gt;
    &lt;span class="c1"&gt;# execute tools, feed results back, loop again
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content_blocks&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_results&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every iteration, narration streams to the frontend in real time. Tools execute server-side. Results feed back into the next iteration. The loop continues until the model says it is done.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Frontend Problem Nobody Mentions
&lt;/h2&gt;

&lt;p&gt;Streaming an agentic response to a UI is harder than it looks.&lt;/p&gt;

&lt;p&gt;The naive approach — separate arrays for tool calls and text — breaks immediately. When the model narrates, calls a tool, narrates again, calls another tool, then gives a final answer, you need the UI to reflect that &lt;strong&gt;exact sequence&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I refactored the Turn data model from this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Bad -- loses ordering information&lt;/span&gt;
&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;Turn&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;toolCalls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ToolCall&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;       &lt;span class="c1"&gt;// all tools, no position&lt;/span&gt;
  &lt;span class="nx"&gt;assistantMessage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;    &lt;span class="c1"&gt;// all text, no position&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Good -- preserves arrival order&lt;/span&gt;
&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;ContentBlock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tool&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ToolCall&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;Turn&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;blocks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ContentBlock&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;      &lt;span class="c1"&gt;// interleaved, in order&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the UI renders exactly what happened—narration, tool call, more narration, another tool call, and final answer—in the sequence the model produced.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conversation Memory Across Tool Calls
&lt;/h2&gt;

&lt;p&gt;Here is the subtle bug that broke follow-up questions.&lt;/p&gt;

&lt;p&gt;After a full agentic turn I was saving history like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Wrong -- loses all tool data
&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;append_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;append_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response_text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So when a user asked &lt;em&gt;"Give me those hotels in a table"&lt;/em&gt;, Claude had no idea what hotels they meant. The formatted summary text was there but the actual JSON tool results were gone.&lt;/p&gt;

&lt;p&gt;The fix—save the &lt;strong&gt;full message array,&lt;/strong&gt; including every tool use block and tool result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Right -- full context preserved
&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;set_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# messages contains: user -&amp;gt; assistant+tool_use -&amp;gt; tool_result -&amp;gt; assistant -&amp;gt; ...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now Claude can answer follow-up questions using the actual data from previous tool calls. No re-fetching, no guessing.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Surprised Me Most
&lt;/h2&gt;

&lt;p&gt;The model is a better orchestrator than I expected. Given a vague question like &lt;em&gt;"Is Mumbai good to visit?"&lt;/em&gt;, it independently decides to look up the city code first, then the weather, then hotels—in the right order, with the right parameters—without being told the sequence.&lt;/p&gt;

&lt;p&gt;That is not prompt engineering. That is the model reasoning about what it needs and going to get it.&lt;/p&gt;

&lt;p&gt;MCP makes this possible because the tools are described, not hardcoded. The model reads the tool definitions and figures out the rest.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The MCP server exposes tools over HTTP using the streamable-HTTP transport—no websockets, no special infrastructure. Any HTTP client can call it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Your MCP server is just a FastAPI app&lt;/span&gt;
uvicorn main:app &lt;span class="nt"&gt;--host&lt;/span&gt; 0.0.0.0 &lt;span class="nt"&gt;--port&lt;/span&gt; 8000

&lt;span class="c"&gt;# Claude Desktop connects via config&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"mcpServers"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"my-server"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="s2"&gt;"url"&lt;/span&gt;: &lt;span class="s2"&gt;"http://localhost:8000/mcp"&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Where This Goes Next
&lt;/h2&gt;

&lt;p&gt;A few things I am actively working on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Precise vs Expressive mode&lt;/strong&gt;—user-configurable system prompts so the model gives short direct answers or detailed, friendly responses depending on context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;City code disambiguation&lt;/strong&gt;—when a user says "Dubai," the model should confirm which Dubai before searching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smooth streaming&lt;/strong&gt;—React 18 automatic batching creates interesting problems when you want character-level typewriter effects on SSE streams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MCP is still early. The tooling is moving fast. But the core idea—give the model well-described tools and let it reason about when and how to use them—is already working in production.&lt;/p&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Shivnath Tathe&lt;/strong&gt;—Software Engineer and Independent Researcher working at the intersection of LLMs and production systems. Published work on 4-bit quantized neural network training and continual learning on arXiv.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Profile&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LinkedIn&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.linkedin.com/in/shivnathtathe/" rel="noopener noreferrer"&gt;shivnathtathe&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;X&lt;/td&gt;
&lt;td&gt;&lt;a href="https://x.com/shivtathe" rel="noopener noreferrer"&gt;@shivtathe&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;If you are building something with MCP or agentic AI, I would love to connect.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>agents</category>
      <category>programming</category>
    </item>
    <item>
      <title>Run a 397B AI Model for Free Using Claude Code (3 Commands)</title>
      <dc:creator>Shivnath Tathe</dc:creator>
      <pubDate>Mon, 06 Apr 2026 12:18:10 +0000</pubDate>
      <link>https://dev.to/shivnathtathe/run-a-397b-ai-model-for-free-using-claude-code-3-commands-1lpj</link>
      <guid>https://dev.to/shivnathtathe/run-a-397b-ai-model-for-free-using-claude-code-3-commands-1lpj</guid>
      <description>&lt;p&gt;Most people think you need expensive APIs or a powerful GPU to run large AI models.&lt;/p&gt;

&lt;p&gt;You don't. Here's how I ran a 397B parameter model for free in under 5 minutes on Windows.&lt;/p&gt;




&lt;h2&gt;
  
  
  What You Actually Need
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A Windows machine&lt;/li&gt;
&lt;li&gt;An Ollama account (free)&lt;/li&gt;
&lt;li&gt;Internet connection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. No GPU. No API key. No Anthropic billing.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's happening under the hood
&lt;/h2&gt;

&lt;p&gt;Two things working together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code CLI&lt;/strong&gt; is just the terminal interface and agent shell. Nothing goes through Anthropic's servers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; hosts and runs Qwen3.5 397B on their own cloud infrastructure, completely free&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Claude Code supports Ollama as a backend. So you get a familiar, powerful agent interface while Ollama handles 100% of the actual inference. Anthropic is not involved beyond providing the CLI shell.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Install Claude Code
&lt;/h2&gt;

&lt;p&gt;Open PowerShell and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;irm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;https://claude.ai/install.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;iex&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 2: Install Ollama
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;irm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;https://ollama.com/install.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;iex&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 3: Launch Claude Code with the 397B model
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;launch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;claude&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--model&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;qwen3.5:397b-cloud&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll be asked to verify with your Ollama account once. After that it just works.&lt;/p&gt;




&lt;h2&gt;
  
  
  What you get
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Full Claude Code agent interface&lt;/li&gt;
&lt;li&gt;397B parameter model (Qwen3.5)&lt;/li&gt;
&lt;li&gt;~256K token context window&lt;/li&gt;
&lt;li&gt;Cloud hosted, so no local GPU needed&lt;/li&gt;
&lt;li&gt;Free as long as you have an Ollama account&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Important notes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;This routes through Ollama's cloud, so you need internet&lt;/li&gt;
&lt;li&gt;Don't use it for sensitive or private data&lt;/li&gt;
&lt;li&gt;Free access may change in the future, so use it while it lasts&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;We're entering a phase where the agent layer and the model layer are completely separate.&lt;/p&gt;

&lt;p&gt;Claude Code is just the interface. The model underneath can be swapped. Local or cloud, open or closed, free or paid.&lt;/p&gt;

&lt;p&gt;This setup is a simple example of that shift. The barrier to running frontier-scale models is no longer hardware or money. It's just knowing how to connect the right tools.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;Three commands. Less than 5 minutes. No credit card.&lt;/p&gt;

&lt;p&gt;If you run into issues drop them in the comments.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I research LLM training and continual learning. Follow for more no-fluff AI content.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>mcp</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Your AI Is Not Thinking. It's Multiplying Numbers. Let Me Show You Exactly How.</title>
      <dc:creator>Shivnath Tathe</dc:creator>
      <pubDate>Mon, 06 Apr 2026 06:25:24 +0000</pubDate>
      <link>https://dev.to/shivnathtathe/your-ai-is-not-thinking-its-multiplying-numbers-let-me-show-you-exactly-how-d1g</link>
      <guid>https://dev.to/shivnathtathe/your-ai-is-not-thinking-its-multiplying-numbers-let-me-show-you-exactly-how-d1g</guid>
      <description>&lt;p&gt;&lt;em&gt;Everyone's talking about AI like it's magic. I work with it daily. It's not. Here's what's actually happening inside.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I've fine-tuned LLMs. I've published research on them. I've built systems around them.&lt;/p&gt;

&lt;p&gt;And the single most honest thing I can tell you about large language models is this:&lt;/p&gt;

&lt;p&gt;At the bottom, it's matrix multiplication. That's it.&lt;/p&gt;

&lt;p&gt;Not intelligence. Not reasoning. Not understanding. Matrices of floating point numbers being multiplied together, billions of times per second.&lt;/p&gt;

&lt;p&gt;But here's the uncomfortable part. That doesn't mean nothing interesting is happening.&lt;/p&gt;

&lt;p&gt;Let me break this down without the hype, without the doomsaying, and without the marketing.&lt;/p&gt;




&lt;h2&gt;
  
  
  What a "Model" Actually Is
&lt;/h2&gt;

&lt;p&gt;Forget the word "model." It carries too much baggage.&lt;/p&gt;

&lt;p&gt;What you're actually dealing with is a file. A very large file full of numbers, floats arranged in matrices. GPT-2 has 117 million of them. GPT-3 has 175 billion. These numbers are called weights.&lt;/p&gt;

&lt;p&gt;That's the model. Numbers in a file.&lt;/p&gt;

&lt;p&gt;When you send a message to an LLM, here's what happens mechanically:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your text gets converted to tokens (integers from a vocabulary)&lt;/li&gt;
&lt;li&gt;Each token gets looked up in an embedding table (a matrix) and becomes a vector&lt;/li&gt;
&lt;li&gt;That vector passes through N identical blocks, each doing attention (matmul) and feedforward (matmul + nonlinearity)&lt;/li&gt;
&lt;li&gt;Final layer produces logits over the vocabulary&lt;/li&gt;
&lt;li&gt;Sample from that distribution and get the next token&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Repeat until done.&lt;/p&gt;

&lt;p&gt;No memory. No state. No "thinking." Pure function application.&lt;/p&gt;
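
&lt;p&gt;To make that concrete, here's a toy version of those five steps in NumPy. Every weight is random and untrained (all names are mine, made up for illustration), so the output is gibberish, but the mechanics are exactly this and nothing more:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

# Toy forward pass: the five steps above with random weights.
rng = np.random.default_rng(0)
V, d = 50, 16                         # vocab size, embedding dim

E = rng.normal(size=(V, d))           # step 2: embedding table
Wq = rng.normal(size=(d, d))          # attention projections
Wk = rng.normal(size=(d, d))
Wv = rng.normal(size=(d, d))
W1 = rng.normal(size=(d, 4 * d))      # feedforward expand
W2 = rng.normal(size=(4 * d, d))      # feedforward contract
Wout = rng.normal(size=(d, V))        # step 4: logits projection

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def next_token(tokens):
    x = E[tokens]                                   # one vector per token
    att = softmax((x @ Wq) @ (x @ Wk).T / np.sqrt(d))
    x = x + att @ (x @ Wv)                          # step 3: attention
    x = x + np.maximum(0.0, x @ W1) @ W2            # feedforward + ReLU
    logits = x[-1] @ Wout                           # step 4: logits
    return rng.choice(V, p=softmax(logits))         # step 5: sample

tokens = [3, 17, 42]        # step 1: pretend the text is tokenized
print(next_token(tokens))   # a token id; gibberish, weights are random
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;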




&lt;h2&gt;
  
  
  What "Training" Actually Is
&lt;/h2&gt;

&lt;p&gt;This is where people get philosophical, so let me be precise.&lt;/p&gt;

&lt;p&gt;Training is not teaching. There's no curriculum, no explanation, no understanding being transferred.&lt;/p&gt;

&lt;p&gt;Here's the actual process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Show the model some text&lt;/li&gt;
&lt;li&gt;It predicts the next token&lt;/li&gt;
&lt;li&gt;Compare prediction to actual next token, compute loss (a single number)&lt;/li&gt;
&lt;li&gt;Backpropagate gradients through every matrix&lt;/li&gt;
&lt;li&gt;Nudge every weight by a tiny amount in the direction that reduces loss&lt;/li&gt;
&lt;li&gt;Repeat roughly a trillion times&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The only signal the model ever receives is: your probability distribution over the next token was wrong, adjust.&lt;/p&gt;

&lt;p&gt;No grammar lessons. No semantic explanations. No world knowledge explicitly provided. Just: you were wrong, here's by how much, here's which direction to shift.&lt;/p&gt;
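
&lt;p&gt;Here's that loop in miniature: a hypothetical trainable bigram model, one matrix and one update rule. This is my own toy, not how production training is implemented, but the predict-compare-nudge cycle is the same shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

rng = np.random.default_rng(1)
V = 10
W = rng.normal(scale=0.1, size=(V, V))  # the entire "model"
lr = 0.5

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def step(cur, nxt):
    p = softmax(W[cur])          # 2. predict a distribution over next token
    loss = -np.log(p[nxt])       # 3. loss: a single number
    grad = p.copy()
    grad[nxt] -= 1.0             # 4. gradient of cross-entropy w.r.t. logits
    W[cur] -= lr * grad          # 5. nudge weights to reduce loss
    return loss

data = [1, 2, 3, 1, 2, 3, 1, 2, 3]   # 1. a tiny, repetitive "corpus"
for _ in range(200):                 # 6. repeat (a trillion, in practice)
    for cur, nxt in zip(data, data[1:]):
        loss = step(cur, nxt)
print(loss)   # near zero: the pattern now lives in W
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Nobody wrote a rule saying "3 follows 2." The pattern ends up in the weights purely because wrong predictions were penalized.&lt;/p&gt;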

&lt;p&gt;And yet grammar emerges. Semantics emerges. World knowledge emerges.&lt;/p&gt;

&lt;p&gt;That's not magic. That's what happens when you apply a single optimization pressure billions of times across the entire written record of human thought.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Next-Token Prediction Even Works
&lt;/h2&gt;

&lt;p&gt;This is the question nobody asks clearly enough.&lt;/p&gt;

&lt;p&gt;The implicit assumption is that predicting the next word is a shallow task. A parlor trick.&lt;/p&gt;

&lt;p&gt;It's not. Here's why.&lt;/p&gt;

&lt;p&gt;Language is not random. Language is massively structured at every level simultaneously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Surface statistics&lt;/strong&gt;: the two words of "New York" co-occur with near-deterministic frequency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Syntax&lt;/strong&gt;: "The ___" expects a noun or adjective, not a verb. Every time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantics&lt;/strong&gt;: "eat" expects a food object. "drink" expects liquid. Violations sound wrong.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;World knowledge&lt;/strong&gt;: "Paris is the capital of ___" has essentially one answer across millions of documents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discourse&lt;/strong&gt;: A medical article doesn't randomly switch to cooking. Topics are coherent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Causal structure&lt;/strong&gt;: "He dropped the glass. It ___" requires implicit physics, which the model absorbs because text describing the physical world is consistent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To predict the next token accurately across all of these simultaneously, the model is forced to learn all of these regularities. Not because anyone labeled them. Because they're all load-bearing for loss reduction.&lt;/p&gt;

&lt;p&gt;Shannon estimated English has roughly 1 to 1.5 bits of true entropy per character. Language is not a high-entropy signal. It's a highly compressible, deeply structured one.&lt;/p&gt;
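
&lt;p&gt;You can feel that compressibility with a crude measurement. Character-frequency entropy ignores all context, so it lands well above Shannon's figure; context is exactly what squeezes the number down (a quick illustration of mine, not a rigorous estimate):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math
from collections import Counter

# Unigram (per-character, context-free) entropy of a sample string.
text = "the cat sat on the mat and then the cat sat again"
n = len(text)
H = -sum((c / n) * math.log2(c / n) for c in Counter(text).values())
print(round(H, 2), "bits/char")   # roughly 3.5, far above ~1-1.5 with context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;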

&lt;p&gt;Next-token prediction works because language itself is learnable. The model just exploits that ruthlessly.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Weights Actually Encode
&lt;/h2&gt;

&lt;p&gt;Here's the honest answer: nobody fully knows.&lt;/p&gt;

&lt;p&gt;What we do know from interpretability research is directional. Early layers tend to capture surface patterns, later layers tend to capture more abstract task-relevant signals. But it's distributed, messy, and not cleanly decomposable.&lt;/p&gt;

&lt;p&gt;You cannot point to a weight and say "this one handles subject-verb agreement."&lt;/p&gt;

&lt;p&gt;What the weights collectively store is a giant tangled function that behaves as if it knows grammar, semantics, world facts, and reasoning patterns. Because that behavior is what minimizes loss on human-generated text.&lt;/p&gt;

&lt;p&gt;It's not a list of rules. It's a compressed statistical model of human language and thought, discovered purely by gradient pressure.&lt;/p&gt;




&lt;h2&gt;
  
  
  "Just Matrix Multiplication" Is Not the Same as "Trivial"
&lt;/h2&gt;

&lt;p&gt;This is where I'll push back on the reductive take, including my own initial instinct.&lt;/p&gt;

&lt;p&gt;Yes, it's matmul. But DNA is just chemical interactions. The brain is just electrical signals. Both produce systems of staggering complexity.&lt;/p&gt;

&lt;p&gt;The Universal Approximation Theorem tells us that stacked layers with nonlinearities can approximate any continuous function given enough capacity. The architecture isn't doing something magical. But the composition of many simple operations produces a function of extraordinary complexity.&lt;/p&gt;

&lt;p&gt;"Just matmul" at the mechanistic level does not imply "nothing meaningful" at the behavioral level.&lt;/p&gt;

&lt;p&gt;What it does imply is that the meaningful behavior is not designed, it's emergent. And that's a genuinely different and more honest framing than either "it's just statistics" or "it's basically thinking."&lt;/p&gt;




&lt;h2&gt;
  
  
  The Transformer Is Not The Only Answer
&lt;/h2&gt;

&lt;p&gt;Here's something the hype cycle obscures: the transformer architecture is not special in principle. It's dominant in practice.&lt;/p&gt;

&lt;p&gt;What you actually need to exploit language regularities:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A way to consume sequential input&lt;/li&gt;
&lt;li&gt;A way to condition predictions on context&lt;/li&gt;
&lt;li&gt;Enough capacity to store learned regularities&lt;/li&gt;
&lt;li&gt;A next-token prediction objective on enough data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;RNNs satisfied all four. So did CNNs on text. So do State Space Models like Mamba, which use no attention at all, run in O(n) instead of O(n²), and are competitive with transformers on many benchmarks today.&lt;/p&gt;

&lt;p&gt;The transformer won because attention handles long-range context without compression, and because it parallelizes perfectly on GPUs. The hardware fit mattered as much as the architecture itself.&lt;/p&gt;
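
&lt;p&gt;The quadratic cost is easy to underestimate, so here's the raw arithmetic: the attention score matrix for a single head, in float32. Real systems avoid materializing it with tricks like FlashAttention, but the quadratic compute remains:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# n tokens attending to n tokens: n*n scores, 4 bytes each.
for n in (10_000, 100_000, 1_000_000):
    print(f"{n:,} tokens: {n * n * 4 / 1e9:,.1f} GB per score matrix")
# 10,000 tokens: 0.4 GB
# 100,000 tokens: 40.0 GB
# 1,000,000 tokens: 4,000.0 GB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;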

&lt;p&gt;Mamba is a serious contender. Hybrid architectures mixing SSM and attention layers are emerging. Five years from now, the dominant architecture is probably neither pure transformer nor pure SSM.&lt;/p&gt;

&lt;p&gt;And honestly? It's probably something nobody is currently working on.&lt;/p&gt;




&lt;h2&gt;
  
  
  What We Still Don't Know
&lt;/h2&gt;

&lt;p&gt;I want to be precise here because this matters.&lt;/p&gt;

&lt;p&gt;We don't know why scaling works.&lt;/p&gt;

&lt;p&gt;We know that scaling works. More parameters plus more data plus more compute equals better performance, with surprising consistency. But the mechanism by which scaling produces emergent capabilities (tasks the model couldn't do at smaller scale suddenly working at larger scale) is genuinely unresolved.&lt;/p&gt;

&lt;p&gt;"Combinatorial pattern composition" is a label for the mystery, not an explanation of it.&lt;/p&gt;

&lt;p&gt;We built something that behaves intelligently. We know the training procedure in full detail. We do not know why that procedure, at scale, produces the behavior it does.&lt;/p&gt;

&lt;p&gt;That's not a gap that marketing will fill. That's an open research problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Claim&lt;/th&gt;
&lt;th&gt;Truth&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;It's just matrix multiplication&lt;/td&gt;
&lt;td&gt;True, mechanistically&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nothing meaningful is happening&lt;/td&gt;
&lt;td&gt;False&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;It understands language&lt;/td&gt;
&lt;td&gt;False&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;It learns statistical structure&lt;/td&gt;
&lt;td&gt;True&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;We fully understand why it works&lt;/td&gt;
&lt;td&gt;False&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;The transformer is the final architecture&lt;/td&gt;
&lt;td&gt;Almost certainly false&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Why This Framing Matters
&lt;/h2&gt;

&lt;p&gt;If you think LLMs are magic, you'll use them wrong. You'll trust outputs that are confident but wrong, anthropomorphize failure modes, and expect capabilities that aren't there.&lt;/p&gt;

&lt;p&gt;If you think LLMs are "just statistics" and therefore uninteresting, you'll also use them wrong. You'll dismiss genuine capabilities and fail to understand where they're reliable versus brittle.&lt;/p&gt;

&lt;p&gt;The accurate framing is: a large function approximator trained via optimization that exhibits emergent structured behavior, whose full mechanism we don't yet understand.&lt;/p&gt;

&lt;p&gt;Not magic. Not trivial. Something genuinely new that deserves precise thinking.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I research LLM training, continual learning, and quantization. If this sparked something, let's discuss in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>agents</category>
      <category>llm</category>
    </item>
    <item>
      <title>Attention Is All You Need — Explained Like You’re Building It From Scratch</title>
      <dc:creator>Shivnath Tathe</dc:creator>
      <pubDate>Thu, 02 Apr 2026 05:32:13 +0000</pubDate>
      <link>https://dev.to/shivnathtathe/attention-is-all-you-need-explained-like-youre-building-it-from-scratch-1o00</link>
      <guid>https://dev.to/shivnathtathe/attention-is-all-you-need-explained-like-youre-building-it-from-scratch-1o00</guid>
      <description>&lt;p&gt;Everyone has seen this diagram.&lt;/p&gt;

&lt;p&gt;And almost everyone says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Transformers use attention instead of recurrence.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But that doesn’t actually explain anything.&lt;/p&gt;

&lt;p&gt;So let’s rebuild this from scratch — the way you would understand it if you were designing it yourself.&lt;/p&gt;

&lt;h1&gt;
  
  
  🧠 The Real Problem Transformers Solve
&lt;/h1&gt;

&lt;p&gt;Before transformers, models like RNNs and LSTMs processed text like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;One word at a time → sequentially&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This caused two big problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Slow training (no parallelism)&lt;/li&gt;
&lt;li&gt;Long-range dependencies break&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The cat, which was sitting near the window for hours, suddenly jumped.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To understand &lt;em&gt;“jumped”&lt;/em&gt;, the model needs context from far back.&lt;/p&gt;

&lt;p&gt;RNNs struggle here.&lt;/p&gt;

&lt;h1&gt;
  
  
  ⚡ The Core Idea
&lt;/h1&gt;

&lt;p&gt;Instead of processing words one by one:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What if every word could look at every other word &lt;em&gt;at the same time&lt;/em&gt;?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s attention.&lt;/p&gt;

&lt;h1&gt;
  
  
  🔍 What “Attention” Actually Means
&lt;/h1&gt;

&lt;p&gt;Forget formulas for a second.&lt;/p&gt;

&lt;p&gt;Think like this:&lt;/p&gt;

&lt;p&gt;Each word asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Which other words are important for me?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Example:
&lt;/h2&gt;

&lt;p&gt;Sentence:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The animal didn’t cross the street because &lt;strong&gt;it&lt;/strong&gt; was tired.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;What does “it” refer to?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;animal?&lt;/li&gt;
&lt;li&gt;street?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Attention helps the model decide.&lt;/p&gt;

&lt;h1&gt;
  
  
  ⚙️ How It Works (Intuition First)
&lt;/h1&gt;

&lt;p&gt;Each word is converted into a vector.&lt;/p&gt;

&lt;p&gt;From that vector, we create three things (sketched in NumPy after the matching process below):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Query (Q) → what I’m looking for&lt;/li&gt;
&lt;li&gt;Key (K) → what I offer&lt;/li&gt;
&lt;li&gt;Value (V) → actual information&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🧠 Matching Process
&lt;/h2&gt;

&lt;p&gt;Every word does:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Compare its Query with all Keys&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This gives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;similarity scores&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;normalize (softmax)&lt;/li&gt;
&lt;li&gt;use scores to combine Values&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  💡 Translation:
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;“Take information from important words, ignore the rest”&lt;/p&gt;
&lt;/blockquote&gt;
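
&lt;p&gt;Here is that whole matching process in a few lines of NumPy. The projection matrices are random stand-ins for learned ones; a sketch of the mechanism, not the paper's code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

rng = np.random.default_rng(0)
seq, d = 5, 8                       # 5 words, 8-dim vectors
X = rng.normal(size=(seq, d))       # one vector per word

Wq = rng.normal(size=(d, d))        # learned in a real model;
Wk = rng.normal(size=(d, d))        # random here for illustration
Wv = rng.normal(size=(d, d))

Q, K, V = X @ Wq, X @ Wk, X @ Wv    # what I seek / offer / carry

scores = Q @ K.T / np.sqrt(d)       # compare every Query to every Key
w = np.exp(scores - scores.max(axis=-1, keepdims=True))
w = w / w.sum(axis=-1, keepdims=True)   # softmax: rows sum to 1

out = w @ V                         # take info from important words
print(w.round(2))                   # who attends to whom, per row
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;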

&lt;h1&gt;
  
  
  🔥 Multi-Head Attention (Why multiple?)
&lt;/h1&gt;

&lt;p&gt;One attention head might learn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;grammar&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;relationships&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;positional meaning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So instead of one view:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We use multiple perspectives in parallel&lt;/p&gt;
&lt;/blockquote&gt;
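
&lt;p&gt;Mechanically, "multiple perspectives" just means splitting each vector into h slices, running the same attention independently in each slice, and concatenating the results. A shapes-only sketch of mine (projections omitted; each head attends over its own slice):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

seq, d, h = 5, 64, 8                   # 8 heads, 8 dims each
X = np.random.default_rng(1).normal(size=(seq, d))

heads = X.reshape(seq, h, d // h).transpose(1, 0, 2)  # (h, seq, d/h)
outs = []
for head in heads:                     # each head: its own attention
    s = head @ head.T / np.sqrt(d // h)
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    outs.append(w @ head)

Y = np.concatenate(outs, axis=-1)      # stitch the perspectives back
print(Y.shape)                         # (5, 64): same shape as input
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;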

&lt;h1&gt;
  
  
  🧱 Transformer Block (Now the Diagram Makes Sense)
&lt;/h1&gt;

&lt;p&gt;Each block has:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Attention
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Words interact with each other&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Add &amp;amp; Norm
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Stabilizes training&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Feed Forward
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Processes each token independently&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  🔁 Why “Add &amp;amp; Norm”?
&lt;/h1&gt;

&lt;p&gt;This is often ignored but critical.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It keeps gradients stable and prevents information loss&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Without it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deep transformers won’t train well&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  ⚡ Encoder vs Decoder
&lt;/h1&gt;

&lt;p&gt;From the diagram:&lt;/p&gt;

&lt;h2&gt;
  
  
  Left side → Encoder
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Reads input&lt;/li&gt;
&lt;li&gt;Builds representation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Right side → Decoder
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Generates output&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;masked attention (can’t see future)&lt;/li&gt;
&lt;li&gt;encoder output&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h1&gt;
  
  
  🔒 Masked Attention (Important for LLMs)
&lt;/h1&gt;

&lt;p&gt;When generating:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Model should not see future words&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So we mask them.&lt;/p&gt;
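
&lt;p&gt;The mask is one line: set every score that points at a future position to negative infinity before the softmax, and those positions get exactly zero weight. A minimal sketch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

seq = 5
scores = np.random.default_rng(2).normal(size=(seq, seq))

# Causal mask: position i may only attend to positions 0..i.
future = np.triu(np.ones((seq, seq), dtype=bool), k=1)
scores[future] = -np.inf          # softmax turns -inf into weight 0

w = np.exp(scores - scores.max(axis=-1, keepdims=True))
w = w / w.sum(axis=-1, keepdims=True)
print(w.round(2))                 # upper triangle is all zeros
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;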

&lt;h1&gt;
  
  
  🚀 Why Transformers Changed Everything
&lt;/h1&gt;

&lt;p&gt;Because they:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Allow &lt;strong&gt;parallel computation&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Handle &lt;strong&gt;long-range dependencies&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Scale extremely well&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  💥 The Real Insight
&lt;/h1&gt;

&lt;p&gt;Transformers don’t “understand language”.&lt;/p&gt;

&lt;p&gt;They do something simpler but powerful:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;They learn &lt;strong&gt;relationships between tokens&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  🧠 Connecting to LLMs
&lt;/h1&gt;

&lt;p&gt;When you do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What happens?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each token attends to previous tokens&lt;/li&gt;
&lt;li&gt;Builds context dynamically&lt;/li&gt;
&lt;li&gt;Predicts next token&lt;/li&gt;
&lt;/ul&gt;
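
&lt;p&gt;Stripped of batching and sampling options, &lt;code&gt;generate&lt;/code&gt; is just that loop. A hedged sketch (the names &lt;code&gt;model&lt;/code&gt; and &lt;code&gt;generate&lt;/code&gt; here are placeholders, not a specific library's internals):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

def generate(model, tokens, max_new=20, eos=0):
    for _ in range(max_new):
        logits = model(tokens)        # attends over all tokens so far
        nxt = int(np.argmax(logits))  # greedy: take the top token
        tokens = tokens + [nxt]       # context grows by one
        if nxt == eos:
            break
    return tokens

# Toy "model": deterministically predicts the next integer.
dummy = lambda toks: np.eye(10)[(toks[-1] + 1) % 10]
print(generate(dummy, [3], max_new=5))   # [3, 4, 5, 6, 7, 8]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;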

&lt;h1&gt;
  
  
  🔥 Final Thought
&lt;/h1&gt;

&lt;p&gt;The paper says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Attention Is All You Need”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But the deeper idea is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You don’t need sequence —&lt;br&gt;
You need relationships&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once you understand that…&lt;/p&gt;

&lt;p&gt;Transformers stop looking complex&lt;br&gt;
and start looking inevitable.&lt;/p&gt;

&lt;p&gt;If you're building models or exploring low-bit training like I am, this perspective changes everything.&lt;/p&gt;

&lt;p&gt;Because now the question becomes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How efficiently can we compute these relationships?&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
      <category>deeplearning</category>
    </item>
  </channel>
</rss>
