<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mathias Leonhardt</title>
    <description>The latest articles on DEV Community by Mathias Leonhardt (@ki-mathias).</description>
    <link>https://dev.to/ki-mathias</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3886949%2F54cf66ba-2c38-4946-9b57-4a57f72641d4.jpg</url>
      <title>DEV Community: Mathias Leonhardt</title>
      <link>https://dev.to/ki-mathias</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ki-mathias"/>
    <language>en</language>
    <item>
      <title>Transformer Attention Is Hopfield's 1982 Update Rule (And What That Tells Us About LLM Memory)</title>
      <dc:creator>Mathias Leonhardt</dc:creator>
      <pubDate>Thu, 04 Jun 2026 15:40:53 +0000</pubDate>
      <link>https://dev.to/ki-mathias/transformer-attention-is-hopfields-1982-update-rule-and-what-that-tells-us-about-llm-memory-4i7f</link>
      <guid>https://dev.to/ki-mathias/transformer-attention-is-hopfields-1982-update-rule-and-what-that-tells-us-about-llm-memory-4i7f</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Hopfield's associative-memory equation from 1982 and the scaled dot-product attention from Vaswani 2017 are the same operation. One substitution turns one into the other. The 2024 Nobel Prize in Physics — to Hopfield and Hinton — is the academic acknowledgement that the mathematics behind today's LLMs was already written four decades ago, in a different vocabulary.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is a condensed write-up of the longer, interactive piece at &lt;a href="https://ki-mathias.de/en/hopfield.html" rel="noopener noreferrer"&gt;ki-mathias.de/en/hopfield.html&lt;/a&gt;. Seven chapters there, five live MNIST demos. Here I focus on the four steps where the story has interesting empirical edges.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. The identity
&lt;/h2&gt;

&lt;p&gt;Modern Hopfield (Ramsauer et al., 2020) writes the update rule as&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;v ← X · softmax(β · Xᵀv)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;where &lt;code&gt;X ∈ ℝ^(N×p)&lt;/code&gt; is the matrix of stored patterns and &lt;code&gt;β &amp;gt; 0&lt;/code&gt; is an inverse-temperature parameter.&lt;/p&gt;

&lt;p&gt;Scaled dot-product attention (Vaswani et al., 2017) writes&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Attention(Q, K, V) = V · softmax(Kᵀ Q / √dₖ)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set &lt;code&gt;Q = v&lt;/code&gt;, &lt;code&gt;K = X&lt;/code&gt;, &lt;code&gt;V = X&lt;/code&gt;, and &lt;code&gt;β = 1/√d_k&lt;/code&gt;. The two equations become identical. Not analogous. &lt;strong&gt;Identical.&lt;/strong&gt; Same operation, written in two different notations.&lt;/p&gt;

&lt;p&gt;In a Transformer, K and V are independent learned projections of the same input rather than the same matrix, and Q is yet another projection. Those are extra learnable transformations &lt;em&gt;around&lt;/em&gt; the Hopfield core; the softmax-weighted lookup in the middle is unchanged.&lt;/p&gt;

&lt;p&gt;Krotov &amp;amp; Hopfield (2016) had already worked out the &lt;em&gt;dense associative memory&lt;/em&gt; generalisation that gives this form its exponential storage capacity. Vaswani 2017 reached the same equation by iterating on machine-translation benchmarks. Ramsauer 2020 noticed they were the same. The independent rediscovery is itself diagnostic: the structure isn't a design choice, it's a forced consequence of the requirements.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Why classical Hopfield breaks on MNIST (and why that's &lt;em&gt;not&lt;/em&gt; a bug)
&lt;/h2&gt;

&lt;p&gt;The original 1982 recall rule is&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vᵢ ← sign(Σⱼ Wᵢⱼ · vⱼ)        # W = (1/N) Σₘ ξₘ ξₘᵀ,  Wᵢᵢ = 0   (index m runs over stored patterns)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the Hebb construction. Store ten MNIST digits, query each with 15 % pixel noise, observe what comes back.&lt;/p&gt;

&lt;p&gt;Result: &lt;strong&gt;all ten queries collapse into the same end-state&lt;/strong&gt; — an image that isn't visually any of the stored digits. Mean pairwise similarity between the ten "recalls": 0.99.&lt;/p&gt;

&lt;p&gt;This is fully explained by the spectrum of &lt;code&gt;W_Hebb&lt;/code&gt;. The eigenvalues are roughly&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;λ₁ ≈ 6.65,   λ₂ ≈ 0.65,   λ₃ ≈ 0.48,   ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A factor-of-ten gap between &lt;code&gt;λ₁&lt;/code&gt; and the rest. The top eigenvector is essentially &lt;code&gt;ξ̄ = (1/p) Σₘ ξₘ&lt;/code&gt;, the per-pixel mean — cosine 0.9999.&lt;/p&gt;

&lt;p&gt;The Hebb rule is provably correct &lt;em&gt;only&lt;/em&gt; under two conditions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pairwise orthogonality of stored patterns.&lt;/li&gt;
&lt;li&gt;Zero-mean patterns.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;MNIST digits violate both: pairwise inner products are 400–600 out of 784 (≈ two thirds of the pixels shared), and mean pixel values are −0.63 to −0.90 (much more "background" than "ink"). The failure is therefore not an implementation bug; it's the construction operating outside its range of validity. Centring the patterns kills the bias sink but reveals the next defect — the &lt;code&gt;v → −v&lt;/code&gt; symmetry of &lt;code&gt;E(v) = -½vᵀWv&lt;/code&gt; causes recalls to land on &lt;em&gt;negations&lt;/em&gt; of stored patterns.&lt;/p&gt;

&lt;p&gt;The didactic point: &lt;strong&gt;a learning rule is correct or incorrect relative to a data geometry.&lt;/strong&gt; "Hebb is broken" is not a sentence. "Hebb is broken &lt;em&gt;on MNIST&lt;/em&gt;" is.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. The pseudoinverse fix and its capacity cliff
&lt;/h2&gt;

&lt;p&gt;The Personnaz–Guyon–Dreyfus construction (1985) keeps the same recall machinery but builds W differently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;W_PI = X (XᵀX)⁻¹ Xᵀ
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The factor &lt;code&gt;(XᵀX)⁻¹&lt;/code&gt; is exactly what's missing in Hebb — the inverse of the pattern-pattern Gram matrix. It removes correlations between stored patterns before the matrix becomes the energy landscape. For orthogonal patterns the two rules coincide; for correlated ones, only &lt;code&gt;W_PI&lt;/code&gt; carries the algebraic guarantee&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;W_PI · ξₚ = ξₚ              # every stored pattern is a fixed point with eigenvalue 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Empirical capacity on MNIST, p stored patterns, 10 % pixel noise, fraction of queries that recover the original:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;p&lt;/th&gt;
&lt;th&gt;Hebb&lt;/th&gt;
&lt;th&gt;Pseudoinverse&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;0 %&lt;/td&gt;
&lt;td&gt;100 %&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;0 %&lt;/td&gt;
&lt;td&gt;100 %&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;150&lt;/td&gt;
&lt;td&gt;0 %&lt;/td&gt;
&lt;td&gt;97 %&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;td&gt;0 %&lt;/td&gt;
&lt;td&gt;32 %&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;250&lt;/td&gt;
&lt;td&gt;0 %&lt;/td&gt;
&lt;td&gt;1 %&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;300&lt;/td&gt;
&lt;td&gt;0 %&lt;/td&gt;
&lt;td&gt;0 %&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A sharp phase transition between p ≈ 150 and p ≈ 250. Far below the algebraic ceiling p = N = 784, where the Gram matrix becomes singular. The identity &lt;code&gt;W_PI ξₚ = ξₚ&lt;/code&gt; holds throughout — but the &lt;strong&gt;basin of attraction&lt;/strong&gt; around each fixed point shrinks as the patterns crowd one another, and 10 % noise overshoots the basin once p exceeds ~150.&lt;/p&gt;

&lt;p&gt;Side note for readers who came in via the &lt;a href="https://ki-mathias.de/en/eigenvalues.html" rel="noopener noreferrer"&gt;Eigenvalues post&lt;/a&gt;: the operator &lt;code&gt;X(XᵀX)⁻¹Xᵀ&lt;/code&gt; is &lt;em&gt;exactly&lt;/em&gt; ridge regression with &lt;code&gt;λ = 0&lt;/code&gt; — the pseudoinverse hat matrix. The Hopfield update with this W is therefore a non-linear filter built on top of an ordinary projection onto the span of stored patterns. The capacity cliff is the cliff of unregularised projection at near-singular Gram.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Modern Hopfield = Attention (the move that fixes capacity)
&lt;/h2&gt;

&lt;p&gt;Stop iterating &lt;code&gt;sign(Wv)&lt;/code&gt;. Replace it with the soft, input-dependent&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;v ← X · softmax(β · Xᵀv)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three structural changes happen at once:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Classical (1982/1985)&lt;/th&gt;
&lt;th&gt;Modern (Ramsauer 2020)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Operator&lt;/td&gt;
&lt;td&gt;fixed &lt;code&gt;W ∈ ℝ^(N×N)&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;none — direct softmax-lookup on X&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Update&lt;/td&gt;
&lt;td&gt;linear in v + sign&lt;/td&gt;
&lt;td&gt;non-linear (softmax in v)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Energy&lt;/td&gt;
&lt;td&gt;quadratic &lt;code&gt;-½ vᵀWv&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;log-sum-exp + &lt;code&gt;½‖v‖²&lt;/code&gt; (Lyapunov)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Convergence&lt;/td&gt;
&lt;td&gt;iterative, many sweeps&lt;/td&gt;
&lt;td&gt;one step (for sufficiently large β)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Capacity&lt;/td&gt;
&lt;td&gt;dynamically ≪ N&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;Ω(exp(N))&lt;/code&gt; — exponential in N&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The exponential capacity is the practical reason this works for LLMs at all: with &lt;code&gt;N = 768&lt;/code&gt; (a typical embedding dim), you can store effectively-unbounded context. With &lt;code&gt;N = 784&lt;/code&gt; (MNIST), the classical pseudoinverse rule plateaus near p ≈ 150 on real data.&lt;/p&gt;

&lt;p&gt;And the parameter &lt;code&gt;β&lt;/code&gt; is interpretable. At small &lt;code&gt;β&lt;/code&gt;, the softmax is near-uniform and the recall is a soft average of all stored patterns. At large &lt;code&gt;β&lt;/code&gt;, it concentrates on the single best match — Modern Hopfield converges to 1-nearest-neighbour. Ramsauer's analysis of Transformer heads shows early layers running at low β (global averaging) and deeper layers running at high β (sharp lookup on a single token). The classical "attention is mysterious" complaint dissolves into a continuous interpolation between two known operations.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. The surprise: same learning rule, different geometry, totally different outcome
&lt;/h2&gt;

&lt;p&gt;The interesting finding from &lt;a href="https://arxiv.org/abs/2407.05658" rel="noopener noreferrer"&gt;Negri, Tudisco, Lucibello et al. 2024&lt;/a&gt; — &lt;em&gt;Random Features Hopfield Networks generalize retrieval to previously unseen examples&lt;/em&gt; — is &lt;strong&gt;not&lt;/strong&gt; "we made Hopfield better." It's the opposite:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The exact same learning rule that scores 65 % accuracy on MNIST (i.e., barely matches 1-NN, no real generalisation) achieves &lt;strong&gt;perfect generalisation&lt;/strong&gt; — magnetisation 1.0 on unseen test patterns — when the data is built as a sparse mixture of a small set of random features.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Setup: let &lt;code&gt;F ∈ {-1,+1}^(N×D)&lt;/code&gt; be a random feature matrix. Each pattern is &lt;code&gt;ξ = sign(F · c)&lt;/code&gt; with &lt;code&gt;c&lt;/code&gt; an L-sparse binary coefficient vector. Three sets share the same F:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Train&lt;/strong&gt; (p stored patterns)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Features&lt;/strong&gt; (the D feature columns of F — never stored)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test&lt;/strong&gt; (new patterns from the same distribution, never stored)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sweep &lt;code&gt;α = p/N&lt;/code&gt; and measure the magnetisation of each set. Three phases appear in order:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt; (α small): only train patterns are stable attractors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learning&lt;/strong&gt; (α medium): train magnetisation drops, features magnetisation rises — the network has begun to recognise the &lt;em&gt;components&lt;/em&gt; it was implicitly trained on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generalisation&lt;/strong&gt; (α large): test patterns become attractors too — &lt;em&gt;without ever having been stored&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With the &lt;strong&gt;pseudoinverse rule&lt;/strong&gt; this last transition is a hard jump to magnetisation 1.0, and the math explains why: once the trained patterns span enough of the feature mixtures, every feature mixture becomes an eigenvector of &lt;code&gt;W_PI&lt;/code&gt; with eigenvalue 1 — by the same identity that made stored patterns fixed points.&lt;/p&gt;

&lt;p&gt;The takeaway is not subtle: &lt;strong&gt;generalisation is a property of the data geometry, not of the learning rule.&lt;/strong&gt; A textbook claim that "this learning rule generalises better" is well-typed only relative to a class of data. The reason language models generalise so well isn't that the attention mechanism has a special "ability" — it's that natural language already has the sparse compositional structure that makes Hopfield-style retrieval transfer beyond the training set. Words and constructions are a finite set of components; sentences are sparse mixtures. Hopfield-friendly by accident of biology.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. What runs on this mathematics today
&lt;/h2&gt;

&lt;p&gt;A non-exhaustive list, with the empirical claim each item is making:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Every Transformer attention layer.&lt;/strong&gt; Modern Hopfield is what's there. With high probability the most-executed mathematical operation on global compute, by raw volume.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MHNfs&lt;/strong&gt; (Klambauer/Hochreiter, Linz) — few-shot drug discovery, 100k+ context molecules as memory, SOTA on FS-Mol.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepRC&lt;/strong&gt; (Widrich et al., NeurIPS 2020) — multi-instance learning over ~10⁶ immune-repertoire sequences, used for SARS-CoV-2 classification.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memristor Hopfield chips&lt;/strong&gt; (HP Labs &lt;em&gt;Nature Electronics&lt;/em&gt; 2020; Peking U. &lt;em&gt;Nature Comms&lt;/em&gt; 2024) — analogue MAX-CUT solvers, ~4 orders of magnitude energy advantage over digital. The Peking paper proves mathematical equivalence to a Hopfield attractor network, not just analogy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PyTorch drop-in:&lt;/strong&gt; &lt;a href="https://github.com/ml-jku/hopfield-layers" rel="noopener noreferrer"&gt;ml-jku/hopfield-layers&lt;/a&gt; — &lt;code&gt;Hopfield&lt;/code&gt;, &lt;code&gt;HopfieldPooling&lt;/code&gt;, &lt;code&gt;HopfieldLayer&lt;/code&gt; modules, swap-in replacements for LSTM / pooling / attention.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  7. What this does &lt;em&gt;not&lt;/em&gt; say
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;It does &lt;strong&gt;not&lt;/strong&gt; claim that every modern ML algorithm is "secretly a Hopfield network." The identity is precise between &lt;em&gt;Modern&lt;/em&gt; Hopfield and &lt;em&gt;scaled dot-product attention&lt;/em&gt;. Diffusion models, state-space models, ConvNets have different mathematical structures.&lt;/li&gt;
&lt;li&gt;The Chapter-6 generalisation result is for synthetic feature-mixture data, not for MNIST. To transfer it to real data you need to first extract a feature basis (PCA, dictionary learning, learned embeddings) — i.e. you have to engineer the right kind of sparse architecture, the data won't give it to you for free.&lt;/li&gt;
&lt;li&gt;Classical Hopfield is not "back" as a general-purpose ML tool. ConvNets/diffusion/Transformers are better for almost every benchmark task. The Hopfield reading earns its keep when &lt;em&gt;memory&lt;/em&gt; is the actual problem — few-shot, multi-instance, episodic recall, combinatorial optimisation on dedicated hardware.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full interactive post:&lt;/strong&gt; &lt;a href="https://ki-mathias.de/en/hopfield.html" rel="noopener noreferrer"&gt;Hopfield Networks — From Spin Glass to Attention&lt;/a&gt; — seven chapters, five interactive MNIST demos (live Hebb recall, bias-sink, Hebb↔PI spectrum slider, β-slider, three-phase diagram).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The papers that close the loop:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Hopfield 1982 — &lt;a href="https://www.pnas.org/doi/10.1073/pnas.79.8.2554" rel="noopener noreferrer"&gt;&lt;em&gt;Neural networks and physical systems with emergent collective computational abilities&lt;/em&gt;&lt;/a&gt; (PNAS).&lt;/li&gt;
&lt;li&gt;Krotov &amp;amp; Hopfield 2016 — &lt;a href="https://arxiv.org/abs/1606.01164" rel="noopener noreferrer"&gt;&lt;em&gt;Dense Associative Memory for Pattern Recognition&lt;/em&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Vaswani et al. 2017 — &lt;a href="https://arxiv.org/abs/1706.03762" rel="noopener noreferrer"&gt;&lt;em&gt;Attention Is All You Need&lt;/em&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Ramsauer et al. 2020 — &lt;a href="https://arxiv.org/abs/2008.02217" rel="noopener noreferrer"&gt;&lt;em&gt;Hopfield Networks Is All You Need&lt;/em&gt;&lt;/a&gt; (the identity).&lt;/li&gt;
&lt;li&gt;Negri et al. 2024 — &lt;a href="https://arxiv.org/abs/2407.05658" rel="noopener noreferrer"&gt;&lt;em&gt;Random Features Hopfield Networks generalize retrieval to previously unseen examples&lt;/em&gt;&lt;/a&gt; (the three-phase diagram).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Nobel Prize 2024:&lt;/strong&gt; &lt;a href="https://www.nobelprize.org/prizes/physics/2024/summary/" rel="noopener noreferrer"&gt;official citation&lt;/a&gt; — Hopfield &amp;amp; Hinton, &lt;em&gt;for foundational discoveries and inventions that enable machine learning with artificial neural networks&lt;/em&gt;.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;If you spot a mistake or a sharper statement of any of the above, the source repo is open — corrections welcome.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>neuralnetworks</category>
      <category>math</category>
      <category>ai</category>
    </item>
    <item>
      <title>Your Ridge Parameter Is PageRank's Damping Factor (and Why BLAS Beats GPU for Mid-Sized KRR)</title>
      <dc:creator>Mathias Leonhardt</dc:creator>
      <pubDate>Tue, 21 Apr 2026 18:45:29 +0000</pubDate>
      <link>https://dev.to/ki-mathias/your-ridge-parameter-is-pageranks-damping-factor-and-why-blas-beats-gpu-for-mid-sized-krr-4hac</link>
      <guid>https://dev.to/ki-mathias/your-ridge-parameter-is-pageranks-damping-factor-and-why-blas-beats-gpu-for-mid-sized-krr-4hac</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;The ridge parameter λ in kernel regression and the damping factor d in PageRank are the same object. Both create a spectral gap. Both stabilize an otherwise-unstable iteration. Once you see it, four unrelated-looking algorithms collapse into one story.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This post is a condensed, math-first write-up of two longer pieces on ki-mathias.de:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://ki-mathias.de/en/eigenvalues.html" rel="noopener noreferrer"&gt;Eigenvalues &amp;amp; AI&lt;/a&gt; — the unification story, with interactive eigendecomposition demos.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ki-mathias.de/en/krr-chat-explained.html" rel="noopener noreferrer"&gt;KRR Chat: Under the Hood&lt;/a&gt; — a 3 MB browser chatbot trained with kernel ridge regression, no neural network, ~50 k parameters.&lt;/li&gt;
&lt;li&gt;Paper: &lt;a href="https://doi.org/10.5281/zenodo.19595641" rel="noopener noreferrer"&gt;Leonhardt 2026 — &lt;em&gt;Kernel Ridge Regression and PageRank: A Unified Absorber-Stochastic Framework&lt;/em&gt;&lt;/a&gt; (Zenodo, v2, 10 pages).&lt;/li&gt;
&lt;li&gt;Try the chatbot: &lt;a href="https://pmmathias.github.io/krr-chat/" rel="noopener noreferrer"&gt;Kalle&lt;/a&gt; · Source: &lt;a href="https://github.com/pmmathias/krr-chat" rel="noopener noreferrer"&gt;pmmathias/krr-chat&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  1. The identity that does the work
&lt;/h2&gt;

&lt;p&gt;Every algorithm below is an eigenvalue problem with a regularizer. The regularizer is spelled differently each time, but it plays the same role: it pushes eigenvalues away from the unit circle, which makes power iteration converge.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Algorithm&lt;/th&gt;
&lt;th&gt;Core equation&lt;/th&gt;
&lt;th&gt;"Regularizer"&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Ridge regression&lt;/td&gt;
&lt;td&gt;&lt;code&gt;(K + λI) α = y&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;λ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PageRank&lt;/td&gt;
&lt;td&gt;&lt;code&gt;v = d · M · v + (1 − d) · u&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;d (damping ≈ 0.85)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Early-stopped gradient descent&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;α_{t+1} = α_t − η (K α_t − y)&lt;/code&gt;, stop at iteration t&lt;/td&gt;
&lt;td&gt;1/t (Theorem 1)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schrödinger eigenproblem&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Ĥ ψ = E ψ&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;mass/potential scaling&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Theorem 1 in the paper makes the second-last row precise:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Running gradient descent on &lt;code&gt;½ ‖K α − y‖²&lt;/code&gt; for exactly &lt;em&gt;t&lt;/em&gt; iterations and stopping approximates the ridge solution with regularizer λ ≈ 1/t.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Early stopping &lt;em&gt;is&lt;/em&gt; spectral filtering. You never wrote λ into your training script, but by choosing a step count you chose one. That's why "no regularization + early stopping" and "ridge regression" often train to the same test loss. They are the same filter applied two different ways.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Why the spectrum matters — with one concrete example
&lt;/h2&gt;

&lt;p&gt;A row-stochastic matrix &lt;em&gt;M&lt;/em&gt; (like the link matrix of a hyperlink graph) has a leading eigenvalue of exactly 1, and the rest can hug that 1 arbitrarily closely. Power iteration will converge to the leading eigenvector, but slowly — the convergence rate is set by the gap &lt;code&gt;1 − |λ₂|&lt;/code&gt;. If the gap is 10⁻⁶, expect a million iterations.&lt;/p&gt;

&lt;p&gt;PageRank fixes this by mixing in a teleport term:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;v ← d · M · v + (1 − d) · u      # d = 0.85, u = uniform
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That pins the spectral radius of the &lt;em&gt;effective&lt;/em&gt; operator at d, not 1. Now every eigenvalue except the leading one is multiplied by 0.85 each iteration — the gap becomes ≥ 0.15, and the walk converges in tens of steps.&lt;/p&gt;

&lt;p&gt;Ridge regression is the linear-algebra twin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;α = (K + λI)⁻¹ y
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same trick. The matrix &lt;em&gt;K&lt;/em&gt; has eigenvalues that can be tiny (noise directions). Inverting &lt;em&gt;K&lt;/em&gt; alone amplifies noise. Adding &lt;code&gt;λI&lt;/code&gt; floors the spectrum: every eigenvalue of &lt;code&gt;K + λI&lt;/code&gt; is at least λ, so the inverse is bounded by 1/λ. You trade bias for variance, and the λ knob is &lt;em&gt;exactly&lt;/em&gt; PageRank's damping factor d.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Absorber-Stochasticization (the paper's §6 trick)
&lt;/h2&gt;

&lt;p&gt;Here's the subtle bit. Writing PageRank as an eigenvalue problem requires &lt;em&gt;M&lt;/em&gt; to be properly stochastic (rows sum to 1). Real link matrices have dangling nodes — pages with no outgoing links — so they're &lt;em&gt;sub&lt;/em&gt;-stochastic. Standard treatments patch this with a uniform "teleport from dead ends" rule, but that mixes two different regularizers (λ and teleport).&lt;/p&gt;

&lt;p&gt;The paper's construction is cleaner: add a single extra row and column — the &lt;strong&gt;absorber&lt;/strong&gt; — that catches all missing mass. Now the matrix is exactly stochastic. The absorber state is transient; the stationary distribution lives on the original nodes.&lt;/p&gt;

&lt;p&gt;The identical construction works for kernel ridge regression. Given &lt;code&gt;(K + λI) α = y&lt;/code&gt;, build the augmented system&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ K + λI   r ]   [ α ]     [ y ]
[  sᵀ      σ ] · [ β ]  =  [ 0 ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;where the extra row/column carries the residual. The system is now block-solvable via a Krylov iteration whose spectrum is controlled by λ alone — no auxiliary regularizers, no hand-tuned dangling-node rule.&lt;/p&gt;

&lt;p&gt;The practical payoff: you can train KRR at D ≈ 6k with a &lt;strong&gt;Block Preconditioned Conjugate Gradient&lt;/strong&gt; and hit numerical convergence in ~11 iterations.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. The benchmark that surprised me (BLAS &amp;gt; GPU)
&lt;/h2&gt;

&lt;p&gt;Here is the v2 paper's Table 4 — training the Kalle chatbot (D = 6144, V = 2977):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Backend&lt;/th&gt;
&lt;th&gt;Precision&lt;/th&gt;
&lt;th&gt;Time (ms)&lt;/th&gt;
&lt;th&gt;Iterations&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Direct solve (NumPy/LAPACK)&lt;/td&gt;
&lt;td&gt;Float64&lt;/td&gt;
&lt;td&gt;2097&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Block-PCG (NumPy)&lt;/td&gt;
&lt;td&gt;Float64&lt;/td&gt;
&lt;td&gt;6731&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Block-PCG (PyTorch CPU)&lt;/td&gt;
&lt;td&gt;Float32&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1704&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Block-PCG (PyTorch MPS / GPU)&lt;/td&gt;
&lt;td&gt;Float32&lt;/td&gt;
&lt;td&gt;1622&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The surprise: &lt;strong&gt;PyTorch CPU is within 5% of GPU&lt;/strong&gt;, and it beats NumPy's direct solve. Why?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;BLAS is why.&lt;/strong&gt; PyTorch CPU dispatches matmuls to MKL/oneDNN with AVX-512, multi-threaded. NumPy's &lt;code&gt;linalg.solve&lt;/code&gt; calls LAPACK's direct factorization, which is O(D³) and doesn't parallelize as cleanly as a blocked matmul.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU transfer overhead eats the advantage.&lt;/strong&gt; For D ≈ 6k, the operators fit in L2/L3 cache. Moving α and the preconditioner to GPU memory each iteration costs more than the matmul you save.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Float32 is enough.&lt;/strong&gt; The preconditioner is well-conditioned after the absorber step, so Float32 converges in the same iteration count as Float64 with a 2× speed gain.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If your D is in the 10³–10⁴ range, &lt;strong&gt;don't reach for a GPU&lt;/strong&gt;. Reach for PyTorch CPU with Float32. You'll ship on commodity hardware.&lt;/p&gt;

&lt;p&gt;The inflection point is around D ~ 2·10⁴ on an M2 Pro — above that, GPU wins decisively. Your mileage varies with cache size and memory bandwidth; the takeaway is to measure, not to assume "GPU always wins."&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Corpus scaling — N grows, solve stays 3–7% of total
&lt;/h2&gt;

&lt;p&gt;The other surprise from the benchmarks: when I grew the training corpus from ~3 k to ~21 k sentences (V = 2977 → V = 21,433), the KRR solve stayed between 3% and 7% of the end-to-end pipeline. The bulk of time was spent in tokenization and RFF feature construction, not the linear solve.&lt;/p&gt;

&lt;p&gt;This matters because the common intuition — "bigger corpus → quadratically harder training" — comes from full-kernel methods where you materialize the N×N gram matrix. With Random Fourier Features, the bottleneck is the D×D normal equations, and D is decoupled from N. Doubling your corpus doubles feature-extraction cost (linear in N) but leaves the solve essentially unchanged.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The scaling lever is D, not N.&lt;/strong&gt; Budget accordingly.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Kalle — a 3 MB chatbot you can read
&lt;/h2&gt;

&lt;p&gt;The paper's toy system is Kalle: a kernel-ridge-regression chatbot that I've deployed as a static web page at &lt;a href="https://pmmathias.github.io/krr-chat/" rel="noopener noreferrer"&gt;pmmathias.github.io/krr-chat&lt;/a&gt;. It has about 50 k parameters (three D×V matrices), fits in 3 MB, runs in-browser on your phone's CPU, and speaks German at a level somewhere between a broken Markov chain and a very confused graduate student. It won't replace ChatGPT. It &lt;em&gt;will&lt;/em&gt; let you read every equation in the paper and see the matrices line up.&lt;/p&gt;

&lt;p&gt;The point is pedagogical: at this scale, the mathematics is still visible. You can print out &lt;code&gt;K + λI&lt;/code&gt;, look at its eigenvalues, and nothing is hidden behind a billion-parameter transformer. That's why I built it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Train step — the paper's Algorithm 1, 20 lines of Python
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;train_krr_bpcg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lam&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tol&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1e-6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;D&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;A&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;lam&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eye&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;D&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;M_inv&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;diagonal&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Jacobi preconditioner
&lt;/span&gt;    &lt;span class="n"&gt;alpha&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt; &lt;span class="n"&gt;alpha&lt;/span&gt;
    &lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;M_inv&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;Ap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;
        &lt;span class="n"&gt;rz&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rz&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;Ap&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;alpha&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;alpha&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;
        &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;Ap&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;tol&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="n"&gt;z_new&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;M_inv&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;
        &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;z_new&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;rz&lt;/span&gt;
        &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;z_new&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;
        &lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;z_new&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_iter&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The absorber construction lives inside the preconditioner; the outer loop is textbook CG.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. What this means if you're building
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;If you're writing a blog post that says "PageRank is an iterative algorithm and ridge regression is a closed-form algorithm" — they're the same eigenvalue problem, just with different stopping rules.&lt;/li&gt;
&lt;li&gt;If you're choosing between GPU and CPU training for a moderate-D kernel method, measure first. For D &amp;lt; 10⁴, BLAS CPU is frequently the winner.&lt;/li&gt;
&lt;li&gt;If you're looking for an early-stopping criterion, you can read it off the paper's Table 1: running for &lt;em&gt;t&lt;/em&gt; iterations applies a filter approximately equivalent to λ ≈ 1/t. Choose t such that your effective λ matches the cross-validated ridge λ.&lt;/li&gt;
&lt;li&gt;If you're trying to explain modern ML to someone else, the eigenvalue story is the shortest honest path. Every algorithm they'll encounter — PCA, PageRank, Schrödinger, ridge regression, neural network training — reduces to "find the eigenvectors of this operator, and regularize the spectrum so the iteration converges."&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full interactive post:&lt;/strong&gt; &lt;a href="https://ki-mathias.de/en/eigenvalues.html" rel="noopener noreferrer"&gt;Eigenvalues &amp;amp; AI&lt;/a&gt; — includes three React demos (spectrum visualizer, PageRank convergence, ridge parameter slider) and a 9-minute companion video.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kalle walkthrough:&lt;/strong&gt; &lt;a href="https://ki-mathias.de/en/krr-chat-explained.html" rel="noopener noreferrer"&gt;KRR Chat: Under the Hood&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Paper (v2, open access):&lt;/strong&gt; &lt;a href="https://doi.org/10.5281/zenodo.19595641" rel="noopener noreferrer"&gt;Zenodo 10.5281/zenodo.19595641&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code:&lt;/strong&gt; &lt;a href="https://github.com/pmmathias/krr-chat" rel="noopener noreferrer"&gt;github.com/pmmathias/krr-chat&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you find a mistake, open an issue on the repo or @ me — every correction improves the next version of the paper.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>math</category>
      <category>ai</category>
    </item>
    <item>
      <title>Why Quantum Computers Are Faster — the Answer Isn't Parallelism</title>
      <dc:creator>Mathias Leonhardt</dc:creator>
      <pubDate>Mon, 20 Apr 2026 14:22:54 +0000</pubDate>
      <link>https://dev.to/ki-mathias/why-quantum-computers-are-faster-the-answer-isnt-parallelism-5ef1</link>
      <guid>https://dev.to/ki-mathias/why-quantum-computers-are-faster-the-answer-isnt-parallelism-5ef1</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;A qubit isn't a bit with randomness — it's a rotating arrow on a sphere. A quantum gate is a rotation of that arrow. A quantum algorithm is &lt;strong&gt;one&lt;/strong&gt; global transformation that reshapes the whole solution landscape.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The headline everyone has read in a press release — &lt;em&gt;"quantum computers try all possibilities in parallel"&lt;/em&gt; — is dangerously misleading. A measurement would just return one random answer anyway. So "parallel" on its own is useless. The actual trick is subtler, more beautiful, and the point of this post.&lt;/p&gt;

&lt;p&gt;This is a condensed, tutorial-style version of a &lt;a href="https://ki-mathias.de/en/quantum-computing.html" rel="noopener noreferrer"&gt;longer, interactive piece on ki-mathias.de&lt;/a&gt; — the full version has four interactive React demos (Bloch sphere, circuit builder, measurement simulator, Qiskit playground) and a 9-minute companion video. Everything here is runnable in Qiskit 1.x. The &lt;a href="https://github.com/pmmathias/quantum-computing" rel="noopener noreferrer"&gt;quantum-computing repo&lt;/a&gt; contains the notebooks.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. A qubit, for real
&lt;/h2&gt;

&lt;p&gt;At the hardware level, a qubit is always a two-level quantum system. There are several competing technologies in 2026 — superconducting circuits (Google, IBM), trapped ions (IonQ), neutral atoms (QuEra), photons (PsiQuantum) — but let's pick one and stay concrete: the &lt;strong&gt;superconducting qubit&lt;/strong&gt;, because that's where most of today's qubits live.&lt;/p&gt;

&lt;p&gt;A superconducting qubit is essentially a quantized electrical oscillator. You take an LC circuit — a coil and a capacitor, just like in an old radio — cool it to about 10 millikelvin (colder than intergalactic space), and add a special component called a &lt;strong&gt;Josephson junction&lt;/strong&gt;. This junction is two superconductors separated by a thin insulating barrier. Brian Josephson predicted in 1962 that bound electron pairs (Cooper pairs) can quantum-mechanically tunnel through this barrier — Nobel Prize 1973.&lt;/p&gt;

&lt;p&gt;The Josephson junction bends the energy ladder. Without it, you'd get a harmonic oscillator with equally spaced levels — a microwave pulse tuned to excite |0⟩→|1⟩ would also kick |1⟩→|2⟩. With it, the rungs get closer together as you go up, and you can address the lowest transition selectively. That selectivity is what turns a quantum oscillator into a usable qubit.&lt;/p&gt;

&lt;p&gt;The two lowest levels, &lt;code&gt;|0⟩&lt;/code&gt; and &lt;code&gt;|1⟩&lt;/code&gt;, form the qubit. Their energy gap corresponds to a microwave frequency around &lt;strong&gt;5 GHz&lt;/strong&gt; — that's the frequency of the pulses we'll use to manipulate the qubit.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Amplitudes, not probabilities
&lt;/h2&gt;

&lt;p&gt;Every quantum state is a superposition&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;|ψ⟩ = α |0⟩ + β |1⟩
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;where &lt;code&gt;α&lt;/code&gt; and &lt;code&gt;β&lt;/code&gt; are &lt;em&gt;complex numbers&lt;/em&gt; — amplitudes. The Born rule (Max Born, 1926, Nobel 1954) says the probability of measuring state &lt;code&gt;|0⟩&lt;/code&gt; is &lt;code&gt;|α|²&lt;/code&gt; and &lt;code&gt;|1⟩&lt;/code&gt; is &lt;code&gt;|β|²&lt;/code&gt;, with &lt;code&gt;|α|² + |β|² = 1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Crucially, amplitudes can be &lt;em&gt;negative&lt;/em&gt; or &lt;em&gt;complex&lt;/em&gt; — unlike probabilities, they can &lt;strong&gt;cancel each other out&lt;/strong&gt;. That's interference, and it's the whole point.&lt;/p&gt;

&lt;p&gt;A Hadamard gate H turns &lt;code&gt;|0⟩&lt;/code&gt; into a perfectly balanced superposition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;H |0⟩ = (1/√2) (|0⟩ + |1⟩)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's the proof that amplitudes aren't classical probabilities: apply &lt;code&gt;H&lt;/code&gt; twice to &lt;code&gt;|0⟩&lt;/code&gt;. Each individual &lt;code&gt;H&lt;/code&gt; puts you in a 50/50 superposition. A classical coin would still give 50/50 after a second flip. The qubit? Goes back to &lt;code&gt;|0⟩&lt;/code&gt;. &lt;strong&gt;Deterministic.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Why? Because two &lt;code&gt;H&lt;/code&gt; matrices multiply to the identity. Physically: the amplitude for &lt;code&gt;|0⟩&lt;/code&gt; coming from the &lt;code&gt;|0⟩&lt;/code&gt;-branch adds constructively with the one from the &lt;code&gt;|1⟩&lt;/code&gt;-branch; the amplitudes for &lt;code&gt;|1⟩&lt;/code&gt; cancel. Exactly the kind of destructive cancellation classical probabilities can't do.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. The whole secret, one sentence
&lt;/h2&gt;

&lt;p&gt;Before we look at an algorithm, the mental model has to shift.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Old picture:&lt;/strong&gt; quantum computers try all possibilities in parallel. &lt;em&gt;Wrong.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;New picture:&lt;/strong&gt; a quantum algorithm is &lt;strong&gt;one&lt;/strong&gt; global linear transformation on a &lt;code&gt;2^n&lt;/code&gt;-amplitude vector, carefully designed so that wrong answers &lt;em&gt;destructively cancel&lt;/em&gt; and the right one &lt;em&gt;constructively amplifies&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Classical searches. Quantum shapes.&lt;/p&gt;

&lt;p&gt;This isn't just poetry. Every quantum gate, expressed as a matrix, is &lt;code&gt;2^n × 2^n&lt;/code&gt; — it acts on the entire state vector at once. Ten qubits → a &lt;code&gt;1024 × 1024&lt;/code&gt; matrix per gate. This is not parallel trial-and-error; it's a global transformation of a high-dimensional probability field.&lt;/p&gt;

&lt;p&gt;If you've touched linear algebra before, there's an even tighter way to say this: a quantum algorithm is a chain of unitary operations whose &lt;em&gt;eigenstates align with the answer we're looking for&lt;/em&gt;. What starts spread across all directions gets pushed onto the right axis with each iteration. Quantum computation is &lt;strong&gt;eigenstate amplification&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Grover, step by step
&lt;/h2&gt;

&lt;p&gt;The cleanest place to see this in action is Grover's algorithm (1996), which finds a marked entry in an unsorted database of size &lt;code&gt;N&lt;/code&gt; in &lt;code&gt;O(√N)&lt;/code&gt; steps versus &lt;code&gt;O(N)&lt;/code&gt; classically. Let's walk through &lt;code&gt;N = 8&lt;/code&gt; — three qubits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup.&lt;/strong&gt; Three Hadamards on three qubits create the equal superposition: eight basis states, each with amplitude &lt;code&gt;1/√8&lt;/code&gt;. Measurement here is just an 8-sided die. Useless.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1 — The oracle.&lt;/strong&gt; A unitary circuit that recognizes the marked state (say, &lt;code&gt;|101⟩&lt;/code&gt;) and flips its sign. After the oracle, &lt;code&gt;|101⟩&lt;/code&gt; has amplitude &lt;code&gt;-1/√8&lt;/code&gt;, the other seven stay at &lt;code&gt;+1/√8&lt;/code&gt;. Measuring still gives equal probability — the squared magnitudes are the same. But the phase information is hiding in the sign.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2 — Diffusion.&lt;/strong&gt; Geometrically, this is a single operation: &lt;em&gt;reflect every amplitude about the mean of all amplitudes&lt;/em&gt;. The mean is about &lt;code&gt;0.24&lt;/code&gt; before reflection (seven positives dominate over one negative). After reflection, the marked state jumps to about &lt;code&gt;+2.12 / √8&lt;/code&gt; (amplified dramatically), the others drop to about &lt;code&gt;0.18 / √8&lt;/code&gt; (shrunk).&lt;/p&gt;

&lt;p&gt;Visually, in bar-chart form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Start       | | | | | | | |    (all equal positive)
After oracle| | | | | |▽| |    (one flipped negative)
After diff. ▁ ▁ ▁ ▁ ▁ █ ▁ ▁    (one big, others small)
Round 2     . . . . . ██ . .   (solution dominates ~95%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3 — Repeat.&lt;/strong&gt; One "Grover step" = oracle + diffusion. Optimal peak is around &lt;code&gt;⌊π/4 · √N⌋&lt;/code&gt; iterations. For &lt;code&gt;N = 8&lt;/code&gt;, that's 2 rounds. Then measure, and you get the marked state with &amp;gt; 94% probability.&lt;/p&gt;

&lt;p&gt;Classical average: &lt;code&gt;N/2 = 4&lt;/code&gt; oracle queries. Grover: 2. The win grows: for &lt;code&gt;N = 10⁹&lt;/code&gt;, classical needs ~500 million queries, Grover ~25,000.&lt;/p&gt;

&lt;p&gt;Quadratic, not exponential — but enough to change everything about unsorted search. And the mechanism is exactly what we said: &lt;strong&gt;amplitudes being shaped&lt;/strong&gt;, not possibilities being tried in parallel.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Your first quantum circuit: the Bell state in Qiskit
&lt;/h2&gt;

&lt;p&gt;The "Hello World" of quantum computing. Two qubits, two gates, perfect correlation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;qiskit&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;QuantumCircuit&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;qiskit_aer&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AerSimulator&lt;/span&gt;

&lt;span class="n"&gt;qc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;QuantumCircuit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;qc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;h&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;            &lt;span class="c1"&gt;# put q0 in superposition
&lt;/span&gt;&lt;span class="n"&gt;qc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;# CNOT: flip q1 iff q0 is 1
&lt;/span&gt;&lt;span class="n"&gt;qc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;measure&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;sim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AerSimulator&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;qc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_counts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# {'00': ~512, '11': ~512}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Hadamard puts qubit 0 into &lt;code&gt;(|0⟩ + |1⟩)/√2&lt;/code&gt;. The CNOT flips qubit 1 &lt;em&gt;iff&lt;/em&gt; qubit 0 is 1. Applied to the superposition, you end up with&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;|Φ⁺⟩ = (1/√2) (|00⟩ + |11⟩)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Measuring gives you &lt;code&gt;|00⟩&lt;/code&gt; 50% of the time, &lt;code&gt;|11⟩&lt;/code&gt; 50% — and &lt;strong&gt;never&lt;/strong&gt; &lt;code&gt;|01⟩&lt;/code&gt; or &lt;code&gt;|10⟩&lt;/code&gt;. That's entanglement. Measure one qubit, you instantly know the other — Einstein's "spooky action at a distance," experimentally confirmed (Aspect et al., Nobel 2022).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Running on real hardware&lt;/strong&gt; is two more lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;qiskit_ibm_runtime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;QiskitRuntimeService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SamplerV2&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;qiskit&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;transpile&lt;/span&gt;

&lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;QiskitRuntimeService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;channel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ibm_quantum_platform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;backend&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;least_busy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;simulator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;operational&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;qc_t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;transpile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;qc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;backend&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sampler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SamplerV2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;backend&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sampler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;qc_t&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;meas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_counts&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Real hardware gives you noisy results — maybe &lt;code&gt;490 / 508 / 20 "garbage"&lt;/code&gt;. That's decoherence and gate errors. The &lt;a href="https://quantum.cloud.ibm.com/" rel="noopener noreferrer"&gt;IBM Quantum Platform&lt;/a&gt; has a free tier with 10 minutes of QPU time per month on 100+ qubit systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Shor in one paragraph
&lt;/h2&gt;

&lt;p&gt;Shor's algorithm (1994) uses the same &lt;strong&gt;shaping principle&lt;/strong&gt; for a totally different problem. The key insight: factoring &lt;code&gt;N&lt;/code&gt; reduces to finding the &lt;em&gt;period&lt;/em&gt; of the sequence &lt;code&gt;a, a², a³, ... mod N&lt;/code&gt;. Period-finding classically is nearly as hard as factoring itself — but quantum mechanically, you put qubits into a superposition over all &lt;code&gt;x&lt;/code&gt;, compute &lt;code&gt;aˣ mod N&lt;/code&gt; into a second register, and apply the &lt;strong&gt;Quantum Fourier Transform&lt;/strong&gt; to the first register. The QFT converts a periodic amplitude distribution into a sharply peaked one — amplitudes that don't match the true period cancel destructively, matching ones amplify constructively. A single measurement returns, with high probability, a value from which the period can be reconstructed classically.&lt;/p&gt;

&lt;p&gt;Same move as Grover, different shaping tool.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. What does NOT get faster (honest bit, 2026)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No Quantum Excel.&lt;/strong&gt; No Quantum ChatGPT. No "10,000× AI training speedup."&lt;/li&gt;
&lt;li&gt;Quantum computers are &lt;strong&gt;special-purpose processors&lt;/strong&gt; for problems whose structure interference can exploit. Training neural networks on classical data? No speedup.&lt;/li&gt;
&lt;li&gt;Current hardware (Google Willow 105 qubits, IBM Heron, IonQ Tempo) is &lt;strong&gt;NISQ-era&lt;/strong&gt; — noisy, no full error correction. For Shor on 2048-bit RSA, we need a few million error-corrected qubits. Conservative estimate: 2030s.&lt;/li&gt;
&lt;li&gt;But: the &lt;strong&gt;post-quantum crypto migration&lt;/strong&gt; is already happening (NIST standards ML-KEM, ML-DSA since 2024), because &lt;em&gt;"harvest now, decrypt later"&lt;/em&gt; is a real threat.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  8. Further reading + code
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full blog post&lt;/strong&gt; (interactive Bloch sphere, circuit builder, measurement simulator, Qiskit playground, SVG of Grover amplitudes): &lt;a href="https://ki-mathias.de/en/quantum-computing.html" rel="noopener noreferrer"&gt;ki-mathias.de/en/quantum-computing.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;9-minute companion video&lt;/strong&gt; with Manim animations: &lt;a href="https://youtu.be/iMaqNfA0Gs0" rel="noopener noreferrer"&gt;youtu.be/iMaqNfA0Gs0&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qiskit notebooks on GitHub&lt;/strong&gt; (Bell state, Deutsch-Jozsa, Grover search, VQE for H₂ — all runnable on IBM free tier): &lt;a href="https://github.com/pmmathias/quantum-computing" rel="noopener noreferrer"&gt;github.com/pmmathias/quantum-computing&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Classical computation is a path. &lt;strong&gt;Quantum computation is a field.&lt;/strong&gt; The speedup comes not from running many computations in parallel, but from transforming the entire probability landscape in one shot — destructively cancelling wrong answers, amplifying the right one, before the measurement collapses everything to a single value.&lt;/p&gt;

&lt;p&gt;Three concrete takeaways for any developer:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;|ψ⟩ = α|0⟩ + β|1⟩&lt;/code&gt; — amplitudes, not probabilities. They can cancel.&lt;/li&gt;
&lt;li&gt;Every gate is a &lt;code&gt;2^n × 2^n&lt;/code&gt; unitary matrix acting on the whole state vector at once.&lt;/li&gt;
&lt;li&gt;Grover amplifies a marked state by reflecting amplitudes about their mean. Shor amplifies the right Fourier peak. Same principle, different shaping tools.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you've got Python and 15 minutes, fork the repo above, run &lt;code&gt;01_bell_state.ipynb&lt;/code&gt;, and see your first entangled pair of qubits. It's less intimidating than it sounds.&lt;/p&gt;

</description>
      <category>quantumcomputing</category>
      <category>qiskit</category>
      <category>python</category>
      <category>physics</category>
    </item>
    <item>
      <title>Deepfakes Explained — From Vectors to the Decoder Swap (Interactive)</title>
      <dc:creator>Mathias Leonhardt</dc:creator>
      <pubDate>Sun, 19 Apr 2026 10:17:19 +0000</pubDate>
      <link>https://dev.to/ki-mathias/deepfakes-explained-from-vectors-to-the-decoder-swap-interactive-5gek</link>
      <guid>https://dev.to/ki-mathias/deepfakes-explained-from-vectors-to-the-decoder-swap-interactive-5gek</guid>
      <description>&lt;p&gt;Most explanations of deepfakes start with "AI swaps faces." That's like explaining a car engine by saying "it goes vroom." I wanted to understand the &lt;em&gt;math&lt;/em&gt; — so I built a blog post that walks through it from scratch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The result:&lt;/strong&gt; &lt;a href="https://ki-mathias.de/en/deepfakes.html" rel="noopener noreferrer"&gt;13 chapters, 6 interactive demos, and a real autoencoder running in your browser&lt;/a&gt; — no TensorFlow.js, just 200KB of pure JavaScript matrix math.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/d6qwD3Pa1KE"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  The path from vectors to face-swapping
&lt;/h2&gt;

&lt;p&gt;The post follows the arc of a tech talk I gave at my company. Each concept builds on the previous:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Vectors &amp;amp; Matrices&lt;/strong&gt; — what does it mean to "transform" data?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Orthogonality&lt;/strong&gt; — when are features truly independent?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blind Source Separation&lt;/strong&gt; — separating mixed audio signals with linear algebra&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-dimensional data&lt;/strong&gt; — a 200-pixel image is a 200-dimensional vector&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PCA&lt;/strong&gt; — finding the axes that explain the most variance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linear vs. nonlinear interpolation&lt;/strong&gt; — why blending faces in pixel space gives you ghosts, not faces&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The kernel trick&lt;/strong&gt; — lifting data into higher dimensions where nonlinear becomes linear&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Neural networks&lt;/strong&gt; — the machine that does dimensionality reduction AND increase simultaneously&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Autoencoders&lt;/strong&gt; — compress → latent space → decompress&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latent space arithmetic&lt;/strong&gt; — smiling woman − neutral woman + neutral man = smiling man&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deepfakes&lt;/strong&gt; — swap the decoder. That's it. That's the trick.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The interactive parts
&lt;/h2&gt;

&lt;p&gt;Every key concept has a visualization you can play with:&lt;/p&gt;

&lt;h3&gt;
  
  
  PCA Playground
&lt;/h3&gt;

&lt;p&gt;Draw 2D points → watch the principal component axes update live. Eigenvalues, explained variance, the whole thing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Matrix Transform
&lt;/h3&gt;

&lt;p&gt;Drag sliders for rotation and scale → see how a matrix transforms a vector in real-time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kernel Trick
&lt;/h3&gt;

&lt;p&gt;Toggle between 2D (not linearly separable) and 3D (linearly separable!) with a button. The separating plane appears when you lift the data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rotating Tesseract
&lt;/h3&gt;

&lt;p&gt;A 4D hypercube projected to 2D, with three animated rotation planes. Includes a &lt;strong&gt;3D↔4D slider&lt;/strong&gt; that collapses the tesseract into a regular cube — showing what the 4th dimension "does."&lt;/p&gt;

&lt;h3&gt;
  
  
  MNIST Autoencoder
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Draw a digit → the autoencoder encodes and reconstructs it live.&lt;/strong&gt; The model is a tiny PyTorch-trained network (784→64→16→2→16→64→784) exported as float16 binary weights. Inference runs in pure JavaScript — no TensorFlow.js, no WebAssembly, just matrix multiplications and ReLU.&lt;/p&gt;

&lt;p&gt;Total model size: &lt;strong&gt;200KB&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Latent Space Explorer
&lt;/h3&gt;

&lt;p&gt;Move your mouse across a 2D latent space → the decoder generates digits in real-time. Each color cluster = one digit class. You can see how the autoencoder organized the digits spatially.&lt;/p&gt;

&lt;h2&gt;
  
  
  The autoencoder — no TensorFlow.js needed
&lt;/h2&gt;

&lt;p&gt;This was the most satisfying part to build. Instead of shipping a 3MB TensorFlow.js dependency, I:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Trained a tiny autoencoder in PyTorch (12 epochs on MNIST, ~30 seconds)&lt;/li&gt;
&lt;li&gt;Exported weights as &lt;strong&gt;float16 binary&lt;/strong&gt; (not JSON — 200KB instead of 2.2MB)&lt;/li&gt;
&lt;li&gt;Wrote inference in ~60 lines of vanilla JavaScript:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;W&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Float32Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;W&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;W&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;j&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;j&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;W&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="nx"&gt;j&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="nx"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;W&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;W&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;j&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="nx"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;out&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;h1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;relu&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;enc_0_weight&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;enc_0_bias&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;h2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;relu&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;h1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;enc_2_weight&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;enc_2_bias&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;h2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;enc_4_weight&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;enc_4_bias&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No framework, no build step, no WASM. Load the binary weights, do matrix math, draw pixels on a canvas.&lt;/p&gt;

&lt;h2&gt;
  
  
  The deepfake mechanism in one paragraph
&lt;/h2&gt;

&lt;p&gt;Train two autoencoders with a &lt;strong&gt;shared encoder&lt;/strong&gt; but &lt;strong&gt;different decoders&lt;/strong&gt;. The encoder learns a universal face representation (latent space). Decoder A reconstructs face A, Decoder B reconstructs face B. Now swap: feed person A's image through the shared encoder, then through decoder B. Result: A's expression and pose, B's appearance. A deepfake.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ki-mathias.de/en/deepfakes.html" rel="noopener noreferrer"&gt;Interactive blog post&lt;/a&gt;&lt;/strong&gt; — all 6 demos, no login, no tracking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://youtu.be/d6qwD3Pa1KE" rel="noopener noreferrer"&gt;YouTube explainer&lt;/a&gt;&lt;/strong&gt; — 6:34 min narrated version&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/pmmathias/blog" rel="noopener noreferrer"&gt;Source code&lt;/a&gt;&lt;/strong&gt; — static HTML, React components, self-hosted everything&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The entire blog is built with Claude Code as an AI pair programmer. The autoencoder training, Manim animations, ElevenLabs voiceover, ffmpeg video assembly, and YouTube upload were all orchestrated from the terminal.&lt;/p&gt;

&lt;p&gt;Questions welcome — especially if you spot a mistake in the math.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>javascript</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I built a bird flight simulator you control by flapping your arms (Three.js + MediaPipe)</title>
      <dc:creator>Mathias Leonhardt</dc:creator>
      <pubDate>Sun, 19 Apr 2026 06:51:23 +0000</pubDate>
      <link>https://dev.to/ki-mathias/i-built-a-bird-flight-simulator-you-control-by-flapping-your-arms-threejs-mediapipe-ld6</link>
      <guid>https://dev.to/ki-mathias/i-built-a-bird-flight-simulator-you-control-by-flapping-your-arms-threejs-mediapipe-ld6</guid>
      <description>&lt;p&gt;My 6-year-old son wanted a game where you "fly like a real bird." I'm a web developer, not a game dev — but I had &lt;a href="https://www.anthropic.com/product/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; as a pair programmer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The result:&lt;/strong&gt; &lt;a href="https://ki-mathias.de/en/flight-simulator.html" rel="noopener noreferrer"&gt;A 3D bird flight simulator&lt;/a&gt; running entirely in your browser. You control the bird by moving your arms in front of your webcam.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/Tk-UhsrEnY4"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Arms up&lt;/strong&gt; → glide. &lt;strong&gt;Arms down&lt;/strong&gt; → dive. &lt;strong&gt;Flap&lt;/strong&gt; → climb. No controller, no download.&lt;/p&gt;

&lt;p&gt;The webcam feeds into &lt;a href="https://ai.google.dev/edge/mediapipe/solutions/vision/pose_landmarker" rel="noopener noreferrer"&gt;MediaPipe's PoseLandmarker&lt;/a&gt;, which detects 33 body landmarks in real-time. I map shoulder-to-wrist vectors to flight controls:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hands above shoulders → lift increases&lt;/li&gt;
&lt;li&gt;Hands below shoulders → pitch down&lt;/li&gt;
&lt;li&gt;Rapid up-down motion → flap detected → altitude boost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Also works on mobile with gravity-based gesture controls (tilt to steer).&lt;/p&gt;

&lt;h2&gt;
  
  
  The tech stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Tech&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;3D engine&lt;/td&gt;
&lt;td&gt;Three.js + WebGL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pose detection&lt;/td&gt;
&lt;td&gt;MediaPipe PoseLandmarker&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Terrain&lt;/td&gt;
&lt;td&gt;Custom GLSL shader, 5 texture layers by altitude&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ocean&lt;/td&gt;
&lt;td&gt;FFT-based (Tessendorf method) — same math JPEG uses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trees&lt;/td&gt;
&lt;td&gt;InstancedMesh × 100,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Culling&lt;/td&gt;
&lt;td&gt;Octree-based frustum culling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Physics&lt;/td&gt;
&lt;td&gt;Custom aerodynamics (lift, drag, stall)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The interesting parts
&lt;/h2&gt;

&lt;h3&gt;
  
  
  FFT ocean waves
&lt;/h3&gt;

&lt;p&gt;The ocean uses the same Fourier transform that powers JPEG compression — but in reverse. A Phillips spectrum describes wave energy per frequency, and an inverse FFT turns that into a height map that deforms the water mesh. 512×512 spectral points = 262,144 overlapping wave frequencies per frame.&lt;/p&gt;

&lt;h3&gt;
  
  
  Octree frustum culling
&lt;/h3&gt;

&lt;p&gt;With 100k trees, you can't render everything. An octree partitions 3D space recursively, and frustum culling discards entire branches that fall outside the camera's view. Result: ~3,000–5,000 trees rendered per frame instead of 100k.&lt;/p&gt;

&lt;h3&gt;
  
  
  The AI pair programming workflow
&lt;/h3&gt;

&lt;p&gt;Claude Code handled the boilerplate, shader code, and physics implementation. I focused on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product decisions ("the bird needs to turn faster")&lt;/li&gt;
&lt;li&gt;Visual taste (sunset colors, water transparency)&lt;/li&gt;
&lt;li&gt;Architecture ("use an octree, not a flat array")&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The hard part wasn't the code — it was knowing &lt;em&gt;what&lt;/em&gt; to build. My 6-year-old was surprisingly good at that. His requirements were non-negotiable: "there need to be sharks."&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ki-mathias.de/en/flight-simulator.html" rel="noopener noreferrer"&gt;Play in your browser&lt;/a&gt;&lt;/strong&gt; (needs webcam or use keyboard: WASD + Space)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/pmmathias/VogelSimulator" rel="noopener noreferrer"&gt;Source code&lt;/a&gt;&lt;/strong&gt; (MIT license)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ki-mathias.de/en/flight-simulator.html" rel="noopener noreferrer"&gt;Blog post&lt;/a&gt;&lt;/strong&gt; explaining the Fourier math in detail&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Built with Three.js, MediaPipe, and a very opinionated 6-year-old. Questions welcome.&lt;/p&gt;

</description>
      <category>threejs</category>
      <category>webgl</category>
      <category>mediapipe</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
