<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Maulik Sompura</title>
    <description>The latest articles on DEV Community by Maulik Sompura (@maulik_sompura_22).</description>
    <link>https://dev.to/maulik_sompura_22</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3130157%2Fdc83bf1e-6346-4192-8ed9-d1885b047012.jpeg</url>
      <title>DEV Community: Maulik Sompura</title>
      <link>https://dev.to/maulik_sompura_22</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/maulik_sompura_22"/>
    <language>en</language>
    <item>
      <title>What If GPT Didn’t “Learn”, It Just Found a Winning Lottery Ticket?</title>
      <dc:creator>Maulik Sompura</dc:creator>
      <pubDate>Sat, 21 Feb 2026 16:10:36 +0000</pubDate>
      <link>https://dev.to/maulik_sompura_22/what-if-gpt-didnt-learn-it-just-found-a-winning-lottery-ticket-4bcl</link>
      <guid>https://dev.to/maulik_sompura_22/what-if-gpt-didnt-learn-it-just-found-a-winning-lottery-ticket-4bcl</guid>
      <description>&lt;p&gt;I used to say it confidently:&lt;/p&gt;

&lt;p&gt;“The model learned this.”&lt;/p&gt;

&lt;p&gt;It felt obvious. We initialize weights. We run gradient descent. We minimize loss. The network learns.&lt;/p&gt;

&lt;p&gt;End of story.&lt;/p&gt;

&lt;p&gt;But then I came across a research paper that genuinely disturbed that simple narrative.&lt;/p&gt;

&lt;p&gt;While reading late one night, I stumbled upon a paper by Jonathan Frankle and Michael Carbin. At first, it looked like just another pruning paper. But the core claim made me stop and reread it twice.&lt;/p&gt;

&lt;p&gt;It suggested something radical:&lt;/p&gt;

&lt;p&gt;What if neural networks don’t build intelligence from scratch during training?&lt;/p&gt;

&lt;p&gt;What if, hidden inside a randomly initialized network, there already exists a smaller subnetwork that is capable of solving the task and training merely discovers it?&lt;/p&gt;

&lt;p&gt;That idea is known as the Lottery Ticket Hypothesis.&lt;/p&gt;

&lt;p&gt;And if it’s even partially true, then gradient descent isn’t constructing intelligence.&lt;/p&gt;

&lt;p&gt;It’s searching for a winning ticket inside structured randomness.&lt;/p&gt;

&lt;p&gt;Before we go further, let’s unpack what that actually means.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Idea That Quietly Shook Deep Learning
&lt;/h2&gt;

&lt;p&gt;In 2018, &lt;strong&gt;Jonathan Frankle&lt;/strong&gt; and &lt;strong&gt;Michael Carbin&lt;/strong&gt; proposed something almost heretical:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Inside every randomly initialized neural network, there exists a smaller subnetwork that is already capable of learning the task.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;They called it the &lt;strong&gt;Lottery Ticket Hypothesis (LTH)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The claim?&lt;/p&gt;

&lt;p&gt;A huge neural network contains a sparse “winning ticket” —&lt;br&gt;
a subnetwork that, when trained alone (starting from the original initialization), matches the full network’s performance.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why “Lottery Ticket”?
&lt;/h2&gt;

&lt;p&gt;Imagine you randomly initialize a huge neural network.&lt;/p&gt;

&lt;p&gt;Most weights are useless.&lt;/p&gt;

&lt;p&gt;But hidden inside that random initialization is a rare configuration of connections that is already aligned with the task.&lt;/p&gt;

&lt;p&gt;Training doesn’t create intelligence from scratch.&lt;br&gt;
It discovers and amplifies that lucky subnetwork.&lt;/p&gt;

&lt;p&gt;Like lottery tickets: most are worthless, but a few are already winners.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Experiment (Simplified)
&lt;/h2&gt;

&lt;p&gt;Researchers performed a surprisingly simple experiment:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Train a full network to convergence.&lt;/li&gt;
&lt;li&gt;Prune (remove) the smallest-magnitude weights.&lt;/li&gt;
&lt;li&gt;Reset the remaining weights back to their original random initialization.&lt;/li&gt;
&lt;li&gt;Retrain only the pruned subnetwork.&lt;/li&gt;
&lt;/ol&gt;
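&lt;p&gt;The four steps above are easy to sketch on a toy problem. This is not the paper's setup (they used real vision networks); it's a minimal numpy illustration of train-prune-reset-retrain on linear regression, with every name and number invented for the demo:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: y depends on only 4 of the 20 input features.
X = rng.normal(size=(200, 20))
true_w = np.zeros(20)
true_w[:4] = [2.0, -1.5, 1.0, 0.5]
y = X @ true_w + 0.01 * rng.normal(size=200)

def train(w_init, mask, steps=500, lr=0.01):
    """Gradient descent on MSE, updating only the unmasked weights."""
    w = w_init * mask
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad * mask  # pruned weights stay frozen at zero
    return w

def mse(w):
    return float(np.mean((X @ w - y) ** 2))

w0 = rng.normal(scale=0.1, size=20)          # random initialization
w_full = train(w0, np.ones(20))              # 1. train the full model
keep = np.abs(w_full) >= np.quantile(np.abs(w_full), 0.8)
mask = keep.astype(float)                    # 2. prune the 80% smallest-magnitude weights
w_ticket = train(w0, mask)                   # 3.+4. reset survivors to w0, retrain them alone

print(f"full MSE: {mse(w_full):.4f}, ticket MSE (20% of weights): {mse(w_ticket):.4f}")
```

&lt;p&gt;On a toy like this the sparse subnetwork typically matches the full model, because the useful weights are identifiable by magnitude after training; the real experiments show the same effect on far messier networks.&lt;/p&gt;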

&lt;p&gt;Shockingly:&lt;/p&gt;

&lt;p&gt;The pruned network trained just as well.&lt;/p&gt;

&lt;p&gt;Sometimes even faster.&lt;/p&gt;

&lt;p&gt;This means the full network wasn’t necessary.&lt;br&gt;
The winning subnetwork was already present at initialization.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why This Is Mind-Blowing
&lt;/h2&gt;

&lt;p&gt;It challenges the intuitive story:&lt;/p&gt;

&lt;p&gt;❌ “Training builds intelligence.”&lt;/p&gt;

&lt;p&gt;Instead, it suggests:&lt;/p&gt;

&lt;p&gt;✅ “Initialization already contains many potential intelligent subnetworks.”&lt;/p&gt;

&lt;p&gt;Training becomes less about constructing intelligence and more about &lt;strong&gt;searching for one good structure hidden inside randomness.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That shifts deep learning from:&lt;/p&gt;

&lt;p&gt;Optimization theory&lt;br&gt;
to&lt;br&gt;
Combinatorial search through structured randomness.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Mathematical Angle
&lt;/h2&gt;

&lt;p&gt;Let a neural network be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;f(x; θ)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;θ ∈ ℝᵈ&lt;/li&gt;
&lt;li&gt;d could be millions or billions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LTH says:&lt;/p&gt;

&lt;p&gt;There exists a binary mask m ∈ {0,1}ᵈ such that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;f(x; m ⊙ θ₀)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;trained alone performs as well as the full model.&lt;/p&gt;

&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;θ₀ = original random initialization&lt;/li&gt;
&lt;li&gt;⊙ = elementwise multiplication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the “winning ticket” is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;m ⊙ θ₀
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
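&lt;p&gt;In array terms the masking is just elementwise multiplication. A tiny numpy illustration (mask and weights invented):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(42)

theta0 = rng.normal(size=8)                    # θ₀: the original random initialization
m = np.array([1, 0, 1, 0, 0, 1, 0, 1])         # binary mask m selecting a subnetwork

ticket = m * theta0                            # m ⊙ θ₀: the candidate winning ticket
print(ticket)                                  # pruned positions are exactly zero
```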



&lt;p&gt;The crucial detail:&lt;/p&gt;

&lt;p&gt;If you randomly reinitialize after pruning, performance drops.&lt;/p&gt;

&lt;p&gt;That means the specific random draw of θ₀ contained structure.&lt;/p&gt;

&lt;p&gt;Randomness wasn’t neutral noise.&lt;br&gt;
It already encoded useful geometry.&lt;/p&gt;


&lt;h2&gt;
  
  
  Even Deeper Implications
&lt;/h2&gt;
&lt;h2&gt;
  
  
  Overparameterization Might Be a Search Strategy
&lt;/h2&gt;

&lt;p&gt;Why are large language models enormous?&lt;/p&gt;

&lt;p&gt;Maybe not because they need every parameter.&lt;/p&gt;

&lt;p&gt;Maybe because a larger network increases the probability that a good subnetwork exists.&lt;/p&gt;

&lt;p&gt;If each subnetwork has a tiny probability p of being “trainable,” and you have N possible subnetworks, then the probability at least one works is roughly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1 − (1 − p)^N
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As N grows, this rapidly approaches 1.&lt;/p&gt;
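&lt;p&gt;Plugging in toy numbers shows how fast this saturates (the values of p and N here are invented for illustration):&lt;/p&gt;

```python
def p_at_least_one(p, n):
    """Probability that at least one of n independent subnetworks is a winning ticket."""
    return 1 - (1 - p) ** n

# Even a tiny per-subnetwork probability becomes near-certainty at scale.
for n in (10, 1_000, 100_000):
    print(f"N = {n:>7}: {p_at_least_one(1e-4, n):.5f}")
```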

&lt;p&gt;Bigger models might simply mean more lottery tickets.&lt;/p&gt;




&lt;h2&gt;
  
  
  Random Initialization Is Not “Just Noise”
&lt;/h2&gt;

&lt;p&gt;We usually think:&lt;/p&gt;

&lt;p&gt;Random weights = meaningless chaos.&lt;/p&gt;

&lt;p&gt;But in high dimensions, strange things happen:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Concentration of measure&lt;/li&gt;
&lt;li&gt;Emergent correlations&lt;/li&gt;
&lt;li&gt;Structured spectral properties of random matrices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Random high-dimensional systems often contain surprising regularities.&lt;/p&gt;

&lt;p&gt;LTH suggests intelligence might emerge from those hidden regularities.&lt;/p&gt;

&lt;p&gt;This connects to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Random matrix theory&lt;/li&gt;
&lt;li&gt;High-dimensional probability&lt;/li&gt;
&lt;li&gt;Sparse approximation theory&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Compression and Pruning Make More Sense
&lt;/h2&gt;

&lt;p&gt;LTH helps explain why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pruned networks often retain performance&lt;/li&gt;
&lt;li&gt;Sparse models can match dense ones&lt;/li&gt;
&lt;li&gt;Quantized models still work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Redundancy may not be inefficiency.&lt;/p&gt;

&lt;p&gt;It may be a probabilistic discovery mechanism.&lt;/p&gt;




&lt;h2&gt;
  
  
  LTH and Large Language Models
&lt;/h2&gt;

&lt;p&gt;Now the dangerous thought.&lt;/p&gt;

&lt;p&gt;Modern systems like GPT have billions of parameters.&lt;/p&gt;

&lt;p&gt;What if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only a fraction are truly necessary?&lt;/li&gt;
&lt;li&gt;Reasoning lives in sparse subnetworks?&lt;/li&gt;
&lt;li&gt;Scaling works because it increases the chance of containing rare reasoning-capable configurations?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;p&gt;“Bigger models are smarter.”&lt;/p&gt;

&lt;p&gt;It might be:&lt;/p&gt;

&lt;p&gt;“Bigger models are more likely to contain a rare intelligent structure.”&lt;/p&gt;

&lt;p&gt;That’s a radically different interpretation of scaling laws.&lt;/p&gt;




&lt;h2&gt;
  
  
  But It Gets Stranger
&lt;/h2&gt;

&lt;p&gt;Later research found that, for very large networks, resetting to &lt;strong&gt;early training weights&lt;/strong&gt; works better than resetting to full initialization.&lt;/p&gt;

&lt;p&gt;This led to ideas like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Early-bird tickets&lt;/li&gt;
&lt;li&gt;Mode connectivity&lt;/li&gt;
&lt;li&gt;Linear low-loss paths between solutions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The loss landscape appears smoother and more connected than classical intuition suggests.&lt;/p&gt;

&lt;p&gt;The geometry of deep learning is far stranger than we once believed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Open Questions
&lt;/h2&gt;

&lt;p&gt;We still don’t fully understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why winning tickets exist&lt;/li&gt;
&lt;li&gt;How large they must be&lt;/li&gt;
&lt;li&gt;Whether the hypothesis scales cleanly to transformers&lt;/li&gt;
&lt;li&gt;Whether reasoning is localized or distributed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The theory is incomplete.&lt;/p&gt;

&lt;p&gt;The implications are enormous.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Radical Interpretation
&lt;/h2&gt;

&lt;p&gt;Some researchers speculate:&lt;/p&gt;

&lt;p&gt;Deep networks behave like massive random feature ensembles.&lt;br&gt;
Training selects coherent sparse structures from that ensemble.&lt;/p&gt;

&lt;p&gt;That comes dangerously close to saying:&lt;/p&gt;

&lt;p&gt;Intelligence emerges from structured randomness.&lt;/p&gt;

&lt;p&gt;Not from careful deterministic construction.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Truly Mind-Bending Part
&lt;/h2&gt;

&lt;p&gt;If the Lottery Ticket Hypothesis holds at scale…&lt;/p&gt;

&lt;p&gt;Then scaling laws might reflect:&lt;/p&gt;

&lt;p&gt;Extreme value statistics in high dimensions.&lt;/p&gt;

&lt;p&gt;GPT’s performance curves might be explainable through probability theory, not just optimization.&lt;/p&gt;

&lt;p&gt;That would connect large language models to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Statistical physics&lt;/li&gt;
&lt;li&gt;Spin glass theory&lt;/li&gt;
&lt;li&gt;Phase transitions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And that’s when deep learning stops looking like engineering…&lt;/p&gt;

&lt;p&gt;…and starts looking like high-dimensional statistical mechanics.&lt;/p&gt;

&lt;p&gt;So next time someone says:&lt;/p&gt;

&lt;p&gt;“GPT learned that.”&lt;/p&gt;

&lt;p&gt;You might ask:&lt;/p&gt;

&lt;p&gt;Did it learn?&lt;/p&gt;

&lt;p&gt;Or did it find a winning ticket hidden inside randomness?&lt;/p&gt;

&lt;p&gt;You can find the original research paper here:&lt;br&gt;
&lt;a href="https://arxiv.org/abs/1803.03635" rel="noopener noreferrer"&gt;https://arxiv.org/abs/1803.03635&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>nlp</category>
      <category>machinelearning</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Stop Manual Segmentation: Meet NotumAi - An Open-Source AI Annotation Tool</title>
      <dc:creator>Maulik Sompura</dc:creator>
      <pubDate>Sat, 14 Feb 2026 15:41:48 +0000</pubDate>
      <link>https://dev.to/maulik_sompura_22/introducing-notumai-the-open-source-ai-powered-annotation-tool-1o9</link>
      <guid>https://dev.to/maulik_sompura_22/introducing-notumai-the-open-source-ai-powered-annotation-tool-1o9</guid>
      <description>&lt;p&gt;If you've ever built a computer vision model, you know this truth:&lt;/p&gt;

&lt;p&gt;Data annotation is the slowest, most painful part of the pipeline.&lt;/p&gt;

&lt;p&gt;You have thousands of images.&lt;br&gt;
You need high-quality segmentation masks.&lt;br&gt;
And your options usually look like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use a clunky, outdated desktop tool.&lt;/li&gt;
&lt;li&gt;Upload sensitive data to a cloud service and pay monthly.&lt;/li&gt;
&lt;li&gt;Spend hours manually outlining objects.&lt;/li&gt;
&lt;li&gt;Or build your own annotation tool from scratch.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There had to be a better way.&lt;/p&gt;

&lt;p&gt;So we built one.&lt;/p&gt;

&lt;p&gt;Meet NotumAi.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jzghd77swer37nfec2m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jzghd77swer37nfec2m.png" alt="NotumAi Main Page" width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;💡 Why We Built NotumAi&lt;/p&gt;

&lt;p&gt;Modern segmentation models like Segment Anything Model 2 (SAM 2) from Meta can generate high-quality masks instantly.&lt;/p&gt;

&lt;p&gt;But here’s the problem:&lt;/p&gt;

&lt;p&gt;There isn’t a clean, developer-friendly, fully local tool that integrates these models into a smooth dataset creation workflow.&lt;/p&gt;

&lt;p&gt;Most solutions are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloud-based&lt;/li&gt;
&lt;li&gt;Expensive&lt;/li&gt;
&lt;li&gt;Closed source&lt;/li&gt;
&lt;li&gt;Not optimized for custom pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We wanted a tool that is:&lt;/p&gt;

&lt;p&gt;🔒 100% Local — Your data never leaves your machine&lt;/p&gt;

&lt;p&gt;⚡ GPU Accelerated — Real-time AI-assisted segmentation&lt;/p&gt;

&lt;p&gt;🎨 Modern &amp;amp; Clean — Built for long annotation sessions&lt;/p&gt;

&lt;p&gt;🌍 Open Source — Built with and for the community&lt;/p&gt;

&lt;p&gt;That’s how NotumAi was born.&lt;/p&gt;

&lt;p&gt;🛠️ What is NotumAi?&lt;br&gt;
NotumAi is a professional-grade image annotation tool specifically designed for creating computer vision datasets. It combines a robust Python backend (handling the heavy lifting with PyTorch and SAM 2) with a modern Electron frontend (ensuring a responsive, beautiful interface).&lt;/p&gt;

&lt;p&gt;Key Features&lt;br&gt;
⚡ AI-Assisted Segmentation: Just click on an object, and NotumAi instantly generates a precise polygon mask using SAM 2.&lt;br&gt;
🎨 Professional UI: A glassmorphism-inspired design with a focus on usability and aesthetics.&lt;br&gt;
📂 Project Management: Organize your datasets, persist your progress, and manage multiple classes effortlessly.&lt;br&gt;
💾 Flexible Export: Export your work in standard formats like COCO, YOLO, and Pascal VOC, ready for training.&lt;br&gt;
🔒 Local &amp;amp; Secure: Your data stays with you. Perfect for sensitive or proprietary datasets.&lt;/p&gt;

&lt;p&gt;🏗️ Under the Hood&lt;br&gt;
For the developers out there, here's how we built it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Frontend: Electron, HTML5, Vanilla JS, and CSS (no heavy frameworks, just pure performance).&lt;/li&gt;
&lt;li&gt;Backend: Python with FastAPI and Uvicorn.&lt;/li&gt;
&lt;li&gt;AI Engine: PyTorch running Meta's SAM 2.1 model.&lt;/li&gt;
&lt;li&gt;Communication: Seamless HTTP between the frontend client and the local inference server.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🌟 We Need You!&lt;br&gt;
NotumAi is an Open Source project, and we are just getting started. We have a solid foundation, but to make this the ultimate annotation tool, we need the community's help.&lt;/p&gt;

&lt;p&gt;Whether you are a:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Frontend Wizard: Help us polish the UI, add new interaction modes, or optimize the canvas rendering.&lt;/li&gt;
&lt;li&gt;AI Engineer: Help us optimize the inference pipeline or integrate new models.&lt;/li&gt;
&lt;li&gt;Pythonista: Help us refine the backend logic, improve file handling, or add new export formats.
...your contributions are welcome!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🔗 &lt;a href="https://maulik225.github.io/NotumAi/" rel="noopener noreferrer"&gt;Get Involved&lt;/a&gt;&lt;br&gt;
Check out the repository, give us a star, and let's build the best open-source annotation tool together.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/maulik225/NotumAi" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy Annotating! 🖊️✨&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>javascript</category>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>Why Studying the Turing Machine Changed How I See AI And Why Every New AI Engineer Should Revisit It</title>
      <dc:creator>Maulik Sompura</dc:creator>
      <pubDate>Thu, 27 Nov 2025 18:29:32 +0000</pubDate>
      <link>https://dev.to/maulik_sompura_22/why-studying-the-turing-machine-changed-how-i-see-ai-and-why-every-new-ai-engineer-should-revisit-it-43hp</link>
      <guid>https://dev.to/maulik_sompura_22/why-studying-the-turing-machine-changed-how-i-see-ai-and-why-every-new-ai-engineer-should-revisit-it-43hp</guid>
      <description>&lt;h2&gt;
  
  
  How a “boring theory subject” ended up shaping my entire AI career
&lt;/h2&gt;

&lt;p&gt;When I was doing my Master’s in computer engineering at the University of Padova, there was one subject everyone whispered about:&lt;/p&gt;

&lt;p&gt;Automata Theory &amp;amp; Computation.&lt;/p&gt;

&lt;p&gt;Not because it was exciting…&lt;br&gt;
…but because most students wanted to survive it.&lt;/p&gt;

&lt;p&gt;I remember sitting in the lecture hall asking myself:&lt;/p&gt;

&lt;p&gt;“Why are we learning about an imaginary tape machine in 2024?&lt;br&gt;
I want to build AI systems, not decode puzzles from 1936.”&lt;/p&gt;

&lt;p&gt;What I didn’t know was that this single subject—the one we all underestimated—would quietly reshape the way I think about AI, computation, and even my day-to-day engineering work.&lt;/p&gt;

&lt;p&gt;Let me tell you how.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Day I Realized a Neural Network Is Not Magic
&lt;/h2&gt;

&lt;p&gt;Months later, when I was working on high-speed machine vision projects (with 1ms deadlines), something struck me:&lt;/p&gt;

&lt;p&gt;Everything I was building, every pipeline, every RL loop, every segmentation model could be reduced to:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;State -&amp;gt; Transition -&amp;gt; New State&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Exactly like the thing I thought was useless in university.&lt;/p&gt;

&lt;p&gt;Suddenly, the Turing Machine wasn’t a historical artifact.&lt;br&gt;
It was a mirror showing me the essence of modern AI.&lt;/p&gt;

&lt;p&gt;A lot of students think the Turing Machine is just a boring theoretical device.&lt;br&gt;
But in reality, it answers two of the most important questions in modern AI:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What can be computed?&lt;/li&gt;
&lt;li&gt;What cannot be computed by ANY machine — even GPT-50?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No matter how big or advanced a neural network becomes, it still cannot solve anything beyond what a Turing Machine can solve.&lt;br&gt;
This means AI is still bound by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;undecidable problems&lt;/li&gt;
&lt;li&gt;halting limitations&lt;/li&gt;
&lt;li&gt;computational complexity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Modern AI might look magical, but it does not break the laws established in 1936.&lt;/p&gt;

&lt;p&gt;The Classic Turing Machine We All Ignored.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx2dzebedu2lvw8l1kch4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx2dzebedu2lvw8l1kch4.png" alt="Turing Machine" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Back then, it was just this.&lt;br&gt;
A tape.&lt;br&gt;
Some states.&lt;br&gt;
A transition function.&lt;/p&gt;
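&lt;p&gt;That whole picture, a tape, some states, a transition function, fits in a few lines of Python. A minimal sketch (the bit-flipping machine at the end is a toy chosen purely for illustration):&lt;/p&gt;

```python
def run_turing_machine(tape, transitions, state="start", head=0, max_steps=1000):
    """Minimal one-tape Turing machine.

    transitions maps (state, symbol) -> (new_state, new_symbol, move),
    where move is -1 (left), +1 (right), or 0. Stops in state 'halt'.
    """
    tape = list(tape)
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape[head] if 0 <= head < len(tape) else "_"  # "_" = blank
        state, new_symbol, move = transitions[(state, symbol)]
        if 0 <= head < len(tape):
            tape[head] = new_symbol
        head += move
    return "".join(tape)

# A toy machine that inverts a binary string, then halts at the blank.
flip = {
    ("start", "0"): ("start", "1", +1),
    ("start", "1"): ("start", "0", +1),
    ("start", "_"): ("halt", "_", 0),
}
print(run_turing_machine("0110_", flip))  # -> 1001_
```

&lt;p&gt;State, transition, new state; nothing else. Everything the rest of this post says about AI sits on top of exactly this loop.&lt;/p&gt;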

&lt;p&gt;But what I didn’t understand was:&lt;/p&gt;

&lt;p&gt;This is literally the foundation of all computation, including today’s AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Modern AI Model vs. Turing Machine
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1n2h5v7uj2i6jbb1yme.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1n2h5v7uj2i6jbb1yme.png" alt="AI model vs TM" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At a conceptual level:&lt;/p&gt;

&lt;p&gt;A Transformer is a sophisticated state machine&lt;br&gt;
built on a theory created in the 1930s.&lt;/p&gt;

&lt;p&gt;Mind = blown.&lt;br&gt;
Mine definitely was.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hidden Superpower of Automata Theory
&lt;/h2&gt;

&lt;p&gt;Once you get it, something changes:&lt;/p&gt;

&lt;p&gt;You stop thinking like a coder.&lt;br&gt;
You start thinking like a computational architect.&lt;/p&gt;

&lt;p&gt;Automata teaches you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how to break problems down into minimal logic&lt;/li&gt;
&lt;li&gt;how to reason in sequences (vital for NLP + RL)&lt;/li&gt;
&lt;li&gt;why some problems are inherently slow&lt;/li&gt;
&lt;li&gt;why some optimizations are impossible&lt;/li&gt;
&lt;li&gt;how systems transition, not just compute&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most importantly:&lt;/p&gt;

&lt;p&gt;It gives you a mental model so strong&lt;br&gt;
that AI becomes less of a black box and more of a predictable system.&lt;/p&gt;

&lt;p&gt;Every pipeline is a giant state machine.&lt;/p&gt;

&lt;p&gt;Even the most advanced RLHF systems.&lt;br&gt;
Even computer vision.&lt;br&gt;
Even GPT.&lt;/p&gt;

&lt;p&gt;This is what separates AI “users” from AI “engineers.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Message to New AI Learners
&lt;/h2&gt;

&lt;p&gt;If you’re entering AI today…&lt;/p&gt;

&lt;p&gt;Don’t skip the fundamentals.&lt;br&gt;
Don’t choose short-term speed over long-term mastery.&lt;br&gt;
And don’t underestimate the Turing Machine the way I did.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Because when the hype fades (and trust me, it will),&lt;br&gt;
the engineers who understand theory&lt;br&gt;
are the ones who keep building the future.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>deeplearning</category>
      <category>learning</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>The Hidden Role of Probability in Large Language Models</title>
      <dc:creator>Maulik Sompura</dc:creator>
      <pubDate>Tue, 06 May 2025 16:01:25 +0000</pubDate>
      <link>https://dev.to/maulik_sompura_22/the-hidden-role-of-probability-in-large-language-models-5f9g</link>
      <guid>https://dev.to/maulik_sompura_22/the-hidden-role-of-probability-in-large-language-models-5f9g</guid>
      <description>&lt;p&gt;Have you ever wondered how this LLM works? this question sparks curiosity in me and lead me into deep research. Allow me to share some of my thoughts on this latest trend. &lt;/p&gt;

&lt;p&gt;Most people believe large language models like GPT-4 and Claude "understand" language and give the best answer through genuine intelligence.&lt;/p&gt;

&lt;p&gt;But the truth is that every word these models generate is a mathematical gamble: a calculated probability distribution over thousands of possible next tokens.&lt;/p&gt;

&lt;p&gt;In this post we will go beyond the usual "transformers and attention" explanation and look at how probability is the real hero behind everything an LLM does, from creativity to hallucinations.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is really happening inside an LLM?
&lt;/h2&gt;

&lt;p&gt;When you prompt a model with a sentence, it doesn't just look up an answer. Instead, it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Converts the prompt into tokens (like "Hello", ",", "how", "are", "you").&lt;/li&gt;
&lt;li&gt;Passes those tokens through layers of a &lt;a href="https://en.wikipedia.org/wiki/Neural_network_(machine_learning)" rel="noopener noreferrer"&gt;neural network&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Produces a list of logits: raw scores for each possible next token.&lt;/li&gt;
&lt;li&gt;Applies a &lt;strong&gt;&lt;a href="https://en.wikipedia.org/wiki/Softmax_function" rel="noopener noreferrer"&gt;softmax function&lt;/a&gt;&lt;/strong&gt; to convert those scores into a &lt;strong&gt;&lt;a href="https://en.wikipedia.org/wiki/Probability_distribution" rel="noopener noreferrer"&gt;probability distribution&lt;/a&gt;&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Samples or selects the next token based on that distribution.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This process repeats one token at a time.&lt;/p&gt;
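&lt;p&gt;Steps 3 to 5 fit in a few lines. A toy sketch with made-up logits over five candidate tokens:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

tokens = ["mat", "floor", "roof", "table", "car"]
logits = np.array([4.0, 2.7, 1.9, 1.2, -0.2])  # raw scores from the network (invented)

# Softmax turns raw logits into a probability distribution.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Sampling picks the next token in proportion to its probability.
next_token = rng.choice(tokens, p=probs)
print(dict(zip(tokens, probs.round(3))), next_token)
```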

&lt;h2&gt;
  
  
  How LLMs Choose Words: It's All Probabilities
&lt;/h2&gt;

&lt;p&gt;Here's a simple example.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"The cat sat on the _____"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It might internally assign probabilities like this:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Token&lt;/th&gt;&lt;th&gt;Probability&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;mat&lt;/td&gt;&lt;td&gt;0.64&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;floor&lt;/td&gt;&lt;td&gt;0.17&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;roof&lt;/td&gt;&lt;td&gt;0.08&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;table&lt;/td&gt;&lt;td&gt;0.04&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;car&lt;/td&gt;&lt;td&gt;0.01&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxp0jvxsdfyupvainrpu0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxp0jvxsdfyupvainrpu0.png" alt="Probability distribution" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It might choose "mat" 64% of the time, but with temperature adjustment or top-k sampling it could choose "floor" or even "roof" to keep things creative.&lt;/p&gt;
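&lt;p&gt;Temperature and top-k are small tweaks to that same softmax step. A minimal sketch (the logits are invented, and &lt;code&gt;sample_probs&lt;/code&gt; is an illustrative helper, not a library function):&lt;/p&gt;

```python
import numpy as np

def sample_probs(logits, temperature=1.0, top_k=None):
    """Return sampling probabilities after temperature scaling and top-k filtering."""
    logits = np.asarray(logits, dtype=float) / temperature  # low T sharpens, high T flattens
    if top_k is not None:
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits >= cutoff, logits, -np.inf)  # drop everything outside top-k
    p = np.exp(logits - logits.max())  # exp(-inf) = 0, so dropped tokens get probability 0
    return p / p.sum()

logits = [4.0, 2.7, 1.9, 1.2, -0.2]
print(sample_probs(logits, temperature=0.5).round(3))  # sharper: "mat" dominates
print(sample_probs(logits, temperature=2.0).round(3))  # flatter: more creative
print(sample_probs(logits, top_k=2).round(3))          # only the two best tokens survive
```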

&lt;p&gt;&lt;strong&gt;Why this matters?&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;LLMs don't know facts; they predict what's most probable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This is why they &lt;strong&gt;hallucinate&lt;/strong&gt;: sometimes the most likely token just sounds right, even if it is wrong.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tools like temperature, top-k, and top-p control this randomness.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Even prompt engineering is really just guiding the probability space.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Takeaway&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The next time ChatGPT feels smart, remember: it is not reasoning like a human. It's rolling a weighted die, one token at a time, and that die is shaped by your input, the training data, and probability.&lt;/p&gt;

</description>
      <category>learning</category>
      <category>programming</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
