<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Govind kumar</title>
    <description>The latest articles on DEV Community by Govind kumar (@expert-jha).</description>
    <link>https://dev.to/expert-jha</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3812357%2F18296814-8eb8-4c8b-9686-ddf8961affb1.png</url>
      <title>DEV Community: Govind kumar</title>
      <link>https://dev.to/expert-jha</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/expert-jha"/>
    <language>en</language>
    <item>
      <title>From Words to Intelligence: How LLMs Actually Work (Without the Math Headache)</title>
      <dc:creator>Govind kumar</dc:creator>
      <pubDate>Sun, 08 Mar 2026 03:15:14 +0000</pubDate>
      <link>https://dev.to/expert-jha/from-words-to-intelligence-how-llms-actually-work-without-the-math-headache-156</link>
      <guid>https://dev.to/expert-jha/from-words-to-intelligence-how-llms-actually-work-without-the-math-headache-156</guid>
      <description>&lt;p&gt;Large Language Models often feel magical. You type a sentence, and suddenly an AI writes code, explains physics, or drafts emails.&lt;/p&gt;

&lt;p&gt;But under the hood, the system is doing something surprisingly structured.&lt;/p&gt;

&lt;p&gt;Let’s walk through the &lt;strong&gt;core building blocks of modern AI models&lt;/strong&gt; in a simple and fun way.&lt;/p&gt;

&lt;h1&gt;1. Tokenization — Breaking Language into Pieces&lt;/h1&gt;

&lt;p&gt;Before an AI understands text, it must &lt;strong&gt;split the sentence into smaller units called tokens&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Example sentence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"I love artificial intelligence"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tokenized form might look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;["I", "love", "artificial", "intelligence"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sometimes tokens are even smaller:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;["art", "ificial", "intelli", "gence"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;How the text gets split depends on the model’s tokenizer and its &lt;strong&gt;vocabulary size&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;Vocabulary Size&lt;/h3&gt;

&lt;p&gt;This is the number of distinct tokens the model’s tokenizer can represent.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Approx Vocabulary&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Small models&lt;/td&gt;
&lt;td&gt;~30k tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Modern LLMs&lt;/td&gt;
&lt;td&gt;100k+ tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Think of it like a &lt;strong&gt;dictionary the AI uses to read text&lt;/strong&gt;.&lt;/p&gt;
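
&lt;p&gt;You can try this yourself. Here is a minimal sketch using the open-source &lt;code&gt;tiktoken&lt;/code&gt; library; the exact splits and token IDs shown in the comments are illustrative and depend on which encoding you load.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pip install tiktoken  (OpenAI's open-source BPE tokenizer)
import tiktoken

# cl100k_base is one publicly available encoding with roughly 100k tokens
enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("I love artificial intelligence")
print(ids)                             # e.g. [40, 3021, 21075, 11478]
print([enc.decode([i]) for i in ids])  # e.g. ['I', ' love', ' artificial', ' intelligence']
print(enc.n_vocab)                     # the vocabulary size, about 100k
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;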




&lt;h1&gt;2. Vectors — Turning Words into Numbers&lt;/h1&gt;

&lt;p&gt;Computers don’t understand words.&lt;br&gt;
They understand &lt;strong&gt;numbers&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So each token is mapped to a &lt;strong&gt;vector&lt;/strong&gt; (a list of numbers). Real models use hundreds or thousands of numbers per token; the toy examples below use just four.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"cat" → [0.21, -0.34, 0.77, 0.11]
"dog" → [0.19, -0.30, 0.74, 0.10]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice something interesting?&lt;/p&gt;

&lt;p&gt;The vectors for &lt;strong&gt;cat and dog look similar&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That’s because they have &lt;strong&gt;similar meaning&lt;/strong&gt;.&lt;/p&gt;
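
&lt;p&gt;We can make “similar” precise with cosine similarity, which measures the angle between two vectors. A minimal sketch using the toy four-dimensional vectors above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math

def cosine_similarity(a, b):
    # 1.0 means same direction (similar meaning), 0.0 means unrelated
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

cat = [0.21, -0.34, 0.77, 0.11]
dog = [0.19, -0.30, 0.74, 0.10]

print(cosine_similarity(cat, dog))  # ≈ 0.999, i.e. nearly parallel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;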




&lt;h1&gt;3. Embeddings — Capturing Meaning&lt;/h1&gt;

&lt;p&gt;These vectors are called &lt;strong&gt;embeddings&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Embeddings represent &lt;strong&gt;semantic meaning&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Example relationships:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Word&lt;/th&gt;
&lt;th&gt;Meaning Relation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;King&lt;/td&gt;
&lt;td&gt;Male ruler&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Queen&lt;/td&gt;
&lt;td&gt;Female ruler&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Paris&lt;/td&gt;
&lt;td&gt;Capital of France&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Embeddings capture these relationships mathematically.&lt;/p&gt;

&lt;p&gt;Famous example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;King - Man + Woman ≈ Queen
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pretty cool, right?&lt;/p&gt;
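
&lt;p&gt;You can reproduce this with real pretrained embeddings. A sketch using gensim’s downloadable GloVe vectors (the model name here is just one of several available sets, and the similarity score is approximate):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pip install gensim  (downloads the vectors, roughly 130 MB, on first run)
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")

# king - man + woman ≈ ?
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # typically [('queen', ~0.77)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;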




&lt;h1&gt;4. Position Encoding — Remembering Word Order&lt;/h1&gt;

&lt;p&gt;Words alone are not enough.&lt;/p&gt;

&lt;p&gt;Compare:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Dog bites man
Man bites dog
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same words, very different meaning.&lt;/p&gt;

&lt;p&gt;Transformers solve this using &lt;strong&gt;positional encoding&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Each word receives a &lt;strong&gt;position signal&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Dog (position 1)
bites (position 2)
man (position 3)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows the model to understand &lt;strong&gt;sequence and grammar&lt;/strong&gt;.&lt;/p&gt;
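
&lt;p&gt;One classic scheme, from the original transformer paper, builds the position signal out of sine and cosine waves of different frequencies (many modern models use learned or rotary variants instead). A minimal sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

def sinusoidal_positions(seq_len, d_model):
    # Each position gets a unique pattern of sines and cosines;
    # nearby positions get similar patterns, which encodes order.
    positions = np.arange(seq_len)[:, None]   # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]  # (1, d_model/2)
    angles = positions / (10000 ** (dims / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# This signal is added to each token's embedding before attention runs
pe = sinusoidal_positions(seq_len=3, d_model=8)
print(pe.shape)  # (3, 8): one position vector each for "Dog", "bites", "man"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;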




&lt;h1&gt;5. Encoder vs Decoder&lt;/h1&gt;

&lt;p&gt;The original transformer architecture is built from two kinds of networks.&lt;/p&gt;

&lt;h3&gt;Encoder&lt;/h3&gt;

&lt;p&gt;Reads and understands input.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Translate English → French
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The encoder converts the sentence into &lt;strong&gt;meaning vectors&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;Decoder&lt;/h3&gt;

&lt;p&gt;Generates new text &lt;strong&gt;one token at a time&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The capital of France is → Paris
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Many modern chat models are primarily &lt;strong&gt;decoder-based&lt;/strong&gt;.&lt;/p&gt;
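
&lt;p&gt;The “one token at a time” loop is easy to sketch. Here &lt;code&gt;model(tokens)&lt;/code&gt; is a hypothetical stand-in for a real LLM forward pass that returns one score per vocabulary entry:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def generate(model, prompt_tokens, max_new_tokens=20, eos_token=0):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = model(tokens)  # one score per vocabulary entry
        next_token = max(range(len(logits)), key=lambda i: logits[i])  # greedy pick
        if next_token == eos_token:  # stop at the end-of-sequence token
            break
        tokens.append(next_token)   # feed the new token back in as input
    return tokens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;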




&lt;h1&gt;6. Self-Attention — The Secret Sauce&lt;/h1&gt;

&lt;p&gt;Self-attention allows each word to &lt;strong&gt;look at every other word&lt;/strong&gt; in the sentence.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"The animal didn't cross the street because it was tired"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The word &lt;strong&gt;"it"&lt;/strong&gt; must figure out what it refers to.&lt;/p&gt;

&lt;p&gt;Self-attention helps the model connect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;it → animal
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of reading text strictly left-to-right, the model looks at &lt;strong&gt;relationships between words&lt;/strong&gt;.&lt;/p&gt;
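
&lt;p&gt;Underneath, self-attention is just a few matrix products. A minimal numpy sketch of scaled dot-product attention, with toy shapes (real models first project the input into separate query, key, and value matrices using learned weights):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

def self_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how strongly each word attends to each other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V               # mix the value vectors by attention weight

x = np.random.randn(3, 4)  # 3 tokens, 4-dimensional vectors (toy numbers)
print(self_attention(x, x, x).shape)  # (3, 4): each token now carries context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;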




&lt;h1&gt;7. Softmax — Turning Scores into Probabilities&lt;/h1&gt;

&lt;p&gt;When predicting the next word, the model produces raw &lt;strong&gt;scores&lt;/strong&gt; (called logits).&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Paris: 8.1
London: 3.2
Pizza: -1.4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Softmax converts these scores into probabilities.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Paris → 0.92
London → 0.07
Pizza → 0.01
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model then samples a token from this distribution (often simply the most likely one).&lt;/p&gt;
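
&lt;p&gt;Softmax itself is a few lines of Python. Running it on the scores above gives the probabilities shown:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math

def softmax(scores):
    m = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = {"Paris": 8.1, "London": 3.2, "Pizza": -1.4}
for word, p in zip(scores, softmax(list(scores.values()))):
    print(f"{word}: {p:.4f}")  # Paris: 0.9925, London: 0.0074, Pizza: 0.0001
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;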




&lt;h1&gt;8. Multi-Head Attention — Multiple Perspectives&lt;/h1&gt;

&lt;p&gt;Instead of one attention mechanism, transformers use &lt;strong&gt;multiple attention heads&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Think of it like &lt;strong&gt;a team of analysts&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;One head focuses on grammar&lt;br&gt;
One focuses on subject relationships&lt;br&gt;
One focuses on context&lt;/p&gt;

&lt;p&gt;Together they build a deeper understanding of the sentence.&lt;/p&gt;
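
&lt;p&gt;Mechanically, “multiple heads” means splitting the vectors into slices, running attention once per slice, and stitching the results back together. A toy sketch (real models give each head its own learned Q/K/V projections rather than plain slices):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

def attention(q, k, v):
    # scaled dot-product attention, as in the earlier sketch
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v

def multi_head_attention(x, num_heads):
    seq_len, d_model = x.shape
    head_dim = d_model // num_heads
    heads = []
    for h in range(num_heads):
        part = x[:, h * head_dim:(h + 1) * head_dim]  # this head's slice
        heads.append(attention(part, part, part))
    return np.concatenate(heads, axis=-1)  # stitch head outputs back together

x = np.random.randn(3, 8)                          # 3 tokens, 8 dimensions
print(multi_head_attention(x, num_heads=2).shape)  # (3, 8)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;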


&lt;h1&gt;9. Temperature — Controlling Creativity&lt;/h1&gt;

&lt;p&gt;Temperature controls &lt;strong&gt;how adventurous the AI becomes&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Temperature&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0.1&lt;/td&gt;
&lt;td&gt;Very predictable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;Balanced&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;Creative&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2.0&lt;/td&gt;
&lt;td&gt;Chaotic&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Example prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The best food in Italy is...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Low temperature → “pizza”&lt;br&gt;
High temperature → “truffle pasta with wild mushrooms”.&lt;/p&gt;
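
&lt;p&gt;Under the hood, temperature simply divides the scores before softmax. A sketch (the softmax helper is the same as in the earlier example, repeated so this snippet runs on its own):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def apply_temperature(scores, temperature):
    # Temperatures below 1 sharpen the distribution; above 1 flatten it
    return softmax([s / temperature for s in scores])

scores = [8.1, 3.2, -1.4]              # Paris, London, Pizza
print(apply_temperature(scores, 0.1))  # ≈ [1.00, 0.00, 0.00]: always Paris
print(apply_temperature(scores, 2.0))  # ≈ [0.91, 0.08, 0.01]: other options appear
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;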


&lt;h1&gt;10. Knowledge Cutoff — When the Model Stopped Learning&lt;/h1&gt;

&lt;p&gt;LLMs are trained on huge datasets, but &lt;strong&gt;training eventually stops&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That date is called the &lt;strong&gt;knowledge cutoff&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If the cutoff is 2024, the model might not know events from 2025 unless external tools provide updated data.&lt;/p&gt;


&lt;h1&gt;Final Picture: How Everything Works Together&lt;/h1&gt;

&lt;p&gt;When you type a prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Explain black holes simply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI pipeline roughly looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tokenization&lt;/strong&gt; splits the text&lt;/li&gt;
&lt;li&gt;Tokens become &lt;strong&gt;embeddings&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Positional encoding&lt;/strong&gt; adds order&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-attention&lt;/strong&gt; analyzes relationships&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-head attention&lt;/strong&gt; extracts deeper context&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;decoder predicts the next token&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Softmax turns the scores into probabilities&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Temperature controls how adventurous the sampling is&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Repeat this thousands of times per response.&lt;/p&gt;

&lt;p&gt;And that’s how a machine writes paragraphs.&lt;/p&gt;




&lt;h1&gt;Closing Thought&lt;/h1&gt;

&lt;p&gt;Large Language Models are not magic.&lt;/p&gt;

&lt;p&gt;They are &lt;strong&gt;layers of clever mathematics turning language into patterns&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But when these pieces work together — tokens, vectors, attention, and probabilities — something remarkable happens:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Machines start speaking our language.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>llm</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
