<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Suny Dutta</title>
    <description>The latest articles on DEV Community by Suny Dutta (@suny_dutta_48e066ae354d0e).</description>
    <link>https://dev.to/suny_dutta_48e066ae354d0e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1654345%2F1e2b88ee-e802-464b-b81d-98b03e364fe8.jpg</url>
      <title>DEV Community: Suny Dutta</title>
      <link>https://dev.to/suny_dutta_48e066ae354d0e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/suny_dutta_48e066ae354d0e"/>
    <language>en</language>
    <item>
      <title>How LLMs Work</title>
      <dc:creator>Suny Dutta</dc:creator>
      <pubDate>Wed, 01 Jul 2026 14:43:08 +0000</pubDate>
      <link>https://dev.to/suny_dutta_48e066ae354d0e/how-llms-work-4h68</link>
      <guid>https://dev.to/suny_dutta_48e066ae354d0e/how-llms-work-4h68</guid>
      <description>&lt;h2&gt;
  
  
  1. What is an LLM?
&lt;/h2&gt;

&lt;h4&gt;
  
  
  What Does LLM Stand For?
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;LLM&lt;/strong&gt; stands for &lt;strong&gt;Large Language Model&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Let's unpack that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Large&lt;/strong&gt; → trained on enormous amounts of text data (think billions of web pages, books, articles)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Language&lt;/strong&gt; → it works with human language — English, Hindi, code, you name it&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model&lt;/strong&gt; → it's a mathematical system that has "learned" patterns from all that text&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of an LLM as a very well-read assistant. It hasn't &lt;em&gt;experienced&lt;/em&gt; the world, but it has &lt;em&gt;read&lt;/em&gt; an unimaginable amount about it.&lt;/p&gt;

&lt;h4&gt;
  
  
  What Problems Do LLMs Solve?
&lt;/h4&gt;

&lt;p&gt;Before LLMs, getting computers to understand human language was incredibly hard. Computers are great at structured commands (&lt;code&gt;print("Hello")&lt;/code&gt;), but terrible at vague, context-heavy human requests like &lt;em&gt;"Can you explain this in simpler terms?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;LLMs solve this by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Understanding natural, conversational language&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Generating human-like text responses&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Summarizing, translating, and explaining content&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Writing code, emails, essays, and more&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Popular Examples of LLMs
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;LLM&lt;/th&gt;
&lt;th&gt;Made By&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4 / ChatGPT&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;td&gt;Google DeepMind&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLaMA&lt;/td&gt;
&lt;td&gt;Meta&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mistral&lt;/td&gt;
&lt;td&gt;Mistral AI&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Common Applications in Daily Life
&lt;/h4&gt;

&lt;p&gt;You're probably already using LLMs without realizing it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Chatbots&lt;/strong&gt; — Customer support bots that actually understand your problem&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Writing assistants&lt;/strong&gt; — Grammarly, Notion AI, Google Docs Smart Compose&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Code helpers&lt;/strong&gt; — GitHub Copilot suggests code as you type&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Search engines&lt;/strong&gt; — Google's AI Overviews summarize results for you&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Email&lt;/strong&gt; — Gmail's "Help me write" feature&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Translation&lt;/strong&gt; — DeepL and Google Translate&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  2. What Happens When You Send a Message to ChatGPT?
&lt;/h2&gt;

&lt;p&gt;Ever typed a question into ChatGPT and wondered what happens in those few seconds before the response appears? Here's a simple walkthrough.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 1: You Type a Prompt
&lt;/h4&gt;

&lt;p&gt;You write something like: &lt;em&gt;"Explain black holes in simple terms."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That's your &lt;strong&gt;prompt&lt;/strong&gt; — the input you're giving to the model.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 2: Your Message Gets Processed
&lt;/h4&gt;

&lt;p&gt;Your text isn't fed to the model as-is. It gets:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Broken into pieces&lt;/strong&gt; (called tokens — more on this soon)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Converted into numbers&lt;/strong&gt; (because computers only speak numbers)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fed into the model&lt;/strong&gt; along with any previous conversation context&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Step 3: The Model Generates a Response
&lt;/h4&gt;

&lt;p&gt;The LLM doesn't "look up" an answer. It &lt;strong&gt;predicts&lt;/strong&gt; a likely next token (roughly a word or a piece of a word), then the next, then the next, until it has built a complete response.&lt;/p&gt;

&lt;p&gt;It's less like a search engine and more like a very sophisticated autocomplete.&lt;/p&gt;
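
&lt;p&gt;The predict, append, repeat loop can be sketched in a few lines of Python. This is only a toy: the probability table &lt;code&gt;NEXT_TOKEN_PROBS&lt;/code&gt; and the &lt;code&gt;[END]&lt;/code&gt; marker are invented for illustration, and the sketch looks only at the last token, whereas a real LLM computes probabilities from the whole context with a neural network.&lt;/p&gt;

```python
# Toy table: for each token, hypothetical probabilities of what comes next.
# A real LLM computes these with a neural network over the whole context.
NEXT_TOKEN_PROBS = {
    "black": {"holes": 0.9, "cats": 0.1},
    "holes": {"are": 0.8, "exist": 0.2},
    "are": {"very": 0.6, "not": 0.4},
    "very": {"dense": 0.7, "big": 0.3},
    "dense": {"[END]": 1.0},
}


def generate(first_token, max_tokens=10):
    """Repeatedly append the most likely next token until [END] is predicted."""
    tokens = [first_token]
    for _ in range(max_tokens):
        candidates = NEXT_TOKEN_PROBS.get(tokens[-1])
        if candidates is None:
            break  # the toy table knows nothing about this token
        next_token = max(candidates, key=candidates.get)  # greedy choice
        if next_token == "[END]":
            break
        tokens.append(next_token)
    return tokens


print(" ".join(generate("black")))  # black holes are very dense
```

&lt;p&gt;Always picking the single most likely token is called greedy decoding. Chat models usually sample from the probabilities instead, which is why the same prompt can produce different answers.&lt;/p&gt;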

&lt;h4&gt;
  
  
  Step 4: You See the Response
&lt;/h4&gt;

&lt;p&gt;The generated token numbers get converted back into human-readable text and streamed to your screen piece by piece, which is why responses appear gradually.&lt;/p&gt;

&lt;h4&gt;
  
  
  Why Responses Are NOT Copied From the Internet
&lt;/h4&gt;

&lt;p&gt;This is a common misconception. ChatGPT doesn't Google things in real time (unless it has a browsing tool enabled). Instead, it generates responses from &lt;strong&gt;patterns it learned during training&lt;/strong&gt;. It's more like a person who read a lot and is now answering from memory — not someone who's Googling the answer live.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is also why LLMs can sometimes be confidently wrong — a phenomenon called &lt;strong&gt;hallucination&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fp7pfdb39mlk006mpd28g.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fp7pfdb39mlk006mpd28g.jpg" alt="User → Prompt → LLM → Response" width="799" height="200"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Diagram 1: The high-level flow from your prompt to the model's response&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Why Computers Don't Understand Human Language
&lt;/h2&gt;

&lt;h4&gt;
  
  
  Text vs Numbers
&lt;/h4&gt;

&lt;p&gt;Here's the fundamental problem: &lt;strong&gt;computers only understand numbers&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Everything inside a computer — images, videos, music, and text — is stored as numbers (specifically, binary: 0s and 1s).&lt;/p&gt;

&lt;p&gt;When you look at the letter &lt;strong&gt;"A"&lt;/strong&gt;, you see a character. Your computer sees &lt;code&gt;65&lt;/code&gt; (its ASCII code). That works fine for storage, but it doesn't capture &lt;em&gt;meaning&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The word &lt;code&gt;"bank"&lt;/code&gt; could mean a riverbank or a financial institution. Storing it as a single number loses that nuance entirely.&lt;/p&gt;
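
&lt;p&gt;A quick way to see this for yourself is Python's built-in &lt;code&gt;ord&lt;/code&gt; function. The helper &lt;code&gt;char_codes&lt;/code&gt; and the two example sentences are made up for this sketch.&lt;/p&gt;

```python
def char_codes(text):
    """Return the numeric code point of every character in text."""
    return [ord(ch) for ch in text]


print(ord("A"))             # 65
print(format(65, "08b"))    # 01000001 (the same number written in binary)
print(char_codes("bank"))   # [98, 97, 110, 107]

# The codes are identical whichever sense of "bank" is meant,
# so character codes alone say nothing about meaning.
river = "We sat on the bank of the river."
money = "She deposited cash at the bank."
river_word = river.split()[4]
money_word = money.rstrip(".").split()[-1]
print(char_codes(river_word) == char_codes(money_word))  # True
```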

&lt;h4&gt;
  
  
  Why Computers Need Everything Converted to Numbers
&lt;/h4&gt;

&lt;p&gt;For an AI to &lt;em&gt;understand&lt;/em&gt; language, it needs numbers that capture &lt;strong&gt;meaning and context&lt;/strong&gt;, not just character codes.&lt;/p&gt;

&lt;p&gt;This is where a concept called &lt;strong&gt;embeddings&lt;/strong&gt; comes in — words and phrases get converted into long lists of numbers (called vectors) that place similar concepts close together in mathematical space.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;"king"&lt;/code&gt; and &lt;code&gt;"queen"&lt;/code&gt; end up as nearby vectors&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;"apple"&lt;/code&gt; (fruit) and &lt;code&gt;"apple"&lt;/code&gt; (company) get different representations depending on context&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
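
&lt;p&gt;"Close together" is usually measured with cosine similarity. Here is a minimal sketch, assuming tiny hand-made vectors: the three-number embeddings in &lt;code&gt;EMBEDDINGS&lt;/code&gt; are invented for illustration, while real models learn vectors with hundreds or thousands of dimensions.&lt;/p&gt;

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration only.
EMBEDDINGS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


king_queen = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"])
king_apple = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["apple"])
print(round(king_queen, 2), round(king_apple, 2))
```

&lt;p&gt;With these made-up numbers, the first score comes out close to 1 and the second is much lower, which is the sense in which "king" sits nearer to "queen" than to "apple".&lt;/p&gt;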

&lt;h4&gt;
  
  
  Introduction to Tokens
&lt;/h4&gt;

&lt;p&gt;But before any of that meaning-capture can happen, the text needs to be split into manageable pieces first.&lt;/p&gt;

&lt;p&gt;Those pieces are called &lt;strong&gt;tokens&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Tokenization
&lt;/h2&gt;

&lt;h4&gt;
  
  
  What Are Tokens?
&lt;/h4&gt;

&lt;p&gt;Tokens are the small chunks that text gets split into before being fed to an LLM.&lt;/p&gt;

&lt;p&gt;A token is roughly &lt;strong&gt;¾ of a word&lt;/strong&gt; on average in English text, but it can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A whole word: &lt;code&gt;"cat"&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Part of a word: &lt;code&gt;"un"&lt;/code&gt;, &lt;code&gt;"believ"&lt;/code&gt;, &lt;code&gt;"able"&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A punctuation mark: &lt;code&gt;"."&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A space or special character&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Why Is Tokenization Needed?
&lt;/h4&gt;

&lt;p&gt;LLMs can't process raw text. They need text broken into consistent, manageable units with known numerical IDs. Tokenization is the bridge between human text and the numerical world the model operates in.&lt;/p&gt;

&lt;h4&gt;
  
  
  Words vs Tokens — Simple Examples
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Text&lt;/th&gt;
&lt;th&gt;Tokens&lt;/th&gt;
&lt;th&gt;Token Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;"Hello"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;["Hello"]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;"ChatGPT"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;["Chat", "G", "PT"]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;"unbelievable"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;["un", "believ", "able"]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;"I love AI"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;["I", " love", " AI"]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

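&lt;p&gt;Notice that &lt;code&gt;"ChatGPT"&lt;/code&gt; becomes &lt;em&gt;three&lt;/em&gt; tokens: the tokenizer splits unfamiliar or compound words into subpieces it already knows. The exact splits vary from one tokenizer to another, so treat the table as illustrative.&lt;/p&gt;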
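
&lt;p&gt;To make the idea concrete, here is a toy tokenizer that reproduces the table above. It is a sketch under loud assumptions: the vocabulary &lt;code&gt;VOCAB&lt;/code&gt; and its IDs are invented, and it uses a simple greedy longest-match rule. Production tokenizers typically learn their vocabulary from data with algorithms such as byte-pair encoding, so their splits will differ.&lt;/p&gt;

```python
# Hypothetical vocabulary mapping subword pieces to IDs (invented for this sketch).
VOCAB = {
    "Hello": 0, "Chat": 1, "G": 2, "PT": 3,
    "un": 4, "believ": 5, "able": 6,
    "I": 7, " love": 8, " AI": 9,
}


def tokenize(text):
    """Greedy longest-match split of text into pieces found in VOCAB."""
    tokens = []
    start = 0
    while start != len(text):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(text), start, -1):
            piece = text[start:end]
            if piece in VOCAB:
                tokens.append(piece)
                start = end
                break
        else:
            raise ValueError("no vocabulary entry matches at position " + str(start))
    return tokens


for text in ["Hello", "ChatGPT", "unbelievable", "I love AI"]:
    pieces = tokenize(text)
    ids = [VOCAB[piece] for piece in pieces]
    print(repr(text), pieces, ids)
```

&lt;p&gt;The list of IDs in the last column is what the model actually receives; the text itself never reaches it.&lt;/p&gt;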

&lt;h4&gt;
  
  
  Why This Matters
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;LLMs have a &lt;strong&gt;context window&lt;/strong&gt; — a maximum number of tokens they can process at once (like a short-term memory limit)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;GPT-4 Turbo can handle ~128,000 tokens; older models such as the original GPT-3.5 were limited to about 4,000&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Knowing this helps you understand why very long conversations sometimes make the model "forget" earlier parts&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
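
&lt;p&gt;One simple way an application might keep a conversation inside the context window is to drop the oldest tokens. The function &lt;code&gt;fit_to_context&lt;/code&gt; below is a hypothetical helper written for this sketch; real products may use other strategies, such as summarizing older messages.&lt;/p&gt;

```python
def fit_to_context(token_ids, context_window):
    """Keep only the most recent tokens that fit in the context window."""
    if context_window == 0:
        return []
    # A negative slice start keeps the last context_window items.
    return token_ids[-context_window:]


conversation = list(range(1, 11))        # stand-in for 10 token IDs
print(fit_to_context(conversation, 4))   # [7, 8, 9, 10]
print(fit_to_context(conversation, 50))  # all 10 tokens still fit
```

&lt;p&gt;In the first call, tokens 1 to 6 never reach the model, which is exactly the "forgetting" effect described above.&lt;/p&gt;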

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fi3elnuin19f4tsac3opk.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fi3elnuin19f4tsac3opk.jpg" alt="Text → Tokens → Transformer → Response" width="799" height="234"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Diagram 2: How your text travels through the tokenization pipeline&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fqn9nqubujbruasefpscp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fqn9nqubujbruasefpscp.jpg" alt="Context Window Visualization" width="800" height="230"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Diagram 3: The context window is the model's working memory — once full, earlier content gets dropped&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Transformers
&lt;/h2&gt;

&lt;h4&gt;
  
  
  What Is a Transformer?
&lt;/h4&gt;

&lt;p&gt;A &lt;strong&gt;Transformer&lt;/strong&gt; is the neural network architecture that powers almost every modern LLM.&lt;/p&gt;

&lt;p&gt;It was introduced in a landmark 2017 research paper titled &lt;em&gt;"Attention Is All You Need"&lt;/em&gt; by researchers at Google. That paper changed AI forever.&lt;/p&gt;

&lt;p&gt;Before Transformers, AI models struggled with long sequences of text — they'd "forget" earlier parts of a sentence by the time they reached the end.&lt;/p&gt;

&lt;h4&gt;
  
  
  Why It Changed AI
&lt;/h4&gt;

&lt;p&gt;Transformers introduced a mechanism called &lt;strong&gt;self-attention&lt;/strong&gt;, which lets the model look at &lt;em&gt;all&lt;/em&gt; words in a sequence simultaneously and weigh which ones matter most for understanding each word.&lt;/p&gt;

&lt;p&gt;Example: In the sentence &lt;em&gt;"The animal didn't cross the street because it was too tired"&lt;/em&gt; — what does &lt;strong&gt;"it"&lt;/strong&gt; refer to? The animal, not the street.&lt;/p&gt;

&lt;p&gt;A Transformer figures this out by attending to all the other words at once and learning that &lt;code&gt;"it"&lt;/code&gt; is more connected to &lt;code&gt;"animal"&lt;/code&gt; than to &lt;code&gt;"street"&lt;/code&gt;.&lt;/p&gt;
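
&lt;p&gt;The weighing step can be sketched as scaled dot-product attention. Everything numeric here is an assumption: the two-dimensional vectors are hand-picked so that &lt;code&gt;"it"&lt;/code&gt; lines up with &lt;code&gt;"animal"&lt;/code&gt;. A real Transformer learns its query and key vectors during training, uses many attention heads, and then uses the weights to blend value vectors, which this sketch leaves out.&lt;/p&gt;

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    biggest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - biggest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


# Hypothetical vectors, hand-picked for illustration (not learned).
words = ["animal", "street", "tired"]
keys = [[1.0, 0.2], [0.1, 1.0], [0.6, 0.3]]
query_for_it = [2.0, 0.1]

weights = attention_weights(query_for_it, keys)
for word, weight in zip(words, weights):
    print(word, round(weight, 2))
```

&lt;p&gt;With these numbers, &lt;code&gt;"animal"&lt;/code&gt; receives the largest weight, so the representation of &lt;code&gt;"it"&lt;/code&gt; would draw most heavily on it.&lt;/p&gt;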

&lt;h4&gt;
  
  
  How It Helps Understand Language
&lt;/h4&gt;

&lt;p&gt;The Transformer's key innovations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Parallel processing&lt;/strong&gt; — processes all tokens simultaneously (much faster than sequential models)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Self-attention&lt;/strong&gt; — understands relationships between words regardless of distance&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalability&lt;/strong&gt; — works incredibly well as you throw more data and compute at it&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Why Almost Every Modern LLM Uses Transformers
&lt;/h4&gt;

&lt;p&gt;It's simple: so far, no other architecture has matched its performance at this scale.&lt;/p&gt;

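&lt;p&gt;GPT (the "T" literally stands for Transformer), Claude, Gemini, LLaMA, and Mistral all use the Transformer architecture at their core. Some use variations or improvements, but the fundamental design remains the same.&lt;/p&gt;

&lt;p&gt;At each step, the model assigns a probability to every possible next token, and one token is picked from that distribution. A setting called &lt;strong&gt;temperature&lt;/strong&gt; rescales the model's scores before the softmax turns them into probabilities: a low temperature makes the output more predictable, while a high temperature makes it more varied.&lt;/p&gt;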
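
&lt;p&gt;Here is a minimal sketch of how temperature interacts with the softmax. The candidate tokens and their raw scores (logits) are invented for illustration, and &lt;code&gt;sample_token&lt;/code&gt; is a hypothetical helper; actual services may combine temperature with other sampling settings.&lt;/p&gt;

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Divide each logit by the temperature, then apply softmax."""
    scaled = [x / temperature for x in logits]
    biggest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - biggest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, probs, rng):
    """Pick one token at random, weighted by its probability."""
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical raw scores for three candidate next tokens.
tokens = ["dense", "big", "purple"]
logits = [2.0, 1.0, 0.1]

rng = random.Random(0)
for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 2) for p in probs], sample_token(tokens, probs, rng))
```

&lt;p&gt;At temperature 0.5 most of the probability lands on the top-scoring token, while at 2.0 the three options are much closer together, so unlikely tokens get picked more often.&lt;/p&gt;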

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fodehpiw6nax5u772ycd9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fodehpiw6nax5u772ycd9.jpg" alt="Low Temperature vs High Temperature output comparison" width="800" height="260"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Diagram 4: Low Temperature vs High Temperature output comparison&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Putting It All Together: The Complete Picture
&lt;/h2&gt;

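&lt;p&gt;Here's the full journey of your message from input to output: your prompt is split into tokens, the tokens are converted into numbers, the Transformer predicts the next token again and again, and the result is converted back into text.&lt;/p&gt;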

&lt;p&gt;Every time you chat with an AI, this entire pipeline runs in seconds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F0rlml462t4u7lbgck70n.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F0rlml462t4u7lbgck70n.jpg" alt="Complete high-level LLM workflow" width="800" height="298"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Diagram 5: Complete high-level LLM workflow&lt;/em&gt;&lt;/p&gt;




</description>
      <category>genai</category>
      <category>llm</category>
      <category>chatgpt</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
