AbhiJsDev

Posted on Jul 1

Behind the AI Chatbox: What Actually Happens When You Send a Prompt?

#ai #webdev #architecture #beginners

Most of my day as a developer is spent crafting UI components, managing state, and making sure web interfaces look pixel-perfect and accessible. But recently, a new layer has been added to almost every app we build: the AI chatbox.

You type a prompt, hit Send, and a magical, human-like response streams onto your screen. But if you look under the hood, it's not magic at all. It's a beautifully orchestrated system of mathematics, probabilities, and predictions.

Let's lift the curtain on what's actually happening behind that glowing screen—without the dry, intimidating jargon.

1. Demystifying the Buzzword: What is an LLM?

Before we follow your message's journey, let's meet the engine powering everything.

LLM stands for Large Language Model.

At its core, an LLM is a deep learning model trained on enormous amounts of text, allowing it to understand and generate human language with remarkable fluency.

What Problems Do LLMs Solve?

Historically, computers struggled with human language.

Traditional software relies on explicit rules. If you misspelled a word, used slang, or phrased something differently than expected, the program often failed because it could only follow predefined instructions.

LLMs changed that completely.

Instead of relying solely on rigid rules, they learn patterns from massive amounts of text, enabling them to understand unstructured human language—the messy, flexible, and often ambiguous way people naturally communicate.

Rather than matching exact keywords, they infer the intent behind your words.

Popular Examples

Some of today's most widely used LLMs include:

OpenAI's GPT models (used in ChatGPT)
Google Gemini
Anthropic Claude
Meta Llama

Where You Already Use Them

You probably interact with LLMs every day without even realizing it.

They power:

Smart autocomplete
AI coding assistants
Language translation
Email drafting
Customer support chatbots
Document summarization
AI search experiences

2. Why Computers Are Completely Text-Blind

Before understanding how an LLM thinks, we need to understand one strange fact:

Computers have absolutely no idea what letters like A, B, or C actually mean.

Computers don't understand English.

They don't understand Hindi.

They don't understand any human language.

They only understand numbers.

If you handed the word:

Hello

to your computer's processor, it wouldn't recognize it as a greeting.

To the computer, it's simply a sequence of numeric character codes with no inherent meaning.
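To make this concrete, here is a minimal Python sketch (Python is used only for illustration) of how the word "Hello" looks once it is stored: a list of numbers, one per character.

```python
# Each character is stored as a number (its Unicode code point).
word = "Hello"
code_points = [ord(ch) for ch in word]
print(code_points)  # [72, 101, 108, 108, 111]

# Saved to disk or sent over a network, the text becomes bytes (UTF-8 here).
raw_bytes = list(word.encode("utf-8"))
print(raw_bytes)  # [72, 101, 108, 108, 111]
```

Nothing in those numbers says "greeting". Any meaning has to be learned or programmed on top of them.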

Before an AI can process your message, every piece of text must first be transformed into numbers.

That journey begins with Tokenization.

3. The Journey of Your Prompt

Your Prompt
      │
      ▼
Tokenization
      │
      ▼
Embeddings + Positional Encoding
      │
      ▼
Transformer (Self-Attention)
      │
      ▼
Next Token Prediction
      │
      ▼
Streaming Response

Let's walk through what happens during each step.

Step 1 — Typing a Prompt

Everything begins when you type something like:

Give me a recipe for a quick snack.

After clicking Send, the application (such as ChatGPT) securely sends your prompt over the internet to an LLM running on powerful cloud servers.

Only then does the model begin processing your request.

Step 2 — The Slicer (Tokenization)

Before the model sees anything, a tokenizer slices your text into manageable pieces called tokens.

Words vs Tokens

A token isn't always a complete word.

Depending on the tokenizer, it can represent:

A full word
Part of a word
Punctuation
Whitespace
Emojis
Numbers

For example:

Input

Hello, I like coding.

Tokens

["Hello", ",", " I", " like", " cod", "ing", "."]

Token IDs

[15496, 11, 314, 588, 3842, 278, 13]

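Every token in the tokenizer's vocabulary maps to a unique numeric identifier. The exact splits and IDs depend on which tokenizer is used, so treat the values above as illustrative.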

These numbers—not the original words—are what the model actually processes.

Working from a fixed vocabulary of tokens lets the model handle any input, including rare words, typos, and brand-new terms, by composing them from smaller pieces it already knows.
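The slicing step can be sketched with a toy tokenizer. This is only an illustration under stated assumptions: the vocabulary and its IDs are made up, and the greedy longest-match rule is a simplification. Production tokenizers typically use learned schemes such as byte-pair encoding.

```python
# Hypothetical vocabulary: pieces and IDs are invented for this example.
VOCAB = {"Hello": 0, ",": 1, " I": 2, " like": 3, " cod": 4, "ing": 5, ".": 6}

def tokenize(text):
    """Greedy longest-match: at each position, take the longest known piece."""
    tokens = []
    i = 0
    while i < len(text):
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in VOCAB:
                tokens.append(piece)
                i = end
                break
        else:
            # No piece in the toy vocabulary matches here.
            raise ValueError(f"cannot tokenize text at position {i}")
    return tokens

def encode(text):
    """Map each token to its numeric ID."""
    return [VOCAB[token] for token in tokenize(text)]

tokens = tokenize("Hello, I like coding.")
ids = encode("Hello, I like coding.")
print(tokens)  # ['Hello', ',', ' I', ' like', ' cod', 'ing', '.']
print(ids)     # [0, 1, 2, 3, 4, 5, 6]
```

Note how "coding" is not in the vocabulary, yet it can still be represented as " cod" followed by "ing".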

Step 3 — The Map of Meaning (Embeddings & Positional Encoding)

Now that the text has become numbers, the AI still doesn't know what those numbers mean.

To solve this, every token is converted into an embedding—a high-dimensional mathematical representation that captures semantic meaning.

Imagine a Giant Map

Words and concepts with similar meanings naturally appear close together.

Apple  ───────── Banana

Paris ───────── Eiffel Tower

Doctor ──────── Hospital

The closer two words appear in this embedding space, the more semantically related they are.

This allows the model to recognize relationships without anyone explicitly programming them.
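One common way to measure "closeness" in an embedding space is cosine similarity. The sketch below uses hypothetical three-dimensional vectors chosen by hand. Real embeddings are learned during training and usually have hundreds or thousands of dimensions.

```python
import math

# Hand-picked toy embeddings (assumption: not produced by any real model).
EMBEDDINGS = {
    "apple": [0.9, 0.1, 0.0],
    "banana": [0.8, 0.2, 0.1],
    "hospital": [0.0, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

fruit_score = cosine_similarity(EMBEDDINGS["apple"], EMBEDDINGS["banana"])
mixed_score = cosine_similarity(EMBEDDINGS["apple"], EMBEDDINGS["hospital"])
print(fruit_score > mixed_score)  # True
```

With these toy vectors, "apple" and "banana" score close to 1, while "apple" and "hospital" score close to 0.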

Figure 1: Imagine a sentence being broken into tokens and projected into a massive semantic space. Words like "Feast", "Gathering", and "Everyone" naturally cluster together, while concepts such as "Ideas" drift closer to "Creativity." Rather than memorizing definitions, the model learns relationships between concepts.

But Word Order Matters

Consider these two sentences:

The cat chased the mouse.
The mouse chased the cat.

They contain exactly the same words.

Yet they mean completely different things.

To preserve order, every token receives additional positional information called Positional Encoding.

This tells the model where each token appears in the sentence.

Without positional encoding, the model would know which words exist—but not where they occur.
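As a rough sketch, here is the sinusoidal positional encoding described in the original Transformer paper, added to a made-up embedding. This is one scheme among several. Many current models use learned or rotary position information instead, and the embedding values below are hypothetical.

```python
import math

def positional_encoding(position, dim):
    """Sinusoidal encoding: sine on even indices, cosine on odd indices."""
    encoding = []
    for i in range(dim):
        # Each pair of dimensions (2k, 2k + 1) shares one frequency.
        exponent = (2 * (i // 2)) / dim
        angle = position / (10000 ** exponent)
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding

def add_position(embedding, position):
    """Combine a token embedding with the encoding for its position."""
    encoding = positional_encoding(position, len(embedding))
    return [e + p for e, p in zip(embedding, encoding)]

cat = [0.5, 0.2, 0.1, 0.7]  # hypothetical embedding for "cat"

# "The cat chased the mouse." -> "cat" is at index 1
# "The mouse chased the cat." -> "cat" is at index 4
cat_as_second_word = add_position(cat, 1)
cat_as_fifth_word = add_position(cat, 4)
print(cat_as_second_word != cat_as_fifth_word)  # True
```

The same word now produces a different vector depending on where it sits, which is what lets the model tell the two sentences apart.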

Step 4 — The Context Detective (The Transformer)

This is the crown jewel of modern AI.

The Transformer architecture, introduced in 2017, completely changed natural language processing.

Older language models, such as recurrent neural networks (RNNs), processed text sequentially.

I → sat → on → the → river → bank

By the time they reached the last word, they often struggled to remember important information from the beginning.

Transformers solved this problem using Self-Attention.

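Instead of reading one word after another, every token can examine the other tokens in a single parallel step. (In chat-style LLMs, each token typically looks only at itself and the tokens before it, but the computation still happens in parallel.)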

Understanding Self-Attention

Consider the word:

Bank

Sentence 1:

I sat on the river bank.

The nearby word river tells the model that "bank" refers to land beside water.

Sentence 2:

I deposited money in the bank.

Now the nearby word money changes the meaning entirely.

The model understands that the same word has different meanings depending on surrounding context.

This contextual understanding is made possible through Self-Attention.
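The "bank" example can be sketched with a stripped-down version of scaled dot-product self-attention. Two simplifications are assumed here: the vectors are invented two-dimensional toys, and the learned query, key, and value projections of a real Transformer are skipped, so each vector plays all three roles.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Simplified self-attention where queries, keys, and values are the inputs."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score this token against every token, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # The new representation is a weighted blend of all token vectors.
        blended = [
            sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)
        ]
        outputs.append(blended)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical vectors: dimension 0 ~ "nature", dimension 1 ~ "finance".
river, money, bank = [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]

river_outputs, river_weights = self_attention([river, bank])
money_outputs, money_weights = self_attention([money, bank])

bank_near_river = river_outputs[1]
bank_near_money = money_outputs[1]
print(bank_near_river[0] > bank_near_river[1])  # True: leans toward "nature"
print(bank_near_money[1] > bank_near_money[0])  # True: leans toward "finance"
```

The input vector for "bank" is identical in both runs. Only its neighbors differ, and that alone pulls its output in a different direction.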

Old AI vs Transformer

Old AI (Sequential)

[I] → [sat] → [on] → [the] → [river] → [bank]

Reads one word at a time.


Transformer (Parallel)

[I]
[sat]
[on]
[the]
[river]
[bank]

Every token attends to every other token simultaneously.

Instead of forgetting earlier words, every token continuously considers the entire sentence.

Figure 2: An older sequential model reading one word after another, compared with a Transformer where every token connects to every other token through Self-Attention. The meaning of the word "Bank" changes depending on whether it connects more strongly to "River" or "Money."

Step 5 — Generating a Response (It's Not Copying Google)

One of the biggest misconceptions about ChatGPT is that it searches Google and copies an answer.

That's not how it works.

Instead, once the Transformer has processed your prompt, it calculates how likely each token in its vocabulary is to come next.

Think of it as an incredibly advanced autocomplete system.

Suppose your prompt is:

The capital of France is

Internally, the model estimates probabilities similar to:

Paris      97%

London      1%

Berlin      1%

Rome        1%

It selects one token.
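Selecting a token from those probabilities can be sketched as a weighted random draw. The numbers are copied from the example above. The function name and the fixed seed are assumptions made for illustration.

```python
import random

# Probabilities from the "capital of France" example.
next_token_probs = {"Paris": 0.97, "London": 0.01, "Berlin": 0.01, "Rome": 0.01}

def sample_next_token(probs, rng):
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
samples = [sample_next_token(next_token_probs, rng) for _ in range(1000)]
paris_share = samples.count("Paris") / len(samples)
print(paris_share > 0.9)  # True
```

"Paris" wins nearly every draw, but the other tokens are never strictly impossible. That small amount of randomness is one reason two runs of the same prompt can differ.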

Now the sentence becomes:

The capital of France is Paris

The entire process repeats.

Again.

And again.

Once for every token in the response.

Each newly generated token becomes part of the context used to predict the next one, allowing the model to produce completely new responses instead of copying existing text.
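The loop itself is short. In the sketch below, a hypothetical lookup table stands in for the model. A real LLM conditions on the whole context rather than just the last token, and `TOY_MODEL`, `toy_next_token`, and the `<end>` marker are invented names.

```python
# Hypothetical stand-in for a trained model.
TOY_MODEL = {
    "The": "capital",
    "capital": "of",
    "of": "France",
    "France": "is",
    "is": "Paris",
    "Paris": "<end>",
}

def toy_next_token(context):
    """Return the 'predicted' next token for the given context."""
    return TOY_MODEL.get(context[-1], "<end>")

def generate(prompt_tokens, max_new_tokens=10):
    """Append one predicted token at a time until the model signals the end."""
    context = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = toy_next_token(context)
        if next_token == "<end>":
            break
        # The new token becomes part of the context for the next prediction.
        context.append(next_token)
    return context

result = generate(["The", "capital", "of"])
print(" ".join(result))  # The capital of France is Paris
```

Each pass through the loop is one prediction, which is also why responses can be streamed to the screen token by token.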

4. The "Creativity" Dial — Temperature

Have you ever asked the exact same prompt twice and received different answers?

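That's largely because the model samples from its probabilities rather than always picking the top token, and a parameter called Temperature controls how much randomness that sampling has.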

LOW TEMPERATURE (0.2)

✔ Predictable
✔ Consistent
✔ Focused
✔ Great for coding


HIGH TEMPERATURE (0.9)

✔ Creative
✔ Diverse
✔ Imaginative
✔ Better for brainstorming

Low Temperature

The model strongly favors the highest-probability token.

Best suited for:

Coding
Mathematics
Technical documentation
Structured content

High Temperature

The model becomes more adventurous.

Instead of always choosing the highest-probability token, it occasionally selects lower-probability alternatives, producing more diverse and creative outputs.

Ideal for:

Brainstorming
Story writing
Poetry
Creative marketing
Idea generation
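Under the hood, temperature is commonly applied by dividing the model's raw scores (logits) by the temperature before converting them to probabilities. The sketch below assumes three made-up logits and uses the 0.2 and 0.9 settings from above.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]  # hypothetical scores for three candidate tokens

low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 0.9)

print(low[0] > high[0])  # True: low temperature concentrates on the top token
```

At 0.2 nearly all of the probability lands on the top token. At 0.9 the alternatives keep a meaningful share, so they get picked more often.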

5. The Context Window

Think of the Context Window as the AI's short-term memory.

    Older Messages (Forgotten)

                │

                ▼

──────────────────────────────────────────────

        Current Conversation

         Your Latest Prompt

──────────────────────────────────────────────

                │

                ▼

               LLM

The model can only "remember" a limited number of tokens at once.

As conversations become longer, older messages eventually fall outside this window.

Once that happens, the model no longer has access to them, which is why it may appear to forget earlier parts of the conversation.
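On the application side, one simple way to stay within the window is to keep only the most recent messages that fit a token budget. This is a sketch of that idea, not how any particular product does it. `count_tokens` is a crude word-count stand-in for a real tokenizer, and the 12-token budget is arbitrary.

```python
def count_tokens(text):
    """Crude stand-in: real tokenizers count subword tokens, not words."""
    return len(text.split())

def fit_to_window(messages, max_tokens):
    """Keep the newest messages whose combined size fits the budget."""
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order

history = [
    "My name is Sam",
    "I build web interfaces",
    "Give me a recipe for a quick snack",
]

visible = fit_to_window(history, max_tokens=12)
print(visible)
# ['I build web interfaces', 'Give me a recipe for a quick snack']
```

The first message no longer fits, so from the model's point of view the user never said their name.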

6. Who Builds What? (The Division of Labor)

Figure 3: On the left, an ML Engineer ("The Chef") trains massive Transformer models using GPUs, datasets, and neural networks. On the right, an Application / GenAI Engineer ("The Restaurateur") builds chat interfaces, integrates APIs, streams responses, stores conversation history, and creates polished AI-powered applications for users.

People often assume application developers spend their days designing neural networks and writing complex mathematical equations.

In reality, responsibilities are generally divided into two complementary roles.

Machine Learning Engineers

Think of them as the master chefs.

They build the intelligence itself by:

Designing neural network architectures
Training foundation models
Creating embeddings
Optimizing GPU workloads
Improving Transformer architectures
Fine-tuning and evaluating models

Application / GenAI Engineers

Think of them as the restaurant owners.

They take an already-trained LLM and transform it into products people use every day by building:

AI chat interfaces
API integrations
Streaming responses
Authentication
Conversation history
Prompt engineering
RAG pipelines
Vector databases
Monitoring
Production-ready user experiences

One builds the brain.

The other builds the product.

Both are equally important in delivering a great AI experience.

Wrapping Up

The next time you watch an AI generate a response, you'll know there's no magic happening behind the scenes.

Your words are first broken into tokens, transformed into mathematical representations, enriched with positional information, analyzed through the Transformer's self-attention mechanism, and then used to predict the most likely next token. That prediction becomes part of the conversation, and the process repeats hundreds of times until a complete response is generated.

What feels like a natural conversation is actually billions, or even trillions, of mathematical operations happening in just a few seconds.

Understanding this pipeline doesn't mean every application developer needs to become a machine learning researcher. But knowing what happens behind the API helps us build better AI-powered applications, write more effective prompts, design more intuitive user experiences, and make smarter engineering decisions.

Modern AI isn't powered by magic—it's powered by mathematics, probability, and an extraordinary amount of engineering.

And perhaps that's even more fascinating.
