Vaibhav Doddihal

Posted on Jun 18 • Originally published at blocksimplified.com

What is Generative AI? A Practical Introduction

#ai #generativeai #llm #aifluency

Originally published on BlockSimplified — 13 min read

Welcome to the AI Fluency Curriculum, a series I'm building to help engineers and technical folks get genuinely comfortable with applied AI. Not the hype. The actual mechanics.
This is the first post in Module 1: Foundations of Generative AI.

I remember the first time I got ChatGPT to write a bash script for me. I'd described what I needed in plain English, and out came working code. My first reaction: "How does it know this?" My second reaction: "Wait, that variable name is wrong." That tension between impressive capability and subtle wrongness is what we're going to unpack.

What You'll Learn

By the end of this post, you'll be able to:

Explain GenAI in simple terms (to your manager, your team, your confused relatives)
Differentiate it from traditional software and predictive AI
Identify real capabilities and limitations, not just the marketing version

We'll cover three depth levels: Beginner, Intermediate, and Advanced. Skip around based on what you need.

Beginner: GenAI as "Autocomplete on Steroids"

Let me start with an analogy that helped me get it.

The Restaurant Analogy

Imagine a restaurant kitchen:

Traditional Software is like a recipe book. You give it inputs (ingredients), it follows exact steps, you get a predictable output. Same input = same dish. Every. Single. Time. A calculator works this way. Your banking app works this way.

Predictive AI (the old kind) is like a sommelier who looks at your order and predicts: "Based on customers who ordered the lamb, you'll probably want the Malbec." It classifies, predicts, and recommends, but it doesn't create anything new.

Generative AI is like a chef who's eaten at thousands of restaurants, read millions of recipes, and watched countless cooking shows. Give them a prompt ("I want something spicy, Italian-inspired, but with Thai flavors") and they'll generate something entirely new. Sometimes brilliant. Sometimes... experimental.

The key insight: GenAI doesn't look up answers. It generates them by predicting what tokens (words, code, pixels) should come next, based on patterns learned from massive training data.

Your First API Call

Let's stop talking and actually run something. Here's a minimal Python example using OpenAI's API:

# genai_hello.py
# Your first Generative AI API call
# Requires: pip install openai

from openai import OpenAI

client = OpenAI()  # Uses OPENAI_API_KEY env variable

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Explain Generative AI in one sentence, like I'm a software engineer."}
    ]
)

print(response.choices[0].message.content)

What's happening:

You send a prompt (the "user" message)
The model processes it through billions of parameters
It generates a response, token by token
You get back text that didn't exist before your request

Run this a few times. Notice how the response varies slightly each time? That's not a bug. It's the core mechanic.

What GenAI Can (and Can't) Do

Genuine capabilities I use daily:

Drafting documentation, emails, technical specs
Explaining unfamiliar code or concepts
Generating boilerplate code (with review!)
Brainstorming approaches to problems
Summarizing long documents

Real limitations that have bitten me:

Hallucinations: confidently wrong answers that sound perfect
No actual reasoning: it's pattern matching, not thinking
Knowledge cutoffs: models don't know recent events
Inconsistency: same prompt can yield different quality outputs
Context limits: can't read your entire codebase (yet)
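To make the context-limit point concrete, here's a rough sketch that estimates whether a chunk of text fits in a model's context window. It assumes the common rule of thumb of about 4 characters per token for English text. The function names and the 128,000-token window are illustrative assumptions, and a real tokenizer would give you exact counts.

```python
# context_budget.py
# Rough check: will this text fit in a model's context window?
# ASSUMPTION: ~4 characters per token is only a heuristic for English text.

CHARS_PER_TOKEN = 4  # rule of thumb, not an exact figure


def estimate_tokens(text: str) -> int:
    """Estimate the token count from the character count (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(text: str, context_window: int, reserve_for_reply: int = 1000) -> bool:
    """True if the estimated prompt size still leaves room for the reply."""
    return estimate_tokens(text) + reserve_for_reply <= context_window


if __name__ == "__main__":
    snippet = "def add(a, b):\n    return a + b\n" * 500
    print(estimate_tokens(snippet), "estimated tokens")
    # 128_000 is a hypothetical window size used only for this demo.
    print("Fits:", fits_in_context(snippet, 128_000))
```

Treat the result as a sanity check rather than a guarantee: code and non-English text often tokenize less efficiently than plain English prose.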

Intermediate: The Paradigm Shift from Deterministic to Probabilistic

Here's where things get interesting. If you've been writing software for a while, you've internalized a core assumption: same input → same output.

GenAI breaks that contract.

Why It's Probabilistic

Under the hood, LLMs work by:

Tokenizing your input (breaking text into chunks)
Processing tokens through neural network layers
For each position, calculating probability distributions over ALL possible next tokens
Sampling from that distribution to pick the actual next token
Repeating until done

The output isn't retrieved from a database. It's constructed on the fly, one token at a time.
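The loop above can be sketched with a toy model. This is not how a real LLM is built: whitespace splitting stands in for a real tokenizer, and a table of bigram counts stands in for the neural network. Both are assumptions made purely to show the "distribution, then sample, then repeat" mechanic.

```python
# toy_next_token.py
# A toy "language model": bigram counts from a tiny corpus, then sampling.
# ASSUMPTION: whitespace tokens and a bigram table replace the real
# tokenizer and neural network; only the sampling loop is realistic.

import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug . the cat ate ."


def train_bigrams(text: str) -> dict:
    """Count how often each token follows each other token."""
    tokens = text.split()  # "tokenize"
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_distribution(counts: dict, prev: str) -> dict:
    """Turn raw follow-counts into a probability distribution."""
    total = sum(counts[prev].values())
    return {tok: n / total for tok, n in counts[prev].items()}


def generate(counts: dict, start: str, max_tokens: int, rng: random.Random) -> list:
    """Sample one token at a time until a stop token or the length limit."""
    output = [start]
    for _ in range(max_tokens):
        dist = next_token_distribution(counts, output[-1])
        if not dist:  # nothing ever followed this token in the corpus
            break
        token = rng.choices(list(dist), weights=list(dist.values()))[0]
        output.append(token)
        if token == ".":  # stop condition
            break
    return output


if __name__ == "__main__":
    model = train_bigrams(corpus)
    print(next_token_distribution(model, "the"))
    for seed in range(3):
        print(" ".join(generate(model, "the", 10, random.Random(seed))))
```

Different seeds give different sentences from the same starting token, which is the same behavior you saw when re-running the API call, just at a much smaller scale.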

The Temperature Parameter: Your Control Dial

Here's the single most important parameter you should understand: temperature.

# temperature_demo.py
# See how temperature affects output variability

from openai import OpenAI

client = OpenAI()

prompt = "Write a one-sentence description of what Python is."

for temp in [0, 0.5, 1.0, 1.5]:
    print(f"\n--- Temperature: {temp} ---")
    for i in range(3):  # Run 3 times to see variance
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            temperature=temp,
            messages=[{"role": "user", "content": prompt}]
        )
        print(f"  {i+1}: {response.choices[0].message.content}")

What you'll observe:

Temperature 0: Nearly identical outputs every run. The model effectively always picks the highest-probability token, though hosted APIs don't guarantee bit-identical results.
Temperature 0.5: Slight variations, still coherent
Temperature 1.0: More creative, occasional surprises
Temperature 1.5: Wild variations, sometimes off the rails

Think of temperature like the spice level at a restaurant. Zero is the safe, house recipe every time. Higher values let the chef improvise, sometimes inspired, sometimes questionable.
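If you want to see what the dial does mathematically, here's a minimal sketch of the standard approach: divide the model's raw scores (logits) by the temperature before applying softmax. The logits below are made up, and treating temperature 0 as "pick the top token" is my assumption about how to handle the divide-by-zero case, not a statement about any provider's internals.

```python
# temperature_math.py
# How temperature reshapes a next-token probability distribution.
# ASSUMPTION: the logits are invented, and temperature 0 is treated as
# greedy selection to avoid dividing by zero.

import math


def apply_temperature(logits: dict, temperature: float) -> dict:
    """Return token probabilities after temperature-scaled softmax."""
    if temperature == 0:
        best = max(logits, key=logits.get)
        return {tok: (1.0 if tok == best else 0.0) for tok in logits}
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical scores for the token after "Python is a programming ..."
logits = {"language": 3.0, "snake": 1.5, "comedy": 0.5}

if __name__ == "__main__":
    for temp in [0, 0.5, 1.0, 1.5]:
        probs = apply_temperature(logits, temp)
        print(temp, {tok: round(p, 3) for tok, p in probs.items()})
```

Low temperatures pile probability onto the top token; higher ones flatten the distribution so unlikely tokens get picked more often.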

When to Trust the Output

This probabilistic nature means you can't treat GenAI outputs like database queries. Here's my mental model:

Task Type	Trust Level	Verification Approach
Brainstorming	High	None needed
Drafting	Medium	Human review
Code generation	Low-Medium	Tests + code review
Factual claims	Very Low	Always verify sources
Critical decisions	None	Don't delegate these

The chef analogy again: You'd happily let them experiment with appetizer specials, but you'd want to taste-test before serving to customers, and you'd never let them guess at food allergy information.
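For the "Code generation" row, here's a minimal sketch of what "tests first, trust later" can look like. The `generated_source` string stands in for text returned by a model, and `passes_tests` is a hypothetical helper name. Note that `exec()` runs arbitrary code, so only do this in a sandbox or a throwaway environment.

```python
# verify_generated_code.py
# Gate model-written code behind your own test cases before using it.
# ASSUMPTION: generated_source is a stand-in for a model's reply.
# WARNING: exec() runs arbitrary code; use a sandbox for untrusted output.

generated_source = '''
def slugify(title):
    return "-".join(title.lower().split())
'''


def passes_tests(source: str, func_name: str, cases: list) -> bool:
    """Load the generated function and check it against (args, expected) pairs."""
    namespace = {}
    try:
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        # Syntax errors, missing functions, and runtime errors all count as failures.
        return False


cases = [
    (("Hello World",), "hello-world"),
    (("  Trim  me ",), "trim-me"),
]

if __name__ == "__main__":
    print("Accept generated code:", passes_tests(generated_source, "slugify", cases))
```

Passing tests only covers the cases you thought of, which is why the table pairs tests with code review.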

Advanced: Transformers and Emergent Abilities

Alright, let's pop the hood. If you're comfortable with software architecture, this section explains how these systems actually work.

The Transformer Architecture (The Short Version)

Before 2017, sequence models like RNNs processed text token-by-token, like reading a book one word at a time while trying to remember everything. Slow, and information from early in the sequence got fuzzy.

The transformer architecture (from the "Attention Is All You Need" paper) introduced a radical idea: drop the recurrence and process all tokens in parallel, relying entirely on a mechanism called "attention."

Attention in plain terms: Instead of reading sequentially, the model can directly look at relationships between ANY two tokens in the input. When processing "The cat sat on the mat because it was tired," attention lets the model directly connect "it" to "cat" rather than hoping that connection survives through sequential processing.
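Here's a small pure-Python sketch of scaled dot-product attention, the core operation described in the paper. The 2-dimensional vectors are hand-picked for illustration, and I reuse the same vectors as queries, keys, and values. Real models use learned projections and many attention heads, so treat this as a toy rather than a faithful implementation.

```python
# attention_sketch.py
# Scaled dot-product attention on three toy token vectors.
# ASSUMPTION: vectors are hand-picked; real models learn separate
# query/key/value projections instead of reusing one set of vectors.

import math


def softmax(xs: list) -> list:
    peak = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: list, keys: list, values: list) -> tuple:
    """Return (outputs, weights): each output is a weighted mix of the values."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[i] for wj, v in zip(w, values))
                        for i in range(len(values[0]))])
    return outputs, weights


tokens = ["cat", "mat", "it"]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # "it" is placed close to "cat"

if __name__ == "__main__":
    _, weights = attention(vectors, vectors, vectors)
    for tok, row in zip(tokens, weights):
        print(tok, [round(w, 3) for w in row])
```

In the printed row for "it", the weight on "cat" is larger than the weight on "mat", which is the direct connection the paragraph above describes. Every row is computed independently, which is what makes the parallelism possible.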

Why this matters for you:

Parallel processing → trainable on massive datasets
Attention patterns → models can handle long-range dependencies
Stacking transformer layers → each layer learns more abstract patterns

The models you're using (GPT-4, Claude, Gemini) are just really big stacks of transformer blocks, trained on really big datasets, with clever fine-tuning.

Emergent Abilities: The Weird Part

Here's something that still surprises me: abilities that "emerge" at scale without being explicitly trained.

When you train small models, they get gradually better at their training task. But at certain scale thresholds, capabilities appear that weren't in the training objective:

Chain-of-thought reasoning
Following complex multi-step instructions
In-context learning (learning from examples in the prompt)
Code debugging and generation

The pretraining objective never said "learn how to debug Python code"; it was just next-token prediction. The ability emerged from training on enough text that contained code discussions, Stack Overflow answers, and technical documentation.

This is both exciting and concerning. Exciting because we get useful capabilities "for free." Concerning because we don't fully understand when or why they emerge, or when they might fail.
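In-context learning, from the list above, is the easiest of these to try yourself. Here's a sketch that builds a few-shot prompt in the same messages format used by the API calls in this post. The helper name, the sentiment task, and the examples are all made up for illustration.

```python
# few_shot_prompt.py
# Build a few-shot prompt: the "training" happens entirely inside the prompt.
# ASSUMPTION: the task, examples, and helper name are illustrative only.


def build_few_shot_messages(examples: list, query: str) -> list:
    """Turn (input, label) pairs into alternating user/assistant messages."""
    messages = [
        {"role": "system", "content": "Classify the sentiment as positive or negative."}
    ]
    for text, label in examples:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": query})  # the real question goes last
    return messages


examples = [
    ("The deploy went smoothly.", "positive"),
    ("The build has been red all week.", "negative"),
]

if __name__ == "__main__":
    for message in build_few_shot_messages(examples, "Rollback took five minutes."):
        print(message)
```

You could pass the returned list as the `messages` argument of the chat completions call shown earlier. No weights change; the model picks up the pattern from the examples alone.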

Stress Test: Long-Context Degradation

Let's run an experiment that exposes real limitations. Models advertise large context windows (100K+ tokens), but performance isn't uniform across that window.

# long_context_stress_test.py
# Test the "Lost in the Middle" phenomenon

from openai import OpenAI

client = OpenAI()

def test_retrieval_position(needle_position: str):
    """
    Hide a fact in different positions within a long context
    and test if the model can retrieve it.
    """

    # The "needle" - a specific fact to retrieve
    needle = "The secret project code name is AURORA-7."

    # "Haystack" - filler paragraphs about various topics
    filler = """
    Cloud computing has transformed how organizations deploy applications. 
    The shift from on-premise servers to managed cloud services has enabled 
    rapid scaling and reduced operational overhead. Major providers include 
    AWS, Azure, and Google Cloud Platform, each with distinct strengths.
    """ * 20  # Repeat to create bulk

    # Construct the context based on position
    if needle_position == "start":
        context = needle + "\n\n" + filler
    elif needle_position == "middle":
        half = len(filler) // 2
        context = filler[:half] + "\n\n" + needle + "\n\n" + filler[half:]
    else:  # end
        context = filler + "\n\n" + needle

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {"role": "user", "content": f"""
Here is a document:

{context}

Question: What is the secret project code name?
Answer with just the code name, nothing else.
"""}
        ]
    )

    return response.choices[0].message.content

# Test all positions
for pos in ["start", "middle", "end"]:
    result = test_retrieval_position(pos)
    print(f"Needle at {pos}: Retrieved '{result}'")

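What you'll likely see: With a haystack this small (20 repeats of one short paragraph, a few thousand characters), the model will probably retrieve the needle at every position. The effect tends to show up as the context grows toward the model's limit: models generally perform better when the key information is at the start or end of the context, and worse when it's buried in the middle. This is the Lost in the Middle phenomenon, and it has real implications for how you structure prompts and RAG systems. To look for it, increase the repeat count substantially and rerun.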
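If you want a finer-grained experiment than start/middle/end, here's a sketch of a depth sweep that inserts the needle at paragraph boundaries and records whether the answer comes back. The model call is passed in as a function so the harness can be tested offline. `fake_ask` is a deliberately crude stand-in that only "sees" the edges of the context; it is an assumption for demo purposes, not a model of how real LLMs behave.

```python
# needle_depth_sweep.py
# Sweep the needle through different depths of a long context.
# ASSUMPTION: fake_ask is a crude offline stand-in for a real model call.

from typing import Callable


def build_context(paragraphs: list, needle: str, depth: float) -> str:
    """Insert the needle between paragraphs; depth 0.0 = start, 1.0 = end."""
    index = round(depth * len(paragraphs))
    parts = paragraphs[:index] + [needle] + paragraphs[index:]
    return "\n\n".join(parts)


def sweep(paragraphs: list, needle: str, answer: str,
          ask: Callable[[str], str], depths: list) -> dict:
    """Map each depth to True/False: did the reply contain the answer?"""
    results = {}
    for depth in depths:
        reply = ask(build_context(paragraphs, needle, depth))
        results[depth] = answer in reply
    return results


def fake_ask(context: str) -> str:
    """Pretend model that only reads the first and last 200 characters."""
    visible = context[:200] + context[-200:]
    return "AURORA-7" if "AURORA-7" in visible else "unknown"


if __name__ == "__main__":
    filler = [f"Paragraph {i} about cloud computing. " * 3 for i in range(50)]
    needle = "The secret project code name is AURORA-7."
    print(sweep(filler, needle, "AURORA-7", fake_ask, [0.0, 0.25, 0.5, 0.75, 1.0]))
```

To run it against a real model, replace `fake_ask` with a function that wraps the chat completions call from the script above and returns the reply text.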

The Honest Summary

Generative AI is genuinely transformative technology, and it's also genuinely overhyped.

What's real:

These models can generate useful text, code, and creative content
They can adapt to new tasks via prompting without retraining
They're getting better fast: what fails today might work next quarter

What's marketing:

"It understands": No, it predicts based on patterns
"It reasons": No, it mimics reasoning patterns from training data
"It will replace X": It changes how X is done, rarely replaces it entirely

The engineers who thrive with GenAI are the ones who understand both: who leverage the real capabilities while building guardrails around the limitations.

Next up in this series: Prompt Engineering foundations, covering how to actually communicate effectively with these systems.

Quick Reference

Concept	What It Means
GenAI	AI that creates new content by predicting what comes next
Temperature	Controls randomness (0 = near-deterministic, higher = more random)
Token	Basic unit of text processing (~4 chars in English)
Transformer	Architecture that processes all tokens in parallel via attention
Emergence	Capabilities that appear at scale without explicit training
Hallucination	Confident generation of plausible but false information

FAQs

Q: Is GenAI just a more sophisticated search engine?

No, and this confusion causes a lot of problems. Search engines retrieve existing information. GenAI generates new text that may or may not reflect real information. When you ask ChatGPT a question, it's not looking anything up. It's constructing an answer based on patterns. That's why it can confidently state things that don't exist. Treat it like a creative collaborator who's well-read but occasionally makes stuff up, not like a factual reference.

Q: Should I be worried about my job as a developer?

I've been using GenAI heavily for about a year now. My honest take: it changes what I spend time on, not whether I'm needed. I write less boilerplate, but I spend more time on architecture, review, and verification. The developers who struggle are those who either refuse to use these tools OR blindly trust their output. The sweet spot is treating GenAI like a very fast junior developer who needs supervision.

Q: How do I know which model to use?

Start with the cheapest one that works for your task. For most things, smaller models like GPT-4o-mini or Claude Haiku are fine. Graduate to larger models (GPT-4, Claude Opus) when you hit quality limits. I use Haiku for simple tasks, Sonnet for most coding, and Opus for complex reasoning. Your token bill will thank you.
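"Start with the cheapest one that works" can be turned into a simple escalation loop. This is a sketch under assumptions: the tier names are placeholders rather than real model IDs, `run` stands in for your actual API call, and `good_enough` is whatever quality check fits your task.

```python
# model_escalation.py
# Try models from cheapest to most expensive; stop at the first acceptable output.
# ASSUMPTION: tier names are placeholders and fake_run replaces a real API call.

from typing import Callable, Optional, Tuple


def cheapest_model_that_works(
    models: list,
    run: Callable[[str], str],
    good_enough: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (model, output) for the first model whose output passes the check."""
    for model in models:  # ordered cheapest first
        output = run(model)
        if good_enough(output):
            return model, output
    return None, None  # nothing passed: time for a human


def fake_run(model: str) -> str:
    """Offline stand-in: the small tier gives up, the larger tier answers."""
    return {"small-tier": "I'm not sure.", "large-tier": "42"}[model]


if __name__ == "__main__":
    tiers = ["small-tier", "large-tier"]
    print(cheapest_model_that_works(tiers, fake_run, lambda out: out.isdigit()))
```

The quality check is the hard part. For code it can be a test suite; for prose it usually ends up being a human spot check.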

