If you are a Python beginner or a general developer, your code probably looks like this most days:
- Take some input (JSON, CSV, form data).
- Apply clear, hand‑written rules (if/else, loops, functions).
- Return a predictable output (a number, a label, a response object).
This works great when the problem is well‑defined.
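As a small illustration of that input, rules, output pattern, here is a minimal sketch. The function name `classify_ticket`, the `subject` field, and the labels are made up for this example.

```python
def classify_ticket(ticket: dict) -> str:
    """Label a support ticket using clear, hand-written rules."""
    subject = ticket.get("subject", "").lower()
    if "refund" in subject or "invoice" in subject:
        return "billing"
    if "password" in subject or "login" in subject:
        return "account"
    return "general"


print(classify_ticket({"subject": "Cannot login after password reset"}))  # account
```

Every case the function handles is one somebody wrote down in advance, which is exactly why the output is predictable.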
Now think about the “messy” tasks around you:
- Summarise 100 support tickets into 5 clear bullet points.
- Turn a rough feature list into a friendly email for a potential client.
- Scan a few blog posts and give a quick “pros vs cons” summary of a framework.
You could fight these with regexes, templates, and a lot of if statements… but it would be painful and fragile.
Generative AI is built for exactly this kind of fuzzy, language‑heavy work.
Instead of encoding every rule, you send the model some text (a prompt) and it generates the output for you.
At its core, a generative model repeats one simple idea:
- Look at all the text it has so far (your prompt and any previous output).
- Predict what the next tiny piece of text should be.
- Add that piece to the text.
- Repeat until the model signals that it is done or it reaches a length limit you set.
That “predict the next piece, again and again” loop is how we go from a single prediction to paragraphs, summaries, and code.
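The loop can be sketched in a few lines of Python. This is a toy, not a real model: the `NEXT_WORD_PROBS` table is invented, it works on whole words rather than tokens, and it only looks at the last word, whereas a real LLM conditions on all the text so far.

```python
# Toy "model": for each word, made-up probabilities for the word that follows.
NEXT_WORD_PROBS = {
    "the": {"model": 0.6, "prompt": 0.4},
    "model": {"predicts": 0.7, "stops": 0.3},
    "predicts": {"text": 0.9, "<end>": 0.1},
    "text": {"<end>": 1.0},
}


def predict_next(words):
    """Pick the most likely next word given the text so far (greedy choice)."""
    options = NEXT_WORD_PROBS.get(words[-1], {"<end>": 1.0})
    return max(options, key=options.get)


def generate(prompt, max_new_words=10):
    """Predict a piece, attach it, and repeat until a stop signal or the limit."""
    words = prompt.split()
    if not words:
        raise ValueError("prompt must contain at least one word")
    for _ in range(max_new_words):
        next_word = predict_next(words)
        if next_word == "<end>":  # the toy model's stop signal
            break
        words.append(next_word)
    return " ".join(words)


print(generate("the"))  # the model predicts text
```

The `max_new_words` argument plays the role of the length limit most APIs let you set, so generation always ends even if the stop signal never comes.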
LLM 101: what changed and why transformers matter
When people talk about text‑based generative AI today, they are usually talking about large language models (LLMs).
A large language model is a deep neural network trained on a huge amount of text (web pages, books, code, and more) so that it can:
- Understand your prompt in natural language.
- Predict the next pieces of text in a way that feels coherent and useful.
Modern LLMs are usually built with the transformer architecture, introduced in the paper “Attention Is All You Need”. The key idea is the attention mechanism:
- Older sequence models read text strictly left‑to‑right and tended to lose track of the start.
- With attention, the model can look at many parts of the input at once and decide which words or phrases matter most for the next prediction.
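For the curious, the "decide which parts matter most" step can be sketched as scaled dot-product attention weights. This is a heavily simplified illustration with made-up two-number vectors. Real transformers use learned projections, many attention heads, and much larger vectors.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Score every key against the query, then normalise the scores."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


# Made-up vectors: one query and three earlier positions in the text.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
for position, weight in enumerate(attention_weights(query, keys)):
    print(position, round(weight, 3))
```

The position whose vector is most similar to the query gets the largest weight, regardless of how far back in the text it sits.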
You do not need the math to be productive. The mental model you need is:
“An LLM is a big text predictor, powered by transformers, that is very good at reading and writing language.”
Under the hood, it is the same loop as before: take the existing text, predict the next tiny piece, attach it, and repeat.
Tokens and context windows (the two numbers you should care about)
Two concepts matter a lot when you start calling these models from Python: tokens and context window.
Tokens
Models do not work directly on raw strings.
They break text into tokens: small pieces that roughly correspond to words or parts of words.
- Your prompt becomes a sequence of tokens.
- The model predicts one token at a time.
- The output is also a sequence of tokens, which is then turned back into text.
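The round trip from text to tokens and back can be sketched with a toy tokenizer. This is an assumption-laden illustration: it splits on words, punctuation, and whitespace with a regular expression, while real tokenizers use learned subword vocabularies (for example byte-pair encoding) and produce different pieces.

```python
import re


def encode(text, vocab):
    """Split text into pieces and map each piece to an integer ID."""
    pieces = re.findall(r"\w+|[^\w\s]|\s+", text)
    # New pieces get the next free ID; known pieces reuse their ID.
    return [vocab.setdefault(piece, len(vocab)) for piece in pieces]


def decode(ids, vocab):
    """Map IDs back to pieces and join them into text."""
    id_to_piece = {i: piece for piece, i in vocab.items()}
    return "".join(id_to_piece[i] for i in ids)


vocab = {}
ids = encode("Hello, world! Hello", vocab)
print(ids)                 # [0, 1, 2, 3, 4, 2, 0]
print(decode(ids, vocab))  # Hello, world! Hello
```

The model only ever sees the list of integers. The text you read is reconstructed from them at the end.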
Simple intuition:
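A token is a small chunk of text: a short word, part of a longer word, or a punctuation mark. For English, one token is very roughly four characters, though the exact split depends on the model's tokenizer.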
Most API limits and pricing are defined in tokens, not characters.
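Because limits and prices are counted in tokens, a rough estimate before you send a prompt can be handy. The sketch below assumes about four characters per token, which is only a rule of thumb for English text, and the price used in the example is invented. For exact counts, use the tokenizer that ships with your model provider.

```python
import math

CHARS_PER_TOKEN = 4  # rough rule of thumb for English; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Approximate the number of tokens in a piece of text."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(text: str, price_per_1k_tokens: float) -> float:
    """Approximate the cost of sending this text, given a price per 1,000 tokens."""
    return estimate_tokens(text) / 1000 * price_per_1k_tokens


prompt = "Summarise these support tickets into five bullet points."
print(estimate_tokens(prompt))
print(estimate_cost(prompt, price_per_1k_tokens=0.5))  # hypothetical price
```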
Context window
The context window is how many tokens the model can “see” at once. That total includes:
- Your current prompt.
- Any previous messages you send as history.
- The model’s latest output.
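If the total number of tokens goes beyond the context window, something has to give: many APIs reject the request outright, and chat applications typically drop the oldest messages so that those tokens fall out of view.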
This is why long chats sometimes “forget” earlier details: those tokens are no longer inside the window.
As a developer, this means:
- Very long prompts and very long responses both consume the same context budget.
- You often need to summarise or trim older content.
- If you ignore the context window, you get errors or confused models.
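One simple way to stay inside the budget is to keep only the most recent messages that fit. The following is a minimal sketch: `trim_history` and its parameters are hypothetical names, and the token count reuses the rough four-characters-per-token estimate rather than a real tokenizer.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about four characters per token (an assumption)."""
    return math.ceil(len(text) / 4)


def trim_history(messages, max_tokens, reserve_for_reply):
    """Keep the newest messages that fit, leaving room for the model's output."""
    budget = max_tokens - reserve_for_reply
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > budget:
            break  # this message and everything older falls out of the window
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a" * 40, "b" * 40, "c" * 40]  # about 10 tokens each
print(len(trim_history(history, max_tokens=30, reserve_for_reply=10)))  # 2
```

Reserving part of the window for the reply reflects the point above: the prompt and the response share one budget. A more careful version would summarise the dropped messages instead of discarding them.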
Simple LLM calls: limits and why agents matter
If you send one prompt to a model and print the answer, you already have a simple generative AI app. That pattern is enough for tasks like:
- “Summarise this document.”
- “Rewrite this paragraph to sound more professional.”
- “Explain this error message in simple language.”
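In code, this pattern is little more than building a prompt string and passing it to a model. The sketch below uses `call_model` as a hypothetical stand-in that returns canned text, so it runs without any account or network access. In a real app you would replace it with your provider's client call.

```python
def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM client; returns canned text."""
    return f"[model reply to a prompt of {len(prompt)} characters]"


def build_prompt(task: str, text: str) -> str:
    """Combine the instruction and the input text into one prompt."""
    return f"{task}\n\n---\n{text}\n---"


def run_task(task: str, text: str, model=call_model) -> str:
    """One prompt in, one answer out."""
    return model(build_prompt(task, text))


print(run_task("Summarise this document.", "Quarterly sales rose while support tickets fell."))
```

Passing the model in as a parameter keeps the example testable and makes it easy to swap providers later.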
Useful, but limited:
- The model cannot call your systems by itself (APIs, databases, other Python functions).
- It does not remember anything between separate requests unless you build that.
- It does not plan multi‑step workflows on its own.
- It can “hallucinate” and invent facts if it does not have access to real data.
To get past these limits in real products, you can wrap the model in an agent.
An agent is still powered by an LLM, but it adds three key pieces around it:
- Tools – functions and services it can use (web search, databases, calculators, CRMs, internal APIs).
- Memory – storage for past conversations, user preferences, leads, documents.
- Orchestration – the workflow logic: which tool to call, when to ask a clarification, when to look up context, when to stop.
You can think of it like this:
- Plain LLM: prompt → model → one answer.
- Agent: goal → model + tools + memory + workflow → series of actions → reliable result.
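To make the agent shape concrete, here is a minimal sketch with all three pieces: a tool, a memory list, and a small orchestration loop. Everything in it is hypothetical. `fake_model` is a hard-coded stand-in for the LLM, and a real agent would let the model choose the next action from its own output.

```python
def calculator(expression: str) -> str:
    """Tool: add numbers written like '2 + 3 + 4'."""
    return str(sum(float(part) for part in expression.split("+")))


TOOLS = {"calculator": calculator}


def fake_model(goal, memory):
    """Stand-in for the LLM: picks the next action from the goal and memory."""
    if not memory:
        return {"action": "calculator", "input": goal}
    return {"action": "finish", "input": f"The total is {memory[-1]}"}


def run_agent(goal, model=fake_model, tools=TOOLS, max_steps=5):
    """Orchestration: ask the model, run the chosen tool, remember the result."""
    memory = []
    for _ in range(max_steps):
        decision = model(goal, memory)
        if decision["action"] == "finish":
            return decision["input"]
        tool = tools[decision["action"]]
        memory.append(tool(decision["input"]))
    return "Stopped: step limit reached"


print(run_agent("2 + 3 + 4"))  # The total is 9.0
```

The `max_steps` limit is the "when to stop" part of orchestration: it keeps the loop from running forever if the model never decides it is finished.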



