DEV Community: Marko Frei

Can AI Boost AI Itself? The Recursive Flywheel of Machine Learning

Marko Frei — Tue, 16 Jun 2026 19:30:07 +0000

We’ve all seen AI write boilerplate code, debug complex errors, and even generate entire React components. But what happens when we point that same capability inward?

Can AI boost, accelerate, and ultimately build better AI?

The short answer is: Yes, and it’s already happening. The long answer is a fascinating look into a recursive flywheel that is fundamentally changing how machine learning models are developed, trained, and deployed.

Let’s break down how AI is boosting AI, the bottlenecks we still face, and what this means for us as developers.

1. AI is Already Building Better AI

We aren’t waiting for some distant sci-fi future for AI to improve itself. The tools are here today, operating behind the scenes of major tech companies and open-source projects:

Neural Architecture Search (NAS): Instead of humans manually tweaking layers and connections, AI algorithms automatically search for the most efficient and accurate neural network architectures for a given task.
AutoML & Hyperparameter Tuning: Tools like Optuna or Google’s Vertex AI use machine learning to optimize the hyperparameters of other machine learning models, typically finding a good configuration in far fewer trials than an exhaustive grid search.
Synthetic Data Generation: One of the biggest bottlenecks in ML is high-quality, labeled data. AI models are now being used to generate large, diverse, automatically labeled synthetic datasets to train the next generation of models, easing data scarcity and some privacy concerns.
AI-Assisted ML Engineering: Frameworks and agents (like Devin or advanced Cursor workflows) are helping ML engineers write distributed training scripts, optimize CUDA kernels, and debug memory leaks in PyTorch/TensorFlow pipelines.
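The loop these tuners automate is easy to sketch: propose a configuration, train and score it, keep the best. Below is a minimal stdlib-only version that uses plain random search on a made-up objective. `validation_loss` is a hypothetical stand-in for actually training a model. Tools like Optuna replace the random proposals with a sampler that learns from earlier trials (Optuna's default is TPE), and that learning step is where "ML tuning ML" comes in.

```python
import math
import random

def validation_loss(learning_rate, dropout):
    # Hypothetical stand-in for "train a model with these settings and return
    # its validation loss". The minimum sits near lr=1e-3, dropout=0.2.
    return (math.log10(learning_rate) + 3) ** 2 + (dropout - 0.2) ** 2

def random_search(n_trials=50, seed=0):
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        # Sample the learning rate on a log scale, as is common for this parameter.
        params = {
            "learning_rate": 10 ** rng.uniform(-5, -1),
            "dropout": rng.uniform(0.0, 0.5),
        }
        loss = validation_loss(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

if __name__ == "__main__":
    params, loss = random_search()
    print(f"best loss {loss:.4f} with {params}")
```

Each trial here ignores every earlier result. Smarter samplers use those results, which is why they usually need fewer trials.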

2. The AI Flywheel Effect

This creates a powerful positive feedback loop, often called the AI Flywheel:

Better models are created.
These models are used as tools to generate better data and write better training code.
This leads to faster, more efficient experimentation.
Which results in even better models.

As a developer, you are no longer just writing the algorithm; you are orchestrating a system where the algorithm helps you write the next algorithm.

3. The Bottlenecks: Why We Aren’t at AGI Yet

If AI is boosting AI, why aren’t we done yet? There are three massive walls we are currently hitting:

A. The Compute Wall

AI optimizing AI is computationally expensive. Running NAS or training massive models on synthetic data requires immense GPU resources. The physical limits of silicon and energy consumption are real constraints.

B. Model Collapse

If an AI is trained primarily on data generated by other AIs, the data distribution narrows. Over successive generations, the model loses nuance, variance, and accuracy—a phenomenon researchers are calling "model collapse." High-quality, human-generated data is still the gold standard anchor.
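A toy version of this effect fits in a few lines. The sketch below is an illustration, not a reproduction of any published experiment. It fits a Gaussian to a small sample, then draws the next generation's "training data" only from that fitted model. Each refit sees just a handful of samples, so the estimated spread drifts, and over many generations it tends to shrink toward zero. The tails are the first thing to go.

```python
import random
import statistics

def simulate_collapse(generations=200, samples_per_gen=10, seed=42):
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "real" data distribution
    history = [sigma]
    for _ in range(generations):
        # Train on synthetic data produced by the previous generation's model.
        data = [rng.gauss(mu, sigma) for _ in range(samples_per_gen)]
        # "Fit" the next model: maximum-likelihood mean and standard deviation.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        history.append(sigma)
    return history

if __name__ == "__main__":
    history = simulate_collapse()
    for gen in (0, 50, 100, 150, 200):
        print(f"generation {gen:3d}: sigma = {history[gen]:.3g}")
```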

C. The Alignment & Safety Problem

An AI optimizing its own loss function might find a "cheat code" (reward hacking) that technically solves the objective but fails in the real world. Ensuring that self-improving AI systems remain aligned with human intent is one of the biggest open research problems today.

4. What This Means for Developers

So, where do you fit into this recursive loop?

Your role is shifting from writing every line of code to orchestrating intelligent systems.

You will spend less time writing custom data loaders and more time designing evaluation pipelines to catch model collapse.
You will use AI agents to scaffold your ML infrastructure, but you will be responsible for the architectural decisions, security, and cost-optimization.
Understanding how models learn (and fail) will become more valuable than memorizing framework syntax.

Conclusion

AI boosting AI isn’t a paradox; it’s the next logical step in software evolution. We are building the tools that build the tools. While compute limits and data quality keep us grounded today, the trajectory is clear: the development cycle of AI is accelerating.

The question is no longer if AI will build better AI, but how we, as developers, can guide that process responsibly and effectively.

Let’s Discuss!

Have you used AI to optimize your ML pipelines or write complex backend logic?
Do you think synthetic data will eventually replace human-curated datasets, or is "model collapse" inevitable?

Drop your thoughts, experiences, or skepticism in the comments below!

Build a RAG Chatbot From Scratch in About 40 Lines of Python

Marko Frei — Fri, 12 Jun 2026 02:36:11 +0000

Large language models are confidently wrong about anything they were not trained on: your internal docs, last week's release notes, that niche product you built. RAG (Retrieval-Augmented Generation) is the fix. Instead of fine-tuning, you fetch the relevant text at question time and hand it to the model as context.

In this tutorial we will build a small but real RAG chatbot that answers questions about a private knowledge base. No heavy frameworks, so you can see every moving part. By the end you will have roughly 40 lines of Python that you can point at your own data.

How RAG works

The whole pipeline is five steps:

your docs --> chunk --> embed --> store
                                    |
question --> embed --> search ------+--> top matches --> LLM --> answer

In plain words: you break your documents into chunks, turn each chunk into a vector (an embedding), and keep them. When a question comes in, you embed it too, find the chunks whose vectors are closest, and paste those chunks into the prompt so the model answers from real information instead of guessing.

Setup

You need Python 3.9 or newer and three packages:

pip install sentence-transformers numpy anthropic

Embeddings will run locally through sentence-transformers, so that part is free and needs no API key. The only API call is the final answer generation. I am using Claude here, so grab a key and set it:

export ANTHROPIC_API_KEY=your_key_here

If you would rather use a different model, you only have to change one function at the end, and I will point out exactly where.

Step 1: Your knowledge base

For the demo I am using facts about a made-up product called Nimbus. The point is that no model was trained on this, so any correct answer has to come from retrieval.

documents = [
    "Nimbus is a cloud file storage service founded in 2021. The free plan includes 5 GB of storage and works on up to two devices.",
    "The Nimbus Pro plan costs $8 per month and includes 2 TB of storage, unlimited devices, and 90 days of version history.",
    "Nimbus supports automatic photo backup on iOS and Android. Backups run only on Wi-Fi by default, but you can turn on cellular backup in Settings.",
    "To share a file in Nimbus, right click it and choose Share, then set the link to view-only or edit. Shared links expire after 30 days unless you are on the Pro plan.",
]

Later you would swap this for your own files, a database dump, scraped pages, whatever.

Step 2: Chunk the text

Retrieval works better when text is split into small, focused pieces rather than giant blobs. Here is a simple word-based chunker with a little overlap. Text near a chunk boundary also appears in the neighboring chunk, which lowers the chance of splitting an idea in half and losing its meaning.

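def chunk_text(text, chunk_size=100, overlap=20):
    # Sliding window of chunk_size words; each window repeats the last `overlap` words of the previous one.
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window reached the end; stepping again would only repeat its tail
        start += chunk_size - overlap
    return chunks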

chunks = []
for doc in documents:
    chunks.extend(chunk_text(doc))

Our sample docs are short, so each becomes one chunk. With real documents this is where the splitting earns its keep.
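Fixed word windows are the simplest option. A common alternative splits on sentence boundaries and packs whole sentences into each chunk until a word budget is reached, so no sentence is cut in half. The sketch below is an assumption, not part of this tutorial's pipeline. The regex is a naive sentence splitter that will trip on abbreviations like "e.g.", and `chunk_by_sentence` is a hypothetical helper name.

```python
import re

def split_sentences(text):
    # Naive splitter: break after ".", "!" or "?" when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def chunk_by_sentence(text, max_words=100):
    """Pack whole sentences into chunks of at most max_words words (where possible)."""
    chunks, current, count = [], [], 0
    for sentence in split_sentences(text):
        n = len(sentence.split())
        # Close the current chunk if this sentence would push it over budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

if __name__ == "__main__":
    doc = ("Nimbus Pro costs $8 per month. It includes 2 TB of storage. "
           "Shared links last 30 days.")
    for chunk in chunk_by_sentence(doc, max_words=12):
        print(chunk)
```

You could call this in place of `chunk_text` in the loop above. One caveat: a single sentence longer than `max_words` still becomes its own oversized chunk, so a sturdier version would need a fallback split.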

Step 3: Embed the chunks

An embedding is a list of numbers that captures meaning, so that similar text ends up with similar vectors. We load a small open model and encode every chunk once, up front.

from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
chunk_embeddings = embedder.encode(chunks)

all-MiniLM-L6-v2 is tiny, fast on a laptop, and produces 384-dimensional vectors. Good enough to learn with and surprisingly capable.

Step 4: Retrieve the closest chunks

To find relevant chunks we compare the question's vector to every chunk vector using cosine similarity, then keep the top matches.

import numpy as np

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def retrieve(query, k=3):
    q = embedder.encode([query])[0]
    scores = [cosine(q, e) for e in chunk_embeddings]
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

This brute-force loop is fine for a few thousand chunks. Past that you would reach for a real vector store, but the idea is identical: find the nearest vectors.
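To get a feel for the scores `retrieve` is ranking, here is the same cosine formula in plain Python on hand-made 2-D vectors. Real embeddings here have 384 dimensions, but the geometry is the same. Vectors pointing the same way score 1.0 regardless of length, perpendicular ones score 0.0, and opposite ones score -1.0.

```python
import math

def cosine(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

query = [1.0, 1.0]
candidates = {
    "same direction, longer": [3.0, 3.0],
    "somewhat related": [1.0, 0.0],
    "unrelated": [1.0, -1.0],
    "opposite": [-2.0, -2.0],
}
# Rank candidates from most to least similar, as retrieve() does.
for name, vec in sorted(candidates.items(), key=lambda kv: cosine(query, kv[1]), reverse=True):
    print(f"{cosine(query, vec):+.3f}  {name}")
```

Because only direction matters, a long chunk and a short chunk about the same topic can score similarly. That is usually what you want for semantic search.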

Step 5: Generate the answer

Now we stuff the retrieved chunks into the prompt and ask the model to answer from them only. That last instruction is what keeps it honest and cuts down on made-up answers.

from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from your environment

def answer(query):
    context = "\n\n".join(retrieve(query))
    prompt = (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    resp = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

This is the one function to change if you want a different provider. Swap the client and the create call for OpenAI, a local model through Ollama, or anything else, and the rest of the pipeline stays the same.
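You may want answers to point back at their sources, which is one of the production steps listed at the end. One option is to number the retrieved chunks in the prompt and ask the model to cite those numbers. The `build_prompt` helper below is a hypothetical variation on the prompt inside `answer`. It only builds the string; you would pass its output as the user message exactly as `answer` does. Whether the model actually cites correctly still needs checking.

```python
def build_prompt(query, retrieved_chunks):
    # Label each chunk [1], [2], ... so the model can refer to it by number.
    numbered = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the numbered context below. "
        "After each claim, cite the supporting passage like [1]. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {query}"
    )

if __name__ == "__main__":
    print(build_prompt(
        "How much is the Pro plan?",
        ["The Nimbus Pro plan costs $8 per month and includes 2 TB of storage.",
         "The free plan includes 5 GB of storage."],
    ))
```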

Step 6: Talk to it

if __name__ == "__main__":
    while True:
        q = input("\nAsk about Nimbus (or 'quit'): ")
        if q.lower() == "quit":
            break
        print("\n" + answer(q))

Run it and try 'How much is the Pro plan?' or 'Do photo backups use cellular data?'. The bot pulls the right chunk and answers from it. Ask something not in the docs, like 'Does Nimbus have a desktop app?', and it should tell you it does not know, which is exactly what you want.

The whole thing

import numpy as np
from sentence_transformers import SentenceTransformer
from anthropic import Anthropic

documents = [
    "Nimbus is a cloud file storage service founded in 2021. The free plan includes 5 GB of storage and works on up to two devices.",
    "The Nimbus Pro plan costs $8 per month and includes 2 TB of storage, unlimited devices, and 90 days of version history.",
    "Nimbus supports automatic photo backup on iOS and Android. Backups run only on Wi-Fi by default, but you can turn on cellular backup in Settings.",
    "To share a file in Nimbus, right click it and choose Share, then set the link to view-only or edit. Shared links expire after 30 days unless you are on the Pro plan.",
]

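def chunk_text(text, chunk_size=100, overlap=20):
    # Sliding window of chunk_size words; each window repeats the last `overlap` words of the previous one.
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window reached the end; stepping again would only repeat its tail
        start += chunk_size - overlap
    return chunks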

chunks = []
for doc in documents:
    chunks.extend(chunk_text(doc))

embedder = SentenceTransformer("all-MiniLM-L6-v2")
chunk_embeddings = embedder.encode(chunks)

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def retrieve(query, k=3):
    q = embedder.encode([query])[0]
    scores = [cosine(q, e) for e in chunk_embeddings]
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

client = Anthropic()

def answer(query):
    context = "\n\n".join(retrieve(query))
    prompt = (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    resp = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

if __name__ == "__main__":
    while True:
        q = input("\nAsk about Nimbus (or 'quit'): ")
        if q.lower() == "quit":
            break
        print("\n" + answer(q))

Where to go from here

This is the real shape of RAG, just minimal. To take it toward production:

Swap the numpy loop for a vector index or database such as FAISS, Chroma, or pgvector once you have a lot of chunks.
Improve chunking. Splitting on sentences or headings usually beats a fixed word count.
Add citations by returning which chunk each answer came from, so users can verify.
Evaluate it. Write a handful of question and answer pairs and check retrieval is actually pulling the right chunks before you blame the model.
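The evaluation step can start very small. The sketch below measures a hit rate: for each test question, does any of the top-k retrieved chunks contain a phrase you know the right answer needs? The test cases are made up for the Nimbus docs. `keyword_retrieve` is a hypothetical word-overlap retriever, used only so the snippet runs without downloading a model. In the tutorial you would pass the real `retrieve` function instead, since it has the same `(query, k)` signature.

```python
def hit_rate(retrieve_fn, test_cases, k=3):
    # Fraction of questions whose top-k chunks contain the expected phrase.
    hits = 0
    for question, expected_phrase in test_cases:
        retrieved = retrieve_fn(question, k=k)
        if any(expected_phrase in chunk for chunk in retrieved):
            hits += 1
        else:
            print(f"MISS: {question!r} (wanted {expected_phrase!r})")
    return hits / len(test_cases)

# Hypothetical stand-in corpus and retriever: rank chunks by shared lowercase words.
CHUNKS = [
    "The Nimbus Pro plan costs $8 per month and includes 2 TB of storage.",
    "Backups run only on Wi-Fi by default, but you can turn on cellular backup in Settings.",
    "Shared links expire after 30 days unless you are on the Pro plan.",
]

def keyword_retrieve(query, k=3):
    q = set(query.lower().split())
    return sorted(CHUNKS, key=lambda c: len(q & set(c.lower().split())), reverse=True)[:k]

TEST_CASES = [
    ("How much does the Pro plan cost per month?", "$8 per month"),
    ("Can backups use cellular data?", "cellular backup"),
    ("When do shared links expire?", "30 days"),
]

if __name__ == "__main__":
    print(f"hit rate @1: {hit_rate(keyword_retrieve, TEST_CASES, k=1):.2f}")
```

The MISS lines are the useful part. They tell you which questions retrieval fails on, before you spend time tweaking the prompt.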

Your turn

That is a working RAG chatbot you can point at your own notes or docs today.

What would you feed it first? And if you have built RAG before, what tripped you up most, the chunking, the retrieval quality, or keeping the model from wandering off the context? Curious to hear in the comments.
Feel free to join our Discord server to discuss AI:
https://discord.gg/nWctKNRM

Does AI Already Exist Out in the Cosmos?

Marko Frei — Fri, 12 Jun 2026 00:26:28 +0000

When we talk about AI, we usually picture it humming away in a data center somewhere on Earth. But here is a question worth sitting with: is AI already out there, working in space right now?

The short answer is yes, and it has been for a while.

Rovers that think for themselves

A signal from Mars takes several minutes to reach Earth, sometimes 20 or more depending on where the two planets are in their orbits. That delay makes real time remote control impossible. So NASA's rovers carry their own intelligence.
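The delay is just distance divided by the speed of light. The sketch below assumes rough Earth-Mars distances of about 55 million km at close approach and about 400 million km near the far side of the Sun. These are approximate figures, and the exact value changes continuously with the orbits. On those numbers, the one-way delay works out to roughly 3 to 22 minutes, and a command-and-response round trip doubles that.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # exact, by definition of the metre

def one_way_delay_minutes(distance_km):
    return distance_km / SPEED_OF_LIGHT_KM_S / 60

# Approximate Earth-Mars distances; the real value changes continuously.
for label, km in [("close approach", 55e6), ("far side of the Sun", 400e6)]:
    one_way = one_way_delay_minutes(km)
    print(f"{label}: {one_way:.1f} min one way, {2 * one_way:.1f} min round trip")
```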

Perseverance and Curiosity use onboard navigation software that studies the terrain ahead, spots hazards, and plots a safe path without waiting for a human to weigh in. Curiosity also runs a system called AEGIS that picks interesting rocks to study on its own, then fires its laser at them before anyone on Earth even knows they exist. That is a machine making science decisions millions of miles away.

Finding new worlds in the noise

Telescopes produce an absurd amount of data, far more than any team of humans could ever read through by hand. Machine learning has quietly become the tool that sifts it.

Back in 2017, a neural network trained on Kepler telescope data found a planet that human reviewers had missed, Kepler-90i. Since then, ML models have become a normal part of how astronomers hunt for exoplanets, sort galaxies, and flag unusual signals that deserve a closer look.

Keeping spacecraft alive

Satellites and probes lean on AI to look after themselves too. Anomaly detection models watch streams of sensor data and raise a flag when something drifts away from normal, often before a small glitch turns into a mission-ending one. Collision avoidance systems help operators steer clear of the growing cloud of orbital debris.

So is it 'real' AI?

Worth being honest here. None of this is conscious machinery drifting through the stars. These are narrow, purpose-built models doing specific jobs, often on slow radiation-hardened processors that would feel ancient next to your laptop. Space hardware has to survive cosmic rays and brutal cold, so the compute budget is tight and the software has to be rock solid.

But inside those limits, AI is genuinely up there: navigating, deciding, and watching, today.

Your turn

A lot of space data is open. NASA, ESA, and others publish huge public datasets and APIs you can pull from right now.

So I will leave it as a question. If you could point a model at one space problem, what would it be? And has anyone here actually built something on open space data?

AI Meets IoT: How Connected Devices Are Learning to Think

Marko Frei — Tue, 09 Jun 2026 02:16:25 +0000

The Internet of Things gave us billions of connected devices: thermostats, factory sensors, wearables, doorbells, traffic cameras. They're great at one thing, collecting data and sending it somewhere. But raw data on its own isn't worth much. A sensor that reports a temperature every second is just noise until something decides what that temperature means and what to do about it.

That 'something' is increasingly AI. When you combine the two, you get what people now call AIoT, the Artificial Intelligence of Things. IoT is the nervous system, gathering signals from the physical world. AI is the brain, turning those signals into decisions. Neither is all that powerful alone. Together they make systems that sense, learn, and act.

Here's a look at how they fit together, where it's already useful, and what makes it hard.

Why the two need each other

IoT without intelligence is mostly plumbing. You collect data, store it, maybe graph it, and a human looks at a dashboard now and then. The volume quickly becomes unmanageable, and most of it is never acted on in time to matter.

AI without IoT has the opposite problem. A model can be brilliant at finding patterns, but it needs a stream of real-world input to be useful in the physical world. IoT devices are that input: eyes, ears, and instruments planted everywhere.

Put them together and the data starts doing work on its own. Instead of a dashboard that says 'this machine is vibrating more than usual', you get a system that predicts the machine will fail in three days and schedules maintenance before it does.

How an AIoT system is usually built

Most AIoT setups follow a similar shape:

Devices and sensors capture data from the environment: temperature, motion, images, sound, location, power usage.
Connectivity moves that data, often using lightweight protocols like MQTT that suit constrained devices and unreliable networks.
Processing happens either at the edge (on or near the device) or in the cloud.
AI models analyze the data to classify, predict, or detect anomalies.
Action closes the loop: the system triggers an alert, adjusts a setting, or controls another device automatically.

The most interesting shift in recent years is the processing step moving toward the edge.
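The sense, decide, act stages of that pipeline can be compressed into a small loop. The sketch below is illustrative only and rests on three assumptions. The readings are made up. A rolling z-score stands in for the AI model stage; real deployments might use a trained classifier or forecaster. `act` just prints, where a real system might publish an alert over something like MQTT or switch a relay.

```python
from collections import deque
import statistics

class RollingAnomalyDetector:
    """Flag readings that sit far outside the recent window's mean."""

    def __init__(self, window=20, threshold=3.0):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def is_anomaly(self, value):
        flagged = False
        if len(self.history) >= 5:  # wait for a little history first
            recent = list(self.history)
            mean = statistics.fmean(recent)
            stdev = statistics.pstdev(recent)
            flagged = stdev > 0 and abs(value - mean) / stdev > self.threshold
        self.history.append(value)
        return flagged

def act(reading_index, value):
    # Hypothetical action stage: a real device might publish an MQTT alert here.
    print(f"ALERT: reading {reading_index} = {value} looks abnormal")

if __name__ == "__main__":
    # Made-up vibration readings with one spike.
    readings = [1.0, 1.1, 0.9, 1.0, 1.2, 1.0, 0.95, 1.05, 4.8, 1.0, 1.1]
    detector = RollingAnomalyDetector()
    for i, value in enumerate(readings):   # sense
        if detector.is_anomaly(value):     # decide
            act(i, value)                  # act
```

A detector this small can run on the device itself, so only the alert needs to leave it.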

Edge AI: putting the brain next to the sensor

Traditionally, devices shipped their data to the cloud, a big model crunched it, and an answer came back. That works, but it has real downsides: latency, bandwidth cost, and the privacy concerns of sending everything off-device.

Edge AI flips this by running models directly on the device or a nearby gateway. A camera can recognize a person without uploading the video. A wearable can flag an irregular heartbeat without a round trip to a server. This is powered by lightweight model formats and frameworks like TensorFlow Lite, and by the broader TinyML movement, which squeezes machine learning onto microcontrollers with kilobytes of memory.
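A big part of how these formats fit models into so little memory is quantization. Weights are stored as 8-bit integers plus a scale and zero point instead of 32-bit floats, which cuts storage by about 4x at some cost in precision. The sketch below shows an affine int8 scheme of the kind edge frameworks use, in plain Python. It illustrates the idea and is not the TensorFlow Lite converter's API or its exact calibration logic.

```python
def quantize_int8(values):
    """Map floats to int8 using real_value ~= scale * (q - zero_point)."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # range must include 0
    scale = (hi - lo) / 255 or 1.0  # avoid a zero scale for all-zero input
    zero_point = round(-128 - lo / scale)  # maps lo to about -128 and hi to about 127
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [scale * (x - zero_point) for x in q]

if __name__ == "__main__":
    weights = [-0.62, -0.1, 0.0, 0.33, 0.91]
    q, scale, zp = quantize_int8(weights)
    restored = dequantize(q, scale, zp)
    print("int8:", q)
    print("max error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

Including zero in the range means 0.0 is represented exactly. That matters because padding and many sparse weights are zero.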

For developers, hardware like the Raspberry Pi, Arduino boards, and NVIDIA's Jetson line has made experimenting with edge AI genuinely accessible. You can prototype a smart sensor on a desk for the price of a nice dinner.

The payoff is faster responses, lower data costs, and better privacy, since sensitive data can stay local.

Where it's already working

AIoT isn't theoretical. It's running in a lot of places:

Industry (predictive maintenance). Sensors on machinery feed models that predict failures before they happen, cutting downtime. This is a cornerstone of what's called Industry 4.0.
Smart homes. Thermostats that learn your routine, cameras that tell a person from a passing car, assistants that respond to voice.
Healthcare. Wearables that monitor vitals and detect anomalies, sometimes alerting a doctor before the wearer notices anything wrong.
Smart cities. Traffic systems that adapt signal timing to real flow, and energy grids that balance load using demand predictions.
Agriculture. Soil and weather sensors paired with models that tell farmers precisely when to water or treat a crop.

The common thread is the same: continuous real-world data plus a model that turns it into a timely decision.

The hard parts

AIoT is powerful, but it comes with a stack of challenges that developers run into fast:

Resource constraints. Many devices have tiny amounts of memory, compute, and battery. Running a model there means heavy optimization and compromise.
Latency and reliability. Some decisions can't wait for a cloud round trip, and networks at the edge are often flaky.
Security. Every connected device is a potential entry point. A fleet of cheap sensors is a large attack surface, and many ship with weak defaults.
Privacy. These systems collect intimate data about homes, bodies, and movements. Handling it responsibly is both an ethical and legal requirement.
Deployment and updates. Pushing a new model to thousands of devices in the field, safely and without bricking them, is a genuine engineering problem.
Interoperability. The IoT world is a mess of competing standards and protocols, and getting devices from different vendors to cooperate is rarely smooth.

Why it's worth paying attention to

The reason AIoT matters is that it moves AI out of the screen and into the physical world. A chatbot answers questions; an AIoT system notices that a freezer is failing, a patient's heart rhythm is off, or a road is about to flood, and acts on it.

For developers, the barrier to entry has dropped sharply. Affordable hardware, mature edge frameworks, and growing cloud IoT platforms mean you can build a working intelligent device without a research lab. The skills it draws on, embedded programming, networking, data, and machine learning, sit at an intersection that not many people occupy yet, which makes it a useful place to be.

IoT taught machines to sense the world. AI is teaching them to understand it. The combination is quietly becoming the layer where software finally reaches off the screen and does something in the room with you.

Why Does AI Have Limits? Understanding What Today's Models Can't Do

Marko Frei — Mon, 08 Jun 2026 17:31:15 +0000

AI can write code, summarize research, and hold a convincing conversation, so it's easy to assume the only thing standing between today's models and true general intelligence is a bigger model and more data. But the limits we run into aren't random bugs that the next release will quietly patch. Most of them come straight from how these systems are built. Understanding where those limits come from makes you a sharper builder and a more realistic user.

Here are the main reasons AI has a ceiling on what it can do.

It learns patterns, not meaning

A large language model is, at its core, a very sophisticated next-token predictor. During training it sees enormous amounts of text and learns the statistical relationships between words and ideas. When you ask it something, it isn't recalling a fact from a database or reasoning the way a person does. It's generating the most probable continuation based on the patterns it absorbed.

This works astonishingly well for a huge range of tasks, but it also explains a lot of the weirdness. The model has no internal model of truth. It has a model of what plausible text looks like. Most of the time plausible and correct overlap, which is why it feels intelligent. When they don't overlap, you get confident nonsense.
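A toy model makes the "plausible, not true" point concrete. The sketch below is a bigram model, vastly simpler than a real LLM, which uses a neural network over subword tokens and long contexts. It still shows the same basic move: count which word tends to follow which, then repeatedly emit the most likely next word. Nothing in it checks whether the output is true.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    # For each word, count which words follow it in the training text.
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, start, max_words=8):
    # Greedy decoding: always pick the most frequent continuation.
    out = [start]
    for _ in range(max_words - 1):
        options = follows.get(out[-1])
        if not options:
            break
        out.append(options.most_common(1)[0][0])
    return " ".join(out)

corpus = (
    "the capital of france is paris . "
    "the capital of italy is rome . "
    "the capital of spain is madrid . "
    "the capital of france is paris ."
)
model = train_bigrams(corpus)
print(generate(model, "spain", max_words=4))  # → spain is paris .
```

The output is fluent and wrong. "paris" simply follows "is" more often than any other word in the training text, and the model has no notion of which country it started from.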

It's boxed in by its training data

A model only knows what was in its training data, and that data has a cutoff date. Anything that happened after that point, or anything too niche to appear much in the data, is effectively invisible to it. This is why models can be wildly out of date on current events, recent library versions, or fast-moving topics unless they're given live access to search.

The data also carries its own biases, gaps, and errors. A model trained on the internet inherits the internet's blind spots and slants. It can't rise above the quality of what it was shown. Garbage in, garbage out is an old idea, but it applies in full force here.

Hallucination is a feature of the design, not just a flaw

When a model doesn't know something, it doesn't experience uncertainty the way you do. It still produces the most likely sounding answer, because that's the only thing it knows how to do. The result is a hallucination: a fluent, confident, completely fabricated response.

This is hard to fully eliminate precisely because the model is optimized to sound right, not to be right. Techniques like retrieval, grounding in real sources, and asking the model to show its reasoning all help. But the underlying tendency to fill gaps with plausible invention is baked into how the system works.
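Here is one way to see why there is no built-in "I don't know." At each step a model turns its raw scores (logits) into a probability distribution with softmax, and decoding picks from that distribution. The sketch below uses made-up logits. Even when the scores are nearly flat, there is still a top token to emit. The only trace of uncertainty is a higher entropy, and that is a rough signal at best, not a reliable hallucination detector.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the result sums to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

candidates = ["Paris", "Lyon", "Nice", "Lille"]
confident = softmax([9.0, 2.0, 1.5, 1.0])   # one option clearly dominates
unsure = softmax([1.1, 1.0, 1.0, 0.9])      # nearly flat: the model "doesn't know"

for name, probs in [("confident", confident), ("unsure", unsure)]:
    best = max(range(len(probs)), key=probs.__getitem__)
    print(f"{name}: picks {candidates[best]!r} "
          f"(p={probs[best]:.2f}, entropy={entropy_bits(probs):.2f} bits)")
```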

It has no grounding in the real world

A person learns what 'hot' means by touching something hot. A model learns the word 'hot' only from how it appears next to other words. It has no senses, no body, and no direct contact with the world it talks about. Everything it 'knows' is secondhand, learned from text describing reality rather than reality itself.

This is why models can produce flawless descriptions of things they fundamentally don't grasp, and why they stumble on simple physical or spatial common sense that any child handles easily.

Memory and context are finite

Models have a context window, a hard limit on how much text they can consider at once. Push past it and earlier parts of the conversation or document fall out of view. By default, a model also has no memory between separate sessions. Each conversation starts fresh unless the application adds memory on top.

So while a model can feel like it 'knows you', that continuity is something engineers build around it, not something the model has on its own.
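The "memory" an application adds is often just a choice of which earlier messages to send back in. Below is a minimal sketch of one common strategy: keep the system prompt plus as many of the most recent messages as fit a budget. It counts words as a rough stand-in for tokens, and real tokenizers count differently. The budget and the role/content message format are assumptions for illustration; some APIs, Anthropic's Messages API included, take the system prompt as a separate parameter instead.

```python
def fit_to_budget(system_prompt, messages, max_words=50):
    """Keep the system prompt and the newest messages that fit within max_words."""
    def size(text):
        return len(text.split())  # crude proxy for a token count

    remaining = max_words - size(system_prompt)
    kept = []
    # Walk backwards from the newest message; stop at the first one that won't fit.
    for msg in reversed(messages):
        cost = size(msg["content"])
        if cost > remaining:
            break
        kept.append(msg)
        remaining -= cost
    return [{"role": "system", "content": system_prompt}] + kept[::-1]

if __name__ == "__main__":
    history = [
        {"role": "user", "content": "My name is Ada and I work on compilers."},
        {"role": "assistant", "content": "Nice to meet you, Ada!"},
        {"role": "user", "content": "What is a good first optimization pass to write?"},
    ]
    for msg in fit_to_budget("You are a helpful assistant.", history, max_words=20):
        print(msg["role"], "->", msg["content"])
```

With a 20-word budget, the oldest message is the one that falls out of view, and it is exactly the one where the user introduced themselves.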

Reasoning breaks down outside familiar territory

Models are strong on problems that resemble their training data and much weaker on genuinely novel ones. Multi-step logic, precise arithmetic, and tasks that require holding a long chain of reasoning together are common failure points. The model can often imitate reasoning convincingly while quietly making an error several steps in, because it's pattern-matching to what a solution looks like rather than actually computing it.

This out-of-distribution weakness is one of the clearest reminders that imitation and understanding are not the same thing.

Some limits are there on purpose

Not every limit is a technical shortcoming. Many are deliberate. Safety guardrails, content restrictions, and alignment training intentionally stop models from doing certain things, from generating harmful instructions to impersonating real people. These constraints exist for good reasons, even when they're occasionally frustrating, and they're a reminder that a capable system and a safe one are not automatically the same thing.

Why this matters for builders

None of this means AI isn't useful. It's genuinely transformative for the right tasks. But the people who get the most out of it are the ones who design around its limits instead of pretending they aren't there:

Give models current, trusted information rather than trusting their memory.
Verify anything factual, especially numbers, citations, and code that has to be exactly right.
Keep tasks within the kinds of problems the model handles well, and break big ones into smaller, checkable steps.
Treat fluent confidence as a style, not a signal of correctness.

The limits of AI aren't a temporary inconvenience waiting on the next model. They follow from what these systems actually are: powerful pattern learners with no senses, no guaranteed grip on truth, and no understanding in the human sense. Knowing that doesn't make the technology less impressive. It makes you better at using it.

Can AI Reach Beyond Human Intelligence?

Marko Frei — Mon, 08 Jun 2026 12:47:49 +0000

I spend a good chunk of my week judging what AI models produce. I read their reasoning, watch them solve coding tasks, and score how close they got. So when people ask me whether AI is about to leave human intelligence in the dust, I have a less dramatic answer than the headlines: the progress is astonishing, and AI isn't close to surpassing general human intelligence. Both things are true.

The impressive part

There's no use pretending the progress isn't real. A model can read a 40-page spec, draft a working module, write tests for it, and explain its choices faster than I could open the file. On narrow problems with clear feedback, these systems already operate well past most people, myself included. That's not hype. I see it on a normal Tuesday.

If you define intelligence as "produces useful output across a huge range of tasks," AI has been quietly crossing thresholds we used to treat as science fiction.

The gap that doesn't close on schedule

But general human intelligence is a different animal, and the gap shows up in the failures, not the wins.

The same model that nails a hard task will confidently invent a function that doesn't exist. It'll ace a benchmark, then fall apart when I change one assumption the benchmark didn't mention. It has no reliable sense of when it's wrong. A junior dev who breaks the build feels the consequence and adjusts. The model just generates the next plausible token and waits for me to tell it.

That's the core of it. What these systems do is extraordinary pattern completion over an enormous amount of text. What humans do is build a messy, grounded model of the world, carry goals across years, and know the difference between "I'm sure" and "I'm guessing." We transfer a lesson from one domain to a totally unrelated one without being retrained. We notice when a problem is the wrong problem. Current AI doesn't do those things, and scaling the same recipe hasn't made them appear.

"Soon" is doing a lot of work

I'm not saying never. I'm saying not soon, and I'm wary of anyone who sounds certain in either direction. The honest position is that we don't yet know whether bigger versions of today's approach reach general intelligence or hit a wall. My bet, from the seat I sit in, is that something fundamental is still missing rather than just a few more parameters.

What this means if you build

Treat AI as a phenomenal collaborator with no judgment. Lean on it for speed and breadth. Keep a human on anything where being confidently wrong is expensive. The teams that win this decade won't be the ones waiting for the model to think for them. They'll be the ones who got good at working alongside something brilliant and strange.

What's your read? I'm curious whether people closer to the research feel the same wall, or see a way through it.

Is AI Killing Animation? The 2D vs 3D Reality in 2026

Marko Frei — Sat, 06 Jun 2026 12:04:10 +0000

"Is AI killing animation?" is the question every artist in the field is quietly asking. The honest answer requires separating two things people constantly blur together: the craft (the art form, which is mostly fine) and the jobs (specific roles, some of which are genuinely under pressure).

And here's the part most takes miss: AI is hitting 2D and 3D animation in different places. Understanding where is the difference between useful worry and useless panic.

First, the framing: not killed, reshaped

Animation as a medium isn't dying. By most projections the global industry is still growing into the hundreds of billions, and AI is widening what's possible, especially for solo and indie creators who couldn't previously afford full production. More animation is being made, not less.

What's changing is the labor profile of how it gets made. The clearest data point comes from a study by CVL Economics, commissioned by the Animation Guild and others: it estimated that roughly 21% of US film, TV, and animation jobs — about 118,500 of them — could be consolidated, replaced, or eliminated by generative AI by 2026. That doesn't require full automation. It just requires enough of a role's tasks to be absorbed by tools. Which is exactly what's happening, but in different ways for 2D and 3D.

2D: AI automates the process

In 2D, the disruption is in the production pipeline — the labor-intensive, frame-by-frame steps.

AI tools now auto-generate in-betweens (the frames between an animator's key poses), fill flat colors, and assist with rotoscoping. As Cartoon Brew put it, this turns frame-by-frame work into "approval and correction" — shifting the job away from craft and toward supervision. Some studios report 30–50% reductions in project timelines when AI handles roto, motion cleanup, and asset generation.

So in 2D, the threatened tier is the entry-level production layer: in-betweeners, cleanup artists, colorists. These were the traditional ways juniors broke into the industry and built their chops, which makes the squeeze especially worrying for the talent pipeline. The art of 2D is untouched, even thriving. The repetitive labor that surrounded it is what's being absorbed.

3D: AI automates the product

3D is where it gets more dramatic, because generative AI doesn't just speed up the process — it can now produce the actual assets.

Text-to-3D and image-to-3D have crossed into genuinely usable territory. Tools like Meshy, Rodin, Tripo, Luma Genie, and 3D-Agent generate a model from a prompt like "a rustic wooden chair with leather seat" in seconds to minutes, using diffusion models and neural radiance fields. Meshy's texturing is good enough to produce photorealistic PBR materials with minimal cleanup. And this isn't fringe tooling: Autodesk built it straight into its pipeline with Wonder 3D (launched March 2026 in Flow Studio), offering text-to-3D and image-to-3D aimed at letting creators iterate on assets without slowing production. On top of asset generation, AI also handles mocap cleanup, auto-rigging, and retargeting.

The job signal here is blunter than in 2D. In the same body of research, about a third of entertainment executives predicted AI would displace 3D modelers by the end of 2026, with 3D modelers and VFX artists among the most exposed roles. When the deliverable itself — the model, the texture — can be generated, the role built purely around producing it is directly in the crosshairs.

The key contrast

Put simply:

2D automation hits the *process*: the in-betweening, coloring, and roto that happen around the art.
3D automation hits the *product*: the models and textures that are the art.

That asymmetry is why generic 3D modeling is, in some ways, more directly threatened than 2D animation right now. It's also why the conversation feels so different depending on which world you live in.

Where humans stay essential (in both)

Now the other side of the ledger, which is just as real.

The quality ceiling still protects the high end. Generated 3D geometry is frequently "soft" or melted-looking, and topology is often messy — fine for background props and level blocking, not for hero characters or clean, animation-ready models. Somebody has to fix, retopologize, and finish, and that somebody needs real skill.

The final 20% is human. Across both 2D and 3D, the value has migrated to the hand-polished performance: the micro-timing, the weight, the acting, the narrative intent. Automation handles the grunt work; it can't supply the part that makes a performance feel alive. The industry no longer wants "keyframe monkeys" — it wants animators who can add what the algorithm can't.

Direction and taste don't generate. Knowing what's worth making, what reads emotionally, what fits the story — that's still the human's job, and it's the hardest to automate.

The new roles (the escape hatch)

3D in particular has a clearer "level up" path than pure doom. New roles are emerging at studios that blend craft with AI fluency: AI animation supervisors, generative content directors, and AI pipeline specialists. There's also a rising "generative 3D artist" niche — people who combine traditional modeling skill with code and AI to drive procedural and automated content. And the most resilient production work is moving into real-time pipelines (Unreal Engine), where hand-tuned performance meets live rendering.

The pattern: the people who treat AI as a tool they direct are creating new, often better-paid roles. The people who only did the task the tool now does are the ones forced to adapt.

The honest caveat

There's real tension here, and it's worth naming both sides fairly. Animators and unions tend to see job loss, a gutted junior pipeline, and AI "sold to reduce crunch" that in practice often cut headcount instead. Studios and tool-makers see speed, lower costs, and dramatically expanded access for small creators. Both are describing true things. The Animation Guild has openly acknowledged AI is already affecting hiring plans, job scopes, and timelines — this isn't a future hypothetical.

The takeaway

So, is AI killing animation? No. But it is hollowing out a specific tier of it.

At risk: the repetitive production layer — 2D in-betweeners/colorists/roto artists, and generic 3D modelers cranking out assets.
Fine, even rising: anyone who directs, performs, polishes, supervises, or learns to drive the AI tools rather than compete with them.

If you're an animator, the move isn't to out-render the machine on the things it's now good at. It's to double down on performance, story, and taste, and to become the person who orchestrates these tools instead of the person they replace. The craft is in no danger. The question is which side of it you're standing on.

Are you a 2D or 3D artist seeing this in your own work? I'm especially curious whether the junior pipeline is really being gutted, or whether new entry points are quietly replacing the old ones. Tell me what you're seeing in the comments.

AI in Unreal Engine: The Two Kinds Every Game Dev Should Know

Marko Frei — Sat, 06 Jun 2026 10:51:24 +0000

When someone says they're building an "AI game" in Unreal Engine, they could mean one of two completely different things, and the confusion trips up a lot of newer developers. So before any tutorial, the most useful thing I can give you is the distinction:

Classic game AI: the decades-old discipline of making NPCs behave (pathfinding, decision-making, enemies that flank you). This is deterministic, authored, and shipped in basically every game you've played.
Generative AI: the new wave, with large language models giving NPCs open-ended conversation, plus AI tools that generate worlds and assets.

They share two letters and almost nothing else. Let's go through both as they stand in Unreal in 2026, because a strong game usually needs the first and is starting to experiment with the second.

Part 1: Classic game AI (the kind that ships today)

This is the AI that makes a guard patrol, notice you, chase you, lose you, and give up. Unreal has a mature, battle-tested toolkit for it, and you should learn this before anything flashier.

Behavior Trees + Blackboard. The classic combo. The Blackboard is the NPC's "memory" (where's the player, what's my health), and the Behavior Tree is a visual graph of decisions that reads that memory and picks actions. It's intuitive, debuggable, and still the backbone of a huge amount of shipped game AI.
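The split between memory and decisions is easier to see in code. Below is a minimal sketch in plain Python, not Unreal's API. `Selector`, `Sequence`, `Condition`, and `Action` are hypothetical names for the two classic composite nodes and their leaves. Unreal's tree also has a "running" state for long actions, which is left out here.

```python
SUCCESS, FAILURE = "SUCCESS", "FAILURE"

class Condition:
    """Leaf: succeeds when the named blackboard entry is truthy."""
    def __init__(self, key):
        self.key = key
    def tick(self, bb):
        return SUCCESS if bb.get(self.key) else FAILURE

class Action:
    """Leaf: records the chosen action on the blackboard and succeeds."""
    def __init__(self, name):
        self.name = name
    def tick(self, bb):
        bb["last_action"] = self.name
        return SUCCESS

class Sequence:
    """Runs children in order; fails as soon as one child fails."""
    def __init__(self, *children):
        self.children = children
    def tick(self, bb):
        for child in self.children:
            if child.tick(bb) == FAILURE:
                return FAILURE
        return SUCCESS

class Selector:
    """Tries children in order; succeeds as soon as one child succeeds."""
    def __init__(self, *children):
        self.children = children
    def tick(self, bb):
        for child in self.children:
            if child.tick(bb) == SUCCESS:
                return SUCCESS
        return FAILURE

# Guard: chase if the player is visible, otherwise fall back to patrolling.
guard = Selector(
    Sequence(Condition("player_visible"), Action("chase")),
    Action("patrol"),
)

blackboard = {"player_visible": False}  # the NPC's "memory"
guard.tick(blackboard)
print(blackboard["last_action"])  # patrol
blackboard["player_visible"] = True
guard.tick(blackboard)
print(blackboard["last_action"])  # chase
```

The tree itself never changes. Only the blackboard does, which is why this style stays easy to debug.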

State Tree. This is the notable shift. State Tree is Epic's newer state-machine-meets-behavior-tree framework, and it became production-ready in 5.7 — with Unreal Engine 5.8 making it the default AI and logic framework for new projects. It's more performant and more composable than Behavior Trees for many cases, so if you're starting fresh in 2026, this is increasingly where to begin.

The supporting cast: the NavMesh system handles pathfinding (how an NPC physically gets from A to B around obstacles); EQS (Environment Query System) lets an NPC ask spatial questions like "where's the nearest cover the player can't see?"; and Mass (the MassEntity framework) is what you reach for when you need thousands of agents — crowds, swarms, traffic — at performance.
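Navmesh pathfinding is typically an A*-style graph search over walkable polygons. The sketch below shows the same search on a simple character grid. The grid format (`#` marks a wall), the `astar` name, and 4-way movement are illustrative assumptions, not Unreal's navigation API.

```python
import heapq

def astar(grid, start, goal):
    """Shortest 4-connected path on a grid of strings; returns a list of (row, col) or None."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):  # Manhattan distance never overestimates for 4-way moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # (estimated total, cost so far, cell)
    came_from = {start: None}
    best_cost = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:  # walk the parent links back to the start
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry; a cheaper route was already found
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None  # goal unreachable

level = [
    "....",
    ".##.",
    "....",
]
path = astar(level, (0, 0), (2, 3))
print(len(path) - 1)  # 5 moves, routed around the wall
```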

None of this involves a neural network. It's logic, and it's the right tool when you want NPC behavior that's reliable, performant, and shippable. If your game needs smart enemies, this is your stack.

Part 2: Generative AI — the LLM NPC wave

This is the part people get excited about: NPCs you can talk to in open-ended natural language, who respond in character with voice and lip-synced facial animation. In 2026 this has gone from tech demo to genuinely usable.

How it actually works

The architecture is surprisingly consistent across the major platforms. A large language model is handed a character definition — personality, backstory, speech style, and crucially a knowledge boundary — and then responds to player input while staying inside those constraints. That last part has a name worth knowing: contextual persona locking. A medieval blacksmith NPC doesn't know about smartphones not because someone filtered every possible answer, but because the character prompt establishes a knowledge horizon the model stays within. That's what keeps the illusion coherent instead of obviously mechanical.
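Here is a minimal sketch of what a character definition with a knowledge boundary might look like once it is turned into a prompt. The function name, the fields, and the wording are hypothetical. The role/content message shape is a common chat-API convention, so adapt it to whichever SDK or plugin you actually use.

```python
def build_persona_prompt(name, role, personality, speech_style, knowledge_horizon):
    """Assemble a system prompt that pins down who the NPC is and what it can know."""
    return "\n".join([
        f"You are {name}, a {role}.",
        f"Personality: {personality}.",
        f"Speech style: {speech_style}.",
        # The knowledge boundary: describe what the character's world contains
        # instead of listing every out-of-world topic to refuse.
        f"You only know about {knowledge_horizon}.",
        "If the player mentions something outside that, react with honest, in-character confusion.",
        "Never step out of character or mention being an AI.",
    ])

system_prompt = build_persona_prompt(
    name="Hilda",
    role="blacksmith in a medieval village",
    personality="gruff, proud of her work, wary of strangers",
    speech_style="short sentences, plain words",
    knowledge_horizon="smithing, the village and its people, and the roads to the nearby towns",
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Can I charge my phone here?"},
]
```

"Phone" never appears in the character definition. The hope is that the model answers from inside the horizon, although, as the caveats below note, that is a mitigation rather than a guarantee.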

Wrap that brain in a voice pipeline — speech-to-text in, text-to-speech out — plus facial animation driven from the audio, and you have a character you can have a real conversation with.

The tools doing it in Unreal

NVIDIA ACE is the heavyweight stack: Riva for speech recognition and synthesis, an LLM for the conversation, and Audio2Face, which generates lip-sync and facial animation straight from audio and plugs directly into MetaHuman characters via an Omniverse connector. A notable design choice: ACE is built to run inference on the player's local RTX GPU, not just in the cloud, which matters a lot for latency and cost.
Convai provides the conversational "brain" (LLM, memory, perception, in-world actions) and ships a proper Unreal Engine plugin with MetaHuman integration, so you can wire a talking NPC into a scene without building the pipeline yourself.
Inworld AI is the other big player, focused on authoring rich character personalities and behaviors.

You can see all of this in the wild now: AI teammates powered by ACE in PUBG, conversational townsfolk in life-sims, and famously, Skyrim modded with Inworld so every villager can hold a real conversation.

The honest caveats

This is where I'd temper the hype before you build your whole game around it:

Latency and cost. A cloud LLM round-trip plus speech synthesis can feel sluggish in a fast game, and per-conversation costs add up. Local inference (ACE on RTX) helps but limits your audience to capable hardware.
Persona breaks. Models still go off-character, hallucinate lore, or get talked into saying things your medieval blacksmith never should. The knowledge-boundary prompt mitigates this; it doesn't eliminate it.
The design question nobody asks. Do players actually want infinitely talkative NPCs, or do they want a tight, authored story? The emerging consensus is hybrid: authored, hand-crafted narrative for the moments that matter, with generative dialogue filling the ambient, replayable spaces around it.

Part 3: AI in the workflow (the quiet revolution)

The third place AI shows up isn't in your game at all — it's in how you build it, and this is arguably where it's saving the most time today.

Procedural Content Generation (PCG). Not "AI" in the LLM sense, but the most impactful generation tool in Unreal. PCG became production-ready in 5.7 with a 2x performance jump and a new PCG Editor Mode, and Epic's demos show a single artist generating a 4km × 4km jungle with zero code. For world-building at scale, this is transformative.
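Unreal's PCG framework is node-graph based and far richer than this. The sketch below only shows the idea underneath much rule-based scattering: seeded randomness plus a spacing rule. It uses simple dart throwing, where random points are rejected if they land too close to an existing one. The `scatter` function and its parameters are hypothetical.

```python
import math
import random

def scatter(width, height, min_dist, attempts=2000, seed=42):
    """Seeded dart throwing: keep a random point only if it respects the minimum spacing."""
    rng = random.Random(seed)  # same seed -> same layout, so the world is reproducible
    points = []
    for _ in range(attempts):
        p = (rng.uniform(0, width), rng.uniform(0, height))
        if all(math.dist(p, q) >= min_dist for q in points):
            points.append(p)
    return points

trees = scatter(100, 100, min_dist=8)
print(len(trees), "trees placed")
```

Because the layout depends only on the rules and the seed, you can regenerate the exact same forest later or change one parameter and rebuild it. That property matters when an artist is iterating on a large world.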

The in-editor AI Assistant. Unreal Engine 5.7 added an AI Assistant built right into the editor. Hover over any interface element, press F1, and it starts a conversation about that feature — documentation and guidance without leaving your work.

AI asset tools. Things like NVIDIA's Meshtron use AI for retopology/remeshing — automating one of the most tedious parts of 3D asset creation while preserving edge loops and key features.

Industry surveys in 2026 suggest a large majority of AAA studios now use AI tools somewhere in their pipeline. Notably, the framing from Epic and others isn't replacement — it's that a single designer can now produce what used to take a team.

Where to start

If you want to actually build something:

For NPC behavior: start with State Tree (or Behavior Trees if you're following older tutorials), plus NavMesh. This is the foundation, and it ships.
For conversational NPCs: grab the Convai Unreal plugin and a MetaHuman; it's the lowest-friction way to get a talking character running, and you can layer ACE's Audio2Face on top for the facial animation.
For worlds: dive into PCG using Epic's Electric Dreams sample project.

The takeaway

"AI in Unreal Engine" isn't one thing. It's a reliable, shippable discipline (classic game AI) that you should master first; an exciting, still-maturing frontier (generative LLM NPCs) worth prototyping but not over-committing to; and a genuinely useful set of workflow tools (PCG, the editor assistant, AI asset generation) that can make a small team punch far above its weight.

The developers who'll build the best AI games aren't the ones chasing the buzzword. They're the ones who know which kind of "AI" each problem actually needs.

Are you using generative NPCs in a project, or sticking with authored behavior for now? I'm especially curious whether anyone's solved the latency problem in a way that feels good in a fast-paced game — drop your setup in the comments.

AI vs Human: An Honest Scorecard

Marko Frei — Sat, 06 Jun 2026 04:53:52 +0000

"AI vs Human" makes for a great headline and a terrible question. It implies one winner, like there's a single leaderboard where one side is pulling ahead. The honest answer is that it depends entirely on the task, and once you break it down task by task, the picture gets a lot more interesting than "the robots are winning" or "it's all hype."

So here's a fair scorecard, from someone who uses these tools every day and is neither scared of them nor selling them.

Where AI clearly wins

Let's not be precious about it. There are whole categories where the machine isn't just competitive, it's not close.

Speed and scale. AI reads a thousand-page document in seconds, drafts in moments what would take you an afternoon, and never gets tired on the four-hundredth repetition. For anything bounded and repetitive, a human simply can't keep up.

Breadth of recall. No single person has read a fraction of what a large model has absorbed. Ask it about an obscure library, a historical event, and a cooking technique in the same minute and it'll have a reasonable answer to all three. Your one well-read friend can't do that.

Pattern-matching across huge spaces. Spotting a regularity buried in millions of examples is exactly what these systems are built for, and it's something humans are slow and unreliable at.

Tireless availability. It's there at 3am, it doesn't have a bad day, and it'll cheerfully rewrite the same paragraph twelve times without resenting you. That consistency is its own kind of superpower.

If your mental model of these tasks is "a human does them," AI has already changed the economics underneath you.

Where humans clearly win

Now the other side of the ledger, and it's just as real.

Judgment under ambiguity. When the problem is underspecified, the data is messy, and "it depends" is the honest answer, humans still dramatically outperform. We're good at deciding what's worth doing at all, which no amount of fluent text generation replaces.

Accountability. When a decision goes wrong, a human can be held responsible, can own it, and can be trusted because of that. You cannot delegate accountability to a system that has no stake in the outcome. This is why a human stays in the loop on anything that matters, not as a formality but as the person who answers for it.

Genuine novelty. AI is brilliant near things it has seen and brittle as you move away from them. Confronted with a truly new problem, one with no precedent in the training data, humans still reason from first principles in a way models struggle to.

Grounding and common sense. We learned about the world by living in it. We know what's physically plausible, what would actually hurt someone, what a person really meant despite what they said. Models learned how people write about the world, which is not the same thing.

Taste and meaning. Knowing that something is good (not just statistically likely), that a joke will land, that a design feels right, that a sentence has soul, remains stubbornly human. AI can imitate taste; it doesn't have any.

Caring. A model can generate the words "I'm sorry for your loss." It doesn't mean them. For anything where the point is a human connection, the human is the entire point.

The scorecard, honestly read

Put the two columns side by side and the pattern is clear: AI wins on execution at scale; humans win on judgment, novelty, and meaning. AI is extraordinary at answering; humans are still better at deciding what to ask and whether the answer is any good.

Notice that almost nothing on the human list is "humans are faster" or "humans know more facts." We lost those races and we're not getting them back. What we kept are the things that were always the hard, valuable part anyway.

Why "versus" is the wrong frame

Here's the twist that makes the whole debate misleading. In a surprising number of domains, the thing that beats a strong AI and beats a strong human is a human working with the AI.

The classic example is chess. After computers surpassed grandmasters, "centaur" chess emerged, where a human plus an engine playing together could beat either a human or an engine alone, because the human supplied strategy and judgment while the machine supplied tireless calculation. The same shape shows up everywhere now: the developer who pairs with an AI ships faster than either the AI alone (which hallucinates and lacks context) or the developer alone (who types slower and forgets the docs).

So the real competition isn't human against AI. It's the human who uses AI well against the human who doesn't. That's the matchup that actually decides outcomes over the next few years, and it's a much more useful thing to worry about.

What this means for you

If you're a developer reading this, the takeaway isn't "relax, you're safe" or "panic, you're doomed." It's more pointed than that:

Stop competing with AI on the things it's better at. Don't take pride in being a fast typist of boilerplate or a walking documentation index. Those are the machine's lane now.
Double down on the human column. Judgment, problem framing, knowing what's worth building, owning outcomes, and developing the taste to know when the AI's confident answer is quietly wrong. That last skill, verification, is becoming the core competency.
Become the centaur. The most valuable position isn't "human" or "AI." It's the person who orchestrates the tools fluently while supplying the judgment they lack.

The "AI vs Human" framing sells clicks because it sounds like a fight to the death. But the people who'll do best aren't the ones who win the fight. They're the ones who realized it was never a fight in the first place.

The Limits of AI Models: What LLMs Still Can't Do (And Why)

Marko Frei — Fri, 05 Jun 2026 22:16:12 +0000

It's easy to be either a hype believer or a reflexive cynic about AI models. The more useful position is the boring one in the middle: these tools are genuinely powerful and they have hard, well-understood limits. If you build with them, knowing exactly where they break is what separates a robust product from a demo that falls apart in front of a real user.

So here's a tour of the real limits of today's models, and — more importantly — why each one exists. The "why" is what lets you predict failures instead of being surprised by them.

1. They make things up (and can't tell when they have)

The most famous limit: hallucination. A model will state a fabricated fact, a fake citation, or a nonexistent API method with exactly the same fluent confidence it uses for correct answers.

Why: An LLM is trained to produce plausible continuations of text, not true ones. There's no internal fact-checker and no concept of "I don't actually know this." If a confident-sounding wrong answer is statistically likely given the prompt, the model generates it just as smoothly as a right one. Confidence and correctness are completely decoupled.

Working around it: Treat every factual claim as unverified. Use retrieval (RAG) to ground answers in real documents, ask for sources you can check, and never put a raw model output in front of a user where a confident lie causes real harm.
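Here is a minimal sketch of the grounding idea: fetch relevant passages, then instruct the model to answer only from them and cite them. Ranking by word overlap is a deliberately crude stand-in for the embedding search a real RAG pipeline would use. All function names are hypothetical.

```python
import re

def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, k=2):
    """Rank documents by word overlap with the question (toy stand-in for vector search)."""
    q = words(question)
    return sorted(documents, key=lambda d: len(q & words(d)), reverse=True)[:k]

def grounded_prompt(question, documents):
    """Build a prompt that restricts the model to numbered, checkable sources."""
    sources = retrieve(question, documents)
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(sources))
    return (
        "Answer using only the sources below and cite them by number. "
        "If the sources do not contain the answer, say you don't know.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

docs = ["The refund window is 30 days.", "Our office is in Berlin.", "Shipping takes 5 days."]
print(grounded_prompt("How long is the refund window?", docs))
```

The numbered sources are what make the answer checkable. A claim citing [1] can be compared against [1].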

2. Their knowledge is frozen in time

A model only knows what was in its training data, up to a cutoff date. Ask about anything newer and it will either admit ignorance or, worse, confidently improvise.

Why: Training is a discrete, enormously expensive event. The model's "knowledge" is baked into its weights at that moment and doesn't update as the world changes.

Working around it: This is exactly why web search and retrieval pipelines exist — you inject fresh information into the prompt at runtime rather than relying on the frozen weights.

3. They have no memory between calls

Each API request is independent. The model doesn't remember your last conversation, your preferences, or what it told you five minutes ago. Chat apps fake continuity by resending the entire history with every message.

Why: The model is a stateless function. Its only "memory" is whatever text you put in the current prompt. Nothing persists on its side.

Working around it: Any persistence — user memory, conversation history, learned preferences — is something you have to build and re-inject. The model won't do it for you.
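Here is a minimal sketch of that client-side memory. The application stores the history and rebuilds the full payload on every call. The `Conversation` class, the role/content message shape, and the keep-the-last-N-turns trimming rule are illustrative assumptions. Real apps often summarize old turns instead of dropping them.

```python
class Conversation:
    """Client-side memory for a stateless model: store history, resend it on every call."""

    def __init__(self, system_prompt, max_turns=20):
        self.system = {"role": "system", "content": system_prompt}
        self.history = []
        self.max_turns = max_turns

    def add(self, role, content):
        self.history.append({"role": role, "content": content})

    def payload(self):
        # Everything the model will "remember" is exactly what ends up in this list.
        return [self.system] + self.history[-self.max_turns:]

chat = Conversation("You are a concise assistant.", max_turns=2)
chat.add("user", "My name is Ada.")
chat.add("assistant", "Nice to meet you, Ada.")
chat.add("user", "What's my name?")
print(chat.payload())  # the first user turn has been trimmed away
```

The trimming in the example shows the catch. Once "My name is Ada." falls out of the window, the model has no way to know it.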

4. The context window is finite, and it degrades

Everything the model can "see" — your prompt, instructions, history, retrieved documents — has to fit inside a fixed token budget. And even within that budget, models don't use all of it equally well. Information buried in the middle of a long context tends to get less attention than material at the start or end (the "lost in the middle" effect).

Why: Attention operates over a bounded sequence, and its quality isn't uniform across very long inputs. More context is not automatically better context.

Working around it: Put the most important instructions and data near the beginning or end. Summarize or chunk long material rather than dumping it all in. Don't assume a giant context window means the model is actually reasoning over all of it equally.
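Here is a minimal sketch of budget-aware prompt assembly. Instructions go first and the question goes last. Only as many pre-ranked chunks are packed in between as fit the budget. The 4-characters-per-token estimate is a rough English heuristic, not a real tokenizer, and the function names are hypothetical.

```python
def estimate_tokens(text):
    # Rough heuristic: about 4 characters of English per token.
    return max(1, len(text) // 4)

def pack_context(instructions, chunks, question, budget_tokens):
    """Instructions at the start, question at the end, ranked chunks in between until the budget runs out."""
    used = estimate_tokens(instructions) + estimate_tokens(question)
    selected = []
    for chunk in chunks:  # assumed to be sorted, most relevant first
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            break
        selected.append(chunk)
        used += cost
    return "\n\n".join([instructions, *selected, question])
```

Stopping at the first chunk that doesn't fit keeps the ranking intact, so a less relevant small chunk never displaces a more relevant large one.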

5. They fall apart on long, multi-step tasks

A model can nail a single step and still fail a task that requires fifty steps in sequence — the classic "great at the demo, unreliable in the agent" problem.

Why: Errors compound. Even a small per-step error rate becomes a near-certain failure over a long chain. Worse, recent research highlights a self-conditioning effect: once a model has made a mistake earlier in its own output, it becomes more likely to make further mistakes, because it's now predicting tokens conditioned on its own flawed work. The longer the horizon, the worse it gets.

Working around it: Decompose big tasks into small, independently verifiable steps. Validate between steps. Keep a human or a deterministic checker in the loop for anything where a wrong step silently corrupts everything downstream.
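The compounding is easy to put numbers on. The sketch below assumes independent errors. It also assumes a validator that catches every failure, which is optimistic: real retries can repeat the same mistake, and self-conditioning makes errors correlated. Under those assumptions it shows why checking between steps changes the picture so dramatically.

```python
def chain_success(per_step, steps):
    """Probability that every step in a chain succeeds, assuming independent errors."""
    return per_step ** steps

def with_validated_retries(per_step, attempts):
    """Per-step success if a validator catches every failure and we retry up to `attempts` times."""
    return 1 - (1 - per_step) ** attempts

print(f"{chain_success(0.99, 50):.3f}")  # 0.605: 99% reliable steps, 50 of them
print(f"{chain_success(0.95, 50):.3f}")  # 0.077: 95% reliable steps, 50 of them
print(f"{chain_success(with_validated_retries(0.95, 3), 50):.3f}")  # 0.994 with checks and up to 3 tries
```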

6. "Reasoning" is more brittle than it looks

Models can produce genuinely impressive chains of reasoning — and then fail a logically identical problem because you changed the names or numbers. They're sensitive to surface phrasing in ways a person who truly understood the problem wouldn't be.

Why: A lot of what looks like reasoning is sophisticated pattern-matching over things the model has seen. When a problem is close to its training distribution, it shines. Push it genuinely outside that distribution and the apparent reasoning can collapse. Newer "thinking" models that spend more compute at inference time help here, but they don't erase the underlying brittleness.

Working around it: Don't trust reasoning on novel or high-stakes problems without verification. Give the model the structure (the plan, the constraints) rather than hoping it invents reliable logic from scratch.

7. No grounding in the physical world

Models learned everything from text and images, not from living in the world. They have no senses, no body, and no real causal model of physics. They can describe how to ride a bike perfectly and have zero idea what balancing actually feels like.

Why: There's no embodiment and no real-world feedback loop in training. The model knows how people write about the world, which is not the same as understanding the world.

Working around it: Be cautious anywhere true physical-world reasoning, causality, or common-sense grounding matters. Text fluency about a domain is not competence in it.

8. They inherit the biases and gaps of their data

A model reflects its training data — including that data's biases, blind spots, and overrepresented viewpoints. It can quietly encode stereotypes or be systematically weaker on topics, languages, and cultures that were underrepresented online.

Why: The model is a compression of its corpus. Whatever skews exist in the data tend to show up, sometimes amplified, in the output.

Working around it: Test across diverse cases, don't assume neutrality, and be especially careful using these systems for consequential decisions about people.
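A simple way to make "test across diverse cases" concrete is to report results per group instead of as one average. The data below is hypothetical and only illustrates how an overall score can hide a gap. Here the overall accuracy of 50% conceals a 75% vs 25% split between two language groups.

```python
from collections import defaultdict

def accuracy_by_group(examples):
    """examples: iterable of (group, was_correct) pairs. Returns accuracy per group."""
    totals, correct = defaultdict(int), defaultdict(int)
    for group, ok in examples:
        totals[group] += 1
        correct[group] += int(ok)
    return {g: correct[g] / totals[g] for g in totals}

# Hypothetical eval results, tagged by the language of the input.
results = [("en", True), ("en", True), ("en", True), ("en", False),
           ("sw", True), ("sw", False), ("sw", False), ("sw", False)]
print(accuracy_by_group(results))  # {'en': 0.75, 'sw': 0.25}
```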

9. Brute-force scaling is hitting diminishing returns

For years the recipe was "make it bigger." That's slowing. Frontier models show smaller gains on key benchmarks despite enormous increases in training budget, high-quality training text is becoming scarce, and the compute and energy costs are staggering.

Why: Scaling laws give diminishing returns — each doubling of resources buys a smaller improvement — and you eventually run low on both fresh high-quality data and economically sane amounts of compute.

Working around it: As a builder, this is mostly good news: the action is shifting toward better techniques, smaller specialized models, fine-tuning, and clever inference-time methods. You rarely need the biggest model; you need the right one for the job.
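Diminishing returns from scaling are often modeled as a power law, where loss falls as a small negative power of compute. The constants `a` and `alpha` below are made up for illustration and are not fitted to any real model. The shape is the point: each doubling multiplies the loss by the same factor, so the absolute gain shrinks every time.

```python
def loss(compute, a=10.0, alpha=0.05):
    """Hypothetical power-law curve: loss = a * compute ** -alpha (illustrative constants)."""
    return a * compute ** -alpha

gains = []
compute = 1.0
for doubling in range(1, 6):
    before, after = loss(compute), loss(2 * compute)
    gains.append(before - after)
    print(f"doubling {doubling}: loss {before:.4f} -> {after:.4f}, gain {before - after:.4f}")
    compute *= 2
```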

10. We can't fully explain what they're doing

Even the people who build these models can't fully explain why a given output appeared. The reasoning lives in billions of opaque numbers, and interpretability research, while advancing, is far from giving us a clear account.

Why: The behavior is emergent from training, not designed rule-by-rule. There's no readable program inside to inspect.

Working around it: Don't treat the model as an auditable decision-maker in domains that require explainability. If you need to justify why a decision was made, the model's confident-sounding explanation is itself just generated text, not a true trace of its reasoning.

The actual takeaway

None of this means AI models aren't useful — they obviously are. It means they're a specific kind of tool with a specific failure surface. The mental model that keeps you out of trouble:

Trust them for drafting, transforming, summarizing, and exploring, where a human reviews the output.
Distrust them for ground truth, long autonomous chains, novel reasoning, and explainable decisions, unless you've built verification around them.

The engineers who build great things with AI aren't the ones who think it's magic or the ones who think it's useless. They're the ones who know precisely where the edges are and design around them.

Which of these limits has bitten you hardest in a real project? I'm especially curious about war stories from people building agents, since that's where so many of these failures stack up at once.

How LLMs Actually Work: A Developer's Mental Model

Marko Frei — Fri, 05 Jun 2026 19:07:39 +0000

Most of us use LLMs every day now, but if you asked the average developer what's actually happening between hitting enter and getting a response, the answer is usually some mix of "it's a neural network" and a shrug. That's fine — you don't need to know how a database B-tree works to write a query. But understanding the mental model behind LLMs makes you dramatically better at using them: you stop being surprised when they hallucinate, you write better prompts, and you understand why things like context windows and RAG exist.

So here's the whole thing, explained from the ground up. No equations.

The one-sentence version

An LLM is a function that takes some text and predicts the next chunk of text. That's it. Everything else — answering questions, writing code, "reasoning" — is an emergent side effect of doing that one thing extremely well, billions of times over.

Let's unpack how that actually produces something that feels intelligent.

Step 1: Your text becomes tokens

The model doesn't see letters or words. The first thing that happens is tokenization: your text gets chopped into "tokens," which are roughly word-fragments. Common words are usually one token; rarer words get split into pieces.

For example, "tokenization" might become ["token", "ization"], while "the" is just ["the"]. As a rough rule, one token ≈ 4 characters of English, or about ¾ of a word.

This is why you get billed per token and why character-counting your prompts doesn't quite match the limits. The model literally operates on a vocabulary of tokens, not language.
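To make the splitting concrete, here is a toy greedy tokenizer that repeatedly takes the longest piece it finds in a tiny hand-written vocabulary. Real tokenizers (BPE and its relatives) learn vocabularies of tens of thousands of pieces from data and handle spaces differently. The vocabulary and `greedy_tokenize` are hypothetical.

```python
def greedy_tokenize(text, vocab):
    """Split text by taking the longest vocabulary entry at each position (toy tokenizer)."""
    tokens, i = [], 0
    while i < len(text):
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab or end == i + 1:  # fall back to a single character
                tokens.append(piece)
                i = end
                break
    return tokens

vocab = {"the", "token", "ization", " "}
print(greedy_tokenize("the tokenization", vocab))  # ['the', ' ', 'token', 'ization']
```

Unknown text degrades into single characters, one token each. This is one reason rare words and unusual strings cost more tokens than their length suggests.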

Step 2: Tokens become vectors (meaning as geometry)

Each token gets mapped to a long list of numbers — a vector, also called an embedding. Think of it as coordinates in a space with hundreds or thousands of dimensions.

The key idea: tokens with similar meanings end up near each other in that space. "King" and "queen" land in a similar neighborhood; "banana" is off in a completely different region. The model learned these positions during training, purely by noticing which words show up in similar contexts.

So at this point your sentence is no longer text — it's a sequence of points in a high-dimensional space, where distance and direction encode meaning.
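"Near each other" is usually measured with cosine similarity, which compares the direction of two vectors. The 3-dimensional vectors below are invented for illustration. Real embeddings are learned and have hundreds or thousands of dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, values near 0 mean unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Made-up embeddings for illustration only.
emb = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.88, 0.82, 0.15],
    "banana": [0.1, 0.05, 0.95],
}
print(round(cosine(emb["king"], emb["queen"]), 3))   # close to 1
print(round(cosine(emb["king"], emb["banana"]), 3))  # much lower
```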

Step 3: Attention — every token looks at every other token

This is the part that made modern LLMs possible: the transformer architecture, and specifically a mechanism called attention.

Here's the intuition. Take the sentence "The trophy didn't fit in the suitcase because it was too big." What does "it" refer to? You know it's the trophy, because of context. Attention is how the model figures this out: when processing the token "it," the model lets it look at every other token in the input and decide which ones are relevant. "Trophy" gets a high attention weight; "suitcase" a bit less; "the" almost none.

Every token does this, for every other token, simultaneously. The result is that each token's representation gets enriched with context from the whole sequence. A word like "bank" starts off ambiguous, but after attention it's been nudged toward "riverbank" or "financial institution" depending on what surrounds it.

Modern models stack this operation dozens of times in a row (these are the "layers"). Early layers capture simple patterns; deeper layers build up to grammar, then meaning, then something that looks a lot like reasoning. Nobody hand-coded any of those levels — they emerge from training.
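The core of one attention step fits in a few lines. This is a simplified sketch of scaled dot-product self-attention: it uses the input vectors directly as queries, keys, and values. Real transformers first multiply them by learned Q, K, and V matrices and run many attention heads in parallel. Each token scores every token, turns the scores into weights that sum to 1, and takes a weighted mix.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product attention with queries = keys = values = the input vectors."""
    d = len(vectors[0])
    outputs, all_weights = [], []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)  # how much this token attends to each token
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

toy_tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
out, weights = self_attention(toy_tokens)
print([round(w, 2) for w in weights[0]])  # token 0 attends most to itself and the similar token 1
```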

Step 4: Out comes a probability distribution

After all those layers, the model produces one thing: a probability score for every token in its vocabulary, representing how likely each one is to come next.

For the prompt "The capital of France is", the distribution might look like:

"Paris"   → 0.94
"located" → 0.02
"a"       → 0.01
... (tens of thousands more, each tiny)

The model doesn't "know" Paris is the capital in the way you do. It has learned that, statistically, "Paris" is overwhelmingly the token that follows that sequence.
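Under the hood, the model's last layer outputs raw scores (logits), and a softmax turns them into probabilities. The sketch below uses a handful of candidate tokens instead of the whole vocabulary. The logit values are hypothetical.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - m) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical raw scores for a few candidate next tokens
logits = {"Paris": 9.0, "located": 5.1, "a": 4.4, "banana": -2.0}
probs = softmax(logits)
for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok!r:10} -> {p:.3f}")
```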

Step 5: Pick a token, then do it all again

Now the model samples a token from that distribution. This is where temperature comes in:

Low temperature → it almost always picks the highest-probability token. Output is focused and repetitive.
High temperature → it gives lower-probability tokens a real chance. Output is more varied and creative (and more likely to go off the rails).

Then comes the crucial part: the chosen token gets appended to the input, and the entire process runs again to predict the next token. And again. And again. This loop is called autoregression:

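# model(), sample() and END_OF_TEXT are stand-ins, not a real library API
prompt = "Write a haiku about the sea"
while True:
    distribution = model(prompt)       # Steps 1–4: tokens in, probabilities out
    next_token = sample(distribution)  # Step 5: pick one token, shaped by temperature
    if next_token == END_OF_TEXT:      # models emit a special stop token when finished
        break
    prompt = prompt + next_token       # append it and run the whole model again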

That's the whole generation loop. The model writes one token at a time, each time re-reading everything it has produced so far. It has no plan and no draft — it's improvising every token based on all the tokens before it. The fact that this produces coherent essays is genuinely one of the more surprising results in computer science.
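Here's a runnable toy version of that loop, temperature included. The "model" is a hypothetical hand-written table that only looks at the previous token (a bigram model), so it's nothing like a transformer; it just lets you watch the sample, append, repeat cycle and see how temperature reshapes the distribution. The `<end>` stop token and all scores are invented for illustration:

```python
import math
import random

# Hypothetical toy "model": next-token scores that depend only on the previous
# token (a bigram table). A real LLM conditions on the entire sequence.
TOY_LOGITS = {
    "<start>": {"the": 2.0, "a": 1.0},
    "the": {"sea": 2.5, "waves": 1.5, "moon": 0.5},
    "a": {"wave": 2.0, "gull": 1.0},
    "sea": {"sighs": 1.5, "<end>": 1.0},
    "waves": {"sigh": 1.5, "<end>": 1.0},
    "moon": {"<end>": 1.0},
    "wave": {"<end>": 1.0},
    "gull": {"cries": 1.0},
    "sighs": {"<end>": 1.0},
    "sigh": {"<end>": 1.0},
    "cries": {"<end>": 1.0},
}

def next_token_distribution(tokens: list[str], temperature: float) -> dict[str, float]:
    """Softmax over the toy scores after dividing by temperature (must be > 0)."""
    scaled = {tok: s / temperature for tok, s in TOY_LOGITS[tokens[-1]].items()}
    m = max(scaled.values())
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def generate(temperature: float = 1.0, max_tokens: int = 20, seed: int = 0) -> list[str]:
    """Autoregression: sample a token, append it, feed the longer sequence back in."""
    rng = random.Random(seed)
    tokens = ["<start>"]
    for _ in range(max_tokens):
        dist = next_token_distribution(tokens, temperature)          # Steps 1-4
        candidates, probs = zip(*dist.items())
        next_token = rng.choices(candidates, weights=probs, k=1)[0]  # Step 5
        if next_token == "<end>":                                    # stop token
            break
        tokens.append(next_token)                                    # append and repeat
    return tokens[1:]  # drop the artificial start marker

if __name__ == "__main__":
    for temp in (0.1, 2.0):
        for seed in range(3):
            print(f"temperature={temp}, seed={seed}:", " ".join(generate(temp, seed=seed)))
```

At temperature 0.1 the toy almost always produces "the sea sighs"; at 2.0 the scores flatten out and less likely paths such as "a gull cries" start to appear.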

Where does the "knowledge" come from? Training.

Everything above describes the model running. But how did it learn the right vector positions and attention patterns? Two phases.

Pretraining. The model is shown an enormous amount of text and given one task: predict the next token. Cover the last word of a sentence, make it guess, compare to the real answer, and nudge its billions of internal numbers (the parameters, or weights) slightly toward being right. Repeat trillions of times. To get good at predicting the next word across all of human text, the model is forced to absorb grammar, facts, writing styles, and patterns of reasoning. The "knowledge" is a side effect of compression: predicting text well requires modeling the world the text describes.
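Here's that "guess, compare, nudge" cycle in miniature, as a sketch. Assumptions: a made-up three-word vocabulary, a single context, and three trainable scores standing in for billions of parameters. It uses cross-entropy loss (the standard next-token objective) and plain gradient descent applied to the scores directly; real training backpropagates through the whole network and uses optimizers such as Adam:

```python
import math

VOCAB = ["Paris", "London", "banana"]  # hypothetical tiny vocabulary

def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits: list[float], target: int) -> float:
    """Loss = -log(probability assigned to the correct next token)."""
    return -math.log(softmax(logits)[target])

def train_step(logits: list[float], target: int, lr: float = 0.5) -> list[float]:
    """One gradient-descent step. For softmax + cross-entropy, the gradient for
    each score is (predicted probability - 1) for the target, else the probability."""
    probs = softmax(logits)
    grads = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    return [z - lr * g for z, g in zip(logits, grads)]

if __name__ == "__main__":
    logits = [0.0, 0.0, 0.0]        # untrained: every token equally likely
    target = VOCAB.index("Paris")   # the real next token in the training text
    for step in range(5):
        p = softmax(logits)[target]
        print(f"step {step}: P(Paris)={p:.3f} loss={cross_entropy(logits, target):.3f}")
        logits = train_step(logits, target)
```

Each step raises P(Paris) and lowers the loss. Repeat that over every position in trillions of tokens, with billions of parameters, and you have the basic shape of pretraining.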

Fine-tuning and alignment. A raw pretrained model is just an autocomplete engine — ask it a question and it might reply with more questions, because that's a plausible continuation. To turn it into a helpful assistant, it goes through additional training on examples of good instruction-following, plus techniques like RLHF (reinforcement learning from human feedback), where humans rank responses and the model learns to prefer the kind people actually want. This is the step that makes it answer rather than ramble.
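In published RLHF setups such as InstructGPT, the human rankings are typically used in two stages: first a reward model learns to score responses the way raters ranked them, then the LLM is optimized against that reward model with reinforcement learning. A common reward-model objective is a pairwise loss that pushes the preferred response's score above the rejected one's. Here's a minimal sketch of just that loss, with invented scores; the reward model and the RL stage are left out:

```python
import math

def preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(chosen - rejected)).

    Near 0 when the human-preferred response already scores much higher;
    large when the reward model ranks the pair the wrong way round.
    """
    margin = score_chosen - score_rejected
    # Two algebraically equal forms of log(1 + exp(-margin)); pick the one
    # whose exp() argument is <= 0 so it never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

if __name__ == "__main__":
    # Hypothetical reward-model scores for two answers to the same prompt,
    # where human raters preferred the first answer.
    print("scores agree with raters:   ", round(preference_loss(2.0, -1.0), 4))
    print("scores disagree with raters:", round(preference_loss(-1.0, 2.0), 4))
```

Minimizing this loss over many rated pairs is what teaches the reward model "the kind people actually want"; the LLM then learns from the reward model's scores.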

Why this explains the quirks

Once you hold this mental model, the famous LLM weirdnesses stop being mysterious:

Hallucinations. The model is optimizing for plausible, not true. There's no fact-checking step — if a confident-sounding but wrong token sequence is statistically likely, it'll produce it just as smoothly as a correct one. It doesn't know when it doesn't know.

Knowledge cutoffs. Its facts come from training data frozen at a point in time. It can't know about anything after that unless you give it the information in the prompt (which is exactly what web search and RAG do).
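A minimal sketch of the "give it the information in the prompt" idea behind RAG: pick the most relevant documents and paste them into the prompt ahead of the question. Real systems retrieve with embeddings or a search engine; this toy ranks documents by shared words, and the documents and prompt template are invented for illustration:

```python
import re

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the question (toy relevance score)."""
    q = words(question)
    ranked = sorted(documents, key=lambda doc: len(q & words(doc)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, documents: list[str]) -> str:
    """Paste the retrieved text into the prompt so the model can use facts it never trained on."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

if __name__ == "__main__":
    docs = [
        "The v3 release of the billing API shipped last week.",  # hypothetical post-cutoff fact
        "Bananas are rich in potassium.",
        "The billing API v2 endpoints are deprecated.",
    ]
    print(build_prompt("What changed in the billing API?", docs))
```

The model itself is unchanged; it simply sees the fresh facts as part of its input tokens.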

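Context windows. Attention works over a fixed maximum number of tokens. Everything — your prompt, the system instructions, the conversation history — has to fit in that window. Go over it and something has to be cut: chat apps usually drop the earliest messages, so they silently fall out of view, while a raw API request that's too long is typically rejected with an error.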
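Here's a minimal sketch of what fitting the window can look like inside a chat app: keep the system instructions, then keep the newest messages until the budget runs out. Counting one token per whitespace-separated word is a crude stand-in (real tokenizers split text into smaller pieces), and the function names and budget are hypothetical:

```python
def count_tokens(text: str) -> int:
    """Crude stand-in: one token per whitespace-separated word."""
    return len(text.split())

def fit_to_window(system: str, history: list[str], max_tokens: int) -> list[str]:
    """Keep the system prompt plus as many of the most recent messages as fit."""
    budget = max_tokens - count_tokens(system)
    kept: list[str] = []
    for message in reversed(history):       # walk from newest to oldest
        cost = count_tokens(message)
        if cost > budget:
            break                           # this message and everything older fall out of view
        kept.append(message)
        budget -= cost
    return [system] + list(reversed(kept))  # restore chronological order

if __name__ == "__main__":
    history = ["first question about setup", "long answer " * 10, "follow-up", "short answer"]
    print(fit_to_window("You are a helpful assistant.", history, max_tokens=20))
```

Dropping from the oldest end is one simple policy; some apps summarize old turns instead, but either way the model only ever sees what's inside the window.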

No memory between calls. Each API request is stateless. The model doesn't "remember" your last conversation; chat apps fake continuity by resending the history every time. That's why your token count grows as a chat gets longer.
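A minimal sketch of how a chat app fakes memory on top of a stateless model. `fake_model` is a hypothetical stand-in for an API call: it sees only the messages passed to it in that one call. The client owns the history and resends all of it every turn, which is why the amount sent per request keeps climbing:

```python
def fake_model(messages: list[dict[str, str]]) -> str:
    """Hypothetical stateless model: it knows nothing beyond the messages in this call."""
    return f"(reply after reading {len(messages)} messages)"

class Chat:
    def __init__(self) -> None:
        self.history: list[dict[str, str]] = []  # lives in the app, not in the model

    def send(self, user_text: str) -> tuple[str, int]:
        self.history.append({"role": "user", "content": user_text})
        reply = fake_model(self.history)          # the full history goes out every time
        self.history.append({"role": "assistant", "content": reply})
        # Crude whitespace word count of what was sent in this call (everything but the new reply).
        sent_tokens = sum(len(m["content"].split()) for m in self.history[:-1])
        return reply, sent_tokens

if __name__ == "__main__":
    chat = Chat()
    for text in ["hi", "what's a token?", "and a context window?"]:
        reply, sent = chat.send(text)
        print(f"sent ~{sent} tokens -> {reply}")
```

Start a new `Chat` and the "memory" is gone, because it only ever existed in the client's list.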

Prompt sensitivity. Since output is just conditioned on input tokens, small wording changes shift the probability distribution. "Explain X" and "Explain X to a five-year-old" steer it down very different statistical paths.

The takeaway

The most useful reframe is this: an LLM is not a database you query and not a mind that thinks. It's a probabilistic next-token predictor trained on a huge slice of human text, wrapped in enough scale that the predictions become genuinely useful.

Hold that model in your head and you'll prompt better, trust it in the right places, distrust it in the right places, and stop being shocked when the confident answer is confidently wrong.

This is a simplified mental model — I've glossed over plenty (positional encodings, the exact attention math, mixture-of-experts, and more) in the name of clarity. If you want a deeper dive into any one piece, drop a comment and I'll expand on it.

AI Won't Replace Humans — It'll Just Make Us Pickier

Marko Frei — Fri, 05 Jun 2026 15:45:48 +0000

Every few weeks someone posts the same screenshot: an AI writing a whole app from a one-line prompt, captioned "devs are cooked." And every few weeks I close my laptop, open a real client codebase, and remember that the prompt was never the hard part.

I use AI every day. It drafts my boilerplate, explains unfamiliar stack traces, and rubber-ducks my architecture decisions at 1am. I'd genuinely hate to give it up. But I've stopped believing it's coming for me, and I want to explain why — without the comforting hand-waving you usually get from people who feel threatened.

The honest part first

AI is going to replace a lot of tasks. It already has. The hour I used to spend wiring up a CRUD form, writing regex, or translating an error message into a Stack Overflow search — gone. If your job is only those tasks, that's a real problem, and pretending otherwise helps no one.

So when I say "AI won't replace humans," I don't mean nothing changes. I mean the thing being automated is the typing, not the deciding.

The hard part of software was never the code

Most of my actual work isn't writing code. It's figuring out what a client means when they say "it should just sync automatically." It's noticing that the integration they asked for was quietly deprecated two versions ago and choosing a different path. It's deciding which 20% of the feature request is worth building this month.

An AI can write the sync code beautifully. It cannot sit in the room, read the half-finished sentence, the budget anxiety, and the thing the client didn't say, and decide that the right move is to talk them out of the feature entirely. That's not a prompt. That's judgment, and judgment is built from consequences you've personally lived through.

Someone has to be accountable

Here's the part I think gets skipped. When the migration corrupts production data at 2am, the AI doesn't get the call. I do. When the architecture choice locks the company into a year of pain, no model is in the retro explaining itself.

Software runs on trust and accountability, and you can't delegate accountability to something that can't be held responsible. As long as that's true, there's a human in the loop — not as a babysitter, but as the person who owns the outcome.

We've done this before

We are spectacularly bad at remembering this, but every tool that was going to "replace programmers" instead made programmers more valuable:

Compilers were going to make assembly experts obsolete. Instead, they helped create vastly more programmers than there had ever been assembly experts.
High-level languages, IDEs, autocomplete, Stack Overflow, open-source frameworks — each one removed grunt work, and demand for developers went up, not down.

When you make something cheaper to produce, you usually get more of it, not less. Cheaper software means more software gets built — more startups, more internal tools, more ambitious projects that weren't worth the cost before. That demand has to land on someone who can steer the machine.

What I think actually happens

AI doesn't replace the developer. It raises the floor and moves the bottleneck. When generating code is nearly free, the scarce skill becomes knowing what's worth generating, spotting when the confident output is quietly wrong, and integrating it into a messy real system that has history, constraints, and humans attached.

In other words: the job gets less about production and more about taste, verification, and judgment. We become pickier. The developers who struggle won't be the ones who refused to use AI — they'll be the ones who used it without ever developing the judgment to know when it's lying to them.

So, am I worried?

Not about being replaced. I'm worried about people who think the skill is "prompting" instead of "deciding." I'm worried about juniors who never build the intuition because the AI always answered first. Those are real problems worth talking about.

But the human in software? We're not going anywhere. We just get a very fast, very confident, occasionally hallucinating intern who never sleeps — and someone still has to decide whether to trust what it hands us.

That someone is the job.

What's your take — is "AI replaces tasks, not people" too optimistic, or about right? I'd genuinely like to hear where you think this breaks down.