Nicanor Korir

Posted on Nov 14

Optical Flow: How Robots (and maybe your Phone) See Motion

#ai #robotics #computervision #nicanorkorir

Okay, so here's a weird question: how do you know something is moving?

Like, right now, if I threw a ball at you, you'd catch it or try to. Not because you're doing complex calculations. You just see it moving. Your brain processes the motion instantly, and your hands know where to be.

But how? What's actually happening when you perceive motion?

That's optical flow. And honestly? Understanding optical flow changed how I think about vision in general. Let me explain.

The Coffee Cup Experiment

Imagine you're sitting at a table with a coffee cup in front of you. The cup isn't moving. You're not moving. Everything is still.

Now, I walk past you. If your eyes follow me, the background behind me seems to shift. The wall appears to slide in the direction opposite to the one I'm walking in. The floor seems to slide past.

But here's the thing: nothing is actually moving except me. The wall isn't really moving. The floor isn't sliding. Your brain knows this because it's processing relative motion.

What you're actually seeing is the "pixels" in your visual field changing position over time.

Optical flow is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer (you, a camera, a robot, etc.) and the scene.

Optical flow is basically asking, "Which pixels are moving, and in which direction?"

Why is Optical Flow So Important?

Here's where it gets practical. Imagine a robot navigating through a hallway. How does it know it's moving forward?

One way: it has odometers on its wheels, or it uses GPS, or it has a motion sensor. But what if those sensors break? Or what if it's in an environment where GPS doesn't work?

Another way: the robot looks at what it sees and figures out, "Hey, everything in my visual field is moving away from the center. That means I'm moving forward." This is optical flow.
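Here's a rough sketch of that "everything is moving away from the center" check. It assumes the flow vectors are already computed and that the expansion is centered on the image center, which only holds when the camera moves straight along its viewing direction. The function name `expansion_score` and the sample vectors are made up for illustration.

```python
def expansion_score(flow, center):
    """Average outward component of the flow vectors, in pixels per frame.

    flow: dict mapping (x, y) pixel positions to (dx, dy) motion vectors.
    center: (cx, cy), assumed here to be the point the scene expands from.
    """
    cx, cy = center
    total = 0.0
    count = 0
    for (x, y), (dx, dy) in flow.items():
        rx, ry = x - cx, y - cy              # direction from center to pixel
        dist = (rx * rx + ry * ry) ** 0.5
        if dist == 0:
            continue                         # the center itself has no direction
        total += (dx * rx + dy * ry) / dist  # projection onto the outward direction
        count += 1
    return total / count if count else 0.0


center = (50, 50)
points = [(10, 10), (90, 10), (10, 90), (90, 90), (50, 20), (50, 80)]

# Hypothetical flow when moving forward: every vector points away from the center.
forward = {(x, y): (0.1 * (x - 50), 0.1 * (y - 50)) for x, y in points}
# Hypothetical flow when the camera slides sideways: every vector is identical.
sideways = {p: (3.0, 0.0) for p in points}

print(expansion_score(forward, center))   # clearly positive: looks like forward motion
print(expansion_score(sideways, center))  # about zero: no expansion
```

A clearly positive score suggests forward motion, a clearly negative one suggests moving backward, and a score near zero means the flow is not an expansion pattern at all.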

If a robot is trying to catch a moving object, it needs to know:

Is the object moving, or am I moving?
In which direction is it moving?
How fast?
Will it hit something?

All of this can be extracted from optical flow.

Similarly, when your phone stabilizes video, it's using optical flow to detect camera shake and compensate for it. When a drone hovers in place without GPS, it's using optical flow to stay put.

But: What's Actually Happening?

Let's go back to basics. You're looking at a video, say, of a person walking across a room.

Frame 1: You see the person at position X.
Frame 2: You see the person at position X+5, i.e. 5 pixels to the right.
Frame 3: You see the person at position X+10, i.e. another 5 pixels to the right.

Optical flow is literally: "The person moved 5 pixels to the right between frame 1 and 2, and another 5 pixels between frame 2 and 3."

But it's not just about where things moved. It's about the pattern of motion across the entire image.

Think of it like this: imagine you're looking at a piece of paper with arrows drawn on it. Each arrow points in a direction, and its length shows how far something moved.

In a video of a person walking toward the camera:

The person's outline shows motion outward (they grow in the image as they get closer)
The center of the person shows less motion
The person's limbs show rapid motion (arms swinging)

When you visualize all these arrows together, you get a motion field.
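To make the arrow picture concrete, here's a tiny sketch that turns each motion vector into a length and a direction. The `motion_field` values are invented, and I'm assuming image coordinates where y grows downward, so a positive angle means clockwise on screen.

```python
import math


def describe_arrow(dx, dy):
    """Return (length in pixels, angle in degrees) for one motion vector."""
    length = math.hypot(dx, dy)
    angle = math.degrees(math.atan2(dy, dx))
    return length, angle


# A toy motion field: pixel position -> (dx, dy). Values are made up.
motion_field = {
    (0, 0): (3.0, 4.0),   # moving right and down
    (1, 0): (0.0, 0.0),   # not moving
    (0, 1): (-2.0, 0.0),  # moving left
}

for position, (dx, dy) in motion_field.items():
    length, angle = describe_arrow(dx, dy)
    print(position, f"length={length:.1f}px angle={angle:.0f}deg")
```

Drawing one such arrow per pixel (or per tracked point) gives you exactly the "paper full of arrows" described above.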

Okay, But How Do You Actually Calculate It?

The Basic Principle:

A pixel's brightness (or color) doesn't change much between consecutive frames, unless something moves.

So if you see a pixel that was bright white in frame 1, and it's also bright white in frame 2, but a few pixels to the right, you can infer: "That pixel content moved to the right."

This is called the brightness constancy assumption: the intensity of a pixel remains constant as it moves.

In math terms:

I(x, y, t) = I(x + dx, y + dy, t + dt)

This just means: "The brightness at position (x, y) at time t equals the brightness at the new position after movement."
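If you want to see where the usual working equation comes from, here's a short derivation sketch. The symbols I_x, I_y, I_t (partial derivatives of brightness) and u, v (horizontal and vertical speed) are notation I'm introducing here, and the step relies on the motion being small enough for a first-order approximation.

```latex
% Brightness constancy, as stated above
I(x, y, t) = I(x + dx,\; y + dy,\; t + dt)

% First-order Taylor expansion of the right-hand side (small motion assumed)
I(x + dx,\; y + dy,\; t + dt) \approx I(x, y, t) + I_x\,dx + I_y\,dy + I_t\,dt

% Subtract I(x, y, t) from both sides, then divide by dt
I_x\,u + I_y\,v + I_t = 0,
\qquad u = \frac{dx}{dt},\quad v = \frac{dy}{dt}
```

That's one equation with two unknowns (u and v) per pixel, so a single pixel can't pin down its own motion. Every practical method adds some extra assumption to get around this.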

The Lucas-Kanade Method (One Popular Approach)

There are many ways to calculate optical flow. One of the most famous is Lucas-Kanade. Here is how it works:

Look at a small window of pixels (like a 3x3 or 5x5 grid) and assume they all move together
Find the single motion vector (how far it moved, in which direction) that best explains the brightness changes in that window between frames
Repeat for every pixel in the image (or only for the points you care about)
You get a motion field: every pixel you processed has an associated motion vector

It's like saying: "For this window, the best explanation for the change I see is that everything shifted 3 pixels to the right and 1 pixel down."
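The steps above can be sketched for a single window in plain Python. This is a minimal version under assumptions, not a production implementation: the synthetic `pattern` texture, the helper names, the 7x7 window, and the sub-pixel shift are all my choices, and real implementations (OpenCV's, for example) add image pyramids and iteration on top of this.

```python
import math


def pattern(x, y):
    # Smooth synthetic texture, used only to build test frames (hypothetical).
    return math.sin(0.4 * x) + math.sin(0.3 * y)


def make_frame(width, height, shift_x=0.0, shift_y=0.0):
    # Image content moved by (shift_x, shift_y): new(x, y) = old(x - shift_x, y - shift_y).
    return [[pattern(x - shift_x, y - shift_y) for x in range(width)]
            for y in range(height)]


def lucas_kanade_window(frame1, frame2, cx, cy, half=3):
    """Least-squares motion (u, v) for the window centered at (cx, cy)."""
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (frame1[y][x + 1] - frame1[y][x - 1]) / 2.0  # horizontal gradient
            iy = (frame1[y + 1][x] - frame1[y - 1][x]) / 2.0  # vertical gradient
            it = frame2[y][x] - frame1[y][x]                  # change over time
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    # Solve [[sxx, sxy], [sxy, syy]] * [u, v] = [-sxt, -syt].
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        return None  # no unique answer, e.g. a featureless window
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


frame1 = make_frame(20, 20)
frame2 = make_frame(20, 20, shift_x=0.3, shift_y=-0.2)  # content moves right and up
u, v = lucas_kanade_window(frame1, frame2, cx=10, cy=10)
print(f"estimated motion: u={u:.2f}, v={v:.2f}")  # should land close to (0.3, -0.2)
```

The estimate is only approximate because the method linearizes the image around each pixel, which is also why it degrades once the motion gets much larger than a pixel or so.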

The Dense vs. Sparse Problem

Sparse Optical Flow: Track only a few key points (like corners or features). You end up with arrows pointing from frame 1 to frame 2 for a few hundred points.

Advantage: Fast, and when combined with image pyramids it can cope with fairly large motion.
Disadvantage: Doesn't tell you about the entire scene, just key points.

Dense Optical Flow: Calculate motion for every pixel. Every single pixel gets a motion vector.

Advantage: Complete picture of motion.
Disadvantage: Computationally expensive, can fail with large motion or occlusions.

For a robot navigating a hallway? Sparse is usually enough. You just need to know the general motion pattern.

A Real Example: Following a Ball

Let's say you're building a robot that needs to track a tennis ball.

Frame 1: The ball is at position (100, 150) in the image.
Frame 2: The ball is at position (115, 148) in the image.

Optical flow detected: The ball moved 15 pixels right, 2 pixels up.

Frame 3: The ball is at position (130, 145).

Optical flow detected: The ball moved 15 pixels right, 3 pixels up.

Now the robot can predict: "The ball is moving consistently to the right and slightly upward. At this rate, in the next frame it will be around (145, 142)."

Extrapolate further, and the robot can predict where the ball will be and position itself to catch it. Optical flow gives the robot the raw material for prediction.
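Here's the tennis ball example as a few lines of code. It's a minimal sketch that assumes the ball keeps its most recent frame-to-frame motion (constant velocity), and the function names are mine.

```python
def flow_between(p_prev, p_next):
    """Displacement (dx, dy) of a tracked point between two frames."""
    return (p_next[0] - p_prev[0], p_next[1] - p_prev[1])


def predict_next(positions, steps=1):
    """Extrapolate from the last position using the most recent displacement."""
    dx, dy = flow_between(positions[-2], positions[-1])
    x, y = positions[-1]
    return (x + steps * dx, y + steps * dy)


ball = [(100, 150), (115, 148), (130, 145)]  # positions from frames 1, 2, 3

flows = [flow_between(a, b) for a, b in zip(ball, ball[1:])]
print(flows)               # [(15, -2), (15, -3)]  (negative dy means "up" in image coordinates)
print(predict_next(ball))  # (145, 142)
```

A real tracker would smooth over several frames and account for gravity, but the idea is the same: motion now tells you position later.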

The Challenges: When Optical Flow Fails

Challenge 1: Occlusion

Imagine someone walks behind a tree. From the camera's perspective, the person disappears. Optical flow can't track what it can't see. The motion vectors suddenly stop.

Robots have to be smart about this: "The person disappeared, but based on the last known motion vector, I predict they'll emerge here."

Challenge 2: Lighting Changes

Remember the brightness constancy assumption? It breaks if the lighting changes. If a cloud passes overhead and the entire scene gets darker, optical flow gets confused.

It might think things moved when really just the lighting changed.
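A quick way to see the problem: dim a patch without moving anything and the raw pixel values still change a lot. The sketch below also shows one common mitigation, normalizing each patch to zero mean and unit spread before comparing. The patch values and the 0.6 dimming factor are invented, and this trick only cancels uniform brightness changes, not shadows or highlights.

```python
from statistics import mean, pstdev


def normalize(patch):
    """Rescale a patch to zero mean and unit standard deviation."""
    m = mean(patch)
    s = pstdev(patch)
    return [(p - m) / s for p in patch]


patch = [10.0, 50.0, 90.0, 130.0, 200.0]  # brightness values, nothing moves
darker = [0.6 * p for p in patch]         # same scene after a cloud passes

raw_change = max(abs(a - b) for a, b in zip(patch, darker))
norm_change = max(abs(a - b)
                  for a, b in zip(normalize(patch), normalize(darker)))

print(raw_change)   # large, even though nothing moved
print(norm_change)  # essentially zero
```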

Challenge 3: Large Motion

If something moves really fast between frames, optical flow struggles, because most methods expect motion to be small and smooth. Think of fast action footage: the flow estimate can't always keep up with rapid motion. This is also why video codecs that rely on motion estimation sometimes struggle with fast action and abrupt cuts.

Challenge 4: Texture-less Regions

If you're looking at a blank wall, there are no features to track. Optical flow can't tell if the wall moved or not because there's nothing distinctive to latch onto.
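You can actually measure "nothing to latch onto". A standard test (the idea behind Shi-Tomasi "good features to track") looks at the smaller eigenvalue of the 2x2 matrix of summed gradient products in a patch: if it's near zero, motion can't be recovered in at least one direction. The patches below are toy examples I made up.

```python
def min_eigenvalue(patch):
    """Smaller eigenvalue of the summed gradient matrix over the patch interior."""
    sxx = sxy = syy = 0.0
    height, width = len(patch), len(patch[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            ix = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            iy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
    trace = sxx + syy
    det = sxx * syy - sxy * sxy
    disc = max(trace * trace / 4.0 - det, 0.0)  # clamp tiny negative rounding errors
    return trace / 2.0 - disc ** 0.5


blank_wall = [[128.0] * 7 for _ in range(7)]
vertical_edge = [[0.0 if x < 3 else 255.0 for x in range(7)] for y in range(7)]
checker = [[255.0 if (x // 2 + y // 2) % 2 else 0.0 for x in range(7)]
           for y in range(7)]

print(min_eigenvalue(blank_wall))     # 0: nothing to track
print(min_eigenvalue(vertical_edge))  # 0: sliding along the edge is invisible
print(min_eigenvalue(checker))        # large: gradients in both directions
```

The edge case is worth noticing: a straight edge has plenty of contrast, but you still can't tell whether it slid along its own direction.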

Challenge 5: Reflections and Transparency

Mirrors, windows, and water break optical flow because the brightness changes they produce don't correspond to the actual motion of a surface.

Uses of Optical Flow

1. Autonomous Vehicles

Self-driving cars use optical flow to understand their motion relative to the scene. "The lane markings are flowing backward, which means I'm moving forward." It's also used to detect obstacles, e.g. "That region isn't flowing like the background, so something is there."
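The "isn't flowing like the background" idea can be sketched by comparing each vector against the median flow. This is a simplification: it assumes the background moves roughly uniformly in the image, which isn't true for a real forward-driving camera, and the threshold and sample vectors are made up.

```python
import math
from statistics import median


def find_outliers(flow, threshold=2.0):
    """Positions whose motion differs from the median flow by more than threshold pixels."""
    med_dx = median(dx for dx, _ in flow.values())
    med_dy = median(dy for _, dy in flow.values())
    return [position for position, (dx, dy) in flow.items()
            if math.hypot(dx - med_dx, dy - med_dy) > threshold]


# Hypothetical flow: four road points drifting down the image, one point moving sideways.
flow = {
    (10, 40): (0.0, 5.0),
    (20, 40): (0.0, 5.2),
    (30, 40): (0.1, 4.9),
    (40, 40): (0.0, 5.0),
    (25, 20): (-3.0, 0.5),
}

print(find_outliers(flow))  # [(25, 20)]
```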

2. Video Compression

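When Netflix streams a video to you, it doesn't send every pixel of every frame. The video codec estimates motion between frames, a close cousin of optical flow that usually works on blocks of pixels rather than single pixels: "In the next frame, these pixels will probably be here based on the motion I detected." Then it only sends the motion vectors plus whatever the prediction got wrong.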

This saves massive amounts of bandwidth.
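Practical codecs usually estimate motion per block rather than per pixel, so here's a minimal block-matching sketch of that idea. The frame contents, block size, and search range are made up, and real codecs are far more elaborate (sub-pixel search, variable block sizes, and so on).

```python
def block_cost(curr, prev, bx, by, px, py, size):
    """Sum of absolute differences between a block in curr and a block in prev."""
    return sum(abs(curr[by + j][bx + i] - prev[py + j][px + i])
               for j in range(size) for i in range(size))


def best_motion_vector(prev, curr, bx, by, size=4, search=3):
    """Find (dx, dy) so the curr block at (bx, by) best matches prev at (bx - dx, by - dy)."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = bx - dx, by - dy
            if px < 0 or py < 0 or px + size > width or py + size > height:
                continue  # candidate block would fall outside the previous frame
            cost = block_cost(curr, prev, bx, by, px, py, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    cost, dx, dy = best
    return dx, dy, cost


# Previous frame: a non-repeating pattern. Current frame: same content moved 2 pixels right.
prev = [[x * x + 10 * y * y for x in range(12)] for y in range(12)]
curr = [[prev[y][x - 2] if x >= 2 else 0 for x in range(12)] for y in range(12)]

dx, dy, leftover = best_motion_vector(prev, curr, bx=6, by=4)
print(dx, dy, leftover)  # 2 0 0
```

When the leftover cost is zero, the motion vector alone describes the block, so nothing else has to be sent for it. Note the search range: motion larger than `search` pixels simply can't be found, which is the "large motion" problem from earlier in a different costume.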

3. Video Stabilization

Your phone camera detects motion between frames using optical flow. If it detects motion that seems like camera shake (small, jittery motion), it digitally shifts the image to compensate.
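One simple way to separate shake from intentional movement is to add up the per-frame shifts into a camera path, smooth that path, and shift each frame by the difference. The sketch below does this in one dimension (horizontal shift only) with a moving average. The function name, the window radius, and the sample shifts are my assumptions, and a real stabilizer would also handle rotation and cropping.

```python
def stabilizing_offsets(shifts, radius=2):
    """Per-frame correction = smoothed camera path minus raw camera path."""
    path = []
    total = 0.0
    for shift in shifts:
        total += shift           # accumulate frame-to-frame shifts into a path
        path.append(total)
    offsets = []
    for i in range(len(path)):
        lo = max(0, i - radius)
        hi = min(len(path), i + radius + 1)
        smooth = sum(path[lo:hi]) / (hi - lo)  # moving average around frame i
        offsets.append(smooth - path[i])
    return offsets


steady_pan = [2.0] * 9             # camera pans right at a constant speed
jitter = [2.0, -2.0] * 4 + [2.0]   # camera shakes left and right

print(stabilizing_offsets(steady_pan)[4])  # about 0: a smooth pan is left alone
print(stabilizing_offsets(jitter)[4])      # about -0.8: pulled back toward the average path
```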

4. Robotics Navigation

Mobile robots use optical flow to navigate when other sensors fail. "I can see the environment is flowing past me, so I know I'm moving forward. If the flow pattern changes, something is blocking me."

5. Action Recognition

If you're building a system that understands "What is happening in this video?", optical flow helps. Running looks different from walking looks different from falling, and these differences show up in the motion patterns.

6. Frame Interpolation

Ever seen a slow-motion video created from a regular video? Sometimes it uses optical flow to predict intermediate frames. "Between frame 1 and frame 3, based on the motion I see, frame 2 probably looked like this."
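At its simplest, interpolation just moves each point part of the way along its motion vector. The sketch below assumes straight-line motion at constant speed between the two known frames, and only handles point positions. Real interpolators warp whole images and have to deal with occlusions.

```python
def interpolate_points(points, flow, t=0.5):
    """Positions at fraction t (0 to 1) of the way from one frame to the next."""
    return [(x + t * dx, y + t * dy)
            for (x, y), (dx, dy) in zip(points, flow)]


points = [(100.0, 150.0), (40.0, 40.0)]  # positions in the first frame
flow = [(15.0, -2.0), (0.0, 0.0)]        # motion to the next known frame

print(interpolate_points(points, flow))  # [(107.5, 149.0), (40.0, 40.0)]
```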

Resources

OpenCV documentation on optical flow: https://docs.opencv.org/master/d4/dee/tutorial_optical_flow.html
Textbook (accessible): "Digital Image Processing" by Gonzalez & Woods, for the image processing basics behind optical flow
YouTube channel Sentdex has OpenCV tutorials including optical flow
RAFT paper (modern deep learning approach): https://arxiv.org/abs/2003.12039

The reason optical flow matters in robotics is that it's one of the fundamental ways a robot can understand motion without relying on dedicated motion sensors. A robot with just a camera can: