In the world of automated vision, there's only so much one can do with a single sensor. Optimize for one thing, and lose another; even with one-size-fits-all attempts, it's hard to paint a full sensory picture of the world at 30fps.
We use sensor combinations to overcome this restriction. This intuitively seems like the right play: more sensors mean more information. The jump from one camera to two, for instance, unlocks binocular vision, or the ability to see behind as well as in front. Better yet, use three cameras to do both at once. Add in a LiDAR unit, and see farther. Add in active depth, and see with more fidelity. Tying together multiple data streams is so valuable that this act of Sensor Fusion is a whole discipline in itself.
Yet this boon in information often makes vision-enabled systems harder to build, not easier. Binocular vision relies on stable intrinsic and extrinsic camera properties, which real cameras don't reliably hold over time. Depth sensors lose accuracy with distance. A sensor can fail entirely, like LiDAR on a foggy day.
This means that effective sensor fusion involves constructing vision architecture in a way that minimizes uncertainty in uncertain conditions. Sensors aren't perfect, and data can be noisy. It's the job of the engineer to sort this out and derive assurances about what is actually true. This challenge is what makes sensor fusion so difficult: it takes competency in information theory, geometry, optimization, fault tolerance, and a whole mess of other things to get right.
So how do we start?
...just kidding. Though you would be surprised how often an educated guess gets thrown in! No, we're talking about Kalman filters.
So let's review our predicament:
- We have a rough idea of our current state (e.g. the position of our robot), and we have a model of how that state changes through time.
- We have multiple sensor modalities, each with its own data stream.
- All of these sensors give noisy and uncertain data.
Nonetheless, this is all that we have to work with. This seems troubling; we can't be certain about anything!
Instead, what we can do is minimize our uncertainty. Through the beauty of mathematics, we can combine all of this knowledge and actually come out with a more certain idea of our state through time than if we used any one sensor or model.
This 👆 is the magic of Kalman filters.
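To build some intuition for why combining sources can leave us *more* certain than any single one, here is a minimal sketch that fuses two independent Gaussian estimates of the same quantity by inverse-variance weighting. This is a simplified, one-dimensional stand-in for the idea, not the full Kalman filter; the function name `fuse` and the sensor readings are hypothetical.

```python
def fuse(mean_a, var_a, mean_b, var_b):
    """Combine two independent Gaussian estimates of the same quantity.

    Each estimate is weighted by the inverse of its variance, so the more
    certain estimate pulls the fused result harder toward itself.
    """
    weight_a = 1.0 / var_a
    weight_b = 1.0 / var_b
    fused_var = 1.0 / (weight_a + weight_b)
    fused_mean = fused_var * (weight_a * mean_a + weight_b * mean_b)
    return fused_mean, fused_var


# Hypothetical readings of the same distance (meters) from two sensors.
lidar_mean, lidar_var = 10.2, 0.04    # more precise sensor
camera_mean, camera_var = 9.8, 0.16   # noisier sensor

mean, var = fuse(lidar_mean, lidar_var, camera_mean, camera_var)
print(f"fused estimate: {mean:.3f} m, variance {var:.3f}")
```

The fused variance comes out smaller than either input variance, which is the "more certain than any one sensor" claim in miniature.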
Let's pretend that we're driving an RC car along a completely flat, very physics-friendly straight line.
There are two things that we can easily track about our car's state: its position $p_t$ and velocity $v_t$.
We can speed up our car by punching the throttle, something we do frequently. We do this by exerting a force $f$ on the RC car's mass $m$, resulting in an acceleration $a = f/m$ (see Newton's Second Law of Motion).
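With just this information, we can derive a model for how our car will act over a time period $\Delta t$ using some classical physics:

$$
\begin{aligned}
p_t &= p_{t-1} + v_{t-1}\,\Delta t + \frac{1}{2}\frac{f}{m}\,\Delta t^2 \\
v_t &= v_{t-1} + \frac{f}{m}\,\Delta t
\end{aligned}
$$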
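The classical-physics model for the car can be sketched directly in Python. This assumes the applied force stays constant over the whole step; the function name `step_car` and the numbers in the usage line are illustrative, not part of the original model.

```python
def step_car(p, v, f, m, dt):
    """Advance a 1D car by one time step under a constant applied force.

    p, v: position (m) and velocity (m/s) at the previous step
    f: applied force (N); m: mass (kg); dt: step length (s)
    """
    a = f / m                                # Newton's second law: a = f / m
    p_next = p + v * dt + 0.5 * a * dt ** 2  # constant-acceleration kinematics
    v_next = v + a * dt
    return p_next, v_next


# Hypothetical numbers: a 2 kg car moving at 1 m/s gets a 4 N push for 0.5 s.
p, v = step_car(p=0.0, v=1.0, f=4.0, m=2.0, dt=0.5)
print(p, v)  # 0.75 2.0
```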
We can simplify this for ourselves using some convenient matrix notation. Let's put the values we can track, position $p_t$ and velocity $v_t$, into a state vector:

$$
\mathbf{x}_t = \begin{bmatrix} p_t \\ v_t \end{bmatrix}
$$
...and let's put our applied forces into a control vector that represents all the outside influences affecting our state:

$$
\mathbf{u}_t = \begin{bmatrix} \dfrac{f}{m} \end{bmatrix}
$$
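Now, with a little rearranging, we can organize our motion model for position and velocity into something a bit more compact:

$$
\begin{bmatrix} p_t \\ v_t \end{bmatrix}
=
\begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} p_{t-1} \\ v_{t-1} \end{bmatrix}
+
\begin{bmatrix} \dfrac{\Delta t^2}{2} \\ \Delta t \end{bmatrix}
\dfrac{f}{m}
$$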
By rolling up these terms, we get some handy notation that we can use later:

$$
\mathbf{x}_t = \mathbf{F}\,\mathbf{x}_{t-1} + \mathbf{B}\,\mathbf{u}_t
$$

- $\mathbf{F}$ is called our prediction matrix. It models what our system would do over $\Delta t$, given its current state.
- $\mathbf{B}$ is called our control matrix. It relates the forces in our control vector to the state prediction over $\Delta t$.
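The matrix form can be sketched with plain Python lists, using a prediction matrix `F = [[1, dt], [0, 1]]` and a control matrix `B = [[dt**2 / 2], [dt]]` with the control input taken as force per unit mass. The helper names are hypothetical; a real implementation would more likely lean on a linear-algebra library.

```python
def mat_vec(M, x):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def predict_state(x, u, dt):
    """Matrix-form prediction x_t = F x_{t-1} + B u_t for the 1D car.

    x: [position, velocity]; u: [f / m], the applied force per unit mass.
    """
    F = [[1.0, dt],
         [0.0, 1.0]]        # prediction matrix: how the state evolves on its own
    B = [[0.5 * dt ** 2],
         [dt]]              # control matrix: how the control input moves the state
    Fx = mat_vec(F, x)
    Bu = mat_vec(B, u)
    return [a + b for a, b in zip(Fx, Bu)]


# Hypothetical numbers: 4 N on a 2 kg car for 0.5 s, so u = [4 / 2].
x_next = predict_state(x=[0.0, 1.0], u=[4.0 / 2.0], dt=0.5)
print(x_next)  # [0.75, 2.0]
```

Because the model is linear, packing it into `F` and `B` means the same two matrix products keep working if more state variables or control inputs are added later.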
However, we’re not exactly sure whether or not our state values are true to life; there’s uncertainty! Let’s make some assumptions about what this uncertainty might look like in our system:
- Any error we might get in an observation is inherently random; that is, there isn’t a systematic bias towards one result (on average, errors cancel out).
- Errors are independent of one another.
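With these assumptions in hand, we can model the uncertainty in our state as a Gaussian (normal) distribution, described by its probability density function (PDF). We will use our understanding of Gaussian curves later to great effect, so take note!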
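Our state vector $\mathbf{x}_t$ represents the mean $\boldsymbol{\mu}$ of this PDF. To derive the rest of the function, we can model our state uncertainty using a covariance matrix $\mathbf{P}_t$:

$$
\mathbf{P}_t = \begin{bmatrix} \sigma_p^2 & \sigma_{pv} \\ \sigma_{vp} & \sigma_v^2 \end{bmatrix}
$$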
There are some interesting properties here in $\mathbf{P}_t$. The diagonal elements ($\sigma_p^2$, $\sigma_v^2$) represent how widely each variable tends to spread around its own mean. We call this variance.
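One way to see what the diagonal and off-diagonal entries mean is to estimate a covariance matrix from repeated (position, velocity) samples. This is only an illustration with hypothetical samples; a Kalman filter propagates its covariance matrix analytically rather than estimating it from data like this.

```python
def covariance_matrix(samples):
    """Sample covariance of (position, velocity) pairs.

    Diagonal entries are variances; off-diagonal entries are covariances.
    Uses the unbiased (n - 1) normalization.
    """
    n = len(samples)
    mean_p = sum(p for p, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_p = sum((p - mean_p) ** 2 for p, _ in samples) / (n - 1)
    var_v = sum((v - mean_v) ** 2 for _, v in samples) / (n - 1)
    cov_pv = sum((p - mean_p) * (v - mean_v) for p, v in samples) / (n - 1)
    return [[var_p, cov_pv],
            [cov_pv, var_v]]


# Hypothetical (position, velocity) samples.
samples = [(0.0, 1.0), (2.0, 0.0), (4.0, 2.0)]
P = covariance_matrix(samples)
print(P)  # [[4.0, 1.0], [1.0, 1.0]]
```

The matrix is symmetric by construction, which is why $\sigma_{pv}$ and $\sigma_{vp}$ are the same number.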
The off-diagonal elements of $\mathbf{P}_t$ ($\sigma_{pv}$, $\sigma_{vp}$) express covariance between state elements. If $\sigma_{pv}$ is zero, for instance, then errors in position and velocity are uncorrelated: an error in velocity tells us nothing about an error in position. If it’s any other value, the two errors are correlated and tend to move together in some way. PDFs without covariance terms look like Figure 1 above, with major and minor axes aligned with our world axes. PDFs with covariance are rotated off-axis depending on how strong the covariance is.
Variance, covariance, and the related correlation of variables are valuable, as they make our PDF more information-dense.
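The effect of the off-diagonal term can be quantified with a short sketch. It computes the correlation coefficient $\rho = \sigma_{pv} / (\sigma_p \sigma_v)$ and the angle of the uncertainty ellipse's major axis, $\theta = \tfrac{1}{2}\operatorname{atan2}(2\sigma_{pv}, \sigma_p^2 - \sigma_v^2)$, a standard result for 2×2 covariance matrices. The helper names and example matrices are hypothetical.

```python
import math


def correlation(P):
    """Correlation coefficient between position and velocity errors, in [-1, 1]."""
    return P[0][1] / math.sqrt(P[0][0] * P[1][1])


def ellipse_tilt_degrees(P):
    """Angle of the uncertainty ellipse's major axis, measured from the position axis.

    Zero covariance gives an axis-aligned ellipse (0 or 90 degrees);
    nonzero covariance rotates it off-axis.
    """
    var_p, cov_pv, var_v = P[0][0], P[0][1], P[1][1]
    return math.degrees(0.5 * math.atan2(2.0 * cov_pv, var_p - var_v))


axis_aligned = [[4.0, 0.0], [0.0, 1.0]]   # no covariance
correlated = [[2.0, 1.0], [1.0, 2.0]]     # positive covariance

for name, P in [("axis-aligned", axis_aligned), ("correlated", correlated)]:
    print(f"{name}: rho = {correlation(P):.2f}, tilt = {ellipse_tilt_degrees(P):.1f} deg")
```

With zero covariance the ellipse lines up with the axes; adding covariance tilts it, here by 45 degrees because both variances are equal.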
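Propagating our state through the prediction matrix $\mathbf{F}$ also propagates its uncertainty, giving a predicted covariance of $\mathbf{P}_t = \mathbf{F}\,\mathbf{P}_{t-1}\,\mathbf{F}^T$. Notice that the control term $\mathbf{B}\,\mathbf{u}_t$ got tossed out! Control has no uncertainty that we can directly observe, so we can’t use the same math that we did on $\mathbf{P}_t$. Instead, we account for outside influences we can’t model by adding a process noise covariance $\mathbf{Q}$:

$$
\mathbf{P}_t = \mathbf{F}\,\mathbf{P}_{t-1}\,\mathbf{F}^T + \mathbf{Q}
$$

Yes, we are literally adding noise.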
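The covariance prediction $\mathbf{P}_t = \mathbf{F}\,\mathbf{P}_{t-1}\,\mathbf{F}^T + \mathbf{Q}$ can be sketched with small list-based matrix helpers. It assumes the car's prediction matrix `F = [[1, dt], [0, 1]]`; the starting covariance and the process noise `Q` are hypothetical values chosen for illustration.

```python
def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    """Swap rows and columns of a matrix stored as lists of rows."""
    return [list(row) for row in zip(*A)]


def predict_covariance(P, dt, Q):
    """Covariance prediction P_t = F P_{t-1} F^T + Q for the 1D car.

    The control term B u_t is treated as carrying no uncertainty of its own,
    so it does not appear here; unmodeled influences are lumped into Q.
    """
    F = [[1.0, dt],
         [0.0, 1.0]]
    FPFt = mat_mul(mat_mul(F, P), transpose(F))
    return [[FPFt[i][j] + Q[i][j] for j in range(2)] for i in range(2)]


P0 = [[1.0, 0.0],
      [0.0, 1.0]]      # hypothetical starting covariance (uncorrelated errors)
Q = [[0.01, 0.0],
     [0.0, 0.01]]      # hypothetical process noise
P1 = predict_covariance(P0, dt=1.0, Q=Q)
print(P1)
```

Even though position and velocity errors start out uncorrelated, the off-diagonal entries of `P1` come out nonzero: propagating through `F` couples the two, because a velocity error turns into a position error over `dt`.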
Our results are… ok. We got a good guess at our new state out of this process, sure, but we’re a lot more uncertain than we used to be!
There’s a good reason for that: everything up to this point has been a sort of “best guess”. We have our state, and we have a model of how the world works; all that we’re doing is using both to predict what might happen over time. We still need something to support these predictions outside of our model.
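To see why predictions alone aren't enough, this sketch runs the prediction step several times in a row with no sensor data at all (pure dead reckoning). It assumes the same constant-velocity car model, a zero-acceleration input, and a simple process noise of the form $\mathbf{Q} = q\,\mathbf{I}$; the initial values and `q` are hypothetical.

```python
import math


def predict(x, P, accel, dt, q):
    """One full prediction step for the 1D car, with no measurements.

    x = [position, velocity], P = 2x2 covariance, accel = f / m.
    Assumes F = [[1, dt], [0, 1]], B = [[dt**2 / 2], [dt]], Q = q * I.
    """
    p, v = x
    x_new = [p + v * dt + 0.5 * accel * dt ** 2, v + accel * dt]
    (pp, pv), (vp, vv) = P
    # F P F^T + Q, expanded by hand for the 2x2 case.
    P_new = [[pp + dt * (pv + vp) + dt ** 2 * vv + q, pv + dt * vv],
             [vp + dt * vv, vv + q]]
    return x_new, P_new


x = [0.0, 1.0]                  # hypothetical initial state
P = [[0.1, 0.0], [0.0, 0.1]]    # hypothetical initial covariance
position_std = []
for step in range(5):
    x, P = predict(x, P, accel=0.0, dt=1.0, q=0.01)
    position_std.append(math.sqrt(P[0][0]))
    print(f"step {step + 1}: position {x[0]:.2f} m, std dev {position_std[-1]:.3f} m")
```

The position estimate marches forward as expected, but its standard deviation grows on every step; with nothing but the model, nothing ever pulls it back down.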
Something like sensor measurements, for instance.
We’re getting there! So far, this post has covered:
- The motivation for using a Kalman filter in the first place
- A toy use case: our physics-friendly RC car
- The Kalman filter prediction step: our best guess at where we’ll be, given our state
We’ll keep it going in Part II (posted later this week) by bringing in our sensor measurements (finally). We will use these measurements, along with our PDFs, to uncover the true magic of Kalman filters!
Spoiler: it’s not magic. It’s just more math.
Note: this excellent post was written by Tangram Vision CEO Brandon Minor