Kalman Filter

Storing notes on what they mean, and how to use them. Going to start with the Wiki article, summarize it intuitively, and go from there.

Initial thoughts:

  • What is the connection to Bayes Theorem?
  • How can we express this more simply (with several variables) before going to Linear Algebra?
  • What is a simple, everyday use case?

From this pdf


  • Noise is Gaussian / normally distributed [which necessitates another article to truly understand the implications there. High-level, it’s symmetric, variations fall within known % of the standard deviation, etc.]
  • If not Gaussian, Kalman is still best linear estimator. Ok. How do we know when to use a linear vs. non-linear estimator?

Main advantages:

  • Can continually update estimate as new data comes in. Reminds me of a Bayesian Spam Filter – as new messages come in, and you mark them spam (or not), we are updating that tables that indicate the % spam frequency of various words. Kalman filter similar? (Except instead of spam/not spam, it’s trying to estimate the value of a parameter).

Gotcha: called a “filter” because it removes noise from the measurement. Better name would be a Kalman Estimator.

Covariance Matrix

  • Expected value of difference between x1 and its mean times x2 and its mean
  • Essentially, is there some correlation between x1 and x2… where if x1 is greater than its mean, x2 is as well?
  • Use linear algebra to store the values of x in a column vector
  • Need an actual example, want to understand this better

Once we have the covariance, we can try to extract the independent noise contribution (intuitively, separate the “core” correlations from the random ones?)

Goal: to find a vector which summarizes (describes) the past behavior the best. Assume we have a linear dynamic system with additive white noise.

From Understanding the bassis of the Kalman Filter Via a Simple and Intuitive Derivation

The Kalman filter model assumes that the state of a system at time $t$ evolved from the prioer state at time $t-1$ according to the equation:

$$x_t = F_tx_{t-1} + B_tu_t + w_t $$


  • $x_t$ state vector containing terms of interest at time $t$ (ok… that is new state)
  • $u_t$ : Control inputs
  • $B_t$ : Control input matrix that applies the effect of each control to the state vector
  • $F_t$ : State transition matrix, applies effect of state parameter at previous time $x_{t-1}$ to current state
  • $w_t$ : contains process noise for each parameter.

I’m thinking: We want to isolate the contribution from:

  • Previous states
  • Current controls
  • Random noise


  • process noise (random friction bumps on the road)
  • observation noise (dust on your radar detector)

To Read: http://bilgin.esme.org/BitsBytes/KalmanFilterforDummies.aspx

From https://www.youtube.com/watch?v=DlQ7HVu8zQE

  • We have our actual signal (internal state: $x_k$ )
  • Then we have our measured state $z_k$
  • There is noise, $B$ , which turns $z_k = B x_k$
  • There are two noise functions $w_k$ when applying the state change and $v_k$ when making the measurement. Then we have $z_k = Bx_k + v_k$ and $x_k = Ax_{k-1} + w_k$

Two steps:

  • Predict current state based on last state, taking noise into account
  • Update: takes measured values, and updates the state estimate

Hrm… have a sequence of data coming in. Like a Bayesian filter. You create a model and start adjusting it as you get another data point.

Think of some analogy: every data point you get helps to clarify your estimate. But you know there is likely to be noise, so you need to adjust for this.



Idea: make a javascript kalman filter simulator to see what it’s doing?

SIMPLE scenario: We use a running average.

BETTER scenario: We use a Kalman filter to get a prediction and account for noise… and with a changing signal.

Intuition: Kalman filter is a more general running average. Can account for a changing variable.

Javascript demo: http://leonid-shevtsov.github.io/kalman/

Javascript lib: https://github.com/itamarwe/kalman/blob/master/kalman.js

Idea: modify this. See if it can predict a better path. Contrast with a running average.

Goal: Better Smoothing. Simple moving average, polynomial approximation, etc.

To read: Kalman Filter GPS http://www.cs.unc.edu/~welch/kalman/Levy1997/index.html

Quick intuition:

I’ve done some digging, it’ll take more time to get a solid intuition, but here’s my understanding.

Let’s say we have a bunch of data coming in, and we want to make a prediction about the next point we’ll see.

A simple approach might be to take a simple moving average of the last few points, and assume the new point will be at that average.

Let’s say our first point is 1.

We can’t really make a good prediction, but let’s guess the next point is 1 as well.

The second point is 5.

Whoops, we were off. Ok, let’s guess the next point is going to be 3 (the average).

The next point is 9.

Whoops, we were off again. Let’s guess the next point is the average of the ones we’ve seen so far, or (1 + 5 + 9) / 3 = 5.

The next point is 14.

Whoops, off again.

A human being would quickly notice that “Hey, this moving average isn’t working very well. The points are increasing by 4 or 5 each time and we are really under-estimating the next point.”

The Kalman filter is like a more advanced moving average, which keeps track of how far it was off and tries to adjust its predictions as they come in. It can look back at the previous items and think “which model would have worked well as I was getting these data points?”.

It can only handle systems with certain conditions (linear, time-invariant) but makes much better predictions than a straight moving average. In this case it may say something like:

x_new = x_old + 4.5

One application for Kalman filters for navigation, GPS, etc. is that we know projectiles, cars, people are moving along fairly predictable paths. The Kalman filter can quickly get an accurate estimate for the path compared to a moving average, which is really slow to update and always one-step behind, it seems.

That’s my really high level intuition so far.

Key Intuition:

  • Kalman Filter is a fancy moving average. Instead of a static prediction (“here is the average”) it gives you an equation for the path you are going to take. (In a simple case, it can predict a static value too.)
  • The secret sauce is that it filters out the noise. Any data path you have has noise in it.

Meta-Notes About My Learning Strategy

  1. Research Wikipedia. Pick out some Key words.
  • Why is it a “filter”? Actually, it’s an estimator. That is really good at filtering out noise.
  1. Start looking on YouTube, Google, etc. for presentations
  • Shorter videos are better :).
  • Some examples: stock market price prediction. Navigation/pathfinding. Physics. Face-tracking.
  • It’s claimed to be one of the most important advances in the 20th century. Wow! Why do we never hear about it?
  1. See if you can find a very simple scenario. What does it seem like?
  • Here, it’s moving averages. There was a stack overflow question about how the Kalman filter was different. Compare and contrast to what you know!
  • Kalman filters are like a better moving averages because…
  1. Are there any demos where we can actually understand what is going on?
  • Javascript demos for tracking click position.
  • Learn to separate the goal (build a prediction system) from the implementation details (linear algebra, etc.)