Finished post:

How To Understand Mathematical Convolution

Like making engineering students squirm? Ask them to explain convolution. They’ll mumble about sliding overlaps as they themselves inch towards the door.

Convolution is usually introduced with the formal definition:

$$ (f * g )(t)\ \ \stackrel{\mathrm{def}}{=}\ \int_{-\infty}^\infty f(\tau) g(t - \tau) d\tau $$

and we’re left to unwind this baffling integral ourselves. Ugh. Here’s the plain-English analogy that helped me:

Convolution is multiplication that accumulates effects.

A Quick Analogy: Hospital Treatment Plans

Imagine you’re treating patients with a rare disease. You have:

  • A treatment plan: Everyone gets 3 units of the cure. (Plan: [3])
  • A list of patients: You see 1 person on Monday, 2 on Tuesday, 3 on Wednesday, 4 on Thursday, and 5 on Friday. (Patients: [1 2 3 4 5])

How much of the drug is used each day? Well, that’s a simple multiplication:

Plan  *  Patients       = Total Daily Usage

[3]   *  [1 2 3 4 5]    = [3 6 9 12 15]

The plan can be multiplied with the patient list, giving the usage pattern [3 6 9 12 15].

Extend To Multi-Day Treatment Plans

Unsanitized ping-pong paddles have led to a mutated disease. After furious research, you devise a new treatment plan: 3 units the first day, 2 the next, and 1 on the final day.

Plan: [3 2 1]

Assume the same patient schedule: [1 2 3 4 5]. How much of the drug do you need each day?

Oh, that must be an easy multipl… wait a minute.

  • Monday: 1 patient comes in, she gets 3 units.
  • Tuesday: The Monday gal gets 2 units, but we have two new Tuesday patients, who get 3 each. The total is 2 + (3*2) = 8 units.
  • Wednesday: The Monday gal gets 1 unit. The two Tuesday people get 2 units each. But there are 3 new Wednesday people, ah… forget it. Everyone just grab some pills.

The details are getting mixed together. Can we organize them better?

Here’s an idea: imagine flipping the list, forming a line where the first patient is at the front:

           Start of line
 5 4 3 2 1

Next, imagine we have 3 rooms where we apply the cure:

 Rooms     3 2 1              

On your first day, you walk into the first room and get 3 units of medicine. The next day, you move to the next room (2 units), and the next day, you’re in the last room (1 unit). Then you’re done.

To work out the overall usage, imagine the patients are lined up and walk forward each day:

 Rooms                  3 2 1              
 Patients       5 4 3 2 1

 Usage                  3

Neat! On Monday, only the first patient is in a room. She gets 3 units, for a total usage of 3. Makes sense, right?

On Tuesday, everyone takes a step forward:

 Rooms                  3 2 1              
 Patients         5 4 3 2 1
 Usage                  6 2      = 8

Ah! This is simpler than the text description. The first patient is at the front of the line, but in the next room (2 units). We have two patients in the first room for their first day (2 * 3 = 6 units). The total is 8 units.

Imagine the list “walking forward” each day:

 Rooms                  3 2 1              
 Patients           5 4 3 2 1

 Usage                  9 4 1    = 14

 Rooms                  3 2 1              
 Patients             5 4 3 2 1

 Usage                 12 6 2    = 20
 Rooms                  3 2 1              
 Patients               5 4 3 2 1

 Usage                 15 8 3    = 26

Neat, right? We have a strategy to predict the usage any day. Just flip the patient list and “walk it forward” to whatever day you need the usage for.

Here’s the daily usage:

Plan      *  Patient List   = Total Daily Usage

[3 2 1]   *  [1 2 3 4 5]    = [3 8 14 20 26 ...]

(Note: we’ll need drugs for Saturday and Sunday since the Friday patients have to finsih up.)

Discovering The Official Definition

Convolving two functions is written with $f * g$, with an asterisk (beware: usually, an asterisk means regular multiplication).

The convolution gives the total usage when applying a plan to a set of patients. What did we do, exactly?

  • Get $f(x)$, the plan to use. In our case, the plan was [3 2 1]

  • Get the inputs, $g(x)$. However, we want to flip them so the first input is on the right. This means we actually want $g(-x)$, which is the mirror image of $g(x)$.

  • We pick an amount of time to advance, call it t. This means we really want $g(-x + t)$.

So, we now have our setup:

  • f(x) : the plan to use
  • g(-x + t) : the inputs, in right-to-left order and slid to the correct position

Finally, we’re ready to compute how much “total medicine” is needed by multiplying each input by the part of the plan it’s on, and adding up the results.

This is a job for the integral, which is “piece by piece multiplication”.


Here’s the official definition:

[ ]

Show image, and colorize it.

PS. “Simplifying” to $$ f(x) \cdot g(t - x)$$ hides what we are doing: we flip g(x), then we slide it over!

See math as a DESCRIPTION of something you know, not a set of symbols that, you know, may or may not have some meaning behind them.

Convolution is seeing the result of a multi-step plan on your input.


  • We flip the patient list
  • We slide the plan forward the correct number of days.
  • After we’ve slid, we take the integral of whatever overlap is there.

It’s written -g but its better written

$$g(\text{flip plus slide})$$
$$g(-t + T)$$

Convolution represented with asterisk (*), which gets confusing since in most cases, people (informally) mean regular multiplication. But you can see why: convolution is a turbocharged multiplication.

Convolution steps in when multiplication can’t handle the job.

Technically, we can flip convolution around. In practice, we have the “plan” (called a kernel) and the “patients” (the data). We apply the plan to each patient and want to know the total outcome.

Properties of Convolution

  • Backwards and forwards
  • Total integral (total usage)
  • Fourier Transform

Properties Of Convolution

  • reversing the order

Imagine the patients are immoble. So, we have vans that drive

The patients are lined up:

  1 2 3

      1 2 3 4 5

Every day the vans drive forward and treat the patients they see.

  • We flip the patient list, and they walk into the rooms (earliest patient first)
  • We flip the treatment plan, and the vans drive to the patients (earliest van first)

Neat, right? So f * g = g * f.

The patients walk through the hospital, or the vans move and attend each patient.


Blurring an image
Taking an average
Taking a moving average

Total Usage

Slide It By

Here’s a few ways to imagine it

  1. Line everyone up.

The patients get lined up. You have a van and drive down the line. Each day, you move one unit down.

  1. Have patients walk through.

You have a static van. The patients walk through. Each day, they get up and move one room to the right.

The convolution is how much medicine is being used that day. It’s a combinatino of how many patients are being treated, and what stage of the plan they’re on.

Useful, right? Can also work backwards.

What is convolution? Given a sequence of incoming patients, and a known multi-day antibiotic treatment plan, convolution finds the total daily usage pattern.

How? Each day, count the patients at each stage of treatment, multiply by the dosage at that stage, and add it all up.

Why? The overall usage is eye-opening compared to just analyzing the sequence of patients and the treatment plan.

In Math-English:

  • Convolution applies a sequence of inputs to a system where the output is a function, not a single point. Because the output can linger (treatments started yesterday still impact today’s usage), we must account for changes from previous inputs.

In the math world, inputs return instantaneous outputs. When y = f(x), we put in x, get y, and that’s it. A single output was made, nothing more!

Real-world systems aren’t instantaneous. Clap your hands and the sound wave ramps up, peaks, then ramps down. The output is a smushy, smeared wave, not a perfect click.

Convolution gets the overall effect when outputs are allowed to “blur” over a duration.

Examples: what things blur?

literal blurring of images

if we apply a PERFECT spike … then we can extract the original.

Fourier / LaPlace overlap

Just intuit the reasons why.

I want you to INTUITIVELY see how convolution is EXTREMELY useful.

You have monthly customer numbers: 100, 200, 300

and you have their monthly average spend

$20, $10, $10, $10, $5

The convolution gives you the revenue graph.

How many customers at what stage of the lifecycle?

Use it everywhere.

Continuous version? Probably not. Go for the discrete.

Wolfram alpha example.

Work through properties:

  • integral totals

  • Fourier/LaPlace transform

  • reversible

  • the identities. Is there a “treatment plan” that would just count up the number of patients? Yeah! (1 0 0 0)… the dirac delta function. or the discrete delta. it’s a perfect spike. one input, one output.

  • require linear, time-invariant systems. linear: pieces just add up, no interaction. time-invariant: it doesn’t matter when a signal shows up (Monday, Friday, etc. Labor Day, Thanksgiving). Same treatment plan applies.

  • Can use it “like a logarithm”. Logs turn exponents into multiplication. Multiplication into addition.

Convolution turns Fourier Transforms into multiplications. So you can solve it, then undo.

Can see it as an ACTUAL TOOL (“applying one to another”) or just some intermediate representation which is more useful to manipulate things with. Solve, then undo.

Fourier breaks interaction into frequency components. Only inputs and outputs of the same frequency overlap and make a contribution. So convolution can be written with point wise multiplication (convolution theorem).

Fourier transform of the convolution is another integral… This double integral can be split into two point wise integrals. Math mechanics vs. intuition about interacting frequencies… Only same freq have an overlap.

Use linear algebra terms here…

for inverse convolution, need to REDUCE the area that was added [right?]

convolve arrays… have a js demo

Inverse convolution: restore original. Have the blurry, remove the effect of the camera lens. Bad focus? Inverse convolve.


(Do a transcription! Just write as I speak. See what I can write here.)

Technical Appendix

  1. Impulse response

Sounds so fancy. “A LTI system is characterized by its impulse response.” Translation: Send a SINGLE patient through to figure out what the baseline treatment plan is. Now that you have the plan, you can scale it up to any number of patients. [Imagine a rival hospital sending a patient in and taking notes about the treatment.]

Send the “impulse” which is just [1]. A single patient. If you send a single patient through, you’ll get a copy of the treatment plan back out! [What happens to them each day].

  1. Linear and Time-invariant systems

If a system is linear [double the patients, double the treatment] and time-invariant [Your treatment just depends on when YOU show up, not a global time. I.e., whether your first day is Monday or Thursday, it doesn’t matter – you get the same dose]. We only care that it’s your first day, not whether it’s Monday or Thursday.

[You can imagine a system where your treatment depends on what day you show up, which staff is working that day, etc. To make it simpler, we assume everything is identical (non-variant) no matter the date you ask for treatment.]