Posted here:
http://betterexplained.com/articles/calculus-building-intuition-for-the-derivative/
The derivative is the heart of calculus, buried inside this definition:
$$ \frac{dy}{dx}=\lim_{dx\to 0} \frac{f(x+dx)-f(x)}{dx}$$
So, what does this mean -- in English?
Imagine I listed out the daily stock market changes for the next few years (up 1%, down 3%, up 5%...). You could apply the changes one by one, plot out all future prices, and buy low / sell high. This might even entice the dart-throwing monkeys (they do well, better than most fund managers) to work for you, and you'd become a robber baron.
The derivative isn't simply the "slope of a function". Like the stock list, it gives a total understanding of a system. With this knowledge, you can plot out the past/present/future, make predictions, discover minimums/maximums, and yes, staff your simian workforce.
So what's with the gnarly equation above? Let's step back.
Calculus took thousands of years to develop, and that definition is like the "DNA" description of a cat. Let's start at the "cats are furry and cute" level and work our way up.
In a sentence: derivatives create a perfect model of change by refining an imperfect one.
We all live in a shiny continuum
Infinity is a constant source of paradoxes (aka "headaches"):
- A line is made up of points? Sure.
- So there's an infinite number of points on a line? Yep.
- How do you cross a room when there's an infinite number of points to visit? (Gee, thanks Zeno).
Hrm. My intuition is to fight infinity with infinity. Sure, there are infinitely many points between 0 and 1. But I move through two infinities of points per second (somehow!), so I cross the gap in half a second.
Distance has infinitely many points, yet motion is possible, so motion must be measured in "infinities of points per second".
Instead of using absolute counts ("How many points did you move through?"), we can compare rates ("How fast are you moving through this continuum?").
Analogy: See division as motion through the continuum
It's weird, but you can think of
$$ \frac{10}{5} = 2$$
as "I need to travel 10 'infinities' in 5 segments of time. To do this, I travel 2 'infinities' in each unit of time."
What's after zero?
Another brain-buster: what's the next number after zero? .01? .0001?
Hrm. Anything you can name, I can name smaller (take your number and halve it... nyah!).
We can't construct or point to the number after zero, but it must be there. Right? (I hope so).
Let's name the mystical gap to the next number "dx". I don't know what it is, I can't say how big it is, but it's there!
Analogy: dx is the "gap" to the next number.
Measuring change
The derivative is about predicting change. Consider the following:
Officer: Do you know how fast you were going?
Driver: I have no idea.
Officer: 95 miles per hour.
Driver: But I haven't been driving for an hour!
95 mph is an instantaneous rate of change: you don't need to go a "full hour" to get your speed!
And how do we measure speed anyway? We take a before and after (over 1 second, let's say). If you go about 140 feet in that one second, you're going 95 mph. Simple enough.
Not exactly. Imagine a video camera aimed at Clark Kent. It records 24 pictures per second (about 40ms between photos), and shows him perfectly still. In one second he doesn't move, and his speed is 0mph... right?
Wrong! In the 40ms between pictures, he changes into Superman, solves crimes, and returns to his chair. He moves too fast for your camera!
(Imagine a cop seeing your car in your driveway at 9am and 5pm, and assuming it didn't move).
Analogy: Like a camera watching Superman, the measured speed depends on the instrument!
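Here's a minimal Python sketch of the camera problem. The position function is entirely made up for illustration (an assumption, not real physics): Clark is back in his chair at every exact frame time t = k/24 and somewhere else in between. Exact fractions keep floating-point noise out of the frame times.

```python
from fractions import Fraction

FPS = 24  # frames per second, as in the Clark Kent example

def clark_position(t: Fraction) -> Fraction:
    """Hypothetical position (meters) at time t (seconds).

    Between frames he dashes out and back; at every exact frame time
    (t = k/24) he is back in his chair at position 0.
    """
    phase = (t * FPS) % 1              # how far through the current frame gap we are
    return 1000 * phase * (1 - phase)  # 0 at each frame, up to 250 m in between

def measured_speed(position, start: Fraction, interval: Fraction) -> Fraction:
    """Average speed seen by an instrument sampling at `start` and `start + interval`."""
    return (position(start + interval) - position(start)) / interval

camera = measured_speed(clark_position, Fraction(0), Fraction(1, FPS))
fast_camera = measured_speed(clark_position, Fraction(0), Fraction(1, 240))
print(camera, fast_camera)  # prints: 0 21600
```

The 24fps camera reports a speed of 0. A camera ten times faster catches him mid-dash. Same Clark, different instrument, different "speed".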
Running the Treadmill
Now we're near the chewy, slightly tangy philosophical center. The change we measure can't be trusted -- it depends on how we ran the experiment!
Imagine a shirtless Santa on a treadmill. We're going to measure his heart rate in a stress test: we wire up dozens of heavy, cold electrodes (circa 1890) and let him start jogging.
He huffs, he puffs, and his heart rate is 190 beats per minute. Tada!
Sorry. See, the very presence of heavy, cold electrodes probably increased his heart rate! We measured 190, but who knows what it'd be if the electrodes weren't there?
It's more accurate to say:
- measurement = actual amount + measurement effect
Ah. If we do lots of studies, we might discover "Oh, we add an extra 10bpm for every kg of electrode". With this, we can subtract the impact and get a "perfect" measurement.
Analogy: Remove the effect of your electrodes when making a measurement!
Understanding the derivative
Phew. Armed with these analogies, we can understand how the derivative predicts changes:
$$ \frac{dy}{dx}=\lim_{dx\to 0} \frac{f(x+dx)-f(x)}{dx}$$
- Start with some system to study: f(x)
- Change it the smallest amount possible (dx) to get a before and after: f(x + dx) - f(x)
- I don't know how small "dx" is, but we can get the rate of moving through the continuum: [f(x + dx) - f(x)] / dx
- dx, however small, adds measurement error. Predict what happens if it's not there (take the limit as dx goes to zero).
- Get your final rate of change at every point, dy/dx
This last step is the key. We have two approaches:
- Limits: what happens when dx shrinks to nothingness, beyond our error margin?
- Infinitesimals: What if dx is a tiny number, undetectable to us?
Both are ways to say "What happens when dx disappears (to us)?". Early critics of calculus argued "Calculus is illogical: is dx zero or not? How can it be non-zero when you divide by it, and then suddenly get thrown away?"
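The five steps listed above can be sketched numerically. This is a minimal Python sketch; the helper names and the test function x^3 are my own choices for illustration. It only *watches* the quotient as dx shrinks, which is the intuition behind the limit, not a rigorous limit.

```python
def difference_quotient(f, x: float, dx: float) -> float:
    """Steps 1-3: nudge x by dx, take before/after, divide by dx to get a rate."""
    return (f(x + dx) - f(x)) / dx

def estimate_derivative(f, x: float, steps: int = 6) -> list[tuple[float, float]]:
    """Step 4, informally: record the measured rate as dx shrinks toward 0."""
    results = []
    dx = 1.0
    for _ in range(steps):
        results.append((dx, difference_quotient(f, x, dx)))
        dx /= 10
    return results

# f(x) = x^3 at x = 2: the measured rates settle toward 12 as dx shrinks.
for dx, rate in estimate_derivative(lambda x: x**3, 2.0):
    print(f"dx = {dx:g}: rate = {rate:.6f}")
```

With dx = 1 the "measured" rate is 19; by dx = 0.00001 it's within a hair of 12. The leftover is the electrode effect of dx.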
Example: f(x) = x^2
Nothing like an example to shake loose the cobwebs. How does f(x) = x^2 change?
Well, we can plug it into the derivative:
$$ \frac{dy}{dx}=\lim_{dx\to 0} \frac{f(x+dx)-f(x)}{dx}$$
$$ \frac{dy}{dx}=\lim_{dx\to 0} \frac{(x+dx)^2-x^2}{dx}$$
$$ \frac{dy}{dx}=\lim_{dx\to 0} \frac{x^2 + 2x\,dx + dx^2 - x^2}{dx}$$
$$ \frac{dy}{dx}=\lim_{dx\to 0} \frac{2x\,dx + dx^2}{dx}$$
$$ \frac{dy}{dx}=\lim_{dx\to 0} (2x + dx)$$
$$ \frac{dy}{dx}= 2x$$
We're describing how x^2 changes over time. Note the difference in the last 2 equations: one has the error built in (see the dx!), the last is the "real" change where our measurements have no effect on the outcome.
Time for some numbers: here are the values for x^2, starting at x = 0, with intervals of dx = 1:
- 0, 1, 4, 9, 16, 25, 36, 49, 64...
The absolute change between each result is:
- 1, 3, 5, 7, 9, 11, 13, 15...
(In this case, the absolute change is the "speed" since the interval was 1.)
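A quick Python check of that list of changes, starting from x = 0. Each change splits into the "actual" rate 2x plus an "error" of dx = 1:

```python
squares = [x**2 for x in range(9)]                       # 0, 1, 4, ..., 64 (dx = 1)
changes = [b - a for a, b in zip(squares, squares[1:])]  # neighbor-to-neighbor change
print(changes)  # [1, 3, 5, 7, 9, 11, 13, 15]

# Change from x to x + 1 = actual rate (2x) + error (dx = 1)
for x, change in enumerate(changes):
    assert change == 2 * x + 1
```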
Look at the difference from x=2 to x=3 (3^2 - 2^2 = 5). What is 5 made of?
- Measured rate = Actual Rate + Error
- 5 = 2x + dx
- 5 = 2(2) + 1
Sure, we measured a rate of "5 units moved per second" because we went from 4 to 9. But our instrument tricked us: 4 units of speed was the real change, and 1 unit was due to shoddy instruments (1.0 is a large interval, right?).
If we limit ourselves to the integers, then 5 is the perfect speed measurement from 4 to 9. There's no "imperfection" assuming dx = 1 because yes, that is the smallest interval between two points.
But in the real world, we could move a smaller amount! What if our dx was .1? Would we get a better measurement? Consider the change from x=2 to x=2.1:
$$ (2.1)^2 - 2^2 = 4.41 - 4 = 0.41 $$
But remember, 0.41 is the change over an interval of only 0.1. Our speed is really 0.41 / .1 = 4.1. And again we have:
- Measured rate = Actual Rate + Error
- 4.1 = 2x + dx
- 4.1 = 2(2) + 0.1
This time, the measured vs. actual rate is close (4.1 to 4) compared to 5 to 4 when using the giant steps of dx = 1.
Using this pattern, we see that throwing out the electrodes (letting dx = 0) lets us unearth the true rate of 2x.
In plain English: We analyzed f(x) = x^2, discovered the "imperfect" model of 2x + dx, and the perfect model of 2x.
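To see the "imperfect model" 2x + dx hold exactly for several step sizes, here's a small Python sketch using exact fractions (the function name `measured_rate` is my own, for illustration):

```python
from fractions import Fraction

def measured_rate(x: Fraction, dx: Fraction) -> Fraction:
    """Rate of change of f(x) = x^2 over an interval dx, in exact arithmetic."""
    return ((x + dx) ** 2 - x ** 2) / dx

x = Fraction(2)
for dx in (Fraction(1), Fraction(1, 10), Fraction(1, 1000)):
    rate = measured_rate(x, dx)
    error = rate - 2 * x  # what's left after removing the "actual" rate 2x
    print(f"dx = {dx}: measured {rate} = {2 * x} + {error}")
```

Each line prints an error equal to dx itself (1, 1/10, 1/1000). Throwing out dx leaves exactly 2x = 4.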
Gotcha: The Many meanings of "Derivative"
The term "derivative" has a few meanings:
"The derivative of x^2 is 2x" means "At every point, you are changing by 2x (twice your current position)"
"The derivative is 44" means "At the point we're considering, we're changing at a rate of 44." For example, f(x) = x^2 at x=22 has a rate of change of 2 × 22 = 44.
"The derivative is dx" means "The tiny, hypothetical change we're considering is dx". Technically, dx is called the "differential" but I can tell you: the terms get mixed up, and people will say "derivative of x" and mean dx.
Gotcha: Integration doesn't exist
This blew my mind: the derivative has a specific, mechanical definition. The integral, the opposite of the derivative, has no comparable recipe for finding it.
Here's an analogy:
- I can break a plate (the function) and see a pile of shards (the derivative)
- If you show me a pile of shards, I can guess what plate it may have come from (by secretly breaking a similar plate in the back room, and seeing if the shard pile matches)
See, there's no simple algorithm to find the integral, or anti-derivative (anti-derivative is a better term, I think). We essentially have a lookup table and say "Well, 2x is the derivative of x^2. Oh, your derivative is 10x? Well... scribble scribble... it looks like that came from 5x^2".
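For polynomials, the lookup table is easy: just run the power rule backwards. Here's a minimal Python sketch of that one easy case. The coefficient-list representation and the function names are my own assumptions, and this trick does not extend to arbitrary functions.

```python
from fractions import Fraction

def differentiate(coeffs: list[Fraction]) -> list[Fraction]:
    """Break the plate: coeffs[k] is the coefficient of x**k; d/dx x^k = k x^(k-1)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def antiderivative(coeffs: list[Fraction]) -> list[Fraction]:
    """Guess the plate: run the power rule backwards, term by term.

    The constant term can't be recovered (any C works), so it's set to 0.
    """
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]

shards = [Fraction(0), Fraction(10)]  # 10x
plate = antiderivative(shards)        # [0, 0, 5], i.e. 5x^2
assert differentiate(plate) == shards # breaking it again gives back 10x
```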
Finding derivatives is mechanical; finding integrals is art. Sometimes we throw up our hands: here's the list of changes (the stock market prices), apply them piece by piece, and give me an estimate of the original pattern (here's the pieces, use some computer modeling to recreate the plate).
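The "apply the changes piece by piece" fallback can be sketched in a few lines of Python. This is a minimal sketch (a left Riemann sum, i.e. Euler's method, which I've chosen for simplicity); `rebuild` and its parameters are hypothetical names. Note that you must supply the starting value yourself, since the list of changes doesn't contain it.

```python
def rebuild(speed, start_value: float, x_end: float, steps: int) -> float:
    """Recreate the value at x_end by applying small changes from x = 0."""
    dx = x_end / steps
    value = start_value
    for i in range(steps):
        value += speed(i * dx) * dx  # use the speed at the start of each small step
    return value

# Speed 2x, starting from 0: the rebuilt value at x = 3 should approach 3^2 = 9.
for steps in (10, 100, 10_000):
    print(steps, rebuild(lambda x: 2 * x, 0.0, 3.0, steps))
```

The estimates come out to roughly 8.1, 8.91, and 8.9991: the finer the pieces, the closer the recreated plate.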
Onward
Math is a language, and I'm still learning to "read" calculus -- to see the message, not the grammar, behind the definitions.
My biggest aha! was seeing the transient nature of dx: it takes a measurement, and is removed to make a perfect model.
Limits and infinitesimals are formal ways to "make dx go away". But don't get caught up in them -- they are safety mechanisms to protect us against the criticisms of bishops (like George Berkeley). Newton seemed to do ok without them.
Why do we need limits? To put the derivative on rigorous footing. Mindlessly introducing limits before the derivative is like giving spelling tests in a language you don't speak.
When you get the key idea, questions suddenly become interesting:
- What are the rules for making "dx go away?" (How do infinitesimals and limits work?)
- We can describe numbers without being able to write them down! ("The next number after 0"). Whoa! (Beginnings of analysis)
When the analogies are in your head, the questions become interesting. Happy math.
Extra Notes
Now, this isn't perfect: sometimes the effect of "putting dx in and taking it out" gives a different answer from dx never being there in the first place.
Functions where this happens are called "non-differentiable" at that point -- they're jumpy (discontinuous) or have sharp corners, and don't behave nicely! Thankfully, most "real world" equations (i.e., what you see in science) are nice and smooth, and our models of what happens when dx disappears are truly accurate.
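A standard misbehaving example is the absolute value |x| at x = 0: it has no jump, but it has a sharp corner. This small Python sketch (the function name is my own) compares the rate measured from the right with the rate measured from the left. For x^2 both shrink toward the same answer, 0; for |x| they stay stuck at +1 and -1 no matter how small dx gets, so there is no single "speed when dx = 0".

```python
def one_sided_rates(f, x: float, dx: float) -> tuple[float, float]:
    """Difference quotients approaching x from the left and from the right."""
    left = (f(x) - f(x - dx)) / dx
    right = (f(x + dx) - f(x)) / dx
    return left, right

for dx in (0.1, 0.001, 1e-6):
    print(dx, one_sided_rates(abs, 0.0, dx), one_sided_rates(lambda x: x * x, 0.0, dx))
```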
Pretend you have a video camera aimed at Clark Kent, recording 24 pictures per second
- We need to make a perfect model of how something changes
- We make a guess with flawed instruments
- We modify the guess to see what'd happen if our instruments were perfect
However, it's taken humanity thousands of years to create the math:
The derivative lets us predict behavior.
$$ d/dx = ... $$
roll our eyes, and begin learning to use it.
The
"The essence of calculus can be understood with a few metaphors". In a phrase:
1) Calculus is based on a funky definition: here it is [derivative]
- Show analogies for each part of the definition
2) We can decipher each part with a few analogies
- Measure change through the continuum as a rate
- Need a before-and-after
- Want the change to be small
- Use the model to predict a perfect model
3) Tada: Put the analogies together: "Use a flawed model to predict the behavior of a perfect one".
- Generalization!
- Taking surveys (there is a "survey effect")
- Diets, people's behavior, etc. Your measurements affect the outcome!
- Figure out how large the "survey effect" is and correct for it.
- Publication bias, etc. (we get filtered reality).
4) Notes / caveats
- No analytic way to find the integral [really just trial and error]
- Difference from limit / infinitesimal -- justifying the step
==============
I've struggled to be fluent with calculus. Sure, anyone can memorize stock phrases (¿Dónde está el baño? "Where's the bathroom?") and translate them in their head, but how do we become fluent?
Immersion, analogy, and relating to what we already know. Calculus is the culmination of thousands of years of math and philosophy. Let's not pretend it's this simple 1-semester class we can memorize and move on from.
No, it's a way of thinking. Let's build it with analogies.
The essence:
We want to measure a changing thing. But the only way to measure it is to change it.
Did our measurement affect the result? Probably. So, let's make an estimate of what the measurement would be... if we had never measured (ooh, meta).
The continuum is infinite. Yes. But our rate of traveling through the continuum is infinite as well. These things can cancel.
Division gives us a rate of traveling through the continuum. We just don't call it that.
A number line has an awful lot of points
You've probably heard of Zeno's paradoxes:
- How can you walk across a room? First, you have to get halfway there (call it B). But to get to B, you have to get halfway there, to C. But to get to C, you have to get to D... and since you can't move through an infinite number of points, you can't arrive anywhere at all.
Most curious. And yet we move.
My resolution: Fight infinity with infinity.
Motion is also through a continuum. If going 10 feet means traveling through "ten infinities of points", well, that's ok: I'm moving "1 infinity of points per second" and will get there in 10 seconds.
What's after zero?
Take a ruler and shave off the bottom number (0). Drop it on the ground.
What position hits first? .1? .01? .0000000000000001? What's the smallest number?
I don't know. I can't constructively say how we move from 0 to the next smallest number. But "motion" somehow figures this out, and we can move smoothly from 0 to 10 in ten seconds.
Math phrase: Let's call the distance to the next point "dx". No, we don't know exactly how small it is. But it's there, and moving through the continuum makes it "ok".
Measure change with change
Imagine two arrows. I shoot one from a bow, and drop the other towards the ground.
And... time freeze! Like the Matrix (or Saved By The Bell), I halt all time, the arrows are stuck in place, and I walk around.
What's the difference between the arrows? How can I tell, when time is frozen, which is going to fly straight ahead and which will drop? Is there some property in the atoms that says "Hey, when this bozo remembers to un-time-freeze, keep going forward at 200 mph".
I don't think so. The "solution" to figuring out which arrow is which:
- Unfreeze time by a tiny, teensy, amount and see which direction each arrow nudges forward. Use that to figure out their speed (if the arrow nudges forward 1 millimeter in one trillionth of a second, I can figure out its speed).
Math phrase: Let's take the difference between two points in the continuum, x and the next point x + dx: f(x + dx) - f(x)
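Here's the arrow experiment as a tiny Python sketch. The two paths are my own made-up assumptions (a shot arrow at a constant ~90 m/s, roughly 200 mph, ignoring drag; a dropped arrow in free fall with g = 9.8 m/s²). Freezing time at t = 1 gives us only positions; unfreezing by a microsecond and dividing by that microsecond reveals each arrow's speed.

```python
def shot_arrow(t: float) -> float:
    """Hypothetical horizontal position (m) of the arrow fired from the bow."""
    return 90.0 * t  # assumed constant ~90 m/s, ignoring drag

def dropped_arrow(t: float) -> float:
    """Hypothetical vertical position (m) of the dropped arrow, released from rest at t = 0."""
    return -0.5 * 9.8 * t * t  # free fall, ignoring air resistance

def nudge_speed(path, frozen_at: float, nudge: float = 1e-6) -> float:
    """Unfreeze time by a tiny nudge and measure the rate: [f(t + dt) - f(t)] / dt."""
    return (path(frozen_at + nudge) - path(frozen_at)) / nudge

for name, path in (("shot", shot_arrow), ("dropped", dropped_arrow)):
    print(name, round(nudge_speed(path, 1.0), 3))
```

The shot arrow comes out near 90 (forward) and the dropped one near -9.8 (downward), even though a frozen snapshot alone couldn't tell them apart.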
What's up ahead? Division vs. Subtraction
If I move 10 feet in 2 seconds, it means I moved
"10 infinities of feet in 2 infinities of seconds"
Sort of weird, right? But it gives us a rate:
10 feet / 2 seconds = 5 "infinities per second"
Even for generic "units"
10 units / 2 units = 5 "infinities of units per unit"
I.e., for every unit I go, I'm moving through 5 infinities of units.
Arithmetic takes on different meanings depending on what we're counting with. With integers, sure, multiplication is repeated addition. But it can be scaling (reals) or rotation (complex numbers) as we get more advanced.
With integers, division is "splitting into groups". Fine. But with real numbers and decimals, division is more "What is your speed moving through the continuum?"
10 / 5 = 2 means "If you divide 10 into 5 equal units, through each unit your speed is 2".
Now, we could give the distance from each unit to the next:
10 / 5 = 2 means "Start at the origin and move 2 ahead to get to each unit". In fact, subtraction is the raw distance, and "rate" is the distance / time.
The problem, as we saw before, is we have trouble with raw distances when dealing with continuums (what's the raw distance from 0 to the smallest positive number?). But we can deal with rates (the number line advances from 0 to 1 at a rate of "1". I don't know how, but I do know the rate).
It's a bit like saying "Oh, your next exit is about 5 minutes up the road." You didn't give a mile distance, you assumed a rate and gave a time to follow that rate.
It's definitely tricky. These are deep philosophical paradoxes. I'm trying to wrap them into the language of calculus.
Measurements are there and not there
Suppose you're measuring someone's heart rate in a stress test: hop on the treadmill and wire up the electrodes.
Now, imagine the electrodes weigh 25 lbs and randomly electrocute the patient.
What happens to the measurement? You might get a heart rate of 190 when their real heart rate is 140. The presence of the electrodes is tiring and scaring the patient, raising their heart rate! What you measure is not what is real.
The formula might be like this:
Measured heart rate = Real heart rate + weight of electrodes * 2
so
190 = 140 + 25*2
Clearly, you want the electrodes to be small and unobtrusive -- but there's always some effect you need to account for.
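Once you know the formula for the instrument's effect, removing it is just arithmetic. A minimal Python sketch, assuming the made-up "2 bpm per pound of electrode" relationship from the formula above (the names are hypothetical):

```python
ELECTRODE_EFFECT_BPM_PER_LB = 2  # assumed: each pound of electrode adds 2 bpm

def corrected_heart_rate(measured_bpm: float, electrode_lbs: float) -> float:
    """Subtract the instrument's own effect: real = measured - weight * 2."""
    return measured_bpm - electrode_lbs * ELECTRODE_EFFECT_BPM_PER_LB

print(corrected_heart_rate(190, 25))  # 140
```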
Ah. There's a paradox: you need the electrodes to make the measurement but afterwards adjust the measurement as if the electrodes weren't there.
This is the "is dx zero or non-zero?" argument made against Calculus (the ghosts of departed quantities). My answer:
We use dx to create a model. Using the model, we predict what would happen if dx wasn't there! This is our estimate for the "true" speed (derivative).
Pretty wild, eh? By the way, this is a general concept:
- When doing studies, people are on their best behavior. Do you account for the fact that you're doing a study and adjust the results afterwards? Everyone sticks to their diet more when they're being watched!
In math: We try to measure the speed, over interval "dx":
We measured a rate of: [f(x + dx) - f(x)] / dx
and then we try to get the "real" speed: what would the speed be if dx wasn't there?
In math terms, we can say 1) take the limit as dx -> 0 or 2) let dx be an infinitely small number (smaller than we could ever measure).
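The second option, an infinitely small dx, has a neat computational cousin: dual numbers, where a formal "epsilon" obeys epsilon × epsilon = 0. This is a swapped-in technique (it's the idea behind forward-mode automatic differentiation), not the hyperreal infinitesimals of nonstandard analysis, and the `Dual` class below is my own minimal sketch. Feed in x + epsilon, and the dx² term vanishes on its own, so the epsilon coefficient is exactly the derivative: for x^2 you get (x + ε)² = x² + 2xε.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """a + b*eps, where eps is a formal 'infinitesimal' with eps * eps == 0."""
    real: float
    eps: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.real + other.real, self.eps + other.eps)

    __radd__ = __add__  # addition is commutative

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps + bd eps^2, and eps^2 = 0
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)

    __rmul__ = __mul__  # multiplication is commutative

def derivative_at(f, x: float) -> float:
    """Evaluate f at x + 1*eps; the eps coefficient is the derivative."""
    return f(Dual(x, 1.0)).eps

print(derivative_at(lambda x: x * x, 2.0))              # 4.0
print(derivative_at(lambda x: 3 * x * x * x + x, 1.0))  # 10.0
```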
Here's the rub: sometimes this limit exists, sometimes it doesn't. When the limit does exist, and our prediction works out, we call the function "differentiable". But some functions are finicky (thankfully not many in the real world) and the "predicted speed when dx=0" is not "the actual speed at that point".
We have to make a measurement to see some change (remember the arrow paradox). And most of the time, we can account for the size of our change and remove it. But some finicky functions aren't well behaved, and we can't predict their speed when our dx is 0 (by a limit or by infinitesimal).
These are the subtleties of learning calculus!
Now, we can learn to read the definition of the derivative, of the integral, and how they're related:
- Derivative: Let's predict the speed of moving through the continuum at every point (distance / time = speed)
- Integral: Let's move through the continuum at that speed (multiply speed * time = distance)
They are inverses, which is the fundamental theorem of calculus. But... the derivative gives the "change in distance", not the absolute distance, so any starting value is lost (changing from 10 to 20 looks the same as 20 to 30; both are +10). So you account for that with a constant, C, which the initial conditions pin down.
Gotchas
There are lots of subtleties when someone says "derivative"
- They can mean "dx", the modeled change to the next point
- They can mean dy/dx, the speed you are moving through the continuum (a general formula for your speed at any point, like d/dx (x^2) = 2x)
- They can mean a specific value (d/dx (x^2) at x = 5 is 10)
The idea of the derivative: If you follow the path x^2 [at every time x, you are at position x^2], what is your speed? (2x)
The idea of the integral: If you keep your speed at 2x [at every instant, set your speed to 2x], what path will you take? (x^2)
Derivative: Given distances, get speed.
Integral: Given speed, get distance
Notice how they break down for simple cases: Derivative of 2x is 2 -- at every instant, your speed is 2. (The pattern 0, 2, 4, 6, 8... has a derivative of 2... each neighbor is 2 away).
Integral of 2 is 2x. If your speed is 2, then at time x you've gone 2x.
Key insight:
- dx, dy, dz are miniature scaffolding we use to make a model, get an estimate, and then remove (with limits / infinitesimals)
- Differentiable functions are ones where this scaffolding trick "works" (the estimate for when dx = 0 is actually what happens when dx = 0)
The goal is to get an intuition for what the derivative and integral are doing.
The derivative is breaking plates
The derivative "breaks" a function into its speed:
2x becomes 2.
The integral tries to recreate the function given its speed. And it's not easy! The operation is implied, i.e. "what function, when broken, looks like this?"
If I ask "What is the integral of 2" you say "What function, when breaking down its speed, gives 2? Oh, it's 2x. So the integral is 2x".
See how you worked backwards? It's like saying "What plate, when broken, gives this pattern? (And you show some shards on the ground)."
You start breaking plates and hoping one of them gives a pattern like the one on the ground. "Oh, it turns out that this green plate here, when broken, gives the pattern you're seeing."
It's an art, especially when the shards get complicated! It's trial and error to some extent, with heuristics (and computers) to help along the way.
Sometimes it's not possible to get an "analytic" function which behaves that way, so you have to just follow the speed and create your own function ("following the speed you gave us produces this path... so we'll call that the original").
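The "break plates in the back room until the shards match" strategy can be sketched in Python. Everything here is an illustrative assumption: the candidate list is made up, "breaking" uses the forward difference quotient with a tiny dx, and matching only checks a few sample points, so it's a heuristic, not a proof.

```python
import math

def break_plate(f, x: float, dx: float = 1e-6) -> float:
    """Numerically 'break' f at x: the difference quotient [f(x + dx) - f(x)] / dx."""
    return (f(x + dx) - f(x)) / dx

def find_plate(shards, candidates, xs=(0.5, 1.0, 1.5, 2.0), tol=1e-4):
    """Break each candidate plate; keep those whose shards match at every sample point."""
    return [name for name, f in candidates.items()
            if all(abs(break_plate(f, x) - shards(x)) < tol for x in xs)]

candidates = {  # hypothetical "plates in the back room"
    "x^2": lambda x: x * x,
    "2x": lambda x: 2 * x,
    "sin(x)": math.sin,
    "x^2 + 7": lambda x: x * x + 7,
}
print(find_plate(lambda x: 2 * x, candidates))  # ['x^2', 'x^2 + 7']
```

Both x^2 and x^2 + 7 leave identical shards, which is the lost starting value (the C) showing up again.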
Summary
Build a model to see how fast we traverse the continuum [whoa, pretty out there].
Notes
On Zeno's paradoxes:
"...and surely an infinite number of tasks cannot be completed in a finite period of time?"
No. We are moving through the continuum, always. We are always doing an infinite number of tasks. 4 infinities vs. 2 infinities.