How To Understand Derivatives: The Product, Power & Chain Rules

The jumble of rules for taking derivatives never truly clicked for me. The addition rule, product rule, quotient rule -- how do they fit together? What are we even trying to do?

This is a companion discussion topic for the original entry at

Awesome article. The description of the product rule really changed how I think about them.

Out of curiosity, how do you think your idea of the power rule extends to negative, fractional and irrational powers? It’s a bit harder to think about since you can’t just split them into linear parts.

Hi Gourav, thanks for the note. Great question about the negative, fractional and irrational powers. To follow the analogy, we could use the chain rule; suppose we have f(x) = x^-3. See x^-3 as shorthand for 1/x^3. We can do:

d/dx x^-3 = d/dx 1/x^3 = d/dx 1/u = -1/u^2 * du/dx   (with u = x^3)

du/dx can be understood intuitively (3x^2), and we divide it by (x^3)^2 = x^6, giving -3x^2 / x^6 = -3x^-4. We can see the powers fight it out: writing n for the exponent, they combine as (n-1) - 2n = -n - 1 [the (n-1) power is from du/dx, and the -2n is from 1/u^2; with n=3, we get -3 - 1 = -4 as the power]. Notice how we still brought down the “3” (which was in du/dx). Hope this part made sense.
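
As a quick sanity check on the x^-3 steps above, here is a minimal Python sketch (the helper names are made up for illustration) that compares the chain-rule result with a numerical slope estimate and with the plain power-rule answer -3x^-4:

```python
def numeric_derivative(f, x, dx=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + dx) - f(x - dx)) / (2 * dx)


def chain_rule_derivative(x):
    # Treat x^-3 as 1/u with u = x^3, so du/dx = 3x^2 and d(1/u)/du = -1/u^2
    u = x ** 3
    du_dx = 3 * x ** 2
    return (-1 / u ** 2) * du_dx


def power_rule_derivative(x):
    # Bring down the -3 and reduce the power by one
    return -3 * x ** -4


x = 2.0
print(numeric_derivative(lambda t: t ** -3, x))
print(chain_rule_derivative(x))
print(power_rule_derivative(x))
```

At x = 2 all three values should land close to -0.1875, which is -3/16.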

Once we get to fractional and irrational powers, it’s probably easiest to rewrite things in terms of e: x^3.4 = e^[ln(x)*3.4]. From here, using the chain rule, product rule and exponent rule (to be explained next time), we can get the result. Essentially, even a complex idea like a fractional exponent can be further broken down. It’s something I’d like to write more about – it’s helping to really test my intuition :).
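
For the x^3.4 case, the rewrite in terms of e can be carried through as a short worked derivation. This is only a sketch: it assumes x > 0 and leans on two standard results not derived in this thread, namely d/dx e^u = e^u * du/dx and d/dx ln(x) = 1/x.

```latex
\begin{aligned}
x^{3.4} &= e^{3.4 \ln x} \\
\frac{d}{dx}\, e^{3.4 \ln x} &= e^{3.4 \ln x} \cdot \frac{d}{dx}\left(3.4 \ln x\right) \\
&= x^{3.4} \cdot \frac{3.4}{x} \\
&= 3.4\, x^{2.4}
\end{aligned}
```

The 3.4 still comes down in front and the power drops by one, matching the pattern for whole-number powers.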

Thanks for your extraordinarily simple explanations of calc! I’m currently a sophomore in high school, and I could have just waited until next year to take the class, but I’ve wanted to learn for too long already! It’s amazing how simple and easy the math of change is! Now I get to make the semi-intelligent juniors feel dumb for someone a grade below them knowing more about the subject than they do…

Great as usual. In the spirit of Deming I offer this. The f’(df/dx) in the first two lines will confuse some…perhaps f’ (aka df/dx).

Keep figuring it all out and sharing with the rest of us.

Thanks Mark – great suggestion, just updated :).

@John: Thanks for the comment. Intuitively, the quotient rule can be seen as a variation of the product rule since division is a variation of multiplication (in my head, “multiplying by a quantity that is getting smaller”). So, the quotient rule should look a lot like the product rule (two “slices” to take into account), but one of the slices is a shrinking one. I’ll be posting a follow-up soon.

And great gut-check by the way. If a concept isn’t clicking deep down it means there’s more intuition to build (and, probably, the explanation can use some refinement :) ).
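
One way to make the "quotient rule as a variation of the product rule" idea above concrete is to write f/g as f times 1/g and apply the product rule, using d/dx 1/u = -1/u^2 * du/dx for the shrinking slice. This is a sketch of that line of reasoning, not the promised follow-up post:

```latex
\begin{aligned}
\frac{d}{dx}\left[ f \cdot \frac{1}{g} \right]
  &= \frac{df}{dx} \cdot \frac{1}{g} + f \cdot \frac{d}{dx}\left[\frac{1}{g}\right] \\
  &= \frac{df}{dx} \cdot \frac{1}{g} - f \cdot \frac{1}{g^2} \cdot \frac{dg}{dx} \\
  &= \frac{g \, \frac{df}{dx} - f \, \frac{dg}{dx}}{g^2}
\end{aligned}
```

The minus sign on the second slice is where the "shrinking" shows up: as g grows, 1/g gets smaller.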

@Jisoo: Great, I like the simple diagrams!

Great explanation. Especially about the chain rule. Thanks a lot.

@Hitoshi: Thanks, glad it helped!

@Phoenix: Thanks for the comment. Yep, that’s the essence of it – to get more particular, turn the 1 into “dx” (the amount of change, so it’s (x+dx)^2), then do the binomial theorem and throw away the dx at the end (i.e., assume your change was “perfect”).
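
The "turn the 1 into dx" step can be tried numerically. In this minimal sketch (the function name `average_slope` is made up for illustration), the change in x^2 per unit of input works out to 2x + dx, so the leftover dx fades as the step shrinks:

```python
def average_slope(x, dx):
    """Change in x^2 per unit of input change, for a step of size dx."""
    return ((x + dx) ** 2 - x ** 2) / dx


x = 3.0
for dx in (1.0, 0.1, 0.001):
    # Expanding (x + dx)^2 = x^2 + 2*x*dx + dx^2 gives a slope of 2x + dx
    print(dx, average_slope(x, dx), 2 * x + dx)
```

With dx = 1 this is the 2x + 1 case; as dx gets smaller, the result approaches 2x.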

What’s the payoff in learning all of this… The bank account metaphor was insightful, and population models might serve as a good example, but also remember that learning is different for each individual, depending on what resides in their subconscious…

First of all, great initiative and material. Loved the way you analyse things. I read this page a couple of months ago.
Recently I also read about binomial series and somehow I was able to narrate how the power rule was actually derived. So here it goes.

Let’s take the simplest function y = x^2. Now what do we mean by derivative? It is simply the change in the output when we tweak the input a little.

Now let’s take two numbers, x and x+1. Now I want to find out how much y changes when we change x.
Change = (x+1)^2 - x^2
Conventional calculus tells us that it is 2x.
But the actual value can be obtained by using binomial theorem.
We all know the (a+b)^2 formula: (a+b)^2 = a^2 + b^2 + 2ab
Now (x+1)^2 = x^2 + 1 + 2x
Change = x^2 + 1 + 2x - x^2 = 2x + 1
Haha, we have arrived at the answer. The calculus value and the actual value differ by 1. To remove that, we apply the rule that the change in input is very small compared to the input value: x >> 1.
Applying the above, we can approximate 2x + 1 as 2x.

In the same way, I applied this to x^3, and the difference is 1 + 3x(x+1), which can be approximated as 1 + 3x(x) = 1 + 3x^2, which reduces to 3x^2 since x>>1.

In general, for (x+1)^n, we have n+1 terms in the series. Of those, we omit all powers of x up to n-2. We take only the x^n and x^(n-1) terms. The coefficient of x^(n-1) is n, and hence the power rule is given as

(d/dx) of x^n = n*(x^(n-1)).
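
The general argument above can be checked with a small Python sketch (the function names are hypothetical). It sums the binomial terms left over after x^n cancels, then compares that exact unit-step change with n*x^(n-1). The gap shrinks relative to the total as x grows, which is the x >> 1 assumption at work:

```python
from math import comb


def unit_step_change(x, n):
    """Exact value of (x+1)^n - x^n, built from the binomial terms below x^n."""
    return sum(comb(n, k) * x ** k for k in range(n))


def power_rule(x, n):
    """The x^(n-1) term alone: coefficient n times x^(n-1)."""
    return n * x ** (n - 1)


n = 3
for x in (10, 1000, 100000):
    exact = unit_step_change(x, n)
    approx = power_rule(x, n)
    # Relative size of the dropped lower-order terms
    print(x, exact, approx, (exact - approx) / exact)
```

For a step smaller than 1, the same lower-order terms pick up extra factors of the step size, so they vanish even when x is not large.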

Thanks to the admin for invoking an interest in me to solve this. Hope it helps.

Thanks Denis, awesome to hear. I think any subject can be naturally entertaining, as long as we’re focused on really building our intuition for it (otherwise, you’re right, it feels like a lot of effort).

I found the following website useful for understanding the product rule using what I already know.

Awesome stuff as usual! I love this website it is more entertainment than education for me. You get to learn how interesting things work without the effort.

Nice post Kalid. I’ve spent the last couple of hours trying to develop this ‘machine-like’ intuition. Any chance that you could also post some examples on the quotient rule? I’ve been trying to work it out on my own but haven’t managed to get there. Honestly this is slightly worrying. I feel that if I truly understood what you are saying then the quotient rule should be no big deal. Thanks.

@Bill: It’s a big, bright world out there!

@N: Wow, thank you for the heartfelt comment! I appreciate the encouragement, I plan to keep cranking :).

You’ve come a long way since you left us, haven’t you, Kalid?

If our input lever is at x = 10 and we wiggle it slightly (moving it by dx=0.1 to 10.1), the output should change by dy. How much, exactly?

We know f’(x) = dy/dx = 2 * x
At x = 10 the “output wiggle per input wiggle” is = 2 * 10 = 20. The output moves 20 units for every unit of input movement.
If dx = 0.1, then dy = 20 * dx = 20 * .1 = 2

To be clear, let me explain what I’m confused about. “The output moves 20 units for every unit of input movement.” What are we calling a unit? An integer? A dozen? Twenty? (.1? 1? 20?)

The other thing that confuses me is that you go from dy/dx = 5 to a whole different formula/equation. Consistency would help those not as literate in mathematics (like myself) quite a bit.
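
The x = 10 example quoted above can be run as a small Python sketch (assuming f(x) = x^2, as in the quoted lines). One way to read "unit" is simply 1 of whatever x is measured in: the 20 is a rate, and it gets scaled by the actual size of the wiggle.

```python
def f(x):
    return x ** 2


def f_prime(x):
    return 2 * x


x, dx = 10, 0.1
predicted_dy = f_prime(x) * dx   # rate of 20 per unit of x, times a 0.1 wiggle
actual_dy = f(x + dx) - f(x)     # 10.1^2 - 10^2
print(predicted_dy, actual_dy)
```

The prediction is 2 and the actual change is about 2.01. The extra 0.01 is the dx^2 term, which matters less and less as the wiggle shrinks.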