I found this article helpful: http://www.slimy.com/~steuard/teaching/tutorials/Lagrange.html

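My rough intuition is that maximizing a function subject to a constraint means we are doing the “best we can” with respect to two functions at once: the objective F and the constraint G. The gradient of each function points in that function’s direction of steepest increase, so we look for a point where the two gradients line up. At such a point, every move that stays on the constraint is perpendicular to both gradients, so no small step along the constraint improves F.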

Hence we need grad F = k * grad G for some scaling factor k (the Lagrange multiplier). We also need the point to satisfy the constraint, so for a function of three variables we have 4 equations (one for each gradient component, plus the constraint itself) in 4 unknowns (the x, y, z of the point, and the scaling factor).
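As a rough illustration of the “4 equations, 4 unknowns” count, here is a minimal Python sketch. The objective F(x,y,z) = x*y*z, the constraint x + y + z = 3, and the names grad_F, grad_G, residuals and lam are all made up for this example; they do not come from the linked articles.

```python
# Hypothetical example: maximize F(x, y, z) = x*y*z subject to x + y + z = 3.

def grad_F(x, y, z):
    # Partial derivatives of F = x*y*z
    return (y * z, x * z, x * y)

def G(x, y, z):
    # Constraint written so that G = 0 means "constraint satisfied"
    return x + y + z - 3.0

def grad_G(x, y, z):
    # Partial derivatives of G = x + y + z - 3
    return (1.0, 1.0, 1.0)

def residuals(x, y, z, lam):
    # Three gradient-matching equations plus the constraint equation.
    # A solution makes all four entries zero.
    gf = grad_F(x, y, z)
    gg = grad_G(x, y, z)
    return [gf[i] - lam * gg[i] for i in range(3)] + [G(x, y, z)]

print(residuals(1.0, 1.0, 1.0, 1.0))  # candidate point (1, 1, 1), scaling factor 1
print(residuals(2.0, 0.5, 0.5, 1.0))  # on the constraint, but the gradients don't line up
```

At (1, 1, 1) with scaling factor 1 all four residuals are zero. At (2, 0.5, 0.5) the constraint still holds, but the first gradient equation fails, so that point is not a candidate.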

Also: http://tutorial.math.lamar.edu/Classes/CalcIII/LagrangeMultipliers.aspx for examples and more discussion.

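Here is another way to see it, using a two-variable objective F(x,y) and letting z stand for the scaling factor (think of it as a “punishment” factor):

H(x,y,z) = F(x,y) [to start, H is just our objective function]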

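H(x,y,z) = F(x,y) + z * g(x,y) [H is equal to our objective function + the constraint, scaled by the punishment factor z]

g(x,y) = 0 when we’re on the boundary (the constraint is satisfied)

g(x,y) = c, some nonzero amount, when we’re off the boundary

on the boundary: H = F(x,y) + 0

off the boundary: H = F(x,y) + z * c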

F => objective function

H => objective function WITH PUNISHMENT
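To make the on/off-boundary behaviour concrete, here is a small Python sketch. It assumes the punishment term is the constraint g(x,y) scaled by the factor z (the third argument of H). The particular F and g below are hypothetical, picked only for illustration.

```python
# Hypothetical example functions, chosen only for illustration.

def F(x, y):
    # Objective: area of an x-by-y rectangle
    return x * y

def g(x, y):
    # Constraint x + y = 10, written so that g = 0 on the boundary
    return x + y - 10.0

def H(x, y, z):
    # Objective plus punishment: z scales how much a violation counts
    return F(x, y) + z * g(x, y)

# On the boundary (g = 0): H equals F no matter what z is.
print(H(5.0, 5.0, 1.0), H(5.0, 5.0, 100.0), F(5.0, 5.0))

# Off the boundary (g = 2): H differs from F by z * 2.
print(H(6.0, 6.0, 1.0), H(6.0, 6.0, 100.0), F(6.0, 6.0))
```

On the boundary, changing z does nothing to H; off the boundary, H moves away from F in proportion to z.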

If we optimize H, we end up at a critical point of H.

When you’re at the critical point on H, these are true:

dH/dx = 0 (How much H changes when we increase x, our position)

dH/dy = 0 (How much H changes when we increase y, our position)

dH/dz = 0 (How much H changes when we increase z, our punishment factor. On the constraint the violation g(x,y) is 0, so there is nothing to punish and increasing the punishment factor doesn’t change H. “Increasing the speeding fine, but we aren’t speeding anyway.”)

These conditions don’t HAVE to be true for all values of (x,y). They are simply true for the values of (x,y) that are on the constraint [no punishment] and are also critical points for H [dH/dx = dH/dy = 0].
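These three conditions can be checked numerically. The sketch below assumes H is the objective plus z times the constraint function, and uses a made-up example (F(x,y) = x*y with constraint x + y = 10). The helper name partials and the step size h are hypothetical choices for this illustration.

```python
# Hypothetical example: F(x, y) = x*y with constraint x + y = 10.

def H(x, y, z):
    # Objective plus z times the constraint function g(x, y) = x + y - 10
    return x * y + z * (x + y - 10.0)

def partials(x, y, z, h=1e-5):
    # Central-difference estimates of dH/dx, dH/dy, dH/dz
    dx = (H(x + h, y, z) - H(x - h, y, z)) / (2 * h)
    dy = (H(x, y + h, z) - H(x, y - h, z)) / (2 * h)
    dz = (H(x, y, z + h) - H(x, y, z - h)) / (2 * h)
    return dx, dy, dz

# At x = y = 5, z = -5: on the constraint and a critical point of H.
print(partials(5.0, 5.0, -5.0))

# At x = y = 6, z = -5: off the constraint, so dH/dz is not zero.
print(partials(6.0, 6.0, -5.0))
```

At (5, 5) with z = -5 all three estimates come out at zero, up to rounding. At (6, 6) the estimate of dH/dz is about 2, which is exactly the amount by which the constraint is violated.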