My intuition is that we need a way to represent the “work” an algorithm needs to do. How can we represent work though, since it depends on the size of problem?

That’s where Big-O comes in. It’s basically the function that measure how much work an algorithm needs to complete. Now we can compare how efficient two algorithms are by comparing their Big-O functions.

Big O notation doesn’t just provide a way to account for the “work” the algorithm has to do based on the input size; it’s entire point it trying to represent how the work changes based on the input size, i.e., how the algorithm scales.

Bob can spend 2 months cutting the time your algorithm takes by a factor of then, but the Big Oh won’t care. Because infinity doesn’t care. Letting the input size go to infinity lets us focus on the most important part of this relationship and disregard slower growing terms and constant factors that don’t matter then.

[Notational Sidenote:] Actually, even if Bob modifies his algorithm to depend on the input size linearly instead of cubicly, his algorithm will still be be $O(n^3)$. Here, Big Theta ($\Theta(f(x))$) and Small Oh ($o(f(x))$) are useful to know alongside their more popular brother, Big Oh notation: they represent “grows as fast as” and “grows at least as fast” as respectively and can clear up a lot of confusion. I still would’ve preferred that we just had Big Theta notation and used the $\le$, $=$ and $\ge$ signs as appropriate.

Yep, this was a quick note from an email. I think the key intuition is that we can distill an entire algorithm into a function that accounts for how much work we’ll need to do [f(n)]. We can then compare the asymptotic behavior of these simpler representations.

Whoops: I to clarify, it should be something like “Ignoring constant scaling coefficients, create a function representing the work the algorithm does to complete its task”.

(I’m trying to express/hint at the key intuition that we’ve converted vastly different algorithms into simple functions that can be easily compared.)