How To Analyze Data Using the Average

vindl · July 25, 2008, 12:57pm

…where X1

vindl · July 25, 2008, 1:04pm

I can’t write ‘less than’ in comment. 8)
Where Xn = max(X1,…, Xn) then

  (n * Xn) / (Xn/X1 + … + Xn/Xn)

When we divide the numerator and denominator by Xn, we get the harmonic mean formula, n / (1/X1 + … + 1/Xn). So it’s the same thing, I guess.
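A quick numeric check of that identity, as a minimal sketch. The sample values and the function names `scaled_form` and `harmonic_form` are made up for illustration; `statistics.harmonic_mean` from the Python standard library is used only as a reference value.

```python
import math
from statistics import harmonic_mean

def scaled_form(xs):
    """n * Xn / (Xn/X1 + ... + Xn/Xn), where Xn is the largest value."""
    x_max = max(xs)
    return len(xs) * x_max / sum(x_max / x for x in xs)

def harmonic_form(xs):
    """n / (1/X1 + ... + 1/Xn), the usual harmonic mean formula."""
    return len(xs) / sum(1 / x for x in xs)

samples = [2.0, 3.0, 6.0]  # hypothetical values, chosen only for illustration

# Both forms should agree with each other and with the library function.
print(math.isclose(scaled_form(samples), harmonic_form(samples)))
print(math.isclose(scaled_form(samples), harmonic_mean(samples)))
```

The two forms differ only by a factor of Xn in both the numerator and the denominator, so they should agree for any list of positive values.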

kalid · July 27, 2008, 2:38am

Hi Vindl, thanks for the comment! That’s a neat way to link the arithmetic and harmonic mean formulas, thanks.

rakesh · August 23, 2008, 6:23am

Please delete the 15th comment. My email ID is actually published there.

kalid · August 23, 2008, 7:41pm

Hi Rakesh, no problem – I removed it from the comment.

Anonymous_User · November 28, 2008, 5:00pm

I don’t understand the harmonic mean, especially concerning the widget factory.

Let’s use another example. Say a machine preps widgets at 5 widgets/hour and a second machine finishes the widgets at 1 widget/hour. Wouldn’t the rate at which widgets are output be 1 widget per hour?

After waiting for the very first widget to be prepped, the “finishing” machine essentially has an unlimited amount of “prepped” widgets to work with since they’re bunching up, waiting to be finished up. So the rate at which the “finishing” machine processes the widget (1 widget/hour) ends up being the rate at which widgets are leaving the whole system.

Your harmonic mean method suggests that if the two machines were replaced by two machines with the identical rates of 6/5 widgets per hour (2/((1/5)+(1/1))), the system output would be the same!

So what am I missing?

Anonymous_User · November 28, 2008, 5:02pm

Regarding my last post, the harmonic mean would actually suggest a rate of 5/3 widgets per hour, not 6/5.

But this is still greater than 1 widget per hour, so my question stands.

kalid · December 4, 2008, 10:22am

@G: Great question, I had to think a bit to make it clear in my head. Let’s think about the production rates without using the harmonic mean.

We can prep 5 widgets/hour and finish 1 widget per hour. How long does it take to do a single widget?

Well, we spend 1/5 of an hour (12 minutes) on prepping and a full hour on finishing.

So a single widget takes 6/5 hours (1:12), which means we can complete 5/6 of a widget per hour through the entire system. Even though the last stage can finish a widget in 1 hour, we spent 12 minutes getting to that stage. No single widget ever takes less than an hour. [Think of the machine as a black box – you drop in a widget, and it drops out 1:12 later. We don’t know that it really has two separate stages.]

Now, suppose we wanted to replace this black box (with two stages) with parts that operated at the same exact speed. That is, we want to finish 5 widgets in 6 hours. How fast should each part go?

Well, they need to move identically. That means they each only have 3 hours (half of 6 hours) to do their work. So, each part must operate at 5 widgets in 3 hours, and then pass it to the next one.

This new black box, with 2 parts operating at 5 widgets every 3 hours (5/3 widgets/hour) would look exactly the same as our original.

The harmonic mean gives the same result: 2/(1/5 + 1/1) = 5/3 widgets/hour.

Since we must put widgets through both stages, the overall rate is 5/6 widgets/hour (half of that).
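The arithmetic above can be sketched in a few lines, assuming one widget moves through the black box at a time. The variable names are illustrative, and exact fractions are used so the results match the 6/5, 5/6 and 5/3 figures in the text.

```python
from fractions import Fraction

prep_rate = Fraction(5)    # widgets per hour, from the example
finish_rate = Fraction(1)  # widgets per hour, from the example

# Time one widget spends in the two stages combined, in hours.
time_per_widget = 1 / prep_rate + 1 / finish_rate   # 1/5 + 1 = 6/5 hours (1:12)

# One-widget-at-a-time rate through the whole black box.
overall_rate = 1 / time_per_widget                  # 5/6 widgets per hour

# Harmonic mean of the two stage rates: the identical rate each
# replacement stage would need to run at.
stage_rate = 2 / (1 / prep_rate + 1 / finish_rate)  # 5/3 widgets per hour

print(time_per_widget, overall_rate, stage_rate)
print(stage_rate / 2 == overall_rate)
```

The last line checks the "half of that" step: two identical stages at 5/3 widgets/hour in series give 5/6 widgets/hour overall.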

Hope this helps!

Anonymous_User · December 5, 2008, 3:23pm

Thank you for your answer.

I read and understood your post. However I don’t think your replacement parts would be appropriate replacements for a system of constant widget production.

While your replacement parts do simulate the black box perfectly when a single widget is constructed, under constant widget construction the rate approaches 1 widget per hour the longer the system runs.

The very first widget involves the “finishing” machine waiting for 12 minutes before doing its work. Then the “finishing” machine does its work and 1h 12m later from the input into the system, out comes a widget. However, that finishing machine then IMMEDIATELY gets started on the next widget. And 1 hour (not 1h 12m) after the first widget is outputted, the second is outputted!

The limit of the rate of construction for the system as time approaches infinity is 1 widget per hour (that 12-minute wait at the very beginning will always prevent the overall rate from ever quite reaching 1 widget per hour).
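A rough simulation of this pipelined view, as a sketch only. It assumes an unlimited buffer between the two machines, a prep machine that never has to wait, and no downtime. The function name `completion_times` is made up for illustration.

```python
PREP_HOURS = 1 / 5    # 12 minutes per widget in the prep stage
FINISH_HOURS = 1.0    # 1 hour per widget in the finishing stage

def completion_times(count):
    """Hours at which each of `count` widgets leaves the two-stage pipeline."""
    times = []
    prep_done = 0.0     # when the prep machine finishes its current widget
    finish_done = 0.0   # when the finishing machine becomes free
    for _ in range(count):
        prep_done += PREP_HOURS               # prep works back to back
        start = max(prep_done, finish_done)   # need a prepped widget and a free machine
        finish_done = start + FINISH_HOURS
        times.append(finish_done)
    return times

# Average output rate (widgets per hour) after more and more widgets.
for count in (1, 10, 100, 1000):
    elapsed = completion_times(count)[-1]
    print(count, round(count / elapsed, 4))
```

Under these assumptions the k-th widget leaves at about k hours plus the initial 12 minutes, so the average rate creeps toward 1 widget per hour without reaching it.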

The reason I bring this complication up is because I still don’t understand the value of a harmonic mean in a system involving components that depend upon each other for their rates. And most systems do have such components.

Thanks again!

kalid · December 6, 2008, 2:12am

@G: Great question – I really had to think about the two situations because both viewpoints “make sense”.

The harmonic mean is built from the time needed to produce one widget (1:12). That is, every single widget through the system spends 1:12 on the assembly line: 12 minutes on the first part, 1 hour on the second.

However, the harmonic mean only models 1 widget at a time. It isn’t complex enough to model optimizations like pipelining, where you push widgets through the first part even while the second is still working (after all, how long is “long enough” to take the amortized analysis? :-)).

The benefit is that you can compare similar production lines without having to make assumptions about whether the pipelines are full, the impact of gaps, etc. So you can (quickly) compare a production line with 5 wph (widgets per hour) prepping & 1 wph finishing, with an alternative with 3 wph prepping & 2 wph finishing.
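As a sketch of that comparison, the two lines can be put side by side using the one-widget-at-a-time view. The helper names `harmonic_mean_rate` and `hours_per_widget` are illustrative, not from the article.

```python
from fractions import Fraction

def harmonic_mean_rate(*rates):
    """Harmonic mean of stage rates given in widgets per hour."""
    return len(rates) / sum(1 / Fraction(r) for r in rates)

def hours_per_widget(*rates):
    """Total time for one widget to pass through every stage, in hours."""
    return sum(1 / Fraction(r) for r in rates)

line_a = (5, 1)  # 5 wph prepping, 1 wph finishing
line_b = (3, 2)  # 3 wph prepping, 2 wph finishing

for name, line in (("A", line_a), ("B", line_b)):
    print(name, harmonic_mean_rate(*line), hours_per_widget(*line))
```

By this measure the second line comes out ahead: 12/5 wph against 5/3 wph, or 50 minutes per widget against 1:12.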

Another subtlety is that people don’t care which widget they get; widget A = widget B. But suppose this was a carwash instead: 5 cars per hour washing, and 1 car per hour polishing. You drop off your car – how long to get it done?

Clearly it’s 1:12. You don’t want just any car from the line (i.e. the next one the polisher spits out), you want the one you put in.

So, the harmonic mean takes a simple scenario where pipelining / amortized analysis isn’t involved, and assumes the widget you put in is the one you need to get out.

Speaking of bottlenecks, “The Goal” is a pretty interesting look at production lines, etc. I think modeling the net output of such lines requires much more sophisticated analysis (pipeline stalls, etc.). It is true that a system, over time, will approach the speed of the slowest-moving part (1 wph). It is interesting to see that the throughput of the entire system (1 wph) need not match the time needed to process a single widget (1:12), due to pipelining. (Computers operate in a similar way, and can complete 1 instruction per cycle even if a single instruction takes several cycles to complete – the last stage is always pumping one out.)

Hope this helps!

-Kalid

Anonymous_User · December 6, 2008, 3:50am

Helped a lot, thanks!

kalid · December 7, 2008, 5:14am

@G: You’re welcome, thanks for the interesting discussion.

Anonymous_User · December 8, 2008, 1:28pm

great post Kalid, few thoughts from my side …

I feel the harmonic mean is not the right measure in your widget example, or for that matter in the car wash example.

“But don’t we need to know how far work is? Nope! No matter how long the route is, X and Y have the same output;”

If I understood this correctly, the harmonic mean is used to compare two different rates that give similar output. In the widget and car wash examples, we are talking about two different outputs at different rates, aren’t we?

denise · February 9, 2009, 3:43am

If you drive one mile at 30 mph and one mile at 40 mph what is your average speed for the two miles?

Please help me understand this question and the answer. Thanks Denise

mello · February 14, 2009, 3:33pm

Wow!
You took me out of the rain!
Your explanations helped me a lot. Thanks!

Anonymous_User · May 21, 2009, 8:01pm

Best example I’ve ever seen of where average doesn’t work out: “The average net worth of Bill Gates and ten homeless guys is over a billion dollars.”

Anonymous_User · May 21, 2009, 8:15pm

On harmonic means: They used to give us puzzles in school like “The cold water tap can fill the tub in 20 minutes. The hot water tap can fill the tub in 40 minutes. The drain can empty the tub in 30 minutes. Both taps are turned on but the drain is accidentally left open. How long will it take to fill the bath?”

Those puzzles drove me up a tree because although I could memorize the formula, it still made no intuitive sense.

Finally I figured it out: Your intuition says that the values for the taps and the drain should simply add and subtract. Your intuition is right, but the question is add and subtract what?

Just adding times doesn’t make any sense, because that says the cold and hot water taps working together should add up to an hour, but we know that using them together would make the time shorter, not longer.

The answer is that it’s RATES that are adding and subtracting.

So the cold water tap fills at the rate of 3 bathtubs per hour. The hot tap runs at 1.5 bathtubs/hour and the drain runs at 2 bathtubs/hour.

So working together, they produce 3 + 1.5 - 2 = 2.5 bathtubs/hour. 2.5 bathtubs/hour is 0.4 hours/bathtub. 0.4 hours is 24 minutes. There’s your answer.

Do the algebra, and you get the same 1/(1/a+1/b-1/c) formula as above.
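The rate-adding idea can be sketched as a small function. The name `minutes_to_fill` and its parameters are made up for illustration, and the times are assumed to be whole minutes so that exact fractions can be used.

```python
from fractions import Fraction

def minutes_to_fill(fill_minutes, drain_minutes=()):
    """Minutes to fill one tub when rates (tubs per minute) add and subtract.

    Each argument is a list of whole-minute times to fill or empty one tub.
    """
    net_rate = sum(Fraction(1, m) for m in fill_minutes)
    net_rate -= sum(Fraction(1, m) for m in drain_minutes)
    if net_rate <= 0:
        raise ValueError("the tub never fills: the drains keep up with the taps")
    return 1 / net_rate

# Cold tap: 20 minutes, hot tap: 40 minutes, drain: 30 minutes.
print(minutes_to_fill([20, 40], [30]))  # 24
```

With the drain closed, the same function gives 40/3 minutes (about 13 minutes 20 seconds), shorter than either tap alone, which matches the intuition that the taps help each other.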

kalid · May 21, 2009, 8:22pm

@Ed: Great example on when to use the average vs the median :).

I like the tub example too – we want to add something, but times aren’t it! Rates are what we want, exactly for the reason you say – it’s a good example showing why we need to invert the times to get the rates.

franz · July 30, 2009, 1:47pm

Nice article! Could you give your view on expected value and why people use it for probability distributions?