Understanding Bayes Theorem With Ratios

Thank you, Kalid, for your fine article, and Rich for your welcome comments and Jeff for your interesting additions, on this refreshingly civil and intelligence-based site. I hasten to say that I was in no way criticizing Kalid’s excellent article in my earlier post. I was criticizing the blunt ways that “odds” and “probabilities” are used every day, often misleading people into sometimes tragic outcomes (particularly in finance/gambling and medicine). I would like to see the term “probability” replaced with “possibility” - although that may seem pedantic and trivial, I think it would more accurately reflect the truth and improve understanding. And I look forward as always to further high-quality articles and posts. Best wishes to all.

@Ralph
What if we rephrased the “all tickets have equal chance to win” sentence as
"all tickets have equal chance to be the winning ticket" ?

Just wondering whether the apparent lack of meaning of probability you mentioned is a consequence of skipping nuances while making statements.

I could be completely wrong here. I’m only an ‘amateur enthusiast’ at most, regarding statistics and probability.

Thank you Kalid, for the great article.

TheDjinni wrote in part:

>to speak of “the winner” when you do not know who won is to speak nonsense […] Only if you presume that the draw, and ultimately life itself, is entirely and fully deterministic can you come to the conclusion that there is a winner and a loser decided beforehand

One can speak of the winner and losers before one knows who they are, and even if one rejects determinism in various senses. For example, one might say ‘The winner will be very happy.’, even before the draw has been made, without making any philosophical commitment about determinism, and making a meaningful statement (and indeed one with a high probability of being true).

But of course, as long as we don’t actually know who the winner is, our statements about ‘the winner’, while meaningful, are still probabilistic statements. Probability is indeed what we use to talk about things when we don’t know everything about them (which is, as a practical matter, always), despite some people’s objections to the contrary.

And I now refute what I suggested earlier: ‘I would like to see the term “probability” replaced with “possibility” – although that may seem pedantic and trivial, I think it would more accurately reflect the truth and improve understanding.’ This would be equally inaccurate, since (in our lottery) only one ticket can possibly win, while the rest have zero possibility.

@ AJ
Thanks for the suggestion, but to me “all tickets have (an) equal chance to be the winning ticket” and “all tickets have (an) equal chance to win” have identical meanings (and in the lottery scenario are therefore, I maintain, equally false statements).

Bayes’ theorem demystified; thanks a billion times

Incidentally, one can define information-dependent probabilities (the ‘seeming’ probabilities from late March above) using information-theoretic entropy. The probability of an event E, given information I, is 1/2 to the power of S(I) − S(I&E), where S(I) is the entropy (in bits) associated with the information I, while S(I&E) is the (smaller) entropy that would remain if one should learn additionally that E obtains.
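
To make the formula above concrete, here is a minimal sketch in Python. It assumes the simplest case, where every outcome still consistent with the information is equally likely, so the entropy is just log2 of the number of remaining outcomes. The function names and the million-ticket figures are illustrative only, not part of the original comment.

```python
import math

def entropy_bits(num_equally_likely_outcomes):
    """Entropy in bits when all remaining outcomes are equally likely (an assumption)."""
    return math.log2(num_equally_likely_outcomes)

def probability(s_info, s_info_and_event):
    """P(E | I) = (1/2) ** (S(I) - S(I&E)), as stated in the comment above."""
    return 0.5 ** (s_info - s_info_and_event)

# I: "one of 1,000,000 tickets wins"; E: "my particular ticket wins".
s_i = entropy_bits(1_000_000)   # about 19.93 bits of uncertainty
s_i_and_e = entropy_bits(1)     # nothing left to learn: 0 bits
p_my_ticket = probability(s_i, s_i_and_e)

# E2: "the winning number is even" leaves 500,000 candidates.
p_even = probability(s_i, entropy_bits(500_000))

print(p_my_ticket, p_even)
```

Under these assumptions the first value comes out at roughly one in a million and the second at roughly one half, matching the usual counting argument.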

>“Aha!” you may say, “But they each have an equal chance of winning BEFORE the winner is drawn”.

Well, I would never say that. “Aha!” I would say instead, “But they each have an equal chance of winning BEFORE I learn the identity of the winner”. This makes it clear that lottery probabilities are facts about my knowledge about the lottery, not facts directly about the lottery itself.

My head is still not completely wrapped around Bayes’ Theorem, even though it loomed large in both the inaugural MOOC AI course taught by Thrun and Norvig and Andrew Ng’s machine learning course. As usual I can feel the wrap wrap wrapping at my brain’s door with your latest post.

I have long had issues with “probabilities”, insofar as they are meaningless, yet people throughout society base their judgements on them - often extremely damagingly. Let me explain.

Let’s say a lottery has one million tickets. Each ticket has a one-in-a-million chance of winning, right? WRONG! ONE ticket (the winner) has 100% chance and the rest have 0%! How can all the tickets have the same chance if only one will win? That is illogical and absurd.

“Aha!” you may say, “But they each have an equal chance of winning BEFORE the winner is drawn”. But how can they each have an equal chance of winning if only one wins and the others lose - whether before or after the draw? They cannot and do not have an equal chance! Even young kids can understand that!

While one ticket in a million will win, that does NOT translate into each ticket having an equal chance!

Although most sporting events have favourites to win, quantifying their chances into odds is ridiculous, and nowhere more so than in horse racing. In the same race, one horse may be 20:1 to win, another horse 3:2, another 2:1, another 7:5 etc. - how absurd and meaningless is that, given that only one will win and the others won’t?!

Statements like “Only three other companies are bidding for the contract, so we have a one in four chance of winning it”, or “You have a 60% chance of doubling your money” are commonplace - and meaningless. Even worse, they can and do give false confidence leading to ruin, since people are misled into basing their judgements on them.

And doctors and health lobbyists should be prohibited from making such outrageous statements as “You have a one in five chance of surviving” based on data that one in five people survives. You will either survive (100%) or not (0%), so you do not have 20% chance, but either 100% or 0%. The problem is that some people literally worry themselves to death when they hear such pronouncements from doctors (who should know better, and may just be covering themselves in advance).

The error consists of extending the general to the particular (the opposite of generalising). Just because nine out of ten in my community have white skin does not mean that my skin or anyone else’s has a 90% chance of being white - it is either white or it is not!

To @ralph,

I think your concerns need to be understood in relation to the ‘alternative model’. In this case, the alternative model is unaided intuition (expert guess) when dealing with highly complex and uncertain outcomes. All we are saying here is that Bayes can be used to model our uncertainty (put bounds on our uncertainty), not our exactness. And there have been hundreds of studies in the pharmaceutical, soil science, military, energy and other industries that demonstrate that quantitative methods (particularly Bayesian methods) significantly outperform other methods of measurement, particularly expert opinion.

Great article in simplifying a complex subject.

On the meaning of “probability”. Is it helpful to think of a probability as meaning “my best estimate of the likelihood of something happening, based on the information available”?

So sure, only one ticket can win a lottery, but in the absence of knowing which one it is, when I buy one lottery ticket out of the million offered, MY best estimate is that I have a 1 in a million chance of that being the winning ticket.

As always, great post!

There are at least two additional advantages of thinking in terms of odds instead of percentages.

The first is that thinking in terms of odds reinforces the Bayesian view of probability, that is, the degree to which you’d feel comfortable betting on something given everything that you have observed about it so far.

The second is that using odds like you described makes a software implementation easier: you just keep track of things using simple counters. To avoid rounding errors, you can use logarithms. For example, in your second spam scenario, the logarithm of the final ratio would be

(log(9) + log(3) + log(3)) - (log(1) + log(2) + log(1)) = log(81/2)

The point is that you just keep summing logarithms on both sides of the “:” sign of the odds. You then pick a cutoff for spamminess that accounts for the pain of a false positive (e.g. 40:1) and take the logarithm of that. If the summed log-odds exceed that threshold, classify the message as spam.
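
The log-odds bookkeeping described above could be sketched roughly like this in Python. The pairing of the numbers into (spam, not-spam) ratios is read positionally from the sum in this comment and is an assumption, as are the function names and the default 40:1 cutoff.

```python
import math

def log_odds(ratios):
    """Sum the logs of each side of the ':' and return spam side minus non-spam side."""
    spam_side = sum(math.log(spam) for spam, _ in ratios)
    ham_side = sum(math.log(ham) for _, ham in ratios)
    return spam_side - ham_side

def is_spam(ratios, cutoff=(40, 1)):
    """Classify as spam when the summed log-odds exceed the log of the cutoff odds."""
    threshold = math.log(cutoff[0]) - math.log(cutoff[1])
    return log_odds(ratios) > threshold

# Ratios taken from the sum above: 9:1, 3:2 and 3:1 (pairing assumed).
evidence = [(9, 1), (3, 2), (3, 1)]
score = log_odds(evidence)      # equals log(81/2)

print(score, is_spam(evidence))
```

With these numbers the combined odds are 81:2, i.e. 40.5:1, which just clears a 40:1 cutoff. Dropping the last piece of evidence leaves 27:2, which does not.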

@ Joe Smith
I think that rephrasing, as you suggest, to “When the drawing happens, there is X percent chance they will draw the numbers you picked on your ticket” is still inaccurate, since you will still have either 0% or 100%, and nothing else. The problem, as I see it, is making an incorrect leap from a true statement such as “one ticket in 1 million will win” to the false statement “EACH ticket has an EQUAL chance of 1 in one million of winning”. I agree that such false statements, regrettably, help sell tickets.

Ralph, you seem to be saying that the probability is SEEMINGLY 0.000001, when we have imperfect knowledge, but the probability is ACTUALLY either 0 or 1. Then using this language, it’s only the seeming probabilities, not the actual probabilities, that have any use for making decisions when one’s knowledge is incomplete (which is to say, always). And if the actual probabilities of specific events are always either 0 or 1, then we need only boolean logic to study these, not probability theory. So I would say that probability theory is really about the seeming probabilities, which are what we actually want to analyse, and so they are the only probabilities worth the name.

As for the bankers, if they really wanted to make the best decisions that they could with the information that they had, then it’s these probabilities that they should have used. It would not have been possible for them to base decisions on knowledge that they didn’t have, whatever you want to call it. Now, whether they calculated correctly, or whether they really tried to do the best, is another matter!

Saw Prof. Brian Greene trying to explain Boltzmann’s entropy theory (i.e. the tendency of the universe to become more disorganised). Apart from my disagreement with the theory [(1) there are innumerable instances in Nature where things become more organised - e.g. formation of crystals, including snowflakes - and (2) the perception of “order” is highly subjective - e.g. to an illiterate person, writing looks like disorganised scribbles], Greene’s explanation was problematic. He ripped his book apart and flung the pages in the air so they landed all over the place, then said “Why didn’t the pages land in an ordered pattern? Because there are so many ways for them to land randomly and far fewer ways for them to land orderly. That’s why there is a tendency in the universe towards disorder”.

As a physicist, he of all people should know that there was only one way for them to land - the way they did! It was not willy-nilly - it was the result of the forces acting on them. Information Theory correctly states that the entropy of a system is proportional to the lack of information about that system.

It’s also important to note that thermodynamic entropy is not really the same thing as information-theoretic entropy. The thermodynamic entropy is an objective property, not dependent on one’s knowledge; but it is what the information-theoretic entropy would be if one’s knowledge of the system were precisely knowledge of its macroscopic properties … which is often approximately the case! In particular, information-theoretic entropy tends to go down with time, as one gains information; but thermodynamic entropy famously only seems to go up.

@David: Yes, this is a much more fruitful way of looking at probability than the one proposed by Ralph Schneider. Probability is a two-place function of a fact and an observer, not a one-place function of the fact alone. That is, you don’t say that the probability that the ticket will win is one in a million; you say that the probability that the ticket will win, given your information, is one in a million.
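
One way to picture the “two-place function” idea is a small Python sketch in which the probability depends both on the ticket and on what the observer knows. It assumes every ticket still consistent with the observer’s information is equally likely; the function name and ticket numbers are made up for illustration.

```python
from fractions import Fraction

def probability_of_winning(my_ticket, possible_winners):
    """Probability that my_ticket wins, given the set of tickets the observer
    still considers possible (all assumed equally likely)."""
    if my_ticket not in possible_winners:
        return Fraction(0)
    return Fraction(1, len(possible_winners))

my_ticket = 123456

# Observer 1 knows only that one of a million tickets wins.
no_info = probability_of_winning(my_ticket, range(1, 1_000_001))

# Observer 2 has also learned that the winning number is even.
parity_known = probability_of_winning(my_ticket, range(2, 1_000_001, 2))

# Observer 3 has seen the draw: the 0-or-1 case discussed in this thread.
winner_known = probability_of_winning(my_ticket, {123456})
other_ticket = probability_of_winning(7, {123456})

print(no_info, parity_known, winner_known, other_ticket)
```

The same ticket gets 1/1000000, 1/500000 or 1 depending on the information supplied, and the 0%/100% values appear only for the observer who already knows the result.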

Let me be clear: that one ticket in a million will win is true, but that each ticket has an equal probability of winning is false, regardless of whether someone prefaces it with “based on what I know” or “with the information I’m given”!

We need to distinguish between mathematical constructs and how we operate in ‘real life’. Obviously, we constantly operate on assumptions that things will behave as they have before - to not do so would paralyse us.

We then revise our assumptions if things do not behave as we expected (although such revisions may be misguided - for example, some may believe that the more they lose, the closer they are to winning, based on ‘the odds’!). But the obvious fact remains - what happens always had 100% chance of happening, whereas what didn’t happen always had 0% chance.