Wednesday, November 03, 2010

Classic probability conundrums

For an illustration of how people can argue about the correct answers to relatively simple problems in probability, see

http://opinionator.blogs.nytimes.com/2010/10/24/stories-vs-statistics/

John Allen Paulos, "Stories vs Statistics" NY Times, Opinion Pages online, October 24, 2010.

This article mentions some of the same examples as those to which I refer in my draft paper on propensity evidence (see link on this page).

I should enter the fray rather than sit on the sidelines like a coward, scoffing. The first problem is this: you are told that a stranger has two children, at least one of whom is a boy. What is the probability that the other is also a boy?

Note that neither child has been singled out, so this is a problem about the probability of having two boys. Given that one child will have been born before the other (even if a twin), the two children can have arrived as BB, BG or GB, each equally likely if boys and girls are equally probable. The other combination, GG, does not count in this example. So one out of three eligible combinations gives two boys, and the probability of that occurring is 1/3, or about 0.33.
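The counting argument above can be checked by brute-force enumeration. The following is a minimal sketch, assuming boys and girls are equally likely and births are independent; the variable names are my own, chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# All birth orders for two children: BB, BG, GB, GG.
# Assumption: each order is equally likely (P(boy) = P(girl) = 1/2).
families = list(product("BG", repeat=2))

# "At least one is a boy" rules out GG only.
eligible = [f for f in families if "B" in f]

# Among the eligible families, count those with two boys.
both_boys = [f for f in eligible if f == ("B", "B")]

p_two_boys = Fraction(len(both_boys), len(eligible))
print(p_two_boys)  # 1/3
```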

If one child had been singled out, the problem would have been about the gender of a single child (the other one). For example, you might have seen one child and been asked about the other. For one child, the probability of its being a boy is 0.50.
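The singled-out version can be enumerated in the same way. In this sketch I assume, purely for illustration, that the child you have seen is the first-born; any other way of fixing on one particular child gives the same result.

```python
from fractions import Fraction
from itertools import product

# Same equally likely birth orders as before: BB, BG, GB, GG.
families = list(product("BG", repeat=2))

# Assumption: position 0 is the singled-out child, and that child is a boy.
seen_is_boy = [f for f in families if f[0] == "B"]

# Among those families, count the ones where the other child is also a boy.
other_is_boy = [f for f in seen_is_boy if f[1] == "B"]

p_other_boy = Fraction(len(other_is_boy), len(seen_is_boy))
print(p_other_boy)  # 1/2
```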

This problem illustrates how important it is to ascertain exactly what the issue is. The seemingly endless argument about this problem in the discussion following the above NY Times article exemplifies this.

Some people wonder why the combinations BG and GB are counted separately. They would say that the relevant combinations are just two: BB and (B and G). The error here is in thinking (B and G) will occur just as frequently as BB. This overlooks the way in which the data can arise. (B and G) will occur in two ways (namely, BG and GB) whereas BB will occur in only one way.
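To see the two-to-one ratio directly, one can group the ordered outcomes by their unordered make-up. This is only a sketch under the same equal-likelihood assumption; the labels "BB", "BG" and "GG" for the unordered groups are my own.

```python
from collections import Counter
from itertools import product

# The four equally likely ordered outcomes: BB, BG, GB, GG.
ordered = list(product("BG", repeat=2))

# Sorting each outcome discards birth order, so BG and GB both become "BG".
unordered = Counter("".join(sorted(f)) for f in ordered)

print(unordered["BB"])  # 1 way to get two boys
print(unordered["BG"])  # 2 ways to get one boy and one girl
```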

So the two lessons from this problem are: ascertain the issue, and examine how the data arises.

The other problem is known most commonly as the Monty Hall problem, after a game show host. The task is to pick which of three doors, A, B and C, conceals a prize. You pick one door, say A. The host, who knows where the prize is, then tells you it is not behind door C. Should you change your guess to door B?

When you choose door A, you divide the doors into two groups or classes: the chosen and the not-chosen. The probability of the prize being behind door A is 1/3 (about 0.33). The probability of it being in the not-chosen class is 2/3 (about 0.67). Provided the host always eliminates a not-chosen door that does not hold the prize, the class probability for that class attaches to the only remaining member, door B. You should change your guess.
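The class argument can be checked by listing the three equally likely prize positions. The sketch below assumes the host always opens a not-chosen door that does not hold the prize; the function name win_probabilities is hypothetical. It computes the overall win rate of each strategy, not the rate for one particular door the host happens to open.

```python
from fractions import Fraction

DOORS = ("A", "B", "C")

def win_probabilities(pick="A"):
    """Return (P(win by staying), P(win by switching)) for a fixed first pick."""
    stay_wins = 0
    switch_wins = 0
    for prize in DOORS:  # each prize position is assumed equally likely
        # Assumption: the host opens a door that is neither the pick nor the prize.
        openable = [d for d in DOORS if d != pick and d != prize]
        # When the host has two doors to choose from, the overall win counts
        # do not depend on which one is opened, so take the first.
        opened = openable[0]
        remaining = next(d for d in DOORS if d not in (pick, opened))
        stay_wins += (prize == pick)
        switch_wins += (prize == remaining)
    return Fraction(stay_wins, len(DOORS)), Fraction(switch_wins, len(DOORS))

stay, switch = win_probabilities("A")
print(stay, switch)  # 1/3 2/3
```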

Some people object that each door always has the same probability of concealing the prize, and that there is no reason to change your guess from A. This ignores the new information the host gives you. Changing the probability distribution among members of one class does not affect the probability distribution among member(s) of another class. The constant probabilities are wrongly linked to the individual doors, rather than to the classes.
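For anyone still unconvinced, a simulation of the exact situation described (you pick A, the host rules out C) may help. This is a rough sketch, not a definitive model: it assumes the host never rules out your door or the prize door, and chooses at random when both B and C are available. The function name and parameters are my own.

```python
import random

def switch_rate_given_c_opened(trials=100_000, seed=0):
    """Estimate P(prize behind B | you picked A and the host ruled out C)."""
    rng = random.Random(seed)
    kept = 0        # rounds in which the host ruled out door C
    prize_at_b = 0  # of those, rounds in which switching to B wins
    for _ in range(trials):
        prize = rng.choice("ABC")
        # Assumption: the host may only rule out B or C, and never the prize door.
        openable = [d for d in "BC" if d != prize]
        opened = rng.choice(openable)
        if opened != "C":
            continue  # not the situation described in the text
        kept += 1
        prize_at_b += (prize == "B")
    return prize_at_b / kept

print(switch_rate_given_c_opened())  # close to 2/3, not 1/2
```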

The lessons from this example are: use all the relevant information, and recognize when the issue is about members of one class as distinct from members of another class.