The Birthday Paradox is a classic of counting and probability, because it's so darn surprising. It's a paradox not because it's logically contradictory, but because the true answer is so different from the "intuitive" answer.
The Birthday Problem: How many people do you need to gather together to have a 50-50 chance that two of them will share a birthday?
The intuitive answer: 365/2 or about 180.
The actual answer: 23!!! Those exclamation points are for surprise, not factorial.
Why? Let's first compute the probability that no one shares a birthday, as a function of the number of people.
N      Probability that no one shares a birthday
2      (365 x 364)/365^2
3      (365 x 364 x 363)/365^3
4      (365 x 364 x 363 x 362)/365^4
...    ...
N      365!/(365-N)!/365^N
It's appealing to try to just type that kind of formula into Excel and see what it does. Alas, it gives you a mysterious error:
Q: By trial and error, find the largest value that Excel can compute the factorial of. If you're clever, you can find this value in fewer than 15 guesses; fewer still, if you're lucky.
What's happening? Excel stores numbers in a format with a maximum size of about 1.8 x 10^308, and 365! is far bigger than that. Unfortunately, this means that if the number we want is the quotient of two big numbers, we often can't compute the big numbers themselves, even if the quotient is quite reasonable.
Q: Give two ways to compute 365 x 364 x 363 x ... x 355, one of which works and one of which doesn't.
Set up a table like this:
A    B      C
1    365    =product($b$1:b1)/power(365,A1)
2    364    =product($b$1:b2)/power(365,A2)
3    363    =product($b$1:b3)/power(365,A3)
4    362    =product($b$1:b4)/power(365,A4)
Take a minute to see how that formula works. The idea is that the product function computes the product of all the cells in a range. This range is defined to be from B1 (absolute, so it doesn't change when we copy/paste it) to the current row. Since the first row is row 1, it's a product from B1 to B1, which is just 365. For the second row, it's the product from B1 to B2, which is 365*364. And so on.
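If it helps to see the same idea outside Excel, here is a minimal Python sketch of the table. The function name `no_shared_birthday_table` is made up for this illustration. One difference worth noting: Python integers can grow without limit, so the running product never overflows the way a spreadsheet cell can.

```python
def no_shared_birthday_table(rows, days=365):
    """Return a list of (n, probability that n people all have different birthdays)."""
    table = []
    product = 1  # running product of column B: 365 x 364 x ...
    for n in range(1, rows + 1):
        product *= days - (n - 1)               # the column B value for row n
        table.append((n, product / days ** n))  # the column C formula
    return table


# Print the first few rows so they can be compared with the spreadsheet.
for n, prob in no_shared_birthday_table(5):
    print(n, round(prob, 4))
```

Calling `no_shared_birthday_table(50)` gives the same 50 rows the copy/paste step asks for, which is one more way to check that you can trust the numbers.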
Type in by hand some formulas for the birthday probabilities and check these values. Make sure you can trust the numbers you're getting.
Using copy/paste, increase this table so that it's at least 50 rows long.
Q: How many people do you need for the probability of no repeats to be less than 75 percent?
Q: How many people do you need for the probability of no repeats to be less than 50 percent?
Q: How many people do you need for the probability of no repeats to be less than 30 percent?
Q: How many people do you need for the probability of no repeats to be less than 10 percent?
Q: With 50 people gathered together, what is the chance that none of them will share a birthday?
Amazing, isn't it?
Here's a simulation in StarLogo of the Birthday Paradox.
Download it to your desktop, start StarLogo, and open the simulation. Try it.
Play around with it. Does it look random to you?
Many of you already know how to play poker. If not, here's a crash course that omits most of the game:
So, what we're really interested in is the value of different hands. There are a number of web sites that discuss how poker hands are valued. This is a good web site about poker. Essentially, the more likely a hand is, the less valuable; rare hands are more valuable. Therefore, we want to compute the probability of different poker hands.
The denominator of all of our probability calculations is the number of poker hands. That number is combin(52,5). Why?
Now, let's count hands:
Build a spreadsheet to compute these possibilities. Total them. Compare that with the combin(52,5) possible poker hands. Compute the probabilities of each hand.
The previous development is difficult and error-prone. A somewhat better way is simulation. Look at the following model and see if you can determine how it works. The code in the equation block is a doozy, so take your time and skim.
Q: What are some disadvantages of using simulation instead of calculations to determine the probability of various hands?
We'll talk about this together.
Let's start with a guess. If your first guess is pretty good, you can do better, but let's start with a moderately bad guess, like 10. Here are my guesses and the answers. Try to deduce my strategy:
So, I wasn't particularly lucky in my guesses. I overshot by a lot when I went from 160 to 320, and it took me 3 guesses to ascertain that 170 was the biggest possible argument to factorial.
My strategy was clearly to keep doubling until I exceeded the maximum, and then to use binary search to narrow down the range. It turns out that this is a general, optimal strategy. But that's a topic for a different computer science course.
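Here is a rough Python sketch of that doubling-then-binary-search strategy. It is only an illustration: the names `factorial_fits` and `largest_ok` are invented here, and the check stands in for Excel by assuming that Python's `float` has the same upper limit as an Excel number (both are 64-bit floating point).

```python
import math


def factorial_fits(n):
    """True if n! fits in a 64-bit float (assumed to match Excel's limit)."""
    try:
        float(math.factorial(n))
        return True
    except OverflowError:
        return False


def largest_ok(works, first_guess=10):
    """Find the largest n where works(n) is True.

    Assumes works(n) is True for every n up to some cutoff and False after it.
    Returns (largest n that works, number of guesses used).
    """
    guesses = 0
    low, high = 0, first_guess  # low is assumed to work
    # Phase 1: keep doubling the guess until it fails.
    while True:
        guesses += 1
        if not works(high):
            break
        low, high = high, high * 2
    # Phase 2: binary search between low (works) and high (fails).
    while high - low > 1:
        mid = (low + high) // 2
        guesses += 1
        if works(mid):
            low = mid
        else:
            high = mid
    return low, guesses


print(largest_ok(factorial_fits))
```

Starting from a first guess of 10, the doubling phase tries 10, 20, 40, 80, 160, and 320, and the binary search then narrows the range between 160 and 320.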
The way that works is just to compute that product "by hand." The way that doesn't work is the "shortcut" of computing fact(365)/fact(354). *sigh*
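The same contrast can be sketched in Python, with one caveat: Python's integers never overflow, so this example deliberately converts to `float` to imitate a spreadsheet cell. The function names are made up for the illustration.

```python
import math


def product_by_hand(high, low):
    """Multiply low x (low+1) x ... x high using floats, like spreadsheet cells."""
    result = 1.0
    for k in range(low, high + 1):
        result *= k
    return result


def factorial_quotient(high, low):
    """The 'shortcut' high!/(low-1)!, with each factorial forced into a float."""
    return float(math.factorial(high)) / float(math.factorial(low - 1))


print(product_by_hand(365, 355))  # works: the product is only about 10^28

try:
    print(factorial_quotient(365, 355))
except OverflowError as err:
    print("shortcut failed:", err)  # 365! is too big for a float
```

The quotient itself is a perfectly reasonable number; it is the intermediate factorials that are too big to store.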
That's because you're choosing 5 cards from a deck of 52, and you don't care about the order in which you get the cards.
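As a quick check of that reasoning, the short Python sketch below counts the ordered ways to deal 5 cards and then divides out the orderings of each hand. The variable names are just for illustration; `math.comb` is Python's counterpart to Excel's combin.

```python
import math

ordered_deals = 52 * 51 * 50 * 49 * 48  # 5 cards dealt in a specific order
orderings_per_hand = math.factorial(5)  # ways to rearrange the same 5 cards
hands = ordered_deals // orderings_per_hand

# Both numbers should agree: 2,598,960 possible poker hands.
print(hands, math.comb(52, 5))
```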
There are basically two disadvantages. The first is that, just as it's hard to be sure your theoretical calculations are right, it can be hard to determine that your program (the equation in the equation block) is correct. That's why it's good to use both methods; they can confirm one another.
The second disadvantage is that if something is very unlikely (such as a straight flush), you may have to generate millions and millions of hands before your probability estimate is reliable. For example, the probability of a straight flush is theoretically 0.00154 percent. That means out of a million hands, you expect to get about 15 straight flushes. If, by chance, we get 16 or 14 or 13, our probability estimate becomes 0.0016 or 0.0014 or 0.0013 percent, which is significantly different from the theoretical probability. In general, estimating the probability of very rare events requires generating zillions of samples, so that the law of large numbers can work for us.
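To put rough numbers on that, here is a small Python sketch that treats each hand as an independent trial and uses the binomial standard deviation to estimate how much the count of straight flushes typically wobbles. It assumes there are 40 straight flushes (10 per suit, royal flushes included), which is the count that matches the 0.00154 percent figure; the function name is made up for the illustration.

```python
import math

STRAIGHT_FLUSHES = 40  # assumption: 10 per suit, royal flushes included
p = STRAIGHT_FLUSHES / math.comb(52, 5)


def expected_and_spread(n_hands, p):
    """Expected count of a rare hand and its typical chance variation."""
    expected = n_hands * p
    spread = math.sqrt(n_hands * p * (1 - p))  # binomial standard deviation
    return expected, spread


for n_hands in (10**6, 10**8):
    expected, spread = expected_and_spread(n_hands, p)
    # Columns: hands dealt, expected count, typical variation, relative error
    print(n_hands, round(expected, 1), round(spread, 1), f"{spread / expected:.1%}")
```

The relative error shrinks only with the square root of the number of hands, so dealing 100 times as many hands makes the estimate only about 10 times as precise.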
This work is licensed under a Creative Commons License