- What was the moral of the Aisha and Brandy comparison?
- What is the meaning of a compounding period?
- How do we get things to oscillate more wildly? Try lynx-hare with reproduction rates of 2.5 and 0.3
- How can we use randomness?

The Birthday Paradox is a classic of counting and probability, because it's so darn surprising. It's a paradox not because it's logically contradictory, but because the true answer is so different from the "intuitive" answer.

The Birthday Problem: How many people do you need to gather together to have a 50-50 chance that two of them will share a birthday?

The intuitive answer: 365/2 or about 180.

The actual answer: 23!!! Those exclamation points are for surprise, not factorial.

**Why?** Let's first compute the probability that **no one**
shares a birthday, as a function of the number of people.

| N | prob |
|---|---|
| 2 | (365 x 364)/365^2 |
| 3 | (365 x 364 x 363)/365^3 |
| 4 | (365 x 364 x 363 x 362)/365^4 |
| ... | ... |
| N | 365!/(365-N)!/365^N |

It's appealing to try to just type that kind of formula into Excel and see what it does. Alas, it gives you a mysterious error:

#NUM!

**Q:** By trial and error, find out the largest value that Excel can
compute the factorial of. If you're clever, you can find this value in
less than 15 guesses; fewer, if you're lucky.

What's happening? There is a limit to how big the numbers that Excel uses can get. Unfortunately, this means that if the number we are computing is the quotient of two big numbers, we often can't compute the big numbers, even if the quotient is quite reasonable.
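To see the same effect outside Excel, here is a minimal Python sketch. It assumes the spreadsheet stores numbers as 64-bit floating point values, which is also what Python's `float` is; the helper name `fits_in_double` is made up for this illustration. Python's integers never overflow, so the failure only shows up when a huge factorial is converted to a float.

```python
import math

def fits_in_double(n):
    """Return True if n! can be stored as a finite 64-bit float."""
    try:
        float(math.factorial(n))
        return True
    except OverflowError:
        # The exact integer exists, but it is too big for a float.
        return False

print(fits_in_double(100))   # a "small" factorial
print(fits_in_double(400))   # a factorial far beyond the float range

# The quotient of two huge factorials can still be a modest number:
# 400!/398! is just 400 x 399.
quotient = math.factorial(400) // math.factorial(398)
print(quotient)
```

The point of the last lines is that the quotient is easy to handle even though neither factorial fits in a float on its own.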

**Q:** Give two ways to compute 365 x 364 x 363 x ... 355, one of
which works and the other which doesn't.

Set up a table like this:

| A | B | C |
|---|---|---|
| 1 | 365 | =product($b$1:b1)/power(365,A1) |
| 2 | 364 | =product($b$1:b2)/power(365,A2) |
| 3 | 363 | =product($b$1:b3)/power(365,A3) |
| 4 | 362 | =product($b$1:b4)/power(365,A4) |

Take a minute to see how that formula works. The idea is that the
**product** function computes the product of all the cells in a range.
This range is defined to be from B1 (absolute, so it doesn't change when
we copy/paste it) to the current row. Since the first row is row 1, it's
a product from B1 to B1, which is just 365. For the second row, it's the
product from B1 to B2, which is 365*364. And so on.
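The running-product idea can also be sketched in Python. This is a slight variation on the spreadsheet, not a copy of it: instead of dividing one big product by one big power, it multiplies in one fraction at a time, so no intermediate number ever gets large. The function name `no_match_probabilities` is made up for this sketch.

```python
def no_match_probabilities(max_people, days=365):
    """probs[n - 1] is the chance that n people all have different birthdays."""
    probs = []
    running = 1.0
    for n in range(1, max_people + 1):
        # Person n must avoid the n - 1 birthdays already taken.
        running *= (days - (n - 1)) / days
        probs.append(running)
    return probs

probs = no_match_probabilities(50)
print(probs[1])   # two people: the same value as (365 x 364)/365^2
```

You can use the resulting list to check the values in column C row by row.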

Type in by hand some formulas for the birthday probabilities and check these values. Make sure you can trust the numbers you're getting.

Using copy/paste, increase this table so that it's at least 50 rows long.

**Q:** How many people do you need for the probability of no repeats
to be less than 75 percent?

**Q:** How many people do you need for the probability of no repeats
to be less than 50 percent?

**Q:** How many people do you need for the probability of no repeats
to be less than 30 percent?

**Q:** How many people do you need for the probability of no repeats
to be less than 10 percent?

**Q:** With 50 people gathered together, what is the chance that
none of them will share a birthday?

Amazing, isn't it?

Here's a simulation in StarLogo of the Birthday Paradox.

Download it to your desktop, start StarLogo, and open the simulation. Try it.

- To adjust the number of people, use the "numTurtles" slider and click on "Make Turtles"
- Each time you click the "find-birthday" button, each turtle chooses a birthday and if any match, they are marked in red (three way matches in yellow and so forth).
- The "repeat-find-birthday" button just runs the "find-birthday" code a bunch of times (controlled by a slider).

Play around with it. Does it look random to you?
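If you don't have StarLogo handy, the same experiment can be sketched in Python. This is a stand-in written for illustration, not the code of the StarLogo model; the function names, the seed, and the assumption that all 365 birthdays are equally likely are choices made for this sketch.

```python
import random

def has_shared_birthday(num_people, rng, days=365):
    """Give everyone a random birthday and report whether any two match."""
    birthdays = [rng.randrange(days) for _ in range(num_people)]
    return len(set(birthdays)) < num_people

def estimate_match_probability(num_people, trials, seed=0):
    """Fraction of trials in which at least two people share a birthday."""
    rng = random.Random(seed)   # fixed seed so a run can be repeated
    hits = sum(has_shared_birthday(num_people, rng) for _ in range(trials))
    return hits / trials

print(estimate_match_probability(23, 10_000))
```

Try changing the number of people and the number of trials, and compare the estimates with the spreadsheet's values.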

Many of you already know how to play poker. If not, here's a crash course that omits most of the game:

- Each player gets dealt 5 cards. (Some versions are different.)
- Each player bets based on the value of their hand.
- High hand or high bet wins.

So, what we're really interested in is the value of different hands. There are a number of web sites that discuss how poker hands are valued. This is a good web site about poker. Essentially, the more likely a hand is, the less valuable; rare hands are more valuable. Therefore, we want to compute the probability of different poker hands.

The denominator of all of our probability calculations is the number of
poker hands. That number is **combin(52,5)**. Why?

Now, let's count hands:

- Straight Flush: consecutive cards, all of one suit. To count these,
realize that once you choose the suit and the top card of the straight
flush, everything else is determined. There are 4 ways to choose the suit
and 10 ways to choose the top card, and these are independent choices, so
there are 4*10 or 40 ways total.
=4*10

- Four of a kind: four cards of the same *rank*, and one other card. There are 13 possible ranks, and then there are 48 choices for the other card. Therefore, the total is 13*48.
=13*48

- Full house: three cards of one rank and two of another. Okay, this is
hard, so take a deep breath. You have to choose two ranks. There are 13
ways to choose the triple and 12 ways to choose the pair, for a total of
13*12 ways to make those choices. For the triple,
there are combin(4,3) ways to choose the suits that they have. For the
pair, you have combin(4,2) ways to choose the suits they have.
Therefore, the total number of ways is
=13*12*combin(4,3)*combin(4,2)

- Flush: all cards of the same suit. There are four ways to choose the
suit. Once you've chosen the suit, you have combin(13,5) ways to choose
the 5 cards. However, you have to subtract off the 10 straights (since we
don't want to count straight flushes). Therefore, the answer is:
=4*(combin(13,5)-10)

- Straight: There are 10 ways to choose the high card of your straight;
all the other ranks are forced. There are power(4,5) ways to choose the
suits for the 5 cards in your straight. However, four of those are
flushes, so subtract those off. Thus, the total number of ways is:
=10*(power(4,5)-4)

- Three of a kind: one triple and two different cards. There are 13
choices for the rank of the triple, and combin(4,3) ways to choose the
suits for those cards. There are then combin(12,2) ways to choose the
remaining two ranks, with four possibilities for the suit of each. Thus:
=13*combin(4,3)*combin(12,2)*4*4

- Two pair. There are combin(13,2) ways to choose the two ranks, and
combin(4,2) ways to choose the suits for each, and then 44 choices for the
fifth card.
=combin(13,2)*combin(4,2)*combin(4,2)*44

- One pair. There are 13 ways to choose the rank of the pair, and
combin(4,2) ways to choose the suits, and then there are combin(12,3) to
choose the other three ranks and power(4,3) ways to choose the suits for the
remaining cards:
=13*combin(4,2)*combin(12,3)*power(4,3)

- Nothing. There are combin(13,5) ways to choose the five different
ranks, but 10 of those are straights, so subtract them. There are
power(4,5) ways to choose the suits for the five cards, but four of those
are flushes, so subtract those. Multiply these two quantities.
=(combin(13,5)-10)*(power(4,5)-4)

Build a spreadsheet to compute these possibilities. Total them. Compare that with the combin(52,5) possible poker hands. Compute the probabilities of each hand.
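As a cross-check on your spreadsheet, the counting formulas above can be transcribed into Python, with `math.comb` playing the role of combin and `**` the role of power. The dictionary name `hand_counts` is made up for this sketch; the formulas themselves are the ones from the list.

```python
from math import comb

hand_counts = {
    "straight flush": 4 * 10,
    "four of a kind": 13 * 48,
    "full house": 13 * 12 * comb(4, 3) * comb(4, 2),
    "flush": 4 * (comb(13, 5) - 10),
    "straight": 10 * (4**5 - 4),
    "three of a kind": 13 * comb(4, 3) * comb(12, 2) * 4 * 4,
    "two pair": comb(13, 2) * comb(4, 2) * comb(4, 2) * 44,
    "one pair": 13 * comb(4, 2) * comb(12, 3) * 4**3,
    "nothing": (comb(13, 5) - 10) * (4**5 - 4),
}

total_hands = comb(52, 5)

# Every hand falls into exactly one category, so the counts must add up.
assert sum(hand_counts.values()) == total_hands

for name, count in hand_counts.items():
    print(f"{name:16s} {count:9d} {count / total_hands:.6%}")
```

If the assertion passes, the nine categories account for every possible hand exactly once, which is good evidence that none of the formulas double-counts.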

The previous development is difficult and error prone. A somewhat better way is simulation. Look at the following model and see if you can determine how it works. The code in the equation block is a doozy, so take your time and skim.
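To give a feel for what a poker simulation involves, here is a minimal Python sketch. It is an independent illustration, not the course's model: the card encoding (ranks 0 to 12 with 0 for a two and 12 for an ace, suits 0 to 3), the function names, and the seed are all assumptions made here. Like the counting above, it treats an ace as either high or low in a straight, which gives 10 possible straights.

```python
import random
from collections import Counter

def classify(hand):
    """Name the poker hand for a list of five (rank, suit) pairs."""
    ranks = sorted(rank for rank, _ in hand)
    suits = {suit for _, suit in hand}
    # Sizes of the groups of equal ranks, biggest first, e.g. [3, 2].
    shape = sorted(Counter(ranks).values(), reverse=True)
    is_flush = len(suits) == 1
    # Five distinct consecutive ranks, or the ace-low straight A-2-3-4-5.
    is_straight = shape == [1, 1, 1, 1, 1] and (
        ranks[4] - ranks[0] == 4 or ranks == [0, 1, 2, 3, 12]
    )
    if is_straight and is_flush:
        return "straight flush"
    if shape == [4, 1]:
        return "four of a kind"
    if shape == [3, 2]:
        return "full house"
    if is_flush:
        return "flush"
    if is_straight:
        return "straight"
    if shape == [3, 1, 1]:
        return "three of a kind"
    if shape == [2, 2, 1]:
        return "two pair"
    if shape == [2, 1, 1, 1]:
        return "one pair"
    return "nothing"

def simulate(num_hands, seed=0):
    """Deal num_hands random hands; return the observed frequency of each type."""
    rng = random.Random(seed)
    deck = [(rank, suit) for rank in range(13) for suit in range(4)]
    tally = Counter(classify(rng.sample(deck, 5)) for _ in range(num_hands))
    return {name: count / num_hands for name, count in tally.items()}

for name, freq in sorted(simulate(100_000).items(), key=lambda kv: -kv[1]):
    print(f"{name:16s} {freq:.4%}")
```

Notice that the rarest hands may not show up at all in a run of this size, so they can be missing from the output entirely.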

**Q:** What are some disadvantages of using simulation instead of
calculations to determine the probability of various hands?

We'll talk about this together.

**Q:** What's the largest value that Excel can compute the factorial of?

Let's start with a guess. If your first guess is pretty good, you can do better, but let's start with a moderately bad guess, like 10. Here are my guesses and the answers. Try to deduce my strategy:

| Guess | Result |
|---|---|
| 10! | 3,628,800 |
| 20! | 2.43E+18 |
| 40! | 8.16E+47 |
| 80! | 7.16E+118 |
| 160! | 4.71E+284 |
| 320! | #NUM! |
| 240! | #NUM! |
| 200! | #NUM! |
| 180! | #NUM! |
| 170! | 7.26E+306 |
| 175! | #NUM! |
| 173! | #NUM! |
| 171! | #NUM! |

So, I wasn't particularly lucky in my guesses. I overshot by a lot when I went from 160 to 320, and it took me 3 guesses to ascertain that 170 was the biggest possible argument to factorial.

My strategy was clearly to keep doubling until I exceeded the maximum, and then to use **binary search** to narrow down the range. It turns out that this is a general, optimal strategy. But that's a topic for a different computer science course.

**Q:** Give two ways to compute 365 x 364 x 363 x ... 355, one of which works and the other which doesn't.

The way that works is just to compute that product "by hand." The way that doesn't work is the "shortcut" of computing fact(365)/fact(354). *sigh*
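Returning to the factorial question for a moment: the doubling-then-binary-search strategy can be sketched in Python. This is an illustration under stated assumptions, not a description of Excel's internals. It stands in for Excel's limit by asking whether n! fits in a Python float, and the function names `factorial_fits` and `largest_true` are made up for this sketch.

```python
import math

def factorial_fits(n):
    """Stand-in for 'Excel can compute n!': does n! fit in a 64-bit float?"""
    try:
        float(math.factorial(n))
        return True
    except OverflowError:
        return False

def largest_true(predicate, start=10):
    """Find the largest n with predicate(n) true, and list the guesses made.

    Assumes predicate(start) is true and that once the predicate turns
    false it stays false for all larger n.
    """
    if not predicate(start):
        raise ValueError("start must satisfy the predicate")
    guesses = [start]
    low, high = start, 2 * start
    # Phase 1: keep doubling until a guess fails.
    while True:
        guesses.append(high)
        if not predicate(high):
            break
        low, high = high, 2 * high
    # Phase 2: binary search. predicate(low) is true, predicate(high) is false.
    while high - low > 1:
        mid = (low + high) // 2
        guesses.append(mid)
        if predicate(mid):
            low = mid
        else:
            high = mid
    return low, guesses

best, guesses = largest_true(factorial_fits)
print(best, guesses)
```

The binary search here always guesses the exact midpoint, so its sequence of guesses can differ slightly from the hand-picked guesses in the table while following the same strategy.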

**Q:** Why is **combin(52,5)** the number of poker hands?

That's because you're choosing 5 cards from a deck of 52, and you don't care about the order in which you get the cards.

**Q:** What are some disadvantages of using simulation instead of calculations to determine the probability of various hands?

There are basically two disadvantages. The first is that, just as it's hard to be sure your theoretical calculations are right, it can be hard to determine that your program (the equation in the equation block) is correct. That's why it's good to use both methods; they can confirm one another.

The second disadvantage is that if something is very unlikely (such as a straight flush), you may have to generate millions and millions of hands before your probability estimate is reliable. For example, the probability of a straight flush is theoretically 0.00154 percent. That means out of a million hands, you expect to get about 15 straight flushes. If, by chance, we get 16 or 14 or 13, our probability estimate becomes 0.0016 or 0.0014 or 0.0013 percent, which is significantly different from the theoretical probability. In general, estimating the probability of very rare events requires generating zillions of samples, so that the law of large numbers can work for us.
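To put rough numbers on "reliable," here is a small Python sketch. It treats each dealt hand as an independent trial, so the number of straight flushes follows a binomial distribution with standard deviation sqrt(n p (1 - p)). The function name `expected_and_spread` and the sample sizes are choices made for this illustration.

```python
from math import comb, sqrt

# Theoretical probability of a straight flush: 40 hands out of combin(52,5).
p = 40 / comb(52, 5)

def expected_and_spread(num_hands):
    """Expected number of straight flushes and its standard deviation."""
    mean = num_hands * p
    std_dev = sqrt(num_hands * p * (1 - p))
    return mean, std_dev

for n in (1_000_000, 100_000_000):
    mean, sd = expected_and_spread(n)
    print(f"{n:>11,d} hands: expect {mean:.1f} straight flushes, "
          f"typical swing {sd:.1f} ({sd / mean:.1%} of the count)")
```

The relative swing shrinks only like one over the square root of the number of hands, which is why rare events need so many samples.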

This work is licensed under a Creative Commons License