Random Variables and Distributions

Review

We'll review and finish what we talked about last time:

Random Variables

The term "Random Variable" is a little strange. In most math, the word "variable" means something that has a particular value, but that value is either not currently known or is abstract. For example, consider

y=3x+4

This is the equation of a line. What is x? It's a variable, which means it can take on more than one value, but only one at a time. Given a particular value of x, you can calculate one for y.

A Random Variable, by contrast, doesn't stand for one value at a time. It usually stands for a whole distribution of values, which we'll talk about next. We won't use the term much, but you should understand it well enough that it doesn't seem strange.

Here's an example:

Z = X + Y
If X and Y are both random variables, this equation says that the random variable Z is the sum of X and Y. That means the probability of Z having a value like 5, say, depends on the probability that X=2 and Y=3, or X=1 and Y=4, and so forth. For example, if X is the distribution of a fair six-sided die and so is Y, then Z has the distribution of values you get from rolling two dice and adding the values. (7 is most likely, 2 and 12 are least likely, and so on.)
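A short Python sketch of the two-dice example may help. It assumes the two dice are independent (which is what rolling them separately means) and builds the distribution of Z = X + Y by adding P(X=x) * P(Y=y) over every pair (x, y) that gives the same total. The function name sum_distribution is just a label chosen for this illustration, and exact fractions are used so the results match the hand calculation.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Distribution of one fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def sum_distribution(x_dist, y_dist):
    """Distribution of Z = X + Y, assuming X and Y are independent."""
    z_dist = Counter()
    for (x, px), (y, py) in product(x_dist.items(), y_dist.items()):
        # Each way of reaching z = x + y contributes P(X=x) * P(Y=y).
        z_dist[x + y] += px * py
    return dict(sorted(z_dist.items()))

two_dice = sum_distribution(die, die)
for total, prob in two_dice.items():
    print(total, prob)
```

The printout shows 7 with probability 1/6 and both 2 and 12 with probability 1/36.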

Probability Distributions

A probability distribution is a complete set of outcomes and their corresponding probabilities. Dice and coins are typical examples. The following is the probability distribution of the number of heads in three tosses of a fair coin. (In other words, the binomial distribution with p=0.5 and n=3.)

outcome (heads)   probability
0                 1/8
1                 3/8
2                 3/8
3                 1/8

Very often, as in the preceding example, the outcomes are all numbers and so the probability distribution can be defined by a mathematical function. For example, the preceding table can be replaced by:

f(x) = combin(3,x)/8
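As a quick check, here is a minimal Python version of f(x). It uses math.comb where the notes use combin, and exact fractions so the output can be compared directly with the table.

```python
from fractions import Fraction
from math import comb

# Number of heads in three tosses of a fair coin: f(x) = combin(3, x) / 8.
def f(x):
    return Fraction(comb(3, x), 8)

table = {x: f(x) for x in range(4)}
print(table)
```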

Any mathematical function that assigns each outcome a real number between 0 and 1, such that the values sum to 1, can define a probability distribution.
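Those two conditions are easy to check mechanically. The sketch below is a hypothetical helper (is_probability_distribution is not a standard function); it takes a mapping from outcomes to probabilities and allows a small tolerance in the sum, since floating-point values rarely add to exactly 1.

```python
import math

def is_probability_distribution(probs, tol=1e-9):
    """Return True if every value is in [0, 1] and the values sum to 1.

    probs maps each outcome to its probability; tol absorbs
    floating-point rounding in the sum.
    """
    values = list(probs.values())
    in_range = all(0 <= p <= 1 for p in values)
    return in_range and math.isclose(sum(values), 1.0, abs_tol=tol)

print(is_probability_distribution({0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}))  # True
print(is_probability_distribution({"heads": 0.6, "tails": 0.6}))             # False
```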

Kinds of Distributions

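Distributions are often divided into two broad categories: discrete distributions, whose outcomes can be listed one by one (such as counts), and continuous distributions, whose outcomes can fall anywhere in a range (such as heights or times). The binomial and the Poisson, below, are both discrete.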

The Binomial

(See section 6.1).

We've seen the binomial before, but now we have a few more mathematical tools at our disposal.

The binomial is appropriate for modeling random situations where:

  1. There is a fixed number of trials or attempts, and each one either succeeds or fails.
  2. The probability of success is the same for each trial.
  3. The trials are all independent.

The probability of k successes in n trials with a probability of p of success each time is:

f(k) = combin(n,k)*power(p,k)*power(1-p,n-k)
It may not be obvious that these probabilities sum to one over k = 0, 1, ..., n, but they do. The proof (it follows from the binomial theorem, since (p + (1-p))^n = 1) isn't hard, but it isn't central to this course.
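If you'd rather see the sum-to-one claim numerically than prove it, here is a minimal Python sketch of f(k). The values n = 30 and p = 0.05 are only illustrative (they match the exam example that follows); any n and p should give a total of 1, up to floating-point rounding.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p (the same formula as f(k))."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# The probabilities for k = 0, 1, ..., n should add up to 1.
n, p = 30, 0.05
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(total)
```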

Let's take some time to understand this. Here is an example:

If the probability that a student forgets to put her name on an exam is 5 percent and there are 30 students in the class, what is the probability that the professor gets exactly 0, exactly 1, or 2 or more anonymous exams?
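Here is one way to work the exam example in Python. It assumes, as the binomial requires, that students forget independently of one another, each with the same 5 percent chance. "2 or more" is computed with the complement rule: it is everything except 0 and 1.

```python
from math import comb

n = 30     # students in the class
p = 0.05   # chance that a given student forgets her name

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p0 = binomial_pmf(0, n, p)
p1 = binomial_pmf(1, n, p)
p2_or_more = 1 - p0 - p1   # complement rule: everything except 0 and 1

print(f"P(0)  = {p0:.4f}")
print(f"P(1)  = {p1:.4f}")
print(f"P(2+) = {p2_or_more:.4f}")
```

The three answers come out to roughly 0.215, 0.339, and 0.446, so getting at least two anonymous exams is the most likely of the three cases.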

For convenience, Excel provides this probability as the binomdist(s,trials,prob,cumulative) function. It gives the probability of exactly s successes from a binomial process with "trials" trials, each with a "prob" chance of success. Use "false" for the value of "cumulative" unless you want the probability of s or fewer successes. In other words, the following formula is equivalent to the f(k) formula above:

=binomdist(k,n,p,false)
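To make the role of the "cumulative" argument concrete, here is a hypothetical Python stand-in that mirrors the argument order of Excel's binomdist. It is only an illustration, not part of Excel or any Python library. With cumulative set to True it adds up the exact probabilities for 0 through s, which is what lets you answer "2 or more" as 1 minus "1 or fewer".

```python
from math import comb

def binomdist(s, trials, prob, cumulative):
    """Hypothetical Python mirror of Excel's binomdist.

    cumulative=False: probability of exactly s successes.
    cumulative=True:  probability of s or fewer successes.
    """
    def pmf(k):
        return comb(trials, k) * prob**k * (1 - prob) ** (trials - k)

    if cumulative:
        return sum(pmf(k) for k in range(s + 1))
    return pmf(s)

# P(2 or more anonymous exams) = 1 - P(1 or fewer)
print(1 - binomdist(1, 30, 0.05, True))
```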

Let's also take some time to chart the binomial distribution in Excel. These charts will look just like the ones we made earlier from samples, which tends to blur the distinction between a population (what we're sampling from) and a sample (the actual values we get in an experiment). So let's revisit that distinction.
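One way to see the population/sample distinction is to simulate. The sketch below treats Binomial(30, 0.05) from the exam example as the population and draws a sample of simulated classes; the number of classes (1000) and the random seed are arbitrary choices for illustration. The sample frequencies should resemble the population probabilities but won't match them exactly, and they change if you change the seed.

```python
import random
from collections import Counter
from math import comb

n, p = 30, 0.05        # population: Binomial(30, 0.05), as in the exam example
num_classes = 1000     # sample size: an arbitrary number of simulated classes

random.seed(1)  # fixed seed so the sample is reproducible

def simulate_class():
    """Count the anonymous exams in one simulated class of n students."""
    return sum(random.random() < p for _ in range(n))

sample = Counter(simulate_class() for _ in range(num_classes))

print(" k  sample  population")
for k in range(6):
    sample_freq = sample[k] / num_classes
    population_prob = comb(n, k) * p**k * (1 - p) ** (n - k)
    print(f"{k:2d}  {sample_freq:6.3f}  {population_prob:10.3f}")
```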

The Poisson Distribution

(See section 6.4).

We also saw the Poisson before, but we can now examine it more closely.

Freund introduces the Poisson as a convenient way of approximating the binomial under certain conditions. Mathematically, that's true, but there are other ways to derive the Poisson distribution, so it is a distribution in its own right. Still, you can use it as an approximation of the binomial when:

  1. n is large (say, greater than 100)
  2. p is small
  3. np is small (say, less than 10)

In that case, use np as the rate (mean) of the Poisson process.

The Poisson probability function is defined for all non-negative integers x as:

pmf(x) = exp(-λ)*power(λ,x)/fact(x)
where λ is the rate of the Poisson process.
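Here is the same formula as a small Python function, with a numerical check that the probabilities add up to 1. Since x can be any non-negative integer, the sum is cut off at x = 50; for λ = 2.3 (an illustrative value) the probability beyond that is negligible.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x events when events occur at rate lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 2.3
# Sum x = 0..50; the tail beyond 50 is negligible for lam = 2.3,
# so the total should be (very nearly) 1.
total = sum(poisson_pmf(x, lam) for x in range(51))
print(total)
```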

For convenience, Excel provides this as the Poisson(x,mean,cumulative) function. It gives the probability of exactly x events from a Poisson process with the given mean. Use "false" for the value of "cumulative" unless you want the probability of x or fewer events. In other words, the following formula is equivalent to the preceding one:

=poisson(x,lambda,false)

Just for fun, let's use Excel to calculate the probability of each integer from 0 to 100 for the Poisson (mean=10) and the binomial (n=100, p=0.1, so np=10). Let's find the maximum difference.
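The same comparison can be sketched in Python instead of Excel. For each k from 0 to 100 it computes the binomial (n = 100, p = 0.1) and Poisson (mean = np = 10) probabilities and reports the largest absolute difference and where it occurs.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

n, p = 100, 0.1
lam = n * p  # use np as the Poisson mean

diffs = [abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(n + 1)]
max_diff = max(diffs)
worst_k = diffs.index(max_diff)
print(f"max difference {max_diff:.4f} at k = {worst_k}")
```

The largest gap should come out well under 0.01 and occur near the mean, which is why the approximation is considered good here.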

Let's also plot both for, say, mean=2.3 (n=10, p=0.23). Notice that this doesn't satisfy our rule of thumb (n is only 10), so we should see some differences.
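If Excel isn't handy, a crude text chart shows the same comparison. This sketch prints one '#' per percentage point for each distribution at k = 0 to 10 and then the largest difference. It only shows k up to n = 10, although the Poisson also assigns a small probability to larger values.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

n, p = 10, 0.23
lam = n * p  # 2.3, up to floating-point rounding

# Crude text "chart": one '#' per percentage point (rounded).
rows = []
for k in range(n + 1):
    b, q = binomial_pmf(k, n, p), poisson_pmf(k, lam)
    rows.append((k, b, q))
    print(f"{k:2d} binom {'#' * round(100 * b):<30} poisson {'#' * round(100 * q)}")

max_diff = max(abs(b - q) for _, b, q in rows)
print(f"max difference: {max_diff:.4f}")
```

The maximum difference here should be noticeably larger than in the n = 100 case, since n is small and p is not very small.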

As we discussed before, the Poisson is used when you know the average rate of something, but the actual occurrences come in whole numbers (counts). For example:

Suppose that records show that the rape crisis line gets 2.3 calls per night. What is the probability of no calls in a night? More than 5 calls?
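Here is one way to answer both questions in Python, assuming the nightly calls follow a Poisson process with mean 2.3. "More than 5" is the complement of "5 or fewer", just as you would compute it with the cumulative form of the Excel function, =1-poisson(5,2.3,true).

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

lam = 2.3  # average calls per night, from the records

p_none = poisson_pmf(0, lam)  # equals exp(-2.3)
# "More than 5" is the complement of "5 or fewer".
p_more_than_5 = 1 - sum(poisson_pmf(x, lam) for x in range(6))

print(f"P(no calls)     = {p_none:.4f}")
print(f"P(more than 5)  = {p_more_than_5:.4f}")
```

The results are roughly 0.10 for a night with no calls and 0.03 for a night with more than 5.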

This work is licensed under a Creative Commons License.