Today, we'll talk about how random numbers are generated. We'll start with the properties we want, then look at how to generate uniform random numbers, then move on to more exotic kinds.
"Anyone who considers arithmetical methods of random digits is, of course, in a state of sin. " -- John von Neumann (1951)
Almost all random number generation is based on an underlying Uniform(0,1) generator. That means a generator of random fractions between 0 and 1. What do we want out of such an RNG?
Let's build a quick model to look for patterns in the uniform RNG. You can click on the "Use block seed" checkbox to specify that a particular seed should be used.
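The quick model above fixes its seed with a checkbox in the modeling tool. As a rough stand-in outside that tool, here is a minimal Python sketch. It assumes Python's built-in `random.Random` (a Mersenne Twister) in place of the tool's generator, and the function name `uniform_pattern_check` is hypothetical. It draws values with a fixed seed, counts how many land in each tenth of [0, 1), and computes the lag-1 correlation between consecutive draws. These are two quick ways to spot a pattern.

```python
import random

def uniform_pattern_check(n, seed, bins=10):
    """Draw n Uniform(0,1) values with a fixed seed and run two simple pattern checks."""
    rng = random.Random(seed)            # fixed seed -> the same stream every run
    xs = [rng.random() for _ in range(n)]

    # Check 1: how many draws fall in each of `bins` equal-width bins.
    counts = [0] * bins
    for x in xs:
        counts[min(int(x * bins), bins - 1)] += 1   # clamp guards against rounding at the top edge

    # Check 2: lag-1 sample correlation between consecutive draws (near 0 if no pattern).
    mean = sum(xs) / n
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((a - mean) ** 2 for a in xs)
    return xs, counts, num / den
```

With a good generator the bin counts should all be close to n / bins, and the correlation should be close to zero. Rerunning with the same seed reproduces the exact same draws.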
Now, let's look at some ways to generate random numbers. Many are based on feedback equations, where each number generates the next, and often the feedback equation is linear.
To keep them in the right range, we often use modular arithmetic on integers, and then divide by the modulus to get the desired fraction. Here's an idea:
Next = (Curr * 5) mod 7
Starting at, say, 1, we get:
1, 5, 4, 6, 2, 3, 1 ...
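The same sequence can be reproduced with a tiny Python sketch. The helper name `multiplicative_sequence` is hypothetical, written only for this illustration. It returns the first n values starting from the seed, and the last line divides by the modulus to turn them into fractions.

```python
def multiplicative_sequence(seed, a, m, n):
    """First n values of Next = (Curr * a) mod m, starting from (and including) seed."""
    values = [seed % m]
    while len(values) < n:
        values.append((values[-1] * a) % m)
    return values

ints = multiplicative_sequence(1, 5, 7, 7)   # [1, 5, 4, 6, 2, 3, 1]
fractions = [v / 7 for v in ints]            # divide by the modulus to land in [0, 1)
```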
So, that gives us a period of 6, visiting every nonzero value before it repeats, and it bounces around okay. Just imagine doing this with larger values. In general, the form is:
Next = (Curr * a) mod m
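Number theory gives the precise condition. When "m" is prime, every nonzero seed gives the maximal period of m - 1 exactly when "a" is a primitive root modulo m. It is not enough for "a" and "m" to be relatively prime: with a = 2 and m = 7 we get 1, 2, 4, 1 ..., a period of only 3.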
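But you can get unlucky: some choices of "a" and "m" produce sequences with obvious patterns. For example:
Next = (Curr * 2) mod 11
which, with a seed of 1, yields 2, 4, 8, 5, 10, 9, 7, 3, 6, 1, 2, 4, 8 ... This one does reach the full period of 10, but it hardly looks random: whenever the current value is 5 or less, the next value is exactly twice it, so a small number is always followed by its double.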
Also, consider what happens with a value like zero!
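To check periods like these without doing the arithmetic by hand, here is a small Python sketch. The helper `period` is hypothetical, written only for this illustration. It runs Next = (Curr * a + c) mod m from a seed until some state repeats, then reports the length of the cycle. With c = 0 it is the multiplicative generator used so far.

```python
def period(a, m, seed=1, c=0):
    """Cycle length reached by Next = (Curr * a + c) mod m, starting from seed."""
    first_seen = {}            # state -> step at which it first appeared
    state, step = seed % m, 0
    while state not in first_seen:
        first_seen[state] = step
        state = (state * a + c) % m
        step += 1
    # The cycle runs from the first occurrence of the repeated state up to now.
    return step - first_seen[state]
```

For example, a = 5, m = 7 gives 6, and a = 2, m = 11 gives 10. By contrast, a = 2, m = 7 gives only 3, and a seed of 0 gives a cycle of length 1 that stays stuck at zero.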
Let's cut to the chase. Many practical generators are "linear congruential generators" (LCGs):
Next = (Curr * a + c) mod m
There are tons of books and tables with good choices for a, c, and m. A well-respected one is Numerical Recipes in C. That book discusses the consequences of good and bad uses of linear congruential generators, including how bad choices can invalidate scientific results, which has actually happened.
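Here is a minimal Python sketch of a linear congruential generator. The default constants a = 1664525, c = 1013904223, m = 2^32 are the ones commonly quoted from Numerical Recipes' "quick and dirty" generator. Treat them as an illustrative assumption, not a recommendation. The function names `lcg` and `lcg_uniform` are hypothetical.

```python
from itertools import islice

def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield successive states of Next = (Curr * a + c) mod m."""
    state = seed % m
    while True:
        state = (state * a + c) % m
        yield state

def lcg_uniform(seed, n, a=1664525, c=1013904223, m=2**32):
    """Return n fractions in [0, 1) by dividing each state by the modulus m."""
    return [state / m for state in islice(lcg(seed, a, c, m), n)]
```

Small parameters make the behaviour easy to inspect. For instance, a = 5, c = 3, m = 16 starting from 0 visits all 16 states before repeating.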
Uniform RNGs are a building block for other RNGs, using several methods.
This is a great trick: the inverse transform method. If you have a CDF that is invertible, you can use it to generate random numbers. Let's use the standard exponential as an example.
cdf(k) = 1-exp(-k)
Remember that what this means is that:
P(x < k) = cdf(k) = 1-exp(-k)
We find the inverse of the cdf by solving for k as a function of p, the probability:
p = 1-exp(-k)
exp(-k) = 1-p
-k = ln(1-p)
k = -ln(1-p)
Define:
cdf^-1(p) = -ln(1-p)
So, if I give it a p, it tells me the k that has that probability. You can think of this as the percentile function!
So what? So, if I generate a number, y, from Uniform(0,1), and do the following:
x = -ln(1-y)
then x will be distributed like the standard exponential!
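Here is a minimal Python sketch of this recipe. It assumes Python's `random.Random` as the Uniform(0,1) source; its `random()` method returns values in [0, 1), so 1 - y is never zero and the logarithm is always defined. The function names are hypothetical.

```python
import math
import random

def exponential_inverse(y):
    """Map a Uniform(0,1) value y to a standard exponential via x = -ln(1 - y)."""
    return -math.log(1.0 - y)

def sample_exponential(n, seed=None):
    """Generate n standard exponential values by the inverse transform."""
    rng = random.Random(seed)
    return [exponential_inverse(rng.random()) for _ in range(n)]
```

The sample mean of many draws should be close to 1, the mean of the standard exponential.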
Let's take an example:
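Here are two uniform values plugged into x = -ln(1-y). The values were chosen here for illustration:

```latex
\begin{aligned}
y = 0.5 &\;\Rightarrow\; x = -\ln(1 - 0.5) = \ln 2 \approx 0.693 \\
y = 0.9 &\;\Rightarrow\; x = -\ln(1 - 0.9) = -\ln 0.1 \approx 2.303
\end{aligned}
```

Since cdf(ln 2) = 1 - exp(-ln 2) = 1/2, ln 2 is the median: half of the generated values fall below it.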
Here's a sketch of this:
Here's the "proof" that this method works:
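For the exponential case, the argument can be written as a chain of equalities for k ≥ 0. It uses only the fact that y is Uniform(0,1), so P(y < u) = u for u in [0, 1]. This is a reconstruction of the standard argument:

```latex
\begin{aligned}
P(x < k) &= P\bigl(-\ln(1-y) < k\bigr) \\
         &= P\bigl(1-y > e^{-k}\bigr) \\
         &= P\bigl(y < 1-e^{-k}\bigr) \\
         &= 1-e^{-k} = \mathrm{cdf}(k)
\end{aligned}
```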
Because these are equal, the probability of generating any range of numbers is correct.
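The inverse transform can be used on lots of distributions. Even when the cdf isn't as easily invertible as the exponential's, we can often do a quick search. For a discrete distribution like the binomial, just add up the probabilities in a little table, column by column, and stop at the first value whose running total exceeds y.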
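Here is a minimal Python sketch of that table search. It assumes the usual binomial pmf C(n, k) p^k (1-p)^(n-k), computed with `math.comb` (Python 3.8+), and the helper names are hypothetical. It keeps a running total over k = 0, 1, ..., n and returns the first k whose total exceeds the uniform value u.

```python
import math
import random

def binomial_inverse(u, n, p):
    """Smallest k with cdf(k) > u, found by summing the binomial pmf table."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += math.comb(n, k) * p**k * (1 - p) ** (n - k)
        if u < cumulative:
            return k
    # Floating-point rounding can leave the total a hair below 1; fall back to n.
    return n

def sample_binomial(count, n, p, seed=None):
    """Generate `count` Binomial(n, p) values by inverse transform."""
    rng = random.Random(seed)
    return [binomial_inverse(rng.random(), n, p) for _ in range(count)]
```

For n = 2, p = 0.5 the running totals are 0.25, 0.75, 1.0, so u = 0.1, 0.5, 0.9 map to 0, 1, 2. For large n you would precompute the cumulative table once instead of rebuilding it for every draw.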
Sometimes we generate candidate numbers that are "close" to what we want and reject the ones that aren't quite right. This is the rejection method.
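Suppose we want to generate numbers that have a "cosine" distribution. We can generate (x,y) points uniformly in the unit square, reject those that fall outside the unit circle, and then just use the x coordinate. The accepted x values have density proportional to sqrt(1 - x^2) on [0, 1], a quarter-circle hump shaped much like a cosine.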
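Here is a minimal Python sketch of this rejection scheme. It assumes `random.Random` for the uniform points, and the function name is hypothetical. Each candidate point costs two uniforms, and about π/4 ≈ 79% of candidates are accepted.

```python
import random

def sample_quarter_circle(n, seed=None):
    """Rejection sampling: keep x from uniform points in the unit square that lie inside the unit circle."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x, y = rng.random(), rng.random()   # uniform point in the unit square
        if x * x + y * y <= 1.0:            # accept only points inside the unit circle
            samples.append(x)
    return samples
```

The mean of the accepted x values should be near 4 / (3π) ≈ 0.424.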
Sketch this
A combination of the rejection and transformation methods is used to produce Poisson and Gaussian numbers, for example.
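One well-known Gaussian example of this combination is the Marsaglia polar method. It is named here only as an illustration and is not necessarily the method the lecture has in mind. It rejects uniform points outside the unit disk, then transforms each accepted point into two independent standard normals. Below is a minimal Python sketch; the function names are hypothetical.

```python
import math
import random

def polar_normal_pair(rng):
    """Marsaglia polar method: two independent standard normals per accepted point."""
    while True:
        u = 2.0 * rng.random() - 1.0    # uniform on [-1, 1)
        v = 2.0 * rng.random() - 1.0
        s = u * u + v * v
        if 0.0 < s < 1.0:               # rejection step: keep points inside the unit disk
            factor = math.sqrt(-2.0 * math.log(s) / s)   # transformation step
            return u * factor, v * factor

def sample_normal(n, seed=None):
    """Generate n standard normal values, two per accepted point."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        out.extend(polar_normal_pair(rng))
    return out[:n]                      # drop the spare value when n is odd
```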