Written by Scott D. Anderson
scott.anderson@acm.org

This work is licensed under a Creative Commons License.
Review
- More on "hot hands" and sports streaks:
- Red Sox probability problem: what's the probability that they will
win the series, given that they lead it 2-0 and assuming that the teams are
equally matched?
- Standard error of the mean:
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$
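One way to check an answer to the Red Sox problem is a short Python sketch. It assumes a best-of-seven series (first to four wins) and independent games that each team wins with probability 1/2; the function name `series_win_prob` is hypothetical. It recurses on how many wins each side still needs and uses exact fractions.

```python
from fractions import Fraction
from functools import lru_cache


def series_win_prob(wins_needed, opp_wins_needed, p=Fraction(1, 2)):
    """Probability of winning the series when we still need `wins_needed`
    game wins and the opponent needs `opp_wins_needed`, assuming each game
    is won independently with probability p."""

    @lru_cache(maxsize=None)
    def prob(a, b):
        if a == 0:          # we have clinched the series
            return Fraction(1)
        if b == 0:          # the opponent has clinched it
            return Fraction(0)
        # Win the next game with probability p, lose it with probability 1 - p.
        return p * prob(a - 1, b) + (1 - p) * prob(a, b - 1)

    return prob(wins_needed, opp_wins_needed)


# Assumed best of seven: up 2-0, the Red Sox need 2 more wins, the opponent needs 4.
print(series_win_prob(2, 4))  # 13/16
```

If the series were best-of-five instead, `series_win_prob(1, 3)` gives 7/8.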
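The standard-error formula can be checked by simulation. This sketch (the helper name and trial counts are illustrative choices) draws many samples of size n from a uniform distribution on [0, 1], whose σ is 1/√12, and compares the standard deviation of the sample means with σ/√n.

```python
import math
import random
import statistics


def simulated_standard_error(n, n_trials=10_000, seed=0):
    """Standard deviation of n_trials sample means, each computed from
    n independent Uniform(0, 1) draws."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.random() for _ in range(n))
             for _ in range(n_trials)]
    return statistics.stdev(means)


sigma = math.sqrt(1 / 12)  # standard deviation of Uniform(0, 1)
for n in (1, 4, 16, 64):
    # Simulated standard error next to the formula's prediction.
    print(n, round(simulated_standard_error(n), 4), round(sigma / math.sqrt(n), 4))
```

Quadrupling the sample size should roughly halve the standard error.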
The Central Limit Theorem
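Amazingly enough:
The distribution of a sum (and therefore of a mean) of many independent, identically distributed values becomes increasingly Gaussian as the number of values grows, whatever the shape of the original distribution, provided its variance is finite.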
We'll explore the following model:
CLT2
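Independently of the CLT2 model, here is a small Python sketch of the same idea. It sums n draws from an exponential distribution, which is strongly skewed, and measures the skewness of the sums (0 for a Gaussian). Theory predicts a skewness of 2/√n, so the sums should look steadily more symmetric as n grows; the helper names and trial counts are illustrative.

```python
import random
import statistics


def sample_sums(n_terms, n_trials, rng):
    """Return n_trials sums, each of n_terms independent Exponential(1) draws."""
    return [sum(rng.expovariate(1.0) for _ in range(n_terms))
            for _ in range(n_trials)]


def skewness(xs):
    """Sample skewness: near 0 for a symmetric distribution such as the Gaussian."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


rng = random.Random(1)
for n in (1, 5, 30, 100):
    sums = sample_sums(n, 10_000, rng)
    # Theory: the skewness of a sum of n Exponential(1) draws is 2 / sqrt(n).
    print(n, round(skewness(sums), 2), round(2 / n ** 0.5, 2))
```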
Limitations of the CLT:
- The samples must be independent
- The factors must be additive in effect. That is, if some
measurement is the result of many independent factors, we might want to
invoke the CLT and assume it is Gaussian, but not if the factors interact,
for example by multiplying. Example: the area of a rectangle is the product,
not the sum, of its width and height.
- The CLT says that the distribution approaches the Gaussian, but it
doesn't say how fast. In practice, convergence is often amazingly fast, but
it is slower for strongly skewed distributions, so if you really need high
accuracy, you shouldn't rely on the CLT alone.
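To see the additivity limitation in action, this sketch combines ten independent factors, each uniform on [0.5, 1.5] (an arbitrary choice for illustration), once by adding and once by multiplying. The sums come out nearly symmetric, while the products are strongly right-skewed (roughly lognormal, since the log of a product is a sum of logs).

```python
import math
import random
import statistics


def skewness(xs):
    """Sample skewness: near 0 for a symmetric distribution such as the Gaussian."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


def combine(n_factors, n_trials, op, rng):
    """Combine n_factors Uniform(0.5, 1.5) factors with `op`, n_trials times."""
    results = []
    for _ in range(n_trials):
        factors = [rng.uniform(0.5, 1.5) for _ in range(n_factors)]
        results.append(op(factors))
    return results


rng = random.Random(0)
sums = combine(10, 10_000, sum, rng)             # additive factors
products = combine(10, 10_000, math.prod, rng)   # multiplicative factors
print("skewness of sums:    ", round(skewness(sums), 2))
print("skewness of products:", round(skewness(products), 2))
```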
Here are some quantities whose distributions aren't Gaussian; they are heavy-tailed, with occasional values far larger than typical ones:
- The number of friends or social contacts people have
- The number of links on a web page
- The energy of earthquakes
- The size of avalanches
Confidence Intervals
- What they mean
- Confidence interval of the mean
- Magic numbers (multipliers taken from the standard Gaussian):
- ±1.645 standard errors around the mean yields a 90% confidence interval
- ±1.960 standard errors around the mean yields a 95% confidence interval
- ±2.576 standard errors around the mean yields a 99% confidence interval
- Example: here are nine measurements:
55, 57, 59, 58, 56, 56, 57, 61, 58
The summary stats are as follows:
mean: 57.44
s.d.: 1.81 (sample standard deviation, dividing by n-1)
What is a 95% confidence interval for the mean? What about other confidence levels?
- Demonstration:
CLT3.mox
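- Bootstrap confidence intervals:
bootstrap-median.xls
uses the Excel index(range,row,column) and randbetween(min,max) functions,
which together can resample the data with replacement: randbetween picks a
random row number and index returns the value in that row.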
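For the nine-measurement example, here is a minimal Python sketch (helper names are illustrative) that recovers the magic numbers from the standard Gaussian with `statistics.NormalDist` and builds mean ± z·s/√n intervals. It uses the Gaussian multipliers from the list; with only nine observations, a t-distribution multiplier would give a somewhat wider interval.

```python
import math
import statistics

data = [55, 57, 59, 58, 56, 56, 57, 61, 58]
mean = statistics.fmean(data)        # 57.44...
s = statistics.stdev(data)           # sample s.d. (n - 1 denominator): 1.81...
se = s / math.sqrt(len(data))        # standard error of the mean, s / sqrt(n)


def z_multiplier(confidence):
    """Standard errors needed for a two-sided interval at this confidence level."""
    return statistics.NormalDist().inv_cdf((1 + confidence) / 2)


def z_interval(mean, se, confidence):
    """Gaussian (z) confidence interval for the mean."""
    z = z_multiplier(confidence)
    return mean - z * se, mean + z * se


for c in (0.90, 0.95, 0.99):
    lo, hi = z_interval(mean, se, c)
    print(f"{c:.0%}: z = {z_multiplier(c):.3f}, interval = ({lo:.2f}, {hi:.2f})")
# The 95% line reads: 95%: z = 1.960, interval = (56.26, 58.63)
```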
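The spreadsheet's resampling idea can also be sketched in Python. This is a percentile bootstrap for the median, which may not match exactly what bootstrap-median.xls computes; it assumes 10,000 resamples and, purely for illustration, reuses the nine measurements from the example above. Here `rng.randint(0, n - 1)` plays the role of randbetween picking a row, and list indexing plays the role of index.

```python
import random
import statistics


def bootstrap_median_ci(data, confidence=0.95, n_resamples=10_000, seed=0):
    """Percentile bootstrap confidence interval for the median (illustrative)."""
    rng = random.Random(seed)
    n = len(data)
    medians = []
    for _ in range(n_resamples):
        # Resample with replacement: pick n random positions (like randbetween)
        # and look up the value at each one (like index).
        resample = [data[rng.randint(0, n - 1)] for _ in range(n)]
        medians.append(statistics.median(resample))
    medians.sort()
    alpha = (1 - confidence) / 2
    # Nearest-rank percentiles of the bootstrap distribution of the median.
    lo = medians[round(alpha * (n_resamples - 1))]
    hi = medians[round((1 - alpha) * (n_resamples - 1))]
    return lo, hi


data = [55, 57, 59, 58, 56, 56, 57, 61, 58]
print(bootstrap_median_ci(data))
```

Unlike the z interval, this makes no Gaussian assumption about the sampling distribution of the median; it only assumes the sample is representative of the population.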