Written by Scott D. Anderson
scott.anderson@acm.org
This work is licensed under a Creative Commons License.

Review

The Central Limit Theorem

Amazingly enough:

The distribution of sums (and therefore of means) becomes increasingly Gaussian as more terms are added

We'll explore the following model:

CLT2
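As a rough stand-in for exploring this by hand (this is not the CLT2 model itself), here is a minimal Python sketch. It assumes the terms are independent Uniform(0, 1) draws, and the function names are made up for illustration. It checks one Gaussian signature: the fraction of values within one standard deviation of the mean, which is about 0.577 for a single uniform draw and about 0.683 for a Gaussian.

```python
import random
import statistics


def sample_sums(n_terms, n_samples=20_000, seed=0):
    """Return n_samples sums, each of n_terms independent Uniform(0, 1) draws.

    The uniform distribution is an assumption chosen for illustration.
    """
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n_terms)) for _ in range(n_samples)]


def fraction_within_one_sd(values):
    """Fraction of values lying within one standard deviation of their mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - mean) <= sd for v in values) / len(values)


if __name__ == "__main__":
    # As n_terms grows, the fraction should drift from about 0.577 toward 0.683.
    for n_terms in (1, 2, 5, 30):
        frac = fraction_within_one_sd(sample_sums(n_terms))
        print(f"{n_terms:2d} terms: {frac:.3f} within one s.d.")
```

Even a handful of terms should land close to the Gaussian figure, which is what "increasingly Gaussian" looks like in practice.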

Limitations of the CLT:

  • The samples must be independent
  • The factors must be additive in effect. If a measurement is the result of many independent factors, we might want to invoke the CLT and assume it is Gaussian, but not if the factors interact, say synergistically. Example: the area of a rectangle is the product of its width and height, not an additive result of them.
  • The CLT says that the distribution approaches the Gaussian, but it doesn't say how fast. In practice, it's often amazingly fast, but if you really need high accuracy (especially far out in the tails), you wouldn't want to rely on the CLT alone.
  • Here are some distributions that aren't Gaussian:

    • The number of friends or social contacts people have
    • The number of links on a web page
    • The energy of earthquakes
    • The size of avalanches
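The additivity limitation above can be seen in a small simulation. The sketch below is only an illustration under stated assumptions: the factors are independent Uniform(0.5, 1.5) draws, and `combine` and `skewness` are hypothetical helper names. It combines the same kind of factors once by adding them and once by multiplying them (as with a rectangle's area), then compares skewness, which is 0 for a Gaussian.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: mean cubed z-score (0 for a symmetric distribution)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


def combine(n_factors, op, n_samples=20_000, seed=1):
    """Combine n_factors independent Uniform(0.5, 1.5) draws with op, n_samples times.

    The factor distribution is an assumption chosen for illustration.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        factors = [rng.uniform(0.5, 1.5) for _ in range(n_factors)]
        results.append(op(factors))
    return results


if __name__ == "__main__":
    sums = combine(20, sum)            # additive effect: the CLT applies
    products = combine(20, math.prod)  # multiplicative effect: it does not
    print("skewness of sums:    ", round(skewness(sums), 2))
    print("skewness of products:", round(skewness(products), 2))
```

The sums should come out nearly symmetric, while the products should show a long right tail, so treating them as Gaussian would be a mistake.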

Confidence Intervals

    • What they mean
    • Confidence interval of the mean
    • Magic numbers (standard deviations on either side of the mean, for a Gaussian):
      • 1.645 standard deviations yields a 90% confidence interval
      • 1.960 standard deviations yields a 95% confidence interval
      • 2.576 standard deviations yields a 99% confidence interval
    • Example:
      55, 57, 59, 58, 56, 56, 57, 61, 58
      The summary stats are as follows:
      mean: 57.44
      s.d.: 1.81
      What is a 95% confidence interval for the mean? What about any other confidence level?
    • Demonstration:
      CLT3.mox
    • Bootstrap confidence intervals:
      bootstrap-median.xls uses the Excel index(range,row,column) and randbetween(min,max) functions.
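The magic numbers, the worked example, and the bootstrap idea above can be sketched in Python. This is a hedged illustration, not a transcription of the spreadsheet. It assumes the interval for the mean is mean ± z · s/√n using the Gaussian magic numbers (for only 9 points, a Student-t multiplier would give a somewhat wider interval), and it assumes a simple percentile bootstrap for the median. All function names are made up here. `rng.choices` plays the role that index and randbetween presumably play in the spreadsheet: picking random elements with replacement.

```python
import math
import random
import statistics
from statistics import NormalDist

data = [55, 57, 59, 58, 56, 56, 57, 61, 58]


def z_value(confidence):
    """Two-sided critical value of the standard normal (0.95 gives about 1.960)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation interval: mean +/- z * s / sqrt(n)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample s.d. (n - 1 denominator)
    half_width = z_value(confidence) * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def bootstrap_median_interval(values, confidence=0.95, n_resamples=10_000, seed=0):
    """Percentile bootstrap: resample with replacement and collect the medians."""
    rng = random.Random(seed)
    n = len(values)
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    tail = (1 - confidence) / 2
    lower = medians[int(tail * n_resamples)]
    upper = medians[int((1 - tail) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    for level in (0.90, 0.95, 0.99):
        print(f"{level:.0%}: z = {z_value(level):.3f}")
    low, high = mean_confidence_interval(data)
    print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
    print("95% bootstrap CI for the median:", bootstrap_median_interval(data))
```

Passing a different `confidence` value answers the "any other confidence level" question without looking up a new magic number.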