Towards the end of last year in Uncertainty Wednesday, I wrote a post about suppressed volatility and gave an example. I ended the write-up with:
if we simply estimate the volatility of a process from the observed sample variance, we may be wildly underestimating potential future variance
This turns out to be true not just for cases of “suppressed volatility” but much more broadly. For any fat-tailed distribution, the sample variance will typically underestimate the true variance, because most samples do not contain the rare extreme values that account for much of it. Mistaking the sample variance for the actual variance is the same error as mistaking the sample mean for the actual mean. The sample mean has a distribution, and so does the sample variance. Whether or not they are unbiased estimators of the true values depends on the characteristics of the process.
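To make this concrete, here is a minimal simulation sketch. It assumes a Pareto distribution with shape 3 as a stand-in for a fat-tailed process; the shape, the sample size of 50 and the number of trials are illustrative choices, not values from the earlier post. It draws many samples, computes the sample variance of each, and compares them with the known true variance.

```python
import random
import statistics

ALPHA = 3.0        # hypothetical tail index; the variance is finite only for ALPHA > 2
SAMPLE_SIZE = 50   # observations per sample (illustrative)
TRIALS = 2000      # number of independent samples (illustrative)

# True variance of a Pareto distribution with minimum 1 and shape ALPHA
TRUE_VARIANCE = ALPHA / ((ALPHA - 1) ** 2 * (ALPHA - 2))  # 0.75 for ALPHA = 3

rng = random.Random(0)  # fixed seed so the run is repeatable


def sample_variance_once():
    """Draw one sample and return its (n - 1 denominator) sample variance."""
    sample = [rng.paretovariate(ALPHA) for _ in range(SAMPLE_SIZE)]
    return statistics.variance(sample)


sample_variances = [sample_variance_once() for _ in range(TRIALS)]

# The sample variance is itself a random quantity with its own distribution.
median_sample_variance = statistics.median(sample_variances)
share_below_true = sum(v < TRUE_VARIANCE for v in sample_variances) / TRIALS

print(f"true variance:           {TRUE_VARIANCE:.3f}")
print(f"median sample variance:  {median_sample_variance:.3f}")
print(f"share of samples below:  {share_below_true:.1%}")
```

Under these assumptions most samples come in below the true variance, while a few that happen to include an extreme draw come in far above it. That lopsided distribution is why a single observed sample variance is a poor guide here.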
Consider objects colliding with Earth. Small objects strike Earth with relatively high frequency. But how should we use a sample of such strikes? The article from NASA says:
The new data could help scientists better refine estimates of the distribution of the sizes of NEOs [Near Earth Objects] including larger ones that could pose a danger to Earth
That will only work well if we take into account what we already know: over longer time periods there have been much more massive impacts, although these are often millions of years apart. This is the hallmark of a fat-tailed distribution: rare, large outlier events. Naively using a sample that does not include these large strikes would give us a dramatic underestimate of the true danger to humanity.
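The effect of a sample that misses the large events can be sketched with another small simulation. Everything here is hypothetical: event sizes are drawn from a Pareto distribution with shape 1.2 in arbitrary units, a "large" event is defined as one in the top 0.1% by size, and a window is 100 observations. None of these numbers come from NASA data.

```python
import random
import statistics

ALPHA = 1.2            # hypothetical tail index, not fitted to any impact data
WINDOW = 100           # observations in one short sample window (illustrative)
WINDOWS = 2000         # number of simulated windows (illustrative)
LARGE_PROB = 1 / 1000  # a "large" event is in the top 0.1% by size

# Mean of a Pareto distribution with minimum 1 and shape ALPHA
TRUE_MEAN = ALPHA / (ALPHA - 1)
# Size that is exceeded with probability LARGE_PROB, from P(X > x) = x ** -ALPHA
LARGE_THRESHOLD = (1 / LARGE_PROB) ** (1 / ALPHA)

rng = random.Random(1)  # fixed seed so the run is repeatable

# Collect the sample mean of every window that contains no large event.
quiet_window_means = []
for _ in range(WINDOWS):
    sizes = [rng.paretovariate(ALPHA) for _ in range(WINDOW)]
    if max(sizes) < LARGE_THRESHOLD:
        quiet_window_means.append(statistics.fmean(sizes))

share_quiet = len(quiet_window_means) / WINDOWS
avg_quiet_mean = statistics.fmean(quiet_window_means)

print(f"true mean size:                    {TRUE_MEAN:.2f}")
print(f"share of windows with no large:    {share_quiet:.1%}")
print(f"average mean size in such windows: {avg_quiet_mean:.2f}")
```

With these settings roughly nine in ten windows contain no large event at all, and the mean size estimated from those windows falls well short of the true mean. An observer who only has such a window would understate the danger unless they bring in outside knowledge about the tail.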
Next week we will look more at what this means (including other examples) and what we can do about coming up with estimates in these situations.