Uncertainty Wednesday: Sample Variance (Cont’d)

Last Uncertainty Wednesday, I used meteorite impact data to make the point that sample variance may be much smaller than actual variance. Following the post, I was asked a great question on Twitter: “Is there such thing as estimation error on sample variance?” The answer is yes. Just as we saw earlier that the sample mean has a distribution, so does the sample variance. If you have different samples, you will get different variances and those will form a distribution. We are thus faced with exactly the same inference question as we were with the sample mean. How do we go about using the sample variance to estimate the actual variance? 
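To make the idea of a distribution of sample variances concrete, here is a minimal simulation sketch. It is not based on the meteorite data; it assumes a made-up normal process with a known variance of 4.0, and the function name, sample size, and seed are all hypothetical choices for illustration only.

```python
import random
import statistics

TRUE_SD = 2.0  # assumed for illustration; the true variance is TRUE_SD ** 2 = 4.0


def sample_variances(n_samples=2000, sample_size=10, seed=0):
    """Draw many small samples from the same process and return
    the sample variance of each one."""
    rng = random.Random(seed)
    return [
        statistics.variance([rng.gauss(0.0, TRUE_SD) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


variances = sample_variances()
print("true variance:", TRUE_SD ** 2)
print("mean of the sample variances:", round(statistics.mean(variances), 2))
print("smallest sample variance:", round(min(variances), 2))
print("largest sample variance:", round(max(variances), 2))
```

Each sample comes from exactly the same process, yet the individual sample variances should spread widely around the true value. Any single one of them, taken at face value, can be well off in either direction.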

I will write a lot more about inference in the future, but for now suffice it to say: the biggest mistake being made (and it is being made all the time) is to mistake the sample mean/variance for the actual mean/variance. Today I will give more examples of real-life situations where the sample variance is highly likely to grossly underestimate the actual variance.

The first example is natural disasters, such as floods or earthquakes. These are caused by physical processes in the earth and its atmosphere. Both of these contain ridiculous amounts of energy (with the energy in the atmosphere currently increasing rapidly due to climate change). As a result, it is extremely unlikely that any past sample includes the maximal possible event. In fact, if the maximal possible event had occurred, we might not even be here to read and write about it. So whenever you look at disaster event data and variance analysis based upon it, it is safe to assume that the sample variance underestimates the true variance.

The second example is economies and financial markets. Both are systems of human activity with massive human interventions aimed (explicitly or implicitly) at keeping volatility low. For instance, in the economy we have governments and central banks engaging in anti-cyclical policies (at least that’s generally what they attempt), such as fiscal or monetary stimulus during a downturn. In financial markets, there are many trading strategies that have the effect of reducing volatility, such as trading assets against each other based on their historical correlations. Such a strategy will, at least temporarily, reinforce those correlations, even when they are no longer warranted. So economic and financial market data is another example where the sample variance will underestimate the true variance.

Now, as it turns out, my language here isn’t entirely precise. What we are really dealing with in all of these examples, going back to my “suppressed volatility” posts, are situations in which the variance itself has a variance. Come again? Simply put: variance can be low at times and high at other times. Most sample periods will be of lower variance (volatility). Even if you include the higher variance occurrences, as long as you average everything out, your variance estimate will be too low. And as I argued above in the case of floods and earthquakes (also true for meteorites), even if you are going with the largest observed variances only, you will still be underestimating actual variance.
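Here is a small sketch of why most sample periods look calmer than the process really is. It assumes a deliberately crude two-regime model that I made up for illustration: each sample period is either calm or turbulent, with turbulent periods being rare. The standard deviations, the 10% probability, and all names in the code are hypothetical and not taken from any real data.

```python
import random
import statistics

# Hypothetical regime parameters, chosen only for illustration.
CALM_SD = 1.0
TURBULENT_SD = 5.0
P_TURBULENT = 0.1  # assumed share of sample periods that are turbulent


def window_variances(n_windows=2000, window_size=50, seed=1):
    """Return the sample variance of each sample period, where every
    period is randomly either calm or turbulent."""
    rng = random.Random(seed)
    result = []
    for _ in range(n_windows):
        sd = TURBULENT_SD if rng.random() < P_TURBULENT else CALM_SD
        window = [rng.gauss(0.0, sd) for _ in range(window_size)]
        result.append(statistics.variance(window))
    return result


# Long-run variance of this zero-mean mixture: the weighted average
# of the two regime variances.
long_run_variance = (1 - P_TURBULENT) * CALM_SD ** 2 + P_TURBULENT * TURBULENT_SD ** 2

regime_variances = window_variances()
share_below = sum(v < long_run_variance for v in regime_variances) / len(regime_variances)

print("long-run variance:", long_run_variance)
print("median sample-period variance:", round(statistics.median(regime_variances), 2))
print("share of sample periods below the long-run variance:", round(share_below, 2))
```

In this toy setup, the typical sample period reports a variance close to the calm level, far below the long-run figure. Note also what the sketch leaves out: it caps the turbulent regime at a known level, whereas for floods, earthquakes, and meteorites we have no such assurance about the largest possible event.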

So what are we to do? Well, as I will explain in the coming posts, this is why explanations are so crucial. Inference from data without explanations is how people go deeply wrong about reality.
