Uncertainty Wednesday: Sample Variance (Cont’d)

Last Uncertainty Wednesday, I used meteorite impact data to make the point that sample variance may be much smaller than actual variance. Following the post, I was asked a great question on Twitter: “Is there such thing as estimation error on sample variance?” The answer is yes. Just as we saw earlier that the sample mean has a distribution, so does the sample variance. If you have different samples, you will get different variances and those will form a distribution. We are thus faced with exactly the same inference question as we were with the sample mean. How do we go about using the sample variance to estimate the actual variance?
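To make the idea of a distribution of sample variances concrete, here is a minimal sketch in Python. It is purely illustrative and rests on assumptions of my choosing: the underlying process is normal with a true variance of 1, and the sample size, number of samples, and the name `sample_variances` are all hypothetical. It draws many small samples from the same process and computes the sample variance of each one.

```python
import random
import statistics


def sample_variances(n_samples=2000, sample_size=10, true_sd=1.0, seed=0):
    """Draw n_samples independent samples and return each sample's variance."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        # statistics.variance uses the n - 1 denominator (sample variance)
        statistics.variance([rng.gauss(0.0, true_sd) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


variances = sample_variances()
print("smallest sample variance:", round(min(variances), 3))
print("average sample variance: ", round(statistics.mean(variances), 3))
print("largest sample variance: ", round(max(variances), 3))
```

Every sample comes from the same process with the same true variance, yet the individual sample variances differ widely from one another. Any single one of them is just one draw from that distribution.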

I will write a lot more about inference in the future, but for now suffice it to say: the biggest mistake being made (and it is being made all the time) is to mistake the sample mean/variance for the actual mean/variance. Today I will give more examples of real-life situations where the sample variance is highly likely to grossly underestimate the actual variance.

The first example is natural disasters, such as floods or earthquakes. These are caused by physical processes in the earth and its atmosphere. Both of these contain ridiculous amounts of energy (with the energy in the atmosphere currently increasing rapidly due to climate change). As a result it is extremely unlikely that any past sample includes the maximal possible event. In fact, if the maximal possible event had occurred, we might not even be here to read and write about it. So whenever you look at disaster event data and variance analysis based upon them, it is safe to assume that the sample variance underestimates the true variance.

The second example is economies and financial markets. Both are systems of human activity with massive human interventions aimed (explicitly or implicitly) at keeping volatility low. For instance, in the economy we have governments and central banks engaging in anti-cyclical policies (at least that’s generally what they attempt), such as fiscal or monetary stimulus during a downturn. In financial markets, there are many trading strategies that have the effect of reducing volatility, such as trading assets against each other based on their historical correlations. Such a strategy will, at least temporarily, reinforce those correlations, even when they are no longer warranted. So economic and financial markets data is another example where the sample variance will underestimate the true variance.

Now as it turns out, my language here isn’t entirely precise. What we are really dealing with in all of these examples, going back to my “suppressed volatility” posts, are situations in which the variance itself has a variance. Come again? Simply put: variance can be low at times and high at other times. Most sample periods will be of lower variance (volatility). Even if you include the higher variance occurrences, as long as you average everything out, your variance estimate will be too low. And as I argued above in the case of floods and earthquakes (also true for meteorites), even if you are going with the largest observed variances only, you will still be underestimating actual variance.
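Here is a rough sketch of what "most sample periods will be of lower variance" can look like. Everything in it is an assumption made for illustration and is not drawn from any real data: a process that switches between a calm regime (standard deviation 1) and a rarer turbulent regime (standard deviation 5), with made-up switching probabilities and a made-up window length of 50 observations. Because both regimes have mean zero, the long-run variance is simply the regime variances weighted by how much time the process spends in each regime.

```python
import random
import statistics

# All parameters below are hypothetical, chosen only for illustration.
CALM_SD, TURBULENT_SD = 1.0, 5.0
P_CALM_TO_TURBULENT = 0.005  # chance per step of leaving the calm regime
P_TURBULENT_TO_CALM = 0.1    # chance per step of leaving the turbulent regime
WINDOW = 50                  # length of each sample period


def simulate(n_steps, seed=0):
    """Simulate a zero-mean process whose variance switches between two regimes."""
    rng = random.Random(seed)
    turbulent = False
    values = []
    for _ in range(n_steps):
        # First decide whether the regime switches, then draw the observation.
        if turbulent:
            if rng.random() < P_TURBULENT_TO_CALM:
                turbulent = False
        elif rng.random() < P_CALM_TO_TURBULENT:
            turbulent = True
        values.append(rng.gauss(0.0, TURBULENT_SD if turbulent else CALM_SD))
    return values


# Long-run share of time spent in the turbulent regime (two-state Markov chain).
share_turbulent = P_CALM_TO_TURBULENT / (P_CALM_TO_TURBULENT + P_TURBULENT_TO_CALM)
long_run_variance = (
    (1 - share_turbulent) * CALM_SD**2 + share_turbulent * TURBULENT_SD**2
)

values = simulate(200_000)
window_variances = [
    statistics.variance(values[i:i + WINDOW]) for i in range(0, len(values), WINDOW)
]
share_below = sum(v < long_run_variance for v in window_variances) / len(window_variances)

print("long-run variance:", round(long_run_variance, 3))
print("share of sample periods below it:", round(share_below, 3))
```

In this toy setup, most sample periods fall entirely within the calm regime, so their sample variance sits well below the long-run variance. Note that the sketch knows the turbulent regime exists and how severe it is. The situations described above are worse, because the most extreme regime may never have shown up in any sample at all.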

So what are we to do? Well, as I will explain in the coming posts, this is why explanations are so crucial. Inference from data without explanations is how people go deeply wrong about reality.
