Uncertainty Wednesday: Sample Mean (Cont’d)

Last Uncertainty Wednesday, we started to look at the behavior of the mean of a sample by repeatedly drawing samples. We used a sample of 10 rolls of a fair die. We know that the expected value of the probability distribution is 3.5, but we saw that the sample mean can deviate substantially from that on a small sample. In particular, with 10 rolls we got sample means both close to 1 (almost every roll was a 1) and close to 6 (almost every roll was a 6).
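To make the experiment concrete, here is a minimal Python sketch that uses the standard `random` module as the die. The function name `sample_mean` and the fixed seed are illustrative choices, not part of the original analysis. Each call draws a fresh sample of 10 rolls, so the five means it prints will generally differ from each other and from 3.5.

```python
import random

def sample_mean(sample_size, rng):
    """Average of `sample_size` independent rolls of a fair six-sided die."""
    rolls = [rng.randint(1, 6) for _ in range(sample_size)]
    return sum(rolls) / sample_size

rng = random.Random(42)  # hypothetical fixed seed, only for reproducibility

# Draw five separate samples of 10 rolls each and compare their means.
means = [sample_mean(10, rng) for _ in range(5)]
print(means)
```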

The fact that the sample mean itself is random and has a distribution shouldn’t be surprising, and yet it is the source of a great deal of confusion. Let me illustrate this with discussions of weather and climate. I previously defined climate as the probability distribution of possible weather events. The realized weather then is a sample. So we should not at all be surprised to see variability in the weather relative to past averages. And yet we use terms such as “unseasonably” cold or “unseasonably” hot all the time, which imply that something is out of whack with what was observed. The challenge in analyzing climate change based on data is then to separate variability within the existing distribution (weather) from changes in the distribution itself (climate). We will get back to that in future posts, but first we have more on sample means.

What happens if we make our sample larger? Instead of a sample size of 10 rolls, let’s consider a sample size of 100 rolls. Below are graphs contrasting the results of 100,000 runs for sample size 10 and sample size 100:

We again see a distribution in the sample mean, but it is much tighter around the expectation of 3.5, with almost all observed sample means for size 100 falling between 3 and 4, compared to between 2 and 5 for sample size 10.
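A rough way to reproduce this comparison without the graphs is sketched below, again assuming the standard `random` module as the die. The helpers `simulate_sample_means` and `share_between` are hypothetical names, and the exact shares will shift slightly with the seed. It runs 100,000 samples at each size and reports how many sample means fall inside the ranges mentioned above.

```python
import random

FACES = range(1, 7)  # faces of a fair six-sided die

def simulate_sample_means(sample_size, runs, rng):
    """Return `runs` sample means, each computed from `sample_size` die rolls."""
    return [sum(rng.choices(FACES, k=sample_size)) / sample_size for _ in range(runs)]

def share_between(values, lo, hi):
    """Fraction of values falling in the closed interval [lo, hi]."""
    return sum(lo <= v <= hi for v in values) / len(values)

rng = random.Random(2017)  # hypothetical seed
means_10 = simulate_sample_means(10, 100_000, rng)
means_100 = simulate_sample_means(100, 100_000, rng)

print("size 10:  share in [2, 5] =", share_between(means_10, 2, 5))
print("size 10:  share in [3, 4] =", share_between(means_10, 3, 4))
print("size 100: share in [3, 4] =", share_between(means_100, 3, 4))
```

Using `random.choices` with `k=sample_size` draws a whole sample in one call, which keeps 100,000 runs of 100 rolls reasonably quick in pure Python.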

It is really important to let this all sink in. Even at a sample size of 100 rolls, there is significant variation in the sample mean. The good news is that the distribution of the sample mean is centered on the expected value. This is often referred to as the sample mean being an unbiased estimator of the expected value. We will dig into when and why that’s the case, as it is not true for all underlying probability distributions (and almost certainly *not* true for weather).
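For the die, the centering and the tightening can both be worked out directly, under the assumption that the n rolls X₁, …, Xₙ are independent draws from the same distribution with a finite expected value. Writing μ for the expected value and σ² for the variance of a single roll (notation introduced here, not in the original), a short derivation looks like this:

```latex
\begin{aligned}
\bar{X}_n &= \frac{1}{n}\sum_{i=1}^{n} X_i, \\
\mathbb{E}[\bar{X}_n] &= \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[X_i] = \frac{1}{n}\cdot n\mu = \mu, \\
\operatorname{Var}(\bar{X}_n) &= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{\sigma^2}{n}, \\
\text{fair die:}\quad \mu &= 3.5, \qquad \sigma^2 = \frac{1^2+2^2+\dots+6^2}{6} - 3.5^2 = \frac{91}{6} - \frac{49}{4} = \frac{35}{12}, \\
\sqrt{\operatorname{Var}(\bar{X}_{10})} &= \sqrt{\tfrac{35}{120}} \approx 0.54, \qquad
\sqrt{\operatorname{Var}(\bar{X}_{100})} = \sqrt{\tfrac{35}{1200}} \approx 0.17.
\end{aligned}
```

The expectation step uses only that each roll has expected value μ; the variance step additionally uses independence. Ten times as many rolls shrinks the standard deviation of the sample mean by a factor of √10 ≈ 3.16, which matches the much tighter distribution at sample size 100.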

The bad news, though, is that even when the sample mean is an unbiased estimator of the expected value, on any single sample you draw, if it is the *only* sample you have, you have no idea whether it is above or below the expected value. Keep in mind that all the analysis we are currently conducting is based on a known distribution. That is hardly ever the problem we actually confront. Instead, we have explanations which lead us to prior beliefs about distributions, and we need to use the observations to update those beliefs.
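A quick check of this point, sketched under the same fair-die assumptions with hypothetical names and seed: across many samples of 10 rolls, the sample mean lands above 3.5 about as often as below it, with a smaller share landing exactly on 3.5. With only one sample in hand, neither side is favored.

```python
import random

EXPECTED_VALUE = 3.5  # expected value of a fair six-sided die

def sample_mean(sample_size, rng):
    """Average of `sample_size` rolls of a fair six-sided die."""
    return sum(rng.choices(range(1, 7), k=sample_size)) / sample_size

rng = random.Random(7)  # hypothetical seed
runs = 100_000
means = [sample_mean(10, rng) for _ in range(runs)]

# Classify each sample mean relative to the known expected value.
above = sum(m > EXPECTED_VALUE for m in means) / runs
below = sum(m < EXPECTED_VALUE for m in means) / runs
exact = sum(m == EXPECTED_VALUE for m in means) / runs
print(f"above: {above:.3f}  below: {below:.3f}  exactly 3.5: {exact:.3f}")
```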

More to come on sample means and what we can learn from them next Wednesday. Until then, here is a question to ponder: why did the graph for sample size 10 come out smoother than the one for sample size 100?
