Uncertainty Wednesday: Probability Distribution

So far in Uncertainty Wednesday we have limited ourselves to looking at examples with only two states of the world and two possible signal values. When I introduced this I explained that these combine to form four elementary events, and we looked at the basic requirements for assigning probabilities to them and the axioms that probabilities should then follow.

Now let’s forget for a moment about the origin of our elementary events and simply look at any set S = {A, B, C, D, E, F} whose members could be elementary events, states of the world, or signal values. A probability distribution over the set S is simply a set of values such that for every x ∈ S

0 ≤ P(x) ≤ 1

and

∑P(x) = 1 where the sum is over all x ∈ S
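These two conditions are easy to check in code. Here is a minimal sketch (the specific probability values are made up for illustration):

```python
# A candidate probability distribution over S = {A, B, C, D, E, F}
P = {"A": 0.3, "B": 0.25, "C": 0.2, "D": 0.15, "E": 0.05, "F": 0.05}

# Condition 1: every probability lies between 0 and 1
assert all(0 <= p <= 1 for p in P.values())

# Condition 2: the probabilities sum to 1 (with a tolerance for
# floating point rounding)
assert abs(sum(P.values()) - 1) < 1e-9
```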

Quite clearly there are infinitely many possible probability distributions (this was already true in the case where S has only two elements). But how many of the P(x) can be chosen “freely”? Well, if |S| = n, meaning the set S has n members, then we get to choose n - 1 probabilities and the last one is automatically determined by the requirement that they all sum to 1. In the case of n = 2 there is only one free parameter: if S = {A, B} and P(A) = p, then automatically P(B) = 1 - p.
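The “n - 1 free choices” idea can be sketched directly: pick the first n - 1 probabilities at random (constrained only so that their running total stays at or below 1), and the last one is forced on us. A small illustrative example:

```python
import random

n = 6
free = []
remaining = 1.0

# Choose n - 1 probabilities "freely", keeping the running total <= 1
for _ in range(n - 1):
    p = random.uniform(0, remaining)
    free.append(p)
    remaining -= p

# The last probability is determined, not chosen
last = 1.0 - sum(free)
dist = free + [last]

assert all(0 <= p <= 1 for p in dist)
assert abs(sum(dist) - 1) < 1e-9
```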

So take for a moment the case of |S| = 1000: there are 999 probabilities that can each be chosen individually, and only the last one is then determined. We can think of this as a 999-dimensional space. I have intentionally chosen letters as the elements of S, and even that suggests too much structure for the most general version (because we think of the alphabet as ordered).

Why am I emphasizing this? First, because most probability distributions that we work with all the time, such as the normal distribution, impose dramatic constraints. For starters, these distributions require an ordering of the state space (meaning an ordering of the elements of S). And then they collapse the number of free dimensions dramatically: in the case of a normal distribution, for example, we will see that there are only 2 parameters (the mean and the standard deviation). So in the case of |S| = 1000 just discussed, by imposing a distribution that is approximately normal, we reduce a 999-dimensional space to a 2-dimensional one!
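To make the collapse concrete, here is a sketch that builds an approximately normal distribution over 1000 ordered states from just 2 numbers (the mean and standard deviation below are made-up values for illustration): evaluate the normal density at each state and renormalize so the values sum to 1.

```python
import math

n = 1000
mu, sigma = 500.0, 100.0  # only 2 parameters, instead of 999 free choices

# Evaluate the normal density at each state 0..999, then renormalize
# so that the values form a valid probability distribution
density = [math.exp(-((k - mu) ** 2) / (2 * sigma ** 2)) for k in range(n)]
total = sum(density)
P = [d / total for d in density]

assert all(0 <= p <= 1 for p in P)
assert abs(sum(P) - 1) < 1e-9
```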

Second, because it is possible to make some important arguments about uncertainty solely on the basis of the shape of the probability distribution, without reference to the values associated with each element of S. Coming back to the simplest possible case of |S| = 2, there is a difference in uncertainty between P(A) = 0.8 and P(A) = 0.5 (and hence P(B) = 0.2 and P(B) = 0.5 respectively). In a very precise way there is more uncertainty when P(A) = 0.5 than when P(A) = 0.8, without saying anything about what happens in each state or assigning a numeric value to the states.
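One standard way to make “more uncertainty” precise is Shannon entropy, which depends only on the probabilities themselves, not on what the states mean. (The post does not name the measure it has in mind, so entropy here is my assumption of one way to quantify the comparison.) A minimal sketch:

```python
import math

def entropy(probs):
    # Shannon entropy in bits; terms with p = 0 contribute nothing
    return -sum(p * math.log2(p) for p in probs if p > 0)

# The 50/50 split is more uncertain than the 80/20 split
assert entropy([0.5, 0.5]) > entropy([0.8, 0.2])

# For two states, 50/50 achieves the maximum possible entropy of 1 bit
assert abs(entropy([0.5, 0.5]) - 1.0) < 1e-12
```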

In the coming Wednesdays we will dig deeper into both of these important points, probably starting with the second one (although I haven’t quite made up my mind about that).
