Uncertainty Wednesday: Entropy (Cont’d)

Last week in Uncertainty Wednesday, I introduced Shannon entropy as a measure of uncertainty that is based solely on the structure of the probability distribution. As a quick reminder, the formula is

H = -K ∑ p_i log p_i   where the sum runs over the states i = 1…n of the probability distribution

Now you may notice a potential problem here if the distribution includes a probability p that approaches 0, because log p diverges (it goes to negative infinity). If you know limits and remember L'Hôpital's rule, you can convince yourself that p log p → 0 as p → 0 (start by rewriting it as log p / (1/p), then apply L'Hôpital). Because of this, when we compute entropy we define p log p = 0 for p = 0.
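
To make this concrete, here is a minimal sketch in Python of an entropy calculation that bakes in the p log p = 0 convention for p = 0. The function name, the choice of K = 1, and the use of base-2 logs are assumptions of the sketch, not part of the definition above.

```python
import math

def entropy(probs, base=2):
    """Shannon entropy H = -K * sum(p_i * log p_i), here with K = 1 and
    logs in the given base. Terms with p = 0 are skipped, which implements
    the convention 0 * log 0 = 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Fair coin: maximum uncertainty for two states
print(entropy([0.5, 0.5]))    # 1.0 bit
# Heavily skewed distribution: much lower entropy
print(entropy([0.99, 0.01]))  # ~0.081 bits
```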

This now lets us easily graph H as a function of p1 for the simple case of only two states A and B, where p1 = P(A) and p2 = P(B) = 1 - p1. Here is a graph with p1 on the x-axis and H on the y-axis:

[Figure: entropy H of a two-state distribution as a function of p1]

We see that the entropy H is maximized for p1 = 0.5 (and hence p2 = 0.5). Meaning: uncertainty is at a maximum when both states A and B are equally likely.
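
As a quick numerical check of the same picture (reusing the hypothetical entropy helper from the sketch above), we can sweep p1 over a grid and see where H peaks:

```python
# Trace H(p1) for the two-state case and find where it peaks.
grid = [i / 100 for i in range(101)]
values = [(p1, entropy([p1, 1 - p1])) for p1 in grid]
p_max, h_max = max(values, key=lambda pair: pair[1])
print(p_max, h_max)  # 0.5 1.0 -- entropy peaks when both states are equally likely
```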

There is an important converse conclusion here: if you want to assume maximum uncertainty about something, you should assume that both (or, if there are more than two, all) states are equally likely. This assumption best represents “not knowing anything” (other than the number of states). How is this possibly useful? Take an asset like bitcoin as an example. If you want to assume maximum uncertainty, you should assume that the price is equally likely to go up as it is to go down.
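
The same point holds for more than two states. Here is a small illustration, again reusing the entropy helper sketched earlier: a uniform distribution over four states has higher entropy than a skewed one.

```python
# Among distributions over n states, the uniform one has the highest entropy,
# i.e. it best represents knowing nothing beyond the number of states.
n = 4
uniform = [1 / n] * n
skewed = [0.7, 0.1, 0.1, 0.1]
print(entropy(uniform))  # 2.0 bits, the maximum for 4 states (log2 of 4)
print(entropy(skewed))   # ~1.357 bits, less than the uniform case
```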

Something else we see from the shape of the H(p) function is that at p = 0.5 the first derivative is zero, so small changes in p correspond to small changes in H. But as we get closer to the “edge” on either side, the same absolute change in p results in a much bigger change in H. You may recall our earlier analysis of the sensitivity and specificity of tests. We now have a measure of how much uncertainty reduction we get from a test, and we see that it depends on where we start, with the least reduction occurring at maximum uncertainty.
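
As a rough numerical illustration of this sensitivity (still using the entropy helper sketched earlier), shifting p1 by the same 0.05 toward the nearer edge reduces H by very different amounts depending on where we start:

```python
# The same absolute shift in p1 reduces entropy least near p1 = 0.5
# and most near the edges.
delta = 0.05
for start in [0.50, 0.30, 0.10]:
    change = entropy([start, 1 - start]) - entropy([start - delta, 1 - (start - delta)])
    print(start, round(change, 4))  # roughly 0.0072, 0.07, 0.1826
```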

Next week we will look at the relationship between uncertainty as measured by entropy and the cost of communicating information (communication is the context in which Shannon came up with the entropy measure). 

#uncertainty wednesday #entropy