Last week in Uncertainty Wednesday, I introduced the example of the PSA test for prostate cancer. I provided all the probabilities for elementary events assuming a 50-year-old male. We then did some simple addition to figure out the probability that I would receive a high PSA level signal, which turned out to be P(H) = 0.091302.
The crucial question then is (quoting from the previous post):
What we are looking for is P(B | H) this is read as the probability of state B *conditional* on signal H being observed. Or in this specific case, the probability (how likely is it?) that I have cancer (B) given that we have observed my PSA level is high (H).
Now I left this as a question at the end of the post and I was thrilled to see a number of people answer this correctly (including someone who sent me email and even one person who had a meeting with me on a separate topic and answered there).
It turns out that answering this question is nearly trivial with the elementary probabilities set out as they are. I will show the formula first, do the math and then explain everything.
P(B | H) = P({BH}) / P(H) = 0.001581 / 0.091302 = 0.017316
So the likelihood that I have prostate cancer (B) given that the PSA level was high (H) is only 1.73%. It is important to let this sink in slowly. Even after a positive test result, there is less than a 2 in 100 chance that I actually have cancer.
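For readers who want to check the arithmetic, here is a minimal Python sketch of the same calculation. The variable names are my own choice for illustration; the two elementary probabilities are the ones given in the post.

```python
# Elementary probabilities from the post (50-year-old male).
p_AH = 0.089721  # healthy (A) AND high PSA level (H)
p_BH = 0.001581  # cancer (B) AND high PSA level (H)

# P(H) is the sum over both states that can produce a high PSA signal.
p_H = p_AH + p_BH

# Conditional probability: P(B | H) = P({BH}) / P(H)
p_B_given_H = p_BH / p_H

print(f"P(H)     = {p_H:.6f}")
print(f"P(B | H) = {p_B_given_H:.6f}")
```

Rounded to two decimal places as a percentage, the second number is the 1.73% quoted above.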
Now let’s look at the formula, which has a straightforward English language interpretation:
Likelihood of having cancer given a high observed PSA level, P(B | H) = likelihood of observing a high PSA level *AND* having cancer, P({BH}), as a fraction of the likelihood of observing a high PSA level, P(H)
Why is that? Because the likelihood of observing a high PSA level P(H) contains two cases within it: those where the state of the system is cancer (B) and those where it is healthy (A). As a reminder, that’s exactly how we calculated P(H) in the first place last week:
P(H) = P(high PSA level) = P({AH, BH}) = P({AH}) + P({BH}) = 0.089721 + 0.001581 = 0.091302
Now let’s get deeper into understanding what is going on. One useful thing to do immediately is to ask whether we learned anything at all from the test. Last time I also pointed out that P(B), the likelihood of my having prostate cancer as a 50-year-old male, is 0.31%. This is based on knowing *ONLY* that I am 50 years old and *NOTHING* else. After the test, if my PSA level is high, the likelihood is 1.73%. Using percentages to make the numbers really easy to see:
P(B) = 0.31% versus P(B | H) = 1.73%
We can divide these: P(B | H) / P(B) = 1.73% / 0.31% = 5.58. So a high PSA level signal makes it more than 5 times as likely that I have prostate cancer as it is when knowing only my age.
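The same comparison can be sketched in a couple of lines of Python. Note the assumption: this uses the rounded percentages shown above (0.31% and 1.73%), not the unrounded probabilities, so the ratio is only good to about two decimal places. The name `ratio` is just a label I picked.

```python
# Rounded values as quoted in the post.
p_B = 0.0031          # P(B): cancer, knowing only age (0.31%)
p_B_given_H = 0.0173  # P(B | H): cancer, given a high PSA level (1.73%)

# How many times more likely cancer is after seeing the high PSA signal.
ratio = p_B_given_H / p_B

print(f"P(B | H) / P(B) = {ratio:.2f}")
```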
So the test clearly produces a signal. It is informative about the true state of the system. But why is it then that it is still so unlikely (less than 2 in 100) that I have cancer? The answer is that the results are being swamped by the likelihood of being healthy and nonetheless having a high PSA level. We can see this by looking at how much P({AH}) contributes to P(H) – just look at the numbers above in the calculation of P(H)
P({AH}) = 0.089721 >> P({BH}) = 0.001581
With >> meaning “much greater than”. These are sometimes referred to as the “false positives”: the test is positive (high PSA level) but the person is in fact healthy (hence “false”). Next week we will run through the same using absolute numbers of what you would expect when a large number of men actually go through this. That will help us further develop the intuition for what is going on.
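To put a number on how heavily the healthy cases dominate, here is a small Python sketch using the two elementary probabilities from above. The names `false_positive_share` and `healthy_to_cancer` are hypothetical labels of my own, not terms from the earlier posts.

```python
# Elementary probabilities from the post.
p_AH = 0.089721  # healthy (A) AND high PSA level (H): the false positives
p_BH = 0.001581  # cancer (B) AND high PSA level (H)

p_H = p_AH + p_BH

# Fraction of all high PSA results that come from healthy men, i.e. P(A | H).
false_positive_share = p_AH / p_H

# How many healthy men with a high PSA level there are per man with cancer
# and a high PSA level.
healthy_to_cancer = p_AH / p_BH

print(f"Share of high PSA results that are false positives: {false_positive_share:.2%}")
print(f"Healthy-to-cancer ratio among high PSA results: {healthy_to_cancer:.1f}")
```

The share is simply 1 minus P(B | H), which is another way of seeing why a positive result still leaves cancer so unlikely.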
As a society we are increasingly choosing to not even do tests like this at all. We are so bad at interpreting the results that people jump from receiving a high PSA level to doing a biopsy of the prostate or, worse, engaging in full-on treatment. Based on the math, going to something invasive with lots of downside is obviously not the right response to this signal. Throwing the signal out entirely by not doing the test is, however, not all that smart either. After all, it tells us that someone is more than 5 times as likely to have the cancer. So we should be doing other tests or monitoring the situation more closely, for instance with more frequent manual exams of the prostate.
Since today is International Women’s Day, I should point out that exactly the same is true for breast cancer testing. We are increasingly recommending that women don’t get tested because we are worried about jumping to invasive treatments based on positive test results. So it means we could give ourselves more information but we choose not to because we overreact. The we here of course is collective. Much of this is the result of wrong incentives in healthcare where we have inflated both the financial upside of doing procedures and the downside of seemingly ignoring a signal.