
Tuesday, April 22, 2014

Bayes' Theorem, Part 2


As implied in Part One, this article series is supposed to be an easy introduction to Bayes' Theorem for non-experts (by a non-expert), not a thinly veiled job application directed at government agencies that don't officially exist.


Review: Testing Positive for a Rare Disease Doesn't Mean You're Sick

The previous article in this series illustrated a surprising fact about disease screening: if the disease you're testing for is sufficiently rare, then a positive diagnosis is probably wrong. This seemingly WTF outcome is an instance of the false positive paradox. It arises when the event of interest (in this case, being diseased) is so statistically rare that true positives are drowned out by a background of false positives.

Bayes' Theorem allows us to analyze this paradox, as shown below. But first, we need to define true and false positives and negatives.

False Positives and False Negatives

No classification test is perfect. Any real-world diagnostic test will sometimes mistakenly report disease in a healthy person. This type of error is defined as a false positive. If you test for the disease in a large number of people who are known to be healthy, a certain percentage of the test results will be false positives. This percentage is called the false positive rate of the test. It's the probability of getting a positive result if you test a healthy person.

The other type of classification error is the false negative -- for example, a clean bill of health mistakenly issued to someone who's actually sick. If you run your test on a large number of people known to be sick, the test will fail to detect disease in some percentage of them. This percentage is known as the test's false negative rate.

The lower the false positive rate and false negative rate, the better the test. Both rates are independent of population size and disease prevalence.
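
If code is clearer to you than prose, here is a quick Python sketch of how those two rates are estimated. The counts are invented purely for illustration, and happen to match the 5% false positive rate and 1% false negative rate used later in this article.

    # Hypothetical counts from testing people whose true status is already known.
    healthy_tested = 10_000      # people known to be healthy
    false_positives = 500        # healthy people who nonetheless tested positive

    sick_tested = 1_000          # people known to be sick
    false_negatives = 10         # sick people who nonetheless tested negative

    false_positive_rate = false_positives / healthy_tested   # P(positive | healthy) = 5%
    false_negative_rate = false_negatives / sick_tested      # P(negative | sick) = 1%

    print(false_positive_rate, false_negative_rate)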

But now we get to the root of the false positive paradox: if the disease is rare enough, then the vast majority of people you test will be healthy. This unavoidable testing of crowds of healthy people represents plenty of opportunities to get false positives. These false positives drown out the relatively faint true positive signal coming from the few sick people in the population. And if the test obtains each true positive at the cost of many false positives, any given positive result is probably a false one. It's intuitive by this point that the error rate of a screening process depends not only on the accuracy of the test itself, but also on the rarity of what you're screening for.

For a more rigorous understanding, we need to derive Bayes' Theorem. To do that, we need some basic probability theory.

Probability Basics

The probability that some event \(A\) will occur is written \(P(A)\). All probabilities are limited to values between 0 ("impossible") and 1 ("guaranteed"). If we let \(H\) stand for the event that a fair coin lands heads up, then \(P(H) = 0.5\), or 50%. If \(X\) stands for rolling a "20" on a 20-sided die, then \(P(X) = 1/20\), or 5%.

If two events \(A\) and \(B\) cannot occur at the same time, they are said to be mutually exclusive, and the probability that either \(A\) or \(B\) occurs is just \(P(A) + P(B)\). Rolling a 19 and rolling a 20 on a 20-sided die are mutually exclusive events, so the probability of rolling 19 or 20 is \(1/20 + 1/20 = 1/10\).

The opposite of an event, or its complement, is denoted with a tilde (\(\text{~}\)) before the letter. The probability that \(A\) will not occur is written \(P(\text{~}A)\). If \(A\) has only two possible values, such as heads/tails, sick/healthy, or guilty/innocent, then \(A\) is called a binary event and \(P(A) + P(\text{~}A) = 1\), which just says that either \(A\) happens or it doesn't. Heads and tails, sickness and health, and guilt and innocence are all mutually exclusive binary events.
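
If you'd rather trust a computer than a 20-sided die, here is a small, entirely optional Python simulation (with a fixed random seed so it's reproducible) that checks the die-roll and complement numbers above:

    import random

    random.seed(0)
    trials = 100_000
    rolls = [random.randint(1, 20) for _ in range(trials)]

    p_20 = sum(r == 20 for r in rolls) / trials        # near 1/20 = 0.05
    p_19_or_20 = sum(r >= 19 for r in rolls) / trials  # mutually exclusive: near 1/20 + 1/20 = 0.10
    p_not_20 = sum(r != 20 for r in rolls) / trials    # complement: near 1 - 0.05 = 0.95

    print(p_20, p_19_or_20, p_20 + p_not_20)           # the last sum is exactly 1.0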

Conditional Probability

So far, we've considered probabilities of single events occurring in theoretical isolation: a single coin flip, a single die roll. Now, consider the probability of an event \(A\) given that some other event \(B\) has occurred. This new probability is read as "the probability of A given B" or "the probability of A conditional on B." Because this new probability quantifies the occurrence of A under the condition that B has definitely occurred, it is known as a conditional probability. Standard notation for conditional probability is:
\[ \begin{equation} P(A|B) \end{equation} \]
The vertical bar stands for the word "given." \(P(A|B)\) means "the probability of \(A\) given \(B\)."

It's really important to recognize right away that \(P(A|B)\) is not the same as \(P(B|A)\). To see why, dream up two related real-world events and think about their conditional probabilities:
  • probability that a road is wet given that it's raining: \(P(\text{wet road} ~ | ~ \text{raining})\)
  • probability that it's raining given that the road is wet: \(P(\text{raining} ~ | ~ \text{wet road})\)
The road will certainly get wet if it rains, but many things besides rain could result in a wet road (use your imagination). Therefore,
\[ \begin{equation} P(\text{wet road} ~ | ~ \text{raining})  >  P(\text{raining} ~ | ~ \text{wet road}). \end{equation} \]
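A toy simulation makes the asymmetry concrete. In the made-up world below, rain falls on 20% of days, sprinklers run independently on 30% of days, and either one is enough to wet the road. The frequencies are arbitrary; only the asymmetry matters.

    import random

    random.seed(0)
    days = 100_000
    rain_days = wet_days = rain_and_wet_days = 0

    for _ in range(days):
        raining = random.random() < 0.2    # assumed rain frequency
        sprinkler = random.random() < 0.3  # assumed sprinkler frequency
        wet = raining or sprinkler         # either one wets the road

        rain_days += raining
        wet_days += wet
        rain_and_wet_days += raining and wet

    print(rain_and_wet_days / rain_days)   # P(wet road | raining): essentially 1.0
    print(rain_and_wet_days / wet_days)    # P(raining | wet road): roughly 0.45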

Bayes' Theorem Converts between P(A|B) and P(B|A)

Okay, so \(P(A|B)\) does not equal \(P(B|A)\), but how are they related? If we know one quantity, how do we get the other? This section title blatantly gave away the answer.

To derive Bayes' Theorem, consider events \(A\) and \(B\) that have nonzero probabilities \(P(A)\) and \(P(B)\). Let's say that \(B\) has just occurred. What is the probability that \(A\) occurs given the occurrence of \(B\)? In symbols, what is \(P(A|B)\)?

Well, if \(A\) then occurs, it will certainly be true that both \(A\) and \(B\) have occurred. The occurrence of both \(A\) and \(B\) is itself an event; let's call it \(AB\), with probability \(P(AB)\). Now, note that \(P(B)\) will always be greater than or equal to \(P(AB)\), because the "\(A\)" in "\(AB\)" represents an added criterion for event completion. The chance of both \(A\) and \(B\) occurring can't be higher than the chance of just \(B\) occurring (unless, of course, \(A\) is guaranteed to occur whenever \(B\) does, in which case the two chances are equal).

The value of \(P(AB)\) itself isn't as interesting as the ratio of \(P(AB)\) to \(P(B)\), and here's why. This ratio compares the probability of both \(A\) and \(B\) to the probability of \(B\) just by itself. The ratio gives the proportion of possible occurrences of \(AB\) relative to the possible occurrences of \(B\). You should be able to convince yourself that this ratio is none other than the conditional probability \(P(A|B)\):
\[  \begin{equation} P(A|B) = \frac{P(AB)}{P(B)}.  \end{equation} \]
Rearranging gives
\[  \begin{equation} P(AB) = P(A|B)P(B). \label{whatstheuse} \end{equation} \]
Similarly,
\begin{align} P(B|A) &= \frac{P(BA)}{P(A)} \\
P(BA) &= P(B|A)P(A). \end{align}
Since the order of \(A\) and \(B\) doesn't affect the probability of both occurring, we have \(P(AB) = P(BA)\), so
\[ \begin{equation} P(A|B)P(B) = P(B|A)P(A). \end{equation} \]
This leads to Bayes' Theorem:
\[  \begin{equation}P(A|B) = \frac{P(B|A)P(A)}{P(B)} \label{existentialangst} \end{equation} \]
There we have it: to convert from \(P(B|A)\) to \(P(A|B)\), multiply \(P(B|A)\) by the ratio \(P(A)/P(B)\). Let's see what this looks like in the disease example.
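Before moving on, here is the conversion as code, using the made-up rain-and-sprinkler numbers from the toy simulation above: rain on 20% of days, a road that is always wet when it rains, and a wet road on 44% of days (which is what you get when rain and sprinklers strike independently).

    def flip_conditional(p_b_given_a, p_a, p_b):
        """Bayes' Theorem: convert P(B|A) into P(A|B)."""
        return p_b_given_a * p_a / p_b

    # P(raining | wet road) from P(wet road | raining) = 1.0, P(raining) = 0.2, P(wet road) = 0.44
    print(flip_conditional(1.0, 0.2, 0.44))   # about 0.45, matching the simulation above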

Back to the Disease Example

Let the symbols \(+\) and \(-\) stand for a positive and negative diagnosis, and let \(D\) stand for the event that disease is present. Since the test must return either \(+\) or \(-\) every time we test someone, \(P(+) + P(-) = 1\). And since disease must either be present or absent, \(P(D) + P(\text{~}D) = 1\).

Now, to determine precisely how much a positive diagnosis should worry us, we care about the probability of disease given a positive diagnosis. This is just \(P(D|+)\). By Bayes' Theorem (Equation \(\ref{existentialangst}\)), we need to calculate
\[  \begin{equation} P(D|+) = \frac{P(+|D)P(D)}{P(+)}  \end{equation} \]
Consider each term on the right side:

\(P(+|D)\) is just the probability of getting a positive diagnosis given the presence of disease, i.e., the probability that the test works as advertised as a disease detector. This is the definition of the true positive rate, AKA the sensitivity, a very commonly quoted test metric.

\(P(D)\) is the probability of disease in a person randomly selected from our population. In other words, \(P(D)\) is the disease prevalence (e.g., 15 per 10,000 people).

What about the denominator, \(P(+)\)? It's the probability of getting a positive diagnosis in a randomly selected person. A positive diagnosis can be either 1. a true positive, or 2. a false positive.
  1. A true positive is defined by the occurrence of both \(D\) and \(+\). Equation \(\ref{whatstheuse}\) says that the probability of both \(D\) and \(+\) is \(P(+|D)P(D)\). \(P(+|D)\) is the true positive rate, and \(P(D)\) is the disease prevalence.
  2. A false positive is defined by the occurrence of both \(\text{~}D\) and \(+\). The probability of this is \(P(+|\text{~}D)P(\text{~}D)\). \(P(+|\text{~}D)\) is the false positive rate (after which the paradox is named), and \(P(\text{~}D) = 1 - P(D)\).
Since true and false positives are mutually exclusive events, their probabilities add up to give the overall probability of a positive diagnosis, which is either a true positive or a false positive. Thus,
\[  \begin{equation} P(+) = P(+|D)P(D) + P(+|\text{~}D)P(\text{~}D). \end{equation} \]

Bayes' Theorem for the disease-screening example now looks like this:
\[  \begin{equation} P(D|+) = \frac{P(+|D)P(D)}{P(+|D)P(D) + P(+|\text{~}D)P(\text{~}D)} \label{thehorror} \end{equation} \]
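
Translated directly into Python (the function name is mine, invented for illustration, not standard terminology), Equation \(\ref{thehorror}\) looks like this:

    def p_disease_given_positive(sensitivity, specificity, prevalence):
        """P(D|+) for a test with the given true positive and true negative rates."""
        true_positive = sensitivity * prevalence                 # P(+|D) P(D)
        false_positive = (1 - specificity) * (1 - prevalence)    # P(+|~D) P(~D)
        return true_positive / (true_positive + false_positive)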

Plugging in Example Numbers

Part One of this series gave concrete numbers for a hypothetical outbreak of dancing plague. Let's insert those numbers into our newly minted Equation \(\ref{thehorror}\) to calculate the value of a positive diagnosis.
  • Dancing plague was assumed to affect one in 1000 people, so \(P(D) = 1/1000\). Since each person either has or does not have the disease, \(P(D) + P(\text{~}D) = 1\).
  • Test sensitivity was 99%. Sensitivity is synonymous with the true positive rate, so this tells us that \(P(+|D) = 0.99\). And since the test must return either \(+\) or \(-\) when disease is present, \(P(-|D) = 1 - 0.99 = 0.01\).
  • Test specificity was 95%. Specificity is synonymous with the true negative rate, so \(P(-|\text{~}D) = 0.95\). Then \(P(+|\text{~}D) = 1 - 0.95 = 0.05\).
Inserting these numbers into Equation \(\ref{thehorror}\) gives
\begin{align}
P(D|+) &= \frac{0.99 \cdot \frac{1}{1000}}{0.99 \cdot \frac{1}{1000} + 0.05 \cdot \left(1 - \frac{1}{1000}\right)} \label{whyareyouevenwritingthis} \\
&= \frac{\frac{0.99}{1000}}{\frac{0.99}{1000} + 0.05 \cdot \frac{999}{1000}} \\
&= 0.01943 \\
&\simeq 1.9\%
\end{align}
As expected, this is the same answer we got in Part One through a less rigorous approach.
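
For the arithmetic-averse, here is the same calculation as a few lines of Python:

    sensitivity = 0.99      # P(+|D)
    specificity = 0.95      # so the false positive rate P(+|~D) is 0.05
    prevalence = 1 / 1000   # P(D)

    numerator = sensitivity * prevalence
    denominator = numerator + (1 - specificity) * (1 - prevalence)
    print(numerator / denominator)   # 0.01943..., i.e. about 1.9%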

Final Remarks

In this excessively long article, we derived Bayes' Theorem and used it to confirm our earlier reasoning in Part One that a positive diagnosis of dancing plague has only a 2% chance of being correct. This low number is an example of the false positive paradox, and Equation \(\ref{whyareyouevenwritingthis}\) reveals its origin.

The form of Equation \(\ref{whyareyouevenwritingthis}\) is [something] divided by [something + other thing], or \(\frac{t}{t + f}\). If \(f\) is small compared to \(t\), then \(\frac{t}{t+f} \simeq \frac{t}{t} = 1\), which means that the probability of disease given a positive test result is close to 100%. But if \(f\) becomes much larger than \(t\), then \(\frac{t}{t+f}\) becomes much less than 1. Looking at Equations \(\ref{thehorror}\) and \(\ref{whyareyouevenwritingthis}\), you can see that \(t\) matches up with the term \(P(+|D)P(D)\), the probability of getting a true positive, and \(f\) matches up with \(P(+|\text{~}D)P(\text{~}D)\), the probability of getting a false positive. In our example, \(t \simeq 0.001 \) and \(f \simeq 0.05 \). Thus, the chance of getting a false positive is 50 times higher than the chance of getting a true positive. That's why someone who tests positive probably has nothing to worry about, other than the social stigma of getting tested in the first place.
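
To see how hard prevalence drives this, here is the same formula evaluated for our 99%-sensitive, 95%-specific test at a handful of prevalences. The prevalence values themselves are arbitrary, chosen only to show the trend.

    def posterior(sensitivity, specificity, prevalence):
        t = sensitivity * prevalence                 # chance of a true positive
        f = (1 - specificity) * (1 - prevalence)     # chance of a false positive
        return t / (t + f)

    # The rarer the condition, the less a positive result is worth:
    for prevalence in (1/2, 1/10, 1/100, 1/1000, 1/100_000):
        print(f"prevalence {prevalence:.5f} -> P(D|+) = {posterior(0.99, 0.95, prevalence):.3f}")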

Cliffhanger Ending

In my next post, I'll explain the Bayesian methodology I used in the course of my involvement with the series To Catch A Killer. Essentially, the above analysis can be adapted to homicide investigations by replacing rare-disease prevalence with the homicide rate for a specific time, place, and demographic, and by treating the presence or absence of forensic evidence as positive or negative diagnostic test outcomes.

Monday, January 27, 2014

Bayes' Theorem, Part 1: Not Just a Mnemonic for Apostrophe Placement

If you're intimately familiar with Bayes' Theorem or profoundly bored of it, you may still find value in this post by taking a shot every time you read the words "theorem" and "disease."

I first encountered Bayes' Theorem in a high school conversation about email spam filters. I didn't retain much about either the theorem or spam filters, but promptly added the term "Bayes' Theorem" to my mental list of Things That Sound Vaguely Technical And Also Possibly Sinister. (That list includes the names of every military and/or aerospace contractor that ever existed. If you think of any exceptions, send them my way.)

Years afterward, Bayes' Theorem started cropping up in my medical biophysics studies and after-hours discussions about airport and border security. More recently, I used Bayes' Theorem to weigh forensic evidence in the upcoming documentary series To Catch a Killer. The theorem seems to appear everywhere and makes you sound smart, but just what is it?

Basically, Bayes' Theorem tells you how to update your beliefs using new information. That's the best plain-English definition I can think of. More generally, Bayes' Theorem tells you how to manipulate conditional probabilities, saving you from fallacious logic along the lines of "most Pabst drinkers are hipsters, so most hipsters drink Pabst." (It may be true that most Pabst drinkers are not hipsters, but that's not the point of this fallacy. The lesson for me is that I come up with poor examples.)

Bayes' Theorem follows directly from basic probability principles, but proper derivations tend to look like field notes by Will Hunting on how to outperform pompous Harvard grad students at impressing Minnie Driver. Accordingly, this post shall include zero equations, which is great, since I figured out how to embed equations in my last post. Instead, I'll try to show the importance of Bayes' Theorem by posing the following brain teaser to you, dear reader.


Brain Teaser: You Tested Positive for a Rare Disease; Do You Really Have It?

Imagine that a disease afflicts 0.1% of the general population, or 1 in 1000 people. A particular diagnostic test returns either "positive" or "negative" to indicate the presence or absence of the disease. Let's say you know that this test is 99% sensitive. That's a compact way of saying that out of 100 people who truly do have the disease, 99 of them will correctly test positive, whereas 1 will erroneously test negative, even though they actually have the disease. Let's also say you know that the test is 95% specific. That means that out of 100 disease-free people, 95 will correctly test negative, but 5 of these healthy people will erroneously be told that they have the disease.

Suppose you run this test on yourself, and sweet buttery Jesus, it says you're positive. This deeply distresses you, as it should if the disease in question were, say, dancing plague. As psychosomatic head-bobbing sets in, you ask yourself the following question: given the positive test result, what are the chances that I actually have dancing plague?

Take another look at those goddamn numbers. The test is 99% sensitive and 95% specific. Should you embrace your groovy fate and invest in a bell-bottomed suit and unnervingly realistic John Travolta mask? Is all hope lost? Is the jig up?!

Think it over and decide on your final answer before reading on. Don't worry about precise numbers; at the very least, decide whether you think the chance of actually having dancing plague is more or less than 50%, given your positive diagnosis.

If you haven't seen this kind of question before, the chance that your answer exceeds 50% exceeds 50%. It turns out that even though you tested positive, the chance that you have the disease is only about 2%! Choreographed celebrations are in order.


Explanation

You don't actually need to know anything about Bayes' Theorem to correctly answer the above question, though you might end up stepping through the theorem without knowing it. Here's one way to proceed.

Pick a sample of 1000 people from the general population. On average, only 1 of these people will actually have the disease. The vast majority, 999 out of 1000, will be healthy. Our initial sample thus consists of 999 healthy people and 1 sick person. Now, test them all.

Our test is 99% sensitive. That means that when the one diseased guy in our sample gets tested, he'll be correctly identified as sick 99 times out of 100. Very rarely, 1 time in 100, the test will mess up and give him a negative result.

The specificity of 95% means that most healthy people will test negative, as they should. 95% of the initial 999 healthy people, or 949.05 of them on average, will correctly be told that they're disease-free. However, the remaining 49.95 healthy people will erroneously receive positive test results, even though they're fine.

Therefore, by testing each of our starting 1000 people, we'd find an average of 0.99 correct positive diagnoses and 49.95 incorrect positive diagnoses, giving 50.94 positive diagnoses in total. Rounding off the numbers, it's obvious that about 51 people in our initial 1000 will be freaked out by positive test results. However, only one of these people will actually have the disease.

If you test positive, you could be any one of those 51 people, so try not to panic: the chance that you're the one person who actually has dancing plague is 1/51, or 1.96%.
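
If you'd like to check the bookkeeping, here is the whole counting argument in a few lines of Python:

    population = 1000
    prevalence = 1 / 1000
    sensitivity = 0.99
    specificity = 0.95

    sick = population * prevalence                   # 1 person, on average
    healthy = population - sick                      # 999 people

    true_positives = sick * sensitivity              # 0.99
    false_positives = healthy * (1 - specificity)    # 49.95
    total_positives = true_positives + false_positives   # 50.94

    print(true_positives / total_positives)          # 0.0194..., about 1 in 51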


Final Remarks

What was that about Bayes' Theorem helping to update your beliefs? Interpreting a probability as a degree of belief is one way to make sense of what it means for a random outcome to have some numerically determined chance of occurring. In the above disease example, it's sensible to think of the chance that someone is ill as a measure of how firmly you believe that they're ill.

If you randomly chose one person from the general population and didn't test them, you'd be pretty skeptical that they're ill, since the disease is so rare. The chance that you picked someone with the disease is 1/1000. Running the test then gives you new information -- specifically, the test outcome. That outcome is sometimes wrong, but you can still use the new information to update your prior belief that the person has the disease. If the person tests positive, your belief just jumped from a prior value of 1/1000 to a "posterior" value of 1/51, a 20-fold increase.
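
In code, that update is a trivial check, but it makes the "update" language concrete:

    prior = 1 / 1000     # belief before testing: the disease prevalence
    posterior = 1 / 51   # belief after a positive result, from the counting above

    print(posterior / prior)   # about 19.6, i.e. roughly a 20-fold jump in belief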

Cliffhanger Ending

In a future post, we'll derive Bayes' Theorem and show how it applies to this and other problems. Until next time!

EDIT: Part 2 is here.