In this series, we have been discussing the diagnostic statistics of sensitivity, specificity, and predictive values as they pertain to the Androgen Deficiency in Aging Males (ADAM) questionnaire. The ADAM questionnaire was designed by Morley and his colleagues as a screening test for low testosterone among middle-aged men. Although the ADAM demonstrates good sensitivity (88%) in identifying men possibly experiencing low testosterone, its rates of specificity (60%) and positive predictive value (42%) are more modest.
In the prior post, we showed how the predictive value of a positive response on the ADAM can be calculated directly.
To refresh your memory, Morley et al. (2000) administered their questionnaire to a sample of 316 Canadian physicians and measured those physicians’ testosterone levels. Twenty-five percent of this physician sample had bioavailable testosterone levels lower than 70 ng/dL. The frequency count of physicians who fell into each diagnostic category was as follows:
To explain, let’s superimpose Bayesian terms over our ADAM example.
Here, event A (testosterone) can be low (A) or not low (A’) and event B (ADAM) can be positive (B) or not positive (B’). Figuring out the probabilities of A and B is fairly straightforward.
The probability of A, or low testosterone, is the number of men with low testosterone divided by the total number of men who were tested:
In diagnostic terms, P(A) represents the base rate or prevalence of the disorder under investigation. In Bayesian language, P(A) is called the prior probability. The prior represents the known or assumed probability of the event or condition of interest we are trying to predict. It is thought of as prior in that it represents information we possess before we conduct our diagnostic test.
The probability of B, or a positive ADAM score, is the number of men with positive ADAM scores divided by the total number of men who were tested:
The next term, P(B/A), is defined as the probability of B given A. However, the phrase “B given A” is only crystal clear to those who have spent the majority of their adult life wrestling with math, physics, or statistical formulas. For the rest of us, it is easier to think of the term P(B/A) as asking the question: “How many B’s are in A?” That is, the proportion of men in the low testosterone group who had positive ADAM scores is:
In diagnostic terms, this represents the sensitivity of a test. In Bayesian language, P(B/A) is called the likelihood function. Likelihood is an important concept in both Bayesian and diagnostic statistics.
With the above terms, we can now calculate the positive predictive value of the ADAM as follows:
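As a rough check on that calculation, the sketch below applies Bayes’ theorem to the rounded percentages quoted in this series (prevalence 25%, sensitivity 88%, specificity 60%) rather than to the raw frequency counts, so the result is only approximate. The function and variable names are illustrative choices, not anything taken from Morley’s paper, and P(B) is assumed to be the sum of the true positive and false positive probabilities.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Posterior P(A given B) via Bayes' theorem: P(B/A) * P(A) / P(B)."""
    # Joint probability of low testosterone AND a positive ADAM: P(B/A) * P(A)
    p_true_positive = sensitivity * prevalence
    # Joint probability of normal testosterone AND a positive ADAM: P(B/A') * P(A')
    p_false_positive = (1 - specificity) * (1 - prevalence)
    # P(B): every route to a positive ADAM score
    p_positive = p_true_positive + p_false_positive
    return p_true_positive / p_positive


# Assumption: rounded figures from the post, not the original cell counts.
adam_ppv = positive_predictive_value(
    prevalence=0.25, sensitivity=0.88, specificity=0.60
)
print(f"Positive predictive value of the ADAM: {adam_ppv:.2f}")
```

With these rounded inputs the result lands at about 0.42, in line with the 42% positive predictive value reported for the ADAM.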
It does seem unnecessary to use Bayes’ theorem to determine the positive predictive value of a diagnostic test (or, in Bayesian terms, the posterior probability) when one can do this easily by using a simple table. Yet Bayesian statistics are important in that they emphasize that diagnostic information depends not only on sensitivity and specificity but also on disorder prevalence.
To illustrate, consider the following problem:
- As a school doctor, you perform a screening test for a viral disease in a primary school.
- The properties of the test are very good: Among 100 children who have the disease, the test is positive in 99, and negative in only 1, and among 100 children who do not have the disease, the test is negative in 99, and falsely positive in only 1.
- On average, about 1 out of 100 children are infected without knowing it.
- If the test for one of the children is positive, what is the probability that he or she actually has this viral disease?
Congratulations if you answered with a probability anywhere in the range of 95 percent or greater. If you did, you are in agreement with approximately 80 percent of a sample of physicians who were posed the same question. You, in fact, are wrong. But, at least, you are not alone.
This virus screening problem is a classic diagnostic riddle used to introduce health care workers to both Bayesian statistics and the influence of prevalence on clinical decision making. The actual answer to the question is 50 percent, and here is why.
Although our diagnostic test is highly sensitive at 99 percent (among 100 children who have the disease, the test is positive in 99) and highly specific at 99 percent (among 100 children who do not have the disease, the test is negative in 99), the rate of prevalence is low (1 out of 100 children are infected without knowing it).
A low rate of prevalence directly inflates the share of positive test results that are false positives. In table form, the diagnostic virus problem looks like this:
The question was: “If the test for one of the children is positive, what is the probability that he or she actually has this viral disease?” In the table above, that probability is the positive predictive value: 99 divided by 198, or 50 percent. The number of true positives is equal to the number of false positives because of the virus’s low rate of prevalence.
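The counts behind the 99-in-198 figure can be rebuilt from the three rates in the problem. The sketch below assumes a hypothetical cohort of 10,000 children, which is the smallest round number that gives whole children in every cell; the function name and dictionary keys are illustrative. Exact fractions are used so that no rounding creeps into the cell counts.

```python
from fractions import Fraction


def expected_counts(cohort_size, prevalence, sensitivity, specificity):
    """Expected 2x2 cell counts when every child in the cohort is screened."""
    diseased = cohort_size * prevalence
    healthy = cohort_size - diseased
    true_pos = diseased * sensitivity    # infected, test positive
    false_neg = diseased - true_pos      # infected, test negative
    true_neg = healthy * specificity     # not infected, test negative
    false_pos = healthy - true_neg       # not infected, test positive
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        "true_neg": true_neg,
    }


# Assumption: 10,000 children; the three rates come from the virus problem.
counts = expected_counts(
    cohort_size=10_000,
    prevalence=Fraction(1, 100),
    sensitivity=Fraction(99, 100),
    specificity=Fraction(99, 100),
)
all_positives = counts["true_pos"] + counts["false_pos"]
ppv = counts["true_pos"] / all_positives

for cell, count in counts.items():
    print(f"{cell:>9}: {count}")
print(f"positive predictive value: {counts['true_pos']}/{all_positives} = {float(ppv):.0%}")
```

Under these assumptions, the 100 infected children yield 99 true positives, while the 9,900 uninfected children yield 99 false positives, which is why a positive result is only a coin flip.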
The benefit of using Bayes’ theorem is that we can also arrive at this solution with just the three probabilities that are offered in the virus problem. It looks like this:
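Written out with the three probabilities from the problem, where A is “has the disease” and B is “tests positive,” the calculation would run roughly as follows. The vertical-bar notation P(B | A) is the same quantity this post writes as P(B/A). Expanding P(B) into its true positive and false positive parts is a standard step that is assumed here, not copied from the original figure.

```latex
\begin{align*}
P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A')\,P(A')} \\[4pt]
            &= \frac{0.99 \times 0.01}{(0.99 \times 0.01) + (0.01 \times 0.99)} \\[4pt]
            &= \frac{0.0099}{0.0198} = 0.50
\end{align*}
```

Here P(B | A) = 0.99 is the sensitivity, P(A) = 0.01 is the prevalence, and P(B | A') = 1 - 0.99 = 0.01 is the false positive rate implied by the 99 percent specificity.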
Despite the diagnostic importance of prevalence, health care providers consistently overvalue the sensitivity and specificity of diagnostic tests and undervalue (or completely disregard) the prevalence of disorders. In fact, in their sample of 1361 physicians, Agoritsas and his colleagues (2011) found that, regardless of whether the prevalence rate offered was 1, 2, 10, 25, or 95 percent, the most frequent answer to the virus problem was 95 percent or higher. This remained true even when physicians were not given any information regarding prevalence and, technically, no answer was possible.
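To see how far the correct answer moves across those prevalence rates, the sketch below recomputes the positive predictive value at each one. It assumes the 99 percent sensitivity and 99 percent specificity from the virus problem above hold at every prevalence level; the function and variable names are illustrative.

```python
def positive_predictive_value(prevalence, sensitivity=0.99, specificity=0.99):
    """Bayes' theorem for a positive test result."""
    p_true_positive = sensitivity * prevalence
    p_false_positive = (1 - specificity) * (1 - prevalence)
    return p_true_positive / (p_true_positive + p_false_positive)


# Prevalence rates listed in the paragraph above.
prevalence_levels = [0.01, 0.02, 0.10, 0.25, 0.95]
ppv_by_prevalence = {p: positive_predictive_value(p) for p in prevalence_levels}

for prevalence, ppv in ppv_by_prevalence.items():
    print(f"prevalence {prevalence:>4.0%} -> positive predictive value {ppv:.1%}")
```

Under these assumptions, the correct answer climbs from 50 percent at a prevalence of 1 percent to roughly 92 percent at a prevalence of 10 percent, and it only passes 95 percent once prevalence reaches 25 percent.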
Why would this be the case?
Why would highly trained diagnosticians ignore rates of prevalence despite their importance in understanding test outcomes? Agoritsas suggested that these errors may occur because physicians (1) are unaware of the impact of prevalence on test outcome, (2) have a poor understanding of the basic statistical properties of diagnostic tests, or (3) have difficulty applying the basic arithmetic underlying Bayesian probability.
Perhaps. Yet this sample was derived from all 2745 physicians then practicing in Geneva, Switzerland. It is hard to imagine that the majority of such a large population of health care providers would be unaware of disorder prevalence, diagnostic statistics, or basic arithmetic.
Instead, it remains probable that these physicians were using a more implicit or subjective method of estimating their confidence in the outcome of the diagnostic test. Specifically, these physicians were most likely making a common error in judgement called the base rate fallacy.
In the next post in this series, we will discuss the base rate fallacy.
- Agoritsas, T., Courvoisier, D. S., Combescure, C., Deom, M., & Perneger, T. V. (2011). Does prevalence matter to physicians in estimating post-test probability of disease? A randomized trial. Journal of General Internal Medicine, 26, 373-378.
- Steurer, J., Fischer, J. E., Bachmann, L. M., Koller, M., & ter Riet, G. (2002). Communicating accuracy of tests to general practitioners: A controlled study. British Medical Journal, 324, 824-826.