- "Probability, like time, is a concept invented by humans, and humans have to bear the responsibility for the obscurities that attend it." (John Archibald Wheeler)
We have been deconstructing the Androgen Deficiency in Aging Males (ADAM) questionnaire and measuring its worth in identifying men who may be experiencing low testosterone. Based on data from its source publication, the ADAM’s ability to accurately predict which men have low testosterone, its positive predictive value, is about 42 percent.
A positive predictive value of 42 percent means the ADAM will be wrong more often than it is right. If the ADAM predicts you have low testosterone, it is a safe bet that you do not. However, because 1 out of 4 men in the original ADAM study had low testosterone, the ADAM still outperformed blind guessing by more than 15 percentage points.
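The gap between the ADAM's predictive value and simple guessing can be checked directly from the two figures cited above (a minimal sketch, using only the numbers in this post):

```python
# Figures from the original ADAM study, as cited above.
ppv = 0.42        # positive predictive value of a positive ADAM result
base_rate = 0.25  # 1 out of 4 men in the study had low testosterone

# How much better than guessing the base rate is a positive ADAM result?
improvement = ppv - base_rate
print(f"Improvement over guessing: {improvement:.0%}")  # prints 17%
```

Strictly speaking, the improvement is about 17 percentage points, which the post rounds down to "over 15 percent."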
Is an increase of 15 percentage points meaningful?
Well, that depends on what we are trying to predict, the relative costs of error in that prediction, and the original base rate from which we began.
If we are trying to improve our ability to predict a tornado, a 15-point increase in prediction may save lives. However, if tornadoes are rare in our geographical area, this method of prediction will also produce a greater number of false alarms. False alarms may cause unnecessary panic or, worse, lead those who live in the area to dismiss tornado warnings as usually wrong. Under these circumstances, we might prefer a more precise early warning system and deem the increase not beneficial.
If we are trying to predict the presence of a specific type of rare cancer, a 15-point increase in prediction will again produce many false alarms. However, if that cancer is highly treatable once detected and its treatment is not invasive, we might willingly accept the additional false alarms as the price of improved detection.
The point here is that our capacity to make a reasonable judgement depends on the meaning of the event to us, the probability of that event occurring, and our ability to accurately detect that event once it has occurred. Classic Bayesian probability, of course, cannot comment on the subjective relevance or moral weight of an event. It cannot know what we hold in our mind’s eye. Instead, its main focus is the general probability, or base rate, of an event and the specific instance of that event under consideration.
As human beings, however, we seem to fail to consider the base rate that underlies all events. We are seduced by the instance. The example given in the last post, based on a study by Agoritsas and his colleagues (2010), illustrates both our blindness and our easy seduction. To repeat:
- As a school doctor, you perform a screening test for a viral disease in a primary school.
- The properties of the test are very good: Among 100 children who have the disease, the test is positive in 99, and negative in only 1, and among 100 children who do not have the disease, the test is negative in 99, and falsely positive in only 1.
- On average, about 1 out of 100 children are infected without knowing it.
- If the test for one of the children is positive, what is the probability that he or she actually has this viral disease?
As mentioned in the last post, when this problem was offered to a sample of more than 1000 physicians practicing in Switzerland, the majority estimated the probability that the child actually had the viral disease at 95 percent or greater. This remained true even when the stated prevalence was manipulated to range anywhere from 1 to 95 percent or was left unspecified.
In fact, the answer to this riddle is 50 percent.
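The 50 percent figure follows directly from Bayes' theorem, using only the sensitivity, specificity, and prevalence stated in the problem (a minimal check of the arithmetic, not the study's own analysis):

```python
# Values given in the virus problem.
sensitivity = 0.99  # positive in 99 of 100 infected children
specificity = 0.99  # negative in 99 of 100 healthy children
prevalence = 0.01   # about 1 in 100 children infected

# Bayes' theorem: P(disease | positive test)
true_positives = sensitivity * prevalence           # 0.0099
false_positives = (1 - specificity) * (1 - prevalence)  # 0.0099
ppv = true_positives / (true_positives + false_positives)
print(f"P(disease | positive) = {ppv:.2f}")  # prints 0.50
```

Because the true-positive and false-positive streams are exactly equal in size, a positive result is a coin flip.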
The high sensitivity and specificity of the diagnostic test are tempered by the low prevalence of the virus. Although the test is highly accurate, it still produces a relatively large number of false positives because far more children do not have the virus (99 out of 100) than have it (1 out of 100).
The authors of this study highlight that the improper use of probability may result in medical error. If diagnostic error carries biomedical consequences, then the tendency to ignore prevalence or base rates goes beyond a curious phenomenon of human decision making; it becomes a potentially harmful event.
In the psychology literature, the general disregard of base rate information has long been a focus of study. Meehl and Rosen (1955) offered an early exploration of the importance of base rates or, more specifically, the lack of base rate information in most psychological tests. Considerable experimental research by Kahneman and Tversky in the 1970s documented systematic flaws in reasoning during decision-making tasks. However, one of the more comprehensive and influential articles on base rate errors was written by Bar-Hillel in 1980.
Bar-Hillel labelled the tendency to ignore information about the historical occurrence of an event the base-rate fallacy. Her interest was in better understanding the circumstances under which base rate errors were most likely to occur.
Bar-Hillel did not see the base-rate fallacy as inevitable. Instead, she demonstrated that its influence could be reduced through manipulation of how information was presented and, more importantly, by increasing the relevance of base rate information. Bar-Hillel argued that if we deem information as possessing low relevance then we tend to disregard that information. It is not that we are unaware or ignorant of base rate information. On the contrary, she argued, we disregard this information because we strongly feel it should be disregarded.
The results of the Agoritsas study clearly demonstrate that the majority of physicians do not attend to a disorder’s general occurrence, or base rate, when making a clinical decision. They fail to do so either because of unawareness or, following Bar-Hillel, because they deem it of low importance.
In the case of the virus problem, the information provided was highly sparse and intended to focus on the importance of base rates. Real diagnostic problems, however, are complex and carry an abundance of possibly relevant information. Disregarding some information is an important step in pruning a problem to its smallest set of possible diagnoses. Determining that the probability of a correct diagnosis is 50 percent given the outcome of a specific test makes complete sense in Bayesian logic. Yet when a definitive yes or no response is required, as in health care, judicial decisions, or a marriage proposal, it is not overly helpful.
When asked to decide whether a child is positive for a virus, one must decide. You cannot 50 percent decide. You cannot treat a child with a half-measure. In the Agoritsas study, those physicians surveyed may have intuitively moved past the question of probability and toward the final goal of clinical action. For these physicians, if a child tests positive for a virus, they will choose to treat that child. Therefore, while it may be true that their answer to the question, as posed, was incorrect, the course of action that stemmed from the incorrect answer may have been consistent with those physicians’ method of practice.
Another way of thinking about the physicians’ process of decision making is that they chose to disregard the existing virus base rate and, instead, inserted a prior probability of perfect uncertainty: before the test, the child is assumed equally likely to have the disease as not.
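Substituting that flat 50-50 prior into the same Bayes calculation (a sketch under that assumption, not the study's own analysis) shows why a positive result then feels nearly conclusive:

```python
# Same test properties as the virus problem.
sensitivity = 0.99
specificity = 0.99
uniform_prior = 0.5  # perfect uncertainty: disease and no disease equally likely

# Bayes' theorem with the flat prior in place of the true 1 percent prevalence.
true_positives = sensitivity * uniform_prior            # 0.495
false_positives = (1 - specificity) * (1 - uniform_prior)  # 0.005
ppv = true_positives / (true_positives + false_positives)
print(f"P(disease | positive) = {ppv:.2f}")  # prints 0.99
```

Under this prior of perfect uncertainty, the positive predictive value climbs to 99 percent, which lands close to the physicians' typical answer of 95 percent or greater.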
If you think about it, most of us approach common day-to-day decisions in this way. Our base rates are subjective and tend to follow our personal history of exposure to certain events. If a problem is novel to us, we may opt for a prior probability of perfect uncertainty, as did the physicians in the virus problem. Across time, however, as we accumulate personal history of the same repeated event, we may start to adjust our prior probability rate.
A prior probability of perfect uncertainty is quite allowable in Bayesian probability. In fact, under Bayesian inference, specifying some prior is mandatory (more on this in later posts). Bayesian probability was originally designed as a method of reasoning about unknown events and, ironically, it was this subjective quality of the approach that led to its being disfavored in the years following its publication.
Given the pervasive nature of the base-rate fallacy, the strong push in medical practice to treat any possible disorder, and that meeting patients’ needs is correlated with patient satisfaction, it is very likely that any physician who obtains a positive test result will move toward treatment even in the midst of high false positive rates.
- Bar-Hillel, M. (1980). The base-rate fallacy in probability judgments. Acta Psychologica, 44, 211-233.
- Wheeler, J. A. (1990). Information, physics, quantum: The search for links. In W. H. Zurek (Ed.), Complexity, entropy, and the physics of information: The proceedings of the 1988 workshop on complexity, entropy, and the physics of information held May-June, 1989, in Santa Fe, New Mexico (pp. 3-28). Redwood City, CA: Addison-Wesley.