19. Sick Versus Slick: 6. Sensitivity, Specificity, and Predictive Values

In this series we have been discussing male menopause, or Androgen Deficiency in Aging Males (ADAM), as defined by Morley and his colleagues in 2000.  In our last post, we reviewed the ADAM questionnaire, also created by Morley, and its ability (or inability) to identify men who may be suffering from testosterone deficiency.

Despite over 400 citations and its popularity among pharmaceutical and other commercial medical websites, the ADAM questionnaire is not recommended as a method of detecting testosterone deficiency.  Or at least that is the opinion of the Endocrine Society (ES), the International Society of Andrology (ISA), the International Society for the Study of the Aging Male (ISSAM), the European Association of Urology (EAU), the European Academy of Andrology (EAA), and the American Society of Andrology (ASA).

Pray tell, how is it that most, if not all, learned societies focusing on men’s health discourage indiscriminate use of the ADAM questionnaire, yet that same questionnaire is strongly promoted by those who peddle testosterone products? In short, the ADAM questionnaire tends to over-diagnose or over-predict the presence of testosterone deficiency.  Its tendency to over-diagnose makes the ADAM questionnaire a limited clinical tool, but that same over-diagnostic tendency makes it a superb marketing tool.  It all depends on what you are trying to achieve: optimal men’s health or optimal sales of men’s testosterone products.

Now I am not focusing on the ADAM questionnaire because it is egregious.  It is, in fact, no better or worse than a thousand other relatively short questionnaires that attempt to distill complex disorders down to simplistic outcomes.  I am focusing on it because it is popular.  It is ubiquitous.  And it is used uncritically.

In the last post, we tried to make this argument by discussing the concepts of sensitivity, specificity, true positives, and false positives among diagnostic tests.  It might be helpful to illustrate these diagnostic statistics with actual numbers.  And so I shall.

As described before, Morley et al. administered their questionnaire to a sample of 316 Canadian physicians and measured these physicians’ testosterone levels.  Twenty-five percent of this physician sample had bioavailable testosterone levels lower than 70 ng/dL and were deemed to be hypogonadal.  In their publication, Morley and colleagues only provided percentages and did not give the frequency or count of physicians who fell into each diagnostic category.  However, because we know how many physicians had low levels of testosterone (25 percent), and the sensitivity (88 percent) and specificity (60 percent) of the diagnostic questionnaire, it is easy to estimate how many physicians fell into each group.  We suspect the numbers looked like this:

Fig 19-1
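For readers who like to check the arithmetic, here is a rough sketch of how such an estimate can be made.  It is my reconstruction, not Morley's method.  It assumes 78 of the 316 physicians had low testosterone (about 25 percent, and the split implied by the counts quoted later in this post), a sensitivity of 88 percent, and a specificity of 60 percent.  The function name and the rounding to whole physicians are my own choices.

```python
# Sketch: rebuild an approximate 2x2 table from summary percentages.
# Assumptions (not stated in the original paper): 78 of 316 physicians
# had low testosterone, and counts are rounded to whole physicians.

def estimate_counts(n_low, n_normal, sensitivity, specificity):
    """Estimate the four diagnostic cells from group sizes and test rates."""
    true_pos = round(n_low * sensitivity)      # low testosterone, ADAM positive
    false_neg = n_low - true_pos               # low testosterone, ADAM negative
    true_neg = round(n_normal * specificity)   # normal testosterone, ADAM negative
    false_pos = n_normal - true_neg            # normal testosterone, ADAM positive
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "false_pos": false_pos,
    }

counts = estimate_counts(n_low=78, n_normal=316 - 78,
                         sensitivity=0.88, specificity=0.60)
print(counts)
```

Under these assumptions the sketch reproduces the counts used in the rest of this post: 69 true positives, 9 false negatives, 143 true negatives, and 95 false positives.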

Sensitivity and Specificity

In diagnostic testing, sensitivity and specificity represent two important components of a test and help us to make informed decisions about the quality of that test.

Fig 2 19b

Sensitivity reflects the relationship between the diagnostic test and the presence of the condition or disorder of interest.  Those who possess the disorder and are correctly identified represent a true positive result.  Those who possess the disorder and are incorrectly identified represent a false negative result.  In the ADAM study, 69 physicians had true positive results and 9 physicians had false negative results.  The ADAM’s sensitivity can be determined by dividing the number of true positive cases by the total number of men who had low levels of testosterone.

Fig 3 19b

Specificity is concerned with the relationship between a diagnostic test and the absence of the condition or disorder of interest.  Those who do not possess the disorder and are correctly identified represent what is called a true negative result.   Those who do not possess the disorder and are incorrectly identified represent a false positive outcome. In the ADAM study, 143 physicians had true negative results and 95 physicians had false positive results. The ADAM’s specificity is determined by dividing the true negative individuals by the total number of physicians who had normal levels of testosterone.
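The two divisions described above can be written out as a short sketch.  The counts are the estimated ones from this post.  The function names are mine, chosen only for illustration.

```python
# Sketch: sensitivity and specificity from the estimated ADAM counts.

def sensitivity(true_pos, false_neg):
    """Share of people WITH the disorder whom the test flags as positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Share of people WITHOUT the disorder whom the test clears as negative."""
    return true_neg / (true_neg + false_pos)

adam_sensitivity = sensitivity(true_pos=69, false_neg=9)    # 69 / 78
adam_specificity = specificity(true_neg=143, false_pos=95)  # 143 / 238

print(f"Sensitivity: {adam_sensitivity:.1%}")  # 88.5%
print(f"Specificity: {adam_specificity:.1%}")  # 60.1%
```

Note that each calculation looks at only one group at a time.  Sensitivity uses only the men with low testosterone, and specificity uses only the men with normal testosterone.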

Positive and Negative Predictive Value

Sensitivity asks the question:  When a disorder is present, how well does our test predict that disorder’s presence? Specificity, on the other hand, asks the question:  When a disorder is absent, how well does our test predict that disorder’s absence?

As highlighted in the last post of this series, it is very easy to confuse sensitivity or specificity with predictive accuracy.  Consider this problem:

  • Among men with low testosterone, we know that the ADAM questionnaire correctly identifies the presence of low testosterone approximately 90 percent of the time.  If a man is identified as having low testosterone on the ADAM questionnaire, what is the probability that he has low testosterone?

Because of how human cognition operates, every fiber of our being wants to answer this question as: “Approximately 90 percent.”

Let me ask the same question again but in a different context:

  • Three little kittens have lost their mittens.  We find a mitten.  What is the probability it belongs to a kitten?

I know there is a part of you that wants to guess, but the reality is that there is not enough information to say who probably owns the lost mitten.  Just as we need to know how many kittens and non-kittens have lost their mittens before we can answer this question, we need to know how many men with low testosterone and how many men with normal testosterone exist before we can judge how accurately the ADAM predicts the presence of low testosterone.
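To make the point concrete, here is a small sketch that holds the questionnaire fixed and changes only the mix of men being tested.  The three groups of 1,000 men are hypothetical, invented purely for illustration.  The sensitivity (88 percent) and specificity (60 percent, which is what the counts in this post imply) are treated as constants, which is itself a simplifying assumption.

```python
# Sketch: the same test gives very different answers in different crowds.
# The group sizes below are hypothetical, not data from any study.

def share_of_positives_that_are_true(n_low, n_normal, sensitivity, specificity):
    """Of all positive results, what fraction belong to men with low testosterone?"""
    true_pos = n_low * sensitivity             # low testosterone, flagged
    false_pos = n_normal * (1 - specificity)   # normal testosterone, flagged
    return true_pos / (true_pos + false_pos)

hypothetical_groups = [(50, 950), (250, 750), (500, 500)]  # (low, normal)

shares = []
for n_low, n_normal in hypothetical_groups:
    share = share_of_positives_that_are_true(n_low, n_normal, 0.88, 0.60)
    shares.append(share)
    print(f"{n_low} low / {n_normal} normal: {share:.0%} of positives are correct")
```

With these made-up groups, the share of correct positive results runs from roughly 10 percent to roughly 69 percent, even though the questionnaire itself never changed.  None of the answers is "approximately 90 percent."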

Positive predictive value refers to the degree to which a positive result on a diagnostic test is correct.  This is a comparison between the number of true positive results and the total number of positive predictions.  The ADAM predicted that 164 physicians had low testosterone but was correct in only 69 cases, a positive predictive value of about 42 percent.

Fig 4 19b

The complement to positive predictive value is negative predictive value, or the degree to which a negative result on a diagnostic test is correct.  The ADAM predicted that 152 physicians had normal levels of testosterone and was correct in 143 cases, a negative predictive value of about 94 percent.

Fig 5 19b
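Both predictive values can be checked with the same estimated counts.  This is only a sketch, and the function names are my own.

```python
# Sketch: positive and negative predictive value from the estimated ADAM counts.

def positive_predictive_value(true_pos, false_pos):
    """Share of positive test results that are correct."""
    return true_pos / (true_pos + false_pos)

def negative_predictive_value(true_neg, false_neg):
    """Share of negative test results that are correct."""
    return true_neg / (true_neg + false_neg)

adam_ppv = positive_predictive_value(true_pos=69, false_pos=95)   # 69 / 164
adam_npv = negative_predictive_value(true_neg=143, false_neg=9)   # 143 / 152

print(f"Positive predictive value: {adam_ppv:.1%}")  # 42.1%
print(f"Negative predictive value: {adam_npv:.1%}")  # 94.1%
```

Unlike sensitivity and specificity, each of these calculations mixes the two groups of men: the denominator is everyone who received a given test result, whatever their actual testosterone level.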

So, although the ADAM has a high degree of sensitivity, its ability to predict the presence of low testosterone is modest.  As well, a comparison of the ADAM’s positive and negative predictive values suggests that a negative result on the ADAM offers more predictive accuracy than a positive result.  That is, the ADAM is better at excluding the presence of low testosterone than it is at confirming the presence of low testosterone.

Another Way of Looking at Predictive Values

We know that in the current sample 25 percent of physicians tested positive for low testosterone.  We also know that among those physicians who had a positive ADAM result, 42 percent also had low testosterone on blood testing.  This is the positive predictive value.  Or, put another way, in this sample, the probability that a physician actually had a low testosterone level given a positive ADAM result was 42 percent.

There is another way to arrive at this same predictive value without needing to always break down the number of people in each diagnostic category.  This method has its beginning in work initially conducted by Thomas Bayes.  Bayes was an English Presbyterian minister whose thoughts on probability and prediction were published posthumously in 1763.  Bayes’ Theorem, named in his honor, holds a special place in diagnostic testing.

Bayes’ theorem will be the topic of our next post in this series.