The Effect of Disease Prevalence on the Predictive Value of Diagnostic Tests
Anthony A. Killeen, MD, PhD
Standard Metrics of Diagnostic Tests
When considering the diagnostic utility of tests that distinguish persons with disease from normal persons, reference is frequently made to four common metrics: the sensitivity, specificity, positive predictive value, and negative predictive value of a test. Traditionally, these concepts are explained with a simple matrix that defines the terms true positive (TP), false positive (FP), true negative (TN), and false negative (FN) as follows.

                 Test Positive    Test Negative
Disease          TP               FN
Normal           FP               TN
Using these terms, the following metrics can be defined.
Sensitivity = TP/(TP+FN)
Specificity = TN/(TN+FP)
Positive predictive value (PPV) = TP/(TP+FP)
Negative predictive value (NPV) = TN/(TN+FN)
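As a minimal sketch in Python, the four definitions translate directly into code. The class name `TwoByTwo` and the counts used below are illustrative assumptions, not data from any study. They are the expected counts in a hypothetical group of 1,000 people with 10% disease prevalence, tested with a sensitivity and specificity of 95%. The sketch does not guard against division by zero when a row or column of the matrix is empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from the 2x2 matrix: rows are disease status, columns are test result."""
    tp: int  # diseased, test positive
    fn: int  # diseased, test negative
    fp: int  # normal, test positive
    tn: int  # normal, test negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# Hypothetical counts: 1,000 people, 100 with disease, sensitivity and specificity 95%.
counts = TwoByTwo(tp=95, fn=5, fp=45, tn=855)
print(f"Sensitivity {counts.sensitivity():.2f}")  # 0.95
print(f"Specificity {counts.specificity():.2f}")  # 0.95
print(f"PPV {counts.ppv():.2f}")                  # 0.68
print(f"NPV {counts.npv():.3f}")                  # 0.994
```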
What is not so clear from these definitions is the effect of disease prevalence on the predictive values of test results, particularly the more commonly used positive predictive value. Although standard textbooks of clinical pathology note that disease prevalence influences the positive predictive value, they seldom explain this relationship.
Effect of Overlapping Values
Given the definition of PPV above, it is apparent that this metric is a function of the TP and FP values. If there were no false positives, the PPV would be TP/TP, or 100%. This would occur if the test could reliably distinguish normal persons from persons with disease without any overlap of values. However, this is rarely the case with laboratory results that fall on a continuous scale. Almost always there is some degree of overlap between results from normal and diseased subjects. For example, low levels of blood markers of myocardial infarction (MI) can be detected in healthy subjects, and these may overlap the lower range of values found in patients with MI. When there is such overlap, a cutoff point must be established to distinguish a positive from a negative result. It is worth noting that even results reported as "positive" or "negative" by analytical instruments are generated from continuous scales by applying a cutoff point. An example is the reporting of HIV-1 antibody screening tests. In a popular assay system, these antibodies are detected by ELISA reactions that change the absorbance of the sample, as measured by a spectrophotometer. The lower end of absorbance readings in HIV-1 positive patients overlaps those of normal subjects. The computer in the analytical instrument defines a cutoff absorbance and reports each result as positive or negative relative to that value. Given that most assays show some overlap of values between normal and diseased subjects, it is very unusual for an assay to produce no FP results.
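The trade-off created by overlapping values can be illustrated with a short Python sketch. It assumes, purely for illustration, that results in normal and diseased subjects follow normal distributions with hypothetical means of 10 and 16 and a standard deviation of 2 (arbitrary units). It also assumes that values at or above the cutoff are reported as positive. `statistics.NormalDist` from the Python standard library supplies the cumulative distribution function. The function name `performance_at_cutoff` is a hypothetical choice.

```python
from statistics import NormalDist

# Hypothetical model (an assumption for illustration, not data): results in
# arbitrary units are normally distributed in both groups, and the groups overlap.
NORMAL_SUBJECTS = NormalDist(mu=10.0, sigma=2.0)
DISEASED_SUBJECTS = NormalDist(mu=16.0, sigma=2.0)


def performance_at_cutoff(cutoff: float) -> tuple[float, float]:
    """Return (sensitivity, specificity) when results >= cutoff are reported positive."""
    # Sensitivity: fraction of diseased subjects whose result is at or above the cutoff.
    sensitivity = 1.0 - DISEASED_SUBJECTS.cdf(cutoff)
    # Specificity: fraction of normal subjects whose result falls below the cutoff.
    specificity = NORMAL_SUBJECTS.cdf(cutoff)
    return sensitivity, specificity


for cutoff in (12.0, 13.0, 14.0):
    sens, spec = performance_at_cutoff(cutoff)
    print(f"cutoff {cutoff:4.1f}: sensitivity {sens:.3f}, specificity {spec:.3f}")
```

Raising the cutoff from 12 to 14 increases specificity (fewer false positives) but lowers sensitivity (more false negatives). Normal distributions have unbounded tails, so under this assumed model no finite cutoff eliminates false positives entirely.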
Prevalence of Disease Influences Positive Predictive Value
There is another situation in which there are no false positive results: when everyone in the population has the disease. In this situation, every positive result would be a true positive. There could be no false positive results, and the PPV would be 100%. Conversely, if no one in the population had the disease, every positive result would be a false positive. There could be no true positives, and the PPV would be 0%. We can therefore conclude that disease prevalence influences the PPV by changing the relative numbers of true positive and false positive results, even though the sensitivity and specificity of the test are unchanged.
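The two limiting cases can be checked with expected counts. This Python sketch assumes a hypothetical population of 10,000 and the 95% sensitivity and specificity used in the Bayesian example in the next section. The function name `expected_counts` is illustrative. Sensitivity and specificity are held fixed, so only the prevalence changes the mix of true and false positives.

```python
def expected_counts(population: int, prevalence: float,
                    sensitivity: float, specificity: float) -> tuple[float, float]:
    """Expected numbers of true and false positives in a population of a given size."""
    diseased = population * prevalence
    normal = population - diseased
    true_positives = diseased * sensitivity          # diseased who test positive
    false_positives = normal * (1.0 - specificity)   # normals who test positive
    return true_positives, false_positives


# Note: PPV is undefined if there are no positives at all; this sketch does not guard that case.
for prevalence in (0.0, 0.1, 1.0):
    tp, fp = expected_counts(10_000, prevalence, sensitivity=0.95, specificity=0.95)
    ppv = tp / (tp + fp)
    print(f"prevalence {prevalence:4.0%}: TP {tp:7.1f}, FP {fp:7.1f}, PPV {ppv:.0%}")
```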
Bayesian Analysis to Determine Positive Predictive Value
The effect of disease prevalence may be determined by performing a Bayesian analysis. Consider a disease with a prevalence of 10% in a given population. A diagnostic test for the disease has both a sensitivity and a specificity of 95%. What is the positive predictive value of the test?
Using Bayesian analysis, we can determine the probability that a person with a positive test result has the disease, as follows. For those not familiar with Bayesian calculations, the steps are outlined here and shown in the following tables.
The two mutually exclusive possibilities are that the person has disease or is normal (row A). The a priori probability of having disease is 10% (0.1) and of being normal is 90% (0.9) (row B). If a person has the disease, the probability of having a positive test result is 0.95. If a person is normal, the probability of having a positive test result is 0.05 (row C). These probabilities are obtained from the sensitivity and specificity values above. If the sensitivity is 95%, the false negative rate is 5%. If the specificity is 95%, the false positive rate is 5%.
The joint probabilities (row D) are obtained by multiplying the values in rows B and C. The final probability of each of the two mutually exclusive states is obtained by dividing each value in row D by the sum of the two values in row D, as shown in row E.
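In symbols, the procedure in rows B to E is Bayes' theorem. Write p for the prevalence (row B). The probability of a positive result is the sensitivity in persons with disease and 1 − specificity in normal persons (row C). The final probability of disease given a positive result, which is the PPV, is therefore the following, where a and b are the joint probabilities in row D:

$$
\mathrm{PPV} = P(\text{Disease} \mid +) = \frac{p \times \text{Sensitivity}}{p \times \text{Sensitivity} + (1 - p)(1 - \text{Specificity})} = \frac{a}{a + b}
$$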
A                                             Disease             Normal
B    A priori probability                     0.1                 0.9
C    Probability of a positive test result    0.95                0.05
D    Joint probability                        0.1 x 0.95 = a      0.9 x 0.05 = b
E    Final probability                        a/(a+b) = 0.68      b/(a+b) = 0.32
In this case, with a disease prevalence of 10%, a person with a positive test result has a 68% probability of having disease, and a 32% probability of being normal.
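The same steps can be written as a small Python function that works for any set of mutually exclusive states. This is a sketch. The function name `bayes_update` and the state labels are illustrative choices, not part of any standard library.

```python
def bayes_update(priors: dict[str, float], p_positive: dict[str, float]) -> dict[str, float]:
    """Posterior probability of each mutually exclusive state given a positive result."""
    # Row D: joint probability = a priori probability (row B) x P(positive) (row C)
    joint = {state: priors[state] * p_positive[state] for state in priors}
    total = sum(joint.values())  # a + b
    # Row E: divide each joint probability by the sum of the joint probabilities
    return {state: value / total for state, value in joint.items()}


posterior = bayes_update(
    priors={"disease": 0.1, "normal": 0.9},        # row B
    p_positive={"disease": 0.95, "normal": 0.05},  # row C: sensitivity, 1 - specificity
)
print({state: round(p, 2) for state, p in posterior.items()})
# {'disease': 0.68, 'normal': 0.32}
```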
In a population in which the disease prevalence is 0.1% (0.001), the calculations are as follows:
A                                             Disease             Normal
B    A priori probability                     0.001               0.999
C    Probability of a positive test result    0.95                0.05
D    Joint probability                        0.001 x 0.95 = a    0.999 x 0.05 = b
E    Final probability                        a/(a+b) = 0.019     b/(a+b) = 0.981
In this case, with a disease prevalence of 0.1%, a person with a positive test result has a 1.9% probability of having disease and a 98.1% probability of being normal.
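Comparing the two joint probabilities shows why the PPV falls so sharply. At this low prevalence, false positives greatly outnumber true positives:

$$
a = 0.001 \times 0.95 = 0.00095, \qquad b = 0.999 \times 0.05 = 0.04995
$$

$$
\frac{a}{a+b} = \frac{0.00095}{0.05090} \approx 0.019, \qquad \frac{b}{a} \approx 53
$$

Roughly 53 positive results come from normal persons for every one that comes from a person with the disease.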
The effect of disease prevalence on the positive predictive value for a test with sensitivity and specificity of 95% is shown in the following table.
Prevalence    PPV
0.1%          1.9%
1%            16%
10%           68%
20%           83%
50%           95%
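The table can be reproduced with a few lines of Python. This sketch applies the same Bayesian calculation in closed form. The function name `positive_predictive_value` is an illustrative choice. Printed to one decimal place, the values agree with the table to within rounding. For example, a prevalence of 10% gives 67.9%.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """PPV from prevalence, sensitivity and specificity (Bayes' theorem in closed form)."""
    true_pos = prevalence * sensitivity                   # joint probability a
    false_pos = (1.0 - prevalence) * (1.0 - specificity)  # joint probability b
    return true_pos / (true_pos + false_pos)


for prevalence in (0.001, 0.01, 0.10, 0.20, 0.50):
    ppv = positive_predictive_value(prevalence, sensitivity=0.95, specificity=0.95)
    print(f"prevalence {prevalence:5.1%}   PPV {ppv:5.1%}")
```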
Conclusion
The prevalence of a disease in the population has an important influence on the positive predictive value of a diagnostic test. The higher the disease prevalence, the more likely it is that a person with a positive test result has the disease, and the less likely it is that a positive result is a false positive. The effect of disease prevalence can be readily determined by Bayesian analysis.
Copyright (C) Anthony A. Killeen, 1999-2007. All rights reserved.