False Positive Paradox Calculator
A test that is 99% accurate can still be wrong most of the time when the condition it screens for is rare. This is the false positive paradox, a consequence of Bayes' theorem. Enter the prevalence of the condition, the test's sensitivity (true positive rate), and its specificity (true negative rate) to see the real probability that a positive result means you actually have the condition, plus a full breakdown of true and false positives across a reference population of 10,000 people.
What is the false positive paradox?
The false positive paradox (also called the base rate fallacy) is the counterintuitive result that a test with high accuracy can produce more false alarms than genuine detections when the condition being screened for is rare. Imagine a disease that affects 1% of the population and a test with 99% sensitivity and 99% specificity. Tested on 10,000 people, the test correctly detects 99 of the 100 people who have the disease. But it also gives a false positive to 1% of the 9,900 healthy people, generating 99 false alarms. The result: of the 198 positive results, half are false, even though the test is 99% accurate. This is not a flaw in the test - it is a mathematical consequence of the low base rate.
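The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch, not the calculator's own code: the function name `population_breakdown` and its dictionary keys are illustrative, and it works with expected counts, so results can be fractional for other inputs.

```python
def population_breakdown(prevalence, sensitivity, specificity, population=10_000):
    """Return expected outcome counts for a reference population.

    Hypothetical helper for illustration; all rates are fractions in [0, 1].
    """
    affected = population * prevalence
    unaffected = population - affected
    true_positives = affected * sensitivity        # sick people flagged
    false_negatives = affected - true_positives    # sick people missed
    true_negatives = unaffected * specificity      # healthy people cleared
    false_positives = unaffected - true_negatives  # healthy people flagged
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "true_negatives": true_negatives,
        "false_negatives": false_negatives,
    }


# The example from the text: 1% prevalence, 99% sensitivity, 99% specificity.
counts = population_breakdown(prevalence=0.01, sensitivity=0.99, specificity=0.99)
positives = counts["true_positives"] + counts["false_positives"]
share_false = counts["false_positives"] / positives

for name, value in counts.items():
    print(f"{name}: {value:.0f}")
print(f"share of positives that are false: {share_false:.0%}")
```

Changing `prevalence` while leaving the other two arguments alone shows how quickly the balance between true and false positives shifts.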
Sensitivity, specificity, and base rate
Three numbers fully determine the predictive value of any binary test. Sensitivity (true positive rate) is the probability that someone who has the condition tests positive. Specificity (true negative rate) is the probability that someone who does not have the condition tests negative. Prevalence (base rate) is the fraction of the population that actually has the condition. The paradox arises when prevalence is low: even a small false positive rate (1 - specificity) applies to the large healthy majority, so the resulting false alarms can outnumber the true detections drawn from the small affected minority. Raising specificity reduces false alarms; raising sensitivity reduces missed cases - these are different levers for different goals.
How PPV is calculated using Bayes' theorem
The Positive Predictive Value (PPV) is the probability that a positive test result is correct, computed with Bayes' theorem: PPV = (sensitivity x prevalence) / [(sensitivity x prevalence) + (1 - specificity) x (1 - prevalence)]. The Negative Predictive Value (NPV) uses the same logic in reverse: NPV = (specificity x (1 - prevalence)) / [(specificity x (1 - prevalence)) + (1 - sensitivity) x prevalence]. The False Discovery Rate (FDR) is simply 1 - PPV: the share of positive results that are false alarms. All three metrics change with prevalence, which is why a test validated in a hospital population (high prevalence) may perform very differently when applied in general population screening (low prevalence).
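The formulas above translate directly into code. The sketch below is one possible implementation, with the function name `predictive_values` chosen for illustration; the 30% "hospital" and 0.1% "screening" prevalences are assumed values picked only to show the contrast.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Compute PPV, NPV, and FDR from the three inputs (fractions in [0, 1]).

    Hypothetical helper. Raises ZeroDivisionError when a denominator is zero,
    for example when no positive results are expected at all.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return {"ppv": ppv, "npv": npv, "fdr": 1 - ppv}


# Same test, two assumed settings: only the prevalence differs.
hospital = predictive_values(prevalence=0.30, sensitivity=0.99, specificity=0.99)
screening = predictive_values(prevalence=0.001, sensitivity=0.99, specificity=0.99)

print(f"hospital  PPV: {hospital['ppv']:.1%}  NPV: {hospital['npv']:.2%}")
print(f"screening PPV: {screening['ppv']:.1%}  NPV: {screening['npv']:.2%}")
```

Sensitivity and specificity are identical in both calls, so any difference in the printed values comes from prevalence alone.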
Real-world implications
The false positive paradox has practical consequences across medicine, security, and machine learning. In medical screening programs, a confirmatory test is almost always required after an initial positive result precisely because PPV in low-prevalence populations is low. In airport security, the prevalence of threats is so small (perhaps 0.01%) that even excellent sensors produce overwhelmingly more false alarms than true detections. In spam filtering, the consequence of a false positive (legitimate email in the spam folder) is usually worse than a false negative (spam in the inbox), so engineers tune specificity very high. Understanding PPV and the base rate helps you interpret any binary classification result - medical, statistical, or otherwise - without being misled by headline accuracy figures.
False positive paradox: example scenarios
| Scenario | Prevalence | Sensitivity | Specificity | PPV | Interpretation |
|---|---|---|---|---|---|
| Rare disease screening | 0.1% | 99% | 99% | 9.0% | 9 in 10 positives are false alarms |
| HIV (low-risk population) | 0.5% | 99% | 99% | 33% | Two out of three positives are false |
| Disease (1% prevalence) | 1% | 99% | 99% | 50% | Coin-flip odds even with 99% accuracy |
| Airport security (weapon) | 0.01% | 99% | 99% | 0.98% | Almost all alarms are false |
| Common infection screening | 10% | 95% | 95% | 68% | Moderate predictive value |
| High-prevalence population | 30% | 95% | 90% | 80% | Good PPV when condition is common |
| Spam filter (99.9% spec) | 1% | 99.9% | 99.9% | 91% | Near-perfect spec rescues PPV |
Positive Predictive Value (PPV) for different combinations of prevalence, sensitivity, and specificity. Even a highly accurate test yields poor PPV when the condition is rare.
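The PPV column can be recomputed from the other three columns. The following is a rough sketch that takes the scenario inputs from the table rows; the variable names and the `ppv` helper are illustrative rather than part of the calculator.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value via Bayes' theorem (inputs are fractions)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# (scenario, prevalence, sensitivity, specificity), copied from the table.
scenarios = [
    ("Rare disease screening", 0.001, 0.99, 0.99),
    ("HIV (low-risk population)", 0.005, 0.99, 0.99),
    ("Disease (1% prevalence)", 0.01, 0.99, 0.99),
    ("Airport security (weapon)", 0.0001, 0.99, 0.99),
    ("Common infection screening", 0.10, 0.95, 0.95),
    ("High-prevalence population", 0.30, 0.95, 0.90),
    ("Spam filter (99.9% spec)", 0.01, 0.999, 0.999),
]

results = {name: ppv(prev, sens, spec) for name, prev, sens, spec in scenarios}

for name, value in results.items():
    print(f"{name:<28} PPV = {value:.2%}")
```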
Frequently asked questions
How can a 99% accurate test be wrong most of the time?
Because accuracy alone does not account for how rare the condition is. When a disease affects only 1 in 1,000 people, a 1% false positive rate means 1 in 100 healthy people will trigger an alarm. Since there are far more healthy people than sick ones, those false alarms outnumber genuine detections, here by roughly ten to one. This is the false positive paradox, and it is a direct consequence of Bayes' theorem applied to low base rates.
What is the difference between sensitivity and specificity?
Sensitivity measures how well a test catches true cases: it is the fraction of people who actually have the condition that the test correctly identifies as positive. Specificity measures how well the test clears healthy people: it is the fraction of people without the condition that the test correctly identifies as negative. A test can be highly sensitive (catches almost all cases) but have low specificity (many false alarms), or vice versa. Both matter, and which is more important depends on the consequences of each type of error.
What is the Positive Predictive Value (PPV) and why does it matter more than sensitivity?
PPV is the probability that a positive test result is a true positive. Unlike sensitivity and specificity, which are fixed properties of the test, PPV changes with prevalence. A test with 99% sensitivity and 99% specificity has a PPV of about 50% at 1% prevalence, 9% at 0.1% prevalence, and 92% at 10% prevalence. PPV is what a patient or clinician actually needs to know after receiving a positive result: it is the answer to "given that I tested positive, what is the probability I actually have the condition?"
How do I reduce false positives without losing sensitivity?
The two main approaches are: (1) increase specificity, which directly reduces the false positive rate applied to the healthy majority, and (2) restrict testing to higher-risk groups where prevalence is higher, which improves PPV without changing the test at all. Requiring a confirmatory second test on all initial positives (sequential testing) is another common strategy in medicine, because the prior probability entering the second test is the PPV of the first, which can be far above the original prevalence even if the initial test has modest specificity.
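Sequential testing can be sketched by feeding the first test's PPV back in as the prior for the second. The example below makes a strong assumption: the two tests' errors are independent given the person's true status. If the confirmatory test repeats the same kind of error, the real gain is smaller. The helper name `ppv` and the chosen inputs are illustrative.

```python
def ppv(prior, sensitivity, specificity):
    """Probability of the condition after a positive result, given a prior."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)


prevalence = 0.001  # assumed base rate: 1 in 1,000

# First positive result: the prior is the population prevalence.
after_first = ppv(prevalence, sensitivity=0.99, specificity=0.99)

# Second positive result: the prior is now the PPV of the first test.
# Assumes the second test's errors are independent of the first's.
after_second = ppv(after_first, sensitivity=0.99, specificity=0.99)

print(f"after one positive:  {after_first:.1%}")
print(f"after two positives: {after_second:.1%}")
```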
What is the false discovery rate (FDR)?
The False Discovery Rate is the proportion of all positive results that are false alarms: FDR = 1 - PPV. If a test returns 100 positive results and 60 of them are false, the FDR is 60%. In genomics, machine learning, and multiple hypothesis testing, controlling the FDR (rather than the per-test error rate) is a standard statistical goal, made famous by the Benjamini-Hochberg procedure.
What is the base rate fallacy?
The base rate fallacy is the cognitive error of ignoring the prior probability (prevalence) of a condition when interpreting a test result, and focusing only on the test accuracy instead. It leads people to overestimate the probability that a positive result is real. The false positive paradox is the mathematical expression of what happens when you commit this error: you believe a 99% accurate test means a 99% chance of being sick, when the actual probability given a low base rate might be below 50%.