Accuracy Calculator
Enter the four confusion-matrix counts (true positives, true negatives, false positives, false negatives) to get accuracy, precision, recall, F1 score, specificity, false-positive rate, and prevalence in one click. Switch to the prevalence method to compute accuracy from sensitivity, specificity, and known prevalence, or use the percent-error method to compare a measured value with a reference value. All formulas are shown step by step.
What is accuracy and how is it calculated?
Accuracy is the fraction of predictions a model or test gets right out of all predictions made. In binary classification, a prediction falls into one of four cells: a true positive (TP) when the model correctly flags a positive case, a true negative (TN) when it correctly rejects a negative case, a false positive (FP) when it incorrectly flags a negative case, and a false negative (FN) when it misses a positive case. Accuracy = (TP + TN) / (TP + TN + FP + FN). Because it counts all correct predictions regardless of class, accuracy can be misleading when the classes are imbalanced: a model that always predicts "negative" can score 99% accuracy on a dataset where only 1% of cases are positive.
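As a minimal sketch of the formula above, the accuracy calculation could look like this in Python. The function name `accuracy` and the example counts are illustrative assumptions, not part of the calculator itself.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct: (TP + TN) / total."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one count must be non-zero")
    return (tp + tn) / total


# Hypothetical counts, chosen only for illustration.
print(accuracy(tp=40, tn=50, fp=5, fn=5))  # 0.9

# The always-"negative" model on a dataset with 1% positives:
print(accuracy(tp=0, tn=99, fp=0, fn=1))   # 0.99
```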
Precision, recall, F1 score, and specificity explained
Precision (also called Positive Predictive Value) asks: of all cases the model labeled positive, what fraction truly are positive? Precision = TP / (TP + FP). Recall (also called sensitivity or the true-positive rate) asks: of all actual positives, what fraction did the model find? Recall = TP / (TP + FN). The F1 score is the harmonic mean of precision and recall, giving equal weight to both: F1 = 2 x (Precision x Recall) / (Precision + Recall). Specificity (the true-negative rate) asks: of all actual negatives, what fraction did the model correctly reject? Specificity = TN / (TN + FP). The false-positive rate is 1 minus specificity. In many screening settings recall matters more because missing a positive (false negative) has higher cost than raising a false alarm. In spam filtering, precision often matters more because delivering legitimate emails to the spam folder is costly.
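The definitions above can be sketched in a few lines of Python. The helper names `safe_div` and `classification_metrics` are hypothetical, and returning `None` when a denominator is zero is one possible convention, not necessarily what this calculator does.

```python
from typing import Dict, Optional


def safe_div(num: float, den: float) -> Optional[float]:
    """Return num / den, or None when the denominator is zero (assumed convention)."""
    return num / den if den else None


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, Optional[float]]:
    precision = safe_div(tp, tp + fp)      # TP / (TP + FP)
    recall = safe_div(tp, tp + fn)         # TP / (TP + FN)
    specificity = safe_div(tn, tn + fp)    # TN / (TN + FP)
    # False-positive rate is 1 minus specificity.
    fpr = None if specificity is None else 1 - specificity
    if precision is None or recall is None or precision + recall == 0:
        f1 = None
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": specificity,
        "false_positive_rate": fpr,
    }


# Hypothetical counts, chosen only for illustration.
for name, value in classification_metrics(tp=30, tn=50, fp=10, fn=10).items():
    print(name, None if value is None else round(value, 3))
```

With these counts, precision, recall, and F1 all come out at 0.75, specificity at roughly 0.833, and the false-positive rate at roughly 0.167.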
The three accuracy calculation methods
This calculator offers three methods. The confusion-matrix method computes accuracy and all related metrics directly from TP, TN, FP, and FN counts - ideal for evaluating machine-learning classifiers or diagnostic tests run on a labeled dataset. The prevalence-adjusted method is used in clinical epidemiology when the proportion of positive cases in your test sample differs from the real-world prevalence. The formula Accuracy = (Sensitivity x Prevalence) + (Specificity x (1 - Prevalence)) re-weights the results so they apply to the actual population. The percent-error method is used in physical sciences and engineering: Percent error = |Observed - Accepted| / |Accepted| x 100, and Percent accuracy = 100 - Percent error.
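The two formula-based methods can be sketched as follows. The function names are illustrative assumptions, and the input values reuse the worked examples from the FAQ on this page (75% sensitivity, 90% specificity, 10% prevalence; 9.8 measured against 10.0 accepted).

```python
def prevalence_adjusted_accuracy(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Accuracy = Sensitivity x Prevalence + Specificity x (1 - Prevalence).

    All three inputs are proportions between 0 and 1.
    """
    for name, value in (
        ("sensitivity", sensitivity),
        ("specificity", specificity),
        ("prevalence", prevalence),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return sensitivity * prevalence + specificity * (1 - prevalence)


def percent_error(observed: float, accepted: float) -> float:
    """Percent error = |Observed - Accepted| / |Accepted| x 100."""
    if accepted == 0:
        raise ValueError("accepted value must be non-zero")
    return abs(observed - accepted) / abs(accepted) * 100


def percent_accuracy(observed: float, accepted: float) -> float:
    """Percent accuracy = 100 - percent error."""
    return 100 - percent_error(observed, accepted)


print(round(prevalence_adjusted_accuracy(0.75, 0.90, 0.10), 4))  # about 0.885
print(round(percent_error(9.8, 10.0), 4))                        # about 2.0
print(round(percent_accuracy(9.8, 10.0), 4))                     # about 98.0
```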
Class imbalance and when to use other metrics
Accuracy is a poor single metric when classes are imbalanced - for example, fraud detection (0.1% frauds) or rare-disease screening. In those cases, prefer the F1 score, the precision-recall AUC, or the Matthews correlation coefficient (MCC). The ROC AUC summarizes performance across all decision thresholds rather than a single one, but it can look optimistic when positives are rare, which is why the precision-recall AUC is often preferred there. The confusion-matrix method here gives you precision, recall, F1, specificity, and false-positive rate so you can assess model quality beyond raw accuracy. If prevalence is below 10%, pay particular attention to precision (positive predictive value) rather than recall alone: when positives are rare, even a small false-positive rate can produce more false alarms than true detections.
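MCC is not among the metrics this calculator reports, so here is a minimal sketch of how it could be computed from the same four counts. The function name `mcc`, the example counts, and the choice to return 0 when the denominator is zero (a common convention) are assumptions.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, ranging from -1 to +1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when any row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical fraud-like data: 10 positives in 10,000 cases (0.1%).
tp, tn, fp, fn = 5, 9985, 5, 5
print(round((tp + tn) / (tp + tn + fp + fn), 4))  # accuracy
print(round(mcc(tp, tn, fp, fn), 4))              # MCC
```

With these counts accuracy is 99.9%, while MCC is only about 0.50, which reflects that half of the frauds are missed and half of the alerts are false.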
Classification performance thresholds (general guidance)
| Accuracy range | Interpretation | Typical use cases |
|---|---|---|
| 95% to 100% | Excellent | Production ML models, clinical diagnostics |
| 85% to under 95% | Good | Most business applications, research models |
| 70% to under 85% | Fair | Baseline models, early prototypes |
| Below 70% | Poor | Needs improvement; check for class imbalance |
Thresholds vary heavily by application. Medical screening demands higher recall; spam filters may tolerate lower recall for higher precision.
Frequently asked questions
What does accuracy mean in statistics?
In statistics, accuracy is how close a result is to the true value. In classification (machine learning, diagnostic testing), it is the proportion of all predictions that are correct: (TP + TN) / (TP + TN + FP + FN). In experimental science, accuracy is quantified as 100% minus the percent error between a measured value and the accepted true value.
What is a good accuracy for a classification model?
There is no universal threshold because it depends on the application, the class balance, and the cost of errors. As a rough guide, 95% and above is considered excellent for most tasks, 85% to 95% is good, 70% to 85% is fair, and below 70% usually signals a model that needs improvement. For imbalanced datasets (e.g. fraud, rare disease), accuracy alone is misleading - look at F1 score or precision-recall AUC instead.
What is the difference between precision and recall?
Precision measures how trustworthy positive predictions are: TP / (TP + FP). Recall measures how complete positive detection is: TP / (TP + FN). A model tuned for high precision avoids false alarms but may miss real positives. A model tuned for high recall catches most positives but may raise more false alarms. The F1 score is the harmonic mean that balances both; it always lies between the two values and sits closer to the lower one.
Why is accuracy misleading for imbalanced data?
If 99% of your data is "negative", a model that always predicts "negative" achieves 99% accuracy without being useful at all. In that case, recall exposes the real situation: the model has 0% recall, meaning it never detects any positive case. Its precision is undefined because it makes no positive predictions at all, and F1 is conventionally reported as 0. Always check precision and recall alongside accuracy, especially when one class is rare.
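The scenario above is easy to reproduce. The following sketch builds a hypothetical dataset of 1,000 labels with 1% positives and scores a model that always predicts "negative"; the variable names are illustrative.

```python
# Hypothetical labels: 10 positives (1) and 990 negatives (0).
y_true = [1] * 10 + [0] * 990
# A "model" that predicts negative for every case.
y_pred = [0] * len(y_true)

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)
predicted_positives = tp + fp  # zero here, so precision (TP / 0) is undefined

print(accuracy)             # 0.99
print(recall)               # 0.0
print(predicted_positives)  # 0
```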
What is the prevalence-adjusted accuracy formula?
When a study sample does not reflect the real proportion of positive cases in the population, raw accuracy from the sample is biased. The prevalence-adjusted formula corrects this: Accuracy = (Sensitivity x Prevalence) + (Specificity x (1 - Prevalence)). For example, a test with 75% sensitivity and 90% specificity applied to a population with 10% prevalence gives an adjusted accuracy of 0.75 x 0.10 + 0.90 x 0.90 = 88.5%.
How do I calculate percent error and percent accuracy?
Percent error = (|Observed - Accepted| / |Accepted|) x 100. Percent accuracy = 100% - percent error. For example, if you measured 9.8 m/s and the accepted value is 10.0 m/s, the percent error is (|9.8 - 10.0| / 10.0) x 100 = 2%, and the percent accuracy is 98%.
What is the F1 score and when should I use it?
The F1 score is the harmonic mean of precision and recall: 2 x (Precision x Recall) / (Precision + Recall). It ranges from 0% to 100% and is highest when both precision and recall are high. Use F1 when you want a single metric that balances both, particularly when the dataset is imbalanced or when false positives and false negatives are both costly.
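To see why F1 is only high when both inputs are high, it helps to compare it with a plain average. This is a small sketch with hypothetical precision and recall values; returning 0 when both are zero is an assumed convention.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention when both inputs are zero
    return 2 * precision * recall / (precision + recall)


def arithmetic_mean(precision: float, recall: float) -> float:
    return (precision + recall) / 2


# Hypothetical values: very precise, but misses most positives.
p, r = 0.9, 0.1
print(round(f1_score(p, r), 3))         # about 0.18
print(round(arithmetic_mean(p, r), 3))  # about 0.5
```

The plain average hides the weak recall, while the harmonic mean is pulled toward the lower of the two values.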