Matthews Correlation Coefficient (MCC) Calculator
Enter the four cells of your binary confusion matrix (true positives, true negatives, false positives, and false negatives) to calculate the Matthews Correlation Coefficient (MCC). You also get sensitivity, specificity, precision, accuracy, F1 score, and the normalized MCC. The calculator shows every step and interprets the result so you know how strong your classifier really is.
Formula
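The formula itself is missing from this section. The standard definition, which matches the numerator and denominator used in the worked example, is reproduced here in LaTeX, together with the normalized variant described later on this page:

```latex
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}
                    {\sqrt{(TP + FP)\,(TP + FN)\,(TN + FP)\,(TN + FN)}}
\qquad
\mathrm{normMCC} = \frac{\mathrm{MCC} + 1}{2}
```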
Worked example
A classifier returns TP = 50, TN = 40, FP = 10, FN = 5. Numerator: TP*TN - FP*FN = 50*40 - 10*5 = 2000 - 50 = 1950. Denominator: sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) = sqrt(60 * 55 * 50 * 45) = sqrt(7,425,000) ≈ 2724.9. MCC = 1950 / 2724.9 ≈ 0.7156. Normalized MCC = (0.7156 + 1) / 2 = 0.8578. Sensitivity = 50/55 = 90.9%, Specificity = 40/50 = 80.0%, Precision = 50/60 = 83.3%, Accuracy = 90/105 = 85.7%, F1 = 100/115 = 87.0%.
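The same arithmetic can be checked in a few lines of Python. This is a minimal sketch, not the calculator's actual implementation; the function name `mcc_steps` is made up for illustration.

```python
import math

def mcc_steps(tp, tn, fp, fn):
    """Return (numerator, denominator, mcc) for a 2x2 confusion matrix."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention used on this page: report 0 when the denominator is zero
    mcc = numerator / denominator if denominator else 0.0
    return numerator, denominator, mcc

num, den, mcc = mcc_steps(tp=50, tn=40, fp=10, fn=5)
norm_mcc = (mcc + 1) / 2  # rescale from [-1, 1] to [0, 1]

print(num, round(den, 1), round(mcc, 4), round(norm_mcc, 4))
# 1950 2724.9 0.7156 0.8578
```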
What is the Matthews Correlation Coefficient?
The Matthews Correlation Coefficient (MCC) was introduced by Brian W. Matthews in 1975 as a way to measure the quality of binary (two-class) classifications. It summarizes the entire 2x2 confusion matrix into a single number ranging from -1 to +1. A value of +1 means every prediction is correct, 0 means the classifier performs no better than random guessing, and -1 means every prediction is the reverse of the truth. Unlike accuracy, which simply counts correct predictions, MCC takes into account all four cells of the confusion matrix: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). This makes MCC a balanced and reliable metric even when the two classes are very different in size.
MCC versus accuracy, F1, and ROC AUC
Accuracy can be dangerously misleading on imbalanced datasets. Imagine a spam detector where 95% of emails are legitimate: a classifier that always predicts "not spam" achieves 95% accuracy while having zero ability to catch spam. MCC exposes this failure: with no positive predictions at all, it is reported as 0, the same as random guessing. F1 score is an improvement over accuracy because it balances precision and recall, but it ignores true negatives entirely, so it can look strong even when the classifier misclassifies most of the negative class, for example when positives dominate the dataset. Chicco and Jurman (BMC Genomics, 2020) argue that MCC provides a more informative summary than F1 or accuracy, particularly for imbalanced data, and Chicco et al. (BioData Mining, 2021) reach a similar conclusion when comparing MCC with balanced accuracy, bookmaker informedness, and markedness. ROC AUC requires varying a decision threshold, while MCC works with a single fixed threshold, which is often what practitioners actually deploy.
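To make the comparison concrete, the sketch below computes accuracy, F1, and MCC for two made-up confusion matrices: the always-"not spam" detector from the example (assuming 1,000 emails), and a hypothetical classifier on a positive-heavy dataset that misses almost every negative. The counts and function names are illustrative assumptions, and undefined ratios are reported as 0.

```python
import math

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def f1(tp, tn, fp, fn):
    # tn is accepted only for a uniform signature; F1 never uses it
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# 1000 emails, 5% spam, classifier always answers "not spam"
always_negative = dict(tp=0, tn=950, fp=0, fn=50)
# Illustrative counts: 91 positives, 9 negatives, only 1 negative recognized
misses_negatives = dict(tp=90, tn=1, fp=8, fn=1)

for name, cm in [("always negative", always_negative),
                 ("misses negatives", misses_negatives)]:
    print(f"{name}: acc={accuracy(**cm):.3f} "
          f"f1={f1(**cm):.3f} mcc={mcc(**cm):.3f}")
```

In the first case accuracy is 0.95 while MCC is 0. In the second, accuracy and F1 both exceed 0.9 while MCC stays near 0.2, because only MCC penalizes the eight misclassified negatives in proportion to how few negatives there are.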
How to read the confusion matrix
A confusion matrix lays out predictions against actual outcomes for a binary classifier. True Positives (TP) are cases correctly labeled as the positive class. True Negatives (TN) are cases correctly labeled as the negative class. False Positives (FP), also called Type I errors or false alarms, are cases labeled positive but actually negative. False Negatives (FN), also called Type II errors or misses, are cases labeled negative but actually positive. The total N = TP + TN + FP + FN. Once you have these four numbers, you can derive every common classification metric, including MCC, sensitivity, specificity, precision, accuracy, and F1 score.
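As a rough illustration of how the other metrics follow from the four counts, here is a small Python sketch. The function name `classification_metrics` and the choice to return 0.0 for undefined ratios are assumptions made for this example.

```python
def classification_metrics(tp, tn, fp, fn):
    """Derive common metrics from the four confusion-matrix counts."""
    def ratio(num, den):
        # Assumption: report 0.0 when a ratio is undefined (zero denominator)
        return num / den if den else 0.0

    n = tp + tn + fp + fn
    return {
        "sensitivity": ratio(tp, tp + fn),      # recall, true positive rate
        "specificity": ratio(tn, tn + fp),      # true negative rate
        "precision": ratio(tp, tp + fp),        # positive predictive value
        "accuracy": ratio(tp + tn, n),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of precision and recall
    }

metrics = classification_metrics(tp=50, tn=40, fp=10, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With the counts from the worked example this prints 90.9% sensitivity, 80.0% specificity, 83.3% precision, 85.7% accuracy, and 87.0% F1.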
Normalized MCC and why it matters
The standard MCC ranges from -1 to +1, which can feel unfamiliar if you are used to metrics like accuracy that live in the 0 to 1 range. The normalized MCC (also called normMCC) is simply (MCC + 1) / 2, rescaling the result to a 0 to 1 interval where 0 is the worst possible classification and 1 is perfect. This puts normMCC on the same scale as accuracy and F1 score in benchmarking tables or leaderboards, with one caveat: a normMCC of 0.5, not 0, corresponds to an MCC of 0, meaning random performance. This calculator shows both values so you can use whichever fits your reporting context.
MCC interpretation guide
| MCC range | Interpretation | Signal quality |
|---|---|---|
| 0.9 to 1.0 | Outstanding | Excellent |
| 0.7 to 0.9 | Strong | Very good |
| 0.5 to 0.7 | Moderate | Good |
| 0.3 to 0.5 | Weak | Fair |
| 0.0 to 0.3 | Very weak / random | Poor |
| -1.0 to 0.0 | Negative / inverse | Flip labels |
Widely used qualitative thresholds for the Matthews Correlation Coefficient. These are practical guidelines, not hard cutoffs.
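The table can be turned into a lookup, sketched below in Python. The table does not say which band a boundary value such as 0.7 belongs to, so this sketch assumes each band includes its lower bound; `interpret_mcc` is a hypothetical helper name.

```python
def interpret_mcc(mcc):
    """Map an MCC value to the qualitative label from the table above."""
    if not -1.0 <= mcc <= 1.0:
        raise ValueError("MCC must lie in [-1, 1]")
    # (lower bound, label), checked from the strongest band downward
    bands = [
        (0.9, "Outstanding"),
        (0.7, "Strong"),
        (0.5, "Moderate"),
        (0.3, "Weak"),
        (0.0, "Very weak / random"),
    ]
    for lower, label in bands:
        if mcc >= lower:
            return label
    return "Negative / inverse"

print(interpret_mcc(0.7156))  # Strong
print(interpret_mcc(-0.2))    # Negative / inverse
```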
Frequently asked questions
What does an MCC of 0 mean?
An MCC of exactly 0 means the classifier is providing no useful information above chance. Its predictions are statistically unrelated to the actual labels. This can happen when the classifier always predicts the same class, or when the errors and correct predictions happen to cancel out in the formula. Importantly, 0 does not necessarily mean 50% accuracy: on an imbalanced dataset, a classifier with 95% accuracy can still have an MCC near 0 if it never correctly identifies the minority class.
Can MCC be negative, and what does that mean?
Yes. A negative MCC means the predictions are anti-correlated with the true labels, so the classifier is doing worse than random guessing. The extreme case is MCC = -1, which means every single prediction is incorrect. A negative MCC does not require accuracy below 50%: with TP = 0, FN = 5, FP = 5, TN = 90, accuracy is 90% but MCC = -25 / 475, or about -0.05. In practice, a negative MCC on a real model usually signals a labeling error (for example, the positive and negative classes are swapped in the training data), a threshold misconfiguration, or a fundamentally broken feature set.
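The swapped-labels case is easy to see numerically: inverting every prediction exchanges TP with FN and TN with FP, which negates the numerator and leaves the denominator unchanged. A minimal Python sketch with made-up counts:

```python
import math

def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Illustrative counts for a classifier that is mostly wrong
original = dict(tp=5, tn=10, fp=40, fn=45)

# Inverting every prediction: each TP becomes an FN, each TN becomes an FP,
# and vice versa
flipped = dict(tp=original["fn"], tn=original["fp"],
               fp=original["tn"], fn=original["tp"])

print(round(mcc(**original), 4))  # negative
print(round(mcc(**flipped), 4))   # same magnitude, positive
```

This is why a strongly negative MCC is often a sign that the model has learned something real but the class encoding is reversed somewhere in the pipeline.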
Is MCC better than F1 score?
For most real-world binary classification tasks, especially those with class imbalance, MCC is considered more informative than F1 score. F1 ignores true negatives entirely, which can make a classifier appear strong even when it fails on the negative class. MCC uses all four confusion-matrix cells and is symmetric: it does not privilege either class over the other. Several peer-reviewed papers, including work published in BMC Genomics, recommend MCC as the standard summary metric for binary classification.
What happens when the denominator is zero?
The denominator of the MCC formula is zero whenever any of the four bracketed sums is zero: specifically when TP + FP = 0 (no positive predictions), TP + FN = 0 (no actual positives), TN + FP = 0 (no actual negatives), or TN + FN = 0 (no negative predictions). By convention, MCC is defined as 0 in this degenerate case. This calculator handles that gracefully and returns 0 rather than an error.
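One way to implement that convention is to check the four sums before dividing. The sketch below is an assumption about how such a guard could look, not the calculator's own code; `safe_mcc` is a hypothetical name, and `math.prod` requires Python 3.8 or later.

```python
import math

def safe_mcc(tp, tn, fp, fn):
    """MCC with the 'return 0 when undefined' convention."""
    # Predicted positives, actual positives, actual negatives, predicted negatives
    marginals = (tp + fp, tp + fn, tn + fp, tn + fn)
    if 0 in marginals:
        return 0.0  # degenerate matrix: the denominator would be zero
    return (tp * tn - fp * fn) / math.sqrt(math.prod(marginals))

print(safe_mcc(tp=0, tn=90, fp=0, fn=10))   # no positive predictions -> 0.0
print(safe_mcc(tp=0, tn=95, fp=5, fn=0))    # no actual positives -> 0.0
print(safe_mcc(tp=50, tn=40, fp=10, fn=5))  # regular case
```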
What is a good MCC value?
There are no universally fixed thresholds, but common practical guidelines treat MCC above 0.7 as strong, 0.5 to 0.7 as moderate, 0.3 to 0.5 as weak, and below 0.3 as approaching random. The scale depends heavily on the domain: a medical diagnostic model may require MCC above 0.9, while an early-stage NLP prototype might still be useful at 0.4. Always compare MCC to a baseline (such as always predicting the majority class) rather than treating any single threshold as universally meaningful.
How is the phi coefficient related to MCC?
They are the same formula. The phi coefficient (also written φ, and sometimes called the mean square contingency coefficient) is the standard statistical measure of association between two binary variables. When one binary variable is the actual class and the other is the predicted class, the phi coefficient is exactly the Matthews Correlation Coefficient. MCC is simply the name used in machine learning and bioinformatics contexts.
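The equivalence can be checked numerically: the phi coefficient is the Pearson correlation of two 0/1 variables, so computing Pearson's r on the actual and predicted label vectors should reproduce the MCC. A small Python sketch, using a hand-written `pearson` helper and the counts from the worked example:

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

tp, tn, fp, fn = 50, 40, 10, 5
# Expand the confusion matrix back into per-sample 0/1 labels
actual    = [1] * tp + [0] * tn + [0] * fp + [1] * fn
predicted = [1] * tp + [0] * tn + [1] * fp + [0] * fn

print(round(pearson(actual, predicted), 4), round(mcc(tp, tn, fp, fn), 4))
```

Both values agree up to floating-point rounding.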
Sources
- Chicco D, Jurman G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics, 2020.
- Chicco D, Warrens MJ, Jurman G. The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness. BioData Mining, 2021.