Coefficient of Determination (R2) Calculator

Enter your paired x and y values (separated by commas, semicolons, or spaces) to compute the coefficient of determination, R2. The calculator fits a least-squares regression line, splits the total sum of squares (SST) into explained (SSR) and unexplained (SSE) parts, and reports what percentage of the variation in y is explained by x. You also get the Pearson correlation coefficient r, the regression equation, and a step-by-step worked solution.

Your details

x values: your independent (predictor) variable values. Separate them with commas, semicolons, or spaces. At least 3 points are required.
y values: your dependent (response) variable values. The count must match the number of x values.
R2 (coefficient of determination): 0.9567 (very strong fit)

Proportion of variance in y explained by x (0 to 1)

Variance explained: 95.67%
Pearson r: 0.9781
Slope (b1): 0.9214
Intercept (b0): 0.7286
SSE (error): 1.0757
SSR (regression): 23.7729
SST (total): 24.8486
n (data points): 7
Gauge scale: very weak (< 0.25), weak (0.25-0.5), moderate (0.5-0.7), strong (0.7-0.9), very strong (0.9+)

[Chart: observed data and the fitted regression line (R2 = 0.9567), plotted against x]

R2 = 0.9567: a very strong linear fit, explaining 95.7% of variance in y.

  • 95.7% of the variation in y is explained by the linear relationship with x; the remaining 4.3% is due to other factors or random noise.
  • The regression line is y = 0.9214x + 0.7286.
  • With 7 data points, the Pearson correlation r is 0.9781, indicating a very strong positive linear association.
  • An extremely high R2 can sometimes indicate overfitting or a spurious correlation; always check whether the relationship makes scientific or practical sense.

Next step: Plot your residuals to check the linearity assumption and look for outliers, then consider whether additional predictors could improve the model.

What is the coefficient of determination (R2)?

The coefficient of determination, written R2 (pronounced "R-squared"), measures how well a regression model fits its data. It ranges from 0 to 1 and answers one simple question: what fraction of the total variation in y does the linear relationship with x account for? An R2 of 0.85 means the model explains 85% of the variation in y, leaving 15% unexplained by x alone. At R2 = 1 every data point falls exactly on the regression line. At R2 = 0 the model is no better than simply predicting the mean of y for every observation. R2 is the square of the Pearson correlation coefficient r in simple linear regression, so a correlation of 0.9 gives R2 = 0.81.

How R2 is calculated: SSE, SSR, and SST

R2 is built from three sums of squares. The Total Sum of Squares (SST) measures how much y varies around its mean: $SST = \sum_i (y_i - \bar{y})^2$. The Regression Sum of Squares (SSR) captures how much of that variation the fitted line accounts for: $SSR = \sum_i (\hat{y}_i - \bar{y})^2$. The Error Sum of Squares (SSE) is the leftover variation the model cannot explain: $SSE = \sum_i (y_i - \hat{y}_i)^2$. For a least-squares line fitted with an intercept, $SST = SSR + SSE$, so $R^2 = SSR / SST = 1 - SSE / SST$. With the example values above, R2 = 23.7729 / 24.8486 = 0.9567. The "show your work" panel above traces every step with your actual numbers: mean computation, OLS slope and intercept, each sum of squares, and the final ratio.
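The decomposition described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the calculator's actual implementation: the function name sum_of_squares_breakdown and the five sample points are assumptions made up for the example.

```python
def sum_of_squares_breakdown(x, y):
    """Fit y = b0 + b1*x by ordinary least squares and return the R2 pieces."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same number of values")
    if n < 3:
        raise ValueError("at least 3 paired observations are required")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Centered sums used by the closed-form OLS slope.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values are all identical; the slope is undefined")

    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept
    y_hat = [b0 + b1 * xi for xi in x]

    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by the line
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # left unexplained
    if sst == 0:
        raise ValueError("y values are all identical; R2 is undefined")

    return {"slope": b1, "intercept": b0,
            "sst": sst, "ssr": ssr, "sse": sse, "r2": 1 - sse / sst}


# Illustrative data (not the calculator's example dataset).
result = sum_of_squares_breakdown([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(result)
```

For this sample the line is y = 0.6x + 2.2, with SST = 6, SSR = 3.6, SSE = 2.4, and R2 = 0.6, so both forms of the ratio (SSR / SST and 1 - SSE / SST) agree.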

How to use this calculator

Paste or type your x values (predictor) in the first field and your y values (response) in the second, separated by commas, semicolons, or spaces. You need at least 3 paired observations. The calculator instantly reports R2, the percentage of variance explained, the Pearson r, and the full sum-of-squares breakdown. The gauge shows your R2 on a color-coded scale from very weak (red) to very strong (green). Use the chart to confirm that a straight-line model is visually reasonable before trusting the R2 value.
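If you want to reproduce the input handling in your own script, one possible approach is sketched below. The helper names parse_values and parse_paired_input are hypothetical, and the page does not say how the calculator itself parses input; the sketch only mirrors the stated rules (comma, semicolon, or space separators, equal counts, at least 3 pairs).

```python
import re


def parse_values(text):
    """Split on commas, semicolons, or whitespace and convert to floats."""
    tokens = [t for t in re.split(r"[,;\s]+", text.strip()) if t]
    return [float(t) for t in tokens]  # float() raises ValueError on bad tokens


def parse_paired_input(x_text, y_text, min_points=3):
    """Parse both fields and enforce the pairing rules stated above."""
    x = parse_values(x_text)
    y = parse_values(y_text)
    if len(x) != len(y):
        raise ValueError(f"got {len(x)} x values but {len(y)} y values")
    if len(x) < min_points:
        raise ValueError(f"need at least {min_points} paired observations")
    return x, y


# Mixed separators are accepted in the same field.
x, y = parse_paired_input("1, 2; 3 4", "2.5,3.1;4.0 5.2")
print(x, y)
```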

Limitations and common mistakes

R2 only measures linear fit. A perfectly symmetric U-shaped relationship between x and y can produce an R2 near zero even though x predicts y exactly, because the best-fitting straight line is flat and explains none of the curve. Adding predictors to a least-squares model never lowers R2 and almost always raises it, even when the new predictors are random noise, which is why adjusted R2 is preferred for comparing multi-variable models. A high R2 does not mean x causes y: a spurious correlation between two unrelated time series can produce R2 close to 1. Always plot your data and residuals to check for non-linearity, outliers, and other violations before drawing conclusions from R2 alone.
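The U-shape pitfall is easy to check numerically. The sketch below uses a hypothetical helper, simple_r2, and synthetic data where y is exactly x squared on a range symmetric around zero; it is meant only to illustrate the point, under those assumptions.

```python
def simple_r2(x, y):
    """R2 of the OLS line y = b0 + b1*x, computed as 1 - SSE / SST."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


# Synthetic data: y is fully determined by x, but the relationship is a parabola.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r2_u = simple_r2(x, y)
print(r2_u)  # the fitted line is flat, so R2 is 0 here
```

The fitted slope is zero because the positive and negative halves of the curve cancel, so the line reduces to the mean of y and explains none of the variation.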

R2 interpretation guide

R2 range | Strength label | Common interpretation
0.90 - 1.00 | Very strong | Model explains most variability; check for overfitting
0.70 - 0.89 | Strong | Good predictive power in most applied fields
0.50 - 0.69 | Moderate | Useful but other predictors likely matter
0.25 - 0.49 | Weak | Limited predictive value; revisit model specification
0.00 - 0.24 | Very weak | Model barely outperforms predicting the mean

Widely used benchmarks for interpreting R2 in applied regression. Context matters: a "weak" R2 in social science may be respectable, while a "strong" one in engineering may still be too low for a control system.
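The benchmark table can be expressed as a small lookup. This is a sketch under the assumption that each band starts at its lower bound (0.25, 0.5, 0.7, 0.9), as on the gauge scale; the function name strength_label is made up for illustration.

```python
def strength_label(r2):
    """Map an R2 value in [0, 1] to the strength label from the guide above."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("this guide only covers R2 values between 0 and 1")
    if r2 >= 0.9:
        return "Very strong"
    if r2 >= 0.7:
        return "Strong"
    if r2 >= 0.5:
        return "Moderate"
    if r2 >= 0.25:
        return "Weak"
    return "Very weak"


print(strength_label(0.9567))  # Very strong
```

The labels are generic benchmarks, so a domain-specific threshold may be a better choice in practice.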

Frequently asked questions

What is a good R2 value?

It depends on the field. In the physical sciences and engineering, models are often expected to reach R2 above about 0.95 before they are considered useful. In economics and finance, an R2 of 0.5-0.7 is often considered strong. In psychology and social science, 0.3 can be notable because human behavior is highly variable. The reference table on this page gives widely used benchmarks, but always interpret R2 in the context of your specific domain and the alternatives available.

What is the difference between R and R2?

In simple linear regression, R (uppercase) is the absolute value of the Pearson correlation coefficient r, and R2 is its square. If r = 0.9, then R2 = 0.81. R2 has a cleaner interpretation as a proportion of variance explained, while r directly encodes the sign of the relationship (positive or negative). In multiple regression, R is the multiple correlation coefficient (correlation between y and y-hat), and R2 remains the proportion of variance explained.
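A short sketch makes the distinction concrete for a downward-sloping dataset. The function pearson_r and the five data points are illustrative assumptions, not values from this page.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for paired samples."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data with a negative trend.
x = [1, 2, 3, 4, 5]
y = [10, 8, 7, 4, 1]

r = pearson_r(x, y)   # negative: y falls as x rises
big_r = abs(r)        # R drops the sign
r2 = r ** 2           # R2 drops the sign too (0.968 for this sample)
print(r, big_r, r2)
```

Only r tells you the direction of the relationship; R and R2 are the same whether the slope is positive or negative.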

Can R2 be negative?

In ordinary least-squares regression with an intercept, R2 is always between 0 and 1: the mean of y is itself one of the candidate lines (slope 0), so the fitted line, which minimizes SSE, can never do worse than the mean. However, if you compute R2 for a line that was not estimated by OLS (for example, a line with a manually fixed slope), fit a regression without an intercept, or apply a model trained on one dataset to a different dataset, the formula 1 - SSE / SST can go negative, meaning the model predicts worse than the mean of y.
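To see a negative value in practice, consider scoring a line that was fixed by hand rather than fitted. The helper r2_for_fixed_line and the three data points below are hypothetical, chosen so the arithmetic is easy to follow.

```python
def r2_for_fixed_line(x, y, slope, intercept):
    """Evaluate 1 - SSE / SST for a line that was NOT fitted to the data."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


# The data rise with x, but the hand-picked line slopes downward.
x = [1, 2, 3]
y = [1, 2, 3]

r2_bad = r2_for_fixed_line(x, y, slope=-1.0, intercept=4.0)
print(r2_bad)  # -3.0: SSE = 8 is four times SST = 2
```

Predicting the mean (y = 2) for every point would give SSE = SST = 2, so this line is clearly worse than the mean and the formula reports that as a negative number.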

What is adjusted R2 and when should I use it?

Adjusted R2 penalizes the ordinary R2 for every extra predictor added to a multiple regression model. The formula is 1 - (1 - R2)(n - 1) / (n - k - 1), where n is the sample size and k is the number of predictors. Use adjusted R2 whenever you are comparing models with different numbers of predictors. This calculator focuses on simple linear regression (one x, one y), where adjusted R2 = 1 - (1 - R2)(n - 1) / (n - 2).
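The formula translates directly into code. The sketch below applies it to the example result shown on this page (R2 = 0.9567, n = 7, one predictor); the function name adjusted_r2 is an assumption for illustration.

```python
def adjusted_r2(r2, n, k=1):
    """Adjusted R2 = 1 - (1 - R2)(n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Simple linear regression: k = 1, so the denominator is n - 2.
adj = adjusted_r2(0.9567, n=7, k=1)
print(round(adj, 4))  # 0.948
```

With only 7 points the penalty is visible but small: adjusted R2 is about 0.948 against an ordinary R2 of 0.9567.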

Why can a high R2 be misleading?

Spurious correlations, overfitting, and non-random data collection can all inflate R2 without producing a useful model. Time-series data where both variables trend upward together often show R2 close to 1 even if x has no causal link to y. Adding irrelevant predictors to a model raises R2 mechanically. And a perfectly quadratic or sinusoidal relationship between x and y can give R2 near 0 even though x is a perfect predictor. Always visualize the data and residuals, not just the R2 number.

Written by Dr. Hannah Brandt, PhD Statistician · Munich, Germany

Applied statistician translating rigorous probability theory into clear, accurate tools for researchers and practitioners.
