Residual Calculator

A residual is the gap between what you observed and what your model predicted. Use the single-value mode to find one residual quickly, or switch to regression mode to enter up to ten (x, y) data pairs and get the full picture: the fitted line, each residual, the sum of squared errors, R-squared, RMSE and MAE. All results update as you type.

Single mode needs just observed and predicted values. Regression mode fits a least-squares line to your data pairs.
Observed value: the actual measured value from your dataset.
Predicted value: the value your model or regression line predicts at this point.

Observed minus predicted (e = y - y-hat)

Residual = 4.0000 (positive: the model under-predicted)

  • A positive residual means the observed value is higher than predicted: the model falls short here.
  • A negative residual means the observed value is lower than predicted: the model overshoots.
  • A residual of zero is a perfect prediction for that point.
  • Residuals should be randomly scattered around zero in a well-fitted model.

Next step: To assess model quality, collect all residuals and check whether they are randomly distributed with no pattern.
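The single-value calculation above can be sketched in a few lines of Python. The function names `residual` and `describe` are illustrative assumptions, not part of the calculator itself.

```python
def residual(observed: float, predicted: float) -> float:
    """Return e = y - y_hat (observed minus predicted)."""
    return observed - predicted


def describe(e: float) -> str:
    """Translate the sign of a residual into plain language."""
    if e > 0:
        return "positive: the model under-predicted"
    if e < 0:
        return "negative: the model over-predicted"
    return "zero: perfect prediction"


# Observed 52, predicted 48
e = residual(52, 48)
print(f"Residual = {e:.4f} ({describe(e)})")
```

Swapping the two arguments flips the sign, so it is worth keeping the order fixed as observed first, predicted second.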

Formula

$$e = y - \hat{y}, \quad \hat{y} = a + bx, \quad b = \dfrac{SS_{XY}}{SS_{XX}}, \quad a = \bar{y} - b\bar{x}, \quad R^2 = 1 - \dfrac{SSE}{SST}$$

Worked example

For observed y = 52 and predicted y-hat = 48: residual e = 52 - 48 = 4. For a regression with x = [1,2,3,4,5] and y = [2.1,3.9,6.2,7.8,9.5], the means are x-bar = 3 and y-bar = 5.9, with SS_XY = 18.7 and SS_XX = 10. Least-squares therefore gives slope b = 18.7 / 10 = 1.87 and intercept a = 5.9 - 1.87 x 3 = 0.29, so the fitted line is y-hat = 0.29 + 1.87x. The residual at x = 3, y = 6.2 is 6.2 - (0.29 + 1.87 x 3) = 6.2 - 5.90 = 0.30.

What is a residual?

A residual (also called a fitting error) is the difference between what you actually observed and what your statistical model predicted for the same point. The formula is e = y - y-hat, where y is the observed value and y-hat is the predicted value. A positive residual means the observation is above the model line (the model under-predicted), and a negative residual means the observation is below the line (the model over-predicted). Residuals are the raw material for judging whether a regression model fits well: a well-fitted model has residuals that are small and scattered randomly around zero with no systematic pattern.

How this calculator works

Choose single-value mode to find one residual quickly: enter the observed value and the predicted value from your model and the calculator returns their difference. Switch to regression mode to enter up to ten (x, y) coordinate pairs as comma-separated lists. The calculator fits a least-squares line, computes the predicted value and residual for every point, and reports the slope and intercept, the residual table, SSE, SSR, SST, R-squared, RMSE and MAE. Results update instantly so you can experiment with different data sets.
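As a rough illustration of regression mode, the sketch below parses two comma-separated lists, fits a least-squares line with the SS_XY / SS_XX formula, and prints a residual table. The helper names `parse_list` and `fit_line` are assumptions made for this example; the calculator's actual implementation is not shown on this page.

```python
def parse_list(text):
    """Turn a comma-separated string such as '1, 2, 3' into a list of floats."""
    return [float(tok) for tok in text.split(",") if tok.strip()]


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    if ss_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = ss_xy / ss_xx          # slope
    a = y_bar - b * x_bar      # intercept
    return a, b


xs = parse_list("1,2,3,4,5")
ys = parse_list("2.1,3.9,6.2,7.8,9.5")
a, b = fit_line(xs, ys)
print(f"fitted line: y_hat = {a:.4f} + {b:.4f}x")

# Residual table: one row per data pair
for x, y in zip(xs, ys):
    y_hat = a + b * x
    print(f"x={x:g}  y={y:g}  y_hat={y_hat:.4f}  residual={y - y_hat:+.4f}")
```

The explicit check for identical x values is a design choice for this sketch: without it the slope formula would divide by zero.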

Sum of squared errors, R-squared, RMSE and MAE explained

The Sum of Squared Errors (SSE) adds up each squared residual: it is the total unexplained variation in your data. The Total Sum of Squares (SST) is how much the observed values vary around their own mean. The Regression Sum of Squares (SSR = SST - SSE) is the variation the model accounts for. R-squared (also called the coefficient of determination) divides SSR by SST, giving a proportion between 0 and 1: 0.90 means the line explains 90 percent of the variance in y. RMSE (Root Mean Squared Error) is the square root of SSE/n, expressed in the same units as y, and represents the typical prediction error. MAE (Mean Absolute Error) is the average of the absolute residuals, which is less sensitive to large outliers than RMSE.
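The definitions in this section translate fairly directly into code. The following is a minimal sketch, assuming you already have a list of observed values and a matching list of predictions; the function name `fit_metrics` and the predictions used here (from a hypothetical model y-hat = 2x, chosen only for round numbers) are illustrative.

```python
import math


def fit_metrics(ys, y_hats):
    """Compute SSE, SST, SSR, R-squared, RMSE and MAE from observed and predicted values."""
    n = len(ys)
    y_bar = sum(ys) / n
    residuals = [y - y_hat for y, y_hat in zip(ys, y_hats)]
    sse = sum(e * e for e in residuals)          # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)      # total variation around the mean
    ssr = sst - sse                              # variation accounted for
    return {
        "SSE": sse,
        "SST": sst,
        "SSR": ssr,
        "R2": ssr / sst,
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(e) for e in residuals) / n,
    }


observed = [2.1, 3.9, 6.2, 7.8, 9.5]
predicted = [2.0, 4.0, 6.0, 8.0, 10.0]   # hypothetical model: y_hat = 2x
metrics = fit_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that R-squared is only guaranteed to fall between 0 and 1 when the predictions come from a least-squares line with an intercept; for arbitrary predictions, SSR computed as SST - SSE can be negative.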

Using residuals to check your model

Looking at the pattern of residuals is more informative than looking at R-squared alone. If residuals fan out as x increases (heteroscedasticity), your errors are not constant and a transformation or weighted regression may help. A curved pattern in the residuals suggests a non-linear term is needed. Residuals that cluster into groups by an unmodeled variable suggest adding that variable. Formal tests such as the Breusch-Pagan test (for heteroscedasticity) and the Durbin-Watson test (for autocorrelation in time-series data) build on the raw residuals this calculator provides.
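To show how a formal test builds on raw residuals, here is a small sketch of the Durbin-Watson statistic, which is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The function name `durbin_watson` and the residual lists are made up for illustration, and the residuals are assumed to be in time order. The statistic ranges from 0 to 4, with values near 2 suggesting no first-order autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("all residuals are zero; the statistic is undefined")
    return numerator / denominator


# Hypothetical residuals, listed in time order
alternating = [1, -1, 1, -1]   # signs flip every step (negative autocorrelation)
clustered = [1, 1, -1, -1]     # same-sign runs (positive autocorrelation)

print("alternating:", durbin_watson(alternating))
print("clustered:  ", durbin_watson(clustered))
```

This computes only the statistic itself; deciding whether a value is significantly different from 2 requires critical values or a statistics package.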

R-squared interpretation guide

R-squared range | Fit quality | Interpretation
0.90 to 1.00 | Strong | Model explains most of the variation; very close fit
0.70 to 0.89 | Moderate | Reasonable fit; some unexplained variation remains
0.50 to 0.69 | Weak-moderate | Model captures a trend but many residuals are large
0.00 to 0.49 | Poor | Little variation explained; consider a different model

Common benchmarks for evaluating goodness of fit in simple linear regression.
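If you want to apply these benchmarks programmatically, a lookup along the following lines is one option. The function name `fit_quality` is hypothetical, and the treatment of values that fall between the printed bands (for example 0.895, assigned here to the lower band) is an assumption of this sketch.

```python
def fit_quality(r_squared):
    """Map an R-squared value in [0, 1] to the benchmark labels in the guide."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R-squared is expected to lie between 0 and 1")
    if r_squared >= 0.90:
        return "Strong"
    if r_squared >= 0.70:
        return "Moderate"
    if r_squared >= 0.50:
        return "Weak-moderate"
    return "Poor"


for value in (0.95, 0.75, 0.55, 0.20):
    print(value, "->", fit_quality(value))
```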

Frequently asked questions

What does a negative residual mean?

A negative residual means the observed value is smaller than the predicted value: your model over-predicted at that point. If most residuals are negative in a particular region of x, the model is consistently too high there, which is a sign the fit could be improved.

What is the difference between a residual and an error?

In regression, the word "error" usually refers to the unobservable true difference between y and the population regression line, while "residual" refers to the difference between y and the fitted sample line. Residuals are what you can actually calculate from your data; errors are theoretical quantities in the underlying statistical model.

What is a good R-squared value?

It depends on the field. In physical sciences an R-squared of 0.99 is common, while in social sciences 0.50 can be considered acceptable because human behaviour is harder to predict. Rather than judging R-squared in isolation, check whether the residuals look random and whether the model makes theoretical sense.

Why do we square the residuals when computing SSE?

Squaring serves two purposes: it makes every term positive so that positive and negative residuals do not cancel each other out, and it penalises large errors more than small ones. The least-squares line is, by definition, the line that minimises SSE. When the errors are independent and normally distributed with constant variance, that line is also the maximum-likelihood estimate.
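Both points can be checked numerically. The sketch below uses a small made-up data set: it fits the least-squares line, shows that the raw residuals sum to (approximately) zero and so cannot measure fit on their own, and shows that nudging the slope away from the fitted value increases SSE. The helper name `sse` and the data are assumptions for illustration.

```python
def sse(xs, ys, a, b):
    """Sum of squared residuals for the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen only for easy arithmetic
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 5]

# Least-squares slope and intercept
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print("sum of raw residuals:", round(sum(residuals), 10))
print("SSE at the least-squares line:", round(sse(xs, ys, a, b), 4))

# Any other slope (same intercept) gives a larger SSE
for delta in (-0.1, 0.1):
    print(f"SSE with slope {b + delta:.2f}:", round(sse(xs, ys, a, b + delta), 4))
```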

What is the difference between RMSE and MAE?

RMSE squares the residuals before averaging, so it gives extra weight to large errors. MAE simply averages the absolute values, treating all residuals equally regardless of size. If your data has occasional large outliers you should compare both: a much larger RMSE than MAE signals that a few large errors are driving the RMSE up.
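A quick way to see the difference is to compute both measures on the same residuals with and without one large error. The residual lists below are invented for illustration, and `rmse` and `mae` are hypothetical helper names.

```python
import math


def rmse(residuals):
    """Root mean squared error: sqrt of the mean squared residual."""
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))


def mae(residuals):
    """Mean absolute error: mean of the absolute residuals."""
    return sum(abs(e) for e in residuals) / len(residuals)


clean = [1, -1, 1, -1, 1]          # all errors the same size
with_outlier = [1, -1, 1, -1, 10]  # one large error

for label, res in (("clean", clean), ("with outlier", with_outlier)):
    print(f"{label}: RMSE = {rmse(res):.4f}, MAE = {mae(res):.4f}")
```

When every residual has the same magnitude the two measures coincide; the single large error pulls RMSE up much further than MAE.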

How many data points do I need for linear regression?

You need at least two points with different x values to define a line, but two points always produce a perfect fit with zero residuals (and R-squared = 1 whenever the two y values differ), which tells you nothing useful. As a practical minimum, aim for at least 10 to 20 observations so that the estimates of slope and intercept are stable and the residual pattern is meaningful.

Written by Dr. Hannah Brandt, PhD Statistician · Munich, Germany

Applied statistician translating rigorous probability theory into clear, accurate tools for researchers and practitioners.
