Residual Calculator
A residual is the gap between what you observed and what your model predicted. Use the single-value mode to find one residual quickly, or switch to regression mode to enter up to ten (x, y) data pairs and get the full picture: the fitted line, each residual, the sum of squared errors, R-squared, RMSE and MAE. All results update as you type.
Formula
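The residual formula used throughout this page, written out alongside the standard closed-form least-squares expressions for the slope b and intercept a. The index i and the bars denoting means are notational assumptions added here:

```latex
e_i = y_i - \hat{y}_i, \qquad \hat{y}_i = a + b\,x_i

b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
\qquad a = \bar{y} - b\,\bar{x}
```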
Worked example
For observed y = 52 and predicted y-hat = 48: residual e = 52 - 48 = 4. For a regression with x = [1,2,3,4,5] and y = [2.1,3.9,6.2,7.8,9.5], the means are x-bar = 3 and y-bar = 5.9, so least-squares gives slope b = 18.7 / 10 = 1.87 and intercept a = 5.9 - 1.87 × 3 = 0.29. The fitted line is y-hat = 0.29 + 1.87x. The residual at x = 3, y = 6.2 is 6.2 - (0.29 + 1.87 × 3) = 6.2 - 5.90 = 0.30.
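The regression part of the example can be recomputed with a minimal Python sketch, assuming the standard closed-form least-squares formulas. `fit_line` is a hypothetical helper name chosen for illustration, not the calculator's own code.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of squared deviations of x, and of cross-products of x and y.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Data from the worked example.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 9.5]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"slope={b:.2f} intercept={a:.2f}")
for x, y, e in zip(xs, ys, residuals):
    print(f"x={x} y={y} residual={e:+.2f}")
```

Because the line includes an intercept, the residuals it prints sum to zero up to floating-point rounding, which is a quick sanity check on any least-squares fit.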
What is a residual?
A residual (also called a fitting error) is the difference between what you actually observed and what your statistical model predicted for the same point. The formula is e = y - y-hat, where y is the observed value and y-hat is the predicted value. A positive residual means the observation is above the model line (the model under-predicted), and a negative residual means the observation is below the line (the model over-predicted). Residuals are the raw material for judging whether a regression model fits well: a well-fitted model has residuals that are small and scattered randomly around zero with no systematic pattern.
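As a small illustration of the sign convention, here is a sketch in Python using the observed and predicted values from the worked example. The function names `residual` and `describe` are assumptions made for this example.

```python
def residual(observed, predicted):
    """Residual e = y - y_hat for a single observation."""
    return observed - predicted


def describe(e):
    """Say which side of the prediction the observation fell on."""
    if e > 0:
        return "observation above prediction (model under-predicted)"
    if e < 0:
        return "observation below prediction (model over-predicted)"
    return "observation equals prediction"


e = residual(52, 48)
print(e, describe(e))
```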
How this calculator works
Choose single-value mode to find one residual quickly: enter the observed value and the predicted value from your model and the calculator returns their difference. Switch to regression mode to enter up to ten (x, y) coordinate pairs as comma-separated lists. The calculator fits a least-squares line, computes the predicted value and residual for every point, and reports the slope and intercept, the residual table, SSE, SSR, SST, R-squared, RMSE and MAE. Results update instantly so you can experiment with different data sets.
Sum of squared errors, R-squared, RMSE and MAE explained
The Sum of Squared Errors (SSE) adds up each squared residual: it is the total unexplained variation in your data. The Total Sum of Squares (SST) is the sum of squared deviations of the observed values from their own mean. The Regression Sum of Squares (SSR = SST - SSE) is the variation the model accounts for. R-squared (also called the coefficient of determination) divides SSR by SST; for a least-squares line with an intercept this is a proportion between 0 and 1, so 0.90 means the line explains 90 percent of the variance in y. RMSE (Root Mean Squared Error) is the square root of SSE/n, expressed in the same units as y, and represents the typical size of a prediction error. MAE (Mean Absolute Error) is the average of the absolute residuals, also in the units of y, and is less sensitive to large outliers than RMSE.
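The definitions above translate almost directly into code. The following Python sketch assumes you already have matching lists of observed and predicted values; the function name `fit_metrics` and the sample numbers are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math


def fit_metrics(observed, predicted):
    """Compute SSE, SST, SSR, R-squared, RMSE and MAE from paired values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal length")
    residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
    y_mean = sum(observed) / n
    sse = sum(e ** 2 for e in residuals)            # unexplained variation
    sst = sum((y - y_mean) ** 2 for y in observed)  # total variation around the mean
    ssr = sst - sse                                 # variation accounted for
    return {
        "SSE": sse,
        "SST": sst,
        "SSR": ssr,
        # R-squared is undefined when every observed value is identical.
        "R2": ssr / sst if sst > 0 else float("nan"),
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(e) for e in residuals) / n,
    }


# Hypothetical values for illustration only.
observed = [3.0, 5.0, 7.0, 10.0]
predicted = [2.5, 5.5, 7.5, 9.5]
metrics = fit_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Every residual in this sample has magnitude 0.5, so RMSE and MAE coincide; they diverge only when the residual sizes differ.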
Using residuals to check your model
Looking at the pattern of residuals is more informative than looking at R-squared alone. If residuals fan out as x increases (heteroscedasticity), your errors are not constant and a transformation or weighted regression may help. A curved pattern in the residuals suggests a non-linear term is needed. Residuals that cluster into groups by an unmodeled variable suggest adding that variable. Formal tests such as the Breusch-Pagan test (for heteroscedasticity) and the Durbin-Watson test (for autocorrelation in time-series data) build on the raw residuals this calculator provides.
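As one example of a test built on raw residuals, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by SSE. The Python sketch below assumes the residuals are listed in time order; the function name `durbin_watson` and the sample residuals are hypothetical. It computes only the statistic, not a significance level.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    sse = sum(e ** 2 for e in residuals)
    if sse == 0:
        raise ValueError("statistic is undefined when all residuals are zero")
    # Squared differences between each residual and the one before it.
    diff_sq = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    return diff_sq / sse


# Hypothetical residual sequences for illustration only.
alternating = [1.0, -1.0, 1.0, -1.0]  # sign flips at every step
drifting = [1.0, 1.0, -1.0, -1.0]     # neighbours tend to share a sign

print(durbin_watson(alternating))
print(durbin_watson(drifting))
```

The statistic ranges from 0 to 4. Values near 2 indicate little first-order autocorrelation, values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation.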
R-squared interpretation guide
| R-squared range | Fit quality | Interpretation |
|---|---|---|
| 0.90 to 1.00 | Strong | Model explains most of the variation; very close fit |
| 0.70 to 0.89 | Moderate | Reasonable fit; some unexplained variation remains |
| 0.50 to 0.69 | Weak-moderate | Model captures a trend but many residuals are large |
| 0.00 to 0.49 | Poor | Little variation explained; consider a different model |
Common benchmarks for evaluating goodness of fit in simple linear regression.
Frequently asked questions
What does a negative residual mean?
A negative residual means the observed value is smaller than the predicted value: your model over-predicted at that point. If most residuals are negative in a particular region of x, the model is consistently too high there, which is a sign the fit could be improved.
What is the difference between a residual and an error?
In regression, the word "error" usually refers to the unobservable true difference between y and the population regression line, while "residual" refers to the difference between y and the fitted sample line. Residuals are what you can actually calculate from your data; errors are theoretical quantities in the underlying statistical model.
What is a good R-squared value?
It depends on the field. In physical sciences an R-squared of 0.99 is common, while in social sciences 0.50 can be considered acceptable because human behaviour is harder to predict. Rather than judging R-squared in isolation, check whether the residuals look random and whether the model makes theoretical sense.
Why do we square the residuals when computing SSE?
Squaring serves two purposes: it makes every term positive so that positive and negative residuals do not cancel each other out, and it penalises large errors more than small ones. The least-squares line is, by definition, the line that minimises SSE. When the errors are independent and normally distributed with constant variance, that line is also the maximum-likelihood estimate.
What is the difference between RMSE and MAE?
RMSE squares the residuals before averaging, so it gives extra weight to large errors. MAE simply averages the absolute values, so each residual counts in direct proportion to its size. RMSE is never smaller than MAE, and the two are equal only when every residual has the same magnitude. If your data has occasional large outliers you should compare both: a much larger RMSE than MAE signals that a few large errors are driving the RMSE up.
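To make the comparison concrete, here is a short Python sketch that computes both measures for two hypothetical residual sets, one with evenly sized errors and one where a single residual is much larger. The function names and numbers are assumptions for illustration.

```python
import math


def rmse(residuals):
    """Root mean squared error: square, average, then take the square root."""
    return math.sqrt(sum(e ** 2 for e in residuals) / len(residuals))


def mae(residuals):
    """Mean absolute error: average of the absolute residuals."""
    return sum(abs(e) for e in residuals) / len(residuals)


# Hypothetical residuals for illustration only.
even = [1.0, -1.0, 1.0, -1.0]
with_outlier = [1.0, -1.0, 1.0, -7.0]

for label, res in [("even", even), ("with outlier", with_outlier)]:
    print(f"{label}: RMSE={rmse(res):.3f} MAE={mae(res):.3f}")
```

With evenly sized residuals the two measures agree. Once one residual grows, RMSE rises faster than MAE, which is the gap to look for when screening for outliers.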
How many data points do I need for linear regression?
You need at least two points with different x values to define a line, but two such points always produce a perfect fit with zero residuals (and R-squared = 1 whenever the two y values differ), which tells you nothing useful. As a practical minimum, aim for at least 10 to 20 observations so that the estimates of slope and intercept are stable and the residual pattern is meaningful.