Least Squares Regression Line Calculator

Enter your x-values and y-values as comma-separated lists to find the least squares regression line that best fits your data. The calculator gives you the slope, y-intercept, full equation, Pearson correlation coefficient, R-squared, residual sum of squares, and standard error of estimate, plus a step-by-step walkthrough of every intermediate sum. You can also enter an x-value to get an instant predicted y.

By Dr. Rajiv Menon, PhD · Updated June 7, 2026

Regression equation (very strong fit)

y = 5.3690x + 47.7143

The least squares line in the form y = mx + b

Slope (m): 5.369
Y-intercept (b): 47.7143
Correlation (r): 0.9983
R-squared (R²): 0.9966
Std. error of estimate: 0.8321
Predicted y (x = 5): 74.5595
Number of points (n): 8
Sum of x: 36
Sum of y: 575
Sum of xy: 2,813
Sum of x²: 204
Residual SS: 4.1548
Total SS: 1,214.875

R² = 0.9966 on the fit scale: very weak (< 0.25), weak (0.25-0.5), moderate (0.5-0.7), strong (0.7-0.9), very strong (0.9+).

[Chart: actual data with the regression line. Regression line fitted to 8 points (R² = 99.7%).]

The positive correlation (r = 0.9983) indicates a very strong linear relationship.
R-squared = 99.7%, meaning x explains 99.7% of the variance in y.
For each 1-unit increase in x, y changes by +5.3690 units on average.
Only 8 data points were used. More data improves the reliability of the regression line.

Next step: Check residual plots to confirm that errors are randomly distributed before using this line for predictions.

Count the data points: n = 8
Calculate the required sums: Sum(x) = 36.00, Sum(y) = 575.00, Sum(xy) = 2813.00, Sum(x²) = 204.00
Compute the slope (m): m = [n·Sum(xy) - Sum(x)·Sum(y)] / [n·Sum(x²) - (Sum(x))²]
m = [8×2813.00 - 36.00×575.00] / [8×204.00 - 36.00²] = 5.3690
Compute the y-intercept (b): b = [Sum(y) - m·Sum(x)] / n
b = [575.00 - 5.3690×36.00] / 8 = 47.7143
Write the regression equation: y = mx + b
y = 5.3690x + 47.7143
Pearson correlation coefficient (r): r = [n·Sum(xy) - Sum(x)·Sum(y)] / sqrt{[n·Sum(x²) - (Sum(x))²][n·Sum(y²) - (Sum(y))²]}
r = 0.9983
Coefficient of determination (R²): R² = r²
R² = 0.9983² = 0.9966
Residual sum of squares (SSRes): SSRes = Sum[(y_i - y_hat_i)²]
SSRes = 4.1548
Predicted y when x = 5: y = 5.3690 × 5 + 47.7143
y = 74.5595
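
The arithmetic in the walkthrough can be re-checked from the displayed totals alone. The sketch below copies the sums, Residual SS, and Total SS from the results panel; it computes R² as 1 - SSRes/SSTot (an equivalent route to squaring r) and assumes the standard error uses n - 2 degrees of freedom, which is consistent with the displayed 0.8321 but is not stated on the page.

```python
import math

# Values copied from the results panel above.
n = 8
sum_x, sum_y = 36.0, 575.0
sum_xy, sum_x2 = 2813.0, 204.0
ss_res, ss_tot = 4.1548, 1214.875

# Slope and intercept from the summation formulas.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

# Prediction at x = 5, as in the final step of the walkthrough.
predicted = slope * 5 + intercept

# R² as the share of total variation not left in the residuals.
r_squared = 1 - ss_res / ss_tot

# Assumption: standard error of estimate with n - 2 degrees of freedom.
std_error = math.sqrt(ss_res / (n - 2))

print(f"m = {slope:.4f}, b = {intercept:.4f}, y(5) = {predicted:.4f}")
print(f"R² = {r_squared:.4f}, SE = {std_error:.4f}")
```

Printed to four decimals, these values should agree with the panel.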

Formula

m = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}, \quad b = \dfrac{\sum y - m\sum x}{n}, \quad r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}

Worked example

Hours studied (x): 1, 2, 3, 4, 5. Exam scores (y): 52, 58, 65, 70, 75. n = 5, Sum(x) = 15, Sum(y) = 320, Sum(xy) = 1×52 + 2×58 + 3×65 + 4×70 + 5×75 = 1018, Sum(x²) = 55. Slope m = (5×1018 - 15×320) / (5×55 - 15²) = (5090 - 4800) / (275 - 225) = 290 / 50 = 5.80. Intercept b = (320 - 5.80×15) / 5 = (320 - 87) / 5 = 233 / 5 = 46.60. Equation: y = 5.80x + 46.60. Predicted score for 6 hours: y = 5.80×6 + 46.60 = 81.40.
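
For anyone who wants to check the example by machine, here is a minimal Python sketch that builds each sum from the raw hours and scores and then applies the same slope and intercept formulas. The variable names are illustrative only.

```python
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 75]

# Intermediate sums, computed directly from the data.
n = len(hours)
sum_x = sum(hours)
sum_y = sum(scores)
sum_xy = sum(x * y for x, y in zip(hours, scores))
sum_x2 = sum(x * x for x in hours)

# Summation formulas for the slope and intercept.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

print(f"n = {n}, Sum(x) = {sum_x}, Sum(y) = {sum_y}, "
      f"Sum(xy) = {sum_xy}, Sum(x²) = {sum_x2}")
print(f"y = {slope:.2f}x + {intercept:.2f}")
print(f"Predicted score for 6 hours: {slope * 6 + intercept:.2f}")
```

Building the sums from the data, instead of typing them in, avoids carrying a slip in one total through every later step.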

What is the least squares regression line?

The least squares regression line (LSRL), also called the line of best fit or ordinary least squares (OLS) line, is the straight line that minimizes the sum of the squared vertical distances between every observed data point and the line itself. Those vertical distances are called residuals, and squaring them ensures that positive and negative deviations do not cancel each other out, while also penalizing large errors more heavily than small ones. The result is a unique line described by two numbers: the slope m, which tells you how much y changes for every one-unit increase in x, and the y-intercept b, which tells you the predicted value of y when x equals zero.
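
The "minimizes the sum of squared residuals" property can be seen numerically. The sketch below fits a line to the hours and scores data from the worked example, then nudges the slope and intercept and confirms that each nudged line has a larger sum of squared residuals. The function names and the nudge sizes are illustrative choices.

```python
def sum_squared_residuals(xs, ys, m, b):
    """Sum of squared vertical distances between the points and y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


def fit_line(xs, ys):
    """Return (m, b) of the least squares line via deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


xs = [1, 2, 3, 4, 5]
ys = [52, 58, 65, 70, 75]

m, b = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, m, b)

# Every nudged line should do worse than the fitted one.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.5), (0.0, -0.5), (0.05, -0.2)]
for dm, db in nudges:
    worse = sum_squared_residuals(xs, ys, m + dm, b + db)
    print(f"slope {dm:+.2f}, intercept {db:+.2f}: "
          f"SSRes = {worse:.4f} vs fitted {best:.4f}")
```

The deviation-from-mean form used in `fit_line` is algebraically equivalent to the summation formulas and is often numerically steadier for large values.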

How to use this calculator

Type or paste your x-values and y-values into the two input boxes, separated by commas. Both lists must have the same number of entries, and you need at least two pairs whose x-values are not all identical (otherwise the slope formula divides by zero). The calculator finds the regression equation instantly and also shows the Pearson correlation coefficient (r), the coefficient of determination (R-squared), the standard error of estimate, and the residual sum of squares. To predict a y-value for a specific x, type it into the "Predict y when x =" field. The step-by-step panel below the results shows every intermediate sum so you can follow the arithmetic by hand.
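
The input rules can be mirrored in a few lines of Python when preparing data outside the calculator. This is only a sketch: `parse_values` and `parse_pairs` are hypothetical helpers, not the calculator's own code, and the check for identical x-values is an added assumption based on the slope formula's denominator.

```python
def parse_values(text):
    """Parse a comma-separated string such as "1, 2, 3" into a list of floats."""
    return [float(token) for token in text.split(",") if token.strip()]


def parse_pairs(x_text, y_text):
    """Parse two comma-separated lists and check they form usable (x, y) pairs."""
    xs, ys = parse_values(x_text), parse_values(y_text)
    if len(xs) != len(ys):
        raise ValueError(f"x has {len(xs)} values but y has {len(ys)}")
    if len(xs) < 2:
        raise ValueError("at least two (x, y) pairs are required")
    # Assumption: identical x-values make the slope denominator zero.
    if len(set(xs)) < 2:
        raise ValueError("x-values must not all be identical")
    return xs, ys


xs, ys = parse_pairs("1, 2, 3, 4, 5", "52, 58, 65, 70, 75")
print(xs, ys)
```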

Understanding slope, intercept, and the equation

The slope m is the rate of change: if m = 5, then each additional unit of x is associated with 5 more units of y on average. A positive slope indicates that y tends to increase as x increases; a negative slope indicates the opposite. The y-intercept b is the value the line predicts for y when x = 0. In some contexts that prediction is meaningful (for example, predicted sales when advertising spend is zero); in others it falls outside the range of observed data and should not be interpreted literally. The full equation y = mx + b lets you substitute any x-value to get a predicted y, though predictions become less reliable far outside the range of your data.

R, R-squared, and standard error: how well does the line fit?

The Pearson correlation coefficient r ranges from -1 to +1. A value near +1 means a strong positive linear relationship; near -1 means strong negative; near 0 means little or no linear relationship. R-squared (R²) is the square of r and is interpreted as the proportion of the variability in y that is explained by the linear relationship with x. For example, R² = 0.85 means 85% of the variation in y is accounted for by x, leaving 15% unexplained by the line. The standard error of estimate (SE) measures the typical size of residuals in the same units as y: a smaller SE means predictions from the line tend to be closer to the actual y-values. Together, r, R², and SE summarize how closely the LSRL tracks the data, but they cannot confirm that a straight line is the right model; a residual plot is still needed for that.
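
To see how the three measures relate, the sketch below computes r, R², and SE from raw data, using the hours and scores from the worked example. It assumes SE = sqrt(SSRes / (n - 2)), the usual definition for simple linear regression; `fit_statistics` is a hypothetical helper name.

```python
import math


def fit_statistics(xs, ys):
    """Return slope, intercept, r, R², SSRes, and SE for a simple linear fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)  # total sum of squares
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

    return {
        "slope": slope,
        "intercept": intercept,
        "r": sxy / math.sqrt(sxx * syy),
        "r_squared": 1 - ss_res / syy,
        "ss_res": ss_res,
        # Assumption: n - 2 degrees of freedom, so n must be at least 3.
        "std_error": math.sqrt(ss_res / (n - 2)),
    }


stats = fit_statistics([1, 2, 3, 4, 5], [52, 58, 65, 70, 75])
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Here R² is computed from the residuals, not by squaring r; for a straight-line fit the two agree, which makes a handy cross-check.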

Interpreting R (Pearson correlation) and R-squared

|r| range	Correlation strength	R² range	Fit quality
0.90 - 1.00	Very strong	0.81 - 1.00	Excellent
0.70 - 0.89	Strong	0.49 - 0.80	Good
0.50 - 0.69	Moderate	0.25 - 0.48	Acceptable
0.30 - 0.49	Weak	0.09 - 0.24	Poor
0.00 - 0.29	Very weak / none	0.00 - 0.08	Negligible

These are general guidelines. Context matters: a lower R² can still be meaningful in social science, while a high R² is expected in precise physical experiments.
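
The guideline table translates into a simple lookup. In the sketch below, `correlation_strength` is a hypothetical helper that treats each row's lower bound as a cutoff, so a value such as 0.895, which falls between two printed rows, is assigned to the lower band. That rounding choice is an assumption, not something the table specifies.

```python
def correlation_strength(r):
    """Map r to the strength label from the guideline table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    magnitude = abs(r)  # strength ignores the direction of the relationship
    if magnitude >= 0.90:
        return "Very strong"
    if magnitude >= 0.70:
        return "Strong"
    if magnitude >= 0.50:
        return "Moderate"
    if magnitude >= 0.30:
        return "Weak"
    return "Very weak / none"


for r in (0.9983, -0.75, 0.55, 0.1):
    print(f"r = {r:+.4f}: {correlation_strength(r)}")
```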

Frequently asked questions

What is the difference between the regression line and a trend line?

They refer to the same thing in most everyday contexts. A "trend line" is the informal term often used in spreadsheet software for the line drawn through a scatter plot, while "least squares regression line" or "line of best fit" is the statistical term. Both describe the straight line that minimizes the sum of squared residuals.

How many data points do I need?

Technically only two points are needed to define a line, but with two points the line passes through both exactly and R² = 1, which tells you nothing about real variation. In practice, use at least five to ten pairs for any meaningful inference, and more for reliable predictions.
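
A quick check of the two-point case, using two made-up points chosen purely for illustration: the fitted line passes through both, so every residual is zero and nothing is left over to measure scatter.

```python
def fit_line(xs, ys):
    """Return (m, b) of the least squares line from the summation formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    return m, (sum_y - m * sum_x) / n


# Hypothetical points, not taken from the calculator.
xs, ys = [2.0, 7.0], [3.0, 11.0]
m, b = fit_line(xs, ys)
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]

print(f"y = {m:.2f}x + {b:.2f}")
print("residuals:", [round(e, 10) for e in residuals])
```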

What does R-squared actually mean?

R-squared (R²) is the proportion of the total variance in y that is explained by the linear relationship with x. An R² of 0.80 means 80% of the variation in y is accounted for by the line; the remaining 20% is due to other factors or random noise. It does not tell you whether the linear model is the right model, only how tightly the data cluster around the line that was fitted.

Can I use the regression line for predictions outside my data range?

This is called extrapolation, and it carries risk. The LSRL describes the relationship within the range of your observed x-values. Beyond that range, the relationship may no longer be linear, and predictions can be far off. Use extrapolated predictions cautiously and only when you have good reason to believe the linear pattern continues.
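
One way to stay aware of extrapolation is to have the prediction step report whether the requested x lies outside the observed range. The sketch below is a hypothetical helper, not a feature of the calculator; it fits the hours and scores data from the worked example and flags the 6-hour prediction.

```python
def fit_line(xs, ys):
    """Return (m, b) of the least squares line from the summation formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    return m, (sum_y - m * sum_x) / n


def predict(x_new, xs, m, b):
    """Return (predicted_y, is_extrapolation) for the line y = m*x + b."""
    is_extrapolation = x_new < min(xs) or x_new > max(xs)
    return m * x_new + b, is_extrapolation


hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 75]
m, b = fit_line(hours, scores)

for x_new in (3.5, 6):
    y_hat, outside = predict(x_new, hours, m, b)
    note = "extrapolation, treat with caution" if outside else "within observed range"
    print(f"x = {x_new}: predicted y = {y_hat:.2f} ({note})")
```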

What is the difference between r and R-squared?

r (the Pearson correlation coefficient) measures both the strength and the direction of the linear relationship and ranges from -1 to +1. R² (the coefficient of determination) is simply r squared, so it is always between 0 and 1 and has no direction. R² is easier to communicate as a percentage of variance explained, while r retains sign information and is used in hypothesis testing.
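
The loss of direction is easy to demonstrate. In the sketch below, two small made-up datasets mirror each other: one rises, one falls. Their r values differ only in sign, so squaring gives the same R² for both.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: `falling` is `rising` with every y negated.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 5]
falling = [-y for y in rising]

r_up = pearson_r(xs, rising)
r_down = pearson_r(xs, falling)

print(f"rising:  r = {r_up:+.4f}, R² = {r_up ** 2:.4f}")
print(f"falling: r = {r_down:+.4f}, R² = {r_down ** 2:.4f}")
```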

What does the standard error of estimate mean?

The standard error of estimate (SE or S_e) is the typical size of a residual: the typical gap between an observed y and the y predicted by the regression line. It is in the same units as y. A smaller SE means the line fits the data more tightly, and predictions are generally more accurate.

Does correlation mean causation?

No. The LSRL finds the best linear relationship between x and y, but a strong correlation does not prove that x causes y. Both could be caused by a third variable, or the relationship could be coincidental. Causation requires careful experimental design and domain knowledge, not just a high r or R².

Written by Dr. Rajiv Menon, PhD Applied Mathematician · Bengaluru, India

Applied mathematician bridging algebraic theory and computational tools for students, engineers, and everyday problem-solvers.
