Polynomial Regression Calculator
Enter your x and y data points (one pair per line, with x and y separated by a comma, space, or tab), choose a polynomial degree from 1 (linear) to 6 (sextic), and this calculator fits the best curve by least squares. You get the polynomial equation with coefficients, R-squared, adjusted R-squared, RMSE, a full table of fitted values and residuals, and a chart showing your data alongside the fitted curve. You can also enter any x-value to predict a new y.
What is polynomial regression?
Polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modelled as an n-th degree polynomial. Unlike simple linear regression, which fits a straight line, polynomial regression can capture curves and bends in the data. It is still a form of linear regression because the model is linear in the unknown coefficients, even though it is nonlinear in x. The technique is used in physics, engineering, economics, biology, and any field where the data follow a curved pattern rather than a straight line.
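To make the "linear in the coefficients" point concrete, the degree-n model can be written as below. The coefficient names a_0 ... a_n follow the degree guide on this page; the index i over data points and the error term are notation assumed here for illustration.

```latex
y_i = a_0 + a_1 x_i + a_2 x_i^2 + \dots + a_n x_i^n + \varepsilon_i
    = \sum_{k=0}^{n} a_k x_i^k + \varepsilon_i
```

Each power x^k acts as a separate predictor, so y is a linear combination of the unknown coefficients and ordinary least squares applies directly.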
How polynomial regression is computed
The calculator uses ordinary least squares (OLS) to find the polynomial coefficients that minimise the sum of squared residuals between the observed y-values and the values predicted by the polynomial. This is done via the normal equations: (X^T X) c = X^T y, where X is the Vandermonde design matrix (each row is 1, x, x^2, ..., x^degree for one data point), c is the vector of coefficients, and y is the vector of observed values. The system is solved using Gaussian elimination with partial pivoting, which is adequate for the degree range supported here (1-6), although the normal equations can become ill-conditioned when the x-values are very large or span a wide range. You need at least (degree + 1) data points with distinct x-values to fit a unique polynomial. R-squared measures how much of the variance in y the polynomial explains; values above 0.95 generally indicate a very good fit, though what counts as good depends on how noisy the data are. Adjusted R-squared penalises the addition of extra terms and is the better metric when comparing polynomials of different degrees.
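The procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the calculator's actual source: the function names `fit_polynomial` and `predict` and the singularity tolerance `1e-12` are hypothetical choices made for this example.

```python
def fit_polynomial(xs, ys, degree):
    """Return least-squares coefficients [a0, a1, ..., a_degree]."""
    n = degree + 1
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(xs) < n:
        raise ValueError("need at least degree + 1 data points")

    # Normal equations A c = b with A = X^T X and b = X^T y.
    # For a Vandermonde X: A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    power_sums = [sum(x ** k for x in xs) for k in range(2 * degree + 1)]
    a = [[power_sums[i + j] for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]

    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:  # tolerance is an arbitrary assumption
            raise ValueError("singular system: too few distinct x-values")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution on the upper-triangular system.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result
```

For example, `fit_polynomial([0, 1, 2, 3, 4], [1, 6, 17, 34, 57], 2)` recovers coefficients close to 1, 2, and 3, because those points lie exactly on y = 1 + 2x + 3x^2.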
Choosing the right degree
Selecting the polynomial degree is the key modelling decision. A degree that is too low (underfitting) misses important curvature in the data. A degree that is too high (overfitting) fits the noise as well as the trend and performs poorly on new data even though R-squared is high on the training set. A good approach is to start at degree 2 and increase it until adjusted R-squared levels off or decreases. The residual plot is also informative: if the residuals show a systematic curve, a higher degree is warranted. With fewer than 20 data points, degrees above 3 or 4 are rarely justified. For smooth interpolation across a large range of x, consider whether a spline or other piecewise approach would serve better than a high-degree global polynomial.
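The "start at degree 2 and increase until adjusted R-squared levels off" rule could be automated roughly as follows. This is a sketch only: the helper names and the `min_gain` threshold of 0.001 (what counts as "levelling off") are assumptions made for this example, and the compact `_fit` helper is included just to keep the block self-contained. It assumes enough distinct x-values for each degree tried.

```python
def _fit(xs, ys, degree):
    """Compact least-squares fit via the normal equations (helper only)."""
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            f = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= f * a[col][k]
            b[row] -= f * b[col]
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (b[i] - sum(a[i][j] * c[j] for j in range(i + 1, n))) / a[i][i]
    return c


def adjusted_r2(xs, ys, degree):
    """Adjusted R-squared of the degree-`degree` least-squares fit."""
    n = len(ys)
    if n - degree - 1 <= 0:
        raise ValueError("need more than degree + 1 points")
    c = _fit(xs, ys, degree)
    fitted = [sum(ck * x ** k for k, ck in enumerate(c)) for x in xs]
    mean_y = sum(ys) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - (ss_res / ss_tot) * (n - 1) / (n - degree - 1)


def choose_degree(xs, ys, start=2, max_degree=6, min_gain=0.001):
    """Raise the degree while adjusted R-squared improves by at least min_gain."""
    degree = start
    current = adjusted_r2(xs, ys, degree)
    # The next degree needs more than (degree + 1) + 1 points.
    while degree < max_degree and len(xs) > degree + 2:
        candidate = adjusted_r2(xs, ys, degree + 1)
        if candidate - current < min_gain:
            break  # levelled off or decreased
        degree, current = degree + 1, candidate
    return degree
```

A numeric rule like this is a starting point rather than a verdict; checking the residual plot at the chosen degree remains worthwhile.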
Interpreting R-squared and RMSE
R-squared (the coefficient of determination) tells you the fraction of the total variance in y that the fitted polynomial accounts for. An R-squared of 0.95 means 95% of the variation is explained. However, R-squared always increases or stays the same when you add more terms, even if those terms are not meaningful. Adjusted R-squared corrects for this by penalising degrees of freedom used by extra coefficients; it can decrease if an added term does not improve fit enough. RMSE (root mean square error) is the square root of the average squared residual and is in the same units as y. It tells you the typical prediction error of the model on the training data. Use RMSE alongside R-squared for a full picture of fit quality.
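The three metrics can be computed from the observed and fitted values as in the sketch below. The function name `fit_metrics` is hypothetical. RMSE divides by the number of points, matching the "average squared residual" description above; some tools divide by the residual degrees of freedom instead, so values may differ slightly between programs.

```python
import math


def fit_metrics(ys, fitted, degree):
    """Return (r_squared, adjusted_r_squared, rmse) for a polynomial fit."""
    n = len(ys)
    mean_y = sum(ys) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in ys)             # total variation
    if ss_tot == 0:
        raise ValueError("all y-values are identical; R-squared is undefined")

    r2 = 1.0 - ss_res / ss_tot

    # Adjusted R-squared: `degree` predictors plus an intercept leave
    # n - degree - 1 residual degrees of freedom.
    dof = n - degree - 1
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / dof if dof > 0 else float("nan")

    # RMSE as the square root of the mean squared residual.
    rmse = math.sqrt(ss_res / n)
    return r2, adj_r2, rmse
```

With `ys = [1, 2, 3, 4, 5]`, `fitted = [1.1, 1.9, 3.0, 4.1, 4.9]`, and `degree = 1`, the residual sum of squares is 0.04 against a total of 10, giving an R-squared of 0.996.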
Polynomial Degree Guide
| Degree | Name | Equation form | Typical use |
|---|---|---|---|
| 1 | Linear | y = a0 + a1*x | Straight-line trends, proportional relationships |
| 2 | Quadratic | y = a0 + a1*x + a2*x^2 | Parabolic shapes, projectile motion, cost curves |
| 3 | Cubic | y = a0 + ... + a3*x^3 | S-shaped growth, economic cycles |
| 4 | Quartic | y = a0 + ... + a4*x^4 | Double-hump distributions, complex oscillations |
| 5 | Quintic | y = a0 + ... + a5*x^5 | High-precision curve matching, engineering splines |
| 6 | Sextic | y = a0 + ... + a6*x^6 | Very complex curves; use sparingly to avoid overfitting |
Typical uses for each polynomial degree in regression analysis.
Frequently asked questions
How many data points do I need?
You need at least (degree + 1) points with distinct x-values to fit a polynomial of that degree: 2 points for degree 1, 3 for degree 2, and so on. In practice, you want considerably more than the minimum for a statistically meaningful fit. As a rough guide, aim for at least 5 times (degree + 1) data points before trusting R-squared values.
What format should my data be in?
Enter one (x, y) pair per line. Separate x and y with a comma, a space, or a tab. For example: "1, 2.5" or "1 2.5". Lines with missing or non-numeric values are ignored automatically, so you can paste directly from a spreadsheet.
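A parser that follows these rules might look like the sketch below. It is an illustration, not the calculator's own code: the name `parse_points` is hypothetical, and skipping lines with more than two fields or with non-finite values (such as "nan" or "inf") is an assumption about how "ignored automatically" would be handled.

```python
import math
import re


def parse_points(text):
    """Parse "x, y" lines into two lists, skipping lines that do not fit."""
    xs, ys = [], []
    for line in text.splitlines():
        # Split on commas, spaces, or tabs; drop empty fragments.
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if len(fields) != 2:
            continue  # blank, incomplete, or over-long line
        try:
            x, y = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # non-numeric value
        if not (math.isfinite(x) and math.isfinite(y)):
            continue  # assumption: treat nan/inf as invalid
        xs.append(x)
        ys.append(y)
    return xs, ys
```

Called on the five lines `1, 2.5`, `2 3.5`, `abc, 1`, `4,`, and `5, 6`, this keeps the three valid pairs and drops the other two.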
Why is my R-squared perfect (1.0) but the fit looks wrong?
If the number of data points equals the polynomial degree plus one (and the x-values are all distinct), the polynomial passes exactly through every point and R-squared is exactly 1. This is interpolation, not regression. For a meaningful fit that generalises to new data, use substantially more points than the degree requires.
When should I increase the polynomial degree?
Increase the degree if the residuals (observed minus fitted) show a clear systematic pattern, such as a curve or wave shape. A random scatter of residuals around zero indicates a good fit at the current degree. Increasing the degree when residuals are already random will overfit the data.
Can I use this for extrapolation beyond my data range?
You can enter any x-value in the prediction field and get a y-value, but extrapolation beyond the data range is unreliable, especially for high-degree polynomials, which can diverge rapidly outside the fitted range. If extrapolation is important, use the lowest degree that fits well or consider a mechanistic model based on the underlying process.
What is the difference between R-squared and adjusted R-squared?
R-squared always increases (or stays the same) when you add a polynomial term, even if that term does not help. Adjusted R-squared adjusts for the number of terms and can decrease if a new term does not improve the fit enough to offset the extra coefficient. When comparing models of different degrees, use adjusted R-squared: the degree with the highest adjusted R-squared is generally the best choice.