Pearson Correlation Calculator
Paste or type your paired X and Y data (separated by commas or spaces) and get the Pearson correlation coefficient r, its R-squared, t-statistic, two-tailed p-value, 95 percent confidence interval, and a plain-English interpretation of the strength and direction. All calculations update instantly in your browser with a full step-by-step breakdown.
Formula
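In standard notation (the symbols below are assumed for this note, not taken from the calculator), the sample coefficient is the covariance of X and Y divided by the product of their standard deviations:

```latex
r = \frac{\operatorname{cov}(X, Y)}{s_X \, s_Y}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Here $\bar{x}$ and $\bar{y}$ are the sample means and $n$ is the number of pairs. The $n - 1$ factors in the covariance and the standard deviations cancel, which is why the second form needs no denominators.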
Worked example
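With X = [2, 4, 5, 6, 8] and Y = [4, 7, 8, 10, 14]: mean(X) = 5, mean(Y) = 8.6, the sum of cross-products of deviations is 33, so the sample cov = 33 / 4 = 8.25, SD(X) = 2.236, SD(Y) = 3.715, r = 8.25 / (2.236 * 3.715) = 0.9932 and r^2 = 0.9864. With n = 5, t = 0.9932 * sqrt(3) / sqrt(1 - 0.9864) = 14.8 (approximately) on 3 degrees of freedom, p = 0.0007, significant at the 0.05 level.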
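The arithmetic of the worked example can be re-run in a few lines of plain Python. This is a sketch that assumes sample (n - 1) formulas for the covariance and standard deviations; that convention is an assumption about the calculator, although it cancels out of r itself.

```python
import math

x = [2, 4, 5, 6, 8]
y = [4, 7, 8, 10, 14]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance and standard deviations (n - 1 in the denominator).
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))

# Pearson r, then the t-statistic on n - 2 degrees of freedom.
r = cov_xy / (sd_x * sd_y)
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(f"cov = {cov_xy:.4f}, SD(X) = {sd_x:.3f}, SD(Y) = {sd_y:.3f}")
print(f"r = {r:.4f}, r^2 = {r * r:.4f}, t = {t:.2f}")
```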
What is the Pearson correlation coefficient?
The Pearson correlation coefficient (r) measures the strength and direction of the linear relationship between two continuous variables. It was developed by Karl Pearson in 1895, building on earlier work by Francis Galton, and remains the most widely used measure of association in statistics. The value of r always falls between -1 and +1 inclusive. A value of +1 means a perfect positive linear relationship (all points fall exactly on a straight line with positive slope), a value of -1 means a perfect negative linear relationship (all points fall exactly on a straight line with negative slope), and a value of 0 means no linear association. The sign tells you the direction; the absolute value tells you the strength. Because r only captures linear patterns, two datasets can have the same r but very different scatter plots (Anscombe's quartet is a classic demonstration of this), so always inspect a scatter plot alongside the coefficient.
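A small illustration of the point that r reflects only linear patterns. The data and the helper name `pearson_r` are invented for this sketch: an exact straight line gives r = 1, while an exact U-shape, although fully determined by x, gives r = 0.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from two equal-length lists (illustrative helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
line = [3 * v + 1 for v in xs]   # exact straight line with positive slope
curve = [v ** 2 for v in xs]     # exact U-shape, symmetric around x = 0

r_line = pearson_r(xs, line)
r_curve = pearson_r(xs, curve)
print(f"straight line: r = {r_line:.3f}")
print(f"U-shape:       r = {r_curve:.3f}")
```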
What does r-squared mean?
R-squared (r^2) is the square of the Pearson coefficient and represents the proportion of variance in Y that is statistically explained by X. For example, if r = 0.80, then r^2 = 0.64, meaning that 64% of the variability in Y is accounted for by its linear relationship with X. The remaining 36% is due to other factors or random variation. R-squared is also the key output of a simple linear regression model on the same data, so it bridges correlation and regression analysis. A high r^2 does not mean the model is correct; it only means the line fits the data well within your sample.
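The link to regression can be checked numerically. The sketch below uses a hypothetical helper, `r_squared_two_ways`, which computes r^2 directly and also as 1 - SS_residual / SS_total from an ordinary least-squares line; for simple linear regression the two values should agree up to rounding.

```python
def r_squared_two_ways(xs, ys):
    """Return (r^2 from the correlation, R^2 from a least-squares line)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)

    # Square of the Pearson coefficient.
    r2_corr = sxy ** 2 / (sxx * syy)

    # Least-squares line y = intercept + slope * x, then 1 - SS_res / SS_tot.
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(xs, ys))
    r2_fit = 1 - ss_res / syy
    return r2_corr, r2_fit

r2_corr, r2_fit = r_squared_two_ways([2, 4, 5, 6, 8], [4, 7, 8, 10, 14])
print(f"r^2 from correlation: {r2_corr:.4f}")
print(f"R^2 from regression:  {r2_fit:.4f}")
```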
How to interpret the p-value and significance
The t-statistic and the associated two-tailed p-value test the null hypothesis that the true population correlation (rho) is zero. If the p-value is below your chosen significance level (commonly 0.05), you reject the null hypothesis and conclude that the correlation is statistically significant, meaning it is unlikely to have arisen by chance alone. However, statistical significance is not the same as practical significance: with a large enough sample (n = 1000), even r = 0.07 becomes significant, yet that relationship explains less than 0.5% of the variance. Always report both r and the p-value, and consider the effect size (r itself) alongside significance. Small samples (n < 10) produce unstable estimates: a single outlier pair can shift r dramatically.
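The large-sample example can be reproduced with a minimal sketch in standard-library Python. The function names are hypothetical, and the p-value is obtained by numerically integrating the t density with Simpson's rule, which is an approximation chosen to keep the example dependency-free; it is not necessarily how this calculator computes it.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def two_tailed_p(r, n, steps=20000):
    """t-statistic and two-tailed p-value for H0: rho = 0 (needs |r| < 1, n >= 3)."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    upper = abs(t)
    if upper == 0:
        return t, 1.0
    # Simpson's rule (steps must be even) for the area between 0 and |t|.
    h = upper / steps
    area = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # The two tails are whatever probability lies outside [-|t|, |t|].
    return t, max(0.0, 1 - 2 * area)

t_stat, p_value = two_tailed_p(0.07, 1000)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, r^2 = {0.07 ** 2:.4f}")
```

The result falls below 0.05 even though r^2 is under half a percent, which is the gap between statistical and practical significance described above.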
95% confidence interval for r (Fisher z-transformation)
Because r is bounded between -1 and +1, its sampling distribution is skewed for values far from zero. Fisher's z-transformation converts r to a value with an approximately normal distribution: z = 0.5 * ln((1+r)/(1-r)), with standard error 1/sqrt(n-3). A 95% confidence interval is constructed in z-space as z +/- 1.96/sqrt(n-3), then back-transformed to the r scale with r = tanh(z). The resulting CI tells you the plausible range for the true population correlation rho. Wider intervals (seen with small n) indicate more uncertainty; for the same n, an r near +/-1 gives a narrower but asymmetric interval whose longer side points toward zero. This calculator requires n >= 4 to produce a finite confidence interval, because the standard error is undefined at n = 3.
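The transformation and back-transformation can be sketched as follows. `fisher_ci` is a hypothetical helper, and `math.atanh` / `math.tanh` are used because atanh(r) equals 0.5 * ln((1+r)/(1-r)); the example values r = 0.5 and n = 30 are made up for illustration.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for rho via Fisher's z-transformation."""
    if n < 4:
        raise ValueError("need n >= 4 for a finite standard error")
    if abs(r) >= 1:
        raise ValueError("the transformation is undefined for |r| = 1")
    z = math.atanh(r)                             # r -> z
    se = 1 / math.sqrt(n - 3)                     # standard error in z-space
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return math.tanh(z - crit * se), math.tanh(z + crit * se)  # z -> r

low, high = fisher_ci(0.5, 30)
print(f"95% CI for rho: ({low:.3f}, {high:.3f})")
```

The interval is not symmetric around r, which is expected: symmetry holds in z-space and is lost when mapping back through tanh.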
Assumptions and when NOT to use Pearson r
Pearson r is designed for data that are: (1) continuous and measured on an interval or ratio scale; (2) approximately bivariate normally distributed; (3) related linearly (not curved); and (4) free from heavy outliers that distort the mean. If your data are ordinal rankings, contain severe outliers, or show a clear non-linear pattern (a curve, a U-shape), consider Spearman rank correlation or Kendall tau instead. Pearson r also assumes independence of observations: repeated-measures or hierarchical data require different approaches. Never extrapolate a correlation found in one range of data to a wider range without additional evidence.
Pearson r strength guidelines (Cohen 1988 / Evans 1996)
| Absolute value of r | Strength | Interpretation |
|---|---|---|
| 0.00 - 0.09 | None | No meaningful linear association |
| 0.10 - 0.29 | Weak | Small effect; relationship exists but is not strong |
| 0.30 - 0.49 | Moderate-weak | Moderate-small effect; noticeable but limited |
| 0.50 - 0.69 | Moderate | Moderate effect; practical significance likely |
| 0.70 - 0.89 | Strong | Large effect; meaningful linear association |
| 0.90 - 1.00 | Very strong | Very large effect; near-perfect linear association |
Conventional thresholds for interpreting the magnitude of a Pearson correlation. Direction (positive or negative) is assessed separately.
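The table can be turned into a lookup for the plain-English interpretation. This is a sketch under one assumption: the lower bound of each row is treated as the cutoff, so values that fall in the gaps between rows (for example 0.095) are assigned to the lower band. The function names are illustrative only.

```python
def strength_label(r):
    """Strength label for a correlation, using the table's lower bounds as cutoffs."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    cutoffs = [
        (0.90, "Very strong"),
        (0.70, "Strong"),
        (0.50, "Moderate"),
        (0.30, "Moderate-weak"),
        (0.10, "Weak"),
    ]
    for lower, label in cutoffs:
        if size >= lower:
            return label
    return "None"

def direction_label(r):
    """Direction is read from the sign, separately from the strength."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"

for value in (0.95, -0.35, 0.05):
    print(value, strength_label(value), direction_label(value))
```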
Frequently asked questions
What is a "good" Pearson r value?
There is no universal cutoff because it depends entirely on the field and research question. In physics or engineering, r below 0.99 might be considered poor. In psychology or social science, r = 0.30 can be a meaningful finding. Use Cohen's (1988) benchmarks as a starting point: r around 0.10 is small, 0.30 is medium, and 0.50 is large. Always interpret r in the context of your subject matter and consider r-squared to understand the practical proportion of explained variance.
How many data points do I need?
Two pairs with distinct X and Y values always give r = +1 or r = -1, because any two points lie exactly on a line, so you need at least 3 pairs for r to carry information and for the t-test to have a degree of freedom (n - 2 = 1). Even then, 3 points often yield r near +1 or -1 by chance. As a practical rule, aim for at least 10 pairs before trusting the result, and at least 30 pairs for stable estimates. For formal significance testing, a power analysis is the right tool: to detect r = 0.50 with a two-tailed alpha = 0.05 and 80% power you need about 29 pairs.
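A rough version of that power calculation can be sketched with the Fisher z approximation, n = ((z_alpha + z_power) / atanh(r))^2 + 3. This is an assumed approximation for a two-tailed test, so dedicated power-analysis software may differ by a pair or two; `approx_pairs_needed` is a hypothetical name.

```python
import math
from statistics import NormalDist

def approx_pairs_needed(r, alpha=0.05, power=0.80):
    """Approximate number of pairs needed to detect a correlation of size r."""
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and +1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    z_power = NormalDist().inv_cdf(power)           # quantile for the target power
    return ((z_alpha + z_power) / math.atanh(abs(r))) ** 2 + 3

n_needed = approx_pairs_needed(0.50)
print(f"r = 0.50: about {n_needed:.1f} pairs")
```

Smaller effects need far more data: the same formula puts r = 0.30 at roughly 85 pairs.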
What is the difference between Pearson r and Spearman's rho?
Pearson r measures the strength of the LINEAR relationship between two continuous variables, using the actual numeric values. Spearman's rho converts both variables to ranks first, then computes Pearson r on those ranks. This makes Spearman robust to outliers and valid for ordinal data or non-linear monotonic relationships. If your data are ordinal (e.g. survey ratings), heavily skewed, or clearly non-linear, prefer Spearman. If the data are continuous, roughly normal, and the relationship looks linear on a scatter plot, use Pearson.
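The "ranks first, then Pearson" description translates directly into code. The sketch below uses made-up data and hypothetical helper names; tied values share the average of their rank positions, which is the usual convention.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5, 6]
ys = [v ** 3 for v in xs]   # monotonic but clearly non-linear

r_pearson = pearson_r(xs, ys)
rho_spearman = spearman_rho(xs, ys)
print(f"Pearson r = {r_pearson:.3f}, Spearman rho = {rho_spearman:.3f}")
```

Because the cubic relationship is perfectly monotonic, the ranks line up exactly and Spearman's rho reaches 1, while Pearson r stays below 1.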
Can Pearson r be used to prove causation?
No. A correlation, however strong, cannot prove that one variable causes another. Both variables might be driven by a third (confounding) variable, the relationship might be coincidental, or the causation might run in the opposite direction. To establish causation you need a well-designed experiment with random assignment, or a rigorous quasi-experimental design with controls for confounders.
What is the t-statistic used for in this calculator?
The t-statistic is derived from r and n, and it follows a t-distribution with n-2 degrees of freedom under the null hypothesis that rho = 0. It is used to compute the p-value: how surprising would this r (or one more extreme) be if the true correlation were zero? Large |t| (and thus small p) means the data would be very unusual if there were no true correlation, so you reject the null hypothesis.
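Written out in standard notation (symbols assumed for this note), the statistic and its degrees of freedom are:

```latex
t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}, \qquad \mathrm{df} = n - 2
```

For a fixed r, t grows with the square root of n - 2, which is why a weak correlation can still be significant in a large sample.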