Correlation Coefficient Calculator

Find the Pearson r, Spearman rho, or Kendall tau-b correlation between two sets of paired numbers. Enter comma-separated x and y values to get the correlation coefficient, p-value, t-statistic, degrees of freedom, 95% confidence interval, and the least-squares regression equation.

Your details

Enter your first variable. Each value pairs with the value at the same position in the Y list.
Enter your second variable. It must have the same number of values as X.
Pearson r measures linear association. Spearman rho ranks the data first and is more robust to outliers. Kendall tau-b is best for small samples with many tied ranks.
The threshold below which the p-value is deemed statistically significant. A 95% confidence level (alpha = 0.05) is the standard in most fields.
Display the least-squares best-fit line equation y = mx + b alongside the correlation coefficient.
Correlation coefficient: 0.8528 (strong positive correlation)
r² (coefficient of determination): 0.7273
Pairs (n): 5
t-statistic: 2.8284
Degrees of freedom: 3
p-value (two-tailed): 0.0663
Significant? No (p >= 0.05)
95% CI lower bound: -0.119
95% CI upper bound: 0.9901
Covariance (population): 1.6
Regression slope (m): 0.8
Regression intercept (b): 1.8
Regression equation: y = 0.8x + 1.8
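Strength scale: very strong negative (below -0.9), strong negative (-0.9 to -0.7), moderate negative (-0.7 to -0.4), weak negative (-0.4 to -0.1), none (-0.1 to 0.1), weak positive (0.1 to 0.4), moderate positive (0.4 to 0.7), strong positive (0.7 to 0.9), very strong positive (above 0.9).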

r = 0.8528, a strong positive correlation.

  • The sign of the coefficient shows direction: as x rises, y tends to rise. The magnitude (0 to 1) shows how tightly the points track a straight line.
  • r² = 0.7273, meaning about 72.7% of the variation in y is statistically associated with x via this linear fit.
  • The p-value is 0.0663, which is not significant at the 0.05 level (p >= 0.05). A small p-value means a correlation at least this strong would be unlikely under the null hypothesis of no association.
  • Correlation is not causation. A high coefficient can arise from a confounding variable, reversed causality, or coincidence. Establish causation through controlled experiments or careful study design.

Next step: Use the regression equation y = 0.8x + 1.8 to predict y for x values within the observed range. Plot a scatter chart to confirm the relationship looks linear.

Formula

\[ r = \dfrac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum (x_i-\bar{x})^2}\,\sqrt{\sum (y_i-\bar{y})^2}}, \quad t = r\,\sqrt{\dfrac{n-2}{1-r^2}}, \quad df = n-2 \]

Worked example

For x = 1, 2, 3, 4, 5 and y = 2, 4, 5, 4, 6: means are x̅ = 3 and y̅ = 4.2. The covariance numerator Σ(x-x̅)(y-y̅) = 8. The sums of squares are Σ(x-x̅)² = 10 and Σ(y-y̅)² = 8.8, so the denominator = √(10 × 8.8) ≈ 9.381. So r = 8 / 9.381 ≈ 0.8528, a strong positive correlation. t = 0.8528 × √3 / √(1-0.7273) ≈ 2.828 on 3 df, giving p ≈ 0.066.
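The arithmetic in this example can be reproduced with a short Python sketch that uses only the standard library. The function names (`pearson_r`, `t_pdf`, `t_two_tailed_p`) are illustrative and are not the calculator's own code. Integrating the Student t density with Simpson's rule is an assumption made here to avoid external dependencies.

```python
import math

def pearson_r(x, y):
    """Pearson r from the sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the density's area between -|t| and |t|."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_a = total * h / 3
    return 1 - 2 * area_0_to_a  # the density is symmetric about zero

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
n = len(x)
r = pearson_r(x, y)
t = r * math.sqrt((n - 2) / (1 - r * r))
p = t_two_tailed_p(t, n - 2)
print(round(r, 4), round(t, 4), round(p, 4))  # 0.8528 2.8284 0.0663
```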

What the correlation coefficient measures

A correlation coefficient is a single number between -1 and +1 that summarises how consistently two variables move together. The sign encodes direction: positive means that as one variable rises the other tends to rise too, negative means they move in opposite directions, and zero means there is no consistent linear or monotonic pattern. The magnitude tells you how tightly they track each other: values near 1 or -1 indicate a close relationship, while values near 0 indicate a loose or absent one. This calculator supports three coefficients. Pearson r measures the strength of a strictly linear relationship and requires interval or ratio-level data. Spearman rho ranks the data first and then applies the Pearson formula to the ranks, making it suitable for ordinal data and more resistant to outliers. Kendall tau-b counts concordant and discordant pairs directly and handles tied ranks well, making it a good choice for small samples.
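As a minimal sketch of the "rank first, then apply Pearson" idea behind Spearman rho, the Python below ranks each variable and feeds the ranks to the Pearson formula. Giving tied values the average of their positions is a common convention and is assumed here; the page does not state how this calculator ranks ties. The function names are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman rho: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

print(average_ranks([2, 4, 5, 4, 6]))                              # [1.0, 2.5, 4.0, 2.5, 5.0]
print(round(spearman_rho([1, 2, 3, 4, 5], [2, 4, 5, 4, 6]), 4))    # 0.8208
```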

Significance testing and the p-value

A correlation coefficient estimated from a sample may look non-zero simply by chance. The t-test converts r into a t-statistic using t = r × sqrt(n-2) / sqrt(1-r^2) and compares it against a t-distribution with n-2 degrees of freedom to produce a two-tailed p-value. If the p-value falls below your chosen significance level (typically 0.05), you reject the null hypothesis that the true population correlation is zero. The 95% confidence interval uses Fisher's z-transformation: r is converted to z = 0.5 ln((1+r)/(1-r)), a symmetric interval is built at z ± 1.96/sqrt(n-3), then back-transformed to the r scale. Both p-values and confidence intervals are provided for Pearson and Spearman; they are omitted for Kendall tau-b because that coefficient requires a different approximation not yet implemented here.
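The Fisher z interval described above can be sketched in a few lines of Python. The function name `fisher_ci` and its `z_crit` parameter are illustrative. The sketch assumes n > 3, since the standard error 1/sqrt(n-3) is undefined otherwise.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)                        # equals 0.5 * ln((1 + r) / (1 - r))
    half_width = z_crit / math.sqrt(n - 3)   # requires n > 3
    # Back-transform both ends of the interval to the r scale.
    return math.tanh(z - half_width), math.tanh(z + half_width)

low, high = fisher_ci(0.8528, 5)
print(round(low, 3), round(high, 4))  # -0.119 0.9901
```

With only five pairs the interval spans almost the whole range from 0 to 1 and dips below zero, which is why this example is not significant at the 5% level.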

The regression equation and r squared

When you enable the regression equation for Pearson r, the calculator fits a least-squares line y = mx + b through your data. The slope m equals the covariance of x and y divided by the variance of x, and the intercept b is chosen so the line passes through the point (x-bar, y-bar). Squaring r gives r-squared (the coefficient of determination), the proportion of variance in y that is linearly explained by x. An r of 0.85 looks impressive, but r-squared is only 0.72, meaning 28% of the variation in y remains unexplained by x alone. Regression and r-squared apply strictly to linear relationships: if your scatter plot looks curved, Pearson r and this line will both mislead.
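A minimal Python sketch of the slope and intercept calculation follows. The name `least_squares_line` is illustrative. Because the slope is a ratio of covariance to variance, the 1/n factors cancel and the raw sums can be used directly.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    m = sxy / sxx      # covariance / variance; the 1/n factors cancel
    b = my - m * mx    # forces the line through (x-bar, y-bar)
    return m, b

m, b = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 6])
print(round(m, 4), round(b, 4))  # 0.8 1.8
```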

Choosing the right coefficient and avoiding common pitfalls

Use Pearson r when both variables are measured on interval or ratio scales and a scatter plot confirms a roughly linear cloud of points. Switch to Spearman rho when your data are ordinal (e.g. survey ratings or rankings), when outliers are a concern, or when the relationship looks monotonic but not strictly linear. Prefer Kendall tau-b when the sample is small or contains many tied values, because it has better statistical properties in those situations. Whichever coefficient you choose, remember three traps. First, correlation is not causation: a strong coefficient can arise from a hidden third variable, reversed causality, or pure coincidence. Second, r only captures monotonic or linear associations depending on the method: two variables can be strongly related in a U-shaped or cyclical way and still produce a near-zero coefficient. Third, outliers can inflate or deflate Pearson r dramatically: always plot your data on a scatter chart before drawing conclusions.
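To make the pair-counting behind Kendall tau-b concrete, here is a small Python sketch. It uses the standard tau-b definition, (C - D) / sqrt((C + D + Tx)(C + D + Ty)), where C and D are the concordant and discordant pairs and Tx, Ty count pairs tied only in x or only in y. Whether the calculator uses exactly this form is an assumption, and the function name is illustrative.

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall tau-b by direct pair counting (O(n^2), fine for small samples)."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue              # tied in both variables: excluded from every count
        elif dx == 0:
            ties_x += 1           # tied in x only
        elif dy == 0:
            ties_y += 1           # tied in y only
        elif dx * dy > 0:
            concordant += 1       # both variables move in the same direction
        else:
            discordant += 1       # the variables move in opposite directions
    untied = concordant + discordant
    return (concordant - discordant) / math.sqrt((untied + ties_x) * (untied + ties_y))

# 8 concordant pairs, 1 discordant pair, 1 pair tied in y
print(round(kendall_tau_b([1, 2, 3, 4, 5], [2, 4, 5, 4, 6]), 4))  # 0.7379
```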

Interpreting the size of a correlation coefficient

|r| range    Strength      Example contexts
0.90-1.00    Very strong   Instrument re-test reliability, physical measurement
0.70-0.89    Strong        Validated psychometric scales, biomarker pairs
0.40-0.69    Moderate      Social science constructs, health risk factors
0.10-0.39    Weak          Large observational studies, distal predictors
0.00-0.09    Negligible    No meaningful linear or monotonic association

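These cutoffs follow a convention widely used in the health sciences (e.g. Schober et al., 2018); other authors, such as Evans (1996), draw the boundaries differently. The sign (+/-) shows direction only, not strength. Significance depends on both the size of r and the sample size n.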

Frequently asked questions

What is the difference between Pearson, Spearman, and Kendall correlation?

All three measure how consistently two variables move together, but they differ in assumptions and sensitivity. Pearson r quantifies the strength of a linear relationship and requires continuous, roughly normally distributed data. Spearman rho ranks both variables first, then applies the Pearson formula to the ranks; it works with ordinal data and is more resistant to outliers. Kendall tau-b counts pairs of observations and asks how often the ranking of one variable agrees with the ranking of the other; it handles ties well and has better statistical properties in small samples. If your data are continuous and the scatter plot looks linear, use Pearson. If you have ordinal data, skewed distributions, or outliers, use Spearman. For small samples with many ties, Kendall tau-b is often the most reliable choice.

How do I interpret the p-value from this calculator?

The p-value answers the question: if the true population correlation were exactly zero, how likely is it to observe a sample coefficient at least this far from zero just by chance? A p-value below your chosen significance level (commonly 0.05) means you reject that null hypothesis and conclude the observed correlation is statistically significant. A p-value above the threshold does not prove the correlation is zero; it only means you lack strong evidence to reject zero with this sample size. With very large samples, even a tiny r (say 0.05) can reach significance because sampling error is small, yet the association may be practically meaningless. Always report both the coefficient size and the p-value together.
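To see how sample size drives significance, the Python sketch below evaluates the t formula from the Formula section for a fixed r = 0.05 at several sample sizes. The helper name `t_statistic` is illustrative. For large samples the two-tailed 5% critical value is roughly 1.96.

```python
import math

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), on n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

for n in (100, 1000, 10000):
    print(n, round(t_statistic(0.05, n), 2))
# 100 0.5
# 1000 1.58
# 10000 5.01
```

The same tiny coefficient is far from significant with 100 pairs but clears the threshold comfortably with 10,000.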

What does the 95% confidence interval tell me?

The 95% confidence interval gives a range of plausible values for the true population correlation. If you repeated the study many times, about 95% of the intervals built this way would contain the true population correlation. A narrow interval means your estimate is precise; a wide interval (common with small samples) means there is substantial uncertainty about where the true correlation lies. The interval is computed via Fisher's z-transformation and is only shown for Pearson r. If the interval does not include zero, the result is significant at the 5% level, which normally agrees with the p-value.

Why does my correlation come out as blank or undefined?

You need at least three paired values in each list (some methods require more for a meaningful test). If every value in one variable is identical, that variable has zero variance and r is mathematically undefined. Make sure both lists have the same length, that all values are numbers (no letters or extra commas), and that both variables actually vary across the dataset.
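The checks listed above might look like the following Python sketch. The function names and messages are hypothetical and are not taken from the calculator's own validation code.

```python
def parse_values(text):
    """Split a comma-separated string into floats.

    Raises ValueError for non-numeric or empty fields (e.g. a stray comma).
    """
    return [float(field) for field in text.split(",")]

def check_pairs(x, y):
    """Return a reason the correlation is undefined, or None if the data look usable."""
    if len(x) != len(y):
        return "lists differ in length"
    if len(x) < 3:
        return "need at least three pairs"
    if len(set(x)) == 1 or len(set(y)) == 1:
        return "one variable is constant (zero variance)"
    return None

print(check_pairs(parse_values("1, 2, 3"), parse_values("5, 5, 5")))
# one variable is constant (zero variance)
```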

Can the correlation coefficient detect any kind of relationship?

No. Pearson r only detects linear associations: a perfect U-shaped curve can return an r of zero because the upward and downward slopes cancel out. Spearman and Kendall detect monotonic relationships (consistently increasing or decreasing) but will also miss patterns that reverse direction. For non-monotonic relationships, you may need a different measure such as distance correlation or mutual information. This is why plotting your data on a scatter chart before interpreting any coefficient is so important: the plot will reveal patterns the numbers alone cannot.
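The U-shaped case is easy to verify. In this Python sketch, y is an exact function of x (y = x squared), yet Pearson r comes out as zero because the products on the left and right halves cancel. The data are made up for illustration.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]   # perfect U-shape: 4, 1, 0, 1, 4
print(pearson_r(x, y))   # 0.0
```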

How is the regression equation calculated?

The least-squares regression line y = mx + b is computed from the same sums used for Pearson r. The slope m equals the covariance of x and y divided by the variance of x, and the intercept b equals the mean of y minus the slope times the mean of x. This ensures the line passes through the centroid (x-bar, y-bar) of the data cloud. The line is only valid for predicting y from x within the range of x values you observed; extrapolating beyond that range can be unreliable.

Written by Dr. Hannah Brandt, PhD Statistician · Munich, Germany

Applied statistician translating rigorous probability theory into clear, accurate tools for researchers and practitioners.