T-Test Calculator

This t-test calculator covers all three major forms of the test: one-sample, two-sample independent (with Welch correction for unequal variances), and paired. Enter your summary statistics, pick the tail direction and significance level, and get the t statistic, exact p-value, critical value, confidence interval, and Cohen's d effect size with a full show-your-work panel.

Your details

One-sample: compare a sample mean to a fixed value. Two-sample: compare two independent group means. Paired: compare before/after on the same subjects.
Two-tailed tests for any difference. One-tailed tests for a specific direction.
The probability threshold below which you reject the null hypothesis. 0.05 is the most common choice.
The average of your observed sample.
The reference value you are testing against (null hypothesis value).
The sample standard deviation computed with n-1 in the denominator (most software defaults to this).
Number of observations (must be at least 2).
Result: not statistically significant (p >= 0.10)
t statistic: 1.3693
p-value: 0.1814
Degrees of freedom (df): 29
Critical value |t*|: 2.0452
Standard error: 0.0730
95% CI lower bound: -0.0494
95% CI upper bound: 0.2494
Cohen's d (effect size): 0.25

One-sample t-test: t = 1.3693, p = 0.1814.

  • The two-tailed p-value of 0.1814 is above your significance threshold, so there is insufficient evidence to reject the null hypothesis.
  • Cohen's d = 0.250 indicates a small practical effect size (negligible below 0.2, small 0.2 to 0.5, medium 0.5 to 0.8, large above 0.8).
  • With 29 degrees of freedom, compare |t| = 1.3693 against the critical value |t*| = 2.0452 at your chosen alpha.
  • The 95% confidence interval for the sample mean minus the hypothesized mean is [-0.0494, 0.2494].

Next step: Consider whether the study was adequately powered, or whether a larger sample size is needed to detect the effect.

Formula

\[
\begin{aligned}
\text{One-sample:}\quad & t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}, \quad df = n - 1 \\[6pt]
\text{Two-sample (Welch):}\quad & t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, \quad df = df_{\text{Welch}} \\[6pt]
\text{Paired:}\quad & t = \dfrac{\bar{d} - \mu_0}{s_d/\sqrt{n}}, \quad df = n - 1
\end{aligned}
\]

Worked example

One-sample example: x̄ = 5.1, μ₀ = 5, s = 0.4, n = 30. SE = 0.4/sqrt(30) = 0.0730, t = 0.1/0.0730 = 1.369, df = 29, p (two-tailed) = 0.1814. Cohen's d = 0.1/0.4 = 0.25 (small effect).
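The arithmetic in this worked example can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `one_sample_summary` and its parameter names are illustrative choices, not part of the calculator.

```python
import math

def one_sample_summary(mean, mu0, sd, n):
    """Return (standard error, t statistic, degrees of freedom, Cohen's d)
    for a one-sample t-test computed from summary statistics."""
    if n < 2:
        raise ValueError("n must be at least 2")
    se = sd / math.sqrt(n)      # standard error of the mean
    t = (mean - mu0) / se       # distance from the null value in SE units
    df = n - 1                  # degrees of freedom
    d = (mean - mu0) / sd       # Cohen's d for a one-sample test
    return se, t, df, d

# Inputs taken from the worked example above.
se, t, df, d = one_sample_summary(mean=5.1, mu0=5.0, sd=0.4, n=30)
print(f"SE = {se:.4f}, t = {t:.4f}, df = {df}, d = {d:.2f}")
```

The p-value is left out here because it needs the t distribution's cumulative probability, which the standard library does not provide directly.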

Which t-test should you use?

A one-sample t-test compares a single sample mean against a fixed reference value, such as a known standard, a historical average, or a regulatory threshold. Use this when you have one group and one target number. A two-sample independent t-test compares the means of two separate, unrelated groups: for example, a treatment group versus a control. When the two group variances may differ, the Welch correction adjusts the degrees of freedom downward so the test remains valid without assuming equal spread. A paired t-test is the right choice when each observation in one condition is matched to a specific observation in the other, such as measurements taken on the same person before and after an intervention. Pairing removes between-subject variation from the error term and gives the test more power when the pairing is real.

Reading the t statistic and p-value

The t statistic measures how many standard errors your observed difference sits away from the null hypothesis value. A t of zero means your data exactly match the null; larger absolute values represent stronger departures. The p-value translates t and the degrees of freedom into a probability: it is the chance of observing a t as extreme or more extreme than yours if the null hypothesis were true. It is not the probability that the null is true. The conventional threshold of 0.05 means you accept a 5% chance of a false positive when the null hypothesis is true. For confirmatory research or medical applications, a stricter threshold such as 0.01 or 0.001 is often preferred. The critical value |t*| in the results is the smallest absolute t that reaches significance at the chosen alpha; if |t| exceeds it, the result is significant.
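To make the link between t, df, the p-value, and the critical value concrete, here is a rough standard-library sketch. It integrates the t density numerically with Simpson's rule and finds the critical value by bisection. This is an illustrative approach, not necessarily how the calculator computes its results; the function names are hypothetical, and the fixed step count and search bound assume moderate df and t values (very small df combined with very small alpha would need a wider search range).

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3   # probability between -|t| and +|t|
    return max(0.0, 1.0 - central_area)

def critical_value(alpha, df, hi=100.0):
    """Smallest |t| whose two-tailed p-value is at most alpha (bisection)."""
    lo = 0.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid   # p still too large, so the critical value is higher
        else:
            hi = mid
    return (lo + hi) / 2

print(round(two_tailed_p(1.3693, 29), 4))   # p-value for the worked example
print(round(critical_value(0.05, 30), 3))   # compare with the df = 30 table row
```

The result counts as significant exactly when `two_tailed_p(t, df)` falls below alpha, which is the same condition as |t| exceeding `critical_value(alpha, df)`.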

Effect size and confidence intervals

A statistically significant result can accompany a trivially small real-world effect, especially with large samples. Cohen's d captures practical significance independently of sample size: it divides the observed mean difference by a standard deviation (the sample standard deviation for a one-sample test, the standard deviation of the differences for a paired test, and the pooled standard deviation for two independent groups). By convention, d below 0.2 is negligible, 0.2 to 0.5 is small, 0.5 to 0.8 is medium, and above 0.8 is large. The confidence interval anchors the result in the original measurement units. A 95% CI for the difference that excludes zero is equivalent to a two-tailed significant result at alpha = 0.05, but it also shows the plausible range of the true effect. Reporting t, p, df, Cohen's d, and the confidence interval together gives a complete and reproducible statistical summary.
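The two quantities discussed here can be sketched in Python as follows. The function names are hypothetical, the two-group numbers are invented for illustration, and the critical value 2.042 is the two-tailed alpha = 0.05 entry for df = 30 from the table of common critical values on this page (so the confidence-interval example assumes n = 31).

```python
import math

def cohens_d_two_sample(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled SD."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

def mean_diff_ci(diff, se, t_crit):
    """Confidence interval: difference plus or minus t_crit standard errors."""
    margin = t_crit * se
    return diff - margin, diff + margin

# Hypothetical two-group data: means one unit apart, both SDs equal to 2.
d = cohens_d_two_sample(10.0, 2.0, 20, 9.0, 2.0, 20)

# Hypothetical one-sample data: difference 0.1, s = 0.4, n = 31 (df = 30).
lower, upper = mean_diff_ci(diff=0.1, se=0.4 / math.sqrt(31), t_crit=2.042)
print(f"d = {d:.2f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

In this example the interval contains zero, which matches a two-tailed result that is not significant at alpha = 0.05.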

Assumptions and when to check them

All three t-tests assume observations are independent of each other (except within the pairs of a paired test, which are intentionally dependent). The one-sample and two-sample tests assume the underlying population is roughly normal, and the paired test assumes the same of the within-pair differences, or that the sample is large enough for the central limit theorem to apply. Outliers and extreme skew can distort results, especially with small n, so it is good practice to inspect a histogram or Q-Q plot before running the test. The two-sample equal-variance variant additionally assumes homoscedasticity: that both groups have the same population variance. Because violating this assumption can inflate the false-positive rate, the Welch correction is the safer default. When normality is badly violated, consider the Wilcoxon signed-rank test (one-sample or paired) or the Mann-Whitney U test (two-sample) as nonparametric alternatives.

Common critical t values (two-tailed)

df        alpha = 0.10   alpha = 0.05   alpha = 0.01   alpha = 0.001
1         6.314          12.706         63.657         636.619
2         2.920          4.303          9.925          31.599
5         2.015          2.571          4.032          6.869
10        1.812          2.228          3.169          4.587
20        1.725          2.086          2.845          3.850
30        1.697          2.042          2.750          3.646
60        1.671          2.000          2.660          3.460
inf (z)   1.645          1.960          2.576          3.291

Reject the null when |t| exceeds the critical value for your df and alpha. For a one-tailed test at level alpha, use the column for two-tailed 2 x alpha: for example, a one-tailed test at 0.05 uses the alpha = 0.10 column.
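As a small illustration of the lookup rule, the table can be stored as a dictionary and queried for either tail setting. This sketch covers only the df and alpha values tabulated above; `critical_from_table` is a hypothetical helper name, and a one-tailed request is mapped to the two-tailed column for twice its alpha.

```python
import math

# Two-tailed alpha levels, in the same order as the table columns.
ALPHAS = (0.10, 0.05, 0.01, 0.001)

# Critical values copied from the table above, keyed by degrees of freedom.
T_TABLE = {
    1: (6.314, 12.706, 63.657, 636.619),
    2: (2.920, 4.303, 9.925, 31.599),
    5: (2.015, 2.571, 4.032, 6.869),
    10: (1.812, 2.228, 3.169, 4.587),
    20: (1.725, 2.086, 2.845, 3.850),
    30: (1.697, 2.042, 2.750, 3.646),
    60: (1.671, 2.000, 2.660, 3.460),
    math.inf: (1.645, 1.960, 2.576, 3.291),
}

def critical_from_table(df, alpha, tails=2):
    """Look up |t*| for a tabulated df; raises KeyError for other df values."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A one-tailed test at alpha uses the two-tailed column for 2 * alpha.
    two_tailed_alpha = alpha * 2 if tails == 1 else alpha
    for i, column_alpha in enumerate(ALPHAS):
        if math.isclose(column_alpha, two_tailed_alpha):
            return T_TABLE[df][i]
    raise ValueError("alpha is not tabulated for this tail setting")

print(critical_from_table(10, 0.05))            # two-tailed, df = 10
print(critical_from_table(10, 0.05, tails=1))   # one-tailed, df = 10
```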

Frequently asked questions

What is the difference between a one-tailed and two-tailed t-test?

A two-tailed test asks whether the mean is different from the reference value in either direction and splits the significance level equally between both tails of the distribution. A one-tailed test focuses the entire alpha on one direction: left-tailed tests whether the mean is significantly less than the reference, right-tailed tests whether it is significantly greater. One-tailed tests are statistically more powerful in the hypothesized direction but inappropriate unless you have a strong prior reason to rule out differences in the opposite direction. If in doubt, use two-tailed.
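Because the t distribution is symmetric, a one-tailed p-value follows directly from the two-tailed one and the sign of t. The helper below is a hypothetical sketch of that relationship, using the t and p from the one-sample example on this page.

```python
def one_tailed_p(p_two_tailed, t, direction):
    """Convert a two-tailed p-value into a one-tailed one.

    direction is "right" (H1: mean > reference) or "left" (H1: mean < reference).
    """
    half = p_two_tailed / 2
    if direction == "right":
        # Half the two-tailed p when t points the hypothesized way.
        return half if t > 0 else 1 - half
    if direction == "left":
        return half if t < 0 else 1 - half
    raise ValueError("direction must be 'right' or 'left'")

p_right = one_tailed_p(0.1814, 1.3693, "right")
p_left = one_tailed_p(0.1814, 1.3693, "left")
print(f"right-tailed p = {p_right:.4f}, left-tailed p = {p_left:.4f}")
```

A positive t halves the p-value for a right-tailed test but gives a p-value close to 1 for a left-tailed test, which is why the direction must be chosen before looking at the data.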

Why does this calculator use the Welch correction by default for two-sample tests?

The classic Student's two-sample t-test assumes both groups have the same population variance. When that assumption is violated, the test can have an inflated false-positive rate, particularly when the group sizes also differ. The Welch correction adjusts the degrees of freedom using the Welch-Satterthwaite equation, which accounts for differing variances without requiring you to assume they are equal. Simulation studies show that the Welch test controls the false-positive rate about as well as the pooled test when variances are equal, at a small cost in power, and much better when they are not, which is why it is widely recommended as the default.
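For reference, the Welch-Satterthwaite degrees of freedom can be computed from summary statistics as in the sketch below. The function name `welch_t` and the example numbers are hypothetical; the t formula is the two-sample one given in the Formula section.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for Welch's two-sample t-test from summary statistics."""
    v1 = sd1**2 / n1   # squared standard error of group 1's mean
    v2 = sd2**2 / n2   # squared standard error of group 2's mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Equal spreads and equal sizes: df matches the pooled value n1 + n2 - 2.
t_equal, df_equal = welch_t(10.0, 2.0, 10, 9.0, 2.0, 10)

# Very unequal spreads: df drops toward the smaller of n1 - 1 and n2 - 1.
t_unequal, df_unequal = welch_t(10.0, 1.0, 10, 9.0, 4.0, 10)
print(f"equal variances: df = {df_equal:.2f}; unequal: df = {df_unequal:.2f}")
```

The Welch df is generally not an integer, and it always lies between the smaller group's n - 1 and the pooled n1 + n2 - 2.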

How do I interpret Cohen's d?

Cohen's d is the mean difference expressed in standard deviation units. A value of 0.5 means the two means are half a standard deviation apart. Jacob Cohen's original benchmarks (small 0.2, medium 0.5, large 0.8) are rough guides: real-world interpretation should account for the domain, measurement precision, and practical consequences of the effect. Effect sizes matter most when sample sizes are large enough to detect trivially small differences at p < 0.05, or when comparing studies with different sample sizes.

What is a paired t-test and when should I use it?

A paired t-test is a one-sample test applied to the within-subject differences. You compute the difference for each matched pair, then test whether the mean of those differences differs significantly from zero (or another null value). Use it when observations come in natural pairs: the same person measured twice, twin studies, left-eye versus right-eye, or matched case-control designs. Because pairing removes between-subject variability from the error, a paired test is more sensitive than an independent two-sample test when the pairing is genuine.
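The reduction to a one-sample test on the differences looks like this in code. The data are made up for illustration and `paired_t` is a hypothetical helper; it assumes the differences are not all identical (otherwise their standard deviation is zero and t is undefined).

```python
import math
import statistics

def paired_t(before, after, mu0=0.0):
    """Paired t-test as a one-sample test on the within-pair differences."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    d_bar = statistics.mean(diffs)    # mean difference
    s_d = statistics.stdev(diffs)     # SD of differences (n - 1 denominator)
    t = (d_bar - mu0) / (s_d / math.sqrt(n))
    return t, n - 1

# Hypothetical before/after measurements on the same five subjects.
before = [10, 12, 9, 11, 13]
after = [12, 13, 11, 12, 15]
t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```

Only the differences enter the calculation, so how much the subjects vary from one another has no effect on the standard error.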

Why does the calculator need at least two observations?

Degrees of freedom equal n minus 1, so with only one observation there would be zero degrees of freedom and no way to estimate variability. A meaningful sample standard deviation also requires at least two values. With n below 2 the t-test is undefined, so the calculator returns no result until you enter a valid sample size.

What is the p-value and what does it not tell me?

The p-value is the probability of obtaining a test statistic at least as extreme as the one computed, assuming the null hypothesis is exactly true. It is not the probability that the null hypothesis is true, nor the probability of making an error if you reject it. A p-value below your alpha threshold is evidence against the null, but it says nothing about the size or practical importance of the effect. Always pair the p-value with an effect size and confidence interval for a complete report.

Written by Dr. Hannah Brandt, PhD Statistician · Munich, Germany

Applied statistician translating rigorous probability theory into clear, accurate tools for researchers and practitioners.
