
P-Value Calculator

Convert any test statistic into a p-value. Choose the distribution (Z, t, chi-square, or F), the tail direction, and your significance level. The calculator shows the exact p-value, flags whether the result is significant, and walks through every calculation step.

Your details

  • Distribution: Choose Z for large samples and known variance, t for small samples with unknown variance, chi-squared for goodness-of-fit and contingency tables, and F for ANOVA or regression model comparisons.
  • Tail direction: Two-tailed tests are the safe default. Use a one-tailed test only when you committed to a direction before collecting data.
  • Test statistic: The value your test produced: a z-score, t-score, chi-square value, or F-ratio. For chi-square and F, enter a positive number.
  • Significance level (alpha): The threshold at which you will reject the null hypothesis. The 0.05 level is the conventional default in most sciences.
P-value: 0.05 (significant, p < 0.05)
P-value (%): 5%
One-tailed p-value: 0.025

Significance scale: p < 0.001 (below 0.001), p < 0.01 (0.001-0.01), p < 0.05 (0.01-0.05), p < 0.10 (0.05-0.1), not significant (0.1 and above)

A z-score of 1.96 gives a two-tailed p-value of 0.0500, which is significant at alpha = 0.05.

  • The p-value is the probability of observing a test statistic at least as extreme as yours, assuming the null hypothesis is true.
  • Because p (0.049996 before rounding, displayed as 0.0500) is below alpha (0.05), you would reject the null hypothesis at this significance level.
  • A small p-value is evidence against the null, not proof of a large or important effect. Report effect size and confidence intervals alongside the p-value.
  • Two-tailed tests count extremes in both directions and are the conservative default. Switch to one-tailed only when the direction was pre-specified.

Next step: Pair this p-value with an effect size (Cohen's d, eta-squared, odds ratio, etc.) so readers can judge practical importance.

Formula

\[
p_{\text{two}} = 2\,[1 - \Phi(|z|)], \quad p_{t} = 2\,P(T_{df} > |t|), \quad p_{\chi^2} = P(\chi^2_{df} > X^2), \quad p_{F} = P(F_{df_1,df_2} > F)
\]

Worked example

For a two-tailed z-test with z = 1.96: the upper tail is 1 - Phi(1.96) = 0.0250, so p = 2 x 0.0250 = 0.0500, right at the 0.05 threshold. For a right-tailed t-test with t = 2.228, df = 10: the t-CDF gives 0.9750, so p = 1 - 0.9750 = 0.0250, significant at the 0.05 level.
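The two results in this worked example can be checked with a short Python sketch that uses only the standard library. The function names (`z_two_tailed_p`, `t_pdf`, `t_upper_tail_p`) are illustrative and not part of the calculator. The t tail is obtained here by integrating the Student t density with Simpson's rule, which is an assumption made for the sketch and not a description of how the calculator computes it.

```python
import math
from statistics import NormalDist


def z_two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a z statistic: 2 * [1 - Phi(|z|)]."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def t_pdf(x: float, df: float) -> float:
    """Density of the Student t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail_p(t: float, df: float, steps: int = 2000) -> float:
    """P(T_df > t) via Simpson's rule on [0, |t|] plus symmetry (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3      # integral of the density from 0 to |t|
    upper = 0.5 - area        # probability beyond |t| in one tail
    return upper if t >= 0 else 1.0 - upper


print(f"{z_two_tailed_p(1.96):.4f}")        # two-tailed z-test -> 0.0500
print(f"{t_upper_tail_p(2.228, 10):.4f}")   # right-tailed t-test, df = 10 -> 0.0250
```

In practice a statistics library would normally supply these tail probabilities; the hand-rolled integration is only there to keep the sketch dependency-free.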

What a p-value actually measures

A p-value answers one narrow question: if the null hypothesis were true, how often would random sampling alone produce a test statistic at least as extreme as the one you observed? A small p-value means your data would be unusual under the null, which is taken as evidence against it. It is not the probability that the null hypothesis is true, and it is not the probability your result happened "by chance." Those are common misreadings that lead to overconfident conclusions. This calculator converts your test statistic into that tail probability using the appropriate statistical distribution, whether that is the standard normal, Student t, chi-squared, or F.
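The "how often under the null" reading can be made concrete with a small simulation. The sketch below draws z statistics from the standard normal (the null distribution for a z-test) and counts how many are at least as extreme as an observed value. The function name `simulated_two_tailed_p`, the number of draws, and the seed are all arbitrary choices for illustration.

```python
import random
from statistics import NormalDist


def simulated_two_tailed_p(z_obs: float, draws: int = 200_000, seed: int = 0) -> float:
    """Share of null-distributed z statistics at least as extreme as z_obs."""
    rng = random.Random(seed)
    threshold = abs(z_obs)
    extreme = sum(1 for _ in range(draws) if abs(rng.gauss(0.0, 1.0)) >= threshold)
    return extreme / draws


simulated = simulated_two_tailed_p(1.96)
exact = 2.0 * (1.0 - NormalDist().cdf(1.96))
print(f"simulated: {simulated:.4f}, exact: {exact:.4f}")
```

The simulated share should land close to the exact tail probability, with the gap shrinking as the number of draws grows.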

Choosing the right distribution

Use the Z distribution when your sample is large (roughly n > 30) and the population variance is known, or when testing proportions via the normal approximation. Use the t distribution for means from small samples with unknown variance; you must supply the degrees of freedom, which equals n - 1 for one-sample tests and n1 + n2 - 2 for independent two-sample tests. Use chi-squared for goodness-of-fit tests and contingency table independence tests; degrees of freedom equal the number of categories minus 1, or (rows - 1) x (columns - 1) for a table. Use F for ANOVA and regression model comparisons; it takes two degrees of freedom values, one for the numerator (groups minus 1) and one for the denominator (total observations minus groups).
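The degrees-of-freedom rules in this section reduce to simple arithmetic. A minimal sketch follows; the helper names are hypothetical, and the ANOVA helper assumes a one-way design as described above.

```python
def df_one_sample_t(n: int) -> int:
    """One-sample t-test: n - 1."""
    return n - 1


def df_two_sample_t(n1: int, n2: int) -> int:
    """Independent two-sample t-test: n1 + n2 - 2."""
    return n1 + n2 - 2


def df_goodness_of_fit(categories: int) -> int:
    """Chi-squared goodness-of-fit: number of categories minus 1."""
    return categories - 1


def df_contingency(rows: int, columns: int) -> int:
    """Chi-squared test of independence: (rows - 1) x (columns - 1)."""
    return (rows - 1) * (columns - 1)


def df_one_way_anova(groups: int, total_observations: int) -> tuple:
    """One-way ANOVA F-test: (numerator df, denominator df)."""
    return groups - 1, total_observations - groups


print(df_one_sample_t(11))         # 10
print(df_two_sample_t(16, 16))     # 30
print(df_contingency(3, 4))        # 6
print(df_one_way_anova(4, 24))     # (3, 20)
```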

Left-, right-, and two-tailed tests

A two-tailed test asks whether your statistic deviates from the null in either direction, doubling the single-tail probability. A right-tailed test asks only whether the statistic is larger than expected, which is the natural choice for ANOVA F-tests and chi-squared goodness-of-fit tests where only large values are surprising. A left-tailed test asks whether the statistic is smaller than expected. Choosing a one-tailed test after seeing the direction of your result effectively halves the p-value without statistical justification, which inflates false-positive rates. The two-tailed test is the safe default for any hypothesis where you have not pre-committed to a direction.
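For a z statistic, the three tail choices can be sketched as follows. The function name `z_p_value` and its `tail` argument are illustrative, not the calculator's interface.

```python
from statistics import NormalDist


def z_p_value(z: float, tail: str = "two") -> float:
    """p-value for a z statistic under a left-, right-, or two-tailed test."""
    cdf = NormalDist().cdf
    if tail == "left":
        return cdf(z)                          # P(Z <= z)
    if tail == "right":
        return 1.0 - cdf(z)                    # P(Z >= z)
    if tail == "two":
        return 2.0 * (1.0 - cdf(abs(z)))       # both tails beyond |z|
    raise ValueError("tail must be 'left', 'right', or 'two'")


for tail in ("left", "right", "two"):
    print(tail, round(z_p_value(1.96, tail), 4))
```

For a positive statistic, the two-tailed value is exactly twice the right-tailed one, which is why switching to a one-tailed test after seeing the data halves the reported p-value.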

Reading the result responsibly

The conventional 0.05 cutoff is a convenience, not a law of nature. A p-value of 0.049 and one of 0.051 describe almost identical evidence. Treat statistical significance as one input into a decision, not a verdict. A highly significant result can still be trivially small in practical terms, and a non-significant result may simply reflect an underpowered study. Always pair the p-value with an effect size (Cohen's d, eta-squared, odds ratio) and a confidence interval. If you run many tests, some will cross the 0.05 threshold purely by chance, so apply a multiple-comparison correction such as Bonferroni or Benjamini-Hochberg when appropriate.
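The two corrections named above can be sketched in a few lines of plain Python. Both functions return adjusted p-values that are then compared with alpha as usual. The function names and the input p-values are made up for illustration.

```python
def bonferroni(p_values: list) -> list:
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values: list) -> list:
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping the values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.001, 0.012, 0.020, 0.045, 0.200]   # made-up p-values from five tests
alpha = 0.05
print([round(p, 4) for p in bonferroni(raw)])
print([round(p, 4) for p in benjamini_hochberg(raw)])
print(sum(p < alpha for p in bonferroni(raw)), "significant after Bonferroni")
print(sum(p < alpha for p in benjamini_hochberg(raw)), "significant after Benjamini-Hochberg")
```

Bonferroni controls the chance of any false positive and is the stricter of the two; Benjamini-Hochberg controls the expected share of false positives among the results declared significant.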

Common critical values by distribution

| Distribution | alpha = 0.10 | alpha = 0.05 | alpha = 0.01 |
|---|---|---|---|
| Z | 1.645 | 1.960 | 2.576 |
| t (df = 10) | 1.812 | 2.228 | 3.169 |
| t (df = 30) | 1.697 | 2.042 | 2.750 |
| chi-sq (df = 3, right) | 6.251 | 7.815 | 11.345 |
| F (df1=3, df2=20, right) | 2.380 | 3.098 | 4.938 |

Critical values at the most-used significance levels: two-tailed for Z and t, right-tailed for chi-squared and F.
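The Z row of the table can be reproduced from the inverse of the standard normal CDF. A minimal sketch with Python's standard library follows; the helper name `z_critical` is illustrative.

```python
from statistics import NormalDist


def z_critical(alpha: float, two_tailed: bool = True) -> float:
    """Critical z value that leaves alpha in the tail(s) of the standard normal."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1.0 - tail_area)


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: z = {z_critical(alpha):.3f}")
# alpha = 0.10: z = 1.645
# alpha = 0.05: z = 1.960
# alpha = 0.01: z = 2.576
```

The t, chi-squared, and F rows need the inverse CDFs of those distributions, which the standard library does not provide.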

Frequently asked questions

What is the difference between a Z-test and a t-test?

A Z-test uses the standard normal distribution and is appropriate when the sample is large enough (roughly n > 30) that the Central Limit Theorem ensures the sampling distribution of the mean is approximately normal, or when the population standard deviation is known. A t-test uses the heavier-tailed Student t distribution, which is appropriate for small samples with unknown variance. As the degrees of freedom increase, the t distribution approaches the standard normal, so for large df the two give almost identical p-values.
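The convergence of t toward the standard normal can be seen by evaluating the same statistic at increasing degrees of freedom. The sketch below is standard-library only. It integrates the t density with Simpson's rule, which is a choice made for this illustration, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: float) -> float:
    """Density of the Student t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_tailed_p(t: float, df: float, steps: int = 2000) -> float:
    """2 * P(T_df > |t|), integrating the density on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2.0 * (0.5 - total * h / 3)


def z_two_tailed_p(z: float) -> float:
    """2 * [1 - Phi(|z|)]."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


statistic = 2.0
for df in (5, 30, 1000):
    print(f"t, df = {df:>4}: p = {t_two_tailed_p(statistic, df):.4f}")
print(f"z          : p = {z_two_tailed_p(statistic):.4f}")
```

The t-based p-values shrink toward the z-based value as df grows, reflecting the lighter tails.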

When should I use a chi-squared test instead of a Z- or t-test?

Use a chi-squared test when your outcome is categorical rather than continuous. A goodness-of-fit chi-squared test asks whether observed category frequencies match expected ones. A chi-squared test of independence asks whether two categorical variables in a contingency table are associated. Neither question involves a sample mean or a continuous measurement, so Z and t are the wrong tools for those situations.
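A goodness-of-fit test can be sketched end to end with the standard library. The right-tail probability below uses the closed-form recurrence for integer degrees of freedom, so it avoids external dependencies. The function names and the die-roll counts are made up for illustration.

```python
import math


def chi2_upper_tail_p(x: float, df: int) -> float:
    """P(chi-squared_df > x) for a positive integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Start from df = 1 (odd) or df = 2 (even), then step up two df at a time.
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(h))
    else:
        a, q = 1.0, math.exp(-h)
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q


def goodness_of_fit(observed: list, expected: list) -> tuple:
    """Chi-squared statistic and right-tailed p-value, df = categories - 1."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2_upper_tail_p(stat, len(observed) - 1)


observed = [8, 12, 9, 11, 14, 6]    # made-up counts from 60 rolls of a die
expected = [10.0] * 6               # a fair die gives 10 per face on average
stat, p = goodness_of_fit(observed, expected)
print(f"X^2 = {stat:.2f}, df = {len(observed) - 1}, p = {p:.3f}")
```

The p-value here is well above 0.05, so these hypothetical counts give no evidence that the die is unfair.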

What are degrees of freedom and why do they matter?

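Degrees of freedom (df) represent the number of independent pieces of information in your data that are free to vary once certain constraints are applied. They shape the exact tail probabilities of the t, chi-squared, and F distributions. With fewer degrees of freedom, the t distribution has heavier tails, so a given t statistic yields a larger p-value; the same is true in the upper tail for the denominator df of an F-test. For chi-squared the effect runs the other way: the distribution shifts toward larger values as df increases, so a given chi-squared statistic yields a larger p-value at higher df. Enter the correct df for your study design to get accurate results.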

Should I use the one-tailed or two-tailed p-value?

Use the two-tailed p-value unless you committed, before collecting data, to testing a specific direction (for example, "the new drug reduces cholesterol, not just changes it"). Two-tailed tests account for extremes in both directions and are the conservative default. Choosing a one-tailed test after seeing which way the result points effectively doubles the false-positive rate, which is why reviewers scrutinise one-tailed claims.

Does a small p-value mean my result is important?

No. A p-value only measures how compatible your data are with the null hypothesis, not how large or meaningful the effect is. With a very large sample, a tiny and practically irrelevant difference can produce an extremely small p-value. Always report an effect size and a confidence interval alongside the p-value so readers can judge real-world importance, not just statistical detectability.
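The sample-size effect is easy to demonstrate for a one-sample z-test, where z equals the standardized effect d = (mean - mu0) / sigma multiplied by the square root of n. In the sketch below, the effect size of 0.02 and the sample sizes are hypothetical, and `one_sample_z_p` is an illustrative name.

```python
import math


def one_sample_z_p(effect_size_d: float, n: int) -> float:
    """Two-tailed p-value for a one-sample z-test with standardized effect d."""
    z = effect_size_d * math.sqrt(n)
    # erfc(|z| / sqrt(2)) equals 2 * [1 - Phi(|z|)] but stays accurate for very large |z|.
    return math.erfc(abs(z) / math.sqrt(2.0))


d = 0.02   # hypothetical, practically negligible standardized difference
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: p = {one_sample_z_p(d, n):.3g}")
```

The effect size is identical in all three rows; only the sample size changes, yet the p-value moves from clearly non-significant to vanishingly small.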


Written by Dr. Hannah Brandt, PhD Statistician · Munich, Germany

Applied statistician translating rigorous probability theory into clear, accurate tools for researchers and practitioners.
