P-Value Calculator
Convert any test statistic into a p-value. Choose the distribution (Z, t, chi-square, or F), the tail direction, and your significance level. The calculator shows the exact p-value, flags whether the result is significant, and walks through every calculation step.
Formula
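In general form, the three tail options can be written as below. This is a sketch of the standard definitions, not a transcript of the calculator's internals: the symbols F (the cumulative distribution function of the chosen distribution under the null) and x (the observed test statistic) are notation introduced here.

```latex
\begin{aligned}
p_{\text{left}}  &= F(x) \\
p_{\text{right}} &= 1 - F(x) \\
p_{\text{two}}   &= 2\,\min\{F(x),\; 1 - F(x)\}
\end{aligned}
```

For the symmetric Z and t distributions the two-tailed form reduces to 2 x (1 - F(|x|)), which is the calculation used in the worked example.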
Worked example
For a two-tailed z-test with z = 1.96: the upper tail is 1 - Phi(1.96) = 0.0250, so p = 2 x 0.0250 = 0.0500, right at the 0.05 threshold. For a right-tailed t-test with t = 2.228, df = 10: the t-CDF gives 0.9750, so p = 1 - 0.9750 = 0.0250, significant at the 0.05 level.
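The two calculations above can be checked with a minimal Python sketch that uses only the standard library. The helper `t_cdf` is a hypothetical name introduced here; it integrates the t density numerically with Simpson's rule as a stand-in for a statistics library routine, and is not necessarily how the calculator computes it.

```python
import math
from statistics import NormalDist


def t_cdf(t, df, steps=2000):
    """Student t CDF by Simpson's rule on the density (illustrative helper)."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    norm = math.exp(log_norm) / math.sqrt(df * math.pi)

    def pdf(x):
        return norm * (1 + x * x / df) ** (-(df + 1) / 2)

    # Integrate from 0 to |t|; the density is symmetric about 0.
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


# Two-tailed z-test with z = 1.96
upper_tail = 1 - NormalDist().cdf(1.96)
p_z = 2 * upper_tail

# Right-tailed t-test with t = 2.228, df = 10
p_t = 1 - t_cdf(2.228, 10)

print(f"z-test: upper tail = {upper_tail:.4f}, p = {p_z:.4f}")
print(f"t-test: p = {p_t:.4f}")
```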
What a p-value actually measures
A p-value answers one narrow question: if the null hypothesis were true, how often would random sampling alone produce a test statistic at least as extreme as the one you observed? A small p-value means your data would be unusual under the null, which is taken as evidence against it. It is not the probability that the null hypothesis is true, and it is not the probability your result happened "by chance." Those are common misreadings that lead to overconfident conclusions. This calculator converts your test statistic into that tail probability using the appropriate statistical distribution, whether that is the standard normal, Student t, chi-squared, or F.
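The "how often would random sampling alone produce this" reading can be made concrete with a small simulation. The sketch below assumes a hypothetical setup that is not from the text above: samples of n = 25 drawn from a standard normal population in which the null (mean 0, known standard deviation 1) is true. It counts how often the z statistic is at least as extreme as 1.96 in either direction and compares that fraction with the analytic two-tailed p-value.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(0)      # fixed seed so the run is repeatable

n = 25              # hypothetical sample size
reps = 20_000       # number of simulated studies in which the null is true
z_observed = 1.96

extreme = 0
for _ in range(reps):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    z = fmean(sample) * math.sqrt(n)   # (mean - 0) / (1 / sqrt(n))
    if abs(z) >= z_observed:
        extreme += 1

simulated_p = extreme / reps
analytic_p = 2 * (1 - NormalDist().cdf(z_observed))
print(f"simulated: {simulated_p:.4f}  analytic: {analytic_p:.4f}")
```

The two numbers should agree up to simulation noise. The simulated fraction is a statement about data generated under the null, not about the probability that the null is true.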
Choosing the right distribution
Use the Z distribution when the population variance is known, when the sample is large enough (roughly n > 30) that the normal approximation is adequate, or when testing proportions via the normal approximation. Use the t distribution for means from small samples with unknown variance; you must supply the degrees of freedom, which equal n - 1 for one-sample tests and n1 + n2 - 2 for independent two-sample tests that pool the variances. Use chi-squared for goodness-of-fit tests and contingency-table independence tests; degrees of freedom equal the number of categories minus 1 for goodness of fit, or (rows - 1) x (columns - 1) for a table. Use F for ANOVA and regression model comparisons; it takes two degrees-of-freedom values, one for the numerator (groups minus 1) and one for the denominator (total observations minus groups).
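The degrees-of-freedom rules above are simple enough to write down as functions. The function names below are hypothetical helpers introduced for illustration; they are not part of the calculator.

```python
def df_one_sample_t(n):
    """One-sample t-test: n - 1."""
    return n - 1


def df_two_sample_t(n1, n2):
    """Independent two-sample t-test, pooled-variance form: n1 + n2 - 2."""
    return n1 + n2 - 2


def df_goodness_of_fit(categories):
    """Chi-squared goodness of fit: categories - 1."""
    return categories - 1


def df_contingency(rows, cols):
    """Chi-squared test of independence: (rows - 1) * (cols - 1)."""
    return (rows - 1) * (cols - 1)


def df_anova(groups, total_n):
    """One-way ANOVA F-test: (numerator df, denominator df)."""
    return groups - 1, total_n - groups


print(df_one_sample_t(11))      # 10
print(df_two_sample_t(12, 15))  # 25
print(df_contingency(3, 4))     # 6
print(df_anova(4, 24))          # (3, 20)
```

For example, a one-way ANOVA with 4 groups and 24 observations in total gives df1 = 3 and df2 = 20.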
Left-, right-, and two-tailed tests
A two-tailed test asks whether your statistic deviates from the null in either direction; for symmetric distributions such as Z and t, that means doubling the single-tail probability. A right-tailed test asks only whether the statistic is larger than expected, which is the natural choice for ANOVA F-tests and chi-squared goodness-of-fit tests where only large values are surprising. A left-tailed test asks whether the statistic is smaller than expected. Choosing a one-tailed test after seeing the direction of your result effectively halves the p-value without statistical justification, which inflates false-positive rates. The two-tailed test is the safe default for any hypothesis where you have not pre-committed to a direction.
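The inflation caused by picking the tail after the fact can be seen in a short simulation. This sketch assumes the null is true and draws z statistics directly from a standard normal. It then compares an honest two-tailed test with a "one-tailed" test whose direction is chosen to match the sign of each result. The variable names and the 100,000 repetitions are illustrative choices.

```python
import random
from statistics import NormalDist

random.seed(1)      # fixed seed so the run is repeatable

std_normal = NormalDist()
alpha = 0.05
reps = 100_000

two_tailed_hits = 0
post_hoc_hits = 0
for _ in range(reps):
    z = random.gauss(0.0, 1.0)               # the null is true by construction
    one_tail = 1 - std_normal.cdf(abs(z))    # tail chosen AFTER seeing the sign
    if 2 * one_tail < alpha:
        two_tailed_hits += 1
    if one_tail < alpha:
        post_hoc_hits += 1

two_tailed_rate = two_tailed_hits / reps
post_hoc_rate = post_hoc_hits / reps
print(f"two-tailed false-positive rate:         {two_tailed_rate:.3f}")
print(f"post-hoc one-tailed false-positive rate: {post_hoc_rate:.3f}")
```

The two-tailed rate should sit near the nominal 0.05, while the post-hoc one-tailed rate should land near 0.10, twice the level being claimed.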
Reading the result responsibly
The conventional 0.05 cutoff is a convenience, not a law of nature. A p-value of 0.049 and one of 0.051 describe almost identical evidence. Treat statistical significance as one input into a decision, not a verdict. A highly significant result can still be trivially small in practical terms, and a non-significant result may simply reflect an underpowered study. Always pair the p-value with an effect size (Cohen's d, eta-squared, odds ratio) and a confidence interval. If you run many tests, some will cross the 0.05 threshold purely by chance, so apply a multiple-comparison correction such as Bonferroni or Benjamini-Hochberg when appropriate.
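The two corrections named above can be sketched in a few lines. The function names and the list of p-values are made up for illustration. Bonferroni compares each p-value with alpha / m, while Benjamini-Hochberg sorts the p-values and rejects everything up to the largest rank k whose p-value is at most (k / m) x alpha.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject where p <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank      # keep the largest rank that passes
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


# Illustrative p-values from eight hypothetical tests
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]

print("uncorrected:", sum(p <= 0.05 for p in p_values))      # 5
print("Bonferroni: ", sum(bonferroni(p_values)))             # 1
print("BH:         ", sum(benjamini_hochberg(p_values)))     # 2
```

With these numbers, five tests cross 0.05 uncorrected, one survives Bonferroni, and two survive Benjamini-Hochberg. That reflects the usual trade-off: Bonferroni is stricter, Benjamini-Hochberg keeps more power.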
Common critical values by distribution
| Distribution | alpha = 0.10 | alpha = 0.05 | alpha = 0.01 |
|---|---|---|---|
| Z | 1.645 | 1.960 | 2.576 |
| t (df = 10) | 1.812 | 2.228 | 3.169 |
| t (df = 30) | 1.697 | 2.042 | 2.750 |
| chi-sq (df = 3, right) | 6.251 | 7.815 | 11.345 |
| F (df1=3, df2=20, right) | 2.380 | 3.098 | 4.938 |
Critical values at the most-used significance levels: two-tailed for Z and t, right-tailed for chi-squared and F.
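The Z row can be reproduced from the inverse normal CDF in Python's standard library. This is a minimal check for the Z row only, assuming the two-tailed convention in which alpha is split equally between the tails. The t, chi-squared, and F rows need inverse CDFs that the standard library does not provide.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Two-tailed cutoff: put alpha / 2 in each tail.
z_crit = {alpha: std_normal.inv_cdf(1 - alpha / 2) for alpha in (0.10, 0.05, 0.01)}

for alpha, z in z_crit.items():
    print(f"alpha = {alpha:.2f}: z = {z:.3f}")
```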
Frequently asked questions
What is the difference between a Z-test and a t-test?
A Z-test uses the standard normal distribution and is appropriate when the sample is large enough (roughly n > 30) that the Central Limit Theorem ensures the sampling distribution of the mean is approximately normal, or when the population standard deviation is known. A t-test uses the heavier-tailed Student t distribution, which is appropriate for small samples with unknown variance. As the degrees of freedom increase, the t distribution approaches the standard normal, so for large df the two give almost identical p-values.
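The convergence of t to the standard normal can be checked numerically. In the sketch below, `t_two_tailed_p` is a hypothetical helper that integrates the t density with Simpson's rule, and the statistic 2.0 and the df values are arbitrary illustrative choices.

```python
import math
from statistics import NormalDist


def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value for Student t via Simpson's rule (illustrative)."""
    norm = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    norm /= math.sqrt(df * math.pi)

    def pdf(x):
        return norm * (1 + x * x / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_area = 2 * total * h / 3   # probability that |T| < |t|
    return 1 - central_area


statistic = 2.0
p_normal = 2 * (1 - NormalDist().cdf(statistic))
p_by_df = {df: t_two_tailed_p(statistic, df) for df in (5, 30, 1000)}

for df, p in p_by_df.items():
    print(f"t, df = {df:>4}: p = {p:.4f}")
print(f"standard normal: p = {p_normal:.4f}")
```

The t-based p-values shrink toward the normal value as df grows, and at df = 1000 the two differ only in the fourth decimal place.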
When should I use a chi-squared test instead of a Z- or t-test?
Use a chi-squared test when your outcome is categorical rather than continuous. A goodness-of-fit chi-squared test asks whether observed category frequencies match expected ones. A chi-squared test of independence asks whether two categorical variables in a contingency table are associated. Neither question involves a sample mean or a continuous measurement, so Z and t are the wrong tools for those situations.
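A goodness-of-fit calculation can be sketched end to end. The observed counts below are made up for illustration, and `chi2_sf` is a hypothetical helper that evaluates the right-tail probability from a series for the regularized incomplete gamma function. It is adequate for moderate statistics like this one, but it is not a replacement for a library routine.

```python
import math


def chi2_sf(x, df):
    """Right-tail probability P(X >= x) for chi-squared with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    # Series for the lower regularized incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


# Hypothetical data: 100 observations across 4 categories, equal counts expected.
observed = [18, 22, 30, 30]
expected = [25, 25, 25, 25]

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(statistic, df)

print(f"chi-squared = {statistic:.2f}, df = {df}, p = {p_value:.3f}")
```

Here the statistic is 4.32 with df = 3, and the p-value is well above 0.05, so these made-up counts are compatible with equal expected frequencies.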
What are degrees of freedom and why do they matter?
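Degrees of freedom (df) represent the number of independent pieces of information in your data that are free to vary once certain constraints are applied. They shape the exact tail probabilities of the t, chi-squared, and F distributions, and the direction of the effect depends on the distribution. With fewer degrees of freedom, the t distribution has heavier tails, so a given t statistic yields a larger p-value. The chi-squared distribution instead shifts toward larger values as df grows, so a given chi-squared statistic yields a larger p-value with more degrees of freedom, not fewer. Enter the correct df for your study design to get accurate results.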
Should I use the one-tailed or two-tailed p-value?
Use the two-tailed p-value unless you committed, before collecting data, to testing a specific direction (for example, "the new drug reduces cholesterol, not just changes it"). Two-tailed tests account for extremes in both directions and are the conservative default. Choosing a one-tailed test after seeing which way the result points effectively doubles the false-positive rate, which is why reviewers scrutinise one-tailed claims.
Does a small p-value mean my result is important?
No. A p-value only measures how compatible your data are with the null hypothesis, not how large or meaningful the effect is. With a very large sample, a tiny and practically irrelevant difference can produce an extremely small p-value. Always report an effect size and a confidence interval alongside the p-value so readers can judge real-world importance, not just statistical detectability.
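The effect of sample size on the p-value is easy to see with a fixed, tiny effect. The sketch below assumes a one-sample z-test with known standard deviation and a hypothetical difference of 0.01 standard deviations, and it assumes the sample mean lands exactly on that difference. All three assumptions are illustrative simplifications.

```python
import math


def two_tailed_z_p(z):
    """Two-tailed normal p-value; erfc stays accurate far into the tail,
    where 1 - cdf would round to zero."""
    return math.erfc(abs(z) / math.sqrt(2))


effect = 0.01   # hypothetical difference in standard-deviation units

p_by_n = {}
for n in (100, 10_000, 1_000_000):
    z = effect * math.sqrt(n)          # z = effect / (1 / sqrt(n))
    p_by_n[n] = two_tailed_z_p(z)
    print(f"n = {n:>9,}: z = {z:5.1f}, p = {p_by_n[n]:.3g}")
```

The effect size never changes, yet the p-value moves from clearly non-significant at n = 100 to vanishingly small at n = 1,000,000. That is why the effect size and confidence interval carry the information about importance.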