Z-Test Calculator
Choose the z-test type that fits your data: one-sample (does a sample mean match a known population mean?), two-sample (do two group means differ?), or one-proportion (does an observed proportion match a target?). Select one-tailed or two-tailed, set your significance level, and get the z-score, p-value, critical value, a plain-English verdict, and a confidence interval.
Formula
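In the usual notation, the three test statistics take the standard forms below. The symbols are assumptions chosen to match the rest of this page: x-bar is a sample mean, mu the hypothesised mean, sigma a known population standard deviation, n a sample size, p-hat the observed proportion, and p0 the target proportion. The two-sample form assumes the hypothesised difference between the means is zero.

```latex
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
\qquad \text{(one-sample)}

z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}}
\qquad \text{(two-sample)}

z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}
\qquad \text{(one-proportion)}
```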
Worked example
One-sample example: x-bar = 104, mu = 100, sigma = 15, n = 36. Standard error = 15/sqrt(36) = 2.5. z = (104 - 100)/2.5 = 1.60. Two-tailed p = 2 x (1 - Phi(1.60)) = 0.1096. Critical z at alpha=0.05 is 1.96. Since |1.60| < 1.96, fail to reject the null. 95% CI for the true mean: 104 +/- 1.96 x 2.5 = [99.10, 108.90].
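The arithmetic above can be reproduced with a short Python sketch. It relies only on the standard library's `statistics.NormalDist`; the function name `one_sample_z_test` and its return keys are illustrative choices, not the calculator's own code.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample z-test with a matching confidence interval."""
    std_normal = NormalDist()                    # mean 0, standard deviation 1
    se = sigma / sqrt(n)                         # standard error of the mean
    z = (x_bar - mu0) / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))   # area in both tails
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-tailed critical value
    ci = (x_bar - z_crit * se, x_bar + z_crit * se)
    return {"se": se, "z": z, "p_value": p_value, "z_crit": z_crit,
            "ci": ci, "reject": abs(z) > z_crit}


result = one_sample_z_test(x_bar=104, mu0=100, sigma=15, n=36)
print(f"z = {result['z']:.2f}, p = {result['p_value']:.4f}")
print(f"95% CI = [{result['ci'][0]:.2f}, {result['ci'][1]:.2f}]")
print("reject H0:", result["reject"])
```

With the example inputs this prints z = 1.60, p = 0.1096, and a 95% CI of [99.10, 108.90], matching the hand calculation.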
Which z-test should I use?
Three common situations call for a z-test. The one-sample test asks whether the mean of your sample matches a known or hypothesised population mean, for example whether a class average on a national exam equals the published national norm. The two-sample test asks whether two independent group means are equal, for example whether two manufacturing lines produce parts of the same average length. The one-proportion test asks whether an observed success rate equals a target proportion, for example whether a conversion rate equals 50%. All three tests require that the relevant standard deviation is known (or that the sample is large enough for the normal approximation to hold) and produce a z-score, a p-value, a critical value, and a confidence interval.
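As a rough illustration of the other two tests, the sketch below computes the two-sample and one-proportion z-scores. The function names and all input numbers are hypothetical, chosen only to echo the manufacturing and conversion-rate examples; the two-sample version assumes both population standard deviations are known.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """z-score for H0: the two population means are equal."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # SE of the mean difference
    return (mean1 - mean2) / se


def one_proportion_z(successes, n, p0):
    """z-score for H0: the true proportion equals p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)                 # standard error under the null
    return (p_hat - p0) / se


def two_tailed_p(z):
    """Two-tailed p-value for a z-score."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical data: part lengths (mm) from two production lines.
z_lines = two_sample_z(mean1=50.2, mean2=50.0, sigma1=0.5, sigma2=0.5, n1=50, n2=50)
# Hypothetical data: 530 conversions in 1000 visits, tested against 50%.
z_conv = one_proportion_z(successes=530, n=1000, p0=0.5)

print(f"two-sample:     z = {z_lines:.2f}, p = {two_tailed_p(z_lines):.4f}")
print(f"one-proportion: z = {z_conv:.2f}, p = {two_tailed_p(z_conv):.4f}")
```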
One-tailed vs. two-tailed tests
A two-tailed test is appropriate when you want to detect a difference in either direction: your sample mean might be above or below the hypothesised value, and either matters. A right-tailed test focuses only on the possibility that the true mean is higher; a left-tailed test focuses only on the possibility that it is lower. Choosing the direction in advance (before seeing the data) is important for honest inference: if you choose a one-tailed test only because the data came out on that side, the p-value is not valid. When in doubt, use a two-tailed test.
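To see how the tail choice changes the p-value, here is a minimal sketch that applies all three options to the z-score from the worked example. The `p_value` helper and its `tail` labels are illustrative names.

```python
from statistics import NormalDist


def p_value(z, tail="two"):
    """p-value for a z-score; tail is 'two', 'right', or 'left'."""
    cdf = NormalDist().cdf
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))   # both tails
    if tail == "right":
        return 1 - cdf(z)              # area above z
    if tail == "left":
        return cdf(z)                  # area below z
    raise ValueError("tail must be 'two', 'right', or 'left'")


z = 1.60  # z-score from the worked example
for tail in ("two", "right", "left"):
    print(f"{tail:>5}-tailed p = {p_value(z, tail):.4f}")
```

For z = 1.60 the right-tailed p-value (0.0548) is half the two-tailed one (0.1096), while the left-tailed p-value is 0.9452. That halving is why picking the direction after seeing the data overstates the evidence.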
Understanding the p-value and critical value
The p-value is the probability of observing a test statistic at least as extreme as yours, in the direction(s) specified by your hypothesis, if the null hypothesis were true. A small p-value means the data would be surprising under the null, which is evidence against it. The critical value is the z-score threshold: if your |z| exceeds the critical value (two-tailed), or z crosses it in the appropriate direction (one-tailed), you reject the null. Both approaches give identical decisions; the p-value also tells you how far inside the rejection region you are, which is useful context.
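The agreement between the two decision rules can be checked directly. The sketch below, for a two-tailed test at alpha = 0.05, applies both rules to a handful of arbitrary z-scores; the helper names are illustrative.

```python
from statistics import NormalDist

ALPHA = 0.05
std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - ALPHA / 2)   # two-tailed threshold, about 1.96


def reject_by_p_value(z):
    """Reject when the two-tailed p-value falls below alpha."""
    return 2 * (1 - std_normal.cdf(abs(z))) < ALPHA


def reject_by_critical_value(z):
    """Reject when |z| exceeds the two-tailed critical value."""
    return abs(z) > z_crit


for z in (-2.50, -1.60, 0.00, 1.60, 1.95, 1.97, 3.00):
    assert reject_by_p_value(z) == reject_by_critical_value(z)
    print(f"z = {z:+.2f}  reject H0: {reject_by_critical_value(z)}")
```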
Confidence interval alongside the hypothesis test
The calculator also reports a confidence interval for the parameter you are testing: the true mean (one-sample), the mean difference (two-sample), or the true proportion (proportion test). A 95% confidence interval says that, if you repeated the study many times, about 95% of the resulting intervals would contain the true parameter. Confidence intervals complement hypothesis tests: the test says whether the data are inconsistent with the null hypothesis; the interval says how large the effect plausibly is. For the mean tests, an interval that excludes the hypothesised value (zero for a mean difference) corresponds exactly to a significant two-tailed test at the matching alpha level. For the proportion test the correspondence can be approximate near the boundary, because the interval's standard error is typically based on the observed proportion while the test uses the hypothesised one.
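The link between the interval and the two-tailed test can be sketched for the one-sample mean, using the worked example's numbers and a few arbitrary hypothesised means. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for a mean when sigma is known."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


def two_tailed_reject(x_bar, mu0, sigma, n, alpha=0.05):
    """True when the two-tailed z-test rejects H0: mean == mu0."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


x_bar, sigma, n = 104, 15, 36
low, high = mean_ci(x_bar, sigma, n)
for mu0 in (95, 99, 100, 108, 110):
    inside = low <= mu0 <= high
    rejected = two_tailed_reject(x_bar, mu0, sigma, n)
    assert inside != rejected   # outside the interval exactly when rejected
    print(f"mu0 = {mu0:>3}: inside CI = {inside}, reject H0 = {rejected}")
```

Hypothesised means inside [99.10, 108.90] are not rejected; those outside it are.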
When to use a t-test instead
The z-test requires the population standard deviation to be known exactly. In most real-world settings it is not known and has to be estimated from the sample. If you are estimating the standard deviation, use a t-test, which accounts for that extra uncertainty through the heavier-than-normal tails of the Student t-distribution; this matters most when samples are small. For large samples (roughly n greater than 30 per group), the t and z distributions are nearly identical and the choice matters little. The one-proportion test is always a z-test because the standard error under the null is fully determined by the hypothesised proportion and sample size.
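One way to see why this matters is a small simulation, sketched here under stated assumptions: data are drawn from a standard normal population with the null hypothesis true, the sample standard deviation is plugged into the z formula, and the result is compared with the z critical value 1.96. The function name, trial count, and seed are arbitrary choices for illustration.

```python
import random
from math import sqrt


def false_positive_rate(n, trials=5000, z_crit=1.96, seed=0):
    """Share of true-null samples rejected when s replaces sigma in a z-test."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]   # H0 is true: mean 0
        mean = sum(sample) / n
        s = sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))  # sample SD
        z = mean / (s / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


for n in (5, 100):
    print(f"n = {n:>3}: false-positive rate = {false_positive_rate(n):.3f}")
```

With n = 5 the rejection rate should come out well above the nominal 5%, while with n = 100 it should sit close to it, which is the practical reason to prefer a t-test for small samples.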
Critical z-values by tail and significance level
| Alpha (α) | Two-tailed critical \|z\| | Right-tailed critical z | Left-tailed critical z | Confidence level |
|---|---|---|---|---|
| 0.10 | 1.645 | 1.282 | -1.282 | 90% |
| 0.05 | 1.960 | 1.645 | -1.645 | 95% |
| 0.01 | 2.576 | 2.326 | -2.326 | 99% |
| 0.001 | 3.291 | 3.090 | -3.090 | 99.9% |
Reject the null when |z| exceeds the two-tailed critical value, or when z lies beyond the one-tailed critical value in the specified direction (above it for a right-tailed test, below it for a left-tailed test).
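The tabulated values are quantiles of the standard normal distribution, so they can be regenerated for any alpha. A minimal sketch using the standard library's `NormalDist.inv_cdf` (variable names are illustrative):

```python
from statistics import NormalDist

inv = NormalDist().inv_cdf   # standard normal quantile function

rows = []
print(f"{'alpha':>6} {'two-tailed':>11} {'right':>7} {'left':>7}")
for alpha in (0.10, 0.05, 0.01, 0.001):
    two = inv(1 - alpha / 2)   # alpha split across both tails
    right = inv(1 - alpha)     # all of alpha in the upper tail
    left = inv(alpha)          # all of alpha in the lower tail
    rows.append((alpha, round(two, 3), round(right, 3), round(left, 3)))
    print(f"{alpha:>6} {two:>11.3f} {right:>7.3f} {left:>7.3f}")
```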
Frequently asked questions
When should I use a z-test instead of a t-test?
Use a z-test when the population standard deviation is known exactly and the sampling distribution of the mean is approximately normal, which is reasonable for large samples (roughly n greater than 30). Use a t-test when the standard deviation is estimated from the sample, especially for small samples, because the t-distribution has heavier tails that account for that added uncertainty. For large samples the two tests give almost identical results.
What is the difference between a one-tailed and a two-tailed z-test?
A two-tailed test checks for a difference in either direction: your observed value might be above or below the null value. A one-tailed (right-tailed or left-tailed) test checks only for a difference in one specified direction. One-tailed tests have more power to detect a difference in the expected direction, but they cannot detect a difference in the opposite direction. The decision between one- and two-tailed must be made before collecting data, based on the research question.
What does the p-value actually mean?
The p-value is the probability of observing a test statistic at least as extreme as yours, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true, nor the probability that your result was due to chance. A p-value below your chosen alpha (often 0.05) means the data are unusual enough under the null to justify rejecting it; it does not prove the alternative hypothesis or quantify the size of the effect.
How do I interpret the confidence interval?
The confidence interval gives a range of plausible values for the true parameter (the mean, mean difference, or proportion). A 95% confidence interval means that if the study were repeated many times, approximately 95% of the resulting intervals would contain the true value. If the interval excludes the null value (e.g., zero for a mean difference, or p0 for a proportion), the corresponding two-tailed test at the matching alpha is significant.
What are the assumptions of a z-test?
The one-sample and two-sample z-tests assume the population standard deviation is known, the data are independent, and the sampling distribution of the mean is approximately normal (satisfied for large samples via the central limit theorem or for normally distributed populations). The one-proportion test assumes the sample is large enough that np0 and n(1-p0) are both at least 5, ensuring the normal approximation to the binomial is valid.
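The sample-size condition for the proportion test is easy to check before running it. The helper below is a hypothetical sketch; the threshold of 5 follows the rule of thumb stated above, and the inputs are made-up examples.

```python
def normal_approximation_ok(n, p0, minimum=5):
    """Rule-of-thumb check: n*p0 and n*(1 - p0) must both reach `minimum`."""
    return n * p0 >= minimum and n * (1 - p0) >= minimum


print(normal_approximation_ok(n=1000, p0=0.5))   # True: 500 and 500
print(normal_approximation_ok(n=40, p0=0.05))    # False: n*p0 is only 2
print(normal_approximation_ok(n=200, p0=0.05))   # True: 10 and 190
```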
Does a significant result prove the effect is real?
No. Statistical significance only means the observed result would be unlikely under the null hypothesis if the test's assumptions hold. It does not prove causation, rule out confounders, or guarantee the effect is practically meaningful. With a very large sample, even a negligible difference can be statistically significant. Always pair the p-value with an effect size and a confidence interval to judge whether the finding matters in practice.
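The effect of sample size on significance can be illustrated with a hypothetical case: an observed mean only 0.1 above a null value of 100, on a scale with sigma = 15, tested at three sample sizes. All numbers and names here are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_and_p(x_bar, mu0, sigma, n):
    """z-score and two-tailed p-value for a one-sample z-test."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


effect_size = 0.1 / 15   # standardised difference, identical at every n
results = {}
for n in (100, 10_000, 1_000_000):
    z, p = z_and_p(x_bar=100.1, mu0=100, sigma=15, n=n)
    results[n] = (z, p)
    print(f"n = {n:>9,}: z = {z:5.2f}, p = {p:.4g}, effect size = {effect_size:.4f}")
```

The standardised difference stays at about 0.0067 throughout, yet the p-value drops from far above 0.05 at n = 100 to far below 0.001 at n = 1,000,000.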
Sources
- NIST/SEMATECH e-Handbook of Statistical Methods, Location tests
- Zar, J.H. (2010). Biostatistical Analysis, 5th ed. Prentice Hall.