Question 1

What p-value means an A/B test is statistically significant?

Accepted Answer

A p-value below your chosen significance threshold (alpha) is conventionally considered significant. The most common threshold is 0.05, which means accepting a 5% chance of a false positive when the two versions truly perform the same. If your p-value is 0.03, for example, there is only a 3% chance of observing a difference at least this large if there is no real difference between the versions. Note that this is not the probability that the result is wrong. Many teams use 95% confidence (alpha = 0.05) for routine tests and 99% (alpha = 0.01) for changes that directly affect revenue or safety.
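As a rough illustration, the comparison of a p-value against alpha can be sketched with a pooled two-proportion z-test. This is a minimal sketch under the normal approximation, not necessarily the method this calculator uses; the function name and the visitor and conversion counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate: the best estimate of the shared rate if A and B truly perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Probability of a difference at least this large, in either direction, under the null.
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05  # chosen significance threshold
# Hypothetical data: 4.0% vs 4.6% conversion on 10,000 visitors each.
p_value = two_proportion_p_value(conv_a=400, n_a=10_000, conv_b=460, n_b=10_000)
print(f"p = {p_value:.4f}, significant at alpha = {alpha}: {p_value < alpha}")
```

With these made-up counts the p-value lands below 0.05 but above 0.01, so the same result would pass a 95% threshold and fail a 99% one.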

Question 2

What is the difference between one-sided and two-sided tests?

Accepted Answer

A two-sided test checks whether B is different from A in either direction, better or worse. A one-sided test only checks whether B is better. One-sided tests have more statistical power (you can detect smaller effects with the same sample) but they cannot catch harm. If there is any realistic chance that the change could hurt your metric, use a two-sided test. One-sided tests are appropriate only when you are certain the direction of any effect must be positive.
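The trade-off can be seen by computing both p-values from the same z statistic. The sketch below assumes a normal approximation and a sign convention (positive z means B outperforms A) that is chosen here for illustration only.

```python
from statistics import NormalDist


def p_values_from_z(z):
    """Return (two_sided, one_sided) p-values for a z statistic.

    Assumed convention: z is positive when B outperforms A.
    """
    norm = NormalDist()
    two_sided = 2 * (1 - norm.cdf(abs(z)))  # a difference in either direction counts
    one_sided = 1 - norm.cdf(z)             # only "B is better" counts as evidence
    return two_sided, one_sided


# Hypothetical z statistics: a modest win for B, then a clear loss for B.
for z in (1.8, -3.0):
    two_sided, one_sided = p_values_from_z(z)
    print(f"z = {z:+.1f}: two-sided p = {two_sided:.4f}, one-sided p = {one_sided:.4f}")
```

For the modest win, the one-sided p-value is half the two-sided one, which is where the extra power comes from. For the clear loss, the one-sided p-value is close to 1, so the harm goes unflagged.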

Question 3

How long should I run an A/B test?

Accepted Answer

Run your test until you reach the pre-calculated sample size, not until you see a significant result. Checking the results repeatedly and stopping the moment significance is reached (peeking) inflates the false-positive rate well above the stated alpha. As a rule of thumb, run tests for at least one full business week to capture weekly seasonality, and never stop after fewer than 100 conversions in the smaller group. Use Plan mode to calculate the exact target sample size before you start.
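The effect of peeking can be illustrated with a small simulation of A/A tests, where no true difference exists. This is a simplified sketch: it assumes equally sized batches of data whose contribution to the test statistic is a standard-normal increment, and the number of looks (20) and simulated tests (2,000) are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def false_positive_rates(n_tests=2000, n_looks=20, alpha=0.05, seed=1):
    """Simulate A/A tests (no true difference) and compare two stopping rules."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    stopped_early = 0
    significant_at_end = 0
    for _ in range(n_tests):
        running_sum = 0.0
        crossed = False
        for look in range(1, n_looks + 1):
            # Under the null, each batch adds one standard-normal increment.
            running_sum += rng.gauss(0.0, 1.0)
            z = running_sum / sqrt(look)
            if abs(z) > z_crit:
                crossed = True  # a peeker would stop here and declare a winner
        stopped_early += crossed
        # The fixed-horizon analyst looks only once, at the final z.
        significant_at_end += abs(z) > z_crit
    return stopped_early / n_tests, significant_at_end / n_tests


peeking_rate, fixed_rate = false_positive_rates()
print(f"False-positive rate with peeking at every look: {peeking_rate:.1%}")
print(f"False-positive rate at the planned sample size: {fixed_rate:.1%}")
```

The single look at the planned sample size stays near the nominal 5%, while stopping at the first significant look produces a rate several times higher.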

Question 4

What is statistical power and why does it matter?

Accepted Answer

Statistical power (1 - beta) is the probability that your test will detect a true improvement of the specified size. At 80% power you will miss 20% of real effects of that magnitude. Increasing power to 90% reduces missed opportunities but requires roughly a third more visitors (about 34% at a two-sided alpha of 0.05). Low power is a common reason teams conclude a test had "no effect" when the effect was real but the experiment was not large enough to see it. Run the sample size calculator before launching to ensure your test is adequately powered.
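The cost of extra power can be estimated from the standard normal-approximation sample size formula, in which the required sample is proportional to (z_alpha/2 + z_beta)^2. The sketch below assumes a two-sided test at alpha = 0.05; the function name is illustrative and the calculator's own formula may differ.

```python
from statistics import NormalDist

norm = NormalDist()


def sample_size_factor(alpha, power):
    """The (z_alpha/2 + z_beta)^2 term that scales required sample size (two-sided test)."""
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    z_beta = norm.inv_cdf(power)
    return (z_alpha + z_beta) ** 2


factor_80 = sample_size_factor(alpha=0.05, power=0.80)
factor_90 = sample_size_factor(alpha=0.05, power=0.90)
extra = factor_90 / factor_80 - 1
print(f"Going from 80% to 90% power needs about {extra:.0%} more visitors")
```

Because baseline rate and effect size cancel out of the ratio, the relative cost of moving from 80% to 90% power is the same for any test with the same alpha under this approximation.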

Question 5

What is a minimum detectable effect (MDE) and how do I choose one?

Accepted Answer

The MDE is the smallest change in conversion rate worth detecting. It is usually expressed as a relative percentage of the baseline (e.g., 10% relative means 4.0% baseline to 4.4%). A smaller MDE requires a much larger sample because tiny differences are harder to distinguish from noise. Choose your MDE based on what improvement would be worth shipping: if a change producing less than 5% relative lift would not change any business decision, set MDE = 5% and save yourself weeks of test time.
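To see how strongly the MDE drives sample size, here is a minimal sketch using the common normal-approximation formula for comparing two proportions. It assumes a two-sided test, alpha = 0.05, 80% power, and equal group sizes; the function name is hypothetical and the calculator may use a different formula.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per group for a two-sided two-proportion test."""
    norm = NormalDist()
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # e.g. 4.0% with a 10% relative lift -> 4.4%
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    z_beta = norm.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


for mde in (0.20, 0.10, 0.05):
    n = sample_size_per_group(baseline=0.04, relative_mde=mde)
    print(f"MDE {mde:.0%} relative: about {n:,} visitors per group")
```

Since the required sample scales with the inverse square of the absolute difference, halving the MDE roughly quadruples the number of visitors needed.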

Question 6

Can I test more than two variants at once?

Accepted Answer

Testing A vs B vs C (a multi-variant test, often called an A/B/n test) is possible, but this calculator covers the two-variant case. When running more than two groups, each additional comparison increases the chance of a false positive (the multiple comparisons problem). You must apply a correction such as Bonferroni or Holm-Bonferroni, or use a dedicated multi-arm testing framework. Splitting traffic across three groups also increases the required total sample size.
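The two corrections named above can be sketched in a few lines. The p-values below are hypothetical and stand for three variants each compared against the control; this calculator does not perform these corrections itself.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a comparison only if its p-value is at most alpha / number of comparisons."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down procedure: test p-values from smallest to largest."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest p-value is compared against alpha / m, the next against alpha / (m - 1), ...
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one comparison fails, all larger p-values fail too
    return reject


# Hypothetical p-values for B vs A, C vs A, and D vs A.
p_values = [0.012, 0.020, 0.300]
print("Uncorrected:", [p <= 0.05 for p in p_values])
print("Bonferroni: ", bonferroni(p_values))
print("Holm:       ", holm(p_values))
```

Holm controls the same family-wise error rate as Bonferroni but never rejects fewer comparisons, which is why it is often preferred when several variants are tested.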

Question 7

What is sample ratio mismatch and what should I do if I see it?

Accepted Answer

A sample ratio mismatch (SRM) means the actual split of visitors between A and B differs from the intended split by more than random chance would explain. It usually signals a technical bug in the experiment setup: a redirect that some users do not follow, a caching layer skipping the experiment, or bots being filtered differently across groups. A statistically significant SRM makes your results unreliable. Stop the test, fix the root cause, and restart the experiment from scratch.
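One common way to check for SRM is a chi-square goodness-of-fit test on the visitor counts. The sketch below is an assumption about how such a check might look, not a description of this calculator; the counts are hypothetical, and the strict 0.001 threshold is a conventional choice rather than a value from the text above.

```python
from math import erfc, sqrt


def srm_p_value(visitors_a, visitors_b, expected_share_a=0.5):
    """Chi-square goodness-of-fit p-value for a two-group traffic split."""
    total = visitors_a + visitors_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi_sq = ((visitors_a - expected_a) ** 2 / expected_a
              + (visitors_b - expected_b) ** 2 / expected_b)
    # With one degree of freedom the chi-square tail probability has a closed form.
    return erfc(sqrt(chi_sq / 2))


# Hypothetical counts for an intended 50/50 split.
p = srm_p_value(visitors_a=50_000, visitors_b=51_200)
print(f"SRM p-value: {p:.6f}")
if p < 0.001:  # assumed threshold; stricter than 0.05 to avoid false alarms
    print("Likely sample ratio mismatch: investigate before trusting the results")
```

A split that looks close to 50/50 by eye can still fail this check at high traffic volumes, so it is worth running on the raw counts rather than on rounded percentages.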

Confidence level	Alpha (type I error)	Typical use case
90%	0.10	Early-stage or low-stakes tests
95%	0.05	Standard for most A/B tests
99%	0.01	High-stakes revenue changes

A/B Test Calculator

What is an A/B test and why does statistical significance matter?

How the calculator works

Sample ratio mismatch (SRM) - the silent killer of A/B tests

Minimum detectable effect and test duration planning

Common significance thresholds and their meaning

Frequently asked questions

Sources