Cohen's d Effect Size Calculator

Enter the means, standard deviations, and sample sizes for your two groups (or one sample vs. a population mean) to get Cohen's d, the pooled standard deviation, and the equivalent correlation r. The result includes a conventional effect-size interpretation (negligible, small, medium, or large) and a step-by-step breakdown of the calculation.

By Dr. Hannah Brandt, PhD · Updated June 7, 2026

Cohen's d: 0.3446 (Small effect)
Standardised mean difference (effect size)

Pooled SD: 14.5086
r (correlation equivalent): 0.1698
|d| (absolute): 0.3446

Scale: Negligible < 0.2 · Small 0.2-0.5 · Medium 0.5-0.8 · Large 0.8+

Cohen's d = 0.3446 (Small effect)

The absolute effect size is 0.345, which falls in the "Small" range by Cohen's (1988) conventional benchmarks.
Group 1 scored higher than Group 2 by 0.345 pooled standard deviations.
The equivalent point-biserial correlation is r = 0.170, explaining about 2.9% of variance.

Next step: A small effect is modest and may require a larger sample to detect reliably. Run a power analysis if you are still designing the study.

Calculate the mean difference: 105 - 100 = 5.0000
Calculate pooled variance, ((n1-1) x SD1^2 + (n2-1) x SD2^2) / (n1+n2-2): ((30-1) x 15^2 + (30-1) x 14^2) / (30+30-2) = 210.5000
Calculate pooled SD, sqrt(pooled variance): sqrt(210.5000) = 14.5086
Divide mean difference by pooled SD to get Cohen's d: 5.0000 / 14.5086 = 0.3446
Convert d to r, r = d / sqrt(d^2 + 4): 0.3446 / sqrt(0.3446^2 + 4) = 0.1698

Formula

d = (\bar{x}_1 - \bar{x}_2) / s_p, \quad s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}, \quad r = \frac{d}{\sqrt{d^2+4}}

Worked example

Two groups have means 105 and 100 with SDs of 15 and 14 and 30 participants each. Pooled variance = (29 x 225 + 29 x 196) / 58 = 210.5, pooled SD = sqrt(210.5) = 14.51. Cohen's d = (105 - 100) / 14.51 = 0.3446 (small effect). Equivalent r = 0.3446 / sqrt(0.3446^2 + 4) = 0.1698.
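The worked example can be reproduced in a few lines of Python. This is a minimal sketch, not the calculator's own code; the function name cohens_d_pooled and its argument names are illustrative assumptions.

```python
import math


def cohens_d_pooled(mean1, sd1, n1, mean2, sd2, n2):
    """Return (d, pooled_sd) for two independent groups.

    Hypothetical helper: follows the pooled-SD formula shown above.
    """
    # Weighted average of the two sample variances (weights are n - 1).
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    pooled_sd = math.sqrt(pooled_var)
    # Positive d means group 1 has the higher mean.
    return (mean1 - mean2) / pooled_sd, pooled_sd


d, s_p = cohens_d_pooled(105, 15, 30, 100, 14, 30)
print(round(d, 4), round(s_p, 4))  # 0.3446 14.5086
```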

What is Cohen's d?

Cohen's d is one of the most widely used measures of effect size for comparing two group means. It expresses the difference between means in units of standard deviation, so a d of 1.0 means the groups differ by exactly one standard deviation. Popularised by psychologist Jacob Cohen, whose textbook Statistical Power Analysis for the Behavioral Sciences (2nd edition, 1988) is the usual citation, it is now standard practice in psychology, medicine, education, and many other fields. Because d is dimensionless, it lets you compare results across studies that measured the same construct with different scales.

The three modes: pooled SD, Glass delta, and one-sample

The standard two-sample Cohen's d uses a weighted pooled standard deviation in the denominator, which balances information from both groups. This is appropriate when the two groups are assumed to come from populations with equal variance. Glass delta, sometimes called the 'unequal SD' variant, puts only the control group's standard deviation in the denominator. This is preferable in experiments where the treatment may have changed variability as well as the mean, because you want to express the difference relative to the natural baseline spread. The one-sample form compares a single sample mean to a known or hypothesised population value using the sample's own standard deviation as the denominator, matching the logic of a one-sample t-test.
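The two alternative modes differ from the pooled form only in the denominator. The sketch below shows that difference; the function names and the example inputs (which group is the control, and the population mean of 100) are assumptions made for illustration.

```python
def glass_delta(mean_treatment, mean_control, sd_control):
    """Glass's delta: mean difference scaled by the control group's SD only."""
    return (mean_treatment - mean_control) / sd_control


def one_sample_d(sample_mean, population_mean, sample_sd):
    """One-sample d: distance from a reference value in sample-SD units."""
    return (sample_mean - population_mean) / sample_sd


# Hypothetical inputs: treatment mean 105, control mean 100, control SD 14.
print(round(glass_delta(105, 100, 14), 4))   # 0.3571
# Hypothetical inputs: sample mean 105, SD 15, reference value 100.
print(round(one_sample_d(105, 100, 15), 4))  # 0.3333
```

With these inputs, Glass's delta (0.3571) is slightly larger than the pooled d (0.3446) because the assumed control SD of 14 is smaller than the pooled SD of 14.51.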

Interpreting effect size: small, medium, and large

Cohen's (1988) conventional benchmarks set d = 0.2 as small, 0.5 as medium, and 0.8 as large. Following them, this calculator labels |d| below 0.2 as negligible, 0.2-0.49 as small, 0.5-0.79 as medium, and 0.8 or above as large. These thresholds are widely cited but are rough guides, not firm cutoffs. A 'small' effect in a life-or-death medical trial may be highly important, while a 'large' effect in a trivial outcome may not matter at all. Many researchers now recommend interpreting d in the context of comparable studies in the same field and in terms of practical significance (for example, the minimum clinically meaningful difference), rather than relying solely on Cohen's benchmarks.
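The labelling rule described above amounts to a threshold lookup on |d|. A minimal sketch, with the function name interpret_d chosen here for illustration:

```python
def interpret_d(d):
    """Map d to a conventional label using the 0.2 / 0.5 / 0.8 cutoffs."""
    size = abs(d)  # the sign carries direction only, not magnitude
    if size < 0.2:
        return "Negligible"
    if size < 0.5:
        return "Small"
    if size < 0.8:
        return "Medium"
    return "Large"


print(interpret_d(0.3446))  # Small
print(interpret_d(-0.6))    # Medium
```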

Relationship between Cohen's d and other effect-size measures

Cohen's d can be converted to the point-biserial correlation r using the formula r = d / sqrt(d^2 + 4). This r value is sometimes reported alongside d in mixed-methods papers or meta-analyses that combine different research designs. The squared value r^2 is the proportion of variance in the outcome explained by group membership. For example, a d of 0.5 converts to r = 0.243, meaning group membership accounts for about 5.9% of outcome variance. Other related measures include Hedges' g (which applies a small-sample correction factor to d) and Glass's delta described above. For ANOVA, partial eta-squared (partial-eta^2) and f play the role that d plays for t-tests.
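The d = 0.5 conversion above can be checked directly. This sketch uses the formula exactly as given; the constant 4 in the denominator corresponds to equal group sizes, so treat the result as approximate when groups are unbalanced. The name d_to_r is illustrative.

```python
import math


def d_to_r(d):
    """Convert Cohen's d to r via r = d / sqrt(d^2 + 4) (equal group sizes)."""
    return d / math.sqrt(d**2 + 4)


r = d_to_r(0.5)
# r and the percentage of outcome variance explained (r squared).
print(round(r, 3), round(r**2 * 100, 1))  # 0.243 5.9
```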

Cohen's d interpretation benchmarks

|d| range	Interpretation	Sample size needed for adequate power
< 0.2	Negligible	Very large sample needed
0.2 - 0.49	Small	Large sample needed
0.5 - 0.79	Medium	Moderate sample
>= 0.8	Large	Small-to-moderate sample

Conventional effect-size guidelines proposed by Jacob Cohen (1988). These are approximate rules of thumb, not strict cutoffs.

Frequently asked questions

What is a good value for Cohen's d?

There is no single 'good' value; it depends on the context. Cohen's (1988) conventions call 0.2 small, 0.5 medium, and 0.8 large. In many psychology experiments a d of 0.5 is considered a meaningful finding. In high-stakes clinical or educational research, even a d of 0.2 may be practically important if the outcome matters enough. Always compare your d to values reported in similar studies in your field.

Can Cohen's d be negative?

Yes. A negative d simply means Group 2 scored higher than Group 1. The sign reflects direction, not importance: a d of -0.6 and a d of +0.6 have equal magnitude. When reporting effect size in a symmetric way (for example, in a meta-analysis), you may report the absolute value and describe direction separately.

What is the difference between Cohen's d and Hedges' g?

Hedges' g applies a small-sample correction factor (often written J) to Cohen's d: g = d x J, where J is slightly less than 1. For sample sizes above about 20 per group the difference is negligible, but for very small samples (n < 10 per group) Hedges' g gives a less biased estimate. This calculator computes Cohen's d; for tiny samples consider multiplying by J = 1 - 3 / (4(n1+n2) - 9).
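The correction can be applied in a couple of lines. A minimal sketch using the approximate J quoted above; the function names are assumptions for illustration.

```python
def hedges_correction(n1, n2):
    """Approximate small-sample factor J = 1 - 3 / (4(n1 + n2) - 9)."""
    return 1 - 3 / (4 * (n1 + n2) - 9)


def hedges_g(d, n1, n2):
    """Hedges' g: Cohen's d shrunk by the correction factor J."""
    return d * hedges_correction(n1, n2)


print(round(hedges_correction(30, 30), 4))  # 0.987
print(round(hedges_correction(5, 5), 4))    # 0.9032
print(round(hedges_g(0.3446, 30, 30), 4))   # 0.3401
```

With 30 per group the correction changes d by about 1%; with 5 per group it shrinks d by nearly 10%.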

How does Cohen's d relate to statistical significance?

A large Cohen's d does not guarantee statistical significance, and a small p-value does not imply a large effect. With a huge sample even a d of 0.05 can reach p < 0.05, while with a tiny sample a d of 1.0 might not reach significance. Effect size and p-value answer different questions: d measures how big the difference is, while p measures how likely a difference at least as large as the one observed would be if the null hypothesis were true. Report both.
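To see how sample size drives significance, the pooled two-sample t statistic can be written as t = d x sqrt(n1 x n2 / (n1 + n2)). The sketch below pairs that identity with a normal approximation to the p-value. The approximation is an assumption made to keep the example dependency-free: it understates p when samples are small, so an exact t-test would give an even larger p for the tiny-sample case. The sample sizes of 10,000 and 4 per group are illustrative.

```python
import math
from statistics import NormalDist


def t_from_d(d, n1, n2):
    """t statistic of a pooled two-sample t-test implied by d and group sizes."""
    return d * math.sqrt(n1 * n2 / (n1 + n2))


def approx_two_sided_p(t):
    """Two-sided p-value from a normal approximation (rough for small df)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Tiny effect with a huge sample versus large effect with a tiny sample.
for d, n in [(0.05, 10_000), (1.0, 4)]:
    t = t_from_d(d, n, n)
    print(f"d={d}, n per group={n}, t={t:.2f}, approx p={approximate_p:.4f}"
          if (approximate_p := approx_two_sided_p(t)) is not None else "")
```

The first case gives t of about 3.54 and a p-value well below 0.05; the second gives t of about 1.41 and a p-value above 0.05 despite the much larger effect.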

When should I use Glass delta instead of the pooled SD formula?

Use Glass delta (the unequal-SD option here) when you suspect or know that the treatment changed the variability of the outcome, not just the mean. For instance, a training intervention might narrow individual differences among participants. In that case, pooling the control and treatment SDs could distort the benchmark. By anchoring d to the control SD alone, Glass delta expresses the treatment shift relative to the natural spread that would have occurred without intervention.

What sample sizes do I need to detect a medium effect?

A common rule of thumb: to detect a medium effect (d = 0.5) with 80% power at alpha = 0.05 in a two-tailed two-sample t-test, you need roughly 64 participants per group (128 total). For a small effect (d = 0.2) you need about 394 per group; for a large effect (d = 0.8) about 26 per group. Use a dedicated power-analysis tool to tailor these numbers to your exact design.
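The per-group figures above can be approximated without a power-analysis package. This sketch uses the normal-approximation formula n = 2(z_alpha/2 + z_power)^2 / d^2 plus a small correction term z_alpha/2^2 / 4 that brings it close to the t-based answer. Both the formula choice and the function name n_per_group are assumptions for illustration, not the method behind the numbers quoted above, and a dedicated tool should be preferred for real designs.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-tailed two-sample t-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)          # about 0.84 for 80% power
    # Normal-approximation sample size plus a small-sample correction term.
    n = 2 * (z_alpha + z_power) ** 2 / d**2 + z_alpha**2 / 4
    return math.ceil(n)


for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))  # 394, 64 and 26 respectively
```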

Written by Dr. Hannah Brandt, PhD Statistician · Munich, Germany

Applied statistician translating rigorous probability theory into clear, accurate tools for researchers and practitioners.
