Hypergeometric Distribution Calculator
Enter the population size, the number of successes in the population, the sample size, and the observed successes to compute every probability for a hypergeometric distribution instantly. The calculator shows the exact probability and the cumulative forms (five probability forms in all), the mean, variance, and standard deviation, together with a full step-by-step solution and a probability mass function chart.
Formula
P(X = k) = C(K, k) * C(N-K, n-k) / C(N, n)
Worked example
A deck of 52 cards (N=52) contains 13 hearts (K=13). Five cards are dealt (n=5). What is the chance that exactly 2 are hearts (k=2)? C(13,2)=78, C(39,3)=9139, C(52,5)=2598960. P(X=2) = 78*9139/2598960 = 712842/2598960 = 0.274280, or about 27.4%. Mean = 5*13/52 = 1.25 hearts expected.
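The arithmetic in the worked example can be checked with a few lines of Python. This is a minimal sketch using the standard library's `math.comb`; the function name `hypergeom_pmf` is chosen here for illustration and is not part of the calculator.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k successes in n draws without replacement
    from N items, K of which are successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# The three binomial coefficients from the worked example
print(comb(13, 2), comb(39, 3), comb(52, 5))  # 78 9139 2598960

# Probability of exactly 2 hearts in a 5-card hand
p = hypergeom_pmf(2, 52, 13, 5)
print(round(p, 6))  # 0.27428

# Expected number of hearts: n * K / N
mean = 5 * 13 / 52
print(mean)  # 1.25
```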
What is the hypergeometric distribution?
The hypergeometric distribution models the number of successes in a fixed-size sample drawn WITHOUT replacement from a finite population of known size. Unlike the binomial distribution, which assumes each draw is independent (i.e. with replacement), the hypergeometric distribution accounts for the fact that each draw changes the composition of the remaining population. Classic examples include drawing cards from a deck, selecting defective items from a production lot, or pulling colored balls from an urn without putting them back.
Hypergeometric distribution formula
The probability mass function is P(X = k) = C(K, k) * C(N-K, n-k) / C(N, n), where N is the total population size, K is the number of "success" items in the population, n is the sample size, and k is the number of successes observed in the sample. C(a, b) denotes the binomial coefficient ("a choose b"), which counts the number of ways to pick b items from a without regard to order. The mean (expected successes) is mu = n * K / N, and the variance is sigma^2 = n * K * (N-K) * (N-n) / (N^2 * (N-1)). The factor (N-n)/(N-1) is the finite population correction - it shrinks the variance because sampling without replacement constrains the possible outcomes.
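As a rough check on the mean and variance formulas above, the sketch below computes them in closed form and again by summing directly over the probability mass function. The function names are illustrative assumptions, and the code assumes N > 1 so that the N-1 denominator is nonzero.

```python
from math import comb, sqrt

def hypergeom_moments(N, K, n):
    """Closed-form mean, variance, and standard deviation."""
    mean = n * K / N
    variance = n * K * (N - K) * (N - n) / (N**2 * (N - 1))
    return mean, variance, sqrt(variance)

def moments_from_pmf(N, K, n):
    """Same quantities by direct summation over the PMF, as a cross-check."""
    lo, hi = max(0, n - (N - K)), min(n, K)
    pmf = {k: comb(K, k) * comb(N - K, n - k) / comb(N, n)
           for k in range(lo, hi + 1)}
    mean = sum(k * p for k, p in pmf.items())
    variance = sum((k - mean) ** 2 * p for k, p in pmf.items())
    return mean, variance, sqrt(variance)

closed = hypergeom_moments(52, 13, 5)   # 5 cards, 13 hearts, 52-card deck
summed = moments_from_pmf(52, 13, 5)

# Binomial variance n*p*(1-p) with p = K/N, i.e. without the correction factor
binomial_variance = 5 * (13 / 52) * (1 - 13 / 52)

print("closed form:", closed)
print("from PMF:   ", summed)
print("binomial variance:", binomial_variance)
```

For the card example the hypergeometric variance comes out smaller than the binomial variance of 0.9375, which is the effect of the (N-n)/(N-1) factor.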
Hypergeometric vs. binomial distribution
The key distinction is whether sampling is with or without replacement. If you shuffle a deck, deal a card, record it, and put it back before dealing again, each draw is independent and the binomial distribution applies with p = K/N. If you do NOT replace the card - the realistic scenario for most card games - the draws are dependent and the hypergeometric distribution is correct. As a practical rule: if the sample size n is less than 5% of the population N, the two distributions give very similar answers and the binomial approximation is acceptable. When n/N exceeds 5%, use the hypergeometric formula for accuracy.
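To see the 5% rule of thumb in action, the following sketch compares the two models for the same success fraction p = K/N = 0.25. The second population (N=5200, K=1300) is a hypothetical one invented here purely to make n/N small; it does not come from the calculator.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """Sampling without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    """Sampling with replacement (independent draws)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

k, n = 2, 5
binom = binom_pmf(k, n, 0.25)

# Card deck: n/N = 5/52, roughly 9.6%, above the 5% guideline
hyper_cards = hypergeom_pmf(k, 52, 13, n)

# Hypothetical large population: n/N = 5/5200, roughly 0.1%
hyper_large = hypergeom_pmf(k, 5200, 1300, n)

print("binomial:                ", binom)
print("hypergeometric, N=52:    ", hyper_cards, "gap", abs(hyper_cards - binom))
print("hypergeometric, N=5200:  ", hyper_large, "gap", abs(hyper_large - binom))
```

The gap between the two models shrinks as the sample becomes a smaller share of the population, which is why the binomial approximation is usually considered acceptable below about 5%.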
How to use this calculator
Set N to the total population size (e.g., 52 cards), K to the number of successes in the population (e.g., 13 hearts), n to the sample size you are drawing (e.g., 5 cards), and k to the number of successes you want to evaluate (e.g., 2 hearts). The calculator instantly computes all five probability forms - exact P(X=k), left-tail P(X<k) and P(X<=k), and right-tail P(X>k) and P(X>=k).
Hypergeometric Distribution vs. Related Distributions
| Distribution | Sampling | Key condition | Use when |
|---|---|---|---|
| Hypergeometric | Without replacement | Finite population, no replacement | n/N > 5% (sample is large relative to population) |
| Binomial | With replacement | Fixed trials, constant probability | n/N <= 5% (sample is small, or population is infinite) |
| Poisson | Rare events | Very small p, large n | Counting rare occurrences over time or space |
| Negative hypergeometric | Without replacement | Count draws until r successes | Waiting time problems without replacement |
When to use each distribution, depending on whether sampling is with or without replacement.
Frequently asked questions
What is the difference between hypergeometric and binomial distributions?
The hypergeometric distribution applies when sampling WITHOUT replacement from a finite population - each draw changes the remaining composition. The binomial distribution applies when sampling WITH replacement (or when the population is so large that replacement has negligible effect). If your sample is less than about 5% of the population, the two give nearly identical results. For larger samples relative to the population, only the hypergeometric distribution is accurate.
What do N, K, n, and k mean in the hypergeometric formula?
N is the total number of items in the population (e.g., 52 cards in a deck). K is the number of those items that are "successes" (e.g., 13 hearts). n is the size of the sample you draw without replacement (e.g., 5 cards). k is the specific number of successes you are calculating the probability for (e.g., exactly 2 hearts). The constraints are that k cannot exceed either n or K, and neither K nor n can exceed N.
What is the finite population correction factor?
The finite population correction (FPC) is the term (N-n)/(N-1) that appears in the hypergeometric variance formula. It reduces the variance compared to the binomial model because when you sample without replacement, you cannot repeatedly draw the same item. The smaller the remaining population after each draw, the more constrained the outcome is, so the variance is smaller. The FPC approaches 1 when n is tiny relative to N, which is why the hypergeometric and binomial variances converge in large populations.
Can k be larger than n or K?
No. You cannot observe more successes in your sample (k) than either the sample size (n) or the total number of successes in the population (K), whichever is smaller. Similarly, k cannot be negative. The valid range for k is from max(0, n-(N-K)) to min(n, K). If you enter a value outside this range, the probability is zero.
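The valid range can be computed directly, and a probability function can return zero outside it. This is a minimal sketch; the names `support` and `hypergeom_pmf` are illustrative, and the small population used in the demonstration (N=10, K=7, n=5) is a made-up example chosen so that the lower bound is not zero.

```python
from math import comb

def support(N, K, n):
    """Smallest and largest possible number of successes in the sample."""
    return max(0, n - (N - K)), min(n, K)

def hypergeom_pmf(k, N, K, n):
    """P(X = k), returning 0.0 for any k outside the valid range."""
    lo, hi = support(N, K, n)
    if k < lo or k > hi:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Hypothetical population: 10 items, 7 successes, sample of 5.
# Only 3 failures exist, so at least 2 of the 5 drawn must be successes.
lo, hi = support(10, 7, 5)
print(lo, hi)  # 2 5

print(hypergeom_pmf(1, 10, 7, 5))  # 0.0 (below the valid range)
print(hypergeom_pmf(6, 10, 7, 5))  # 0.0 (above the valid range)

# Probabilities over the valid range sum to 1 (up to floating-point error)
total = sum(hypergeom_pmf(k, 10, 7, 5) for k in range(lo, hi + 1))
print(total)
```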
What is P(X >= k) used for?
P(X >= k) - "at least k successes" - is often the most practical form for quality control and hypothesis testing. For example, if you test 10 items from a batch and ask whether 3 or more are defective, you compute P(X >= 3), under an assumed number of defective items in the batch, to find the probability that sampling would reveal that many defects. A tail probability below 5% often triggers rejection of the null hypothesis in statistical tests.
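A right-tail calculation of this kind might look like the sketch below. The batch size (100) and the assumed number of defective items (10) are hypothetical values picked for illustration, since the example above specifies only the sample size and the threshold; the 5% cut-off is the conventional level mentioned above, not a fixed rule.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k) for n draws without replacement; 0.0 outside the valid range."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def prob_at_least(k, N, K, n):
    """Right-tail probability P(X >= k), summed up to the largest possible count."""
    return sum(hypergeom_pmf(x, N, K, n) for x in range(k, min(n, K) + 1))

# Hypothetical batch: 100 items, assumed to contain 10 defective ones.
# Test 10 items and ask how likely it is to find 3 or more defects.
N, K, n, k = 100, 10, 10, 3
p_tail = prob_at_least(k, N, K, n)

alpha = 0.05  # conventional significance level
print("P(X >= 3) =", p_tail)
print("below 5% threshold:", p_tail < alpha)
```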
How does the hypergeometric distribution relate to card game probabilities?
Card games are a natural application. In a standard 52-card deck (N=52) with 13 cards of each suit (K=13), the hypergeometric distribution tells you the exact probability of any given number of hearts in a 5-card hand. Because cards are dealt without replacement, the hypergeometric formula gives the correct answer. For example, P(exactly 2 hearts in 5 cards) is about 27.4%.
When should I use cumulative probability instead of exact probability?
Use cumulative probabilities when you care about a range of outcomes rather than one specific count. P(X <= k) answers "what is the chance of k or fewer successes?" - useful for left-tail tests. P(X >= k) answers "what is the chance of k or more?" - useful for right-tail tests or quality thresholds. The exact P(X = k) is most useful when you need the probability of one specific outcome, such as in genetics where exactly k offspring must show a trait.
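The exact and cumulative forms are all sums over the same probability mass function, as the sketch below illustrates for the 5-card hearts example. The function name and the dictionary keys are assumptions made for this illustration, not the calculator's own labels.

```python
from math import comb

def probability_forms(k, N, K, n):
    """Exact and cumulative probabilities for k successes in n draws."""
    lo, hi = max(0, n - (N - K)), min(n, K)
    pmf = {x: comb(K, x) * comb(N - K, n - x) / comb(N, n)
           for x in range(lo, hi + 1)}
    return {
        "P(X = k)":  pmf.get(k, 0.0),
        "P(X < k)":  sum(p for x, p in pmf.items() if x < k),
        "P(X <= k)": sum(p for x, p in pmf.items() if x <= k),
        "P(X > k)":  sum(p for x, p in pmf.items() if x > k),
        "P(X >= k)": sum(p for x, p in pmf.items() if x >= k),
    }

# Hearts in a 5-card hand, evaluated at k = 2
forms = probability_forms(2, 52, 13, 5)
for name, value in forms.items():
    print(f"{name:10s} {value:.6f}")
```

Complementary pairs add up to 1: P(X < k) with P(X >= k), and P(X <= k) with P(X > k). That gives a quick sanity check on any set of results.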