Allele Frequency Calculator
Enter your population data as genotype counts (AA, Aa, aa), phenotype counts (dominant vs. recessive), or genotype percentages. The calculator finds the dominant allele frequency (p) and the recessive allele frequency (q), shows Hardy-Weinberg expected genotype frequencies, and runs a chi-square test to check whether your population is in Hardy-Weinberg equilibrium. Results update instantly as you type.
Formula
p = (2 x AA + Aa) / (2N) and q = (2 x aa + Aa) / (2N) = 1 - p, where AA, Aa, and aa are genotype counts and N = AA + Aa + aa.
Expected genotype frequencies under Hardy-Weinberg: p^2 (AA), 2pq (Aa), q^2 (aa).
Chi-square = sum over the three genotypes of (observed - expected)^2 / expected, with one degree of freedom.
Worked example
In a population of N = 100 individuals: 49 AA, 42 Aa, 9 aa. Total alleles = 2N = 200. Dominant alleles A = 2(49) + 42 = 140, so p = 140/200 = 0.70. Recessive alleles a = 2(9) + 42 = 60, so q = 60/200 = 0.30. Hardy-Weinberg expected frequencies: p^2 = 0.49, 2pq = 0.42, q^2 = 0.09. Multiplying by N gives expected counts of 49 AA, 42 Aa, 9 aa. Observed and expected counts match exactly, so chi-square = 0 (perfect HWE).
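The arithmetic in this example can be sketched in a few lines of Python. This is a minimal illustration under the two-allele assumption, not the calculator's actual implementation; the function name hwe_from_counts is hypothetical.

```python
def hwe_from_counts(n_AA, n_Aa, n_aa):
    """Allele frequencies, HWE expected counts and chi-square for one biallelic locus.

    Hypothetical helper for illustration; not the calculator's own code.
    """
    n = n_AA + n_Aa + n_aa
    if n <= 0:
        raise ValueError("need at least one individual")
    # Each AA individual carries two copies of A, each Aa carries one.
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    # Goodness-of-fit statistic; classes with zero expected count are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, q, expected, chi2


p, q, expected, chi2 = hwe_from_counts(49, 42, 9)
print(round(p, 2), round(q, 2), [round(e, 1) for e in expected], round(chi2, 4))
```

Called with the counts above, the sketch reproduces p = 0.7, q = 0.3, expected counts of 49, 42, and 9, and a chi-square of zero up to floating-point rounding.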
What is allele frequency?
An allele is one version of a gene. For a gene that exists in two forms - a dominant allele (A) and a recessive allele (a) - the allele frequency is simply the proportion of all gene copies in the population that are that particular version. Geneticists call the dominant allele frequency p and the recessive allele frequency q. Because every gene copy must be one allele or the other, p + q = 1. A p of 0.7 means 70 percent of all gene copies in the population carry the dominant form, and q = 0.3 means 30 percent carry the recessive form.
Hardy-Weinberg equilibrium explained
The Hardy-Weinberg principle states that in a large, randomly mating population with no selection, migration, or mutation, allele and genotype frequencies remain constant from generation to generation. The population must be large so that genetic drift (random sampling of alleles) is negligible. Under these conditions, genotype frequencies follow the equation p^2 + 2pq + q^2 = 1, where p^2 is the proportion of homozygous dominant individuals (AA), 2pq is the proportion of heterozygous carriers (Aa), and q^2 is the proportion of homozygous recessive individuals (aa). The principle is rarely perfectly satisfied in real populations, but it provides a crucial null hypothesis. When observed frequencies deviate significantly from HWE, it signals that at least one of the five equilibrium assumptions is violated - most commonly through selection, small population size (drift), or non-random mating.
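To see why the frequencies stay constant, it can help to step one generation forward by hand. The sketch below assumes the idealised conditions listed above (an effectively infinite population and random union of gametes); the function name next_generation is hypothetical.

```python
def next_generation(f_AA, f_Aa, f_aa):
    """Genotype frequencies after one round of random mating.

    Illustrative sketch: assumes no selection, migration, mutation or drift.
    """
    # Frequency of each allele among the gametes produced by the parents.
    p = f_AA + 0.5 * f_Aa
    q = f_aa + 0.5 * f_Aa
    # Random union of gametes gives the Hardy-Weinberg proportions.
    return p * p, 2 * p * q, q * q


# Start far from equilibrium: no heterozygotes at all.
gen1 = next_generation(0.5, 0.0, 0.5)
gen2 = next_generation(*gen1)
print(gen1, gen2)
```

Starting from a population with no heterozygotes, a single generation of random mating produces the proportions 0.25, 0.5, 0.25, and a second generation leaves them unchanged. The allele frequency p = 0.5 is the same throughout.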
Three ways to calculate p and q
If you have genotype counts (the number of AA, Aa, and aa individuals), the direct method is most accurate: count all copies of A (each AA gives 2, each Aa gives 1) and divide by the total number of alleles (2N). Because it makes no equilibrium assumption, this approach also lets you run a chi-square test. If you only have phenotype data (how many show the dominant vs. recessive trait), you cannot tell AA from Aa individuals, so you must assume HWE and work backward: q^2 equals the recessive phenotype frequency, so q is the square root of that fraction, and p = 1 - q. Since HWE is assumed in this case, it cannot also be tested. If you have genotype percentages rather than raw counts, normalise them to fractions and apply the same direct formula: p = f(AA) + f(Aa)/2. The chi-square goodness-of-fit test (one degree of freedom: three genotype classes, minus one, minus one for the allele frequency estimated from the data; critical value 3.841 at alpha = 0.05) tells you whether any deviation from HWE is statistically significant.
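The phenotype and percentage routes can be sketched as follows. This is an illustration only, with hypothetical function names; the phenotype route is valid only under the HWE assumption described above.

```python
import math


def from_phenotypes(n_dominant, n_recessive):
    """Estimate (p, q) from phenotype counts, ASSUMING Hardy-Weinberg equilibrium."""
    n = n_dominant + n_recessive
    if n <= 0:
        raise ValueError("need at least one individual")
    # Recessive phenotype frequency is taken to equal q^2.
    q = math.sqrt(n_recessive / n)
    return 1.0 - q, q


def from_percentages(pct_AA, pct_Aa, pct_aa):
    """Compute (p, q) from genotype percentages; no HWE assumption needed."""
    total = pct_AA + pct_Aa + pct_aa
    if total <= 0:
        raise ValueError("percentages must sum to a positive value")
    # Normalise so the three fractions sum to 1 even if the inputs do not sum to 100.
    f_AA, f_Aa = pct_AA / total, pct_Aa / total
    p = f_AA + f_Aa / 2
    return p, 1.0 - p


print(from_phenotypes(91, 9))       # 91 dominant, 9 recessive individuals
print(from_percentages(49, 42, 9))  # 49% AA, 42% Aa, 9% aa
```

Both calls describe the same population as the worked example and both return p close to 0.7 and q close to 0.3. The two routes agree here only because that population happens to be in equilibrium.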
Practical applications in genetics and medicine
Allele frequency calculations have wide applications. In clinical genetics, they let you estimate the carrier frequency for an autosomal recessive disease from prevalence data: if cystic fibrosis affects about 1 in 2,500 Caucasians, q^2 = 0.0004, q = 0.02, and the carrier rate 2pq is approximately 1 in 25. In conservation biology, comparing allele frequencies across populations reveals genetic isolation and inbreeding. In evolutionary biology, repeated measurements over generations can detect selection in action - if a harmful allele is decreasing faster than drift alone would predict, selection is at work. In association studies, departure from HWE in a control group can also flag genotyping errors, making it a standard quality-control step in genome-wide studies.
Common autosomal recessive conditions and carrier frequencies
| Condition | Prevalence (q^2) | q (mutant allele) | Carrier rate (2pq) |
|---|---|---|---|
| Cystic fibrosis (Caucasian) | 1 in 2,500 | ~0.020 | ~1 in 25 |
| Phenylketonuria (Caucasian) | 1 in 15,000 | ~0.008 | ~1 in 62 |
| Sickle cell anemia (African-American) | 1 in 600 | ~0.041 | ~1 in 13 |
| Tay-Sachs (Ashkenazi Jewish) | 1 in 3,600 | ~0.017 | ~1 in 30 |
| Albinism (general) | 1 in 20,000 | ~0.007 | ~1 in 71 |
| Harlequin ichthyosis (general) | 1 in 300,000 | ~0.002 | ~1 in 275 |
Estimated allele frequencies computed from disease prevalence using q^2 = disease frequency, assuming Hardy-Weinberg equilibrium. Frequencies vary by population.
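The table entries follow from prevalence alone. A minimal sketch of that calculation, assuming HWE and a fully recessive trait, is shown below; the function name carrier_stats is hypothetical.

```python
import math


def carrier_stats(one_in):
    """Return (q, carrier_rate, carriers_one_in) for a prevalence of 1 in `one_in`.

    Illustrative only: assumes Hardy-Weinberg equilibrium and a fully recessive trait.
    """
    if one_in < 1:
        raise ValueError("prevalence must be at most 1 in 1")
    q = math.sqrt(1.0 / one_in)  # q^2 = prevalence
    p = 1.0 - q
    carrier_rate = 2 * p * q     # heterozygote frequency
    return q, carrier_rate, 1.0 / carrier_rate


for name, one_in in [("Cystic fibrosis", 2500), ("Tay-Sachs", 3600)]:
    q, rate, n = carrier_stats(one_in)
    print(f"{name}: q = {q:.3f}, carriers about 1 in {n:.1f}")
```

For a prevalence of 1 in 2,500 this gives q = 0.02 and a carrier rate of 0.0392, or roughly 1 in 25.5.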
Frequently asked questions
What is the difference between allele frequency and genotype frequency?
Allele frequency (p or q) is the proportion of a specific allele among all allele copies in the population. Genotype frequency is the proportion of individuals with a specific combination of alleles: AA, Aa, or aa. They are related but distinct - for example, if p = 0.7 and q = 0.3, the genotype frequencies under HWE are p^2 = 0.49 (AA), 2pq = 0.42 (Aa), and q^2 = 0.09 (aa).
Why does my population fail the Hardy-Weinberg chi-square test?
A significant chi-square result means the observed genotype frequencies differ from HWE expectations more than chance alone would predict. The five conditions that must hold for HWE are: large population size (no genetic drift), random mating, no natural selection at the locus, no gene flow (migration), and no mutation. Any violation can cause departure. In practice, the most common causes are population structure (distinct sub-groups with different allele frequencies that mate preferentially within the group), a recent population bottleneck, or strong selection against one genotype.
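Population structure is easy to demonstrate numerically. In the sketch below, two hypothetical sub-groups are each exactly in HWE but have very different allele frequencies; pooling them produces a shortage of heterozygotes (known as the Wahlund effect) and a large chi-square. All numbers and function names are made up for illustration.

```python
def hwe_counts(p, n):
    """Genotype counts (AA, Aa, aa) for a group of size n that is exactly in HWE."""
    q = 1.0 - p
    return (p * p * n, 2 * p * q * n, q * q * n)


def chi_square_hwe(n_AA, n_Aa, n_aa):
    """Chi-square statistic comparing observed counts with HWE expectations."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


group_1 = hwe_counts(0.9, 500)  # about 405 AA, 90 Aa, 5 aa
group_2 = hwe_counts(0.1, 500)  # about 5 AA, 90 Aa, 405 aa
pooled = tuple(a + b for a, b in zip(group_1, group_2))

print(round(chi_square_hwe(*group_1), 3))  # each group alone fits HWE
print(round(chi_square_hwe(*pooled), 1))   # the pooled sample does not
```

The pooled sample has p = 0.5, so HWE predicts 500 heterozygotes out of 1,000, but only 180 are present. The resulting chi-square of about 409.6 is far above the 3.841 critical value, even though neither sub-group violates HWE on its own.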
How do I find allele frequency from disease prevalence?
For an autosomal recessive trait, set q^2 equal to the disease prevalence expressed as a fraction. Take the square root to get q (the mutant allele frequency), then calculate p = 1 - q and the carrier rate 2pq. For example, if a disease affects 1 in 10,000 people, q^2 = 0.0001, q = 0.01, p = 0.99, and the carrier rate is 2 x 0.99 x 0.01 = 0.0198, or about 1 in 50. This assumes the population is in HWE and the trait is fully recessive.
What is the minor allele frequency (MAF)?
The minor allele frequency is the frequency of the less common allele at a given locus. For a two-allele locus it is the smaller of p and q, so MAF = min(p, q). In genome-wide association studies, variants with a MAF below a chosen threshold (commonly 0.01 or 0.05) are often excluded because rare variants require very large samples to achieve statistical power. Most variants that such studies have associated with common diseases have MAF > 0.05.
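As a small sketch of the definition, the following computes the MAF for a biallelic locus and applies a threshold filter. The function names and the default threshold are illustrative choices, not part of any particular GWAS tool.

```python
def minor_allele_frequency(p):
    """MAF for a two-allele locus, given the frequency p of one allele."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return min(p, 1.0 - p)


def passes_maf_filter(p, threshold=0.05):
    """True if the variant's MAF is at or above the (hypothetical) threshold."""
    return minor_allele_frequency(p) >= threshold


print(minor_allele_frequency(0.7))      # about 0.3: the recessive allele is the minor one
print(passes_maf_filter(0.995, 0.01))   # MAF is about 0.005, below a 0.01 cut-off
```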
Can this calculator handle more than two alleles?
This calculator covers the classic two-allele (biallelic) case, which applies to most recessive genetic diseases and many SNP analyses. For multi-allele loci with k alleles, the Hardy-Weinberg principle generalises: the frequencies of all alleles must sum to 1, and there are k(k+1)/2 possible genotypes. Under HWE each homozygote has expected frequency p_i^2 and each heterozygote 2 x p_i x p_j, where p_i and p_j are the frequencies of the two alleles involved. Those calculations need a full table of genotype counts and are outside the scope of this calculator.
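For readers who want to try the multi-allele case, the sketch below enumerates the k(k+1)/2 genotypes and their expected HWE frequencies. It is not something this calculator does; the allele labels and frequencies are invented for the example.

```python
from itertools import combinations_with_replacement


def expected_genotype_freqs(allele_freqs):
    """Expected HWE genotype frequencies for a locus with any number of alleles.

    `allele_freqs` maps allele label -> frequency; the frequencies must sum to 1.
    """
    if abs(sum(allele_freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    result = {}
    # Unordered pairs with repetition: k(k+1)/2 genotypes for k alleles.
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        fa, fb = allele_freqs[a], allele_freqs[b]
        result[(a, b)] = fa * fa if a == b else 2 * fa * fb
    return result


# Hypothetical three-allele locus.
freqs = expected_genotype_freqs({"A1": 0.5, "A2": 0.3, "A3": 0.2})
for genotype, f in freqs.items():
    print(genotype, round(f, 3))
```

With three alleles the function returns six genotypes whose frequencies sum to 1, matching k(k+1)/2 = 6.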
How many individuals do I need for a reliable chi-square test?
The chi-square approximation is generally considered valid when all expected genotype counts are at least 5. If any expected count falls below 5, the test result may be unreliable. The smallest expected count is N x q^2, where q is the frequency of the rarer allele, so the sample needs at least 5/q^2 individuals: about 20 when q = 0.5, 56 when q = 0.3, and 500 when q = 0.1. For rare alleles (q < 0.05), you need more than 2,000 individuals to expect at least five homozygous recessive cases.
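The rule of thumb can be turned into a quick sample-size check. The sketch below assumes the only requirement is that the rarest expected genotype count, N times the squared frequency of the rarer allele, reaches the minimum of 5; the function name is hypothetical.

```python
import math


def min_sample_size(q, min_expected=5):
    """Smallest N for which every expected HWE genotype count is >= min_expected."""
    rarer = min(q, 1.0 - q)
    if rarer <= 0.0:
        raise ValueError("both alleles must be present (0 < q < 1)")
    # The rarest genotype is the homozygote for the rarer allele: N * rarer^2.
    return math.ceil(min_expected / rarer ** 2)


for q in (0.5, 0.3, 0.1, 0.05):
    print(q, min_sample_size(q))
```

The required sample grows with the inverse square of the allele frequency, which is why halving q roughly quadruples the number of individuals needed.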