Histogram Calculator: Frequency Distribution and Statistics
Paste or type your numbers separated by commas, spaces, or line breaks. The calculator builds the histogram bins automatically using your chosen rule (Sturges, Scott, or Freedman-Diaconis), then shows the full frequency table with relative and cumulative frequencies, plus the key descriptive statistics: mean, median, standard deviation, variance, IQR, skewness, and kurtosis. The show-your-work panel walks through each formula step with your actual numbers.
What is a histogram?
A histogram is a bar chart of a frequency distribution. The horizontal axis shows the range of data values divided into equal-width intervals called bins or classes, and the height of each bar shows how many data points fall into that interval. Unlike a bar chart for categorical data, the bars of a histogram touch each other, reflecting that the underlying variable is continuous. Histograms are the fastest way to see the shape of a distribution: whether it is symmetric or skewed, whether it has one peak or several, and whether any values are unusually far from the rest.
How to use this calculator
Paste or type your numbers into the data field, separated by commas, spaces, or line breaks. The calculator immediately counts the values, picks a bin count using the selected rule, computes the bin boundaries and frequencies, and displays the frequency table. The default rule is Sturges, which works well for samples up to about 200 values. For larger samples, switch to Scott or Freedman-Diaconis; for skewed or heavy-tailed data, Freedman-Diaconis is the more robust of the two. If you want a specific number of bars, choose "Custom bin count" and type the count into the field that appears. All descriptive statistics update automatically as you edit the data.
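The input handling described above can be sketched as follows. This is a minimal illustration, not the calculator's own parser: the function name `parse_numbers` is hypothetical, and it assumes a period as the decimal point, so a comma always acts as a separator.

```python
import re


def parse_numbers(text: str) -> list[float]:
    """Hypothetical parser: split on any run of commas and/or whitespace."""
    # \s covers spaces, tabs, and line breaks; "+" merges repeated separators.
    tokens = re.split(r"[,\s]+", text.strip())
    # Splitting an empty string yields [""], so drop empty tokens.
    # float() raises ValueError for any token that is not a number.
    return [float(token) for token in tokens if token]


print(parse_numbers("3, 5 7\n2.5,,  10"))  # [3.0, 5.0, 7.0, 2.5, 10.0]
```

Treating any run of separators as one break means stray double commas or blank lines do not create empty values.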
How the bin count is calculated
Three automatic rules are offered:
- Sturges (default): k = ceil(log2(n)) + 1. Simple and fast. Designed for roughly normal data with fewer than 200 observations. A sample of 20 gives 6 bins; 100 gives 8; 1,000 gives 11.
- Scott: bin width h = 3.49 s / n^(1/3), where s is the sample standard deviation. Asymptotically minimizes the mean integrated squared error when the data come from a normal distribution, so it works well when the data is symmetric and bell-shaped. Scott usually produces more bins than Sturges for large samples.
- Freedman-Diaconis: bin width h = 2 IQR / n^(1/3). Replaces the standard deviation with the interquartile range, making it more robust when the data has heavy tails or outliers. It is usually the best of the three automatic rules for skewed data.
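The three rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the calculator's own code: the function names are hypothetical, a width h is turned into a count by dividing the range max - min by h and rounding up, and quartiles come from Python's `statistics.quantiles` with its default method, which may not match the calculator's quartile convention.

```python
import math
import statistics


def sturges_bins(n: int) -> int:
    # Sturges: k = ceil(log2(n)) + 1
    return math.ceil(math.log2(n)) + 1


def width_to_bins(data: list[float], h: float) -> int:
    # Assumption: a width h becomes a count by covering max - min, rounding up.
    span = max(data) - min(data)
    if h <= 0 or span <= 0:
        return 1  # degenerate spread (e.g. constant data): fall back to one bin
    return math.ceil(span / h)


def scott_bins(data: list[float]) -> int:
    # Scott: h = 3.49 * s / n^(1/3), with s the sample standard deviation
    h = 3.49 * statistics.stdev(data) / len(data) ** (1 / 3)
    return width_to_bins(data, h)


def freedman_diaconis_bins(data: list[float]) -> int:
    # Freedman-Diaconis: h = 2 * IQR / n^(1/3)
    # Assumption: quartiles use Python's default "exclusive" method.
    q1, _, q3 = statistics.quantiles(data, n=4)
    h = 2 * (q3 - q1) / len(data) ** (1 / 3)
    return width_to_bins(data, h)


# Twenty evenly spaced values plus one far outlier.
data = [float(x) for x in range(1, 21)] + [200.0]
print(sturges_bins(len(data)), scott_bins(data), freedman_diaconis_bins(data))  # 6 4 25
```

The single outlier inflates the standard deviation, so Scott's bins become wide and only 4 are needed. The IQR ignores the outlier, so Freedman-Diaconis keeps narrow bins across the stretched range and returns 25, most of which would be empty between the main cluster and 200.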
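Once the bin count k is decided (for Scott and Freedman-Diaconis, by dividing the data range max - min by h and rounding to a whole number), the bin width is (max - min) / k and the bins run from min to max with equal width. Each bin includes its lower edge and excludes its upper edge, except the last bin, which is closed on both ends so that the maximum value is counted.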
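The binning step can be sketched as a small function that returns one frequency-table row per bin. Assumptions: the name `frequency_table` is hypothetical, relative frequency is count / n, and the cumulative column is a running count (a tool may show cumulative relative frequency instead). Edges are computed in floating point, so a value sitting exactly on a mathematical bin boundary could, after rounding of the edges, be counted in the adjacent bin.

```python
from bisect import bisect_right


def frequency_table(data: list[float], k: int) -> list[tuple]:
    """Rows of (lower, upper, count, relative, cumulative) for k equal-width bins."""
    n = len(data)
    lo, hi = float(min(data)), float(max(data))
    if hi == lo:
        return [(lo, hi, n, 1.0, n)]  # all values identical: one bin
    width = (hi - lo) / k
    edges = [lo + i * width for i in range(k)] + [hi]  # last edge exactly max
    counts = [0] * k
    for x in data:
        # Bins are [lower, upper); bisect_right finds the last edge <= x.
        # min() puts x == max into the last bin, which is closed on both ends.
        counts[min(bisect_right(edges, x) - 1, k - 1)] += 1
    rows, cumulative = [], 0
    for i, count in enumerate(counts):
        cumulative += count
        rows.append((edges[i], edges[i + 1], count, count / n, cumulative))
    return rows


for row in frequency_table([1, 2, 2, 3, 4, 5, 6, 7, 8, 10], k=3):
    print(row)
```

For these ten values with k = 3, the bins [1, 4), [4, 7) and [7, 10] hold 4, 3 and 3 values, with cumulative counts 4, 7 and 10; the value 10 lands in the last bin because that bin is closed on both ends.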
Reading the descriptive statistics
The calculator outputs the standard summary statistics alongside the frequency table:
- Mean: the arithmetic average. Sensitive to outliers.
- Median: the middle value when sorted. Resistant to outliers. If the mean is well above the median, the distribution is likely right-skewed.
- Standard deviation (s): the typical distance of values from the mean, computed with the sample formula, which divides the sum of squared deviations by n - 1.
- Variance (s^2): the square of the standard deviation.
- IQR: the interquartile range, Q3 minus Q1, covering the middle 50% of the data. A robust spread measure.
- Skewness: positive values mean a longer right tail, negative a longer left tail. As a rule of thumb, values beyond +/-0.5 indicate meaningful skew.
- Excess kurtosis: positive values indicate heavier tails than a normal distribution, negative values indicate lighter tails. Normal distributions have excess kurtosis = 0.
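The statistics in this list can be computed with Python's standard `statistics` module plus a few moment sums. This is a sketch, not the calculator's exact method: the name `describe` is hypothetical, quartiles use `statistics.quantiles` with its default method (quartile conventions differ between tools, which changes the IQR), and skewness and excess kurtosis use the plain moment formulas g1 = m3 / m2^(3/2) and g2 = m4 / m2^2 - 3. Some tools apply small-sample corrections to these two, so their values can differ slightly. The function assumes at least two distinct values.

```python
import math
import statistics


def describe(data: list[float]) -> dict[str, float]:
    n = len(data)
    mean = statistics.fmean(data)
    variance = statistics.variance(data)         # sample variance, divides by n - 1
    q1, _, q3 = statistics.quantiles(data, n=4)  # assumption: "exclusive" quartiles
    # Central moments (divide by n) used for the shape statistics.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        "mean": mean,
        "median": statistics.median(data),
        "sd": math.sqrt(variance),
        "variance": variance,
        "iqr": q3 - q1,
        "skewness": m3 / m2 ** 1.5,           # moment coefficient g1
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # g2; 0 for a normal distribution
    }


print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```

For this sample the mean is 5, the median 4.5, the skewness 0.65625 (a longer right tail, consistent with the mean sitting above the median) and the excess kurtosis -0.21875 (slightly lighter tails than a normal distribution).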
Interpreting the histogram shape
A symmetric, bell-shaped histogram suggests a normal distribution. A histogram with a long right tail and a cluster of values on the left is right-skewed (positive skew); income distributions are a classic example. A left-skewed histogram has most values bunched at the high end with a long tail toward lower values, common in test scores near a ceiling. A bimodal histogram with two peaks may indicate that the sample is actually a mixture of two groups. Gaps between bars suggest outliers or sparse data in that range. When comparing two groups, overlapping histograms are more informative than separate summary statistics alone.
Binning rule comparison
| Rule | Formula (k = bin count, h = bin width) | Best for | Limitation |
|---|---|---|---|
| Sturges' | k = ceil(log2 n) + 1 | General use, n < 200 | Underestimates bins for large samples |
| Scott's | h = 3.49 s / n^(1/3) | Approximately normal data | Poorly suited to heavy-tailed data |
| Freedman-Diaconis | h = 2 IQR / n^(1/3) | Skewed or heavy-tailed data | Needs n >= 20 for reliable IQR |
| Square root | k = ceil(sqrt(n)) | Quick rough estimate | No statistical basis |
Common automatic bin-count rules and when each works best.
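Sturges' rule and the square-root rule depend only on n, so their growth can be compared directly. A minimal sketch with hypothetical function names:

```python
import math


def sturges_k(n: int) -> int:
    # Sturges: k = ceil(log2(n)) + 1
    return math.ceil(math.log2(n)) + 1


def sqrt_k(n: int) -> int:
    # Square-root rule: k = ceil(sqrt(n)), computed exactly with integers.
    r = math.isqrt(n)
    return r if r * r == n else r + 1


for n in (20, 100, 1000, 10000):
    print(n, sturges_k(n), sqrt_k(n))
```

This prints 6 and 5 bins for n = 20, 8 and 10 for n = 100, 11 and 32 for n = 1,000, and 15 and 100 for n = 10,000. Because Sturges grows with log2 n, it adds exactly one bin each time the sample size doubles, which is why it underestimates the bin count for large samples.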
Frequently asked questions
How many bins should a histogram have?
There is no single correct answer. Sturges' rule (k = ceil(log2(n)) + 1) is a reasonable default for samples under 200. Scott's rule and the Freedman-Diaconis rule set the bin width from the spread of the data and typically give more bins than Sturges for large samples; Freedman-Diaconis is the better of the two for skewed or heavy-tailed data. Too few bins hide the shape; too many bins create noise. Try the automatic rules first, then use the custom option to fine-tune.
What is the difference between a histogram and a bar chart?
A bar chart displays counts or values for distinct categories, with gaps between the bars. A histogram displays a frequency distribution for a continuous numeric variable, with no gaps between bars, because adjacent bars represent adjacent intervals of the same scale. Bar charts are for categorical data; histograms are for quantitative data.
What does a right-skewed histogram mean?
A right-skewed (positive-skew) histogram has most values clustered on the left with a long tail extending to the right. A few unusually large values typically pull the mean above the median. Salary, wealth, house prices, and waiting times often show right skew.
What does skewness of 0 mean?
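A perfectly symmetric distribution has a skewness of exactly 0, but the converse does not always hold: some asymmetric distributions also have zero skewness, so the histogram shape is worth checking too. In practice, values between -0.5 and +0.5 are treated as approximately symmetric. A skewness above +1 or below -1 indicates substantial skew.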
What is excess kurtosis and why does it matter?
Excess kurtosis measures tail heaviness relative to a normal distribution. A value of 0 means normal tails. Positive excess kurtosis (leptokurtic) means the distribution has heavier tails and more outliers than a normal distribution; financial returns and extreme weather events often show this. Negative excess kurtosis (platykurtic) means lighter tails and fewer outliers.
How does Freedman-Diaconis differ from Sturges?
Sturges' rule uses only the sample size, giving relatively few bins. It assumes normality and understates the optimal bin count for large or skewed samples. Freedman-Diaconis sets the bin width to 2 IQR / n^(1/3), so it adapts to the actual spread of the data. Because outliers stretch the range without widening the IQR, it produces more bins when the data is heavy-tailed or has outliers.
Can I use this for a continuous or discrete variable?
Yes to both. Histograms work for any numeric data. For discrete data with many possible values (like test scores 0-100), treat it as continuous and let the binning rule pick the class width. For discrete data with very few values (like dice rolls 1-6), a bar chart may be clearer than a histogram.