Descriptive Statistics Explained: Summarizing Data with Measures of Center and Spread
Measures of Central Tendency
Central tendency describes where the middle or center of a dataset falls. Three main measures serve this purpose, each with distinct strengths depending on the nature of your data.
The arithmetic mean is the sum of all values divided by the number of observations. For a dataset of exam scores (72, 85, 91, 68, 79), the mean is (72 + 85 + 91 + 68 + 79) / 5 = 79. The mean uses every data point in its calculation, making it the most informative measure for symmetric distributions. However, this sensitivity becomes a weakness when outliers are present. Adding a single score of 12 to that dataset drops the mean to 67.8, even though five of six students scored between 68 and 91. In income data, where a few extremely high earners skew the distribution rightward, the mean overstates what most people actually earn.
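The calculation above, including the effect of the single outlying score of 12, can be checked with a short Python sketch. The helper name `arithmetic_mean` is a hypothetical choice. Python's built-in `statistics.mean` gives the same values.

```python
def arithmetic_mean(values):
    """Sum of all values divided by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)

scores = [72, 85, 91, 68, 79]
with_outlier = scores + [12]  # add one extreme low score

print(arithmetic_mean(scores))                  # 79.0
print(round(arithmetic_mean(with_outlier), 1))  # 67.8
```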
The median is the middle value when observations are arranged in order. For an odd number of values, it is the center observation. For an even number, it is the average of the two center values. The median of (68, 72, 79, 85, 91) is 79. Because the median depends only on position rather than magnitude, it resists the pull of extreme values. Median household income is a more representative measure of typical earnings than mean household income precisely because it ignores the distortion created by millionaires and billionaires at the top of the distribution.
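Here is a minimal Python sketch of the position-based rule. The helper `median` is a hypothetical name that mirrors `statistics.median`. Applied to the same exam scores, it shows the median moving only from 79 to 75.5 when the score of 12 is added. The mean falls to 67.8 for the same change.

```python
def median(values):
    """Middle value of the sorted data; the average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([72, 85, 91, 68, 79]))      # 79
print(median([72, 85, 91, 68, 79, 12]))  # 75.5
```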
The mode is the most frequently occurring value. In the dataset (3, 5, 5, 7, 8, 5, 9), the mode is 5 because it appears three times. A dataset can have multiple modes (bimodal, multimodal) or no mode at all if all values are unique. The mode is the only measure of central tendency that works for categorical (non-numeric) data. If you survey favorite colors and get (blue, red, blue, green, blue, red), the mode is blue. For continuous data, the mode is less useful unless you group values into bins first.
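The sketch below uses the standard library's `collections.Counter` and returns every value tied for the highest count. This lets it handle bimodal data and categorical labels alike. The helper name `modes` is hypothetical. It returns an empty list when all values are unique, which is an assumed convention matching the "no mode" case described above. By contrast, `statistics.multimode` (Python 3.8+) returns every value in that case.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency.

    Assumed convention: if every value occurs exactly once, the dataset
    has no mode and an empty list is returned.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []  # all values unique: no mode
    return [value for value, count in counts.items() if count == top]

print(modes([3, 5, 5, 7, 8, 5, 9]))                           # [5]
print(modes(["blue", "red", "blue", "green", "blue", "red"]))  # ['blue']
print(modes([1, 1, 2, 2, 3]))                                  # [1, 2] (bimodal)
print(modes([4, 6, 9]))                                        # []
```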
Measures of Variability
Central tendency alone tells an incomplete story. Two datasets can have identical means but vastly different spreads. The test scores (78, 79, 80, 81, 82) and (40, 60, 80, 100, 120) both have a mean of 80, but the second group shows far more variation in student performance. Measures of variability quantify this spread.
The range is the simplest measure: maximum value minus minimum value. For (78, 79, 80, 81, 82), the range is 4. For (40, 60, 80, 100, 120), the range is 80. While easy to calculate, the range depends entirely on the two most extreme observations and ignores everything in between. A single outlier can make the range misleadingly large.
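Computing both test-score datasets side by side makes the point concrete: the means are identical, but the ranges are very different. The helper name `data_range` is a hypothetical choice made to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    """Maximum minus minimum: depends only on the two most extreme values."""
    return max(values) - min(values)

tight = [78, 79, 80, 81, 82]
spread = [40, 60, 80, 100, 120]

print(sum(tight) / len(tight), sum(spread) / len(spread))  # 80.0 80.0
print(data_range(tight), data_range(spread))               # 4 80
```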
The interquartile range (IQR) improves on the range by focusing on the middle 50% of the data. It equals the third quartile (Q3, the 75th percentile) minus the first quartile (Q1, the 25th percentile). The IQR ignores the tails of the distribution, making it robust against outliers. Values falling more than 1.5 times the IQR below Q1 or above Q3 are conventionally flagged as potential outliers.
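Quartile definitions vary between textbooks and software packages. This sketch assumes `statistics.quantiles(..., method="inclusive")` (Python 3.8+), which interpolates linearly between sorted values. The helper name `iqr_outliers` is hypothetical. For the six exam scores that include the 12, the fences are 47.25 and 105.25, so 12 is flagged. With the module's default `"exclusive"` method, Q1 drops to 54 and the same score is not flagged. This is a good reason to state which quartile convention you used.

```python
import statistics

def iqr_outliers(values, k=1.5, method="inclusive"):
    """Return (q1, q3, iqr, outliers) using the k * IQR fence rule.

    `method` is passed to statistics.quantiles; other quartile
    conventions can shift Q1 and Q3 and change which points are flagged.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method=method)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return q1, q3, iqr, outliers

scores = [72, 85, 91, 68, 79, 12]
print(iqr_outliers(scores))                         # (69.0, 83.5, 14.5, [12])
print(iqr_outliers(scores, method="exclusive")[3])  # []
```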
Variance measures the average squared deviation from the mean. For each observation, you calculate how far it falls from the mean, square that distance (to eliminate negative signs), and then average the squared deviations. Population variance divides by N (the total count), while sample variance divides by N-1 (a correction called Bessel's correction that produces an unbiased estimate of population variance from sample data). The units of variance are squared, which makes direct interpretation awkward. If your data is in meters, variance is in square meters.
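The two divisors can be compared directly on the spread-out test scores from earlier. Their deviations from the mean of 80 are -40, -20, 0, 20, and 40, and the squared deviations sum to 4000. The sketch below uses a hypothetical helper named `variance`. The standard library's `statistics.pvariance` and `statistics.variance` compute the population and sample versions, respectively.

```python
def variance(values, sample=True):
    """Average squared deviation from the mean.

    sample=True divides by N - 1 (Bessel's correction);
    sample=False divides by N (population variance).
    """
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough observations")
    mean = sum(values) / n
    squared_devs = [(x - mean) ** 2 for x in values]
    return sum(squared_devs) / (n - 1 if sample else n)

scores = [40, 60, 80, 100, 120]
print(variance(scores, sample=False))  # 800.0  (4000 / 5)
print(variance(scores))                # 1000.0 (4000 / 4)
```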
Standard deviation solves the units problem by taking the square root of variance, returning the measure to the original units of the data. A standard deviation of 10 on an exam scored in points means that a typical observation lies roughly 10 points from the mean (strictly, 10 is the root-mean-square deviation). For normally distributed data, approximately 68% of observations fall within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three. This 68-95-99.7 rule provides a quick intuition for how extreme any particular observation is.
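The 68-95-99.7 percentages come from the normal distribution. For a normal variable, the share within k standard deviations of the mean is erf(k / √2), and this can be computed with `math.erf`. The sketch also converts a score into a z-score, which counts how many standard deviations it lies from the mean. The helper names `share_within` and `z_score` are hypothetical, and the sketch uses the sample standard deviation.

```python
import math
import statistics

def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

def z_score(x, values):
    """Number of (sample) standard deviations between x and the mean of values."""
    return (x - statistics.mean(values)) / statistics.stdev(values)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973

scores = [40, 60, 80, 100, 120]
print(round(statistics.stdev(scores), 2))  # 31.62, the square root of the sample variance 1000
print(round(z_score(120, scores), 2))      # 1.26
```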
Distribution Shape
Beyond center and spread, the shape of a distribution reveals important characteristics of the data. Three properties describe shape: symmetry, skewness, and kurtosis.
A symmetric distribution looks the same on both sides of its center. The normal distribution is the classic example, with the mean, median, and mode all coinciding at the center. When data is symmetric, the mean and median coincide, and the mean is usually preferred because it uses every observation. That advantage weakens when a symmetric distribution has very heavy tails, because occasional extreme values can still make the median the more stable choice.
Skewness measures the degree of asymmetry. A right-skewed (positively skewed) distribution has a long tail extending to the right, and the mean is usually pulled above the median. Income distributions are typically right-skewed because most people earn moderate amounts while a small number earn enormously more. A left-skewed (negatively skewed) distribution has a long tail to the left, and the mean is usually pulled below the median. Age at retirement in stable industries can be left-skewed because most people retire around 65 while some retire much earlier due to disability or early retirement programs.
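One common way to put a number on asymmetry is the moment coefficient of skewness, m3 / m2^1.5, where m2 and m3 are the second and third central moments. The sketch below applies it to a small hypothetical dataset with one large value on the right, and then to its mirror image. The helper name `skewness` is hypothetical, and the formula is the uncorrected population form. Libraries such as SciPy also offer bias-corrected variants.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

right_tail = [1, 2, 2, 3, 3, 3, 4, 10]  # hypothetical data: one large value on the right
left_tail = [-x for x in right_tail]     # mirror image: long tail on the left

print(sum(right_tail) / len(right_tail), statistics.median(right_tail))  # 3.5 3.0
print(skewness(right_tail) > 0, skewness(left_tail) < 0)                # True True
```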
Kurtosis describes how heavy or light the tails are compared to a normal distribution, which has a kurtosis of 3 (often reported as an excess kurtosis of 0). High kurtosis (leptokurtic) means heavier tails, so extreme values occur more often than a normal distribution would predict. Such distributions often look sharply peaked as well, but the statistic is driven mainly by the tails, not the peak. Low kurtosis (platykurtic) means lighter tails, as in a uniform distribution, which typically looks flatter. Financial return data often exhibits high kurtosis, meaning market crashes and booms occur more frequently than normal distribution models would suggest.
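Excess kurtosis is commonly computed as m4 / m2² - 3, so that a normal distribution scores about 0. The sketch below compares two hypothetical datasets. The first has evenly spread values, like a uniform distribution, and should have negative excess kurtosis. The second is mostly central values plus two extremes, and should have positive excess kurtosis. The helper names are hypothetical, and no small-sample correction is applied.

```python
def central_moment(values, order):
    """Average of (x - mean) ** order over the data."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    m2 = central_moment(values, 2)
    return central_moment(values, 4) / m2 ** 2 - 3

light_tails = list(range(1, 11))   # evenly spread, like a uniform distribution
heavy_tails = [0] * 8 + [-10, 10]  # mostly central values plus two extremes

print(round(excess_kurtosis(light_tails), 2))  # -1.22
print(round(excess_kurtosis(heavy_tails), 2))  # 2.0
```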
Data Visualization
Visualizations transform abstract numbers into patterns that the human visual system processes quickly. Different plot types serve different purposes.
Histograms divide the range of data into equal-width bins and display the frequency (count) or density of observations in each bin as bars. The shape of the histogram immediately reveals whether data is symmetric, skewed, bimodal, or uniform. Choosing the right number of bins matters: too few bins hide important structure, while too many bins create noisy, uninterpretable displays. Rules of thumb provide starting points. Sturges' rule sets the number of bins to k = 1 + log2(N), equivalently 1 + 3.322 log10(N), usually rounded up. The Freedman-Diaconis rule sets the bin width to 2 × IQR × N^(-1/3).
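Both bin rules, together with an equal-width bin count, can be sketched in a few lines of Python. This assumes rounding up to a whole number of bins and inclusive-method quartiles for the IQR. The function names are hypothetical, and the data (0 to 99) is purely illustrative. NumPy's `histogram_bin_edges` offers built-in `"sturges"` and `"fd"` estimators if you prefer a library implementation.

```python
import math
import statistics

def sturges_bins(n):
    """Sturges' rule: k = 1 + log2(n), rounded up to a whole number of bins."""
    return math.ceil(1 + math.log2(n))

def freedman_diaconis_bins(values):
    """Freedman-Diaconis rule: width h = 2 * IQR * n**(-1/3); bins = range / h, rounded up."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    width = 2 * (q3 - q1) * len(values) ** (-1 / 3)
    if width == 0:
        return 1  # degenerate IQR: fall back to a single bin
    return math.ceil((max(values) - min(values)) / width)

def histogram_counts(values, k):
    """Count observations in k equal-width bins spanning min to max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    counts = [0] * k
    for v in values:
        index = int((v - lo) / width) if width else 0
        counts[min(index, k - 1)] += 1  # the maximum value belongs in the last bin
    return counts

data = list(range(100))  # hypothetical, evenly spread values 0..99
print(sturges_bins(len(data)))       # 8
print(freedman_diaconis_bins(data))  # 5
print(histogram_counts(data, 5))     # [20, 20, 20, 20, 20]
```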
Box plots (box-and-whisker plots) are built on the five-number summary: minimum, Q1, median, Q3, and maximum. The box spans from Q1 to Q3 (the IQR), with a line at the median. In the common Tukey-style box plot, the whiskers extend to the most extreme values within 1.5 IQR of the box rather than to the true minimum and maximum, and individual points beyond the whiskers are plotted as outliers. Box plots excel at comparing distributions across groups side by side, making differences in center, spread, and outlier patterns immediately visible.
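The numbers behind a Tukey-style box plot can be computed directly. For the exam scores that include the 12, the lower whisker stops at 68 while the minimum is 12, which is drawn as an outlier point. The sketch assumes inclusive-method quartiles, and other quartile conventions can give a slightly different box. The helper name `box_plot_stats` is hypothetical.

```python
import statistics

def box_plot_stats(values, k=1.5):
    """Five-number summary plus Tukey-style whisker ends and outliers.

    Quartiles use statistics.quantiles(method="inclusive"); other quartile
    conventions can give slightly different boxes.
    """
    data = sorted(values)
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "min": data[0], "q1": q1, "median": med, "q3": q3, "max": data[-1],
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

stats = box_plot_stats([72, 85, 91, 68, 79, 12])
print(stats["min"], stats["whisker_low"], stats["outliers"])  # 12 68 [12]
```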
Scatter plots display the relationship between two continuous variables by plotting each observation as a point in two-dimensional space. Patterns in the scatter reveal whether the relationship is linear or curved, positive or negative, strong or weak. Adding a trend line (regression line) quantifies the relationship. Scatter plots also expose outliers and clusters that might indicate subgroups within the data.
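A trend line on a scatter plot is usually the ordinary least-squares line. It is often reported together with the Pearson correlation coefficient, which measures the strength and direction of a linear relationship. The sketch below computes both for five hypothetical (x, y) pairs, and the helper names are hypothetical. In Python 3.10+, `statistics.linear_regression` and `statistics.correlation` compute the same quantities.

```python
import math

def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def pearson_r(xs, ys):
    """Pearson correlation: strength and direction of a linear relationship, from -1 to 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]  # hypothetical paired observations
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(xs, ys)
print(round(slope, 2), round(intercept, 2), round(pearson_r(xs, ys), 3))  # 0.6 2.2 0.775
```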
Practical Considerations
Descriptive statistics form the first step in any data analysis project. Before running complex models or significance tests, you should always examine your data descriptively. This serves multiple purposes: it reveals data entry errors (a human height recorded as 1800 cm rather than 180 cm), identifies unexpected patterns (bimodal distributions suggesting two distinct populations mixed together), and verifies that assumptions required by later analyses are at least approximately met.
When reporting descriptive statistics, include both a measure of center and a measure of spread. Reporting a mean without a standard deviation leaves the reader unable to judge whether individual observations are tightly clustered or wildly scattered. For skewed data, report the median and IQR rather than the mean and standard deviation, since the latter pair can be misleading when the distribution is asymmetric. Always complement numerical summaries with appropriate visualizations that allow readers to see the full distribution rather than relying solely on a few summary numbers.
Descriptive statistics compress raw data into interpretable summaries. Choose the mean and standard deviation for symmetric data, the median and IQR for skewed data, and always accompany numbers with visualizations that reveal the full shape of your distribution.