What Is Effect Size: Measuring the Magnitude of Statistical Results

Updated June 2026
Effect size is a quantitative measure of the magnitude of a phenomenon, expressing how large a difference, relationship, or effect actually is, independent of sample size. While p-values tell you whether an effect is statistically distinguishable from zero, effect sizes tell you whether it matters in practice. A statistically significant result with a tiny effect size may be trivial, while a large effect that narrowly misses significance may represent an important finding that the study was simply underpowered to confirm.

Why Effect Size Matters

Statistical significance depends on both the true effect and the sample size. With a large enough sample, even trivially small effects become statistically significant. A study of 1 million people might find that drinking green tea is associated with a 0.1% reduction in headache frequency (p < 0.001), a result that, while statistically significant, has no practical value for individual health decisions. Conversely, a study with 20 participants might miss a genuinely important treatment effect simply because sampling variability overwhelms the signal. Effect sizes separate the question of existence (is there any effect at all?) from the question of importance (is the effect large enough to care about?).
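
The dependence of significance on sample size can be illustrated with a minimal Python sketch. It assumes a two-group comparison with equal group sizes and uses a normal (z) approximation rather than an exact t-test. The function name two_sided_p_for_d is hypothetical, chosen only for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_for_d(d: float, n_per_group: int) -> float:
    """Approximate two-sided p-value for a standardized mean difference d
    observed with n_per_group people in each of two groups.
    Assumption: normal approximation, equal group sizes."""
    z = d * sqrt(n_per_group / 2)  # d divided by its approximate standard error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same trivially small effect (d = 0.01), increasingly large samples
for n in (20, 2_000, 200_000, 2_000_000):
    print(n, two_sided_p_for_d(0.01, n))

# A medium effect (d = 0.5) with only 20 people per group
print(two_sided_p_for_d(0.5, 20))
```

Under these assumptions, the tiny effect is nowhere near significance at 20 per group but becomes highly significant at 2 million per group, while the medium effect with 20 per group gives a p-value of about 0.11 and would be missed at the 0.05 level.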

Effect sizes also enable comparison across studies that use different scales, different sample sizes, and different statistical tests. A psychology study reporting a t-test result cannot be directly compared to another study reporting an F-test, but both can be converted to a common effect size metric. This standardization makes meta-analysis possible, allowing researchers to pool findings across dozens of studies to estimate the overall magnitude of an effect with precision that no single study could achieve.

Power analysis, which determines how many participants a study needs, requires an expected effect size as input. Without knowing the likely magnitude of the effect you are studying, you cannot calculate whether your study has adequate statistical power to detect it. Researchers who skip effect size estimation often end up with underpowered studies that waste resources by producing inconclusive results.
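
As a rough sketch of how an expected effect size feeds into a power analysis, the Python function below estimates the sample size per group for a two-group comparison. It assumes equal group sizes, a two-sided test, and a normal approximation, so dedicated power software based on the t distribution will give slightly larger numbers. The name n_per_group is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed in EACH group to detect a
    standardized mean difference d (two-sided test, equal group sizes).
    Assumption: normal approximation, n = 2 * ((z_alpha + z_power) / d)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

With the defaults, a medium effect of d = 0.5 requires 63 participants per group under this approximation. Halving the expected effect size roughly quadruples the required sample, which is why an overly optimistic effect size estimate so often leads to an underpowered study.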

Cohen's d: Standardized Mean Difference

Cohen's d is the most widely used effect size for comparing two group means. It expresses the difference between two group means in standard deviation units: d = (mean1 - mean2) / pooled SD. It answers the question: how far apart are the group averages, measured in terms of the typical variation within groups? A Cohen's d of 1.0 means the two groups differ by one full standard deviation. Assuming both groups are approximately normally distributed with similar spread, this is roughly equivalent to saying the average person in the higher group scores better than 84% of the lower group.
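
The formula can be sketched in a few lines of Python using only the standard library. The function name cohens_d and the sample data are hypothetical, and the pooled standard deviation here uses the usual sample variances with n - 1 in the denominator.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def cohens_d(group1, group2) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    # Weighted average of the two sample variances (weights are degrees of freedom)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)

treatment = [6, 7, 8, 9, 10]  # illustrative data only
control = [4, 5, 6, 7, 8]
print(cohens_d(treatment, control))

# Share of the lower group that falls below the higher group's mean when d = 1.0,
# assuming normal distributions with equal spread
print(NormalDist().cdf(1.0))
```

The last line reproduces the 84% figure: under normality, the standard normal cumulative probability at 1.0 is about 0.84.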

Conventional benchmarks from Jacob Cohen's 1988 guidelines provide rough interpretation: d = 0.2 is small (the difference is noticeable only with careful measurement), d = 0.5 is medium (the difference is visible to attentive observation), and d = 0.8 is large (a substantial, obvious difference). These benchmarks are rough guides, not rigid thresholds. Context matters enormously. In education, a d = 0.3 improvement in reading scores might represent months of additional learning gains and justify a nationwide policy change costing millions. In pharmacology, a d = 0.8 might still be insufficient if the treatment causes serious side effects that offset its benefits.

Variants of Cohen's d include Hedges' g, which applies a correction for small sample sizes (Cohen's d slightly overestimates the population effect in small samples), and Glass's delta, which uses only the control group's standard deviation in the denominator (useful when the treatment is expected to change variability as well as the mean). For paired or within-subjects designs, Cohen's dz uses the standard deviation of the difference scores rather than the pooled standard deviation of the two conditions.
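
The three variants can be sketched as follows. The function names are hypothetical, and the Hedges' g correction uses the common approximation J = 1 - 3 / (4 * df - 1) rather than the exact gamma-function formula.

```python
from statistics import mean, stdev

def hedges_g(d: float, n1: int, n2: int) -> float:
    """Cohen's d shrunk by an approximate small-sample correction factor."""
    df = n1 + n2 - 2
    correction = 1 - 3 / (4 * df - 1)  # approximation, close to 1 for large samples
    return d * correction

def glass_delta(treatment, control) -> float:
    """Mean difference scaled by the control group's standard deviation only."""
    return (mean(treatment) - mean(control)) / stdev(control)

def cohens_dz(before, after) -> float:
    """Paired-design effect size: mean difference over SD of the differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / stdev(diffs)

print(hedges_g(0.5, 10, 10))                            # slightly below 0.5
print(glass_delta([6, 7, 8, 9, 10], [4, 5, 6, 7, 8]))   # illustrative data
print(cohens_dz([10, 12, 11, 13], [12, 13, 14, 15]))    # illustrative paired data
```

Because dz is scaled by the variability of the difference scores, it is not directly comparable to a between-groups d computed on the same data.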

Correlation Coefficients as Effect Sizes

Pearson's r directly serves as an effect size measure for the strength of a linear relationship between two continuous variables. Unlike Cohen's d, which measures group differences, r captures the degree to which two variables move together. The sign indicates direction (positive means both increase together, negative means one increases as the other decreases), and the magnitude indicates strength.

The square of the correlation, r-squared, gives the proportion of variance in one variable explained by the other, providing an intuitive interpretation of practical importance. An r of 0.30 means the two variables share 9% of their variance, leaving 91% unexplained. Benchmarks: r = 0.10 (small, explains 1% of variance), r = 0.30 (medium, explains 9%), r = 0.50 (large, explains 25%). Spearman's rho provides a rank-based alternative when the relationship is monotonic but not necessarily linear, or when data contain outliers that distort Pearson's r.
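
For illustration, Pearson's r and the shared variance can be computed directly from the definition. This is a minimal sketch with a hypothetical function name and made-up data; in practice a statistics library would normally be used.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y) -> float:
    """Pearson correlation computed from sums of squares and cross-products."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]        # illustrative data only
scores = [52, 55, 61, 60, 68]
r = pearson_r(hours, scores)
print(r, r ** 2)               # correlation and proportion of shared variance

# Shared variance at the conventional benchmarks
for benchmark in (0.10, 0.30, 0.50):
    print(benchmark, benchmark ** 2)
```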

Even "small" correlations can have important practical implications in certain contexts. A correlation of 0.10 between a hiring test and job performance might seem trivial, but when applied across thousands of hiring decisions per year, it can produce substantial cumulative benefits in workforce quality. Context, not arbitrary benchmarks, determines whether an effect size is meaningful.

Odds Ratios and Risk Ratios

For binary outcomes (yes/no, success/failure, disease/healthy), odds ratios and risk ratios quantify the strength of association between a factor and an outcome. The odds ratio compares the odds of an event in one group to the odds in another. An odds ratio of 2.0 means the event has twice the odds in the exposed group compared to the unexposed group. An odds ratio of 1.0 indicates no association. Values below 1.0 indicate reduced odds in the exposed group.

The risk ratio (relative risk) compares probabilities directly rather than odds. An intervention that reduces disease risk from 10% to 5% has a risk ratio of 0.5, meaning the intervention cuts the risk in half. Risk ratios are more intuitive than odds ratios but can only be calculated from designs such as cohort studies and randomized trials, where the probability of the outcome in each group can be estimated. Case-control studies can only estimate odds ratios because the proportion of cases in the study is fixed artificially by the sampling design rather than reflecting disease prevalence.

The absolute risk reduction complements relative measures by expressing the actual difference in probabilities. A treatment that reduces mortality from 2% to 1% has a relative risk reduction of 50%, which sounds impressive, but an absolute risk reduction of only 1 percentage point. The number needed to treat (NNT), which is 1 divided by the absolute risk reduction (here 1 / 0.01 = 100), indicates that 100 people must be treated for one additional person to benefit. Relative and absolute measures together provide the full picture of clinical importance.
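
The relative and absolute measures discussed in this section can all be derived from the same four counts. The sketch below uses a hypothetical function name and made-up trial counts that mirror the 2% versus 1% mortality example. It assumes both risks lie strictly between 0 and 1 and that they differ, otherwise the divisions are undefined.

```python
def risk_measures(events_treated, n_treated, events_control, n_control):
    """Effect sizes for a binary outcome from a 2x2 table of counts.
    Assumption: risks are strictly between 0 and 1 and not equal."""
    risk_t = events_treated / n_treated
    risk_c = events_control / n_control
    odds_t = risk_t / (1 - risk_t)
    odds_c = risk_c / (1 - risk_c)
    arr = risk_c - risk_t  # absolute risk reduction
    return {
        "risk_ratio": risk_t / risk_c,
        "odds_ratio": odds_t / odds_c,
        "absolute_risk_reduction": arr,
        "relative_risk_reduction": arr / risk_c,
        "number_needed_to_treat": 1 / arr,
    }

# Illustrative counts: 10 deaths among 1000 treated, 20 among 1000 controls
for name, value in risk_measures(10, 1000, 20, 1000).items():
    print(name, value)
```

When the outcome is rare, as it is here, the odds ratio and risk ratio are close to each other. They diverge as the outcome becomes more common.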

Eta-Squared and Partial Eta-Squared

For ANOVA designs, eta-squared represents the proportion of total variance in the dependent variable explained by the grouping factor. If eta-squared = 0.10, the factor accounts for 10% of individual differences in the outcome. Benchmarks following Cohen's guidelines: 0.01 (small), 0.06 (medium), 0.14 (large). These correspond roughly to group differences of d = 0.2, 0.5, and 0.8 in the two-group case.

Partial eta-squared is used in factorial designs with multiple factors. It isolates the variance explained by one factor after removing variance explained by other factors and their interactions. Partial eta-squared is never smaller than eta-squared for the same factor, and it is larger whenever other factors or interactions explain some of the variance, because its denominator excludes that variance. In a one-way design the two measures are identical. When reporting ANOVA results, specify which measure you are using, because the two are not interchangeable and can give very different impressions of effect magnitude.

Omega-squared provides a less biased estimate of population effect size than eta-squared, which tends to overestimate the true effect, particularly in small samples. The bias occurs because eta-squared is calculated from sample sums of squares that include sampling error in both the between-group and total variance components. Omega-squared reduces this bias by subtracting the expected contribution of error variance from the between-group sum of squares, which yields a smaller estimate that is less biased, though not perfectly unbiased.
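
For a one-way between-subjects ANOVA, these measures can be sketched from the sums of squares. The function names and data are hypothetical, and the omega-squared formula shown is the standard one for a one-way design; factorial and repeated-measures designs need different formulas.

```python
from statistics import mean

def one_way_effect_sizes(groups):
    """Return (eta_squared, omega_squared) for a one-way between-subjects design."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ss_total = ss_between + ss_within
    df_between = len(groups) - 1
    ms_within = ss_within / (len(all_values) - len(groups))
    eta_sq = ss_between / ss_total
    # Remove the error variance expected in ss_between purely by chance
    omega_sq = (ss_between - df_between * ms_within) / (ss_total + ms_within)
    return eta_sq, omega_sq

def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Variance explained by one effect relative to that effect plus error."""
    return ss_effect / (ss_effect + ss_error)

groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]  # illustrative data only
eta_sq, omega_sq = one_way_effect_sizes(groups)
print(eta_sq, omega_sq)
```

In this small made-up example omega-squared comes out below eta-squared, which is the direction of the correction described above.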

Converting Between Effect Size Measures

Different effect size measures can be converted into each other using established formulas. Cohen's d can be converted to r using the formula r = d / sqrt(d-squared + 4), and r can be converted back to d using d = 2r / sqrt(1 - r-squared). These conversions allow meta-analysts to combine studies that report different statistics into a common metric. Odds ratios can be converted to d using d = ln(OR) * sqrt(3) / pi, where ln is the natural logarithm.

Converting between metrics is useful but carries assumptions. The d-to-r conversion assumes equal group sizes. The odds ratio conversion assumes that the underlying continuous outcome follows a logistic distribution in each group. When these assumptions are violated, the conversions are approximate rather than exact. Despite these limitations, an approximate conversion is generally preferable to excluding a study from a meta-analysis merely because it reports its effect in a different metric.
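
The three conversion formulas translate directly into code. This is a minimal sketch with hypothetical function names; the same assumptions apply (equal group sizes for the d and r conversions, logistic distributions for the odds ratio conversion).

```python
from math import log, pi, sqrt

def d_to_r(d: float) -> float:
    """Convert Cohen's d to r, assuming equal group sizes."""
    return d / sqrt(d ** 2 + 4)

def r_to_d(r: float) -> float:
    """Convert r to Cohen's d, assuming equal group sizes and |r| < 1."""
    return 2 * r / sqrt(1 - r ** 2)

def odds_ratio_to_d(odds_ratio: float) -> float:
    """Convert an odds ratio to d, assuming logistic distributions."""
    return log(odds_ratio) * sqrt(3) / pi

print(d_to_r(0.5))           # r equivalent of a medium d
print(r_to_d(d_to_r(0.5)))   # round trip recovers d
print(odds_ratio_to_d(2.0))  # d equivalent of a doubling of the odds
```

Note that the benchmark labels do not line up exactly across metrics: a "medium" d of 0.5 converts to an r of about 0.24, which falls below the "medium" benchmark of 0.30 for correlations.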

Reporting Effect Sizes

Every statistical result should include an appropriate effect size alongside the test statistic and p-value. The APA Publication Manual and most major journals require effect size reporting. Choose the measure that matches your analysis: Cohen's d for t-tests comparing two groups, eta-squared or omega-squared for ANOVA, r for correlations, odds ratios for logistic regression, and R-squared for multiple regression.

Include a confidence interval around the effect size to communicate its precision. A Cohen's d of 0.50 with a 95% CI of [0.10, 0.90] tells a very different story than d = 0.50 with CI [0.45, 0.55]. The first indicates substantial uncertainty about the true effect magnitude, while the second indicates precise estimation. This combination of point estimate and interval provides the most complete picture of what the data shows, superior to reporting either the p-value or the effect size alone.
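
One simple way to attach an interval to Cohen's d is a large-sample normal approximation to its standard error, sketched below. The function name is hypothetical, and more exact intervals based on the noncentral t distribution are preferable for small samples.

```python
from math import sqrt
from statistics import NormalDist

def d_confidence_interval(d: float, n1: int, n2: int, level: float = 0.95):
    """Approximate confidence interval for Cohen's d.
    Assumption: large-sample normal approximation to the sampling distribution."""
    se = sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return d - z * se, d + z * se

print(d_confidence_interval(0.5, 50, 50))      # wide interval
print(d_confidence_interval(0.5, 3200, 3200))  # narrow interval
```

Under this approximation, d = 0.50 with 50 participants per group gives an interval of roughly [0.10, 0.90], matching the imprecise case above, while narrowing the interval to about [0.45, 0.55] takes several thousand participants per group.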

Key Takeaway

Effect sizes measure how large an effect is, independent of sample size. They answer whether a result matters in practice, not just whether it exists statistically. Always report them alongside p-values, using Cohen's d for group comparisons, r for relationships, odds ratios for binary outcomes, and eta-squared for ANOVA. Include confidence intervals around effect sizes to communicate precision, and interpret magnitudes within the specific context of your research rather than relying solely on generic benchmarks.