How to Interpret Statistical Results: Reading Output Correctly
The following five-step framework provides a systematic approach to reading any statistical output, whether from a t-test, regression, ANOVA, or other analysis. Following these steps in order prevents the common mistake of jumping to the p-value while ignoring everything else.
Step 1: Check the Descriptive Statistics First
Before examining any inferential results, look at the basic summaries. What are the group means and standard deviations? Are the distributions approximately normal or heavily skewed? Are there obvious outliers? Do the numbers make sense given what you know about the subject matter? A mean reaction time of 3 seconds when you expected roughly 300 milliseconds most likely signals a data entry or unit-conversion error. Descriptive statistics ground your interpretation in the actual data before statistical machinery takes over.
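As a minimal sketch of this first pass, the following uses only Python's standard library to summarize one variable and flag candidate outliers. The function name `describe`, the 1.5 x IQR rule, and the reaction-time values are illustrative assumptions, not part of any particular package's output.

```python
import statistics

def describe(values):
    """Return basic descriptive statistics and candidate outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    # Tukey's rule of thumb: flag points more than 1.5 IQRs beyond the quartiles.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": median,
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

# Hypothetical reaction times in milliseconds; one value looks mistyped
# or recorded in the wrong unit.
reaction_ms = [290, 310, 305, 298, 3000, 301, 295, 312]
summary = describe(reaction_ms)
print(summary)
```

Here the mean is pulled far above the median by a single value, which is exactly the kind of discrepancy worth resolving before running any test.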
Check whether the sample sizes are adequate for the analysis being performed. Small samples (as a rough rule of thumb, fewer than about 20 per group) reduce power and make results sensitive to individual observations. Unequal group sizes in ANOVA or t-tests can distort Type I error rates when combined with unequal variances, inflating them when the smaller group has the larger variance. Missing data patterns (are values missing randomly or systematically?) affect the validity of conclusions.
Step 2: Read the Effect Size and Direction
The effect size tells you the magnitude and direction of your finding. A regression coefficient of 2.3 means a one-unit increase in the predictor is associated with a 2.3-unit increase in the outcome. A Cohen's d of 0.45 means the treatment group averaged about half a standard deviation higher than the control. Focus on what the effect means substantively: is this difference meaningful in the real-world context of your research? Would anyone change their behavior or decisions based on an effect of this size?
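For readers who want to see where a standardized effect size comes from, here is a small sketch of Cohen's d using the pooled standard deviation. The function name `cohens_d` and the two samples are made up for illustration.

```python
import math
import statistics

def cohens_d(treatment, control):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    v1, v2 = statistics.variance(treatment), statistics.variance(control)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd

# Hypothetical scores for two groups of five participants each.
treatment = [12, 14, 16, 18, 20]
control = [10, 12, 14, 16, 18]
d = cohens_d(treatment, control)
print(round(d, 2))
```

The sign of d follows the order of subtraction, so swapping the arguments flips it. Always check which group was subtracted from which before reading the direction.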
Direction matters as much as magnitude. A negative correlation between exercise and heart disease risk tells you that more exercise is associated with less risk. Misinterpreting the sign of a coefficient is a surprisingly common error, especially in regression models with multiple predictors, where the direction of a coefficient can reverse after adjusting for other variables (a phenomenon closely related to Simpson's paradox).
Step 3: Examine the Confidence Interval
The confidence interval shows you the range of effect sizes that are reasonably compatible with your data. A 95% CI of [0.5, 8.2] for a mean difference tells you that effects as small as 0.5 or as large as 8.2 remain plausible. A narrow interval indicates precise estimation. A wide interval indicates substantial uncertainty. If the interval includes zero (for differences) or one (for ratios), the effect is not statistically significant at the corresponding level (0.05 for a 95% interval).
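The link between an interval and its test can be made concrete. The sketch below back-calculates the point estimate, standard error, and two-sided p-value from a reported interval. It assumes the interval is symmetric and based on a normal approximation, which will not hold for every analysis (t-based intervals in small samples, or intervals for ratios). The function name `summarize_interval` is hypothetical.

```python
from statistics import NormalDist

def summarize_interval(lower, upper, level=0.95):
    """Recover estimate, SE, and p-value from a symmetric normal-based CI."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    estimate = (lower + upper) / 2
    std_error = (upper - lower) / (2 * z_crit)
    z = estimate / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {
        "estimate": estimate,
        "std_error": std_error,
        "width": upper - lower,
        "excludes_zero": lower > 0 or upper < 0,
        "p_value": p_value,
    }

result = summarize_interval(0.5, 8.2)
print(result)
```

Under these assumptions, the interval [0.5, 8.2] corresponds to an estimate of 4.35 and a p-value of roughly 0.027: significant, but with a lower bound close to zero.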
Confidence intervals are more informative than p-values alone because they convey both significance and precision simultaneously. Two studies might both report a result that is significant at the 0.05 level, but one has a 95% CI of [0.1, 15.0] (highly imprecise) while the other has a 95% CI of [2.0, 4.0] (very precise). The second study provides much stronger evidence about the true effect size, even though both would be labeled "significant."
Step 4: Consider the P-Value in Context
The p-value indicates how incompatible the data are with the null hypothesis. A p-value of 0.001 suggests strong evidence against no effect. A p-value of 0.048 suggests borderline evidence. But always interpret p-values alongside effect sizes: a tiny p-value with a trivial effect size (common with large samples) means the effect is probably not zero but may be too small to matter. A moderate p-value (e.g., 0.08) with a large effect size might indicate an important effect that the study was underpowered to confirm.
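To illustrate how sample size drives the p-value for a fixed effect, the sketch below holds a trivial difference of 0.02 standard deviations constant and varies the group size. It uses a two-sample z-test with a known, common standard deviation as a simplifying assumption, and the function name `two_sample_z_p` is hypothetical.

```python
import math

def two_sample_z_p(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means (z-test, equal n, known sd)."""
    std_error = sd * math.sqrt(2 / n_per_group)
    z = mean_diff / std_error
    # erfc keeps precision in the far tail, where 1 - cdf would round to zero.
    return math.erfc(abs(z) / math.sqrt(2))

# The same tiny effect (d = 0.02) at three sample sizes.
p_by_n = {n: two_sample_z_p(0.02, 1.0, n) for n in (100, 10_000, 1_000_000)}
for n, p in p_by_n.items():
    print(f"n per group = {n:>9,}: p = {p:.3g}")
```

The effect size never changes, yet the p-value moves from clearly non-significant to vanishingly small as n grows. That is why the p-value cannot stand in for magnitude.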
Never interpret a non-significant p-value as evidence that no effect exists. Absence of evidence is not evidence of absence. A p-value of 0.15 might reflect a true null effect, or it might reflect a real effect that the study had insufficient power to detect. Examining the confidence interval helps distinguish these scenarios: a CI of [-0.1, 0.2] centered near zero suggests no meaningful effect (provided values that small are negligible on the outcome's scale), while a CI of [-2.0, 12.0] suggests the study was simply too imprecise to draw conclusions.
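One way to make this distinction explicit is to compare the interval against the smallest effect you would consider meaningful. The sketch below does that. The function name, the labels it returns, and the threshold of 0.5 are illustrative assumptions; in practice the threshold has to come from subject-matter knowledge.

```python
def read_nonsignificant_interval(lower, upper, smallest_meaningful_effect):
    """Label a confidence interval relative to a meaningful-effect threshold."""
    if lower > 0 or upper < 0:
        return "excludes zero"
    inside = (-smallest_meaningful_effect < lower
              and upper < smallest_meaningful_effect)
    if inside:
        return "meaningful effects ruled out"
    return "inconclusive"

# Hypothetical threshold: effects smaller than 0.5 units do not matter here.
narrow = read_nonsignificant_interval(-0.1, 0.2, 0.5)
wide = read_nonsignificant_interval(-2.0, 12.0, 0.5)
print(narrow, "|", wide)
```

Both intervals include zero, but only the narrow one supports a claim that any effect is negligible. The wide one leaves large effects entirely plausible.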
Step 5: Assess Assumptions and Limitations
No statistical result is stronger than the assumptions underlying it. Were the assumptions of the test met (normality, equal variances, independence)? Could confounding variables explain the results? Is the sample representative of the population you want to generalize to? Could measurement error or bias affect the findings? Honest acknowledgment of limitations prevents overinterpretation and signals scientific integrity.
Common assumption violations include non-independence (students clustered in classrooms, patients clustered in hospitals), non-normality (skewed distributions in small samples), heteroscedasticity (unequal variances across groups), and multicollinearity (highly correlated predictors in regression). Each violation has specific consequences for the validity of different tests and specific remedies (robust standard errors, nonparametric alternatives, data transformations, or alternative modeling approaches).
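As one example of an alternative approach for unequal variances, the sketch below computes Welch's t statistic and its approximate degrees of freedom, which do not assume equal group variances. Welch's test is not named in the list above and is offered here only as an illustration. The function name and data are hypothetical, and the p-value is omitted because the standard library has no t distribution.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(group_a), len(group_b)
    # Per-group variance of the mean; no pooling across groups.
    q1 = statistics.variance(group_a) / n1
    q2 = statistics.variance(group_b) / n2
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(q1 + q2)
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical groups with unequal sizes and very different spreads.
group_a = [10, 12, 11, 13, 9]
group_b = [4, 8, 12, 16, 20, 24]
variance_ratio = statistics.variance(group_b) / statistics.variance(group_a)
t_stat, df = welch_t(group_a, group_b)
print(round(variance_ratio, 1), round(t_stat, 3), round(df, 2))
```

A large variance ratio combined with unequal group sizes is a signal to prefer this kind of statistic over the pooled-variance t-test.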
Reading Regression Output
Regression tables report coefficients, standard errors, t-values, and p-values for each predictor. The coefficient is the effect size (expected change in Y per unit change in X, holding other predictors constant). The standard error reflects precision. The t-value is the coefficient divided by its standard error (a signal-to-noise ratio). R-squared is the proportion of variance in the outcome that the model explains, a summary of overall fit. Focus first on whether coefficients are in the expected direction and of meaningful magnitude, then on whether they reach significance.
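The quantities in a regression table can be reproduced by hand for the one-predictor case, which helps demystify the output. The following is a minimal sketch of ordinary least squares with a single predictor. The function name `simple_regression` and the data are illustrative.

```python
import math

def simple_regression(x, y):
    """Slope, its standard error, t-value, and R-squared for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": se_slope,
        "t": slope / se_slope,  # coefficient divided by its standard error
        "r_squared": 1 - sse / sst,
    }

# Hypothetical data with a nearly linear relationship.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = simple_regression(x, y)
print({k: round(v, 3) for k, v in fit.items()})
```

The slope of about 1.99 is the effect size; the t-value simply rescales it by its precision.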
In multiple regression, each coefficient represents the effect of that predictor after controlling for all other predictors in the model. This adjusted interpretation differs from simple correlations. A variable that is highly correlated with the outcome in a bivariate analysis may have a small, non-significant coefficient in multiple regression because other predictors account for the shared variance. Conversely, a suppressor variable can have a larger coefficient in the multiple regression than its simple correlation would suggest.
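For two predictors, the standardized coefficients can be written directly in terms of the three pairwise correlations, which makes both patterns easy to see. The sketch below applies that standard two-predictor formula. The function name and the correlation values are hypothetical.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized coefficients for two predictors from pairwise correlations.

    r_y1, r_y2: correlation of each predictor with the outcome.
    r_12: correlation between the two predictors.
    """
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2

# Shared variance: predictor 1 correlates 0.5 with the outcome on its own,
# but overlaps heavily with predictor 2.
shared = standardized_betas(r_y1=0.5, r_y2=0.6, r_12=0.8)

# Suppression: predictor 2 is unrelated to the outcome but correlated
# with predictor 1.
suppressor = standardized_betas(r_y1=0.4, r_y2=0.0, r_12=0.5)

print([round(b, 3) for b in shared], [round(b, 3) for b in suppressor])
```

In the first case the coefficient for predictor 1 shrinks to near zero despite its bivariate correlation of 0.5. In the second, its coefficient exceeds its simple correlation of 0.4 once predictor 2 enters the model.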
Reading ANOVA Tables
ANOVA tables partition variance into sources: between-groups (due to the factor) and within-groups (residual error). The F-ratio divides the between-groups mean square by the within-groups mean square, where each mean square is a sum of squares divided by its degrees of freedom. A large F with small p indicates at least one group differs from the others. Report eta-squared or partial eta-squared for effect size. Remember that a significant omnibus F does not say which groups differ; follow it with post-hoc tests (Tukey HSD, Bonferroni, or Games-Howell for unequal variances) to identify which specific groups differ from which.
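The entries of a one-way ANOVA table can be rebuilt from raw data in a few lines, as in the sketch below. The function name `one_way_anova` and the three groups are illustrative, and the p-value is left out because it requires the F distribution, which the standard library does not provide.

```python
import statistics

def one_way_anova(groups):
    """Sums of squares, F-ratio, and eta-squared for a one-way design."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df": (df_between, df_within),
        # F is the ratio of the two mean squares.
        "F": (ss_between / df_between) / (ss_within / df_within),
        "eta_squared": ss_between / (ss_between + ss_within),
    }

# Hypothetical scores for three groups of three.
table = one_way_anova([[4, 5, 6], [6, 7, 8], [9, 10, 11]])
print(table)
```

Eta-squared here is the between-groups sum of squares as a share of the total, which is why it reads as a proportion of variance explained.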
Factorial ANOVA tables include rows for main effects and interactions. Main effects tell you whether each factor influences the outcome when averaged across levels of the other factor. Interactions tell you whether the effect of one factor depends on the level of the other. When a significant interaction is present, interpret main effects cautiously because the interaction means that the effect of each factor is not constant across conditions.
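A 2x2 table of cell means shows why main effects can mislead when an interaction is present. The sketch below computes the simple effects of factor A at each level of factor B, the main effect of A, and the interaction contrast. The function name, the dictionary layout, and the cell means are hypothetical.

```python
def two_by_two_effects(cell_means):
    """Simple effects, main effect of A, and interaction from 2x2 cell means.

    cell_means maps (level_of_A, level_of_B) to the mean outcome in that cell.
    """
    effect_a_at_b0 = cell_means[(1, 0)] - cell_means[(0, 0)]
    effect_a_at_b1 = cell_means[(1, 1)] - cell_means[(0, 1)]
    return {
        "effect_a_at_b0": effect_a_at_b0,
        "effect_a_at_b1": effect_a_at_b1,
        # Main effect: the effect of A averaged over the levels of B.
        "main_effect_a": (effect_a_at_b0 + effect_a_at_b1) / 2,
        # Interaction: how much the effect of A changes across levels of B.
        "interaction": effect_a_at_b1 - effect_a_at_b0,
    }

# Hypothetical crossover pattern: A helps at one level of B and hurts at the other.
effects = two_by_two_effects({(0, 0): 10, (1, 0): 14, (0, 1): 14, (1, 1): 10})
print(effects)
```

The main effect of A is zero in this example even though A changes the outcome by 4 units in each condition, in opposite directions. Reading only the main-effect row would miss that entirely.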
Avoiding Common Interpretation Mistakes
Several interpretation errors recur across disciplines. Confusing statistical significance with practical importance is the most common: a statistically significant result may be too small to matter, while a non-significant result may reflect inadequate sample size rather than a true null effect. Interpreting correlation as causation, ignoring the multiple testing problem, and cherry-picking favorable results while ignoring unfavorable ones are additional pitfalls that careful interpretation avoids.
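The multiple testing problem is easy to quantify. The sketch below computes the chance of at least one false positive across a set of tests, assuming the tests are independent and every null hypothesis is true, and then applies a Bonferroni correction. The function names and the p-values are illustrative.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent true-null tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values that survive a Bonferroni-adjusted threshold."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

fwer_20 = familywise_error_rate(0.05, 20)

# Hypothetical p-values from five tests on the same dataset.
p_values = [0.001, 0.012, 0.03, 0.04, 0.2]
survivors = bonferroni_significant(p_values)
print(round(fwer_20, 3), survivors)
```

With 20 tests at the 0.05 level, the chance of at least one false positive is about 64% under these assumptions. Of the five p-values, four fall below 0.05, but only one survives the adjusted threshold of 0.01.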
Interpret statistical results systematically: descriptives first, then effect size and direction, then confidence interval, then p-value, then assumptions and limitations. This sequence prevents the common mistake of jumping to significance while ignoring everything else that determines whether a finding is meaningful and reliable.