Introduction

When a pharmaceutical company announces that their drug "significantly" reduced symptoms, they're making a statistical claim—not a guarantee. Understanding what significance actually means prevents both over-interpretation of noisy data and under-interpretation of real effects. The concept is counterintuitive enough that most people, including many researchers, get it wrong.

Key Concepts

Statistical significance addresses a specific question: if there were no real effect, how likely would we be to see data this extreme? A p-value of 0.03 means that if the null hypothesis were true, we'd see data this extreme about 3% of the time. That's all. It doesn't mean there's a 97% chance the effect is real, nor does it mean the effect is large or practically important.

The 0.05 threshold is arbitrary. Ronald Fisher suggested it as a convenient benchmark, not a sacred boundary. A p-value of 0.049 and a p-value of 0.051 are essentially identical, yet one "passes" and one "fails." This binary thinking has damaged scientific inference by encouraging researchers to manipulate data or analyses until they cross an arbitrary line.

Statistical significance says nothing about effect size. A drug could reduce headaches by 30 seconds on average, a statistically significant effect so small as to be clinically meaningless. Or a large, important effect could fail to reach statistical significance in a small study. Significance and importance are different questions requiring different analyses.

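To make the definition concrete, here is a minimal simulation sketch in Python (assuming NumPy and SciPy; the group sizes, the zero effect, and the 10,000-experiment count are illustrative choices, not values from the text). Both groups are drawn from the same distribution, so the null hypothesis is true by construction, and "significant" results still appear at roughly the threshold rate.

    # Hypothetical simulation: both groups come from the same distribution,
    # so the null hypothesis is true and any "significant" result is noise.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n_experiments = 10_000
    significant = 0

    for _ in range(n_experiments):
        placebo = rng.normal(loc=0.0, scale=1.0, size=50)
        drug = rng.normal(loc=0.0, scale=1.0, size=50)   # no real effect
        if stats.ttest_ind(drug, placebo).pvalue < 0.05:
            significant += 1

    # Prints roughly 0.05: under the null, data "this extreme" shows up
    # about 5% of the time, which is all a significance threshold controls.
    print(significant / n_experiments)

Nothing in this loop says whether an effect is real or large; it only calibrates how often noise alone clears the bar.
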
Practical Application

Practical significance requires domain knowledge to evaluate. Statistical analysis can tell you whether an effect exists and estimate its magnitude; whether that magnitude matters in the real world requires understanding context. A 1-point increase on a 100-point satisfaction scale might be statistically significant but operationally trivial—or vice versa.

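To see that gap numerically, the sketch below uses invented survey figures (a true 1-point shift on a 100-point scale, 200,000 respondents per arm, assuming SciPy): with a sample this large the p-value falls far below any conventional threshold, yet whether one point matters is a judgment the statistics cannot make.

    # Hypothetical satisfaction survey: a true +1 point shift on a
    # 100-point scale, measured with a very large sample.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    control = rng.normal(loc=70.0, scale=15.0, size=200_000)
    treated = rng.normal(loc=71.0, scale=15.0, size=200_000)

    diff = treated.mean() - control.mean()
    p = stats.ttest_ind(treated, control).pvalue

    # Prints a difference near 1.0 point with an astronomically small
    # p-value: unquestionably "significant", possibly still trivial.
    print(f"difference = {diff:.2f} points, p = {p:.1e}")
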
Common Mistakes

Multiple testing is the graveyard of statistical significance. Test twenty different hypotheses at p < 0.05, and one will likely appear significant even if nothing is happening. Corrections for multiple comparisons—Bonferroni, Tukey, false discovery rate—adjust thresholds to account for the number of tests. Without these corrections, your "significant" findings might be noise.

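The arithmetic behind that warning: across 20 independent null tests, the chance of at least one uncorrected "hit" is 1 - 0.95^20, about 64%. The sketch below (hypothetical setup: 20 hypotheses, 30 observations per group, assuming SciPy) applies the simplest fix, a Bonferroni correction, which tests each hypothesis at 0.05/20 instead of 0.05.

    # Twenty hypothetical hypotheses tested on pure noise: every comparison
    # uses two samples from the same distribution, so any "hit" is false.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    alpha, n_tests = 0.05, 20

    p_values = []
    for _ in range(n_tests):
        a = rng.normal(size=30)
        b = rng.normal(size=30)
        p_values.append(stats.ttest_ind(a, b).pvalue)

    uncorrected = sum(p < alpha for p in p_values)
    bonferroni = sum(p < alpha / n_tests for p in p_values)  # 0.0025 cutoff

    # Uncorrected hits show up regularly across repeated runs; the
    # corrected count is almost always zero, which is the point.
    print(f"uncorrected: {uncorrected}, Bonferroni-corrected: {bonferroni}")

Bonferroni is the bluntest of the corrections named above; false-discovery-rate methods give back some power, but the goal is the same: stop the sheer number of tests from manufacturing significance.
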
Advanced Topics

Replication is the only real validation. A genuinely real effect replicates; a statistical artifact doesn't. The replication crisis in psychology and medicine exposed how many "significant" findings were noise. Pre-registration—specifying hypotheses and analyses before data collection—reduces flexibility that allows post-hoc manipulation. Registered reports, where journals commit to publication before seeing results, further reduce bias.
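
Why replication filters out artifacts can be seen in one last hedged sketch (same hypothetical null-data setup as above): screen many noise-only studies, keep the "significant" ones, and rerun each on fresh data. Almost none survive, because there was never anything real to find.

    # Hypothetical screening exercise: every true effect is zero, so the
    # initial "discoveries" are chance and rarely replicate.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(123)

    def one_study(n=30):
        a = rng.normal(size=n)
        b = rng.normal(size=n)
        return stats.ttest_ind(a, b).pvalue

    discoveries = sum(one_study() < 0.05 for _ in range(1_000))
    replications = sum(one_study() < 0.05 for _ in range(discoveries))

    # Roughly 5% of 1,000 null studies come out "significant", and only
    # about 5% of those survive an exact replication attempt.
    print(f"initial discoveries: {discoveries}, replicated: {replications}")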