Ice cream sales and drowning deaths both increase in summer. No amount of ice cream causes drowning, and no amount of drowning increases ice cream appetite. Both are caused by a third factor—summer weather—that makes both more likely. This classic example illustrates why confusing correlation with causation produces absurd conclusions. The human brain is wired to find patterns, and when two things happen together, we instinctively assume one caused the other. This pattern-seeking serves us well in many contexts, but in data analysis, it leads us astray. Establishing causation requires more than observing co-occurrence; it requires understanding the mechanism or conducting controlled experiments. Confounding variables create spurious correlations. A variable that causes both X and Y makes X and Y appear related even when there's no direct connection. Education and income are correlated, but much of this correlation reflects underlying cognitive ability and social networks that influence both. Disentangling these effects requires statistical controls or experimental designs that hold confounders constant. Directionality is a separate problem. X and Y might be correlated, but does X cause Y, does Y cause X, or does a third factor cause both? Higher education correlates with higher income, but does education cause higher income,

Introduction

or do higher-income families provide better educational opportunities? Both probably contribute, but cross-sectional survey data can't disentangle these causal directions. Time precedence helps establish causality. If X consistently precedes Y, causation becomes more plausible, but still doesn't prove it—the precursor might be a trigger rather than a cause, or some third factor might cause both with X preceding Y by coincidence. Longitudinal designs that track changes over time provide stronger evidence for causal claims. Experimental designs—randomly assigning participants to conditions—provide the strongest evidence for causation. By controlling who receives an intervention, experiments eliminate confounding and establish time precedence. When experiments aren't feasible—studying income effects on health, for instance—quasi-experimental designs, natural experiments, and instrumental variable approaches provide weaker but still useful evidence.

Key Concepts

Practical Application

Common Mistakes

Advanced Topics