I once spent three months collecting survey data from 2,000 respondents, only to have a reviewer point out that my convenience sampling strategy meant I couldn't make any statistical inferences beyond my sample. All my careful analysis was essentially storytelling with numbers. That experience taught me that sampling design is the foundation everything else builds on.

Probability sampling gives every member of the population a known, non-zero chance of selection. Simple random sampling, where every possible sample of size n has an equal chance of being selected, is the gold standard. Stratified sampling divides the population into subgroups and samples within each, ensuring representation across key dimensions. Cluster sampling selects whole groups rather than individuals, which is efficient for geographically dispersed populations. Each approach involves tradeoffs between cost, accuracy, and logistical feasibility.

Non-probability sampling includes convenience samples (whoever is easiest to reach), purposive samples (selected for specific characteristics), quota samples (filling quotas that match population proportions), and volunteer panels (self-selected respondents). These approaches are often cheaper and faster, but they cannot support statistical inference about a larger population. The size of a convenience sample doesn't overcome its fundamental limitation: there is no basis for treating the respondents as representative of any larger population.

The illusion of representativeness haunts non-probability samples. A volunteer panel might have 500,000 members,
but those members are not like all internet users, who in turn are not like the general population. Weighting (adjusting results to match known population proportions) can reduce bias but cannot eliminate it. Unknown biases remain unknown; you can only correct for biases you can measure.

Sample size and representativeness are independent concerns. A large, non-representative sample does not become representative by getting larger. A small, well-designed probability sample outperforms a large convenience sample for making inferences about the target population. Size matters for precision; design matters for validity.

Online panels have become ubiquitous in market research, but their representativeness varies enormously with panel construction and recruitment methods. Randomly recruited panels with low recruitment rates might be less representative than opt-in panels despite their formal probability design. Transparency about recruitment and response rates is essential for evaluating sample quality.
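
To make "size matters for precision; design matters for validity" concrete, here is a small simulation sketch. Everything in it is invented for illustration: the population, the traits `young` (which the researcher can observe and weight on) and `engaged` (which the researcher cannot observe), and all of the probabilities. It compares a simple random sample of 1,000 against a much larger self-selected sample, and then post-stratifies the self-selected sample on the one trait that can be measured.

```python
import random


def simulate(seed=0, pop_size=100_000, srs_size=1_000):
    """Compare a small probability sample with a large convenience sample.

    All numbers below are made-up assumptions, not estimates from real data.
    """
    rng = random.Random(seed)

    # Hypothetical population: each person is (young, engaged, answer).
    # 'young' is observable; 'engaged' is an unmeasured trait.
    population = []
    for _ in range(pop_size):
        young = rng.random() < 0.4
        engaged = rng.random() < 0.3
        p_yes = 0.3 + 0.2 * young + 0.3 * engaged
        answer = 1 if rng.random() < p_yes else 0
        population.append((young, engaged, answer))

    truth = sum(p[2] for p in population) / pop_size

    # Simple random sample: every person has the same chance of selection.
    srs = rng.sample(population, srs_size)
    srs_est = sum(p[2] for p in srs) / srs_size

    # Convenience sample: young and engaged people are far more likely
    # to opt in, so selection is correlated with the answer.
    convenience = []
    for person in population:
        young, engaged, _ = person
        p_join = 0.05 * (3 if young else 1) * (4 if engaged else 1)
        if rng.random() < p_join:
            convenience.append(person)
    conv_est = sum(p[2] for p in convenience) / len(convenience)

    # Post-stratification on the observable trait only: average within
    # each 'young' group, then recombine using the population shares.
    weighted_est = 0.0
    for group in (True, False):
        pop_share = sum(1 for p in population if p[0] == group) / pop_size
        answers = [p[2] for p in convenience if p[0] == group]
        weighted_est += pop_share * sum(answers) / len(answers)

    return {
        "truth": truth,
        "srs": srs_est,
        "srs_n": srs_size,
        "convenience": conv_est,
        "convenience_n": len(convenience),
        "weighted": weighted_est,
    }


results = simulate()
for name, value in results.items():
    print(f"{name:>14}: {value:.3f}" if isinstance(value, float) else f"{name:>14}: {value}")
```

Under these assumptions the convenience sample is many times larger than the random sample yet lands further from the true proportion. Weighting on `young` pulls the estimate closer, but the gap caused by `engaged` remains, because nothing in the data lets you correct for a trait you never measured.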