Why Replication Matters in Science
What Is Scientific Replication?
Replication means conducting the same experiment again, either by the original researchers or by independent teams, to see if the results hold up. At its core, replication answers a simple but crucial question: "If we do this again, do we get the same answer?" If the answer is yes, confidence in the finding increases. If the answer is no, something about the original result may have been wrong, incomplete, or specific to conditions that were not recognized at the time.
There are different levels of replication. Direct replication (sometimes called exact replication) attempts to reproduce an experiment as closely as possible, using the same methods, materials, and procedures. Conceptual replication tests the same underlying hypothesis but uses different methods or approaches. If a finding holds up under both direct and conceptual replication, using different researchers, different equipment, and different populations, it is considered highly robust.
Replication is not simply about checking for fraud, though it does serve that function. More often, replication catches honest errors, statistical flukes, and unrecognized biases that can affect any research. Even the most careful and well-intentioned scientists can produce incorrect results. Replication is the safety net that catches these errors before they become established as accepted knowledge.
Why Single Experiments Are Not Enough
A single experiment is a snapshot of one particular set of conditions. Random variation in subjects, instruments, and environmental conditions means that even a perfectly designed experiment might produce results that are not representative of the true effect. The conventional significance threshold of p less than 0.05 means that when no real effect exists, an experiment still has a 5% chance of producing a "significant" result purely by chance. It does not mean that a significant result has a 95% chance of being true.
This 5% false positive rate becomes a much larger problem when you consider how many experiments are conducted across all of science. If research groups run thousands of experiments on effects that do not actually exist, about one in twenty of those experiments, or roughly 50 per thousand, will find "significant" results purely by chance. Without replication, these false positives enter the scientific literature and may be cited, built upon, and applied in ways that ultimately fail because the original finding was not real.
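The arithmetic can be checked with a short simulation. The sketch below is illustrative only: it assumes two groups drawn from the same normal distribution with a known standard deviation of 1, so a simple z-test applies, and the function names, group size, and seed are all hypothetical choices. Every "significant" result it counts is a false positive by construction.

```python
import random
from statistics import NormalDist, mean


def null_experiment_p_value(rng, n_per_group=30):
    """Two-sided z-test p-value for two groups drawn from the SAME distribution."""
    a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    # Assumption: the standard deviation is known to be 1, so the standard
    # error of the difference in means is sqrt(2 / n).
    se = (2.0 / n_per_group) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def count_false_positives(n_experiments=2000, alpha=0.05, seed=1):
    """Count how many no-effect experiments still come out 'significant'."""
    rng = random.Random(seed)
    return sum(
        null_experiment_p_value(rng) < alpha for _ in range(n_experiments)
    )


if __name__ == "__main__":
    hits = count_false_positives()
    print(f"{hits} of 2000 no-effect experiments were 'significant'")
```

With 2,000 simulated experiments, the count should land near 100, which is 5% of the total.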
Publication bias amplifies this problem. Scientific journals are more likely to publish positive results (experiments that found a significant effect) than negative results (experiments that found no effect). This means that false positives are disproportionately published while failed replications or null results often go unreported. The published literature can therefore paint a misleadingly optimistic picture of how well-supported certain findings really are.
Small sample sizes compound the issue further. Experiments with few subjects have low statistical power, meaning they can easily miss real effects. When a small study does reach significance, its effect size estimate tends to be inflated, because only an unusually large observed effect can clear the significance threshold with so little data. A small study that happens to find a large effect gets published, but it may be an outlier that does not represent the true effect size. Replication with larger samples provides a more accurate estimate.
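A simulation can make this inflation concrete. The following sketch assumes a true standardized effect of 0.2, 20 subjects per group, and a known standard deviation of 1. These values and the function names are hypothetical and chosen only for illustration. It averages the estimated effect across only those simulated studies that reached significance in the expected direction.

```python
import random
from statistics import NormalDist, mean


def simulate_study(rng, true_effect, n_per_group):
    """Return (estimated effect, two-sided p-value) for one two-group study."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
    estimate = mean(treated) - mean(control)
    # Assumption: known sd of 1, so a z-test is appropriate.
    se = (2.0 / n_per_group) ** 0.5
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(estimate / se)))
    return estimate, p_value


def mean_significant_estimate(true_effect=0.2, n_per_group=20,
                              n_studies=5000, alpha=0.05, seed=2):
    """Average estimate among significant positive studies, and their share."""
    rng = random.Random(seed)
    results = [simulate_study(rng, true_effect, n_per_group)
               for _ in range(n_studies)]
    significant = [est for est, p in results if p < alpha and est > 0]
    return mean(significant), len(significant) / n_studies


if __name__ == "__main__":
    avg, share = mean_significant_estimate()
    print(f"true effect 0.2, mean significant estimate {avg:.2f}, "
          f"share significant {share:.1%}")
```

Under these assumptions an estimate must exceed about 0.62 to be significant, so every "successful" small study reports an effect at least three times the true value of 0.2.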
The Replication Crisis
Beginning around 2010, several large-scale efforts attempted to replicate published findings in psychology, cancer biology, economics, and other fields. The results were sobering. In one landmark project, the Reproducibility Project: Psychology, researchers attempted to replicate 100 psychology studies published in top journals. Only about 36% of the replications produced statistically significant results in the same direction as the original studies, compared with 97% of the originals. The average effect size in the replications was roughly half that reported in the originals.
Similar problems emerged in other fields. A large-scale effort to replicate preclinical cancer biology studies found that many key findings could not be reproduced. Economists attempting to replicate laboratory experiments published in top economics journals found that about 60% (11 of 18 studies) held up, better than psychology but still concerning. These findings collectively became known as the "replication crisis" and prompted serious reflection across the scientific community.
The replication crisis has multiple causes. Pressure to publish novel, positive results incentivizes researchers to use questionable practices like p-hacking (analyzing data many ways until finding significance), HARKing (hypothesizing after results are known), and selective reporting. Small sample sizes and inadequate statistical power are widespread. Insufficient methodological detail in published papers makes exact replication difficult. These are largely systemic problems rather than simply the failings of individual researchers.
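To see why analyzing data many ways matters, consider a rough calculation. If each analysis had an independent 5% chance of a false positive, the chance that at least one of k analyses comes out "significant" is 1 - (1 - 0.05)^k. The sketch below computes this. Independence is a simplifying assumption, since real analyses of the same dataset are correlated and the inflation is usually somewhat smaller, and the function name is hypothetical.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent null tests is significant."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(f"{k:2d} analyses -> {chance_of_false_positive(k):.3f}")
    #  1 analyses -> 0.050
    #  5 analyses -> 0.226
    # 10 analyses -> 0.401
    # 20 analyses -> 0.642
```

Even ten analyses of a dataset with no real effect would, under this simplified model, yield at least one "significant" result about 40% of the time.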
How Science Is Responding
The replication crisis has catalyzed significant reforms across many scientific fields. Pre-registration, where researchers publicly record their hypotheses and analysis plans before collecting data, makes after-the-fact changes that inflate false positive rates easy to detect. Registered reports, a publishing format where journals review the study design and commit to publishing the results before they are known, greatly reduce publication bias against negative findings.
Open science practices are becoming more common. Sharing raw data, analysis code, and experimental materials allows other researchers to verify results and conduct replications more easily. Many journals now require or encourage data availability statements, and repositories for open data and pre-prints have grown rapidly.
Larger sample sizes and multi-site studies are increasingly valued. Collaborative projects where dozens of laboratories conduct the same experiment simultaneously provide powerful tests of replicability while also revealing whether findings generalize across different settings and populations. These "many labs" projects have produced some of the most informative replication data available.
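One simple way to combine results from several laboratories is inverse-variance weighting, in which more precise estimates count for more. The sketch below shows a fixed-effect version, which assumes every lab is measuring the same underlying effect. Multi-site projects often use random-effects models instead, precisely because they want to measure how much the effect varies between sites. The per-lab numbers are invented for illustration, and the function name is hypothetical.

```python
from math import sqrt


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance weighted average of per-lab effect estimates."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    return pooled, pooled_se


if __name__ == "__main__":
    # Hypothetical results from four labs (made-up numbers).
    lab_estimates = [0.30, 0.10, 0.25, 0.05]
    lab_standard_errors = [0.20, 0.10, 0.20, 0.10]
    pooled, pooled_se = pool_fixed_effect(lab_estimates, lab_standard_errors)
    print(f"pooled effect {pooled:.3f} (standard error {pooled_se:.3f})")
```

In this made-up example the pooled standard error is smaller than that of any single lab, which is the statistical payoff of combining sites.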
Statistical practices are also evolving. Many researchers now supplement or replace traditional p-values with confidence intervals, effect sizes, and Bayesian statistics that provide more nuanced information about the strength of evidence. The goal is to move away from the binary "significant or not" framework that incentivizes questionable practices.
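As a rough illustration of what reporting beyond a p-value can look like, the sketch below computes a standardized effect size (Cohen's d) and an approximate 95% confidence interval for a difference in means. It uses a normal approximation for the interval, which is an assumption made to keep the example short. For small samples a t-based interval would be wider. The data and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def mean_difference_ci(group_a, group_b, level=0.95):
    """Normal-approximation confidence interval for the difference in means."""
    diff = mean(group_a) - mean(group_b)
    se = sqrt(variance(group_a) / len(group_a)
              + variance(group_b) / len(group_b))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return diff - z * se, diff + z * se


if __name__ == "__main__":
    # Hypothetical measurements (made-up numbers).
    treated = [5.1, 4.8, 6.0, 5.5, 5.9, 5.2]
    control = [4.6, 4.9, 5.0, 4.4, 5.3, 4.7]
    low, high = mean_difference_ci(treated, control)
    print(f"d = {cohens_d(treated, control):.2f}, "
          f"95% CI for difference: ({low:.2f}, {high:.2f})")
```

Reporting the interval shows both the size of the effect and how uncertain it is, rather than only whether it crossed a threshold.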
How to Replicate an Experiment
If you want to replicate a published study, start by reading the original paper carefully and noting every methodological detail. Contact the original authors for additional information about procedures, materials, or equipment that may not be fully described in the published paper. Many researchers are willing to share protocols and materials to support replication efforts.
Follow the original methods as closely as possible for a direct replication. Use the same instruments, the same procedures, the same statistical analyses, and similar sample sizes (ideally larger). Document any deviations from the original protocol, no matter how minor, because what seems like a trivial difference might matter for the results.
Consider statistical power when planning your replication. If the original study used 20 subjects and found a small effect, you may need several hundred subjects to reliably detect that effect. An underpowered replication that fails to find a significant result does not necessarily mean the original finding was wrong. It may simply mean the replication was not large enough to detect the effect.
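A back-of-the-envelope sample size can be worked out with the standard normal-approximation formula for comparing two group means: n per group = 2 * ((z_alpha + z_power) / d)^2, where d is the standardized effect size. The sketch below applies it. The function name is hypothetical, and the result is an approximation. Dedicated power-analysis software based on the t distribution gives slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided, two-group comparison."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2.0 * ((z_alpha + z_power) / effect_size_d) ** 2)


if __name__ == "__main__":
    print(n_per_group(0.2))  # small effect:  393 per group
    print(n_per_group(0.5))  # medium effect: 63 per group
    print(n_per_group(0.8))  # large effect:  25 per group
```

Because published effect sizes tend to be overestimates, it is often sensible to plan for an effect smaller than the one originally reported.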
Replication in the Classroom
Replication is an excellent teaching tool. Students who replicate published experiments learn laboratory techniques, gain experience with data analysis, and develop a firsthand understanding of why replication matters. When student replications produce different results from published findings, the resulting discussion about possible reasons is one of the most valuable learning experiences in science education.
Student replication projects also contribute to the scientific record. Several initiatives invite students to participate in coordinated replication efforts, collecting data that is combined across classrooms to create sample sizes large enough for meaningful statistical analysis. This gives students the experience of contributing to real scientific knowledge while reinforcing the importance of replication in maintaining scientific quality.
The Future of Replication
The replication crisis, while revealing serious problems, has ultimately strengthened science by prompting reforms that improve the reliability of published research. Pre-registration, open data, larger sample sizes, and multi-site collaborations are becoming standard practice in many fields. These changes make it easier to replicate studies and harder for false positives to persist unchallenged in the literature.
Technology is also making replication more practical. Automated laboratory equipment can execute experimental protocols with greater consistency than human hands. Containerized computational environments help ensure that data analyses produce the same results regardless of when or where they are run. Online platforms connect researchers who want to replicate findings with the resources and protocols they need. As these tools become more widespread, replication can become a routine part of the scientific workflow rather than an exceptional effort.
Replication is science's essential self-correcting mechanism. A single experiment provides preliminary evidence, but only when results are independently reproduced can the scientific community trust that a finding is real. The replication crisis has prompted important reforms that are making science more transparent, rigorous, and reliable.