Irreproducibility is often attributed to misconduct or questionable research practices (QRPs), but theoretical work also suggests it may stem from intrinsic properties of studied phenomena and from methodological constraints. A pre-existing corpus collected by DF was screened for articles proposing non-QRP causes of irreproducibility. Additional references were identified through citation tracking, Web of Science, and a public collection of critical metascience articles. Systematic database searches were discontinued due to low yield; 516 articles were screened in total. For included studies, we recorded hypothesised determinants, brief causal summaries, methodology (commentary/review, simulation, analytical), and relevant research fields. Distinct factors were listed separately, while overlapping arguments were represented by the most detailed source.
Factor
Brief explanation of the argument
Discipline
Reference & doi
Variation in observed effect sizes
The “replication crisis” presupposes that effect sizes are fixed across time and contexts and can be divided into true and false. But when a given effect is measured in practice, “features unique to that context may mediate the average effect by adding additional mediator variance” […] “This can occur for numerous reasons, ranging from a poorly chosen statistical model to imperfect randomization, differing sample populations, environmental conditions, or flexibility in experimental design.” The authors' model shows that low replication rates may occur regardless of QRPs, unless the heterogeneity (between-study variance) is much smaller than the within-study variance of effect sizes and the sample size is sufficiently large.
NA
Bak-Coleman et al. (2022). Replication and reliability of science. SocArXiv. 10.31235/osf.io/rkyf7
Multiple trials
To increase power and detect small and medium effects, psychologists might run multiple trials or use multiple items and then aggregate the data. However, this aggregation is shown to inflate the estimated effect size.
Psychology
Brand et al. (2010). Exaggerated effect sizes from multiple trials. J. Gen. Psychol. 10.1080/00221309.2010.520360
Unforeseen confounds
Replication studies may suffer from unforeseen confounding factors. Being unknown, these are not documented in the original study; in the replication study they may be noticed but left uncorrected because of reluctance to change results post hoc. The quality control often applied to original studies is not applied to replication studies, leading to an overestimation of irreproducibility.
Psychology
Bressan (2019). Confounds in “failed” replications. Front. Psychol. 10.3389/fpsyg.2019.01884
Centralized scientific community
Evidence from comparing published drug-gene interaction claims with high-throughput experiments from the LINCS L1000 program suggests that centralised scientific communities, in which authors use similar methods and contribute to many articles, produce less replicable claims.
Toxicogenomics
Danchev et al. (2019). Centralized communities and replicability. eLife. 10.7554/elife.43094
Between-site variation
We should understand replications as instances of resampling (of population, effects etc.). Differences between sites/laboratories do not generate small random effects that cancel each other out; they often generate substantial effects that are non-randomly distributed. The result is an inflation of false positives, especially in between-species comparisons and small samples.
Animal behaviour
Farrar et al. (2021). Representativeness in animal cognition research. Anim. Behav. Cogn. 10.26451/abc.08.02.14.2021
Small and non-representative samples
We should understand replications as instances of resampling (of population, effects etc.). Because samples (of experimental units, settings, treatments, and measurements) are small and non-representative, sampling variation will cause other laboratories to fail to replicate.
Animal behaviour
Farrar et al. (2021). Representativeness in animal cognition research. Anim. Behav. Cogn. 10.26451/abc.08.02.14.2021
Vaguely specified hypotheses
We should understand replications as instances of resampling (of population, effects etc.). Small and non-representative samples (of experimental units, settings, treatments, and measurements) will mean that sampling variation will make other laboratories fail to replicate.
Animal behaviour
Farrar et al. (2021). Representativeness in animal cognition research. Anim. Behav. Cogn. 10.26451/abc.08.02.14.2021
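The heterogeneity argument summarised in the table for Bak-Coleman et al. (2022) can be illustrated with a small simulation. The sketch below is not the authors' model. It is a simplified toy version under stated assumptions: each study draws its own context-specific true effect from a normal distribution with mean `mu` and between-study standard deviation `tau`, estimates it with standard error `sigma / sqrt(n)`, and counts as "replicated" when a same-size replication is also significant in the same direction. The function name, parameters, and success criterion are all hypothetical choices made for illustration, and no QRPs are simulated.

```python
import math
import random


def replication_rate(mu, tau, sigma, n, n_pairs=20000, seed=0):
    """Share of significant original results that a same-size replication
    also finds significant in the same direction.

    Toy assumptions (not taken from the cited paper): every study has its own
    true effect drawn from N(mu, tau^2) and an estimate with standard error
    sigma / sqrt(n). There is no selective reporting or other QRP.
    """
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    z_crit = 1.96  # two-sided 5% significance threshold
    significant = 0
    replicated = 0
    for _ in range(n_pairs):
        # Original and replication each get their own context-specific effect.
        est_orig = rng.gauss(rng.gauss(mu, tau), se)
        est_rep = rng.gauss(rng.gauss(mu, tau), se)
        if abs(est_orig) / se > z_crit:
            significant += 1
            same_sign = (est_orig > 0) == (est_rep > 0)
            if same_sign and abs(est_rep) / se > z_crit:
                replicated += 1
    return replicated / significant if significant else float("nan")


if __name__ == "__main__":
    # Compare no heterogeneity with increasing between-study variation.
    for tau in (0.0, 0.2, 0.5):
        rate = replication_rate(mu=0.3, tau=tau, sigma=1.0, n=100)
        print(f"tau={tau:.1f}  replication rate={rate:.2f}")
```

Under these assumptions, the replication rate falls as `tau` grows relative to the within-study standard error, even though every simulated study is conducted and reported honestly. The specific values of `mu`, `sigma`, and `n` are arbitrary and chosen only to make the contrast visible.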