
Figure 1. Average statistical power from 44 reviews of papers published in journals in the social and behavioural sciences between 1960 and 2011. Data are power to detect small effect sizes (d = 0.2), assuming a false-positive rate of α = 0.05, and indicate both very low power (mean = 0.24) and no increase over time (R² = 0.00097).

Figure 2. The relationship between power and false-positive rate, modified by effort, e. Runs analysed in this paper were initialized with e₀ = 75 (shown in orange), such that α = 0.05 when power is 0.8.
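
The calibration stated in the Figure 2 caption can be checked numerically. The sketch below assumes the functional form α = W / (1 + (1 − W)e), where W is power and e is effort; this form is not given in the caption and is supplied here as an assumption, and the function name is hypothetical. Under that assumption, e = 75 and W = 0.8 give α = 0.05, matching the caption.

```python
def false_positive_rate(power: float, effort: float) -> float:
    """False-positive rate as a function of power (W) and effort (e).

    ASSUMPTION: alpha = W / (1 + (1 - W) * e). This form is not stated in
    the caption; it is used here because it reproduces the caption's
    calibration point (alpha = 0.05 at W = 0.8, e = 75).
    """
    if not 0.0 <= power <= 1.0:
        raise ValueError("power must lie in [0, 1]")
    if effort < 0.0:
        raise ValueError("effort must be non-negative")
    return power / (1.0 + (1.0 - power) * effort)


if __name__ == "__main__":
    # Calibration point from the caption: e0 = 75, power = 0.8.
    print(f"alpha at W=0.8, e=75: {false_positive_rate(0.8, 75.0):.4f}")
    # With power held fixed, lower effort implies a higher false-positive rate.
    for e in (75.0, 25.0, 5.0):
        print(f"e={e:>5}: alpha={false_positive_rate(0.8, e):.4f}")
```

Under this assumed form, reducing effort at fixed power raises the false-positive rate, which is the qualitative relationship the figure depicts.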

Figure 3. Power evolves. The evolution of mean power (W), false-positive rate (α) and false discovery rate (FDR).

Figure 4. Effort evolves. The evolution of low mean effort corresponds to the evolution of high false-positive and false discovery rates.

Figure 5. The coevolution of effort and replication.

Figure 6. The evolution of effort when 0%, 25% or 50% of all studies performed are replications.

Figure 7. Lab pay-offs from the non-evolutionary model. Each graph shows count distributions for high- and low-effort labs’ total pay-offs after 110 time steps, 100 of which included replication. (a–c) Counts for each pay-off are totalled over 50 runs for each condition. Panel (c) includes an inset that displays the same data as the larger graph, but for a narrower range of pay-offs. The punishment for having one’s novel result fail to replicate is orders of magnitude greater than the benefit of publishing, which produces the discrete peaks in (b) and (c).