The assessment of intrinsic credibility and a new argument for p < 0.005

The concept of intrinsic credibility has recently been introduced to check the credibility of 'out of the blue' findings without any prior support. A significant result is deemed intrinsically credible if it is in conflict with a sceptical prior derived from the very same data that would make the effect just non-significant. In this paper, I propose to use Bayesian prior-predictive tail probabilities to assess intrinsic credibility. For the standard 5% significance level, this leads to a new p-value threshold that is remarkably close to the recently proposed p < 0.005 standard. I also introduce the credibility ratio, the ratio of the upper to the lower limit (or vice versa) of a confidence interval for a significant effect size. I show that the credibility ratio has to be smaller than 5.8 for a significant finding to also be intrinsically credible. Finally, a p-value for intrinsic credibility is proposed that is a simple function of the ordinary p-value and has a direct frequentist interpretation in terms of the probability of replicating an effect. An application to data from the Open Science Collaboration study on the reproducibility of psychological science suggests that the intrinsic credibility of the original experiment is better suited than standard significance to predicting the success of a replication experiment.
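The abstract's numerical claims can be reproduced with a short sketch. The following Python snippet is my own illustration, not code from the paper: it assumes (my reading of the construction described above) that intrinsic credibility at level α corresponds to the condition |z| ≥ √2 · z_{α/2} on the usual test statistic, which yields both the threshold close to 0.005 and the credibility ratio of 5.8; the name `p_ic` for the p-value for intrinsic credibility is mine.

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()  # standard normal

alpha = 0.05
z_alpha = N.inv_cdf(1 - alpha / 2)           # ~1.96 for alpha = 0.05

# Assumed condition: intrinsic credibility at level alpha requires
# |z| >= sqrt(2) * z_alpha, i.e. a stricter ordinary p-value threshold.
z_ic = sqrt(2) * z_alpha                      # ~2.77
p_threshold = 2 * (1 - N.cdf(z_ic))
print(round(p_threshold, 4))                  # 0.0056, close to 0.005

# Credibility ratio at the boundary: the confidence interval limits are
# z_ic +/- z_alpha in standard-error units, so the upper/lower ratio is
ratio = (z_ic + z_alpha) / (z_ic - z_alpha)
print(round(ratio, 1))                        # 5.8

# p-value for intrinsic credibility as a simple function of the ordinary
# two-sided p-value p (via its z-value equivalent).
def p_ic(p):
    z = N.inv_cdf(1 - p / 2)
    return 2 * (1 - N.cdf(z / sqrt(2)))

print(round(p_ic(p_threshold), 2))            # 0.05: just credible at the 5% level
```

At the boundary p ≈ 0.0056 the sketch returns p_ic ≈ 0.05, consistent with the abstract's claim that the 5% level for intrinsic credibility maps to an ordinary p-value threshold near 0.005.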

(2) Decision letter (RSOS-181534.R0)

25-Jan-2019
Dear Dr Held,

On behalf of the Editors, I am pleased to inform you that your Manuscript RSOS-181534 entitled "The Assessment of Intrinsic Credibility and a New Argument for p < 0.005" has been accepted for publication in Royal Society Open Science subject to minor revision in accordance with the referee suggestions. Please find the referees' comments at the end of this email.
The reviewers and handling editors have recommended publication, but also suggest some minor revisions to your manuscript. Therefore, I invite you to respond to the comments and revise your manuscript.
• Ethics statement If your study uses humans or animals please include details of the ethical approval received, including the name of the committee that granted approval. For human studies please also detail whether informed consent was obtained. For field studies on animals please include details of all permissions, licences and/or approvals granted to carry out the fieldwork.
• Data accessibility It is a condition of publication that all supporting data are made available either as supplementary information or, preferably, in a suitable permanent repository. The data accessibility section should state where the article's supporting data can be accessed. This section should also include details, where possible, of where other relevant research materials such as statistical tools, protocols and software can be accessed. If the data have been deposited in an external repository, this section should list the database, accession number and link to the DOI for all data from the article that have been made publicly available. Data sets that have been deposited in an external repository and have a DOI should also be appropriately cited in the manuscript and included in the reference list.
If you wish to submit your supporting data or code to Dryad (http://datadryad.org/), or modify your current Dryad submission, please use the following link: http://datadryad.org/submit?journalID=RSOS&manu=RSOS-181534

• Competing interests Please declare any financial or non-financial competing interests, or state that you have no competing interests.
• Authors' contributions All submissions, other than those with a single author, must include an Authors' Contributions section which individually lists the specific contribution of each author. The list of authors should meet all of the following criteria: 1) substantial contributions to conception and design, or acquisition of data, or analysis and interpretation of data; 2) drafting the article or revising it critically for important intellectual content; and 3) final approval of the version to be published.
All contributors who do not meet all of these criteria should be included in the acknowledgements.
We suggest the following format: AB carried out the molecular lab work, participated in data analysis, carried out sequence alignments, participated in the design of the study and drafted the manuscript; CD carried out the statistical analyses; EF collected field data; GH conceived of the study, designed the study, coordinated the study and helped draft the manuscript. All authors gave final approval for publication.
• Acknowledgements Please acknowledge anyone who contributed to the study but did not meet the authorship criteria.
• Funding statement Please list the source of funding for each author.
Please ensure you have prepared your revision in accordance with the guidance at https://royalsociety.org/journals/authors/author-guidelines/. Please note that we cannot publish your manuscript without the end statements. We have included a screenshot example of the end statements for reference. If you feel that a given heading is not relevant to your paper, please nevertheless include the heading and explicitly state that it is not relevant to your work.
Because the schedule for publication is very tight, it is a condition of publication that you submit the revised version of your manuscript before 03-Feb-2019. Please note that the revision deadline will expire at 00.00am on this date. If you do not think you will be able to meet this date please let me know immediately.
To revise your manuscript, log into https://mc.manuscriptcentral.com/rsos and enter your Author Centre, where you will find your manuscript title listed under "Manuscripts with Decisions". Under "Actions," click on "Create a Revision." You will be unable to make your revisions on the originally submitted version of the manuscript. Instead, revise your manuscript and upload a new version through your Author Centre.
When submitting your revised manuscript, you will be able to respond to the comments made by the referees and upload a file "Response to Referees" in "Section 6 -File Upload". You can use this to document any changes you make to the original manuscript. In order to expedite the processing of the revised manuscript, please be as specific as possible in your response to the referees. We strongly recommend uploading two versions of your revised manuscript: 1) Identifying all the changes that have been made (for instance, in coloured highlight, in bold text, or tracked changes); 2) A 'clean' version of the new manuscript that incorporates the changes made, but does not highlight them.
When uploading your revised files please make sure that you have: 1) A text file of the manuscript (tex, txt, rtf, docx or doc), references, tables (including captions) and figure captions. Do not upload a PDF as your "Main Document"; 2) A separate electronic file of each figure (EPS or print-quality PDF preferred (either format should be produced directly from original creation package), or original software format); 3) Included a 100 word media summary of your paper when requested at submission. Please ensure you have entered correct contact details (email, institution and telephone) in your user account; 4) Included the raw data to support the claims made in your paper. You can either include your data as electronic supplementary material or upload to a repository and include the relevant doi within your manuscript. Make sure it is clear in your data accessibility statement how the data can be accessed; 5) All supplementary materials accompanying an accepted article will be treated as in their final form. Note that the Royal Society will neither edit nor typeset supplementary material and it will be hosted as provided. Please ensure that the supplementary material includes the paper details where possible (authors, article title, journal name).
Supplementary files will be published alongside the paper on the journal website and posted on the online figshare repository (https://rs.figshare.com/). The heading and legend provided for each supplementary file during the submission process will be used to create the figshare page, so please ensure these are accurate and informative so that your files can be found in searches. Files on figshare will be made available approximately one week before the accompanying article so that the supplementary material can be attributed a unique DOI.
Please note that Royal Society Open Science charge article processing charges for all new submissions that are accepted for publication. Charges will also apply to papers transferred to Royal Society Open Science from other Royal Society Publishing journals, as well as papers submitted as part of our collaboration with the Royal Society of Chemistry (http://rsos.royalsocietypublishing.org/chemistry).
If your manuscript is newly submitted and subsequently accepted for publication, you will be asked to pay the article processing charge, unless you request a waiver and this is approved by Royal Society Publishing. You can find out more about the charges at http://rsos.royalsocietypublishing.org/page/charges. Should you have any queries, please contact openscience@royalsociety.org.
Once again, thank you for submitting your manuscript to Royal Society Open Science and I look forward to receiving your revision. If you have any questions at all, please do not hesitate to get in touch.

Comments to the Author(s)

I have only minor comments but raise a couple of matters you may wish to take into account.

1) It is inherent to a pure Bayesian approach that data and prior distribution are exchangeable to the degree defined by the model (1). Now, hardly anybody checks that the second half of a data set is compatible with the first. This thus raises the issue as to why one would check the compatibility of the data and some prior. A pure Bayesian approach would be simply to update one's beliefs, and it is not immediately clear to me how you expect the statistician, whether frequentist or Bayesian, to use the concept. Are data to be rejected? Are possible prior distributions to be called into account? What philosophy of statistics do either of these correspond to? Of course, to raise these comments is perhaps to be a bit purist, and practical data analysis is often a complex and "dirty" business. Nevertheless, a very brief discussion might be welcome.

2) In my opinion you do not entirely escape a common confusion unnecessarily introduced by Bayesians interpreting P-values in ways they are not meant to be interpreted. In my view the paper by Benjamin et al. (2018) is an example of this. P-values have a long history in which they can be reasonably interpreted in one of two ways: a) a one-sided P-value can be interpreted as the Bayesian probability that the true treatment effect is after all negative rather than (say) positive if an uninformative prior distribution is taken to apply. This is the inverse probability interpretation that Student gave in his paper of 1908, following on from Laplace etc. The term was not in use in 1908, but Student was what would now be described as a Bayesian. An enormous amount of modern applied Bayesian work (rightly or wrongly) uses this sort of analysis. b) Fisher (although he was not the first to do so) proposed a direct probability interpretation as the probability of a result as extreme or more extreme, and also proposed doubling it.

Most of the critics of P-values sign up to neither of these interpretations but instead start from the position that what a P-value *ought* to be is something more along the lines of a probability associated with the Jeffreys hypothesis test. What you propose does not go as far as that. Nevertheless, you are suggesting something more conservative than the conventional P-value. That's fine. However, in my opinion, when you implicitly adopt a classical frequentist calibration without discussion, you go too far. You adopt a classification for the original P-values proposed by Bland (2015) as being appropriate for the modified one. (In fact you somewhat misrepresent Bland, who uses 'Evidence' and not 'Moderate evidence' for the range 0.01 to 0.05.) However, either Bland's standard is appropriate for P-values, in which case the correct classification for P = 0.0011 is 'strong evidence' and the fact that PIC = 0.021 does not justify labelling this as merely 'evidence', or Bland's standard is inappropriate in the first place, in which case, why cite it? Is one meant to think 'if only Bland understood the evidential value of P-values he would reclassify his scheme'? However, Bland is a very experienced statistician and his calibration no doubt reflects that experience. In short, in my view, you have committed the mistake of assuming that when a measuring instrument is changed the appropriate numerical thresholds do not need to change.

(2) Decision letter (RSOS-181534.R1)

13-Feb-2019
Dear Dr Held,

I am pleased to inform you that your manuscript entitled "The Assessment of Intrinsic Credibility and a New Argument for p < 0.005" is now accepted for publication in Royal Society Open Science.
You can expect to receive a proof of your article in the near future. Please contact the editorial office (openscience_proofs@royalsociety.org and openscience@royalsociety.org) to let us know if you are likely to be away from e-mail contact. Due to rapid publication and an extremely tight schedule, if comments are not received, your paper may experience a delay in publication.
Royal Society Open Science operates under a continuous publication model (http://bit.ly/cpFAQ). Your article will be published straight into the next open issue and this will be the final version of the paper. As such, it can be cited immediately by other researchers. As the issue version of your paper will be the only version to be published I would advise you to check your proofs thoroughly as changes cannot be made once the paper is published.
On behalf of the Editors of Royal Society Open Science, we look forward to your continued contributions to the Journal.
Kind regards,
Andrew Dunn
Royal Society Open Science Editorial Office
Royal Society Open Science
openscience@royalsociety.org
on behalf of Prof Mark Chaplain (Subject Editor)
openscience@royalsociety.org

Follow Royal Society Publishing on Twitter: @RSocPublishing
Follow Royal Society Publishing on Facebook: https://www.facebook.com/RoyalSocietyPublishing.FanPage/
Read Royal Society Publishing's blog: https://blogs.royalsociety.org/publishing/

The Assessment of Intrinsic Credibility and a New Argument for p < 0.005
The paper presents a new threshold for intrinsic credibility, along with a corresponding p-value, and offers a new case in favor of the recently advocated rule p < 0.005. The bulk of the main contribution is documented in §§3-4, and an application is offered in §5. There is a lot to like about this manuscript: it is interesting, it goes straight to the point, and it fairly acknowledges shortcomings of the analysis. I have recommendations on some points that could streamline and enrich the document.
• Background and Goals: I found §2 to be clear in describing the background, and §§3-4 to be relatively clear in presenting the main tools; yet the abstract and Introduction are much more convoluted, and are not sufficiently clear about what one will find in the manuscript (nor about the main contributions). I enjoy the motivation in the Introduction, in terms of interest in the problem, but think more concrete details could be given about the main contributions of the manuscript, as well as about why these contributions are important.
• On Geometrical Principles of Compatibility: The marginal likelihood resulting from a prior-predictive sceptical prior (π_S) can be regarded as an inner product between π_S and the likelihood (ℓ), and thus it can be interpreted as an angle in the Hilbert space (L²(Θ), ⟨·, ·⟩) (de Carvalho et al., 2018, in press). I suggest bringing on board some intuition and brief remarks on the geometrical principles of compatibility (understood here as formally defined in Definition 2 and its variants in §3 of the latter paper) so as to enrich the discussion in §3. In addition, I wonder about:

• Compatibility Between the Sceptical Prior and the Distribution of θ: It seems natural to ask whether compatibility between the posterior (p) and the distribution of θ is of any relevance, or whether one is only concerned with compatibility between π_S and the data?
• Robustness of Claims to Misspecification: Keeping in mind that the paper makes the bold statement of a new argument for p < 0.005, one wonders: "How robust is Fig. 3 to model misspecification?" Indeed, how robust are the claims in the manuscript to model misspecification? In addition:

• Transformations / Reparametrizations: On p. 17 a comment is made on the possible need for working with a transformation g(θ). Some further remarks would be welcome on the consequences that these transformations could have, as well as on the possibility of having to work with reparametrizations of a model.
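The inner-product view of compatibility raised in the comments above can be illustrated numerically. The following sketch is my own (it is not code from the manuscript or from de Carvalho et al.): it approximates, on a grid, the cosine of the L² angle between a sceptical prior density and the likelihood, which is large when prior and data agree and near zero under prior-data conflict. The densities and grid values are hypothetical.

```python
from math import sqrt, exp
from statistics import NormalDist

def cos_angle(f, g, grid, h):
    """Cosine of the angle between f and g in L2(Theta),
    approximated by a Riemann sum on a grid with spacing h."""
    inner = sum(f(x) * g(x) for x in grid) * h
    norm_f = sqrt(sum(f(x) ** 2 for x in grid) * h)
    norm_g = sqrt(sum(g(x) ** 2 for x in grid) * h)
    return inner / (norm_f * norm_g)

h = 0.01
grid = [-10 + i * h for i in range(2001)]   # covers [-10, 10]

prior = NormalDist(0.0, 1.0).pdf      # sceptical prior centred at zero
lik_close = NormalDist(1.0, 1.0).pdf  # likelihood for an estimate near zero
lik_far = NormalDist(6.0, 1.0).pdf    # likelihood for an estimate far from zero

print(cos_angle(prior, lik_close, grid, h))  # well above zero: compatible
print(cos_angle(prior, lik_far, grid, h))    # near zero: prior-data conflict
```

For unit-variance normals the continuous cosine works out to exp(-Δ²/4), where Δ is the distance between the two means, so the grid approximation can be checked in closed form.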

Minor comments:
• p. 5: It should be made clear from the outset that it is being assumed that θ̂ ∼ N(θ, σ²) and θ ∼ N(0, τ²), instead of one having to collect these separate pieces of information throughout §2.
• p. 6: It would be worth connecting the text and Fig. 1 in terms of conclusions on intrinsic credibility (or the lack thereof); currently, after the notion of intrinsic credibility is defined on p. 6, no reference is made to Fig. 1 (certainly the relevant information appears in the titles of Fig. 1; still, I think it would be worth briefly mentioning this in the text after introducing the notion).
• p. 8: I agree with the comment on coherency, and I remark that Hartigan's maximum likelihood prior is another example of a prior that can be regarded as a data-based prior in a compatibility-based context, and so is the max-compatible prior (de Carvalho et al., 2018, in press, §2.4).
• p. 9: On the definition of the credibility ratio, perhaps a remark should be added reminding the reader that the lower limit L will not be zero.
• p. 17: The comment acknowledging the simple mathematical framework is appropriate. Yet I suspect more could be said about the setting where the likelihood is in the exponential family; certainly not part of this paper, but perhaps worth remarking on.
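The normal-normal setup mentioned in the first minor comment can be made concrete with a small sketch (my own illustration; function names and numerical values are hypothetical): assuming θ̂ ∼ N(θ, σ²) and a sceptical prior θ ∼ N(0, τ²), the prior predictive for θ̂ is N(0, σ² + τ²), and the prior-predictive tail probability that the paper uses to assess intrinsic credibility follows directly.

```python
from math import sqrt
from statistics import NormalDist

def prior_predictive_tail(theta_hat, se, tau):
    """Two-sided tail probability of the observed estimate theta_hat under
    the prior predictive N(0, se^2 + tau^2) obtained by combining
    theta_hat ~ N(theta, se^2) with the sceptical prior theta ~ N(0, tau^2)."""
    z = abs(theta_hat) / sqrt(se**2 + tau**2)
    return 2.0 * (1.0 - NormalDist().cdf(z))

# Illustrative values only (not from the manuscript): an estimate of 0.5
# with standard error 0.15, and a sceptical prior scale equal to the
# standard error.
print(prior_predictive_tail(0.5, 0.15, 0.15))
```

A small tail probability signals conflict between the sceptical prior and the data; a large one signals compatibility (an estimate of exactly zero gives a tail probability of 1).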
The Assessment of Intrinsic Credibility and a New Argument for p < 0.005

I am very grateful for the constructive comments made by the referees. I have tried to integrate all suggested changes and have also made some minor additional changes to the manuscript in order to improve clarity. All changes are marked in red in the submitted document.
Response to referee 1

Minor comments:

1. Background and Goals: I found §2 to be clear in describing the background, and §§3-4 to be relatively clear in presenting the main tools; yet the abstract and Introduction are much more convoluted, and are not sufficiently clear about what one will find in the manuscript (nor about the main contributions). I enjoy the motivation in the Introduction, in terms of interest in the problem, but think more concrete details could be given about the main contributions of the manuscript, as well as about why these contributions are important.
I have added a few sentences on the main contributions of the manuscript and their relevance in Section 1.

On Geometrical Principles of Compatibility:
The marginal likelihood resulting from a prior-predictive sceptical prior (π_S) can be regarded as an inner product between π_S and the likelihood (ℓ), and thus it can be interpreted as an angle in the Hilbert space (de Carvalho et al., 2018, in press). I suggest bringing on board some intuition and brief remarks on the geometrical principles of compatibility (understood here as formally defined in Definition 2 and its variants in §3 of the latter paper) so as to enrich the discussion in §3.
Thank you for this comment. I have added a reference to the work by de Carvalho et al (2018) as an alternative approach to investigate the compatibility of prior and data at the beginning of Section 3. I found the paper by de Carvalho et al (2018) very interesting but felt that a deeper discussion of it would perhaps distract the reader from the main contribution of my manuscript.
In addition, I wonder about:

3. Compatibility Between the Sceptical Prior and the Distribution of θ: It seems natural to ask whether compatibility between the posterior (p) and the distribution of θ is of any relevance, or whether one is only concerned with compatibility between π_S and the data?
This is an interesting suggestion, but I am not convinced that this is of direct relevance here.

Thanks, added.
5. p. 17: The comment acknowledging the simple mathematical framework is appropriate. Yet I suspect more could be said about the setting where the likelihood is in the exponential family; certainly not part of this paper, but perhaps worth remarking on.
Thank you, a corresponding comment has been added at the end.