Computing the exact distributions of some functions of the ordered multinomial counts: maximum, minimum, range and sums of order statistics

Starting from seminal neglected work by Rappeport (Rappeport 1968 Algorithms and computational procedures for the application of order statistics to queuing problems. PhD thesis, New York University), we revisit and expand on the exact algorithms to compute the distribution of the maximum, the minimum, the range and the sum of the J largest order statistics of a multinomial random vector under the hypothesis of equiprobability. Our exact results can be useful in all those situations in which the multinomial distribution plays an important role, from goodness-of-fit tests to the study of Poisson processes, with applications spanning from biostatistics to finance. We describe the algorithms, motivate their use in statistical testing and illustrate two applications. We also provide the codes and ready-to-use tables of critical values.

Maximum, Minimum, Range and Sums of Order Statistics" has been accepted for publication in Royal Society Open Science subject to minor revision in accordance with the referee suggestions. Please find the referees' comments at the end of this email.
The reviewers and handling editors have recommended publication, but also suggest some minor revisions to your manuscript. Therefore, I invite you to respond to the comments and revise your manuscript.
• Ethics statement If your study uses humans or animals please include details of the ethical approval received, including the name of the committee that granted approval. For human studies please also detail whether informed consent was obtained. For field studies on animals please include details of all permissions, licences and/or approvals granted to carry out the fieldwork.
• Data accessibility It is a condition of publication that all supporting data are made available either as supplementary information or preferably in a suitable permanent repository. The data accessibility section should state where the article's supporting data can be accessed. This section should also include details, where possible of where to access other relevant research materials such as statistical tools, protocols, software etc can be accessed. If the data has been deposited in an external repository this section should list the database, accession number and link to the DOI for all data from the article that has been made publicly available. Data sets that have been deposited in an external repository and have a DOI should also be appropriately cited in the manuscript and included in the reference list.
If you wish to submit your supporting data or code to Dryad (http://datadryad.org/), or modify your current submission to Dryad, please use the following link: http://datadryad.org/submit?journalID=RSOS&manu=RSOS-190198
• Competing interests Please declare any financial or non-financial competing interests, or state that you have no competing interests.
• Authors' contributions All submissions, other than those with a single author, must include an Authors' Contributions section which individually lists the specific contribution of each author. The list of Authors should meet all of the following criteria; 1) substantial contributions to conception and design, or acquisition of data, or analysis and interpretation of data; 2) drafting the article or revising it critically for important intellectual content; and 3) final approval of the version to be published.
All contributors who do not meet all of these criteria should be included in the acknowledgements.
We suggest the following format: AB carried out the molecular lab work, participated in data analysis, carried out sequence alignments, participated in the design of the study and drafted the manuscript; CD carried out the statistical analyses; EF collected field data; GH conceived of the study, designed the study, coordinated the study and helped draft the manuscript. All authors gave final approval for publication.
• Acknowledgements Please acknowledge anyone who contributed to the study but did not meet the authorship criteria.
• Funding statement Please list the source of funding for each author.
Please ensure you have prepared your revision in accordance with the guidance at https://royalsociety.org/journals/authors/author-guidelines/ --please note that we cannot publish your manuscript without the end statements. We have included a screenshot example of the end statements for reference. If you feel that a given heading is not relevant to your paper, please nevertheless include the heading and explicitly state that it is not relevant to your work.
Because the schedule for publication is very tight, it is a condition of publication that you submit the revised version of your manuscript before 19-Jun-2019. Please note that the revision deadline will expire at 00.00am on this date. If you do not think you will be able to meet this date please let me know immediately.
To revise your manuscript, log into https://mc.manuscriptcentral.com/rsos and enter your Author Centre, where you will find your manuscript title listed under "Manuscripts with Decisions". Under "Actions," click on "Create a Revision." You will be unable to make your revisions on the originally submitted version of the manuscript. Instead, revise your manuscript and upload a new version through your Author Centre.
When submitting your revised manuscript, you will be able to respond to the comments made by the referees and upload a file "Response to Referees" in "Section 6 -File Upload". You can use this to document any changes you make to the original manuscript. In order to expedite the processing of the revised manuscript, please be as specific as possible in your response to the referees. We strongly recommend uploading two versions of your revised manuscript: 1) Identifying all the changes that have been made (for instance, in coloured highlight, in bold text, or tracked changes); 2) A 'clean' version of the new manuscript that incorporates the changes made, but does not highlight them.
When uploading your revised files please make sure that you have:
1) A text file of the manuscript (tex, txt, rtf, docx or doc), references, tables (including captions) and figure captions. Do not upload a PDF as your "Main Document";
2) A separate electronic file of each figure (EPS or print-quality PDF preferred (either format should be produced directly from original creation package), or original software format);
3) Included a 100 word media summary of your paper when requested at submission. Please ensure you have entered correct contact details (email, institution and telephone) in your user account;
4) Included the raw data to support the claims made in your paper. You can either include your data as electronic supplementary material or upload to a repository and include the relevant doi within your manuscript. Make sure it is clear in your data accessibility statement how the data can be accessed;
5) All supplementary materials accompanying an accepted article will be treated as in their final form. Note that the Royal Society will neither edit nor typeset supplementary material and it will be hosted as provided. Please ensure that the supplementary material includes the paper details where possible (authors, article title, journal name).
Supplementary files will be published alongside the paper on the journal website and posted on the online figshare repository (https://rs.figshare.com/). The heading and legend provided for each supplementary file during the submission process will be used to create the figshare page, so please ensure these are accurate and informative so that your files can be found in searches. Files on figshare will be made available approximately one week before the accompanying article so that the supplementary material can be attributed a unique DOI.
Please note that Royal Society Open Science charge article processing charges for all new submissions that are accepted for publication. Charges will also apply to papers transferred to Royal Society Open Science from other Royal Society Publishing journals, as well as papers submitted as part of our collaboration with the Royal Society of Chemistry (http://rsos.royalsocietypublishing.org/chemistry).
If your manuscript is newly submitted and subsequently accepted for publication, you will be asked to pay the article processing charge, unless you request a waiver and this is approved by Royal Society Publishing. You can find out more about the charges at http://rsos.royalsocietypublishing.org/page/charges. Should you have any queries, please contact openscience@royalsociety.org.
Once again, thank you for submitting your manuscript to Royal Society Open Science and I look forward to receiving your revision. If you have any questions at all, please do not hesitate to get in touch.

We have two good reviews. There is lots to do to clean up and some suggestions to follow, which I would strongly encourage you to think about, but nothing that seems to be critical.

RSOS-190198.R1 (Revision)
Review form: Reviewer 1

Is the manuscript scientifically sound in its present form? Yes

Are the interpretations and conclusions justified by the results? Yes
Is the language acceptable? Yes

Recommendation? Accept as is

You can expect to receive a proof of your article in the near future. Please contact the editorial office (openscience_proofs@royalsociety.org and openscience@royalsociety.org) to let us know if you are likely to be away from e-mail contact; if you are going to be away, please nominate a coauthor (if available) to manage the proofing process, and ensure they are copied into your email to the journal.
Due to rapid publication and an extremely tight schedule, if comments are not received, your paper may experience a delay in publication.
Royal Society Open Science operates under a continuous publication model (http://bit.ly/cpFAQ). Your article will be published straight into the next open issue and this will be the final version of the paper. As such, it can be cited immediately by other researchers.
As the issue version of your paper will be the only version to be published I would advise you to check your proofs thoroughly as changes cannot be made once the paper is published.

By: Bonetti, Cirillo and Ogay
The multinomial distribution indeed plays a very important role in modern statistics.
The manuscript provides algorithms for deriving the exact distribution of sums and ranges of the largest multinomial counts order statistics.
I think it is an interesting and very nicely written paper. I really like the idea of replacing chi-squared distributed goodness of fit tests with more powerful tests for which it is more difficult to compute p-values. I also think that the authors do a very nice job presenting the algorithm and motivating the use of these statistics. My main concern is that explicitly deriving the test statistic distributions may not be needed in practice. For example, suppose the goal of the analysis is to test the null hypothesis of equal multinomial probabilities at level $\alpha = 0.05$ using the maximum multinomial count statistic. One option would be to use the authors' code to derive the statistic's null distribution; compute the p-value by summing the null probabilities of the statistic values that are greater than or equal to the statistic value for the observed data; reject the null hypothesis if the p-value is less than 0.05. I think an easier option would be a Monte Carlo simulation that takes several minutes to write and several seconds to run: sample $10^6$ iid null Multinomial vectors; compute the proportion of samples for which the statistic value is greater than or equal to the statistic value for the observed data; reject the null hypothesis if the proportion is less than 0.05. Here is R code that runs in 3 seconds that evaluates the exact distribution of the maximum:

The authors derive the exact distributions of the maximum, the minimum, the range, and the sum of the J largest order statistics of a random vector having an equiprobable multinomial distribution based upon an unpublished work by Michael Arnold Rappeport. Preparing these main results, the authors discuss some approximations and some exact results of the distribution of the maximum, the minimum and the range, respectively. Afterwards, their exact results are compared to those approximations and applied in statistical testing theory.
Finally, the authors illustrate two applications to testing for the homogeneity of a Poisson process and for clustering diseases.
Overall, the manuscript is well structured and well written. After some minor concerns listed below, I recommend accepting this paper for publication.
For example Figures 2 and 3 are illustrated at page 12, but textually mentioned just at page 13.
11) page 19, line 39: "from"
12) The readability of the algorithms given in the Appendix and the comparability to the results could be improved by using the notations from the corresponding section. For example use and instead of and , respectively.

Dear Referees,
First of all, thank you for your kind comments on our work and for your useful suggestions.
As you will see, we have accepted all the changes you proposed and responded to the points you raised.
Below we list all the changes introduced in the paper.
Thanks for your help and best regards.
The Authors.

Reviewer 1
-"My main concern is that explicitly deriving the test statistic distributions may not be needed in practice…" We agree that, nowadays, very good approximations for the distributions of functions of the multinomial counts can be obtained via simulations, as suggested in your report. For most applications, those approximations are probably sufficient. However, from an epistemological point of view, they will always be approximations, and not exact results like those we propose. We feel that our paper contributes to the resolution of an old open problem related to multinomial random variables.
To take into account your comment, we have added the following text just before Section 2(a): Note that, given the increased (and still increasing) computing power one can rely upon today, the probability distributions of functions of the multinomial counts can also be estimated via Monte Carlo simulations. However, even if extremely accurate, from a conceptual point of view they are still approximations and not exact results, as those we will discuss in this article.
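To make the distinction between a Monte Carlo estimate and an exact result concrete, here is a minimal Python sketch for the maximum count under equiprobability. It is neither Reviewer 1's R code nor the algorithm described in the paper: the exact part relies on a standard generating-function identity, P(max ≤ k) = n!/m^n · [x^n] (Σ_{j≤k} x^j/j!)^m, and all function names and parameters (`exact_max_cdf`, `exact_max_pvalue`, `mc_max_pvalue`, `n_sim`, `seed`) are hypothetical, introduced only for illustration. It assumes n ≥ 1 trials and m ≥ 1 cells.

```python
import random
from collections import Counter
from fractions import Fraction
from math import factorial


def exact_max_cdf(n, m, k):
    """Exact P(max count <= k) for n trials over m equiprobable cells.

    Illustrative identity (not necessarily the paper's algorithm):
    P(max <= k) = n! / m**n * [x^n] (sum_{j=0..k} x^j / j!)**m
    """
    if k < 0:
        return Fraction(0)
    # Coefficients of the truncated exponential series, as exact fractions.
    base = [Fraction(1, factorial(j)) for j in range(min(k, n) + 1)]
    poly = [Fraction(1)]  # the constant polynomial 1
    for _ in range(m):
        # Multiply by the truncated series, discarding powers above x^n.
        new = [Fraction(0)] * min(len(poly) + len(base) - 1, n + 1)
        for i, a in enumerate(poly):
            for j, b in enumerate(base):
                if i + j < len(new):
                    new[i + j] += a * b
        poly = new
    coeff = poly[n] if n < len(poly) else Fraction(0)
    return coeff * factorial(n) / Fraction(m) ** n


def exact_max_pvalue(n, m, observed_max):
    """Exact P(max count >= observed_max) under the equiprobable null."""
    return 1 - exact_max_cdf(n, m, observed_max - 1)


def mc_max_pvalue(n, m, observed_max, n_sim=20000, seed=0):
    """Monte Carlo estimate of the same tail probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        # One null multinomial vector: n balls dropped uniformly into m cells.
        counts = Counter(rng.randrange(m) for _ in range(n))
        if max(counts.values()) >= observed_max:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    n, m, observed = 12, 4, 7  # arbitrary illustrative values
    print("exact      :", float(exact_max_pvalue(n, m, observed)))
    print("Monte Carlo:", mc_max_pvalue(n, m, observed, n_sim=5000))
```

The exact value is returned as a `Fraction` and involves no sampling error, whereas the Monte Carlo estimate changes with `seed` and `n_sim`.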
-"In general, for presenting the results of MC probability assessment I suggest adding standard errors…" Regarding your comment about standard errors in the MC simulation, we have added the following text on Page 13, Paragraph 3 (the formula is in LaTeX):

For both figures, standard errors can be computed using the simple formula $\sqrt{\frac{p(1-p)}{3000}}$, where $p$ is the estimated power.
This gives the reader a quick way to compute the standard errors at her own discretion.
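As a small illustration, the formula can be evaluated as follows in Python. This is a sketch only: the function name `power_standard_error` and its parameter `n_rep` are hypothetical, and it assumes that the 3000 in the formula is the number of Monte Carlo replications behind each estimated power value.

```python
from math import sqrt


def power_standard_error(p, n_rep=3000):
    """Binomial standard error sqrt(p * (1 - p) / n_rep) of an estimated power p.

    n_rep defaults to 3000, the denominator in the formula quoted above
    (assumed here to be the number of simulation replications).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n_rep <= 0:
        raise ValueError("n_rep must be positive")
    return sqrt(p * (1.0 - p) / n_rep)


if __name__ == "__main__":
    # Print standard errors for a few illustrative power values.
    for p in (0.05, 0.5, 0.9):
        print(p, power_standard_error(p))
```

The standard error is largest at p = 0.5 and shrinks towards zero as the estimated power approaches 0 or 1.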
-"I also found the comparison between approximation and exact results in Section 7 a little misleading…" Finally, to address your very relevant point about power comparisons, we have clarified the text, by adding the following lines at the end of Section 7: However, one should keep in mind that the observed differences in the performances of the two procedures (exact vs approximation) may also depend upon the accuracy of the calculations of the tail probabilities under the approximation formulas, which may produce type I error probabilities different from the desired ones.
Thanks a lot for your help in clarifying our paper.

Appendix C Reviewer 2
Following your suggestions, we have modified the paper as follows.
For example Figures 2 and 3 are illustrated at page 12, but textually mentioned just at page 13. DONE
11) page 19, line 39: "from" DONE
12) The readability of the algorithms given in the Appendix and the comparability to the results could be improved by using the notations from the corresponding section. For example use and instead of and , respectively. DONE
Thanks a lot for your help in finding even the smallest typo. We appreciate that.