Peer-Review Record

The Flexible Gumbel Distribution: A New Model for Inference about the Mode

by Qingyang Liu (corresponding author), Xianzheng Huang and Haiming Zhou
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 11 February 2024 / Revised: 8 March 2024 / Accepted: 11 March 2024 / Published: 13 March 2024
(This article belongs to the Special Issue Bayes and Empirical Bayes Inference)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper is fairly well written and can be accepted for publication without any further revision/changes.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

In this paper, a new distribution family is introduced, based on a mixture of Gumbel distributions, to fit heavy-tailed data sets. It is interesting to read. My main comments are as follows:

1. In Section 4, "Simulation Study," the parameters were set to $\theta = 0$, $\sigma_1 = 1$, $\sigma_2 = 5$ and $\omega = 0.5$. Do you have a specific reason for selecting these values for the simulation? Additionally, only two sample sizes, $n = 100$ and $n = 200$, are considered. How does the new distribution perform when $n$ is relatively small, say $n = 50$?

2. In Section 5, "An Application in Hydrology," the authors claim that a relatively larger p-value of the K-S test provides evidence that FG distributions fit the real data set better than normal mixture distributions. However, I feel the logic here is flawed, because a relatively larger p-value can only show that there is insufficient evidence to reject the null hypothesis $H_0$: the data come from the specified distribution. It cannot be taken as evidence to ACCEPT $H_0$.

3. Following from comment 2, I recommend including commonly used model-selection criteria, such as AIC and BIC, in Sections 5 and 6 for the real-data examples.
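
For reference on comment 3: both criteria penalize the maximized log-likelihood by the number of free parameters $k$, and smaller values are preferred. Unlike a K-S p-value, they rank candidate models relative to one another. A minimal Python sketch, assuming the maximized log-likelihood of each fitted model is already available (the function and constant names are illustrative, not taken from the paper or its code):

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion, 2k - 2 log L; smaller values are preferred."""
    return 2.0 * k - 2.0 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian (Schwarz) information criterion, k log(n) - 2 log L; smaller is preferred."""
    return k * math.log(n) - 2.0 * log_lik


# Free-parameter counts: the FG distribution has theta, sigma_1, sigma_2 and omega;
# a two-component normal mixture has two means, two standard deviations and one weight.
K_FG = 4
K_NORMAL_MIXTURE = 5
```

Both models must be fitted to the same data set (same $n$) for the comparison to be meaningful; BIC penalizes the extra mixture parameter more heavily than AIC once $n > e^2 \approx 7.4$.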

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This paper proposes a flexible Gumbel distribution that is a mixture (convex combination) of a left and a right Gumbel distribution. This allows the authors to generalize the Gumbel distribution, overcome its fixed skewness and kurtosis, and model data with heavy tails in both directions. The authors include both frequentist and Bayesian methodology, discuss identifiability, and use both simulated data and two real-data applications. The paper is very good for the most part: thorough, intelligent, and well written. I have the following suggestions for improvement, most of which are minor. However, the appendix seems to be missing, so I would like to see it, and I think the authors should expand the paper with some proofs that belong in the appendix (perhaps they are already there).



1) Math issues

 

Skewness 1.44: Wikipedia gives 1.14; please double-check this.
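
For reference, the skewness of the (right-skewed, maximum) Gumbel distribution is $12\sqrt{6}\,\zeta(3)/\pi^3 \approx 1.14$, independent of location and scale; its mirror image, the left Gumbel, has skewness of about $-1.14$. A minimal Python check of that constant, approximating Apéry's constant $\zeta(3)$ by a truncated series (the helper names are illustrative):

```python
import math


def zeta3(terms: int = 100_000) -> float:
    """Approximate Apery's constant zeta(3) by the truncated series sum_{k>=1} 1/k^3."""
    return sum(1.0 / k**3 for k in range(1, terms + 1))


def gumbel_skewness() -> float:
    """Skewness of the maximum Gumbel distribution, 12*sqrt(6)*zeta(3)/pi^3.

    The value does not depend on the location or scale parameter.
    """
    return 12.0 * math.sqrt(6.0) * zeta3() / math.pi**3


print(round(gumbel_skewness(), 4))  # 1.1395
```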

 

“one can show that the third central moment of Y is given by” You should include the proofs of such claims in an appendix.

 

In your Shiny app, you allow loc1 and loc2 to differ, but according to the theory in your paper they should be equal.

 

Identifiability: this is a major claim, so you should state Theorem 1 of the referenced paper and prove (using Theorem 1) that your determinant is non-zero. If I am reading this correctly, it requires just one computation, but you should include it in the manuscript. Also, the Shiny app says Theorem 2, but I think you mean Theorem 1.

 

3.1: you should add a few remarks explaining EM. For example, state that the marginal distributions of $f_{Y,Z}$ at the bottom of page 3 (which should have an equation number) are just $f_1$ and $f_2$; that is the point of EM.

 

You should also outline a few steps of ECM.

 

3.2: Beta(1,1) is just the uniform distribution on (0, 1), so why write it this way?
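
The point follows directly from the Beta density with shape parameters $a = b = 1$:

$$
f(x \mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, x^{a-1}(1-x)^{b-1},
\qquad
f(x \mid 1, 1) = \frac{\Gamma(2)}{\Gamma(1)\,\Gamma(1)}\, x^{0}(1-x)^{0} = 1, \quad 0 < x < 1.
$$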

 

“Numbers in parentheses are 100 Monte Carlo standard errors associated with the Averages.” Why is there a Monte Carlo standard error for the frequentist method? Perhaps I am misunderstanding what that means.
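
For context on this comment: in simulation studies, a Monte Carlo standard error usually measures simulation noise in an average taken over replicate data sets, so it can be reported for frequentist and Bayesian estimators alike. A minimal sketch, assuming the reported quantity is the standard error of an average over $R$ replicate estimates (the replicate count, sample size and toy estimator below are hypothetical, not the paper's settings):

```python
import math
import random
import statistics


def monte_carlo_se(estimates: list[float]) -> float:
    """Standard error of the average of R replicate estimates: s / sqrt(R)."""
    return statistics.stdev(estimates) / math.sqrt(len(estimates))


# Hypothetical illustration: R = 500 simulated data sets of size n = 100,
# where the "estimator" is simply the sample mean of standard normal draws.
random.seed(1)
R, n = 500, 100
estimates = [
    statistics.fmean(random.gauss(0.0, 1.0) for _ in range(n)) for _ in range(R)
]
average = statistics.fmean(estimates)
print(average, monte_carlo_se(estimates))
```

Increasing $R$ shrinks the Monte Carlo standard error at rate $1/\sqrt{R}$ without changing the sampling variability of the estimator itself.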

 

Section 6: are $\sigma_1$ and $\sigma_2$ fixed? If so, specify their values. If they are estimated, include their estimates in the tables.



2) Discussion/misc issues:

 

“two of which we demonstrate in the Appendix” I don’t see an appendix.

 

The DC poverty rate is displayed in Figure 3 as the third highest, which surprised me. According to Wikipedia, the DC poverty rate is the 9th highest (8th highest excluding Puerto Rico): https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_poverty_rate

 

You should also comment explicitly (in the text, not just in the figure) that DC has both the highest murder rate and the highest college attainment rate, since both facts are striking and central to its outlier status. You could also add some discussion to the caption of Figure 3.

 

References: some include DOIs and others do not; I suggest including them everywhere.

 

Please include a README file in your GitHub repository.



3) Minor English issues

 

“A regression model concerning the mode of a response given covariates based on the proposed unimodal distribution can be easily formulated, which we apply to data from an application in criminology to reveal interesting data features that are obscured by outliers.” This is a run-on sentence, and there are other run-on sentences in the paper.

 

“The mean, median, and mode are three most commonly used measure of central tendency of data” -> “The mean, median, and mode are the three most commonly used measures of central tendency of data”

 

“There are two main reasons contributing to the long-lasting trend of opting to semi-/non-parametric methods for mode estimation, despite the fact that inference procedures proposed along these veins are usually less straightforward to implement (e.g., involving bandwidth selection), and less efficient than their parametric counterparts” This sentence has several unnecessary words; e.g., you could cut “opting to”, and you could replace “inference procedures proposed along these veins” with just “these procedures”. There are other analogous places where the writing could be slightly improved.

 

“distirbution,” typo

 

“From National Water Information System” -> “From the National Water Information System”

 

“may be nearly symmetry” -> “may be nearly symmetrical”

 

Comments on the Quality of English Language

Minor issues listed above

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The authors answered all my questions. I have no more comments.  

Author Response

Thank you for taking the time to review our manuscript and for your valuable feedback. We are pleased to hear that the responses provided have addressed all of your questions and concerns satisfactorily. Your acknowledgment of our efforts is greatly appreciated.

Reviewer 3 Report

Comments and Suggestions for Authors

I have not had much time to check the manuscript carefully. However, it does look like the authors did a good job of addressing my suggestions.

That said, please give the manuscript a careful read. For example, the authors sometimes write DC and sometimes D.C.; this should be standardized, and the latter is better. I also suggest that the authors use data more recent than 2003 for the crime data section, but that is optional.

 

Comments on the Quality of English Language

Please check carefully.

Author Response


Dear Reviewer,

Thank you for your thorough review of our manuscript. We appreciate your attention to detail and your valuable suggestions for improvement.

Regarding the inconsistency in the abbreviation "DC" versus "D.C.", we have carefully revised the manuscript to standardize the usage, opting for "D.C." throughout for consistency.

Regarding the suggestion to use more recent data for our crime data section, we acknowledge the importance of staying current. However, we were unable to find data sets for more recent years with the same covariates, such as education, poverty, and metropolitan ratio. We have therefore retained the 2003 data set for consistency and coherence in our analysis.

We hope these revisions address your concerns adequately. Please let us know if there are any further adjustments or clarifications needed.

Thank you again for your insightful feedback.

Sincerely,
Qingyang Liu
Xianzheng Huang
Haiming Zhou
