Confidence Intervals for Negative Binomial Random Variables of High Dispersion
This paper considers the problem of constructing confidence intervals for the mean of a Negative Binomial random variable based upon sampled data. When the sample size is large, it is a common practice to rely upon a Normal distribution approximation to construct these intervals. However, we demonstrate that the sample mean of highly dispersed Negative Binomials exhibits a slow convergence in distribution to the Normal as a function of the sample size. As a result, standard techniques (such as the Normal approximation and bootstrap) will construct confidence intervals for the mean that are typically too narrow and significantly undercover in the case of high dispersion. To address this problem, we propose methods based upon Bernsteins inequality along with the Gamma and Chi Square distributions as alternatives to the standard methods when the sample size is small and the dispersion is high. A confidence interval based upon Bernsteins inequality relies upon less stringent assumptions than those required of parametric models. Moreover, we prove a limit theorem demonstrating that the sample mean of Negative Binomials converges in distribution to a Gamma random variable under suitable hypotheses, and we use this observation to construct approximate confidence intervals. Furthermore, we investigate the applicability of the Chi Square distribution as a special case of the Gamma model. We then undertake a variety of simulation experiments to compare the proposed methods to standard techniques in terms of empirical coverage and provide concrete recommendations for the settings in which particular intervals are preferred. We also apply the proposed methods to examples arising in the serial analysis of gene expression and traffic flow in a communications network to illustrate both the strengths and weaknesses of these procedures along with those of standard techniques.