This function fits a normal distribution to a data set using a minimum Hellinger distance approach and
then tests the hypothesis that the data come from a normal distribution.
The data are supplied by the user in a numeric vector. The length of the vector determines the number of data values.
NGauss
The number of subintervals for the Gauss-Legendre integration techniques is controlled by NGauss. A default value of 100 is used. A minimum of 25 is enforced.
MaxIter
The maximum number of iterations that can occur in evaluating the minimum Hellinger distance is controlled by MaxIter. A default of 25 is used. A minimum of 1 is enforced.
InitLocation
An optional initial location estimate can be defined using InitLocation. The data median is the default value.
InitScale
An optional initial scale estimate can be defined using InitScale. The data median absolute deviation is the default value.
EpsLoc
The epsilon (in data units) below which the iterative minimization approach declares convergence in the location estimate is controlled by EpsLoc. EpsLoc should be set to give approximately 5 digits of accuracy in the location estimate. A default value of 0.0001 is used.
EpsSca
The epsilon (in data units) below which the iterative minimization approach declares convergence in the scale estimate is controlled by EpsSca. EpsSca should be set to give approximately 5 digits of accuracy in the scale estimate. A default value of 0.0001 is used.
Silent
A value of FALSE for Silent writes several results to the R console. Use Silent=TRUE to eliminate the output.
Small
A value of FALSE for Small returns a list of 11 objects. Use Small=TRUE to return a shorter list containing only the Hellinger distance and the p-value.
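The default starting values described for InitLocation and InitScale can be reproduced in base R, which may help when choosing explicit values. This is a minimal sketch, not the package's code: the variable names are illustrative, and whether the package applies the 1.4826 consistency constant that R's mad() uses by default is an assumption not confirmed by this page.

```r
# Illustrative data; any numeric vector would do
set.seed(42)
x <- rnorm(30, mean = 10, sd = 2)

# Default location start described above: the data median
init_location <- median(x)

# Default scale start described above: the median absolute deviation.
# R's mad() multiplies by 1.4826 by default so that it estimates the
# standard deviation for normal data (assumed here, not confirmed).
init_scale <- mad(x)

# Unscaled median absolute deviation, for comparison
init_scale_raw <- mad(x, constant = 1)

print(c(location = init_location, scale = init_scale, raw = init_scale_raw))
```

Values computed this way could then be passed as InitLocation and InitScale if different starting points are wanted.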
Details
Let f(x) and g(x) be absolutely continuous probability density functions. The square of the
Hellinger distance can be written as H^2 = 1 - \int \sqrt{f(x) g(x)} dx. For this package, f(x)
denotes a member of the family of normal densities and g(x) is a data-based density obtained by using the Epanechnikov
kernel. The kernel has the form w(z) = 0.75(1 - z^2) for -1 < z < 1 and 0 elsewhere. Let the
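n sample data be denoted by X_1, ..., X_n. The data-based kernel density at any
point y is calculated from the usual kernel estimator g(y) = (1/(n h)) \sum_{i=1}^{n} w((y - X_i)/h), where h > 0 is the kernel bandwidth.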
A Newton-Raphson method with analytical derivatives is used to determine the minimum Hellinger distance.
Numerical integration is done using a 6-point composite Gauss-Legendre technique.
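
The quantities described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the function names (epan, kde, gauss_legendre, hellinger_sq), the bandwidth h and its value, and the integration limits are all hypothetical choices, and optim() with Nelder-Mead is swapped in for the Newton-Raphson iteration with analytical derivatives.

```r
# Epanechnikov kernel: w(z) = 0.75 (1 - z^2) on (-1, 1), 0 elsewhere
epan <- function(z) ifelse(abs(z) < 1, 0.75 * (1 - z^2), 0)

# Kernel density estimate at points y, for data x and bandwidth h
# (h is an assumed name; the package's bandwidth rule is not reproduced)
kde <- function(y, x, h) {
  sapply(y, function(yy) mean(epan((yy - x) / h)) / h)
}

# 6-point Gauss-Legendre nodes and weights on [-1, 1]
gl_nodes <- c(-0.9324695142031521, -0.6612093864662645, -0.2386191860831969,
               0.2386191860831969,  0.6612093864662645,  0.9324695142031521)
gl_weights <- c(0.1713244923791704, 0.3607615730481386, 0.4679139345726910,
                0.4679139345726910, 0.3607615730481386, 0.1713244923791704)

# Composite rule: apply the 6-point rule on n_sub equal subintervals of [a, b]
gauss_legendre <- function(fun, a, b, n_sub = 100) {
  edges <- seq(a, b, length.out = n_sub + 1)
  half  <- diff(edges) / 2                       # half-width of each subinterval
  mid   <- (edges[-1] + edges[-(n_sub + 1)]) / 2 # midpoint of each subinterval
  total <- 0
  for (k in seq_len(n_sub)) {
    total <- total + half[k] * sum(gl_weights * fun(mid[k] + half[k] * gl_nodes))
  }
  total
}

# Squared Hellinger distance H^2 = 1 - integral of sqrt(f * g)
hellinger_sq <- function(x, mu, sigma, h, n_sub = 100) {
  integrand <- function(t) sqrt(dnorm(t, mu, sigma) * kde(t, x, h))
  # The kernel estimate is zero outside [min(x) - h, max(x) + h]
  1 - gauss_legendre(integrand, min(x) - h, max(x) + h, n_sub)
}

set.seed(1)
x <- rnorm(50)
h <- 0.9 * mad(x)  # illustrative bandwidth only

# Distance at the median / MAD starting values
h2_start <- hellinger_sq(x, median(x), mad(x), h)

# Stand-in minimizer (Nelder-Mead, not Newton-Raphson); the scale is
# optimized on the log scale so that it stays positive
fit <- optim(c(median(x), log(mad(x))),
             function(p) hellinger_sq(x, p[1], exp(p[2]), h))
print(c(start = h2_start, minimized = fit$value,
        location = fit$par[1], scale = exp(fit$par[2])))
```

A derivative-free optimizer keeps the sketch short; a Newton-Raphson scheme would additionally need the first and second derivatives of the integral with respect to location and scale.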
Value
Values returned in a list include the following items:
Minimized Hellinger distance
p-value for the minimized distance
Initial location used in the iterative solution
Initial scale used in the iterative solution
Final location estimate
Final scale estimate
Sample size
Kernel density bandwidth parameter
Vector of x values used in the integration for the Hellinger distance
Vector of nonparametric density values at the x values used in the integration
Vector of normal density values for the estimated location and scale at the x values used in integration
Author(s)
Paul W. Eslinger and Heather M. Orr
References
Epanechnikov, VA. 1969. "Non-Parametric Estimation of a Multivariate Probability Density."
Theory of Probability and its Applications 14(1):153-156. doi http://dx.doi.org/10.1137/1114019
Hellinger, E. 1909. "Neue Begründung Der Theorie Quadratischer Formen Von Unendlichvielen
Veränderlichen." Journal für die reine und angewandte Mathematik 136:210-271.
doi http://dx.doi.org/10.1515/crll.1909.136.210
Examples
## example with a normal data set
mhde.test(rnorm(20,0.0,1.0),Small=TRUE)
## example with a uniform data set including example plot
MyList <- mhde.test(runif(25,min=2,max=4))
mhde.plot(MyList)