fitDistr
R Documentation
Fitting distributions to observations/Monte Carlo simulations
Description
This function fits 21 different continuous distributions by (weighted) nonlinear least squares (NLS) to the histogram or kernel density of Monte Carlo simulation results, as obtained by propagate, or of any other vector containing a large number of observations. The fits are then sorted by ascending AIC.
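The core idea can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the package code: base R's nls() stands in for nls.lm, only two candidate densities (normal and logistic) are fitted instead of 21, and the candidates are ranked with base R's AIC().

```r
# Minimal sketch (not the package implementation): fit candidate densities
# to histogram bin densities by NLS and rank them by ascending AIC.
set.seed(123)
x <- rnorm(10000, mean = 5, sd = 2)        # stand-in for Monte Carlo output
h <- hist(x, breaks = 100, plot = FALSE)   # bin midpoints and densities
dat <- data.frame(mids = h$mids, dens = h$density)

# Unweighted NLS fits with moment-based starting values
fits <- list(
  normal = nls(dens ~ dnorm(mids, mean = mu, sd = sigma), data = dat,
               start = list(mu = mean(x), sigma = sd(x))),
  logistic = nls(dens ~ dlogis(mids, location = loc, scale = s), data = dat,
                 start = list(loc = mean(x), s = sd(x) * sqrt(3) / pi))
)

# Lowest AIC first
aic <- sort(sapply(fits, AIC))
print(aic)
```

The logistic starting scale uses the relation sd = s * pi / sqrt(3), so both fits start close to their optimum.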
Distributions 3) and 16) to 19) are sometimes hard to fit because their starting values cannot readily be deduced from the kernel density estimate, or because some parameters are highly sensitive to shape changes. For these five cases, a grid of starting values of different magnitudes is tried, and the parameter combination with the lowest residual sum-of-squares is kept (a "brute force" approach).
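A brute-force start search of this kind might look as follows. This is a sketch only: the gamma distribution, the grid values and the use of nls() with algorithm = "port" are assumptions chosen for illustration, not the grid or distributions the package actually uses.

```r
# Sketch of a brute-force search over starting values (hypothetical grid).
set.seed(1)
x <- rgamma(20000, shape = 3, rate = 0.5)   # stand-in simulation output
h <- hist(x, breaks = 100, plot = FALSE)
dat <- data.frame(mids = h$mids, dens = h$density)

# Starting values spanning several orders of magnitude
grid <- expand.grid(shape = 10^(-1:2), rate = 10^(-2:1))

best <- NULL
for (i in seq_len(nrow(grid))) {
  fit <- tryCatch(
    nls(dens ~ dgamma(mids, shape = shape, rate = rate), data = dat,
        start = as.list(grid[i, ]), algorithm = "port",
        lower = c(1e-8, 1e-8)),       # keep both parameters positive
    error = function(e) NULL)         # discard starts that fail
  # Keep the fit with the lowest residual sum-of-squares
  if (!is.null(fit) && (is.null(best) || deviance(fit) < deviance(best)))
    best <- fit
}
print(coef(best))
```

For nls objects, deviance() returns the residual sum-of-squares, which is the selection criterion described above.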
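The goodness-of-fit (GOF) is calculated with AIC from the (weighted) log-likelihood of the fit:
$$\ln L = 0.5 \cdot \left( -N \left( \ln(2\pi) + 1 - \ln N + \ln \sum_{i=1}^{N} w_i x_i^2 \right) + \sum_{i=1}^{N} \ln w_i \right)$$
$$\mathrm{AIC} = 2k - 2 \ln L$$
where x_i are the residuals from the NLS fit, N is the length of the residual vector, k is the number of parameters of the fitted model and w_i are the weights.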
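The weighted log-likelihood and AIC can be computed directly from residuals and weights. In this sketch, the helper weighted_aic() is a hypothetical name, and the unweighted case is cross-checked against base R's logLik() for an nls fit, which uses the same Gaussian log-likelihood.

```r
# Hypothetical helper: weighted log-likelihood and AIC from
# residuals x, weights w and number of fitted parameters k.
weighted_aic <- function(x, w = rep(1, length(x)), k) {
  N <- length(x)
  lnL <- 0.5 * (-N * (log(2 * pi) + 1 - log(N) + log(sum(w * x^2))) +
                  sum(log(w)))
  c(logLik = lnL, AIC = 2 * k - 2 * lnL)
}

set.seed(42)
x <- rnorm(5000, mean = 1, sd = 0.5)
h <- hist(x, breaks = 50, plot = FALSE)
dat <- data.frame(mids = h$mids, dens = h$density)
fit <- nls(dens ~ dnorm(mids, mean = mu, sd = sigma), data = dat,
           start = list(mu = mean(x), sigma = sd(x)))

res <- dat$dens - fitted(fit)   # raw residuals of the NLS fit
gof <- weighted_aic(res, k = length(coef(fit)))
print(gof)
```

Base R's AIC() for nls objects also counts the residual variance as a parameter, so it returns a value exactly 2 higher than the formula above with k = number of model parameters; rankings are unaffected as long as every model is scored the same way.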
In contrast to some other distribution-fitting software (e.g. EasyFit by MathWave) that uses the residual sum-of-squares or the Anderson-Darling or Kolmogorov-Smirnov statistic as GOF measure, AIC accounts for the number of parameters in the distribution fit and therefore penalizes overfitting. Hence, this approach is closer to that of ModelRisk (Vose Software) and to the one employed in fitdistr of the 'MASS' package.
Another application is to identify a plausible distribution for raw data before running Monte Carlo simulations from that distribution. However, a sufficiently large number of observations is needed to obtain a realistic estimate of the underlying distribution. See 'Examples'.
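The identify-then-simulate workflow can be sketched as below, assuming hypothetical log-normal raw data and a single log-normal candidate; in practice several candidates would be fitted and compared by AIC first.

```r
# Hypothetical raw observations (stand-in for real measurements)
set.seed(7)
obs <- rlnorm(2000, meanlog = 0, sdlog = 0.4)
h <- hist(obs, breaks = 40, plot = FALSE)
dat <- data.frame(mids = h$mids, dens = h$density)

# Fit a log-normal density to the histogram; start from log-moments
fit <- nls(dens ~ dlnorm(mids, meanlog = ml, sdlog = sl), data = dat,
           start = list(ml = mean(log(obs)), sl = sd(log(obs))))

# Monte Carlo samples drawn from the fitted distribution
cf <- coef(fit)
sim <- rlnorm(1e5, meanlog = cf[["ml"]], sdlog = cf[["sl"]])
print(summary(sim))
```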
The code for the density functions is in file "distr-densities.R".
IMPORTANT: It can be advisable to set weights = TRUE in order to give more weight to bins with low counts. See 'Examples'.
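The effect of weighting can be illustrated with base R's nls() weights argument. The weighting scheme here, 1/counts with empty bins treated as a count of 1, is an assumption for illustration and may differ from what weights = TRUE does internally.

```r
set.seed(99)
x <- rnorm(10000, mean = 0, sd = 1)
h <- hist(x, breaks = 100, plot = FALSE)
dat <- data.frame(mids = h$mids, dens = h$density,
                  w = 1 / pmax(h$counts, 1))   # assumed weights: 1/counts
start <- list(mu = mean(x), sigma = sd(x))

# Unweighted fit: dominated by the high-count bins near the mode
fit_u <- nls(dens ~ dnorm(mids, mu, sigma), data = dat, start = start)
# Weighted fit: low-count (tail) bins gain relative influence
fit_w <- nls(dens ~ dnorm(mids, mu, sigma), data = dat, start = start,
             weights = dat$w)
print(rbind(unweighted = coef(fit_u), weighted = coef(fit_w)))
```

Down-weighting the high-count bins roughly equalizes each bin's contribution, since the variance of a bin count grows with the count itself.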
ALSO: Distribution fitting is highly sensitive to the number of histogram bins, so it is advisable to vary this parameter and check whether the ranking of the fitted distributions remains stable!
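A simple stability check refits the candidates for several bin numbers and records the AIC winner each time. The bin numbers and the two candidates (normal and gamma) are hypothetical choices for this sketch.

```r
set.seed(2024)
x <- rgamma(5000, shape = 4, rate = 1)   # hypothetical simulation output
bins <- c(20, 50, 100, 200)

best_by_bins <- sapply(bins, function(nb) {
  h <- hist(x, breaks = nb, plot = FALSE)
  dat <- data.frame(mids = h$mids, dens = h$density)
  m <- mean(x); v <- var(x)               # moment-based starting values
  fits <- list(
    normal = nls(dens ~ dnorm(mids, mu, sigma), data = dat,
                 start = list(mu = m, sigma = sqrt(v))),
    gamma  = nls(dens ~ dgamma(mids, shape = a, rate = b), data = dat,
                 start = list(a = m^2 / v, b = m / v))
  )
  names(which.min(sapply(fits, AIC)))     # name of the lowest-AIC model
})
names(best_by_bins) <- bins
print(best_by_bins)
```

If the winner changes with the bin number, the ranking should not be trusted without further inspection.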
Value
A list with the following items:
aic: the data frame of AIC values, sorted in ascending order.
fit: a list of the results from nls.lm for each distribution model.
bestfit: the best model in terms of lowest AIC.
fitted: the fitted values from the best model.
residuals: the residuals from the best model.
Author(s)
Andrej-Nikolai Spiess
References
Johnson NL, Kotz S and Balakrishnan N. Continuous Univariate Distributions, Volume 1. Wiley Series in Probability and Statistics, 2nd ed. (2004).