This package is based on notation, definitions, and results of Goerg (2011,
2015, 2016). I will not include these references in the description of
each single function.
Lambert W \times F distributions are a general framework to model and
transform skewed, heavy-tailed data. Lambert W \times F random
variables (RV) are based on an input/ouput system with input RV X
sim F_X(x mid oldsymbol β) and output Y, which is a
non-linearly transformed version of X – with similar properties to X,
but slightly skewed and/or heavy-tailed. Then Y has a 'Lambert W
\times F_X' distribution - see References.
get_distnames lists all implemented Lambert W \times F
distributions in this package. If you want to generate a
skewed/heavy-tailed version of a distribution that is not implemented,
you can use the do-it-yourself modular toolkit
(create_LambertW_input and
create_LambertW_output). It allows users to quickly
implement their own Lambert W x 'MyFavoriteDistribution' and use it in
their analysis right away.
This package contains several functions to analyze skewed and heavy-tailed
data: simulate random samples (rLambertW), evaluate pdf and
cdf (dLambertW and pLambertW), estimate
parameters (IGMM and MLE_LambertW), compute
quantiles (qLambertW), and plot/print results nicely
(plot.LambertW_fit, print.LambertW_fit,
summary.LambertW_fit).
Probably the most useful function is Gaussianize, which works
similarly to scale, but makes your data Gaussian (not
just centers and scales it, but also makes it symmetric and removes
excess kurtosis).
If you use this package in your work please cite it
(citation("LambertW")). You can also send me an implementation of
your 'Lambert W \times YourFavoriteDistribution' to add to the
LambertW package (and I will reference your work introducing your
'Lambert W \times YourFavoriteDistribution' here.)
Feel free to contact me for comments, suggestions, code improvements,
implementation of new input distributions, bug reports, etc.
Author(s)
Author and maintainer: Georg M. Goerg (im (at) gmge.org)
References
Goerg, G.M. (2011). “Lambert W Random Variables - A New Family of
Generalized Skewed Distributions with Applications to Risk
Estimation”. Annals of Applied Statistics, 5 (3), 2197-2230.
(http://arxiv.org/abs/0912.4554).
Goerg, G.M. (2015). “The Lambert Way to Gaussianize heavy-tailed data
with the inverse of Tukey's h transformation as a special case”. The
Scientific World Journal: Probability and Statistics with Applications in
Finance and Economics. Available at
http://www.hindawi.com/journals/tswj/aa/909231/.
Goerg, G.M. (2016). “Rebuttal of the “Letter to the Editor of
Annals of Applied Statistics” on Lambert W x F distributions and the
IGMM algorithm”. Available on arxiv.
Examples
# Replicate parts of the analysis in Goerg (2011)
data(AA)
y <- AA[AA$sex=="f", "bmi"]
test_normality(y)
fit.gmm <- IGMM(y, type = "s")
summary(fit.gmm) # gamma is significant and positive
plot(fit.gmm)
# Compare empirical to theoretical moments (given parameter estimates)
moments.theory <-
mLambertW(theta = list(beta = fit.gmm$tau[c("mu_x", "sigma_x")],
gamma = fit.gmm$tau["gamma"]),
distname = "normal")
TAB <- rbind(unlist(moments.theory),
c(mean(y), sd(y), skewness(y), kurtosis(y)))
rownames(TAB) <- c("Theoretical (IGMM)", "Empirical")
TAB
x <- get_input(y, fit.gmm$tau)
test_normality(x) # input is normal -> fit a Lambert W x Gaussian by MLE
fit.ml <- MLE_LambertW(y, type = "s", distname = "normal", hessian = TRUE)
summary(fit.ml)
plot(fit.ml)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(LambertW)
Loading required package: MASS
Loading required package: ggplot2
This is 'LambertW' version 0.6.4. Please see the NEWS file and citation("LambertW").
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/LambertW/LambertW-package.Rd_%03d_medium.png", width=480, height=480)
> ### Name: LambertW-package
> ### Title: R package for Lambert W \times F distributions
> ### Aliases: LambertW LambertW-package
> ### Keywords: package
>
> ### ** Examples
>
>
> # Replicate parts of the analysis in Goerg (2011)
> data(AA)
> y <- AA[AA$sex=="f", "bmi"]
> test_normality(y)
$seed
[1] 975141
$shapiro.wilk
Shapiro-Wilk normality test
data: data.test
W = 0.97251, p-value = 0.03454
$shapiro.francia
Shapiro-Francia normality test
data: data.test
W = 0.96986, p-value = 0.02306
$anderson.darling
Anderson-Darling normality test
data: data
A = 0.49478, p-value = 0.2104
>
> fit.gmm <- IGMM(y, type = "s")
> summary(fit.gmm) # gamma is significant and positive
Call: IGMM(y = y, type = "s")
Estimation method: IGMM
Input distribution: Any distribution with finite mean & variance and skewness = 0.
Parameter estimates:
Note: standard errors are only asymptotic, simulation based.
If you want more accurate estimates see ?bootstrap . Estimate Std. Error t value Pr(>|t|)
mu_x 21.735200 0.100000 217.3520 < 2e-16 ***
sigma_x 2.569731 0.070711 36.3415 < 2e-16 ***
gamma 0.099291 0.040000 2.4823 0.01305 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
--------------------------------------------------------------
a b
Support 12.21418 Inf
Data range 16.75000 31.93
p_m1 = P(non-principal branch affects solution): NA
--------------------------------------------------------------
Given these input parameter estimates the moments of the output random variable are
(assuming Gaussian input):
mu_y = 21.99; sigma_y = 2.63; skewness = 0.6; kurtosis = 3.61.
> plot(fit.gmm)
>
> # Compare empirical to theoretical moments (given parameter estimates)
> moments.theory <-
+ mLambertW(theta = list(beta = fit.gmm$tau[c("mu_x", "sigma_x")],
+ gamma = fit.gmm$tau["gamma"]),
+ distname = "normal")
> TAB <- rbind(unlist(moments.theory),
+ c(mean(y), sd(y), skewness(y), kurtosis(y)))
> rownames(TAB) <- c("Theoretical (IGMM)", "Empirical")
> TAB
mean sd skewness.gamma kurtosis.gamma
Theoretical (IGMM) 21.99161 2.633412 0.6006551 3.608928
Empirical 21.98920 2.640028 0.6933951 4.176055
>
> x <- get_input(y, fit.gmm$tau)
> test_normality(x) # input is normal -> fit a Lambert W x Gaussian by MLE
$seed
[1] 539521
$shapiro.wilk
Shapiro-Wilk normality test
data: data.test
W = 0.99443, p-value = 0.9576
$shapiro.francia
Shapiro-Francia normality test
data: data.test
W = 0.99339, p-value = 0.8428
$anderson.darling
Anderson-Darling normality test
data: data
A = 0.15246, p-value = 0.9583
>
> fit.ml <- MLE_LambertW(y, type = "s", distname = "normal", hessian = TRUE)
> summary(fit.ml)
Call: MLE_LambertW(y = y, distname = "normal", type = "s", hessian = TRUE)
Estimation method: MLE
Input distribution: normal
Parameter estimates:
Estimate Std. Error t value Pr(>|t|)
mu 21.740830 0.273444 79.5074 < 2e-16 ***
sigma 2.555770 0.187541 13.6278 < 2e-16 ***
gamma 0.096230 0.038781 2.4813 0.01309 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
--------------------------------------------------------------
a b
Support 11.96984 Inf
Data range 16.75000 31.93
p_m1 = P(non-principal branch affects solution): 1.34421e-25
--------------------------------------------------------------
Given these input parameter estimates the moments of the output random variable are
(assuming Gaussian input):
mu_y = 21.99; sigma_y = 2.62; skewness = 0.58; kurtosis = 3.57.
> plot(fit.ml)
>
>
>
>
>
>
> dev.off()
null device
1
>