Last data update: 2014.03.03

R: R package for Lambert W \times F distributions
LambertW-packageR Documentation

R package for Lambert W \times F distributions

Description

This package is based on notation, definitions, and results of Goerg (2011, 2015, 2016). I will not include these references in the description of each single function.

Lambert W \times F distributions are a general framework to model and transform skewed, heavy-tailed data. Lambert W \times F random variables (RV) are based on an input/ouput system with input RV X sim F_X(x mid oldsymbol β) and output Y, which is a non-linearly transformed version of X – with similar properties to X, but slightly skewed and/or heavy-tailed. Then Y has a 'Lambert W \times F_X' distribution - see References.

get_distnames lists all implemented Lambert W \times F distributions in this package. If you want to generate a skewed/heavy-tailed version of a distribution that is not implemented, you can use the do-it-yourself modular toolkit (create_LambertW_input and create_LambertW_output). It allows users to quickly implement their own Lambert W x 'MyFavoriteDistribution' and use it in their analysis right away.

This package contains several functions to analyze skewed and heavy-tailed data: simulate random samples (rLambertW), evaluate pdf and cdf (dLambertW and pLambertW), estimate parameters (IGMM and MLE_LambertW), compute quantiles (qLambertW), and plot/print results nicely (plot.LambertW_fit, print.LambertW_fit, summary.LambertW_fit).

Probably the most useful function is Gaussianize, which works similarly to scale, but makes your data Gaussian (not just centers and scales it, but also makes it symmetric and removes excess kurtosis).

If you use this package in your work please cite it (citation("LambertW")). You can also send me an implementation of your 'Lambert W \times YourFavoriteDistribution' to add to the LambertW package (and I will reference your work introducing your 'Lambert W \times YourFavoriteDistribution' here.)

Feel free to contact me for comments, suggestions, code improvements, implementation of new input distributions, bug reports, etc.

Author(s)

Author and maintainer: Georg M. Goerg (im (at) gmge.org)

References

Goerg, G.M. (2011). “Lambert W Random Variables - A New Family of Generalized Skewed Distributions with Applications to Risk Estimation”. Annals of Applied Statistics, 5 (3), 2197-2230. (http://arxiv.org/abs/0912.4554).

Goerg, G.M. (2015). “The Lambert Way to Gaussianize heavy-tailed data with the inverse of Tukey's h transformation as a special case”. The Scientific World Journal: Probability and Statistics with Applications in Finance and Economics. Available at http://www.hindawi.com/journals/tswj/aa/909231/.

Goerg, G.M. (2016). “Rebuttal of the “Letter to the Editor of Annals of Applied Statistics” on Lambert W x F distributions and the IGMM algorithm”. Available on arxiv.

Examples

 
# Replicate parts of the analysis in Goerg (2011)
data(AA)
y <- AA[AA$sex=="f", "bmi"]
test_normality(y)

fit.gmm <- IGMM(y, type = "s")
summary(fit.gmm)  # gamma is significant and positive
plot(fit.gmm)

# Compare empirical to theoretical moments (given parameter estimates)
moments.theory <- 
 mLambertW(theta = list(beta = fit.gmm$tau[c("mu_x", "sigma_x")], 
                        gamma = fit.gmm$tau["gamma"]), 
           distname = "normal")
TAB <- rbind(unlist(moments.theory),
             c(mean(y), sd(y), skewness(y), kurtosis(y)))
rownames(TAB) <- c("Theoretical (IGMM)", "Empirical")
TAB

x <- get_input(y, fit.gmm$tau)
test_normality(x) # input is normal -> fit a Lambert W x Gaussian by MLE

fit.ml <- MLE_LambertW(y, type = "s", distname = "normal", hessian = TRUE)
summary(fit.ml)
plot(fit.ml)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(LambertW)
Loading required package: MASS
Loading required package: ggplot2
This is 'LambertW' version 0.6.4.  Please see the NEWS file and citation("LambertW").

> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/LambertW/LambertW-package.Rd_%03d_medium.png", width=480, height=480)
> ### Name: LambertW-package
> ### Title: R package for Lambert W \times F distributions
> ### Aliases: LambertW LambertW-package
> ### Keywords: package
> 
> ### ** Examples
> 
>  
> # Replicate parts of the analysis in Goerg (2011)
> data(AA)
> y <- AA[AA$sex=="f", "bmi"]
> test_normality(y)
$seed
[1] 975141

$shapiro.wilk

	Shapiro-Wilk normality test

data:  data.test
W = 0.97251, p-value = 0.03454


$shapiro.francia

	Shapiro-Francia normality test

data:  data.test
W = 0.96986, p-value = 0.02306


$anderson.darling

	Anderson-Darling normality test

data:  data
A = 0.49478, p-value = 0.2104


> 
> fit.gmm <- IGMM(y, type = "s")
> summary(fit.gmm)  # gamma is significant and positive
Call: IGMM(y = y, type = "s")
Estimation method: IGMM
Input distribution: Any distribution with finite mean & variance and skewness = 0.

 Parameter estimates:
 Note: standard errors are only asymptotic, simulation based.
 If you want more accurate estimates see ?bootstrap .         Estimate  Std. Error  t value Pr(>|t|)    
mu_x    21.735200    0.100000 217.3520  < 2e-16 ***
sigma_x  2.569731    0.070711  36.3415  < 2e-16 ***
gamma    0.099291    0.040000   2.4823  0.01305 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-------------------------------------------------------------- 
                  a     b
Support    12.21418   Inf
Data range 16.75000 31.93

 p_m1 = P(non-principal branch affects solution): NA
-------------------------------------------------------------- 

Given these input parameter estimates the moments of the output random variable are 
  (assuming Gaussian input): 
 mu_y = 21.99; sigma_y = 2.63; skewness = 0.6; kurtosis = 3.61.

> plot(fit.gmm)
> 
> # Compare empirical to theoretical moments (given parameter estimates)
> moments.theory <- 
+  mLambertW(theta = list(beta = fit.gmm$tau[c("mu_x", "sigma_x")], 
+                         gamma = fit.gmm$tau["gamma"]), 
+            distname = "normal")
> TAB <- rbind(unlist(moments.theory),
+              c(mean(y), sd(y), skewness(y), kurtosis(y)))
> rownames(TAB) <- c("Theoretical (IGMM)", "Empirical")
> TAB
                       mean       sd skewness.gamma kurtosis.gamma
Theoretical (IGMM) 21.99161 2.633412      0.6006551       3.608928
Empirical          21.98920 2.640028      0.6933951       4.176055
> 
> x <- get_input(y, fit.gmm$tau)
> test_normality(x) # input is normal -> fit a Lambert W x Gaussian by MLE
$seed
[1] 539521

$shapiro.wilk

	Shapiro-Wilk normality test

data:  data.test
W = 0.99443, p-value = 0.9576


$shapiro.francia

	Shapiro-Francia normality test

data:  data.test
W = 0.99339, p-value = 0.8428


$anderson.darling

	Anderson-Darling normality test

data:  data
A = 0.15246, p-value = 0.9583


> 
> fit.ml <- MLE_LambertW(y, type = "s", distname = "normal", hessian = TRUE)
> summary(fit.ml)
Call: MLE_LambertW(y = y, distname = "normal", type = "s", hessian = TRUE)
Estimation method: MLE
Input distribution: normal

 Parameter estimates:
       Estimate  Std. Error  t value Pr(>|t|)    
mu    21.740830    0.273444  79.5074  < 2e-16 ***
sigma  2.555770    0.187541  13.6278  < 2e-16 ***
gamma  0.096230    0.038781   2.4813  0.01309 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-------------------------------------------------------------- 
                  a     b
Support    11.96984   Inf
Data range 16.75000 31.93

 p_m1 = P(non-principal branch affects solution): 1.34421e-25
-------------------------------------------------------------- 

Given these input parameter estimates the moments of the output random variable are 
  (assuming Gaussian input): 
 mu_y = 21.99; sigma_y = 2.62; skewness = 0.58; kurtosis = 3.57.

> plot(fit.ml)
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>