Last data update: 2014.03.03

R: Fitting Linear Models with Hyperbolic Errors
hyperblmR Documentation

Fitting Linear Models with Hyperbolic Errors

Description

Fits linear models with hyperbolic errors. Can be used to carry out linear regression for data exhibiting heavy tails and skewness. Displays the histogram, log-histogram (both with fitted error distribution), Q-Q plot and residuals vs. fitted values plot for the fitted linear model.

Usage

hyperblm(formula, data, subset, weights, na.action,
         xx = FALSE, y = FALSE, contrasts = NULL,
         offset, method = "Nelder-Mead",
         startMethod = "Nelder-Mead", startStarts = "BN",
         paramStart = NULL,
         maxiter = 100, tolerance = 0.0001,
         controlBFGS = list(maxit = 1000),
         controlNM = list(maxit = 10000),
         maxitNLM = 10000,
         controlCO = list(), silent = TRUE, ...)

## S3 method for class 'hyperblm'
print(x, digits = max(3, getOption("digits")-3), ...)

## S3 method for class 'hyperblm'
coef(object, ...)

## S3 method for class 'hyperblm'
plot(x, breaks = "FD",
                        plotTitles = c("Residuals vs Fitted Values",
                                       "Histogram of residuals",
                                       "Log-Histogram of residuals",
                                       "Q-Q Plot"),
                        ...)

Arguments

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which lm is called.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

weights

an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If non-NULL, weighted least squares is used with weights weights (that is, minimizing sum(w*e^2)); otherwise ordinary least squares is used. See also ‘Details’,

na.action

A function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.

xx, y

Logicals. If TRUE, the corresponding components of the fit (the explanatory matrix and the response vector) are returned.

contrasts

An optional list. See the contrasts.arg of model.matrix.default.

offset

An optional vector. See Details.

method

Character. Possible values are "BFGS", "Nelder-Mead" and "nlm". See Details.

startMethod

Character. Possible values are "BFGS" and "Nelder-Mead". See Details.

startStarts

Character. Possible values are "BN", "FN", "SL", "US" and "MoM". See Details.

paramStart

An optional vector. A vector of parameter start values for the optimization routine. See Details.

maxiter

Numeric. The maximum number of two-stage optimization alternating iterations. See Details.

tolerance

Numeric. The two-stage optimization convergence ratio. See Details.

controlBFGS, controlNM

Lists. Lists of control parameters for optim when using corresponding (BFGS, Nelder-Mead) optimisation method in first stage. See optim.

maxitNLM

Numeric. The maximum number of iterations for the NLM optimizer.

controlCO

List. A list of control parameters for constrOptim in second stage.

silent

Logical. If TRUE, the error messgae of optimizer will not be displayed.

x

An object of class "hyperblm".

object

An object of class "hyperblm".

breaks

May be a vector, a single number or a character string. See hist.

plotTitles

Titles to appear above the plots.

digits

Numeric. Desired number of digits when the object is printed.

...

Passes additional arguments to function hyperbFitStand, optim and constrOptim.

Details

Models for hyperblm are specified symbolically. A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with duplicates removed. A specification of the form first:second indicates the set of terms obtained by taking the interactions of all terms in first with all terms in second. The specification first*second indicates the cross of first and second. This is the same as first + second + first:second.

If the formula includes an offset, this is evaluated and subtracted from the response.

If response is a matrix a linear model is fitted separately by least-squares to each column of the matrix.

See model.matrix for some further details. The terms in the formula will be re-ordered so that main effects come first, followed by the interactions, all second-order, all third-order and so on.

A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~ 0 + x. See formula for more details of allowed formulae.

Non-NULL weights can be used to indicate that different observations have different variances (with the values in weights being inversely proportional to the variances); or equivalently, when the elements of weights are positive integers w_i, that each response y_i is the mean of w_i unit-weight observations (including the case that there are w_i observations equal to y_i and the data have been summarized).

hyperblm calls the lower level function hyperblmFit for the actual numerical computations.

All of weights, subset and offset are evaluated in the same way as variables in formula, that is first in data and then in the environment of formula.

hyperblmFit uses a two-stage alternating optimization routine. The quality of parameter start values (especially the error distribution parameters) is crucial to the routine's convergence. The user can specify the start values via the paramStart argument, otherwise the function finds reliable start values by calling the hyperbFitStand function.

startMethod in the argument list is the optimization method for function hyperbFitStandStart which finds the start values for function hyperbFitStand. It is set to "Nelder-Mead" by default due to the robustness of this optimizer. The "BFGS" method is also implemented as it is relatively fast to converge. Since "BFGS" method is a quasi-Newton method it will not as robust and for some data will not achieve convergence.

startStarts is the method used to find the start values for function hyperbFitStandStart which includes:

  • "BN"A method from Barndorff-Nielsen (1977) based on estimates of psi and gamma the absolute slopes of the left and right asymptotes to the log density function

  • "FN"Based on a fitted normal distribution as it is a limit of the hyperbolic distribution

  • "SL"Based on a fitted skew-Laplace distribution for which the log density has the form of two straight line with absolute slopes 1/alpha, 1/beta

  • "MoM"A method of moment approach

  • "US"User specified

method is the method used in stage one of the two-stage alternating optimization routine. As the startMethod, it is set to "Nelder-Mead" by default. Besides "BFGS","nlm" is also implemented as a alternative. Since BFGS method is a quasi-Newton method it will not as robust and for some data will not achieve convergence.

If the maximum of the ratio the change of the individual coefficients is smaller than tolerance then the routine assumes convergence, otherwise if the alternating iteration number exceeds maxiter with the maximum of the ratio the change of the individual coefficients larger than tolerance, the routine is considered not to have converged.

Value

hyperblm returns an object of class "hyperblm" which is a list containing:

coefficients

A named vector of regression coefficients.

distributionParams

A named vector of fitted hyperbolic error distribution parameters.

fitted.values

The fitted values from the model.

residuals

The remainder after subtracting fitted values from response.

mle

The maximum likelihood value of the model.

method

The optimization method for stage one.

paramStart

The start values of parameters that the user specified (only where relevant).

residsParamStart

The start values of parameters obtained by hyperbFitStand (only where relevant).

call

The matched call.

terms

The terms object used.

contrasts

The contrasts used (only where relevant).

xlevels

The levels of the factors used in the fitting (only where relevant).

offset

The offset used (only where relevant)

xNames

The names of each explanatory variables. If explanatory variables don't have names then they will be named x.

yVec

The response vector.

xMatrix

The explanatory variables matrix.

iterations

Number of two-stage alternating iterations to convergence.

convergence

The convergence code for two stage optimization: 0 is the system converged, 1 is first stage does not converge, 2 is second stage does not converge, 3 is the both stages do not converge.

breaks

The cell boundaries found by a call the hist.

Author(s)

David Scott d.scott@auckland.ac.nz, Xinxing Li xli053@aucklanduni.ac.nz

References

Barndorff-Nielsen, O. (1977) Exponentially decreasing distributions for the logarithm of particle size, Proc. Roy. Soc. Lond., A353, 401–419.

Prause, K. (1999). The generalized hyperbolic models: Estimation, financial derivatives and risk measurement. PhD Thesis, Mathematics Faculty, University of Freiburg.

Trendall, Richard (2005). hypReg: A Function for Fitting a Linear Regression Model in R with Hyperbolic Error. Masters Thesis, Statistics Faculty, University of Auckland.

Paolella, Marc S. (2007). Intermediate Probability: A Computational Approach. pp. 415 -Chichester: Wiley.

Scott, David J. and W<c3><bc>rtz, Diethelm and Chalabi, Yohan, (2011). Fitting the Hyperbolic Distribution with R: A Case Study of Optimization Techniques. In preparation.

Stryhn, H. and Christensen, J. (2003). Confidence intervals by the profile likelihood method, with applications in veterinary epidemiology. ISVEE X.

See Also

print.hyperblm prints the regression result in a table. coef.hyperblm obtains the regression coefficients and error distribution parameters of the fitted model. summary.hyperblm obtains a summary output of class hyperblm object. print.summary.hyperblm prints the summary output in a table. plot.hyperblm obtains a residual vs fitted value plot, a histgram of residuals with error distribution density curve on top, a histgram of log residuals with error distribution error density curve on top and a QQ plot. hyperblmFit, optim, nlm, constrOptim, hist, hyperbFitStand, hyperbFitStandStart.

Examples

## stackloss data example

 airflow <- stackloss[, 1]
 temperature <- stackloss[, 2]
 acid <- stackloss[, 3]
 stack <- stackloss[, 4]

 hyperblm.fit <- hyperblm(stack ~ airflow + temperature + acid,
                          tolerance = 1e-11)

 coef.hyperblm(hyperblm.fit)
 plot.hyperblm(hyperblm.fit, breaks = 20)
 summary.hyperblm(hyperblm.fit, hessian = FALSE)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(GeneralizedHyperbolic)
Loading required package: DistributionUtils
Loading required package: RUnit
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/GeneralizedHyperbolic/hyperblm.Rd_%03d_medium.png", width=480, height=480)
> ### Name: hyperblm
> ### Title: Fitting Linear Models with Hyperbolic Errors
> ### Aliases: hyperblm print.hyperblm coef.hyperblm plot.hyperblm
> 
> ### ** Examples
> 
> ## stackloss data example
> 
>  airflow <- stackloss[, 1]
>  temperature <- stackloss[, 2]
>  acid <- stackloss[, 3]
>  stack <- stackloss[, 4]
> 
>  hyperblm.fit <- hyperblm(stack ~ airflow + temperature + acid,
+                           tolerance = 1e-11)
> 
>  coef.hyperblm(hyperblm.fit)
$coefficients
[1] -36.9799824   0.8129461   0.7648236  -0.1246500

$distributionParams
[1] -1.0715709  0.1904949  0.5529651  0.1737802

>  plot.hyperblm(hyperblm.fit, breaks = 20)
>  summary.hyperblm(hyperblm.fit, hessian = FALSE)

Call:
hyperblm(formula = stack ~ airflow + temperature + acid, tolerance = 1e-11)

Data:      (Intercept) airflow temperature acid 
Parameter estimates:
             Estimate Std. Error t value  Pr(>|t|)    
(Intercept) -36.97998   13.91268 -2.6580  0.018734 *  
airflow       0.81295    0.08635  9.4145 1.962e-07 ***
temperature   0.76482    0.22756  3.3610  0.004661 ** 
acid         -0.12465    0.14274 -0.8733  0.397237    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Parameters for hyperbolic error distribution:
     mu    delta    alpha     beta  
-1.0716   0.1905   0.5530   0.1738  
Likelihood:         
Method:             Nelder-Mead 
Convergence code:   0 
Iterations:         6 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>