Last data update: 2014.03.03

R: Growth charts regression quantiles
gcrqR Documentation

Growth charts regression quantiles

Description

Estimation of nonparametric growth charts via quantile regression. Quantile curves are estimated via B-splines with a L1 penalty on the spline coefficient differences, and non-crossing and monotonicity restrictions are set to obtain estimates more biologically plausible. Linear terms are allowed in the model specification.

Usage

gcrq(formula, tau = c(0.1, 0.25, 0.5, 0.75, 0.9), data, subset, weights, 
      na.action, transf=NULL, y = TRUE, interc=FALSE, 
      foldid = NULL, nfolds = 10, cv = FALSE, n.boot=0, eps=.0001, ...)

Arguments

formula

a standard R formula to specify the response in the left hand side, and the covariates in the right hand side. See Details.

tau

a numeric vector to specify the percentiles of interest. Default to (.1,.25,.5,.75,.9).

data

the dataframe where the variables required by the formula, subset and weights arguments are stored.

subset

optional. A vector specifying a subset of observations to be used in the fitting process.

weights

optional. A numeric vector specifying weights to be assigned to the observations in the fitting process. Currently unimplemented.

na.action

a function which indicates how the possible ‘NA’s are handled.

transf

an optional character string (with "y" as argument) meaning a function to apply to the response variable before fitting. E.g. "log(y+0.1)".

y

logical. If TRUE (default) the returned object includes also the responses vector.

interc

logical. If formula includes a "ps" term, interc=TRUE means that a model intercept is also estimated. If this is the case, a very small ridge-type penalty is exploited to allow estimation with a design matrix containing both a full B-spline basis and a column of ones. interc=TRUE overwrites the intercept specification in the formula (e.g., ~0+..), and it is ignored if the model does not include a "ps" term.

foldid

optional. A numeric vector identifying the group labels to perform cross validation to select the smoothing parameter. Ignored if the lambda argument in ps() is not a vector.

nfolds

optional. If foldid is not provided, it is scalar specifying the number of ‘folds’ (groups) which should be used to perform cross validation to select the smoothing parameter. Default to 10, but it is ignored if the lambda argument in ps() is not a vector.

cv

logical. If TRUE, the returned object includes also the matrix cv having number of rows equal to length of lambda and number of columns equal to nfolds. Ignored if the lambda argument in ps() is not a vector.

n.boot

Number of nonparametric (cases resampling) bootstrap samples to be used. Notice that the smoothing parameter (if relevant) does change throughout the bootstrap replicates.

eps

A small positive constant to ensure noncrossing curves. Use it at your risk! If eps is large, the resulting fitted quantile curves could appear unreasonable.

...

further arguments.

Details

The function fits regression quantiles at specified percentiles given in tau as a function of covariates specified in the formula argument. The formula may optionally include several ps terms to model nonlinear relationships with quantitative covariates, usually age in growth charts. When the lambda argument in ps() is scalar, it represents the actual smoothing parameter. When it is a vector, 'K-fold' cross validation is performed to select the ‘optimal’ lambda value and the model is fitted at such selected lambda value. To select the smoothing parameter via CV, foldid or nfolds may be supplied. If provided foldid overwrites nfolds, otherwise foldid is obtained via random extraction, namely sample(rep(seq(nfolds), length = n)). However selection of smoothing parameter is allowed with a unique ps() term in the formula.

Value

This function returns an object of class gcrq, that is a list with the following components

coefficients

The matrix of estimated regression parameters; the number of columns equals the number of the fitted quantile curves.

B

the design matrix of the final fit.

df

a vector reporting the df values for each quantile curve. See the section 'Warning' below.

rho

a vector including the values of the objective functions at the solution for each quantile curve.

info.smooth

some information on the smoothing term (if included in the formula via ps).

BB

further information on the smoothing term (if present in the formula via ps), including stuff useful for plotting via plot.gcrq().

Bderiv

if the smooth term is included, the first derivative of the B spline basis.

boot.coef

The array including the estimated coefficients at different bootstrap samples.

y

the response vector (if gcrq() has been called with y=TRUE).

contrasts

the contrasts used, when the model contains a factor.

xlevels

the levels of the factors (when included) used in fitting.

taus

a vector of values between 0 and 1 indicating the estimated quantile curves.

call

the matched call.

Warning

The function (and underlying method) works pretty well in obtaining point estimates and displaying quantile curves accordingly. Typically this is the main (and unique) goal when dealing with growth charts. However from a statistical viewpoint there are some important limitations affecting the theory and the relevant package,

  1. Computation of model degrees of freedom

  2. Computation of standard errors

Currently the function does not return standard errors for the parameter estimates (unless n.boot>0) and degrees of freedom are roughtly computed by summing the 'zero' residuals. Due to noncrossing constraints, the number of zero residuals might not be equal to the number of estimated parameters even in unpenalized models - except for the median, or for models where a single quantile curve has been fitted.

Note

This function is based upon the package quantreg by R. Koenker. Currently methods specific to the class "gcrq" are plot.gcrq, print.gcrq and summary.gcrq

Author(s)

Vito M. R. Muggeo, vito.muggeo@unipa.it

References

V. M. R. Muggeo, M. Sciandra, A. Tomasello, S. Calvo (2013). Estimating growth charts via nonparametric quantile regression: a practical framework with application in ecology, Environ Ecol Stat, 20, 519-531.

See Also

ps, plot.gcrq

Examples

## Not run: 
data(growthData) #load data
tauss<-seq(.1,.9,by=.1) #fix the percentiles of interest

m1<-gcrq(y~ps(x, mon=0), tau=tauss, data=growthData) #unpenalized.. very wiggly curves
#strongly penalized models
m2<-gcrq(y~ps(x, mon=0, lambda=1000, pdiff=2), tau=tauss, data=growthData) #linear 
m3<-gcrq(y~ps(x, mon=0, lambda=1000, pdiff=3), tau=tauss, data=growthData) #quadratic  

#penalized model with monotonicity restrictions
m4<-gcrq(y~ps(x, mon=1, lambda=10), tau=tauss, data=growthData)

#monotonicity constraints with varying penalty
m5<-gcrq(y~ps(x, mon=1, lambda=10, var.pen="(1:k)^3"), tau=tauss, data=growthData) 

par(mfrow=c(2,2))
plot(m1, pch=20, y=TRUE)
plot(m2, pch=20, y=TRUE)
plot(m3, add=TRUE, lwd=2)
plot(m4, pch=20, y=TRUE)
plot(m5, pch=20, y=TRUE, legend=TRUE)

#select lambda via 'K-fold' CV
m6<-gcrq(y~ps(x, lambda=seq(0,100,l=20)), tau=tauss, data=growthData) 
par(mfrow=c(1,2))
plot(m6, cv=TRUE) #display CV score versus lambda values
plot(m6, y=TRUE) #fitting at the best lambda value


## End(Not run)

Results