R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Growth charts regression quantiles

gcrq	R Documentation

Growth charts regression quantiles

Description

Estimation of nonparametric growth charts via quantile regression. Quantile curves are estimated via B-splines with a L1 penalty on the spline coefficient differences, and non-crossing and monotonicity restrictions are set to obtain estimates more biologically plausible. Linear terms are allowed in the model specification.

Usage

gcrq(formula, tau = c(0.1, 0.25, 0.5, 0.75, 0.9), data, subset, weights, 
      na.action, transf=NULL, y = TRUE, interc=FALSE, 
      foldid = NULL, nfolds = 10, cv = FALSE, n.boot=0, eps=.0001, ...)

Arguments

`formula`	a standard R formula to specify the response in the left hand side, and the covariates in the right hand side. See Details.
`tau`	a numeric vector to specify the percentiles of interest. Default to (.1,.25,.5,.75,.9).
`data`	the dataframe where the variables required by the formula, subset and weights arguments are stored.
`subset`	optional. A vector specifying a subset of observations to be used in the fitting process.
`weights`	optional. A numeric vector specifying weights to be assigned to the observations in the fitting process. Currently unimplemented.
`na.action`	a function which indicates how the possible ‘NA’s are handled.
`transf`	an optional character string (with `"y"` as argument) meaning a function to apply to the response variable before fitting. E.g. `"log(y+0.1)"`.
`y`	logical. If `TRUE` (default) the returned object includes also the responses vector.
`interc`	logical. If `formula` includes a `"ps"` term, `interc=TRUE` means that a model intercept is also estimated. If this is the case, a very small ridge-type penalty is exploited to allow estimation with a design matrix containing both a full B-spline basis and a column of ones. `interc=TRUE` overwrites the intercept specification in the formula (e.g., `~0+..`), and it is ignored if the model does not include a `"ps"` term.
`foldid`	optional. A numeric vector identifying the group labels to perform cross validation to select the smoothing parameter. Ignored if the `lambda` argument in `ps()` is not a vector.
`nfolds`	optional. If `foldid` is not provided, it is scalar specifying the number of ‘folds’ (groups) which should be used to perform cross validation to select the smoothing parameter. Default to 10, but it is ignored if the `lambda` argument in `ps()` is not a vector.
`cv`	logical. If `TRUE`, the returned object includes also the matrix `cv` having number of rows equal to length of `lambda` and number of columns equal to `nfolds`. Ignored if the `lambda` argument in `ps()` is not a vector.
`n.boot`	Number of nonparametric (cases resampling) bootstrap samples to be used. Notice that the smoothing parameter (if relevant) does change throughout the bootstrap replicates.
`eps`	A small positive constant to ensure noncrossing curves. Use it at your risk! If `eps` is large, the resulting fitted quantile curves could appear unreasonable.
`...`	further arguments.

Details

The function fits regression quantiles at specified percentiles given in tau as a function of covariates specified in the formula argument. The formula may optionally include several ps terms to model nonlinear relationships with quantitative covariates, usually age in growth charts. When the lambda argument in ps() is scalar, it represents the actual smoothing parameter. When it is a vector, 'K-fold' cross validation is performed to select the ‘optimal’ lambda value and the model is fitted at such selected lambda value. To select the smoothing parameter via CV, foldid or nfolds may be supplied. If provided foldid overwrites nfolds, otherwise foldid is obtained via random extraction, namely sample(rep(seq(nfolds), length = n)). However selection of smoothing parameter is allowed with a unique ps() term in the formula.

Value

This function returns an object of class gcrq, that is a list with the following components

`coefficients`	The matrix of estimated regression parameters; the number of columns equals the number of the fitted quantile curves.
`B`	the design matrix of the final fit.
`df`	a vector reporting the df values for each quantile curve. See the section 'Warning' below.
`rho`	a vector including the values of the objective functions at the solution for each quantile curve.
`info.smooth`	some information on the smoothing term (if included in the formula via `ps`).
`BB`	further information on the smoothing term (if present in the formula via `ps`), including stuff useful for plotting via `plot.gcrq()`.
`Bderiv`	if the smooth term is included, the first derivative of the B spline basis.
`boot.coef`	The array including the estimated coefficients at different bootstrap samples.
`y`	the response vector (if gcrq() has been called with `y=TRUE`).
`contrasts`	the contrasts used, when the model contains a factor.
`xlevels`	the levels of the factors (when included) used in fitting.
`taus`	a vector of values between 0 and 1 indicating the estimated quantile curves.
`call`	the matched call.

Warning

The function (and underlying method) works pretty well in obtaining point estimates and displaying quantile curves accordingly. Typically this is the main (and unique) goal when dealing with growth charts. However from a statistical viewpoint there are some important limitations affecting the theory and the relevant package,

Computation of model degrees of freedom
Computation of standard errors

Currently the function does not return standard errors for the parameter estimates (unless n.boot>0) and degrees of freedom are roughtly computed by summing the 'zero' residuals. Due to noncrossing constraints, the number of zero residuals might not be equal to the number of estimated parameters even in unpenalized models - except for the median, or for models where a single quantile curve has been fitted.

Note

This function is based upon the package quantreg by R. Koenker. Currently methods specific to the class "gcrq" are plot.gcrq, print.gcrq and summary.gcrq

Author(s)

Vito M. R. Muggeo, vito.muggeo@unipa.it

References

V. M. R. Muggeo, M. Sciandra, A. Tomasello, S. Calvo (2013). Estimating growth charts via nonparametric quantile regression: a practical framework with application in ecology, Environ Ecol Stat, 20, 519-531.

Examples

## Not run: 
data(growthData) #load data
tauss<-seq(.1,.9,by=.1) #fix the percentiles of interest

m1<-gcrq(y~ps(x, mon=0), tau=tauss, data=growthData) #unpenalized.. very wiggly curves
#strongly penalized models
m2<-gcrq(y~ps(x, mon=0, lambda=1000, pdiff=2), tau=tauss, data=growthData) #linear 
m3<-gcrq(y~ps(x, mon=0, lambda=1000, pdiff=3), tau=tauss, data=growthData) #quadratic  

#penalized model with monotonicity restrictions
m4<-gcrq(y~ps(x, mon=1, lambda=10), tau=tauss, data=growthData)

#monotonicity constraints with varying penalty
m5<-gcrq(y~ps(x, mon=1, lambda=10, var.pen="(1:k)^3"), tau=tauss, data=growthData) 

par(mfrow=c(2,2))
plot(m1, pch=20, y=TRUE)
plot(m2, pch=20, y=TRUE)
plot(m3, add=TRUE, lwd=2)
plot(m4, pch=20, y=TRUE)
plot(m5, pch=20, y=TRUE, legend=TRUE)

#select lambda via 'K-fold' CV
m6<-gcrq(y~ps(x, lambda=seq(0,100,l=20)), tau=tauss, data=growthData) 
par(mfrow=c(1,2))
plot(m6, cv=TRUE) #display CV score versus lambda values
plot(m6, y=TRUE) #fitting at the best lambda value


## End(Not run)