Estimation of nonparametric growth charts via quantile regression. Quantile curves are estimated via B-splines with
a L1 penalty on the spline coefficient differences, and non-crossing and monotonicity restrictions are set to obtain estimates
more biologically plausible. Linear terms are allowed in the model specification.
a standard R formula to specify the response in the left hand side, and the covariates in the right hand side. See Details.
tau
a numeric vector to specify the percentiles of interest. Default to (.1,.25,.5,.75,.9).
data
the dataframe where the variables required by the formula, subset and weights arguments are stored.
subset
optional. A vector specifying a subset of observations to be used in the fitting process.
weights
optional. A numeric vector specifying weights to be assigned to the observations in the fitting process.
Currently unimplemented.
na.action
a function which indicates how the possible ‘NA’s are handled.
transf
an optional character string (with "y" as argument) meaning a function to apply to the response variable
before fitting. E.g. "log(y+0.1)".
y
logical. If TRUE (default) the returned object includes also the responses vector.
interc
logical. If formula includes a "ps" term, interc=TRUE means that a model intercept is also estimated.
If this is the case, a very small ridge-type penalty is exploited to allow estimation with a design matrix containing both
a full B-spline basis and a column of ones. interc=TRUE overwrites the intercept specification in the formula (e.g., ~0+..),
and it is ignored if the model does not include a "ps" term.
foldid
optional. A numeric vector identifying the group labels to perform cross validation to select the smoothing parameter.
Ignored if the lambda argument in ps() is not a vector.
nfolds
optional. If foldid is not provided, it is scalar specifying the number of ‘folds’ (groups) which should be used to perform cross validation to select
the smoothing parameter. Default to 10, but it is ignored if the lambda argument in ps() is not a vector.
cv
logical. If TRUE, the returned object includes also the matrix cv
having number of rows equal to length of lambda and number of columns equal to nfolds.
Ignored if the lambda argument in ps() is not a vector.
n.boot
Number of nonparametric (cases resampling) bootstrap samples to be used. Notice that the smoothing parameter (if relevant) does change
throughout the bootstrap replicates.
eps
A small positive constant to ensure noncrossing curves. Use it at your risk! If eps is large, the resulting fitted quantile
curves could appear unreasonable.
...
further arguments.
Details
The function fits regression quantiles at specified percentiles given in tau as a function of
covariates specified in the formula argument. The formula may optionally include several ps
terms to model nonlinear relationships with quantitative covariates, usually age in growth charts. When the lambda argument
in ps() is scalar, it represents the actual smoothing
parameter. When it is a vector, 'K-fold' cross validation is performed to select the ‘optimal’ lambda value and the model is
fitted at such selected lambda value. To select the smoothing parameter via CV, foldid or nfolds
may be supplied. If provided foldid overwrites nfolds, otherwise foldid is obtained via random extraction, namely
sample(rep(seq(nfolds), length = n)). However selection of smoothing parameter is allowed with a unique ps() term in the formula.
Value
This function returns an object of class gcrq, that is a list with the following components
coefficients
The matrix of estimated regression parameters; the number of columns equals the number of the fitted quantile curves.
B
the design matrix of the final fit.
df
a vector reporting the df values for each quantile curve. See the section 'Warning' below.
rho
a vector including the values of the objective functions at the solution for each quantile curve.
info.smooth
some information on the smoothing term (if included in the formula via ps).
BB
further information on the smoothing term (if present in the formula via ps), including
stuff useful for plotting via plot.gcrq().
Bderiv
if the smooth term is included, the first derivative of the B spline basis.
boot.coef
The array including the estimated coefficients at different bootstrap samples.
y
the response vector (if gcrq() has been called with y=TRUE).
contrasts
the contrasts used, when the model contains a factor.
xlevels
the levels of the factors (when included) used in fitting.
taus
a vector of values between 0 and 1 indicating the estimated quantile curves.
call
the matched call.
Warning
The function (and underlying method) works pretty well in obtaining point estimates and displaying quantile curves accordingly.
Typically this is the main (and unique) goal when dealing with growth charts. However from a statistical viewpoint
there are some important limitations affecting the theory and the relevant package,
Computation of model degrees of freedom
Computation of standard errors
Currently the function does not return standard errors for the parameter estimates (unless n.boot>0) and degrees of freedom
are roughtly computed by summing the 'zero' residuals. Due to noncrossing constraints, the number of zero residuals
might not be equal to the number of estimated parameters even in unpenalized models - except for the median, or for models
where a single quantile curve has been fitted.
Note
This function is based upon the package quantreg by R. Koenker.
Currently methods specific to the class "gcrq" are plot.gcrq, print.gcrq and summary.gcrq
V. M. R. Muggeo, M. Sciandra, A. Tomasello, S. Calvo (2013).
Estimating growth charts via nonparametric quantile
regression: a practical framework with application
in ecology, Environ Ecol Stat, 20, 519-531.
See Also
ps, plot.gcrq
Examples
## Not run:
data(growthData) #load data
tauss<-seq(.1,.9,by=.1) #fix the percentiles of interest
m1<-gcrq(y~ps(x, mon=0), tau=tauss, data=growthData) #unpenalized.. very wiggly curves
#strongly penalized models
m2<-gcrq(y~ps(x, mon=0, lambda=1000, pdiff=2), tau=tauss, data=growthData) #linear
m3<-gcrq(y~ps(x, mon=0, lambda=1000, pdiff=3), tau=tauss, data=growthData) #quadratic
#penalized model with monotonicity restrictions
m4<-gcrq(y~ps(x, mon=1, lambda=10), tau=tauss, data=growthData)
#monotonicity constraints with varying penalty
m5<-gcrq(y~ps(x, mon=1, lambda=10, var.pen="(1:k)^3"), tau=tauss, data=growthData)
par(mfrow=c(2,2))
plot(m1, pch=20, y=TRUE)
plot(m2, pch=20, y=TRUE)
plot(m3, add=TRUE, lwd=2)
plot(m4, pch=20, y=TRUE)
plot(m5, pch=20, y=TRUE, legend=TRUE)
#select lambda via 'K-fold' CV
m6<-gcrq(y~ps(x, lambda=seq(0,100,l=20)), tau=tauss, data=growthData)
par(mfrow=c(1,2))
plot(m6, cv=TRUE) #display CV score versus lambda values
plot(m6, y=TRUE) #fitting at the best lambda value
## End(Not run)