Last data update: 2014.03.03

R: Function to fit penalised regression
penRegR Documentation

Function to fit penalised regression

Description

The function penReg() can be used to fit a P-spline. It can be used as demonstration of how the penalised B-splines can be fitted to one explanatory variable. For more that one explanatory variables use the function pb() in gamlss. The function penRegQ() is similar to the function penReg() but it estimates the "random effect" sigmas using the Q-function (marginal likelihood). The Q-function estimation takes longer but it has the advantage that standard errors are provided for log σ_e and log σ_b, where the sigmas are the standard errors for the response and the random effects respectively. The function pbq() is a smoother within GAMLSS and should give identical results to the additive function pb(). The function gamlss.pbq is not for use.

Usage

penReg(y, x, weights = rep(1, length(y)), df = NULL, lambda = NULL, start = 10, 
      inter = 20, order = 2, degree = 3,  plot = FALSE,
      method = c("ML", "ML-1", "GAIC", "GCV", "EM"), k = 2, ...)
penRegQ(y, x, weights = rep(1, length(y)), order = 2, start = 10, 
       plot = FALSE, lambda = NULL, inter = 20, degree = 3, 
       optim.proc = c("nlminb", "optim"), 
       optim.control = NULL)        
pbq(x, control = pbq.control(...), ...)
gamlss.pbq(x, y, w, xeval = NULL, ...)

Arguments

y

the response variable

x

the unique explanatory variable

weights

prior weights

w

weights in the iretation withing GAMLSS

df

effective degrees of freedom

lambda

the smoothing parameter

start

the lambda starting value if the local methods are used

inter

the no of break points (knots) in the x-axis

order

the required difference in the vector of coefficients

degree

the degree of the piecewise polynomial

plot

whether to plot the data and the fitted function

method

The method used in the (local) performance iterations. Available methods are "ML", "ML-1", "EM", "GAIC" and "GCV"

k

the penalty used in "GAIC" and "GCV"

optim.proc

which function to be use to optimise the Q-function, options are c("nlminb", "optim")

optim.control

options for the optimisation procedures

control

arguments for the fitting function. It takes one two: i) order the order of the B-spline and plot whether to plot the data and fit.

xeval

this is use for prediction

...

for extra arguments

Value

Returns a fitted object of class penReg. The object contains 1) the fitted coefficients 2) the fitted.values 3) the response variable y, 4) the label of the response variable ylabel 5) the explanatory variable x, 6) the lebel of the explanatory variable 7) the smoothing parameter lambda, 8) the effective degrees of freedom df, 9) the estimete for sigma sigma, 10) the residual sum of squares rss, 11) the Akaike information criterion aic, 12) the Bayesian information criterion sbc and 13) the deviance

Author(s)

Mikis Stasinopoulos d.stasinopoulos@londonmet.ac.uk, Bob Rigby r.rigby@londonmet.ac.uk and Paul Eilers

References

Eilers, P. H. C. and Marx, B. D. (1996). Flexible smoothing with B-splines and penalties (with comments and rejoinder). Statist. Sci, 11, 89-121.

Rigby, R. A. and Stasinopoulos D. M. (2005). Generalized additive models for location, scale and shape,(with discussion), Appl. Statist., 54, part 3, pp 507-554.

Stasinopoulos D. M. Rigby R.A. (2007) Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, Vol. 23, Issue 7, Dec 2007, http://www.jstatsoft.org/v23/i07.

Examples

set.seed(1234)
x <- seq(0,10,length=200); y<-(yt<-1+2*x+.6*x^2-.1*x^3)+rnorm(200, 4)
library(gamlss)
#------------------ 
# df fixed
g1<-gamlss(y~pb(x, df=4))
m1<-penReg(y,x, df=4) 
cbind(g1$mu.coefSmo[[1]]$lambda, m1$lambda)
cbind(g1$mu.df, m1$edf)
cbind(g1$aic, m1$aic)
cbind(fitted(g1), fitted(m1))[1:10,]
# identical
#------------------
# estimate lambda using ML
g2<-gamlss(y~pb(x))
m2<-penReg(y,x) 
cbind(g2$mu.df, m2$edf)
cbind(g2$mu.lambda, m2$lambda) 
cbind(g2$aic, m2$aic) # different lambda
cbind(fitted(g2), fitted(m2))[1:10,]
# identical
#------------------
#  estimate lambda using GCV
g3 <- gamlss(y~pb(x, method="GCV"))
m3 <- penReg(y,x, method="GCV") 
cbind(g3$mu.df, m3$edf)
cbind(g3$mu.lambda, m3$lambda)
cbind(g3$aic, m3$aic)
cbind(fitted(g3), fitted(m3))[1:10,]
# almost identical
#------------------
#  estimate lambda using  GAIC(#=3)
g4<-gamlss(y~pb(x, method="GAIC", k=3))
m4<-penReg(y,x, method="GAIC", k=3) 
cbind(g4$mu.df, m4$edf )
cbind(g4$mu.lambda, m4$lambda)
cbind(g4$aic, m4$aic)
cbind(g4$mu.df, m4$df)
cbind(g4$mu.lambda, m4$lambda)
cbind(fitted(g4), fitted(m4))[1:10,]

#-------------------
plot(y~x)
lines(fitted(m1)~x, col="green")
lines(fitted(m2)~x, col="red")
lines(fitted(m3)~x, col="blue")
lines(fitted(m4)~x, col="yellow")
lines(fitted(m4)~x, col="grey")
# using the Q function

# the Q-function takes longer
system.time(g6<-gamlss(y~pbq(x)))
system.time(g61<-gamlss(y~pb(x)))
AIC(g6, g61)
#
system.time(m6<-penRegQ(y,x))
system.time(m61<-penReg(y,x)) 
AIC(m6, m61)

cbind(g6$mu.df, g61$mu.df,m6$edf, m61$edf)
cbind(g6$mu.lambda,g61$mu.lambda, m6$lambda, m61$lambda) 
cbind(g6$aic, AIC(g6), m6$aic, AIC(m6), m61$aic, AIC(m61)) 
cbind(fitted(g6), fitted(m6))[1:10,]

Results