Last data update: 2014.03.03

R: Pseudo Data Algorithm for Robust Estmation of GAMs
robustgamR Documentation

Pseudo Data Algorithm for Robust Estmation of GAMs

Description

This function implements a fast and stable algorithm developed in Wong, Yao and Lee (2013) for robust estimation of generalized additive models. Currently, this implementation only covers binomial and poisson distributions. This function does not choose smoothing parameter by itself and the user has to specify it manunally. For implementation with automatic smoothing parameter selection, see robustgam.GIC, robustgam.GIC.optim, robustgam.CV. For prediction, see pred.robustgam.

Usage

robustgam(X, y, family, p=3, K=30, c=1.345, sp=-1, show.msg=FALSE, count.lim=200,
          w.count.lim=50, smooth.basis="tp", wx=FALSE)

Arguments

X

a vector or a matrix (each covariate form a column) of covariates

y

a vector of responses

family

A family object specifying the distribution and the link function. See glm and family.

p

order of the basis. It depends on the option of smooth.basis.

K

number of knots of the basis; dependent on the option of smooth.basis.

c

tunning parameter for Huber function; a smaller value of c corresponds to a more robust fit. It is recommended to set as 1.2 and 1.6 for binomial and poisson distribution respectively.

sp

a vector of smoothing parameter. If only one value is specified, it will be used for all smoothing parameters.

show.msg

If show.msg=T, progress is displayed.

count.lim

maximum number of iterations of the whole algorithm

w.count.lim

maximum number of updates on the weight. It corresponds to zeta in Wong, Yao and Lee (2013)

smooth.basis

the specification of basis. Four choices are available: "tp" = thin plate regression spline, "cr" = cubic regression spline, "ps" = P-splines, "tr" = truncated power spline. For more details, see smooth.construct.

wx

If wx=T, robust weight on the covariates are applied. For details, see Real Data Example in Wong, Yao and Lee (2013)

Value

fitted.values

fitted values

initial.fitted

the starting values of the algorithm

beta

estimated coefficients (corresponding to the basis)

B

the basis: fitted linear estimator = B%*%beta

sD

for internal use

basis

the smooth construct object. For more details, see smooth.construct

converge

If converge=T, the algorithm converged.

w

for internal use

family

the family object

wx

Indicate whether robust weight on covariates is applied

beta.fit

for internal use

Author(s)

Raymond K. W. Wong <raymondkww.dev@gmail.com>

References

Raymond K. W. Wong, Fang Yao and Thomas C. M. Lee (2013) Robust Estimation for Generalized Additive Models. Journal of Graphical and Computational Statistics, to appear.

See Also

robustgam.GIC, robustgam.GIC.optim, robustgam.CV, pred.robustgam

Examples

# load library
library(robustgam)

# test function
test.fun <- function(x, ...) {
  return(2*sin(2*pi*(1-x)^2))
}

# some setting
set.seed(1234)
true.family <- poisson()
out.prop <- 0.05
n <- 100

# generating dataset for poisson case
x <- runif(n)
x <- x[order(x)]
true.eta <- test.fun(x)
true.mu <- true.family$linkinv(test.fun(x))
y <- rpois(n, true.mu) # for poisson case

# create outlier for poisson case
out.n <- trunc(n*out.prop)
out.list <- sample(1:n, out.n, replace=FALSE)
y[out.list] <- round(y[out.list]*runif(out.n,min=3,max=5)^(sample(c(-1,1),out.n,TRUE)))

# robust GAM fit
robustfit <- robustgam(x, y, family=true.family, p=3, c=1.6, sp=0.000143514, show.msg=FALSE,
 smooth.basis='tp')
## the smoothing parameter is selected by the RBIC, the command is:
# robustfit.gic <- robustgam.GIC.optim(x, y, family=true.family, p=3, c=1.6, show.msg=FALSE,
#   count.lim=400, smooth.basis='tp', lsp.initial=log(1e-2) ,lsp.min=-15, lsp.max=10,
#   gic.constant=log(n), method="L-BFGS-B"); robustfit <- robustfit.gic$optim.fit


# ordinary GAM fit
nonrobustfit <- gam(y~s(x, bs="tp", m=3),family=true.family) # m = p for 'tp'

# prediction
x.new <- seq(range(x)[1], range(x)[2], len=1000)
robustfit.new <- pred.robustgam(robustfit, data.frame(X=x.new))$predict.values
nonrobustfit.new <- as.vector(predict.gam(nonrobustfit,data.frame(x=x.new),type="response"))

# plot
plot(x, y)
lines(x.new, true.family$linkinv(test.fun(x.new)), col="blue")
lines(x.new, robustfit.new, col="red")
lines(x.new, nonrobustfit.new, col="green")
legend(0.6, 23, c("true mu", "robust fit", "nonrobust fit"), col=c("blue","red","green"),
  lty=c(1,1,1))

Results