R: Compute Model-averaged Parameter Estimate with Shrinkage...
modavgShrink
R Documentation
Compute Model-averaged Parameter Estimate with Shrinkage (Multimodel Inference)
Description
This function computes an alternative version of model-averaging
parameter estimates that consists in shrinking estimates toward 0 to
reduce model selection bias as in Burnham and Anderson (2002, p. 152),
Anderson (2008, pp. 130-132) and Lukacs et al. (2010). Specifically,
models without the parameter of interest have an estimate and variance
of 0. modavgShrink also returns unconditional standard errors
and unconditional confidence intervals as described in Buckland et
al. (1997) and Burnham and Anderson (2002).
a list storing each of the models in the candidate model set.
parm
the parameter of interest, enclosed between quotes, for which a
model-averaged estimate is required. For a categorical variable,
the label of the estimate must be included as it appears in the output
(see 'Details' below).
modnames
a character vector of model names to facilitate the identification of
each model in the model selection table. If NULL, the function
uses the names in the cand.set list of candidate models. If no names
appear in the list, generic names (e.g., Mod1, Mod2) are
supplied in the table in the same order as in the list of candidate
models.
second.ord
logical. If TRUE, the function returns the second-order Akaike
information criterion (i.e., AICc).
nobs
this argument allows to specify a numeric value other than total
sample size to compute the AICc (i.e., nobs defaults to total
number of observations). This is relevant only for mixed models or
various models of unmarkedFit classes where sample size is not
straightforward. In such cases, one might use total number of
observations or number of independent clusters (e.g., sites) as the
value of nobs.
uncond.se
either, "old", or "revised", specifying the equation
used to compute the unconditional standard error of a model-averaged
estimate. With uncond.se = "old", computations are based on
equation 4.9 of Burnham and Anderson (2002), which was the former way
to compute unconditional standard errors. With uncond.se =
"revised", equation 6.12 of Burnham and Anderson (2002) is used.
Anderson (2008, p. 111) recommends use of the revised version for the
computation of unconditional standard errors and it is now the
default. Note that versions of package AICcmodavg < 1.04 used the old
method to compute unconditional standard errors.
conf.level
the confidence level (1 - α) requested for the computation of
unconditional confidence intervals.
c.hat
value of overdispersion parameter (i.e., variance inflation factor) such
as that obtained from c_hat. Note that values of c.hat different
from 1 are only appropriate for binomial GLM's with trials > 1 (i.e.,
success/trial or cbind(success, failure) syntax), with Poisson GLM's,
single-season occupancy models (MacKenzie et al. 2002), dynamic
occupancy models (MacKenzie et al. 2003), or N-mixture models
(Royle 2004, Dail and Madsen 2011). If c.hat > 1,
modavgShrink will return the quasi-likelihood analogue of the
information criteria requested and multiply the variance-covariance
matrix of the estimates by this value (i.e., SE's are multiplied by
sqrt(c.hat)). This option is not supported for generalized
linear mixed models of the mer or merMod classes.
gamdisp
if gamma GLM is used, the dispersion parameter should be specified here
to apply the same value to each model.
parm.type
this argument specifies the parameter type on which the effect size
will be computed and is only relevant for models of
unmarkedFitOccu, unmarkedFitColExt,
unmarkedFitOccuFP, unmarkedFitOccuRN,
unmarkedFitMPois, unmarkedFitPCount,
unmarkedFitPCO, unmarkedFitDS, unmarkedFitGDS,
unmarkedFitGMM, and unmarkedFitGPC classes. The
character strings supported vary with the type of model fitted. For
unmarkedFitOccu objects, either psi or detect can
be supplied to indicate whether the parameter is on occupancy or
detectability, respectively. For unmarkedFitColExt, possible
values are psi, gamma, epsilon, and
detect, for parameters on occupancy in the inital year,
colonization, extinction, and detectability, respectively. For
unmarkedFitOccuFP objects, one can specify psi,
detect, or fp, for occupancy, detectability, and
probability of assigning false-positives, respectively. For
unmarkedFitOccuRN objects, either lambda or
detect can be entered for abundance and detectability
parameters, respectively. For unmarkedFitPCount and
unmarkedFitMPois objects, lambda or detect denote
parameters on abundance and detectability, respectively. For
unmarkedFitPCO objects, one can enter lambda,
gamma, omega, or detect, to specify parameters on
abundance, recruitment, apparent survival, and detectability,
respectively. For unmarkedFitDS objects, only lambda is
supported for the moment. For unmarkedFitGDS, lambda
and phi denote abundance and availability, respectively. For
unmarkedFitGMM and unmarkedFitGPC objects,
lambda, phi, and detect denote abundance,
availability, and detectability, respectively.
...
additional arguments passed to the function.
Details
The parameter for which a model-averaged estimate is requested must be
specified with the parm argument and must be identical to its
label in the model output (e.g., from summary). For factors, one
must specify the name of the variable and the level of interest. The
shrinkage version of model averaging is only appropriate for cases where
each parameter is given an equal weighting in the model (i.e., each
parameter must appear the same number of times in the models) and has
the same interpretation across all models. As a result, models with
interaction terms or polynomial terms are not supported by
modavgShrink.
modavgShrink is implemented for a list containing objects of
aov, betareg, clm, clmm, clogit,
coxme, coxph, glm, gls, hurdle,
lm, lme, lmekin, maxlikeFit, mer,
glmerMod, lmerMod, multinom, polr,
rlm, survreg, vglm, zeroinfl classes as well
as various models of unmarkedFit classes.
Value
modavgShrink creates an object of class modavgShrink
with the following components:
Parameter
the parameter for which a model-averaged estimate with
shrinkage was obtained
Mod.avg.table
the model selection table based on models including
the parameter of interest
Mod.avg.beta
the model-averaged estimate based on all models
Uncond.SE
the unconditional standard error for the model-averaged
estimate (as opposed to the conditional SE based on a single model)
Conf.level
the confidence level used to compute the confidence
interval
Lower.CL
the lower confidence limit
Upper.CL
the upper confidence limit
Author(s)
Marc J. Mazerolle
References
Anderson, D. R. (2008) Model-based Inference in the Life Sciences:
a primer on evidence. Springer: New York.
Buckland, S. T., Burnham, K. P., Augustin, N. H. (1997) Model selection:
an integral part of inference. Biometrics53, 603–618.
Burnham, K. P., Anderson, D. R. (2002) Model Selection and
Multimodel Inference: a practical information-theoretic
approach. Second edition. Springer: New York.
Burnham, K. P., Anderson, D. R. (2004) Multimodel inference:
understanding AIC and BIC in model selection. Sociological
Methods and Research33, 261–304.
Dail, D., Madsen, L. (2011) Models for estimating abundance from
repeated counts of an open population. Biometrics67,
577–587.
Lukacs, P. M., Burnham, K. P., Anderson, D. R. (2010) Model selection
bias and Freedman's paradox. Annals of the Institute of
Statistical Mathematics62, 117–125.
MacKenzie, D. I., Nichols, J. D., Lachman, G. B., Droege, S., Royle,
J. A., Langtimm, C. A. (2002) Estimating site occupancy rates when
detection probabilities are less than one. Ecology83,
2248–2255.
MacKenzie, D. I., Nichols, J. D., Hines, J. E., Knutson, M. G.,
Franklin, A. B. (2003) Estimating site occupancy, colonization, and
local extinction when a species is detected imperfectly. Ecology84, 2200–2207.
Mazerolle, M. J. (2006) Improving data analysis in herpetology: using
Akaike's Information Criterion (AIC) to assess the strength of
biological hypotheses. Amphibia-Reptilia27, 169–180.
Royle, J. A. (2004) N-mixture models for estimating population
size from spatially replicated counts. Biometrics60,
108–115.
##cement example in Burnham and Anderson 2002
data(cement)
##setup same model set as in Table 3.2, p. 102
Cand.models <- list( )
Cand.models[[1]] <- lm(y ~ x1 + x2, data = cement)
Cand.models[[2]] <- lm(y ~ x1 + x2 + x4, data = cement)
Cand.models[[3]] <- lm(y ~ x1 + x2 + x3, data = cement)
Cand.models[[4]] <- lm(y ~ x1 + x4, data = cement)
Cand.models[[5]] <- lm(y ~ x1 + x3 + x4, data = cement)
Cand.models[[6]] <- lm(y ~ x2 + x3 + x4, data = cement)
Cand.models[[7]] <- lm(y ~ x1 + x2 + x3 + x4, data = cement)
Cand.models[[8]] <- lm(y ~ x3 + x4, data = cement)
Cand.models[[9]] <- lm(y ~ x2 + x3, data = cement)
Cand.models[[10]] <- lm(y ~ x4, data = cement)
Cand.models[[11]] <- lm(y ~ x2, data = cement)
Cand.models[[12]] <- lm(y ~ x2 + x4, data = cement)
Cand.models[[13]] <- lm(y ~ x1, data = cement)
Cand.models[[14]] <- lm(y ~ x1 + x3, data = cement)
Cand.models[[15]] <- lm(y ~ x3, data = cement)
##vector of model names
Modnames <- paste("mod", 1:15, sep="")
##AICc
aictab(cand.set = Cand.models, modnames = Modnames)
##compute model-averaged estimate with shrinkage - each parameter
##appears 8 times in the models
modavgShrink(cand.set = Cand.models, modnames = Modnames, parm = "x1")
##compare against classic model-averaging
modavg(cand.set = Cand.models, modnames = Modnames, parm = "x1")
##note that model-averaged estimate with shrinkage is closer to 0 than
##with the classic version
##remove a few models from the set and run again
Cand.unbalanced <- Cand.models[-c(3, 14, 15)]
##set up model names
Modnames <- paste("mod", 1:length(Cand.unbalanced), sep="")
##issues an error because some parameters appear more often than others
## Not run: modavgShrink(cand.set = Cand.unbalanced,
modnames = Modnames, parm = "x1")
## End(Not run)
##example on Orthodont data set in nlme
## Not run:
require(nlme)
##set up candidate model list
##age and sex parameters appear in the same number of models
##same number of models with and without these parameters
Cand.models <- list( )
Cand.models[[1]] <- lme(distance ~ age, data = Orthodont, method = "ML")
##random is ~ age | Subject as it is a grouped data frame
Cand.models[[2]] <- lme(distance ~ age + Sex, data = Orthodont,
random = ~ 1, method = "ML")
Cand.models[[3]] <- lme(distance ~ 1, data = Orthodont, random = ~ 1,
method = "ML")
Cand.models[[4]] <- lme(distance ~ Sex, data = Orthodont, random = ~ 1,
method = "ML")
##create a vector of model names
Modnames <- paste("mod", 1:length(Cand.models), sep = "")
##compute importance values for age
imp.age <- importance(cand.set = Cand.models, parm = "age",
modnames = Modnames, second.ord = TRUE,
nobs = NULL)
##compute shrinkage version of model averaging on age
mod.avg.age.shrink <- modavgShrink(cand.set = Cand.models,
parm = "age", modnames = Modnames,
second.ord = TRUE, nobs = NULL)
##compute classic version of model averaging on age
mod.avg.age.classic <- modavg(cand.set = Cand.models, parm = "age",
modnames = Modnames, second.ord = TRUE,
nobs = NULL)
##correspondence between shrinkage version and classic version of
##model averaging
mod.avg.age.shrink$Mod.avg.beta/imp.age$w.plus
mod.avg.age.classic$Mod.avg.beta
detach(package:nlme)
## End(Not run)
##example of N-mixture model modified from ?pcount
## Not run:
require(unmarked)
data(mallard)
mallardUMF <- unmarkedFramePCount(mallard.y, siteCovs = mallard.site,
obsCovs = mallard.obs)
##set up models so that each variable on abundance appears twice
fm.mall.one <- pcount(~ ivel + date ~ length + forest, mallardUMF,
K = 30)
fm.mall.two <- pcount(~ ivel + date ~ elev + forest, mallardUMF,
K = 30)
fm.mall.three <- pcount(~ ivel + date ~ length + elev, mallardUMF,
K = 30)
##model list and names
Cands <- list(fm.mall.one, fm.mall.two, fm.mall.three)
Modnames <- c("length + forest", "elev + forest", "length + elev")
##compute model-averaged estimate with shrinkage for elev on abundance
modavgShrink(cand.set = Cands, modnames = Modnames, parm = "elev",
parm.type = "lambda")
detach(package:unmarked)
## End(Not run)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(AICcmodavg)
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/AICcmodavg/modavgShrink.Rd_%03d_medium.png", width=480, height=480)
> ### Name: modavgShrink
> ### Title: Compute Model-averaged Parameter Estimate with Shrinkage
> ### (Multimodel Inference)
> ### Aliases: modavgShrink modavgShrink.default modavgShrink.AICaov.lm
> ### modavgShrink.AICbetareg modavgShrink.AICsclm.clm modavgShrink.AICclmm
> ### modavgShrink.AICcoxme modavgShrink.AICcoxph modavgShrink.AICglm.lm
> ### modavgShrink.AICgls modavgShrink.AIChurdle modavgShrink.AIClm
> ### modavgShrink.AIClme modavgShrink.AIClmekin
> ### modavgShrink.AICmaxlikeFit.list modavgShrink.AICmer
> ### modavgShrink.AICglmerMod modavgShrink.AIClmerMod
> ### modavgShrink.AICmultinom.nnet modavgShrink.AICpolr
> ### modavgShrink.AICrlm.lm modavgShrink.AICsurvreg modavgShrink.AICvglm
> ### modavgShrink.AICzeroinfl modavgShrink.AICunmarkedFitOccu
> ### modavgShrink.AICunmarkedFitColExt modavgShrink.AICunmarkedFitOccuRN
> ### modavgShrink.AICunmarkedFitPCount modavgShrink.AICunmarkedFitPCO
> ### modavgShrink.AICunmarkedFitDS modavgShrink.AICunmarkedFitGDS
> ### modavgShrink.AICunmarkedFitOccuFP modavgShrink.AICunmarkedFitMPois
> ### modavgShrink.AICunmarkedFitGMM modavgShrink.AICunmarkedFitGPC
> ### print.modavgShrink
> ### Keywords: models
>
> ### ** Examples
>
> ##cement example in Burnham and Anderson 2002
> data(cement)
> ##setup same model set as in Table 3.2, p. 102
> Cand.models <- list( )
> Cand.models[[1]] <- lm(y ~ x1 + x2, data = cement)
> Cand.models[[2]] <- lm(y ~ x1 + x2 + x4, data = cement)
> Cand.models[[3]] <- lm(y ~ x1 + x2 + x3, data = cement)
> Cand.models[[4]] <- lm(y ~ x1 + x4, data = cement)
> Cand.models[[5]] <- lm(y ~ x1 + x3 + x4, data = cement)
> Cand.models[[6]] <- lm(y ~ x2 + x3 + x4, data = cement)
> Cand.models[[7]] <- lm(y ~ x1 + x2 + x3 + x4, data = cement)
> Cand.models[[8]] <- lm(y ~ x3 + x4, data = cement)
> Cand.models[[9]] <- lm(y ~ x2 + x3, data = cement)
> Cand.models[[10]] <- lm(y ~ x4, data = cement)
> Cand.models[[11]] <- lm(y ~ x2, data = cement)
> Cand.models[[12]] <- lm(y ~ x2 + x4, data = cement)
> Cand.models[[13]] <- lm(y ~ x1, data = cement)
> Cand.models[[14]] <- lm(y ~ x1 + x3, data = cement)
> Cand.models[[15]] <- lm(y ~ x3, data = cement)
>
> ##vector of model names
> Modnames <- paste("mod", 1:15, sep="")
>
> ##AICc
> aictab(cand.set = Cand.models, modnames = Modnames)
Model selection based on AICc:
K AICc Delta_AICc AICcWt Cum.Wt LL
mod1 4 69.31 0.00 0.57 0.57 -28.16
mod2 5 72.44 3.13 0.12 0.68 -26.93
mod3 5 72.48 3.16 0.12 0.80 -26.95
mod4 4 72.63 3.32 0.11 0.91 -29.82
mod5 5 73.19 3.88 0.08 0.99 -27.31
mod6 5 78.04 8.73 0.01 1.00 -29.73
mod7 6 79.84 10.52 0.00 1.00 -26.92
mod8 4 83.74 14.43 0.00 1.00 -35.37
mod9 4 94.93 25.62 0.00 1.00 -40.96
mod10 3 100.41 31.10 0.00 1.00 -45.87
mod11 3 100.74 31.42 0.00 1.00 -46.04
mod12 4 104.52 35.21 0.00 1.00 -45.76
mod13 3 105.08 35.77 0.00 1.00 -48.21
mod14 4 109.01 39.70 0.00 1.00 -48.00
mod15 3 110.63 41.31 0.00 1.00 -50.98
>
> ##compute model-averaged estimate with shrinkage - each parameter
> ##appears 8 times in the models
> modavgShrink(cand.set = Cand.models, modnames = Modnames, parm = "x1")
Multimodel inference on "x1" based on AICc
AICc table used to obtain model-averaged estimate with shrinkage:
K AICc Delta_AICc AICcWt Estimate SE
mod1 4 69.31 0.00 0.57 1.47 0.12
mod2 5 72.44 3.13 0.12 1.45 0.12
mod3 5 72.48 3.16 0.12 1.70 0.20
mod4 4 72.63 3.32 0.11 1.44 0.14
mod5 5 73.19 3.88 0.08 1.05 0.22
mod6 5 78.04 8.73 0.01 0.00 0.00
mod7 6 79.84 10.52 0.00 1.55 0.74
mod8 4 83.74 14.43 0.00 0.00 0.00
mod9 4 94.93 25.62 0.00 0.00 0.00
mod10 3 100.41 31.10 0.00 0.00 0.00
mod11 3 100.74 31.42 0.00 0.00 0.00
mod12 4 104.52 35.21 0.00 0.00 0.00
mod13 3 105.08 35.77 0.00 1.87 0.53
mod14 4 109.01 39.70 0.00 2.31 0.96
mod15 3 110.63 41.31 0.00 0.00 0.00
Model-averaged estimate with shrinkage: 1.44
Unconditional SE: 0.24
95% Unconditional confidence interval: 0.97, 1.92
>
> ##compare against classic model-averaging
> modavg(cand.set = Cand.models, modnames = Modnames, parm = "x1")
Multimodel inference on "x1" based on AICc
AICc table used to obtain model-averaged estimate:
K AICc Delta_AICc AICcWt Estimate SE
mod1 4 69.31 0.00 0.57 1.47 0.12
mod2 5 72.44 3.13 0.12 1.45 0.12
mod3 5 72.48 3.16 0.12 1.70 0.20
mod4 4 72.63 3.32 0.11 1.44 0.14
mod5 5 73.19 3.88 0.08 1.05 0.22
mod7 6 79.84 10.52 0.00 1.55 0.74
mod13 3 105.08 35.77 0.00 1.87 0.53
mod14 4 109.01 39.70 0.00 2.31 0.96
Model-averaged estimate: 1.46
Unconditional SE: 0.21
95% Unconditional confidence interval: 1.05, 1.86
> ##note that model-averaged estimate with shrinkage is closer to 0 than
> ##with the classic version
>
> ##remove a few models from the set and run again
> Cand.unbalanced <- Cand.models[-c(3, 14, 15)]
>
> ##set up model names
> Modnames <- paste("mod", 1:length(Cand.unbalanced), sep="")
>
> ##issues an error because some parameters appear more often than others
> ## Not run:
> ##D modavgShrink(cand.set = Cand.unbalanced,
> ##D modnames = Modnames, parm = "x1")
> ## End(Not run)
>
>
>
> ##example on Orthodont data set in nlme
> ## Not run:
> ##D require(nlme)
> ##D
> ##D ##set up candidate model list
> ##D ##age and sex parameters appear in the same number of models
> ##D ##same number of models with and without these parameters
> ##D Cand.models <- list( )
> ##D Cand.models[[1]] <- lme(distance ~ age, data = Orthodont, method = "ML")
> ##D ##random is ~ age | Subject as it is a grouped data frame
> ##D Cand.models[[2]] <- lme(distance ~ age + Sex, data = Orthodont,
> ##D random = ~ 1, method = "ML")
> ##D Cand.models[[3]] <- lme(distance ~ 1, data = Orthodont, random = ~ 1,
> ##D method = "ML")
> ##D Cand.models[[4]] <- lme(distance ~ Sex, data = Orthodont, random = ~ 1,
> ##D method = "ML")
> ##D
> ##D ##create a vector of model names
> ##D Modnames <- paste("mod", 1:length(Cand.models), sep = "")
> ##D
> ##D ##compute importance values for age
> ##D imp.age <- importance(cand.set = Cand.models, parm = "age",
> ##D modnames = Modnames, second.ord = TRUE,
> ##D nobs = NULL)
> ##D
> ##D ##compute shrinkage version of model averaging on age
> ##D mod.avg.age.shrink <- modavgShrink(cand.set = Cand.models,
> ##D parm = "age", modnames = Modnames,
> ##D second.ord = TRUE, nobs = NULL)
> ##D
> ##D ##compute classic version of model averaging on age
> ##D mod.avg.age.classic <- modavg(cand.set = Cand.models, parm = "age",
> ##D modnames = Modnames, second.ord = TRUE,
> ##D nobs = NULL)
> ##D
> ##D ##correspondence between shrinkage version and classic version of
> ##D ##model averaging
> ##D mod.avg.age.shrink$Mod.avg.beta/imp.age$w.plus
> ##D mod.avg.age.classic$Mod.avg.beta
> ##D detach(package:nlme)
> ## End(Not run)
>
>
> ##example of N-mixture model modified from ?pcount
> ## Not run:
> ##D require(unmarked)
> ##D data(mallard)
> ##D mallardUMF <- unmarkedFramePCount(mallard.y, siteCovs = mallard.site,
> ##D obsCovs = mallard.obs)
> ##D ##set up models so that each variable on abundance appears twice
> ##D fm.mall.one <- pcount(~ ivel + date ~ length + forest, mallardUMF,
> ##D K = 30)
> ##D fm.mall.two <- pcount(~ ivel + date ~ elev + forest, mallardUMF,
> ##D K = 30)
> ##D fm.mall.three <- pcount(~ ivel + date ~ length + elev, mallardUMF,
> ##D K = 30)
> ##D
> ##D ##model list and names
> ##D Cands <- list(fm.mall.one, fm.mall.two, fm.mall.three)
> ##D Modnames <- c("length + forest", "elev + forest", "length + elev")
> ##D
> ##D ##compute model-averaged estimate with shrinkage for elev on abundance
> ##D modavgShrink(cand.set = Cands, modnames = Modnames, parm = "elev",
> ##D parm.type = "lambda")
> ##D detach(package:unmarked)
> ## End(Not run)
>
>
>
>
>
> dev.off()
null device
1
>