The anova function automatically tests most meaningful hypotheses
in a design. For example, suppose that age and cholesterol are
predictors, and that a general interaction is modeled using a restricted
spline surface. anova prints Wald statistics (F statistics
for an ols fit) for testing linearity of age, linearity of
cholesterol, age effect (age + age by cholesterol interaction),
cholesterol effect (cholesterol + age by cholesterol interaction),
linearity of the age by cholesterol interaction (i.e., adequacy of the
simple age * cholesterol 1 d.f. product), linearity of the interaction
in age alone, and linearity of the interaction in cholesterol
alone. Joint tests of all interaction terms in the model and all
nonlinear terms in the model are also performed. For any multiple
d.f. effects for continuous variables that were not modeled through
rcs, pol, lsp, etc., tests of linearity will be
omitted. This applies to matrix predictors produced by e.g.
poly or ns. print.anova.rms is the printing
method. plot.anova.rms draws dot charts depicting the importance
of variables in the model, as measured by Wald chi-square,
chi-square minus d.f., AIC, P-values, partial
R^2, R^2 for the whole model after deleting the effects in
question, or proportion of overall model R^2 that is due to each
predictor. latex.anova.rms is the latex method. It
substitutes Greek/math symbols in column headings, uses boldface for
TOTAL lines, and constructs a caption. Then it passes the result
to latex.default for conversion to LaTeX.
Usage
## S3 method for class 'rms'
anova(object, ..., main.effect=FALSE, tol=1e-9,
test=c('F','Chisq'), india=TRUE, indnl=TRUE, ss=TRUE,
vnames=c('names','labels'))
## S3 method for class 'anova.rms'
print(x, which=c('none','subscripts','names','dots'), ...)
## S3 method for class 'anova.rms'
plot(x,
what=c("chisqminusdf","chisq","aic","P","partial R2","remaining R2",
"proportion R2", "proportion chisq"),
xlab=NULL, pch=16,
rm.totals=TRUE, rm.ia=FALSE, rm.other=NULL, newnames,
sort=c("descending","ascending","none"), margin=NULL, pl=TRUE,
trans=NULL, ntrans=40, ...)
## S3 method for class 'anova.rms'
latex(object, title, psmall=TRUE, dec.chisq=2,
dec.F=2, dec.ss=NA, dec.ms=NA, dec.P=4, table.env=TRUE,
caption=NULL, ...)
Arguments
object
a rms fit object. object must
allow vcov to return the variance-covariance matrix. For
latex is the result of anova.
...
If omitted, all variables are tested, yielding tests for individual factors
and for pooled effects. Specify a subset of the variables to obtain tests
for only those factors, with a pooled Wald tests for the combined effects
of all factors listed. Names may be abbreviated. For example, specify
anova(fit,age,cholesterol) to get a Wald statistic for testing the joint
importance of age, cholesterol, and any factor interacting with them.
Can be optional graphical parameters to send to
dotchart2, or other parameters to send to latex.default.
Ignored for print.
main.effect
Set to TRUE to print the (usually meaningless) main effect tests even when
the factor is involved in an interaction. The default is FALSE, to print only
the effect of the main effect combined with all interactions involving that
factor.
tol
singularity criterion for use in matrix inversion
test
For an ols fit, set test="Chisq" to use Wald χ^2 tests rather than F-tests.
india
set to FALSE to exclude individual tests of
interaction from the table
indnl
set to FALSE to exclude individual tests of
nonlinearity from the table
ss
For an ols fit, set ss=FALSE to suppress printing partial
sums of squares, mean squares, and the Error SS and MS.
vnames
set to 'labels' to use variable labels rather than
variable names in the output
x
for print,plot,text is the result of anova.
which
If which is not "none" (the default), print.anova.rms will
add to the rightmost column of the output the list of parameters being
tested by the hypothesis being tested in the current row. Specifying
which="subscripts" causes the subscripts of the regression
coefficients being tested to be printed (with a subscript of one for
the first non-intercept term). which="names" prints the names of
the terms being tested, and which="dots" prints dots for terms being
tested and blanks for those just being adjusted for.
what
what type of statistic to plot. The default is the Wald
chi-square
statistic for each factor (adding in the effect of higher-ordered
factors containing that factor) minus its degrees of freedom. The
R2 choices for what only apply to ols models.
xlab
x-axis label, default is constructed according to what.
plotmath symbols are used for R, by default.
pch
character for plotting dots in dot charts. Default is 16 (solid dot).
rm.totals
set to FALSE to keep total chi-squares (overall, nonlinear, interaction totals)
in the chart.
rm.ia
set to TRUE to omit any effect that has "*" in its name
rm.other
a list of other predictor names to omit from the chart
newnames
a list of substitute predictor names to use, after omitting any.
sort
default is to sort bars in descending order of the summary statistic
margin
set to a vector of character strings to write text for
selected statistics in the right margin of the dot chart. The
character strings can be any combination of "chisq",
"d.f.", "P", "partial R2",
"proportion R2", and "proportion chisq".
Default is to not draw any statistics in the margin.
pl
set to FALSE to suppress plotting. This is useful when you only wish to
analyze the vector of statistics returned.
trans
set to a function to apply that transformation to the statistics
being plotted, and to truncate negative values at zero. A good choice
is trans=sqrt.
ntrans
n argument to pretty, specifying the
number of values for which to place tick marks. This should be larger
than usual because of nonlinear scaling, to provide a sufficient
number of tick marks on the left (stretched) part of the chi-square
scale.
title
title to pass to latex, default is name of fit object passed to
anova prefixed with "anova.". For Windows, the default is
"ano" followed by the first 5 letters of the name of the fit
object.
psmall
The default is psmall=TRUE, which causes P<0.00005 to print as <0.0001.
Set to FALSE to print as 0.0000.
dec.chisq
number of places to the right of the decimal place for typesetting
chi-square values (default is 2). Use zero for integer, NA for
floating point.
dec.F
digits to the right for F statistics (default is 2)
dec.ss
digits to the right for sums of squares (default is NA, indicating
floating point)
dec.ms
digits to the right for mean squares (default is NA)
dec.P
digits to the right for P-values
table.env
see latex
caption
caption for table if table.env is TRUE.
Default is constructed from the response variable.
Details
If the statistics being plotted with plot.anova.rms are few in
number and one of them is negative or zero, plot.anova.rms
will quit because of an error in dotchart2.
Value
anova.rms returns a matrix of class anova.rms containing factors
as rows and chi-square, d.f., and P-values as
columns (or d.f., partial SS, MS, F, P). An attribute
vinfo provides list of variables involved in each row and the
type of test done.
plot.anova.rms invisibly returns the vector of quantities
plotted. This vector has a names attribute describing the terms for
which the statistics in the vector are calculated.
Side Effects
print prints, latex creates a
file with a name of the form "title.tex" (see the title argument above).
Author(s)
Frank Harrell
Department of Biostatistics, Vanderbilt University
f.harrell@vanderbilt.edu
n <- 1000 # define sample size
set.seed(17) # so can reproduce the results
treat <- factor(sample(c('a','b','c'), n,TRUE))
num.diseases <- sample(0:4, n,TRUE)
age <- rnorm(n, 50, 10)
cholesterol <- rnorm(n, 200, 25)
weight <- rnorm(n, 150, 20)
sex <- factor(sample(c('female','male'), n,TRUE))
label(age) <- 'Age' # label is in Hmisc
label(num.diseases) <- 'Number of Comorbid Diseases'
label(cholesterol) <- 'Total Cholesterol'
label(weight) <- 'Weight, lbs.'
label(sex) <- 'Sex'
units(cholesterol) <- 'mg/dl' # uses units.default in Hmisc
# Specify population model for log odds that Y=1
L <- .1*(num.diseases-2) + .045*(age-50) +
(log(cholesterol - 10)-5.2)*(-2*(treat=='a') +
3.5*(treat=='b')+2*(treat=='c'))
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)
fit <- lrm(y ~ treat + scored(num.diseases) + rcs(age) +
log(cholesterol+10) + treat:log(cholesterol+10))
a <- anova(fit) # Test all factors
b <- anova(fit, treat, cholesterol) # Test these 2 by themselves
# to get their pooled effects
a
b
# Add a new line to the plot with combined effects
s <- rbind(a, 'treat+cholesterol'=b['TOTAL',])
class(s) <- 'anova.rms'
plot(s)
g <- lrm(y ~ treat*rcs(age))
dd <- datadist(treat, num.diseases, age, cholesterol)
options(datadist='dd')
p <- Predict(g, age, treat="b")
s <- anova(g)
# Usually omit fontfamily to default to 'Courier'
# It's specified here to make R pass its package-building checks
plot(p, addpanel=pantext(s, 28, 1.9, fontfamily='Helvetica'))
plot(s, margin=c('chisq', 'proportion chisq'))
# new plot - dot chart of chisq-d.f. with 2 other stats in right margin
# latex(s) # nice printout - creates anova.g.tex
options(datadist=NULL)
# Simulate data with from a given model, and display exactly which
# hypotheses are being tested
set.seed(123)
age <- rnorm(500, 50, 15)
treat <- factor(sample(c('a','b','c'), 500, TRUE))
bp <- rnorm(500, 120, 10)
y <- ifelse(treat=='a', (age-50)*.05, abs(age-50)*.08) + 3*(treat=='c') +
pmax(bp, 100)*.09 + rnorm(500)
f <- ols(y ~ treat*lsp(age,50) + rcs(bp,4))
print(names(coef(f)), quote=FALSE)
specs(f)
anova(f)
an <- anova(f)
options(digits=3)
print(an, 'subscripts')
print(an, 'dots')
an <- anova(f, test='Chisq', ss=FALSE)
plot(0:1) # make some plot
tab <- pantext(an, 1.2, .6, lattice=FALSE, fontfamily='Helvetica')
# create function to write table; usually omit fontfamily
tab() # execute it; could do tab(cex=.65)
plot(an) # new plot - dot chart of chisq-d.f.
# Specify plot(an, trans=sqrt) to use a square root scale for this plot
# latex(an) # nice printout - creates anova.f.tex
## Example to save partial R^2 for all predictors, along with overall
## R^2, from two separate fits, and to combine them with a lattice plot
require(lattice)
set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
y <- (x1-.5)^2 + x2 + runif(n)
group <- c(rep('a', n/2), rep('b', n/2))
A <- NULL
for(g in c('a','b')) {
f <- ols(y ~ pol(x1,2) + pol(x2,2) + pol(x1,2) %ia% pol(x2,2),
subset=group==g)
a <- plot(anova(f),
what='partial R2', pl=FALSE, rm.totals=FALSE, sort='none')
a <- a[-grep('NONLINEAR', names(a))]
d <- data.frame(group=g, Variable=factor(names(a), names(a)),
partialR2=unname(a))
A <- rbind(A, d)
}
dotplot(Variable ~ partialR2 | group, data=A,
xlab=ex <- expression(partial~R^2))
dotplot(group ~ partialR2 | Variable, data=A, xlab=ex)
dotplot(Variable ~ partialR2, groups=group, data=A, xlab=ex,
auto.key=list(corner=c(.5,.5)))
# Suppose that a researcher wants to make a big deal about a variable
# because it has the highest adjusted chi-square. We use the
# bootstrap to derive 0.95 confidence intervals for the ranks of all
# the effects in the model. We use the plot method for anova, with
# pl=FALSE to suppress actual plotting of chi-square - d.f. for each
# bootstrap repetition.
# It is important to tell plot.anova.rms not to sort the results, or
# every bootstrap replication would have ranks of 1,2,3,... for the stats.
n <- 300
set.seed(1)
d <- data.frame(x1=runif(n), x2=runif(n), x3=runif(n),
x4=runif(n), x5=runif(n), x6=runif(n), x7=runif(n),
x8=runif(n), x9=runif(n), x10=runif(n), x11=runif(n),
x12=runif(n))
d$y <- with(d, 1*x1 + 2*x2 + 3*x3 + 4*x4 + 5*x5 + 6*x6 +
7*x7 + 8*x8 + 9*x9 + 10*x10 + 11*x11 +
12*x12 + 9*rnorm(n))
f <- ols(y ~ x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12, data=d)
B <- 20 # actually use B=1000
ranks <- matrix(NA, nrow=B, ncol=12)
rankvars <- function(fit)
rank(plot(anova(fit), sort='none', pl=FALSE))
Rank <- rankvars(f)
for(i in 1:B) {
j <- sample(1:n, n, TRUE)
bootfit <- update(f, data=d, subset=j)
ranks[i,] <- rankvars(bootfit)
}
lim <- t(apply(ranks, 2, quantile, probs=c(.025,.975)))
predictor <- factor(names(Rank), names(Rank))
Dotplot(predictor ~ Cbind(Rank, lim), pch=3, xlab='Rank')