This function creates a “bubble” plot of two functions,
R = log(Studentized residuals^2) against L = log(H/(p*(1-H))) of the
hat values H, where p is the number of model coefficients, with the areas of the
circles representing the observations proportional to Cook's distances.
This plot, suggested by McCulloch & Meeter (1983), has the attractive property that
contours of equal Cook's distance are diagonal lines with slope = -1.
Various reference lines are drawn on the plot corresponding to
twice and three times the average hat value, a “large” squared Studentized
residual, and contours of Cook's distance.
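The slope of -1 follows from the standard identity linking Cook's distance to leverage and the Studentized residual. The following is a minimal base-R sketch, reusing the Lawrence (1995) data from the Examples. The identity is exact for internally Studentized residuals (rstandard) and holds only approximately if externally Studentized residuals (rstudent) are used instead; the sketch is an illustration, not the function's own code.

```r
# Data from Lawrence (1995), as in the Examples section
x <- c(0, 0, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 18, 18)
y <- c(0, 6, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, 7, 18)
mod <- lm(y ~ x)

H <- hatvalues(mod)          # leverages (hat values)
p <- length(coef(mod))       # number of model coefficients
L <- log(H / (p * (1 - H)))  # leverage component
R <- log(rstandard(mod)^2)   # residual component (internally Studentized)

# log(CookD) = L + R, so a contour of constant CookD is the line
# R = log(CookD) - L, which has slope -1 in the (L, R) plane
all.equal(unname(L + R), unname(log(cooks.distance(mod))))
```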
a factor to adjust the radii of the circles, in relation to sqrt(CookD)
xlab, ylab
axis labels.
xlim, ylim
Limits for x and y axes. In the space of (L, R), very small residuals
typically extend the y axis enough to swamp the large residuals, so the default for
ylim is a range of 6 log units extending downward from the maximum value of R.
labels, id.method, id.n, id.cex, id.col
Settings for labelling
points; see showLabels for details. To omit point labelling, set
id.n=0, the default. In this function, the default id.method="noteworthy"
labels points with large
Studentized residuals, hat-values or Cook's distances. See Details below. Set
id.method="identify" for interactive point identification.
ref
Options to draw reference lines, any one or more of c("h", "v", "d", "c").
"h" and "v" draw horizontal and vertical reference lines at noteworthy values
of R and L respectively. "d" draws equally spaced diagonal reference lines for
contours of equal CookD. "c" draws diagonal reference lines corresponding to
approximate 0.95 and 0.99 contours of CookD.
ref.col, ref.lty
Color and line type for reference lines. Reference lines for "c" %in% ref are handled
separately.
ref.lab
A logical, indicating whether the reference lines should be labeled.
...
arguments to pass to the plot and points functions.
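As a rough illustration of the ylim default and the "v" reference lines described above, the following base-R sketch computes a 6-log-unit y range below max(R) and maps twice and three times the average hat value (p/n) onto the L scale. The variable names ylim_default, h_ref and L_ref are hypothetical, and the function's internal computation may differ in detail.

```r
# Data from Lawrence (1995), as in the Examples section
x <- c(0, 0, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 18, 18)
y <- c(0, 6, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, 7, 18)
mod <- lm(y ~ x)

H <- hatvalues(mod)
p <- length(coef(mod))       # number of model coefficients
n <- length(H)               # number of observations
L <- log(H / (p * (1 - H)))
R <- log(rstudent(mod)^2)

# Assumed default: 6 log units extending downward from the largest R
ylim_default <- c(max(R) - 6, max(R))

# Twice and three times the average hat value p/n, on the L scale
h_ref <- c(2, 3) * p / n
L_ref <- log(h_ref / (p * (1 - h_ref)))
```

Values such as these could be passed to a base plot as ylim = ylim_default and abline(v = L_ref).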
Details
The id.method="noteworthy" setting
has an effect only when id.n>0.
With id.method="noteworthy" and id.n>0, the points labeled
are the union of those with the largest id.n values on each of L, R, and CookD.
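The union rule can be sketched in base R as follows. This is only an illustration of the selection described above, not the code used by showLabels; the helper top() and the object noteworthy are hypothetical names.

```r
# Data from Lawrence (1995), with letters as row names
x <- c(0, 0, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 18, 18)
y <- c(0, 6, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, 7, 18)
DF <- data.frame(x, y, row.names = LETTERS[seq_along(x)])
mod <- lm(y ~ x, data = DF)

H <- hatvalues(mod)
p <- length(coef(mod))
L <- log(H / (p * (1 - H)))
R <- log(rstudent(mod)^2)
CookD <- cooks.distance(mod)

id.n <- 2
# names of the k largest values of a named vector (hypothetical helper)
top <- function(v, k) names(sort(v, decreasing = TRUE))[seq_len(k)]

# union of the top id.n points on each criterion: between id.n and
# 3 * id.n distinct labels, depending on how much the sets overlap
noteworthy <- Reduce(union, list(top(L, id.n), top(R, id.n), top(CookD, id.n)))
noteworthy
```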
Value
If points are identified, returns a data frame with the hat values,
Studentized residuals and Cook's distance of the identified points. If
no points are identified, nothing is returned. This function is primarily
used for its side-effect of drawing a plot.
Author(s)
Michael Friendly
References
Lawrence, A. J. (1995).
Deletion Influence and Masking in Regression.
Journal of the Royal Statistical Society. Series B (Methodological), 57(1), 181-189.
McCulloch, C. E. & Meeter, D. (1983).
Discussion of "Outliers..." by R. J. Beckman and R. D. Cook.
Technometrics, 25, 152-155.
See Also
influencePlot in the car package for other methods
Examples
# artificial example from Lawrence (1995)
x <- c( 0, 0, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 18, 18 )
y <- c( 0, 6, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, 7, 18 )
DF <- data.frame(x,y, row.names=LETTERS[1:length(x)])
DF
with(DF, {
  plot(x, y, pch=16, cex=1.3)
  abline(lm(y~x), col="red", lwd=2)
  # label the points at the extremes of x
  NB <- c(1,2,13,14)
  text(x[NB], y[NB], LETTERS[NB], pos=c(4,4,2,2))
})
mod <- lm(y~x, data=DF)
# standard influence plot from car
library(car)
influencePlot(mod, id.n=4)
# lrPlot version
lrPlot(mod, id.n=4)
library(car)
dmod <- lm(prestige ~ income + education, data = Duncan)
influencePlot(dmod, id.n=3)
lrPlot(dmod, id.n=3)