R: Compute Lack-of-Fit and Pure Error Anova Table for a Linear...
anovaPE
R Documentation
Compute Lack-of-Fit and Pure Error Anova Table for a Linear Model
Description
Compute a lack-of-fit and pure error anova table for either a linear model with
one predictor variable or else a linear model for which all predictor variables
in the model are functions of a single variable (for example, x, x^2, etc.). There
must be replicate observations for at least one value of the predictor variable(s).
Usage
anovaPE(object)
Arguments
object
an object of class"lm".
The object must have only one predictor variable in the formula, or else
all predictor variables in the model must be functions of a single variable
(for example, x, x^2, etc.). Also, the predictor variable(s) must have replicate
observations for at least one value of the predictor variable(s).
Finally,
the total number of observations must be such that the degrees of freedom associated
with the residual sums of squares is greater than the number of observations minus
the number of unique observations.
Details
Produces an anova table with the the sums of squares partitioned by “Lack of Fit”
and “Pure Error”. See Draper and Smith (1998, pp.47-53) for details.
This function is called by the function calibrate.
Value
An object of class"anova" inheriting from class
"data.frame".
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition.
John Wiley and Sons, New York, pp.47-53.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS.
CRC Press, Boca Raton, Florida.
See Also
anova.lm, lm, calibrate.
Examples
# The data frame EPA.97.cadmium.111.df contains calibration data for
# cadmium at mass 111 (ng/L) that appeared in Gibbons et al. (1997b)
# and were provided to them by the U.S. EPA.
#
# First, display a plot of these data along with the fitted calibration line
# and 99% non-simultaneous prediction limits. See
# Millard and Neerchal (2001, pp.566-569) for more details on this
# example.
EPA.97.cadmium.111.df
# Cadmium Spike
#1 0.88 0
#2 1.57 0
#3 0.70 0
#...
#33 99.20 100
#34 93.71 100
#35 100.43 100
Cadmium <- EPA.97.cadmium.111.df$Cadmium
Spike <- EPA.97.cadmium.111.df$Spike
calibrate.list <- calibrate(Cadmium ~ Spike,
data = EPA.97.cadmium.111.df)
newdata <- data.frame(Spike = seq(min(Spike), max(Spike), length.out = 100))
pred.list <- predict(calibrate.list, newdata = newdata, se.fit = TRUE)
pointwise.list <- pointwise(pred.list, coverage = 0.99, individual = TRUE)
plot(Spike, Cadmium, ylim = c(min(pointwise.list$lower),
max(pointwise.list$upper)), xlab = "True Concentration (ng/L)",
ylab = "Observed Concentration (ng/L)")
abline(calibrate.list, lwd = 2)
lines(newdata$Spike, pointwise.list$lower, lty = 8, lwd = 2)
lines(newdata$Spike, pointwise.list$upper, lty = 8, lwd = 2)
title(paste("Calibration Line and 99% Prediction Limits",
"for US EPA Cadmium 111 Data", sep="\n"))
rm(Cadmium, Spike, newdata, calibrate.list, pred.list,
pointwise.list)
#----------
# Now fit the linear model and produce the anova table to check for
# lack of fit. There is no evidence for lack of fit (p = 0.41).
fit <- lm(Cadmium ~ Spike, data = EPA.97.cadmium.111.df)
anova(fit)
#Analysis of Variance Table
#
#Response: Cadmium
# Df Sum Sq Mean Sq F value Pr(>F)
#Spike 1 43220 43220 9356.9 < 2.2e-16 ***
#Residuals 33 152 5
#---
#Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Analysis of Variance Table
#
#Response: Cadmium
#
#Terms added sequentially (first to last)
# Df Sum of Sq Mean Sq F Value Pr(F)
# Spike 1 43220.27 43220.27 9356.879 0
#Residuals 33 152.43 4.62
anovaPE(fit)
# Df Sum Sq Mean Sq F value Pr(>F)
#Spike 1 43220 43220 9341.559 <2e-16 ***
#Lack of Fit 3 14 5 0.982 0.4144
#Pure Error 30 139 5
#---
#Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
rm(fit)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(EnvStats)
Attaching package: 'EnvStats'
The following objects are masked from 'package:stats':
predict, predict.lm
The following object is masked from 'package:base':
print.default
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/EnvStats/anovaPE.Rd_%03d_medium.png", width=480, height=480)
> ### Name: anovaPE
> ### Title: Compute Lack-of-Fit and Pure Error Anova Table for a Linear
> ### Model
> ### Aliases: anovaPE
> ### Keywords: models regression
>
> ### ** Examples
>
> # The data frame EPA.97.cadmium.111.df contains calibration data for
> # cadmium at mass 111 (ng/L) that appeared in Gibbons et al. (1997b)
> # and were provided to them by the U.S. EPA.
> #
> # First, display a plot of these data along with the fitted calibration line
> # and 99% non-simultaneous prediction limits. See
> # Millard and Neerchal (2001, pp.566-569) for more details on this
> # example.
>
> EPA.97.cadmium.111.df
Cadmium Spike
1 0.88 0
2 1.57 0
3 0.70 0
4 0.80 0
5 0.54 0
6 1.83 0
7 1.34 0
8 10.17 10
9 11.13 10
10 11.66 10
11 10.80 10
12 11.11 10
13 11.95 10
14 11.14 10
15 19.97 20
16 20.28 20
17 23.20 20
18 22.12 20
19 18.01 20
20 24.83 20
21 21.10 20
22 54.78 50
23 49.00 50
24 51.92 50
25 49.00 50
26 54.75 50
27 50.25 50
28 50.03 50
29 97.06 100
30 94.60 100
31 102.54 100
32 101.09 100
33 99.20 100
34 93.71 100
35 100.43 100
> # Cadmium Spike
> #1 0.88 0
> #2 1.57 0
> #3 0.70 0
> #...
> #33 99.20 100
> #34 93.71 100
> #35 100.43 100
>
> Cadmium <- EPA.97.cadmium.111.df$Cadmium
>
> Spike <- EPA.97.cadmium.111.df$Spike
>
> calibrate.list <- calibrate(Cadmium ~ Spike,
+ data = EPA.97.cadmium.111.df)
>
> newdata <- data.frame(Spike = seq(min(Spike), max(Spike), length.out = 100))
>
> pred.list <- predict(calibrate.list, newdata = newdata, se.fit = TRUE)
>
> pointwise.list <- pointwise(pred.list, coverage = 0.99, individual = TRUE)
>
> plot(Spike, Cadmium, ylim = c(min(pointwise.list$lower),
+ max(pointwise.list$upper)), xlab = "True Concentration (ng/L)",
+ ylab = "Observed Concentration (ng/L)")
>
> abline(calibrate.list, lwd = 2)
>
> lines(newdata$Spike, pointwise.list$lower, lty = 8, lwd = 2)
>
> lines(newdata$Spike, pointwise.list$upper, lty = 8, lwd = 2)
>
> title(paste("Calibration Line and 99% Prediction Limits",
+ "for US EPA Cadmium 111 Data", sep="\n"))
>
> rm(Cadmium, Spike, newdata, calibrate.list, pred.list,
+ pointwise.list)
>
> #----------
>
> # Now fit the linear model and produce the anova table to check for
> # lack of fit. There is no evidence for lack of fit (p = 0.41).
>
> fit <- lm(Cadmium ~ Spike, data = EPA.97.cadmium.111.df)
>
>
> anova(fit)
Analysis of Variance Table
Response: Cadmium
Df Sum Sq Mean Sq F value Pr(>F)
Spike 1 43220 43220 9356.9 < 2.2e-16 ***
Residuals 33 152 5
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> #Analysis of Variance Table
> #
> #Response: Cadmium
> # Df Sum Sq Mean Sq F value Pr(>F)
> #Spike 1 43220 43220 9356.9 < 2.2e-16 ***
> #Residuals 33 152 5
> #---
> #Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> #Analysis of Variance Table
> #
> #Response: Cadmium
> #
> #Terms added sequentially (first to last)
> # Df Sum of Sq Mean Sq F Value Pr(F)
> # Spike 1 43220.27 43220.27 9356.879 0
> #Residuals 33 152.43 4.62
>
>
> anovaPE(fit)
Df Sum Sq Mean Sq F value Pr(>F)
Spike 1 43220 43220 9341.559 <2e-16 ***
Lack of Fit 3 14 5 0.982 0.4144
Pure Error 30 139 5
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> # Df Sum Sq Mean Sq F value Pr(>F)
> #Spike 1 43220 43220 9341.559 <2e-16 ***
> #Lack of Fit 3 14 5 0.982 0.4144
> #Pure Error 30 139 5
> #---
> #Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>
> rm(fit)
>
>
>
>
>
> dev.off()
null device
1
>