cv.georob
R Documentation
Cross-Validating a Spatial Linear Model Fitted by georob
Description
This function assesses the goodness-of-fit of a spatial linear model by
K-fold cross-validation. More precisely, the model is re-fitted
K times by robust (or Gaussian) (RE)ML, each time excluding
1/K of the data. The re-fitted models are used to compute robust
(or customary) external kriging predictions for the omitted observations.
If the response variable is log-transformed, the kriging predictions
can optionally be transformed back to the original scale of the
measurements. S3 methods for evaluating and plotting diagnostic summaries
of the cross-validation errors are described for the function
validate.predictions.
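The K-fold scheme described above (re-fit K times, each time without one subset, then predict the omitted observations) can be sketched in base R. This is a minimal illustration, not the georob implementation: the helper kfold_cv_sketch is hypothetical, and an ordinary lm() fit stands in for georob(), so no spatial correlation is modelled and pred/se are regression rather than kriging predictions and standard errors. The returned data frame mimics the subset, data, pred and se columns documented under Value.

```r
# Hypothetical sketch of K-fold cross-validation with re-estimation.
# ASSUMPTION: lm() stands in for georob(); no spatial correlation is modelled.
kfold_cv_sketch <- function(formula, data, sets) {
  K <- max(sets)
  out <- vector("list", K)
  for (k in seq_len(K)) {
    test <- sets == k
    # re-fit the model without the k-th subset
    fit <- lm(formula, data = data[!test, , drop = FALSE])
    p <- predict(fit, newdata = data[test, , drop = FALSE], se.fit = TRUE)
    # prediction-error standard error: parameter uncertainty + residual variance
    se <- sqrt(p$se.fit^2 + p$residual.scale^2)
    out[[k]] <- data.frame(
      subset = k,
      data = unname(model.response(model.frame(formula, data[test, , drop = FALSE]))),
      pred = unname(p$fit),
      se = unname(se),
      row.names = which(test)
    )
  }
  res <- do.call(rbind, out)
  res[order(as.integer(rownames(res))), ]  # back to the original row order
}

# usage with simulated data and a fixed, balanced partition into K = 5 subsets
set.seed(42)
d <- data.frame(x = runif(50))
d$y <- 1 + 2 * d$x + rnorm(50, sd = 0.1)
cv <- kfold_cv_sketch(y ~ x, d, sets = rep_len(1:5, 50))
head(cv)
```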
Arguments
formula
an optional formula for the regression model passed by
update to georob, see Details.
subset
an optional vector specifying a subset of observations
to be used in the fitting process, see Details.
method
keyword controlling whether the subsets are formed by
partitioning the data set into blocks by kmeans
(default) or randomly. Ignored if sets is
non-NULL.
nset
positive integer defining the number K of subsets into
which the data set is partitioned (default: nset = 10). Ignored
if sets is non-NULL.
seed
optional integer seed to initialize random number generation,
see set.seed. Ignored if sets is non-NULL.
sets
an optional vector of the same length as the response vector
of the fitted model, with positive integers taking values in
(1, 2, …, K) that define the K subsets into which
the data set is split. If sets = NULL (default), the partition is
randomly generated by kmeans or
runif (possibly using seed).
duplicates.in.same.set
logical controlling whether replicated
observations at a given location are assigned to the same subset when
partitioning the data (default TRUE).
re.estimate
logical controlling whether the model is re-fitted to
the reduced data sets before computing the kriging predictions
(TRUE, default) or whether the model passed in object is
used to compute the predictions for the omitted observations, see
Details.
param
an optional named numeric vector or a matrix or data frame
with variogram parameters passed by update to
georob, see Details. If param is a matrix
(or a data frame) then it must have nset rows and
length(object[["param"]]) columns with initial values of variogram
parameters for the nset cross-validation sets and
colnames(param) must match names(object[["param"]]).
fit.param
an optional named logical vector or a matrix or data
frame defining which variogram parameters should be adjusted when passed
by update to georob, see
Details. If fit.param is a matrix (or a data frame) then
it must have nset rows and length(object[["param"]]) columns
with variogram parameter fitting flags for the nset
cross-validation sets and colnames(fit.param) must match
names(object[["param"]]).
aniso
an optional named numeric vector or a matrix or data frame
with anisotropy parameters passed by update to
georob, see Details. If aniso is a matrix
(or a data frame) then it must have nset rows and
length(object[["aniso"]][["aniso"]]) columns with initial values
of anisotropy parameters for the nset cross-validation sets and
colnames(aniso) must match
names(object[["aniso"]][["aniso"]]).
fit.aniso
an optional named logical vector or a matrix or data
frame defining which anisotropy parameters should be adjusted when passed
by update to georob, see
Details. If fit.aniso is a matrix (or a data frame) then
it must have nset rows and
length(object[["aniso"]][["aniso"]]) columns with anisotropy
parameter fitting flags for the nset cross-validation sets and
colnames(fit.aniso) must match
names(object[["aniso"]][["aniso"]]).
return.fit
logical controlling whether information about the fit
should be returned when re-estimating the model with the reduced data
sets (default FALSE).
reduced.output
logical controlling whether the complete fitted
model objects, fitted to the reduced data sets, are returned
(FALSE) or only some components (TRUE, default, see
Value). Ignored if return.fit = FALSE.
lgn
logical controlling whether kriging predictions of a
log-transformed response should be transformed back to the original scale
of the measurements (default FALSE).
mfl.action
character controlling what is done when some levels of
a factor are absent from one of the reduced data sets used to re-fit the model.
The function either stops ("stop") or treats the respective
factors as model offset ("offset", default).
ncores
positive integer controlling how many cores are used for
parallelized computations, see Details.
verbose
positive integer controlling logging of diagnostic
messages to the console during model fitting. Passed by
update to georob, see Details.
...
additional arguments passed by update
to georob, see Details.
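The partitioning arguments method, nset, seed and duplicates.in.same.set can be illustrated with a small base-R sketch. The helper make_cv_sets and its method values "block" and "random" are hypothetical and only mimic the documented behaviour: spatial blocks are formed by kmeans on the coordinates, random subsets are drawn with the help of runif, and replicated observations at one location can be forced into the same subset. How georob itself converts the runif draws into subsets may differ.

```r
# Hypothetical helper, NOT part of georob: mimics the documented roles of
# 'method', 'nset', 'seed' and 'duplicates.in.same.set'.
make_cv_sets <- function(coords, nset = 10L, method = c("block", "random"),
                         seed = NULL, duplicates.in.same.set = TRUE) {
  method <- match.arg(method)
  coords <- as.matrix(coords)
  if (!is.null(seed)) set.seed(seed)
  if (duplicates.in.same.set) {
    # one key per distinct location; replicated observations share a key
    key <- apply(coords, 1L, paste, collapse = "_")
    first <- !duplicated(key)
    loc <- match(key, key[first])          # row -> index of its distinct location
    ucoords <- coords[first, , drop = FALSE]
  } else {
    loc <- seq_len(nrow(coords))
    ucoords <- coords
  }
  uset <- switch(method,
    # spatial blocks: cluster the locations by their coordinates
    block = kmeans(ucoords, centers = nset)$cluster,
    # random: balanced labels put in an order given by uniform random numbers
    random = rep_len(seq_len(nset), nrow(ucoords))[order(runif(nrow(ucoords)))]
  )
  as.integer(uset[loc])                    # map location sets back to all rows
}

# usage: 40 locations plus 5 replicated observations, K = 4 subsets
set.seed(1)
xy <- cbind(x = runif(40), y = runif(40))
xy <- rbind(xy, xy[1:5, ])
s_block  <- make_cv_sets(xy, nset = 4, method = "block",  seed = 7)
s_random <- make_cv_sets(xy, nset = 4, method = "random", seed = 7)
table(s_random)
```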
Details
Note that the data frame passed as data argument to georob must exist in the user workspace
when calling cv.georob.
cv.georob uses the package parallel for parallelized
computations. By default, the function uses K CPUs, but not
more than are physically available (as returned by
detectCores).
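The default number of cores described above (K, capped by detectCores()) can be sketched with the parallel package. This is an illustration under stated assumptions: the helper default_ncores and the toy per-fold worker fit_fold are hypothetical, and mclapply() is used only as one plausible backend; it relies on forking, which Windows does not support, so the sketch falls back to a single core there.

```r
library(parallel)

# documented default: K cores, but no more than detectCores() reports;
# detectCores() may return NA, hence the fallback to 1
default_ncores <- function(nset) {
  avail <- detectCores()
  if (is.na(avail)) avail <- 1L
  as.integer(min(nset, avail))
}

nset <- 10L
y <- seq_len(20)
sets <- rep_len(seq_len(nset), 20)

# hypothetical stand-in for re-fitting the model to the k-th reduced data set
fit_fold <- function(k) mean(y[sets != k])

ncores <- default_ncores(nset)
if (.Platform$OS.type == "windows") ncores <- 1L  # no forking on Windows
res <- mclapply(seq_len(nset), fit_fold, mc.cores = ncores)
unlist(res)
```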
cv.georob uses the function update to
re-estimate the model with the reduced data sets. Therefore, any
argument accepted by georob can be changed when re-fitting
the model. Some of them (e.g. formula, subset, etc.) are
explicit arguments of cv.georob, and the remaining ones can
be passed via ... to the function.
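The update() mechanism, and the requirement noted above that the data frame passed to georob must still exist in the workspace, can be illustrated with base R. In this sketch lm() stands in for georob() (an assumption, not how cv.georob is implemented internally): update() re-evaluates the stored call with modified arguments, so it fails once the data frame named in that call has been removed.

```r
# Stand-in: lm() instead of georob(); update() re-evaluates the stored call
set.seed(3)
d <- data.frame(x1 = runif(30), x2 = runif(30))
d$y <- 1 + d$x1 - d$x2 + rnorm(30, sd = 0.05)
fit <- lm(y ~ x1, data = d)

# change formula and subset when re-fitting, analogous to cv.georob's use of update()
keep <- rep(c(TRUE, TRUE, FALSE), length.out = nrow(d))  # drop every third row
refit <- update(fit, . ~ . + x2, subset = keep)
coef(refit)

# once 'd' no longer exists, the stored call 'lm(..., data = d)' cannot be re-evaluated
rm(d)
res <- try(update(fit, subset = keep), silent = TRUE)
inherits(res, "try-error")
```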
Practitioners in geostatistics commonly cross-validate a fitted model
without re-estimating the model parameters with the reduced data sets.
This is clearly an unsound practice (see Hastie et al., 2009, sec.
7.10). Therefore, the argument re.estimate should always be set
to TRUE. The alternative is provided only for historic reasons.
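Why re-estimation matters can be seen in the simplest possible setting. The sketch below uses an ordinary linear model (an assumption standing in for a spatial model) and leave-one-out cross-validation, i.e. K equal to the number of observations. Without re-estimation, each observation is "predicted" by a model that was fitted to it, so the errors are the ordinary residuals. With re-estimation, they equal the residuals divided by 1 minus the leverage, which are never smaller in absolute value; the naive errors are therefore systematically too optimistic.

```r
# Stand-in: lm() instead of georob(); leave-one-out = K-fold CV with K = n
set.seed(11)
n <- 25
d <- data.frame(x = runif(n))
d$y <- 2 + 3 * d$x + rnorm(n, sd = 0.3)
full <- lm(y ~ x, data = d)

# analogue of re.estimate = FALSE: predict with the model fitted to ALL data
err_naive <- unname(d$y - fitted(full))

# analogue of re.estimate = TRUE: re-fit without observation i, then predict it
err_cv <- vapply(seq_len(n), function(i) {
  fit_i <- lm(y ~ x, data = d[-i, ])
  unname(d$y[i] - predict(fit_i, newdata = d[i, , drop = FALSE]))
}, numeric(1))

c(rmse.naive = sqrt(mean(err_naive^2)), rmse.cv = sqrt(mean(err_cv^2)))
```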
Value
An object of class cv.georob, which is a list with the two
components pred and fit.
pred is a data frame with the coordinates and the
cross-validation prediction results with the following variables:
subset
an integer vector defining to which of the K subsets
an observation was assigned.
data
the values of the (possibly log-transformed) response.
pred
the kriging predictions.
se
the kriging standard errors.
If lgn = TRUE then pred has the additional variables:
lgn.data
the untransformed response.
lgn.pred
the unbiasedly back-transformed predictions of a
log-transformed response.
lgn.se
the kriging standard errors of the back-transformed
predictions of a log-transformed response.
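The idea behind lgn.pred and lgn.se can be sketched under a simplifying assumption: simple kriging with a known mean, so that the log-scale response given the data is Gaussian with mean pred and variance se^2. Then the conditional mean on the original scale is exp(pred + se^2/2) and its conditional standard deviation is sqrt(exp(2 pred + se^2) (exp(se^2) - 1)). The helper lgn_backtransform is hypothetical; georob's back-transformation for universal kriging may include further terms for the estimated regression coefficients.

```r
# Hypothetical helper; ASSUMES simple kriging (known mean) on the log scale
lgn_backtransform <- function(pred, se) {
  data.frame(
    lgn.pred = exp(pred + se^2 / 2),                        # conditional mean of Z
    lgn.se   = sqrt(exp(2 * pred + se^2) * (exp(se^2) - 1))  # conditional SD of Z
  )
}

bt <- lgn_backtransform(pred = c(0, 1, 2), se = c(0, 0.5, 0.5))
bt
# the naive back-transform exp(pred) is biased low whenever se > 0
exp(c(0, 1, 2))
```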
The second component fit contains either the full outputs of
georob, fitted for the K reduced data sets
(reduced.output = FALSE), or K lists with the components
tuning.psi, converged, convergence.code,
gradient, variogram.model, param,
aniso[["aniso"]], coefficients along with the standard errors of
$\widehat{\beta}$, see
georobObject.
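Diagnostic summaries of the cross-validation errors are computed by validate.predictions. As a rough illustration of what such summaries build on, the hypothetical helper cv_summary below derives a few common statistics from a data frame with the data, pred and se columns described above. It does not reproduce the output of validate.predictions, and the toy numbers are made up purely for illustration.

```r
# Hypothetical summary of a cv.georob-like 'pred' data frame
# (NOT the output of validate.predictions)
cv_summary <- function(pred) {
  err <- pred$data - pred$pred   # cross-validation errors
  z <- err / pred$se             # standardized errors
  c(me     = mean(err),          # bias
    rmse   = sqrt(mean(err^2)),
    mean.z = mean(z),
    sd.z   = sd(z))              # near 1 if the standard errors are well calibrated
}

# toy data frame with made-up values
toy <- data.frame(subset = c(1L, 1L, 2L, 2L),
                  data   = c(1.0, 2.0, 3.0, 4.0),
                  pred   = c(1.5, 1.5, 3.5, 3.5),
                  se     = c(0.5, 0.5, 0.5, 0.5))
s <- cv_summary(toy)
s
```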
References
Hastie, T., Tibshirani, R. and Friedman, J. (2009) The Elements of
Statistical Learning: Data Mining, Inference, and Prediction. New York:
Springer-Verlag.
See Also
validate.predictions for computing statistics of the cross-validation errors;
georob for (robust) fitting of spatial linear models;
georobObject for a description of the class georob;
predict.georob for computing robust kriging predictions.