R: A kernel average smoother function to weigh residuals...
gaussianWeights
R Documentation
A kernel average smoother function to weigh residuals according to a Gaussian
density function
This function is still experimental... use with care
Description
A calibration dataset in database format (cf. modCost for the database
format) is extended in order to fit model output using a weighted least
squares approach. To this end, the observations are replicated for a
certain number of times, and weights are assigned to the replicates
according to a Gaussian density function. This density has the relevant
observation as mean value. The standard deviation, provided as a parameter,
determines the number of inserted replicate observations (see Detail).
This weighted regression approach may be interesting when discontinuities
exist in the observational data. Under these circumstances small changes
in the timing (or more general the position along the axis of the independent
variable) of the model output may have a disproportional impact on the
overall goodness-of-fit (e.g. timing of nutrient depletion). Additionally,
this approach may be used to model uncertainty in the independent variable
(e.g. slices of sediment profiles, or the timing of a sampling).
Usage
gaussianWeights (obs, x = x, y = y, xmodel, spread, weight = "none",
aggregation = x ,ordering)
Arguments
obs
dataset in long (database) format as is typically used by
modCost
x
name of the independent variable (typically x, cf. modCost) in
obs. Defaults to x (not given as character string; cf. subset)
y
name of the dependent variable in obs. Defaults to y.
xmodel
an ordered vector of unique times at which model output
is produced. If not given, the independent variable of the observational
dataset is used.
spread
standard deviation used to calculate the weights from a
normal density function. This value also determines the number of
points from the model output that are compared to a specific observa-
tion in obs (2 * 3 * spread + 1; containing 99.7% of the
Gaussian distribution, centered around the observation of interest).
weight
scaling factor of the modCost function ("sd", "mean", or
"none"). The Gaussian weights are multiplied by this factor to account
for differences in units.
aggregation
vector of column names from the dataset that are used
to aggregate observations while calculating the scaling factor. Defaults
to the variable name, "name".
ordering
Optional extra grouping and ordering of observations. Given
as a vector of variable names. If none given, ordering will be done by
variable name and independent variable. If both aggregation and ordering
variables are given, ordering will be done as follows:
x within ordering (in reverse order) within aggregation (in reverse order).
Aggregation and ordering should be disjoint sets of variable names.
Details
Suppose: spread = 1/24 (days; = 1 hour)
x = time in days, 1 per hour
Then:
obs_i is replicated 7 times (spread = observational periodicity = 1 hour):
The weights (W_i+j, for j = -3 ...3) are calculated as follows:
W'_i+j = 1/(spread * sqrt(2pi)) * exp(-1/2 * ((obs_i+j - obs_i)/spread)^2
W_i+j = W'_i+j/sum(W_i-3,...,W_i+3)
(such that their sum equals 1)
Value
A modified version of obs is returned with the following extensions:
1. Each observation obs[i] is replicated n times were n represents the number
of modelx values within the interval [obs_i - (3 * spread), obs_i +
3 * spread)].
2. These replicate observations get the same x values as their
modeled counterparts (xmodel).
3. Weights are given in column, called "err"
The returned data frame has the following columns:
"name" or another name specified by the first element of
aggregation. Usually this column contains the names of the
observed variables.
"x" or another name specified by x
"y" or another name specified by y
"err" containing the calculated weights
The rest of the columns of the data frame given by obs in
that order.
Author(s)
Tom Van Engeland <tom.vanengeland@nioz.nl>
Examples
## =======================================================================
## A Sediment example
## =======================================================================
## Sediment oxygen concentration is measured every
## centimeter in 3 sediment types
depth <- 0:7
observations <- data.frame(
profile = rep(c("mud","silt","sand"), each=8),
depth = depth,
O2 = c(c(6,1,0.5,0.1,0.05,0,0,0),
c(6,5,3,2,1.5,1,0.5,0),
c(6,6,5,4,3,2,1,0)
)
)
## A model generates profiles with a depth resolution of 1 millimeter
modeldepths <- seq(0, 9, by = 0.05)
## All these model outputs are compared with weighed observations.
gaussianWeights(obs = observations, x = depth, y = O2,
xmodel = modeldepths,
spread = 0.1, weight = "none",
aggregation = profile)
# Weights of one observation in silt at depth 2:
Sub <- subset(observations, subset = (profile == "silt" & depth == 2))
plot(Sub[,-1])
SubWW <- gaussianWeights(obs = Sub, x = depth, y = O2,
xmodel = modeldepths, spread = 0.5,
weight="none", aggregation = profile)
SubWW[,-1]