model output, as generated by the integration routine or
the steady-state solver, a matrix or a data.frame, with one column
per dependent and independent variable.
obs
the observed data, either in long (database) format (name,
x, y), a data.frame, or in wide (crosstable, or matrix) format - see
details.
x
the name of the independent variable; it should be a
name occurring both in the obs and model data
structures.
y
either NULL, the name of the column with the
dependent variable values,or an index to the dependent
variable values; if NULL then the observations are assumed to
be in crosstable (matrix) format, and the names of the independent
variables are given by the column names of this matrix.
err
either NULL, or the name of the column with the
error estimates, used to weigh the residuals (see details);
if NULL, then the residuals are not weighed.
cost
if not NULL, the output of a previous call to
modCost; in this case, the new output will combine both.
weight
only if err=NULL: how to weigh the
residuals, one of "none", "std", "mean", see details.
scaleVar
if TRUE, then the residuals of one observed
variable are scaled respectively to the number of observations (see
details).
...
additional arguments passed to R-function approx.
Details
This function compares model output with observed data.
It computes
the weighted residuals, one for each data point.
the variable costs, i.e. the sum of squared weight
residuals per variable.
the model cost, the scaled sum of variable costs .
There are three steps:
1. For any observed data point, i, the weighted
residuals are estimated as:
and where Mod_i and Obs_i are
the modeled, respectively observed value of data point i.
The weights are equal to 1/error, where the latter can be inputted,
one for each data point by specifying err as an extra column in
the observed data.
This can only be done when the data input is in long (database) format.
When err is not inputted, then the weights are specified via argument
weight which is either:
"none", which sets the weight equal to 1 (the default)
"std", which sets the weights equal to the standard deviation
of the observed data (can only be used if there is more than 1 data point)
"mean", which uses the mean of the absolute value of the
observed data (can only be used if not 0).
2. Then for each observed variable, j, a variable cost is
estimated as the sum of squared weighted residuals for this variable:
Cost_varj=sum(for i=1,n_j) (res_i^2)
where n_j is the number of observations for observed
variable j.
3. Finally, the model Cost is estimated as the scaled sum of
variable costs:
sum(Cost_varj/scale_j)
and where scale_j allows to scale the variable
costs relative to the number of observations. This is set by
specifying argument scaleVar. If TRUE, then the variable
costs are rescaled. The default is NOT to rescale
(i.e. scale_j=1).
The models typically consist of (a system of) differential equations, which
are either solved by:
integration routines, e.g. the routines from package deSolve,
steady-state estimators, as from package rootSolve.
The data can be presented in two formats:
data table (long) format; this is a two to four column
data.frame that contains the name of the observed variable (always
the FIRST column), the (optional) value of the independent variable
(default column name = "time"), the value of the observation and
the (optional) value of the error.
For data presented in this format, the names of the column(s) with the
independent variable (x) and the name of the column that has the
value of the dependent variable y must be passed to function
modCost.
crosstable (wide) format; this is a matrix, where each
column denotes one dependent (or independent) variable; the column name
is the name of the observed variable.
When using this format, only the name of the column that contains the
dependent variable must be specified (x).
As an example of both formats consider the data, called Dat consisting
of two observed variables, called "Obs1" and "Obs2", both containing two
observations, at time 1 and 2:
name
time
val
err
Obs1
1
50
5
Obs1
2
150
15
Obs2
1
1
0.1
Obs2
2
2
0.2
for the long format and
time
Obs1
Obs2
1
50
1
2
150
2
for the crosstab format. Note, that in the latter case it is not possible to
provide separate errors per data point.
By calling modCost several consecutive times (using the cost argument),
it is possible to combine both types of data files.
Value
a list of type modCost containing:
model
one value, the model cost, which equals the sum of scaled
variable costs (see details).
minlogp
one value, -log(model probablity), where it is assumed
that the data are normally distributed, with standard deviation =
error.
var
the variable costs, a data.frame with, for each observed
variable the following (see details):
name, the name of the observed variable.
scale, the scale-factor used to weigh the variable cost,
either 1 or 1/(number observations), defaults to 1.
N, the number of data points per observed variable.
SSR.unweighted, the sum of squared residuals per observed
variable, unweighted.
SSR, the sum of weighted squared residuals per observed
variable(see details).
residuals
the data residual, a data.frame with several columns:
name, the name of the observed variable.
x, the value of the independent variable (if present).
obs, the observed variable value.
mod, the corresponding modeled value.
weight, the factor used to weigh the residuals, 1/error,
defaults to 1.
res, the weighted residuals between model and observations
(mod-obs)*weight.
res.unweighted, the residuals between model and observations
(mod-obs).
Note
In the future, it should be possible to have more than one independent
variable present. This is not yet implemented, but it should allow e.g.
to fit time series of spatially dependent variables.
Author(s)
Karline Soetaert <karline.soetaert@nioz.nl>
References
Soetaert, K. and Petzoldt, T., 2010. Inverse Modelling, Sensitivity and
Monte Carlo Analysis in R Using Package FME. Journal of Statistical
Software 33(3) 1–28.
http://www.jstatsoft.org/v33/i03
Examples
## =======================================================================
## Type 1 input: name, time, value
## =======================================================================
## Create new data: two observed variables, "a", "b"
Data <- data.frame(name = c(rep("a", 4), rep("b", 4)),
time = c(1:4, 2:5), val = c(runif(4), 1:4))
## "a nonsense model"
Mod <- function (t, y, par) {
da <- 0
db <- 1
return(list(c(da, db)))
}
out <- ode(y = c(a = 0.5, b = 0.5), times = 0:6, func = Mod, parms = NULL)
Data # Show
out
## The cost function
modCost(model = out, obs = Data, y = "val")
## The cost function with a data error added
Dat2 <- cbind(Data, Err = Data$val*0.1) # error = 10% of value
modCost(model = out, obs = Dat2, y = "val", err = "Err")
## =======================================================================
## Type 2 input: Matrix format; column names = variable names
## =======================================================================
## logistic growth model
TT <- seq(1, 100, 2.5)
N0 <- 0.1
r <- 0.5
K <- 100
## analytical solution
Ana <- cbind(time = TT, N = K/(1 + (K/N0 - 1) * exp(-r*TT)))
## numeric solution
logist <- function(t, x, parms) {
with(as.list(parms), {
dx <- r * x[1] * (1 - x[1]/K)
list(dx)
})
}
time <- 0:100
parms <- c(r = r, K = K)
x <- c(N = N0)
## Compare several numerical solutions
Euler <- ode(x, time, logist, parms, hini = 2, method = "euler")
Rk4 <- ode(x, time, logist, parms, hini = 2, method = "rk4")
Lsoda <- ode(x, time, logist, parms) # lsoda is default method
Ana2 <- cbind(time = time, N = K/(1 + (K/N0 - 1) * exp(-r * time)))
## the SSR and residuals with respect to the "data"
cEuler <- modCost(Euler, Ana)$model
cRk4 <- modCost(Rk4 , Ana)$model
cLsoda <- modCost(Lsoda, Ana)$model
cAna <- modCost(Ana2 , Ana)$model
compare <- data.frame(method = c("euler", "rk4", "lsoda", "Ana"),
cost = c(cEuler, cRk4, cLsoda, cAna))
## Plot Euler, RK and analytic solution
plot(Euler, Rk4, col = c("red", "blue"), obs = Ana,
main = "logistic growth", xlab = "time", ylab = "N")
legend("bottomright", c("exact", "euler", "rk4"), pch = c(1, NA, NA),
col = c("black", "red", "blue"), lty = c(NA, 1, 2))
legend("right", ncol = 2, title = "SSR",
legend = c(as.character(compare[,1]),
format(compare[,2], digits = 2)))
compare
## =======================================================================
## Now suppose we do not know K and r and they are to be fitted...
## The "observations" are the analytical solution
## =======================================================================
## Run the model with initial guess: K = 10, r = 2
parms["K"] <- 10
parms["r"] <- 2
init <- ode(x, time, logist, parms)
## FITTING algorithm uses modFit
## First define the objective function (model cost) to be minimised
## more general: using modFit
Cost <- function(P) {
parms["K"] <- P[1]
parms["r"] <- P[2]
out <- ode(x, time, logist, parms)
return(modCost(out, Ana))
}
(Fit<-modFit(p = c(K = 10, r = 2), f = Cost))
summary(Fit)
## run model with the optimized value:
parms[c("K", "r")] <- Fit$par
fitted <- ode(x, time, logist, parms)
## show results, compared with "observations"
plot(init, fitted, col = c("green", "blue"), lwd = 2, lty = 1,
obs = Ana, obspar = list(col = "red", pch = 16, cex = 2),
main = "logistic growth", xlab = "time", ylab = "N")
legend("right", c("initial", "fitted"), col = c("green", "blue"), lwd = 2)
Cost(Fit$par)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(FME)
Loading required package: deSolve
Attaching package: 'deSolve'
The following object is masked from 'package:graphics':
matplot
Loading required package: rootSolve
Loading required package: coda
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/FME/modCost.Rd_%03d_medium.png", width=480, height=480)
> ### Name: modCost
> ### Title: Calculates the Discrepancy of a Model Solution with Observations
> ### Aliases: modCost
> ### Keywords: utilities
>
> ### ** Examples
>
>
> ## =======================================================================
> ## Type 1 input: name, time, value
> ## =======================================================================
>
> ## Create new data: two observed variables, "a", "b"
> Data <- data.frame(name = c(rep("a", 4), rep("b", 4)),
+ time = c(1:4, 2:5), val = c(runif(4), 1:4))
>
> ## "a nonsense model"
> Mod <- function (t, y, par) {
+ da <- 0
+ db <- 1
+ return(list(c(da, db)))
+ }
>
> out <- ode(y = c(a = 0.5, b = 0.5), times = 0:6, func = Mod, parms = NULL)
>
> Data # Show
name time val
1 a 1 0.5232698
2 a 2 0.8324969
3 a 3 0.4688344
4 a 4 0.9550125
5 b 2 1.0000000
6 b 3 2.0000000
7 b 4 3.0000000
8 b 5 4.0000000
> out
time a b
1 0 0.5 0.5
2 1 0.5 1.5
3 2 0.5 2.5
4 3 0.5 3.5
5 4 0.5 4.5
6 5 0.5 5.5
7 6 0.5 6.5
>
> ## The cost function
> modCost(model = out, obs = Data, y = "val")
$model
[1] 9.319103
$minlogp
[1] 12.01106
$var
name scale N SSR.unweighted SSR.unscaled SSR
1 a 1 4 0.3191033 0.3191033 0.3191033
2 b 1 4 9.0000000 9.0000000 9.0000000
$residuals
name x obs mod weight res.unweighted res
1 a 1 0.5232698 0.5 1 -0.02326979 -0.02326979
2 a 2 0.8324969 0.5 1 -0.33249687 -0.33249687
3 a 3 0.4688344 0.5 1 0.03116563 0.03116563
4 a 4 0.9550125 0.5 1 -0.45501250 -0.45501250
5 b 2 1.0000000 2.5 1 1.50000000 1.50000000
6 b 3 2.0000000 3.5 1 1.50000000 1.50000000
7 b 4 3.0000000 4.5 1 1.50000000 1.50000000
8 b 5 4.0000000 5.5 1 1.50000000 1.50000000
attr(,"class")
[1] "modCost"
>
> ## The cost function with a data error added
> Dat2 <- cbind(Data, Err = Data$val*0.1) # error = 10% of value
> modCost(model = out, obs = Dat2, y = "val", err = "Err")
$model
[1] 359.6041
$minlogp
[1] 170.2764
$var
name scale N SSR.unweighted SSR.unscaled SSR
1 a 1 4 0.3191033 39.29159 39.29159
2 b 1 4 9.0000000 320.31250 320.31250
$residuals
name x obs mod weight res.unweighted res
1 a 1 0.5232698 0.5 19.110601 -0.02326979 -0.4446996
2 a 2 0.8324969 0.5 12.012057 -0.33249687 -3.9939714
3 a 3 0.4688344 0.5 21.329494 0.03116563 0.6647472
4 a 4 0.9550125 0.5 10.471067 -0.45501250 -4.7644664
5 b 2 1.0000000 2.5 10.000000 1.50000000 15.0000000
6 b 3 2.0000000 3.5 5.000000 1.50000000 7.5000000
7 b 4 3.0000000 4.5 3.333333 1.50000000 5.0000000
8 b 5 4.0000000 5.5 2.500000 1.50000000 3.7500000
attr(,"class")
[1] "modCost"
>
>
> ## =======================================================================
> ## Type 2 input: Matrix format; column names = variable names
> ## =======================================================================
>
> ## logistic growth model
> TT <- seq(1, 100, 2.5)
> N0 <- 0.1
> r <- 0.5
> K <- 100
>
> ## analytical solution
> Ana <- cbind(time = TT, N = K/(1 + (K/N0 - 1) * exp(-r*TT)))
>
> ## numeric solution
> logist <- function(t, x, parms) {
+ with(as.list(parms), {
+ dx <- r * x[1] * (1 - x[1]/K)
+ list(dx)
+ })
+ }
>
> time <- 0:100
> parms <- c(r = r, K = K)
> x <- c(N = N0)
>
> ## Compare several numerical solutions
> Euler <- ode(x, time, logist, parms, hini = 2, method = "euler")
> Rk4 <- ode(x, time, logist, parms, hini = 2, method = "rk4")
> Lsoda <- ode(x, time, logist, parms) # lsoda is default method
> Ana2 <- cbind(time = time, N = K/(1 + (K/N0 - 1) * exp(-r * time)))
>
> ## the SSR and residuals with respect to the "data"
> cEuler <- modCost(Euler, Ana)$model
> cRk4 <- modCost(Rk4 , Ana)$model
> cLsoda <- modCost(Lsoda, Ana)$model
> cAna <- modCost(Ana2 , Ana)$model
> compare <- data.frame(method = c("euler", "rk4", "lsoda", "Ana"),
+ cost = c(cEuler, cRk4, cLsoda, cAna))
> ## Plot Euler, RK and analytic solution
> plot(Euler, Rk4, col = c("red", "blue"), obs = Ana,
+ main = "logistic growth", xlab = "time", ylab = "N")
>
> legend("bottomright", c("exact", "euler", "rk4"), pch = c(1, NA, NA),
+ col = c("black", "red", "blue"), lty = c(NA, 1, 2))
> legend("right", ncol = 2, title = "SSR",
+ legend = c(as.character(compare[,1]),
+ format(compare[,2], digits = 2)))
>
> compare
method cost
1 euler 1.997133e+03
2 rk4 7.816940e-02
3 lsoda 7.473874e-02
4 Ana 7.471880e-02
>
> ## =======================================================================
> ## Now suppose we do not know K and r and they are to be fitted...
> ## The "observations" are the analytical solution
> ## =======================================================================
>
> ## Run the model with initial guess: K = 10, r = 2
> parms["K"] <- 10
> parms["r"] <- 2
>
> init <- ode(x, time, logist, parms)
>
> ## FITTING algorithm uses modFit
> ## First define the objective function (model cost) to be minimised
>
> ## more general: using modFit
> Cost <- function(P) {
+ parms["K"] <- P[1]
+ parms["r"] <- P[2]
+ out <- ode(x, time, logist, parms)
+ return(modCost(out, Ana))
+ }
> (Fit<-modFit(p = c(K = 10, r = 2), f = Cost))
$par
K r
100.0055659 0.4999987
$hessian
K r
K 65.80535 899.8112
r 899.81115 517530.3761
$residuals
N N N N N
1.625573e-06 1.768266e-02 1.458523e-05 1.666239e-01 1.483597e-04
N N N N N
6.123689e-02 2.880676e-03 -2.015252e-01 5.218142e-03 -1.852097e-02
N N N N N
5.527497e-03 3.530823e-03 5.563434e-03 5.401766e-03 5.559019e-03
N N N N N
5.555077e-03 5.567372e-03 5.564414e-03 5.565610e-03 5.565744e-03
N N N N N
5.565924e-03 5.565822e-03 5.565838e-03 5.565855e-03 5.565865e-03
N N N N N
5.565870e-03 5.565872e-03 5.565873e-03 5.565873e-03 5.565873e-03
N N N N N
5.565873e-03 5.565873e-03 5.565873e-03 5.565873e-03 5.565873e-03
N N N N N
5.565873e-03 5.565873e-03 5.565873e-03 5.565873e-03 5.565873e-03
$info
[1] 3
$message
[1] "Conditions for `info = 1' and `info = 2' both hold."
$iterations
[1] 8
$rsstrace
[1] 2.708812e+05 1.918493e+05 7.539514e+04 2.234998e+03 5.433913e+01
[6] 8.679236e-02 7.372560e-02 7.372557e-02 7.372557e-02
$ssr
[1] 0.07372557
$diag
diag.K diag.r
6.214604 663.627605
$ms
[1] 0.001843139
$var_ms
N
0.001843139
$var_ms_unscaled
N
0.001843139
$var_ms_unweighted
N
0.001843139
$rank
[1] 2
$df.residual
[1] 38
attr(,"class")
[1] "modFit"
>
> summary(Fit)
Parameters:
Estimate Std. Error t value Pr(>|t|)
K 1.000e+02 7.772e-03 12868 <2e-16 ***
r 5.000e-01 8.764e-05 5705 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.04405 on 38 degrees of freedom
Parameter correlation:
K r
K 1.0000 -0.1542
r -0.1542 1.0000
>
> ## run model with the optimized value:
> parms[c("K", "r")] <- Fit$par
> fitted <- ode(x, time, logist, parms)
>
> ## show results, compared with "observations"
> plot(init, fitted, col = c("green", "blue"), lwd = 2, lty = 1,
+ obs = Ana, obspar = list(col = "red", pch = 16, cex = 2),
+ main = "logistic growth", xlab = "time", ylab = "N")
>
> legend("right", c("initial", "fitted"), col = c("green", "blue"), lwd = 2)
>
> Cost(Fit$par)
$model
[1] 0.07372557
$minlogp
[1] 36.7944
$var
name scale N SSR.unweighted SSR.unscaled SSR
1 N 1 40 0.07372557 0.07372557 0.07372557
$residuals
name x obs mod weight res.unweighted res
1 N 1.0 0.1647652 0.1647669 1 1.625573e-06 1.625573e-06
2 N 3.5 0.5727371 0.5904198 1 1.768266e-02 1.768266e-02
3 N 6.0 1.9709373 1.9709519 1 1.458523e-05 1.458523e-05
4 N 8.5 6.5573901 6.7240140 1 1.666239e-01 1.666239e-01
5 N 11.0 19.6746418 19.6747901 1 1.483597e-04 1.483597e-04
6 N 13.5 46.0891354 46.1503723 1 6.123689e-02 6.123689e-02
7 N 16.0 74.8992325 74.9021132 1 2.880676e-03 2.880676e-03
8 N 18.5 91.2395822 91.0380570 1 -2.015252e-01 -2.015252e-01
9 N 21.0 97.3227568 97.3279749 1 5.218142e-03 5.218142e-03
10 N 23.5 99.2180196 99.1994986 1 -1.852097e-02 -1.852097e-02
11 N 26.0 99.7747018 99.7802293 1 5.527497e-03 5.527497e-03
12 N 28.5 99.9353471 99.9388779 1 3.530823e-03 3.530823e-03
13 N 31.0 99.9814681 99.9870315 1 5.563434e-03 5.563434e-03
14 N 33.5 99.9946898 100.0000916 1 5.401766e-03 5.401766e-03
15 N 36.0 99.9984785 100.0040376 1 5.559019e-03 5.559019e-03
16 N 38.5 99.9995641 100.0051192 1 5.555077e-03 5.555077e-03
17 N 41.0 99.9998751 100.0054425 1 5.567372e-03 5.567372e-03
18 N 43.5 99.9999642 100.0055286 1 5.564414e-03 5.564414e-03
19 N 46.0 99.9999897 100.0055554 1 5.565610e-03 5.565610e-03
20 N 48.5 99.9999971 100.0055628 1 5.565744e-03 5.565744e-03
21 N 51.0 99.9999992 100.0055651 1 5.565924e-03 5.565924e-03
22 N 53.5 99.9999998 100.0055656 1 5.565822e-03 5.565822e-03
23 N 56.0 99.9999999 100.0055658 1 5.565838e-03 5.565838e-03
24 N 58.5 100.0000000 100.0055658 1 5.565855e-03 5.565855e-03
25 N 61.0 100.0000000 100.0055659 1 5.565865e-03 5.565865e-03
26 N 63.5 100.0000000 100.0055659 1 5.565870e-03 5.565870e-03
27 N 66.0 100.0000000 100.0055659 1 5.565872e-03 5.565872e-03
28 N 68.5 100.0000000 100.0055659 1 5.565873e-03 5.565873e-03
29 N 71.0 100.0000000 100.0055659 1 5.565873e-03 5.565873e-03
30 N 73.5 100.0000000 100.0055659 1 5.565873e-03 5.565873e-03
31 N 76.0 100.0000000 100.0055659 1 5.565873e-03 5.565873e-03
32 N 78.5 100.0000000 100.0055659 1 5.565873e-03 5.565873e-03
33 N 81.0 100.0000000 100.0055659 1 5.565873e-03 5.565873e-03
34 N 83.5 100.0000000 100.0055659 1 5.565873e-03 5.565873e-03
35 N 86.0 100.0000000 100.0055659 1 5.565873e-03 5.565873e-03
36 N 88.5 100.0000000 100.0055659 1 5.565873e-03 5.565873e-03
37 N 91.0 100.0000000 100.0055659 1 5.565873e-03 5.565873e-03
38 N 93.5 100.0000000 100.0055659 1 5.565873e-03 5.565873e-03
39 N 96.0 100.0000000 100.0055659 1 5.565873e-03 5.565873e-03
40 N 98.5 100.0000000 100.0055659 1 5.565873e-03 5.565873e-03
attr(,"class")
[1] "modCost"
>
>
>
>
>
>
> dev.off()
null device
1
>