list of models; each model is written as a function: function(x, params) { with(as.list(params), <function of params>)}. Examples are given in the ModelSet data file as part of the DivE package.
init.params
list of matrices of initial seed model parameters. For each matrix, each row represents a given parameter set; each column represents a parameter value. Column names must match parameter names (params) in the corresponding model in the list models. Examples are given in the ParamSeeds data file as part of the DivE package.
param.ranges
list of matrices of lower and upper model parameters bounds. Used for the modFit function. The first and second row corresponds to the lower and upper bounds respectively; each column represents a parameter value. Column names must match parameter names (params) in the corresponding model in the list models. Examples are given in the param.ranges data file as part of the DivE package.
main.samp
the main sample, either as a 2-column data.frame (species ID, count of species), or a vector of species IDs.
tot.pop
total population (integer); default set to 100x the main.samp size.
numit
control argument passed to optimisation routine; the maximum number of iterations that modFit will perform. See modFit for details.
varleft
control argument passed to optimisation routine; see modFit for details.
subsizes
either number of nested subsamples (integer, must be 2 or greater), or a vector of nested sample lengths. If the former, then the vector of sample lengths will be created using the divsamplenum function.
dssamps
list of user specified rarefaction data divsubsamples objects. The length of each component vector of each object in the list must correspond to the vector of nested sample lengths (as defined by the user in subsizes).
nrf
difference between lengths of successive rarefaction datapoints.
minrarefac
minimum rarefaction x-axis value. This argument is not used if list of divsubsamples object is specified in dssamps.
NResamples
number of resamples used to calculate the rarefaction data. This parameter is not used if list of divsubsamples object is specified in dssamps.
minplaus
lower x-axis bound for plausibility check.
precision.lv
vector of precision level values for each criterion: 1. discrepancy – mean percentage error between rarefaction data points and model predicion, 2. Sample accuracy – percentage error between observed diversity of full rarefaction data and estimated diversity of full data from subsample, 3. local similarity. The scores for each criteria are defined as 1 + (multiples of bin sizes)
plaus.pen
penalty score for breaking the plausibility criterion: a model fit should be monotonically increasing and should have a slowing rate of species accumulation.
crit.wts
vector of weights of each of the four scoring criteria – fit, accuracy, similarity, plausibility. Default is c(1,1,1,1).
fitloops
number of fitting rounds performed for each model. In each round of fitting, the initial seed parameter values for each model will be the fitted parameters of the previous fitting run. This parameter has a significant impact on the computational time. The ‘sweet spot’ is 2.
numpred
number of topscoring models used for diversity prediction. Default is 5.
Details
This is the master function of the DivE estimator. The default operation is a combination of four steps. 1. Generate a list of nested samples lengths from the main sample. 2. For each nested subsample, generate a vector of rarefaction data and their associated mean species diversity. 3. Fit to the generated data a set of models. 4. Evaluate the fits according to the DivE diversity estimation methodology and compare the scores across models and fitting criteria.
A list of DiveMaster objects, each representing the fits to different sets of models, can combined into a single DiveMaster object using the comb.dm function. This is useful when running the DivE estimator with the full set of 58 models in a single run is not possible.
One can estimate the diversity for a given population using the popdiversity function where the arguments are the Divemaster object and the population size respectively.
Value
A list of objects:
model.score
a matrix of aggregated model scores
fmm
a list of fitsingMod objects corresponding to the list of fitted models
ssm
a matrix of scores of the fit of the models corresponding to the list of fitted models
estimate
the estimate of species richness (number of species/classes or diversity) at population size tot.pop. This is the geometric average of the models corresponding to the top-five model scores
lower_estimate
as per estimate value, but the lowest prediction amongst the models having the top-five scores
upper_estimate
as per estimate value, but the highest prediction amongst the models having the top-five scores
models
list of original input models
m
number of topscoring models used for diversity estimate
Author(s)
Daniel Laydon, Aaron Sim, Charles Bangham, Becca Asquith
References
Laydon, D., Melamed, A., Sim, A., Gillet, N. A., Sim, K., Darko, S., Kroll, S., Douek, D. C., Price, D., Bangham, C. R. M., Asquith, B., Quantification of HTLV-1 clonality and TCR diversity, PLOS Comput. Biol. 2014
See Also
fitsinglemod, scoresinglemod
Examples
require(DivE)
data(Bact1)
data(ModelSet)
data(ParamSeeds)
data(ParamRanges)
testmodels <- list()
testmeta <- list()
paramranges <- list()
# Choose a single model
testmodels <- c(testmodels, ModelSet[1])
#testmeta[[1]] <- (ParamSeeds[[1]]) # Commented out for sake of brevity)
testmeta[[1]] <- matrix(c(0.9451638, 0.007428265, 0.9938149, 1.0147441, 0.009543598, 0.9870419),
nrow=2, byrow=TRUE, dimnames=list(c(), c("a1", "a2", "a3"))) # Example seeds
paramranges[[1]] <- ParamRanges[[1]]
# Create divsubsamples object (NB: For quick illustration only -- not default parameters)
dss_1 <- divsubsamples(Bact1, nrf=2, minrarefac=1, maxrarefac=40, NResamples=5)
dss_2 <- divsubsamples(Bact1, nrf=2, minrarefac=1, maxrarefac=65, NResamples=5)
dss <- list(dss_2, dss_1)
# Implement the function (NB: For quick illustration only -- not default parameters)
out <- DiveMaster(models=testmodels, init.params=testmeta, param.ranges=paramranges,
main.samp=Bact1, subsizes=c(65, 40), NResamples=5, fitloops=1,
dssamp=dss, numit=2, varleft=10)
# DiveMaster Outputs
out
out$estimate
out$fmm$logistic
out$fmm$logistic$global
out$ssm
summary(out)
## Combining two DiveMaster objects (assuming a second object 'out2'):
# out3 <- comb.dm(list(out, out2))
## To calculate the diversity for a different population size
# popdiversity(dm=out, popsize=10^5, TopX=1)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(DivE)
Loading required package: deSolve
Attaching package: 'deSolve'
The following object is masked from 'package:graphics':
matplot
Loading required package: FME
Loading required package: rootSolve
Loading required package: coda
Loading required package: rgeos
rgeos version: 0.3-19, (SVN revision 524)
GEOS runtime version: 3.5.0-CAPI-1.9.0 r4084
Linking to sp version: 1.2-3
Polygon checking: TRUE
Loading required package: sp
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/DivE/DiveMaster.Rd_%03d_medium.png", width=480, height=480)
> ### Name: DiveMaster
> ### Title: DiveMaster
> ### Aliases: DiveMaster DiveMaster.default print.DiversityMaster
> ### summary.DiversityMaster print.summary.DiversityMaster
> ### Keywords: diversity
>
> ### ** Examples
>
> require(DivE)
> data(Bact1)
> data(ModelSet)
> data(ParamSeeds)
> data(ParamRanges)
>
> testmodels <- list()
> testmeta <- list()
> paramranges <- list()
>
> # Choose a single model
> testmodels <- c(testmodels, ModelSet[1])
> #testmeta[[1]] <- (ParamSeeds[[1]]) # Commented out for sake of brevity)
> testmeta[[1]] <- matrix(c(0.9451638, 0.007428265, 0.9938149, 1.0147441, 0.009543598, 0.9870419),
+ nrow=2, byrow=TRUE, dimnames=list(c(), c("a1", "a2", "a3"))) # Example seeds
> paramranges[[1]] <- ParamRanges[[1]]
>
>
> # Create divsubsamples object (NB: For quick illustration only -- not default parameters)
> dss_1 <- divsubsamples(Bact1, nrf=2, minrarefac=1, maxrarefac=40, NResamples=5)
> dss_2 <- divsubsamples(Bact1, nrf=2, minrarefac=1, maxrarefac=65, NResamples=5)
> dss <- list(dss_2, dss_1)
>
> # Implement the function (NB: For quick illustration only -- not default parameters)
> out <- DiveMaster(models=testmodels, init.params=testmeta, param.ranges=paramranges,
+ main.samp=Bact1, subsizes=c(65, 40), NResamples=5, fitloops=1,
+ dssamp=dss, numit=2, varleft=10)
Loading predefined subsamples
Fitting model 1 of 1 (Est. time remaining: tbd mins)
Fitting loop 1
Performing fitting routine for sample 1
Choosing optimal initial parameters for global fit
Performing global fit
Performing local fit
Performing fitting routine for sample 2
Choosing optimal initial parameters for global fit
Performing global fit
Performing local fit
Scoring model 1
Aggregating scoring for model 1
>
> # DiveMaster Outputs
> out
Model score ($model.scores):
Combined score
logistic 187.5
Population diversity estimate ($estimate):
[1] 139.1354
> out$estimate
[1] 139.1354
>
> out$fmm$logistic
Fitting results for model: logistic
Model parameters ($param):
a1 a2 a3
65 1.0178687 0.007313728 0.9692542
40 0.8107375 0.008087964 1.0734446
Subsample sizes ($subsamplesizes):
[1] 65 40
Measures for discrepancies between rarefaction data and model fits ($discrep, $ssr, $ms):
Mean_absolute_error Sum_of_squares Mean_squared_residual
65 0.01046920 1.8245584 0.05528965
40 0.02202536 0.8387832 0.03994206
Diversity predictions and the local prediction accuracy ($AccuracyToObserved, $local, $global):
Accuracy_to_observed_dataset_diversity Predicted_local_diversity
65 -0.05530098 136.03666
40 -0.30954488 99.42554
Predicted_global_diversity
65 139.1354
40 100.2342
Similarity measure: normalised area between curve and largest-sample curve ($local.ref.dist):
Distance_from_local_reference_curve
40 0.2489548
Plausibility measures: monotonicity and decreasing 2nd derivative ($plausibility):
Is plausible Monotonically increasing Slowing
65 TRUE TRUE TRUE
40 TRUE TRUE TRUE
>
> out$fmm$logistic$global
Predicted_global_diversity
65 139.1354
40 100.2342
>
> out$ssm
fit accuracy similarity plausibility
logistic 163 37 50 500
>
> summary(out)
Call:
DiveMaster.default(models = testmodels, init.params = testmeta,
param.ranges = paramranges, main.samp = Bact1, numit = 2,
varleft = 10, subsizes = c(65, 40), dssamps = dss, NResamples = 5,
fitloops = 1)
Multiple Model fitting and scoring summary:
Number_of_models_fitted Best_scoring_model Estimated_global_diversity
1 1 logistic 139.1354
Diversity_upperbound Diversity_lowerbound
1 139.1354 139.1354
>
> ## Combining two DiveMaster objects (assuming a second object 'out2'):
> # out3 <- comb.dm(list(out, out2))
>
> ## To calculate the diversity for a different population size
> # popdiversity(dm=out, popsize=10^5, TopX=1)
>
>
>
>
>
>
> dev.off()
null device
1
>