R Graphical Manual

Browse All

Last data update: 2014.03.03

R: DiveMaster

DiveMaster

R Documentation

DiveMaster

Description

Implements the DivE diversity estimator.

Usage

DiveMaster(models, init.params, param.ranges, main.samp,
           tot.pop=(100*(divsamplenum(main.samp,2)[1])), numit=10^5,
           varleft=1e-8, subsizes=6, dssamps=list(), nrf=1, minrarefac=1,
           NResamples=1000, minplaus=10,
           precision.lv=c(0.0001, 0.005, 0.005), plaus.pen=500,
           crit.wts=c(1.0, 1.0, 1.0, 1.0), fitloops=2, numpred=5)

Arguments

`models`	list of models; each model is written as a function: function(x, params) { with(as.list(params), <function of params>)}. Examples are given in the ModelSet data file as part of the DivE package.
`init.params`	list of matrices of initial seed model parameters. For each matrix, each row represents a given parameter set; each column represents a parameter value. Column names must match parameter names (params) in the corresponding model in the list `models`. Examples are given in the ParamSeeds data file as part of the DivE package.
`param.ranges`	list of matrices of lower and upper model parameters bounds. Used for the modFit function. The first and second row corresponds to the lower and upper bounds respectively; each column represents a parameter value. Column names must match parameter names (params) in the corresponding model in the list `models`. Examples are given in the param.ranges data file as part of the DivE package.
`main.samp`	the main sample, either as a 2-column data.frame (species ID, count of species), or a vector of species IDs.
`tot.pop`	total population (integer); default set to 100x the `main.samp` size.
`numit`	control argument passed to optimisation routine; the maximum number of iterations that modFit will perform. See `modFit` for details.
`varleft`	control argument passed to optimisation routine; see `modFit` for details.
`subsizes`	either number of nested subsamples (integer, must be 2 or greater), or a vector of nested sample lengths. If the former, then the vector of sample lengths will be created using the divsamplenum function.
`dssamps`	list of user specified rarefaction data divsubsamples objects. The length of each component vector of each object in the list must correspond to the vector of nested sample lengths (as defined by the user in `subsizes`).
`nrf`	difference between lengths of successive rarefaction datapoints.
`minrarefac`	minimum rarefaction x-axis value. This argument is not used if list of divsubsamples object is specified in `dssamps`.
`NResamples`	number of resamples used to calculate the rarefaction data. This parameter is not used if list of divsubsamples object is specified in `dssamps`.
`minplaus`	lower x-axis bound for plausibility check.
`precision.lv`	vector of precision level values for each criterion: 1. discrepancy – mean percentage error between rarefaction data points and model predicion, 2. Sample accuracy – percentage error between observed diversity of full rarefaction data and estimated diversity of full data from subsample, 3. local similarity. The scores for each criteria are defined as 1 + (multiples of bin sizes)
`plaus.pen`	penalty score for breaking the plausibility criterion: a model fit should be monotonically increasing and should have a slowing rate of species accumulation.
`crit.wts`	vector of weights of each of the four scoring criteria – fit, accuracy, similarity, plausibility. Default is c(1,1,1,1).
`fitloops`	number of fitting rounds performed for each model. In each round of fitting, the initial seed parameter values for each model will be the fitted parameters of the previous fitting run. This parameter has a significant impact on the computational time. The ‘sweet spot’ is 2.
`numpred`	number of topscoring models used for diversity prediction. Default is 5.

Details

This is the master function of the DivE estimator. The default operation is a combination of four steps. 1. Generate a list of nested samples lengths from the main sample. 2. For each nested subsample, generate a vector of rarefaction data and their associated mean species diversity. 3. Fit to the generated data a set of models. 4. Evaluate the fits according to the DivE diversity estimation methodology and compare the scores across models and fitting criteria.

A list of DiveMaster objects, each representing the fits to different sets of models, can combined into a single DiveMaster object using the comb.dm function. This is useful when running the DivE estimator with the full set of 58 models in a single run is not possible.

One can estimate the diversity for a given population using the popdiversity function where the arguments are the Divemaster object and the population size respectively.

Value

A list of objects:

`model.score`	a matrix of aggregated model scores
`fmm`	a list of fitsingMod objects corresponding to the list of fitted models
`ssm`	a matrix of scores of the fit of the models corresponding to the list of fitted models
`estimate`	the estimate of species richness (number of species/classes or diversity) at population size `tot.pop`. This is the geometric average of the models corresponding to the top-five model scores
`lower_estimate`	as per estimate value, but the lowest prediction amongst the models having the top-five scores
`upper_estimate`	as per estimate value, but the highest prediction amongst the models having the top-five scores
`models`	list of original input models
`m`	number of topscoring models used for diversity estimate

Author(s)

Daniel Laydon, Aaron Sim, Charles Bangham, Becca Asquith

References

Laydon, D., Melamed, A., Sim, A., Gillet, N. A., Sim, K., Darko, S., Kroll, S., Douek, D. C., Price, D., Bangham, C. R. M., Asquith, B., Quantification of HTLV-1 clonality and TCR diversity, PLOS Comput. Biol. 2014

Examples

require(DivE)
data(Bact1)
data(ModelSet)
data(ParamSeeds)
data(ParamRanges)

testmodels <- list()
testmeta <- list()
paramranges <- list()

# Choose a single model 
testmodels <- c(testmodels, ModelSet[1])
#testmeta[[1]] <- (ParamSeeds[[1]]) # Commented out for sake of brevity)
testmeta[[1]] <- matrix(c(0.9451638, 0.007428265, 0.9938149, 1.0147441, 0.009543598, 0.9870419),
                        nrow=2, byrow=TRUE, dimnames=list(c(), c("a1", "a2", "a3"))) # Example seeds
paramranges[[1]] <- ParamRanges[[1]]


# Create divsubsamples object (NB: For quick illustration only -- not default parameters)
dss_1 <- divsubsamples(Bact1, nrf=2, minrarefac=1, maxrarefac=40, NResamples=5)
dss_2 <- divsubsamples(Bact1, nrf=2, minrarefac=1, maxrarefac=65, NResamples=5)
dss <- list(dss_2, dss_1)

# Implement the function (NB: For quick illustration only -- not default parameters)
out <- DiveMaster(models=testmodels, init.params=testmeta, param.ranges=paramranges,
                  main.samp=Bact1, subsizes=c(65, 40), NResamples=5, fitloops=1,
                  dssamp=dss, numit=2, varleft=10)

# DiveMaster Outputs
out
out$estimate

out$fmm$logistic

out$fmm$logistic$global

out$ssm

summary(out)

## Combining two DiveMaster objects (assuming a second object 'out2'):
# out3 <- comb.dm(list(out, out2))

## To calculate the diversity for a different population size
# popdiversity(dm=out, popsize=10^5, TopX=1)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(DivE)
Loading required package: deSolve

Attaching package: 'deSolve'

The following object is masked from 'package:graphics':

    matplot

Loading required package: FME
Loading required package: rootSolve
Loading required package: coda
Loading required package: rgeos
rgeos version: 0.3-19, (SVN revision 524)
 GEOS runtime version: 3.5.0-CAPI-1.9.0 r4084 
 Linking to sp version: 1.2-3 
 Polygon checking: TRUE 

Loading required package: sp
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/DivE/DiveMaster.Rd_%03d_medium.png", width=480, height=480)
> ### Name: DiveMaster
> ### Title: DiveMaster
> ### Aliases: DiveMaster DiveMaster.default print.DiversityMaster
> ###   summary.DiversityMaster print.summary.DiversityMaster
> ### Keywords: diversity
> 
> ### ** Examples
> 
> require(DivE)
> data(Bact1)
> data(ModelSet)
> data(ParamSeeds)
> data(ParamRanges)
> 
> testmodels <- list()
> testmeta <- list()
> paramranges <- list()
> 
> # Choose a single model 
> testmodels <- c(testmodels, ModelSet[1])
> #testmeta[[1]] <- (ParamSeeds[[1]]) # Commented out for sake of brevity)
> testmeta[[1]] <- matrix(c(0.9451638, 0.007428265, 0.9938149, 1.0147441, 0.009543598, 0.9870419),
+                         nrow=2, byrow=TRUE, dimnames=list(c(), c("a1", "a2", "a3"))) # Example seeds
> paramranges[[1]] <- ParamRanges[[1]]
> 
> 
> # Create divsubsamples object (NB: For quick illustration only -- not default parameters)
> dss_1 <- divsubsamples(Bact1, nrf=2, minrarefac=1, maxrarefac=40, NResamples=5)
> dss_2 <- divsubsamples(Bact1, nrf=2, minrarefac=1, maxrarefac=65, NResamples=5)
> dss <- list(dss_2, dss_1)
> 
> # Implement the function (NB: For quick illustration only -- not default parameters)
> out <- DiveMaster(models=testmodels, init.params=testmeta, param.ranges=paramranges,
+                   main.samp=Bact1, subsizes=c(65, 40), NResamples=5, fitloops=1,
+                   dssamp=dss, numit=2, varleft=10)
Loading predefined subsamples 
Fitting model 1 of 1 (Est. time remaining: tbd mins) 
Fitting loop 1 
Performing fitting routine for sample  1 
Choosing optimal initial parameters for global fit 
Performing global fit 
Performing local fit 
Performing fitting routine for sample  2 
Choosing optimal initial parameters for global fit 
Performing global fit 
Performing local fit 
Scoring model 1 
Aggregating scoring for model 1 
> 
> # DiveMaster Outputs
> out
Model score ($model.scores):
         Combined score
logistic          187.5

Population diversity estimate ($estimate):
[1] 139.1354
> out$estimate
[1] 139.1354
> 
> out$fmm$logistic
Fitting results for model: logistic 
Model parameters ($param):
          a1          a2        a3
65 1.0178687 0.007313728 0.9692542
40 0.8107375 0.008087964 1.0734446

Subsample sizes ($subsamplesizes):
[1] 65 40

Measures for discrepancies between rarefaction data and model fits ($discrep, $ssr, $ms):
   Mean_absolute_error Sum_of_squares Mean_squared_residual
65          0.01046920      1.8245584            0.05528965
40          0.02202536      0.8387832            0.03994206

Diversity predictions and the local prediction accuracy ($AccuracyToObserved, $local, $global):
   Accuracy_to_observed_dataset_diversity Predicted_local_diversity
65                            -0.05530098                 136.03666
40                            -0.30954488                  99.42554
   Predicted_global_diversity
65                   139.1354
40                   100.2342

Similarity measure: normalised area between curve and largest-sample curve ($local.ref.dist):
   Distance_from_local_reference_curve
40                           0.2489548

Plausibility measures: monotonicity and decreasing 2nd derivative ($plausibility):
   Is plausible Monotonically increasing Slowing
65         TRUE                     TRUE    TRUE
40         TRUE                     TRUE    TRUE
> 
> out$fmm$logistic$global
   Predicted_global_diversity
65                   139.1354
40                   100.2342
> 
> out$ssm
         fit accuracy similarity plausibility
logistic 163       37         50          500
> 
> summary(out)
Call:
DiveMaster.default(models = testmodels, init.params = testmeta, 
    param.ranges = paramranges, main.samp = Bact1, numit = 2, 
    varleft = 10, subsizes = c(65, 40), dssamps = dss, NResamples = 5, 
    fitloops = 1)

Multiple Model fitting and scoring summary:
  Number_of_models_fitted Best_scoring_model Estimated_global_diversity
1                       1           logistic                   139.1354
  Diversity_upperbound Diversity_lowerbound
1             139.1354             139.1354
> 
> ## Combining two DiveMaster objects (assuming a second object 'out2'):
> # out3 <- comb.dm(list(out, out2))
> 
> ## To calculate the diversity for a different population size
> # popdiversity(dm=out, popsize=10^5, TopX=1)
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>

DiveMaster

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Results