R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Wrapper function to find best (out of BM, OU, OUOU, OUBM)...

estimate.evolutionary.model

R Documentation

Wrapper function to find best (out of BM, OU, OUOU, OUBM) fitting evolutionary model and estimate its parameters.

Description

The estimate.evolutionary.model function calls the BrownianMotionModel, ouchModel and mvslouchModel functions with different classes of evolutionary model parameters. It then compares the resulting estimates by the AICc and returns the best overall model.

Usage

estimate.evolutionary.model(phyltree, dfdata, regimes = NULL, 
root.regime = NULL, M.error = NULL, repeats = 3, model.setups = NULL, 
predictors = NULL, kY = NULL, doPrint = FALSE, pESS=NULL)

Arguments

`phyltree`	The phylogeny in `ouch` format. All of the internal nodes have to be uniquely named. The tree can be obtained from e.g. a `nexus` file by the `read.nexus` function from the `ape` package and converted into the `ouch` format by `ouch`'s `ape2ouch` function. See the example of how to correct the internal nodes. It is strongly advisable to rescale the tree so that the determinant of the speciation time matrix (`phyltree@branch.times`) is about 1. This significantly stabilizes the estimation procedure.
`dfdata`	A data frame with the rows corresponding to the species while the columns correspond to the traits. The rows can be named by species, if not then the order of the species has to be the same as the order in which the species are on the phylogeny.
`regimes`	A vector or list of regimes. If vector then each entry corresponds to the branch preceding the respective node. If list then each list entry corresponds to a node and is a vector for regimes on that lineage. If `NULL` then a constant regime is assumed on the whole tree.
`root.regime`	The regime at the root of the tree. If not given taken as first element of regimes.
`M.error`	An optional measurement error covariance matrix. The program tries to recognizes the structure of matrix passed and accepts the following possibilities : a single number that will be on the diagonal of the covariance matrix a m element vector with each value corresponding to a variable and the covariance matrix will have that vector repeated on its diagonal, a nxm element vector a diagonal matrix with this vector on the diagonal, a m x m ((number of variables) x (number of variables)) matrix it is assumed that the measurement errors are independent between observations so the resulting covariance structure is block diagonal, a list of length m (number of variables), each list element is the covariance structure for the appropriate variable, either a single number (each observations has same variance), vector (of length n for each observation), or full matrix, matrix of size mn x mn (m - number of variables, n - number of observations) the measurement error covariance provided as is, NULL no measurement error
`repeats`	How many starting points for the numerical maximum likelihood procedure should be tried for each model setup.
`model.setups`	What models to try when searching for the best evolutionary model. This field may remain `NULL`, in this situation the function generates using `.generate.basic.model.setups()` a basic list of models. Allowed values are `"basic"` A basic list of models to try out is generated, defined using `.generate.basic.model.setups()`. This list should be usually enough. `"fundamental"` A slightly extended list of models to try out is generated, defined using `.generate.fund.model.setups()`. Compared to `"basic"` a few more models are added. `"extended"` An extension of the `"fundamental"` list of models to try out. Defined using `.generate.ext.model.setups()` which at the moment calls `generate.model.setups()`. `"all"` All possible models are generated, using `.generate.all.model.setups()`. Running it will take an intolerable amount of time. Alternatively the use is also free to provide their own list of models in this variable. Each element of the list is a list with fields. `evolmodel` The evolutionary model, it may take one of the three values `"BM"` (Brownian motion model), `"ouch"` (OUOU model), `"mvslouch"` (OUBM model). `Atype` The class of the `A` matrix, ignored if `evolmodel` equals `"BM"`. Otherwise it can take one of the following values: `"SingleValueDiagonal"`, `"Diagonal"`, `"UpperTri"`, `"LowerTri"`, `"SymmetricPositiveDefinite"`, `"Symmetric"`, `"DecomposablePositive"`, `"DecomposableNegative"`, `"DecomposableReal"`, `"Invertible"`. `Syytype` The class of the `A` matrix, ignored if `evolmodel` equals `"BM"`. Otherwise it can take one of the following values: `"SingleValueDiagonal"`, `"Diagonal"`, `"UpperTri"`, `"LowerTri"`, `"Symmetric"`, `"Any"`. `diagA` Should the diagonal of `A` be forced to be positive (`"Positive"`), negative (`"Negative"`) or the sign free to vary (`NULL`) A minimum example list is `list(list(evolmodel="BM"))`.
`predictors`	A vector giving the numbers of the columns from `dfdata` which are to be considered predictor ones, i.e. conditioned on in the program output. If not provided then the "X" variables are treated as predictors, but this only for the OUBM models (for the others in this case none are treated as predictors).
`kY`	Number of "Y" (response) variables, for the OUBM models.
`doPrint`	Should the function print out information on what it is doing (TRUE) or keep silent (default FALSE).
`pESS`	Should the function also find the best model taking into account the phylogenetic effective sample size and it so what method. If `NULL` then do not take this into account. Otherwise one of `"reg"` ("regression" effective sample size that takes into account all of the correlations between species explicitely), `"mean"` (mean value effective sample size 1^{T}R^{-1}1, where R is the interspecies correlation matrix), `"MI"` (mutual information effective sample size), `"mvreg"`(multivariate version of "regression" effective sample size when each species is described by a suite of traits) , `"mvMI"` (multivariate mutual information effective sample size when each species is described by a suite of traits) indicating the way to calculate the pESS. The default is `"reg"`.

Details

If model.setups is left at the default value the function will take a long time to run, as it performs estimation for each model (generate.model.setups generates 90 setups) times the value in repeats. Therefore if the user has particular hypotheses in mind then it is advisable to prepare their own list.

Value

A list is returned that describes the results of the search. See the help for BrownianMotionModel, ouchModel and mvslouchModel for the description of the lower level entries. The elements of this list are the following

`BestModel`	The resulting best model found. Included are the model parameters, a "first-glance" qualitative description of the model, the most important parameters of the process (half-lives and regressions in the case of OU models) and what to call to obtain standard errors. It takes a long time to obtain them so calculating them is not part of the standard procedure.
`BestModelESS`	Only if `pESS` was `TRUE`. The resulting best model found taking into account the phylogenetic essential sample size. Included are the model parameters, a "first-glance" qualitative description of the model, the most important parameters of the process (half-lives and regressions in the case of OU models) and what to call to obtain standard errors. It takes a long time to obtain them so calculating them is not part of the standard procedure.
`testedModels`	A list of results for each tried model.
`model.setups`	A list of models tried.
`repeats`	How many starting points were tried per model.

Note

The slouch package is a recommended alternative if one has a OUBM models and only a single response (Y) trait. The ouch package considers an univariate OU model and looking at it could be helpful.

Author(s)

Krzysztof Bartoszek

References

Ane, C. (2008) Analysis of comparative data with hierarchical autocorrelation. Annals of Applied Statistics 2:1078-1102.

Bartoszek, K. (2015) Phylogenetic effective sample size. ArXiv e-prints 1507.07113.

Bartoszek, K. and Pienaar, J. and Mostad. P. and Andersson, S. and Hansen, T. F. (2012) A phylogenetic comparative method for studying multivariate adaptation. Journal of Theoretical Biology 314:204-215.

Butler, M.A. and A.A. King (2004) Phylogenetic comparative analysis: a modeling approach for adaptive evolution. American Naturalist 164:683-695.

Hansen, T.F. and Pienaar, J. and Orzack, S.H. (2008) A comparative method for studying adaptation to randomly evolving environment. Evolution 62:1965-1977.

Examples

## Not run:  ##It takes too long to run this
### We will first simulate a small phylogenetic tree using functions from ape and ouch.
### For simulating the tree one could also use alternative functions, eg. sim.bd.taxa 
### from the TreeSim package
phyltree<-ape2ouch(rtree(5))

### Correct the names of the internal node labels.
phyltree@nodelabels[1:(phyltree@nnodes-phyltree@nterm)]<-as.character(
1:(phyltree@nnodes-phyltree@nterm))

### Define a vector of regimes.
regimes<-c("small","small","small","large","small","small","large","large","large")

### Define SDE parameters to be able to simulate data under the OUOU model.
OUOUparameters<-list(vY0=matrix(c(1,-1,0.5),nrow=3,ncol=1),
A=rbind(c(9,0,0),c(0,5,0),c(0,0,1)),mPsi=cbind("small"=c(1,-1,0.5),"large"=c(-1,1,0.5)),
Syy=rbind(c(1,0.25,0.3),c(0,1,0.2),c(0,0,1)))

### Now simulate the data and remove the values corresponding to the internal nodes.
OUOUdata<-simulOUCHProcPhylTree(phyltree,OUOUparameters,regimes,NULL)
OUOUdata<-OUOUdata[-(1:(phyltree@nnodes-phyltree@nterm)),]

### And finally try to recover the parameters of the OUOU model.
estimResults<-estimate.evolutionary.model(demotreeouch,OUOUdata,regimes=regimes,
root.regime="small",M.error=NULL,repeats=3,model.setups=NULL,predictors=c(3),kY=2,
doPrint=TRUE,pESS=FALSE)

## End(Not run)