R: Multivariate Ornstein-Uhlenbeck model of continuous traits...
mvOU
R Documentation
Multivariate Ornstein-Uhlenbeck model of continuous traits evolution
Description
This function allows the fitting of a multivariate Ornstein-Uhlenbeck (OU) model by allowing a given tree branch to be subdivided into multiple selective regimes using SIMMAP-like mapping of ancestral states.
Species measurement errors or dispersions can also be included in the model.
Phylogenetic tree with mapped ancestral states in SIMMAP format.
(See make.simmap function from phytools package). A "phylo" object can be used with model "OU1".
data
Matrix or data frame with species in rows and continuous traits in columns. NA values are allowed with the "rpf", "inverse", and "pseudoinverse" methods.
error
Matrix or data frame with species in rows and continuous trait sampling variance (squared standard errors) in columns.
model
Choose between "OUM" for a multiple selective regime model, or "OU1" for a unique selective regime for the whole tree.
param
List of arguments to be passed to the function. See details below.
method
Choose between "rpf", "sparse", "inverse", "pseudoinverse", or "univarpf" for computing the log-likelihood during the fitting process. See details below.
scale.height
Whether the tree should be scaled to unit length or not.
optimization
Methods used by the optimization routines (see ?optim and ?subplex for details). The "fixed" method returns the log-likelihood function only.
control
Max. bound for the number of iteration of the optimizer; other options can be fixed in the list. (See ?optim or ?subplex for details).
precalc
Optional. Precalculation of fixed parameters. See ?mvmorph.Precalc for details.
diagnostic
Whether the convergence diagnostics should be returned or not.
echo
Whether the results must be returned or not.
Details
The mvOU function fits a multivariate model of evolution according to an Ornstein-Uhlenbeck process. The user can incorporate measurement errors and uses SIMMAP-like mapping of ancestral states (phytools objects of class "simmap"). SIMMAP mapping allows one to assign parts of branchs to different selective regimes, and allows testing for change in trait variance that is not synchronous with the species divergence events. See the package vignette: browseVignettes("mvMORPH").
Mapping of ancestral states can be done using the "make.simmap", "make.era.map" or "paintSubTree" functions from the "phytools" package.
The "method" argument allows the user to try different algorithms for computing the log-likelihood. The "rpf", "univarpf" (for univariate analysis) and "sparse" methods use fast GLS algorithms based on factorization for avoiding the computation of the inverse of the variance-covariance matrix and its determinant for the log-likelihood estimation. The "inverse" approach uses the "stable" standard explicit computation of the inverse and determinant of the matrix and is therefore slower. The "pseudoinverse" method uses a generalized inverse that is safer for matrix near singularity but highly time consuming. See ?mvLL for details.
Arguments in the "param"list are:
"sigma" or "alpha" - Starting values for the likelihood search can be specified through the "alpha" and "sigma" arguments in the param list. It is also possible to test for the significance of the off-diagonal sigma (scatter) and alpha (drift) matrix in the full model by making comparison with a constrained model (using sigma="constraint", or alpha="constraint") in the "param" argument list. You can also provide starting values for the constrained model. For instance, for two traits use sigma=list("constraint", c(0.5,0.5)) (or alpha=list("constraint", c(0.5,0.5))).
"decomp" - You can further constrain the alpha matrix by specifying the decomposition of the matrix through the "decomp" argument in the "param" list. Indeed, the multivariate Ornstein-Uhlenbeck model is described by the spectral decomposition of the alpha matrix. Thus it is possible to parameterize the alpha matrix to be decomposable using various parameterizations (e.g., on its eigenvalues with different biological interpretations; Sy et al. 1997, Bartoszek et al. 2012). For a symmetric matrix parameterization the user can choose the "cholesky", "eigen", or "spherical" option.
For general square (non-symmetric) matrices the "svd", "qr" and "schur" parameterizations can be used. The "schur" parameterization constrains the eigenvalues of the alpha matrix to be real numbers. The "svd+", "qr+" or "eigen+" options forces the eigenvalues to be positives by taking their logarithm. It is also possible to specify "diagonal" which is similar to the use of the "constraint" argument for "alpha" argument, or to use "equal" and "equaldiagonal". Finally, one can specify that the alpha matrix is "upper" or "lower" triangular (i.e., one process affect the other unilateraly). Details can be found in the package vignette: browseVignettes("mvMORPH").
"decompSigma" - The sigma matrix is parameterized by various methods to ensure its positive definiteness (Pinheiro and Bates, 1996). These methods can be accessed through the "decompSigma" argument and are the "cholesky", "eigen+", and "spherical" parameterization. The sigma matrix can also be forced to be diagonal using "diagonal" or "equaldiagonal" and forced to have the same variances using "equal". Details can be found in the package vignette: browseVignettes("mvMORPH").
"vcv" - It is possible to specify in the "param" list what kind of variance-covariance matrix to use with the "vcv" argument, depending on how the root is treated.
The vcv="randomRoot" option assumes that the value at the root is a random variable with the stationary distribution of the process. It cannot be used with the "sparse" method to speed up the computations. The vcv="fixedRoot" option assumes that the root is a fixed parameter. On ultrametric trees both approaches should converge on the same results when the OU process is stationary.
"root" - This argument allows the user to specify if the ancestral state at the root (theta 0) should be estimated (root=TRUE), or assumed to be at the oldest regime state stationary distribution (root=FALSE). An alternative is to follow Beaulieu et al. (2012) and explicitly drop the root state influence (root="stationary"). The first option should be used with non-ultrametric trees (i.e., with fossil species; e.g., Hansen 1997) where information on the ancestral state is directly available from the data. Indeed, estimating shifts in the ancestral state from extant species could be problematic and it seems preferable to assume each regime optimum to be at the stationary distribution.
For the "decomp" and "decompSigma arguments, an user-defined matrix with integer values taken as indices of the parameters to be estimated can be provided. See ?mvBM and ?mvRWTS.
Note on the returned Hessian matrix in the result list (param$opt$hessian):
The hessian is the matrix of second order partial derivatives of the likelihood function with respect to the maximum likelihood parameter values. This matrix provides a measure of the steepness of the likelihood surface in the vicinity of the optimum. The eigen-decomposition of the hessian matrix returned by the optimizing function allows assessing the reliability of the fit of the model (even if the optimizer has converged).
When the optimization function does not converge on a stable result, the user may consider increasing the "maxit" argument in the "control" option, or try a simpler model with fewer parameters to estimate. Changing the starting values ("alpha" and "sigma" options in the param list) as well as the optimizing method ("optimization" options) may help sometimes (e.g., alpha=runif(3) for a two-trait analysis with random starting values - i.e., the lower triangular alpha matrix). Note that the number of starting values to provide depends on the matrix decomposition chosen for the alpha parameter (p*(p+1)/2 values for symmetric alpha matrix, but p*p values for non-symmetric ones - with p the number of traits).
Value
LogLik
The log-likelihood of the optimal model.
AIC
Akaike Information Criterion for the optimal model.
AICc
Sample size-corrected AIC.
theta
Estimated ancestral states.
alpha
Matrix of estimated alpha values (strength of selection).
sigma
Evolutionary rate matrix (drift).
convergence
Convergence status of the optimizing function; "0" indicates convergence. (see ?optim for details).
hess.values
Reliability of the likelihood estimates calculated through the eigen-decomposition of the hessian matrix. "0" means that a reliable estimate has been reached. See details above.
param
List of model fit parameters (optimization, method, model, number of parameters...).
llik
The log-likelihood function evaluated in the model fit "$llik(par, root.mle=TRUE)".
Note
This function partly uses a modified version of the C code from the "OUCH" package built by Aaron King, as well as a C code which is part of the "ape" package by Emmanuel Paradis. I kindly thank those authors for sharing their sources.
Note that Bartoszek et al. (2012) proposed the mvSLOUCH package also dedicated to multivariate Ornstein-Uhlenbeck processes, which allows fitting regression models with randomly evolving predictor variables.
The "symmetric", "nsymmetric", "symmetricPositive", and "nsymPositive" options for the "decomp" argument are deprecated.
Author(s)
Julien Clavel
References
Bartoszek K., Pienaar J., Mostad P., Andersson S., Hansen T.F. 2012. A phylogenetic comparative method for studying multivariate adaptation. J. Theor. Biol. 314:204-215.
Beaulieu J.M., Jhwueng D.-C., Boettiger C., O'Meara B.C. 2012. Modeling stabilizing selection: Expanding the Ornstein-Uhlenbeck model of adaptive evolution. Evolution. 66:2369-2389.
Butler M.A., King A.A. 2004. Phylogenetic comparative analysis: a modeling approach for adaptive evolution. Am. Nat. 164:683-695.
Clavel J., Escarguel G., Merceron G. 2015. mvMORPH: an R package for fitting multivariate evolutionary models to morphometric data. Methods Ecol. Evol. 6(11):1311-1319.
Hansen T.F. 1997. Stabilizing selection and the comparative analysis of adaptation. Evolution. 51:1341-1351.
# Simulated dataset
set.seed(14)
# Generating a random tree
tree<-pbtree(n=50)
# Setting the regime states of tip species
sta<-as.vector(c(rep("Forest",20),rep("Savannah",30))); names(sta)<-tree$tip.label
# Making the simmap tree with mapped states
tree<-make.simmap(tree,sta , model="ER", nsim=1)
col<-c("blue","orange"); names(col)<-c("Forest","Savannah")
# Plot of the phylogeny for illustration
plotSimmap(tree,col,fsize=0.6,node.numbers=FALSE,lwd=3, pts=FALSE)
# Simulate the traits
alpha<-matrix(c(2,0.5,0.5,1),2)
sigma<-matrix(c(0.1,0.05,0.05,0.1),2)
theta<-c(2,3,1,1.3)
data<-mvSIM(tree, param=list(sigma=sigma, alpha=alpha, ntraits=2, theta=theta,
names_traits=c("head.size","mouth.size")), model="OUM", nsim=1)
## Fitting the models
# OUM - Analysis with multiple optima
mvOU(tree, data)
# OU1 - Analysis with a unique optimum
mvOU(tree, data, model="OU1", method="sparse")
# various options
mvOU(tree, data, model="OUM", method="sparse", scale.height=FALSE,
param=list(decomp="nsymmetric", root="stationary"))
mvOU(tree, data, model="OUM", method="sparse", scale.height=FALSE,
param=list(decomp="nsymmetric", root=TRUE))
mvOU(tree, data, model="OUM", method="sparse", scale.height=FALSE,
param=list(decomp="symmetricPositive", root=TRUE))
# OUCH setting
mvOU(tree, data, model="OUM", method="rpf", scale.height=FALSE,
param=list(decomp="symmetricPositive", root=FALSE, vcv="ouch"))
## Univariate case - FAST with RPF
set.seed(14)
tree<-pbtree(n=500)
# Setting the regime states of tip species
sta<-as.vector(c(rep("Forest",200),rep("Savannah",300))); names(sta)<-tree$tip.label
# Making the simmap tree with mapped states
tree<-make.simmap(tree,sta , model="ER", nsim=1)
col<-c("blue","orange"); names(col)<-c("Forest","Savannah")
# Plot of the phylogeny for illustration
plotSimmap(tree,col,fsize=0.6,node.numbers=FALSE,lwd=3, pts=FALSE)
# Parameters
alpha<-2.5
sigma<-0.1
theta<-c(0,2)
data<-mvSIM(tree, param=list(sigma=sigma, alpha=alpha, ntraits=1, theta=theta,
names_traits=c("body_size")), model="OUM", nsim=1)
# Fit the model
system.time(mvOU(tree, data, model="OUM", method="univarpf",
param=list(root="stationary")))
system.time(mvOU(tree, data, model="OU1", method="univarpf",
param=list(root="stationary")))
# Add measurement error
error=rnorm(500,sd=0.1)
mvOU(tree, data, error=error^2, model="OUM", method="univarpf",
param=list(root="stationary"))