Performs a canonical correlation (and canonical redundancy) analysis on two sets of variables.
Usage
cca(x, y, xlab = colnames(x), ylab = colnames(y), xcenter = TRUE,
ycenter = TRUE, xscale = FALSE, yscale = FALSE,
standardize.scores = TRUE, use = "complete.obs", na.rm = TRUE)
## S3 method for class 'cca'
plot(x, ...)
## S3 method for class 'cca'
print(x, ...)
## S3 method for class 'cca'
summary(object, ...)
Arguments
x
for cca, a single vector or a matrix whose columns contain the x variables. For the plot and print methods, a cca object.
y
a single vector or a matrix whose columns contain the y variables.
xlab
an optional vector of x labels.
ylab
an optional vector of y labels.
xcenter
boolean; demean the x variables?
ycenter
boolean; demean the y variables?
xscale
boolean; scale the x variables to unit variance?
yscale
boolean; scale the y variables to unit variance?
standardize.scores
boolean; rescale scores (and coefficients) to produce scores of unit variance?
use
the use argument passed to var when computing the covariance matrices.
na.rm
boolean; remove missing values during redundancy analysis?
object
a cca object.
...
additional arguments.
Details
Canonical correlation analysis (CCA) is a form of linear subspace analysis that involves the projection of two sets of vectors (here, the variable sets x and y) onto a joint subspace. The goal of CCA is to find a sequence of linear transformations of each variable set such that the correlations between the transformed variables are maximized, under the proviso that each transformed variable must be orthogonal to (i.e., uncorrelated with) those preceding it. These transformed variables, known as “canonical variates” (CVs), can be thought of as expressing the common variation across the data sets, in a manner analogous to the role of principal components in within-set analysis (see, e.g., princomp). Since the rank of the joint subspace is equal to the minimum of the ranks of the two spaces spanned by the initial data vectors, the number of CVs will usually be equal to the minimum of the number of x and y variables (or fewer, if either set is not of full rank).
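As a quick check of the rank claim above, the following sketch uses base R's stats::cancor() (standing in for cca() itself) on the LifeCycleSavings data. The extra columns twice15 and total are hypothetical, constructed only to make one variable set rank-deficient.

```r
# Base R sketch: the number of canonical correlations equals the smaller of
# the two ranks. stats::cancor() stands in for cca() here.
pop <- as.matrix(LifeCycleSavings[, 2:3])    # pop15, pop75: rank 2
oec <- as.matrix(LifeCycleSavings[, -(2:3)]) # sr, dpi, ddpi: rank 3
length(cancor(pop, oec)$cor)                 # 2 = min(2, 3)

# Hypothetical rank-deficient set: two extra columns that are exact
# linear combinations of pop15 and pop75 (4 columns, rank 2).
pop_dup <- cbind(pop,
                 twice15 = 2 * pop[, "pop15"],
                 total   = pop[, "pop15"] + pop[, "pop75"])
qr(pop_dup)$rank                             # 2
length(cancor(oec, pop_dup)$cor)             # 2, not min(3, 4) = 3
```

The column count alone would suggest three canonical variates for the second call; the rank, not the number of columns, is what bounds the solution.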
Formally, we may describe the CCA solution as follows. Given data matrices X and Y, let Cxx, Cxy, Cyx and Cyy be the respective sample covariance matrices for X versus itself, X versus Y, Y versus X, and Y versus itself. Now, for some i less than or equal to the minimum of the ranks of X and Y, let u_i be the ith eigenvector of Cxx^-1 %*% Cxy %*% Cyy^-1 %*% Cyx, with corresponding eigenvalue λ_i. Then the vector u_i contains the coefficients projecting X onto the ith canonical variate; the corresponding scores are given by X %*% u_i. Similarly, let v_i be the ith eigenvector of Cyy^-1 %*% Cyx %*% Cxx^-1 %*% Cxy. Then v_i contains the coefficients projecting Y onto the ith canonical variate (with scores Y %*% v_i). The ith eigenvalue of this second matrix is the same λ_i as in the first case, and equals the square of the ith canonical correlation for the CCA solution; that is, λ_i is the squared correlation between the X and Y scores on the ith canonical variate. Since the canonical correlation structure is unaffected by rescaling of the canonical variate scores, it is common to adjust the coefficients u_i and v_i so that the resulting scores have unit variance; this option is controlled here via the standardize.scores argument.
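The eigen-decomposition described above can be sketched directly in base R. Here cca_eigen() is a hypothetical helper written for illustration, not the code used internally by cca(). It assumes both variable sets have full-rank covariance matrices (so that solve() succeeds), centers the data, and rescales the coefficients to give unit-variance scores, analogous to standardize.scores = TRUE. Because eigenvectors are only determined up to sign, it flips each v_i so that the paired scores correlate positively.

```r
# Hypothetical helper: CCA via the eigenproblems
#   Cxx^-1 Cxy Cyy^-1 Cyx  (for u_i)  and  Cyy^-1 Cyx Cxx^-1 Cxy  (for v_i).
# Assumes Cxx and Cyy are invertible (full-rank x and y sets).
cca_eigen <- function(X, Y) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  Xc <- scale(X, scale = FALSE)              # demean, as with xcenter = TRUE
  Yc <- scale(Y, scale = FALSE)
  Cxx <- var(X); Cyy <- var(Y)
  Cxy <- cov(X, Y); Cyx <- t(Cxy)
  k <- min(qr(Xc)$rank, qr(Yc)$rank)         # number of canonical variates
  ex <- eigen(solve(Cxx) %*% Cxy %*% solve(Cyy) %*% Cyx)
  ey <- eigen(solve(Cyy) %*% Cyx %*% solve(Cxx) %*% Cxy)
  # Both products are similar to symmetric PSD matrices, so the eigenvalues
  # are real; Re() only drops any zero imaginary parts eigen() might return.
  lambda <- Re(ex$values)[seq_len(k)]
  u <- Re(ex$vectors)[, seq_len(k), drop = FALSE]
  v <- Re(ey$vectors)[, seq_len(k), drop = FALSE]
  # Rescale coefficients so that every score vector has unit variance
  u <- sweep(u, 2, apply(Xc %*% u, 2, sd), "/")
  v <- sweep(v, 2, apply(Yc %*% v, 2, sd), "/")
  # Fix the arbitrary eigenvector signs: paired scores correlate positively
  s <- sign(diag(cor(Xc %*% u, Yc %*% v)))
  v <- sweep(v, 2, s, "*")
  list(corr = sqrt(lambda), xcoef = u, ycoef = v,
       canvarx = Xc %*% u, canvary = Yc %*% v)
}

# Usage on the data from the Examples section
pop <- LifeCycleSavings[, 2:3]
oec <- LifeCycleSavings[, -(2:3)]
fit <- cca_eigen(pop, oec)
fit$corr                                     # canonical correlations
cancor(pop, oec)$cor                         # should agree with fit$corr
```

The canonical correlations should match those from stats::cancor(), which reaches the same solution through QR and SVD steps rather than an explicit eigen-decomposition; the coefficients differ by a scale factor because cancor() normalizes its scores to unit sum of squares rather than unit variance.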
CCA output can be fairly complex. Quantities of particular interest include the correlations between the original variables in each set and their respective canonical variates (structural correlations or loadings), the coefficients which take the original variables into the CVs, and of course the correlations between the CV scores in one set and their corresponding scores in the opposite set (the canonical correlations). The canonical correlations provide a basic measure of concordance between the transformed variables, but are surprisingly uninformative by themselves; canonical redundancies (see below) are of more typical interest. Interpretation of CVs is usually performed by inspection of loadings, which reveal the extent to which each CV is associated with particular variables in each set. The squared loadings, in particular, convey the fraction of variance in each original variable which is accounted for by a given CV (though not necessarily by the variables in the opposite set!).
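The loadings described above are simply correlations between each original variable and the CV scores of its own set. The sketch below computes them with base R, using stats::cancor() in place of cca(); since correlations are unaffected by positive rescaling of the scores, the loadings should agree with cca()'s structural correlations up to sign, even though the two functions scale their coefficients differently. The names xload and yload are illustrative only.

```r
# Structural correlations (loadings) sketched with base R;
# stats::cancor() stands in for cca(), so signs may differ from cca().
pop <- as.matrix(LifeCycleSavings[, 2:3])
oec <- as.matrix(LifeCycleSavings[, -(2:3)])
cc  <- cancor(pop, oec)
k   <- length(cc$cor)                        # number of canonical variates

# Canonical variate scores (cancor() centers the data, so we do the same)
xscores <- scale(pop, scale = FALSE) %*% cc$xcoef[, 1:k, drop = FALSE]
yscores <- scale(oec, scale = FALSE) %*% cc$ycoef[, 1:k, drop = FALSE]

# Loadings: correlation of each original variable with its own set's CVs
xload <- cor(pop, xscores)                   # analogue of xstructcorr
yload <- cor(oec, yscores)                   # analogue of ystructcorr

# Squared loadings: fraction of each variable's variance tied to each CV;
# row sums give each variable's canonical communality.
round(xload^2, 3)
round(yload^2, 3)
rowSums(xload^2)  # 1 for both x variables: two CVs span the 2-variable x set
rowSums(yload^2)  # at most 1: two CVs cannot span the whole 3-variable y set
```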
A common interest in the context of CCA is the extent to which the variance of one set of variables can be accounted for by the other (in the usual least squares sense). While it is tempting to interpret the squared canonical correlations in this manner, this is incorrect: the squared canonical correlations convey the fraction of variance in the CV scores from one variable set that can be accounted for by scores from the other, but say nothing about the extent to which the CVs themselves account for variation in the original variables. The variance in one set explainable by the other is instead expressed via the so-called redundancy index, which multiplies each squared canonical correlation by the canonical adequacy (within-set variance accounted for) of the corresponding CV; summing these products over all CVs gives the total redundancy. The use of the redundancy index in this way is sometimes called “(canonical) redundancy analysis”, although it is simply an alternative means of presenting CCA results.
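The redundancy computation can be sketched with base R as follows, using stats::cancor() in place of cca() and the redundancy index described above (squared canonical correlation times canonical adequacy). Two cross-checks are included: the per-CV redundancies should equal the column means of the squared cross-loadings, and the total redundancy of y given x should equal the average R^2 from regressing each y variable on all of the x variables. The variable names mirror elements of the cca object, but they are computed independently here and are not taken from cca().

```r
# Redundancy analysis sketched in base R (stats::cancor() stands in for cca()).
pop <- as.matrix(LifeCycleSavings[, 2:3])    # x set
oec <- as.matrix(LifeCycleSavings[, -(2:3)]) # y set
cc  <- cancor(pop, oec)
k   <- length(cc$cor)
xscores <- scale(pop, scale = FALSE) %*% cc$xcoef[, 1:k, drop = FALSE]
yscores <- scale(oec, scale = FALSE) %*% cc$ycoef[, 1:k, drop = FALSE]

ystructcorr <- cor(oec, yscores)             # y loadings on the y-side CVs
ycanvad <- colMeans(ystructcorr^2)           # adequacy: share of y variance per CV
yvrd    <- ycanvad * cc$cor^2                # redundancy of y given x, per CV
yrd     <- sum(yvrd)                         # total redundancy of y given x

# Same per-CV redundancies from squared cross-loadings (y on the x-side CVs)
ycrosscorr <- cor(oec, xscores)
all.equal(colMeans(ycrosscorr^2), yvrd)

# Cross-check: mean R^2 from regressing each y variable on all x variables
r2 <- apply(oec, 2, function(yj) summary(lm(yj ~ pop))$r.squared)
all.equal(yrd, mean(r2))
```

The second check illustrates why the squared canonical correlations alone overstate shared variance: the first one here is far larger than the average R^2 that the redundancy index reproduces.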
As the name of the technique implies, CCA is a symmetric procedure: the designation of one variable set as x and the other as y is arbitrary, and may be reversed without incident. (Note, however, that the coefficients and redundancies are set-specific, and will also be reversed in this case.) CCA with a single x or y variable is equivalent to OLS regression of that variable on the other set (with the squared canonical correlation equal to the regression R^2), and CCA on a single pair of variables yields the absolute value of the familiar Pearson product-moment correlation. Centering and scaling data prior to analysis is equivalent to working with correlation matrices in the underlying analysis (with interpretation and effects analogous to the principal components case).
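The special cases above can be verified numerically. This sketch uses stats::cancor() as a stand-in for cca() and the LifeCycleSavings data from the Examples section; each all.equal() call should return TRUE.

```r
# Base R checks of the special cases above; stats::cancor() stands in for cca().
pop <- as.matrix(LifeCycleSavings[, 2:3])
oec <- as.matrix(LifeCycleSavings[, -(2:3)])
sr  <- LifeCycleSavings$sr

# Symmetry: swapping the x and y sets leaves the canonical correlations unchanged
cc_xy <- cancor(pop, oec)$cor
cc_yx <- cancor(oec, pop)$cor

# One y variable: squared canonical correlation equals the OLS R^2
r2_cca <- cancor(pop, sr)$cor^2
r2_ols <- summary(lm(sr ~ pop))$r.squared

# One pair of variables: canonical correlation equals |Pearson r|
r_cca  <- cancor(LifeCycleSavings$pop15, sr)$cor
r_pear <- abs(cor(LifeCycleSavings$pop15, sr))

# Scaling to unit variance changes the coefficients but not the correlations
cc_std <- cancor(scale(pop), scale(oec))$cor

c(all.equal(cc_xy, cc_yx), all.equal(r2_cca, r2_ols),
  all.equal(r_cca, r_pear), all.equal(cc_std, cc_xy))
```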
Value
An object of class cca, whose elements are as follows:
corr
Canonical correlations.
corrsq
Squared canonical correlations (shared variance across canonical variates).
xcoef
Coefficients for the x variables on each canonical variate.
ycoef
Coefficients for the y variables on each canonical variate.
canvarx
Canonical variate scores for the x variables.
canvary
Canonical variate scores for the y variables.
xstructcorr
Structural correlations (loadings) for x variables on each canonical variate.
ystructcorr
Structural correlations (loadings) for y variables on each canonical variate.
xstructcorrsq
Squared structural correlations for x variables on each canonical variate (i.e., fraction of x variance associated with each variate).
ystructcorrsq
Squared structural correlations for y variables on each canonical variate (i.e., fraction of y variance associated with each variate).
xcrosscorr
Canonical cross-loadings for x variables on the y scores for each canonical variate.
ycrosscorr
Canonical cross-loadings for y variables on the x scores for each canonical variate.
xcrosscorrsq
Squared canonical cross-loadings for x variables on the y scores for each canonical variate (i.e., the fraction of variance in each x variable attributable to y through the respective CVs).
ycrosscorrsq
Squared canonical cross-loadings for y variables on the x scores for each canonical variate (i.e., the fraction of variance in each y variable attributable to x through the respective CVs).
xcancom
Canonical communalities for x variables (for each x variable, the fraction of its variance associated with all canonical variates combined).
ycancom
Canonical communalities for y variables (for each y variable, the fraction of its variance associated with all canonical variates combined).
xcanvad
Canonical variate adequacies for x variables (for each canonical variate, the fraction of total x variance with which it is associated).
ycanvad
Canonical variate adequacies for y variables (for each canonical variate, the fraction of total y variance with which it is associated).
xvrd
Canonical redundancies for x variables (i.e., the fraction of x variance accounted for by the y variables through each canonical variate).
yvrd
Canonical redundancies for y variables (i.e., the fraction of y variance accounted for by the x variables through each canonical variate).
xrd
Total canonical redundancy for x variables (i.e., total fraction of x variance accounted for by y variables, through all canonical variates).
yrd
Total canonical redundancy for y variables (i.e., total fraction of y variance accounted for by x variables, through all canonical variates).
chisq
Sequential chi-squared values for tests of each respective canonical variate using Bartlett's omnibus statistic.
df
Degrees of freedom for Bartlett's test.
xlab
Variable names for x.
ylab
Variable names for y.
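The chisq and df elements are described only briefly above. The following sketch computes the textbook form of Bartlett's sequential chi-squared test (see, e.g., Mardia, Kent, and Bibby 1979) from the canonical correlations: for each k, it tests whether canonical correlations k through m are all zero. Whether cca() uses exactly this finite-sample multiplier is an assumption here, so compare the numbers with summary(cca.fit) before relying on them; stats::cancor() again stands in for cca().

```r
# Textbook Bartlett sequential test (assumed form; may differ in detail from cca()):
#   chisq_k = -(n - 1 - (p + q + 1)/2) * sum_{i >= k} log(1 - r_i^2)
#   df_k    = (p - k + 1) * (q - k + 1)
pop <- as.matrix(LifeCycleSavings[, 2:3])
oec <- as.matrix(LifeCycleSavings[, -(2:3)])
r <- cancor(pop, oec)$cor
n <- nrow(pop); p <- ncol(pop); q <- ncol(oec); m <- length(r)
k <- seq_len(m)

# For each k, H0: canonical correlations k, ..., m are all zero
chisq <- sapply(k, function(i)
  -(n - 1 - (p + q + 1) / 2) * sum(log(1 - r[i:m]^2)))
df <- (p - k + 1) * (q - k + 1)
pval <- pchisq(chisq, df, lower.tail = FALSE)
data.frame(cv = k, chisq = chisq, df = df, p.value = pval)
```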
Author(s)
Carter T. Butts <buttsc@uci.edu>
References
Mardia, K. V.; Kent, J. T.; and Bibby, J. M. 1979. Multivariate Analysis. London: Academic Press.
See Also
F.test.cca, cancor, princomp
Examples
# This example parallels the one for R's built-in cancor function
data(LifeCycleSavings)
pop <- LifeCycleSavings[, 2:3]
oec <- LifeCycleSavings[, -(2:3)]
cca.fit <- cca(pop, oec)
#View the results
cca.fit
summary(cca.fit)
plot(cca.fit)