Last data update: 2014.03.03

R: Correlation matrices
correlation    R Documentation

Correlation matrices

Description

Compute the correlation matrix between two or more variables (for instance, between all columns of a matrix or data frame).

Usage

correlation(x, ...)
## S3 method for class 'formula'
correlation(formula, data = NULL, subset, na.action, ...)
## Default S3 method:
correlation(x, y = NULL, use = "everything",
    method = c("pearson", "kendall", "spearman"), ...)

is.correlation(x)
as.correlation(x)

## S3 method for class 'correlation'
print(x, digits = 3, cutoff = 0, ...)
## S3 method for class 'correlation'
summary(object, cutpoints = c(0.3, 0.6, 0.8, 0.9, 0.95),
    symbols = c(" ", ".", ",", "+", "*", "B"), ...)
## S3 method for class 'summary.correlation'
print(x, ...)
## S3 method for class 'correlation'
plot(x, y = NULL, outline = TRUE,
    cutpoints = c(0.3, 0.6, 0.8, 0.9, 0.95), palette = rwb.colors, col = NULL,
    numbers = TRUE, digits = 2, type = c("full", "lower", "upper"),
    diag = (type == "full"), cex.lab = par("cex.lab"), cex = 0.75 * par("cex"),
    ...)

Arguments

x

a numeric vector, matrix or data frame (or any object for is.correlation() and as.correlation()).

formula

a formula with no response variable, referring only to numeric variables.

data

an optional data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).

subset

an optional vector used to select rows (observations) of the data matrix x.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The 'factory-fresh' default is na.omit.

method

a character string indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman"; the name can be abbreviated.

y

NULL (default), or a vector, matrix or data frame with dimensions compatible with x for correlation(). The default is equivalent to y = x, but more efficient. For plot.correlation(), a second 'correlation' object provided in y is intended to produce a visual comparison of the two correlation matrices (not implemented yet).

use

an optional character string giving a method for computing correlations in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs".

digits

the number of digits to print after the decimal separator.

cutoff

correlation coefficients lower than this (in absolute value) are suppressed.

object

a 'correlation' object.

cutpoints

the cut points to use for categories. Specify only positive values: the absolute values of the correlation coefficients are summarized, and the negative equivalents are computed automatically for the graph. Do not include 0 or 1 in the cutpoints.

symbols

the symbols to use to summarize the correlation matrix.

outline

do we draw the outline of the ellipse?

palette

a function that can produce a palette of colors.

col

color of the ellipse. If NULL (default), the colors will be computed using cutpoints and palette.

numbers

do we print correlation values in the center of the ellipses?

type

do we plot a complete matrix, or only lower or upper triangle?

diag

do we plot items on the diagonal? They always have a correlation of one.

cex.lab

the expansion factor for labels.

cex

the expansion factor for text.

...

further arguments passed to functions.
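The effect of the use argument is easiest to see on data with a few missing values. The following is a minimal sketch that relies only on base R's cor(), on the assumption that correlation() hands use over to it unchanged; the data frame d and the result names are made up for illustration.

```r
# Small data set with one missing value in each of two columns
d <- data.frame(a = c(1, 2, 3, 4, NA),
                b = c(2, 4, 5, 4, 5),
                c = c(5, NA, 3, 2, 1))

# "everything" (the default): a coefficient is NA as soon as one of its
# contributing observations is NA
r_all <- cor(d, use = "everything")

# "complete.obs": rows containing any NA are dropped first (rows 2 and 5 here)
r_complete <- cor(d, use = "complete.obs")

# "pairwise.complete.obs": each coefficient uses every row that is
# complete for that particular pair of variables
r_pairwise <- cor(d, use = "pairwise.complete.obs")

r_all
r_complete
r_pairwise
```

With "pairwise.complete.obs", different cells of the matrix may be based on different subsets of rows, which is worth keeping in mind when comparing coefficients.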

Value

correlation() and as.correlation() create a 'correlation' object, while is.correlation() tests for it.
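To illustrate what creating and testing such an S3 object can involve, here is a rough sketch. It is not the SciViews source code: the two helper names ending in _sketch are hypothetical, and the check for a square numeric matrix is an assumption made for the example.

```r
# Hypothetical helpers for illustration only, not the SciViews implementation

# Tag a square numeric matrix with the 'correlation' class
as_correlation_sketch <- function(x) {
  x <- as.matrix(x)
  if (!is.numeric(x) || nrow(x) != ncol(x))
    stop("x must be a square numeric matrix")
  class(x) <- c("correlation", class(x))
  x
}

# Test for the class with inherits(), the usual S3 idiom
is_correlation_sketch <- function(x) inherits(x, "correlation")

m <- as_correlation_sketch(cor(longley))
is_correlation_sketch(m)             # tagged object
is_correlation_sketch(cor(longley))  # plain matrix returned by cor()
```

Because the class is only an extra attribute, the object can still be indexed like an ordinary matrix.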

The 'correlation' object has print() and summary() methods. They differ in that summary() encodes the correlations symbolically, using symnum, which makes large correlation matrices more readable.
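The symbolic encoding can be reproduced directly with symnum() from base R. The sketch below reuses the default cutpoints and symbols listed under Usage; calling symnum() this way is an assumption about what summary() does internally, and the names r and sym are chosen for the example.

```r
# Plain correlation matrix from base R
r <- cor(longley)

# Encode absolute correlations with one symbol per interval.
# With corr = TRUE, symnum() adds 0 and 1 to the cutpoints itself,
# which is why they must not be listed.
sym <- symnum(r,
              cutpoints = c(0.3, 0.6, 0.8, 0.9, 0.95),
              symbols = c(" ", ".", ",", "+", "*", "B"),
              corr = TRUE)
sym
```

The printed result shows only the lower triangle, with abbreviated column names and a legend attribute giving the interval behind each symbol.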

The plot() method returns nothing; it draws ellipses on a graph to represent the correlation matrix visually. It is essentially the plotcorr() function from the ellipse package, with slightly different default arguments and with default cutpoints equivalent to those used in the summary() method.

Author(s)

Philippe Grosjean <phgrosjean@sciviews.org>, wrapping code in package ellipse, function plotcorr() for the plot.correlation() method.

See Also

cov, cov2cor, cov.wt, symnum, plotcorr; see also panel.cor

Examples

## This is a simple correlation coefficient
cor(rnorm(10), runif(10))
## but this is a 'correlation' object containing a correlation matrix
correlation(rnorm(10), runif(10))

## 'correlation' objects allow better inspection of the correlation matrices
## than the output of default R cor() function
(longley.cor <- correlation(longley))
summary(longley.cor) # Synthetic view of the correlation matrix
plot(longley.cor)    # Graphical representation

## Use of the formula interface
(mtcars.cor <- correlation(~ mpg + cyl + disp + hp, data = mtcars,
    method = "spearman", na.action = "na.omit"))

mtcars.cor2 <- correlation(mtcars, method = "spearman")
print(mtcars.cor2, cutoff = 0.6)
summary(mtcars.cor2)
plot(mtcars.cor2, type = "lower")

mtcars.cor2["mpg", "cyl"] # Extract one correlation from the correlation matrix
## TODO: a plot comparing two correlation matrices
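The formula interface shown above can be approximated with base R alone, which may help when checking results. This is a sketch under the assumption that the formula method builds a model frame and then correlates its columns; mf, r_formula and r_direct are names chosen for the example.

```r
# Build the model frame the one-sided formula refers to,
# dropping incomplete rows with na.omit
mf <- model.frame(~ mpg + cyl + disp + hp, data = mtcars, na.action = na.omit)

# Correlate the columns of that frame
r_formula <- cor(mf, method = "spearman")

# Same coefficients as selecting the columns directly
r_direct <- cor(mtcars[, c("mpg", "cyl", "disp", "hp")], method = "spearman")
all.equal(r_formula, r_direct, check.attributes = FALSE)

# Single coefficient, comparable with mtcars.cor2["mpg", "cyl"]
r_formula["mpg", "cyl"]
```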

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(SciViews)
Loading required package: MASS
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/SciViews/correlation.Rd_%03d_medium.png", width=480, height=480)
> ### Name: correlation
> ### Title: Correlation matrices
> ### Aliases: correlation correlation.formula correlation.default
> ###   is.correlation as.correlation print.correlation summary.correlation
> ###   print.summary.correlation plot.correlation
> ### Keywords: distribution
> 
> ### ** Examples
> 
> ## This is a simple correlation coefficient
> cor(rnorm(10), runif(10))
[1] 0.719619
> ## but this is a 'correlation' object containing a correlation matrix
> correlation(rnorm(10), runif(10))
Matrix of Pearson's product-moment correlation:
(calculation uses everything)
  x     y    
x 1.000 0.244
y 0.244 1.000
> 
> ## 'correlation' objects allow better inspection of the correlation matrices
> ## than the output of default R cor() function
> (longley.cor <- correlation(longley))
Matrix of Pearson's product-moment correlation:
(calculation uses everything)
             GNP.deflator GNP    Unemployed Armed.Forces Population Year  
GNP.deflator  1.000        0.992  0.621      0.465        0.979      0.991
GNP           0.992        1.000  0.604      0.446        0.991      0.995
Unemployed    0.621        0.604  1.000     -0.177        0.687      0.668
Armed.Forces  0.465        0.446 -0.177      1.000        0.364      0.417
Population    0.979        0.991  0.687      0.364        1.000      0.994
Year          0.991        0.995  0.668      0.417        0.994      1.000
Employed      0.971        0.984  0.502      0.457        0.960      0.971
             Employed
GNP.deflator  0.971  
GNP           0.984  
Unemployed    0.502  
Armed.Forces  0.457  
Population    0.960  
Year          0.971  
Employed      1.000  
> summary(longley.cor) # Synthetic view of the correlation matrix
Matrix of Pearson's product-moment correlation:
(calculation uses everything)
             GNP. GNP U A P Y E
GNP.deflator 1                 
GNP          B    1            
Unemployed   ,    ,   1        
Armed.Forces .    .     1      
Population   B    B   , . 1    
Year         B    B   , . B 1  
Employed     B    B   . . B B 1
attr(,"legend")
[1] 0 ' ' 0.3 '.' 0.6 ',' 0.8 '+' 0.9 '*' 0.95 'B' 1
> plot(longley.cor)    # Graphical representation
> 
> ## Use of the formula interface
> (mtcars.cor <- correlation(~ mpg + cyl + disp + hp, data = mtcars,
+     method = "spearman", na.action = "na.omit"))
Matrix of Spearman's rank correlation rho:
(missing values are managed with na.omit)
     mpg    cyl    disp   hp    
mpg   1.000 -0.911 -0.909 -0.895
cyl  -0.911  1.000  0.928  0.902
disp -0.909  0.928  1.000  0.851
hp   -0.895  0.902  0.851  1.000
> 
> mtcars.cor2 <- correlation(mtcars, method = "spearman")
> print(mtcars.cor2, cutoff = 0.6)
Matrix of Spearman's rank correlation rho:
(calculation uses everything)
     mpg    cyl    disp   hp     drat   wt     qsec   vs     am     gear  
mpg   1.000 -0.911 -0.909 -0.895  0.651 -0.886         0.707              
cyl  -0.911  1.000  0.928  0.902 -0.679  0.858        -0.814              
disp -0.909  0.928  1.000  0.851 -0.684  0.898        -0.724 -0.624       
hp   -0.895  0.902  0.851  1.000         0.775 -0.667 -0.752              
drat  0.651 -0.679 -0.684         1.000 -0.750                0.687  0.745
wt   -0.886  0.858  0.898  0.775 -0.750  1.000               -0.738 -0.676
qsec                      -0.667                1.000  0.792              
vs    0.707 -0.814 -0.724 -0.752                0.792  1.000              
am                 -0.624         0.687 -0.738                1.000  0.808
gear                              0.745 -0.676                0.808  1.000
carb -0.657                0.733               -0.659 -0.634              
     carb  
mpg  -0.657
cyl        
disp       
hp    0.733
drat       
wt         
qsec -0.659
vs   -0.634
am         
gear       
carb  1.000
> summary(mtcars.cor2)
Matrix of Spearman's rank correlation rho:
(calculation uses everything)
     m cy ds h dr w q v a g cr
mpg  1                        
cyl  * 1                      
disp * *  1                   
hp   + *  +  1                
drat , ,  ,  . 1              
wt   + +  +  , ,  1           
qsec . .  .  ,      1         
vs   , +  ,  , .  . , 1       
am   . .  ,  . ,  ,     1     
gear . .  .  . ,  ,     + 1   
carb , .  .  ,    . , ,     1 
attr(,"legend")
[1] 0 ' ' 0.3 '.' 0.6 ',' 0.8 '+' 0.9 '*' 0.95 'B' 1
> plot(mtcars.cor2, type = "lower")
> 
> mtcars.cor2["mpg", "cyl"] # Extract one correlation from the correlation matrix
[1] -0.9108013
> ## TODO: a plot comparing two correlation matrices
> 
> dev.off()
null device 
          1 
>