Last data update: 2014.03.03
R: Determine highly correlated variables
Description
This function searches through a correlation matrix and returns a vector of integers
corresponding to columns to remove to reduce pair-wise correlations.
Usage
FindCorr(x, cutoff = .90, verbose = FALSE)
Arguments
x
A correlation matrix
cutoff
A numeric value for the pair-wise absolute correlation cutoff
verbose
A logical value; if TRUE, the details of each comparison are printed
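The x argument is expected to be a square, symmetric correlation matrix. A minimal sketch of preparing such a matrix with base R is given here; the built-in iris data set and the names num and cm are illustrative assumptions, not part of this help page.

```r
# Keep only the numeric columns (iris also contains a factor column).
num <- iris[vapply(iris, is.numeric, logical(1))]

# Pairwise-complete correlations, as in the d.pizza example on this page.
cm <- cor(num, use = "pairwise.complete.obs")

# Basic sanity checks before passing cm on as the x argument.
stopifnot(isSymmetric(cm), nrow(cm) > 1)
```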
Details
The absolute values of pair-wise correlations are considered. If the absolute correlation between two
variables exceeds the cutoff, the function compares the mean absolute correlation of each variable and
flags the variable with the larger mean absolute correlation for removal.
There are several functions in the subselect package (leaps, genetic, anneal) that can also be used
to accomplish the same goal.
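The rule described above can be illustrated with a simplified greedy sketch in base R. The function find_corr_sketch is hypothetical and is not the DescTools implementation: it always handles the strongest remaining pair first, so on other inputs it may flag different columns, or in a different order, than FindCorr.

```r
# Hypothetical helper, not part of DescTools: a simplified greedy version
# of the rule "drop the member of a highly correlated pair that has the
# larger mean absolute correlation".
find_corr_sketch <- function(x, cutoff = 0.90) {
  stopifnot(is.matrix(x), nrow(x) == ncol(x), nrow(x) > 1)
  a <- abs(x)
  diag(a) <- NA                         # ignore self-correlations
  dropped <- integer(0)
  repeat {
    # stop once no remaining pair exceeds the cutoff
    if (all(is.na(a)) || max(a, na.rm = TRUE) <= cutoff) break
    # locate the most strongly correlated remaining pair
    idx <- which(a == max(a, na.rm = TRUE), arr.ind = TRUE)[1, ]
    # mean absolute correlation of each member with the remaining variables
    m <- rowMeans(a[idx, , drop = FALSE], na.rm = TRUE)
    worst <- idx[which.max(m)]
    dropped <- c(dropped, worst)
    a[worst, ] <- NA                    # remove the variable from later comparisons
    a[, worst] <- NA
  }
  unname(dropped)
}

corrMatrix <- diag(rep(1, 5))
corrMatrix[2, 3] <- corrMatrix[3, 2] <- .7
corrMatrix[5, 3] <- corrMatrix[3, 5] <- -.7
corrMatrix[4, 1] <- corrMatrix[1, 4] <- -.67
find_corr_sketch(corrMatrix, cutoff = .65)
```

On this small matrix the sketch flags columns 3 and 4; with a cutoff of .99 it returns an empty vector.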
Value
A vector of indices denoting the columns to remove. If no correlations meet the criteria, integer(0)
is returned.
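The empty result needs care when the indices are used for negative subsetting, because x[, -integer(0)] selects zero columns rather than all of them. A minimal sketch of a guard follows; drop_flagged is a hypothetical helper, not a DescTools function.

```r
# Hypothetical helper: drop the flagged columns, returning the data
# unchanged when nothing was flagged.
drop_flagged <- function(data, flagged) {
  if (length(flagged) == 0) return(data)
  data[, -flagged, drop = FALSE]
}

m <- diag(3)
colnames(m) <- c("a", "b", "c")

drop_flagged(m, 2L)            # keeps columns a and c
drop_flagged(m, integer(0))    # keeps all three columns
m[, -integer(0), drop = FALSE] # the pitfall: a matrix with zero columns
```

With DescTools loaded, the same helper could be applied to a numeric data frame d as drop_flagged(d, FindCorr(cor(d))).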
Author(s)
Original R code by Dong Li, modified by Max Kuhn
References
Max Kuhn. Contributions from Jed Wing, Steve Weston, Andre Williams, Chris Keefer,
Allan Engelhardt, Tony Cooper, Zachary Mayer and the R Core Team (2014). caret:
Classification and Regression Training. R package version 6.0-35.
http://CRAN.R-project.org/package=caret
See Also
leaps, genetic, anneal
Examples
corrMatrix <- diag(rep(1, 5))
corrMatrix[2, 3] <- corrMatrix[3, 2] <- .7
corrMatrix[5, 3] <- corrMatrix[3, 5] <- -.7
corrMatrix[4, 1] <- corrMatrix[1, 4] <- -.67
corrDF <- expand.grid(row = 1:5, col = 1:5)
corrDF$correlation <- as.vector(corrMatrix)
PlotCorr(xtabs(correlation ~ ., corrDF), las=1, border="grey")
FindCorr(corrMatrix, cutoff = .65, verbose = TRUE)
FindCorr(corrMatrix, cutoff = .99, verbose = TRUE)
# d.pizza example
m <- cor(data.frame(lapply(d.pizza, as.numeric)), use="pairwise.complete.obs")
FindCorr(m, verbose = TRUE)
m[, FindCorr(m)]
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
> library(DescTools)
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/DescTools/FindCorr.Rd_%03d_medium.png", width=480, height=480)
> ### Name: FindCorr
> ### Title: Determine highly correlated variables
> ### Aliases: FindCorr
> ### Keywords: manip
>
> ### ** Examples
>
> corrMatrix <- diag(rep(1, 5))
> corrMatrix[2, 3] <- corrMatrix[3, 2] <- .7
> corrMatrix[5, 3] <- corrMatrix[3, 5] <- -.7
> corrMatrix[4, 1] <- corrMatrix[1, 4] <- -.67
>
> corrDF <- expand.grid(row = 1:5, col = 1:5)
> corrDF$correlation <- as.vector(corrMatrix)
> PlotCorr(xtabs(correlation ~ ., corrDF), las=1, border="grey")
>
> FindCorr(corrMatrix, cutoff = .65, verbose = TRUE)
Considering row 3 column 2 value 0.7
Flagging column 3
Considering row 2 column 5 value 0
Considering row 2 column 1 value 0
Considering row 2 column 4 value 0
Considering row 5 column 1 value 0
Considering row 5 column 4 value 0
Considering row 1 column 4 value 0.67
Flagging column 4
[1] 3 4
>
> FindCorr(corrMatrix, cutoff = .99, verbose = TRUE)
Considering row 3 column 2 value 0.7
Considering row 3 column 5 value 0.7
Considering row 3 column 1 value 0
Considering row 3 column 4 value 0
Considering row 2 column 5 value 0
Considering row 2 column 1 value 0
Considering row 2 column 4 value 0
Considering row 5 column 1 value 0
Considering row 5 column 4 value 0
Considering row 1 column 4 value 0.67
integer(0)
>
> # d.pizza example
> m <- cor(data.frame(lapply(d.pizza, as.numeric)), use="pairwise.complete.obs")
> FindCorr(m, verbose = TRUE)
Considering row 8 column 3 value 0.018
Considering row 8 column 2 value 0.028
Considering row 8 column 1 value 0.03
Considering row 8 column 12 value 0.019
Considering row 8 column 16 value 0.076
Considering row 8 column 5 value 0.152
Considering row 8 column 11 value 0.095
Considering row 8 column 13 value 0.51
Considering row 8 column 14 value 0.478
Considering row 8 column 6 value 0.807
Considering row 8 column 7 value 0.543
Considering row 8 column 9 value 0.076
Considering row 8 column 4 value 0.042
Considering row 8 column 10 value 0.038
Considering row 8 column 15 value 0.033
Considering row 3 column 2 value 0.976
Flagging column 3
Considering row 2 column 1 value 0.999
Flagging column 2
Considering row 1 column 12 value 0.067
Considering row 1 column 16 value 0.072
Considering row 1 column 5 value 0.119
Considering row 1 column 11 value 0.056
Considering row 1 column 13 value 0.031
Considering row 1 column 14 value 0.017
Considering row 1 column 6 value 0.009
Considering row 1 column 7 value 0.01
Considering row 1 column 9 value 0.14
Considering row 1 column 4 value 0.038
Considering row 1 column 10 value 0.063
Considering row 1 column 15 value 0.015
Considering row 12 column 16 value 0.707
Considering row 12 column 5 value 0.292
Considering row 12 column 11 value 0.575
Considering row 12 column 13 value 0.05
Considering row 12 column 14 value 0.067
Considering row 12 column 6 value 0.043
Considering row 12 column 7 value 0.109
Considering row 12 column 9 value 0.072
Considering row 12 column 4 value 0.105
Considering row 12 column 10 value 0.003
Considering row 12 column 15 value 0.035
Considering row 16 column 5 value 0.227
Considering row 16 column 11 value 0.355
Considering row 16 column 13 value 0.077
Considering row 16 column 14 value 0.114
Considering row 16 column 6 value 0.008
Considering row 16 column 7 value 0.059
Considering row 16 column 9 value 0.248
Considering row 16 column 4 value 0.102
Considering row 16 column 10 value 0.045
Considering row 16 column 15 value 0.007
Considering row 5 column 11 value 0.478
Considering row 5 column 13 value 0.14
Considering row 5 column 14 value 0.12
Considering row 5 column 6 value 0.052
Considering row 5 column 7 value 0.013
Considering row 5 column 9 value 0.085
Considering row 5 column 4 value 0.111
Considering row 5 column 10 value 0.047
Considering row 5 column 15 value 0.01
Considering row 11 column 13 value 0.076
Considering row 11 column 14 value 0.082
Considering row 11 column 6 value 0.037
Considering row 11 column 7 value 0.014
Considering row 11 column 9 value 0.08
Considering row 11 column 4 value 0.046
Considering row 11 column 10 value 0.015
Considering row 11 column 15 value 0.011
Considering row 13 column 14 value 0.923
Flagging column 13
Considering row 14 column 6 value 0.013
Considering row 14 column 7 value 0.009
Considering row 14 column 9 value 0.042
Considering row 14 column 4 value 0.016
Considering row 14 column 10 value 0.022
Considering row 14 column 15 value 0.021
Considering row 6 column 7 value 0.744
Considering row 6 column 9 value 0.037
Considering row 6 column 4 value 0.023
Considering row 6 column 10 value 0.006
Considering row 6 column 15 value 0.041
Considering row 7 column 9 value 0.034
Considering row 7 column 4 value 0.139
Considering row 7 column 10 value 0.032
Considering row 7 column 15 value 0.006
Considering row 9 column 4 value 0.252
Considering row 9 column 10 value 0.168
Considering row 9 column 15 value 0.005
Considering row 4 column 10 value 0.127
Considering row 4 column 15 value 0.011
Considering row 10 column 15 value 0.012
[1] 3 2 13
> m[, FindCorr(m)]
                       week         date wine_ordered
index           0.974192573  0.999028828  0.030600322
date            0.976198358  1.000000000  0.036036580
week            1.000000000  0.976198358  0.032014141
weekday        -0.258535360 -0.042875700  0.013005712
area            0.091975225  0.120181706  0.140393613
count           0.010740354  0.005943085 -0.022125402
rabate          0.020039206 -0.010547837  0.013302530
price           0.018137168  0.028162980  0.509676944
operator        0.070854797  0.129699307  0.038239080
driver         -0.037057460 -0.066844465 -0.003367271
delivery_min    0.054344284  0.066614544  0.076473132
temperature     0.088823380  0.068222467 -0.049858606
wine_ordered    0.032014141  0.036036580  1.000000000
wine_delivered  0.016697141  0.020756359  0.922727399
wrongpizza      0.005659736  0.008217897  0.001967424
quality         0.099462346  0.080014155 -0.076622011
> dev.off()
null device
1
>