This function calculates mutual information (MI) and bias corrected mutual
information (BCMI) between a set of discrete variables held as columns in a
matrix. BCMI is obtained by jackknife bias correction, and a z-score is
provided for the hypothesis of no association. Also included are the *.pw
functions, which calculate MI between two vectors only. The *njk functions
do not perform the jackknife and are therefore faster.
Arguments
dmat
The data matrix. Each row is an observation and each column is a
variable of interest. Should contain categorical data; data of any type
will be coerced via factors to integers.
disc1
A vector for the pairwise version
disc2
A vector for the pairwise version
Details
The results of dmi() are in many ways similar to a correlation matrix,
with each row and column index corresponding to a given variable.
dminjk() and dminjk.pw() return only the MI values, without performing the
jackknife. The number of processor cores used can be changed by
setting the environment variable "OMP_NUM_THREADS" before starting R.
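Because the variable has to be set before R starts (for example in the
shell that launches R), it can be useful to confirm from inside the session
what value was inherited. A minimal sketch using only base R; the object
name omp_threads is illustrative and not part of the package:

```r
# Read the value this R session inherited; "" means it was not set.
omp_threads <- Sys.getenv("OMP_NUM_THREADS", unset = "")

if (nzchar(omp_threads)) {
  message("OMP_NUM_THREADS = ", omp_threads)
} else {
  message("OMP_NUM_THREADS is not set")
}
```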
Value
Returns a list of three matrices, each of size ncol(dmat) by
ncol(dmat):
mi
The raw MI estimates.
bcmi
Jackknife bias corrected MI estimates (BCMI). Each is the raw MI value
minus the corresponding jackknife estimate of bias.
zvalues
Z-scores for the hypotheses that each corresponding
bcmi value is zero. These have poor statistical properties but can be useful
as a rough measure of the strength of association.
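To illustrate how the three quantities relate, the following base R sketch
computes a plug-in MI estimate for two discrete vectors and applies the
textbook delete-one jackknife to it. It is not the package's implementation
and makes several assumptions: the helper names plugin_mi() and
jackknife_mi() are hypothetical, MI is measured with natural logarithms, and
the z-score is taken to be the corrected estimate divided by the jackknife
standard error. Values returned by dmi.pw() may differ.

```r
# Plug-in MI estimate (natural log assumed) for two discrete vectors.
plugin_mi <- function(x, y) {
  joint <- table(x, y) / length(x)                # joint relative frequencies
  indep <- outer(rowSums(joint), colSums(joint))  # product of the marginals
  nz <- joint > 0                                 # treat 0 * log(0) as 0
  sum(joint[nz] * log(joint[nz] / indep[nz]))
}

# Textbook delete-one jackknife: bias estimate, corrected value, z-score.
jackknife_mi <- function(x, y) {
  n <- length(x)
  full <- plugin_mi(x, y)
  # Recompute the estimate with each observation left out in turn.
  loo <- vapply(seq_len(n), function(i) plugin_mi(x[-i], y[-i]), numeric(1))
  bias <- (n - 1) * (mean(loo) - full)
  se <- sqrt((n - 1) / n * sum((loo - mean(loo))^2))
  bc <- full - bias
  c(mi = full, bcmi = bc, z = bc / se)
}

# Same discretisation as in the Examples section.
x <- cut(cars$speed, breaks = 10)
y <- cut(cars$dist, breaks = 10)
res <- jackknife_mi(x, y)
res
```

The loop recomputes the estimate once per observation, which is why the
variants that skip the jackknife are faster.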
Examples
data(cars)
# Discretise the data first
d <- cut(cars$dist, breaks = 10)
s <- cut(cars$speed, breaks = 10)
# Discrete MI values
dmi.pw(s, d)
# For comparison, analysed as continuous data:
cmi.pw(cars$dist, cars$speed)
# Exploring a group of categorical variables
dat <- mtcars[, c("cyl","vs","am","gear","carb")]
discresults <- dmi(dat)
discresults
# Plot the relative magnitude of the BCMI values
diag(discresults$bcmi) <- NA
mp(discresults$bcmi)