entropy.empirical
R Documentation
Empirical Estimators of Entropy and Mutual Information and Related Quantities
Description
freqs.empirical computes the empirical frequencies from counts y.
entropy.empirical estimates the Shannon entropy H
of the random variable Y from the corresponding observed counts y
by plugging in the empirical frequencies.
KL.empirical computes the empirical Kullback-Leibler (KL) divergence
from counts y1 and y2.
chi2.empirical computes the empirical chi-squared statistic
from counts y1 and y2.
mi.empirical computes the empirical mutual information from a table of counts y2d.
chi2indep.empirical computes the empirical chi-squared statistic of independence
from a table of counts y2d.
The argument unit sets the unit in which entropy is measured.
The default is "nats" (natural units). To
compute entropy in "bits", set unit="log2".
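The unit only changes the base of the logarithm, so values in different units differ by a constant factor. The following is a minimal base-R sketch of that relationship. It does not call the package, and the variable names are illustrative only.

```r
# Base-R illustration only; does not call the entropy package.
# Four equally likely bins: the Shannon entropy is log(4) nats.
p <- rep(1 / 4, 4)

H_nats <- -sum(p * log(p))    # natural logarithm -> nats
H_bits <- -sum(p * log2(p))   # base-2 logarithm  -> bits

# Converting from nats to bits is a division by log(2)
H_bits_converted <- H_nats / log(2)

print(c(nats = H_nats, bits = H_bits, converted = H_bits_converted))
```

For a uniform distribution over four bins the entropy is exactly 2 bits, which makes the conversion easy to check by hand.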
Details
The empirical entropy estimator is a plug-in estimator:
in the definition of the Shannon entropy the
bin probabilities are replaced by the respective empirical frequencies.
The empirical entropy estimator is the maximum likelihood estimator.
If there are many zero counts and the sample size is small,
it is very inefficient and also strongly biased, tending to underestimate the true entropy.
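The plug-in construction described above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's implementation: the helper name plugin_entropy is hypothetical, and empty bins are handled by the usual convention that 0 * log(0) counts as 0.

```r
# Hypothetical helper, base R only; not the package's implementation.
plugin_entropy <- function(y) {
  freqs <- y / sum(y)          # empirical frequencies
  freqs <- freqs[freqs > 0]    # treat 0 * log(0) as 0
  -sum(freqs * log(freqs))     # Shannon entropy in nats
}

# counts with several empty bins
y <- c(4, 2, 3, 0, 2, 4, 0, 0, 2, 1, 1)
H_hat <- plugin_entropy(y)
print(H_hat)

# Empty bins contribute nothing, so the estimate can never exceed
# the log of the number of non-empty bins.
H_max_observed <- log(sum(y > 0))
print(H_max_observed)
```

The upper bound printed last gives some intuition for the bias: bins that exist but were never observed cannot raise the estimate, so with small samples the plug-in value tends to fall below the true entropy.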
Value
freqs.empirical returns the empirical frequencies.
entropy.empirical returns an estimate of the Shannon entropy.
KL.empirical returns an estimate of the KL divergence.
chi2.empirical returns the empirical chi-squared statistic.
mi.empirical returns an estimate of the mutual information.
chi2indep.empirical returns the empirical chi-squared statistic of independence.
Examples
# load entropy library
library("entropy")
# a single variable
# observed counts for each bin
y = c(4, 2, 3, 0, 2, 4, 0, 0, 2, 1, 1)
# empirical frequencies
freqs.empirical(y)
# empirical estimate of entropy
entropy.empirical(y)
# example with two variables
# observed counts for two random variables
y1 = c(4, 2, 3, 1, 10, 4)
y2 = c(2, 3, 7, 1, 4, 3)
# empirical Kullback-Leibler divergence
KL.empirical(y1, y2)
# half of the empirical chi-squared statistic
# (a second-order approximation of the KL divergence)
0.5*chi2.empirical(y1, y2)
## joint distribution example
# contingency table with counts for two discrete variables
y2d = rbind( c(1,2,3), c(6,5,4) )
# empirical estimate of mutual information
mi.empirical(y2d)
# half of the empirical chi-squared statistic of independence
# (a second-order approximation of the mutual information)
0.5*chi2indep.empirical(y2d)
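The quantities in the examples above can also be written out in base R, which may help when checking results by hand. The sketch below uses the standard textbook definitions on empirical frequencies. The helper names kl_plugin, chi2_plugin and mi_plugin are hypothetical, and the assumption that the package normalizes counts in the same way is not confirmed by this page.

```r
# Base-R sketch with hypothetical helper names; standard textbook
# definitions on empirical frequencies, assumed rather than taken
# from the package source.
kl_plugin <- function(y1, y2) {
  f1 <- y1 / sum(y1)
  f2 <- y2 / sum(y2)
  keep <- f1 > 0                       # 0 * log(0) is taken as 0
  sum(f1[keep] * log(f1[keep] / f2[keep]))
}

chi2_plugin <- function(y1, y2) {
  f1 <- y1 / sum(y1)
  f2 <- y2 / sum(y2)
  sum((f1 - f2)^2 / f2)
}

mi_plugin <- function(y2d) {
  f <- y2d / sum(y2d)                        # joint frequencies
  f_indep <- outer(rowSums(f), colSums(f))   # product of the marginals
  keep <- f > 0
  sum(f[keep] * log(f[keep] / f_indep[keep]))
}

y1 <- c(4, 2, 3, 1, 10, 4)
y2 <- c(2, 3, 7, 1, 4, 3)
y2d <- rbind(c(1, 2, 3), c(6, 5, 4))

kl <- kl_plugin(y1, y2)
half_chi2 <- 0.5 * chi2_plugin(y1, y2)
mi <- mi_plugin(y2d)
print(c(KL = kl, half_chi2 = half_chi2, MI = mi))
```

In this formulation the mutual information is the KL divergence between the joint frequencies and the product of their marginals, which is why it is zero for a table whose rows are proportional to each other.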