Computes the point-biserial correlation between a dichotomous and a continuous variable.
Usage
biserial.cor(x, y, use = c("all.obs", "complete.obs"), level = 1)
Arguments
x
a numeric vector representing the continuous variable.
y
a factor or a numeric vector (that will be converted to a factor) representing the dichotomous variable.
use
If use is "all.obs", then the presence of missing observations will produce an error. If use
is "complete.obs" then missing values are handled by casewise deletion.
level
which level of y (1 or 2) to treat as the first level in the formula given in Details.
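Casewise deletion, as selected by use = "complete.obs", means that a pair (x[i], y[i]) is dropped whenever either value is missing. The following base-R sketch illustrates that filtering step on made-up data; it shows the idea only and is not taken from the package source.

```r
x <- c(3.1, NA, 4.7, 5.0, 2.2)   # made-up continuous values
y <- c(0, 1, NA, 1, 0)           # made-up dichotomous values

keep <- complete.cases(x, y)     # TRUE where both x and y are observed
x_cc <- x[keep]                  # continuous values of the retained pairs
y_cc <- y[keep]                  # dichotomous values of the retained pairs
length(x_cc)                     # number of pairs left after deletion
```

Here the second and third pairs are removed because one member of each pair is NA.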
Details
The point-biserial correlation computed by biserial.cor() is defined as
(X1.bar - X0.bar) * sqrt(pi * (1 - pi)) / S_x,
where X1.bar and X0.bar denote the sample means of the X-values
corresponding to the first and second level of Y, respectively, S_x is the sample standard deviation of
X, and pi is the sample proportion of observations in the first level of Y (a proportion, not the
mathematical constant). The first level of Y is defined by the level argument; see Examples.
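The formula can be evaluated by hand in base R. The following is a minimal sketch, not the package's implementation: the helper name pb_manual and the toy data are made up for illustration, and S_x is assumed to be computed with sd(), that is, with denominator n - 1.

```r
# Hypothetical helper: evaluates the formula above directly.
pb_manual <- function(x, y, level = 1) {
    y <- as.factor(y)
    ind <- y == levels(y)[level]   # TRUE for observations in the first level of Y
    p <- mean(ind)                 # sample proportion of that level
    (mean(x[ind]) - mean(x[!ind])) * sqrt(p * (1 - p)) / sd(x)
}

# Toy data (made up)
x <- c(2, 4, 5, 7, 8, 10)
y <- c(0, 0, 0, 1, 1, 1)

r <- pb_manual(x, y)

# Under the sd() assumption, the formula equals the Pearson correlation
# between x and the level indicator, rescaled by sqrt((n - 1) / n).
n <- length(x)
all.equal(r, cor(x, as.numeric(y == 0)) * sqrt((n - 1) / n))
```

Under that assumption the value is slightly smaller in magnitude than what cor() returns for the same data, with the gap shrinking as n grows.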
Value
the (numeric) value of the point-biserial correlation.
Note
Changing the order of the levels of y changes the sign of the result: the two group means swap places,
while sqrt(pi * (1 - pi)) stays the same. By default, the first level is used as the reference level.
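The effect of the level choice can be checked directly from the formula in Details: swapping the levels swaps the two means and replaces pi by 1 - pi, so only the sign changes. A small base-R sketch, using a hypothetical helper (pb_formula) and made-up data rather than the package function:

```r
# Hypothetical helper evaluating the formula from Details (not the package code)
pb_formula <- function(x, y, level = 1) {
    y <- as.factor(y)
    ind <- y == levels(y)[level]   # observations in the chosen first level
    p <- mean(ind)                 # proportion of observations in that level
    (mean(x[ind]) - mean(x[!ind])) * sqrt(p * (1 - p)) / sd(x)
}

x <- c(12, 15, 11, 18, 20, 17, 14)   # made-up scores
y <- factor(c("no", "no", "no", "yes", "yes", "yes", "no"))

r1 <- pb_formula(x, y, level = 1)    # "no" is treated as the first level
r2 <- pb_formula(x, y, level = 2)    # "yes" is treated as the first level
all.equal(r1, -r2)                   # same magnitude, opposite sign
```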
Examples
library(ltm)  # provides biserial.cor() and the LSAT data
# the point-biserial correlation between
# the total score and the first item, using
# '0' as the reference level
biserial.cor(rowSums(LSAT), LSAT[[1]])
# and using '1' as the reference level
biserial.cor(rowSums(LSAT), LSAT[[1]], level = 2)