A numeric vector containing the values whose weighted median is
to be computed.
w
A numeric vector of weights, the same length as x, giving the weight
of each element of x. Negative weights are treated as zero weights.
By default, all values are given equal weight.
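As a minimal sketch of the weight rules above, the helper below fills in equal weights when w is missing and clamps negative weights to zero. The name prepare_weights() is hypothetical and not part of the package; it only mirrors the documented behaviour.

```r
# prepare_weights() is a hypothetical helper, not the package's internal code.
prepare_weights <- function(x, w = NULL) {
  if (is.null(w)) {
    w <- rep(1, length(x))           # missing w: equal weight for every value
  }
  stopifnot(length(w) == length(x))  # one weight per element of x
  pmax(w, 0)                         # negative weights behave like zero weights
}

prepare_weights(c(3, 1, 2))                 # 1 1 1
prepare_weights(c(3, 1, 2), c(2, -1, 0.5))  # 2.0 0.0 0.5
```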
idxs
A vector indicating the subset of elements
to operate over. If NULL, no subsetting is done.
na.rm
A logical value indicating whether NA values in
x should be removed before the computation proceeds.
If NA, no check for NAs is done at all.
The default value is NA (for efficiency).
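The three na.rm modes can be sketched as a small wrapper. This is an illustration under assumptions: with_na_policy() and its fun argument are hypothetical, fun stands in for any weighted statistic, and returning NA_real_ for na.rm = FALSE follows the usual R convention rather than a detail stated on this page.

```r
# with_na_policy() is hypothetical; 'fun' stands in for any weighted statistic.
with_na_policy <- function(x, w, na.rm = NA, fun) {
  if (isTRUE(na.rm)) {
    keep <- !is.na(x)                # TRUE: drop NA values and their weights
    x <- x[keep]
    w <- w[keep]
  } else if (isFALSE(na.rm) && anyNA(x)) {
    return(NA_real_)                 # FALSE: assumed to give NA when NAs exist
  }
  # na.rm = NA: no check at all, so any NA flows straight into 'fun'
  fun(x, w)
}

weighted_sum <- function(x, w) sum(x * w)
with_na_policy(c(1, NA, 3), c(1, 1, 1), na.rm = TRUE,  fun = weighted_sum)  # 4
with_na_policy(c(1, NA, 3), c(1, 1, 1), na.rm = FALSE, fun = weighted_sum)  # NA
with_na_policy(c(1, NA, 3), c(1, 1, 1), na.rm = NA,    fun = weighted_sum)  # NA
```

In this sketch the saving of na.rm = NA is the skipped anyNA() scan over x; the price is that NAs, if present, silently reach the computation.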
interpolate
If TRUE, linear interpolation is used to get a
consistent estimate of the weighted median.
ties
If interpolate == FALSE,
a character string specifying how to resolve ties between two
values of x that both satisfy the weighted median criterion.
Note that at most two values can satisfy the criterion.
When ties is "min", the smaller of the two values
is returned, and when it is "max", the larger value is
returned.
If ties is "mean", the mean of the two values is
returned.
Finally, if ties is "weighted" (or NULL), a
weighted average of the two values is returned, where the weights are
the total weights of all values x[i] <= x[k] and x[i] >= x[k],
respectively.
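The four ties rules can be written out directly. The sketch below uses a hypothetical helper, resolve_tie(), that takes the two tied candidates lo < hi and applies one rule; it follows a literal reading of the description above (NULL treated as "weighted") and is not the package's implementation.

```r
# resolve_tie() is hypothetical; lo and hi are the two values that both
# satisfy the weighted median criterion, with lo < hi.
resolve_tie <- function(x, w, lo, hi, ties = "weighted") {
  if (is.null(ties)) ties <- "weighted"   # NULL behaves like "weighted"
  switch(ties,
    min  = lo,
    max  = hi,
    mean = (lo + hi) / 2,
    weighted = {
      w_lo <- sum(w[x <= lo])             # total weight at or below lo
      w_hi <- sum(w[x >= hi])             # total weight at or above hi
      (w_lo * lo + w_hi * hi) / (w_lo + w_hi)
    },
    stop("unknown 'ties' value: ", ties)
  )
}

x <- 1:10
w <- rep(1, 10)
resolve_tie(x, w, 5, 6, ties = "min")       # 5
resolve_tie(x, w, 5, 6, ties = "max")       # 6
resolve_tie(x, w, 5, 6, ties = "mean")      # 5.5
resolve_tie(x, w, 5, 6, ties = "weighted")  # 5.5
```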
...
Not used.
Details
For the n elements x = c(x[1], x[2], ..., x[n]) with positive
weights w = c(w[1], w[2], ..., w[n]) such that sum(w) = S,
the weighted median is defined as the element x[k] for which
the total weight of all elements x[i] < x[k] is less than or equal to
S/2 and the total weight of all elements x[i] > x[k]
is less than or equal to S/2 (cf. [1]).
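The definition can be checked by brute force. The hypothetical helper median_candidates() below tests every element against both weight conditions; it is O(n^2) and meant only to illustrate the criterion, not as an efficient algorithm.

```r
# median_candidates() is a hypothetical brute-force check of the definition:
# keep each x[k] whose strictly smaller and strictly larger elements each
# carry a total weight <= S/2.
median_candidates <- function(x, w) {
  S <- sum(w)
  ok <- vapply(x, function(xk) {
    sum(w[x < xk]) <= S / 2 && sum(w[x > xk]) <= S / 2
  }, logical(1))
  sort(unique(x[ok]))
}

x <- 1:10
median_candidates(x, rep(1, 10))         # 5 6  (a tie)
median_candidates(x, c(5, rep(1, 9)))    # 3 4  (a tie)
median_candidates(x, c(8.5, rep(1, 9)))  # 2    (unique)
```

With positive weights at most two candidates remain, which is where the ties argument comes in; the tied pairs above are consistent with the 5.5 and 3.5 results in the Examples.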
If w is missing, all elements of x are given the same
positive weight. If all weights are zero, NA_real_ is returned.
If one or more weights are Inf, the elements with infinite weight are
treated as having equal weight and all other elements as having zero
weight. This simplifies cases where the weights are the result of a
division by zero.
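A minimal sketch of the Inf rule, assuming it is applied as a preprocessing step on the weights: every infinite weight becomes 1 and every finite weight becomes 0. handle_inf_weights() is a hypothetical name, and only +Inf is treated this way here, since -Inf is negative and negative weights count as zero.

```r
# handle_inf_weights() is hypothetical: if any weight is +Inf, the +Inf
# entries share equal weight and all other entries get zero weight.
handle_inf_weights <- function(w) {
  is_inf <- is.infinite(w) & w > 0
  if (any(is_inf)) {
    w <- ifelse(is_inf, 1, 0)
  }
  w
}

handle_inf_weights(c(1, Inf, 2, Inf))  # 0 1 0 1
handle_inf_weights(1 / c(0, 0.5, 2))   # 1 0 0  (1/0 gives Inf)
handle_inf_weights(c(1, 2, 3))         # unchanged: 1 2 3
```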
The weighted median solves the following optimization problem:
$$\alpha^* = \arg\min_{\alpha} \sum_{k=1}^{K} w_k \, |x_k - \alpha|$$
where x = (x_1, x_2, …, x_K) are scalars and
w = (w_1, w_2, …, w_K) are the corresponding weights for
each individual x value.
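The optimization view can be checked numerically. The objective is piecewise linear in alpha with breakpoints at the x_k, so (for positive total weight) a minimizer is attained at one of the x_k, and evaluating it there is enough for a small sanity check. objective() is a hypothetical helper written for this illustration.

```r
# objective() evaluates sum_k w_k * |x_k - alpha| for one candidate alpha.
objective <- function(alpha, x, w) sum(w * abs(x - alpha))

x <- 1:10

# Unique minimizer: heavy weight on the first value
w <- c(8.5, rep(1, 9))
vals <- vapply(x, objective, numeric(1), x = x, w = w)
x[which.min(vals)]      # 2

# Flat minimum: every alpha in [3, 4] minimizes, so both endpoints tie
w2 <- c(5, rep(1, 9))
vals2 <- vapply(x, objective, numeric(1), x = x, w = w2)
x[vals2 == min(vals2)]  # 3 4
```

The flat stretch between 3 and 4 is exactly the tie that the ties and interpolate arguments have to resolve.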
Value
Returns a numeric scalar.
Author(s)
Henrik Bengtsson and Ola Hossjer, Centre for Mathematical
Sciences, Lund University.
Thanks to Roger Koenker, Econometrics, University of Illinois, for
the initial ideas.
References
[1] T.H. Cormen, C.E. Leiserson, R.L. Rivest, Introduction to Algorithms,
The MIT Press, Massachusetts Institute of Technology, 1989.
See Also
median(), mean() and weightedMean().
Examples
x <- 1:10
n <- length(x)
m1 <- median(x) # 5.5
m2 <- weightedMedian(x) # 5.5
stopifnot(identical(m1, m2))
w <- rep(1, n)
m1 <- weightedMedian(x, w) # 5.5 (default)
m2 <- weightedMedian(x, ties="weighted") # 5.5 (default)
m3 <- weightedMedian(x, ties="min") # 5
m4 <- weightedMedian(x, ties="max") # 6
stopifnot(identical(m1,m2))
# Pull the median towards the first (smallest) value
w[1] <- 5
m1 <- weightedMedian(x, w) # 3.5
y <- c(rep(x[1], times=w[1]), x[-1]) # Only possible for integer weights
m2 <- median(y) # 3.5
stopifnot(identical(m1,m2))
# Put even more weight on the first value
w[1] <- 8.5
weightedMedian(x, w) # 2
# All weight on the first value
w[1] <- Inf
weightedMedian(x, w) # 1
# All weight on the last value
w[1] <- 1
w[n] <- Inf
weightedMedian(x, w) # 10
# All weights set to zero
w <- rep(0, n)
weightedMedian(x, w) # NA
# Simple benchmarking
bench <- function(N=1e5, K=10) {
  x <- rnorm(N)
  gc()
  t <- c()
  # Elapsed time (third element of system.time()) for K repetitions
  t[1] <- system.time(for (k in 1:K) median(x))[3]
  t[2] <- system.time(for (k in 1:K) weightedMedian(x))[3]
  # Express timings relative to median()
  t <- t / t[1]
  names(t) <- c("median", "weightedMedian")
  t
}
print(bench(N= 5, K=100))
print(bench(N= 50, K=100))
print(bench(N= 200, K=100))
print(bench(N= 1000, K=100))
print(bench(N= 10e3, K= 20))
print(bench(N=100e3, K= 20))