Compute running medians of odd span. This is the ‘most robust’
scatter plot smoothing possible. For efficiency (and historical
reason), you can use one of two different algorithms giving identical
results.
numeric vector, the ‘dependent’ variable to be
smoothed.
k
integer width of median window; must be odd. Turlach had a
default of k <- 1 + 2 * min((n-1)%/% 2, ceiling(0.1*n)).
Use k = 3 for ‘minimal’ robust smoothing eliminating
isolated outliers.
endrule
character string indicating how the values at the
beginning and the end (of the data) should be treated.
Can be abbreviated. Possible values are:
"keep"
keeps the first and last k2 values
at both ends, where k2 is the half-bandwidth
k2 = k %/% 2,
i.e., y[j] = x[j] for j = 1, …, k2 and (n-k2+1), …, n;
"constant"
copies median(y[1:k2]) to the first
values and analogously for the last ones making the smoothed ends
constant;
"median"
the default, smooths the ends by using
symmetrical medians of subsequently smaller bandwidth, but for
the very first and last value where Tukey's robust end-point
rule is applied, see smoothEnds.
algorithm
character string (partially matching "Turlach" or
"Stuetzle") or the default NULL, specifying which algorithm
should be applied. The default choice depends on n = length(x)
and k where "Turlach" will be used for larger problems.
print.level
integer, indicating verboseness of algorithm;
should rarely be changed by average users.
Details
Apart from the end values, the result y = runmed(x, k) simply has
y[j] = median(x[(j-k2):(j+k2)]) (k = 2*k2+1), computed very
efficiently.
The two algorithms are internally entirely different:
"Turlach"
is the Härdle–Steiger
algorithm (see Ref.) as implemented by Berwin Turlach.
A tree algorithm is used, ensuring performance O(n * log(k)) where n = length(x) which is
asymptotically optimal.
"Stuetzle"
is the (older) Stuetzle–Friedman implementation
which makes use of median updating when one observation
enters and one leaves the smoothing window. While this performs as
O(n * k) which is slower asymptotically, it is
considerably faster for small k or n.
Currently long vectors are only supported for algorithm = "Steutzle".
Value
vector of smoothed values of the same length as x with an
attribute k containing (the ‘oddified’)
k.
Author(s)
Martin Maechler maechler@stat.math.ethz.ch,
based on Fortran code from Werner Stuetzle and S-PLUS and C code from
Berwin Turlach.
References
Härdle, W. and Steiger, W. (1995)
[Algorithm AS 296] Optimal median smoothing,
Applied Statistics44, 258–264.
Jerome H. Friedman and Werner Stuetzle (1982)
Smoothing of Scatterplots;
Report, Dep. Statistics, Stanford U., Project Orion 003.
Martin Maechler (2003)
Fast Running Medians: Finite Sample and Asymptotic Optimality;
working paper available from the author.
See Also
smoothEnds which implements Tukey's end point rule and
is called by default from runmed(*, endrule = "median").
smooth uses running
medians of 3 for its compound smoothers.
Examples
require(graphics)
utils::example(nhtemp)
myNHT <- as.vector(nhtemp)
myNHT[20] <- 2 * nhtemp[20]
plot(myNHT, type = "b", ylim = c(48, 60), main = "Running Medians Example")
lines(runmed(myNHT, 7), col = "red")
## special: multiple y values for one x
plot(cars, main = "'cars' data and runmed(dist, 3)")
lines(cars, col = "light gray", type = "c")
with(cars, lines(speed, runmed(dist, k = 3), col = 2))
## nice quadratic with a few outliers
y <- ys <- (-20:20)^2
y [c(1,10,21,41)] <- c(150, 30, 400, 450)
all(y == runmed(y, 1)) # 1-neighbourhood <==> interpolation
plot(y) ## lines(y, lwd = .1, col = "light gray")
lines(lowess(seq(y), y, f = 0.3), col = "brown")
lines(runmed(y, 7), lwd = 2, col = "blue")
lines(runmed(y, 11), lwd = 2, col = "red")
## Lowess is not robust
y <- ys ; y[21] <- 6666 ; x <- seq(y)
col <- c("black", "brown","blue")
plot(y, col = col[1])
lines(lowess(x, y, f = 0.3), col = col[2])
lines(runmed(y, 7), lwd = 2, col = col[3])
legend(length(y),max(y), c("data", "lowess(y, f = 0.3)", "runmed(y, 7)"),
xjust = 1, col = col, lty = c(0, 1, 1), pch = c(1,NA,NA))
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(stats)
> png(filename="/home/ddbj/snapshot/RGM3/R_rel/result/stats/runmed.Rd_%03d_medium.png", width=480, height=480)
> ### Name: runmed
> ### Title: Running Medians - Robust Scatter Plot Smoothing
> ### Aliases: runmed
> ### Keywords: smooth robust
>
> ### ** Examples
>
> require(graphics)
>
> utils::example(nhtemp)
nhtemp> require(stats); require(graphics)
nhtemp> plot(nhtemp, main = "nhtemp data",
nhtemp+ ylab = "Mean annual temperature in New Haven, CT (deg. F)")
> myNHT <- as.vector(nhtemp)
> myNHT[20] <- 2 * nhtemp[20]
> plot(myNHT, type = "b", ylim = c(48, 60), main = "Running Medians Example")
> lines(runmed(myNHT, 7), col = "red")
>
> ## special: multiple y values for one x
> plot(cars, main = "'cars' data and runmed(dist, 3)")
> lines(cars, col = "light gray", type = "c")
> with(cars, lines(speed, runmed(dist, k = 3), col = 2))
>
> ## nice quadratic with a few outliers
> y <- ys <- (-20:20)^2
> y [c(1,10,21,41)] <- c(150, 30, 400, 450)
> all(y == runmed(y, 1)) # 1-neighbourhood <==> interpolation
[1] TRUE
> plot(y) ## lines(y, lwd = .1, col = "light gray")
> lines(lowess(seq(y), y, f = 0.3), col = "brown")
> lines(runmed(y, 7), lwd = 2, col = "blue")
> lines(runmed(y, 11), lwd = 2, col = "red")
>
> ## Lowess is not robust
> y <- ys ; y[21] <- 6666 ; x <- seq(y)
> col <- c("black", "brown","blue")
> plot(y, col = col[1])
> lines(lowess(x, y, f = 0.3), col = col[2])
> lines(runmed(y, 7), lwd = 2, col = col[3])
> legend(length(y),max(y), c("data", "lowess(y, f = 0.3)", "runmed(y, 7)"),
+ xjust = 1, col = col, lty = c(0, 1, 1), pch = c(1,NA,NA))
>
>
>
>
>
> dev.off()
null device
1
>