Last data update: 2014.03.03

R: Smooth out aggregated data

smooth.map

R Documentation

Smooth out aggregated data

Description

Increases the resolution of data aggregated over map regions, by either smoothing or interpolation. Also fills in missing values.

Usage

smooth.map(m, z, res = 50, span = 1/10, averages = FALSE,
           type = c("smooth", "interp"), merge = FALSE)

Arguments

`m`	a map object
`z`	a named vector
`res`	a vector of length two, specifying the resolution of the sampling grid in each dimension. If a single number is given, it is taken as the vertical resolution, and twice that number is used as the horizontal resolution.
`span`	kernel parameter (larger = smoother). `span = Inf` is a special case which invokes the cubic spline kernel. `span` is automatically scaled by the map size, and is independent of `res`.
`averages`	If `TRUE`, the values in `z` are interpreted as averages over the regions. Otherwise they are interpreted as totals.
`type`	see details.
`merge`	If `TRUE`, a region named in `z` includes all matching regions in the map (according to `match.map`). If `FALSE`, a region named in `z` is assumed to refer to exactly one region on the map.

Details

For type = "smooth", the region totals are first converted into point measurements on the sampling grid, by dividing the total for a region among all sample points inside it. What remains is an ordinary kernel smoothing problem. Note that the region totals are not preserved.
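
The conversion from region totals to point measurements can be illustrated with a small sketch. It assumes each total is shared equally among the sample points inside the region; the names `region`, `totals`, `n.points` and `point.z` are made up for illustration and are not part of the maps package.

```r
# Toy setup (illustrative names, not part of the maps package):
# six sample points, each labelled with the region that contains it.
region <- c("A", "A", "A", "A", "B", "B")
totals <- c(A = 8, B = 6)

# Number of sample points that fall inside each region.
n.points <- table(region)

# Assumption: each region's total is shared equally among its points.
point.z <- unname(totals[region] / as.vector(n.points[region]))

# Summing the point values by region gives back the original totals.
region.sums <- tapply(point.z, region, sum)
```

At this stage the totals are still intact; it is the subsequent smoothing step that no longer preserves them.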

The prediction zo for location xo (a vector) is a kernel-weighted average of z over the sample points, with nearby points receiving the most weight:

zo = (sum_x k(x, xo) z(x))/(sum_x k(x, xo))

k(x, xo) = exp(-lambda ||x - xo||^2)

lambda is determined from span. Note that xo is over the same sampling grid as x, but zo is not necessarily the same as z(xo).
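
As a rough sketch of the two formulas above, the smoother can be written in a few lines of base R. The helper `kernel.smooth` is hypothetical, and `lambda` is chosen by hand here because the mapping from `span` to lambda is not stated in this page.

```r
# Gaussian kernel smoother on a toy grid (hypothetical helper).
# lambda is hand-picked; smooth.map derives it from span internally.
kernel.smooth <- function(x, z, lambda) {
  # x: matrix with one row per sample point; z: value at each point.
  d2 <- as.matrix(dist(x))^2        # squared distances ||x - xo||^2
  k <- exp(-lambda * d2)            # k(x, xo)
  as.vector(k %*% z / rowSums(k))   # weighted average at every xo
}

x <- as.matrix(expand.grid(x = 1:3, y = 1:2))
z <- c(2, 2, 2, 2, 3, 3)
zo <- kernel.smooth(x, z, lambda = 1)
```

Because every prediction is a weighted average, zo stays within the range of z, but in general zo differs from z at the same grid point.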

For type = "interp", the region totals are preserved by the higher-resolution function. The function is assumed to come from a Gaussian process with kernel k. The measurement z[r] is assumed to be the sum of the function over the discrete sample points inside region r. This leads to a simple formula for the covariance matrix of z and the cross-covariance between zo and z. The prediction is the cross-covariance times the inverse covariance times z. Unlike Tobler's method, the predictions are not constrained to lie within the original data range, so there tend to be "ringing" effects.
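
The interpolation formula can be sketched on a toy grid as follows. This is an illustration under stated assumptions (a zero-mean process, a hand-picked `lambda`, and an indicator matrix `A` introduced here for convenience), not the code used inside smooth.map.

```r
# Toy Gaussian-process interpolation from region totals.
# All names are illustrative; lambda is hand-picked.
x <- as.matrix(expand.grid(x = 1:3, y = 1:2))
region <- c("A", "A", "A", "A", "B", "B")
z <- c(A = 8, B = 6)
lambda <- 1

# Kernel between all pairs of sample points.
K <- exp(-lambda * as.matrix(dist(x))^2)

# Indicator matrix: A[r, i] = 1 if sample point i lies in region r.
A <- outer(names(z), region, "==") * 1

cov.z <- A %*% K %*% t(A)   # covariance matrix of the region totals
cross <- K %*% t(A)         # cross-covariance between points and totals
zo <- as.vector(cross %*% solve(cov.z, z))

# Summing the predictions inside each region recovers the totals.
recovered <- as.vector(A %*% zo)
```

Individual entries of zo may fall outside the range implied by the region totals, which is the "ringing" mentioned above.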

See the references for more details.

Value

A data frame with columns x, y, and z giving the smoothed value z for locations (x, y). Currently the (x, y) values form a grid, but this is not guaranteed in the future.
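
For plotting, a result of this shape can be reshaped into a matrix. The sketch below uses a hand-made stand-in for the returned data frame, so the values are invented; it selects the coordinate columns by name rather than by position.

```r
# Stand-in for a smooth.map result: a data frame with columns x, y, z.
fit <- expand.grid(x = c(-100, -90, -80), y = c(30, 40))
fit$z <- seq_len(nrow(fit))

# Reshape into a matrix indexed by the distinct x and y values;
# duplicate (x, y) pairs, if any, are averaged.
mat <- tapply(fit$z, fit[c("x", "y")], mean)

# Recover the numeric coordinates from the dimension names.
xs <- as.numeric(rownames(mat))
ys <- as.numeric(colnames(mat))
```

The vectors xs and ys can then be passed to image or filled.contour together with mat. If a future version stops returning a regular grid, some cells of mat would be NA.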

Author(s)

Tom Minka

References

W. F. Eddy and A. Mockus. An example of the estimation and display of a smoothly varying function of time and space - the incidence of disease mumps. Journal of the American Society for Information Science, 45(9):686-693, 1994. http://web.eecs.utk.edu/~audris/papers/jasis.pdf

W. R. Tobler. Smooth pycnophylactic interpolation for geographical regions. Journal of the American Statistical Association, 74:519-530, 1979.

Examples

# compare to the example for match.map
library(maps)
data(state, package = "datasets")
data(votes.repub)
z <- votes.repub[, "1900"]
m <- map("state", fill = TRUE, plot = FALSE)
# use a small span to fill in, but not smooth, the data
# increase the resolution to get better results
fit <- smooth.map(m, z, span = 1/100, merge = TRUE, averages = TRUE)
# reshape the (x, y, z) data frame into a matrix for plotting
mat <- tapply(fit$z, fit[1:2], mean)
# light-to-dark gray palette (masks grDevices::gray.colors)
gray.colors <- function(n) gray(rev(0:(n - 1))/n)
par(bg = "blue")
filled.contour(mat, color.palette = gray.colors, nlevels = 32, asp = 1)
# another way to visualize:
image(mat, col = gray.colors(100))

# for a higher degree of smoothing:
# fit <- smooth.map(m, z, merge = TRUE, averages = TRUE)
# interpolation, state averages are preserved:
# fit <- smooth.map(m, z, merge = TRUE, averages = TRUE, type = "interp")