Last data update: 2014.03.03

R: Bivariate relative risk function
riskR Documentation

Bivariate relative risk function

Description

Estimates a relative risk function based on the ratio of two bivariate kernel density estimates over identical grids and regions. In geographical epidemiology, the two densities would represent a set of disease cases (numerator) and a sample of controls illustrating the at-risk population (denominator). In epidemiological terminology, the ratio of ‘case’ to ‘control’ would technically be referred to as an odds ratio.

Usage

risk(f, g, delta = 0, log = TRUE, h = NULL, adaptive = FALSE, res = 50,
 WIN = NULL, tolerate = FALSE, plotit = TRUE, comment = TRUE)

Arguments

f

Either a pre-calculated object of class "bivden" representing the ‘case’ density estimate, or an object of type data.frame, list, matrix, or ppp giving the observed case data. If this raw data is provided, a kernel density estimate is computed internally, with certain options available to the user in bivariate.density chosen/calculated automatically. See ‘Details’ for further information.

g

As for argument f, but for the controls. Whatever the type, the class of g must match that of f.

delta

A single numeric scaling parameter used for an optional additive constant to the densities; occasionally used for risk surface construction (see ‘Details’). A negative or zero value for delta requests no additive constant (default).

log

Boolean. Whether or not to return the (natural) log-transformed relative risk function as recommended by Kelsall and Diggle (1995a). Defaults to TRUE with the alternative being the raw density ratio.

h

Ignored if f and g are already "bivden" objects. An optional numeric vector of length 1 OR 2, giving the global bandwidth(s) for internal estimation of the case and control densities if adaptive = TRUE, or the fixed bandwidth(s) if adapative = FALSE. When h is a single numeric value, this is elected as the common global/fixed bandwidth for case and control densities. When h has length 2, the values h[1] and h[2] are assigned as the case and control global/fixed bandwidths respectively. By default, a value of h = NULL tells the function to use the global/fixed smoothing parameters as outlined in ‘Details’ below. Note that for adaptive estimation, this argument does not affect calculation of the pilot bandwiths.

adaptive

Ignored if f and g are already "bivden" objects. A boolean value specifying whether or not to employ adaptive smoothing for internally estimating the densities. A value of FALSE (default) elects use of fixed-bandwidth estimates.

res

Ignored if f and g are already "bivden" objects. A numeric value giving the desired resolution (of one side) of the evaluation grid. Higher values increase resolution at the expense of computational efficiency. Defaults to a 50 by 50 grid.

WIN

Ignored if f and g are already "bivden" objects OR objects of class ppp (in which case the study region is set to the value of the resident window component). A polygonal object of class owin giving the relevant study region in which the f and g data was collected.

tolerate

Ignored if f and g are already "bivden" objects. A boolean value specifying whether or not to calculate a corresponding asymptotic p-value surface (for tolerance contours) for the estimated relative risk function. If TRUE, the p-value surface tests for elevated risk only (equivalent to setting test = "greater" in tolerance) and is evaluated over a maximum grid resolution of 50 by 50. Defaults to FALSE for computational reasons.

plotit

Boolean. If TRUE (default), a heatplot of the estimated relative risk function is produced. If tolerate = TRUE, asymptotic tolerance contours are automatically added to the plot at a significance level of 5%.

comment

Ignored if f and g are already "bivden" objects. Boolean. Whether or not to print function progress (including starting and ending date-times) during execution. Defaults to TRUE.

Details

This function estimates a relative risk function via the density ratio method using fixed or adaptive bandwidth bivariate kernel density estimates. Both densities must be estimated using the same evaluation grid (and the same study window) in bivariate.density. In geographical epidemiology, the argument f represents the spatial distribution of the disease cases, and g the at-risk (control) population.

The option to supply the raw case and control data is available. If this is done, the function runs bivariate.density internally, abstracting certain decisions about the density estimation away from the user. If the user sets adaptive = TRUE (and h remains at NULL), the smoothing parameters are calculated as per the approach taken in Davies and Hazelton (2010): a common global bandwidth using the pooled data from OS. Pilot bandwidths are set at half the corresponding OS values. The scaling parameter gamma is common for the case and control density estimates, set as the gamma component of the pooled estimate. If a fixed relative risk is desired (adaptive = FALSE) and no specific bandwidths are given via the argument h, the case and control densities share a common bandwidth computed from the pooled data using OS. In supplying raw data to risk, the user must also specify an evaluation grid resolution (defaulting to 50 by 50) and the study region WIN (unless f and g are objects of class ppp, in which case the resident window component overrides WIN). All other arguments are set to their defaults as in bivariate.density.

If more flexibility is required for estimation of the case and control densities, the user must supply ‘pre-calculated’ objects of class "bivden" (from bivariate.density) as the f and g arguments. This drastically reduces the running time of a call to risk (as the density estimation step is already complete). However, the option of internally computing the asymptotic p-value surfaces (via the argument tolerate) is unavailable in this case; the user must run the tolerance function separately if tolerance contours are desired.

The relative risk function is defined here as the ratio of the ‘case’ density to the ‘control’ (Bithell, 1990; 1991). Using kernel density estimation to model these densities (Diggle, 1985), we obtain a workable estimate thereof. This function defines the risk function r in the following fashion:

r = (f + delta*max(g))/(g + delta*max(g))

Note the (optional) additive constants defined by delta times the maximum of each of the densities in the numerator and denominator respectively (see Bowman and Azzalini, 1997).

The log-risk function rho, given by rho = log[r], is argued to be preferable in practice as it imparts a sense of symmetry in the way the case and control densities are treated (Kelsall and Diggle, 1995a;b). The option of log-transforming the returned risk function is therefore selected by default.

Value

An object of class "rrs". This is a marked list with the following components:

rsM

a numeric res*res matrix (where res is the grid resolution as specified in the calls to bivariate.density for calculation of f and g) giving the values of the risk surface over the evaluation grid. Values corresponding to grid coordinates outside the study region are assigned NA

f

the object of class "bivden" used as the numerator or ‘case’ density estimate

g

the object of class "bivden" used as the denominator or ‘control’ density estimate

log

whether or not the returned risk function is on the log-scale

pooled

the object of class "bivden" (based on the pooled data) calculated internally if f and g were raw data arguments, NA otherwise

P

a numeric 50 by 50 matrix of the asymptotic p-value surface if tolerate = TRUE and f and g were raw data arguments, NA otherwise

Warning

If raw data is supplied to risk, as opposed to previously computed objects of class "bivden", the running time of this function will be greater. This is particularly the case if the user has also selected tolerate = TRUE. In the same fashion as bivariate.density and tolerance, setting comment = TRUE can keep the user appraised of the function progress during run-time.

Author(s)

T.M. Davies, M.L. Hazelton and J.C. Marshall

References

Bithell, J.F. (1990), An application of density estimation to geographical epidemiology, Statistics in Medicine, 9, 691-701.

Bithell, J.F. (1991), Estimation of relative risk functions, Statistics in Medicine, 10, 1745-1751.

Bowman, A.W. and Azzalini A. (1997), Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations, Oxford University Press Inc., New York.

Davies, T.M. and Hazelton, M.L. (2010), Adaptive kernel estimation of spatial relative risk, Statistics in Medicine, 29(23) 2423-2437.

Diggle, P.J. (1985), A kernel method for smoothing point process data, Journal of the Royal Statistical Society Series C, 34(2), 138-147.

Kelsall, J.E. and Diggle, P.J. (1995a), Kernel estimation of relative risk, Bernoulli, 1, 3-16.

Kelsall, J.E. and Diggle, P.J. (1995b), Non-parametric estimation of spatial variation in relative risk, Statistics in Medicine, 14, 2335-2342.

Examples

## Not run: 
data(PBC)
PBC.casedata <- split(PBC)[[1]]
PBC.controldata <- split(PBC)[[2]]

pbc.h <- OS(PBC, nstar = sqrt(PBC.casedata$n * PBC.controldata$n))

pbc.pool <- bivariate.density(data = PBC, pilotH = pbc.h, 
 adaptive = FALSE)
pbc.case <- bivariate.density(data = PBC.casedata, 
 pilotH = pbc.h, adaptive = FALSE)
pbc.con <- bivariate.density(data = PBC.controldata, 
 pilotH = pbc.h, adaptive = FALSE)

pbc.rrs <- risk(f = pbc.case, g = pbc.con, plotit = FALSE)
summary(pbc.rrs)


## End(Not run)

Results