R Graphical Manual

Last data update: 2014.03.03

R: Attributable fraction for matched and non-matched case-control...

AF.cc

R Documentation

Attributable fraction for matched and non-matched case-control sampling designs.

Description

AF.cc estimates the model-based adjusted attributable fraction for data from matched and non-matched case-control sampling designs.

Usage

AF.cc(formula, data, exposure, clusterid, matched = FALSE)

Arguments

`formula`	an object of class "`formula`" (or one that can be coerced to that class): a symbolic description of the model used for confounder adjustment. The exposure and confounders should be specified as independent (right-hand side) variables. The outcome should be specified as the dependent (left-hand side) variable. The formula is used to fit a logistic regression by `glm` for non-matched case-control designs and a conditional logistic regression by `gee` (in package `drgee`) for matched case-control designs.
`data`	an optional data frame, list or environment (or object coercible by `as.data.frame` to a data frame) containing the variables in the model. If not found in `data`, the variables are taken from `environment(formula)`, typically the environment from which the function is called.
`exposure`	the name of the exposure variable as a string. The exposure must be binary (0/1) where unexposed is coded as 0.
`clusterid`	the name of the cluster identifier variable as a string, if data are clustered (e.g. matched).
`matched`	a logical that specifies if the sampling design is matched (TRUE) or non-matched (FALSE) case-control. Default setting is non-matched (`matched = FALSE`).

Details

AF.cc estimates the attributable fraction for a binary outcome Y under the hypothetical scenario where a binary exposure X is eliminated from the population. The estimate is adjusted for confounders Z by logistic regression (glm) for non-matched case-control designs and by conditional logistic regression (gee) for matched case-control designs. The estimation assumes that the outcome is rare, so that the risk ratio can be approximated by the odds ratio; for details see Bruzzi et al. (1985). Let the AF be defined as

AF = 1 - Pr(Y0 = 1) / Pr(Y = 1)

where Pr(Y0 = 1) denotes the counterfactual probability of the outcome had the exposure been eliminated from the population. If Z is sufficient for confounding control, then Pr(Y0 = 1) can be expressed as

Pr(Y0=1) = E_z{Pr(Y = 1 | X = 0, Z)}.

Using Bayes' theorem this implies that the AF can be expressed as

AF = 1 - E_z{Pr( Y = 1 | X = 0, Z)} / Pr(Y = 1) = 1 - E_z{RR^{-X} (Z) | Y = 1}

where RR(Z) is the risk ratio

Pr(Y = 1 | X = 1,Z)/Pr(Y=1 | X = 0, Z).
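The step from the first to the second expression for the AF can be spelled out as follows. This is a sketch of the standard argument rather than a quotation from the package authors; it uses only the definition of RR(Z) above and the law of iterated expectations.

```latex
\begin{align*}
\Pr(Y=1 \mid X=0, Z) &= RR^{-X}(Z)\,\Pr(Y=1 \mid X, Z)
  && \text{(holds for both } X=0 \text{ and } X=1\text{)} \\
E_Z\{\Pr(Y=1 \mid X=0, Z)\}
  &= E_{X,Z}\{RR^{-X}(Z)\,\Pr(Y=1 \mid X, Z)\} \\
  &= E\{RR^{-X}(Z)\,Y\} \\
  &= \Pr(Y=1)\,E\{RR^{-X}(Z) \mid Y=1\}
\end{align*}
```

Dividing both sides by Pr(Y = 1) and subtracting from 1 gives AF = 1 - E_z{RR^{-X}(Z) | Y = 1}.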

Moreover, the risk ratio can be approximated by the odds ratio if the outcome is rare. Thus,

AF is approximately 1 - E_z{OR^{-X}(Z) | Y = 1}.
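To get a feel for how close the odds ratio is to the risk ratio when the outcome is rare, the following small check compares the two at Z = 0 under the data-generating model used in the Examples section (intercept -6, exposure coefficient 1). The variable names are illustrative only.

```r
expit <- function(x) 1 / (1 + exp(-x))

intercept <- -6
p0 <- expit(intercept)      # Pr(Y = 1 | X = 0, Z = 0)
p1 <- expit(intercept + 1)  # Pr(Y = 1 | X = 1, Z = 0)

risk_ratio <- p1 / p0
odds_ratio <- (p1 / (1 - p1)) / (p0 / (1 - p0))  # equals exp(1) under this model

c(RR = risk_ratio, OR = odds_ratio)
```

With outcome probabilities below 1 percent the two quantities are roughly 2.71 and 2.72, so replacing RR by OR changes little in this setting.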

The odds ratio is estimated by logistic regression or conditional logistic regression. If clusterid is supplied, then a clustered sandwich formula is used in all variance calculations.
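The point estimate described above can be sketched by hand with base R for the non-matched design. This is an illustration of the formula AF ≈ 1 - E_z{OR^{-X}(Z) | Y = 1}, not the package's source code; the object names (`cc`, `log_or`, `af_hat`), the seed and the smaller population size are assumptions made for the sketch, and no variance is computed.

```r
set.seed(1)
expit <- function(x) 1 / (1 + exp(-x))

# Simulate a population with a rare outcome (same model as Example 1)
NN <- 200000
Z <- rnorm(NN)
X <- rbinom(NN, size = 1, prob = expit(Z))
Y <- rbinom(NN, size = 1, prob = expit(-6 + X + Z))
population <- data.frame(Z, X, Y)

# Draw a non-matched case-control sample of n cases and n controls
n <- 500
cases <- sample(which(population$Y == 1), n)
controls <- sample(which(population$Y == 0), n)
cc <- population[c(cases, controls), ]

# Logistic regression with an exposure-confounder interaction
fit <- glm(Y ~ X * Z, family = binomial, data = cc)
b <- coef(fit)

# Individual log odds ratio: beta + psi * Z
log_or <- b[["X"]] + b[["X:Z"]] * cc$Z

# Average OR^(-X) over the cases only, then subtract from 1
af_hat <- 1 - mean(exp(-cc$X * log_or)[cc$Y == 1])
af_hat
```

Averaging over the cases is what makes the estimator usable with case-control data: the cases are a sample from the distribution of (X, Z) given Y = 1, which is the conditional expectation in the formula.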

Value

`AF.est`	estimated attributable fraction.
`AF.var`	estimated variance of `AF.est`. The variance is obtained by combining the delta method with the sandwich formula.
`log.or`	a vector of the estimated log odds ratio for every individual. `log.or` contains the estimated coefficient for the exposure variable `X` for every level of the confounder `Z` as specified by the user in the formula. If the model to be estimated is logit{Pr(Y=1|X,Z)} = α + β X + γ Z then `log.or` is the estimate of β. If the model to be estimated is logit{Pr(Y=1|X,Z)} = α + β X + γ Z + ψ XZ then `log.or` is the estimate of β + ψ Z.
`fit`	the fitted model. Fitted using logistic regression, `glm`, for non-matched case-control and conditional logistic regression, `gee`, for matched case-control.
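The summary output labels its interval an "untransformed 95% Wald CI". A minimal sketch of that calculation, assuming the reported standard error is the square root of `AF.var`, is shown here using the estimate and standard error printed for Example 1 in the Results section. The variable names are illustrative.

```r
af_est <- 0.5815639   # AF from the Example 1 summary
af_se  <- 0.08056838  # Std.Error from the same summary, assumed to be sqrt(AF.var)

# Untransformed Wald interval: estimate +/- normal quantile * standard error
z_crit <- qnorm(0.975)
ci <- c(lower = af_est - z_crit * af_se,
        upper = af_est + z_crit * af_se)

# Wald test statistic and two-sided p-value
z_value <- af_est / af_se
p_value <- 2 * pnorm(-abs(z_value))

ci
z_value
```

The resulting limits agree with the "Lower limit" and "Upper limit" columns of the printed summary up to rounding.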

Author(s)

Elisabeth Dahlqwist, Arvid Sjölander

References

Bruzzi, P., Green, S. B., Byar, D., Brinton, L. A., and Schairer, C. (1985). Estimating the population attributable risk for multiple risk factors using case-control data. American Journal of Epidemiology 122, 904-914.

Examples

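library(AF)

# Inverse logit function
expit <- function(x) 1 / (1 + exp( - x))
NN <- 1000000  # size of the simulated source population
n <- 500       # number of cases (and of controls) to sample

# Example 1: non-matched case-control
# Simulate a sample from a non-matched case-control sampling design
# Make the outcome a rare event by setting the intercept to -6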
intercept <- -6
Z <- rnorm(n = NN)
X <- rbinom(n = NN, size = 1, prob = expit(Z))
Y <- rbinom(n = NN, size = 1, prob = expit(intercept + X + Z))
population <- data.frame(Z, X, Y)
Case <- which(population$Y == 1)
Control <- which(population$Y == 0)
# Sample cases and controls from the population
case <- sample(Case, n)
control <- sample(Control, n)
data <- population[c(case, control), ]
AF.est.cc <- AF.cc(formula = Y ~ X + Z + X * Z, data = data, exposure = "X")
summary(AF.est.cc)

# Example 2: matched case-control
# Duplicate observations in order to create a matched data sample
# Create an unobserved confounder U common for each pair of individuals
U  <- rnorm(n = NN)
Z1 <- rnorm(n = NN)
Z2 <- rnorm(n = NN)
X1 <- rbinom(n = NN, size = 1, prob = expit(U + Z1))
X2 <- rbinom(n = NN, size = 1, prob = expit(U + Z2))
Y1 <- rbinom(n = NN, size = 1, prob = expit(intercept + U + Z1 + X1))
Y2 <- rbinom(n = NN, size = 1, prob = expit(intercept + U + Z2 + X2))
# Select discordant pairs
discordant <- which(Y1!=Y2)
id <- rep(1:n, 2)
# Sample from discordant pairs
incl <- sample(x = discordant, size = n, replace = TRUE)
data <- data.frame(id = id, Y = c(Y1[incl], Y2[incl]), X = c(X1[incl], X2[incl]),
                   Z = c(Z1[incl], Z2[incl]))
AF.est.cc.match <- AF.cc(formula = Y ~ X + Z + X * Z, data = data,
                         exposure = "X", clusterid = "id", matched = TRUE)
summary(AF.est.cc.match)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(AF)
Loading required package: survival
Loading required package: drgee
Loading required package: nleqslv
Loading required package: Rcpp
Loading required package: data.table
> ### Name: AF.cc
> ### Title: Attributable fraction for mached and non-matched case-control
> ###   sampling designs.
> ### Aliases: AF.cc
> 
> ### ** Examples
> 
> expit <- function(x) 1 / (1 + exp( - x))
> NN <- 1000000
> n <- 500
> 
> # Example 1: non matched case-control
> # Simulate a sample from a non matched case-control sampling design
> # Make the outcome a rare event by setting the intercept to -6
> intercept <- -6
> Z <- rnorm(n = NN)
> X <- rbinom(n = NN, size = 1, prob = expit(Z))
> Y <- rbinom(n = NN, size = 1, prob = expit(intercept + X + Z))
> population <- data.frame(Z, X, Y)
> Case <- which(population$Y == 1)
> Control <- which(population$Y == 0)
> # Sample cases and controls from the population
> case <- sample(Case, n)
> control <- sample(Control, n)
> data <- population[c(case, control), ]
> AF.est.cc <- AF.cc(formula = Y ~ X + Z + X * Z, data = data, exposure = "X")
> summary(AF.est.cc)
Call:  
AF.cc(formula = Y ~ X + Z + X * Z, data = data, exposure = "X")

Estimated attributable fraction (AF) and untransformed 95% Wald CI: 

        AF  Std.Error  z value     Pr(>|z|) Lower limit Upper limit
 0.5815639 0.08056838 7.218265 5.265505e-13   0.4236528    0.739475

Exposure : X 
Outcome  : Y 

 Observations Cases
         1000   500

Method for confounder adjustment:  Logistic regression 

Formula:  Y ~ X + Z + X * Z 
> 
> # Example 2: matched case-control
> # Duplicate observations in order to create a matched data sample
> # Create an unobserved confounder U common for each pair of individuals
> U  <- rnorm(n = NN)
> Z1 <- rnorm(n = NN)
> Z2 <- rnorm(n = NN)
> X1 <- rbinom(n = NN, size = 1, prob = expit(U + Z1))
> X2 <- rbinom(n = NN, size = 1, prob = expit(U + Z2))
> Y1 <- rbinom(n = NN, size = 1, prob = expit(intercept + U + Z1 + X1))
> Y2 <- rbinom(n = NN, size = 1, prob = expit(intercept + U + Z2 + X2))
> # Select discordant pairs
> discordant <- which(Y1!=Y2)
> id <- rep(1:n, 2)
> # Sample from discordant pairs
> incl <- sample(x = discordant, size = n, replace = TRUE)
> data <- data.frame(id = id, Y = c(Y1[incl], Y2[incl]), X = c(X1[incl], X2[incl]),
+                    Z = c(Z1[incl], Z2[incl]))
> AF.est.cc.match <- AF.cc(formula = Y ~ X + Z + X * Z, data = data,
+                          exposure = "X", clusterid = "id", matched = TRUE)
> summary(AF.est.cc.match)
Call:  
AF.cc(formula = Y ~ X + Z + X * Z, data = data, exposure = "X", 
    clusterid = "id", matched = TRUE)

Estimated attributable fraction (AF) and untransformed 95% Wald CI: 

        AF Robust SE  z value     Pr(>|z|) Lower limit Upper limit
 0.5127673 0.1203309 4.261309 2.032327e-05    0.276923   0.7486116

Exposure : X 
Outcome  : Y 

 Observations Cases Clusters
         1000   500      500

Method for confounder adjustment:  Conditional logistic regression 

Formula:  Y ~ X + Z + X * Z 