Last data update: 2014.03.03

R: Simulate a dataset based on a HSROC model
simdataR Documentation

Simulate a dataset based on a HSROC model

Description

This function simulates a dataset based on the HSROC diagnostic meta-analysis model. It allows for the reference standard to be imperfect or perfect.

Usage

simdata(N, n, n.random = "FALSE", sub_rs = NULL, prev, se_ref = NULL, 
   sp_ref = NULL, T, range.T = c(-Inf, Inf), L, range.L = c(-Inf, Inf), 
   sd_t, sd_a, b, path = getwd()  )

Arguments

N

the number of studies to be included in the meta-analysis.

n

numerical vector, possibly a single value, specifying the number of individuals within each study. See details for further explanations.

n.random

if TRUE, the number of individuals within each study is drawn from n with replacement.

sub_rs

a list that specifies the reference standard used by each study. See details for further explanations.

prev

a vector of length N giving the prevalence in each study.

se_ref

a vector of length equal to the number of reference standards giving the sensitivity for each reference test.

sp_ref

a vector of length equal to the number of reference standards giving the specificity for each reference test.

T

single numeric value, the overall mean cut-off value to define a positive test.

range.T

a vector of length 2 specifiying a range of values for the individual cut-off theta_i. See details for further explanations.

L

single numeric value, the overall difference in mean values (diagnostic accuracy) on the continuous index test result comparing the diseased group and the non-diseased group.

range.L

a vector of length 2 specifiying a range of values for the individual difference in mean values (diagnostic accuracy) on the continuous index test result comparing the diseased group and the non-diseased group alpha_i. See details for further explanations.

sd_t

single numeric value, the between study standard deviation in the cut-off theta_i.

sd_a

single numeric value, the between study standard deviation in the mean value of the index test disease group alpha_i

b

single numeric value, the ratio of the continuous standard deviation of the index test results on patients with the disease compared to patients without the disease.

path

a character string pointing to the directory where the simulated data will be saved to.

Details

The HSROC model uses the following parametrization : S_i = Phi(-(theta_i - alpha_i/2)/exp(beta/2)) and C_i = Phi((theta_i + alpha_i/2)/exp(-beta/2))

If n.random is FALSE, the number of components in n must match the value of N, unless n is equal to a single value. For the latter case, all studies would be assumed to have the same number of individuals, that is n. If n.random is TRUE, the number of elements may not necessarly be equal to the value of N.

The first element of the list-object sub_rs corresponds to the number of different reference standards. The default value is 1. The number of additional elements will depend on the value of the first element. There must be as many additional elements in sub_rs as there are different reference standards. Assuming the studies are labelled 1, ..., N, each of these additional elements must be a vector (possibly of length one) taking as their values the labels of the studies sharing the same reference standard. For example, if we have 2 reference tests, the first one applied over studies 1-10 and the second one applied over studies 11-15 then the sub_rs list-argument should be of length 3 with the following elements : 2, 1:10, 11:15

The range.T argument ensures the individual theta_i will be generated within the range provided. If no range is provided by the user (default) the function assumes no restrictions are made on the possible values of theta_i. The range.L argument ensures the individual alpha_i will be generated within the range provided. If no range is provided by the user (default) the function assumes no restrictions are made on the possible values of theta_i.

For more help on this function, see the tutorial pdf file available on http://www.nandinidendukuri.com/filesonjoomlasite/HSROC_R_Tutorial.pdf

Value

A list of the 2x2 tables for each study, the between-study parameters, the within-study parameters and the reference standard.

Text files are created in the path directory. These files are :

“True_values.txt”, reports the within-study parameters alpha_i, theta_i, sensitivity of test under evaluation ( S1_i ), specificity of test under evaluation ( C1_i ) and prevalence (pi_i) used in the simulation.

“True_values2.txt”, reports the values of the between-study parameters LAMBDA, standard deviation of alpha_i ( sigma_alpha ), THETA, standard deviation of theta_i ( sigma_theta ) and beta used to simulate the data.

“True_REFSTD.txt”, reports the values of the reference standard used to simulate the data.

“True_values_index.txt”, reports the variable names of the 3 files described above.

.

References

Dendukuri, N., Schiller, I., Joseph, L., and Pai, M. (2012) Bayesian meta-analysis of the accuracy of a test for tuberculosis pleuritis in the absence of a gold-standard reference. Biometrics. doi:10.1111/j. 1541-0420.2012.01773.x

Rutter, C. M., and Gatsonis, C. A. (2001) A hierarchical regression approach to meta-analysis of diagnostic accuracy evaluations. Statistics in Medicine, 20(19):2865-2884.

Examples


#EXAMPLE 1
#We want to simulate data for 10 studies based on an HSROC model assuming 
#each study uses the same imperfect reference standard.
 
N = 10
LAMBDA = 2
sd_alpha = 0.75
THETA = 1.5
sd_theta = 0.5
beta = 0
pi = runif(10,0,1)

REFSTD = list(1, 1:10)  #Only 1 reference standard ...
s2 = c(0.5)	 #Sensitivity of the reference test
c2 = c(0.85) 	 #Specificity of the reference test	


sim.data = simdata(N=N, n = c(50,50,60,60,70,70,80,80,90,90), 
   sub_rs = REFSTD, prev=pi, se_ref=s2, sp_ref=c2, T=THETA, 
   L=LAMBDA, sd_t=sd_theta, sd_a=sd_alpha, b=beta)



#EXAMPLE 2
#We want to simulate data for 15 studies based on an HSROC model such that 
#the first 5 studies share a common reference standard and the remaining 
#10 studies also share a common reference standard.
 
N = 15
LAMBDA = 3.6
sd_alpha = 1.15
THETA = 2.3
sd_theta = 0.75
beta = 0.15
pi = runif(15,0.1,0.5)

REFSTD = list(2, 1:5, 6:15)  #Two different reference standards ...
s2 = c(0.40, 0.6)	 #Sensitivity of the reference tests
c2 = c(0.75,0.95) 	 #Specificity of the reference tests	

#Thus, for the first 5 studies, S2 = 0.40 and C2 = 0.75 while for the last 
#10 studies s2 = 0.6 and c2 = 0.95


sim.data = simdata(N=N, n=seq(30,120,1), n.random=TRUE, sub_rs = REFSTD, 
   prev=pi,  se_ref=s2, sp_ref=c2, T=THETA, L=LAMBDA, sd_t=sd_theta, 
   sd_a=sd_alpha, b=beta)


#EXAMPLE 3
#Assume the same context as the one in EXAMPLE 2 and let's suppose
#that each individual cut-off theta_i should lie between [-5,5] 
 
N = 15
LAMBDA = 3.6
sd_alpha = 1.15
THETA = 2.3
sd_theta = 0.75
beta = 0.15
pi = runif(15,0.1,0.5)

REFSTD = list(2, 1:5, 6:15)  #Two different reference standards ...
s2 = c(0.40, 0.6)	 #Sensitivity of the reference tests
c2 = c(0.75,0.95) 	 #Specificity of the reference tests	

#Thus, for the first 5 studies, S2 = 0.40 and C2 = 0.75 while for the last 
#10 studies s2 = 0.6 and c2 = 0.95


sim.data = simdata(N=N, n=seq(30,120,1), n.random=TRUE, sub_rs = REFSTD, 
   prev=pi,  se_ref=s2, sp_ref=c2, T=THETA, range.T=c(-5,5),L=LAMBDA, 
   sd_t=sd_theta,sd_a=sd_alpha, b=beta)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(HSROC)
Loading required package: lattice
Loading required package: coda
Loading required package: MASS
Loading required package: MCMCpack
##
## Markov Chain Monte Carlo Package (MCMCpack)
## Copyright (C) 2003-2016 Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park
##
## Support provided by the U.S. National Science Foundation
## (Grants SES-0350646 and SES-0350613)
##
##
## Hierarchical Summary Receiver Operating Characteristic package (HSROC)
## Copyright (C) 2010-2016 Ian Schiller and Nandini Dendukuri 
##
## Development of HSROC package was supported by grants from the 
## Canadian Institutes of Health Research (MOP #89857) 
## and the Fonds de la Recherche en Sant<U+00E9> Qu<U+00E9>bec. 
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/HSROC/simdata.Rd_%03d_medium.png", width=480, height=480)
> ### Name: simdata
> ### Title: Simulate a dataset based on a HSROC model
> ### Aliases: simdata
> ### Keywords: datagen
> 
> ### ** Examples
> 
> 
> #EXAMPLE 1
> #We want to simulate data for 10 studies based on an HSROC model assuming 
> #each study uses the same imperfect reference standard.
>  
> N = 10
> LAMBDA = 2
> sd_alpha = 0.75
> THETA = 1.5
> sd_theta = 0.5
> beta = 0
> pi = runif(10,0,1)
> 
> REFSTD = list(1, 1:10)  #Only 1 reference standard ...
> s2 = c(0.5)	 #Sensitivity of the reference test
> c2 = c(0.85) 	 #Specificity of the reference test	
> 
> 
> sim.data = simdata(N=N, n = c(50,50,60,60,70,70,80,80,90,90), 
+    sub_rs = REFSTD, prev=pi, se_ref=s2, sp_ref=c2, T=THETA, 
+    L=LAMBDA, sd_t=sd_theta, sd_a=sd_alpha, b=beta)
> 
> 
> 
> #EXAMPLE 2
> #We want to simulate data for 15 studies based on an HSROC model such that 
> #the first 5 studies share a common reference standard and the remaining 
> #10 studies also share a common reference standard.
>  
> N = 15
> LAMBDA = 3.6
> sd_alpha = 1.15
> THETA = 2.3
> sd_theta = 0.75
> beta = 0.15
> pi = runif(15,0.1,0.5)
> 
> REFSTD = list(2, 1:5, 6:15)  #Two different reference standards ...
> s2 = c(0.40, 0.6)	 #Sensitivity of the reference tests
> c2 = c(0.75,0.95) 	 #Specificity of the reference tests	
> 
> #Thus, for the first 5 studies, S2 = 0.40 and C2 = 0.75 while for the last 
> #10 studies s2 = 0.6 and c2 = 0.95
> 
> 
> sim.data = simdata(N=N, n=seq(30,120,1), n.random=TRUE, sub_rs = REFSTD, 
+    prev=pi,  se_ref=s2, sp_ref=c2, T=THETA, L=LAMBDA, sd_t=sd_theta, 
+    sd_a=sd_alpha, b=beta)
> 
> 
> #EXAMPLE 3
> #Assume the same context as the one in EXAMPLE 2 and let's suppose
> #that each individual cut-off theta_i should lie between [-5,5] 
>  
> N = 15
> LAMBDA = 3.6
> sd_alpha = 1.15
> THETA = 2.3
> sd_theta = 0.75
> beta = 0.15
> pi = runif(15,0.1,0.5)
> 
> REFSTD = list(2, 1:5, 6:15)  #Two different reference standards ...
> s2 = c(0.40, 0.6)	 #Sensitivity of the reference tests
> c2 = c(0.75,0.95) 	 #Specificity of the reference tests	
> 
> #Thus, for the first 5 studies, S2 = 0.40 and C2 = 0.75 while for the last 
> #10 studies s2 = 0.6 and c2 = 0.95
> 
> 
> sim.data = simdata(N=N, n=seq(30,120,1), n.random=TRUE, sub_rs = REFSTD, 
+    prev=pi,  se_ref=s2, sp_ref=c2, T=THETA, range.T=c(-5,5),L=LAMBDA, 
+    sd_t=sd_theta,sd_a=sd_alpha, b=beta)
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>