Last data update: 2014.03.03

R: Generate random data from an oncogenetic tree
generate.data    R Documentation

Generate random data from an oncogenetic tree

Description

Generates random event occurrence data based on an oncogenetic tree model.

Usage

generate.data(N, otree, with.errors=TRUE,
          edge.weights=if (with.errors) "estimated" else "observed",
          method=c("S","D1","D2"))

Arguments

N

The required sample size.

otree

An object of the class oncotree.

with.errors

A logical value specifying whether false positive and negative errors should be applied.

edge.weights

A choice of whether the observed or estimated edge transition probabilities should be used in the calculation of probabilities. See oncotree.fit for an explanation of the difference. By default, the estimated edge transition probabilities are used if with.errors=TRUE and the observed ones if with.errors=FALSE.

method

Simulation method, see Details for explanation of the options.

Details

There are three choices for the method of simulation; the best choice depends on the size of the tree, required sample size, and whether errors are needed.

Method “S” generates the data based on the conditional probability definition of the oncogenetic tree, and then ‘corrupts’ the resulting sample by introducing random errors. This method is applicable in all circumstances, but can be slower than other methods if N is large and with.errors=FALSE is used.
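
The two-step idea behind method “S” (simulate along the tree, then corrupt) can be illustrated in base R without the package. This is a minimal sketch and not the Oncotree implementation: the toy tree, the helper names simulate_tree and corrupt, and the error rates epos and eneg are all hypothetical, and the sketch assumes that false positives and false negatives act independently on each event.

```r
# Hypothetical toy tree: Root -> A -> B and Root -> C.
# Events are listed so that every parent precedes its children.
parent <- c(A = "Root", B = "A", C = "Root")
weight <- c(A = 0.6, B = 0.5, C = 0.3)  # P(event occurs | parent event occurred)

# Step 1: simulate error-free data from the conditional probabilities.
simulate_tree <- function(N, parent, weight) {
  events <- names(parent)
  x <- matrix(0L, nrow = N, ncol = length(events),
              dimnames = list(NULL, events))
  for (e in events) {
    # The root is always present; any other parent has already been simulated.
    parent_on <- if (parent[[e]] == "Root") rep(TRUE, N) else x[, parent[[e]]] == 1L
    x[, e] <- as.integer(parent_on & (runif(N) < weight[[e]]))
  }
  x
}

# Step 2: 'corrupt' the sample. A true 0 becomes 1 with probability epos,
# a true 1 becomes 0 with probability eneg (assumed independent per cell).
corrupt <- function(x, epos, eneg) {
  u <- matrix(runif(length(x)), nrow = nrow(x))
  flip <- ifelse(x == 1L, u < eneg, u < epos)
  x[flip] <- 1L - x[flip]
  x
}

set.seed(1)  # arbitrary seed, for reproducibility only
clean <- simulate_tree(200, parent, weight)
noisy <- corrupt(clean, epos = 0.05, eneg = 0.10)
```

In the error-free sample, event B never occurs without its parent A; after corruption that constraint can be violated, which is what the error model is meant to allow.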

Method “D1” calculates the joint distribution generated by the tree exactly (using distribution.oncotree), and the observations are generated by sampling this distribution. Thus if with.errors=TRUE and the tree is large, this method might fail due to the exponential growth in the number of potential outcomes. On the other hand, for a moderately sized tree and a large desired sample size N this is the most efficient method.
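
To see why the exact approach scales exponentially, the following base R sketch enumerates every outcome of a toy tree, folds the errors into the distribution, and then samples from it. It is an illustration only and does not call distribution.oncotree: the tree, the helper names joint_distribution and observed_distribution, and the error rates are hypothetical, and independent per-event errors are assumed.

```r
# Hypothetical toy tree: Root -> A -> B and Root -> C (parents listed first).
parent <- c(A = "Root", B = "A", C = "Root")
weight <- c(A = 0.6, B = 0.5, C = 0.3)

# Error-free joint distribution over all 2^k outcomes of k events.
joint_distribution <- function(parent, weight) {
  events <- names(parent)
  outcomes <- as.matrix(expand.grid(rep(list(0:1), length(events))))
  colnames(outcomes) <- events
  prob <- rep(1, nrow(outcomes))
  for (e in events) {
    parent_on <- if (parent[[e]] == "Root") rep(TRUE, nrow(outcomes)) else
      outcomes[, parent[[e]]] == 1
    p_on <- ifelse(parent_on, weight[[e]], 0)  # a child cannot occur without its parent
    prob <- prob * ifelse(outcomes[, e] == 1, p_on, 1 - p_on)
  }
  list(outcomes = outcomes, prob = prob)
}

# Distribution of the observed (error-prone) outcomes: sum over true outcomes t
# of P(t) * P(observed o | t). The noise matrix has 2^k x 2^k entries.
observed_distribution <- function(dist, epos, eneg) {
  out <- dist$outcomes
  noise <- sapply(seq_len(nrow(out)), function(o) {
    sapply(seq_len(nrow(out)), function(t) {
      prod(ifelse(out[t, ] == 1,
                  ifelse(out[o, ] == 1, 1 - eneg, eneg),
                  ifelse(out[o, ] == 1, epos, 1 - epos)))
    })
  })  # noise[t, o]
  list(outcomes = out, prob = as.vector(dist$prob %*% noise))
}

dist <- joint_distribution(parent, weight)
obs  <- observed_distribution(dist, epos = 0.05, eneg = 0.10)

set.seed(1)  # arbitrary seed, for reproducibility only
rows <- sample(nrow(obs$outcomes), size = 1000, replace = TRUE, prob = obs$prob)
sim  <- obs$outcomes[rows, , drop = FALSE]
```

Once the distribution is available, each additional observation costs only one draw, which is consistent with this approach paying off for large N on a moderately sized tree. The enumeration itself doubles in size with every added event.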

Method “D2” calculates the joint distribution generated by the tree without false positives/negatives, samples from it, and then ‘corrupts’ the resulting sample. If with.errors=FALSE is used then this method is equivalent to method “D1”.
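
The sample-then-corrupt idea of method “D2” can be sketched in a few lines of base R. The distribution below is a hypothetical error-free distribution for a chain Root -> A -> B, written out by hand rather than computed by the package, and the error rates epos and eneg are likewise assumptions made for illustration.

```r
# Hypothetical error-free distribution for the chain Root -> A -> B.
# Only tree-consistent outcomes appear: B cannot occur without A.
outcomes <- rbind(c(A = 0, B = 0),
                  c(A = 1, B = 0),
                  c(A = 1, B = 1))
prob <- c(0.4, 0.3, 0.3)

set.seed(1)  # arbitrary seed, for reproducibility only

# Step 1: sample whole observations from the error-free distribution.
rows  <- sample(nrow(outcomes), size = 1000, replace = TRUE, prob = prob)
clean <- outcomes[rows, , drop = FALSE]

# Step 2: corrupt each cell independently (assumed error model).
epos <- 0.05  # P(observe 1 | true 0)
eneg <- 0.10  # P(observe 0 | true 1)
u    <- matrix(runif(length(clean)), nrow = nrow(clean))
flip <- ifelse(clean == 1, u < eneg, u < epos)
noisy <- clean
noisy[flip] <- 1 - noisy[flip]
```

Under this assumed error model, the error-free distribution only needs the tree-consistent outcomes (three of the four possible here), whereas a distribution that includes errors assigns positive probability to every outcome.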

Value

A data set where each row is an independent observation.

Author(s)

Aniko Szabo

See Also

oncotree.fit

Examples

   data(ov.cgh)
   ov.tree <- oncotree.fit(ov.cgh)
   
   set.seed(7365)
   rd <- generate.data(200, ov.tree, with.errors=TRUE)
   
   #compare timing of methods
   system.time(generate.data(20, ov.tree, with.errors=TRUE, method="S"))
   system.time(generate.data(20, ov.tree, with.errors=TRUE, method="D1"))
   system.time(generate.data(20, ov.tree, with.errors=TRUE, method="D2"))

Results


> library(Oncotree)
Loading required package: boot
> ### Name: generate.data
> ### Title: Generate random data from an oncogenetic tree
> ### Aliases: generate.data
> ### Keywords: datagen models
> 
> ### ** Examples
> 
>    data(ov.cgh)
>    ov.tree <- oncotree.fit(ov.cgh)
>    
>    set.seed(7365)
>    rd <- generate.data(200, ov.tree, with.errors=TRUE)
>    
>    #compare timing of methods
>    system.time(generate.data(20, ov.tree, with.errors=TRUE, method="S"))
   user  system elapsed 
      0       0       0 
>    system.time(generate.data(20, ov.tree, with.errors=TRUE, method="D1"))
   user  system elapsed 
  0.752   0.036   0.785 
>    system.time(generate.data(20, ov.tree, with.errors=TRUE, method="D2"))
   user  system elapsed 
  0.040   0.000   0.043 