Last data update: 2014.03.03

R: Simulation of data for use in NHEMOtree
Sim_Data    R Documentation

Simulation of data for use in NHEMOtree

Description

Simulation of data with one grouping variable containing four classes and 10 explanatory variables (X1 to X10). Variables X1 to X3 are informative for separating the four classes. Variable X1 separates class 1, X2 separates class 1 and class 2, and X3 separates class 3 from class 4. Variables X4, X5, and X6 are created on the basis of X3 and can also be used to separate class 3 from class 4, but with decreased prediction accuracy.

Usage

Sim_Data(Obs, VG=1, VP1=0.05, VP2=0.1, VP3=0.3)

Arguments

Obs

Number of observations.

VG

Overall accuracy of the data separation, in [0,1]; VG=1 (default) gives perfect separation.

VP1

Decrease of prediction accuracy for variable X4 in comparison with X3 to separate class 3 from class 4 (default: VP1=0.05).

VP2

Decrease of prediction accuracy for variable X5 in comparison with X3 to separate class 3 from class 4 (default: VP2=0.1).

VP3

Decrease of prediction accuracy for variable X6 in comparison with X3 to separate class 3 from class 4 (default: VP3=0.3).

Details

This function simulates data with one grouping variable containing four classes and 10 explanatory variables, X1 to X10.

Variable X1 separates class 1, X2 separates class 1 and class 2, and X3 separates class 3 from class 4. For all samples belonging to the corresponding classes, the explanatory variables X1 to X3 are drawn from a normal distribution with μ=80 and σ^2=25. Samples that are not allocated to the corresponding class are drawn from a uniform distribution with minimum 0 and an adjustable maximum value. The maximum of each uniform distribution is the smallest randomly drawn value of the respective variable.
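
The sampling scheme described above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's source code: the class labels "A" to "D", the object names, and the reading that the uniform maximum equals the smallest normal draw are all assumptions made for this example.

```r
set.seed(1)                                  # arbitrary seed, for reproducibility only
cls <- rep(c("A", "B", "C", "D"), each = 50) # assumed class labels, 50 samples each
n <- length(cls)

in_class <- cls == "A"                       # samples the variable is meant to separate
x1 <- numeric(n)

# Informative samples: normal with mean 80 and variance 25 (sd = 5)
x1[in_class] <- rnorm(sum(in_class), mean = 80, sd = 5)

# Remaining samples: uniform on [0, smallest normal draw] (assumed reading)
upper <- min(x1[in_class])
x1[!in_class] <- runif(sum(!in_class), min = 0, max = upper)

# The two groups do not overlap, so a single threshold separates them
range(x1[in_class])
range(x1[!in_class])
```

Because the uniform maximum is tied to the smallest normal draw, the informative and non-informative values never overlap in this sketch, which corresponds to perfect separation before any class noise is applied.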

Variables X4, X5, and X6 are created on the basis of X3 and also separate class 3 from class 4. However, the prediction accuracy of these variables decreases gradually; the decrease is set by 'VP1', 'VP2', and 'VP3'. To this end, the corresponding proportion of the discriminating samples of class 3 is disturbed by assigning a value drawn from a uniform distribution. Accordingly, X4, X5, and X6 discriminate class 3 less well than X3. X7 to X10 are noise variables drawn from a normal distribution and contain no information.
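
One way to picture the disturbance step is the following base R sketch. It is a hedged illustration, not the package implementation: the helper `disturb()`, the label "C" for class 3, and the choice of the uniform range are hypothetical and chosen only for this example.

```r
set.seed(2)                                   # arbitrary seed, for reproducibility only
cls <- rep(c("A", "B", "C", "D"), each = 100) # assumed class labels
is3 <- cls == "C"                             # "C" stands in for class 3 (assumption)

# An X3-like variable: normal for class 3, uniform below the smallest draw otherwise
x3 <- numeric(length(cls))
x3[is3] <- rnorm(sum(is3), mean = 80, sd = 5)
upper <- min(x3[is3])
x3[!is3] <- runif(sum(!is3), min = 0, max = upper)

# Hypothetical helper: overwrite a fraction `frac` of the target-class values
# with uniform draws, so they no longer stand out from the other classes
disturb <- function(x, target, frac, upper) {
  idx <- which(target)
  k <- round(frac * length(idx))
  hit <- idx[sample.int(length(idx), k)]
  x[hit] <- runif(k, min = 0, max = upper)
  x
}

x4 <- disturb(x3, is3, frac = 0.05, upper = upper)  # mirrors the VP1 default
x5 <- disturb(x3, is3, frac = 0.10, upper = upper)  # mirrors the VP2 default
x6 <- disturb(x3, is3, frac = 0.30, upper = upper)  # mirrors the VP3 default

# Number of class-3 samples that lost their discriminating value
c(X4 = sum(x4 != x3), X5 = sum(x5 != x3), X6 = sum(x6 != x3))
```

Samples outside class 3 are left untouched in this sketch, which matches the example output below, where X3 to X6 are identical for the class "A" rows.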

Noise is added to the class assignment by means of a binomial distribution: each sample keeps its intended class only with probability "VG" and is assigned to one of the other classes with probability 1-"VG".
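
The label noise can be sketched as follows. This is a minimal base R illustration under assumptions: the class labels, the object names, and the uniform choice among the remaining three classes are not taken from the package source.

```r
set.seed(3)                                   # arbitrary seed, for reproducibility only
classes <- c("A", "B", "C", "D")              # assumed class labels
intended <- rep(classes, each = 50)           # class each sample was generated for
VG <- 0.9                                     # example value; the default VG=1 adds no noise

# One Bernoulli (binomial, size 1) draw per sample: TRUE means the label is kept
keep <- rbinom(length(intended), size = 1, prob = VG) == 1

# Samples that are not kept receive one of the other three classes at random
observed <- intended
flip <- which(!keep)
observed[flip] <- vapply(
  intended[flip],
  function(cl) sample(setdiff(classes, cl), 1),
  character(1),
  USE.NAMES = FALSE
)

# Share of samples whose observed label still matches the intended one
mean(observed == intended)
```

With VG=1 every draw is TRUE, so `observed` equals `intended` and the separation stays perfect.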

Variable costs correlate with prediction accuracy, so variables containing more information are more expensive than variables containing less or no information. The costs of the variables are generated with the function "Sim_Costs".

Author(s)

Swaantje Casjens

See Also

Sim_Costs, NHEMOtree

Examples

  d<- Sim_Data(Obs=200)
  head(d)

Results


> library(NHEMOtree)
Loading required package: partykit
Loading required package: grid
Loading required package: emoa
Loading required package: sets
Loading required package: rpart
> ### Name: Sim_Data
> ### Title: Simulation of data for use in NHEMOtree
> ### Aliases: Sim_Data
> ### Keywords: Non-hierarchical evolutionary multi-objective tree learner
> 
> ### ** Examples
> 
>   d<- Sim_Data(Obs=200)
>   head(d)
  Y2       X1       X2        X3        X4        X5        X6       X7
1  A 68.84104 78.46287  4.098282  4.098282  4.098282  4.098282 6.325403
2  A 79.16130 81.18937 17.852013 17.852013 17.852013 17.852013 4.753584
3  A 77.08088 78.68188 33.500370 33.500370 33.500370 33.500370 5.095474
4  A 81.14431 80.93213 60.010588 60.010588 60.010588 60.010588 6.097924
5  A 82.86189 73.66258 35.868174 35.868174 35.868174 35.868174 5.136485
6  A 85.85443 82.96483 31.322616 31.322616 31.322616 31.322616 6.215519
        X8       X9      X10
1 5.859676 5.009940 3.926446
2 4.613853 4.959480 4.079549
3 6.946333 7.009725 4.675621
4 3.296312 4.960158 5.176186
5 4.074919 6.958616 3.906102
6 5.127719 4.173529 3.784175