Last data update: 2014.03.03

quadBoundaryFunc    R Documentation

Functions for Simulating Data

Description

These functions simulate the two-class data sets used in the text (Applied Predictive Modeling).

Usage

quadBoundaryFunc(n)

easyBoundaryFunc(n, intercept = 0, interaction = 2)

Arguments

n

the sample size

intercept

the coefficient for the logistic regression intercept term

interaction

the coefficient for the logistic regression interaction term

Details

The quadBoundaryFunc function creates a class boundary that is a quadratic function of both predictors. The class probabilities come from a logistic regression model whose linear predictor (the log-odds of the first class) is -1 - 2*X1 - 0.2*X1^2 + 2*X2^2. The predictors are multivariate normal with mean (1, 0) and a moderate positive correlation.
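The data-generating process described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the covariance matrix `matrix(c(1, 0.7, 0.7, 2), 2, 2)` is an assumed stand-in for the "moderate" positive correlation, the predictors are drawn with a Cholesky factor rather than a multivariate normal sampler from another package, and `simQuadBoundary()` and `quadBoundaryX2()` are hypothetical names. The second function sets the log-odds to zero (prob = 0.5) and solves for X2 to trace the class boundary.

```r
## Hypothetical sketch of the process described for quadBoundaryFunc();
## the covariance matrix below is an assumption, not the package's value.
simQuadBoundary <- function(n, sigma = matrix(c(1, 0.7, 0.7, 2), 2, 2)) {
  # Bivariate normal predictors with mean (1, 0): Z %*% chol(sigma) has
  # covariance sigma when the rows of Z are independent N(0, I) draws
  z <- matrix(rnorm(2 * n), nrow = n, ncol = 2)
  x <- sweep(z %*% chol(sigma), 2, c(1, 0), "+")
  dat <- data.frame(X1 = x[, 1], X2 = x[, 2])
  # Linear predictor (log-odds of Class1) from the Details section
  eta <- -1 - 2 * dat$X1 - 0.2 * dat$X1^2 + 2 * dat$X2^2
  dat$prob <- plogis(eta)  # inverse logit: true P(Class1)
  # Draw Class1 with probability `prob`, otherwise Class2
  dat$class <- factor(ifelse(runif(n) <= dat$prob, "Class1", "Class2"),
                      levels = c("Class1", "Class2"))
  dat
}

## The 50% boundary is where eta = 0. Solving for X2 gives
## X2 = +/- sqrt((1 + 2*X1 + 0.2*X1^2) / 2), defined where the radicand >= 0.
quadBoundaryX2 <- function(x1) {
  r <- (1 + 2 * x1 + 0.2 * x1^2) / 2
  # Upper branch only; the lower branch is its negative. NA where undefined.
  ifelse(r >= 0, sqrt(pmax(r, 0)), NA_real_)
}

set.seed(975)
sim <- simQuadBoundary(500)
table(sim$class)
```

Because X2 enters the log-odds only through X2^2, the boundary is symmetric about X2 = 0.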

Similarly, the easyBoundaryFunc function uses a logistic regression model with linear predictor intercept - 4*X1 + 4*X2 + interaction*X1*X2, where intercept and interaction are the function arguments. The predictors are multivariate normal with mean (1, 0) and a strong positive correlation.
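A matching base-R sketch for the easyBoundaryFunc model follows. Again this is an illustration, not the package's code: the covariance matrix `matrix(c(2, 1.6, 1.6, 2), 2, 2)` (correlation 0.8) is an assumed value for the "strong" positive correlation, and `simEasyBoundary()` and `easyBoundaryX2()` are hypothetical names. Setting the log-odds to zero and solving for X2 gives the 50% boundary in closed form.

```r
## Hypothetical sketch of the process described for easyBoundaryFunc();
## the covariance matrix is an assumed "strong" positive correlation.
simEasyBoundary <- function(n, intercept = 0, interaction = 2,
                            sigma = matrix(c(2, 1.6, 1.6, 2), 2, 2)) {
  # Bivariate normal predictors with mean (1, 0), as stated in Details
  z <- matrix(rnorm(2 * n), nrow = n, ncol = 2)
  x <- sweep(z %*% chol(sigma), 2, c(1, 0), "+")
  dat <- data.frame(X1 = x[, 1], X2 = x[, 2])
  # Log-odds of Class1: intercept - 4*X1 + 4*X2 + interaction*X1*X2
  eta <- intercept - 4 * dat$X1 + 4 * dat$X2 + interaction * dat$X1 * dat$X2
  dat$prob <- plogis(eta)
  dat$class <- factor(ifelse(runif(n) <= dat$prob, "Class1", "Class2"),
                      levels = c("Class1", "Class2"))
  dat
}

## eta = 0  =>  X2 * (4 + interaction*X1) = 4*X1 - intercept, so
## X2 = (4*X1 - intercept) / (4 + interaction*X1), undefined at X1 = -4/interaction.
easyBoundaryX2 <- function(x1, intercept = 0, interaction = 2) {
  (4 * x1 - intercept) / (4 + interaction * x1)
}

set.seed(615)
sim <- simEasyBoundary(200, interaction = 3, intercept = 3)
table(sim$class)
```

With interaction = 0 the boundary reduces to the straight line X2 = X1 - intercept/4; a nonzero interaction bends it into a hyperbola.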

Value

Both functions return data frames with columns

X1

numeric predictor value

X2

numeric predictor value

prob

numeric value giving the true probability of the first class ('Class1')

class

a factor variable with levels 'Class1' and 'Class2'
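Because prob is the true probability of the first class, it yields the Bayes-optimal prediction (Class1 whenever prob > 0.5), and that rule's accuracy on a simulated sample is a useful reference point when judging fitted models. The sketch below assumes the column layout listed above; `bayesRule()` is a hypothetical helper, not part of the package, and must be applied before prob is removed (the examples drop it with `prob <- NULL`). The toy data frame is made-up input for illustration only.

```r
## Hypothetical helper: the Bayes-optimal classifier for these simulations
## predicts Class1 whenever the true probability `prob` exceeds 0.5.
bayesRule <- function(dat) {
  stopifnot(all(c("prob", "class") %in% names(dat)))
  pred <- factor(ifelse(dat$prob > 0.5, "Class1", "Class2"),
                 levels = levels(dat$class))
  # Proportion of rows where the Bayes prediction matches the drawn class
  list(pred = pred, accuracy = mean(pred == dat$class))
}

## Toy data frame with the same columns the simulation functions return
toy <- data.frame(X1 = c(0.1, -0.3, 1.2, 0.8),
                  X2 = c(0.5, 0.2, -1.0, 0.4),
                  prob = c(0.9, 0.2, 0.6, 0.4),
                  class = factor(c("Class1", "Class2", "Class2", "Class2"),
                                 levels = c("Class1", "Class2")))
bayesRule(toy)$accuracy  # 0.75: the third row is misclassified
```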

Author(s)

Max Kuhn

Examples

## in Chapter 11, 'Measuring Performance in Classification Models'
set.seed(975)
training <- quadBoundaryFunc(500)
testing <- quadBoundaryFunc(1000)

## in Chapter 20, 'Factors That Can Affect Model Performance'
set.seed(615)
dat <- easyBoundaryFunc(200, interaction = 3, intercept = 3)
## standardize both predictors (scale() returns a one-column matrix)
dat$X1 <- scale(dat$X1)
dat$X2 <- scale(dat$X2)
## add a column labelling these rows as the "Original" data
dat$Data <- "Original"
## drop the true class probabilities, keeping the predictors and class
dat$prob <- NULL

## in Chapter X, 'An Introduction to Feature Selection'

set.seed(874)
reliefEx3 <- easyBoundaryFunc(500)
## standardize both predictors, then drop the true probabilities
reliefEx3$X1 <- scale(reliefEx3$X1)
reliefEx3$X2 <- scale(reliefEx3$X2)
reliefEx3$prob <- NULL
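In these examples, scale() returns a one-column matrix carrying "scaled:center" and "scaled:scale" attributes, so assigning it with `$<-` stores X1 and X2 as matrix columns rather than plain numeric vectors. If a plain vector is preferred, wrapping the result in as.vector() drops the matrix structure and attributes. The base-R check below uses toy values; the as.vector() variant is an optional alternative, not what the examples do.

```r
## Base-R illustration of what scale() returns (toy input values)
x <- c(2, 4, 6, 8)
s <- scale(x)                      # centered by the mean, divided by the sd
is.matrix(s)                       # TRUE: a 4 x 1 matrix
attr(s, "scaled:center")           # 5
v <- as.vector(scale(x))           # plain numeric vector, attributes dropped

toyDat <- data.frame(X1 = x)
toyDat$X1 <- scale(toyDat$X1)      # as in the examples: a matrix column
is.matrix(toyDat$X1)               # TRUE
toyDat$X1 <- as.vector(toyDat$X1)  # optional alternative: plain vector
is.matrix(toyDat$X1)               # FALSE
```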

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(AppliedPredictiveModeling)
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/AppliedPredictiveModeling/quadBoundaryFunc.Rd_%03d_medium.png", width=480, height=480)
> ### Name: quadBoundaryFunc
> ### Title: Functions for Simulating Data
> ### Aliases: quadBoundaryFunc easyBoundaryFunc
> ### Keywords: utilities
> 
> ### ** Examples
> 
> ## in Chapter 11, 'Measuring Performance in Classification Model'
> set.seed(975)
> training <- quadBoundaryFunc(500)
> testing <- quadBoundaryFunc(1000)
>  
> 
> ## in Chapter 20, 'Factors That Can Affect Model Performance'
> set.seed(615)
> dat <- easyBoundaryFunc(200, interaction = 3, intercept = 3)
> dat$X1 <- scale(dat$X1)
> dat$X2 <- scale(dat$X2)
> dat$Data <- "Original"
> dat$prob <- NULL
> 
> ## in Chapter X, 'An Introduction to Feature Selection'
> 
> set.seed(874)
> reliefEx3 <- easyBoundaryFunc(500)
> reliefEx3$X1 <- scale(reliefEx3$X1)
> reliefEx3$X2 <- scale(reliefEx3$X2)
> reliefEx3$prob <- NULL
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>