the coefficient for the logistic regression intercept term
interaction
the coefficient for the logistic regression interaction term
Details
The quadBoundaryFunc function creates a class boundary that is a quadratic function of both predictors. The probability values come from a logistic regression model whose linear predictor (the log-odds) is: -1 - 2*X1 - 0.2*X1^2 + 2*X2^2. The predictors are multivariate normal with mean (1, 0) and a moderate degree of positive correlation.
Similarly, the easyBoundaryFunc function uses a logistic regression model whose linear predictor (the log-odds) is: intercept - 4*X1 + 4*X2 + interaction*X1*X2. The predictors are multivariate normal with mean (1, 0) and a strong positive correlation.
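The two model equations above can be turned into probabilities with the inverse logit. The following is a minimal base R sketch, not the package's implementation: the helper names quad_logit and easy_logit are hypothetical, and the Bernoulli draw used to assign classes is an assumption about how a true probability of the first class would be converted to a label.

```r
# Linear predictors (log-odds) copied from the model equations in Details.
# quad_logit and easy_logit are hypothetical helper names for illustration.
quad_logit <- function(x1, x2) {
  -1 - 2 * x1 - 0.2 * x1^2 + 2 * x2^2
}
easy_logit <- function(x1, x2, intercept, interaction) {
  intercept - 4 * x1 + 4 * x2 + interaction * x1 * x2
}

# plogis() is the inverse logit, 1 / (1 + exp(-z)).
# Evaluate both models at the predictor mean (1, 0).
p_quad <- plogis(quad_logit(1, 0))                                  # log-odds -3.2
p_easy <- plogis(easy_logit(1, 0, intercept = 3, interaction = 3))  # log-odds -1

# Assumption: each class label is a Bernoulli draw with P(Class1) = prob.
set.seed(1)
prob <- plogis(quad_logit(c(1, 0, -1), c(0, 1, 2)))
class <- factor(
  ifelse(runif(length(prob)) <= prob, "Class1", "Class2"),
  levels = c("Class1", "Class2")
)
```

Because the interaction term multiplies X1*X2, it vanishes wherever X2 is 0, so at the predictor mean the easyBoundaryFunc log-odds depend only on the intercept and the X1 coefficient.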
Value
Both functions return a data frame with the following columns:
X1
numeric predictor value
X2
numeric predictor value
prob
numeric value reflecting the true probability of the first class
class
a factor variable with levels 'Class1' and 'Class2'
Author(s)
Max Kuhn
Examples
## in Chapter 11, 'Measuring Performance in Classification Models'
set.seed(975)
training <- quadBoundaryFunc(500)
testing <- quadBoundaryFunc(1000)
## in Chapter 20, 'Factors That Can Affect Model Performance'
set.seed(615)
dat <- easyBoundaryFunc(200, interaction = 3, intercept = 3)
dat$X1 <- scale(dat$X1)
dat$X2 <- scale(dat$X2)
dat$Data <- "Original"
dat$prob <- NULL
## in Chapter X, 'An Introduction to Feature Selection'
set.seed(874)
reliefEx3 <- easyBoundaryFunc(500)
reliefEx3$X1 <- scale(reliefEx3$X1)
reliefEx3$X2 <- scale(reliefEx3$X2)
reliefEx3$prob <- NULL
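One detail of the scaling steps above is worth noting: base R's scale() returns a one-column matrix carrying "scaled:center" and "scaled:scale" attributes, so assigning it directly stores a matrix inside the data frame. The sketch below uses a hypothetical stand-in data frame (not output from easyBoundaryFunc) to show how as.numeric() could be used when a plain numeric column is preferred.

```r
# Hypothetical stand-in for a data frame with X1 and X2 predictor columns
set.seed(1)
dat <- data.frame(X1 = rnorm(10, mean = 1), X2 = rnorm(10))

# scale() centers and scales, returning a 10 x 1 matrix with attributes
scaled_matrix <- scale(dat$X1)

# as.numeric() drops the matrix structure and the scaling attributes
dat$X1 <- as.numeric(scaled_matrix)
```

Either form works for most modeling functions; the plain vector simply avoids surprises when printing or binding data frames together.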
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(AppliedPredictiveModeling)
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/AppliedPredictiveModeling/quadBoundaryFunc.Rd_%03d_medium.png", width=480, height=480)
> ### Name: quadBoundaryFunc
> ### Title: Functions for Simulating Data
> ### Aliases: quadBoundaryFunc easyBoundaryFunc
> ### Keywords: utilities
>
> ### ** Examples
>
> ## in Chapter 11, 'Measuring Performance in Classification Model'
> set.seed(975)
> training <- quadBoundaryFunc(500)
> testing <- quadBoundaryFunc(1000)
>
>
> ## in Chapter 20, 'Factors That Can Affect Model Performance'
> set.seed(615)
> dat <- easyBoundaryFunc(200, interaction = 3, intercept = 3)
> dat$X1 <- scale(dat$X1)
> dat$X2 <- scale(dat$X2)
> dat$Data <- "Original"
> dat$prob <- NULL
>
> ## in Chapter X, 'An Introduction to Feature Selection'
>
> set.seed(874)
> reliefEx3 <- easyBoundaryFunc(500)
> reliefEx3$X1 <- scale(reliefEx3$X1)
> reliefEx3$X2 <- scale(reliefEx3$X2)
> reliefEx3$prob <- NULL
>
> dev.off()
null device
1
>