Last data update: 2014.03.03

updateModel.Res    R Documentation

Update the NeRI-based model using new data or new threshold values

Description

This function takes the frequency-ranked set of variables and generates a new model that contains only the terms meeting the net residual improvement (NeRI) threshold criteria.
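The selection idea can be illustrated with a minimal base-R sketch. It is not the package's implementation. The helpers neri_pvalue() and update_model_sketch() are hypothetical, only lm() fits are used, and a term's NeRI p-value is assumed to come from a one-sided Wilcoxon signed-rank test on the per-sample reduction in absolute residuals. Variables are tried in rank order, terms that are no longer significant are dropped again, and the passes repeat until the selected set stops changing. The repeated passes are meant to mirror the loops return value.

```r
# Minimal sketch (not the package's code): NeRI-style iterative model update with lm().
# Assumption: a term's NeRI p-value comes from a one-sided Wilcoxon signed-rank
# test on the per-sample reduction in absolute residuals.

# Hypothetical helper: NeRI p-value for adding `term` to a model with `base_terms`
neri_pvalue <- function(outcome, base_terms, term, data) {
  reduced <- lm(reformulate(base_terms, response = outcome), data = data)
  full <- lm(reformulate(c(base_terms, term), response = outcome), data = data)
  # Positive gain: the sample is fitted better once `term` is added
  gain <- abs(residuals(reduced)) - abs(residuals(full))
  wilcox.test(gain, alternative = "greater", exact = FALSE)$p.value
}

# Hypothetical helper: forward/backward passes over the ranked variables,
# repeated until the set of selected terms stops changing
update_model_sketch <- function(outcome, ranked_vars, data,
                                pvalue = 0.05, max_loops = 10) {
  selected <- character(0)
  for (loop in seq_len(max_loops)) {
    previous <- selected
    # Forward pass: test the unselected variables in rank order
    for (v in setdiff(ranked_vars, selected)) {
      if (neri_pvalue(outcome, c("1", selected), v, data) < pvalue) {
        selected <- c(selected, v)
      }
    }
    # Backward pass: drop terms whose NeRI is no longer significant
    for (v in selected) {
      others <- setdiff(selected, v)
      if (neri_pvalue(outcome, c("1", others), v, data) >= pvalue) {
        selected <- others
      }
    }
    if (setequal(selected, previous)) break
  }
  list(var.names = selected,
       formula = reformulate(c("1", selected), response = outcome),
       loops = loop)
}

# Usage on simulated data: y depends on x1 and x2 but not on x3
set.seed(1)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 2 * sim$x1 - 1.5 * sim$x2 + rnorm(n)
fit <- update_model_sketch("y", c("x1", "x2", "x3"), sim)
print(fit$var.names)
```

With strong effects for x1 and x2, both should be selected. Whether the noise variable x3 enters depends on chance at the chosen p-value threshold.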

Usage

	updateModel.Res(Outcome,
	                covariates = "1",
	                pvalue = c(0.025, 0.05),
	                VarFrequencyTable,
	                variableList,
	                data,
	                type = c("LM", "LOGIT", "COX"),
	                testType = c("Binomial", "Wilcox", "tStudent"),
	                lastTopVariable = 0,
	                timeOutcome = "Time",
	                interaction = 1,
	                maxTrainModelSize = -1,
	                bootLoops = 1)

Arguments

Outcome

The name of the column in data that stores the variable to be predicted by the model

covariates

A string of the type "1 + var1 + var2" that defines which variables will always be included in the models (as covariates)

pvalue

The maximum p-value associated with the NeRI that is allowed for a term in the model

VarFrequencyTable

An array with the ranked frequencies of the features (e.g. the ranked.var value returned by the ForwardSelection.Model.Res function)

variableList

A data frame with two columns. The first one must contain the names of the candidate variables and the second one their descriptions

data

A data frame where all variables are stored in different columns

type

Fit type: Logistic ("LOGIT"), linear ("LM"), or Cox proportional hazards ("COX")

testType

Type of test used by the improvedResiduals function to evaluate the residual improvement: Binomial test ("Binomial"), Wilcoxon rank-sum test ("Wilcox"), Student's t-test ("tStudent"), or F-test ("Ftest")

lastTopVariable

The maximum number of variables to be tested

timeOutcome

The name of the column in data that stores the time to event (needed only for a Cox proportional hazards regression model fitting)

interaction

Set to 1 for first-order models or to 2 for second-order models

maxTrainModelSize

Maximum number of terms that can be included in the model

bootLoops

The number of bootstrap loops used to estimate the test error
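For illustration, the three tests listed in Usage can be mapped onto base-R tests applied to the per-sample reduction in absolute residuals (gain). This is a hedged sketch and not the internals of improvedResiduals. The following are all assumptions:

- the helpers residual_improvement_tests() and neri_value();
- the definition of gain;
- the one-sided alternatives;
- the conversion of p-values to z-scores with qnorm().

The "Ftest" option is left out because this page does not specify what it compares.

```r
# Sketch only: base-R tests standing in for the testType options.
# Assumption: gain = |residual without term| - |residual with term| per sample.

# Hypothetical helper: one-sided p-values that adding a term improved the fit
residual_improvement_tests <- function(gain) {
  improved <- sum(gain > 0)
  worsened <- sum(gain < 0)
  c(
    # Sign test: are improvements more frequent than worsenings?
    Binomial = binom.test(improved, improved + worsened, p = 0.5,
                          alternative = "greater")$p.value,
    # Wilcoxon signed-rank test on the size of the improvement
    Wilcox = wilcox.test(gain, alternative = "greater", exact = FALSE)$p.value,
    # Student's t-test on the mean improvement
    tStudent = t.test(gain, alternative = "greater")$p.value
  )
}

# Hypothetical NeRI value: net fraction of samples whose residual improved
neri_value <- function(gain) (sum(gain > 0) - sum(gain < 0)) / length(gain)

# Usage with simulated gains that are mostly positive
set.seed(2)
gain <- rnorm(100, mean = 0.8)
p <- residual_improvement_tests(gain)
# Assumed convention: one-sided z-score from each p-value
z <- qnorm(p, lower.tail = FALSE)
print(round(cbind(p, z), 4))
print(neri_value(gain))
```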

Value

final.model

An object of class lm, glm, or coxph containing the final model

var.names

A vector with the names of the features that were included in the final model

formula

An object of class formula with the formula used to fit the final model

z.NeRI

A vector in which each element is the z-score of the NeRI, computed with the selected testType, for one feature in the final model

loops

The number of loops it took for the model to stabilize
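The returned formula combines the covariates string with the selected features. The following is a minimal sketch of that assembly. It assumes that interaction = 2 adds every pairwise product of the selected features, although the package may construct second-order terms differently. build_formula() is a hypothetical helper, and the variable names in the usage line are taken from the stage C data in the Examples.

```r
# Hypothetical helper: assemble a model formula from the covariates string
# and the selected feature names.
# Assumption: interaction = 2 adds all pairwise products (a:b) of the features.
build_formula <- function(outcome, covariates = "1", var.names = character(0),
                          interaction = 1) {
  terms_rhs <- var.names
  if (interaction == 2 && length(var.names) > 1) {
    # combn() passes collapse = ":" on to paste() for each pair
    pairs <- combn(var.names, 2, FUN = paste, collapse = ":")
    terms_rhs <- c(terms_rhs, pairs)
  }
  rhs <- paste(c(covariates, terms_rhs), collapse = " + ")
  as.formula(paste(outcome, "~", rhs))
}

# Usage: a Cox-style formula with one fixed covariate and two selected features
f <- build_formula("Surv(pgtime, pgstat)", covariates = "1 + age",
                   var.names = c("g2", "grade"), interaction = 2)
print(f)
```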

Author(s)

Jose G. Tamez-Pena and Antonio Martinez-Torteya

See Also

updateModel.Bin

Examples

	## Not run: 
	# Start the graphics device driver to save all plots in a pdf format
	pdf(file = "Example.pdf")
	# Load FRESA.CAD, which provides updateModel.Res and the cancerVarNames data
	library(FRESA.CAD)
	# Get the stage C prostate cancer data from the rpart package
	library(rpart)
	data(stagec)
	# Split the stages into several columns
	dataCancer <- cbind(stagec[,c(1:3,5:6)],
	                    gleason4 = 1*(stagec[,7] == 4),
	                    gleason5 = 1*(stagec[,7] == 5),
	                    gleason6 = 1*(stagec[,7] == 6),
	                    gleason7 = 1*(stagec[,7] == 7),
	                    gleason8 = 1*(stagec[,7] == 8),
	                    gleason910 = 1*(stagec[,7] >= 9),
	                    eet = 1*(stagec[,4] == 2),
	                    diploid = 1*(stagec[,8] == "diploid"),
	                    tetraploid = 1*(stagec[,8] == "tetraploid"),
	                    notAneuploid = 1-1*(stagec[,8] == "aneuploid"))
	# Remove the incomplete cases
	dataCancer <- dataCancer[complete.cases(dataCancer),]
	# Load a pre-established data frame with the names and descriptions of all variables
	
	data(cancerVarNames)
	
	# Rank the variables:
	# - Analyzing the raw data
	# - Using a Cox proportional hazards fitting
	# - According to the NeRI
	rankedDataCancer <- univariateRankVariables(variableList = cancerVarNames,
	                                            formula = "Surv(pgtime, pgstat) ~ 1",
	                                            Outcome = "pgstat",
	                                            data = dataCancer,
	                                            categorizationType = "Raw",
	                                            type = "COX",
	                                            rankingTest = "NeRI",
	                                            description = "Description")
	# Get a Cox proportional hazards model using:
	# - 10 bootstrap loops
	# - The ranked variables
	# - The Wilcoxon rank-sum test as the feature inclusion criterion
	cancerModel <- ForwardSelection.Model.Res(loops = 10,
	                                          Outcome = "pgstat",
	                                          variableList = rankedDataCancer,
	                                          data = dataCancer,
	                                          type = "COX",
	                                          testType = "Wilcox",
	                                          timeOutcome = "pgtime")
	# Update the model as a second-order model (interaction = 2), i.e. adding pairwise interactions
	
	uCancerModel <- updateModel.Res(Outcome = "pgstat",
	        VarFrequencyTable = cancerModel$ranked.var,
	        variableList = cancerVarNames,
	        data = dataCancer,
	        type = "COX",
	        testType = "Wilcox",
	        timeOutcome = "pgtime",
	        interaction = 2)
	# Shut down the graphics device driver
	dev.off()
## End(Not run)
