R Graphical Manual

Last data update: 2014.03.03

mdr.cv

R Documentation

A function to perform MDR on a dataset using k-fold cross-validation for internal validation.

Description

Determines the best MDR model up to a specified interaction size K by maximizing balanced accuracy (the mean of sensitivity and specificity), using k-fold cross-validation for internal validation. The function mdr.cv is essentially a wrapper for the function mdr.
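
Balanced accuracy, as defined above, is the mean of sensitivity and specificity. The following is a minimal sketch of that calculation on made-up labels; the function name `balanced_accuracy` and the vectors `truth` and `pred` are hypothetical and are not part of the MDR package. The package output reports accuracies on a percentage scale, whereas this sketch returns a proportion.

```r
# Balanced accuracy: the mean of sensitivity and specificity.
# `truth` and `pred` are hypothetical 0/1 vectors (1 = case / predicted case).
balanced_accuracy <- function(truth, pred) {
  sensitivity <- sum(pred == 1 & truth == 1) / sum(truth == 1)
  specificity <- sum(pred == 0 & truth == 0) / sum(truth == 0)
  (sensitivity + specificity) / 2
}

truth <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
pred  <- c(1, 1, 1, 0, 0, 0, 0, 1, 1, 0)

ba <- balanced_accuracy(truth, pred)
print(ba)
```

Because sensitivity and specificity are weighted equally, this measure is not dominated by the larger class when cases and controls are imbalanced.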

Usage

mdr.cv(data, K, cv, ratio = NULL, equal = "HR", genotype = c(0, 1, 2))

Arguments

`data`	the dataset; an n by (p+1) matrix where the first column is the binary response vector (coded 0 or 1) and the remaining columns are the p SNP genotypes (coded numerically)
`K`	the highest level of interaction to consider
`cv`	the number of cross-validation intervals; for k-fold cross-validation, cv=k
`ratio`	the case/control ratio threshold to ascribe high-risk/low-risk status of a genotype combination
`equal`	how to treat genotype combinations with case/control ratio equal to the threshold; default is "HR" for high-risk, but can also consider "LR" for low-risk
`genotype`	a numeric vector of possible genotypes arising in `data`; default is c(0,1,2), but this vector can be longer or shorter depending on whether more or fewer than three genotypes are possible
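
The arguments above can be illustrated with a small sketch that builds a toy dataset in the documented layout and labels each two-locus genotype combination as high-risk (1) or low-risk (0). Everything here is hypothetical: the data are simulated, the variable names are invented, and the threshold is set to the overall case/control ratio of the sample, which is an assumption for illustration rather than a statement of what mdr.cv does when `ratio = NULL`.

```r
set.seed(1)
n <- 200

# Hypothetical data in the documented layout: binary response first,
# followed by SNP genotypes coded 0/1/2
toy <- cbind(
  y    = rbinom(n, 1, 0.5),
  snp1 = sample(0:2, n, replace = TRUE),
  snp2 = sample(0:2, n, replace = TRUE)
)

genotype <- c(0, 1, 2)
# All genotype combinations for the two loci (first locus varies fastest)
combos <- expand.grid(snp1 = genotype, snp2 = genotype)

# Count cases and controls falling in each combination
cases <- controls <- integer(nrow(combos))
for (i in seq_len(nrow(combos))) {
  in_cell <- toy[, "snp1"] == combos$snp1[i] & toy[, "snp2"] == combos$snp2[i]
  cases[i]    <- sum(in_cell & toy[, "y"] == 1)
  controls[i] <- sum(in_cell & toy[, "y"] == 0)
}

# Assumed threshold: the overall case/control ratio in the toy sample
ratio <- sum(toy[, "y"] == 1) / sum(toy[, "y"] == 0)
equal <- "HR"   # ties go to high-risk, mirroring the documented default

# Compare cases against ratio * controls to avoid dividing by zero
high_risk <- ifelse(cases > ratio * controls, 1L,
             ifelse(cases == ratio * controls, as.integer(equal == "HR"), 0L))

print(cbind(combos, cases, controls, high_risk))
```

Setting `equal <- "LR"` in this sketch would send ties to the low-risk group instead.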

Details

MDR is a non-parametric data-mining approach to variable selection designed to detect gene-gene or gene-environment interactions in case-control studies. This function uses balanced accuracy as the evaluation measure to rank potential models. An overall best model is chosen to maximize balanced accuracy, with internal validation guarding against over-fitting. This function uses cv-fold cross-validation to separate the data into training and testing sets: the data are randomly split into cv equal pieces, (cv-1)/cv of the data is used for training (model building) and the remaining 1/cv for testing (prediction), and this procedure is repeated cv times.
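
The cross-validation split described above can be sketched in a few lines of base R. This is an illustration of the general k-fold idea only; the sample size `n` is made up, and how mdr.cv itself assigns rows to folds (for example, when n is not a multiple of cv) is not stated in this page.

```r
set.seed(42)
n  <- 103   # hypothetical number of subjects
cv <- 5     # number of folds

# Randomly assign each row to one of cv folds of (nearly) equal size
fold <- sample(rep(seq_len(cv), length.out = n))

for (k in seq_len(cv)) {
  test_idx  <- which(fold == k)   # roughly 1/cv of the data, held out for testing
  train_idx <- which(fold != k)   # roughly (cv-1)/cv of the data, used for training
  # a model would be built on train_idx and evaluated on test_idx here
  cat("fold", k, ": train =", length(train_idx), ", test =", length(test_idx), "\n")
}

fold_sizes <- table(fold)
```

Each row appears in exactly one testing set, so every observation is used for prediction once and for training cv-1 times.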

Value

An object of class 'mdr', which is a list containing:

`final model`	a numeric vector of the predictors included in the final model
`final model accuracy`	the prediction (testing) balanced accuracy of the final model
`top models`	a list containing the best model (with maximum BA) for each level of interaction, from 1 to `K`
`top model accuracies`	a matrix containing the classification (training) accuracy, prediction (testing) accuracy, and cross-validation consistency for each level of interaction, from 1 to `K`
`high-risk/low-risk`	a vector of the high-risk/low-risk parameterizations of the genotype combinations for the final model
`genotypes`	the numeric vector of possible genotypes specified
`validation method`	"CV", since cross-validation was utilized for internal validation
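
Because the component names listed above contain spaces, they need backticks with `$` or quotes with `[[ ]]`. The sketch below uses a hand-built stand-in list, `fit_like`, filled with values copied from the Results section of this page; it is not produced by mdr.cv and carries only three of the components.

```r
# Hand-built stand-in with the same component names as a printed 'mdr' object
fit_like <- list(
  `final model` = matrix(c(4, 9), nrow = 1),
  `final model accuracy` = c(`prediction accuracy` = 64.35354),
  `validation method` = "CV"
)

# Names with spaces: use backticks with $, or a quoted string with [[ ]]
print(fit_like$`final model`)
print(fit_like[["final model accuracy"]])

# Flatten the one-row matrix into a plain vector of predictor indices
best_snps <- as.vector(fit_like$`final model`)
print(best_snps)
```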

...

Warning

MDR is a combinatorial search approach, so considering high-order interactions (i.e. large values for K) can be computationally expensive.
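
To give a rough sense of the growth this warning refers to, the sketch below counts the candidate models at each interaction level for a hypothetical panel of `p = 100` SNPs with three genotypes each. It counts models and genotype cells only, and makes no claim about actual running time in the package.

```r
p <- 100    # hypothetical number of SNPs
K <- 1:4    # interaction levels to consider

n_models <- choose(p, K)   # number of SNP subsets of each size
n_cells  <- 3^K            # genotype combinations per model with 3 genotypes

search_size <- data.frame(K = K, models = n_models, cells_per_model = n_cells)
print(search_size)
```

With cross-validation, each of these candidate models is evaluated once per fold, so the work scales with cv as well.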

Note

When determining the high-risk/low-risk status of a genotype combination, the order of combinations uses the convention that the genotypes of the first locus vary the most, based on the function expand.grid. For instance, with 3 genotypes (0,1,2), a two-way interaction results in the following 9 combinations: (0,0), (1,0), (2,0), (0,1), (1,1), (2,1), (0,2), (1,2), (2,2).
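
The ordering in this note can be reproduced directly with expand.grid. In the sketch below the column names `locus1` and `locus2` are arbitrary, and the `status` vector is the `high-risk/low-risk` vector printed in the Results section of this page; reading 1 as high-risk is an assumption.

```r
genotype <- c(0, 1, 2)

# expand.grid varies its first argument fastest
combos <- expand.grid(locus1 = genotype, locus2 = genotype)
labels <- paste0("(", combos$locus1, ",", combos$locus2, ")")
print(labels)

# Pair each combination with a 0/1 status vector of the same length
status <- c(0, 0, 1, 0, 1, 1, 0, 1, 1)
risk_table <- data.frame(combination = labels, high_risk = status)
print(risk_table[risk_table$high_risk == 1, ])
```

The same approach extends to higher-order interactions by passing one genotype vector per locus to expand.grid.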

Author(s)

Stacey Winham

References

Ritchie et al (2001). Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet 69, 138-147.

Hahn LW, Ritchie MD, Moore JH (2003). Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions. Bioinformatics 19(3):376-82.

Velez et al (2007). A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction. Genet Epidemiol 31, 306-315.

Motsinger AA, Ritchie MD (2006). The effect of reduction in cross-validation intervals on the performance of multifactor dimensionality reduction. Genet Epidemiol 30(6):546-55.

Examples

#load the package and the test data
library(MDR)
data(mdr1)

#fit MDR with 5-fold cross-validation to a subset of the sample data, allowing for 1- to 2-way interactions
fit <- mdr.cv(data = mdr1[, 1:11], K = 2, cv = 5, ratio = NULL, equal = "HR", genotype = c(0, 1, 2))

print(fit) #view the fitted mdr object

summary(fit) #create summary table of best MDR model

#create contingency plot of best MDR model; may need to expand the plot window for large values of K
plot(fit, data = mdr1)

Results


> library(MDR)
Loading required package: lattice
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/MDR/mdr.cv.Rd_%03d_medium.png", width=480, height=480)
> #load test data
> data(mdr1)
> 
> fit<-mdr.cv(data=mdr1[,1:11], K=2, cv=5, ratio = NULL, equal = "HR", genotype = c(0, 1, 2)) #fit MDR with 5-fold cross-validation to a subset of the sample data, allowing for 1 to 2-way interactions
> 
> print(fit) #view the fitted mdr object
$`final model`
     [,1] [,2]
[1,]    4    9

$`final model accuracy`
prediction accuracy 
           64.35354 

$`top models`
$`top models`[[1]]
     [,1]
[1,]    9

$`top models`[[2]]
     [,1] [,2]
[1,]    4    9


$`top model accuracies`
     classification accuracy prediction accuracy cross-validation consistency
[1,]                62.29951            59.03734                            4
[2,]                67.28735            64.35354                            5

$`high-risk/low-risk`
[1] 0 0 1 0 1 1 0 1 1

$genotypes
[1] 0 1 2

$`validation method`
[1] "CV"

attr(,"class")
[1] "mdr"
> 
> summary(fit) #create summary table of best MDR model
  Level    Best Models      Classification Accuracy    Prediction Accuracy    Cross-Validation Consistency
      1              9                        62.30                  59.04                               4
*     2              4 9                      67.29                  64.35                               5

'*' indicates overall best model
> plot(fit, data=mdr1) #create contingency plot of best MDR model; may need to expand the plot window for large values of K
> dev.off()
null device 
          1 
>