Number of folds to estimate classification error rate, only when no testing data is provided. Default is K=10.
B
Number of replications of the cross-validation. Default is B=20.
nbf.cv
Number of factors used in the cross-validation that computes the error rate, only when no testing data is provided. By default, nbf.cv = NULL and the number of factors is estimated for each fold of the cross-validation. nbf.cv can also be set to a positive integer value. If nbf.cv = 0, the data are not factor-adjusted.
method
The method used to fit the supervised classification model. Three options are available. If
method = "glmnet", a Lasso-penalized logistic regression is fitted using the glmnet R package.
If method = "sda", an LDA or DDA (see the diagonal argument) is fitted by Shrinkage Discriminant
Analysis using the sda R package. If method = "sparseLDA", a Lasso-penalized LDA is fitted using
the sparseLDA R package.
sda.method
The method used for variable selection, only if method = "sda". If sda.method = "lfdr",
variables are selected through CAT scores and False Non-Discovery Rate control. If sda.method = "HC",
variables are selected by Higher Criticism Thresholding.
alpha
The proportion of the HC objective to be observed, only if method="sda" and sda.method="HC". Default is 0.1.
...
Additional arguments to tune the classification method. See the documentation of the chosen method (glmnet, sda or sparseLDA) for more information about these parameters.
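To make the sda.method = "HC" and alpha arguments more concrete, here is a minimal base R sketch of Higher Criticism thresholding on a vector of p-values. It uses the textbook form of the HC objective and assumes that alpha restricts the search to the smallest alpha fraction of the p-values. The function hc_threshold and the simulated p-values are hypothetical illustrations. The sda package may compute the objective differently.

```r
# Hypothetical helper, not part of FADA or sda: textbook Higher Criticism
# thresholding applied to a vector of p-values.
hc_threshold <- function(pval, alpha = 0.1) {
  n <- length(pval)
  p_sorted <- sort(pval)
  i <- seq_len(n)
  # HC objective: standardized gap between the empirical and uniform CDFs
  hc <- sqrt(n) * (i / n - p_sorted) / sqrt(p_sorted * (1 - p_sorted))
  # Assumption: only the smallest alpha fraction of p-values is searched
  keep <- seq_len(max(1, floor(alpha * n)))
  i_max <- keep[which.max(hc[keep])]
  list(threshold = p_sorted[i_max],
       selected  = which(pval <= p_sorted[i_max]))
}

set.seed(42)
pval <- c(runif(95), runif(5, 0, 1e-4))  # simulated: 95 nulls, 5 strong signals
out <- hc_threshold(pval, alpha = 0.1)
out$threshold   # p-value cut-off at the maximum of the HC objective
out$selected    # indices of the variables kept
```

With alpha = 0.1 and 100 p-values, at most 10 variables can be selected in this sketch, because the maximum is searched among the 10 smallest p-values only.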
Value
Returns a list with the following elements:
method
The classification method that was used.
selected
A vector containing the indices of the selected variables.
proba.train
A matrix containing predicted group frequencies of training data.
proba.test
A matrix containing predicted group frequencies of testing data, if a testing data set has been provided.
predict.test
A matrix containing predicted classes of testing data, if a testing data set has been provided.
cv.error
A numeric value containing the average classification error, computed by cross-validation, if no testing data set has been provided.
cv.error.se
A numeric value containing the standard error of the classification error, computed by cross-validation, if no testing data set has been provided.
mod
The fitted classification model. Its class is the class of the model object returned by the chosen method. See the documentation of the chosen method for more details.
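The roles of K, B, cv.error and cv.error.se can be illustrated with a small base R sketch of B replications of a K-fold cross-validation. This is not FADA's implementation: the nearest-centroid classifier is a stand-in, the data are simulated, and the names nearest_centroid and repeated_cv_error are hypothetical. The standard error is taken here as the standard deviation across replications divided by sqrt(B), which is an assumption about the formula.

```r
# Stand-in classifier (not FADA): assign each test row to the closest group mean.
nearest_centroid <- function(xtrain, ytrain, xtest) {
  groups <- sort(unique(ytrain))
  centroids <- sapply(groups, function(g)
    colMeans(xtrain[ytrain == g, , drop = FALSE]))
  # Squared Euclidean distance from each test row to each centroid
  d <- vapply(seq_along(groups), function(j)
    rowSums(sweep(xtest, 2, centroids[, j])^2), numeric(nrow(xtest)))
  d <- matrix(d, nrow = nrow(xtest))
  groups[max.col(-d, ties.method = "first")]
}

# B replications of a K-fold cross-validation of the misclassification rate
repeated_cv_error <- function(x, y, K = 10, B = 20) {
  errs <- replicate(B, {
    fold <- sample(rep(seq_len(K), length.out = nrow(x)))  # random fold labels
    wrong <- logical(nrow(x))
    for (k in seq_len(K)) {
      test <- fold == k
      pred <- nearest_centroid(x[!test, , drop = FALSE], y[!test],
                               x[test, , drop = FALSE])
      wrong[test] <- pred != y[test]
    }
    mean(wrong)  # error rate of this replication
  })
  # Assumed definition of the standard error: sd across replications / sqrt(B)
  list(cv.error = mean(errs), cv.error.se = sd(errs) / sqrt(B))
}

set.seed(1)
n <- 60; p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- rep(c(1, 2), each = n / 2)
x[y == 2, 1] <- x[y == 2, 1] + 2   # shift one variable so the groups differ
cv <- repeated_cv_error(x, y, K = 10, B = 20)
cv$cv.error
cv$cv.error.se
```

Increasing B leaves the expected error unchanged but reduces the variability caused by the random assignment of observations to folds.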
Author(s)
Emeline Perthame, Chloe Friguet and David Causeur
References
Ahdesmaki, M. and Strimmer, K. (2010), Feature selection in omics prediction problems using cat scores and false non-discovery rate control. Annals of Applied Statistics, 4, 503-519.
Clemmensen, L., Hastie, T., Witten, D. and Ersboll, B. (2011), Sparse discriminant analysis. Technometrics, 53(4), 406-413.
Friedman, J., Hastie, T. and Tibshirani, R. (2010), Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.
Friguet, C., Kloareg, M. and Causeur, D. (2009), A factor model approach to multiple testing under dependence. Journal of the American Statistical Association, 104:488, 1406-1415.
Perthame, E., Friguet, C. and Causeur, D. (2015), Stability of feature selection in classification issues for high-dimensional correlated data, Statistics and Computing.
Examples
data(data.train)
data(data.test)
# When testing data set is provided
res = decorrelate.train(data.train)
res2 = decorrelate.test(res, data.test)
classif = FADA(res2,method="sda",sda.method="lfdr")
### Not run
# When no testing data set is provided
# Classification error rate is computed by a K-fold cross validation.
# res = decorrelate.train(data.train)
# classif = FADA(res, method="sda",sda.method="lfdr")
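As a possible follow-up to the example above, the elements listed in the Value section can be inspected directly on the returned list. The sketch below repeats the documented calls and is guarded so that it does nothing if the FADA package is not installed. The print, head and table calls are illustrative choices and not part of the package documentation.

```r
# Runs only if the FADA package is installed; otherwise nothing happens.
if (requireNamespace("FADA", quietly = TRUE)) {
  library(FADA)
  data(data.train)
  data(data.test)

  # Same workflow as in the documented example
  res     <- decorrelate.train(data.train)
  res2    <- decorrelate.test(res, data.test)
  classif <- FADA(res2, method = "sda", sda.method = "lfdr")

  # Elements named in the Value section
  print(classif$method)              # classification method used
  print(length(classif$selected))    # number of selected variables
  print(head(classif$proba.test))    # first predicted group frequencies
  print(table(classif$predict.test)) # counts of predicted classes
}
```

Because a testing data set is supplied here, cv.error and cv.error.se are not expected in the result, according to the Value section.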
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)
> library(FADA)
Loading required package: MASS
Loading required package: elasticnet
Loading required package: lars
Loaded lars 1.2
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/FADA/FADA.Rd_%03d_medium.png", width=480, height=480)
> ### Name: FADA
> ### Title: Factor Adjusted Discriminant Analysis 3-4 : Supervised
> ### classification on decorrelated data
> ### Aliases: FADA
>
> ### ** Examples
>
> data(data.train)
> data(data.test)
>
> # When testing data set is provided
> res = decorrelate.train(data.train)
[1] "Number of factors: 3 factors"
[1] "Objective criterion: "
[1] 0.05912524
[1] 1.603967
[1] 0.001050686
[1] 0.0004215778
> res2 = decorrelate.test(res, data.test)
> classif = FADA(res2,method="sda",sda.method="lfdr")
>
> ### Not run
> # When no testing data set is provided
> # Classification error rate is computed by a K-fold cross validation.
> # res = decorrelate.train(data.train)
> # classif = FADA(res, method="sda",sda.method="lfdr")
>
> dev.off()
null device
1
>