Last data update: 2014.03.03

decorrelate.train    R Documentation

Factor Adjusted Discriminant Analysis 1: Decorrelation of the training data

Description

This function decorrelates the training dataset by adjusting data for the effects of latent factors of dependence.
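
To make the idea of factor adjustment concrete, the following minimal sketch removes the effect of one latent factor from simulated data, using factanal from the base R stats package. It is not FADA's implementation: the simulated data, the single-factor setting and all variable names are assumptions chosen for illustration, and the group-specific means that decorrelate.train also handles are ignored here.

```r
# Illustrative sketch only (not FADA's code): remove the effect of one
# latent factor from simulated data and compare correlations.
set.seed(1)
n <- 200; p <- 8
z <- rnorm(n)                                   # latent factor (assumed)
load_true <- rep(0.8, p)                        # assumed loadings
x <- outer(z, load_true) + matrix(rnorm(n * p, sd = 0.6), n, p)

# Maximum-likelihood factor model with regression scores
fit <- factanal(x, factors = 1, scores = "regression")
B   <- unclass(fit$loadings)                    # p x 1 estimated loadings
Z   <- fit$scores                               # n x 1 estimated factor scores

# factanal works on standardized variables, so adjust scale(x)
x_adj <- scale(x) - Z %*% t(B)

# Mean absolute off-diagonal correlation before and after adjustment
mean_offdiag <- function(m) {
  r <- cor(m)
  mean(abs(r[upper.tri(r)]))
}
before <- mean_offdiag(x)
after  <- mean_offdiag(x_adj)
print(c(before = before, after = after))
```

In this simulation the adjusted variables should be much less correlated than the raw ones, which is the property the decorrelation step aims for.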

Usage

decorrelate.train(data.train, nbf = NULL, maxnbfactors = 12, diagnostic.plot = FALSE,
                  min.err = 0.001, verbose = TRUE, EM = TRUE, maxiter = 15, ...)

Arguments

data.train

A list containing the training dataset, with the following components: x, the n x p matrix of explanatory variables, where n is the training sample size and p the number of explanatory variables; and y, a numeric vector giving the group of each individual, numbered from 1 to K.

nbf

Number of factors. If nbf = NULL, the number of factors is estimated. nbf can also be set to a positive integer value. If nbf = 0, the data are not factor-adjusted.

maxnbfactors

The maximum number of factors. Default is maxnbfactors=12.

diagnostic.plot

If diagnostic.plot=TRUE, the values of the variance inflation criterion are plotted for each number of factors. This plot can help to determine the optimal number of factors manually. Default is diagnostic.plot=FALSE.

min.err

Convergence threshold for the objective criterion of the algorithm. Default is min.err=0.001.

verbose

If verbose=TRUE, the number of factors and the values of the objective criterion at each iteration are printed. Default is verbose=TRUE.

EM

The method used to estimate the parameters of the factor model. If EM=TRUE, the parameters are estimated by an EM algorithm; this setting is recommended when the number of covariates exceeds the number of observations. If EM=FALSE, the parameters are estimated by maximum likelihood using factanal. Default is EM=TRUE.

maxiter

Maximum number of iterations for the estimation of the factor model. Default is maxiter=15.

...

Other arguments that can be passed to the cv.glmnet and glmnet functions of the glmnet package. These functions are used to estimate individual group probabilities. Modifying these arguments should not affect the decorrelation procedure. However, the nfolds argument of cv.glmnet is set to 10 by default and should be reduced (minimum 3) for large datasets, in order to decrease the computation time of decorrelate.train.
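
As an illustration of the data.train format described above, the sketch below builds a compatible list from simulated data in base R. The dimensions, the number of groups, the object name my.train and the helper check_train_list are hypothetical and are not part of FADA.

```r
# Illustrative sketch: build a list in the format expected for data.train.
set.seed(42)
n <- 30; p <- 50; K <- 2                 # assumed sizes, with p > n
y <- rep(seq_len(K), length.out = n)     # groups numbered from 1 to K
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
x[y == 2, 1:5] <- x[y == 2, 1:5] + 1     # assumed mean shift in group 2
my.train <- list(x = x, y = y)

# Hypothetical helper (not part of FADA): basic checks on the format
check_train_list <- function(d) {
  is.list(d) && is.matrix(d$x) && is.numeric(d$y) &&
    length(d$y) == nrow(d$x) &&
    setequal(unique(d$y), seq_len(max(d$y)))
}
print(check_train_list(my.train))
```

A list built this way would then be supplied as the first argument of decorrelate.train, optionally together with a smaller nfolds value passed through ... to shorten cross-validation.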

Value

Returns a list with the following elements:

meanclass

Group means estimated after iterative decorrelation

fa.training

Decorrelated training data

Psi

Estimation of the factor model parameters: specific variance

B

Estimation of the factor model parameters: loadings

factors.training

Scores of the training individuals on the factors

groups

Copy of the group variable of the training data

proba.training

Internal value (estimation of individual probabilities for the training dataset)
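
The components listed above can be inspected directly on the returned list. The short sketch below assumes that the FADA package is installed and does nothing otherwise; the object name res is arbitrary, and the dimensions are printed rather than asserted because they depend on the dataset.

```r
# Sketch: runs only if the FADA package is installed.
if (requireNamespace("FADA", quietly = TRUE)) {
  library(FADA)
  data(data.train)
  res <- decorrelate.train(data.train, nbf = 3, verbose = FALSE)
  print(names(res))              # components of the returned list
  print(dim(res$fa.training))    # decorrelated training data
  print(dim(res$B))              # estimated loadings
}
```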

Author(s)

Emeline Perthame, Chloe Friguet and David Causeur

References

Friedman, J., Hastie, T. and Tibshirani, R. (2010), Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.

Friguet, C., Kloareg, M. and Causeur, D. (2009), A factor model approach to multiple testing under dependence. Journal of the American Statistical Association, 104:488, 1406-1415.

Perthame, E., Friguet, C. and Causeur, D. (2015), Stability of feature selection in classification issues for high-dimensional correlated data, Statistics and Computing.

See Also

FADA-package, FADA, glmnet-package, factanal

Examples

data(data.train)

res0 = decorrelate.train(data.train,nbf=3) #  when the number of factors is forced

res1 = decorrelate.train(data.train) #  when the optimal number of factors is unknown

Results


> library(FADA)
Loading required package: MASS
Loading required package: elasticnet
Loading required package: lars
Loaded lars 1.2

> ### Name: decorrelate.train
> ### Title: Factor Adjusted Discriminant Analysis 1: Decorrelation of the
> ###   training data
> ### Aliases: decorrelate.train
> 
> ### ** Examples
> 
> data(data.train)
> 
> res0 = decorrelate.train(data.train,nbf=3) #  when the number of factors is forced
[1] "Number of factors: 3 factors"
[1] "Objective criterion: "
[1] 0.05912524
[1] 1.603967
[1] 0.001050686
[1] 0.0004215778
> 
> res1 = decorrelate.train(data.train) #  when the optimal number of factors is unknown
[1] "Number of factors: 3 factors"
[1] "Objective criterion: "
[1] 0.05912524
[1] 1.603967
[1] 0.001050686
[1] 0.0004215778