Last data update: 2014.03.03

isoPLSDA    R Documentation

Partial Least Squares Discriminant Analysis for IsomirDataSeq

Description

Use PLS-DA on the normalized count data to detect the features (miRNAs/isomiRs) that best explain the sample groups defined by the experimental design. PLS-DA is a supervised method; permutations are used to assess the significance of the analysis.

Usage

isoPLSDA(ids, group, validation = NULL, learn = NULL, test = NULL,
 tol = 0.001, nperm = 400, refinment = FALSE, vip = 1.2)

Arguments

ids

object of class IsomirDataSeq

group

column name in colData(ids) to use as the response variable, i.e. the grouping to explain.

validation

type of validation, either NULL or "learntest". Default NULL

learn

optional vector of indices for a learn-set. Only used when validation="learntest". Default NULL

test

optional vector of indices for a test-set. Only used when validation="learntest". Default NULL

tol

tolerance value based on maximum change of cumulative R-squared coefficient for each additional PLS component. Default tol=0.001

nperm

number of permutations to compute the PLS-DA p-value based on R2 magnitude. Default nperm=400

refinment

logical indicating whether a refined model, based on filtering out variables with low VIP values, should be fitted. Default FALSE

vip

Variable Importance in Projection (VIP) threshold value used when a refinement process is considered. Default vip=1.2
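
To illustrate what the refinment and vip arguments control, the filtering step can be sketched in base R as below. This is a minimal sketch under stated assumptions, not the package's internal code: the vip_scores matrix, its feature names and values, and the rule of comparing the last component's VIP against the threshold are all hypothetical.

```r
# Hypothetical VIP scores: rows are features, columns are PLS components.
vip_scores <- matrix(
  c(1.45, 0.80, 1.25, 0.30,
    1.50, 0.75, 1.21, 0.40),
  nrow = 4,
  dimnames = list(c("miR-A", "miR-B", "miR-C", "miR-D"), c("t1", "t2"))
)

vip_threshold <- 1.2  # same value as the documented default of `vip`

# Assumption: a feature is kept when its VIP on the last component
# exceeds the threshold; the real function may use a different rule.
last_comp <- vip_scores[, ncol(vip_scores)]
keep <- names(last_comp)[last_comp > vip_threshold]
print(keep)
```

A refined model would then be refitted using only the features in keep.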

Details

Partial Least Squares Discriminant Analysis (PLS-DA) is a technique specifically appropriate for the analysis of data sets with high dimensionality and multicollinearity (Perez-Enciso, 2003). PLS-DA is a supervised method (i.e. it makes use of class labels) that provides a dimension reduction strategy for situations where we want to relate a binary response variable (for example, young or old status) to a set of predictor variables. The dimensionality reduction is based on orthogonal transformations of the original variables (miRNAs/isomiRs) into a set of linearly uncorrelated latent variables (usually termed components) that maximize the separation between the classes in the first few components (Xia, 2011). The sum of squares captured by the model (R2) is used as the goodness-of-fit measure.

The isoPLSDA function implements this method using the DiscriMiner package. The p-value it returns indicates the statistical significance of the group separation based on miRNA/isomiR expression data.

For more about the parameters related to PLS-DA, see the documentation of the plsDA function in DiscriMiner.
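
The idea of a permutation p-value based on R2 can be sketched in base R as follows. This is an illustration only, under stated assumptions: the toy group and score vectors are made up, R2 is taken as the squared correlation between the labels and a single score, and the p-value is defined as the share of permuted R2 values at least as large as the observed one. The actual computation inside isoPLSDA may differ.

```r
set.seed(1)

# Toy data (hypothetical): 12 samples in 2 groups, one latent score per sample.
group <- rep(c(0, 1), each = 6)
score <- c(1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 2.9, 3.1, 3.0, 2.8, 3.2, 3.05)

# R2 of the group labels explained by the score (squared correlation).
r2 <- function(y, x) cor(y, x)^2

observed <- r2(group, score)

# Shuffle the labels and recompute R2 for each permutation.
nperm <- 200
permuted <- replicate(nperm, r2(sample(group), score))

# Assumed definition: fraction of permuted R2 values >= the observed R2.
p_value <- mean(permuted >= observed)

cat("observed R2:", round(observed, 3), "\n")
cat("permutation p-value:", p_value, "\n")
```

With only a few permutations (such as nperm = 2 in the example on this page) the p-value can only take a handful of distinct values, so a larger nperm gives a finer estimate.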

Value

A list with the following elements: R2Matrix (R-squared coefficients of the PLS model), components (of the PLS, similar to PCs in a PCA), vip (most important isomiRs/miRNAs), group (classification of the samples), p.value and R2PermutationVector obtained by the permutations.

If the option refinment is set to TRUE, the list also contains the following elements: R2RefinedMatrix and componentsRefinedModel (R-squared coefficients and components of the PLS model fitted using only the most important miRNAs/isomiRs); p.valRefined and R2RefinedPermutationVector, the p-value and R2 values from the permutations in which samples were randomized; and p.valRefinedFixed and R2RefinedFixedPermutationVector, the p-value and R2 values from the permutations in which miRNAs/isomiRs were randomized.

References

Perez-Enciso, Miguel and Tenenhaus, Michel. Prediction of clinical outcome with microarray data: a partial least squares discriminant analysis (PLS-DA) approach. Human Genetics. 2003.

Xia, Jianguo and Wishart, David S. Web-based inference of biological patterns, functions and pathways from metabolomic data using MetaboAnalyst. Nature Protocols. 2011.

Examples

data(mirData)
# Only miRNAs with > 10 reads in all samples.
ids <- isoCounts(mirData, minc=10, mins=6)
ids <- isoNorm(ids)
pls.ids = isoPLSDA(ids, "condition", nperm = 2)
cat(paste0("pval:",pls.ids$p.val))
cat(paste0("components:",pls.ids$components))
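
Because paste0 is vectorized, the last line of the example prefixes every element of pls.ids$components with "components:". The sketch below reproduces that behavior on a small stand-in matrix and shows one tidier alternative. The comp object, its dimensions and its values are hypothetical, not isoPLSDA output.

```r
# Hypothetical stand-in for pls.ids$components (assumed: samples x components).
comp <- matrix(c(1.5, -2, 0.25, 3), nrow = 2,
               dimnames = list(c("s1", "s2"), c("t1", "t2")))

# paste0 recycles the prefix over every matrix element,
# so the result has one labelled string per value.
labelled <- paste0("components:", comp)
cat(labelled, "\n")

# A tidier alternative: print the matrix itself, rounded.
print(round(comp, 2))
```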

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(isomiRs)
Loading required package: DiscriMiner
Loading required package: IRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/isomiRs/isoPLSDA.Rd_%03d_medium.png", width=480, height=480)
> ### Name: isoPLSDA
> ### Title: Partial Least Squares Discriminant Analysis for 'IsomirDataSeq'
> ### Aliases: isoPLSDA
> 
> ### ** Examples
> 
> data(mirData)
> # Only miRNAs with > 10 reads in all samples.
> ids <- isoCounts(mirData, minc=10, mins=6)
> ids <- isoNorm(ids)
converting counts to integer mode
-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.
> pls.ids = isoPLSDA(ids, "condition", nperm = 2)
> cat(paste0("pval:",pls.ids$p.val))
pval:0> cat(paste0("components:",pls.ids$components))
components:16.6858092276317 components:3.04045594908574 components:9.34212467094332 components:-9.24066935893392 components:-9.70619388142206 components:-10.1215266073048 components:-12.841132088321 components:15.8274781050384 components:7.15087127544249 components:-2.0753270359708 components:-3.46336377223972 components:-4.59852648394939 components:2.49048849972438 components:5.62559676754491 components:-7.01897559719785 components:-3.9154729400621 components:-1.79013597554926 components:4.60849924553993> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>