PSOL_InitialNegativeSelection
R Documentation
Initial negative set selection for building a machine learning-based classification model
Description
This function selects an initial negative set with the machine learning (ML)-based positive-only
sample learning (PSOL) algorithm. The PSOL algorithm was previously applied to predict
genomic loci encoding functional non-coding RNAs (Wang et al., 2006). We have employed this
algorithm to identify stress-related candidate genes in Arabidopsis based on stress
microarray datasets (Ma and Wang, 2013).
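To make the idea of an initial negative set concrete, here is a minimal sketch on toy data. It assumes a simplified rule: keep the unlabeled samples whose nearest positive sample is farthest away in Euclidean feature space. The helper selectInitialNegatives and the toy matrix are hypothetical, and this is not the mlDNA implementation. Dropping the selected negatives from the returned unlabels is also an assumption.

```r
# Hypothetical helper: a simplified stand-in, NOT the mlDNA implementation.
selectInitialNegatives <- function(featureMatrix, positives, unlabels, negNum) {
  posMat <- featureMatrix[positives, , drop = FALSE]
  unlMat <- featureMatrix[unlabels, , drop = FALSE]
  # For each unlabeled sample, Euclidean distance to its nearest positive sample
  minDist <- apply(unlMat, 1, function(x) {
    min(sqrt(rowSums(sweep(posMat, 2, x)^2)))
  })
  # Keep the negNum unlabeled samples that lie farthest from the positives
  negatives <- names(sort(minDist, decreasing = TRUE))[seq_len(negNum)]
  # Assumption: selected negatives are removed from the unlabeled set
  list(positives = positives,
       negatives = negatives,
       unlabels = setdiff(unlabels, negatives))
}

# Toy data: two positives near the origin and four unlabeled samples
toyMat <- rbind(g1 = c(0, 0), g2 = c(0, 1),
                g3 = c(0.5, 0.5), g4 = c(5, 5),
                g5 = c(1, 0), g6 = c(6, 4))
toyRes <- selectInitialNegatives(toyMat,
                                 positives = c("g1", "g2"),
                                 unlabels = c("g3", "g4", "g5", "g6"),
                                 negNum = 2)
toyRes$negatives
```

In this toy case the two distant samples, g4 and g6, are chosen, and g3 and g5 remain unlabeled.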
Arguments
featureMatrix
a numeric matrix recording the features for all samples.
positives
a character vector recording positive samples.
unlabels
a character vector recording unlabeled samples.
negNum
an integer specifying the number of negative samples to select.
cpus
an integer specifying the number of CPUs to use for parallel computing.
PSOLResDic
a character string specifying the directory in which PSOL results are stored.
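The arguments above must agree with one another, so it can help to check them before starting a long run. The sketch below assumes, as the Examples do, that samples are identified by the row names of the feature matrix. checkPSOLInputs is a hypothetical helper and is not part of mlDNA.

```r
# Hypothetical helper, NOT part of mlDNA: basic sanity checks on the inputs.
checkPSOLInputs <- function(featureMatrix, positives, unlabels, negNum) {
  stopifnot(is.matrix(featureMatrix), is.numeric(featureMatrix),
            is.character(positives), is.character(unlabels))
  # Assumption: sample IDs are the row names of featureMatrix
  missingIDs <- setdiff(c(positives, unlabels), rownames(featureMatrix))
  if (length(missingIDs) > 0)
    stop("samples not found in featureMatrix: ",
         paste(missingIDs, collapse = ", "))
  if (length(intersect(positives, unlabels)) > 0)
    stop("positives and unlabels overlap")
  if (negNum < 1 || negNum > length(unlabels))
    stop("negNum must be between 1 and length(unlabels)")
  invisible(TRUE)
}

# Toy feature matrix with four samples and two features
checkMat <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 4,
                   dimnames = list(c("a", "b", "c", "d"), c("f1", "f2")))
okResult <- checkPSOLInputs(checkMat, positives = "a",
                            unlabels = c("b", "c", "d"), negNum = 1)
# An unknown sample ID is reported as an error
badResult <- tryCatch(
  checkPSOLInputs(checkMat, positives = "z",
                  unlabels = c("b", "c", "d"), negNum = 1),
  error = function(e) conditionMessage(e))
```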
Value
A list containing three components:
positives
a character vector including the input positive samples.
negatives
a character vector recording the selected negative samples.
unlabels
a character vector recording the unlabeled samples.
Author(s)
Chuang Ma and Xiangfeng Wang.
References
[1] Chunlin Wang, Chris Ding, Richard F. Meraz and Stephen R. Holbrook. PSoL: a positive sample
only learning algorithm for finding non-coding RNA genes. Bioinformatics, 2006, 22(21): 2590-2596.
[2] Chuang Ma, Xiangfeng Wang. Machine learning-based differential network analysis: a case study
of stress-responsive transcriptomes in Arabidopsis thaliana. 2013 (submitted).
Examples
## Not run:
## generate expression feature matrix
sampleVec1 <- c(1, 2, 3, 4, 5, 6)
sampleVec2 <- c(1, 2, 3, 4, 5, 6)
featureMat <- expFeatureMatrix(
    expMat1 = ControlExpMat, sampleVec1 = sampleVec1,
    expMat2 = SaltExpMat, sampleVec2 = sampleVec2,
    logTransformed = TRUE, base = 2,
    features = c("zscore", "foldchange", "cv", "expression"))

## positive samples
positiveSamples <- as.character(sampleData$KnownSaltGenes)
## unlabeled samples
unlabelSamples <- setdiff(rownames(featureMat), positiveSamples)

## select an initial set of negative samples
## for building the ML-based classification model
## suppose the PSOL results will be stored in:
PSOLResDic <- "/home/wanglab/mlDNA/PSOL/"
res <- PSOL_InitialNegativeSelection(featureMatrix = featureMat,
                                     positives = positiveSamples,
                                     unlabels = unlabelSamples,
                                     negNum = length(positiveSamples),
                                     cpus = 6, PSOLResDic = PSOLResDic)

## initial negative samples extracted from unlabeled samples with the PSOL algorithm
negatives <- res$negatives
## End(Not run)