Advance rank sum method to identify differentially
expressed genes. It is possible to combine data from
different studies, e.g. data sets generated at different
laboratories.
the data set that should be analyzed. Every
row of this data set must correspond to a gene.
cl
a vector containing the class labels of the
samples. In the two class unpaired case, the label
of a sample is either 0 (e.g., control group) or 1
(e.g., case group). For one group data, the label for
each sample should be 1.
origin
a vector containing the origin labels of the
sample. e.g. for
the data sets generated at multiple laboratories, the label
is the same for samples within one lab and different for samples
from different labs.
num.perm
number of permutations used in the calculation
of the null density. Default is 'B=100'.
logged
if "TRUE", data has bee logged, otherwise set
it to "FALSE"
na.rm
if 'FALSE' (default), the NA value will not
be used in computing rank. If 'TRUE', the missing
values will be replaced by the genewise mean of
the non-missing values. Gene will all value missing
will be assigned "NA"
gene.names
if "NULL", no gene name will be attached
to the estimated percentage of false prediction (pfp).
plot
If "TRUE", plot the estimated pfp verse the rank
of each gene
rand
if specified, the random number generator
will be put in a reproducible state.
Value
A result of identifying differentially expressed
genes between two classes. The identification consists of two parts,
the identification of up-regulated and down-regulated genes in class 2
compared to class 1, respectively.
pfp
estimated percentage of false positive predictions
(pfp) up to the position of each gene under two
identificaiton each
pval
estimated pvalue for each gene being up- and down-regulated
RSs
Origina rank-sum (average rank) of each genes
RSrank
rank of the rank sum of each gene in ascending order
Orirank
original ranks in each comparison, which is
used to compute rank sum
AveFC
fold change of average expression under class 1 over
that under class 2, if multiple origin, than avraged
across all origin. log-fold change if data is in log scaled,
original fold change if data is unlogged.
all.FC
fold change of class 1/class 2 under each origin.
log-fold change if data is in log scaled
Note
Percentage of false prediction (pfp), in theory, is
equivalent of false discovery rate (FDR), and it is
possible to be large than 1.
The function looks for up- and down- regulated genes in two
seperate steps, thus two pfps are computed and used to identify
gene that belong to each group.
The function is able to deal with single or multiple-orgin
studies. It is similar to funcion RP.advance expect a rank
sum is computed instead of rank product. This method
is more sensitive to individual rank values, while rank
product is more robust to
outliers (refer RankProd vignette for details)
#Suppose we want to check the consistence of the data
#sets generated in two different
#labs. For example, we would look for genes that were
# measured to be up-regulated in
#class 2 at lab 1, but down-regulated in class 2 at lab 2.
data(arab)
arab.cl2 <- arab.cl
arab.cl2[arab.cl==0 &arab.origin==2] <- 1
arab.cl2[arab.cl==1 &arab.origin==2] <- 0
arab.cl2
##[1] 0 0 0 1 1 1 1 1 0 0
#look for genes differentially expressed
#between hypothetical class 1 and 2
arab.sub=arab[1:500,] ##using subset for fast computation
arab.gnames.sub=arab.gnames[1:500]
Rsum.adv.out <- RSadvance(arab.sub,arab.cl2,arab.origin,
num.perm=100,
logged=TRUE,
gene.names=arab.gnames.sub,rand=123)
attributes(Rsum.adv.out)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(RankProd)
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/RankProd/RSadvance.Rd_%03d_medium.png", width=480, height=480)
> ### Name: RSadvance
> ### Title: Advanced Rank Sum Analysis of Microarray
> ### Aliases: RSadvance
> ### Keywords: htest
>
> ### ** Examples
>
>
> #Suppose we want to check the consistence of the data
> #sets generated in two different
> #labs. For example, we would look for genes that were
> # measured to be up-regulated in
> #class 2 at lab 1, but down-regulated in class 2 at lab 2.
> data(arab)
> arab.cl2 <- arab.cl
>
> arab.cl2[arab.cl==0 &arab.origin==2] <- 1
>
> arab.cl2[arab.cl==1 &arab.origin==2] <- 0
>
> arab.cl2
[1] 0 0 0 1 1 1 1 1 0 0
> ##[1] 0 0 0 1 1 1 1 1 0 0
>
>
> #look for genes differentially expressed
> #between hypothetical class 1 and 2
> arab.sub=arab[1:500,] ##using subset for fast computation
> arab.gnames.sub=arab.gnames[1:500]
> Rsum.adv.out <- RSadvance(arab.sub,arab.cl2,arab.origin,
+ num.perm=100,
+ logged=TRUE,
+ gene.names=arab.gnames.sub,rand=123)
The data is from 2 different origins
Rank Sum analysis for two-class case
Starting 100 permutations...
Computing pfp...
>
> attributes(Rsum.adv.out)
$names
[1] "pfp" "pval" "RSs" "RSrank" "Orirank" "AveFC" "all.FC"
>
>
>
>
>
>
> dev.off()
null device
1
>