R: Estimate expression of a known set of gene splicing variants.
calcExp
R Documentation
Estimate expression of a known set of gene splicing variants.
Description
Estimate expression of gene splicing variants,
assuming that the set of variants is known.
When rpkm is set to TRUE, fragments per kilobase
per million are returned. Otherwise relative expression estimates are returned.
List of fragment distributions as generated by the getDistrs function
genomeDB
knownGenome object containing annotated genome,
as returned by the procGenome function.
pc
Named vector of exon path counts as returned by pathCounts
readLength
Read length in bp, e.g. in a paired-end experiment where
75bp are sequenced on each end one would set readLength=75.
islandid
Name of the gene island to be analyzed. If not specified, all
gene islands are analyzed.
rpkm
Set to FALSE to return relative expression levels,
i.e. the proportion of reads generated from each variant per
gene. These proportions add up to 1 for each gene.
Set to TRUE to return fragments per kilobase per million (RPKM).
priorq
Parameter of the prior distribution on the proportion of reads coming from each variant. The prior is Dirichlet with prior sample size for each variant equal to priorq.
We recommend priorq=2 for estimation, as it pools the
estimated expression away from 0 and 1 and returned lower
estimation errors than priorq=1 in our simulated
experiments.
priorqGeneExpr
Parameter for prior distribution on overall gene
expression. Defaults to 2, which ensures non-zero estimates for all genes
citype
Set to "none" to return no credibility intervals. Set to "asymp" to return approximate 95% CIs (obtained via the delta method). Set to "exact" to obtain exact
CIs via Monte Carlo simulation. Options "asymp" and especially "exact" can increase the computation time substantially.
niter
Number of Monte Carlo iterations. Only used when citype=="exact".
burnin
Number of burnin Monte Carlo iterations. Only used when citype=="exact".
mc.cores
Number of processors to be used for parallel computation. Can only be used if the package multicore is available for your system.
verbose
Set to TRUE to display progress information.
Value
Expression set with expression estimates.
featureNames identify each transcript via
RefSeq ids, and the featureData contains further information.
If citype was set to a value other than "none", the featureData also contains the 95% credibility intervals
(i.e. intervals that contain the true parameter value with 95% posterior probability).
Author(s)
Camille Stephan-Otto Attolini, Manuel Kroiss, David Rossell
References
Rossell D, Stephan-Otto Attolini C, Kroiss M, Stocker A. Quantifying Alternative Splicing from Paired-End
RNA-sequencing data. Annals of Applied Statistics, 8(1):309-330.
See Also
relexprByGene
Examples
data(K562.r1l1)
data(hg19DB)
#Pre-process
bam0 <- rmShortInserts(K562.r1l1, isizeMin=100)
pbam0 <- procBam(bam0)
head(getReads(pbam0))
#Estimate distributions, get path counts
distrs <- getDistrs(hg19DB,bam=bam0,readLength=75)
pc <- pathCounts(pbam0, DB=hg19DB)
#Get estimates
eset <- calcExp(distrs=distrs, genomeDB=hg19DB, pc=pc, readLength=75, rpkm=FALSE)
head(exprs(eset))
head(fData(eset))
#Re-normalize relative expression to add up to 1 within gene_id rather
# than island_id
eset <- relexprByGene(eset)
#Add fake sample by permuting and combine
eset2 <- eset[sample(1:nrow(eset),replace=FALSE),]
sampleNames(eset2) <- '2' #must have a different name
esetall <- mergeExp(eset,eset2)
#After merge samples are correctly matched
head(exprs(esetall))
head(fData(esetall))
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(casper)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: IRanges
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/casper/calcExp.Rd_%03d_medium.png", width=480, height=480)
> ### Name: calcExp
> ### Title: Estimate expression of a known set of gene splicing variants.
> ### Aliases: calcExp
> ### Keywords: models
>
> ### ** Examples
>
> data(K562.r1l1)
> data(hg19DB)
>
> #Pre-process
> bam0 <- rmShortInserts(K562.r1l1, isizeMin=100)
> pbam0 <- procBam(bam0)
> head(getReads(pbam0))
GRanges object with 6 ranges and 3 metadata columns:
seqnames ranges strand | rid XS names
<Rle> <IRanges> <Rle> | <integer> <Rle> <integer>
[1] chr17 [ 7124912, 7124986] + | 1 * 0
[2] chr17 [ 7124986, 7125001] - | 2 * 0
[3] chr17 [ 7125271, 7125329] - | 2 * 0
[4] chr9 [ 94485006, 94485080] + | 1 * 1
[5] chr9 [ 94485115, 94485189] - | 2 * 1
[6] chrX [133680161, 133680235] + | 1 * 2
-------
seqinfo: 13 sequences from an unspecified genome; no seqlengths
>
> #Estimate distributions, get path counts
> distrs <- getDistrs(hg19DB,bam=bam0,readLength=75)
> pc <- pathCounts(pbam0, DB=hg19DB)
>
> #Get estimates
> eset <- calcExp(distrs=distrs, genomeDB=hg19DB, pc=pc, readLength=75, rpkm=FALSE)
> head(exprs(eset))
1
NM_005158 0.16787846
NM_001168236 0.18742670
NM_001136000 0.08277252
NM_001168239 0.28327384
NM_001136001 0.04592768
NM_007314 0.09612429
> head(fData(eset))
transcript gene_id island_id explCnts
NM_005158 NM_005158 27 463 109
NM_001168236 NM_001168236 27 463 109
NM_001136000 NM_001136000 27 463 109
NM_001168239 NM_001168239 27 463 109
NM_001136001 NM_001136001 27 463 109
NM_007314 NM_007314 27 463 109
>
> #Re-normalize relative expression to add up to 1 within gene_id rather
> # than island_id
> eset <- relexprByGene(eset)
>
> #Add fake sample by permuting and combine
> eset2 <- eset[sample(1:nrow(eset),replace=FALSE),]
> sampleNames(eset2) <- '2' #must have a different name
> esetall <- mergeExp(eset,eset2)
>
> #After merge samples are correctly matched
> head(exprs(esetall))
Sample1 Sample2
NM_005158 0.16787846 0.16787846
NM_001168236 0.18742670 0.18742670
NM_001136000 0.08277252 0.08277252
NM_001168239 0.28327384 0.28327384
NM_001136001 0.04592768 0.04592768
NM_007314 0.09612429 0.09612429
> head(fData(esetall))
transcript gene_id island_id readCount
NM_005158 NM_005158 27 463 218
NM_001168236 NM_001168236 27 463 218
NM_001136000 NM_001136000 27 463 218
NM_001168239 NM_001168239 27 463 218
NM_001136001 NM_001136001 27 463 218
NM_007314 NM_007314 27 463 218
>
>
>
>
>
> dev.off()
null device
1
>