R: Estimate expression for de novo splicing variants.
denovoExpr
R Documentation
Estimate expression for de novo splicing variants.
Description
Obtains expression estimates from denovoGenomeExpr objects, as
returned by calcDenovo.
When rpkm is set to TRUE, fragments per kilobase
per million are returned. Otherwise relative expression estimates are
returned.
The estimates can be obtained by Bayesian model averaging (default)
or by selecting the model with highest posterior probability. See details.
Named vector of exon path counts as returned by pathCounts
rpkm
Set to FALSE to return relative expression levels,
i.e. the proportion of reads generated from each variant per
gene. These proportions add up to 1 for each gene.
Set to TRUE to return fragments per kilobase per million
(RPKM).
summarize
Set to "modelAvg" to obtain model averaging
estimates, or to "bestModel" to select the model with highest
posterior probability. We recommend the former, as even the best model
may have low posterior probability.
minProbExpr
Variants with (marginal posterior) probability of being
expressed below minProbExpr are omitted from the results. This
argument is useful to eliminate variants that are not at least
moderately supported by the data.
minExpr
Variants with relative expression minExpr are
omitted from the results. This is useful to eliminate variants to
which few reads are assigned, e.g. due to read miss-alignments or biases.
Value
Expression set with expression estimates.
The featureData indicates the gene island id, posterior probability
that each variant is expressed (column "probExpressed") and the
number of aligned reads per gene island (column "explCnts").
Author(s)
David Rossell
References
Rossell D, Stephan-Otto Attolini C, Kroiss M, Stocker A. Quantifying Alternative Splicing from Paired-End
RNA-sequencing data. Annals of Applied Statistics, 8(1):309-330.
Examples
## NOTE: toy example with few reads & genes to illustrate code usage
## Results with complete data are much more interesting!
data(K562.r1l1)
data(hg19DB)
#Pre-process
bam0 <- rmShortInserts(K562.r1l1, isizeMin=100)
pbam0 <- procBam(bam0)
#Estimate distributions, get path counts
distrs <- getDistrs(hg19DB,bam=bam0,readLength=75)
pc <- pathCounts(pbam0, DB=hg19DB)
#Set prior distrib on model space
mprior <- modelPrior(hg19DB, maxExons=40, smooth=FALSE)
#Fit model
denovo <- calcDenovo(distrs,targetGenomeDB=hg19DB,pc=pc,readLength=75,priorq=3,mprior=mprior,minpp=0)
head(names(denovo))
denovo[['6499']]
posprob(denovo[['6499']])
#Get estimates
eset <- denovoExpr(denovo, pc=pc, rpkm=TRUE, minProbExpr=0.5)
head(exprs(eset))
head(fData(eset))
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(casper)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: IRanges
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/casper/denovoExpr.Rd_%03d_medium.png", width=480, height=480)
> ### Name: denovoExpr
> ### Title: Estimate expression for de novo splicing variants.
> ### Aliases: denovoExpr
> ### Keywords: models htest
>
> ### ** Examples
>
>
> ## NOTE: toy example with few reads & genes to illustrate code usage
> ## Results with complete data are much more interesting!
>
> data(K562.r1l1)
> data(hg19DB)
>
> #Pre-process
> bam0 <- rmShortInserts(K562.r1l1, isizeMin=100)
> pbam0 <- procBam(bam0)
>
> #Estimate distributions, get path counts
> distrs <- getDistrs(hg19DB,bam=bam0,readLength=75)
> pc <- pathCounts(pbam0, DB=hg19DB)
>
> #Set prior distrib on model space
> mprior <- modelPrior(hg19DB, maxExons=40, smooth=FALSE)
Counting number of annotated transcripts per gene... Done.
Counting number of exons contained in each variant... Done.
Estimating parameters... Done.
>
> #Fit model
> denovo <- calcDenovo(distrs,targetGenomeDB=hg19DB,pc=pc,readLength=75,priorq=3,mprior=mprior,minpp=0)
Formatting input...
Performing model search (this may take a while)
...............
>
> head(names(denovo))
[1] "326" "463" "11211" "14256" "14325" "15370"
> denovo[['6499']]
denovoGeneExpr object
Posterior model probabilities
model posprob priorprob
2 1 1.000000e+00 0.002946124
1 0 1.383908e-21 0.138242898
...
Estimated expression (conditional on each model)
model expr varName
1 0 1.0000000 NM_001025091
2 1 0.3729019 NM_001090
3 1 0.6270981 NM_001025091
...
Use posprob() to access posterior probabilities; variants() to get exons in each variant
> posprob(denovo[['6499']])
model posprob priorprob
2 1 1.000000e+00 0.002946124
1 0 1.383908e-21 0.138242898
>
> #Get estimates
> eset <- denovoExpr(denovo, pc=pc, rpkm=TRUE, minProbExpr=0.5)
>
> head(exprs(eset))
1
NM_001136000 5.414092
NM_001136001 6.417195
NM_001168236 5.776101
NM_001168237 5.157635
NM_001168238 5.332304
NM_001168239 6.110314
> head(fData(eset))
island_id transcript explCnts probExpressed
NM_001136000 463 NM_001136000 109 1
NM_001136001 463 NM_001136001 109 1
NM_001168236 463 NM_001168236 109 1
NM_001168237 463 NM_001168237 109 1
NM_001168238 463 NM_001168238 109 1
NM_001168239 463 NM_001168239 109 1
>
>
>
>
>
>
> dev.off()
null device
1
>