Last data update: 2014.03.03

R: Estimate expression for de novo splicing variants.
denovoExprR Documentation

Estimate expression for de novo splicing variants.

Description

Obtains expression estimates from denovoGenomeExpr objects, as returned by calcDenovo. When rpkm is set to TRUE, fragments per kilobase per million are returned. Otherwise relative expression estimates are returned.

The estimates can be obtained by Bayesian model averaging (default) or by selecting the model with highest posterior probability. See details.

Usage

denovoExpr(x, pc, rpkm = TRUE, summarize = "modelAvg", minProbExpr = 0.5, minExpr = 0.05) 

Arguments

x

denovoGenomeExpr object returned by calcExp

pc

Named vector of exon path counts as returned by pathCounts

rpkm

Set to FALSE to return relative expression levels, i.e. the proportion of reads generated from each variant per gene. These proportions add up to 1 for each gene. Set to TRUE to return fragments per kilobase per million (RPKM).

summarize

Set to "modelAvg" to obtain model averaging estimates, or to "bestModel" to select the model with highest posterior probability. We recommend the former, as even the best model may have low posterior probability.

minProbExpr

Variants with (marginal posterior) probability of being expressed below minProbExpr are omitted from the results. This argument is useful to eliminate variants that are not at least moderately supported by the data.

minExpr

Variants with relative expression minExpr are omitted from the results. This is useful to eliminate variants to which few reads are assigned, e.g. due to read miss-alignments or biases.

Value

Expression set with expression estimates. The featureData indicates the gene island id, posterior probability that each variant is expressed (column "probExpressed") and the number of aligned reads per gene island (column "explCnts").

Author(s)

David Rossell

References

Rossell D, Stephan-Otto Attolini C, Kroiss M, Stocker A. Quantifying Alternative Splicing from Paired-End RNA-sequencing data. Annals of Applied Statistics, 8(1):309-330.

Examples


## NOTE: toy example with few reads & genes to illustrate code usage
##       Results with complete data are much more interesting!

data(K562.r1l1)
data(hg19DB)

#Pre-process
bam0 <- rmShortInserts(K562.r1l1, isizeMin=100)
pbam0 <- procBam(bam0)

#Estimate distributions, get path counts
distrs <- getDistrs(hg19DB,bam=bam0,readLength=75)
pc <- pathCounts(pbam0, DB=hg19DB)

#Set prior distrib on model space
mprior <- modelPrior(hg19DB, maxExons=40, smooth=FALSE)

#Fit model
denovo <- calcDenovo(distrs,targetGenomeDB=hg19DB,pc=pc,readLength=75,priorq=3,mprior=mprior,minpp=0)

head(names(denovo))
denovo[['6499']]
posprob(denovo[['6499']])

#Get estimates
eset <- denovoExpr(denovo, pc=pc, rpkm=TRUE, minProbExpr=0.5)

head(exprs(eset))
head(fData(eset))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(casper)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: IRanges
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/casper/denovoExpr.Rd_%03d_medium.png", width=480, height=480)
> ### Name: denovoExpr
> ### Title: Estimate expression for de novo splicing variants.
> ### Aliases: denovoExpr
> ### Keywords: models htest
> 
> ### ** Examples
> 
> 
> ## NOTE: toy example with few reads & genes to illustrate code usage
> ##       Results with complete data are much more interesting!
> 
> data(K562.r1l1)
> data(hg19DB)
> 
> #Pre-process
> bam0 <- rmShortInserts(K562.r1l1, isizeMin=100)
> pbam0 <- procBam(bam0)
> 
> #Estimate distributions, get path counts
> distrs <- getDistrs(hg19DB,bam=bam0,readLength=75)
> pc <- pathCounts(pbam0, DB=hg19DB)
> 
> #Set prior distrib on model space
> mprior <- modelPrior(hg19DB, maxExons=40, smooth=FALSE)
Counting number of annotated transcripts per gene... Done.
Counting number of exons contained in each variant... Done.
Estimating parameters... Done.
> 
> #Fit model
> denovo <- calcDenovo(distrs,targetGenomeDB=hg19DB,pc=pc,readLength=75,priorq=3,mprior=mprior,minpp=0)
Formatting input...
Performing model search (this may take a while)
...............
> 
> head(names(denovo))
[1] "326"   "463"   "11211" "14256" "14325" "15370"
> denovo[['6499']]
denovoGeneExpr object

Posterior model probabilities
  model      posprob   priorprob
2     1 1.000000e+00 0.002946124
1     0 1.383908e-21 0.138242898
...
Estimated expression (conditional on each model)
  model      expr      varName
1     0 1.0000000 NM_001025091
2     1 0.3729019    NM_001090
3     1 0.6270981 NM_001025091
...
Use posprob() to access posterior probabilities; variants() to get exons in each variant
> posprob(denovo[['6499']])
  model      posprob   priorprob
2     1 1.000000e+00 0.002946124
1     0 1.383908e-21 0.138242898
> 
> #Get estimates
> eset <- denovoExpr(denovo, pc=pc, rpkm=TRUE, minProbExpr=0.5)
> 
> head(exprs(eset))
                    1
NM_001136000 5.414092
NM_001136001 6.417195
NM_001168236 5.776101
NM_001168237 5.157635
NM_001168238 5.332304
NM_001168239 6.110314
> head(fData(eset))
             island_id   transcript explCnts probExpressed
NM_001136000       463 NM_001136000      109             1
NM_001136001       463 NM_001136001      109             1
NM_001168236       463 NM_001168236      109             1
NM_001168237       463 NM_001168237      109             1
NM_001168238       463 NM_001168238      109             1
NM_001168239       463 NM_001168239      109             1
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>