Last data update: 2014.03.03

R: Estimate expression of a known set of gene splicing variants.
calcExpR Documentation

Estimate expression of a known set of gene splicing variants.

Description

Estimate expression of gene splicing variants, assuming that the set of variants is known. When rpkm is set to TRUE, fragments per kilobase per million are returned. Otherwise relative expression estimates are returned.

Usage

calcExp(distrs, genomeDB, pc, readLength, islandid, rpkm=TRUE, priorq=2,
priorqGeneExpr=2, citype="none", niter=10^3, burnin=100, mc.cores=1, verbose=FALSE)

Arguments

distrs

List of fragment distributions as generated by the getDistrs function

genomeDB

knownGenome object containing annotated genome, as returned by the procGenome function.

pc

Named vector of exon path counts as returned by pathCounts

readLength

Read length in bp, e.g. in a paired-end experiment where 75bp are sequenced on each end one would set readLength=75.

islandid

Name of the gene island to be analyzed. If not specified, all gene islands are analyzed.

rpkm

Set to FALSE to return relative expression levels, i.e. the proportion of reads generated from each variant per gene. These proportions add up to 1 for each gene. Set to TRUE to return fragments per kilobase per million (RPKM).

priorq

Parameter of the prior distribution on the proportion of reads coming from each variant. The prior is Dirichlet with prior sample size for each variant equal to priorq. We recommend priorq=2 for estimation, as it pools the estimated expression away from 0 and 1 and returned lower estimation errors than priorq=1 in our simulated experiments.

priorqGeneExpr

Parameter for prior distribution on overall gene expression. Defaults to 2, which ensures non-zero estimates for all genes

citype

Set to "none" to return no credibility intervals. Set to "asymp" to return approximate 95% CIs (obtained via the delta method). Set to "exact" to obtain exact CIs via Monte Carlo simulation. Options "asymp" and especially "exact" can increase the computation time substantially.

niter

Number of Monte Carlo iterations. Only used when citype=="exact".

burnin

Number of burnin Monte Carlo iterations. Only used when citype=="exact".

mc.cores

Number of processors to be used for parallel computation. Can only be used if the package multicore is available for your system.

verbose

Set to TRUE to display progress information.

Value

Expression set with expression estimates. featureNames identify each transcript via RefSeq ids, and the featureData contains further information. If citype was set to a value other than "none", the featureData also contains the 95% credibility intervals (i.e. intervals that contain the true parameter value with 95% posterior probability).

Author(s)

Camille Stephan-Otto Attolini, Manuel Kroiss, David Rossell

References

Rossell D, Stephan-Otto Attolini C, Kroiss M, Stocker A. Quantifying Alternative Splicing from Paired-End RNA-sequencing data. Annals of Applied Statistics, 8(1):309-330.

See Also

relexprByGene

Examples

data(K562.r1l1)
data(hg19DB)

#Pre-process
bam0 <- rmShortInserts(K562.r1l1, isizeMin=100)
pbam0 <- procBam(bam0)
head(getReads(pbam0))

#Estimate distributions, get path counts
distrs <- getDistrs(hg19DB,bam=bam0,readLength=75)
pc <- pathCounts(pbam0, DB=hg19DB)

#Get estimates
eset <- calcExp(distrs=distrs, genomeDB=hg19DB, pc=pc, readLength=75, rpkm=FALSE)
head(exprs(eset))
head(fData(eset))

#Re-normalize relative expression to add up to 1 within gene_id rather
# than island_id
eset <- relexprByGene(eset)

#Add fake sample by permuting and combine
eset2 <- eset[sample(1:nrow(eset),replace=FALSE),]
sampleNames(eset2) <- '2' #must have a different name
esetall <- mergeExp(eset,eset2)

#After merge samples are correctly matched
head(exprs(esetall))
head(fData(esetall))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(casper)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: IRanges
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/casper/calcExp.Rd_%03d_medium.png", width=480, height=480)
> ### Name: calcExp
> ### Title: Estimate expression of a known set of gene splicing variants.
> ### Aliases: calcExp
> ### Keywords: models
> 
> ### ** Examples
> 
> data(K562.r1l1)
> data(hg19DB)
> 
> #Pre-process
> bam0 <- rmShortInserts(K562.r1l1, isizeMin=100)
> pbam0 <- procBam(bam0)
> head(getReads(pbam0))
GRanges object with 6 ranges and 3 metadata columns:
      seqnames                 ranges strand |       rid    XS     names
         <Rle>              <IRanges>  <Rle> | <integer> <Rle> <integer>
  [1]    chr17 [  7124912,   7124986]      + |         1     *         0
  [2]    chr17 [  7124986,   7125001]      - |         2     *         0
  [3]    chr17 [  7125271,   7125329]      - |         2     *         0
  [4]     chr9 [ 94485006,  94485080]      + |         1     *         1
  [5]     chr9 [ 94485115,  94485189]      - |         2     *         1
  [6]     chrX [133680161, 133680235]      + |         1     *         2
  -------
  seqinfo: 13 sequences from an unspecified genome; no seqlengths
> 
> #Estimate distributions, get path counts
> distrs <- getDistrs(hg19DB,bam=bam0,readLength=75)
> pc <- pathCounts(pbam0, DB=hg19DB)
> 
> #Get estimates
> eset <- calcExp(distrs=distrs, genomeDB=hg19DB, pc=pc, readLength=75, rpkm=FALSE)
> head(exprs(eset))
                      1
NM_005158    0.16787846
NM_001168236 0.18742670
NM_001136000 0.08277252
NM_001168239 0.28327384
NM_001136001 0.04592768
NM_007314    0.09612429
> head(fData(eset))
               transcript gene_id island_id explCnts
NM_005158       NM_005158      27       463      109
NM_001168236 NM_001168236      27       463      109
NM_001136000 NM_001136000      27       463      109
NM_001168239 NM_001168239      27       463      109
NM_001136001 NM_001136001      27       463      109
NM_007314       NM_007314      27       463      109
> 
> #Re-normalize relative expression to add up to 1 within gene_id rather
> # than island_id
> eset <- relexprByGene(eset)
> 
> #Add fake sample by permuting and combine
> eset2 <- eset[sample(1:nrow(eset),replace=FALSE),]
> sampleNames(eset2) <- '2' #must have a different name
> esetall <- mergeExp(eset,eset2)
> 
> #After merge samples are correctly matched
> head(exprs(esetall))
                Sample1    Sample2
NM_005158    0.16787846 0.16787846
NM_001168236 0.18742670 0.18742670
NM_001136000 0.08277252 0.08277252
NM_001168239 0.28327384 0.28327384
NM_001136001 0.04592768 0.04592768
NM_007314    0.09612429 0.09612429
> head(fData(esetall))
               transcript gene_id island_id readCount
NM_005158       NM_005158      27       463       218
NM_001168236 NM_001168236      27       463       218
NM_001136000 NM_001136000      27       463       218
NM_001168239 NM_001168239      27       463       218
NM_001136001 NM_001136001      27       463       218
NM_007314       NM_007314      27       463       218
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>