Last data update: 2014.03.03

R: Apply an offset on the read start along the transcript and...
countShiftReadsR Documentation

Apply an offset on the read start along the transcript and returns the coverage on the 5pUTR, CDS, 3pUTR, as well as a matrix of codon coverage per ORF.

Description

Apply an offset on the read start along the transcript and returns the coverage on the 5pUTR, CDS, 3pUTR, as well as a matrix of codon coverage per ORF.

Usage

countShiftReads(exonGRanges, cdsPosTransc, alnGRanges, shiftValue, motifSize)

Arguments

exonGRanges

a GRangesList. It contains the exon coordinates grouped by transcript.

cdsPosTransc

a list. It contains the relative positions of the start and end of the ORFs. The transcript names in exonGRanges and cdsPosTransc should be the same.

alnGRanges

A GRanges object containing the alignment information. In order to improve the performance the GAlignments BAM object should be transformed into a GRanges object with cigar match size metadata.

shiftValue

integer. The offset for recalibrating reads on transcripts when computing coverage. The default value for this parameter is 0, no offset should be performed.

motifSize

an integer. The number of nucleotides in each motif on which to compute coverage and usage. Default 3 nucleotides (codon).

Value

a list with 2 objects. The first object in the list is a data.frame containing: information on ORFs (names, chromosomal position, length) as well as the counts on the 5pUTR, CDS and 3pUTR once the offset is applied. The second object in the list is a list in itself. It contains for each ORF in the cdsPosTransc, for each codon the sum of read starts covering the 3 codon nucleotides. This per codon coverage does not contain information on the codon type, just its position in the ORF and its coverage.

Examples

#read the BAM file into a GAlignments object using
#GenomicAlignments::readGAlignments
#the GAlignments object should be similar to ctrlGAlignments
data(ctrlGAlignments)
aln <- ctrlGAlignments

#transform the GAlignments object into a GRanges object (faster processing)
alnGRanges <- readsToStartOrEnd(aln, what="start")

#make a txdb object containing the annotations for the specified species.
#In this case hg19.
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene
#Please make sure that seqnames of txdb correspond to
#the seqnames of the alignment files ("chr" particle)
#if not rename the txdb seqlevels
#renameSeqlevels(txdb, sub("chr", "", seqlevels(txdb)))

#get all CDSs by transcript
cds <- GenomicFeatures::cdsBy(txdb, by="tx", use.names=TRUE)

#get all exons by transcript
exonGRanges <- GenomicFeatures::exonsBy(txdb, by="tx", use.names=TRUE)
#get the per transcript relative position of start and end codons
cdsPosTransc <- orfRelativePos(cds, exonGRanges)
#compute the counts on the different features after applying
#the specified shift value on the read start along the transcript
countsData <- countShiftReads(exonGRanges[names(cdsPosTransc)], cdsPosTransc,
           alnGRanges, -14)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(RiboProfiling)
Loading required package: Biostrings
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: XVector
Warning messages:
1: replacing previous import 'BiocGenerics::Position' by 'ggplot2::Position' when loading 'RiboProfiling' 
2: replacing previous import 'ggplot2::Position' by 'BiocGenerics::Position' when loading 'ggbio' 
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/RiboProfiling/countShiftReads.Rd_%03d_medium.png", width=480, height=480)
> ### Name: countShiftReads
> ### Title: Apply an offset on the read start along the transcript and
> ###   returns the coverage on the 5pUTR, CDS, 3pUTR, as well as a matrix of
> ###   codon coverage per ORF.
> ### Aliases: countShiftReads
> 
> ### ** Examples
> 
> #read the BAM file into a GAlignments object using
> #GenomicAlignments::readGAlignments
> #the GAlignments object should be similar to ctrlGAlignments
> data(ctrlGAlignments)
> aln <- ctrlGAlignments
> 
> #transform the GAlignments object into a GRanges object (faster processing)
> alnGRanges <- readsToStartOrEnd(aln, what="start")
> 
> #make a txdb object containing the annotations for the specified species.
> #In this case hg19.
> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene
> #Please make sure that seqnames of txdb correspond to
> #the seqnames of the alignment files ("chr" particle)
> #if not rename the txdb seqlevels
> #renameSeqlevels(txdb, sub("chr", "", seqlevels(txdb)))
> 
> #get all CDSs by transcript
> cds <- GenomicFeatures::cdsBy(txdb, by="tx", use.names=TRUE)
> 
> #get all exons by transcript
> exonGRanges <- GenomicFeatures::exonsBy(txdb, by="tx", use.names=TRUE)
> #get the per transcript relative position of start and end codons
> cdsPosTransc <- orfRelativePos(cds, exonGRanges)
> #compute the counts on the different features after applying
> #the specified shift value on the read start along the transcript
> countsData <- countShiftReads(exonGRanges[names(cdsPosTransc)], cdsPosTransc,
+            alnGRanges, -14)
Warning message:
In countShiftReads(exonGRanges[names(cdsPosTransc)], cdsPosTransc,  :
  Param motifSize should be an integer! Accepted values 3, 6 or 9. Default value is 3.

> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>