Last data update: 2014.03.03

R: Compute RPKM based on gene annotations
computeRPKMR Documentation

Compute RPKM based on gene annotations

Description

Given a list of single-end or paired-end read alignment files in BAM/SAM/BED format, compute the read counts and normalized read counts as expression of annotated transcript in the unit of "reads per kilobase of exon per million mapped reads" (RPKM).

Usage

computeRPKM(bamFiles, RIPSeekerRead = TRUE, paired = FALSE, 
	countMode = "IntersectionNotEmpty", featureGRanges, 
	idType = "ensembl_transcript_id", featureType = "exon", 
	ignore.strand = FALSE, txDbName = "biomart", 
	moreGeneInfo = FALSE, saveData, justRPKM = TRUE, ...)

Arguments

bamFiles

A list of one or more BAM/SAM/BED alignment files.

RIPSeekerRead

Binary flag. If TRUE, then import and process the alignment files using the built-in function combineAlignGals from RIPSeeker package; if FALSE, then import the files by directly calling the required functions. The flag makes using the function outside of RIPSeeker package become possible.

paired

Binary to indicate whether the alignments files are paired-end. The alignments file must be either paired-end or single-end but not both.

countMode

An argument used to set the mode argument in the underlying function summarizeOverlaps employed to compute the read counts for each feature. The possible mode includes "Union", "IntersectionStrict", and "IntersectionNotEmpty". All three modes avoid double counting the reads by either discarding reads that completely fall into multiple features or counting the read only once for the feature that uniquely and completely includes it. Please refer to summarizeOverlaps for details.

featureGRanges

GRanges of features as an optional argument for function to compute RPKM/FPKM just for those features without retrieving online annotations.

idType

A character string that specifies the type of the annotations, which can "ensembl_transcript_id", "ensembl_gene_id", "ucsc", etc. Refer to listFilters for more information.

featureType

Features that will be groupped by genes/transcripts in a GRangesList. The available options are "exon" (Default), "intron", "fiveUTR", "threeUTR", and "CDS" corresponding to the functions exonsBy, cdsBy, intronsByTranscript, fiveUTRsByTranscript, threeUTRsByTranscript, and cdsBy, respectively.

ignore.strand

Whether to ignore strand when counting the reads (Default: FALSE).

txDbName

Name of the transcript database to use to retreive the annotation. The available options are "biomart" (Default) or "UCSC" corresponding to the functions makeTxDbFromBiomart and makeTxDbFromUCSC, respectively.

moreGeneInfo

Binary indicator to indicate whether to download more information for each genes/transcripts rather than having only the gene/transcript IDs (Default: FALSE).

saveData

Path of output file.

justRPKM

Binary for whether to return only the RangedSummarizedExperiment.

...

Extra arguments passed to functions makeTxDbFromBiomart, makeTxDbFromUCSC, useMart, combineAlignGals.

Details

The function is a wrapper function making use of several external functions from several well maintained and freely available Bioconductor packages including GenomicFeatures, GenomicRanges, biomaRt and Rsamtools packages. The paired-end alignments are converted into single-end using function galp2gal and then subject to read count computation by summarizeOverlaps, which does not yet directly support paired-end alignments.

Value

rpkmSEobject

A RangedSummarizedExperiment object with assays slot saved for counts, rowRanges holds the features, metadata for RPKM/FPKM (normalized) gene expression.

rpkmDF

Data frame with or without the detailed gene information columns depending on whether moreGeneInfo is TRUE or FALSE. rpkmDF is only returned within in a list when justRPKM is FALSE.

featureGRanges

The features in GRanges object that are used to compute the gene expression. featureGRanges is only returned within in a list when justRPKM is FALSE.

Note

Also works for RNA-seq alignments.

Author(s)

Yue Li

References

M. Carlson, H. Pages, P. Aboyoun, S. Falcon, M. Morgan, D. Sarkar and M. Lawrence. GenomicFeatures: Tools for making and manipulating transcript centric annotations. R package version 1.8.2.

P. Aboyoun, H. Pages and M. Lawrence (). GenomicRanges: Representation and manipulation of genomic intervals. R package version 1.8.9.

Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Steffen Durinck, Paul T. Spellman, Ewan Birney and Wolfgang Huber, Nature Protocols 4, 1184-1191 (2009).

BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Steffen Durinck, Yves Moreau, Arek Kasprzyk, Sean Davis, Bart De Moor, Alvis Brazma and Wolfgang Huber, Bioinformatics 21, 3439-3440 (2005).

Martin Morgan and Herv'e Pag'es (). Rsamtools: Binary alignment (BAM), variant call (BCF), or tabix file import. R package version 1.8.5. http://bioconductor.org/packages/release/bioc/html/Rsamtools.html

See Also

makeTxDbFromBiomart, makeTxDbFromUCSC, useMart, exonsBy, cdsBy, intronsByTranscript, fiveUTRsByTranscript, threeUTRsByTranscript, cdsBy, combineAlignGals, summarizeOverlaps, ScanBamParam, readGAlignmentPairs, readGAlignments

Examples

if(interactive()) { # need internet connection
	
# Retrieve system files
extdata.dir <- system.file("extdata", package="RIPSeeker") 

bamFiles <- list.files(extdata.dir, ".bam$", recursive=TRUE, full.names=TRUE)

bamFiles <- grep("PRC2", bamFiles, value=TRUE)

# use biomart
txDbName <- "biomart"
biomart <- "ENSEMBL_MART_ENSEMBL"		# use archive to get ensembl 65
dataset <- "mmusculus_gene_ensembl"		
host <- "dec2011.archive.ensembl.org" 	# use ensembl 65 for annotation

resultlist <- computeRPKM(bamFiles=grep(pattern="SRR039214", 
        bamFiles, value=TRUE, invert=TRUE), #featureGRanges=featureGRanges, 
				dataset=dataset, moreGeneInfo=TRUE, justRPKM=FALSE,
				idType="ensembl_transcript_id", txDbName=txDbName, 
				biomart=biomart, host=host, by="tx")
		
}

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(RIPSeeker)
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit


Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: Rsamtools
Loading required package: Biostrings
Loading required package: XVector
Loading required package: GenomicAlignments
Loading required package: rtracklayer
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/RIPSeeker/computeRPKM.Rd_%03d_medium.png", width=480, height=480)
> ### Name: computeRPKM
> ### Title: Compute RPKM based on gene annotations
> ### Aliases: computeRPKM
> 
> ### ** Examples
> 
> #if(interactive()) { # need internet connection
> 	
> # Retrieve system files
> extdata.dir <- system.file("extdata", package="RIPSeeker") 
> 
> bamFiles <- list.files(extdata.dir, ".bam$", recursive=TRUE, full.names=TRUE)
> 
> bamFiles <- grep("PRC2", bamFiles, value=TRUE)
> 
> # use biomart
> txDbName <- "biomart"
> biomart <- "ENSEMBL_MART_ENSEMBL"		# use archive to get ensembl 65
> dataset <- "mmusculus_gene_ensembl"		
> host <- "dec2011.archive.ensembl.org" 	# use ensembl 65 for annotation
> 
> resultlist <- computeRPKM(bamFiles=grep(pattern="SRR039214", 
+         bamFiles, value=TRUE, invert=TRUE), #featureGRanges=featureGRanges, 
+ 				dataset=dataset, moreGeneInfo=TRUE, justRPKM=FALSE,
+ 				idType="ensembl_transcript_id", txDbName=txDbName, 
+ 				biomart=biomart, host=host, by="tx")
Loading required package: AnnotationDbi
Entity 'nbsp' not defined
attributes construct error
Couldn't find end of Start Tag img line 22
Entity 'hellip' not defined
Entity 'hellip' not defined
Entity 'nbsp' not defined
Entity 'raquo' not defined
attributes construct error
Couldn't find end of Start Tag img line 40
Entity 'hellip' not defined
Entity 'hellip' not defined
Entity 'hellip' not defined
Entity 'hellip' not defined
Entity 'hellip' not defined
Opening and ending tag mismatch: img line 64 and li
Opening and ending tag mismatch: li line 64 and ul
Opening and ending tag mismatch: ul line 63 and div
Entity 'copy' not defined
attributes construct error
Couldn't find end of Start Tag img line 225
attributes construct error
Couldn't find end of Start Tag img line 228
attributes construct error
Couldn't find end of Start Tag img line 231
attributes construct error
Couldn't find end of Start Tag img line 260
Opening and ending tag mismatch: div line 18 and body
Opening and ending tag mismatch: body line 17 and html
Premature end of data in tag html line 2
Error: 1: Entity 'nbsp' not defined
2: attributes construct error
3: Couldn't find end of Start Tag img line 22
4: Entity 'hellip' not defined
5: Entity 'hellip' not defined
6: Entity 'nbsp' not defined
7: Entity 'raquo' not defined
8: attributes construct error
9: Couldn't find end of Start Tag img line 40
10: Entity 'hellip' not defined
11: Entity 'hellip' not defined
12: Entity 'hellip' not defined
13: Entity 'hellip' not defined
14: Entity 'hellip' not defined
15: Opening and ending tag mismatch: img line 64 and li
16: Opening and ending tag mismatch: li line 64 and ul
17: Opening and ending tag mismatch: ul line 63 and div
18: Entity 'copy' not defined
19: attributes construct error
20: Couldn't find end of Start Tag img line 225
21: attributes construct error
22: Couldn't find end of Start Tag img line 228
23: attributes construct error
24: Couldn't find end of Start Tag img line 231
25: attributes construct error
26: Couldn't find end of Start Tag img line 260
27: Opening and e
Execution halted