Last data update: 2014.03.03

assignChromosomeRegion    R Documentation

Summarize peak distribution over exon, intron, enhancer, proximal promoter, 5 prime UTR and 3 prime UTR

Description

Summarize peak distribution over exon, intron, enhancer, proximal promoter, 5 prime UTR and 3 prime UTR

Usage

assignChromosomeRegion(peaks.RD, exon, TSS, utr5, utr3, 
         proximal.promoter.cutoff=1000L, immediate.downstream.cutoff=1000L, 
         nucleotideLevel=FALSE, precedence=NULL, TxDb=NULL)

Arguments

peaks.RD

Peaks as a GRanges object; see the examples below.

exon

Exon annotation obtained from getAnnotation, or a custom annotation of class GRanges with a strand column (1 or + for the plus strand, -1 or - for the minus strand). Retained for backward compatibility only; use TxDb instead.

TSS

TSS annotation obtained from getAnnotation, or a custom annotation of class GRanges with a strand column (1 or + for the plus strand, -1 or - for the minus strand), for example data(TSS.human.NCBI36), data(TSS.mouse.NCBIM37), data(TSS.rat.RGSC3.4) or data(TSS.zebrafish.Zv8). Retained for backward compatibility only; use TxDb instead.

utr5

5 prime UTR annotation obtained from getAnnotation, or a custom annotation of class GRanges with a strand column (1 or + for the plus strand, -1 or - for the minus strand). Retained for backward compatibility only; use TxDb instead.

utr3

3 prime UTR annotation obtained from getAnnotation, or a custom annotation of class GRanges with a strand column (1 or + for the plus strand, -1 or - for the minus strand). Retained for backward compatibility only; use TxDb instead.

proximal.promoter.cutoff

Cutoff, in bases, used to distinguish proximal promoters from enhancers. Peaks that overlap a transcription start site (TSS), or lie no more than proximal.promoter.cutoff bases upstream of it, are classified as proximal promoters. Peaks that lie farther than proximal.promoter.cutoff bases upstream of the gene start are classified as enhancers. The default is 1000 bases.
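The promoter/enhancer rule can be illustrated with a simplified sketch. The helper classify_upstream() is hypothetical and not part of ChIPpeakAnno; it assumes a single plus-strand gene with positions given as plain integers, and it labels peaks lying downstream of the TSS only as such, without assigning them to exons, introns or UTRs.

```r
# Hypothetical helper, NOT part of ChIPpeakAnno: a simplified, plus-strand-only
# illustration of the proximal.promoter.cutoff rule for a single TSS.
classify_upstream <- function(peak_start, peak_end, tss, cutoff = 1000L) {
  # Peaks that overlap the TSS are proximal promoters.
  overlaps_tss <- peak_start <= tss & peak_end >= tss
  # Gap (bp) between the peak end and the TSS; > 0 only when the whole peak
  # lies upstream of the TSS.
  gap <- tss - peak_end
  ifelse(overlaps_tss | (gap > 0 & gap <= cutoff), "Promoter",
         ifelse(gap > cutoff, "Enhancer", "Downstream of TSS"))
}

# TSS at position 10000 on the plus strand, default 1000-bp cutoff
classify_upstream(peak_start = c(9950, 9000, 5000, 12000),
                  peak_end   = c(10050, 9500, 5500, 12500),
                  tss = 10000)
```

With these inputs the call returns c("Promoter", "Promoter", "Enhancer", "Downstream of TSS"): the first peak overlaps the TSS, the second ends 500 bp upstream, and the third ends 4500 bp upstream, beyond the cutoff.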

immediate.downstream.cutoff

Cutoff, in bases, used to distinguish immediate downstream regions from enhancers. Peaks that lie no more than immediate.downstream.cutoff bases downstream of the gene end, and do not overlap a 3 prime UTR, are classified as immediate downstream. Peaks that lie farther than immediate.downstream.cutoff bases downstream of the gene end are classified as enhancers. The default is 1000 bases.
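The downstream rule can be sketched in the same simplified way. The helper classify_downstream() is hypothetical and not part of ChIPpeakAnno; it assumes a plus-strand gene whose 3 prime UTR ends at gene_end, so any peak starting past gene_end cannot overlap that UTR.

```r
# Hypothetical helper, NOT part of ChIPpeakAnno: a simplified, plus-strand-only
# illustration of the immediate.downstream.cutoff rule for a single gene.
# Assumes the 3' UTR ends at gene_end, so a peak that starts at or before
# gene_end is treated as lying within (or overlapping) the gene.
classify_downstream <- function(peak_start, gene_end, cutoff = 1000L) {
  # Gap (bp) between the gene end and the peak start; > 0 only when the
  # whole peak lies past the gene end.
  gap <- peak_start - gene_end
  ifelse(gap > 0 & gap <= cutoff, "immediateDownstream",
         ifelse(gap > cutoff, "Enhancer", "Within or overlapping gene"))
}

# Gene ending at position 20000 on the plus strand, default 1000-bp cutoff
classify_downstream(peak_start = c(19800, 20500, 23000), gene_end = 20000)
```

The call returns c("Within or overlapping gene", "immediateDownstream", "Enhancer").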

nucleotideLevel

Logical. Choose between a peak-centric (FALSE) and a nucleotide-centric (TRUE) view. Default is FALSE.
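The difference between the two views can be shown with toy data. This is a minimal sketch that assumes each peak has already been assigned to exactly one category; the data frame is hypothetical, and the code does not reproduce how ChIPpeakAnno computes these values internally.

```r
# Toy data (hypothetical): three peaks, each already assigned to one category.
peaks <- data.frame(
  category = c("Promoters", "Introns", "Introns"),
  width    = c(200L, 100L, 700L)
)

# Peak-centric view (nucleotideLevel = FALSE): each peak counts once,
# regardless of its width.
peak_pct <- 100 * prop.table(table(peaks$category))

# Nucleotide-centric view (nucleotideLevel = TRUE): each base pair counts,
# so wider peaks carry more weight.
nt_pct <- 100 * tapply(peaks$width, peaks$category, sum) / sum(peaks$width)

peak_pct  # Introns about 66.7, Promoters about 33.3
nt_pct    # Introns 80, Promoters 20
```

The same peaks give 33.3% promoters when counted per peak but only 20% when counted per base pair, because the promoter peak is narrower than the two intronic peaks combined.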

precedence

Order in which overlapping features are counted. If no precedence is specified (the default, NULL), double counting is enabled: a peak that overlaps both a promoter and a 5'UTR increments both the promoter and the 5'UTR counts. If a precedence order is specified, for example with "Promoters" listed before "fiveUTRs", the same peak increments only the promoter count. The values can be any combination of "Promoters", "immediateDownstream", "fiveUTRs", "threeUTRs", "Exons" and "Introns".
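The effect of precedence on counting can be sketched as follows. The list of overlaps is hypothetical, and the rule "keep only the highest-ranked feature a peak overlaps" is an assumption that generalizes the promoter/5'UTR example above; it is not taken from the ChIPpeakAnno source.

```r
# Hypothetical overlaps: the features each peak was found to overlap.
hits <- list(
  peak1 = c("Promoters", "fiveUTRs"),
  peak2 = c("fiveUTRs", "Exons"),
  peak3 = "Introns"
)

# precedence = NULL: every overlapped feature is counted (double counting),
# so three peaks contribute five counts.
double_counted <- table(unlist(hits))

# With a precedence order, each peak keeps only its highest-ranked feature.
precedence <- c("Promoters", "immediateDownstream", "fiveUTRs",
                "threeUTRs", "Exons", "Introns")
first_hit <- vapply(hits,
                    function(f) f[which.min(match(f, precedence))],
                    character(1))
single_counted <- table(first_hit)
```

Here peak1 is counted only as "Promoters" and peak2 only as "fiveUTRs", so the counts now sum to the number of peaks.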

TxDb

A TxDb object used as the source of annotation, for example TxDb.Hsapiens.UCSC.hg19.knownGene.

Value

A list of two named vectors, percentage and jaccard (Jaccard index). Each vector contains the following elements:

Exons

Percentage or Jaccard index of the peaks residing in exon regions.

Introns

Percentage or Jaccard index of the peaks residing in intron regions.

fiveUTRs

Percentage or Jaccard index of the peaks residing in 5 prime UTR regions.

threeUTRs

Percentage or Jaccard index of the peaks residing in 3 prime UTR regions.

Promoter

Percentage or Jaccard index of the peaks residing in proximal promoter regions.

ImmediateDownstream

Percentage or Jaccard index of the peaks residing in immediate downstream regions.

Enhancer.Silencer

Percentage or Jaccard index of the peaks residing in enhancer/silencer regions.
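For reference, the Jaccard index of two sets A and B is |A ∩ B| / |A ∪ B|. The sketch below computes it over covered base pairs for intervals on a single chromosome. The helpers covered_bases() and jaccard_bp() are hypothetical, and this page does not state exactly how ChIPpeakAnno defines its jaccard value, so treat this as an illustration of the general measure only. Expanding intervals into individual bases is fine for toy inputs but far too memory-hungry for whole genomes, where interval arithmetic (for example with GenomicRanges) is the usual approach.

```r
# Hypothetical helpers, NOT part of ChIPpeakAnno.
# Expand 1-based, closed intervals on one chromosome into covered base positions.
covered_bases <- function(starts, ends) {
  unique(unlist(mapply(seq, starts, ends, SIMPLIFY = FALSE)))
}

# Jaccard index |A intersect B| / |A union B| computed on covered base pairs.
jaccard_bp <- function(a_start, a_end, b_start, b_end) {
  a <- covered_bases(a_start, a_end)
  b <- covered_bases(b_start, b_end)
  length(intersect(a, b)) / length(union(a, b))
}

# Peaks covering 1-100 and 201-300 versus one feature covering 51-250:
# 100 shared bases out of 300 covered bases in total.
jaccard_bp(a_start = c(1, 201), a_end = c(100, 300),
           b_start = 51, b_end = 250)
```

The call returns 1/3.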

Author(s)

Jianhong Ou, Lihua Julie Zhu

References

1. Zhu L.J. et al. (2010) ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinformatics, 11:237. doi:10.1186/1471-2105-11-237

2. Zhu L.J. (2013) Integrative analysis of ChIP-chip and ChIP-seq dataset. Methods Mol Biol, 1067:105-24. doi:10.1007/978-1-62703-607-8_8

See Also

annotatePeakInBatch, findOverlapsOfPeaks, getEnriched, makeVennDiagram, addGeneIDs, peaksNearBDP, summarizePatternInPeaks

Examples

if (interactive()){
    ## Display the list of genomes available at UCSC:
    #library(rtracklayer)
    #ucscGenomes()[, "db"]
    ## Display the list of tracks supported by makeTxDbFromUCSC():
    #supportedUCSCtables()
    ## Retrieve a full transcript dataset for human from UCSC:
    ##TranscriptDb <- 
    ##     makeTxDbFromUCSC(genome="hg19", tablename="ensGene")
    if(require(TxDb.Hsapiens.UCSC.hg19.knownGene)){
        TxDb <- TxDb.Hsapiens.UCSC.hg19.knownGene
        ## Use annotated exons and 5' UTRs themselves as test "peaks"
        exons <- exons(TxDb, columns=NULL)
        fiveUTRs <- unique(unlist(fiveUTRsByTranscript(TxDb)))
        ## Nucleotide-centric summary of the exons, shown as a bar chart
        Feature.distribution <- 
            assignChromosomeRegion(exons, nucleotideLevel=TRUE, TxDb=TxDb)
        barplot(Feature.distribution$percentage)
        ## Peak-centric summary of the 5' UTRs
        assignChromosomeRegion(fiveUTRs, nucleotideLevel=FALSE, TxDb=TxDb)
        ## Example peak set, counted with an explicit precedence order
        ## so that no peak is counted in more than one category
        data(myPeakList)
        assignChromosomeRegion(myPeakList, nucleotideLevel=TRUE, 
                               precedence=c("Promoters", "immediateDownstream", 
                                            "fiveUTRs", "threeUTRs", 
                                            "Exons", "Introns"), 
                               TxDb=TxDb)
    }
}

Results



> library(ChIPpeakAnno)

> ### Name: assignChromosomeRegion
> ### Title: Summarize peak distribution over exon, intron, enhancer,
> ###   proximal promoter, 5 prime UTR and 3 prime UTR
> ### Aliases: assignChromosomeRegion
> ### Keywords: misc
> 
> ### ** Examples
> 
> #if (interactive()){
>     ##Display the list of genomes available at UCSC:
>     #library(rtracklayer)
>     #ucscGenomes()[, "db"]
>     ## Display the list of Tracks supported by makeTxDbFromUCSC()
>     #supportedUCSCtables()
>     ##Retrieving a full transcript dataset for Human from UCSC
>     ##TranscriptDb <- 
>     ##     makeTxDbFromUCSC(genome="hg19", tablename="ensGene")
>     if(require(TxDb.Hsapiens.UCSC.hg19.knownGene)){
+         TxDb <- TxDb.Hsapiens.UCSC.hg19.knownGene
+         exons <- exons(TxDb, columns=NULL)
+         fiveUTRs <- unique(unlist(fiveUTRsByTranscript(TxDb)))
+         Feature.distribution <- 
+             assignChromosomeRegion(exons, nucleotideLevel=TRUE, TxDb=TxDb)
+         barplot(Feature.distribution$percentage)
+         assignChromosomeRegion(fiveUTRs, nucleotideLevel=FALSE, TxDb=TxDb)
+         data(myPeakList)
+         assignChromosomeRegion(myPeakList, nucleotideLevel=TRUE, 
+                                precedence=c("Promoters", "immediateDownstream", 
+                                             "fiveUTRs", "threeUTRs", 
+                                             "Exons", "Introns"), 
+                                TxDb=TxDb)
+     }
Loading required package: TxDb.Hsapiens.UCSC.hg19.knownGene
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase

$percentage
          Promoters immediateDownstream            fiveUTRs           threeUTRs 
         0.12420582          0.12556664          0.03018612          0.20546008 
              Exons             Introns           microRNAs               tRNAs 
         0.37187685         98.48957484          0.00000000          0.00000000 
  Intergenic.Region 
         0.43333172 

$jaccard
          Promoters immediateDownstream            fiveUTRs           threeUTRs 
        0.002950441         0.003303413         0.001391700         0.003238811 
              Exons             Introns           microRNAs               tRNAs 
        0.001881843         0.018346983         0.000000000         0.000000000 
  Intergenic.Region 
        0.584224691 

Warning messages:
1: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 2386 out-of-bound ranges located on sequences
  chr12, chr16, chr17, chr18, chr19, chr20, chr21, and chr22. Note that
  only ranges located on a non-circular sequence whose length is not NA
  can be considered out-of-bound (use seqlengths() and isCircular() to
  get the lengths and circularity flags of the underlying sequences). You
  can use trim() to trim these ranges. See ?`trim,GenomicRanges-method`
  for more information.
2: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 2386 out-of-bound ranges located on sequences
  chr12, chr16, chr17, chr18, chr19, chr20, chr21, and chr22. Note that
  only ranges located on a non-circular sequence whose length is not NA
  can be considered out-of-bound (use seqlengths() and isCircular() to
  get the lengths and circularity flags of the underlying sequences). You
  can use trim() to trim these ranges. See ?`trim,GenomicRanges-method`
  for more information.
> #}