Last data update: 2014.03.03

R: Obtain the distance to the nearest TSS, miRNA, and/or exon...
annotatePeakInBatchR Documentation

Obtain the distance to the nearest TSS, miRNA, and/or exon for a list of peaks

Description

Obtain the distance to the nearest TSS, miRNA, exon et al for a list of peak locations leveraging IRanges and biomaRt package

Usage

annotatePeakInBatch(myPeakList, mart, featureType = c("TSS", "miRNA","Exon"), 
AnnotationData, output=c("nearestLocation", "overlapping", "both", 
                         "shortestDistance", "inside",
                         "upstream&inside", "inside&downstream",
                         "upstream", "downstream", 
                         "upstreamORdownstream",
                         "nearestBiDirectionalPromoters"),
multiple=c(TRUE,FALSE), 
maxgap=0L, PeakLocForDistance=c("start", "middle", "end"), 
FeatureLocForDistance=c("TSS", "middle","start", "end","geneEnd"), 
select=c("all", "first","last","arbitrary"),
ignore.strand=TRUE, bindingRegion=NULL, ...)

Arguments

myPeakList

A GRanges object

mart

A mart object, used if AnnotationData is not supplied, see useMart of bioMaRt package for details

featureType

A charcter vector used with mart argument if AnnotationData is not supplied; it's value is "TSS"", "miRNA"" or "Exon"

AnnotationData

A GRanges or annoGR oject. It can be obtained from function getAnnotation or customized annotation of class GRanges containing additional variable: strand (1 or + for plus strand and -1 or - for minus strand). Pre-compliled annotations, such as TSS.human.NCBI36, TSS.mouse.NCBIM37, TSS.rat.RGSC3.4 and TSS.zebrafish.Zv8, are provided by this package (attach them with data() function). Another method to provide annotation data is to obtain through biomaRt real time by using the parameters of mart and featureType

output
nearestLocation (default)

will output the nearest features calculated as PeakLoc - FeatureLocForDistance

overlapping

will output overlapping features with maximum gap specified as maxgap between peak range and feature range

shortestDistance

will output nearest features

both

will output all the nearest features, in addition, will output any features that overlap the peak that is not the nearest features

upstream&inside

will output all upstream and overlapping features with maximum gap

inside&downstream

will output all downstream and overlapping features with maximum gap

upstream

will output all upstream features with maximum gap.

downstream

will output all downstream features with maximum gap.

upstreamORdownstream

will output all upstream features with maximum gap or downstream with maximum gap

nearestBiDirectionalPromoters

will use annoPeaks to annotate peaks. Nearest promoters from both direction of the peaks (strand is considered). It will report bidirectional promoters if there are promoters in both directions in the given region (defined by bindingRegion). Otherwise, it will report the closest promoter in one direction.

multiple

Not applicable when output is nearest. TRUE: output multiple overlapping features for each peak. FALSE: output at most one overlapping feature for each peak. This parameter is kept for backward compatibility, please use select.

maxgap

Non-negative integer. Intervals with a separation of maxgap or less are considered to be overlapping

PeakLocForDistance

Specify the location of peak for calculating distance,i.e., middle means using middle of the peak to calculate distance to feature, start means using start of the peak to calculate the distance to feature. To be compatible with previous version, by default using start

FeatureLocForDistance

Specify the location of feature for calculating distance,i.e., middle means using middle of the feature to calculate distance of peak to feature, start means using start of the feature to calculate the distance to feature, TSS means using start of feature when feature is on plus strand and using end of feature when feature is on minus strand, geneEnd means using end of feature when feature is on plus strand and using start of feature when feature is on minus strand. To be compatible with previous version, by default using TSS

select

"all" may return multiple overlapping peaks, "first" will return the first overlapping peak, "last" will return the last overlapping peak and "arbitrary" will return one of the overlapping peaks.

ignore.strand

When set to TRUE, the strand information is ignored in the annotation.

bindingRegion

Annotation range used for annoPeaks, which is a vector with two integer values, default to c (-5000, 5000). The first one must be no bigger than 0. And the sec ond one must be no less than 1. Once bindingRegion is defined, annotation will based on annoPeaks. Here is how to use it together with the parameter output and FeatureLocForDistance.

  • To obtain peaks with nearest bi-directional promoters within 5kb upstream and 3kb downstream of TSS, set output = "nearestBiDirectionalPromoters" and bindingRegion = c(-5000, 3000)

  • To obtain peaks within 5kb upstream and up to 3kb downstream of TSS within the gene body, set output="overlapping", FeatureLocForDistance="TSS" and bindingRegion = c(-5000, 3000)

  • To obtain peaks up to 5kb upstream within the gene body and 3kb downstream of gene/Exon End, set output="overlapping", FeatureLocForDistance="geneEnd" and bindingRegion = c(-5000, 3000)

For details, see annoPeaks.

...

Parameters could be passed to annoPeaks

Value

An object of GRanges with slot start holding the start position of the peak, slot end holding the end position of the peak, slot space holding the chromosome location where the peak is located, slot rownames holding the id of the peak. In addition, the following variables are included.

feature

id of the feature such as ensembl gene ID

insideFeature

upstream: peak resides upstream of the feature; downstream: peak resides downstream of the feature; inside: peak resides inside the feature; overlapStart: peak overlaps with the start of the feature; overlapEnd: peak overlaps with the end of the feature; includeFeature: peak include the feature entirely

distancetoFeature

distance to the nearest feature such as transcription start site. By default, the distance is calculated as the distance between the start of the binding site and the TSS that is the gene start for genes located on the forward strand and the gene end for genes located on the reverse strand. The user can specify the location of peak and location of feature for calculating this

start_position

start position of the feature such as gene

end_position

end position of the feature such as the gene

strand

1 or + for positive strand and -1 or - for negative strand where the feature is located

shortestDistance

The shortest distance from either end of peak to either end the feature.

fromOverlappingOrNearest

nearest: indicates this feature's start (feature's end for features at minus strand) is closest to the peak start; Overlapping: indicates this feature overlaps with this peak although it is not the nearest feature start

Author(s)

Lihua Julie Zhu, Jianhong Ou

References

1. Zhu L.J. et al. (2010) ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinformatics 2010, 11:237doi:10.1186/1471-2105-11-237

2. Zhu L (2013). "Integrative analysis of ChIP-chip and ChIP-seq dataset." In Lee T and Luk ACS (eds.), Tilling Arrays, volume 1067, chapter 4, pp. -19. Humana Press. http://dx.doi.org/10.1007/978-1-62703-607-8_8

See Also

getAnnotation, findOverlappingPeaks, makeVennDiagram, addGeneIDs, peaksNearBDP, summarizePatternInPeaks, annoGR, annoPeaks

Examples


#if (interactive()){
    ## example 1: annotate myPeakList by TxDb or EnsDb.
    data(myPeakList)
    library(EnsDb.Hsapiens.v75)
    annoData <- annoGR(EnsDb.Hsapiens.v75)
    annotatePeak = annotatePeakInBatch(myPeakList[1:6], AnnotationData=annoData)
    annotatePeak
    
    ## example 2: annotate myPeakList (GRanges) 
    ## with TSS.human.NCBI36 (Granges)
    data(TSS.human.NCBI36)
    annotatedPeak = annotatePeakInBatch(myPeakList[1:6], 
                                        AnnotationData=TSS.human.NCBI36)
    annotatedPeak
    
    ## example 3: you have a list of transcription factor biding sites from 
    ## literature and are interested in determining the extent of the overlap 
    ## to the list of peaks from your experiment. Prior calling the function 
    ## annotatePeakInBatch, need to represent both dataset as RangedData 
    ## where start is the start of the binding site, end is the end of the 
    ## binding site, names is the name of the binding site, space and strand 
    ## are the chromosome name and strand where the binding site is located.
    
    myexp <- GRanges(seqnames=c(6,6,6,6,5,4,4), 
                     IRanges(start=c(1543200,1557200,1563000,1569800,
                                     167889600,100,1000),
                             end=c(1555199,1560599,1565199,1573799,
                                   167893599,200,1200),
                             names=c("p1","p2","p3","p4","p5","p6", "p7")), 
                     strand="+")
    literature <- GRanges(seqnames=c(6,6,6,6,5,4,4), 
                          IRanges(start=c(1549800,1554400,1565000,1569400,
                                          167888600,120,800),
                                  end=c(1550599,1560799,1565399,1571199,
                                        167888999,140,1400),
                                  names=c("f1","f2","f3","f4","f5","f6","f7")),
                          strand=rep(c("+", "-"), c(5, 2)))
    annotatedPeak1 <- annotatePeakInBatch(myexp, 
                                          AnnotationData=literature)
    pie(table(annotatedPeak1$insideFeature))
    annotatedPeak1
    ### use toGRanges or rtracklayer::import to convert BED or GFF format
    ###  to GRanges before calling annotatePeakInBatch
    test.bed <- data.frame(space=c("4", "6"), 
                           start=c("100", "1000"),
                           end=c("200", "1100"), 
                           name=c("peak1", "peak2"))
    test.GR = toGRanges(test.bed)
    annotatePeakInBatch(test.GR, AnnotationData = literature)
#}

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(ChIPpeakAnno)
Loading required package: grid
Loading required package: IRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: Biostrings
Loading required package: XVector
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: VennDiagram
Loading required package: futile.logger

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/ChIPpeakAnno/annotatePeakInBatch.Rd_%03d_medium.png", width=480, height=480)
> ### Name: annotatePeakInBatch
> ### Title: Obtain the distance to the nearest TSS, miRNA, and/or exon for a
> ###   list of peaks
> ### Aliases: annotatePeakInBatch
> ### Keywords: misc
> 
> ### ** Examples
> 
> 
> #if (interactive()){
>     ## example 1: annotate myPeakList by TxDb or EnsDb.
>     data(myPeakList)
>     library(EnsDb.Hsapiens.v75)
Loading required package: ensembldb
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

>     annoData <- annoGR(EnsDb.Hsapiens.v75)
>     annotatePeak = annotatePeakInBatch(myPeakList[1:6], AnnotationData=annoData)
>     annotatePeak
GRanges object with 6 ranges and 9 metadata columns:
                                seqnames             ranges strand |
                                   <Rle>          <IRanges>  <Rle> |
   X1_93_556427.ENSG00000223659     chr1 [ 556660,  556760]      * |
   X1_41_559455.ENSG00000223659     chr1 [ 559774,  559874]      * |
   X1_12_703729.ENSG00000240618     chr1 [ 703885,  703985]      * |
   X1_20_925025.ENSG00000272512     chr1 [ 926058,  926158]      * |
  X1_11_1041174.ENSG00000131591     chr1 [1041646, 1041746]      * |
  X1_14_1269014.ENSG00000169962     chr1 [1270239, 1270339]      * |
                                         peak         feature start_position
                                  <character>     <character>      <integer>
   X1_93_556427.ENSG00000223659  X1_93_556427 ENSG00000223659         562757
   X1_41_559455.ENSG00000223659  X1_41_559455 ENSG00000223659         562757
   X1_12_703729.ENSG00000240618  X1_12_703729 ENSG00000240618         694412
   X1_20_925025.ENSG00000272512  X1_20_925025 ENSG00000272512         931346
  X1_11_1041174.ENSG00000131591 X1_11_1041174 ENSG00000131591        1017198
  X1_14_1269014.ENSG00000169962 X1_14_1269014 ENSG00000169962        1266694
                                end_position feature_strand insideFeature
                                   <integer>    <character>      <factor>
   X1_93_556427.ENSG00000223659       564390              -    downstream
   X1_41_559455.ENSG00000223659       564390              -    downstream
   X1_12_703729.ENSG00000240618       700305              -      upstream
   X1_20_925025.ENSG00000272512       933431              -    downstream
  X1_11_1041174.ENSG00000131591      1051741              -        inside
  X1_14_1269014.ENSG00000169962      1270686              +        inside
                                distancetoFeature shortestDistance
                                        <numeric>        <integer>
   X1_93_556427.ENSG00000223659              7730             5997
   X1_41_559455.ENSG00000223659              4616             2883
   X1_12_703729.ENSG00000240618             -3580             3580
   X1_20_925025.ENSG00000272512              7373             5188
  X1_11_1041174.ENSG00000131591             10095             9995
  X1_14_1269014.ENSG00000169962              3545              347
                                fromOverlappingOrNearest
                                             <character>
   X1_93_556427.ENSG00000223659          NearestLocation
   X1_41_559455.ENSG00000223659          NearestLocation
   X1_12_703729.ENSG00000240618          NearestLocation
   X1_20_925025.ENSG00000272512          NearestLocation
  X1_11_1041174.ENSG00000131591          NearestLocation
  X1_14_1269014.ENSG00000169962          NearestLocation
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths
>     
>     ## example 2: annotate myPeakList (GRanges) 
>     ## with TSS.human.NCBI36 (Granges)
>     data(TSS.human.NCBI36)
>     annotatedPeak = annotatePeakInBatch(myPeakList[1:6], 
+                                         AnnotationData=TSS.human.NCBI36)
>     annotatedPeak
GRanges object with 6 ranges and 9 metadata columns:
                                seqnames             ranges strand |
                                   <Rle>          <IRanges>  <Rle> |
   X1_93_556427.ENSG00000212875     chr1 [ 556660,  556760]      * |
   X1_41_559455.ENSG00000212678     chr1 [ 559774,  559874]      * |
   X1_12_703729.ENSG00000197049     chr1 [ 703885,  703985]      * |
   X1_20_925025.ENSG00000188290     chr1 [ 926058,  926158]      * |
  X1_11_1041174.ENSG00000131591     chr1 [1041646, 1041746]      * |
  X1_14_1269014.ENSG00000107404     chr1 [1270239, 1270339]      * |
                                         peak         feature start_position
                                  <character>     <character>      <integer>
   X1_93_556427.ENSG00000212875  X1_93_556427 ENSG00000212875         556318
   X1_41_559455.ENSG00000212678  X1_41_559455 ENSG00000212678         559620
   X1_12_703729.ENSG00000197049  X1_12_703729 ENSG00000197049         711184
   X1_20_925025.ENSG00000188290  X1_20_925025 ENSG00000188290         924209
  X1_11_1041174.ENSG00000131591 X1_11_1041174 ENSG00000131591        1007062
  X1_14_1269014.ENSG00000107404 X1_14_1269014 ENSG00000107404        1260523
                                end_position feature_strand insideFeature
                                   <integer>    <character>      <factor>
   X1_93_556427.ENSG00000212875       557859              +        inside
   X1_41_559455.ENSG00000212678       560165              +        inside
   X1_12_703729.ENSG00000197049       712376              +      upstream
   X1_20_925025.ENSG00000188290       925333              -      upstream
  X1_11_1041174.ENSG00000131591      1041341              -      upstream
  X1_14_1269014.ENSG00000107404      1274623              -        inside
                                distancetoFeature shortestDistance
                                        <numeric>        <integer>
   X1_93_556427.ENSG00000212875               342              342
   X1_41_559455.ENSG00000212678               154              154
   X1_12_703729.ENSG00000197049             -7299             7199
   X1_20_925025.ENSG00000188290              -725              725
  X1_11_1041174.ENSG00000131591              -305              305
  X1_14_1269014.ENSG00000107404              4384             4284
                                fromOverlappingOrNearest
                                             <character>
   X1_93_556427.ENSG00000212875          NearestLocation
   X1_41_559455.ENSG00000212678          NearestLocation
   X1_12_703729.ENSG00000197049          NearestLocation
   X1_20_925025.ENSG00000188290          NearestLocation
  X1_11_1041174.ENSG00000131591          NearestLocation
  X1_14_1269014.ENSG00000107404          NearestLocation
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths
>     
>     ## example 3: you have a list of transcription factor biding sites from 
>     ## literature and are interested in determining the extent of the overlap 
>     ## to the list of peaks from your experiment. Prior calling the function 
>     ## annotatePeakInBatch, need to represent both dataset as RangedData 
>     ## where start is the start of the binding site, end is the end of the 
>     ## binding site, names is the name of the binding site, space and strand 
>     ## are the chromosome name and strand where the binding site is located.
>     
>     myexp <- GRanges(seqnames=c(6,6,6,6,5,4,4), 
+                      IRanges(start=c(1543200,1557200,1563000,1569800,
+                                      167889600,100,1000),
+                              end=c(1555199,1560599,1565199,1573799,
+                                    167893599,200,1200),
+                              names=c("p1","p2","p3","p4","p5","p6", "p7")), 
+                      strand="+")
>     literature <- GRanges(seqnames=c(6,6,6,6,5,4,4), 
+                           IRanges(start=c(1549800,1554400,1565000,1569400,
+                                           167888600,120,800),
+                                   end=c(1550599,1560799,1565399,1571199,
+                                         167888999,140,1400),
+                                   names=c("f1","f2","f3","f4","f5","f6","f7")),
+                           strand=rep(c("+", "-"), c(5, 2)))
>     annotatedPeak1 <- annotatePeakInBatch(myexp, 
+                                           AnnotationData=literature)
>     pie(table(annotatedPeak1$insideFeature))
>     annotatedPeak1
GRanges object with 7 ranges and 9 metadata columns:
        seqnames                 ranges strand |        peak     feature
           <Rle>              <IRanges>  <Rle> | <character> <character>
  p1.f2     chr6 [  1543200,   1555199]      + |          p1          f2
  p2.f2     chr6 [  1557200,   1560599]      + |          p2          f2
  p3.f3     chr6 [  1563000,   1565199]      + |          p3          f3
  p4.f4     chr6 [  1569800,   1573799]      + |          p4          f4
  p5.f5     chr5 [167889600, 167893599]      + |          p5          f5
  p6.f6     chr4 [      100,       200]      + |          p6          f6
  p7.f7     chr4 [     1000,      1200]      + |          p7          f7
        start_position end_position feature_strand  insideFeature
             <integer>    <integer>    <character>       <factor>
  p1.f2        1554400      1560799              +   overlapStart
  p2.f2        1554400      1560799              +         inside
  p3.f3        1565000      1565399              +   overlapStart
  p4.f4        1569400      1571199              +     overlapEnd
  p5.f5      167888600    167888999              +     downstream
  p6.f6            120          140              - includeFeature
  p7.f7            800         1400              -         inside
        distancetoFeature shortestDistance fromOverlappingOrNearest
                <numeric>        <integer>              <character>
  p1.f2            -11200              799          NearestLocation
  p2.f2              2800              200          NearestLocation
  p3.f3             -2000              199          NearestLocation
  p4.f4               400              400          NearestLocation
  p5.f5              1000              601          NearestLocation
  p6.f6                40               20          NearestLocation
  p7.f7               400              200          NearestLocation
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
>     ### use toGRanges or rtracklayer::import to convert BED or GFF format
>     ###  to GRanges before calling annotatePeakInBatch
>     test.bed <- data.frame(space=c("4", "6"), 
+                            start=c("100", "1000"),
+                            end=c("200", "1100"), 
+                            name=c("peak1", "peak2"))
>     test.GR = toGRanges(test.bed)
>     annotatePeakInBatch(test.GR, AnnotationData = literature)
GRanges object with 2 ranges and 9 metadata columns:
           seqnames       ranges strand |        peak     feature
              <Rle>    <IRanges>  <Rle> | <character> <character>
  peak1.f6     chr4 [ 100,  200]      * |       peak1          f6
  peak2.f1     chr6 [1000, 1100]      * |       peak2          f1
           start_position end_position feature_strand  insideFeature
                <integer>    <integer>    <character>       <factor>
  peak1.f6            120          140              - includeFeature
  peak2.f1        1549800      1550599              +       upstream
           distancetoFeature shortestDistance fromOverlappingOrNearest
                   <numeric>        <integer>              <character>
  peak1.f6                40               20          NearestLocation
  peak2.f1          -1548800          1548700          NearestLocation
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
> #}
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>