A mart object, used if AnnotationData is not supplied, see useMart
of bioMaRt package for details
featureType
A charcter vector used with mart argument if
AnnotationData is not supplied; it's value is "TSS"", "miRNA"" or "Exon"
AnnotationData
A GRanges or annoGR oject.
It can be obtained from function getAnnotation or customized annotation of
class GRanges containing additional variable: strand (1 or + for plus strand
and -1 or - for minus strand). Pre-compliled annotations, such as
TSS.human.NCBI36, TSS.mouse.NCBIM37, TSS.rat.RGSC3.4 and TSS.zebrafish.Zv8,
are provided by this package (attach them with data() function). Another
method to provide annotation data is to obtain through biomaRt real time by
using the parameters of mart and featureType
output
nearestLocation (default)
will output the nearest features calculated as PeakLoc - FeatureLocForDistance
overlapping
will output overlapping features with maximum gap specified as maxgap between peak range and feature range
shortestDistance
will output nearest features
both
will output all the nearest features, in addition, will output any features that overlap the peak that is not the nearest features
upstream&inside
will output all upstream and overlapping features with maximum gap
inside&downstream
will output all downstream and overlapping features
with maximum gap
upstream
will output all upstream features with maximum gap.
downstream
will output all downstream features with maximum gap.
upstreamORdownstream
will output all upstream features with maximum gap or downstream with maximum gap
nearestBiDirectionalPromoters
will use annoPeaks to annotate peaks. Nearest promoters from both direction of the peaks (strand is considered). It will report bidirectional promoters if there are promoters in both directions in the given region (defined by bindingRegion). Otherwise, it will report the closest promoter in one direction.
multiple
Not applicable when output is nearest. TRUE: output multiple
overlapping features for each peak. FALSE: output at most one overlapping
feature for each peak. This parameter is kept for backward compatibility,
please use select.
maxgap
Non-negative integer. Intervals with a separation of maxgap or
less are considered to be overlapping
PeakLocForDistance
Specify the location of peak for calculating
distance,i.e., middle means using middle of the peak to calculate distance
to feature, start means using start of the peak to calculate the distance
to feature. To be compatible with previous version, by default using start
FeatureLocForDistance
Specify the location of feature for calculating
distance,i.e., middle means using middle of the feature to calculate distance
of peak to feature, start means using start of the feature to calculate the
distance to feature, TSS means using start of feature when feature is on plus
strand and using end of feature when feature is on minus strand, geneEnd means
using end of feature when feature is on plus strand and using start of feature
when feature is on minus strand. To be compatible with previous version, by
default using TSS
select
"all" may return multiple overlapping peaks, "first" will return the first
overlapping peak, "last" will return the last overlapping peak and "arbitrary"
will return one of the overlapping peaks.
ignore.strand
When set to TRUE, the strand information is ignored in
the annotation.
bindingRegion
Annotation range used for annoPeaks, which is a vector with two integer values, default to c (-5000, 5000). The first one must be no bigger than 0. And the sec ond one must be no less than 1. Once bindingRegion is defined,
annotation will based on annoPeaks. Here is how to use it together with the parameter output and FeatureLocForDistance.
To obtain peaks with nearest bi-directional promoters within 5kb upstream and 3kb downstream of TSS, set output = "nearestBiDirectionalPromoters" and bindingRegion = c(-5000, 3000)
To obtain peaks within 5kb upstream and up to 3kb downstream of TSS within the gene body, set output="overlapping", FeatureLocForDistance="TSS" and bindingRegion = c(-5000, 3000)
To obtain peaks up to 5kb upstream within the gene body and 3kb downstream of gene/Exon End, set output="overlapping", FeatureLocForDistance="geneEnd" and bindingRegion = c(-5000, 3000)
For details, see annoPeaks.
...
Parameters could be passed to annoPeaks
Value
An object of GRanges with slot start holding
the start position of the peak, slot end holding the end position of the peak,
slot space holding the chromosome location where the peak is located, slot
rownames holding the id of the peak. In addition, the following variables
are included.
feature
id of the feature such as ensembl gene ID
insideFeature
upstream: peak resides upstream of the
feature; downstream: peak resides downstream of the feature; inside:
peak resides inside the feature; overlapStart: peak overlaps with the
start of the feature; overlapEnd: peak overlaps with the end of the
feature; includeFeature: peak include the feature entirely
distancetoFeature
distance to the nearest feature such as
transcription start site. By default, the distance is calculated as the
distance between the start of the binding site and the TSS that is the gene
start for genes located on the forward strand and the gene end for genes
located on the reverse strand. The user can specify the location of peak
and location of feature for calculating this
start_position
start position of the feature such as gene
end_position
end position of the feature such as the gene
strand
1 or + for positive strand and -1 or - for negative
strand where the feature is located
shortestDistance
The shortest distance from either end of
peak to either end the feature.
fromOverlappingOrNearest
nearest: indicates this feature's start
(feature's end for features at minus strand) is closest to the peak start;
Overlapping: indicates this feature overlaps with this peak although it is
not the nearest feature start
Author(s)
Lihua Julie Zhu, Jianhong Ou
References
1. Zhu L.J. et al. (2010) ChIPpeakAnno: a Bioconductor package to annotate
ChIP-seq and ChIP-chip data. BMC Bioinformatics 2010,
11:237doi:10.1186/1471-2105-11-237
2. Zhu L (2013). "Integrative analysis of ChIP-chip and ChIP-seq dataset."
In Lee T and Luk ACS (eds.), Tilling Arrays, volume 1067, chapter 4, pp. -19.
Humana Press. http://dx.doi.org/10.1007/978-1-62703-607-8_8
#if (interactive()){
## example 1: annotate myPeakList by TxDb or EnsDb.
data(myPeakList)
library(EnsDb.Hsapiens.v75)
annoData <- annoGR(EnsDb.Hsapiens.v75)
annotatePeak = annotatePeakInBatch(myPeakList[1:6], AnnotationData=annoData)
annotatePeak
## example 2: annotate myPeakList (GRanges)
## with TSS.human.NCBI36 (Granges)
data(TSS.human.NCBI36)
annotatedPeak = annotatePeakInBatch(myPeakList[1:6],
AnnotationData=TSS.human.NCBI36)
annotatedPeak
## example 3: you have a list of transcription factor biding sites from
## literature and are interested in determining the extent of the overlap
## to the list of peaks from your experiment. Prior calling the function
## annotatePeakInBatch, need to represent both dataset as RangedData
## where start is the start of the binding site, end is the end of the
## binding site, names is the name of the binding site, space and strand
## are the chromosome name and strand where the binding site is located.
myexp <- GRanges(seqnames=c(6,6,6,6,5,4,4),
IRanges(start=c(1543200,1557200,1563000,1569800,
167889600,100,1000),
end=c(1555199,1560599,1565199,1573799,
167893599,200,1200),
names=c("p1","p2","p3","p4","p5","p6", "p7")),
strand="+")
literature <- GRanges(seqnames=c(6,6,6,6,5,4,4),
IRanges(start=c(1549800,1554400,1565000,1569400,
167888600,120,800),
end=c(1550599,1560799,1565399,1571199,
167888999,140,1400),
names=c("f1","f2","f3","f4","f5","f6","f7")),
strand=rep(c("+", "-"), c(5, 2)))
annotatedPeak1 <- annotatePeakInBatch(myexp,
AnnotationData=literature)
pie(table(annotatedPeak1$insideFeature))
annotatedPeak1
### use toGRanges or rtracklayer::import to convert BED or GFF format
### to GRanges before calling annotatePeakInBatch
test.bed <- data.frame(space=c("4", "6"),
start=c("100", "1000"),
end=c("200", "1100"),
name=c("peak1", "peak2"))
test.GR = toGRanges(test.bed)
annotatePeakInBatch(test.GR, AnnotationData = literature)
#}
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(ChIPpeakAnno)
Loading required package: grid
Loading required package: IRanges
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: Biostrings
Loading required package: XVector
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: VennDiagram
Loading required package: futile.logger
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/ChIPpeakAnno/annotatePeakInBatch.Rd_%03d_medium.png", width=480, height=480)
> ### Name: annotatePeakInBatch
> ### Title: Obtain the distance to the nearest TSS, miRNA, and/or exon for a
> ### list of peaks
> ### Aliases: annotatePeakInBatch
> ### Keywords: misc
>
> ### ** Examples
>
>
> #if (interactive()){
> ## example 1: annotate myPeakList by TxDb or EnsDb.
> data(myPeakList)
> library(EnsDb.Hsapiens.v75)
Loading required package: ensembldb
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
> annoData <- annoGR(EnsDb.Hsapiens.v75)
> annotatePeak = annotatePeakInBatch(myPeakList[1:6], AnnotationData=annoData)
> annotatePeak
GRanges object with 6 ranges and 9 metadata columns:
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
X1_93_556427.ENSG00000223659 chr1 [ 556660, 556760] * |
X1_41_559455.ENSG00000223659 chr1 [ 559774, 559874] * |
X1_12_703729.ENSG00000240618 chr1 [ 703885, 703985] * |
X1_20_925025.ENSG00000272512 chr1 [ 926058, 926158] * |
X1_11_1041174.ENSG00000131591 chr1 [1041646, 1041746] * |
X1_14_1269014.ENSG00000169962 chr1 [1270239, 1270339] * |
peak feature start_position
<character> <character> <integer>
X1_93_556427.ENSG00000223659 X1_93_556427 ENSG00000223659 562757
X1_41_559455.ENSG00000223659 X1_41_559455 ENSG00000223659 562757
X1_12_703729.ENSG00000240618 X1_12_703729 ENSG00000240618 694412
X1_20_925025.ENSG00000272512 X1_20_925025 ENSG00000272512 931346
X1_11_1041174.ENSG00000131591 X1_11_1041174 ENSG00000131591 1017198
X1_14_1269014.ENSG00000169962 X1_14_1269014 ENSG00000169962 1266694
end_position feature_strand insideFeature
<integer> <character> <factor>
X1_93_556427.ENSG00000223659 564390 - downstream
X1_41_559455.ENSG00000223659 564390 - downstream
X1_12_703729.ENSG00000240618 700305 - upstream
X1_20_925025.ENSG00000272512 933431 - downstream
X1_11_1041174.ENSG00000131591 1051741 - inside
X1_14_1269014.ENSG00000169962 1270686 + inside
distancetoFeature shortestDistance
<numeric> <integer>
X1_93_556427.ENSG00000223659 7730 5997
X1_41_559455.ENSG00000223659 4616 2883
X1_12_703729.ENSG00000240618 -3580 3580
X1_20_925025.ENSG00000272512 7373 5188
X1_11_1041174.ENSG00000131591 10095 9995
X1_14_1269014.ENSG00000169962 3545 347
fromOverlappingOrNearest
<character>
X1_93_556427.ENSG00000223659 NearestLocation
X1_41_559455.ENSG00000223659 NearestLocation
X1_12_703729.ENSG00000240618 NearestLocation
X1_20_925025.ENSG00000272512 NearestLocation
X1_11_1041174.ENSG00000131591 NearestLocation
X1_14_1269014.ENSG00000169962 NearestLocation
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths
>
> ## example 2: annotate myPeakList (GRanges)
> ## with TSS.human.NCBI36 (Granges)
> data(TSS.human.NCBI36)
> annotatedPeak = annotatePeakInBatch(myPeakList[1:6],
+ AnnotationData=TSS.human.NCBI36)
> annotatedPeak
GRanges object with 6 ranges and 9 metadata columns:
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
X1_93_556427.ENSG00000212875 chr1 [ 556660, 556760] * |
X1_41_559455.ENSG00000212678 chr1 [ 559774, 559874] * |
X1_12_703729.ENSG00000197049 chr1 [ 703885, 703985] * |
X1_20_925025.ENSG00000188290 chr1 [ 926058, 926158] * |
X1_11_1041174.ENSG00000131591 chr1 [1041646, 1041746] * |
X1_14_1269014.ENSG00000107404 chr1 [1270239, 1270339] * |
peak feature start_position
<character> <character> <integer>
X1_93_556427.ENSG00000212875 X1_93_556427 ENSG00000212875 556318
X1_41_559455.ENSG00000212678 X1_41_559455 ENSG00000212678 559620
X1_12_703729.ENSG00000197049 X1_12_703729 ENSG00000197049 711184
X1_20_925025.ENSG00000188290 X1_20_925025 ENSG00000188290 924209
X1_11_1041174.ENSG00000131591 X1_11_1041174 ENSG00000131591 1007062
X1_14_1269014.ENSG00000107404 X1_14_1269014 ENSG00000107404 1260523
end_position feature_strand insideFeature
<integer> <character> <factor>
X1_93_556427.ENSG00000212875 557859 + inside
X1_41_559455.ENSG00000212678 560165 + inside
X1_12_703729.ENSG00000197049 712376 + upstream
X1_20_925025.ENSG00000188290 925333 - upstream
X1_11_1041174.ENSG00000131591 1041341 - upstream
X1_14_1269014.ENSG00000107404 1274623 - inside
distancetoFeature shortestDistance
<numeric> <integer>
X1_93_556427.ENSG00000212875 342 342
X1_41_559455.ENSG00000212678 154 154
X1_12_703729.ENSG00000197049 -7299 7199
X1_20_925025.ENSG00000188290 -725 725
X1_11_1041174.ENSG00000131591 -305 305
X1_14_1269014.ENSG00000107404 4384 4284
fromOverlappingOrNearest
<character>
X1_93_556427.ENSG00000212875 NearestLocation
X1_41_559455.ENSG00000212678 NearestLocation
X1_12_703729.ENSG00000197049 NearestLocation
X1_20_925025.ENSG00000188290 NearestLocation
X1_11_1041174.ENSG00000131591 NearestLocation
X1_14_1269014.ENSG00000107404 NearestLocation
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths
>
> ## example 3: you have a list of transcription factor biding sites from
> ## literature and are interested in determining the extent of the overlap
> ## to the list of peaks from your experiment. Prior calling the function
> ## annotatePeakInBatch, need to represent both dataset as RangedData
> ## where start is the start of the binding site, end is the end of the
> ## binding site, names is the name of the binding site, space and strand
> ## are the chromosome name and strand where the binding site is located.
>
> myexp <- GRanges(seqnames=c(6,6,6,6,5,4,4),
+ IRanges(start=c(1543200,1557200,1563000,1569800,
+ 167889600,100,1000),
+ end=c(1555199,1560599,1565199,1573799,
+ 167893599,200,1200),
+ names=c("p1","p2","p3","p4","p5","p6", "p7")),
+ strand="+")
> literature <- GRanges(seqnames=c(6,6,6,6,5,4,4),
+ IRanges(start=c(1549800,1554400,1565000,1569400,
+ 167888600,120,800),
+ end=c(1550599,1560799,1565399,1571199,
+ 167888999,140,1400),
+ names=c("f1","f2","f3","f4","f5","f6","f7")),
+ strand=rep(c("+", "-"), c(5, 2)))
> annotatedPeak1 <- annotatePeakInBatch(myexp,
+ AnnotationData=literature)
> pie(table(annotatedPeak1$insideFeature))
> annotatedPeak1
GRanges object with 7 ranges and 9 metadata columns:
seqnames ranges strand | peak feature
<Rle> <IRanges> <Rle> | <character> <character>
p1.f2 chr6 [ 1543200, 1555199] + | p1 f2
p2.f2 chr6 [ 1557200, 1560599] + | p2 f2
p3.f3 chr6 [ 1563000, 1565199] + | p3 f3
p4.f4 chr6 [ 1569800, 1573799] + | p4 f4
p5.f5 chr5 [167889600, 167893599] + | p5 f5
p6.f6 chr4 [ 100, 200] + | p6 f6
p7.f7 chr4 [ 1000, 1200] + | p7 f7
start_position end_position feature_strand insideFeature
<integer> <integer> <character> <factor>
p1.f2 1554400 1560799 + overlapStart
p2.f2 1554400 1560799 + inside
p3.f3 1565000 1565399 + overlapStart
p4.f4 1569400 1571199 + overlapEnd
p5.f5 167888600 167888999 + downstream
p6.f6 120 140 - includeFeature
p7.f7 800 1400 - inside
distancetoFeature shortestDistance fromOverlappingOrNearest
<numeric> <integer> <character>
p1.f2 -11200 799 NearestLocation
p2.f2 2800 200 NearestLocation
p3.f3 -2000 199 NearestLocation
p4.f4 400 400 NearestLocation
p5.f5 1000 601 NearestLocation
p6.f6 40 20 NearestLocation
p7.f7 400 200 NearestLocation
-------
seqinfo: 3 sequences from an unspecified genome; no seqlengths
> ### use toGRanges or rtracklayer::import to convert BED or GFF format
> ### to GRanges before calling annotatePeakInBatch
> test.bed <- data.frame(space=c("4", "6"),
+ start=c("100", "1000"),
+ end=c("200", "1100"),
+ name=c("peak1", "peak2"))
> test.GR = toGRanges(test.bed)
> annotatePeakInBatch(test.GR, AnnotationData = literature)
GRanges object with 2 ranges and 9 metadata columns:
seqnames ranges strand | peak feature
<Rle> <IRanges> <Rle> | <character> <character>
peak1.f6 chr4 [ 100, 200] * | peak1 f6
peak2.f1 chr6 [1000, 1100] * | peak2 f1
start_position end_position feature_strand insideFeature
<integer> <integer> <character> <factor>
peak1.f6 120 140 - includeFeature
peak2.f1 1549800 1550599 + upstream
distancetoFeature shortestDistance fromOverlappingOrNearest
<numeric> <integer> <character>
peak1.f6 40 20 NearestLocation
peak2.f1 -1548800 1548700 NearestLocation
-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
> #}
>
>
>
>
>
> dev.off()
null device
1
>