Last data update: 2014.03.03

R: Add a peakset to, or retrieve a peakset from, a DBA object
dba.peaksetR Documentation

Add a peakset to, or retrieve a peakset from, a DBA object

Description

Adds a peakset to, or retrieves a peakset from, a DBA object

Usage

dba.peakset(DBA=NULL, peaks, sampID, tissue, factor, condition, treatment, replicate,
            control, peak.caller, peak.format, reads=0, consensus=FALSE, 
            bamReads, bamControl,
            scoreCol, bLowerScoreBetter, filter, counts,
            bRemoveM=TRUE, bRemoveRandom=TRUE,
            minOverlap=2, bMerge=TRUE,
            bRetrieve=FALSE, writeFile, numCols=4,
            DataType=DBA$config$DataType)

Arguments

DBA

DBA object. Required unless creating a new DBA object by adding an initial peakset.

peaks

When adding a specified peakset: set of peaks, either a GRanges or RangedData object, or a peak dataframe or matrix (chr,start,end,score), or a filename where the peaks are stored.

When adding a consensus peakset: a sample mask or vector of peakset numbers to include in the consensus. If missing or NULL, a consensus is derived from all peaksets present in the model. See dba.mask, or dba.show to get peakset numbers.

When adding and empty peakset (zero peaks), set peaks=NA.

When adding a set of consensus peaksets: a sample mask or vector of peakset numbers. Sample sets will be derived only from subsets of these peaksets.

When adding all the peaks from one DBA object to another: a DBA object. In this case, the only other parameter to have an effect is minOverlap.

When retrieving and/or writing a peakset: either a GRanges or RangedData object, or a peak dataframe or matrix (chr,start,end,score), or a peakset number; if NULL, retrieves/writes the full binding matrix.

sampID

ID string for the peakset being added; if missing, one is assigned (a serial number for a new peakset, or a concatenation of IDs for a consensus peakset).

tissue

tissue name for the peakset being added; if missing, one is assigned for a consensus peakset (a concatenation of tissues).

factor

factor name for the peakset being added; if missing, one is assigned for a consensus peakset (a concatenation of factors).

condition

condition name for the peakset being added; if missing, one is assigned for a consensus peakset (a concatenation of conditions).

treatment

treatment name for the peakset being added; if missing, one is assigned for a consensus peakset (a concatenation of treatment).

replicate

replicate number for the peakset being added; if missing, one is assigned for a consensus peakset (a concatenation of replicate numbers).

control

control name for the peakset being added; if missing, one is assigned for a consensus peakset (a concatenation of control names).

peak.caller

peak caller name string. If peaks is specified as a file, and peak.format is missing, a default fie format for the caller will be used (see peak.format). Supported values:

  • “raw”: default peak.format: raw text file

  • “bed”: default peak.format: bed file

  • “narrow”: default peak.format: narrowPeaks file

  • “macs”: default peak.format: MACS .xls file

  • “bayes”: default peak.format: bayesPeak file

  • “tpic”: default peak.format: TPIC file

  • “sicer”: default peak.format: SICER file

  • “fp4”: default peak.format: FindPeaks v4 file

  • “swembl”: default peak.format: SWEMBL file

  • “csv”: default peak.format: comma separated value file

  • “report”: default peak.format: csv file saved via dba.report

When adding a consensus peakset, a default value (a concatenation of peak caller names) is assigned if this is missing.

peak.format

peak format string. If specified, overrides the default file format for the specified peak caller. Supported formats (with default score column):

  • “raw”: raw text file file; scoreCol=4

  • “bed”: bed file; scoreCol=5

  • “narrow”: narrowPeaks file; scoreCol=8

  • “macs”: MACS .xls file; scoreCol=7

  • “bayes”: bayesPeak file; scoreCol=4, filter=0.5

  • “tpic”: TPIC file; scoreCol=0 (all scores=1)

  • “sicer”: SICER file; scoreCol=7

  • “fp4”: FindPeaks v4 file; scoreCol=5

  • “swembl”: SWEMBL file; scoreCol=4

  • “csv”: csv file; scoreCol=4

  • “report”: report file; scoreCol=9, bLowerScoreBetter=T

reads

total number of ChIPed library reads for the peakset being added.

consensus

either the logical value of the consensus attribute when adding a specific peakset (set to TRUE for consensus peaksets generated by dba.peakset), or a metadata attribute or vector of attributes when generating a set of consensus peaksets. In the latter case, a consensus peakset will be added for each set of samples that have the same values for the specified attributes. Alternatively, attributes may be specified proceeded by a negative sign, in which case a consensus peakset will be added for each set of samples that differ only in their values for those attributes. See examples. Allowable attributes:

  • DBA_TISSUE; -DBA_TISSUE

  • DBA_FACTOR; -DBA_FACTOR

  • DBA_CONDITION; -DBA_CONDITION

  • DBA_TREATMENT; -DBA_TREATMENT

  • DBA_REPLICATE; -DBA_REPLICATE

  • DBA_CALLER; -DBA_CALLER

bamReads

file path of the BAM/BED file containing the aligned reads for the peakset being added.

bamControl

file path of the BAM/BED file containing the aligned reads for the control used for the peakset being added.

scoreCol

peak column to normalize to 0...1 scale when adding a peakset; 0 indicates no normalization

bLowerScoreBetter

Logical indicating that lower scores indicate higher confidence peaks; default is that higher scores indicate better peaks.

filter

Numeric indicating a filter value for peaks. If present, any peaks with a score less than this value (or higher if bLowerScoreBetter==TRUE) will be removed from the peakset.

counts

Used for adding externally computed peak counts. Can be a filename or a dataframe. Can consist of a single column (or vector) with the counts, or two columns, with an ID for each interval in the first column and the counts in the second column, or four columns (chr, start, end, counts). When counts is specified, peaks and related parameters are ignored, and all peaksets in the DBA object must be specified in this way, all with exactly the same number of intervals.

bRemoveM

logical indicating whether to remove peaks on chrM when adding a peakset

bRemoveRandom

logical indicating whether to remove peaks on chrN_random when adding a peakset

minOverlap

the minimum number of peaksets a peak must be in to be included when adding a consensus peakset. When retrieving, if the peaks parameter is a vector (logical mask or vector of peakset numbers), a binding matrix will be retrieved including all peaks in at least this many peaksets. If minOverlap is between zero and one, peak will be included from at least this proportion of peaksets.

bMerge

logical indicating whether global binding matrix should be compiled after adding the peakset. When adding several peaksets via successive calls to dba.peakset, it may be more efficient to set this parameter to FALSE and call dba(DBA) after all the peaksets have been added.

bRetrieve

logical indicating that a peakset is being retrieved and/or written, not added.

writeFile

file to write retrieved peakset.

numCols

number of columns to include when writing out peakset. First four columns are chr, start, end, score; the remainder are maintained from the original peakset. Ignored when writing out complete binding matrix.

DataType

The class of object for returned peaksets:

  • DBA_DATA_GRANGES

  • DBA_DATA_RANGEDDATA

  • DBA_DATA_FRAME

Can be set as default behavior by setting DBA$config$DataType.

Details

MODE: Add a specified peakset:

dba.peakset(DBA=NULL, peaks, sampID, tissue, factor, condition, replicate, control, peak.caller, reads, consensus, bamReads, bamControl, normCol, bRemoveM, bRemoveRandom)

MODE: Add a consensus peakset (derived from overlapping peaks in peaksets already present):

dba.peakset(DBA, peaks, minOverlap)

MODE: Add a sets of consensus peaksets bases on sample sets that share or differ in specified attributes

dba.peakset(DBA, peaks, consensus, minOverlap)

MODE: Retrieve a peakset:

dba.peakset(DBA, peaks, bRetrieve=T)

MODE: Write a peakset out to a file:

dba.peakset(DBA, peaks, bRetrieve=T, writeFile, numCols)

Value

DBA object when adding a peakset. Peakset matrix or RangedData object when retrieving and/or writing a peakset.

Author(s)

Rory Stark

See Also

to add peaksets using a sample sheet, see dba.

Examples


# create a new DBA object by adding three peaksets
mcf7 <- dba.peakset(NULL,
                   peaks=system.file("extra/peaks/MCF7_ER_1.bed.gz", package="DiffBind"),
                   peak.caller="bed", sampID="MCF7.1",tissue="MCF7",
                   factor="ER",condition="Responsive",replicate=1)
mcf7 <- dba.peakset(mcf7,
                   peaks=system.file("extra/peaks/MCF7_ER_2.bed.gz", package="DiffBind"),    
                   peak.caller="bed", sampID="MCF7.2",tissue="MCF7",
                   factor="ER",condition="Responsive",replicate=2)
mcf7 <- dba.peakset(mcf7,
                   peaks=system.file("extra/peaks/MCF7_ER_3.bed.gz", package="DiffBind"),      
                   peak.caller="bed", sampID="MCF7.3",tissue="MCF7",
                   factor="ER",condition="Responsive",replicate=3)
mcf7

#retrieve peaks that are in all three peaksets
mcf7.consensus <- dba.peakset(mcf7, 1:3, minOverlap=3, bRetrieve=TRUE)
mcf7.consensus

#add a consensus peakset -- peaks in all three replicates
mcf7 <- dba.peakset(mcf7, 1:3, minOverlap=3,sampID="MCF7_3of3")
mcf7

#add consensus peaksets for all sample types by combining replicates
data(tamoxifen_peaks)
tamoxifen <- dba.peakset(tamoxifen,consensus = -DBA_REPLICATE)
dba.show(tamoxifen,mask=tamoxifen$masks$Consensus)

#add consensus peaksets for all sample types by (same tissue and condition) 
data(tamoxifen_peaks)
tamoxifen <- dba.peakset(tamoxifen,consensus = c(DBA_TISSUE,DBA_CONDITION))
dba.show(tamoxifen,mask=tamoxifen$masks$Consensus)
dba.plotVenn(tamoxifen,tamoxifen$masks$Responsive & tamoxifen$masks$Consensus)

#create consensus peaksets from sample type consensuses for Responsive and Resistant sample groups
tamoxifen <- dba.peakset(tamoxifen,peaks=tamoxifen$masks$Consensus,consensus=DBA_CONDITION)
dba.show(tamoxifen,mask=tamoxifen$masks$Consensus)
dba.plotVenn(tamoxifen,17:18)
 
#retrieve the consensus peakset as RangedData object
mcf7.consensus <- dba.peakset(mcf7,mcf7$masks$Consensus,bRetrieve=TRUE)
mcf7.consensus

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(DiffBind)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/DiffBind/dba.peakset.Rd_%03d_medium.png", width=480, height=480)
> ### Name: dba.peakset
> ### Title: Add a peakset to, or retrieve a peakset from, a DBA object
> ### Aliases: dba.peakset
> 
> ### ** Examples
> 
> 
> # create a new DBA object by adding three peaksets
> mcf7 <- dba.peakset(NULL,
+                    peaks=system.file("extra/peaks/MCF7_ER_1.bed.gz", package="DiffBind"),
+                    peak.caller="bed", sampID="MCF7.1",tissue="MCF7",
+                    factor="ER",condition="Responsive",replicate=1)
> mcf7 <- dba.peakset(mcf7,
+                    peaks=system.file("extra/peaks/MCF7_ER_2.bed.gz", package="DiffBind"),    
+                    peak.caller="bed", sampID="MCF7.2",tissue="MCF7",
+                    factor="ER",condition="Responsive",replicate=2)
> mcf7 <- dba.peakset(mcf7,
+                    peaks=system.file("extra/peaks/MCF7_ER_3.bed.gz", package="DiffBind"),      
+                    peak.caller="bed", sampID="MCF7.3",tissue="MCF7",
+                    factor="ER",condition="Responsive",replicate=3)
> mcf7
3 Samples, 1215 sites in matrix (1780 total):
      ID Tissue Factor  Condition Replicate Caller Intervals
1 MCF7.1   MCF7     ER Responsive         1    bed      1556
2 MCF7.2   MCF7     ER Responsive         2    bed      1046
3 MCF7.3   MCF7     ER Responsive         3    bed      1339
> 
> #retrieve peaks that are in all three peaksets
> mcf7.consensus <- dba.peakset(mcf7, 1:3, minOverlap=3, bRetrieve=TRUE)
> mcf7.consensus
GRanges object with 885 ranges and 3 metadata columns:
      seqnames               ranges strand |             MCF7.1
         <Rle>            <IRanges>  <Rle> |          <numeric>
    1    chr18     [111564, 112005]      * | 0.0573241900280181
    2    chr18     [346464, 347342]      * |  0.108360551694246
    3    chr18     [399098, 399919]      * |  0.288184393430442
    4    chr18     [499098, 500193]      * |  0.959737465364313
    5    chr18     [503660, 504550]      * |  0.236083032770391
  ...      ...                  ...    ... .                ...
  881    chr18 [77438923, 77440019]      * |  0.607061810188697
  882    chr18 [77541035, 77541645]      * | 0.0949923375799136
  883    chr18 [77694196, 77695138]      * |  0.506886890295816
  884    chr18 [77968125, 77968822]      * |  0.155703472082476
  885    chr18 [77987486, 77988208]      * |  0.197907153140044
                  MCF7.2             MCF7.3
               <numeric>          <numeric>
    1 0.0268569057628165 0.0554273660247567
    2 0.0279247492015297 0.0405869362152264
    3  0.155123557848488   0.22176513890865
    4  0.496817376257209  0.802688474695799
    5  0.101589864493241   0.20406054944851
  ...                ...                ...
  881  0.064588446062649   0.28330126876705
  882 0.0292466879163222 0.0593926496879118
  883  0.107488734573361  0.201369600316728
  884 0.0853438488036615  0.102488045381156
  885 0.0667144837523761 0.0820399250245897
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> 
> #add a consensus peakset -- peaks in all three replicates
> mcf7 <- dba.peakset(mcf7, 1:3, minOverlap=3,sampID="MCF7_3of3")
> mcf7
4 Samples, 1215 sites in matrix (1780 total):
         ID Tissue Factor  Condition Replicate Caller Intervals
1    MCF7.1   MCF7     ER Responsive         1    bed      1556
2    MCF7.2   MCF7     ER Responsive         2    bed      1046
3    MCF7.3   MCF7     ER Responsive         3    bed      1339
4 MCF7_3of3   MCF7     ER Responsive     1-2-3    bed       885
> 
> #add consensus peaksets for all sample types by combining replicates
> data(tamoxifen_peaks)
> tamoxifen <- dba.peakset(tamoxifen,consensus = -DBA_REPLICATE)
Add consensus: BT474 Resistant
Add consensus: MCF7 Responsive
Add consensus: T47D Responsive
Add consensus: MCF7 Resistant
Add consensus: ZR75 Responsive
> dba.show(tamoxifen,mask=tamoxifen$masks$Consensus)
                ID Tissue Factor  Condition  Treatment Replicate Caller
12 BT474:Resistant  BT474     ER  Resistant Full-Media       1-2    bed
13 MCF7:Responsive   MCF7     ER Responsive Full-Media     1-2-3    bed
14 T47D:Responsive   T47D     ER Responsive Full-Media       1-2    bed
15  MCF7:Resistant   MCF7     ER  Resistant Full-Media       1-2    bed
16 ZR75:Responsive   ZR75     ER Responsive Full-Media       1-2    bed
   Intervals
12       896
13      1215
14       318
15       879
16      1933
> 
> #add consensus peaksets for all sample types by (same tissue and condition) 
> data(tamoxifen_peaks)
> tamoxifen <- dba.peakset(tamoxifen,consensus = c(DBA_TISSUE,DBA_CONDITION))
Add consensus: BT474 Resistant
Add consensus: MCF7 Responsive
Add consensus: T47D Responsive
Add consensus: MCF7 Resistant
Add consensus: ZR75 Responsive
> dba.show(tamoxifen,mask=tamoxifen$masks$Consensus)
                ID Tissue Factor  Condition  Treatment Replicate Caller
12 BT474:Resistant  BT474     ER  Resistant Full-Media       1-2    bed
13 MCF7:Responsive   MCF7     ER Responsive Full-Media     1-2-3    bed
14 T47D:Responsive   T47D     ER Responsive Full-Media       1-2    bed
15  MCF7:Resistant   MCF7     ER  Resistant Full-Media       1-2    bed
16 ZR75:Responsive   ZR75     ER Responsive Full-Media       1-2    bed
   Intervals
12       896
13      1215
14       318
15       879
16      1933
> dba.plotVenn(tamoxifen,tamoxifen$masks$Responsive & tamoxifen$masks$Consensus)
> 
> #create consensus peaksets from sample type consensuses for Responsive and Resistant sample groups
> tamoxifen <- dba.peakset(tamoxifen,peaks=tamoxifen$masks$Consensus,consensus=DBA_CONDITION)
Add consensus: Resistant
Add consensus: Responsive
> dba.show(tamoxifen,mask=tamoxifen$masks$Consensus)
                ID         Tissue Factor  Condition  Treatment Replicate Caller
12 BT474:Resistant          BT474     ER  Resistant Full-Media       1-2    bed
13 MCF7:Responsive           MCF7     ER Responsive Full-Media     1-2-3    bed
14 T47D:Responsive           T47D     ER Responsive Full-Media       1-2    bed
15  MCF7:Resistant           MCF7     ER  Resistant Full-Media       1-2    bed
16 ZR75:Responsive           ZR75     ER Responsive Full-Media       1-2    bed
17       Resistant     BT474-MCF7     ER  Resistant Full-Media       1-2    bed
18      Responsive MCF7-T47D-ZR75     ER Responsive Full-Media 1-2-3-1-2    bed
   Intervals
12       896
13      1215
14       318
15       879
16      1933
17       456
18       792
> dba.plotVenn(tamoxifen,17:18)
>  
> #retrieve the consensus peakset as RangedData object
> mcf7.consensus <- dba.peakset(mcf7,mcf7$masks$Consensus,bRetrieve=TRUE)
> mcf7.consensus
GRanges object with 885 ranges and 1 metadata column:
      seqnames               ranges strand |          MCF7_3of3
         <Rle>            <IRanges>  <Rle> |          <numeric>
    1    chr18     [111564, 112005]      * | 0.0465361539385305
    2    chr18     [346464, 347342]      * | 0.0589574123703341
    3    chr18     [399098, 399919]      * |  0.221691030062527
    4    chr18     [499098, 500193]      * |  0.753081105439107
    5    chr18     [503660, 504550]      * |  0.180577815570714
  ...      ...                  ...    ... .                ...
  881    chr18 [77438923, 77440019]      * |  0.318317175006132
  882    chr18 [77541035, 77541645]      * | 0.0612105583947159
  883    chr18 [77694196, 77695138]      * |  0.271915075061968
  884    chr18 [77968125, 77968822]      * |  0.114511788755764
  885    chr18 [77987486, 77988208]      * |  0.115553853972337
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>