Last data update: 2014.03.03

matchProbeToPromoter    R Documentation

A function assigning promoter regions to given probe IDs.

Description

This function returns a GRangesList object assigning promoter regions to probes. Both the assignment of transcripts to probes and the transcriptional start sites (TSSs) must be given as arguments.

Usage

matchProbeToPromoter(probeToTranscript, transcriptToTSS, promWidth = 4000, mode = "union", fix = "center")

Arguments

probeToTranscript

A list with character vectors as elements. The elements' names are probe IDs and the character vectors store the transcript IDs assigned to that probe.
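
As a minimal sketch, assuming the probe annotation is available as a two-column table, a list of this shape can be built with base R's split(). The data frame probeAnnot and its column names are hypothetical and not part of the package.

```r
# Hypothetical annotation table: one row per probe/transcript pair.
probeAnnot <- data.frame(
    probe = c("101", "102", "102"),
    transcript = c("ENST00011", "ENST00021", "ENST00022"),
    stringsAsFactors = FALSE)

# split() groups the transcript IDs by probe ID and returns a named list
# of character vectors, one element per probe.
probeToTranscript <- split(probeAnnot$transcript, probeAnnot$probe)

str(probeToTranscript)
```

Probes without any transcript do not appear in such a table; the example on this page represents them with an NA element instead.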

transcriptToTSS

A data.frame with four columns:

  1. Transcript ID as given in the argument probeToTranscript

  2. Chromosome

  3. Transcriptional start site in base pairs

  4. Strand

promWidth

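Width of the promoter regions in base pairs. The position of the region relative to the transcriptional start site is set by the argument fix; with fix = "end", the promoter covers the promWidth base pairs upstream of the transcriptional start site. (default 4000bp)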

mode

How probes with multiple transcripts should be handled. Must be either "union", "keepAll" or "dropMultiple". (default "union")

fix

Denotes what to use as the anchor when defining the promoter region. Must be either "center", "start" or "end". "center" means that the TSS is in the middle of the promoter, whereas "end" means that the promoter is placed upstream of the TSS. (default "center")
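
The arithmetic for fix = "center" can be illustrated in base R. This is only a sketch that reproduces the coordinates printed for the example on this page; the helper name centeredPromoter is hypothetical, and the package's internal implementation may differ.

```r
# Hypothetical helper (not part of epigenomix): closed interval of width
# promWidth with the TSS in the middle.
centeredPromoter <- function(tss, promWidth = 4000) {
    start <- tss - promWidth %/% 2      # half of the width lies before the TSS
    end <- start + promWidth - 1        # closed interval, so subtract 1
    data.frame(start = start, end = end)
}

# TSSs of the three transcripts used in the Examples section.
centeredPromoter(c(100000, 200000, 201000))
```

For these three TSSs the sketch yields 98000 to 101999, 198000 to 201999 and 199000 to 202999, the ranges printed for mode = "keepAll" in the example output.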

Details

More than one transcript can be assigned to one probe in the given probeToTranscript argument. The argument mode selects how such cases are handled. "union": The union of all promoters is calculated and assigned to the probe. "keepAll": All promoters of all transcripts are assigned to the probe. If some transcripts have identical TSSs, the same promoter region occurs several times. "dropMultiple": All probes that have more than one transcript with different TSSs are removed.
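
The three modes can be sketched in base R for a single probe with two transcripts on the same chromosome and strand. The helper mergeIntervals is hypothetical and only mimics the idea of a union of ranges; that directly adjacent intervals are merged as well is an assumption of this sketch, not a statement about the package.

```r
# Hypothetical helper (not part of epigenomix): merge overlapping or
# directly adjacent closed intervals given as parallel start/end vectors.
mergeIntervals <- function(start, end) {
    ord <- order(start, end)
    start <- start[ord]
    end <- end[ord]
    outStart <- start[1]
    outEnd <- end[1]
    for (i in seq_along(start)[-1]) {
        k <- length(outEnd)
        if (start[i] <= outEnd[k] + 1) {
            # Overlapping or adjacent: extend the current interval.
            outEnd[k] <- max(outEnd[k], end[i])
        } else {
            # Gap: open a new interval.
            outStart <- c(outStart, start[i])
            outEnd <- c(outEnd, end[i])
        }
    }
    data.frame(start = outStart, end = outEnd)
}

# Probe "102" from the Examples section: two transcripts with different
# TSSs, centered promoters of width 4000.
tss <- c(200000, 201000)
promStart <- tss - 2000
promEnd <- tss + 1999

keepAll <- data.frame(start = promStart, end = promEnd)  # both promoters kept
unionMode <- mergeIntervals(promStart, promEnd)          # one merged range
dropProbe <- length(unique(tss)) > 1                     # TRUE: probe is removed
```

Here unionMode holds the single range 198000 to 202999 and keepAll holds both individual ranges, in line with the example output on this page. By the description above, "dropMultiple" would remove this probe because its two transcripts have different TSSs.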

The argument transcriptToTSS must have at least 4 columns giving the information described above. The columns are identified by their position, not by their names.
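
Because the columns are matched by position, an annotation table with a different layout has to be reordered first. A minimal sketch in base R, assuming a hypothetical data frame annot with arbitrary column names:

```r
# Hypothetical annotation with the columns in a different order and with
# arbitrary names.
annot <- data.frame(
    strand = c("-", "+", "+"),
    tx_id = c("ENST00011", "ENST00021", "ENST00022"),
    tss_pos = c(100000, 200000, 201000),
    chrom = c("1", "1", "1"),
    stringsAsFactors = FALSE)

# Bring the columns into the expected order:
# transcript ID, chromosome, TSS, strand.
transcriptToTSS <- annot[, c("tx_id", "chrom", "tss_pos", "strand")]
```

The column names can stay as they are; only the order matters.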

Value

An object of class GRangesList with one element for each probe. If mode is not set to "dropMultiple", a GRanges element may consist of more than one range. The names of the list's elements are the probe IDs and, additionally, each GRanges has a metadata column "probe" giving the corresponding probe ID.

Author(s)

Hans-Ulrich Klein (h.klein@uni-muenster.de)

See Also

summarizeReads

Examples

probeToTrans <- list("101"="ENST00011", 
                     "102"=c("ENST00021", "ENST00022"),
                     "103"=NA)
transToTSS <- data.frame(
    transID=c("ENST00011", "ENST00021", "ENST00022"),
    chr=c("1", "1", "1"),
    tss=c(100000, 200000, 201000),
    strand=c("-", "+", "+"))

matchProbeToPromoter(probeToTrans, transToTSS,
    promWidth=4000, mode="union")
matchProbeToPromoter(probeToTrans, transToTSS,
    promWidth=4000, mode="keepAll")

Results


> library(epigenomix)
> ### Name: matchProbeToPromoter
> ### Title: A function assigning promoter regions to given probe IDs.
> ### Aliases: matchProbeToPromoter
> ###   matchProbeToPromoter,list,data.frame-method
> 
> ### ** Examples
> 
> probeToTrans <- list("101"="ENST00011", 
+                      "102"=c("ENST00021", "ENST00022"),
+                      "103"=NA)
> transToTSS <- data.frame(
+     transID=c("ENST00011", "ENST00021", "ENST00022"),
+     chr=c("1", "1", "1"),
+     tss=c(100000, 200000, 201000),
+     strand=c("-", "+", "+"))
> 
> matchProbeToPromoter(probeToTrans, transToTSS,
+     promWidth=4000, mode="union")
GRangesList object of length 2:
$101 
GRanges object with 1 range and 1 metadata column:
      seqnames          ranges strand |       probe
         <Rle>       <IRanges>  <Rle> | <character>
  [1]        1 [98000, 101999]      - |         101

$102 
GRanges object with 1 range and 1 metadata column:
      seqnames           ranges strand | probe
  [1]        1 [198000, 202999]      + |   102

-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
> matchProbeToPromoter(probeToTrans, transToTSS,
+     promWidth=4000, mode="keepAll")
GRangesList object of length 2:
$101 
GRanges object with 1 range and 1 metadata column:
      seqnames          ranges strand |       probe
         <Rle>       <IRanges>  <Rle> | <character>
  [1]        1 [98000, 101999]      - |         101

$102 
GRanges object with 2 ranges and 1 metadata column:
      seqnames           ranges strand | probe
  [1]        1 [198000, 201999]      + |   102
  [2]        1 [199000, 202999]      + |   102

-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths