Last data update: 2014.03.03

boundingIndices    R Documentation

Find indices of features bounding a set of chromosome ranges/genes

Description

This function is similar to findOverlaps, but it guarantees that at least two features will be covered. This is useful when finding the features that correspond to a set of genes: some genes fall entirely between two features and so would return no ranges with findOverlaps. Specifically, this function finds the indices of the features (first and last) bounding the ends of a range/gene (start and stop) such that first <= start < stop <= last. Equality is necessary so that repeated conversions between indices and genomic positions do not expand the range with each conversion. Ranges/genes that lie outside the range of feature positions are given the index of the first or last feature, as appropriate, rather than 0 or n + 1, so that genes can always be connected to some data.
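
The bounding rule described above can be illustrated with base R's findInterval. This is only a sketch under stated assumptions, not the genoset implementation: bounding_sketch is a hypothetical helper, it assumes sorted, NA-free positions, and the way it widens a clamped query to two features is an assumption chosen to agree with the example output on this page.

```r
# Hypothetical helper, not part of genoset: a base-R sketch of the rule
# first <= start < stop <= last, with out-of-range queries clamped.
# Requires R >= 3.3.0 for the left.open argument of findInterval.
bounding_sketch <- function(starts, stops, positions) {
  n <- length(positions)
  # Largest index whose position is <= start (0 if start precedes all features).
  left <- findInterval(starts, positions)
  # Smallest index whose position is >= stop (n + 1 if stop follows all features).
  right <- findInterval(stops, positions, left.open = TRUE) + 1L
  # Clamp to the first or last feature instead of returning 0 or n + 1.
  left <- pmin(pmax(left, 1L), n)
  right <- pmin(pmax(right, 1L), n)
  # Assumption: a query clamped onto a single feature is widened by one
  # neighbouring feature so that two features are always covered.
  same <- left == right
  grow_right <- same & right < n
  grow_left <- same & right == n & left > 1L
  right[grow_right] <- right[grow_right] + 1L
  left[grow_left] <- left[grow_left] - 1L
  cbind(left = left, right = right)
}

# A gene that falls entirely between two features still gets both neighbours:
# left = 1, right = 2.
bounding_sketch(starts = 12, stops = 18, positions = c(10, 20, 30))
```

With starts = seq(10, 100, 10), stops = starts + 5 and positions = 1:100, the sketch gives the same left and right values as the Results section, including 99 and 100 for the last query, whose stop lies beyond the final feature.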

Usage

boundingIndices(starts, stops, positions, all.indices = FALSE)

Arguments

starts

integer vector of first base position of each query range

stops

integer vector of last base position of each query range

positions

Base positions in which to search

all.indices

logical; if TRUE, return a list containing the full sequence of indices for each query

Details

This function uses some tricks from findInterval: for k queries and n features it is O(k * log(n)) in general and ~O(k) for sorted queries. It will therefore be dramatically faster for sets of query genes that are sorted by start position within each chromosome. The index of the stop position for each gene is found using the left bound from the start of the gene, which reduces the search space for the stop position somewhat. boundingIndices does not check for NAs or unsorted data in the subject positions. These assumptions are safe for position info coming from a GenoSet or GRanges.
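
Because the positions are not checked and sorted queries are faster, a caller working with plain vectors might validate the positions and sort the queries first. The following is a minimal sketch of that idea, not part of genoset: check_positions is a hypothetical helper, the data are made up, and the call to boundingIndices is only attempted if genoset is installed.

```r
# Hypothetical guard, not part of genoset: boundingIndices itself does not
# check the subject positions for NAs or unsorted data.
check_positions <- function(positions) {
  if (anyNA(positions)) stop("positions contains NA")
  if (is.unsorted(positions)) stop("positions is not sorted")
  invisible(TRUE)
}

# Made-up queries, deliberately out of order.
starts <- c(50, 10, 30)
stops <- starts + 5
positions <- 1:100

check_positions(positions)

# Sort the queries by start position, which the Details describe as the fast case.
ord <- order(starts)

if (requireNamespace("genoset", quietly = TRUE)) {
  idx <- genoset::boundingIndices(starts[ord], stops[ord], positions)
  # Put the rows back in the caller's original query order.
  idx <- idx[order(ord), , drop = FALSE]
}
```

For queries spanning several chromosomes, the same sorting would be applied within each chromosome, as the text above describes.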

Value

integer matrix of 2 columns giving the start and stop index of each range in the data, or a list of full sequences of indices for each query (see the all.indices argument)
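
As a rough picture of how the two return shapes relate, a two-column matrix of bounds can be expanded into per-query index sequences with base R. This is a sketch, not genoset's code: the bounds matrix is made-up data, and the exact structure of the list returned with all.indices = TRUE (for example, whether it is named) is not shown on this page.

```r
# Made-up bounds in the shape of the matrix result: one row per query.
bounds <- cbind(left = c(10L, 20L), right = c(12L, 21L))

# Expand each (left, right) pair into the full run of feature indices.
full <- Map(seq.int, bounds[, "left"], bounds[, "right"])

full[[1]]  # 10 11 12
full[[2]]  # 20 21
```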

See Also

Other "range summaries": boundingIndicesByChr, rangeSampleMeans

Examples

  starts = seq(10,100,10)
  boundingIndices( starts=starts, stops=starts+5, positions = 1:100 )

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)

> library(genoset)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: GenomicRanges
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


*** Genoset API Changes ***
The genoset class has transitioned to the
RangedSummarizedExperiment API from the eSet API (e.g. use colnames instead of sampleNames). ***

Attaching package: 'genoset'

The following object is masked from 'package:GenomicRanges':

    pos

The following objects are masked from 'package:S4Vectors':

    colMeans, colSums, rowMeans, rowSums

The following objects are masked from 'package:base':

    colMeans, colSums, rowMeans, rowSums

Warning messages:
1: multiple methods tables found for 'colMeans' 
2: multiple methods tables found for 'colSums' 
3: multiple methods tables found for 'rowMeans' 
4: multiple methods tables found for 'rowSums' 
> ### Name: boundingIndices
> ### Title: Find indices of features bounding a set of chromosome
> ###   ranges/genes
> ### Aliases: boundingIndices
> 
> ### ** Examples
> 
>   starts = seq(10,100,10)
>   boundingIndices( starts=starts, stops=starts+5, positions = 1:100 )
      left right
 [1,]   10    15
 [2,]   20    25
 [3,]   30    35
 [4,]   40    45
 [5,]   50    55
 [6,]   60    65
 [7,]   70    75
 [8,]   80    85
 [9,]   90    95
[10,]   99   100