Last data update: 2014.03.03

R: Create Random Regions
createRandomRegions R Documentation

Create Random Regions

Description

Creates a set of random regions with a given mean size and standard deviation.

Usage

createRandomRegions(nregions=100, length.mean=250, length.sd=20, genome="hg19", mask=NULL, non.overlapping=TRUE)

Arguments

nregions

The number of regions to be created.

length.mean

The mean size of the regions created. This is not guaranteed to be the mean of the final region set. See note.

length.sd

The standard deviation of the region size. This is not guaranteed to be the standard deviation of the final region set. See note.

genome

The reference genome to use, given as a valid genome object: either a GenomicRanges or data.frame containing one region per whole chromosome, or a character string uniquely identifying a genome in BSgenome (e.g. "hg19" or "mm10", but not "hg"). Internally it uses getGenomeAndMask.

mask

The set of regions specifying where a random region cannot be placed (centromeres, repetitive regions, unmappable regions...), given as a region set in any of the accepted formats (GenomicRanges, data.frame, ...). NULL will try to derive a mask from the genome (this currently works only if the genome is a character string), and NA explicitly gives an empty mask.

non.overlapping

A boolean stating whether the random regions can overlap (FALSE) or not (TRUE).

Details

A set of nregions regions will be created and randomly placed over the genome. The region lengths follow a normal distribution with mean length.mean and standard deviation length.sd. The new regions can be made explicitly non-overlapping by setting non.overlapping to TRUE. A mask can be provided so that no region falls in a forbidden part of the genome.
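The placement described above can be pictured with a small base R sketch. This is an illustration only, not regioneR's implementation: the helper names overlapsAny and placeRandomRegions, the column names chr/start/end, the rejection-sampling strategy and the length-weighted choice of chromosome are all assumptions made for this example.

```r
# Toy sketch of masked, non-overlapping random placement (NOT regioneR's code).
set.seed(1)  # arbitrary seed, only for reproducibility

genome <- data.frame(chr = c("chr1", "chr2"), start = c(1, 1),
                     end = c(180000000, 20000000), stringsAsFactors = FALSE)
mask <- data.frame(chr = "chr1", start = c(20000000, 100000000),
                   end = c(22000000, 130000000), stringsAsFactors = FALSE)

# TRUE if [start, end] on chr overlaps any interval in the data.frame regions
overlapsAny <- function(chr, start, end, regions) {
  any(regions$chr == chr & regions$start <= end & regions$end >= start)
}

# Hypothetical helper: draw lengths, then place each region by rejection sampling
placeRandomRegions <- function(nregions, length.mean, length.sd, genome, mask) {
  # normally distributed lengths; anything below 1 is raised to 1
  lengths <- pmax(round(rnorm(nregions, length.mean, length.sd)), 1)
  placed <- data.frame(chr = character(0), start = numeric(0),
                       end = numeric(0), stringsAsFactors = FALSE)
  for (len in lengths) {
    repeat {
      # pick a chromosome with probability proportional to its length
      i <- sample(nrow(genome), 1, prob = genome$end - genome$start + 1)
      # uniform start such that the region fits inside the chromosome
      n.starts <- genome$end[i] - genome$start[i] - len + 2
      start <- genome$start[i] + floor(runif(1) * n.starts)
      end <- start + len - 1
      # accept only if it avoids the mask and every region placed so far
      if (!overlapsAny(genome$chr[i], start, end, mask) &&
          !overlapsAny(genome$chr[i], start, end, placed)) break
    }
    placed <- rbind(placed,
                    data.frame(chr = genome$chr[i], start = start, end = end,
                               stringsAsFactors = FALSE))
  }
  placed
}

regions <- placeRandomRegions(nregions = 10, length.mean = 250, length.sd = 20,
                              genome = genome, mask = mask)
print(regions)
```

Rejection sampling is simple, but it slows down as the mask or the already placed regions cover more of the genome. The sketch also assumes every region is shorter than the chromosome it is placed on.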

Value

It returns a GenomicRanges object with the regions resulting from the randomization process.

Note

If the standard deviation of the length is large with respect to the mean, negative lengths might be created. These region lengths will be transformed into 1, so for large standard deviations the mean and sd of the lengths are not guaranteed to match the ones given in the parameters.
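The effect can be reproduced with base R alone. The following is a minimal sketch, assuming that any length below 1 is replaced by 1. The values 100 and 200 and the seed are arbitrary choices for illustration, and the code does not call the package.

```r
# Illustration only: mimic the length clamping described in the note.
set.seed(42)              # arbitrary seed, only for reproducibility
length.mean <- 100
length.sd <- 200          # deliberately large with respect to the mean

# lengths as drawn from the normal distribution (many are negative)
raw.lengths <- round(rnorm(10000, mean = length.mean, sd = length.sd))

# assumption: every length below 1 is replaced by 1
clamped.lengths <- pmax(raw.lengths, 1)

# compare requested and realized statistics
print(c(mean = mean(raw.lengths), sd = sd(raw.lengths)))
print(c(mean = mean(clamped.lengths), sd = sd(clamped.lengths)))
```

Clamping raises the mean and shrinks the standard deviation relative to the requested values. The gap narrows as length.sd becomes small compared with length.mean.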

See Also

getGenome, getMask, getGenomeAndMask, characterToBSGenome, maskFromBSGenome, randomizeRegions, resampleRegions

Examples

genome <- data.frame(c("chr1", "chr2"), c(1, 1), c(180000000, 20000000))
mask <- data.frame("chr1", c(20000000, 100000000), c(22000000, 130000000))

createRandomRegions(nregions=10, length.mean=1000, length.sd=500)

createRandomRegions(nregions=10, genome=genome, mask=mask, non.overlapping=TRUE)
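Two further calls, sketched here using only the arguments documented above, show the explicit empty mask (mask=NA) and overlapping regions (non.overlapping=FALSE). The requireNamespace guard and the variable names no.mask and may.overlap are additions for this illustration. The snippet does nothing if regioneR is not installed.

```r
# Guarded so the snippet is a no-op when regioneR is not installed.
if (requireNamespace("regioneR", quietly = TRUE)) {
  library(regioneR)

  genome <- data.frame(c("chr1", "chr2"), c(1, 1), c(180000000, 20000000))

  # mask=NA: explicitly use an empty mask, so no part of genome is forbidden
  no.mask <- createRandomRegions(nregions=10, genome=genome, mask=NA)

  # non.overlapping=FALSE: the random regions are allowed to overlap each other
  may.overlap <- createRandomRegions(nregions=10, genome=genome, mask=NA,
                                     non.overlapping=FALSE)

  # realized lengths; their mean and sd need not equal length.mean and length.sd
  print(mean(BiocGenerics::width(no.mask)))
  print(sd(BiocGenerics::width(no.mask)))
}
```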

Results


> library(regioneR)
> ### Name: createRandomRegions
> ### Title: Create Random Regions
> ### Aliases: createRandomRegions
> 
> ### ** Examples
> 
> genome <- data.frame(c("chr1", "chr2"), c(1, 1), c(180000000, 20000000))
> mask <- data.frame("chr1", c(20000000, 100000000), c(22000000, 130000000))
> 
> createRandomRegions(nregions=10, length.mean=1000, length.sd=500)
GRanges object with 10 ranges and 0 metadata columns:
       seqnames                 ranges strand
          <Rle>              <IRanges>  <Rle>
   [1]     chr7 [   837537,    838807]      *
   [2]     chr2 [ 69639144,  69640119]      *
   [3]     chr2 [ 63592733,  63593656]      *
   [4]     chr3 [129906728, 129907157]      *
   [5]    chr12 [ 28593929,  28594308]      *
   [6]    chr18 [ 29781622,  29782345]      *
   [7]    chr17 [ 70016925,  70018005]      *
   [8]     chr2 [172054155, 172054593]      *
   [9]     chr2 [ 89840485,  89841059]      *
  [10]     chr6 [ 70918512,  70919166]      *
  -------
  seqinfo: 93 sequences from an unspecified genome; no seqlengths
> 
> createRandomRegions(nregions=10, genome=genome, mask=mask, non.overlapping=TRUE)
GRanges object with 10 ranges and 0 metadata columns:
       seqnames                 ranges strand
          <Rle>              <IRanges>  <Rle>
   [1]     chr1 [ 16594114,  16594380]      *
   [2]     chr1 [ 39462344,  39462619]      *
   [3]     chr1 [ 13944311,  13944548]      *
   [4]     chr1 [146728255, 146728504]      *
   [5]     chr1 [   687668,    687927]      *
   [6]     chr1 [ 40808560,  40808770]      *
   [7]     chr1 [151201103, 151201348]      *
   [8]     chr1 [ 56279462,  56279711]      *
   [9]     chr1 [153853442, 153853677]      *
  [10]     chr1 [ 35309412,  35309682]      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
> 