Last data update: 2014.03.03

R: predict the cleavage and polyadenylation (CP) site

CPsites

R Documentation

predict the cleavage and polyadenylation (CP) site

Description

Predict alternative cleavage and polyadenylation (CP or APA) sites.

Usage

    CPsites(coverage, groupList=NULL, genome, utr3, 
        window_size=100, search_point_START=50, search_point_END=NA, 
        cutStart=window_size, cutEnd=0, adjust_distal_polyA_end=TRUE, 
        coverage_threshold=5, long_coverage_threshold=2, 
        background=c("same_as_long_coverage_threshold", 
                     "1K", "5K", "10K", "50K"),
        txdb=NA,
        gcCompensation=NA, mappabilityCompensation=NA, 
        FFT=FALSE, fft.sm.power=20, 
        PolyA_PWM=NA, classifier=NA, classifier_cutoff=.8, shift_range=window_size,
        BPPARAM=NULL, tmpfolder=NULL, silence=TRUE)
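
Several defaults in the signature refer to other arguments (`cutStart=window_size`, `shift_range=window_size`), and `background` lists its permitted values as a character vector. The sketch below uses a hypothetical stand-in function, `cp_defaults`, to show how R resolves such defaults. It assumes the vector default is resolved with `match.arg`, the usual R convention, which this page does not confirm for `CPsites` itself.

```r
# Hypothetical stand-in mirroring a few CPsites formals; not InPAS code.
cp_defaults <- function(window_size = 100,
                        cutStart = window_size,
                        shift_range = window_size,
                        background = c("same_as_long_coverage_threshold",
                                       "1K", "5K", "10K", "50K")) {
    # match.arg() picks the first choice when the argument is not supplied
    # and rejects values outside the listed choices.
    background <- match.arg(background)
    # Defaults are evaluated lazily, so cutStart and shift_range follow
    # whatever window_size the caller passes.
    list(window_size = window_size, cutStart = cutStart,
         shift_range = shift_range, background = background)
}

default_call <- cp_defaults()
custom_call  <- cp_defaults(window_size = 200, background = "10K")
```

Under this assumption, changing `window_size` alone also changes `cutStart` and `shift_range` unless they are set explicitly.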

Arguments

`coverage`	coverage for each sample, output of coverageFromBedGraph
`groupList`	group list of tag names
`genome`	an object of BSgenome
`utr3`	output of utr3Annotation
`window_size`	window size for novel distal position searching and adjusted polyA searching. Default: 100
`search_point_START`	start point for searching
`search_point_END`	end point for searching
`cutStart`	number of nucleotides to remove from the start before searching. A value of 0.1 means 10 percent; a value of 25 means the first 25 nucleotides are cut.
`cutEnd`	number of nucleotides to remove from the end before searching. A value of 0.1 means 10 percent.
`adjust_distal_polyA_end`	if TRUE, adjust the distal polyA end using cleanUpdTSeq
`coverage_threshold`	coverage cutoff for the first 100 nucleotides. If the coverage of the first 100 nucleotides is lower than coverage_threshold, the transcript is dropped.
`long_coverage_threshold`	coverage cutoff for the region of the long form. If the coverage in the region of the long form is less than long_coverage_threshold, the transcript is dropped.
`background`	the range used to calculate the cutoff threshold for local background
`txdb`	an object of TxDb
`gcCompensation`	GC content compensation vector
`mappabilityCompensation`	mappability compensation vector
`FFT`	whether to smooth the data with the Fast Fourier Transform algorithm. Default: FALSE
`fft.sm.power`	used only if FFT is TRUE: the frequency to be removed
`PolyA_PWM`	Position Weight Matrix of polyA
`classifier`	An object of class `"PASclassifier"`
`classifier_cutoff`	the cutoff used to decide whether a putative pA site is true or false. It can be any floating-point number between 0 and 1. For example, classifier_cutoff = 0.5 assigns a putative pA site with prob.1 > 0.5 to the True class (1) and any putative pA site with prob.1 <= 0.5 to the False class (0).
`shift_range`	the shift range for polyA site searching
`BPPARAM`	An optional `BiocParallelParam` instance determining the parallel back-end to be used during evaluation, or a list of BiocParallelParam instances, to be applied in sequence for nested calls to bplapply.
`tmpfolder`	temporary folder in which analysis data can be saved and reloaded to resume an analysis.
`silence`	whether to suppress progress reporting. Default: TRUE (progress is not reported).
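
The `cutStart`/`cutEnd` convention (a value such as 0.1 is a fraction, a value such as 25 is a count) and the `classifier_cutoff` rule can be sketched in base R. Both helpers below are hypothetical illustrations, not InPAS functions. In particular, treating any value below 1 as a fraction of the region length is an assumption, since this page only gives the two examples.

```r
# Hypothetical helper: number of nucleotides to trim from one end of a
# region of length `len`. Assumption: values below 1 are fractions of `len`,
# values of 1 or more are absolute counts (capped at `len`).
n_to_cut <- function(cut, len) {
    if (cut < 1) round(cut * len) else min(cut, len)
}

# Hypothetical helper: label putative pA sites from classifier probabilities.
# As described for classifier_cutoff, prob.1 > cutoff is True (1) and
# prob.1 <= cutoff is False (0).
label_pA <- function(prob.1, classifier_cutoff = 0.8) {
    as.integer(prob.1 > classifier_cutoff)
}

cut_fraction <- n_to_cut(0.1, len = 1000)   # 10 percent of 1000 nucleotides
cut_count    <- n_to_cut(25,  len = 1000)   # the first 25 nucleotides
labels       <- label_pA(c(0.30, 0.50, 0.95), classifier_cutoff = 0.5)
```

Note that a probability exactly equal to the cutoff falls in the False class, because the comparison is strict.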

Value

A GRanges object containing the estimated CP sites.

Author(s)

Jianhong Ou

References

ref: Cheung MS, Down TA, Latorre I, Ahringer J. Systematic bias in high-throughput sequencing data and its correction by BEADS. Nucleic Acids Res. 2011 Aug;39(15):e103. doi: 10.1093/nar/gkr425. Epub 2011 Jun 6. PubMed PMID: 21646344; PubMed Central PMCID: PMC3159482.

Mappability can be calculated with [GEM](http://algorithms.cnag.cat/wiki/Man:gem-mappability).

ref: Derrien T, Estelle J, Marco Sola S, Knowles DG, Raineri E, Guigo R, Ribeca P. Fast computation and applications of genome mappability. PLoS One. 2012;7(1):e30377. doi: 10.1371/journal.pone.0030377. Epub 2012 Jan 19. PubMed PMID: 22276185; PubMed Central PMCID: PMC3261895.

Examples

    if(interactive()){
        library(InPAS)
        library(BSgenome.Mmusculus.UCSC.mm10)
        # locate the example bedGraph file shipped with InPAS
        path <- file.path(find.package("InPAS"), "extdata")
        bedgraphs <- file.path(path, "Baf3.extract.bedgraph")
        # load the utr3.mm10 dataset
        data(utr3.mm10)
        tags <- "Baf3"
        genome <- BSgenome.Mmusculus.UCSC.mm10
        coverage <-
            coverageFromBedGraph(bedgraphs, tags, genome, hugeData=FALSE)
        # gp1 and gp2 are not arguments of CPsites (see Usage), so they are
        # omitted here; groupList keeps its default of NULL
        CP <- CPsites(coverage=coverage, genome=genome,
            utr3=utr3.mm10, coverage_threshold=5, long_coverage_threshold=5)
    }

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(InPAS)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: GenomicRanges
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
> ### Name: CPsites
> ### Title: predict the cleavage and polyadenylation(CP) site
> ### Aliases: CPsites
> ### Keywords: misc
> 
> ### ** Examples
> 
> #    if(interactive()){
>         library(BSgenome.Mmusculus.UCSC.mm10)
Loading required package: BSgenome
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
>         path <- file.path(find.package("InPAS"), "extdata")
>         bedgraphs <- file.path(path, "Baf3.extract.bedgraph")
>         data(utr3.mm10)
>         tags <- "Baf3"
>         genome <- BSgenome.Mmusculus.UCSC.mm10
>         coverage <- 
+             coverageFromBedGraph(bedgraphs, tags, genome, hugeData=FALSE)
>         CP <- CPsites(coverage=coverage, gp1=tags, gp2=NULL, genome=genome, 
+             utr3=utr3.mm10, coverage_threshold=5, long_coverage_threshold=5)
Error in CPsites(coverage = coverage, gp1 = tags, gp2 = NULL, genome = genome,  : 
  unused arguments (gp1 = tags, gp2 = NULL)
Execution halted
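
The run above stops because the call names `gp1` and `gp2`, which are absent from the signature listed under Usage. R rejects any named argument that a function without `...` does not define. As a minimal sketch, the hypothetical stand-in below (it copies only a few formals and does no analysis) shows how to compare the names in a planned call against `formals()` before running it.

```r
# Hypothetical stand-in with a subset of the formals shown under Usage;
# it performs no analysis and is not InPAS code.
cp_stub <- function(coverage, groupList = NULL, genome, utr3,
                    coverage_threshold = 5, long_coverage_threshold = 2) {
    invisible(NULL)
}

# Argument names used in the failing call from the log.
call_args <- c("coverage", "gp1", "gp2", "genome", "utr3",
               "coverage_threshold", "long_coverage_threshold")

# Names that the function does not define.
unknown <- setdiff(call_args, names(formals(cp_stub)))
unknown

# Calling with such a name raises the same kind of error as in the log.
err <- tryCatch(cp_stub(coverage = 1, gp1 = "Baf3", genome = 2, utr3 = 3),
                error = function(e) conditionMessage(e))
err
```

With the real package loaded, the same check can be run against `names(formals(CPsites))`.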