Last data update: 2014.03.03

CPsites    R Documentation

Predict the cleavage and polyadenylation (CP) site

Description

Predict alternative cleavage and polyadenylation (CP, or APA) sites.

Usage

    CPsites(coverage, groupList=NULL, genome, utr3, 
        window_size=100, search_point_START=50, search_point_END=NA, 
        cutStart=window_size, cutEnd=0, adjust_distal_polyA_end=TRUE, 
        coverage_threshold=5, long_coverage_threshold=2, 
        background=c("same_as_long_coverage_threshold", 
                     "1K", "5K", "10K", "50K"),
        txdb=NA,
        gcCompensation=NA, mappabilityCompensation=NA, 
        FFT=FALSE, fft.sm.power=20, 
        PolyA_PWM=NA, classifier=NA, classifier_cutoff=.8, shift_range=window_size,
        BPPARAM=NULL, tmpfolder=NULL, silence=TRUE)

Arguments

coverage

Coverage for each sample; the output of coverageFromBedGraph.

groupList

A list of groups, each given as a vector of tag (sample) names.
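A minimal sketch of what a groupList might look like, assuming each list element is a character vector of the tag names used when building the coverage object. The tag and group names here are hypothetical.

```r
# Hypothetical tag names, as they might be passed to coverageFromBedGraph()
tags <- c("ctrl_rep1", "ctrl_rep2", "treat_rep1", "treat_rep2")

# Assumed structure: one named element per group, each a vector of tag names
groupList <- list(control = tags[1:2],
                  treated = tags[3:4])

# Every tag listed in a group should be one of the sample tags
stopifnot(all(unlist(groupList) %in% tags))
```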

genome

A BSgenome object.

utr3

The output of utr3Annotation.

window_size

Window size used when searching for novel distal positions and when adjusting polyA sites. Default: 100.

search_point_START

Start point for searching. Default: 50.

search_point_END

End point for searching. Default: NA.

cutStart

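Number of nucleotides to remove from the start before searching. A fractional value is a proportion of the length (0.1 means 10 percent); a whole number is a nucleotide count (25 means remove the first 25). Default: window_size.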

cutEnd

Number of nucleotides to remove from the end before searching; 0.1 means 10 percent. Default: 0.
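The cutStart/cutEnd convention can be sketched in base R as follows. This assumes that values below 1 are fractions of the region length and values of 1 or more are absolute nucleotide counts; the helpers cut_to_nt and trim_coverage are hypothetical and not part of InPAS.

```r
# Hypothetical helper: convert a cut value into a nucleotide count.
# Assumption: values below 1 are fractions of the region length,
# values of 1 or more are absolute nucleotide counts.
cut_to_nt <- function(cut, len) {
  if (cut < 1) floor(len * cut) else as.integer(cut)
}

# Hypothetical helper: drop cutStart nt from the start and cutEnd nt from the end
trim_coverage <- function(cov, cutStart, cutEnd) {
  n <- length(cov)
  s <- cut_to_nt(cutStart, n)
  e <- cut_to_nt(cutEnd, n)
  if (s + e >= n) return(cov[0])    # nothing left to search
  cov[(s + 1):(n - e)]
}

cov <- rep(1:10, each = 20)         # toy 200-nt coverage vector
length(trim_coverage(cov, cutStart = 25, cutEnd = 0.1))   # 155
```

Here 25 nucleotides are removed from the start and 10 percent of 200 (20 nucleotides) from the end.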

adjust_distal_polyA_end

If TRUE, adjust the distal polyA end using cleanUpdTSeq. Default: TRUE.

coverage_threshold

Coverage cutoff for the first 100 nucleotides. Transcripts whose coverage over the first 100 nucleotides is lower than coverage_threshold are dropped. Default: 5.

long_coverage_threshold

Coverage cutoff for the region of the long form. Transcripts whose coverage in the long-form region is lower than long_coverage_threshold are dropped. Default: 2.
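The two filters can be sketched as below. This is an illustration only: it assumes each threshold is compared with the mean coverage of its region and that the long-form region starts at a known position; keep_transcript and long_start are hypothetical names.

```r
# Hypothetical filter illustrating coverage_threshold and long_coverage_threshold.
# Assumption: each threshold is compared with the MEAN coverage of its region.
keep_transcript <- function(cov, long_start,
                            coverage_threshold = 5,
                            long_coverage_threshold = 2) {
  first100 <- cov[seq_len(min(100, length(cov)))]
  long_region <- cov[long_start:length(cov)]
  mean(first100) >= coverage_threshold &&
    mean(long_region) >= long_coverage_threshold
}

cov <- c(rep(8, 300), rep(1, 200))   # toy: short form well covered, long form weak
keep_transcript(cov, long_start = 301)                               # FALSE
keep_transcript(cov, long_start = 301, long_coverage_threshold = 1)  # TRUE
```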

background

The range used to calculate the cutoff threshold of the local background: one of "same_as_long_coverage_threshold", "1K", "5K", "10K" or "50K".

txdb

A TxDb object.

gcCompensation

GC content compensation vector

mappabilityCompensation

Mappability compensation vector

FFT

Whether to smooth the data with the Fast Fourier Transform (FFT). Default: FALSE.

fft.sm.power

If FFT is TRUE, the frequency cutoff used for smoothing; frequencies above it are removed. Default: 20.
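FFT smoothing of a coverage vector can be sketched as a simple low-pass filter with base R's fft(). This assumes fft.sm.power acts as a frequency cutoff; fft_smooth is a hypothetical helper, not the InPAS implementation.

```r
# A minimal FFT low-pass smoother, assuming fft.sm.power acts as a frequency
# cutoff: components above it are zeroed before transforming back.
fft_smooth <- function(x, power = 20) {
  n <- length(x)
  if (n <= 2 * power) return(x)            # too short to remove anything
  X <- fft(x)
  # Zero the 0-based frequencies power..(n - power); this keeps the mean (DC),
  # the low frequencies and their conjugate mirror, so the inverse stays real
  X[(power + 1):(n - power + 1)] <- 0
  Re(fft(X, inverse = TRUE)) / n           # R's inverse fft is unnormalized
}

set.seed(1)
x <- c(rep(10, 100), rep(2, 100)) + rnorm(200)   # noisy step, like a coverage drop
smoothed <- fft_smooth(x, power = 20)
```

Raising power keeps more frequencies and therefore smooths less.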

PolyA_PWM

Position Weight Matrix of polyA
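To show how a position weight matrix scores sequence, here is a base R sketch using a toy log-odds PWM built from the canonical AATAAA hexamer against a uniform background. The matrix and the helpers score_window and scan_pwm are hypothetical; they are not the PWM or scoring code used by InPAS.

```r
# Toy PWM built from the canonical AATAAA hexamer; NOT the matrix used by
# InPAS. Rows are bases, columns are motif positions.
bases <- c("A", "C", "G", "T")
motif <- strsplit("AATAAA", "")[[1]]
pwm <- sapply(motif, function(b) ifelse(bases == b, 0.85, 0.05))
rownames(pwm) <- bases

# Log-odds score of one window against a uniform background (0.25 per base)
score_window <- function(s, pwm) {
  chars <- strsplit(s, "")[[1]]
  probs <- pwm[cbind(match(chars, rownames(pwm)), seq_along(chars))]
  sum(log2(probs / 0.25))
}

# Score every window of width ncol(pwm) along a DNA string
scan_pwm <- function(dna, pwm) {
  w <- ncol(pwm)
  starts <- seq_len(nchar(dna) - w + 1)
  vapply(starts, function(i) score_window(substr(dna, i, i + w - 1), pwm),
         numeric(1))
}

scores <- scan_pwm("GGCAATAAAGCG", pwm)
which.max(scores)   # 4: the window starting at the AATAAA hexamer
```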

classifier

An object of class "PASclassifier"

classifier_cutoff

The cutoff used to decide whether a putative pA site is true or false; any number between 0 and 1. For example, classifier_cutoff = 0.5 assigns a putative pA site with prob.1 > 0.5 to the True class (1) and any putative pA site with prob.1 <= 0.5 to the False class (0). Default: 0.8.
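The thresholding rule described for classifier_cutoff can be sketched in a few lines of base R. The prob.1 values are made up for illustration, and classify_pA is a hypothetical helper.

```r
# Sketch of the rule: prob.1 > cutoff -> True class (1), otherwise False (0).
# The prob.1 values are made up for illustration.
prob.1 <- c(0.95, 0.50, 0.65, 0.20)
classify_pA <- function(prob, cutoff = 0.8) as.integer(prob > cutoff)

classify_pA(prob.1, cutoff = 0.5)   # 1 0 1 0
classify_pA(prob.1)                 # 1 0 0 0 with the default cutoff of 0.8
```

Note that a site with prob.1 exactly equal to the cutoff (0.50 above) falls in the False class.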

shift_range

The shift range for polyA site searching. Default: window_size.

BPPARAM

An optional BiocParallelParam instance determining the parallel back-end to be used during evaluation, or a list of BiocParallelParam instances, to be applied in sequence for nested calls to bplapply.

tmpfolder

A temporary folder in which analysis data are saved and reloaded, so that an interrupted analysis can be resumed.

silence

Whether to suppress progress reports. Default: TRUE (progress is not reported).

Value

A GRanges object containing the estimated CP sites.

Author(s)

Jianhong Ou

References

ref: Cheung MS, Down TA, Latorre I, Ahringer J. Systematic bias in high-throughput sequencing data and its correction by BEADS. Nucleic Acids Res. 2011 Aug;39(15):e103. doi: 10.1093/nar/gkr425. Epub 2011 Jun 6. PubMed PMID: 21646344; PubMed Central PMCID: PMC3159482.

Mappability can be calculated with [GEM](http://algorithms.cnag.cat/wiki/Man:gem-mappability).

ref: Derrien T, Estelle J, Marco Sola S, Knowles DG, Raineri E, Guigo R, Ribeca P. Fast computation and applications of genome mappability. PLoS One. 2012;7(1):e30377. doi: 10.1371/journal.pone.0030377. Epub 2012 Jan 19. PubMed PMID: 22276185; PubMed Central PMCID: PMC3261895.

Examples

    if(interactive()){
        library(BSgenome.Mmusculus.UCSC.mm10)
        path <- file.path(find.package("InPAS"), "extdata")
        bedgraphs <- file.path(path, "Baf3.extract.bedgraph")
        data(utr3.mm10)
        tags <- "Baf3"
        genome <- BSgenome.Mmusculus.UCSC.mm10
        coverage <- 
            coverageFromBedGraph(bedgraphs, tags, genome, hugeData=FALSE)
        # groupList: one named group containing the single Baf3 tag
        CP <- CPsites(coverage=coverage, groupList=list(Baf3=tags), 
            genome=genome, utr3=utr3.mm10, 
            coverage_threshold=5, long_coverage_threshold=5)
    }

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)

> library(InPAS)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: GenomicRanges
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
> ### Name: CPsites
> ### Title: predict the cleavage and polyadenylation(CP) site
> ### Aliases: CPsites
> ### Keywords: misc
> 
> ### ** Examples
> 
> #    if(interactive()){
>         library(BSgenome.Mmusculus.UCSC.mm10)
Loading required package: BSgenome
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
>         path <- file.path(find.package("InPAS"), "extdata")
>         bedgraphs <- file.path(path, "Baf3.extract.bedgraph")
>         data(utr3.mm10)
>         tags <- "Baf3"
>         genome <- BSgenome.Mmusculus.UCSC.mm10
>         coverage <- 
+             coverageFromBedGraph(bedgraphs, tags, genome, hugeData=FALSE)
>         CP <- CPsites(coverage=coverage, gp1=tags, gp2=NULL, genome=genome, 
+             utr3=utr3.mm10, coverage_threshold=5, long_coverage_threshold=5)
Error in CPsites(coverage = coverage, gp1 = tags, gp2 = NULL, genome = genome,  : 
  unused arguments (gp1 = tags, gp2 = NULL)
Execution halted