Last data update: 2014.03.03

R: Filters on spliceR-lists for reduction of data sets
preSpliceRFilter    R Documentation

Filters on spliceR-lists for reduction of data sets

Description

Applies a number of filters on a spliceR object to reduce data set size before running downstream analyses.

Usage

preSpliceRFilter(spliceRobject, filters, expressionCutoff=0)

Arguments

spliceRobject

A SpliceRList object, created either manually from transcript and exon information (see SpliceRList) or by prepareCuff from Cufflinks data.

filters

A character vector giving the filters to apply: any combination of 'geneOK', 'expressedGenes', 'sigGenes', 'isoOK', 'expressedIso', 'isoClass' and/or 'sigIso'. Filtering works only for data from Cufflinks, because a manually generated SpliceRList does not include the required metacolumns.

expressionCutoff

Numeric, giving the expression threshold (often in FPKM) used for the 'expressedGenes' and 'expressedIso' filters. Default value is 0.
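
As a rough illustration of what an expression cutoff does, the base-R sketch below filters a toy table of isoforms. The data frame, its column names (iso_value_1, iso_value_2), the helper filter_expressed and the rule "keep an isoform if it exceeds the cutoff in at least one condition" are all assumptions made for this example; they are not taken from the package's internal implementation.

```r
# Toy isoform table; all names and values are hypothetical.
toy_isoforms <- data.frame(
  isoform_id  = c("iso1", "iso2", "iso3", "iso4"),
  iso_value_1 = c(0.0, 2.5, 0.4, 10.0),  # expression in condition 1 (e.g. FPKM)
  iso_value_2 = c(0.0, 0.0, 0.8, 12.0),  # expression in condition 2
  stringsAsFactors = FALSE
)

# Assumed rule: keep isoforms expressed above the cutoff in either condition.
filter_expressed <- function(df, expressionCutoff = 0) {
  keep <- df$iso_value_1 > expressionCutoff | df$iso_value_2 > expressionCutoff
  df[keep, , drop = FALSE]
}

kept_default <- filter_expressed(toy_isoforms)                        # cutoff 0
kept_strict  <- filter_expressed(toy_isoforms, expressionCutoff = 1)  # cutoff 1
```

With the default cutoff of 0 only the unexpressed iso1 is dropped; raising the cutoff to 1 also drops iso3, whose values stay below 1 in both conditions.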

Details

Cufflinks often flags many genes and isoforms with a status other than "OK" (for example "LOWDATA"), indicating low confidence in their quantification. This function is useful for reducing the size of a Cufflinks data set and thereby the running time of downstream analyses.

Note that preSpliceRFilter removes transcripts from the data set permanently, reducing its size, whereas the filter options of spliceR and annotatePTC only select transcripts for analysis and do not remove any data.
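
The difference between removing and selecting can be pictured with a plain named vector. This is only a conceptual sketch in base R with made-up data; it does not call spliceR, and the variable names and status values are hypothetical.

```r
# Hypothetical per-transcript status flags.
status <- c(tx1 = "OK", tx2 = "LOWDATA", tx3 = "OK", tx4 = "FAIL")

# Permanent removal (the preSpliceRFilter style): the object itself shrinks.
status_removed <- status[status == "OK"]

# Selection only (the style described for the filter options of spliceR
# and annotatePTC): the data stay intact and an index marks which
# transcripts to analyze.
analyze_idx <- which(status == "OK")

length(status_removed)      # 2 transcripts remain
length(status)              # still 4; nothing was dropped
names(status)[analyze_idx]  # "tx1" "tx3"
```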

Value

A SpliceRList with transcripts after filtering.

Author(s)

Kristoffer Vitting-Seerup, Johannes Waage

References

Vitting-Seerup K, Porse BT, Sandelin A, Waage J. (2014) spliceR: an R package for classification of alternative splicing and prediction of coding potential from RNA-seq data. BMC Bioinformatics 15:81.

Examples

#Load spliceR and the Cufflinks example data
library(spliceR)
cuffDB <- prepareCuffExample()

#Generate SpliceRList from cufflinks data
cuffDB_spliceR <- prepareCuff(cuffDB)

#Filter 
cuffDB_spliceR_filtered <- preSpliceRFilter(cuffDB_spliceR, filters=c("expressedIso", "isoOK", "expressedGenes", "geneOK"))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(spliceR)
Loading required package: cummeRbund
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: RSQLite
Loading required package: DBI
Loading required package: ggplot2
Loading required package: reshape2
Loading required package: fastcluster

Attaching package: 'fastcluster'

The following object is masked from 'package:stats':

    hclust

Loading required package: rtracklayer
Loading required package: GenomicRanges
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Gviz
Loading required package: grid

Attaching package: 'cummeRbund'

The following object is masked from 'package:GenomicRanges':

    promoters

The following object is masked from 'package:IRanges':

    promoters

The following object is masked from 'package:BiocGenerics':

    conditions

Loading required package: VennDiagram
Loading required package: futile.logger
Loading required package: RColorBrewer
Loading required package: plyr

Attaching package: 'plyr'

The following object is masked from 'package:cummeRbund':

    count

The following object is masked from 'package:IRanges':

    desc

The following object is masked from 'package:S4Vectors':

    rename


Attaching package: 'spliceR'

The following object is masked from 'package:cummeRbund':

    conditions

The following object is masked from 'package:BiocGenerics':

    conditions

> ### Name: preSpliceRFilter
> ### Title: Filters on spliceR-lists for reduction of data sets
> ### Aliases: preSpliceRFilter
> 
> ### ** Examples
> 
> #Load cufflinks example data
> cuffDB <- prepareCuffExample()
Creating database /tmp/RtmpJZqTuG/cuffData.db
Reading Run Info File /tmp/RtmpJZqTuG/run.info
Writing runInfo Table
Reading Read Group Info  /tmp/RtmpJZqTuG/read_groups.info
Writing replicates Table
Reading GTF file
Writing GTF features to 'features' table...
Reading /tmp/RtmpJZqTuG/genes.fpkm_tracking
Checking samples table...
Populating samples table...
Writing genes table
Reshaping geneData table
Recasting
Writing geneData table
Reading /tmp/RtmpJZqTuG/gene_exp.diff
Writing geneExpDiffData table
Reading /tmp/RtmpJZqTuG/promoters.diff
Writing promoterDiffData table
Reading /tmp/RtmpJZqTuG/genes.count_tracking
Reshaping geneCount table
Recasting
Writing geneCount table
Reading read group info in /tmp/RtmpJZqTuG/genes.read_group_tracking
Writing geneReplicateData table
Reading /tmp/RtmpJZqTuG/isoforms.fpkm_tracking
Checking samples table...
OK!
Writing isoforms table
Reshaping isoformData table
Recasting
Writing isoformData table
Reading /tmp/RtmpJZqTuG/isoform_exp.diff
Writing isoformExpDiffData table
Reading /tmp/RtmpJZqTuG/isoforms.count_tracking
Reshaping isoformCount table
Recasting
Writing isoformCount table
Reading read group info in /tmp/RtmpJZqTuG/isoforms.read_group_tracking
Writing isoformReplicateData table
Reading /tmp/RtmpJZqTuG/tss_groups.fpkm_tracking
Checking samples table...
OK!
Writing TSS table
Reshaping TSSData table
Recasting
Writing TSSData table
Reading /tmp/RtmpJZqTuG/tss_group_exp.diff
Writing TSSExpDiffData table
Reading /tmp/RtmpJZqTuG/splicing.diff
Writing splicingDiffData table
Reading /tmp/RtmpJZqTuG/tss_groups.count_tracking
Reshaping TSSCount table
Recasting
Writing TSSCount table
Reading read group info in /tmp/RtmpJZqTuG/tss_groups.read_group_tracking
Writing TSSReplicateData table
Reading /tmp/RtmpJZqTuG/cds.fpkm_tracking
Checking samples table...
OK!
Writing CDS table
Reshaping CDSData table
Recasting
Writing CDSData table
Reading /tmp/RtmpJZqTuG/cds_exp.diff
Writing CDSExpDiffData table
Reading /tmp/RtmpJZqTuG/cds.diff
Writing CDSDiffData table
Reading /tmp/RtmpJZqTuG/cds.count_tracking
Reshaping CDSCount table
Recasting
Writing CDSCount table
Reading read group info in /tmp/RtmpJZqTuG/cds.read_group_tracking
Writing CDSReplicateData table
Indexing Tables...
Warning messages:
1: attributes are not identical across measure variables; they will be dropped 
2: attributes are not identical across measure variables; they will be dropped 
3: attributes are not identical across measure variables; they will be dropped 
4: attributes are not identical across measure variables; they will be dropped 
5: attributes are not identical across measure variables; they will be dropped 
6: attributes are not identical across measure variables; they will be dropped 
7: attributes are not identical across measure variables; they will be dropped 
8: attributes are not identical across measure variables; they will be dropped 
> 
> #Generate SpliceRList from cufflinks data
> cuffDB_spliceR <- prepareCuff(cuffDB)
Reading cuffDB, isoforms...
Reading cuffDB, exons...
Analyzing cufflinks annotation problem...
Fixing cufflinks annotation problem...
Cufflinks annotation problem was fixed for 65 Cuff_genes
Creating spliceRList...
> 
> #Filter 
> cuffDB_spliceR_filtered <- preSpliceRFilter(cuffDB_spliceR, filters=c("expressedIso", "isoOK", "expressedGenes", "geneOK"))
3609 entries pre-filtering...
2087 entries post-filtering...
>
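
The entry counts printed by the example run above (3609 before and 2087 after filtering) can be turned into a reduction figure with a few lines of base R. The variable names are chosen for this sketch only.

```r
# Entry counts taken from the example output.
pre_filter  <- 3609
post_filter <- 2087

removed          <- pre_filter - post_filter   # 1522
fraction_removed <- removed / pre_filter
percent_removed  <- round(100 * fraction_removed, 1)
percent_removed                                # 42.2
```

For this example data set, the four filters therefore remove roughly 42% of the entries.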