Last data update: 2014.03.03

R: Aggregates pooled CRISPR screen sgRNA data to gene data

aggregatetogenes

R Documentation

Aggregates pooled CRISPR screen sgRNA data to gene data

Description

Aggregates all sgRNA data from a pooled CRISPR screen to the level of the corresponding genes.

Usage

aggregatetogenes(data.frame, namecolumn = 1, countcolumn = 2,
agg.function = sum, extractpattern = expression("^(.+?)_.+"), type="aggregate")

Arguments

`data.frame`	data.frame with sgRNA read counts. Must have one column with sgRNA names and one column with read counts. The gene name must be part of the sgRNA name so that it can be extracted with the extractpattern expression, e.g. in GENE_sgRNA1, GENE is the gene name, _ is the separator and sgRNA1 is the sgRNA identifier.
`namecolumn`	integer, indicates in which column the names are stored
`countcolumn`	integer, indicates in which column the read counts are stored
`agg.function`	function used to aggregate the data. When sgRNA read counts are aggregated to the corresponding gene, sum is usually the right choice. Any other R function that summarises a numeric vector can be used instead, e.g. median or mean.
`extractpattern`	Regular expression, used to extract the gene name from the sgRNA name. Make sure that the gene name is captured by enclosing its part of the pattern in parentheses (). The default value expression("^(.+?)_.+") looks for the gene name (.+?) in front of the separator _, followed by any characters .+, e.g. gene1_anything.
`type`	CaRpools can either aggregate the data frame (type = "aggregate") or only annotate the gene identifiers as an additional column (type = "annotate"). Default: "aggregate". Values: "aggregate", "annotate".
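
To illustrate how the default extractpattern behaves, the following minimal sketch applies the same regular expression with base R's sub(). It does not call caRpools and is not the package's internal code. The sgRNA suffixes of the AATK and ABI1 names are hypothetical and only serve as examples.

```r
# Example sgRNA names; "AATK_12_0" and "ABI1_7_0" are made-up identifiers.
sgrna_names <- c("AAK1_104_0", "AATK_12_0", "ABI1_7_0")

# Default pattern: the lazy group (.+?) stops at the FIRST underscore,
# so only the gene name is captured.
lazy <- sub("^(.+?)_.+", "\\1", sgrna_names, perl = TRUE)

# For comparison, a greedy group (.+) runs up to the LAST underscore
# and would keep part of the sgRNA identifier.
greedy <- sub("^(.+)_.+", "\\1", sgrna_names, perl = TRUE)

print(lazy)    # "AAK1" "AATK" "ABI1"
print(greedy)  # "AAK1_104" "AATK_12" "ABI1_7"
```

If gene names in a library contain underscores themselves, the default pattern would cut them at the first underscore, so a custom extractpattern may be needed in that case.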

Details

aggregatetogenes can be used after load.file() to create quality control plots for aggregated gene data instead of single sgRNA data.

Before:

DesignID	fullmatch
AAK1_104_0	0
AAK1_105_0	197
AAK1_106_0	271
AAK1_107_0	1
AAK1_108_0	0

Afterwards:

DesignID	fullmatch
AAK1	880
AATK	2105
ABI1	1610
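
The idea behind this transformation can be sketched in base R without caRpools. The example below is an illustration under assumptions, not the package's implementation: it uses only the five AAK1 rows listed under "Before", and the helper column name gene is chosen here for illustration. Because only those five rows are used, the sum will not match the 880 shown under "Afterwards", which presumably includes AAK1 sgRNAs not listed above.

```r
# The five sgRNA rows shown in the "Before" table.
sgrna <- data.frame(
  DesignID  = c("AAK1_104_0", "AAK1_105_0", "AAK1_106_0",
                "AAK1_107_0", "AAK1_108_0"),
  fullmatch = c(0, 197, 271, 1, 0),
  stringsAsFactors = FALSE
)

# Step 1: extract the gene name in front of the first underscore.
# Keeping it as an extra column is conceptually similar to type = "annotate".
sgrna$gene <- sub("^(.+?)_.+", "\\1", sgrna$DesignID, perl = TRUE)

# Step 2: sum the read counts per gene, conceptually similar to
# type = "aggregate" with agg.function = sum.
gene_counts <- aggregate(fullmatch ~ gene, data = sgrna, FUN = sum)

print(gene_counts)  # one row: AAK1, 0 + 197 + 271 + 1 + 0 = 469
```

Replacing FUN = sum with median or mean changes how the sgRNA values are combined, which mirrors the role of agg.function.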

Value

A data.frame is returned with namecolumn (which now includes only gene names) and all read count information aggregated by agg.function.

Note

none

Author(s)

Jan Winter

Examples

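# Load the example data shipped with caRpools
data(caRpools)

# Sum the sgRNA read counts of CONTROL1 per gene
CONTROL1.g <- aggregatetogenes(data.frame = CONTROL1, agg.function = sum,
                               extractpattern = expression("^(.+?)(_.+)"))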