data.frame with sgRNA read counts. Must have one column with sgRNA names and one column with read counts. Note that the data must be formatted so that the gene name is part of the sgRNA name and can be extracted using the extractpattern expression.
e.g. GENE_sgRNA1 -> GENE as gene name, _ as the separator and sgRNA1 as the sgRNA identifier.
namecolumn
integer, indicates in which column the names are stored
countcolumn
integer, indicates in which column the read counts are stored
agg.function
expression, the function used to aggregate the data. When sgRNA read counts are aggregated to the corresponding gene, sum is the appropriate function. Other R functions that reduce a numeric vector to a single value, e.g. median or mean, can be used as well.
extractpattern
Regular expression used to extract the gene name from the sgRNA name. Make sure that the gene name can be captured by enclosing its part of the expression in parentheses (). The default value expression("^(.+?)_.+") looks for the gene name (.+?) in front of the separator _, followed by one or more arbitrary characters .+, e.g. gene1_anything.
type
CaRpools can either aggregate the data frame ('type = "aggregate"') or only add the gene identifiers as an additional column ('type = "annotate"').
*Default* "aggregate"
*Values* "aggregate", "annotate"
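The interplay of extractpattern and agg.function can be illustrated with a minimal base R sketch. This is not the package's own implementation: the sgRNA names and read counts are made up, the pattern is passed as a plain string rather than wrapped in expression(), and sub() and tapply() merely stand in for whatever the function does internally.

```r
# Hypothetical sgRNA names following the GENE_sgRNA naming scheme
sgrna.names <- c("AAK1_104_0", "AAK1_105_0", "gene1_anything")
# Hypothetical read counts, one per sgRNA
readcounts <- c(10, 30, 7)

# Pattern taken from the default value; the first capture group is the gene name
pattern <- "^(.+?)_.+"

# sub() replaces the whole match with the content of the first capture group
gene.names <- sub(pattern, "\\1", sgrna.names, perl = TRUE)

# The aggregation function decides how the sgRNA values of a gene are combined
by.sum  <- tapply(readcounts, gene.names, sum)
by.mean <- tapply(readcounts, gene.names, mean)

print(gene.names)
print(by.sum)
print(by.mean)
```

Because the capture group is lazy, only the text before the first separator is taken as the gene name, so AAK1_104_0 yields AAK1 even though the identifier itself contains another underscore.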
Details
aggregatetogenes can be used after load.file() to create quality control plots for aggregated gene data instead of single sgRNA data.
Before:
DesignID      fullmatch
AAK1_104_0    0
AAK1_105_0    197
AAK1_106_0    271
AAK1_107_0    1
AAK1_108_0    0
Afterwards:
DesignID      fullmatch
AAK1          880
AATK          2105
ABI1          1610
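The difference between the two type settings can be sketched in base R using the five AAK1 rows listed under "Before". This is an approximation under stated assumptions, not the function's actual code: the variable names and the name of the added gene column are hypothetical. The five rows sum to 469 rather than the 880 shown under "Afterwards", which suggests that the "Before" listing is only a subset of the AAK1 sgRNAs.

```r
# The five sgRNA rows listed under "Before"
before <- data.frame(
  DesignID  = c("AAK1_104_0", "AAK1_105_0", "AAK1_106_0",
                "AAK1_107_0", "AAK1_108_0"),
  fullmatch = c(0, 197, 271, 1, 0),
  stringsAsFactors = FALSE
)

# Extract the gene name with the default pattern
genes <- sub("^(.+?)_.+", "\\1", before$DesignID, perl = TRUE)

# "annotate"-like result: one row per sgRNA, gene identifier as an extra column
# (the column name "gene" is an assumption made for this sketch)
annotated <- before
annotated$gene <- genes

# "aggregate"-like result: one row per gene, read counts combined with sum
after <- aggregate(before["fullmatch"], by = list(DesignID = genes), FUN = sum)

print(annotated)
print(after)
```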
Value
A data.frame is returned with namecolumn (which now contains only gene names) and all read count information aggregated by agg.function.