Last data update: 2014.03.03

data.extract

R Documentation

Extracting sgRNA information from NGS FASTQ files to create read-count files for caRpools Analysis

Description

caRpools offers two ways of providing CRISPR/Cas9 screening data. Either raw **read-count files** are used directly, or read-count files are generated from NGS FASTQ files by extracting the 20 nt target sequence, mapping it against a reference library, and extracting the read count for each sgRNA identifier.

In a first step, NGS FASTQ data is extracted and mapped against a reference library file using bowtie2.
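
The extract, map, and count idea described above can be sketched in base R. This is only an illustration under stated assumptions: the flanking sequences `ACCG` and `GTTT`, the reads, and the three-entry library are made up, and exact string matching stands in for the bowtie2 alignment that `data.extract` actually runs through its Perl scripts.

```r
# Hypothetical reads: made-up 5' flank (ACCG) + 20 nt target + made-up 3' flank (GTTT...)
reads <- c(
  "ACCGAAAACCCCGGGGTTTTAAAAGTTTAGAG",
  "ACCGAAAACCCCGGGGTTTTAAAAGTTTAGAG",
  "ACCGTTTTGGGGCCCCAAAATTTTGTTTAGAG",
  "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN"   # read without a recognizable target
)

# Hypothetical library reference: sgRNA identifier -> 20 nt target sequence
library_ref <- c(
  sgRNA_A = "AAAACCCCGGGGTTTTAAAA",
  sgRNA_B = "TTTTGGGGCCCCAAAATTTT",
  sgRNA_C = "ACGTACGTACGTACGTACGT"
)

# Step 1: extract the 20 nt target with a capture group (assumed pattern, not the caRpools default)
flank_pattern <- "ACCG([ACGT]{20})GTTT"
hits <- regmatches(reads, regexec(flank_pattern, reads))
targets <- vapply(
  hits,
  function(h) if (length(h) == 2) h[2] else NA_character_,
  character(1)
)

# Step 2: "map" each target by exact lookup in the library (stand-in for bowtie2)
ids <- names(library_ref)[match(targets, library_ref)]

# Step 3: count reads per sgRNA identifier, keeping sgRNAs with zero reads
read_counts <- table(factor(ids, levels = names(library_ref)))
print(read_counts)
```

Reads that do not match the pattern, or whose target is absent from the library, simply drop out of the count table in this sketch.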

Usage

data.extract(scriptpath=NULL, datapath=NULL, fastqfile=NULL, extract = FALSE,
pattern = "default", machinepattern = "default", createindex = FALSE,
referencefile = NULL, mapping = FALSE, reversecomplement = FALSE,
threads = 1, bowtieparams = "", sensitivity = "very-sensitive-local", match = "perfect")

Arguments

`scriptpath`	Absolute path of the folder that contains 'CRISPR-extract.pl' and 'CRISPR-mapping.pl'. Default: NULL. Values: absolute path (character).
`datapath`	Absolute path of the folder that contains the data files (e.g. file.FASTQ). Default: NULL. Values: absolute path (character).
`fastqfile`	Filename of the FASTQ file WITHOUT the .fastq extension. Default: NULL. Values: filename (character).
`extract`	Whether CRISPR-extract.pl is used to extract the 20 nt target sequence from the NGS reads using 'pattern'. Default: FALSE. Values: TRUE, FALSE (boolean).
`pattern`	Perl regular expression used to extract the 20 nt target sequence from NGS reads. Please see extract pattern in this manual for more information. Default: "default". Values: regular expression (character).
`machinepattern`	Machine ID of your sequencing machine. Used to identify the read ID.
`createindex`	Whether caRpools should generate a bowtie2 index. Only necessary if 'mapping=TRUE'. Default: FALSE. Values: TRUE, FALSE.
`referencefile`	Filename of the library reference FASTA file, without extension. This is also the name of the bowtie2 index if 'createindex=TRUE'.
`mapping`	Indicates whether FASTQ files need to be mapped against 'referencefile'. Default: FALSE. Values: TRUE, FALSE.
`reversecomplement`	Whether the NGS sequence is in reverse complement orientation. Default: FALSE. Values: TRUE, FALSE.
`threads`	Number of threads bowtie2 can use for mapping. Only used if 'mapping=TRUE'. Usually the number of CPU cores. Default: 1. Values: any integer.
`bowtieparams`	Additional parameters to pass to bowtie2.
`sensitivity`	Adjusts the sensitivity of bowtie2. By default, bowtie2 is used in a very-sensitive-local setting. More information about the different sensitivity parameters can be found at the [bowtie2 options](http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#options). Default: "very-sensitive-local". Other options: very-fast, fast, sensitive, very-fast-local, fast-local, sensitive-local.
`match`	After bowtie2 mapping, the alignment is converted into the read-count files filename_extracted-design.txt and filename_extracted-genes.txt. You can indicate how good the alignment must be for a read to be used when generating the read count for each sgRNA. By default, this is set to perfect, which only uses a mapped read if the full 20 nt from the sequencing match the sgRNA in your library reference perfectly. The following options can be used:
* __perfect__ - Read is used if all 20 nt from the sequencing match the target sequence given in the library reference
* __high__ - Read is used if at least 18 nt (starting from the PAM) match the target sequence in the reference
* __seed__ - Read is used if at least 14 nt (starting from the PAM) are a perfect match against the target sequence in the reference
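
The three `match` levels can be illustrated with a small base R sketch. It is not the logic of 'CRISPR-mapping.pl', which works on bowtie2 alignments. The sketch assumes that the PAM lies at the 3' end of the 20 nt target and that "starting from the PAM" means consecutive matching bases counted from that end. The function names and the `stringency` argument (mirroring `match`) are hypothetical.

```r
# Count consecutive matching bases, starting at the PAM-proximal (3') end.
# Assumption: both sequences are given 5' -> 3' and have the same length.
pam_proximal_matches <- function(read_seq, ref_seq) {
  stopifnot(nchar(read_seq) == nchar(ref_seq))
  read_bases <- rev(strsplit(read_seq, "")[[1]])
  ref_bases  <- rev(strsplit(ref_seq, "")[[1]])
  mismatch_pos <- which(read_bases != ref_bases)
  if (length(mismatch_pos) == 0) length(read_bases) else mismatch_pos[1] - 1
}

# Decide whether a read would be counted under a given stringency level.
passes_match <- function(read_seq, ref_seq,
                         stringency = c("perfect", "high", "seed")) {
  stringency <- match.arg(stringency)
  required <- c(perfect = 20, high = 18, seed = 14)[[stringency]]
  pam_proximal_matches(read_seq, ref_seq) >= required
}

ref       <- "AAAACCCCGGGGTTTTAAAA"  # hypothetical library target
read_far  <- "TAAACCCCGGGGTTTTAAAA"  # mismatch at the PAM-distal (5') end
read_near <- "AAAAGCCCGGGGTTTTAAAA"  # mismatch at position 5 from the 5' end

for (level in c("perfect", "high", "seed")) {
  cat(level, ":",
      passes_match(read_far, ref, level),
      passes_match(read_near, ref, level), "\n")
}
```

Under these assumptions a single mismatch far from the PAM still passes `high` and `seed`, while a mismatch closer to the PAM passes only `seed`.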

Details

none

Value

Returns the file name to pass to load.file(). Additionally generates the read-count files.

Note

Requires working installations of bowtie2 and Perl. Run check.caRpools() first.
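
As a quick complementary check, base R's `Sys.which()` can report whether the two external programs are visible on the PATH. This is a minimal sketch and not a replacement for check.caRpools(). The executable names `bowtie2` and `perl` are assumptions about how the tools are installed on your system.

```r
# Look up the external tools on the PATH; Sys.which() returns "" for missing ones
required_tools <- c("bowtie2", "perl")
tool_paths <- Sys.which(required_tools)

missing_tools <- names(tool_paths)[tool_paths == ""]
if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
} else {
  message("bowtie2 and perl were both found on PATH.")
}
```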

Author(s)

Jan Winter

Examples

data(caRpools)
# The calls below stay commented out: they need Perl, bowtie2 and real FASTQ files.
# seq.pattern, maschine.pattern, threads, bowtieparams, datapath and referencefile
# are variables that must be defined beforehand.
# fileCONTROL1 = data.extract(scriptpath="path.to.scripts",
#   datapath="path.to.FASTQ", fastqfile="filename1", extract=TRUE,
#   pattern=seq.pattern, machinepattern=maschine.pattern, createindex=TRUE,
#   referencefile="filename.lib.reference",
#   mapping=TRUE, reversecomplement=FALSE, threads=threads, bowtieparams=bowtieparams,
#   sensitivity="very-sensitive-local", match="perfect")
# Now we can load the generated read-count file directly!
# CONTROL1 = load.file(paste(datapath, fileCONTROL1, sep="/")) # Untreated sample 1 loaded

# Don't forget the library reference
# libFILE = load.file(paste(datapath, paste(referencefile, ".fasta", sep=""), sep="/"),
#   header = FALSE, type="fastalib")