data.extract
R Documentation
Extracting sgRNA information from NGS FASTQ files to create read-count files for caRpools Analysis
Description
caRpools offers two ways of providing CRISPR/Cas9 screening data.
Either raw **read-count files** are directly used as described before, or read-count files are generated from NGS FASTQ files by extracting the 20 nt target sequence, mapping it against a reference library and extracting the read-count information for each sgRNA identifier.
In a first step, NGS FASTQ data is extracted and mapped against a reference library file using bowtie2.
Absolute path of the folder that contains 'CRISPR-extract.pl' and 'CRISPR-mapping.pl'
*Default* NULL
*Values* absolute path (character)
datapath
Absolute path of the folder that contains the data files (e.g. file.FASTQ)
*Default* NULL
*Values* absolute path (character)
fastqfile
Filename of FASTQ file WITHOUT .fastq extension
*Default* NULL
*Values* filename (character)
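Because 'fastqfile' is given without its extension and 'datapath' is the containing folder, both can be derived from a full file path with base R. This is a minimal sketch; the folder and file names are hypothetical and only serve as an illustration.

```r
# Hypothetical location of a FASTQ file (illustration only)
fastq_path <- file.path("/data/screen", "sample1.fastq")

# Folder part, suitable for 'datapath'
datapath <- dirname(fastq_path)

# File name without the .fastq extension, suitable for 'fastqfile'
fastqfile <- tools::file_path_sans_ext(basename(fastq_path))
```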
extract
Whether CRISPR-extract.pl is used to extract the 20 nt target sequence from the NGS reads using 'pattern'
*Default* FALSE
*Values* TRUE, FALSE (boolean)
pattern
Perl regular expression used to extract the 20 nt target sequence from NGS reads. Please see *extract pattern* in this manual for more information.
*Default* Regular Expression (character)
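To illustrate what such a pattern does, the sketch below applies a Perl-style regular expression with one capture group to a single read in base R. The flanking sequences, the read and the pattern are made up for this example and are not the caRpools default; the actual extraction is performed by CRISPR-extract.pl.

```r
# Hypothetical 20 nt target and flanking sequences (not the caRpools default)
target  <- "GATTACAGCTAGCATCGATC"
read    <- paste0("TTGTGGAAAGGACGAAAC", "ACCG", target, "GTTT", "TAGAGCTAGAAATAGC")

# The capture group holds exactly 20 nucleotides between the two flanks
pattern <- "ACCG([ACGT]{20})GTTT"

# regexec() returns the full match followed by the capture group
m <- regmatches(read, regexec(pattern, read, perl = TRUE))[[1]]
extracted <- if (length(m) == 2) m[2] else NA_character_
```

A read that does not contain both flanks yields `NA` here, which mirrors the idea that only reads matching the pattern contribute a target sequence.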
machinepattern
Machine ID of your sequencing machine. Used to identify the read ID.
createindex
Do you want caRpools to generate a bowtie2 index? Only necessary if 'mapping=TRUE'.
*Default* FALSE
*Values* TRUE, FALSE
referencefile
Filename of the library reference FASTA file, without extension. This is the same as the bowtie2 file if 'createindex=TRUE'.
mapping
Indicates whether FASTQ files need to be mapped against 'referencefile'/'bowtie2file'. FALSE by default.
*Default* FALSE
*Values* TRUE, FALSE
reversecomplement
Is the NGS sequence in reverse complement order?
*Default* FALSE
*Values* TRUE, FALSE
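For readers unsure what the reverse complement of a sequence looks like, here is a small base R sketch. The helper name `reverse_complement` is hypothetical and not part of caRpools.

```r
# Hypothetical helper: reverse complement of one or more DNA strings
reverse_complement <- function(seq) {
  # Swap each base for its complement, preserving case
  complemented <- chartr("ACGTacgt", "TGCAtgca", seq)
  # Reverse the character order of every string
  vapply(strsplit(complemented, ""),
         function(ch) paste(rev(ch), collapse = ""),
         character(1))
}

rc <- reverse_complement("GATTACA")
```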
threads
Number of threads bowtie2 may use for mapping. Only used if 'mapping=TRUE'. Usually the number of CPU cores.
*Default* 2
*Values* any integer
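One possible way to choose a value is to query the number of cores with the base `parallel` package. This is only a sketch; leaving one core free and falling back to 2 when the count cannot be determined are assumptions, not caRpools behaviour.

```r
# detectCores() may return NA on some platforms
cores <- parallel::detectCores()

# Assumption: keep one core free, fall back to 2 if the count is unknown
threads <- if (is.na(cores)) 2L else max(1L, cores - 1L)
```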
bowtieparams
Additional parameters to pass to bowtie2.
sensitivity
You can adjust the sensitivity of bowtie2 using this parameter. By default, bowtie2 is used with the very-sensitive-local setting. More information about the different sensitivity parameters can be found in the [bowtie2 options](http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#options).
*Default* "very-sensitive-local"
*Other options: very-fast, fast, sensitive, very-fast-local, fast-local, sensitive-local*
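A value can be checked against the options listed here before it is passed on. The helper `check_sensitivity` below is hypothetical and not part of caRpools; it relies only on base R's `match.arg()`.

```r
# Options as listed in this help page; the first entry is the default
sensitivity_options <- c("very-sensitive-local", "very-fast", "fast", "sensitive",
                         "very-fast-local", "fast-local", "sensitive-local")

# Hypothetical helper: returns the validated option or stops with an error
check_sensitivity <- function(sensitivity = sensitivity_options) {
  match.arg(sensitivity, sensitivity_options)
}

chosen <- check_sensitivity("fast")
```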
match
After bowtie2 mapping, the alignment is converted into the read-count files *filename_extracted-design.txt* and *filename_extracted-genes.txt*.
You can indicate how good the alignment must be for a read to be counted for an sgRNA.
By default, this is set to *perfect*, which only uses a mapped read if the full 20 nt from the sequencing read match the sgRNA in your library reference perfectly. The following options can be used:
* __perfect__ - Read is used if all 20 nt from the sequencing read match the target sequence given in the library reference
* __high__ - Read is used if at least 18 nt (starting from the PAM) match the target sequence in the reference
* __seed__ - Read is used if at least 14 nt (starting from the PAM) are a perfect match against the target sequence in the reference
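The three thresholds can be illustrated with a small base R sketch. It rests on loud assumptions: sequences are written 5' to 3' with the PAM following the 3' end, and "starting from the PAM" is read as an uninterrupted run of matches counted from that end. The function names are hypothetical, and the actual decision is made by CRISPR-mapping.pl on the bowtie2 alignment, which may work differently.

```r
# Hypothetical helper: number of matching nt counted from the PAM-proximal
# (3') end until the first mismatch
pam_proximal_matches <- function(read_seq, ref_seq) {
  r <- rev(strsplit(read_seq, "")[[1]])
  f <- rev(strsplit(ref_seq, "")[[1]])
  n <- min(length(r), length(f))
  mismatch <- which(r[seq_len(n)] != f[seq_len(n)])
  if (length(mismatch) == 0) n else mismatch[1] - 1
}

# Hypothetical helper: does a read satisfy the chosen stringency level?
read_passes <- function(read_seq, ref_seq, level = c("perfect", "high", "seed")) {
  level <- match.arg(level)
  required <- c(perfect = 20, high = 18, seed = 14)[[level]]
  pam_proximal_matches(read_seq, ref_seq) >= required
}

ref        <- "GATTACAGCTAGCATCGATC"  # made-up 20 nt reference target
read_mm1   <- "TATTACAGCTAGCATCGATC"  # mismatch at the PAM-distal position 1
read_mm5   <- "GATTCCAGCTAGCATCGATC"  # mismatch at position 5

passes_high <- read_passes(read_mm1, ref, "high")
passes_seed <- read_passes(read_mm5, ref, "seed")
```

In this sketch, a mismatch far from the PAM is tolerated by *high* or *seed* but never by *perfect*.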
Details
none
Value
Returns the file name for use with load.file().
Additional read-count files are generated.
Note
Requires working installations of bowtie2 and Perl. Run check.caRpools() first.
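Independently of check.caRpools(), a quick look at whether both programs are on the search path can be done with base R. This sketch only tests whether the executables can be located; it does not reproduce what check.caRpools() checks.

```r
# Sys.which() returns an empty string for programs that are not found
tools_found <- nzchar(Sys.which(c("bowtie2", "perl")))
names(tools_found) <- c("bowtie2", "perl")

if (!all(tools_found)) {
  message("Not found on PATH: ",
          paste(names(tools_found)[!tools_found], collapse = ", "))
}
```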