Named list. Contains all required parameter names and their respective values and
should be generated via the helper function getDefaultParameterList.
Note that all supported parameters must be defined in the list, as returned by the function getDefaultParameterList.
See also ?getDefaultParameterList for details.
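The requirement that every supported parameter be present can be met by starting from the complete default list and overriding individual entries. The following is a minimal base-R sketch of that pattern. Here `defaults` is a hypothetical stand-in that holds only a few of the parameters, so that the snippet runs without SNPhood installed. In practice the list would come from getDefaultParameterList.

```r
# Hypothetical stand-in for the list returned by getDefaultParameterList();
# the real list contains many more entries.
defaults = list(regionSize = 500, binSize = 50, poolDatasets = FALSE)

# Override selected values while keeping every default entry in place
par.l = modifyList(defaults, list(poolDatasets = TRUE))

# Verify that no parameter has been dropped from the list
missingParams = setdiff(names(defaults), names(par.l))
stopifnot(length(missingParams) == 0)
```

Modifying entries of the default list, rather than building a list from scratch, avoids omitting a parameter by accident.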
files.df
Data frame with at least the column "signal" specifying the absolute paths to the BAM files that will be processed.
Optionally, further columns can be added;
the supported ones are "input", "individual" and "genotype". See the vignette for further details.
The data frame can either be created manually or via the helper function collectFiles.
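As an illustration of the manual route, the data frame could be built as sketched below. The paths and individual names are hypothetical placeholders, not files shipped with any package.

```r
# Hypothetical absolute paths; replace them with your own BAM files
files.df = data.frame(
  signal     = c("/path/to/sample1.bam", "/path/to/sample2.bam"),
  individual = c("indA", "indB"),
  stringsAsFactors = FALSE
)

# The "signal" column is mandatory; "individual" is one of the optional columns
stopifnot("signal" %in% colnames(files.df))
```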
onlyPrepareForDatasetCorrelation
Logical(1). Default FALSE. If set to TRUE, only the steps necessary to analyze
the correlation among datasets with respect to their read counts are executed, which is less time-consuming than running the full pipeline.
This is a quality control step to identify outlier datasets
that show artefacts and that should therefore be removed from the analysis. If set to FALSE (the default), the full pipeline is
executed. In both cases, the function plotAndCalculateCorrelationDatasets can be executed afterwards.
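To illustrate the idea behind this quality control step, the sketch below computes pairwise correlations of read counts across datasets and flags the least concordant one. It is a conceptual base-R example with made-up counts and hypothetical dataset names, not the implementation used by plotAndCalculateCorrelationDatasets.

```r
# Toy read counts: rows are regions, columns are datasets (made-up numbers)
counts = cbind(
  rep1    = c(10, 50, 30, 80, 20, 60),
  rep2    = c(11, 48, 33, 80, 19, 62),
  rep3    = c(20, 101, 60, 159, 40, 121),
  suspect = c(70, 15, 65, 20, 75, 10)
)

# Pairwise Pearson correlation among datasets
corMatrix = cor(counts)

# Mean correlation of each dataset with all others (the diagonal of 1 is excluded)
meanCor = (rowSums(corMatrix) - 1) / (ncol(corMatrix) - 1)

# The dataset with the lowest mean correlation is the outlier candidate
outlier = names(which.min(meanCor))
print(outlier)
```

A dataset whose counts correlate poorly with all others is a candidate for removal before the full pipeline is run.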
verbose
Logical(1). Default TRUE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled?
Details
If you already have BAM files in objects of class BamFile or BamFileList,
see the function collectFiles for how to seamlessly integrate them into the SNPhood framework.
In addition, see the vignettes for more details.
Value
Object of class SNPhood. See the class description (?"SNPhood-class") for details.
Examples
## For the following example, see also the workflow vignette!
library(SNPhood)
library(SNPhoodData)
# Get a list of BAM files to process from the example data package
dataDir = system.file("extdata", package = "SNPhoodData")
files.df = collectFiles(patternFiles = paste0(dataDir, "/*.bam"))
# Assign each of the four BAM files to its individual
files.df$individual = c("GM10847", "GM10847", "GM12890", "GM12890")
# Locate the user regions file (pattern is a regular expression, not a glob)
fileUserRegions = list.files(dataDir, pattern = "\\.txt$", full.names = TRUE)
# Create the default parameter list and enable pooling of datasets
par.l = getDefaultParameterList(path_userRegions = fileUserRegions)
par.l$poolDatasets = TRUE
# Run the main function with the full pipeline
SNPhood.o = analyzeSNPhood(par.l, files.df)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)
> library(SNPhood)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Rsamtools
Loading required package: Biostrings
Loading required package: XVector
Loading required package: data.table
Attaching package: 'data.table'
The following object is masked from 'package:GenomicRanges':
shift
The following object is masked from 'package:IRanges':
shift
Loading required package: checkmate
------------------------------------------------------------------------------------------------------------------
| Welcome to the SNPhood package and thank you for using our software. This is SNPhood version 1.2.2. |
| See the vignettes (type browseVignettes("SNPhood") or the help pages for how to use SNPhood for your analyses. |
| Thank you for using our software. Please do not hesitate to contact us if there are any questions. |
------------------------------------------------------------------------------------------------------------------
> ### Name: analyzeSNPhood
> ### Title: Main function of _SNPhood_
> ### Aliases: analyzeSNPhood
>
> ### ** Examples
>
> ## For the following example, see also the workflow vignette!
> library(SNPhoodData)
> # get a list of files to process
> dataDir = system.file("extdata", package = "SNPhoodData")
> files.df = collectFiles(patternFiles = paste0(dataDir,"/*.bam"))
Search for files with the pattern '*.bam' in directory /home/ddbj/local/lib64/R/library/SNPhoodData/extdata (recursive: FALSE, case-sensitive:FALSE)
Found the following files:
/home/ddbj/local/lib64/R/library/SNPhoodData/extdata/SNYDER_HG19_GM10847_H3K27AC_1_reconcile.dedup.chr21.bam
/home/ddbj/local/lib64/R/library/SNPhoodData/extdata/SNYDER_HG19_GM10847_H3K27AC_2_reconcile.dedup.chr21.bam
/home/ddbj/local/lib64/R/library/SNPhoodData/extdata/SNYDER_HG19_GM12890_H3K27AC_1_reconcile.dedup.chr21.bam
/home/ddbj/local/lib64/R/library/SNPhoodData/extdata/SNYDER_HG19_GM12890_H3K27AC_2_reconcile.dedup.chr21.bam
> files.df$individual = c("GM10847", "GM10847", "GM12890", "GM12890")
> fileUserRegions = list.files(pattern = "*.txt",dataDir, full.names = TRUE)
> par.l = getDefaultParameterList(path_userRegions = fileUserRegions)
> par.l$poolDatasets = TRUE
> # Run the main function with the full pipeline
> SNPhood.o = analyzeSNPhood (par.l, files.df)
Total size of all objects: 4.2 Mb
START WITH AUTOMATED PIPELINE
The following arguments have been provided:
SUCCESSFULLY FINISHED PARSING AND CHECKING THE CONFIGURATION FILE. PARSED PARAMETERS:
Parameter "readFlag_isPaired": "TRUE"
Parameter "readFlag_isProperPair": "TRUE"
Parameter "readFlag_isUnmappedQuery": "FALSE"
Parameter "readFlag_hasUnmappedMate": "FALSE"
Parameter "readFlag_isMinusStrand": "NA"
Parameter "readFlag_isMateMinusStrand": "NA"
Parameter "readFlag_isFirstMateRead": "NA"
Parameter "readFlag_isSecondMateRead": "NA"
Parameter "readFlag_isNotPrimaryRead": "FALSE"
Parameter "readFlag_isNotPassingQualityControls": "FALSE"
Parameter "readFlag_isDuplicate": "FALSE"
Parameter "readFlag_reverseComplement": "FALSE"
Parameter "readFlag_simpleCigar": "TRUE"
Parameter "readFlag_minMapQ": "0"
Parameter "path_userRegions": "/home/ddbj/local/lib64/R/library/SNPhoodData/extdata/cisQ.H3K27AC.chr21.txt"
Parameter "zeroBasedCoordinates": "FALSE"
Parameter "regionSize": "500"
Parameter "binSize": "50"
Parameter "readGroupSpecific": "TRUE"
Parameter "strand": "both"
Parameter "startOpen": "FALSE"
Parameter "endOpen": "FALSE"
Parameter "headerLine": "FALSE"
Parameter "linesToParse": "-1"
Parameter "lastBinTreatment": "delete"
Parameter "assemblyVersion": "hg19"
Parameter "effectiveGenomeSizePercentage": "-1"
Parameter "nCores": "1"
Parameter "keepAllReadCounts": "FALSE"
Parameter "normByInput": "FALSE"
Parameter "normAmongEachOther": "TRUE"
Parameter "poolDatasets": "TRUE"
Add strand information for all regions. Assume strand is irrelevant (*)...
Modify chromosome names because they do not start with "chr"
Parse the user regions file
Finished parsing. Number of entries processed: 178
Split regions into bins
Finished execution using 1 cores.
Execution time: 0 secs
Finished execution using 1 cores.
Execution time: 0 secs
Finished execution using 1 cores.
Execution time: 0.2 secs
Finished execution using 1 cores.
Execution time: 0.2 secs
Finished execution using 1 cores.
Execution time: 0 secs
The last bin in each region will be deleted as they are shorter than the other bins (1 bp as compared to 50 bp).
Finished execution using 1 cores.
Execution time: 0 secs
Split 174 entries into 3480 bins
Execution time: 0.5 secs
Parse BAM header...
Read group specific reporting has been requested. The following read groups have been identified in the BAM header: paternal, maternal, ambiguous
PROCESS INPUT FILE SET NA (1 of 1)
PROCESS INDIVIDUAL 1 from 2: GM10847
PROCESS FILE 1 of 2:/home/ddbj/local/lib64/R/library/SNPhoodData/extdata/SNYDER_HG19_GM10847_H3K27AC_1_reconcile.dedup.chr21.bam
Parse BAM header...
Extract data from BAM file /home/ddbj/local/lib64/R/library/SNPhoodData/extdata/SNYDER_HG19_GM10847_H3K27AC_1_reconcile.dedup.chr21.bam for the SNPs only (this may take a while)...
Determine genotype distribution at original user positions for A, C, G, and T...
Read group paternal
Finished execution using 1 cores.
Execution time: 0.1 secs
Read group maternal
Finished execution using 1 cores.
Execution time: 0.2 secs
Read group ambiguous
Finished execution using 1 cores.
Execution time: 0.2 secs
Execution time: 3.5 secs
Extract data from BAM file /home/ddbj/local/lib64/R/library/SNPhoodData/extdata/SNYDER_HG19_GM10847_H3K27AC_1_reconcile.dedup.chr21.bam for the full SNP regions (this may take a while)...
Analyze read counts specifically for each read group (this may take a while)...
Read group paternal...
Filtered 18252 reads out of 19726 in file /home/ddbj/local/lib64/R/library/SNPhoodData/extdata/SNYDER_HG19_GM10847_H3K27AC_1_reconcile.dedup.chr21.bam because of strand and read group
Analyze read counts per bin (this may take a while)...
Finished execution using 1 cores.
Execution time: 2.2 secs
Error in analyzeSNPhood(par.l, files.df) :
Assertion on 'functionName' failed: Must be a function, not 'character'.
Calls: analyzeSNPhood ... .execInParallelGen -> assertFunction -> makeAssertion -> mstop
In addition: Warning messages:
1: In analyzeSNPhood(par.l, files.df) :
Forcing parameter normAmongEachOther to FALSE because either input normalization is turned on, only one files is going to be processed, or because allele-specific reads are requested
2: In .parseAndProcessUserRegions(par.l, chrSizes.df, verbose = verbose) :
4 duplicate region removed in file /home/ddbj/local/lib64/R/library/SNPhoodData/extdata/cisQ.H3K27AC.chr21.txt out of 178 regions. New number of regions: 174
Execution halted
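The region and bin counts reported in the log above can be reproduced with a short calculation. The sketch assumes that each region spans regionSize bp on both sides of the user position plus the position itself. That assumption is consistent with the 1 bp last bin mentioned in the log, but it is not stated there explicitly.

```r
regionSize = 500   # "regionSize" as reported in the log
binSize    = 50    # "binSize" as reported in the log

# Assumed region length: regionSize bp on each side plus the position itself
regionLength = 2 * regionSize + 1

# Full-size bins per region and size of the leftover last bin,
# which is dropped because lastBinTreatment is "delete"
nFullBins   = regionLength %/% binSize
lastBinSize = regionLength %% binSize

# 178 parsed regions minus 4 duplicates, then bins across all regions
nRegions = 178 - 4
nBins    = nRegions * nFullBins

stopifnot(lastBinSize == 1, nRegions == 174, nBins == 3480)
```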