Last data update: 2014.03.03

analyzeSNPhood                R Documentation

Main function of SNPhood

Description

analyzeSNPhood is the main function of the SNPhood package. All results, parameters and metadata are stored in an object of class SNPhood.

Usage

analyzeSNPhood(par.l, files.df, onlyPrepareForDatasetCorrelation = FALSE,
  verbose = TRUE)

Arguments

par.l

Named list with all required parameter names and their respective values. The list should be generated via the helper function getDefaultParameterList and then modified as needed. Note that all supported parameters must be defined in the list, as is the case for the list returned by getDefaultParameterList. See ?getDefaultParameterList for details.
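The requirement that every supported parameter be present can be illustrated with a minimal base-R sketch. The list defaults.l below is only a stand-in for the return value of getDefaultParameterList: it reproduces four parameter names that appear in the run log, and its values are illustrative assumptions rather than the package's documented defaults. The variable names missingParams and unknownParams are hypothetical.

```r
# Stand-in for the list returned by getDefaultParameterList().
# Only four of the real parameter names are reproduced; values are illustrative.
defaults.l <- list(regionSize = 500, binSize = 50, nCores = 1, poolDatasets = FALSE)

# Start from the complete list and override selected values, so that
# no parameter can go missing.
par.l <- modifyList(defaults.l, list(poolDatasets = TRUE, binSize = 25))

# Parameters that are required but absent, and names that are not recognized.
missingParams <- setdiff(names(defaults.l), names(par.l))
unknownParams <- setdiff(names(par.l), names(defaults.l))

if (length(missingParams) > 0 || length(unknownParams) > 0) {
  stop("Parameter list is incomplete or contains unknown names")
}
```

Modifying a complete default list, rather than building a list from scratch, is what keeps the set of names intact.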

files.df

Data frame with at least the column "signal" specifying the absolute paths to the BAM files that will be processed. Optionally, further columns can be added; the supported columns are "input", "individual" and "genotype". See the vignette for further details. The data frame can either be created manually or via the helper function collectFiles.
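For the manual route, a data frame of the described shape might be built as sketched below. The file paths and individual labels are hypothetical placeholders, and the choice to store paths as character strings (stringsAsFactors = FALSE) is an assumption rather than a documented requirement.

```r
# Hypothetical absolute paths; replace them with real BAM files.
bamFiles <- c("/data/chipseq/indA_rep1.bam",
              "/data/chipseq/indA_rep2.bam",
              "/data/chipseq/indB_rep1.bam",
              "/data/chipseq/indB_rep2.bam")

# "signal" is the only mandatory column; "individual" is one of the
# optional columns named in the argument description.
files.df <- data.frame(signal = bamFiles,
                       individual = c("indA", "indA", "indB", "indB"),
                       stringsAsFactors = FALSE)

# Simple sanity checks before passing the data frame on.
supportedCols <- c("signal", "input", "individual", "genotype")
extraCols <- setdiff(colnames(files.df), supportedCols)
stopifnot("signal" %in% colnames(files.df), length(extraCols) == 0)
```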

onlyPrepareForDatasetCorrelation

Logical(1). Default FALSE. If set to TRUE, only the steps necessary to analyze the correlation among datasets with respect to their read counts are executed, which is less time-consuming than running the full pipeline. This serves as a quality control step to identify outlier datasets that show artefacts and should therefore be removed from the analysis. If set to FALSE (the default), the full pipeline is executed. In both cases, the function plotAndCalculateCorrelationDatasets can be executed afterwards.
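The idea behind this quality control step can be illustrated in base R, independently of SNPhood. The sketch below uses simulated toy counts, not real data, and assumes Pearson correlation; it is not a description of how plotAndCalculateCorrelationDatasets works internally. All variable names are hypothetical.

```r
set.seed(1)

# Toy read counts: 200 regions x 4 datasets. Three datasets share a common
# signal; the fourth is unrelated noise and plays the role of an outlier.
baseSignal <- rpois(200, lambda = 50)
counts <- cbind(ds1 = baseSignal + rpois(200, lambda = 5),
                ds2 = baseSignal + rpois(200, lambda = 5),
                ds3 = baseSignal + rpois(200, lambda = 5),
                ds4 = rpois(200, lambda = 50))

# Pairwise correlation of read counts among datasets.
corMat <- cor(counts, method = "pearson")

# Mean correlation of each dataset with all others (diagonal excluded).
diag(corMat) <- NA
meanCor <- rowMeans(corMat, na.rm = TRUE)

# The dataset that correlates least with the rest is the outlier candidate.
outlierCandidate <- names(which.min(meanCor))
```

A dataset flagged in this way would be a candidate for removal before the full pipeline is run.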

verbose

Logical(1). Default TRUE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled?

Details

If you already have BAM files in objects of class BamFile or BamFileList, see the function collectFiles for how to seamlessly integrate them into the SNPhood framework.

In addition, see the vignettes for more details.

Value

Object of class SNPhood. See the class description (?"SNPhood-class") for details.

Examples

## For the following example, see also the workflow vignette!
library(SNPhood)
library(SNPhoodData)
# get a list of files to process
dataDir = system.file("extdata", package = "SNPhoodData")
files.df = collectFiles(patternFiles = paste0(dataDir,"/*.bam"))
files.df$individual = c("GM10847", "GM10847", "GM12890", "GM12890")
fileUserRegions = list.files(pattern = "*.txt",dataDir, full.names = TRUE)
par.l = getDefaultParameterList(path_userRegions = fileUserRegions)
par.l$poolDatasets = TRUE
# Run the main function with the full pipeline
SNPhood.o = analyzeSNPhood (par.l, files.df)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(SNPhood)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Rsamtools
Loading required package: Biostrings
Loading required package: XVector
Loading required package: data.table

Attaching package: 'data.table'

The following object is masked from 'package:GenomicRanges':

    shift

The following object is masked from 'package:IRanges':

    shift

Loading required package: checkmate

------------------------------------------------------------------------------------------------------------------
|       Welcome to the SNPhood package and thank you for using our software. This is SNPhood version 1.2.2.      |
| See the vignettes (type browseVignettes("SNPhood") or the help pages for how to use SNPhood for your analyses. |
|       Thank you for using our software. Please do not hesitate to contact us if there are any questions.       |
------------------------------------------------------------------------------------------------------------------

> ### Name: analyzeSNPhood
> ### Title: Main function of _SNPhood_
> ### Aliases: analyzeSNPhood
> 
> ### ** Examples
> 
> ## For the following example, see also the workflow vignette!
> library(SNPhoodData)
> # get a list of files to process
> dataDir = system.file("extdata", package = "SNPhoodData")
> files.df = collectFiles(patternFiles = paste0(dataDir,"/*.bam"))
Search for files with the pattern '*.bam' in directory /home/ddbj/local/lib64/R/library/SNPhoodData/extdata (recursive: FALSE, case-sensitive:FALSE)
Found the following files:
/home/ddbj/local/lib64/R/library/SNPhoodData/extdata/SNYDER_HG19_GM10847_H3K27AC_1_reconcile.dedup.chr21.bam
 /home/ddbj/local/lib64/R/library/SNPhoodData/extdata/SNYDER_HG19_GM10847_H3K27AC_2_reconcile.dedup.chr21.bam
 /home/ddbj/local/lib64/R/library/SNPhoodData/extdata/SNYDER_HG19_GM12890_H3K27AC_1_reconcile.dedup.chr21.bam
 /home/ddbj/local/lib64/R/library/SNPhoodData/extdata/SNYDER_HG19_GM12890_H3K27AC_2_reconcile.dedup.chr21.bam
> files.df$individual = c("GM10847", "GM10847", "GM12890", "GM12890")
> fileUserRegions = list.files(pattern = "*.txt",dataDir, full.names = TRUE)
> par.l = getDefaultParameterList(path_userRegions = fileUserRegions)
> par.l$poolDatasets = TRUE
> # Run the main function with the full pipeline
> SNPhood.o = analyzeSNPhood (par.l, files.df)
Total size of all objects: 4.2 Mb


START WITH AUTOMATED PIPELINE


The following arguments have been provided:
SUCCESSFULLY FINISHED PARSING AND CHECKING THE CONFIGURATION FILE. PARSED PARAMETERS:
 Parameter "readFlag_isPaired": "TRUE"
 Parameter "readFlag_isProperPair": "TRUE"
 Parameter "readFlag_isUnmappedQuery": "FALSE"
 Parameter "readFlag_hasUnmappedMate": "FALSE"
 Parameter "readFlag_isMinusStrand": "NA"
 Parameter "readFlag_isMateMinusStrand": "NA"
 Parameter "readFlag_isFirstMateRead": "NA"
 Parameter "readFlag_isSecondMateRead": "NA"
 Parameter "readFlag_isNotPrimaryRead": "FALSE"
 Parameter "readFlag_isNotPassingQualityControls": "FALSE"
 Parameter "readFlag_isDuplicate": "FALSE"
 Parameter "readFlag_reverseComplement": "FALSE"
 Parameter "readFlag_simpleCigar": "TRUE"
 Parameter "readFlag_minMapQ": "0"
 Parameter "path_userRegions": "/home/ddbj/local/lib64/R/library/SNPhoodData/extdata/cisQ.H3K27AC.chr21.txt"
 Parameter "zeroBasedCoordinates": "FALSE"
 Parameter "regionSize": "500"
 Parameter "binSize": "50"
 Parameter "readGroupSpecific": "TRUE"
 Parameter "strand": "both"
 Parameter "startOpen": "FALSE"
 Parameter "endOpen": "FALSE"
 Parameter "headerLine": "FALSE"
 Parameter "linesToParse": "-1"
 Parameter "lastBinTreatment": "delete"
 Parameter "assemblyVersion": "hg19"
 Parameter "effectiveGenomeSizePercentage": "-1"
 Parameter "nCores": "1"
 Parameter "keepAllReadCounts": "FALSE"
 Parameter "normByInput": "FALSE"
 Parameter "normAmongEachOther": "TRUE"
 Parameter "poolDatasets": "TRUE"
   Add strand information for all regions. Assume strand is irrelevant (*)...
   Modify chromosome names because they do not start with "chr"
 Parse the user regions file
  Finished parsing. Number of entries processed: 178
 Split regions into bins
 Finished execution using 1 cores.
 Execution time: 0 secs
 Finished execution using 1 cores.
 Execution time: 0 secs
 Finished execution using 1 cores.
 Execution time: 0.2 secs
 Finished execution using 1 cores.
 Execution time: 0.2 secs
 Finished execution using 1 cores.
 Execution time: 0 secs
 The last bin in each region will be deleted as they are shorter than the other bins (1 bp as compared to 50 bp).
 Finished execution using 1 cores.
 Execution time: 0 secs
 Split 174 entries into 3480 bins
 Execution time: 0.5 secs
 Parse BAM header...
  Read group specific reporting has been requested. The following read groups have been identified in the BAM header: paternal, maternal, ambiguous

PROCESS INPUT FILE SET NA (1 of 1)

PROCESS INDIVIDUAL 1 from 2: GM10847

PROCESS FILE 1 of 2:/home/ddbj/local/lib64/R/library/SNPhoodData/extdata/SNYDER_HG19_GM10847_H3K27AC_1_reconcile.dedup.chr21.bam
 Parse BAM header...
 Extract data from BAM file /home/ddbj/local/lib64/R/library/SNPhoodData/extdata/SNYDER_HG19_GM10847_H3K27AC_1_reconcile.dedup.chr21.bam for the SNPs only (this may take a while)...
 Determine genotype distribution at original user positions for A, C, G, and T...
  Read group paternal
 Finished execution using 1 cores.
 Execution time: 0.1 secs
  Read group maternal
 Finished execution using 1 cores.
 Execution time: 0.2 secs
  Read group ambiguous
 Finished execution using 1 cores.
 Execution time: 0.2 secs
 Execution time: 3.5 secs
 Extract data from BAM file /home/ddbj/local/lib64/R/library/SNPhoodData/extdata/SNYDER_HG19_GM10847_H3K27AC_1_reconcile.dedup.chr21.bam for the full SNP regions (this may take a while)...
 Analyze read counts specifically for each read group (this may take a while)...
  Read group paternal...
   Filtered 18252 reads out of 19726 in file /home/ddbj/local/lib64/R/library/SNPhoodData/extdata/SNYDER_HG19_GM10847_H3K27AC_1_reconcile.dedup.chr21.bam because of strand and read group
   Analyze read counts per bin (this may take a while)...
 Finished execution using 1 cores.
 Execution time: 2.2 secs
Error in analyzeSNPhood(par.l, files.df) : 
  Assertion on 'functionName' failed: Must be a function, not 'character'.
Calls: analyzeSNPhood ... .execInParallelGen -> assertFunction -> makeAssertion -> mstop
In addition: Warning messages:
1: In analyzeSNPhood(par.l, files.df) :
  Forcing parameter normAmongEachOther to FALSE because either input normalization is turned on, only one files is going to be processed, or because allele-specific reads are requested
2: In .parseAndProcessUserRegions(par.l, chrSizes.df, verbose = verbose) :
  4 duplicate region removed in file /home/ddbj/local/lib64/R/library/SNPhoodData/extdata/cisQ.H3K27AC.chr21.txt out of 178 regions. New number of regions: 174
Execution halted
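
The region and bin counts reported in the log above can be reproduced with simple arithmetic. The sketch below takes regionSize, binSize and the region counts from the log; the assumption that each region spans the user position plus regionSize bp on either side is inferred from the reported 1 bp remainder and is not stated in this help page.

```r
# Values reported in the run log.
regionSize  <- 500
binSize     <- 50
nParsed     <- 178   # entries parsed from the user regions file
nDuplicates <- 4     # duplicate regions removed (see warning 2)

nRegions <- nParsed - nDuplicates

# Assumption: a region covers the position itself plus regionSize bp
# upstream and downstream.
regionLength <- 2 * regionSize + 1

# Full-length bins per region and the length of the short trailing bin.
fullBins  <- regionLength %/% binSize
remainder <- regionLength %% binSize

# With lastBinTreatment = "delete", the short trailing bin is dropped.
totalBins <- nRegions * fullBins
```

Under this assumption, the remainder matches the 1 bp last bin mentioned in the log, and totalBins matches the reported 3480 bins for 174 regions.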