Last data update: 2014.03.03
function to cluster sequences based on their CpG and GC content
Description
Diagnostic function: the GC content and CpG content of the sequences are clustered using two-dimensional
Gaussian mixture models (Mclust). FALSE is returned if the Bayesian information criterion (BIC) favors
more than one subgroup, where at most max.clust subgroups are considered. If do.plot=TRUE, the results are visualized.
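The BIC-based decision described above can be sketched with the mclust function Mclust(), which fits Gaussian mixtures with each number of components in G and keeps the model with the highest BIC. This is a minimal sketch under assumptions, not the cobindR implementation: the function name single_cluster_test, the returned field n.clusters and the synthetic data are hypothetical, and the rule "TRUE only when a single component is selected" follows the Value section.

```r
# Minimal sketch (assumption): a BIC-based test for a single GC/CpG cluster
# written with mclust. This is NOT the cobindR source code.
library(mclust)

# gc: numeric matrix, one row per sequence, columns "CpG" and "GC"
# max.clust: largest number of mixture components considered
single_cluster_test <- function(gc, max.clust = 4) {
  # Fit 2D Gaussian mixtures with 1..max.clust components; Mclust keeps
  # the model with the highest BIC
  fit <- Mclust(gc, G = 1:max.clust)
  list(result = fit$G == 1,              # TRUE only if one component wins
       n.clusters = fit$G,               # hypothetical extra field
       classification = fit$classification)
}

# Hypothetical synthetic data with two well-separated groups
set.seed(1)
low  <- cbind(CpG = rnorm(50, 0.01, 0.002), GC = rnorm(50, 0.40, 0.01))
high <- cbind(CpG = rnorm(50, 0.06, 0.002), GC = rnorm(50, 0.60, 0.01))
res <- single_cluster_test(rbind(low, high), max.clust = 2)
res$result
```

With two clearly separated groups, the two-component model should win on BIC, so result is FALSE, matching the meaning of FALSE in the Value section.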
Usage
## S4 method for signature 'cobindr'
testCpG(x, max.clust = 4, do.plot = F, n.cpu = NA)
Arguments
x
an object of class "cobindr", which holds all necessary
information about the sequences and the hits.
max.clust
integer giving the maximal number of clusters used for
separating the data.
do.plot
logical flag; if do.plot=TRUE, a scatterplot of the GC and CpG
content of each sequence is produced, with the clusters color coded.
n.cpu
number of CPUs to be used for parallelization. The default value is NA,
in which case the number of available CPUs is detected and then used.
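Resolving n.cpu = NA to the number of available cores and spreading per-sequence work over them might look like the following minimal sketch, which uses the base parallel package (the example log shows cobindR loading it). The helper resolve_cores and the toy sequences are hypothetical; this is an illustration under assumptions, not the package's code.

```r
# Minimal sketch (assumption): resolve n.cpu = NA to the available cores
# and run a per-sequence computation in parallel. Not the cobindR source.
library(parallel)

resolve_cores <- function(n.cpu = NA) {
  if (is.na(n.cpu)) n.cpu <- detectCores()      # detect available CPUs
  if (is.na(n.cpu) || n.cpu < 1) n.cpu <- 1     # detectCores() may return NA
  # mclapply relies on forking, which is unavailable on Windows
  if (.Platform$OS.type == "windows") n.cpu <- 1
  as.integer(n.cpu)
}

seqs <- c("ACGTACGTGG", "ATATATATCG", "GGGCCCGCGC")  # hypothetical sequences
cores <- resolve_cores(NA)

# GC fraction of each sequence, computed on 'cores' worker processes
gc_frac <- unlist(mclapply(seqs, function(s) {
  bases <- strsplit(s, "")[[1]]
  mean(bases %in% c("G", "C"))
}, mc.cores = cores))
gc_frac
```

Falling back to a single core keeps the same code path working where forking is not available.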
Value
result
logical flag; FALSE is returned if more than one subgroup is found
using the Bayesian information criterion (BIC)
gc
matrix with rows corresponding to sequences and columns
corresponding to CpG and GC content
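The two columns of gc can be illustrated with a minimal sketch that assumes GC content is the fraction of G/C bases and CpG content is the fraction of dinucleotide positions reading "CG" (count divided by length minus 1). These definitions are assumptions, not taken from the cobindR source; they do appear consistent with the example output below if each sequence is 2500 bp long (e.g. 0.5248 = 1312/2500 and 0.040816327 = 102/2499). The function name gc_cpg_content is hypothetical.

```r
# Minimal sketch (assumed definitions, not the cobindR source):
#   GC  = fraction of bases that are G or C
#   CpG = number of "CG" dinucleotides / (sequence length - 1)
gc_cpg_content <- function(seqs) {
  t(vapply(seqs, function(s) {
    b <- strsplit(toupper(s), "")[[1]]
    n <- length(b)
    cg <- sum(b[-n] == "C" & b[-1] == "G")   # count "CG" dinucleotides
    c(CpG = cg / (n - 1), GC = mean(b %in% c("G", "C")))
  }, numeric(2)))
}

# Hypothetical toy input: one row per sequence, columns CpG and GC
m <- gc_cpg_content(c(s1 = "ACGCGT", s2 = "AATTGC"))
m
```

Here s1 has two CG steps out of five dinucleotide positions (CpG 0.4) and four G/C bases out of six, while s2 contains no CG step.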
Author(s)
Robert Lehmann <r.lehmann@biologie.hu-berlin.de>
References
The method uses clustering functions from the package "mclust" (http://www.stat.washington.edu/mclust/).
See Also
plot.gc
Examples
cfg <- cobindRConfiguration()
sequence_type(cfg) <- 'fasta'
sequence_source(cfg) <- system.file('extdata/example.fasta', package='cobindR')
# avoid complaints from the validation mechanism
pfm_path(cfg) <- system.file('extdata/pfms', package='cobindR')
pairs(cfg) <- ''
runObj <- cobindr(cfg)
testCpG(runObj, max.clust = 2, do.plot = TRUE)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(cobindR)
Attaching package: 'cobindR'
The following object is masked from 'package:base':
sequence
> ### Name: testCpG
> ### Title: function to cluster sequences based on their CpG and GC content
> ### Aliases: testCpG testCpG-method testCpG,cobindr-method
> ### Keywords: dplot manip clust
>
> ### ** Examples
>
> cfg <- cobindRConfiguration()
Warning message:
In .local(.Object, ...) :
no config-file defined, generating configuration-object with default values
> sequence_type(cfg) <- 'fasta'
> sequence_source(cfg) <- system.file('extdata/example.fasta', package='cobindR')
> # avoid complaint of validation mechanism
> pfm_path(cfg) <- system.file('extdata/pfms',package='cobindR')
> pairs(cfg) <- ''
> runObj <- cobindr( cfg)
[1] "Creating a new experiment!"
reading file /home/ddbj/local/lib64/R/library/cobindR/extdata/example.fasta ...
|======================================================================| 100%
ready retrieving sequences!
reading pfm files: /home/ddbj/local/lib64/R/library/cobindR/extdata/pfms ...
[1] "ES_Sox2_1_c1058"
[1] "ES_Klf4_3_c1373"
[1] "ES_Oct4_1_c570"
[1] "ES_Sox2_1_c1058"
ignored files:
Loading required package: parallel
Using the parallel (multicore) version of cobindR - function cpg.gc.content with 4 cores
There were 50 or more warnings (use warnings() to see the first 50)
> testCpG(runObj, max.clust = 2, do.plot = TRUE)
Using the parallel (multicore) version of cobindR - function cpg.gc.content with 4 cores
$result
[1] FALSE
$gc
CpG GC
hsapiens_fasta_default analysis_2016-07-06 13:48:32_0 0.040816327 0.5248
hsapiens_fasta_default analysis_2016-07-06 13:48:32_1 0.032012805 0.5320
hsapiens_fasta_default analysis_2016-07-06 13:48:32_2 0.036814726 0.5012
hsapiens_fasta_default analysis_2016-07-06 13:48:32_3 0.020808323 0.5044
hsapiens_fasta_default analysis_2016-07-06 13:48:32_4 0.046818727 0.5088
hsapiens_fast