input data matrix, or a GRanges object with input stored in the meta DataFrame, assume ordered.
maxk
maximum length of stay for the sojourn distribution
maxbp
maximum length of stay in bp for the sojourn distribution, given positional information specified in xPos / xRange
J
number of states
xPos
a vector of positions for each x row
xRange
a IRanges/GRanges object, same length as x rows
usePos
character value to indicate whether the 'start', 'end' or 'mid' point position should be used to estimate the sojourn distribution
emis.type
type of the emission distribution, only the following types are supported: 'norm', 'mvnorm', 'pois', 'nbinom', 'mvt', 't'
com.emis
whether to set a common emission prior across different seqnames. if TRUE, the emission will not be updated during individual runs.
xAnno
a optional TxDb / GRanges / GRangesList / list object used in sojournAnno to infer parameters for the sojourn distribution
soj.type
type of the sojourn distribution, only the following types are supported: 'nonpara', 'gamma', 'pois', 'nbinom'
q.alpha
a quantile factor controlling the estimated prior for the mean of the emission of each states, seq(from=q.alpha, to=1-q.alpha, length.out=J)
r.var
a ratio factor controlling the estimated prior for the variance / covariance structure of each states.
A value larger than 1 tend to allow larger variation in extreme states; a value smaller than 1 will decrease the probability of having extreme state
useMC
TRUE if mclapply should be used to speed up the calculation, use options(mc.cores=n) to set number of parallel processes
cMethod
C algorithm used for the most likely state sequence, 'F-B' or 'Viterbi'
maxit
max iteration of the EM run with Forward-Backward algorithm
maxgap
max distance between neighbouring feature to consider a split
tol
tolerance level of the likelihood change to terminate the EM run
grp
vector of group assignment for each sample, with a length the same as columns in the data matrix, samples within each group would be processed simultaneously if a multivariate emission distribution is available
cluster.m
clustering method for prior grouping, possible values are 'ward','single','complete','average','mcquitty','median','centroid'
avg.m
method to calculate average value for each segment, 'median' or 'mean' possibly trimmed
prior.m
method to select emission prior for each state, 'quantile' uses different levels of quantile; the 'cluster' method uses clara function from cluster
trim
the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.
na.rm
TRUE if NA value should be ignored
Details
This is the batch function of building Hidden Semi Markov Model (HSMM) to estimating the most likely state sequences for multiple input data series.
The function will sequentially process each region identified by the distinctive seqnames in x or in xRange if available, or assuming all data from the same region.
A second layer of stratification is introduced by the argument grp, which could be used to reflect experimental design.
The assumption is that profiles from the same group could be considered homogeneous, thus processed together if emis.type is compatible (currently only with 'mvnorm').
Argument for the sojourn density will be initialized as flat prior or estimated from other data before calling the work horse function hsmmRun.
Then for each batch run results will be combined and eventually a biomvRCNS-class object will be returned.
See the vignette for more details and examples.
Value
A biomvRCNS-class object:
x:
Object of class "GRanges", with range information either from real positional data or just indices, with input data matrix stored in the meta columns.
Additional meta columns for the estimated states and associated probabilities for each sample or group will also be appended following the input data matrix.
res:
Object of class "GRanges" , each range represent one continuous segment identified, with sample name slot 'SAMPLE', estimated state slot 'STATE' and segment mean slot 'MEAN' stored in the meta columns
param:
Object of class "list", list of all parameters used in the model run, plus the re-estimated emission and sojourn parameters.
Author(s)
Yang Du
References
Guedon, Y. (2003). Estimating hidden semi-Markov chains from discrete sequences. Journal of Computational and Graphical Statistics, 12(3), 604-639.
See Also
biomvRseg
Examples
data(coriell)
xgr<-GRanges(seqnames=paste('chr', coriell[,2], sep=''), IRanges(start=coriell[,3], width=1, names=coriell[,1]))
values(xgr)<-DataFrame(coriell[,4:5], row.names=NULL)
xgr<-sort(xgr)
reshsmm<-biomvRhsmm(x=xgr, maxbp=4E4, J=3, soj.type='gamma', emis.type='norm', grp=c(1,2))
## access model parameters
reshsmm@param$soj.par
reshsmm@param$emis.par
## states assigned and associated probabilities
mcols(reshsmm@x)[,-(1:2)]
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(biomvRCNS)
Loading required package: IRanges
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: Gviz
Loading required package: grid
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/biomvRCNS/biomvRhsmm.Rd_%03d_medium.png", width=480, height=480)
> ### Name: biomvRhsmm
> ### Title: Estimating the most likely state sequence using Hidden Semi
> ### Markov Model
> ### Aliases: biomvRhsmm
> ### Keywords: hsmm
>
> ### ** Examples
>
> data(coriell)
> xgr<-GRanges(seqnames=paste('chr', coriell[,2], sep=''), IRanges(start=coriell[,3], width=1, names=coriell[,1]))
> values(xgr)<-DataFrame(coriell[,4:5], row.names=NULL)
> xgr<-sort(xgr)
> reshsmm<-biomvRhsmm(x=xgr, maxbp=4E4, J=3, soj.type='gamma', emis.type='norm', grp=c(1,2))
Checking input ...
Loading required package: cluster
xAnno is not present or not supported, try to use maxbp/maxk in the uniform prior for the sojourn distribution.
Estimating common emission prior ...
Preparing sojourn prior for seq 'chr1' ...
Preparing sojourn prior for seq 'chr2' ...
Building HSMM for seq 'chr2' in column '1' ...
Building HSMM for seq 'chr1' in column '1' ...
[ hsmmRun ] seq 'chr2' column 'Coriell.05296' iteration: 1
[ hsmmRun ] seq 'chr1' column 'Coriell.05296' iteration: 1
Building HSMM for seq 'chr2' in column '2' ...
[ hsmmRun ] seq 'chr2' column 'Coriell.13330' iteration: 1
Building HSMM for seq 'chr1' in column '2' ...
[ hsmmRun ] seq 'chr1' column 'Coriell.13330' iteration: 1
Preparing sojourn prior for seq 'chr4' ...
Building HSMM for seq 'chr4' in column '1' ...
[ hsmmRun ] seq 'chr4' column 'Coriell.05296' iteration: 1
Preparing sojourn prior for seq 'chr3' ...
Building HSMM for seq 'chr3' in column '1' ...
[ hsmmRun ] seq 'chr3' column 'Coriell.05296' iteration: 1
Building HSMM for seq 'chr3' in column '2' ...
Building HSMM for seq 'chr4' in column '2' ...
[ hsmmRun ] seq 'chr3' column 'Coriell.13330' iteration: 1
[ hsmmRun ] seq 'chr4' column 'Coriell.13330' iteration: 1
Preparing sojourn prior for seq 'chr5' ...
Building HSMM for seq 'chr5' in column '1' ...
[ hsmmRun ] seq 'chr5' column 'Coriell.05296' iteration: 1
Preparing sojourn prior for seq 'chr6' ...
Building HSMM for seq 'chr6' in column '1' ...
[ hsmmRun ] seq 'chr6' column 'Coriell.05296' iteration: 1
Building HSMM for seq 'chr5' in column '2' ...
[ hsmmRun ] seq 'chr5' column 'Coriell.13330' iteration: 1
Building HSMM for seq 'chr6' in column '2' ...
[ hsmmRun ] seq 'chr6' column 'Coriell.13330' iteration: 1
Preparing sojourn prior for seq 'chr7' ...
Preparing sojourn prior for seq 'chr8' ...
Building HSMM for seq 'chr7' in column '1' ...
[ hsmmRun ] seq 'chr7' column 'Coriell.05296' iteration: 1
Building HSMM for seq 'chr8' in column '1' ...
[ hsmmRun ] seq 'chr8' column 'Coriell.05296' iteration: 1
Building HSMM for seq 'chr7' in column '2' ...
[ hsmmRun ] seq 'chr7' column 'Coriell.13330' iteration: 1
Building HSMM for seq 'chr8' in column '2' ...
[ hsmmRun ] seq 'chr8' column 'Coriell.13330' iteration: 1
Preparing sojourn prior for seq 'chr9' ...
Preparing sojourn prior for seq 'chr10' ...
Building HSMM for seq 'chr9' in column '1' ...
[ hsmmRun ] seq 'chr9' column 'Coriell.05296' iteration: 1
Building HSMM for seq 'chr10' in column '1' ...
[ hsmmRun ] seq 'chr10' column 'Coriell.05296' iteration: 1
Building HSMM for seq 'chr9' in column '2' ...
[ hsmmRun ] seq 'chr9' column 'Coriell.13330' iteration: 1
Building HSMM for seq 'chr10' in column '2' ...
[ hsmmRun ] seq 'chr10' column 'Coriell.13330' iteration: 1
Preparing sojourn prior for seq 'chr11' ...
Preparing sojourn prior for seq 'chr12' ...
Building HSMM for seq 'chr11' in column '1' ...
[ hsmmRun ] seq 'chr11' column 'Coriell.05296' iteration: 1
Building HSMM for seq 'chr12' in column '1' ...
[ hsmmRun ] seq 'chr12' column 'Coriell.05296' iteration: 1
Building HSMM for seq 'chr12' in column '2' ...
[ hsmmRun ] seq 'chr12' column 'Coriell.13330' iteration: 1
Building HSMM for seq 'chr11' in column '2' ...
[ hsmmRun ] seq 'chr11' column 'Coriell.13330' iteration: 1
Preparing sojourn prior for seq 'chr14' ...
Building HSMM for seq 'chr14' in column '1' ...
[ hsmmRun ] seq 'chr14' column 'Coriell.05296' iteration: 1
Preparing sojourn prior for seq 'chr13' ...
Building HSMM for seq 'chr13' in column '1' ...
[ hsmmRun ] seq 'chr13' column 'Coriell.05296' iteration: 1
Building HSMM for seq 'chr14' in column '2' ...
[ hsmmRun ] seq 'chr14' column 'Coriell.13330' iteration: 1
Building HSMM for seq 'chr13' in column '2' ...
[ hsmmRun ] seq 'chr13' column 'Coriell.13330' iteration: 1
Preparing sojourn prior for seq 'chr16' ...
Building HSMM for seq 'chr16' in column '1' ...
Preparing sojourn prior for seq 'chr15' ...
[ hsmmRun ] seq 'chr16' column 'Coriell.05296' iteration: 1
Building HSMM for seq 'chr15' in column '1' ...
[ hsmmRun ] seq 'chr15' column 'Coriell.05296' iteration: 1
Building HSMM for seq 'chr16' in column '2' ...
[ hsmmRun ] seq 'chr16' column 'Coriell.13330' iteration: 1
Building HSMM for seq 'chr15' in column '2' ...
[ hsmmRun ] seq 'chr15' column 'Coriell.13330' iteration: 1
Preparing sojourn prior for seq 'chr18' ...
Building HSMM for seq 'chr18' in column '1' ...
Preparing sojourn prior for seq 'chr17' ...
[ hsmmRun ] seq 'chr18' column 'Coriell.05296' iteration: 1
Building HSMM for seq 'chr17' in column '1' ...
[ hsmmRun ] seq 'chr17' column 'Coriell.05296' iteration: 1
Building HSMM for seq 'chr18' in column '2' ...
[ hsmmRun ] seq 'chr18' column 'Coriell.13330' iteration: 1
Building HSMM for seq 'chr17' in column '2' ...
[ hsmmRun ] seq 'chr17' column 'Coriell.13330' iteration: 1
Preparing sojourn prior for seq 'chr20' ...
Preparing sojourn prior for seq 'chr19' ...
Building HSMM for seq 'chr20' in column '1' ...
[ hsmmRun ] seq 'chr20' column 'Coriell.05296' iteration: 1
Building HSMM for seq 'chr19' in column '1' ...
[ hsmmRun ] seq 'chr19' column 'Coriell.05296' iteration: 1
Building HSMM for seq 'chr19' in column '2' ...
Building HSMM for seq 'chr20' in column '2' ...
[ hsmmRun ] seq 'chr19' column 'Coriell.13330' iteration: 1
[ hsmmRun ] seq 'chr20' column 'Coriell.13330' iteration: 1
Preparing sojourn prior for seq 'chr21' ...
Building HSMM for seq 'chr21' in column '1' ...
[ hsmmRun ] seq 'chr21' column 'Coriell.05296' iteration: 1
Preparing sojourn prior for seq 'chr22' ...
Building HSMM for seq 'chr22' in column '1' ...
[ hsmmRun ] seq 'chr22' column 'Coriell.05296' iteration: 1
Building HSMM for seq 'chr21' in column '2' ...
[ hsmmRun ] seq 'chr21' column 'Coriell.13330' iteration: 1
Building HSMM for seq 'chr22' in column '2' ...
[ hsmmRun ] seq 'chr22' column 'Coriell.13330' iteration: 1
Preparing sojourn prior for seq 'chr23' ...
Building HSMM for seq 'chr23' in column '1' ...
[ hsmmRun ] seq 'chr23' column 'Coriell.05296' iteration: 1
Building HSMM for seq 'chr23' in column '2' ...
[ hsmmRun ] seq 'chr23' column 'Coriell.13330' iteration: 1
Building HSMM complete, preparing output ...
>
> ## access model parameters
> reshsmm@param$soj.par
Coriell.05296 Coriell.13330
chr1 List,2 List,2
chr2 List,2 List,2
chr3 List,2 List,2
chr4 List,2 List,2
chr5 List,2 List,2
chr6 List,2 List,2
chr7 List,2 List,2
chr8 List,2 List,2
chr9 List,2 List,2
chr10 List,2 List,2
chr11 List,2 List,2
chr12 List,2 List,2
chr13 List,2 List,2
chr14 List,2 List,2
chr15 List,2 List,2
chr16 List,2 List,2
chr17 List,2 List,2
chr18 List,2 List,2
chr19 List,2 List,2
chr20 List,2 List,2
chr21 List,2 List,2
chr22 List,2 List,2
chr23 List,2 List,2
> reshsmm@param$emis.par
Coriell.05296 Coriell.13330
chr1 List,2 List,2
chr2 List,2 List,2
chr3 List,2 List,2
chr4 List,2 List,2
chr5 List,2 List,2
chr6 List,2 List,2
chr7 List,2 List,2
chr8 List,2 List,2
chr9 List,2 List,2
chr10 List,2 List,2
chr11 List,2 List,2
chr12 List,2 List,2
chr13 List,2 List,2
chr14 List,2 List,2
chr15 List,2 List,2
chr16 List,2 List,2
chr17 List,2 List,2
chr18 List,2 List,2
chr19 List,2 List,2
chr20 List,2 List,2
chr21 List,2 List,2
chr22 List,2 List,2
chr23 List,2 List,2
>
> ## states assigned and associated probabilities
> mcols(reshsmm@x)[,-(1:2)]
DataFrame with 2271 rows and 4 columns
S.Coriell.05296 SP.Coriell.05296 S.Coriell.13330 SP.Coriell.13330
<Rle> <Rle> <Rle> <Rle>
1 2 0.9982371 2 0.9978316
2 2 0.9998178 2 0.9993645
3 2 0.9994201 2 0.9994550
4 2 0.9807228 2 0.9927570
5 2 0.9976871 2 0.5681629
... ... ... ... ...
2267 3 0.9942007 3 0.9999999
2268 3 0.9900335 3 0.9993025
2269 1 1.0000000 1 0.8685671
2270 2 0.7985585 3 0.9999825
2271 1 1.0000000 3 0.9998342
>
>
>
>
>
>
> dev.off()
null device
1
>