input data matrix or vector, ordered wrt. position
xid
name of the sample
xRange
a IRanges/GRanges object, same length as x rows
soj
a list object containing the relevant sojourn distribution parameters
emis
a list object containing the relevant emission distribution parameters
cMethod
C algorithm used for the most likely state sequence, 'F-B' or 'Viterbi'
maxit
max iteration of the EM run with Forward-Backward algorithm
maxgap
max distance between neighbouring feature to consider a split
tol
tolerance level of the likelihood change to terminate the EM run
avg.m
method to calculate average value for each segment, 'median' or 'mean' possibly trimmed
trim
the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.
na.rm
TRUE if NA value should be ignored
com.emis
whether to set a common emission prior across different seqnames. if TRUE, the emission will not be updated during individual runs.
Details
The function fits a Hidden-semi Markov model for the input data matrix / vector, which should contains ordered data from a continuous region on one chromosome.
The model will start with flat prior for the initial state probability and transition probability, while emission parameter for each state will be estimated using different quantiles of the input controlled by argument q.alpha and r.var.
Argument for the sojourn density should be provided via the list object soj, which is either initialized as flat prior or estimated from other data in a previous call.
The positional information in the xRange is used for the optional spiting of physically distant features and construction of returning GRanges object res.
Value
a list object,
yhat
a "Rle" object, the most likely state sequence, same length as x rows number
yp
"Rle", the associated state probability, same length as x rows number
res
Object of class "GRanges" , each range represent one continuous segment identified, with sample name slot 'SAMPLE', estimated state slot 'STATE' and segment mean slot 'MEAN' stored in the meta columns
Author(s)
Yang Du
References
Guedon, Y. (2003). Estimating hidden semi-Markov chains from discrete sequences. Journal of Computational and Graphical Statistics, 12(3), 604-639.
See Also
biomvRhsmm
Examples
data(coriell)
# select only chr1
x<-coriell[coriell[,2]==1,]
xgr<-GRanges(seqnames=paste('chr', x[,2], sep=''), IRanges(start=x[,3], width=1, names=x[,1]))
values(xgr)<-DataFrame(x[,4:5], row.names=NULL)
xgr<-xgr[order(xgr)]
J<-2 ; maxk<-50
# a uniform initial sojourn, not utilizing positional information, just the index
soj<-list(J=J, maxk=maxk, type='gamma', d=cbind(dunif(1:maxk, 1, maxk), dunif(1:maxk, 1, maxk)))
soj$D <- sapply(1:J, function(j) rev(cumsum(rev(soj$d[1:maxk,j]))))
# run 1 sample only, Coriell.13330
sample<-colnames(coriell)[5]
runout<-hsmmRun(matrix(values(xgr)[,sample]), sample, xgr, soj, emis=list(type='norm', mu=range(x[,4:5]), var=rep(var(unlist(x[,4:5])), J)))
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(biomvRCNS)
Loading required package: IRanges
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: Gviz
Loading required package: grid
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/biomvRCNS/hsmmRun.Rd_%03d_medium.png", width=480, height=480)
> ### Name: hsmmRun
> ### Title: Estimating the most likely state sequence using Hidden Semi
> ### Markov Model
> ### Aliases: hsmmRun
> ### Keywords: hsmm
>
> ### ** Examples
>
> data(coriell)
> # select only chr1
> x<-coriell[coriell[,2]==1,]
> xgr<-GRanges(seqnames=paste('chr', x[,2], sep=''), IRanges(start=x[,3], width=1, names=x[,1]))
> values(xgr)<-DataFrame(x[,4:5], row.names=NULL)
> xgr<-xgr[order(xgr)]
>
> J<-2 ; maxk<-50
> # a uniform initial sojourn, not utilizing positional information, just the index
> soj<-list(J=J, maxk=maxk, type='gamma', d=cbind(dunif(1:maxk, 1, maxk), dunif(1:maxk, 1, maxk)))
> soj$D <- sapply(1:J, function(j) rev(cumsum(rev(soj$d[1:maxk,j]))))
> # run 1 sample only, Coriell.13330
> sample<-colnames(coriell)[5]
> runout<-hsmmRun(matrix(values(xgr)[,sample]), sample, xgr, soj, emis=list(type='norm', mu=range(x[,4:5]), var=rep(var(unlist(x[,4:5])), J)))
[ hsmmRun ] seq 'chr1' column 'Coriell.13330' iteration: 1
>
>
>
>
>
> dev.off()
null device
1
>