Last data update: 2014.03.03

R: Estimating the most likely state sequence using Hidden Semi...
hsmmRunR Documentation

Estimating the most likely state sequence using Hidden Semi Markov Model

Description

This is the working horse of the biomvRhsmm

Usage

hsmmRun(x, xid="sampleid", xRange, soj, emis, cMethod='F-B', maxit=1, maxgap=Inf, tol=1e-06, avg.m='median', trim=0, na.rm=TRUE, com.emis=FALSE)

Arguments

x

input data matrix or vector, ordered wrt. position

xid

name of the sample

xRange

a IRanges/GRanges object, same length as x rows

soj

a list object containing the relevant sojourn distribution parameters

emis

a list object containing the relevant emission distribution parameters

cMethod

C algorithm used for the most likely state sequence, 'F-B' or 'Viterbi'

maxit

max iteration of the EM run with Forward-Backward algorithm

maxgap

max distance between neighbouring feature to consider a split

tol

tolerance level of the likelihood change to terminate the EM run

avg.m

method to calculate average value for each segment, 'median' or 'mean' possibly trimmed

trim

the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.

na.rm

TRUE if NA value should be ignored

com.emis

whether to set a common emission prior across different seqnames. if TRUE, the emission will not be updated during individual runs.

Details

The function fits a Hidden-semi Markov model for the input data matrix / vector, which should contains ordered data from a continuous region on one chromosome. The model will start with flat prior for the initial state probability and transition probability, while emission parameter for each state will be estimated using different quantiles of the input controlled by argument q.alpha and r.var. Argument for the sojourn density should be provided via the list object soj, which is either initialized as flat prior or estimated from other data in a previous call. The positional information in the xRange is used for the optional spiting of physically distant features and construction of returning GRanges object res.

Value

a list object,

yhat

a "Rle" object, the most likely state sequence, same length as x rows number

yp

"Rle", the associated state probability, same length as x rows number

res

Object of class "GRanges" , each range represent one continuous segment identified, with sample name slot 'SAMPLE', estimated state slot 'STATE' and segment mean slot 'MEAN' stored in the meta columns

Author(s)

Yang Du

References

Guedon, Y. (2003). Estimating hidden semi-Markov chains from discrete sequences. Journal of Computational and Graphical Statistics, 12(3), 604-639.

See Also

biomvRhsmm

Examples

	data(coriell)
	# select only chr1 
	x<-coriell[coriell[,2]==1,]
	xgr<-GRanges(seqnames=paste('chr', x[,2], sep=''), IRanges(start=x[,3], width=1, names=x[,1]))
	values(xgr)<-DataFrame(x[,4:5], row.names=NULL)
	xgr<-xgr[order(xgr)]

	J<-2 ; maxk<-50
	# a uniform initial sojourn, not utilizing positional information, just the index
	soj<-list(J=J, maxk=maxk, type='gamma', d=cbind(dunif(1:maxk, 1, maxk), dunif(1:maxk, 1, maxk)))
	soj$D <- sapply(1:J, function(j) rev(cumsum(rev(soj$d[1:maxk,j]))))
	# run 1 sample only, Coriell.13330
	sample<-colnames(coriell)[5]
	runout<-hsmmRun(matrix(values(xgr)[,sample]), sample, xgr, soj, emis=list(type='norm', mu=range(x[,4:5]), var=rep(var(unlist(x[,4:5])), J)))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(biomvRCNS)
Loading required package: IRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: Gviz
Loading required package: grid
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/biomvRCNS/hsmmRun.Rd_%03d_medium.png", width=480, height=480)
> ### Name: hsmmRun
> ### Title: Estimating the most likely state sequence using Hidden Semi
> ###   Markov Model
> ### Aliases: hsmmRun
> ### Keywords: hsmm
> 
> ### ** Examples
> 
> 	data(coriell)
> 	# select only chr1 
> 	x<-coriell[coriell[,2]==1,]
> 	xgr<-GRanges(seqnames=paste('chr', x[,2], sep=''), IRanges(start=x[,3], width=1, names=x[,1]))
> 	values(xgr)<-DataFrame(x[,4:5], row.names=NULL)
> 	xgr<-xgr[order(xgr)]
> 
> 	J<-2 ; maxk<-50
> 	# a uniform initial sojourn, not utilizing positional information, just the index
> 	soj<-list(J=J, maxk=maxk, type='gamma', d=cbind(dunif(1:maxk, 1, maxk), dunif(1:maxk, 1, maxk)))
> 	soj$D <- sapply(1:J, function(j) rev(cumsum(rev(soj$d[1:maxk,j]))))
> 	# run 1 sample only, Coriell.13330
> 	sample<-colnames(coriell)[5]
> 	runout<-hsmmRun(matrix(values(xgr)[,sample]), sample, xgr, soj, emis=list(type='norm', mu=range(x[,4:5]), var=rep(var(unlist(x[,4:5])), J)))
[ hsmmRun ] seq 'chr1' column 'Coriell.13330' iteration: 1
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>