R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Estimating the most likely state sequence using Hidden Semi...

hsmmRun

R Documentation

Estimating the most likely state sequence using Hidden Semi Markov Model

Description

This is the working horse of the biomvRhsmm

Usage

hsmmRun(x, xid="sampleid", xRange, soj, emis, cMethod='F-B', maxit=1, maxgap=Inf, tol=1e-06, avg.m='median', trim=0, na.rm=TRUE, com.emis=FALSE)

Arguments

`x`	input data matrix or vector, ordered wrt. position
`xid`	name of the sample
`xRange`	a `IRanges`/`GRanges` object, same length as `x` rows
`soj`	a list object containing the relevant sojourn distribution parameters
`emis`	a list object containing the relevant emission distribution parameters
`cMethod`	C algorithm used for the most likely state sequence, 'F-B' or 'Viterbi'
`maxit`	max iteration of the EM run with Forward-Backward algorithm
`maxgap`	max distance between neighbouring feature to consider a split
`tol`	tolerance level of the likelihood change to terminate the EM run
`avg.m`	method to calculate average value for each segment, 'median' or 'mean' possibly trimmed
`trim`	the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.
`na.rm`	`TRUE` if `NA` value should be ignored
`com.emis`	whether to set a common emission prior across different seqnames. if TRUE, the emission will not be updated during individual runs.

Details

The function fits a Hidden-semi Markov model for the input data matrix / vector, which should contains ordered data from a continuous region on one chromosome. The model will start with flat prior for the initial state probability and transition probability, while emission parameter for each state will be estimated using different quantiles of the input controlled by argument q.alpha and r.var. Argument for the sojourn density should be provided via the list object soj, which is either initialized as flat prior or estimated from other data in a previous call. The positional information in the xRange is used for the optional spiting of physically distant features and construction of returning GRanges object res.

Value

a list object,

`yhat`	a `"Rle"` object, the most likely state sequence, same length as `x` rows number
`yp`	`"Rle"`, the associated state probability, same length as `x` rows number
`res`	Object of class `"GRanges"` , each range represent one continuous segment identified, with sample name slot 'SAMPLE', estimated state slot 'STATE' and segment mean slot 'MEAN' stored in the meta columns

Author(s)

Yang Du

References

Guedon, Y. (2003). Estimating hidden semi-Markov chains from discrete sequences. Journal of Computational and Graphical Statistics, 12(3), 604-639.

Examples

	data(coriell)
	# select only chr1 
	x<-coriell[coriell[,2]==1,]
	xgr<-GRanges(seqnames=paste('chr', x[,2], sep=''), IRanges(start=x[,3], width=1, names=x[,1]))
	values(xgr)<-DataFrame(x[,4:5], row.names=NULL)
	xgr<-xgr[order(xgr)]

	J<-2 ; maxk<-50
	# a uniform initial sojourn, not utilizing positional information, just the index
	soj<-list(J=J, maxk=maxk, type='gamma', d=cbind(dunif(1:maxk, 1, maxk), dunif(1:maxk, 1, maxk)))
	soj$D <- sapply(1:J, function(j) rev(cumsum(rev(soj$d[1:maxk,j]))))
	# run 1 sample only, Coriell.13330
	sample<-colnames(coriell)[5]
	runout<-hsmmRun(matrix(values(xgr)[,sample]), sample, xgr, soj, emis=list(type='norm', mu=range(x[,4:5]), var=rep(var(unlist(x[,4:5])), J)))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(biomvRCNS)
Loading required package: IRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: Gviz
Loading required package: grid
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/biomvRCNS/hsmmRun.Rd_%03d_medium.png", width=480, height=480)
> ### Name: hsmmRun
> ### Title: Estimating the most likely state sequence using Hidden Semi
> ###   Markov Model
> ### Aliases: hsmmRun
> ### Keywords: hsmm
> 
> ### ** Examples
> 
> 	data(coriell)
> 	# select only chr1 
> 	x<-coriell[coriell[,2]==1,]
> 	xgr<-GRanges(seqnames=paste('chr', x[,2], sep=''), IRanges(start=x[,3], width=1, names=x[,1]))
> 	values(xgr)<-DataFrame(x[,4:5], row.names=NULL)
> 	xgr<-xgr[order(xgr)]
> 
> 	J<-2 ; maxk<-50
> 	# a uniform initial sojourn, not utilizing positional information, just the index
> 	soj<-list(J=J, maxk=maxk, type='gamma', d=cbind(dunif(1:maxk, 1, maxk), dunif(1:maxk, 1, maxk)))
> 	soj$D <- sapply(1:J, function(j) rev(cumsum(rev(soj$d[1:maxk,j]))))
> 	# run 1 sample only, Coriell.13330
> 	sample<-colnames(coriell)[5]
> 	runout<-hsmmRun(matrix(values(xgr)[,sample]), sample, xgr, soj, emis=list(type='norm', mu=range(x[,4:5]), var=rep(var(unlist(x[,4:5])), J)))
[ hsmmRun ] seq 'chr1' column 'Coriell.13330' iteration: 1
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>