Last data update: 2014.03.03

blca.boot

R Documentation

Bayesian Latent Class Analysis via an EM Algorithm and Using Empirical Bootstrapping

Description

Latent class analysis (LCA) attempts to find G hidden classes in binary data X. blca.boot repeatedly samples from X with replacement, then uses an EM algorithm to find maximum a posteriori (MAP) estimates of the parameters, together with standard error estimates.

Usage

blca.boot(X, G, alpha = 1, beta = 1, delta = rep(1, G),
          start.vals = c("single", "across"), counts.n = NULL,
          fit = NULL, iter = 50, B = 100, relabel = FALSE,
          verbose = TRUE, verbose.update = 10, small = 1e-100)

Arguments

`X`	The data matrix. This may take one of several forms, see `data.blca`.
`G`	The number of latent classes to fit.
`alpha, beta`	The prior values for the data conditional on group membership. These may take one of several forms: a single value, recycled across all groups and columns; a vector of length G or M (the number of columns in the data); or a G × M matrix specifying each prior value separately. Defaults to 1, i.e., a uniform prior, for each value.
`delta`	Prior values for the mixture components in the model. Defaults to 1, i.e., a uniform prior. May be a single value or a vector of length G.
`start.vals`	Denotes how class membership is to be assigned during the initial step of the algorithm. Two character values may be chosen, "single", which randomly assigns data points exclusively to one class, or "across", which assigns class membership via `runif`. Alternatively, class membership may be pre-specified, either as a vector of class membership, or as a matrix of probabilities. Defaults to "single".
`counts.n`	If data patterns have already been counted, a data matrix consisting of each unique data pattern can be supplied to the function, in addition to a vector counts.n, which supplies the corresponding number of times each pattern occurs in the data.
`fit`	Previously fitted models may be supplied in order to approximate standard error and unbiased point estimates. fit should be an object of class "blca.em". Defaults to NULL if no object is supplied.
`iter`	The maximum number of iterations that the algorithm runs over, for each bootstrapped sample. Will stop earlier if the algorithm converges.
`B`	The number of bootstrap samples to run. Defaults to 100.
`relabel`	Logical valued. Because the data are repeatedly resampled, label switching may occur among the parameter estimates. If TRUE, parameter estimates are checked at each iteration and relabeled if necessary. Defaults to FALSE.
`verbose`	Logical valued. If TRUE, the current number of completed bootstrap samples is printed at regular intervals.
`verbose.update`	If `verbose=TRUE`, `verbose.update` determines the periodicity with which updates are printed.
`small`	To ensure numerical stability a small constant is added to certain parameter estimates. Defaults to 1e-100.
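
As an illustration of the pattern-plus-counts form described for counts.n, the following base R sketch collapses a small binary matrix into its unique rows and their frequencies. The toy data and the variable names (keys, patterns, counts) are hypothetical, and this is only one way to do the tabulation, not necessarily how the package does it internally.

```r
# Toy binary data: 6 observations on 3 items (hypothetical values).
X <- matrix(c(1, 0, 1,
              1, 0, 1,
              0, 1, 0,
              1, 0, 1,
              0, 1, 0,
              1, 1, 1),
            ncol = 3, byrow = TRUE)

# Collapse each row to a string key such as "101" and tabulate the keys.
keys <- apply(X, 1, paste, collapse = "")
tab <- table(keys)

# One row per unique pattern, in the same order as the table of counts.
patterns <- X[match(names(tab), keys), , drop = FALSE]
counts <- as.vector(tab)

patterns
counts
```

Going by the argument description above, patterns would then be supplied as X and counts as counts.n.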

Details

Bootstrapping methods can be used to estimate properties of a distribution's parameters, such as standard errors, by constructing multiple resamples of an observed data set, each obtained by sampling with replacement from that data set. The parameter estimates obtained from these resamples may then be analysed. blca.boot implements this by first running blca.em over the full data set and then using the returned item and class probabilities as the initial values when running the algorithm on each bootstrapped sample. Alternatively, initial parameter estimates may be specified using the fit argument.
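
The resampling idea can be sketched in base R on a much simpler estimator than a latent class model. The example below is an assumption-laden illustration, not the blca.boot implementation: it bootstraps the proportion of ones in a single simulated binary item and compares the spread of the resampled estimates with the usual analytic standard error. The seed, sample size and number of resamples are arbitrary choices.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical binary responses to a single item.
x <- rbinom(500, size = 1, prob = 0.3)

B <- 200  # number of bootstrap resamples
boot_est <- numeric(B)
for (b in seq_len(B)) {
  # Resample the observed data with replacement, same size as the original.
  resample <- sample(x, size = length(x), replace = TRUE)
  boot_est[b] <- mean(resample)
}

# The spread of the resampled estimates approximates the standard error.
boot_se <- sd(boot_est)
analytic_se <- sqrt(mean(x) * (1 - mean(x)) / length(x))
c(bootstrap = boot_se, analytic = analytic_se)
```

In blca.boot the statistic recomputed on each resample is the full set of MAP estimates rather than a single mean, but the logic of resample, re-estimate, then summarise is the same.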

Note that if a previously fitted model is supplied, then the prior values with which the model was fitted will be used for the sampling run, regardless of the values supplied to the prior arguments.
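
Because each bootstrapped sample is fitted separately, the class labels of one fit need not line up with those of another, which is the situation the relabel argument addresses. The sketch below shows one simple way such a check could work: try every ordering of the estimated classes and keep the one closest to a reference matrix. The helpers perms and relabel_to_reference are hypothetical and are not part of BayesLCA; the package's own relabelling method may differ.

```r
# All orderings of the elements of v, returned as a list (fine for small G).
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Row ordering of `est` that minimises squared distance to `ref`.
relabel_to_reference <- function(est, ref) {
  candidates <- perms(seq_len(nrow(ref)))
  cost <- vapply(candidates,
                 function(p) sum((est[p, , drop = FALSE] - ref)^2),
                 numeric(1))
  candidates[[which.min(cost)]]
}

# Reference item probabilities (G = 2 classes, M = 4 items).
ref <- rbind(c(0.8, 0.8, 0.2, 0.2),
             c(0.2, 0.2, 0.8, 0.8))
# A hypothetical bootstrap estimate in which the two classes have swapped.
est <- rbind(c(0.25, 0.18, 0.79, 0.81),
             c(0.78, 0.83, 0.21, 0.17))

ord <- relabel_to_reference(est, ref)
est[ord, ]  # rows reordered to match the reference labelling
```

Exhaustive search over orderings grows factorially with G, so this is only practical for a handful of classes.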

Value

A list of class "blca.boot" is returned, containing:

`call`	The initial call passed to the function.
`itemprob`	The item probabilities, conditional on class membership.
`classprob`	The class probabilities.
`Z`	Estimate of class membership for each unique datapoint.
`itemprob.sd`	Posterior standard deviation estimates of the item probabilities.
`classprob.sd`	Posterior standard deviation estimates of the class probabilities.
`classprob.initial, itemprob.initial`	Initial parameter values for classprob and itemprob, used to run over each bootstrapped sample.
`samples`	A list containing the parameter estimates for each bootstrapped sample.
`logpost`	The log-posterior of the estimated model.
`BIC`	The Bayesian Information Criterion for the estimated model.
`AIC`	Akaike's Information Criterion for the estimated model.
`label`	Logical value, indicating whether label switching has been checked for.
`counts`	The number of times each unique data point occurred.
`prior`	A list containing the prior values specified for the model.
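
As a rough arithmetic check on the information criteria, the values printed by summary(fit.boot) in the Results section are consistent with AIC = 2 logpost - 2p and BIC = 2 logpost - p log(N), where p counts the free parameters. This form is inferred from the printed output rather than taken from the package source, so treat it as an assumption; under this sign convention larger values indicate a better trade-off.

```r
logpost <- -2453.509   # Log-Posterior as printed by summary(fit.boot)
N <- 1000              # observations simulated by rlca(1000, ...)
G <- 2                 # number of classes
M <- 4                 # number of items (columns)

# Assumed parameter count: G*M item probabilities plus G-1 free class weights.
p <- G * M + (G - 1)

aic <- 2 * logpost - 2 * p
bic <- 2 * logpost - p * log(N)
round(c(AIC = aic, BIC = bic), 3)
```

With these inputs the sketch reproduces the printed AIC of -4925.018 and BIC of -4969.188.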

Note

Earlier versions of this function erroneously referred to posterior standard deviations as standard errors. This also extended to arguments supplied to, and values returned by, the function, some of which are now returned with the corrected suffix .sd (for standard deviation). For backwards compatibility, the earlier suffix .se has been retained in the returned values.

Author(s)

Arthur White

References

Wasserman, L, 22nd May 2007, All of Nonparametric Statistics, Springer-Verlag.

Examples

type1 <- c(0.8, 0.8, 0.2, 0.2)
type2 <- c(0.2, 0.2, 0.8, 0.8)
x <- rlca(1000, rbind(type1,type2), c(0.6,0.4))
fit.boot <- blca.boot(x, 2)
summary(fit.boot)

fit <- blca.em(x, 2, se=FALSE)
fit.boot <- blca.boot(x, 2, fit=fit)
fit.boot
plot(fit.boot, which=1:4)
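
One way the returned estimates and standard deviations might be combined is a rough two-standard-deviation band around each item probability. The sketch below uses the Group 1 values copied from the printed fit.boot output in the Results section so that it runs without the package; with a fitted object the corresponding inputs would be the itemprob and itemprob.sd components listed under Value. The band is a crude normal approximation, not something the package reports.

```r
# Group 1 values copied from the printed fit.boot output in the Results section.
itemprob    <- c("Col 1" = 0.809, "Col 2" = 0.837, "Col 3" = 0.186, "Col 4" = 0.168)
itemprob_sd <- c("Col 1" = 0.022, "Col 2" = 0.020, "Col 3" = 0.019, "Col 4" = 0.021)

# Rough two-standard-deviation band, clipped to the unit interval.
lower <- pmax(itemprob - 2 * itemprob_sd, 0)
upper <- pmin(itemprob + 2 * itemprob_sd, 1)
rbind(lower, estimate = itemprob, upper)

# Generating values for this class in the simulation (type1 in the example).
truth <- c(0.8, 0.8, 0.2, 0.2)
all(lower <= truth & truth <= upper)
```

For this particular run, each band contains the value used to simulate the data.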

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(BayesLCA)
Loading required package: e1071
Loading required package: coda
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/BayesLCA/blca.boot.Rd_%03d_medium.png", width=480, height=480)
> ### Name: blca.boot
> ### Title: Bayesian Latent Class Analysis via an EM Algorithm and Using
> ###   Empirical Bootstrapping
> ### Aliases: blca.boot
> ### Keywords: bootstrap blca
> 
> ### ** Examples
> 
> type1 <- c(0.8, 0.8, 0.2, 0.2)
> type2 <- c(0.2, 0.2, 0.8, 0.8)
> x <- rlca(1000, rbind(type1,type2), c(0.6,0.4))
> fit.boot <- blca.boot(x, 2)
Object 'fit' not supplied. Obtaining starting values via blca.em...
Restart number 1, logpost = -2453.41... 
New maximum found... Restart number 2, logpost = -2453.41... 
New maximum found... Restart number 3, logpost = -2453.41... 
Restart number 4, logpost = -2453.41... 
Restart number 5, logpost = -2453.41... 
Starting values obtained...
Beginning bootstrapping run...
10 of 100 samples completed...
20 of 100 samples completed...
30 of 100 samples completed...
40 of 100 samples completed...
50 of 100 samples completed...
60 of 100 samples completed...
70 of 100 samples completed...
80 of 100 samples completed...
90 of 100 samples completed...
100 of 100 samples completed...
Bootstrap sampling run completed.
> summary(fit.boot)
__________________

Bayes-LCA
Diagnostic Summary
__________________

Hyper-Parameters: 

 Item Probabilities:

 alpha: 
        Col 1 Col 2 Col 3 Col 4
Group 1     1     1     1     1
Group 2     1     1     1     1

 beta: 
        Col 1 Col 2 Col 3 Col 4
Group 1     1     1     1     1
Group 2     1     1     1     1

 Class Probabilities:

 delta: 
Group 1 Group 2 
      1       1 
__________________

Method: Bootstrap  

 Number of Samples: 100 

 Log-Posterior: -2453.509 

 AIC: -4925.018 

 BIC: -4969.188 
> 
> fit <- blca.em(x, 2, se=FALSE)
Restart number 1, logpost = -2453.41... 
Restart number 2, logpost = -2453.41... 
Restart number 3, logpost = -2453.41... 
Restart number 4, logpost = -2453.41... 
Restart number 5, logpost = -2453.41... 
> fit.boot <- blca.boot(x, 2, fit=fit)
Beginning bootstrapping run...
10 of 100 samples completed...
20 of 100 samples completed...
30 of 100 samples completed...
40 of 100 samples completed...
50 of 100 samples completed...
60 of 100 samples completed...
70 of 100 samples completed...
80 of 100 samples completed...
90 of 100 samples completed...
100 of 100 samples completed...
Bootstrap sampling run completed.
> fit.boot

MAP Estimates:
 

Item Probabilities:
 
        Col 1 Col 2 Col 3 Col 4
Group 1 0.809 0.837 0.186 0.168
Group 2 0.245 0.204 0.779 0.795

Membership Probabilities:
 
Group 1 Group 2 
  0.551   0.449 

Posterior Standard Deviation Estimates:
 

Item Probabilities:
 
        Col 1 Col 2 Col 3 Col 4
Group 1 0.022 0.020 0.019 0.021
Group 2 0.024 0.024 0.026 0.024

Membership Probabilities:
 
Group 1 Group 2 
  0.022   0.022 
> plot(fit.boot, which=1:4)
> 
> dev.off()
null device 
          1 
>