Last data update: 2014.03.03

R: Create simulated cross-classification data

poLCA.simdata

R Documentation

Create simulated cross-classification data

Description

Uses the latent class model's assumed data-generating process to create a simulated dataset that can be used to test the properties of the poLCA latent class and latent class regression estimator.
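The data-generating process referred to here can be sketched in a few lines of base R: each individual is first assigned a latent class according to the mixing proportions, and each manifest variable is then drawn from the class-conditional outcome probabilities for that class. This is only an illustrative sketch without covariates or missing values; the helper `sim_lca_sketch` and the objects `probs_demo` and `sim_demo` are hypothetical names, not part of poLCA, and the package's own implementation may differ.

```r
# Hypothetical helper (not part of poLCA): simulate from a basic latent class model.
# P     : vector of mixing proportions, length nclass, summing to 1
# probs : list of nclass-by-nresp matrices, one per manifest variable
sim_lca_sketch <- function(N, P, probs) {
  nclass <- length(P)
  # Step 1: draw each individual's latent class from the mixing proportions.
  trueclass <- sample.int(nclass, size = N, replace = TRUE, prob = P)
  # Step 2: draw each manifest variable from the row of class-conditional
  # probabilities that matches the individual's class.
  dat <- lapply(probs, function(pr) {
    vapply(trueclass,
           function(k) sample.int(ncol(pr), size = 1, prob = pr[k, ]),
           integer(1))
  })
  names(dat) <- paste0("Y", seq_along(probs))
  list(dat = as.data.frame(dat), trueclass = trueclass)
}

set.seed(1)
probs_demo <- list(matrix(c(0.9, 0.1,
                            0.2, 0.8), ncol = 2, byrow = TRUE),
                   matrix(c(0.7, 0.2, 0.1,
                            0.1, 0.2, 0.7), ncol = 3, byrow = TRUE))
sim_demo <- sim_lca_sketch(N = 200, P = c(0.4, 0.6), probs = probs_demo)
head(sim_demo$dat)
```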

Usage

poLCA.simdata(N = 5000, probs = NULL, nclass = 2, ndv = 4, 
              nresp = NULL, x = NULL, niv = 0, b = NULL, 
              P = NULL, missval = FALSE, pctmiss = NULL)

Arguments

`N`	number of observations.
`probs`	a list of matrices of dimension `nclass` by `nresp`, with each matrix corresponding to one manifest variable and each row containing the class-conditional outcome probabilities (which must sum to 1). If `probs` is `NULL` (default), then the outcome probabilities are generated randomly.
`nclass`	number of latent classes. If `probs` is specified, then `nclass` is set equal to the number of rows in each matrix in that list. If `P` is specified, then `nclass` is set equal to the length of that vector. If `b` is specified, then `nclass` is set equal to one greater than the number of columns in `b`. Otherwise, the default is two.
`ndv`	number of manifest variables. If `probs` is specified, then `ndv` is set equal to the number of matrices in that list. If `nresp` is specified, then `ndv` is set equal to the length of that vector. Otherwise, the default is four.
`nresp`	number of possible outcomes for each manifest variable. If `probs` is specified, then `nresp` is set equal to the number of columns in each matrix in that list. If both `probs` and `nresp` are `NULL` (default), then the manifest variables are assigned a random number of outcomes between two and five.
`x`	a matrix of concomitant variables with `N` rows and `niv` columns. If `x=NULL` (default), but `niv>0`, then `niv` concomitant variables will be generated as mutually independent random draws from a standard normal distribution.
`niv`	number of concomitant variables (covariates). Setting `niv=0` (default) creates a data set assuming no covariates. If `nclass=1` then `niv` is automatically set equal to 0. If both `x` and `niv` are entered, then the number of columns in `x` overrides the value of `niv`. The number of rows in `b`, less one, also overrides `niv`.
`b`	when using covariates, an `niv+1` by `nclass-1` matrix of (multinomial) logit coefficients. If `b` is `NULL` (default), then coefficients are generated as random integers between -2 and 2.
`P`	a vector of mixing proportions (class population shares) of length `nclass`. `P` must sum to 1. Disregarded if `b` is specified or `niv>0` because then `P` is, in part, a function of the concomitant variables. If `P` is `NULL` (default), then the mixing proportions are generated randomly.
`missval`	logical. If `TRUE` then a fraction `pctmiss` of the manifest variables are randomly dropped as missing values. Default is `FALSE`.
`pctmiss`	percentage of values to be dropped as missing, if `missval=TRUE`. If `pctmiss` is `NULL` (default), then a value between 5 and 40 percent is chosen randomly.
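To see why `P` stops being a free input once covariates enter, it may help to sketch how multinomial logit coefficients such as `b` map covariates into individual-specific mixing proportions. The sketch below assumes the usual parameterization in which the first class is the reference category with its linear predictor fixed at zero and the first row of `b` holds the intercepts; that convention, and the names `prior_from_covariates`, `x_demo`, `b_demo`, and `prior_demo`, are assumptions for illustration rather than a description of poLCA's internals.

```r
# Hypothetical helper (not part of poLCA's documented interface).
# x : N-by-niv matrix of covariates
# b : (niv+1)-by-(nclass-1) matrix of logit coefficients; first row = intercepts
prior_from_covariates <- function(x, b) {
  X <- cbind(1, x)              # prepend the intercept column
  eta <- cbind(0, X %*% b)      # assumed: reference class has linear predictor 0
  expeta <- exp(eta)
  expeta / rowSums(expeta)      # each row: class probabilities for one individual
}

set.seed(2)
x_demo <- matrix(rnorm(5 * 3), nrow = 5, ncol = 3)  # niv = 3
b_demo <- matrix(c(1, -2, 1, -1))                   # 4-by-1, so nclass = 2
prior_demo <- prior_from_covariates(x_demo, b_demo)
prior_demo           # 5-by-2 matrix of individual-specific mixing proportions
rowSums(prior_demo)  # each row sums to 1
```

Because every row of the result depends on that individual's covariate values, a single user-supplied vector of class shares cannot describe the population.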

Details

Note that entering `probs` overrides `nclass`, `ndv`, and `nresp`. It also overrides `P` if the length of the `P` vector is not equal to the length of the `probs` list. Likewise, if `probs=NULL`, then `length(nresp)` overrides `ndv` and `length(P)` overrides `nclass`. Setting `niv>0` causes any user-entered value of `P` to be disregarded.
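Since a user-supplied `probs` list determines `nclass`, `ndv`, and `nresp`, it can be worth checking what dimensions a list implies before passing it in. The following is a minimal base R sketch; `dims_from_probs` and `probs_chk` are hypothetical names and the checks are illustrative, not poLCA's own validation.

```r
# Hypothetical helper (not part of poLCA): report the dimensions implied by probs.
dims_from_probs <- function(probs) {
  nclass <- unique(vapply(probs, nrow, integer(1)))
  stopifnot(length(nclass) == 1)  # all matrices need the same number of rows
  # every row of class-conditional probabilities should sum to 1
  rows_ok <- vapply(probs,
                    function(pr) all(abs(rowSums(pr) - 1) < 1e-8),
                    logical(1))
  stopifnot(all(rows_ok))
  list(nclass = nclass,                        # rows per matrix
       ndv    = length(probs),                 # one matrix per manifest variable
       nresp  = vapply(probs, ncol, integer(1)))  # columns of each matrix
}

probs_chk <- list(matrix(c(0.6,0.1,0.3, 0.6,0.3,0.1, 0.3,0.1,0.6), ncol = 3, byrow = TRUE),
                  matrix(c(0.2,0.8,     0.7,0.3,     0.3,0.7),     ncol = 2, byrow = TRUE))
dims_from_probs(probs_chk)
```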

Value

`dat`	a data frame containing the simulated variables. Variable names for manifest variables are Y1, Y2, etc. Variable names for concomitant variables are X1, X2, etc.
`probs`	a list of matrices of dimension `nclass` by `nresp` containing the class-conditional response probabilities.
`nresp`	a vector containing the number of possible outcomes for each manifest variable.
`b`	coefficients on covariates, if used.
`P`	mixing proportions corresponding to each latent class.
`pctmiss`	percent of observations missing.
`trueclass`	`N` by 1 vector containing the "true" class membership for each individual.
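As a rough illustration of what dropping a share of manifest values means, the base R sketch below sets a given fraction of cells in a data frame to `NA`. It assumes cells are chosen uniformly at random without replacement and that the share is supplied as a fraction (0.10 for 10 percent); poLCA's internal procedure may differ, and `drop_cells_sketch`, `toy`, and `toy_miss` are hypothetical names.

```r
# Hypothetical helper (not poLCA's internal code): blank out a share of cells.
drop_cells_sketch <- function(dat, pctmiss) {
  m <- as.matrix(dat)
  n_drop <- round(pctmiss * length(m))      # number of cells to set missing
  m[sample.int(length(m), n_drop)] <- NA    # assumed: uniform, without replacement
  as.data.frame(m)
}

set.seed(3)
toy <- data.frame(Y1 = sample(1:3, 50, replace = TRUE),
                  Y2 = sample(1:2, 50, replace = TRUE))
toy_miss <- drop_cells_sketch(toy, pctmiss = 0.10)
mean(is.na(toy_miss))  # realized share of missing cells
```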

Examples

# Load the package that provides poLCA() and poLCA.simdata().
library(poLCA)

# Create a sample data set with 3 classes and no covariates,
# and run poLCA to recover the specified parameters.
# Each matrix in the probs list contains one of the manifest variables'
# "true" conditional response probabilities.

probs <- list(matrix(c(0.6,0.1,0.3,     0.6,0.3,0.1,     0.3,0.1,0.6    ),ncol=3,byrow=TRUE), # Y1
              matrix(c(0.2,0.8,         0.7,0.3,         0.3,0.7        ),ncol=2,byrow=TRUE), # Y2
              matrix(c(0.3,0.6,0.1,     0.1,0.3,0.6,     0.3,0.6,0.1    ),ncol=3,byrow=TRUE), # Y3
              matrix(c(0.1,0.1,0.5,0.3, 0.5,0.3,0.1,0.1, 0.3,0.1,0.1,0.5),ncol=4,byrow=TRUE), # Y4
              matrix(c(0.1,0.1,0.8,     0.1,0.8,0.1,     0.8,0.1,0.1    ),ncol=3,byrow=TRUE)) # Y5
simdat <- poLCA.simdata(N=1000,probs,P=c(0.2,0.3,0.5))
f1 <- cbind(Y1,Y2,Y3,Y4,Y5)~1
lc1 <- poLCA(f1,simdat$dat,nclass=3)
table(lc1$predclass,simdat$trueclass)

# Create a sample dataset with 2 classes and three covariates.
# Then compare predicted class memberships when the model is 
# estimated "correctly" with covariates to when it is estimated
# "incorrectly" without covariates.

simdat2 <- poLCA.simdata(N=1000,ndv=7,niv=3,nclass=2,b=matrix(c(1,-2,1,-1)))
f2a <- cbind(Y1,Y2,Y3,Y4,Y5,Y6,Y7)~X1+X2+X3
lc2a <- poLCA(f2a,simdat2$dat,nclass=2)
f2b <- cbind(Y1,Y2,Y3,Y4,Y5,Y6,Y7)~1
lc2b <- poLCA(f2b,simdat2$dat,nclass=2)
table(lc2a$predclass,lc2b$predclass)
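
When reading cross-tabulations of class assignments, keep in mind that the class numbers assigned by an estimator are arbitrary, so a good fit can appear off the diagonal. The base R sketch below computes the share of matching assignments under the best relabeling of the first vector. The helper `best_agreement` and the toy vectors are hypothetical and made up purely for illustration; enumerating every permutation is only practical for a small number of classes.

```r
# Hypothetical helper: agreement between two class assignments, maximized
# over all relabelings of `pred` (practical only for small nclass).
best_agreement <- function(pred, truth, nclass) {
  tab <- table(factor(pred, levels = 1:nclass),
               factor(truth, levels = 1:nclass))
  # recursively list every permutation of a vector
  perms <- function(v) {
    if (length(v) <= 1) return(list(v))
    out <- list()
    for (i in seq_along(v)) {
      for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
    }
    out
  }
  # for each relabeling p, count cases where pred label i matches truth label p[i]
  hits <- vapply(perms(1:nclass),
                 function(p) sum(tab[cbind(1:nclass, p)]),
                 numeric(1))
  max(hits) / length(pred)
}

# Toy vectors, invented for illustration only.
pred_toy  <- c(2, 2, 1, 1, 3, 3, 3, 1)
truth_toy <- c(1, 1, 2, 2, 3, 3, 3, 3)
best_agreement(pred_toy, truth_toy, nclass = 3)
```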