grouped is used to fit regression models for grouped or coarse data under the assumption
that the data are Coarsened At Random.
Usage
grouped(formula, link = c("identity", "log", "logit"),
distribution = c("normal", "t", "logistic"), data,
subset, na.action, str.values, df = NULL, iter = 3, ...)
Arguments
formula
a two-sided formula describing the model structure. The left-hand side must be a two-column response
matrix giving the lower and upper limits (1st and 2nd column, respectively) of the interval in which
the true response lies. These limits can be defined arbitrarily or constructed with the functions
equispaced and rounding.
link
the link function under which the underlying response variable follows the distribution given by the
distribution argument. Available choices are "identity", "log" and "logit".
See Details for more info.
distribution
the assumed distribution for the true latent response variable. Available choices are
"normal", "t" and "logistic". See Details for more info.
data
an optional data.frame containing the variables in the model. If not found in data, the variables
are taken from environment(formula), typically the environment from which grouped is
called.
subset
an optional vector specifying a subset of observations to be used in the fitting process.
na.action
a function which indicates what should happen when the data contain NAs.
str.values
a numeric vector of starting values.
df
a scalar numeric value denoting the degrees of freedom when the underlying distribution for the response
variable is assumed to be Student's t.
iter
the number of extra times to call optim in case the first optimization has not
converged.
...
additional arguments; currently none are used.
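The restart behaviour described for iter can be pictured with a small base-R sketch. The helper fit_with_restarts is hypothetical, the choice of method = "BFGS" is an assumption, and this is not the package's internal code: it simply re-runs optim from the last estimate while optim reports non-convergence (convergence != 0), for at most iter extra calls.

```r
# Sketch only: a hypothetical helper illustrating the idea behind `iter`,
# not the code used inside grouped().
fit_with_restarts <- function(par, fn, iter = 3, ...) {
  opt <- optim(par, fn, method = "BFGS", ...)
  k <- 1
  # optim() signals successful convergence with convergence == 0
  while (opt$convergence != 0 && k <= iter) {
    opt <- optim(opt$par, fn, method = "BFGS", ...)  # restart from the last estimate
    k <- k + 1
  }
  opt$k <- k  # total number of optim() calls made (hypothetical counter)
  opt
}

# Toy objective: a quadratic with its minimum at (1, -2)
f <- function(p) (p[1] - 1)^2 + (p[2] + 2)^2

# Deliberately tiny iteration budget so that restarts may be needed
res <- fit_with_restarts(c(0, 0), f, control = list(maxit = 2))
res$par
res$k
```

Passing control = list(maxit = 2) through ... starves each call of iterations, so the loop may restart; with optim's default budget the first call normally converges and no restart happens.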
Details
Let Z_i, i = 1, ..., n be a random sample from a response variable of interest. In many
problems one can think of the sample space S_i of Z_i as being partitioned into a number of groups; one
then observes not the exact value of Z_i but the group into which it falls. Data generated in this way are called
grouped (Heitjan, 1989). The function grouped, and this package as a whole, are devoted to the analysis of such data
in the case where the data are Coarsened At Random (Heitjan and Rubin, 1991).
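To make the idea of grouping concrete, here is a minimal base-R sketch that coarsens simulated latent values into fixed groups and keeps only the group limits. The cut points and variable names are hypothetical; the result is a two-column lower/upper matrix of the kind the formula argument expects on its left-hand side.

```r
# Sketch only: simulated latent values and hypothetical cut points.
set.seed(123)
z <- rnorm(10, mean = 50, sd = 10)          # latent values Z_i (never observed)

# The sample space is partitioned at these (hypothetical) cut points
breaks <- c(-Inf, 30, 40, 50, 60, 70, Inf)

# findInterval() gives the index g of the group with breaks[g] <= z < breaks[g + 1]
g <- findInterval(z, breaks)

# Only the group limits are recorded: lower (1st column) and upper (2nd column)
Y <- cbind(lower = breaks[g], upper = breaks[g + 1])
Y
```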
The framework we use assumes a latent variable Z_i which is coarsely measured and for which we only know
Y_{li} and Y_{ui}, i.e., the interval in which Z_i lies. Given some covariates X_i,
Z_i|X_i may assume either a Normal, a Logistic or (generalized) Student's t distribution. In addition, three
link functions are available for greater flexibility. In particular, the likelihood is of the following form

$$L(\beta, \sigma) = \prod_{i=1}^{n} \left\{ F(y_{ui}^*) - F(y_{li}^*) \right\},$$

where F(.) denotes the cdf of the assumed distribution given by the argument distribution and
y_{li}^* = {φ(y_{li}) - X_i^T β}/σ, where φ(.) denotes the link function,
and y_{ui}^* is defined analogously.
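The likelihood above can be written down directly in base R. This is a sketch under stated assumptions, not the package's implementation: the lookup lists links and cdfs are hypothetical, σ is optimized on the log scale to keep it positive, and starting values come from a least-squares fit to the interval midpoints.

```r
# Sketch only: hypothetical lookup tables named after the link/distribution choices
links <- list(identity = function(y) y,
              log      = function(y) log(y),
              logit    = function(y) qlogis(y))        # log(y / (1 - y))
cdfs  <- list(normal   = function(q, df) pnorm(q),
              t        = function(q, df) pt(q, df = df),
              logistic = function(q, df) plogis(q))

# Negative log-likelihood: -sum log{F(y_u*) - F(y_l*)}, y* = {phi(y) - X beta} / sigma
negloglik <- function(theta, Y, X, link = "identity", distribution = "normal", df = NULL) {
  p     <- ncol(X)
  beta  <- theta[1:p]
  sigma <- exp(theta[p + 1])                 # log-scale parametrization keeps sigma > 0
  phi   <- links[[link]]
  Fcdf  <- cdfs[[distribution]]
  eta   <- drop(X %*% beta)
  pu    <- Fcdf((phi(Y[, 2]) - eta) / sigma, df)
  pl    <- Fcdf((phi(Y[, 1]) - eta) / sigma, df)
  -sum(log(pmax(pu - pl, 1e-300)))           # guard against log(0) from underflow
}

# Simulated example: normal latent response, identity link, unit-width groups
set.seed(1)
n <- 200
x <- rnorm(n)
z <- 1 + 2 * x + rnorm(n, sd = 1.5)
Y <- cbind(floor(z), floor(z) + 1)
X <- cbind(1, x)

fit0  <- lm(rowMeans(Y) ~ x)                 # crude start from interval midpoints
start <- c(coef(fit0), log(sd(residuals(fit0))))
opt   <- optim(start, negloglik, Y = Y, X = X, method = "BFGS")
round(c(opt$par[1:2], sigma = exp(opt$par[3])), 2)
```

With 200 simulated observations the estimates should land near the generating values (1, 2, 1.5). Using distribution = "t" in this sketch requires supplying df, mirroring the df argument of grouped.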
Quality-of-life indices are an interesting example of coarse data: the observed value of such an index can
be thought of as a rounded version of the true latent quality of life that the index attempts to capture.
Applications of this approach can be found in Lesaffre et al. (2007) and Tsonaka et al. (2006). Various other
examples of grouped and coarse data can be found in Heitjan (1989; 1993).
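A minimal base-R sketch of the quality-of-life case, assuming a hypothetical index bounded on [0, 1] and reported to one decimal place: each reported value is treated as the interval of latent values that round to it, and a logit link maps those limits onto the real line, where the limits 0 and 1 become -Inf and Inf.

```r
# Sketch only: hypothetical bounded scores and an assumed rounding rule.
set.seed(42)
latent   <- plogis(rnorm(8, mean = 0.5, sd = 1))  # true scores in (0, 1)
reported <- round(latent, 1)                       # what the index records

# A reported value r stands for [r - 0.05, r + 0.05], clipped to [0, 1]
lower <- pmax(reported - 0.05, 0)
upper <- pmin(reported + 0.05, 1)

# Interval limits on the logit scale
cbind(reported, logit_lower = qlogis(lower), logit_upper = qlogis(upper))
```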
Value
An object of class grouped: a list with the following components:
coefficients
the estimated coefficients, including the scale parameter σ (the standard deviation when distribution = "normal").
hessian
the approximate Hessian matrix at convergence returned by optim.
fitted
the fitted values.
details
a list with components: (i) X the design matrix, (ii) y the response data matrix,
(iii) convergence the convergence identifier returned by optim, (iv) logLik the
value of the log-likelihood at convergence, (v) k the number of outer iterations used, (vi)
n the sample size, (vii) df the degrees of freedom; NULL except for the t
distribution, (viii) link the link function used, (ix) distribution the distribution
assumed for the true latent response variable and (x) max.sc the maximum absolute value of the
score vector at convergence.
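Two of these quantities can be reproduced for a toy fit. In this sketch (an assumption about how such quantities are typically computed, not the package's code) the score vector is approximated by central finite differences, max.sc is its largest absolute element, and approximate standard errors come from inverting the Hessian that optim returns for the negative log-likelihood.

```r
# Sketch only: toy normal/identity-link fit; not the code used by grouped().
set.seed(2)
n <- 150
x <- rnorm(n)
z <- 0.5 + x + rnorm(n)                     # latent response
Y <- cbind(floor(z), floor(z) + 1)          # unit-width grouping
X <- cbind(1, x)

loglik <- function(theta) {
  eta   <- drop(X %*% theta[1:2])
  sigma <- exp(theta[3])                    # sigma on the log scale
  sum(log(pmax(pnorm((Y[, 2] - eta) / sigma) - pnorm((Y[, 1] - eta) / sigma), 1e-300)))
}

fit0 <- lm(rowMeans(Y) ~ x)
opt  <- optim(c(coef(fit0), log(sd(residuals(fit0)))), function(th) -loglik(th),
              method = "BFGS", hessian = TRUE)

# Score vector by central finite differences
score <- function(theta, h = 1e-5) {
  vapply(seq_along(theta), function(j) {
    e <- replace(numeric(length(theta)), j, h)
    (loglik(theta + e) - loglik(theta - e)) / (2 * h)
  }, numeric(1))
}
max.sc <- max(abs(score(opt$par)))

# opt$hessian is the Hessian of the negative log-likelihood, so its inverse
# approximates the covariance of the estimates (third entry refers to log sigma)
se <- sqrt(diag(solve(opt$hessian)))
```

A max.sc close to zero indicates that the optimizer stopped near a stationary point of the log-likelihood.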
References
Heitjan, D. (1989) Inference from grouped continuous data: A review (with discussion). Statistical Science, 4, 164–183.
Heitjan, D. (1993) Ignorability and coarse data: some biomedical examples. Biometrics, 49, 1099–1109.
Heitjan, D. and Rubin, D. (1991) Ignorability and coarse data. Annals of Statistics, 19, 2244–2253.
Lesaffre, E., Rizopoulos, D. and Tsonaka, S. (2007) The logistic-transform for bounded outcome scores. Biostatistics, 8, 72–85.
Tsonaka, S., Rizopoulos, D. and Lesaffre, E. (2006) Power and sample size calculations for discrete bounded outcomes. Statistics in Medicine, 25, 4241–4252.