Last data update: 2014.03.03

grouped

R Documentation

Regression for Grouped Data - Coarse Data

Description

`grouped` fits regression models for grouped or coarse data under the assumption that the data are Coarsened At Random.

Usage

grouped(formula, link = c("identity", "log", "logit"), 
            distribution = c("normal", "t", "logistic"), data,
            subset, na.action, str.values, df = NULL, iter = 3, ...)

Arguments

`formula`	a two-sided formula describing the model structure. On the left-hand side, a two-column response matrix must be supplied, specifying the lower and upper limits (1st and 2nd column, respectively) of the interval in which the true response lies. The limits can be defined arbitrarily or constructed with the functions `equispaced` and `rounding`.
`link`	the link function under which the underlying response variable follows the distribution given by the `distribution` argument. Available choices are `"identity"`, `"log"` and `"logit"`. See Details for more info.
`distribution`	the assumed distribution for the true latent response variable. Available choices are `"normal"`, `"t"` and `"logistic"`. See Details for more info.
`data`	an optional `data.frame` containing the variables in the model. If not found in data, the variables are taken from `environment(formula)`, typically the environment from which `grouped` is called.
`subset`	an optional vector specifying a subset of observations to be used in the fitting process.
`na.action`	a function which indicates what should happen when the data contain `NA`s.
`str.values`	a numeric vector of starting values.
`df`	a scalar numeric value denoting the degrees of freedom when the underlying distribution for the response variable is assumed to be Student's-t.
`iter`	the number of extra times to call `optim` in case the first optimization has not converged.
`...`	additional arguments; currently none is used.
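
As a rough illustration of the two-column response that `formula` expects, the base-R sketch below builds lower and upper limits by hand for values recorded to the nearest 0.1. The names `recorded`, `half_width`, `lo`, `up` and `resp` are made up for this example, and the half-unit interval around each recorded value is an assumption about how the rounding happened. It does not use `equispaced` or `rounding`.

```r
# Hypothetical measurements, assumed to be recorded to the nearest 0.1
recorded <- c(0.2, 0.4, 0.3, 0.2, 0.6)

# Assumption: each true value lies within half a rounding unit of the record
half_width <- 0.1 / 2
lo <- recorded - half_width
up <- recorded + half_width

# Two-column matrix: lower limits first, upper limits second
resp <- cbind(lo, up)
print(resp)

# Every interval must be non-degenerate
stopifnot(all(resp[, "lo"] < resp[, "up"]))
```

A matrix built this way could then appear on the left-hand side of the formula, for instance as `cbind(lo, up) ~ 1`.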

Details

Let Z_i, i = 1, ..., n be a random sample from a response variable of interest. In many problems one can think of the sample space S_i of Z_i as being partitioned into a number of groups; one then observes not the exact value of Z_i but the group into which it falls. Data generated in this way are called grouped (Heitjan, 1989). The function `grouped` and this package are devoted to the analysis of such data when the data are Coarsened At Random (Heitjan and Rubin, 1991).

The framework we use assumes a latent variable Z_i which is coarsely measured and for which we only know Y_{li} and Y_{ui}, i.e., the limits of the interval in which Z_i lies. Given some covariates X_i, Z_i|X_i may follow either a Normal, a Logistic or a (generalized) Student's-t distribution. In addition, three link functions are available for greater flexibility. In particular, the likelihood contribution of the i-th observation has the following form:

L_i(β, σ) = F[(y_u^* - xβ)/σ] - F[(y_l^* - xβ)/σ],

where F(.) denotes the cdf of the assumed distribution given by the argument `distribution`, y_l^* = φ(y_l), where φ(.) denotes the link function, and y_u^* is defined analogously.
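
The likelihood above can be sketched in a few lines of base R. This is a minimal illustration for the normal distribution only, not the package's own implementation: the function `interval_loglik`, its `link_fun` argument, the simulated data and the choice to optimise log(σ) are all assumptions made for this example.

```r
# Interval log-likelihood: sum over i of log{ F[(y_u* - x beta)/sigma] - F[(y_l* - x beta)/sigma] }
# with F the standard normal cdf; `link_fun` plays the role of phi(.)
interval_loglik <- function(par, lo, up, X, link_fun = identity) {
  p     <- ncol(X)
  beta  <- par[seq_len(p)]
  sigma <- exp(par[p + 1])              # optimise log(sigma) so sigma stays positive
  eta   <- drop(X %*% beta)
  prob  <- pnorm((link_fun(up) - eta) / sigma) -
           pnorm((link_fun(lo) - eta) / sigma)
  sum(log(pmax(prob, .Machine$double.xmin)))  # guard against log(0)
}

# Simulated data: the latent z is never used in the fit, only its interval
set.seed(1)
n  <- 500
x  <- rnorm(n)
z  <- 1 + 0.5 * x + rnorm(n, sd = 0.8)
lo <- floor(z)                          # coarsen to unit-width intervals
up <- lo + 1
X  <- cbind(1, x)

# Maximise the log-likelihood by minimising its negative
fit <- optim(c(0, 0, 0), function(p) -interval_loglik(p, lo, up, X),
             method = "BFGS")
est <- c(beta = fit$par[1:2], sigma = exp(fit$par[3]))
print(round(est, 2))
```

With this sample size the estimates should land close to the simulated values (1, 0.5 and 0.8). For a log or logit link one would presumably pass `log` or `qlogis` as `link_fun`, assuming φ is the usual log or logit transform.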

Quality of life indexes are an interesting example of coarse data. The observed value of such an index can be thought of as a rounded version of the true latent quality of life that the index attempts to capture. Applications of this approach can be found in Lesaffre et al. (2007) and Tsonaka et al. (2006). Various other examples of grouped and coarse data can be found in Heitjan (1989; 1993).

Value

An object of class `grouped`, which is a list with the following components:

`coefficients`	the estimated coefficients, including the standard deviation σ.
`hessian`	the approximate Hessian matrix at convergence returned by `optim`.
`fitted`	the fitted values.
`details`	a list with components: (i) `X` the design matrix, (ii) `y` the response data matrix, (iii) `convergence` the convergence identifier returned by `optim`, (iv) `logLik` the value of the log-likelihood at convergence, (v) `k` the number of outer iterations used, (vi) `n` the sample size, (vii) `df` the degrees of freedom; `NULL` except for the t distribution, (viii) `link` the link function used, (ix) `distribution` the distribution assumed for the true latent response variable and (x) `max.sc` the maximum absolute value of the score vector at convergence.
`call`	the matched call.

Author(s)

Dimitris Rizopoulos d.rizopoulos@erasmusmc.nl

References

Heitjan, D. (1989) Inference from grouped continuous data: A review (with discussion). Statistical Science, 4, 164–183.

Heitjan, D. (1993) Ignorability and coarse data: some biomedical examples. Biometrics, 49, 1099–1109.

Heitjan, D. and Rubin, D. (1991) Ignorability and coarse data. Annals of Statistics, 19, 2244–2253.

Lesaffre, E., Rizopoulos, D. and Tsonaka, S. (2007) The logistic-transform for bounded outcome scores. Biostatistics, 8, 72–85.

Tsonaka, S., Rizopoulos, D. and Lesaffre, E. (2006) Power and sample size calculations for discrete bounded outcomes. Statistics in Medicine, 25, 4241–4252.

Examples

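library(grouped)

# Interval limits supplied directly as a two-column response
grouped(cbind(lo, up) ~ treat * x, link = "logit", data = Sdata)

# Interval limits built with equispaced()
grouped(equispaced(r, n) ~ x1 * x2, link = "logit", data = Seeds)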

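# See Figure 1 and Table 1 in Heitjan (1989)
y <- iris[iris$Species == "setosa", "Petal.Width"]
# Interval limits: (0.05, 0.15), (0.15, 0.25), ..., (0.55, 0.65)
index <- cbind(seq(0.05, 0.55, 0.1), seq(0.15, 0.65, 0.1))
n <- length(y)
a <- b <- numeric(n)
for (i in seq_len(n)) {
    # first interval whose upper limit exceeds y[i]
    ind <- which(index[, 2] - y[i] > 0)[1]
    a[i] <- index[ind, 1]
    b[i] <- index[ind, 2]
}
# intercept-only model with default link and distribution
summary(grouped(cbind(a, b) ~ 1))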

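# See Figure 1 and Table 1 in Heitjan (1989)
y <- iris[iris$Species == "setosa", "Petal.Length"]
# Interval limits: (0.95, 1.15), (1.15, 1.35), ..., (1.75, 1.95)
index <- cbind(seq(0.95, 1.75, 0.2), seq(1.15, 1.95, 0.2))
n <- length(y)
a <- b <- numeric(n)
for (i in seq_len(n)) {
    # first interval whose upper limit exceeds y[i]
    ind <- which(index[, 2] - y[i] > 0)[1]
    a[i] <- index[ind, 1]
    b[i] <- index[ind, 2]
}
# intercept-only model with default link and distribution
summary(grouped(cbind(a, b) ~ 1))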
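
The interval lookup done by the loops above could also be written without explicit iteration. The sketch below uses base R's `findInterval` on the petal-width data; the names `breaks` and `k` are introduced here for illustration, and it assumes every observation falls strictly inside the range of the breaks.

```r
# Setosa petal widths, as in the first iris example
y <- iris[iris$Species == "setosa", "Petal.Width"]

# Interval boundaries: 0.05, 0.15, ..., 0.65
breaks <- seq(0.05, 0.65, by = 0.1)

# For each y, findInterval() gives the index k with breaks[k] <= y < breaks[k + 1]
k <- findInterval(y, breaks)
a <- breaks[k]       # lower limits
b <- breaks[k + 1]   # upper limits

head(cbind(a, b))
```

The resulting `a` and `b` can be passed to `grouped` in the same way, e.g. `cbind(a, b) ~ 1`.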