R Graphical Manual

Last data update: 2014.03.03

model.frame.Formula

R Documentation

Model Frame/Matrix/Response Construction for Extended Formulas

Description

Computation of model frames, model matrices, and model responses for extended formulas of class Formula.

Usage

## S3 method for class 'Formula'
model.frame(formula, data = NULL, ...,
  lhs = NULL, rhs = NULL, dot = "separate")
## S3 method for class 'Formula'
model.matrix(object, data = environment(object), ...,
  lhs = NULL, rhs = 1, dot = "separate")
## S3 method for class 'Formula'
terms(x, ...,
  lhs = NULL, rhs = NULL, dot = "separate")

model.part(object, ...)
## S3 method for class 'Formula'
model.part(object, data, lhs = 0, rhs = 0,
  drop = FALSE, terms = FALSE, dot = NULL, ...)

Arguments

`formula, object, x`	an object of class `Formula`.
`data`	a data.frame, list or environment containing the variables in `formula`. For `model.part`, it is normally the model frame returned by `model.frame` (see Details for formulas containing a dot).
`lhs, rhs`	indexes specifying which parts of the left- and right-hand side, respectively, should be employed. `NULL` corresponds to all parts, `0` to none. At least one `lhs` or `rhs` part has to be selected.
`dot`	character specifying how to process formula parts with a dot (`.`) on the right-hand side. This can either be `"separate"` so that each formula part is expanded separately or `"sequential"` so that the parts are expanded sequentially conditional on all prior parts.
`drop`	logical. Should the `data.frame` property be dropped (returning a vector) when the result has only a single column?
`terms`	logical. Should the `"terms"` attribute (corresponding to the `model.part` extracted) be added?
`...`	further arguments passed to the respective `formula` methods.
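
The selection arguments can be illustrated with a short sketch that rebuilds the artificial data from the Examples section. It assumes the Formula package is installed; the object names `rhs1`, `rhs2`, `rhs2_vec` and `y1_log` are illustrative only.

```r
library("Formula")

## Artificial data, constructed exactly as in the Examples section
set.seed(1090)
dat <- as.data.frame(matrix(round(runif(21), digits = 2), ncol = 7))
colnames(dat) <- c("y1", "y2", "y3", "x1", "x2", "x3", "x4")
for (i in c(2, 6:7)) dat[[i]] <- factor(dat[[i]] > 0.5, labels = c("a", "b"))
dat$y2[1] <- NA

## One response, two RHS parts
F1 <- Formula(log(y1) ~ x1 + x2 | I(x1^2))
mf1 <- model.frame(F1, data = dat)

## rhs = 1 (with the default lhs = 0): variables of the first RHS part only
rhs1 <- model.part(F1, data = mf1, rhs = 1)

## rhs = 2: the second RHS part, still a one-column data frame (drop = FALSE)
rhs2 <- model.part(F1, data = mf1, rhs = 2)

## drop = TRUE: the single column is returned without the data.frame wrapper
rhs2_vec <- model.part(F1, data = mf1, rhs = 2, drop = TRUE)

## lhs = 1 (with the default rhs = 0): the response only
y1_log <- model.part(F1, data = mf1, lhs = 1)

str(rhs1)
str(rhs2_vec)
```

Because `lhs` and `rhs` both default to `0` in `model.part`, supplying only `rhs` returns regressors without the response, and supplying only `lhs` returns the response without regressors.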

Details

All three model computations leverage the corresponding standard methods. Additionally, they allow specification of the part(s) of the left- and right-hand side (LHS and RHS) that should be included in the computation.

The idea underlying all three model computations is to extract a suitable formula from the more general Formula and then call the standard model.frame, model.matrix, and terms methods.
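
To see which standard formula is handed on, one can inspect it directly. This is a sketch that assumes the `formula` method for `Formula` objects with its `rhs` and `collapse` arguments (part of the Formula package, but not documented on this page); `tt_all` and `tt_rhs2` are illustrative names.

```r
library("Formula")

F1 <- Formula(log(y1) ~ x1 + x2 | I(x1^2))

## The extracted standard formulas: all RHS parts collapsed, or one part only
formula(F1, collapse = TRUE)
formula(F1, rhs = 2)

## terms() without a dot needs no data; rhs = NULL selects all RHS parts
tt_all <- terms(F1)
tt_rhs2 <- terms(F1, rhs = 2)

attr(tt_all, "term.labels")
attr(tt_rhs2, "term.labels")

## With a single response, the response stays on the LHS
attr(tt_all, "response")
```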

More specifically, if the Formula has multiple parts on the RHS, they are collapsed, essentially replacing | by +. If there is only a single response on the LHS, it is kept on the LHS. Otherwise, all parts of the formula are collapsed on the RHS (because formula objects cannot have multiple responses). Hence, for multi-response Formula objects, the (non-generic) model.response does not give the correct results.

To avoid confusion, a new generic model.part with a suitable Formula method is provided, which can always be used instead of model.response. Note, however, that it has a different syntax: it requires the Formula object in addition to the already processed model.frame supplied in data (and optionally the lhs). Also, it returns a data.frame, with multiple columns if multiple responses are selected; if only a single column is selected and drop = TRUE, the data.frame property is dropped and a vector is returned.
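
A minimal sketch of the multi-response case, reusing `F2` and the artificial data from the Examples section (the Formula package is assumed to be installed; the `resp*` names are illustrative). Each response part is taken by index with `model.part`, and `drop = TRUE` turns a single-column result into a vector.

```r
library("Formula")

## Artificial data, constructed exactly as in the Examples section
set.seed(1090)
dat <- as.data.frame(matrix(round(runif(21), digits = 2), ncol = 7))
colnames(dat) <- c("y1", "y2", "y3", "x1", "x2", "x3", "x4")
for (i in c(2, 6:7)) dat[[i]] <- factor(dat[[i]] > 0.5, labels = c("a", "b"))
dat$y2[1] <- NA

## Two LHS parts and three RHS parts
F2 <- Formula(y1 + y2 | log(y3) ~ x1 + I(x2^2) | 0 + log(x1) | x3 / x4)
length(F2)

## With the default na.action (na.omit), row 1 is dropped because y2 is NA
mf2 <- model.frame(F2, data = dat)

## Extract each response part by index instead of using model.response()
resp1 <- model.part(F2, data = mf2, lhs = 1)                   # y1 and y2
resp2 <- model.part(F2, data = mf2, lhs = 2)                   # one-column data frame
resp2_vec <- model.part(F2, data = mf2, lhs = 2, drop = TRUE)  # plain vector
```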

If the formula contains one or more dots (.), some care has to be taken to process these correctly, especially if the LHS contains transformations (such as log, sqrt, cbind, Surv, etc.). Calling the terms method with the original data (untransformed, if any transformations are used) resolves all dots (by default separately for each part, otherwise sequentially) and also includes the original and the updated formula as part of the terms. When calling model.part, supply either the original untransformed data along with a dot specification, or the transformed model.frame from the same formula without another dot specification (in which case the dot processing is inferred from the terms of the model.frame).
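
The two dot-processing strategies can be compared on the model frames alone, relying on `model.part` to infer the dot handling from the stored terms. This is a sketch under the same assumptions as the Examples section (Formula package installed); `mf_sep`, `mf_seq`, `vars_sep` and `vars_seq` are illustrative names.

```r
library("Formula")

## Artificial data, constructed exactly as in the Examples section
set.seed(1090)
dat <- as.data.frame(matrix(round(runif(21), digits = 2), ncol = 7))
colnames(dat) <- c("y1", "y2", "y3", "x1", "x2", "x3", "x4")
for (i in c(2, 6:7)) dat[[i]] <- factor(dat[[i]] > 0.5, labels = c("a", "b"))
dat$y2[1] <- NA

## Three responses (one transformed) and two RHS parts with a dot each
F3 <- Formula(y1 | y2 | log(y3) ~ . - x3 - x4 | .)

## Resolve the dots once, while building the model frame ...
mf_sep <- model.frame(F3, data = dat, dot = "separate")
mf_seq <- model.frame(F3, data = dat, dot = "sequential")

## ... and give model.part() no further dot specification
vars_sep <- names(model.part(F3, data = mf_sep, rhs = 2))
vars_seq <- names(model.part(F3, data = mf_seq, rhs = 2))
vars_sep
vars_seq
```

With separate processing, the second dot expands to all regressors; with sequential processing, it expands only to those not already used by the first RHS part.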

References

Zeileis A, Croissant Y (2010). Extended Model Formulas in R: Multiple Parts and Multiple Responses. Journal of Statistical Software, 34(1), 1–13. http://www.jstatsoft.org/v34/i01/.

Examples

## artificial example data
set.seed(1090)
dat <- as.data.frame(matrix(round(runif(21), digits = 2), ncol = 7))
colnames(dat) <- c("y1", "y2", "y3", "x1", "x2", "x3", "x4")
for(i in c(2, 6:7)) dat[[i]] <- factor(dat[[i]] > 0.5, labels = c("a", "b"))
dat$y2[1] <- NA
dat

######################################
## single response and two-part RHS ##
######################################

## single response with two-part RHS
F1 <- Formula(log(y1) ~ x1 + x2 | I(x1^2))
length(F1)

## set up model frame
mf1 <- model.frame(F1, data = dat)
mf1

## extract single response
model.part(F1, data = mf1, lhs = 1, drop = TRUE)
model.response(mf1)
## model.response() works as usual

## extract model matrices
model.matrix(F1, data = mf1, rhs = 1)
model.matrix(F1, data = mf1, rhs = 2)

#########################################
## multiple responses and multiple RHS ##
#########################################

## set up Formula
F2 <- Formula(y1 + y2 | log(y3) ~ x1 + I(x2^2) | 0 + log(x1) | x3 / x4)
length(F2)

## set up full model frame
mf2 <- model.frame(F2, data = dat)
mf2

## extract responses
model.part(F2, data = mf2, lhs = 1)
model.part(F2, data = mf2, lhs = 2)
## model.response(mf2) does not give correct results!

## extract model matrices
model.matrix(F2, data = mf2, rhs = 1)
model.matrix(F2, data = mf2, rhs = 2)
model.matrix(F2, data = mf2, rhs = 3)

#######################
## Formulas with '.' ##
#######################

## set up Formula with a single '.'
F3 <- Formula(y1 | y2 ~ .)
mf3 <- model.frame(F3, data = dat)
## without y1 or y2
model.matrix(F3, data = mf3)
## without y1 but with y2
model.matrix(F3, data = mf3, lhs = 1)
## without y2 but with y1
model.matrix(F3, data = mf3, lhs = 2)

## set up Formula with multiple '.'
F3 <- Formula(y1 | y2 | log(y3) ~ . - x3 - x4 | .)
## process both '.' separately (default)
mf3 <- model.frame(F3, data = dat, dot = "separate")
## only x1-x2
model.part(F3, data = mf3, rhs = 1)
## all x1-x4
model.part(F3, data = mf3, rhs = 2)
## process the '.' sequentially, i.e., the second RHS conditional on the first
mf3 <- model.frame(F3, data = dat, dot = "sequential")
## only x1-x2
model.part(F3, data = mf3, rhs = 1)
## only x3-x4
model.part(F3, data = mf3, rhs = 2)

##############################
## Process multiple offsets ##
##############################

## set up Formula
F4 <- Formula(y1 ~ x3 + offset(x1) | x4 + offset(log(x2)))
mf4 <- model.frame(F4, data = dat)
## model.part can be applied as above and includes offset!
model.part(F4, data = mf4, rhs = 1)
## additionally, the corresponding terms can be included
model.part(F4, data = mf4, rhs = 1, terms = TRUE)
## hence model.offset() can be applied to extract offsets
model.offset(model.part(F4, data = mf4, rhs = 1, terms = TRUE))
model.offset(model.part(F4, data = mf4, rhs = 2, terms = TRUE))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(Formula)
> ### Name: model.frame.Formula
> ### Title: Model Frame/Matrix/Response Construction for Extended Formulas
> ### Aliases: terms.Formula model.matrix.Formula model.frame.Formula
> ###   model.part model.part.formula model.part.Formula
> ### Keywords: models
> 
> ### ** Examples
> 
> ## artificial example data
> set.seed(1090)
> dat <- as.data.frame(matrix(round(runif(21), digits = 2), ncol = 7))
> colnames(dat) <- c("y1", "y2", "y3", "x1", "x2", "x3", "x4")
> for(i in c(2, 6:7)) dat[[i]] <- factor(dat[[i]] > 0.5, labels = c("a", "b"))
> dat$y2[1] <- NA
> dat
    y1   y2   y3   x1   x2 x3 x4
1 0.82 <NA> 0.27 0.09 0.22  b  a
2 0.70    b 0.17 0.26 0.46  a  a
3 0.65    a 0.28 0.03 0.37  b  b
> 
> ######################################
> ## single response and two-part RHS ##
> ######################################
> 
> ## single response with two-part RHS
> F1 <- Formula(log(y1) ~ x1 + x2 | I(x1^2))
> length(F1)
[1] 1 2
> 
> ## set up model frame
> mf1 <- model.frame(F1, data = dat)
> mf1
     log(y1)   x1   x2 I(x1^2)
1 -0.1984509 0.09 0.22  0.0081
2 -0.3566749 0.26 0.46  0.0676
3 -0.4307829 0.03 0.37   9e-04
> 
> ## extract single response
> model.part(F1, data = mf1, lhs = 1, drop = TRUE)
         1          2          3 
-0.1984509 -0.3566749 -0.4307829 
> model.response(mf1)
         1          2          3 
-0.1984509 -0.3566749 -0.4307829 
> ## model.response() works as usual
> 
> ## extract model matrices
> model.matrix(F1, data = mf1, rhs = 1)
  (Intercept)   x1   x2
1           1 0.09 0.22
2           1 0.26 0.46
3           1 0.03 0.37
attr(,"assign")
[1] 0 1 2
> model.matrix(F1, data = mf1, rhs = 2)
  (Intercept) I(x1^2)
1           1  0.0081
2           1  0.0676
3           1  0.0009
attr(,"assign")
[1] 0 1
> 
> #########################################
> ## multiple responses and multiple RHS ##
> #########################################
> 
> ## set up Formula
> F2 <- Formula(y1 + y2 | log(y3) ~ x1 + I(x2^2) | 0 + log(x1) | x3 / x4)
> length(F2)
[1] 2 3
> 
> ## set up full model frame
> mf2 <- model.frame(F2, data = dat)
> mf2
    y1 y2   log(y3)   x1 I(x2^2)   log(x1) x3 x4
2 0.70  b -1.771957 0.26  0.2116 -1.347074  a  a
3 0.65  a -1.272966 0.03  0.1369 -3.506558  b  b
> 
> ## extract responses
> model.part(F2, data = mf2, lhs = 1)
    y1 y2
2 0.70  b
3 0.65  a
> model.part(F2, data = mf2, lhs = 2)
    log(y3)
2 -1.771957
3 -1.272966
> ## model.response(mf2) does not give correct results!
> 
> ## extract model matrices
> model.matrix(F2, data = mf2, rhs = 1)
  (Intercept)   x1 I(x2^2)
2           1 0.26  0.2116
3           1 0.03  0.1369
attr(,"assign")
[1] 0 1 2
> model.matrix(F2, data = mf2, rhs = 2)
    log(x1)
2 -1.347074
3 -3.506558
attr(,"assign")
[1] 1
> model.matrix(F2, data = mf2, rhs = 3)
  (Intercept) x3b x3a:x4b x3b:x4b
2           1   0       0       0
3           1   1       0       1
attr(,"assign")
[1] 0 1 2 2
attr(,"contrasts")
attr(,"contrasts")$x3
[1] "contr.treatment"

attr(,"contrasts")$x4
[1] "contr.treatment"

> 
> #######################
> ## Formulas with '.' ##
> #######################
> 
> ## set up Formula with a single '.'
> F3 <- Formula(y1 | y2 ~ .)
> mf3 <- model.frame(F3, data = dat)
> ## without y1 or y2
> model.matrix(F3, data = mf3)
  (Intercept)   y3   x1   x2 x3b x4b
2           1 0.17 0.26 0.46   0   0
3           1 0.28 0.03 0.37   1   1
attr(,"assign")
[1] 0 1 2 3 4 5
attr(,"contrasts")
attr(,"contrasts")$x3
[1] "contr.treatment"

attr(,"contrasts")$x4
[1] "contr.treatment"

> ## without y1 but with y2
> model.matrix(F3, data = mf3, lhs = 1)
  (Intercept) y2b   y3   x1   x2 x3b x4b
2           1   1 0.17 0.26 0.46   0   0
3           1   0 0.28 0.03 0.37   1   1
attr(,"assign")
[1] 0 1 2 3 4 5 6
attr(,"contrasts")
attr(,"contrasts")$y2
[1] "contr.treatment"

attr(,"contrasts")$x3
[1] "contr.treatment"

attr(,"contrasts")$x4
[1] "contr.treatment"

> ## without y2 but with y1
> model.matrix(F3, data = mf3, lhs = 2)
  (Intercept)   y1   y3   x1   x2 x3b x4b
2           1 0.70 0.17 0.26 0.46   0   0
3           1 0.65 0.28 0.03 0.37   1   1
attr(,"assign")
[1] 0 1 2 3 4 5 6
attr(,"contrasts")
attr(,"contrasts")$x3
[1] "contr.treatment"

attr(,"contrasts")$x4
[1] "contr.treatment"

> 
> ## set up Formula with multiple '.'
> F3 <- Formula(y1 | y2 | log(y3) ~ . - x3 - x4 | .)
> ## process both '.' separately (default)
> mf3 <- model.frame(F3, data = dat, dot = "separate")
> ## only x1-x2
> model.part(F3, data = mf3, rhs = 1)
    x1   x2
2 0.26 0.46
3 0.03 0.37
> ## all x1-x4
> model.part(F3, data = mf3, rhs = 2)
    x1   x2 x3 x4
2 0.26 0.46  a  a
3 0.03 0.37  b  b
> ## process the '.' sequentially, i.e., the second RHS conditional on the first
> mf3 <- model.frame(F3, data = dat, dot = "sequential")
> ## only x1-x2
> model.part(F3, data = mf3, rhs = 1)
    x1   x2
2 0.26 0.46
3 0.03 0.37
> ## only x3-x4
> model.part(F3, data = mf3, rhs = 2)
  x3 x4
2  a  a
3  b  b
> 
> ##############################
> ## Process multiple offsets ##
> ##############################
> 
> ## set up Formula
> F4 <- Formula(y1 ~ x3 + offset(x1) | x4 + offset(log(x2)))
> mf4 <- model.frame(F4, data = dat)
> ## model.part can be applied as above and includes offset!
> model.part(F4, data = mf4, rhs = 1)
  x3 offset(x1)
1  b       0.09
2  a       0.26
3  b       0.03
> ## additionally, the corresponding terms can be included
> model.part(F4, data = mf4, rhs = 1, terms = TRUE)
  x3 offset(x1)
1  b       0.09
2  a       0.26
3  b       0.03
> ## hence model.offset() can be applied to extract offsets
> model.offset(model.part(F4, data = mf4, rhs = 1, terms = TRUE))
[1] 0.09 0.26 0.03
> model.offset(model.part(F4, data = mf4, rhs = 2, terms = TRUE))
[1] -1.5141277 -0.7765288 -0.9942523
> 