Last data update: 2014.03.03

R: Bare-bones linear model fitting function
fastLmR Documentation

Bare-bones linear model fitting function

Description

fastLm estimates the linear model using the solve function of Armadillo linear algebra library.

Usage

fastLmPure(X, y)

fastLm(X, ...)
## Default S3 method:
fastLm(X, y, ...)
## S3 method for class 'formula'
fastLm(formula, data = list(), ...)

Arguments

y

a vector containing the explained variable.

X

a model matrix.

formula

a symbolic description of the model to be fit.

data

an optional data frame containing the variables in the model.

...

not used

Details

Linear models should be estimated using the lm function. In some cases, lm.fit may be appropriate.

The fastLmPure function provides a reference use case of the Armadillo library via the wrapper functions in the RcppArmadillo package.

The fastLm function provides a more standard implementation of a linear model fit, offering both a default and a formula interface as well as print, summary and predict methods.

Lastly, one must be be careful in timing comparisons of lm and friends versus this approach based on Armadillo. The reason that Armadillo can do something like lm.fit faster than the functions in the stats package is because Armadillo uses the Lapack version of the QR decomposition while the stats package uses a modified Linpack version. Hence Armadillo uses level-3 BLAS code whereas the stats package uses level-1 BLAS. However, Armadillo will either fail or, worse, produce completely incorrect answers on rank-deficient model matrices whereas the functions from the stats package will handle them properly due to the modified Linpack code.

An example of the type of situation requiring extra care in checking for rank deficiency is a two-way layout with missing cells (see the examples section). These cases require a special pivoting scheme of “pivot only on (apparent) rank deficiency” which is not part of conventional linear algebra software.

Value

fastLmPure returns a list with three components:

coefficients

a vector of coefficients

stderr

a vector of the (estimated) standard errors of the coefficient estimates

df.residual

a scalar denoting the degrees of freedom in the model

fastLm returns a richer object which also includes the residuals, fitted values and call argument similar to the lm or rlm functions..

Author(s)

Armadillo is written by Conrad Sanderson. RcppArmadillo is written by Romain Francois, Dirk Eddelbuettel and Douglas Bates.

References

Armadillo project: http://arma.sourceforge.net/

See Also

lm, lm.fit

Examples

  data(trees, package="datasets")

  ## bare-bones direct interface
  flm <- fastLmPure( cbind(1, log(trees$Girth)), log(trees$Volume) )
  print(flm)

  ## standard R interface for formula or data returning object of class fastLm
  flmmod <- fastLm( log(Volume) ~ log(Girth), data=trees)
  summary(flmmod)

  ## case where fastLm breaks down
  dd <- data.frame(f1 = gl(4, 6, labels = LETTERS[1:4]),
                   f2 = gl(3, 2, labels = letters[1:3]))[-(7:8), ]
  xtabs(~ f2 + f1, dd)     # one missing cell
  mm <- model.matrix(~ f1 * f2, dd)
  kappa(mm)                # large, indicating rank deficiency
  set.seed(1)
  dd$y <- mm %*% seq_len(ncol(mm)) + rnorm(nrow(mm), sd = 0.1)
  summary(lm(y ~ f1 * f2, dd))     # detects rank deficiency
  summary(fastLm(y ~ f1 * f2, dd)) # some huge coefficients

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(RcppArmadillo)
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/RcppArmadillo/fastLm.Rd_%03d_medium.png", width=480, height=480)
> ### Name: fastLm
> ### Title: Bare-bones linear model fitting function
> ### Aliases: fastLm fastLmPure fastLm.default fastLm.formula
> ### Keywords: regression
> 
> ### ** Examples
> 
>   data(trees, package="datasets")
> 
>   ## bare-bones direct interface
>   flm <- fastLmPure( cbind(1, log(trees$Girth)), log(trees$Volume) )
>   print(flm)
$coefficients
          [,1]
[1,] -2.353325
[2,]  2.199970

$stderr
           [,1]
[1,] 0.23066284
[2,] 0.08983455

$df.residual
[1] 29

> 
>   ## standard R interface for formula or data returning object of class fastLm
>   flmmod <- fastLm( log(Volume) ~ log(Girth), data=trees)
>   summary(flmmod)

Call:
fastLm.formula(formula = log(Volume) ~ log(Girth), data = trees)

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max. 
-0.2060000 -0.0687020  0.0010105  0.0725850  0.2479600 

             Estimate    StdErr t.value   p.value    
(Intercept) -2.353325  0.230663 -10.202  4.18e-11 ***
log(Girth)   2.199970  0.089835  24.489 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.115 on 29 degrees of freedom
Multiple R-squared: 0.9539,	Adjusted R-squared: 0.9523
> 
>   ## case where fastLm breaks down
>   dd <- data.frame(f1 = gl(4, 6, labels = LETTERS[1:4]),
+                    f2 = gl(3, 2, labels = letters[1:3]))[-(7:8), ]
>   xtabs(~ f2 + f1, dd)     # one missing cell
   f1
f2  A B C D
  a 2 0 2 2
  b 2 2 2 2
  c 2 2 2 2
>   mm <- model.matrix(~ f1 * f2, dd)
>   kappa(mm)                # large, indicating rank deficiency
[1] 3.295702e+16
>   set.seed(1)
>   dd$y <- mm %*% seq_len(ncol(mm)) + rnorm(nrow(mm), sd = 0.1)
>   summary(lm(y ~ f1 * f2, dd))     # detects rank deficiency

Call:
lm(formula = y ~ f1 * f2, data = dd)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.12155 -0.04702  0.00000  0.04702  0.12155 

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.97786    0.05816   16.81 3.41e-09 ***
f1B         12.03807    0.08226  146.35  < 2e-16 ***
f1C          3.11722    0.08226   37.90 5.22e-13 ***
f1D          4.06852    0.08226   49.46 2.83e-14 ***
f2b          5.06012    0.08226   61.52 2.59e-15 ***
f2c          5.99759    0.08226   72.91 4.01e-16 ***
f1B:f2b     -3.01476    0.11633  -25.92 3.27e-11 ***
f1C:f2b      7.70300    0.11633   66.22 1.16e-15 ***
f1D:f2b      8.96425    0.11633   77.06  < 2e-16 ***
f1B:f2c           NA         NA      NA       NA    
f1C:f2c     10.96133    0.11633   94.23  < 2e-16 ***
f1D:f2c     12.04108    0.11633  103.51  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.08226 on 11 degrees of freedom
Multiple R-squared:  0.9999,	Adjusted R-squared:  0.9999 
F-statistic: 1.855e+04 on 10 and 11 DF,  p-value: < 2.2e-16

>   summary(fastLm(y ~ f1 * f2, dd)) # some huge coefficients

Call:
fastLm.formula(formula = y ~ f1 * f2, data = dd)

Residuals:
       Min.     1st Qu.      Median     3rd Qu.        Max. 
-1.2155e-01 -4.7016e-02  1.0658e-14  4.7016e-02  1.2155e-01 

             Estimate    StdErr t.value   p.value    
(Intercept)  0.977859  0.061004  16.029 1.845e-08 ***
f1B          7.020458  0.040669 172.623 < 2.2e-16 ***
f1C          3.117222  0.086273  36.132 6.266e-12 ***
f1D          4.068523  0.086273  47.159 4.431e-13 ***
f2b          5.060123  0.086273  58.653 5.040e-14 ***
f2c          5.997592  0.086273  69.519 9.246e-15 ***
f1B:f2b      2.002847  0.064304  31.147 2.733e-11 ***
f1C:f2b      7.702999  0.122008  63.135 2.418e-14 ***
f1D:f2b      8.964251  0.122008  73.473 5.323e-15 ***
f1B:f2c      5.017610  0.064304  78.030 2.919e-15 ***
f1C:f2c     10.961326  0.122008  89.841 7.143e-16 ***
f1D:f2c     12.041081  0.122008  98.691 2.794e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.08627 on 10 degrees of freedom
Multiple R-squared: 0.9999,	Adjusted R-squared: 0.9999
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>