Most of this documentation is copied from R's documentation for lm.
gpuLm is used to fit linear models using a
GPU enabled QR decomposition.
It can be used to carry out regression,
single stratum analysis of variance and
analysis of covariance (although aov may provide a more
convenient interface for these).
Note: The QR decomposition employed by gpuLm is optimized for speed
and uses minimal pivoting. If rank-revealing pivoting is desired, the
function gpuQR should be used. The most reliable
determination of rank, however, will be obtained with the svd command.
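As a rough illustration of the rank remark above, the following base-R sketch compares the rank reported by a pivoted QR with a rank derived from singular values. It uses qr and svd from base R rather than gpuQR, and the tolerance rule is an assumed, commonly used convention, not something this page prescribes.

```r
# Base-R sketch: QR-based rank versus SVD-based rank on a deficient matrix.
set.seed(1)
X <- cbind(1, rnorm(20), rnorm(20))
X <- cbind(X, X[, 2] + X[, 3])    # fourth column is a linear combination

qr_rank <- qr(X)$rank             # rank as reported by base R's QR

d <- svd(X)$d                     # singular values, largest first
tol <- max(dim(X)) * d[1] * .Machine$double.eps   # assumed tolerance rule
svd_rank <- sum(d > tol)          # count singular values above the tolerance

c(qr = qr_rank, svd = svd_rank)
```

Both approaches should agree on a clear-cut case like this one; they can differ for nearly collinear columns, which is where the singular values are the more trustworthy guide.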
Usage
gpuLm(formula, data, subset, weights, na.action,
method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE,
singular.ok = TRUE, contrasts = NULL, useSingle = TRUE, offset, ...)
Arguments
formula
an object of class "formula" (or one that
can be coerced to that class): a symbolic description of the
model to be fitted. The details of model specification are given
under ‘Details’.
data
an optional data frame, list or environment (or object
coercible by as.data.frame to a data frame) containing
the variables in the model. If not found in data, the
variables are taken from environment(formula),
typically the environment from which gpuLm is called.
subset
an optional vector specifying a subset of observations
to be used in the fitting process.
weights
an optional vector of weights to be used in the fitting
process. Should be NULL or a numeric vector.
If non-NULL, weighted least squares is used with weights
weights (that is, minimizing sum(w*e^2)); otherwise
ordinary least squares is used. See also ‘Details’.
na.action
a function which indicates what should happen
when the data contain NAs. The default is set by
the na.action setting of options, and is
na.fail if that is unset. The ‘factory-fresh’
default is na.omit. Another possible value is
NULL (no action). The value na.exclude can also be useful.
method
the method to be used; for fitting, currently only
method = "qr" is supported; method = "model.frame" returns
the model frame (the same as with model = TRUE, see below).
model, x, y, qr
logicals. If TRUE the corresponding
components of the fit (the model frame, the model matrix, the
response, the qr decomposition) are returned.
singular.ok
logical. If FALSE (the default in S but
not in R) a singular fit is an error.
contrasts
an optional list. See the contrasts.arg
of model.matrix.default.
useSingle
an optional logical. In the future, setting this to
FALSE will result in using double precision arithmetic on the
GPU, but this is not yet implemented.
offset
this can be used to specify an a priori known
component to be included in the linear predictor during fitting.
This should be NULL or a numeric vector of length equal to
the number of cases. One or more offset terms can be
included in the formula instead or as well, and if more than one are
specified their sum is used. See model.offset.
...
additional arguments to be passed to the low level
regression fitting functions (see below).
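To make the na.action argument more concrete, here is a small sketch contrasting na.omit with na.exclude. It uses lm from base R as a CPU stand-in; that gpuLm treats na.action the same way is an assumption based on this page, and the data frame is made up for illustration.

```r
# Sketch with base-R lm as a stand-in for gpuLm (assumed to behave alike).
d <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                y = c(1.1, NA, 2.9, 4.2, 5.1, 5.8))   # hypothetical data

fit_omit <- lm(y ~ x, data = d, na.action = na.omit)
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)

# na.omit drops the incomplete row from the residuals entirely;
# na.exclude pads the residuals with NA so they line up with the input rows.
n_omit <- length(residuals(fit_omit))
n_excl <- length(residuals(fit_excl))
c(omit = n_omit, exclude = n_excl)
```

The fitted coefficients are the same either way; only the shape of residuals and fitted values differs.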
Details
Models for lm are specified symbolically. A typical model has
the form response ~ terms where response is the (numeric)
response vector and terms is a series of terms which specifies a
linear predictor for response. A terms specification of the form
first + second indicates all the terms in first together
with all the terms in second with duplicates removed. A
specification of the form first:second indicates the set of
terms obtained by taking the interactions of all terms in first
with all terms in second. The specification first*second
indicates the cross of first and second. This is
the same as first + second + first:second.
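The equivalence of the crossing operator and its expanded form can be checked without fitting anything, by inspecting the term labels that terms() derives from each formula. The variable names a, b and y are placeholders.

```r
# Compare the expanded term labels of two formula spellings.
f_star <- y ~ a * b
f_long <- y ~ a + b + a:b

labels_star <- attr(terms(f_star), "term.labels")
labels_long <- attr(terms(f_long), "term.labels")

identical(labels_star, labels_long)
```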
If the formula includes an offset, this is evaluated and
subtracted from the response.
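A minimal sketch of what "subtracted from the response" means in practice: fitting with an offset term should give the same coefficients as fitting the response with the offset removed by hand. Base-R lm is used as a stand-in here, on simulated data; that gpuLm matches this behaviour is an assumption.

```r
# Sketch with base-R lm as a stand-in for gpuLm (assumed to behave alike).
set.seed(42)
d <- data.frame(x = rnorm(30), o = runif(30))          # simulated data
d$y <- 1 + 2 * d$x + d$o + rnorm(30, sd = 0.1)

fit_offset  <- lm(y ~ x + offset(o), data = d)   # offset in the formula
fit_shifted <- lm(I(y - o) ~ x, data = d)        # offset subtracted manually

same_coef <- all.equal(unname(coef(fit_offset)), unname(coef(fit_shifted)))
same_coef
```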
If response is a matrix a linear model is fitted separately by
least-squares to each column of the matrix.
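The matrix-response case can be sketched as follows, again with base-R lm standing in for gpuLm (an assumption) and simulated data. Each column of the coefficient matrix should match the fit obtained for that response alone.

```r
# Sketch with base-R lm as a stand-in for gpuLm (assumed to behave alike).
set.seed(7)
d <- data.frame(x = rnorm(25))                  # simulated data
d$y1 <- 1 + 2 * d$x + rnorm(25)
d$y2 <- -1 + 0.5 * d$x + rnorm(25)

fit_multi <- lm(cbind(y1, y2) ~ x, data = d)    # two responses at once
fit_y1 <- lm(y1 ~ x, data = d)                  # separate single-response fits
fit_y2 <- lm(y2 ~ x, data = d)

col1_ok <- all.equal(coef(fit_multi)[, "y1"], coef(fit_y1))
col2_ok <- all.equal(coef(fit_multi)[, "y2"], coef(fit_y2))
c(col1_ok, col2_ok)
```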
See model.matrix for some further details. The terms in
the formula will be re-ordered so that main effects come first,
followed by the interactions, all second-order, all third-order and so
on: to avoid this pass a terms object as the formula (see
aov and demo(glm.vr) for an example).
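The re-ordering described above, and the way a terms object avoids it, can be seen directly with terms() and its keep.order argument. The variable names are placeholders and nothing is fitted.

```r
# An interaction written before a main effect.
f <- y ~ a:b + a

default_order <- attr(terms(f), "term.labels")                     # main effects first
kept_order    <- attr(terms(f, keep.order = TRUE), "term.labels")  # order as written

default_order
kept_order
```

Passing the result of terms(f, keep.order = TRUE) in place of the formula is the approach the text refers to.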
A formula has an implied intercept term. To remove this use either
y ~ x - 1 or y ~ 0 + x. See formula for
more details of allowed formulae.
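Both spellings for dropping the intercept can be checked through the "intercept" attribute of the terms object, without fitting a model:

```r
# The "intercept" attribute is 1 when an intercept is present, 0 otherwise.
with_int   <- attr(terms(y ~ x), "intercept")
minus_one  <- attr(terms(y ~ x - 1), "intercept")
zero_first <- attr(terms(y ~ 0 + x), "intercept")

c(with_int, minus_one, zero_first)
```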
Non-NULL weights can be used to indicate that different
observations have different variances (with the values in
weights being inversely proportional to the variances); or
equivalently, when the elements of weights are positive
integers w_i, that each response y_i is the mean of
w_i unit-weight observations (including the case that there are
w_i observations equal to y_i and the data have been
summarized).
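The "summarized data" reading of integer weights can be sketched like this: a fit to group means weighted by group size should reproduce the coefficients of the fit to the raw observations. Base-R lm is used as a stand-in for gpuLm (an assumption), and the data are simulated.

```r
# Sketch with base-R lm as a stand-in for gpuLm (assumed to behave alike).
set.seed(3)
raw <- data.frame(x = rep(1:5, times = c(2, 3, 1, 4, 2)))   # simulated data
raw$y <- 1 + 0.5 * raw$x + rnorm(nrow(raw))

# Summarize: one row per x value, with the mean response and the group size.
agg <- aggregate(y ~ x, data = raw, FUN = mean)
agg$w <- as.vector(table(raw$x))

fit_raw <- lm(y ~ x, data = raw)                # unit-weight observations
fit_agg <- lm(y ~ x, data = agg, weights = w)   # means weighted by counts

same_coef <- all.equal(coef(fit_raw), coef(fit_agg))
same_coef
```

The coefficients agree, but the residual degrees of freedom do not, since the summarized fit only sees five rows.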
lm calls the lower level functions lm.fit, etc,
see below, for the actual numerical computations. For programming
only, you may consider doing likewise.
All of weights, subset and offset are evaluated
in the same way as variables in formula, that is first in
data and then in the environment of formula.
Value
lm returns an object of class "lm" or, for
multiple responses, of class c("mlm", "lm").
The functions summary and anova are used to
obtain and print a summary and analysis of variance table of the
results. The generic accessor functions coefficients,
effects, fitted.values and residuals extract
various useful features of the value returned by lm.
An object of class "lm" is a list containing at least the
following components:
coefficients
a named vector of coefficients
residuals
the residuals, that is response minus fitted values.
fitted.values
the fitted mean values.
rank
the numeric rank of the fitted linear model.
weights
(only for weighted fits) the specified weights.
df.residual
the residual degrees of freedom.
call
the matched call.
terms
the terms object used.
contrasts
(only where relevant) the contrasts used.
xlevels
(only where relevant) a record of the levels of the
factors used in fitting.
offset
the offset used (missing if none were used).
y
if requested, the response used.
x
if requested, the model matrix used.
model
if requested (the default), the model frame used.
na.action
(where relevant) information returned by
model.frame on the special handling of NAs.
In addition, non-null fits will have components assign,
effects and (unless not requested) qr relating to the linear
fit, for use by extractor functions such as summary and
effects.
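As a quick orientation to the components listed above, this sketch fits a model to the built-in cars data and checks a few of the documented relationships. It uses base-R lm; that an object returned by gpuLm exposes the same components is an assumption based on this page.

```r
# Sketch with base-R lm as a stand-in for gpuLm (assumed to behave alike).
fit <- lm(dist ~ speed, data = cars)

# Residuals are the response minus the fitted values.
resid_ok <- all.equal(residuals(fit), cars$dist - fitted(fit))

# Residual degrees of freedom are the number of cases minus the rank.
df_ok <- fit$df.residual == nrow(cars) - fit$rank

# The model frame and the QR decomposition are returned by default.
has_parts <- all(c("coefficients", "rank", "call", "terms", "model", "qr") %in%
                 names(fit))

c(resid_ok, df_ok, has_parts)
```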
Using time series
Considerable care is needed when using lm with time series.
Unless na.action = NULL, the time series attributes are
stripped from the variables before the regression is done. (This is
necessary as omitting NAs would invalidate the time series
attributes, and if NAs are omitted in the middle of the series
the result would no longer be a regular time series.)
Even if the time series attributes are retained, they are not used to
line up series, so that the time shift of a lagged or differenced
regressor would be ignored. It is good practice to prepare a
data argument by ts.intersect(..., dframe = TRUE),
then apply a suitable na.action to that data frame and call
gpuLm with na.action = NULL so that residuals and fitted
values are time series.
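The recommended preparation step can be sketched as follows. The series are made up, the lag is constructed explicitly with stats::lag so that alignment happens before fitting, and base-R lm stands in for gpuLm (an assumption).

```r
# Sketch with base-R lm as a stand-in for gpuLm (assumed to behave alike).
y <- ts(c(2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 8.2, 8.9), start = 2001)  # made-up series
x <- ts(c(1, 2, 3, 4, 5, 6, 7, 8), start = 2001)

# Align the response with the lagged regressor on their common time window.
aligned <- ts.intersect(y, x_lag1 = stats::lag(x, -1), dframe = TRUE)

# No NAs remain after the intersection, so na.action = NULL is safe here.
fit <- lm(y ~ x_lag1, data = aligned, na.action = NULL)

n_rows <- nrow(aligned)
coef(fit)
```

Because the lag shifts x forward by one period, the common window starts one period later and the aligned data frame has one row fewer than the original series.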
Note
Offsets specified by offset will not be included in predictions
by predict.lm, whereas those specified by an offset term
in the formula will be.
Author(s)
The design was inspired by the S function of the same name described
in Chambers (1992). The implementation of model formula by Ross Ihaka
was based on Wilkinson & Rogers (1973).
This function was adapted for Nvidia's CUDA-supporting GPGPUs by
Mark Seligman at Rapid Biologics LLC.
http://www.rapidbiologics.com
References
Chambers, J. M. (1992)
Linear models.
Chapter 4 of Statistical Models in S
eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
Wilkinson, G. N. and Rogers, C. E. (1973)
Symbolic descriptions of factorial models for analysis of variance.
Applied Statistics, 22, 392–9.
See Also
summary.lm for summaries and anova.lm for
the ANOVA table; aov for a different interface.
The generic functions coef, effects,
residuals, fitted, vcov.
predict.lm (via predict) for prediction,
including confidence and prediction intervals;
confint for confidence intervals of parameters.
lm.influence for regression diagnostics, and
glm for generalized linear models.
The underlying low level functions,
lm.fit for plain, and lm.wfit for weighted
regression fitting.
More lm() examples are available, e.g., in
anscombe,
attitude,
freeny,
LifeCycleSavings,
longley,
stackloss,
swiss.
biglm in package biglm for an alternative
way to fit linear models to large datasets (especially those with many
cases).
Examples
# require(graphics)
## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2,10,20, labels=c("Ctl","Trt"))
weight <- c(ctl, trt)
anova(lm.D9 <- gpuLm(weight ~ group))
summary(lm.D90 <- gpuLm(weight ~ group - 1))  # omitting intercept
summary(resid(lm.D9) - resid(lm.D90))         # residuals almost identical
opar <- par(mfrow = c(2,2), oma = c(0, 0, 1.1, 0))
plot(lm.D9, las = 1) # Residuals, Fitted, ...
par(opar)
## model frame :
stopifnot(identical(gpuLm(weight ~ group, method = "model.frame"),
                    model.frame(lm.D9)))
### less simple examples in "See Also" above