bigglm creates a generalized linear model object that uses only
p^2 memory for p variables.
Usage
bigglm(formula, data, family=gaussian(),...)
## S3 method for class 'data.frame'
bigglm(formula, data,...,chunksize=5000)
## S3 method for class 'function'
bigglm(formula, data, family=gaussian(),
weights=NULL, sandwich=FALSE, maxit=8, tolerance=1e-7,
start=NULL,quiet=FALSE,...)
## S3 method for class 'RODBC'
bigglm(formula, data, family=gaussian(),
tablename, ..., chunksize=5000)
## S4 method for signature 'ANY,DBIConnection'
bigglm(formula, data, family=gaussian(),
tablename, ..., chunksize=5000)
## S3 method for class 'bigglm'
vcov(object,dispersion=NULL, ...)
## S3 method for class 'bigglm'
deviance(object,...)
## S3 method for class 'bigglm'
family(object,...)
## S3 method for class 'bigglm'
AIC(object,...,k=2)
Arguments
formula
A model formula
data
See Details below. Method dispatch is on this argument
family
A glm family object
chunksize
Size of chunks for processing the data frame
weights
A one-sided, single-term formula specifying weights
sandwich
TRUE to compute the Huber/White sandwich
covariance matrix (uses p^4 memory rather than p^2)
maxit
Maximum number of Fisher scoring iterations
tolerance
Tolerance for change in coefficient (as multiple of
standard error)
start
Optional starting values for coefficients. If
NULL, maxit should be at least 2 as some quantities
will not be computed on the first iteration
object
A bigglm object
dispersion
Dispersion parameter, or NULL to estimate
tablename
For the SQLiteConnection method, the name of a
SQL table, or a string specifying a join or nested select
k
Penalty per parameter for AIC
quiet
When FALSE, warn if the fit did not converge
...
Additional arguments
Details
The data argument may be a function, a data frame, or a
SQLiteConnection or RODBC connection object.
When it is a function, the function must take a single argument,
reset. When this argument is FALSE, the function returns a data
frame with the next chunk of data, or NULL if no more data are
available. When reset=TRUE, it indicates that the data should be
reread from the beginning by subsequent calls. The chunks need not be
the same size or in the same order when the data are reread, but the
same data must be provided in total. The bigglm.data.frame
method gives an example of how such a function might be written;
another is in the Examples below.
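
The reset protocol described above can be sketched as a closure over a
row counter. This is a minimal illustration, not the package's own code:
make_chunk_reader is a hypothetical helper name, the built-in mtcars
data frame stands in for a larger data source, and the final call
assumes the biglm package is installed.

```r
# Hypothetical helper: wraps a data frame as a chunk-reading function
# that follows the reset protocol described in Details.
make_chunk_reader <- function(df, chunksize = 5000) {
  pos <- 0L                               # rows already handed out
  function(reset = FALSE) {
    if (reset) {                          # rewind; next call starts at row 1
      pos <<- 0L
      return(NULL)
    }
    if (pos >= nrow(df)) return(NULL)     # no more data available
    idx <- seq(pos + 1L, min(pos + chunksize, nrow(df)))
    pos <<- max(idx)
    df[idx, , drop = FALSE]               # next chunk as a data frame
  }
}

reader <- make_chunk_reader(mtcars, chunksize = 10)

# Fit only if biglm is available (assumption: package installed)
if (requireNamespace("biglm", quietly = TRUE)) {
  fit <- biglm::bigglm(mpg ~ wt + hp, data = reader, family = gaussian())
  print(coef(fit))
}
```

In practice the closure would read from a file or connection rather than
from a data frame already in memory; the in-memory version is used here
only to keep the sketch self-contained.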
The model formula must not contain any data-dependent terms, as these
will not be consistent when updated. Factors are permitted, but the
levels of the factor must be the same across all data chunks (empty
factor levels are ok). Offsets are allowed (since version 0.8).
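
One way to meet the factor-level requirement is to fix the levels
explicitly before each chunk is returned. The sketch below assumes the
full set of levels is known in advance; fix_levels and cyl_levels are
hypothetical names, and mtcars is used only for illustration.

```r
# Assumption for this sketch: the full set of levels is known in advance
cyl_levels <- c("4", "6", "8")

# Hypothetical helper: recode cyl with the same levels in every chunk
fix_levels <- function(chunk) {
  chunk$cyl <- factor(chunk$cyl, levels = cyl_levels)
  chunk
}

chunk_a <- fix_levels(mtcars[1:3, c("mpg", "cyl")])  # no 8-cylinder cars
chunk_b <- fix_levels(mtcars[4:6, c("mpg", "cyl")])  # no 4-cylinder cars

# Both chunks yield the same design-matrix columns, even though
# each chunk has an empty level
cols_a <- colnames(model.matrix(mpg ~ cyl, chunk_a))
cols_b <- colnames(model.matrix(mpg ~ cyl, chunk_b))
identical(cols_a, cols_b)
```

Without the explicit levels argument, each chunk would infer its own
levels from the values it happens to contain, and the design matrices
would not line up across chunks.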
The SQLiteConnection and RODBC methods load only the
variables needed for the model, not the whole table. The code in the
SQLiteConnection method should work for other DBI
connections, but I do not have any of these to check it with.
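
A call through a database connection might look like the following
sketch. It assumes the biglm, DBI and RSQLite packages are installed,
uses an in-memory SQLite database, and copies mtcars into a table named
"mtcars" purely for illustration.

```r
# Run only if all three packages are available (assumption)
pkgs <- c("biglm", "DBI", "RSQLite")
if (all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))) {
  # In-memory SQLite database holding a copy of mtcars
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(con, "mtcars", mtcars)

  # Only mpg, wt and hp are needed by the formula;
  # rows are fetched chunksize at a time
  fit <- biglm::bigglm(mpg ~ wt + hp, data = con, family = gaussian(),
                       tablename = "mtcars", chunksize = 10)
  print(coef(fit))

  DBI::dbDisconnect(con)
}
```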