Last data update: 2014.03.03

Logit    R Documentation

Logit Regression Analysis

Description

Abbreviation: lr

Based directly on the standard R glm function with family="binomial", Logit automatically provides a logit regression analysis with graphics from a single, simple function call with many default settings, each of which can be re-specified. By default the data are in a data frame named mydata, such as data read by the lessR Read function. Specify the model in the function call according to an R formula, that is, the response variable followed by a tilde, followed by the list of predictor variables, with adjacent predictors separated by a plus sign. The response variable is either a factor with two levels or a numeric variable with values of only 0 and 1.
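As a rough illustration of the formula and response coding described above, the following base R sketch fits the same kind of model directly with glm. The data are simulated, and the names d, Y, Yf, X1 and X2 are assumptions made for the example, not part of lessR.

```r
# Simulated data for illustration only
set.seed(1)
d <- data.frame(X1 = rnorm(100), X2 = rnorm(100))
d$Y <- rbinom(100, size = 1, prob = plogis(0.5 + 1.2 * d$X1 - 0.8 * d$X2))

# Numeric 0/1 response, predictors separated by plus signs
fit_num <- glm(Y ~ X1 + X2, data = d, family = "binomial")

# The same response as a two-level factor; glm treats the first level
# ("no") as failure and the second ("yes") as success
d$Yf <- factor(d$Y, levels = c(0, 1), labels = c("no", "yes"))
fit_fac <- glm(Yf ~ X1 + X2, data = d, family = "binomial")

# Both codings estimate the same coefficients
all.equal(coef(fit_num), coef(fit_fac))
```

With a factor response, the order of the levels decides which category is modeled as the "success", so it can be worth checking levels() before fitting.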

Default output includes the inferential analysis of the estimated coefficients and model, sorted residuals and Cook's Distance, and sorted fitted values for existing data or new data. For a single predictor variable model, the scatterplot of the data with plotted logit function is provided.

Can also be called from the more general model function.

Usage

Logit(my.formula, data=mydata, digits.d=4, text.width=120, 

     brief=getOption("brief"),

     res.rows=NULL, res.sort=c("cooks","rstudent","dffits","off"), 
     pred=TRUE, pred.all=FALSE, cooks.cut=1, 

     X1.new=NULL, X2.new=NULL, X3.new=NULL, X4.new=NULL, 
     X5.new=NULL, X6.new=NULL,

     pdf.file=NULL, pdf.width=5, pdf.height=5, ...)

lr(...) 

Arguments

my.formula

Standard R formula for specifying a model. For example, for a response variable named Y and two predictor variables, X1 and X2, specify the corresponding linear model as Y ~ X1 + X2.

data

Data frame that contains the data for analysis. The default name is mydata; otherwise specify the name explicitly.

digits.d

For the Basic Analysis, it provides the number of decimal digits. For the rest of the output, it is a suggestion only.

text.width

Width of the text output at the console.

brief

If set to TRUE, reduces the text output. The system default can be changed with the theme function.

res.rows

Default is 25, which lists the first 25 rows of data sorted by the specified sort criterion. To turn this option off, specify a value of 0. To see the output for all observations, specify a value of "all".

res.sort

Default is "cooks", for specifying Cook's distance as the sort criterion for the display of the rows of data and associated residuals. Other values are "rstudent" for Studentized residuals, "dffits" for dffits, and "off" to not provide the analysis.

pred

Default is TRUE, which produces confidence and prediction intervals for each row, or selected rows, of data.

pred.all

Default is FALSE, which produces prediction intervals only for the first, middle and last five rows of data.

cooks.cut

Cutoff value of Cook's Distance at which observations with a larger value are flagged in red and labeled in the resulting scatterplot of Residuals and Fitted Values. Default value is 1.0.

X1.new

Values of the first listed predictor variable for which forecasted values and corresponding prediction intervals are calculated.

X2.new

Values of the second listed predictor variable for which forecasted values and corresponding prediction intervals are calculated.

X3.new

Values of the third listed predictor variable for which forecasted values and corresponding prediction intervals are calculated.

X4.new

Values of the fourth listed predictor variable for which forecasted values and corresponding prediction intervals are calculated.

X5.new

Values of the fifth listed predictor variable for which forecasted values and corresponding prediction intervals are calculated.

X6.new

Values of the sixth listed predictor variable for which forecasted values and corresponding prediction intervals are calculated.

pdf.file

Name of the pdf file to which graphics are redirected.

pdf.width

Width of the pdf file in inches.

pdf.height

Height of the pdf file in inches.

...

Other parameter values for R function glm which provides the core computations.
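The X1.new through X6.new arguments supply new predictor values for forecasting. A minimal base R sketch of the same idea follows, using predict on a glm fit. It assumes that every combination of the supplied values is evaluated, which this page does not state explicitly; the data are simulated and the names d, fit and new are hypothetical.

```r
# Simulated data for illustration only
set.seed(3)
d <- data.frame(X1 = runif(120, 50, 300), X2 = runif(120, 5, 70))
d$Y <- rbinom(120, 1, plogis(-2 + 0.02 * d$X1 - 0.03 * d$X2))
fit <- glm(Y ~ X1 + X2, data = d, family = "binomial")

# Assumption: all combinations of the new values are of interest
new <- expand.grid(X1 = seq(100, 250, 50), X2 = seq(10, 60, 10))

# Fitted probabilities and their standard errors at the new values
p <- predict(fit, newdata = new, type = "response", se.fit = TRUE)
new$fitted <- p$fit
new$se <- p$se.fit
head(new)
```

The column names in the new data frame must match the predictor names in the formula, which is why the positional X1.new, X2.new, ... arguments follow the order in which the predictors are listed.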

Details

OVERVIEW
The purpose of Logit is to combine the following function calls into one, as well as provide ancillary analyses such as graphics, organizing output into tables, and sorting to assist interpretation of the output. The basic analysis successively invokes several standard R functions, beginning with the standard R function for estimation of the logit model, glm with family="binomial". The output of the analysis is stored in the object lm.out, available for further analysis in the R environment upon completion of the Logit function. By default, Logit provides the analyses from the standard R functions summary, confint and anova, with some of the standard output modified and enhanced. The residual analysis invokes the fitted, resid, rstudent, and cooks.distance functions. The option for prediction intervals calls the standard generic R function predict.

The default analysis provides the model's parameter estimates and corresponding hypothesis tests and confidence intervals, goodness of fit indices, the ANOVA table, analysis of residuals and influence as well as the fitted value and standard error for each observation in the model.
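The sequence of standard R calls named in this overview can be sketched roughly as follows. This is only an approximation under stated assumptions: the data are simulated, the object names are hypothetical, and the actual arguments that Logit passes to each function may differ.

```r
# Simulated data for illustration only
set.seed(2)
d <- data.frame(x = rnorm(80))
d$y <- rbinom(80, 1, plogis(-0.3 + 1.5 * d$x))

# Estimation of the logit model
fit <- glm(y ~ x, data = d, family = "binomial")

# Inferential analysis of the coefficients and model
est <- summary(fit)$coefficients        # estimates, std. errors, z tests
ci  <- suppressMessages(confint(fit))   # confidence intervals
dev <- anova(fit, test = "Chisq")       # analysis of deviance table

# Residuals and influence
fv <- fitted(fit)
rs <- resid(fit, type = "response")
st <- rstudent(fit)
cd <- cooks.distance(fit)

# Fitted value and standard error for each observation
pr <- predict(fit, type = "response", se.fit = TRUE)
```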

DATA FRAME
The name mydata is by default provided by the Read function included in this package for reading and displaying information about the data in preparation for analysis. Unless all the variables in the model are in the same data frame, the analysis will not be complete. The data frame does not need to be attached, just specified by name with the data option if the name is not the default mydata.

GRAPHICS
For models with a single predictor variable, a scatter plot of the data is produced, which also includes the fitted values. As with the density histogram plot of the residuals and the scatterplot of the fitted values and residuals, the scatterplot includes a colored background with grid lines. For models with more than one predictor variable, a scatter plot matrix is produced instead.
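For the single-predictor case, a plain base R approximation of the scatter plot with the fitted logit curve might look like the sketch below. It does not reproduce the lessR color themes or grid lines; the data are simulated and the names d, xs and ps are hypothetical.

```r
# Simulated data for illustration only
set.seed(6)
d <- data.frame(x = rnorm(80))
d$y <- rbinom(80, 1, plogis(-0.2 + 1.4 * d$x))
fit <- glm(y ~ x, data = d, family = "binomial")

# Fitted probabilities over a fine grid of predictor values
xs <- seq(min(d$x), max(d$x), length.out = 200)
ps <- predict(fit, newdata = data.frame(x = xs), type = "response")

# Observed 0/1 values with the fitted logit curve overlaid
plot(d$x, d$y, xlab = "x", ylab = "y and fitted probability")
lines(xs, ps)
```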

FORECASTS
Fitted and forecasted values are listed for all rows of data if the number of rows is less than 25 or if pred.all=TRUE. If only some of the rows are listed, sorted by the fitted value, the first and last four rows of data are listed. Also, the 4 rows immediately around the fitted value of 0.5 are listed.
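The row selection described here can be approximated in base R as in the sketch below. It assumes that "around 0.5" means the four rows whose fitted values are closest to 0.5, which is one possible reading; the data are simulated and the names s, near and keep are hypothetical.

```r
# Simulated data for illustration only
set.seed(4)
d <- data.frame(x = rnorm(60))
d$y <- rbinom(60, 1, plogis(1.2 * d$x))
fit <- glm(y ~ x, data = d, family = "binomial")

# Sort the rows by fitted value
d$fitted <- fitted(fit)
s <- d[order(d$fitted), ]
n <- nrow(s)

# Assumption: the four rows with fitted values closest to 0.5
near <- order(abs(s$fitted - 0.5))[1:4]

# First four, the rows near 0.5, and last four, without duplicates
keep <- sort(unique(c(1:4, near, (n - 3):n)))
s[keep, ]
```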

RESIDUAL ANALYSIS
By default the residual analysis lists the data and fitted value for each observation as well as the residual, Studentized residual, Cook's distance and dffits, with the listed observations sorted by Cook's distance and the number of rows set by res.rows. The residual displayed is the actual difference between fitted and observed, that is, with the setting in the residuals of type="response". The res.sort option provides for sorting by the Studentized residuals or not sorting at all. The res.rows option provides for listing these rows of data and computed statistics for any specified number of observations (rows). To turn off the analysis of residuals, specify res.rows=0.
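A comparable residual table can be assembled from the standard R functions, as in this sketch. The column layout and the choice of eight rows are assumptions for the example and are not the lessR output format; the data are simulated.

```r
# Simulated data for illustration only
set.seed(5)
d <- data.frame(x = rnorm(50))
d$y <- rbinom(50, 1, plogis(0.4 + d$x))
fit <- glm(y ~ x, data = d, family = "binomial")

# Data, fitted values, residuals and influence statistics side by side
res <- data.frame(
  d,
  fitted   = fitted(fit),
  residual = resid(fit, type = "response"),  # observed minus fitted
  rstudent = rstudent(fit),
  dffits   = dffits(fit),
  cooks    = cooks.distance(fit)
)

# Sort by Cook's distance, largest first, and list the first rows
res <- res[order(res$cooks, decreasing = TRUE), ]
head(res, 8)
```

Sorting by a different column, such as abs(res$rstudent), gives the analogue of a different res.sort setting.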

INVOKED R OPTIONS
The options function is called to turn off the stars for different significance levels (show.signif.stars=FALSE), to turn off scientific notation for the output (scipen=30), and to set the width of the text output at the console to 120 characters. The latter option can be re-specified with the text.width option. After Logit finishes with a normal termination, the options are reset to the values they had before the Logit function began executing.
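The save-and-restore pattern for options described here can be sketched in base R as follows. The helper print_without_stars is hypothetical and is not a lessR function; it only illustrates how options can be changed for the duration of a call and then restored.

```r
# Hypothetical helper: print a model summary under temporary options
print_without_stars <- function(fit, text.width = 120) {
  # options() returns the previous values of the settings it changes
  old <- options(show.signif.stars = FALSE, scipen = 30, width = text.width)
  # Restore the previous values when the function exits
  on.exit(options(old), add = TRUE)
  print(summary(fit))
  invisible(fit)
}

# Simulated data for illustration only
set.seed(7)
d <- data.frame(x = rnorm(40))
d$y <- rbinom(40, 1, plogis(d$x))
fit <- glm(y ~ x, data = d, family = "binomial")

before <- options("show.signif.stars", "scipen", "width")
print_without_stars(fit)
after <- options("show.signif.stars", "scipen", "width")
identical(before, after)
```

Using on.exit restores the options even if printing stops with an error, which is a slightly stronger guarantee than the normal-termination case described in the text.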

COLORS
The default color theme is dodgerblue, but a gray scale is available with "gray", and other themes are available as explained in theme, such as "red" and "green". Use the option ghost=TRUE for a black background, no grid lines and partial transparency of plotted colors.

Value

Following the standard R function glm, invisibly returns an object of class inheriting from "glm", which inherits from the class "lm". The returned object is particularly useful for comparing nested models: assign the output of Logit for a model to an object, do the same for a nested model, and then use the anova function to compare the models, as shown in the examples below.
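Because the returned object behaves like a glm fit, a nested-model comparison can be pictured with plain glm objects, as in this sketch. The data are simulated, the names full, reduced and cmp are hypothetical, and the test="Chisq" argument is an addition for the example that requests a likelihood ratio test.

```r
# Simulated data for illustration only
set.seed(8)
d <- data.frame(X1 = rnorm(150), X2 = rnorm(150))
d$Y <- rbinom(150, 1, plogis(0.3 + d$X1 + 0.7 * d$X2))

# Full model and a reduced model nested within it
full    <- glm(Y ~ X1 + X2, data = d, family = "binomial")
reduced <- glm(Y ~ X1, data = d, family = "binomial")

class(full)   # "glm" "lm"

# Analysis of deviance comparing the two models
cmp <- anova(reduced, full, test = "Chisq")
cmp
```

Both models need to be fit to the same rows of data for the comparison to be valid, which matters when predictors have missing values.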

Author(s)

David W. Gerbing (Portland State University; gerbing@pdx.edu)

See Also

formula, glm, summary.glm, anova, confint, fitted, resid, rstudent, cooks.distance

Examples

# Gender has values of "M" and "F"
mydata <- Read("Employee", format="lessR", quiet=TRUE)
# logit regression
Logit(Gender ~ Years)

# short name
lr(Gender ~ Years)

# Modify the default settings as specified
Logit(Gender ~ Years, res.rows=8, res.sort="rstudent", digits.d=8, pred=FALSE)

# Multiple logistic regression model
# Provide all default analyses
Logit(Gender ~ Years + Salary)

# Compare nested models
# The lessR function Nest is easier to use and treats missing data better
full.model <- Logit(Gender ~ Years + Salary)
reduced.model <- Logit(Gender ~ Years)
anova(reduced.model, full.model)

# Save the three plots as pdf files 4 inches square, gray scale
Logit(Gender ~ Years, pdf.file="MyModel.pdf",
      pdf.width=4, pdf.height=4, colors="gray")

# Specify new values of the predictor variables to calculate
#  forecasted values
# Read a different data set into mydata
mydata <- Read("Cars93", format="lessR")
Logit(Source ~ HP + MidPrice, X1.new=seq(100,250,50), X2.new=seq(10,60,10))
