Last data update: 2014.03.03

MRSP.fit    R Documentation

Fitting function for multinomial response models with structured penalties


Description

This function performs the actual fitting of multinomial response models with structured penalties. Function MRSP is just a user-friendly wrapper that prepares calls to this function. MRSP.fit is designed mostly for internal use and is therefore not user-friendly, so it is highly recommended to use MRSP instead of calling MRSP.fit directly.

Usage

MRSP.fit(dat, coef.init = NULL, coef.stand.init = NULL, coef.pretres.init = NULL,
         offset = NULL, weights = NULL, grpindex = NULL, penindex = NULL, lambda,
         lambdaR = lambda, lambdaF = lambda, gamma = 1, psi = 1, indg = NULL,
         indcs = NULL, model = NULL, constr = NULL, control = NULL,
         fista.control = NULL, Proximal.control = NULL, Proximal.args = NULL,
         penweights = NULL, mlfit = NULL, adaptive = FALSE, threshold = FALSE,
         refit = FALSE, fusion = FALSE, nonneg = FALSE, ...)



Arguments

dat

A list that contains the data in the format required by MRSP.fit: a list with elements 'y', 'x' and, possibly, 'V'. If nobs individual observations are available, dat$y must be a matrix of dimension nobs x K. For model=multinomlogit(), K is equal to the number of categories of the response variable; for model=sequentiallogit(), it equals the number of response categories minus 1. In other words, K here always refers to the number of columns of the response matrix. (Note that this is for notational convenience and differs from the documentation of MRSP, where K always refers to the number of categories of the multicategorical response!) The entries of dat$y are either 0 or 1, with dat$y[i,r]==1 indicating that class r is observed for the i-th observation. For model=multinomlogit(), all rowSums of dat$y must equal 1; for model=sequentiallogit(), they must be either 1 or 0, with a row of zeros indicating that the last category was observed.
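A minimal base-R sketch of how dat$y can be built from a factor response for both models. make_y is a hypothetical helper, not part of MRSP, and its model argument is a plain string used only to switch between the two encodings described above; it is not an MRSP.model object.

```r
# Hypothetical helper (not part of MRSP): 0/1 response matrix from a factor
make_y <- function(resp, model = c("multinomlogit", "sequentiallogit")) {
  model <- match.arg(model)
  resp <- as.factor(resp)
  # y[i, r] == 1 iff category r was observed for observation i
  y <- outer(as.integer(resp), seq_along(levels(resp)), "==") * 1
  colnames(y) <- levels(resp)
  if (model == "sequentiallogit") {
    # drop the last column: a row of zeros then encodes the last category
    y <- y[, -ncol(y), drop = FALSE]
  }
  y
}

resp <- factor(c("a", "b", "c", "b"))
y_mult <- make_y(resp, "multinomlogit")    # 4 x 3, every row sums to 1
y_seq  <- make_y(resp, "sequentiallogit")  # 4 x 2, row 3 is all zeros
```

The sequential version has one column fewer, which is exactly why K denotes the number of columns of dat$y rather than the number of categories.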

dat$x is a matrix of dimension nobs x p which contains covariates whose value is constant across classes. They are called 'global predictors/covariates' in the following. In the context of discrete choice modeling, they are often referred to as 'individual-specific' predictors.

If available, covariates whose value varies from class to class can be included in an entry dat$V. Such variables are called 'category-specific' in the following, since their value depends on the categories of the response variable. In the literature on discrete choice modeling, they are often referred to as 'alternative-specific'. These variables can be equipped with either global or category-specific coefficients. If a total of L category-specific variables are to be used, dat$V must be a list (!) of length K whose elements are matrices of dimension nobs x L.
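The following sketch assembles a complete dat object with random data and checks the dimensions stated above. check_dat is a hypothetical helper written for this illustration; MRSP.fit performs its own (possibly different) checks.

```r
set.seed(1)
nobs <- 4; K <- 3; p <- 2; L <- 2
y <- diag(K)[c(1, 2, 3, 2), ]                 # nobs x K, exactly one 1 per row
x <- matrix(rnorm(nobs * p), nobs, p)         # global predictors, nobs x p
V <- lapply(seq_len(K), function(r) matrix(rnorm(nobs * L), nobs, L))  # list (!) of K matrices
dat <- list(y = y, x = x, V = V)

# Hypothetical consistency check (not part of MRSP)
check_dat <- function(dat) {
  nobs <- nrow(dat$y); K <- ncol(dat$y)
  stopifnot(is.matrix(dat$x), nrow(dat$x) == nobs)
  if (!is.null(dat$V)) {
    stopifnot(is.list(dat$V), length(dat$V) == K)
    L <- ncol(dat$V[[1]])
    # every element of V must be an nobs x L matrix
    ok <- vapply(dat$V, function(m) identical(dim(m), c(nobs, L)), logical(1))
    stopifnot(all(ok))
  }
  invisible(TRUE)
}
check_dat(dat)
```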


coef.init

An optional coefficient object supplying initial coefficient values. A list whose first entry is a matrix of dimension K x p, with row r containing the coefficients for class r and column j containing the coefficients for global predictor x_j. If category-specific predictors are included, the second entry of coef.init is a matrix of dimension K x L that contains the coefficients for those category-specific predictors.
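A sketch of a coef.init object and of how its two matrices could enter the linear predictor. The formula eta[i, r] = x[i, ] . B[r, ] + V[[r]][i, ] . G[r, ] is the usual one for multinomial logit models with category-specific covariates and is an assumption here, not taken from the MRSP source.

```r
set.seed(2)
nobs <- 4; K <- 3; p <- 2; L <- 2
x <- matrix(rnorm(nobs * p), nobs, p)
V <- lapply(seq_len(K), function(r) matrix(rnorm(nobs * L), nobs, L))

coef.init <- list(matrix(0, K, p),   # K x p: row r = class r, column j = x_j
                  matrix(0, K, L))   # K x L: coefficients for the columns of V
coef.init[[1]][, 1] <- c(0.5, -0.2, -0.3)
coef.init[[2]][, 1] <- 0.7           # same value in every row, i.e. a global coefficient

# Assumed linear predictor: eta[i, r] = x[i, ] . B[r, ] + V[[r]][i, ] . G[r, ]
eta <- x %*% t(coef.init[[1]]) +
  sapply(seq_len(K), function(r) V[[r]] %*% coef.init[[2]][r, ])
prob <- exp(eta) / rowSums(exp(eta))  # multinomial logit class probabilities
```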


coef.stand.init

Optional initial coefficient values for the standardized predictors. Same structure as coef.init.


coef.pretres.init

Optional initial coefficient values, prior to potential thresholding, for the standardized predictors. Same structure as coef.init.


offset

An optional vector or matrix of offset values: either a vector of length nobs or a matrix of dimension nobs x K.


weights

An optional vector of observation weights of length nobs.


grpindex

A list of one or two integer vectors that indicate which columns of the design matrix form a group that has to be penalized jointly, e.g. the different dummies of a categorical predictor. The first element is the grouping vector for x, the optional second one the grouping vector for V. Columns with the same number belong to the same group. The numbers must begin with 1 and increase with every group. For example, grpindex = list(c(1,2,3,3,4,4,4)) means that variables 1 and 2 each form their own 'scalar' group, while variables 3 and 4 as well as variables 5, 6 and 7 form multi-parameter groups.
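The grouping in the example above can be made explicit with base R's split(). valid_grpindex is a hypothetical helper (not part of MRSP) that encodes one reading of the rule "begin with 1 and increase with every group".

```r
grpindex <- list(c(1, 2, 3, 3, 4, 4, 4))

# Columns of dat$x belonging to each group
groups <- split(seq_along(grpindex[[1]]), grpindex[[1]])
# groups: 1 -> 1, 2 -> 2, 3 -> c(3, 4), 4 -> c(5, 6, 7)

# Hypothetical check (not part of MRSP): labels start at 1 and never skip a number
valid_grpindex <- function(g) {
  g[1] == 1 && all(diff(g) %in% c(0, 1))
}
valid_grpindex(grpindex[[1]])  # TRUE
valid_grpindex(c(1, 3, 3))     # FALSE: group 2 is missing
```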


penindex

A list of one or two vectors that specifies the exact penalty type to use for each covariate. The first entry specifies the penalty types for the variables in dat$x, the (optional) second entry those for the variables in dat$V. The following penalty types are available:

1: global predictor ('x') whose coefficients are penalized with a group lasso penalty that groups them 'across' categories, i.e. CATS Lasso (see Tutz, Poessnecker and Uhlmann, 2015).
10: global predictor, unpenalized.
11: global predictor, sparse group lasso.
12: global predictor, ordinary lasso.
13: global predictor, ridge penalty. Does not support penweights.
2: category-specific predictor with a global coefficient that is penalized with the ordinary lasso or group lasso (depending on grpindex).
20: category-specific predictor with a global coefficient, unpenalized.
21: category-specific predictor with a global coefficient, ridge penalty.
3: category-specific predictor with category-specific coefficients that are penalized by a group lasso as in '1'.
30: category-specific predictor with category-specific coefficients, unpenalized.
31: category-specific predictor with category-specific coefficients, sparse group lasso.
32: category-specific predictor with category-specific coefficients, ordinary lasso.
33: category-specific predictor with category-specific coefficients, ridge penalty. Does not support penweights.
4: global predictor with a global coefficient, penalized (cf. the '2'-series).
40: global predictor with a global coefficient, unpenalized.
41: global predictor with a global coefficient, ridge penalty.

The '4-series' only makes sense for ordinal models!
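A sketch of a penindex object for p = 4 global and L = 2 category-specific predictors, together with a hypothetical check (not part of MRSP) that reads the allowed codes off the table above. It assumes one code per column of dat$x and dat$V; whether a 2- or 3-series code matches the columns listed in indg and indcs is not checked here.

```r
# Allowed codes, read off the table above
x_codes <- c(1, 10, 11, 12, 13, 4, 40, 41)    # for columns of dat$x
V_codes <- c(2, 20, 21, 3, 30, 31, 32, 33)    # for columns of dat$V

# Hypothetical check (not part of MRSP)
check_penindex <- function(penindex, p, L = 0, ordinal = FALSE) {
  stopifnot(length(penindex[[1]]) == p, all(penindex[[1]] %in% x_codes))
  if (!ordinal && any(penindex[[1]] %in% c(4, 40, 41)))
    stop("the 4-series only makes sense for ordinal models")
  if (L > 0)
    stopifnot(length(penindex[[2]]) == L, all(penindex[[2]] %in% V_codes))
  invisible(TRUE)
}

penindex <- list(c(10, 1, 1, 12),  # unpenalized, CATS lasso twice, ordinary lasso
                 c(2, 30))          # global coefficient with lasso; category-specific, unpenalized
check_penindex(penindex, p = 4, L = 2)
```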


lambda

Optional object specifying the lambda value(s) to be used as tuning parameter(s) for the main variable selection penalty: either a vector or a single numeric. If missing, a suitable grid of lambda values is computed. See arguments nrlambda, lambdamin and lambdamax in function MRSP.


lambdaR

Lambda(s) to be used for ridge penalties. If a ridge penalty is the only penalty used, the ridge lambda can typically be specified via argument lambda instead.


lambdaF

Lambda(s) to be used for fusion penalties. Not yet available for end-users, but included for compatibility with future releases of MRSP.


gamma

See argument gamma in MRSP.


psi

See argument psi in MRSP.


indg

A vector of the column indices of the category-specific variables that are equipped with global coefficients.


indcs

A vector of the column indices of the category-specific variables that are equipped with category-specific coefficients.


model

An object of class MRSP.model that specifies the model to be used. Currently, model = multinomlogit() and model = sequentiallogit() are available, yielding a multinomial or sequential logit model, respectively. Cumulative logit models will be included in future versions of MRSP.


constr

The identifiability constraint to be used. The coefficients of predictors that do not vary over categories (i.e. global/individual-specific predictors) are not identifiable in (unpenalized) multinomial logit models. If constr is an integer in [1, K], the corresponding class is used as reference, which means that the coefficients of global predictors for this class are set to 0. If constr = "symmetric", a symmetric side constraint is used, which means that all coefficients belonging to the same global predictor sum to zero. If constr = "none", no constraint is used for penalized parameter groups and identifiability is ensured by the penalty term (see Friedman, Hastie and Tibshirani, 2010). For ordinal regression, constr must take the value "none". If left unspecified, a symmetric side constraint is used for multinomial models and no constraint for ordinal models.
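The lack of identifiability can be checked numerically for an unpenalized multinomial logit model: subtracting any constant from a whole column of the K x p coefficient matrix leaves all class probabilities unchanged. This base-R sketch illustrates the reference-class and symmetric constraints under that model; it is not MRSP's internal code, and softmax is a helper defined here.

```r
set.seed(3)
nobs <- 5; K <- 3; p <- 2
x <- matrix(rnorm(nobs * p), nobs, p)
B <- matrix(rnorm(K * p), K, p)     # K x p coefficients of global predictors
softmax <- function(eta) exp(eta) / rowSums(exp(eta))

B_sym <- sweep(B, 2, colMeans(B))   # like constr = "symmetric": each column sums to zero
B_ref <- sweep(B, 2, B[1, ])        # like constr = 1: coefficients of class 1 set to zero

# Shifting a column of B by a constant changes every eta[i, r] by the same amount
# for fixed i, so the class probabilities are identical under all three versions.
P     <- softmax(x %*% t(B))
P_sym <- softmax(x %*% t(B_sym))
P_ref <- softmax(x %*% t(B_ref))
```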


control

An object of class MRSP.control that stores control information. Its slots max.iter and rel.tol specify the maximum number of iterations and the relative change in penalized log-likelihood values that indicates convergence. The other slots should only be changed by experienced users.


fista.control

An object of class fista.control that contains control information for the FISTA algorithm that is internally used to compute numerical estimates. Not intended for end-users!

Proximal.control, Proximal.args

Arguments to be passed to the proximal gradient algorithm. Not intended for end-users!


penweights

An optional list containing weights for the penalty terms of individual coefficients or coefficient groups. Assuming that category-specific covariates are present, the first element of penweights is a list of length two: its first element, penweights[[1]][[1]], is a numeric vector of length p containing the weights for the CATS penalty on each global covariate's group of coefficients across the response classes; its second element, penweights[[1]][[2]], is a numeric vector of length L with the group penalty weights for the category-specific variables. The second element of penweights is again a list of length two: penweights[[2]][[1]] is a K x p matrix with penalty weights for unstructured lasso penalties on single coefficients belonging to global predictors, and penweights[[2]][[2]] is a K x L matrix with penalty weights for unstructured lasso penalties on category-specific covariates.
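A sketch of a penweights object with the nesting described above, for p = 3 global and L = 2 category-specific covariates and K = 4. All weights are set to 1 here; the doubled weight for covariate 2 is a hypothetical choice for illustration only.

```r
p <- 3; L <- 2; K <- 4
penweights <- list(
  list(rep(1, p),          # CATS group weights, one per global covariate
       rep(1, L)),         # group weights, one per category-specific covariate
  list(matrix(1, K, p),    # lasso weights for single coefficients of global covariates
       matrix(1, K, L))    # lasso weights for single coefficients of category-specific covariates
)
penweights[[1]][[1]][2] <- 2   # hypothetical: penalize covariate 2's group twice as strongly
```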


mlfit

A list that contains information about the ML or 'pseudo-ML' coefficients of the specified model. It must contain at least one entry called 'coef.stand' that has the same structure as the coefficient object (see coef.init). The values of those parameters must be known to compute the effective degrees of freedom of penalized parameter estimates with grouped penalties.


adaptive

Should adaptive weights be used? Use adaptive = "ML" to obtain the traditional adaptive weights proposed in the literature. With adaptive = TRUE, the penalized estimator is first computed with the specified penalty and without adaptive weights; adaptive weights are then computed from this penalized fit, and the final output is computed with those weights. It is strongly recommended to prefer adaptive = "ML" over adaptive = TRUE.
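Adaptive weights penalize predictors with small ML coefficients more heavily. The sketch below assumes the common adaptive (group) lasso form, the inverse norm of the ML coefficients; whether MRSP.fit additionally rescales these weights (e.g. by group size) is not stated in this documentation. The results have the shapes of penweights[[1]][[1]] and penweights[[2]][[1]].

```r
set.seed(4)
K <- 3; p <- 4
# Stand-in for ML estimates, e.g. the first element of mlfit$coef.stand (K x p)
B_ml <- matrix(rnorm(K * p), K, p)

# Assumed form: CATS group weight for predictor j = 1 / ||B_ml[, j]||_2
w_group <- 1 / sqrt(colSums(B_ml^2))
# Assumed form: lasso weight for a single coefficient = 1 / |B_ml[r, j]|
w_elem <- 1 / abs(B_ml)
```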


threshold

If TRUE, the coefficients are thresholded with an appropriate threshold value. Alternatively, an explicit nonnegative value can be specified to be used as the threshold.
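A minimal sketch of what thresholding with an explicit value does, assuming elementwise hard thresholding (coefficients whose absolute value falls below the threshold are set to zero). threshold_coef is hypothetical; the exact rule MRSP.fit applies, and how it chooses the "appropriate" value when threshold = TRUE, is not documented here.

```r
# Hypothetical helper (not part of MRSP): elementwise hard thresholding
threshold_coef <- function(B, t) {
  stopifnot(is.numeric(t), length(t) == 1, t >= 0)  # threshold must be nonnegative
  B[abs(B) < t] <- 0
  B
}

B <- matrix(c(0.05, -0.3, 0.001, 1), 2, 2)
threshold_coef(B, 0.1)   # entries 0.05 and 0.001 become 0
```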


refit

Should refitting be performed? If TRUE, the model is first fitted as usual and then refitted on the active set found by this first fit. This can improve variable selection, but tends to be slow.


fusion

If fusion penalties are used, this specifies the type of fusion. Not yet supported for end-users of MRSP, but included for compatibility with future releases of MRSP.


nonneg

If TRUE, all coefficients are restricted to be nonnegative.


...

Further arguments or objects to be passed to


Details

This function does the actual work of fitting multinomial response models with structured penalties. It is intended mainly for internal use. The main purpose of function MRSP is to provide a user-friendly wrapper that prepares and evaluates a call to MRSP.fit.


Value

Depending on nrlambda (the number of lambda values), either an object of class MRSP or an object of class MRSP.list, i.e. a list of length nrlambda whose elements are MRSP objects.


Author(s)

Wolfgang Poessnecker


References

Tutz, G., Poessnecker, W. and Uhlmann, L. (2015). Variable Selection in General Multinomial Logit Models. Computational Statistics and Data Analysis, Vol. 82, 207-222.
