A symbolic description of the model to be fitted. The left-hand
side specifies the response, which must be a categorical variable with K
categories. The right-hand-side specifies the covariate structure. See details below.
data
A data frame containing the variables in formula. It must
be in long format. This means that K rows are required for each
individual observation. For details and a possibility to convert to this
format, see mlogit.data from the mlogit package.
The response variable must be either a 0-1-vector or a logical
vector, with 1 or TRUE indicating the observed class/category/alternative.
class.names
An optional character vector of length K specifying
the names of the response classes/categories.
model
An object of class MRSP.model that specifies the model to
be used. Currently, model = multinomlogit() and
model = sequentiallogit() are available, yielding a multinomial or
sequential logit model, respectively. Cumulative logit models will be included
in future versions of MRSP.
constr
The identifiability constraint to be used. The coefficients of
predictors which do not vary over categories (i.e. global/individual-specific
predictors) are not identifiable in (unpenalized) multinomial logit models.
Either an integer in [1,K] or "symmetric" or "none". See
details below.
offset
An optional vector of offsets. Must be of appropriate length.
weights
An optional vector of observations weights. Must be of
appropriate length.
penweights
An object containing weights that modify the penalty terms
on different parameters. See MRSP.fit for details.
standardize
If TRUE, predictors are mean-centered and standardized to
unit variance. The reported coefficients, by default, correspond to the
original covariates.
nrlambda, lambdamin, lambdamax
If a sequence of lambda values is
not specified explicitly via argument lambda, MRSP computes a
suitable grid of length nrlambda, ranging from lambdamin to
lambdamax. If lambdamax is missing, MRSP tries to find
a suitable value for it.
control
An object of class MRSP.control that stores control
information. It's slots max.iter and rel.tol specify the max
number of iterations and the relative change in penalized log-likelihood
values that indicates convergence. The other slots should not be changed
unless by experienced users.
penalty
Specifies the general type of penalty to be applied. FALSE
means no penalty is used. TRUE means that a lasso-type penalty is applied.
"Ridge" applies a ridge penalty. Arguments group.classes,
group.dummies and sparse.groups have no effect unless
penalty = TRUE.
group.classes
If TRUE, lasso-type penalties will be grouped across all
coefficients belonging to the same covariate. This corresponds to the
‘CATS Lasso’ penalty proposed in Tutz, Poessnecker and Uhlmann (2015).
group.dummies
If entries on the rhs of formula refer to factor
variables with more than 2 levels, several dummy variables will enter the
model that are related to the same actual covariate. Setting this argument to
TRUE will yield penalty terms that treat all corresponding coefficients as
one parameter group. This corresponds to the original ‘Group Lasso’
of Yuan and Lin (2006).
sparse.groups, gamma
If TRUE, parameter groups will also be penalized
by an L1 penalty on top of the unsquared L2 penalty. If the L2 penalty uses a
tuning parameter of value lambda, the L1 penalty will use tuning parameter
lambda*gamma.
adaptive
Should adaptive weights be used? Use adaptive="ML" to
obtain the traditional adaptive weights proposed in the literature. Using
adaptive = TRUE computes the penalized estimator with whatever penalty is
specified and no adaptive weights and computes adaptive weights from the output
of this penalized model. The final output is the computed with those
adaptive weights. It is strongly recommended to prefer adaptive="ML"
over adaptive=TRUE.
threshold
If TRUE, the coefficients will be thresholded with an
appropriate threshold value. You can also specify an explicit nonnegative
value to be used as the threshold.
refit
Should refitting be performed? If TRUE, the model is first fit
traditionally, and then refitted on the active set found by this first fit.
This can improve variable selection, but tends to be rather slow and memory-consuming.
lambda
Tuning parameter(s) to be used. Can be a nonnegative scalar or
vector. If missing, MRSP computes a suitable grid of lambda values,
see also nrlambda.
lambdaR
Lambda value(s) for ridge penalties. Same structure as
lambda.
lambdaF
Lambda values(s) for fusion penalties. Same structure as
lambda. Not yet supported for end-users of MRSP, but included
for compatibility with future releases of MRSP.
psi
A numeric that balances the weighting of penalties on global and
class-/category-/alternative-specific predictors (if present) in multinomial
logit models. Penalties on global predictors are weighted with psi,
penalties on class-specific predictors are weighted with (2-psi).
fusion
If fusion penalties are used, this specifies the type of fusion.
Not yet supported for end-users of MRSP, but included
for compatibility with future releases of MRSP.
nonneg
If TRUE, all coefficients are restricted to be nonnegative.
y, X, Z, penindex, grpindex, mlfit
These optional arguments allow to
supply the corresponding objects directly to MRSP.fit. It is highly
recommended not to use those arguments and to use formula instead for
the model specification.
perform.fit
If TRUE, the model is fitted. If FALSE, function MRSP
prepares the call to MRSP.fit and returns a list from which this call
can be accessed.
...
Further arguments to be passed.
Details
For model = multinomlogit(), a formula of the form
“Y ~ x | z1 | z2” yields linear predictors
This means that the x-variables have global values that are class-/category-/alternative-unspecific.
The coefficients belonging to those global variables are not identifiable in the
generic form of the multinomial logit model as given above. Therefore, an identifiability
constraint can be specified with argument constr: By setting constr = r (with r in [1,K]),
category r is chosen as reference category. Technically, this means setting beta_r = 0.
Alternatively, constr="symmetric" specifies a so-called symmetric side constraint, which technically
means imposing that
sum[ beta_sj ]_{s=1,...,K} = 0,
for all j=1,…,p.
If constr = "none", no constraint is used for penalized parameter groups and
identifiability is ensured by the penalty term (see Friedman, Hastie and Tibshirani, 2010).
Coefficients of an unpenalized x-variable are subject to a symmetric side constraint in this case.
The z1- and z2-variables are class-/category-/alternative-specific. The z1-variables
have a global effect while the category-specific z2-variables are equipped with coefficients
that are also category-specific. Note that no identifiability constraints are required for category-specific variables.
For model = sequentiallogit(), a formula of the form “Y ~ x1 | x2”
yields linear predictors of the form
eta_r = beta_0r + x1 ' alpha + x2 ' beta_r
for r=1,…,K-1,
which are connected to conditional probabilities via the logit link in the following way:
P(Y = r | Y >= r, x1, x2) = exp(eta_r) / (1 + exp(eta_r))
for r=1,…,K-1.
Note that in the sequential case, slot "mu" of an MRSP-class object
contains the unconditional class probabilities P(Y = r). If you want to get the
“discrete” hazard rates P(Y = r | Y >= r), use methods fitted or
predict with argument convert2hazard=TRUE.
Value
Depending on nrlambda, either an object of class MRSP or of
class MRSP.list, which are lists of length
nrlambda whose elements are MRSP objects. Additionally, the
attributes “topcall”, “call” and “dat” store, respectively, the call to
MRSP, the call to MRSP.fit which was prepared by MRSP and
the data object to be supplied to MRSP.fit. Note that
dat contains the standardized covariates if standardize = TRUE.
Yuan, M. and Lin, Y. (2006):
Model selection and estimation in regression with grouped variables Journal of the Royal Statistical Society Series B, Vol. 68(1), 49-67.
Examples
## load data
data(TravelMode, package="MRSP")
## bring the response variable to the form required by MRSP
TravelMode$choice <- ifelse(TravelMode$choice=="yes",1,0)
## construct a list of fitted models for different lambda values.
## income is a global predictor, wait is a class-specific predictor with global
## coefficients, vcost and travel are specified as class-specific predictors
## with class-specific coefficients. The fourth category ("car") is chosen as
## reference.
fit <- MRSP(choice~income|wait|vcost+travel, data=TravelMode, constr=4,
class.names=levels(TravelMode$mode), lambdamax=150, nrlambda=10,
group.classes=TRUE, sparse.groups=FALSE, adaptive="ML")
fit
## slots can be extracted from all elements via function 'extract':
BICs <- extract(fit, "BIC")
lambdagrid <- extract(fit, "lambda")
## to select a concrete lambda/model, one can use function 'select'. Here, we
## chose the best model according to its AIC value
bestfit <- select(fit, "AIC")
bestfit
## some methods:
summary(bestfit)
BIC(bestfit)
fitted(bestfit)[1:6,]
bestfit@coef
## get the coefficients belonging to standardized predictors:
coef(bestfit, type="stand")
residuals(bestfit)
predict(bestfit, newdata = TravelMode[1:40,c(4,5,6,8)])
## plot some coefficient paths:
par(mfrow=c(1,2))
## you can either specify the number of the variable...
plot(fit, 2, legendpars=list(x="bottomright"))
## ... or its name as a character string. lcex is the cex parameter for legends.
## set it to zero to disable legend plotting.
plot(fit,"travel", lambda = bestfit@lambda, lcex=0)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(MRSP)
Loading required package: parallel
Loading required package: compiler
Loading required package: matrixcalc
Loading required package: Formula
-----------------------------
This is MRSP version 0.4.3
Author: Wolfgang Poessnecker
Date: December 09, 2014
-----------------------------
Attaching package: 'MRSP'
The following object is masked from 'package:stats':
coefficients
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/MRSP/MRSP.Rd_%03d_medium.png", width=480, height=480)
> ### Name: MRSP
> ### Title: Multinomial Response Models with Structured Penalties
> ### Aliases: MRSP MRSP,ANY-method
> ### Keywords: Regularization Penalization Group Lasso Multinomial Logit
> ### Model CATS Lasso
>
> ### ** Examples
>
>
> ## load data
> data(TravelMode, package="MRSP")
> ## bring the response variable to the form required by MRSP
> TravelMode$choice <- ifelse(TravelMode$choice=="yes",1,0)
>
> ## construct a list of fitted models for different lambda values.
> ## income is a global predictor, wait is a class-specific predictor with global
> ## coefficients, vcost and travel are specified as class-specific predictors
> ## with class-specific coefficients. The fourth category ("car") is chosen as
> ## reference.
> fit <- MRSP(choice~income|wait|vcost+travel, data=TravelMode, constr=4,
+ class.names=levels(TravelMode$mode), lambdamax=150, nrlambda=10,
+ group.classes=TRUE, sparse.groups=FALSE, adaptive="ML")
>
> fit
Model object of class 'MRSP.list'.
A list of 10 objects of class 'MRSP', with additional attributes 'topcall', 'call' and 'dat'.
Topcall:
MRSP(formula = choice ~ income | wait | vcost + travel, data = TravelMode,
class.names = c("air", "train", "bus", "car"), constr = 4,
nrlambda = 10, lambdamax = 150, group.classes = TRUE, sparse.groups = FALSE,
adaptive = "ML", model = expression(multinomlogit()))
Model type:
Multinomial Logit Model
Lambda values ranging from 0.01 to 150.
>
> ## slots can be extracted from all elements via function 'extract':
> BICs <- extract(fit, "BIC")
> lambdagrid <- extract(fit, "lambda")
>
>
> ## to select a concrete lambda/model, one can use function 'select'. Here, we
> ## chose the best model according to its AIC value
> bestfit <- select(fit, "AIC")
> bestfit
Model object of class 'MRSP'.
Call:
MRSP(formula = choice ~ income | wait | vcost + travel, data = TravelMode,
class.names = c("air", "train", "bus", "car"), constr = 4,
nrlambda = 10, lambdamax = 150, group.classes = TRUE, sparse.groups = FALSE,
adaptive = "ML", model = expression(multinomlogit()))
Model type:
Multinomial Logit Model
Lambda value:
0.7178613
LogLik: -165.3431 approx. edf: 14.38038
AIC: 359.4469 BIC: 407.5796
>
> ## some methods:
> summary(bestfit)
$DevianceResiduals
Min. 1st Qu. Median Mean 3rd Qu. Max.
-2.14700 -0.62580 0.00000 -0.03904 0.61240 1.96800
$BrierScore
[1] 0.09532651
$loglik
[1] -165.3431
$coef
[[1]]
(Intercept) income
air 4.058384 0.009140216
train 4.901615 -0.041940503
bus 3.679695 -0.021976987
car 0.000000 0.000000000
[[2]]
wait vcost travel
air -0.09233675 0.02182513 -0.028462195
train -0.09233675 -0.02718600 -0.002740459
bus -0.09233675 -0.01683399 -0.003720115
car -0.09233675 -0.03194881 -0.003764431
attr(,"class")
[1] "MRSP.coef"
$edf
[1] 14.38038
$AIC
[1] 359.4469
$BIC
[1] 407.5796
> BIC(bestfit)
[1] 407.5796
> fitted(bestfit)[1:6,]
air train bus car
1 0.04056739 0.29485327 0.14273964 0.5218397
5 0.20929042 0.21228965 0.04522356 0.5331964
9 0.50786025 0.06676395 0.10157734 0.3237985
13 0.21186590 0.03883452 0.01716587 0.7321337
17 0.09253363 0.30823353 0.08522939 0.5140035
21 0.03303183 0.43104541 0.22685141 0.3090714
> bestfit@coef
[[1]]
(Intercept) income
air 4.058384 0.009140216
train 4.901615 -0.041940503
bus 3.679695 -0.021976987
car 0.000000 0.000000000
[[2]]
wait vcost travel
air -0.09233675 0.02182513 -0.028462195
train -0.09233675 -0.02718600 -0.002740459
bus -0.09233675 -0.01683399 -0.003720115
car -0.09233675 -0.03194881 -0.003764431
attr(,"class")
[1] "MRSP.coef"
> ## get the coefficients belonging to standardized predictors:
> coef(bestfit, type="stand")
Global predictors with category-specific coefficients:
(Intercept) income
air 5.064762 0.1797362
train 4.177966 -0.8247318
bus 3.663882 -0.4321627
car 0.000000 0.0000000
Class-specific predictors with category-specific coefficients:
vcost travel
air 0.7060807 -8.5745102
train -0.8795141 -0.8255895
bus -0.5446089 -1.1207203
car -1.0335993 -1.1340712
Category-specific predictors with global coefficients:
wait
-2.302302
> residuals(bestfit)
[1] 0.8064706 0.7930104 1.0619011 0.5583835 -0.8157973 -0.9173559
[7] -0.1333833 -1.5687646 -0.8609495 0.6567114 0.8555428 -0.5893411
[13] -0.4888737 0.7175009 1.2069469 0.3974495 -0.4237051 0.3174823
[19] -0.8682550 0.6844680 0.3406249 0.7902191 -1.2472055 0.4020283
[25] 0.2506884 0.0787724 -1.4529171 -0.8374900 -1.3786842 0.1272865
[31] -0.1914995 0.5345890 1.6882755 0.2325071 -0.4219568 -0.3238668
[37] 1.0610435 0.4779211 1.1995639 -0.1153891 -1.2255908 0.1254288
[43] -0.3368995 -0.2853178 -0.6718128 0.2499521 0.8076601 -0.9634196
[49] 1.1523811 0.0000000 1.9682518 -1.2997661 0.3959652 0.9164251
[55] -0.2336924 0.5447951 -0.5730134 0.4366256 1.0870673 -1.7079267
[61] 0.6096139 0.1717499 0.3703989 0.1252323 -1.9355285 0.4213976
[67] 0.6392411 1.2750483 1.5538752 -1.7580564 0.2663204 -0.5015661
[73] 0.1954144 1.5278638 0.7881882 -1.1603823 0.0000000 0.0000000
[79] -1.4931815 0.8396534 -1.9990310 -0.4138140 -0.1236986 0.8823712
[85] 0.2951353 -0.6247501 0.7961529 -0.2308948 -0.6465080 -0.5028100
[91] -0.5439674 0.2405083 -0.6800815 -0.1623978 -0.1477514 -0.4648180
[97] -0.4713474 -0.2277062 -1.4761235 -0.5013124 0.6127915 -0.6409052
[103] -1.2475948 -1.1437167 0.0000000 -0.7078344 0.9339288 -0.8444365
[109] 0.4133728 0.5479171 -0.5991515 0.5410789 0.8769415 -0.4004860
[115] -1.1920890 0.7764450 1.2559201 0.7396484 -0.8082189 -0.2418968
[121] 0.4905683 0.0000000 -0.4291783 0.4777202 0.3460029 -0.9225956
[127] 0.7368706 -0.9785930 0.7510478 -0.2272649 0.6176078 0.7208953
[133] -0.5982509 -0.8073537 0.8437302 -0.8603294 -1.0425202 0.5999197
[139] -1.5422742 0.8953472 0.5453318 -0.5703076 -2.1465929 0.7117663
[145] 0.2318330 0.6355097 -0.3528943 0.2961520 -0.6317435 -1.1928950
[151] -0.3845051 0.0000000 -1.0729519 -0.8617517 -1.1885595 0.6110338
[157] -0.7143971 -0.1793287 -2.0680139 0.6073485 -0.8648402 -1.7442785
[163] -0.4130425 0.5052354 0.5643808 -1.1339990 -0.5399943 0.4621152
[169] 1.0521807 -0.6264310 -0.4598958 1.5283956 -0.5006444 0.5344999
[175] 0.8857221 -0.6260981 0.6019520 0.5255920 -1.7802427 0.1819760
[181] -0.9829050 1.0202941 0.6866753 1.2727398 0.7422477 0.7230441
[187] -0.5247940 -0.1888991 0.7785051 0.0000000 1.2933743 -0.4020893
[193] 0.2433293 -0.7354390 -0.2303571 0.2504183 1.0215813 0.2000165
[199] -0.1016938 0.1356023 -0.5574834 1.0444842 0.5024799 0.5163399
[205] 0.9668138 -1.2575845 -0.4835862 -1.7208817 1.2960768 0.0000000
> predict(bestfit, newdata = TravelMode[1:40,c(4,5,6,8)])
air train bus car
1 0.04056739 0.294853271 0.142739642 0.52183970
5 0.20929042 0.212289646 0.045223563 0.53319637
9 0.50786025 0.066763950 0.101577341 0.32379846
13 0.21186590 0.038834523 0.017165867 0.73213371
17 0.09253363 0.308233528 0.085229388 0.51400346
21 0.03303183 0.431045410 0.226851412 0.30907135
25 0.98236622 0.002425219 0.004428456 0.01078011
29 0.16269011 0.595481350 0.156480890 0.08534765
33 0.05429331 0.305671419 0.163509771 0.47652550
37 0.17209535 0.089646893 0.088574224 0.64968353
>
> ## plot some coefficient paths:
> par(mfrow=c(1,2))
> ## you can either specify the number of the variable...
> plot(fit, 2, legendpars=list(x="bottomright"))
> ## ... or its name as a character string. lcex is the cex parameter for legends.
> ## set it to zero to disable legend plotting.
> plot(fit,"travel", lambda = bestfit@lambda, lcex=0)
>
>
>
>
>
>
> dev.off()
null device
1
>