R: Compute the trend filtering solution path for any polynomial...
trendfilter
R Documentation
Compute the trend filtering solution path for any polynomial order
Description
This function computes the solution path for the trend filtering
problem of an arbitrary polynomial order. When the order is set to
zero, trend filtering is equivalent to the 1d fused lasso, see
fusedlasso1d.
an optional numeric vector specifying the positions of the
observations, and missing pos is assumed to mean unit spacing.
X
an optional matrix of predictor variables, with observations along
the rows, and variables along the columns. If the passed X
has more columns than rows, then a warning is given, and a small ridge
penalty is added to the generalized lasso criterion before the path
is computed. If X has less columns than rows, then its rank is
not checked for efficiency, and (unlike the genasso function) a
ridge penalty is not automatically added if it is rank deficient.
Therefore, a tall, rank deficient X may cause errors.
ord
an integer specifying the desired order of the piecewise polyomial
produced by the solution of the trend filtering problem. Must be
non-negative, and the default to 1 (linear trend filtering).
approx
a logical variable indicating if the approximate solution path
should be used (with no dual coordinates leaving the boundary).
Default is FALSE. Note
that for the 1d fused lasso (zeroth order trend filtering), with
identity predictor matrix, this approximate path is the same as
the exact solution path.
maxsteps
an integer specifying the maximum number of steps for the algorithm
to take before termination. Default is 2000.
minlam
a numeric variable indicating the value of lambda at which the path
should terminate. Default is 0.
rtol
a numeric variable giving the tolerance for determining the rank of
a matrix: if a diagonal value in the R factor of a QR decomposition
is less than R, in absolute value, then it is considered zero. Hence
making rtol larger means being less stringent with determination of
matrix rank. In general, do not change this unless you know what you
are getting into! Default is 1e-7.
btol
a numeric variable giving the tolerance for accepting "late" hitting
and leaving times: future hitting times and leaving times should always
be less than the current knot in the path, but sometimes for numerical
reasons they are larger; any computed hitting or leaving time larger
than the current knot + btol is thrown away. Hence making btol larger
means being less stringent withthe determination of hitting and leaving
times. Again, in general, do not change this unless you know what you
are getting into! Default is 1e-7.
eps
a numeric variable indicating the multiplier for the ridge penalty,
in the case that X is wide (more columns than rows). If numeric
problems occur, make eps larger. Default is 1e-4.
verbose
a logical variable indicating if progress should be reported after
each knot in the path.
Details
When the predictor matrix is the identity, trend filtering fits a
piecewise polynomial to linearly ordered observations. The result is
similar to that of a polynomial regression spline or a smoothing
spline, except the knots in the piecewise polynomial (changes in the
(k+1)st derivative, if the polynomial order is k) are chosen
adaptively based on the observations. This is in contrast to
regression splines, where the knots are prespecified, and smoothing
splines, which place a knot at every data point.
With a nonidentity predictor matrix, the trend filtering problem
enforces piecewise polynomial smoothness along successive components
of the coefficient vector. This can be used to fit a kind of varying
coefficient model.
We note that, in the signal approximator (identity predictor matrix)
case, fitting trend filtering estimate with arbitrary positions pos
is theoretically no harder than doing so on an evenly spaced grid. However
in practice, with differing gaps between points, the algorithm can
become numerically unstable even for large (or moderately large) problems.
This is especially true as the polynomial order increases. Hence, use the
positions argument pos with caution.
Value
Returns an object of class "trendfilter", a subclass of
"genlasso". This is a list with at least following components:
lambda
values of lambda at which the solution path changes slope,
i.e., kinks or knots.
beta
a matrix of primal coefficients, each column corresponding to a knot
in the solution path.
fit
a matrix of fitted values, each column corresponding to a knot in
the solution path.
u
a matrix of dual coefficients, each column corresponding to a knot
in the solution path.
hit
a vector of logical values indicating if a new variable in the dual
solution hit the box contraint boundary. A value of FALSE
indicates a variable leaving the boundary.
df
a vector giving an unbiased estimate of the degrees of freedom of
the fit at each knot in the solution path.
y
the observed response vector. Useful for plotting and other
methods.
completepath
a logical variable indicating whether the complete path was
computed (terminating the path early with the maxsteps or
minlam options results in a value of FALSE).
bls
the least squares solution, i.e., the solution at lambda = 0. This
can be NULL when completepath is FALSE.
ord
the order of the piecewise polyomial that has been fit.
call
the matched call.
Author(s)
Taylor B. Arnold and Ryan J. Tibshirani
References
Tibshirani, R. J. and Taylor, J. (2011), "The solution path of the
generalized lasso", Annals of Statistics 39 (3) 1335–1371.
Tibshirani, R. J. (2014), "Adaptive piecewise polynomial estimation
via trend filtering", Annals of Statistics 42 (1): 285–323.
Arnold, T. B. and Tibshirani, R. J. (2014), "Efficient implementations
of the generalized lasso dual path algorithm", arXiv: 1405.3222.
Kim, S.-J., Koh, K., Boyd, S. and Gorinevsky, D. (2009), "l1 trend
filtering", SIAM Review 51 (2), 339–360.