Exact significance tests for a changepoint in
linear or multiple linear regression. Confidence intervals
and confidence regions with exact coverage
probabilities for the changepoint.
formula
a formula expression as for regression models, of
the form response ~ predictors; see formula.
type
"LL", "LT" or "TL" which stand for line-line,
line-threshold or threshold-line, defined below.
data
an optional data-frame that assigns values in
formula.
subset
expression saying which subset of the data to use.
weights
a vector or matrix of weights; a weights matrix must be
positive-definite.
inverse
if TRUE then 'weights' specifies the inverse of the
weights vector or matrix, as for a covariance matrix.
var.known
TRUE if the variance is known.
na.action
a function to filter missing data.
contrasts
an optional list; see 'contrasts.arg' in
model.matrix.
offset
a constant vector to be subtracted from the
response vector.
...
other arguments to lm.fit or
lm.wfit.
Details
A broken-line model consists of two straight lines joined at a
changepoint. The three versions are:
LL y = alpha + B * min(x - theta, 0) + Bp * max(x - theta, 0) + e
LT y = alpha + B * min(x - theta, 0) + e
TL y = alpha + Bp * max(x - theta, 0) + e
where e ~ Normal( 0, var * inv(weights) ). The LT and TL versions
omit 'alpha' if the formula is without intercept, such as 'y~x+0'.
Parameters 'theta', 'alpha', 'B', 'Bp', 'var' are unknown, but
'weights' is known.
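
As an illustration only, the mean functions of the three models can be
sketched in base R. The helper name 'broken_line_mean' and the parameter
values are hypothetical and are not part of the package; the error term
'e' is omitted.

```r
# Mean functions of the three broken-line models (error term e omitted).
# 'B' is the slope to the left of the changepoint 'theta',
# 'Bp' is the slope to the right of it.
broken_line_mean <- function(x, theta, alpha, B = 0, Bp = 0,
                             type = c("LL", "LT", "TL")) {
  type  <- match.arg(type)
  left  <- pmin(x - theta, 0)   # min(x - theta, 0), elementwise
  right <- pmax(x - theta, 0)   # max(x - theta, 0), elementwise
  switch(type,
         LL = alpha + B * left + Bp * right,  # line-line
         LT = alpha + B * left,               # line-threshold
         TL = alpha + Bp * right)             # threshold-line
}

# Hypothetical parameter values, changepoint at x = 3
x     <- c(1, 2, 3, 4, 5)
mu_LL <- broken_line_mean(x, theta = 3, alpha = 10, B = 2, Bp = -1, type = "LL")
mu_LT <- broken_line_mean(x, theta = 3, alpha = 10, B = 2, type = "LT")
mu_TL <- broken_line_mean(x, theta = 3, alpha = 10, Bp = -1, type = "TL")
```

In every version the mean equals 'alpha' at x = theta, so the two pieces
join continuously at the changepoint.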
The same models apply for a multiple-regression formula such as 'y ~ x1 +
x2 + ... + xn' where 'alpha' becomes the coefficient of the
"1"-vector and 'theta' the changepoint for the coefficient of the
first predictor term, 'x1'.
To test for the presence of a changepoint, postulate a changepoint
value outside the range of the 'x'-values. Thus, in the
LL model 'sl( min(x1) - 1 )' would give the exact significance
level of the null hypothesis "single line" versus the alternative
hypothesis "broken line."
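
A small base-R check, with made-up numbers, of why a postulated changepoint
outside the data range corresponds to the "single line" hypothesis: with
theta0 below every 'x', the term min(x - theta0, 0) vanishes and the LL
mean reduces to one straight line. All values below are hypothetical.

```r
x      <- c(2, 4, 5, 7, 9)
theta0 <- min(x) - 1            # postulated changepoint left of all x-values

left  <- pmin(x - theta0, 0)    # all zero: no observation lies left of theta0
right <- pmax(x - theta0, 0)    # equals x - theta0 for every observation

# Hypothetical LL parameters; B has no effect because 'left' is all zero
alpha <- 1; B <- 3; Bp <- 0.5
mu_broken <- alpha + B * left + Bp * right

# The same means written as a single line with intercept alpha - Bp * theta0
mu_single <- (alpha - Bp * theta0) + Bp * x
```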
Exact inferences about the changepoint
'theta' or '(theta,alpha)' are based on the distribution of its
likelihood-ratio statistic, conditional on sufficient statistics
for the other parameters. This method is called the conditional
likelihood-ratio (CLR) method.
Value
'lm.br' returns a list that includes a C++ object with accessor
functions. The functions 'sl', 'ci' and 'cr' return significance levels,
confidence intervals, and confidence regions for the changepoint's
x-coordinate or (x,y)-coordinates. The function 'mle' returns maximum
likelihood estimates, and 'sety' sets new y-values.
The returned object also lists 'coefficients', 'fitted.values' and 'residuals', the same as for an 'lm' output list.
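
A minimal usage sketch on simulated data. It assumes that the accessor
functions are reached with '$' on the returned list and that 'ci' and
'mle' can be called without arguments; neither is stated above, so check
both against the installed package. The data and parameter values are
invented for illustration.

```r
# Simulated LL data with a changepoint at x = 6 (hypothetical values)
set.seed(1)
x <- seq(1, 10, length.out = 30)
y <- 5 + 1.5 * pmin(x - 6, 0) - 0.5 * pmax(x - 6, 0) + rnorm(30, sd = 0.3)

# Run only if the package is installed
if (requireNamespace("lm.br", quietly = TRUE)) {
  mod <- lm.br::lm.br(y ~ x, type = "LL")
  mod$mle()                  # maximum likelihood estimates
  mod$ci()                   # confidence interval for the changepoint (assumed defaults)
  mod$sl(min(x) - 1)         # significance level of "single line" vs "broken line"
  head(mod$fitted.values)    # same component as in an 'lm' output list
}
```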
Note
Data can include more than one 'y' value for the same 'x' value. The 'weights' matrix must be positive-definite.
If the variance is known, then 'var' = 1 and 'weights' is the elementwise
inverse of the vector of variances, or the inverse of the
variance-covariance matrix.
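
A base-R sketch of preparing 'weights' in the known-variance case. The
variances and the covariance matrix 'Sigma' are hypothetical, and the
positive-definite check shown is one common approach, not necessarily
the one the package performs.

```r
# Independent errors with known variances: weights are the reciprocals
v <- c(0.5, 1.0, 2.0)          # hypothetical known variances
w <- 1 / v

# Correlated errors with a known variance-covariance matrix
Sigma <- matrix(c(2.0, 0.5, 0.0,
                  0.5, 1.0, 0.3,
                  0.0, 0.3, 1.5), nrow = 3, byrow = TRUE)

# A symmetric matrix is positive-definite exactly when
# all of its eigenvalues are positive
is_pd <- isSymmetric(Sigma) &&
  all(eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values > 0)

W <- solve(Sigma)              # 'weights' = inverse of the covariance matrix
```

Per the 'inverse' argument, setting it to TRUE indicates that 'weights'
holds the inverse of the weights, as for a covariance matrix, so 'Sigma'
itself could then be supplied instead of 'W'.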
References
Knowles, M., Siegmund, D. and Zhang, H.P. (1991) Confidence regions
in semilinear regression, _Biometrika_, *78*, 15-31.
Siegmund, D. and Zhang, H.P. (1994) Confidence regions in
broken line regression, in "Change-point Problems", _IMS
Lecture Notes – Monograph Series_, *23*, eds. E. Carlstein, H.
Muller and D. Siegmund, Hayward, CA: Institute of Mathematical
Statistics, 292-316.