Last data update: 2014.03.03

R: Change-Point Estimation using Shape-Restricted Splines
changeptR Documentation

Change-Point Estimation using Shape-Restricted Splines

Description

'changept' is used to estimate the location of the change-point in the underlying mean curve of a scatterplot and the curve as well using shape-restricted B-splines. The change-point falls in one of the three categories: a mode, an inflection point and a jump point.

Usage

changept(formula, family = gaussian(), data = NULL, k = NULL, knots = NULL, 
fir = FALSE, q = 3, pnt = FALSE, pen = 0, arp = FALSE, ci = FALSE, nloop = 1e+3, 
constr = TRUE, param = TRUE, gcv = FALSE)

Arguments

formula

A formula object which gives a symbolic description of the model to be fitted. It has the form "response ~ predictor". The response is a vector of length n and it is from one of the three exponential families: Gaussian, Poisson and binomial. The predictor is a nonparametrically modelled variable with a shape restriction. The user is supposed to indicate the relationship between the underlying mean curve old{f_m(x)} and the predictor old{x} in the following way:

Assume that old{f_m(x)} is the underlying mean curve and old{x} is the predictor:

  • tp(x, sh = 1): old{f_m(x)} is unimodal (increasing-decreasing) with a mode at m. See tp for more details.

  • ip(x, sh = 1): old{f_m(x)} is convex-concave with an inflection point at m. See ip for more details.

  • jp(x, trd1 = -1, trd2 = -1, up = TRUE): old{f_m(x)} has a jump point at m and smooth otherwise. It is non-increasing in a subinterval which is made of two consecutive knots and contains the jump point m. t_{p} ≤q m ≤q t_{p+1}, p=1,…,k-1 and k is the number of knots. See jp for more details.

family

A parameter indicating the error distribution and link function to be used in the model. It can be a character string naming a family function or the result of a call to a family function. This is borrowed from the glm routine in the stats package. There are three families used in changept: Gaussian, Poisson and binomial.

data

An optional data frame, list or environment containing the variables in the model. The default is data = NULL.

k

The number of knots. When it is NULL, k will be determined inside the changept routine according to the length of the predictor old{x}; otherwise, the user-defined k is used to create the knots. The default is NULL.

knots

The knots used to smoothly constrain a predictor. When it is NULL, a vector of knots will be created inside the changept routine; otherwise, the user-defined knots are used. The default is NULL. If both k and knots are defined by the user, only the knots parameter will be used.

fir

The parameter is used only when the change-point is an inflection point. When fir is TRUE, the estimated curve is constrained to be increasing at two end points. This is useful given the a priori information that the underlying mean curve is increasing over the whole interval, e.g., a growth curve. The default is FALSE.

q

The degree of penalty. It can be 1, 2 or 3. The default is q = 3.

pnt

Logical flag indicating if or not penalized B-splines are used. When pnt is TRUE, penalized B-splines are used; otherwise, unpenalized B-splines are used. The default is pnt = FALSE.

pen

The penalty term when penalized B-splines are used. It can only be a non-negative real number. When the user chooses penalized B-splines, the penalty term is determined inside the main routine if it is not defined by the user; otherwise, the user-defined value is used. The default is pen = 0.

arp

Logical flag indicating if or not the error term is from a stationary autoregressive process of order p, where pin{1,2,3,4}. When arp is TRUE, the error term is assumed to follow an AR(p) process; otherwise, it is assumed to be Gaussian. The default is arp = FALSE.

ci

Logical flag. If ci is TRUE, a 95% bootstrap confidence interval of m will be delivered. The default is ci = FALSE.

nloop

The number of simulations used to get a bootstrap confidence interval for the change-point m when ci is TRUE. The default is nloop = 1e+3.

constr

Logical flag. It is used when m is a jump point. If constr is FALSE, no shape constraint is imposed on the estimated curve; otherwise, a shape constraint is imposed on the estimated curve in a subinterval which is made of two consecutive knots and contains the jump point m. t_{p} ≤q m ≤q t_{p+1}, p=1,…,k-1 and k is the number of knots. See jp for more details. The default is constr = TRUE.

param

Logical flag indicating if or not parametric bootstrapping is used. When param is TRUE, parametric bootstrapping is used; otherwise, nonparametric bootstrapping is used. The default is param = TRUE.

gcv

Logical flag indicating if or not the penalty term may be chosen through generalized cross-validation (GCV) methods when pen = TRUE. The default is gcv = FALSE. See Meyer, M.C. (2012) Constrained Penalized Splines. Canadian Journal of Statistics 40(1), 190–206 for more details.

Details

Given a scatterplot of (x_i, y_i), i = 1,…,n, where old{x} is the predictor and old{y} is the response which could be Gaussian, Poisson and binomial. The underlying mean curve in this scatterplot is denoted as old{f_m(x)}, where old{f_m(x)} is a smooth curve with a change-point m which falls in one of the three categories:

  • a mode in a unimodal smooth curve

  • an inflection point in a convex-concave (concave-convex) smooth curve

  • a jump point in an otherwise smooth curve.

    For the unimodal case, quadratic B-splines are used; for the inflection-point case, cubic B-splines are used; for the jump-point case, quadratic B-splines along with two extra basis functions, i.e., the ramp basis function and the jump basis function are used. The ramp basis function ensures that the slope of the estimated mean curve on the left of the jump point could be different from that on the right of the jump point. The jump basis function ensures that the estimated mean curve has an upward or downward jump.

    In each case, the user can choose unpenalized or penalized regression. For the unimodal case, the penalized regression is strongly suggested for its smoothness and robustness in terms of the number and the placement of knots, especially when the response is binomial; for the jump-point case, the user can choose to make a shape-restricted fit or not by setting the parameter constr to be TRUE or FALSE. If constr is TRUE, a jump point will be detected and the monotonicity of the mean curve will be constrained in a subinterval which is made of two consecutive knots and contains the jump point m; otherwise, a jump point will be detected without any shape restriction on the mean curve.

    A parametrically modelled categorical covariate can be included in the model for the unimodal case and the jump-point case. However, it cannot be included in the model for the inflection-point case if the response is binomial, since the problem of estimating an inflection point is not well-defined then.

    In addition, the user can get a 95% bootstrap confidence interval of m by setting the logical parameter ci to be TRUE. The default number of simulations to get such a confidence interval is 1e+3, which is defined by the parameter nloop. The genetic routine plot can be used to show some estimated results including hat{f}_{hat{m}}, hat{m} and a 95% bootstrap confidence interval of m if it is available.

    See references cited in this section for more details.

Value

chpt

The estimated change-point hat{m}.

knots

The knots used in the regression.

fhat

The estimated mean curve hat{f}_{hat{m}}.

fhat_x

The estimated mean curve hat{f}_{hat{m}} in the constrained predictor.

fhat_eta

The estimated systematic component given that the response is binomial or Poisson and the change-point is a mode or a jump point.

fhat_eta_x

The estimated systematic component in the constrained predictor given that the response is binomial or Poisson and the change-point is a mode or a jump point.

coefs

The estimated coefficients of the B-spline basis functions for the shape-restricted predictor and the matrix whose columns represent the parametrically modelled categorical covariate if the user includes any in the model.

zcoefs

The estimated coefficients of the matrix whose columns represent the parametrically modelled categorical covariate if the user includes any in the model.

cibt

A 95% bootstrap confidence interval of m given that the parameter ci is TRUE.

categ

The category of the change-point to be estimated. Three categories are included, i.e., a mode, an inflection point and a jump point.

sh

The sh parameter defined in the symbolic routine tp or ip in a changept formula.

x

The shape-restricted predictor vector.

y

The response vector.

xnm

The name of the shape-restricted predictor.

znm

The name of the parametrically modelled categorical covariate.

ynm

The name of the response variable.

m_i

The centering value for the ramp edge in the jump-point case.

pos

The position of the knot t_p such that t_{p} ≤q m ≤q t_{p+1}, p=1,…,k-1 and k is the number of knots.

sub

The subinterval in which constrained regression is made for the jump-point case. It is of the form [t_{p}, t_p+1]. t_{p} ≤q m ≤q t_{p+1}, p=1,…,k-1 and k is the number of knots.

family

The family parameter in a changept formula.

wt.iter

Logical flag indicating if or not iteratively re-weighted cone projections may be used. If the response is Gaussian, then wt.iter = FALSE; if the response is binomial or Poisson, then wt.iter = TRUE.

zmat

A matrix whose columns represent the parametrically modelled categorical covariate if the user includes any in the model. The user can choose to include a constant vector in it or not. It must be of full column rank.

vals

A vector storing the smallest value of each categorical variable defined as a factor(matrix).

is_fac

Logical flag showing if a categorical variable is a factor(matrix) or not.

zid

A vector keeping track of the position of the categorical variable in a changept formula.

zid1

A vector keeping track of the beginning position of the levels(columns) of the categorical variable defined as a factor(matrix).

zid2

A vector keeping track of the end position of the levels(columns) of the categorical variable defined as a factor(matrix).

tms

The terms object extracted by the generic function terms from a changept fit. See the official help page (http://stat.ethz.ch/R-manual/R-patched/library/stats/html/terms.html) of the terms function for more details.

msbt

A bootstrap distribution of m given that the parameter ci is TRUE.

bmat

A matrix whose columns are the B-spline basis functions for the shape-restricted predictor.

phi

The estimated vector of autoregressive coefficients when the error term is assumed to follow an AR(p) process.

sig

The estimated variance of the white noise of the error term when the error term is assumed to follow an AR(p) process.

aics

A matrix of AIC values delivered when the error term is assumed to follow an AR(p) process.

lambda

The penalty term chosen when penalized B-splines are used.

edf

The constrained effective degrees of freedom.

edfu

The unconstrained effective degrees of freedom.

lams

A vector of penalty terms which is necessary when penalized B-splines are used for the AR(p) case.

phisbt

The bootstrap distribution of the vector of estimated autoregressive coefficients when the error term is assumed to follow an AR(p) process.

sigsbt

The bootstrap distribution of the estimated variance of the white noise of the error term when the error term is assumed to follow an AR(p) process.

Author(s)

Xiyue Liao and Mary C Meyer

References

Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.

Meyer, M.C. (2012) Constrained Penalized Splines. Canadian Journal of Statistics 40(1), 190–206.

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

Liao, X. and M. C. Meyer (2014) coneproj: An R package for the primal or dual cone projections with routines for constrained regression. Journal of Statistical Software 61(12), 1–22.

Wang, H., Meyer, M.C., and Opsomer, J.D. (2013) Constrained Spline Regression in the Presence of AR(p) Errors. Journal of Nonparametic Statistics 25(4), 809–827.

See Also

tp, ip, jp

Examples

  # Example 1: the response is normal 
  # the underlying mean curve has an inflection point at .5 and it is concave-convex
  n = 100
  inflect = .5
  x = seq(1/n, 1, length = n)
  
  set.seed(123)
  y = 50 * (x - inflect)^3 + rnorm(n, sd = 1)
  ans = changept(y ~ ip(x, sh = -1))

  plot(ans)

  # Example 2: the response is binomial
  # the underlying mean curve has an inflection point at .5 and it is convex-concave
  n = 100
  x = seq(1/n, 1, length = n)
  mu = exp(8 * x - 4) / (1 + exp(8 * x - 4))

  set.seed(123)
  y = rbinom(n, size = 1, prob = mu)
  ans = changept(y ~ ip(x), family = binomial())

  plot(ans)

  # Example 3: the response is normal
  # the underlying mean curve is unimodal with a mode at .5
  n = 100
  SD = .1
  x = seq(1/n, 1, length = n)
  mode = .5

  set.seed(123)
  y = -6 * (x - .5)^2 + rnorm(n, sd = SD)
  ans = changept(y ~ tp(x))

  plot(ans)

  # Example 4: the response is binomial
  # the underlying mean curve is unimodal with a mode at .5
  n = 200
  x = seq(1/n, 1, length = n)
  mu = .9 - 2.5 * (x - .5)^2

  set.seed(123)
  y = rbinom(n, size = 1, prob = mu)
  ans = changept(y ~ tp(x), family = binomial(), k = 10, pnt = TRUE)

  plot(ans)

  # Example 5: 
  library(datasets)
  # Nile River Data Set: Nile river flow is the response
  # an abrupt downward jump in the river flow occurred around the year 1898
  # the river flow is non-decreasing on both sides of the jump point
  data(Nile)
  yrs = 1871:1970
  ans = changept(Nile ~ jp(yrs, up = FALSE, trd1 = 1, trd2 = 1), data = Nile)

  plot(ans)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(ShapeChange)
Loading required package: coneproj
Loading required package: quadprog
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/ShapeChange/changept.Rd_%03d_medium.png", width=480, height=480)
> ### Name: changept
> ### Title: Change-Point Estimation using Shape-Restricted Splines
> ### Aliases: changept
> ### Keywords: changept
> 
> ### ** Examples
> 
>   # Example 1: the response is normal 
>   # the underlying mean curve has an inflection point at .5 and it is concave-convex
>   n = 100
>   inflect = .5
>   x = seq(1/n, 1, length = n)
>   
>   set.seed(123)
>   y = 50 * (x - inflect)^3 + rnorm(n, sd = 1)
>   ans = changept(y ~ ip(x, sh = -1))
> 
>   plot(ans)
> 
>   # Example 2: the response is binomial
>   # the underlying mean curve has an inflection point at .5 and it is convex-concave
>   n = 100
>   x = seq(1/n, 1, length = n)
>   mu = exp(8 * x - 4) / (1 + exp(8 * x - 4))
> 
>   set.seed(123)
>   y = rbinom(n, size = 1, prob = mu)
>   ans = changept(y ~ ip(x), family = binomial())
> 
>   plot(ans)
> 
>   # Example 3: the response is normal
>   # the underlying mean curve is unimodal with a mode at .5
>   n = 100
>   SD = .1
>   x = seq(1/n, 1, length = n)
>   mode = .5
> 
>   set.seed(123)
>   y = -6 * (x - .5)^2 + rnorm(n, sd = SD)
>   ans = changept(y ~ tp(x))
> 
>   plot(ans)
> 
>   # Example 4: the response is binomial
>   # the underlying mean curve is unimodal with a mode at .5
>   n = 200
>   x = seq(1/n, 1, length = n)
>   mu = .9 - 2.5 * (x - .5)^2
> 
>   set.seed(123)
>   y = rbinom(n, size = 1, prob = mu)
>   ans = changept(y ~ tp(x), family = binomial(), k = 10, pnt = TRUE)
> 
>   plot(ans)
> 
>   # Example 5: 
>   library(datasets)
>   # Nile River Data Set: Nile river flow is the response
>   # an abrupt downward jump in the river flow occurred around the year 1898
>   # the river flow is non-decreasing on both sides of the jump point
>   data(Nile)
>   yrs = 1871:1970
>   ans = changept(Nile ~ jp(yrs, up = FALSE, trd1 = 1, trd2 = 1), data = Nile)
> 
>   plot(ans)
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>