Last data update: 2014.03.03

R: Functions for Forward Search for regression models.
ForwardSearch-packageR Documentation

Functions for Forward Search for regression models.

Description

The Forward Search algorithm is an iterative algorithm for for multiple (time series) regression suggested by Hadi and Simonoff (1993) and developed further by Atkinson and Riani (2000). The algorithm starts with a robust estimate of the regression parameters and a sub-sample of size m_0 and iterates with a sequence of least squares steps. The asymptotic theory developed by Johansen and Nielsen (2013, 2014) is implemented.

Details

Package: ForwardSearch
Type: Package
Version: 1.0
Date: 2014-09-09
License: GPL-3

The Forward Search algorithm is an iterative algorithm for for multiple (time series) regression suggested by Hadi and Simonoff (1993) and developed further by Atkinson and Riani (2000). The algorithm starts with a robust estimate of the regression parameters and a sub-sample of size m_0. A common choice for the initial estimator is the Least Trimmed Squares estimator of Rousseeuw (1984).

The algorithm is initiated by computing the absolute residuals for all n observations. The initial sub-sample consists of the observations with the smallest m_0 absolute residuals. We then run a regression on those m_0 observations and compute absolute residuals of all n observations. The observations with m_0+1 smallest residuals are then selected. The m_0+1 smallest residual is the forward residual. A new regression is performed on these m_0+1 observations. This is then iterated. Eventually the least squares estimator based on all n observations is computed.

The algorithm results in a sequence of forward residuals indexed by the sub-sample size m running from m_0 to n-1. The idea is to monitor the plot of these and stop when the forward residuals become "large". Johansen and Nielsen (2013, 2014) have developed, respectively, pointwise and simultaneous confidence bands for estimators and forward residuals. These are implemented in the package.

The ForwardSearch package can be used as follows.

  1. Execute the full Forward Search using ForwardSearch.fit.

  2. Create the forward plot of the forward residuals using ForwardSearch.plot. This requires the output from above and a choice of reference distribution. The plot shows the scaled forward residuals from above along with simultaneous confidence bands. The user has to choose a "gauge", which is the expected fraction of falsely detected outliers that are tolerable when in fact there are no outliers. For instance a "gauge" of 0.01 indicates that in a sample of n=110 observations 1.1 outlier is found on average when there are none. The simultaneous confidence bands are calibrated so that the Forward Search stop when the fitted values exceed the chosen confidence bands the first time. This is a stopping time. The theory for this is given in Johansen and Nielsen.

  3. Get the estimates of the stopped Forward Search using ForwardSearch.stopped. The user has to input the estimated stopping time. This also gives the rank of the selected and non-selected observations. These are the "good" and the "bad" observations.

Author(s)

Bent Nielsen <bent.nielsen@nuffield.ox.ac.uk> 9 Sep 2014

References

Atkinson, A.C. and Riani, M. (2000) Robust Diagnostic Regression Analysis. New York: Springer.

Hadi, A.S. and Simonoff, J.S. (1993) Procedures for the Identification of Multiple Outliers in Linear Models Journal of the American Statistical Association 88, 1264-1272.

Johansen, S. and Nielsen, B. (2013) Asymptotic analysis of the Forward Search. Download: Nuffield DP.

Johansen, S. and Nielsen, B. (2014) Outlier detection algorithms for least squares time series. Download: Nuffield DP.

Rousseeuw, P.J. (1984) Least median of squares regression. Journal of the American Statistical Association 79, 871-880.

See Also

Forward Search can alternatively be done by the package forward. forward version 1.0.3 includes functions for the analysis suggested in e.g. Atkinson and Riani (2000), but does not include the asymptotic theory of Johansen and Nielsen (2013, 2014). Matlab code for Forward Search is also available from www.riani.it.

Examples

#####################
#	EXAMPLE 1
#	using Fulton Fish data,
#	see Johansen and Nielsen (2014).

#	Call package
library(ForwardSearch)

#	Call data
data(Fulton)
mdata	<- as.matrix(Fulton)
n		<- nrow(mdata)

#	Identify variable to reproduce Johansen and Nielsen (2014)
q		<- mdata[2:n		,9]
q_1		<- mdata[1:(n-1) ,9]
s		<- mdata[2:n		,6]
x.q.s	<- cbind(q_1,s)
colnames(x.q.s	)	<- c("q_1","stormy")

#	Fit Forward Search
FS95	<- ForwardSearch.fit(x.q.s,q,psi.0=0.95)
FS80	<- ForwardSearch.fit(x.q.s,q,psi.0=0.80)

#	Forward plot of forward residuals scaled by variance estimate
#	Note the variance estimate is not bias corrected
#	This is taken into account in asymptotic theory
ForwardSearch.plot(FS95)
ForwardSearch.plot(FS80)

#	Based on the plot of e.g. FS95 it is decided to stop at m=107
ForwardSearch.stopped(FS95,107)

#	Alternatively use the file inst/extdata/Fulton.txt 
#	Data	<- read.table(data/Fulton.txt,header=TRUE)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(ForwardSearch)
Loading required package: robustbase
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/ForwardSearch/ForwardSearch-package.Rd_%03d_medium.png", width=480, height=480)
> ### Name: ForwardSearch-package
> ### Title: Functions for Forward Search for regression models.
> ### Aliases: ForwardSearch-package ForwardSearch
> ### Keywords: package
> 
> ### ** Examples
> 
> #####################
> #	EXAMPLE 1
> #	using Fulton Fish data,
> #	see Johansen and Nielsen (2014).
> 
> #	Call package
> library(ForwardSearch)
> 
> #	Call data
> data(Fulton)
> mdata	<- as.matrix(Fulton)
> n		<- nrow(mdata)
> 
> #	Identify variable to reproduce Johansen and Nielsen (2014)
> q		<- mdata[2:n		,9]
> q_1		<- mdata[1:(n-1) ,9]
> s		<- mdata[2:n		,6]
> x.q.s	<- cbind(q_1,s)
> colnames(x.q.s	)	<- c("q_1","stormy")
> 
> #	Fit Forward Search
> FS95	<- ForwardSearch.fit(x.q.s,q,psi.0=0.95)
> FS80	<- ForwardSearch.fit(x.q.s,q,psi.0=0.80)
> 
> #	Forward plot of forward residuals scaled by variance estimate
> #	Note the variance estimate is not bias corrected
> #	This is taken into account in asymptotic theory
> ForwardSearch.plot(FS95)
> ForwardSearch.plot(FS80)
> 
> #	Based on the plot of e.g. FS95 it is decided to stop at m=107
> ForwardSearch.stopped(FS95,107)
$res
               [,1]
  [1,] -0.643120941
  [2,] -0.276639648
  [3,]  0.364210858
  [4,] -0.475854200
  [5,]  0.662212451
  [6,]  0.151680947
  [7,]  0.370940422
  [8,] -0.443856126
  [9,]  0.529152538
 [10,]  0.183196876
 [11,]  0.212387653
 [12,]  0.557285732
 [13,]  0.847669950
 [14,]  0.332051069
 [15,]  0.693556422
 [16,] -0.628388507
 [17,] -1.963154228
 [18,]  0.261579655
 [19,]  0.841030661
 [20,] -0.494715888
 [21,] -0.358568522
 [22,]  0.203320030
 [23,]  0.996026199
 [24,] -0.258010987
 [25,] -0.120547741
 [26,]  0.851843435
 [27,] -0.220442565
 [28,]  0.683225679
 [29,] -0.147866333
 [30,] -0.590044082
 [31,]  0.215960636
 [32,] -1.144971170
 [33,] -1.823326081
 [34,] -1.193777405
 [35,] -0.200088378
 [36,]  0.500632806
 [37,]  0.559376968
 [38,] -0.015251986
 [39,] -0.064196651
 [40,] -0.563659216
 [41,]  0.044932157
 [42,] -0.298953221
 [43,]  0.719234124
 [44,] -0.823951410
 [45,] -1.064851264
 [46,]  0.301618644
 [47,]  0.283101422
 [48,]  0.395470594
 [49,] -0.434313447
 [50,]  0.340268643
 [51,]  0.330224908
 [52,]  0.780413417
 [53,]  0.984106091
 [54,] -0.652240637
 [55,]  0.010497938
 [56,]  0.321427688
 [57,] -0.369586392
 [58,] -1.036750532
 [59,] -0.768113493
 [60,] -0.293393867
 [61,]  0.473175075
 [62,] -0.084904056
 [63,]  0.478362082
 [64,]  0.270093626
 [65,]  0.482167732
 [66,] -0.193605683
 [67,]  1.277737769
 [68,]  0.342505951
 [69,] -0.640744222
 [70,]  0.969427033
 [71,] -0.355882686
 [72,]  0.991603927
 [73,]  0.368764186
 [74,] -1.315814409
 [75,]  0.011180212
 [76,]  0.585753171
 [77,] -0.875898385
 [78,] -0.412068038
 [79,] -0.687131592
 [80,]  0.744270392
 [81,] -1.065422651
 [82,] -0.530578984
 [83,] -1.221325575
 [84,] -0.072131103
 [85,]  0.636543402
 [86,]  0.806952001
 [87,]  0.181785219
 [88,] -1.450318870
 [89,]  0.826114797
 [90,] -0.038843596
 [91,] -0.134109101
 [92,] -0.047273023
 [93,] -1.327615354
 [94,] -2.403622110
 [95,]  0.695619068
 [96,]  0.020718996
 [97,]  0.538574923
 [98,] -0.471337865
 [99,]  0.057962207
[100,]  0.933140570
[101,] -0.247668393
[102,] -0.538847931
[103,] -0.004494437
[104,]  0.387341099
[105,]  0.263348661
[106,] -0.132707181
[107,] -1.545004136
[108,] -1.216260789
[109,]  0.167984555
[110,] -0.392624728

$ranks.selected
  [1] 103  55  75  38  96  90  41  92  99  39  84  62  25 106  91  29   6 109
 [19]  87  10  66  35  22  11  31  27 101  24  18 105  64   2  47  60  42  46
 [37]  56  51  14  50  68  71  21   3  73  57   7 104 110  48  78  49   8  98
 [55]  61   4  63  65  20  36   9  82  97 102  12  37  40  76  30  16  85  69
 [73]   1  54   5  28  79  15  95  43  80  59  52  86  44  89  19  13  26  77
 [91] 100  70  53  72  23  58  45  81  32  34 108  83  67  74  93  88 107

$ranks.outliers
[1] 33 17 94

$beta.m
[1]  7.92662347  0.08834236 -0.37144951

$sigma2.biased
[1] 0.4018263

> 
> #	Alternatively use the file inst/extdata/Fulton.txt 
> #	Data	<- read.table(data/Fulton.txt,header=TRUE)
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>