R: Functions for Forward Search for regression models.
ForwardSearch-package
R Documentation
Functions for Forward Search for regression models.
Description
The Forward Search algorithm is an iterative algorithm for
for multiple (time series) regression suggested by
Hadi and Simonoff (1993) and developed further by
Atkinson and Riani (2000).
The algorithm starts with a robust estimate of the regression parameters and a sub-sample of size
m_0 and iterates with a sequence of least squares steps.
The asymptotic theory developed by Johansen and Nielsen (2013, 2014)
is implemented.
Details
Package:
ForwardSearch
Type:
Package
Version:
1.0
Date:
2014-09-09
License:
GPL-3
The Forward Search algorithm is an iterative algorithm for
for multiple (time series) regression suggested by
Hadi and Simonoff (1993) and developed further by
Atkinson and Riani (2000).
The algorithm starts with a robust estimate of the regression parameters and a sub-sample of size
m_0.
A common choice for the initial estimator is the Least Trimmed Squares estimator of
Rousseeuw (1984).
The algorithm is initiated by computing the absolute residuals for all n observations.
The initial sub-sample consists of the observations with the smallest m_0 absolute
residuals. We then run a regression on those m_0 observations and compute
absolute residuals of all n observations. The observations with m_0+1
smallest residuals are then selected. The
m_0+1 smallest residual is the forward residual.
A new regression is performed on
these m_0+1 observations. This is then iterated. Eventually the least squares estimator
based on all n observations is computed.
The algorithm results in a sequence of forward residuals indexed by the sub-sample size
m running from m_0 to n-1.
The idea is to monitor the plot of these and stop when the forward residuals become "large".
Johansen and Nielsen (2013, 2014) have developed, respectively,
pointwise and simultaneous confidence bands for estimators and forward residuals.
These are implemented in the package.
The ForwardSearch package can be used as follows.
Execute the full Forward Search using
ForwardSearch.fit.
Create the forward plot of the forward residuals using
ForwardSearch.plot.
This requires the output from above and a choice of reference distribution.
The plot shows the scaled forward residuals from above along with simultaneous
confidence bands.
The user has to choose a "gauge", which is the expected fraction of falsely
detected outliers that are tolerable when in fact there are no outliers. For
instance a "gauge" of 0.01 indicates that in a sample of n=110 observations
1.1 outlier is found on average when there are none.
The simultaneous confidence bands are calibrated so that the Forward Search
stop when the fitted values exceed the chosen confidence bands the first time.
This is a stopping time.
The theory for this is given in Johansen and Nielsen.
Get the estimates of the stopped Forward Search using
ForwardSearch.stopped.
The user has to input the estimated stopping time.
This also gives the rank of the selected and non-selected observations.
These are the "good" and the "bad" observations.
Author(s)
Bent Nielsen <bent.nielsen@nuffield.ox.ac.uk> 9 Sep 2014
References
Atkinson, A.C. and Riani, M. (2000) Robust Diagnostic Regression Analysis. New York: Springer.
Hadi, A.S. and Simonoff, J.S. (1993) Procedures for the Identification of Multiple Outliers in Linear Models Journal of the American Statistical Association 88, 1264-1272.
Johansen, S. and Nielsen, B. (2013) Asymptotic analysis of the Forward Search. Download: Nuffield DP.
Johansen, S. and Nielsen, B. (2014) Outlier detection algorithms for least squares time series. Download: Nuffield DP.
Rousseeuw, P.J. (1984) Least median of squares regression. Journal of the American Statistical Association 79, 871-880.
See Also
Forward Search can alternatively be done by the package
forward.
forward version 1.0.3 includes functions for the analysis suggested in e.g.
Atkinson and Riani (2000), but does not include the asymptotic theory of
Johansen and Nielsen (2013, 2014).
Matlab code for Forward Search is also available from
www.riani.it.
Examples
#####################
# EXAMPLE 1
# using Fulton Fish data,
# see Johansen and Nielsen (2014).
# Call package
library(ForwardSearch)
# Call data
data(Fulton)
mdata <- as.matrix(Fulton)
n <- nrow(mdata)
# Identify variable to reproduce Johansen and Nielsen (2014)
q <- mdata[2:n ,9]
q_1 <- mdata[1:(n-1) ,9]
s <- mdata[2:n ,6]
x.q.s <- cbind(q_1,s)
colnames(x.q.s ) <- c("q_1","stormy")
# Fit Forward Search
FS95 <- ForwardSearch.fit(x.q.s,q,psi.0=0.95)
FS80 <- ForwardSearch.fit(x.q.s,q,psi.0=0.80)
# Forward plot of forward residuals scaled by variance estimate
# Note the variance estimate is not bias corrected
# This is taken into account in asymptotic theory
ForwardSearch.plot(FS95)
ForwardSearch.plot(FS80)
# Based on the plot of e.g. FS95 it is decided to stop at m=107
ForwardSearch.stopped(FS95,107)
# Alternatively use the file inst/extdata/Fulton.txt
# Data <- read.table(data/Fulton.txt,header=TRUE)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(ForwardSearch)
Loading required package: robustbase
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/ForwardSearch/ForwardSearch-package.Rd_%03d_medium.png", width=480, height=480)
> ### Name: ForwardSearch-package
> ### Title: Functions for Forward Search for regression models.
> ### Aliases: ForwardSearch-package ForwardSearch
> ### Keywords: package
>
> ### ** Examples
>
> #####################
> # EXAMPLE 1
> # using Fulton Fish data,
> # see Johansen and Nielsen (2014).
>
> # Call package
> library(ForwardSearch)
>
> # Call data
> data(Fulton)
> mdata <- as.matrix(Fulton)
> n <- nrow(mdata)
>
> # Identify variable to reproduce Johansen and Nielsen (2014)
> q <- mdata[2:n ,9]
> q_1 <- mdata[1:(n-1) ,9]
> s <- mdata[2:n ,6]
> x.q.s <- cbind(q_1,s)
> colnames(x.q.s ) <- c("q_1","stormy")
>
> # Fit Forward Search
> FS95 <- ForwardSearch.fit(x.q.s,q,psi.0=0.95)
> FS80 <- ForwardSearch.fit(x.q.s,q,psi.0=0.80)
>
> # Forward plot of forward residuals scaled by variance estimate
> # Note the variance estimate is not bias corrected
> # This is taken into account in asymptotic theory
> ForwardSearch.plot(FS95)
> ForwardSearch.plot(FS80)
>
> # Based on the plot of e.g. FS95 it is decided to stop at m=107
> ForwardSearch.stopped(FS95,107)
$res
[,1]
[1,] -0.643120941
[2,] -0.276639648
[3,] 0.364210858
[4,] -0.475854200
[5,] 0.662212451
[6,] 0.151680947
[7,] 0.370940422
[8,] -0.443856126
[9,] 0.529152538
[10,] 0.183196876
[11,] 0.212387653
[12,] 0.557285732
[13,] 0.847669950
[14,] 0.332051069
[15,] 0.693556422
[16,] -0.628388507
[17,] -1.963154228
[18,] 0.261579655
[19,] 0.841030661
[20,] -0.494715888
[21,] -0.358568522
[22,] 0.203320030
[23,] 0.996026199
[24,] -0.258010987
[25,] -0.120547741
[26,] 0.851843435
[27,] -0.220442565
[28,] 0.683225679
[29,] -0.147866333
[30,] -0.590044082
[31,] 0.215960636
[32,] -1.144971170
[33,] -1.823326081
[34,] -1.193777405
[35,] -0.200088378
[36,] 0.500632806
[37,] 0.559376968
[38,] -0.015251986
[39,] -0.064196651
[40,] -0.563659216
[41,] 0.044932157
[42,] -0.298953221
[43,] 0.719234124
[44,] -0.823951410
[45,] -1.064851264
[46,] 0.301618644
[47,] 0.283101422
[48,] 0.395470594
[49,] -0.434313447
[50,] 0.340268643
[51,] 0.330224908
[52,] 0.780413417
[53,] 0.984106091
[54,] -0.652240637
[55,] 0.010497938
[56,] 0.321427688
[57,] -0.369586392
[58,] -1.036750532
[59,] -0.768113493
[60,] -0.293393867
[61,] 0.473175075
[62,] -0.084904056
[63,] 0.478362082
[64,] 0.270093626
[65,] 0.482167732
[66,] -0.193605683
[67,] 1.277737769
[68,] 0.342505951
[69,] -0.640744222
[70,] 0.969427033
[71,] -0.355882686
[72,] 0.991603927
[73,] 0.368764186
[74,] -1.315814409
[75,] 0.011180212
[76,] 0.585753171
[77,] -0.875898385
[78,] -0.412068038
[79,] -0.687131592
[80,] 0.744270392
[81,] -1.065422651
[82,] -0.530578984
[83,] -1.221325575
[84,] -0.072131103
[85,] 0.636543402
[86,] 0.806952001
[87,] 0.181785219
[88,] -1.450318870
[89,] 0.826114797
[90,] -0.038843596
[91,] -0.134109101
[92,] -0.047273023
[93,] -1.327615354
[94,] -2.403622110
[95,] 0.695619068
[96,] 0.020718996
[97,] 0.538574923
[98,] -0.471337865
[99,] 0.057962207
[100,] 0.933140570
[101,] -0.247668393
[102,] -0.538847931
[103,] -0.004494437
[104,] 0.387341099
[105,] 0.263348661
[106,] -0.132707181
[107,] -1.545004136
[108,] -1.216260789
[109,] 0.167984555
[110,] -0.392624728
$ranks.selected
[1] 103 55 75 38 96 90 41 92 99 39 84 62 25 106 91 29 6 109
[19] 87 10 66 35 22 11 31 27 101 24 18 105 64 2 47 60 42 46
[37] 56 51 14 50 68 71 21 3 73 57 7 104 110 48 78 49 8 98
[55] 61 4 63 65 20 36 9 82 97 102 12 37 40 76 30 16 85 69
[73] 1 54 5 28 79 15 95 43 80 59 52 86 44 89 19 13 26 77
[91] 100 70 53 72 23 58 45 81 32 34 108 83 67 74 93 88 107
$ranks.outliers
[1] 33 17 94
$beta.m
[1] 7.92662347 0.08834236 -0.37144951
$sigma2.biased
[1] 0.4018263
>
> # Alternatively use the file inst/extdata/Fulton.txt
> # Data <- read.table(data/Fulton.txt,header=TRUE)
>
>
>
>
>
>
> dev.off()
null device
1
>