Function for a supervised scatter matrix that is the weighted
covariance matrix of x with weights 1/(q2-q1) if y is between the
lower (q1) and upper (q2) quantile and 0 otherwise (or vice versa).
numerical vector specifying the dependent variable.
q1
percentage for lower quantile of y. With 0 <= q1 < q2. See details.
q2
percentage for upper quantile of y. With q1 < q2 <= 1. See details.
pos
logical. If TRUE then the weights are 1/(q2-q1) if y is between the q1- and q2-
quantiles and 0 othervise. If FALSE then the weights are 0 if y
between q1- and q2-quantiles and 1/(1-q2+q1) otherwise.
type
passed on to function quantile.
method
passed on to function cov.wt.
na.action
a function which indicates what should happen when the data
contain 'NA's. Default is to fail.
check
logical. Checks if the input should be checked for consistency. If not needed setting
it to FALSE might save some time.
Details
The weights for this supervised scatter matrix for pos=TRUE are
w(y) = I(q1-quantile < y < q2-quantile)/(q2-q1). Then scovq is calculated as
scovq = ∑ w(y) (x-x_w_bar)'(x-x_w_bar).
where x_w_bar = sum w(y)x.
To see how this function can be used in the context of supervised invariant coordinate selection
see the example below.
Value
a matrix.
Author(s)
Klaus Nordhausen
References
Liski, E., Nordhausen, K. and Oja, H. (2013), Supervised invariant coordinate selection, To appear in Statistics, ???, ???–???.
See Also
cov.wt and ics
Examples
# Creating some data
# The number of explaining variables
p <- 10
# The number of observations
n <- 400
# The error variance
sigma <- 0.5
# The explaining variables
X <- matrix(rnorm(p*n),n,p)
# The error term
epsilon <- rnorm(n, sd = sigma)
# The response
y <- X[,1]^2 + X[,2]^2*epsilon
# SICS with ics
X.centered <- sweep(X,2,colMeans(X),"-")
SICS <- ics(X.centered, S1=cov, S2=scovq, S2args=list(y=y, q1=0.25,
q2=0.75, pos=FALSE), stdKurt=FALSE, stdB="Z")
# Assuming it is known that k=2, then the two directions
# of interest are choosen as:
k <- 2
KURTS <- SICS@gKurt
KURTS.max <- ifelse(KURTS >= 1, KURTS, 1/KURTS)
ordKM <- order(KURTS.max, decreasing = TRUE)
indKM <- ordKM[1:k]
# The two variables of interest
Zk <- ics.components(SICS)[,indKM]
# The correspondings transformation matrix
Bk <- coef(SICS)[indKM,]
# The corresponding projection matrix
Pk <- t(Bk) %*% solve(Bk %*% t(Bk)) %*% Bk
# Visualization
pairs(cbind(y,Zk))
# checking the subspace difference
# true projection
B0 <- rbind(rep(c(1,0),c(1,p-1)),rep(c(0,1,0),c(1,1,p-2)))
P0 <- t(B0) %*% solve(B0 %*% t(B0)) %*% B0
# crone and crosby subspace distance measure, should be small
k - sum(diag(P0 %*% Pk))
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(ICS)
Loading required package: mvtnorm
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/ICS/scovq.Rd_%03d_medium.png", width=480, height=480)
> ### Name: scovq
> ### Title: Supervised scatter matrix based on quantiles
> ### Aliases: scovq
> ### Keywords: multivariate
>
> ### ** Examples
>
> # Creating some data
>
> # The number of explaining variables
> p <- 10
> # The number of observations
> n <- 400
> # The error variance
> sigma <- 0.5
> # The explaining variables
> X <- matrix(rnorm(p*n),n,p)
> # The error term
> epsilon <- rnorm(n, sd = sigma)
> # The response
> y <- X[,1]^2 + X[,2]^2*epsilon
>
>
> # SICS with ics
>
> X.centered <- sweep(X,2,colMeans(X),"-")
> SICS <- ics(X.centered, S1=cov, S2=scovq, S2args=list(y=y, q1=0.25,
+ q2=0.75, pos=FALSE), stdKurt=FALSE, stdB="Z")
>
> # Assuming it is known that k=2, then the two directions
> # of interest are choosen as:
>
> k <- 2
> KURTS <- SICS@gKurt
> KURTS.max <- ifelse(KURTS >= 1, KURTS, 1/KURTS)
> ordKM <- order(KURTS.max, decreasing = TRUE)
>
> indKM <- ordKM[1:k]
>
> # The two variables of interest
> Zk <- ics.components(SICS)[,indKM]
>
> # The correspondings transformation matrix
> Bk <- coef(SICS)[indKM,]
>
> # The corresponding projection matrix
> Pk <- t(Bk) %*% solve(Bk %*% t(Bk)) %*% Bk
>
> # Visualization
> pairs(cbind(y,Zk))
>
> # checking the subspace difference
>
> # true projection
>
> B0 <- rbind(rep(c(1,0),c(1,p-1)),rep(c(0,1,0),c(1,1,p-2)))
> P0 <- t(B0) %*% solve(B0 %*% t(B0)) %*% B0
>
> # crone and crosby subspace distance measure, should be small
> k - sum(diag(P0 %*% Pk))
[1] 0.5934633
>
>
>
>
>
>
> dev.off()
null device
1
>