R: Create and Explore Counting Process Sample Path Summaries
summary.CountingProcessSamplePath
R Documentation
Description
These functions/methods are designed to test a
CountingProcessSamplePath object against a homogeneous (uniform) Poisson
process with rate 1.
Usage
## S3 method for class 'CountingProcessSamplePath'
summary(object, exact = TRUE,
lag.max = NULL, d = max(c(2, sqrt(length(object$ppspFct()))%/%5)), ...)
## S3 method for class 'CountingProcessSamplePath.summary'
print(x, digits = 5, ...)
## S3 method for class 'CountingProcessSamplePath.summary'
plot(x, y, which = c(1,2,6,8), main,
caption = c(expression(paste("Uniform on ", Lambda," Test")),
"Berman's Test",
"Log Survivor Function",
expression(paste(U[k+1]," vs ", U[k])),
"Variance vs Mean Test",
"Wiener Process Test",
"Autocorrelation Fct.",
"Renewal Test"),
ask = FALSE, lag.max = NULL,
d = max(c(2, sqrt(length(eval(x$call[[2]])$ppspFct()))%/%5)),
...)
Arguments
object
A CountingProcessSamplePath object.
exact
Should an exact Kolmogorov test be used? See ks.test.
lag.max
See renewalTestPlot.
d
See renewalTestPlot.
x
A CountingProcessSamplePath.summary object.
digits
An integer, the number of digits to be used while
printing summaries. See round.
y
Not used but required for compatibility with the
plot method.
which
If a subset of the test plots is required, specify a subset of
the numbers 1:8.
main
Title to appear above the plots; if missing, the
corresponding element of caption is used.
caption
Default captions to appear above the plots or, if
main is given, below it.
ask
A logical; if TRUE, the user is asked to hit
the return key before
each plot generation, see par(ask=.).
...
Passed to chisq.test, which is used internally by
summary; not used by plot and print.
Details
If the CountingProcessSamplePath object x is the
realization of a homogeneous Poisson process then, conditioned on the number of
events observed, the locations of the events (jumps in N(t)) are uniformly distributed on the
observation period. This is a basic property of
the homogeneous Poisson process derived in Chap. 2 of Cox and Lewis
(1966) and Daley and Vere-Jones (2003). Component UniformGivenN
of a CountingProcessSamplePath.summary list contains the p.value of
the Kolmogorov test of this uniform null hypothesis. The first graph
generated by the plot method displays the Kolmogorov test
graphically, i.e., the empirical cumulative distribution
function together with the null hypothesis (the diagonal). The two
dotted lines on both sides of the diagonal correspond to 95 and
99% (asymptotic) confidence bands. This is the graph shown in Fig. 9 (p. 19) of
Ogata (1988). Notice that the summary
method allows you to compute the exact p.value (argument exact).
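To make the UniformGivenN component concrete, here is a minimal base-R sketch that does not call STAR. It assumes the events x lie in a known observation window [from, to] (the from and to of the CountingProcessSamplePath object), rescales them onto [0, 1] and applies ks.test against punif. The helper name uniformGivenN is hypothetical, and 1.358 and 1.628 are the asymptotic 95% and 99% quantiles of the Kolmogorov distribution, which we assume are what the dotted bands of the first graph are based on.

```r
## Hypothetical helper (not part of STAR): Kolmogorov test of the
## uniformity of the event times x, given their number, on [from, to].
uniformGivenN <- function(x, from, to, exact = TRUE) {
  u <- (x - from) / (to - from)        # rescale the window onto [0, 1]
  kt <- ks.test(u, "punif", exact = exact)
  n <- length(u)
  D <- unname(kt$statistic)            # largest gap between ECDF and diagonal
  list(p.value = kt$p.value,
       ## is the ECDF inside the asymptotic 95% / 99% Kolmogorov bands?
       within95 = D < 1.358 / sqrt(n),
       within99 = D < 1.628 / sqrt(n))
}

## usage: given N = 200, the events of a homogeneous Poisson process
## on [0, 100] are distributed as 200 sorted uniforms
set.seed(1)
x <- sort(runif(200, 0, 100))
uniformGivenN(x, from = 0, to = 100)
```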
If x[i] denotes the jump locations of the
CountingProcessSamplePath object x and if the latter is
the realization of a homogeneous Poisson process with rate 1, then the intervals:
y[i]=x[i+1]-x[i]
are realizations of iid random variables (rvs) from an exponential distribution with rate 1, and the:
u[i]=1 - exp(-y[i])
are realizations of iid rvs from a uniform distribution on [0,1). The second graph
generated by the plot method tests this uniform distribution hypothesis with
a Kolmogorov test. This is the graph shown in Fig. 10 (p. 19) of
Ogata (1988), which was suggested by Berman. It is also one of
the two graphs proposed by Brown et al (2002) (the other one is a Q-Q plot of the
same quantities). The two dotted lines on both sides of the diagonal correspond to 95 and
99% (asymptotic) confidence bands. Component BermanTest of
a CountingProcessSamplePath.summary list contains the p.value of
the Kolmogorov test of this uniform null hypothesis.
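Berman's test can be sketched the same way. The hypothetical helper bermanTest below takes event times already on the rate-1 (time-transformed) scale, forms y[i] and u[i] exactly as defined above and runs ks.test. With tied intervals ks.test warns that ties should not be present, which is presumably the kind of warning mentioned in the first example below.

```r
## Hypothetical helper (not part of STAR): Berman's test for event
## times x on the rate-1 (time-transformed) scale.
bermanTest <- function(x, exact = NULL) {
  y <- diff(x)                 # inter-event intervals, Exp(1) under H0
  u <- 1 - exp(-y)             # Exp(1) cdf applied to y: U[0, 1) under H0
  ks.test(u, "punif", exact = exact)$p.value
}

set.seed(2)
bermanTest(cumsum(rexp(300)))             # rate 1: H0 holds
bermanTest(cumsum(rexp(300, rate = 5)))   # rate 5: H0 is false
```

The second call should give a very small p-value, since intervals drawn at rate 5 are much shorter than Exp(1) intervals.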
Following the same line of reasoning, if the distribution of
the y[i] is an exponential distribution with rate 1, then
their survivor function is exp(-y), a straight line of slope -1 on a log scale. This is what
the third graph generated by the plot method shows, using a log scale for
the ordinate. The pointwise 95 and 99% CIs are also drawn (dotted
lines). This is the graph shown in Fig. 12 (p. 20) of
Ogata (1988).
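The quantities behind the third graph can be sketched as follows, under the assumption that the pointwise bands come from the binomial distribution of the number of intervals exceeding y (under the null hypothesis, that number is Binomial(n, exp(-y))). The helper name logSurvivorTable is hypothetical, and STAR may build its bands differently.

```r
## Hypothetical helper (not part of STAR): empirical and theoretical
## survivor functions of the inter-event intervals, with pointwise bands.
logSurvivorTable <- function(x, level = 0.95) {
  y <- sort(diff(x))                 # ordered inter-event intervals
  n <- length(y)
  emp <- (n - seq_len(n)) / n        # fraction of intervals larger than y[i]
  th <- exp(-y)                      # Exp(1) survivor function
  ## under H0, the number of intervals > y is Binomial(n, exp(-y))
  lwr <- qbinom((1 - level) / 2, n, th) / n
  upr <- qbinom(1 - (1 - level) / 2, n, th) / n
  data.frame(y = y, emp = emp, th = th, lwr = lwr, upr = upr)
}

set.seed(3)
head(logSurvivorTable(cumsum(rexp(500))))
```

To draw the graph, plot emp against y with log = "y" after dropping the last row, whose empirical survivor value is 0.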
If the u[i] of the second paragraph are realizations of iid uniform rvs on
[0,1), then a plot of u[i+1] vs u[i] should
uniformly fill the unit square [0,1) x [0,1). This is the fourth
graph generated by the plot method (the one shown in Fig. 11 (p. 20) of
Ogata (1988)), while the seventh graph shows
the autocorrelation function of the u[i]s. Component RenewalTest of a
CountingProcessSamplePath.summary list contains a slightly more
elaborate (and quantitative) version of this test, summarizing the
fourth graph (bottom right) generated by a call to
renewalTestPlot. This list element is itself a list with
elements chi2.95 (a logical), chi2.99 (a
logical) and total (an integer). The bounds resulting from repeatedly
testing a sequence of what are, under the null hypothesis, iid
chi2 rvs are obtained using Donsker's theorem
(see below). For each lag, the number of degrees of freedom of the
chi2 distribution is subtracted from the
chi2 value. These centered values are then divided by
their standard deviation under the null hypothesis (the square root of twice the
number of degrees of freedom). The cumulative sum
of the centered and scaled sequence is formed and divided by the
square root of the maximal lag used. This is "plugged into"
Donsker's theorem. The eighth graph of the plot method displays
the resulting Wiener process, together with the tight confidence regions of
Kendall et al (2007) (see below).
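The centering, scaling and cumulative summation described above can be sketched in base R. Two choices are assumptions, not taken from renewalTestPlot: the chi2 statistic at each lag k is obtained by binning the pairs (u[i], u[i+k]) on a d x d grid and calling chisq.test on the resulting contingency table (so the degrees of freedom are (d-1)^2), and lag.max defaults to round(10*log10(n)), the default of acf. The boundary constants are those of b95 and b99 in the empiricalCovProb example below; renewalDonsker is a hypothetical name.

```r
## Kendall et al (2007) tight boundaries (constants from the example below)
b95 <- function(t) 0.299944595870772 + 2.34797018726827 * sqrt(t)
b99 <- function(t) 0.313071417065285 + 2.88963206734397 * sqrt(t)

## Hypothetical sketch (not STAR's renewalTestPlot)
renewalDonsker <- function(x, lag.max = NULL, d = NULL) {
  u <- 1 - exp(-diff(x))                                # iid U[0, 1) under H0
  n <- length(u)
  if (is.null(d)) d <- max(c(2, sqrt(length(x)) %/% 5)) # as in Usage
  if (is.null(lag.max)) lag.max <- round(10 * log10(n)) # assumed default
  breaks <- seq(0, 1, length.out = d + 1)
  chi2 <- dof <- numeric(lag.max)
  for (k in seq_len(lag.max)) {
    ## d x d contingency table of the pairs (u[i], u[i + k])
    tab <- table(cut(u[1:(n - k)], breaks, include.lowest = TRUE),
                 cut(u[(k + 1):n], breaks, include.lowest = TRUE))
    tst <- suppressWarnings(chisq.test(tab))
    chi2[k] <- tst$statistic
    dof[k] <- tst$parameter                             # (d - 1)^2
  }
  z <- (chi2 - dof) / sqrt(2 * dof)    # center and scale each chi2 value
  W <- cumsum(z) / sqrt(lag.max)       # Donsker scaling
  tt <- seq_len(lag.max) / lag.max
  list(W = W,
       chi2.95 = all(abs(W) < b95(tt)),
       chi2.99 = all(abs(W) < b99(tt)),
       total = lag.max)
}

set.seed(4)
renewalDonsker(cumsum(rexp(1000)))[c("chi2.95", "chi2.99", "total")]
```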
If the x[i] are realizations of a homogeneous Poisson
process with rate 1 observed between 0 and T, then
the numbers of events observed in non-overlapping windows of length t
should be iid Poisson rvs with mean t (and variance t). The observation
period is therefore chopped into non-overlapping windows of increasing length,
and the empirical variance of the event count is graphed versus the
empirical mean, together with 95 and 99% CIs (using a normal
approximation). This is done by calling
varianceTime internally and is what the fifth graph
of the plot method displays. This is the graph shown in Fig. 13 (p. 20) of
Ogata (1988). Component varianceTimeSummary of a
CountingProcessSamplePath.summary list contains a summary of
this test, counting the number of window sizes for which the variance falls outside each band.
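The windowing and the normal-approximation band can be sketched as below. The standard error uses the variance of the sample variance of m iid Poisson(mu) counts, Var(s^2) = mu/m + 2*mu^2/(m-1), evaluated at the empirical mean; whether varianceTime uses this formula is an assumption, and varianceTimeSketch is a hypothetical name. Window lengths must leave at least two complete windows in [from, to].

```r
## Hypothetical sketch (not STAR's varianceTime)
varianceTimeSketch <- function(x, from, to, windows) {
  rows <- lapply(windows, function(w) {
    m <- floor((to - from) / w)            # complete windows of length w
    brk <- from + (0:m) * w
    ## events per window; events past the last complete window are ignored
    counts <- tabulate(findInterval(x, brk), nbins = m)
    mu <- mean(counts)
    s2 <- var(counts)
    ## normal approximation for Poisson(mu) counts
    se <- sqrt(mu / m + 2 * mu^2 / (m - 1))
    data.frame(window = w, m = m, mean = mu, var = s2,
               out95 = abs(s2 - mu) > qnorm(0.975) * se,
               out99 = abs(s2 - mu) > qnorm(0.995) * se)
  })
  tab <- do.call(rbind, rows)
  list(table = tab,
       summary = c(total = nrow(tab), out95 = sum(tab$out95),
                   out99 = sum(tab$out99)))
}

set.seed(6)
x <- cumsum(rexp(2000))
x <- x[x <= 1500]                          # rate-1 events observed on [0, 1500]
varianceTimeSketch(x, 0, 1500, windows = c(1, 2, 5, 10, 20))$summary
```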
The sixth graph generated by the plot method and the companion
components, Wiener95 and Wiener99, of a
CountingProcessSamplePath.summary list represent "new" tests
(as far as I know). They are based on the fact that if the
y[i] above are realizations of iid rvs following an exponential distribution
with rate 1, then the w[i]=y[i]-1 are realizations of
iid rvs with mean 0
and variance 1. We can then form the partial sums:
S[n]=w[1]+...+w[n]
and define the random right-continuous functions with left-hand limits on [0,1]:
S[floor(n*t)]/sqrt(n)
These functions are realizations of a process which converges (weakly)
to a Wiener process on [0,1]. The proof of this statement is a corollary of Donsker's theorem
and can be found on pp. 146-147 (Theorem 14.1) of Billingsley (1999). I
thank Vilmos Prokaj for pointing out this reference to me. What is then
done is to test whether the putative Wiener process lies entirely within the
tight boundaries defined by Kendall et al (2007) for a true Wiener
process; see crossTight.
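Here is a base-R sketch of the Wiener95/Wiener99 check, reusing the Kendall et al (2007) boundary constants that appear in the empiricalCovProb example below. Because the scaled partial-sum process is constant between the grid points k/n and the boundaries increase with t, it is enough to compare |S[k]|/sqrt(n) with b(k/n) for k = 1, ..., n. The helper wienerTest is hypothetical, and it only uses the intervals between successive events (not the time from the start of the observation window to the first event).

```r
## Kendall et al (2007) tight boundaries (constants from the example below)
b95 <- function(t) 0.299944595870772 + 2.34797018726827 * sqrt(t)
b99 <- function(t) 0.313071417065285 + 2.88963206734397 * sqrt(t)

## Hypothetical helper (not part of STAR)
wienerTest <- function(x) {
  w <- diff(x) - 1             # mean 0, variance 1 under H0
  n <- length(w)
  S <- cumsum(w) / sqrt(n)     # S[floor(n*t)]/sqrt(n) at t = k/n
  tt <- seq_len(n) / n
  c(Wiener95 = all(abs(S) < b95(tt)),
    Wiener99 = all(abs(S) < b99(tt)))
}

set.seed(7)
wienerTest(cumsum(rexp(500)))        # rate-1 train
wienerTest(seq(2, 1000, by = 2))     # regular train, intervals of 2: both FALSE
```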
Value
summary.CountingProcessSamplePath returns a
CountingProcessSamplePath.summary object which is a list
with the following components:
UniformGivenN
A numeric, the p.value of the Kolmogorov
test of uniformity of the event times given the number of events.
Wiener95
A logical: is the scaled martingale within the
tight 95% confidence band?
Wiener99
A logical: is the scaled martingale within the
tight 99% confidence band?
BermanTest
A numeric, the p.value of the Kolmogorov
test of uniformity of the scaled inter-event intervals.
RenewalTest
A list with components:
chi2.95, chi2.99 and total. chi2.95
(resp. chi2.99) is a logical that is TRUE if the
Wiener process obtained as described above lies within the "tight" 95% (resp. 99%) confidence band of Kendall et al (2007). total gives the total number of
lags. See renewalTestPlot.
varianceTime
A varianceTime object.
varianceTimeSummary
A numeric vector with components:
total, out95 and out99. total gives the total number of
window sizes explored. out95 gives the number of window sizes for
which the variance is outside the 95% confidence band. out99
gives the number of window sizes for
which the variance is outside the 99% confidence band. See varianceTime.
n
An integer, the number of events.
call
The matched call.
Acknowledgments
I thank Vilmos Prokaj for pointing out Donsker's theorem to me and for telling me
where its proof can be found (Patrick Billingsley's book).
I also thank Olivier Faugeras and Jonathan Touboul for pointing out
Donsker's theorem to me.
Warning
If you want these tests to be meaningful, do not apply them to the
data you just used to fit your conditional intensity model.
Note
These functions/methods are designed to replace the
summary.transformedTrain and plot.transformedTrain
methods; they have a more general design.
Of course, to be fully usable, these functions must be coupled with
functions allowing users to fit conditional intensity models. The
support for that in STAR is not complete yet but is coming
soon. For now, see the example below.
The end of the example below (not run by default) shows that the
coverage probabilities of the Wiener process confidence bands are very
good, even for small sample sizes (about 50 expected events).
References
Billingsley, P. (1999) Convergence of Probability
Measures. Wiley-Interscience.
Brillinger, D. R. (1988) Maximum likelihood analysis of spike trains
of interacting nerve cells. Biol. Cybern. 59: 189–200.
Brown, E. N., Barbieri, R., Ventura, V., Kass, R. E. and Frank,
L. M. (2002) The time-rescaling theorem and its application to neural
spike train data analysis. Neural Computation 14:
325–346.
Cox, D. R. and Lewis, P. A. W. (1966) The Statistical Analysis of
Series of Events. John Wiley and Sons.
Daley, D. J. and Vere-Jones, D. (2003) An Introduction to the
Theory of Point Processes. Vol. 1. Springer.
Johnson, D. H. (1996) Point process models of single-neuron
discharges. J. Computational Neuroscience 3: 275–299.
Ogata, Y. (1988) Statistical Models for Earthquake Occurrences and Residual
Analysis for Point Processes. Journal of the American
Statistical Association 83: 9–27.
Examples
## Not run:
## load one spike train data set of STAR
data(e060824spont)
## Create the CountingProcessSamplePath object
n1spt.cp <- as.CPSP(e060824spont[["neuron 1"]])
## print it
n1spt.cp
## plot it
plot(n1spt.cp)
## get the summary
## Notice the warning: a few identical inter-spike intervals (ties)
## make Berman's test inaccurate.
summary(n1spt.cp)
## Simulate data corresponding to a renewal process with
## an inverse Gaussian ISI distribution in the spontaneous
## regime modulated by a multiplicative stimulus whose time
## course is a shifted and scaled chi2 density.
## Define the "stimulus" function
stimulus <- function(t,
                     df = 5,
                     tonset = 5,
                     timeFactor = 5,
                     peakFactor = 10) {
  dchisq((t - tonset) * timeFactor, df = df) * peakFactor
}
## Define the conditional intensity / hazard function
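hFct <- function(t,
                 tlast,
                 df = 5,
                 tonset = 5,
                 timeFactor = 5,
                 peakFactor = 10,
                 mu = 0.075,
                 sigma2 = 3) {
  ## inverse Gaussian hazard of the time since the last spike,
  ## multiplied by the exponentiated stimulus
  hinvgauss(t - tlast, mu = mu, sigma2 = sigma2) *
    exp(stimulus(t, df, tonset, timeFactor, peakFactor))
}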
## define the function simulating the train with the thinning method
makeTrain <- function(tstop = 10,
                      peakCI = 200,
                      preTime = 5,
                      df = 5,
                      tonset = 5,
                      timeFactor = 5,
                      peakFactor = 10,
                      mu = 0.075,
                      sigma2 = 3) {
  ## pre-allocate; unused slots lie before time 0 and are dropped at the end
  result <- numeric(500) - preTime - .Machine$double.eps
  result.n <- 500
  result[1] <- 0
  idx <- 1
  currentTime <- result[1]
  while (currentTime < tstop + preTime) {
    ## propose a candidate from a Poisson process with rate peakCI
    currentTime <- currentTime + rexp(1, peakCI)
    p <- hFct(currentTime,
              result[idx],
              df = df,
              tonset = tonset + preTime,
              timeFactor = timeFactor,
              peakFactor = peakFactor,
              mu = mu,
              sigma2 = sigma2) / peakCI
    rthreshold <- runif(1)
    if (p > 1) stop("Wrong peakCI")
    ## thinning: keep proposing until a candidate is accepted
    while (p < rthreshold) {
      currentTime <- currentTime + rexp(1, peakCI)
      p <- hFct(currentTime,
                result[idx],
                df = df,
                tonset = tonset + preTime,
                timeFactor = timeFactor,
                peakFactor = peakFactor,
                mu = mu,
                sigma2 = sigma2) / peakCI
      if (p > 1) stop("Wrong peakCI")
      rthreshold <- runif(1)
    }
    idx <- idx + 1
    if (idx > result.n) {
      ## grow the buffer without shifting the spike times already stored
      result <- c(result, numeric(500) - preTime - .Machine$double.eps)
      result.n <- result.n + 500
    }
    result[idx] <- currentTime
  }
  result[preTime < result & result <= tstop + preTime] - preTime
}
## set the seed
set.seed(20061001)
## "make" the train
t1 <- makeTrain()
## create the corresponding CountingProcessSamplePath
## object
cpsp1 <- mkCPSP(t1)
## print it
cpsp1
## test it
cpsp1.summary <- summary(cpsp1)
cpsp1.summary
plot(cpsp1.summary)
## Define a function returning the conditional intensity function (cif)
ciFct <- function(t,
                  tlast,
                  df = 5,
                  tonset = 5,
                  timeFactor = 5,
                  peakFactor = 10,
                  mu = 0.075,
                  sigma2 = 3) {
  sapply(t, function(x) {
    ## up to the first spike, return 1/mu (the reciprocal of the mean ISI)
    if (x <= tlast[1]) return(1 / mu)
    y <- x - max(tlast[tlast < x])     # time since the last spike
    hinvgauss(y, mu = mu, sigma2 = sigma2) *
      exp(stimulus(x, df, tonset, timeFactor, peakFactor))
  })
}
## Compute the cif of the train
tt <- seq(0,10,0.001)
lambda.true <- ciFct(tt,cpsp1$ppspFct())
## plot it together with the events times
## Notice that the representation is somewhat inaccurate: the cif
## is in fact a left-continuous function
plot(tt,lambda.true,type="l",col=2)
rug(cpsp1$ppspFct())
## plot the integrated intensity function and the counting process
plot(tt,cumsum(lambda.true)*0.001,type="l",col=2)
lines(cpsp1)
## define a function doing the time transformation / rescaling
## by integrating the cif and returning another CountingProcessSamplePath
transformCPSP <- function(cpsp,
                          ciFct,
                          CIFct,
                          method = c("integrate", "discrete"),
                          subdivisions = 100,
                          ...) {
  if (!inherits(cpsp, "CountingProcessSamplePath"))
    stop("cpsp should be a CountingProcessSamplePath object")
  method <- match.arg(method)
  st <- cpsp$ppspFct()
  n <- length(st)
  from <- cpsp$from
  to <- cpsp$to
  if (missing(CIFct)) {
    lwr <- c(from, st)
    upr <- c(st, to)
    if (method == "integrate") {
      ## integrate the cif numerically between successive events
      Lambda <- sapply(1:(n + 1),
                       function(idx)
                         integrate(ciFct,
                                   lower = lwr[idx],
                                   upper = upr[idx],
                                   subdivisions = subdivisions,
                                   ...)$value)
      Lambda <- cumsum(Lambda)
      st <- Lambda[1:n]
      from <- 0
      to <- Lambda[n + 1]
    } else {
      ## Riemann sum on 'subdivisions' points per inter-event interval
      xx <- unlist(lapply(1:(n + 1),
                          function(idx) seq(lwr[idx],
                                            upr[idx],
                                            length.out = subdivisions)))
      Lambda <- cumsum(ciFct(xx[-length(xx)]) * diff(xx))
      Lambda <- Lambda - Lambda[1]
      st <- Lambda[(1:n) * subdivisions]
      from <- 0
      to <- Lambda[length(Lambda)]
    }
  } else {
    ## the integrated intensity CIFct is supplied directly
    result <- CIFct(c(from, st, to))
    result <- result - result[1]
    from <- result[1]
    to <- result[n + 2]
    st <- result[2:(n + 1)]
  }
  mkCPSP(st, from, to)
}
## transform cpsp1
cpsp1t <- transformCPSP(cpsp1,function(t) ciFct(t,cpsp1$ppspFct()))
## test it
cpsp1t.summary <- summary(cpsp1t)
cpsp1t.summary
plot(cpsp1t.summary)
## compare the finite sample performances of the
## Kolmogorov test (test the uniformity of the
## jump times given the number of events) with the
## ones of the new "Wiener process test"
empiricalCovProb <- function(myRates = c(10, (1:8) * 25, (5:10) * 50, (6:10) * 100),
                             nbRep = 1000,
                             exact = NULL) {
  ## tight boundaries of Kendall et al (2007)
  b95 <- function(t) 0.299944595870772 + 2.34797018726827 * sqrt(t)
  b99 <- function(t) 0.313071417065285 + 2.88963206734397 * sqrt(t)
  result <- matrix(numeric(4 * length(myRates)), nrow = 4)
  colnames(result) <- paste(myRates)
  rownames(result) <- c("ks95", "ks99", "wp95", "wp99")
  for (i in seq_along(myRates)) {
    rate <- myRates[i]
    partial <- sapply(1:nbRep,
                      function(repIdx) {
                        ## homogeneous Poisson train with the given rate on [0, 1]
                        st <- cumsum(rexp(5 * rate, rate))
                        while (max(st) < 1)
                          st <- c(st, max(st) + cumsum(rexp(5 * rate, rate)))
                        st <- st[st <= 1]
                        ks <- ks.test(st, punif, exact = exact)$p.value
                        ## scaled difference between compensator and counting process
                        w <- (st * rate - seq_along(st)) / sqrt(rate)
                        ## each element is TRUE when the test rejects
                        c(ks95 = ks < 0.05,
                          ks99 = ks < 0.01,
                          wp95 = any(w < -b95(st) | b95(st) < w),
                          wp99 = any(w < -b99(st) | b99(st) < w))
                      })
    result[, i] <- rowSums(partial)
  }
  attr(result, "nbRep") <- nbRep
  attr(result, "myRates") <- myRates
  attr(result, "call") <- match.call()
  result / nbRep
}
plotCovProb <- function(covprob, ci = 0.95) {
  nbMax <- max(attr(covprob, "myRates"))
  plot(c(0, nbMax), c(0.94, 1),
       type = "n",
       xlab = "Expected number of Spikes",
       ylab = "Empirical cov. prob.", xaxs = "i", yaxs = "i")
  nbRep <- attr(covprob, "nbRep")
  ## binomial acceptance regions around the nominal 0.95 and 0.99 levels
  polygon(c(0, nbMax, nbMax, 0),
          c(rep(qbinom((1 - ci) / 2, nbRep, 0.95) / nbRep, 2),
            rep(qbinom(1 - (1 - ci) / 2, nbRep, 0.95) / nbRep, 2)),
          col = "grey50", border = NA)
  polygon(c(0, nbMax, nbMax, 0),
          c(rep(qbinom((1 - ci) / 2, nbRep, 0.99) / nbRep, 2),
            rep(qbinom(1 - (1 - ci) / 2, nbRep, 0.99) / nbRep, 2)),
          col = "grey50", border = NA)
  nbS <- attr(covprob, "myRates")
  ## crosses: Kolmogorov test; circles: Wiener process test
  points(nbS, 1 - covprob[1, ], pch = 3)
  points(nbS, 1 - covprob[2, ], pch = 3)
  points(nbS, 1 - covprob[3, ], pch = 1)
  points(nbS, 1 - covprob[4, ], pch = 1)
}
system.time(covprobA <- empiricalCovProb())
plotCovProb(covprobA)
## End(Not run)