Last data update: 2014.03.03

R: Create and Explore Counting Process Sample Path Summaries
summary.CountingProcessSamplePathR Documentation

Create and Explore Counting Process Sample Path Summaries

Description

These functions / methods are designed to test a CountingProcessSamplePath object against a uniform Poisson process with rate 1.

Usage

## S3 method for class 'CountingProcessSamplePath'
summary(object, exact = TRUE,
        lag.max = NULL, d = max(c(2, sqrt(length(object$ppspFct()))%/%5)), ...)
## S3 method for class 'CountingProcessSamplePath.summary'
print(x, digits = 5, ...)
## S3 method for class 'CountingProcessSamplePath.summary'
plot(x, y, which = c(1,2,6,8), main,
     caption = c(expression(paste("Uniform on ", Lambda," Test")),
                 "Berman's Test",
                 "Log Survivor Function",
                 expression(paste(U[k+1]," vs ", U[k])),
                 "Variance vs Mean Test",
                 "Wiener Process Test",
                 "Autocorrelation Fct.",
                 "Renewal Test"),
     ask = FALSE, lag.max = NULL,
     d = max(c(2, sqrt(length(eval(x$call[[2]])$ppspFct()))%/%5)),
     ...)

Arguments

object

A CountingProcessSamplePath object.

exact

Should an exact Kolmogorov test be used? See ks.test.

lag.max

See renewalTestPlot.

d

See renewalTestPlot.

x

A CountingProcessSamplePath.summary object.

digits

An integer, the number of digits to be used while printing summaries. See round.

y

Not used but required for compatibility with the plot method.

which

If a subset of the test plots is required, specify a subset of the numbers 1:6.

main

Title to appear above the plots, if missing the corresponding element of caption will be used.

caption

Default caption to appear above the plots or, if main is given, bellow it

ask

A logical; if TRUE, the user is asked to hit the return key before each plot generation, see par(ask=.).

...

Passed to chisq.test used internally by summary, not used in plot and print.

Details

If the CountingProcessSamplePath object x is a the realization of a homogeneous Poisson process then, conditioned on the number of events observed, the location of the events (jumps in N(t)) is uniform on the period of observation. This is a basic property of the homogeneous Poisson process derived in Chap. 2 of Cox and Lewis (1966) and Daley and Vere-Jones (2003). Component UniformGivenN of a CountingProcessSamplePath.summary list contains the p.value of the Kolmogorov test of this uniform null hypothesis. The first graph generated by the plot method displays the Kolgorov test graphically (i.e., the empirical cumulative distribution function and the null hyptohesis (the diagonal). The two dotted lines on both sides of the diagonal correspond to 95 and 99% (asymptotic) confidence intervals. This is the graph shown on Fig. 9 (p 19) of Ogata (1988). Notice that the summary method allows you to compute the exact p.value.

If we write x[i] the jump locations of the CountingProcessSamplePath object x and if the latter is the realization of a homogeneous Poisson process then the intervals:

y[i]=x[i+1]-x[i]

are realizations of iid rvs from an exponential distribution with rate 1 and the:

u[i]=1 - exp(-y[i])

are realizations of iid rvs from a uniform distribution on [0,1). The second graph generated by the plot method tests this uniform distribution hypotheses with a Kolmogorov Test. This is the graph shown on Fig. 10 (p 19) of Ogata (1988) which was suggested by Berman. This is also the one of the graphs proposed by Brown et al (2002) (the other one is a Q-Q plot for the same quantities). The two dotted lines on both sides of the diagonal correspond to 95 and 99% (asymptotic) confidence intervals. Component BermanTest of a CountingProcessSamplePath.summary list contains the p.value of the Kolmogorov test of this uniform null hypothesis.

Following the line of the previous paragraph, if the distribution of the y[i] is an exponential distribution with rate 1, then their survivor function is: exp(-y). This is what's shown on the third graph generated by the plot method, using a log scale for the ordinate. The point wise CI at 95 and 99% are also drawn (dotted lines). This is the graph shown on Fig. 12 (p 20) of Ogata (1988)

If the u[i] of the second paragraph are realizations of iid uniform rvs on [0,1) then a plot of u[i+1] vs u[i] should fill uniformly the unit square [0,1) x [0,1). This is the fourth generated graph (the one shown on Fig. 11 (p 20) of Ogata (1988)) by the plot method while the seventh graph shows the autocorrelation function of the u[i]s. Component RenewalTest of a CountingProcessSamplePath.summary list contains a slightly more elaborated (and quantitative) version of this test summarizing the fourth graph (bottom right) generated by a call to renewalTestPlot. This list element is itself a list with elements: chi2.95 (a logical), chi2.99 (a logical) and total (an integer). The bounds resulting from repetitively testing a sequence of what are, under the null hypothesis, iid chi2 rvs are obtained using Donsker's Theorem (see bellow). For each lag the number of degrees of freedom of the chi2 distribution is subtracted from each chi2 value. These centered values are then divided by their sd (assuming the null hypothesis is correct). The cumulative sum of the centered and scaled sequence is formed and is divided by the square root of the maximal lag used. This is "plugged-in" the Donsker's Theorem. The eighth graph of the plot method displays the resulting Wiener process. With the tight confidence regions of Kendall et al (2007), see bellow.

If the x[i] are realization of a homogeneous Poisson process observed between 0 and T, then the number of events observed on non-overlapping windows of length t should be iid Poisson rv with mean t (and variance t). The observation period is therefore chopped into non-overlapping windows of increasing length and the empirical variance of the event count is graphed versus the empirical mean, together with 95 and 99% CI (using a normal approximation). This is done by calling internally varianceTime. That's what's generated by the fifth graph of the plot method. This is the graph shown on Fig. 13 (p 20) of Ogata (1988). Component varianceTimeSummary of a CountingProcessSamplePath.summary list contains a summary of this test, counting the number of events out of each band.

The last graph generated by the plot method and the companions components, Wiener95 and Wiener99, of a CountingProcessSamplePath.summary list represent "new" tests (as far as I know). They are based on the fact that if the y[i] above are realizations of iid rvs following an exponential distribution with rate 1, then the w[i]=y[i]-1 are realizations of iid rvs with mean 0 and variance 1. We can then form the partial sums:

S[n]=w[1]+...+w[n]

and define the random right continuous with a left-hand limit functions on [0,1]:

S[floor(n*t)]/sqrt(n)

These functions are realizations of a process which converges (weakly) to a Wiener process on [0,1]. The proof of this statement is a corollary of Donsker's Theorem and can be found on pp 146-147, Theorem 14.1, of Billingsley (1999). I thank Vilmos Prokaj for pointing this reference to me.What is then done is testing if the putative Wiener process is entirely within the tight boundaries defined by Kendall et al (2007) for a true Wiener process, see crossTight.

Value

summary.CountingProcessSamplePath returns a CountingProcessSamplePath.summary object which is a list with the following components:

UniformGivenN

A numeric, the p.value of the Kolmogorov test of uniformity of the events times given the number of events.

Wiener95

A logical: is the scaled martingale within the tight 95% confidence band?

Wiener99

A logical: is the scaled martingale within the tight 99% confidence band?

BermanTest

A numeric, the p.value of the Kolmogorov test of uniformity of the scaled inter events intervals.

RenewalTest

A list with components: chi2.95, chi2.99 and total. chi2.95 resp. chi2.99 is a logical and is TRUE if the Wiener process obtained as described above is within the "tight" 95% resp. 99% confidence band of Kendall et al (2007). total gives the total number of lags. See renewalTestPlot.

varianceTime

A varianceTime object.

varianceTimeSummary

A numeric vector with components: total, out95 and out99. total gives the total number of window sizes explored. out95 gives the number of windows within which the variance is out of the 95% confidence band. out99 gives the number of windows within which the variance is out of the 99% confidence band. See varianceTime.

n

An integer, the number of events.

call

The matched call.

Acknowledgments

I thank Vilmos Prokaj for pointing out Donsker's Theorem and for indicating me the proof's location (Patrick Billingsley's book).

I also thank Olivier Faugeras and Jonathan Touboul for pointing out Donsker's therom to me.

Warning

If you wan these tests to be meaningful do not apply them to the data you just used to fit your conditional intensity model.

Note

These functions / methods are designed to replace the summary.transformedTrain and plot.transformedTrain ones. The former have a more general design.

Of course to be fully usable, these functions must be coupled to functions allowing users to fit conditional intensity models.The support for that in STAR is not complete yet but is coming soon. See for now the example bellow.

The end of the example bellow (not ran by default) shows that the coverage probability of the Wiener Process confidence bands are really good even for small (50) sample sizes.

Author(s)

Christophe Pouzat christophe.pouzat@gmail.com

References

Patrick Billingsley (1999) Convergence of Probability Measures. Wiley - Interscience.

Brillinger, D. R. (1988) Maximum likelihood analysis of spike trains of interacting nerve cells. Biol. Cybern. 59: 189–200.

Brown, E. N., Barbieri, R., Ventura, V., Kass, R. E. and Frank, L. M. (2002) The time-rescaling theorem and its application to neural spike train data analysis. Neural Computation 14: 325-346.

D. R. Cox and P. A. W. Lewis (1966) The Statistical Analysis of Series of Events. John Wiley and Sons.

Daley, D. J. and Vere-Jones D. (2003) An Introduction to the Theory of Point Processes. Vol. 1. Springer.

Ogata, Yosihiko (1988) Statistical Models for Earthquake Occurrences and Residual Analysis for Point Processes. Journal of the American Statistical Association 83: 9-27.

Johnson, D.H. (1996) Point process models of single-neuron discharges. J. Computational Neuroscience 3: 275–299.

W. S. Kendall, J.- M. Marin and C. P. Robert (2007) Brownian Confidence Bands on Monte Carlo Output. Statistics and Computing 17: 1–10. Preprint available at: http://www.ceremade.dauphine.fr/%7Exian/kmr04.rev.pdf

See Also

mkCPSP, as.CPSP, plot.CountingProcessSamplePath, print.CountingProcessSamplePath, varianceTime, crossTight, renewalTestPlot, ks.test

Examples

## Not run: 
## load one spike train data set of STAR
data(e060824spont)
## Create the CountingProcessSamplePath object
n1spt.cp <- as.CPSP(e060824spont[["neuron 1"]])
## print it
n1spt.cp
## plot it
plot(n1spt.cp)
## get the summary
## Notice the warning due to few identical interspike intervals
## leading to an inaccurate Berman's test.
summary(n1spt.cp)

## Simulate data corresponding to a renewal process with
## an inverse Gaussian ISI distribution in the spontaneous
## regime modulated by a multiplicative stimulus whose time
## course is a shifted and scaled chi2 density.
## Define the "stimulus" function
stimulus <- function(t,
                     df=5,
                     tonset=5,
                     timeFactor=5,
                     peakFactor=10) {
  dchisq((t-tonset)*timeFactor,df=df)*peakFactor
}
## Define the conditional intensity / hazard function
hFct <- function(t,
                 tlast,
                 df=5,
                 tonset=5,
                 timeFactor=5,
                 peakFactor=10,
                 mu=0.075,
                 sigma2=3
                 ) {
  
  hinvgauss(t-tlast,mu=mu,sigma2=sigma2)*exp(stimulus(t,df,tonset,timeFactor,peakFactor))

}
## define the function simulating the train with the thinning method                   
makeTrain <- function(tstop=10,
                      peakCI=200,
                      preTime=5,
                      df=5,
                      tonset=5,
                      timeFactor=5,
                      peakFactor=10,
                      mu=0.075,
                      sigma2=3) {

  result <- numeric(500) - preTime - .Machine$double.eps
  result.n <- 500
  result[1] <- 0
  idx <- 1
  currentTime <- result[1]
  while (currentTime < tstop+preTime) {
    currentTime <- currentTime+rexp(1,peakCI)
    p <- hFct(currentTime,
              result[idx],
              df=df,
              tonset=tonset+preTime,
              timeFactor=timeFactor,
              peakFactor=peakFactor,
              mu=mu,
              sigma2=sigma2)/peakCI
    rthreshold <- runif(1)
    if (p>1) stop("Wrong peakCI")
    while(p < rthreshold) {
      currentTime <- currentTime+rexp(1,peakCI)
      p <- hFct(currentTime,
                result[idx],
                df=df,
                tonset=tonset+preTime,
                timeFactor=timeFactor,
                peakFactor=peakFactor,
                mu=mu,
                sigma2=sigma2)/peakCI
      if (p>1) stop("Wrong peakCI")
      rthreshold <- runif(1)
    }
    idx <- idx+1
    if (idx > result.n) {
      result <- c(result,numeric(500)) - preTime - .Machine$double.eps
      result.n <- result.n + 500
    }
    result[idx] <- currentTime
  }

  result[preTime < result & result <= tstop+preTime] - preTime
  
}
## set the seed
set.seed(20061001)
## "make" the train
t1 <- makeTrain()
## create the corresponding CountingProcessSamplePath
## object
cpsp1 <- mkCPSP(t1)
## print it
cpsp1
## test it
cpsp1.summary <- summary(cpsp1)
cpsp1.summary
plot(cpsp1.summary)
## Define a function returning the conditional intensity function (cif)
ciFct <- function(t,
                  tlast,
                  df=5,
                  tonset=5,
                  timeFactor=5,
                  peakFactor=10,
                  mu=0.075,
                  sigma2=3
                  ) {

  sapply(t, function(x) {
    if (x <= tlast[1]) return(1/mu)
    y <- x-max(tlast[tlast<x])
    hinvgauss(y,mu=mu,sigma2=sigma2)*exp(stimulus(x,df,tonset,timeFactor,peakFactor))
  }
         )

}
## Compute the cif of the train
tt <- seq(0,10,0.001)
lambda.true <- ciFct(tt,cpsp1$ppspFct())
## plot it together with the events times
## Notice that the representation is somewhat inaccurate, the cif
## is in fact a left continuous function
plot(tt,lambda.true,type="l",col=2)
rug(cpsp1$ppspFct())
## plot the integrated intensity function and the counting process
plot(tt,cumsum(lambda.true)*0.001,type="l",col=2)
lines(cpsp1)
## define a function doing the time transformation / rescaling
## by integrating the cif and returning another CountingProcessSamplePath
transformCPSP <- function(cpsp,
                          ciFct,
                          CIFct,
                          method=c("integrate","discrete"),
                          subdivisions=100,
                          ...
                          ) {

  if (!inherits(cpsp,"CountingProcessSamplePath"))
    stop("cpsp should be a CountingProcessSamplePath objet")
  st <- cpsp$ppspFct()
  n <- length(st)
  from <- cpsp$from
  to <- cpsp$to
  if (missing(CIFct)) {
    if (method[1] == "integrate") {
      lwr <- c(from,st)
      upr <- c(st,to)
      Lambda <- sapply(1:(n+1),
                       function(idx)
                       integrate(ciFct,
                                 lower=lwr[idx],
                                 upper=upr[idx],
                                 subdivisions=subdivisions,
                                 ...)$value
                       )
      Lambda <- cumsum(Lambda)
      st <- Lambda[1:n]
      from <- 0
      to <- Lambda[n+1]
    } ## End of conditional on method[1] == "integrate"
    if (method[1] == "discrete") {
      lwr <- c(from,st)
      upr <- c(st,to)
      xx <- unlist(lapply(1:(n+1),
                          function(idx) seq(lwr[idx],
                                            upr[idx],
                                            length.out=subdivisions)
                          )
                   )
      Lambda <- cumsum(ciFct(xx[-length(xx)])*diff(xx))
      Lambda <- Lambda - Lambda[1]
      st <- Lambda[(1:n)*subdivisions]
      from <- 0
      to <- Lambda[length(Lambda)]
    } ## End of conditional on method[1] == "discrete"
  } else {
    result <- CIFct(c(from,st,to))
    result <- result-result[1]
    from <- result[1]
    to <- result[n+2]
    st <- result[2:(n+1)]
  } ## End of conditional on missing(CIFct)
  mkCPSP(st,from,to)
}
## transform cpsp1
cpsp1t <- transformCPSP(cpsp1,function(t) ciFct(t,cpsp1$ppspFct()))
## test it
cpsp1t.summary <- summary(cpsp1t)
cpsp1t.summary
plot(cpsp1t.summary)
## compare the finite sample performances of the
## Kolmogorov test (test the uniformity of the
## jump times given the number of events) with the
## ones of the new "Wiener process test"
empiricalCovProb <- function(myRates=c(10,(1:8)*25,(5:10)*50,(6:10)*100),
                             nbRep=1000,
                             exact=NULL
                             ) {

  b95 <- function(t) 0.299944595870772 + 2.34797018726827*sqrt(t)
  b99 <- function(t) 0.313071417065285 + 2.88963206734397*sqrt(t)
  result <- matrix(numeric(4*length(myRates)),nrow=4)
  colnames(result) <- paste(myRates)
  rownames(result) <- c("ks95","ks99","wp95","wp99")
  for (i in 1:length(myRates)) {
    rate <- myRates[i]
    partial <- sapply(1:nbRep,
                      function(repIdx) {
                        st <- cumsum(rexp(5*rate,rate))
                        while(max(st) < 1) st <- c(st,max(st)+cumsum(rexp(5*rate,rate)))
                        st <- st[st<=1]
                        ks <- ks.test(st,punif,exact=exact)$p.value
                        w <- (st*rate-seq(st))/sqrt(rate)
                        c(ks95=0.95 < ks,
                          ks99=0.99 < ks,
                          wp95=any(w < -b95(st) | b95(st) < w),
                          wp99=any(w < -b99(st) | b99(st) < w)
                          )
                      }
                      )
    
    result[,i] <- apply(partial,1,sum)
  }
 
  attr(result,"nbRep") <- nbRep
  attr(result,"myRates") <- myRates
  attr(result,"call") <- match.call()
  result/nbRep
  
}

plotCovProb <- function(covprob,ci=0.95) {

  nbMax <- max(attr(covprob,"myRates"))
  plot(c(0,nbMax),c(0.94,1),
       type="n",
       xlab="Expected number of Spikes",
       ylab="Empirical cov. prob.",xaxs="i",yaxs="i")
  nbRep <- attr(covprob,"nbRep")
  polygon(c(0,nbMax,nbMax,0),
          c(rep(qbinom((1-ci)/2,nbRep,0.95)/nbRep,2),rep(qbinom(1-(1-ci)/2,nbRep,0.95)/nbRep,2)),
          col="grey50",border=NA)
  polygon(c(0,nbMax,nbMax,0),
          c(rep(qbinom((1-ci)/2,nbRep,0.99)/nbRep,2),rep(qbinom(1-(1-ci)/2,nbRep,0.99)/nbRep,2)),
          col="grey50",border=NA)
  nbS <- attr(covprob,"myRates")
  points(nbS,1-covprob[1,],pch=3)
  points(nbS,1-covprob[2,],pch=3)
  points(nbS,1-covprob[3,],pch=1)
  points(nbS,1-covprob[4,],pch=1)

}
system.time(covprobA <- empiricalCovProb())
plotCovProb(covprobA)

## End(Not run)

Results