Last data update: 2014.03.03

sbh                R Documentation

Cross-Validated Survival Bump Hunting

Description

Main end-user function for fitting a cross-validated Survival Bump Hunting (SBH) model. Returns a cross-validated PRSP object, as generated by our Patient Recursive Survival Peeling (PRSP) algorithm, containing cross-validated estimates of the end-point statistics of interest.

Usage

  sbh(dataset, 
      B = 10, K = 5, A = 1000, 
      vs = TRUE, cpv = FALSE, decimals = 2,
      cvtype = c("combined", "averaged", "none", NULL), 
      cvcriterion = c("lrt", "cer", "lhr", NULL),
      arg = "beta=0.05,alpha=0.05,minn=5,L=NULL,peelcriterion=\"lr\"",
      probval = NULL, timeval = NULL, 
      parallel = FALSE, conf = NULL, seed = NULL)

Arguments

dataset

data.frame or numeric matrix of the input dataset, containing the observed survival times and status indicators in the first two columns, respectively, and all the covariates thereafter. If a data.frame is provided, it will be coerced to a numeric matrix. Discrete (or nominal) covariates should be recoded as (or re-arranged into) ordinal variables.
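As an illustration of the expected layout, the sketch below reorders a data.frame so that the survival time and status come first, and recodes a nominal covariate into integer level codes before coercion with data.matrix(). The helper prepare_dataset() and the toy data are hypothetical and not part of PRIMsrc; choosing a meaningful level order for each nominal covariate remains the user's responsibility.

```r
# Hypothetical helper (not part of PRIMsrc): put the survival time and status
# columns first, followed by the covariates, and return a numeric matrix.
prepare_dataset <- function(df, time, status) {
  covs <- setdiff(names(df), c(time, status))
  df <- df[, c(time, status, covs), drop = FALSE]
  for (v in covs) {
    # Character covariates become factors; data.matrix() then replaces each
    # factor by its integer level codes (an ordinal encoding).
    if (is.character(df[[v]])) df[[v]] <- factor(df[[v]])
  }
  data.matrix(df)
}

# Toy data: 'stage' is nominal, so its level order is fixed explicitly.
toy <- data.frame(stage = factor(c("II", "I", "III"), levels = c("I", "II", "III")),
                  age   = c(61, 54, 70),
                  time  = c(2.3, 5.1, 0.8),
                  event = c(1, 0, 1))
m <- prepare_dataset(toy, time = "time", status = "event")
print(m)
```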

B

Positive integer scalar of the number of replications of the cross-validation procedure. Defaults to 10.

K

Integer giving the number of folds (partitions) into which the observations should be randomly split for the cross-validation procedure. Setting K also specifies the type of cross-validation to be done:

  • K = 1 performs no cross-validation.

  • K in {2,...,n-1} performs K-fold cross-validation.

  • K = n performs leave-one-out cross-validation.
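The fold assignment can be sketched as follows. Following the package news for release 0.6.3, which mentions random splitting stratified by events, the hypothetical helper stratified_folds() shuffles the observations within each status stratum and deals fold labels cyclically along that order, so that every fold receives nearly the same number of events; with K = n it reduces to leave-one-out. This is an illustration under those assumptions, not the package's internal code.

```r
# Hypothetical helper: assign each observation to one of K folds, stratified
# by the status indicator so events are spread evenly across folds.
stratified_folds <- function(status, K) {
  n <- length(status)
  stopifnot(K >= 1, K <= n)
  # Shuffle within each status stratum, then concatenate the strata.
  strata <- split(seq_len(n), status)
  ord <- unlist(lapply(strata, function(idx) idx[sample.int(length(idx))]),
                use.names = FALSE)
  folds <- integer(n)
  # Deal the labels 1..K cyclically along that order: each stratum forms a
  # contiguous run, so its members are spread as evenly as possible.
  folds[ord] <- rep_len(seq_len(K), n)
  folds
}

set.seed(1)
status <- c(rep(1, 23), rep(0, 17))   # 23 events, 17 censored observations
folds <- stratified_folds(status, K = 5)
table(fold = folds, status = status)
```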

A

Positive integer scalar of the number of permutations for the computation of cross-validated p-values. Defaults to 1000.

vs

logical scalar. Flag for optional variable (covariate) pre-selection. Defaults to TRUE.

cpv

logical scalar. Flag for computation of permutation p-values. Defaults to FALSE.

decimals

integer scalar. Number of user-specified significant decimals to output results. Defaults to 2.

cvtype

Character vector describing the cross-validation technique in {"combined", "averaged", "none", NULL}. If NULL, automatically reset to "none".

cvcriterion

character vector describing the optimization criterion in {"lrt", "lhr", "cer", NULL}. If NULL, automatically reset to "none".

arg

Character vector describing the PRSP parameters:

  • alpha = fraction to peel off at each step. Defaults to 0.05.

  • beta = minimum support size resulting from the peeling sequence. Defaults to 0.05.

  • minn = minimum number of observations that we want to be able to detect in a box. Defaults to 5.

  • L = fixed peeling length. Defaults to NULL.

  • peelcriterion in {"hr" for Log-Hazard Ratio (LHR), "lr" for Log-Rank Test (LRT), "ch" for Cumulative Hazard Summary (CHS)}. Defaults to "lr".

Note that the parameters in arg come as a string of characters between double quotes, where all parameter evaluations are separated by commas (see examples). Double quotes inside the string, as in peelcriterion=\"lr\", must be escaped.
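To show how such a string maps onto individual parameters, here is a hypothetical parser, parse_prsp_arg(), which strips white space (so the multi-line strings used in the examples are also handled), splits on commas and then on "=", and evaluates each right-hand side. PRIMsrc's own parsing may differ, and eval(parse()) should only be applied to trusted input.

```r
# Hypothetical parser (not PRIMsrc's internal code): turn the comma-separated
# "name=value" string into a named list of R values.
parse_prsp_arg <- function(arg) {
  arg <- gsub("[[:space:]]", "", arg)   # drop spaces and line breaks
  pairs <- strsplit(strsplit(arg, ",", fixed = TRUE)[[1]], "=", fixed = TRUE)
  # Evaluate each right-hand side: numbers, NULL, quoted strings, ...
  vals <- lapply(pairs, function(p) eval(parse(text = p[2])))
  names(vals) <- vapply(pairs, `[`, character(1), 1)
  vals
}

p <- parse_prsp_arg("beta=0.05,alpha=0.05,minn=5,L=NULL,peelcriterion=\"lr\"")
str(p)
```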

probval

Numeric scalar of the survival probability at which we want to get the endpoint box survival time. Defaults to NULL.

timeval

Numeric scalar of the survival time at which we want to get the endpoint box survival probability. Defaults to NULL.
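Both quantities are read off a survival curve. The sketch below uses a hand-written Kaplan-Meier estimator in base R (in practice survival::survfit() would be used) with two hypothetical helpers: surv_at_time() plays the role of timeval, and time_at_prob() the role of probval, returning the first event time at which the estimated survival drops to probval or below. It only illustrates these definitions for a single group, assuming status is coded 1 for an event and 0 for censoring; how PRIMsrc computes the endpoints within the box is not shown.

```r
# Minimal Kaplan-Meier estimator (status: 1 = event, 0 = censored).
km <- function(times, status) {
  ut <- sort(unique(times[status == 1]))                 # distinct event times
  n_risk  <- vapply(ut, function(t) sum(times >= t), numeric(1))
  n_event <- vapply(ut, function(t) sum(times == t & status == 1), numeric(1))
  list(time = ut, surv = cumprod(1 - n_event / n_risk))
}

# Survival probability at a given time (role of 'timeval').
surv_at_time <- function(fit, timeval) {
  i <- findInterval(timeval, fit$time)
  if (i == 0) 1 else fit$surv[i]
}

# First time at which survival falls to 'probval' or below (role of 'probval').
time_at_prob <- function(fit, probval) {
  i <- which(fit$surv <= probval)
  if (length(i) == 0) NA_real_ else fit$time[min(i)]
}

fit <- km(times = c(1, 2, 3, 4, 5), status = c(1, 1, 1, 1, 1))
surv_at_time(fit, timeval = 2.5)   # survival probability at t = 2.5
time_at_prob(fit, probval = 0.5)   # median survival time
```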

parallel

Logical. Is parallel computing to be performed? Optional. Defaults to FALSE.

conf

List of parameters for cluster configuration, passed as inputs to the function makeCluster of the R package parallel for cluster setup. Optional, defaults to NULL. See details for usage.

seed

Positive integer scalar of the user seed to reproduce the results.

Details

At this point, the main function sbh performs the search of the first box of the recursive coverage (outer) loop of our Patient Recursive Survival Peeling (PRSP) algorithm. It relies on an optional variable pre-selection procedure that is run before the PRSP algorithm. Currently, this is done by Elastic-Net (EN) penalization of the partial likelihood, where both the mixing (alpha) and overall shrinkage (lambda) parameters are simultaneously estimated by cross-validation using the glmnet::cv.glmnet function of the R package glmnet.

The returned S3-class PRSP object contains cross-validated estimates of all the decision-rules of pre-selected covariates and all other statistical quantities of interest at each iteration of the peeling sequence (inner loop of the PRSP algorithm). This enables the graphical display of results of profiling curves for model tuning, peeling trajectories, covariate traces and survival distributions (see plotting functions for more details).
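To make the notion of a peeling sequence concrete, the sketch below implements one generic peeling step and iterates it. The helper peel_step() and the scoring function are hypothetical: at each step every covariate is tried on both sides, about a fraction alpha of the in-box observations is removed, and the candidate with the best score is kept, subject to a minimal box support of max(beta*n, minn) as described in the package news (release 0.6.3). In PRSP the score would be the chosen peeling criterion (LRT, LHR or CHS); here the mean of a toy outcome stands in for it.

```r
# Hypothetical single peeling step. 'x' is the covariate matrix, 'inbox' the
# current logical box membership, 'score' a function of a candidate membership.
peel_step <- function(x, inbox, alpha, min_support, score) {
  best <- NULL
  for (j in seq_len(ncol(x))) {
    xj <- x[inbox, j]
    for (side in c("low", "high")) {
      # Remove (about) a fraction alpha of the in-box points on one side of x_j.
      if (side == "low") {
        cut <- quantile(xj, alpha, names = FALSE, type = 1)
        cand <- inbox & x[, j] > cut
      } else {
        cut <- quantile(xj, 1 - alpha, names = FALSE, type = 1)
        cand <- inbox & x[, j] < cut
      }
      # Skip peels that leave too few points or remove nothing.
      if (sum(cand) < min_support || sum(cand) == sum(inbox)) next
      s <- score(cand)
      if (is.null(best) || s > best$score)
        best <- list(inbox = cand, var = j, side = side, cut = cut, score = s)
    }
  }
  best  # NULL when no admissible peel is left
}

# Toy usage: the outcome y grows with the first covariate; the second is constant.
n <- 100
x <- cbind(x1 = seq_len(n), x2 = rep(1, n))
y <- seq_len(n)
min_support <- max(ceiling(0.05 * n), 5)   # max(beta*n, minn), beta = 0.05, minn = 5
score_fun <- function(b) mean(y[b])
first <- peel_step(x, rep(TRUE, n), alpha = 0.05, min_support, score_fun)

# Iterate until no admissible peel remains: this yields the peeling sequence.
box <- rep(TRUE, n); steps <- 0
repeat {
  nxt <- peel_step(x, box, alpha = 0.05, min_support, score_fun)
  if (is.null(nxt)) break
  box <- nxt$inbox; steps <- steps + 1
}
```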

The function offers a number of options: the number of cross-validation replicates to be performed (B); the type of cross-validation desired, i.e. (replicated) averaged or combined K-fold cross-validation; the peeling and optimization criteria chosen for model tuning; and a few more parameters for the PRSP algorithm.

In case replicated cross-validations are performed, a "summary" of the outputs is done over the B replicates, which requires some explanation:

  • Even though the PRSP algorithm uses only one covariate at a time at each peeling step, the reported matrix of "Replicated CV" box decision rules may show several covariates being used in a given step, simply because these decision rules are averaged over the B replicates (see equation #21 in Dazard et al. 2015). This is also reflected in the reported "Replicated CV" importance and usage plots of covariate traces.

  • Likewise, the output matrix of "Replicated CV" box membership indicator does not necessarily match exactly the output vector of "Replicated CV" box support (and corresponding box sample size) for all peeling steps. The reason is that the reported "Replicated CV" box membership indicators are computed (at each peeling step) as the point-wise majority vote over the B replicates (see equation #22 in Dazard et al. 2015), whereas the "Replicated CV" box support vector (and corresponding box sample size) is averaged (at each peeling step) over the B replicates.
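The toy example below, with made-up memberships for B = 3 replicates and n = 6 observations at a single peeling step, shows how the two summaries can disagree: the averaged support is 4/9, whereas the box defined by the point-wise majority vote contains only 1 of the 6 observations. Treating a vote share strictly above one half as a majority is an assumption of this sketch (ties can occur when B is even).

```r
# Made-up box memberships: rows = replicates (B = 3), columns = observations (n = 6).
boxind <- rbind(
  c(TRUE,  TRUE,  FALSE, FALSE, TRUE,  FALSE),
  c(TRUE,  FALSE, TRUE,  FALSE, FALSE, FALSE),
  c(TRUE,  FALSE, FALSE, TRUE,  FALSE, TRUE)
)

# "Replicated CV" box support: averaged over the replicates.
avg_support <- mean(rowMeans(boxind))

# "Replicated CV" membership: point-wise majority vote over the replicates.
vote <- colMeans(boxind) > 0.5
vote_support <- mean(vote)

c(averaged = avg_support, majority_vote = vote_support)
```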

The function takes advantage of the R package parallel, which allows users to create a cluster of workstations on a local and/or remote machine(s), enabling scaling-up with the number of CPU cores specified and efficient parallel execution.

If the computation of permutation p-values is desired, then running with the parallelization option is strongly advised as it may take a while. In the case of large (p > n) or very large (p >> n) datasets, it is also required to use the parallelization option.

To run a parallel session (and parallel RNG) of the PRIMsrc procedures (parallel=TRUE), argument conf must be specified (i.e. non NULL). It must list the specifications of the following parameters for cluster configuration: "names", "cpus", "type", "homo", "verbose", "outfile". These match the arguments described in function makeCluster of the R package parallel. All fields are required to properly configure the cluster, except for "names" and "cpus": "names" is the value used in the case of a cluster of type "SOCK" (socket), whereas "cpus" is the value used in the case of a cluster of any other type. See examples below.

  • "names": names : character vector specifying the host names on which to run the job. May default to a single local machine, in which case one may use the unique host name "localhost". Each host name can potentially be repeated as many times as the number of CPU cores available on the corresponding machine.

  • "cpus": spec : integer scalar specifying the total number of CPU cores to be used across the network of available nodes, counting the worker nodes and the master node.

  • "type": type : character vector specifying the cluster type ("SOCK", "PVM", "MPI").

  • "homo": homogeneous : logical scalar to be set to FALSE for inhomogeneous clusters.

  • "verbose": verbose : logical scalar to be set to FALSE for quiet mode.

  • "outfile": outfile : character vector of the output log file name for the worker nodes.
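The correspondence between the conf fields and makeCluster() can be sketched with the hypothetical helper cluster_args(), which only assembles the argument list: "names" becomes spec for a "SOCK" cluster, "cpus" becomes spec otherwise, and the remaining fields are passed on as homogeneous, verbose and outfile. PRIMsrc creates and closes its cluster internally, so this only clarifies the mapping; note also that in the parallel package the "SOCK" and "MPI" types are handed over to the snow package.

```r
# Hypothetical helper: build the argument list for parallel::makeCluster()
# from a PRIMsrc-style 'conf' list (no cluster is created here).
cluster_args <- function(conf) {
  # "SOCK" clusters are specified by host names; other types by a CPU count.
  spec <- if (identical(conf$type, "SOCK")) conf$names else conf$cpus
  list(spec = spec, type = conf$type, homogeneous = conf$homo,
       verbose = conf$verbose, outfile = conf$outfile)
}

conf_sock <- list("names" = rep("localhost", 4), "cpus" = 4, "type" = "SOCK",
                  "homo" = TRUE, "verbose" = TRUE, "outfile" = "")
conf_mpi  <- list("cpus" = 32, "type" = "MPI", "homo" = TRUE,
                  "verbose" = FALSE, "outfile" = "")
str(cluster_args(conf_sock))
str(cluster_args(conf_mpi))
# A cluster would then be created with:
# cl <- do.call(parallel::makeCluster, cluster_args(conf_sock))
```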

Note that argument B is internally reset to conf$cpus*ceiling(B/conf$cpus) in case the parallelization is used (i.e. conf is non NULL), where conf$cpus denotes the total number of CPUs to be used (see above). The argument A is similarly reset.
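As a worked example of this reset, take B = 10 replicates with the 4 * 8 = 32 CPU cores of Example #2 below: B becomes 32 * ceiling(10/32) = 32, while A = 1024 permutations is already a multiple of 32 and is left unchanged. On a 4-core machine, B = 10 would become 12. The one-line function below (a hypothetical name) simply restates the formula quoted above.

```r
# Round a number of replicates (or permutations) up to a multiple of the CPU count.
reset_to_cpus <- function(B, cpus) cpus * ceiling(B / cpus)

reset_to_cpus(10, 32)    # B on 32 cores: 32
reset_to_cpus(1024, 32)  # A on 32 cores: 1024
reset_to_cpus(10, 4)     # B on a 4-core PC: 12
```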

The actual creation of the cluster, its initialization, and closing are all done internally. In addition, when random number generation is needed, the creation of separate streams of parallel RNG per node is done internally by distributing the stream states to the nodes (for more details, see function makeCluster of the R package parallel and/or http://www.stat.uiowa.edu/~luke/R/cluster/cluster.html).

The use of a seed allows the results to be reproduced within the same type of session: the same seed will reproduce the same results within a non-parallel session or within a parallel session, but it will not necessarily give the exact same results (up to sampling variability) between a non-parallelized and a parallelized session, due to the different management of the seed in the two (see parallel RNG above and value of returned seed below).
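The reason for this difference can be illustrated with base R alone. A serial session draws from a single stream (Mersenne-Twister by default), whereas parallel RNG in the parallel package relies on independent "L'Ecuyer-CMRG" streams, one per worker, obtained with nextRNGStream(). The hypothetical helper stream_draws() returns draws from the k-th such stream: the same seed reproduces the same draws within each session type, but serial and stream draws differ. This illustrates the general mechanism only, not PRIMsrc's internal seed management.

```r
library(parallel)  # part of base R; provides nextRNGStream()

# Hypothetical helper: draws from the k-th L'Ecuyer-CMRG stream derived from
# 'seed' (stream 1 is the state set by set.seed()).
stream_draws <- function(seed, k, n = 3) {
  old_kind <- RNGkind()[1]
  on.exit(RNGkind(old_kind))            # restore the previous generator
  RNGkind("L'Ecuyer-CMRG")
  set.seed(seed)
  s <- get(".Random.seed", envir = .GlobalEnv)
  for (i in seq_len(k - 1)) s <- nextRNGStream(s)
  assign(".Random.seed", s, envir = .GlobalEnv)
  runif(n)
}

# Serial session: the same seed gives the same draws.
set.seed(123); serial_a <- runif(3)
set.seed(123); serial_b <- runif(3)

# Stream-based session: the same seed gives the same per-stream draws,
# but these differ from the serial draws.
par_a <- stream_draws(123, k = 2)
par_b <- stream_draws(123, k = 2)
```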

Value

Object of class PRSP (Patient Recursive Survival Peeling): a list containing the following fields:

x

numeric matrix of original dataset.

times

numeric vector of observed failure / survival times.

status

numeric vector of observed event indicator in {1,0}.

B

positive integer of the number of replications used in the cross-validation procedure.

K

positive integer of the number of folds used in the cross-validation procedure.

A

positive integer of the number of permutations used for the computation of permutation p-values.

vs

logical scalar of returned flag of optional variable pre-selection.

cpv

logical scalar of returned flag of optional computation of permutation p-values.

decimals

integer of the number of user-specified significant decimals.

cvtype

character vector of the cross-validation technique used.

cvcriterion

character vector of optimization criterion used.

arg

character vector of the parameters used.

probval

Numeric scalar of survival probability used.

timeval

Numeric scalar of survival time used.

cvfit

List with 10 fields of cross-validated estimates:

  • cv.maxsteps: numeric scalar of maximal number of peeling steps over the replicates.

  • cv.nsteps: numeric scalar of optimal number of peeling steps according to the optimization criterion.

  • cv.trace: numeric vector of the modal trace values of covariate usage for all peeling steps.

  • cv.boxind: logical matrix in {TRUE, FALSE} of individual observation box membership indicator (columns) for all peeling steps (rows).

  • cv.rules: data.frame of decision rules on the covariates (columns) for all peeling steps (rows).

  • cv.sign: numeric vector in {-1,+1} of directions of peeling for all pre-selected covariates.

  • cv.selected: numeric vector of pre-selected covariates in reference to original index.

  • cv.used: numeric vector of covariates used for peeling in reference to original index.

  • cv.stats: numeric matrix of box endpoint quantities of interest (columns) for all peeling steps (rows).

  • cv.pval: numeric vector of log-rank permutation p-values of separation of survival distributions.
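As a rough sketch of what a log-rank permutation p-value measures, the base-R code below computes the two-group log-rank chi-square statistic between in-box and out-of-box observations and compares it with its distribution under A random permutations of the box labels, using the (1 + #{permuted >= observed}) / (A + 1) convention. The helpers are hypothetical and this simplified version keeps the box fixed; a fully cross-validated p-value would presumably require repeating the whole cross-validated fit for every permutation, which is consistent with the advice above to use parallelization when cpv = TRUE. PRIMsrc's exact permutation scheme may differ.

```r
# Two-group log-rank chi-square statistic (group: logical box membership).
logrank_stat <- function(times, status, group) {
  ut <- sort(unique(times[status == 1]))
  o1 <- e1 <- v <- 0
  for (t in ut) {
    at_risk <- times >= t
    n  <- sum(at_risk)
    n1 <- sum(at_risk & group)
    d  <- sum(times == t & status == 1)
    d1 <- sum(times == t & status == 1 & group)
    o1 <- o1 + d1                         # observed events in the box
    e1 <- e1 + d * n1 / n                 # expected events in the box
    if (n > 1) v <- v + d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
  }
  (o1 - e1)^2 / v
}

# Permutation p-value obtained by shuffling the box labels A times.
perm_pvalue <- function(times, status, group, A = 1000) {
  obs  <- logrank_stat(times, status, group)
  perm <- replicate(A, logrank_stat(times, status, sample(group)))
  (1 + sum(perm >= obs)) / (A + 1)
}

set.seed(42)
times  <- as.numeric(1:20)
status <- rep(1, 20)
inbox  <- times <= 10                     # box holds the 10 earliest failures
logrank_stat(times, status, inbox)
perm_pvalue(times, status, inbox, A = 199)
```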

cvprofiles

List of B numeric vectors, one for each replicate, of the cross-validated statistics used in the optimization criterion (as set by the user) as a function of the number of peeling steps.

cvmeanprofiles

List of numeric vectors of the cross-validated mean statistics over the replicates, used in the optimization criterion (as set by the user), as a function of the number of peeling steps.
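The link between cvprofiles, cvmeanprofiles and cv.nsteps can be sketched as follows with made-up numbers for B = 3 replicates. Replicates may stop peeling at different steps, so shorter profiles are padded with NA before averaging. The hypothetical helper optimal_steps() then assumes that the optimal number of steps maximizes the mean profile for "lrt" and "lhr" and minimizes it for "cer" (an error rate); PRIMsrc's internal selection rule may involve further details.

```r
# Made-up profiles for B = 3 replicates over up to 5 peeling steps.
profiles <- list(c(2.1, 3.5, 4.0, 3.2, 2.5),
                 c(1.8, 3.9, 3.7, 3.0),
                 c(2.4, 3.1, 4.4, 3.6, 2.9))

# Pad shorter profiles with NA so they can be stacked (rows = replicates).
maxsteps <- max(lengths(profiles))
padded <- t(vapply(profiles, function(p) c(p, rep(NA, maxsteps - length(p))),
                   numeric(maxsteps)))
mean_profile <- colMeans(padded, na.rm = TRUE)

optimal_steps <- function(mean_profile, criterion = c("lrt", "lhr", "cer")) {
  criterion <- match.arg(criterion)
  # Test statistics (LRT, LHR) are maximized; an error rate (CER) is minimized.
  if (criterion == "cer") which.min(mean_profile) else which.max(mean_profile)
}
optimal_steps(mean_profile, "lrt")
```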

plot

logical scalar of the returned flag for plotting or not the results of the fitted SBH model.

config

List with 7 fields of parameters used for configuring the parallelization including parallel and conf.

seed

User seed(s) used: a single integer value if parallelization is used, or an integer vector of values, one for each replication, if parallelization is not used.

Note

Unique end-user function for fitting the Survival Bump Hunting model.

Author(s)

Maintainer: "Jean-Eudes Dazard, Ph.D." jxd101@case.edu

Acknowledgments: This project was partially funded by the National Institutes of Health NIH - National Cancer Institute (R01-CA160593) to J-E. Dazard and J.S. Rao.

References

  • Dazard J-E., Choe M., LeBlanc M. and Rao J.S. (2015). "Cross-validation and Peeling Strategies for Survival Bump Hunting using Recursive Peeling Methods." Statistical Analysis and Data Mining (in press).

  • Dazard J-E., Choe M., LeBlanc M. and Rao J.S. (2014). "Cross-Validation of Survival Bump Hunting by Recursive Peeling Methods." In JSM Proceedings, Survival Methods for Risk Estimation/Prediction Section. Boston, MA, USA. American Statistical Association IMS - JSM, p. 3366-3380.

  • Dazard J-E., Choe M., LeBlanc M. and Rao J.S. (2015). "R package PRIMsrc: Bump Hunting by Patient Rule Induction Method for Survival, Regression and Classification." In JSM Proceedings, Statistical Programmers and Analysts Section. Seattle, WA, USA. American Statistical Association IMS - JSM, (in press).

  • Dazard J-E. and J.S. Rao (2010). "Local Sparse Bump Hunting." J. Comp Graph. Statistics, 19(4):900-92.

See Also

  • makeCluster (R package parallel)

  • cv.glmnet (R package glmnet)

  • glmnet (R package glmnet)

Examples

#===================================================
# Loading the library and its dependencies
#===================================================
library("PRIMsrc")

#===================================================
# Package news
# Package citation
#===================================================
PRIMsrc.news()
citation("PRIMsrc")
    
#===================================================
# Demo with a synthetic dataset
# Use help for descriptions
#===================================================
data("Synthetic.1", package="PRIMsrc")
?Synthetic.1

#===================================================
# Simulated dataset #1 (n=250, p=3)
# Non-Replicated Combined Cross-Validation (CCV)
# Peeling criterion = LRT
# Optimization criterion = LRT
# Without parallelization
# Without computation of permutation p-values
#===================================================
CVCOMB.synt1 <- sbh(dataset = Synthetic.1, 
                    cvtype = "combined", cvcriterion = "lrt",
                    B = 1, K = 5, 
                    vs = TRUE, cpv = FALSE, 
                    decimals = 2, probval = 0.5, 
                    arg = "beta=0.05,alpha=0.05,minn=5,L=NULL,peelcriterion=\"lr\"",
                    parallel = FALSE, conf = NULL, seed = 123)

## Not run: 
    #===================================================
    # Examples of parallel backend parametrization 
    #===================================================
    # Example #1 - 1-Quad (4-core double threaded) PC 
    # Running WINDOWS
    # With SOCKET communication
    #===================================================
    if (.Platform$OS.type == "windows") {
        cpus <- detectCores()
        conf <- list("names" = rep("localhost", cpus),
                     "cpus" = cpus,
                     "type" = "SOCK",
                     "homo" = TRUE,
                     "verbose" = TRUE,
                     "outfile" = "")
    }
    #===================================================
    # Example #2 - 1 master node + 3 worker nodes cluster
    # All nodes equipped with identical setups and multicores
    # Running LINUX
    # With SOCKET communication
    #===================================================
    if (.Platform$OS.type == "unix") {
        masterhost <- Sys.getenv("HOSTNAME")
        slavehosts <- c("compute-0-0", "compute-0-1", "compute-0-2")
        nodes <- length(slavehosts) + 1
        cpus <- 8
        conf <- list("names" = c(rep(masterhost, cpus),
                                 rep(slavehosts, cpus)),
                     "cpus" = nodes * cpus,
                     "type" = "SOCK",
                     "homo" = TRUE,
                     "verbose" = TRUE,
                     "outfile" = "")
    }
    #===================================================
    # Example #3 - Multinode multicore per node cluster
    # Running LINUX 
    # with MPI communication
    # Here, a file named ".nodes" (e.g. in the home directory)
    # contains the list of nodes of the cluster
    #===================================================
    if (.Platform$OS.type == "unix") {
        hosts <- scan(file=paste(Sys.getenv("HOME"), "/.nodes", sep=""), 
                      what="", 
                      sep="\n")
        hostnames <- unique(hosts)
        nodes <- length(hostnames)
        cpus <-  length(hosts)/length(hostnames)
        conf <- list("cpus" = nodes * cpus,
                     "type" = "MPI",
                     "homo" = TRUE,
                     "verbose" = TRUE,
                     "outfile" = "")
    }
    #===================================================
    # Simulated dataset #1 (n=250, p=3)
    # Replicated Combined Cross-Validation (RCCV)
    # Peeling criterion = LRT
    # Optimization criterion = LRT
    # With parallelization
    # With computation of permutation p-values
    #===================================================
    CVCOMBREP.synt1 <- sbh(dataset = Synthetic.1, 
                           cvtype = "combined", cvcriterion = "lrt",
                           B = 10, K = 5, A = 1024, 
                           vs = TRUE, cpv = TRUE, 
                           decimals = 2, probval = 0.5, 
                           arg = "beta=0.05,alpha=0.05,minn=5,L=NULL,peelcriterion=\"lr\"",
                           parallel = TRUE, conf = conf, seed = 123)

## End(Not run)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(PRIMsrc)
Loading required package: parallel
Loading required package: survival
Loading required package: Hmisc
Loading required package: lattice
Loading required package: Formula
Loading required package: ggplot2

Attaching package: 'Hmisc'

The following objects are masked from 'package:base':

    format.pval, round.POSIXt, trunc.POSIXt, units

Loading required package: glmnet
Loading required package: Matrix
Loading required package: foreach
Loaded glmnet 2.0-5

Loading required package: MASS
PRIMsrc 0.6.3
Type PRIMsrc.news() to see new features, changes, and bug fixes

> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/PRIMsrc/sbh.Rd_%03d_medium.png", width=480, height=480)
> ### Name: sbh
> ### Title: Cross-Validated Survival Bump Hunting
> ### Aliases: sbh
> ### Keywords: Exploratory Survival/Risk Analysis Survival/Risk Estimation &
> ###   Prediction Non-Parametric Method Cross-Validation Bump Hunting
> ###   Rule-Induction Method
> 
> ### ** Examples
> 
> #===================================================
> # Loading the library and its dependencies
> #===================================================
> library("PRIMsrc")
> 
> #===================================================
> # Package news
> # Package citation
> #===================================================
> PRIMsrc.news()
Package: PRIMsrc

---------------------------------------------------------------------------------
Date   : 2015-01-20
o RELEASE 0.1.0
- Initial release to GitHub.
- Built and tested under R 3.1.2 and release update to GitHub.
---------------------------------------------------------------------------------
Date   : 2015-01-22
o RELEASE 0.2.0
- Built and tested under R 3.1.2 and release update to GitHub.
---------------------------------------------------------------------------------
Date   : 2015-02-01
o RELEASE 0.3.0
- Minor updates in the manual, email and version number.
- Built and tested under R 3.1.2 and release update to GitHub.
---------------------------------------------------------------------------------
Date   : 2015-02-27
o RELEASE 0.4.0
- Extension to high-dimensional p > n and p >> n cases by adding an 
  internal variable selection procedure using the Elasticnet-Regularized Cox Regression 
  function of the 'glmnet' package.
- Removed (temporarily) interactive option in sbh() in case no variables are selected by glmnet(...).
- Added dependency to glmnet package for initial variable selection.
- Added synthetic dataset #5 and example with p > n.
- Added real dataset #2 and example with p >> n.
- Added new ouputs 'selected' and 'used' in main function sbh(...) for variables effectively 
  selected and used for peeling. 
- Removed returned values of box vertices that were redudant with the returned rules.
- Changed return value of variable traces: now also returns the matrix of traces by replication.
- Corrected superfluous codes in the parallelization section, before clusterCall(...) in sbh(...).
- Corrected number of replications in sbh(...) in case of parallelization. 
- Corrected stepwise variable selection procedure in peel.box() to account for missing values.
- Corrected definition of the cross-validated box vertices (definition) 
  in the case of "combined CV" technique.
- Corrected generation of random seed when none is provided. 
- Minor updates, bugs and code improvements in sbh(...) and internal peel.box(...) functions.
- Updated manual, version number.
- Built and tested under R 3.1.2 and release update to GitHub.
---------------------------------------------------------------------------------
Date   : 2015-03-04
o RELEASE 0.5.0
- Change of package name and GitHub repository name from PrimSRC to PRIMsrc.
- Added CRAN/GitHub subfolder doc for PDF documentation files 
  (including manual and applied study abstract).
- Removed option for overlaying plots of multiple PRSP objects 
  in plot_boxtrace(...) and plot_boxtraj(...).
- Added argument "toplot" to choose which covariates should be plotted in 
  plot_boxtrace(...) and plot_boxtraj(...).
- Corrected handling of empty PRSP object (failed peeling) in all plotting functions.
- Implementation of plotting device now internal to all plotting functions.
- Removed internal functions from the manual, updated manual, version number.
- Built and tested under R 3.1.2 and release update to GitHub.
---------------------------------------------------------------------------------
Date   : 2015-03-16
o RELEASE 0.5.3
- Added S3-generic 'summary' function.
- Added S3-generic 'predict' function.
- Built and tested under R 3.1.2 and release update to GitHub.
---------------------------------------------------------------------------------
Date   : 2015-04-10
o RELEASE 0.5.5
- Removed argument 'discr' in the main function: no special rounding of discrete covariate 
  decision rules is done any longer.
- Made the internal variable selection procedure conditional on whether p <= n or not.
- Corrected treatment of missing values in case of replications for the variable traces.
- Corrected output of variable trace modal values.
- Corrected pre-selected variable output.
- Several minor bugs corrected.
- Built and tested under R 3.1.2 and release update to GitHub.
---------------------------------------------------------------------------------
Date   : 2015-06-19
o RELEASE 0.5.6
- Correction/extension of internal variable pre-selection procedure by cross-validing 
  both parameters alpha (mixing) and lambda (shrinkage) of the 'glmnet' package. 
  This allows to get true lasso-ridge shrinkage estimates.
- Improved robustness in internal functions list2mat and list2array.
- Minor improvement in internal function cv.folds.
- Added vignettes
- Built and tested under R 3.0.2 and release update to GitHub.
---------------------------------------------------------------------------------
Date   : 2015-07-28
o RELEASE 0.5.7
- Compliance with new R CMD check, which now checks code usage via 'codetools'.
  Functions and packages from default packages other than base which are used in the package
  code are now imported via the package namespace file (NAMESPACE).
  Added new field 'Imports' in the package description file (DESCRIPTION) 
  to match the functions and packages newly imported via NAMESPACE.
- Added Cumulative Hazard Summary statistic (derived from the Nelson-Aalen estimator) 
  as new peeling criterion option in the PRSP algorithm.
- Built and tested under R-devel (2015-07-20 r68705). 
- Initial release to CRAN and update to GitHub.
---------------------------------------------------------------------------------
Date   : 2015-08-28
o RELEASE 0.5.8
- Removed pre-selection of variables (covariates) by regular Cox-regression
  and made the remaining Elastic-Net pre-selection of variables optional by
  passing an additional argument in the main function sbh().
- Main function sbh() now returns the parameters used for configuring the parallelization.
- Replaced real dataset #2 of breast cancer data with lung cancer data for reason of size.
- Added S3-generic 'print' function and updated S3-generic 'summary' function.
- Created a new internal subroutine cv.presel() for (optional) variable pre-selection.
- Changed main argument of plot functions from `x` to `object`.
- Minor corrections in the manual.
- Built and tested under R-devel (2015-08-02 r68804) and release update to GitHub.
---------------------------------------------------------------------------------
Date   : 2015-09-07
o RELEASE 0.5.9
- Replaced plotting function plot_scatter(...) by S3-generic `plot` function.
- Corrected all plotting functions for the case of a NULL graphical device.
- Cross-validated estimates of box endpoint quantities of interest now contains 
  sample size for all peeling steps.
- Minor updates and corrections in the outputs of S3-generic functions.
- Minor updates and corrections in the documentation file and manual.
- Built and tested under R-devel (2015-08-02 r68804) and release update to GitHub.
---------------------------------------------------------------------------------
Date   : 2015-09-15
o RELEASE 0.6.0
- The matrix of original dataset is now returned by the main function sbh() 
  and not the submatrix of pre-selected covariates only.
- Corrected bugs in the output of main function sbh(): 
  . the returned vectors of `pre-selected` and `used` covariates are now in reference 
    to the original index of variables.
  . the value of traces and rules are now matched accordingly.
  . plot_boxtraj() and plot_boxtrace() are now corrected accordingly.  
- The value of `object$cvfit$cv.trace` of the `PRSP` object that is returned 
  by the main function sbh() now only contains the vector of the modal trace values 
  of covariate usage at each step.
- Updated S3-generic 'summary' and 'print' functions.
- Minor updates and corrections in the documentation file and manual.
- Built and tested under R-devel (2015-09-14 r69384) and release update to GitHub.
---------------------------------------------------------------------------------
Date   : 2015-10-11
o RELEASE 0.6.2
- Rename example datasets #4 and #5 into #1b and #4, respectively, 
  for consistency with companion article.
- Added argument `decimals` to main function sbh() to output results in
  user-specified significant decimals.
- Added examples for all S3-generic functions.
- Corrected output of decision rules in S3-generic `print` function in case `vs=TRUE`. 
- Renamed results 'varsign`, `selected` and `used` to 'CV.sign`, `CV.selected` and `CV.used` and
  moved them to `cvfit` field of return `PRSP` object.
- Minor improvement in output plot axes names of plot_boxtrace() function.  
- Updates of corresponding modifications in the documentation file and manual.
- Built and tested under R-devel (2015-09-14 r69384) and release update to GitHub.
---------------------------------------------------------------------------------
Date   : 2015-11-16
o RELEASE 0.6.3
- Changed random splitting in the cross-validation step to random stratified splitting 
  with/by conservation of events.
- Changed default values of metaparameters `alpha` to 0.05 (instead of 0.10) 
                                            `minn` to 5    (instead of 10).
- Modified computation of replicated cross-validated maximal peeling length in order to avoid 
  getting below the minimal box support threshold (i.e. the greater of `beta*n` or `minn`) 
  that could occur when combining results from the cross-validation loops and replicates. 
- Corrected behaviors in case `n` is less than `minn` and `n` is equal to `minn`.
- Corrected minor errors in list2array() and list2mat() internal functions.
- Corrected minor errors in plot() and predict() S3-generic functions.
- Updates in the manual file, including added explanation about the outputs of 
  averaged covariate traces, box membership indicators and box decision rules.  
- Updates in the CITATION file.  
- Built and tested under R-devel (2015-11-04 r69597) and release update to GitHub.
---------------------------------------------------------------------------------

> citation("PRIMsrc")

To cite PRIMsrc in publications use:

  Dazard J-E. and Rao J.S. (2010).  Local Sparse Bump Hunting.  J. Comp
  Graph. Statistics, 19(4):900-92.

  Diaz-Pachon D.A., Rao J.S. and Dazard J-E. (2013).  Optimization of
  PRIM under Normality.  In SCo Proceedings, Complex Data Modeling and
  Computationally Intensive Statistical Methods for Estimation and
  Prediction. Milan, Italy.

  Diaz-Pachon D.A., Rao J.S and Dazard J-E. (2015).  On the Explanatory
  Power of Principal Components.  (submitted).

  Diaz-Pachon D.A., Dazard J-E. and Rao J.S. (2015).  Unsupervised Bump
  Hunting Using Principal Components.  (submitted).

  Dazard J-E., Choe M., LeBlanc M. and Rao J.S. (2014).
  Cross-Validation of Survival Bump Hunting by Recursive Peeling
  Methods.  In JSM Proceedings, Survival Methods for Risk
  Estimation/Prediction Section. Boston, MA, USA.  American Statistical
  Association-IMS, p. 3366-3380.

  Dazard J-E., Choe M., LeBlanc M. and Rao J.S. (2015).
  Cross-validation and Peeling Strategies for Survival Bump Hunting
  using Recursive Peeling Methods.  Statistical Analysis and Data
  Mining, x(x):xxx-xxx.

  Dazard J-E., Choe M., LeBlanc M. and Rao J.S. (2015).  R package
  PRIMsrc: Bump Hunting by Patient Rule Induction Method for Survival,
  Regression and Classification.  In JSM Proceedings, Section for
  Statistical Programmers and Analysts Section. Seattle, WA, USA.
  American Statistical Association-IMS, p. xxxx-xxxx.

  Dazard J-E., Choe M., LeBlanc M. and Rao J.S. (2015).  PRIMsrc for
  Identification and Characterization of Informative Prognostic
  Subgroups by Survival Bump Hunting.  (submitted)

>     
> #===================================================
> # Demo with a synthetic dataset
> # Use help for descriptions
> #===================================================
> data("Synthetic.1", package="PRIMsrc")
> ?Synthetic.1
Synthetic.1              package:PRIMsrc               R Documentation

_S_y_n_t_h_e_t_i_c _D_a_t_a_s_e_t #_1: _p < _n _c_a_s_e

Description:

     Dataset from simulated regression survival model #1 as described
     in Dazard et al. (2015).  Here, the regression function uses all
     of the predictors, which are also part of the design matrix.
     Survival times were generated from an exponential model with rate
     parameter lambda (and mean 1/lambda) according to a Cox-PH model
     with hazard exp(eta), where eta(.) is the regression function.
     Censoring times were generated from a uniform distribution on
     [0, 3].  In this synthetic example, all covariates are continuous
     and i.i.d. from a multivariate uniform distribution on [0, 1].
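The simulation design described above can be sketched in R as follows. This is a minimal illustration, not the code used to build Synthetic.1: the regression function `eta()` (here a plain sum of the covariates) and the unit baseline hazard are assumptions, since the actual function is only given in Dazard et al. (2015), and `simulate_surv_data()` is a hypothetical helper.

```r
# Hypothetical sketch of a survival design like the one described for
# Synthetic.1. ASSUMPTIONS: eta(x) = x1 + x2 + x3 and baseline hazard = 1,
# so the exponential rate is lambda = exp(eta).
simulate_surv_data <- function(n = 250, p = 3, cens_max = 3, seed = 123) {
  set.seed(seed)
  # Covariates: i.i.d. uniform on [0, 1]
  X <- matrix(runif(n * p), nrow = n, ncol = p,
              dimnames = list(NULL, paste0("X", seq_len(p))))
  # Hypothetical linear predictor using all of the covariates
  eta <- drop(X %*% rep(1, p))
  # Exponential event times with rate lambda = exp(eta), i.e. mean 1/lambda
  event_time <- rexp(n, rate = exp(eta))
  # Censoring times uniform on [0, cens_max]
  cens_time <- runif(n, min = 0, max = cens_max)
  # Observed (possibly censored) time and event indicator (1 = event)
  time <- pmin(event_time, cens_time)
  status <- as.numeric(event_time <= cens_time)
  # sbh() expects time and status in the first two columns, covariates after
  cbind(time = time, status = status, X)
}

sim <- simulate_surv_data()
dim(sim)  # 250 rows, 5 columns: time, status, X1, X2, X3
```

Putting time and status first mirrors the column layout that `sbh()` requires for its `dataset` argument.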

Usage:

     Synthetic.1
     
Format:

     Each dataset consists of a 'numeric' 'matrix' containing n=250
     observations (samples) by rows and p=3 variables by columns, not
     including the censoring indicator and (censored) time-to-event
     variables.  It comes as a compressed Rda data file.
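Because `sbh()` relies on a fixed column layout (time, then status, then covariates), a quick check before fitting can catch mis-ordered inputs. The helper below is hypothetical and is not part of PRIMsrc; it only checks the layout described in the `sbh()` documentation and coerces a data.frame to a numeric matrix.

```r
# Hypothetical helper (not part of PRIMsrc): checks that a candidate input
# follows the layout expected by sbh() and returns it as a numeric matrix.
check_sbh_layout <- function(dataset) {
  m <- as.matrix(dataset)  # a data.frame becomes a matrix
  if (!is.numeric(m)) stop("dataset must be coercible to a numeric matrix")
  if (ncol(m) < 3) stop("need time, status and at least one covariate")
  if (any(m[, 1] < 0, na.rm = TRUE)) stop("column 1 (time) must be non-negative")
  if (!all(m[, 2] %in% c(0, 1))) stop("column 2 (status) must be coded 0/1")
  m
}

# Usage on a toy data.frame: time, status, one covariate
df <- data.frame(time = c(1.2, 0.4, 2.5), status = c(1, 0, 1),
                 X1 = c(0.1, 0.7, 0.3))
m <- check_sbh_layout(df)
dim(m)  # 3 rows, 3 columns
```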

Author(s):

        * "Jean-Eudes Dazard, Ph.D." <email: jxd101@case.edu>

        * "Michael Choe, M.D." <email: mjc206@case.edu>

        * "Michael LeBlanc, Ph.D." <email: mleblanc@fhcrc.org>

        * "Alberto Santana, MBA." <email: ahs4@case.edu>

     Maintainer: "Jean-Eudes Dazard, Ph.D." <email: jxd101@case.edu>

     Acknowledgments: This project was partially funded by the National
     Institutes of Health NIH - National Cancer Institute
     (R01-CA160593) to J-E. Dazard and J.S. Rao.

Source:

     See simulated survival model #1 in Dazard et al., 2015.

References:

        * Dazard J-E., Choe M., LeBlanc M. and Rao J.S. (2015).
          "_Cross-validation and Peeling Strategies for Survival Bump
          Hunting using Recursive Peeling Methods._" Statistical
          Analysis and Data Mining (in press).

        * Dazard J-E., Choe M., LeBlanc M. and Rao J.S. (2014).
          "_Cross-Validation of Survival Bump Hunting by Recursive
          Peeling Methods._" In JSM Proceedings, Survival Methods for
          Risk Estimation/Prediction Section. Boston, MA, USA.
          American Statistical Association IMS - JSM, p. 3366-3380.

        * Dazard J-E., Choe M., LeBlanc M. and Rao J.S. (2015).  "_R
          package PRIMsrc: Bump Hunting by Patient Rule Induction
          Method for Survival, Regression and Classification._" In JSM
          Proceedings, Statistical Programmers and Analysts Section.
          Seattle, WA, USA.  American Statistical Association IMS -
          JSM, (in press).

        * Dazard J-E. and J.S. Rao (2010).  "_Local Sparse Bump
          Hunting._" Journal of Computational and Graphical
          Statistics, 19(4):900-92.


> 
> #===================================================
> # Simulated dataset #1 (n=250, p=3)
> # Non-Replicated Combined Cross-Validation (RCCV)
> # Peeling criterion = LRT
> # Optimization criterion = LRT
> # Without parallelization
> # Without computation of permutation p-values
> #===================================================
> CVCOMB.synt1 <- sbh(dataset = Synthetic.1, 
+                     cvtype = "combined", cvcriterion = "lrt",
+                     B = 1, K = 5, 
+                     vs = TRUE, cpv = FALSE, 
+                     decimals = 2, probval = 0.5, 
+                     arg = "beta=0.05,
+                            alpha=0.05,
+                            minn=5,
+                            L=NULL,
+                            peelcriterion=\"lr\"",
+                     parallel = FALSE, conf = NULL, seed = 123)

Survival dataset provided.

Requested single 5-fold cross-validation without replications 
Cross-validation technique:  COMBINED 
Cross-validation criterion:  LRT 
Variable pre-selection: TRUE 
Computation of permutation p-values: FALSE 
Peeling criterion:  LRT 
Parallelization: FALSE 

Pre-selection of covariates and determination of directions of peeling... 
Pre-selected covariates:
X1 X2 X3 
 1  2  3 
Directions of peeling at each step of pre-selected covariates:
X1 X2 X3 
 1 -1 -1 
Fitting and cross-validating the Survival Bump Hunting model using the PRSP algorithm ... 
replicate : 1
seed : 123
Fold : 1
Fold : 2
Fold : 3
Fold : 4
Fold : 5
Success! 1 (replicated) cross-validation(s) has(ve) completed 
Generating cross-validated optimal peeling lengths from all replicates ...
Generating cross-validated box memberships at each step ...
Generating cross-validated box rules for the pre-selected covariates at each step ...
Generating cross-validated modal trace values of covariate usage at each step ...
Covariates used for peeling at each step, based on covariate trace modal values:
X1 X2 X3 
 1  2  3 
Generating cross-validated box statistics at each step ...
Finished!
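The log above reports one replicate (seed 123) split into five folds. The sketch below illustrates a balanced random assignment of n observations to K folds for one such replicate. It is a generic K-fold split written for illustration; `make_folds()` is hypothetical and is not PRIMsrc's internal splitting routine, which may differ (for example, in how it balances censoring across folds).

```r
# Minimal sketch of a random K-fold split for one cross-validation replicate.
# NOT PRIMsrc's internal code; make_folds() is a hypothetical helper.
make_folds <- function(n, K, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Balanced fold labels 1..K recycled to length n, then shuffled
  fold_id <- sample(rep_len(seq_len(K), n))
  # List of K index vectors; fold k is the test set at step k
  split(seq_len(n), fold_id)
}

folds <- make_folds(n = 250, K = 5, seed = 123)
lengths(folds)  # 50 observations per fold

# At step k, fit on the other K - 1 folds and evaluate on fold k
for (k in seq_along(folds)) {
  test_idx  <- folds[[k]]
  train_idx <- setdiff(seq_len(250), test_idx)
}
```

With `K = n` every fold holds a single observation, which corresponds to the leave-one-out case listed in the `K` argument description.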
> 
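The `arg` value is a single character string that itself contains a quoted value (`peelcriterion="lr"`), so the inner double quotes must be escaped or the outer quotes must be single quotes. The sketch below shows the single-quote form and checks that the string is valid R argument syntax by parsing it into a list. How `sbh()` parses `arg` internally is an assumption here; the `eval(parse(...))` step is only an illustration.

```r
# Single outer quotes avoid escaping the inner double quotes around "lr"
arg <- 'beta=0.05,alpha=0.05,minn=5,L=NULL,peelcriterion="lr"'

# Same string written with escaped double quotes
arg2 <- "beta=0.05,alpha=0.05,minn=5,L=NULL,peelcriterion=\"lr\""
identical(arg, arg2)  # TRUE

# Illustration only (ASSUMED parsing approach): read the string as the
# argument list of a call to list()
params <- eval(parse(text = paste0("list(", arg, ")")))
str(params)
```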
> ## Not run: 
> ##D     #===================================================
> ##D     # Examples of parallel backend parametrization 
> ##D     #===================================================
> ##D     # Example #1 - 1-Quad (4-core double threaded) PC 
> ##D     # Running WINDOWS
> ##D     # With SOCKET communication
> ##D     #===================================================
> ##D     if (.Platform$OS.type == "windows") {
> ##D         cpus <- parallel::detectCores()
> ##D         conf <- list("names" = rep("localhost", cpus),
> ##D                      "cpus" = cpus,
> ##D                      "type" = "SOCK",
> ##D                      "homo" = TRUE,
> ##D                      "verbose" = TRUE,
> ##D                      "outfile" = "")
> ##D     }
> ##D     #===================================================
> ##D     # Example #2 - 1 master node + 3 worker nodes cluster
> ##D     # All nodes equipped with identical setups and multicores
> ##D     # Running LINUX
> ##D     # With SOCKET communication
> ##D     #===================================================
> ##D     if (.Platform$OS.type == "unix") {
> ##D         masterhost <- Sys.getenv("HOSTNAME")
> ##D         slavehosts <- c("compute-0-0", "compute-0-1", "compute-0-2")
> ##D         nodes <- length(slavehosts) + 1
> ##D         cpus <- 8
> ##D         conf <- list("names" = c(rep(masterhost, cpus),
> ##D                                  rep(slavehosts, cpus)),
> ##D                      "cpus" = nodes * cpus,
> ##D                      "type" = "SOCK",
> ##D                      "homo" = TRUE,
> ##D                      "verbose" = TRUE,
> ##D                      "outfile" = "")
> ##D     }
> ##D     #===================================================
> ##D     # Example #3 - Multinode multicore per node cluster
> ##D     # Running LINUX 
> ##D     # with MPI communication
> ##D     # Here, a file named ".nodes" (e.g. in the home directory)
> ##D     # contains the list of nodes of the cluster
> ##D     #===================================================
> ##D     if (.Platform$OS.type == "unix") {
> ##D         hosts <- scan(file=paste(Sys.getenv("HOME"), "/.nodes", sep=""), 
> ##D                       what="", 
> ##D                       sep="\n")
> ##D         hostnames <- unique(hosts)
> ##D         nodes <- length(hostnames)
> ##D         cpus <-  length(hosts)/length(hostnames)
> ##D         conf <- list("cpus" = nodes * cpus,
> ##D                      "type" = "MPI",
> ##D                      "homo" = TRUE,
> ##D                      "verbose" = TRUE,
> ##D                      "outfile" = "")
> ##D     }
> ##D     #===================================================
> ##D     # Simulated dataset #1 (n=250, p=3)
> ##D     # Replicated Combined Cross-Validation (RCCV)
> ##D     # Peeling criterion = LRT
> ##D     # Optimization criterion = LRT
> ##D     # With parallelization
> ##D     # With computation of permutation p-values
> ##D     #===================================================
> ##D     CVCOMBREP.synt1 <- sbh(dataset = Synthetic.1, 
> ##D                            cvtype = "combined", cvcriterion = "lrt",
> ##D                            B = 10, K = 5, A = 1024, 
> ##D                            vs = TRUE, cpv = TRUE, 
> ##D                            decimals = 2, probval = 0.5, 
> ##D                            arg = "beta=0.05,
> ##D                                   alpha=0.05,
> ##D                                   minn=5,
> ##D                                   L=NULL,
> ##D                                   peelcriterion=\"lr\"",
> ##D                            parallel = TRUE, conf = conf, seed = 123)
> ## End(Not run)
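In Example #2 above, the `conf` list must list one host name per worker process, so `length(conf$names)` should equal `conf$cpus`. The hypothetical helper below builds the same kind of SOCKET configuration for a master plus worker hosts that share an identical CPU count; it uses `rep(..., each = cpus)` to group entries by host, whereas the example above interleaves them, and both yield the same set of host entries. The host names are placeholders.

```r
# Hypothetical helper mirroring Example #2: one master plus worker hosts,
# each contributing the same number of CPUs, with SOCKET communication.
make_sock_conf <- function(masterhost, workerhosts, cpus) {
  hosts <- c(masterhost, workerhosts)
  list("names"   = rep(hosts, each = cpus),  # one entry per worker process
       "cpus"    = length(hosts) * cpus,     # total number of processes
       "type"    = "SOCK",
       "homo"    = TRUE,
       "verbose" = TRUE,
       "outfile" = "")
}

conf <- make_sock_conf("master",
                       c("compute-0-0", "compute-0-1", "compute-0-2"),
                       cpus = 8)
length(conf$names) == conf$cpus  # TRUE: 32 entries for 32 processes
```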
>
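Setting `cpv = TRUE` with `A` permutations requests permutation p-values. The sketch below illustrates the general idea for a log-rank statistic comparing in-box versus out-of-box samples: box membership labels are permuted `A` times and the observed statistic is compared with the permutation distribution. The use of `survival::survdiff()`, the `(1 + count) / (A + 1)` estimator, and the helper name `perm_logrank_pvalue()` are assumptions for illustration; this is not necessarily what `sbh()` computes.

```r
library(survival)

# Hypothetical sketch: permutation p-value of the log-rank statistic for
# in-box vs out-of-box samples (ASSUMED estimator: (1 + count) / (A + 1)).
perm_logrank_pvalue <- function(time, status, inbox, A = 1000, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Log-rank chi-square statistic for a two-group split g
  lr_stat <- function(g) survdiff(Surv(time, status) ~ factor(g))$chisq
  observed <- lr_stat(inbox)
  # Permute box membership labels A times and recompute the statistic
  null_stats <- replicate(A, lr_stat(sample(inbox)))
  (1 + sum(null_stats >= observed)) / (A + 1)
}

# Usage on simulated data where the "box" is x > 0.5
set.seed(1)
n <- 100
x <- runif(n)
time <- rexp(n, rate = exp(2 * x))
status <- rbinom(n, 1, 0.8)
inbox <- x > 0.5
p <- perm_logrank_pvalue(time, status, inbox, A = 199, seed = 2)
p
```

Adding 1 to both numerator and denominator keeps the estimate strictly positive, so the smallest reportable p-value is 1/(A + 1).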