Last data update: 2014.03.03

bootstrap_inference {PTE}        R Documentation

Bootstrap inference for prespecified models


Description

Runs B bootstrap samples using a prespecified model, then computes the two I estimates via cross-validation. The p-values of the two I estimates are computed for a given null hypothesis H_0: mu_I_0 = mu_0, and confidence intervals are provided.
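The resampling step can be pictured as a standard nonparametric bootstrap: draw n rows with replacement, recompute the statistic, and repeat B times. The following is a minimal illustration, not the PTE code. `estimate_I` and `bootstrap_estimates` are hypothetical names, and the placeholder statistic (a mean) only stands in for the cross-validated I estimate.

```r
# Minimal nonparametric bootstrap loop (illustrative; not the PTE internals).
# estimate_I() is a HYPOTHETICAL placeholder for the cross-validated I estimate.
estimate_I = function(X, y) {
  mean(y)  # placeholder statistic only
}

bootstrap_estimates = function(X, y, B) {
  n = nrow(X)
  vapply(seq_len(B), function(b) {
    idx = sample.int(n, n, replace = TRUE)  # resample rows with replacement
    estimate_I(X[idx, , drop = FALSE], y[idx])
  }, numeric(1))
}

# Usage on simulated data
set.seed(1)
X = data.frame(treatment = rep(0:1, 25), x = rnorm(50))
y = rnorm(50)
boot = bootstrap_estimates(X, y, B = 200)
```

Resampling whole rows keeps each subject's covariates, treatment and response together, which is what a pairs bootstrap of a fitted model requires.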


Usage

bootstrap_inference(X, y, model_string,
        predict_string = "predict(mod, obs_left_out)",
        cleanup_mod_function = NA,
        y_higher_is_better = TRUE,
        verbose = TRUE,
        full_verbose = FALSE,
        H_0_mu_equals = 0,
        pct_leave_out = 0.10,
        B = 3000,
        alpha = 0.05,
        plot = TRUE,
        num_cores = 1,
        ...)



Arguments

X: An n x p data frame of covariates.


y: An n-length numeric vector of responses.


model_string: A string of R code that is evaluated to fit the model on the training portion of each cross-validation split. Refer to the training data as Xyleft; as in the example below, the response can be referenced within it as y.


predict_string: A string of R code that is evaluated on the left-out data after the model has been fit to the training data. Refer to the left-out (forecast) data as obs_left_out and to the fitted model as mod.
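To illustrate how these two strings are interpreted, here is a hypothetical helper that evaluates model_string and predict_string for a single train/test split with eval(parse(text = ...)). It assumes, as the example suggests, that Xyleft holds the training covariates together with a response column named y. The helper name and the way Xyleft is built are illustrative assumptions, not the package internals.

```r
# HYPOTHETICAL helper: evaluate model_string and predict_string for one split.
# The names Xyleft, mod and obs_left_out follow the argument descriptions.
fit_and_predict_one_split = function(X, y, test_idx, model_string,
                                     predict_string = "predict(mod, obs_left_out)") {
  Xyleft = cbind(X[-test_idx, , drop = FALSE], y = y[-test_idx])  # training rows plus response
  obs_left_out = X[test_idx, , drop = FALSE]                       # held-out covariates
  mod = eval(parse(text = model_string))  # fit; Xyleft is visible in this frame
  eval(parse(text = predict_string))      # predict; mod and obs_left_out are visible
}

# Usage: fit on 36 rows, predict the 4 left-out rows
set.seed(2)
n = 40
X = data.frame(treatment = rep(0:1, n / 2), x = rnorm(n))
y = 1 - X$x + X$treatment * 2 * X$x + rnorm(n)
preds = fit_and_predict_one_split(X, y, test_idx = 1:4,
  model_string = "lm(y ~ . + treatment * ., data = Xyleft)")
```

Because lm() looks up y in its data argument first, the formula picks up the training response stored in Xyleft rather than the full-length y vector.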


cleanup_mod_function: A function called at the end of each cross-validation iteration to clean up the fitted model. Default is NA.


y_higher_is_better: TRUE if a higher response value is clinically "better" than a lower one (e.g., cognitive ability in a drug trial for patients with mental illness). FALSE if a lower response value is clinically "better" than a higher one (e.g., body weight in a weight-loss trial). Default is TRUE.
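One common way to use such a flag is a model-based allocation rule: compare each subject's predicted response under treatment and under control, and recommend the arm whose prediction is better in the clinically relevant direction. The sketch below assumes a binary treatment coded 0/1 and uses a hypothetical function name. It illustrates the idea rather than PTE's code.

```r
# HYPOTHETICAL sketch: recommend the arm with the better predicted response.
# yhat_treat / yhat_control are predicted responses under treatment = 1 / 0.
# Ties are assigned to control (0).
allocate = function(yhat_treat, yhat_control, y_higher_is_better = TRUE) {
  if (y_higher_is_better) {
    as.integer(yhat_treat > yhat_control)  # 1 = recommend treatment
  } else {
    as.integer(yhat_treat < yhat_control)  # lower response is better
  }
}

allocate(c(2, 0.5, 3), c(1, 1, 3))                               # returns c(1L, 0L, 0L)
allocate(c(2, 0.5, 3), c(1, 1, 3), y_higher_is_better = FALSE)   # returns c(0L, 1L, 0L)
```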


verbose: If TRUE, prints a dot for each bootstrap sample. This works only on some platforms. Default is TRUE.


full_verbose: If TRUE, prints full information about each cross-validation model for each bootstrap sample. This works only on some platforms. Default is FALSE.


H_0_mu_equals: The null value mu_0 in H_0: mu_I_0 = mu_0. The default, 0, addresses the question: does my allocation procedure do better than a naive allocation procedure?
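This page does not state how the p-values are formed. One simple construction, assumed here purely for illustration, is a percentile-style one-sided bootstrap p-value for H_0: mu_I_0 = mu_0 against the alternative mu_I_0 > mu_0: the fraction of bootstrap estimates at or below mu_0. The function name is hypothetical, and PTE may compute its p-values differently.

```r
# ASSUMED percentile-style one-sided bootstrap p-value (not PTE's documented
# formula): H_0: mu = mu_0 versus H_a: mu > mu_0.
bootstrap_p_value = function(boot_estimates, mu_0 = 0) {
  mean(boot_estimates <= mu_0)  # share of bootstrap estimates not above mu_0
}

bootstrap_p_value(c(-0.2, 0.1, 0.4, 0.8, 1.1), mu_0 = 0)  # returns 0.2
```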


pct_leave_out: The proportion of the original dataset left out in each cross-validation split to estimate out-of-sample metrics. The default, 0.10, corresponds to 10-fold cross-validation.
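The link between pct_leave_out and the number of folds can be sketched as K = round(1 / pct_leave_out), with rows assigned at random to K folds of (nearly) equal size so that each fold is left out once. The helper below is hypothetical and shows generic K-fold splitting, not necessarily how PTE partitions the data.

```r
# HYPOTHETICAL helper: split row indices 1..n into K = round(1 / pct_leave_out)
# folds of (nearly) equal size.
make_folds = function(n, pct_leave_out = 0.10) {
  K = round(1 / pct_leave_out)
  fold_id = sample(rep_len(seq_len(K), n))  # balanced labels, randomly permuted
  split(seq_len(n), fold_id)                # list of K index vectors
}

set.seed(3)
folds = make_folds(50, 0.10)
lengths(folds)  # each of the 10 folds holds 5 rows
```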


B: The number of bootstrap samples to take. We recommend making this as large as computation time allows. Default is 3000.


alpha: Sets the confidence level of the intervals to 1 - alpha. Default is 0.05.
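A common way to turn B bootstrap estimates into a 1 - alpha interval is the percentile method, which takes the alpha/2 and 1 - alpha/2 sample quantiles. Whether PTE uses this or another construction is not stated on this page, so the sketch below, with a hypothetical function name, is only illustrative.

```r
# Percentile bootstrap confidence interval at level 1 - alpha
# (one common construction; PTE may use a different one).
percentile_ci = function(boot_estimates, alpha = 0.05) {
  quantile(boot_estimates, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Usage with stand-in bootstrap draws centred at 1 with spread 0.5
set.seed(4)
boot = rnorm(3000, mean = 1, sd = 0.5)
ci = percentile_ci(boot, alpha = 0.05)  # close to 1 -/+ 1.96 * 0.5
```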


plot: If TRUE, displays the estimate, the bootstrap samples and the confidence intervals on a histogram. Default is TRUE.


num_cores: The number of cores used to run the bootstrap samples in parallel. Default is 1, i.e., serial execution.
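Bootstrap replicates are independent, so they parallelize naturally. The sketch below spreads them over cores with parallel::mclapply from base R; it assumes nothing about how PTE itself parallelizes, and `one_replicate` and `run_bootstrap` are hypothetical names. Note that mclapply relies on forking, so values of mc.cores greater than 1 are not supported on Windows.

```r
# Sketch: distribute B bootstrap replicates over num_cores with mclapply.
# one_replicate() is a HYPOTHETICAL placeholder statistic.
library(parallel)

one_replicate = function(b, y) {
  mean(sample(y, replace = TRUE))  # placeholder bootstrap statistic
}

run_bootstrap = function(y, B, num_cores = 1) {
  # with mc.cores = 1, mclapply runs serially
  unlist(mclapply(seq_len(B), one_replicate, y = y, mc.cores = num_cores))
}

set.seed(5)
boot = run_bootstrap(rnorm(100), B = 500, num_cores = 1)
```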


...: Additional arguments passed to the model constructor. To pass them, "..." must appear in model_string.
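The reason "..." has to appear in model_string is that code parsed from a string and evaluated inside a function can see that function's own `...`. The hypothetical wrapper below demonstrates the mechanism with lm(); it is an assumption about the general technique, not PTE's code. Here model = FALSE is an ordinary lm() argument telling it not to keep the model frame.

```r
# HYPOTHETICAL wrapper: "..." inside the evaluated string refers to this
# function's ... and therefore forwards any extra arguments.
eval_model_string = function(model_string, Xyleft, ...) {
  eval(parse(text = model_string))
}

set.seed(6)
Xyleft = data.frame(x = rnorm(20))
Xyleft$y = 2 * Xyleft$x + rnorm(20)
# model = FALSE reaches lm() through ..., so the fit omits its model frame
mod = eval_model_string("lm(y ~ x, data = Xyleft, ...)", Xyleft, model = FALSE)
```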


Value

Returns a list containing the results of the procedure.


Author(s)

Adam Kapelner and Justin Bleich


References

Kapelner, A., Bleich, J., Cohen, Z. D., DeRubeis, R. J. and Berk, R. (2014). Inference for Treatment Regime Models in Personalized Medicine. arXiv.


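Examples

library(PTE)

# Simulation parameters: linear model with a treatment-by-covariate interaction
beta0 = 1
beta1 = -1
gamma0 = 0
gamma1 = sqrt(2 * pi)
mu_x = 0
sigsq_x = 1
sigsq_e = 1
num_boot = 20 # for speed only
n = 50 # for speed only

# rnorm() takes a standard deviation, so pass the square root of each variance
x = sort(rnorm(n, mu_x, sqrt(sigsq_x)))
noise = rnorm(n, 0, sqrt(sigsq_e))
treatment = sample(c(rep(1, n / 2), rep(0, n / 2))) # balanced random assignment
y = beta0 + beta1 * x + treatment * (gamma0 + gamma1 * x) + noise
X = data.frame(treatment, x)

# The third argument is model_string; the training data are referred to as Xyleft
res = bootstrap_inference(X, y,
        "lm(y ~ . + treatment * ., data = Xyleft)",
        num_cores = 1,
        B = num_boot)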
