R2GUESS
R Documentation
Wrapper function that reads the input files and parameter values required by
GUESS, runs the C++ code from R and stores the main GUESS output in
an ESS object
Description
The R2GUESS function reads and compiles the data,
input files and parameters that are required to run the GUESS
source code. It automatically runs GUESS (with or without
GPU acceleration) and saves the results and summaries as text
files. For portability, R2GUESS generates an
ESS object that gathers information about the input and
parameters used to run GUESS, together with the outputs detailed in
as.ESS.object.
Arguments
dataY
either a one element character vector (such as
'dataY.txt') or a data frame. If dataY is entered as
a character vector, it specifies, assuming that data are in the
path.input folder, the location of the response
matrix. In the corresponding file observations are presented in
rows, and the (possibly multivariate) outcome(s) in columns. The
first two rows (single integers) represent the number of rows
(n) and columns (q) in the matrix. If a data frame
argument is passed, it holds the n x q numerical matrix
of observed responses.
dataX
either a one element character vector (such as
'dataX.txt') or a data frame. If dataX is entered as
a character vector, it specifies, assuming that data are in the
path.input folder, the location of the predictor
matrix. In the corresponding file observations are presented in
rows, and the predictors in columns. The first two rows (single
integers) represent the number of rows (n) and columns
(p) in the matrix. If a data frame argument is passed, it
holds the n x p numerical matrix of observed
predictors.
path.input
path linking to the directory containing the data
(dataX and dataY). If
dataX and/or dataY have
been entered as data frame(s), the function will generate the
corresponding text files required to run GUESS in the
path.input folder.
path.output
path indicating the directory in which output
files will be saved.
path.par
path indicating the directory in which to find the
parameter file needed to run GUESS.
path.init
path indicating the location of the file describing the
initial guess of the MCMC procedure (i.e. the variables to include in the initial
model).
file.par
name of the parameter file containing all
user-specified parameters required to set up the run and the
features of the moves. This file is located in path.par
and contains fields that are extensively described in
http://www.bgx.org.uk/software/GUESS_Doc_short.pdf. These
parameters are not mandatory; any that are not specified will be
set to their default values, which are also given in that documentation. An
example of this file is provided in the package.
file.init
name of the file specifying which
variables to include at the first iteration of the MCMC
run. The first row of the file is a single scalar
representing the number of rows (# variables to include).
Subsequent rows indicate the position of the covariates
to include. This file is optional and if not specified
(default=NULL), initial guesses of the MCMC algorithm
will be derived from a step-wise regression approach.
file.log
name of the log file. This file compiles in real time
summary information describing the initial parameters, the
computational time and state of the run. This file will also
contain information about moves sampled at each sweep. By default
(=NULL), the name is given by the argument
root.file.output extended by '_log' and for
computational efficiency (especially when GPU is enabled), a
minimal amount of information is returned.
nsweep
integer specifying the number of sweeps for
the MCMC run (including the burn-in).
burn.in
integer specifying the number of sweeps to
be discarded as burn-in.
Egam
numeric representing the 'a priori' average
model size.
Sgam
numeric representing the 'a priori' standard
deviation of the model size.
root.file.output
name specifying the file stem for writing the
output files in the directory specified by
path.output.
time
Boolean value. When time=TRUE (default value)
a file recording the time each sweep took will be
created and saved in path.output directory.
top
number of top models to be reported in the output. The
default value is 100.
history
Boolean value. When history=TRUE (default
value), a number of additional output files that record the
history of each move is provided. See section 5 of
http://www.bgx.org.uk/software/GUESS_Doc_short.pdf for more
details.
label.X
a character vector specifying the names of the
predictors. If not specified (=NULL), variables are labelled by
their position in the matrix. For SNP data, predictor names and
information are given in the MAP.file (field
SNPName).
label.Y
a character vector specifying the names of
the outcomes. If not specified (=NULL), the outcomes are
labelled Y1, ..., Yq, where q is the number of columns in the
outcome matrix, or are named after the argument
dataY (if specified by a data frame).
choice.Y
a character vector or a numeric vector specifying
which phenotypes in the response matrix dataY to analyse
in a joint model. By default, all phenotypes in the response
matrix will be considered.
nb.chain
an integer specifying the number of
chains to consider in the evolutionary procedure.
conf
either a one element character vector (such as
'conf.txt') or a data frame. If conf is entered as a
character vector, it specifies, assuming that data are in the
path.input folder, the location of the confounder
matrix. In the corresponding file observations are presented in
rows, and the values for the confounders in columns. The first two
rows (single integers) represent the number of rows (n) and
columns (k) in the matrix. If a data frame argument is
passed, it links to a nxk numerical matrix compiling the
observed confounders. If specified, the function will replace
the response matrix with the residuals from the linear model
regressing the outcomes on the confounders.
cuda
Boolean value. cuda=TRUE redirects linear algebra
operations to the GPU. On platforms that are not CULA compatible, this
option is ignored.
MAP.file
either a one element character vector or a data
frame. If a character vector is used, it specifies, assuming that data are in the
path.input folder, the location of the annotation
file. In the corresponding file, predictors are presented in
rows, and are described as a MAP.file. If a data frame
argument is passed, it links to a px3 matrix.
time.limit
a numerical value specifying the maximum computing
time (in hours) for the run. If the run exceeds that value, the
modelling options, parameter values, state of the pseudo-random
number generator, and state of each chain will be saved so that
the run can be resumed exactly where it was interrupted
(using the resume option). By default (=NULL) the run
continues until completion.
seed
an integer specifying the random seed used to initialise the
pseudo-random number generator. If not specified, the seed will
be initialised using the CPU clock.
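The text-file layout described for dataY, dataX and conf (number of rows on the first line, number of columns on the second, then the matrix itself) can be illustrated with a minimal sketch. The helpers write_guess_matrix and read_guess_matrix are hypothetical and not part of R2GUESS, and the use of a single space as the field separator is an assumption; the GUESS documentation should be checked for the exact format it expects.

```r
# Hypothetical helper (not part of R2GUESS): write a numeric matrix with
# the number of rows on line 1, the number of columns on line 2, then the data.
write_guess_matrix <- function(x, path) {
  x <- as.matrix(x)
  stopifnot(is.numeric(x))
  con <- file(path, open = "wt")
  on.exit(close(con))
  writeLines(as.character(nrow(x)), con)
  writeLines(as.character(ncol(x)), con)
  # Assumption: values on a row are separated by a single space.
  write.table(x, con, row.names = FALSE, col.names = FALSE, sep = " ")
}

# Hypothetical helper: read such a file back into a matrix.
read_guess_matrix <- function(path) {
  dims <- scan(path, what = integer(), n = 2, quiet = TRUE)
  values <- scan(path, skip = 2, quiet = TRUE)
  stopifnot(length(values) == prod(dims))
  matrix(values, nrow = dims[1], ncol = dims[2], byrow = TRUE)
}

# Toy response matrix: n = 3 observations, q = 2 outcomes.
Y <- matrix(c(1.5, 2, 3.25, 4, 5, 6.5), nrow = 3, ncol = 2)
tmp <- tempfile(fileext = ".txt")
write_guess_matrix(Y, tmp)
header <- readLines(tmp, n = 2)   # the two dimension lines
Y_back <- read_guess_matrix(tmp)  # round trip
```

When a data frame is passed to R2GUESS, the function generates these files itself, so a helper of this kind is only useful for preparing or inspecting files by hand.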
Details
For either of the dataX and dataY arguments, if a data
frame is passed, a text file labelled
data-*-C-CODE.txt will be created in the path.input
directory. If conf is specified, additional files
representing the adjusted responses will be created following the same
file labelling system. Each of these files is formatted to have the
structure expected by the C++ code: individuals presented
in rows and their observed values in columns, with the first two rows
indicating the number of rows and columns in the matrix. The
returned ESS object will include all result files produced by
the code. The number and type of outputs produced depend on the
running options chosen. A full description of the available
output can be found in
http://www.bgx.org.uk/software/GUESS_Doc_short.pdf
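As an illustration of the adjustment applied when conf is specified, the following sketch computes residuals from a linear model of the outcomes on the confounders, using simulated data. The variable names (age, sex, Y1, Y2) are hypothetical, and the use of lm with an intercept is an assumption about how such an adjustment could be done, not a description of the package internals.

```r
set.seed(1)
n <- 50

# Hypothetical confounders (n x k, with k = 2).
conf <- data.frame(age = rnorm(n), sex = rbinom(n, 1, 0.5))

# Hypothetical outcomes (n x q, with q = 2) that depend on the confounders.
Y <- cbind(Y1 = 2 * conf$age + rnorm(n),
           Y2 = -conf$sex + rnorm(n))

# Regress all outcomes on the confounders in one multivariate fit.
# Assumption: an intercept is included.
fit <- lm(Y ~ age + sex, data = conf)

# The adjusted responses are the residuals, one column per outcome.
Y_adj <- residuals(fit)
```

By construction, the columns of Y_adj are uncorrelated with each confounder in the sample, which is what makes them suitable as confounder-adjusted responses.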
Value
An object of class ESS containing information listed in
as.ESS.object. The object can subsequently be used to post-process the results using
provided R functions (such as summary.ESS,
plotMPPI, plot.ESS).
Examples
## Not run:
path.input <- system.file("Input", package="R2GUESS")
path.output <- tempdir()
path.par <- system.file("extdata", package="R2GUESS")
file.par.Hopx <- "Par_file_example_Hopx.xml"
## The parameter file can be inspected at the following location
print(file.path(path.par, file.par.Hopx))
## To reach convergence you may need to increase the run length,
## e.g. nsweep=110000 and burn.in=10000
## Running time is approximately 5 minutes
root.file.output.Hopx <- "Example-GUESS-Y-Hopx"
label.Y <- c("ADR","Fat","Heart","Kidney")
data(data.Y.Hopx)
data(data.X)
data(MAP.file)
modelY_Hopx <- R2GUESS(dataY=data.Y.Hopx, dataX=data.X, choice.Y=1:4,
  label.Y=label.Y, MAP.file=MAP.file, file.par=file.par.Hopx, file.init=NULL,
  file.log=NULL, root.file.output=root.file.output.Hopx, path.input=path.input,
  path.output=path.output, path.par=path.par, path.init=NULL, nsweep=11000,
  burn.in=1000, Egam=5, Sgam=5, top=100, history=TRUE, time=TRUE,
  nb.chain=3, conf=NULL, cuda=FALSE)
summary(modelY_Hopx,20) # 20 best models
print(modelY_Hopx)
## End(Not run)