Training data. It should be in the same format as the testing data
and contain one additional column (see gs below) specifying the known
cause of death. The first column is assumed to be the death ID.
gs
the name of the column in train that contains the cause of death.
gstable
The list of causes of death used in the training data.
thre
a numerical value between 0 and 1. It specifies the maximum missing rate a symptom
may have and still be considered in the model. The default value is 0.95, meaning
that if a symptom has more than 95% of its values missing in the training data, it
will be removed.
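The thre rule can be sketched as follows, assuming (as under isNumeric) that -1 marks a missing value. The helper drop_high_missing is hypothetical; it only illustrates that a symptom is dropped when its missing rate is strictly greater than thre.

```python
# Hypothetical sketch of the thre filter; -1 is assumed to mark a missing value.

def drop_high_missing(rows, symptoms, thre=0.95):
    """Keep only symptoms whose share of missing values is at most thre."""
    kept = []
    for s in symptoms:
        missing_rate = sum(1 for r in rows if r[s] == -1) / len(rows)
        if missing_rate <= thre:  # more than thre missing -> symptom removed
            kept.append(s)
    return kept


rows = [
    {"fever": 1, "rash": -1},
    {"fever": -1, "rash": -1},
    {"fever": 0, "rash": -1},
    {"fever": 1, "rash": -1},
]
print(drop_high_missing(rows, ["fever", "rash"]))            # ['fever']  (rash is 100% missing)
print(drop_high_missing(rows, ["fever", "rash"], thre=0.2))  # []  (fever is 25% missing)
```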
type
Three types of learning conditional probabilities are provided: “quantile”, “fixed”,
or “empirical”. Since InSilicoVA works with ranked conditional probabilities P(S|C), “quantile”
means the rankings of P(S|C) are obtained by matching the quantile distribution of the
default InterVA P(S|C), and “fixed” means each P(S|C) is matched to the closest value
in the default InterVA P(S|C) table. Empirically, both types of rankings produce similar results. The third option, “empirical”, means no rankings are calculated and only the raw P(S|C) values are returned.
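To contrast the two ranking types, here is a hedged sketch in Python. The level labels and numeric values in LEVEL_VALUES are illustrative placeholders, not the real InterVA table, and default_levels stands in for the levels found in a reference P(S|C) table. fixed_match assigns each raw value its numerically nearest level; quantile_match sorts the raw values and hands out levels so that each level's share matches its share in the reference table. Both functions are hypothetical.

```python
# Hypothetical sketch contrasting the "fixed" and "quantile" ideas.
# The level labels and values below are illustrative, NOT the real InterVA table.
LEVEL_VALUES = {"A": 0.5, "B": 0.1, "C": 0.01}  # highest to lowest
ORDER = ["A", "B", "C"]


def fixed_match(raw, level_values=LEVEL_VALUES):
    """'fixed': give each raw P(S|C) the level whose value is numerically closest."""
    return [min(level_values, key=lambda lv: abs(level_values[lv] - x)) for x in raw]


def quantile_match(raw, default_levels, order=ORDER):
    """'quantile': assign levels so their proportions among the raw values match
    their proportions in default_levels (the levels of a reference table).
    Assumes every label in default_levels appears in order."""
    n, total = len(raw), len(default_levels)
    ranked = sorted(range(n), key=lambda i: raw[i], reverse=True)  # largest first
    out = [None] * n
    cum = start = 0
    for lv in order:
        cum += default_levels.count(lv)
        end = round(n * cum / total)  # cumulative share of this level and all higher ones
        for i in ranked[start:end]:
            out[i] = lv
        start = end
    return out


raw = [0.9, 0.4, 0.05, 0.02]
print(fixed_match(raw))                           # ['A', 'A', 'C', 'C']
print(quantile_match(raw, ["A", "B", "B", "C"]))  # ['A', 'B', 'B', 'C']
```

The two rules can disagree on individual entries, as they do for 0.4 and 0.05 in this toy example: "fixed" looks only at each value's distance to the reference levels, while "quantile" looks only at its position in the sorted order.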
isNumeric
Indicator of whether the input is already in numeric form. If the
input is coded numerically, with 1 for “present”, 0 for “absent”,
and -1 for “missing”, this indicator can be set to TRUE to avoid
conversion to the standard InterVA format.
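The correspondence between character-coded and numerically coded symptoms can be sketched as below. The character codes "Y" (present), "" (absent) and "." (missing) are an assumption about the InterVA-style input rather than something stated on this page, and to_numeric and already_numeric are hypothetical helpers; the sketch does not claim to show the direction in which the package itself converts.

```python
# Hypothetical sketch. The character codes "Y" (present), "" (absent) and "." (missing)
# are an assumption about the InterVA-style input, not taken from this page.
CHAR_TO_NUMERIC = {"Y": 1, "": 0, ".": -1}


def to_numeric(values):
    """Map character-coded symptom values to the numeric coding 1 / 0 / -1."""
    return [CHAR_TO_NUMERIC[v] for v in values]


def already_numeric(values):
    """True if every value already uses the numeric coding, i.e. isNumeric could be TRUE."""
    return all(v in (1, 0, -1) for v in values)


col = ["Y", "", ".", "Y"]
print(already_numeric(col))              # False
print(to_numeric(col))                   # [1, 0, -1, 1]
print(already_numeric(to_numeric(col)))  # True
```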
Value
cond.prob
raw P(S|C) matrix
cond.prob.alpha
ranked P(S|C) matrix
table.alpha
list of ranks used
table.num
list of median numerical values for each rank
symps.train
training data after removing symptoms whose missing rate exceeds thre.
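One plausible reading of table.num, sketched in Python under an explicit assumption: that the numerical value for each rank is the median of the raw P(S|C) values assigned to that rank. The page only says "median numerical values for each rank", so the source of the values, and the helper rank_medians, are hypothetical.

```python
from statistics import median

# Hypothetical sketch: table.num read as the median of the raw P(S|C)
# values that were assigned to each rank (an assumption).


def rank_medians(raw, ranks, order):
    """Return {rank: median of raw values with that rank, or None if the rank is unused}."""
    out = {}
    for r in order:
        vals = [x for x, rk in zip(raw, ranks) if rk == r]
        out[r] = median(vals) if vals else None
    return out


raw = [0.9, 0.4, 0.05, 0.02, 0.6]
ranks = ["A", "B", "B", "C", "A"]
print(rank_medians(raw, ranks, ["A", "B", "C", "D"]))
```

Using the median rather than the mean keeps each rank's representative value robust to a few extreme raw probabilities within that rank.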