Database ODBC channel identifier returned by a call to RODM_open_dbms_connection.
data_table_name
Database table/view containing the training dataset.
case_id_column_name
Column in data_table_name that uniquely identifies each row (case).
target_column_name
Target column name in data_table_name.
model_name
Name of the ODM model.
auto_data_prep
Whether or not ODM should invoke automatic data preparation for the build.
cost_matrix
User-specified cost matrix for the target classes.
gini_impurity_metric
Tree impurity metric: "IMPURITY_GINI" (default) or "IMPURITY_ENTROPY".
max_depth
Specifies the maximum depth of the tree, from root to leaf inclusive.
The default is 7.
minrec_split
Specifies the minimum number of cases required in a node in order
for a further split to be possible. Default is 20.
minpct_split
Specifies the minimum number of cases required in a node in order for
a further split to be possible, expressed as a percentage of all the rows
in the training data. The default is 1 (1 per cent).
minrec_node
Specifies the minimum number of cases required in a child node.
Default is 10.
minpct_node
Specifies the minimum number of cases required in a child node, expressed
as a percentage of all the rows in the training data. The default is 0.05 (0.05 per cent).
retrieve_outputs_to_R
Flag controlling whether the output results are moved to the R environment.
leave_model_in_dbms
Flag controlling whether the model is left in the RDBMS or deleted.
sql.log.file
File to which the log of all SQL calls made by this function is appended.
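The five tree-size arguments (max_depth, minrec_split, minpct_split, minrec_node, minpct_node) act as thresholds on depth and node size. The Python sketch below is not ODM's implementation; it shows one plausible way to combine them, assuming that the percentage thresholds are measured against the total number of training rows, that the root sits at depth 1 (depth counted "from root to leaf inclusive"), and that a split is allowed only when every threshold is met. The names `can_split`, `children_ok`, and `DEFAULTS` are hypothetical.

```python
# Hypothetical sketch of how the documented tree-size thresholds could gate splitting.
# Defaults mirror the documented values; this is NOT Oracle's internal code.
DEFAULTS = {
    "max_depth": 7,        # depth counted from root (depth 1) to leaf inclusive
    "minrec_split": 20,    # minimum cases in a node before it may be split
    "minpct_split": 1.0,   # same, as a per cent of all training rows (assumption)
    "minrec_node": 10,     # minimum cases in each child node
    "minpct_node": 0.05,   # same, as a per cent of all training rows (assumption)
}


def can_split(node_rows, depth, n_train, settings=DEFAULTS):
    """May a node at `depth` (root = 1) holding `node_rows` cases be split further?"""
    if depth >= settings["max_depth"]:        # children would exceed the maximum depth
        return False
    if node_rows < settings["minrec_split"]:  # too few cases in absolute terms
        return False
    if 100.0 * node_rows / n_train < settings["minpct_split"]:  # too few as a percentage
        return False
    return True


def children_ok(child_rows, n_train, settings=DEFAULTS):
    """Does every child produced by a candidate split meet the per-node minimums?"""
    return all(
        rows >= settings["minrec_node"]
        and 100.0 * rows / n_train >= settings["minpct_node"]
        for rows in child_rows
    )


print(can_split(150, 3, 10_000))        # True
print(can_split(80, 3, 10_000))         # False: 0.8 per cent is below minpct_split
print(can_split(150, 7, 10_000))        # False: node is already at max_depth
print(children_ok([140, 10], 10_000))   # True
print(children_ok([145, 5], 10_000))    # False: 5 cases is below minrec_node
```

With 10,000 training rows, the default minpct_split of 1 per cent is the binding constraint (100 rows) rather than minrec_split (20 rows); on small training sets the absolute counts dominate instead.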
Details
The Decision Tree algorithm produces accurate and interpretable models with relatively little user
intervention and can be used for both binary and multiclass classification problems. The algorithm
is fast at both build time and apply time, and the build process is parallelized.
Decision tree scoring is especially fast: the tree structure created during the model build is
applied as a series of simple tests, each based on a single predictor. Each test is a membership test:
either IN or NOT IN a list of values (categorical predictor), or LESS THAN OR EQUAL TO some value
(numeric predictor). The algorithm supports two homogeneity (impurity) metrics, Gini and entropy, for
calculating the splits; the gini_impurity_metric argument selects between them.
For more details on the algorithm implementation, parameter settings, and
characteristics of the ODM function itself, consult the following Oracle documents: ODM Concepts,
ODM Developer's Guide, Oracle SQL Packages: Data Mining, and Oracle Database SQL Language
Reference (Data Mining functions), listed in the references below.
Value
If retrieve_outputs_to_R is TRUE, returns a list with the following elements: