The package semiArtificial contains methods to generate and evaluate semi-artificial data sets. Its data generators take a data set as input, learn its properties using machine learning algorithms, and generate new data with the same properties.
Given a formula and data, the method treeEnsemble builds a tree ensemble and turns it into a data generator, which can be used with the newdata method to generate semi-artificial data. The method supports classification, regression, and unsupervised data, depending on the input and parameters. The method indAttrGen generates data from the same distribution as the input data, but assumes conditionally independent attributes.
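The idea of generating data under an attribute-independence assumption can be sketched in a few lines. The Python sketch below is an illustration only and is not the package's implementation: it assumes that "conditionally independent" means independent given the class label, and it resamples observed values instead of estimating a distribution for each attribute. The function name independent_attribute_sample and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def independent_attribute_sample(rows, class_index, size, seed=0):
    """Draw `size` new rows; within each class every attribute is sampled on its own.

    Assumption for illustration: independence is conditional on the class label,
    and each attribute is resampled from the values observed in that class.
    """
    rng = random.Random(seed)

    # Group the original rows by their class label.
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[class_index]].append(row)

    # Sampling labels from this list preserves the original class proportions.
    labels = [row[class_index] for row in rows]

    generated = []
    for _ in range(size):
        label = rng.choice(labels)
        pool = by_class[label]
        # Each attribute comes from a separately chosen row of the same class,
        # which breaks any dependence between attributes within the class.
        new_row = [rng.choice(pool)[j] for j in range(len(rows[0]))]
        new_row[class_index] = label
        generated.append(tuple(new_row))
    return generated


# Hypothetical toy data: (numeric attribute, nominal attribute, class label).
original = [
    (1.0, "a", "yes"), (1.2, "b", "yes"), (0.9, "a", "yes"),
    (3.1, "c", "no"), (2.8, "d", "no"), (3.3, "c", "no"),
]
synthetic = independent_attribute_sample(original, class_index=2, size=8)
```

Every generated row combines attribute values that occurred in its class, but the combination itself need not appear in the input data.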
Given a formula and data, the method builds an RBF network and extracts its properties, thereby preparing a data generator that can be used with the newdata.RBFgenerator method to generate semi-artificial data.
Depending on the type of problem (classification or regression), a classification performance measure (accuracy, AUC, brierScore, etc.) or a regression performance measure (RMSE, MSE, MAE, RMAE, etc.) computed on two data sets is used to assess how similar the two data sets are.
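One way to read this comparison is as a cross-evaluation: fit a model on each data set and score each model on both. The Python sketch below illustrates that reading for classification accuracy. It is an assumption about the procedure, not the package's code: the nearest-centroid classifier, the function names, and the result keys m1d1, m1d2, m2d1, m2d2 (model i evaluated on data set j) are all hypothetical choices made for the example.

```python
def fit_centroids(rows):
    """rows: list of (features, label). Returns {label: mean feature vector}."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((c - x) ** 2 for c, x in zip(centroid, features))
    return min(model, key=lambda label: sq_dist(model[label]))


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(predict(model, features) == label for features, label in rows)
    return hits / len(rows)


def compare(data1, data2):
    """Train one model per data set and evaluate each model on both data sets."""
    m1, m2 = fit_centroids(data1), fit_centroids(data2)
    return {
        "m1d1": accuracy(m1, data1), "m1d2": accuracy(m1, data2),
        "m2d1": accuracy(m2, data1), "m2d2": accuracy(m2, data2),
    }


# Hypothetical toy data: two well-separated classes in each data set.
original = [((0.0, 0.1), "a"), ((0.2, 0.0), "a"), ((1.0, 0.9), "b"), ((0.9, 1.1), "b")]
generated = [((0.1, 0.0), "a"), ((0.1, 0.2), "a"), ((1.1, 1.0), "b"), ((0.8, 1.0), "b")]
scores = compare(original, generated)
```

If the generated data resemble the original, a model trained on one set should perform about as well on the other set as on its own, so the four scores should be close to each other. For regression problems the same scheme would apply with an error measure such as RMSE in place of accuracy.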