These functions describe how well a given set of new observations (e.g., new subjects) is represented in a dataset that was used to develop a predictive model. The dataRep function forms a data frame containing all the unique combinations of variable values that exist in a given set of variables. Cross-classifications are created using exact values of the variables, so for continuous numeric variables it is often necessary to round them to the nearest v, and possibly to curtail them to a lower and upper limit before rounding. Here v denotes a numeric constant specifying the matching tolerance to be used. The roundN function is provided for this rounding and curtailment.

dataRep also stores marginal distribution summaries for all the variables. For numeric variables, all 101 percentiles (0th through 100th) are stored, and for all variables the frequency distributions are also stored (frequencies are computed after any rounding and curtailment of numeric variables). A print method summarizes the calculations made by dataRep, and if long=TRUE it also prints all unique combinations of values and their frequencies in the original dataset.
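The idea described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: the helper round_to_nearest is a hypothetical stand-in for roundN, the data are made up, and the tolerance v = 10 and limits 20 and 80 are arbitrary assumptions chosen for illustration.

```r
# Hypothetical helper (not the package's roundN): curtail x to [lower, upper],
# then round to the nearest multiple of the tolerance v.
round_to_nearest <- function(x, v, lower = -Inf, upper = Inf) {
  x <- pmin(pmax(x, lower), upper)
  v * round(x / v)
}

# Made-up development data
dev <- data.frame(
  age = c(23, 27, 41, 58, 62, 95),
  sex = c("female", "male", "male", "female", "female", "male")
)
dev$age <- round_to_nearest(dev$age, v = 10, lower = 20, upper = 80)

# Unique combinations of values with their frequencies
combos <- aggregate(n ~ age + sex, data = cbind(dev, n = 1), FUN = sum)

# Marginal summaries: 101 percentiles for the numeric variable,
# and frequency tables for every variable (computed after rounding)
age_pct <- quantile(dev$age, probs = seq(0, 1, by = 0.01))
freqs <- lapply(dev, table)

# New observations: apply the same rounding, then count exact matches
new <- data.frame(age = c(44, 71), sex = c("male", "male"))
new$age <- round_to_nearest(new$age, v = 10, lower = 20, upper = 80)
matched <- merge(new, combos, by = c("age", "sex"), all.x = TRUE)
matched$n[is.na(matched$n)] <- 0  # combinations never seen in development data
```

In this sketch the first new observation rounds to a combination that occurs once in the development data, while the second rounds to a combination that never occurs, which is the kind of poorly represented case these functions are meant to flag.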
● Data Source: CranContrib
● Keywords: category, cluster, datasets, manip, models
● Alias: [.roundN, dataRep, predict.dataRep, print.dataRep, print.predict.dataRep, roundN
