character string representing the name of an
entry (case-insensitive).
pattern
regular expression to be matched to all fields of class
"character" in all entries.
default
optional default value for the field.
type
optional character string specifying the class to be
required for this field. If type is a character vector with more
than two elements, the entries will be used as fixed set of
alternatives. If type is not a character string or vector, the
class will be inferred from the argument given.
is_mandatory
logical specifying whether new entries are required
to have a value for this field.
is_modifiable
logical specifying whether entries can be changed
with respect to that field.
validity_FUN
optional function or character string with the name of a
function that checks the validity of a field entry. Such a function
gets the value to be investigated as argument, and should stop with an
error message if the value is not correct.
object
a registry object.
verbosity
character string controlling the verbosity of the output of the
summary method for the registry: "short" gives just a list, "long"
also gives the formulas.
...
for pr_DB$set_entry and pr_DB$modify_entry:
named list of fields to be modified in or added to the registry (see details).
This must include the index field ("names").
Details
pr_DB represents the registry of all proximity measures
available. For each
measure, it comprises meta-information that can be queried and
extended. Also, new measures can be added. This is done using
the following accessor functions of the pr_DB object:
get_field_names() returns a character
vector with all field names. get_field() returns the information
for a specific field as a list with components named as described
above. get_fields() returns a list with all field
entries. set_field() is used to create new fields in the
registry (the default value will be set in all
entries).
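As a minimal sketch of the field accessors described above, the following lists the registered field names and inspects the meta-information of one field. It assumes the proxy package is installed; picking the distance field is only for illustration.

```r
library(proxy)

## names of all fields defined in the registry
field_names <- pr_DB$get_field_names()
print(field_names)

## meta-information for a single field, returned as a list
str(pr_DB$get_field("distance"))
```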
get_entry_names() returns a character vector with (the first
alias of) all entries. entry_exists() is a predicate checking
if an entry with the specified alias exists in the
registry. get_entry() returns the specified entry if it exists (and, by
default, gives an error if it does not). get_entries() is used to
query more than one entry: either those matching name exactly, or
those where the regular expression in pattern matches any
character field in an entry. By default, all entries are
returned. delete_entry() removes an existing entry from the
registry (note that only user-provided entries can be deleted).
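The entry accessors can be tried out as in the following sketch. It assumes that measures with the aliases "Euclidean" and "Manhattan" are among those shipped with the package; "no_such_measure_demo" is a made-up alias used only to show a failed lookup.

```r
library(proxy)

## predicate: the lookup by alias is case-insensitive
has_euclid <- pr_DB$entry_exists("euclidean")
has_bogus <- pr_DB$entry_exists("no_such_measure_demo")

## a single entry, with its fields accessible by name
euclid <- pr_DB$get_entry("Euclidean")
print(euclid$names)

## all entries with a character field matching a regular expression
hits <- pr_DB$get_entries(pattern = "Manhattan")
print(length(hits))
```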
set_entry() and modify_entry() require a named list
of arguments used as field entries.
At least the names index field is required. set_entry()
will check for all other mandatory fields. If specified in the field
meta data, each field entry and the entry as a whole are checked for
validity. Note that only user-specified fields and/or entries can be
modified; the data shipped with the package are read-only.
The registry fields currently available are as follows:
FUN
Function to register (see below).
names
Character vector with one or more aliases for the measure.
PREFUN
Optional function (or function name) for preprocessing
code (see below).
POSTFUN
Optional function (or function name) for postprocessing
code (see below).
distance
logical indicating whether this measure is a distance (TRUE)
or similarity (FALSE).
convert
Optional function (or function name) for converting
between similarities and distances when needed.
type
Optional, the scale the measure applies to
("metric", "ordinal", "nominal",
"binary", or "other"). If
NULL, it is assumed to apply to some other unknown scale.
loop
logical indicating whether FUN computes the measure for a single
pair only, so that dist has to loop over all pairs of
observations/variables (TRUE), or whether FUN does the loop on its own (FALSE).
C_FUN
logical indicating whether FUN is a C function.
abcd
logical; if TRUE and binary data (or data to be
interpreted as such) are supplied, the number of concordant and
discordant pairs is precomputed for every two binary data vectors and
supplied to the measure function.
formula
Optional character string with the symbolic representation of
the formula.
reference
Optional reference (character).
description
Optional description (character). Ideally,
describes the context in which the measure can be applied.
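To illustrate how these fields fit together, here is a sketch that registers a measure with several of them set explicitly, uses it once, and removes it again. The function my_cityblock and the alias "cityblock_demo" are hypothetical names chosen for this example; the field values are one plausible combination, not a recommendation.

```r
library(proxy)

## hypothetical measure: city-block distance between two vectors
my_cityblock <- function(x, y) sum(abs(x - y))

pr_DB$set_entry(
  FUN = my_cityblock,
  names = "cityblock_demo",
  distance = TRUE,                 # a distance, not a similarity
  type = "metric",                 # scale the measure applies to
  loop = TRUE,                     # dist() loops over all pairs
  formula = "sum_i |x_i - y_i|",
  description = "Demo entry for illustration only."
)

## two observations (rows), two variables (columns)
m <- rbind(c(0, 0), c(3, 4))
d <- proxy::dist(m, method = "cityblock_demo")
print(d)

## clean up the user-provided entry
pr_DB$delete_entry("cityblock_demo")
```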
A function specified as FUN parameter has mandatory arguments
x and y (if abcd is FALSE), and a,
b, c, d, n otherwise. Additionally, it gets
all optional parameters specified by the user in the ...
argument of the dist and simil functions, possibly
changed and/or complemented by the corresponding (optional)
PREFUN function. It must return the
(dis)similarity value computed from the arguments.
x and y are two vectors from the
data matrix (matrices) supplied. If abcd is TRUE, it is
assumed that binary measures will be used, and the counts of
concordant and discordant pairs (x_k, y_k), together with their total number n, are
precomputed and supplied instead of x and
y. a, b, c, and d are the counts of
all (TRUE, TRUE), (TRUE, FALSE), (FALSE, TRUE), and (FALSE, FALSE)
pairs, respectively.
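The two calling conventions can be contrasted with a binary measure. The following is a sketch only: smc (a simple matching coefficient) and the alias "smc_demo" are hypothetical, and it assumes that simil accepts a logical matrix for a measure registered with abcd = TRUE.

```r
library(proxy)

## hypothetical binary measure: share of concordant pairs among all n pairs
smc <- function(a, b, c, d, n) (a + d) / n

pr_DB$set_entry(
  FUN = smc,
  names = "smc_demo",
  distance = FALSE,   # a similarity
  type = "binary",
  abcd = TRUE         # FUN receives a, b, c, d, n instead of x, y
)

## two binary observations (rows) on four variables (columns)
x <- rbind(c(TRUE, TRUE, FALSE, FALSE),
           c(TRUE, FALSE, TRUE, FALSE))
s <- proxy::simil(x, method = "smc_demo")
print(s)

## clean up the user-provided entry
pr_DB$delete_entry("smc_demo")
```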
A function specified as PREFUN parameter has mandatory arguments
x, y, p, and reg_entry, with y and
p possibly being NULL depending on the task at
hand. x and y are the data objects, p is a
(possibly empty) list with all specified proximity parameters, and
reg_entry is the registry entry (a named list containing all
information specified in pr_DB$set_entry).
The preprocessing function is allowed to change all of this
information, and if so, is required to return *all* arguments
as a named list in the same order.
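The return convention for preprocessing functions can be sketched as follows. my_prefun is a hypothetical name, and standardizing the columns is just one example of a change to the data; the function is called here on its own with dummy arguments rather than through dist.

```r
## hypothetical preprocessing function: standardize the columns of x (and y)
my_prefun <- function(x, y, p, reg_entry) {
  x <- scale(x)
  if (!is.null(y)) y <- scale(y)
  ## return all four arguments, named, in the original order
  list(x = x, y = y, p = p, reg_entry = reg_entry)
}

## stand-alone call with dummy arguments (y is NULL, p is empty)
out <- my_prefun(matrix(1:6, nrow = 3), NULL, list(), list(names = "demo"))
print(names(out))
```

Such a function could then be passed as the PREFUN field when registering a measure.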
A function specified as POSTFUN parameter has two mandatory
arguments: result and p. result will contain the
computed raw data, i.e. a vector of length n * (n - 1) / 2 for
auto-distances (see dist for details on
dist objects), or a matrix for cross-distances. p contains
the specified proximity parameters. Post-processing functions need to
return the result object (even if unmodified).
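A postprocessing function might look like the following sketch. my_postfun is a hypothetical name, and replacing NaN values by 0 is only one example of a cleanup step; the raw vector is made up to mimic the result for n = 4 observations.

```r
## hypothetical postprocessing function: replace undefined values by 0
my_postfun <- function(result, p) {
  result[is.nan(result)] <- 0
  result   # always return the result object, modified or not
}

## made-up raw result for n = 4 observations: n * (n - 1) / 2 = 6 values
n <- 4
raw <- c(0.5, NaN, 1, 2, NaN, 3)
out <- my_postfun(raw, list())
print(out)
```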
A function specified as convert parameter should preserve the
type of its argument.
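As a sketch of a type-preserving conversion, the hypothetical function my_convert below maps similarities to distances by 1 - x, which assumes similarities in the range from 0 to 1. Plain arithmetic keeps the attributes of its argument, so a matrix stays a matrix and a dist object stays a dist object.

```r
## hypothetical conversion between similarities and distances
my_convert <- function(x) 1 - x

m <- matrix(c(1, 0.2, 0.2, 1), nrow = 2)   # full similarity matrix
d <- stats::as.dist(m)                     # lower triangle as a dist object

print(class(my_convert(m)))
print(class(my_convert(d)))
```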
## create a new distance measure
mydist <- function(x,y) x * y
## create a new entry in the registry with two aliases
pr_DB$set_entry(FUN = mydist, names = c("test", "mydist"))
## look it up (index is case insensitive):
pr_DB$get_entry("TEST")
## modify the content of the description field in the new entry
pr_DB$modify_entry(names = "test", description = "foo function")
## create a new field
pr_DB$set_field("New")
## look up the test entry again (two ways)
pr_DB$get_entry("test")
pr_DB[["test"]]
## show total number of entries
length(pr_DB)
## show all entries with a character field matching "foo"
pr_DB$get_entries(pattern = "foo")
## show more details
summary(pr_DB, "long")
## get all entries in a list (and extract first two ones)
pr_DB$get_entries()[1:2]
## get all entries as a data frame (select first 3 fields)
as.data.frame(pr_DB)[,1:3]
## delete test entry
pr_DB$delete_entry("test")
## check if it is really gone
pr_DB$entry_exists("test")