These functions compute and return the auto-distance/similarity matrix
between either rows or columns of a matrix/data frame, or a list,
as well as the cross-distance matrix between two matrices/data frames/lists.
x
For dist and simil, a numeric matrix, a data frame, or a list. A vector
is converted into a column matrix. For as.simil and
as.dist, an object of class dist and
simil, respectively, or a numeric matrix. For
pr_dist2simil and pr_simil2dist, any numeric vector.
y
NULL, or an object similar to x.
method
a function, a registry entry, or a mnemonic string referencing the
proximity measure. A list of all available measures can be obtained
using pr_DB (see examples). The default for dist is
"Euclidean", and for simil, "correlation".
diag
logical value indicating whether the diagonal of the
distance/similarity matrix should be printed by
print.dist/print.simil.
In the context of as.matrix the value to use on the diagonal
representing self-proximities.
upper
logical value indicating whether the upper triangle of the
distance/similarity matrix should be printed by
print.dist/print.simil.
pairwise
logical value indicating whether distances should be
computed only for corresponding pairs of x and y (the i-th element
of x against the i-th element of y).
by_rows
logical indicating whether proximities between rows or
between columns should be computed.
convert_similarities, convert_distances
logical indicating
whether distances should be automatically converted into
similarities (and the other way round) if needed.
auto_convert_data_frames
logical indicating whether data frames
should be converted to matrices if all variables are numeric,
or all are logical, or all are complex.
FUN
optional function to be used by as.dist and
as.simil. If NULL, it is looked up in the method
registry. If there is none specified there, FUN defaults to
pr_simil2dist and pr_dist2simil, respectively.
...
further arguments passed to the proximity function.
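As a rough illustration of how the arguments above combine, the following sketch calls dist and simil on a small made-up matrix. The data and the object names (x, y, d_rows, and so on) are assumptions chosen for the example; the functions are called with the proxy:: prefix because proxy's dist masks the one in package stats.

```r
library(proxy)

# Hypothetical toy data: 4 observations (rows) on 3 variables (columns).
x <- matrix(c(1, 2, 3,
              4, 5, 6,
              7, 8, 10,
              2, 0, 1), ncol = 3, byrow = TRUE)
y <- x[1:2, ] + 1   # a second matrix with the same number of columns

d_rows  <- proxy::dist(x)                     # auto-distances between rows ("Euclidean")
d_cols  <- proxy::dist(x, by_rows = FALSE)    # auto-distances between columns
d_man   <- proxy::dist(x, method = "Manhattan")
d_cross <- proxy::dist(x, y)                  # cross-distances: rows of x vs. rows of y
d_pair  <- proxy::dist(x[1:2, ], y, pairwise = TRUE)  # row i of x vs. row i of y only
s_rows  <- proxy::simil(x)                    # auto-similarities ("correlation")
```

With the default method, d_rows should agree with stats::dist(x).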
Details
The interface is fashioned after dist in package stats, but can
also compute cross-distances, and allows user extensions by means of
a registry of all proximity measures (see pr_DB).
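A minimal sketch of the two extension points mentioned above: looking up measures in the registry, and passing a user-defined function as method. The function my_manhattan and the toy matrix are hypothetical; the example only assumes that a function taking two vectors and returning a number is accepted as a measure.

```r
library(proxy)

x <- matrix(c(1, 2, 3,
              4, 5, 6,
              7, 8, 10), ncol = 3, byrow = TRUE)

summary(pr_DB)                                   # overview of the registered measures
has_euclid <- pr_DB$entry_exists("Euclidean")    # look up a measure by name

# Hypothetical user-defined measure: Manhattan distance written by hand.
my_manhattan <- function(a, b) sum(abs(a - b))

d_custom  <- proxy::dist(x, method = my_manhattan)   # function used directly
d_builtin <- proxy::dist(x, method = "Manhattan")    # registry entry by name
```

Both calls should give the same distances here, since the hand-written function mirrors the registered Manhattan measure.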
Missing values are allowed but are excluded from all computations
involving the rows within which they occur. If some columns are
excluded in calculating a Euclidean, Manhattan, Canberra or
Minkowski distance, the sum is scaled up proportionally to the
number of columns used (compare dist in
package stats).
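The scaling rule can be checked by hand on a two-row example. The data below are made up for illustration; the expected value follows the rule described above, assuming the Euclidean measure.

```r
library(proxy)

# Two rows, four columns; the second column is missing in row 2.
x <- rbind(c(1, 2, 3, 4),
           c(2, NA, 5, 6))

d_na <- proxy::dist(x)   # Euclidean; column 2 is excluded for this pair

# Squared differences over the 3 usable columns: 1 + 4 + 4 = 9.
# The sum is scaled by 4/3 (all columns / usable columns) before the root.
expected <- sqrt(9 * 4 / 3)
```

Here d_na should match expected, and so should stats::dist(x).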
Data frames are silently coerced to a matrix if all columns share the
same mode, numeric or logical.
Distance measures can be used with simil, and similarity
measures with dist. In these cases, the result is transformed
accordingly using the specified coercion functions (default:
pr_simil2dist(s) = 1 - s and pr_dist2simil(d) = 1 / (1 + d)).
Objects of class simil and dist can be converted into one
another using as.dist and as.simil, respectively.
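The conversions can be sketched as follows for the Euclidean measure, assuming its registry entry uses the default coercion pr_dist2simil. The toy matrix and the exp(-d) coercion passed as FUN are hypothetical choices for the example.

```r
library(proxy)

x <- matrix(c(1, 2, 3,
              4, 5, 6,
              7, 8, 10), ncol = 3, byrow = TRUE)

d  <- proxy::dist(x)                          # Euclidean distances
s  <- proxy::simil(x, method = "Euclidean")   # distance measure requested as a similarity
s2 <- proxy::as.simil(d)                      # explicit conversion of a dist object
s3 <- proxy::as.simil(d, FUN = function(d) exp(-d))  # hypothetical custom coercion

# The default coercion functions can also be applied to plain numeric vectors.
sim_from_dist <- pr_dist2simil(3)      # 1 / (1 + 3)
dist_from_sim <- pr_simil2dist(0.25)   # 1 - 0.25
```

Under the stated assumption, s and s2 both equal 1 / (1 + d) element by element.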
Distance and similarity objects can conveniently be subset
(see examples). Note that duplicate indexes are silently ignored.
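A short sketch of subsetting, assuming the subset method and the [[ operator for dist objects behave as in the package examples. The data and index choices are made up.

```r
library(proxy)

x <- matrix(c(1, 2, 3,
              4, 5, 6,
              7, 8, 10,
              2, 0, 1), ncol = 3, byrow = TRUE)

d <- proxy::dist(x)

d_sub  <- subset(d, c(1, 3, 4))   # keep observations 1, 3 and 4
d_sub2 <- d[[c(1, 3, 4)]]         # same selection via the [[ operator
d_dup  <- subset(d, c(1, 1, 3))   # the duplicated index 1 is silently ignored
```

The subset should hold the same distances as recomputing dist on x[c(1, 3, 4), ].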
Value
Auto distances/similarities are returned as an object of class dist/simil and
cross-distances/similarities as an object of class crossdist/crosssimil.
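The returned classes can be inspected directly. The following sketch uses hypothetical toy data and also shows as.matrix with a value for the diagonal, as described for the diag argument.

```r
library(proxy)

x <- matrix(c(1, 2, 3,
              4, 5, 6,
              7, 8, 10,
              2, 0, 1), ncol = 3, byrow = TRUE)
y <- x[1:2, ] + 1

is_dist       <- inherits(proxy::dist(x), "dist")           # auto-distances
is_simil      <- inherits(proxy::simil(x), "simil")         # auto-similarities
is_crossdist  <- inherits(proxy::dist(x, y), "crossdist")   # cross-distances
is_crosssimil <- inherits(proxy::simil(x, y), "crosssimil") # cross-similarities

# Full square matrix with self-proximities set to 0 on the diagonal.
m <- as.matrix(proxy::dist(x), diag = 0)
```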