Last data update: 2014.03.03

R: Transform data into a set of linguistic fuzzy attributes
lcut R Documentation

Transform data into a set of linguistic fuzzy attributes

Description

This function creates a set of linguistic fuzzy attributes from crisp data. Numeric vectors and the columns of a matrix or data frame are transformed into sets of fuzzy attributes, i.e. columns of membership degrees. Factors and other data types are transformed to fuzzy attributes by calling the fcut function.

Usage

lcut3(x, ...)
## S3 method for class 'matrix'
lcut3(x, ...)
## S3 method for class 'data.frame'
lcut3(x, 
     context=NULL,
     name=NULL,
     parallel=FALSE,
     ...)
## S3 method for class 'numeric'
lcut3(x, 
     context=NULL,
     defaultCenter=0.5,
     atomic=c("sm", "me", "bi"), 
     hedges=c("ex", "si", "ve", "ml", "ro", "qr", "vr"),
     name=NULL,
     parallel=FALSE,
     ...)

lcut5(x, ...)
## S3 method for class 'matrix'
lcut5(x, ...)
## S3 method for class 'data.frame'
lcut5(x, 
     context=NULL,
     name=NULL,
     parallel=FALSE,
     ...)
## S3 method for class 'numeric'
lcut5(x, 
     context=NULL,
     defaultCenter=0.5,
     atomic=c('sm', 'lm', 'me', 'um', 'bi'),
     hedges=c("ex", "ve", "ml", "ro", "ty"),
     name=NULL,
     parallel=FALSE,
     ...)

Arguments

x

Data to be transformed: if it is a numeric vector, matrix, or data frame, linguistic fuzzy attributes are created. For other data types, the fcut function is called.

context

A definition of the context of a numeric attribute. The context determines how people understand the notions "small", "medium", or "big" with respect to that attribute. If x is a numeric vector, then context should be a vector of 3 numbers: a typical small, medium, and big value. If context is NULL, these values are taken directly from x as follows:

  • small = min(x);

  • medium = (max(x) - min(x)) * defaultCenter + min(x);

  • big = max(x).

If x is a matrix or data frame, then context should be a named list of contexts, one for each column of x. If the context of some column is omitted, it is determined directly from the data as explained above.

Regardless of the value of the atomic argument, all 3 numbers of the context must always be provided.

defaultCenter

A value used to determine a typical "medium" value from data (see context above). If context is not specified then typical "medium" is determined as

(max(x) - min(x)) * defaultCenter + min(x).

The default value of defaultCenter is 0.5; however, some literature specifies 0.42 as another sensible value with a proper linguistic interpretation.
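The default context described for the context and defaultCenter arguments can be sketched in base R as follows. This is only an illustration of the stated formulas, not the package's internal code; the helper name default_context and the sample data are hypothetical.

```r
# Hypothetical helper (not part of the package): derive the default
# context from the data using the formulas given above.
default_context <- function(x, defaultCenter = 0.5) {
  lo <- min(x)
  hi <- max(x)
  c(small  = lo,
    medium = (hi - lo) * defaultCenter + lo,
    big    = hi)
}

x <- c(2, 4, 10, 12)                      # made-up sample data

ctx_half <- default_context(x)            # defaultCenter = 0.5, medium is 7
ctx_042  <- default_context(x, 0.42)      # alternative center mentioned above

print(ctx_half)
print(ctx_042)
```

A lower defaultCenter such as 0.42 moves the typical "medium" value closer to the minimum while leaving "small" and "big" unchanged.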

atomic

A vector of atomic linguistic expressions to be used for creation of fuzzy attributes. The possible values for lcut3 are:

  • sm: small;

  • me: medium;

  • bi: big.

For lcut5, the following values are possible:

  • sm: small;

  • lm: lower medium;

  • me: medium;

  • um: upper medium;

  • bi: big.

Several values are allowed in this argument.

hedges

A vector of linguistic hedges to be used for creation of fuzzy attributes.

For the lcut3 variant, the following hedges are allowed:

  • ex: extremely (sm, bi);

  • si: significantly (sm, bi);

  • ve: very (sm, bi);

  • ml: more or less (sm, me, bi);

  • ro: roughly (sm, me, bi);

  • qr: quite roughly (sm, me, bi);

  • vr: very roughly (sm, me, bi).

For the lcut5 variant, the following hedges are allowed:

  • ex: extremely (sm, bi);

  • ve: very (sm, bi);

  • ml: more or less (sm, me, bi);

  • ro: roughly (sm, me, bi);

  • ty: typically (me).

By default, a fuzzy attribute is created for each atomic expression (i.e. "small", "medium", "big") with an empty hedge. Additional fuzzy attributes are created based on the set of hedges selected with this argument. Not every hedge is applicable to every atomic expression; the lists above give the allowed atomic expressions for each hedge in parentheses.
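To see which hedge and atomic expression combinations the rules above permit, the following base R sketch enumerates them for the lcut3 variant. The compatibility table is transcribed from the list above; the helper expression_pairs is hypothetical and only counts combinations, it does not compute membership degrees or reproduce the package's column names.

```r
# Allowed atomic expressions per hedge for the lcut3 variant,
# transcribed from the list above.
allowed3 <- list(
  ex = c("sm", "bi"),
  si = c("sm", "bi"),
  ve = c("sm", "bi"),
  ml = c("sm", "me", "bi"),
  ro = c("sm", "me", "bi"),
  qr = c("sm", "me", "bi"),
  vr = c("sm", "me", "bi")
)

# Hypothetical helper: list every (hedge, atomic) pair that would be used.
expression_pairs <- function(atomic, hedges, allowed) {
  # every atomic expression is always paired with the empty hedge ""
  pairs <- data.frame(hedge = rep("", length(atomic)),
                      atomic = atomic,
                      stringsAsFactors = FALSE)
  for (h in hedges) {
    # keep only the atomic expressions this hedge may modify
    ok <- intersect(atomic, allowed[[h]])
    if (length(ok) > 0) {
      pairs <- rbind(pairs,
                     data.frame(hedge = h, atomic = ok,
                                stringsAsFactors = FALSE))
    }
  }
  pairs
}

all_pairs  <- expression_pairs(c("sm", "me", "bi"), names(allowed3), allowed3)
some_pairs <- expression_pairs(c("sm", "me", "bi"), c("ve", "ro"), allowed3)

print(nrow(all_pairs))   # number of combinations with all lcut3 hedges
print(some_pairs)
```

With only the "ve" and "ro" hedges selected, "ve" contributes pairs for "sm" and "bi" but not for "me", which mirrors how an inapplicable combination is skipped.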

name

A name to be added as a suffix to the created fuzzy attribute names. This parameter can be used only if x is a numeric vector. If x is a matrix or data frame, name should be NULL because the fuzzy attribute names are taken from column names of parameter x.

parallel

Whether the processing should be run in parallel or not. Parallelization is implemented using the foreach package. The parallel environment must be set properly in advance, e.g. with the registerDoMC function.
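A minimal sketch of preparing a parallel backend before calling the function with parallel=TRUE is shown below. It assumes the doMC package is installed (it is not available on every platform); the choice of 2 workers is arbitrary.

```r
# Register a multicore backend for foreach, if the doMC package is available.
if (requireNamespace("doMC", quietly = TRUE)) {
  doMC::registerDoMC(cores = 2)        # 2 workers is an arbitrary choice
  print(foreach::getDoParWorkers())    # number of registered workers
}

# With a backend registered, the documented call would then be, e.g.:
# lcut3(data, parallel = TRUE)
```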

...

Other parameters to some methods.

Details

The aim of this function is to transform numeric data into a set of fuzzy attributes. The resulting fuzzy attributes have direct linguistic interpretation. This is a unique variant of fuzzification that is suitable for the inference mechanism based on Perception-based Linguistic Description (PbLD) – see pbld.

A numeric vector is transformed into a set of fuzzy attributes according to the following scheme:

<hedge> <atomic expression>

where <atomic expression> is a linguistic expression "small" ("sm"), "lower medium" ("lm"), "medium" ("me"), "upper medium" ("um") or "big" ("bi") – see the atomic argument. A <hedge> is a modifier that further concretizes the atomic expression. It can be empty ("") or some value of:

  • ty: typically;

  • ex: extremely;

  • si: significantly;

  • ve: very;

  • ml: more or less;

  • ro: roughly;

  • qr: quite roughly;

  • vr: very roughly.

According to the theory developed by Novak (2008), not every hedge is suitable for every atomic expression (see the description of the hedges argument). The hedges to be used can be selected with the hedges argument. The function itself ensures that a hedge is never combined with an atomic expression to which it does not apply.

Different data give different meanings to "small", "medium", and "big". Therefore, a context has to be set that specifies sensible values for these linguistic expressions.

If a matrix (or data frame) is provided to this function instead of a single vector, all columns are processed the same way.

The function also properly sets up the vars and specs properties of the result.

Value

An object of class "fsets" is returned, which is a numeric matrix with columns representing the fuzzy attributes. Each source column of the x argument corresponds to multiple columns in the resulting matrix. Column names are derived from the hedges and atomic expressions used and from the name given in the optional name argument.

The resulting object also has the vars and specs properties set. The vars property is created from the original column names (if x is a matrix or data frame) or from the name argument (if x is a numeric vector). The specs incidence matrix is created to reflect the following order of the hedges: "ex" < "si" < "ve" < "" < "ml" < "ro" < "qr" < "vr" and "ty" < "". Fuzzy attributes created from the same source numeric vector (or column) are ordered that way, while fuzzy attributes from different sources are incomparable.
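The hedge order stated above can be turned into an incidence matrix with a few lines of base R. This is only a sketch of the idea for the main chain of hedges: the orientation chosen here (a 1 in row i, column j when hedge i precedes hedge j) and the "(none)" label for the empty hedge are assumptions, and the actual layout of the specs matrix in the package may differ.

```r
# Main chain of hedges in the order stated above; "" is the empty hedge.
hedge_chain <- c("ex", "si", "ve", "", "ml", "ro", "qr", "vr")

# Label the empty hedge so that rows and columns can be addressed by name.
labels <- ifelse(hedge_chain == "", "(none)", hedge_chain)

# incidence[i, j] is 1 when hedge i comes before hedge j in the chain
# (assumed orientation, for illustration only).
pos <- seq_along(hedge_chain)
incidence <- outer(pos, pos, "<") * 1L
dimnames(incidence) <- list(labels, labels)

print(incidence)
```

Because the chain is a strict linear order, the resulting matrix is strictly upper triangular: no hedge precedes itself and no pair is related in both directions.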

Author(s)

Michal Burda

References

V. Novak, A comprehensive theory of trichotomous evaluative linguistic expressions, Fuzzy Sets and Systems 159 (22) (2008) 2939–2969.

See Also

fcut, farules, pbld, vars, specs, cbind.fsets

Examples

# transform a single vector
x <- runif(10)
lcut3(x, name='age')
lcut5(x, name='age')


# transform single vector with custom context
lcut3(x, context=c(0, 0.2, 0.5), name='age')
lcut5(x, context=c(0, 0.2, 0.5), name='age')


# transform all columns of a data frame
# and do not use any hedges
data <- CO2[, c('conc', 'uptake')]
lcut3(data, hedges=NULL)
lcut5(data, hedges=NULL)


# definition of custom contexts for different columns 
# of a data frame while selecting only "ve" and "ro" hedges.
lcut3(data,
     context=list(conc=c(0, 500, 1000),
                  uptake=c(0, 25, 50)),
     hedges=c('ve', 'ro'))


# lcut on non-numeric data is the same as fcut()
ff <- factor(substring("statistics", 1:10, 1:10), levels = letters)
lcut3(ff)
lcut5(ff)
