option to assume "latin1" if the declared
encoding is "unknown".
Details
is.utf8 tests whether the components of a character vector
are true UTF-8 strings, i.e., contain one or more valid UTF-8
multi-byte sequences.
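A minimal base R sketch of this notion of a "true" UTF-8 string is given
below. It is an approximation for illustration only, not the code behind
is.utf8: the helper name is_true_utf8 is hypothetical, and it assumes that
"true UTF-8" means "valid UTF-8 and not pure ASCII".

```r
# Hypothetical helper (not part of the package): a string counts as
# "true" UTF-8 if its bytes are valid UTF-8 AND at least one byte is
# outside the ASCII range, i.e. it contains a multi-byte sequence.
is_true_utf8 <- function(x) {
  stopifnot(is.character(x))
  has_high_byte <- vapply(
    x,
    function(s) any(as.integer(charToRaw(s)) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
  validUTF8(x) & has_high_byte
}

# "aa" is pure ASCII, "a\xc3\xa4" holds a valid two-byte sequence,
# and "a\xe4" holds a lone latin1 byte that is invalid as UTF-8.
x <- c("aa", "a\xc3\xa4", "a\xe4")
is_true_utf8(x)
```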
is.locale tests whether the components of a character vector
are in the encoding of the current locale.
translate encodes the components of a character vector
in the encoding of the current locale. This includes the names
attribute of vectors of arbitrary mode. If recursive = TRUE,
the components of a list are processed. If internal = TRUE,
multi-byte sequences that are invalid in the encoding of the current
locale are changed to literal hex numbers (see FIXME).
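For comparison, base R's iconv can replace bytes it cannot convert by
their hex codes through its sub argument. The sketch below only
illustrates that general idea. It is not claimed to be what translate
does with internal = TRUE, and the input string is made up for the
example.

```r
# A string holding a lone latin1 byte (0xe4), which is invalid as UTF-8.
x <- "a\xe4"

# sub = "byte" asks iconv to replace each non-convertible byte by its
# hex code in angle brackets instead of returning NA for the string.
y <- iconv(x, from = "UTF-8", to = "UTF-8", sub = "byte")
y
```

Because this relies on the platform's iconv, results may differ between
systems.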
fixEncoding sets the declared encoding of the components of
a character vector to their correct or preferred values. If
latin1 = TRUE, strings that are not valid UTF-8 strings are
declared to be in "latin1". Conversely, strings that
are true UTF-8 strings are declared to be in "UTF-8".
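The rule described for fixEncoding can be sketched in base R as follows.
This is an illustrative approximation under stated assumptions, not the
package's implementation: the name fix_encoding_sketch is hypothetical,
pure ASCII strings are left untouched, and true UTF-8 strings are marked
as "UTF-8" regardless of the latin1 argument.

```r
# Hypothetical sketch (not the package's code) of re-declaring encodings.
fix_encoding_sketch <- function(x, latin1 = FALSE) {
  stopifnot(is.character(x))
  # Strings with at least one byte outside the ASCII range.
  high <- vapply(
    x,
    function(s) any(as.integer(charToRaw(s)) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
  # Non-ASCII strings whose bytes form valid UTF-8.
  utf8 <- high & validUTF8(x)
  enc <- Encoding(x)
  enc[utf8] <- "UTF-8"
  # Assumption: everything else that is non-ASCII is treated as latin1.
  if (latin1) enc[high & !utf8] <- "latin1"
  Encoding(x) <- enc
  x
}

x <- c("aa", "a\xc3\xa4", "a\xe4")
Encoding(x) <- "unknown"
Encoding(fix_encoding_sketch(x, latin1 = TRUE))
```

Only the declared encoding changes here; the underlying bytes of each
string are left as they are.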
Value
The same type of object as x with the (declared) encoding
possibly changed.
Note
Currently, translate uses iconv and is therefore not
guaranteed to work on all platforms.
Author(s)
Christian Buchta
References
FIXME PCRE, RFC 3629
See Also
Encoding and iconv.
Examples
## Note that we assume R runs in a UTF-8 locale
text <- c("aa", "a\xe4")
Encoding(text) <- c("unknown", "latin1")
is.utf8(text)
is.ascii(text)
is.locale(text)
## implicit translation
text
##
t1 <- iconv(text, from = "latin1", to = "UTF-8")
Encoding(t1)
## oops
t2 <- iconv(text, from = "latin1", to = "utf-8")
Encoding(t2)
t2
is.locale(t2)
##
t2 <- fixEncoding(t2)
Encoding(t2)
## explicit translation
t3 <- translate(text)
Encoding(t3)