R Graphical Manual


Last data update: 2014.03.03


encoding

R Documentation

Adapt the (Declared) Encoding of a Character Vector

Description

Functions for testing and adapting the (declared) encoding of the components of a vector of mode character.

Usage

is.utf8(x)
is.ascii(x)
is.locale(x)

translate(x, recursive = FALSE, internal = FALSE)
fixEncoding(x, latin1 = FALSE)

Arguments

`x`	a vector (of character).
`recursive`	logical; should the components of a list be processed?
`internal`	logical; should internal translation be used?
`latin1`	logical; should `"latin1"` be assumed if the declared encoding is `"unknown"`?

Details

is.utf8 tests if the components of a vector of character are true UTF-8 strings, i.e. contain one or more valid UTF-8 multi-byte sequences.

is.ascii tests if the components of a vector of character contain only ASCII characters.
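
The following base-R sketch illustrates the byte-level idea behind the ASCII and UTF-8 tests, assuming that a "true UTF-8" string is valid UTF-8 and contains at least one byte above 0x7F, while an ASCII string contains no such byte. The helper names is_ascii_sketch and is_true_utf8_sketch are hypothetical and this is not the package's implementation; it relies on base R's validUTF8() (R 3.3.0 or later).

```r
## Hypothetical helpers approximating the ASCII and "true UTF-8" tests.
## NA input yields NA.

has_high_byte <- function(s) {
  # TRUE if any byte of s lies outside the 7-bit ASCII range (> 0x7F)
  any(as.integer(charToRaw(s)) > 127L)
}

is_ascii_sketch <- function(x) {
  vapply(x, function(s) if (is.na(s)) NA else !has_high_byte(s),
         logical(1), USE.NAMES = FALSE)
}

is_true_utf8_sketch <- function(x) {
  # valid UTF-8 *and* at least one multi-byte sequence, i.e. not plain ASCII
  vapply(x, function(s) if (is.na(s)) NA else has_high_byte(s) && validUTF8(s),
         logical(1), USE.NAMES = FALSE)
}

x <- c("aa", "a\xe4", "a\xc3\xa4")  # ASCII, a latin1 byte, a UTF-8 sequence
is_ascii_sketch(x)       # TRUE FALSE FALSE
is_true_utf8_sketch(x)   # FALSE FALSE TRUE
```

The byte-based check does not depend on the current locale, which is why the latin1 byte "\xe4" is rejected as UTF-8 even in a UTF-8 session.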

is.locale tests if the components of a vector of character are in the encoding of the current locale.
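
For a UTF-8 session only, the locale test can be approximated as below. This is a hypothetical sketch (in_utf8_locale_sketch is not part of the package) under the assumption that a string is in the locale encoding when its bytes are valid UTF-8 and its declared encoding is not "latin1" or "bytes". The utf8_locale argument exists only so the sketch can be exercised outside a UTF-8 locale; l10n_info() is base R.

```r
## Hypothetical approximation of the locale test, for UTF-8 locales only.
## Assumption: valid UTF-8 bytes plus a declaration that does not contradict them.
in_utf8_locale_sketch <- function(x, utf8_locale = isTRUE(l10n_info()[["UTF-8"]])) {
  if (!utf8_locale)
    stop("this sketch only handles UTF-8 locales")
  validUTF8(x) & !(Encoding(x) %in% c("latin1", "bytes"))
}

x <- c("aa", "a\xc3\xa4", "\xc3\xa4")
Encoding(x) <- c("unknown", "UTF-8", "latin1")
in_utf8_locale_sketch(x, utf8_locale = TRUE)   # TRUE TRUE FALSE
```

The third string shows why the declared encoding matters: its bytes are valid UTF-8, but declared as latin1 they denote different characters.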

translate converts the components of a vector of character to the encoding of the current locale. This includes the names attribute of vectors of arbitrary mode. If recursive = TRUE, the components of a list are processed as well. If internal = TRUE, multi-byte sequences that are invalid in the encoding of the current locale are replaced by literal hex numbers (see FIXME).
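
A minimal sketch of this kind of translation, using base R's enc2native(), which converts strings to the native encoding while taking their declared encoding into account. The name translate_sketch is hypothetical, the internal = TRUE behaviour is not reproduced, and whether translate works this way internally is an assumption (the package's Note states that it uses iconv).

```r
## Hypothetical sketch: re-encode character data and names into the native
## (locale) encoding, optionally recursing into list components.
translate_sketch <- function(x, recursive = FALSE) {
  if (is.character(x))
    x[] <- enc2native(x)                 # x[] <- keeps existing attributes
  if (!is.null(names(x)))
    names(x) <- enc2native(names(x))     # names of vectors of any mode
  if (recursive && is.list(x))
    x[] <- lapply(x, translate_sketch, recursive = TRUE)
  x
}

v <- list(p = c(k = "a\xe4", m = "aa"), q = 1:2)
Encoding(v$p) <- c("latin1", "unknown")
str(translate_sketch(v, recursive = TRUE))
```

Without recursive = TRUE only the list's own names are translated and its components are returned unchanged.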

fixEncoding sets the declared encoding of the components of a vector of character to their correct or preferred values. If latin1 = TRUE, strings that are not valid UTF-8 strings are declared to be in "latin1". Conversely, strings that are true UTF-8 strings are declared to be in "UTF-8" encoding.
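
The rules above can be sketched in base R as follows. fix_encoding_sketch is a hypothetical name, not the package code, and it assumes that strings which are neither ASCII nor valid UTF-8 keep their current declaration when latin1 = FALSE, and that NA elements are left alone.

```r
## Hypothetical sketch of the declared-encoding rules (not the package code).
fix_encoding_sketch <- function(x, latin1 = FALSE) {
  stopifnot(is.character(x))
  ## non-ASCII: at least one byte above 0x7F (NA counts as ASCII here)
  high <- vapply(x, function(s) !is.na(s) && any(as.integer(charToRaw(s)) > 127L),
                 logical(1), USE.NAMES = FALSE)
  utf8 <- high & validUTF8(x)              # "true" UTF-8: non-ASCII and valid
  if (latin1)
    Encoding(x[high & !utf8]) <- "latin1"  # non-ASCII but not valid UTF-8
  Encoding(x[utf8]) <- "UTF-8"
  x
}

x <- c("aa", "a\xe4", "a\xc3\xa4")  # string literals with \x escapes are "unknown"
Encoding(fix_encoding_sketch(x))                 # "unknown" "unknown" "UTF-8"
Encoding(fix_encoding_sketch(x, latin1 = TRUE))  # "unknown" "latin1"  "UTF-8"
```

ASCII strings always report "unknown", since R does not record a declared encoding for them.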

Value

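is.utf8, is.ascii and is.locale return a logical vector with one element per component of x. translate and fixEncoding return the same type of object as x with the (declared) encoding possibly changed.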

Note

Currently, translate uses iconv and is therefore not guaranteed to work on all platforms.
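
Because translation goes through iconv, its behaviour on unconvertible input is worth knowing. The calls below use only base R's iconv(): by default an element that cannot be converted becomes NA, while the sub argument substitutes a string or, with sub = "byte", the byte's hex code. This is comparable in spirit to the hex output described for internal = TRUE, but whether translate uses this mechanism is not stated here. Accepted encoding names and exact behaviour depend on the platform's iconv implementation.

```r
## "\xe4" is a-umlaut in latin1, which cannot be represented in ASCII
x <- "a\xe4"
iconv(x, from = "latin1", to = "ASCII")                # NA
iconv(x, from = "latin1", to = "ASCII", sub = "?")     # "a?"
iconv(x, from = "latin1", to = "ASCII", sub = "byte")  # "a<e4>"
```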

Author(s)

Christian Buchta

References

FIXME: PCRE; RFC 3629 (UTF-8, a transformation format of ISO 10646).

Examples

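## Note that we assume R runs in a UTF-8 locale
text <- c("aa", "a\xe4")   # "\xe4" is the latin1 byte for a-umlaut
Encoding(text) <- c("unknown", "latin1")
is.utf8(text)
is.ascii(text)
is.locale(text)
## implicit translation: printing converts the latin1 string for display
text
## converting to "UTF-8" declares the non-ASCII result as UTF-8
t1 <- iconv(text, from = "latin1", to = "UTF-8")
Encoding(t1)
## oops: with the lowercase name "utf-8" the result may not be declared as UTF-8
t2 <- iconv(text, from = "latin1", to = "utf-8")
Encoding(t2)
t2
is.locale(t2)
## fixEncoding() sets the declared encoding according to the actual bytes
t2 <- fixEncoding(t2)
Encoding(t2)
## explicit translation
t3 <- translate(text)
Encoding(t3)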