kakasi is an interface to the external program kakasi
(KAnji KAna Simple Inverter). It is especially useful for converting
Japanese Kanji characters to Romaji (ASCII) characters.
A character string specifying the options passed
to the kakasi library/program.
ITAIJIDICTPATH
A character string specifying the path to
itaijidict. Its value is passed to the kakasi library as the
environment variable of the same name.
KANWADICTPATH
A character string specifying the path to
kanwadict. Its value is passed to the kakasi library as the
environment variable of the same name.
Details
Japanese strings are often made up of a mixture of Chinese characters
(Kanji), Kana (Hiragana and Katakana) and Romaji (a Latin-alphabet
transcription of the pronunciation). The external program kakasi converts
between these four ways of writing Japanese. kakasi is especially useful
for sanitizing a character vector by converting Japanese (non-ASCII)
characters to ASCII characters.
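Because the goal is usually pure ASCII output, and kakasi leaves non-Japanese, non-ASCII characters untouched, it can help to check a result for leftover non-ASCII characters. The base-R sketch below does this; the helper name has_non_ascii is hypothetical and is not part of Nippon.

```r
# Hypothetical helper (not part of Nippon): TRUE for each element that
# still contains a character outside the 7-bit ASCII range (above 0x7F).
has_non_ascii <- function(x) {
  grepl("[^\\x01-\\x7F]", x, perl = TRUE)
}

# Plain ASCII, an accented Latin letter, and Kanji.
x <- c("Tokyo", "Caf\u00e9", "\u6771\u4eac")
has_non_ascii(x)   # FALSE TRUE TRUE
```

Applied to the output of kakasi(), any TRUE element marks a string that was not fully converted to ASCII.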
kakasi uses two basic dictionaries: itaijidict and
kanwadict. These dictionaries are placed in the doc/share directory of the
package when the Nippon package is installed. Because the kakasi library
looks up environment variables to find its dictionaries, ITAIJIDICTPATH
and KANWADICTPATH are set internally with Sys.setenv the first time
kakasi is called. After the first call, kakasi continues to use these
environment variables, which remain set until the R session ends. To use
alternative dictionaries instead of the bundled ones, set the environment
variables with Sys.setenv or pass the paths as arguments to
kakasi. For permanent settings of environment variables, see the
help page for Renviron.
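A minimal sketch of pointing kakasi at alternative dictionaries for the current session with Sys.setenv. The directory /opt/dict and both file paths are placeholder assumptions; substitute the real locations of itaijidict and kanwadict. Passing the paths through the ITAIJIDICTPATH and KANWADICTPATH arguments of kakasi is the other route described above.

```r
# Placeholder paths (assumption): replace with the real dictionary locations.
itaiji_path <- "/opt/dict/itaijidict"
kanwa_path  <- "/opt/dict/kanwadict"

# Set the variables the kakasi library reads to locate its dictionaries.
# They stay set until the R session ends or they are changed again.
Sys.setenv(ITAIJIDICTPATH = itaiji_path, KANWADICTPATH = kanwa_path)

# Show the values that kakasi() will see.
print(Sys.getenv(c("ITAIJIDICTPATH", "KANWADICTPATH")))

# Warn early if a dictionary file is not found at the given path.
dict_paths <- c(itaiji_path, kanwa_path)
not_found <- dict_paths[!file.exists(dict_paths)]
if (length(not_found) > 0) {
  warning("dictionary file not found: ", paste(not_found, collapse = ", "))
}
```

For a permanent setting, the same NAME=value pairs (for example ITAIJIDICTPATH=/opt/dict/itaijidict) can be written to an .Renviron file, as described in the Renviron help page.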
Value
A character vector
Warning
Note that kakasi does not filter out non-ASCII characters that are
not Japanese. kakasi warns unless LC_CTYPE is "ja_JP.UTF-8"
(Linux or Mac OS X) or "Japanese_Japan.932" (Windows). It is not known
whether the function works in other locales.
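To anticipate the locale warning, the current LC_CTYPE can be compared with the two supported values before calling kakasi. The sketch below uses base R only; kakasi_locale_ok is a hypothetical helper, and its exact string match is an assumption. Some systems may report an equivalent locale under a different spelling (for example "ja_JP.utf8"), which this check would treat as unsupported.

```r
# LC_CTYPE values in which kakasi() is documented to run without a warning.
supported_ctypes <- c("ja_JP.UTF-8", "Japanese_Japan.932")

# Hypothetical helper (not part of Nippon): exact-match check of a
# locale name against the supported list.
kakasi_locale_ok <- function(ctype = Sys.getlocale("LC_CTYPE")) {
  ctype %in% supported_ctypes
}

if (!kakasi_locale_ok()) {
  message("LC_CTYPE is '", Sys.getlocale("LC_CTYPE"),
          "'; kakasi() is expected to warn in this locale.")
}
```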
Note
Sys.kakasi was removed in Nippon ver.0.6.
Kanji-to-Kana conversion with kakasi is somewhat less accurate
than with the MeCab program (http://mecab.sourceforge.net/). Although MeCab
has no Kana-to-Romaji conversion function, it may be an option
if you need more accurate results. RMeCab is available from
http://rmecab.jp/wiki/.
Windows users should be aware that R on Windows can handle strings
encoded in either UTF-8 or CP932 ("Japanese_Japan.932"); however,
kakasi works only with "Japanese_Japan.932". If you have UTF-8-encoded
data on Windows, convert it to CP932 ("Japanese_Japan.932") as shown
in the examples.
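The UTF-8 to CP932 conversion in the examples can be wrapped so that it only happens on Windows. This is a sketch under the assumption that the input is UTF-8; the helper to_kakasi_encoding is hypothetical and not part of Nippon. It relies on the documented behaviour of iconv, which returns NA for elements that cannot be represented in the target encoding.

```r
# Hypothetical helper (not part of Nippon): re-encode UTF-8 strings to
# CP932 when running on Windows, and leave them unchanged elsewhere.
to_kakasi_encoding <- function(x, os = .Platform$OS.type) {
  if (os != "windows") {
    return(x)
  }
  out <- iconv(x, from = "UTF-8", to = "CP932")
  # iconv() returns NA for elements that cannot be represented in CP932.
  if (anyNA(out[!is.na(x)])) {
    warning("some elements could not be converted to CP932")
  }
  out
}
```

With this helper, kakasi(to_kakasi_encoding(regions)) could be used on any platform in place of the separate Unix-like and Windows calls.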
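Examples
## Not run:
library(Nippon)
data(prefectures)
# Unique region names (in Japanese) from the bundled prefectures data
regions <- unique(prefectures$region)
regions
# Unix-like operating systems (e.g. with LC_CTYPE "ja_JP.UTF-8")
kakasi(regions)
# Windows: convert UTF-8 strings to CP932 before calling kakasi
regions.cp932 <- iconv(regions, from = "UTF-8", to = "CP932")
kakasi(regions.cp932)
## End(Not run)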