zen2han
R Documentation
Convert Japanese characters from fullwidth (zenkaku) to halfwidth
(hankaku) forms
Description
This function converts Japanese characters from fullwidth (zenkaku) to halfwidth
(hankaku) forms to avoid trouble in Japanese string operations.
Usage
zen2han(s)
Arguments
s
A character vector. UTF-8 encoding is preferable.
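Because UTF-8 input is preferred, it can help to re-encode a vector before passing it to zen2han(). The following is a minimal base-R sketch (it does not use Nippon, and the variable x is illustrative only): enc2utf8() converts any natively encoded elements to UTF-8, and Encoding() shows the resulting declared encodings.

```r
# Sketch: make sure a character vector is UTF-8 before calling zen2han().
x <- c("abc", "\uFF41\uFF42\uFF43")  # halfwidth and fullwidth "abc"
x <- enc2utf8(x)                     # re-encode any natively encoded elements
Encoding(x)                          # "unknown" "UTF-8": pure ASCII is never marked
```

Pure ASCII strings stay unmarked ("unknown") because their bytes are identical in every supported encoding.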
Details
Japanese graphic characters are traditionally classified into fullwidth
(zenkaku) and halfwidth (hankaku) forms. Alphabets, numbers, and symbols can
take either form, while Hiragana and Kanji exist only as fullwidth
characters (Katakana also has a halfwidth form; see below). Mixing the two
forms of alphabets, numbers, and symbols causes trouble in string
manipulation such as matching and searching: for example, a fullwidth "１"
does not match a halfwidth "1". Character data should therefore be
sanitized with this function before such operations.
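The matching problem can be reproduced with base R alone. In this illustrative snippet (the variables half and full are hypothetical), the fullwidth string has the same number of characters as its halfwidth counterpart, yet neither an equality test nor a fixed-pattern search treats the two as the same text.

```r
# The same text in halfwidth and fullwidth forms.
half <- "ABC123"
full <- "\uFF21\uFF22\uFF23\uFF11\uFF12\uFF13"  # fullwidth "ABC123"

# Both strings are six characters long ...
nchar(half)                       # 6
nchar(full)                       # 6

# ... but they neither compare equal nor match the same pattern.
half == full                      # FALSE
grepl("123", half, fixed = TRUE)  # TRUE
grepl("123", full, fixed = TRUE)  # FALSE
```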
The targeted zenkaku characters are numbers, alphabets, punctuation
marks, and other special symbols. Katakana is not converted by zen2han
because halfwidth Katakana itself tends to cause trouble.
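The conversion can be sketched in base R, assuming the standard Unicode layout in which the fullwidth forms U+FF01 to U+FF5E lie exactly 0xFEE0 above ASCII U+0021 to U+007E, and the ideographic space U+3000 corresponds to an ordinary space. The helper to_halfwidth() is hypothetical and is not how Nippon implements zen2han(). Like zen2han(), it leaves Katakana (fullwidth and halfwidth) untouched, but it does not cover fullwidth symbols outside that block, such as the fullwidth yen sign U+FFE5.

```r
# Hypothetical helper (not part of Nippon): fullwidth -> halfwidth
# conversion for the ASCII-compatible block of Unicode.
to_halfwidth <- function(s) {
  s <- enc2utf8(as.character(s))
  vapply(s, function(x) {
    if (is.na(x)) return(NA_character_)
    cp <- utf8ToInt(x)
    # U+FF01..U+FF5E map to U+0021..U+007E by subtracting 0xFEE0.
    fw <- cp >= 0xFF01L & cp <= 0xFF5EL
    cp[fw] <- cp[fw] - 0xFEE0L
    # The ideographic space U+3000 becomes an ordinary space.
    cp[cp == 0x3000L] <- 0x20L
    # All other code points (Hiragana, Katakana, Kanji, ...) are kept as-is.
    intToUtf8(cp)
  }, character(1), USE.NAMES = FALSE)
}

to_halfwidth("\uFF21\uFF22\uFF23\u3000\uFF11\uFF12\uFF13")  # "ABC 123"
to_halfwidth("\u30AB\uFF76")  # unchanged: fullwidth and halfwidth Katakana "ka"
```

Working on code points rather than a character lookup table keeps the sketch short, at the cost of handling only one contiguous block of fullwidth forms.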
Value
A character vector in which all alphabets, numbers, and symbols are in their halfwidth form.
> library(Nippon)
Loading required package: maptools
Loading required package: sp
Checking rgeos availability: TRUE
> ### Name: zen2han
> ### Title: Convert Japanese characters from fullwidth (zenkaku) to
> ### halfwidth (hankaku) forms
> ### Aliases: zen2han
> ### Keywords: character Japanese language
>
> ### ** Examples
>
> zenkaku
$number
[1] "<U+FF10><U+FF11><U+FF12><U+FF13><U+FF14><U+FF15><U+FF16><U+FF17><U+FF18><U+FF19>"
$lower
[1] "<U+FF41><U+FF42><U+FF43><U+FF44><U+FF45><U+FF46><U+FF47><U+FF48><U+FF49><U+FF4A><U+FF4B><U+FF4C><U+FF4D><U+FF4E><U+FF4F><U+FF50><U+FF51><U+FF52><U+FF53><U+FF54><U+FF55><U+FF56><U+FF57><U+FF58><U+FF59><U+FF5A>"
$upper
[1] "<U+FF21><U+FF22><U+FF23><U+FF24><U+FF25><U+FF26><U+FF27><U+FF28><U+FF29><U+FF2A><U+FF2B><U+FF2C><U+FF2D><U+FF2E><U+FF2F><U+FF30><U+FF31><U+FF32><U+FF33><U+FF34><U+FF35><U+FF36><U+FF37><U+FF38><U+FF39><U+FF3A>"
> zen2han(as.character(zenkaku))
[1] "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
Warning message:
In if (Encoding(s) != "UTF-8") s <- iconv(s, from = "", to = "UTF-8") :
the condition has length > 1 and only the first element will be used