Last data update: 2014.03.03

zen2han    R Documentation

Convert Japanese characters from fullwidth (zenkaku) to halfwidth (hankaku) forms

Description

This function converts Japanese characters from fullwidth (zenkaku) to halfwidth (hankaku) forms to avoid problems in Japanese string operations.

Usage

zen2han(s)

Arguments

s

A character vector. UTF-8 encoding is preferable.

Details

Japanese graphic characters are traditionally classed into fullwidth (zenkaku) and halfwidth (hankaku) forms. Letters, digits, and symbols can take either form, while Hiragana and Kanji are available only as fullwidth characters (Katakana also has a halfwidth form, which this function does not produce). Mixing the two forms of letters, digits, and symbols causes trouble in string manipulation such as matching or searching, because visually similar characters do not compare as equal. Character data should therefore be sanitized with this function before such operations.

The targeted zenkaku characters are digits, letters, punctuation marks, and other special symbols. Katakana is not converted by zen2han, because halfwidth Katakana tends to cause problems of its own.
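
As a rough illustration of the conversion described above, the following base R sketch shifts code points in the Unicode fullwidth range U+FF01 to U+FF5E down by 0xFEE0 to their ASCII counterparts, and maps the ideographic space U+3000 to an ordinary space. The name zen2han_sketch is hypothetical, and this is not the Nippon implementation. It only assumes input that can be converted to UTF-8, and it leaves Katakana and all other characters untouched.

```r
# Hypothetical helper for illustration only; not the code used by Nippon.
zen2han_sketch <- function(s) {
  s <- enc2utf8(as.character(s))
  vapply(s, function(x) {
    if (is.na(x)) return(NA_character_)
    cp <- utf8ToInt(x)                      # string -> integer code points
    fw <- cp >= 0xFF01L & cp <= 0xFF5EL     # fullwidth ASCII variants
    cp[fw] <- cp[fw] - 0xFEE0L              # shift down to U+0021..U+007E
    cp[cp == 0x3000L] <- 0x20L              # ideographic space -> space
    intToUtf8(cp)                           # code points -> single string
  }, character(1), USE.NAMES = FALSE)
}

# Fullwidth "ABC123" written with \u escapes so the source stays ASCII.
zen2han_sketch("\uff21\uff22\uff23\uff11\uff12\uff13")   # "ABC123"
```

Because each element is processed separately, the result has the same length as the input vector.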

Value

A character vector in which all letters, digits, and symbols are in their halfwidth form.

Author(s)

Susumu Tanimura aruminat@gmail.com

References

Halfwidth and Fullwidth Forms http://www.alanwood.net/unicode/halfwidth_and_fullwidth_forms.html

See Also

showNonASCII

Examples

zenkaku
zen2han(as.character(zenkaku))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(Nippon)
Loading required package: maptools
Loading required package: sp
Checking rgeos availability: TRUE
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/Nippon/zen2han.Rd_%03d_medium.png", width=480, height=480)
> ### Name: zen2han
> ### Title: Convert Japanese characters from fullwidth (zenkaku) to
> ###   halfwidth (hankaku) forms
> ### Aliases: zen2han
> ### Keywords: character Japanese language
> 
> ### ** Examples
> 
> zenkaku
$number
[1] "<U+FF10><U+FF11><U+FF12><U+FF13><U+FF14><U+FF15><U+FF16><U+FF17><U+FF18><U+FF19>"

$lower
[1] "<U+FF41><U+FF42><U+FF43><U+FF44><U+FF45><U+FF46><U+FF47><U+FF48><U+FF49><U+FF4A><U+FF4B><U+FF4C><U+FF4D><U+FF4E><U+FF4F><U+FF50><U+FF51><U+FF52><U+FF53><U+FF54><U+FF55><U+FF56><U+FF57><U+FF58><U+FF59><U+FF5A>"

$upper
[1] "<U+FF21><U+FF22><U+FF23><U+FF24><U+FF25><U+FF26><U+FF27><U+FF28><U+FF29><U+FF2A><U+FF2B><U+FF2C><U+FF2D><U+FF2E><U+FF2F><U+FF30><U+FF31><U+FF32><U+FF33><U+FF34><U+FF35><U+FF36><U+FF37><U+FF38><U+FF39><U+FF3A>"

> zen2han(as.character(zenkaku))
[1] "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
Warning message:
In if (Encoding(s) != "UTF-8") s <- iconv(s, from = "", to = "UTF-8") :
  the condition has length > 1 and only the first element will be used
> 
> dev.off()
null device 
          1 
>
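
The warning in the transcript above arises because Encoding(s) returns one value per element, so the if() condition has length 3 when a three-element vector is passed. In R 4.2.0 and later, a condition of length greater than one is an error rather than a warning. A minimal sketch of a vector-safe check follows; to_utf8_sketch is a hypothetical name, and enc2utf8() is used here in place of the iconv() call shown in the warning.

```r
# Hypothetical helper for illustration only; not part of Nippon.
to_utf8_sketch <- function(s) {
  s <- as.character(s)
  # any() collapses the per-element comparison to a single TRUE/FALSE,
  # so the if() condition always has length one.
  if (any(Encoding(s) != "UTF-8")) s <- enc2utf8(s)
  s
}

x <- to_utf8_sketch(c("abc", "\uff21\uff22"))
length(x)   # same length as the input
```

Passing one string at a time to the original function would be another way to keep the condition at length one.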