A Rust library to convert Japanese Half-width-kana[半角カナ] and Wide-alphanumeric[全角英数] into normal ones

gemmarx/unicode-jp-rs

Build Status | crates.io | MIT licensed

Converters for troublesome characters found in Japanese text.

  • Half-width-kana[半角カナ;HANKAKU KANA] -> normal Katakana
  • Wide-alphanumeric[全角英数;ZENKAKU EISU] <-> normal ASCII

If you need to canonicalize text that includes Japanese, consider using the unicode_normalization crate first; it provides NFD, NFKD, NFC and NFKC. This crate is for niche cases, such as needing fine-grained control over Japanese characters for a restrictive character terminal.

Japanese has two syllabaries, Hiragana and Katakana, and Half-width-kana is another notation for them. These systems use two combinable diacritical marks, the Voiced-sound-mark and the Semi-voiced-sound-mark. Unicode has three independent code points for each of the marks. In addition, Japanese text often uses specially styled Latin letters and Arabic numerals called Wide-alphanumeric. This small utility converts between these codes.
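To make the "three code points for each mark" concrete, here is a minimal std-only sketch. It is not code from this crate: the names `Mark`, `Form` and `classify_mark` are hypothetical, and the six code points are the combining, full-width and half-width forms defined by Unicode.

```rust
/// Which diacritical mark a code point represents.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Mark {
    Voiced,
    SemiVoiced,
}

/// Which of the three Unicode encodings of the mark was used.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Form {
    Combining, // attaches to the preceding character
    FullWidth, // stand-alone, full-width spacing mark
    HalfWidth, // stand-alone, half-width mark used with Half-width-kana
}

/// Classify a character as one of the six mark code points, if it is one.
/// (Hypothetical helper for illustration only.)
fn classify_mark(c: char) -> Option<(Mark, Form)> {
    match c {
        '\u{3099}' => Some((Mark::Voiced, Form::Combining)),
        '\u{309B}' => Some((Mark::Voiced, Form::FullWidth)),
        '\u{FF9E}' => Some((Mark::Voiced, Form::HalfWidth)),
        '\u{309A}' => Some((Mark::SemiVoiced, Form::Combining)),
        '\u{309C}' => Some((Mark::SemiVoiced, Form::FullWidth)),
        '\u{FF9F}' => Some((Mark::SemiVoiced, Form::HalfWidth)),
        _ => None,
    }
}
```

For instance, `classify_mark('\u{FF9E}')` reports a voiced mark in half-width form, while any ordinary kana yields `None`.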

API Reference

Example

Cargo.toml

    [dependencies]
    unicode-jp = "0.4.0"

src/main.rs

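    extern crate kana;
    use kana::*;

    fn main() {
        // Half-width-kana input
        let s1 = "ﾏﾂｵ ﾊﾞｼｮｳ ｱﾟ";
        // marks combined into the kana where a precomposed form exists
        assert_eq!("マツオ バショウ ア ゚", half2kana(s1));
        // marks kept as separate code points after the base kana
        assert_eq!("マツオ バショウ ア゚", half2full(s1));

        // Hiragana followed by stand-alone sound marks
        let s2 = "ひ゜ひ゛んは゛";
        assert_eq!("ぴびんば", combine(s2));
        assert_eq!("ひ ゚ひ ゙んは ゙", vsmark2combi(s2));

        // Wide-alphanumeric input
        let s3 = "＃＆Ｒｕｓｔ－１．６！";
        assert_eq!("#&Rust-1.6!", wide2ascii(s3));
    }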
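The `combine` call in the example folds a base kana and a following sound mark into one precomposed character. The sketch below illustrates that idea with the standard library only. It is an assumption-laden illustration, not the crate's implementation: `combine_sketch`, `VOICEABLE` and `SEMI_VOICEABLE` are hypothetical names, and it covers only the regular rows, where the voiced form sits one code point after the base and the semi-voiced form two code points after it. Irregular cases such as う/ウ are deliberately left out.

```rust
/// Kana whose precomposed voiced form is the next code point.
const VOICEABLE: &str =
    "かきくけこさしすせそたちつてとはひふへほカキクケコサシスセソタチツテトハヒフヘホ";
/// Kana whose precomposed semi-voiced form is two code points later.
const SEMI_VOICEABLE: &str = "はひふへほハヒフヘホ";

/// Fold "base kana + sound mark" pairs into precomposed characters.
/// Marks that cannot be folded are copied through unchanged.
fn combine_sketch(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut prev: Option<char> = None; // last character not yet written
    for c in s.chars() {
        // Offset from base kana to precomposed form, if `c` is a sound mark
        // (combining, full-width or half-width form).
        let offset = match c {
            '\u{3099}' | '\u{309B}' | '\u{FF9E}' => Some(1),
            '\u{309A}' | '\u{309C}' | '\u{FF9F}' => Some(2),
            _ => None,
        };
        let folded = match (prev, offset) {
            (Some(b), Some(1)) if VOICEABLE.contains(b) => char::from_u32(b as u32 + 1),
            (Some(b), Some(2)) if SEMI_VOICEABLE.contains(b) => char::from_u32(b as u32 + 2),
            _ => None,
        };
        match folded {
            Some(k) => {
                out.push(k);
                prev = None; // the pair is consumed
            }
            None => {
                if let Some(b) = prev {
                    out.push(b);
                }
                prev = Some(c);
            }
        }
    }
    if let Some(b) = prev {
        out.push(b);
    }
    out
}
```

A lookup table would be the more robust choice for full coverage; the offset trick is used here only because it keeps the sketch short.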

Functions of kana crate:

  • wide2ascii(&str) -> String
    convert Wide-alphanumeric into normal ASCII [Ａ -> A]

  • ascii2wide(&str) -> String
    convert normal ASCII characters into Wide-alphanumeric [A -> Ａ]

  • half2full(&str) -> String
    convert Half-width-kana into normal Katakana with diacritical marks separated [ｱﾞﾊﾟ -> ア゙パ]
    This method is simple but tends to cause trouble when rendering. In that case, use half2kana() or run vsmark2{full|half|combi} as a post-process.

  • half2kana(&str) -> String
    convert Half-width-kana into normal Katakana with diacritical marks combined [ｱﾞﾊﾟ -> ア゙パ]

  • combine(&str) -> String
    combine base characters and diacritical marks on Hiragana/Katakana [がハ゜ -> がパ]

  • hira2kata(&str) -> String
    convert Hiragana into Katakana [あ -> ア]

  • kata2hira(&str) -> String
    convert Katakana into Hiragana [ア -> あ]

  • vsmark2full(&str) -> String
    convert all separated Voiced-sound-marks into full-width style "\u{309B}"

  • vsmark2half(&str) -> String
    convert all separated Voiced-sound-marks into half-width style "\u{FF9E}"

  • vsmark2combi(&str) -> String
    convert all separated Voiced-sound-marks into space+combining style "\u{20}\u{3099}"

  • nowidespace(&str) -> String
    convert Wide-space into normal space ["　" -> " "]

  • space2wide(&str) -> String
    convert normal space into Wide-space [" " -> "　"]

  • nowideyen(&str) -> String
    convert Wide-yen into Half-width-yen ["￥" -> "¥"]

  • yen2wide(&str) -> String
    convert Half-width-yen into Wide-yen ["¥" -> "￥"]
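Several of the conversions listed above can be understood as shifting a code point by a fixed distance. The following std-only sketch shows that idea for the Wide-alphanumeric/ASCII and Hiragana/Katakana pairs. It is an illustration under stated assumptions, not the crate's code: `shift`, `wide_to_ascii`, `ascii_to_wide`, `hira_to_kata` and `kata_to_hira` are hypothetical names, and the ranges used (U+FF01..=U+FF5E for full-width forms, U+3041..=U+3096 for Hiragana) may differ from the set of characters the crate actually handles.

```rust
use std::ops::RangeInclusive;

/// Distance between a full-width form (U+FF01..=U+FF5E) and its ASCII
/// counterpart (U+0021..=U+007E).
const WIDE_OFFSET: i64 = 0xFEE0;
/// Distance between a Hiragana letter (U+3041..=U+3096) and the matching
/// Katakana letter (U+30A1..=U+30F6).
const KANA_OFFSET: i64 = 0x60;

/// Shift every character inside `range` by `delta` code points and leave
/// all other characters untouched.
fn shift(s: &str, range: RangeInclusive<char>, delta: i64) -> String {
    s.chars()
        .map(|c| {
            if range.contains(&c) {
                // Fall back to the original character if the result
                // is not a valid Unicode scalar value.
                char::from_u32((c as u32 as i64 + delta) as u32).unwrap_or(c)
            } else {
                c
            }
        })
        .collect()
}

fn wide_to_ascii(s: &str) -> String {
    shift(s, '\u{FF01}'..='\u{FF5E}', -WIDE_OFFSET)
}

fn ascii_to_wide(s: &str) -> String {
    shift(s, '!'..='~', WIDE_OFFSET)
}

fn hira_to_kata(s: &str) -> String {
    shift(s, '\u{3041}'..='\u{3096}', KANA_OFFSET)
}

fn kata_to_hira(s: &str) -> String {
    shift(s, '\u{30A1}'..='\u{30F6}', -KANA_OFFSET)
}
```

Spaces fall outside both ASCII ranges used here, so they pass through unchanged, which matches the list above treating Wide-space as a separate conversion.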

TODO or NOT TODO

  • Voiced-sound-marks -> no space combining style "\u{3099}"
  • Half-width-kana <- normal Katakana
  • (normal/wide)tilde <-> Wave-dash
