A simple binding of ICU character set detection for Node.js


mooz/node-icu-charset-detector


Character set detection is the process of determining the character set, or encoding, of character data in an unknown format.

A simple binding of ICU character set detection (http://userguide.icu-project.org/conversion/detection) for Node.js.

Installation

First, install libicu on your system (see the "Installing ICU" section below for details).

Then install node-icu-charset-detector from npm.

npm install node-icu-charset-detector

Installing ICU

Linux

  • Debian (Ubuntu)

    apt-get install libicu-dev

  • Gentoo

    emerge icu

  • Fedora/CentOS

    yum install libicu-devel

OSX

  • MacPorts

    port install icu +devel

  • Homebrew

    brew install icu4c
    brew link icu4c --force

If you run into problems with Homebrew installing version 50.1 of icu4c, try the following:

    brew search icu4c
    brew tap homebrew/versions
    brew versions icu4c
    cd $(brew --prefix) && git pull --rebase
    git checkout c25fd2f $(brew --prefix)/Library/Formula/icu4c.rb
    brew install icu4c

  • From source

    curl -O http://download.icu-project.org/files/icu4c/52.1/icu4c-52_1-src.tgz
    tar xzvf icu4c-52_1-src.tgz
    cd icu/source
    chmod +x runConfigureICU configure install-sh
    ./runConfigureICU MacOSX
    make
    sudo make install
    xcode-select --install

Usage

Simple usage

node-icu-charset-detector provides a function detectCharset(buffer), where buffer is an instance of Buffer whose charset should be detected.

    var fs = require("fs");
    var charsetDetector = require("node-icu-charset-detector");

    var buffer = fs.readFileSync("/path/to/the/file");
    var charset = charsetDetector.detectCharset(buffer);

    console.log("charset name: " + charset.toString());
    console.log("language: " + charset.language);
    console.log("detection confidence: " + charset.confidence);

detectCharset(buffer) returns the detected charset name for buffer. The returned charset name has two extra properties, language and confidence:

  • charset.language
    • language name for the detected character set.
  • charset.confidence
    • confidence of the charset detection for charset.
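
As a rough sketch of how the confidence property might be used, the helper below falls back to a default charset when detection confidence is low. The name chooseCharset, its parameters, and the threshold are illustrative and not part of this module. The sketch also assumes that confidence is reported on ICU's 0 to 100 scale, which this README does not state.

```javascript
// Hypothetical helper, not part of node-icu-charset-detector.
// `detected` is the value returned by detectCharset(buffer): it converts to
// the charset name via toString() and carries a `confidence` property.
// Assumption: confidence uses ICU's 0-100 scale.
function chooseCharset(detected, minConfidence, fallback) {
  // Guard against a missing result or a result without a numeric confidence.
  if (!detected || typeof detected.confidence !== "number") {
    return fallback;
  }
  // Treat low-confidence guesses as "unknown" and use the fallback instead.
  if (detected.confidence < minConfidence) {
    return fallback;
  }
  return detected.toString();
}
```

With the detector loaded, a call such as chooseCharset(charsetDetector.detectCharset(buffer), 50, "UTF-8") would return the detected name only when the confidence is at least 50. A suitable threshold depends on the input data.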

Leveraging node-iconv

This module only detects character sets; it does not convert them. For conversion you may need node-iconv (https://github.com/bnoordhuis/node-iconv), which supports converting between a wide range of character sets.

Here is a simple example that uses node-iconv to convert character sets that Node itself does not support.

    var fs = require("fs");

    function bufferToString(buffer) {
      var charsetDetector = require("node-icu-charset-detector");
      var charset = charsetDetector.detectCharset(buffer).toString();

      try {
        // Works when Node itself supports the detected charset.
        return buffer.toString(charset);
      } catch (x) {
        // Otherwise, convert to UTF-8 with node-iconv.
        var Iconv = require("iconv").Iconv;
        var charsetConverter = new Iconv(charset, "utf8");
        return charsetConverter.convert(buffer).toString();
      }
    }

    var buffer = fs.readFileSync("/path/to/the/file");
    var bufferString = bufferToString(buffer);
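
The same fallback can be written without relying on an exception. This is a minimal sketch, not the module's API: it asks Node up front whether it knows the encoding, using Buffer.isEncoding, and only then hands the buffer to a converter. The names decodeWith and convert are hypothetical. The converter is passed in as a function so that the sketch runs without any native dependency.

```javascript
// Hypothetical variant of bufferToString, not part of this module.
// `convert` is any function (buffer, charsetName) -> Buffer of UTF-8 bytes,
// for example a thin wrapper around node-iconv. It is only called for
// charsets that Node cannot decode by itself.
function decodeWith(buffer, charsetName, convert) {
  if (Buffer.isEncoding(charsetName)) {
    // Node knows this encoding (for example "UTF-8" or "ascii").
    return buffer.toString(charsetName);
  }
  // Unknown to Node: delegate to the supplied converter.
  return convert(buffer, charsetName).toString("utf8");
}
```

With node-iconv installed, convert could be a function that builds new Iconv(charsetName, "utf8") and returns the result of its convert(buffer) call, as in the example above.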
