mooz/node-icu-charset-detector
A simple binding of ICU character set detection for Node.js
Character set detection is the process of determining the character set, or encoding, of character data in an unknown format.
A simple binding of ICU character set detection (http://userguide.icu-project.org/conversion/detection) for Node.js.
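To illustrate why detection is needed at all, here is a small sketch that uses only Node's built-in Buffer (no ICU required). The same three bytes decode to completely different text depending on which encoding is assumed, and nothing in the bytes themselves says which reading is correct.

```javascript
// The same three bytes, interpreted under two different encodings.
// Without knowing which encoding produced them, there is no way to tell
// which reading is correct; that is the problem charset detection addresses.
const bytes = Buffer.from([0xe3, 0x81, 0x82]);

const asUtf8 = bytes.toString("utf8");     // "あ" (U+3042 HIRAGANA LETTER A)
const asLatin1 = bytes.toString("latin1"); // "ã" followed by two C1 control characters

console.log(asUtf8, asUtf8.length); // prints: あ 1
console.log(asLatin1.length);       // prints: 3
```

A detector such as ICU's inspects byte patterns like these statistically and reports the most likely encoding together with a confidence score.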
First, install libicu on your system (see the instructions for your platform below).
Then install node-icu-charset-detector from npm:
npm install node-icu-charset-detector
Debian (Ubuntu)
apt-get install libicu-dev
Gentoo
emerge icu
Fedora/CentOS
yum install libicu-devel
MacPorts
port install icu +devel
Homebrew
brew install icu4c
brew link icu4c --force
If Homebrew has trouble installing version 50.1 of icu4c, try the following:
brew search icu4c
brew tap homebrew/versions
brew versions icu4c
cd $(brew --prefix) && git pull --rebase
git checkout c25fd2f $(brew --prefix)/Library/Formula/icu4c.rb
brew install icu4c
From source
curl -O http://download.icu-project.org/files/icu4c/52.1/icu4c-52_1-src.tgz
tar xzvf icu4c-52_1-src.tgz
cd icu/source
chmod +x runConfigureICU configure install-sh
./runConfigureICU MacOSX
make
sudo make install
xcode-select --install
node-icu-charset-detector provides a function detectCharset(buffer), where buffer is an instance of Buffer whose charset should be detected.
```javascript
var fs = require("fs");
var charsetDetector = require("node-icu-charset-detector");

var buffer = fs.readFileSync("/path/to/the/file");
var charset = charsetDetector.detectCharset(buffer);

console.log("charset name: " + charset.toString());
console.log("language: " + charset.language);
console.log("detection confidence: " + charset.confidence);
```
detectCharset(buffer) returns the detected charset name for buffer. The returned charset name carries two extra properties, language and confidence:
- charset.language: the language name for the detected character set.
- charset.confidence: the confidence of the detection of charset (ICU reports confidence as a value from 0 to 100).
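A common way to use confidence is to trust the detected charset only above some threshold. The sketch below assumes, as the usage example above suggests, that the result behaves like a String object with extra language and confidence properties. Because the real module needs libicu, the sketch builds a stand-in result with a hypothetical fakeDetectionResult helper; chooseCharset and the threshold of 50 are likewise illustrative choices, not part of the module's API.

```javascript
// Hypothetical stand-in for a detectCharset() result, so this sketch runs
// without libicu: a String object carrying the two documented extra properties.
function fakeDetectionResult(name, language, confidence) {
  const result = new String(name);
  result.language = language;
  result.confidence = confidence;
  return result;
}

// Hypothetical helper: accept the detected charset only when the detector is
// reasonably sure, otherwise fall back to a caller-chosen default.
// ICU reports confidence on a 0-100 scale; the threshold is arbitrary.
function chooseCharset(result, minConfidence, fallback) {
  if (result.confidence >= minConfidence) {
    return result.toString();
  }
  return fallback;
}

const confident = fakeDetectionResult("Shift_JIS", "ja", 100);
const unsure = fakeDetectionResult("ISO-8859-1", "en", 20);

console.log(chooseCharset(confident, 50, "UTF-8")); // prints: Shift_JIS
console.log(chooseCharset(unsure, 50, "UTF-8"));    // prints: UTF-8
```

With the real module, the stand-in would be replaced by charsetDetector.detectCharset(buffer). A suitable threshold depends on your input, since short or mostly-ASCII data tends to produce low confidence scores.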
Since this module exposes only ICU's detection API and not its conversion API, you may need to use node-iconv (https://github.com/bnoordhuis/node-iconv), which provides powerful character set conversion.
Here is a simple example that uses node-iconv to convert character sets not supported by Node itself.
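```javascript
var fs = require("fs");
var charsetDetector = require("node-icu-charset-detector");

// Decode a buffer using Node's built-in decoders when they support the
// detected charset, falling back to node-iconv otherwise.
function bufferToString(buffer) {
  var charset = charsetDetector.detectCharset(buffer).toString();

  try {
    // Throws for encodings Node does not know (e.g. Shift_JIS).
    return buffer.toString(charset);
  } catch (x) {
    var Iconv = require("iconv").Iconv;
    var charsetConverter = new Iconv(charset, "UTF-8");
    return charsetConverter.convert(buffer).toString();
  }
}

var buffer = fs.readFileSync("/path/to/the/file");
var bufferString = bufferToString(buffer);
```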
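The example above discovers unsupported encodings by catching the error thrown by buffer.toString(). An alternative sketch is to check up front with Node's built-in Buffer.isEncoding(), which matches Node's encoding names case-insensitively. The helper name needsExternalConverter is hypothetical, and the charset names in the loop are examples of the kind ICU reports, chosen for illustration.

```javascript
// Hypothetical helper: decide, before decoding, whether Node's Buffer can
// handle a charset name natively or whether a converter such as node-iconv
// is needed. Buffer.isEncoding() is a built-in Node API.
function needsExternalConverter(charsetName) {
  return !Buffer.isEncoding(charsetName);
}

for (const name of ["UTF-8", "UTF-16LE", "ISO-8859-1", "Shift_JIS"]) {
  console.log(name, needsExternalConverter(name) ? "iconv" : "Buffer");
}
// prints:
// UTF-8 Buffer
// UTF-16LE Buffer
// ISO-8859-1 iconv
// Shift_JIS iconv
```

Buffer.isEncoding() only recognizes Node's own spellings, so "ISO-8859-1" is reported as unsupported even though Node's "latin1" decoder implements that same byte-to-character mapping; a small alias table could avoid loading node-iconv for such cases.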