R Wrapper for Google’s Compact Language Detector 3
Google’s Compact Language Detector 3 is a neural network model for language identification and the successor of CLD2 (available from CRAN). This version is still experimental and uses a novel algorithm with different properties and outcomes. For more information see: https://github.com/google/cld3#readme
Example
The function detect_language() is vectorised and guesses the language of each string in text, or returns NA if the language could not be reliably determined.
> library(cld3)
> example(cld3)

cld3> # Vectorized best guess
cld3> detect_language(c("To be or not to be?", "Ce n'est pas grave.", "猿も木から落ちる"))
[1] "en" "fr" "ja"
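Because detect_language() can return NA, downstream code usually has to decide what to do with undetermined strings. The following is a minimal sketch of one way to handle them, assuming the cld3 package is installed. The "unknown" label and the variable names are illustrative choices, not part of the package, and whether the empty string actually yields NA is not guaranteed here.

```r
library(cld3)

# Input texts; the empty string is included only as a candidate that
# may not be reliably detectable (an assumption, not a documented fact).
texts <- c("To be or not to be?", "Ce n'est pas grave.", "猿も木から落ちる", "")

# One guess per element of `texts`; undetermined entries come back as NA.
langs <- detect_language(texts)

# Replace NA with an explicit placeholder label ("unknown" is a hypothetical choice).
langs_filled <- ifelse(is.na(langs), "unknown", langs)

# Count how many strings were assigned to each language code.
print(table(langs_filled))
```

Keeping the original langs vector alongside the filled one makes it possible to tell real detections apart from the placeholder later on.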
The function detect_language_multi() is not vectorised and detects all languages inside the entire character vector as a whole.
Installation
Binary packages for OS-X or Windows can be installed directly from CRAN:
install.packages("cld3")
Installation from source on Linux or OS-X requires Google’s Protocol Buffers library. On Debian or Ubuntu install libprotobuf-dev and protobuf-compiler:
sudo apt-get install -y libprotobuf-dev protobuf-compiler
On Fedora we need protobuf-devel:
sudo yum install protobuf-devel
On CentOS / RHEL we install [protobuf-devel](https://src.fedoraproject.org/rpms/protobuf) via EPEL:
sudo yum install epel-release
sudo yum install protobuf-devel
On OS-X use protobuf from Homebrew:
brew install protobuf