
Natural language detection library for Rust. Try the demo online: https://whatlang.org/


greyblake/whatlang-rs


Whatlang

Natural language detection for Rust with focus on simplicity and performance.

Try the online demo.



Features

  • Supports 69 languages
  • 100% written in Rust
  • Lightweight, fast and simple
  • Recognizes not only a language, but also a script (Latin, Cyrillic, etc.)
  • Provides reliability information
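Script recognition can be illustrated with a minimal sketch that counts letters per Unicode block and picks the script covering the most letters. This is a simplification under stated assumptions, not whatlang's implementation: the `SketchScript` enum, the helper names and the four hand-picked code point ranges are hypothetical, and the library itself distinguishes many more scripts.

```rust
/// Hypothetical script set for this sketch (not whatlang's `Script` enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchScript {
    Latin,
    Cyrillic,
    Greek,
    Arabic,
}

/// Same order as the enum discriminants, so `script as usize` indexes it.
const ALL_SCRIPTS: [SketchScript; 4] = [
    SketchScript::Latin,
    SketchScript::Cyrillic,
    SketchScript::Greek,
    SketchScript::Arabic,
];

/// Map a letter to a script using a few simplified Unicode block ranges.
fn script_of(c: char) -> Option<SketchScript> {
    if !c.is_alphabetic() {
        return None; // digits, punctuation and spaces carry no script signal
    }
    match c as u32 {
        0x0041..=0x005A | 0x0061..=0x007A | 0x00C0..=0x024F => Some(SketchScript::Latin),
        0x0370..=0x03FF => Some(SketchScript::Greek),
        0x0400..=0x04FF => Some(SketchScript::Cyrillic),
        0x0600..=0x06FF => Some(SketchScript::Arabic),
        _ => None,
    }
}

/// Return the script that covers the most letters, or None if no letter matched.
fn detect_script(text: &str) -> Option<SketchScript> {
    let mut counts = [0usize; 4];
    for script in text.chars().filter_map(script_of) {
        counts[script as usize] += 1;
    }
    let (best, &n) = counts.iter().enumerate().max_by_key(|&(_, n)| *n)?;
    if n == 0 {
        return None;
    }
    Some(ALL_SCRIPTS[best])
}
```

For example, `detect_script("Привет, мир")` yields `Some(SketchScript::Cyrillic)`. Detecting the script first is cheap and narrows the set of candidate languages before any trigram comparison happens.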

Get started

Example:

```rust
use whatlang::{detect, Lang, Script};

fn main() {
    let text = "Ĉu vi ne volas eklerni Esperanton? Bonvolu! Estas unu de la plej bonaj aferoj!";

    // detect() returns an Option: None means no language could be determined.
    let info = detect(text).unwrap();
    assert_eq!(info.lang(), Lang::Epo);
    assert_eq!(info.script(), Script::Latin);
    assert_eq!(info.confidence(), 1.0);
    assert!(info.is_reliable());
}
```

For more details (e.g. how to blacklist some languages), please check the documentation.

Who uses Whatlang?

Whatlang is used within the following big projects as a direct or indirect dependency for language recognition. You'll be in great company using Whatlang:

  • Sonic - fast, lightweight and schema-less search backend in Rust.
  • Meilisearch - an open-source, easy-to-use, blazingly fast, and hyper-relevant search engine built in Rust.

Feature toggles

| Feature | Description |
|---|---|
| `enum-map` | `Lang` and `Script` implement the `Enum` trait from `enum-map` |
| `arbitrary` | Support for `Arbitrary` |
| `serde` | Implements `Serialize` and `Deserialize` for `Lang` and `Script` |
| `dev` | Enables the `whatlang::dev` module, which provides some internal API. It exists for profiling purposes, and normal users are discouraged from relying on this API. |

How does it work?

How does the language recognition work?

The algorithm is based on trigram language models, a particular case of n-gram models. To understand the idea, please check the original paper by Cavnar and Trenkle (1994), "N-Gram-Based Text Categorization".
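To make the idea concrete, here is a minimal, std-only sketch of the Cavnar and Trenkle approach: build a frequency-ranked trigram profile for each language and for the input text, then pick the language with the smallest "out-of-place" distance between the two rankings. It is not whatlang's code; the normalization (lowercasing, collapsing non-letters to one space), the profile size `TOP_N = 300`, the fixed penalty for missing trigrams, and all function names are assumptions made for illustration.

```rust
use std::collections::HashMap;

// Assumed profile size and penalty for document trigrams absent from a language profile.
const TOP_N: usize = 300;
const MAX_PENALTY: usize = TOP_N;

/// Count character trigrams after lowercasing and replacing every run of
/// non-letters with a single space (the text is padded with spaces on both ends).
fn trigram_counts(text: &str) -> HashMap<[char; 3], usize> {
    let mut cleaned = String::from(" ");
    for c in text.chars() {
        if c.is_alphabetic() {
            cleaned.extend(c.to_lowercase());
        } else if !cleaned.ends_with(' ') {
            cleaned.push(' ');
        }
    }
    if !cleaned.ends_with(' ') {
        cleaned.push(' ');
    }
    let chars: Vec<char> = cleaned.chars().collect();
    let mut counts = HashMap::new();
    for w in chars.windows(3) {
        *counts.entry([w[0], w[1], w[2]]).or_insert(0) += 1;
    }
    counts
}

/// Rank trigrams by frequency (most frequent first, ties broken alphabetically).
fn ranked_profile(text: &str) -> Vec<[char; 3]> {
    let mut items: Vec<([char; 3], usize)> = trigram_counts(text).into_iter().collect();
    items.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    items.into_iter().take(TOP_N).map(|(t, _)| t).collect()
}

/// Out-of-place measure: sum of rank differences between the two profiles,
/// plus a fixed penalty for each document trigram the language profile lacks.
fn out_of_place(doc: &[[char; 3]], lang: &[[char; 3]]) -> usize {
    let lang_rank: HashMap<[char; 3], usize> =
        lang.iter().enumerate().map(|(i, &t)| (t, i)).collect();
    doc.iter()
        .enumerate()
        .map(|(i, t)| match lang_rank.get(t) {
            Some(&j) => i.abs_diff(j),
            None => MAX_PENALTY,
        })
        .sum()
}

/// Pick the language whose profile is closest to the text's profile.
fn classify<'a>(text: &str, profiles: &[(&'a str, Vec<[char; 3]>)]) -> Option<&'a str> {
    let doc = ranked_profile(text);
    profiles
        .iter()
        .min_by_key(|(_, profile)| out_of_place(&doc, profile))
        .map(|(name, _)| *name)
}
```

In practice each language profile would be built once from a large training corpus and stored; the one-sentence texts one might pass to `ranked_profile` in a quick experiment are far too small for real detection.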

How is `is_reliable` calculated?

It is based on the following factors:

  • How many unique trigrams are in the given text
  • How big the difference is between the scores of the first and the second (not returned) detected languages. This metric is called `rate` in the code base.

Therefore, the decision can be pictured as a 2D space with a threshold function that splits it into "Reliable" and "Not reliable" areas. This function is a hyperbola and looks like the following:

[Figure: hyperbolic threshold separating the "Reliable" and "Not reliable" areas in the space of unique trigram count vs. rate]
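The two factors and the hyperbolic threshold can be sketched in plain Rust as follows. This is not whatlang's code: the helper names, the convention that scores are non-negative with higher meaning a better match, and the constants `K` and `MIN_RATE` are assumptions chosen only to show the shape of the rule.

```rust
use std::collections::HashSet;

// Hypothetical constants for this sketch; whatlang tunes its own threshold
// experimentally, and its real values may differ.
const K: f64 = 12.0;
const MIN_RATE: f64 = 0.05;

/// Factor 1: number of distinct character trigrams (simplified: lowercase only).
fn unique_trigrams(text: &str) -> usize {
    let chars: Vec<char> = text.to_lowercase().chars().collect();
    chars.windows(3).collect::<HashSet<&[char]>>().len()
}

/// Factor 2 ("rate"): relative gap between the best and the runner-up score,
/// assuming non-negative scores where higher means a better match.
fn rate(scores: &[f64]) -> Option<f64> {
    let mut sorted = scores.to_vec();
    sorted.sort_by(|a, b| b.total_cmp(a)); // descending order
    match sorted.as_slice() {
        [best, second, ..] if *best > 0.0 => Some((best - second) / best),
        _ => None,
    }
}

/// Threshold curve rate = K / n + MIN_RATE: a hyperbola in (n, rate) space,
/// so the required gap shrinks as the number of unique trigrams n grows.
fn required_rate(trigram_count: usize) -> f64 {
    K / trigram_count as f64 + MIN_RATE
}

/// Points on or above the curve fall into the "Reliable" area.
fn is_reliable(trigram_count: usize, rate: f64) -> bool {
    trigram_count > 0 && rate >= required_rate(trigram_count)
}
```

With these placeholder constants, a text with 100 unique trigrams needs a rate of at least 0.17, while a text with only 10 would need 1.25, which no rate can reach. That matches the intuition behind the curve: short inputs must show a very clear winner before the result is trusted.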

For more details, please check the blog article "Introduction to Rust Whatlang Library and Natural Language Identification Algorithms".

Make tasks

  • make bench - run performance benchmarks
  • make doc - generate and open doc
  • make test - run tests
  • make watch - watch changes and run tests

Comparison with alternatives

| | Whatlang | CLD2 | CLD3 |
|---|---|---|---|
| Implementation language | Rust | C++ | C++ |
| Languages | 68 | 83 | 107 |
| Algorithm | trigrams | quadgrams | neural network |
| Supported encoding | UTF-8 | UTF-8 | ? |
| HTML support | no | yes | ? |

Ports and clones

Donations

You can support the project by donating NEAR tokens.

Our NEAR wallet address is whatlang.near

Derivation

Whatlang is a derivative work from Franc (JavaScript, MIT) by Titus Wormer.

License

MIT ©Sergey Potapov


