Developer friendly Natural Language Processing ✨
WinkNLP is a JavaScript library for Natural Language Processing (NLP). Designed specifically to make development of NLP applications easier and faster, winkNLP is optimized for the right balance of performance and accuracy.
Its word embedding support unlocks deeper text analysis. Represent words and text as numerical vectors with ease, bringing higher accuracy in tasks like semantic similarity, text classification, and beyond – even within a browser.
It is built ground up with no external dependency and has a lean code base of ~10Kb minified & gzipped. A test coverage of ~100% and compliance with the Open Source Security Foundation best practices make winkNLP the ideal tool for building production grade systems with confidence.
WinkNLP, with full TypeScript support, runs on Node.js, web browsers and Deno.
| Wikipedia article timeline | Context aware word cloud | Key sentences detection |
| --- | --- | --- |
Head to live examples to explore further.
WinkNLP can easily process large amounts of raw text at speeds over 650,000 tokens/second on an M1 MacBook Pro in both browser and Node.js environments. It even runs smoothly on a low-end smartphone's browser.
| Environment | Benchmarking Command |
| --- | --- |
| Node.js | `node benchmark/run` |
| Browser | How to measure winkNLP's speed on browsers? |
WinkNLP has a comprehensive natural language processing (NLP) pipeline covering tokenization, sentence boundary detection (sbd), negation handling, sentiment analysis, part-of-speech (pos) tagging, named entity recognition (ner), and custom entities recognition (cer). It offers a rich feature set:
| Feature | Description |
| --- | --- |
| 🐎 Fast, lossless & multilingual tokenizer | For example, the multilingual text string `"¡Hola! नमस्कार! Hi! Bonjour chéri"` is tokenized as `["¡", "Hola", "!", "नमस्कार", "!", "Hi", "!", "Bonjour", "chéri"]`. The tokenizer processes text at a speed close to 4 million tokens/second on an M1 MBP's browser. |
| ✨ Developer friendly and intuitive API | With winkNLP, process any text using a simple, declarative syntax; most live examples have 30-40 lines of code. |
| 🖼 Best-in-class text visualization | Programmatically mark tokens, sentences, entities, etc. using HTML `mark` or any other tag of your choice. |
| ♻️ Extensive text processing features | Remove and/or retain tokens with specific attributes such as part-of-speech, named entity type, token type, stop word, shape and many more; compute Flesch reading ease score; generate n-grams; normalize, lemmatise or stem. Check out how, with the right kind of text preprocessing, even a Naive Bayes classifier achieves impressive (≥90%) accuracy in sentiment analysis and chatbot intent classification tasks. |
| 🔠 Pre-trained language models | Compact sizes starting from ~1MB (minified & gzipped) – reduce model loading time drastically, down to ~1 second on a 4G network. |
| Word vectors | 100-dimensional English word embeddings for over 350K English words, optimized for winkNLP. Allows easy computation of sentence or document embeddings. |
- BM25 Vectorizer
- Similarity methods – Cosine, Tversky, Sørensen-Dice, Otsuka-Ochiai
- `its` & `as` helpers to get Bag of Words, Frequency table, Lemma, Stem, Stop word removal, Negation handling and many more.
- Concepts — everything you need to know to get started.
- API Reference — explains usage of APIs with examples.
- Change log — version history along with the details of breaking changes, if any.
- Examples — live examples with code to give you a head start.
Use `npm install`:

```shell
npm install wink-nlp --save
```
In order to use winkNLP after its installation, you also need to install a language model according to the Node.js version used. The table below outlines the version specific installation commands:
| Node.js Version | Installation |
| --- | --- |
| 16 or 18 | `npm install wink-eng-lite-web-model --save` |
| 14 or 12 | `node -e "require('wink-nlp/models/install')"` |
The `wink-eng-lite-web-model` is designed to work with Node.js version 16 or 18. It can also work on browsers, as described in the next section. This is the recommended model.
The second command installs the `wink-eng-lite-model`, which works with Node.js version 14 or 12.
Enable `esModuleInterop` and `allowSyntheticDefaultImports` in the `tsconfig.json` file:

```json
"compilerOptions": {
  "esModuleInterop": true,
  "allowSyntheticDefaultImports": true,
  ...
}
```
If you're using winkNLP in the browser, use the `wink-eng-lite-web-model`. Learn about its installation and usage in our guide to using winkNLP in the browser. Explore winkNLP recipes on Observable for live browser based examples.
How to run on Deno
Follow the example on replit.
Here is the "Hello World!" of winkNLP:
```javascript
// Load wink-nlp package.
const winkNLP = require('wink-nlp');
// Load english language model.
const model = require('wink-eng-lite-web-model');
// Instantiate winkNLP.
const nlp = winkNLP(model);
// Obtain "its" helper to extract item properties.
const its = nlp.its;
// Obtain "as" reducer helper to reduce a collection.
const as = nlp.as;

// NLP Code.
const text = 'Hello World🌎! How are you?';
const doc = nlp.readDoc(text);

console.log(doc.out());
// -> Hello World🌎! How are you?

console.log(doc.sentences().out());
// -> [ 'Hello World🌎!', 'How are you?' ]

console.log(doc.entities().out(its.detail));
// -> [ { value: '🌎', type: 'EMOJI' } ]

console.log(doc.tokens().out());
// -> [ 'Hello', 'World', '🌎', '!', 'How', 'are', 'you', '?' ]

console.log(doc.tokens().out(its.type, as.freqTable));
// -> [ [ 'word', 5 ], [ 'punctuation', 2 ], [ 'emoji', 1 ] ]
```
Experiment with winkNLP on RunKit.
WinkNLP processes raw text at ~650,000 tokens per second with its `wink-eng-lite-web-model`, when benchmarked using "Ch 13 of Ulysses by James Joyce" on an M1 MacBook Pro with 16GB RAM. The processing included the entire NLP pipeline — tokenization, sentence boundary detection, negation handling, sentiment analysis, part-of-speech tagging, and named entity extraction. This speed is way ahead of the prevailing speed benchmarks.
The benchmark was conducted on Node.js versions 16 and 18.
It pos tags a subset of the WSJ corpus with an accuracy of ~95% — this includes tokenization of raw text prior to pos tagging. The present state-of-the-art is at ~97% accuracy but at lower speeds, and is generally computed using a gold standard pre-tokenized corpus.
Its general purpose sentiment analysis delivers an f-score of ~84.5%, when validated using the Amazon Product Review Sentiment Labelled Sentences Data Set at the UCI Machine Learning Repository. The current benchmark accuracy for specifically trained models can range around 95%.
Wink NLP delivers this performance with minimal load on RAM. For example, it processes the entire History of India Volume I with a total peak memory requirement of under 80MB. The book has around 350 pages, which translates to over 125,000 tokens.
Please ask at Stack Overflow, discuss at Wink JS GitHub Discussions, or chat with us at the Wink JS Gitter Lobby.
If you spot a bug and it has not yet been reported, raise a new issue or consider fixing it and sending a PR.
Looking for a new feature? Request it via the new features & ideas discussion forum or consider becoming a contributor.
WinkJS is a family of open source packages for Natural Language Processing, Machine Learning, and Statistical Analysis in Node.js. The code is thoroughly documented for easy human comprehension and has a test coverage of ~100% for the reliability needed to build production grade solutions.
Wink NLP is copyright 2017-24 GRAYPE Systems Private Limited.
It is licensed under the terms of the MIT License.