Developer friendly Natural Language Processing ✨
WinkNLP is a JavaScript library for Natural Language Processing (NLP). Designed specifically to make development of NLP applications easier and faster, winkNLP is optimized for the right balance of performance and accuracy.
Its word embedding support unlocks deeper text analysis. Represent words and text as numerical vectors with ease, bringing higher accuracy in tasks like semantic similarity, text classification, and beyond – even within a browser.
It is built ground up with no external dependency and has a lean code base of ~10Kb minified & gzipped. A test coverage of ~100% and compliance with the Open Source Security Foundation best practices make winkNLP the ideal tool for building production grade systems with confidence.
WinkNLP, with full TypeScript support, runs on Node.js, web browsers and Deno.
| Wikipedia article timeline | Context aware word cloud | Key sentences detection |
| --- | --- | --- |
Head to live examples to explore further.
WinkNLP can easily process large amounts of raw text at speeds over 650,000 tokens/second on an M1 MacBook Pro in both browser and Node.js environments. It even runs smoothly on a low-end smartphone's browser.
| Environment | Benchmarking Command |
| --- | --- |
| Node.js | `node benchmark/run` |
| Browser | How to measure winkNLP's speed on browsers? |
WinkNLP has a comprehensive natural language processing (NLP) pipeline covering tokenization, sentence boundary detection (sbd), negation handling, sentiment analysis, part-of-speech (pos) tagging, named entity recognition (ner), and custom entities recognition (cer). It offers a rich feature set:
| Feature | Description |
| --- | --- |
| 🐎 Fast, lossless & multilingual tokenizer | For example, the multilingual text string `"¡Hola! नमस्कार! Hi! Bonjour chéri"` is tokenized as `["¡", "Hola", "!", "नमस्कार", "!", "Hi", "!", "Bonjour", "chéri"]`. The tokenizer processes text at a speed close to 4 million tokens/second on an M1 MBP's browser. |
| ✨ Developer friendly and intuitive API | With winkNLP, process any text using a simple, declarative syntax; most live examples have 30-40 lines of code. |
| 🖼 Best-in-class text visualization | Programmatically mark tokens, sentences, entities, etc. using HTML `mark` or any other tag of your choice. |
| ♻️ Extensive text processing features | Remove and/or retain tokens with specific attributes such as part-of-speech, named entity type, token type, stop word, shape and many more; compute Flesch reading ease score; generate n-grams; normalize, lemmatise or stem. Check out how, with the right kind of text preprocessing, even a Naive Bayes classifier achieves impressive (≥90%) accuracy in sentiment analysis and chatbot intent classification tasks. |
| 🔠 Pre-trained language models | Compact sizes starting from ~1MB (minified & gzipped) – reduce model loading time drastically, down to ~1 second on a 4G network. |
| Word vectors | 100-dimensional English word embeddings for over 350K English words, optimized for winkNLP. Allows easy computation of sentence or document embeddings. |
- BM25 Vectorizer
- Similarity methods – Cosine, Tversky, Sørensen-Dice, Otsuka-Ochiai
- `its` & `as` helpers to get Bag of Words, Frequency table, Lemma, Stem, Stop word removal, Negation handling and many more.
- Concepts — everything you need to know to get started.
- API Reference — explains usage of APIs with examples.
- Change log — version history along with the details of breaking changes, if any.
- Examples — live examples with code to give you a head start.
Use `npm install`:

```shell
npm install wink-nlp --save
```
In order to use winkNLP after its installation, you also need to install a language model according to the Node.js version used. The table below outlines the version specific installation commands:
| Node.js Version | Installation |
| --- | --- |
| 16 or 18 | `npm install wink-eng-lite-web-model --save` |
| 14 or 12 | `node -e "require('wink-nlp/models/install')"` |
The `wink-eng-lite-web-model` is designed to work with Node.js version 16 or 18. It can also work on browsers, as described in the next section. This is the recommended model.
The second command installs the `wink-eng-lite-model`, which works with Node.js version 14 or 12.
Enable `esModuleInterop` and `allowSyntheticDefaultImports` in the `tsconfig.json` file:

```json
"compilerOptions": {
  "esModuleInterop": true,
  "allowSyntheticDefaultImports": true,
  ...
}
```
If you're using winkNLP in the browser, use the `wink-eng-lite-web-model`. Learn about its installation and usage in our guide to using winkNLP in the browser. Explore winkNLP recipes on Observable for live browser based examples.
How to run on Deno
Follow the example on replit.
Here is the "Hello World!" of winkNLP:
```javascript
// Load wink-nlp package.
const winkNLP = require('wink-nlp');
// Load english language model.
const model = require('wink-eng-lite-web-model');
// Instantiate winkNLP.
const nlp = winkNLP(model);
// Obtain "its" helper to extract item properties.
const its = nlp.its;
// Obtain "as" reducer helper to reduce a collection.
const as = nlp.as;

// NLP Code.
const text = 'Hello World🌎! How are you?';
const doc = nlp.readDoc(text);

console.log(doc.out());
// -> Hello World🌎! How are you?

console.log(doc.sentences().out());
// -> [ 'Hello World🌎!', 'How are you?' ]

console.log(doc.entities().out(its.detail));
// -> [ { value: '🌎', type: 'EMOJI' } ]

console.log(doc.tokens().out());
// -> [ 'Hello', 'World', '🌎', '!', 'How', 'are', 'you', '?' ]

console.log(doc.tokens().out(its.type, as.freqTable));
// -> [ [ 'word', 5 ], [ 'punctuation', 2 ], [ 'emoji', 1 ] ]
```
Experiment with winkNLP on RunKit.
WinkNLP processes raw text at ~650,000 tokens per second with its `wink-eng-lite-web-model`, when benchmarked using "Ch 13 of Ulysses by James Joyce" on an M1 MacBook Pro with 16GB RAM. The processing included the entire NLP pipeline — tokenization, sentence boundary detection, negation handling, sentiment analysis, part-of-speech tagging, and named entity extraction. This speed is way ahead of the prevailing speed benchmarks.
The benchmark was conducted on Node.js versions 16 and 18.
It pos tags a subset of the WSJ corpus with an accuracy of ~95% — this includes tokenization of raw text prior to pos tagging. The present state-of-the-art is at ~97% accuracy but at lower speeds, and is generally computed using a gold standard pre-tokenized corpus.
Its general purpose sentiment analysis delivers an f-score of ~84.5%, when validated using the Amazon Product Review Sentiment Labelled Sentences Data Set at the UCI Machine Learning Repository. The current benchmark accuracy for specifically trained models can range around 95%.
Wink NLP delivers this performance with minimal load on RAM. For example, it processes the entire History of India Volume I with a total peak memory requirement of under 80MB. The book has around 350 pages, which translates to over 125,000 tokens.
Please ask at Stack Overflow, discuss at Wink JS GitHub Discussions, or chat with us at the Wink JS Gitter Lobby.
If you spot a bug and it has not yet been reported, raise a new issue or consider fixing it and sending a PR.
Looking for a new feature? Request it via the new features & ideas discussion forum or consider becoming a contributor.
WinkJS is a family of open source packages for Natural Language Processing, Machine Learning, and Statistical Analysis in Node.js. The code is thoroughly documented for easy human comprehension and has a test coverage of ~100% for the reliability needed to build production grade solutions.
Wink NLP is copyright 2017-24 GRAYPE Systems Private Limited.
It is licensed under the terms of the MIT License.