andreekeberg/ml-classify-text-js


Machine learning based text classification in JavaScript using n-grams and cosine similarity


📄 ClassifyText (JS)

Use machine learning to classify text using n-grams and cosine similarity.

A minimal library for both the browser and Node.js. It lets you train a model with a large number of text samples (and their corresponding labels), and then use that model to quickly predict one or more appropriate labels for new text samples.
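To make the idea of cosine similarity over term counts concrete, here is a minimal sketch. It is not the library's implementation: the helper names `termCounts` and `cosineSimilarity` are hypothetical, and the tokenization (lowercasing, keeping only letters and digits) is an assumption made for illustration.

```javascript
// Hypothetical helper: count how often each word occurs in a text.
// A Map is used so that words like "constructor" cannot collide with
// properties inherited from Object.prototype.
function termCounts(text) {
    const words = text.toLowerCase().match(/[a-z0-9]+/g) || []
    const counts = new Map()

    for (const word of words) {
        counts.set(word, (counts.get(word) || 0) + 1)
    }

    return counts
}

// Cosine similarity between two term-count vectors:
// dot(a, b) / (|a| * |b|), or 0 when either vector is empty.
function cosineSimilarity(a, b) {
    let dot = 0

    for (const [term, count] of a) {
        dot += count * (b.get(term) || 0)
    }

    const norm = (vector) =>
        Math.sqrt([...vector.values()].reduce((sum, x) => sum + x * x, 0))

    const denominator = norm(a) * norm(b)

    return denominator === 0 ? 0 : dot / denominator
}

const sample = termCounts('It sure is pretty great!')
const positive = termCounts('This is great, so cool!')
const negative = termCounts('Just terrible!')

// The sample shares terms with the positive text but none with the negative one
console.log(cosineSimilarity(sample, positive) > cosineSimilarity(sample, negative))
```

Texts that share no terms score 0, and texts with proportional term counts score 1, which is why the score can be read as a rough confidence value.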

Installation

Using npm

npm install ml-classify-text

Using yarn

yarn add ml-classify-text

Getting started

Import as an ES6 module

import Classifier from 'ml-classify-text'

Import as a CommonJS module

const { Classifier } = require('ml-classify-text')

Basic usage

Setting up a new Classifier instance

const classifier = new Classifier()

Training a model

const positive = [
    'This is great, so cool!',
    'Wow, I love it!',
    'It really is amazing'
]

const negative = [
    'This is really bad',
    'I hate it with a passion',
    'Just terrible!'
]

classifier.train(positive, 'positive')
classifier.train(negative, 'negative')

Getting a prediction

const predictions = classifier.predict('It sure is pretty great!')

if (predictions.length) {
    predictions.forEach((prediction) => {
        console.log(`${prediction.label} (${prediction.confidence})`)
    })
} else {
    console.log('No predictions returned')
}

Returning:

positive (0.5423261445466404)

Advanced usage

Configuration

The following configuration options can be passed either directly to a new Model or indirectly through the Classifier constructor.

Options

Property	Type	Default	Description
nGramMin	`int`	`1`	Minimum n-gram size
nGramMax	`int`	`1`	Maximum n-gram size
vocabulary	`Array` \| `Set` \| `false`	`[]`	Terms mapped to indexes in the model data, set to `false` to store terms directly in the data entries
data	`Object`	`{}`	Key-value store of labels and training data vectors

Using n-grams

The default behavior is to split up texts by single words (known as a bag of words, or unigrams).

This has limitations: because the order of words is ignored, phrases and expressions cannot be matched correctly.

This is where n-grams come in. When set to use more than one word per term, they act like a sliding window that moves across the text, producing each contiguous sequence of words of the specified length, which can greatly improve the accuracy of predictions.

Example of using n-grams with a size of 2 (bigrams)

const classifier = new Classifier({
    nGramMin: 2,
    nGramMax: 2
})

const tokens = classifier.tokenize('I really dont like it')

console.log(tokens)

Returning:

{
    'i really': 1,
    'really dont': 1,
    'dont like': 1,
    'like it': 1
}
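The sliding window itself can be sketched in a few lines without the library. This is an illustration only, not the library's tokenizer: the function name `ngramCounts` is hypothetical, and the word splitting (lowercasing, keeping only letters and digits) is an assumption.

```javascript
// Hypothetical helper: count every n-gram of nGramMin..nGramMax words.
function ngramCounts(text, nGramMin = 1, nGramMax = 1) {
    const words = text.toLowerCase().match(/[a-z0-9]+/g) || []
    const counts = new Map()

    for (let n = nGramMin; n <= nGramMax; n++) {
        // Slide a window of n words across the text, one word at a time
        for (let i = 0; i + n <= words.length; i++) {
            const term = words.slice(i, i + n).join(' ')
            counts.set(term, (counts.get(term) || 0) + 1)
        }
    }

    return counts
}

const bigrams = ngramCounts('I really dont like it', 2, 2)

console.log(Object.fromEntries(bigrams))
```

For this input the sketch yields the same four bigrams as the example above. Setting the minimum to 1 and the maximum to 2 would count unigrams and bigrams together, at the cost of a larger set of terms.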

Serializing a model

After training a model on a large data set, you'll want to store the result so that you can later set up a new model from it and make predictions right away, without retraining.

To do this, use the serialize method on your Model, and then save the resulting data structure to a file, send it to a server, or store it in any other way you like.

const model = classifier.model

console.log(model.serialize())

Returning:

{
    nGramMin: 1,
    nGramMax: 1,
    vocabulary: [
        'this', 'is', 'great',
        'so', 'cool', 'wow',
        'i', 'love', 'it',
        'really', 'amazing', 'bad',
        'hate', 'with', 'a',
        'passion', 'just', 'terrible'
    ],
    data: {
        positive: {
            '0': 1, '1': 2, '2': 1,
            '3': 1, '4': 1, '5': 1,
            '6': 1, '7': 1, '8': 2,
            '9': 1, '10': 1
        },
        negative: {
            '0': 1, '1': 1, '6': 1,
            '8': 1, '9': 1, '11': 1,
            '12': 1, '13': 1, '14': 1,
            '15': 1, '16': 1, '17': 1
        }
    }
}
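In the serialized structure, each key under a label is an index into the vocabulary array, and each value is how often that term occurred in the label's training samples. The sketch below shows one way to read such a structure back into term-keyed counts after a JSON round trip. The function name `expandLabelData` is hypothetical and not part of the library, and the object is a trimmed version of the output above (the first three vocabulary terms only).

```javascript
// Trimmed copy of the serialized output above (assumption: shortened
// to the first three vocabulary terms for readability).
const serialized = {
    nGramMin: 1,
    nGramMax: 1,
    vocabulary: ['this', 'is', 'great'],
    data: {
        positive: { '0': 1, '1': 2, '2': 1 },
        negative: { '0': 1, '1': 1 }
    }
}

// Hypothetical helper: replace vocabulary indexes with the terms they point to.
function expandLabelData(model) {
    const expanded = {}

    for (const [label, vector] of Object.entries(model.data)) {
        expanded[label] = {}

        for (const [index, count] of Object.entries(vector)) {
            expanded[label][model.vocabulary[Number(index)]] = count
        }
    }

    return expanded
}

// The structure is plain data, so it survives a JSON round trip unchanged
const restored = JSON.parse(JSON.stringify(serialized))

console.log(expandLabelData(restored))
```

Storing indexes instead of the terms themselves keeps the per-label data compact when many labels share the same vocabulary.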

Documentation

Contributing

Read the contribution guidelines.

Changelog

Refer to the changelog for a full history of the project.

License

ClassifyText is licensed under the MIT license.

About


www.npmjs.com/package/ml-classify-text

Releases: 3

2.0.1 Latest

Feb 5, 2023

+ 2 releases

Languages

JavaScript 100.0%
