Machine learning based text classification in JavaScript using n-grams and cosine similarity
andreekeberg/ml-classify-text-js
Use machine learning to classify text using n-grams and cosine similarity.
A minimal library that works in both the browser and Node.js. It lets you train a model on a large number of text samples (with corresponding labels), and then use that model to quickly predict one or more appropriate labels for new text samples.
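To make the cosine similarity idea concrete, here is a minimal sketch that compares two term-count objects. The `cosineSimilarity` helper is hypothetical and is not the library's code. The term counts are derived by hand from the positive training samples and the input text used in the usage example in this README, assuming unigram tokens.

```javascript
// Hypothetical helper (not part of ml-classify-text): cosine similarity
// between two sparse term-count objects such as { great: 1, it: 2 }.
function cosineSimilarity(a, b) {
  let dot = 0
  let normA = 0
  let normB = 0
  for (const [term, count] of Object.entries(a)) {
    normA += count * count
    // Only terms present in both objects contribute to the dot product
    if (Object.prototype.hasOwnProperty.call(b, term)) {
      dot += count * b[term]
    }
  }
  for (const count of Object.values(b)) {
    normB += count * count
  }
  // An empty vector has no direction, so treat it as no similarity
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Unigram counts for 'It sure is pretty great!'
const input = { it: 1, sure: 1, is: 1, pretty: 1, great: 1 }

// Unigram counts summed over the three positive training samples
const positive = {
  this: 1, is: 2, great: 1, so: 1, cool: 1, wow: 1,
  i: 1, love: 1, it: 2, really: 1, amazing: 1
}

const similarity = cosineSimilarity(input, positive)
console.log(similarity) // approximately 0.542
```

This value agrees with the confidence printed in the usage example, which suggests the confidence is a cosine similarity of this kind, although the library's internals may differ in detail.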
Using npm

    npm install ml-classify-text

Using yarn

    yarn add ml-classify-text

Import as an ES6 module

    import Classifier from 'ml-classify-text'

Import as a CommonJS module

    const { Classifier } = require('ml-classify-text')
    const classifier = new Classifier()

    const positive = [
        'This is great, so cool!',
        'Wow, I love it!',
        'It really is amazing'
    ]

    const negative = [
        'This is really bad',
        'I hate it with a passion',
        'Just terrible!'
    ]

    classifier.train(positive, 'positive')
    classifier.train(negative, 'negative')

    const predictions = classifier.predict('It sure is pretty great!')

    if (predictions.length) {
        predictions.forEach((prediction) => {
            console.log(`${prediction.label} (${prediction.confidence})`)
        })
    } else {
        console.log('No predictions returned')
    }
Returning:
    positive (0.5423261445466404)

The following configuration options can be passed either directly to a new Model, or indirectly by passing them to the Classifier constructor.
| Property | Type | Default | Description |
|---|---|---|---|
| nGramMin | int | 1 | Minimum n-gram size |
| nGramMax | int | 1 | Maximum n-gram size |
| vocabulary | Array \| Set \| false | [] | Terms mapped to indexes in the model data; set to false to store terms directly in the data entries |
| data | Object | {} | Key-value store of labels and training data vectors |
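To illustrate how `vocabulary` and `data` relate, the sketch below maps index-keyed counts back to readable terms. The `decodeEntry` helper and the sample values are hypothetical and chosen for illustration. It assumes `vocabulary` is an array and that each `data` entry maps a vocabulary index to a count.

```javascript
// Illustrative values only: a tiny vocabulary and one label's data entry,
// shaped the way the options table describes them.
const vocabulary = ['this', 'is', 'great']
const data = {
  positive: { '0': 1, '1': 2, '2': 1 }
}

// Hypothetical helper (not part of ml-classify-text): replace each
// vocabulary index in a data entry with the term it points to.
function decodeEntry(vocabulary, entry) {
  return Object.fromEntries(
    Object.entries(entry).map(([index, count]) => [vocabulary[Number(index)], count])
  )
}

const decoded = decodeEntry(vocabulary, data.positive)
console.log(decoded) // { this: 1, is: 2, great: 1 }
```

Storing indexes instead of repeated term strings keeps the data entries compact when many labels share the same terms, which is presumably why a vocabulary is used by default.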
The default behavior is to split texts into single words (known as a bag of words, or unigrams).
This has limitations: because word order is ignored, phrases and expressions cannot be matched correctly.
This is where n-grams come in. When set to use more than one word per term, they act like a sliding window that moves across the text, producing continuous sequences of the specified number of words, which can greatly improve the accuracy of predictions.
    const classifier = new Classifier({
        nGramMin: 2,
        nGramMax: 2
    })

    const tokens = classifier.tokenize('I really dont like it')

    console.log(tokens)

Returning:

    {
        'i really': 1,
        'really dont': 1,
        'dont like': 1,
        'like it': 1
    }
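The sliding window described above can be sketched in a few lines. The `ngramCounts` function below is a hypothetical stand-in, not the library's `tokenize` implementation. Its normalization (lowercasing and dropping everything except ASCII letters, digits and whitespace) is an assumption made for this example.

```javascript
// Hypothetical sketch of sliding-window n-gram counting.
// Not the library's tokenizer; the normalization rules are assumed.
function ngramCounts(text, nGramMin = 1, nGramMax = 1) {
  const words = text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, '') // assumed: strip punctuation
    .split(/\s+/)
    .filter(Boolean)

  const counts = new Map()
  for (let n = nGramMin; n <= nGramMax; n++) {
    // Slide a window of n words across the text, one word at a time
    for (let i = 0; i + n <= words.length; i++) {
      const term = words.slice(i, i + n).join(' ')
      counts.set(term, (counts.get(term) ?? 0) + 1)
    }
  }
  return Object.fromEntries(counts)
}

const bigrams = ngramCounts('I really dont like it', 2, 2)
console.log(bigrams)
// { 'i really': 1, 'really dont': 1, 'dont like': 1, 'like it': 1 }
```

Setting `nGramMin` to 1 and `nGramMax` to 2 in this sketch would yield both the single words and the two-word phrases, at the cost of a larger set of terms.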
After training a model on large sets of data, you'll want to store that data so you can later set up a new model from it and quickly make predictions.
To do this, use the serialize method on your Model, and either save the resulting data structure to a file, send it to a server, or store it in any other way you want.
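As one possible way to save the structure to a file, the sketch below round-trips it through JSON using Node's built-in `fs` module. The `serialized` object is a small stand-in shaped like the output of serialize, with illustrative values, and the file name is hypothetical.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Stand-in for the object returned by model.serialize();
// the values are illustrative, not real training data.
const serialized = {
  nGramMin: 1,
  nGramMax: 1,
  vocabulary: ['great', 'bad'],
  data: {
    positive: { '0': 1 },
    negative: { '1': 1 }
  }
}

// Hypothetical file location in the system temp directory
const file = path.join(os.tmpdir(), `ml-classify-text-model-${process.pid}.json`)

// Save the structure as JSON, then read it back
fs.writeFileSync(file, JSON.stringify(serialized))
const restored = JSON.parse(fs.readFileSync(file, 'utf8'))

// Clean up the temporary file used by this sketch
fs.unlinkSync(file)

console.log(restored.vocabulary) // [ 'great', 'bad' ]
```

Because the restored object carries the same properties as the configuration options listed in this README, it should be possible to pass it to a new Model or to the Classifier constructor, though that step is not shown here.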
    const model = classifier.model

    console.log(model.serialize())
Returning:
    {
        nGramMin: 1,
        nGramMax: 1,
        vocabulary: [
            'this', 'is', 'great', 'so', 'cool', 'wow', 'i', 'love', 'it',
            'really', 'amazing', 'bad', 'hate', 'with', 'a', 'passion',
            'just', 'terrible'
        ],
        data: {
            positive: {
                '0': 1, '1': 2, '2': 1, '3': 1, '4': 1, '5': 1,
                '6': 1, '7': 1, '8': 2, '9': 1, '10': 1
            },
            negative: {
                '0': 1, '1': 1, '6': 1, '8': 1, '9': 1, '11': 1,
                '12': 1, '13': 1, '14': 1, '15': 1, '16': 1, '17': 1
            }
        }
    }

Read the contribution guidelines.
Refer to the changelog for a full history of the project.
ClassifyText is licensed under the MIT license.