- Notifications
You must be signed in to change notification settings - Fork17
Apache OpenNLP wrapper for Nodejs
License
mbejda/Node-OpenNLP
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Node-OpenNLP is depended onNode-Java
. Please take make sure your environment is properly configured to runNode-Java
. Clickhere to learn more aboutNode-Java
.
npm install opennlp --save
Node-OpenNLP comes withApache OpenNLP 1.6.0 along with the following trained 1.5 series models:
- en-chunker.bin
- en-ner-person.bin
- en-pos-maxent.bin
- en-sent.bin
- en-token.bin
More trained models can be found here:http://opennlp.sourceforge.net/models-1.5
The OpenNLP Sentence Detector can detect that a punctuation character marks the end of a sentence or not. In this sense a sentence is defined as the longest white space trimmed character sequence between two punctuation marks.
varopenNLP=require("opennlp");varsentence='Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .';varsentenceDetector=newopenNLP().sentenceDetector;sentenceDetector.sentDetect(sentence,function(err,results){/// To get probabilitiessentenceDetector.probs(function(error,probability){console.log(error,probability)})console.log(results)});
The following default configurations can be overrided during initialization.
varopenNLP=require("opennlp");varopennlp=newopenNLP({models :{doccat:__dirname+'/models/en-doccat.bin',posTagger:__dirname+'/models/en-pos-maxent.bin',tokenizer:__dirname+'/models/en-token.bin',nameFinder:__dirname+'/models/en-ner-person.bin',sentenceDetector:__dirname+'/models/en-sent.bin',chunker:__dirname+'/models/en-chunker.bin'},openNLP={jar:__dirname+"/lib/opennlp-tools-1.6.0.jar"}});
The OpenNLP Tokenizers segment an input character sequence into tokens. Tokens are usually words, punctuation, numbers, etc.
varopenNLP=require("opennlp");varsentence='Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .';vartokenizer=newopenNLP().tokenizer;tokenizer.tokenize(sentence,function(err,results){console.log(err,results);tokenizer.getTokenProbabilities(function(error,response){console.log(error,response);});});
The Name Finder can detect named entities and numbers in text. To be able to detect entities the Name Finder needs a model. The model is dependent on the language and entity type it was trained for.
varopenNLP=require("opennlp");varsentence='Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .';varnameFinder=newopenNLP().nameFinder;nameFinder.find(sentence,function(err,tokens_arr){console.log(err,tokens_arr)nameFinder.probs(function(error,response){console.log(error,response)});});
The OpenNLP Document Categorizer can classify text into pre-defined categories. It is based on maximum entropy framework.
** To use the document categorizer you need to train a model first. The default trained model that is included is for testing purposes only. **
varopenNLP=require("opennlp");vardoccat=newopenNLP().doccat;doccat.categorize("I enjoyed watching Rocky",function(err,list){doccat.getAllResults(list,function(err,category){});doccat.getBestCategory(list,function(err,category){});});doccat.scoreMap("I enjoyed watching Rocky",function(err,category){});doccat.sortedScoreMap("I enjoyed watching Rocky",function(err,category){});doccat.getCategory(1,function(err,category){});doccat.getIndex('Happy',function(err,index){});
The Part of Speech Tagger marks tokens with their corresponding word type based on the token itself and the context of the token. A token might have multiple pos tags depending on the token and the context. The OpenNLP POS Tagger uses a probability model to predict the correct pos tag out of the tag set.
varopenNLP=require("opennlp");varposTagger=newopenNLP().posTagger;varsentence='Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .';posTagger.tag(sentence,function(err,tokens_arr){console.log(err,tokens_arr)});posTagger.topKSequences(sentence,function(error,tagger){console.log(tagger.getScore())console.log(tagger.getProbs())console.log(tagger.getOutcomes())});
Text chunking consists of dividing a text in syntactically correlated parts of words, like noun groups, verb groups, but does not specify their internal structure, nor their role in the main sentence.
varopenNLP=require("opennlp");varposTagger=newopenNLP().posTagger;varsentence='Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .';varchunker=newopenNLP().chunker;posTagger.tag(sentence,function(err,tokens_arr){chunker.topKSequences(sentence,tokens_arr,function(err,tokens_arr){console.log(err,tokens_arr)});chunker.chunk(sentence,tokens_arr,function(err,tokens_arr){chunker.probs(function(error,prob){});});});
Please report any bugs. Feel free to send me a tweet if you need any help.
Follow me on Twitter[@notmilobejda](https://twitter.com/notmilobejda)
My Blog[mbejda.com](https://mbejda.com)
About
Apache OpenNLP wrapper for Nodejs