minhpqn/jmlm_scoring
Copyright by Pham Quang Nhat Minh (C).
In this package, we implement Masked Language Model Scoring in PyTorch and add support for Japanese and Vietnamese.
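As a rough illustration of the idea behind masked language model scoring, the sketch below masks each token in turn, asks a model for the probability of the original token at that position, and sums the log-probabilities (often called a pseudo-log-likelihood). This is not the package's implementation: `pseudo_log_likelihood`, `toy_predict`, and the fixed toy probabilities are hypothetical names and values chosen only for illustration, and a real scorer would query a pretrained masked language model instead.

```python
import math
from typing import Callable, Dict, List, Sequence

MASK = "[MASK]"  # placeholder mask symbol (assumption for this sketch)


def pseudo_log_likelihood(
    tokens: Sequence[str],
    predict: Callable[[List[str], int], Dict[str, float]],
) -> List[float]:
    """Return one log-probability per token, each computed with that token masked."""
    per_token = []
    for i, target in enumerate(tokens):
        masked = list(tokens)
        masked[i] = MASK                    # hide the token being scored
        probs = predict(masked, i)          # distribution over the vocabulary
        per_token.append(math.log(probs[target]))
    return per_token


def toy_predict(masked_tokens: List[str], position: int) -> Dict[str, float]:
    """Hypothetical stand-in for a masked LM: ignores context entirely."""
    return {"英語": 0.5, "が": 0.25, "を": 0.125, "わかる": 0.125}


per_token_ga = pseudo_log_likelihood(["英語", "が", "わかる"], toy_predict)
per_token_wo = pseudo_log_likelihood(["英語", "を", "わかる"], toy_predict)

score_ga = sum(per_token_ga)  # sentence score = sum of per-token log-probs
score_wo = sum(per_token_wo)

print(score_ga, score_wo)
```

With this toy distribution the sentence using が scores higher than the one using を, which mirrors the ordering a real model is expected to produce for a more natural sentence.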
Python 3.6+ is required. Clone this repository and install:
    pip install -e .

Example usage:

    from jmlm.models import get_pretrained
    from jmlm.scorers import MLMScorer, LMScorer

    # Masked Language Model scoring with cl-tohoku/bert-base-japanese
    model, tokenizer = get_pretrained('cl-tohoku/bert-base-japanese', device='cpu')
    scorer = MLMScorer(model, tokenizer, device='cpu')
    print(scorer.score_sentences(['英語をわかる', '英語がわかる']))
    # >> [-27.141216278076172, -16.992878675460815]

    # Take average score of tokens
    print(scorer.score_sentences(['英語をわかる', '英語がわかる'], normalize=True))
    # >> [-9.047072092692057, -5.6642928918202715]

    # Get scores for each token
    print(scorer.score_sentences(['英語がわかる'], per_token=True))
    # >> [[-6.728611469268799, -2.057461977005005, -8.206805229187012]]

    # Masked Language Model scoring for Vietnamese
    model, tokenizer = get_pretrained('NlpHUST/vibert4news-base-cased', device='cpu')
    scorer = MLMScorer(model, tokenizer, device='cpu')
    print(scorer.score_sentences(['Tôi là sinh viên đại học.', 'Tôi là học sinh đại học']))
    # >> [-3.132660969684366, -7.91806149110198]

    # Language Model scoring with rinna/japanese-gpt2-medium
    model, tokenizer = get_pretrained('rinna/japanese-gpt2-medium', device='cpu')
    scorer = LMScorer(model, tokenizer, device='cpu')
    print(scorer.score_sentences(['英語をわかる', '英語がわかる']))
    # >> [-21.05287978053093, -15.756769746541977]
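The numbers printed in the usage example suggest how the three output modes relate: the default sentence score appears to be the sum of the per-token scores, and `normalize=True` appears to divide that sum by the number of tokens. The following check uses only the values reported above for '英語がわかる'; the variable names are illustrative, and the relationship is inferred from those outputs rather than taken from the package source.

```python
import math

# Per-token scores reported above for '英語がわかる' (per_token=True)
per_token = [-6.728611469268799, -2.057461977005005, -8.206805229187012]

total = sum(per_token)          # compare with the default sentence score
mean = total / len(per_token)   # compare with the normalize=True score

# Values reported above for the same sentence
reported_total = -16.992878675460815
reported_mean = -5.6642928918202715

print(math.isclose(total, reported_total, rel_tol=1e-9))
print(math.isclose(mean, reported_mean, rel_tol=1e-9))
```

Because higher (less negative) scores indicate a more probable sentence, '英語がわかる' is preferred over '英語をわかる' under both the summed and the averaged score.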
About
Masked Language Model-based Scoring for Japanese and Vietnamese