# NLP-Models-Tensorflow

Gathers machine learning and TensorFlow deep learning models for NLP problems. All of the code is simplified inside Jupyter Notebooks (100%).
- Abstractive Summarization
- Chatbot
- Dependency Parser
- Entity Tagging
- Extractive Summarization
- Generator
- Language Detection
- Neural Machine Translation
- OCR
- POS Tagging
- Question-Answers
- Sentence pairs
- Speech-to-Text
- Spelling correction
- SQUAD Question-Answers
- Stemming
- Text Augmentation
- Text Classification
- Text Similarity
- Text-to-Speech
- Topic Generator
- Topic Modeling
- Unsupervised Extractive Summarization
- Vectorizer
- Old-to-Young Vocoder
- Visualization
- Attention
Original implementations are quite complex and not really beginner friendly, so I tried to simplify most of them. There are also tons of implementations of not-yet-released papers. Feel free to use it for your own research!
For models that I did not implement from scratch, I attach the source GitHub repositories; for those I basically copied, pasted, and fixed the code for deprecation issues.
TensorFlow version 1.13 and above only, not including the 2.X versions: 1.13 <= TensorFlow < 2.0.

```bash
pip install -r requirements.txt
```
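A quick way to confirm the interpreter picked up a compatible TensorFlow build (a minimal sketch; `requirements.txt` pins the actual versions, and this check is not part of the repo):

```python
import tensorflow as tf

# The notebooks target the 1.x API, so fail fast on an accidental 2.x install.
major, minor = (int(part) for part in tf.__version__.split(".")[:2])
assert major == 1 and minor >= 13, f"need 1.13 <= TF < 2.0, got {tf.__version__}"
```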
### Abstractive Summarization

Trained on India news.
Accuracy is based on 10 epochs only, calculated using word positions.
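My reading of "calculated using word positions" is per-position token agreement between the decoded summary and the reference; a minimal sketch of that metric under this assumption (not code from the notebooks):

```python
import numpy as np

def position_accuracy(pred_ids, target_ids, pad_id=0):
    """Fraction of non-padding positions where the predicted token matches the reference."""
    pred_ids, target_ids = np.asarray(pred_ids), np.asarray(target_ids)
    mask = target_ids != pad_id
    return float(((pred_ids == target_ids) & mask).sum() / mask.sum())
```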
Complete list (12 notebooks)
- LSTM Seq2Seq using topic modelling, test accuracy 13.22%
- LSTM Seq2Seq + Luong Attention using topic modelling, test accuracy 12.39%
- LSTM Seq2Seq + Beam Decoder using topic modelling, test accuracy 10.67%
- LSTM Bidirectional + Luong Attention + Beam Decoder using topic modelling, test accuracy 8.29%
- Pointer-Generator + Bahdanau, https://github.com/xueyouluo/my_seq2seq, test accuracy 15.51%
- Copynet, test accuracy 11.15%
- Pointer-Generator + Luong, https://github.com/xueyouluo/my_seq2seq, test accuracy 16.51%
- Dilated Seq2Seq, test accuracy 10.88%
- Dilated Seq2Seq + Self Attention, test accuracy 11.54%
- BERT + Dilated CNN Seq2seq, test accuracy 13.5%
- Self-Attention + Pointer-Generator, test accuracy 4.34%
- Dilated-CNN Seq2seq + Pointer-Generator, test accuracy 5.57%
### Chatbot

Trained on Cornell Movie Dialog corpus; accuracy table in chatbot.
Complete list (54 notebooks)
- Basic cell Seq2Seq-manual
- LSTM Seq2Seq-manual
- GRU Seq2Seq-manual
- Basic cell Seq2Seq-API Greedy
- LSTM Seq2Seq-API Greedy
- GRU Seq2Seq-API Greedy
- Basic cell Bidirectional Seq2Seq-manual
- LSTM Bidirectional Seq2Seq-manual
- GRU Bidirectional Seq2Seq-manual
- Basic cell Bidirectional Seq2Seq-API Greedy
- LSTM Bidirectional Seq2Seq-API Greedy
- GRU Bidirectional Seq2Seq-API Greedy
- Basic cell Seq2Seq-manual + Luong Attention
- LSTM Seq2Seq-manual + Luong Attention
- GRU Seq2Seq-manual + Luong Attention
- Basic cell Seq2Seq-manual + Bahdanau Attention
- LSTM Seq2Seq-manual + Bahdanau Attention
- GRU Seq2Seq-manual + Bahdanau Attention
- LSTM Bidirectional Seq2Seq-manual + Luong Attention
- GRU Bidirectional Seq2Seq-manual + Luong Attention
- LSTM Bidirectional Seq2Seq-manual + Bahdanau Attention
- GRU Bidirectional Seq2Seq-manual + Bahdanau Attention
- LSTM Bidirectional Seq2Seq-manual + backward Bahdanau + forward Luong
- GRU Bidirectional Seq2Seq-manual + backward Bahdanau + forward Luong
- LSTM Seq2Seq-API Greedy + Luong Attention
- GRU Seq2Seq-API Greedy + Luong Attention
- LSTM Seq2Seq-API Greedy + Bahdanau Attention
- GRU Seq2Seq-API Greedy + Bahdanau Attention
- LSTM Seq2Seq-API Beam Decoder
- GRU Seq2Seq-API Beam Decoder
- LSTM Bidirectional Seq2Seq-API + Luong Attention + Beam Decoder
- GRU Bidirectional Seq2Seq-API + Luong Attention + Beam Decoder
- LSTM Bidirectional Seq2Seq-API + backward Bahdanau + forward Luong + Stack Bahdanau Luong Attention + Beam Decoder
- GRU Bidirectional Seq2Seq-API + backward Bahdanau + forward Luong + Stack Bahdanau Luong Attention + Beam Decoder
- Bytenet
- LSTM Seq2Seq + tf.estimator
- Capsule layers + LSTM Seq2Seq-API Greedy
- Capsule layers + LSTM Seq2Seq-API + Luong Attention + Beam Decoder
- LSTM Bidirectional Seq2Seq-API + backward Bahdanau + forward Luong + Stack Bahdanau Luong Attention + Beam Decoder + Dropout + L2
- DNC Seq2Seq
- LSTM Bidirectional Seq2Seq-API + Luong Monotonic Attention + Beam Decoder
- LSTM Bidirectional Seq2Seq-API + Bahdanau Monotonic Attention + Beam Decoder
- End-to-End Memory Network + Basic cell
- End-to-End Memory Network + LSTM cell
- Attention is all you need
- Transformer-XL
- Attention is all you need + Beam Search
- Transformer-XL + LSTM
- GPT-2 + LSTM
- CNN Seq2seq
- Conv-Encoder + LSTM
- Tacotron + Greedy decoder
- Tacotron + Beam decoder
- Google NMT
### Dependency Parser

Trained on CONLL English Dependency. The train set is used for training; the dev and test sets are used for testing.
Stackpointer and Biaffine-attention are originally from https://github.com/XuezheMax/NeuroNLP2, written in PyTorch.
Accuracy is based on arc, types, and root accuracies after 15 epochs only.
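A minimal sketch of how the three dependency metrics are usually computed, assuming head index 0 denotes ROOT (my reading of the standard definitions, not the notebooks' evaluation code):

```python
import numpy as np

def parse_accuracies(pred_heads, gold_heads, pred_types, gold_types):
    pred_heads, gold_heads = np.asarray(pred_heads), np.asarray(gold_heads)
    arc = (pred_heads == gold_heads).mean()                             # correct head attachment
    types = (np.asarray(pred_types) == np.asarray(gold_types)).mean()   # correct relation label
    root = (pred_heads[gold_heads == 0] == 0).mean()                    # tokens whose gold head is ROOT
    return arc, types, root
```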
Complete list (8 notebooks)
- Bidirectional RNN + CRF + Biaffine, arc accuracy 70.48%, types accuracy 65.18%, root accuracy 66.4%
- Bidirectional RNN + Bahdanau + CRF + Biaffine, arc accuracy 70.82%, types accuracy 65.33%, root accuracy 66.77%
- Bidirectional RNN + Luong + CRF + Biaffine, arc accuracy 71.22%, types accuracy 65.73%, root accuracy 67.23%
- BERT Base + CRF + Biaffine, arc accuracy 64.30%, types accuracy 62.89%, root accuracy 74.19%
- Bidirectional RNN + Biaffine Attention + Cross Entropy, arc accuracy 72.42%, types accuracy 63.53%, root accuracy 68.51%
- BERT Base + Biaffine Attention + Cross Entropy, arc accuracy 72.85%, types accuracy 67.11%, root accuracy 73.93%
- Bidirectional RNN + Stackpointer, arc accuracy 61.88%, types accuracy 48.20%, root accuracy 89.39%
- XLNET Base + Biaffine Attention + Cross Entropy, arc accuracy 74.41%, types accuracy 71.37%, root accuracy 73.17%
### Entity Tagging

Trained on CONLL NER.
Complete list (9 notebooks)
- Bidirectional RNN + CRF, test accuracy 96%
- Bidirectional RNN + Luong Attention + CRF, test accuracy 93%
- Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 95%
- Char Ngrams + Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 96%
- Char Ngrams + Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 96%
- Char Ngrams + Residual Network + Bahdanau Attention + CRF, test accuracy 69%
- Char Ngrams + Attention is all you need + CRF, test accuracy 90%
- BERT, test accuracy 99%
- XLNET-Base, test accuracy 99%
### Extractive Summarization

Trained on CNN News dataset.
Accuracy is based on ROUGE-2.
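ROUGE-2 is bigram recall between the extracted and reference summaries; a minimal sketch of the standard metric (not the notebooks' exact scorer):

```python
from collections import Counter

def rouge_2(candidate_tokens, reference_tokens):
    """Bigram overlap recall: matched reference bigrams / total reference bigrams."""
    bigrams = lambda toks: Counter(zip(toks, toks[1:]))
    cand, ref = bigrams(candidate_tokens), bigrams(reference_tokens)
    overlap = sum((cand & ref).values())   # Counter intersection = min count per bigram
    return overlap / max(sum(ref.values()), 1)
```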
Complete list (4 notebooks)
- LSTM RNN, test accuracy 16.13%
- Dilated-CNN, test accuracy 15.54%
- Multihead Attention, test accuracy 26.33%
- BERT-Base
### Generator

Trained on Shakespeare dataset.
Complete list (15 notebooks)
- Character-wise RNN + LSTM
- Character-wise RNN + Beam search
- Character-wise RNN + LSTM + Embedding
- Word-wise RNN + LSTM
- Word-wise RNN + LSTM + Embedding
- Character-wise + Seq2Seq + GRU
- Word-wise + Seq2Seq + GRU
- Character-wise RNN + LSTM + Bahdanau Attention
- Character-wise RNN + LSTM + Luong Attention
- Word-wise + Seq2Seq + GRU + Beam
- Character-wise + Seq2Seq + GRU + Bahdanau Attention
- Word-wise + Seq2Seq + GRU + Bahdanau Attention
- Character-wise Dilated CNN + Beam search
- Transformer + Beam search
- Transformer XL + Beam search
### Language Detection

Trained on Tatoeba dataset.
Complete list (1 notebook)
- Fast-text Char N-Grams
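fastText-style models hash character n-grams of each word into the embedding table; a minimal sketch of the n-gram extraction step, assuming the `<`/`>` boundary markers from the fastText paper:

```python
def char_ngrams(word, n_min=2, n_max=4):
    """Character n-grams with fastText-style word boundary markers."""
    marked = f"<{word}>"
    return [marked[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)]

# char_ngrams("hello") -> ['<h', 'he', ..., 'o>', '<he', 'hel', ...]
```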
### Neural Machine Translation

Trained on English-French; accuracy table in neural-machine-translation.
Complete list (53 notebooks)
1. basic-seq2seq
2. lstm-seq2seq
3. gru-seq2seq
4. basic-seq2seq-contrib-greedy
5. lstm-seq2seq-contrib-greedy
6. gru-seq2seq-contrib-greedy
7. basic-birnn-seq2seq
8. lstm-birnn-seq2seq
9. gru-birnn-seq2seq
10. basic-birnn-seq2seq-contrib-greedy
11. lstm-birnn-seq2seq-contrib-greedy
12. gru-birnn-seq2seq-contrib-greedy
13. basic-seq2seq-luong
14. lstm-seq2seq-luong
15. gru-seq2seq-luong
16. basic-seq2seq-bahdanau
17. lstm-seq2seq-bahdanau
18. gru-seq2seq-bahdanau
19. basic-birnn-seq2seq-bahdanau
20. lstm-birnn-seq2seq-bahdanau
21. gru-birnn-seq2seq-bahdanau
22. basic-birnn-seq2seq-luong
23. lstm-birnn-seq2seq-luong
24. gru-birnn-seq2seq-luong
25. lstm-seq2seq-contrib-greedy-luong
26. gru-seq2seq-contrib-greedy-luong
27. lstm-seq2seq-contrib-greedy-bahdanau
28. gru-seq2seq-contrib-greedy-bahdanau
29. lstm-seq2seq-contrib-beam-luong
30. gru-seq2seq-contrib-beam-luong
31. lstm-seq2seq-contrib-beam-bahdanau
32. gru-seq2seq-contrib-beam-bahdanau
33. lstm-birnn-seq2seq-contrib-beam-bahdanau
34. lstm-birnn-seq2seq-contrib-beam-luong
35. gru-birnn-seq2seq-contrib-beam-bahdanau
36. gru-birnn-seq2seq-contrib-beam-luong
37. lstm-birnn-seq2seq-contrib-beam-luongmonotonic
38. gru-birnn-seq2seq-contrib-beam-luongmonotic
39. lstm-birnn-seq2seq-contrib-beam-bahdanaumonotonic
40. gru-birnn-seq2seq-contrib-beam-bahdanaumonotic
41. residual-lstm-seq2seq-greedy-luong
42. residual-gru-seq2seq-greedy-luong
43. residual-lstm-seq2seq-greedy-bahdanau
44. residual-gru-seq2seq-greedy-bahdanau
45. memory-network-lstm-decoder-greedy
46. google-nmt
47. transformer-encoder-transformer-decoder
48. transformer-encoder-lstm-decoder-greedy
49. bertmultilanguage-encoder-bertmultilanguage-decoder
50. bertmultilanguage-encoder-lstm-decoder
51. bertmultilanguage-encoder-transformer-decoder
52. bertenglish-encoder-transformer-decoder
53. transformer-t2t-2gpu
### OCR

Complete list (2 notebooks)
- CNN + LSTM RNN, test accuracy 100%
- Im2Latex, test accuracy 100%
### POS Tagging

Trained on CONLL POS.
Complete list (8 notebooks)
- Bidirectional RNN + CRF, test accuracy 92%
- Bidirectional RNN + Luong Attention + CRF, test accuracy 91%
- Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 91%
- Char Ngrams + Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 91%
- Char Ngrams + Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 91%
- Char Ngrams + Residual Network + Bahdanau Attention + CRF, test accuracy 3%
- Char Ngrams + Attention is all you need + CRF, test accuracy 89%
- BERT, test accuracy 99%
### Question-Answers

Trained on bAbI Dataset.
Complete list (4 notebooks)
- End-to-End Memory Network + Basic cell
- End-to-End Memory Network + GRU cell
- End-to-End Memory Network + LSTM cell
- Dynamic Memory
### Sentence pairs

Trained on Cornell Movie-Dialogs Corpus.
Complete list (1 notebook)
- BERT
### Speech-to-Text

Trained on Toronto speech dataset.
Complete list (11 notebooks)
- Tacotron, https://github.com/Kyubyong/tacotron_asr, test accuracy 77.09%
- BiRNN LSTM, test accuracy 84.66%
- BiRNN Seq2Seq + Luong Attention + Cross Entropy, test accuracy 87.86%
- BiRNN Seq2Seq + Bahdanau Attention + Cross Entropy, test accuracy 89.28%
- BiRNN Seq2Seq + Bahdanau Attention + CTC, test accuracy 86.35%
- BiRNN Seq2Seq + Luong Attention + CTC, test accuracy 80.30%
- CNN RNN + Bahdanau Attention, test accuracy 80.23%
- Dilated CNN RNN, test accuracy 31.60%
- Wavenet, test accuracy 75.11%
- Deep Speech 2, test accuracy 81.40%
- Wav2Vec Transfer learning BiRNN LSTM, test accuracy 83.24%
### Spelling correction

Complete list (4 notebooks)
- BERT-Base
- XLNET-Base
- BERT-Base Fast
- BERT-Base accurate
### SQUAD Question-Answers

Trained on SQUAD Dataset.
Complete list (1 notebook)
- BERT:

```json
{"exact_match": 77.57805108798486, "f1": 86.18327335287402}
```
### Stemming

Trained on English Lemmatization.
Complete list (6 notebooks)
- LSTM + Seq2Seq + Beam
- GRU + Seq2Seq + Beam
- LSTM + BiRNN + Seq2Seq + Beam
- GRU + BiRNN + Seq2Seq + Beam
- DNC + Seq2Seq + Greedy
- BiRNN + Bahdanau + Copynet
### Text Augmentation

Complete list (8 notebooks)
- Pretrained Glove
- GRU VAE-seq2seq-beam TF-probability
- LSTM VAE-seq2seq-beam TF-probability
- GRU VAE-seq2seq-beam + Bahdanau Attention TF-probability
- VAE + Deterministic Bahdanau Attention, https://github.com/HareeshBahuleyan/tf-var-attention
- VAE + VAE Bahdanau Attention, https://github.com/HareeshBahuleyan/tf-var-attention
- BERT-Base + Nucleus Sampling
- XLNET-Base + Nucleus Sampling (nucleus sampling is sketched after this list)
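The last two augmentation notebooks decode with nucleus (top-p) sampling: draw from the smallest set of tokens whose cumulative probability reaches p. A minimal NumPy sketch of the idea, independent of the notebooks' exact implementation:

```python
import numpy as np

def nucleus_sample(logits, p=0.9, rng=None):
    """Sample a token id from the smallest set of tokens with total probability >= p."""
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                        # token ids, most probable first
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    keep = order[:cutoff]                                  # the "nucleus"
    return rng.choice(keep, p=probs[keep] / probs[keep].sum())
```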
### Text Classification

Trained on English sentiment dataset; accuracy table in text-classification.
Complete list (79 notebooks)
- Basic cell RNN
- Basic cell RNN + Hinge
- Basic cell RNN + Huber
- Basic cell Bidirectional RNN
- Basic cell Bidirectional RNN + Hinge
- Basic cell Bidirectional RNN + Huber
- LSTM cell RNN
- LSTM cell RNN + Hinge
- LSTM cell RNN + Huber
- LSTM cell Bidirectional RNN
- LSTM cell Bidirectional RNN + Huber
- LSTM cell RNN + Dropout + L2
- GRU cell RNN
- GRU cell RNN + Hinge
- GRU cell RNN + Huber
- GRU cell Bidirectional RNN
- GRU cell Bidirectional RNN + Hinge
- GRU cell Bidirectional RNN + Huber
- LSTM RNN + Conv2D
- K-max Conv1d
- LSTM RNN + Conv1D + Highway
- LSTM RNN + Basic Attention
- LSTM Dilated RNN
- Layer-Norm LSTM cell RNN
- Only Attention Neural Network
- Multihead-Attention Neural Network
- Neural Turing Machine
- LSTM Seq2Seq
- LSTM Seq2Seq + Luong Attention
- LSTM Seq2Seq + Bahdanau Attention
- LSTM Seq2Seq + Beam Decoder
- LSTM Bidirectional Seq2Seq
- Pointer Net
- LSTM cell RNN + Bahdanau Attention
- LSTM cell RNN + Luong Attention
- LSTM cell RNN + Stack Bahdanau Luong Attention
- LSTM cell Bidirectional RNN + backward Bahdanau + forward Luong
- Bytenet
- Fast-slow LSTM
- Siamese Network
- LSTM Seq2Seq + tf.estimator
- Capsule layers + RNN LSTM
- Capsule layers + LSTM Seq2Seq
- Capsule layers + LSTM Bidirectional Seq2Seq
- Nested LSTM
- LSTM Seq2Seq + Highway
- Triplet loss + LSTM
- DNC (Differentiable Neural Computer)
- ConvLSTM
- Temporal Convd Net
- Batch-all Triplet-loss + LSTM
- Fast-text
- Gated Convolution Network
- Simple Recurrent Unit
- LSTM Hierarchical Attention Network
- Bidirectional Transformers
- Dynamic Memory Network
- Entity Network
- End-to-End Memory Network
- BOW-Chars Deep sparse Network
- Residual Network using Atrous CNN
- Residual Network using Atrous CNN + Bahdanau Attention
- Deep pyramid CNN
- Transformer-XL
- Transfer learning GPT-2 345M
- Quasi-RNN
- Tacotron
- Slice GRU
- Slice GRU + Bahdanau
- Wavenet
- Transfer learning BERT Base
- Transfer learning XL-net Large
- LSTM BiRNN global Max and average pooling
- Transfer learning BERT Base drop 6 layers
- Transfer learning BERT Large drop 12 layers
- Transfer learning XL-net Base
- Transfer learning ALBERT
- Transfer learning ELECTRA Base
- Transfer learning ELECTRA Large
### Text Similarity

Trained on MNLI.
Complete list (10 notebooks)
- BiRNN + Contrastive loss, test accuracy 73.032% (contrastive loss is sketched after this list)
- BiRNN + Cross entropy, test accuracy 74.265%
- BiRNN + Circle loss, test accuracy 75.857%
- BiRNN + Proxy loss, test accuracy 48.37%
- BERT Base + Cross entropy, test accuracy 91.123%
- BERT Base + Circle loss, test accuracy 89.903%
- ELECTRA Base + Cross entropy, test accuracy 96.317%
- ELECTRA Base + Circle loss, test accuracy 95.603%
- XLNET Base + Cross entropy, test accuracy 93.998%
- XLNET Base + Circle loss, test accuracy 94.033%
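For reference, the contrastive loss named in the first notebook is, in its classic form (Hadsell et al.), a margin loss over pair distances; a minimal sketch, where the notebook's exact margin and distance measure may differ:

```python
import numpy as np

def contrastive_loss(distance, is_similar, margin=1.0):
    """distance: embedding distance per pair; is_similar: 1 for matching pairs, 0 otherwise."""
    pos = is_similar * distance ** 2                                   # pull similar pairs together
    neg = (1 - is_similar) * np.maximum(margin - distance, 0.0) ** 2   # push dissimilar pairs apart
    return np.mean(pos + neg)
```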
### Text-to-Speech

Trained on Toronto speech dataset.
Complete list (8 notebooks)
- Tacotron, https://github.com/Kyubyong/tacotron
- CNN Seq2seq + Dilated CNN vocoder
- Seq2Seq + Bahdanau Attention
- Seq2Seq + Luong Attention
- Dilated CNN + Monotonic Attention + Dilated CNN vocoder
- Dilated CNN + Self Attention + Dilated CNN vocoder
- Deep CNN + Monotonic Attention + Dilated CNN vocoder
- Deep CNN + Self Attention + Dilated CNN vocoder
### Topic Generator

Trained on Malaysia news.
Complete list (4 notebooks)
- TAT-LSTM
- TAV-LSTM
- MTA-LSTM
- Dilated CNN Seq2seq
### Topic Modeling

Extracted from English sentiment dataset.
Complete list (3 notebooks)
- LDA2Vec
- BERT Attention
- XLNET Attention
### Unsupervised Extractive Summarization

Trained on random books.
Complete list (3 notebooks)
- Skip-thought Vector
- Residual Network using Atrous CNN
- Residual Network using Atrous CNN + Bahdanau Attention
### Vectorizer

Trained on English sentiment dataset.
Complete list (11 notebooks)
- Word Vector using CBOW sample softmax
- Word Vector using CBOW noise contrastive estimation
- Word Vector using skipgram sample softmax
- Word Vector using skipgram noise contrastive estimation
- Supervised Embedded
- Triplet-loss + LSTM
- LSTM Auto-Encoder
- Batch-All Triplet-loss LSTM
- Fast-text
- ELMO (biLM)
- Triplet-loss + BERT
### Visualization

Complete list (4 notebooks)
- Attention heatmap on Bahdanau Attention
- Attention heatmap on Luong Attention
- BERT attention, https://github.com/hsm207/bert_attn_viz
- XLNET attention
### Old-to-Young Vocoder

Trained on Toronto speech dataset.
Complete list (1 notebook)
- Dilated CNN
### Attention

Complete list (8 notebooks). A short sketch of the Bahdanau and Luong score functions follows the list.
- Bahdanau
- Luong
- Hierarchical
- Additive
- Soft
- Attention-over-Attention
- Bahdanau API
- Luong API
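The two classic score functions behind the first two notebooks compare the decoder query against encoder keys differently: Bahdanau is additive, Luong is multiplicative. A minimal NumPy sketch (shapes and parameter names are mine, not the notebooks'):

```python
import numpy as np

def bahdanau_score(query, keys, W1, W2, v):
    """Additive attention: score_i = v^T tanh(W1 q + W2 k_i)."""
    return np.tanh(query @ W1 + keys @ W2) @ v   # -> [num_keys]

def luong_score(query, keys, W):
    """Multiplicative attention: score_i = q^T W k_i."""
    return keys @ (W @ query)                    # -> [num_keys]

def attend(scores, values):
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over encoder positions
    return weights @ values                      # context vector
```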
### Not-deep-learning

- Markov chatbot
- Decomposition summarization (3 notebooks)