Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks.
We provide reference implementations of various sequence modeling papers:
List of implemented papers
- Convolutional Neural Networks (CNN)
- Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)
- Convolutional Sequence to Sequence Learning (Gehring et al., 2017)
- Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)
- Hierarchical Neural Story Generation (Fan et al., 2018)
- wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)
- LightConv and DynamicConv models
- Long Short-Term Memory (LSTM) networks
- Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015)
- Transformer (self-attention) networks
- Attention Is All You Need (Vaswani et al., 2017)
- Scaling Neural Machine Translation (Ott et al., 2018)
- Understanding Back-Translation at Scale (Edunov et al., 2018)
- Adaptive Input Representations for Neural Language Modeling (Baevski and Auli, 2018)
- Lexically constrained decoding with dynamic beam allocation (Post & Vilar, 2018)
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (Dai et al., 2019)
- Adaptive Attention Span in Transformers (Sukhbaatar et al., 2019)
- Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)
- RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)
- Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)
- Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)
- Multilingual Denoising Pre-training for Neural Machine Translation (Liu et al., 2020)
- Neural Machine Translation with Byte-Level Subwords (Wang et al., 2020)
- Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)
- Generating Medical Reports from Patient-Doctor Conversations Using Sequence-to-Sequence Models (Enarvi et al., 2020)
- Linformer: Self-Attention with Linear Complexity (Wang et al., 2020)
- Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)
- Deep Transformers with Latent Depth (Li et al., 2020)
- Non-autoregressive Transformers
- Non-Autoregressive Neural Machine Translation (Gu et al., 2017)
- Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement (Lee et al., 2018)
- Insertion Transformer: Flexible Sequence Generation via Insertion Operations (Stern et al., 2019)
- Mask-Predict: Parallel Decoding of Conditional Masked Language Models (Ghazvininejad et al., 2019)
- Levenshtein Transformer (Gu et al., 2019)
- Finetuning
- Better Fine-Tuning by Reducing Representational Collapse (Aghajanyan et al., 2020)
What's New:
- December 2020: GottBERT model and code released
- November 2020: Adopted the Hydra configuration framework
- November 2020: fairseq 0.10.0 released
- October 2020: Added R3F/R4F (Better Fine-Tuning) code
- October 2020: Deep Transformer with Latent Depth code released
- October 2020: Added CRISS models and code
- September 2020: Added Linformer code
- September 2020: Added pointer-generator networks
- August 2020: Added lexically constrained decoding
- August 2020: wav2vec2 models and code released
- July 2020: Unsupervised Quality Estimation code released
Previous updates
- May 2020: Follow fairseq on Twitter
- April 2020: Monotonic Multihead Attention code released
- April 2020: Quant-Noise code released
- April 2020: Initial model parallel support and 11B parameters unidirectional LM released
- March 2020: Byte-level BPE code released
- February 2020: mBART model and code released
- February 2020: Added tutorial for back-translation
- December 2019: fairseq 0.9.0 released
- November 2019: VizSeq released (a visual analysis toolkit for evaluating fairseq models)
- November 2019: CamemBERT model and code released
- November 2019: BART model and code released
- November 2019: XLM-R models and code released
- September 2019: Nonautoregressive translation code released
- August 2019: WMT'19 models released
- July 2019: fairseq relicensed under MIT license
- July 2019: RoBERTa models and code released
- June 2019: wav2vec models and code released
Features:
- multi-GPU training on one machine or across multiple machines (data and model parallel)
- fast generation on both CPU and GPU with multiple search algorithms implemented:
- beam search
- Diverse Beam Search (Vijayakumar et al., 2016)
- sampling (unconstrained, top-k and top-p/nucleus)
- lexically constrained decoding (Post & Vilar, 2018)
- gradient accumulation enables training with large mini-batches even on a single GPU (see the sketch after this list)
- mixed precision training (trains faster with less GPU memory on NVIDIA tensor cores)
- extensible: easily register new models, criterions, tasks, optimizers and learning rate schedulers
- flexible configuration based on Hydra allowing a combination of code, command-line and file based configuration
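To illustrate what gradient accumulation buys you, here is a minimal sketch in plain PyTorch (not fairseq's trainer code): gradients from several small batches are accumulated before a single optimizer step, giving the effect of a larger mini-batch. The toy model, random data and the `update_freq` value are hypothetical placeholders.

```python
import torch

# Hypothetical toy setup: a linear model and random data stand in for a real task.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

update_freq = 4  # accumulate gradients over 4 small batches before each update
optimizer.zero_grad()
for step in range(100):
    x = torch.randn(8, 10)                      # small per-step batch
    y = torch.randint(0, 2, (8,))
    loss = loss_fn(model(x), y) / update_freq   # scale so the accumulated gradient is an average
    loss.backward()                             # gradients accumulate in .grad across iterations
    if (step + 1) % update_freq == 0:
        optimizer.step()                        # one update with an effective batch of 8 * update_freq
        optimizer.zero_grad()
```

In fairseq itself this behavior is controlled through the `--update-freq` option rather than written by hand.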
We also provide pre-trained models for translation and language modeling with a convenient torch.hub interface:
```python
import torch

en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de.single_model')
en2de.translate('Hello world', beam=5)  # 'Hallo Welt'
```
See the PyTorch Hub tutorials for translation and RoBERTa for more examples.
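As a companion to the translation example above, here is a hedged sketch of loading RoBERTa through the same torch.hub interface. The 'roberta.large' hub entry and the encode/extract_features calls follow the pattern shown in the RoBERTa tutorial, but exact method names and outputs should be checked against that tutorial.

```python
import torch

# Sketch only: assumes the 'roberta.large' hub entry and its encode/extract_features API.
roberta = torch.hub.load('pytorch/fairseq', 'roberta.large')
roberta.eval()  # disable dropout for deterministic features

tokens = roberta.encode('Hello world!')       # BPE-encode into a tensor of token ids
features = roberta.extract_features(tokens)   # last-layer features, roughly (1, seq_len, 1024)
print(tokens)
print(features.shape)
```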
- PyTorch version >= 1.5.0
- Python version >= 3.6
- For training new models, you'll also need an NVIDIA GPU and NCCL
- To install fairseq and develop locally:
```bash
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./

# on MacOS:
# CFLAGS="-stdlib=libc++" pip install --editable ./

# to install the latest stable release (0.10.x)
# pip install fairseq
```
- For faster training install NVIDIA's apex library:
```bash
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./
```
- For large datasets install PyArrow:
```bash
pip install pyarrow
```
- If you use Docker make sure to increase the shared memory size either with `--ipc=host` or `--shm-size` as command line options to `nvidia-docker run`.
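After installing, a quick way to confirm the environment meets the requirements above is a small Python check. This is only a convenience sketch, not part of the official instructions; the `fairseq.__version__` attribute is assumed to exist and may be absent in some releases.

```python
import torch

# Check the PyTorch and GPU requirements stated above.
print('PyTorch:', torch.__version__)                  # should be >= 1.5.0
print('CUDA available:', torch.cuda.is_available())   # needed for training new models

import fairseq
# __version__ is assumed; a successful import already confirms the installation.
print('fairseq:', getattr(fairseq, '__version__', 'unknown'))
```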
The full documentation contains instructions for getting started, training new models and extending fairseq with new model types and tasks.
We provide pre-trained models and pre-processed, binarized test sets for several tasks listed below, as well as example training and evaluation commands.
- Translation: convolutional and transformer models are available
- Language Modeling: convolutional and transformer models are available (see the torch.hub sketch after this list)
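To show what using one of these pre-trained language models looks like in practice, here is a hedged torch.hub sketch. The hub identifier 'transformer_lm.wmt19.en', the tokenizer/bpe keyword arguments and the sampling keywords are recalled from the language modeling examples and may differ between releases, so treat this as illustrative rather than canonical.

```python
import torch

# Assumed hub entry and keyword arguments; check the language modeling README for current names.
en_lm = torch.hub.load('pytorch/fairseq', 'transformer_lm.wmt19.en',
                       tokenizer='moses', bpe='fastbpe')
en_lm.eval()  # disable dropout

# Sample a continuation with top-k sampling (keyword names assumed).
print(en_lm.sample('Barack Obama', beam=1, sampling=True, sampling_topk=10, temperature=0.8))
```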
We also have more detailed READMEs to reproduce results from specific papers:
- Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)
- Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)
- Training with Quantization Noise for Extreme Model Compression ({Fan*, Stock*} et al., 2020)
- Neural Machine Translation with Byte-Level Subwords (Wang et al., 2020)
- Multilingual Denoising Pre-training for Neural Machine Translation (Liu et al., 2020)
- Reducing Transformer Depth on Demand with Structured Dropout (Fan et al., 2019)
- Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)
- Levenshtein Transformer (Gu et al., 2019)
- Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)
- RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)
- wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)
- Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)
- Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)
- Understanding Back-Translation at Scale (Edunov et al., 2018)
- Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)
- Hierarchical Neural Story Generation (Fan et al., 2018)
- Scaling Neural Machine Translation (Ott et al., 2018)
- Convolutional Sequence to Sequence Learning (Gehring et al., 2017)
- Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)
- Twitter: https://twitter.com/fairseq
- Facebook page: https://www.facebook.com/groups/fairseq.users
- Google group: https://groups.google.com/forum/#!forum/fairseq-users
fairseq(-py) is MIT-licensed. The license applies to the pre-trained models as well.
Please cite as:
```bibtex
@inproceedings{ott2019fairseq,
  title     = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
  author    = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
  booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
  year      = {2019},
}
```