Training of neural machine translation (NMT) models usually uses mini-batches for efficiency purposes. During the mini-batched training process, it is necessary to pad shorter sentences in a mini-batch to be equal in length to the longest sentence therein for efficient computation. Previous work has noted that sorting the corpus based on the sentence length before making mini-batches reduces the amount of padding and increases the processing speed. However, despite the fact that mini-batch creation is an essential step in NMT training, widely used NMT toolkits implement disparate strategies for doing so, which have not been empirically validated or compared. This work investigates mini-batch creation strategies with experiments over two different datasets. Our results suggest that the choice of a mini-batch creation strategy has a large effect on NMT training and some length-based sorting strategies do not always work well compared with simple shuffling.
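The padding effect described in the abstract can be illustrated with a small sketch. This is not the paper's implementation or any toolkit's API: the function names (`make_batches`, `padding_tokens`, `shuffled_batches`, `length_sorted_batches`), the synthetic corpus, and the batch size of 32 are all assumptions made for illustration. It only counts how many pad tokens each strategy needs, and says nothing about the effect on translation quality or convergence that the paper studies.

```python
import random


def make_batches(sentences, batch_size):
    """Split a list of token lists into consecutive mini-batches."""
    return [sentences[i:i + batch_size]
            for i in range(0, len(sentences), batch_size)]


def padding_tokens(batches):
    """Count the pad tokens needed to bring every sentence up to the
    length of the longest sentence in its own mini-batch."""
    total = 0
    for batch in batches:
        longest = max(len(s) for s in batch)
        total += sum(longest - len(s) for s in batch)
    return total


def shuffled_batches(sentences, batch_size, seed=0):
    """Simple shuffling: randomize sentence order, then cut into batches."""
    order = sentences[:]
    random.Random(seed).shuffle(order)
    return make_batches(order, batch_size)


def length_sorted_batches(sentences, batch_size, seed=0):
    """Length-based sorting: sort by sentence length, cut into batches,
    then shuffle the order of the batches (not their contents)."""
    order = sorted(sentences, key=len)
    batches = make_batches(order, batch_size)
    random.Random(seed).shuffle(batches)
    return batches


# Hypothetical toy corpus: 1000 "sentences" of 1 to 50 placeholder tokens.
rng = random.Random(1)
corpus = [["tok"] * rng.randint(1, 50) for _ in range(1000)]

pad_shuffled = padding_tokens(shuffled_batches(corpus, 32))
pad_sorted = padding_tokens(length_sorted_batches(corpus, 32))
print("pad tokens, shuffled:     ", pad_shuffled)
print("pad tokens, length-sorted:", pad_sorted)
```

On a corpus like this the length-sorted batches need far fewer pad tokens, which is the speed argument the abstract attributes to previous work. The sketch sorts on a single length only; a real NMT corpus has source and target lengths, and how to combine them is one of the choices on which toolkits differ.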
@inproceedings{morishita-etal-2017-empirical,
    title = "An Empirical Study of Mini-Batch Creation Strategies for Neural Machine Translation",
    author = "Morishita, Makoto and Oda, Yusuke and Neubig, Graham and Yoshino, Koichiro and Sudoh, Katsuhito and Nakamura, Satoshi",
    editor = "Luong, Thang and Birch, Alexandra and Neubig, Graham and Finch, Andrew",
    booktitle = "Proceedings of the First Workshop on Neural Machine Translation",
    month = aug,
    year = "2017",
    address = "Vancouver",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W17-3208/",
    doi = "10.18653/v1/W17-3208",
    pages = "61--68",
    abstract = "Training of neural machine translation (NMT) models usually uses mini-batches for efficiency purposes. During the mini-batched training process, it is necessary to pad shorter sentences in a mini-batch to be equal in length to the longest sentence therein for efficient computation. Previous work has noted that sorting the corpus based on the sentence length before making mini-batches reduces the amount of padding and increases the processing speed. However, despite the fact that mini-batch creation is an essential step in NMT training, widely used NMT toolkits implement disparate strategies for doing so, which have not been empirically validated or compared. This work investigates mini-batch creation strategies with experiments over two different datasets. Our results suggest that the choice of a mini-batch creation strategy has a large effect on NMT training and some length-based sorting strategies do not always work well compared with simple shuffling."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="morishita-etal-2017-empirical">
    <titleInfo>
        <title>An Empirical Study of Mini-Batch Creation Strategies for Neural Machine Translation</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Makoto</namePart>
        <namePart type="family">Morishita</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Yusuke</namePart>
        <namePart type="family">Oda</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Graham</namePart>
        <namePart type="family">Neubig</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Koichiro</namePart>
        <namePart type="family">Yoshino</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Katsuhito</namePart>
        <namePart type="family">Sudoh</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Satoshi</namePart>
        <namePart type="family">Nakamura</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2017-08</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the First Workshop on Neural Machine Translation</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Thang</namePart>
            <namePart type="family">Luong</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Alexandra</namePart>
            <namePart type="family">Birch</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Graham</namePart>
            <namePart type="family">Neubig</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Andrew</namePart>
            <namePart type="family">Finch</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Vancouver</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Training of neural machine translation (NMT) models usually uses mini-batches for efficiency purposes. During the mini-batched training process, it is necessary to pad shorter sentences in a mini-batch to be equal in length to the longest sentence therein for efficient computation. Previous work has noted that sorting the corpus based on the sentence length before making mini-batches reduces the amount of padding and increases the processing speed. However, despite the fact that mini-batch creation is an essential step in NMT training, widely used NMT toolkits implement disparate strategies for doing so, which have not been empirically validated or compared. This work investigates mini-batch creation strategies with experiments over two different datasets. Our results suggest that the choice of a mini-batch creation strategy has a large effect on NMT training and some length-based sorting strategies do not always work well compared with simple shuffling.</abstract>
    <identifier type="citekey">morishita-etal-2017-empirical</identifier>
    <identifier type="doi">10.18653/v1/W17-3208</identifier>
    <location>
        <url>https://aclanthology.org/W17-3208/</url>
    </location>
    <part>
        <date>2017-08</date>
        <extent unit="page">
            <start>61</start>
            <end>68</end>
        </extent>
    </part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T An Empirical Study of Mini-Batch Creation Strategies for Neural Machine Translation
%A Morishita, Makoto
%A Oda, Yusuke
%A Neubig, Graham
%A Yoshino, Koichiro
%A Sudoh, Katsuhito
%A Nakamura, Satoshi
%Y Luong, Thang
%Y Birch, Alexandra
%Y Neubig, Graham
%Y Finch, Andrew
%S Proceedings of the First Workshop on Neural Machine Translation
%D 2017
%8 August
%I Association for Computational Linguistics
%C Vancouver
%F morishita-etal-2017-empirical
%X Training of neural machine translation (NMT) models usually uses mini-batches for efficiency purposes. During the mini-batched training process, it is necessary to pad shorter sentences in a mini-batch to be equal in length to the longest sentence therein for efficient computation. Previous work has noted that sorting the corpus based on the sentence length before making mini-batches reduces the amount of padding and increases the processing speed. However, despite the fact that mini-batch creation is an essential step in NMT training, widely used NMT toolkits implement disparate strategies for doing so, which have not been empirically validated or compared. This work investigates mini-batch creation strategies with experiments over two different datasets. Our results suggest that the choice of a mini-batch creation strategy has a large effect on NMT training and some length-based sorting strategies do not always work well compared with simple shuffling.
%R 10.18653/v1/W17-3208
%U https://aclanthology.org/W17-3208/
%U https://doi.org/10.18653/v1/W17-3208
%P 61-68