We provide several methods for sentence-alignment of texts with different complexity levels. Using the best of them, we sentence-align the Newsela corpora, thus providing large training materials for automatic text simplification (ATS) systems. We show that using this dataset, even the standard phrase-based statistical machine translation models for ATS can outperform the state-of-the-art ATS systems.
Sanja Štajner, Marc Franco-Salvador, Simone Paolo Ponzetto, Paolo Rosso, and Heiner Stuckenschmidt. 2017. Sentence Alignment Methods for Improving Text Simplification Systems. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 97–102, Vancouver, Canada. Association for Computational Linguistics.
@inproceedings{stajner-etal-2017-sentence,
    title = "Sentence Alignment Methods for Improving Text Simplification Systems",
    author = "{\v{S}}tajner, Sanja and
      Franco-Salvador, Marc and
      Ponzetto, Simone Paolo and
      Rosso, Paolo and
      Stuckenschmidt, Heiner",
    editor = "Barzilay, Regina and
      Kan, Min-Yen",
    booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
    month = jul,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P17-2016/",
    doi = "10.18653/v1/P17-2016",
    pages = "97--102",
    abstract = "We provide several methods for sentence-alignment of texts with different complexity levels. Using the best of them, we sentence-align the Newsela corpora, thus providing large training materials for automatic text simplification (ATS) systems. We show that using this dataset, even the standard phrase-based statistical machine translation models for ATS can outperform the state-of-the-art ATS systems."
}
%0 Conference Proceedings
%T Sentence Alignment Methods for Improving Text Simplification Systems
%A Štajner, Sanja
%A Franco-Salvador, Marc
%A Ponzetto, Simone Paolo
%A Rosso, Paolo
%A Stuckenschmidt, Heiner
%Y Barzilay, Regina
%Y Kan, Min-Yen
%S Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
%D 2017
%8 July
%I Association for Computational Linguistics
%C Vancouver, Canada
%F stajner-etal-2017-sentence
%X We provide several methods for sentence-alignment of texts with different complexity levels. Using the best of them, we sentence-align the Newsela corpora, thus providing large training materials for automatic text simplification (ATS) systems. We show that using this dataset, even the standard phrase-based statistical machine translation models for ATS can outperform the state-of-the-art ATS systems.
%R 10.18653/v1/P17-2016
%U https://aclanthology.org/P17-2016/
%U https://doi.org/10.18653/v1/P17-2016
%P 97-102