Bidirectional Encoder Representations from Transformers (BERT) is a pre-training technique for natural language processing (NLP) developed by Google.[1][2] BERT was created and published in 2018 by Jacob Devlin and his colleagues at Google. Google uses BERT to better understand the semantics of users' search queries.[3] A 2020 literature survey concluded that "in a little over a year, BERT has become a ubiquitous baseline in NLP experiments", counting over 150 research publications that analyze and improve the model.[4]
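To make "bidirectional encoder representations" concrete, the minimal sketch below extracts a contextual vector for every token of a sentence from a pretrained BERT encoder. The Hugging Face transformers library, the bert-base-uncased checkpoint, and the example sentence are illustrative assumptions, not something specified in this article.

```python
# Minimal sketch, assuming the Hugging Face "transformers" and "torch" packages
# and the public "bert-base-uncased" checkpoint (neither is prescribed above).
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

# Tokenize a sentence and run it through the encoder. Each token receives a
# vector conditioned on both its left and right context, which is what the
# "bidirectional" in BERT refers to.
inputs = tokenizer("The bank raised interest rates.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Shape: (batch_size, sequence_length, hidden_size); hidden_size is 768 for bert-base.
print(outputs.last_hidden_state.shape)
```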
Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2018-10-11. arXiv:1810.04805v2 [cs.CL].
Kovaleva, Olga; Romanov, Alexey; Rogers, Anna; Rumshisky, Anna. Revealing the Dark Secrets of BERT. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). November 2019: 4364–4373 [2020-10-19]. doi:10.18653/v1/D19-1445. (Archived from the original on 2020-10-20) (in American English).
Clark, Kevin; Khandelwal, Urvashi; Levy, Omer; Manning, Christopher D. What Does BERT Look at? An Analysis of BERT's Attention. Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2019: 276–286.
Khandelwal, Urvashi; He, He; Qi, Peng; Jurafsky, Dan. Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 284–294. Bibcode:2018arXiv180504623K. arXiv:1805.04623. doi:10.18653/v1/p18-1027.
Gulordava, Kristina; Bojanowski, Piotr; Grave, Edouard; Linzen, Tal; Baroni, Marco. Colorless Green Recurrent Networks Dream Hierarchically. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 1195–1205. Bibcode:2018arXiv180311138G. arXiv:1803.11138. doi:10.18653/v1/n18-1108.
Giulianelli, Mario; Harding, Jack; Mohnert, Florian; Hupkes, Dieuwke; Zuidema, Willem. Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 240–248. Bibcode:2018arXiv180808079G. arXiv:1808.08079. doi:10.18653/v1/w18-5426.
Zhang, Kelly; Bowman, Samuel. Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 359–361. doi:10.18653/v1/w18-5448.
Dai, Andrew; Le, Quoc. Semi-supervised Sequence Learning. 2015-11-04. arXiv:1511.01432 [cs.LG].
Peters, Matthew; Neumann, Mark; Iyyer, Mohit; Gardner, Matt; Clark, Christopher; Lee, Kenton; Zettlemoyer, Luke. Deep contextualized word representations. 2018-02-15. arXiv:1802.05365v2 [cs.CL].
Howard, Jeremy; Ruder, Sebastian. Universal Language Model Fine-tuning for Text Classification. 2018-01-18. arXiv:1801.06146v5 [cs.CL].