Bidirectional Encoder Representations from Transformers (BERT) is a pre-training technique for natural language processing (NLP) developed by Google.[1][2] BERT was created and published in 2018 by Jacob Devlin and his colleagues at Google. Google uses BERT to better understand the semantics of users' search queries.[3] A 2020 literature survey concluded that "in a little over a year, BERT has become a ubiquitous baseline in NLP experiments", counting over 150 research publications that analyze and improve the model.[4]
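To make "bidirectional encoder representations" concrete, the minimal sketch below extracts a contextual vector for every token of a sentence from a pretrained BERT encoder. The Hugging Face transformers library, the bert-base-uncased checkpoint, and the example sentence are illustrative assumptions, not something specified in this article.

```python
# Minimal sketch, assuming the Hugging Face "transformers" and "torch" packages
# and the public "bert-base-uncased" checkpoint (neither is prescribed above).
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

# Tokenize a sentence and run it through the encoder. Each token receives a
# vector conditioned on both its left and right context, which is what the
# "bidirectional" in BERT refers to.
inputs = tokenizer("The bank raised interest rates.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Shape: (batch_size, sequence_length, hidden_size); hidden_size is 768 for bert-base.
print(outputs.last_hidden_state.shape)
```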
Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2018-10-11. arXiv:1810.04805v2 [cs.CL].
Kovaleva, Olga; Romanov, Alexey; Rogers, Anna; Rumshisky, Anna. Revealing the Dark Secrets of BERT. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). November 2019: 4364–4373 [2020-10-19]. doi:10.18653/v1/D19-1445. (Archived from the original on 2020-10-20) (in American English).
Clark, Kevin; Khandelwal, Urvashi; Levy, Omer; Manning, Christopher D. What Does BERT Look at? An Analysis of BERT's Attention. Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2019: 276–286.
Khandelwal, Urvashi; He, He; Qi, Peng; Jurafsky, Dan. Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 284–294. Bibcode:2018arXiv180504623K. arXiv:1805.04623. doi:10.18653/v1/p18-1027.
Gulordava, Kristina; Bojanowski, Piotr; Grave, Edouard; Linzen, Tal; Baroni, Marco. Colorless Green Recurrent Networks Dream Hierarchically. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 1195–1205. Bibcode:2018arXiv180311138G. arXiv:1803.11138. doi:10.18653/v1/n18-1108.
Giulianelli, Mario; Harding, Jack; Mohnert, Florian; Hupkes, Dieuwke; Zuidema, Willem. Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 240–248. Bibcode:2018arXiv180808079G. arXiv:1808.08079. doi:10.18653/v1/w18-5426.
Zhang, Kelly; Bowman, Samuel. Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 359–361. doi:10.18653/v1/w18-5448.
Dai, Andrew; Le, Quoc. Semi-supervised Sequence Learning. 2015-11-04. arXiv:1511.01432 [cs.LG].
Peters, Matthew; Neumann, Mark; Iyyer, Mohit; Gardner, Matt; Clark, Christopher; Lee, Kenton; Zettlemoyer, Luke. Deep contextualized word representations. 2018-02-15. arXiv:1802.05365v2 [cs.CL].
Howard, Jeremy; Ruder, Sebastian. Universal Language Model Fine-tuning for Text Classification. 2018-01-18. arXiv:1801.06146v5 [cs.CL].