

Since its inception in "Attention Is All You Need", the transformer architecture has led to revolutionary advancements in NLP. The attention layer within the transformer admits a sequence of input tokens $X$ and makes them interact through pairwise similarities computed as $\mathrm{softmax}(XQK^\top X^\top)$, where $(K, Q)$ are the trainable key-query parameters. In this work, we establish a formal equivalence
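
To illustrate the attention similarity computation described above, here is a minimal NumPy sketch of $\mathrm{softmax}(XQK^\top X^\top)$ for a toy token sequence. The shapes, the random parameters, and the `softmax` helper are assumptions chosen for the example, not details taken from the paper.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: n tokens, each of dimension d; K and Q are (d, d) trainable parameters.
rng = np.random.default_rng(0)
n, d = 4, 8
X = rng.standard_normal((n, d))   # input token sequence
Q = rng.standard_normal((d, d))   # trainable query parameters
K = rng.standard_normal((d, d))   # trainable key parameters

# Pairwise similarities: softmax(X Q K^T X^T); row i holds token i's attention weights.
A = softmax(X @ Q @ K.T @ X.T, axis=-1)
print(A.shape)                    # (4, 4); each row sums to 1
```

Row $i$ of the resulting matrix gives the attention weights token $i$ places on every token in the sequence, which is the pairwise interaction the abstract refers to.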







