Intuition about projecting input embedding tokens to queries, keys, and values #919

krishnan-duraisamy started this conversation in General

In Section 3.4.1 we have this definition: "These three matrices are used to project the embedded input tokens, x(i), into query, key, and value vectors, respectively, as illustrated in figure 3.14."

These weight matrices are later initialized to random tensors like so:

```python
torch.manual_seed(123)
W_query = torch.nn.Parameter(torch.rand(d_in, d_out), requires_grad=False)
W_key   = torch.nn.Parameter(torch.rand(d_in, d_out), requires_grad=False)
W_value = torch.nn.Parameter(torch.rand(d_in, d_out), requires_grad=False)
```
  • Where, and how, is the input then projected through these respective weight matrices?
  • Is part of the intuition behind the split also to reduce dimensionality?
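For reference, the projection the question asks about is a plain matrix multiplication of the embedded token with each weight matrix. A minimal sketch (the dimensions `d_in = 3` and `d_out = 2` here are made up for illustration, not taken from the book's listing):

```python
import torch

torch.manual_seed(123)

# Hypothetical dimensions for illustration
d_in, d_out = 3, 2

# The three trainable projection matrices from the question
W_query = torch.nn.Parameter(torch.rand(d_in, d_out), requires_grad=False)
W_key   = torch.nn.Parameter(torch.rand(d_in, d_out), requires_grad=False)
W_value = torch.nn.Parameter(torch.rand(d_in, d_out), requires_grad=False)

# A single embedded input token x(i), a d_in-dimensional vector
x_i = torch.rand(d_in)

# The "projection" is matrix multiplication: each weight matrix maps
# the d_in-dimensional embedding to a d_out-dimensional vector
query = x_i @ W_query
key   = x_i @ W_key
value = x_i @ W_value

print(query.shape)  # torch.Size([2])
```

Note that because `d_out < d_in` in this sketch, the projection does reduce dimensionality; whether that is part of the intent depends on how the author chooses the dimensions.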
Replies: 0 comments
