Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commitbcc73f7

Browse files
authored
n_heads × d_head -> d_head × d_head in DeltaNet (#903)
Clarified the explanation of the memory size calculation for `KV_cache_DeltaNet` and updated the quadratic term from `n_heads × d_head` to `d_head × d_head`.
1 parent488bef7 commitbcc73f7

File tree

1 file changed

+1
-1
lines changed

1 file changed

+1
-1
lines changed

‎ch04/08_deltanet/README.md‎

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -331,7 +331,7 @@ For the simplified DeltaNet version implemented above, we have:
331331
KV_cache_DeltaNet = batch_size × n_heads × d_head × d_head × bytes
332332
```
333333

334-
Note that the`KV_cache_DeltaNet` memory size doesn't have a context length (`n_tokens`) dependency. Also, we have only the memory state S that we store instead of separate keys and values, hence`2 × bytes` becomes just`bytes`. However, note that we now have a quadratic`n_heads × d_head` in here. This comes from the state :
334+
Note that the`KV_cache_DeltaNet` memory size doesn't have a context length (`n_tokens`) dependency. Also, we have only the memory state S that we store instead of separate keys and values, hence`2 × bytes` becomes just`bytes`. However, note that we now have a quadratic`d_head × d_head` in here. This comes from the state :
335335

336336
```
337337
S = x.new_zeros(b, self.num_heads, self.head_dim, self.head_dim)

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp