Commit bcc73f7

n_heads × d_head -> d_head × d_head in DeltaNet (#903)

Clarified the explanation of the memory size calculation for `KV_cache_DeltaNet` and updated the quadratic term from `n_heads × d_head` to `d_head × d_head`.

1 parent 488bef7, commit bcc73f7

File tree

1 file changed

- ch04/08_deltanet/README.md

`ch04/08_deltanet/README.md`

Lines changed: 1 addition & 1 deletion

    @@ -331,7 +331,7 @@ For the simplified DeltaNet version implemented above, we have:
     KV_cache_DeltaNet = batch_size × n_heads × d_head × d_head × bytes
     ```
     
    -Note that the `KV_cache_DeltaNet` memory size doesn't have a context length (`n_tokens`) dependency. Also, we have only the memory state S that we store instead of separate keys and values, hence `2 × bytes` becomes just `bytes`. However, note that we now have a quadratic `n_heads × d_head` in here. This comes from the state :
    +Note that the `KV_cache_DeltaNet` memory size doesn't have a context length (`n_tokens`) dependency. Also, we have only the memory state S that we store instead of separate keys and values, hence `2 × bytes` becomes just `bytes`. However, note that we now have a quadratic `d_head × d_head` in here. This comes from the state :
     
     ```
     S = x.new_zeros(b, self.num_heads, self.head_dim, self.head_dim)
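
The corrected wording can be sanity-checked with a few lines of arithmetic. The sketch below is not part of the commit. It assumes the DeltaNet formula from the hunk above, and for comparison it assumes a standard per-layer KV cache of `batch_size × n_tokens × n_heads × d_head × 2 × bytes`, which is what the `2 × bytes` remark in the README text suggests. The function names, the configuration values, and the 2-byte element size are hypothetical and chosen only for illustration.

```python
def deltanet_state_bytes(batch_size, n_heads, d_head, bytes_per_elem=2):
    """Bytes for the state S of shape (batch_size, n_heads, d_head, d_head)."""
    # The d_head × d_head factor is the quadratic term the commit corrects.
    return batch_size * n_heads * d_head * d_head * bytes_per_elem


def mha_kv_cache_bytes(batch_size, n_tokens, n_heads, d_head, bytes_per_elem=2):
    """Assumed standard KV cache: separate keys and values, hence the factor 2."""
    return batch_size * n_tokens * n_heads * d_head * 2 * bytes_per_elem


# Hypothetical configuration, for illustration only
cfg = dict(batch_size=1, n_heads=16, d_head=128)

state = deltanet_state_bytes(**cfg)  # constant, no n_tokens argument at all
for n_tokens in (64, 1024, 32768):
    kv = mha_kv_cache_bytes(n_tokens=n_tokens, **cfg)
    print(f"n_tokens={n_tokens:>6}: KV cache {kv:>13,} B | DeltaNet state {state:,} B")
```

Under these assumed formulas the two sizes match at `n_tokens = d_head / 2`. For longer contexts the KV cache keeps growing linearly while the DeltaNet state stays fixed, because its size depends on `d_head × d_head` rather than on `n_tokens`.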

0 commit comments
