Commit 083508f

cast attention matrix back to original dtype pre-softmax in attention

1 parent 7762edd · commit 083508f

File tree

2 files changed (+4, −1 lines)


dalle2_pytorch/dalle2_pytorch.py

Lines changed: 3 additions & 0 deletions

@@ -879,6 +879,8 @@ def forward(self, x, mask = None, attn_bias = None):
         # attention

         attn = sim.softmax(dim = -1, dtype = torch.float32)
+        attn = attn.type(sim.dtype)
+
         attn = self.dropout(attn)

         # aggregate values
@@ -1637,6 +1639,7 @@ def forward(self, x, context, mask = None):
         sim = sim.masked_fill(~mask, max_neg_value)

         attn = sim.softmax(dim = -1, dtype = torch.float32)
+        attn = attn.type(sim.dtype)

         out = einsum('b h i j, b h j d -> b h i d', attn, v)
         out = rearrange(out, 'b h n d -> b n (h d)')
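The added line follows the usual mixed-precision pattern: the softmax is computed in float32 for numerical stability, then the attention weights are cast back to the similarity matrix's original dtype so that dropout and the value aggregation stay in the working precision (e.g. float16 or bfloat16). A minimal sketch of the pattern, using a hypothetical standalone attend helper rather than the repo's Attention module:

    import torch
    from torch import einsum

    def attend(q, k, v):
        # scaled dot-product attention in the tensors' working dtype
        scale = q.shape[-1] ** -0.5
        sim = einsum('b h i d, b h j d -> b h i j', q, k) * scale

        attn = sim.softmax(dim = -1, dtype = torch.float32)  # softmax in fp32 for stability
        attn = attn.type(sim.dtype)                           # cast back, as in this commit

        return einsum('b h i j, b h j d -> b h i d', attn, v)

    # hypothetical usage: bfloat16 stands in for a mixed-precision setting
    q = k = v = torch.randn(1, 8, 16, 64, dtype = torch.bfloat16)
    out = attend(q, k, v)
    assert out.dtype == torch.bfloat16

Without the cast back, the float32 attention matrix would be multiplied against half-precision values in the final einsum; casting keeps the whole block in the original dtype.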

dalle2_pytorch/version.py

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-__version__ = '1.8.3'
+__version__ = '1.8.4'

0 commit comments
