Open
Description
Updates
Per further discussion, the difference is intentional but undocumented: the default here is "bicubic" interpolation for position embedding resizing, whereas the reference implementation in Google Big Vision uses "bilinear".
Original Report
Fix location:
pytorch-image-models/timm/models/naflexvit.py
Line 1767 in a7c5368
cfg=NaFlexVitCfg(
This causes the default to be "bicubic":
pos_embed_interp_mode: str = 'bicubic'  # Interpolation mode for position embedding resizing
Reference code showing "bilinear" interpolation:
https://github.com/google-research/big_vision/blob/0127fb6b337ee2a27bf4e54dea79cff176527356/big_vision/models/proj/image_text/naflex_vit.py#L67
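To illustrate why the interpolation mode matters, here is a minimal pure-Python sketch that resizes a toy 1D "position embedding" with a linear kernel and with a cubic convolution kernel. It is not the timm or Big Vision code: the helper names (`resize_1d`, `linear_kernel`, `cubic_kernel`) are hypothetical, the half-pixel sampling and edge clamping are assumptions, and the cubic coefficient `a = -0.75` is assumed to match what PyTorch's bicubic mode uses.

```python
import math

def linear_kernel(t):
    # Triangle kernel: the 1D building block of bilinear interpolation.
    t = abs(t)
    return 1.0 - t if t < 1.0 else 0.0

def cubic_kernel(t, a=-0.75):
    # Cubic convolution kernel; a = -0.75 is an assumption (see lead-in).
    t = abs(t)
    if t <= 1.0:
        return (a + 2.0) * t**3 - (a + 3.0) * t**2 + 1.0
    if t < 2.0:
        return a * t**3 - 5.0 * a * t**2 + 8.0 * a * t - 4.0 * a
    return 0.0

def resize_1d(values, new_len, kernel, support):
    """Resample `values` to `new_len` samples using half-pixel centers
    and clamping out-of-range source indices to the nearest edge."""
    old_len = len(values)
    scale = old_len / new_len
    out = []
    for i in range(new_len):
        x = (i + 0.5) * scale - 0.5          # source coordinate
        base = math.floor(x)
        acc = 0.0
        for k in range(base - support + 1, base + support + 1):
            idx = min(max(k, 0), old_len - 1)  # clamp to valid range
            acc += kernel(x - k) * values[idx]
        out.append(acc)
    return out

# Toy embedding values, made up for illustration only.
pos_embed = [0.0, 1.0, 0.0, -1.0]
bilinear_like = resize_1d(pos_embed, 8, linear_kernel, support=1)
bicubic_like = resize_1d(pos_embed, 8, cubic_kernel, support=2)

# The two modes agree on constant inputs but differ on varying ones.
max_diff = max(abs(a - b) for a, b in zip(bilinear_like, bicubic_like))
print("max abs difference between modes:", max_diff)
```

The cubic kernel has negative lobes, so it can overshoot where the linear kernel cannot. A model whose position embeddings were resized with one mode during training will therefore see slightly different embeddings if the other mode is used at inference.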
After making this change, TIMM is able to forward SigLIP2 NaFlex with cosine similarity above 0.9999 at each intermediate output.
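A per-intermediate cosine similarity check of this kind can be sketched as follows. This is only an illustration of the comparison, not the script used for the report: the function names are hypothetical, the activations are toy values, and each intermediate is assumed to have been flattened to a plain list of floats.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two flattened activation vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def check_intermediates(ours, reference, threshold=0.9999):
    """Return per-intermediate similarities and whether all exceed threshold."""
    sims = [cosine_similarity(o, r) for o, r in zip(ours, reference)]
    return sims, all(s > threshold for s in sims)

# Toy activations (made up): one list per intermediate output.
reference = [[1.0, 2.0, 3.0], [0.5, -0.5, 0.25]]
close = [[1.0, 2.0, 3.001], [0.5, -0.5, 0.2501]]
far = [[3.0, 2.0, 1.0], [0.5, -0.5, 0.25]]

sims_close, ok_close = check_intermediates(close, reference)
sims_far, ok_far = check_intermediates(far, reference)
print(ok_close, ok_far)
```

Checking every intermediate rather than only the final output helps localize where two implementations first diverge.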