Releases: huggingface/pytorch-image-models
Release v1.0.17
July 7, 2025
- MobileNet-v5 backbone tweaks for improved Google Gemma 3n behaviour (to pair with updated official weights)
- Add stem bias (zero'd in updated weights, compat break with old weights)
- GELU -> GELU (tanh approx). A minor change to be closer to JAX
- Add two arguments to layer-decay support, a min scale clamp and 'no optimization' scale threshold
- Add 'Fp32' LayerNorm, RMSNorm, SimpleNorm variants that can be enabled to force computation of norm in float32
- Some typing, argument cleanup for norm, norm+act layers done with above
- Support Naver ROPE-ViT (https://github.com/naver-ai/rope-vit) in `eva.py`, add RotaryEmbeddingMixed module for mixed mode, weights on HuggingFace Hub
model | img_size | top1 | top5 | param_count |
---|---|---|---|---|
vit_large_patch16_rope_mixed_ape_224.naver_in1k | 224 | 84.84 | 97.122 | 304.4 |
vit_large_patch16_rope_mixed_224.naver_in1k | 224 | 84.828 | 97.116 | 304.2 |
vit_large_patch16_rope_ape_224.naver_in1k | 224 | 84.65 | 97.154 | 304.37 |
vit_large_patch16_rope_224.naver_in1k | 224 | 84.648 | 97.122 | 304.17 |
vit_base_patch16_rope_mixed_ape_224.naver_in1k | 224 | 83.894 | 96.754 | 86.59 |
vit_base_patch16_rope_mixed_224.naver_in1k | 224 | 83.804 | 96.712 | 86.44 |
vit_base_patch16_rope_ape_224.naver_in1k | 224 | 83.782 | 96.61 | 86.59 |
vit_base_patch16_rope_224.naver_in1k | 224 | 83.718 | 96.672 | 86.43 |
vit_small_patch16_rope_224.naver_in1k | 224 | 81.23 | 95.022 | 21.98 |
vit_small_patch16_rope_mixed_224.naver_in1k | 224 | 81.216 | 95.022 | 21.99 |
vit_small_patch16_rope_ape_224.naver_in1k | 224 | 81.004 | 95.016 | 22.06 |
vit_small_patch16_rope_mixed_ape_224.naver_in1k | 224 | 80.986 | 94.976 | 22.06 |
- Some cleanup of ROPE modules, helpers, and FX tracing leaf registration
- Preparing version 1.0.17 release
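To give a sense of how small the "GELU -> GELU (tanh approx)" change is numerically, here is a minimal stdlib-only sketch comparing the exact (erf-based) GELU with the standard tanh approximation. The function names and the sampling grid are illustrative and are not taken from `timm`.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def max_abs_gap(lo: float = -6.0, hi: float = 6.0, steps: int = 1201) -> float:
    """Largest |exact - tanh| over an evenly spaced grid (illustrative range)."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(abs(gelu_exact(x) - gelu_tanh(x)) for x in xs)


if __name__ == "__main__":
    print(f"max |exact - tanh| on [-6, 6]: {max_abs_gap():.2e}")
```

The two curves differ only slightly, but weights trained with one variant are evaluated most faithfully with the same variant, which is presumably why the backbone was aligned with the JAX reference.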
What's Changed
- Adding Naver rope-vit compatibility to EVA ViT by @rwightman in #2529
- Update no_grad usage to inference_mode if possible by @GuillaumeErhard in #2534
- Add a min layer-decay scale clamp, and no optimization threshold to exclude groups from optimization by @rwightman in #2537
- Add stem_bias option to MNV5. Resolve the norm layer so can pass string. by @rwightman in #2538
- Add flag to enable float32 computation for normalization (norm + affine) by @rwightman in #2536
- fix: mnv5 conv_stem bias and GELU with approximate=tanh by @RyanMullins in #2533
- Fixup casting issues for weights/bias in fp32 norm layers by @rwightman in #2539
- Fix H, W ordering for xy indexing in ROPE by @rwightman in #2541
- Fix 3 typos in README.md by @robin-ede in #2544
New Contributors
- @GuillaumeErhard made their first contribution in #2534
- @RyanMullins made their first contribution in #2533
- @robin-ede made their first contribution in #2544
Full Changelog: v1.0.16...v1.0.17
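The layer-decay additions in this release (a min scale clamp and a 'no optimization' threshold) can be pictured with the sketch below. It is a hedged illustration only: the function name, argument names, and the order in which the clamp and the threshold are applied are assumptions, not the `timm` implementation.

```python
from typing import Dict, Optional, Tuple


def layer_decay_scales(
    num_layers: int,
    decay: float = 0.75,
    min_scale: float = 0.0,               # assumed name: lower clamp on the LR scale
    no_opt_scale: Optional[float] = None,  # assumed name: below this, skip optimization
) -> Dict[int, Tuple[float, bool]]:
    """Map layer index -> (lr_scale, is_optimized).

    Layer `num_layers` (the head end) gets scale 1.0; earlier layers are
    scaled by decay ** depth_from_top.
    """
    scales = {}
    for layer_id in range(num_layers + 1):
        scale = decay ** (num_layers - layer_id)
        scale = max(scale, min_scale)  # clamp first (assumed ordering)
        optimized = no_opt_scale is None or scale >= no_opt_scale
        scales[layer_id] = (scale, optimized)
    return scales


if __name__ == "__main__":
    for layer, (scale, optimized) in layer_decay_scales(
        4, decay=0.5, min_scale=0.1, no_opt_scale=0.2
    ).items():
        print(layer, scale, "train" if optimized else "excluded")
```

With these example values the two earliest layers fall under the threshold and would be left out of the optimizer, while the remaining layers train with progressively larger learning-rate scales.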
Release v1.0.16
7101adb
June 26, 2025
- MobileNetV5 backbone (w/ encoder only variant) for Gemma 3n image encoder
- Version 1.0.16 released
June 23, 2025
- Add F.grid_sample based 2D and factorized pos embed resize to NaFlexViT. Faster when lots of different sizes (based on example by https://github.com/stas-sl).
- Further speed up patch embed resample by replacing vmap with matmul (based on snippet by https://github.com/stas-sl).
- Add 3 initial native aspect NaFlexViT checkpoints created while testing, ImageNet-1k and 3 different pos embed configs w/ same hparams.
Model | Top-1 Acc | Top-5 Acc | Params (M) | Eval Seq Len |
---|---|---|---|---|
naflexvit_base_patch16_par_gap.e300_s576_in1k | 83.67 | 96.45 | 86.63 | 576 |
naflexvit_base_patch16_parfac_gap.e300_s576_in1k | 83.63 | 96.41 | 86.46 | 576 |
naflexvit_base_patch16_gap.e300_s576_in1k | 83.50 | 96.46 | 86.63 | 576 |
- Support gradient checkpointing for `forward_intermediates` and fix some checkpointing bugs. Thanks https://github.com/brianhou0208
- Add 'corrected weight decay' (https://arxiv.org/abs/2506.02285) as option to AdamW (legacy), Adopt, Kron, Adafactor (BV), Lamb, LaProp, Lion, NadamW, RmsPropTF, SGDW optimizers
- Switch PE (perception encoder) ViT models to use native timm weights instead of remapping on the fly
- Fix cuda stream bug in prefetch loader
June 5, 2025
- Initial NaFlexVit model code. NaFlexVit is a Vision Transformer with:
- Encapsulated embedding and position encoding in a single module
- Support for nn.Linear patch embedding on pre-patchified (dictionary) inputs
- Support for NaFlex variable aspect, variable resolution (SigLip-2: https://arxiv.org/abs/2502.14786)
- Support for FlexiViT variable patch size (https://arxiv.org/abs/2212.08013)
- Support for NaViT fractional/factorized position embedding (https://arxiv.org/abs/2307.06304)
- Existing vit models in `vision_transformer.py` can be loaded into the NaFlexVit model by adding the `use_naflex=True` flag to `create_model`
- Some native weights coming soon
- A full NaFlex data pipeline is available that allows training / fine-tuning / evaluating with variable aspect / size images
- To enable in `train.py` and `validate.py` add the `--naflex-loader` arg, must be used with a NaFlexVit
- To evaluate an existing (classic) ViT loaded in NaFlexVit model w/ NaFlex data pipe:
python validate.py /imagenet --amp -j 8 --model vit_base_patch16_224 --model-kwargs use_naflex=True --naflex-loader --naflex-max-seq-len 256
- Training has some extra args worth noting
- The `--naflex-train-seq-lens` argument specifies which sequence lengths to randomly pick from per batch during training
- The `--naflex-max-seq-len` argument sets the target sequence length for validation
- Adding `--model-kwargs enable_patch_interpolator=True --naflex-patch-sizes 12 16 24` will enable random patch size selection per-batch w/ interpolation
- The `--naflex-loss-scale` arg changes loss scaling mode per batch relative to the batch size, `timm` NaFlex loading changes the batch size for each seq len
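To make the sequence-length arguments above more concrete, the sketch below shows one way a variable-aspect image could be mapped to a patch grid under a token budget, and how the batch size might shrink as the sequence length grows. Both helpers and the fixed tokens-per-batch assumption are hypothetical illustrations; the actual `timm` NaFlex loader may choose sizes differently.

```python
import math
from typing import Tuple


def naflex_grid(height: int, width: int, max_seq_len: int = 576) -> Tuple[int, int]:
    """Pick (rows, cols) of patches that roughly keeps the aspect ratio
    while guaranteeing rows * cols <= max_seq_len."""
    aspect = height / width
    # Real-valued solution of rows * cols = max_seq_len, rows / cols = aspect,
    # rounded down so the product never exceeds the budget.
    rows = min(max(1, math.floor(math.sqrt(max_seq_len * aspect))), max_seq_len)
    cols = min(max(1, math.floor(math.sqrt(max_seq_len / aspect))), max_seq_len // rows)
    return rows, cols


def batch_size_for_seq_len(tokens_per_batch: int, seq_len: int) -> int:
    """Assumed policy: keep the number of tokens per batch roughly constant."""
    return max(1, tokens_per_batch // seq_len)


if __name__ == "__main__":
    rows, cols = naflex_grid(480, 640, max_seq_len=256)
    print(f"grid {rows}x{cols}, resized to {rows * 16}x{cols * 16} for patch size 16")
    for seq_len in (128, 256, 576):
        print(seq_len, batch_size_for_seq_len(256 * 128, seq_len))
```

Under a policy like this the number of samples per batch varies with the chosen sequence length, which is the situation the `--naflex-loss-scale` argument is described as addressing.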
May 28, 2025
- Add a number of small/fast models thanks to https://github.com/brianhou0208
- SwiftFormer - (ICCV2023) SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
- FasterNet - (CVPR2023) Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks
- SHViT - (CVPR2024) SHViT: Single-Head Vision Transformer with Memory Efficient
- StarNet - (CVPR2024) Rewrite the Stars
- GhostNet-V3 - GhostNetV3: Exploring the Training Strategies for Compact Models
- Update EVA ViT (closest match) to support Perception Encoder models (https://arxiv.org/abs/2504.13181) from Meta, loading Hub weights but I still need to push dedicated `timm` weights
- Add some flexibility to ROPE impl
- Big increase in number of models supporting `forward_intermediates()` and some additional fixes thanks to https://github.com/brianhou0208
- DaViT, EdgeNeXt, EfficientFormerV2, EfficientViT(MIT), EfficientViT(MSRA), FocalNet, GCViT, HGNet /V2, InceptionNeXt, Inception-V4, MambaOut, MetaFormer, NesT, Next-ViT, PiT, PVT V2, RepGhostNet, RepViT, ResNetV2, ReXNet, TinyViT, TResNet, VoV
- TNT model updated w/ new weights `forward_intermediates()` thanks to https://github.com/brianhou0208
- Add `local-dir:` pretrained schema, can use `local-dir:/path/to/model/folder` for model name to source model / pretrained cfg & weights Hugging Face Hub models (config.json + weights file) from a local folder.
- Fixes, improvements for onnx export
What's Changed
- Fix arg merging of sknet, old seresnet. Fix #2470 by @rwightman in #2471
- Fix onnx export by @rwightman in #2475
- Add local-dir: schema support for model loading (config + weights) from folder by @rwightman in #2476
- Fix: Allow img_size to be int or tuple in PatchEmbed by @sddongxh in #2477
- Add LightlyTrain Integration for Pretraining Support by @yutong-xiang-97 in #2474
- Check forward_intermediates features against forward_features output by @rwightman in #2483
- More models support forward_intermediates by @brianhou0208 in #2482
- Update README.md by @atharva-pathak in #2484
- remove `download` argument from torch_kwargs for torchvision `ImageNet` class by @ryan-caesar-ramos in #2486
- Update TNT-(S/B) model weights and add feature extraction support by @brianhou0208 in #2480
- Add EVA ViT based PE (Perceptual Encoder) impl by @rwightman in #2487
- Add SwiftFormer, SHViT, StarNet, FasterNet and GhostNetV3 by @brianhou0208 in #2499
- A cleaned up beit3 remap onto vision_transformer.py vit by @rwightman in #2503
- Initial NaFlex ViT model and training support by @rwightman in #2466
- Forgot to compact attention pool branches after verifying by @rwightman in #2507
- Throw exception on non-directory path for pretrained weights by @emmanuel-ferdman in #2510
- Add corrected_weight decay to several optimizers by @rwightman in #2511
- Doing some Claude enabled docstring, type annotation and other cleanup by @rwightman in #2504
- Fix #2513, be explicit about stream devices by @rwightman in #2515
- Update legacy AdamW impl so it has a multi-tensor impl like NAdamW (n… by @rwightman in #2517
- Fix `head_dim` reference in `AttentionRope` class of `attention.py` by @amorehead in #2519
- Refactor patch and pos embed resampling based on feedback from https://github.com/stas-sl by @rwightman in #2518
- Add initial weights for my first 3 naflexvit_base models by @rwightman in #2523
- Support gradient checkpointing in `forward_intermediates()` by @brianhou0208 in #2501
- Update README: add references for additional supported models by @brianhou0208 in #2526
- MobileNetV5 by @rwightman in #2527
New Contributors
- @sddongxh made their first contribution in #2477
- @yutong-xiang-97 made their first contribution in #2474
- @atharva-pathak made their first contribution in #2484
- @ryan-caesar-ramos made their first contribution in #2486
- @emmanuel-ferdman made their first contribution in #2510
- @amorehead made their first contribution in #2519
Full Changelog: v1.0.15...v1.0.16
Release v1.0.15
Feb 21, 2025
- SigLIP 2 ViT image encoders added (https://huggingface.co/collections/timm/siglip-2-67b8e72ba08b09dd97aecaf9)
- Variable resolution / aspect NaFlex versions are a WIP
- Add 'SO150M2' ViT weights trained with SBB recipes, great results, better for ImageNet than previous attempt w/ less training.
- `vit_so150m2_patch16_reg1_gap_448.sbb_e200_in12k_ft_in1k` - 88.1% top-1
- `vit_so150m2_patch16_reg1_gap_384.sbb_e200_in12k_ft_in1k` - 87.9% top-1
- `vit_so150m2_patch16_reg1_gap_256.sbb_e200_in12k_ft_in1k` - 87.3% top-1
- `vit_so150m2_patch16_reg4_gap_256.sbb_e200_in12k`
- Updated InternViT-300M '2.5' weights
- Release 1.0.15
Feb 1, 2025
- FYI PyTorch 2.6 & Python 3.13 are tested and working w/ current main and released version of `timm`
Jan 27, 2025
- Add Kron Optimizer (PSGD w/ Kronecker-factored preconditioner)
What's Changed
- Fix metavar for `--input-size` by @JosuaRieder in #2417
- Add arguments to the respective argument groups by @JosuaRieder in #2416
- Add missing training flag to convert_sync_batchnorm by @collinmccarthy in #2423
- Fix num_classes update in reset_classifier and RDNet forward head call by @brianhou0208 in #2421
- timm: add `__all__` to `__init__` by @adamjstewart in #2399
- Fiddling with Kron (PSGD) optimizer by @rwightman in #2427
- Try to force numpy<2.0 for torch 1.13 tests, update newest tested torch to 2.5.1 by @rwightman in #2429
- Kron flatten improvements + stochastic weight decay by @rwightman in #2431
- PSGD: unify RNG by @ClashLuke in #2433
- Add vit so150m2 weights by @rwightman in #2439
- adapt_input_conv: add type hints by @adamjstewart in #2441
- SigLIP 2 by @rwightman in #2440
- timm.models: explicitly export attributes by @adamjstewart in #2442
New Contributors
- @collinmccarthy made their first contribution in #2423
- @ClashLuke made their first contribution in #2433
Full Changelog: v1.0.14...v1.0.15
Release v1.0.14
Jan 19, 2025
- Fix loading of LeViT safetensor weights, remove conversion code which should have been deactivated
- Add 'SO150M' ViT weights trained with SBB recipes, decent results, but not optimal shape for ImageNet-12k/1k pretrain/ft
- `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k_ft_in1k` - 86.7% top-1
- `vit_so150m_patch16_reg4_gap_384.sbb_e250_in12k_ft_in1k` - 87.4% top-1
- `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k`
- Misc typing, typo, etc. cleanup
- 1.0.14 release to get above LeViT fix out
What's Changed
- Fix nn.Module type hints by @adamjstewart in #2400
- Add missing paper title by @JosuaRieder in #2405
- fix 'timm recipe scripts' link by @JosuaRieder in #2404
- fix typo in EfficientNet docs by @JosuaRieder in #2403
- disable abbreviating csv inference output with ellipses by @JosuaRieder in #2402
- fix incorrect LaTeX formulas by @JosuaRieder in #2406
- VGG ConvMlp: fix layer defaults/types by @adamjstewart in #2409
- Implement --no-console-results in inference.py by @JosuaRieder in #2408
- LeViT safetensors load is broken by conversion code that wasn't deactivated by @rwightman in #2412
- A few more weights by @rwightman in #2413
- Fix typos by @JosuaRieder in #2415
New Contributors
- @adamjstewart made their first contribution in #2400
Full Changelog: v1.0.13...v1.0.14
Release v1.0.13
Jan 9, 2025
- Add support to train and validate in pure `bfloat16` or `float16`
- `wandb` project name arg added by https://github.com/caojiaolong, use arg.experiment for name
- Fix old issue w/ checkpoint saving not working on filesystem w/o hard-link support (e.g. FUSE fs mounts)
- 1.0.13 release
Jan 6, 2025
- Add `torch.utils.checkpoint.checkpoint()` wrapper in `timm.models` that defaults `use_reentrant=False`, unless `TIMM_REENTRANT_CKPT=1` is set in env.
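The default-with-environment-override behaviour described above can be sketched as follows. This is a stdlib-only illustration with a hypothetical helper name; how `timm` actually parses the variable (for example, which values besides `1` count as enabled) is an assumption here.

```python
import os
from typing import Mapping, Optional


def resolve_use_reentrant(
    use_reentrant: Optional[bool] = None,
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Return the effective `use_reentrant` flag for a checkpoint call.

    An explicit argument wins; otherwise default to False unless
    TIMM_REENTRANT_CKPT=1 is present in the environment.
    """
    if use_reentrant is not None:
        return use_reentrant
    return env.get("TIMM_REENTRANT_CKPT", "0") == "1"


if __name__ == "__main__":
    print(resolve_use_reentrant(env={}))                            # default path
    print(resolve_use_reentrant(env={"TIMM_REENTRANT_CKPT": "1"}))  # env override
    print(resolve_use_reentrant(True, env={}))                      # explicit argument
```

A wrapper built this way would forward the resolved flag to `torch.utils.checkpoint.checkpoint(..., use_reentrant=...)`, so callers get the non-reentrant default without passing the argument everywhere.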
Dec 31, 2024
- `convnext_nano` 384x384 ImageNet-12k pretrain & fine-tune. https://huggingface.co/models?search=convnext_nano%20r384
- Add AIM-v2 encoders from https://github.com/apple/ml-aim, see on Hub: https://huggingface.co/models?search=timm%20aimv2
- Add PaliGemma2 encoders from https://github.com/google-research/big_vision to existing PaliGemma, see on Hub: https://huggingface.co/models?search=timm%20pali2
- Add missing L/14 DFN2B 39B CLIP ViT, `vit_large_patch14_clip_224.dfn2b_s39b`
- Fix existing `RmsNorm` layer & fn to match standard formulation, use PT 2.5 impl when possible. Move old impl to `SimpleNorm` layer, it's LN w/o centering or bias. There were only two `timm` models using it, and they have been updated.
- Allow override of `cache_dir` arg for model creation
- Pass through `trust_remote_code` for HF datasets wrapper
- `inception_next_atto` model added by creator
- Adan optimizer caution, and Lamb decoupled weight decay options
- Some feature_info metadata fixed by https://github.com/brianhou0208
- All OpenCLIP and JAX (CLIP, SigLIP, Pali, etc) model weights that used load time remapping were given their own HF Hub instances so that they work with `hf-hub:` based loading, and thus will work with new Transformers `TimmWrapperModel`
What's Changed
- Punch cache_dir through model factory / builder / pretrain helpers by @rwightman in #2356
- Yuweihao inception next atto merge by @rwightman in #2360
- Dataset trust remote tweaks by @rwightman in #2361
- Add --dataset-trust-remote-code to the train.py and validate.py scripts by @grodino in #2328
- Fix feature_info.reduction by @brianhou0208 in #2369
- Add caution to Adan. Add decouple decay option to LAMB. by @rwightman in #2357
- Switching to timm specific weight instances for open_clip image encoders by @rwightman in #2376
- Fix broken image link in `Quickstart` doc by @ariG23498 in #2381
- Supporting aimv2 encoders by @rwightman in #2379
- fix: minor typos in markdowns by @ruidazeng in #2382
- Add 384x384 in12k pretrain and finetune for convnext_nano by @rwightman in #2384
- Fixed unfused attn2d scale by @laclouis5 in #2387
- Fix MQA V2 by @laclouis5 in #2388
- Wrap torch checkpoint() fn to default use_reentrant flag to False and allow env var override by @rwightman in #2394
- Add half-precision (bfloat16, float16) support to train & validate scripts by @rwightman in #2397
- Merging wandb project name chages w/ addition by @rwightman in #2398
New Contributors
- @brianhou0208 made their first contribution in #2369
- @ariG23498 made their first contribution in #2381
- @ruidazeng made their first contribution in #2382
- @laclouis5 made their first contribution in #2387
Full Changelog: v1.0.12...v1.0.13
Release v1.0.12
Nov 28, 2024
- More optimizers
- Add MARS optimizer (https://arxiv.org/abs/2411.10438, https://github.com/AGI-Arena/MARS)
- Add LaProp optimizer (https://arxiv.org/abs/2002.04839, https://github.com/Z-T-WANG/LaProp-Optimizer)
- Add masking from 'Cautious Optimizers' (https://arxiv.org/abs/2411.16085, https://github.com/kyleliang919/C-Optim) to Adafactor, Adafactor Big Vision, AdamW (legacy), Adopt, Lamb, LaProp, Lion, NadamW, RMSPropTF, SGDW
- Cleanup some docstrings and type annotations re optimizers and factory
- Add MobileNet-V4 Conv Medium models pretrained on in12k and fine-tuned in1k @ 384x384
- Add small cs3darknet, quite good for the speed
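The 'Cautious Optimizers' masking mentioned above can be illustrated on plain Python lists: components of the proposed update whose sign disagrees with the current gradient are zeroed, and the survivors are rescaled by the fraction kept. This is a minimal sketch of the idea from the paper, with a hypothetical function name and an illustrative clamp value; it is not the `timm` code.

```python
from typing import List, Sequence


def cautious_update(update: Sequence[float], grad: Sequence[float],
                    min_mean: float = 1e-3) -> List[float]:
    """Mask out update components that point against the gradient sign.

    mask_i = 1 if update_i * grad_i > 0 else 0, then the masked update is
    divided by mean(mask) (clamped below) to keep its overall scale.
    """
    mask = [1.0 if u * g > 0 else 0.0 for u, g in zip(update, grad)]
    mean = max(sum(mask) / len(mask), min_mean)
    return [u * m / mean for u, m in zip(update, mask)]


if __name__ == "__main__":
    momentum_update = [1.0, -1.0, 2.0, 0.5]
    grad = [1.0, 1.0, -1.0, 2.0]
    # Components 1 and 2 disagree in sign with the gradient and are dropped;
    # the remaining two are doubled because only half of the mask is active.
    print(cautious_update(momentum_update, grad))
```

In a real optimizer the same mask would be applied to the tensor-valued update (for example the momentum buffer) just before it is subtracted from the parameters.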
Nov 12, 2024
- Optimizer factory refactor
- New factory works by registering optimizers using an OptimInfo dataclass w/ some key traits
- Add `list_optimizers`, `get_optimizer_class`, `get_optimizer_info` to reworked `create_optimizer_v2` fn to explore optimizers, get info or class
- deprecate `optim.optim_factory`, move fns to `optim/_optim_factory.py` and `optim/_param_groups.py` and encourage import via `timm.optim`
- Add Adopt (https://github.com/iShohei220/adopt) optimizer
- Add 'Big Vision' variant of Adafactor (https://github.com/google-research/big_vision/blob/main/big_vision/optax.py) optimizer
- Fix original Adafactor to pick better factorization dims for convolutions
- Tweak LAMB optimizer with some improvements in torch.where functionality since original, refactor clipping a bit
- dynamic img size support in vit, deit, eva improved to support resize from non-square patch grids, thanks https://github.com/wojtke
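The registry pattern described in the optimizer factory refactor (optimizers registered through an info dataclass, then listed or looked up by name) might look roughly like the toy below. Every name here (`ToyOptimInfo`, `ToyOptimizerRegistry`, the field names) is hypothetical and chosen for illustration; the real `OptimInfo` fields and `timm.optim` functions may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ToyOptimInfo:
    """Hypothetical stand-in for an optimizer description record."""
    name: str
    factory: Callable[..., object]  # callable that builds the optimizer
    description: str = ""
    has_momentum: bool = False      # example of a 'key trait' (assumed field)


class ToyOptimizerRegistry:
    """Minimal name -> info registry with case-insensitive lookup."""

    def __init__(self) -> None:
        self._infos: Dict[str, ToyOptimInfo] = {}

    def register(self, info: ToyOptimInfo) -> None:
        key = info.name.lower()
        if key in self._infos:
            raise ValueError(f"optimizer {info.name!r} already registered")
        self._infos[key] = info

    def list_optimizers(self) -> List[str]:
        return sorted(self._infos)

    def get_info(self, name: str) -> ToyOptimInfo:
        return self._infos[name.lower()]

    def create(self, name: str, **kwargs) -> object:
        return self.get_info(name).factory(**kwargs)


if __name__ == "__main__":
    registry = ToyOptimizerRegistry()
    # `dict` is used as a placeholder factory so the sketch runs without torch.
    registry.register(ToyOptimInfo("sgd", dict, "plain SGD", has_momentum=True))
    registry.register(ToyOptimInfo("adamw", dict, "AdamW"))
    print(registry.list_optimizers())
    print(registry.create("SGD", lr=0.1))
```

Keeping traits on the info record lets a factory decide per optimizer which generic arguments to forward, instead of hard-coding a chain of name checks.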
Oct 31, 2024
Add a set of new very well trained ResNet & ResNet-V2 18/34 (basic block) weights. See https://huggingface.co/blog/rwightman/resnet-trick-or-treat
Oct 19, 2024
- Cleanup torch amp usage to avoid cuda specific calls, merge support for Ascend (NPU) devices from MengqingCao that should work now in PyTorch 2.5 w/ new device extension autoloading feature. Tested Intel Arc (XPU) in PyTorch 2.5 too and it (mostly) worked.
What's Changed
- mambaout.py: fixed bug by @NightMachinery in #2305
- Cleanup some amp related behaviour to better support different (non-cuda) devices by @rwightman in #2308
- Add NPU backend support for val and inference by @MengqingCao in #2109
- Update some clip pretrained weights to point to new hub locations by @rwightman in #2311
- ResNet vs MNV4 v1/v2 18 & 34 weights by @rwightman in #2316
- Replace deprecated positional argument with --data-dir by @JosuaRieder in #2322
- Fix typo in train.py: bathes > batches by @JosuaRieder in #2321
- Fix positional embedding resampling for non-square inputs in ViT by @wojtke in #2317
- Add trust_remote_code argument to ReaderHfds by @grodino in #2326
- Extend train epoch schedule by warmup_epochs if warmup_prefix enabled by @rwightman in #2325
- Extend existing unit tests using Cover-Agent by @mrT23 in #2331
- An impl of adafactor as per big vision (scaling vit) changes by @rwightman in #2320
- Add py.typed file as recommended by PEP 561 by @antoinebrl in #2252
- Add CODE_OF_CONDUCT.md and CITATION.cff files by @AlinaImtiaz018 in #2333
- Add some 384x384 small model weights by @rwightman in #2334
- In dist training, update loss running avg every step, sync on log by @rwightman in #2340
- Improve WandB logging by @sinahmr in #2341
- A few weights to merge Friday by @rwightman in #2343
- Update timm torchvision resnet weight urls to the updated urls in torchvision by @JohannesTheo in #2346
- More optimizer updates, add MARS, LaProp, add Adopt fix and more by @rwightman in #2347
- Cautious optimizer impl plus some typing cleanup. by @rwightman in #2349
- Add cautious mars, improve test reliability by skipping grad diff for… by @rwightman in #2351
- See if we can avoid some model / layer pickle issues with the aa attr in ConvNormAct by @rwightman in #2353
New Contributors
- @MengqingCao made their first contribution in #2109
- @JosuaRieder made their first contribution in #2322
- @wojtke made their first contribution in #2317
- @grodino made their first contribution in #2326
- @AlinaImtiaz018 made their first contribution in #2333
- @sinahmr made their first contribution in #2341
- @JohannesTheo made their first contribution in #2346
Full Changelog: v1.0.11...v1.0.12
v1.0.11 Release
Quick turnaround from 1.0.10 to fix an error impacting 3rd party packages that still import through a deprecated path that isn't tested.
Oct 16, 2024
- Fix error on importing from deprecated path `timm.models.registry`, increased priority of existing deprecation warnings to be visible
- Port weights of InternViT-300M (https://huggingface.co/OpenGVLab/InternViT-300M-448px) to `timm` as `vit_intern300m_patch14_448`
Oct 14, 2024
- Pre-activation (ResNetV2) version of 18/18d/34/34d ResNet model defs added by request (weights pending)
- Release 1.0.10
Oct 11, 2024
- MambaOut (https://github.com/yuweihao/MambaOut) model & weights added. A cheeky take on SSM vision models w/o the SSM (essentially ConvNeXt w/ gating). A mix of original weights + custom variations & weights.
- SigLIP SO400M ViT fine-tunes on ImageNet-1k @ 378x378, added 378x378 option for existing SigLIP 384x384 models
- SigLIP SO400M ViT encoder from recent multi-lingual (i18n) variant, patch16 @ 256x256 (https://huggingface.co/timm/ViT-SO400M-16-SigLIP-i18n-256). OpenCLIP update pending.
- Add two ConvNeXt 'Zepto' models & weights (one w/ overlapped stem and one w/ patch stem). Uses RMSNorm, smaller than previous 'Atto', 2.2M params.
- convnext_zepto_rms_ols.ra4_e3600_r224_in1k - 73.20 top-1 @ 224
- convnext_zepto_rms.ra4_e3600_r224_in1k - 72.81 @ 224
Sept 2024
- Add a suite of tiny test models for improved unit tests and niche low-resource applications (https://huggingface.co/blog/rwightman/timm-tiny-test)
- Add MobileNetV4-Conv-Small (0.5x) model (https://huggingface.co/posts/rwightman/793053396198664)
- mobilenetv4_conv_small_050.e3000_r224_in1k - 65.81 top-1 @ 256, 64.76 @ 224
- Add MobileNetV3-Large variants trained with MNV4 Small recipe
- mobilenetv3_large_150d.ra4_e3600_r256_in1k - 81.81 @ 320, 80.94 @ 256
- mobilenetv3_large_100.ra4_e3600_r224_in1k - 77.16 @ 256, 76.31 @ 224
Release v1.0.10
Oct 14, 2024
- Pre-activation (ResNetV2) version of 18/18d/34/34d ResNet model defs added by request (weights pending)
- Release 1.0.10
Oct 11, 2024
- MambaOut (https://github.com/yuweihao/MambaOut) model & weights added. A cheeky take on SSM vision models w/o the SSM (essentially ConvNeXt w/ gating). A mix of original weights + custom variations & weights.
- SigLIP SO400M ViT fine-tunes on ImageNet-1k @ 378x378, added 378x378 option for existing SigLIP 384x384 models
- SigLIP SO400M ViT encoder from recent multi-lingual (i18n) variant, patch16 @ 256x256 (https://huggingface.co/timm/ViT-SO400M-16-SigLIP-i18n-256). OpenCLIP update pending.
- Add two ConvNeXt 'Zepto' models & weights (one w/ overlapped stem and one w/ patch stem). Uses RMSNorm, smaller than previous 'Atto', 2.2M params.
- convnext_zepto_rms_ols.ra4_e3600_r224_in1k - 73.20 top-1 @ 224
- convnext_zepto_rms.ra4_e3600_r224_in1k - 72.81 @ 224
Sept 2024
- Add a suite of tiny test models for improved unit tests and niche low-resource applications (https://huggingface.co/blog/rwightman/timm-tiny-test)
- Add MobileNetV4-Conv-Small (0.5x) model (https://huggingface.co/posts/rwightman/793053396198664)
- mobilenetv4_conv_small_050.e3000_r224_in1k - 65.81 top-1 @ 256, 64.76 @ 224
- Add MobileNetV3-Large variants trained with MNV4 Small recipe
- mobilenetv3_large_150d.ra4_e3600_r256_in1k - 81.81 @ 320, 80.94 @ 256
- mobilenetv3_large_100.ra4_e3600_r224_in1k - 77.16 @ 256, 76.31 @ 224
Release v1.0.9
Aug 21, 2024
- Updated SBB ViT models trained on ImageNet-12k and fine-tuned on ImageNet-1k, challenging quite a number of much larger, slower models
model | top1 | top5 | param_count | img_size |
---|---|---|---|---|
vit_mediumd_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k | 87.438 | 98.256 | 64.11 | 384 |
vit_mediumd_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k | 86.608 | 97.934 | 64.11 | 256 |
vit_betwixt_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k | 86.594 | 98.02 | 60.4 | 384 |
vit_betwixt_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k | 85.734 | 97.61 | 60.4 | 256 |
- MobileNet-V1 1.25, EfficientNet-B1, & ResNet50-D weights w/ MNV4 baseline challenge recipe
model | top1 | top5 | param_count | img_size |
---|---|---|---|---|
resnet50d.ra4_e3600_r224_in1k | 81.838 | 95.922 | 25.58 | 288 |
efficientnet_b1.ra4_e3600_r240_in1k | 81.440 | 95.700 | 7.79 | 288 |
resnet50d.ra4_e3600_r224_in1k | 80.952 | 95.384 | 25.58 | 224 |
efficientnet_b1.ra4_e3600_r240_in1k | 80.406 | 95.152 | 7.79 | 240 |
mobilenetv1_125.ra4_e3600_r224_in1k | 77.600 | 93.804 | 6.27 | 256 |
mobilenetv1_125.ra4_e3600_r224_in1k | 76.924 | 93.234 | 6.27 | 224 |
- Add SAM2 (HieraDet) backbone arch & weight loading support
- Add Hiera Small weights trained w/ abswin pos embed on in12k & fine-tuned on 1k
model | top1 | top5 | param_count |
---|---|---|---|
hiera_small_abswin_256.sbb2_e200_in12k_ft_in1k | 84.912 | 97.260 | 35.01 |
hiera_small_abswin_256.sbb2_pd_e200_in12k_ft_in1k | 84.560 | 97.106 | 35.01 |
Aug 8, 2024
- Add RDNet ('DenseNets Reloaded', https://arxiv.org/abs/2403.19588), thanks Donghyun Kim
Release v1.0.8
July 28, 2024
- Add `mobilenet_edgetpu_v2_m` weights w/ `ra4` mnv4-small based recipe. 80.1% top-1 @ 224 and 80.7 @ 256.
- Release 1.0.8
July 26, 2024
- More MobileNet-v4 weights, ImageNet-12k pretrain w/ fine-tunes, and anti-aliased ConvLarge models
model | top1 | top1_err | top5 | top5_err | param_count | img_size |
---|---|---|---|---|---|---|
mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k | 84.99 | 15.01 | 97.294 | 2.706 | 32.59 | 544 |
mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k | 84.772 | 15.228 | 97.344 | 2.656 | 32.59 | 480 |
mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k | 84.64 | 15.36 | 97.114 | 2.886 | 32.59 | 448 |
mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k | 84.314 | 15.686 | 97.102 | 2.898 | 32.59 | 384 |
mobilenetv4_conv_aa_large.e600_r384_in1k | 83.824 | 16.176 | 96.734 | 3.266 | 32.59 | 480 |
mobilenetv4_conv_aa_large.e600_r384_in1k | 83.244 | 16.756 | 96.392 | 3.608 | 32.59 | 384 |
mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k | 82.99 | 17.01 | 96.67 | 3.33 | 11.07 | 320 |
mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k | 82.364 | 17.636 | 96.256 | 3.744 | 11.07 | 256 |
- Impressive MobileNet-V1 and EfficientNet-B0 baseline challenges (https://huggingface.co/blog/rwightman/mobilenet-baselines)
model | top1 | top1_err | top5 | top5_err | param_count | img_size |
---|---|---|---|---|---|---|
efficientnet_b0.ra4_e3600_r224_in1k | 79.364 | 20.636 | 94.754 | 5.246 | 5.29 | 256 |
efficientnet_b0.ra4_e3600_r224_in1k | 78.584 | 21.416 | 94.338 | 5.662 | 5.29 | 224 |
mobilenetv1_100h.ra4_e3600_r224_in1k | 76.596 | 23.404 | 93.272 | 6.728 | 5.28 | 256 |
mobilenetv1_100.ra4_e3600_r224_in1k | 76.094 | 23.906 | 93.004 | 6.996 | 4.23 | 256 |
mobilenetv1_100h.ra4_e3600_r224_in1k | 75.662 | 24.338 | 92.504 | 7.496 | 5.28 | 224 |
mobilenetv1_100.ra4_e3600_r224_in1k | 75.382 | 24.618 | 92.312 | 7.688 | 4.23 | 224 |
- Prototype of `set_input_size()` added to vit and swin v1/v2 models to allow changing image size, patch size, window size after model creation.
- Improved support in swin for different size handling, in addition to `set_input_size`, `always_partition` and `strict_img_size` args have been added to `__init__` to allow more flexible input size constraints
- Fix out of order indices info for intermediate 'Getter' feature wrapper, check out of range indices for same.
- Add several `tiny` < .5M param models for testing that are actually trained on ImageNet-1k
model | top1 | top1_err | top5 | top5_err | param_count | img_size | crop_pct |
---|---|---|---|---|---|---|---|
test_efficientnet.r160_in1k | 47.156 | 52.844 | 71.726 | 28.274 | 0.36 | 192 | 1.0 |
test_byobnet.r160_in1k | 46.698 | 53.302 | 71.674 | 28.326 | 0.46 | 192 | 1.0 |
test_efficientnet.r160_in1k | 46.426 | 53.574 | 70.928 | 29.072 | 0.36 | 160 | 0.875 |
test_byobnet.r160_in1k | 45.378 | 54.622 | 70.572 | 29.428 | 0.46 | 160 | 0.875 |
test_vit.r160_in1k | 42.0 | 58.0 | 68.664 | 31.336 | 0.37 | 192 | 1.0 |
test_vit.r160_in1k | 40.822 | 59.178 | 67.212 | 32.788 | 0.37 | 160 | 0.875 |
- Fix vit reg token init, thanks Promisery
- Other misc fixes
June 24, 2024
- 3 more MobileNetV4 hybrid weights with different MQA weight init scheme
model | top1 | top1_err | top5 | top5_err | param_count | img_size |
---|---|---|---|---|---|---|
mobilenetv4_hybrid_large.ix_e600_r384_in1k | 84.356 | 15.644 | 96.892 | 3.108 | 37.76 | 448 |
mobilenetv4_hybrid_large.ix_e600_r384_in1k | 83.990 | 16.010 | 96.702 | 3.298 | 37.76 | 384 |
mobilenetv4_hybrid_medium.ix_e550_r384_in1k | 83.394 | 16.606 | 96.760 | 3.240 | 11.07 | 448 |
mobilenetv4_hybrid_medium.ix_e550_r384_in1k | 82.968 | 17.032 | 96.474 | 3.526 | 11.07 | 384 |
mobilenetv4_hybrid_medium.ix_e550_r256_in1k | 82.492 | 17.508 | 96.278 | 3.722 | 11.07 | 320 |
mobilenetv4_hybrid_medium.ix_e550_r256_in1k | 81.446 | 18.554 | 95.704 | 4.296 | 11.07 | 256 |
- florence2 weight loading in DaViT model
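As a reading aid for the result tables in this release, the `top1_err` and `top5_err` columns are simply the complement of the accuracy columns in percent. A small sketch (the helper name is illustrative) that reproduces two `top1_err` values from the June 24 table:

```python
def error_from_accuracy(acc_pct: float, ndigits: int = 3) -> float:
    """Error rate in percent, rounded like the tables (three decimals)."""
    return round(100.0 - acc_pct, ndigits)


if __name__ == "__main__":
    # top1 values from the mobilenetv4_hybrid rows in the June 24, 2024 table
    for top1 in (84.356, 81.446):
        print(top1, "->", error_from_accuracy(top1))
```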