huggingface/pytorch-image-models

Release v1.0.17

10 Jul 16:04

rwightman

v1.0.17

59fdbaa

July 7, 2025

MobileNet-v5 backbone tweaks for improved Google Gemma 3n behaviour (to pair with updated official weights)
- Add stem bias (zero'd in updated weights, compat break with old weights)
- GELU -> GELU (tanh approx). A minor change to be closer to JAX
Add two arguments to layer-decay support: a min scale clamp and a 'no optimization' scale threshold
Add 'Fp32' LayerNorm, RMSNorm, SimpleNorm variants that can be enabled to force computation of norm in float32
Some typing and argument cleanup for norm and norm+act layers, done along with the above
Support Naver ROPE-ViT (https://github.com/naver-ai/rope-vit) in eva.py, add RotaryEmbeddingMixed module for mixed mode, weights on HuggingFace Hub

model	img_size	top1	top5	param_count
vit_large_patch16_rope_mixed_ape_224.naver_in1k	224	84.84	97.122	304.4
vit_large_patch16_rope_mixed_224.naver_in1k	224	84.828	97.116	304.2
vit_large_patch16_rope_ape_224.naver_in1k	224	84.65	97.154	304.37
vit_large_patch16_rope_224.naver_in1k	224	84.648	97.122	304.17
vit_base_patch16_rope_mixed_ape_224.naver_in1k	224	83.894	96.754	86.59
vit_base_patch16_rope_mixed_224.naver_in1k	224	83.804	96.712	86.44
vit_base_patch16_rope_ape_224.naver_in1k	224	83.782	96.61	86.59
vit_base_patch16_rope_224.naver_in1k	224	83.718	96.672	86.43
vit_small_patch16_rope_224.naver_in1k	224	81.23	95.022	21.98
vit_small_patch16_rope_mixed_224.naver_in1k	224	81.216	95.022	21.99
vit_small_patch16_rope_ape_224.naver_in1k	224	81.004	95.016	22.06
vit_small_patch16_rope_mixed_ape_224.naver_in1k	224	80.986	94.976	22.06
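For readers new to ROPE-ViT: in the axial variant each rotary frequency is tied to a single patch axis, while the 'mixed' variant lets every frequency combine both the x and y coordinates. The plain-Python sketch below is an illustration under assumptions, not timm's RotaryEmbeddingMixed: the function names and the fixed frequency tables are hypothetical (in RoPE-Mixed the frequencies are learned). It checks the property that makes rotary embeddings useful, namely that the query/key dot product depends only on the offset between the two positions.

```python
import math

def rope_2d(vec, x, y, freqs):
    """Rotate consecutive (even, odd) pairs of `vec`.

    Pair i is rotated by angle fx_i * x + fy_i * y. Axial RoPE uses
    frequency pairs where fx_i or fy_i is zero; mixed RoPE uses both.
    """
    if len(vec) != 2 * len(freqs):
        raise ValueError("vec must hold exactly one (a, b) pair per frequency")
    out = []
    for i, (fx, fy) in enumerate(freqs):
        a, b = vec[2 * i], vec[2 * i + 1]
        theta = fx * x + fy * y
        c, s = math.cos(theta), math.sin(theta)
        out.extend((a * c - b * s, a * s + b * c))
    return out

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

# Hypothetical frequency tables for an 8-dim head (4 rotation pairs).
AXIAL_FREQS = [(1.0, 0.0), (0.1, 0.0), (0.0, 1.0), (0.0, 0.1)]     # x-only, then y-only
MIXED_FREQS = [(0.9, 0.3), (0.05, 0.2), (-0.4, 0.7), (0.1, -0.08)]  # both axes per pair

q = [0.3, -1.2, 0.5, 0.8, -0.7, 0.2, 1.1, 0.4]
k = [1.0, 0.1, -0.6, 0.9, 0.3, -0.5, 0.2, 0.7]

for freqs in (AXIAL_FREQS, MIXED_FREQS):
    # In both cases the query sits at offset (2, 3) from the key, so the scores match.
    s1 = dot(rope_2d(q, 3, 5, freqs), rope_2d(k, 1, 2, freqs))
    s2 = dot(rope_2d(q, 4, 7, freqs), rope_2d(k, 2, 4, freqs))
    print(f"{s1:.6f} {s2:.6f}")
```

Since each pair's rotation angle is linear in (x, y), the score reduces to a rotation by the angle difference, which holds for the axial and mixed tables alike.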

Some cleanup of ROPE modules, helpers, and FX tracing leaf registration
Preparing version 1.0.17 release
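The layer-decay item in these notes adds a floor on the per-layer learning-rate scale and a threshold below which a group is left out of optimization. A minimal sketch of how such knobs could combine, assuming the common rule scale = decay ** (num_layers - layer_id) and assuming the clamp is applied before the threshold check; the function, dataclass, and argument names (min_scale, no_opt_scale) are hypothetical, not necessarily timm's.

```python
from dataclasses import dataclass

@dataclass
class LayerGroup:
    layer_id: int
    lr_scale: float
    optimize: bool

def layer_decay_groups(num_layers, decay, min_scale=0.0, no_opt_scale=None):
    """Per-layer LR scales: layer 0 (stem/embed) gets the smallest scale,
    layer `num_layers` (head) gets 1.0."""
    groups = []
    for layer_id in range(num_layers + 1):
        scale = decay ** (num_layers - layer_id)
        scale = max(scale, min_scale)  # clamp: never shrink below min_scale
        # threshold: groups whose (clamped) scale is below no_opt_scale are excluded
        optimize = no_opt_scale is None or scale >= no_opt_scale
        groups.append(LayerGroup(layer_id, scale, optimize))
    return groups

for g in layer_decay_groups(num_layers=4, decay=0.5, min_scale=0.1, no_opt_scale=0.2):
    print(g)
```

With these values the two lowest layers fall under the threshold and would be excluded (e.g. frozen), while the clamp keeps the stem's scale from dropping below 0.1.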

What's Changed

Adding Naver rope-vit compatibility to EVA ViT by @rwightman in #2529
Update no_grad usage to inference_mode if possible by @GuillaumeErhard in #2534
Add a min layer-decay scale clamp, and no optimization threshold to exclude groups from optimization by @rwightman in #2537
Add stem_bias option to MNV5. Resolve the norm layer so can pass string. by @rwightman in #2538
Add flag to enable float32 computation for normalization (norm + affine) by @rwightman in #2536
fix: mnv5 conv_stem bias and GELU with approximate=tanh by @RyanMullins in #2533
Fixup casting issues for weights/bias in fp32 norm layers by @rwightman in #2539
Fix H, W ordering for xy indexing in ROPE by @rwightman in #2541
Fix 3 typos in README.md by @robin-ede in #2544
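The MobileNet-v5 change in #2533 switches GELU to its tanh approximation. The plain-Python sketch below writes out both forms for comparison: the exact form uses the Gaussian CDF via erf, and the tanh form is the formula PyTorch documents for nn.GELU(approximate='tanh') and the default of jax.nn.gelu (approximate=True). The two agree closely but not bit-for-bit, which is why the notes call it a minor change to be closer to JAX.

```python
import math

def gelu_exact(x: float) -> float:
    # x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # tanh approximation of GELU
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

xs = [i / 10 for i in range(-60, 61)]
max_diff = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in xs)
print(f"max |exact - tanh| on [-6, 6]: {max_diff:.2e}")
```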

New Contributors

@GuillaumeErhard made their first contribution in #2534
@RyanMullins made their first contribution in #2533
@robin-ede made their first contribution in #2544

Full Changelog: v1.0.16...v1.0.17

Contributors

RyanMullins, rwightman, and 2 other contributors

Release v1.0.16

26 Jun 18:44

rwightman

v1.0.16

7101adb

June 26, 2025

MobileNetV5 backbone (w/ encoder only variant) for Gemma 3n image encoder
Version 1.0.16 released

June 23, 2025

Add F.grid_sample based 2D and factorized pos embed resize to NaFlexViT. Faster when lots of different sizes (based on example by https://github.com/stas-sl).
Further speed up patch embed resample by replacing vmap with matmul (based on snippet by https://github.com/stas-sl).
Add 3 initial native aspect NaFlexViT checkpoints created while testing, ImageNet-1k and 3 different pos embed configs w/ same hparams.

Model	Top-1 Acc	Top-5 Acc	Params (M)	Eval Seq Len
naflexvit_base_patch16_par_gap.e300_s576_in1k	83.67	96.45	86.63	576
naflexvit_base_patch16_parfac_gap.e300_s576_in1k	83.63	96.41	86.46	576
naflexvit_base_patch16_gap.e300_s576_in1k	83.50	96.46	86.63	576

Support gradient checkpointing for forward_intermediates and fix some checkpointing bugs. Thanks https://github.com/brianhou0208
Add 'corrected weight decay' (https://arxiv.org/abs/2506.02285) as option to AdamW (legacy), Adopt, Kron, Adafactor (BV), Lamb, LaProp, Lion, NadamW, RmsPropTF, SGDW optimizers
Switch PE (perception encoder) ViT models to use native timm weights instead of remapping on the fly
Fix cuda stream bug in prefetch loader

June 5, 2025

Initial NaFlexVit model code. NaFlexVit is a Vision Transformer with:
1. Encapsulated embedding and position encoding in a single module
2. Support for nn.Linear patch embedding on pre-patchified (dictionary) inputs
3. Support for NaFlex variable aspect, variable resolution (SigLip-2: https://arxiv.org/abs/2502.14786)
4. Support for FlexiViT variable patch size (https://arxiv.org/abs/2212.08013)
5. Support for NaViT fractional/factorized position embedding (https://arxiv.org/abs/2307.06304)
Existing vit models in vision_transformer.py can be loaded into the NaFlexVit model by adding the use_naflex=True flag to create_model
- Some native weights coming soon
A full NaFlex data pipeline is available that allows training / fine-tuning / evaluating with variable aspect / size images
- To enable in train.py and validate.py add the --naflex-loader arg, must be used with a NaFlexVit
To evaluate an existing (classic) ViT loaded in NaFlexVit model w/ NaFlex data pipe:
- python validate.py /imagenet --amp -j 8 --model vit_base_patch16_224 --model-kwargs use_naflex=True --naflex-loader --naflex-max-seq-len 256
Training has some extra args/features worth noting:
- The --naflex-train-seq-lens argument specifies which sequence lengths to randomly pick from per batch during training
- The --naflex-max-seq-len argument sets the target sequence length for validation
- Adding --model-kwargs enable_patch_interpolator=True --naflex-patch-sizes 12 16 24 will enable random patch size selection per-batch w/ interpolation
- The --naflex-loss-scale arg changes loss scaling mode per batch relative to the batch size, as timm NaFlex loading changes the batch size for each seq len
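The NaFlex idea of fitting variable-aspect images into a fixed token budget (the budget set by --naflex-max-seq-len) can be illustrated with a small helper that picks an aspect-preserving patch grid. This is a hypothetical sketch, not timm's NaFlex loader: the function name and the rounding policy (floor both sides, then trim the longer side if an extreme aspect ratio overshoots the budget) are assumptions.

```python
import math

def naflex_grid(height, width, patch_size, max_seq_len):
    """Return (grid_h, grid_w) with grid_h * grid_w <= max_seq_len,
    keeping grid_h / grid_w close to height / width."""
    # Uniform scale s so that (H*s/p) * (W*s/p) == max_seq_len
    scale = math.sqrt(max_seq_len * patch_size ** 2 / (height * width))
    grid_h = max(1, math.floor(height * scale / patch_size))
    grid_w = max(1, math.floor(width * scale / patch_size))
    # Clamping a side up to 1 (or float rounding) can overshoot the budget
    if grid_h * grid_w > max_seq_len:
        if grid_h >= grid_w:
            grid_h = max(1, max_seq_len // grid_w)
        else:
            grid_w = max(1, max_seq_len // grid_h)
    return grid_h, grid_w

for h, w in [(224, 224), (480, 640), (1000, 250)]:
    gh, gw = naflex_grid(h, w, patch_size=16, max_seq_len=256)
    # The image would then be resized to (gh * 16, gw * 16) pixels
    print((h, w), "->", (gh, gw), "tokens:", gh * gw)
```

Images with fewer tokens than the budget can then be padded to a common sequence length within a batch.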

May 28, 2025

Add a number of small/fast models thanks to https://github.com/brianhou0208
Update EVA ViT (closest match) to support Perception Encoder models (https://arxiv.org/abs/2504.13181) from Meta, loading Hub weights, but I still need to push dedicated timm weights
- Add some flexibility to ROPE impl
Big increase in number of models supporting forward_intermediates() and some additional fixes thanks to https://github.com/brianhou0208
- DaViT, EdgeNeXt, EfficientFormerV2, EfficientViT(MIT), EfficientViT(MSRA), FocalNet, GCViT, HGNet /V2, InceptionNeXt, Inception-V4, MambaOut, MetaFormer, NesT, Next-ViT, PiT, PVT V2, RepGhostNet, RepViT, ResNetV2, ReXNet, TinyViT, TResNet, VoV
TNT model updated w/ new weights and forward_intermediates() support, thanks to https://github.com/brianhou0208
Add local-dir: pretrained schema; use local-dir:/path/to/model/folder as the model name to source the model / pretrained cfg & weights of Hugging Face Hub models (config.json + weights file) from a local folder.
Fixes, improvements for onnx export
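The local-dir: schema works by prefixing the model name with a source. A hypothetical sketch of how such a prefixed name could be split into a source and a path, assuming the split happens at the first colon and that the recognized prefixes are the ones mentioned in these notes (hf-hub:, local-dir:); this is not timm's parser.

```python
KNOWN_SOURCES = ("hf-hub", "local-dir")  # prefixes mentioned in these notes

def split_model_name(model_name: str) -> tuple:
    """Split 'source:rest' into (source, rest); plain names get source ''."""
    source, sep, rest = model_name.partition(":")
    if not sep:
        return "", model_name
    if source not in KNOWN_SOURCES:
        raise ValueError(f"unknown model source prefix: {source!r}")
    return source, rest

print(split_model_name("local-dir:/path/to/model/folder"))
print(split_model_name("vit_base_patch16_224"))
```

Splitting only at the first colon keeps any later colons (e.g. in a Windows drive path) inside the path part.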

What's Changed

Fix arg merging of sknet, old seresnet. Fix #2470 by @rwightman in #2471
Fix onnx export by @rwightman in #2475
Add local-dir: schema support for model loading (config + weights) from folder by @rwightman in #2476
Fix: Allow img_size to be int or tuple in PatchEmbed by @sddongxh in #2477
Add LightlyTrain Integration for Pretraining Support by @yutong-xiang-97 in #2474
Check forward_intermediates features against forward_features output by @rwightman in #2483
More models support forward_intermediates by @brianhou0208 in #2482
Update README.md by @atharva-pathak in #2484
remove download argument from torch_kwargs for torchvision ImageNet class by @ryan-caesar-ramos in #2486
Update TNT-(S/B) model weights and add feature extraction support by @brianhou0208 in #2480
Add EVA ViT based PE (Perceptual Encoder) impl by @rwightman in #2487
Add SwiftFormer, SHViT, StarNet, FasterNet and GhostNetV3 by @brianhou0208 in #2499
A cleaned up beit3 remap onto vision_transformer.py vit by @rwightman in #2503
Initial NaFlex ViT model and training support by @rwightman in #2466
Forgot to compact attention pool branches after verifying by @rwightman in #2507
Throw exception on non-directory path for pretrained weights by @emmanuel-ferdman in #2510
Add corrected_weight decay to several optimizers by @rwightman in #2511
Doing some Claude enabled docstring, type annotation and other cleanup by @rwightman in #2504
Fix #2513, be explicit about stream devices by @rwightman in #2515
Update legacy AdamW impl so it has a multi-tensor impl like NAdamW (n… by @rwightman in #2517
Fix head_dim reference in AttentionRope class of attention.py by @amorehead in #2519
Refactor patch and pos embed resampling based on feedback from https://github.com/stas-sl by @rwightman in #2518
Add initial weights for my first 3 naflexvit_base models by @rwightman in #2523
Support gradient checkpointing in forward_intermediates() by @brianhou0208 in #2501
Update README: add references for additional supported models by @brianhou0208 in #2526
MobileNetV5 by @rwightman in #2527

New Contributors

@sddongxh made their first contribution in #2477
@yutong-xiang-97 made their first contribution in #2474
@atharva-pathak made their first contribution in #2484
@ryan-caesar-ramos made their first contribution in #2486
@emmanuel-ferdman made their first contribution in #2510
@amorehead made their first contribution in #2519

Full Changelog: v1.0.15...v1.0.16

Contributors

rwightman, amorehead, and 6 other contributors

Release v1.0.15

23 Feb 05:07

rwightman

v1.0.15

e44f14d

Feb 21, 2025

SigLIP 2 ViT image encoders added (https://huggingface.co/collections/timm/siglip-2-67b8e72ba08b09dd97aecaf9)
- Variable resolution / aspect NaFlex versions are a WIP
Add 'SO150M2' ViT weights trained with SBB recipes, great results, better for ImageNet than previous attempt w/ less training.
- vit_so150m2_patch16_reg1_gap_448.sbb_e200_in12k_ft_in1k - 88.1% top-1
- vit_so150m2_patch16_reg1_gap_384.sbb_e200_in12k_ft_in1k - 87.9% top-1
- vit_so150m2_patch16_reg1_gap_256.sbb_e200_in12k_ft_in1k - 87.3% top-1
- vit_so150m2_patch16_reg4_gap_256.sbb_e200_in12k
Updated InternViT-300M '2.5' weights
Release 1.0.15

Feb 1, 2025

FYI PyTorch 2.6 & Python 3.13 are tested and working w/ current main and released version of timm

Jan 27, 2025

Add Kron Optimizer (PSGD w/ Kronecker-factored preconditioner)
- Code from https://github.com/evanatyourservice/kron_torch
- See also https://sites.google.com/site/lixilinx/home/psgd

What's Changed

Fix metavar for --input-size by @JosuaRieder in #2417
Add arguments to the respective argument groups by @JosuaRieder in #2416
Add missing training flag to convert_sync_batchnorm by @collinmccarthy in #2423
Fix num_classes update in reset_classifier and RDNet forward head call by @brianhou0208 in #2421
timm: add __all__ to __init__ by @adamjstewart in #2399
Fiddling with Kron (PSGD) optimizer by @rwightman in #2427
Try to force numpy<2.0 for torch 1.13 tests, update newest tested torch to 2.5.1 by @rwightman in #2429
Kron flatten improvements + stochastic weight decay by @rwightman in #2431
PSGD: unify RNG by @ClashLuke in #2433
Add vit so150m2 weights by @rwightman in #2439
adapt_input_conv: add type hints by @adamjstewart in #2441
SigLIP 2 by @rwightman in #2440
timm.models: explicitly export attributes by @adamjstewart in #2442

New Contributors

@collinmccarthy made their first contribution in #2423
@ClashLuke made their first contribution in #2433

Full Changelog: v1.0.14...v1.0.15

Contributors

collinmccarthy, rwightman, and 4 other contributors

Release v1.0.14

19 Jan 23:05

rwightman

v1.0.14

5d535d7

Jan 19, 2025

Fix loading of LeViT safetensor weights, remove conversion code which should have been deactivated
Add 'SO150M' ViT weights trained with SBB recipes, decent results, but not optimal shape for ImageNet-12k/1k pretrain/ft
- vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k_ft_in1k - 86.7% top-1
- vit_so150m_patch16_reg4_gap_384.sbb_e250_in12k_ft_in1k - 87.4% top-1
- vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k
Misc typing, typo, etc. cleanup
1.0.14 release to get above LeViT fix out

What's Changed

Fix nn.Module type hints by @adamjstewart in #2400
Add missing paper title by @JosuaRieder in #2405
fix 'timm recipe scripts' link by @JosuaRieder in #2404
fix typo in EfficientNet docs by @JosuaRieder in #2403
disable abbreviating csv inference output with ellipses by @JosuaRieder in #2402
fix incorrect LaTeX formulas by @JosuaRieder in #2406
VGG ConvMlp: fix layer defaults/types by @adamjstewart in #2409
Implement --no-console-results in inference.py by @JosuaRieder in #2408
LeViT safetensors load is broken by conversion code that wasn't deactivated by @rwightman in #2412
A few more weights by @rwightman in #2413
Fix typos by @JosuaRieder in #2415

New Contributors

@adamjstewart made their first contribution in #2400

Full Changelog: v1.0.13...v1.0.14

Contributors

rwightman, adamjstewart, and JosuaRieder

Release v1.0.13

09 Jan 18:49

rwightman

v1.0.13

47811bc

Jan 9, 2025

Add support to train and validate in pure bfloat16 or float16
wandb project name arg added by https://github.com/caojiaolong, use arg.experiment for name
Fix old issue w/ checkpoint saving not working on filesystem w/o hard-link support (e.g. FUSE fs mounts)
1.0.13 release

Jan 6, 2025

Add torch.utils.checkpoint.checkpoint() wrapper in timm.models that defaults use_reentrant=False, unless TIMM_REENTRANT_CKPT=1 is set in env.
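A minimal sketch of a checkpoint wrapper with the behaviour described above. Assumptions: the env var is read at call time and only the exact string '1' enables reentrant mode; the ckpt_fn injection hook (added so the default-selection logic can be exercised without PyTorch) is hypothetical, and this is not timm's code. The use_reentrant keyword itself is a real argument of torch.utils.checkpoint.checkpoint.

```python
import os

def checkpoint(function, *args, use_reentrant=None, ckpt_fn=None, **kwargs):
    """Call torch's checkpoint, defaulting use_reentrant=False.

    TIMM_REENTRANT_CKPT=1 in the environment flips the default to True;
    an explicit use_reentrant argument always wins.
    """
    if use_reentrant is None:
        use_reentrant = os.environ.get("TIMM_REENTRANT_CKPT", "0") == "1"
    if ckpt_fn is None:
        # Deferred import so the sketch can be tested without PyTorch installed
        from torch.utils.checkpoint import checkpoint as ckpt_fn
    return ckpt_fn(function, *args, use_reentrant=use_reentrant, **kwargs)
```

Inside a model's forward this would be used like checkpoint(block, x), trading recomputation in the backward pass for lower activation memory.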

Dec 31, 2024

convnext_nano 384x384 ImageNet-12k pretrain & fine-tune. https://huggingface.co/models?search=convnext_nano%20r384
Add AIM-v2 encoders from https://github.com/apple/ml-aim, see on Hub: https://huggingface.co/models?search=timm%20aimv2
Add PaliGemma2 encoders from https://github.com/google-research/big_vision to existing PaliGemma, see on Hub: https://huggingface.co/models?search=timm%20pali2
Add missing L/14 DFN2B 39B CLIP ViT, vit_large_patch14_clip_224.dfn2b_s39b
Fix existing RmsNorm layer & fn to match standard formulation, use PT 2.5 impl when possible. Move old impl to SimpleNorm layer, it's LN w/o centering or bias. There were only two timm models using it, and they have been updated.
Allow override of cache_dir arg for model creation
Pass through trust_remote_code for HF datasets wrapper
inception_next_atto model added by creator
Adan optimizer caution, and Lamb decoupled weight decay options
Some feature_info metadata fixed by https://github.com/brianhou0208
All OpenCLIP and JAX (CLIP, SigLIP, Pali, etc) model weights that used load time remapping were given their own HF Hub instances so that they work with hf-hub: based loading, and thus will work with new Transformers TimmWrapperModel
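For reference, the standard RMSNorm formulation rescales each feature vector by its root-mean-square, with no mean subtraction and no bias: y_i = x_i / sqrt(mean(x^2) + eps) * w_i, the formula given in the PyTorch torch.nn.RMSNorm docs. The plain-Python sketch below (function names hypothetical) contrasts it with LayerNorm, which additionally centers and adds a bias; for a zero-mean input without bias the two coincide.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    # No centering, no bias: divide by RMS(x), then scale by a learned weight
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]

def layer_norm(x, weight, bias, eps=1e-6):
    # For contrast: subtract the mean, divide by the std, then affine with bias
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    std = math.sqrt(var + eps)
    return [(v - mean) / std * w + b for v, w, b in zip(x, weight, bias)]

x = [3.0, 4.0, -1.0, 2.0]
print(rms_norm(x, [1.0] * 4))
print(layer_norm(x, [1.0] * 4, [0.0] * 4))
```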

What's Changed

Punch cache_dir through model factory / builder / pretrain helpers by @rwightman in #2356
Yuweihao inception next atto merge by @rwightman in #2360
Dataset trust remote tweaks by @rwightman in #2361
Add --dataset-trust-remote-code to the train.py and validate.py scripts by @grodino in #2328
Fix feature_info.reduction by @brianhou0208 in #2369
Add caution to Adan. Add decouple decay option to LAMB. by @rwightman in #2357
Switching to timm specific weight instances for open_clip image encoders by @rwightman in #2376
Fix broken image link in Quickstart doc by @ariG23498 in #2381
Supporting aimv2 encoders by @rwightman in #2379
fix: minor typos in markdowns by @ruidazeng in #2382
Add 384x384 in12k pretrain and finetune for convnext_nano by @rwightman in #2384
Fixed unfused attn2d scale by @laclouis5 in #2387
Fix MQA V2 by @laclouis5 in #2388
Wrap torch checkpoint() fn to default use_reentrant flag to False and allow env var override by @rwightman in #2394
Add half-precision (bfloat16, float16) support to train & validate scripts by @rwightman in #2397
Merging wandb project name changes w/ addition by @rwightman in #2398

New Contributors

@brianhou0208 made their first contribution in #2369
@ariG23498 made their first contribution in #2381
@ruidazeng made their first contribution in #2382
@laclouis5 made their first contribution in #2387

Full Changelog: v1.0.12...v1.0.13

Contributors

rwightman, grodino, and 4 other contributors

Release v1.0.12

03 Dec 19:05

rwightman

v1.0.12

553ded5

Nov 28, 2024

More optimizers
- Add MARS optimizer (https://arxiv.org/abs/2411.10438, https://github.com/AGI-Arena/MARS)
- Add LaProp optimizer (https://arxiv.org/abs/2002.04839, https://github.com/Z-T-WANG/LaProp-Optimizer)
- Add masking from 'Cautious Optimizers' (https://arxiv.org/abs/2411.16085, https://github.com/kyleliang919/C-Optim) to Adafactor, Adafactor Big Vision, AdamW (legacy), Adopt, Lamb, LaProp, Lion, NadamW, RMSPropTF, SGDW
- Cleanup some docstrings and type annotations re optimizers and factory
Add MobileNet-V4 Conv Medium models pretrained on in12k and fine-tuned in1k @ 384x384
Add small cs3darknet, quite good for the speed
- https://huggingface.co/timm/cs3darknet_focus_s.ra4_e3600_r256_in1k
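The 'Cautious Optimizers' masking listed above zeroes the components of an optimizer update whose sign disagrees with the current gradient, then rescales the survivors so the mask averages to one. A plain-Python sketch of that single step, assuming the rescaling divides by the fraction of kept components with a small floor; the helper name and the min_mean value are hypothetical, not timm's.

```python
def cautious_update(update, grad, min_mean=1e-3):
    """Keep update components that agree in sign with the gradient.

    Kept entries are scaled by 1 / (fraction kept) so the mask has mean 1;
    the floor avoids dividing by zero when nothing is kept.
    """
    mask = [1.0 if u * g > 0 else 0.0 for u, g in zip(update, grad)]
    mean = max(sum(mask) / len(mask), min_mean)
    return [u * m / mean for u, m in zip(update, mask)]

print(cautious_update([1.0, -2.0, 3.0, 0.5], [1.0, 1.0, -1.0, 2.0]))
```

In a real optimizer the update would be the momentum or Adam-style step for a whole tensor, and the same masking is applied element-wise before the parameter update.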

Nov 12, 2024

Optimizer factory refactor
- New factory works by registering optimizers using an OptimInfo dataclass w/ some key traits
- Add list_optimizers, get_optimizer_class, get_optimizer_info to reworked create_optimizer_v2 fn to explore optimizers, get info or class
- deprecate optim.optim_factory, move fns to optim/_optim_factory.py and optim/_param_groups.py and encourage import via timm.optim
Add Adopt (https://github.com/iShohei220/adopt) optimizer
Add 'Big Vision' variant of Adafactor (https://github.com/google-research/big_vision/blob/main/big_vision/optax.py) optimizer
Fix original Adafactor to pick better factorization dims for convolutions
Tweak LAMB optimizer with some improvements in torch.where functionality since original, refactor clipping a bit
dynamic img size support in vit, deit, eva improved to support resize from non-square patch grids, thanks https://github.com/wojtke
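The reworked optimizer factory registers each optimizer with an OptimInfo dataclass and exposes lookup helpers. The sketch below shows that registry pattern in plain Python. Only the names OptimInfo, list_optimizers, get_optimizer_info and get_optimizer_class come from the notes; the dataclass fields, the registry dict, the helper signatures, and the stand-in classes are hypothetical, so timm's real API may differ.

```python
import fnmatch
from dataclasses import dataclass, field

@dataclass(frozen=True)
class OptimInfo:
    # Hypothetical fields for illustration; timm's OptimInfo may differ
    name: str
    opt_class: type
    description: str = ""
    has_betas: bool = False
    defaults: dict = field(default_factory=dict)

_REGISTRY: dict = {}  # lowercase name -> OptimInfo

def register_optimizer(info: OptimInfo) -> None:
    key = info.name.lower()
    if key in _REGISTRY:
        raise ValueError(f"optimizer {key!r} is already registered")
    _REGISTRY[key] = info

def list_optimizers(pattern: str = "*") -> list:
    # Sorted names matching a shell-style wildcard pattern
    return sorted(n for n in _REGISTRY if fnmatch.fnmatchcase(n, pattern))

def get_optimizer_info(name: str) -> OptimInfo:
    return _REGISTRY[name.lower()]

def get_optimizer_class(name: str) -> type:
    return get_optimizer_info(name).opt_class

# Stand-in classes so the sketch runs without PyTorch
class SGDW: ...
class AdamW: ...

register_optimizer(OptimInfo("sgdw", SGDW, "SGD w/ decoupled weight decay"))
register_optimizer(OptimInfo("adamw", AdamW, "Adam w/ decoupled weight decay", has_betas=True))

print(list_optimizers())
print(list_optimizers("adam*"))
print(get_optimizer_info("AdamW").has_betas)
```

Keeping traits such as has_betas on the info record lets a factory decide which constructor kwargs to pass without hard-coding per-optimizer branches.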

Oct 31, 2024

Add a set of new very well trained ResNet & ResNet-V2 18/34 (basic block) weights. See https://huggingface.co/blog/rwightman/resnet-trick-or-treat

Oct 19, 2024

Cleanup torch amp usage to avoid cuda specific calls, merge support for Ascend (NPU) devices from MengqingCao that should work now in PyTorch 2.5 w/ new device extension autoloading feature. Tested Intel Arc (XPU) in PyTorch 2.5 too and it (mostly) worked.

What's Changed

mambaout.py: fixed bug by @NightMachinery in #2305
Cleanup some amp related behaviour to better support different (non-cuda) devices by @rwightman in #2308
Add NPU backend support for val and inference by @MengqingCao in #2109
Update some clip pretrained weights to point to new hub locations by @rwightman in #2311
ResNet vs MNV4 v1/v2 18 & 34 weights by @rwightman in #2316
Replace deprecated positional argument with --data-dir by @JosuaRieder in #2322
Fix typo in train.py: bathes > batches by @JosuaRieder in #2321
Fix positional embedding resampling for non-square inputs in ViT by @wojtke in #2317
Add trust_remote_code argument to ReaderHfds by @grodino in #2326
Extend train epoch schedule by warmup_epochs if warmup_prefix enabled by @rwightman in #2325
Extend existing unit tests using Cover-Agent by @mrT23 in #2331
An impl of adafactor as per big vision (scaling vit) changes by @rwightman in #2320
Add py.typed file as recommended by PEP 561 by @antoinebrl in #2252
Add CODE_OF_CONDUCT.md and CITATION.cff files by @AlinaImtiaz018 in #2333
Add some 384x384 small model weights by @rwightman in #2334
In dist training, update loss running avg every step, sync on log by @rwightman in #2340
Improve WandB logging by @sinahmr in #2341
A few weights to merge Friday by @rwightman in #2343
Update timm torchvision resnet weight urls to the updated urls in torchvision by @JohannesTheo in #2346
More optimizer updates, add MARS, LaProp, add Adopt fix and more by @rwightman in #2347
Cautious optimizer impl plus some typing cleanup. by @rwightman in #2349
Add cautious mars, improve test reliability by skipping grad diff for… by @rwightman in #2351
See if we can avoid some model / layer pickle issues with the aa attr in ConvNormAct by @rwightman in #2353

New Contributors

@MengqingCao made their first contribution in #2109
@JosuaRieder made their first contribution in #2322
@wojtke made their first contribution in #2317
@grodino made their first contribution in #2326
@AlinaImtiaz018 made their first contribution in #2333
@sinahmr made their first contribution in #2341
@JohannesTheo made their first contribution in #2346

Full Changelog: v1.0.11...v1.0.12

Contributors

rwightman, grodino, and 9 other contributors

v1.0.11 Release

16 Oct 21:19

rwightman

v1.0.11

8cb2548

Quick turnaround from 1.0.10 to fix an error impacting 3rd party packages that still import through a deprecated path that isn't tested.

Oct 16, 2024

Fix error on importing from deprecated path timm.models.registry, increased priority of existing deprecation warnings to be visible
Port weights of InternViT-300M (https://huggingface.co/OpenGVLab/InternViT-300M-448px) to timm as vit_intern300m_patch14_448

Oct 14, 2024

Pre-activation (ResNetV2) version of 18/18d/34/34d ResNet model defs added by request (weights pending)
Release 1.0.10

Oct 11, 2024

MambaOut (https://github.com/yuweihao/MambaOut) model & weights added. A cheeky take on SSM vision models w/o the SSM (essentially ConvNeXt w/ gating). A mix of original weights + custom variations & weights.

model	img_size	top1	top5	param_count
mambaout_base_plus_rw.sw_e150_r384_in12k_ft_in1k	384	87.506	98.428	101.66
mambaout_base_plus_rw.sw_e150_in12k_ft_in1k	288	86.912	98.236	101.66
mambaout_base_plus_rw.sw_e150_in12k_ft_in1k	224	86.632	98.156	101.66
mambaout_base_tall_rw.sw_e500_in1k	288	84.974	97.332	86.48
mambaout_base_wide_rw.sw_e500_in1k	288	84.962	97.208	94.45
mambaout_base_short_rw.sw_e500_in1k	288	84.832	97.27	88.83
mambaout_base.in1k	288	84.72	96.93	84.81
mambaout_small_rw.sw_e450_in1k	288	84.598	97.098	48.5
mambaout_small.in1k	288	84.5	96.974	48.49
mambaout_base_wide_rw.sw_e500_in1k	224	84.454	96.864	94.45
mambaout_base_tall_rw.sw_e500_in1k	224	84.434	96.958	86.48
mambaout_base_short_rw.sw_e500_in1k	224	84.362	96.952	88.83
mambaout_base.in1k	224	84.168	96.68	84.81
mambaout_small.in1k	224	84.086	96.63	48.49
mambaout_small_rw.sw_e450_in1k	224	84.024	96.752	48.5
mambaout_tiny.in1k	288	83.448	96.538	26.55
mambaout_tiny.in1k	224	82.736	96.1	26.55
mambaout_kobe.in1k	288	81.054	95.718	9.14
mambaout_kobe.in1k	224	79.986	94.986	9.14
mambaout_femto.in1k	288	79.848	95.14	7.3
mambaout_femto.in1k	224	78.87	94.408	7.3

SigLIP SO400M ViT fine-tunes on ImageNet-1k @ 378x378, added 378x378 option for existing SigLIP 384x384 models
- vit_so400m_patch14_siglip_378.webli_ft_in1k - 89.42 top-1
- vit_so400m_patch14_siglip_gap_378.webli_ft_in1k - 89.03
SigLIP SO400M ViT encoder from recent multi-lingual (i18n) variant, patch16 @ 256x256 (https://huggingface.co/timm/ViT-SO400M-16-SigLIP-i18n-256). OpenCLIP update pending.
Add two ConvNeXt 'Zepto' models & weights (one w/ overlapped stem and one w/ patch stem). Uses RMSNorm, smaller than previous 'Atto', 2.2M params.
- convnext_zepto_rms_ols.ra4_e3600_r224_in1k - 73.20 top-1 @ 224
- convnext_zepto_rms.ra4_e3600_r224_in1k - 72.81 @ 224

Sept 2024

Add a suite of tiny test models for improved unit tests and niche low-resource applications (https://huggingface.co/blog/rwightman/timm-tiny-test)
Add MobileNetV4-Conv-Small (0.5x) model (https://huggingface.co/posts/rwightman/793053396198664)
- mobilenetv4_conv_small_050.e3000_r224_in1k - 65.81 top-1 @ 256, 64.76 @ 224
Add MobileNetV3-Large variants trained with MNV4 Small recipe
- mobilenetv3_large_150d.ra4_e3600_r256_in1k - 81.81 @ 320, 80.94 @ 256
- mobilenetv3_large_100.ra4_e3600_r224_in1k - 77.16 @ 256, 76.31 @ 224

Release v1.0.10

15 Oct 04:44

rwightman

v1.0.10

b4a9a16

Oct 14, 2024

Pre-activation (ResNetV2) version of 18/18d/34/34d ResNet model defs added by request (weights pending)
Release 1.0.10

Oct 11, 2024

MambaOut (https://github.com/yuweihao/MambaOut) model & weights added. A cheeky take on SSM vision models w/o the SSM (essentially ConvNeXt w/ gating). A mix of original weights + custom variations & weights.

model	img_size	top1	top5	param_count
mambaout_base_plus_rw.sw_e150_r384_in12k_ft_in1k	384	87.506	98.428	101.66
mambaout_base_plus_rw.sw_e150_in12k_ft_in1k	288	86.912	98.236	101.66
mambaout_base_plus_rw.sw_e150_in12k_ft_in1k	224	86.632	98.156	101.66
mambaout_base_tall_rw.sw_e500_in1k	288	84.974	97.332	86.48
mambaout_base_wide_rw.sw_e500_in1k	288	84.962	97.208	94.45
mambaout_base_short_rw.sw_e500_in1k	288	84.832	97.27	88.83
mambaout_base.in1k	288	84.72	96.93	84.81
mambaout_small_rw.sw_e450_in1k	288	84.598	97.098	48.5
mambaout_small.in1k	288	84.5	96.974	48.49
mambaout_base_wide_rw.sw_e500_in1k	224	84.454	96.864	94.45
mambaout_base_tall_rw.sw_e500_in1k	224	84.434	96.958	86.48
mambaout_base_short_rw.sw_e500_in1k	224	84.362	96.952	88.83
mambaout_base.in1k	224	84.168	96.68	84.81
mambaout_small.in1k	224	84.086	96.63	48.49
mambaout_small_rw.sw_e450_in1k	224	84.024	96.752	48.5
mambaout_tiny.in1k	288	83.448	96.538	26.55
mambaout_tiny.in1k	224	82.736	96.1	26.55
mambaout_kobe.in1k	288	81.054	95.718	9.14
mambaout_kobe.in1k	224	79.986	94.986	9.14
mambaout_femto.in1k	288	79.848	95.14	7.3
mambaout_femto.in1k	224	78.87	94.408	7.3

SigLIP SO400M ViT fine-tunes on ImageNet-1k @ 378x378, added 378x378 option for existing SigLIP 384x384 models
- vit_so400m_patch14_siglip_378.webli_ft_in1k - 89.42 top-1
- vit_so400m_patch14_siglip_gap_378.webli_ft_in1k - 89.03
SigLIP SO400M ViT encoder from recent multi-lingual (i18n) variant, patch16 @ 256x256 (https://huggingface.co/timm/ViT-SO400M-16-SigLIP-i18n-256). OpenCLIP update pending.
Add two ConvNeXt 'Zepto' models & weights (one w/ overlapped stem and one w/ patch stem). Uses RMSNorm, smaller than previous 'Atto', 2.2M params.
- convnext_zepto_rms_ols.ra4_e3600_r224_in1k - 73.20 top-1 @ 224
- convnext_zepto_rms.ra4_e3600_r224_in1k - 72.81 @ 224

Sept 2024

Add a suite of tiny test models for improved unit tests and niche low-resource applications (https://huggingface.co/blog/rwightman/timm-tiny-test)
Add MobileNetV4-Conv-Small (0.5x) model (https://huggingface.co/posts/rwightman/793053396198664)
- mobilenetv4_conv_small_050.e3000_r224_in1k - 65.81 top-1 @ 256, 64.76 @ 224
Add MobileNetV3-Large variants trained with MNV4 Small recipe
- mobilenetv3_large_150d.ra4_e3600_r256_in1k - 81.81 @ 320, 80.94 @ 256
- mobilenetv3_large_100.ra4_e3600_r224_in1k - 77.16 @ 256, 76.31 @ 224

Release v1.0.9

23 Aug 23:42

rwightman

v1.0.9

0727833

Aug 21, 2024

Updated SBB ViT models trained on ImageNet-12k and fine-tuned on ImageNet-1k, challenging quite a number of much larger, slower models

model	top1	top5	param_count	img_size
vit_mediumd_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k	87.438	98.256	64.11	384
vit_mediumd_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k	86.608	97.934	64.11	256
vit_betwixt_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k	86.594	98.02	60.4	384
vit_betwixt_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k	85.734	97.61	60.4	256

MobileNet-V1 1.25, EfficientNet-B1, & ResNet50-D weights w/ MNV4 baseline challenge recipe

model	top1	top5	param_count	img_size
resnet50d.ra4_e3600_r224_in1k	81.838	95.922	25.58	288
efficientnet_b1.ra4_e3600_r240_in1k	81.440	95.700	7.79	288
resnet50d.ra4_e3600_r224_in1k	80.952	95.384	25.58	224
efficientnet_b1.ra4_e3600_r240_in1k	80.406	95.152	7.79	240
mobilenetv1_125.ra4_e3600_r224_in1k	77.600	93.804	6.27	256
mobilenetv1_125.ra4_e3600_r224_in1k	76.924	93.234	6.27	224

Add SAM2 (HieraDet) backbone arch & weight loading support
Add Hiera Small weights trained w/ abswin pos embed on in12k & fine-tuned on 1k

model	top1	top5	param_count
hiera_small_abswin_256.sbb2_e200_in12k_ft_in1k	84.912	97.260	35.01
hiera_small_abswin_256.sbb2_pd_e200_in12k_ft_in1k	84.560	97.106	35.01

Aug 8, 2024

Add RDNet ('DenseNets Reloaded', https://arxiv.org/abs/2403.19588), thanks Donghyun Kim

Release v1.0.8

29 Jul 05:18

rwightman

v1.0.8

a6fe31b

July 28, 2024

Add mobilenet_edgetpu_v2_m weights w/ ra4 mnv4-small based recipe. 80.1% top-1 @ 224 and 80.7 @ 256.
Release 1.0.8

July 26, 2024

More MobileNet-v4 weights, ImageNet-12k pretrain w/ fine-tunes, and anti-aliased ConvLarge models

model	top1	top1_err	top5	top5_err	param_count	img_size
mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k	84.99	15.01	97.294	2.706	32.59	544
mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k	84.772	15.228	97.344	2.656	32.59	480
mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k	84.64	15.36	97.114	2.886	32.59	448
mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k	84.314	15.686	97.102	2.898	32.59	384
mobilenetv4_conv_aa_large.e600_r384_in1k	83.824	16.176	96.734	3.266	32.59	480
mobilenetv4_conv_aa_large.e600_r384_in1k	83.244	16.756	96.392	3.608	32.59	384
mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k	82.99	17.01	96.67	3.33	11.07	320
mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k	82.364	17.636	96.256	3.744	11.07	256

Impressive MobileNet-V1 and EfficientNet-B0 baseline challenges (https://huggingface.co/blog/rwightman/mobilenet-baselines)

model	top1	top1_err	top5	top5_err	param_count	img_size
efficientnet_b0.ra4_e3600_r224_in1k	79.364	20.636	94.754	5.246	5.29	256
efficientnet_b0.ra4_e3600_r224_in1k	78.584	21.416	94.338	5.662	5.29	224
mobilenetv1_100h.ra4_e3600_r224_in1k	76.596	23.404	93.272	6.728	5.28	256
mobilenetv1_100.ra4_e3600_r224_in1k	76.094	23.906	93.004	6.996	4.23	256
mobilenetv1_100h.ra4_e3600_r224_in1k	75.662	24.338	92.504	7.496	5.28	224
mobilenetv1_100.ra4_e3600_r224_in1k	75.382	24.618	92.312	7.688	4.23	224

Prototype of set_input_size() added to vit and swin v1/v2 models to allow changing image size, patch size, window size after model creation.
Improved support in swin for different size handling, in addition to set_input_size, always_partition and strict_img_size args have been added to __init__ to allow more flexible input size constraints
Fix out of order indices info for intermediate 'Getter' feature wrapper, check out of range indices for same.
Add several tiny < .5M param models for testing that are actually trained on ImageNet-1k

model	top1	top1_err	top5	top5_err	param_count	img_size	crop_pct
test_efficientnet.r160_in1k	47.156	52.844	71.726	28.274	0.36	192	1.0
test_byobnet.r160_in1k	46.698	53.302	71.674	28.326	0.46	192	1.0
test_efficientnet.r160_in1k	46.426	53.574	70.928	29.072	0.36	160	0.875
test_byobnet.r160_in1k	45.378	54.622	70.572	29.428	0.46	160	0.875
test_vit.r160_in1k	42.0	58.0	68.664	31.336	0.37	192	1.0
test_vit.r160_in1k	40.822	59.178	67.212	32.788	0.37	160	0.875

Fix vit reg token init, thanks Promisery
Other misc fixes

June 24, 2024

3 more MobileNetV4 hybrid weights with different MQA weight init scheme

model	top1	top1_err	top5	top5_err	param_count	img_size
mobilenetv4_hybrid_large.ix_e600_r384_in1k	84.356	15.644	96.892	3.108	37.76	448
mobilenetv4_hybrid_large.ix_e600_r384_in1k	83.990	16.010	96.702	3.298	37.76	384
mobilenetv4_hybrid_medium.ix_e550_r384_in1k	83.394	16.606	96.760	3.240	11.07	448
mobilenetv4_hybrid_medium.ix_e550_r384_in1k	82.968	17.032	96.474	3.526	11.07	384
mobilenetv4_hybrid_medium.ix_e550_r256_in1k	82.492	17.508	96.278	3.722	11.07	320
mobilenetv4_hybrid_medium.ix_e550_r256_in1k	81.446	18.554	95.704	4.296	11.07	256

florence2 weight loading in DaViT model
