[ICLR 2025 Spotlight] Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

OpenGVLab/Vision-RWKV

The official implementation of "Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures".

News🚀🚀🚀

  • 2025/02/18: A new version of the CUDA code has been added in the cuda_new folder to eliminate the hardcoding of T_MAX.
  • 2025/02/11: 🎊🎊 Vision-RWKV was accepted by ICLR 2025!
  • 2024/04/14: We now support RWKV6 for the classification task, with higher performance!
  • 2024/03/04: We released the code and models of Vision-RWKV.

Highlights

  • High-Resolution Efficiency: Processes high-resolution images smoothly with a global receptive field.
  • Scalability: Pre-trained on large-scale datasets and remains stable when scaled up.
  • Superior Performance: Achieves better performance than ViTs in classification tasks. In dense prediction tasks, it surpasses window-based ViTs and is comparable to global-attention ViTs, with lower FLOPs and higher speed.
  • Efficient Alternative: Can serve as an alternative backbone to ViT in comprehensive vision tasks.

Overview

[Figure: Vision-RWKV overview]

Schedule

  • Support RWKV6 as VRWKV6
  • Release VRWKV-L
  • Release VRWKV-T/S/B

Model Zoo

Pretrained Models

| Model   | Size | Pretrain     | Download |
|---------|------|--------------|----------|
| VRWKV-L | 192  | ImageNet-22K | ckpt     |

Image Classification (ImageNet-1K)

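| Model    | Size | #Param | #FLOPs | Top-1 Acc | Download   |
|----------|------|--------|--------|-----------|------------|
| VRWKV-T  | 224  | 6.2M   | 1.2G   | 75.1      | ckpt / cfg |
| VRWKV-S  | 224  | 23.8M  | 4.6G   | 80.1      | ckpt / cfg |
| VRWKV-B  | 224  | 93.7M  | 18.2G  | 82.0      | ckpt / cfg |
| VRWKV-L  | 384  | 334.9M | 189.5G | 86.0      | ckpt / cfg |
| VRWKV6-T | 224  | 7.6M   | 1.6G   | 76.6      | ckpt / cfg |
| VRWKV6-S | 224  | 27.7M  | 5.6G   | 81.1      | ckpt / cfg |
| VRWKV6-B | 224  | 104.9M | 20.9G  | 82.6      | ckpt / cfg |

  • VRWKV-L is pretrained on ImageNet-22K and then finetuned on ImageNet-1K.
  • We train VRWKV-L with the InternImage codebase for higher speed.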
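
To make the trade-off between the VRWKV and VRWKV6 variants in the ImageNet-1K results above easier to read, the following is a minimal sketch that computes the Top-1 gain and the added parameters and FLOPs at each 224-resolution scale. The numbers are copied from the table; the `RESULTS` dictionary and the `compare` helper are illustrative names and are not part of this repository.

```python
# Numbers copied from the ImageNet-1K table (224x224 models only).
# Each entry: (parameters in millions, FLOPs in G, Top-1 accuracy in %).
# RESULTS and compare() are hypothetical helpers for illustration only.
RESULTS = {
    "VRWKV-T": (6.2, 1.2, 75.1),
    "VRWKV-S": (23.8, 4.6, 80.1),
    "VRWKV-B": (93.7, 18.2, 82.0),
    "VRWKV6-T": (7.6, 1.6, 76.6),
    "VRWKV6-S": (27.7, 5.6, 81.1),
    "VRWKV6-B": (104.9, 20.9, 82.6),
}


def compare(scale):
    """Return (Top-1 gain, extra params in M, extra FLOPs in G) of VRWKV6 over VRWKV."""
    base_params, base_flops, base_acc = RESULTS[f"VRWKV-{scale}"]
    v6_params, v6_flops, v6_acc = RESULTS[f"VRWKV6-{scale}"]
    # Round to one decimal place, the precision reported in the table.
    return (
        round(v6_acc - base_acc, 1),
        round(v6_params - base_params, 1),
        round(v6_flops - base_flops, 1),
    )


for scale in ("T", "S", "B"):
    gain, extra_params, extra_flops = compare(scale)
    print(f"{scale}: +{gain} Top-1, +{extra_params}M params, +{extra_flops}G FLOPs")
```

By this calculation the accuracy gain from VRWKV6 shrinks as the model grows (from 1.5 points at the T scale to 0.6 points at the B scale), while the added parameters and FLOPs grow.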

Object Detection with Mask-RCNN head (COCO)

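| Model   | #Param | #FLOPs  | box AP | mask AP | Download   |
|---------|--------|---------|--------|---------|------------|
| VRWKV-T | 8.4M   | 67.9G   | 41.7   | 38.0    | ckpt / cfg |
| VRWKV-S | 29.3M  | 189.9G  | 44.8   | 40.2    | ckpt / cfg |
| VRWKV-B | 106.6M | 599.0G  | 46.8   | 41.7    | ckpt / cfg |
| VRWKV-L | 351.9M | 1730.6G | 50.6   | 44.9    | ckpt / cfg |

  • We report the #Param and #FLOPs of the backbone in this table.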

Semantic Segmentation with UperNet head (ADE20K)

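| Model   | #Param | #FLOPs | mIoU | Download   |
|---------|--------|--------|------|------------|
| VRWKV-T | 8.4M   | 16.6G  | 43.3 | ckpt / cfg |
| VRWKV-S | 29.3M  | 46.3G  | 47.2 | ckpt / cfg |
| VRWKV-B | 106.6M | 146.0G | 49.2 | ckpt / cfg |
| VRWKV-L | 351.9M | 421.9G | 53.5 | ckpt / cfg |

  • We report the #Param and #FLOPs of the backbone in this table.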

Citation

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{duan2024vrwkv,
  title={Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures},
  author={Duan, Yuchen and Wang, Weiyun and Chen, Zhe and Zhu, Xizhou and Lu, Lewei and Lu, Tong and Qiao, Yu and Li, Hongsheng and Dai, Jifeng and Wang, Wenhai},
  journal={arXiv preprint arXiv:2403.02308},
  year={2024}
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Acknowledgement

Vision-RWKV is built with reference to the code of the following projects: RWKV, MMPretrain, MMDetection, MMSegmentation, ViT-Adapter, InternImage. Thanks for their awesome work!
