OpenGVLab/Vision-RWKV

[ICLR 2025 Spotlight] Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

Vision-RWKV

The official implementation of "Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures".

News🚀🚀🚀

2025/02/18: A new version of the CUDA code has been added in the cuda_new folder to eliminate the hardcoding of T_MAX.
2025/02/11: 🎊🎊 Vision-RWKV is accepted by ICLR 2025!
2024/04/14: We support RWKV6 in the classification task, with higher performance!
2024/03/04: We release the code and models of Vision-RWKV.

Highlights

High-Resolution Efficiency: Processes high-resolution images smoothly with a global receptive field.
Scalability: Pre-trains on large-scale datasets and remains stable as the model scales up.
Superior Performance: Achieves better performance than ViTs in classification tasks. In dense prediction tasks, it surpasses window-based ViTs and is comparable to global-attention ViTs, with lower FLOPs and higher speed.
Efficient Alternative: Can serve as an alternative backbone to ViT across a broad range of vision tasks.
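
To make the high-resolution claim more concrete, here is a rough sketch of how token-mixing cost grows with image size. It assumes a ViT-style patch embedding with a hypothetical 16x16 patch size, and it models global attention as quadratic in the number of tokens and an RWKV-like operator as linear. Constants are ignored, so only the growth trend is meaningful, not absolute FLOPs. The function names are illustrative and are not part of this repository.

```python
def num_tokens(height, width, patch=16):
    """Number of patch tokens, assuming non-overlapping square patches."""
    return (height // patch) * (width // patch)


def relative_cost(height, width, base=224, patch=16):
    """Token-mixing cost relative to a base x base image.

    'linear' models an operator whose cost is proportional to the token count;
    'quadratic' models global self-attention. Both are idealized assumptions.
    """
    n = num_tokens(height, width, patch)
    n0 = num_tokens(base, base, patch)
    ratio = n / n0
    return {"tokens": n, "linear": ratio, "quadratic": ratio ** 2}


if __name__ == "__main__":
    for size in (224, 384, 1024):
        cost = relative_cost(size, size)
        print(
            f"{size}x{size}: {cost['tokens']} tokens, "
            f"linear x{cost['linear']:.1f}, quadratic x{cost['quadratic']:.1f}"
        )
```

Under these assumptions the gap between the two columns widens quickly as resolution increases, which is the regime the first highlight refers to.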

Overview

Schedule

Support RWKV6 as VRWKV6
Release VRWKV-L
Release VRWKV-T/S/B

Model Zoo

Pretrained Models

Model	Size	Pretrain	Download
VRWKV-L	192	ImageNet-22K	ckpt

Image Classification (ImageNet-1K)

Model	Size	#Param	#FLOPs	Top-1 Acc	Download
VRWKV-T	224	6.2M	1.2G	75.1	ckpt | cfg
VRWKV-S	224	23.8M	4.6G	80.1	ckpt | cfg
VRWKV-B	224	93.7M	18.2G	82.0	ckpt | cfg
VRWKV-L	384	334.9M	189.5G	86.0	ckpt | cfg
VRWKV6-T	224	7.6M	1.6G	76.6	ckpt | cfg
VRWKV6-S	224	27.7M	5.6G	81.1	ckpt | cfg
VRWKV6-B	224	104.9M	20.9G	82.6	ckpt | cfg

VRWKV-L is pretrained on ImageNet-22K and then finetuned on ImageNet-1K.
We train VRWKV-L with the InternImage codebase for faster training.
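
As a quick way to read the table above, the following sketch computes what the VRWKV6 variants gain over the corresponding VRWKV models at the same size, and at what cost. The numbers are copied from the table; the dictionary layout and the helper name `gain` are illustrative only.

```python
# (params in millions, FLOPs in G, top-1 accuracy in %), copied from the table above.
RESULTS = {
    "VRWKV-T": (6.2, 1.2, 75.1),
    "VRWKV-S": (23.8, 4.6, 80.1),
    "VRWKV-B": (93.7, 18.2, 82.0),
    "VRWKV6-T": (7.6, 1.6, 76.6),
    "VRWKV6-S": (27.7, 5.6, 81.1),
    "VRWKV6-B": (104.9, 20.9, 82.6),
}


def gain(size):
    """Difference between VRWKV6-<size> and VRWKV-<size>, rounded to one decimal."""
    p0, f0, a0 = RESULTS[f"VRWKV-{size}"]
    p1, f1, a1 = RESULTS[f"VRWKV6-{size}"]
    return {
        "top1_gain": round(a1 - a0, 1),
        "extra_params_m": round(p1 - p0, 1),
        "extra_gflops": round(f1 - f0, 1),
    }


if __name__ == "__main__":
    for size in ("T", "S", "B"):
        print(size, gain(size))
```

In this table the accuracy gain from RWKV6 shrinks as the model gets larger, while the added parameters and FLOPs grow.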

Object Detection with Mask-RCNN head (COCO)

Model	#Param	#FLOPs	box AP	mask AP	Download
VRWKV-T	8.4M	67.9G	41.7	38.0	ckpt | cfg
VRWKV-S	29.3M	189.9G	44.8	40.2	ckpt | cfg
VRWKV-B	106.6M	599.0G	46.8	41.7	ckpt | cfg
VRWKV-L	351.9M	1730.6G	50.6	44.9	ckpt | cfg

We report the #Param and #FLOPs of the backbone in this table.

Semantic Segmentation with UperNet head (ADE20K)

Model	#Param	#FLOPs	mIoU	Download
VRWKV-T	8.4M	16.6G	43.3	ckpt | cfg
VRWKV-S	29.3M	46.3G	47.2	ckpt | cfg
VRWKV-B	106.6M	146.0G	49.2	ckpt | cfg
VRWKV-L	351.9M	421.9G	53.5	ckpt | cfg

We report the #Param and #FLOPs of the backbone in this table.
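
The detection and segmentation tables list the same backbones with different FLOPs, which presumably reflects different input resolutions. The sketch below compares the backbone FLOPs ratio between the two tables with the ratio of input pixels. The input sizes (1333x800 for COCO, 512x512 for ADE20K) are assumptions based on common MMDetection and MMSegmentation settings and are not stated in this README; check the linked configs before relying on them.

```python
# Backbone FLOPs in G, copied from the detection and segmentation tables.
DET_GFLOPS = {"VRWKV-T": 67.9, "VRWKV-S": 189.9, "VRWKV-B": 599.0, "VRWKV-L": 1730.6}
SEG_GFLOPS = {"VRWKV-T": 16.6, "VRWKV-S": 46.3, "VRWKV-B": 146.0, "VRWKV-L": 421.9}

# Assumed input resolutions (not given in this README).
DET_INPUT = (1333, 800)
SEG_INPUT = (512, 512)


def flops_ratios():
    """Detection-to-segmentation backbone FLOPs ratio for each model."""
    return {name: DET_GFLOPS[name] / SEG_GFLOPS[name] for name in DET_GFLOPS}


def pixel_ratio():
    """Ratio of input pixels under the assumed resolutions."""
    return (DET_INPUT[0] * DET_INPUT[1]) / (SEG_INPUT[0] * SEG_INPUT[1])


if __name__ == "__main__":
    print(f"pixel ratio: {pixel_ratio():.2f}")
    for name, ratio in flops_ratios().items():
        print(f"{name}: FLOPs ratio {ratio:.2f}")
```

If the assumed resolutions are right, a FLOPs ratio close to the pixel ratio for every model would be consistent with backbone cost growing roughly linearly with the number of input pixels.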

Citation

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{duan2024vrwkv,
  title={Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures},
  author={Duan, Yuchen and Wang, Weiyun and Chen, Zhe and Zhu, Xizhou and Lu, Lewei and Lu, Tong and Qiao, Yu and Li, Hongsheng and Dai, Jifeng and Wang, Wenhai},
  journal={arXiv preprint arXiv:2403.02308},
  year={2024}
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Acknowledgement

Vision-RWKV is built with reference to the code of the following projects: RWKV, MMPretrain, MMDetection, MMSegmentation, ViT-Adapter, InternImage. Thanks for their awesome work!

About

arxiv.org/abs/2403.02308
