vision-transformer
Here are 1,393 public repositories matching this topic...
OpenMMLab Detection Toolbox and Benchmark
- Updated Aug 21, 2024 - Python
pix2tex: Using a ViT to convert images of equations into LaTeX code.
- Updated Jan 18, 2025 - Python
This repository contains demos I made with the Transformers library by HuggingFace.
- Updated Jul 2, 2025 - Jupyter Notebook
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
- Updated Nov 10, 2025 - Jupyter Notebook
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
- Updated Dec 12, 2025 - Python
SwinIR: Image Restoration Using Swin Transformer (official repository)
- Updated May 14, 2024 - Python
An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites
- Updated Jul 30, 2024
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
- Updated Mar 15, 2025 - Python
OpenMMLab Pre-training Toolbox and Benchmark
- Updated Nov 1, 2024 - Python
Scenic: A Jax Library for Computer Vision Research and Beyond
- Updated Dec 16, 2025 - Python
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
- Updated Oct 18, 2024 - Python
Efficient vision foundation models for high-resolution generation and perception.
- Updated Sep 5, 2025 - Python
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
- Updated May 26, 2025 - Python
EVA Series: Visual Representation Fantasies from BAAI
- Updated Aug 1, 2024 - Python
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
- Updated Dec 15, 2025 - Python
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.
- Updated Jan 24, 2024 - Jupyter Notebook
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
- Updated Dec 17, 2025 - Python
[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
- Updated Jul 22, 2025 - Python
An all-in-one toolkit for computer vision
- Updated May 9, 2025 - Python
The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"
- Updated Sep 25, 2025 - Python