I'm an engineer at xAI focusing on multimodal, video generation, and world models. My ultimate goal is to build multimodal AGI[0],[1],[2].
🤗 Open Source Projects:
- Cosmos: state-of-the-art generative world models
- NeMo DFM: large-scale training and inference framework for diffusion models
- Megatron-LM MoE: Scaling up mixture of experts
- NeMo: scalable training framework for LLMs and transformers
- LongVILA: Long-Context VLM for long videos (ICLR'25)
- ActGPT: browser-use agent
- Channel Pruning: Accelerating Very Deep Neural Networks (ICCV'17)
- Epipolar Transformers: Accurate multi-camera pose understanding (CVPR'20)
- AMC: AutoML for model compression (ECCV'18)
- KL Loss: Accurate Object Detection (CVPR'19)
- FSAF: single-shot object detection (CVPR'19)
🎙️ Invited Talks
Pinned
- NVIDIA-NeMo/NeMo (public): A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
- NVIDIA/Megatron-LM (public): Ongoing research training transformer models at scale
- NVIDIA-NeMo/DFM (public): State-of-the-art framework for fast, large-scale training and inference of diffusion models
- NVIDIA/Cosmos-Tokenizer (public archive): A suite of image and video neural tokenizers
- channel-pruning (public): Channel Pruning for Accelerating Very Deep Neural Networks (ICCV'17)





