- San Jose, CA
- linkedin.com/in/sudhakarsingh27/
Pinned
- NVIDIA/Megatron-LM: Ongoing research training transformer models at scale
- NVIDIA/TransformerEngine: A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance… (a minimal FP8 usage sketch follows this list)
- lit-llama (Python, forked from Lightning-AI/lit-llama): Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
- huggingface/accelerate: 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support (a minimal training-loop sketch follows this list)
- NVIDIA-NeMo/NeMo: A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
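As referenced above for NVIDIA/TransformerEngine, here is a minimal sketch of running a single layer under FP8, loosely following the library's documented quickstart. The layer sizes and the delayed-scaling recipe settings are illustrative assumptions, and an FP8-capable NVIDIA GPU (Hopper, Ada or Blackwell) is required.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Illustrative sizes (assumptions, not taken from the repositories above).
in_features, out_features, batch = 768, 3072, 2048

# TE modules are drop-in replacements for the corresponding torch.nn layers.
model = te.Linear(in_features, out_features, bias=True)
inp = torch.randn(batch, in_features, device="cuda")

# Delayed-scaling FP8 recipe; the arguments are optional and shown for illustration.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

# GEMMs inside this context run in FP8 on supported GPUs.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

out.sum().backward()
```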
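And for huggingface/accelerate, a minimal training-loop sketch: the toy model, data, and mixed_precision choice are assumptions for illustration. prepare() and accelerator.backward() are the library's entry points; the actual device placement and any FSDP/DeepSpeed wrapping come from the configuration created with `accelerate config`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Toy model and data, purely for illustration.
model = torch.nn.Linear(32, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(256, 32), torch.randint(0, 2, (256,)))
dataloader = DataLoader(dataset, batch_size=16)

# mixed_precision can be "no", "fp16", "bf16", or "fp8" (with a suitable backend).
accelerator = Accelerator(mixed_precision="bf16")

# prepare() moves objects to the right device(s) and wraps them for the
# distributed setup chosen in the accelerate configuration.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    accelerator.backward(loss)  # handles gradient scaling for the chosen precision
    optimizer.step()
```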




