
Releases: NVIDIA-NeMo/Megatron-Bridge

NVIDIA Megatron-Bridge 0.1.0rc4

23 Oct 20:35
6725f70

Pre-release
  • Fix docs build
  • Update performance scripts

NVIDIA Megatron-Bridge 0.1.0rc3

08 Oct 01:05
bf71eba

Pre-release
  • Model Collection Support
    • Llama
    • Qwen 2, Qwen 3, Qwen 3 MoE
    • DeepSeek
    • Mamba
  • Migration guide from NeMo 2 to Megatron-Bridge
  • Contribution guide for adding a new model
  • Checkpoint conversion from Hugging Face to Megatron
  • Performance
    • MoE LLM
      • Change the model to dropless with balanced gating
      • Fusion of operators in the router function
      • Global permutation fusion with A2A dispatcher
      • EP A2A communication overlap with computation in both 1F1B pipelining and non-pipelined training
      • Precision-aware optimizer update to support BF16 states
    • Megatron FSDP
      • Migration from Megatron Core (mcore) FSDP to Megatron FSDP
      • Fusion of the weight-gradient copy into the reduce-scatter communication buffer with the WGRAD GEMM
      • Removed redundant optimizer operations
      • Use ZeRO-1 (optimizer and master-parameter sharding) in the replica domain of hybrid FSDP to further lower memory usage
      • IB-SHARP support for the IB AllReduce of hybrid FSDP via a patch with NCCL 2.28
    • MXFP8
      • Improved activation-gradient all-gather overlap performance via UserBuffers
      • Parameter all-gather overlapped with computation, with the communication buffer shared with reduce-scatter
      • Fusion of MXFP8 scaling factor swizzling kernels
      • Use PDL (Programmatic Dependent Launch) for quantization kernels to lower CPU overhead
    • Others
      • Full-iteration CUDA graph for dense models without pipelining
      • Fusion of activation and cast operations (currently tensor-wise scaling only)
      • Store SwiGLU input in FP8 to save activation memory
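
The MoE router bullets above (balanced gating, fused router operators) can be illustrated with a minimal, framework-free sketch of softmax gating with top-k expert selection. The function names and the k=2 choice are illustrative only; the release's contribution is fusing these steps into fewer kernels, not this reference implementation:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def topk_route(logits, k=2):
    """Gate one token: pick the k most probable experts and renormalize.

    Mirrors the router's gating step: softmax the logits, select the k
    largest probabilities as expert assignments, then renormalize the
    selected probabilities so the gate weights sum to 1.
    """
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    experts = ranked[:k]
    norm = sum(probs[i] for i in experts)
    gates = [probs[i] / norm for i in experts]
    return experts, gates

# One token's logits over 4 experts
experts, gates = topk_route([2.0, 0.5, 1.0, -1.0], k=2)
```

In a dropless MoE, every token keeps its top-k assignments (no capacity-based dropping), which is why balanced gating and fast routing kernels matter for throughput.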

NVIDIA Megatron-Bridge 0.1.0a0

15 Aug 13:59
c6976d9

Pre-release
  • Llama and Qwen
  • Pretrain/SFT
  • PEFT
  • Recipe structure with examples for plain Python & NeMo Run usage
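
The PEFT bullet can be made concrete with a minimal, framework-free sketch of LoRA-style adaptation (LoRA is one common PEFT method; the function names, shapes, and alpha/rank scaling below are illustrative, not this repository's API):

```python
def matmul(X, Y):
    """Naive matrix multiply for plain nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(x, W, A, B, alpha=4.0, rank=2):
    """y = x @ (W + (alpha / rank) * (A @ B)).

    The frozen base weight W (d_in x d_out) is augmented by a trainable
    low-rank update A @ B, with A (d_in x r) and B (r x d_out). Only A and
    B are trained, shrinking trainable parameters from d_in * d_out to
    r * (d_in + d_out).
    """
    scale = alpha / rank
    delta = matmul(A, B)  # d_in x d_out low-rank update
    W_eff = [[w + scale * d for w, d in zip(w_row, d_row)]
             for w_row, d_row in zip(W, delta)]
    return matmul(x, W_eff)

# With A and B at zero, the adapted layer reproduces the frozen layer.
x = [[1.0, 2.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
y = lora_forward(x, W, A=[[0.0], [0.0]], B=[[0.0, 0.0]], alpha=1.0, rank=1)
```

In practice the update is kept factored (x @ A @ B) rather than materialized into W, so the frozen weight is shared across tasks and only the small adapters are checkpointed.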
