Fork: Nous Research, published May 27, 2026

NousResearch/Megatron-LM

forked from NVIDIA/Megatron-LM


Description: Ongoing research training transformer models at scale

License: NOASSERTION

Stars: 5

Forks: 0

Open issues: 0

Created: 2026-05-27T12:17:29Z

Pushed: 2026-05-27T12:38:31Z

Default branch: main

Fork: yes

Parent repository: NVIDIA/Megatron-LM

Archived: no

README:

Megatron-LM and Megatron Core
=============================

GPU-optimized library for training transformer models at scale

About

This repository contains two components: Megatron-LM and Megatron Core.

Megatron-LM is a reference example that includes Megatron Core plus pre-configured training scripts. Best for research teams, learning distributed training, and quick experimentation.

Megatron Core is a composable library with GPU-optimized building blocks for custom training frameworks. It provides transformer building blocks, advanced parallelism strategies (TP, PP, DP, EP, CP), mixed precision support (FP16, BF16, FP8, FP4), and model architectures. Best for framework developers and ML engineers building custom training pipelines.
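As a rough illustration of how the parallelism degrees listed above interact, the sketch below derives the data-parallel (DP) size from a GPU count and chosen TP, PP, and CP sizes. It assumes the common convention that the world size factors as TP × PP × CP × DP; expert parallelism is left out for simplicity. The function name and example numbers are hypothetical and not part of Megatron Core's API.

```python
def data_parallel_size(world_size: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """Return the DP degree implied by world_size = TP * PP * CP * DP.

    Hypothetical helper for illustration; not a Megatron Core function.
    """
    for name, value in (("world_size", world_size), ("tp", tp), ("pp", pp), ("cp", cp)):
        if value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value}")

    # GPUs consumed by one model replica (tensor x pipeline x context parallel).
    gpus_per_replica = tp * pp * cp
    if world_size % gpus_per_replica != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by TP*PP*CP={gpus_per_replica}"
        )
    return world_size // gpus_per_replica


if __name__ == "__main__":
    # Example (assumed numbers): 64 GPUs with TP=4, PP=2, CP=2.
    print(data_parallel_size(64, tp=4, pp=2, cp=2))
```

A configuration whose TP × PP × CP product does not divide the GPU count is rejected early, which mirrors the kind of sanity check worth running before launching a job.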

[Megatron Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) provides bidirectional Hugging Face ↔ Megatron checkpoint conversion with production-ready recipes.

Getting Started

Install from PyPI:

uv pip install megatron-core

Or clone and install from source:

git clone https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM
uv pip install -e .

> Note: Building from source can use a lot of memory. If the build runs out of memory, limit parallel compilation jobs by setting MAX_JOBS (e.g. MAX_JOBS=4 uv pip install -e .).

For NGC container setup and all installation options, see the [Installation Guide](https://docs.nvidia.com/megatron-core/developer-guide/latest/get-started/install.html).

  • [Your First Training Run](https://docs.nvidia.com/megatron-core/developer-guide/latest/get-started/quickstart.html) - End-to-end training examples with data preparation
  • [Parallelism Strategies](https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/parallelism-guide.html) - Scale training across GPUs with TP, PP, DP, EP, and CP
  • [Contribution Guide](https://docs.nvidia.com/megatron-core/developer-guide/latest/developer/contribute.html) - How to contribute to Megatron Core

Latest News

  • [2026/03] Dropping Python 3.10 support: Python 3.10 support ends with the upcoming 0.17.0 release. Downstream applications must raise their minimum Python version to 3.12 to stay compatible with MCore.
  • [2026/01] [Dynamic Context Parallelism](https://developer.nvidia.com/blog/speeding-up-variable-length-training-with-dynamic-context-parallelism-and-nvidia-megatron-core/) - Up to 1.48x speedup for variable-length sequence training with adaptive CP sizing.
  • [2025/12] Megatron Core development has moved to GitHub! All development and CI now happens in the open. We welcome community contributions.
  • [2025/10] [Megatron Dev Branch](https://github.com/NVIDIA/Megatron-LM/tree/dev) - early access branch with experimental features.
  • [2025/10] [Megatron Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) - Bidirectional converter for interoperability between Hugging Face and Megatron checkpoints, featuring production-ready recipes for popular models.
  • [2025/08] [MoE Q3-Q4 2025 Roadmap](https://github.com/NVIDIA/Megatron-LM/issues/1729) - Comprehensive roadmap for MoE features including DeepSeek-V3, Qwen3, advanced parallelism strategies, FP8 optimizations, and Blackwell performance enhancements.
  • [2025/08] [GPT-OSS Model](https://github.com/NVIDIA/Megatron-LM/issues/1739) - Advanced features including YaRN RoPE scaling, attention sinks, and custom activation functions are being integrated into Megatron Core.
  • [2025/06] [Megatron MoE Model Zoo](https://github.com/yanring/Megatron-MoE-ModelZoo) - Best practices and optimized configurations for training DeepSeek-V3, Mixtral, and Qwen3 MoE models with performance benchmarking and checkpoint conversion tools.
  • [2025/05] Megatron Core v0.11.0 brings new capabilities for multi-data center LLM training (blog).
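For downstream projects affected by the Python 3.10 change noted in the list above, a startup guard along the following lines can fail fast on an unsupported interpreter. The 3.12 floor is taken from that note; the function name and error message are hypothetical.

```python
import sys

# Minimum interpreter version, taken from the 2026/03 news item above.
MIN_PYTHON = (3, 12)


def meets_minimum(version_info=None, minimum=MIN_PYTHON) -> bool:
    """Return True if the (major, minor) version is at least `minimum`.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not meets_minimum():
        found = ".".join(str(part) for part in sys.version_info[:3])
        required = ".".join(str(part) for part in MIN_PYTHON)
        raise SystemExit(f"Python {required}+ is required, found {found}")
```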

Previous News

  • [2024/07] Megatron Core v0.7 improves scalability and training resiliency and adds support for multimodal training (blog).
  • [2024/06] Megatron Core added support for Mamba-based models. See the paper "An Empirical Study of Mamba-based Language Models" and the accompanying code example.
  • [2024/01 Announcement] NVIDIA has released the core capabilities in Megatron-LM into **Megatron Core** in this repository. Megatron Core expands upon Megatron-LM's GPU-optimized techniques with more cutting-edge innovations on system-level optimizations, featuring composable and modular APIs.

Project Structure

Megatron-LM/
├── megatron/
│   ├── core/                    # Megatron Core (kernels, parallelism, building blocks)
│   │   ├── models/              # Transformer models
│   │   ├── transformer/         # Transformer building blocks
│   │   ├── tensor_parallel/     # Tensor parallelism
│   │   ├── pipeline_parallel/   # Pipeline parallelism
│   │   ├── distributed/         # Distributed training (FSDP, DDP)
│   │   ├── optimizer/           # Optimizers
│   │   ├── datasets/            # Dataset loaders
│   │   ├── inference/           # Inference engines and server
│   │   └── export/              # Model export (e.g. TensorRT-LLM)
│   ├── training/                # Training scripts
│   ├── legacy/                  # Legacy components
│   ├── post_training/           # Post-training (quantization, distillation, pruning, etc.)
│   └── rl/                      # Reinforcement learning (RLHF, etc.)
├── examples/                    # Ready-to-use training examples
├── tools/                       # Utility tools
├── tests/                       # Comprehensive test suite
└── docs/                        # Documentation

Performance Benchmarking

For the latest performance benchmarking results, see the NVIDIA Megatron Bridge Performance Summary.

Our codebase efficiently trains models from 2B to 462B parameters across thousands of GPUs, achieving up to 47% Model FLOP Utilization (MFU)
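To make the MFU figure concrete: MFU is commonly defined as the ratio of the model FLOPs a training run actually sustains to the hardware's theoretical peak. The sketch below uses the widely cited approximation of about 6 FLOPs per parameter per token for the forward and backward pass of a dense transformer. That approximation, the function name, and all example numbers are assumptions for illustration, not the method behind the benchmark figures quoted above.

```python
def model_flops_utilization(
    n_params: float,
    tokens_per_second: float,
    n_gpus: int,
    peak_flops_per_gpu: float,
) -> float:
    """Estimate MFU as achieved model FLOP/s divided by aggregate peak FLOP/s.

    Hypothetical helper; uses the ~6 * N FLOPs-per-token approximation for
    dense transformers, which ignores attention FLOPs and MoE sparsity.
    """
    if n_gpus < 1 or peak_flops_per_gpu <= 0:
        raise ValueError("n_gpus and peak_flops_per_gpu must be positive")

    achieved_flops_per_second = 6.0 * n_params * tokens_per_second
    peak_flops_per_second = n_gpus * peak_flops_per_gpu
    return achieved_flops_per_second / peak_flops_per_second


if __name__ == "__main__":
    # Assumed numbers: 1B parameters, 100k tokens/s on one GPU with a
    # 1 PFLOP/s peak. These are not measurements from this repository.
    mfu = model_flops_utilization(1e9, 1e5, n_gpus=1, peak_flops_per_gpu=1e15)
    print(f"MFU: {mfu:.1%}")
```

Because the estimate depends on the peak throughput chosen for the denominator (which varies with precision such as BF16 or FP8), MFU numbers are only comparable when that choice is stated.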
