Repo: Baseten, published Mar 30, 2026

basetenlabs/Megatron-Bridge

Language: Python

License: Apache-2.0

Stars: 0

Forks: 0

Open issues: 0

Created: 2026-03-30T19:09:05Z

Pushed: 2026-06-08T18:55:16Z

Default branch: main

Fork: no

Archived: no

README:

📣 News

  • [03/31/2026] Agent Skills for Megatron Bridge! We've added a `skills/` directory with structured guides that AI coding agents (Cursor, Claude Code, Codex, etc.) can use to help you add model support, set up dev environments, tune performance, and more. Try them out, and PRs to improve or add new skills are very welcome!
  • [03/26/2026] **Nemotron 3 Super** is now on main! Checkpoint conversion and SFT/LoRA recipes (120B-A12B) are available in the main branch. Read the blog post.
  • [03/12/2026] Deprecating Python 3.10 support: We're officially dropping Python 3.10 support with the upcoming 0.4.0 release. Downstream applications must raise their minimum supported Python version to 3.12 to stay compatible with Megatron-Bridge.
  • [12/16/2025] Mind Lab successfully used Megatron-Bridge and VeRL to train GRPO LoRA for a trillion-parameter model on 64 H800 GPUs. See their tech blog.
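
Downstream projects affected by the Python version change in the 03/12/2026 item can enforce the new floor with a start-up guard. The following is a minimal sketch, not part of Megatron Bridge: `MIN_PYTHON`, `check_python`, and `require_python` are hypothetical names, and the 3.12 floor is taken from the announcement.

```python
import sys

# Minimum version taken from the 0.4.0 deprecation notice (assumption: 3.12).
MIN_PYTHON = (3, 12)


def check_python(current=None, minimum=MIN_PYTHON):
    """Return True if `current` (major, minor) meets `minimum`."""
    if current is None:
        current = sys.version_info[:2]
    return tuple(current[:2]) >= tuple(minimum)


def require_python(current=None, minimum=MIN_PYTHON):
    """Raise early, with a readable message, on an unsupported interpreter."""
    if not check_python(current, minimum):
        found = ".".join(str(part) for part in (current or sys.version_info)[:2])
        wanted = ".".join(str(part) for part in minimum)
        raise RuntimeError(f"Python {wanted}+ is required, found {found}")
```

Calling `require_python()` at import time in a downstream package turns an obscure dependency failure into an explicit error.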

Overview

NeMo Megatron Bridge is a PyTorch-native library within the NeMo Framework that provides pretraining, SFT, and LoRA for popular LLMs and VLMs. It serves as a bridge, conversion, and verification layer between 🤗 Hugging Face and Megatron Core. It provides bidirectional checkpoint conversion between these formats, so other projects can use Megatron Core's parallelism capabilities or export models for various inference engines. The bridge includes built-in verification mechanisms to check conversion accuracy and checkpoint integrity across model formats.
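
To make the idea of bidirectional conversion with verification concrete, here is a toy sketch in plain Python. It is not Megatron Bridge's mechanism: the parameter names in `NAME_MAP` and all function names are hypothetical, weights are plain lists of floats, and a real bridge would also reshape, split, or fuse tensors.

```python
# Hypothetical parameter-name mapping (source format -> target format).
NAME_MAP = {
    "src.embed.weight": "dst.embedding.weight",
    "src.head.weight": "dst.output.weight",
}


def rename(state, mapping):
    """Return a copy of `state` with every key renamed through `mapping`."""
    return {mapping[name]: list(values) for name, values in state.items()}


def max_abs_diff(a, b):
    """Largest element-wise difference; raises if names or sizes disagree."""
    if a.keys() != b.keys():
        raise KeyError(f"parameter names differ: {sorted(a.keys() ^ b.keys())}")
    worst = 0.0
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"size mismatch for {name}")
        for x, y in zip(a[name], b[name]):
            worst = max(worst, abs(x - y))
    return worst


def round_trip_ok(state, mapping, tol=0.0):
    """Convert forward and back, then check the weights are unchanged."""
    inverse = {dst: src for src, dst in mapping.items()}
    restored = rename(rename(state, mapping), inverse)
    return max_abs_diff(state, restored) <= tol
```

A round trip that returns the original weights within tolerance is a cheap first check; comparing model outputs on the same input is a stronger one.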

On top of the bridge, NeMo Megatron Bridge provides a performant and scalable PyTorch-native training loop that leverages Megatron Core to deliver state-of-the-art training throughput. It supports pretraining and fine-tuning with features such as tensor and pipeline parallelism and mixed-precision training (FP8, BF16, FP4, etc.). Users can either use existing 🤗 Hugging Face models or write custom PyTorch model definitions for flexible end-to-end workflows.
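
As a rough illustration of how tensor and pipeline parallelism share a GPU budget, the sketch below computes the data-parallel size left over once the model-parallel sizes are chosen. It is a simplification that assumes only tensor, pipeline, and data parallelism are in play (context and expert parallelism are ignored), and `data_parallel_size` is a hypothetical helper, not a Megatron Bridge function.

```python
def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Number of model replicas when each replica spans TP x PP GPUs."""
    if min(world_size, tensor_parallel, pipeline_parallel) < 1:
        raise ValueError("all sizes must be positive integers")
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world size {world_size} is not divisible by TP x PP = {model_parallel}"
        )
    return world_size // model_parallel
```

For example, 64 GPUs with tensor parallel size 8 and pipeline parallel size 4 leave room for 2 data-parallel replicas.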

NeMo Megatron Bridge is a refactor of the previous NeMo training stack that adopts a PyTorch-native training loop to provide greater flexibility and customizability for developers.

![image](Repo-Mbridge.png)

🔧 Installation

🐳 NeMo Framework container

The best experience, highest performance, and full feature support are provided by the NeMo Framework container. Fetch the most recent $TAG and run the following to start a container:

docker run --rm -it -w /workdir -v $(pwd):/workdir \
--entrypoint bash \
--gpus all \
nvcr.io/nvidia/nemo:${TAG}

For development installation and additional details, please refer to our Contribution guide.

Megatron-Core Submodule (main & dev)

Megatron Bridge pins Megatron-Core as a git submodule at 3rdparty/Megatron-LM. The repository tracks two pinned commits — one from the upstream main branch (default) and one from dev — managed by scripts/switch_mcore.sh.

The submodule committed to the repo always points to the main commit. Use the dev commit when you need a Megatron-Core feature or fix that has not yet landed on main, or to validate forward-compatibility with upcoming MCore changes:

./scripts/switch_mcore.sh status # Show current commit
./scripts/switch_mcore.sh dev # Switch to dev; then run: uv sync
./scripts/switch_mcore.sh main # Switch back; then run: uv sync --locked

> Note: uv.lock is generated against the main commit. After switching to dev, use uv sync (without --locked). After switching back to main, use uv sync --locked.
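
The pairing between the submodule target and the matching `uv sync` invocation is easy to get wrong by hand, so it can help to encode it once. The wrapper below is a sketch under that assumption; `sync_command`, `switch_commands`, and `run_switch` are hypothetical helpers, and only the script path and the `uv` flags come from the commands above.

```python
import subprocess


def sync_command(target: str) -> list[str]:
    """Return the uv invocation that matches the chosen MCore commit."""
    if target == "main":
        return ["uv", "sync", "--locked"]  # uv.lock is generated against main
    if target == "dev":
        return ["uv", "sync"]  # the lockfile does not apply to the dev commit
    raise ValueError(f"unknown target {target!r}; expected 'main' or 'dev'")


def switch_commands(target: str) -> list[list[str]]:
    """Commands to run, in order, to switch the submodule and resync."""
    return [["./scripts/switch_mcore.sh", target], sync_command(target)]


def run_switch(target: str) -> None:
    """Run the commands from the repository root, stopping on the first failure."""
    for argv in switch_commands(target):
        subprocess.run(argv, check=True)
```

Keeping command construction separate from execution lets the mapping be checked without touching the repository.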

The dev branch follows Megatron-LM's upstream dev branch philosophy — features are experimental, follow a streamlined review process, and must graduate to stable within 6 months or be deprecated.

⚡ Quickstart

To get started, install Megatron Bridge or download a NeMo Framework container as described [above](#-installation).

Log in to Hugging Face Hub:

huggingface-cli login --token <your token>
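
In scripts and CI jobs it is often more convenient to log in from Python than from the CLI. The sketch below assumes the token is stored in the `HF_TOKEN` environment variable and that the `huggingface_hub` package is installed; `read_hf_token` and `login_to_hub` are hypothetical helper names, while `huggingface_hub.login` is the library's own function.

```python
import os


def read_hf_token(env=None) -> str:
    """Fetch the token from the environment instead of hard-coding it."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; create a token on the Hub first")
    return token


def login_to_hub(env=None) -> None:
    """Authenticate this process against the Hugging Face Hub."""
    # Imported lazily so the token helper works without the package installed.
    from huggingface_hub import login

    login(token=read_hf_token(env))
```

Reading the token from the environment keeps it out of shell history and source control.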

Conversion-only quickstart (✅ Core):

from megatron.bridge import AutoBridge

# 1) Create a bridge from a Hugging Face model (hub or local path)
bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3.2-1B", trust_remote_code=True)

# 2) Get a Megatron provider and configure parallelism before instantiation
provider = bridge.to_megatron_provider()
provider.tensor_model_parallel_size = 1
provider.pipeline_model_parallel_size = 1
provider.finalize()

# 3) Materialize…
