Fork · Sarvam AI · published Dec 28, 2025 · seen 5d

sarvamai/Megatron-Bridge

forked from NVIDIA-NeMo/Megatron-Bridge

Description: HuggingFace conversion and training library for Megatron-based models

Language: Python

License: Apache-2.0

Stars: 0

Forks: 0

Open issues: 0

Created: 2025-12-28T11:09:08Z

Pushed: 2026-03-27T12:05:06Z

Default branch: main

Fork: yes

Parent repository: NVIDIA-NeMo/Megatron-Bridge

Archived: no

README:

📣 News

  • [03/12/2026] Deprecating Python 3.10 support: We're officially dropping Python 3.10 support with the upcoming 0.4.0 release. Downstream applications must raise their minimum supported Python version to 3.12 to stay compatible with Megatron-Bridge.
  • [12/16/2025] Mind Lab successfully used Megatron-Bridge and VeRL to train GRPO LoRA for a trillion-parameter model on 64 H800 GPUs. See their tech blog.
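
For downstream applications affected by the Python deprecation note above, an early interpreter check can make the new requirement explicit. The following is a minimal sketch, not part of Megatron Bridge: the helper names `python_is_supported` and `require_supported_python` are hypothetical, and the `(3, 12)` minimum is taken from the news item.

```python
import sys

# Assumed minimum, taken from the deprecation note (lower bound raised to 3.12).
MIN_PYTHON = (3, 12)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


def require_supported_python():
    """Raise early with a clear message instead of failing deep inside an import."""
    if not python_is_supported():
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in MIN_PYTHON)
        raise RuntimeError(f"Python {wanted}+ is required, found {found}")
```

Calling `require_supported_python()` at the top of an entry-point script would surface an unsupported interpreter immediately; declaring the same bound in the package metadata is the usual complement.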

Overview

NeMo Megatron Bridge is a PyTorch-native library within the NeMo Framework that provides pretraining, SFT and LoRA for popular LLM and VLM models. It serves as a powerful bridge, conversion, and verification layer between 🤗 Hugging Face and Megatron Core. It provides bidirectional checkpoint conversion between these formats, enabling other projects to leverage Megatron Core's parallelism capabilities or export models for various inference engines. The bridge includes built-in verification mechanisms to ensure conversion accuracy and checkpoint integrity across different model formats.

On top of the bridge, NeMo Megatron Bridge provides a performant and scalable PyTorch-native training loop that leverages Megatron Core to deliver state-of-the-art training throughput. It supports pretraining and fine-tuning with features like tensor and pipeline parallelism, and mixed precision (FP8, BF16, FP4, etc.). Users can either use existing 🤗 Hugging Face models or define custom PyTorch model definitions for flexible end-to-end workflows.

NeMo Megatron Bridge is a refactor of the previous NeMo training stack that adopts a PyTorch-native training loop to provide greater flexibility and customizability for developers.

![image](Repo-Mbridge.png)

🔧 Installation

🐳 NeMo Framework container

The best experience, highest performance, and full feature support are provided by the NeMo Framework container. Fetch the most recent $TAG and run the following to start a container:

docker run --rm -it -w /workdir -v $(pwd):/workdir \
--entrypoint bash \
--gpus all \
nvcr.io/nvidia/nemo:${TAG}

For development installation and additional details, please refer to our Contribution guide.

⚡ Quickstart

To get started, install Megatron Bridge or download a NeMo Framework container as described [above](#-installation).

Log in to Hugging Face Hub:

huggingface-cli login --token <your token>

Conversion-only quickstart (✅ Core):

from megatron.bridge import AutoBridge

# 1) Create a bridge from a Hugging Face model (hub or local path)
bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3.2-1B", trust_remote_code=True)

# 2) Get a Megatron provider and configure parallelism before instantiation
provider = bridge.to_megatron_provider()
provider.tensor_model_parallel_size = 1
provider.pipeline_model_parallel_size = 1
provider.finalize()
# 3) Materialize Megatron Core model(s)
model = provider.provide_distributed_model(wrap_with_ddp=False)

# 4a) Export Megatron → Hugging Face (full HF folder with config/tokenizer/weights)
bridge.save_hf_pretrained(model, "./hf_exports/llama32_1b")

# 4b) Or stream only weights (Megatron → HF)
for name, weight in bridge.export_hf_weights(model, cpu=True):
    print(name, tuple(weight.shape))
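
Since step 4b yields `(name, weight)` pairs one at a time, the stream can be reduced to a quick sanity summary without holding every tensor in memory. The sketch below is illustrative only: `summarize_weights` is a hypothetical helper, and it assumes nothing about the weights beyond a `.shape` attribute, so it runs here on stand-in objects rather than a real model.

```python
from math import prod
from types import SimpleNamespace


def summarize_weights(named_weights):
    """Count tensors and parameters from an iterable of (name, weight) pairs.

    Each weight only needs a `.shape` attribute, so torch tensors work as
    well as the stand-in objects used below.
    """
    tensor_count = 0
    param_count = 0
    for _name, weight in named_weights:
        tensor_count += 1
        param_count += prod(weight.shape)
    return tensor_count, param_count


# Stand-in data so the sketch runs without a GPU or a model download.
fake_stream = [
    ("embed.weight", SimpleNamespace(shape=(8, 4))),
    ("norm.weight", SimpleNamespace(shape=(4,))),
]
summary = summarize_weights(fake_stream)  # (number of tensors, number of parameters)
```

With the quickstart objects in scope, the same helper could presumably be fed `bridge.export_hf_weights(model, cpu=True)` directly, and the resulting parameter count compared against the expected model size.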

Training quickstart using pre-configured recipes:

from megatron.bridge.recipes.llama import llama32_1b_pretrain_config
from megatron.bridge.training.gpt_step import forward_step
from megatron.bridge.training.pretrain import pretrain

if __name__ == "__main__":
    # The recipe uses the Llama 3.2 1B model configuration from HuggingFace
    cfg = llama32_1b_pretrain_config(seq_length=1024)

    # Override training parameters
    cfg.train.train_iters = 10
    cfg.scheduler.lr_decay_iters = 10000
    cfg.model.vocab_size = 8192
    cfg.tokenizer.vocab_size = cfg.model.vocab_size

    pretrain(cfg, forward_step)
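
The recipe above is customized by assigning to nested attributes. When many overrides come from a command line or a sweep, one possible pattern is to apply them from a mapping of dotted names. This is a sketch under assumptions: `apply_overrides` is a hypothetical helper, not a Megatron Bridge API, and the stand-in config only mimics the attribute paths used in the script.

```python
from types import SimpleNamespace


def apply_overrides(cfg, overrides):
    """Set nested attributes on cfg from a {"a.b.c": value} mapping.

    Raises AttributeError for unknown names so typos are not silently accepted.
    """
    for dotted_name, value in overrides.items():
        *parents, leaf = dotted_name.split(".")
        target = cfg
        for part in parents:
            target = getattr(target, part)
        if not hasattr(target, leaf):
            raise AttributeError(f"unknown config field: {dotted_name}")
        setattr(target, leaf, value)
    return cfg


# Stand-in config; a real recipe config is assumed to expose the same paths.
demo_cfg = SimpleNamespace(
    train=SimpleNamespace(train_iters=1000),
    scheduler=SimpleNamespace(lr_decay_iters=1000),
)
apply_overrides(
    demo_cfg,
    {"train.train_iters": 10, "scheduler.lr_decay_iters": 10000},
)
```

Rejecting unknown field names is a deliberate choice here: a misspelled override would otherwise create a new attribute and leave the intended setting unchanged.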

You can launch the above script with:

torchrun --nproc-per-node=<num devices> /path/to/script.py
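
The process count passed to `torchrun` interacts with the tensor and pipeline parallel sizes set on the provider or recipe. In Megatron-style training, the processes left over after tensor and pipeline parallelism form the data-parallel dimension, so the total must divide evenly. The sketch below illustrates that arithmetic; `data_parallel_size` is a hypothetical helper, and it assumes context parallelism and other parallel dimensions are left at 1.

```python
def data_parallel_size(world_size, tensor_parallel=1, pipeline_parallel=1):
    """Return the data-parallel size implied by the launch configuration.

    Assumes world_size = tensor_parallel * pipeline_parallel * data_parallel,
    that is, no context parallelism or other dimensions beyond these three.
    """
    if min(world_size, tensor_parallel, pipeline_parallel) < 1:
        raise ValueError("all sizes must be positive integers")
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world size {world_size} is not divisible by "
            f"tensor x pipeline parallel size {model_parallel}"
        )
    return world_size // model_parallel
```

For example, eight processes with tensor parallel size 2 and pipeline parallel size 2 leave a data-parallel size of 2, while eight processes with tensor parallel size 3 cannot be laid out and raise an error.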

More examples:

For a deeper dive into conversion design and advanced usage, see the models README.

🚀 Key Features

  • Bridge with 🤗 Hugging Face: Seamless bidirectional conversion between 🤗 Hugging Face and Megatron formats for interoperability (model bridges, [auto…

Excerpt shown — open the source for the full document.