arcee-ai/NeMo-RL v1.1.0-rc0
Repository: arcee-ai/NeMo-RL
Tag: v1.1.0-rc0
Published: 2025-10-15T03:23:51Z
Prerelease: yes
Release notes: Since v1.0.1:
Breaking Changes:
- Megatron removed - no need for the `megatron_cfg` block anymore.
- Legacy environments removed.
- Legacy eval harness removed.
- Dataset options other than `dataset.shuffle` have been removed.
- "Max rollout turns" config option has been removed - implement this in your verifiers environments.
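Because the "max rollout turns" option is gone, any cap on turns now has to live in the environment itself. The sketch below shows one way such a cap could look. It is plain Python and does not use the verifiers API: `TurnCappedEnv`, `respond`, `is_done`, `step`, and `run_rollout` are all hypothetical names chosen for illustration, so adapt the idea to whatever hooks your environment class actually exposes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TurnCappedEnv:
    """Hypothetical environment that enforces its own turn limit.

    This is NOT the verifiers API; it only illustrates where the cap lives.
    """

    max_turns: int
    respond: Callable[[str], str]  # hypothetical: maps a model message to an env reply
    turns_taken: int = 0

    def is_done(self) -> bool:
        # The rollout ends once the environment has answered max_turns times.
        return self.turns_taken >= self.max_turns

    def step(self, model_message: str) -> Optional[str]:
        # Refuse to produce further replies after the cap is reached.
        if self.is_done():
            return None
        self.turns_taken += 1
        return self.respond(model_message)


def run_rollout(env: TurnCappedEnv, policy: Callable[[str], str], first_prompt: str) -> List[str]:
    """Alternate policy and environment messages until the environment says stop."""
    transcript = [first_prompt]
    message = first_prompt
    while not env.is_done():
        model_message = policy(message)
        reply = env.step(model_message)
        transcript.extend([model_message, reply])
        message = reply
    return transcript


if __name__ == "__main__":
    env = TurnCappedEnv(max_turns=2, respond=lambda m: "env:" + m)
    for line in run_rollout(env, lambda m: "model:" + m, "start"):
        print(line)
```

The point of the design is that the trainer no longer needs to know about turn limits: it keeps asking the environment whether the episode is finished, and the environment decides.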
Changelog:
- Added `grpo.interleave_rolluts`. Set it to `true` to run one step off-policy (consider enabling importance sampling to compensate) and generate the next step's rollouts while you train on the current step's data.
- Added `checkpointing.hf_checkpoint`. Set it to `true` to checkpoint directly to HF (slower than DCP).
- Added new training path: `examples/run_sft.py`. See `examples/configs/sft/afm_pocket_sft.yaml` for full configuration.
- Added support for Muon via `dion`. To use it, specify `dion.MuonReference` as your optimizer, and specify `policy.optimizer.scalar_optim` as `adamw` for non-applicable parameters.
- Renamed project to "RLKit".
- Removed DPO training path.
- Legacy evaluation and rollout-generation system removed.
- Fixed a bug where `train/approx_entropy` would include entropy from masked-off tokens with no generation logprobs, causing NaNs to appear.
- Fixed crash affecting sequence packing when responses are truncated.
- Trust vLLM's tokenization over HuggingFace's, avoiding some off-policy training.
- Required HF checkpointing on systems where DCP checkpointing would fail due to PCIe comms issues.
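
To illustrate the `train/approx_entropy` fix listed above, here is a minimal sketch of how a masked-off token without a logprob can poison an average. It assumes, purely for illustration, that the metric is the mean negative logprob over unmasked tokens and that missing logprobs are stored as NaN; the function names are hypothetical and this is not the project's actual implementation.

```python
import math
from typing import Sequence


def naive_approx_entropy(logprobs: Sequence[float], mask: Sequence[int]) -> float:
    """Multiplies by the mask before summing. NaN * 0 is still NaN,
    so a single masked-off NaN contaminates the whole metric."""
    total = sum(-lp * m for lp, m in zip(logprobs, mask))
    return total / sum(mask)


def masked_approx_entropy(logprobs: Sequence[float], mask: Sequence[int]) -> float:
    """Selects unmasked tokens first, so masked-off values are never touched."""
    kept = [-lp for lp, m in zip(logprobs, mask) if m]
    if not kept:
        return 0.0  # assumption: report 0 when no token is unmasked
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # The last token is masked off and has no generation logprob (stored as NaN here).
    logprobs = [-0.5, -1.5, math.nan]
    mask = [1, 1, 0]
    print(naive_approx_entropy(logprobs, mask))
    print(masked_approx_entropy(logprobs, mask))
```

The same pattern applies in tensor code: filter or substitute masked positions before the reduction instead of relying on multiplication by a zero mask.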