Release by Arcee AI, published Oct 15, 2025

arcee-ai/NeMo-RL v1.1.0-rc0

Repository: arcee-ai/NeMo-RL

Tag: v1.1.0-rc0

Published: 2025-10-15T03:23:51Z

Prerelease: yes

Release notes: Since v1.0.1:

Breaking Changes:

  • Megatron removed - no need for the megatron_cfg block anymore.
  • Legacy environments removed
  • Legacy eval harness removed
  • Dataset options other than dataset.shuffle have been removed.
  • "Max rollout turns" config option has been removed - implement this in your verifiers environments.
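
One way to bring an older config in line with the removals above is to strip the dropped options before loading it. The sketch below is only an illustration: it assumes the config has already been parsed into a nested dict, that megatron_cfg may appear at any nesting level, and that dataset is a top-level block. The helper names and the sample keys (model_name, seed) are hypothetical and are not taken from the project.

```python
def drop_key_everywhere(config, removed_key):
    """Return a copy of a nested dict with every occurrence of removed_key dropped."""
    cleaned = {}
    for key, value in config.items():
        if key == removed_key:
            continue
        if isinstance(value, dict):
            value = drop_key_everywhere(value, removed_key)
        cleaned[key] = value
    return cleaned


def strip_removed_options(config):
    """Drop megatron_cfg blocks and any dataset option other than shuffle."""
    cleaned = drop_key_everywhere(config, "megatron_cfg")
    dataset = cleaned.get("dataset")
    if isinstance(dataset, dict):
        # Only dataset.shuffle survives in this release.
        cleaned["dataset"] = {k: v for k, v in dataset.items() if k == "shuffle"}
    return cleaned


# Hypothetical pre-1.1.0 config; the key layout is an assumption.
old_config = {
    "policy": {"model_name": "some/model", "megatron_cfg": {"enabled": False}},
    "dataset": {"shuffle": True, "seed": 42},
}
new_config = strip_removed_options(old_config)
print(new_config)
```

The removed "max rollout turns" option is left out of this sketch on purpose, since the release notes do not give its config key.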

Changelog:

  • Added grpo.interleave_rolluts. Set it to true to generate the next step's rollouts while training on the current step's data. This makes training one step off-policy, so consider enabling importance sampling to compensate.
  • Added checkpointing.hf_checkpoint. Set it to true to checkpoint directly to HF (slower than DCP).
  • Added new training path: examples/run_sft.py. See examples/configs/sft/afm_pocket_sft.yaml for full configuration.
  • Added support for Muon via dion. To use it, specify dion.MuonReference as your optimizer, and specify policy.optimizer.scalar_optim as adamw for non-applicable parameters.
  • Renamed the project to "RLKit".
  • Removed DPO training path.
  • Legacy evaluation and rollout-generation system removed.
  • Fixed a bug where train/approx_entropy would include entropy from masked-off tokens with no generation logprobs, causing NaNs to appear.
  • Fixed crash affecting sequence packing when responses are truncated.
  • Trust vLLM's tokenization over HuggingFace's, avoiding some off-policy training.
  • HF checkpointing is now required on systems where DCP checkpointing would fail due to PCIe communication issues.
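
The interleaved-rollout entry in the changelog above can be pictured with a small simulation. This is a sketch under assumptions, not the project's implementation: it assumes the policy version equals the number of completed updates, and it uses the standard per-token importance ratio exp(train_logprob - gen_logprob) as the correction. The release notes do not say how the importance-sampling option is computed, and both function names are hypothetical.

```python
import math


def rollout_versions(num_steps):
    """Policy version that generated the rollouts used at each training step.

    Step 0 trains on fresh rollouts. Every later step trains on rollouts that
    were generated while the previous step was still training, so they lag
    the current weights by one update.
    """
    return [max(0, step - 1) for step in range(num_steps)]


def importance_ratios(train_logprobs, gen_logprobs):
    """Per-token ratio pi_train / pi_gen, computed from log-probabilities."""
    return [math.exp(t - g) for t, g in zip(train_logprobs, gen_logprobs)]


versions = rollout_versions(4)
lags = [step - version for step, version in enumerate(versions)]
ratios = importance_ratios([-1.0, -2.0], [-1.0, -2.5])
print(versions, lags, ratios)
```

A ratio of 1.0 means the training policy and the generating policy agree on that token. Ratios away from 1.0 show how far the one-step lag has moved the policy.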
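
The train/approx_entropy fix in the changelog above concerns NaNs leaking in from masked-off tokens. One plausible mechanism, shown here purely as an illustration and not as the project's actual code, is that multiplying by a 0/1 mask does not neutralize a NaN, because NaN * 0.0 is still NaN. The values and function names below are hypothetical.

```python
import math


def masked_mean_naive(values, mask):
    # Multiply-by-mask: a NaN at a masked-off position still poisons the sum.
    total = sum(v * m for v, m in zip(values, mask))
    return total / sum(mask)


def masked_mean_safe(values, mask):
    # Select only unmasked entries so masked-off NaNs never enter the sum.
    kept = [v for v, m in zip(values, mask) if m]
    return sum(kept) / len(kept) if kept else 0.0


# Hypothetical per-token entropy estimates. The last token is masked off and
# has no generation logprob, represented here as NaN.
entropy = [0.5, 1.5, math.nan]
mask = [1.0, 1.0, 0.0]

naive = masked_mean_naive(entropy, mask)
safe = masked_mean_safe(entropy, mask)
print(math.isnan(naive), safe)
```

Selecting the unmasked entries before reducing, rather than multiplying by the mask, keeps the metric finite whenever at least one token is unmasked.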
