Tencent-Hunyuan/GradLoc
Python
Description: Implementation of GradLoc from the Tencent Hunyuan blog "Stabilizing RLVR via Token-level Gradient Diagnosis and Layerwise Clipping".
Language: Python
License: NOASSERTION
Stars: 99
Forks: 13
Open issues: 1
Created: 2026-02-10T06:12:48Z
Pushed: 2026-02-16T09:17:25Z
Default branch: main
Fork: no
Archived: no
README:
🔍 Overview
This repository implements the GradLoc part from our blog on RLVR training collapse diagnosis and stabilization.
The current release focuses on the GradLoc demo patch:
- GradLoc: localizes gradient spikes to exact culprit tokens with distributed binary search (O(log N)).
 *Figure 2. GradLoc localization path: global -> micro-batch -> rank -> token, with adaptive thresholds.*
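The token-level stage of the search can be illustrated with a small single-process sketch. This is not the code from the patch: `localize_spike`, `make_toy_probe`, and the "keep the half with the larger norm" rule are assumptions made for illustration, and the distributed micro-batch and rank stages and the adaptive thresholds are not modeled. The parameter names only mirror the `grad_norm_threshold` and `bisect_budget_steps` knobs.

```python
import math
from typing import Callable, List, Optional, Tuple


def localize_spike(
    num_tokens: int,
    grad_norm_of: Callable[[int, int], float],
    grad_norm_threshold: float = 640.0,
    bisect_budget_steps: int = 128,
) -> Optional[Tuple[int, int]]:
    """Return the narrowest [lo, hi) token range found to hold the spike.

    Returns None when the full range does not exceed the threshold.
    `grad_norm_of(lo, hi)` is a hypothetical probe: one forward/backward
    pass restricted to tokens lo..hi-1, returning the gradient norm.
    """
    lo, hi = 0, num_tokens
    probes = 1  # the initial full-range probe
    if grad_norm_of(lo, hi) <= grad_norm_threshold:
        return None
    # Each halving costs two probes; stop when the budget is exhausted.
    while hi - lo > 1 and probes + 2 <= bisect_budget_steps:
        mid = (lo + hi) // 2
        left = grad_norm_of(lo, mid)
        right = grad_norm_of(mid, hi)
        probes += 2
        if left >= right:
            hi = mid  # the spike is assumed to sit in the left half
        else:
            lo = mid  # the spike is assumed to sit in the right half
    return lo, hi


def make_toy_probe(token_grads: List[List[float]]) -> Callable[[int, int], float]:
    """Toy stand-in for a real probe: L2 norm of summed per-token gradients."""
    def grad_norm_of(lo: int, hi: int) -> float:
        dims = len(token_grads[0])
        total = [sum(g[d] for g in token_grads[lo:hi]) for d in range(dims)]
        return math.sqrt(sum(x * x for x in total))
    return grad_norm_of


# 64 well-behaved tokens, with one injected outlier at index 37.
toy_grads = [[0.1, -0.1] for _ in range(64)]
toy_grads[37] = [500.0, 500.0]
culprit = localize_spike(len(toy_grads), make_toy_probe(toy_grads))
print(culprit)
```

With 64 tokens the loop halves the range six times, which is where the O(log N) probe count comes from. If the budget runs out first, the sketch returns the wider range it has reached rather than a single token.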
This repo is intentionally lightweight and patch-oriented, so you can directly apply changes to upstream verl and reproduce experiments. We plan to further package GradLoc as a cleaner, configurable feature with better veRL integration and upstream-merge readiness in future releases.
The following arguments in run_experiment.sh are the core runtime knobs for GradLoc. They control trigger sensitivity, search budget, and dump path.
actor_rollout_ref.actor.grad_norm_threshold=640.0 \ # Spike trigger threshold for token-level grad norm
actor_rollout_ref.actor.bisect_budget_steps=128 \ # Max binary-search budget (forward/backward probes)
actor_rollout_ref.actor.bisect_dump_dir="${CKPTS_DIR}/bisect_dump" \ # Output dir for localization artifacts
🧩 Base commit
- Upstream: verl
- Commit: f9c855f7cf04d603c9546bc01776c74806a879c1
📦 Files changed by this patch
- verl/trainer/ppo/ray_trainer.py
- verl/utils/reward_score/__init__.py
- verl/utils/reward_score/math_verify.py
- verl/workers/actor/dp_actor.py
⚡ Quick start (online patch)
1) Clone upstream verl and checkout the base commit:
git clone https://github.com/volcengine/verl.git
cd verl && git checkout f9c855f7cf04d603c9546bc01776c74806a879c1
2) Apply patch from URL:
python /path/to/GradLoc-Patch/apply_patch.py --repo /path/to/verl --patch-url --sha256-file
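The internals of apply_patch.py are not shown here. As a rough sketch of the kind of integrity check a `--sha256-file` option suggests, the following compares a downloaded patch against an expected digest. The helper names `sha256_of` and `matches_expected` are hypothetical, and so is the digest-file layout (a hex digest as the first whitespace-separated field, as produced by `sha256sum`).

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(patch_path: str, sha256_file: str) -> bool:
    """True if the patch digest equals the first field of the digest file.

    Assumed layout: "<hex digest>" or "<hex digest>  <filename>".
    """
    expected = Path(sha256_file).read_text().split()[0].lower()
    return sha256_of(patch_path) == expected
```

Checking the digest before applying means a truncated or tampered download fails early instead of producing a partially patched tree.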
💾 Local patch (offline)
If patches/gradloc.patch is already available locally:
python /path/to/GradLoc-Patch/apply_patch.py --repo /path/to/verl --patch-file /path/to/GradLoc-Patch/patches/gradloc.patch
🧪 Run experiment
bash /path/to/GradLoc-Patch/run_experiment.sh
🛠️ Regenerate patch after development
When code is modified on top of the base commit, regenerate the patch with:
bash /path/to/GradLoc-Patch/make_patch.sh --repo /path/to/verl
This rewrites patches/gradloc.patch from: git diff
📬 Contact Us
- Guanhua Huang: carlan0974@gmail.com
- Tingqiang Xu: xtq23@mails.tsinghua.edu.cn
- Jinbo Wang: wangjinbo@stu.pku.edu.cn (wangjinbo@ustc.edu for long-term contact)
📚 Citation
If you find this project useful, please cite:
@misc{huang-xu-wang-2026-gradloc,
title = {Stabilizing RLVR via Token-level Gradient Diagnosis and Layerwise Clipping},
author = {Huang, Guanhua and Xu, Tingqiang and Wang, Jinbo and Sheng, Guangming and Li, Siheng and Yang, Evander and Li, Kejiao and Li, Yunxiang and Xu, Zenan and Yi, Qi and Gong, Xue and Nan, Ziyuan and Jiang, Yuhao and Zhang, Chenchen and Wu, Taiqiang and Zhang, Feiyuan and Wang, Junhao and Zhou, Bo and Chen, Alex and Wang, Di and Yao, Shunyu},
year = {2026},
url = {https://hy.tencent.com/research/100015}
}
❓ TBD