Fork · Tencent Hunyuan · published Jun 18, 2025

Tencent-Hunyuan/UnifiedReward

forked from CodeGoat24/UnifiedReward


Description: Official implementation of UnifiedReward & UnifiedReward-Think

License: MIT

Stars: 18

Forks: 16

Open issues: 19

Created: 2025-06-18T12:59:48Z

Pushed: 2025-06-18T11:39:27Z

Default branch: main

Fork: yes

Parent repository: CodeGoat24/UnifiedReward

Archived: no

README:

We release UnifiedReward, the first unified reward model for multimodal understanding and generation assessment. It supports both pairwise ranking and pointwise scoring, and can be used for preference alignment of vision models.
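One way pointwise scores can feed preference alignment is by turning them into chosen/rejected pairs. The sketch below is only an illustration under stated assumptions: `preference_pair` and `min_margin` are hypothetical names, the scores are made up, and this is not presented as the data-construction procedure used by the UnifiedReward authors.

```python
from typing import List, Optional, Tuple


def preference_pair(
    candidates: List[str],
    scores: List[float],
    min_margin: float = 0.5,  # hypothetical threshold, not from the README
) -> Optional[Tuple[str, str]]:
    """Return (chosen, rejected) from pointwise scores, or None if too close."""
    if len(candidates) != len(scores) or len(candidates) < 2:
        raise ValueError("need at least two candidates, one score each")
    # Sort candidate indices from lowest to highest score.
    order = sorted(range(len(scores)), key=scores.__getitem__)
    worst, best = order[0], order[-1]
    # Skip pairs whose score gap is too small to be a reliable preference.
    if scores[best] - scores[worst] < min_margin:
        return None
    return candidates[best], candidates[worst]


pair = preference_pair(["img_a", "img_b", "img_c"], [2.0, 4.5, 3.0])
```

Pairs produced this way could be passed to a preference-optimization method such as DPO; the margin filter is a design choice to drop near-ties.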

🔥🔥 We release UnifiedReward-qwen-[3b/7b/32b], more powerful unified reward models built upon Qwen2.5-VL-Instruct!!

🔥 We release vLLM inference code for UnifiedReward-qwen in the vllm_qwen directory!

🔥 We release SGLang inference code for UnifiedReward-llava in the sglang_llava directory!

😊 We appreciate the excellent work Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO, which provides further evidence of the robustness and effectiveness of UnifiedReward in image generation RL tasks.

| Method | HPS | ImageReward | UnifiedReward |
|------------|-----------|-----------|-----------|
| Janus-Pro + DPO | 77.3 | 77.7 | 80.0 |
| Janus-Pro + GRPO | 79.2 | 79.3 | 81.0 |
| Janus-Pro + Best-of-4 | 82.1 | 82.4 | 84.5 |
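The "Best-of-4" row refers to best-of-N selection: sample several candidates and keep the one the reward model scores highest. A minimal sketch of that idea follows, assuming hypothetical `generate` and `reward` callables; the toy stand-ins below are not the Janus-Pro or UnifiedReward APIs, and their scores are invented for illustration.

```python
from typing import Callable, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str, int], str],   # hypothetical: (prompt, seed) -> candidate
    reward: Callable[[str, str], float],   # hypothetical: (prompt, candidate) -> score
    n: int = 4,
) -> Tuple[str, float]:
    """Draw n candidates and return the highest-scoring one with its score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt, seed) for seed in range(n)]
    scored = [(reward(prompt, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# Toy stand-ins so the sketch runs without any model.
def toy_generate(prompt: str, seed: int) -> str:
    return f"{prompt}#sample{seed}"


TOY_SCORES = {0: 2.5, 1: 4.0, 2: 3.1, 3: 1.7}  # made-up scores


def toy_reward(prompt: str, candidate: str) -> float:
    return TOY_SCORES[int(candidate[-1])]


winner, score = best_of_n("a red cube", toy_generate, toy_reward, n=4)
```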

😊 We appreciate the Flow-GRPO team for using UnifiedReward-7B as their image generation quality evaluation metric!

😊 We appreciate the mradermacher team for providing the GGUF version of our models!!

😊 We sincerely thank the Hunyuan team of Tencent for providing the evaluation results on several T2I models using UnifiedReward-qwen-7b!! The evaluation was conducted on 400 prompts sourced from here.

| Model | Alignment | Coherence | Style |
|---------------------|------------------|-----------------------|------------------|
| Flux-pro-ultra | 3.6453 | 3.8193 | _3.4971_ |
| Imagen-4.0 | 3.6792 | 3.8049 | 3.4756 |
| Recraft-v3 | 3.6611 | 3.8409 | 3.5158 |
| OpenAI-GPT-image-1 | _3.6890_ | 3.8448 | 3.4960 |
| Imagen-3.0 | 3.6733 | 3.8027 | 3.4674 |
| Seedream-3.0 | 3.6927 | _3.8218_ | 3.4887 |
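To make the table easier to read, the sketch below ranks the listed models per dimension and computes the gap between the highest and lowest score. The numbers are copied from the table above; the helper name `rank_by` is only illustrative.

```python
# Scores copied from the table: (Alignment, Coherence, Style).
scores = {
    "Flux-pro-ultra": (3.6453, 3.8193, 3.4971),
    "Imagen-4.0": (3.6792, 3.8049, 3.4756),
    "Recraft-v3": (3.6611, 3.8409, 3.5158),
    "OpenAI-GPT-image-1": (3.6890, 3.8448, 3.4960),
    "Imagen-3.0": (3.6733, 3.8027, 3.4674),
    "Seedream-3.0": (3.6927, 3.8218, 3.4887),
}
DIMENSIONS = ("Alignment", "Coherence", "Style")


def rank_by(dimension: str) -> list:
    """Model names sorted from highest to lowest score on one dimension."""
    idx = DIMENSIONS.index(dimension)
    return sorted(scores, key=lambda model: scores[model][idx], reverse=True)


# Top model per dimension, and the max-minus-min spread per dimension.
top = {d: rank_by(d)[0] for d in DIMENSIONS}
spread = {
    d: max(s[i] for s in scores.values()) - min(s[i] for s in scores.values())
    for i, d in enumerate(DIMENSIONS)
}

for d in DIMENSIONS:
    print(f"{d}: top = {top[d]}, spread = {spread[d]:.4f}")
```

On these numbers the spread on each dimension is below 0.05 on the 1 to 5 scale, so the rankings rest on small differences.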

🔥🔥🔥 UnifiedReward-Think

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

We release UnifiedReward-Think -- the first unified multimodal CoT reward model, capable of multi-dimensional, step-by-step long-chain reasoning for both visual understanding and generation reward tasks.

Please refer to the project page for details.

🔥🔥 We release UnifiedReward-Think-qwen-7b, a more powerful unified multimodal CoT reward model built upon UnifiedReward-qwen-7b!!!!

🔥🔥 We released Gradio for UnifiedReward-Think!

🔥 News

😊 We are actively gathering feedback from the community to improve our models. We welcome your input and encourage you to stay updated through our repository!!

Please leave us a star ⭐ if you find our work helpful.

  • [2025/5] 🔥🔥 We released UnifiedReward-qwen-[3b/7b/32b], more powerful unified reward models built upon [Qwen2.5-VL-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)! The inference and evaluation code is provided in the ./inference_qwen and ./benchmark_evaluation directories, respectively.
  • [2025/5] 🔥🔥 We released UnifiedReward-Think-7b, the first unified multimodal CoT reward model. See project page for details.
  • [2025/4] 🔥🔥 We released UnifiedReward-0.5B. Feel free to use it based on your needs.
  • [2025/4] 🔥🔥 We updated UnifiedReward-7B, incorporating valuable feedback from the community, and released UnifiedReward-7B-v1.5, which introduces pointwise scoring for generated images across three dimensions (alignment, coherence, and style), each rated on a continuous scale from 1 to 5.

    1. Alignment quantifies how well an image matches its prompt.
    2. Coherence assesses the logical consistency of the image and the absence of artifacts or visual glitches.
    3. Style reflects the visual appeal of the image, independent of the prompt.

    Try the latest version; the inference code is in inference_qwen/image_generation/qwen_point_score_ACS_image_generation.py and ./inference/point_score_ACS_image_generation.py.

  • [2025/3] 🔥🔥 We released all training datasets and model checkpoints.
  • [2025/3] 🔥🔥 We released all training, inference, and evaluation code.
  • [2025/3] 🔥 We launched the project page and paper.
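The alignment, coherence, and style scores introduced with UnifiedReward-7B-v1.5 arrive as model-generated text, so downstream code usually needs to parse and validate them. The sketch below assumes a hypothetical output format of the form `<Dimension> Score (1-5): <number>`; the actual format is defined by the prompts in the repository's inference scripts and may differ, so treat the regular expression as a placeholder.

```python
import re
from typing import Dict

# Assumed (hypothetical) line format: "Alignment Score (1-5): 3.6"
_SCORE_RE = re.compile(
    r"(Alignment|Coherence|Style)\s+Score\s*(?:\(1-5\))?\s*:\s*([0-9]+(?:\.[0-9]+)?)",
    re.IGNORECASE,
)


def parse_acs_scores(text: str) -> Dict[str, float]:
    """Extract the three pointwise scores and check they lie in [1, 5]."""
    found = {}
    for name, value in _SCORE_RE.findall(text):
        score = float(value)
        if not 1.0 <= score <= 5.0:
            raise ValueError(f"{name} score {score} is outside the 1 to 5 scale")
        found[name.lower()] = score
    missing = {"alignment", "coherence", "style"} - found.keys()
    if missing:
        raise ValueError(f"missing scores: {sorted(missing)}")
    return found


example_output = (
    "Alignment Score (1-5): 3.6\n"
    "Coherence Score (1-5): 4.1\n"
    "Style Score (1-5): 2.9"
)
parsed = parse_acs_scores(example_output)
```

Rejecting out-of-range or missing values early keeps malformed generations from silently skewing averages such as those reported in the evaluation table.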

🏁 Compared with Current Reward Models

| Reward Model | Method | Image Generation | Image Understanding | Video Generation | Video Understanding |
| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |
| PickScore | Point | √ | | | |
| HPS | Point | √ | | | |
|…
