zai-org/RPC-Bench
Python
Captured source
source ↗zai-org/RPC-Bench
Description: Official Code for RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension (ACL 2026)
Language: Python
License: Apache-2.0
Stars: 0
Forks: 0
Open issues: 0
Created: 2026-04-28T12:32:36Z
Pushed: 2026-05-18T10:25:51Z
Default branch: main
Fork: no
Archived: no
README:
🌐 Project Page • 📖 Paper • 🤗 Hugging Face • 🧭 ModelScope
*Official code and data of the paper RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension (ACL 2026).*
RPC-Bench, a large-scale fine-grained question answering benchmark constructed from review-rebuttal exchanges of high-quality academic papers, with each paper available in two input formats (pure text and rendered page images) enabling evaluation of both large language models (LLMs) and visual language models (VLMs).
🚀 Quick Start
Dependencies
First, create a conda environment and install all pip package requirements.
conda create -n rpc python==3.11.13 conda activate rpc pip install -r requirements.txt
QA Construction
The [pipeline/](pipeline/) directory provides an example workflow for constructing benchmark QA annotations from crawled OpenReview review-rebuttal data through LLM-based decomposition, rewriting, and filtering. See [pipeline/README.md](pipeline/README.md) for details.
Data processing
For this benchmark, each academic paper can be processed into either structured text or page-rendered images, enabling evaluation across both LLMs and VLMs. Choose the parsing mode that best fits your experimental objectives.
- File Download: Download paper PDFs based on metadata from JSON files located under the
benchmark/directory.
python download.py
- Text Parsing: Parse PDF content into text using MinerU.
pip install --upgrade pip pip install uv uv pip install -U "mineru[core]" mineru-models-download mineru -p "./benchmark/pdf/test" -o "./benchmark/parse/test" --source local
- Image Parsing: Convert PDF pages into image format for further processing.
python pdf2image.py
Processed Data Download
You may also download our processed data directly from Google Drive, Hugging Face, or ModelScope. The processed data includes:
pdf/: original paper PDFs.md/: Markdown files parsed from each paper by MinerU, used as text input for LLM-oriented evaluation.parse/: full MinerU parsing outputs, including structured layout and content artifacts.vlm/: page images rendered from PDFs with PyMuPDF at 200 DPI, used for VLM-oriented evaluation.
The examples below show how to download only md/ and vlm/, which are sufficient for running the default LLM and VLM inference scripts.
Option 1: Download from Hugging Face
pip install -U huggingface_hub hf download zai-org/RPC-Bench md/test/ vlm/test/ --repo-type dataset --local-dir ./benchmark
Option 2: Download from ModelScope
pip install -U modelscope modelscope download --dataset ZhipuAI/RPC-Bench --include "md/test/**" "vlm/test/**" --local_dir ./benchmark
🧩 Consistency Evaluation
The [consistency/](consistency/) directory provides a self-contained example for measuring consistency between LLM judge outputs and human pairwise preferences. See [consistency/README.md](consistency/README.md) for details.
✈️ Inference
GPT-5 is given as an example below, but you may replace this with any other LLM or VLM supported in your environment.
- LLM Inference:
python llm.py
- VLM Inference:
python vlm.py
🛜 Evaluation
After inference, evaluate predictions against benchmark references using:
python eval.py
Notability
notability 5.0/10New benchmark repo from Zhipu AI, traction unclear.