What does this repo signal mean?

Zhipu AI (GLM) published zai-org/RPC-Bench (Python). Repository signals like this one can surface tooling, eval, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo zai-org/RPC-Bench · language Python · fine-grained benchmark for research paper comprehension (ACL 2026). onlylabs links this event to 1 captured evidence page and 6 related repo signals. It also maps to Evals and quality in the data-business radar.

Zhipu AI (GLM) Repo: zai-org/RPC-Bench

Captured source

GitHub: github.com/zai-org/RPC-Bench

zai-org/RPC-Bench repository metadata

published Apr 28, 2026 · seen 3d · captured 3d · http 200 · method plain

zai-org/RPC-Bench

Description: Official Code for RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension (ACL 2026)

Language: Python

License: Apache-2.0

Stars: 0

Forks: 0

Open issues: 0

Created: 2026-04-28T12:32:36Z

Pushed: 2026-05-18T10:25:51Z

Default branch: main

Fork: no

Archived: no

README:

🌐 Project Page • 📖 Paper • 🤗 Hugging Face • 🧭 ModelScope

*Official code and data of the paper RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension (ACL 2026).*

RPC-Bench is a large-scale, fine-grained question answering benchmark constructed from review-rebuttal exchanges of high-quality academic papers. Each paper is available in two input formats (pure text and rendered page images), enabling evaluation of both large language models (LLMs) and visual language models (VLMs).

🚀 Quick Start

Dependencies

First, create a conda environment and install all pip package requirements.

conda create -n rpc python==3.11.13
conda activate rpc

pip install -r requirements.txt

QA Construction

The [pipeline/](pipeline/) directory provides an example workflow for constructing benchmark QA annotations from crawled OpenReview review-rebuttal data through LLM-based decomposition, rewriting, and filtering. See [pipeline/README.md](pipeline/README.md) for details.

Data Processing

For this benchmark, each academic paper can be processed into either structured text or page-rendered images, enabling evaluation across both LLMs and VLMs. Choose the parsing mode that best fits your experimental objectives.

File Download: Download paper PDFs based on metadata from JSON files located under the benchmark/ directory.

python download.py

Text Parsing: Parse PDF content into text using MinerU.

pip install --upgrade pip
pip install uv
uv pip install -U "mineru[core]"

mineru-models-download
mineru -p "./benchmark/pdf/test" -o "./benchmark/parse/test" --source local

Image Parsing: Convert PDF pages into image format for further processing.

python pdf2image.py

Processed Data Download

You may also download our processed data directly from Google Drive, Hugging Face, or ModelScope. The processed data includes:

pdf/: original paper PDFs.
md/: Markdown files parsed from each paper by MinerU, used as text input for LLM-oriented evaluation.
parse/: full MinerU parsing outputs, including structured layout and content artifacts.
vlm/: page images rendered from PDFs with PyMuPDF at 200 DPI, used for VLM-oriented evaluation.
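
To give a sense of what rendering at 200 DPI implies for the vlm/ images, here is a minimal sketch. It is not the repository's pdf2image.py; the function names and the output file naming are assumptions made for illustration. PDF page sizes are expressed in points (72 per inch), so the pixel size scales by dpi / 72. The helper gives an approximate size, and the renderer's own rounding may differ by a pixel.

```python
from pathlib import Path

POINTS_PER_INCH = 72  # PDF page dimensions are expressed in points


def pixel_size(width_pt: float, height_pt: float, dpi: int = 200) -> tuple[int, int]:
    """Approximate pixel dimensions of a page rendered at `dpi`."""
    return (
        round(width_pt * dpi / POINTS_PER_INCH),
        round(height_pt * dpi / POINTS_PER_INCH),
    )


def render_pdf(pdf_path: str, out_dir: str, dpi: int = 200) -> list[Path]:
    """Render every page of a PDF to PNG. Requires PyMuPDF (pip install pymupdf)."""
    import fitz  # PyMuPDF; imported lazily so pixel_size works without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with fitz.open(pdf_path) as doc:
        for index, page in enumerate(doc):
            pix = page.get_pixmap(dpi=dpi)  # rasterize the page at the requested DPI
            target = out / f"page_{index + 1:03d}.png"  # hypothetical naming scheme
            pix.save(str(target))
            written.append(target)
    return written


if __name__ == "__main__":
    # A US Letter page is 612 x 792 points.
    print(pixel_size(612, 792))
```

At this resolution a single Letter-sized page is roughly 1700 by 2200 pixels, which is worth keeping in mind when budgeting image tokens for long papers.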

The examples below show how to download only md/ and vlm/, which are sufficient for running the default LLM and VLM inference scripts.

Option 1: Download from Hugging Face

pip install -U huggingface_hub
hf download zai-org/RPC-Bench md/test/ vlm/test/ --repo-type dataset --local-dir ./benchmark
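
For scripted setups, the same subset can probably be fetched from Python. The sketch below assumes the `snapshot_download` function from huggingface_hub with its `allow_patterns` argument. The patterns and the example file paths are assumptions, since the dataset's exact file layout is not shown here. The small `select_paths` helper only illustrates how fnmatch-style patterns pick files, where `*` also spans `/`.

```python
from fnmatch import fnmatchcase

# Assumed patterns covering the md/ and vlm/ test splits.
ALLOW_PATTERNS = ["md/test/*", "vlm/test/*"]


def select_paths(paths, patterns=ALLOW_PATTERNS):
    """Return the paths that match at least one fnmatch-style pattern."""
    return [p for p in paths if any(fnmatchcase(p, pat) for pat in patterns)]


def download_subset(local_dir: str = "./benchmark") -> str:
    """Download only the matching files of the dataset repo (needs network access)."""
    from huggingface_hub import snapshot_download  # pip install -U huggingface_hub

    return snapshot_download(
        repo_id="zai-org/RPC-Bench",
        repo_type="dataset",
        allow_patterns=ALLOW_PATTERNS,
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Hypothetical file names, used only to show which paths the patterns keep.
    example = ["md/test/a.md", "pdf/test/a.pdf", "vlm/test/a/page_1.png"]
    print(select_paths(example))
```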

Option 2: Download from ModelScope

pip install -U modelscope
modelscope download --dataset ZhipuAI/RPC-Bench --include "md/test/**" "vlm/test/**" --local_dir ./benchmark

🧩 Consistency Evaluation

The [consistency/](consistency/) directory provides a self-contained example for measuring consistency between LLM judge outputs and human pairwise preferences. See [consistency/README.md](consistency/README.md) for details.
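
As a rough illustration of what consistency between a judge and human pairwise preferences can mean, the sketch below computes a plain agreement rate. The label scheme ("A", "B", "tie") and the function name are assumptions, and the metric actually implemented in consistency/ may differ; consistency/README.md is the authority.

```python
def agreement_rate(judge_prefs, human_prefs):
    """Fraction of pairwise comparisons where the judge picks the same side as the human.

    Both arguments are equal-length sequences of labels such as "A", "B", or "tie"
    (an assumed label scheme).
    """
    if len(judge_prefs) != len(human_prefs):
        raise ValueError("judge and human preference lists must have the same length")
    if not judge_prefs:
        raise ValueError("at least one comparison is required")
    matches = sum(j == h for j, h in zip(judge_prefs, human_prefs))
    return matches / len(judge_prefs)


if __name__ == "__main__":
    judge = ["A", "B", "A", "tie"]
    human = ["A", "A", "A", "tie"]
    print(agreement_rate(judge, human))  # 3 of 4 comparisons agree
```

Raw agreement does not correct for chance; a chance-corrected statistic such as Cohen's kappa is a common alternative when label distributions are skewed.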

✈️ Inference

GPT-5 is given as an example below, but you may replace this with any other LLM or VLM supported in your environment.

LLM Inference:

python llm.py

VLM Inference:

python vlm.py

🛜 Evaluation

After inference, evaluate predictions against benchmark references using:

python eval.py

Notability

notability 5.0/10

New benchmark repo from Zhipu AI, traction unclear.