Repo · Together AI · published Dec 4, 2025

togethercomputer/reviewing-agents

Python

Language: Python

License: MIT

Stars: 20

Forks: 2

Open issues: 0

Created: 2025-12-04T22:41:23Z

Pushed: 2025-12-08T02:01:41Z

Default branch: main

Fork: no

Archived: no

README: Reviewing Agents

LLM-powered scientific paper review and error detection

This repository supports:

  • **To Err Is Human**: Systematic Quantification of Errors in Published AI Papers via LLM Analysis

We developed an LLM-based Paper Correctness Checker that identifies objective mistakes (in formulas, derivations, figures, and tables) in papers published at top AI venues. Our analysis shows that the average number of mistakes per paper has increased over time, from 3.8 in NeurIPS 2021 to 5.9 in NeurIPS 2025 (+55%). Human experts reviewed 316 of the flagged mistakes and confirmed a precision of 83.2%. The checker can also propose correct fixes for 75.8% of the identified issues.
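The +55% figure follows directly from the two per-paper averages quoted above. A quick check, using only the numbers stated in this README:

```python
# Per-paper mistake averages quoted in the README.
mistakes_2021 = 3.8  # NeurIPS 2021
mistakes_2025 = 5.9  # NeurIPS 2025

# Relative increase = (new - old) / old.
relative_increase = (mistakes_2025 - mistakes_2021) / mistakes_2021
print(f"{relative_increase:+.0%}")  # prints "+55%"
```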

Agents4Science 2025 was the inaugural conference where AI systems served as both authors and reviewers of research papers. Organized by Together AI and Stanford University, it received 315 submissions, of which 48 papers were accepted after combined AI and human peer review.

---

Installation

uv sync

---

Configuration

Set up API keys in .env (copy from env.dev):

cp env.dev .env

Configure modules in config.yaml.
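Before running any module, it can help to confirm that the keys you expect are actually present in `.env`. The following is a minimal standard-library sketch, not part of this repository: `parse_env_text`, `missing_keys`, and the key name `EXAMPLE_API_KEY` are hypothetical, and the real key names are whatever `env.dev` lists.

```python
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(env_path: str, required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty in the .env file."""
    path = Path(env_path)
    values = parse_env_text(path.read_text()) if path.exists() else {}
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    # EXAMPLE_API_KEY is a placeholder; use the names listed in env.dev.
    print(missing_keys(".env", ["EXAMPLE_API_KEY"]))
```

This only handles plain `KEY=VALUE` lines; it does not attempt variable expansion or multi-line values.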

---

Modules

| Module | Description |
|--------|-------------|
| SimpleReviewer | General paper reviewing |
| LLMCorrectnessDetector | Methodological correctness evaluation |
| LLMCriticalityVerifier | Verifies criticality of correctness findings |
| LLMFormatDetector | Format compliance checking |
| JailbreakingChecker | Detects adversarial instructions in papers |
| ReferenceCheckLight | Reference hallucination detection |
| ReferenceCheckHeavy | Full reference + author verification |
| ArxivTaxonomyClassifier | arXiv category classification |

---

Quick Start

Agents4Science reviewers can be used to review papers:

from pathlib import Path

from reviewing_agents.modules import SimpleReviewer

# review_paper takes the PDF contents as bytes, not a file path.
pdf_bytes = Path("paper.pdf").read_bytes()

reviewer = SimpleReviewer()
result = reviewer.review_paper(pdf_bytes)
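To review several papers at once, the same call can be wrapped in a small loop. This is a sketch under assumptions rather than a feature of the repository: `load_pdfs` and `review_many` are hypothetical helpers, and the only thing taken from the Quick Start is that a review function accepts PDF bytes.

```python
from pathlib import Path
from typing import Any, Callable


def load_pdfs(pdf_dir: str) -> dict[str, bytes]:
    """Read every *.pdf file in pdf_dir into memory, keyed by file name."""
    return {p.name: p.read_bytes() for p in sorted(Path(pdf_dir).glob("*.pdf"))}


def review_many(
    papers: dict[str, bytes], review_fn: Callable[[bytes], Any]
) -> dict[str, Any]:
    """Apply review_fn to each paper; store the exception if a review fails."""
    results: dict[str, Any] = {}
    for name, pdf_bytes in papers.items():
        try:
            results[name] = review_fn(pdf_bytes)
        except Exception as exc:  # one failing paper should not stop the batch
            results[name] = exc
    return results
```

With the reviewer from the Quick Start, usage would look like `review_many(load_pdfs("papers"), SimpleReviewer().review_paper)`, where `papers` is a placeholder directory name.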

LLMCorrectnessDetector and LLMCriticalityVerifier, the modules used in our To Err Is Human paper, evaluate the correctness of a paper:

from pathlib import Path

from reviewing_agents.modules import LLMCorrectnessDetector, LLMCriticalityVerifier

pdf_bytes = Path("paper.pdf").read_bytes()

# Step 1: detect correctness issues in the paper.
detector = LLMCorrectnessDetector()
correctness = detector.check_correctness(pdf_bytes)

# Step 2: pass the detector's findings to the verifier to assess their criticality.
verifier = LLMCriticalityVerifier()
findings = {"score": correctness.score, "reasoning": correctness.reasoning, "key_issues": correctness.key_issues}
verified = verifier.verify_criticality(pdf_bytes, findings)
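The hand-off between the two modules is the `findings` dictionary, built from three attributes of the detector's result. A small helper can make that step reusable. In this sketch `findings_from` is a hypothetical name, and the stand-in object with its field values is invented purely for demonstration; the actual types of `score`, `reasoning`, and `key_issues` are not documented here.

```python
from types import SimpleNamespace
from typing import Any


def findings_from(correctness: Any) -> dict[str, Any]:
    """Collect the three fields the Quick Start passes to the verifier."""
    return {
        "score": correctness.score,
        "reasoning": correctness.reasoning,
        "key_issues": correctness.key_issues,
    }


# Stand-in object for demonstration only; real results come from
# LLMCorrectnessDetector.check_correctness, and these values are made up.
example = SimpleNamespace(
    score=0.4, reasoning="Eq. 3 drops a factor.", key_issues=["Eq. 3"]
)
print(findings_from(example))
```

With a real result, the call would be `verifier.verify_criticality(pdf_bytes, findings_from(correctness))`.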

---

Citation

@article{bianchi2025toerr,
title={To Err Is Human: Systematic Quantification of Errors in Published AI Papers via LLM Analysis},
author={Bianchi, Federico and Kwon, Yongchan and Izzo, Zachary and Zhang, Linjun and Zou, James},
journal={arXiv preprint arXiv:2512.05925},
year={2025}
}

@article{bianchi2025agents4science,
title={Exploring the use of AI authors and reviewers at Agents4Science},
author={Bianchi, Federico and Queen, Owen and Thakkar, Nitya and Sun, Eric and Zou, James},
journal={arXiv preprint arXiv:2511.15534},
year={2025}
}
