What does this repo signal mean?

Together AI published togethercomputer/reviewing-agents (Python). Repository signals like this can surface tooling, eval, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo togethercomputer/reviewing-agents · language Python · new repo with low stars. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Together AI Repo: togethercomputer/reviewing-agents

Captured source

GitHub: github.com/togethercomputer/reviewing-agents

togethercomputer/reviewing-agents repository metadata

published Dec 4, 2025 · seen Jun 5 · captured Jun 11 · HTTP 200 · method plain

togethercomputer/reviewing-agents

Language: Python

License: MIT

Stars: 20

Forks: 2

Open issues: 0

Created: 2025-12-04T22:41:23Z

Pushed: 2025-12-08T02:01:41Z

Default branch: main

Fork: no

Archived: no

README: Reviewing Agents

LLM-powered scientific paper review and error detection

This repository supports:

**To Err Is Human**: Systematic Quantification of Errors in Published AI Papers via LLM Analysis

We developed an LLM-based Paper Correctness Checker to identify objective mistakes (in formulas, derivations, figures, and tables) in papers published at top AI venues. Our analysis shows that the number of mistakes per paper has increased over time, from 3.8 in NeurIPS 2021 to 5.9 in NeurIPS 2025 (+55%). Human experts confirmed 83.2% precision on 316 reviewed mistakes. The checker can also propose correct fixes for 75.8% of the identified issues.
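
The headline figures can be sanity-checked with a few lines of arithmetic. This is not code from the repository, and the confirmed-mistake count is inferred from the reported precision; the README does not state it directly.

```python
# Mean mistakes per paper, as reported in the README
mistakes_neurips_2021 = 3.8
mistakes_neurips_2025 = 5.9

# Relative increase between the two years
relative_increase = (mistakes_neurips_2025 - mistakes_neurips_2021) / mistakes_neurips_2021
print(f"Relative increase: {relative_increase:.1%}")

# Precision = confirmed mistakes / reviewed mistakes, so 83.2% of 316 implies this count
# (an inference, not a number stated in the README)
reviewed_mistakes = 316
precision = 0.832
implied_confirmed = round(precision * reviewed_mistakes)
print(f"Implied confirmed mistakes: {implied_confirmed} of {reviewed_mistakes}")
```

This prints a 55.3% increase, which the README rounds to +55%, and about 263 confirmed mistakes (263/316 is roughly 83.2%).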

**Agents4Science**: LLM reviewers for the Agents4Science Conference, the first conference where AI agents served as both primary authors and reviewers

Agents4Science 2025 was the inaugural conference where AI systems served as both authors and reviewers of research papers. Organized by TogetherAI and Stanford University, it received 315 submissions with 48 papers accepted after AI + human peer review.

---

Installation

```shell
uv sync
```

---

Configuration

Set up API keys in .env (copy from env.dev):

```shell
cp env.dev .env
```

Configure modules in config.yaml.

---

Modules

| Module | Description |
|--------|-------------|
| SimpleReviewer | General paper reviewing |
| LLMCorrectnessDetector | Methodological correctness evaluation |
| LLMCriticalityVerifier | Verifies criticality of correctness findings |
| LLMFormatDetector | Format compliance checking |
| JailbreakingChecker | Detects adversarial instructions in papers |
| ReferenceCheckLight | Reference hallucination detection |
| ReferenceCheckHeavy | Full reference + author verification |
| ArxivTaxonomyClassifier | arXiv category classification |

---

Quick Start

Agents4Science reviewers can be used to review papers:

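```python
from pathlib import Path

from reviewing_agents.modules import SimpleReviewer

# Read the paper as raw PDF bytes (read_bytes() also closes the file)
pdf_bytes = Path("paper.pdf").read_bytes()

reviewer = SimpleReviewer()
result = reviewer.review_paper(pdf_bytes)
```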
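
To review a whole folder of papers, the same call can be wrapped in a small loop. This is a minimal sketch under stated assumptions: `review_directory` is a hypothetical helper, not part of reviewing_agents, and it relies only on what the quick start shows, namely that `review_paper` accepts raw PDF bytes.

```python
from pathlib import Path
from typing import Any, Callable


def review_directory(directory: Path, review: Callable[[bytes], Any]) -> dict[str, Any]:
    """Apply a review function (PDF bytes -> result) to every PDF in a directory.

    Hypothetical helper, not part of reviewing_agents. Results are keyed by
    file name and produced in sorted file-name order.
    """
    results: dict[str, Any] = {}
    for pdf_path in sorted(directory.glob("*.pdf")):
        # Same input the quick start passes to review_paper(): the raw file bytes
        results[pdf_path.name] = review(pdf_path.read_bytes())
    return results
```

With the real module this would be called as `review_directory(Path("papers"), SimpleReviewer().review_paper)`, where `papers` is a placeholder directory. Passing a bound method instead of a reviewer object keeps the loop independent of any one class, so `LLMCorrectnessDetector().check_correctness` could be passed the same way.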

LLMCorrectnessDetector and LLMCriticalityVerifier, the modules used in our To Err Is Human paper, can be combined to evaluate the correctness of a paper:

```python
from pathlib import Path

from reviewing_agents.modules import LLMCorrectnessDetector, LLMCriticalityVerifier

pdf_bytes = Path("paper.pdf").read_bytes()

# Step 1: flag potential correctness issues in the paper
detector = LLMCorrectnessDetector()
correctness = detector.check_correctness(pdf_bytes)

# Step 2: pass the detector's findings to the verifier, which checks their criticality
verifier = LLMCriticalityVerifier()
findings = {
    "score": correctness.score,
    "reasoning": correctness.reasoning,
    "key_issues": correctness.key_issues,
}
verified = verifier.verify_criticality(pdf_bytes, findings)
```
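
The findings dictionary is the only contract between the two stages, so it can be built by a small helper. This is a hypothetical sketch: `build_findings` and `CorrectnessResult` are illustrative names, not part of reviewing_agents, and the README only shows that the detector's result exposes `score`, `reasoning`, and `key_issues`. The field types below are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CorrectnessResult:
    """Hypothetical stand-in for the object returned by check_correctness().

    Only the three attributes read in the quick start are modeled; their
    types here are assumptions.
    """
    score: float
    reasoning: str
    key_issues: list[str] = field(default_factory=list)


def build_findings(correctness: Any) -> dict[str, Any]:
    """Pack a correctness result into the dict shape passed to verify_criticality()."""
    return {
        "score": correctness.score,
        "reasoning": correctness.reasoning,
        "key_issues": correctness.key_issues,
    }
```

With the real modules, the last step would become `verifier.verify_criticality(pdf_bytes, build_findings(correctness))`, and the dataclass is only needed for testing the helper without calling an LLM.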

---

Citation

```bibtex
@article{bianchi2025toerr,
  title={To Err Is Human: Systematic Quantification of Errors in Published AI Papers via LLM Analysis},
  author={Bianchi, Federico and Kwon, Yongchan and Izzo, Zachary and Zhang, Linjun and Zou, James},
  journal={arXiv preprint arXiv:2512.05925},
  year={2025}
}

@article{bianchi2025agents4science,
  title={Exploring the use of AI authors and reviewers at Agents4Science},
  author={Bianchi, Federico and Queen, Owen and Thakkar, Nitya and Sun, Eric and Zou, James},
  journal={arXiv preprint arXiv:2511.15534},
  year={2025}
}
```

Notability

notability 3.0/10

New repo with low stars