Repo · Nous Research · published Dec 9, 2025

NousResearch/nomos

Python

Language: Python

License: MIT

Stars: 194

Forks: 21

Open issues: 8

Created: 2025-12-09T20:14:55Z

Pushed: 2025-12-18T06:05:44Z

Default branch: main

Fork: no

Archived: no

README:

Nomos

A reasoning harness for mathematical problem-solving and proof-writing in natural language.

Installation

pip install -r requirements.txt

Usage

python solve_agent.py [options]

Required Arguments

  • problems_dir: Directory containing .md problem files

Options

| Flag | Default | Description |
|------|---------|-------------|
| --submissions_dir | submissions/{problems_dir}-{timestamp} | Output directory for final submissions |
| --judge_prompt | prompts/score.md | Judge prompt file |
| --solve_prompt | None | Solver system prompt |
| --consolidation_prompt | prompts/consolidation.md | Consolidation prompt |
| --pairwise_prompt | prompts/pairwise.md | Pairwise comparison prompt |
| --time_limit_hours | 3.0 | Total time limit |
| --max_concurrent | 32 | Max parallel API requests |
| --target_perfect_scores | 4 | Number of 7/7 scores needed per problem |
| --model | nomos-1 | Model for solving |
| --judge_model | nomos-1 | Model for judging |
| --base_url | http://localhost:30000/v1 | OpenAI-compatible API endpoint |
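To make the defaults above concrete, here is a minimal sketch of an argument parser that mirrors the table. It is a reconstruction from the documented flags only, not the code in solve_agent.py; the function name `build_parser` and the example path `problems/putnam` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction from the options table; the real
    # solve_agent.py may define its arguments differently.
    p = argparse.ArgumentParser(prog="solve_agent.py")
    p.add_argument("problems_dir", help="Directory containing .md problem files")
    # The documented default is submissions/{problems_dir}-{timestamp};
    # None here stands for "compute that path at run time".
    p.add_argument("--submissions_dir", default=None)
    p.add_argument("--judge_prompt", default="prompts/score.md")
    p.add_argument("--solve_prompt", default=None)
    p.add_argument("--consolidation_prompt", default="prompts/consolidation.md")
    p.add_argument("--pairwise_prompt", default="prompts/pairwise.md")
    p.add_argument("--time_limit_hours", type=float, default=3.0)
    p.add_argument("--max_concurrent", type=int, default=32)
    p.add_argument("--target_perfect_scores", type=int, default=4)
    p.add_argument("--model", default="nomos-1")
    p.add_argument("--judge_model", default="nomos-1")
    p.add_argument("--base_url", default="http://localhost:30000/v1")
    return p


# Example: override only the time limit, keep every other default.
args = build_parser().parse_args(["problems/putnam", "--time_limit_hours", "1.5"])
```

Only flags passed explicitly change; everything else falls back to the defaults listed in the table.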

Workflow

Nomos keeps working on the problems you give it until its time limit runs out or it reaches a target number of self-critiqued perfect scores on every problem. Once either termination condition is reached, Nomos enters a finalization phase: it first discards a number of submissions, then judges the remainder pairwise, tournament-style, to select a final submission.

Solving Phase

In the solving phase we launch max_concurrent parallel workers, where each worker:

1. Picks a problem based on priority + round-robin:

  • Priority: problems with fewest perfect scores
  • Round-robin among problems tied at the minimum

2. Generates a submission.

3. Scores the submission out of a maximum of 7 points.

Nomos keeps spawning workers until all problems have target_perfect_scores or time runs out.
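The selection rule described above (fewest perfect scores first, round-robin among ties) can be sketched as follows. This is an illustration under stated assumptions, not the harness's own scheduler: the class name `ProblemPicker`, its methods, and the "least recently assigned" tie-break used to implement round-robin are all hypothetical.

```python
class ProblemPicker:
    """Hypothetical scheduler: fewest perfect scores first, round-robin on ties."""

    def __init__(self, problem_ids, target_perfect_scores=4):
        self.perfect = {pid: 0 for pid in problem_ids}
        # Tick at which each problem was last handed to a worker (-1 = never).
        self.last_assigned = {pid: -1 for pid in problem_ids}
        self.target = target_perfect_scores
        self._tick = 0

    def record_score(self, pid, score):
        # Only a 7/7 counts toward the target.
        if score == 7:
            self.perfect[pid] += 1

    def pick(self):
        # Problems that still need perfect scores.
        open_ids = [p for p, n in self.perfect.items() if n < self.target]
        if not open_ids:
            return None  # every problem has reached the target
        fewest = min(self.perfect[p] for p in open_ids)
        tied = [p for p in open_ids if self.perfect[p] == fewest]
        # Round-robin: among tied problems, take the least recently assigned.
        choice = min(tied, key=lambda p: self.last_assigned[p])
        self.last_assigned[choice] = self._tick
        self._tick += 1
        return choice


picker = ProblemPicker(["A1", "A2", "A3"], target_perfect_scores=2)
first_round = [picker.pick() for _ in range(3)]
picker.record_score("A1", 7)  # A1 now leads, so it loses priority
next_pick = picker.pick()
```

In this sketch a worker would call `pick()` before generating and `record_score()` after judging; a return value of `None` signals that solving can stop early.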

Finalization Phase

Starts 15 minutes before time limit (or at 50% of time limit for short runs). Consists of two subphases:
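The README does not say what counts as a "short run". One reading that matches both clauses is to start finalization at whichever comes later: 15 minutes before the limit, or the halfway point. The sketch below encodes that assumption; the function name is hypothetical.

```python
def finalization_start_hours(time_limit_hours: float) -> float:
    """Hours after launch at which finalization would begin.

    Assumption: take the later of "15 minutes before the limit" and
    "50% of the limit", so runs shorter than 30 minutes use the 50% rule.
    """
    fifteen_minutes = 0.25  # hours
    return max(time_limit_hours - fifteen_minutes, 0.5 * time_limit_hours)


default_run = finalization_start_hours(3.0)  # default --time_limit_hours
short_run = finalization_start_hours(0.4)    # a 24-minute run
```

Under this reading, the default 3-hour run leaves 15 minutes for finalization, while a 24-minute run splits its budget evenly between solving and finalization.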

1. Consolidation: Groups submissions by conclusion, keeps what it thinks is the correct group (not necessarily the majority group).

2. Pairwise Tournament: Single elimination bracket among consolidated submissions, with ties resolved randomly.
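A single-elimination bracket with random tie-breaking can be sketched as below. In the harness the comparison would presumably be a judge-model call driven by the pairwise prompt; here `compare` is a hypothetical stand-in that returns a positive number when the first entry wins, a negative number when the second wins, and 0 for a tie. Giving the unpaired entry a bye in odd-sized rounds is also an assumption.

```python
import random


def single_elimination(entries, compare, rng=None):
    """Return the bracket winner; ties are resolved by a random choice."""
    rng = rng or random.Random()
    pool = list(entries)
    if not pool:
        raise ValueError("need at least one entry")
    while len(pool) > 1:
        next_round = []
        # Pair neighbours: (0, 1), (2, 3), ...
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            verdict = compare(a, b)
            if verdict > 0:
                winner = a
            elif verdict < 0:
                winner = b
            else:
                winner = rng.choice([a, b])  # tie resolved randomly
            next_round.append(winner)
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])  # unpaired entry gets a bye
        pool = next_round
    return pool[0]


# Toy judge: the longer string wins (stand-in for an LLM comparison).
def by_length(a, b):
    return len(a) - len(b)


winner = single_elimination(["aa", "b", "cccc", "ddd"], by_length)
```

A bracket over n entries needs n - 1 comparisons, which keeps the number of judge calls linear in the number of surviving submissions.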

Output Format

Each final submission is written to its own markdown file in the following format:

# problem_id

## Problem

[original problem text]

## Submission

[selected solution]
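A small writer that produces this layout might look like the sketch below. The function name `write_submission` and the `{problem_id}.md` file name are assumptions; the README only specifies the contents of each file.

```python
import tempfile
from pathlib import Path


def write_submission(out_dir, problem_id, problem_text, solution):
    """Write one final submission in the documented markdown layout."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: one file per problem, named after the problem id.
    path = out_dir / f"{problem_id}.md"
    body = (
        f"# {problem_id}\n\n"
        f"## Problem\n\n{problem_text}\n\n"
        f"## Submission\n\n{solution}\n"
    )
    path.write_text(body, encoding="utf-8")
    return path


# Usage with a throwaway directory and placeholder text.
with tempfile.TemporaryDirectory() as tmp:
    written = write_submission(tmp, "A1", "Prove the claim.", "Proof sketch.")
    text = written.read_text(encoding="utf-8")
    file_name = written.name
```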

Runbooks

./runbooks/run_putnam_2025_a_nomos-1.sh # Putnam 2025 A problems
./runbooks/run_putnam_2025_b_nomos-1.sh # Putnam 2025 B problems

Results

When run on Putnam 2025 with the NousResearch/Nomos-1 model, this reasoning harness achieves a score of 87/120 as graded by a human expert. Below we show a problem-wise comparison with Qwen/Qwen3-30B-A3B-Thinking-2507, which scores 24/120 under the same conditions.

Citation

If you would like to cite our work, please use this for now

@misc{nomos2025,
title = {Nomos},
author = {Jin, Roger and Quesnelle, Jeffrey and Mahan, Dakota and Guang, Chen and Teknium, Ryan and Park, Jun and Ustelbay, Ibrakhim and Kim, Samuel and Yurkevich, Miron and Zauytkhan, Adilet and Amankos, Rinat and Andreyev, Alex and Nurlanov, Damir and Abuov, Abuzer and massiveaxe, Askar},
year = {2025},
howpublished = {\url{https://github.com/NousResearch/nomos}},
note = {GitHub repository},
}

Notability

notability 6.0/10

Notable new reasoning harness from Nous, with moderate early traction.