Repo · Novita AI · published Jan 19, 2026

novitalabs/rft-tinker

Python

Description: RL training demo with tinker + sandbox

Language: Python

Stars: 0

Forks: 0

Open issues: 0

Created: 2026-01-19T08:22:11Z

Pushed: 2026-01-26T03:06:14Z

Default branch: main

Fork: no

Archived: no

README:

RFT-Tinker: R2E-Gym Training with Tinker API + Agent Sandbox

Overview

An experimental setup for training code generation models on the R2E-Gym dataset using:

  • Tinker API for RL model training
  • Agent Sandbox for safe code execution
  • R2E-Gym Dataset (4.5K real-world GitHub issues)

The goal is to reproduce the DeepSWE experiments, which report 42.2% Pass@1 on SWE-Bench-Verified.

Quick Start

1. Clone Repository

git clone https://github.com/novitalabs/rft-tinker.git
cd rft-tinker

2. Install Dependencies

python3 -m venv venv
source venv/bin/activate
pip install datasets huggingface-hub novita-sandbox tinker torch transformers

3. Configure API Keys

Copy the example environment file:

cp .env.example .env.local

Edit .env.local with your API keys:

# Agent Sandbox API Key (get from https://novita.ai)
NOVITA_API_KEY=your_novita_api_key_here

# Tinker API Token (get from Tinker platform)
TINKER_API_TOKEN=your_tinker_api_token_here

# Template IDs
NOVITA_TEMPLATE_BASE=vn9xnp3cm92x6rmqlgwc

Warning: Never commit `.env.local` with real credentials!
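A minimal sketch of reading these values at startup, assuming `.env.local` uses plain `KEY=value` lines with `#` comments as in the example above. `load_env_file`, `require_keys`, and `REQUIRED_KEYS` are hypothetical names used for illustration; the repository's own loading code may differ (for example, it may rely on a dotenv library).

```python
from pathlib import Path

# Keys taken from the example file above; treating all three as required
# is an assumption.
REQUIRED_KEYS = ("NOVITA_API_KEY", "TINKER_API_TOKEN", "NOVITA_TEMPLATE_BASE")


def load_env_file(path=".env.local"):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_keys(values, required=REQUIRED_KEYS):
    """Raise early if a key is missing or still holds a placeholder."""
    missing = [k for k in required
               if not values.get(k) or values[k].startswith("your_")]
    if missing:
        raise RuntimeError(f"Missing or placeholder values: {', '.join(missing)}")
    return values
```

Calling `require_keys(load_env_file())` before creating any sandbox surfaces a forgotten key as a clear error rather than as an authentication failure later in the run.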

4. Run Tests

Test Agent Sandbox connectivity:

python -m tests.integration.test_novita_basic

Test R2E-Gym workflow:

python -m tests.integration.test_r2e_gym_workflow

5. Prepare Dataset

Download R2E-Gym sample (50 instances):

python scripts/prepare_data/prepare_r2e_sample.py

Test dataset loading:

python -m tests.unit.test_dataset_loading

Project Structure

rft-tinker/
├── src/                      # Core source code
│   ├── datasets/             # Dataset utilities and repo mapping
│   ├── environments/         # Sandbox environment wrappers
│   ├── rollout/              # Multi-turn rollout pipeline
│   └── utils/                # Utility functions
├── tests/                    # All test files
│   ├── integration/          # Integration tests
│   ├── rollout/              # Rollout pipeline tests
│   └── unit/                 # Unit tests
├── scripts/                  # Utility scripts
├── templates/                # Agent Sandbox Dockerfile templates
├── docs/                     # Documentation
├── data/                     # Datasets (gitignored)
├── outputs/                  # Generated outputs (gitignored)
├── tinker_r2e_training.py    # RL training script
├── tinker_sft_training.py    # SFT training script
└── .env.example              # API keys template

Training Scripts

RL Training (GRPO)

python tinker_r2e_training.py

Configuration (in script):

| Parameter | Value | Purpose |
|-----------|-------|---------|
| GROUP_SIZE | 10 | Parallel sandboxes per problem |
| MAX_STEPS | 40 | Max actions per episode |
| SAVE_INTERVAL | 2 | Checkpoint frequency (batches) |
| TEMPERATURE | 1.0 | Sampling temperature |
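With GROUP_SIZE rollouts per problem and a binary reward, GRPO-style training typically scores each rollout relative to the rest of its group. The sketch below shows that normalization in plain Python. It illustrates the general technique, not the code in `tinker_r2e_training.py`; `group_advantages` and the `eps` value are hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Return (reward - group mean) / (group std + eps) for each rollout."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: 10 rollouts for one problem, 2 of which solved it.
rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
advantages = group_advantages(rewards)
```

When every rollout in a group receives the same reward, all advantages are zero and that problem contributes no learning signal for the batch, which is one reason group size and sampling temperature matter.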

SFT Training (Optional Warm-Start)

python tinker_sft_training.py

Converts gold patches to edit trajectories for supervised fine-tuning warm-start.

Weight Validation

python validate_sft_weights.py

Validates SFT checkpoint weights before RL training.

Agent Sandbox Templates

r2e-gym-base (vn9xnp3cm92x6rmqlgwc)

  • Python 3.8.10, pytest 8.3.5, numpy 1.24.4
  • Core: scipy, sympy, requests, pillow
  • For most Python repositories

r2e-gym-scientific

  • Adds: pandas, scikit-learn, matplotlib, seaborn, h5py
  • For scientific computing

r2e-gym-pillow

  • Pillow 10.4.0 with full image processing
  • For image-heavy repositories

Agent Sandbox API

from novita_sandbox.core import Sandbox

# Create sandbox
sandbox = Sandbox.create(
    api_key=api_key,
    template=template_id,
    timeout=3600
)

# Run commands (synchronous - no await)
result = sandbox.commands.run("echo 'Hello World'")
print(result.stdout)
print(result.exit_code)

# Write files
sandbox.files.write("/path/to/file.py", content.encode())

R2E-Gym Workflow

Standard evaluation workflow:

# 1. Clone repo at base commit
sandbox.commands.run(f"git clone {repo_url} /tmp/testbed")
sandbox.commands.run(f"cd /tmp/testbed && git checkout {base_commit}")

# 2. Apply model-generated patch
sandbox.files.write("/tmp/patch.diff", patch_content)
sandbox.commands.run("cd /tmp/testbed && git apply /tmp/patch.diff")

# 3. Run tests that should now pass (FAIL_TO_PASS)
fail_to_pass = sandbox.commands.run(f"cd /tmp/testbed && pytest {fail_tests}")

# 4. Run tests that should remain passing (PASS_TO_PASS)
pass_to_pass = sandbox.commands.run(f"cd /tmp/testbed && pytest {pass_tests}")

# 5. Compute reward: 1.0 only if both test runs exit with code 0
all_tests_passed = fail_to_pass.exit_code == 0 and pass_to_pass.exit_code == 0
reward = 1.0 if all_tests_passed else 0.0
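The steps above can be wrapped in one function and exercised offline. The sketch below is an assumption-laden illustration: `evaluate_patch` and `FakeSandbox` are hypothetical names, and it assumes `commands.run` returns a result with `exit_code` even when a command fails. If the real SDK raises on a non-zero exit instead, the calls would need a try/except.

```python
import shlex
from dataclasses import dataclass, field
from types import SimpleNamespace


def evaluate_patch(sandbox, repo_url, base_commit, patch_content,
                   fail_tests, pass_tests, workdir="/tmp/testbed"):
    """Run the five workflow steps and return a binary reward."""
    run = sandbox.commands.run
    cd = f"cd {shlex.quote(workdir)}"
    run(f"git clone {shlex.quote(repo_url)} {shlex.quote(workdir)}")
    run(f"{cd} && git checkout {shlex.quote(base_commit)}")
    sandbox.files.write("/tmp/patch.diff", patch_content)
    applied = run(f"{cd} && git apply /tmp/patch.diff")
    if applied.exit_code != 0:
        return 0.0  # a patch that does not apply cannot pass
    # Test selectors are left unquoted because they may hold several ids.
    f2p = run(f"{cd} && pytest {fail_tests}")
    p2p = run(f"{cd} && pytest {pass_tests}")
    return 1.0 if f2p.exit_code == 0 and p2p.exit_code == 0 else 0.0


@dataclass
class FakeSandbox:
    """Offline stand-in: records commands and returns scripted exit codes."""
    exit_codes: dict = field(default_factory=dict)  # substring -> exit code
    log: list = field(default_factory=list)

    def __post_init__(self):
        self.commands = SimpleNamespace(run=self._run)
        self.files = SimpleNamespace(write=self._write)

    def _run(self, cmd):
        self.log.append(cmd)
        code = next((c for key, c in self.exit_codes.items() if key in cmd), 0)
        return SimpleNamespace(exit_code=code, stdout="")

    def _write(self, path, content):
        self.log.append(f"write {path}")
```

Because `evaluate_patch` only needs an object exposing `commands.run` and `files.write`, the same function can be handed a real sandbox or the fake one, which keeps the reward logic testable without API credentials.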

Dataset Schema

Each R2E-Gym instance contains:

{
    "instance_id": "orange3__2d9617bd",
    "repo": "orange3",
    "commit_hash": "2d9617bd0cb1f0ba61771258410ab8fae8e7e24d",
    "problem_statement": "[ISSUE] ...",
    "modified_files": [...],
    "test_files": ["test_1.py"],
    "test_codes": ["..."],
    "old_commit_exit_code": 1,  # Tests fail before fix
    "new_commit_exit_code": 0,  # Tests pass after fix
    "gold_patch": {...}
}
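A small sanity check over these fields can catch unusable instances before any sandbox is started. The sketch below assumes each instance is loaded as a plain dict with the keys shown above. `validate_instance` is a hypothetical helper, and the rule that `test_files` and `test_codes` have equal length is an assumption drawn from the example.

```python
REQUIRED_FIELDS = {
    "instance_id", "repo", "commit_hash", "problem_statement",
    "modified_files", "test_files", "test_codes",
    "old_commit_exit_code", "new_commit_exit_code", "gold_patch",
}


def validate_instance(instance):
    """Return a list of problems; an empty list means the instance looks usable."""
    problems = [f"missing field: {name}"
                for name in sorted(REQUIRED_FIELDS - set(instance))]
    if problems:
        return problems
    if instance["old_commit_exit_code"] == 0:
        problems.append("tests already pass before the fix")
    if instance["new_commit_exit_code"] != 0:
        problems.append("tests still fail after the fix")
    if len(instance["test_files"]) != len(instance["test_codes"]):
        problems.append("test_files and test_codes differ in length")
    return problems
```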

Available Actions (in Rollout Generator)

The rollout generator provides 8 tools for the model:

1. bash - Execute shell commands
2. read - Read file content (with line range support)
3. search - Pattern search (grep -rn)
4. find_file - Locate files by pattern
5. list_dir - Directory listing (ls -lah)
6. edit - Line-based file editing
7. run_test - Execute test commands
8. submit - Submit solution
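One way such tools can map onto sandbox shell commands is sketched below. Only `grep -rn` and `ls -lah` come from the list above; the `find` and `sed` forms, the argument names, and `build_command` itself are assumptions for illustration, not the rollout generator's actual implementation. `edit` and `submit` are left out because they do not reduce to a single shell command.

```python
import shlex


def build_command(tool, **args):
    """Translate a tool call into a shell command string (hypothetical mapping)."""
    q = shlex.quote
    if tool in ("bash", "run_test"):
        return args["command"]  # passed through unchanged
    if tool == "search":
        return f"grep -rn -- {q(args['pattern'])} {q(args.get('path', '.'))}"
    if tool == "list_dir":
        return f"ls -lah {q(args.get('path', '.'))}"
    if tool == "find_file":
        return f"find {q(args.get('path', '.'))} -name {q(args['pattern'])}"
    if tool == "read":
        # Optional line range; "$" means "through the last line" in sed.
        start, end = args.get("start", 1), args.get("end", "$")
        return f"sed -n {q(f'{start},{end}p')} {q(args['path'])}"
    raise ValueError(f"no shell mapping for tool: {tool}")
```

Quoting every model-supplied argument with `shlex.quote` keeps a pattern containing spaces or shell metacharacters from being interpreted as extra commands.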

Performance Notes

Based on actual training measurements:

| Phase | Duration | % of Batch |
|-------|----------|------------|
| Sandbox creation (10×) | ~21s | 1.2% |
| Repository setup (10×) | ~2 min | 6.7% |
| Rollout execution | ~25-28 min | ~90% |
| Training update | ~30s | 1.7% |
| Sandbox cleanup | ~15s | 0.8% |

Key metrics:

  • Sandbox hot-start latency: 60-100ms/task
  • Concurrent sandboxes: Up to 150 per account
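The percentages in the table follow from the phase durations, and the concurrency limit bounds how many problems can run at once. The sketch below redoes that arithmetic. The durations are the approximate values from the table, the 26.5 minute rollout figure is an assumed midpoint of the reported range, and the function names are hypothetical.

```python
# Phase durations in seconds, taken from the table above.
# Rollout execution is reported as a range, so the midpoint is an assumption.
PHASES = {
    "sandbox_creation": 21,
    "repository_setup": 2 * 60,
    "rollout_execution": 26.5 * 60,
    "training_update": 30,
    "sandbox_cleanup": 15,
}


def phase_shares(phases):
    """Return each phase's share of total batch time, in percent."""
    total = sum(phases.values())
    return {name: 100.0 * secs / total for name, secs in phases.items()}


def max_parallel_problems(concurrent_limit=150, group_size=10):
    """How many problems fit under the sandbox limit at once."""
    return concurrent_limit // group_size


if __name__ == "__main__":
    for name, pct in phase_shares(PHASES).items():
        print(f"{name:18s} {pct:5.1f}%")
    print("problems in parallel:", max_parallel_problems())
```

Since rollout execution dominates the batch, sandbox creation and cleanup overheads have little effect on throughput; shortening episodes or running more problems in parallel is where time can be recovered.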

DeepSWE Comparison

| Aspect | DeepSWE | This Setup |
|--------|---------|------------|
| Model | Qwen3-32B | Qwen3-30B-A3B |
| Hardware | 64 H100 | Tinker |
| Dataset | R2E-Gym (4.5K) | Same ✅ |
| Sandbox | Kubernetes + Docker | Agent Sandbox ✅ |
| Pass@1 | 42.2% (SOTA) | TBD |

Documentation

  • [Technical Blog](docs/novita-sandbox-rl-training.md) - Detailed guide on RL training with Agent Sandbox
  • [Progress Report](docs/PROGRESS.md) - Detailed development progress

References

  • DeepSWE Paper: https://www.together.ai/blog/deepswe
  • **R2E-Gym…
