What does this repo signal mean?

Novita AI published novitalabs/autotuner (Python). Repository signals like this one surface tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo novitalabs/autotuner · language Python · routine new repo with minimal stars. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Novita AI Repo: novitalabs/autotuner

Captured source

GitHub/github.com/novitalabs/autotuner

novitalabs/autotuner repository metadata

published Oct 21, 2025 · seen Jun 5 · captured Jun 11 · http 200 · method plain

novitalabs/autotuner

Description: Optimize the performance of LLM inference engines by automatically tuning parameters for a specific model.

Language: Python

License: MIT

Stars: 11

Forks: 3

Open issues: 4

Created: 2025-10-21T11:17:53Z

Pushed: 2026-06-10T20:58:22Z

Default branch: main

Fork: no

Archived: no

README:

LLM Autotuner (for inference)

Automated parameter tuning for LLM inference engines (SGLang, vLLM) for best performance, while respecting SLOs and hardware constraints.

Why Autotuner?

Quantization and parameter tuning can unlock 60%+ performance gains. LLM inference engines like SGLang and vLLM ship with conservative defaults that work everywhere but are optimized for nowhere.

Performance Impact: Real-World Data

Tested on an NVIDIA RTX 4090 (24 GB) with typical production workloads (mixed prefill/decode).

See detailed benchmarks: [Baseline Benchmarks](docs/qwen-benchmarks.md)

| What You Get | Manual Tuning | Autotuner |
|--------------|---------------|-----------|
| Time to optimal config | Hours to Days | Minutes |
| Parameter combinations tested | ~10 (limited by patience) | 50-100+ (automated) |
| Performance gain | Unknown (untested) | 60%+ throughput (quantization + tuning) |
| Reproducibility | Low (manual errors) | High (versioned configs) |
| Cross-hardware portability | Manual rework | Re-run task (one command) |

How to Use

CLI Mode

Web UI Mode

Agent Mode

Core Concepts

Task: A tuning job containing model config, parameter ranges, SLOs, and optimization strategy
Experiment: Individual trial with specific parameter values; multiple experiments per task
ARQ Worker: Background processor that deploys models, runs benchmarks, and scores results
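
To make the Task/Experiment relationship concrete, here is a minimal sketch of how a task's parameter ranges could expand into individual experiments under grid search. The class names, field names (`parameter_ranges`, `slos`, `strategy`), the parameter names, and the model name are all hypothetical and chosen for illustration; the repository's actual schema may differ.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Any, Optional


@dataclass
class Experiment:
    """One trial with a concrete value for every tuned parameter (hypothetical shape)."""
    params: dict[str, Any]
    score: Optional[float] = None  # would be filled in after benchmarking


@dataclass
class Task:
    """A tuning job: model, parameter ranges, SLOs, and strategy (hypothetical shape)."""
    model: str
    parameter_ranges: dict[str, list[Any]]
    slos: dict[str, float] = field(default_factory=dict)
    strategy: str = "grid_search"

    def grid_experiments(self) -> list[Experiment]:
        # Grid search: one experiment per element of the Cartesian product
        # of all parameter ranges.
        names = list(self.parameter_ranges)
        combos = product(*(self.parameter_ranges[n] for n in names))
        return [Experiment(params=dict(zip(names, combo))) for combo in combos]


# Illustrative values only; parameter names are assumptions, not engine flags.
task = Task(
    model="example-model",
    parameter_ranges={"tp_size": [1, 2], "mem_fraction": [0.8, 0.85, 0.9]},
    slos={"p90_latency_s": 5.0},
)
experiments = task.grid_experiments()
print(len(experiments))  # 2 * 3 = 6 experiments for this task
```

In this sketch a background worker would take each `Experiment`, deploy the model with those parameters, benchmark it, and write back a `score`.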

Features

Multiple Deployment Modes: Docker, Local (direct GPU), OME (Kubernetes)
Web UI: React frontend with real-time monitoring
Agent Assistant: LLM-powered assistant for task management and troubleshooting
Optimization Strategies: Grid search, Bayesian optimization
SLO-Aware Scoring: Exponential penalties for constraint violations
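
As an illustration of SLO-aware scoring with exponential penalties, the sketch below discounts a raw throughput figure by `exp(-steepness * violation)`, where `violation` is the summed relative overshoot of each SLO limit. This formula, the function name, the `steepness` parameter, and the metric names are assumptions made for the example; the project's actual scoring is described in its SLO Scoring documentation and may be defined differently.

```python
import math


def slo_penalized_score(
    throughput: float,
    measured: dict[str, float],
    slos: dict[str, float],
    steepness: float = 5.0,
) -> float:
    """Return throughput discounted exponentially for SLO violations.

    Hypothetical formula: each metric that exceeds its limit contributes its
    relative overshoot; metrics within their limits contribute nothing.
    """
    total_violation = 0.0
    for name, limit in slos.items():
        value = measured[name]
        if value > limit:
            total_violation += (value - limit) / limit
    return throughput * math.exp(-steepness * total_violation)


slos = {"p90_latency_s": 5.0}

# Within the SLO: no penalty, the score equals the raw throughput.
ok_score = slo_penalized_score(100.0, {"p90_latency_s": 4.0}, slos)

# 20% over the latency limit: the score is scaled by exp(-5 * 0.2) = exp(-1).
bad_score = slo_penalized_score(100.0, {"p90_latency_s": 6.0}, slos)
```

An exponential penalty of this kind keeps configurations that narrowly miss an SLO comparable to compliant ones, while making large violations score close to zero.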

Quick Start

→ [Get started in 5 minutes with Docker](docs/getting-started/quickstart.md)

# Install
pip install -r requirements.txt && pip install genai-bench

# Run
python src/run_autotuner.py examples/docker_task.yaml --mode docker

Web UI

# Start backend + worker
./scripts/start_dev.sh

# Start frontend (separate terminal)
cd frontend && npm run dev

Access at http://localhost:5173

Documentation

**Full Documentation**

Project Overview

[ROADMAP.md](docs/architecture/roadmap.md) - Product roadmap with completed milestones and future plans

Setup & Deployment

[Installation Guide](docs/getting-started/installation.md) - Complete installation guide
[Quick Start](docs/getting-started/quickstart.md) - Quick start tutorial
[Docker Mode](docs/user-guide/docker-mode.md) - Docker deployment guide
[Kubernetes/OME](docs/user-guide/kubernetes.md) - Kubernetes/OME setup

Features & Configuration

[SLO Scoring](docs/features/slo-scoring.md) - SLO-aware scoring with exponential penalties
[Parallel Execution](docs/features/parallel-execution.md) - Parallel experiment execution
[WebSocket Implementation](docs/features/websocket.md) - Real-time updates via WebSocket
[Quantization Parameters](docs/UNIFIED_QUANTIZATION_PARAMETERS.md) - Quantization configuration
[Parameter Presets](docs/user-guide/presets.md) - Parameter preset system
[Bayesian Optimization](docs/features/bayesian-optimization.md) - Bayesian optimization strategy
[GPU Tracking](docs/features/gpu-tracking.md) - Intelligent GPU scheduling

Operations & Troubleshooting

[Troubleshooting](docs/troubleshooting.md) - Common issues and solutions

Contributing

See [DEVELOPMENT](docs/DEVELOPMENT.md) for development guidelines and project architecture.

Notability

notability 3.0/10

Routine new repo with minimal stars