HyperbolicLabs/inference-benchmarks
Language: Python
Stars: 0
Forks: 0
Open issues: 0
Created: 2026-01-20T20:58:25Z
Pushed: 2026-02-11T06:48:42Z
Default branch: main
Fork: no
Archived: no
README:
Inference Benchmarks
Benchmark tools for testing and evaluating inference endpoints.
Overview
This repository contains benchmark tools for testing inference endpoints:
- AIPerf: Performance benchmarking (latency, throughput)
- OSWorld: End-to-end agent evaluation
Both benchmarks automatically export metrics to Datadog.
Structure
inference-benchmarks/
├── common/                  # Shared components
│   ├── datadog_utils.py     # Common Datadog export logic
│   └── Makefile.common      # Common Makefile functions
│
├── aiperf/                  # AIPerf performance benchmarking
│   ├── benchmark.py
│   ├── Dockerfile
│   ├── Makefile
│   ├── cronjob.yaml
│   ├── job.yaml
│   ├── pvc.yaml
│   └── README.md
│
├── osworld/                 # OSWorld evaluation
│   ├── run_evaluation.py
│   ├── Dockerfile
│   ├── Makefile
│   ├── osworld-job.yaml
│   ├── pvc.yaml
│   └── README.md
│
├── Makefile                 # Root Makefile (builds all)
└── README.md
Common Components
common/datadog_utils.py
Shared Datadog export utilities used by all benchmarks:
- Retry logic with exponential backoff
- Batch sending (20 metrics per batch)
- Async (non-blocking) support
- Partial success handling
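The four behaviors listed above can be illustrated with a minimal sketch. This is not the code in common/datadog_utils.py, which is not shown here. The function names (`chunk`, `send_with_retry`, `send_in_batches`, `send_in_background`), the retry count, and the base delay are all assumptions made for illustration. Only the batch size of 20 comes from the README. The `send` argument is a hypothetical callable standing in for whatever performs the actual Datadog request.

```python
import threading
import time
from typing import Callable, Dict, List, Tuple

Metric = Tuple[str, float]
BATCH_SIZE = 20  # batch size stated in the README


def chunk(items: List[Metric], size: int = BATCH_SIZE) -> List[List[Metric]]:
    """Split (name, value) pairs into batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def send_with_retry(
    send: Callable[[List[Metric]], None],
    batch: List[Metric],
    max_attempts: int = 3,       # assumed value, not taken from the repository
    base_delay: float = 0.5,     # assumed value, not taken from the repository
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Send one batch, doubling the wait after each failed attempt."""
    for attempt in range(max_attempts):
        try:
            send(batch)
            return True
        except Exception:
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5s, then 1s with the defaults
    return False


def send_in_batches(metrics: Dict[str, float], send, **retry_kwargs) -> Tuple[int, int]:
    """Return (sent, failed) metric counts; one failed batch does not stop the rest."""
    sent = failed = 0
    for batch in chunk(list(metrics.items())):
        if send_with_retry(send, batch, **retry_kwargs):
            sent += len(batch)
        else:
            failed += len(batch)
    return sent, failed


def send_in_background(metrics: Dict[str, float], send, **retry_kwargs) -> threading.Thread:
    """Run send_in_batches on a daemon thread so the caller is not blocked."""
    thread = threading.Thread(
        target=send_in_batches, args=(metrics, send), kwargs=retry_kwargs, daemon=True
    )
    thread.start()
    return thread
```

Returning a `(sent, failed)` pair is one way to model partial success: a batch that exhausts its retries is counted as failed while the remaining batches are still attempted.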
Usage:
from datadog_utils import send_metrics_async
metrics = {"latency_p95": 150.5, "throughput": 100.2}
base_tags = ["model:Qwen/Qwen3-VL-32B-Thinking", "cluster_name:inference-cluster"]
send_metrics_async(
    metrics=metrics,
    metric_prefix="inference.benchmark.aiperf",
    base_tags=base_tags
)
Quick Start
AIPerf
cd aiperf
make build-push   # Build and push image
make deploy       # Deploy CronJob
See aiperf/README.md for details.
OSWorld
cd osworld
make build-push   # Build and push image
make deploy       # Deploy evaluation job
See osworld/README.md for details.
Building All
# Build all benchmarks
cd aiperf && make build && cd ../osworld && make build

# Or individually (run each from the repository root)
cd aiperf && make build-push
cd osworld && make build-push
Datadog Metrics
Each benchmark sends its metrics to Datadog under its own prefix:
- AIPerf: inference.benchmark.aiperf.*
- OSWorld: inference.benchmark.osworld.*
Required for metrics export: set the DD_API_KEY environment variable or provide it through a Kubernetes secret.
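As a rough illustration of the naming scheme, the sketch below joins a prefix and a metric name with a dot and looks up the API key. The helper names `full_metric_names` and `datadog_api_key` are hypothetical, and the dot-joined format is an assumption inferred from the `inference.benchmark.aiperf.*` pattern; the actual logic in common/datadog_utils.py may differ.

```python
import os
from typing import Dict, Mapping, Optional


def full_metric_names(metrics: Dict[str, float], metric_prefix: str) -> Dict[str, float]:
    """Prepend the benchmark prefix to every metric name (assumed dot-joined)."""
    return {f"{metric_prefix}.{name}": value for name, value in metrics.items()}


def datadog_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return DD_API_KEY if it is set and non-empty, otherwise None."""
    return env.get("DD_API_KEY") or None


if __name__ == "__main__":
    named = full_metric_names({"latency_p95": 150.5}, "inference.benchmark.aiperf")
    print(named)
    if datadog_api_key() is None:
        print("DD_API_KEY is not set; metrics export would be skipped")
```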
Requirements
- Kubernetes cluster
- Datadog API key (optional, for metrics export)
- GitHub Container Registry access (for images)
Adding a New Benchmark
1. Create directory: mkdir new-benchmark
2. Create script that uses common/datadog_utils.py
3. Create Dockerfile, Makefile, Kubernetes manifests
4. Follow patterns from existing benchmarks
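A script for step 2 might look like the sketch below. The `send_metrics_async` import and its keyword arguments follow the usage shown earlier in this README. Everything else is a placeholder assumption: the `run_benchmark` workload, the `inference.benchmark.new_benchmark` prefix, the `model:example-model` tag, and the fallback stub used when datadog_utils is not on the import path.

```python
import time
from typing import Dict

try:
    # Import as shown in the README usage; requires common/ on the import path.
    from datadog_utils import send_metrics_async
except ImportError:
    def send_metrics_async(metrics, metric_prefix, base_tags):
        """Stand-in for local dry runs when datadog_utils is unavailable."""
        print(f"[dry run] {metric_prefix}: {metrics} tags={base_tags}")


def run_benchmark() -> Dict[str, float]:
    """Hypothetical workload: time a placeholder computation."""
    start = time.perf_counter()
    sum(range(100_000))  # replace with real requests to the endpoint under test
    return {"duration_seconds": time.perf_counter() - start}


def main() -> Dict[str, float]:
    metrics = run_benchmark()
    send_metrics_async(
        metrics=metrics,
        metric_prefix="inference.benchmark.new_benchmark",  # hypothetical prefix
        base_tags=["model:example-model"],                  # hypothetical tag
    )
    return metrics


if __name__ == "__main__":
    main()
```

Keeping the measurement in `run_benchmark` and the export in `main` lets the workload be tested without a Datadog key.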
License
[Your License Here]
Notability: 3.0/10. New benchmark repo; routine, no major traction.