What does this repo signal mean?

Hyperbolic published HyperbolicLabs/inference-benchmarks (Python). Repository signals like this one surface tooling, eval, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo HyperbolicLabs/inference-benchmarks · language Python · new benchmark repo; routine, with no major traction. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Hyperbolic Repo: HyperbolicLabs/inference-benchmarks

Captured source

GitHub: github.com/HyperbolicLabs/inference-benchmarks

HyperbolicLabs/inference-benchmarks repository metadata

Published Jan 20, 2026 · seen Jun 5 · captured Jun 11 · HTTP 200 · method plain

HyperbolicLabs/inference-benchmarks

Language: Python

Stars: 0

Forks: 0

Open issues: 0

Created: 2026-01-20T20:58:25Z

Pushed: 2026-02-11T06:48:42Z

Default branch: main

Fork: no

Archived: no

README:

Inference Benchmarks

Benchmark tools for testing and evaluating inference endpoints.

Overview

This repository contains benchmark tools for testing inference endpoints:

AIPerf: Performance benchmarking (latency, throughput)
OSWorld: End-to-end agent evaluation

Both benchmarks automatically export metrics to Datadog.

Structure

inference-benchmarks/
├── common/                  # Shared components
│   ├── datadog_utils.py     # Common Datadog export logic
│   └── Makefile.common      # Common Makefile functions
│
├── aiperf/                  # AIPerf performance benchmarking
│   ├── benchmark.py
│   ├── Dockerfile
│   ├── Makefile
│   ├── cronjob.yaml
│   ├── job.yaml
│   ├── pvc.yaml
│   └── README.md
│
├── osworld/                 # OSWorld evaluation
│   ├── run_evaluation.py
│   ├── Dockerfile
│   ├── Makefile
│   ├── osworld-job.yaml
│   ├── pvc.yaml
│   └── README.md
│
├── Makefile                 # Root Makefile (builds all)
└── README.md

Common Components

`common/datadog_utils.py`

Shared Datadog export utilities used by all benchmarks:

Retry logic with exponential backoff
Batch sending (20 metrics per batch)
Async (non-blocking) support
Partial success handling

Usage:

from datadog_utils import send_metrics_async

metrics = {"latency_p95": 150.5, "throughput": 100.2}
base_tags = ["model:Qwen/Qwen3-VL-32B-Thinking", "cluster_name:inference-cluster"]

send_metrics_async(
    metrics=metrics,
    metric_prefix="inference.benchmark.aiperf",
    base_tags=base_tags,
)
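The README lists batching (20 metrics per batch), retry with exponential backoff, and partial success handling, but does not show how `datadog_utils.py` implements them. The following is a minimal sketch of those three ideas under stated assumptions: `chunk_metrics`, `send_with_retry`, `send_all`, the retry count, and the base delay are all hypothetical names and values, and `send` stands in for whatever function actually posts a batch to Datadog.

```python
import time
from typing import Callable, Dict, Iterator, List, Tuple

Batch = List[Tuple[str, float]]

BATCH_SIZE = 20  # the README states 20 metrics per batch


def chunk_metrics(metrics: Dict[str, float], size: int = BATCH_SIZE) -> Iterator[Batch]:
    """Yield the metrics as lists of at most `size` (name, value) pairs."""
    items = list(metrics.items())
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_with_retry(
    send: Callable[[Batch], None],
    batch: Batch,
    max_attempts: int = 3,      # assumed value, not taken from the repo
    base_delay: float = 1.0,    # assumed value, not taken from the repo
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try one batch; wait base_delay * 2**attempt seconds between failures."""
    for attempt in range(max_attempts):
        try:
            send(batch)
            return True
        except Exception:
            # Back off only if another attempt will follow.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return False


def send_all(metrics: Dict[str, float], send: Callable[[Batch], None], **retry_kwargs) -> Tuple[int, int]:
    """Send every batch and return (sent, failed) metric counts.

    A failed batch does not stop later batches, so partial success is visible.
    """
    sent = failed = 0
    for batch in chunk_metrics(metrics):
        if send_with_retry(send, batch, **retry_kwargs):
            sent += len(batch)
        else:
            failed += len(batch)
    return sent, failed
```

Passing `sleep` as a parameter keeps the backoff testable without real delays; the repository's own helper may structure this differently.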

Quick Start

AIPerf

cd aiperf
make build-push # Build and push image
make deploy # Deploy CronJob

See aiperf/README.md for details.

OSWorld

cd osworld
make build-push # Build and push image
make deploy # Deploy evaluation job

See osworld/README.md for details.

Building All

# Build all benchmarks
cd aiperf && make build && cd ../osworld && make build

# Or individually
cd aiperf && make build-push
cd osworld && make build-push

Datadog Metrics

Each benchmark sends metrics to Datadog under its own prefix:

AIPerf: inference.benchmark.aiperf.*
OSWorld: inference.benchmark.osworld.*
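The prefixes above suggest that each metric name is the prefix and the metric key joined by a dot. A small sketch of that naming, assuming dot-joining is what the export helper does; `full_metric_name` is a hypothetical helper, not a function from the repository.

```python
def full_metric_name(prefix: str, name: str) -> str:
    """Join a metric prefix and a metric key with exactly one dot."""
    return f"{prefix.rstrip('.')}.{name.lstrip('.')}"


# Using the metric keys from the README's usage example:
for key in ("latency_p95", "throughput"):
    print(full_metric_name("inference.benchmark.aiperf", key))
```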

Required for metrics export: set the DD_API_KEY environment variable, or provide it through a Kubernetes secret.
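One way a benchmark script could handle the key is to read it from the environment and skip the export when it is absent. This is a sketch only: `get_datadog_api_key` is a hypothetical helper, and the repository may handle a missing key differently.

```python
import os
from typing import Mapping, Optional


def get_datadog_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return DD_API_KEY from the environment, or None if unset or blank."""
    key = env.get("DD_API_KEY", "").strip()
    return key or None


if get_datadog_api_key() is None:
    # Assumed behaviour: run the benchmark but skip the Datadog export.
    print("DD_API_KEY not set; metrics export disabled")
```

In Kubernetes, a secret is typically exposed to the container as this same environment variable, so the script does not need to know which of the two sources supplied the key.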

Requirements

Kubernetes cluster
Datadog API key (optional, for metrics export)
GitHub Container Registry access (for images)

Adding a New Benchmark

1. Create directory: mkdir new-benchmark
2. Create script that uses common/datadog_utils.py
3. Create Dockerfile, Makefile, Kubernetes manifests
4. Follow patterns from existing benchmarks
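As an illustration of step 2, a new benchmark script might look roughly like the sketch below. The keyword arguments passed to `export` mirror the `send_metrics_async` call in the usage example; everything else (`run_benchmark`, `main`, the metric keys, the `new_benchmark` prefix, and the `model:example-model` tag) is hypothetical. The exporter is passed in as a parameter so the sketch runs without the repository on the import path.

```python
import time
from typing import Callable, Dict, List


def run_benchmark(call_endpoint: Callable[[], None], n: int = 5) -> Dict[str, float]:
    """Time n calls to the endpoint and return latency metrics in milliseconds."""
    latencies: List[float] = []
    for _ in range(n):
        start = time.perf_counter()
        call_endpoint()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return {
        "latency_avg": sum(latencies) / len(latencies),
        "latency_max": max(latencies),
        "requests": float(n),
    }


def main(export: Callable[..., None], call_endpoint: Callable[[], None]) -> Dict[str, float]:
    """Run the benchmark, then hand the metrics to the exporter."""
    metrics = run_benchmark(call_endpoint)
    export(
        metrics=metrics,
        metric_prefix="inference.benchmark.new_benchmark",  # hypothetical prefix
        base_tags=["model:example-model"],                  # hypothetical tag
    )
    return metrics
```

In the real repository, `export` would presumably be `send_metrics_async` imported from `datadog_utils`, and `call_endpoint` would issue a request to the inference endpoint under test.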

License

[Your License Here]

Notability

notability 3.0/10

New benchmark repo; routine, with no major traction.