What does this fork signal mean?

Novita AI forked MiniMax-AI/MiniMax-Provider-Verifier as novitalabs/MiniMax-Provider-Verifier. This fork signal points to upstream code the lab may be inspecting, patching, or building on. High-signal details: repo novitalabs/MiniMax-Provider-Verifier · parent MiniMax-AI/MiniMax-Provider-Verifier · Verification tool for MiniMax AI service providers. onlylabs links this event to 1 captured evidence page and 6 related fork signals.

Novita AI Fork: novitalabs/MiniMax-Provider-Verifier

Captured source

GitHub/github.com/novitalabs/MiniMax-Provider-Verifier

novitalabs/MiniMax-Provider-Verifier repository metadata

published Apr 22, 2026 · seen Jun 5 · captured Jun 11 · http 200 · method plain

novitalabs/MiniMax-Provider-Verifier

Description: MiniMax-Provider-Verifier offers a rigorous, vendor-agnostic way to verify whether third-party deployments of the MiniMax M2 model are correct and reliable.

License: MIT

Stars: 0

Forks: 0

Open issues: 0

Created: 2026-04-22T03:59:59Z

Pushed: 2026-04-01T15:31:13Z

Default branch: main

Fork: yes

Parent repository: MiniMax-AI/MiniMax-Provider-Verifier

Archived: no

README:

MiniMax-Provider-Verifier

[English](README.md) | [中文](README_CN.md)

MiniMax-Provider-Verifier offers a rigorous, vendor-agnostic way to verify whether third-party deployments of the MiniMax M2 model are correct and reliable. Since its open-source release, M2 has been widely adopted and integrated into production services by numerous users. To ensure this user base continues to benefit from an efficient, high-quality M2 experience, and to align with our vision of "Intelligence with Everyone", this toolkit offers an objective, reproducible standard for validating model behavior.

Evaluation Metrics

We evaluate multiple dimensions of vendor deployments, including tool-calling behavior, schema correctness, and system stability (e.g., detecting potential misconfigurations like incorrect top-k settings).

The primary metrics are:

Query-Success-Rate: Measures the probability that a provider eventually returns a valid response when allowed up to max_retry=10 attempts.
query_success_rate = successful_query_count / total_query_count
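
The retry-then-count logic behind this metric can be sketched as follows. This is a minimal illustration, not the verifier's actual code: `send`, `query_with_retry`, and the rule that any exception or `None` counts as a failed attempt are assumptions made for the example.

```python
from typing import Callable, Optional, Sequence

MAX_RETRY = 10  # mirrors max_retry=10 from the metric definition

# Hypothetical: a zero-argument callable that performs one provider request.
Sender = Callable[[], Optional[dict]]


def query_with_retry(send: Sender, max_retry: int = MAX_RETRY) -> Optional[dict]:
    """Call `send` until it yields a response or the attempts run out."""
    for _ in range(max_retry):
        try:
            response = send()
        except Exception:
            continue  # assumption: any raised error is one failed attempt
        if response is not None:
            return response
    return None


def query_success_rate(senders: Sequence[Sender]) -> float:
    """successful_query_count / total_query_count over a set of queries."""
    total_query_count = len(senders)
    if total_query_count == 0:
        return 0.0
    successful_query_count = sum(
        1 for send in senders if query_with_retry(send) is not None
    )
    return successful_query_count / total_query_count
```

A query counts as successful if any one of its attempts succeeds, so the rate reflects eventual availability rather than per-request reliability.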

ToolCalls-Match-Rate: Measures how well the model's "whether to trigger tool-calls" behavior matches the expected labels. Each test case is annotated with expected_tool_call (whether a tool call is expected), and this metric calculates the proportion of cases where the actual result matches the expected result.
tool_calls_match_rate = (tool_calls_finish_tool_calls + stop_finish_stop) / success_count
Confusion Matrix Statistics:
tool_calls_finish_tool_calls: expected tool_call, actual tool_call (TP)
tool_calls_finish_stop: expected tool_call, actual stop (FN)
stop_finish_tool_calls: expected stop, actual tool_call (FP)
stop_finish_stop: expected stop, actual stop (TN)
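
As a worked illustration of the confusion matrix above, the sketch below tallies the four cells and derives the match rate. It assumes each successful case is a pair of the `expected_tool_call` label and an actual finish reason that is either `"tool_calls"` or `"stop"`; the function name and input shape are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tool_calls_match_rate(
    cases: Iterable[Tuple[bool, str]],
) -> Tuple[float, Dict[str, int]]:
    """Return (match rate, confusion counts) for successful cases.

    Each case is (expected_tool_call, actual_finish_reason), where the
    finish reason is assumed to be "tool_calls" or "stop".
    """
    counts: Counter = Counter()
    for expected_tool_call, finish_reason in cases:
        expected = "tool_calls" if expected_tool_call else "stop"
        # Key format follows the names used in the metric definition,
        # e.g. "tool_calls_finish_stop" for expected tool call, actual stop.
        counts[f"{expected}_finish_{finish_reason}"] += 1

    success_count = sum(counts.values())
    matched = counts["tool_calls_finish_tool_calls"] + counts["stop_finish_stop"]
    rate = matched / success_count if success_count else 0.0
    return rate, dict(counts)
```

The rate is simply accuracy over the matrix: true positives plus true negatives, divided by all successful cases.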

ToolCalls-Schema-Accuracy: Measures the correctness rate of tool-call payloads (e.g., function name and arguments meeting the expected schema) conditional on tool-call being triggered.
schema_accuracy = tool_calls_successful_count / tool_calls_finish_tool_calls
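
One way a payload check of this kind could look is sketched below. It assumes an OpenAI-style tool call (`function.name` plus a JSON string in `function.arguments`) and a simplified JSON-Schema-like `parameters` object. The README excerpt does not describe the verifier's actual validation rules, so the checks here (known function name, parseable arguments, required keys, basic types) are illustrative only.

```python
import json

# Simplified mapping from JSON Schema type names to Python types.
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def tool_call_is_valid(tool_call: dict, tools: dict) -> bool:
    """Check one tool call against `tools` (function name -> parameters schema)."""
    function = tool_call.get("function", {})
    schema = tools.get(function.get("name"))
    if schema is None:
        return False  # unknown or missing function name
    try:
        args = json.loads(function.get("arguments", ""))
    except (TypeError, json.JSONDecodeError):
        return False  # arguments are not valid JSON
    if not isinstance(args, dict):
        return False
    if any(key not in args for key in schema.get("required", [])):
        return False  # a required argument is missing
    properties = schema.get("properties", {})
    for key, value in args.items():
        expected = _JSON_TYPES.get(properties.get(key, {}).get("type"))
        if expected is not None and not isinstance(value, expected):
            return False  # argument has the wrong basic type
    return True


def schema_accuracy(tool_calls: list, tools: dict) -> float:
    """tool_calls_successful_count / tool_calls_finish_tool_calls."""
    if not tool_calls:
        return 0.0
    valid = sum(1 for call in tool_calls if tool_call_is_valid(call, tools))
    return valid / len(tool_calls)
```

A production check would more likely delegate to a full JSON Schema validator; the hand-rolled version keeps the example free of third-party dependencies.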

Response-Success-Rate Not Only Reasoning: Detects a specific error pattern where the model outputs only Chain-of-Thought reasoning without providing valid content or the required tool calls. The presence of this pattern strongly indicates a deployment issue.
response_success_rate = response_not_only_reasoning_count / only_reasoning_checked_count
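
The error pattern described here can be detected with a check along the following lines. The message shape is an assumption: the sketch supposes an OpenAI-style assistant message with `content` and `tool_calls`, and a hypothetical `reasoning_content` field carrying the chain-of-thought text. Providers may expose reasoning differently.

```python
from typing import Sequence


def is_only_reasoning(message: dict) -> bool:
    """True if the message has reasoning but no content and no tool calls."""
    # `reasoning_content` is a hypothetical field name used for illustration.
    has_reasoning = bool((message.get("reasoning_content") or "").strip())
    has_content = bool((message.get("content") or "").strip())
    has_tool_calls = bool(message.get("tool_calls"))
    return has_reasoning and not has_content and not has_tool_calls


def response_success_rate(messages: Sequence[dict]) -> float:
    """response_not_only_reasoning_count / only_reasoning_checked_count."""
    only_reasoning_checked_count = len(messages)
    if only_reasoning_checked_count == 0:
        return 0.0
    response_not_only_reasoning_count = sum(
        1 for message in messages if not is_only_reasoning(message)
    )
    return response_not_only_reasoning_count / only_reasoning_checked_count
```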

Language-Following-Success-Rate: Checks whether the model follows language requirements in minor language scenarios; this is sensitive to top-k and related decoding parameters.
language_following_success_rate = language_following_valid_count / language_following_checked_count
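
The README excerpt does not say how language compliance is judged. Purely as an illustration, the sketch below uses a crude script-based heuristic: it measures what share of a response's letters belong to an expected Unicode script, identified by the prefix of each character's Unicode name. The function name, the `threshold` default, and the heuristic itself are assumptions, and the approach only separates languages written in distinct scripts.

```python
import unicodedata


def follows_script(text: str, script: str, threshold: float = 0.8) -> bool:
    """Heuristic: does at least `threshold` of the letters use `script`?

    `script` is a Unicode character-name prefix such as "THAI" or "CYRILLIC".
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False  # nothing to judge, so count it as not following
    in_script = sum(
        1 for ch in letters if unicodedata.name(ch, "").startswith(script)
    )
    return in_script / len(letters) >= threshold


def language_following_success_rate(responses: list, script: str) -> float:
    """language_following_valid_count / language_following_checked_count."""
    if not responses:
        return 0.0
    valid = sum(1 for text in responses if follows_script(text, script))
    return valid / len(responses)
```

A decoding misconfiguration that makes the model drift into English or another language would lower the in-script share and show up as a failed case.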

Evaluation Results

The evaluation results below are computed using our initial release of test prompts, each executed 10 times per provider, with all metrics reported as the mean over the 10-run distribution. As a baseline, minimax represents the performance of our official MiniMax Open Platform deployment, providing a reference point for interpreting other providers' results.
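
The per-provider aggregation described above (run the suite 10 times, report the mean of each metric) amounts to the following. The function name and the dictionary layout are hypothetical, and the values in the usage comment are made up for illustration, not taken from the tables.

```python
from statistics import mean
from typing import Dict, Sequence


def aggregate_runs(runs: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Average each metric over repeated runs for a single provider."""
    if not runs:
        return {}
    metric_names = runs[0].keys()
    return {name: mean(run[name] for run in runs) for name in metric_names}


# Illustrative usage with invented values for two runs:
# aggregate_runs([
#     {"query_success_rate": 1.0, "schema_accuracy": 0.5},
#     {"query_success_rate": 0.0, "schema_accuracy": 0.5},
# ])
```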

MiniMax-M2.5/M2.7 Model – April 2026 Data (After Metrics Revision)

| Metric | Query-Success-Rate | ToolCalls-Match-Rate | ToolCalls-Accuracy | Response-Success-Rate | Language-Following-Success-Rate |
|--------|--------------------|----------------------|--------------------|-----------------------|---------------------------------|
| MiniMax-M2.5 | 100% | 99.29% | 95.59% | 100% | 80% |
| MiniMax-M2.7 | 100% | 99.29% | 96.55% | 100% | 90% |

MiniMax-M2.5 Model – Feb 2026 Data

| Metric | Query-Success-Rate | Finish-ToolCalls-Rate | ToolCalls-Trigger Similarity | ToolCalls-Accuracy | Response Success Rate - Not Only Reasoning | Language-Following-Success-Rate |
|--------|--------------------|-----------------------|------------------------------|--------------------|--------------------------------------------|---------------------------------|
| minimax-m2.5 | 100% | 84.75% | - | 97.26% | 100% | 90% |
| openRouter-minimax-fp8 | 100% | 84.55% | 98.98% | 97.25% | 100% | 80% |
| openRouter-minimax-highspeed | 100% | 84.14% | 99.22% | 97.24% | 100% | 80% |
| openRouter-novita-bf16 | 100% | 84.65% | 99.05% | 97.5% | 100% | 70% |
| openRouter-siliconflow/fp8 | 100% | 84.24% | 99.28% | 98.68% | 100% | 80% |
| openRouter-atlas-cloud/fp8 | 100% | 84.75% | 99.10% | 96.18% | 100% | 70% |
| openRouter-fireworks | 96.32% | 81.63% | 98.87% | 96.19% | 100% | 80% |

MiniMax-M2.1 Model – Jan 2026 Data

| Metric | Query-Success-Rate | Finish-ToolCalls-Rate | ToolCalls-Trigger Similarity | ToolCalls-Accuracy | Response Success Rate - Not Only Reasoning | Language-Following-Success-Rate |
|--------|--------------------|-----------------------|------------------------------|--------------------|--------------------------------------------|---------------------------------|
| minimax-m2.1 | 100% | 83.33% | - | 96.61% | 100% | 90.00% |
| minimax-m2.1-vllm(without topk) | 99.90% | 81.84% | 98.78% | 96.42% | 100% | 60.00% |
| minimax-m2.1-vllm | 100% | 82.83% | 98.90% | 93.91% | 100% | 90% |
| minimax-m2.1-sglang | 100% | 83.03% | 99.15% | 95.01% | 100% | 90% |
| infini-ai | 100% | 80.61% | 97.46% | 100% | 100% | 100% |
| openRouter-minimax/fp8 | 100% | 83.23% | 99.03% | 96.11% | 100% | 90% |
| openRouter-minimax/lightning | 99.90% | 83.15% | 98.97% | 96.48% | 100% | 80% |
| openRouter-gmicloud/fp8 | 83.72% | 55.5% | 81.37% | 84.58% | 100% | 70% |
| OpenRouter-novita/fp8 | 99.32% | 83.07% | 99.21% | 96.03% | 100% | 90% |
| fireworks | 100% | 81.1% | 97.77% | 94.29% | 100% | 60% |
| siliconflow | 100% | 82.42% | 98.47% | 96.19% | 100% | 60% |

MiniMax-M2 Model – Dec 2025 Data

Excerpt shown — open the source for the full document.

Notability

notability 3.0/10

Routine fork of repository