Release: Groq, published Dec 9, 2025

groq/openbench v0.5.3

v0.5.3

Repository: groq/openbench

Tag: v0.5.3

Published: 2025-12-09T00:50:23Z

Prerelease: no

Release notes:

0.5.3 (2025-12-08)

Features

  • add --max-tasks option for concurrent task execution in eval command (#279) (241e653)
  • add bbq benchmark (#255) (46f4744)
  • add ChartQAPro (#289) (677f7c7)
  • add configurable HuggingFace Hub config naming (#261) (8abe2ae)
  • add DocVQA benchmark (#297) (0dd0edf)
  • add fuzzy match suggestion for misspelled evals (#303) (625a7b3)
  • add ifbench benchmark (#326) (bd730c2)
  • add math EvalGroup (#263) (e0f4a9b)
  • add MathVista benchmark (#298) (5c50a8f)
  • add MMLU-Redux benchmark from lighteval (#321) (d22a587)
  • add MMVet V2 benchmark (#296) (66689de)
  • add OCRBench V2 benchmark (#295) (71f3589)
  • add optional extras for simpleqa and toxicity (#266) (2450ddf)
  • add sealqa benchmark (#283) (06b39e4)
  • add SMT 2024 benchmarks (#239) (5d9b475)
  • add tau bench, pass^k metric (#294) (2bb1242)
  • agentdojo: port agentdojo benchmark (#223) (1cf174c)
  • cli: added export command to export specific logs to hf (#265) (62e8d8c)
  • cvebench: added auto prepare env set up for cvebench (#259) (db238a3)
  • deepresearch-bench: add deepresearch bench (#288) (d2b4622)
  • docs: docs for unsupported providers (#312) (3a3d4b8)
  • docs: search capability benchmarks feature page (#287) (9dd27c1)
  • evals: add GSM8K benchmark with shared grade school math scorer (#322) (4559a67)
  • evals: add QA benchmarks and shared scorer (#323) (0ea3733)
  • factscore: added support for factscore (#258) (13aafd7)
  • gpt_oss: add GPT-OSS AIME benchmark, make --epochs optional and stop default 1 from being forced down (#284) (815f51b)
  • groq: implement configurable timeout for GroqAPI client (#271) (be492b6)
  • groq: streaming support (#313) (c1a20be)
  • m2s: added support for single turn conversion of 3 multi turn jailbreak datasets (mhj, safeMT, cosafe) (#222) (6b8f2b1)
  • PolygloToxicityPrompts: add multilingual toxicity evaluation (#262) (46de7ee)
  • provider: add helicone support (#275) (de6ab04)
  • provider: add SiliconFlow provider support (#269) (ce14070)
  • providers: add W&B…
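The fuzzy match suggestion for misspelled evals (#303) can be illustrated with a minimal sketch. It assumes a plain list of eval names and uses Python's standard `difflib`. The names in `KNOWN_EVALS`, the `suggest_evals` helper, and the cutoff value are hypothetical and are not taken from openbench's code or registry.

```python
from difflib import get_close_matches

# Hypothetical eval identifiers, loosely based on benchmarks named in these notes.
KNOWN_EVALS = ["bbq", "docvqa", "gsm8k", "ifbench", "mathvista", "sealqa"]


def suggest_evals(name: str, known: list[str], limit: int = 3) -> list[str]:
    """Return up to `limit` known eval names that closely resemble `name`."""
    # cutoff=0.6 is difflib's default similarity threshold (an assumption here,
    # not a value confirmed for openbench).
    return get_close_matches(name.lower(), known, n=limit, cutoff=0.6)


if __name__ == "__main__":
    requested = "mathvsta"  # misspelled on purpose
    matches = suggest_evals(requested, KNOWN_EVALS)
    if matches:
        print(f"Unknown eval '{requested}'. Did you mean: {', '.join(matches)}?")
    else:
        print(f"Unknown eval '{requested}'.")
```

An empty result means nothing was similar enough, in which case a CLI would typically fall back to a plain "unknown eval" error.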

Excerpt shown — open the source for the full document.
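For readers unfamiliar with the pass^k metric mentioned alongside tau bench (#294), here is a small sketch of how it is commonly defined: the probability that k trials drawn without replacement from n recorded trials of a task all succeed, averaged over tasks. The function names are hypothetical, and these notes do not confirm that openbench computes the metric this way.

```python
from math import comb


def pass_hat_k(num_trials: int, num_successes: int, k: int) -> float:
    """Estimate pass^k for one task: C(c, k) / C(n, k).

    n = num_trials, c = num_successes. The value is the chance that k trials
    sampled without replacement from the n recorded trials all succeeded.
    """
    if not 1 <= k <= num_trials:
        raise ValueError("k must satisfy 1 <= k <= num_trials")
    if not 0 <= num_successes <= num_trials:
        raise ValueError("num_successes must be between 0 and num_trials")
    # math.comb returns 0 when k > num_successes, so the estimate is 0.0 there.
    return comb(num_successes, k) / comb(num_trials, k)


def mean_pass_hat_k(per_task: list[tuple[int, int]], k: int) -> float:
    """Average pass^k over tasks given (num_trials, num_successes) pairs."""
    if not per_task:
        raise ValueError("per_task must not be empty")
    return sum(pass_hat_k(n, c, k) for n, c in per_task) / len(per_task)


if __name__ == "__main__":
    # Two tasks with 4 trials each: one always succeeds, one succeeds twice.
    print(mean_pass_hat_k([(4, 4), (4, 2)], k=2))
```

Unlike pass@k, which rewards at least one success in k attempts, this quantity rewards consistency, so it can only stay flat or fall as k grows.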
