Release · Groq · published Oct 10, 2025

groq/openbench v0.5.0

groq/openbench

v0.5.0

Repository: groq/openbench

Tag: v0.5.0

Published: 2025-10-10T18:18:45Z

Prerelease: no

Release notes:

0.5.0 (2025-10-10)

⚠ BREAKING CHANGES

  • added more groupings under benchmarks catalog (#244)
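This breaking change ships as 0.5.0 rather than 1.0.0. That most likely reflects the pre-v1.0 version bumping configured for release-please (#133). The sketch below is illustrative only. It assumes a rule like release-please's `bump-minor-pre-major` option, under which a breaking change below 1.0.0 bumps the minor version instead of the major one. The function `next_version` and the commit-type strings `"breaking"`, `"feat"` and `"fix"` are hypothetical names. They are not part of openbench or of release-please.

```python
from typing import Iterable, Tuple

Version = Tuple[int, int, int]


def next_version(
    current: Version,
    commit_types: Iterable[str],
    bump_minor_pre_major: bool = True,
) -> Version:
    """Hypothetical helper: choose the next semver from Conventional Commit types.

    Recognized types are "breaking", "feat" and "fix"; anything else
    (e.g. "docs") leaves the version unchanged.
    """
    major, minor, patch = current
    types = set(commit_types)
    if "breaking" in types:
        # Assumed pre-1.0 rule: a breaking change only bumps the minor version.
        if major == 0 and bump_minor_pre_major:
            return (0, minor + 1, 0)
        return (major + 1, 0, 0)
    if "feat" in types:
        # New features bump the minor version and reset the patch number.
        return (major, minor + 1, 0)
    if "fix" in types:
        # Bug fixes only bump the patch number.
        return (major, minor, patch + 1)
    return current
```

For example, starting from a hypothetical 0.4.1, `next_version((0, 4, 1), ["feat", "breaking"])` yields `(0, 5, 0)`. With `bump_minor_pre_major=False`, the same commits would yield `(1, 0, 0)`.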

Features

  • add clockbench evaluation framework and script for synthesizing public dataset (#159) (3ba9836)
  • add IFEval (#182) (8d1b939)
  • add local openbench implementation of groq provider in inspect (#131) (52aea35)
  • add mmmlu eval (#193) (a42c2d5)
  • add mmstar benchmark (#174) (5d085ab)
  • add new openbench documentation (#169) (f3e6a37)
  • add overarching bbh command to run all 18 BBH tasks (463a25f)
  • add preset eval group infrastructure (#215) (d9ea03a)
  • added more groupings under benchmarks catalog (#244) (d932cb0)
  • ArabicMMLU: add remaining 32 Arabic exam subsets, total 41 subsets (#219) (006e248)
  • benchmark: add support for arc-agi (#158) (3f32253)
  • benchmark: add support for detailbench (#154) (23fbca5)
  • benchmark: add support for TUMLU (#160) (#161) (885be75)
  • benchmark: multichallenge implementation (#170) (cf2ab4f)
  • change default model to groq/openai/gpt-oss-20b (#138) (8f7f42f)
  • components: export the run_eval entrypoint method (#157) (acbe7f4)
  • configure release-please for pre-v1.0 version bumping (#133) (c432934)
  • cybench: ported over code for cybench (#207) (7949425)
  • cybersecurity, changelog, more docs (39f123c)
  • display results patch to include task duration stats (#167) (e4e480c)
  • docs: add changelog page (#225) (7db9135)
  • docs: add release notes section and update index with new features for v0.5 (#245) (09ab78e)
  • docs: added feature card and docs page for exercism (#243) (2b38147)
  • docs: Added feature eval docs pages and cache command docs (#191) (50501f1)
  • eval: add support for json output (#14) (f335418)
  • exercism: added support for exercism tasks with agent support for aider, roo, claude, opencode (#151) (d86f0da)
  • graphwalks token filter (#115) (e38658c)
  • groq: add reasoning effort, plus bugfix to override inspect's "groq" provider (#142) (b919cc7)
  • lighteval: Add 7 core commonsense reasoning benchmarks from LightEval (#197) (7792c45)
  • lighteval: add BigBench eval (122 MCQ tasks) (9f35b1d)
  • lighteval: Add cross-lingual understanding benchmarks (XCOPA, XStoryCloze, XWinograd) (917667a)
  • lighteval: add Global-MMLU eval (42 languages)…
