groq/openbench v0.5.0
v0.5.0
Repository: groq/openbench
Tag: v0.5.0
Published: 2025-10-10T18:18:45Z
Prerelease: no
Release notes:
0.5.0 (2025-10-10)
⚠ BREAKING CHANGES
- added more groupings under the benchmarks catalog (#244)
Features
- add clockbench evaluation framework and a script for synthesizing the public dataset (#159) (3ba9836)
- add IFEval (#182) (8d1b939)
- add local openbench implementation of groq provider in inspect (#131) (52aea35)
- add mmmlu eval (#193) (a42c2d5)
- add mmstar benchmark (#174) (5d085ab)
- add new openbench documentation (#169) (f3e6a37)
- add overarching bbh command to run all 18 BBH tasks (463a25f)
- add preset eval group infrastructure (#215) (d9ea03a)
- added more groupings under the benchmarks catalog (#244) (d932cb0)
- ArabicMMLU: add remaining 32 Arabic exam subsets, total 41 subsets (#219) (006e248)
- benchmark: add support for arc-agi (#158) (3f32253)
- benchmark: add support for detailbench (#154) (23fbca5)
- benchmark: add support for TUMLU (#160) (#161) (885be75)
- benchmark: multichallenge implementation (#170) (cf2ab4f)
- change default model to groq/openai/gpt-oss-20b (#138) (8f7f42f)
- components: export the run_eval entrypoint method (#157) (acbe7f4)
- configure release-please for pre-v1.0 version bumping (#133) (c432934)
- cybench: ported over code for cybench (#207) (7949425)
- cybersecurity, changelog, more docs (39f123c)
- display results patch to include task duration stats (#167) (e4e480c)
- docs: add changelog page (#225) (7db9135)
- docs: add release notes section and update index with new features for v0.5 (#245) (09ab78e)
- docs: added feature card and docs page for exercism (#243) (2b38147)
- docs: Added feature eval docs pages and cache command docs (#191) (50501f1)
- eval: add support for json output (#14) (f335418)
- exercism: added support for Exercism tasks with agent support for aider, roo, claude, and opencode (#151) (d86f0da)
- graphwalks token filter (#115) (e38658c)
- groq reasoning effort support, plus a bugfix to override Inspect's "groq" provider (#142) (b919cc7)
- lighteval: Add 7 core commonsense reasoning benchmarks from LightEval (#197) (7792c45)
- lighteval: add BigBench eval (122 MCQ tasks) (9f35b1d)
- lighteval: Add cross-lingual understanding benchmarks (XCOPA, XStoryCloze, XWinograd) (917667a)
- lighteval: add Global-MMLU eval (42 languages)…