What does this repo signal mean?

AI21 Labs published AI21Labs/factor (Python). Repository signals like this one surface tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo AI21Labs/factor · language Python · Solid new repo from AI21 Labs with modest traction. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

AI21 Labs Repo: AI21Labs/factor

Captured source

GitHub · github.com/AI21Labs/factor

AI21Labs/factor repository metadata

published Jul 13, 2023 · seen Jun 5 · captured Jun 11 · http 200 · method plain

AI21Labs/factor

Description: Code and data for the FACTOR paper

Language: Python

License: MIT

Stars: 53

Forks: 6

Open issues: 6

Created: 2023-07-13T12:15:34Z

Pushed: 2023-11-15T08:11:30Z

Default branch: main

Fork: no

Archived: yes

README:

FACTOR

This repo contains data from AI21 Labs' paper Generating Benchmarks for Factuality Evaluation of Language Models.

Data

We include the following FACTOR benchmarks for evaluating factuality of language models:

WIKI-FACTOR: Based on the Wikipedia section of The Pile's validation split. The dataset consists of 2994 examples.
NEWS-FACTOR: Based on Reuters articles extracted from The RefinedWeb Dataset. The dataset consists of 1036 examples.
EXPERT-FACTOR: Based on the validation and test splits of ExpertQA, a long-form question answering dataset. The benchmark consists of 236 examples.
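
As a quick sanity check after cloning, the example counts listed above can be compared against the CSV files. The following is a minimal sketch, not part of the repo: it assumes each benchmark is stored as one example per CSV row with a single header row, and the path data/expert_factor.csv is an assumed file name (only the wiki and news paths appear in this README). The helper names count_examples and check_counts are hypothetical.

```python
import csv

# Example counts as stated in the README text above.
EXPECTED_COUNTS = {
    "data/wiki_factor.csv": 2994,
    "data/news_factor.csv": 1036,
    "data/expert_factor.csv": 236,  # assumed file name, not confirmed by the README
}


def count_examples(csv_path):
    """Return the number of data rows in a CSV file, excluding the header."""
    # newline="" lets the csv module handle quoted fields that span lines.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row (assumed to exist)
        return sum(1 for _ in reader)


def check_counts(expected=EXPECTED_COUNTS):
    """Map each path to (actual, expected) for files whose row count differs."""
    mismatches = {}
    for path, expected_rows in expected.items():
        actual_rows = count_examples(path)
        if actual_rows != expected_rows:
            mismatches[path] = (actual_rows, expected_rows)
    return mismatches
```

Calling check_counts() from the repository root returns an empty dictionary when every file matches its stated count. A mismatch may simply mean the one-row-per-example assumption does not hold for that file.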

Evaluation

Setup

To install the required libraries in our repo, run:

pip install -r requirements.txt

If you need a PyTorch build that matches your CUDA version, install it before running the command above.

List of Language Models

In the paper, we report results for the following models (replace $MODEL_NAME with one of these identifiers).

GPT-2: gpt2, gpt2-medium, gpt2-large, gpt2-xl
GPT-Neo: EleutherAI/gpt-neo-1.3B, EleutherAI/gpt-neo-2.7B, EleutherAI/gpt-j-6B
OPT: facebook/opt-125m, facebook/opt-350m, facebook/opt-1.3b, facebook/opt-2.7b, facebook/opt-6.7b, facebook/opt-13b, facebook/opt-30b, facebook/opt-66b

Evaluation Script

To run evaluation on models over FACTOR datasets, please use the following command:

python eval_factuality.py \
--data_file ./data/wiki_factor.csv \
--output_folder $OUTPUT_DIR \
--model_name $MODEL_NAME
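
To evaluate several models in a row, the command above can be generated once per model. The sketch below only builds and prints the commands. It uses the three flags shown in this README and nothing else; the per-model output folder layout under results/ and the helper name build_eval_command are assumptions for illustration.

```python
import shlex

# Model identifiers copied from the README's model list (a small subset).
MODELS = ["gpt2", "EleutherAI/gpt-neo-1.3B", "facebook/opt-125m"]


def build_eval_command(model_name, data_file="./data/wiki_factor.csv", output_root="results"):
    """Build the argument list for one eval_factuality.py run."""
    # Hypothetical layout: one output folder per model, with "/" replaced so
    # that names like "facebook/opt-125m" do not create nested directories.
    output_folder = f"{output_root}/{model_name.replace('/', '__')}"
    return [
        "python", "eval_factuality.py",
        "--data_file", data_file,
        "--output_folder", output_folder,
        "--model_name", model_name,
    ]


for model in MODELS:
    # Print rather than execute; to run a command, pass the list to
    # subprocess.run(command, check=True) from the repository root.
    print(shlex.join(build_eval_command(model)))
```

Building the command as a list rather than a single string avoids shell quoting problems if a path contains spaces.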

License

wiki_factor, expert_factor and code: Released under the MIT license.
news_factor: The benchmark (data/news_factor.csv) is derived from The RefinedWeb Dataset. The public extract is made available under an ODC-By 1.0 license (FACTOR_NEWS_LICENSE.md); users should also abide by the CommonCrawl ToU: https://commoncrawl.org/terms-of-use/.

Citation

If you find our paper or code helpful, please cite our paper:

@article{muhlgay2023generating,
title={Generating benchmarks for factuality evaluation of language models},
author={Muhlgay, Dor and Ram, Ori and Magar, Inbal and Levine, Yoav and Ratner, Nir and Belinkov, Yonatan and Abend, Omri and Leyton-Brown, Kevin and Shashua, Amnon and Shoham, Yoav},
journal={arXiv preprint arXiv:2307.06908},
year={2023}
}

Notability

notability 5.0/10

Solid new repo from AI21 Labs with modest traction.