AI21Labs/factor
Description: Code and data for the FACTOR paper
Language: Python
License: MIT
Stars: 53
Forks: 6
Open issues: 6
Created: 2023-07-13T12:15:34Z
Pushed: 2023-11-15T08:11:30Z
Default branch: main
Fork: no
Archived: yes
README:
FACTOR
This repo contains data from AI21 Labs' paper Generating Benchmarks for Factuality Evaluation of Language Models.
Data
We include the following FACTOR benchmarks for evaluating factuality of language models:
- WIKI-FACTOR: Based on the Wikipedia section of The Pile's validation split. The dataset consists of 2994 examples.
- NEWS-FACTOR: Based on Reuters articles extracted from The RefinedWeb Dataset. The dataset consists of 1036 examples.
- EXPERT-FACTOR: Based on the validation and test splits of ExpertQA, a long-form question answering dataset. The benchmark consists of 236 examples.
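As a quick sanity check after cloning, the example counts listed above can be compared against the CSV files. The following is a minimal sketch, assuming each CSV has a header row followed by one example per row. The `expert_factor.csv` file name is inferred by analogy with `./data/wiki_factor.csv` and `data/news_factor.csv`, and `count_examples` is a hypothetical helper, not part of this repo.

```python
import csv
from pathlib import Path

# Example counts as stated in this README.
EXPECTED_COUNTS = {
    "wiki_factor.csv": 2994,
    "news_factor.csv": 1036,
    "expert_factor.csv": 236,  # file name assumed by analogy with the other two
}

# Allow long text fields; the default csv limit is 131072 characters per field.
csv.field_size_limit(10**9)


def count_examples(csv_path):
    """Count data rows in a CSV file, assuming one header row and one example per row."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row (assumed present)
        return sum(1 for _ in reader)


def main():
    data_dir = Path("./data")
    for name, expected in EXPECTED_COUNTS.items():
        path = data_dir / name
        if not path.exists():
            print(f"{name}: not found at {path}")
            continue
        found = count_examples(path)
        status = "ok" if found == expected else "mismatch"
        print(f"{name}: {found} rows, README says {expected} ({status})")


if __name__ == "__main__":
    main()
```

A mismatch does not necessarily mean the data is corrupted; it may simply mean the one-row-per-example assumption does not hold for that file.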
Evaluation
Setup
To install the required libraries in our repo, run:
pip install -r requirements.txt
If you need a PyTorch build that matches your CUDA version, install it before running the above command.
List of Language Models
In the paper, we give the results for the following models (replace $MODEL_NAME with one of those).
- GPT-2: gpt2, gpt2-medium, gpt2-large, gpt2-xl
- GPT-Neo: EleutherAI/gpt-neo-1.3B, EleutherAI/gpt-neo-2.7B, EleutherAI/gpt-j-6B
- OPT: facebook/opt-125m, facebook/opt-350m, facebook/opt-1.3b, facebook/opt-2.7b, facebook/opt-6.7b, facebook/opt-13b, facebook/opt-30b, facebook/opt-66b
Evaluation Script
To evaluate a model on a FACTOR dataset, use the following command:
python eval_factuality.py \
    --data_file ./data/wiki_factor.csv \
    --output_folder $OUTPUT_DIR \
    --model_name $MODEL_NAME
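To evaluate several models in one go, the command above can be wrapped in a small driver. This is a sketch only: it assumes the script is run from the repository root, that `eval_factuality.py` takes the three flags shown above, and that writing to one output subfolder per model is acceptable. `build_eval_command`, `output_folder_for`, and the `./results` folder are hypothetical names, not part of this repo.

```python
import subprocess
import sys
from pathlib import Path

# A few of the model names listed in this README; extend as needed.
MODEL_NAMES = ["gpt2", "gpt2-medium", "facebook/opt-125m"]


def build_eval_command(data_file, output_folder, model_name):
    """Build the argument list for one eval_factuality.py run (flags as in the README)."""
    return [
        sys.executable, "eval_factuality.py",
        "--data_file", str(data_file),
        "--output_folder", str(output_folder),
        "--model_name", model_name,
    ]


def output_folder_for(base_dir, model_name):
    """Give each model its own folder; '/' in hub names is replaced to keep paths flat."""
    return Path(base_dir) / model_name.replace("/", "__")


def main():
    if not Path("eval_factuality.py").exists():
        print("eval_factuality.py not found; run this from the repository root.")
        return
    for model_name in MODEL_NAMES:
        out_dir = output_folder_for("./results", model_name)
        out_dir.mkdir(parents=True, exist_ok=True)
        cmd = build_eval_command("./data/wiki_factor.csv", out_dir, model_name)
        # check=True stops the loop at the first failing run.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

Passing the arguments as a list avoids shell quoting issues, and using `sys.executable` keeps the runs in the same Python environment where the requirements were installed.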
License
- wiki_factor, expert_factor and code: Released under the MIT license.
- news_factor: The [benchmark](data/news_factor.csv) is derived from The RefinedWeb Dataset. The public extract is made available under an ODC-By 1.0 [license](FACTOR_NEWS_LICENSE.md); users should also abide by the CommonCrawl ToU: https://commoncrawl.org/terms-of-use/.
Citation
If you find our paper or code helpful, please cite our paper:
@article{muhlgay2023generating,
title={Generating benchmarks for factuality evaluation of language models},
author={Muhlgay, Dor and Ram, Ori and Magar, Inbal and Levine, Yoav and Ratner, Nir and Belinkov, Yonatan and Abend, Omri and Leyton-Brown, Kevin and Shashua, Amnon and Shoham, Yoav},
journal={arXiv preprint arXiv:2307.06908},
year={2023}
}