Model · Cerebras · published Aug 13, 2024

cerebras/Llama3-DocChat-1.0-8B

task: text-generation · license: other · downloads: 12 · likes: 69

Model Information

We are excited to announce the release of Cerebras DocChat, our first iteration of models designed for document-based conversational question answering. This series includes two models: Cerebras Llama3-DocChat, a large language model (LLM), and Cerebras Dragon-DocChat, a multi-turn retriever model.

This model – Cerebras Llama3-DocChat 1.0 8B – was built on top of Llama 3 base using insights from the latest research on document-based Q&A, most notably Nvidia’s ChatQA model series. As part of this work, we leveraged our experience in LLM training and dataset curation to overcome the gaps in ChatQA's released datasets and training recipes. Additionally, we employed synthetic data generation to address limitations that couldn't be fully resolved with the available real data. Using a single Cerebras System, Llama3-DocChat 8B was trained in a few hours.

You can find more information about DocChat at the following locations:

Results

| ChatRAG Benchmark | Llama3 Instruct 8B | Command-R-Plus | Nvidia Llama3-ChatQA 1.5 8B | GPT-4-Turbo-2024-04-09 | Cerebras Llama3-DocChat 1.0 8B |
| --- | --- | --- | --- | --- | --- |
| Doc2Dial | 31.33 | 33.51 | 39.33 | 35.35 | 39.19 |
| QuAC | 32.64 | 34.16 | 39.73 | 40.1 | 36 |
| QReCC | 43.4 | 49.77 | 49.03 | 51.46 | 50.27 |
| CoQA | 73.25 | 69.71 | 76.46 | 77.73 | 79.56 |
| DoQA | 30.34 | 40.67 | 49.6 | 41.6 | 48.77 |
| ConvFinQA | 53.15 | 71.21 | 78.46 | 84.16 | 80.13 |
| SQA | 36.6 | 74.07 | 73.28 | 79.98 | 74.19 |
| TopioCQA | 34.64 | 53.77 | 49.96 | 48.32 | 52.13 |
| HybriDial\* | 40.77 | 46.7 | 65.76 | 47.86 | 64 |
| INSCIT | 32.09 | 35.76 | 30.1 | 33.75 | 32.88 |
| Average (all) | 40.82 | 50.93 | 55.17 | 54.03 | 55.71 |
| Average (exclude HybriDial) | 40.83 | 51.4 | 53.99 | 54.72 | 54.79 |

| Eleuther Eval Harness Benchmark | Llama3 Instruct 8B | Nvidia Llama3-ChatQA 1.5 8B | Cerebras Llama3-DocChat 1.0 8B |
| --- | --- | --- | --- |
| hellaswag | 57.68 | 61.37 | 61.68 |
| winogrande | 71.98 | 73.95 | 74.11 |
| truthfulqa_mc1 | 36.23 | 28.52 | 29.25 |
| truthfulqa_mc2 | 51.65 | 43.56 | 45.14 |
| mmlu | 63.84 | 60.68 | 62.86 |
| gsm8k | 76.12 | 13.72 | 55.57 |
| arc_easy | 81.61 | 80.56 | 82.03 |
| arc_challenge | 52.99 | 51.02 | 53.92 |
| Average | 61.51 | 51.67 | 58.07 |
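The average rows appear to be plain unweighted means of the per-task scores. The sketch below recomputes them for the Cerebras Llama3-DocChat 1.0 8B column only, using numbers copied from the two tables. The `mean` helper and the variable names are illustrative and not part of any evaluation harness.

```python
# Scores copied from the "Cerebras Llama3-DocChat 1.0 8B" columns above.
chatrag = {
    "Doc2Dial": 39.19, "QuAC": 36.0, "QReCC": 50.27, "CoQA": 79.56,
    "DoQA": 48.77, "ConvFinQA": 80.13, "SQA": 74.19, "TopioCQA": 52.13,
    "HybriDial": 64.0, "INSCIT": 32.88,
}
harness = {
    "hellaswag": 61.68, "winogrande": 74.11, "truthfulqa_mc1": 29.25,
    "truthfulqa_mc2": 45.14, "mmlu": 62.86, "gsm8k": 55.57,
    "arc_easy": 82.03, "arc_challenge": 53.92,
}


def mean(scores, exclude=()):
    """Unweighted mean of the scores, skipping any task named in `exclude`."""
    kept = [value for task, value in scores.items() if task not in exclude]
    return round(sum(kept) / len(kept), 2)


avg_all = mean(chatrag)
avg_no_hybridial = mean(chatrag, exclude=("HybriDial",))
avg_harness = mean(harness)
print(avg_all, avg_no_hybridial, avg_harness)
```

The three values agree with the "Average (all)", "Average (exclude HybriDial)", and "Average" rows for that column.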

Prompt Format

DocChat supports the standard Llama3 Instruct chat template – no fancy formatting functions required! When providing a context document to the model, simply prepend the user turn with <context> {put your document here} </context>. You may also provide an “instruction” before the user input to better align the model’s response with the desired behavior. Examples include:

  • Please give a full and complete answer for the question.
  • Answer the following question with a short span.

We use the same system prompt as ChatQA: This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context.
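The prompt layout described above can be sketched as a small helper that assembles the message list without loading the model. `build_messages` is a hypothetical name, and the `<context>` / `</context>` wrapper used as the default is an assumption about how the document should be delimited; adjust the tags if the original card specifies something different.

```python
SYSTEM_PROMPT = (
    "This is a chat between a user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's "
    "questions based on the context. The assistant should also indicate when "
    "the answer cannot be found in the context."
)


def build_messages(document, question, instruction="",
                   open_tag="<context>", close_tag="</context>"):
    """Build a system + user message list for one document-grounded question.

    Hypothetical helper: the tag defaults are an assumption, not a library API.
    """
    # The document comes first, then the optional instruction, then the question.
    prompt = f"{instruction} {question}".strip()
    user_turn = f"{open_tag}\n{document.strip()}\n{close_tag}\n{prompt}"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_turn},
    ]


messages = build_messages(
    document="CG1 has 64 CS-2 systems.",
    question="How many systems does CG1 have?",
    instruction="Answer the following question with a short span.",
)
print(messages[1]["content"])
```

The returned list has the shape that `tokenizer.apply_chat_template` expects, so it can be passed to that call unchanged.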

Example Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "cerebras/Llama3-DocChat-1.0-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

system = "This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context."
instruction = "Please give a full and complete answer for the question."

document = """
# Cerebras Wafer-Scale Cluster

Exa-scale performance, single device simplicity

## AI Supercomputers

Condor Galaxy (CG), the supercomputer built by G42 and Cerebras, is the simplest and fastest way to build AI models in the cloud. With over 16 ExaFLOPs of AI compute, Condor Galaxy trains the most demanding models in hours rather than days. The terabyte scale MemoryX system natively accommodates 100 billion+ parameter models, making large scale training simple and efficient.

| Cluster | ExaFLOPs | Systems | Memory |
| -------- | -------- | -------- | ------ |
| CG1 | 4 | 64 CS-2s | 82 TB |
| CG2 | 4 | 64 CS-2s | 82 TB |
| CG3 | 8 | 64 CS-3s | 108 TB |
"""

question = "How many total CS systems does Condor Galaxy 1, 2, and 3 have combined, and how many flops does this correspond to?"

user_turn = f"""<context>
{document}
</context>
{instruction} {question}"""

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": user_turn}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

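# Stop on the tokenizer's EOS token or on Llama 3's end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
)
# Decode only the newly generated tokens, skipping the prompt.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))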
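The card does not show the model's reply to this question. As a rough reference for judging one, the totals can be computed directly from the table in the example document. The sketch below assumes a simple pipe-delimited table; `parse_rows` is an illustrative helper, not part of the model's tooling.

```python
table = """
| Cluster | ExaFLOPs | Systems | Memory |
| -------- | -------- | -------- | ------ |
| CG1 | 4 | 64 CS-2s | 82 TB |
| CG2 | 4 | 64 CS-2s | 82 TB |
| CG3 | 8 | 64 CS-3s | 108 TB |
"""


def parse_rows(markdown_table):
    """Parse a simple pipe table into a list of dicts keyed by header name."""
    lines = [line.strip() for line in markdown_table.strip().splitlines()]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:  # skip the header row and the separator row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


rows = parse_rows(table)
# "64 CS-2s" -> 64: the leading integer is the system count.
total_systems = sum(int(row["Systems"].split()[0]) for row in rows)
total_exaflops = sum(int(row["ExaFLOPs"]) for row in rows)
print(total_systems, total_exaflops)
```

A correct answer should therefore report 192 CS systems and 16 ExaFLOPs in total, which is consistent with the "over 16 ExaFLOPs" figure in the document text.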

License

This model was trained from Llama 3 8B base and is therefore subject to the META LLAMA 3 COMMUNITY LICENSE AGREEMENT. Furthermore, it is trained on ChatQA's synthetic conversational QA dataset, which was generated using GPT-4. As a result, this model can be used for non-commercial purposes only and is subject to the Terms of Use of the data generated by OpenAI. Additionally, please see the licensing information of the individual datasets.

Acknowledgements

DocChat was built on top of a large body of ML work, spanning training datasets, recipes, and evaluation. We want to thank each of these resources.

@inproceedings{dua2019drop,
title={DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs},
author={Dua, Dheeru and Wang, Yizhong and Dasigi, Pradeep and…

