Repo: Cohere, published Oct 7, 2024

cohere-ai/cohere-finetune

Python


Description: A tool that facilitates easy, efficient and high-quality fine-tuning of Cohere's models

Language: Python

License: MIT

Stars: 82

Forks: 5

Open issues: 2

Created: 2024-10-07T20:21:46Z

Pushed: 2025-03-14T20:05:26Z

Default branch: main

Fork: no

Archived: no

README:

cohere-finetune

Cohere-finetune is a tool that facilitates easy, efficient and high-quality fine-tuning of Cohere's models on users' own data to serve their own use cases.

Currently, we support the following base models for fine-tuning:

We also support any customized base model built on one of these supported models (see [Step 4](#step-4-submit-the-request-to-start-the-fine-tuning) for more details).

Currently, we support the following fine-tuning strategies:

We will keep extending the base models and fine-tuning strategies we support, and keep adding more features, to help our users fine-tune Cohere's models more easily, more efficiently and with higher quality.

1. Prerequisites

  • You need to have access to a machine with at least one GPU, e.g., H100, H200, etc. The specific required number, memory and model of GPUs depend on your specific use case, e.g., the model to fine-tune, the batch size, the max sequence length in the data, etc.
  • You need to install necessary apps, e.g., Docker, Git, etc. on the GPU machine.
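As a quick sanity check before going further, the sketch below reports which of the tools named above cannot be found on `PATH`. It is only an illustration: the helper name `missing_tools` is hypothetical, and using `nvidia-smi` as a stand-in for "a GPU driver is installed" is an assumption, not something this README specifies.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Tools named in the prerequisites. "nvidia-smi" is an assumed proxy for a
# working NVIDIA driver; it is not mentioned in the README itself.
REQUIRED_TOOLS = ("docker", "git", "nvidia-smi")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

Finding the executables only shows that they are installed; it does not confirm that the GPUs have enough memory for your use case.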

To help you decide which hardware resources you need, the table below lists some feasible scenarios for reference. All hyperparameters that are not shown in the table are set to their default values (see [here](#step-4-submit-the-request-to-start-the-fine-tuning)).

| Hardware resources | Base model | Finetune strategy | Batch size | Max sequence length |
|:-------------------|:-----------|:------------------|:-----------|:--------------------|
| 8 * 80GB H100 GPUs | Command R, Command R 08-2024, Command R 7B 12-2024, Aya Expanse 8B, Aya Expanse 32B | LoRA or QLoRA | 8 | 16384 |
| 8 * 80GB H100 GPUs | Command R, Command R 08-2024, Command R 7B 12-2024, Aya Expanse 8B, Aya Expanse 32B | LoRA or QLoRA | 16 | 8192 |
| 8 * 80GB H100 GPUs | Command R Plus, Command R Plus 08-2024, Command A 03-2025 | LoRA or QLoRA | 8 | 8192 |
| 8 * 80GB H100 GPUs | Command R Plus, Command R Plus 08-2024, Command A 03-2025 | LoRA or QLoRA | 16 | 4096 |
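The reference scenarios can also be kept as a small lookup structure, for example to check a planned configuration against them. The sketch below simply transcribes the table; the names `REFERENCE_SCENARIOS` and `reference_max_sequence_length` are hypothetical and not part of cohere-finetune. Every entry assumes 8 * 80GB H100 GPUs, LoRA or QLoRA, and default values for all other hyperparameters.

```python
from typing import Optional

# Model groups exactly as they appear in the reference table.
SMALLER_MODELS = (
    "Command R",
    "Command R 08-2024",
    "Command R 7B 12-2024",
    "Aya Expanse 8B",
    "Aya Expanse 32B",
)
LARGER_MODELS = (
    "Command R Plus",
    "Command R Plus 08-2024",
    "Command A 03-2025",
)

# (model group, batch size) -> max sequence length, copied from the table.
REFERENCE_SCENARIOS = {
    (SMALLER_MODELS, 8): 16384,
    (SMALLER_MODELS, 16): 8192,
    (LARGER_MODELS, 8): 8192,
    (LARGER_MODELS, 16): 4096,
}


def reference_max_sequence_length(base_model: str, batch_size: int) -> Optional[int]:
    """Return the tabulated max sequence length, or None if the combination is not listed."""
    for (models, size), max_length in REFERENCE_SCENARIOS.items():
        if base_model in models and size == batch_size:
            return max_length
    return None
```

Within each model group, the product of batch size and max sequence length is the same in both rows (131072 for the first group, 65536 for the second). That is an observation about these four rows only, not a guarantee that other combinations with the same product will fit.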

2. Setup

Run the commands below on the GPU machine.

git clone git@github.com:cohere-ai/cohere-finetune.git
cd cohere-finetune

3. Fine-tuning

Throughout this section and the sections below, we use the notation `<...>` (text between angle brackets) to denote content that you must change according to your own use case, e.g., names, paths to files or directories, etc. Any name or path that is not between angle brackets must be used as it is, unless otherwise stated.
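If you script these commands, the angle-bracket convention can be handled mechanically. The sketch below is one possible way to do that, not something cohere-finetune provides: the helper `fill_placeholders` and the placeholder name used in the example are hypothetical. It replaces every angle-bracket placeholder in a template and fails loudly if a value is missing, so a command is never run with a placeholder left in it.

```python
import re
from typing import Mapping

# Matches placeholders such as <some_name>: an identifier between angle brackets.
PLACEHOLDER = re.compile(r"<([A-Za-z_][A-Za-z0-9_]*)>")


def fill_placeholders(template: str, values: Mapping[str, str]) -> str:
    """Replace each <name> in the template with values[name].

    Raises KeyError if the template contains a placeholder with no value.
    """

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder <{name}>")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical placeholder name and host path, for illustration only.
print(fill_placeholders("-v <root_dir>:/opt/finetuning", {"root_dir": "/data/finetunes"}))
# prints: -v /data/finetunes:/opt/finetuning
```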

You can fine-tune a base model on your own data by following the steps below on the GPU machine (the host).

Step 1. Build the Docker image

Run the command below to build the Docker image. The first build on a host may take about 18 minutes.

DOCKER_BUILDKIT=1 docker build --rm \
--ssh default \
--target peft-prod \
-t <image_name> \
-f docker/Dockerfile \
.

Alternatively, you may use the image we built for you: skip this step and use our image name `ghcr.io/cohere-ai/cohere-finetune:latest` as the image name (`<image_name>`) in the next step. This image could be outdated, however; the most up-to-date version is always on the `main` branch.
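For readers who drive the build from Python rather than a shell, the sketch below assembles the same `docker build` invocation as an argument list. The function names are hypothetical and the image tag is whatever you choose; only the flags shown in the command above are used. It assumes you run it from the root of the cloned repository, so that `docker/Dockerfile` and the build context `.` resolve correctly.

```python
import os
import subprocess
from typing import List


def docker_build_argv(
    image_name: str,
    dockerfile: str = "docker/Dockerfile",
    target: str = "peft-prod",
    context: str = ".",
) -> List[str]:
    """Return the argument list for the docker build command shown above."""
    return [
        "docker", "build", "--rm",
        "--ssh", "default",
        "--target", target,
        "-t", image_name,
        "-f", dockerfile,
        context,
    ]


def build_image(image_name: str) -> None:
    """Run the build with BuildKit enabled; raises CalledProcessError on failure."""
    env = dict(os.environ, DOCKER_BUILDKIT="1")
    subprocess.run(docker_build_argv(image_name), env=env, check=True)
```

Keeping the argument list separate from the call that executes it makes the command easy to log or inspect before anything is run.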

Step 2. Run the Docker container to start the fine-tuning service

Run the command below to start the fine-tuning service.

docker run -it --rm \
--name <container_name> \
--gpus <gpus> \
--ipc=host \
--net=host \
-v ~/.cache:/root/.cache \
-v <finetune_root_dir>:/opt/finetuning \
-e PATH_PREFIX=/opt/finetuning/<finetune_sub_dir> \
-e ENVIRONMENT=DEV \
-e TASK=FINETUNE \
-e HF_TOKEN=<hf_token> \
-e WANDB_API_KEY=<wandb_api_key> \
<image_name>

Some parameters are explained below:

  • `<gpus>` (the value passed to `--gpus`) specifies the GPUs the service can access, which can be, e.g., `'"device=0,1,2,3"'` (for GPUs 0, 1, 2, 3) or `all` (for all GPUs).
  • By default, HuggingFace caches all downloaded models in `~/.cache/huggingface/hub` and tries to fetch a cached model from there when you load that model again. It is therefore highly recommended to mount `~/.cache` on your host to `/root/.cache` in the container, so that the container has access to the models cached on your host and avoids the time-consuming download.
  • `<finetune_root_dir>` (the host side of the second `-v` mount) is the root directory on your host that stores all your fine-tunings, and `/opt/finetuning` is the corresponding fine-tuning root directory in your container (it can also be changed, but you do not have to).
  • `PATH_PREFIX` is an environment variable that specifies the fine-tuning sub-directory in your container, where `<finetune_sub_dir>` (the part after `/opt/finetuning/`) can be an empty string, i.e., the fine-tuning sub-directory can be equal to the fine-tuning root directory.
  • `ENVIRONMENT` is an environment variable that specifies the mode of your working environment, which is mainly used to determine the level of logging. If you explicitly set it to `DEV`, more debugging information is printed; if you do not set it, or set it to any other value, this debugging information is not printed.
  • `HF_TOKEN` is an environment variable that specifies your HuggingFace User Access Token.
  • WANDB_API_KEY is an environment…
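The parameters explained above can be put together programmatically. The sketch below builds the `docker run` argument list from them; the function and parameter names are hypothetical, and placing the image name as the last argument follows the general Docker CLI form rather than anything stated in this excerpt. Note that when the command is passed as an argument list (no shell involved), the GPU device list must still carry its literal double quotes, which is what the outer single quotes in `'"device=0,1,2,3"'` preserve on a shell command line.

```python
import os
from typing import List, Optional, Sequence


def gpus_argument(device_ids: Optional[Sequence[int]] = None) -> str:
    """Return the value for --gpus: 'all', or a quoted device list."""
    if device_ids is None:
        return "all"
    # The double quotes are part of the value that Docker receives.
    return '"device=' + ",".join(str(i) for i in device_ids) + '"'


def docker_run_argv(
    image_name: str,
    container_name: str,
    finetune_root_dir: str,
    hf_token: str,
    wandb_api_key: str,
    finetune_sub_dir: str = "",
    device_ids: Optional[Sequence[int]] = None,
    host_cache_dir: Optional[str] = None,
) -> List[str]:
    """Return the argument list for the docker run command shown above."""
    # Mount the host cache so models already downloaded are reused.
    cache_dir = host_cache_dir or os.path.expanduser("~/.cache")
    return [
        "docker", "run", "-it", "--rm",
        "--name", container_name,
        "--gpus", gpus_argument(device_ids),
        "--ipc=host",
        "--net=host",
        "-v", f"{cache_dir}:/root/.cache",
        "-v", f"{finetune_root_dir}:/opt/finetuning",
        "-e", f"PATH_PREFIX=/opt/finetuning/{finetune_sub_dir}",
        "-e", "ENVIRONMENT=DEV",
        "-e", "TASK=FINETUNE",
        "-e", f"HF_TOKEN={hf_token}",
        "-e", f"WANDB_API_KEY={wandb_api_key}",
        image_name,  # assumed to be the final argument
    ]
```

The resulting list can be handed to `subprocess.run(argv, check=True)`. Because the tokens end up on the command line, avoid logging the list verbatim.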
