cohere-ai/cohere-finetune
Description: A tool that facilitates easy, efficient and high-quality fine-tuning of Cohere's models
Language: Python
License: MIT
Stars: 82
Forks: 5
Open issues: 2
Created: 2024-10-07T20:21:46Z
Pushed: 2025-03-14T20:05:26Z
Default branch: main
Fork: no
Archived: no
README:
cohere-finetune
Cohere-finetune is a tool that facilitates easy, efficient and high-quality fine-tuning of Cohere's models on users' own data to serve their own use cases.
Currently, we support the following base models for fine-tuning:
- Cohere's Command R in HuggingFace
- Cohere's Command R 08-2024 in HuggingFace
- Cohere's Command R Plus in HuggingFace
- Cohere's Command R Plus 08-2024 in HuggingFace
- Cohere's Command R 7B 12-2024 in HuggingFace
- Cohere's Command A 03-2025 in HuggingFace
- Cohere's Aya Expanse 8B in HuggingFace
- Cohere's Aya Expanse 32B in HuggingFace
We also support any customized base model built on one of these supported models (see [Step 4](#step-4-submit-the-request-to-start-the-fine-tuning) for more details).
Currently, we support the following fine-tuning strategies:
- LoRA
- QLoRA
We will keep extending the base models and fine-tuning strategies we support, and keep adding more features, to help our users fine-tune Cohere's models more easily, more efficiently and with higher quality.
1. Prerequisites
- You need to have access to a machine with at least one GPU, e.g., H100, H200, etc. The specific required number, memory and model of GPUs depend on your specific use case, e.g., the model to fine-tune, the batch size, the max sequence length in the data, etc.
- You need to install necessary apps, e.g., Docker, Git, etc. on the GPU machine.
To help you better decide the hardware resources you need, we list some feasible scenarios in the following table as a reference, where all the other hyperparameters that are not shown in the table are set as their default values (see [here](#step-4-submit-the-request-to-start-the-fine-tuning)).
| Hardware resources | Base model | Finetune strategy | Batch size | Max sequence length |
|:-------------------|:------------------------------------------------------------------------------------|:------------------|:-----------|:--------------------|
| 8 * 80GB H100 GPUs | Command R, Command R 08-2024, Command R 7B 12-2024, Aya Expanse 8B, Aya Expanse 32B | LoRA or QLoRA | 8 | 16384 |
| 8 * 80GB H100 GPUs | Command R, Command R 08-2024, Command R 7B 12-2024, Aya Expanse 8B, Aya Expanse 32B | LoRA or QLoRA | 16 | 8192 |
| 8 * 80GB H100 GPUs | Command R Plus, Command R Plus 08-2024, Command A 03-2025 | LoRA or QLoRA | 8 | 8192 |
| 8 * 80GB H100 GPUs | Command R Plus, Command R Plus 08-2024, Command A 03-2025 | LoRA or QLoRA | 16 | 4096 |
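One pattern in the table is that, within each group of base models, halving the max sequence length goes together with doubling the batch size, so the product of the two stays the same. The sketch below only reproduces that arithmetic from the four table rows. It is an observation about these reference scenarios, not a sizing rule stated by the project, and the names `SCENARIOS` and `tokens_per_batch` are illustrative.

```python
# Reference scenarios copied from the table above.
# All rows use 8 * 80GB H100 GPUs with LoRA or QLoRA.
# Each entry is (model group, batch size, max sequence length).
SMALLER_MODELS = "Command R, Command R 08-2024, Command R 7B 12-2024, Aya Expanse 8B, Aya Expanse 32B"
LARGER_MODELS = "Command R Plus, Command R Plus 08-2024, Command A 03-2025"

SCENARIOS = [
    (SMALLER_MODELS, 8, 16384),
    (SMALLER_MODELS, 16, 8192),
    (LARGER_MODELS, 8, 8192),
    (LARGER_MODELS, 16, 4096),
]


def tokens_per_batch(batch_size: int, max_sequence_length: int) -> int:
    """Batch size times max sequence length, i.e. tokens per batch if every sequence is full length."""
    return batch_size * max_sequence_length


# Collect the distinct products seen for each model group.
budgets = {}
for group, batch_size, max_len in SCENARIOS:
    budgets.setdefault(group, set()).add(tokens_per_batch(batch_size, max_len))

for group, values in budgets.items():
    print(f"{group}: {sorted(values)}")
```

If you need a longer max sequence length than the table shows, a cautious starting point is to reduce the batch size proportionally and then verify memory usage on your own hardware.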
2. Setup
Run the commands below on the GPU machine.
    git clone git@github.com:cohere-ai/cohere-finetune.git
    cd cohere-finetune
3. Fine-tuning
Throughout this section and the sections below, we use the notation `<...>` to denote some content that you must change according to your own use case, e.g., names, paths to files or directories, etc. Meanwhile, any name or path that is not between angle brackets must be used as it is, unless otherwise stated.
You can fine-tune a base model on your own data by following the steps below on the GPU machine (the host).
Step 1. Build the Docker image
Run the command below to build the Docker image. The build may take about 18 minutes the first time you run it on the host.
    DOCKER_BUILDKIT=1 docker build --rm \
        --ssh default \
        --target peft-prod \
        -t <image_name> \
        -f docker/Dockerfile \
        .

Alternatively, you may directly use the image we built for you: skip this step and use our image name `ghcr.io/cohere-ai/cohere-finetune:latest` as `<image_name>` in the next step. Note that this image could be outdated (the most up-to-date version is always on the `main` branch).
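If you prefer to drive the build from a script, the same command can be assembled as an argument list in Python. This is a minimal sketch that mirrors the flags shown above; the function names and the example image name are hypothetical and not part of the repository.

```python
import os
import shlex


def docker_build_command(image_name: str) -> list[str]:
    """Argument list mirroring the build command above; image_name is your own tag."""
    return [
        "docker", "build", "--rm",
        "--ssh", "default",
        "--target", "peft-prod",
        "-t", image_name,
        "-f", "docker/Dockerfile",
        ".",
    ]


def docker_build_env() -> dict[str, str]:
    """Copy of the current environment with BuildKit enabled, as in the shell command."""
    return {**os.environ, "DOCKER_BUILDKIT": "1"}


# "my-cohere-finetune" is a placeholder tag chosen for illustration.
command = docker_build_command("my-cohere-finetune")
print(shlex.join(command))

# To actually run the build from the repository root:
#   import subprocess
#   subprocess.run(command, env=docker_build_env(), check=True)
```

Passing the command as a list avoids shell quoting issues, and `check=True` makes a failed build raise an exception instead of passing silently.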
Step 2. Run the Docker container to start the fine-tuning service
Run the command below to start the fine-tuning service.
    docker run -it --rm \
        --name <container_name> \
        --gpus <gpus> \
        --ipc=host \
        --net=host \
        -v ~/.cache:/root/.cache \
        -v <finetune_root_dir>:/opt/finetuning \
        -e PATH_PREFIX=/opt/finetuning/<finetune_sub_dir> \
        -e ENVIRONMENT=DEV \
        -e TASK=FINETUNE \
        -e HF_TOKEN=<hf_token> \
        -e WANDB_API_KEY=<wandb_api_key> \
        <image_name>

Some parameters are explained below:
- `<gpus>` specifies the GPUs the service can access, which can be, e.g., `'"device=0,1,2,3"'` (for GPUs 0, 1, 2, 3) or `all` (for all GPUs).
- By default, HuggingFace will cache all downloaded models in `~/.cache/huggingface/hub` and try to fetch the cached model from there when you want to load a model again. Therefore, it is highly recommended to mount `~/.cache` on your host to `/root/.cache` in the container, so that the container has access to the models cached on your host and avoids the time-consuming model download.
- `<finetune_root_dir>` is the root directory on your host to store all your fine-tunings, and `/opt/finetuning` is the corresponding fine-tuning root directory in your container (it can also be changed but you do not have to).
- `PATH_PREFIX` is an environment variable that specifies the fine-tuning sub-directory in your container, where `<finetune_sub_dir>` can be an empty string, i.e., the fine-tuning sub-directory can be equal to the fine-tuning root directory.
- `ENVIRONMENT` is an environment variable that specifies the mode of your working environment, which is mainly used to determine the level of logging. If you explicitly set it as `DEV`, more debugging information will be printed, but if you do not set it or set it to any other value, this debugging information will not be printed.
- `HF_TOKEN` is an environment variable that specifies your HuggingFace User Access Token.
- `WANDB_API_KEY` is an environment…
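The `docker run` invocation above can also be assembled programmatically, which makes the GPU selection and the `PATH_PREFIX` value easier to get right. The sketch below is only an illustration: the function and parameter names are hypothetical, and placing the image name as the final argument follows standard `docker run` syntax. Note that when the arguments are passed as a list (no shell), the `--gpus` value for specific devices must still contain the inner double quotes.

```python
import os
import shlex


def gpus_value(device_ids=None) -> str:
    """Value for --gpus: 'all', or a double-quoted device list such as "device=0,1"."""
    if device_ids is None:
        return "all"
    return '"device=' + ",".join(str(i) for i in device_ids) + '"'


def docker_run_command(image_name, container_name, finetune_root_dir,
                       finetune_sub_dir="", device_ids=None,
                       hf_token="", wandb_api_key=""):
    """Argument list mirroring the docker run command above (all names are illustrative)."""
    return [
        "docker", "run", "-it", "--rm",
        "--name", container_name,
        "--gpus", gpus_value(device_ids),
        "--ipc=host",
        "--net=host",
        # Share the host's HuggingFace cache with the container.
        "-v", os.path.expanduser("~/.cache") + ":/root/.cache",
        # Host directory that stores all fine-tunings.
        "-v", f"{finetune_root_dir}:/opt/finetuning",
        # The sub-directory may be empty, so PATH_PREFIX can equal the root.
        "-e", f"PATH_PREFIX=/opt/finetuning/{finetune_sub_dir}",
        "-e", "ENVIRONMENT=DEV",
        "-e", "TASK=FINETUNE",
        "-e", f"HF_TOKEN={hf_token}",
        "-e", f"WANDB_API_KEY={wandb_api_key}",
        image_name,
    ]


# Placeholder values for illustration only.
command = docker_run_command(
    image_name="ghcr.io/cohere-ai/cohere-finetune:latest",
    container_name="my-finetune-service",
    finetune_root_dir="/data/finetunings",
    device_ids=[0, 1, 2, 3],
)
print(shlex.join(command))
```

Printing the command with `shlex.join` yields a shell-safe string you can inspect before running it. Keep in mind that tokens passed this way are visible in the process list, so handle real credentials with care.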