sarvamai/torchtune
forked from meta-pytorch/torchtune
Captured source
Description: PyTorch native finetuning library
Language: Python
License: BSD-3-Clause
Stars: 0
Forks: 3
Open issues: 0
Created: 2024-12-03T11:27:53Z
Pushed: 2025-01-19T11:38:19Z
Default branch: main
Fork: yes
Parent repository: meta-pytorch/torchtune
Archived: no
README:
torchtune
[Introduction](#introduction) | [Installation](#installation) | [Get Started](#get-started) | **Documentation** | [Community](#community) | [License](#license) | [Citing torchtune](#citing-torchtune)
📣 Recent updates 📣
- *December 2024*: torchtune now supports Llama 3.3 70B! Try it out by following our installation instructions [here](#Installation), then run any of the configs [here](recipes/configs/llama3_3).
- *November 2024*: torchtune has released v0.4.0, which includes stable support for exciting features like activation offloading and multimodal QLoRA.
- *November 2024*: torchtune has added [Gemma2](recipes/configs/gemma2) to its models!
- *October 2024*: torchtune added support for Qwen2.5 models. Find the recipes [here](recipes/configs/qwen2_5/).
- *September 2024*: torchtune has support for Llama 3.2 11B Vision, Llama 3.2 3B, and Llama 3.2 1B models! Try them out by following our installation instructions [here](#Installation), then run any of the text configs [here](recipes/configs/llama3_2) or vision configs [here](recipes/configs/llama3_2_vision).
Introduction
torchtune is a PyTorch library for easily authoring, finetuning and experimenting with LLMs.
torchtune provides:
- PyTorch implementations of popular LLMs from Llama, Gemma, Mistral, Phi, and Qwen model families
- Hackable training recipes for full finetuning, LoRA, QLoRA, DPO, PPO, QAT, knowledge distillation, and more
- Out-of-the-box memory efficiency, performance improvements, and scaling with the latest PyTorch APIs
- YAML configs for easily configuring training, evaluation, quantization or inference recipes
- Built-in support for many popular dataset formats and prompt templates
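To make the difference between full finetuning and LoRA in the recipe list above concrete, here is a minimal arithmetic sketch. It assumes the standard LoRA formulation, in which a frozen weight matrix of shape `d_out x d_in` is augmented by two trainable low-rank factors `A` (`rank x d_in`) and `B` (`d_out x rank`). The layer size and rank below are illustrative values, not torchtune defaults, and the helper names are hypothetical.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in the low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


# Illustrative values only (assumed, not taken from any torchtune config).
d_in = d_out = 4096
rank = 8

full = full_params(d_in, d_out)
lora = lora_trainable_params(d_in, d_out, rank)
fraction = lora / full

print(f"full finetuning: {full:,} trainable parameters")
print(f"LoRA (rank={rank}): {lora:,} trainable parameters")
print(f"LoRA trains {fraction:.4%} of the layer")
```

For a single square projection of this size, the LoRA factors hold 1/256 of the parameters that full finetuning would update, which is why LoRA needs far less memory for gradients and optimizer state.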
Models
torchtune currently supports the following models.
| Model | Sizes |
|-----------------------------------------------|-----------|
| Llama3.3 | 70B [[models](torchtune/models/llama3_3/_model_builders.py), [configs](recipes/configs/llama3_3/)] |
| Llama3.2-Vision | 11B, 90B [[models](torchtune/models/llama3_2_vision/_model_builders.py), [configs](recipes/configs/llama3_2_vision/)] |
| Llama3.2 | 1B, 3B [[models](torchtune/models/llama3_2/_model_builders.py), [configs](recipes/configs/llama3_2/)] |
| Llama3.1 | 8B, 70B, 405B [[models](torchtune/models/llama3_1/_model_builders.py), [configs](recipes/configs/llama3_1/)] |
| Llama3 | 8B, 70B [[models](torchtune/models/llama3/_model_builders.py), [configs](recipes/configs/llama3/)] |
| Llama2 | 7B, 13B, 70B [[models](torchtune/models/llama2/_model_builders.py), [configs](recipes/configs/llama2/)] |
| Code-Llama2 | 7B, 13B, 70B [[models](torchtune/models/code_llama2/_model_builders.py), [configs](recipes/configs/code_llama2/)] |
| Mistral | 7B [[models](torchtune/models/mistral/_model_builders.py), [configs](recipes/configs/mistral/)] |
| Gemma | 2B, 7B [[models](torchtune/models/gemma/_model_builders.py), [configs](recipes/configs/gemma/)] |
| Gemma2 | 2B, 9B, 27B [[models](torchtune/models/gemma2/_model_builders.py), [configs](recipes/configs/gemma2/)] |
| Microsoft Phi3 | Mini [[models](torchtune/models/phi3/), [configs](recipes/configs/phi3/)] |
| Qwen2 | 0.5B, 1.5B, 7B [[models](torchtune/models/qwen2/), [configs](recipes/configs/qwen2/)] |
| Qwen2.5 | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B [[models](torchtune/models/qwen2_5/), [configs](recipes/configs/qwen2_5/)] |
We're always adding new models, but feel free to file an issue if there's a new one you would like to see in torchtune.
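The links in the table above follow a consistent layout: each model family has a directory under `torchtune/models/` and a matching one under `recipes/configs/`. The sketch below records that mapping so the two locations can be looked up by model name. The directory names are copied from the table; the `MODEL_DIRS` dictionary and the `model_paths` helper are hypothetical and are not part of torchtune.

```python
from pathlib import PurePosixPath

# Directory names copied from the links in the models table.
MODEL_DIRS = {
    "Llama3.3": "llama3_3",
    "Llama3.2-Vision": "llama3_2_vision",
    "Llama3.2": "llama3_2",
    "Llama3.1": "llama3_1",
    "Llama3": "llama3",
    "Llama2": "llama2",
    "Code-Llama2": "code_llama2",
    "Mistral": "mistral",
    "Gemma": "gemma",
    "Gemma2": "gemma2",
    "Microsoft Phi3": "phi3",
    "Qwen2": "qwen2",
    "Qwen2.5": "qwen2_5",
}


def model_paths(model: str) -> tuple[str, str]:
    """Return (model source directory, config directory) for a model family."""
    directory = MODEL_DIRS[model]  # raises KeyError for a family not in the table
    models = PurePosixPath("torchtune/models") / directory
    configs = PurePosixPath("recipes/configs") / directory
    return str(models), str(configs)


for name in ("Llama3.3", "Qwen2.5"):
    print(name, model_paths(name))
```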
Finetuning recipes
torchtune provides the following finetuning recipes for training on one or more devices.
| Finetuning Method | Devices | Recipe | Example Config(s) |
|:-:|:-:|:-:|:-:|
| Full Finetuning | 1-8 | [full_finetune_single_device](recipes/full_finetune_single_device.py) [full_finetune_distributed](recipes/full_finetune_distributed.py) | [Llama3.1 8B single-device](recipes/configs/llama3_1/8B_full_single_device.yaml) [Llama 3.1 70B distributed](recipes/configs/llama3_1/70B_full.yaml) |
| LoRA Finetuning | 1-8 | [lora_finetune_single_device](recipes/lora_finetune_single_device.py) [lora_finetune_distributed](recipes/lora_finetune_distributed.py) | [Qwen2 0.5B single-device](recipes/configs/qwen2/0.5B_lora_single_device.yaml) [Gemma 7B distributed](recipes/configs/gemma/7B_lora.yaml) |
| QLoRA Finetuning | 1-8 | [lora_finetune_single_device](recipes/lora_finetune_single_device.py) [lora_finetune_distributed](recipes/lora_finetune_distributed.py) | [Phi3 Mini single-device](recipes/configs/phi3/mini_qlora_single_device.yaml) [Llama 3.1 405B distributed](recipes/configs/llama3_1/405B_qlora.yaml) |
| DoRA/QDoRA Finetuning | 1-8 | [lora_finetune_single_device](recipes/lora_finetune_single_device.py) [lora_finetune_distributed](recipes/lora_finetune_distributed.py) | [Llama3 8B QDoRA...
Excerpt shown — open the source for the full document.
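The recipes and example configs in the table above are typically launched through torchtune's `tune run` command, which takes a recipe name and a config name. The sketch below only builds the command line from a recipe and a config path taken from the table. It assumes the `tune` CLI is installed and that a built-in config is addressed by its path relative to `recipes/configs/` without the `.yaml` suffix. The helpers `config_name` and `tune_run_command` are hypothetical and are not part of torchtune.

```python
import shlex
from typing import List, Optional

CONFIG_PREFIX = "recipes/configs/"
CONFIG_SUFFIX = ".yaml"


def config_name(config_path: str) -> str:
    """Turn a repository config path into the short name passed to --config (assumed convention)."""
    if not (config_path.startswith(CONFIG_PREFIX) and config_path.endswith(CONFIG_SUFFIX)):
        raise ValueError(f"not a built-in config path: {config_path}")
    return config_path[len(CONFIG_PREFIX):-len(CONFIG_SUFFIX)]


def tune_run_command(
    recipe: str, config_path: str, nproc_per_node: Optional[int] = None
) -> List[str]:
    """Build the argument list for `tune run`; this does not execute anything."""
    cmd = ["tune", "run"]
    if nproc_per_node is not None:
        # Distributed recipes take the process count before the recipe name.
        cmd += ["--nproc_per_node", str(nproc_per_node)]
    cmd += [recipe, "--config", config_name(config_path)]
    return cmd


single = tune_run_command(
    "full_finetune_single_device",
    "recipes/configs/llama3_1/8B_full_single_device.yaml",
)
distributed = tune_run_command(
    "full_finetune_distributed",
    "recipes/configs/llama3_1/70B_full.yaml",
    nproc_per_node=8,
)
print(shlex.join(single))
print(shlex.join(distributed))
```

Returning a list rather than a string keeps the result safe to pass to `subprocess.run` without shell quoting concerns.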
Notability
notability 2.0/10: Routine fork of a repo