arcee-ai/spectrum
forked from QuixiAI/spectrum
Captured source
Description: for spectral randomness
License: Apache-2.0
Stars: 0
Forks: 0
Open issues: 0
Created: 2025-02-06T01:50:19Z
Pushed: 2025-02-06T01:58:34Z
Default branch: main
Fork: yes
Parent repository: QuixiAI/spectrum
Archived: no
README:
Spectrum
This repository contains the implementation of Spectrum, as detailed in the paper Spectrum: Targeted Training on Signal to Noise Ratio.
Overview
Spectrum is a tool for scanning large language models and evaluating the Signal-to-Noise Ratio (SNR) of their layers. By identifying the n% of layers with the highest SNR, you can train only those layers and keep the rest frozen, which improves training efficiency.
Features
- Model Scanning: Scan models to determine the SNR of each layer.
- Top n% Layer Identification: Identify and sort the top n% of layers based on their SNR.
- Unfrozen Parameters Configuration: Generate configuration files for unfreezing specific layers in Axolotl or other libraries.
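The top n% selection described above can be sketched in a few lines. This is a minimal illustration, not Spectrum's actual code: the function name `top_percent_layers`, the SNR values, and the rounding rule (truncate, but keep at least one layer) are all assumptions made for the example.

```python
def top_percent_layers(snr_by_layer, top_percent):
    """Return (layer, snr) pairs for the top `top_percent` of layers, highest SNR first."""
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")
    ranked = sorted(snr_by_layer.items(), key=lambda item: item[1], reverse=True)
    # Assumed rounding rule: truncate, but always keep at least one layer.
    keep = max(1, int(len(ranked) * top_percent / 100))
    return ranked[:keep]


# Made-up SNR values for four hypothetical layers.
example_snr = {
    "model.layers.0.mlp.down_proj": 0.8,
    "model.layers.1.mlp.down_proj": 2.5,
    "model.layers.2.mlp.down_proj": 1.7,
    "model.layers.3.mlp.down_proj": 0.3,
}
selected = top_percent_layers(example_snr, 50)
print([name for name, _ in selected])
```

With the made-up values above, a top percentage of 50 keeps the two layers with the highest SNR, sorted from highest to lowest.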
Installation
To use Spectrum, you need to have Python installed. Clone this repository and install the necessary dependencies:
git clone https://github.com/cognitivecomputations/spectrum.git
cd spectrum
pip install -r requirements.txt
Usage
To use Spectrum, run the following command:
python spectrum.py --model-name <local path or Hugging Face repo> --top-percent <percentage>
- --model-name: Specify the local model path or the Hugging Face repository.
- --top-percent: Specify the top percentage of SNR layers you want to retrieve.
Spectrum checks the model_snr_results folder to see whether the model has already been scanned. If not, it prompts you for the batch size to use for scanning. Once the scan is complete, it writes the SNR values to the model_snr_results folder and provides a list of SNR values sorted from highest to lowest, along with an unfrozen parameters YAML file.
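The "already scanned?" check can be pictured as a simple file lookup. The sketch below is only an illustration: the helper name `find_cached_scan` and the file naming scheme (`snr_results_<slug>.json`) are hypothetical and may not match what Spectrum actually writes to the folder.

```python
from pathlib import Path


def find_cached_scan(model_name, results_dir="model_snr_results"):
    """Return the path of an existing scan file for `model_name`, or None."""
    # Hypothetical naming scheme: slashes in the repo id become underscores.
    slug = model_name.replace("/", "_")
    candidate = Path(results_dir) / f"snr_results_{slug}.json"
    return candidate if candidate.is_file() else None


cached = find_cached_scan("meta-llama/Meta-Llama-3-8B-Instruct")
print("cached scan found" if cached else "no cached scan, a new scan is needed")
```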
Example Command
python spectrum.py --model-name meta-llama/Meta-Llama-3-8B-Instruct --top-percent 50
Scanning New Models
Spectrum first checks the model_snr_results folder to see whether the model has already been scanned (we invite you to add your own scans for models we don't have via PR). If it has, Spectrum gives you the top n% of those SNR values. Otherwise, it asks what batch size you want to scan at. We've been able to use a batch_size of 4 for 70B models on an 8xH100 node. It then loads the model and presents all available modules to scan. We typically select only the MLP/attn layers, but if you're doing continued pretraining or language tasks it wouldn't hurt to include all available modules.
Spectrum then scans the model and writes the SNR values to the model_snr_results folder. It also outputs a list of SNR values sorted from highest to lowest, along with an unfrozen parameters YAML file. This matches the Axolotl config format, so you can copy and paste it directly into your Axolotl YAML. That's it!
Integration with Axolotl
If you're using Axolotl, the generated YAML file can be directly integrated.
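Axolotl reads the layers to train from an `unfrozen_parameters` list in its config. As a rough sketch of what such a fragment looks like, the snippet below renders a list of layer names into that shape. The helper name `render_unfrozen_yaml` is hypothetical, and the file Spectrum actually generates may contain additional entries or comments.

```python
def render_unfrozen_yaml(layer_names):
    """Render layer names as a YAML `unfrozen_parameters` list."""
    lines = ["unfrozen_parameters:"]
    lines.extend(f"- {name}" for name in layer_names)
    return "\n".join(lines) + "\n"


print(render_unfrozen_yaml([
    "model.layers.62.mlp.down_proj",
    "model.layers.63.mlp.down_proj",
]))
```

The printed block can be pasted into an Axolotl config as a top-level key.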
Integration with Other Libraries
For integration with other libraries, we provide a simple script to freeze and unfreeze parameters:
def _freeze_and_unfreeze_parameters(self):
    # Freeze all parameters
    for param in self.model.parameters():
        param.requires_grad = False
    # Unfreeze Spectrum parameters
    for name, param in self.model.named_parameters():
        if any(unfrozen_param in name for unfrozen_param in self.unfrozen_parameters):
            param.requires_grad = True

unfrozen_parameters = [
    'model.layers.62.mlp.down_proj',
    'model.layers.63.mlp.down_proj',
    'model.layers.66.mlp.down_proj',
    'model.layers.65.mlp.down_proj',
    'model.layers.64.mlp.down_proj',
    'model.layers.67.mlp.down_proj',
    'model.layers.68.mlp.down_proj',
    'model.layers.60.mlp.down_proj',
    'model.layers.31.mlp.down_proj',
    'model.layers.69.mlp.down_proj',
    'model.layers.61.mlp.down_proj',
    'model.layers.59.mlp.down_proj',
    'model.layers.70.mlp.down_proj',
    'model.layers.30.mlp.down_proj',
    'model.layers.76.mlp.down_proj',
    'model.layers.72.mlp.down_proj',
    'model.layers.77.mlp.down_proj',
    'model.layers.71.mlp.down_proj',
    'model.layers.29.mlp.down_proj',
    'model.layers.58.mlp.down_proj',
    'model.layers.78.mlp.gate_proj',
    'model.layers.77.mlp.gate_proj',
    'model.layers.76.mlp.gate_proj',
    'model.layers.79.mlp.gate_proj',
    'model.layers.75.mlp.gate_proj',
    'model.layers.74.mlp.gate_proj',
    'model.layers.73.mlp.gate_proj',
    'model.layers.70.mlp.gate_proj',
    'model.layers.72.mlp.gate_proj',
    'model.layers.71.mlp.gate_proj',
    'model.layers.69.mlp.gate_proj',
    'model.layers.54.mlp.gate_proj',
    'model.layers.68.mlp.gate_proj',
    'model.layers.57.mlp.gate_proj',
    'model.layers.63.mlp.gate_proj',
    'model.layers.49.mlp.gate_proj',
    'model.layers.55.mlp.gate_proj',
    'model.layers.53.mlp.gate_proj',
    'model.layers.44.mlp.gate_proj',
    'model.layers.46.mlp.gate_proj',
    'model.layers.69.mlp.up_proj',
    'model.layers.70.mlp.up_proj',
    'model.layers.71.mlp.up_proj',
    'model.layers.68.mlp.up_proj',
    'model.layers.67.mlp.up_proj',
    'model.layers.66.mlp.up_proj',
    'model.layers.46.mlp.up_proj',
    'model.layers.63.mlp.up_proj',
    'model.layers.72.mlp.up_proj',
    'model.layers.64.mlp.up_proj',
    'model.layers.62.mlp.up_proj',
    'model.layers.45.mlp.up_proj',
    'model.layers.65.mlp.up_proj',
    'model.layers.73.mlp.up_proj',
    'model.layers.47.mlp.up_proj',
    'model.layers.44.mlp.up_proj',
    'model.layers.49.mlp.up_proj',
    'model.layers.48.mlp.up_proj',
    'model.layers.53.mlp.up_proj',
    'model.layers.74.mlp.up_proj',
    'model.layers.79.self_attn.k_proj',
    'model.layers.36.self_attn.k_proj',
    'model.layers.35.self_attn.k_proj',
    'model.layers.74.self_attn.k_proj',
    'model.layers.34.self_attn.k_proj',
    'model.layers.78.self_attn.k_proj',
    'model.layers.77.self_attn.k_proj',
    'model.layers.37.self_attn.k_proj',
    'model.layers.39.self_attn.k_proj',
    'model.layers.41.self_attn.k_proj',
    'model.layers.38.self_attn.k_proj',
    'model.layers.33.self_attn.k_proj',
    'model.layers.69.self_attn.k_proj',
    'model.layers.42.self_attn.k_proj',
    'model.layers.32.self_attn.k_proj',
    'model.layers.25.self_attn.k_proj',
    'model.layers.70.self_attn.k_proj',
    'model.layers.22.self_attn.k_proj',
    'model.layers.63.self_attn.k_proj',
    'model.layers.29.self_attn.k_proj',
    'model.layers.14.self_attn.o_proj',
    'model.layers.39.self_attn.o_proj',
    'model.layers.19.self_attn.o_proj',
    'model.layers.16.self_attn.o_proj',
    'model.layers.17.self_attn.o_proj',
    'model.layers.15.self_attn.o_proj',
    'model.layers.69.self_attn.o_proj',
    'model.layers.12.self_attn.o_proj',…
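The substring test in the function above decides which parameters stay trainable. The torch-free sketch below isolates that matching rule so it can be tried on a list of names. `select_trainable` is an illustrative helper, not part of Spectrum, and the parameter names (with the `.weight` suffix that PyTorch's `named_parameters()` typically yields) are made up for the example.

```python
def select_trainable(param_names, unfrozen_parameters):
    """Return the names that contain any unfrozen pattern as a substring."""
    return [
        name
        for name in param_names
        if any(pattern in name for pattern in unfrozen_parameters)
    ]


# Hypothetical parameter names, including a near miss (layer 6 vs. layer 62).
param_names = [
    "model.layers.6.mlp.down_proj.weight",
    "model.layers.62.mlp.down_proj.weight",
    "model.layers.62.mlp.up_proj.weight",
]
print(select_trainable(param_names, ["model.layers.62.mlp.down_proj"]))
```

Because the match is substring-based, a pattern such as `model.layers.62.mlp.down_proj` selects every tensor under that module (for example both a weight and a bias, if present), while layer 6 is not matched by a layer 62 pattern.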
Notability
Notability: 1.0/10. Routine fork by same org.