arcee-ai/in-context-learning
Python
Language: Python
Stars: 3
Forks: 1
Open issues: 1
Created: 2024-11-25T17:27:10Z
Pushed: 2024-12-24T01:11:19Z
Default branch: master
Fork: no
Archived: no
README:
LLM Context Enhancement Experiment
This project implements an experimental framework for evaluating how relevant, high-quality context affects the quality of language model responses. It uses a vector database to retrieve similar question-answer pairs and compares model outputs generated with and without this additional context.
Overview
The system:
1. Loads reference QA pairs from specified datasets
2. Stores them in a vector database for similarity search
3. For each experimental question:
- Retrieves similar QA pairs as context
- Generates responses both with and without context
- Evaluates response quality using a reward model
- Stores results for analysis
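The per-question loop above can be sketched in a few lines. This is a minimal illustration, not the repository's code: `embed`, `cosine`, `retrieve`, and `run_one` are hypothetical names, the bag-of-words embedding stands in for the configured embedding model and ChromaDB, and `generate` and `score` are callables the caller supplies in place of the LLM and reward model.

```python
import math
from collections import Counter

# Toy reference set (assumption); the real project loads QA pairs from datasets.
REFERENCE_QA = [
    ("How do I reverse a list in Python?", "Use reversed(xs) or xs[::-1]."),
    ("What is the capital of France?", "Paris."),
]

def embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().replace("?", "").split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, k=1):
    """Return the k most similar reference pairs as (score, question, answer)."""
    q = embed(question)
    scored = [(cosine(q, embed(rq)), rq, ra) for rq, ra in REFERENCE_QA]
    return sorted(scored, reverse=True)[:k]

def run_one(question, generate, score):
    """Retrieve context, generate with and without it, score both answers."""
    sim, ref_q, ref_a = retrieve(question)[0]
    with_ctx = generate(question, context=(ref_q, ref_a))
    without_ctx = generate(question, context=None)
    s_with = score(question, with_ctx)
    s_without = score(question, without_ctx)
    return {
        "question": question,
        "context_score": sim,
        "context_qa": f"{ref_q} -> {ref_a}",
        "with_context_answer": with_ctx,
        "without_context_answer": without_ctx,
        "with_context_score": s_with,
        "without_context_score": s_without,
        "with_context_better": s_with > s_without,
    }
```

The returned dictionary uses the same field names as the results schema, so a record like this could be written to storage directly.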
Features
- Parallel processing for efficient vector database population
- Support for multiple LLM architectures
- Configurable embedding and reward models
- SQLite results storage with comprehensive metrics
- GPU acceleration support
- Batched processing for memory efficiency
Requirements
- Python 3.8+
- PyTorch
- Transformers
- vLLM
- LangChain
- ChromaDB
- SQLAlchemy
- Datasets (HuggingFace)
- tqdm
Installation
1. Clone the repository
2. Install dependencies:
pip install -r requirements.txt
Configuration
Edit config.py to customize:
- Dataset sources
- Model selections
- Database paths
- Experiment parameters
Default configuration:
reference_datasets = [
("mlabonne/orca-agentinstruct-1M-v1-cleaned", "default"),
]
experiment_dataset = "HuggingFaceTB/smoltalk"
embedding_model = "BAAI/bge-small-en-v1.5"
llm_model = "Qwen/Qwen2.5-7B-Instruct"
reward_model = "internlm/internlm2-7b-reward"
Usage
Quick Start
Run the complete experiment:
bash run_experiment.sh
This will:
1. Populate the vector database using parallel processing
2. Execute the main experiment
3. Store results in a SQLite database
Manual Execution
1. Populate vector database:
python parallel_insertion.py --use_gpu
2. Run experiment:
python main.py
Additional Options
Vector database population:
# CPU-only mode
python parallel_insertion.py --num_workers 4
# Specify GPU count
python parallel_insertion.py --use_gpu --num_workers 2
Project Structure
- config.py: Configuration parameters
- data_loader.py: Dataset loading utilities
- database.py: Vector and SQL database management
- experiment.py: Core experimental logic
- model_manager.py: Model loading and inference
- parallel_insertion.py: Parallel vector database population
- main.py: Experiment entry point
- run_experiment.sh: Convenience script
Key Components
DataLoader
Handles loading and preprocessing of reference and experimental datasets.
DatabaseManager
Manages two database systems:
- ChromaDB for vector similarity search
- SQLite for experimental results storage
ModelManager
Handles:
- Model loading/unloading
- Response generation
- Response quality evaluation
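To make the with-context and without-context generation paths concrete, here is a hedged sketch of how the two prompts might differ. The function name `build_prompt` and the prompt wording are assumptions for illustration only; the repository may instead use the model's chat template or a different layout.

```python
def build_prompt(question, context_qa=None):
    """Build a prompt, prepending a retrieved QA pair when one is given.

    The template text is illustrative, not taken from the repository.
    """
    if context_qa is None:
        return f"Question: {question}\nAnswer:"
    ref_q, ref_a = context_qa
    return (
        "Here is a similar question with a high-quality answer:\n"
        f"Question: {ref_q}\nAnswer: {ref_a}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Keeping the final question block identical in both variants means the retrieved pair is the only difference between the two prompts being compared.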
OptimizedExperiment
Orchestrates the experimental process:
1. Vector database setup
2. Batch processing of questions
3. Context-based response generation
4. Quality evaluation
5. Results storage
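Batch processing of questions can be illustrated with a small helper that splits a question stream into fixed-size chunks. This is a generic sketch; the helper name `batched` and the batch size are assumptions, and the repository's own batching logic may differ.

```python
from itertools import islice

def batched(items, batch_size):
    """Yield successive lists of at most batch_size items from any iterable."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Usage: process questions a few at a time to bound memory use.
questions = ["q1", "q2", "q3", "q4", "q5"]
for batch in batched(questions, 2):
    pass  # generate, score, and store results for this batch
```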
Results Storage
Results are stored in SQLite with the following schema:
- question: Original question
- context_score: Similarity score of retrieved context
- context_qa: Retrieved similar QA pair
- with_context_answer: Model response with context
- without_context_answer: Model response without context
- with_context_score: Quality score with context
- without_context_score: Quality score without context
- with_context_better: Boolean indicating if context improved response
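The fields above map naturally onto a single table. The sketch below uses the standard-library `sqlite3` module rather than SQLAlchemy (which the project lists as a dependency) to stay self-contained; the table name `results`, the `id` column, and the column types are assumptions, and the two sample rows are made-up placeholders, not experimental data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS results (
    id INTEGER PRIMARY KEY,          -- assumed surrogate key
    question TEXT,
    context_score REAL,
    context_qa TEXT,
    with_context_answer TEXT,
    without_context_answer TEXT,
    with_context_score REAL,
    without_context_score REAL,
    with_context_better INTEGER      -- SQLite stores booleans as 0/1
)
"""

INSERT = """
INSERT INTO results (
    question, context_score, context_qa,
    with_context_answer, without_context_answer,
    with_context_score, without_context_score, with_context_better
) VALUES (?, ?, ?, ?, ?, ?, ?, ?)
"""

def win_rate(conn):
    """Fraction of questions where the with-context answer scored higher."""
    return conn.execute("SELECT AVG(with_context_better) FROM results").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(INSERT, [
    # Placeholder rows for illustration only.
    ("q1", 0.9, "ref q1 -> ref a1", "a1 ctx", "a1 plain", 1.5, 0.5, 1),
    ("q2", 0.4, "ref q2 -> ref a2", "a2 ctx", "a2 plain", 0.2, 0.8, 0),
])
conn.commit()
```

A query such as `win_rate(conn)` then summarizes how often retrieved context helped across the stored questions.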
Notability
notability 1.0/10: Low stars, routine new repo