Repo · Meituan (LongCat) · published Oct 16, 2025

meituan-longcat/LongCat-Audio-Codec

Python


Captured source


Description: LongCat Audio Tokenizer and Detokenizer

Language: Python

License: MIT

Stars: 301

Forks: 23

Open issues: 1

Created: 2025-10-16T12:06:05Z

Pushed: 2026-05-09T10:22:27Z

Default branch: main

Fork: no

Archived: no

README:

LongCat-Audio-Codec is an audio tokenizer and detokenizer solution designed for speech large language models. It generates semantic and acoustic tokens in parallel, enabling high-fidelity audio reconstruction at extremely low bitrates while serving as a backend for Speech LLMs.

✨ Key Features

  • High Fidelity at Ultra-Low Bitrates: As a codec, it achieves high-intelligibility audio reconstruction at extremely low bitrates.
  • Low-Frame-Rate Tokenizer: As a tokenizer, it extracts semantic tokens and acoustic tokens in parallel at a low frame rate of 16.6 Hz, with flexible acoustic codebook configurations to adapt to different downstream tasks.
  • Low-Latency Streaming Detokenizer: Equipped with a specially designed streaming-capable detokenizer that requires minimal future information to deliver high-quality audio output with low latency.
  • Super-Resolution Capability: Integrates audio super-resolution processing into the detokenizer, enabling the generation of high-quality audio with a higher sample rate than the original input.
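To make the low frame rate concrete, the sketch below estimates how many tokens a clip produces at 16.6 Hz for a given number of acoustic codebooks. The function name `token_budget` and the default `codebook_size=1024` are assumptions for illustration only; the README does not state the codebook size, so the bitrate figure is a rough estimate rather than a property of the released models.

```python
import math

FRAME_RATE_HZ = 16.6  # frame rate quoted in the feature list above


def token_budget(duration_s: float, n_acoustic_codebooks: int,
                 codebook_size: int = 1024) -> dict:
    """Estimate token counts for a clip of the given duration.

    codebook_size is a HYPOTHETICAL value used only to turn tokens into bits;
    it is not taken from the README.
    """
    # The released decoders take 1 semantic codebook plus at most 3 acoustic ones.
    if not 0 <= n_acoustic_codebooks <= 3:
        raise ValueError("expected between 0 and 3 acoustic codebooks")
    n_codebooks = 1 + n_acoustic_codebooks
    # Approximate frame count; the real codec may pad or trim at clip edges.
    frames = round(duration_s * FRAME_RATE_HZ)
    tokens_per_second = FRAME_RATE_HZ * n_codebooks
    return {
        "frames": frames,
        "tokens": frames * n_codebooks,
        "tokens_per_second": tokens_per_second,
        "bits_per_second": tokens_per_second * math.log2(codebook_size),
    }


if __name__ == "__main__":
    for n in range(4):
        print(n, token_budget(10.0, n))
```

Fewer acoustic codebooks shorten the token sequence a Speech LLM has to model, while more codebooks leave the detokenizer with more acoustic detail to work from.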

🚀Quick Start

🛠️Installation

1. Create a conda environment and install PyTorch

Note: The torch version listed below is just an example. Please install the version of PyTorch that matches your specific hardware configuration.

conda create -n LongCat-Audio-Codec python=3.10
conda activate LongCat-Audio-Codec
pip install torch==2.7.1 torchaudio==2.7.1

2. Other dependencies:

pip install -r requirements.txt
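After installation, a quick check that the expected packages are visible in the active environment can save debugging time later. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `report` are made up for this example and are not part of the repository.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages=("torch", "torchaudio")) -> dict:
    """Map each package name to its installed version (or None)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it inside the activated `LongCat-Audio-Codec` environment; a `not installed` entry suggests the `pip install` step ran against a different interpreter.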

📦Model Preparation

1. Model Download

| Models | Download Link | Notes |
| --- | --- | --- |
| LongCatAudioCodec_encoder | 🤗 Huggingface | encoder weights with semantic encoder and acoustic encoder |
| LongCatAudioCodec_encoder_cmvn | 🤗 Huggingface | coefficients of Cepstral Mean and Variance Normalization, used by semantic encoder |
| LongCatAudioCodec_decoder16k_4codebooks | 🤗 Huggingface | 16k decoder, supply 1 semantic codebook and at most 3 acoustic codebooks |
| LongCatAudioCodec_decoder24k_2codebooks | 🤗 Huggingface | 24k decoder, supply 1 semantic codebook and 1 acoustic codebook, SFT on limited speakers |
| LongCatAudioCodec_decoder24k_4codebooks | 🤗 Huggingface | 24k decoder, supply 1 semantic codebook and at most 3 acoustic codebooks |
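The decoder rows in the table can be read as a small lookup: output sample rate plus the number of acoustic codebooks each decoder accepts. The sketch below encodes that reading. `DecoderSpec` and `candidate_decoders` are hypothetical names, "16k"/"24k" are assumed to mean 16 kHz and 24 kHz output, and the lower bound of one acoustic codebook for the 4-codebook decoders is an assumption, since the table only says "at most 3".

```python
from typing import List, NamedTuple


class DecoderSpec(NamedTuple):
    sample_rate_hz: int   # assumed meaning of "16k" / "24k" in the table
    min_acoustic: int     # ASSUMPTION for the "at most 3" decoders
    max_acoustic: int
    note: str


# Model names are copied from the table; the specs restate its Notes column.
DECODERS = {
    "LongCatAudioCodec_decoder16k_4codebooks": DecoderSpec(16000, 1, 3, ""),
    "LongCatAudioCodec_decoder24k_2codebooks": DecoderSpec(
        24000, 1, 1, "SFT on limited speakers"),
    "LongCatAudioCodec_decoder24k_4codebooks": DecoderSpec(24000, 1, 3, ""),
}


def candidate_decoders(sample_rate_hz: int, n_acoustic: int) -> List[str]:
    """List decoders matching the output rate and acoustic codebook count."""
    return [
        name
        for name, spec in DECODERS.items()
        if spec.sample_rate_hz == sample_rate_hz
        and spec.min_acoustic <= n_acoustic <= spec.max_acoustic
    ]


if __name__ == "__main__":
    print(candidate_decoders(24000, 1))
    print(candidate_decoders(16000, 3))
```

When two decoders match, the `note` field is worth checking: the 2-codebook 24k decoder is described as fine-tuned on limited speakers, which may matter for general-purpose use.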

2. Link LongCat-Audio-Codec Models to the Right Path

After downloading the model checkpoint files (.pt files), you have two options for making them accessible to the inference script:

Option 1: Place Models in the Default ckpts Directory (Recommended)

For a quick setup, all models and configuration files must be placed within the LongCat-Audio-Codec/ project root directory.
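Before running inference, it can help to confirm that the downloaded `.pt` files actually landed where you expect. The sketch below simply lists checkpoint files under a directory. The default of `ckpts` relative to the current working directory is an assumption based on the option title above, and `find_checkpoints` is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path
from typing import List


def find_checkpoints(ckpt_dir: str = "ckpts") -> List[Path]:
    """Return all .pt files under ckpt_dir, sorted; empty list if it is missing.

    The default directory name is an ASSUMPTION; pass the path you actually use.
    """
    root = Path(ckpt_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*.pt") if p.is_file())


if __name__ == "__main__":
    found = find_checkpoints()
    if not found:
        print("No .pt files found; check where the models were downloaded.")
    for path in found:
        print(f"{path}  ({path.stat().st_size / 1e6:.1f} MB)")
```

Run it from the project root and compare the output against the models listed in the download table.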

The final, correct project structure should look exactly like this:

LongCat-Audio-Codec/

Contact longcat-team@meituan.com or join our WeChat Group if you have any questions.

#### WeChat Group

Notability

notability 6.0/10

Solid new audio codec repo, 301 stars