What does this model signal mean?

Moonshot AI (Kimi) published moonshotai/Kimi-Audio-7B on Hugging Face. This model signal records what was released and how the release is positioned. High-signal details: MIT license · 133 Hugging Face downloads · a 7B-parameter audio model by Moonshot AI. onlylabs links this event to 1 captured evidence page and 6 related model signals.

Moonshot AI (Kimi) Model: moonshotai/Kimi-Audio-7B

Captured source

Hugging Face: huggingface.co/moonshotai/Kimi-Audio-7B

moonshotai/Kimi-Audio-7B model card

published Apr 25, 2025 · seen Jun 6 · captured Jun 11 · http 200 · method plain · task text-to-speech · license mit · library kimi-audio · params 9.8B · downloads 133 · likes 90

Kimi-Audio

🤗 Kimi-Audio-7B | 🤗 Kimi-Audio-7B-Instruct | 📑 Paper

Introduction

We present Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation. This repository hosts the model checkpoints for Kimi-Audio-7B.

Kimi-Audio is designed as a universal audio foundation model capable of handling a wide variety of audio processing tasks within a single unified framework. Key features include:

Universal Capabilities: Handles diverse tasks like speech recognition (ASR), audio question answering (AQA), audio captioning (AAC), speech emotion recognition (SER), sound event/scene classification (SEC/ASC) and end-to-end speech conversation.
State-of-the-Art Performance: Achieves SOTA results on numerous audio benchmarks (see our Technical Report).
Large-Scale Pre-training: Pre-trained on over 13 million hours of diverse audio data (speech, music, sounds) and text data.
Novel Architecture: Employs a hybrid audio input (continuous acoustic + discrete semantic tokens) and an LLM core with parallel heads for text and audio token generation.
Efficient Inference: Features a chunk-wise streaming detokenizer based on flow matching for low-latency audio generation.
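
To make the chunk-wise streaming idea in the last feature concrete, here is a minimal Python sketch of how a token stream can be decoded chunk by chunk, so that output becomes available before the full sequence has been generated. It is an illustration under stated assumptions, not Kimi-Audio's implementation: the names iter_chunks, stream_decode, and placeholder_decode are hypothetical, and the placeholder decoder is a trivial stand-in for the flow-matching detokenizer, which is not reproduced here.

```python
from typing import Callable, Iterable, Iterator, List


def iter_chunks(tokens: Iterable[int], chunk_size: int) -> Iterator[List[int]]:
    """Group a token stream into fixed-size chunks; the last chunk may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunk: List[int] = []
    for token in tokens:
        chunk.append(token)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Flush the trailing partial chunk so no tokens are dropped.
        yield chunk


def stream_decode(
    tokens: Iterable[int],
    chunk_size: int,
    decode_chunk: Callable[[List[int]], List[float]],
) -> Iterator[List[float]]:
    """Yield decoded output one chunk at a time instead of waiting for the full sequence."""
    for chunk in iter_chunks(tokens, chunk_size):
        yield decode_chunk(chunk)


def placeholder_decode(chunk: List[int]) -> List[float]:
    # Hypothetical stand-in for a real detokenizer: maps each token id
    # to a single fake "sample". A real system would synthesize audio here.
    return [token / 100.0 for token in chunk]


if __name__ == "__main__":
    # Ten made-up token ids, decoded four at a time.
    for index, samples in enumerate(stream_decode(range(10), 4, placeholder_decode)):
        print(f"chunk {index}: {len(samples)} samples")
```

Because both functions are generators, a caller can start consuming the first chunk as soon as chunk_size tokens exist, which is the property that keeps latency low in a streaming design. The chunk size of 4 is arbitrary and chosen only to keep the example short.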

For more details, please refer to our GitHub Repository and Technical Report.

Note

Kimi-Audio-7B is a base model that has not been fine-tuned, so it cannot be used directly. The base model is flexible, however: you can fine-tune it for any downstream task.

If you are looking for an out-of-the-box model, please refer to Kimi-Audio-7B-Instruct.

Citation

If you find Kimi-Audio useful in your research or applications, please cite our technical report:

@misc{kimi_audio_2024,
title={Kimi-Audio Technical Report},
author={Kimi Team},
year={2024},
eprint={arXiv:placeholder},
archivePrefix={arXiv},
primaryClass={cs.CL}
}

License

The model is based on and modified from Qwen2.5-7B. Code derived from Qwen2.5-7B is licensed under the Apache 2.0 License. Other parts of the code are licensed under the MIT License.

Notability

notability 2.0/10

Very low traction, routine model release