What does this repo signal mean?

Xiaomi (MiMo) published XiaomiMiMo/MiMo-Audio-Training (Python). A repository signal like this exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo XiaomiMiMo/MiMo-Audio-Training · language Python · Xiaomi audio training repo with moderate stars (109 at capture). onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Xiaomi (MiMo) Repo: XiaomiMiMo/MiMo-Audio-Training

Captured source

GitHub · github.com/XiaomiMiMo/MiMo-Audio-Training

XiaomiMiMo/MiMo-Audio-Training repository metadata

published Oct 16, 2025 · seen Jun 5 · captured Jun 11 · http 200 · method plain

XiaomiMiMo/MiMo-Audio-Training

Language: Python

Stars: 109

Forks: 13

Open issues: 5

Created: 2025-10-16T13:52:54Z

Pushed: 2025-10-16T13:55:07Z

Default branch: main

Fork: no

Archived: no
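
The fields above correspond to what the GitHub REST API returns for a repository (`GET /repos/{owner}/{repo}`). The following is a minimal sketch, assuming that response's field names (`stargazers_count`, `forks_count`, `open_issues_count`, `created_at`, `pushed_at`). The `summarize` helper is hypothetical, and the sample dictionary only restates the values captured above rather than calling the API.

```python
from datetime import datetime


def parse_github_time(stamp: str) -> datetime:
    """Parse a UTC timestamp such as 2025-10-16T13:52:54Z."""
    # Replace the trailing Z so fromisoformat works on older Python versions too.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def summarize(repo: dict) -> dict:
    """Hypothetical helper: reduce repository metadata to a few signal fields."""
    created = parse_github_time(repo["created_at"])
    pushed = parse_github_time(repo["pushed_at"])
    return {
        "stars": repo["stargazers_count"],
        "forks": repo["forks_count"],
        "open_issues": repo["open_issues_count"],
        "seconds_from_creation_to_push": int((pushed - created).total_seconds()),
    }


# Values copied from the captured metadata above; field names are assumed.
captured = {
    "stargazers_count": 109,
    "forks_count": 13,
    "open_issues_count": 5,
    "created_at": "2025-10-16T13:52:54Z",
    "pushed_at": "2025-10-16T13:55:07Z",
}

summary = summarize(captured)
print(summary)
```

On the captured values, the last push came 133 seconds after the repository was created.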

README:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

MiMo-Audio-Training Toolkit

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Introduction

Welcome to the MiMo-Audio-Training toolkit! This toolkit is designed for fine-tuning XiaomiMiMo/MiMo-Audio-7B-Instruct. It serves as a reference implementation for researchers and developers who want to adapt MiMo-Audio to their own custom tasks.

Supported Tasks

The MiMo-Audio-Training toolkit supports supervised fine-tuning (SFT) on the following tasks:

Tasks:

SFT:

ASR
TTS / InstructTTS
Audio Understanding and Reasoning
Spoken Dialogue

Getting Started

To get started with the MiMo-Audio-Training toolkit, follow the instructions below to set up the environment and install the required dependencies.

Prerequisites (Linux)

Python 3.12
CUDA >= 12.0
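
The two prerequisites above can be checked before installing anything else. The following is a minimal sketch, assuming PyTorch is already installed; `python_ok` and `cuda_ok` are hypothetical helper names, not part of the toolkit. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which is not necessarily the version of the driver or toolkit on the machine.

```python
import sys


def python_ok(version_info=sys.version_info) -> bool:
    """True when the interpreter is Python 3.12, as the README requires."""
    return tuple(version_info[:2]) == (3, 12)


def cuda_ok(cuda_version) -> bool:
    """True when a CUDA version string such as '12.4' is >= 12.0."""
    if not cuda_version:  # None: CPU-only build, or PyTorch not found
        return False
    major = int(cuda_version.split(".")[0])
    return major >= 12


if __name__ == "__main__":
    try:
        import torch  # assumption: PyTorch is available in this environment

        cuda_version = torch.version.cuda
    except ImportError:
        cuda_version = None
    print("Python 3.12:", python_ok())
    print("CUDA >= 12.0:", cuda_ok(cuda_version))
```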

Installation:

git clone --recurse-submodules https://github.com/XiaomiMiMo/MiMo-Audio-Training
cd MiMo-Audio-Training
pip install -r requirements.txt
pip install flash-attn==2.7.4.post1
pip install -e .

Note: If compiling flash-attn takes too long, you can download the precompiled wheel (linked in the README as "Download Precompiled Wheel") and install it manually:

pip install /path/to/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp312-cp312-linux_x86_64.whl
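
The precompiled wheel's filename encodes what it was built for, so it can be checked against the local environment before installing. The following is a minimal sketch that splits the filename according to the standard wheel naming scheme (name, version, Python tag, ABI tag, platform tag; it assumes no optional build tag). Reading the local version label as CUDA major version, PyTorch version, and C++11 ABI flag is an assumption based on how the label is spelled, and both helper functions are hypothetical.

```python
import re
import sys

WHEEL = "flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp312-cp312-linux_x86_64.whl"


def parse_wheel_name(filename: str) -> dict:
    """Split a wheel filename (without a build tag) into its components."""
    stem = filename.removesuffix(".whl")
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    public, _, local = version.partition("+")
    info = {
        "name": name,
        "version": public,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }
    # Assumed layout of the local label: cu<CUDA>torch<version>cxx11abi<flag>
    match = re.fullmatch(r"cu(\d+)torch(\d+\.\d+)cxx11abi(TRUE|FALSE)", local)
    if match:
        info["cuda"] = match.group(1)
        info["torch"] = match.group(2)
        info["cxx11abi"] = match.group(3) == "TRUE"
    return info


def matches_interpreter(info: dict, version_info=sys.version_info) -> bool:
    """True when the wheel's CPython tag matches the given interpreter."""
    return info["python_tag"] == f"cp{version_info[0]}{version_info[1]}"


info = parse_wheel_name(WHEEL)
print(info)
print("matches this interpreter:", matches_interpreter(info))
```

Under this reading, the wheel targets CPython 3.12 on 64-bit x86 Linux, which lines up with the Python 3.12 and CUDA >= 12.0 prerequisites.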

Training Process:

Download the fine-tuning dataset and pre-process the data into the format described in instruct_template.md.

Training

We provide multiple training scripts under the scripts directory, supporting both single-GPU and multi-GPU training setups.

cd MiMo-Audio-Training
bash scripts/train_multiGPU_torchrun.sh

Generation and Evaluation

Run inference using generate.py.

Evaluate the SFT model with 🌐MiMo-Audio-Eval.

Citation

@misc{coreteam2025mimoaudio,
  title={MiMo-Audio: Audio Language Models are Few-Shot Learners},
  author={LLM-Core-Team Xiaomi},
  year={2025},
  url={https://github.com/XiaomiMiMo/MiMo-Audio},
}

Contact

Please contact us at mimo@xiaomi.com or open an issue if you have any questions.

Notability

notability 6.0/10

Xiaomi audio training repo with moderate stars.