What does this fork signal mean?

Sarvam AI forked pyannote/pyannote-audio into sarvamai/pyannote-audio. This fork signal points to upstream code the lab may be inspecting, patching, or building on. High-signal details: repo sarvamai/pyannote-audio · parent pyannote/pyannote-audio · routine fork, low traction. onlylabs links this event to 1 captured evidence page and 6 related fork signals.

Sarvam AI Fork: sarvamai/pyannote-audio

Captured source


GitHub · github.com/sarvamai/pyannote-audio

sarvamai/pyannote-audio repository metadata


Published: Jun 21, 2025 · Seen: Jun 5 · Captured: Jun 11 · HTTP status: 200 · Method: plain

sarvamai/pyannote-audio

Description: Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

License: MIT

Stars: 1

Forks: 1

Open issues: 0

Created: 2025-06-21T15:19:14Z

Pushed: 2025-06-23T19:55:12Z

Default branch: main

Fork: yes

Parent repository: pyannote/pyannote-audio

Archived: no

README: Using the pyannote.audio open-source toolkit in production? Consider switching to pyannoteAI for better and faster options.

`pyannote.audio` speaker diarization toolkit

pyannote.audio is an open-source toolkit written in Python for speaker diarization. Based on the [PyTorch](https://pytorch.org) machine learning framework, it comes with state-of-the-art pretrained models and pipelines that can be further fine-tuned to your own data for even better performance.

TL;DR

1. Install `pyannote.audio` with `pip install pyannote.audio`
2. Accept `pyannote/segmentation-3.0` user conditions
3. Accept `pyannote/speaker-diarization-3.1` user conditions
4. Create an access token at `hf.co/settings/tokens`

```python
import torch
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")

# send pipeline to GPU (when available)
if torch.cuda.is_available():
    pipeline.to(torch.device("cuda"))

# apply pretrained pipeline
diarization = pipeline("audio.wav")

# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_0
# start=1.8s stop=3.9s speaker_1
# start=4.2s stop=5.7s speaker_0
# ...
```
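The turns printed by the loop above can be post-processed with plain Python, for example to total how long each speaker talks. This is a minimal sketch, assuming the turns have already been collected as `(start, end, label)` tuples; the `turns` list (copied from the sample output comments) and the `speaking_time` helper are hypothetical and not part of pyannote.audio. With a real pipeline, the same list could be built as `[(t.start, t.end, s) for t, _, s in diarization.itertracks(yield_label=True)]`.

```python
from collections import defaultdict

# Hypothetical input: (start in seconds, end in seconds, speaker label),
# copied from the sample output shown in the README snippet.
turns = [
    (0.2, 1.5, "0"),
    (1.8, 3.9, "1"),
    (4.2, 5.7, "0"),
]


def speaking_time(turns):
    """Sum the duration (in seconds) of every turn, grouped by speaker label."""
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)


for speaker, seconds in sorted(speaking_time(turns).items()):
    print(f"speaker_{speaker}: {seconds:.1f}s")
```

Because turns can overlap when two people speak at once, these per-speaker totals may add up to more than the length of the recording.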

Highlights

- 🤗 pretrained pipelines (and models) on the 🤗 model hub
- 🤯 state-of-the-art performance (see [Benchmark](#benchmark))
- 🐍 Python-first API
- ⚡ multi-GPU training with pytorch-lightning

Documentation

- [Changelog](CHANGELOG.md)
- [Frequently asked questions](FAQ.md)
- Models
  - Available tasks explained
  - [Applying a pretrained model](tutorials/applying_a_model.ipynb)
  - [Training, fine-tuning, and transfer learning](tutorials/training_a_model.ipynb)
- Pipelines
  - Available pipelines explained
  - [Applying a pretrained pipeline](tutorials/applying_a_pipeline.ipynb)
  - [Adapting a pretrained pipeline to your own data](tutorials/adapting_pretrained_pipeline.ipynb)
  - [Training a pipeline](tutorials/voice_activity_detection.ipynb)
- Contributing
  - [Adding a new model](tutorials/add_your_own_model.ipynb)
  - [Adding a new task](tutorials/add_your_own_task.ipynb)
  - Adding a new pipeline
  - Sharing pretrained models and pipelines
- Blog
  - 2022-12-02 > ["How I reached 1st place at Ego4D 2022, 1st place at Albayzin 2022, and 6th place at VoxSRC 2022 speaker diarization challenges"](tutorials/adapting_pretrained_pipeline.ipynb)
  - 2022-10-23 > "One speaker segmentation model to rule them all"
  - 2021-08-05 > "Streaming voice activity detection with pyannote.audio"
- Videos
  - Introduction to speaker diarization / JSALT 2023 summer school / 90 min
  - Speaker segmentation model / Interspeech 2021 / 3 min
  - First release of pyannote.audio / ICASSP 2020 / 8 min
- Community contributions (not maintained by the core team)
  - 2024-04-05 > [Offline speaker diarization (speaker-diarization-3.1)](tutorials/community/offline_usage_speaker_diarization.ipynb) by Simon Ottenhaus

Benchmark

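Out of the box, the pyannote.audio speaker diarization pipeline v3.1 is expected to be much better (and faster) than v2.x: in the table below it has a lower error than v2.1 on 12 of the 13 benchmarks. The numbers are diarization error rates (DER, in %; lower is better):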

| Benchmark | v2.1 | v3.1 | pyannoteAI |
| --- | ---: | ---: | ---: |
| AISHELL-4 | 14.1 | 12.2 | 11.9 |
| AliMeeting (channel 1) | 27.4 | 24.4 | 22.5 |
| AMI (IHM) | 18.9 | 18.8 | 16.6 |
| AMI (SDM) | 27.1 | 22.4 | 20.9 |
| AVA-AVD | 66.3 | 50.0 | 39.8 |
| CALLHOME (part 2) | 31.6 | 28.4 | 22.2 |
| DIHARD 3 (full) | 26.9 | 21.7 | 17.2 |
| Earnings21 | 17.0 | 9.4 | 9.0 |
| Ego4D (dev.) | 61.5 | 51.2 | 43.8 |
| MSDWild | 32.8 | 25.3 | 19.8 |
| RAMC | 22.5 | 22.2 | 18.4 |
| REPERE (phase2) | 8.2 | 7.8 | 7.6 |
| VoxConverse (v0.3) | 11.2 | 11.3 | 9.4 |

Diarization error rate (in %)
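DER sums three kinds of error: speech detected where there is none (false alarm), speech that was missed, and speech assigned to the wrong speaker (confusion). It then divides that sum by the total duration of reference speech. The sketch below assumes this standard definition and also turns two columns of the table into a relative improvement. Both helpers, `diarization_error_rate` and `relative_reduction`, are hypothetical illustrations, not functions from pyannote.audio or pyannote.metrics.

```python
def diarization_error_rate(false_alarm: float, missed: float,
                           confusion: float, total_speech: float) -> float:
    """Return DER in percent (standard definition, assumed here).

    All arguments are durations in seconds; total_speech is the total
    duration of speech in the reference annotation.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return 100.0 * (false_alarm + missed + confusion) / total_speech


def relative_reduction(old_der: float, new_der: float) -> float:
    """Relative DER reduction in percent; positive means new_der is better."""
    return 100.0 * (old_der - new_der) / old_der


# DER values (in %) for v2.1 and v3.1, copied from the benchmark table.
benchmark = {
    "AISHELL-4": (14.1, 12.2),
    "Earnings21": (17.0, 9.4),
    "VoxConverse (v0.3)": (11.2, 11.3),
}

for name, (v21, v31) in benchmark.items():
    print(f"{name}: {relative_reduction(v21, v31):+.1f}% relative DER reduction")
```

The table values are rounded to one decimal, so relative reductions computed from them are approximate. A negative result, as for VoxConverse, means v3.1 scored slightly worse than v2.1 on that benchmark.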

Citations

If you use pyannote.audio, please use the following citations:

@inproceedings{Plaquet23,
author={Alexis Plaquet and Hervé Bredin},
title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
year=2023,
booktitle={Proc. INTERSPEECH 2023},
}

@inproceedings{Bredin23,
author={Hervé...


Notability

notability 2.0/10

Routine fork, low traction