What does this fork signal mean?

DeepInfra forked remsky/Kokoro-FastAPI as deepinfra/Kokoro-FastAPI. This fork signal points to upstream code the lab may be inspecting, patching, or building on. High-signal details: repo deepinfra/Kokoro-FastAPI · parent remsky/Kokoro-FastAPI · routine fork, no traction. onlylabs links this event to 1 captured evidence page and 6 related fork signals.

DeepInfra Fork: deepinfra/Kokoro-FastAPI

Captured source

GitHub/github.com/deepinfra/Kokoro-FastAPI

deepinfra/Kokoro-FastAPI repository metadata

published May 29, 2025 · seen Jun 5 · captured Jun 11 · http 200 · method plain

deepinfra/Kokoro-FastAPI

Description: Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model w/CPU ONNX and NVIDIA GPU PyTorch support, handling, and auto-stitching

License: Apache-2.0

Stars: 0

Forks: 0

Open issues: 0

Created: 2025-05-29T21:01:19Z

Pushed: 2025-05-28T14:57:55Z

Default branch: master

Fork: yes

Parent repository: remsky/Kokoro-FastAPI

Archived: no

README:

_`FastKoko`_

Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model

Multi-language support (English, Japanese, Chinese, _Vietnamese soon_)
OpenAI-compatible Speech endpoint, NVIDIA GPU accelerated or CPU inference with PyTorch
ONNX support coming soon, see v0.1.5 and earlier for legacy ONNX support in the interim
Debug endpoints for monitoring system stats, integrated web UI on localhost:8880/web
Phoneme-based audio generation, phoneme generation
Per-word timestamped caption generation
Voice mixing with weighted combinations

Integration Guides

Get Started

Quickest Start (docker run)

Pre-built images are available to run, with arm/multi-arch support and baked-in models. Refer to the core/config.py file for a full list of variables which can be managed via the environment.

# The `latest` tag can be used, though it may have some unexpected bonus features which impact stability.
# Named versions should be pinned for your regular usage.
# Feedback/testing is always welcome.

docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:latest # CPU, or:
docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:latest # NVIDIA GPU

Quick Start (docker compose)

1. Install prerequisites, and start the service using Docker Compose (Full setup including UI):

Install Docker
Clone the repository:

git clone https://github.com/remsky/Kokoro-FastAPI.git
cd Kokoro-FastAPI

cd docker/gpu # For GPU support
# or cd docker/cpu # For CPU support
docker compose up --build

# Note for Apple Silicon (M1/M2/M3) users:
# The current GPU build relies on CUDA, which is not supported on Apple Silicon.
# If you are on an M1/M2/M3 Mac, please use the `docker/cpu` setup.
# MPS (Apple's GPU acceleration) support is planned but not yet available.

# Models will auto-download, but if needed you can manually download:
python docker/scripts/download_model.py --output api/src/models/v1_0

# Or run directly via UV:
./start-gpu.sh # For GPU support
./start-cpu.sh # For CPU support

Direct Run (via uv)

1. Install prerequisites:

Install astral-uv
Install espeak-ng in your system if you want it available as a fallback for unknown words/sounds. The upstream libraries may attempt to handle this, but results have varied.
Clone the repository:

git clone https://github.com/remsky/Kokoro-FastAPI.git
cd Kokoro-FastAPI

Run the model download script if you haven't already

Start directly via UV (with hot-reload)

Linux and macOS

./start-cpu.sh OR
./start-gpu.sh

Windows

.\start-cpu.ps1 OR
.\start-gpu.ps1

Up and Running?

Run locally as an OpenAI-Compatible Speech Endpoint

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8880/v1", api_key="not-needed"
)

with client.audio.speech.with_streaming_response.create(
    model="kokoro",
    voice="af_sky+af_bella",  # single or multiple voicepack combo
    input="Hello world!"
) as response:
    response.stream_to_file("output.mp3")

The API will be available at http://localhost:8880
API Documentation: http://localhost:8880/docs

Web Interface: http://localhost:8880/web
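
Because the models may be downloaded on first start, the endpoints above might not answer immediately. The following is a minimal sketch of a readiness check, not part of the project: the helper names `http_probe` and `wait_until_ready`, the retry counts, and the choice of the `/v1/audio/voices` route as the probe target are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# Assumption: the voices route (used elsewhere in this README) is a cheap probe target.
VOICES_URL = "http://localhost:8880/v1/audio/voices"


def http_probe(url: str = VOICES_URL, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(
    probe: Callable[[], bool] = http_probe,
    attempts: int = 30,
    delay: float = 1.0,
) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_ready()` before sending the first speech request would block until the server responds or the attempts run out; passing a custom `probe` keeps the retry loop testable without a running container.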

Features

OpenAI-Compatible Speech Endpoint

# Using OpenAI's Python library
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8880/v1", api_key="not-needed")
response = client.audio.speech.create(
    model="kokoro",
    voice="af_bella+af_sky",  # see /api/src/core/openai_mappings.json to customize
    input="Hello world!",
    response_format="mp3"
)

response.stream_to_file("output.mp3")

Or Via Requests:

import requests

response = requests.get("http://localhost:8880/v1/audio/voices")
voices = response.json()["voices"]

# Generate audio
response = requests.post(
    "http://localhost:8880/v1/audio/speech",
    json={
        "model": "kokoro",
        "input": "Hello world!",
        "voice": "af_bella",
        "response_format": "mp3",  # Supported: mp3, wav, opus, flac
        "speed": 1.0
    }
)

# Save audio
with open("output.mp3", "wb") as f:
    f.write(response.content)

Quick tests (run from another terminal):

python examples/assorted_checks/test_openai/test_openai_tts.py # Test OpenAI Compatibility
python examples/assorted_checks/test_voices/test_all_voices.py # Test all available voices

Voice Combination

Weighted voice combinations using ratios (e.g., "af_bella(2)+af_heart(1)" for 67%/33% mix)
Ratios are automatically normalized to sum to 100%
Available through any endpoint by adding weights in parentheses
Saves generated voicepacks for future use
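
To make the ratio arithmetic concrete, here is a small client-side sketch of how a weighted voice string could be parsed and normalized. It is an illustration only, not the server's implementation: the function name `parse_voice_mix`, the regular expression, and the rule that an unweighted voice counts as weight 1 are assumptions inferred from the examples in this README.

```python
import re

# Assumption: a term is a voice name optionally followed by a numeric weight in parentheses.
_TERM = re.compile(r"\s*([A-Za-z0-9_]+)\s*(?:\(\s*([0-9]*\.?[0-9]+)\s*\))?\s*")


def parse_voice_mix(spec: str) -> dict[str, float]:
    """Parse 'name(weight)+name(weight)' into fractions that sum to 1.0."""
    weights: dict[str, float] = {}
    for part in spec.split("+"):
        match = _TERM.fullmatch(part)
        if match is None:
            raise ValueError(f"cannot parse voice term: {part!r}")
        name = match.group(1)
        # A missing weight is treated as 1 (assumption based on the equal-mix example).
        weight = float(match.group(2)) if match.group(2) is not None else 1.0
        weights[name] = weights.get(name, 0.0) + weight
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize so the shares add up to 100%.
    return {name: weight / total for name, weight in weights.items()}
```

For example, `parse_voice_mix("af_bella(2)+af_heart(1)")` yields shares of two thirds and one third, which matches the 67%/33% mix described above.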

Combine voices and generate audio:

import requests

response = requests.get("http://localhost:8880/v1/audio/voices")
voices = response.json()["voices"]

# Example 1: Simple voice combination (50%/50% mix)
response = requests.post(
    "http://localhost:8880/v1/audio/speech",
    json={
        "input": "Hello world!",
        "voice": "af_bella+af_sky",  # Equal weights
        "response_format": "mp3"
    }
)

# Example 2: Weighted voice combination (67%/33% mix)
response = requests.post(
    "http://localhost:8880/v1/audio/speech",
    json={
        "input": "Hello world!",
        "voice": "af_bella(2)+af_sky(1)",  # 2:1 ratio = 67%/33%
        "response_format": "mp3"
    }
)

# Example 3: Download combined voice as .pt file
response = requests.post(
    "http://localhost:8880/v1/audio/voices/combine",
    json="af_bella(2)+af_sky(1)"  # 2:1 ratio = 67%/33%
)

# Save the .pt file
with open("combined_voice.pt", "wb") as f:
    f.write(response.content)

# Use the downloaded voice file
response = requests.post(
    "http://localhost:8880/v1/audio/speech",
    json={
        "input": "Hello world!",
        "voice": "combined_voice",  # Use the saved voice file
        "response_format": "mp3"
    }
)

Multiple Output Audio Formats

mp3
wav
opus
flac
m4a
pcm
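
As a hedged illustration of how a client might guard against unsupported formats before calling the speech endpoint, the sketch below validates `response_format` against the list above and derives a matching output filename. The helper `build_speech_request` and the `output.<format>` naming are hypothetical; the payload fields mirror the requests example earlier in this README.

```python
# Formats taken from the list above.
SUPPORTED_FORMATS = ("mp3", "wav", "opus", "flac", "m4a", "pcm")


def build_speech_request(
    text: str,
    voice: str = "af_bella",
    response_format: str = "mp3",
    speed: float = 1.0,
) -> tuple[dict, str]:
    """Return (JSON payload, suggested filename) for POST /v1/audio/speech."""
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported format {response_format!r}; "
            f"expected one of {', '.join(SUPPORTED_FORMATS)}"
        )
    payload = {
        "model": "kokoro",
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    # Hypothetical naming convention: use the format name as the file extension.
    filename = f"output.{response_format}"
    return payload, filename
```

The returned payload can be passed as `json=payload` to `requests.post("http://localhost:8880/v1/audio/speech", ...)`, and the response body written to the suggested filename.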

Streaming Support

# OpenAI-compatible streaming
from openai import OpenAI
client = OpenAI(...

Excerpt shown — open the source for the full document.

Notability

notability 2.0/10

Routine fork, no traction
