Fork · DeepInfra · published May 29, 2025

deepinfra/Kokoro-FastAPI

forked from remsky/Kokoro-FastAPI

Description: Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model with CPU ONNX and NVIDIA GPU PyTorch support, handling, and auto-stitching

License: Apache-2.0

Stars: 0

Forks: 0

Open issues: 0

Created: 2025-05-29T21:01:19Z

Pushed: 2025-05-28T14:57:55Z

Default branch: master

Fork: yes

Parent repository: remsky/Kokoro-FastAPI

Archived: no

README:

_FastKoko_

Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model

  • Multi-language support (English, Japanese, Chinese, _Vietnamese soon_)
  • OpenAI-compatible Speech endpoint, with NVIDIA GPU-accelerated or CPU inference via PyTorch
  • ONNX support coming soon; see v0.1.5 and earlier for legacy ONNX support in the interim
  • Debug endpoints for monitoring system stats, integrated web UI on localhost:8880/web
  • Phoneme-based audio generation, phoneme generation
  • Per-word timestamped caption generation
  • Voice mixing with weighted combinations

Integration Guides

Get Started

Quickest Start (docker run)

Prebuilt images are available to run, with arm/multi-arch support and baked-in models. Refer to the core/config.py file for a full list of variables that can be managed via the environment.

# The `latest` tag can be used, though it may have some unexpected bonus features that impact stability.
# Named versions should be pinned for regular usage.
# Feedback/testing is always welcome.

docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:latest # CPU, or:
docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:latest # NVIDIA GPU

Quick Start (docker compose)

1. Install prerequisites and start the service using Docker Compose (full setup, including the UI):

  • Install Docker
  • Clone the repository:
git clone https://github.com/remsky/Kokoro-FastAPI.git
cd Kokoro-FastAPI

cd docker/gpu # For GPU support
# or cd docker/cpu # For CPU support
docker compose up --build

# Note for Apple Silicon (M1/M2/M3) users:
# The current GPU build relies on CUDA, which is not supported on Apple Silicon.
# If you are on an M1/M2/M3 Mac, please use the `docker/cpu` setup.
# MPS (Apple's GPU acceleration) support is planned but not yet available.

# Models will auto-download, but if needed you can manually download:
python docker/scripts/download_model.py --output api/src/models/v1_0

# Or run directly via UV:
./start-gpu.sh # For GPU support
./start-cpu.sh # For CPU support

Direct Run (via uv)

1. Install prerequisites:

  • Install astral-uv
  • Install espeak-ng on your system if you want it available as a fallback for unknown words/sounds. The upstream libraries may attempt to handle this, but results have varied.
  • Clone the repository:
git clone https://github.com/remsky/Kokoro-FastAPI.git
cd Kokoro-FastAPI

Run the model download script (docker/scripts/download_model.py) if you haven't already.

Start directly via UV (with hot-reload)

Linux and macOS

./start-cpu.sh OR
./start-gpu.sh

Windows

.\start-cpu.ps1 OR
.\start-gpu.ps1

Up and Running?

Run locally as an OpenAI-Compatible Speech Endpoint

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8880/v1", api_key="not-needed"
)

with client.audio.speech.with_streaming_response.create(
    model="kokoro",
    voice="af_sky+af_bella",  # single voice or a multi-voicepack combination
    input="Hello world!"
) as response:
    response.stream_to_file("output.mp3")

  • The API will be available at http://localhost:8880
  • API Documentation: http://localhost:8880/docs
  • Web Interface: http://localhost:8880/web
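
After `docker run` or `docker compose up`, the server may need some time to load the model before it starts answering requests. A minimal sketch, using only the Python standard library, that polls the API documentation page listed above until it returns HTTP 200; `wait_for_server` and its timeout defaults are hypothetical and not part of Kokoro-FastAPI.

```python
import time
import urllib.request


def wait_for_server(
    url: str = "http://localhost:8880/docs",
    timeout: float = 60.0,
    interval: float = 1.0,
) -> bool:
    """Poll `url` until it answers HTTP 200 or `timeout` seconds pass.

    Hypothetical helper: /docs is used only because the server is
    documented to serve it, not because it is a dedicated health check.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused/reset, timeouts, and HTTP error statuses
            # (urllib.error.URLError subclasses OSError): not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A start-up script could call `wait_for_server()` and exit with an error if it returns `False`, instead of sending the first speech request blindly.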

Features

OpenAI-Compatible Speech Endpoint

# Using OpenAI's Python library
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8880/v1", api_key="not-needed")
response = client.audio.speech.create(
    model="kokoro",
    voice="af_bella+af_sky",  # see /api/src/core/openai_mappings.json to customize
    input="Hello world!",
    response_format="mp3"
)

# Write the complete (non-streamed) response body to disk
response.write_to_file("output.mp3")

Or via requests:

import requests

# List the available voices
response = requests.get("http://localhost:8880/v1/audio/voices")
voices = response.json()["voices"]

# Generate audio
response = requests.post(
    "http://localhost:8880/v1/audio/speech",
    json={
        "model": "kokoro",
        "input": "Hello world!",
        "voice": "af_bella",
        "response_format": "mp3",  # Supported: mp3, wav, opus, flac, m4a, pcm
        "speed": 1.0
    }
)
response.raise_for_status()  # fail loudly instead of saving an error body as audio

# Save audio
with open("output.mp3", "wb") as f:
    f.write(response.content)
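
The request body above can also be built and checked before anything is sent. A minimal sketch, assuming the field names and defaults shown in the example; `speech_payload` is a hypothetical helper, its format list is copied from the output formats listed in this README, and the positive-speed check is an assumption rather than the server's documented range.

```python
SUPPORTED_FORMATS = ("mp3", "wav", "opus", "flac", "m4a", "pcm")


def speech_payload(
    text: str,
    voice: str = "af_bella",
    response_format: str = "mp3",
    speed: float = 1.0,
    model: str = "kokoro",
) -> dict:
    """Build a JSON body for POST /v1/audio/speech (hypothetical helper)."""
    if not text.strip():
        raise ValueError("input text must not be empty")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if speed <= 0:
        # Assumption: only rejects obviously invalid values; the server
        # may enforce a narrower range of its own.
        raise ValueError("speed must be positive")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

The result can be passed straight to `requests.post("http://localhost:8880/v1/audio/speech", json=speech_payload("Hello world!", response_format="wav"))`, saving `response.content` under a matching file extension.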

Quick tests (run from another terminal):

python examples/assorted_checks/test_openai/test_openai_tts.py # Test OpenAI Compatibility
python examples/assorted_checks/test_voices/test_all_voices.py # Test all available voices

Voice Combination

  • Weighted voice combinations using ratios (e.g., "af_bella(2)+af_heart(1)" for 67%/33% mix)
  • Ratios are automatically normalized to sum to 100%
  • Available through any endpoint by adding weights in parentheses
  • Saves generated voicepacks for future use
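
The weight normalization described above can be reproduced on the client side, for example to preview the mix a voice string will produce. A minimal sketch, assuming the `name(weight)+name(weight)` syntax used in these examples, with a missing weight counting as 1; `parse_voice_mix` is a hypothetical helper, not the server's parser, and the server may treat edge cases (zero weights, repeated names) differently.

```python
import re

# One term: a voice name, optionally followed by a weight in parentheses.
_TERM = re.compile(r"^\s*([A-Za-z0-9_]+)\s*(?:\(\s*([0-9]*\.?[0-9]+)\s*\))?\s*$")


def parse_voice_mix(spec: str) -> dict[str, float]:
    """Split e.g. "af_bella(2)+af_heart(1)" into weights that sum to 1."""
    weights: dict[str, float] = {}
    for term in spec.split("+"):
        match = _TERM.match(term)
        if match is None:
            raise ValueError(f"cannot parse voice term: {term!r}")
        name = match.group(1)
        weight = float(match.group(2) or 1.0)  # unweighted terms count as 1
        if weight <= 0:
            raise ValueError(f"weight must be positive: {term!r}")
        # Assumption: a repeated voice name adds its weights together.
        weights[name] = weights.get(name, 0.0) + weight
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```

For example, `parse_voice_mix("af_bella(2)+af_heart(1)")` gives weights of 2/3 and 1/3, which round to the 67%/33% split quoted above, and `parse_voice_mix("af_bella+af_sky")` gives an even 0.5/0.5 split.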

Combine voices and generate audio:

import requests

response = requests.get("http://localhost:8880/v1/audio/voices")
voices = response.json()["voices"]

# Example 1: Simple voice combination (50%/50% mix)
response = requests.post(
    "http://localhost:8880/v1/audio/speech",
    json={
        "input": "Hello world!",
        "voice": "af_bella+af_sky",  # Equal weights
        "response_format": "mp3"
    }
)

# Example 2: Weighted voice combination (67%/33% mix)
response = requests.post(
    "http://localhost:8880/v1/audio/speech",
    json={
        "input": "Hello world!",
        "voice": "af_bella(2)+af_sky(1)",  # 2:1 ratio = 67%/33%
        "response_format": "mp3"
    }
)

# Example 3: Download combined voice as .pt file
response = requests.post(
    "http://localhost:8880/v1/audio/voices/combine",
    json="af_bella(2)+af_sky(1)"  # 2:1 ratio = 67%/33%
)

# Save the .pt file
with open("combined_voice.pt", "wb") as f:
    f.write(response.content)

# Use the downloaded voice file
response = requests.post(
    "http://localhost:8880/v1/audio/speech",
    json={
        "input": "Hello world!",
        "voice": "combined_voice",  # Use the saved voice file
        "response_format": "mp3"
    }
)

Multiple Output Audio Formats

  • mp3
  • wav
  • opus
  • flac
  • m4a
  • pcm
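
In OpenAI's speech API, `pcm` means raw audio samples with no file header, so most players cannot open it directly. A minimal sketch that wraps such bytes in a WAV container with Python's standard `wave` module, assuming this server follows the same convention (16-bit signed little-endian mono samples at 24 kHz); check those parameters against your deployment, and note that `pcm_to_wav` is a hypothetical helper.

```python
import io
import wave


def pcm_to_wav(
    pcm: bytes,
    sample_rate: int = 24000,  # assumption: 24 kHz output
    channels: int = 1,         # assumption: mono
    sample_width: int = 2,     # assumption: 16-bit samples
) -> bytes:
    """Prepend a WAV header to headerless PCM audio."""
    frame_size = channels * sample_width
    if len(pcm) % frame_size:
        raise ValueError("PCM data is not a whole number of frames")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)  # sample data goes into the WAV data chunk
    return buffer.getvalue()
```

To use it, request `"response_format": "pcm"` as in the examples above, then write `pcm_to_wav(response.content)` to a file such as `output.wav`.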

Streaming Support

# OpenAI-compatible streaming
from openai import OpenAI
client = OpenAI(…

