stepfun-ai/StepAudio-Skills
Description: Audio skills for Claw
Language: Python
License: Apache-2.0
Stars: 26
Forks: 2
Open issues: 0
Created: 2026-03-09T12:48:01Z
Pushed: 2026-04-16T10:17:46Z
Default branch: main
Fork: no
Archived: no
README:
StepAudio-Skills (StepFun TTS + ASR + Speech Reasoning skills)
This repository combines three standalone StepFun skills:
- step-tts: text-to-speech and voice cloning via StepFun TTS
- step-asr: speech-to-text via StepFun ASR streaming API
- stepfun-step-audio-r1-1: non-streaming speech reasoning turns via StepFun Chat Completions (step-audio-r1.1)
The three skills share one repo layout, while their underlying implementations remain separate:
- TTS stays in shell: skills/step-tts/scripts/tts.sh
- ASR stays in Python: skills/step-asr/scripts/transcribe.py
- Speech reasoning stays in Python: skills/stepfun-step-audio-r1-1/scripts/stepfun_audio_chat.py
Layout
- skills/step-tts/SKILL.md: Agent-facing description, triggers, and usage examples for TTS / voice clone
- skills/step-tts/scripts/tts.sh: Main TTS CLI entrypoint
- skills/step-asr/SKILL.md: Agent-facing description, triggers, and usage examples for ASR
- skills/step-asr/scripts/transcribe.py: Main ASR CLI entrypoint
- skills/stepfun-step-audio-r1-1/SKILL.md: Agent-facing description, triggers, and usage examples for StepFun speech reasoning
- skills/stepfun-step-audio-r1-1/scripts/stepfun_audio_chat.py: Main non-streaming StepFun speech reasoning CLI
- tests/test_step_tts_cli.sh: Smoke tests for the TTS CLI help commands
- tests/test_step_asr_cli.sh: Smoke tests for the ASR CLI help commands
- tests/test_stepfun_audio_r1_1_cli.sh: Smoke tests for the StepFun speech reasoning CLI
Prerequisites
- bash, curl, python3
- A valid StepFun API key
- Optional for stepfun-step-audio-r1-1 local audio normalization: ffmpeg or macOS afconvert
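To check which of the optional converters is available before relying on local audio normalization, something like the following sketch may help. The function name `pick_audio_converter` and the preference order (ffmpeg first, then afconvert) are assumptions for illustration; the skill's own script may choose differently.

```python
import shutil
from typing import Callable, Optional


def pick_audio_converter(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the path of the first converter found on PATH, or None.

    Assumption: ffmpeg is tried before afconvert. The README only says
    that either tool can be used for normalization.
    """
    for tool in ("ffmpeg", "afconvert"):
        path = which(tool)
        if path:
            return path
    return None
```

A `None` result means neither tool is installed, in which case input audio would have to be supplied in a format the API already accepts.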
Shared API key setup
- Preferred environment variable: STEPFUN_API_KEY
- Legacy alias still accepted for compatibility: STEP_API_KEY
- The step-tts config command stores the key in ~/.stepfun_api_key
- All three skills read ~/.stepfun_api_key
- All three skills also read the legacy file ~/.step_api_key if present
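The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the skills' actual code: the function name `resolve_api_key` is hypothetical, and the precedence (environment variables before key files, preferred names before legacy ones) is an assumption, since the README lists the sources but does not state their order.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

ENV_VARS = ("STEPFUN_API_KEY", "STEP_API_KEY")      # preferred, then legacy alias
KEY_FILES = (".stepfun_api_key", ".step_api_key")   # preferred, then legacy file


def resolve_api_key(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> Optional[str]:
    """Return the first non-empty key found, or None.

    Assumed precedence: environment variables first, then files in the
    home directory. The real scripts may order these differently.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    for name in ENV_VARS:
        value = env.get(name, "").strip()
        if value:
            return value

    for filename in KEY_FILES:
        path = home / filename
        if path.is_file():
            value = path.read_text(encoding="utf-8").strip()
            if value:
                return value

    return None
```

Passing `env` and `home` explicitly keeps the function easy to exercise without touching the real environment or home directory.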
Basic usage
List skills from this repo (local dev, from repo root):
npx skills add . --list --full-depth
Note for OpenClaw local installs:
- OpenClaw's project-level skill directory is also named skills/.
- If you run npx skills add ... --agent openclaw inside this source repository, the installer may write into the repo's own skills/ directory and overwrite the source layout.
- For OpenClaw verification, use a separate consumer project directory, or install globally.
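One way to avoid the overwrite described in this note is to check the working directory before running an OpenClaw install. The sketch below is a heuristic only: `looks_like_source_repo` is a hypothetical helper, and the choice of marker files (one entry point plus one test script from the layout above) is an assumption about what distinguishes the source checkout from a consumer project.

```python
from pathlib import Path

# Files taken from this repo's layout. Assumption: a consumer project that
# merely installed the skills would not also contain the tests/ script.
SOURCE_MARKERS = (
    "skills/step-tts/scripts/tts.sh",
    "tests/test_step_tts_cli.sh",
)


def looks_like_source_repo(directory: Path) -> bool:
    """Return True if every marker file exists under `directory`."""
    return all((directory / marker).is_file() for marker in SOURCE_MARKERS)
```

A wrapper script could call `looks_like_source_repo(Path.cwd())` and refuse to run the `--agent openclaw` install when it returns True.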
Install just the TTS skill:
npx skills add . --full-depth --skill step-tts -y
Install just the ASR skill:
npx skills add . --full-depth --skill step-asr -y
Install just the speech reasoning skill:
npx skills add . --full-depth --skill stepfun-step-audio-r1-1 -y
Install all three skills to OpenClaw from a separate consumer project:
cd /path/to/another/project
npx skills add /path/to/StepAudio-Skills --full-depth --agent openclaw -y
TTS quick start
Configure your TTS API key (saved to ~/.stepfun_api_key):
bash skills/step-tts/scripts/tts.sh config --set-api-key YOUR_STEPFUN_API_KEY
Generate audio:
bash skills/step-tts/scripts/tts.sh speak \
  -t "智能阶跃,十倍每一个人的可能" \
  -o step.opus
Defaults for speak:
- --model: step-tts-2
- --voice: elegantgentle-female
- --response-format: opus
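When calling the TTS CLI from Python, the documented defaults can be made explicit in a small wrapper. This is a sketch under stated assumptions: `build_speak_cmd` and `speak` are hypothetical helper names, the script path assumes the repo root as working directory, and only flags that appear in this README are used.

```python
import subprocess
from typing import List

TTS_SCRIPT = "skills/step-tts/scripts/tts.sh"  # relative to the repo root


def build_speak_cmd(
    text: str,
    output: str,
    model: str = "step-tts-2",
    voice: str = "elegantgentle-female",
    response_format: str = "opus",
) -> List[str]:
    """Build the argv for `tts.sh speak`, spelling out the README defaults."""
    return [
        "bash", TTS_SCRIPT, "speak",
        "-t", text,
        "-o", output,
        "--model", model,
        "--voice", voice,
        "--response-format", response_format,
    ]


def speak(text: str, output: str, **overrides: str) -> None:
    """Run the CLI; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_speak_cmd(text, output, **overrides), check=True)
```

Passing the arguments as a list avoids shell quoting problems with non-ASCII or multi-word text.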
Clone a voice (using an existing file_id from StepFun Files API):
bash skills/step-tts/scripts/tts.sh clone-voice \
  --model step-tts-mini \
  --file-id file-XXXX \
  --text "智能阶跃,十倍每一个人的可能" \
  --sample-text "今天天气不错"
The file_id must come from the official StepFun Files API:
- Upload your reference audio (5–10 seconds of the voice you want to clone, mp3 or wav) using `POST https://api.stepfun.com/v1/files`
- Set purpose="storage" in the request body
- The response will contain a File object with an id like file-abc123; pass this value to --file-id
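The upload steps above could be scripted along these lines using only the standard library. Treat this as a sketch, not the official client: the multipart/form-data encoding, the form field name `file`, and the `Authorization: Bearer` header are assumptions not stated in this README, and `build_upload_request` / `upload_reference_audio` are hypothetical names. Only the URL, the `purpose="storage"` value, and the `id` field of the response come from the text.

```python
import json
import urllib.request
import uuid
from pathlib import Path

FILES_URL = "https://api.stepfun.com/v1/files"


def build_upload_request(audio_path: Path, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the upload request for a reference audio file."""
    boundary = uuid.uuid4().hex
    mime = "audio/mpeg" if audio_path.suffix.lower() == ".mp3" else "audio/wav"
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="purpose"\r\n\r\n',
        b"storage\r\n",
        f"--{boundary}\r\n".encode(),
        # Assumption: the audio is sent in a form field named "file".
        f'Content-Disposition: form-data; name="file"; '
        f'filename="{audio_path.name}"\r\n'.encode(),
        f"Content-Type: {mime}\r\n\r\n".encode(),
        audio_path.read_bytes(),
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        FILES_URL,
        data=body,
        method="POST",
        headers={
            # Assumption: bearer-token authentication.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def upload_reference_audio(audio_path: Path, api_key: str) -> str:
    """Send the request and return the File object's id for --file-id."""
    with urllib.request.urlopen(build_upload_request(audio_path, api_key)) as resp:
        return json.load(resp)["id"]
```

Separating request construction from sending makes it possible to inspect the payload without a network call or a valid key.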
ASR quick start
Set the ASR API key as an environment variable:
export STEPFUN_API_KEY=YOUR_STEPFUN_API_KEY
If you already ran the TTS config command, step-asr can also reuse the shared key saved in ~/.stepfun_api_key.
Transcribe an audio file:
python3 skills/step-asr/scripts/transcribe.py /path/to/audio.wav
Save the transcription to a file:
python3 skills/step-asr/scripts/transcribe.py /path/to/audio.mp3 --out /tmp/transcript.txt
Output as JSON:
python3 skills/step-asr/scripts/transcribe.py /path/to/audio.ogg --json
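For programmatic use, the ASR CLI can be wrapped in a few lines of Python. This sketch assumes that `--json` prints a single JSON document to stdout; the README does not describe the schema, so the parsed value is returned as-is. The helper names are hypothetical, and combining `--out` with `--json` is not shown in the README, so that combination is untested here.

```python
import json
import subprocess
import sys
from typing import Any, List, Optional

ASR_SCRIPT = "skills/step-asr/scripts/transcribe.py"  # relative to the repo root


def build_transcribe_cmd(
    audio_path: str,
    out: Optional[str] = None,
    as_json: bool = False,
) -> List[str]:
    """Build the argv for transcribe.py using the flags shown in the README."""
    cmd = [sys.executable, ASR_SCRIPT, audio_path]
    if out is not None:
        cmd += ["--out", out]
    if as_json:
        cmd.append("--json")
    return cmd


def transcribe_json(audio_path: str) -> Any:
    """Run the CLI with --json and parse stdout.

    Assumption: stdout contains exactly one JSON document.
    """
    result = subprocess.run(
        build_transcribe_cmd(audio_path, as_json=True),
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Using `sys.executable` runs the script with the same interpreter as the caller, which avoids depending on what `python3` resolves to on PATH.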
Speech reasoning quick start
Reuse the shared StepFun API key from ~/.stepfun_api_key, or export it directly:
export STEPFUN_API_KEY=YOUR_STEPFUN_API_KEY
Create a non-streaming speech-reasoning turn with text in and audio out:
python3 skills/stepfun-step-audio-r1-1/scripts/stepfun_audio_chat.py \
  --prompt "用中文介绍一下苏州的春天,语气自然一点。" \
  --voice wenrounansheng \
  --format wav
Send text plus local audio input for a speech-reasoning turn:
python3 skills/stepfun-step-audio-r1-1/scripts/stepfun_audio_chat.py \
  --prompt "听完这段语音后,总结重点,并用更简洁的话复述。" \
  --input-audio /path/to/input.wav \
  --voice wenrounansheng \
  --format wav
Inspect the generated payload without sending a network request:
python3 skills/stepfun-step-audio-r1-1/scripts/stepfun_audio_chat.py \
  --prompt "测试 step-audio-r1.1 非流式 payload" \
  --dry-run \
  --print-json
Development smoke tests
Run all CLI and unit tests from the repo root:
npm test
Notability
3.0/10. New repo, low stars, routine.