stepfun-ai/Step-Realtime-CLI
TypeScript
Language: TypeScript
License: MIT
Stars: 18
Forks: 7
Open issues: 16
Created: 2026-06-01T06:00:45Z
Pushed: 2026-06-11T06:34:56Z
Default branch: main
Fork: no
Archived: no
README:
Step Realtime CLI
step-realtime-cli is a terminal-based AI coding assistant. You can interact with it via text or realtime voice for everyday tasks such as reading code, editing files, and running commands.
Key capabilities
- Voice coding: run `step voice` and, with headphones on, issue spoken instructions; the assistant parses repository context, applies edits, and confirms changes verbally.
- Text chat: run `step` in any working directory to enter the interactive terminal UI and start a task in natural language.
- One-shot tasks: submit a single request via `step exec "..."` and receive the result when execution completes.
- Session resumption: session state is persisted automatically and can be resumed at any time via `step resume`.
- Read-only planning mode: run `step exec --mode plan "..."` so the assistant only reads the code and proposes a plan, which you review and approve before any changes are applied.
Quick start
Requirements
- macOS / Linux, Node.js 20+
- A StepFun API key (a single key may be used for both the coding model and realtime voice; a different provider's key may be configured for the coding side if preferred)
Choose your region
StepFun operates two independent sites; pick the one that matches where your API key was issued. The two sites do not share accounts or keys.
| Region | Console | API endpoint | Installer |
| --- | --- | --- | --- |
| Mainland China (default) | https://platform.stepfun.com/ | https://api.stepfun.com | bash scripts/setup.sh |
| Overseas | https://platform.stepfun.ai/ | https://api.stepfun.ai | bash scripts/setup-overseas.sh |
scripts/setup-overseas.sh runs the same flow as scripts/setup.sh and then rewrites ~/.step-cli/config.json so both the realtime WebSocket and the models-proxy base URL point at api.stepfun.ai. All other flags (--skip-build, --force-config, --uninstall, …) are forwarded verbatim.
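The overseas rewrite amounts to swapping one host for another in the saved configuration. The README does not name the exact config fields for the realtime WebSocket or the models-proxy base URL, so the hypothetical `toOverseas` helper below (a sketch, not the script's actual logic) walks every string value in the parsed JSON and replaces a leading `<scheme>://api.stepfun.com` with `<scheme>://api.stepfun.ai`, leaving paths, ports, and unrelated hosts untouched.

```typescript
// Hypothetical sketch of the overseas rewrite; the real script's field names
// and implementation are not documented in this README.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Matches "<scheme>://api.stepfun.com" only when the host ends there
// (followed by a port, path, query, fragment, or end of string).
const MAINLAND_HOST = /^([a-z][a-z0-9+.-]*:\/\/)api\.stepfun\.com(?=[:\/?#]|$)/i;

function toOverseas(value: Json): Json {
  if (typeof value === "string") {
    return value.replace(MAINLAND_HOST, "$1api.stepfun.ai");
  }
  if (Array.isArray(value)) return value.map(toOverseas);
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [k, v] of Object.entries(value)) out[k] = toOverseas(v);
    return out;
  }
  return value; // numbers, booleans and null are left unchanged
}
```

Matching on the whole scheme-plus-host prefix, rather than a bare substring replace, avoids rewriting lookalike hosts such as `api.stepfun.company`.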
Audio dependencies
scripts/setup.sh (and scripts/setup-overseas.sh) enables AEC by default and will detect or install Chrome automatically. In this default mode, audio capture and playback are handled by Chrome (BrowserAudioDriver), and no additional system-level audio utilities are required.
When AEC is disabled via step aec off (or falls back because Chrome is unavailable), realtime voice switches to the system command-line audio drivers, which require:
- macOS: sox, installable via `brew install sox`
- Linux: ALSA utilities `arecord`/`aplay`, typically provided by `alsa-utils` (e.g. `sudo apt install alsa-utils`)
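The fallback rules above can be summarized as a small decision function. This is an illustrative model only: `chooseAudioDriver`, the `AudioDriver` shape, and the `Platform` type are hypothetical names, not the CLI's internals; only the conditions (AEC on plus Chrome present selects the browser driver, otherwise sox on macOS or arecord/aplay on Linux) come from this README.

```typescript
// Hypothetical model of the driver fallback described above.
type Platform = "darwin" | "linux";

type AudioDriver =
  | { kind: "browser" }                      // Chrome (BrowserAudioDriver) handles capture/playback with AEC
  | { kind: "system"; requires: string[] };  // command-line tools, no echo cancellation

function chooseAudioDriver(aecEnabled: boolean, chromeFound: boolean, platform: Platform): AudioDriver {
  if (aecEnabled && chromeFound) return { kind: "browser" };
  // AEC turned off with `step aec off`, or Chrome unavailable: use system audio tools.
  return platform === "darwin"
    ? { kind: "system", requires: ["sox"] }
    : { kind: "system", requires: ["arecord", "aplay"] };
}
```

The practical consequence: you only need sox or alsa-utils installed if you run without AEC or without Chrome.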
One-shot install
```bash
git clone step-realtime-cli
cd step-realtime-cli
# Mainland China (platform.stepfun.com)
bash scripts/setup.sh
# Overseas (platform.stepfun.ai)
# bash scripts/setup-overseas.sh
```
The installer installs dependencies, builds the executable, registers step on your shell PATH, and initializes the voice components (VAD / AEC).
After installation completes, perform the following two steps:
1. Edit ~/.step-cli/config.json and replace the two apiKey placeholders with valid keys:
   - `model.apiKey`: coding model
   - `voice.realtime.apiKey`: realtime voice (ASR/TTS)
   - When using StepFun, the same value may be used for both fields.
2. Open a new shell so that the updated PATH takes effect.
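Before launching `step`, it can help to confirm that both keys were actually filled in. A minimal sketch, assuming Node.js 20+ and the two dotted paths listed above; `missingApiKeys`, `checkDefaultConfig`, and the placeholder-detection regex are hypothetical, because the installer's exact placeholder text is not shown here.

```typescript
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// The two key paths named in the README.
const KEY_PATHS = ["model.apiKey", "voice.realtime.apiKey"] as const;

// Resolves a dotted path such as "voice.realtime.apiKey" in a parsed object.
function getPath(obj: unknown, path: string): unknown {
  return path.split(".").reduce<unknown>(
    (cur, key) => (cur !== null && typeof cur === "object" ? (cur as Record<string, unknown>)[key] : undefined),
    obj,
  );
}

// Returns key paths that are missing, empty, or look like a placeholder.
// The default placeholder heuristic is an assumption, not the installer's real value.
function missingApiKeys(
  config: unknown,
  looksLikePlaceholder: (v: string) => boolean = (v) => /^<.*>$|your[_-]?api[_-]?key|placeholder/i.test(v),
): string[] {
  return KEY_PATHS.filter((p) => {
    const v = getPath(config, p);
    return typeof v !== "string" || v.trim() === "" || looksLikePlaceholder(v);
  });
}

// Reads ~/.step-cli/config.json and reports which keys still need attention.
function checkDefaultConfig(): string[] {
  const file = join(homedir(), ".step-cli", "config.json");
  return missingApiKeys(JSON.parse(readFileSync(file, "utf8")));
}
```

An empty array from `checkDefaultConfig()` means both fields hold non-empty, non-placeholder-looking strings; it does not prove the keys are valid on the StepFun side.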
Then, from any directory:
```bash
step voice                      # realtime voice conversation
step                            # interactive text UI
step "summarize src/index.ts"   # one-shot task
```
Uninstall
```bash
bash scripts/uninstall.sh
```
This removes the installed executable and PATH entry, while preserving ~/.step-cli/config.json and existing session history.
Voice mode
```bash
step voice
```
Once started, simply begin speaking. The assistant performs speech recognition, repository operations, and voice replies concurrently in realtime.
> Using headphones is strongly recommended: it significantly reduces echo and false triggering caused by speaker output being re-captured by the microphone, and improves both recognition accuracy and conversation stability.
Input modes
- duplex (continuous, default): suitable for natural conversation; relies on VAD to determine when an utterance ends.
- ptt (push-to-talk): more reliable in noisy environments.
VAD (voice activity detection)
The default mode is energy, which is suitable for quiet environments. For noisy environments, or when audio plays through speakers rather than headphones, switch to the more accurate silero model:
```bash
step vad set silero   # switch to silero
step vad status       # show current selection
```
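To see why energy-based detection suits quiet rooms but struggles with noise, here is a generic energy VAD with a "hangover" period, the usual way duplex systems decide an utterance has ended. This is a textbook sketch, not the CLI's implementation; `EnergyVad`, the 0.02 RMS threshold, and the 25-frame hangover (about 500 ms at 20 ms frames) are illustrative assumptions.

```typescript
// Root-mean-square level of one PCM frame with samples in [-1, 1].
function rms(frame: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return frame.length > 0 ? Math.sqrt(sum / frame.length) : 0;
}

type VadEvent = "speech-start" | "speech-end" | null;

class EnergyVad {
  private speaking = false;
  private silentFrames = 0;
  private readonly threshold: number;
  private readonly hangoverFrames: number;

  // threshold: RMS level treated as speech; hangoverFrames: quiet frames needed to end an utterance.
  constructor(threshold = 0.02, hangoverFrames = 25) {
    this.threshold = threshold;
    this.hangoverFrames = hangoverFrames;
  }

  // Feed one frame; returns an event only when the speaking state changes.
  push(frame: Float32Array): VadEvent {
    if (rms(frame) >= this.threshold) {
      this.silentFrames = 0;
      if (!this.speaking) {
        this.speaking = true;
        return "speech-start";
      }
      return null;
    }
    if (this.speaking) {
      this.silentFrames += 1;
      if (this.silentFrames >= this.hangoverFrames) {
        this.speaking = false;
        this.silentFrames = 0;
        return "speech-end";
      }
    }
    return null;
  }
}
```

Any background noise or speaker bleed above the threshold keeps `silentFrames` at zero, so the utterance never ends; a learned model such as silero classifies frames as speech or non-speech instead of relying on loudness alone.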
AEC (acoustic echo cancellation)
When speakers are used instead of headphones, TTS output may be re-captured by the microphone and cause feedback. Enabling AEC mitigates this issue:
```bash
step aec on       # enable AEC
step aec status   # show AEC status (also verifies Chrome availability)
```
AEC requires Chrome to be installed locally. On macOS, the CLI will suggest brew install --cask google-chrome if Chrome is not detected. AEC is not required when using headphones.
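The idea behind echo cancellation is to predict the speaker's contribution to the microphone signal from the audio being played (the TTS output) and subtract it. Chrome's built-in canceller is far more elaborate; the toy below is a textbook normalized least-mean-squares (NLMS) adaptive filter, shown only to illustrate the principle. It ignores double-talk detection, nonlinear speaker distortion, and clock drift, and `nlmsCancel`, `taps`, and `mu` are hypothetical names and values.

```typescript
// Toy NLMS echo canceller: learns the speaker-to-mic echo path and subtracts the predicted echo.
// farEnd: samples sent to the speaker; mic: samples captured by the microphone.
function nlmsCancel(farEnd: Float64Array, mic: Float64Array, taps = 16, mu = 0.5, eps = 1e-6): Float64Array {
  const w = new Float64Array(taps); // current estimate of the echo path impulse response
  const out = new Float64Array(mic.length);
  for (let n = 0; n < mic.length; n++) {
    // Predict the echo as w . [x[n], x[n-1], ..., x[n-taps+1]] and measure input energy.
    let echo = 0;
    let energy = eps;
    for (let k = 0; k < taps; k++) {
      const x = n - k >= 0 ? farEnd[n - k] : 0;
      echo += w[k] * x;
      energy += x * x;
    }
    const e = mic[n] - echo; // residual: near-end speech plus leftover echo
    out[n] = e;
    const step = (mu * e) / energy; // step size normalized by input energy
    for (let k = 0; k < taps; k++) {
      const x = n - k >= 0 ? farEnd[n - k] : 0;
      w[k] += step * x;
    }
  }
  return out;
}
```

The filter must be long enough to cover the real acoustic delay, which in a room is tens to hundreds of milliseconds; headphones avoid the problem entirely because almost none of the played audio reaches the microphone.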
Speech rate
Adjust voice.defaults.speedRatio in ~/.step-cli/config.json. The valid range is 0.5 – 2.0, with a default of 1.1.
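The documented bounds translate directly into a small validation helper. Whether the CLI clamps, rejects, or ignores out-of-range values is not stated here, so the clamping behaviour and the name `resolveSpeedRatio` are assumptions for illustration; only the range 0.5 to 2.0 and the default 1.1 come from this README.

```typescript
// Documented constraints for voice.defaults.speedRatio.
const SPEED_MIN = 0.5;
const SPEED_MAX = 2.0;
const SPEED_DEFAULT = 1.1;

// Hypothetical helper: missing or non-numeric values fall back to the default,
// numeric values are clamped into [SPEED_MIN, SPEED_MAX].
function resolveSpeedRatio(raw: unknown): number {
  if (typeof raw !== "number" || !Number.isFinite(raw)) return SPEED_DEFAULT;
  return Math.min(SPEED_MAX, Math.max(SPEED_MIN, raw));
}
```

Note that a string such as "1.5" in the JSON is treated as invalid here; write the value as a bare number.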
Common commands
```bash
step                          # launch the interactive UI in the current directory
step "look at this bug"       # one-shot task
step voice                    # realtime voice conversation
step resume                   # resume a previous session
step exec --mode plan "..."   # read-only planning mode (does not modify files)
step config show              # display the effective configuration
step config sync --write      # add newly introduced configuration fields after upgrade
step theme                    # export the current theme for customization
```
For the full command list, run step --help.
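One-shot and planning runs can also be driven from a script. A minimal Node.js sketch, assuming `step` is on your PATH, that the result is printed to stdout, and that a non-zero exit status signals failure (none of which this excerpt documents); `buildExecArgs` and `runStep` are hypothetical names, and only the `exec` and `--mode plan` arguments come from the command list above.

```typescript
import { spawnSync } from "node:child_process";

// Builds argv for `step exec [--mode plan] "<prompt>"`.
function buildExecArgs(prompt: string, planOnly = false): string[] {
  const args = ["exec"];
  if (planOnly) args.push("--mode", "plan");
  args.push(prompt); // passed as a single argv entry, so no shell quoting is needed
  return args;
}

// Runs `step` synchronously and returns its stdout (assumed to hold the result).
function runStep(prompt: string, planOnly = false): string {
  const result = spawnSync("step", buildExecArgs(prompt, planOnly), { encoding: "utf8" });
  if (result.error) throw result.error; // e.g. `step` not found on PATH
  if (result.status !== 0) {
    throw new Error(`step exited with status ${result.status}: ${result.stderr}`);
  }
  return result.stdout;
}
```

For example, `runStep("summarize src/index.ts", true)` would request a read-only plan without modifying any files. Using `spawnSync` with an argument array rather than `execSync` with a command string avoids quoting problems when the prompt contains spaces or quotes.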
Configuration
All configuration resides in ~/.step-cli/config.json. Typical adjustments include:
- Switch models: update `model.model` and `model.apiKey`
- Update the voice API key: update `voice.realtime.apiKey`
- VAD / AEC: use the commands listed above rather than editing the JSON manually
- After an upgrade: run `step config sync --write` to populate newly added configuration fields (existing values are preserved)
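The "add new fields, keep existing values" behaviour of `step config sync --write` is essentially a recursive merge in which the user's values always win. The sketch below shows that technique on parsed JSON; it is not the CLI's code, and `addMissingFields` and the example defaults are hypothetical.

```typescript
type Obj = { [key: string]: unknown };

function isPlainObject(v: unknown): v is Obj {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

// Adds keys present in `defaults` but absent from `current`, recursing into nested
// sections; existing values are never overwritten. Returns the merged copy and
// the dotted paths that were added.
function addMissingFields(current: Obj, defaults: Obj): { merged: Obj; added: string[] } {
  const added: string[] = [];
  const walk = (cur: Obj, def: Obj, prefix: string): Obj => {
    const out: Obj = { ...cur }; // start from the user's values
    for (const [key, defVal] of Object.entries(def)) {
      const path = prefix ? `${prefix}.${key}` : key;
      const curVal = cur[key];
      if (!Object.prototype.hasOwnProperty.call(cur, key)) {
        // Deep-copy the default (assumed to be a JSON value) so the result does not alias it.
        out[key] = defVal === undefined ? undefined : JSON.parse(JSON.stringify(defVal));
        added.push(path);
      } else if (isPlainObject(curVal) && isPlainObject(defVal)) {
        out[key] = walk(curVal, defVal, path); // recurse into nested sections
      }
      // Otherwise the existing value is kept, even if its type differs from the default.
    }
    return out;
  };
  return { merged: walk(current, defaults, ""), added };
}
```

Returning the list of added paths makes it easy to report what an upgrade introduced before writing the merged object back to ~/.step-cli/config.json.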
step config path #…