Repo · StepFun · published Aug 14, 2025

stepfun-ai/NextStep-1

Python


Description: [🚀 ICLR 2026 Oral] NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by StepFun's Multimodal Intelligence team.

Language: Python

License: Apache-2.0

Stars: 689

Forks: 26

Open issues: 0

Created: 2025-08-14T08:50:25Z

Pushed: 2026-02-27T17:05:44Z

Default branch: main

Fork: no

Archived: no

README:

NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale

> Autoregressive models, which generate content step by step like reading a sentence, excel in language but struggle with images. Traditionally, they either depend on costly diffusion models or compress images into discrete, lossy tokens via vector quantization (VQ).
>
> NextStep-1 takes a different path: a 14B-parameter autoregressive model that works directly with continuous image tokens, preserving the full richness of visual data. It models sequences of discrete text tokens and continuous image tokens jointly, using a standard LM head for text and a lightweight 157M-parameter flow matching head for visuals. This unified next-token prediction framework is simple, scalable, and capable of producing highly detailed images.
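The role of the flow matching head can be illustrated with a toy, pure-Python sketch. This is not NextStep-1's code: the function names, the straight noise-to-data path, and the closed-form `toy_velocity` are all assumptions made for illustration, standing in for the learned 157M-parameter head. The sketch shows the regression target for one common flow matching formulation and how a continuous token could be sampled by Euler-integrating a velocity field from Gaussian noise.

```python
import random

def interpolate(noise, data, t):
    """Point on the straight path from noise (t=0) to data (t=1)."""
    return [(1.0 - t) * n + t * d for n, d in zip(noise, data)]

def target_velocity(noise, data):
    """Regression target on a straight path: the constant velocity data - noise."""
    return [d - n for n, d in zip(noise, data)]

def toy_velocity(x, t, data):
    """Hypothetical stand-in for a trained head: the exact velocity toward `data`.
    A real head would predict this from the transformer's hidden state instead."""
    return [(d - xi) / (1.0 - t) for xi, d in zip(x, data)]

def sample_continuous_token(data, num_steps=8, seed=0):
    """Euler-integrate the velocity field from Gaussian noise (t=0) to t=1."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in data]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k / num_steps  # t stays below 1, so the division in toy_velocity is safe
        v = toy_velocity(x, t, data)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

token = sample_continuous_token([0.5, -1.25, 2.0])
print(token)
```

Because the output is a vector of real numbers rather than an index into a codebook, no quantization step is involved; text positions would instead go through the usual softmax over the vocabulary.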

🔥 News

  • Feb. 25, 2026: vLLM-Omni supports high-performance inference of NextStep-1.1. Please check here for details!
  • Feb. 16, 2026: The training code of NextStep-1 (this repo) and the post-training blogs of NextStep-1.1 (link) have been released. Welcome to discuss and contribute. Happy Chinese New Year!
  • Feb. 6, 2026: NextStep-1 has been selected for an oral presentation at ICLR 2026! 🎉🎉🎉
  • Dec. 24, 2025: 🔥 We release NextStep-1.1, a text-to-image model that substantially elevates output quality through extended training and a Flow-based Reinforcement Learning (RL) post-training paradigm. Feel free to try with checkpoints hosted on our HF repo!

Checkpoints are available on:

  • Aug. 18, 2025: 👋 We deploy NextStep-1-Large-Edit on HuggingFace Spaces. Feel free to try it out!
  • Aug. 18, 2025: 👋 We open the [WeChat Group](./assets/wechat.png). Feel free to join us!
  • Aug. 14, 2025: 👋 We release the inference code and Hugging Face model weights of NextStep-1-Large-Pretrain, NextStep-1-Large, and NextStep-1-Large-Edit.

---

📑 Table of Contents

  • [🔥 News](#-news)
  • [📦 Installation & Environment](#-installation--environment)
  • [📥 Model & Data Preparation](#-model--data-preparation)
      • [2.1 Download Model Weights](#21-download-model-weights)
      • [2.2 Download Training Datasets](#22-download-training-datasets)
      • [2.3 Process Custom Data (Optional)](#23-process-custom-data-optional)
  • [🚀 Training](#-training)
      • [3.1 Start Training (via smartrun)](#31-start-training-via-smartrun)
      • [3.2 Override Training Parameters](#32-override-training-parameters)
      • [3.3 Inspect and Compare Configurations](#33-inspect-and-compare-configurations)
  • [🔮 Inference](#-inference)
      • [4.1 Convert Checkpoint Format](#41-convert-checkpoint-format)
      • [4.2 Run Inference](#42-run-inference)
  • [📚 References](#-references)
  • [📄 License](#-license)
  • [📖 Citation](#-citation)

---

📦 Installation & Environment

1.1 Clone the Repository

git clone https://github.com/stepfun-ai/NextStep-1
cd NextStep-1

1.2 Create Conda Environment

conda create -n nextstep python=3.10 -y
conda activate nextstep

1.3 Install Dependencies

> ⚠️ Note: Pre-installing PyTorch based on your CUDA version is recommended.

pip install uv
uv pip install -e .

> ☕ Tip: This installation may take a while. Grab a cup of coffee and take a break! ☕
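Since the note above recommends pre-installing PyTorch for your CUDA version, a quick check before and after installation can confirm which build is in the environment. The helper below is a minimal sketch, not part of this repository; `describe_torch` is a hypothetical name, and it relies only on standard PyTorch attributes.

```python
def describe_torch():
    """Report whether PyTorch is importable and which CUDA build it targets."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,            # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),  # False if no usable GPU/driver
    }

print(describe_torch())
```

If `cuda_build` does not match the CUDA toolkit supported by your driver, reinstalling PyTorch before running `uv pip install -e .` is likely the simpler fix.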

1.4 Built-in CLI Tools

The following CLI tools are available after installation:

  • `smartrun`: An intelligent distributed launcher that automatically wraps torchrun parameters.
  • `gen_meta`: Scans datasets to generate metadata indices (sample counts, checksums, etc.).
  • `warmup_data`: Pre-warms and caches data indices to significantly speed up training startup.
  • `eshow`: Inspect or compare experiment configurations.
  • `singlegpu_debug` / `multigpu_debug`: Dedicated debug entries for remote attachment.
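To confirm that the tools listed above were installed into the active environment, one option is to look them up on `PATH`. This is a small sketch using only the standard library; `find_cli_tools` is a hypothetical helper, and the tool names are taken from the list above.

```python
import shutil

CLI_TOOLS = [
    "smartrun",
    "gen_meta",
    "warmup_data",
    "eshow",
    "singlegpu_debug",
    "multigpu_debug",
]

def find_cli_tools(names=CLI_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}

for name, path in find_cli_tools().items():
    print(f"{name:16s} {path or 'NOT FOUND'}")
```

A tool reported as `NOT FOUND` usually means the `nextstep` conda environment is not activated or the editable install did not complete.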

---

📥 Model & Data Preparation

2.1 Download Model Weights

Download models to `./nextstep_models`. Please update the corresponding paths in `nextstep/model_zoos.py`.

bash download_models.sh

> ☕ Tip: This download may take a while. Grab a cup of coffee and take a break! ☕
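After the download finishes, it can help to see what actually landed on disk before editing paths. The sketch below is an assumption-laden helper, not part of the repository: `summarize_download` is a hypothetical name, and it assumes each model is stored in its own subdirectory under `./nextstep_models`, which the README does not confirm.

```python
from pathlib import Path

def summarize_download(root="./nextstep_models"):
    """Return {subdirectory name: total size in bytes} for each directory under root."""
    root_path = Path(root)
    if not root_path.is_dir():
        return {}  # nothing downloaded yet, or a different target directory was used
    summary = {}
    for child in sorted(root_path.iterdir()):
        if child.is_dir():
            summary[child.name] = sum(
                f.stat().st_size for f in child.rglob("*") if f.is_file()
            )
    return summary

for name, size in summarize_download().items():
    print(f"{name}: {size / 1e9:.2f} GB")
```

An empty result or an unexpectedly small size suggests the download was interrupted and `download_models.sh` should be re-run.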

Available Models

The following table lists all available models and their training stages:

| Model | Pre-Training 256px | Pre-Training 512px | Annealing | RL | Visual Diversity | Fine-Tunability | Hugging Face |
|-------|-------------------|-------------------|----------|----|-----------|------------------|--------------|

> ⚠️ Note: The NextStep-1 series models are an older version and perform worse than NextStep-1.1, so we do not recommend using them. Please use the NextStep-1.1 series models instead.

> 💡 Quick Inference: To run inference with the model quickly, use the inference script below.

python3 inference/inference.py

2.2 Download Training Datasets

Download datasets to ./nextstep_data.

bash download_datasets.sh

> ☕ Tip: This download may take a while. Grab a cup of coffee and take a break! ☕

> ⚠️ Important Note: The datasets provided in download_datasets.sh are only example open-source datasets for demonstration purposes. NextStep's actual training utilized approximately **1…
