What does this model signal mean?

ByteDance (Doubao/Seed) published ByteDance-Seed/Stable-DiffCoder-8B-Instruct. This model signal is evidence of what shipped on model infrastructure and how the release is positioned. High-signal details: license mit · 170 HF downloads · ByteDance's 8B instruction-tuned code diffusion model. onlylabs links this event to 1 captured evidence page and 6 related model signals.

ByteDance (Doubao/Seed) Model: ByteDance-Seed/Stable-DiffCoder-8B-Instruct

Captured source

Hugging Face/huggingface.co/ByteDance-Seed/Stable-DiffCoder-8B-Instruct

ByteDance-Seed/Stable-DiffCoder-8B-Instruct model card

published Jan 15, 2026 · seen Jun 6 · captured Jun 11 · http 200 · method plain · task text-generation · license mit · library transformers · params 8.3B · downloads 170 · likes 139

Stable-DiffCoder-8B-Instruct

Introduction

We are thrilled to introduce Stable-DiffCoder, which is a strong code diffusion large language model. Built directly on the Seed-Coder architecture, data, and training pipeline, it introduces a block diffusion continual pretraining (CPT) stage with a tailored warmup and block-wise clipped noise schedule.

Under identical architecture and data settings, we systematically analyze and design an efficient diffusion training pipeline that is not only stable but also potentially lifts the model’s performance ceiling. With this recipe, Stable-DiffCoder demonstrates overall performance improvements compared to its autoregressive (AR) counterpart across a broad set of code benchmarks, while any-order modeling improves structured code handling for editing and reasoning, and diffusion-based corruption aids learning for low-resource programming languages.

Notably, with only CPT followed by supervised fine-tuning, Stable-DiffCoder further surpasses many strong ∼8B AR and diffusion-based code models. These results demonstrate that diffusion-based training can improve code modeling quality beyond what AR training alone can achieve, even under tightly controlled data and architecture constraints.

This repo contains the Stable-DiffCoder-8B-Instruct model, which has the following features:

Type: Mask Diffusion Language Models
Training Stage: Pretraining & Post-training
Data Source: Public datasets, synthetic data
Context Length: 8192

Model Downloads

| Model Name | Length | Download | Notes |
|---------------------------------|--------|----------|---------------------------------------------------|
| Stable-DiffCoder-8B-Base | 8K | 🤗 Model | Pretrained on our model-centric code data. |
| 👉 Stable-DiffCoder-8B-Instruct | 8K | 🤗 Model | Instruction-tuned for alignment with user intent. |

Requirements

Inference works with the current transformers release (v5.3.0):

pip install transformers~=5.3.0

Explanation of Inference Parameters

steps: Number of steps for diffusion generation
gen_length: Maximum length of the generated output
block_length: Length of the diffusion block, with a default value of 4
temperature: Temperature for generation, with a default value of 0.0
remasking: Remasking strategy, either 'low_confidence' or 'random', with a default value of 'low_confidence' (for the underlying principle, see LLaDA)
tokenizer: Tokenizer used for text encoding and decoding
shift: Whether to shift the output to the right by one position (as in autoregressive (AR) decoding), with a default value of False
threshold: Confidence threshold for decoding (range: 0-1.0), with a default value of None; a smaller value results in faster decoding (for the underlying principle, see Fast-dLLM)
eos_id: ID of the end-of-sequence token, default value is tokenizer.eos_token_id
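The remasking and threshold parameters above can be illustrated with a toy selection rule. This is a minimal sketch of the general idea described for LLaDA-style low-confidence remasking and Fast-dLLM-style threshold decoding, not the code shipped with the model; the function name `select_positions_to_commit` and its arguments are hypothetical.

```python
from typing import List, Optional


def select_positions_to_commit(
    confidences: List[float],
    masked: List[bool],
    num_to_commit: int,
    threshold: Optional[float] = None,
) -> List[int]:
    """Pick which masked positions to unmask in one denoising step (toy sketch).

    Hypothetical helper: the real sampler lives in the model's remote code
    and may differ in detail.
    """
    # Only positions that are still masked are candidates.
    candidates = [i for i, is_masked in enumerate(masked) if is_masked]
    # Most confident predictions first; ties are broken by position.
    ranked = sorted(candidates, key=lambda i: (-confidences[i], i))
    if threshold is None:
        # Fixed budget: keep the top-k predictions, leave the rest masked.
        return sorted(ranked[:num_to_commit])
    # Threshold mode: commit every prediction above the threshold, but
    # always commit at least one so decoding makes progress.
    above = [i for i in ranked if confidences[i] > threshold]
    return sorted(above if above else ranked[:1])


confidences = [0.95, 0.40, 0.80, 0.99]
masked = [True, True, True, False]  # position 3 is already decoded

# Fixed budget of one token per step.
print(select_positions_to_commit(confidences, masked, num_to_commit=1))
# A threshold lets several confident tokens through in the same step.
print(select_positions_to_commit(confidences, masked, num_to_commit=1, threshold=0.7))
```

In this sketch a lower threshold lets more positions through per step, which matches the note above that smaller values decode faster.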

Quickstart

Here is a simple example demonstrating how to load the model and generate code.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

device = 'cuda'
# trust_remote_code=True loads the custom modeling code shipped in the model repo.
model = AutoModelForCausalLM.from_pretrained('ByteDance-Seed/Stable-DiffCoder-8B-Instruct', trust_remote_code=True, torch_dtype=torch.bfloat16).to(device).eval()
tokenizer = AutoTokenizer.from_pretrained('ByteDance-Seed/Stable-DiffCoder-8B-Instruct', trust_remote_code=True)

# Wrap the request in the chat template, then tokenize it into a batch of size 1.
prompt = 'Write a quick sort algorithm.'
m = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(m, add_generation_prompt=True, tokenize=False)
input_ids = tokenizer(prompt)['input_ids']
input_ids = torch.tensor(input_ids).to(device).unsqueeze(0)

# Diffusion generation; see "Explanation of Inference Parameters" for each argument.
out = model.generate(input_ids, steps=512, gen_length=512, block_length=4, temperature=0., remasking='low_confidence', tokenizer=tokenizer, shift=False, threshold=None, eos_id=tokenizer.eos_token_id)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))
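The Quickstart call sets steps=512, gen_length=512, and block_length=4. The sketch below works out what those values would imply under a LLaDA-style block schedule, where the output is split into blocks and the step budget is divided evenly among them. Whether this model's generate method divides steps in exactly this way is an assumption, and `block_schedule` is a hypothetical helper.

```python
def block_schedule(steps: int, gen_length: int, block_length: int) -> dict:
    """Derive per-block step counts under an assumed LLaDA-style schedule."""
    # Assumption: the output is decoded block by block, left to right.
    if gen_length % block_length != 0:
        raise ValueError("gen_length must be a multiple of block_length")
    num_blocks = gen_length // block_length
    # Assumption: the step budget is shared evenly across blocks.
    if steps % num_blocks != 0:
        raise ValueError("steps must be a multiple of the number of blocks")
    steps_per_block = steps // num_blocks
    return {
        "num_blocks": num_blocks,
        "steps_per_block": steps_per_block,
        # Average number of tokens finalized per denoising step.
        "tokens_per_step": block_length / steps_per_block,
    }


# Values taken from the Quickstart call.
print(block_schedule(steps=512, gen_length=512, block_length=4))
# {'num_blocks': 128, 'steps_per_block': 4, 'tokens_per_step': 1.0}
```

Under these assumptions the Quickstart settings finalize one token per step on average. Halving steps to 256 would double that to two tokens per step.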

Evaluation

Stable-DiffCoder-8B-Instruct has been evaluated on a wide range of coding tasks, including code generation, code reasoning, and code editing, where it achieves stronger performance than a wide range of ∼8B AR models and DLLMs.

Compared with ∼8B AR models:

| Model | HumanEval | MBPP | MHPP | BigCodeBench (Full) | BigCodeBench (Hard) | LiveCodeBench (v5) |
|:-----------------------------:|:---------:|:----:|:----:|:-------------------:|:-------------------:|:-------------------------:|
| CodeLlama-7B-Instruct | 40.9 | 54.0 | 6.7 | 25.7 | 4.1 | 3.6 |
| DeepSeek-Coder-6.7B-Instruct | 74.4 | 74.9 | 20.0 | 43.8 | 15.5 | 9.6 |
| CodeQwen1.5-7B-Chat | 83.5 | 77.7 | 17.6 | 43.6 | 15.5 | 3.0 |
| Yi-Coder-9B-Chat | 82.3 | 82.0 | 26.7 | 49.0 | 17.6 | 17.5 |
| Llama-3.1-8B-Instruct | 68.3 | 70.1 | 17.1 | 40.5 | 13.5 | 11.5 |
| OpenCoder-8B-Instruct | 83.5 | 79.1 | 30.5 | 50.9 | 18.9 | 17.1 |
| Qwen2.5-Coder-7B-Instruct | 88.4 | 83.5 | 26.7 | 48.8 | 20.3 | 17.3 |
| Qwen3-8B | 84.8 | 77.0 | 32.8 | 51.7 | 23.0 | 23.5 |
| Seed-Coder-8B-Instruct | 84.8 | 85.2 | 36.2 | 53.3 | 26.4 | 24.7 |
| Stable-DiffCoder-8B-Instruct | 86.6 | 85.7 | 42.4 | 54.8 | 31.8 | 23.5 |
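To make the comparison with the AR counterpart concrete, the short script below computes per-benchmark differences between Stable-DiffCoder-8B-Instruct and Seed-Coder-8B-Instruct. The scores are copied from the AR comparison table; the variable names are only illustrative.

```python
# Scores copied from the AR comparison table in this model card.
benchmarks = [
    "HumanEval", "MBPP", "MHPP",
    "BigCodeBench (Full)", "BigCodeBench (Hard)", "LiveCodeBench (v5)",
]
seed_coder = [84.8, 85.2, 36.2, 53.3, 26.4, 24.7]        # AR counterpart
stable_diffcoder = [86.6, 85.7, 42.4, 54.8, 31.8, 23.5]  # diffusion model

# Positive values mean the diffusion model scores higher.
deltas = {
    name: round(diff - ar, 1)
    for name, ar, diff in zip(benchmarks, seed_coder, stable_diffcoder)
}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f}")
```

The largest gains are on MHPP (+6.2) and BigCodeBench (Hard) (+5.4), while LiveCodeBench (v5) is the one benchmark in this table where the AR counterpart stays ahead (-1.2).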

Compared with ∼8B DLLM models:

| Model | HumanEval | HumanEval+ | MBPP | MBPP+ | BigCodeBench (Full) |
|:-----------------------------:|:---------:|:----------:|:----:|:-----:|:-------------------:|
| LLaDA-8B-Instruct | 49.4 | - | 41.0 | - | 16.5 |
| Dream-7B-Instruct | 63.4 | - | 68.3 | - | 10.6 |
| LLaDA-MoE-7B-Instruct | 61.6 | - | 70.0 | - | 20.4 |
| Fast-dLLMv2 | 43.9 | 40.2 | 50.0 | 41.3 | 49.0 |
| DiffuCoder-7B-Instruct | 72.0 | 65.2 | 75.1 | 61.9 | 35.7 |
| Dream-Coder-7B-Instruct | 82.9 | - | 79.6 | - | 37.1 |
| SDAR-8B-Chat | 78.7 | - | 72.0 | - | - |
| WeDLM-8B-Chat | 80.5 | 73.8 | 70.5 | - | - |
| Stable-DiffCoder-8B-Instruct | 86.6 | 82.3 | 85.7 | 72.8 | 54.8 |

For detailed benchmark performance, please refer to our 📑 Technical Report.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Citation

If you find our work helpful, feel free to cite us.

@misc{fan2026stablediffcoderpushingfrontiercode,...

Excerpt shown — open the source for the full document.

Notability

notability 5.0/10

New instruction-tuned code model, modest traction.