ByteDance-Seed/Cola-DLM
Cola DLM
[English](README.md) · [中文](README_zh.md)
Cola DLM (Continuous Latent Diffusion Language Model) is a hierarchical continuous latent-space diffusion language model. It combines a Text VAE with a block-causal Diffusion Transformer (DiT) prior: the VAE maps text into continuous latent sequences and decodes latents back to tokens, while the DiT performs latent prior transport through Flow Matching.
This model repository contains the HuggingFace-format checkpoint for the paper "Continuous Latent Diffusion Language Model".
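To make the "latent prior transport through Flow Matching" idea concrete, here is a toy sketch of Flow Matching sampling with fixed-step Euler integration. It is not the Cola DLM sampler: the straight-line path `x_t = (1 - t) * x0 + t * x1`, the Euler solver, and the helper names `euler_flow_sample` and `make_oracle_velocity` are assumptions for illustration, and an oracle velocity field stands in for the learned DiT.

```python
import random

def euler_flow_sample(velocity_fn, x0, num_steps=16):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def make_oracle_velocity(target):
    """Toy stand-in for a learned prior (hypothetical helper).

    Under the assumed straight-line path, the velocity that carries the
    current point x to a known target x1 is (x1 - x) / (1 - t).
    """
    def velocity(x, t):
        return [(x1 - xi) / (1.0 - t) for xi, x1 in zip(x, target)]
    return velocity

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4)]   # starting latent (pure noise)
target_latent = [0.5, -1.0, 2.0, 0.0]                # made-up "clean" latent
sample = euler_flow_sample(make_oracle_velocity(target_latent), noise, num_steps=16)
print(sample)
```

In the real model the velocity would come from the DiT conditioned on earlier latents, and the resulting latent would be passed to the VAE decoder to obtain tokens.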
Links
- Model repository:
- GitHub repository:
- Paper:
- HuggingFace Daily Paper:
- Project page:
- Blog post:
- Zhihu article:
Model Files
The expected repository layout is:
.
├── cola_dlm/
│   ├── cola_dit/
│   │   ├── config.json
│   │   └── model.safetensors*
│   └── cola_vae/
│       ├── config.json
│       └── model.safetensors*
├── tokenizer.json
├── README.md
└── README_zh.md
The checkpoint consists of two cooperating modules:
- ColaDiTModel: a block-causal 1-D Diffusion Transformer prior over continuous text latents.
- ColaTextVAEModel: a Text VAE encoder and conditional decoder for text-to-latent and latent-to-text mapping.
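"Block-causal" is commonly taken to mean that positions attend freely within their own block and causally to all earlier blocks. The sketch below builds such a mask under that assumption. The function name `block_causal_mask` and the block size of 2 are hypothetical; the actual block size and masking details of ColaDiTModel are not stated here.

```python
def block_causal_mask(seq_len, block_size):
    """Return mask[i][j] == True when position i may attend to position j.

    Position j is visible to position i if j's block index is not later
    than i's block index (full attention inside a block, causal across blocks).
    """
    return [
        [(j // block_size) <= (i // block_size) for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = block_causal_mask(seq_len=6, block_size=2)
for row in mask:
    # "x" marks an allowed attention edge, "." a masked one.
    print("".join("x" if allowed else "." for allowed in row))
```

With a block size of 1 this reduces to an ordinary causal mask, and with a block size equal to the sequence length it becomes full bidirectional attention.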
Quickstart
Install the Cola DLM code package from the GitHub repository, then install the download helper:
git clone https://github.com/ByteDance-Seed/Cola-DLM.git
cd Cola-DLM
pip install -e .
pip install huggingface_hub
Download the model files:
huggingface-cli download ByteDance-Seed/Cola-DLM --local-dir hf_models
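Before loading the weights, it can help to confirm that the download matches the layout listed under Model Files. The following is a small sketch using only the standard library; the helper name `missing_checkpoint_files` is hypothetical and not part of the Cola DLM package.

```python
from pathlib import Path

def missing_checkpoint_files(root):
    """Return the expected paths that are absent under `root`."""
    root = Path(root)
    missing = []
    for module in ("cola_dit", "cola_vae"):
        module_dir = root / "cola_dlm" / module
        if not (module_dir / "config.json").is_file():
            missing.append(str(module_dir / "config.json"))
        # The layout lists `model.safetensors*`, so accept any matching file name.
        if not any(module_dir.glob("model.safetensors*")):
            missing.append(str(module_dir / "model.safetensors*"))
    if not (root / "tokenizer.json").is_file():
        missing.append(str(root / "tokenizer.json"))
    return missing
```

Calling `missing_checkpoint_files("hf_models")` after the download should return an empty list if all expected files are present.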
Run a minimal Python example:
import torch
from tokenizers import Tokenizer

from cola_dlm import (
    ColaDiTModel,
    ColaTextVAEModel,
    generate_task_repaint_inference,
)

# Use a GPU when one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the DiT prior, the Text VAE, and the tokenizer from the downloaded files.
dit = ColaDiTModel.from_pretrained("hf_models/cola_dlm/cola_dit").to(device)
vae = ColaTextVAEModel.from_pretrained("hf_models/cola_dlm/cola_vae").to(device)
tokenizer = Tokenizer.from_file("hf_models/tokenizer.json")

prompts = [{"question": "Question: What is the capital of France? Answer:"}]

# Run generation with the settings below.
results = generate_task_repaint_inference(
    dit=dit,
    vae=vae,
    tokenizer=tokenizer,
    prompts=prompts,
    task_name="lambada",
    device=device,
    max_new_tokens=32,
    temperature=0.0,
    guidance_scale=7.0,
    timestep_num=16,
    pad_token_id=100277,
)

# Print the generated text for the first prompt.
print(results[0]["generate"])

OpenAI-Compatible Serving
The companion openai_adapter/ service in the Cola DLM code release exposes this model through an OpenAI-compatible Chat Completions endpoint:
POST /v1/chat/completions
Install the adapter dependencies from the code repository root:
pip install -e .
pip install -r openai_adapter/requirements.txt
Start the service:
export COLA_DIT_PATH=hf_models/cola_dlm/cola_dit
export COLA_VAE_PATH=hf_models/cola_dlm/cola_vae
export COLA_TOKENIZER_PATH=hf_models/tokenizer.json
export COLA_MODEL_NAME=cola-dlm
export COLA_API_KEY=change-me
uvicorn openai_adapter.server:app --host 0.0.0.0 --port 8000
Then send a request:
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer change-me" \
  -d '{
    "model": "cola-dlm",
    "messages": [
      {
        "role": "user",
        "content": "Question: What is the capital of France? Answer:"
      }
    ],
    "temperature": 0,
    "max_tokens": 32,
    "stream": false
  }'

The adapter currently supports non-streaming completions.
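The same request can be sent from Python with only the standard library. This is a sketch rather than an official client: the helper names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes the adapter returns the standard Chat Completions shape (`choices[0].message.content`), which is not confirmed here.

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, prompt, model="cola-dlm", max_tokens=32):
    """Build a non-streaming Chat Completions request for the adapter."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
        "max_tokens": max_tokens,
        "stream": False,  # the adapter is described as non-streaming
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

def chat(base_url, api_key, prompt):
    """Send the request and return the first choice's text.

    Assumes the standard Chat Completions response shape.
    """
    request = build_chat_request(base_url, api_key, prompt)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]
```

With the service running as above, `chat("http://127.0.0.1:8000", "change-me", "Question: What is the capital of France? Answer:")` would issue the same call as the curl command.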
Model Details
- Architecture: Text VAE + block-causal DiT latent prior.
- Training objective: two-stage training with Text VAE pretraining followed by joint Text VAE + DiT training using Flow Matching.
- Training-compute checkpoint: the released weights correspond to the 2000 EFLOPs checkpoint reported in the paper's RQ4 scaling curve.
- Tokenizer: OLMo 2 tokenizer with a 100,278-entry vocabulary.
- Special token ids: pad_token_id=100277, eos_token_id=100257, im_end_token_id=100265.
- Framework: PyTorch 2.1+ and HuggingFace Transformers 4.40+.
- License: Apache License 2.0.
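The special token ids above are enough to post-process raw generated ids by hand, for example when working below the level of the provided inference helper. The sketch below cuts a sequence at the first end marker and drops padding. Whether the released inference code already performs this step is not stated here, and the function name `trim_generated_ids` is hypothetical.

```python
PAD_TOKEN_ID = 100277
EOS_TOKEN_ID = 100257
IM_END_TOKEN_ID = 100265

def trim_generated_ids(token_ids,
                       stop_ids=(EOS_TOKEN_ID, IM_END_TOKEN_ID),
                       pad_id=PAD_TOKEN_ID):
    """Cut at the first stop token and drop padding tokens before it."""
    trimmed = []
    for token_id in token_ids:
        if token_id in stop_ids:
            break  # everything after the first end marker is discarded
        if token_id != pad_id:
            trimmed.append(token_id)
    return trimmed

print(trim_generated_ids([100277, 11, 22, 100257, 33]))
```

The trimmed ids can then be passed to the tokenizer's `decode` method to obtain text.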
Evaluation
Reference zero-shot benchmark results from the open-source inference implementation:
| Task | Accuracy (%) |
| --- | ---: |
| LAMBADA | 50.80 |
| MMLU | 19.30 |
| OBQA | 23.00 |
| HellaSwag | 10.70 |
| RACE | 19.60 |
| SIQA | 28.90 |
| SQuAD | 30.90 |
| Story Cloze | 30.77 |
| Tasks Average | 26.75 |
The open-source HuggingFace Transformers implementation may differ slightly from the internal implementation used in the paper, so per-task numbers can fluctuate slightly. The overall trend is consistent with the paper.
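The Tasks Average row is consistent with an unweighted mean of the eight per-task scores. That reading is inferred from the numbers, not stated in the source, and can be checked directly:

```python
# Per-task accuracies copied from the table above.
scores = {
    "LAMBADA": 50.80,
    "MMLU": 19.30,
    "OBQA": 23.00,
    "HellaSwag": 10.70,
    "RACE": 19.60,
    "SIQA": 28.90,
    "SQuAD": 30.90,
    "Story Cloze": 30.77,
}

# Unweighted mean across tasks, rounded to two decimals for display.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 26.75
```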
Intended Use
Cola DLM is intended primarily for research on hierarchical latent-variable language models, continuous latent diffusion for text, Flow Matching priors, and benchmark-style text generation.
This checkpoint is not instruction-tuned and has not gone through RLHF. It should not be treated as a production chatbot or used for safety-critical decision making.
Limitations
- The model was trained primarily on English text; other languages are not well evaluated.
- Outputs may contain factual errors, offensive content, bias, or hallucinations.
- Generation quality can be sensitive to prompt format and prompt length. QA-style prompts such as "Question: ... Answer:" are recommended for quick evaluation.
- The model uses mutable KV caches during generation; service implementations should serialize generation inside one process unless cache handling is explicitly isolated.
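One simple way to follow the serialization advice in the last item is to guard the generation call with a process-wide lock. The sketch below shows the pattern with a stand-in function. `SerializedGenerator` and `fake_generate` are hypothetical names, and this is not how the bundled adapter is known to handle it.

```python
import threading
import time

class SerializedGenerator:
    """Wrap a generation callable so only one call runs at a time in this process."""

    def __init__(self, generate_fn):
        self._generate_fn = generate_fn
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs):
        # Holding the lock for the whole call keeps shared caches from being
        # mutated by two requests at once.
        with self._lock:
            return self._generate_fn(*args, **kwargs)

# Demo with a stand-in for the real generation call.
stats = {"active": 0, "max_active": 0}
stats_lock = threading.Lock()

def fake_generate(prompt):
    with stats_lock:
        stats["active"] += 1
        stats["max_active"] = max(stats["max_active"], stats["active"])
    time.sleep(0.01)  # pretend to do work while "holding" the cache
    with stats_lock:
        stats["active"] -= 1
    return prompt.upper()

generate = SerializedGenerator(fake_generate)
threads = [threading.Thread(target=generate, args=(f"prompt {i}",)) for i in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(stats["max_active"])
```

In an asyncio-based server, an `asyncio.Lock` or a single-worker executor would play the same role without blocking the event loop.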
Safety Statement and Use Restrictions
Cola DLM is a research-oriented checkpoint for continuous latent diffusion language modeling. The released model is relatively small and has not been instruction-tuned, RLHF-aligned, or systematically safety-aligned. Therefore, it does not provide reliable refusal behavior, content moderation, or risk detection. Its outputs may contain inaccurate, offensive, biased, unlawful, inappropriate, or misleading content.
This model is intended only for academic research and technical experimentation. We do not encourage, support, or authorize the use of Cola DLM to generate, distribute, or assist with the following types of content:
- Pornographic, sexually explicit, exploitative, or otherwise inappropriate content;
- Gambling-related content, including gambling promotion, betting advice, or illegal gambling services;
- Content related to illegal drugs or controlled…