Repo · Meituan (LongCat) · published Dec 3, 2025

meituan-longcat/LongCat-Image

Python


Language: Python

License: Apache-2.0

Stars: 695

Forks: 60

Open issues: 10

Created: 2025-12-03T06:26:54Z

Pushed: 2026-05-09T10:22:47Z

Default branch: main

Fork: no

Archived: no

README:

LongCat-Image

Model Introduction

We introduce LongCat-Image, a pioneering open-source, bilingual (Chinese-English) foundation model for image generation. It is designed to address core challenges that remain prevalent in current leading models: multilingual text rendering, photorealism, deployment efficiency, and developer accessibility.

Key Features

  • 🌟 Exceptional Efficiency and Performance: With only 6B parameters, LongCat-Image surpasses numerous open-source models that are several times larger across multiple benchmarks, demonstrating the immense potential of efficient model design.
  • 🌟 Superior Editing Performance: The LongCat-Image-Edit model achieves state-of-the-art performance among open-source models, delivering leading instruction following and image quality with superior visual consistency.
  • 🌟 Powerful Chinese Text Rendering: LongCat-Image demonstrates superior accuracy and stability in rendering common Chinese characters compared to existing SOTA open-source models and achieves industry-leading coverage of the Chinese dictionary.
  • 🌟 Remarkable Photorealism: Through an innovative data strategy and training framework, LongCat-Image achieves remarkable photorealism in generated images.
  • 🌟 Comprehensive Open-Source Ecosystem: We provide a complete toolchain, from intermediate checkpoints to full training code, significantly lowering the barrier for further research and development.
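To put the 6B figure in perspective, here is back-of-the-envelope arithmetic for the memory the weights alone occupy at common precisions. It is a sketch that assumes all 6B parameters are stored at a single dtype and ignores activations and any other pipeline components; the `weight_memory_gib` helper and its dtype table are illustrative, not part of the repository.

```python
# Back-of-the-envelope weight memory. Assumes every parameter is stored at one dtype
# and ignores activations, optimizer state, and other pipeline components.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "bf16") -> float:
    """Return the size of the raw weights in GiB (2**30 bytes)."""
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unknown dtype {dtype!r}; expected one of {sorted(BYTES_PER_PARAM)}")
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in ("fp32", "bf16", "int8"):
        print(f"6B params @ {dtype}: {weight_memory_gib(6e9, dtype):.1f} GiB")
```

At bf16 the 6B weights come to roughly 11.2 GiB. Treat this as a lower bound for the model weights only, not as a full VRAM requirement for running the pipeline.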


News

  • 🔥 [2026-03-22] LongCat-Image and LongCat-Image-Edit(-Turbo) are now supported in **ComfyUI**.
  • 🔥 [2026-02-03] We released LongCat-Image-Edit-Turbo! It is the distilled version of LongCat-Image-Edit, achieving a 10x speedup.
  • 🔥 [2025-12-16] LongCat-Image is now fully supported in Diffusers!
  • 🔥 [2025-12-09] T2I-CoreBench results are out! LongCat-Image ranks 2nd among all open-source models in comprehensive performance, surpassed only by the 32B-parameter Flux2.dev.
  • 🔥 [2025-12-08] We released our Technical Report on arXiv!
  • 🔥 [2025-12-05] We released the weights for LongCat-Image, LongCat-Image-Dev, and LongCat-Image-Edit on Hugging Face and ModelScope.

Showcase

[Figures: Text-to-Image and Image Editing example galleries]

Quick Start

Installation

```shell
# create conda environment
conda create -n longcat-image python=3.10
conda activate longcat-image

# install requirements for model inference
pip install -r infer_requirements.txt
pip install -U diffusers
```
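After installing, a quick sanity check can confirm that the environment can see the pieces the examples below rely on. This is a minimal sketch: `check_environment` is a hypothetical helper, and it only probes for the two pipeline class names used in the README's examples. Every probe is wrapped so that a missing package shows up as `False` instead of raising.

```python
# Post-install sanity check. It never raises: each probe is guarded so a missing
# or broken package is reported as False rather than as an ImportError.
import importlib
import importlib.util
import sys


def check_environment() -> dict:
    report = {
        "python_3_10_or_newer": sys.version_info >= (3, 10),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "diffusers_installed": importlib.util.find_spec("diffusers") is not None,
        "cuda_available": False,
        "longcat_pipelines_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch
            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            pass
    if report["diffusers_installed"]:
        try:
            diffusers = importlib.import_module("diffusers")
            # Pipeline class names taken from the README's inference examples.
            report["longcat_pipelines_available"] = all(
                hasattr(diffusers, name)
                for name in ("LongCatImagePipeline", "LongCatImageEditPipeline")
            )
        except Exception:
            pass
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        status = "ok" if ok else "MISSING"
        print(f"{status:8} {key}")
```

If `longcat_pipelines_available` is `False`, the installed Diffusers release likely predates LongCat support, which is what the `pip install -U diffusers` step is meant to address.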

Model Download

| Model | Type | Description | Download Link |
|---|---|---|---|
| LongCat-Image | Text-to-Image | Final release. The standard model for out-of-the-box inference. | 🤗 Hugging Face |
| LongCat-Image-Dev | Text-to-Image | Development. Mid-training checkpoint, suitable for fine-tuning. | 🤗 Hugging Face |
| LongCat-Image-Edit | Image Editing | Specialized model for image editing. | 🤗 Hugging Face |
| LongCat-Image-Edit-Turbo | Image Editing | Distilled version of LongCat-Image-Edit with a 10x speedup. | 🤗 Hugging Face |

Run Text-to-Image Generation

> [!TIP]
> Leveraging a stronger LLM for prompt refinement can further enhance image generation quality. Please refer to `inference_t2i.py` for detailed usage instructions.
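If you plug in your own LLM for prompt refinement, one risk is that the rewrite paraphrases, translates, or unquotes the text the model is supposed to render (see the text-rendering note below). A minimal sketch of a guard, assuming the rewriter is any callable that maps a prompt string to a new prompt string: `refine_prompt`, `quoted_spans`, and `still_quoted` are hypothetical names, not functions from `inference_t2i.py`. The rewrite is kept only if every quoted span in the original still appears inside supported quotation marks.

```python
import re
from typing import Callable

# English ('...', "...") and Chinese (‘...’, “...”) quotation-mark pairs.
QUOTE_PAIRS = (("'", "'"), ('"', '"'), ("‘", "’"), ("“", "”"))
# One alternative per pair; exactly one group captures the quoted text.
# Caveat: English apostrophes in plain prose (e.g. "cat's") can create false matches.
QUOTED = re.compile(r"'([^']+)'|\"([^\"]+)\"|‘([^’]+)’|“([^”]+)”")


def quoted_spans(prompt: str) -> list[str]:
    """Return the text inside each quoted span, in order of appearance."""
    return [next(g for g in m.groups() if g is not None) for m in QUOTED.finditer(prompt)]


def still_quoted(span: str, text: str) -> bool:
    """True if `span` appears in `text` wrapped in any supported quote pair."""
    return any(f"{o}{span}{c}" in text for o, c in QUOTE_PAIRS)


def refine_prompt(prompt: str, rewrite: Callable[[str], str]) -> str:
    """Run a caller-supplied rewriter; keep its output only if it succeeded, is non-empty,
    and kept every quoted span quoted. Otherwise return the original prompt."""
    try:
        candidate = rewrite(prompt).strip()
    except Exception:
        return prompt
    if candidate and all(still_quoted(s, candidate) for s in quoted_spans(prompt)):
        return candidate
    return prompt
```

Because the check looks for the exact quoted strings, a rewriter that translates or re-cases the target text is rejected and the original prompt is used, which is the conservative outcome for text rendering.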

> [!CAUTION]
> 📝 Special Handling for Text Rendering
>
> For both Text-to-Image and Image Editing tasks involving text generation, you must enclose the target text within single or double quotation marks (both English '...' / "..." and Chinese ‘...’ / “...” styles are supported).
>
> Reasoning: The model uses a specialized character-level encoding strategy for quoted content. Without explicit quotation marks this mechanism is not triggered, which severely degrades text rendering.
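Because the special encoding only applies to quoted content, it can help to build prompts programmatically so the target text is always quoted. A minimal sketch, assuming you want whichever supported quote pair does not already occur inside the text; `quote_for_rendering` and `has_quoted_text` are hypothetical helpers, not part of the LongCat or Diffusers API.

```python
# Quote pairs the README lists as supported: English '...' / "..." and Chinese ‘...’ / “...”.
QUOTE_PAIRS = (('"', '"'), ("“", "”"), ("'", "'"), ("‘", "’"))


def quote_for_rendering(text: str) -> str:
    """Wrap text in the first supported quote pair whose marks do not appear in it."""
    if not text:
        raise ValueError("text to render must be non-empty")
    for open_mark, close_mark in QUOTE_PAIRS:
        if open_mark not in text and close_mark not in text:
            return f"{open_mark}{text}{close_mark}"
    raise ValueError(f"text contains every supported quote mark: {text!r}")


def has_quoted_text(prompt: str) -> bool:
    """Rough pre-flight check that the prompt contains at least one non-empty quoted span.
    Apostrophes in prose (e.g. "don't ... won't") can produce false positives."""
    for open_mark, close_mark in QUOTE_PAIRS:
        start = prompt.find(open_mark)
        if start != -1 and prompt.find(close_mark, start + 1) > start + 1:
            return True
    return False


if __name__ == "__main__":
    sign = quote_for_rendering("LongCat 咖啡")
    prompt = f"A wooden cafe sign that reads {sign}, warm evening light"
    assert has_quoted_text(prompt)
    print(prompt)
```

Picking a pair that does not occur in the text avoids ambiguous nesting, for example text that itself contains ASCII double quotes gets wrapped in Chinese “...” marks instead.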

```python
import torch
from diffusers import LongCatImagePipeline

if __name__ == '__main__':
    device = torch.device('cuda')

    pipe = LongCatImagePipeline.from_pretrained("meituan-longcat/LongCat-Image", torch_dtype=torch.bfloat16)
    # pipe.to(device, torch.bfloat16)  # Uncomment on high-VRAM devices (faster inference)
    pipe.enable_model_cpu_offload()  # Offload to CPU to save VRAM (requires ~17 GB); slower but prevents OOM

    # English gloss of the Chinese prompt: a young Asian woman in a yellow knit sweater and a
    # white necklace, hands resting on her knees, serene expression; a rough brick wall behind
    # her lit by warm afternoon sun; a medium shot with soft light on her face that emphasizes
    # her expression, clothing, and accessories; a simple, calm composition.
    prompt = '一个年轻的亚裔女性,身穿黄色针织衫,搭配白色项链。她的双手放在膝盖上,表情恬静。背景是一堵粗糙的砖墙,午后的阳光温暖地洒在她身上,营造出一种宁静而温馨的氛围。镜头采用中距离视角,突出她的神态和服饰的细节。光线柔和地打在她的脸上,强调她的五官和饰品的质感,增加画面的层次感与亲和力。整个画面构图简洁,砖墙的纹理与阳光的光影效果相得益彰,突显出人物的优雅与从容。'

    image = pipe(
        prompt,
        height=768,
        width=1344,
        guidance_scale=4.0,
        num_inference_steps=50,
        num_images_per_prompt=1,
        generator=torch.Generator("cpu").manual_seed(43),
        enable_cfg_renorm=True,
        enable_prompt_rewrite=True,
    ).images[0]
    image.save('./t2i_example.png')
```
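The example renders at 768×1344, about one megapixel at 16:9. To try other aspect ratios at a similar pixel budget, a small helper can pick `height` and `width`. This sketch rests on two assumptions the README does not state: that staying near the example's pixel count is a sensible default, and that sides divisible by 64 are accepted (768 and 1344 both are). Check the pipeline's actual size constraints before relying on it; `size_for_aspect` is a hypothetical name.

```python
import math


def _snap(side: float, multiple: int) -> int:
    """Round a side length to the nearest multiple, but never below one multiple."""
    return max(multiple, round(side / multiple) * multiple)


def size_for_aspect(aspect_w: int, aspect_h: int,
                    target_pixels: int = 768 * 1344,
                    multiple: int = 64) -> tuple[int, int]:
    """Return (height, width) close to target_pixels for an aspect_w:aspect_h image.

    Assumptions (not from the README): roughly the example's pixel count is a
    reasonable default, and sides divisible by `multiple` are accepted by the pipeline.
    """
    if aspect_w <= 0 or aspect_h <= 0:
        raise ValueError("aspect ratio terms must be positive")
    width = math.sqrt(target_pixels * aspect_w / aspect_h)
    height = math.sqrt(target_pixels * aspect_h / aspect_w)
    return _snap(height, multiple), _snap(width, multiple)


if __name__ == "__main__":
    for ratio in [(16, 9), (1, 1), (3, 4)]:
        height, width = size_for_aspect(*ratio)
        print(f"{ratio[0]}:{ratio[1]} -> height={height}, width={width}")
```

The returned pair maps onto the `height=` and `width=` arguments of the call above; for 16:9 it reproduces the README's 768×1344.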

Run Image Editing (Standard Mode)

```python
import torch
from PIL import Image
from diffusers import LongCatImageEditPipeline

if __name__ == '__main__':
    device = torch.device('cuda')

    pipe = LongCatImageEditPipeline.from_pretrained("meituan-longcat/LongCat-Image-Edit", torch_dtype=torch.bfloat16)
    # pipe.to(device, torch.bfloat16)  # Uncomment on high-VRAM devices (faster inference)
    pipe.enable_model_cpu_offload()  # Offload to CPU to save VRAM (requires ~18 GB); slower but prevents OOM

    img = Image.open('assets/test.png').convert('RGB')
    prompt = '将猫变成狗'  # "Turn the cat into a dog"
    image = pipe(
        img,
        prompt,
        negative_prompt='',
        guidance_scale=4.5,
        num_inference_steps=50,
        num_images_per_prompt=1,
        generator=torch.Generator("cpu").manual_seed(43),
    ).images[0]

    image.save('./edit_example.png')
```
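The example prepares its input with `.convert('RGB')`, which simply drops the alpha channel of an RGBA PNG, so transparent regions keep whatever color values happen to be stored there (often black). If your inputs may be transparent, a minimal sketch using standard Pillow calls composites them onto a solid background first; `flatten_to_rgb`, `load_edit_input`, and the white default are illustrative choices, not part of the LongCat pipeline.

```python
from PIL import Image


def flatten_to_rgb(img: Image.Image,
                   background: tuple[int, int, int] = (255, 255, 255)) -> Image.Image:
    """Composite any transparency onto a solid background and return an RGB image."""
    rgba = img.convert("RGBA")  # also covers LA and palette images with a transparency key
    base = Image.new("RGBA", rgba.size, (*background, 255))
    return Image.alpha_composite(base, rgba).convert("RGB")


def load_edit_input(path: str,
                    background: tuple[int, int, int] = (255, 255, 255)) -> Image.Image:
    """Open an image file and flatten it for use as the editing input."""
    with Image.open(path) as img:
        return flatten_to_rgb(img, background)
```

`img = load_edit_input('assets/test.png')` can stand in for the `Image.open(...).convert('RGB')` line; for fully opaque inputs the result matches the plain conversion.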

Run Image Editing (Turbo Mode)

```python
import torch
from PIL import Image
from diffusers import LongCatImageEditPipeline

if __name__ == '__main__':
    device = torch.device('cuda')

    pipe =…
```
