Repo · Zhipu AI (GLM) · published Dec 1, 2025

zai-org/SCAIL


Description: SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations (CVPR 2026 Findings)

Language: Python

License: Apache-2.0

Stars: 986

Forks: 57

Open issues: 7

Created: 2025-12-01T06:42:54Z

Pushed: 2026-05-06T14:59:54Z

Default branch: master

Fork: no

Archived: no

README: SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations

This repository contains the official implementation of our paper, accepted to the *CVPR 2026 Findings Track*: SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations. The code covers inference for the SCAIL-Preview model, a 14B DiT that handles challenging character animation through In-Context Learning of 3D-Consistent Pose Representations.

🔎 Motivation and Results

SCAIL identifies the key bottlenecks that keep character animation from reaching production level: limited generalization across characters and incoherent motion in complex scenarios (e.g. common failures in basic motions like flipping and turning). We revisit the core components of character animation: how to represent the pose condition and how to inject it.

The first contribution of this paper is the 3D-Consistent Pose Representation, an identity-agnostic representation that is depth-aware and preserves rich motion information.

The second contribution, and the core of this paper, is how we inject the pose condition. Common injection methods (e.g. channel concat, pose-guider, residual layers) add features instead of showing the full context. This yields decent results in the *controllable generation* setting, but performance in wild scenarios is limited by the pretrained backbone. Taking channel-concat injection with a 1.3B model as an example, it fails to maintain correct body rotation because the model's capability for complex human motion is limited. Instead, we show the model the full context, not only *telling what to follow* but also *teaching how to do it*. As shown below, revealing the full turning context with the 3D-Consistent Pose Representation helps the less capable 1.3B model learn to generate plausible turning motion.

Check detailed methods, results gallery, as well as comparisons against other baselines at our project page.
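The difference between "adding a feature" and "showing the full context" can be pictured with toy token shapes. The sketch below is only an illustration and not the SCAIL implementation: the function names and shapes are hypothetical, and reading "full context" as concatenation along the token (sequence) axis is an assumption about how in-context conditioning is commonly wired into a DiT, not something stated in this README.

```python
# Illustrative only: toy "tokens" as lists of floats, not SCAIL's actual code.
# All names and shapes here are hypothetical.


def channel_concat(video_tokens, pose_tokens):
    """Fuse pose into each video token along the channel axis.

    The token count is unchanged and each token gets wider, so a video
    token only receives the pose feature aligned with its own position.
    """
    if len(video_tokens) != len(pose_tokens):
        raise ValueError("channel concat needs one pose token per video token")
    return [v + p for v, p in zip(video_tokens, pose_tokens)]


def sequence_concat(video_tokens, pose_tokens):
    """Append pose tokens to the sequence as extra context (assumed reading).

    The channel width is unchanged and the sequence gets longer, so
    self-attention could let every video token attend to every pose token.
    """
    if len(video_tokens[0]) != len(pose_tokens[0]):
        raise ValueError("sequence concat needs matching channel widths")
    return video_tokens + pose_tokens


def shape(tokens):
    """Return (num_tokens, channels) for a list-of-lists token matrix."""
    return (len(tokens), len(tokens[0]))


video = [[0.0] * 8 for _ in range(6)]  # 6 tokens, 8 channels
pose = [[1.0] * 8 for _ in range(6)]   # 6 tokens, 8 channels

print(shape(channel_concat(video, pose)))   # (6, 16)
print(shape(sequence_concat(video, pose)))  # (12, 8)
```

In the first case the pose is folded into each token and the attention layers never see it as separate context; in the second, the pose sequence stays visible as a whole, at the cost of a longer sequence.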

🌱 Community Works

❤️ A heartfelt thanks to friends in the community for their creativity! All results below are shared with their gracious consent. We were surprised to see the emergent abilities our model exhibited: understanding the 3D spatial relationships of 2D characters, driving hand-drawn artwork, and even controlling quadrupeds despite having no animal training data at all. We believe these results are a compelling demonstration of how In-Context Learning can push the upper bound of the model's capabilities.

Chibi Gotham Battle

Homer Bullet Time (w/ Uni3c)

Anime Art Animation

Street Fighter 6 Motion Mimic

Doodle Art Animation

Dual Dance

Group Dance

Quadrupeds Animation (w/ ViTPose)

🗞️ Updates and Plans

  • 2026.3.1: 🔥 SCAIL is now native in ComfyUI.
  • 2025.12.19: 📣 SCAIL is now also implemented on the official Wan framework, as a more convenient alternative to SAT for inference. Check the wan branch of SCAIL. We will update the training code of SCAIL on SAT for reproducibility.
  • 2025.12.11: 💥 The preview version of SCAIL is now open-sourced on HuggingFace and ModelScope.
  • 2025.12.08: 🔥 We release the inference code of SCAIL on SAT.

TODOs

  • [x] SCAIL-14B-Preview Model Weights (512p, 5s) and Inference Config
  • [x] Prompt Optimization Snippets
  • [x] Implementation on Wan Official Framework
  • [ ] SCAIL-2 Model Weights (Improved Stability and Clarity, Innate Long Video Generation Capability)

📰 News

  • 2026.3.1: Thanks to toyxyz, a Blender 3D rig can be used with scail-pose now, allowing for much more dynamic and diverse shapes and poses, see #30.
  • 2025.12.19: ComfyUI-SCAIL-Pose now supports saving NLF mesh as 3D glb animation and 3D previewing of the SCAIL-Pose skeleton.
  • 2025.12.19: Thanks to deepbeepmeep for Low VRAM SCAIL Preview Support in WanGP! The WanGP version has the following perks: fully integrated 3D pose preprocessing, speed optimizations, and compatibility with any PyTorch version.
  • 2025.12.17: Thanks to VantageWithAI, GGUF version is now available at SCAIL-Preview-GGUF!
  • 2025.12.16: ❤️ Huge thanks to KJ for the work done on adaptation — SCAIL is now available in ComfyUI-WanVideoWrapper!!! Meanwhile, the pose extraction & rendering has also been partly adapted to ComfyUI in ComfyUI-SCAIL-Pose, currently without multi-character tracking.
  • 2025.12.14: 🥳 Thanks to friends in the community for testing the work! Although only 1.5% of SCAIL’s training samples are anime data, and we did not intentionally collect any multi-character anime data, the model generalizes to many complex anime characters. The release of SCAIL-Preview is intended to demonstrate the soundness of our proposed pose representation and model architecture, with clear potential for further scaling and enhancement.

🚀 Getting Started

Checkpoints Download

| ckpts | Download Link | Notes |
|-------|---------------|-------|
| SCAIL-Preview(14B) | 🤗 Hugging Face 🤖 ModelScope | Trained with resolutions under 512p. H and W should be both divisible by 32 (e.g. 704*1280) if using…
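
The divisibility note in the table can be checked before launching inference. The helper below is a minimal sketch, not part of the SCAIL codebase: `is_valid_resolution` and `snap_down` are hypothetical names, and because the note is truncated in this excerpt, the exact condition under which the constraint applies is not known here.

```python
# Hypothetical helpers, not from the SCAIL repository.
# Assumption: height and width must each be a multiple of 32, as the table note says.


def is_valid_resolution(height: int, width: int, multiple: int = 32) -> bool:
    """Return True when both dimensions are divisible by `multiple`."""
    return height % multiple == 0 and width % multiple == 0


def snap_down(value: int, multiple: int = 32) -> int:
    """Round `value` down to the nearest multiple of `multiple`."""
    if value < multiple:
        raise ValueError(f"{value} is smaller than the required multiple {multiple}")
    return (value // multiple) * multiple


print(is_valid_resolution(704, 1280))    # True
print(is_valid_resolution(720, 1280))    # False
print(snap_down(720), snap_down(1300))   # 704 1280
```

Rounding down rather than up keeps the result inside the original frame, so the input can be cropped or resized without padding.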

