Repo · Zhipu AI (GLM) · published Nov 29, 2025

zai-org/SCAIL-Pose

Python


Description: Pose Extraction & Rendering for SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations

Language: Python

License: Apache-2.0

Stars: 207

Forks: 11

Open issues: 2

Created: 2025-11-29T14:04:19Z

Pushed: 2026-06-09T12:11:29Z

Default branch: master

Fork: no

Archived: no

README: Official Code for Processing Driving Videos for SCAIL Series

This repository contains the code to process driving videos for SCAIL, a series of frameworks towards Studio-Grade Character Animation via In-Context Learning. The frameworks enable complex animation under diverse and challenging conditions, including large motion variations and multi-character interactions. The main repo is at zai-org/SCAIL.

SCAIL

SCAIL-2

📋 Methods

SCAIL is a series of frameworks towards Studio-Grade Character Animation via In-Context Learning. The first open-source work of this series is SCAIL-Preview, a pose-driven animation framework. We develop a 3D skeleton for the pose representation to be fully identity agnostic and depth-aware. The representation can process multi-human interactions, yielding robust results from NLFPose’s reliable depth estimation.

Despite current progress, skeleton maps, as intermediates, suffer from inherent ambiguity under complex scenarios. They also restrict the driving source to exocentric human movements and thus cannot handle driving sources such as animals. Character replacement and multi-character animation suffer from similar issues: state-of-the-art methods use inpainting masks, but such masks are still a form of intermediate, which limits the application and bounds the performance.

Our latest SCAIL-2 is an end-to-end framework that bypasses pose estimation to obtain more reliable and expressive motion, utilizing the inherent in-context learning capability of the diffusion transformer. We adopt a unified design that supports both Animation Mode and Replacement Mode, using SAM3 to extract explicit masks for both the reference image and the driving sequence to augment the conditioning. Benefiting from the end-to-end unification, SCAIL-2 supports diverse driving tasks. You can use the full driving video directly to drive the reference image, or use pose-driven mode as in SCAIL-Preview. The different ways of driving are described in the usage instructions below.

🚀 Getting Started

Make sure you have already cloned the main repo. This repo should be cloned inside the main repo folder:

SCAIL/ (or SCAIL-2/)
├── examples
├── sat
├── configs
├── ...
├── SCAIL-Pose

Change dir to this pose extraction & rendering folder:

cd SCAIL-Pose/
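A quick way to confirm the placement described above is to look for the main repo's folders next to SCAIL-Pose. The following is a minimal sketch, not part of the repository: `missing_siblings` is a hypothetical helper, and it only checks the three folders named explicitly in the tree (`examples`, `sat`, `configs`), not the entries hidden behind `...`.

```python
from pathlib import Path

# Sibling folders listed in the tree above; the "..." entries are not checked.
EXPECTED_SIBLINGS = ("examples", "sat", "configs")


def missing_siblings(pose_dir: Path) -> list[str]:
    """Return the expected main-repo folders that are absent next to pose_dir."""
    main_repo = pose_dir.resolve().parent
    return [name for name in EXPECTED_SIBLINGS if not (main_repo / name).is_dir()]


if __name__ == "__main__":
    # Run from inside SCAIL-Pose/ after the `cd` above.
    missing = missing_siblings(Path.cwd())
    if missing:
        print("SCAIL-Pose does not appear to sit inside the main repo; missing:", missing)
    else:
        print("Layout looks as expected.")
```

An empty result means all three folders were found beside SCAIL-Pose.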

Environment Setup

We recommend using mmpose for the environment setup; refer to the official mmpose installation guide. The example in that guide uses Python 3.8, but we recommend python>=3.10 for better compatibility with SAM models. Once the environment is set up, install the required packages with the following commands.

conda activate openmmlab
pip install -r requirements.txt

# [Optional] SAM2 is only needed for multi-human extraction with SCAIL-Preview; SCAIL-2 uses SAM3
git clone https://github.com/facebookresearch/sam2.git && cd sam2
pip install -e .
cd checkpoints && \
./download_ckpts.sh && \
cd ../..
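The python>=3.10 recommendation can be checked from inside the activated environment before installing anything else. This is a small sketch; `python_is_recent_enough` is a hypothetical helper name, and the threshold simply restates the recommendation above.

```python
import sys

MIN_PYTHON = (3, 10)  # recommended above for compatibility with SAM models


def python_is_recent_enough(version=sys.version_info) -> bool:
    """Compare the (major, minor) part of a version tuple with MIN_PYTHON."""
    return tuple(version[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not python_is_recent_enough():
        print(f"Python {sys.version.split()[0]} is older than the recommended 3.10")
```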

Weights Download

First, download the pretrained weights for pose extraction & rendering. The script below downloads the NLFPose (torchscript), DWPose (onnx) and YOLOX (onnx) weights. You can also download the weights manually and put them into the pretrained_weights folder.

mkdir pretrained_weights && cd pretrained_weights
# download NLFPose Model Weights
wget https://github.com/isarandi/nlf/releases/download/v0.3.2/nlf_l_multi_0.3.2.torchscript
# download DWPose Model Weights & Detection Model Weights
mkdir DWPose
wget -O DWPose/dw-ll_ucoco_384.onnx \
https://huggingface.co/yzd-v/DWPose/resolve/main/dw-ll_ucoco_384.onnx
wget -O DWPose/yolox_l.onnx \
https://huggingface.co/yzd-v/DWPose/resolve/main/yolox_l.onnx
cd ..
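Where wget is unavailable, the same three files can be fetched with Python's standard library. The sketch below is an illustrative alternative, not a script shipped with the repository: the URLs and file names are copied from the commands above, while `pending_downloads` and `download_weights` are hypothetical helper names. Files that already exist are skipped.

```python
from pathlib import Path
from urllib.request import urlretrieve

# URLs and relative destinations taken from the wget commands above.
WEIGHT_URLS = {
    "nlf_l_multi_0.3.2.torchscript":
        "https://github.com/isarandi/nlf/releases/download/v0.3.2/nlf_l_multi_0.3.2.torchscript",
    "DWPose/dw-ll_ucoco_384.onnx":
        "https://huggingface.co/yzd-v/DWPose/resolve/main/dw-ll_ucoco_384.onnx",
    "DWPose/yolox_l.onnx":
        "https://huggingface.co/yzd-v/DWPose/resolve/main/yolox_l.onnx",
}


def pending_downloads(root: Path) -> dict[Path, str]:
    """Map each destination path that does not exist yet to its source URL."""
    return {root / rel: url for rel, url in WEIGHT_URLS.items()
            if not (root / rel).exists()}


def download_weights(root: Path = Path("pretrained_weights")) -> None:
    """Fetch every missing weight file, creating folders as needed."""
    for dest, url in pending_downloads(root).items():
        dest.parent.mkdir(parents=True, exist_ok=True)
        print(f"downloading {url} -> {dest}")
        urlretrieve(url, str(dest))
```

Calling `download_weights()` from the SCAIL-Pose folder should leave the files in the same place as the shell commands do.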

For SCAIL-2, you additionally need the SAM3 weights. SAM3 is gated on HuggingFace, so you must first request access at facebook/sam3 and agree to Meta's license. Once approved, download sam3.pt into pretrained_weights/:

# After being granted access on HuggingFace
huggingface-cli login
huggingface-cli download facebook/sam3 sam3.pt --local-dir pretrained_weights/
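The same download can be done from Python with the `huggingface_hub` package, which provides `hf_hub_download`. This is a sketch under two assumptions: access to facebook/sam3 has already been granted, and a token is already stored locally (for example by the login command above). `download_sam3` is a hypothetical wrapper name.

```python
from pathlib import Path

SAM3_REPO = "facebook/sam3"  # gated repository named above
SAM3_FILE = "sam3.pt"


def download_sam3(local_dir: str = "pretrained_weights") -> Path:
    """Download sam3.pt into local_dir and return the path of the local file."""
    # Imported lazily so this module still loads where huggingface_hub is absent.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=SAM3_REPO, filename=SAM3_FILE,
                                local_dir=local_dir))
```

If access has not been approved yet, the call is expected to fail with an authorization error rather than produce a file.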

The weights should be organized as follows:

pretrained_weights/
├── nlf_l_multi_0.3.2.torchscript
├── sam3.pt
└── DWPose/
    ├── dw-ll_ucoco_384.onnx
    └── yolox_l.onnx
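Before running extraction, it can help to confirm that the folder matches the layout above. The following sketch uses a hypothetical helper, `missing_weights`, that lists the files from the tree that are not present. `sam3.pt` is checked only when `need_sam3` is set, since it is required for SCAIL-2 only.

```python
from pathlib import Path

# File names taken from the layout above.
REQUIRED = (
    "nlf_l_multi_0.3.2.torchscript",
    "DWPose/dw-ll_ucoco_384.onnx",
    "DWPose/yolox_l.onnx",
)
SAM3_ONLY = ("sam3.pt",)  # needed for SCAIL-2 only


def missing_weights(root: Path, need_sam3: bool = False) -> list[str]:
    """Return the relative paths of expected weight files that are absent."""
    wanted = REQUIRED + (SAM3_ONLY if need_sam3 else ())
    return [rel for rel in wanted if not (root / rel).is_file()]


if __name__ == "__main__":
    print(missing_weights(Path("pretrained_weights"), need_sam3=True))
```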

🦾 Usage

SCAIL-Preview

# Single Character w/o 3D Retarget
python NLFPoseExtract/v1_process_pose.py --subdir --resolution [512, 896]

# Single Character w/ 3D Retarget
python NLFPoseExtract/v1_process_pose.py --subdir --use_align --resolution [512, 896]

# Multi-Human
python NLFPoseExtract/v1_process_pose_multi.py --subdir --resolution [512, 896]

SCAIL-2

For SCAIL-2, two entrypoints cover the two tasks: Animation (process_animation_aio.py) and Replacement (process_replacement.py).

Animation Mode

# (Recommended) End-to-end: rendered_v2.mp4 = driving copy, mask video is colored SAM3 masks.
# More accurate and easier than pose-driven for most cases.
python NLFPoseExtract/process_animation_aio.py --subdir --e2e_mode

# Pose-driven (no --e2e_mode): runs NLF + DWPose; rendered_v2.mp4 is the skeleton render.
# More interpretable / controllable; use it for extremely challenging inputs.
python NLFPoseExtract/process_animation_aio.py --subdir

## The following options give behaviours between pose-driven and full E2E. They are useful for 704p horizontal / multi-human inputs, where the zero-shot resolution gap causes artifacts.

# E2E + per-frame mask silhouette crop.
python NLFPoseExtract/process_animation_aio.py --subdir --e2e_mode --crop_e2e_mask

# E2E + per-frame bbox crop.
python NLFPoseExtract/process_animation_aio.py…
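The Animation Mode variants shown above differ only in which flags are appended, so the choice can be expressed as a small command builder. This is a sketch, not part of the repository: `animation_argv` is a hypothetical helper, and it covers only the flags visible in this excerpt (`--subdir`, `--e2e_mode`, `--crop_e2e_mask`). The excerpt shows `--subdir` without a value; the optional `subdir_value` parameter is an assumption for the case where the script expects a path there. The bbox-crop variant is truncated above and is therefore left out.

```python
from typing import Optional

SCRIPT = "NLFPoseExtract/process_animation_aio.py"  # path as shown above


def animation_argv(e2e: bool = True, crop_mask: bool = False,
                   subdir_value: Optional[str] = None) -> list[str]:
    """Build the argument list for one of the Animation Mode commands above."""
    if crop_mask and not e2e:
        # The excerpt only shows --crop_e2e_mask together with --e2e_mode.
        raise ValueError("--crop_e2e_mask is only shown together with --e2e_mode")
    argv = ["python", SCRIPT, "--subdir"]
    if subdir_value is not None:  # assumption: --subdir may take a path
        argv.append(subdir_value)
    if e2e:
        argv.append("--e2e_mode")
    if crop_mask:
        argv.append("--crop_e2e_mask")
    return argv
```

The resulting list can be passed to `subprocess.run(argv, check=True)` from the SCAIL-Pose folder.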
