Repo · ByteDance (Doubao/Seed) · published Apr 23, 2026

ByteDance-Seed/SimArt

Python

Description: [SIGGRAPH 2026] SIMART: Decomposing Monolithic Meshes into Sim-ready Articulated Assets via MLLM

Language: Python

License: Apache-2.0

Stars: 3

Forks: 2

Open issues: 0

Created: 2026-04-23T03:25:52Z

Pushed: 2026-05-27T09:00:27Z

Default branch: main

Fork: no

Archived: no

README:

[SIGGRAPH 2026] SIMART: Decomposing Monolithic Meshes into Sim-ready Articulated Assets via MLLM

![Teaser](assets/teaser.png)

🌟 Overview

SIMART is a unified MLLM framework that performs part-level decomposition and kinematic prediction jointly to transform monolithic meshes into sim-ready articulated assets.

  • Unified MLLM Framework: Offers a single-stage path to joint static asset understanding and sim-ready asset generation.
  • Sparse 3D VQ-VAE: Reduces token counts by 70% compared to dense voxel tokens, enabling high-fidelity multi-part assemblies.
  • Sim-Ready Assets: Generates structured URDF metadata and decomposed segments, enabling deployment into physics-based simulators and interactive robotic environments.
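
For readers unfamiliar with URDF, here is a minimal sketch of the kind of structure such a file describes: two links connected by one revolute joint. The link names, joint name, mesh filenames, axis, and limits are invented for illustration and are not taken from SIMART's actual output.

```python
import xml.etree.ElementTree as ET


def build_minimal_urdf(robot_name="box"):
    """Build a two-link URDF string with one revolute joint (illustrative only)."""
    robot = ET.Element("robot", name=robot_name)

    # Each decomposed segment becomes a link that references its own mesh file.
    # The filenames here are hypothetical placeholders.
    for link_name, mesh_file in (("base", "base.obj"), ("lid", "lid.obj")):
        link = ET.SubElement(robot, "link", name=link_name)
        visual = ET.SubElement(link, "visual")
        geometry = ET.SubElement(visual, "geometry")
        ET.SubElement(geometry, "mesh", filename=mesh_file)

    # A revolute joint describes how the child link rotates relative to its parent.
    joint = ET.SubElement(robot, "joint", name="lid_hinge", type="revolute")
    ET.SubElement(joint, "parent", link="base")
    ET.SubElement(joint, "child", link="lid")
    ET.SubElement(joint, "origin", xyz="0 0.1 0.2", rpy="0 0 0")
    ET.SubElement(joint, "axis", xyz="1 0 0")
    ET.SubElement(joint, "limit", lower="0", upper="1.57", effort="10", velocity="1")

    return ET.tostring(robot, encoding="unicode")


urdf_text = build_minimal_urdf()
print(urdf_text)
```

A physics simulator reads the links as rigid bodies and the joint as the kinematic constraint between them, which is why both the decomposed segments and the joint parameters are needed for a sim-ready asset.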

🔧 Installation

Our implementation is tested on Python 3.10.

conda create -n simart python=3.10
conda activate simart
pip install -r requirements.txt

📥 Model Weights

Download the pre-trained checkpoints for the MLLM and the VQ-VAE from Hugging Face:

Place the downloaded weights in ./checkpoints, or specify custom paths using the inference arguments below.
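
Before running inference, it can help to confirm that the weights are where the defaults expect them. The sketch below assumes the default directory names listed under the inference arguments (./checkpoints/simart_mllm and ./checkpoints/simart_vqvae); the helper name `missing_checkpoints` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Default locations taken from the inference arguments in this README.
DEFAULT_CHECKPOINTS = {
    "--model_path": "checkpoints/simart_mllm",
    "--vqvae_ckpt_dir": "checkpoints/simart_vqvae",
}


def missing_checkpoints(root=".", expected=DEFAULT_CHECKPOINTS):
    """Return the flags whose checkpoint directory is absent or empty under root."""
    missing = []
    for flag, rel_path in expected.items():
        directory = Path(root) / rel_path
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(flag)
    return missing


if __name__ == "__main__":
    for flag in missing_checkpoints():
        print(f"No weights found for {flag}; download them or pass a custom path.")
```

An empty result means both default directories exist and contain at least one file; it does not validate the files themselves.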

🚀 Inference

1. Data Preprocessing (Coordinate Alignment)

Our model is trained on 3D assets that follow a right-handed coordinate system:

  • Up Direction: +Z
  • Forward Direction: -Y (or +Y, but consistency is key for part orientation)
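
To make the convention concrete, the sketch below rotates a +Y-up, +Z-front frame (the glTF convention) into the +Z-up frame described above, using a right-handed rotation of +90 degrees about the X axis. It is plain geometry using only the standard library. Whether the preprocessing script's rotation flags use the same sign convention is an assumption worth checking against the rendered preview.

```python
import math


def rotate_about_x(vec, degrees):
    """Rotate a 3D vector about the X axis (right-handed; positive is counterclockwise)."""
    x, y, z = vec
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    return (x, y * c - z * s, y * s + z * c)


def rounded(vec, digits=6):
    """Round away floating-point noise; adding 0.0 normalizes -0.0 to 0.0."""
    return tuple(round(v, digits) + 0.0 for v in vec)


# "Up" and "front" axes of a +Y-up, +Z-front asset.
y_up = (0.0, 1.0, 0.0)
z_front = (0.0, 0.0, 1.0)

new_up = rounded(rotate_about_x(y_up, 90))
new_front = rounded(rotate_about_x(z_front, 90))
print("up:", new_up, "front:", new_front)
```

Under this rotation +Y maps to +Z and +Z maps to -Y, so a +Y-up, +Z-front model ends up +Z up with its front facing -Y.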

Pre-aligned Models: If your models are generated by Seed3D or Hunyuan3D, they are typically pre-aligned to the +Z up convention. You can run the normalization script directly without additional rotation arguments:

python scripts/process_raw_objects.py --input --output ./assets --render

Manual Alignment: Models from other sources may use +Y up. For these, use the rotation flags to align them to the convention above.

  • Important: Beyond just the "Up" direction, ensure the "Front" of the object faces the intended direction to help the MLLM correctly identify parts like "front legs" or "handles".
  • Reference: Please refer to the processed models in the assets/ directory for the standard orientation.

Arguments:

  • --input: Path to the input raw object (.glb).
  • --output: Output directory for the normalized model.
  • --rot_x, --rot_y, --rot_z: Rotation angles in degrees to align the mesh.
  • --render: Highly recommended. Renders a preview image so you can verify that the object stands upright and faces forward.
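
When many objects need the same treatment, the flags above can be assembled programmatically. The following sketch only builds the argument list for scripts/process_raw_objects.py from the documented flags. The helper name `build_preprocess_command`, the example input path, and the 90 degree rotation are hypothetical. It also assumes that each rotation flag takes a single numeric value in degrees and may be omitted when the angle is zero.

```python
import sys


def build_preprocess_command(input_path, output_dir="./assets",
                             rot_x=0.0, rot_y=0.0, rot_z=0.0, render=True):
    """Assemble the command line for the preprocessing script (does not run it)."""
    cmd = [sys.executable, "scripts/process_raw_objects.py",
           "--input", str(input_path), "--output", str(output_dir)]
    # Only pass the rotation flags that are actually needed.
    for flag, angle in (("--rot_x", rot_x), ("--rot_y", rot_y), ("--rot_z", rot_z)):
        if angle:
            cmd += [flag, str(angle)]
    if render:
        cmd.append("--render")  # preview image for checking orientation
    return cmd


# Hypothetical +Y-up model that needs a 90 degree rotation about X.
cmd = build_preprocess_command("raw/chair.glb", rot_x=90)
print(" ".join(cmd))
```

To execute the command from the repository root, pass the list to `subprocess.run(cmd, check=True)`, then inspect the rendered preview before moving on to inference.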

2. Run Inference

To predict the articulated structure and generate the URDF of a processed 3D model, run the main inference pipeline:

python inference/infer.py --object_path ./assets/box_00.glb --debug

Arguments:

  • --object_path: Path to the object file or a folder containing multiple GLBs (Required).
  • --output_path: Directory to save outputs (Default: ./output/raw).
  • --name: Base name for outputs (JSON, URDF, PLY, folders). If not provided, it is derived from --object_path.
  • --model_path: Path to the trained MLLM checkpoint directory (Default: ./checkpoints/simart_mllm).
  • --vqvae_ckpt_dir: Path to the VQ-VAE checkpoint directory (Default: ./checkpoints/simart_vqvae).
  • --blender_path: Custom path to the Blender executable. If not provided, Blender is downloaded automatically to /tmp.
  • --debug: Enable debug mode to output intermediate visualizations (colored PLY files, joint axes, etc.).
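
The same arguments can be driven from Python, for example inside a batch job. This is a minimal sketch that builds the argument list for inference/infer.py from the flags listed above; the helper name `build_infer_command` and the example output directory are hypothetical, and flags left unset are omitted so that the script's own defaults apply.

```python
import sys


def build_infer_command(object_path, output_path=None, name=None, model_path=None,
                        vqvae_ckpt_dir=None, blender_path=None, debug=False):
    """Assemble the command line for inference/infer.py (does not run it)."""
    cmd = [sys.executable, "inference/infer.py", "--object_path", str(object_path)]
    optional = {
        "--output_path": output_path,
        "--name": name,
        "--model_path": model_path,
        "--vqvae_ckpt_dir": vqvae_ckpt_dir,
        "--blender_path": blender_path,
    }
    # Flags left as None are omitted so the script falls back to its defaults.
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    if debug:
        cmd.append("--debug")  # also write intermediate visualizations
    return cmd


# Sample asset from this README; the output directory is a made-up example.
cmd = build_infer_command("./assets/box_00.glb",
                          output_path="./output/box_test", debug=True)
print(" ".join(cmd))
```

Building the list separately from running it makes the command easy to log or inspect; `subprocess.run(cmd, check=True)` from the repository root would then launch it.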

📁 Repository Structure

SIMART/
├── assets/ # Sample 3D GLB assets
├── blender_script/ # Scripts for headless Blender rendering
├── inference/ # Main MLLM inference pipeline
├── scripts/ # Data preprocessing scripts
├── utils/ # Modular utility functions (mesh, URDF, parsing, etc.)
└── vqvae/ # Sparse VQ-VAE model definitions

License

This project is licensed under the Apache License 2.0.

Citation

If you find our work helpful, please cite it as:

@article{zhang2026simart,
  title={SIMART: Decomposing Monolithic Meshes into Sim-ready Articulated Assets via MLLM},
  author={Zhang, Chuanrui and Qin, Minghan and Wang, Yuang and Xie, Baifeng and Li, Hang and Wang, Ziwei},
  journal={arXiv preprint arXiv:2603.23386},
  year={2026}
}

