reka-ai/TAPIP3D
forked from zbw001/TAPIP3D
Description: TAPIP3D: Tracking Any Point in Persistent 3D Geometry
License: Apache-2.0
Stars: 0
Forks: 0
Open issues: 0
Created: 2025-06-25T17:34:14Z
Pushed: 2025-05-13T09:59:37Z
Default branch: main
Fork: yes
Parent repository: zbw001/TAPIP3D
Archived: no
README:
Overview
TAPIP3D is a method for long-term feed-forward 3D point tracking in monocular RGB and RGB-D video sequences. It introduces a 3D feature cloud representation that lifts image features into a persistent world coordinate space, canceling out camera motion and enabling accurate trajectory estimation across frames.
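The lifting step described above can be illustrated with a minimal sketch, assuming a pinhole camera and a world-to-camera extrinsics convention. Both are assumptions, since the README does not state the convention. The function name `unproject_to_world` is hypothetical and this is not the authors' implementation; it only shows how per-pixel depth plus camera parameters yields points in a shared world frame.

```python
import numpy as np

def unproject_to_world(depth, intrinsics, extrinsics):
    """Lift every pixel of one depth map to 3D world coordinates.

    depth:      (H, W) depths along the camera z-axis
    intrinsics: (3, 3) pinhole camera matrix K
    extrinsics: (4, 4) world-to-camera transform (ASSUMED convention)
    returns:    (H, W, 3) world-space points
    """
    H, W = depth.shape
    # Pixel grid: u runs along the width (x), v along the height (y).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    # Back-project: X_cam = depth * K^{-1} [u, v, 1]^T
    rays = pixels @ np.linalg.inv(intrinsics).T
    cam_points = rays * depth[..., None]
    # Camera -> world: invert the assumed world-to-camera matrix.
    cam_to_world = np.linalg.inv(extrinsics)
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return cam_points @ R.T + t
```

Because every frame is mapped into the same world frame, a static scene point keeps the same coordinates regardless of how the camera moves, which is the sense in which camera motion is canceled out.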
Installation
Installing dependencies
1. Prepare the environment
conda create -n tapip3d python=3.10
conda activate tapip3d
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 "xformers>=0.0.27" --index-url https://download.pytorch.org/whl/cu124
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.4.1+cu124.html
pip install -r requirements.txt
2. Compile pointops2
cd third_party/pointops2
LIBRARY_PATH=$CONDA_PREFIX/lib:$LIBRARY_PATH python setup.py install
cd ../..
3. Compile megasam
cd third_party/megasam/base
LIBRARY_PATH=$CONDA_PREFIX/lib:$LIBRARY_PATH python setup.py install
cd ../../..
Downloading checkpoints
Download our TAPIP3D model checkpoint here and save it as checkpoints/tapip3d_final.pth
If you want to run TAPIP3D on monocular videos, you need to prepare the following checkpoints manually to run MegaSAM:
1. Download the DepthAnything V1 checkpoint from here and place it at third_party/megasam/Depth-Anything/checkpoints/depth_anything_vitl14.pth
2. Download the RAFT checkpoint from here and place it at third_party/megasam/cvd_opt/raft-things.pth
Additionally, the checkpoints of MoGe and UniDepth will be downloaded automatically when running the demo. Please make sure your network connection is available.
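Before running the demo, it can help to confirm that the manually downloaded files are where the instructions expect them. The following is a small sketch, not part of the repository: the helper name `missing_checkpoints` is hypothetical, and the paths are copied from the steps above.

```python
from pathlib import Path

# Paths taken from the instructions above.
REQUIRED = [Path("checkpoints/tapip3d_final.pth")]
# Only needed when running MegaSAM on monocular videos.
MONOCULAR_ONLY = [
    Path("third_party/megasam/Depth-Anything/checkpoints/depth_anything_vitl14.pth"),
    Path("third_party/megasam/cvd_opt/raft-things.pth"),
]

def missing_checkpoints(repo_root=".", monocular=True):
    """Return the expected checkpoint paths that do not exist under repo_root."""
    root = Path(repo_root)
    expected = REQUIRED + (MONOCULAR_ONLY if monocular else [])
    return [p for p in expected if not (root / p).is_file()]
```

Calling `missing_checkpoints()` from the repository root returns an empty list when everything is in place. The MoGe and UniDepth checkpoints are not checked because they are downloaded automatically.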
Demo Usage
We provide a simple demo script inference.py, along with sample input data located in the demo_inputs/ directory.
The script accepts either an .mp4 video file or an .npz file as input. An .npz file should use the following format:
video: array of shape (T, H, W, 3), dtype: uint8
depths (optional): array of shape (T, H, W), dtype: float32
intrinsics (optional): array of shape (T, 3, 3), dtype: float32
extrinsics (optional): array of shape (T, 4, 4), dtype: float32
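As an illustration of this layout, the sketch below builds placeholder arrays with the listed shapes and dtypes. The helper name `make_dummy_inputs`, the array contents, and the focal length are all made up for the example; real inputs would come from your own capture pipeline.

```python
import numpy as np

def make_dummy_inputs(T=8, H=48, W=64):
    """Build placeholder arrays matching the .npz layout described above."""
    video = np.zeros((T, H, W, 3), dtype=np.uint8)   # RGB frames
    depths = np.ones((T, H, W), dtype=np.float32)    # per-pixel depth
    # Placeholder pinhole intrinsics, repeated for every frame.
    K = np.array([[50.0, 0.0, W / 2],
                  [0.0, 50.0, H / 2],
                  [0.0, 0.0, 1.0]], dtype=np.float32)
    intrinsics = np.tile(K, (T, 1, 1))
    # Identity camera pose for every frame (a static camera).
    extrinsics = np.tile(np.eye(4, dtype=np.float32), (T, 1, 1))
    return {"video": video, "depths": depths,
            "intrinsics": intrinsics, "extrinsics": extrinsics}
```

The resulting dictionary can be written with `np.savez("my_clip.npz", **make_dummy_inputs())`, where the file name is only an example.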
For demonstration purposes, the script uses a 32x32 grid of points at the first frame as queries.
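A regular query grid of this kind could be generated roughly as follows. This is a sketch under assumptions: the `(t, x, y)` row layout, the cell-centre placement, and the function name `grid_queries` are illustrative and may differ from what inference.py actually does.

```python
import numpy as np

def grid_queries(height, width, grid_size=32, frame_index=0):
    """Return (grid_size**2, 3) rows of (t, x, y); this layout is an assumption."""
    # Place points at cell centres so none sit exactly on the image border.
    xs = (np.arange(grid_size) + 0.5) * width / grid_size
    ys = (np.arange(grid_size) + 0.5) * height / grid_size
    gx, gy = np.meshgrid(xs, ys)
    t = np.full(gx.size, frame_index, dtype=np.float32)
    return np.stack([t, gx.ravel(), gy.ravel()], axis=-1).astype(np.float32)
```

With the default `grid_size=32`, this yields 1024 query points, all on the first frame.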
Inference with Monocular Video
When a video is provided as --input_path, the script first runs MegaSAM with MoGe to estimate depth maps and camera parameters. The model then processes these inputs in the global frame.
Demo 1
To run inference:
python inference.py --input_path demo_inputs/sheep.mp4 --checkpoint checkpoints/tapip3d_final.pth --resolution_factor 2
An npz file will be saved to outputs/inference/. To visualize the results:
python visualize.py
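To see what a saved result contains before visualizing it, a generic NumPy inspection helps. The sketch below assumes nothing about the array names inside the file, since the README does not list them; the helper names `latest_result` and `describe_npz` are hypothetical.

```python
from pathlib import Path
import numpy as np

def latest_result(output_dir="outputs/inference"):
    """Return the most recently modified .npz in output_dir, or None."""
    files = sorted(Path(output_dir).glob("*.npz"), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None

def describe_npz(path):
    """Map each array name in the file to its (shape, dtype)."""
    with np.load(path) as data:
        return {name: (data[name].shape, str(data[name].dtype)) for name in data.files}
```

For example, `describe_npz(latest_result())` lists the stored arrays once at least one inference run has finished.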
Demo 2
python inference.py --input_path demo_inputs/pstudio.mp4 --checkpoint checkpoints/tapip3d_final.pth --resolution_factor 2
Inference with Known Depths and Camera Parameters
If an .npz file containing all four keys (video, depths, intrinsics, extrinsics) is provided, the model will operate in an aligned global frame, generating point trajectories in world coordinates. We provide one example .npz file here; please place it in the demo_inputs/ directory.
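A quick consistency check on such a file can catch shape mismatches before inference. This is a sketch rather than part of the repository: the helper names are hypothetical, and the name of the RGB key is left as a parameter because it is an assumption about what the script reads.

```python
import numpy as np

GEOMETRY_KEYS = ("depths", "intrinsics", "extrinsics")

def check_inputs(arrays, video_key="video"):
    """Return a list of problems; an empty list means the shapes are consistent.

    arrays: dict of name -> ndarray, e.g. dict(np.load(path)).
    """
    if video_key not in arrays:
        return [f"missing required key: {video_key}"]
    video = arrays[video_key]
    if video.ndim != 4 or video.shape[-1] != 3:
        return [f"{video_key} should have shape (T, H, W, 3), got {video.shape}"]
    T, H, W, _ = video.shape
    expected = {"depths": (T, H, W), "intrinsics": (T, 3, 3), "extrinsics": (T, 4, 4)}
    problems = []
    for name, shape in expected.items():
        # Optional keys are only checked when present.
        if name in arrays and arrays[name].shape != shape:
            problems.append(f"{name} should have shape {shape}, got {arrays[name].shape}")
    return problems

def has_full_geometry(arrays):
    """True when depth and both camera parameter arrays are all present."""
    return all(key in arrays for key in GEOMETRY_KEYS)
```

`has_full_geometry` distinguishes the case described above, where depth and camera parameters are all supplied, from inputs that only contain RGB frames.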
Demo 3
python inference.py --input_path demo_inputs/dexycb.npz --checkpoint checkpoints/tapip3d_final.pth --resolution_factor 2
Citation
If you find this project useful, please consider citing:
@article{tapip3d,
title={TAPIP3D: Tracking Any Point in Persistent 3D Geometry},
author={Zhang, Bowei and Ke, Lei and Harley, Adam W and Fragkiadaki, Katerina},
journal={arXiv preprint arXiv:2504.14717},
year={2025}
}
Notability
notability 2.0/10: Routine fork of repo, no notable traction