reka-ai/Video-Depth-Anything
forked from DepthAnything/Video-Depth-Anything
Description: [CVPR 2025 Highlight] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
License: Apache-2.0
Stars: 0
Forks: 0
Open issues: 0
Created: 2025-06-13T04:58:01Z
Pushed: 2025-04-25T12:28:56Z
Default branch: main
Fork: yes
Parent repository: DepthAnything/Video-Depth-Anything
Archived: no
README:
This work presents Video Depth Anything, built on Depth Anything V2, which can be applied to arbitrarily long videos without compromising quality, consistency, or generalization ability. Compared with diffusion-based models, it offers faster inference, fewer parameters, and higher accuracy in consistent depth estimation.

News
- 2025-04-25: 🌟🌟🌟 Release metric depth model based on Video-Depth-Anything-Large.
- 2025-04-05: Our paper has been accepted for a highlight presentation at CVPR 2025 (13.5% of the accepted papers).
- 2025-03-11: Add full dataset inference and evaluation scripts.
- 2025-02-08: Enable autocast inference. Support grayscale video, NPZ and EXR output formats.
- 2025-01-21: Paper, project page, code, models, and demo are all released.
Release Notes
- 2025-02-08: 🚀🚀🚀 Inference speed and memory usage improvement

| Model | Latency FP32 (ms) | Latency FP16 (ms) | GPU VRAM FP32 (GB) | GPU VRAM FP16 (GB) |
|:-|-:|-:|-:|-:|
| Video-Depth-Anything-V2-Small | 9.1 | 7.5 | 7.3 | 6.8 |
| Video-Depth-Anything-V2-Large | 67 | 14 | 26.7 | 23.6 |

The latency and GPU VRAM results were obtained on a single A100 GPU with an input of shape 1 × 32 × 518 × 518.
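As a quick reading of the benchmark figures above, the following minimal sketch computes the FP32-to-FP16 latency speedup and VRAM saving for each model. The numbers are copied from the table; the dictionary layout and the helper name `fp16_gains` are illustrative and not part of the repository.

```python
# Figures copied from the benchmark table (single A100, input 1 x 32 x 518 x 518).
# Each tuple is (FP32 value, FP16 value).
BENCHMARKS = {
    "Video-Depth-Anything-V2-Small": {"latency_ms": (9.1, 7.5), "vram_gb": (7.3, 6.8)},
    "Video-Depth-Anything-V2-Large": {"latency_ms": (67.0, 14.0), "vram_gb": (26.7, 23.6)},
}


def fp16_gains(entry):
    """Return (latency speedup factor, VRAM saved in GB) when moving from FP32 to FP16."""
    fp32_ms, fp16_ms = entry["latency_ms"]
    fp32_gb, fp16_gb = entry["vram_gb"]
    return fp32_ms / fp16_ms, fp32_gb - fp16_gb


for name, entry in BENCHMARKS.items():
    speedup, saved_gb = fp16_gains(entry)
    print(f"{name}: {speedup:.2f}x faster, {saved_gb:.1f} GB less VRAM")
```

On these figures, FP16 is roughly 1.2x faster for the Small model and roughly 4.8x faster for the Large model, with VRAM savings of about 0.5 GB and 3.1 GB respectively.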
Pre-trained Models
We provide two models of varying scales for robust and consistent video depth estimation:
| Model | Params | Checkpoint |
|:-|-:|:-:|
| Video-Depth-Anything-V2-Small | 28.4M | Download |
| Video-Depth-Anything-V2-Large | 381.8M | Download |
Usage
Preparation
git clone https://github.com/DepthAnything/Video-Depth-Anything
cd Video-Depth-Anything
pip install -r requirements.txt
Download the checkpoints listed [here](#pre-trained-models) and put them under the checkpoints directory.
bash get_weights.sh
Run inference on a video
python3 run.py --input_video ./assets/example_videos/davis_rollercoaster.mp4 --output_dir ./outputs --encoder vitl
Options:
- `--input_video`: path of input video
- `--output_dir`: path to save the output results
- `--input_size` (optional): By default, we use input size `518` for model inference.
- `--max_res` (optional): By default, we use maximum resolution `1280` for model inference.
- `--encoder` (optional): `vits` for Video-Depth-Anything-V2-Small, `vitl` for Video-Depth-Anything-V2-Large.
- `--max_len` (optional): maximum length of the input video, `-1` means no limit
- `--target_fps` (optional): target fps of the input video, `-1` means the original fps
- `--fp32` (optional): Use `fp32` precision for inference. By default, we use `fp16`.
- `--grayscale` (optional): Save the grayscale depth map, without applying color palette.
- `--save_npz` (optional): Save the depth map in `npz` format.
- `--save_exr` (optional): Save the depth map in `exr` format.
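When processing many videos, it can help to assemble the command from Python. The sketch below builds an argument list for `run.py` using only the options listed above. The helper name `build_run_command` is hypothetical, and the sketch assumes that `--fp32`, `--grayscale`, `--save_npz`, and `--save_exr` are plain on/off switches that take no value, which the option descriptions suggest but do not state.

```python
import shlex


def build_run_command(input_video, output_dir, encoder="vitl", input_size=None,
                      max_res=None, max_len=None, target_fps=None,
                      fp32=False, grayscale=False, save_npz=False, save_exr=False):
    """Build the argument list for run.py (hypothetical helper, not part of the repo)."""
    if encoder not in ("vits", "vitl"):
        raise ValueError("encoder must be 'vits' (Small) or 'vitl' (Large)")

    cmd = ["python3", "run.py",
           "--input_video", str(input_video),
           "--output_dir", str(output_dir),
           "--encoder", encoder]

    # Options that take a value; omitted ones fall back to run.py's own defaults.
    valued = (("--input_size", input_size), ("--max_res", max_res),
              ("--max_len", max_len), ("--target_fps", target_fps))
    for flag, value in valued:
        if value is not None:
            cmd += [flag, str(value)]

    # Assumption: these options are switches passed without a value.
    switches = (("--fp32", fp32), ("--grayscale", grayscale),
                ("--save_npz", save_npz), ("--save_exr", save_exr))
    for flag, enabled in switches:
        if enabled:
            cmd.append(flag)
    return cmd


cmd = build_run_command("./assets/example_videos/davis_rollercoaster.mp4",
                        "./outputs", encoder="vitl", save_npz=True)
print(shlex.join(cmd))
# To execute from the repository root: subprocess.run(cmd, check=True)
```

Returning a list rather than a single string avoids shell quoting problems when paths contain spaces; `shlex.join` is used only to print a copy-pasteable form.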
Citation
If you find this project useful, please consider citing:
@article{video_depth_anything,
title={Video Depth Anything: Consistent Depth Estimation for Super-Long Videos},
author={Chen, Sili and Guo, Hengkai and Zhu, Shengnan and Zhang, Feihu and Huang, Zilong and Feng, Jiashi and Kang, Bingyi},
journal={arXiv:2501.12375},
year={2025}
}
LICENSE
Video-Depth-Anything-Small model is under the Apache-2.0 license. Video-Depth-Anything-Large model is under the CC-BY-NC-4.0 license. For business cooperation, please send an email to Hengkai Guo at guohengkaighk@gmail.com.
Notability
2.0/10: Routine fork, no traction