Fork · Reka AI · published Jun 13, 2025

reka-ai/Video-Depth-Anything

forked from DepthAnything/Video-Depth-Anything

Description: [CVPR 2025 Highlight] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

License: Apache-2.0

Stars: 0

Forks: 0

Open issues: 0

Created: 2025-06-13T04:58:01Z

Pushed: 2025-04-25T12:28:56Z

Default branch: main

Fork: yes

Parent repository: DepthAnything/Video-Depth-Anything

Archived: no

README:

This work presents Video Depth Anything, a model built on Depth Anything V2 that can be applied to arbitrarily long videos without compromising quality, consistency, or generalization ability. Compared with diffusion-based models, it offers faster inference, fewer parameters, and more accurate, temporally consistent depth.

![teaser](assets/teaser_video_v2.png)

News

  • 2025-04-25: 🌟🌟🌟 Release metric depth model based on Video-Depth-Anything-Large.
  • 2025-04-05: Our paper has been accepted for a highlight presentation at CVPR 2025 (13.5% of the accepted papers).
  • 2025-03-11: Add full dataset inference and evaluation scripts.
  • 2025-02-08: Enable autocast inference. Support grayscale video, NPZ and EXR output formats.
  • 2025-01-21: Paper, project page, code, models, and demo are all released.

Release Notes

  • 2025-02-08: 🚀🚀🚀 Inference speed and memory usage improvement

| Model | Latency FP32 (ms) | Latency FP16 (ms) | GPU VRAM FP32 (GB) | GPU VRAM FP16 (GB) |
|:-|-:|-:|-:|-:|
| Video-Depth-Anything-V2-Small | 9.1 | 7.5 | 7.3 | 6.8 |
| Video-Depth-Anything-V2-Large | 67 | 14 | 26.7 | 23.6 |

The latency and GPU VRAM results were measured on a single A100 GPU with an input of shape 1 × 32 × 518 × 518.
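To make the benefit of FP16 easier to read off, the following minimal sketch computes the speedup and the relative VRAM saving from the figures reported above. The dictionary layout and the function names `fp16_gains` and `summarize` are illustrative assumptions and are not part of the repository.

```python
# Latency (ms) and GPU VRAM (GB) as (FP32, FP16) pairs, copied from the
# release notes above. The data layout is an assumption for illustration.
BENCHMARKS = {
    "Video-Depth-Anything-V2-Small": {"latency_ms": (9.1, 7.5), "vram_gb": (7.3, 6.8)},
    "Video-Depth-Anything-V2-Large": {"latency_ms": (67.0, 14.0), "vram_gb": (26.7, 23.6)},
}


def fp16_gains(fp32_latency, fp16_latency, fp32_vram, fp16_vram):
    """Return (speedup factor, fractional VRAM saving) of FP16 over FP32."""
    speedup = fp32_latency / fp16_latency
    vram_saving = (fp32_vram - fp16_vram) / fp32_vram
    return speedup, vram_saving


def summarize(benchmarks=BENCHMARKS):
    """Map each model name to its (speedup, VRAM saving) tuple."""
    return {
        name: fp16_gains(*row["latency_ms"], *row["vram_gb"])
        for name, row in benchmarks.items()
    }


if __name__ == "__main__":
    for name, (speedup, saving) in summarize().items():
        print(f"{name}: {speedup:.2f}x faster, {saving:.1%} less VRAM in FP16")
```

On these numbers, FP16 gives roughly a 1.2x speedup for the Small model and roughly a 4.8x speedup for the Large model, while the VRAM saving stays below 12% for both.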

Pre-trained Models

We provide two models of varying scales for robust and consistent video depth estimation:

| Model | Params | Checkpoint |
|:-|-:|:-:|
| Video-Depth-Anything-V2-Small | 28.4M | Download |
| Video-Depth-Anything-V2-Large | 381.8M | Download |
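As a rough guide to what these parameter counts mean in memory, the sketch below estimates the space taken by the weights alone. It assumes every parameter is stored at the stated width (4 bytes for FP32, 2 bytes for FP16) and ignores buffers, activations, and file-format overhead, so it does not predict checkpoint file size or total VRAM. The helper name `weight_megabytes` is hypothetical.

```python
# Parameter counts from the table above.
PARAM_COUNTS = {
    "Video-Depth-Anything-V2-Small": 28.4e6,
    "Video-Depth-Anything-V2-Large": 381.8e6,
}

# Assumed storage width per parameter, in bytes.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_megabytes(num_params, precision):
    """Estimate weight storage in MB (10^6 bytes) at the given precision."""
    return num_params * BYTES_PER_PARAM[precision] / 1e6


if __name__ == "__main__":
    for name, count in PARAM_COUNTS.items():
        fp32 = weight_megabytes(count, "fp32")
        fp16 = weight_megabytes(count, "fp16")
        print(f"{name}: ~{fp32:.1f} MB in FP32, ~{fp16:.1f} MB in FP16")
```

Under these assumptions the Large model's weights come to about 1.5 GB in FP32, well below the 26.7 GB of VRAM reported in the release notes, which suggests that most of the measured memory goes to activations and intermediate buffers rather than to the weights.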

Usage

Preparation

git clone https://github.com/DepthAnything/Video-Depth-Anything
cd Video-Depth-Anything
pip install -r requirements.txt

Download the checkpoints listed [here](#pre-trained-models) and put them under the checkpoints directory.

bash get_weights.sh

Run inference on a video

python3 run.py --input_video ./assets/example_videos/davis_rollercoaster.mp4 --output_dir ./outputs --encoder vitl

Options:

  • --input_video: path of input video
  • --output_dir: path to save the output results
  • --input_size (optional): By default, we use input size 518 for model inference.
  • --max_res (optional): By default, we use maximum resolution 1280 for model inference.
  • --encoder (optional): vits for Video-Depth-Anything-V2-Small, vitl for Video-Depth-Anything-V2-Large.
  • --max_len (optional): maximum length of the input video, -1 means no limit
  • --target_fps (optional): target fps of the input video, -1 means the original fps
  • --fp32 (optional): Use fp32 precision for inference. By default, we use fp16.
  • --grayscale (optional): Save the grayscale depth map, without applying color palette.
  • --save_npz (optional): Save the depth map in npz format.
  • --save_exr (optional): Save the depth map in exr format.
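The options above can be combined in a single call. The sketch below assembles the argument list for `run.py` from Python, which can be handy when processing many videos in a batch. The helper `build_run_command` is hypothetical and not part of the repository. It assumes that `--fp32`, `--grayscale`, `--save_npz`, and `--save_exr` are value-less switches and that the other optional flags take a single value, as the descriptions above suggest.

```python
import shlex


def build_run_command(input_video, output_dir, encoder="vitl",
                      input_size=None, max_res=None, max_len=None,
                      target_fps=None, fp32=False, grayscale=False,
                      save_npz=False, save_exr=False):
    """Build the argv list for run.py; unset options are left to run.py's defaults."""
    if encoder not in ("vits", "vitl"):
        raise ValueError("encoder must be 'vits' (Small) or 'vitl' (Large)")

    cmd = ["python3", "run.py",
           "--input_video", input_video,
           "--output_dir", output_dir,
           "--encoder", encoder]

    # Options that take a value; only passed when explicitly set.
    valued = (("--input_size", input_size), ("--max_res", max_res),
              ("--max_len", max_len), ("--target_fps", target_fps))
    for flag, value in valued:
        if value is not None:
            cmd += [flag, str(value)]

    # Assumed to be boolean switches with no argument.
    switches = (("--fp32", fp32), ("--grayscale", grayscale),
                ("--save_npz", save_npz), ("--save_exr", save_exr))
    for flag, enabled in switches:
        if enabled:
            cmd.append(flag)

    return cmd


if __name__ == "__main__":
    cmd = build_run_command(
        "./assets/example_videos/davis_rollercoaster.mp4", "./outputs",
        encoder="vits", max_len=500, target_fps=15, save_npz=True)
    print(shlex.join(cmd))
    # To execute it from the repository root:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than a single string avoids shell-quoting problems with paths that contain spaces; `shlex.join` is used only to print a copy-pasteable form.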

Citation

If you find this project useful, please consider citing:

@article{video_depth_anything,
title={Video Depth Anything: Consistent Depth Estimation for Super-Long Videos},
author={Chen, Sili and Guo, Hengkai and Zhu, Shengnan and Zhang, Feihu and Huang, Zilong and Feng, Jiashi and Kang, Bingyi},
journal={arXiv:2501.12375},
year={2025}
}

LICENSE

The Video-Depth-Anything-Small model is released under the Apache-2.0 license. The Video-Depth-Anything-Large model is released under the CC-BY-NC-4.0 license. For business cooperation, please send an email to Hengkai Guo at guohengkaighk@gmail.com.
