NVIDIA/paidf-auto-labeling
Python
Description: Auto-labeling pipeline that turns raw video and images into fine-tuning-ready scenes via super-resolution, detection and tracking, VLM scene understanding, and question generation
Language: Python
License: Apache-2.0
Stars: 2
Forks: 0
Open issues: 0
Created: 2026-05-19T00:03:24Z
Pushed: 2026-05-30T22:13:21Z
Default branch: main
Fork: no
Archived: no
README:
Physical AI Data Factory - Auto Labeling
An end-to-end pipeline that converts raw video or image inputs into DAFT-ready training scenes. It chains Super Resolution (SR), object detection and tracking, VLM-based scene understanding, and LLM-assisted task QA generation—including Multiple-Choice Questions (MCQs)—into one config-driven workflow.
By default, all four stages run in sequence. The same CLI also supports partial runs when you already have SR outputs, tracking overlays, captions, or metadata.
---
Pipeline overview

Input video/image → [SR] → [Detection & Tracking] → [VLM JSON] → [MCQ Generation]
Each stage can be enabled or disabled independently. If SR fails or is skipped, downstream stages fall back to the original input.
| Stage | Purpose | Main artifacts |
|-------|---------|----------------|
| Super Resolution | Upscale input media | SR output media |
| Detection & Tracking | Detect objects and assign track IDs | Object/instance annotations and overlays |
| VLM JSON | Generate scene metadata and events | Scene/event JSON |
| MCQ Generation | Generate task QA | MCQ, BCQ (binary-choice), and open-QA task files |
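The SR fallback rule described above can be sketched in a few lines. This is an illustration only, not the repository's implementation: `Stage` and `run_sr_with_fallback` are hypothetical names, and a stage is assumed to be a callable that takes a media path and returns the path of its output.

```python
from typing import Callable, Optional

# Hypothetical stage signature: takes a media path, returns its output path.
Stage = Callable[[str], str]


def run_sr_with_fallback(input_path: str, sr_stage: Optional[Stage]) -> str:
    """Return the SR output path, or the original input if SR is skipped or fails."""
    if sr_stage is None:
        # Stage disabled: downstream stages consume the original input.
        return input_path
    try:
        return sr_stage(input_path)
    except Exception as exc:
        # Any SR failure degrades to the original input instead of aborting.
        print(f"SR failed ({exc}); falling back to original input")
        return input_path
```

Downstream stages would then receive whatever path this function returns, so they never need to know whether SR actually ran.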
---
Start here
Default workflow: SR → detection/tracking → VLM JSON → MCQ generation.
Requirements for a full run: Docker with NVIDIA GPU access, input media, VLM and LLM endpoints, and an output directory. For setup, partial workflows, and hardware notes, see [docs/getting-started.md](docs/getting-started.md).
Supported runtime: Run pipeline commands inside a Docker container. The image includes a uv-managed environment; examples use uv run python modules/cli.py from inside the container. Host-side pipeline execution is not validated—use host-side uv only for development checks such as tests and lint.
Documentation map
| Topic | Guide |
|-------|-------|
| First run, Docker images, hardware | [docs/getting-started.md](docs/getting-started.md) |
| Smoke tests and full-matrix validation | [docs/e2e-testing.md](docs/e2e-testing.md) |
| MCQ modes, prompts, question banks | [docs/mcq-modes.md](docs/mcq-modes.md) |
| Config overrides and env vars | [docs/config-reference.md](docs/config-reference.md) |
| S3, HTTP, and MSC I/O | [docs/remote-io.md](docs/remote-io.md) |
---
Quickstart: full pipeline
./docker/deploy.sh build
./docker/deploy.sh shell -lc '
  uv run python modules/cli.py --config configs/pipeline_example.yaml \
    data.0.inputs.video_path="/workspace/input/my_video.mp4" \
    data.0.output.out_dir="/workspace/output/my_run" \
    endpoints.vlm.url="http://host.docker.internal:/v1" \
    endpoints.vlm.model="" \
    endpoints.llm.url="http://host.docker.internal:/v1" \
    endpoints.llm.model=""
'
For published images, use the command shape in [Published image command shape](docs/getting-started.md#published-image-command-shape).
Supported inputs: images (.jpg, .jpeg, .png, .webp, .bmp); videos (.mp4, .mov, .m4v).
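A pre-flight check against the extension list above might look like the following sketch. `classify_input` is a hypothetical helper, not part of the pipeline's CLI, and the case-insensitive match is an assumption; the README does not say how the pipeline treats uppercase extensions.

```python
from pathlib import Path

# Extension sets copied from the supported-inputs list above.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}
VIDEO_EXTS = {".mp4", ".mov", ".m4v"}


def classify_input(path: str) -> str:
    """Return "image" or "video" for a supported file, else raise ValueError."""
    ext = Path(path).suffix.lower()  # assumption: matching ignores case
    if ext in IMAGE_EXTS:
        return "image"
    if ext in VIDEO_EXTS:
        return "video"
    raise ValueError(f"unsupported input extension: {ext or '(none)'}")
```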
---
Common usage
Disable a stage
super_resolution.enabled=false
detection_and_tracking.enabled=false
vlm_json.enabled=false
mcq_generation.enabled=false
Batch multiple videos
data.0.inputs.video_path="clip1.mp4" data.0.output.out_dir="output/clip1" \
data.1.inputs.video_path="clip2.mp4" data.1.output.out_dir="output/clip2"
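For larger batches, the indexed `data.N.*` overrides can be generated rather than typed by hand. The sketch below only builds the argument strings in the shape shown above; `batch_overrides` is a hypothetical helper, and it assumes every entry needs just a video path and an output directory.

```python
from typing import List, Tuple


def batch_overrides(clips: List[Tuple[str, str]]) -> List[str]:
    """Build data.N.* override strings for (video_path, out_dir) pairs."""
    args: List[str] = []
    for index, (video_path, out_dir) in enumerate(clips):
        args.append(f'data.{index}.inputs.video_path="{video_path}"')
        args.append(f'data.{index}.output.out_dir="{out_dir}"')
    return args


if __name__ == "__main__":
    clips = [("clip1.mp4", "output/clip1"), ("clip2.mp4", "output/clip2")]
    # Print one override per line, ready to append to the CLI command.
    print(" \\\n".join(batch_overrides(clips)))
```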
Pin GPUs (for example, when VLM/LLM servers use GPU 0 and 1)
pipeline.gpu_ids=2,3
SR and tracking use GPU 2. Add pipeline.use_multi_gpu=true to spread SR across GPU 2 and GPU 3.
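The GPU assignment described above can be modeled as a small function. This is a sketch of the stated behavior, not the pipeline's code: `plan_gpus` is a hypothetical name, and keeping tracking on the first listed GPU when `use_multi_gpu` is set is an assumption, since the text only says SR is spread across GPUs.

```python
from typing import Dict, List


def plan_gpus(gpu_ids: str, use_multi_gpu: bool = False) -> Dict[str, List[int]]:
    """Map each GPU stage to device IDs from a comma-separated gpu_ids value."""
    ids = [int(part) for part in gpu_ids.split(",") if part.strip()]
    if not ids:
        raise ValueError("gpu_ids must list at least one GPU")
    first = ids[:1]
    return {
        # SR uses every listed GPU only when multi-GPU mode is on.
        "super_resolution": ids if use_multi_gpu else first,
        # Assumption: tracking always stays on the first listed GPU.
        "tracking": first,
    }
```

With `gpu_ids="2,3"`, both stages land on GPU 2 by default, and SR expands to GPUs 2 and 3 once `use_multi_gpu` is true.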
VLM and LLM endpoints — from inside Docker, use host.docker.internal, not localhost:
endpoints.vlm.url="http://host.docker.internal:/v1" endpoints.vlm.model="" \
endpoints.llm.url="http://host.docker.internal:/v1" endpoints.llm.model=""
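If endpoint URLs are copied from a host-side setup, a small helper can swap the loopback host for `host.docker.internal` before they are passed to the container. This is an illustrative sketch: `dockerize_url` is a hypothetical name, and the port in the usage note is an arbitrary example, not a pipeline default.

```python
from urllib.parse import urlsplit, urlunsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}


def dockerize_url(url: str) -> str:
    """Replace a loopback host with host.docker.internal; keep scheme, port, path."""
    parts = urlsplit(url)
    if parts.hostname not in LOOPBACK_HOSTS:
        return url  # already points at a non-loopback host
    netloc = "host.docker.internal"
    if parts.port is not None:
        netloc += f":{parts.port}"
    # Note: any user:password@ prefix in the original URL is dropped here.
    return urlunsplit(parts._replace(netloc=netloc))
```

For example, `dockerize_url("http://localhost:8000/v1")` yields `http://host.docker.internal:8000/v1`, while a URL that already names another host is returned unchanged.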
API keys — set in the environment before running (first match wins):
| Service | Resolution order |
|---------|------------------|
| VLM | VLM_API_KEY → NVIDIA_API_KEY → OPENAI_API_KEY → "EMPTY" (no auth) |
| LLM | LLM_API_KEY → NVIDIA_API_KEY → OPENAI_API_KEY → "EMPTY" (no auth) |
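The first-match-wins order in the table can be expressed as a short lookup. This is a sketch of the documented order rather than the pipeline's own resolver: `resolve_api_key` is a hypothetical name, and treating an empty string as unset is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(service: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the API key for "vlm" or "llm" using the documented order."""
    env = os.environ if env is None else env
    candidates = (f"{service.upper()}_API_KEY", "NVIDIA_API_KEY", "OPENAI_API_KEY")
    for name in candidates:
        value = env.get(name)
        if value:  # assumption: an empty value counts as unset
            return value
    return "EMPTY"  # sentinel for endpoints that need no auth
```

Passing `env` explicitly keeps the function easy to test; in normal use it reads `os.environ`.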
---
MCQ modes
Set the mode with mcq_generation.mode= followed by one of the mode names in the table below.
Cookbooks are organized by ownership: use-case folders hold domain banks, configs, and prompts; cookbooks/shared/ holds cross-use-case prompt logic.
| Mode | Endpoints | Use case | Default assets |
|------|-----------|----------|----------------|
| question-driven-vlm-llm *(blueprint default)* | VLM + LLM | Generic route for traffic, robotics, warehouse, PAS/open-QA, or custom banks | [cookbooks/shared/](cookbooks/shared/) templates + selected question_bank_file |
| window-vlm-llm | VLM + LLM | Traffic blueprint: VLM captions, then LLM mapping | [cookbooks/traffic/prompts/mcq/window_vlm_llm/](cookbooks/traffic/prompts/mcq/window_vlm_llm/) |
| window-direct-vlm | VLM | Traffic blueprint: VLM answers directly | [cookbooks/traffic/prompts/mcq/window_direct_vlm/](cookbooks/traffic/prompts/mcq/window_direct_vlm/) |
| metadata-llm | LLM only | Traffic blueprint: remap from existing sidecars/metadata.json | [cookbooks/traffic/prompts/mcq/metadata_llm/](cookbooks/traffic/prompts/mcq/metadata_llm/) |
See [docs/mcq-modes.md](docs/mcq-modes.md) for examples and window, retry, and VLM verify settings.
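Because each mode needs a different set of endpoints, a launcher script could check the configuration before starting a run. The mapping below is transcribed from the Endpoints column of the table; `missing_endpoints` is a hypothetical helper, not a pipeline API.

```python
from typing import Iterable, Set

# Endpoint requirements per MCQ mode, taken from the modes table.
REQUIRED_ENDPOINTS = {
    "question-driven-vlm-llm": {"vlm", "llm"},
    "window-vlm-llm": {"vlm", "llm"},
    "window-direct-vlm": {"vlm"},
    "metadata-llm": {"llm"},
}


def missing_endpoints(mode: str, configured: Iterable[str]) -> Set[str]:
    """Return the endpoints the mode needs that are not in `configured`."""
    if mode not in REQUIRED_ENDPOINTS:
        raise ValueError(f"unknown MCQ mode: {mode!r}")
    return REQUIRED_ENDPOINTS[mode] - set(configured)
```

An empty result means the selected mode has every endpoint it needs.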
To add a domain, create or copy a cookbook under cookbooks// and point the CLI at its question_bank.json or prompt files. See [Adding prompts or question banks](docs/mcq-modes.md#adding-prompts-or-question-banks).
---
Failure behavior
By default, the pipeline is fallback-friendly (pipeline.empty_output_policy=warn). SR respects super_resolution.window_timeout (default 3600 seconds) so a hung SR window does not block a batch indefinitely.
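One common way to keep a hung step from blocking a batch is to run it as a child process with a hard timeout. The sketch below shows that general technique only; it is not how the pipeline implements `super_resolution.window_timeout`, and `run_window` and its command argument are hypothetical.

```python
import subprocess
import sys
from typing import Sequence


def run_window(cmd: Sequence[str], timeout_s: float = 3600.0) -> bool:
    """Run one window's command; return False if it fails or exceeds timeout_s."""
    try:
        # subprocess.run kills the child process when the timeout expires.
        subprocess.run(cmd, check=True, timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        return False
    except subprocess.CalledProcessError:
        return False


if __name__ == "__main__":
    # A stand-in command that sleeps longer than the timeout allows.
    hung = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_window(hung, timeout_s=0.5))
```

A caller that receives `False` can record the failure and move on to the next window or batch entry.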
If SR, tracking, or VLM JSON fails after retries, the run still produces a DAFT-compatible scene when possible:
- SR failure: falls back to the original input; sidecars/pipeline_status.json records status=completed_degraded.
- Tracking:…
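A batch script can check the status sidecar after a run to flag degraded scenes. The sketch below rests on two assumptions the excerpt does not confirm: that `sidecars/` sits under the run's output directory, and that the JSON file has a top-level `status` key. `is_degraded` is a hypothetical helper.

```python
import json
from pathlib import Path


def is_degraded(out_dir: str) -> bool:
    """Return True if the run's status sidecar reports completed_degraded."""
    # Assumed location: <out_dir>/sidecars/pipeline_status.json
    status_file = Path(out_dir) / "sidecars" / "pipeline_status.json"
    with status_file.open(encoding="utf-8") as handle:
        payload = json.load(handle)
    # Assumed schema: a top-level "status" string.
    return payload.get("status") == "completed_degraded"
```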