Repo · NVIDIA · published May 19, 2026

NVIDIA/paidf-auto-labeling

Description: Auto-labeling pipeline that turns raw video and images into fine-tuning-ready scenes via super-resolution, detection and tracking, VLM scene understanding, and question generation

Language: Python

License: Apache-2.0

Stars: 2

Forks: 0

Open issues: 0

Created: 2026-05-19T00:03:24Z

Pushed: 2026-05-30T22:13:21Z

Default branch: main

Fork: no

Archived: no

README:

Physical AI Data Factory - Auto Labeling

An end-to-end pipeline that converts raw video or image inputs into DAFT-ready training scenes. It chains Super Resolution (SR), object detection and tracking, VLM-based scene understanding, and LLM-assisted task QA generation—including Multiple-Choice Questions (MCQs)—into one config-driven workflow.

By default, all four stages run in sequence. The same CLI also supports partial runs when you already have SR outputs, tracking overlays, captions, or metadata.

---

Pipeline overview

![Data enrichment workflow](docs/data_enrichment_workflow_auto_labeling.png)

Input video/image → [SR] → [Detection & Tracking] → [VLM JSON] → [MCQ Generation]

Each stage can be enabled or disabled independently. If SR fails or is skipped, downstream stages fall back to the original input.
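The fallback rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the repository's code: the function name `select_downstream_input` and the "SR output exists on disk" check are hypothetical.

```python
from pathlib import Path
from typing import Optional


def select_downstream_input(
    original: Path, sr_output: Optional[Path], sr_enabled: bool
) -> Path:
    """Pick the media that detection/tracking and later stages should read."""
    # Assumed rule: use the SR result only when SR was enabled and left a
    # file behind; in every other case fall back to the original input.
    if sr_enabled and sr_output is not None and sr_output.exists():
        return sr_output
    return original
```

With SR disabled, or with an SR output path that was never written, the function returns the original path unchanged.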

| Stage | Purpose | Main artifacts |
|-------|---------|----------------|
| Super Resolution | Upscale input media | SR output media |
| Detection & Tracking | Detect objects and assign track IDs | Object/instance annotations and overlays |
| VLM JSON | Generate scene metadata and events | Scene/event JSON |
| MCQ Generation | Generate task QA | MCQ, BCQ (binary-choice), and open-QA task files |

---

Start here

Default workflow: SR → detection/tracking → VLM JSON → MCQ generation.

Requirements for a full run: Docker with NVIDIA GPU access, input media, VLM and LLM endpoints, and an output directory. For setup, partial workflows, and hardware notes, see [docs/getting-started.md](docs/getting-started.md).

Supported runtime: Run pipeline commands inside a Docker container. The image includes a uv-managed environment; examples use uv run python modules/cli.py from inside the container. Host-side pipeline execution is not validated—use host-side uv only for development checks such as tests and lint.

Documentation map

| Topic | Guide |
|-------|--------|
| First run, Docker images, hardware | [docs/getting-started.md](docs/getting-started.md) |
| Smoke tests and full-matrix validation | [docs/e2e-testing.md](docs/e2e-testing.md) |
| MCQ modes, prompts, question banks | [docs/mcq-modes.md](docs/mcq-modes.md) |
| Config overrides and env vars | [docs/config-reference.md](docs/config-reference.md) |
| S3, HTTP, and MSC I/O | [docs/remote-io.md](docs/remote-io.md) |

---

Quickstart: full pipeline

./docker/deploy.sh build
./docker/deploy.sh shell -lc '
uv run python modules/cli.py --config configs/pipeline_example.yaml \
data.0.inputs.video_path="/workspace/input/my_video.mp4" \
data.0.output.out_dir="/workspace/output/my_run" \
endpoints.vlm.url="http://host.docker.internal:/v1" \
endpoints.vlm.model="" \
endpoints.llm.url="http://host.docker.internal:/v1" \
endpoints.llm.model=""
'
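The dotted key=value arguments in the quickstart address nested config fields, and a numeric segment such as the 0 in data.0 selects a list entry. The sketch below illustrates that addressing scheme on a plain Python dict. It is not the repository's parser: `apply_override` and the sample config shape are hypothetical, and values are kept as plain strings.

```python
from typing import Any


def apply_override(config: dict, override: str) -> None:
    """Write one dotted override such as 'data.0.output.out_dir=/x' into config."""
    # Split on the first '=' only, so values that contain '=' survive intact.
    dotted, _, value = override.partition("=")
    keys = dotted.split(".")
    node: Any = config
    for key in keys[:-1]:
        # Numeric segments index into lists; all other segments are dict keys.
        node = node[int(key)] if isinstance(node, list) else node[key]
    last = keys[-1]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value


# Hypothetical config skeleton mirroring the keys used in the quickstart.
cfg = {"data": [{"inputs": {"video_path": None}, "output": {"out_dir": None}}]}
apply_override(cfg, "data.0.inputs.video_path=/workspace/input/my_video.mp4")
apply_override(cfg, "data.0.output.out_dir=/workspace/output/my_run")
```

This sketch raises KeyError or IndexError when an override names a path that is not already in the config. That is a design choice of the sketch, and the real CLI may behave differently.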

For published images, use the command shape in [Published image command shape](docs/getting-started.md#published-image-command-shape).

Supported inputs: images (.jpg, .jpeg, .png, .webp, .bmp); videos (.mp4, .mov, .m4v).
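A pre-flight check against the extension list above could look like the following sketch. The helper name `classify_input` is hypothetical, and matching extensions case-insensitively is an assumption; the README does not say how the pipeline treats upper-case extensions.

```python
from pathlib import Path

# Extensions taken from the supported-inputs list above.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}
VIDEO_EXTS = {".mp4", ".mov", ".m4v"}


def classify_input(path: str) -> str:
    """Return 'image' or 'video', or raise ValueError for unsupported files."""
    # Assumption: compare the suffix case-insensitively.
    suffix = Path(path).suffix.lower()
    if suffix in IMAGE_EXTS:
        return "image"
    if suffix in VIDEO_EXTS:
        return "video"
    raise ValueError(f"unsupported input type: {path!r}")
```

Running such a check before a batch starts surfaces an unsupported file early, before any GPU work begins.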

---

Common usage

Disable a stage

super_resolution.enabled=false
detection_and_tracking.enabled=false
vlm_json.enabled=false
mcq_generation.enabled=false

Batch multiple videos

data.0.inputs.video_path="clip1.mp4" data.0.output.out_dir="output/clip1" \
data.1.inputs.video_path="clip2.mp4" data.1.output.out_dir="output/clip2"
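For longer batches, the indexed overrides can be generated rather than typed by hand. The sketch below builds the same data.N argument pairs for a list of clips. `batch_overrides` is a hypothetical helper, and naming each output directory after the clip's file stem is an assumption that mirrors the example above.

```python
from pathlib import Path
from typing import List


def batch_overrides(videos: List[str], out_root: str) -> List[str]:
    """Build data.N.* override arguments, one input/output pair per video."""
    args: List[str] = []
    for index, video in enumerate(videos):
        # Assumed convention: one output directory per clip, named after its stem.
        stem = Path(video).stem
        args.append(f"data.{index}.inputs.video_path={video}")
        args.append(f"data.{index}.output.out_dir={out_root}/{stem}")
    return args


overrides = batch_overrides(["clip1.mp4", "clip2.mp4"], "output")
```

The returned list is meant to be appended to an argument vector, for example one passed to subprocess.run. No shell quoting is needed in that case, because each element is already a separate argument.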

Pin GPUs (for example, when VLM/LLM servers use GPU 0 and 1)

pipeline.gpu_ids=2,3

SR and tracking use GPU 2. Add pipeline.use_multi_gpu=true to spread SR across GPU 2 and GPU 3.
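The device assignment described above can be modeled as a small function. This is a sketch of the documented behavior, not the pipeline's scheduler: `plan_gpus` is a hypothetical name, and keeping tracking on the first listed GPU when multi-GPU mode is on is an assumption, since the README only says that SR is spread.

```python
from typing import Dict, List


def plan_gpus(gpu_ids: str, use_multi_gpu: bool = False) -> Dict[str, List[int]]:
    """Map a 'pipeline.gpu_ids' string to per-stage device lists."""
    ids = [int(part) for part in gpu_ids.split(",") if part.strip()]
    if not ids:
        raise ValueError("gpu_ids must name at least one GPU")
    first = ids[0]
    return {
        # SR uses every listed GPU only when multi-GPU mode is requested.
        "super_resolution": ids if use_multi_gpu else [first],
        # Assumption: tracking always stays on the first listed GPU.
        "tracking": [first],
    }
```

For gpu_ids "2,3", the default plan places both stages on GPU 2, and the multi-GPU plan gives SR GPUs 2 and 3.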

VLM and LLM endpoints — from inside Docker, use host.docker.internal, not localhost:

endpoints.vlm.url="http://host.docker.internal:/v1" endpoints.vlm.model="" \
endpoints.llm.url="http://host.docker.internal:/v1" endpoints.llm.model=""

API keys — set in the environment before running (first match wins):

| Service | Resolution order |
|---------|------------------|
| VLM | VLM_API_KEY → NVIDIA_API_KEY → OPENAI_API_KEY → "EMPTY" (no auth) |
| LLM | LLM_API_KEY → NVIDIA_API_KEY → OPENAI_API_KEY → "EMPTY" (no auth) |
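The first-match-wins lookup can be expressed as a short function. The sketch below follows the order given in the table; the function name `resolve_api_key` is hypothetical, and treating an empty-string variable as unset is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(service: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the API key for 'vlm' or 'llm'; the first variable set wins."""
    env = os.environ if env is None else env
    candidates = (f"{service.upper()}_API_KEY", "NVIDIA_API_KEY", "OPENAI_API_KEY")
    for name in candidates:
        value = env.get(name)
        # Assumption: an empty string counts as "not set".
        if value:
            return value
    # No key found: fall back to the "EMPTY" placeholder (no auth).
    return "EMPTY"
```

Passing a plain dict as env makes the order easy to check without touching the real environment. For example, with only NVIDIA_API_KEY set, both resolve_api_key("vlm", env) and resolve_api_key("llm", env) return that key.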

---

MCQ modes

Set the mode with mcq_generation.mode= followed by one of the mode names in the table below.

Cookbooks are organized by ownership: use-case folders hold domain banks, configs, and prompts; cookbooks/shared/ holds cross-use-case prompt logic.

| Mode | Endpoints | Use case | Default assets |
|------|-----------|----------|----------------|
| question-driven-vlm-llm *(blueprint default)* | VLM + LLM | Generic route for traffic, robotics, warehouse, PAS/open-QA, or custom banks | [cookbooks/shared/](cookbooks/shared/) templates + selected question_bank_file |
| window-vlm-llm | VLM + LLM | Traffic blueprint: VLM captions, then LLM mapping | [cookbooks/traffic/prompts/mcq/window_vlm_llm/](cookbooks/traffic/prompts/mcq/window_vlm_llm/) |
| window-direct-vlm | VLM | Traffic blueprint: VLM answers directly | [cookbooks/traffic/prompts/mcq/window_direct_vlm/](cookbooks/traffic/prompts/mcq/window_direct_vlm/) |
| metadata-llm | LLM only | Traffic blueprint: remap from existing sidecars/metadata.json | [cookbooks/traffic/prompts/mcq/metadata_llm/](cookbooks/traffic/prompts/mcq/metadata_llm/) |

See [docs/mcq-modes.md](docs/mcq-modes.md) for examples and window, retry, and VLM verify settings.
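Because each mode needs a different set of endpoints, a launcher script can check them before starting a run. The sketch below encodes the Endpoints column of the mode table as a lookup. `REQUIRED_ENDPOINTS` and `missing_endpoints` are hypothetical names for illustration and are not part of the CLI.

```python
from typing import Dict, Iterable, Set

# Endpoint requirements per MCQ mode, copied from the mode table.
REQUIRED_ENDPOINTS: Dict[str, Set[str]] = {
    "question-driven-vlm-llm": {"vlm", "llm"},
    "window-vlm-llm": {"vlm", "llm"},
    "window-direct-vlm": {"vlm"},
    "metadata-llm": {"llm"},
}


def missing_endpoints(mode: str, configured: Iterable[str]) -> Set[str]:
    """Return the endpoints a mode needs that are not configured."""
    if mode not in REQUIRED_ENDPOINTS:
        raise ValueError(f"unknown MCQ mode: {mode!r}")
    return REQUIRED_ENDPOINTS[mode] - set(configured)
```

An empty result means the mode can run with the configured endpoints. For metadata-llm, for instance, a configured LLM endpoint is enough.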

To add a domain, create or copy a cookbook under cookbooks// and point the CLI at its question_bank.json or prompt files. See [Adding prompts or question banks](docs/mcq-modes.md#adding-prompts-or-question-banks).

---

Failure behavior

By default, the pipeline is fallback-friendly (pipeline.empty_output_policy=warn). SR respects super_resolution.window_timeout (default 3600 seconds) so a hung SR window does not block a batch indefinitely.

If SR, tracking, or VLM JSON fails after retries, the run still produces a DAFT-compatible scene when possible:

  • SR failure: falls back to the original input; sidecars/pipeline_status.json records status=completed_degraded.
  • Tracking:

