Repo: NVIDIA, published May 19, 2026

NVIDIA/paidf-augmentation

Python

Description: Containerized generative-AI pipeline that transforms video, image, and text inputs into large, diverse, physically grounded datasets for model training and evaluation

Language: Python

License: Apache-2.0

Stars: 1

Forks: 0

Open issues: 0

Created: 2026-05-19T00:35:07Z

Pushed: 2026-05-30T22:32:58Z

Default branch: main

Fork: no

Archived: no

README:

Physical AI Data Factory (PAIDF) Augmentation Pipeline

This pipeline processes generic camera data through generative AI models. It supports multi-modal inputs such as RGB, depth, edges, and segmentation, and can produce augmented outputs through models such as COSMOS Transfer and image-edit backends. Optional steps include captioning, template-based prompt generation, and verification.

Overview

The pipeline is designed for generic data augmentation. Input media can be captioned, transformed into template variables, turned into prompts, and then passed to a generation backend. You can also enable hallucination detection and attribute verification.

High-level flow:

1. Captioning: Produce a generation prompt from the input media (VLM, LLM, fixed text, or sampled from a file)
2. Generation: Run a backend such as cosmos-transfer2.5, cosmos-predict, or image-edit
3. Hallucination check (optional): Detect motion artifacts not present in the original
4. Attribute verification (optional): Generate verification questions and answer them with a VLM
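The ordering of these stages, and the way the two optional checks drop out when disabled, can be sketched as follows. This is only an illustration of the flow described above: `run_flow`, `StageResult`, and the callable signatures are hypothetical names and are not taken from the repository's modules.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class StageResult:
    """Collects what each stage produced for one input (hypothetical type)."""
    prompt: str = ""
    output_path: str = ""
    checks: dict = field(default_factory=dict)


def run_flow(
    input_path: str,
    caption: Callable[[str], str],
    generate: Callable[[str, str], str],
    hallucination_check: Optional[Callable[[str, str], bool]] = None,
    verify_attributes: Optional[Callable[[str, str], bool]] = None,
) -> StageResult:
    """Run the stages in order; optional stages are skipped when None."""
    result = StageResult()
    # 1. Captioning: turn the input media into a generation prompt.
    result.prompt = caption(input_path)
    # 2. Generation: hand the input and prompt to a backend.
    result.output_path = generate(input_path, result.prompt)
    # 3. Optional hallucination check: compare source and generated output.
    if hallucination_check is not None:
        result.checks["hallucination"] = hallucination_check(
            input_path, result.output_path
        )
    # 4. Optional attribute verification against the prompt.
    if verify_attributes is not None:
        result.checks["attributes"] = verify_attributes(
            result.prompt, result.output_path
        )
    return result


# Stand-in callables so the sketch runs without any model endpoint.
demo = run_flow(
    "data/sample_input.mp4",
    caption=lambda path: "a rainy street at night",
    generate=lambda path, prompt: "data/sample_output.mp4",
    hallucination_check=lambda src, out: True,
)
print(demo)
```

In the real pipeline each stage is selected and configured through the YAML config rather than passed as a Python callable.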

Requirements

  • NVIDIA GPU (Ampere, Hopper, or Blackwell) and a recent NVIDIA driver
  • Docker with the NVIDIA Container Runtime installed
  • A HuggingFace account with the NVIDIA Open Model License accepted on the Cosmos repos you plan to use (see [HuggingFace access](#huggingface-access) below)
  • A VLM and/or LLM endpoint — required for most configs that use VLM/LLM captioning or verification. Not required when using captioning.llm.text (fixed text prompt) or captioning.llm.file_path (sample from a file).
  • *(Optional)* multi-storage-client configuration if your data lives in S3, GCS, Azure, or other remote storage

Installation

Clone the repository

git clone
cd augmentation

Run the CLI with:

uv run modules/cli.py

Set up examples/.env

Create examples/.env and replace the placeholders with your values:

cat > ./examples/.env <<'EOF'
VLM_ENDPOINT_URL=
VLM_ENDPOINT_MODEL=
VLM_API_KEY=
LLM_ENDPOINT_URL=
LLM_ENDPOINT_MODEL=
LLM_API_KEY=

# Required at runtime to pull Cosmos model checkpoints from HuggingFace (see HuggingFace access below)
HF_TOKEN=

# Optional
LOG_LEVEL=INFO

# Only needed if using gradio executor (cosmos-transfer or cosmos-predict in gradio mode)
# COSMOS_ENDPOINT_URL=
# COSMOS_ENDPOINT_MODEL=

# Only needed if using an image-edit model
# IMAGE_EDIT_ENDPOINT_URL=
# IMAGE_EDIT_ENDPOINT_MODEL=
EOF
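Before launching the container, it can help to confirm that the file has no empty entries. The following is a minimal sketch that parses `KEY=VALUE` lines using only the standard library. Treating the five variables in `REQUIRED` as mandatory is an assumption of this example; depending on your config (for instance, fixed-text captioning) some of them may not be needed.

```python
from pathlib import Path

# Variable names come from this README; which ones are truly mandatory
# depends on your config, so this tuple is an assumption.
REQUIRED = (
    "VLM_ENDPOINT_URL",
    "VLM_ENDPOINT_MODEL",
    "LLM_ENDPOINT_URL",
    "LLM_ENDPOINT_MODEL",
    "HF_TOKEN",
)


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(values: dict, required=REQUIRED) -> list:
    """Return required names that are absent or set to an empty string."""
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    env_path = Path("examples/.env")
    if env_path.exists():
        missing = missing_vars(parse_env(env_path.read_text()))
        print("missing:", missing or "none")
    else:
        print(f"{env_path} not found")
```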

HuggingFace access

The Cosmos model repos on HuggingFace are gated under the NVIDIA Open Model License. To use HF_TOKEN:

1. Log in to huggingface.co
2. Visit each repo you plan to use and accept the license (access is granted instantly).
3. Create a read token at huggingface.co/settings/tokens and set it as HF_TOKEN in examples/.env.

Optional storage configuration

The pipeline uses multi-storage-client for unified local and cloud I/O. Configure it only when your configs reference remote paths.

Provide configuration in one of these ways:

  • Create examples/secrets.json and mount it to /var/secrets/secrets.json
  • Set MULTISTORAGECLIENT_CONFIGURATION as an environment variable

Example examples/secrets.json:

cat > ./examples/secrets.json <<'EOF'
{
  "MULTISTORAGECLIENT_CONFIGURATION": {
    "profiles": {
      "": {
        "storage_provider": {
          "type": "s3",
          "options": {
            "base_path": "",
            "region_name": "",
            "endpoint_url": "",
            "infer_content_type": true
          }
        },
        "credentials_provider": {
          "type": "S3Credentials",
          "options": {
            "access_key": "",
            "secret_key": ""
          }
        }
      }
    },
    "path_mapping": {
      "/": "msc:///",
      "s3:///": "msc:///"
    }
  }
}
EOF

If you do not provide storage configuration, the pipeline uses local filesystem paths.
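To make the shape of the storage configuration concrete, here is a small sketch that assembles the same structure in Python and shows what a `path_mapping` entry is meant to express. The profile name, bucket, region, endpoint, and credential strings are hypothetical placeholders. `map_path` is an illustrative longest-prefix rewrite written for this example, not multi-storage-client's own resolution code.

```python
import json


def build_secrets(profile: str, bucket: str, region: str, endpoint_url: str,
                  access_key: str, secret_key: str, local_prefix: str) -> dict:
    """Assemble a secrets.json payload with a single S3 profile."""
    return {
        "MULTISTORAGECLIENT_CONFIGURATION": {
            "profiles": {
                profile: {
                    "storage_provider": {
                        "type": "s3",
                        "options": {
                            "base_path": bucket,
                            "region_name": region,
                            "endpoint_url": endpoint_url,
                            "infer_content_type": True,
                        },
                    },
                    "credentials_provider": {
                        "type": "S3Credentials",
                        "options": {
                            "access_key": access_key,
                            "secret_key": secret_key,
                        },
                    },
                }
            },
            # Both a local prefix and the s3:// prefix point at the profile.
            "path_mapping": {
                local_prefix: f"msc://{profile}/",
                f"s3://{bucket}/": f"msc://{profile}/",
            },
        }
    }


def map_path(path: str, path_mapping: dict) -> str:
    """Illustrative longest-prefix rewrite of a path to an msc:// URL."""
    for prefix in sorted(path_mapping, key=len, reverse=True):
        if path.startswith(prefix):
            return path_mapping[prefix] + path[len(prefix):]
    return path  # no mapping matched: leave the path unchanged


# All values below are hypothetical placeholders.
secrets = build_secrets(
    profile="my-profile",
    bucket="my-bucket",
    region="us-east-1",
    endpoint_url="https://s3.example.com",
    access_key="EXAMPLE_ACCESS_KEY",
    secret_key="EXAMPLE_SECRET_KEY",
    local_prefix="/mnt/datasets/",
)
mapping = secrets["MULTISTORAGECLIENT_CONFIGURATION"]["path_mapping"]
print(json.dumps(secrets, indent=2))
print(map_path("s3://my-bucket/clips/a.mp4", mapping))
```

Writing the result of `json.dumps(secrets, indent=2)` to examples/secrets.json gives a file in the layout shown above.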

Usage

Build and launch the Docker container

From the repository root:

docker build -t paidf-augmentation:latest -f docker/Dockerfile .

docker run -it --rm \
--network host \
--runtime nvidia \
-e NVIDIA_VISIBLE_DEVICES=0 \
-e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
--env-file examples/.env \
-v "$(pwd)/modules:/app/modules" \
-v "$(pwd)/configs:/app/configs" \
-v "$(pwd)/data:/app/data" \
--entrypoint /bin/bash \
paidf-augmentation:latest

If you use cloud storage, also mount your secrets file:

-v "$(pwd)/examples/secrets.json:/var/secrets/secrets.json:ro"

Note: The container runs as user nvidia (UID 10000). Mounted host directories must be readable (and data/ writable) by other users. Before launching the container, run:

chmod -R o+rX configs/ modules/ examples/
chmod -R o+rwX data/

data/ needs write permission so the pipeline can save outputs.
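If the container fails with permission errors, a quick host-side check of the "other" permission bits can narrow down which entries are affected. The sketch below uses only the standard library. It assumes UID 10000 neither owns the files nor belongs to their group on the host, so the "other" bits are the ones that apply. The helper names are hypothetical.

```python
import os
import stat
from pathlib import Path


def other_can_access(path: Path, need_write: bool = False) -> bool:
    """True if the 'other' bits allow read (and write, when requested)."""
    mode = path.stat().st_mode
    ok = bool(mode & stat.S_IROTH)
    if path.is_dir():
        # Directories also need the execute bit to be traversable.
        ok = ok and bool(mode & stat.S_IXOTH)
    if need_write:
        ok = ok and bool(mode & stat.S_IWOTH)
    return ok


def find_blocked(root: Path, need_write: bool = False) -> list:
    """List root and any entries below it that fail the permission check."""
    blocked = [] if other_can_access(root, need_write) else [root]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            entry = Path(dirpath) / name
            if not other_can_access(entry, need_write):
                blocked.append(entry)
    return blocked


if __name__ == "__main__":
    # data/ is the only mount that must also be writable.
    for folder, writable in (("configs", False), ("modules", False),
                             ("examples", False), ("data", True)):
        root = Path(folder)
        if root.exists():
            print(folder, "blocked entries:", len(find_blocked(root, writable)))
```

A count of zero for every directory means the chmod commands above have taken effect.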

Inside the container, your working directory is /app.

Run inference

For a first-run validation, use the starter config. It uses the bundled sample input at data/sample_input.mp4 and runs the full pipeline: VLM captioning, LLM prompt generation, Cosmos Transfer 2.5 inference, and hallucination check. VLM and LLM endpoints are read from VLM_ENDPOINT_URL and LLM_ENDPOINT_URL in examples/.env:

uv run modules/cli.py --config configs/config_starter.yaml

On success, the generated video is written to data/sample_output.mp4, along with sample_caption.txt and sample_metadata.json.

Other example configs use placeholder data paths (for example, /path/to/input/rgb.mp4). Edit the config or override paths on the command line:

uv run modules/cli.py --config configs/config_carla_vlm_llm.yaml \
'data.0.inputs.rgb=' \
'data.0.output.media='
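The dotted keys in these overrides address nested entries in the config: `data.0.inputs.rgb` reads as "first item of data, then inputs, then rgb". The sketch below shows one way such an assignment could be applied to a nested structure. It illustrates the addressing scheme only. `apply_override`, the sample config shape, and the paths are hypothetical, and the CLI's actual override handling may differ (for example, in how it converts value types).

```python
from typing import Any


def apply_override(config: Any, assignment: str) -> None:
    """Apply one 'dotted.key=value' assignment to nested dicts/lists in place."""
    dotted, sep, value = assignment.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {assignment!r}")
    *parents, leaf = dotted.split(".")
    node = config
    for part in parents:
        # When the current node is a list, the segment is an integer index;
        # otherwise it is a dictionary key.
        node = node[int(part)] if isinstance(node, list) else node[part]
    if isinstance(node, list):
        node[int(leaf)] = value
    else:
        node[leaf] = value  # values stay strings in this sketch


# Hypothetical config shape, inferred from the override keys above.
config = {
    "data": [
        {
            "inputs": {"rgb": "/path/to/input/rgb.mp4"},
            "output": {"media": "/path/to/output.mp4"},
        }
    ]
}
apply_override(config, "data.0.inputs.rgb=/app/data/sample_input.mp4")
apply_override(config, "data.0.output.media=/app/data/out.mp4")
print(config["data"][0])
```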

output.media is the preferred output key. Older configs that have not been migrated may still use output.video; the CLI supports…

