NVIDIA/paidf-augmentation
Description: Containerized generative-AI pipeline that transforms video, image, and text inputs into large, diverse, physically grounded datasets for model training and evaluation
Language: Python
License: Apache-2.0
Stars: 1
Forks: 0
Open issues: 0
Created: 2026-05-19T00:35:07Z
Pushed: 2026-05-30T22:32:58Z
Default branch: main
Fork: no
Archived: no
README:
Physical AI Data Factory (PAIDF) Augmentation Pipeline
This pipeline processes generic camera data through generative AI models. It supports multi-modal inputs such as RGB, depth, edges, and segmentation, and can produce augmented outputs through models such as COSMOS Transfer and image-edit backends. Optional steps include captioning, template-based prompt generation, and verification.
Overview
The pipeline is designed for generic data augmentation. Input media can be captioned, transformed into template variables, turned into prompts, and then passed to a generation backend. You can also enable hallucination detection and attribute verification.
High-level flow:
1. Captioning: Produce a generation prompt from the input media (VLM, LLM, fixed text, or sampled from a file)
2. Generation: Run a backend such as cosmos-transfer2.5, cosmos-predict, or image-edit
3. Hallucination check (optional): Detect motion artifacts not present in the original
4. Attribute verification (optional): Generate verification questions and answer them with a VLM
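The flow above can be sketched as a small stage planner. The option names and `plan_stages` are hypothetical and are not taken from the repository. The sketch assumes captioning and generation always run, while the two checks are opt-in.

```python
from dataclasses import dataclass


@dataclass
class PipelineOptions:
    """Hypothetical switches; the real config schema may differ."""

    hallucination_check: bool = False
    attribute_verification: bool = False


def plan_stages(options: PipelineOptions) -> list[str]:
    """Return the ordered stage names for one input item."""
    # Captioning produces the prompt that generation consumes, so it comes first.
    stages = ["captioning", "generation"]
    if options.hallucination_check:
        stages.append("hallucination_check")
    if options.attribute_verification:
        stages.append("attribute_verification")
    return stages


if __name__ == "__main__":
    print(plan_stages(PipelineOptions(hallucination_check=True)))
```

Keeping the plan as plain data makes it easy to log which optional steps were enabled for a given run.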
Requirements
- NVIDIA GPU (Ampere, Hopper, or Blackwell) and a recent NVIDIA driver
- Docker with the NVIDIA Container Runtime installed
- A HuggingFace account with the NVIDIA Open Model License accepted on the Cosmos repos you plan to use (see [HuggingFace access](#huggingface-access) below)
- A VLM and/or LLM endpoint: required for most configs that use VLM/LLM captioning or verification. Not required when using `captioning.llm.text` (fixed text prompt) or `captioning.llm.file_path` (sample from a file).
- *(Optional)* multi-storage-client configuration if your data lives in S3, GCS, Azure, or other remote storage
Installation
Clone the repository

    git clone <repository-url>
    cd augmentation

Run the CLI with:
uv run modules/cli.py
Set up examples/.env
Create examples/.env and replace the placeholders with your values:

    cat > ./examples/.env <<EOF
    VLM_ENDPOINT_URL=
    VLM_ENDPOINT_MODEL=
    VLM_API_KEY=
    LLM_ENDPOINT_URL=
    LLM_ENDPOINT_MODEL=
    LLM_API_KEY=
    # Required at runtime to pull Cosmos model checkpoints from HuggingFace (see HuggingFace access below)
    HF_TOKEN=
    # Optional
    LOG_LEVEL=INFO
    # Only needed if using gradio executor (cosmos-transfer or cosmos-predict in gradio mode)
    # COSMOS_ENDPOINT_URL=
    # COSMOS_ENDPOINT_MODEL=
    # Only needed if using an image-edit model
    # IMAGE_EDIT_ENDPOINT_URL=
    # IMAGE_EDIT_ENDPOINT_MODEL=
    EOF

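Before launching the container, it can help to confirm that the values you need in `examples/.env` are actually filled in. The following is a minimal sketch, not part of the repository: `parse_env_file` and `missing_keys` are hypothetical helpers, and treating every uncommented key as required is an assumption (some configs do not need a VLM or LLM endpoint).

```python
from pathlib import Path

# Assumption: these are the keys the starter workflow needs; adjust per config.
REQUIRED_KEYS = (
    "VLM_ENDPOINT_URL",
    "VLM_ENDPOINT_MODEL",
    "LLM_ENDPOINT_URL",
    "LLM_ENDPOINT_MODEL",
    "HF_TOKEN",
)


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return required keys that are absent or have an empty value."""
    values = parse_env_file(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path("examples/.env")
    if env_path.exists():
        print("missing or empty:", missing_keys(env_path.read_text()))
    else:
        print(f"{env_path} not found")
```

Run it from the repository root. An empty list means every key in `REQUIRED_KEYS` has a non-empty value.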
HuggingFace access
The Cosmos model repos on HuggingFace are gated under the NVIDIA Open Model License. To use HF_TOKEN:
1. Log in to huggingface.co
2. Visit each repo you plan to use and accept the license (access is granted instantly).
3. Create a read token at huggingface.co/settings/tokens and set it as HF_TOKEN in examples/.env.
Optional storage configuration
The pipeline uses multi-storage-client for unified local and cloud I/O. Configure it only when your configs reference remote paths.
Provide configuration in one of these ways:
- Create `examples/secrets.json` and mount it to `/var/secrets/secrets.json`
- Set `MULTISTORAGECLIENT_CONFIGURATION` as an environment variable
Example examples/secrets.json:

    cat > ./examples/secrets.json <<EOF
    {
      "MULTISTORAGECLIENT_CONFIGURATION": {
        "profiles": {
          "": {
            "storage_provider": {
              "type": "s3",
              "options": {
                "base_path": "",
                "region_name": "",
                "endpoint_url": "",
                "infer_content_type": true
              }
            },
            "credentials_provider": {
              "type": "S3Credentials",
              "options": {
                "access_key": "",
                "secret_key": ""
              }
            }
          }
        },
        "path_mapping": {
          "/": "msc:///",
          "s3:///": "msc:///"
        }
      }
    }
    EOF

If you do not provide storage configuration, the pipeline uses local filesystem paths.
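The `path_mapping` block appears to translate local or `s3://` path prefixes into `msc://` profile URLs. Here is a minimal sketch of that idea, assuming longest-prefix matching. This is an assumption for illustration and not confirmed multi-storage-client behavior. The bucket, mount point, and profile names are hypothetical placeholders.

```python
def resolve_path(path: str, path_mapping: dict[str, str]) -> str:
    """Rewrite `path` using the most specific matching prefix, if any."""
    # Try longer prefixes first so the most specific mapping wins.
    for prefix in sorted(path_mapping, key=len, reverse=True):
        if path.startswith(prefix):
            return path_mapping[prefix] + path[len(prefix):]
    # No mapping applies: leave the path as a local filesystem path.
    return path


# Hypothetical mapping; substitute your own mount point, bucket, and profile.
example_mapping = {
    "/mnt/datasets/": "msc://my-profile/",
    "s3://my-bucket/": "msc://my-profile/",
}

if __name__ == "__main__":
    print(resolve_path("s3://my-bucket/clips/a.mp4", example_mapping))
    print(resolve_path("/tmp/local.mp4", example_mapping))
```

Paths that match no prefix are returned unchanged, which matches the local-filesystem fallback described above.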
Usage
Build and launch the Docker container
From the repository root:

    docker build -t paidf-augmentation:latest -f docker/Dockerfile .
    docker run -it --rm \
      --network host \
      --runtime nvidia \
      -e NVIDIA_VISIBLE_DEVICES=0 \
      -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
      --env-file examples/.env \
      -v "$(pwd)/modules:/app/modules" \
      -v "$(pwd)/configs:/app/configs" \
      -v "$(pwd)/data:/app/data" \
      --entrypoint /bin/bash \
      paidf-augmentation:latest

If you use cloud storage, also mount your secrets file:
-v "$(pwd)/examples/secrets.json:/var/secrets/secrets.json:ro"
> Note: The container runs as user nvidia (UID 10000). Mounted host directories must be readable (and data/ writable) by other users. Before launching the container, run:
>
> ```bash
> chmod -R o+rX configs/ modules/ examples/
> chmod -R o+rwX data/
> ```
>
> `data/` needs write permission so the pipeline can save outputs.
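To check the permission bits from the note above before starting the container, a short script along these lines may help. It is a sketch with hypothetical helper names, and it inspects only the top-level directory, not every file beneath it.

```python
import stat
from pathlib import Path


def mode_allows_others(mode: int, is_dir: bool, need_write: bool = False) -> bool:
    """Check the 'other users' permission bits of a file mode."""
    ok = bool(mode & stat.S_IROTH)
    if is_dir:
        # Entering a directory also requires the execute bit.
        ok = ok and bool(mode & stat.S_IXOTH)
    if need_write:
        ok = ok and bool(mode & stat.S_IWOTH)
    return ok


def check_mount(path: str, need_write: bool = False) -> bool:
    """Return True if `path` exists and is accessible to other users."""
    p = Path(path)
    return p.exists() and mode_allows_others(p.stat().st_mode, p.is_dir(), need_write)


if __name__ == "__main__":
    for name in ("configs", "modules", "examples"):
        print(name, "readable by others:", check_mount(name))
    print("data writable by others:", check_mount("data", need_write=True))
```

A `False` result for any directory suggests that the matching `chmod` command has not been applied yet.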
Inside the container, your working directory is /app.
Run inference
For a first-run validation, use the starter config. It uses the bundled sample input at data/sample_input.mp4 and runs the full pipeline: VLM captioning, LLM prompt generation, Cosmos Transfer 2.5 inference, and hallucination check. VLM and LLM endpoints are read from VLM_ENDPOINT_URL and LLM_ENDPOINT_URL in examples/.env:
uv run modules/cli.py --config configs/config_starter.yaml
On success, the generated video is written to data/sample_output.mp4, along with sample_caption.txt and sample_metadata.json.
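A quick way to confirm the run produced the files listed above is sketched here. The helper name is hypothetical, and the sketch assumes the caption and metadata files land in the same `data/` directory as the video.

```python
from pathlib import Path

# File names taken from the description above; their shared location is assumed.
EXPECTED_OUTPUTS = (
    "sample_output.mp4",
    "sample_caption.txt",
    "sample_metadata.json",
)


def missing_outputs(data_dir: str, expected=EXPECTED_OUTPUTS) -> list[str]:
    """Return the expected output files that are not present in `data_dir`."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs("data")
    print("all outputs present" if not missing else f"missing: {missing}")
```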
Other example configs use placeholder data paths (for example, /path/to/input/rgb.mp4). Edit the config or override paths on the command line:

    uv run modules/cli.py --config configs/config_carla_vlm_llm.yaml \
      'data.0.inputs.rgb=' \
      'data.0.output.media='

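The override syntax above addresses nested config values with dotted keys, where a numeric segment such as `0` selects a list element. How the CLI implements this is not shown here. The following is only an illustrative sketch with a hypothetical `apply_override` helper and made-up paths.

```python
from typing import Any


def apply_override(config: Any, override: str) -> None:
    """Apply one 'dotted.key=value' override to a nested dict/list in place."""
    dotted_key, _, value = override.partition("=")
    *parents, last = dotted_key.split(".")
    node = config
    for part in parents:
        # Numeric segments index into lists; everything else is a dict key.
        node = node[int(part)] if isinstance(node, list) else node[part]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value


if __name__ == "__main__":
    config = {
        "data": [
            {
                "inputs": {"rgb": "/path/to/input/rgb.mp4"},
                "output": {"media": "/path/to/output.mp4"},
            }
        ]
    }
    apply_override(config, "data.0.inputs.rgb=/app/data/my_clip.mp4")
    print(config["data"][0]["inputs"]["rgb"])
```

Quoting each override in the shell, as in the command above, keeps the `=` and any special characters in the path from being interpreted by the shell.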
output.media is the preferred output key. Older configs that have not been migrated may still use output.video; the CLI supports…
Excerpt shown — open the source for the full document.
Notability
Notability score: 1.0/10. New repo, minimal traction.
NVIDIA has a repo signal matching data demand, evals and quality, infrastructure.