NVIDIA/paidf-anomalygen
Jupyter Notebook
Description: Diffusion-based pipeline for generating photorealistic, mask-aligned synthetic anomaly images for industrial visual inspection from only a few real examples
Language: Jupyter Notebook
License: Apache-2.0
Stars: 8
Forks: 2
Open issues: 0
Created: 2026-05-19T00:41:08Z
Pushed: 2026-05-30T22:46:11Z
Default branch: main
Fork: no
Archived: no
README:
PAIDF AnomalyGen
PAIDF AnomalyGen is a diffusion-based pipeline for Synthetic Data Generation (SDG) of anomaly images in a few-shot scenario.
Overview
Post-training
Cosmos-Predict2 natively supports text-to-image (T2I) generation. Several additional network components enable extra inputs, such as mask and anomaly information. Because the pipeline targets few-shot scenarios, the diffusion network components (Cosmos-tokenizer, formerly known as VAE; T5 Text Encoder; Diffusion Transformer, or DiT) are frozen and only the additional components are trained. The Cosmos-Predict2 Text-to-World (T2W) post-training pipeline was modified to compute the gradients needed to update these additional components.
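The freeze-and-train split described above can be pictured as a partition of parameter names. The following is a minimal, framework-free sketch: the prefixes `tokenizer.`, `text_encoder.`, and `dit.`, the adapter names, and the function `split_parameters` are all hypothetical and are not taken from the repository. In PyTorch, the frozen group would typically have `requires_grad` set to `False`, and only the trainable group would be handed to the optimizer.

```python
from typing import Dict, Iterable, List

# Hypothetical prefixes for the frozen backbone (tokenizer, T5 text encoder, DiT).
# The real module names in the repository may differ.
FROZEN_PREFIXES = ("tokenizer.", "text_encoder.", "dit.")


def split_parameters(names: Iterable[str]) -> Dict[str, List[str]]:
    """Partition parameter names into frozen backbone and trainable add-ons."""
    groups: Dict[str, List[str]] = {"frozen": [], "trainable": []}
    for name in names:
        # str.startswith accepts a tuple and matches any of its prefixes.
        key = "frozen" if name.startswith(FROZEN_PREFIXES) else "trainable"
        groups[key].append(name)
    return groups


if __name__ == "__main__":
    example = [
        "tokenizer.encoder.conv.weight",
        "text_encoder.block.0.attn.weight",
        "dit.blocks.0.mlp.weight",
        "mask_adapter.proj.weight",    # hypothetical additional component
        "anomaly_embed.table.weight",  # hypothetical additional component
    ]
    groups = split_parameters(example)
    print("trainable:", groups["trainable"])
    print("frozen:", groups["frozen"])
```

Keeping the split in one place makes it easy to confirm that no backbone weight is accidentally left trainable, which matters when only a few real examples are available.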
Inference

Supported Cosmos-Predict2 Models
The following model sizes are supported:
- 2B T2I model
- 14B T2I model
Requirements
- NVIDIA GPU + recent NVIDIA driver
- HuggingFace account with access to the Cosmos Predict2, T5, C-RADIOv3, DINOv2, SAM2 and Qwen3-VL model repos (see [Setup Checkpoints and HuggingFace Access](#setup-checkpoints-and-huggingface-access) below). NVDINOv2 is downloaded from NGC; SAM2 is downloaded from a public Facebook URL.
- https://huggingface.co/nvidia/Cosmos-Predict2-2B-Text2Image
- https://huggingface.co/nvidia/Cosmos-Predict2-14B-Text2Image
- https://huggingface.co/google-t5/t5-11b
- https://huggingface.co/google-t5/t5-large
- https://huggingface.co/nvidia/C-RADIOv3-B
- https://huggingface.co/facebook/dinov2-large
- https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct
- Access to NV-DINOv2 from NGC.
- https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/nv_dinov2_classification_model?version=trainable_v1.1
- Access to SAM2 from Facebook.
- https://github.com/facebookresearch/sam2
Installation
Setup Checkpoints and HuggingFace Access
Download the checkpoints used by the pipeline before starting the tutorial notebooks. The setup script pulls Cosmos-Predict2 Text2Image (2B and 14B), google-t5 (large and 11b), NV-DINOv2, C-RADIOv3-B, facebook/dinov2-large, SAM2, and Qwen3-VL.
# Log in to Hugging Face (you need to prepare your own HF token)
hf auth login
python -m scripts.download_checkpoints --model_types text2image --model_sizes 2B 14B
nvidia/Cosmos-Reason1-7B is used by the pseudo-labeling captioner and is downloaded on demand the first time you run pseudo-labeling. You can also pre-fetch it with `hf download nvidia/Cosmos-Reason1-7B --local-dir checkpoints/nvidia/Cosmos-Reason1-7B`.
If the download script fails, you can manually download the checkpoints from the following links:
- NVDINOv2: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/nv_dinov2_classification_model?version=trainable_v1.1 and place it in the checkpoints/NVDINOV2 directory.
- C-RADIOv3-B: https://huggingface.co/nvidia/C-RADIOv3-B/blob/main/model.safetensors and place it in the checkpoints/nvidia/C-RADIO-V3 directory.
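After a manual download, it can help to confirm that the files landed in the directories named above before opening the tutorial notebooks. The following is a minimal sketch using only the standard library; the helper `missing_checkpoint_dirs` is hypothetical and not part of the repository, and it only checks that each directory exists and is non-empty, not that the files inside are valid.

```python
from pathlib import Path
from typing import List, Sequence

# Directories named in this README for manual downloads.
# Other checkpoints fetched by the setup script are not covered here.
EXPECTED_DIRS = (
    "checkpoints/NVDINOV2",
    "checkpoints/nvidia/C-RADIO-V3",
)


def missing_checkpoint_dirs(
    repo_root: str, expected: Sequence[str] = EXPECTED_DIRS
) -> List[str]:
    """Return the expected directories that are absent or empty under repo_root."""
    root = Path(repo_root)
    missing = []
    for rel in expected:
        path = root / rel
        # A directory counts as present only if it holds at least one entry.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # Run from the repository root.
    for rel in missing_checkpoint_dirs("."):
        print(f"missing or empty: {rel}")
```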
Environment Setup (Conda)
For Conda environment setup, refer to [tutorial/notebooks/0-setup-cuda128.ipynb](tutorial/notebooks/0-setup-cuda128.ipynb).
Environment Setup (Docker)
If you run into environment setup issues, we recommend building and running a Docker container for this project.
Use the anomalygen-release skill to build and validate CUDA 12.8 containers from Dockerfile-cuda128. There are two modes:
- Product container: for users operating AnomalyGen through an agent. It sets ANOMALYGEN_PRODUCT_MODE=1, runs as a non-root user, locks production code read-only, and keeps runtime artifacts writable.
- Develop container: for developers using an agent to modify code. It leaves ANOMALYGEN_PRODUCT_MODE unset and keeps the repo writable.
Ask the agent:
Build anomalygen product container
or:
Build anomalygen develop container
Equivalent helper commands:
bash .agents/skills/anomalygen-release/scripts/build_image.sh --mode product
bash .agents/skills/anomalygen-release/scripts/build_image.sh --mode develop
After building, validate the intended mode:
bash .agents/skills/anomalygen-release/scripts/validate_image_permissions.sh \
  --mode product \
  "paidf-anomalygen:"
bash .agents/skills/anomalygen-release/scripts/validate_image_permissions.sh \
  --mode develop \
  "paidf-anomalygen-dev:"
Do not export ANOMALYGEN_PRODUCT_MODE=1 in a normal clone or develop container. That variable is reserved for product containers and is what enables the AnomalyGen guard.
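To see whether the current shell has the variable set by mistake, a small check like the one below may be useful. It is a minimal sketch that assumes the guard is enabled exactly when `ANOMALYGEN_PRODUCT_MODE` equals `1`; the function name `is_product_mode` is hypothetical, and the repository's actual guard logic may differ.

```python
import os
from typing import Mapping, Optional


def is_product_mode(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when ANOMALYGEN_PRODUCT_MODE is exactly "1".

    Assumption: any other value, or an unset variable, means the guard is off.
    """
    env = os.environ if env is None else env
    return env.get("ANOMALYGEN_PRODUCT_MODE") == "1"


if __name__ == "__main__":
    if is_product_mode():
        print("ANOMALYGEN_PRODUCT_MODE=1 is set; "
              "unset it in a normal clone or develop container.")
    else:
        print("Product mode is not enabled.")
```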
Running the Container
> `--shm-size` is required. PyTorch DataLoader uses /dev/shm for multiprocessing shared memory. The Docker default of 64 MB is far too small and will cause workers to crash with "Bus error" or silent hangs during training or inference. Use at least 16g.
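Inside a running container, the shared-memory size can be checked before starting training or inference. The following is a minimal sketch using only the standard library; the helper `shm_is_sufficient` is hypothetical and not part of the repository, and the 16 GiB threshold mirrors the `16g` recommendation above (Docker interprets `g` as a binary unit).

```python
import os
import shutil

GIB = 1024 ** 3


def shm_is_sufficient(total_bytes: int, required_gib: int = 16) -> bool:
    """Return True when shared memory is at least required_gib GiB."""
    return total_bytes >= required_gib * GIB


if __name__ == "__main__":
    shm_path = "/dev/shm"  # present on Linux; sized by --shm-size inside Docker
    if os.path.isdir(shm_path):
        total = shutil.disk_usage(shm_path).total
        status = "ok" if shm_is_sufficient(total) else "too small"
        print(f"{shm_path}: {total / GIB:.2f} GiB ({status})")
    else:
        print(f"{shm_path} not found on this platform")
```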
Product container:
TAG="paidf-anomalygen:"
REPO="$PWD"
HF_TOKEN=
docker run --rm -it --gpus all --shm-size=16g \
--user "$(id -u):$(id -g)" \
-e USER="$(id -un)" \
-e HF_TOKEN \
-e HOME=/tmp \
-v "${REPO}/checkpoints:/workspace/paidf-anomalygen/checkpoints" \
-v "${REPO}/datasets:/workspace/paidf-anomalygen/datasets" \
-v "${REPO}/ag_configs:/workspace/paidf-anomalygen/ag_configs" \
-v "${REPO}/ag_inference:/workspace/paidf-anomalygen/ag_inference" \
-v…