nvidia/Cosmos-AnomalyGen-PCB-2B
Model Overview
Description:
Cosmos AnomalyGen - PCB (UC1) generates synthetic printed-circuit-board anomaly images by inpainting a defect into the region marked by a user-supplied binary mask on a clean reference PCB image, conditioned on one of three trained texture+defect pairs (IC+bridge, passive_component+excess_solder, passive_component+missing). The release ships only the few-shot-finetuned modules: a set of anomaly-token embeddings and a 2-layer MLP adapter. At inference time these plug into the frozen Cosmos-Predict2 2B Text-to-Image diffusion backbone, alongside a frozen NV-DINOv2 mask encoder and a frozen T5 text encoder. Cosmos AnomalyGen - UC1 v1.0.0 was developed by NVIDIA as part of the Cosmos AnomalyGen pipeline. This model is ready for commercial use.
License/Terms of Use:
Governing Terms: Use of this model is governed by the NVIDIA Open Model Agreement.
Deployment Geography:
Global
Use Case:
This model is intended for industrial visual-inspection teams responsible for PCB QA who have only a small number of real anomaly examples (≤62 per defect type). It produces large-scale synthetic anomaly datasets (clean PCB + binary mask → realistic bridge / excess_solder / missing-component image) for training downstream defect-detection or segmentation models, including downstream TAO Toolkit consumers via the DAFT v3.0 export path. Unlike UC2 and UC3, UC1 spans two PCB texture categories (IC and passive_component), so a single checkpoint can cover defects whose appearance depends on the board region (IC area vs. passive-component area) in which they occur.
Release Date:
GitHub 06/02/2026 via https://github.com/NVIDIA/paidf-anomalygen
Reference(s):
- Anomaly Diffusion (AAAI 2024) — paper: https://arxiv.org/abs/2312.05767, code: https://github.com/sjtuplayer/anomalydiffusion
- Cosmos-Predict2 — https://github.com/nvidia-cosmos/cosmos-predict2
- NV-DINOv2 classification model — https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/nv_dinov2_classification_model
Model Architecture:
Architecture Type: Transformer (diffusion DiT backbone with learnable conditioning modules)
Network Architecture:
- anomaly_embedding (trainable, included in this release): token embeddings (256 tokens per texture+defect pair); three pairs trained for UC1: IC+bridge, passive_component+excess_solder, passive_component+missing.
- adapter (trainable, included in this release): 2-layer MLP with GELU activations (input / output hidden size = 1024), projecting the mask encoder output into the diffusion DiT conditioning space.
- mask_encoder (frozen, not redistributed in this release): NV-DINOv2 (ViT-L) backbone with adaptive pool (kernel = 7); weights are loaded from the separately downloaded NV-DINOv2 classification checkpoint at inference time.
- text_encoder (frozen, not redistributed in this release): google-t5/t5-large.
These modules condition the frozen Cosmos-Predict2 2B T2I DiT denoiser at inference time.
This model was developed based on Cosmos-Predict2-2B-Text2Image.
Number of model parameters: Approximately 2.9×10^6 (2.9 million) trainable parameters in the released modules: anomaly_embedding ≈ 0.79M (256 tokens × 1024 hidden × 3 texture+defect pairs) plus the 2-layer MLP adapter ≈ 2.1M (1024→1024 with GELU). The trainable modules are distributed as the model/iter_000014000.pt checkpoint file. The frozen Cosmos-Predict2 2B base contributes ~2.0×10^9 (2 billion) parameters used at inference time but not redistributed in this release.
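The trainable-parameter figure above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes each of the two adapter layers is a dense 1024→1024 projection with a bias term, which the card does not state explicitly.

```python
# Back-of-the-envelope check of the trainable-parameter count quoted in the card.
TOKENS_PER_PAIR = 256   # anomaly tokens per texture+defect pair
HIDDEN = 1024           # hidden size shared by the embeddings and the adapter
NUM_PAIRS = 3           # IC+bridge, passive_component+excess_solder, passive_component+missing

embedding_params = TOKENS_PER_PAIR * HIDDEN * NUM_PAIRS

# Assumption: each adapter layer is a dense HIDDEN -> HIDDEN map with a bias vector.
adapter_params = 2 * (HIDDEN * HIDDEN + HIDDEN)

total_params = embedding_params + adapter_params

print(f"anomaly_embedding: {embedding_params:,}")  # 786,432
print(f"adapter:           {adapter_params:,}")    # 2,099,200
print(f"total:             {total_params:,}")      # 2,885,632
```

Under that assumption the total rounds to the 2.9 million quoted above; dropping the bias terms changes it by only 2,048 parameters.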
Input(s):
Input Type(s): Image, Binary Mask, Text
Input Format(s):
- Image: PNG / JPG, Red, Green, Blue (RGB)
- Binary Mask: PNG / JPG, single-channel binary (0 = background, 255 = anomaly region; binarized at threshold 127)
- Text: anomaly-type string in the form texture+defect (one of IC+bridge, passive_component+excess_solder, passive_component+missing)
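The mask convention in the list above (0 = background, 255 = anomaly region, binarized at threshold 127) can be illustrated with a small pure-Python sketch. The function name `binarize_mask` is hypothetical, and treating values strictly above 127 as anomaly is an assumption; the card gives only the threshold value, not how a pixel equal to 127 is handled.

```python
from typing import List

THRESHOLD = 127  # threshold value quoted in the card


def binarize_mask(mask: List[List[int]], threshold: int = THRESHOLD) -> List[List[int]]:
    """Map an 8-bit single-channel mask to the two values 0 and 255."""
    # Assumption: values strictly above the threshold count as anomaly pixels.
    return [[255 if value > threshold else 0 for value in row] for row in mask]


# A tiny mask with intermediate gray values, as lossy JPG compression can produce.
soft_mask = [
    [0, 12, 127],
    [128, 200, 255],
]
print(binarize_mask(soft_mask))  # [[0, 0, 0], [255, 255, 255]]
```

Thresholding matters mainly for JPG masks, where lossy compression can turn a clean 0/255 edge into intermediate gray values.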
Input Parameters:
- Image: Two-Dimensional (2D)
- Mask: Two-Dimensional (2D)
- Text: One-Dimensional (1D)
Other Properties Related to Input: The clean input image and its paired mask must have the same dimensions; the model was trained at 512×512 and inference is run at the same resolution. anomaly_type must exactly match one of the three pairs trained for this UC1 checkpoint; an unsupported string is rejected by scripts/anomaly_gen/sdg-inference/validate_jsonl.py, which checks it against this checkpoint's ag_config.yaml → dataloader_train.dataset.anomaly_types. Because UC1 spans two textures, the chosen texture (IC vs. passive_component) must match the board region from which the clean reference image was cropped; otherwise the generated defect may look misplaced. The optional Automatic Mask Placement (AMP) tool can constrain mask placement to legal ROIs (e.g., only on IC pads, only on passive-component pads).
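The two input constraints described above (a supported anomaly_type and matching image/mask dimensions) can be sketched as a small pre-flight check. This is not the logic of validate_jsonl.py, whose implementation is not shown in this card; `check_sample` and its signature are hypothetical, and the supported list is hard-coded here rather than read from ag_config.yaml.

```python
from typing import Tuple

# The three pairs this card lists for the UC1 checkpoint (hard-coded for illustration).
SUPPORTED_ANOMALY_TYPES = (
    "IC+bridge",
    "passive_component+excess_solder",
    "passive_component+missing",
)


def check_sample(
    anomaly_type: str,
    image_size: Tuple[int, int],
    mask_size: Tuple[int, int],
) -> Tuple[str, str]:
    """Hypothetical check: return (texture, defect) or raise ValueError."""
    if anomaly_type not in SUPPORTED_ANOMALY_TYPES:
        raise ValueError(f"unsupported anomaly_type: {anomaly_type!r}")
    if image_size != mask_size:
        raise ValueError(f"image size {image_size} != mask size {mask_size}")
    # Split on the first '+' only, so the texture and defect names stay intact.
    texture, defect = anomaly_type.split("+", 1)
    return texture, defect


print(check_sample("IC+bridge", (512, 512), (512, 512)))  # ('IC', 'bridge')
```

Note that a combination such as IC+missing fails the check even though both halves appear in the supported list, because only the three trained pairs are valid.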
Output(s):
Output Type(s): Image
Output Format(s): PNG; Red, Green, Blue (RGB)
Output Parameters: Two-Dimensional (2D)
Other Properties Related to Output: 512×512 RGB synthetic anomaly image. Anomaly content is generated inside the user-supplied mask region; in the default crop_and_paste=True flow, the inpainted patch is pasted back onto the clean reference image so that non-masked pixels remain identical to the input. Poisson blending can optionally be enabled. Generation metadata (per-sample guidance, crop_ratio, seed, etc.) is written to SDG_result.csv alongside the images.
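The property that non-masked pixels remain identical to the input can be illustrated with a mask-based composite. This is a simplified sketch and not the pipeline's crop_and_paste implementation: it skips the crop step and Poisson blending, uses single-channel nested lists instead of RGB arrays, and the name `paste_back` is hypothetical.

```python
from typing import List

Image = List[List[int]]  # single channel for brevity; RGB would apply the rule per channel


def paste_back(clean: Image, generated: Image, mask: Image) -> Image:
    """Take generated pixels where mask == 255 and clean pixels everywhere else."""
    return [
        [g if m == 255 else c for c, g, m in zip(clean_row, gen_row, mask_row)]
        for clean_row, gen_row, mask_row in zip(clean, generated, mask)
    ]


clean = [[10, 10], [10, 10]]
generated = [[99, 98], [97, 96]]
mask = [[0, 255], [0, 0]]  # only the top-right pixel is marked as anomaly

print(paste_back(clean, generated, mask))  # [[10, 98], [10, 10]]
```

Only the masked pixel takes the generated value; the other three keep their clean-image values exactly.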
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
Runtime Engine(s):
- PyTorch (via the Cosmos-Predict2 2B T2I pipeline)
- Cosmos AnomalyGen scripts (scripts.anomaly_gen.synthetic_dataset_generation, torchrun-based)
- NVIDIA TAO Toolkit: interop via DAFT v3.0 export (scripts.anomaly_gen.convert_to_daft_format)
Supported Hardware Microarchitecture Compatibility:
- NVIDIA Ampere (A100)
- NVIDIA Hopper (H100)
- NVIDIA RTX 6000
Supported Operating System(s):
- Linux
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at…