Model · NVIDIA · published May 11, 2026

nvidia/Qwen-Image-Edit-NVPCB-OVSL2SL

Task: image-to-image. License: other. Library: diffusers. Downloads: 301. Likes: 1.

Model Overview

Description:

Qwen-Image-Edit-NVPCB-OVSL2SL transforms synthetic solder-light printed-circuit-board (PCB) component crops — produced in NVIDIA Omniverse — into the photographic solder-light style captured at NVIDIA PCB inspection stations, so that downstream PCB inspection models trained on real solder-light photographs can be augmented with Omniverse-generated synthetic data. The release is an NVIDIA fine-tuned version of the Qwen-Image-Edit image-to-image diffusion pipeline (diffusion transformer, Qwen2.5-VL text encoder, Qwen-Image VAE, tokenizer, image processor, and scheduler configuration), specialized for the Omniverse → NVPCB solder-light style transfer. Qwen-Image-Edit-NVPCB-OVSL2SL v1.0.0 was developed by NVIDIA as part of the NVPCB inspection-data harmonization pipeline. This model is ready for commercial use.

License/Terms of Use:

Governing Terms: Use of this model is governed by the NVIDIA Open Model Agreement. Additional Information: Apache License, Version 2.0.

Deployment Geography:

Global

Use Case:

Intended for NVIDIA engineers and researchers building PCB inspection / automated optical inspection (AOI) systems whose training or evaluation data is to be augmented with Omniverse-generated synthetic data. The model converts Omniverse-rendered solder-light PCB component crops into the photographic solder-light style produced by NVIDIA's physical inspection stations, closing the sim-to-real style gap so that inspection models trained on real photographs can be evaluated or augmented with synthetic Omniverse data. This model is not intended to be the primary inspection decision-maker; it is a sim-to-real data-translation step. Inspection pass/fail decisions must come from a downstream inspection model with human review.

Release Date:

GitHub 06/02/2026 via https://github.com/NVIDIA/paidf-augmentation

Reference(s):

  • Qwen-Image-Edit base model — https://huggingface.co/Qwen/Qwen-Image-Edit
  • Qwen2.5-VL text encoder — https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct
  • LoRA: Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models," 2021 — https://arxiv.org/abs/2106.09685
  • Diffusers library — https://github.com/huggingface/diffusers

Model Architecture:

Architecture Type: Transformer (diffusion transformer with cross-modal conditioning)

Network Architecture: This release is a self-contained HuggingFace diffusers pipeline directory. All of the following components are redistributed as part of the release artifact:

  • transformer/ *(NVIDIA fine-tuned, redistributed)*: the upstream Qwen-Image-Edit flow-matching image-to-image diffusion transformer, fine-tuned by NVIDIA on the attention and feed-forward projections of QwenImageTransformerBlock (to_q, to_k, to_v, to_out, the cross-attention add_{q,k,v}_proj / to_add_out, and img_mlp / txt_mlp).
  • text_encoder/ *(redistributed unmodified)*: Qwen2.5-VL; attends over both the input image and the instruction prompt.
  • vae/ *(redistributed unmodified)*: Qwen-Image VAE.
  • tokenizer/, processor/, scheduler/, and model_index.json *(redistributed unmodified)*: Qwen-Image-Edit tokenizer, image processor, scheduler configuration, and pipeline entry point.
  • Fine-tuning methodology: NVIDIA fine-tuned the transformer using LoRA (rank 16, ~1.7 × 10^8 parameters introduced during training); the resulting weights were then merged back into the transformer for release, so the released artifact is a standalone diffusers pipeline that requires no separate adapter file at inference time.
  • The released pipeline directory can be loaded directly with diffusers.QwenImageEditPipeline.from_pretrained(...).
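A minimal loading sketch, assuming the pipeline is fetched by the repository name shown at the top of this page and that a CUDA GPU with enough memory is available. The sampler settings (`num_inference_steps=50`, `true_cfg_scale=4.0`, `negative_prompt=" "`) mirror the upstream Qwen-Image-Edit usage example and are assumptions, not values stated in this card; `translate_crop` is a hypothetical helper name.

```python
# Fixed instruction prompt, copied from the "Other Properties Related to Input" section.
FIXED_PROMPT = (
    "Render this PCB component crop as a real NVPCB inspection-line "
    "solder-light photograph: dark photographic board surface with bright "
    "orange and blue specular highlights on the solder pads, sharp "
    "realistic textures."
)


def translate_crop(image_path, model_id="nvidia/Qwen-Image-Edit-NVPCB-OVSL2SL",
                   steps=50, seed=0):
    """Translate one Omniverse-rendered crop into the photographic style.

    Hypothetical helper: the sampler settings below are assumptions taken
    from the upstream Qwen-Image-Edit example, not from this model card.
    """
    # Heavy imports stay local so the module can be imported without a GPU stack.
    import torch
    from diffusers import QwenImageEditPipeline
    from PIL import Image

    # The LoRA weights are already merged, so no adapter file is loaded here.
    pipe = QwenImageEditPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    pipe.to("cuda")

    image = Image.open(image_path).convert("RGB")
    result = pipe(
        image=image,
        prompt=FIXED_PROMPT,
        negative_prompt=" ",
        true_cfg_scale=4.0,
        num_inference_steps=steps,
        generator=torch.manual_seed(seed),
    )
    return result.images[0]
```

Calling `translate_crop("crop.png").save("crop_sl.png")` would write the translated crop; the file names are placeholders.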

This model was developed based on Qwen-Image-Edit.

Number of model parameters: Approximately 2.0 × 10^10 (20B) total parameters in the released checkpoint. Of these, ~1.7 × 10^8 (170M) parameters in the diffusion transformer were updated by NVIDIA fine-tuning; the remaining parameters come from the upstream Qwen-Image-Edit pipeline (transformer + Qwen2.5-VL text encoder + Qwen-Image VAE) and are redistributed unmodified.
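For intuition on where a LoRA parameter count comes from: an adapter of rank r on a linear layer with d_in inputs and d_out outputs adds r × (d_in + d_out) trainable parameters. The sketch below applies that formula; the layer widths and block count are hypothetical placeholders, not the real Qwen-Image-Edit dimensions, so the printed total is not expected to match the ~1.7 × 10^8 figure above.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def total_lora_params(layer_shapes, rank: int, num_blocks: int) -> int:
    """Sum adapter parameters over every adapted linear layer in every block."""
    per_block = sum(lora_param_count(d_in, d_out, rank) for d_in, d_out in layer_shapes)
    return num_blocks * per_block


# Hypothetical dimensions, chosen only to illustrate the formula.
HIDDEN, MLP = 1024, 4096
shapes = [(HIDDEN, HIDDEN)] * 4 + [(HIDDEN, MLP), (MLP, HIDDEN)]
print(total_lora_params(shapes, rank=16, num_blocks=2))
```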

Cumulative Compute: ~0.6 GPU-hour total on a single NVIDIA H100 SXM (~0.5 GPU-hour for the 1500-step fine-tuning run + ~5 GPU-minutes for the latent/embedding cache build).

Estimated Energy and Emissions for Model Training: ~0.4 kWh and ~0.15 kgCO2e total. Methodology: GPU energy = 0.6 GPU-hour × 0.7 kW (H100 SXM rated TDP) × 0.6 average utilization (typical for LoRA fine-tuning, which is not consistently GPU-bound) ≈ 0.25 kWh; multiplied by an assumed datacenter PUE of 1.5 to account for cooling and facility overhead ≈ 0.38 kWh; multiplied by 0.4 kgCO2e/kWh (U.S. national-grid average) ≈ 0.15 kgCO2e (0.16 kgCO2e if the energy is first rounded up to 0.4 kWh). Estimates use rated TDP rather than measured wall-power and are therefore conservative upper bounds; actual emissions depend on the specific datacenter's PUE and regional grid carbon intensity at training time.
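The methodology above can be reproduced as a short calculation. This sketch uses only the figures stated in the paragraph and carries unrounded intermediates; `training_footprint` is a hypothetical helper name. With unrounded values the last step comes to about 0.15 kgCO2e, while rounding the energy to 0.4 kWh first gives 0.16 kgCO2e.

```python
def training_footprint(gpu_hours, tdp_kw, utilization, pue, grid_kgco2e_per_kwh):
    """Return (GPU energy kWh, facility energy kWh, emissions kgCO2e)."""
    gpu_energy_kwh = gpu_hours * tdp_kw * utilization      # energy drawn by the GPU
    facility_energy_kwh = gpu_energy_kwh * pue             # add cooling and facility overhead
    emissions_kgco2e = facility_energy_kwh * grid_kgco2e_per_kwh
    return gpu_energy_kwh, facility_energy_kwh, emissions_kgco2e


gpu_kwh, total_kwh, kgco2e = training_footprint(
    gpu_hours=0.6,             # cumulative compute from the card
    tdp_kw=0.7,                # H100 SXM rated TDP
    utilization=0.6,           # assumed average utilization
    pue=1.5,                   # assumed datacenter PUE
    grid_kgco2e_per_kwh=0.4,   # U.S. national-grid average used in the card
)
print(f"{gpu_kwh:.3f} kWh GPU, {total_kwh:.3f} kWh facility, {kgco2e:.3f} kgCO2e")
```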

Input(s):

Input Type(s): Image, Text

Input Format(s):

  • Image: PNG / JPG, Red, Green, Blue (RGB)
  • Text: UTF-8 instruction prompt (English)

Input Parameters:

  • Image: Two-Dimensional (2D)
  • Text: One-Dimensional (1D)

Other Properties Related to Input: Fine-tuned at target area 262,144 pixels (~512×512); other resolutions are accepted by the underlying diffusers pipeline but NVIDIA fine-tuning was not performed at them, so style fidelity may degrade. Input must be a single Omniverse-rendered PCB component crop on an approximately black background, similar to the synthetic solder-light style the model was fine-tuned on. The accompanying instruction prompt is a fixed English sentence and is not user-configurable. The prompt is:

> "Render this PCB component crop as a real NVPCB inspection-line solder-light photograph: dark photographic board surface with bright orange and blue specular highlights on the solder pads, sharp realistic textures."

The only input a user provides is the PCB image to be processed; the model performs an image-constrained relighting edit of that image, not free-text text-to-image generation. Locking the prompt is a deliberate guardrail: the model was fine-tuned on this single instruction only, so the prompt cannot be a vector for misuse.…
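Since fine-tuning used a target area of 262,144 pixels, one way to prepare a crop is to pick an output size close to that area while preserving aspect ratio. The sketch below is an illustration under assumptions: rounding each side to a multiple of 32 is an assumed alignment, not a requirement stated in this card, and `size_for_target_area` is a hypothetical helper.

```python
import math

TARGET_AREA = 262_144  # pixels (~512x512), the fine-tuning target area from the card


def size_for_target_area(width, height, target_area=TARGET_AREA, multiple=32):
    """Return (w, h) near target_area pixels with the input's aspect ratio.

    `multiple` is an assumed alignment for the output sides.
    """
    scale = math.sqrt(target_area / (width * height))
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h


print(size_for_target_area(1024, 1024))  # (512, 512)
print(size_for_target_area(800, 600))
```

The resulting size could be passed as `width` and `height` to the diffusers pipeline call; whether NVIDIA's reference inference code does so is not stated in this excerpt.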
