What does this repo signal mean?

Qwen (Alibaba Cloud) published QwenLM/Qwen3-VL-Embedding, a Python repository. Repository signals like this one can surface tooling, evaluation, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo QwenLM/Qwen3-VL-Embedding · language Python · notable new VLM embedding model from Qwen. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Qwen (Alibaba Cloud) Repo: QwenLM/Qwen3-VL-Embedding

Captured source

GitHub: github.com/QwenLM/Qwen3-VL-Embedding

QwenLM/Qwen3-VL-Embedding repository metadata

Published Jan 8, 2026 · seen Jun 5 · captured Jun 11 · HTTP 200 · method: plain

QwenLM/Qwen3-VL-Embedding

Language: Python

License: Apache-2.0

Stars: 1282

Forks: 107

Open issues: 54

Created: 2026-01-08T03:42:57Z

Pushed: 2026-04-08T05:01:00Z

Default branch: main

Fork: no

Archived: no

README:

Qwen3-VL-Embedding & Qwen3-VL-Reranker

State-of-the-art multimodal embedding and reranking models built on Qwen3-VL, supporting text, images, screenshots, videos, and mixed-modal inputs for advanced information retrieval and cross-modal understanding.

---

[Overview](#overview)
[Features](#features)
[Model Architecture](#model-architecture)
[Installation](#installation)
[Usage](#usage)
[Examples](#examples)
[Model Performance](#model-performance)
[Citation](#citation)

---

Overview

The Qwen3-VL-Embedding and Qwen3-VL-Reranker model series are the latest additions to the Qwen family, built upon the recently open-sourced and powerful Qwen3-VL foundation model. Specifically designed for multimodal information retrieval and cross-modal understanding, this suite accepts diverse inputs including text, images, screenshots, and videos, as well as inputs containing a mixture of these modalities.

Building on the success of our text-oriented Qwen3-Embedding and Qwen3-Reranker series, these multimodal models extend best-in-class performance to visual and video understanding tasks. The two models work in tandem: the Embedding model handles the initial recall stage by generating semantically rich vectors, while the Reranker model handles the re-ranking stage with precise relevance scoring, significantly enhancing final retrieval accuracy.
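
The recall-then-rerank flow can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the repository's API: `retrieve_then_rerank` is a hypothetical helper, the three-dimensional vectors and relevance scores are invented toy values standing in for Qwen3-VL-Embedding outputs and Qwen3-VL-Reranker scores, and the reranker is represented by a plain function that already has the query bound.

```python
import math
from typing import Callable, Dict, List, Sequence


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve_then_rerank(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    rerank_score: Callable[[str], float],
    recall_k: int = 3,
    final_k: int = 2,
) -> List[str]:
    """Hypothetical two-stage search (not part of the repository): recall, then rerank."""
    q = normalize(query_vec)
    # Stage 1 (embedding model): rank every document by cosine similarity, keep recall_k.
    recalled = sorted(doc_vecs, key=lambda d: dot(q, normalize(doc_vecs[d])), reverse=True)[:recall_k]
    # Stage 2 (reranker): score only the shortlisted (query, document) pairs, keep final_k.
    return sorted(recalled, key=rerank_score, reverse=True)[:final_k]


# Toy data: vectors and scores are invented for illustration only.
docs = {
    "cat_photo": [0.9, 0.1, 0.0],
    "cat_caption": [0.8, 0.3, 0.1],
    "dog_video": [0.1, 0.9, 0.2],
    "car_screenshot": [0.0, 0.2, 0.9],
}
toy_scores = {"cat_photo": 0.40, "cat_caption": 0.95, "dog_video": 0.10, "car_screenshot": 0.99}
print(retrieve_then_rerank([1.0, 0.2, 0.0], docs, lambda d: toy_scores[d]))  # ['cat_caption', 'cat_photo']
```

Note that `car_screenshot` is never returned here even though the toy reranker would score it highest, because it does not survive the recall stage; `recall_k` trades reranking cost against the chance of dropping a relevant document early.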

---

Features

🎨 Multimodal Versatility: Seamlessly process inputs containing text, images, screenshots, and video within a unified framework. Achieve state-of-the-art performance across diverse tasks including image-text retrieval, video-text matching, visual question answering (VQA), and multimodal content clustering.

🔄 Unified Representation Space: Leverage the Qwen3-VL architecture to generate semantically rich vectors that capture both visual and textual information in a shared space, facilitating efficient similarity estimation and retrieval across different modalities.

🎯 High-Precision Reranking: The reranking model accepts (Query, Document) input pairs, where either side can consist of a single modality or a mix of modalities, and outputs precise relevance scores for superior retrieval accuracy.

🌍 Exceptional Practicality:
- Support for over 30 languages, ideal for global applications
- Customizable instructions for task-specific optimization
- Flexible vector dimensions with Matryoshka Representation Learning (MRL)
- Strong performance with quantized embeddings for efficient deployment
- Easy integration into existing retrieval pipelines
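
With Matryoshka Representation Learning, the leading coordinates of an embedding are trained to work as a smaller embedding on their own, so shrinking a vector amounts to slicing a prefix. The sketch below assumes that behavior; `truncate_mrl` is a hypothetical helper, the toy vector stands in for a full 2048- or 4096-dimensional output, and re-normalizing after truncation is a common convention (it keeps cosine similarity a plain dot product) rather than something this README specifies.

```python
import math
from typing import List, Sequence


def truncate_mrl(embedding: Sequence[float], dim: int) -> List[float]:
    """Hypothetical helper: keep the first `dim` coordinates of an MRL embedding, re-normalize.

    Assumes any prefix of the vector is a usable lower-dimensional embedding;
    the re-normalization step is an assumed convention, not taken from the README.
    """
    if not 0 < dim <= len(embedding):
        raise ValueError(f"dim must be between 1 and {len(embedding)}, got {dim}")
    prefix = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]


# Toy 4-d vector standing in for a full-size embedding.
full = [3.0, 4.0, 12.0, 0.0]
print(truncate_mrl(full, 2))  # [0.6, 0.8]
```

Both the stored corpus vectors and the query vectors must be truncated to the same dimension before they are compared.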

---

Model Architecture

Model Specifications

| Model | Size | Layers | Sequence Length | Embedding Dimension | Quantization Support | MRL Support | Instruction Aware |
|---|---|---|---|---|---|---|---|
| Qwen3-VL-Embedding-2B | 2B | 28 | 32K | 2048 | ✅ | ✅ | ✅ |
| Qwen3-VL-Embedding-8B | 8B | 36 | 32K | 4096 | ✅ | ✅ | ✅ |
| Qwen3-VL-Reranker-2B | 2B | 28 | 32K | - | - | - | ✅ |
| Qwen3-VL-Reranker-8B | 8B | 36 | 32K | - | - | - | ✅ |
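
The embedding dimensions in this table translate directly into index memory. The arithmetic below is a back-of-the-envelope sketch assuming a flat index that stores raw vectors only; the float32, float16 and int8 formats are common storage choices picked for illustration, not formats the README prescribes, and `index_size_bytes` is a hypothetical helper.

```python
# Embedding widths taken from the specification table above.
EMBED_DIM = {"Qwen3-VL-Embedding-2B": 2048, "Qwen3-VL-Embedding-8B": 4096}

# Bytes per stored coordinate for some common storage formats (assumed, not from the README).
BYTES_PER_VALUE = {"float32": 4, "float16": 2, "int8": 1}


def index_size_bytes(num_vectors: int, dim: int, dtype: str) -> int:
    """Hypothetical helper: raw vector storage for a flat index (ignores IDs and index structures)."""
    return num_vectors * dim * BYTES_PER_VALUE[dtype]


for model, dim in EMBED_DIM.items():
    for dtype in BYTES_PER_VALUE:
        gb = index_size_bytes(1_000_000, dim, dtype) / 1e9
        print(f"{model}  {dtype:<7} {gb:6.2f} GB per million vectors")
```

At full width, one million 2B-model vectors take about 8.19 GB as float32 and about 2.05 GB as int8; the 8B model's 4096-dimensional vectors double each figure, and MRL truncation scales the total linearly with the kept dimension.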

LoRA Configs

| Model | rank | alpha | target_modules |
|------|------|-------|----------------|
| Qwen3-VL-Embedding | 32 | 32 | q_proj v_proj k_proj up_proj down_proj gate_proj |
| Qwen3-VL-Reranker | 32 | 32 | q_proj v_proj k_proj up_proj down_proj gate_proj |
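
Under the standard LoRA formulation, each targeted weight W receives a trainable low-rank update, W + (alpha / rank) * B @ A, where A has shape rank x d_in and B has shape d_out x rank. The sketch below works out what rank 32 and alpha 32 imply, assuming that standard formulation; the helper names are hypothetical and the 2048 x 2048 projection is an illustrative size, not a dimension taken from these models.

```python
RANK, ALPHA = 32, 32  # values from the LoRA config table above


def lora_scaling(alpha: float, rank: int) -> float:
    """Hypothetical helper: standard LoRA multiplies the update B @ A by alpha / rank."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Hypothetical helper: parameters LoRA adds to one linear layer (A: rank x d_in, B: d_out x rank)."""
    return rank * d_in + d_out * rank


print(lora_scaling(ALPHA, RANK))  # 1.0
# Illustrative 2048 -> 2048 projection, not an actual layer size from the models.
added = lora_param_count(2048, 2048, RANK)
frozen = 2048 * 2048
print(added, frozen, added / frozen)  # 131072 4194304 0.03125
```

With alpha equal to rank, the low-rank update is applied unscaled, and a square layer of that illustrative size trains roughly 3% as many parameters as the frozen weight it adapts.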

Architecture Design

Qwen3-VL-Embedding: Dual-Tower Architecture

- Receives single-modal or mixed-modal input and maps it into a high-dimensional semantic vector
- Extracts the hidden state vector corresponding to the [EOS] token from the base model's last layer as the final semantic representation
- Enables efficient, independent encoding necessary for large-scale retrieval
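
Pooling on the [EOS] token amounts to picking one row of the last-layer hidden states. A minimal sketch, assuming the tokenizer places [EOS] as the final real token and that padding positions are marked with 0 in the attention mask; `last_token_pool` is a hypothetical helper, the numbers are toy values, and the final L2 normalization is a common convention assumed here rather than stated in the README.

```python
import math
from typing import List


def last_token_pool(hidden_states: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Hypothetical helper: L2-normalized last-layer state of the last real token.

    Assumes that token is [EOS]; taking the highest index whose mask is 1
    works for both left and right padding.
    """
    last = max(i for i, m in enumerate(attention_mask) if m == 1)
    vec = hidden_states[last]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


# Toy last-layer states for 4 positions (3 real tokens, then 1 pad), hidden size 2.
states = [[0.1, 0.2], [0.5, 0.5], [3.0, 4.0], [9.9, 9.9]]
mask = [1, 1, 1, 0]
print(last_token_pool(states, mask))  # [0.6, 0.8]
```

Because each input is pooled on its own, documents can be encoded once offline and queries encoded independently at search time, which is what makes the dual-tower design suited to large-scale retrieval.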

Qwen3-VL-Reranker: Single-Tower Architecture

- Receives an input pair (Query, Document) and performs pointwise reranking
- Uses a cross-attention mechanism for deeper, finer-grained inter-modal interaction and information fusion
- Expresses the relevance score through the model's predicted generation probabilities for the special tokens "yes" and "no"
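
One common way to turn the two token probabilities into a single score is to apply a softmax over just the "yes" and "no" logits at the final position and keep P("yes"), which equals sigmoid(logit_yes - logit_no). The sketch assumes that formulation; `relevance_from_logits` is a hypothetical helper, and in practice the two logits would come from Qwen3-VL-Reranker's output for a (Query, Document) pair.

```python
import math


def relevance_from_logits(logit_yes: float, logit_no: float) -> float:
    """Hypothetical helper: P("yes") from a softmax restricted to the "yes" and "no" logits.

    Equivalent to sigmoid(logit_yes - logit_no); the larger logit is subtracted
    first so that exp() cannot overflow.
    """
    m = max(logit_yes, logit_no)
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)


print(relevance_from_logits(2.0, 2.0))  # 0.5
print(round(relevance_from_logits(3.0, 1.0), 4))  # 0.8808
```

Since the score depends only on the difference between the two logits, it lands in [0, 1] and can be compared across candidates for the same query when sorting the recalled shortlist.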

Feature Comparison

| | Qwen3-VL-Embedding | Qwen3-VL-Reranker |
|---------|-------------------|-------------------|
| Core Function | Semantic Representation, Embedding Generation | Relevance Scoring, Pointwise Re-ranking |
| Input | Single modality or mixed modalities | (Query, Document) pair with single- or mixed-modal inputs |
| Architecture | Dual-Tower | Single-Tower |
| Mechanism | Efficient Retrieval | Deep Inter-Modal Interaction, Precise Alignment |
| Output | Semantic Vector | Relevance Score |

Both models are built through a multi-stage training paradigm that fully leverages the powerful general multimodal semantic understanding capabilities of Qwen3-VL, providing high-quality semantic representations and precise re-ranking mechanisms for complex, large-scale multimodal retrieval tasks.

---

Installation

Setup Environment

```bash
# Clone the repository
git clone https://github.com/QwenLM/Qwen3-VL-Embedding.git
cd Qwen3-VL-Embedding

# Run the script to set up the environment
bash scripts/setup_environment.sh
```

The setup script will automatically:

- Install uv if not already installed
- Install all project dependencies

After setup completes, activate the environment:

```bash
source .venv/bin/activate
```

Download Models

Our models are available on both Hugging Face and ModelScope.

| Model | Hugging Face | ModelScope |
|-------|--------------|------------|
| Qwen3-VL-Embedding-2B | Link | Link |
| Qwen3-VL-Embedding-8B | Link | Link |
| Qwen3-VL-Reranker-2B | Link | Link |
| Qwen3-VL-Reranker-8B | Link | Link |

**Install download...

Notability

notability 7.0/10

Notable new VLM embedding model from Qwen