RepoQwen (Alibaba Cloud)Qwen (Alibaba Cloud)published Feb 13, 2026seen 6d

QwenLM/WebWorld

Python

Open original ↗

Captured source

source ↗
published Feb 13, 2026seen 6dcaptured 14hhttp 200method plain

QwenLM/WebWorld

Description: WebWorld is a large-scale web world model that helps train web agents in a simulated browser, avoiding the latency and safety issues of the real web.

Language: Python

Stars: 39

Forks: 3

Open issues: 0

Created: 2026-02-13T14:02:59Z

Pushed: 2026-02-25T06:29:03Z

Default branch: main

Fork: no

Archived: no

README:

WebWorld

Introduction

Web agents require massive trajectories to generalize, yet real-world training is constrained by network latency, rate limits, and safety risks. WebWorld is the first open-web world model series trained at scale — a large-scale browser simulator that enables agents to train in simulation rather than the real web.

WebWorld features the following:

  • Trained at Scale: 1M+ real-world web interaction trajectories via a scalable hierarchical data pipeline (100× more than prior work).
  • Long-Horizon Simulation: Supports multi-turn simulation up to 30+ steps with consistent state tracking.
  • Multi-Format Supporting: Predicts next states across A11y Tree, HTML, XML, Markdown, and natural language representations.
  • Reasoning: A two-stage training curriculum injects broad web dynamics first, then activates explicit causal reasoning.

Models

| Model | Base Model | Parameters | Download | |---|---|---|---| | WebWorld-8B | Qwen3-8B | 8B | 🤗 HuggingFace | | WebWorld-14B | Qwen3-14B | 14B | 🤗 HuggingFace | | WebWorld-32B | Qwen3-32B | 32B | 🤗 HuggingFace |

Dataset: Qwen/WebWorldData — training trajectories, fully open-sourced under Apache 2.0.

Benchmarks

Intrinsic Evaluation (WebWorld-Bench)

WebWorld-Bench evaluates models using Factuality Score (functional correctness of state transitions) and Web Turing Score (perceptual realism via adversarial discrimination) across nine dimensions.

| Model | Avg Factuality | Avg Turing | |---|---|---| | GPT-4o | 59.5 | 35.4 | | Claude-Opus-4.1 | 71.3 | 47.4 | | Gemini-3-Pro | 70.3 | 43.2 | | Qwen3-8B (base) | 26.9 | 17.4 | | WebWorld-8B | 70.1 | 42.2 | | WebWorld-14B | 70.7 | 44.7 | | WebWorld-32B | 71.0 | 45.6 |

Extrinsic Evaluation (Agent Training)

Agents fine-tuned on WebWorld-synthesized trajectories:

| Model | MiniWob++ SR | WebArena SR | |---|---|---| | GPT-4o | 64.3% | 26.6% | | Qwen3-8B (base) | 49.4% | 9.8% | | Qwen3-8B + WebWorld | 59.3% (+9.9%) | 20.7% (+10.9%) | | Qwen3-14B (base) | 54.9% | 15.1% | | Qwen3-14B + WebWorld | 63.2% (+8.3%) | 24.3% (+9.2%) |

Cross-Domain Generalization

| Environment | Qwen3-8B | WebWorld-8B | Gain | |---|---|---|---| | API Services | 0.088 | 0.299 | +0.211 | | Code | 0.147 | 0.396 | +0.249 | | Game | 0.253 | 0.473 | +0.220 | | GUI Desktop | 0.322 | 0.705 | +0.383 |

For detailed results, please check out the paper.

Quickstart

1. Installation

pip install -r requirements.txt
tar -xzf data.tar.gz

2. Model Configuration

All model calls go through core/serve/unified_api.py. To add a new model provider, create a file (e.g., core/serve/oai.py) and register it in unified_api.py. Then specify your model in config/model_config.yaml for the WebWorld-Bench or in demo/config.py for the demo.

3. Run Demo (Interaction between Agent and WebWorld)

The demo showcases an agent interacting with WebWorld. Given a user query, you can observe the step-by-step trajectory of the agent navigating and operating within the web environment. Running the demo will generate HTML trajectory files that can be opened in a browser for visualization. Some samples are provided in demo/demo.zip.

python ./demo/demo.py

Inference

Single-Step Prediction

💻 Click to expand code

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Qwen/WebWorld-8B" # or WebWorld-14B, WebWorld-32B
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
model_name,
device_map="auto",
torch_dtype=torch.bfloat16,
trust_remote_code=True,
).eval()

system_prompt = (
"You are a web world model. I will provide you with an initial page state "
"and a sequence of actions. For each action, predict the resulting page state.\n"
"Strictly maintain the original format. Output only the full page state "
"without explanations, code, or truncation."
)

current_state = """RootWebArea 'Global Start - Your Daily Portal', focused
\t[1] banner 'Top Header', visible
\t\t[2] link 'Set as Homepage', clickable, visible
\t\t[3] link 'Feedback', clickable, visible
\t\t[5] region 'Weather Widget', visible
\t\t\tStaticText 'New York, USA'
\t\t\t[6] image 'Sunny', visible
\t\t\tStaticText '24°C'
\t\t[8] link 'Sign In', clickable, visible
\t[10] region 'Search Area', visible
\t\t[11] image 'Global Start Logo', visible
\t\tStaticText 'Search the entire web'
\t\t[12] tablist 'Search Engine Selector', orientation='horizontal'
\t\t\t[13] tab 'Google', selected=True, clickable
\t\t\t[14] tab 'Bing', selected=False, clickable
\t\t\t[15] tab 'DuckDuckGo', selected=False, clickable
\t\t[18] combobox 'Web Search', clickable, visible, autocomplete='both', expanded=False
\t\t\t[19] textbox 'Type keywords or URL...', clickable, visible, editable, value=''
\t\t[20] button 'Search', clickable, visible
\t[30] navigation 'Category Bar', visible
\t\t[31] link 'Home', clickable, selected=True
\t\t[32] link 'News', clickable
\t\t[33] link 'Video', clickable
\t\t[34] link 'Shopping', clickable
\t\t[35] link 'Social', clickable
\t[50] main 'Site Directory', visible
\t\t[51] region 'Top Recommended', visible
\t\t\t[52] heading 'Most Popular', visible
\t\t\t[53] list 'Top Sites Grid', visible
\t\t\t\t[54] link 'Facebook', clickable
\t\t\t\t[56] link 'YouTube', clickable
\t\t\t\t[58] link 'Amazon', clickable
\t\t\t\t[60] link 'Twitter / X', clickable
\t\t\t\t[62] link 'Instagram', clickable
\t\t\t\t[64] link 'Wikipedia', clickable
\t\t\t\t[66] link 'Netflix', clickable
\t\t\t\t[68] link 'LinkedIn', clickable
\t\t[80] region 'News & Media', visible
\t\t\t[81] heading 'Latest News', visible
\t\t\t[82] link 'CNN', clickable
\t\t\t[83] link 'BBC', clickable
\t\t\t[84] link 'The Verge', clickable
\t\t[90]…

Excerpt shown — open the source for the full document.

Notability

notability 3.0/10

Low traction new repo, routine addition