QwenLM/WebWorld
Python
Captured source
source ↗QwenLM/WebWorld
Description: WebWorld is a large-scale web world model that helps train web agents in a simulated browser, avoiding the latency and safety issues of the real web.
Language: Python
Stars: 39
Forks: 3
Open issues: 0
Created: 2026-02-13T14:02:59Z
Pushed: 2026-02-25T06:29:03Z
Default branch: main
Fork: no
Archived: no
README:
WebWorld
Introduction
Web agents require massive trajectories to generalize, yet real-world training is constrained by network latency, rate limits, and safety risks. WebWorld is the first open-web world model series trained at scale — a large-scale browser simulator that enables agents to train in simulation rather than the real web.
WebWorld features the following:
- Trained at Scale: 1M+ real-world web interaction trajectories via a scalable hierarchical data pipeline (100× more than prior work).
- Long-Horizon Simulation: Supports multi-turn simulation up to 30+ steps with consistent state tracking.
- Multi-Format Supporting: Predicts next states across A11y Tree, HTML, XML, Markdown, and natural language representations.
- Reasoning: A two-stage training curriculum injects broad web dynamics first, then activates explicit causal reasoning.
Models
| Model | Base Model | Parameters | Download | |---|---|---|---| | WebWorld-8B | Qwen3-8B | 8B | 🤗 HuggingFace | | WebWorld-14B | Qwen3-14B | 14B | 🤗 HuggingFace | | WebWorld-32B | Qwen3-32B | 32B | 🤗 HuggingFace |
Dataset: Qwen/WebWorldData — training trajectories, fully open-sourced under Apache 2.0.
Benchmarks
Intrinsic Evaluation (WebWorld-Bench)
WebWorld-Bench evaluates models using Factuality Score (functional correctness of state transitions) and Web Turing Score (perceptual realism via adversarial discrimination) across nine dimensions.
| Model | Avg Factuality | Avg Turing | |---|---|---| | GPT-4o | 59.5 | 35.4 | | Claude-Opus-4.1 | 71.3 | 47.4 | | Gemini-3-Pro | 70.3 | 43.2 | | Qwen3-8B (base) | 26.9 | 17.4 | | WebWorld-8B | 70.1 | 42.2 | | WebWorld-14B | 70.7 | 44.7 | | WebWorld-32B | 71.0 | 45.6 |
Extrinsic Evaluation (Agent Training)
Agents fine-tuned on WebWorld-synthesized trajectories:
| Model | MiniWob++ SR | WebArena SR | |---|---|---| | GPT-4o | 64.3% | 26.6% | | Qwen3-8B (base) | 49.4% | 9.8% | | Qwen3-8B + WebWorld | 59.3% (+9.9%) | 20.7% (+10.9%) | | Qwen3-14B (base) | 54.9% | 15.1% | | Qwen3-14B + WebWorld | 63.2% (+8.3%) | 24.3% (+9.2%) |
Cross-Domain Generalization
| Environment | Qwen3-8B | WebWorld-8B | Gain | |---|---|---|---| | API Services | 0.088 | 0.299 | +0.211 | | Code | 0.147 | 0.396 | +0.249 | | Game | 0.253 | 0.473 | +0.220 | | GUI Desktop | 0.322 | 0.705 | +0.383 |
For detailed results, please check out the paper.
Quickstart
1. Installation
pip install -r requirements.txt tar -xzf data.tar.gz
2. Model Configuration
All model calls go through core/serve/unified_api.py. To add a new model provider, create a file (e.g., core/serve/oai.py) and register it in unified_api.py. Then specify your model in config/model_config.yaml for the WebWorld-Bench or in demo/config.py for the demo.
3. Run Demo (Interaction between Agent and WebWorld)
The demo showcases an agent interacting with WebWorld. Given a user query, you can observe the step-by-step trajectory of the agent navigating and operating within the web environment. Running the demo will generate HTML trajectory files that can be opened in a browser for visualization. Some samples are provided in demo/demo.zip.
python ./demo/demo.py
Inference
Single-Step Prediction
💻 Click to expand code
import torch from transformers import AutoTokenizer, AutoModelForCausalLM model_name = "Qwen/WebWorld-8B" # or WebWorld-14B, WebWorld-32B tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained( model_name, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True, ).eval() system_prompt = ( "You are a web world model. I will provide you with an initial page state " "and a sequence of actions. For each action, predict the resulting page state.\n" "Strictly maintain the original format. Output only the full page state " "without explanations, code, or truncation." ) current_state = """RootWebArea 'Global Start - Your Daily Portal', focused \t[1] banner 'Top Header', visible \t\t[2] link 'Set as Homepage', clickable, visible \t\t[3] link 'Feedback', clickable, visible \t\t[5] region 'Weather Widget', visible \t\t\tStaticText 'New York, USA' \t\t\t[6] image 'Sunny', visible \t\t\tStaticText '24°C' \t\t[8] link 'Sign In', clickable, visible \t[10] region 'Search Area', visible \t\t[11] image 'Global Start Logo', visible \t\tStaticText 'Search the entire web' \t\t[12] tablist 'Search Engine Selector', orientation='horizontal' \t\t\t[13] tab 'Google', selected=True, clickable \t\t\t[14] tab 'Bing', selected=False, clickable \t\t\t[15] tab 'DuckDuckGo', selected=False, clickable \t\t[18] combobox 'Web Search', clickable, visible, autocomplete='both', expanded=False \t\t\t[19] textbox 'Type keywords or URL...', clickable, visible, editable, value='' \t\t[20] button 'Search', clickable, visible \t[30] navigation 'Category Bar', visible \t\t[31] link 'Home', clickable, selected=True \t\t[32] link 'News', clickable \t\t[33] link 'Video', clickable \t\t[34] link 'Shopping', clickable \t\t[35] link 'Social', clickable \t[50] main 'Site Directory', visible \t\t[51] region 'Top Recommended', visible \t\t\t[52] heading 'Most Popular', visible \t\t\t[53] list 'Top Sites Grid', visible \t\t\t\t[54] link 'Facebook', clickable \t\t\t\t[56] link 'YouTube', clickable \t\t\t\t[58] link 'Amazon', clickable \t\t\t\t[60] link 'Twitter / X', clickable \t\t\t\t[62] link 'Instagram', clickable \t\t\t\t[64] link 'Wikipedia', clickable \t\t\t\t[66] link 'Netflix', clickable \t\t\t\t[68] link 'LinkedIn', clickable \t\t[80] region 'News & Media', visible \t\t\t[81] heading 'Latest News', visible \t\t\t[82] link 'CNN', clickable \t\t\t[83] link 'BBC', clickable \t\t\t[84] link 'The Verge', clickable \t\t[90]…
Excerpt shown — open the source for the full document.
Notability
notability 3.0/10Low traction new repo, routine addition