stepfun-ai/gelab-zero
Description: STEP-GUI: The top GUI agent solution in the galaxy. Developed by the StepFun-GELab team and powered by StepFun’s cutting-edge research capabilities.
Language: Python
License: MIT
Stars: 2191
Forks: 193
Open issues: 44
Created: 2025-11-28T14:42:44Z
Pushed: 2026-05-11T05:50:07Z
Default branch: main
Fork: no
Archived: no
README: 
> 👋 Hi, everyone! We are proud to present the first fully open-source GUI Agent, with both the model and the infrastructure open-sourced. Our solution is plug-and-play, has no cloud dependencies, and gives you complete control over your privacy.
📰 News
- 🎁 [Coming Soon...]
- 🎁 [2025-12-12] MCP server ready:
  - Step 1: Start the MCP server to support multi-device management and task distribution.
    ```bash
    # enable mcp server
    python mcp_server/detailed_gelab_mcp_server.py
    ```
  - Step 2: Import the MCP tools in Chatbox.
- 🎁 [2025-12] We thank the following projects and authors for providing quantization tools & tutorials: GGUF_v1, GGUF_v2, EXL3, Tutorials_CN, Tutorials_EN
- 🎁 [2025-11] We release a lightweight 4B model on **Hugging Face** and **ModelScope**.
- 🎁 [2025-11] We release the tasks from the **AndroidDaily** benchmark.
- 🎁 [2025-11] We release the current GELab-Zero engineering infrastructure.
- 🎁 [2025-10] Our research paper on GELab-Engine has been accepted at NeurIPS 2025.
📑 Table of Contents
- [📖 Background](#-background)
- [🎥 Application Demonstrations](#-application-demonstrations)
- [📊 AndroidDaily](#-androiddaily-a-self-built-benchmark-close-to-daily-life)
- [🏆 Open Benchmark](#-open-benchmark)
- [🚀 Installation & Quick Start](#-installation-quick-start)
- [📝 Citation](#-citation)
- [📧 Contact](#-contact)
📖 Background
As AI experiences continue to reach consumer devices, mobile Agent research is at a critical juncture, transitioning from "feasibility verification" to "large-scale application." At this stage, GUI-based solutions have emerged as the best-suited approach for handling complex mobile ecosystems and scaling Agent capabilities, because they work with every app and require no adaptation from app vendors, so integration costs nothing on the vendor side. However, because mobile app ecosystems are highly fragmented, getting GUI Agents to work reliably across device brands and models raises many engineering challenges: multi-device ADB connections, dependency installation, permission configuration, inference service deployment, and task recording and replay. As a result, Agent developers and MCP users must take on substantial infrastructure work, which makes it hard to focus on innovation.
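Multi-device ADB connections are usually the first of these chores a developer hits. The minimal sketch below is not GELab-Zero's code; it only illustrates the idea by parsing the text that `adb devices` prints (a "List of devices attached" header, then one `<serial>\t<state>` line per device) and keeping only devices in the `device` state. The function names are hypothetical, and `list_connected_devices` assumes the `adb` binary is installed and on `PATH`.

```python
import subprocess
from typing import List, Tuple


def parse_adb_devices(output: str) -> List[Tuple[str, str]]:
    """Parse `adb devices` output into (serial, state) pairs.

    States include "device" (ready), "offline", and "unauthorized".
    """
    devices: List[Tuple[str, str]] = []
    for raw in output.splitlines():
        line = raw.strip()
        # Skip blank lines, the header, and "* daemon ..." startup notices.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices.append((parts[0], parts[1]))
    return devices


def ready_serials(output: str) -> List[str]:
    """Keep only devices that are online and authorized (state == "device")."""
    return [serial for serial, state in parse_adb_devices(output) if state == "device"]


def list_connected_devices() -> List[str]:
    """Query the local adb server (assumes `adb` is on PATH)."""
    result = subprocess.run(["adb", "devices"], capture_output=True, text=True, check=True)
    return ready_serials(result.stdout)
```

Given the output `"List of devices attached\nemulator-5554\tdevice\nR58M12345AB\tunauthorized\n"`, `ready_serials` keeps only `emulator-5554`, since the second phone still needs the USB debugging prompt accepted.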
To address these challenges, we are open-sourcing GELab-Zero to accelerate innovation in, and deployment of, GUI Agents. It consists of two main components:
- Plug-and-play complete inference engineering infrastructure that handles all the heavy lifting
- A 4B GUI Agent model that can run on a local computer
It offers a one-click launch experience similar to that of open-source GUI Agent MCP tools, can be deployed entirely locally, and keeps the entire inference pipeline under your control. Specific capabilities include:
- Local Deployment: Supports 4B-scale models running on consumer-grade hardware, balancing low latency with privacy.
- One-click Launch: Provides a unified deployment pipeline that automatically handles environment dependencies and device management.
- Task Distribution: Distributes tasks across multiple phones and records interaction trajectories for observability and reproducibility.
- Three Agent Modes: Covers ReAct loops, multi-agent collaboration, and scheduled tasks.
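The Task Distribution capability can be pictured as two pieces: assigning tasks to connected phones, and logging every step so a run can be inspected or replayed later. The sketch below is hypothetical; `distribute` and the `Trajectory`/`Step` schema are illustrative names, not GELab-Zero's actual API or trajectory file format. It assumes simple round-robin assignment and serializes each trajectory as one JSON object.

```python
import json
from dataclasses import asdict, dataclass, field
from itertools import cycle
from typing import Dict, List


@dataclass
class Step:
    """One recorded interaction: what the agent saw and what it did."""
    step: int
    observation: str
    action: str


@dataclass
class Trajectory:
    """All steps executed for one task on one device (hypothetical schema)."""
    task: str
    device: str
    steps: List[Step] = field(default_factory=list)

    def record(self, observation: str, action: str) -> None:
        # Step indices start at 0 and follow execution order.
        self.steps.append(Step(len(self.steps), observation, action))

    def to_json(self) -> str:
        # asdict() also converts the nested Step objects into dicts.
        return json.dumps(asdict(self), ensure_ascii=False)


def distribute(tasks: List[str], serials: List[str]) -> Dict[str, List[str]]:
    """Assign tasks to devices round-robin so the load is spread evenly."""
    if not serials:
        raise ValueError("no devices available")
    plan: Dict[str, List[str]] = {serial: [] for serial in serials}
    for task, serial in zip(tasks, cycle(serials)):
        plan[serial].append(task)
    return plan
```

Round-robin is the simplest fair policy when every task takes roughly the same time; a scheduler that hands the next task to whichever phone becomes idle first would cope better with uneven task lengths.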
These capabilities enable GELab-Zero to flexibly handle complex task flows in real-world scenarios and provide a solid foundation for future extensions.
For Agent developers, this infrastructure enables rapid testing of new ideas and strategies and quick validation of interaction approaches; enterprise users can reuse it directly to integrate MCP capabilities into their products quickly.
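Of the three Agent Modes listed above, the ReAct loop is the easiest to picture: observe the screen, let the model choose an action, execute it, and repeat until the model declares the task complete. The sketch below is a generic ReAct-style loop under stated assumptions: `observe`, `think`, and `act` are hypothetical callbacks standing in for screenshot capture, the 4B model, and ADB input, and the `Action` type is illustrative rather than the model's real output format.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Action:
    """A hypothetical action the model can emit."""
    kind: str  # e.g. "tap", "type", or "complete"
    arg: str = ""


def react_loop(
    task: str,
    observe: Callable[[], str],
    think: Callable[[str, str, List[Tuple[str, Action]]], Action],
    act: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Tuple[str, Action]]:
    """Run observe -> think -> act until the model says "complete" or steps run out."""
    history: List[Tuple[str, Action]] = []
    for _ in range(max_steps):
        obs = observe()                     # e.g. a screenshot or its description
        action = think(task, obs, history)  # the model picks the next action
        history.append((obs, action))
        if action.kind == "complete":
            break                           # task finished; nothing left to execute
        act(action)                         # e.g. send a tap or text input to the phone
    return history
```

The `max_steps` cap keeps the loop from running forever when the model never emits `complete`, and the returned history doubles as a trajectory that can be logged for replay.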
🎥 Application Demonstrations
Recommendation - Sci-Fi Movies
Task: Help me find any good recent sci-fi movies
[📹 Click to view demo video](./images/video_2.mp4)
Recommendation - Travel Destination
Task: Help me find a place where I can take my kids on the weekend
[📹 Click to view demo video](./images/video_4.mp4)
Practical Task - Claim Subsidy
Task: Claim meal vouchers on the enterprise welfare platform
[📹 Click to view demo video](./images/video_3.mp4)
Practical Task - Metro Line Query
Task: Check if Metro Line 1 is operating normally, then navigate to the nearest entrance of Line 1 metro station
[📹 Click to view demo video](./images/video_5.mp4)
Complex Task - Multi-Item Shopping
Task: Go to the nearest Hema Fresh Store on Ele.me and purchase: Red strawberries 300g, Peruvian Bianca blueberries 125g (18mm diameter), seasonal fresh yellow potatoes 500g, sweet baby pumpkin 750g, Hema large grain shrimp sliders, 2 bottles of Hema pure black soy milk 300ml, Little Prince macadamia nut cocoa crisp 120g, Hema spinach noodles, Hema five-spice beef, 5 bags of Haohuan snail Liuzhou river snail rice noodles (extra spicy extra smelly) 400g, m&m's milk chocolate beans 100g
[📹 Click to view demo video](./images/video_1.mp4)
Complex Task - Information Retrieval
Task: Search for 'how to learn financial management' on Zhihu and view the first answer with over 10k likes
[📹 Click to view demo video](./images/video_6.mp4)
Complex Task - Conditional Search
Task: Find a pair of white canvas shoes in size 37 on Taobao, priced under 100 yuan, then add the first item that meets the…
Notability: 5.0/10 (new repo with good stars, not a major launch)