What does this repo signal mean?

Meituan (LongCat) published meituan-longcat/SOP-Maze (Python). Repository signals like this one can surface tooling, eval, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo meituan-longcat/SOP-Maze · language Python · low stars, trivial new repo. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Meituan (LongCat) Repo: meituan-longcat/SOP-Maze

Captured source

GitHub · github.com/meituan-longcat/SOP-Maze

meituan-longcat/SOP-Maze repository metadata

published Sep 3, 2025 · seen 5d · captured 10h · http 200 · method plain

meituan-longcat/SOP-Maze

Description: SOP(Standard Operating Procedure)-Maze benchmarking LLM's performance on EXTREMELY complicated business tasks.

Language: Python

Stars: 7

Forks: 2

Open issues: 1

Created: 2025-09-03T09:05:53Z

Pushed: 2025-10-07T05:25:48Z

Default branch: main

Fork: no

Archived: no

README:

SOP-Maze

SOP-Maze is a benchmark designed to evaluate the comprehensive capabilities of large language models (LLMs) in executing tasks that follow Standard Operating Procedures (SOPs).

🧩 Overview

SOP-Maze presents complex, structured tasks that mimic real-world procedural workflows. It tests an LLM's ability to:

Understand and follow SOPs.
Reason through multi-step operations.
Produce accurate, context-aware outputs.

📁 Directory Structure

.
├── raw_data/                  # Original data samples (JSON)
├── data_with_model_response/  # Populated with model-augmented samples
└── quick_start.py             # Script to run evaluation

🛠️ Setup Instructions

1. Prepare the Data

Before evaluation, enrich each JSON file in raw_data/ by adding a new key for the model's output:

"model_response": ""

Then copy the updated files into the data_with_model_response/ directory.
Important: Clear the data_with_model_response/ directory first, so that no stale files remain when the new ones are copied in.

You can refer to the examples already in data_with_model_response/ for formatting guidance.
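
The preparation step above can be sketched in Python. This is a minimal illustration, not the repository's own tooling. It assumes each file in raw_data/ holds either a single JSON object or a list of objects, and that "model_response" should contain the model's answer as a string. The function name add_model_responses and the respond callback are hypothetical. The README does not say which field carries the prompt, so the callback receives the whole sample.

```python
import json
import shutil
from pathlib import Path
from typing import Any, Callable


def add_model_responses(
    raw_dir: Path,
    out_dir: Path,
    respond: Callable[[dict[str, Any]], str],
) -> int:
    """Write a copy of every JSON file in raw_dir to out_dir with
    "model_response" set on each sample. Returns the number of samples."""
    # Clear the output directory first, as the README asks.
    if out_dir.exists():
        shutil.rmtree(out_dir)
    out_dir.mkdir(parents=True)

    written = 0
    for src in sorted(raw_dir.glob("*.json")):
        data = json.loads(src.read_text(encoding="utf-8"))
        # Assumption: a file is one sample (dict) or a list of samples.
        samples = data if isinstance(data, list) else [data]
        for sample in samples:
            # `respond` is a hypothetical hook that calls the model under test.
            sample["model_response"] = respond(sample)
            written += 1
        # Keep the original file name; ensure_ascii=False preserves non-ASCII text.
        (out_dir / src.name).write_text(
            json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8"
        )
    return written
```

A call might look like add_model_responses(Path("raw_data"), Path("data_with_model_response"), my_model_fn), where my_model_fn is whatever wrapper queries the model. Compare the output with the examples shipped in data_with_model_response/ before relying on it, since the real sample layout may differ from the assumption here.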

2. Run Evaluation

To begin evaluation, run:

python quick_start.py

This will execute the evaluation pipeline on the updated dataset.
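
Before running the evaluation, it may help to confirm that every sample actually carries a response. The sketch below is a hypothetical pre-flight check and is not part of the repository. It assumes the same file layout as the README describes (one JSON object or a list of objects per file) and treats a missing, non-string, or blank "model_response" as unfilled.

```python
import json
from pathlib import Path


def find_missing_responses(data_dir: Path) -> list[str]:
    """Return "file.json[index]" labels for samples without a usable response."""
    missing = []
    for path in sorted(data_dir.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Assumption: a file is one sample (dict) or a list of samples.
        samples = data if isinstance(data, list) else [data]
        for i, sample in enumerate(samples):
            response = sample.get("model_response") if isinstance(sample, dict) else None
            # Flag absent keys, non-string values, and empty or whitespace-only strings.
            if not isinstance(response, str) or not response.strip():
                missing.append(f"{path.name}[{i}]")
    return missing
```

An empty list from find_missing_responses(Path("data_with_model_response")) suggests the directory is ready for evaluation under these assumptions.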

---

Notability

notability 0.0/10

Low stars, trivial new repo