Repo: Together AI, published May 29, 2026

togethercomputer/archipelago

Python



Description: Archipelago: eval framework for AI agents on professional services tasks

Language: Python

License: Apache-2.0

Stars: 0

Forks: 0

Open issues: 1

Created: 2026-05-29T00:15:21Z

Pushed: 2026-05-29T00:15:58Z

Default branch: main

Fork: no

Archived: no

README:

Archipelago

Archipelago is a system for running and evaluating AI agents against MCP applications. It consists of three main components:

1. Environment: Headless environment that exposes an MCP gateway
2. Agents: Extensible agent runner with a registry of configurable agent implementations
3. Grading: Grades agent performance by comparing before/after snapshots (formerly "Verifier")

All components run in Docker containers.

The environment is designed to run independently as a sandbox; an LLM agent then connects to the MCP server it exposes. The agents runner spawns and manages environment sandboxes automatically.

Table of Contents

  • [Quick Start: Run Your First Task](#quick-start-run-your-first-task)
  • [Components](#components)
      • [Environment](#environment)
      • [Agents](#agents)
      • [Grading](#grading)
  • [Local Development](#local-development)
      • [Running the Environment](#running-the-environment)
      • [Running Agents](#running-agents)
      • [Running the Grading](#running-the-grading)
  • [Citation](#citation)

---

Quick Start: Run Your First Task

Estimated time: 30-60 minutes for first run

This quick start walks you through running a single task end-to-end using the provided example.

Prerequisites

  • Docker Desktop
  • Python 3.13
  • UV
  • LLM API key (Anthropic, OpenAI, or Gemini)

1. Set Up Environment Variables

cd archipelago

# Copy example env files
cp environment/.env.example environment/.env
cp agents/.env.example agents/.env
cp grading/.env.example grading/.env

# Edit agents/.env and grading/.env with your LLM API key (at least one required):
# ANTHROPIC_API_KEY=sk-ant-...
# or OPENAI_API_KEY=sk-...
# or GOOGLE_API_KEY=...

# The environment/.env can be left as-is for local development
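
Before running an example, it can help to confirm that an env file actually defines one of the keys listed above. The following is a minimal sketch, assuming the files use plain KEY=VALUE lines; `parse_env` and `has_llm_key` are hypothetical helpers written for illustration and are not part of Archipelago.

```python
from pathlib import Path

# Key names taken from the comments above; at least one must be set.
LLM_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GOOGLE_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_llm_key(text: str) -> bool:
    """Return True if at least one LLM API key has a non-empty value."""
    env = parse_env(text)
    return any(env.get(key) for key in LLM_KEYS)


if __name__ == "__main__":
    # Paths match the files created by the cp commands above.
    for name in ("agents/.env", "grading/.env"):
        path = Path(name)
        ok = path.exists() and has_llm_key(path.read_text())
        print(f"{name}: {'ok' if ok else 'missing LLM API key'}")
```

Run it from the repository root after editing the env files; a commented-out key or an empty value is reported as missing.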

2. Run an Example

We provide two examples:

Option A: HuggingFace Benchmark Task (Recommended)

Run tasks from the mercor/apex-agents benchmark dataset, which contains 480 professional services tasks.

cd examples/hugging_face_task
./run.sh

See [examples/hugging_face_task/README.md](./examples/hugging_face_task/README.md) for details.

Option B: Simple Task

A minimal example with a pre-defined task (find a gorilla image in a filesystem).

cd examples/simple_task
./run.sh

See [examples/simple_task/README.md](./examples/simple_task/README.md) for a detailed step-by-step walkthrough.

Both scripts will:

1. Start the environment container
2. Populate the environment with the world snapshot
3. Configure MCP servers
4. Run the agent
5. Save the final snapshot
6. Run grading and display results

3. Check Results

# View grading results
cat ./grades.json | jq '.scoring_results.final_score'

# View agent trajectory
cat ./trajectory.json | jq '.status'
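
If jq is not installed, the same two fields can be read with Python. This is a small sketch that mirrors the jq paths above; only the keys `scoring_results.final_score` and `status` come from those commands, and nothing else about the file layout is assumed.

```python
import json
from pathlib import Path


def final_score(grades: dict):
    """Mirror of jq '.scoring_results.final_score'; None if absent."""
    return (grades.get("scoring_results") or {}).get("final_score")


def run_status(trajectory: dict):
    """Mirror of jq '.status'; None if absent."""
    return trajectory.get("status")


if __name__ == "__main__":
    grades_path = Path("grades.json")
    trajectory_path = Path("trajectory.json")
    if grades_path.exists() and trajectory_path.exists():
        print("final score:", final_score(json.loads(grades_path.read_text())))
        print("status:", run_status(json.loads(trajectory_path.read_text())))
```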

---

Components

Environment

The Environment is a headless gateway designed to run in a Docker container. It serves as a management layer for LLM agents, providing MCP server orchestration, data population from S3, and state snapshotting.

Features

  • MCP Gateway: Hot-swappable gateway that routes requests to configured MCP servers. Supports dynamic reconfiguration of tools and resources.
  • Data Management:
      • Population: Download data from S3-compatible storage into local subsystems (/filesystem, /.apps_data).
      • Snapshots: Create tar.gz archives of the environment state and stream them back to the client or upload directly to S3.
  • Docker-First: Designed to run as a containerized service with health checks and lifecycle management.
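
Because snapshots are tar.gz archives and grading works from before/after snapshots, a file-level diff of two archives gives a feel for what such a comparison involves. The sketch below is illustrative only and is not Archipelago's grading logic; `snapshot_digests`, `diff_snapshots`, and `make_snapshot` are hypothetical names, and the archive layout in the usage note is made up.

```python
import hashlib
import io
import tarfile


def snapshot_digests(archive_bytes: bytes) -> dict[str, str]:
    """Map each regular file in a tar.gz archive to the SHA-256 of its contents."""
    digests = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                data = tar.extractfile(member).read()
                digests[member.name] = hashlib.sha256(data).hexdigest()
    return digests


def diff_snapshots(before: bytes, after: bytes) -> dict[str, list[str]]:
    """List files added, removed, or changed between two snapshot archives."""
    old, new = snapshot_digests(before), snapshot_digests(after)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


def make_snapshot(files: dict[str, bytes]) -> bytes:
    """Build an in-memory tar.gz from {path: contents}, for demonstration only."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

For example, `diff_snapshots(make_snapshot({"filesystem/a.txt": b"1"}), make_snapshot({"filesystem/a.txt": b"2"}))` reports `filesystem/a.txt` under `changed`.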

API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /health | GET | Health check - returns 200 OK if running |
| /docs | GET | FastAPI generated API documentation |
| /apps | POST | Hot-swap MCP gateway configuration |
| /mcp/ | - | MCP server endpoint (after configuration) |
| /data/populate | POST | Download data from S3 into subsystems |
| /data/snapshot | POST | Stream a tar.gz snapshot of environment state |
| /data/snapshot/s3 | POST | Upload snapshot to S3, returns pre-signed URL |
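
A client usually needs to wait for the container before calling the other endpoints. The sketch below polls `/health` until it returns 200, using only the standard library. The base URL `http://localhost:8080` is taken from the configuration example in this README; the retry count and delay are arbitrary assumptions, and `wait_until_healthy` is a hypothetical helper.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status for a GET, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
    except (urllib.error.URLError, OSError):
        return 0


def wait_until_healthy(
    base_url: str = "http://localhost:8080",
    attempts: int = 30,
    delay_seconds: float = 2.0,
    get_status: Callable[[str], int] = http_status,
) -> bool:
    """Poll GET /health until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if get_status(f"{base_url}/health") == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

The `get_status` parameter exists so the polling loop can be exercised without a running container.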

Configuration

The environment is configured via environment variables:

| Variable | Description | Default |
|----------|-------------|---------|
| S3_SNAPSHOTS_BUCKET | S3 bucket for storing snapshots | snapshots |
| S3_SNAPSHOTS_PREFIX | Prefix for snapshot objects in S3 | "" |
| S3_DEFAULT_REGION | AWS region for S3 operations | us-west-2 |
| S3_ACCESS_KEY_ID | AWS access key ID | None |
| S3_SECRET_ACCESS_KEY | AWS secret access key | None |
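
To make the defaults concrete, here is a sketch of how these variables could be read into a typed settings object. The variable names and defaults come from the table; the `S3Settings` class and `load_s3_settings` function are hypothetical and do not describe how the environment itself loads its configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class S3Settings:
    snapshots_bucket: str
    snapshots_prefix: str
    default_region: str
    access_key_id: Optional[str]
    secret_access_key: Optional[str]


def load_s3_settings(env: Mapping[str, str] = os.environ) -> S3Settings:
    """Read the documented variables, falling back to the documented defaults."""
    return S3Settings(
        snapshots_bucket=env.get("S3_SNAPSHOTS_BUCKET", "snapshots"),
        snapshots_prefix=env.get("S3_SNAPSHOTS_PREFIX", ""),
        default_region=env.get("S3_DEFAULT_REGION", "us-west-2"),
        access_key_id=env.get("S3_ACCESS_KEY_ID"),
        secret_access_key=env.get("S3_SECRET_ACCESS_KEY"),
    )
```

Calling `load_s3_settings({})` returns the defaults from the table, with both credentials set to `None`.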

Example: Configuring MCP Servers

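import requests

# Gateway configuration: one stdio MCP server launched inside the container.
config = {
    "mcpServers": {
        "filesystem_server": {
            "transport": "stdio",
            "command": "python",
            "args": ["main.py"],
            "cwd": "./mcp_servers/filesystem_server",  # Must be a valid path in the container
        }
    }
}

# Hot-swap the gateway configuration and raise on a non-2xx response.
response = requests.post("http://localhost:8080/apps", json=config)
response.raise_for_status()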

After configuration, http://localhost:8080/mcp/ exposes an MCP server that agents can connect to.
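
When several servers are configured at once, it can be convenient to assemble the payload programmatically. The helpers below are a hypothetical sketch that uses only the keys shown in the example (`transport`, `command`, `args`, `cwd`). Treating `transport` and `command` as required and `cwd` as optional is an assumption, not something the README states.

```python
from typing import Iterable, Optional


def stdio_server(command: str, args: Iterable[str], cwd: Optional[str] = None) -> dict:
    """Build one server entry with the keys used in the example above."""
    entry = {"transport": "stdio", "command": command, "args": list(args)}
    if cwd is not None:
        entry["cwd"] = cwd
    return entry


def build_apps_config(servers: dict[str, dict]) -> dict:
    """Wrap named server entries in the top-level "mcpServers" object."""
    for name, entry in servers.items():
        # Assumption: every entry needs at least a transport and a command.
        missing = [key for key in ("transport", "command") if key not in entry]
        if missing:
            raise ValueError(f"server {name!r} is missing keys: {missing}")
    return {"mcpServers": dict(servers)}
```

The result can be passed as the `json` argument of the POST to `/apps`.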

> For more details, see the [Environment README](./environment/README.md).

Agents

The Agents component provides an extensible framework for running AI agents against environment sandboxes. It uses a registry-based architecture that allows multiple agent implementations with configurable parameters.

Features

  • Agent Registry: Pluggable agent implementations (e.g., react_toolbelt_agent) that can be extended with custom agents
  • Configurable Parameters: Each agent type defines its own configuration schema (max steps, timeouts, system prompts, etc.)
  • Environment Integration: Spawns and manages environment sandboxes, handling data population, MCP configuration, and snapshotting
  • Observability: Built-in logging to multiple backends (Datadog, PostgreSQL, Redis, file)
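
The registry idea can be illustrated with a few lines of Python. This is a generic sketch of the pattern, not the runner's actual code: `AgentConfig`, `AGENT_REGISTRY`, `register_agent`, `run_agent`, and the toy `echo_agent` are all hypothetical names, and the default values are arbitrary.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class AgentConfig:
    """Hypothetical per-agent parameters, echoing the examples listed above."""
    max_steps: int = 10
    timeout_seconds: float = 60.0
    system_prompt: str = ""


# Maps an agent id to a callable that runs the agent with a config and a task.
AGENT_REGISTRY: Dict[str, Callable[[AgentConfig, str], str]] = {}


def register_agent(agent_id: str):
    """Decorator that adds an agent implementation to the registry."""
    def decorator(run_fn):
        if agent_id in AGENT_REGISTRY:
            raise ValueError(f"agent already registered: {agent_id}")
        AGENT_REGISTRY[agent_id] = run_fn
        return run_fn
    return decorator


@register_agent("echo_agent")
def echo_agent(config: AgentConfig, task: str) -> str:
    """Toy agent that returns the task text, truncated to max_steps words."""
    return " ".join(task.split()[: config.max_steps])


def run_agent(agent_id: str, task: str, **overrides) -> str:
    """Look up an agent by id and run it with config overrides."""
    config = AgentConfig(**overrides)
    return AGENT_REGISTRY[agent_id](config, task)
```

A custom agent would be added by decorating another function with `register_agent`, after which callers select it by id.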

Architecture

┌─────────────────────────────────────────────────────────────────┐
│ Agents Runner │
├─────────────────────────────────────────────────────────────────┤
│ runner/ │
│ ├── main.py Main orchestrator │
│ ├── models.py Data models │
│ ├── agents/ │
│ │ ├── models.py AgentIds, AgentDefn, AgentRunInput │…

