What does this repo signal mean?

Anthropic published anthropics/cwc-long-running-agents (Shell). This repository signal exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo anthropics/cwc-long-running-agents · language Shell · New Anthropic repo, moderate stars, relevant topic.. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Anthropic Repo: anthropics/cwc-long-running-agents

Captured source

source ↗

GitHub/github.com/anthropics/cwc-long-running-agents

anthropics/cwc-long-running-agents repository metadata

Source ↗

published May 6, 2026seen Jun 5captured Jun 11http 200method plain

anthropics/cwc-long-running-agents

Language: Shell

License: Apache-2.0

Stars: 415

Forks: 46

Open issues: 1

Created: 2026-05-06T00:23:37Z

Pushed: 2026-05-13T00:54:43Z

Default branch: main

Fork: no

Archived: no

README:

Harness Primitives for Long-Running Claude Agents

Claude Code's built-in `/goal` command gives you a generator/evaluator loop out of the box: set a completion condition and a separate fast model checks it after every turn until it's met. This repo ships the same underlying primitives as short, readable hooks and a subagent, so you can see how each mechanism works and assemble a harness tuned to your project. The patterns come from Effective Harnesses for Long-Running Agents (Nov 2025) and Harness Design for Long-Running Application Development (Mar 2026). We recommend trying both the in-product features and a custom harness to see which fits your workflow.

| | In-product | Custom harness (this repo) | |---|---|---| | What runs the loop | `/goal` | the [primitives below](#the-quality-loop) + a [loop you write](#running-the-loop) | | Who judges "done" | a separate fast model checking your condition | your [agents/evaluator.md](./claude-code-config/.claude/agents/evaluator.md) with your prompt | | Where it works | Claude Code interactive, `-p`, Remote Control | Claude Code, headless, or Agent SDK |

Three primitives form the quality loop:

Default-FAIL contract. Every criterion starts false; the agent can't mark it passing without opening evidence first.
Fresh-context evaluator. A separate agent with no Write/Edit tools grades the work from a context window that never saw the build.
Agent-maintained handoff. The agent writes its own progress notes and commits to git so the next session picks up cleanly.

Two more operator-control hooks are included for when you want to watch or intervene. The same patterns translate directly to PreToolUse/Stop callbacks in the Agent SDK.

> Built as the take-home for the Long-Running Agents station at Code with Claude 2026. These are example ingredients, not a turnkey harness. Event demo; not maintained and not accepting contributions.

How to use this repo

Read and cherry-pick. Each primitive is one standalone file with no dependency on the others. The [quality-loop table](#the-quality-loop) below maps every one to its example here and its Agent SDK equivalent. Open the file, see how the mechanism works, copy what fits.

Or copy all of them as a starting point. In your project, run claude and paste:

> Clone github.com/anthropics/cwc-long-running-agents into /tmp, copy its claude-code-config/.claude/ directory into this project's root, make the hook scripts executable, then walk me through what each hook and the evaluator subagent does and what I'll need to adapt for this project.

or do it yourself:

cp -r claude-code-config/.claude /path/to/your/project/
chmod +x /path/to/your/project/.claude/hooks/*.sh

Either way, this gives you all the examples wired into .claude/ at once. Before relying on them: point RESULTS_FILE at your project's actual results file, adjust the evidence-file pattern in track-read.sh, and run claude from the directory that contains .claude/ (hooks are not loaded when launching from a subdirectory).

If you're on the Agent SDK, this repo is a pattern reference. The shell hooks here translate one-to-one to PreToolUse/Stop callbacks. To scaffold an SDK agent from inside Claude Code, install the `agent-sdk-dev` plugin and ask Claude to build an agent that implements whichever of these primitives you want. For a hand-written starting point, see the autonomous-coding quickstart or the agent-sdk-workshop curriculum.

The quality loop

| Primitive | Claude Code example | Agent SDK equivalent | Enforcement | |---|---|---|---| | Default-FAIL contract | [hooks/track-read.sh](./claude-code-config/.claude/hooks/track-read.sh) + [hooks/verify-gate.sh](./claude-code-config/.claude/hooks/verify-gate.sh) | PreToolUse callback | hook | | Fresh-context evaluator | [agents/evaluator.md](./claude-code-config/.claude/agents/evaluator.md) subagent (no Write/Edit) | `evaluator_optimizer.ipynb` | you invoke it | | Agent-maintained handoff | [CLAUDE.md](./claude-code-config/.claude/CLAUDE.md) + [hooks/commit-on-stop.sh](./claude-code-config/.claude/hooks/commit-on-stop.sh) | system prompt + Stop callback | convention + hook backstop |

Default-FAIL contract

Agents will mark a feature "passing" after a unit test or a curl when the UI is visibly broken. Asking nicely in the prompt doesn't reliably stop this. The harness makes "done" structural. Every feature is a row in a test-results.json file you create in your project:

{ "feature-1": { "passes": false }, "feature-2": { "passes": false } }

The only evidence that counts is a file matching the patterns in track-read.sh (screenshots, console logs, result files), and a PreToolUse hook denies any write to the results file unless the agent has first opened one with the Read tool. The agent can't claim success it hasn't observed. (The shipped hook is intentionally simple; see the comments in verify-gate.sh for the gaps a production version would close.)

Fresh-context evaluator

The builder shouldn't grade its own work. After each feature, you (or your wrapper script) invoke a separate subagent (agents/evaluator.md) with no Write/Edit tools that reviews the diff and the screenshots from a context window that never saw the build, then returns PASS or NEEDS_WORK with specific findings. On NEEDS_WORK the findings become the next builder...

Excerpt shown — open the source for the full document.

Notability

notability 5.0/10

New Anthropic repo, moderate stars, relevant topic.