Repo by Cohere, published Apr 22, 2026

cohere-ai/vllm-skills


License: Apache-2.0

Stars: 2

Forks: 0

Open issues: 0

Created: 2026-04-22T14:52:55Z

Pushed: 2026-06-24T02:41:45Z

Default branch: main

Fork: no

Archived: no

README:

vllm-skills

> AI agent skills for keeping a long-lived vLLM fork in sync with upstream — automated rebase, conflict resolution, and test-driven verification.

Five composable skills that an AI coding agent reads and executes interactively to:

  • detect when a new upstream vLLM release is available,
  • rebase the fork's custom commits onto it,
  • resolve conflicts using upstream diff context,
  • and iterate on user-defined checks (tests, benchmarks, evals) until the fork is back to a healthy state.

For the design rationale and a worked case study (Cohere's transcription model on v0.19.1), see [docs/auto-fork-maintenance.md](docs/auto-fork-maintenance.md). To reproduce that example end-to-end, follow [docs/reproduce-cohere-transcribe-v0.19.1.md](docs/reproduce-cohere-transcribe-v0.19.1.md).

Compatibility

Each skill is a SKILL.md markdown file with YAML frontmatter (name, description), following the Agent Skills format used by Cursor. The skills only assume access to a shell, the file system, and git, so they should also work with other coding agents that can read and execute the same Markdown-based instructions.
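As a rough illustration of the format, the sketch below reads the `name` and `description` fields from a skill file's frontmatter. It is a minimal sketch, not how Cursor or any other agent actually loads skills: the helper `read_frontmatter` and the example description text are hypothetical, and the parser only handles flat `key: value` pairs rather than full YAML.

```python
def read_frontmatter(text: str) -> dict[str, str]:
    """Return flat `key: value` pairs from a leading `---` ... `---` block.

    Only top-level scalar fields are handled; lists, nesting, and
    multi-line strings would need a real YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter at the top of the file
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter reached
            return fields
        key, sep, value = line.partition(":")
        # Ignore blank lines, lines without a colon, and indented lines.
        if sep and key.strip() and not line[0].isspace():
            fields[key.strip()] = value.strip().strip("\"'")
    return {}  # unterminated block: treat as absent frontmatter


# Hypothetical SKILL.md content; the description wording is invented.
example = """---
name: detect-upstream-base
description: Find the upstream tag the fork is based on.
---
# Instructions for the agent
"""
meta = read_frontmatter(example)
print(meta["name"])
```

Any agent that can read the Markdown body after the closing `---` can follow the instructions; the frontmatter mainly serves to identify and describe the skill.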

The rules/skill-edit-checklist.mdc file is a Cursor Rule and is optional.

Skills

| Skill | Role in the loop | What it does |
|-------|------------------|--------------|
| [install-vllm](skills/install-vllm/SKILL.md) | Environment setup | Creates a `uv` virtualenv, installs vLLM in editable mode with the correct precompiled CUDA wheel |
| [local-test-runner](skills/local-test-runner/SKILL.md) | Measurement | Runs Buildkite CI-equivalent tests locally on NVIDIA GPUs; parses `.buildkite/test_areas/*.yaml`, manages Hugging Face tokens, captures logs |
| [detect-upstream-base](skills/detect-upstream-base/SKILL.md) | Disturbance detection | Finds the upstream tag (`v1`) the fork is currently based on via `git merge-base` + `git describe` |
| [rebase-assistant](skills/rebase-assistant/SKILL.md) | Controller | Rebases custom commits from `v1` onto `v2`, resolves conflicts using upstream diffs, verifies with test-runner |
| [auto-rebase](skills/auto-rebase/SKILL.md) | Orchestrator | Checks for new upstream releases via `gh`, invokes detect-upstream-base and rebase-assistant end-to-end |
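The `git merge-base` + `git describe` combination used by detect-upstream-base can be sketched roughly as follows. This is an illustrative sketch rather than the skill's actual logic: the function names, the `upstream/main` default, and the injectable `run` parameter are assumptions, and it presumes the base tag is reachable from the chosen upstream ref (tags that live only on release branches would need a different ref).

```python
import subprocess
from typing import Callable

Runner = Callable[[list[str]], str]


def git(args: list[str]) -> str:
    """Run a git command in the current directory and return stripped stdout."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def detect_upstream_base(
    upstream_ref: str = "upstream/main",  # assumed ref name, not from the README
    run: Runner = git,
) -> tuple[str, str]:
    """Return (fork-point commit, nearest upstream tag at or before it)."""
    # Most recent commit shared by the fork branch and upstream.
    base_commit = run(["merge-base", "HEAD", upstream_ref])
    # --abbrev=0 prints only the tag name, without the "-<n>-g<sha>" suffix.
    base_tag = run(["describe", "--tags", "--abbrev=0", base_commit])
    return base_commit, base_tag
```

Calling `detect_upstream_base()` inside a fork checkout with an `upstream` remote already fetched would return the shared commit and the tag that plays the role of `v1`. Passing a custom `run` callable makes the logic testable without a repository.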

See [skills/README.md](skills/README.md) for the dependency graph, shared notation (v1/v2/b1/b2), and the change-impact table contributors should follow when editing skills.

Quick start

In an agent session inside your vLLM fork checkout:

/auto-rebase sync the current branch with the latest upstream release and
make sure tests/entrypoints/openai/correctness/test_transcription_api_correctness.py passes

The agent will:

1. detect the current upstream base tag (v1) and the latest release (v2),
2. confirm with you before rebasing,
3. verify the checks pass on the pre-rebase branch as a baseline,
4. rebase the custom commits onto v2 and resolve conflicts,
5. iterate (fix, re-run checks, repeat) until everything passes,
6. summarize what changed and offer to push.
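Step 1 amounts to comparing two version tags. The sketch below shows one way to pick the latest stable tag and decide whether a rebase is needed. It assumes tags of the form `v0.19.1` and treats anything else (for example release candidates) as not a release; the function names are hypothetical, and the tag list is assumed to come from something like `gh release list` or `git tag`.

```python
import re
from typing import Optional


def parse_release_tag(tag: str) -> Optional[tuple[int, ...]]:
    """Parse 'v0.19.1' into (0, 19, 1); return None for any other shape."""
    match = re.fullmatch(r"v(\d+(?:\.\d+)*)", tag)
    if match is None:
        return None  # e.g. pre-releases such as 'v0.20.0rc1'
    return tuple(int(part) for part in match.group(1).split("."))


def latest_release(tags: list[str]) -> Optional[str]:
    """Return the highest stable tag, compared numerically, or None."""
    parsed = {tag: parse_release_tag(tag) for tag in tags}
    stable = [tag for tag, version in parsed.items() if version is not None]
    return max(stable, key=lambda tag: parsed[tag], default=None)


def needs_rebase(v1: str, v2: str) -> bool:
    """True when the latest release v2 is newer than the current base v1."""
    base, latest = parse_release_tag(v1), parse_release_tag(v2)
    if base is None or latest is None:
        raise ValueError(f"unrecognized release tag: {v1!r} or {v2!r}")
    return latest > base


tags = ["v0.9.2", "v0.19.1", "v0.19.0", "v0.20.0rc1"]
print(latest_release(tags), needs_rebase("v0.19.0", "v0.19.1"))
```

Comparing integer tuples rather than strings matters here: a plain string sort would rank `v0.9.2` above `v0.19.1`.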

Prerequisites

Each skill checks its own prereqs at runtime, but at a minimum you'll need:

| Tool | Used by | Install |
|------|---------|---------|
| uv | install-vllm | `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| gh (authenticated) | auto-rebase | `gh auth login` |
| Hugging Face token | local-test-runner (for tests that pull weights) | `hf auth login` |
| upstream git remote | detect-upstream-base, rebase-assistant | `git remote add upstream git@github.com:vllm-project/vllm.git` |
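A preflight check for the tools in the table could look roughly like this. It is a hedged sketch, not the checks the skills actually run: `missing_tools` and `has_remote` are hypothetical helpers, and authentication state (for `gh` and Hugging Face) is not verified here.

```python
import shutil
from typing import Callable, Optional


def missing_tools(
    tools: list[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH, in input order."""
    return [tool for tool in tools if which(tool) is None]


def has_remote(remote_listing: str, name: str = "upstream") -> bool:
    """Check the output of `git remote` (one name per line) for a remote."""
    return name in remote_listing.split()


# What is missing depends on the machine this runs on.
print(missing_tools(["git", "uv", "gh"]))
```

`has_remote` expects the text printed by `git remote`, so it can be combined with any helper that captures that command's output.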

Beyond vLLM

The skills are vLLM-specific, but the underlying pattern — *detect the disturbance, measure the gap, iterate until error → 0* — generalizes to any long-lived fork with a measurable definition of "working" (a test, a benchmark, an eval). The same loop has been applied to other long-lived forks at Cohere, including a Hugging Face transformers fork. For the full framing, see [docs/auto-fork-maintenance.md](docs/auto-fork-maintenance.md).
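The measure-and-iterate part of that pattern can be written down as a small control loop. The sketch below is a generic illustration under stated assumptions, not code from the repository: `run_checks` stands in for whatever measurement defines "working" and returns the list of failing checks, `attempt_fix` stands in for the agent's repair step, and the `max_rounds` cap is an added safeguard that the README does not mention.

```python
from typing import Callable


def iterate_until_healthy(
    run_checks: Callable[[], list[str]],
    attempt_fix: Callable[[list[str]], None],
    max_rounds: int = 5,  # assumed cap so the loop always terminates
) -> tuple[bool, int]:
    """Fix and re-measure until no checks fail or the round cap is hit.

    Returns (healthy, number of fix rounds performed).
    """
    failures = run_checks()  # measure the gap
    rounds = 0
    while failures and rounds < max_rounds:
        attempt_fix(failures)  # act on the measured error
        rounds += 1
        failures = run_checks()  # re-measure after each fix
    return (not failures, rounds)


# Toy example: each fix round repairs one failing check.
broken = ["test_a", "test_b"]
healthy, rounds = iterate_until_healthy(
    run_checks=lambda: list(broken),
    attempt_fix=lambda failures: broken.remove(failures[0]),
)
print(healthy, rounds)
```

Swapping in a different `run_checks` (a benchmark threshold, an eval score) is what lets the same loop apply to forks other than vLLM.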

License

Apache 2.0 — see [LICENSE](LICENSE).
