Safety & alignment at the frontier — what the hiring reveals

Synthesized by the onlylabs Content Studio agent (Claude Code) · web-verified

[Chart: Open roles by lab]

A safety person's read of 127 open safety/alignment/red-team roles across frontier labs (onlylabs, June 2026), for two audiences: people who want to get hired into safety, and people who want to sell safety tooling to the labs. Companion to the evals report, which covers the eval-harness slice; this is the wider safety org.

0. The macro read

127 open safety roles, and the surprise is the order: OpenAI 55 > Anthropic 42 ≫ xAI 9 > Google DeepMind 5 (together 111 of the 127; the other 16 are not broken out here). The safety-branded lab, Anthropic, is currently out-hired by OpenAI in raw safety headcount, though the shapes differ. OpenAI's safety hiring spreads across Alignment (Oversight, Training, Misalignment, Science), Preparedness (threat modeling, biosafety/cyber red-team, recursive self-improvement), and Trust & Safety. Anthropic concentrates its hiring in a deep Safeguards org, Interpretability, Alignment Science, and a Frontier Red Team.
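The headcount comparison above is simple arithmetic, and a quick tally makes the shares explicit. The sketch below uses only the four per-lab counts and the 127 total quoted in this section; the variable and function names are illustrative, and nothing here comes from the underlying onlylabs data.

```python
# Counts quoted in the paragraph above; 127 is the report's stated total.
TOTAL_ROLES = 127
roles_by_lab = {"OpenAI": 55, "Anthropic": 42, "xAI": 9, "Google DeepMind": 5}


def share_table(counts, total):
    """Return (lab, count, percent_of_total) rows, largest count first."""
    rows = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(lab, n, round(100 * n / total, 1)) for lab, n in rows]


named_total = sum(roles_by_lab.values())   # roles attributed to the four named labs
unattributed = TOTAL_ROLES - named_total   # roles the section does not break out
table = share_table(roles_by_lab, TOTAL_ROLES)

for lab, n, pct in table:
    print(f"{lab:<16}{n:>4}{pct:>7}%")
print(f"{'not broken out':<16}{unattributed:>4}")
```

On these numbers OpenAI and Anthropic together hold roughly three quarters of the listed roles, which is why the rest of the report focuses on those two.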

The throughline: safety is now an engineering-and-ops discipline, not just research — Safeguards tooling, red-team infrastructure, monitoring, and dangerous-capability evals dominate over pure theory.

1. If you want to get hired (as a safety person)

OpenAI (55): three doors. Alignment (the "Threat Modeler, Preparedness" role and the Alignment Oversight/Training/Misalignment roles), Preparedness red-team (biosafety, cyber, recursive self-improvement, each mapped to the Preparedness Framework), and interpretability. If you can red-team a dangerous capability or build an oversight monitor, you fit.
Anthropic (42): the Safeguards Labs org (research + eng + data + tooling), Interpretability (research scientists + engineers), Alignment Science, and the Frontier Red Team (Cyber). Anthropic wants safety people who ship infrastructure, not just write papers; see the evals report for the "runs on every agent change" bar.
Google DeepMind (5) / xAI (9): smaller but real. DeepMind's listings include "Research Scientist, Multimodal Alignment, Safety"; xAI skews toward alignment + monitoring.

Position by craft: dangerous-capability red-teaming, interpretability, oversight/monitoring, or safeguards engineering — the four highest-demand crafts.

2. If you want to sell to frontier labs

Labs build the safety science in-house (don't sell them alignment research), but they buy the operational layer:

Red-team capacity & dangerous-capability eval data — OpenAI's biosafety/cyber red-team + Preparedness hiring = demand for bio/cyber eval datasets and automated red-teaming. Pitch: red-team-as-a-service, dangerous-capability eval suites.
Monitoring / oversight / safeguards infrastructure — Anthropic's Safeguards tooling + data-infra roles = demand for detection pipelines, monitoring throughput, and the data behind them. Pitch: monitoring infra, abuse-detection data, harness throughput. (They build the auditing science — Petri, SHADE-Arena — so sell the throughput and data, not the method.)
Eval/safeguards data & annotation — see the human-data report and the evals report; safety is a heavy buyer of red-team and policy-labeled data.

Buy-signal ranking: OpenAI (Preparedness red-team + dangerous-capability data) and Anthropic (Safeguards monitoring + data) are the buyers; the science stays in-house.

3. The connections

OpenAI's Preparedness red-team roles + the Preparedness Framework ⇒ they're operationalizing dangerous-capability gating — every red-team role maps to a framework category; sell into the categories, not the model.
Anthropic's Safeguards + Interpretability + Frontier Red Team ⇒ a vertically-integrated safety stack (detect → interpret → red-team) built as infrastructure — the buy surface is throughput and data, not research.

4. What the JDs actually say (deep dive)

A reading of the actual JDs for the top safety teams.

OpenAI safety ships into production. The Alignment Oversight JD: "improving control, accountability, and alignment as AI systems become more capable and agentic… combine longer-horizon research with hands-on deployment… building oversight systems used in practice today — code review and action monitoring." Safety here is a deployed monitoring function, not only research.
DeepMind's lens is sociotechnical. The Research Scientist, Multimodal Alignment, Safety JD (Frontier AI unit): "interdisciplinary sociotechnical modeling… modeling the interactions between AI and society." A different safety craft than OpenAI/Anthropic's systems focus.
Anthropic = a vertically integrated safety stack: Safeguards + Interpretability + Frontier Red Team, built as infrastructure (see the evals report's §6: "runs on every agent change").

What it means: get-hired — pick your craft (deployed oversight/monitoring at OpenAI, mechanistic interpretability at Anthropic, sociotechnical at DeepMind). Sell-to — OpenAI's "oversight used in practice today" = demand for monitoring infrastructure + action-classification data, not alignment research.

Method: 127 safety/alignment/red-team open roles from onlylabs (kind=job_opened, safety lexicon, de-noised of comms/legal/T&S-ops). §4 reads the actual JDs. Overlaps the evals report on the eval-harness slice — read both. Counts as of 2026-06-26; linked roles are live.
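The method note describes a three-step filter: keep job_opened events, match a safety lexicon, then drop comms/legal/T&S-ops noise. A minimal sketch of that kind of filter follows. The record schema (`kind`, `lab`, `title`), both term lists, and the sample events are assumptions made for illustration; the report does not publish its actual lexicon or data format.

```python
import re

# Hypothetical lexicon: terms that mark a role as safety/alignment/red-team.
SAFETY_LEXICON = re.compile(
    r"\b(safety|alignment|red[- ]team\w*|safeguards|interpretability|"
    r"preparedness|oversight)\b",
    re.IGNORECASE,
)
# Hypothetical noise terms: comms, legal, and T&S-ops roles to exclude.
NOISE_TERMS = re.compile(
    r"\b(communications|comms|legal|counsel|operations|ops)\b",
    re.IGNORECASE,
)


def is_safety_role(event):
    """True if the event is an opened job whose title matches the lexicon
    and contains none of the noise terms."""
    if event.get("kind") != "job_opened":
        return False
    title = event.get("title", "")
    return bool(SAFETY_LEXICON.search(title)) and not NOISE_TERMS.search(title)


# Illustrative events only; not taken from the onlylabs dataset.
sample_events = [
    {"kind": "job_opened", "lab": "OpenAI", "title": "Threat Modeler, Preparedness"},
    {"kind": "job_opened", "lab": "Anthropic", "title": "Research Engineer, Interpretability"},
    {"kind": "job_opened", "lab": "OpenAI", "title": "Trust & Safety Operations Analyst"},
    {"kind": "job_closed", "lab": "xAI", "title": "Alignment Researcher"},
    {"kind": "job_opened", "lab": "Anthropic", "title": "Safety Communications Lead"},
    {"kind": "job_opened", "lab": "Google DeepMind",
     "title": "Research Scientist, Multimodal Alignment, Safety"},
]

kept = [e["title"] for e in sample_events if is_safety_role(e)]
print(kept)
```

Title-only keyword matching is the crudest version of this filter. A stricter pass would also read the JD body, since a title such as "Research Engineer" can hide a safety role and one containing "Safety" can be a workplace-safety job.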