Human data & annotation at the frontier — what the hiring reveals
A data person's read of 86 open human-data / annotation / data-quality roles across frontier labs (onlylabs, June 2026), for two audiences: people who want to get hired into data work, and people who want to sell data services to the labs. This is the fuel layer — the labeled, preference, and domain-expert data that post-training and evals run on.
0. The macro read
86 open data roles: OpenAI 28 ≫ Cohere 12 · Anthropic 11 ≫ Databricks 5 · Mistral 5 · Meta 4. Two reads:
1. Human data is now a named platform, not a vendor afterthought. OpenAI staffs Program Manager, Human Data; Anthropic runs a Human Data Platform/Operations/Interface group and a Data Scientist, Supply (i.e. managing the data supply chain). The labs are building the org to buy and manage human data at scale.
2. Cohere is the annotation shop: its 12 roles are Data Annotation Specialists across Safety / Data Science / SWE. This is the most accessible on-ramp into frontier data work.
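To make the concentration concrete, the counts quoted above can be turned into shares of the 86-role total. This is only a back-of-the-envelope sketch: the per-lab numbers are the ones in the text, and the helper name `share` is made up for illustration.

```python
# Counts quoted in the text above; the total of 86 also includes labs not listed.
TOTAL_ROLES = 86
roles_by_lab = {
    "OpenAI": 28,
    "Cohere": 12,
    "Anthropic": 11,
    "Databricks": 5,
    "Mistral": 5,
    "Meta": 4,
}


def share(count: int, total: int = TOTAL_ROLES) -> float:
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


# Roles at the six named labs, and at the three largest of them.
named_total = sum(roles_by_lab.values())
top_three = sum(sorted(roles_by_lab.values(), reverse=True)[:3])

for lab, count in roles_by_lab.items():
    print(f"{lab:<11}{count:>3}  {share(count):>5}%")
print(f"named labs: {named_total} of {TOTAL_ROLES} ({share(named_total)}%)")
print(f"top three:  {top_three} of {TOTAL_ROLES} ({share(top_three)}%)")
```

On these numbers the six named labs hold 65 of the 86 roles, and OpenAI, Cohere, and Anthropic together hold 51.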
1. If you want to get hired (as a data person)
- The annotation on-ramp: Cohere's Data Annotation Specialist roles are the entry path — domain labeling, safety annotation, eval data.
- The platform/ops path: OpenAI (Human Data program management) and Anthropic (Human Data Platform/Operations/Interface, Data Scientist Supply) want people who can run a data supply chain — vendor management, quality pipelines, throughput. This is the higher-leverage door.
- The data-science path: Data Scientists embedded in Policy/Preparedness/Supply at OpenAI and Anthropic — measurement and quality for post-training and safety.
Position: if you've run annotation operations, vendor QA, or RLHF/preference-data pipelines, the platform roles are your target — the labs are professionalizing exactly that function.
2. If you want to sell to frontier labs
This is the clearest "buy" signal of any persona. Unlike training systems or evals (built in-house), human data is something the labs structurally buy — they're staffing programs to manage suppliers, not to replace them:
- Managed annotation & domain-expert labeling — the core spend. OpenAI/Anthropic/Cohere all hire to ingest it. Pitch: specialist labeling (code, bio, legal, multilingual), safety/red-team labeling, and the QA layer.
- Preference / RLHF data & synthetic data — post-training's fuel; the RL/post-training hiring (see the evals report) is downstream demand. Pitch: preference data, environment/reward data, synthetic-data generation with quality guarantees.
- Data-quality tooling — the "Supply" and "quality" framing in the JDs = demand for dedup, provenance, contamination-checking, and labeler-quality analytics.
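For readers unfamiliar with what "data-quality tooling" covers, the three checks named in the last bullet can be sketched in a few lines. This is an illustrative toy under stated assumptions, not any lab's or vendor's pipeline: the function names are hypothetical, dedup is exact-match on normalized text, contamination is plain n-gram overlap, and labeler quality is reduced to Cohen's kappa between two labelers.

```python
import hashlib
import re
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and collapse non-alphanumeric runs so trivial variants match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def dedup(records: list[str]) -> list[str]:
    """Keep the first occurrence of each record, comparing normalized text."""
    seen: set[str] = set()
    kept = []
    for rec in records:
        key = hashlib.sha256(normalize(rec).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Set of word n-grams over the normalized text."""
    tokens = normalize(text).split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination(sample: str, eval_items: list[str], n: int = 5) -> float:
    """Fraction of the sample's n-grams that also appear in any eval item."""
    sample_grams = ngrams(sample, n)
    if not sample_grams:
        return 0.0
    eval_grams = set().union(*(ngrams(item, n) for item in eval_items))
    return len(sample_grams & eval_grams) / len(sample_grams)


def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Chance-corrected agreement between two labelers on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelers used one identical label throughout
    return (observed - expected) / (1 - expected)
```

A production version would likely add near-duplicate detection (e.g. MinHash), provenance metadata per record, and agreement statistics across more than two labelers; the point here is only the shape of the checks a buyer might expect.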
Buy-signal ranking: every top lab here is a buyer; OpenAI and Anthropic are professionalizing the supply function, which means they're choosing vendors and tooling now.
3. The connections
- Anthropic "Data Scientist, Supply" + Human Data Platform + OpenAI "Program Manager, Human Data" ⇒ the labs are building procurement orgs for human data — sell to the platform, and lead with quality + throughput + provenance.
- The 46 RL/post-training roles in the evals report ⇒ structural downstream demand for preference/environment data: the data market and the eval/RL-env market are parts of the same flywheel.
4. What the JDs actually say (deep dive)
A close read of the actual JDs for the top human-data teams:
- OpenAI's Human Data is a strategic, cross-research org. The Program Manager, Human Data JD: "turns human feedback into reliable signals for training and evaluation… remit spans bespoke data campaigns, scalable synthetic data generation, and product-embedded signals… translate these into training datasets, novel evaluations, and feedback loops." This is the procurement-and-production engine — not a vendor afterthought.
- Cohere builds capability through annotation. The Data Annotation Specialist JD frames each annotator as "responsible for increasing the capabilities of our models." Annotation = capability work, and the accessible door.
- Anthropic runs a data supply chain. Data Scientist, Supply + the Human Data Platform = managing data vendors and quality as an operations function.
What it means: the clearest sell-to of any persona — every top lab staffs a buy-and-manage data org. Lead with (1) bespoke campaign management, (2) synthetic-data generation with quality guarantees, (3) the QA/provenance layer. Get-hired: the platform/supply roles are higher-leverage than line annotation.
Method: 86 human-data / annotation / data-quality open roles from onlylabs (kind=job_opened, data lexicon, de-noised of data-infra/data-platform engineering). §4 reads the actual JDs. Counts as of 2026-06-26; linked roles are live.
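The filtering step described in the method note can be sketched roughly as follows. Everything beyond `kind=job_opened` is an assumption made for illustration: the record fields (`lab`, `title`), the two term lists, the `job_closed` kind, and the toy events are hypothetical, since the report does not publish its lexicon or the onlylabs schema.

```python
from collections import Counter

# Hypothetical lexicons; the report does not publish its actual term lists.
DATA_LEXICON = ("human data", "annotation", "data quality", "labeling")
NOISE_TERMS = ("data infrastructure", "data platform engineer", "data engineer")


def is_human_data_role(event: dict) -> bool:
    """True for job_opened events whose title matches the data lexicon
    and does not match any infra/platform-engineering noise term."""
    if event.get("kind") != "job_opened":
        return False
    title = event.get("title", "").lower()
    if any(term in title for term in NOISE_TERMS):
        return False
    return any(term in title for term in DATA_LEXICON)


def count_by_lab(events: list[dict]) -> Counter:
    """Count matching roles per lab."""
    return Counter(e["lab"] for e in events if is_human_data_role(e))


# Toy events for illustration only; not real onlylabs records.
events = [
    {"kind": "job_opened", "lab": "OpenAI", "title": "Program Manager, Human Data"},
    {"kind": "job_opened", "lab": "Cohere", "title": "Data Annotation Specialist"},
    {"kind": "job_opened", "lab": "ExampleLab", "title": "Data Platform Engineer, Human Data"},
    {"kind": "job_closed", "lab": "Cohere", "title": "Data Annotation Specialist"},
]
print(count_by_lab(events))
```

Checking the noise terms before the lexicon is a deliberate choice in this sketch: a title that mentions human data but is really a platform-engineering role gets dropped, which mirrors the "de-noised" step in the method note.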