What does this repo signal mean?

OpenAI published openai/privacy-filter, a Python repository. Repository signals like this one expose tooling, evaluation, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo openai/privacy-filter · language Python · new OpenAI repo with 2.4k stars, notable but not flagship. onlylabs links this event to 1 captured evidence page and 6 related repo signals, and maps it to Safety and policy in the data-business radar.

OpenAI Repo: openai/privacy-filter

Captured source

GitHub: github.com/openai/privacy-filter

openai/privacy-filter repository metadata

Published Apr 17, 2026 · seen 6d · captured 11h · HTTP 200 · method plain

openai/privacy-filter

Description: OpenAI Privacy Filter

Language: Python

License: Apache-2.0

Stars: 2425

Forks: 211

Open issues: 32

Created: 2026-04-17T22:49:09Z

Pushed: 2026-04-22T19:55:02Z

Default branch: main

Fork: no

Archived: no

README:

OpenAI Privacy Filter

OpenAI Privacy Filter is a bidirectional token-classification model for personally identifiable information (PII) detection and masking in text. It is intended for high-throughput data sanitization workflows where teams need a model that they can run on-premises that is fast, context-aware, and tunable.

OpenAI Privacy Filter is first pretrained autoregressively to arrive at a checkpoint with an architecture similar to gpt-oss, albeit smaller. That checkpoint is then converted into a bidirectional token classifier over a privacy label taxonomy and post-trained with a supervised classification loss. (For architecture details about gpt-oss, please see the gpt-oss model card.) Instead of generating text token-by-token, this model labels an input sequence in a single forward pass, then decodes coherent spans with a constrained Viterbi procedure. For each input token, the model predicts a probability distribution over the label taxonomy, which consists of 8 output categories described below.
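To make the decoding step concrete, here is a minimal sketch of constrained Viterbi decoding over BIOES tags. It is not the repository's implementation: the single-category tag set, the hard transition rules, and the absence of learned transition scores are all assumptions made for illustration, and the function name viterbi_bioes is hypothetical.

```python
import math

# Hypothetical tag set for ONE privacy category; the real taxonomy is larger.
TAGS = ("O", "B", "I", "E", "S")

# Assumed hard constraints: a span opened by B must continue with I or E.
ALLOWED = {
    "O": ("O", "B", "S"),
    "B": ("I", "E"),
    "I": ("I", "E"),
    "E": ("O", "B", "S"),
    "S": ("O", "B", "S"),
}
START = ("O", "B", "S")  # a sequence cannot begin inside a span
END = ("O", "E", "S")    # a sequence cannot end with an open span


def viterbi_bioes(log_probs):
    """Return the best valid tag path.

    log_probs: one dict per token mapping each tag to its log-probability.
    """
    n = len(log_probs)
    if n == 0:
        return []
    score = [{t: -math.inf for t in TAGS} for _ in range(n)]
    back = [{t: None for t in TAGS} for _ in range(n)]
    for t in START:
        score[0][t] = log_probs[0][t]
    for i in range(1, n):
        for prev in TAGS:
            if score[i - 1][prev] == -math.inf:
                continue  # prev is unreachable at position i - 1
            for cur in ALLOWED[prev]:
                cand = score[i - 1][prev] + log_probs[i][cur]
                if cand > score[i][cur]:
                    score[i][cur] = cand
                    back[i][cur] = prev
    # Pick the best valid final tag, then follow back-pointers.
    path = [max(END, key=lambda t: score[n - 1][t])]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def to_log(dist):
    return {t: math.log(p) for t, p in dist.items()}


# Per-token argmax here would be B, O, O, which leaves a span open.
example = [
    to_log({"O": 0.10, "B": 0.58, "I": 0.01, "E": 0.01, "S": 0.30}),
    to_log({"O": 0.50, "B": 0.025, "I": 0.05, "E": 0.40, "S": 0.025}),
    to_log({"O": 0.90, "B": 0.025, "I": 0.025, "E": 0.025, "S": 0.025}),
]
decoded = viterbi_bioes(example)
print(decoded)  # ['B', 'E', 'O']
```

The example shows why constrained decoding matters: the highest-probability tag at each position does not form a valid span, so the decoder trades a little per-token probability for a coherent B, E pair.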

Highlights:

Permissive Apache 2.0 license: ideal for experimentation, customization, and commercial deployment.
Small size: Runs in a web browser or on a laptop – 1.5B parameters total and 50M active parameters.
Fine-tunable: Adapt the model to specific data distributions through easy, data-efficient finetuning.
Long-context: 128,000-token context window enables processing long text with high throughput and no chunking.
Runtime control: configure precision/recall tradeoffs and detected span lengths through preset operating points.

This Repo

This repository contains the local code, CLI, and example assets used to run, evaluate, and finetune Privacy Filter checkpoints. It is meant for teams that want to inspect the implementation directly and operate the model in their own environment.

Repository resources: [License](LICENSE) and [Security Policy](SECURITY.md).

How To Use

1. Install the package locally:

pip install -e .

After this, you will have a Python script, opf, that can be run directly or via python -m opf. The script can be used in three separate ways, as described below.

2. Run one-shot redaction:

By default, opf looks for a model at the directory pointed to by the OPF_CHECKPOINT variable, or ~/.opf/privacy_filter. If a model is not found in the ~/.opf/privacy_filter location, it will be downloaded.

opf "Alice was born on 1990-01-02."

The code supports running on both GPU (the default) and CPU. To run on CPU, use the --device cpu flag:

opf --device cpu "Alice was born on 1990-01-02."

To override the default checkpoint, pass --checkpoint:

opf --checkpoint /path/to/checkpoint_dir "Alice was born on 1990-01-02."

The redaction mode supports redacting an entire file at once:

opf -f /path/to/file

The redaction can also be performed via pipes, to support complex one-liners:

cat /path/to/file | grep -e 'some_pattern' | opf

If no input is provided, opf will start in interactive mode. In this mode, for each input example, the CLI prints structured JSON output, using ANSI color-coded previews if the terminal supports them. These options can be controlled by flags.

Consult opf redact --help for more flags and information about the redaction mode.
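For teams calling the CLI from Python rather than a shell, the invocations above could be wrapped roughly as follows. This is a sketch under stated assumptions: the helper names build_opf_command and redact_via_stdin are hypothetical, only flags shown in the README are used, combining those flags in one call is assumed to work, and nothing is assumed about the format of the CLI's output.

```python
import subprocess


def build_opf_command(text=None, file=None, device=None, checkpoint=None):
    """Build an argv list for opf using only flags shown in the README."""
    cmd = ["opf"]
    if device:
        cmd += ["--device", device]          # e.g. "cpu"
    if checkpoint:
        cmd += ["--checkpoint", checkpoint]  # directory with the checkpoint
    if file:
        cmd += ["-f", file]                  # redact an entire file
    elif text is not None:
        cmd.append(text)                     # one-shot redaction of a string
    return cmd


def redact_via_stdin(text, device=None, checkpoint=None):
    """Pipe text to opf on stdin, mirroring the shell one-liner.

    Returns raw stdout; parsing is left to the caller because the output
    format depends on flags documented in `opf redact --help`.
    """
    cmd = build_opf_command(device=device, checkpoint=checkpoint)
    result = subprocess.run(
        cmd, input=text, capture_output=True, text=True, check=True
    )
    return result.stdout


cpu_cmd = build_opf_command(text="Alice was born on 1990-01-02.", device="cpu")
print(cpu_cmd)
```

Passing the command as a list avoids shell quoting problems when the input text contains quotes or other special characters.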

3. Run eval on a labeled dataset:

opf eval examples/data/sample_eval_five_examples.jsonl

The sample eval fixtures under examples/data/sample_eval_five_examples*.jsonl are synthetic example data only and do not describe real people or real sensitive records. See examples/data/README.md.

Consult opf eval --help for more flags and information about the evaluation mode.

4. Finetune on your own labeled dataset:

opf train /path/to/train.jsonl --output-dir /path/to/finetuned_checkpoint

Consult opf train --help for more flags and information about the finetuning mode.

Structure

opf/__main__.py: unified CLI entrypoint for redact, eval, and train modes.
opf/_api.py: Python-facing API over the runtime and decoding stack.
opf/_cli/: command-line argument parsing and terminal rendering helpers.
opf/_core/: runtime loading, span conversion, and shared decoding logic.
opf/_eval/: dataset loading, preprocessing, metrics, and evaluation runners.
opf/_train/: local finetuning argument parsing and training runners.
opf/_model/: transformer implementation, checkpoint config, and weight loading.
examples/data/: sample eval files plus reproducible finetuning demo datasets.
examples/scripts/finetuning/: runnable finetuning demo harnesses.
FINETUNING.md: focused finetuning workflow and demo-script guide.
OUTPUT_SCHEMAS.md: JSON response and export payload formats.
EVAL_AND_OUTPUT_MODES.md: description of the output modes for redaction and evaluation.

Model Details

Model Description

Privacy Filter is a bidirectional token classification model with span decoding. It is trained in phases, beginning with autoregressive pretraining. The pretrained language model is then modified and post-trained as a bidirectional banded attention token classifier with band size 128 (effective attention window: 257 tokens including self). This means:

The base model is an autoregressive pretrained checkpoint.
The language-model output head is replaced with a token-classification head over privacy labels.
Post-training is supervised token-level classification rather than next-token prediction.
Inference applies constrained sequence decoding to produce coherent BIOES (Begin, Inside, Outside, End, Single) span labels.
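The stated window of 257 tokens follows from the band size: each token attends to itself plus up to 128 positions on either side, so 2 × 128 + 1 = 257. A minimal sketch of such a mask, assuming a symmetric band and using a hypothetical helper name (the repository's actual masking code may be organized differently):

```python
def banded_mask(seq_len, band=128):
    """Boolean mask: entry [i][j] is True if query i may attend to key j."""
    return [
        [abs(i - j) <= band for j in range(seq_len)]
        for i in range(seq_len)
    ]


mask = banded_mask(600)
window_mid = sum(mask[300])  # interior token: full band on both sides
window_edge = sum(mask[0])   # first token: no left context available
print(window_mid, window_edge)  # 257 129
```

Tokens near the sequence boundaries see fewer neighbors because the band is clipped, which is why the first token attends to only 129 positions.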

Architecturally, the implementation in this repo is a pre-norm transformer encoder-style stack with:

token embeddings
8 repeated transformer blocks
grouped-query attention with rotary positional embeddings, with 14 query heads and 2 KV heads (group size = 7 queries per KV head)
sparse mixture-of-experts feed-forward blocks with 128 experts total (top-4 routing per token)
a final token-classification head over privacy labels (rather than natural language vocabulary tokens), with residual stream width d_model = 640.
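The head grouping and expert routing figures above can be illustrated with a short sketch. Two details are assumptions rather than facts from the README: that query heads are grouped contiguously onto KV heads, and that the selected experts' weights come from a softmax over only the top-4 router logits. The function names are hypothetical.

```python
import math

N_Q_HEADS, N_KV_HEADS = 14, 2
GROUP_SIZE = N_Q_HEADS // N_KV_HEADS  # 7 query heads share each KV head

N_EXPERTS, TOP_K = 128, 4


def kv_head_for(q_head):
    """Map a query head to its shared KV head (assumes contiguous groups)."""
    return q_head // GROUP_SIZE


def route(router_logits, k=TOP_K):
    """Select the k highest-scoring experts and normalize their weights."""
    top = sorted(
        range(len(router_logits)),
        key=lambda e: router_logits[e],
        reverse=True,
    )[:k]
    peak = max(router_logits[e] for e in top)
    exps = [math.exp(router_logits[e] - peak) for e in top]  # stable softmax
    total = sum(exps)
    return [(e, w / total) for e, w in zip(top, exps)]


logits = [0.0] * N_EXPERTS
logits[5], logits[17], logits[99], logits[64] = 3.0, 2.0, 1.0, 0.5
chosen = route(logits)
print([e for e, _ in chosen])  # [5, 17, 99, 64]
print(TOP_K / N_EXPERTS)       # 0.03125
```

With top-4 routing over 128 experts, each token uses 4/128, or about 3%, of the feed-forward experts in a block, which is how a sparse model can have far fewer active parameters than total parameters.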

Relative to iterative autoregressive approaches, this…

Excerpt shown; open the source for the full document.

Notability

notability 6.0/10

New OpenAI repo with 2.4k stars, notable but not flagship