openai/privacy-filter
Python
Description: OpenAI Privacy Filter
Language: Python
License: Apache-2.0
Stars: 2425
Forks: 211
Open issues: 32
Created: 2026-04-17T22:49:09Z
Pushed: 2026-04-22T19:55:02Z
Default branch: main
Fork: no
Archived: no
README:
OpenAI Privacy Filter
OpenAI Privacy Filter is a bidirectional token-classification model for personally identifiable information (PII) detection and masking in text. It is intended for high-throughput data sanitization workflows where teams need a model that they can run on-premises that is fast, context-aware, and tunable.
OpenAI Privacy Filter is first pretrained autoregressively, yielding a checkpoint with an architecture similar to gpt-oss but smaller. That checkpoint is then converted into a bidirectional token classifier over a privacy label taxonomy and post-trained with a supervised classification loss. (For architecture details about gpt-oss, see the gpt-oss model card.) Instead of generating text token by token, the model labels an input sequence in a single forward pass, then decodes coherent spans with a constrained Viterbi procedure. For each input token, the model predicts a probability distribution over the label taxonomy, which consists of 8 output categories described below.
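To make the decoding step concrete, here is a minimal sketch of constrained Viterbi decoding over per-token label probabilities. It is not the repository's implementation: it assumes a BIOES-style tagging scheme, the `NAME` category and the probabilities are made up for illustration, and `allowed` and `viterbi_decode` are hypothetical helper names.

```python
import math

NEG_INF = float("-inf")


def allowed(prev: str, curr: str) -> bool:
    """Return True if label `curr` may follow label `prev` under BIOES rules."""
    p_tag, _, p_type = prev.partition("-")
    c_tag, _, c_type = curr.partition("-")
    if p_tag in ("B", "I"):
        # An open span must be continued (I) or closed (E) with the same type.
        return c_tag in ("I", "E") and c_type == p_type
    # After O, E, or S no span is open, so only O or a new span start is valid.
    return c_tag in ("O", "B", "S")


def viterbi_decode(probs, labels):
    """Most probable label sequence that satisfies the BIOES constraints.

    `probs` is a list of dicts mapping each label to a non-zero probability.
    """
    # A sequence cannot start inside a span, so I-* and E-* are excluded at t=0.
    score = {l: math.log(probs[0][l]) if l[0] in "OBS" else NEG_INF for l in labels}
    backpointers = []
    for step in probs[1:]:
        new_score, pointer = {}, {}
        for curr in labels:
            best_score, best_prev = max(
                ((score[p], p) for p in labels if allowed(p, curr)),
                default=(NEG_INF, labels[0]),
            )
            new_score[curr] = best_score + math.log(step[curr])
            pointer[curr] = best_prev
        backpointers.append(pointer)
        score = new_score
    # A sequence cannot end with an open span, so B-* and I-* are excluded.
    _, last = max((score[l], l) for l in labels if l[0] in "OES")
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]


# Hypothetical label set and probabilities for a three-token input.
labels = ["O", "B-NAME", "I-NAME", "E-NAME", "S-NAME"]
probs = [
    {"O": 0.05, "B-NAME": 0.60, "I-NAME": 0.05, "E-NAME": 0.05, "S-NAME": 0.25},
    {"O": 0.50, "B-NAME": 0.05, "I-NAME": 0.05, "E-NAME": 0.35, "S-NAME": 0.05},
    {"O": 0.80, "B-NAME": 0.05, "I-NAME": 0.05, "E-NAME": 0.05, "S-NAME": 0.05},
]

greedy = [max(step, key=step.get) for step in probs]
decoded = viterbi_decode(probs, labels)
print(greedy)   # ['B-NAME', 'O', 'O']  (invalid: the span is never closed)
print(decoded)  # ['B-NAME', 'E-NAME', 'O']
```

Per-token argmax can produce a span that opens but never closes; the constrained search trades a little per-token probability for a sequence that is always well-formed.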
Highlights:
- Permissive Apache 2.0 license: ideal for experimentation, customization, and commercial deployment.
- Small size: runs in a web browser or on a laptop, with 1.5B total parameters and 50M active parameters.
- Fine-tunable: adapt the model to specific data distributions through easy, data-efficient finetuning.
- Long-context: 128,000-token context window enables processing long text with high throughput and no chunking.
- Runtime control: configure precision/recall tradeoffs and detected span lengths through preset operating points.
This Repo
This repository contains the local code, CLI, and example assets used to run, evaluate, and finetune Privacy Filter checkpoints. It is meant for teams that want to inspect the implementation directly and operate the model in their own environment.
Repository resources: [License](LICENSE) and [Security Policy](SECURITY.md).
How To Use
1. Install the package locally:
pip install -e .
After this, you will have a Python script, opf, that can be run directly or via python -m opf. The script can be used in three ways, as described below.
2. Run one-shot redaction:
By default, opf looks for a model in the directory named by the OPF_CHECKPOINT environment variable, or otherwise in ~/.opf/privacy_filter. If no model is found at ~/.opf/privacy_filter, it is downloaded.
opf "Alice was born on 1990-01-02."
The code runs on either GPU (the default) or CPU. To run on CPU, pass the --device cpu flag:
opf --device cpu "Alice was born on 1990-01-02."
To override the default checkpoint, pass --checkpoint:
opf --checkpoint /path/to/checkpoint_dir "Alice was born on 1990-01-02."
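The checkpoint lookup described above can be sketched as follows. This is an illustration only, not the code in this repository: `resolve_checkpoint` is a hypothetical helper, and the precedence shown (an explicit --checkpoint value first, then OPF_CHECKPOINT, then the default directory) is an assumption. The download step is left out.

```python
import os
from pathlib import Path

# Default location named in the README.
DEFAULT_CHECKPOINT = Path.home() / ".opf" / "privacy_filter"


def resolve_checkpoint(cli_value=None, env=None):
    """Pick a checkpoint directory (hypothetical helper, assumed precedence)."""
    env = os.environ if env is None else env
    if cli_value:
        # Assumed: an explicit --checkpoint argument overrides everything else.
        return Path(cli_value).expanduser()
    env_value = env.get("OPF_CHECKPOINT")
    if env_value:
        return Path(env_value).expanduser()
    return DEFAULT_CHECKPOINT


print(resolve_checkpoint(env={}))
print(resolve_checkpoint(env={"OPF_CHECKPOINT": "/models/opf"}))
print(resolve_checkpoint("/path/to/checkpoint_dir", env={"OPF_CHECKPOINT": "/models/opf"}))
```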
Redaction mode can also process an entire file at once:
opf -f /path/to/file
Redaction also works through pipes, which supports complex one-liners:
cat /path/to/file | grep -e 'some_pattern' | opf
If no input is provided, opf will start in interactive mode. In this mode, for each input example, the CLI prints structured JSON output, using ANSI color-coded previews if the terminal supports them. These options can be controlled by flags.
Consult opf redact --help for more flags and information about the redaction mode.
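For callers who want to drive the CLI from Python, a thin wrapper around `subprocess` is one option. The sketch below uses only the flags shown above (--device, --checkpoint, -f). `build_opf_command` and `redact` are hypothetical helpers, combining these flags in a single call is an assumption, and the output is returned as raw text because its exact format is not described in this section.

```python
import subprocess


def build_opf_command(text=None, *, file=None, device=None, checkpoint=None):
    """Build an argv list for the opf CLI using only flags shown in the README."""
    cmd = ["opf"]
    if device:
        cmd += ["--device", device]
    if checkpoint:
        cmd += ["--checkpoint", str(checkpoint)]
    if file:
        cmd += ["-f", str(file)]
    elif text is not None:
        cmd.append(text)
    return cmd


def redact(text, **options):
    """Run one-shot redaction and return the CLI's stdout unparsed.

    Requires opf to be installed and on PATH; raises CalledProcessError
    if the command exits with a non-zero status.
    """
    result = subprocess.run(
        build_opf_command(text, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


print(build_opf_command("Alice was born on 1990-01-02.", device="cpu"))
```

Passing the command as a list avoids shell quoting problems when the input text contains quotes or other special characters.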
3. Run eval on a labeled dataset:
opf eval examples/data/sample_eval_five_examples.jsonl
The sample eval fixtures under examples/data/sample_eval_five_examples*.jsonl are synthetic example data only and do not describe real people or real sensitive records. See examples/data/README.md.
Consult opf eval --help for more flags and information about the evaluation mode.
4. Finetune on your own labeled dataset:
opf train /path/to/train.jsonl --output-dir /path/to/finetuned_checkpoint
Consult opf train --help for more flags and information about the finetuning mode.
Structure
- opf/__main__.py: unified CLI entrypoint for redact, eval, and train modes.
- opf/_api.py: Python-facing API over the runtime and decoding stack.
- opf/_cli/: command-line argument parsing and terminal rendering helpers.
- opf/_core/: runtime loading, span conversion, and shared decoding logic.
- opf/_eval/: dataset loading, preprocessing, metrics, and evaluation runners.
- opf/_train/: local finetuning argument parsing and training runners.
- opf/_model/: transformer implementation, checkpoint config, and weight loading.
- examples/data/: sample eval files plus reproducible finetuning demo datasets.
- examples/scripts/finetuning/: runnable finetuning demo harnesses.
- FINETUNING.md: focused finetuning workflow and demo-script guide.
- OUTPUT_SCHEMAS.md: JSON response and export payload formats.
- EVAL_AND_OUTPUT_MODES.md: description of the output modes for redaction and evaluation.
Model Details
Model Description
Privacy Filter is a bidirectional token-classification model with span decoding. It is trained in phases, beginning with autoregressive pretraining. The pretrained language model is then modified and post-trained as a bidirectional, banded-attention token classifier with band size 128 (effective attention window: 257 tokens including self). This means:
- The base model is an autoregressive pretrained checkpoint.
- The language-model output head is replaced with a token-classification head over privacy labels.
- Post-training is supervised token-level classification rather than next-token prediction.
- Inference applies constrained sequence decoding to produce coherent BIOES (Begin, Inside, Outside, End, Single) span labels.
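The banded attention pattern mentioned in this section can be illustrated with a small mask builder. This is a sketch, not the model code: it assumes the band size counts tokens on each side of the current position, which is consistent with the stated window of 2 × 128 + 1 = 257 tokens. `band_mask` and `window_size` are hypothetical names.

```python
def band_mask(seq_len: int, band: int = 128):
    """mask[i][j] is True when token i may attend to token j (bidirectional band)."""
    return [[abs(i - j) <= band for j in range(seq_len)] for i in range(seq_len)]


def window_size(band: int) -> int:
    """Tokens visible from an interior position: `band` on each side plus self."""
    return 2 * band + 1


mask = band_mask(300)
print(window_size(128))  # 257
print(sum(mask[150]))    # 257: an interior token sees the full window
print(sum(mask[0]))      # 129: the first token only has right-hand context
```

Because each token attends to a fixed-size window, attention cost grows linearly with sequence length rather than quadratically.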
Architecturally, the implementation in this repo is a pre-norm transformer encoder-style stack with:
- token embeddings
- 8 repeated transformer blocks
- grouped-query attention with rotary positional embeddings, with 14 query heads and 2 KV heads (group size = 7 queries per KV head)
- sparse mixture-of-experts feed-forward blocks with 128 experts total (top-4 routing per token)
- a final token-classification head over privacy labels (rather than natural language vocabulary tokens), with residual stream width d_model = 640.
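Two of the numbers in the list above can be made concrete with a short sketch: how 14 query heads share 2 KV heads, and how top-4 routing picks experts out of 128. This is illustrative only. Contiguous head grouping and softmax normalization over the selected router logits are assumptions, and `kv_head_for` and `route_top_k` are hypothetical names rather than functions from this repository.

```python
import math

N_QUERY_HEADS = 14
N_KV_HEADS = 2
N_EXPERTS = 128
TOP_K = 4
GROUP_SIZE = N_QUERY_HEADS // N_KV_HEADS  # 7 query heads per KV head


def kv_head_for(query_head: int) -> int:
    """Map a query head to its shared KV head (assumes contiguous groups)."""
    return query_head // GROUP_SIZE


def route_top_k(router_logits, k: int = TOP_K):
    """Select the k highest-scoring experts and normalize their weights.

    Assumption: weights are a softmax over the selected logits only.
    """
    top = sorted(range(len(router_logits)), key=lambda e: router_logits[e], reverse=True)[:k]
    peak = max(router_logits[e] for e in top)
    exps = [math.exp(router_logits[e] - peak) for e in top]
    total = sum(exps)
    return [(e, w / total) for e, w in zip(top, exps)]


print([kv_head_for(q) for q in range(N_QUERY_HEADS)])
# [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

# Made-up router logits for one token: four experts stand out.
logits = [0.0] * N_EXPERTS
logits[5], logits[17], logits[99], logits[42] = 3.0, 2.0, 1.0, 0.5
routed = route_top_k(logits)
print([expert for expert, _ in routed])  # [5, 17, 99, 42]
```

Only the selected experts run for a given token, which is how a model with 1.5B total parameters can have far fewer active parameters per token.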
Relative to iterative autoregressive approaches, this…