What does this repo signal mean?

Amazon (Nova) published amazon-science/SWAN (Python). Repository signals like this can surface tooling, evaluation, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo amazon-science/SWAN · language Python · code for the ACL 2026 paper "SWAN: Semantic Watermarking with Abstract Meaning Representation" by Amazon Science. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Amazon (Nova) Repo: amazon-science/SWAN

Captured source


GitHub · github.com/amazon-science/SWAN

amazon-science/SWAN repository metadata


Published May 5, 2026 · seen Jun 5 · captured Jun 11 · HTTP 200 · method: plain

amazon-science/SWAN

Description: Code for ACL 2026 Paper "SWAN: Semantic Watermarking with Abstract Meaning Representation"

Language: Python

License: NOASSERTION (GitHub did not match a standard license; the repository tree lists LICENSE as CC-BY-NC 4.0)

Stars: 0

Forks: 0

Open issues: 3

Created: 2026-05-05T17:17:14Z

Pushed: 2026-06-10T23:15:18Z

Default branch: main

Fork: no

Archived: no

README:

SWAN: Semantic Watermarking with Abstract Meaning Representation

This code is being released solely for academic and scientific reproducibility purposes, in support of the methods and findings described in the associated publication. Pull requests are not being accepted in order to maintain the code exactly as it was used in the paper.

This repository contains the code for the ACL 2026 paper:

> SWAN: Semantic Watermarking with Abstract Meaning Representation

SWAN embeds watermark signatures into the semantic structure of sentences using Abstract Meaning Representation (AMR). Because the signature lives in the AMR graph, any paraphrase that preserves meaning automatically preserves the watermark.
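To illustrate why a graph-level signature can survive surface rewording, the sketch below normalizes AMR variable names and derives a keyed bit from the result, so two AMRs that differ only in variable naming get the same bit. The helpers normalize_amr and keyed_bit, the HMAC keying, and the gamma threshold are hypothetical; SWAN's actual template extraction and signature scheme live in utils/amr_utils.py and injection/watermark_generation.py and may work differently.

```python
import hashlib
import hmac
import re

# Hypothetical helpers, not SWAN's code.
# Matches a variable definition such as "(w /".
_DEFINITION = re.compile(r"\(\s*([^\s/()]+)\s*/")
# Matches either "/ concept" (left untouched) or a bare token (maybe a variable).
_TOKEN = re.compile(r"/\s*[^\s()]+|[^\s/()]+")


def normalize_amr(penman: str) -> str:
    """Rename AMR variables to v0, v1, ... in order of first definition.

    Only variable names are normalized; the same graph serialized in a
    different order still yields a different string.
    """
    mapping: dict = {}
    for var in _DEFINITION.findall(penman):
        mapping.setdefault(var, f"v{len(mapping)}")

    def rename(match: re.Match) -> str:
        token = match.group(0)
        if token.startswith("/"):
            return token  # concept labels are kept as-is
        return mapping.get(token, token)  # re-entrant variables are renamed too

    return " ".join(_TOKEN.sub(rename, penman).split())


def keyed_bit(normalized_amr: str, key: bytes, gamma: float = 0.5) -> bool:
    """Hypothetical keyed selection: True for roughly a gamma fraction of graphs."""
    digest = hmac.new(key, normalized_amr.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


# Two AMRs that differ only in their variable names.
a = "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))"
b = "(x / want-01 :ARG0 (y / boy) :ARG1 (z / go-02 :ARG0 y))"
```

Both strings normalize to the same form, so any key-dependent decision made on the normalized graph agrees for them, whatever surface sentence produced each parse.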

Repository Structure

.
├── amr_bank/ # AMR Bank Creation (§3.1)
│ ├── analyze_amr_distribution.py # Build frequency distribution from MASSIVE-AMR corpus
│ ├── build_curated_amr_bank.py # Filter templates by frequency + node count → bank
│ ├── create_examples.py # Generate few-shot examples for the generation prompt
│ ├── display_top_templates.py # Display top-frequency templates
│ ├── banks/ # Pre-built curated AMR banks (50, 100, 500, 800 templates)
│ ├── artifacts/ # Intermediate outputs (full bank, examples, distributions,
│ │ # pre-parsed human AMRs)
│ └── data/ # Place massive_amr.jsonl here (see Setup §4)
│
├── injection/ # Watermark Injection (§3.2)
│ ├── watermark_generation.py # Main watermark injection script (Algorithm 1)
│ └── injection_amr_utils.py # Prompt construction, rejection sampling, S2match
│
├── detection/ # Watermark Detection (§3.3)
│ ├── detect_from_parsed_amrs.py # Detection on pre-parsed AMRs (Algorithm 2)
│ ├── detect_end_to_end.py # End-to-end detection (parse + detect in one pass)
│ ├── detection_utils.py # Z-score computation, AUROC evaluation, ROC plotting
│ ├── parse_machine_text.py # GPU-parallel AMR parsing of machine-generated text
│ ├── parse_human_text.py # GPU-parallel AMR parsing of human text
│ └── evaluate_z_scores.py # Standalone evaluation from saved z-scores
│
├── evaluation/ # Quality & Efficiency Evaluation (§4)
│ ├── text_quality_eval.py # LLM-as-judge text quality evaluation (§4.5)
│ ├── sampling_efficiency.py # Compute sampling efficiency stats (§4.3)
│ └── paraphrase_gen.py # Claude zero-shot paraphrase attack (§4.1)
│
├── utils/ # Shared Utilities
│ ├── amr_utils.py # AMR parse / normalize / template / S2match scoring
│ ├── s2match_patch.py # Our added S2match entry point — append to amr-metric-suite (see Setup §2)
│ ├── text_utils.py # Prompt extraction and text processing
│ ├── bedrock_utils.py # AWS Bedrock API wrapper for LLM inference
│ └── load_c4_realnews_data.py # Download REALNEWS subset from C4
│
├── requirements.txt
├── LICENSE # CC-BY-NC 4.0
└── README.md
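detection/detection_utils.py is listed as handling z-score computation and AUROC evaluation. A minimal sketch of both follows, assuming detection counts how many of T parsed sentences hit the watermark signature and compares that count to a chance rate gamma with the one-proportion z-test common in LLM watermarking. Whether SWAN uses exactly this statistic is an assumption, and the function names are hypothetical.

```python
import math
from typing import Sequence


def z_score(hits: int, total: int, gamma: float) -> float:
    """One-proportion z-test: how far `hits` out of `total` sits above chance rate gamma."""
    if total <= 0:
        raise ValueError("need at least one sentence")
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must be strictly between 0 and 1")
    expected = gamma * total
    return (hits - expected) / math.sqrt(total * gamma * (1.0 - gamma))


def auroc(positive_scores: Sequence[float], negative_scores: Sequence[float]) -> float:
    """AUROC as the probability that a watermarked score beats a human one (ties count 0.5)."""
    if not positive_scores or not negative_scores:
        raise ValueError("need scores for both classes")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

For example, 10 hits out of 10 sentences at gamma = 0.5 gives z = 5 / sqrt(2.5), about 3.16. The pairwise AUROC loop is quadratic; for large evaluation sets, sklearn.metrics.roc_auc_score computes the same quantity more efficiently.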

Setup

1. Install Python dependencies

pip install -r requirements.txt
python -m spacy download en_core_web_sm

2. Install S2match

SWAN uses S2match (Opitz et al., 2020) for soft semantic similarity between AMR concepts. S2match lives in the amr-metric-suite repo and is not pip-installable. We ship a small patch file (utils/s2match_patch.py) that adds a single string-based entry point (compute_s2match_from_strings) on top of the upstream code. Append it to the upstream s2match.py:

git clone https://github.com/flipz357/amr-metric-suite.git
cat utils/s2match_patch.py >> amr-metric-suite/py3-Smatch-and-S2match/smatch/s2match.py

After this step the directory layout should be:

SWAN/
├── amr-metric-suite/
│ └── py3-Smatch-and-S2match/
│ └── smatch/
│ ├── amr_py3.py # from upstream
│ ├── helpers.py # from upstream
│ └── s2match.py # upstream + our appended compute_s2match_from_strings
├── utils/
│ ├── s2match_patch.py # the patch you appended — keep as reference
│ └── ...
└── ...

utils/amr_utils.py automatically adds amr-metric-suite/py3-Smatch-and-S2match/smatch/ to sys.path and imports s2match from there.
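If you need S2match from your own scripts outside the repo's modules, the same path setup can be reproduced as below. This is a minimal sketch; the helper name and its project_root argument are hypothetical, and amr_utils.py's actual code may differ.

```python
import os
import sys


def add_s2match_to_path(project_root: str) -> str:
    """Put the patched smatch directory first on sys.path so `import s2match` resolves to it."""
    smatch_dir = os.path.join(
        os.path.abspath(project_root),
        "amr-metric-suite", "py3-Smatch-and-S2match", "smatch",
    )
    if not os.path.isdir(smatch_dir):
        raise FileNotFoundError(f"clone amr-metric-suite first: {smatch_dir} not found")
    if smatch_dir not in sys.path:  # avoid duplicate entries on repeated calls
        sys.path.insert(0, smatch_dir)
    return smatch_dir
```

Calling add_s2match_to_path(".") from the SWAN root and then running import s2match should pick up the upstream module with the appended compute_s2match_from_strings.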

S2match also needs GloVe word vectors:

mkdir -p vectors
cd vectors
wget https://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip
cd ..
export GLOVE_VECTORS_PATH="vectors/glove.6B.100d.txt"
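S2match uses these vectors to give partial credit when two aligned AMR concepts are different but related words. The sketch below reads the GloVe text format (a word followed by its float components on each line) and computes cosine similarity. The helper names are hypothetical, and S2match's own loader, concept preprocessing, and similarity cutoff are not reproduced here.

```python
import math
from typing import Dict, List, Optional, Set


def load_glove(path: str, vocab: Optional[Set[str]] = None) -> Dict[str, List[float]]:
    """Read a GloVe .txt file; optionally keep only words in `vocab` to save memory."""
    vectors: Dict[str, List[float]] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            word, values = parts[0], parts[1:]
            if vocab is None or word in vocab:
                vectors[word] = [float(v) for v in values]
    return vectors


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0
```

With the environment variable set, load_glove(os.environ["GLOVE_VECTORS_PATH"], vocab={"boy", "child"}) loads only the two vectors needed, and cosine on them gives the kind of soft score S2match builds on.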

3. Download the amrlib parsing model

SWAN uses amrlib for AMR parsing. The parsing and generation models need to be downloaded manually following the amrlib model installation guide:

model_parse_xfm_bart_large-v0_1_0 — BART-large AMR parser (used for detection)
model_generate_t5wtense-v0_1_0 — T5 AMR-to-text generator

Download both models from the links on the amrlib models page, then extract them into amrlib's data directory and create the required symlinks:

# Find amrlib's data directory and create it if needed
AMRLIB_DATA=$(python -c "import amrlib; import os; print(os.path.join(os.path.dirname(amrlib.__file__), 'data'))")
mkdir -p $AMRLIB_DATA

# Move downloaded model archives to amrlib's data directory
mv model_parse_xfm_bart_large-v0_1_0.tar.gz $AMRLIB_DATA/
mv model_generate_t5wtense-v0_1_0.tar.gz $AMRLIB_DATA/

# Extract and symlink the parser model
cd $AMRLIB_DATA
tar xzf model_parse_xfm_bart_large-v0_1_0.tar.gz
ln -snf model_parse_xfm_bart_large-v0_1_0 model_stog

# Extract and symlink the generation model
tar xzf model_generate_t5wtense-v0_1_0.tar.gz
ln -snf model_generate_t5wtense-v0_1_0 model_gtos

Verify the parser loads correctly:

cd - # Return to the project root directory
python -c "import amrlib; amrlib.load_stog_model(); print('Parser loaded successfully')"
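If loading fails, one thing to check is that model_stog and model_gtos in amrlib's data directory resolve to the extracted model folders. A minimal sketch, with a hypothetical helper name; pass it the same directory that AMRLIB_DATA points to.

```python
import os
from typing import Dict


def check_amrlib_links(data_dir: str) -> Dict[str, str]:
    """Report the state of the model links amrlib loads by default."""
    status: Dict[str, str] = {}
    for link in ("model_stog", "model_gtos"):
        path = os.path.join(data_dir, link)
        if os.path.islink(path):
            target = os.readlink(path)
            # isdir follows the link, so a dangling link reports as broken
            status[link] = f"ok -> {target}" if os.path.isdir(path) else f"broken link -> {target}"
        elif os.path.isdir(path):
            status[link] = "ok (plain directory)"
        else:
            status[link] = "missing"
    return status
```

A "broken link" result usually means the archive was not extracted or the link name points at a different version string than the extracted folder.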

4. Download the MASSIVE-AMR corpus (optional — for rebuilding the AMR bank)

The pre-built AMR banks are included in amr_bank/banks/. If you want to rebuild them from scratch, download the MASSIVE-AMR corpus and place it in amr_bank/data/:

mkdir -p amr_bank/data
wget -O amr_bank/data/massive_amr.jsonl \...
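For orientation, the bank-building scripts are described as building a frequency distribution of templates over the corpus and then filtering templates by frequency and node count. A minimal sketch of that filter, assuming templates are already extracted as PENMAN-style strings and approximating node count by the number of "/" concept markers; the function name, thresholds, and counting heuristic are hypothetical and not taken from build_curated_amr_bank.py. The bank_size parameter plays the role of the 50, 100, 500, and 800 template sizes shipped in amr_bank/banks/.

```python
from collections import Counter
from typing import Iterable, List


def build_bank(
    templates: Iterable[str],
    bank_size: int,
    min_count: int = 2,
    min_nodes: int = 2,
    max_nodes: int = 6,
) -> List[str]:
    """Keep the `bank_size` most frequent templates whose node count is in range."""
    counts = Counter(templates)
    kept = [
        template
        for template, count in counts.most_common()  # most frequent first
        if count >= min_count and min_nodes <= template.count("/") <= max_nodes
    ]
    return kept[:bank_size]
```

Filtering on node count alongside frequency keeps templates that occur often enough to be natural but are neither trivially small nor too large to realize in one sentence; the actual cutoffs used for the paper are set in the repository's scripts.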


Notability

notability 6.0/10

New repo from Amazon Science: research code accompanying an ACL 2026 paper on semantic text watermarking, not a model release.