SAM 3.1: Faster and More Accessible Real-Time Video Detection and Tracking With Multiplexing and Global Reasoning
Computer Vision • March 27, 2026 • 15 minute read
Update March 27, 2026:
We’ve seen incredible adoption of SAM 3 over the last few months, and during that time, we’ve been working behind the scenes on updates to improve video processing efficiency. Today, we’re pleased to introduce SAM 3.1. As a drop-in replacement for SAM 3, our updated model delivers a significant boost in video processing efficiency by introducing object multiplexing, which allows the model to track up to 16 objects in a single forward pass. This innovation doubles the processing speed for videos with a medium number of objects, increasing throughput from 16 to 32 frames per second on a single H100 GPU. As a result, SAM 3.1 enables real-time object tracking in complex videos while reducing overall GPU resource requirements, making high-performance applications feasible on smaller, more accessible hardware.
This improvement comes from a shift in how the model handles multiple objects. Previously, each object required its own dedicated pass, but with multiplexing, SAM 3.1 processes all tracked objects together, eliminating redundant computation and memory bottlenecks. This global reasoning approach streamlines performance and enhances accuracy in crowded scenes. We encourage the community to download the SAM 3.1 model checkpoint, explore the updates to the SAM 3 codebase and research paper, and test drive the updated model on the Segment Anything Playground.
SAM 3.1 Model Checkpoint
SAM 3 Codebase
SAM 3 Research Paper
Explore the Playground
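To make the multiplexing arithmetic in the update above concrete, here is a minimal sketch that counts tracker forward passes per frame with and without multiplexing. It is not taken from the SAM 3 codebase: the function names (`passes_per_frame`, `multiplex`, `frame_budget_ms`) are hypothetical, and the only figures from the post are the 16-object limit per pass and the 16 and 32 frames-per-second throughput numbers. The pass count is only a proxy for cost, since real throughput also depends on work that is shared across objects.

```python
import math

# From the post: SAM 3.1 can track up to 16 objects in a single forward pass.
MAX_OBJECTS_PER_PASS = 16


def passes_per_frame(num_objects: int, objects_per_pass: int = 1) -> int:
    """Forward passes needed for one frame (hypothetical helper).

    objects_per_pass=1 models the earlier one-pass-per-object behavior.
    """
    if num_objects < 0 or objects_per_pass < 1:
        raise ValueError("num_objects must be >= 0 and objects_per_pass >= 1")
    return math.ceil(num_objects / objects_per_pass)


def multiplex(object_ids: list[int],
              bucket_size: int = MAX_OBJECTS_PER_PASS) -> list[list[int]]:
    """Group object ids into buckets that would each share one forward pass."""
    return [object_ids[i:i + bucket_size]
            for i in range(0, len(object_ids), bucket_size)]


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given throughput."""
    return 1000.0 / fps


tracked = list(range(12))  # 12 tracked objects, an illustrative count
print(passes_per_frame(len(tracked)))                        # 12 passes, one per object
print(passes_per_frame(len(tracked), MAX_OBJECTS_PER_PASS))  # 1 multiplexed pass
print(frame_budget_ms(16), frame_budget_ms(32))              # 62.5 31.25
```

Under these assumptions, a scene with 12 tracked objects drops from 12 passes to one, and the per-frame time budget implied by the reported throughput shrinks from 62.5 ms to 31.25 ms.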
Introducing Meta Segment Anything Model 3 and Segment Anything Playground
Takeaways:
We’re announcing Meta Segment Anything Model 3 (SAM 3), a unified model for detection, segmentation, and tracking of objects in images and video using text, exemplar, and visual prompts.
As part of this release, we’re sharing SAM 3 model checkpoints, evaluation datasets, and fine-tuning code.
We’re also introducing Segment Anything Playground, a new platform that makes it easy for anyone to understand the capabilities of SAM and experiment with cutting-edge AI models for creative media modification.
In Edits, Instagram’s video creation app, SAM 3 will soon enable new effects that creators can apply to specific people or objects in their videos. New creation experiences enabled by SAM 3 will also be coming to Vibes on the Meta AI app and meta.ai on the web.
Separately, we’re sharing SAM 3D, a suite of open source models, code, and data for 3D object and human reconstruction from a single image, setting a new standard for grounded 3D reconstruction in physical world scenarios. Learn more by reading the SAM 3D blog post.
SAM 3 and SAM 3D are powering Facebook Marketplace’s new View in Room feature, helping people visualize the style and fit of home decor items, like a lamp or a table, in their spaces before purchasing.
Together with our partners at Conservation X Labs and Osa Conservation, we’re also launching a first-of-its-kind, publicly available video dataset for wildlife monitoring using SAM 3.
We’re unveiling the next generation of the Segment Anything collection of models, advancing image and video understanding. Segment Anything Model 3 (SAM 3) introduces some of our most highly requested features, like text and exemplar prompts, enabling detection, segmentation, and tracking of any visual concept across images and video.
We also want to make it easier for more people to use our models. As part of this release, we’re debuting the Segment Anything Playground, the simplest way for anyone to experiment with applying our state-of-the-art models to media modification.
Today, we’re releasing the SAM 3 model weights, a demo on Segment Anything Playground, and a research paper that details how we built SAM 3. Additionally, we’re sharing the Segment Anything with Concepts (SA-Co) evaluation dataset to serve as a new benchmark for the community. Separately, we’re sharing SAM 3D, which includes a model for object and scene reconstruction and another for human pose and shape estimation. More information about this release can be found in our SAM 3D blog post.
At Meta, we’re using these advancements to help build the next generation of creative media tools. SAM 3 and SAM 3D are being used to enable the new View in Room feature on Facebook Marketplace, helping people visualize the style and fit of home decor items, like a lamp or a table, in their spaces before purchasing. New creation experiences enabled by SAM 3 will be coming to Vibes on the Meta AI app and meta.ai on the web, where people can use AI visual creation tools and remix existing AI-generated videos. We’ll also soon be introducing new effects on our Edits app that use SAM 3. Creators can apply dynamic effects to people or objects in their videos, simplifying a complex editing workflow to just one tap.
Introducing Meta Segment Anything Model 3
Linking language to specific visual elements in images or videos is a major challenge in computer vision. Traditional models often focus on object segmentation with a fixed set of text labels, restricting their ability to address the full spectrum of user requests, which frequently involve segmenting concepts not present in predefined lists. This means that existing models can segment frequent concepts like “person,” but struggle with more nuanced concepts like “the striped red umbrella”.
SAM 3 overcomes these limitations by introducing the promptable concept segmentation capability: finding and segmenting all instances of a concept defined by a text or exemplar prompt. SAM 3 accepts text prompts — open-vocabulary short noun phrases — and image exemplar prompts, eliminating the constraints of fixed label sets. To assess large-vocabulary detection and segmentation performance, we created the Segment Anything with Concepts (SA-Co) benchmark for promptable concept segmentation in images and videos that challenges models to recognize a much larger vocabulary of concepts compared to prior benchmarks. As part of this release, we’re making SA-Co publicly available to support reproducibility and further innovation in open-ended visual segmentation.
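The contrast between a fixed label set and promptable concept segmentation can be illustrated with a toy sketch. No model is involved and nothing here comes from the SAM 3 API: `Instance`, `FIXED_LABELS`, `closed_vocab_segment`, and `concept_segment` are hypothetical names, and hand-written attribute sets stand in for what a real model would infer from pixels. The point is only the contract: a short noun phrase goes in, and every matching instance comes out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Toy stand-in for one object in a scene (hypothetical type)."""
    instance_id: int
    attributes: frozenset  # hand-written descriptors replacing visual content


# A predefined label list, as used by the fixed-vocabulary baseline.
FIXED_LABELS = {"person", "umbrella", "car"}


def closed_vocab_segment(scene, label):
    """Fixed-label baseline: only labels from the predefined list are accepted."""
    if label not in FIXED_LABELS:
        return []  # a concept outside the list cannot be expressed at all
    return [inst.instance_id for inst in scene if label in inst.attributes]


def concept_segment(scene, noun_phrase):
    """Toy open-vocabulary matcher: return all instances matching every word."""
    words = frozenset(noun_phrase.lower().split()) - {"the", "a", "an"}
    return [inst.instance_id for inst in scene if words <= inst.attributes]


scene = [
    Instance(0, frozenset({"person"})),
    Instance(1, frozenset({"umbrella", "red", "striped"})),
    Instance(2, frozenset({"umbrella", "blue"})),
    Instance(3, frozenset({"umbrella", "red", "striped"})),
]

print(closed_vocab_segment(scene, "umbrella"))                  # [1, 2, 3]
print(closed_vocab_segment(scene, "the striped red umbrella"))  # []
print(concept_segment(scene, "the striped red umbrella"))       # [1, 3]
```

The fixed-label baseline can find umbrellas in general but has no way to express the more specific phrase, while the prompt-based matcher returns both striped red umbrellas, mirroring the "all instances of a concept" behavior described above.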
SAM 3 supports a variety of prompt...