Announcing the Databricks storage ecosystem: Governing the enterprise data estate, wherever it lives
Source: Databricks Blog
Summary
The challenge: Organizations need to keep vast amounts of data on-premises, in private clouds, and at the edge to meet strict data sovereignty and regulatory requirements, maintain low latency at the edge, or handle immense data gravity, all while still bringing modern cloud AI and governance to those environments.
What it is: The Databricks Storage Ecosystem natively connects hybrid and on-premises storage platforms to Databricks using the OpenSharing protocol. This allows organizations to establish centralized data governance and scale GenAI across their entire hybrid infrastructure.
The outcome: Using a zero-copy architecture, enterprises can run Databricks Serverless Compute, Genie, and LLMs directly on their on-premises datasets without copying a single file. This instantly turns isolated data into active, AI-ready assets for advanced use cases, such as training models on classified engineering data or analyzing network telemetry in place.
The Data That Can't Move
For years, the enterprise data strategy was simple: move everything to the cloud. Migrate the data lakes and the warehouses to the cloud, and governance follows. It was a clean story, until it wasn't.
Today, some of the world's most sophisticated enterprises are telling us clearly that they cannot, and will not, move all of their data to the cloud. Leading semiconductor manufacturers are training models on engineering-classified datasets that must never leave their premises. Global trading firms sit on massive volumes of historical tick data where the economics of cloud egress make migration impossible. Tier-1 banks have adopted "Hybrid Forever" strategies, modernizing on-premises storage while maintaining strict data sovereignty. Major pharmaceutical companies run millions of daily drug experiments against petabyte-scale on-premises data estates subject to stringent regulatory controls.
These aren't edge cases. They represent a structural shift in how enterprises think about data: from "Migrate Everything" to "Govern Everything." The drivers are real and compounding:
Data sovereignty & regulation: Financial services, healthcare, and government organizations operate under mandates (GDPR, HIPAA, NIS2, sector-specific data residency rules) that require data to remain within specific jurisdictions or air-gapped environments. For certain datasets, cloud migration is not merely impractical; it is legally prohibited.
Data gravity & costs: At petabyte and exabyte scale, the economics of cloud migration break down entirely. Egress fees, storage costs, and sheer data volume make the "move it once" model financially unsustainable. Some of the world's largest retailers are actively repatriating analytics workloads from the cloud back to on-premises infrastructure for precisely this reason.
Latency & edge workloads: Retail, manufacturing, and telco workloads require low-latency access to on-premises and edge data. Telecommunications providers ingest enormous volumes of network telemetry on-premises daily to power AI-driven network operations that cannot tolerate cloud round-trips.
AI on dark data: Vast stores of backup data, unstructured archives, and secondary datasets, representing hundreds of exabytes across the enterprise, contain immense AI value that has never been unlocked because governance didn't reach it.
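To make the data-gravity driver above concrete, here is a rough back-of-the-envelope sketch in Python. It is an illustration, not a figure from the announcement. It assumes decimal units, a dedicated 10 Gbps link running at full line rate, and a flat per-gigabyte transfer fee. The `0.05` rate is a made-up placeholder, not any provider's price, and the function names are hypothetical.

```python
PB = 1e15  # one petabyte in bytes (decimal units)
EB = 1e18  # one exabyte in bytes (decimal units)


def transfer_days(num_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to push num_bytes through a link of link_gbps gigabits/s.

    efficiency is the fraction of line rate actually achieved (1.0 = ideal).
    """
    bits = num_bytes * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86_400  # seconds per day


def egress_cost_usd(num_bytes: float, usd_per_gb: float) -> float:
    """One-time transfer fee at a flat per-gigabyte rate (rate is an assumption)."""
    return (num_bytes / 1e9) * usd_per_gb


if __name__ == "__main__":
    for label, size in (("1 PB", PB), ("1 EB", EB)):
        days = transfer_days(size, link_gbps=10)
        cost = egress_cost_usd(size, usd_per_gb=0.05)  # placeholder rate
        print(f"{label}: {days:,.1f} days at 10 Gbps, ${cost:,.0f} at $0.05/GB")
```

Under these assumptions, one petabyte takes a little over nine days of uninterrupted transfer and one exabyte roughly 25 years, before any fees are counted. Real links rarely sustain full line rate, which the `efficiency` parameter lets you model.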
The signal is unmistakable. Hundreds of customers have explicitly requested on-premises and hybrid storage connectivity to Unity Catalog. The Software-Defined Storage (SDS) market stands at hundreds of billions of dollars in 2026, and the enterprise partners who manage this estate, collectively holding more than 2 zettabytes of data under management, are building with us.
Introducing the Databricks Storage Ecosystem
Today, we are excited to announce the Databricks Software-Defined Storage (SDS) Ecosystem, a new partner category purpose-built to bring the Databricks Intelligence Platform to enterprise data wherever it lives: on-premises, in private clouds, and at the edge. If you are an enterprise running petabytes of data on these platforms today, you no longer have to choose between your existing non-cloud storage infrastructure and Databricks AI.
For too long, enterprises had to choose between the on-premises storage infrastructure they rely on and the cloud-native AI they want to build. Forcing customers to migrate massive amounts of data using complex pipelines just to unlock that intelligence is a broken model. By uniting these industry-leading partners, we are ending that compromise and delivering Databricks Intelligence directly to where the enterprise data lives. But this launch is just day one. We are building the foundation to ensure that soon, every piece of hybrid data, structured or unstructured, is instantly ready for generative AI without ever copying a byte.
- Stephen Orban, SVP, Product Partnerships & Ecosystem, Databricks
At the heart of this ecosystem is OpenSharing, an open-source protocol for secure, governed data sharing. Our storage partners are implementing OpenSharing servers to expose their data estates directly to Databricks Serverless Compute. The path is simple: the storage partner stands up an OpenSharing endpoint, you connect it to Unity Catalog, and you instantly gain secure, governed access to your on-premises data in Databricks without data migration. This integration provides a single, unified catalog across your entire hybrid environment. Customers can now use Databricks Serverless Compute, Genie, AgentBricks, and model training to query and reason over data that never leaves the premises. The result? Zero data movement, no duplication of data, and zero compliance risk.
This is not a roadmap aspiration. Customers can try these integrations today. Partners building these integrations follow the Partner Well-Architected Framework, a technical blueprint covering architecture, security, and certification criteria.
Customers want to break down data silos and unify all of their Data and AI estate, including large amounts of data that still sits on-premises. Thanks to on-premises storage partners leveraging the open-source OpenSharing protocol, customers can now seamlessly unify, govern, and analyze all of their data estate in Databricks Unity Catalog -…
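The three-step path described above (partner endpoint, catalog connection, governed access) can be pictured with a small toy model. This Python sketch is purely conceptual: `SharedTable`, `ToyCatalog`, the endpoint URL, and the table names are all hypothetical, and none of it reflects the actual OpenSharing protocol or the Unity Catalog API. It shows only the zero-copy idea, in which the catalog stores a reference plus grants while the rows stay on the partner's storage.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SharedTable:
    """Pointer to a table that stays on the storage partner's system."""
    endpoint: str  # hypothetical sharing endpoint exposed by the partner
    share: str
    table: str


@dataclass
class ToyCatalog:
    """Illustrative stand-in for a central catalog; not the Unity Catalog API."""
    tables: dict[str, SharedTable] = field(default_factory=dict)
    grants: dict[str, set[str]] = field(default_factory=dict)

    def register(self, name: str, ref: SharedTable) -> None:
        # Step 2: connect the endpoint. Only the reference is stored, no rows.
        self.tables[name] = ref
        self.grants.setdefault(name, set())

    def grant(self, name: str, principal: str) -> None:
        self.grants[name].add(principal)

    def resolve(self, name: str, principal: str) -> SharedTable:
        # Step 3: governed access. Check the grant, then hand back the pointer
        # so compute can read the data where it lives.
        if principal not in self.grants.get(name, set()):
            raise PermissionError(f"{principal} may not read {name}")
        return self.tables[name]


# Step 1 (assumed): the partner has stood up an endpoint at this made-up URL.
ref = SharedTable("https://storage.example.internal/sharing", "telemetry", "events")
catalog = ToyCatalog()
catalog.register("onprem.telemetry.events", ref)
catalog.grant("onprem.telemetry.events", "network-ops")

print(catalog.resolve("onprem.telemetry.events", "network-ops"))
```

Keeping permissions in the catalog rather than on each storage system is what lets one set of grants cover both cloud and on-premises tables in this model. A principal without a grant gets a `PermissionError` instead of a pointer.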
Excerpt shown — open the source for the full document.