databricks/zerobus-sdk
Description: Databricks's Zerobus Ingest SDKs
Language: Rust
License: Apache-2.0
Stars: 78
Forks: 18
Open issues: 40
Created: 2025-09-03T13:11:21Z
Pushed: 2026-06-09T12:16:09Z
Default branch: main
Fork: no
Archived: no
README:
Zerobus SDKs
Monorepo for Databricks Zerobus Ingest SDKs.
Disclaimer
GA: This SDK is generally available and supported for production use cases. Minor and patch version updates will not contain breaking changes. Major version updates may include breaking changes.
We are keen to hear feedback from you. Please file issues, and we will address them.
What is Zerobus?
Zerobus is a high-throughput streaming service for direct data ingestion into Databricks Delta tables, optimized for real-time data pipelines and high-volume workloads.
SDKs
| Language | Directory | Package |
|----------|-----------|---------|
| Rust | [rust/](rust/) | `databricks-zerobus-ingest-sdk` |
| Python | [python/](python/) | `databricks-zerobus-ingest-sdk` |
| Go | [go/](go/) | `github.com/databricks/zerobus-sdk/go` |
| TypeScript | [typescript/](typescript/) | `@databricks/zerobus-ingest-sdk` |
| Java | [java/](java/) | `com.databricks:zerobus-ingest-sdk` |
Platform Support
We try to provide prebuilt native binaries for the following platforms:
| Platform | Architecture |
|----------|--------------|
| Linux | x86_64 |
| Linux | aarch64 |
| Windows | x86_64 |
| macOS | x86_64 |
| macOS | aarch64 (Apple Silicon) |
> Note: We do not currently have macOS CI runners, so macOS binaries are built locally and may not be available for every SDK or release. If your platform is not supported or you encounter compatibility issues, you can [build from source](CONTRIBUTING.md) or file an issue.
Prerequisites
Before using any SDK, you need the following:
1. Workspace URL and Workspace ID
After logging into your Databricks workspace, look at the browser URL:
`https://<databricks-instance>.cloud.databricks.com/o=<workspace-id>`
- Workspace URL: The part before `/o=` (e.g., `https://dbc-a1b2c3d4-e5f6.cloud.databricks.com`)
- Workspace ID: The part after `/o=` (e.g., `1234567890123456`)
> Note: The examples above show AWS endpoints (`.cloud.databricks.com`). For Azure deployments, the workspace URL will be `https://<databricks-instance>.azuredatabricks.net`.
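The split described above can be sketched in a few lines of Rust. This is an illustration only: `split_workspace_url` is a hypothetical helper, not part of any Zerobus SDK. Accepting the `/?o=` form and trimming anything after the digits of the ID are assumptions that go beyond the text above.

```rust
/// Splits a browser URL of the form `<workspace URL>/o=<workspace ID>`
/// into `(workspace_url, workspace_id)`.
///
/// Hypothetical helper for illustration. Returns `None` when the `o=`
/// marker is missing or either part is empty.
fn split_workspace_url(browser_url: &str) -> Option<(&str, &str)> {
    // Assumption: also accept the `/?o=` query-string form.
    let (url, rest) = browser_url
        .split_once("/?o=")
        .or_else(|| browser_url.split_once("/o="))?;

    // Keep only the leading run of ASCII digits, so trailing fragments
    // such as `#...` or `&...` are not mistaken for part of the ID.
    let id_len = rest.chars().take_while(|c| c.is_ascii_digit()).count();
    let id = &rest[..id_len];

    if url.is_empty() || id.is_empty() {
        None
    } else {
        Some((url, id))
    }
}
```

Calling it with the example values from the list above returns the workspace URL and the workspace ID as two string slices borrowed from the input.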
2. Create a Delta Table
Create a table using Databricks SQL:
    CREATE TABLE <catalog_name>.default.<table_name> (
      device_name STRING,
      temp INT,
      humidity BIGINT
    ) USING DELTA;
Replace `<catalog_name>` with your catalog name (e.g., `main`) and `<table_name>` with your table name.
3. Create a Service Principal
1. Navigate to Settings > Identity and Access in your Databricks workspace
2. Click Service principals and create a new service principal
3. Generate a new secret for the service principal and save it securely
4. Grant the following permissions:
   - `USE_CATALOG` on the catalog (e.g., `main`)
   - `USE_SCHEMA` on the schema (e.g., `default`)
   - `MODIFY` and `SELECT` on the table
Grant permissions using SQL:
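    -- Grant catalog permission
    GRANT USE CATALOG ON CATALOG <catalog_name> TO `<service-principal-application-id>`;
    -- Grant schema permission
    GRANT USE SCHEMA ON SCHEMA <catalog_name>.default TO `<service-principal-application-id>`;
    -- Grant table permissions
    GRANT SELECT, MODIFY ON TABLE <catalog_name>.default.<table_name> TO `<service-principal-application-id>`;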
The service principal's Application ID is your OAuth Client ID, and the generated secret is your Client Secret.
Ingestion APIs
Pick the API that matches your data.
Standard gRPC ingestion
Supported by all SDKs. Records are sent over a gRPC stream in one of two serialization formats:
- JSON - Simple, schema-free ingestion. Pass a JSON string or native object (dict, map, etc.) and the SDK serializes it. No compilation step required. Good for getting started or dynamic schemas.
- Protocol Buffers - Strongly-typed, schema-validated ingestion. More efficient over the wire. Recommended for production workloads.
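To make the JSON option concrete, here is a minimal sketch of what one record for the example table from the prerequisites (`device_name`, `temp`, `humidity`) could look like as a JSON string. It uses only the standard library; `SensorReading`, `escape_json`, and `to_json` are hypothetical names, and no SDK call is shown because the ingestion API itself is documented in each SDK's README. In real code a JSON library such as `serde_json` would normally do the serialization.

```rust
/// Hypothetical record type mirroring the example table's columns.
struct SensorReading {
    device_name: String, // Delta STRING
    temp: i32,           // Delta INT
    humidity: i64,       // Delta BIGINT
}

/// Escapes a string so it can be placed inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl SensorReading {
    /// Renders the record as a single JSON object, one key per column.
    fn to_json(&self) -> String {
        format!(
            "{{\"device_name\":\"{}\",\"temp\":{},\"humidity\":{}}}",
            escape_json(&self.device_name),
            self.temp,
            self.humidity
        )
    }
}
```

The keys match the column names of the target table, which is what lets a JSON record line up with the Delta schema without a compilation step.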
Protocol Buffers
Use proto2 syntax with optional fields to correctly represent nullable Delta table columns.
##### Delta → Protobuf Type Mappings
| Delta Type | Proto2 Type |
|------------|-------------|
| TINYINT, BYTE, INT, SMALLINT, SHORT | int32 |
| BIGINT, LONG | int64 |
| FLOAT | float |
| DOUBLE | double |
| STRING, VARCHAR | string |
| BOOLEAN | bool |
| BINARY | bytes |
| DATE | int32 |
| TIMESTAMP, TIMESTAMP_NTZ | int64 |
| ARRAY\<type\> | repeated type |
| MAP\<key, value\> | map\<key, value\> |
| STRUCT\<fields\> | nested message |
| VARIANT | string (JSON string) |
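The scalar rows of the mapping can be expressed as a small lookup, shown here as a hedged sketch. `proto2_scalar_type` and `render_proto2_message` are hypothetical helpers written for this illustration; they are not the schema generation tool that ships with the SDKs. Numbering fields by column position is an assumption, and ARRAY, MAP, and STRUCT are deliberately left out because they need their element types.

```rust
/// Maps a scalar Delta type name to its proto2 type, following the
/// table above. Returns `None` for ARRAY, MAP, STRUCT, and unknown names.
fn proto2_scalar_type(delta_type: &str) -> Option<&'static str> {
    match delta_type.trim().to_ascii_uppercase().as_str() {
        "TINYINT" | "BYTE" | "SMALLINT" | "SHORT" | "INT" | "DATE" => Some("int32"),
        "BIGINT" | "LONG" | "TIMESTAMP" | "TIMESTAMP_NTZ" => Some("int64"),
        "FLOAT" => Some("float"),
        "DOUBLE" => Some("double"),
        "STRING" | "VARCHAR" | "VARIANT" => Some("string"),
        "BOOLEAN" => Some("bool"),
        "BINARY" => Some("bytes"),
        _ => None,
    }
}

/// Renders a proto2 message with one `optional` field per column, so
/// that nullable Delta columns stay representable.
/// Assumption: field numbers follow column order, starting at 1.
fn render_proto2_message(message: &str, columns: &[(&str, &str)]) -> Result<String, String> {
    let mut out = format!("syntax = \"proto2\";\n\nmessage {} {{\n", message);
    for (i, (name, delta_type)) in columns.iter().enumerate() {
        let proto_type = proto2_scalar_type(delta_type)
            .ok_or_else(|| format!("unsupported Delta type: {}", delta_type))?;
        out.push_str(&format!("  optional {} {} = {};\n", proto_type, name, i + 1));
    }
    out.push_str("}\n");
    Ok(out)
}
```

For the three columns of the example table (`device_name STRING`, `temp INT`, `humidity BIGINT`), this yields a message with `optional string`, `optional int32`, and `optional int64` fields.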
Schema Generation
Instead of writing .proto files by hand, each SDK ships a tool to generate protobuf schemas directly from an existing Unity Catalog table. See the individual SDK READMEs for language-specific usage.
Arrow Flight ingestion (Beta)
Supported by all SDKs starting from version 2.0.0. Currently in Beta: the API is stabilising but may still change before reaching GA.
Arrow Flight is a third record format option alongside JSON and Protocol Buffers: it sends Apache Arrow RecordBatch data directly to Zerobus over the Arrow Flight protocol, on the same gRPC connection. It is the best fit when:
- Your workload is naturally columnar or batched — analytics pipelines, gateways aggregating short windows of rows, wide/numeric schemas where row-by-row serialization adds noticeable CPU overhead.
- Your application already produces Arrow data — pyarrow, the arrow-rs crates, DataFusion, Polars, or other libraries built on Arrow.
For sparse, one-row-at-a-time traffic, JSON or Protocol Buffers over the standard SDK gRPC path are usually simpler. See each SDK's examples/arrow/ directory for usage.
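The difference between row-at-a-time and columnar batches can be illustrated without any Arrow dependency. The sketch below groups rows into column-oriented batches in plain Rust; `Row`, `ColumnBatch`, and `batch_rows` are hypothetical names, and this is not the Arrow or Zerobus API. A real pipeline would build Arrow RecordBatch values with an Arrow library instead.

```rust
/// Hypothetical row type mirroring the example table's columns.
struct Row {
    device_name: String,
    temp: i32,
    humidity: i64,
}

/// Column-oriented batch: one vector per column, all of equal length.
#[derive(Default)]
struct ColumnBatch {
    device_name: Vec<String>,
    temp: Vec<i32>,
    humidity: Vec<i64>,
}

impl ColumnBatch {
    /// Appends one row by pushing each field onto its column.
    fn push(&mut self, row: Row) {
        self.device_name.push(row.device_name);
        self.temp.push(row.temp);
        self.humidity.push(row.humidity);
    }

    /// Number of rows currently held in the batch.
    fn len(&self) -> usize {
        self.temp.len()
    }
}

/// Groups rows into batches of at most `max_rows` rows each,
/// preserving the input order.
fn batch_rows(rows: Vec<Row>, max_rows: usize) -> Vec<ColumnBatch> {
    assert!(max_rows > 0, "max_rows must be positive");
    let mut batches = Vec::new();
    let mut current = ColumnBatch::default();
    for row in rows {
        current.push(row);
        if current.len() == max_rows {
            // Hand off the full batch and start a fresh one.
            batches.push(std::mem::take(&mut current));
        }
    }
    if current.len() > 0 {
        batches.push(current);
    }
    batches
}
```

A gateway that aggregates a short window of rows would do something similar before handing the batch to the columnar path, while sparse traffic gains little from the extra buffering.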
HTTP Proxy Support
All SDKs support HTTP CONNECT proxies via environment variables, following gRPC core conventions. The first variable found (in order) is used:
| Proxy | No-proxy |
|-------|----------|
| grpc_proxy / GRPC_PROXY | no_grpc_proxy / NO_GRPC_PROXY |
| https_proxy / HTTPS_PROXY | no_proxy / NO_PROXY |
| http_proxy / HTTP_PROXY | |
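The "first variable found" rule for the proxy column can be sketched as follows. This illustrates the lookup order only and is not the SDKs' actual implementation; `select_proxy` and `PROXY_VARS` are hypothetical names. Checking the lowercase name before the uppercase one within each pair, and treating an empty value as unset, are both assumptions.

```rust
/// Proxy variables in the order of the table above.
/// Assumption: within each pair the lowercase name is checked first.
const PROXY_VARS: &[&str] = &[
    "grpc_proxy",
    "GRPC_PROXY",
    "https_proxy",
    "HTTPS_PROXY",
    "http_proxy",
    "HTTP_PROXY",
];

/// Returns the name and value of the first proxy variable that is set.
///
/// `lookup` abstracts the environment so the function can be exercised
/// without touching process state.
/// Assumption: an empty value counts as "not set".
fn select_proxy<F>(lookup: F) -> Option<(&'static str, String)>
where
    F: Fn(&str) -> Option<String>,
{
    PROXY_VARS.iter().find_map(|name| {
        lookup(name)
            .filter(|value| !value.is_empty())
            .map(|value| (*name, value))
    })
}
```

Against the real process environment it could be called as `select_proxy(|key| std::env::var(key).ok())`.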
The no_proxy value is a comma-separated list of hostnames (suffix-matched) or * to...
Excerpt shown — open the source for the full document.
Notability
notability 5.0/10: New SDK from Databricks, moderate stars