Snowflake-Labs/snowflake-snowpark-data-sources
forked from sfc-gh-qding/snowflake-snowpark-data-sources
Captured source
source ↗Snowflake-Labs/snowflake-snowpark-data-sources
Description: Pip-installable DB-API 2.0 wrappers for Snowpark session.read.dbapi (staging for Snowflake-Labs)
License: Apache-2.0
Stars: 0
Forks: 0
Open issues: 0
Created: 2026-06-22T05:08:45Z
Pushed: 2026-05-27T06:01:09Z
Default branch: main
Fork: yes
Parent repository: sfc-gh-qding/snowflake-snowpark-data-sources
Archived: no
README:
snowflake-snowpark-data-sources
DB-API 2.0 wrappers that let Snowpark's session.read.dbapi(...) reader ingest from non-DB-API data sources in parallel.
> Status: Experimental. Staged under a personal account pending move > to `Snowflake-Labs` after OSS review.
How it works
Snowpark's `session.read.dbapi()` can ingest from any PEP 249 compliant Python DB-API. Many time-series, graph, and document databases ship Python clients that are *not* DB-API compliant. Each subpackage here is a thin wrapper that exposes connect(), Connection, Cursor, and description in the shape Snowpark expects, so you get parallel, partition-aware ingestion into Snowflake without writing a custom reader.
Install
The package is namespaced under snowflake.* and ships each source behind an optional dependency. Install only the source you need:
# from VCS during the staging phase pip install "snowflake-snowpark-data-sources[influxdb_v1] @ git+https://github.com/sfc-gh-qding/snowflake-snowpark-data-sources"
Sources
| Source | Extra | Import | Status | |---|---|---|---| | InfluxDB v1 | influxdb_v1 | from snowflake.snowpark_data_sources import influxdb | Experimental |
See [examples/influxdb/](examples/influxdb/) for a runnable script.
Quick start (InfluxDB)
from snowflake.snowpark import Session
from snowflake.snowpark_data_sources import influxdb as idb
def create_influx_connection():
return idb.connect(host="...", port=8086, username="...", password="...", database="my_db")
session = Session.builder.configs({...}).create()
schema = idb.get_influxdb_schema(create_influx_connection, "my_db.autogen.cpu_usage")
df = session.read.dbapi(
create_connection=create_influx_connection,
table="my_db.autogen.cpu_usage",
custom_schema=schema,
predicates=[
"time '2025-09-15T22:40:49Z'",
],
max_workers=4,
fetch_size=50000,
)
df.write.save_as_table("influx_cpu_usage", mode="overwrite")Writing a new wrapper
To add support for another source, see [docs/how-to-write-a-wrapper.md](docs/how-to-write-a-wrapper.md) for the minimal DB-API surface area Snowpark needs and the four common gotchas.
Background
The pattern is documented in the Snowflake Data Engineering session DE226: Connect InfluxDB to Snowflake -- How Capella Space Uses Snowpark Data Source APIs.
License
Apache License 2.0 -- see [LICENSE](LICENSE).
Contributing
Contributions require signing the Snowflake CLA. Open issues and PRs against this repo; the cla-bot will guide you through signing on your first PR.
Excerpt shown — open the source for the full document.
Notability
notability 1.0/10Routine fork by Snowflake