duckdb/duckdb-python
DuckDB.org | User Guide (Python) - API Docs (Python)
- Simple: DuckDB is easy to install and deploy. It has zero external dependencies and runs in-process in its host application or as a single binary.
- Portable: DuckDB runs on Linux, macOS, Windows, Android, iOS and all popular hardware architectures. It has idiomatic client APIs for major programming languages.
- Feature-rich: DuckDB offers a rich SQL dialect. It can read and write file formats such as CSV, Parquet, and JSON, to and from the local file system and remote endpoints such as S3 buckets.
- Fast: DuckDB runs analytical queries at blazing speed thanks to its columnar engine, which supports parallel execution and can process larger-than-memory workloads.
- Extensible: DuckDB is extensible by third-party features such as new data types, functions, file formats and new SQL syntax. User contributions are available as community extensions.
- Free: DuckDB and its core extensions are open-source under the permissive MIT License. The intellectual property of the project is held by the DuckDB Foundation.
Install the latest release of DuckDB directly from PyPI:
pip install duckdb
Install with all optional dependencies:
pip install 'duckdb[all]'
When you clone the repo or your fork, make sure you initialize the duckdb submodule:
git clone --recurse-submodules <repo>
... or, if you already have the repo locally:
git clone <your-repo>
cd <your-repo>
git submodule update --init --recursive
If you'll be switching between branches that have the submodule set to different refs, then make your life easier and add the git hooks in the .githooks directory to your local config:
git config --local core.hooksPath .githooks/
It's good to be aware of the following when creating an editable install:
- uv sync or uv run [tool] create editable installs by default; however, this won't work the way you expect. We have configured the project so that scikit-build-core will use a persistent build-dir, but since the build itself happens in an isolated, ephemeral environment, cmake's paths will point to non-existing directories. CMake itself will be missing.
- You should install all development dependencies, and then build the project without build isolation, in two separate steps. After this you can happily keep building and running, as long as you don't forget to pass in the --no-build-isolation flag.
# install all dev dependencies without building the project (needed once)
uv sync -p 3.9 --no-install-project
# build and install without build isolation
uv sync --no-build-isolation
If you're using an IDE then life is a little simpler. You install build dependencies and the project in the two steps outlined above, and from that point on you can rely on e.g. CLion's cmake capabilities to do incremental compilation and editable rebuilds. This will skip scikit-build-core's build backend and all of uv's dependency management, so for "real" builds you should revert to the CLI. However, this should work fine for coding and debugging.
To start from a clean slate, clear uv's cache and remove the build artifacts:
uv cache clean
rm -rf build .venv uv.lock
To build a wheel and sdist for your system and the default Python version:
uv build
To build a wheel for a different Python version:
# E.g. for Python 3.9
uv build -p 3.9
Run all pytests:
uv run --no-build-isolation pytest ./tests --verbose
Exclude the tests/slow directory:
uv run --no-build-isolation pytest ./tests --verbose --ignore=./tests/slow
Run with coverage (during development you probably want to specify which tests to run):
COVERAGE=1 uv run --no-build-isolation coverage run -m pytest ./tests --verbose
The COVERAGE env var will compile the extension with --coverage, allowing us to collect coverage stats of C++ code as well as Python code.
Check coverage for Python code:
uvx coverage html -d htmlcov-python
uvx coverage report --format=markdown
Check coverage for C++ code (note: this will clutter your project dir with html files, consider saving them in some other place):
uvx gcovr \
  --gcov-ignore-errors all \
  --root "$PWD" \
  --filter "${PWD}/src/duckdb_py" \
  --exclude '.*/\.cache/.*' \
  --gcov-exclude '.*/\.cache/.*' \
  --gcov-exclude '.*/external/.*' \
  --gcov-exclude '.*/site-packages/.*' \
  --exclude-unreachable-branches \
  --exclude-throw-branches \
  --html --html-details -o coverage-cpp.html \
  build/coverage/src/duckdb_py \
  --print-summary
- We're not running any mypy typechecking tests at the moment
- We're not running any ruff / linting / formatting at the moment
You can run cibuildwheel locally for Linux. E.g. limited to Python 3.9:
CIBW_BUILD='cp39-*' uvx cibuildwheel --platform linux .
- Follow the Google Python style guide
- See the section on Comments and Docstrings
This codebase is developed with the following tools:
- Astral UV - for dependency management across all platforms we provide wheels for, and for Python environment management. It will be hard to work on this codebase without having UV installed.
- Scikit-build-core - the build backend for building the extension. In the background, scikit-build-core uses cmake and ninja for compilation.
- pybind11 - a bridge between C++ and Python.
- CMake - the build system for both DuckDB itself and the DuckDB Python module.
- Cibuildwheel
- Checkout main
- Identify the merge commits that brought in tags to main:
git log --graph --oneline --decorate main --simplify-by-decoration
- Get the log of commits
git log --oneline 71c5c07cdd..c9254ecff2 -- tools/pythonpkg/
- Checkout v1.3-ossivalis
- Get the log of commits
git log --oneline v1.3.0..v1.3.1 -- tools/pythonpkg/
git diff --name-status 71c5c07cdd c9254ecff2 -- tools/pythonpkg/
git log --oneline 71c5c07cdd..c9254ecff2 -- tools/pythonpkg/
git diff --name-status <HASH_A> <HASH_B> -- tools/pythonpkg/
The DuckDB Python package versioning and release scheme follows that of DuckDB itself. This means that a X.Y.Z[.postN] release of the Python package ships the DuckDB stable release X.Y.Z. The optional .postN releases ship the same stable release of DuckDB as their predecessors plus Python package-specific fixes and / or features.
Types | DuckDB Version | Resulting Python Extension Version |
---|---|---|
Stable release: DuckDB stable release | 1.3.1 | 1.3.1 |
Stable post release: DuckDB stable release + Python fixes and features | 1.3.1 | 1.3.1.postX |
Nightly micro: DuckDB next micro nightly + Python next micro nightly | 1.3.2.devM | 1.3.2.devN |
Nightly minor: DuckDB next minor nightly + Python next minor nightly | 1.4.0.devM | 1.4.0.devN |
Note that we do not ship nightly post releases (e.g. we don't ship 1.3.1.post2.dev3).
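The version shapes in the table above can be illustrated with a small parser. This is only a sketch: parse_package_version and PackageVersion are hypothetical names that are not part of the duckdb package, and the regular expression encodes nothing beyond the X.Y.Z, X.Y.Z.postN and X.Y.Z.devN forms listed here. Because nightly post releases are not shipped, a combined .postN.devM suffix is rejected.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical helper for illustration only; it is not part of the duckdb package.
# It accepts X.Y.Z, X.Y.Z.postN and X.Y.Z.devN, and nothing else.
_VERSION_RE = re.compile(
    r"^(?P<base>\d+\.\d+\.\d+)"
    r"(?:\.post(?P<post>\d+)|\.dev(?P<dev>\d+))?$"
)


class PackageVersion(NamedTuple):
    base: str            # the DuckDB X.Y.Z this package version is built around
    post: Optional[int]  # N of .postN, if present
    dev: Optional[int]   # N of .devN, if present

    @property
    def kind(self) -> str:
        if self.dev is not None:
            return "nightly"
        if self.post is not None:
            return "stable post release"
        return "stable release"


def parse_package_version(version: str) -> PackageVersion:
    """Split a Python package version into its base and optional suffix."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version shape: {version!r}")
    post, dev = match.group("post"), match.group("dev")
    return PackageVersion(
        base=match.group("base"),
        post=int(post) if post is not None else None,
        dev=int(dev) if dev is not None else None,
    )
```

For a stable or post release, base is the DuckDB stable release that the package ships; for a nightly it is the upcoming release the nightly is building towards.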
We cut releases as follows:
Type | Tag | How |
---|---|---|
Stable minor release | vX.Y.0 | Adding a tag on main |
Stable micro release | vX.Y.Z | Adding a tag on a minor release branch (e.g. v1.3-ossivalis) |
Stable post release | vX.Y.Z-postN | Adding a tag on a post release branch (e.g. v1.3.1-post) |
Nightly micro | not tagged | Combining HEAD of the micro release branches of DuckDB and the Python package |
Nightly minor | not tagged | Combining HEAD of the minor release branches of DuckDB and the Python package |
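As a rough illustration of the tag conventions in the table above, the sketch below maps a tag name to its release type. classify_release_tag is a hypothetical helper, not something the repository provides, and it assumes that tags follow the vX.Y.0, vX.Y.Z and vX.Y.Z-postN patterns exactly as written in the table.

```python
import re

# Hypothetical helper for illustration; the tag patterns are taken from the
# table above and nothing else is assumed about the release workflow.
_TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)(?:-post(\d+))?$")


def classify_release_tag(tag: str) -> str:
    """Return the release type implied by a tag name."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a release tag: {tag!r}")
    micro = int(match.group(3))
    post = match.group(4)
    if post is not None:
        return "stable post release"   # tagged on a post release branch
    if micro == 0:
        return "stable minor release"  # tagged on main
    return "stable micro release"      # tagged on a minor release branch
```

Nightlies are not tagged, so they never reach this function.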
We cut a new stable minor release with the following steps:
- Create a PR on main to pin the DuckDB submodule to the tag of its current release.
- Iff all tests pass in CI, merge the PR.
- Manually start the release workflow with the hash of this commit, and the tag name.
- Iff all goes well, create a new PR to let the submodule track DuckDB main.
We cut a new stable micro release with the following steps:
- Create a PR on the minor release branch to pin the DuckDB submodule to the tag of its current release.
- Iff all tests pass in CI, merge the PR.
- Manually start the release workflow with the hash of this commit, and the tag name.
- Iff all goes well, create a new PR to let the submodule track DuckDB's minor release branch.
We cut a new stable post release with the following steps:
- Create a PR on the post release branch to pin the DuckDB submodule to the tag of its current release.
- Iff all tests pass in CI, merge the PR.
- Manually start the release workflow with the hash of this commit, and the tag name.
- Iff all goes well, create a new PR to let the submodule track DuckDB's minor release branch.
The package uses setuptools_scm with scikit-build for automatic version determination, and implements a custom versioning scheme.
pyproject.toml configuration:
[tool.scikit-build]
metadata.version.provider = "scikit_build_core.metadata.setuptools_scm"

[tool.setuptools_scm]
version_scheme = "duckdb_packaging._setuptools_scm_version:version_scheme"
Environment variables:
- MAIN_BRANCH_VERSIONING=0: Use release branch versioning (patch increments)
- MAIN_BRANCH_VERSIONING=1: Use main branch versioning (minor increments)
- OVERRIDE_GIT_DESCRIBE: Override version detection
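To make the difference between patch and minor increments concrete, here is a minimal sketch of how MAIN_BRANCH_VERSIONING might select the next development base version. It is not the project's actual version_scheme implementation: next_dev_base is a hypothetical function, the behavior when the variable is unset is not stated above (so this sketch refuses to guess), and OVERRIDE_GIT_DESCRIBE is not modelled.

```python
import os
from typing import Mapping, Optional


def next_dev_base(last_release: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the X.Y.Z that the next .devN versions would count towards.

    Hypothetical illustration of the two modes described above; the real
    logic lives in the project's custom setuptools_scm version scheme.
    """
    env = os.environ if env is None else env
    major, minor, micro = (int(part) for part in last_release.split("."))

    mode = env.get("MAIN_BRANCH_VERSIONING")
    if mode == "1":
        # Main branch versioning: minor increment, micro resets to 0.
        return f"{major}.{minor + 1}.0"
    if mode == "0":
        # Release branch versioning: patch increment.
        return f"{major}.{minor}.{micro + 1}"
    # The default is not documented here, so this sketch does not pick one.
    raise ValueError("MAIN_BRANCH_VERSIONING must be set to '0' or '1'")
```

With 1.3.1 as the last release, this yields 1.4.0 in main branch mode and 1.3.2 in release branch mode, which lines up with the nightly minor and nightly micro rows in the versioning table.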