
pg_duckdb logo

The 0.3.0 release is here 🎉 Please try it out!

pg_duckdb: Official Postgres extension for DuckDB

pg_duckdb is a Postgres extension that embeds DuckDB's columnar-vectorized analytics engine and features into Postgres. We recommend using pg_duckdb to build high performance analytics and data-intensive applications.

pg_duckdb was developed in collaboration with our partners, Hydra and MotherDuck.

Try It Out

An easy way to try pg_duckdb is with the Hydra Python package. Try it now locally or deploy to the cloud:

pip install hydra-cli
hydra

Features

See our official documentation for further details.

  • SELECT queries executed by the DuckDB engine can directly read Postgres tables. (If you only query Postgres tables, you need to run SET duckdb.force_execution TO true; see the IMPORTANT section below for details.)
    • Able to read data types that exist in both Postgres and DuckDB. The following data types are supported: numeric, character, binary, date/time, boolean, uuid, json, domain, and arrays.
    • If DuckDB cannot support the query for any reason, execution falls back to Postgres.
  • Read and Write support for object storage (AWS S3, Azure, Cloudflare R2, or Google GCS):
    • Read parquet, CSV and JSON files:
      • SELECT * FROM read_parquet('s3://bucket/file.parquet')
      • SELECT r['id'], r['name'] FROM read_csv('s3://bucket/file.csv') r
      • SELECT count(*) FROM read_json('s3://bucket/file.json')
      • You can pass globs and arrays to these functions, just like in DuckDB
    • Enable the DuckDB Iceberg extension using SELECT duckdb.install_extension('iceberg') and read Iceberg files with iceberg_scan.
    • Enable the DuckDB Delta extension using SELECT duckdb.install_extension('delta') and read Delta files with delta_scan.
    • Write a query — or an entire table — to parquet in object storage.
      • COPY (SELECT foo, bar FROM baz) TO 's3://...'

      • COPY table TO 's3://...'

      • Read and write to Parquet format in a single query

         COPY (
           SELECT count(*), r['name']
           FROM read_parquet('s3://bucket/file.parquet') r
           GROUP BY name
           ORDER BY count DESC
         ) TO 's3://bucket/results.parquet';
  • Read and Write support for data stored in MotherDuck
  • Query and JOIN data in object storage/MotherDuck with Postgres tables, views, and materialized views.
  • Create temporary tables in DuckDB's columnar storage format using CREATE TEMP TABLE ... USING duckdb.
  • Install DuckDB extensions using SELECT duckdb.install_extension('extension_name');
  • Toggle DuckDB execution on/off with a setting:
    • SET duckdb.force_execution = true|false
  • Cache remote objects locally for faster execution using SELECT duckdb.cache('path', 'type'), where
    • 'path' is an HTTPFS/S3/GCS/R2 remote object
    • 'type' specifies the remote object type: 'parquet' or 'csv' (a combined example follows this list)
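As a quick sketch of how these pieces combine (the bucket path, file name, and the users table below are hypothetical), you can cache a remote Parquet file and then join it against a regular Postgres table:

-- Cache the remote object locally for faster repeated scans
SELECT duckdb.cache('s3://my-bucket/events.parquet', 'parquet');

-- Join the object-storage data with a plain Postgres table (`users`)
SELECT u.name, count(*) AS event_count
FROM read_parquet('s3://my-bucket/events.parquet') r
JOIN users u ON u.id = r['user_id']
GROUP BY u.name
ORDER BY event_count DESC;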

Installation

Docker

Docker images are available on Dockerhub and are based on the official Postgres image. Usage is the same as for the Postgres image. For example, you can run the image directly:

docker run -d -e POSTGRES_PASSWORD=duckdb pgduckdb/pgduckdb:16-main

And with MotherDuck, you only need a MotherDuck access token and then it is as simple as:

$ export MOTHERDUCK_TOKEN=<your personal MD token>
$ docker run -d -e POSTGRES_PASSWORD=duckdb -e MOTHERDUCK_TOKEN pgduckdb/pgduckdb:16-main

Or you can use the docker compose in this repo:

git clone https://github.com/duckdb/pg_duckdb && cd pg_duckdb && docker compose up -d

Once started, connect to the database using psql:

psql postgres://postgres:duckdb@127.0.0.1:5432/postgres
# Or if using docker compose
docker compose exec db psql

For other uses, see our Docker-specific README.

pgxman (apt)

Pre-built apt binaries are available via pgxman. After installation, you will need to add pg_duckdb to shared_preload_libraries and create the extension.

pgxman install pg_duckdb

Note: due to the use of shared_preload_libraries, pgxman's container support is not currently compatible with pg_duckdb.

Compile from source

To build pg_duckdb, you need a Postgres installation with development headers and a standard C/C++ build toolchain.

To build and install, run:

make install

Add pg_duckdb to the shared_preload_libraries in your postgresql.conf file:

shared_preload_libraries = 'pg_duckdb'

Next, create the pg_duckdb extension:

CREATE EXTENSION pg_duckdb;

IMPORTANT: DuckDB execution is usually enabled automatically when needed. It's enabled whenever you use DuckDB functions (such as read_csv), when you query DuckDB tables, and when running COPY table TO 's3://...'. However, if you want queries that only touch Postgres tables to use DuckDB execution, you need to run SET duckdb.force_execution TO true. This feature is opt-in to avoid breaking existing queries. To avoid doing that for every session, you can configure it for a certain user with ALTER USER my_analytics_user SET duckdb.force_execution TO true.
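For convenience, here are those two statements as runnable SQL:

-- Enable DuckDB execution for Postgres-only queries in the current session
SET duckdb.force_execution TO true;

-- Or enable it permanently for a specific user
ALTER USER my_analytics_user SET duckdb.force_execution TO true;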

Getting Started

See ourofficial documentation for more usage information.

pg_duckdb relies on DuckDB's vectorized execution engine to read and write data in object storage buckets (AWS S3, Azure, Cloudflare R2, or Google GCS) and/or MotherDuck. The following two sections describe how to get started with these destinations.

Object storage bucket (AWS S3, Azure, Cloudflare R2, or Google GCS)

Querying data stored in Parquet, CSV, JSON, Iceberg and Delta format can be done with read_parquet, read_csv, read_json, iceberg_scan and delta_scan respectively.
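For example, once the corresponding extensions are installed (see Features above) and a credential has been configured (step 1 below), Iceberg and Delta tables can be scanned much like Parquet files; the bucket paths here are hypothetical:

SELECT count(*) FROM iceberg_scan('s3://bucket/iceberg_table');
SELECT count(*) FROM delta_scan('s3://bucket/delta_table');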

  1. Add a credential to enable DuckDB's httpfs support.

    SELECT duckdb.create_simple_secret(
        type          := 'S3',            -- Type: one of (S3, GCS, R2)
        key_id        := 'access_key_id',
        secret        := 'xxx',
        session_token := 'yyy',           -- (optional)
        region        := 'us-east-1',     -- (optional)
        url_style     := 'xxx',           -- (optional)
        provider      := 'xxx',           -- (optional)
        endpoint      := 'xxx'            -- (optional)
    )
  2. Copy data directly to your bucket - no ETL pipeline!

    COPY (
        SELECT user_id, item_id, price, purchased_at
        FROM purchases
    ) TO 's3://your-bucket/purchases.parquet';
  3. Perform analytics on your data.

    SELECT SUM(r['price']) AS total, r['item_id']
    FROM read_parquet('s3://your-bucket/purchases.parquet') r
    GROUP BY item_id
    ORDER BY total DESC
    LIMIT 100;

Note: for Azure, we provide a dedicated function:

SELECT duckdb.create_azure_secret('<connection string>');

Note: writes to Azure are not yet supported; please see the current discussion for more information.

Connect with MotherDuck

pg_duckdb also integrates with MotherDuck. To enable this support, you first need to generate an access token. Then you can enable it by simply using the enable_motherduck convenience method:

-- If not provided, the token will be read from the `motherduck_token` environment variable
-- If not provided, the default MD database name is `my_db`
CALL duckdb.enable_motherduck('<optional token>', '<optional MD database name>');

Read more here about MotherDuck integration.

You can now create tables in the MotherDuck database by using the duckdb Table Access Method like this:

CREATE TABLE orders (id bigint, item text, price NUMERIC(10, 2)) USING duckdb;
CREATE TABLE users_md_copy USING duckdb AS SELECT * FROM users;

Any tables that you already had in MotherDuck are automatically available in Postgres. Since DuckDB and MotherDuck allow accessing multiple databases from a single connection and Postgres does not, we map database+schema in DuckDB to a schema name in Postgres.

This is done in the following way:

  1. Each schema in your default MotherDuck database (see above on how to specify which database is the default) is simply merged with the Postgres schema of the same name.
  2. Except for the main DuckDB schema in your default database, which is merged with the Postgres public schema.
  3. Tables in other databases are put into dedicated DuckDB-only schemas. These schemas are of the form ddb$<duckdb_db_name>$<duckdb_schema_name> (including the literal $ characters).
  4. Except for the main schema in those other databases. That schema should be accessed using the shorter name ddb$<db_name> instead.

An example of each of these cases is shown below:

INSERT INTO my_table VALUES (1, 'abc');                       -- inserts into my_db.main.my_table
INSERT INTO your_schema.tab1 VALUES (1, 'abc');               -- inserts into my_db.your_schema.tab1
SELECT COUNT(*) FROM ddb$my_shared_db.aggregated_order_data;  -- reads from my_shared_db.main.aggregated_order_data
SELECT COUNT(*) FROM ddb$sample_data$hn.hacker_news;          -- reads from sample_data.hn.hacker_news

Roadmap

Please see the project milestones for upcoming planned tasks and features.

Contributing

pg_duckdb was developed in collaboration with our partners, Hydra and MotherDuck. We look forward to their continued contributions and leadership.

Hydra is a Y Combinator-backed database company, focused on DuckDB-Powered Postgres for app developers.

MotherDuck is the cloud-based data warehouse that extends the power of DuckDB.

We welcome all contributions, big and small.

