
pg_analytics

DuckDB-powered data lake analytics from Postgres

Notice

The paradedb/pg_analytics extension has been discontinued, and the repository was archived on Mar 19, 2025; it is now read-only. This decision was made because ParadeDB's work on Postgres analytics is now being done in our primary extension, pg_search. If you are looking for fast analytics on Postgres, we recommend you check out our paradedb/paradedb repository.

The code in this repository is no longer maintained.

Learn more.


Overview

pg_analytics (formerly named pg_lakehouse) puts DuckDB inside Postgres. With pg_analytics installed, Postgres can query foreign object stores like AWS S3 and table formats like Iceberg or Delta Lake. Queries are pushed down to DuckDB, a high-performance analytical query engine.

pg_analytics uses DuckDB v1.1.0 and is supported on Postgres 13+.

Motivation

Today, a vast amount of non-operational data (events, metrics, historical snapshots, vendor data, etc.) is ingested into data lakes like AWS S3. Querying this data by moving it into a cloud data warehouse or operating a new query engine is expensive and time-consuming. The goal of pg_analytics is to enable this data to be queried directly from Postgres. This eliminates the need for new infrastructure and data movement, avoids loss of data freshness, and spares users from learning the non-Postgres dialects of other query engines.

pg_analytics uses the foreign data wrapper (FDW) API to connect to any object store or table format and the executor hook API to push queries to DuckDB. While other FDWs like aws_s3 have existed in the Postgres extension ecosystem, these FDWs suffer from two limitations:

  1. Lack of support for most object stores and table formats
  2. Too slow over large datasets to be a viable analytical engine

pg_analytics differentiates itself by supporting a wide breadth of stores and formats and by being very fast (thanks to DuckDB).

Roadmap

  • Read support for pg_analytics
  • EXPLAIN support
  • VIEW support
  • Automatic schema detection

Object Stores

  • AWS S3
  • S3-compatible stores (MinIO, R2)
  • Google Cloud Storage
  • Azure Blob Storage
  • Azure Data Lake Storage Gen2
  • Hugging Face (.parquet, .csv, .jsonl)
  • HTTP server
  • Local file system

File/Table Formats

  • Parquet
  • CSV
  • JSON
  • Geospatial (.geojson, .xlsx)
  • Delta Lake
  • Apache Iceberg

Installation

From ParadeDB

The easiest way to use the extension is to run the ParadeDB Dockerfile:

docker run --name paradedb -e POSTGRES_PASSWORD=password paradedb/paradedb
docker exec -it paradedb psql -U postgres

This will spin up a PostgreSQL 16 instance with pg_analytics preinstalled.
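If you would rather connect from tools on your host machine instead of via docker exec, a minimal sketch of an alternative invocation is shown below. It assumes port 5432 is free on the host, and the connection string is illustrative only:

# Publish Postgres' port to the host and run the container in the background
docker run --name paradedb -e POSTGRES_PASSWORD=password -p 5432:5432 -d paradedb/paradedb

# Connect with any Postgres client from the host
psql postgres://postgres:password@localhost:5432/postgres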

From Self-Hosted PostgreSQL

Because this extension uses Postgres hooks to intercept and push queries down to DuckDB, it is very important that it is added to shared_preload_libraries inside postgresql.conf.

# Inside postgresql.conf
shared_preload_libraries = 'pg_analytics'

This ensures the best query performance from the extension.
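After restarting Postgres, a quick sanity check is to confirm the library was actually preloaded. This uses a standard Postgres command, nothing specific to pg_analytics:

-- Inside psql; the output should include pg_analytics
SHOW shared_preload_libraries;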

Linux & macOS

We provide prebuilt binaries for macOS, Debian, Ubuntu, and Red Hat Enterprise Linux for Postgres 14+. You can download the latest version for your architecture from the GitHub Releases page.

Windows

Windows is not supported. This restriction is inherited from pgrx, which does not support Windows.

Usage

The following example uses pg_analytics to query an example dataset of 3 million NYC taxi trips from January 2024, hosted in a public us-east-1 S3 bucket provided by ParadeDB.

CREATE EXTENSION pg_analytics;
CREATE FOREIGN DATA WRAPPER parquet_wrapper HANDLER parquet_fdw_handler VALIDATOR parquet_fdw_validator;

-- Provide S3 credentials
CREATE SERVER parquet_server FOREIGN DATA WRAPPER parquet_wrapper;

-- Create foreign table with auto schema creation
CREATE FOREIGN TABLE trips ()
SERVER parquet_server
OPTIONS (files 's3://paradedb-benchmarks/yellow_tripdata_2024-01.parquet');

-- Success! Now you can query the remote Parquet file like a regular Postgres table
SELECT COUNT(*) FROM trips;

  count
---------
 2964624
(1 row)
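As a follow-up, here is a sketch of a more typical analytical query against the same foreign table. The column names (passenger_count, total_amount) assume the standard NYC yellow taxi schema in the Parquet file and are illustrative only; like the COUNT(*) above, the query is pushed down to DuckDB:

-- Illustrative aggregation; column names assume the standard NYC yellow taxi schema
SELECT passenger_count, AVG(total_amount) AS avg_total
FROM trips
GROUP BY passenger_count
ORDER BY passenger_count;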

Documentation

Complete documentation for pg_analytics can be found here.

Development

Install Rust

To develop the extension, first install Rust via rustup.

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup install <version>
rustup default <version>

Note: While it is possible to install Rust via your package manager, we recommend using rustup as we've observed inconsistencies with Homebrew's Rust installation on macOS.

Install Dependencies

Before compiling the extension, you'll need to have the following dependencies installed.

# macOS
brew install make gcc pkg-config openssl

# Ubuntu
sudo apt-get install -y make gcc pkg-config libssl-dev libclang-dev

# Arch Linux
sudo pacman -S core/openssl extra/clang

Install Postgres

# macOS
brew install postgresql@17

# Ubuntu
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
sudo apt-get update && sudo apt-get install -y postgresql-17 postgresql-server-dev-17

# Arch Linux
sudo pacman -S extra/postgresql

If you are using Postgres.app to manage your macOS PostgreSQL, you'll need to add the pg_config binary to your path before continuing:

export PATH="$PATH:/Applications/Postgres.app/Contents/Versions/latest/bin"
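A quick way to confirm that pg_config now resolves to the intended installation (plain shell, nothing extension-specific):

which pg_config
pg_config --version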

Install pgrx

Then, install and initialize pgrx:

# Note: Replace --pg17 with your version of Postgres, if different (e.g. --pg16)
cargo install --locked cargo-pgrx --version 0.12.7

# macOS arm64
cargo pgrx init --pg17=/opt/homebrew/opt/postgresql@17/bin/pg_config

# macOS amd64
cargo pgrx init --pg17=/usr/local/opt/postgresql@17/bin/pg_config

# Ubuntu
cargo pgrx init --pg17=/usr/lib/postgresql/17/bin/pg_config

# Arch Linux
cargo pgrx init --pg17=/usr/bin/pg_config

If you prefer to use a different version of Postgres, update the--pg flag accordingly.

Running the Extension

First, start pgrx:

cargo pgrx run

This will launch an interactive connection to Postgres. Inside Postgres, create the extension by running:

CREATE EXTENSION pg_analytics;

You now have access to all the extension functions.
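To verify the extension is installed, you can query the standard Postgres catalog (or run \dx inside psql):

-- The extension should appear with its version
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_analytics';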

Modifying the Extension

If you make changes to the extension code, follow these steps to update it:

  1. Recompile the extension:

cargo pgrx run

  2. Recreate the extension to load the latest changes:

DROP EXTENSION pg_analytics;
CREATE EXTENSION pg_analytics;

Running Tests

We use cargo test as our runner for pg_analytics tests. Tests are conducted using testcontainers to manage testing containers like LocalStack. testcontainers will pull any Docker images that it requires to perform the test.

You also need a running Postgres instance to run the tests. The test suite will look for a connection string in the DATABASE_URL environment variable. You can set this variable manually, or use a .env file with contents like this:

DATABASE_URL=postgres://<username>@<host>:<port>/<database>
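With the connection string in place, a minimal sketch of a test run looks like the following; the URL is illustrative, and Docker must be running so that testcontainers can pull and start its images:

# Illustrative values; adjust to your local Postgres
export DATABASE_URL=postgres://postgres@localhost:5432/postgres
cargo test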

License

pg_analytics is licensed under the PostgreSQL License.

