Platform Extension Framework (PXF) for Apache Cloudberry (Incubating)


apache/cloudberry-pxf



Introduction

PXF is an extensible framework that allows distributed databases such as Greenplum and Apache Cloudberry to query external data files whose metadata is not managed by the database. PXF includes built-in connectors for accessing data stored in HDFS files, Hive tables, HBase tables, JDBC-accessible databases, and more. Users can also create their own connectors to other data storage or processing engines.

This project is derived from greenplum/pxf and customized for Apache Cloudberry.

Repository Contents

  • external-table/ : Contains the Cloudberry extension implementing an External Table protocol handler
  • fdw/ : Contains the Cloudberry extension implementing a Foreign Data Wrapper (FDW) for PXF
  • server/ : Contains the server side code of PXF along with the PXF Service and all the Plugins
  • cli/ : Contains command line interface code for PXF
  • automation/ : Contains the automation and integration tests for PXF against the various datasources
  • ci/ : Contains CI/CD environment and scripts (including singlecluster Hadoop environment)
  • regression/ : Contains the end-to-end (integration) tests for PXF against the various datasources, utilizing the PostgreSQL testing framework pg_regress

PXF Development

Below are the steps to build and install PXF along with its dependencies including Cloudberry and Hadoop.

git clone https://github.com/apache/cloudberry-pxf.git

Install Dependencies

To build PXF, you must have:

  1. The GCC compiler, the make build system, the unzip package, and Maven (for running integration tests)

  2. Installed Cloudberry

    Either download and install the Cloudberry RPM, or build Cloudberry from source by following the instructions provided by the Cloudberry project.

    Assuming you have installed Cloudberry into the /usr/local/cloudberry-db directory, run its environment script:

    source /usr/local/cloudberry-db/greenplum_path.sh # For Cloudberry 2.0
    source /usr/local/cloudberry-db/cloudberry-env.sh # For Cloudberry 2.1+
  3. JDK 1.8 or JDK 11 to compile/run

    Export your JAVA_HOME:

    export JAVA_HOME=/usr/lib/jvm/java-11-openjdk
  4. Go (1.9 or later)

    You can download and install Go via the Go downloads page.

    Make sure to export your GOPATH and add go to your PATH. For example:

    export GOPATH=$HOME/go
    export PATH=$PATH:/usr/local/go/bin:$GOPATH/bin

    Once you have installed Go, you will need the ginkgo tool, which runs the Go tests. Note that the @latest form of go install used here requires Go 1.16 or later. Assuming go is on your PATH, you can run:

    go install github.com/onsi/ginkgo/ginkgo@latest
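Before building, it can help to confirm that the prerequisites listed above are actually available. The following is a minimal sketch, not part of PXF: the helper names check_tools and source_cloudberry_env are hypothetical, and the command names (gcc, make, unzip, mvn, java, go, ginkgo) are assumed to be the usual executables for the tools named in the list.

```shell
#!/bin/sh
# Report every listed command that is not on PATH.
# Returns 0 when all commands are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Source whichever Cloudberry environment script exists in the given
# installation directory (hypothetical helper; the two file names are
# the ones mentioned in the dependency list).
source_cloudberry_env() {
    cb_home=${1:-/usr/local/cloudberry-db}
    for script in cloudberry-env.sh greenplum_path.sh; do
        if [ -f "$cb_home/$script" ]; then
            . "$cb_home/$script"
            return 0
        fi
    done
    echo "no environment script found in $cb_home" >&2
    return 1
}

# Example usage (not executed here):
#   check_tools gcc make unzip mvn java go ginkgo && source_cloudberry_env
```

The sketch tries cloudberry-env.sh first and falls back to greenplum_path.sh, so the same call should work for either Cloudberry version mentioned above.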

Build PXF

PXF uses Makefiles to build its components. The PXF server component is built with Gradle, which the Makefile wraps for convenience.

cd cloudberry-pxf/
# Compile PXF
make

Install PXF

To install PXF, first make sure that the user has sufficient permissions in the $GPHOME and $PXF_HOME directories to perform the installation. It's recommended to change ownership to match the installing user. For example, when installing PXF as user gpadmin under /usr/local/cloudberry-db:

mkdir -p /usr/local/cloudberry-pxf
export PXF_HOME=/usr/local/cloudberry-pxf
export PXF_BASE=${HOME}/pxf-base
chown -R gpadmin:gpadmin "${PXF_HOME}"
make install

NOTE: If PXF_BASE is not set, it defaults to PXF_HOME, and server configurations, libraries, or other configuration files might be deleted when PXF is re-installed.
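One way to guard against this is to check the two variables before running make install. The sketch below is illustrative only: check_pxf_base is a hypothetical helper, and it compares the paths as plain strings, so differences such as a trailing slash are not detected.

```shell
#!/bin/sh
# Return non-zero when PXF_BASE is unset/empty or identical to PXF_HOME.
check_pxf_base() {
    if [ -z "${PXF_BASE:-}" ]; then
        echo "PXF_BASE is not set; it would default to PXF_HOME" >&2
        return 1
    fi
    if [ "$PXF_BASE" = "${PXF_HOME:-}" ]; then
        echo "PXF_BASE equals PXF_HOME; configuration may be lost on re-install" >&2
        return 1
    fi
    return 0
}

# Example usage (not executed here):
#   check_pxf_base && make install
```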

Run PXF

Ensure that PXF is in your PATH. This command can be added to your .bashrc:

export PATH=/usr/local/cloudberry-pxf/bin:$PATH
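If the export line ends up in a file that is sourced more than once, PATH accumulates duplicate entries. A possible alternative, shown here as a sketch with a hypothetical helper name (path_prepend), adds the directory only when it is not already present:

```shell
#!/bin/sh
# Prepend a directory to PATH only if it is not already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

path_prepend /usr/local/cloudberry-pxf/bin
```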

Then you can prepare and start up PXF by doing the following.

pxf prepare
pxf start

If ${HOME}/pxf-base does not exist, pxf prepare will create the directory for you. This command should only need to be run once.

Re-installing PXF after making changes

Note: Local development with PXF requires a running Cloudberry cluster.

Once the desired changes have been made, there are 2 options to re-install PXF:

  1. Run make -sj4 install to re-install PXF and run the unit tests.
  2. Run make -sj4 install-server to re-install only the PXF server without running the unit tests.

After PXF has been re-installed, you can restart the PXF instance using:

pxf restart
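The re-install and restart steps can be combined into a small wrapper. This is a sketch under stated assumptions: pxf_reinstall_target and pxf_reinstall are hypothetical helper names, the make targets and the pxf restart command are the ones given above, and the wrapper is expected to be run from the repository root with a Cloudberry cluster already running.

```shell
#!/bin/sh
# Print the make target for a PXF re-install.
#   full   -> install         (also runs the unit tests)
#   server -> install-server  (PXF server only, no unit tests)
pxf_reinstall_target() {
    case "${1:-full}" in
        full)   echo "install" ;;
        server) echo "install-server" ;;
        *)      echo "usage: pxf_reinstall_target [full|server]" >&2
                return 2 ;;
    esac
}

# Re-install with the chosen target, then restart PXF only if the
# build succeeded.
pxf_reinstall() {
    target=$(pxf_reinstall_target "${1:-full}") || return
    make -sj4 "$target" && pxf restart
}

# Example usage (not executed here):
#   pxf_reinstall server
```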

Development With Docker

Note: Since the Docker container will house a single-cluster Hadoop, Cloudberry, and PXF, we recommend allocating at least 4 CPUs and 6 GB of memory to Docker. These settings are available under Docker preferences.

We provide a Docker-based development environment that includes Cloudberry, Hadoop, and PXF. See automation/README.Docker.md for detailed instructions.
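To check the recommended resource allocation from the command line, something like the following sketch may be useful. The helper name docker_resources_ok is hypothetical, and 6 GB is interpreted here as 6 GiB. Docker may report slightly less memory than the configured amount, so treat a near miss as a hint rather than a hard failure.

```shell
#!/bin/sh
# Succeed when the given CPU count and memory (in bytes) meet the
# recommended minimum of 4 CPUs and 6 GiB.
docker_resources_ok() {
    cpus=$1
    mem_bytes=$2
    min_cpus=4
    min_mem_bytes=$((6 * 1024 * 1024 * 1024))
    [ "$cpus" -ge "$min_cpus" ] && [ "$mem_bytes" -ge "$min_mem_bytes" ]
}

# Example usage (requires a running Docker daemon; not executed here):
#   docker_resources_ok $(docker info --format '{{.NCPU}} {{.MemTotal}}')
```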

IDE Setup (IntelliJ)

  • Start IntelliJ. Click "Open" and select the directory to which you cloned the pxf repo.
  • Select File > Project Structure.
  • Make sure you have a JDK (version 1.8) selected.
  • In the Project Settings > Modules section, select Import Module, pick the pxf/server directory, and import it as a Gradle module. You may see an error saying that there's no JDK set for Gradle. Just cancel and retry. It goes away the second time.
  • Import a second module, giving the pxf/automation directory. Select "Import module from external model", pick Maven, then click Finish.
  • Restart IntelliJ.
  • Check that it worked by running a unit test (automation tests cannot currently be run from IntelliJ) and making sure that imports, variables, and auto-completion work in the two modules.
  • Optionally, you can replace ${PXF_TMP_DIR} with ${GPHOME}/pxf/tmp in automation/pom.xml.
  • Select Tools > Create Command-line Launcher... to enable starting IntelliJ with the idea command, e.g. cd ~/workspace/pxf && idea .

Debugging the locally running instance of PXF server using IntelliJ

  • In IntelliJ, click Edit Configuration and add a new one of type Remote.
  • Change the name to PXF Service Boot.
  • Change the port number to 2020.
  • Save the configuration.
  • Restart PXF in debug mode: PXF_DEBUG=true pxf restart
  • Debug the new configuration in IntelliJ.
  • Run a query in Cloudberry that uses PXF to debug with IntelliJ.

Contribute

See the CONTRIBUTING file for how to contribute to PXF for Apache Cloudberry.

License

Licensed under the Apache License, Version 2.0. See the LICENSE file for details.
