- Notifications
You must be signed in to change notification settings - Fork1
An example Flight SQL Server implementation - with DuckDB and SQLite back-ends.
License
ClearTax/flight-duckdb-server
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
This repo demonstrates how to build an Apache Arrow Flight SQL server implementation using DuckDB or SQLite as a backend database.
It enables authentication via middleware and allows for encrypted connections to the database via TLS.
For more information about Apache Arrow Flight SQL - please see thisarticle.
Open a terminal, then pull and run the published Docker image which has everything setup (change: "--detach" to "--interactive" if you wish to see the stdout on your screen) - with command:
docker run --name flight-sql \ --detach \ --rm \ --tty \ --init \ --publish 31337:31337 \ --env TLS_ENABLED="1" \ --env FLIGHT_PASSWORD="flight_password" \ --env PRINT_QUERIES="1" \ --pull missing \ voltrondata/flight-sql:latest
The above command will automatically mount a very small TPC-H DuckDB database file.
Note: You can disable TLS in the container by setting environment variable:TLS_ENABLED to "0" (default is "1" - enabled). This is not recommended unless you are using an mTLS sidecar in Kubernetes or something similar, as it will be insecure.
When running the Docker image - you can have it run your own DuckDB database file (the database must be built with DuckDB version: 0.10.1).
Prerequisite: DuckDB CLI
Install DuckDB CLI version0.10.1 - and make sure the executable is on your PATH.
Platform Downloads:
Linux x86-64
Linux arm64 (aarch64)
MacOS Universal
In this example, we'll generate a new TPC-H Scale Factor 1 (1GB) database file, and then run the docker image to mount it:
# Generate a TPC-H database in the host's /tmp directorypushd /tmpduckdb ./tpch_sf1.duckdb<<EOF.bail on.echo onSELECT VERSION();INSTALL tpch;LOAD tpch;CALL dbgen(sf=1);EOF# Run the flight-sql docker container image - and mount the host's DuckDB database file created above inside the containerdocker run --name flight-sql \ --detach \ --rm \ --tty \ --init \ --publish 31337:31337 \ --env TLS_ENABLED="1" \ --env FLIGHT_PASSWORD="flight_password" \ --pull missing \ --mount type=bind,source=$(pwd),target=/opt/flight_sql/data \ --env DATABASE_FILENAME="data/tpch_sf1.duckdb" \ voltrondata/flight-sql:latest
You can now run initialization commands upon container startup by setting environment variable:INIT_SQL_COMMANDS to a string of SQL commands separated by semicolons - example value:
SET threads = 1; SET memory_limit = '1GB';.
Here is a full example of running the Docker image with initialization SQL commands:
docker run --name flight-sql \ --detach \ --rm \ --tty \ --init \ --publish 31337:31337 \ --env TLS_ENABLED="1" \ --env FLIGHT_PASSWORD="flight_password" \ --env PRINT_QUERIES="1" \ --env INIT_SQL_COMMANDS="SET threads = 1; SET memory_limit = '1GB';" \ --pull missing \ voltrondata/flight-sql:latest
You can also specify a file containing initialization SQL commands by setting environment variable:INIT_SQL_COMMANDS_FILE to the path of the file containing the SQL commands - example value:/tmp/init.sql. The file must be mounted inside the container.
Note: for the DuckDB back-end - the following init commands are automatically run for you:SET autoinstall_known_extensions = true; SET autoload_known_extensions = true;
Note: Initialization SQL commands which SELECT data will NOT show the results (this is not supported).
Note: Initialization SQL commands which fail will cause the Flight SQL server to abort and exit with a non-zero exit code.
Download theApache Arrow Flight SQL JDBC driver
You can then use the JDBC driver to connect from your host computer to the locally running Docker Flight SQL server with this JDBC string (change the password value to match the value specified for the FLIGHT_PASSWORD environment variable if you changed it from the example above):
jdbc:arrow-flight-sql://localhost:31337?useEncryption=true&user=flight_username&password=flight_password&disableCertificateVerification=true
For instructions on setting up the JDBC driver in popular Database IDE tool:DBeaver Community Edition - see thisrepo.
Note - if you stop/restart the Flight SQL Docker container, and attempt to connect via JDBC with the same password - you could get error: "Invalid bearer token provided. Detail: Unauthenticated". This is because the client JDBC driver caches the bearer token signed with the previous instance's RSA private key. Just change the password in the new container by changing the "FLIGHT_PASSWORD" env var setting - and then use that to connect via JDBC.
Connecting to the server via the newADBC Python Flight SQL driver
You can now use the new Apache Arrow Python ADBC Flight SQL driver to query the Flight SQL server. ADBC offers performance advantages over JDBC - because it minimizes serialization/deserialization, and data stays in columnar format at all phases.
You can learn more about ADBC and Flight SQLhere.
Ensure you have Python 3.9+ installed, then open a terminal, then run:
# Create a Python virtual environmentpython3 -m venv ./venv# Activate the virtual environment. ./venv/bin/activate# Install the requirements including the new Arrow ADBC Flight SQL driverpip install --upgrade pippip install pandas pyarrow adbc_driver_flightsql# Start the python interactive shellpython
In the Python shell - you can then run:
fromadbc_driver_flightsqlimportdbapiasflight_sql,DatabaseOptionsflight_password="flight_password"# Use an env var in production code!withflight_sql.connect(uri="grpc+tls://localhost:31337",db_kwargs={"username":"flight_username","password":flight_password,DatabaseOptions.TLS_SKIP_VERIFY.value:"true"# Not needed if you use a trusted CA-signed TLS cert } )asconn:withconn.cursor()ascur:cur.execute("SELECT n_nationkey, n_name FROM nation WHERE n_nationkey = ?",parameters=[24] )x=cur.fetch_arrow_table()print(x)
You should see results:
pyarrow.Tablen_nationkey: int32n_name: string----n_nationkey: [[24]]n_name: [["UNITED STATES"]]You can also use the newflight_sql_client CLI tool to connect to the Flight SQL server, and then run a single command. This tool is built into the Docker image, and is also available as a standalone executable for Linux and MacOS.
Example (run from the host computer's terminal):
flight_sql_client \ --command Execute \ --host"localhost" \ --port 31337 \ --username"flight_username" \ --password"flight_password" \ --query"SELECT version()" \ --use-tls \ --tls-skip-verify
That should return:
Results from endpoint 1 of 1Schema:version(): stringResults:version(): [ "v0.10.1" ]Total: 1Stop the docker image with:
docker stop flight-sql
Download (and unzip) the latest release of theflight_sql_server CLI executable from these currently supported platforms:
Linux x86-64
Linux arm64
MacOS x86-64
MacOS arm64
Then from a terminal - you can run:
FLIGHT_PASSWORD="flight_password" flight_sql_server --database-filename data/some_db.duckdb --print-queriesTo see all program options - run:
flight_sql_server --help
In order to run build the solution manually, and run SQLite and DuckDB Flight SQL server, you need to set up a new Python 3.9+ virtual environment on your machine.Follow these steps to do so (thanks to David Li!).
- Clone the repo and build the static library and executable
git clone https://github.com/voltrondata/flight-sql-server-example --recurse-submodulescd flight-sql-server-example# Build and install the static library and executablecmake -S. -B build -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/localcmake --build build --target install
- Install Python requirements for ADBC client interaction - (ensure you have Python 3.9+ installed first)
python3 -m venv ./venv. ./venv/bin/activatepip install --upgrade pip setuptools wheelpip install --requirement ./requirements.txt- Get some SQLite3 sample data.
wget https://github.com/lovasoa/TPCH-sqlite/releases/download/v1.0/TPC-H-small.db -O ./data/TPC-H-small.sqlite
- Create a DuckDB database.
python"scripts/create_duckdb_database_file.py" \ --file-name="TPC-H-small.duckdb" \ --file-path="data" \ --overwrite-file=true \ --scale-factor=0.01
- Optionally generate TLS certificates for encrypting traffic to/from the Flight SQL server
pushd tls./gen-certs.shpopd
- Start the Flight SQL server (and print client SQL commands as they run using the --print-queries option)
FLIGHT_PASSWORD="flight_password" flight_sql_server --database-filename data/TPC-H-small.duckdb --print-queriesThis option allows choosing from two backends: SQLite and DuckDB. It defaults to DuckDB.
$ FLIGHT_PASSWORD="flight_password" flight_sql_server --database-filename data/TPC-H-small.duckdbApache Arrow version: 15.0.0WARNING - TLS is disabledfor the Flight SQL server - this is insecure.DuckDB version: v0.10.1Running Init SQL command: SET autoinstall_known_extensions =true;Running Init SQL command: SET autoload_known_extensions =true;Using database file:"/opt/flight_sql/data/TPC-H-small.duckdb"Print Queries option isset to:falseApache Arrow Flight SQL server - with engine: DuckDB - will listen on grpc+tcp://0.0.0.0:31337Flight SQL server - started
The above call is equivalent to runningflight_sql_server -B duckdb orflight_sql --backend duckdb. To select SQLite run
FLIGHT_PASSWORD="flight_password" flight_sql_server -B sqlite -D data/TPC-H-small.sqliteor
FLIGHT_PASSWORD="flight_password" flight_sql_server --backend sqlite --database-filename data/TPC-H-small.sqliteThe above will produce the following:
Apache Arrow version: 15.0.0WARNING - TLS is disabledfor the Flight SQL server - this is insecure.SQLite version: 3.45.0Using database file:"/opt/flight_sql/data/TPC-H-small.sqlite"Print Queries option isset to:falseApache Arrow Flight SQL server - with engine: SQLite - will listen on grpc+tcp://0.0.0.0:31337Flight SQL server - started
To see all the available options runflight_sql_server --help.
flight_sql_server --helpAllowed options: --help produce thishelp message --version Print the version andexit -B [ --backend ] arg (=duckdb) Specify the database backend. Allowed options: duckdb, sqlite. -H [ --hostname ] arg Specify the hostname to listen onfor the Flight SQL Server. If not set, we will use env var:'FLIGHT_HOSTNAME'. If that isn't set, we will use the default of:'0.0.0.0'. -R [ --port ] arg (=31337) Specify the port to listen on for the Flight SQL Server. -D [ --database-filename ] arg Specify the database filename (absolute or relative to the current working directory) -U [ --username ] arg Specify the username to allow to connect to the Flight SQL Server for clients. If not set, we will use env var:'FLIGHT_USERNAME'. If that isn't set, we will use the default of:'flight_username'. -P [ --password ] arg Specify the password toset on the Flight SQL Serverfor clients to connect with. If not set, we will use env var:'FLIGHT_PASSWORD'. If that isn't set, the server will exit with failure. -S [ --secret-key ] arg Specify the secret key used to sign JWTs issued by the Flight SQL Server. If it isn't set, we use env var:'SECRET_KEY'. If that isn't set, the server will create a random secret key. -T [ --tls ] arg Specify the TLS certificate and key file paths. -I [ --init-sql-commands ] arg Specify the SQL commands to run on server startup. If not set, we will use env var:'INIT_SQL_COMMANDS'. -F [ --init-sql-commands-file ] arg Specify a file containing SQL commands to run on server startup. If not set, we will use env var:'INIT_SQL_COMMANDS_FILE'. -M [ --mtls-ca-cert-filename ] arg Specify an optional mTLS CA certificate path used to verify clients. The certificate MUST be in PEM format. -Q [ --print-queries ] Print queries run by clients to stdout
About
An example Flight SQL Server implementation - with DuckDB and SQLite back-ends.
Resources
License
Uh oh!
There was an error while loading.Please reload this page.
Stars
Watchers
Forks
Releases
Packages0
Uh oh!
There was an error while loading.Please reload this page.
Languages
- C++93.9%
- CMake2.6%
- Dockerfile1.3%
- Shell1.1%
- Python1.1%