- Notifications
You must be signed in to change notification settings - Fork7
Query and transform data with PRQL
License
Apache-2.0, MIT licenses found
Licenses found
PRQL/prql-query
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
PRQL is a modern language for transforming data— a simple, powerful, pipelined SQL replacement
pq allows you to usePRQL to easily query andtransform your data. It is powered byApache ArrowDataFusion andDuckDB and is written in Rust (so it's "blazingly fast"™)!
$ pq --from albums.csv "take 5"+----------+---------------------------------------+-----------+| album_id | title | artist_id |+----------+---------------------------------------+-----------+| 1 | For Those About To Rock We Salute You | 1 || 2 | Balls to the Wall | 2 || 3 | Restless and Wild | 2 || 4 | Let There Be Rock | 1 || 5 | Big Ones | 3 |+----------+---------------------------------------+-----------+$ pq -f i=invoices.csv -f c=customers.csv --to invoices_with_names.parquet \ 'from i | join c [customer_id] | derive [name = f"{first_name} {last_name}"]'$ pq -f invoices_with_names.parquet --format json \ 'group name (aggregate [spend = sum total]) | sort [-spend] | take 10'{"name":"Helena Holý","spend":49.620000000000005}{"name":"Richard Cunningham","spend":47.620000000000005}{"name":"Luis Rojas","spend":46.62}{"name":"Hugh O'Reilly","spend":45.62}{"name":"Ladislav Kovács","spend":45.62}{"name":"Julia Barnett","spend":43.620000000000005}{"name":"Fynn Zimmermann","spend":43.62}{"name":"Frank Ralston","spend":43.62}{"name":"Astrid Gruber","spend":42.62}{"name":"Victor Stevens","spend":42.62}Binaries are built for Windows, macOS and Linux for every release and can bedowloaded fromReleases(latest).
For example on linux you could download and installpq with:
VERSION=v0.0.14 wget https://github.com/prql/prql-query/releases/download/$VERSION/pq-x86_64-unknown-linux-gnu.tar.gz && \ tar xvzf pq-x86_64-unknown-linux-gnu.tar.gz --directory ~/.local/bin && \ rm pq-x86_64-unknown-linux-gnu.tar.gzdocker pull ghcr.io/prql/prql-queryalias pq="docker run --rm -it -v $(pwd):/data -e HOME=/tmp -u $(id -u):$(id -g) ghcr.io/prql/prql-query"pq --helpPlease note that if you want to build the container image yourself with Docker then you will needat least 10 GB of memory available to the Docker VM, otherwise libduckdb-sys will fail to compile.
brew tap prql/homebrew-prql-querybrew install prql-querycargo install prql-queryAt its simplestpq takes PRQL queries and transpiles them to SQL queries:
$ pq "from a | select b"SELECT bFROM aInput can also come from stdin:
$ cat examples/queries/invoice_totals.prql | pqFor convenience, queries ending in ".prql" are assumed to be paths to PRQL query files and will be read in so this produces the same as above:
$ pq examples/queries/invoice_totals.prqlBoth of these produce the output:
SELECT STRFTIME('%Y-%m', i.invoice_date) AS month, STRFTIME('%Y-%m-%d', i.invoice_date) AS day, COUNT(DISTINCT i.invoice_id) AS num_orders, SUM(ii.quantity) AS num_tracks, SUM(ii.unit_price * ii.quantity) AS total_price, SUM(SUM(ii.quantity)) OVER ( PARTITION BY STRFTIME('%Y-%m', i.invoice_date) ORDER BY STRFTIME('%Y-%m-%d', i.invoice_date) ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS running_total_num_tracks, LAG(SUM(ii.quantity), 7) OVER ( ORDER BY STRFTIME('%Y-%m-%d', i.invoice_date) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS num_tracks_last_weekFROM invoices AS i JOIN invoice_items AS ii USING(invoice_id)GROUP BY STRFTIME('%Y-%m', i.invoice_date), STRFTIME('%Y-%m-%d', i.invoice_date)ORDER BY dayWith the functionality described above, you should be able to query your favourite SQL RDBMS using your favourite CLI client andpq. For example with thepsql client for PostgreSQL:
$ pq "from my_table | take 5" | psql postgresql://username:password@host:port/databaseOr using themysql client for MySQL with a PRQL query stored in a file:
$ pq my_query.prql | mysql -h myhost -d mydb -u myuser -p mypasswordSimilarly for MS SQL Server and other databases.
For querying and transforming data stored on the local filesystem,pq comes in with a number of built-in backend query processing engines. The default backend isApache Arrow DataFusion. HoweverDuckDB andSQLite (planned) are also supported.
When--from arguments are supplied which specify data files, the PRQL query will be applied to those files. The files can be referenced in the queries by the filenames without the extensions, e.g. customers.csv can be referenced as the tablecustomers. For convenience, unless a query already begins with afrom ... step, afrom <table> pipeline step will automatically be inserted at the beginning of the query referring to the last--from argument encountered, i.e. the following two are equivalent:
$ pq --from examples/data/chinook/csv/invoices.csv "from invoices|take 5"$ pq --from examples/data/chinook/csv/invoices.csv "take 5"+------------+-------------+-------------------------------+-------------------------+--------------+---------------+-----------------+---------------------+-------+| invoice_id | customer_id | invoice_date | billing_address | billing_city | billing_state | billing_country | billing_postal_code | total |+------------+-------------+-------------------------------+-------------------------+--------------+---------------+-----------------+---------------------+-------+| 1 | 2 | 2009-01-01T00:00:00.000000000 | Theodor-Heuss-Straße 34 | Stuttgart | | Germany | 70174 | 1.98 || 2 | 4 | 2009-01-02T00:00:00.000000000 | Ullevålsveien 14 | Oslo | | Norway | 0171 | 3.96 || 3 | 8 | 2009-01-03T00:00:00.000000000 | Grétrystraat 63 | Brussels | | Belgium | 1000 | 5.94 || 4 | 14 | 2009-01-06T00:00:00.000000000 | 8210 111 ST NW | Edmonton | AB | Canada | T6G 2C7 | 8.91 || 5 | 23 | 2009-01-11T00:00:00.000000000 | 69 Salem Street | Boston | MA | USA | 2113 | 13.86 |+------------+-------------+-------------------------------+-------------------------+--------------+---------------+-----------------+---------------------+-------+You can also assign an alias for source file with the following form--from <alias>=<filepath> and then refer to it by that alias in your queries. So the following is another equivalent form of the queries above:
$ pq --from i=examples/data/chinook/csv/invoices.csv "from i|take 5"This works with multiple files which means that the extended example above can be run as follows:
$ pq -b duckdb -f examples/data/chinook/csv/invoices.csv -f examples/data/chinook/csv/invoice_items.csv examples/queries/invoice_totals.prqlWhen a--to argument is supplied, the output will be written there in the appropriate file format instead of stdout (the "" query is equivalent toselect * and is required becauseselect * currently does not work):
$ pq --from examples/data/chinook/csv/invoices.csv --to invoices.parquet ""Currently csv, parquet and json file formats are supported for both readers and writers:
$ cat examples/queries/customer_totals.prqlgroup [customer_id] ( aggregate [ customer_total = sum total, ])$ pq -f invoices.parquet -t customer_totals.json examples/queries/customer_totals.prql$ pq -f customer_totals.json "sort [-customer_total] | take 10"+-------------+--------------------+| customer_id | customer_total |+-------------+--------------------+| 6 | 49.620000000000005 || 26 | 47.620000000000005 || 57 | 46.62 || 46 | 45.62 || 45 | 45.62 || 28 | 43.620000000000005 || 37 | 43.62 || 24 | 43.62 || 7 | 42.62 || 25 | 42.62 |+-------------+--------------------+DuckDB is natively supported and can be queried by supplying a database URIbeginning with "duckdb://".
$ pq --database duckdb://examples/chinook/duckdb/chinook.duckdb \ 'from albums | join artists [artist_id] | group name (aggregate [num_albums = count]) | sort [-num_albums] | take 10'Sqlite is currently supported through thesqlite_scannerDuckDB extension. In order to query a SQLite database, a database URIbeginning with "sqlite://" needs to be supplied.
$ pq --database sqlite://examples/chinook/sqlite/chinook.sqlite \ 'from albums | take 10'PostgreSQL is currently supported through thepostgres-scanner DuckDBextension. (See theannouncement blog postfor a good introduction.)
$ pq -d postgresql://username:password@host:port/database \ 'from table | take 10'One noteworthy limitation of this approach is that you can only querytables in the postgres database and not views.
By default you will be connected to the "public" schema and can reference tablesthere within your query. You can specify a different schema to connect to usingthe "?currentSchema=schema" paramter. If you want to query tables from another schemaoutside of that then you currently have to reference these through aliased--from parameters like so:
$ pq -d postgresql://username:password@host:port/database?currentSchema=schema \ --from alias=other_schema.table 'from alias | take 10'If you plan to work with the same database repeatedly, then specifying thedetails each time quickly becomes tedious.pq allows you to supply allcommand line arguments from environment variables with aPQ_ prefix. So forexample the same query from above could be achieved with:
$ export PQ_DATABASE="postgresql://username:password@host:port/database"$ pq --from alias=schema.table 'take 10'Environment variables can also be read from a.env files. Since you probablydon't want to expose your database credentials at the shell, it makes sense toput these in a.env file. This also allows you to set up directories withconfiguration for common environments together with common queries for thatenvironment, for example:
$ echo 'PQ_DATABASE="postgresql://username:password@host:port/database"' > .env$ pq 'from my_schema.my_table | take 5'Or say that you have astatus_query.prql that you need to run for a number of environments with .env files set up in subdirectories:
$ for e in prod uat dev; do cd $e && pq ../status_query.prql; done- Tests
- Publish to crates.io
- Support for object stores
- Support for other databases through
connectorx
About
Query and transform data with PRQL
Topics
Resources
License
Apache-2.0, MIT licenses found
Licenses found
Uh oh!
There was an error while loading.Please reload this page.
Stars
Watchers
Forks
Packages0
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Contributors4
Uh oh!
There was an error while loading.Please reload this page.