PostgreSQL GROUPING SETS
Summary: in this tutorial, you will learn about grouping sets and how to use the PostgreSQLGROUPING SETS
clause to generate multiple grouping sets in a query.
Setup a sample table
Let’s get started bycreating a new table calledsales
for the demonstration.
DROP TABLE IF EXISTS sales;CREATE TABLE sales ( brandVARCHAR NOT NULL, segmentVARCHAR NOT NULL, quantityINT NOT NULL, PRIMARY KEY (brand, segment));INSERT INTO sales (brand, segment, quantity)VALUES ('ABC','Premium',100), ('ABC','Basic',200), ('XYZ','Premium',100), ('XYZ','Basic',300)RETURNING*;
Output:
brand | segment | quantity-------+---------+---------- ABC | Premium | 100 ABC | Basic | 200 XYZ | Premium | 100 XYZ | Basic | 300(4 rows)
Thesales
table stores the number of products sold by brand and segment.
Introduction to PostgreSQL GROUPING SETS
A grouping set is a set of columns by which you group using theGROUP BY
clause.
A grouping set is denoted by a comma-separated list of columns placed inside parentheses:
(column1, column2, ...)
For example, the following query uses theGROUP BY
clause to return the number of products sold by brand and segment. In other words, it defines a grouping set of the brand and segment which is denoted by(brand, segment)
SELECT brand, segment, SUM (quantity)FROM salesGROUP BY brand, segment;
Output:
brand | segment | sum-------+---------+----- XYZ | Basic | 300 ABC | Premium | 100 ABC | Basic | 200 XYZ | Premium | 100(4 rows)
The following query finds the number of products sold by a brand. It defines a grouping set(brand)
:
SELECT brand, SUM (quantity)FROM salesGROUP BY brand;
Output:
brand | sum-------+----- ABC | 300 XYZ | 400(2 rows)
The following query finds the number of products sold by segment. It defines a grouping set(segment)
:
SELECT segment, SUM (quantity)FROM salesGROUP BY segment;
Output:
segment | sum---------+----- Basic | 500 Premium | 200(2 rows)
The following query finds the number of products sold for all brands and segments. It defines an empty grouping set which is denoted by()
.
SELECT SUM (quantity)FROM sales;
Output:
sum----- 700(1 row)
Suppose you want to get all the grouping sets using a single query. To achieve this, you can use theUNION ALL
to combine all the result sets of the queries above.
BecauseUNION ALL
requires all result sets to have the same number of columns with compatible data types, you need to adjust the queries by addingNULL
to the selection list of each as shown below:
SELECT brand, segment, SUM (quantity)FROM salesGROUP BY brand, segmentUNION ALLSELECT brand, NULL, SUM (quantity)FROM salesGROUP BY brandUNION ALLSELECT NULL, segment, SUM (quantity)FROM salesGROUP BY segmentUNION ALLSELECT NULL, NULL, SUM (quantity)FROM sales;
Output:
brand | segment | sum-------+---------+----- XYZ | Basic | 300 ABC | Premium | 100 ABC | Basic | 200 XYZ | Premium | 100 ABC | null | 300 XYZ | null | 400 null | Basic | 500 null | Premium | 200 null | null | 700(9 rows)
This query generated a single result set with the aggregates for all grouping sets.
Even though the above query works as you expected, it has two main problems.
- First, it is quite lengthy.
- Second, it has a performance issue because PostgreSQL has to scan the
sales
table separately for each query.
To make it more efficient, PostgreSQL provides theGROUPING SETS
clause which is the subclause of theGROUP BY
clause.
TheGROUPING SETS
allows you to define multiple grouping sets in the same query.
The general syntax of theGROUPING SETS
is as follows:
SELECT c1, c2, aggregate_function(c3)FROM table_nameGROUP BY GROUPINGSETS ( (c1, c2), (c1), (c2), ());
In this syntax, we have four grouping sets(c1,c2)
,(c1)
,(c2)
, and()
.
To apply this syntax to the above example, you can useGROUPING SETS
clause instead of theUNION ALL
clause like this:
SELECT brand, segment, SUM (quantity)FROM salesGROUP BY GROUPINGSETS ( (brand, segment), (brand), (segment), () );
Output:
brand | segment | sum-------+---------+----- null | null | 700 XYZ | Basic | 300 ABC | Premium | 100 ABC | Basic | 200 XYZ | Premium | 100 ABC | null | 300 XYZ | null | 400 null | Basic | 500 null | Premium | 200(9 rows)
This query is much shorter and more readable. In addition, PostgreSQL will optimize the number of times it scans thesales
table and will not scan multiple times.
Grouping function
TheGROUPING()
function accepts an argument which can be a column name or an expression:
GROUPING( column_name | expression)
Thecolumn_name
orexpression
must match with the one specified in theGROUP BY
clause.
TheGROUPING()
function returns bit 0 if the argument is a member of the current grouping set and 1 otherwise.
See the following example:
SELECTGROUPING(brand) grouping_brand,GROUPING(segment) grouping_segment,brand,segment,SUM (quantity)FROMsalesGROUP BYGROUPINGSETS ((brand),(segment),())ORDER BYbrand,segment;
As shown in the screenshot, when the value in the
grouping_brand
is 0, thesum
column shows the subtotal of thebrand
.
When the value in thegrouping_segment
is zero, the sum column shows the subtotal of thesegment
.
You can use theGROUPING()
function in theHAVING
clause to find the subtotal of each brand like this:
SELECTGROUPING(brand) grouping_brand,GROUPING(segment) grouping_segment,brand,segment,SUM (quantity)FROMsalesGROUP BYGROUPINGSETS ((brand),(segment),())HAVING GROUPING(brand)= 0ORDER BYbrand,segment;
Summary
- Use the PostgreSQL
GROUPING SETS
to generate multiple grouping sets.
Last updated on