PostgreSQL DISTINCT ON
Summary: in this tutorial, you will learn how to use the PostgreSQLDISTINCT ON
clause to retrieve distinct rows based on a specific column.
Introduction to the PostgreSQL DISTINCT ON clause
TheDISTINCT ON
clause allows you to retrieve unique rows based on specified columns. Here’s the basic syntax of theDISTINCT ON
clause:
SELECT DISTINCT ON (column1,column2,...) column1, column2, ...FROM table_nameORDER BY column1, column2, ...;
TheDISTINCT ON
clause retrieves the first unique entry from each column or combination of columns in a result set.
The key factor for determining which unique entry is selected lies in the columns that appear in theORDER BY
clause.
Technically, you can use theDISTINCT ON
without theORDER BY
clause. However, without theORDER BY
clause, the “first” unique entry becomes unpredictable because the table stores the rows in an unspecified order.
Notice that you need to align the expression specified in theDISTINCT ON
clause with the leftmost expression in theORDER BY
clause.
PostgreSQL DISTINCT ON example
Let’s take an example of using theDISTINCT ON
clause to understand it better.
First,create a table called studentscores
to store the student’s scores:
CREATE TABLE student_scores ( id SERIAL PRIMARY KEY, name VARCHAR(50)NOT NULL, subject VARCHAR(50)NOT NULL, score INTEGER NOT NULL);
Second,insert rows into thestudent_scores
table:
INSERT INTO student_scores (name,subject, score)VALUES ('Alice', 'Math', 90), ('Bob', 'Math', 85), ('Alice', 'Physics', 92), ('Bob', 'Physics', 88), ('Charlie', 'Math', 95), ('Charlie', 'Physics', 90);
In the table, each student has both scores inMath
andPhysics
.
Third, retrieve the highest score for each student in eitherMath
orPhysics
using theDISTINCT ON
clause:
SELECT DISTINCT ON (name)name, subject, scoreFROM student_scoresORDER BY name, scoreDESC;
Output:
name | subject | score---------+---------+------- Alice | Physics | 92 Bob | Physics | 88 Charlie | Math | 95(3 rows)
The output returns the highest score of each student in whatever subject,Math
orPhysics
.
The result set includes a unique combination of names along with the corresponding subject and score. TheORDER BY
clause is important because it helps determine which row to retain in case of duplicate.
In this example, theDISTINCT ON
clause keeps the row with the highest scores because theORDER BY
clause sorts the names and scores in descending order.
Summary
- Use the
DISTINCT ON
clause to keep the first unique entry from each column or combination of columns in a result set. - Always use the
ORDER BY
clause to determine which entry to retain in the result set.
Last updated on